Message ID | 20221025151344.3784230-4-chao.p.peng@linux.intel.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp1065970wru; Tue, 25 Oct 2022 08:22:05 -0700 (PDT) X-Google-Smtp-Source: AMsMyM7jcs5Zdmkr9363SG4oIw1aXTop34N1bmFKXzcKJa5Zjk/rrMTPlyzpOIqGQzgZ7rt1hojS X-Received: by 2002:a17:906:cc0f:b0:7ad:2da5:4711 with SMTP id ml15-20020a170906cc0f00b007ad2da54711mr490148ejb.628.1666711324984; Tue, 25 Oct 2022 08:22:04 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666711324; cv=none; d=google.com; s=arc-20160816; b=X7f0D+9eDBMhOsLuVDgVRJcmGRo63jCZQ29gJNTJq/H18XhuKQE/XzKf+qPUzeD2vq vZh3NEINa8xTKa4b4CDFj4CKVqAVNjIeuqR81XBVvmmgD026E+3mOMj43jLKVdi4jZLe huDUOnIWUxjD7uhFUI56vqg0eru85Y8/w72CuyKBWBvcLCJNMZDrQCx1/TP4Sa97GbzH EFmvdK/lUTbZwLw/yxEiVy5T6BfvxxU4UuFWx8NNzISNDUa3Rs5yXpgCEw2lvDZG+Y+1 skEQQvXNeSHHXsH5zVSFttr5o4Hes8cdExmfx8Cw17/gxwS1wJhapp1RqQTdvwm4dZ9+ cuQA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=fknS4Ck8zD0qFZp1qLGKTaX9bAxoV+hj2N6xgz0n5M0=; b=pPWYMeSeq/6xe7u1DiPxKt5ZPsvm3JhLnZjvSXHb4PLHpD1d+WtccVZGtY7kadmJsh IEsgjeNFLS2Qxt2SfIcLo+MxSMGsPszFNKvIyp7OaEUZI6oFAyqONHnH8OP1AYH+GF5U ppt7aKqn0BeW9NLaZdT2DrqHLizDiZIFhOW4D+EwU5BufQy1uDGtGuDRd9OIaqOBgIqQ mUIbT/m8sn+VsqzZzDeS4ds7x0bqGyZMOPSWS1TxP+/Hk4kNSXkN3vcczcf8NxATBDvE uELn8By8wZ0qzJB/88cHm6lJPenSaQ/9IhAlSvF98uq/lKLMUDekauEBnLP77cUEEnDR /3Vw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=kFK9dAVM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id bk6-20020a170906b0c600b0077b21283d09si2705373ejb.349.2022.10.25.08.21.40; Tue, 25 Oct 2022 08:22:04 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=kFK9dAVM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233132AbiJYPT0 (ORCPT <rfc822;pwkd43@gmail.com> + 99 others); Tue, 25 Oct 2022 11:19:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41768 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233174AbiJYPTF (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Tue, 25 Oct 2022 11:19:05 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CD40818A01A; Tue, 25 Oct 2022 08:18:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1666711139; x=1698247139; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ih/6HAJshbJt6TWBogTEZBcZrL2eIfbHydJCwgeirTw=; b=kFK9dAVMD1GARcs8kLVXHQANNi01S689U2idQCwQeiEI36zfxs0IRv7J kQoiCJqeURImcmONFBS8fXxM1YsLLna+ZFGi543IxWNbdjjcvwVsTWQEq 67jv7z4T4QV3LhUaK3IdNTdxvK4xiCx2nDgiUBgYl7YOisyBcjyQqj86N aA8gZ8V1kXjF0RIzvvMYQVdpdmNtmPEWPD0inbpv8v07lYtGmMU0IzPrP M6G/87cxqxKMx6tbd9OWv84b5CcTQm2pR7HD0sbNU1fjdMa5mBETnwJWg aotvT/TQtmQ/3SkMaUhnORzF7qNCMK0+p2Zv4NbieZOQj3Orm7NJZ6A4V g==; X-IronPort-AV: E=McAfee;i="6500,9779,10510"; a="307700547" X-IronPort-AV: E=Sophos;i="5.95,212,1661842800"; d="scan'208";a="307700547" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Oct 2022 08:18:59 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10510"; a="736865566" X-IronPort-AV: E=Sophos;i="5.95,212,1661842800"; d="scan'208";a="736865566" Received: from chaop.bj.intel.com ([10.240.193.75]) by fmsmga002.fm.intel.com with ESMTP; 25 Oct 2022 08:18:47 -0700 From: Chao Peng <chao.p.peng@linux.intel.com> To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, linux-doc@vger.kernel.org, qemu-devel@nongnu.org Cc: Paolo Bonzini <pbonzini@redhat.com>, Jonathan Corbet <corbet@lwn.net>, Sean Christopherson <seanjc@google.com>, Vitaly Kuznetsov <vkuznets@redhat.com>, Wanpeng Li <wanpengli@tencent.com>, Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, x86@kernel.org, "H . Peter Anvin" <hpa@zytor.com>, Hugh Dickins <hughd@google.com>, Jeff Layton <jlayton@kernel.org>, "J . Bruce Fields" <bfields@fieldses.org>, Andrew Morton <akpm@linux-foundation.org>, Shuah Khan <shuah@kernel.org>, Mike Rapoport <rppt@kernel.org>, Steven Price <steven.price@arm.com>, "Maciej S . Szmigiero" <mail@maciej.szmigiero.name>, Vlastimil Babka <vbabka@suse.cz>, Vishal Annapurve <vannapurve@google.com>, Yu Zhang <yu.c.zhang@linux.intel.com>, Chao Peng <chao.p.peng@linux.intel.com>, "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>, luto@kernel.org, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com, aarcange@redhat.com, ddutile@redhat.com, dhildenb@redhat.com, Quentin Perret <qperret@google.com>, tabba@google.com, Michael Roth <michael.roth@amd.com>, mhocko@suse.com, Muchun Song <songmuchun@bytedance.com>, wei.w.wang@intel.com Subject: [PATCH v9 3/8] KVM: Add KVM_EXIT_MEMORY_FAULT exit Date: Tue, 25 Oct 2022 23:13:39 +0800 Message-Id: <20221025151344.3784230-4-chao.p.peng@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221025151344.3784230-1-chao.p.peng@linux.intel.com> References: <20221025151344.3784230-1-chao.p.peng@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-7.5 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,RCVD_IN_DNSWL_HI,SPF_HELO_NONE, SPF_NONE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747673494300872350?= X-GMAIL-MSGID: =?utf-8?q?1747673494300872350?= |
Series |
KVM: mm: fd-based approach for supporting KVM
|
|
Commit Message
Chao Peng
Oct. 25, 2022, 3:13 p.m. UTC
This new KVM exit allows userspace to handle memory-related errors. It indicates an error happens in KVM at guest memory range [gpa, gpa+size). The flags includes additional information for userspace to handle the error. Currently bit 0 is defined as 'private memory' where '1' indicates error happens due to private memory access and '0' indicates error happens due to shared memory access. When private memory is enabled, this new exit will be used for KVM to exit to userspace for shared <-> private memory conversion in memory encryption usage. In such usage, typically there are two kind of memory conversions: - explicit conversion: happens when guest explicitly calls into KVM to map a range (as private or shared), KVM then exits to userspace to perform the map/unmap operations. - implicit conversion: happens in KVM page fault handler where KVM exits to userspace for an implicit conversion when the page is in a different state than requested (private or shared). Suggested-by: Sean Christopherson <seanjc@google.com> Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> --- Documentation/virt/kvm/api.rst | 23 +++++++++++++++++++++++ include/uapi/linux/kvm.h | 9 +++++++++ 2 files changed, 32 insertions(+)
Comments
On Tue, 25 Oct 2022 at 16:21, Chao Peng <chao.p.peng@linux.intel.com> wrote: > > This new KVM exit allows userspace to handle memory-related errors. It > indicates an error happens in KVM at guest memory range [gpa, gpa+size). > The flags includes additional information for userspace to handle the > error. Currently bit 0 is defined as 'private memory' where '1' > indicates error happens due to private memory access and '0' indicates > error happens due to shared memory access. > > When private memory is enabled, this new exit will be used for KVM to > exit to userspace for shared <-> private memory conversion in memory > encryption usage. In such usage, typically there are two kind of memory > conversions: > - explicit conversion: happens when guest explicitly calls into KVM > to map a range (as private or shared), KVM then exits to userspace > to perform the map/unmap operations. > - implicit conversion: happens in KVM page fault handler where KVM > exits to userspace for an implicit conversion when the page is in a > different state than requested (private or shared). > > Suggested-by: Sean Christopherson <seanjc@google.com> > Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com> > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> > --- > Documentation/virt/kvm/api.rst | 23 +++++++++++++++++++++++ > include/uapi/linux/kvm.h | 9 +++++++++ > 2 files changed, 32 insertions(+) > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > index f3fa75649a78..975688912b8c 100644 > --- a/Documentation/virt/kvm/api.rst > +++ b/Documentation/virt/kvm/api.rst > @@ -6537,6 +6537,29 @@ array field represents return values. The userspace should update the return > values of SBI call before resuming the VCPU. For more details on RISC-V SBI > spec refer, https://github.com/riscv/riscv-sbi-doc. > > +:: > + > + /* KVM_EXIT_MEMORY_FAULT */ > + struct { > + #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) > + __u32 flags; > + __u32 padding; > + __u64 gpa; > + __u64 size; > + } memory; > + > +If exit reason is KVM_EXIT_MEMORY_FAULT then it indicates that the VCPU has > +encountered a memory error which is not handled by KVM kernel module and > +userspace may choose to handle it. The 'flags' field indicates the memory > +properties of the exit. > + > + - KVM_MEMORY_EXIT_FLAG_PRIVATE - indicates the memory error is caused by > + private memory access when the bit is set. Otherwise the memory error is > + caused by shared memory access when the bit is clear. > + > +'gpa' and 'size' indicate the memory range the error occurs at. The userspace > +may handle the error and return to KVM to retry the previous memory access. > + What's the difference between this and a plain old MMIO exit ? Just that we can specify a wider size and some flags ? -- PMM
On Tue, Oct 25, 2022, Peter Maydell wrote: > On Tue, 25 Oct 2022 at 16:21, Chao Peng <chao.p.peng@linux.intel.com> wrote: > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > > index f3fa75649a78..975688912b8c 100644 > > --- a/Documentation/virt/kvm/api.rst > > +++ b/Documentation/virt/kvm/api.rst > > @@ -6537,6 +6537,29 @@ array field represents return values. The userspace should update the return > > values of SBI call before resuming the VCPU. For more details on RISC-V SBI > > spec refer, https://github.com/riscv/riscv-sbi-doc. > > > > +:: > > + > > + /* KVM_EXIT_MEMORY_FAULT */ > > + struct { > > + #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) > > + __u32 flags; > > + __u32 padding; > > + __u64 gpa; > > + __u64 size; > > + } memory; > > + > > +If exit reason is KVM_EXIT_MEMORY_FAULT then it indicates that the VCPU has > > +encountered a memory error which is not handled by KVM kernel module and > > +userspace may choose to handle it. The 'flags' field indicates the memory > > +properties of the exit. > > + > > + - KVM_MEMORY_EXIT_FLAG_PRIVATE - indicates the memory error is caused by > > + private memory access when the bit is set. Otherwise the memory error is > > + caused by shared memory access when the bit is clear. > > + > > +'gpa' and 'size' indicate the memory range the error occurs at. The userspace > > +may handle the error and return to KVM to retry the previous memory access. > > + > > What's the difference between this and a plain old MMIO exit ? > Just that we can specify a wider size and some flags ? KVM_EXIT_MMIO is purely for cases where there is no memslot. KVM_EXIT_MEMORY_FAULT will be used for scenarios where there is a valid memslot for a GPA, but for whatever reason KVM cannot map the memslot into the guest. In this series, the new exit type is use to handle guest-initiated conversions between shared and private memory. By design, conversion requires explicit action from userspace, and so even though KVM has a valid memslot, KVM needs to exit to userspace to effectively forward the conversion request to userspace. Long term, I also hope to convert all guest-triggered -EFAULT paths to instead return KVM_EXIT_MEMORY_FAULT. At minimum, returning KVM_EXIT_MEMORY_FAULT instead of -EFAULT will allow KVM to provide userspace with the "bad" GPA when something goes sideways, e.g. if faulting in the page failed because there's no valid userspace mapping. There have also been two potential use cases[1][2], though they both appear to have been abandoned, where userspace would do something more than just kill the guest in response to KVM_EXIT_MEMORY_FAULT. [1] https://lkml.kernel.org/r/20200617230052.GB27751@linux.intel.com [2] https://lore.kernel.org/all/YKxJLcg%2FWomPE422@google.com
Hi, On Tue, Oct 25, 2022 at 4:19 PM Chao Peng <chao.p.peng@linux.intel.com> wrote: > > This new KVM exit allows userspace to handle memory-related errors. It > indicates an error happens in KVM at guest memory range [gpa, gpa+size). > The flags includes additional information for userspace to handle the > error. Currently bit 0 is defined as 'private memory' where '1' > indicates error happens due to private memory access and '0' indicates > error happens due to shared memory access. > > When private memory is enabled, this new exit will be used for KVM to > exit to userspace for shared <-> private memory conversion in memory > encryption usage. In such usage, typically there are two kind of memory > conversions: > - explicit conversion: happens when guest explicitly calls into KVM > to map a range (as private or shared), KVM then exits to userspace > to perform the map/unmap operations. > - implicit conversion: happens in KVM page fault handler where KVM > exits to userspace for an implicit conversion when the page is in a > different state than requested (private or shared). > > Suggested-by: Sean Christopherson <seanjc@google.com> > Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com> > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> > --- Reviewed-by: Fuad Tabba <tabba@google.com> I have tested the V8 version of this patch on arm64/qemu, and considering this hasn't changed: Tested-by: Fuad Tabba <tabba@google.com> Cheers, /fuad > Documentation/virt/kvm/api.rst | 23 +++++++++++++++++++++++ > include/uapi/linux/kvm.h | 9 +++++++++ > 2 files changed, 32 insertions(+) > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > index f3fa75649a78..975688912b8c 100644 > --- a/Documentation/virt/kvm/api.rst > +++ b/Documentation/virt/kvm/api.rst > @@ -6537,6 +6537,29 @@ array field represents return values. The userspace should update the return > values of SBI call before resuming the VCPU. For more details on RISC-V SBI > spec refer, https://github.com/riscv/riscv-sbi-doc. > > +:: > + > + /* KVM_EXIT_MEMORY_FAULT */ > + struct { > + #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) > + __u32 flags; > + __u32 padding; > + __u64 gpa; > + __u64 size; > + } memory; > + > +If exit reason is KVM_EXIT_MEMORY_FAULT then it indicates that the VCPU has > +encountered a memory error which is not handled by KVM kernel module and > +userspace may choose to handle it. The 'flags' field indicates the memory > +properties of the exit. > + > + - KVM_MEMORY_EXIT_FLAG_PRIVATE - indicates the memory error is caused by > + private memory access when the bit is set. Otherwise the memory error is > + caused by shared memory access when the bit is clear. > + > +'gpa' and 'size' indicate the memory range the error occurs at. The userspace > +may handle the error and return to KVM to retry the previous memory access. > + > :: > > /* KVM_EXIT_NOTIFY */ > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index f1ae45c10c94..fa60b032a405 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -300,6 +300,7 @@ struct kvm_xen_exit { > #define KVM_EXIT_RISCV_SBI 35 > #define KVM_EXIT_RISCV_CSR 36 > #define KVM_EXIT_NOTIFY 37 > +#define KVM_EXIT_MEMORY_FAULT 38 > > /* For KVM_EXIT_INTERNAL_ERROR */ > /* Emulate instruction failed. */ > @@ -538,6 +539,14 @@ struct kvm_run { > #define KVM_NOTIFY_CONTEXT_INVALID (1 << 0) > __u32 flags; > } notify; > + /* KVM_EXIT_MEMORY_FAULT */ > + struct { > +#define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) > + __u32 flags; > + __u32 padding; > + __u64 gpa; > + __u64 size; > + } memory; > /* Fix the size of the union. */ > char padding[256]; > }; > -- > 2.25.1 >
On Thu, Oct 27, 2022 at 11:27:05AM +0100, Fuad Tabba wrote: > Hi, > > On Tue, Oct 25, 2022 at 4:19 PM Chao Peng <chao.p.peng@linux.intel.com> wrote: > > > > This new KVM exit allows userspace to handle memory-related errors. It > > indicates an error happens in KVM at guest memory range [gpa, gpa+size). > > The flags includes additional information for userspace to handle the > > error. Currently bit 0 is defined as 'private memory' where '1' > > indicates error happens due to private memory access and '0' indicates > > error happens due to shared memory access. > > > > When private memory is enabled, this new exit will be used for KVM to > > exit to userspace for shared <-> private memory conversion in memory > > encryption usage. In such usage, typically there are two kind of memory > > conversions: > > - explicit conversion: happens when guest explicitly calls into KVM > > to map a range (as private or shared), KVM then exits to userspace > > to perform the map/unmap operations. > > - implicit conversion: happens in KVM page fault handler where KVM > > exits to userspace for an implicit conversion when the page is in a > > different state than requested (private or shared). > > > > Suggested-by: Sean Christopherson <seanjc@google.com> > > Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> > > --- > > Reviewed-by: Fuad Tabba <tabba@google.com> > > I have tested the V8 version of this patch on arm64/qemu, and > considering this hasn't changed: > Tested-by: Fuad Tabba <tabba@google.com> Appreciate your review and testing! Chao > > Cheers, > /fuad > > > > > Documentation/virt/kvm/api.rst | 23 +++++++++++++++++++++++ > > include/uapi/linux/kvm.h | 9 +++++++++ > > 2 files changed, 32 insertions(+) > > > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > > index f3fa75649a78..975688912b8c 100644 > > --- a/Documentation/virt/kvm/api.rst > > +++ b/Documentation/virt/kvm/api.rst > > @@ -6537,6 +6537,29 @@ array field represents return values. The userspace should update the return > > values of SBI call before resuming the VCPU. For more details on RISC-V SBI > > spec refer, https://github.com/riscv/riscv-sbi-doc. > > > > +:: > > + > > + /* KVM_EXIT_MEMORY_FAULT */ > > + struct { > > + #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) > > + __u32 flags; > > + __u32 padding; > > + __u64 gpa; > > + __u64 size; > > + } memory; > > + > > +If exit reason is KVM_EXIT_MEMORY_FAULT then it indicates that the VCPU has > > +encountered a memory error which is not handled by KVM kernel module and > > +userspace may choose to handle it. The 'flags' field indicates the memory > > +properties of the exit. > > + > > + - KVM_MEMORY_EXIT_FLAG_PRIVATE - indicates the memory error is caused by > > + private memory access when the bit is set. Otherwise the memory error is > > + caused by shared memory access when the bit is clear. > > + > > +'gpa' and 'size' indicate the memory range the error occurs at. The userspace > > +may handle the error and return to KVM to retry the previous memory access. > > + > > :: > > > > /* KVM_EXIT_NOTIFY */ > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > > index f1ae45c10c94..fa60b032a405 100644 > > --- a/include/uapi/linux/kvm.h > > +++ b/include/uapi/linux/kvm.h > > @@ -300,6 +300,7 @@ struct kvm_xen_exit { > > #define KVM_EXIT_RISCV_SBI 35 > > #define KVM_EXIT_RISCV_CSR 36 > > #define KVM_EXIT_NOTIFY 37 > > +#define KVM_EXIT_MEMORY_FAULT 38 > > > > /* For KVM_EXIT_INTERNAL_ERROR */ > > /* Emulate instruction failed. */ > > @@ -538,6 +539,14 @@ struct kvm_run { > > #define KVM_NOTIFY_CONTEXT_INVALID (1 << 0) > > __u32 flags; > > } notify; > > + /* KVM_EXIT_MEMORY_FAULT */ > > + struct { > > +#define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) > > + __u32 flags; > > + __u32 padding; > > + __u64 gpa; > > + __u64 size; > > + } memory; > > /* Fix the size of the union. */ > > char padding[256]; > > }; > > -- > > 2.25.1 > >
Chao Peng <chao.p.peng@linux.intel.com> writes: > This new KVM exit allows userspace to handle memory-related errors. It > indicates an error happens in KVM at guest memory range [gpa, gpa+size). > The flags includes additional information for userspace to handle the > error. Currently bit 0 is defined as 'private memory' where '1' > indicates error happens due to private memory access and '0' indicates > error happens due to shared memory access. > > When private memory is enabled, this new exit will be used for KVM to > exit to userspace for shared <-> private memory conversion in memory > encryption usage. In such usage, typically there are two kind of memory > conversions: > - explicit conversion: happens when guest explicitly calls into KVM > to map a range (as private or shared), KVM then exits to userspace > to perform the map/unmap operations. > - implicit conversion: happens in KVM page fault handler where KVM > exits to userspace for an implicit conversion when the page is in a > different state than requested (private or shared). > > Suggested-by: Sean Christopherson <seanjc@google.com> > Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com> > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> > --- > Documentation/virt/kvm/api.rst | 23 +++++++++++++++++++++++ > include/uapi/linux/kvm.h | 9 +++++++++ > 2 files changed, 32 insertions(+) > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > index f3fa75649a78..975688912b8c 100644 > --- a/Documentation/virt/kvm/api.rst > +++ b/Documentation/virt/kvm/api.rst > @@ -6537,6 +6537,29 @@ array field represents return values. The userspace should update the return > values of SBI call before resuming the VCPU. For more details on RISC-V SBI > spec refer, https://github.com/riscv/riscv-sbi-doc. > > +:: > + > + /* KVM_EXIT_MEMORY_FAULT */ > + struct { > + #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) > + __u32 flags; > + __u32 padding; > + __u64 gpa; > + __u64 size; > + } memory; > + > +If exit reason is KVM_EXIT_MEMORY_FAULT then it indicates that the VCPU has > +encountered a memory error which is not handled by KVM kernel module and > +userspace may choose to handle it. The 'flags' field indicates the memory > +properties of the exit. > + > + - KVM_MEMORY_EXIT_FLAG_PRIVATE - indicates the memory error is caused by > + private memory access when the bit is set. Otherwise the memory error is > + caused by shared memory access when the bit is clear. What does a shared memory access failure entail? If you envision any other failure modes it might be worth making it explicit with additional flags. I also wonder if a bitmask makes sense if there can only be one reason for a failure? Maybe all that is needed is a reason enum? > + > +'gpa' and 'size' indicate the memory range the error occurs at. The userspace > +may handle the error and return to KVM to retry the previous memory access. > + > :: > > /* KVM_EXIT_NOTIFY */ > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index f1ae45c10c94..fa60b032a405 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -300,6 +300,7 @@ struct kvm_xen_exit { > #define KVM_EXIT_RISCV_SBI 35 > #define KVM_EXIT_RISCV_CSR 36 > #define KVM_EXIT_NOTIFY 37 > +#define KVM_EXIT_MEMORY_FAULT 38 > > /* For KVM_EXIT_INTERNAL_ERROR */ > /* Emulate instruction failed. */ > @@ -538,6 +539,14 @@ struct kvm_run { > #define KVM_NOTIFY_CONTEXT_INVALID (1 << 0) > __u32 flags; > } notify; > + /* KVM_EXIT_MEMORY_FAULT */ > + struct { > +#define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) > + __u32 flags; > + __u32 padding; > + __u64 gpa; > + __u64 size; > + } memory; > /* Fix the size of the union. */ > char padding[256]; > };
On Tue, Nov 15, 2022 at 04:56:12PM +0000, Alex Bennée wrote: > > Chao Peng <chao.p.peng@linux.intel.com> writes: > > > This new KVM exit allows userspace to handle memory-related errors. It > > indicates an error happens in KVM at guest memory range [gpa, gpa+size). > > The flags includes additional information for userspace to handle the > > error. Currently bit 0 is defined as 'private memory' where '1' > > indicates error happens due to private memory access and '0' indicates > > error happens due to shared memory access. > > > > When private memory is enabled, this new exit will be used for KVM to > > exit to userspace for shared <-> private memory conversion in memory > > encryption usage. In such usage, typically there are two kind of memory > > conversions: > > - explicit conversion: happens when guest explicitly calls into KVM > > to map a range (as private or shared), KVM then exits to userspace > > to perform the map/unmap operations. > > - implicit conversion: happens in KVM page fault handler where KVM > > exits to userspace for an implicit conversion when the page is in a > > different state than requested (private or shared). > > > > Suggested-by: Sean Christopherson <seanjc@google.com> > > Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> > > --- > > Documentation/virt/kvm/api.rst | 23 +++++++++++++++++++++++ > > include/uapi/linux/kvm.h | 9 +++++++++ > > 2 files changed, 32 insertions(+) > > > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > > index f3fa75649a78..975688912b8c 100644 > > --- a/Documentation/virt/kvm/api.rst > > +++ b/Documentation/virt/kvm/api.rst > > @@ -6537,6 +6537,29 @@ array field represents return values. The userspace should update the return > > values of SBI call before resuming the VCPU. For more details on RISC-V SBI > > spec refer, https://github.com/riscv/riscv-sbi-doc. > > > > +:: > > + > > + /* KVM_EXIT_MEMORY_FAULT */ > > + struct { > > + #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) > > + __u32 flags; > > + __u32 padding; > > + __u64 gpa; > > + __u64 size; > > + } memory; > > + > > +If exit reason is KVM_EXIT_MEMORY_FAULT then it indicates that the VCPU has > > +encountered a memory error which is not handled by KVM kernel module and > > +userspace may choose to handle it. The 'flags' field indicates the memory > > +properties of the exit. > > + > > + - KVM_MEMORY_EXIT_FLAG_PRIVATE - indicates the memory error is caused by > > + private memory access when the bit is set. Otherwise the memory error is > > + caused by shared memory access when the bit is clear. > > What does a shared memory access failure entail? In the context of confidential computing usages, guest can issue a shared memory access while the memory is actually private from the host point of view. This exit with bit 0 cleared gives userspace a chance to convert the private memory to shared memory on host. > > If you envision any other failure modes it might be worth making it > explicit with additional flags. Sean mentioned some more usages[1][]2] other than the memory conversion for confidential usage. But I would leave those flags being added in the future after those usages being well discussed. [1] https://lkml.kernel.org/r/20200617230052.GB27751@linux.intel.com [2] https://lore.kernel.org/all/YKxJLcg%2FWomPE422@google.com > I also wonder if a bitmask makes sense if > there can only be one reason for a failure? Maybe all that is needed is > a reason enum? Tough we only have one reason right now but we still want to leave room for future extension. Enum can express a single value at once well but bitmask makes it possible to express multiple orthogonal flags. Chao > > > + > > +'gpa' and 'size' indicate the memory range the error occurs at. The userspace > > +may handle the error and return to KVM to retry the previous memory access. > > + > > :: > > > > /* KVM_EXIT_NOTIFY */ > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > > index f1ae45c10c94..fa60b032a405 100644 > > --- a/include/uapi/linux/kvm.h > > +++ b/include/uapi/linux/kvm.h > > @@ -300,6 +300,7 @@ struct kvm_xen_exit { > > #define KVM_EXIT_RISCV_SBI 35 > > #define KVM_EXIT_RISCV_CSR 36 > > #define KVM_EXIT_NOTIFY 37 > > +#define KVM_EXIT_MEMORY_FAULT 38 > > > > /* For KVM_EXIT_INTERNAL_ERROR */ > > /* Emulate instruction failed. */ > > @@ -538,6 +539,14 @@ struct kvm_run { > > #define KVM_NOTIFY_CONTEXT_INVALID (1 << 0) > > __u32 flags; > > } notify; > > + /* KVM_EXIT_MEMORY_FAULT */ > > + struct { > > +#define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) > > + __u32 flags; > > + __u32 padding; > > + __u64 gpa; > > + __u64 size; > > + } memory; > > /* Fix the size of the union. */ > > char padding[256]; > > }; > > > -- > Alex Bennée
On Tue, Oct 25, 2022, at 8:13 AM, Chao Peng wrote: > This new KVM exit allows userspace to handle memory-related errors. It > indicates an error happens in KVM at guest memory range [gpa, gpa+size). > The flags includes additional information for userspace to handle the > error. Currently bit 0 is defined as 'private memory' where '1' > indicates error happens due to private memory access and '0' indicates > error happens due to shared memory access. > > When private memory is enabled, this new exit will be used for KVM to > exit to userspace for shared <-> private memory conversion in memory > encryption usage. In such usage, typically there are two kind of memory > conversions: > - explicit conversion: happens when guest explicitly calls into KVM > to map a range (as private or shared), KVM then exits to userspace > to perform the map/unmap operations. > - implicit conversion: happens in KVM page fault handler where KVM > exits to userspace for an implicit conversion when the page is in a > different state than requested (private or shared). > > Suggested-by: Sean Christopherson <seanjc@google.com> > Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com> > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> > --- > Documentation/virt/kvm/api.rst | 23 +++++++++++++++++++++++ > include/uapi/linux/kvm.h | 9 +++++++++ > 2 files changed, 32 insertions(+) > > diff --git a/Documentation/virt/kvm/api.rst > b/Documentation/virt/kvm/api.rst > index f3fa75649a78..975688912b8c 100644 > --- a/Documentation/virt/kvm/api.rst > +++ b/Documentation/virt/kvm/api.rst > @@ -6537,6 +6537,29 @@ array field represents return values. The > userspace should update the return > values of SBI call before resuming the VCPU. For more details on > RISC-V SBI > spec refer, https://github.com/riscv/riscv-sbi-doc. > > +:: > + > + /* KVM_EXIT_MEMORY_FAULT */ > + struct { > + #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) > + __u32 flags; > + __u32 padding; > + __u64 gpa; > + __u64 size; > + } memory; > + Would it make sense to also have a field for the access type (read, write, execute, etc)? I realize that shared <-> private conversion doesn't strictly need this, but it seems like it could be useful for logging failures and also for avoiding a second immediate fault if the type gets converted but doesn't have the right protection yet. (Obviously, if this were changed, KVM would need the ability to report that it doesn't actually know the mode.) --Andy
On Wed, Nov 16, 2022, Andy Lutomirski wrote: > > > On Tue, Oct 25, 2022, at 8:13 AM, Chao Peng wrote: > > diff --git a/Documentation/virt/kvm/api.rst > > b/Documentation/virt/kvm/api.rst > > index f3fa75649a78..975688912b8c 100644 > > --- a/Documentation/virt/kvm/api.rst > > +++ b/Documentation/virt/kvm/api.rst > > @@ -6537,6 +6537,29 @@ array field represents return values. The > > userspace should update the return > > values of SBI call before resuming the VCPU. For more details on > > RISC-V SBI > > spec refer, https://github.com/riscv/riscv-sbi-doc. > > > > +:: > > + > > + /* KVM_EXIT_MEMORY_FAULT */ > > + struct { > > + #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) > > + __u32 flags; > > + __u32 padding; > > + __u64 gpa; > > + __u64 size; > > + } memory; > > + > > Would it make sense to also have a field for the access type (read, write, > execute, etc)? I realize that shared <-> private conversion doesn't strictly > need this, but it seems like it could be useful for logging failures and also > for avoiding a second immediate fault if the type gets converted but doesn't > have the right protection yet. I don't think a separate field is necessary, that info can be conveyed via flags. Though maybe we should go straight to a u64 for flags. Hmm, and maybe avoid bits 0-3 so that if/when RWX info is conveyed the flags can align with PROT_{READ,WRITE,EXEC} and the EPT flags, e.g. KVM_MEMORY_EXIT_FLAG_READ (1 << 0) KVM_MEMORY_EXIT_FLAG_WRITE (1 << 1) KVM_MEMORY_EXIT_FLAG_EXECUTE (1 << 2) > (Obviously, if this were changed, KVM would need the ability to report that > it doesn't actually know the mode.) > > --Andy
Chao Peng <chao.p.peng@linux.intel.com> writes: > On Tue, Nov 15, 2022 at 04:56:12PM +0000, Alex Bennée wrote: >> >> Chao Peng <chao.p.peng@linux.intel.com> writes: >> >> > This new KVM exit allows userspace to handle memory-related errors. It >> > indicates an error happens in KVM at guest memory range [gpa, gpa+size). >> > The flags includes additional information for userspace to handle the >> > error. Currently bit 0 is defined as 'private memory' where '1' >> > indicates error happens due to private memory access and '0' indicates >> > error happens due to shared memory access. >> > >> > When private memory is enabled, this new exit will be used for KVM to >> > exit to userspace for shared <-> private memory conversion in memory >> > encryption usage. In such usage, typically there are two kind of memory >> > conversions: >> > - explicit conversion: happens when guest explicitly calls into KVM >> > to map a range (as private or shared), KVM then exits to userspace >> > to perform the map/unmap operations. >> > - implicit conversion: happens in KVM page fault handler where KVM >> > exits to userspace for an implicit conversion when the page is in a >> > different state than requested (private or shared). >> > >> > Suggested-by: Sean Christopherson <seanjc@google.com> >> > Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com> >> > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> >> > Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> >> > --- >> > Documentation/virt/kvm/api.rst | 23 +++++++++++++++++++++++ >> > include/uapi/linux/kvm.h | 9 +++++++++ >> > 2 files changed, 32 insertions(+) >> > >> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst >> > index f3fa75649a78..975688912b8c 100644 >> > --- a/Documentation/virt/kvm/api.rst >> > +++ b/Documentation/virt/kvm/api.rst >> > @@ -6537,6 +6537,29 @@ array field represents return values. The userspace should update the return >> > values of SBI call before resuming the VCPU. For more details on RISC-V SBI >> > spec refer, https://github.com/riscv/riscv-sbi-doc. >> > >> > +:: >> > + >> > + /* KVM_EXIT_MEMORY_FAULT */ >> > + struct { >> > + #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) >> > + __u32 flags; >> > + __u32 padding; >> > + __u64 gpa; >> > + __u64 size; >> > + } memory; >> > + >> > +If exit reason is KVM_EXIT_MEMORY_FAULT then it indicates that the VCPU has >> > +encountered a memory error which is not handled by KVM kernel module and >> > +userspace may choose to handle it. The 'flags' field indicates the memory >> > +properties of the exit. >> > + >> > + - KVM_MEMORY_EXIT_FLAG_PRIVATE - indicates the memory error is caused by >> > + private memory access when the bit is set. Otherwise the memory error is >> > + caused by shared memory access when the bit is clear. >> >> What does a shared memory access failure entail? > > In the context of confidential computing usages, guest can issue a > shared memory access while the memory is actually private from the host > point of view. This exit with bit 0 cleared gives userspace a chance to > convert the private memory to shared memory on host. I think this should be explicit rather than implied by the absence of another flag. Sean suggested you might want flags for RWX failures so maybe something like: KVM_MEMORY_EXIT_SHARED_FLAG_READ (1 << 0) KVM_MEMORY_EXIT_SHARED_FLAG_WRITE (1 << 1) KVM_MEMORY_EXIT_SHARED_FLAG_EXECUTE (1 << 2) KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 3) which would allow you to signal the various failure modes of the shared region, or that you had accessed private memory. > >> >> If you envision any other failure modes it might be worth making it >> explicit with additional flags. > > Sean mentioned some more usages[1][]2] other than the memory conversion > for confidential usage. But I would leave those flags being added in the > future after those usages being well discussed. > > [1] https://lkml.kernel.org/r/20200617230052.GB27751@linux.intel.com > [2] https://lore.kernel.org/all/YKxJLcg%2FWomPE422@google.com > >> I also wonder if a bitmask makes sense if >> there can only be one reason for a failure? Maybe all that is needed is >> a reason enum? > > Tough we only have one reason right now but we still want to leave room > for future extension. Enum can express a single value at once well but > bitmask makes it possible to express multiple orthogonal flags. I agree if multiple orthogonal failures can occur at once a bitmask is the right choice. > > Chao >> >> > + >> > +'gpa' and 'size' indicate the memory range the error occurs at. The userspace >> > +may handle the error and return to KVM to retry the previous memory access. >> > + >> > :: >> > >> > /* KVM_EXIT_NOTIFY */ >> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h >> > index f1ae45c10c94..fa60b032a405 100644 >> > --- a/include/uapi/linux/kvm.h >> > +++ b/include/uapi/linux/kvm.h >> > @@ -300,6 +300,7 @@ struct kvm_xen_exit { >> > #define KVM_EXIT_RISCV_SBI 35 >> > #define KVM_EXIT_RISCV_CSR 36 >> > #define KVM_EXIT_NOTIFY 37 >> > +#define KVM_EXIT_MEMORY_FAULT 38 >> > >> > /* For KVM_EXIT_INTERNAL_ERROR */ >> > /* Emulate instruction failed. */ >> > @@ -538,6 +539,14 @@ struct kvm_run { >> > #define KVM_NOTIFY_CONTEXT_INVALID (1 << 0) >> > __u32 flags; >> > } notify; >> > + /* KVM_EXIT_MEMORY_FAULT */ >> > + struct { >> > +#define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) >> > + __u32 flags; >> > + __u32 padding; >> > + __u64 gpa; >> > + __u64 size; >> > + } memory; >> > /* Fix the size of the union. */ >> > char padding[256]; >> > }; >> >> >> -- >> Alex Bennée
On Wed, Nov 16, 2022 at 06:48:43PM +0000, Sean Christopherson wrote: > On Wed, Nov 16, 2022, Andy Lutomirski wrote: > > > > > > On Tue, Oct 25, 2022, at 8:13 AM, Chao Peng wrote: > > > diff --git a/Documentation/virt/kvm/api.rst > > > b/Documentation/virt/kvm/api.rst > > > index f3fa75649a78..975688912b8c 100644 > > > --- a/Documentation/virt/kvm/api.rst > > > +++ b/Documentation/virt/kvm/api.rst > > > @@ -6537,6 +6537,29 @@ array field represents return values. The > > > userspace should update the return > > > values of SBI call before resuming the VCPU. For more details on > > > RISC-V SBI > > > spec refer, https://github.com/riscv/riscv-sbi-doc. > > > > > > +:: > > > + > > > + /* KVM_EXIT_MEMORY_FAULT */ > > > + struct { > > > + #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) > > > + __u32 flags; > > > + __u32 padding; > > > + __u64 gpa; > > > + __u64 size; > > > + } memory; > > > + > > > > Would it make sense to also have a field for the access type (read, write, > > execute, etc)? I realize that shared <-> private conversion doesn't strictly > > need this, but it seems like it could be useful for logging failures and also > > for avoiding a second immediate fault if the type gets converted but doesn't > > have the right protection yet. > > I don't think a separate field is necessary, that info can be conveyed via flags. > Though maybe we should go straight to a u64 for flags. Yeah, I can do that. > Hmm, and maybe avoid bits > 0-3 so that if/when RWX info is conveyed the flags can align with > PROT_{READ,WRITE,EXEC} and the EPT flags, e.g. You mean avoiding bits 0-2, right, bit3 is not so special and can be used for KVM_MEMORY_EXIT_FLAG_PRIVATE. Chao > > KVM_MEMORY_EXIT_FLAG_READ (1 << 0) > KVM_MEMORY_EXIT_FLAG_WRITE (1 << 1) > KVM_MEMORY_EXIT_FLAG_EXECUTE (1 << 2) > > > (Obviously, if this were changed, KVM would need the ability to report that > > it doesn't actually know the mode.) > > > > --Andy
On Wed, Nov 16, 2022 at 07:03:49PM +0000, Alex Bennée wrote: > > Chao Peng <chao.p.peng@linux.intel.com> writes: > > > On Tue, Nov 15, 2022 at 04:56:12PM +0000, Alex Bennée wrote: > >> > >> Chao Peng <chao.p.peng@linux.intel.com> writes: > >> > >> > This new KVM exit allows userspace to handle memory-related errors. It > >> > indicates an error happens in KVM at guest memory range [gpa, gpa+size). > >> > The flags includes additional information for userspace to handle the > >> > error. Currently bit 0 is defined as 'private memory' where '1' > >> > indicates error happens due to private memory access and '0' indicates > >> > error happens due to shared memory access. > >> > > >> > When private memory is enabled, this new exit will be used for KVM to > >> > exit to userspace for shared <-> private memory conversion in memory > >> > encryption usage. In such usage, typically there are two kind of memory > >> > conversions: > >> > - explicit conversion: happens when guest explicitly calls into KVM > >> > to map a range (as private or shared), KVM then exits to userspace > >> > to perform the map/unmap operations. > >> > - implicit conversion: happens in KVM page fault handler where KVM > >> > exits to userspace for an implicit conversion when the page is in a > >> > different state than requested (private or shared). > >> > > >> > Suggested-by: Sean Christopherson <seanjc@google.com> > >> > Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com> > >> > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > >> > Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> > >> > --- > >> > Documentation/virt/kvm/api.rst | 23 +++++++++++++++++++++++ > >> > include/uapi/linux/kvm.h | 9 +++++++++ > >> > 2 files changed, 32 insertions(+) > >> > > >> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > >> > index f3fa75649a78..975688912b8c 100644 > >> > --- a/Documentation/virt/kvm/api.rst > >> > +++ b/Documentation/virt/kvm/api.rst > >> > @@ -6537,6 +6537,29 @@ array field represents return values. The userspace should update the return > >> > values of SBI call before resuming the VCPU. For more details on RISC-V SBI > >> > spec refer, https://github.com/riscv/riscv-sbi-doc. > >> > > >> > +:: > >> > + > >> > + /* KVM_EXIT_MEMORY_FAULT */ > >> > + struct { > >> > + #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) > >> > + __u32 flags; > >> > + __u32 padding; > >> > + __u64 gpa; > >> > + __u64 size; > >> > + } memory; > >> > + > >> > +If exit reason is KVM_EXIT_MEMORY_FAULT then it indicates that the VCPU has > >> > +encountered a memory error which is not handled by KVM kernel module and > >> > +userspace may choose to handle it. The 'flags' field indicates the memory > >> > +properties of the exit. > >> > + > >> > + - KVM_MEMORY_EXIT_FLAG_PRIVATE - indicates the memory error is caused by > >> > + private memory access when the bit is set. Otherwise the memory error is > >> > + caused by shared memory access when the bit is clear. > >> > >> What does a shared memory access failure entail? > > > > In the context of confidential computing usages, guest can issue a > > shared memory access while the memory is actually private from the host > > point of view. This exit with bit 0 cleared gives userspace a chance to > > convert the private memory to shared memory on host. > > I think this should be explicit rather than implied by the absence of > another flag. Sean suggested you might want flags for RWX failures so > maybe something like: > > KVM_MEMORY_EXIT_SHARED_FLAG_READ (1 << 0) > KVM_MEMORY_EXIT_SHARED_FLAG_WRITE (1 << 1) > KVM_MEMORY_EXIT_SHARED_FLAG_EXECUTE (1 << 2) > KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 3) Yes, but I would not add 'SHARED' to RWX, they are not share memory specific, private memory can also set them once introduced. Thanks, Chao > > which would allow you to signal the various failure modes of the shared > region, or that you had accessed private memory. > > > > >> > >> If you envision any other failure modes it might be worth making it > >> explicit with additional flags. > > > > Sean mentioned some more usages[1][]2] other than the memory conversion > > for confidential usage. But I would leave those flags being added in the > > future after those usages being well discussed. > > > > [1] https://lkml.kernel.org/r/20200617230052.GB27751@linux.intel.com > > [2] https://lore.kernel.org/all/YKxJLcg%2FWomPE422@google.com > > > >> I also wonder if a bitmask makes sense if > >> there can only be one reason for a failure? Maybe all that is needed is > >> a reason enum? > > > > Tough we only have one reason right now but we still want to leave room > > for future extension. Enum can express a single value at once well but > > bitmask makes it possible to express multiple orthogonal flags. > > I agree if multiple orthogonal failures can occur at once a bitmask is > the right choice. > > > > > Chao > >> > >> > + > >> > +'gpa' and 'size' indicate the memory range the error occurs at. The userspace > >> > +may handle the error and return to KVM to retry the previous memory access. > >> > + > >> > :: > >> > > >> > /* KVM_EXIT_NOTIFY */ > >> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > >> > index f1ae45c10c94..fa60b032a405 100644 > >> > --- a/include/uapi/linux/kvm.h > >> > +++ b/include/uapi/linux/kvm.h > >> > @@ -300,6 +300,7 @@ struct kvm_xen_exit { > >> > #define KVM_EXIT_RISCV_SBI 35 > >> > #define KVM_EXIT_RISCV_CSR 36 > >> > #define KVM_EXIT_NOTIFY 37 > >> > +#define KVM_EXIT_MEMORY_FAULT 38 > >> > > >> > /* For KVM_EXIT_INTERNAL_ERROR */ > >> > /* Emulate instruction failed. */ > >> > @@ -538,6 +539,14 @@ struct kvm_run { > >> > #define KVM_NOTIFY_CONTEXT_INVALID (1 << 0) > >> > __u32 flags; > >> > } notify; > >> > + /* KVM_EXIT_MEMORY_FAULT */ > >> > + struct { > >> > +#define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) > >> > + __u32 flags; > >> > + __u32 padding; > >> > + __u64 gpa; > >> > + __u64 size; > >> > + } memory; > >> > /* Fix the size of the union. */ > >> > char padding[256]; > >> > }; > >> > >> > >> -- > >> Alex Bennée > > > -- > Alex Bennée
Chao Peng <chao.p.peng@linux.intel.com> writes: > On Wed, Nov 16, 2022 at 07:03:49PM +0000, Alex Bennée wrote: >> >> Chao Peng <chao.p.peng@linux.intel.com> writes: >> >> > On Tue, Nov 15, 2022 at 04:56:12PM +0000, Alex Bennée wrote: >> >> >> >> Chao Peng <chao.p.peng@linux.intel.com> writes: >> >> >> >> > This new KVM exit allows userspace to handle memory-related errors. It >> >> > indicates an error happens in KVM at guest memory range [gpa, gpa+size). >> >> > The flags includes additional information for userspace to handle the >> >> > error. Currently bit 0 is defined as 'private memory' where '1' >> >> > indicates error happens due to private memory access and '0' indicates >> >> > error happens due to shared memory access. >> >> > >> >> > When private memory is enabled, this new exit will be used for KVM to >> >> > exit to userspace for shared <-> private memory conversion in memory >> >> > encryption usage. In such usage, typically there are two kind of memory >> >> > conversions: >> >> > - explicit conversion: happens when guest explicitly calls into KVM >> >> > to map a range (as private or shared), KVM then exits to userspace >> >> > to perform the map/unmap operations. >> >> > - implicit conversion: happens in KVM page fault handler where KVM >> >> > exits to userspace for an implicit conversion when the page is in a >> >> > different state than requested (private or shared). >> >> > >> >> > Suggested-by: Sean Christopherson <seanjc@google.com> >> >> > Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com> >> >> > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> >> >> > Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> >> >> > --- >> >> > Documentation/virt/kvm/api.rst | 23 +++++++++++++++++++++++ >> >> > include/uapi/linux/kvm.h | 9 +++++++++ >> >> > 2 files changed, 32 insertions(+) >> >> > >> >> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst >> >> > index f3fa75649a78..975688912b8c 100644 >> >> > --- a/Documentation/virt/kvm/api.rst >> >> > +++ b/Documentation/virt/kvm/api.rst >> >> > @@ -6537,6 +6537,29 @@ array field represents return values. The userspace should update the return >> >> > values of SBI call before resuming the VCPU. For more details on RISC-V SBI >> >> > spec refer, https://github.com/riscv/riscv-sbi-doc. >> >> > >> >> > +:: >> >> > + >> >> > + /* KVM_EXIT_MEMORY_FAULT */ >> >> > + struct { >> >> > + #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) >> >> > + __u32 flags; >> >> > + __u32 padding; >> >> > + __u64 gpa; >> >> > + __u64 size; >> >> > + } memory; >> >> > + >> >> > +If exit reason is KVM_EXIT_MEMORY_FAULT then it indicates that the VCPU has >> >> > +encountered a memory error which is not handled by KVM kernel module and >> >> > +userspace may choose to handle it. The 'flags' field indicates the memory >> >> > +properties of the exit. >> >> > + >> >> > + - KVM_MEMORY_EXIT_FLAG_PRIVATE - indicates the memory error is caused by >> >> > + private memory access when the bit is set. Otherwise the memory error is >> >> > + caused by shared memory access when the bit is clear. >> >> >> >> What does a shared memory access failure entail? >> > >> > In the context of confidential computing usages, guest can issue a >> > shared memory access while the memory is actually private from the host >> > point of view. This exit with bit 0 cleared gives userspace a chance to >> > convert the private memory to shared memory on host. >> >> I think this should be explicit rather than implied by the absence of >> another flag. Sean suggested you might want flags for RWX failures so >> maybe something like: >> >> KVM_MEMORY_EXIT_SHARED_FLAG_READ (1 << 0) >> KVM_MEMORY_EXIT_SHARED_FLAG_WRITE (1 << 1) >> KVM_MEMORY_EXIT_SHARED_FLAG_EXECUTE (1 << 2) >> KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 3) > > Yes, but I would not add 'SHARED' to RWX, they are not share memory > specific, private memory can also set them once introduced. OK so how about: KVM_MEMORY_EXIT_FLAG_READ (1 << 0) KVM_MEMORY_EXIT_FLAG_WRITE (1 << 1) KVM_MEMORY_EXIT_FLAG_EXECUTE (1 << 2) KVM_MEMORY_EXIT_FLAG_SHARED (1 << 3) KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 4) > > Thanks, > Chao >> >> which would allow you to signal the various failure modes of the shared >> region, or that you had accessed private memory. >> >> > >> >> >> >> If you envision any other failure modes it might be worth making it >> >> explicit with additional flags. >> > >> > Sean mentioned some more usages[1][]2] other than the memory conversion >> > for confidential usage. But I would leave those flags being added in the >> > future after those usages being well discussed. >> > >> > [1] https://lkml.kernel.org/r/20200617230052.GB27751@linux.intel.com >> > [2] https://lore.kernel.org/all/YKxJLcg%2FWomPE422@google.com >> > >> >> I also wonder if a bitmask makes sense if >> >> there can only be one reason for a failure? Maybe all that is needed is >> >> a reason enum? >> > >> > Tough we only have one reason right now but we still want to leave room >> > for future extension. Enum can express a single value at once well but >> > bitmask makes it possible to express multiple orthogonal flags. >> >> I agree if multiple orthogonal failures can occur at once a bitmask is >> the right choice. >> >> > >> > Chao >> >> >> >> > + >> >> > +'gpa' and 'size' indicate the memory range the error occurs at. The userspace >> >> > +may handle the error and return to KVM to retry the previous memory access. >> >> > + >> >> > :: >> >> > >> >> > /* KVM_EXIT_NOTIFY */ >> >> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h >> >> > index f1ae45c10c94..fa60b032a405 100644 >> >> > --- a/include/uapi/linux/kvm.h >> >> > +++ b/include/uapi/linux/kvm.h >> >> > @@ -300,6 +300,7 @@ struct kvm_xen_exit { >> >> > #define KVM_EXIT_RISCV_SBI 35 >> >> > #define KVM_EXIT_RISCV_CSR 36 >> >> > #define KVM_EXIT_NOTIFY 37 >> >> > +#define KVM_EXIT_MEMORY_FAULT 38 >> >> > >> >> > /* For KVM_EXIT_INTERNAL_ERROR */ >> >> > /* Emulate instruction failed. */ >> >> > @@ -538,6 +539,14 @@ struct kvm_run { >> >> > #define KVM_NOTIFY_CONTEXT_INVALID (1 << 0) >> >> > __u32 flags; >> >> > } notify; >> >> > + /* KVM_EXIT_MEMORY_FAULT */ >> >> > + struct { >> >> > +#define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) >> >> > + __u32 flags; >> >> > + __u32 padding; >> >> > + __u64 gpa; >> >> > + __u64 size; >> >> > + } memory; >> >> > /* Fix the size of the union. */ >> >> > char padding[256]; >> >> > }; >> >> >> >> >> >> -- >> >> Alex Bennée >> >> >> -- >> Alex Bennée
On Thu, Nov 17, 2022 at 03:08:17PM +0000, Alex Bennée wrote: > > Chao Peng <chao.p.peng@linux.intel.com> writes: > > > On Wed, Nov 16, 2022 at 07:03:49PM +0000, Alex Bennée wrote: > >> > >> Chao Peng <chao.p.peng@linux.intel.com> writes: > >> > >> > On Tue, Nov 15, 2022 at 04:56:12PM +0000, Alex Bennée wrote: > >> >> > >> >> Chao Peng <chao.p.peng@linux.intel.com> writes: > >> >> > >> >> > This new KVM exit allows userspace to handle memory-related errors. It > >> >> > indicates an error happens in KVM at guest memory range [gpa, gpa+size). > >> >> > The flags includes additional information for userspace to handle the > >> >> > error. Currently bit 0 is defined as 'private memory' where '1' > >> >> > indicates error happens due to private memory access and '0' indicates > >> >> > error happens due to shared memory access. > >> >> > > >> >> > When private memory is enabled, this new exit will be used for KVM to > >> >> > exit to userspace for shared <-> private memory conversion in memory > >> >> > encryption usage. In such usage, typically there are two kind of memory > >> >> > conversions: > >> >> > - explicit conversion: happens when guest explicitly calls into KVM > >> >> > to map a range (as private or shared), KVM then exits to userspace > >> >> > to perform the map/unmap operations. > >> >> > - implicit conversion: happens in KVM page fault handler where KVM > >> >> > exits to userspace for an implicit conversion when the page is in a > >> >> > different state than requested (private or shared). > >> >> > > >> >> > Suggested-by: Sean Christopherson <seanjc@google.com> > >> >> > Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com> > >> >> > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > >> >> > Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> > >> >> > --- > >> >> > Documentation/virt/kvm/api.rst | 23 +++++++++++++++++++++++ > >> >> > include/uapi/linux/kvm.h | 9 +++++++++ > >> >> > 2 files changed, 32 insertions(+) > >> >> > > >> >> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > >> >> > index f3fa75649a78..975688912b8c 100644 > >> >> > --- a/Documentation/virt/kvm/api.rst > >> >> > +++ b/Documentation/virt/kvm/api.rst > >> >> > @@ -6537,6 +6537,29 @@ array field represents return values. The userspace should update the return > >> >> > values of SBI call before resuming the VCPU. For more details on RISC-V SBI > >> >> > spec refer, https://github.com/riscv/riscv-sbi-doc. > >> >> > > >> >> > +:: > >> >> > + > >> >> > + /* KVM_EXIT_MEMORY_FAULT */ > >> >> > + struct { > >> >> > + #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) > >> >> > + __u32 flags; > >> >> > + __u32 padding; > >> >> > + __u64 gpa; > >> >> > + __u64 size; > >> >> > + } memory; > >> >> > + > >> >> > +If exit reason is KVM_EXIT_MEMORY_FAULT then it indicates that the VCPU has > >> >> > +encountered a memory error which is not handled by KVM kernel module and > >> >> > +userspace may choose to handle it. The 'flags' field indicates the memory > >> >> > +properties of the exit. > >> >> > + > >> >> > + - KVM_MEMORY_EXIT_FLAG_PRIVATE - indicates the memory error is caused by > >> >> > + private memory access when the bit is set. Otherwise the memory error is > >> >> > + caused by shared memory access when the bit is clear. > >> >> > >> >> What does a shared memory access failure entail? > >> > > >> > In the context of confidential computing usages, guest can issue a > >> > shared memory access while the memory is actually private from the host > >> > point of view. This exit with bit 0 cleared gives userspace a chance to > >> > convert the private memory to shared memory on host. > >> > >> I think this should be explicit rather than implied by the absence of > >> another flag. Sean suggested you might want flags for RWX failures so > >> maybe something like: > >> > >> KVM_MEMORY_EXIT_SHARED_FLAG_READ (1 << 0) > >> KVM_MEMORY_EXIT_SHARED_FLAG_WRITE (1 << 1) > >> KVM_MEMORY_EXIT_SHARED_FLAG_EXECUTE (1 << 2) > >> KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 3) > > > > Yes, but I would not add 'SHARED' to RWX, they are not share memory > > specific, private memory can also set them once introduced. > > OK so how about: > > KVM_MEMORY_EXIT_FLAG_READ (1 << 0) > KVM_MEMORY_EXIT_FLAG_WRITE (1 << 1) > KVM_MEMORY_EXIT_FLAG_EXECUTE (1 << 2) > KVM_MEMORY_EXIT_FLAG_SHARED (1 << 3) > KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 4) We don't actually need a new bit, the opposite side of private is shared, i.e. flags with KVM_MEMORY_EXIT_FLAG_PRIVATE cleared expresses 'shared'. Chao > > > > > Thanks, > > Chao > >> > >> which would allow you to signal the various failure modes of the shared > >> region, or that you had accessed private memory. > >> > >> > > >> >> > >> >> If you envision any other failure modes it might be worth making it > >> >> explicit with additional flags. > >> > > >> > Sean mentioned some more usages[1][]2] other than the memory conversion > >> > for confidential usage. But I would leave those flags being added in the > >> > future after those usages being well discussed. > >> > > >> > [1] https://lkml.kernel.org/r/20200617230052.GB27751@linux.intel.com > >> > [2] https://lore.kernel.org/all/YKxJLcg%2FWomPE422@google.com > >> > > >> >> I also wonder if a bitmask makes sense if > >> >> there can only be one reason for a failure? Maybe all that is needed is > >> >> a reason enum? > >> > > >> > Tough we only have one reason right now but we still want to leave room > >> > for future extension. Enum can express a single value at once well but > >> > bitmask makes it possible to express multiple orthogonal flags. > >> > >> I agree if multiple orthogonal failures can occur at once a bitmask is > >> the right choice. > >> > >> > > >> > Chao > >> >> > >> >> > + > >> >> > +'gpa' and 'size' indicate the memory range the error occurs at. The userspace > >> >> > +may handle the error and return to KVM to retry the previous memory access. > >> >> > + > >> >> > :: > >> >> > > >> >> > /* KVM_EXIT_NOTIFY */ > >> >> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > >> >> > index f1ae45c10c94..fa60b032a405 100644 > >> >> > --- a/include/uapi/linux/kvm.h > >> >> > +++ b/include/uapi/linux/kvm.h > >> >> > @@ -300,6 +300,7 @@ struct kvm_xen_exit { > >> >> > #define KVM_EXIT_RISCV_SBI 35 > >> >> > #define KVM_EXIT_RISCV_CSR 36 > >> >> > #define KVM_EXIT_NOTIFY 37 > >> >> > +#define KVM_EXIT_MEMORY_FAULT 38 > >> >> > > >> >> > /* For KVM_EXIT_INTERNAL_ERROR */ > >> >> > /* Emulate instruction failed. */ > >> >> > @@ -538,6 +539,14 @@ struct kvm_run { > >> >> > #define KVM_NOTIFY_CONTEXT_INVALID (1 << 0) > >> >> > __u32 flags; > >> >> > } notify; > >> >> > + /* KVM_EXIT_MEMORY_FAULT */ > >> >> > + struct { > >> >> > +#define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) > >> >> > + __u32 flags; > >> >> > + __u32 padding; > >> >> > + __u64 gpa; > >> >> > + __u64 size; > >> >> > + } memory; > >> >> > /* Fix the size of the union. */ > >> >> > char padding[256]; > >> >> > }; > >> >> > >> >> > >> >> -- > >> >> Alex Bennée > >> > >> > >> -- > >> Alex Bennée > > > -- > Alex Bennée
Chao Peng <chao.p.peng@linux.intel.com> writes: > On Thu, Nov 17, 2022 at 03:08:17PM +0000, Alex Bennée wrote: >> <snip> >> >> >> > + >> >> >> > + /* KVM_EXIT_MEMORY_FAULT */ >> >> >> > + struct { >> >> >> > + #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) >> >> >> > + __u32 flags; >> >> >> > + __u32 padding; >> >> >> > + __u64 gpa; >> >> >> > + __u64 size; >> >> >> > + } memory; >> >> >> > + >> >> >> > +If exit reason is KVM_EXIT_MEMORY_FAULT then it indicates that the VCPU has >> >> >> > +encountered a memory error which is not handled by KVM kernel module and >> >> >> > +userspace may choose to handle it. The 'flags' field indicates the memory >> >> >> > +properties of the exit. >> >> >> > + >> >> >> > + - KVM_MEMORY_EXIT_FLAG_PRIVATE - indicates the memory error is caused by >> >> >> > + private memory access when the bit is set. Otherwise the memory error is >> >> >> > + caused by shared memory access when the bit is clear. >> >> >> >> >> >> What does a shared memory access failure entail? >> >> > >> >> > In the context of confidential computing usages, guest can issue a >> >> > shared memory access while the memory is actually private from the host >> >> > point of view. This exit with bit 0 cleared gives userspace a chance to >> >> > convert the private memory to shared memory on host. >> >> >> >> I think this should be explicit rather than implied by the absence of >> >> another flag. Sean suggested you might want flags for RWX failures so >> >> maybe something like: >> >> >> >> KVM_MEMORY_EXIT_SHARED_FLAG_READ (1 << 0) >> >> KVM_MEMORY_EXIT_SHARED_FLAG_WRITE (1 << 1) >> >> KVM_MEMORY_EXIT_SHARED_FLAG_EXECUTE (1 << 2) >> >> KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 3) >> > >> > Yes, but I would not add 'SHARED' to RWX, they are not share memory >> > specific, private memory can also set them once introduced. >> >> OK so how about: >> >> KVM_MEMORY_EXIT_FLAG_READ (1 << 0) >> KVM_MEMORY_EXIT_FLAG_WRITE (1 << 1) >> KVM_MEMORY_EXIT_FLAG_EXECUTE (1 << 2) >> KVM_MEMORY_EXIT_FLAG_SHARED (1 << 3) >> KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 4) > > We don't actually need a new bit, the opposite side of private is > shared, i.e. flags with KVM_MEMORY_EXIT_FLAG_PRIVATE cleared expresses > 'shared'. If that is always true and we never expect a 3rd type of memory that is fine. But given we are leaving room for expansion having an explicit bit allows for that as well as making cases of forgetting to set the flags more obvious.
On Fri, Nov 18, 2022, Alex Bennée wrote: > > Chao Peng <chao.p.peng@linux.intel.com> writes: > > > On Thu, Nov 17, 2022 at 03:08:17PM +0000, Alex Bennée wrote: > >> >> I think this should be explicit rather than implied by the absence of > >> >> another flag. Sean suggested you might want flags for RWX failures so > >> >> maybe something like: > >> >> > >> >> KVM_MEMORY_EXIT_SHARED_FLAG_READ (1 << 0) > >> >> KVM_MEMORY_EXIT_SHARED_FLAG_WRITE (1 << 1) > >> >> KVM_MEMORY_EXIT_SHARED_FLAG_EXECUTE (1 << 2) > >> >> KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 3) > >> > > >> > Yes, but I would not add 'SHARED' to RWX, they are not share memory > >> > specific, private memory can also set them once introduced. > >> > >> OK so how about: > >> > >> KVM_MEMORY_EXIT_FLAG_READ (1 << 0) > >> KVM_MEMORY_EXIT_FLAG_WRITE (1 << 1) > >> KVM_MEMORY_EXIT_FLAG_EXECUTE (1 << 2) > >> KVM_MEMORY_EXIT_FLAG_SHARED (1 << 3) > >> KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 4) > > > > We don't actually need a new bit, the opposite side of private is > > shared, i.e. flags with KVM_MEMORY_EXIT_FLAG_PRIVATE cleared expresses > > 'shared'. > > If that is always true and we never expect a 3rd type of memory that is > fine. But given we are leaving room for expansion having an explicit bit > allows for that as well as making cases of forgetting to set the flags > more obvious. Hrm, I'm on the fence. A dedicated flag isn't strictly needed, e.g. even if we end up with 3+ types in this category, the baseline could always be "private". I do like being explicit, and adding a PRIVATE flag costs KVM practically nothing to implement and maintain, but evetually we'll up with flags that are paired with an implicit state, e.g. see the many #PF error codes in x86. In other words, inevitably KVM will need to define the default/base state of the access, at which point the base state for SHARED vs. PRIVATE is "undefined". The RWX bits are in the same boat, e.g. the READ flag isn't strictly necessary. I was thinking more of the KVM_SET_MEMORY_ATTRIBUTES ioctl(), which does need the full RWX gamut, when I typed out that response. So I would say if we add an explicit READ flag, then we might as well add an explicit PRIVATE flag too. But if we omit PRIVATE, then we should omit READ too.
On Fri, Nov 18, 2022 at 03:59:12PM +0000, Sean Christopherson wrote: > On Fri, Nov 18, 2022, Alex Benn?e wrote: > > > > Chao Peng <chao.p.peng@linux.intel.com> writes: > > > > > On Thu, Nov 17, 2022 at 03:08:17PM +0000, Alex Benn?e wrote: > > >> >> I think this should be explicit rather than implied by the absence of > > >> >> another flag. Sean suggested you might want flags for RWX failures so > > >> >> maybe something like: > > >> >> > > >> >> KVM_MEMORY_EXIT_SHARED_FLAG_READ (1 << 0) > > >> >> KVM_MEMORY_EXIT_SHARED_FLAG_WRITE (1 << 1) > > >> >> KVM_MEMORY_EXIT_SHARED_FLAG_EXECUTE (1 << 2) > > >> >> KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 3) > > >> > > > >> > Yes, but I would not add 'SHARED' to RWX, they are not share memory > > >> > specific, private memory can also set them once introduced. > > >> > > >> OK so how about: > > >> > > >> KVM_MEMORY_EXIT_FLAG_READ (1 << 0) > > >> KVM_MEMORY_EXIT_FLAG_WRITE (1 << 1) > > >> KVM_MEMORY_EXIT_FLAG_EXECUTE (1 << 2) > > >> KVM_MEMORY_EXIT_FLAG_SHARED (1 << 3) > > >> KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 4) > > > > > > We don't actually need a new bit, the opposite side of private is > > > shared, i.e. flags with KVM_MEMORY_EXIT_FLAG_PRIVATE cleared expresses > > > 'shared'. > > > > If that is always true and we never expect a 3rd type of memory that is > > fine. But given we are leaving room for expansion having an explicit bit > > allows for that as well as making cases of forgetting to set the flags > > more obvious. > > Hrm, I'm on the fence. > > A dedicated flag isn't strictly needed, e.g. even if we end up with 3+ types in > this category, the baseline could always be "private". The baseline for the current code is actually "shared". > > I do like being explicit, and adding a PRIVATE flag costs KVM practically nothing > to implement and maintain, but evetually we'll up with flags that are paired with > an implicit state, e.g. see the many #PF error codes in x86. In other words, > inevitably KVM will need to define the default/base state of the access, at which > point the base state for SHARED vs. PRIVATE is "undefined". Current memory conversion for confidential usage is bi-directional so we already need both private and shared states and if we use one bit for both "shared" and "private" then we will have to define the default state, e.g, currently the default state is "shared" when we define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) > > The RWX bits are in the same boat, e.g. the READ flag isn't strictly necessary. > I was thinking more of the KVM_SET_MEMORY_ATTRIBUTES ioctl(), which does need > the full RWX gamut, when I typed out that response. For KVM_SET_MEMORY_ATTRIBUTES it's reasonable to add RWX bits and match that to the permission bits definition in EPT entry. > > So I would say if we add an explicit READ flag, then we might as well add an explicit > PRIVATE flag too. But if we omit PRIVATE, then we should omit READ too. Since we assume the default state is shared, so we actually only need a PRIVATE flag, e.g. there is no SHARED flag and will ignore the RWX for now. Chao
On Tue, Nov 22, 2022, Chao Peng wrote: > On Fri, Nov 18, 2022 at 03:59:12PM +0000, Sean Christopherson wrote: > > On Fri, Nov 18, 2022, Alex Benn?e wrote: > > > > We don't actually need a new bit, the opposite side of private is > > > > shared, i.e. flags with KVM_MEMORY_EXIT_FLAG_PRIVATE cleared expresses > > > > 'shared'. > > > > > > If that is always true and we never expect a 3rd type of memory that is > > > fine. But given we are leaving room for expansion having an explicit bit > > > allows for that as well as making cases of forgetting to set the flags > > > more obvious. > > > > Hrm, I'm on the fence. > > > > A dedicated flag isn't strictly needed, e.g. even if we end up with 3+ types in > > this category, the baseline could always be "private". > > The baseline for the current code is actually "shared". Ah, right, the baseline needs to be "shared" so that legacy code doesn't end up with impossible states. > > I do like being explicit, and adding a PRIVATE flag costs KVM practically nothing > > to implement and maintain, but evetually we'll up with flags that are paired with > > an implicit state, e.g. see the many #PF error codes in x86. In other words, > > inevitably KVM will need to define the default/base state of the access, at which > > point the base state for SHARED vs. PRIVATE is "undefined". > > Current memory conversion for confidential usage is bi-directional so we > already need both private and shared states and if we use one bit for > both "shared" and "private" then we will have to define the default > state, e.g, currently the default state is "shared" when we define > > KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) ... > > So I would say if we add an explicit READ flag, then we might as well add an explicit > > PRIVATE flag too. But if we omit PRIVATE, then we should omit READ too. > > Since we assume the default state is shared, so we actually only need a > PRIVATE flag, e.g. there is no SHARED flag and will ignore the RWX for now. Yeah, I'm leading towards "shared" being the implied default state. Ditto for "read" if/when we need to communicate write/execute information E.g. for VMs that don't support guest private memory, the "shared" flag is in some ways nonsensical. Worst case scenario, e.g. if we end up with variations of "shared", we'll need something like KVM_MEMORY_EXIT_FLAG_SHARED_RESTRICTIVE or whatever, but the basic "shared" default will still work.
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index f3fa75649a78..975688912b8c 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6537,6 +6537,29 @@ array field represents return values. The userspace should update the return values of SBI call before resuming the VCPU. For more details on RISC-V SBI spec refer, https://github.com/riscv/riscv-sbi-doc. +:: + + /* KVM_EXIT_MEMORY_FAULT */ + struct { + #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) + __u32 flags; + __u32 padding; + __u64 gpa; + __u64 size; + } memory; + +If exit reason is KVM_EXIT_MEMORY_FAULT then it indicates that the VCPU has +encountered a memory error which is not handled by KVM kernel module and +userspace may choose to handle it. The 'flags' field indicates the memory +properties of the exit. + + - KVM_MEMORY_EXIT_FLAG_PRIVATE - indicates the memory error is caused by + private memory access when the bit is set. Otherwise the memory error is + caused by shared memory access when the bit is clear. + +'gpa' and 'size' indicate the memory range the error occurs at. The userspace +may handle the error and return to KVM to retry the previous memory access. + :: /* KVM_EXIT_NOTIFY */ diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index f1ae45c10c94..fa60b032a405 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -300,6 +300,7 @@ struct kvm_xen_exit { #define KVM_EXIT_RISCV_SBI 35 #define KVM_EXIT_RISCV_CSR 36 #define KVM_EXIT_NOTIFY 37 +#define KVM_EXIT_MEMORY_FAULT 38 /* For KVM_EXIT_INTERNAL_ERROR */ /* Emulate instruction failed. */ @@ -538,6 +539,14 @@ struct kvm_run { #define KVM_NOTIFY_CONTEXT_INVALID (1 << 0) __u32 flags; } notify; + /* KVM_EXIT_MEMORY_FAULT */ + struct { +#define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) + __u32 flags; + __u32 padding; + __u64 gpa; + __u64 size; + } memory; /* Fix the size of the union. */ char padding[256]; };