Message ID | 20231027182217.3615211-9-seanjc@google.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:d641:0:b0:403:3b70:6f57 with SMTP id cy1csp801231vqb; Fri, 27 Oct 2023 11:24:07 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHZ6TvzBebLrEj/wTx/2Dd3rYthh5JF1K5JZWuuMljeGtcnwn0mYIotScoHb2guIjrtqr+c X-Received: by 2002:a9d:7cc8:0:b0:6b9:1af3:3307 with SMTP id r8-20020a9d7cc8000000b006b91af33307mr3537385otn.17.1698431047585; Fri, 27 Oct 2023 11:24:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1698431047; cv=none; d=google.com; s=arc-20160816; b=ox8YqQRIJnhrSQkUK4f650KXqyIUi9P1cJw3XyfqB5V8rjtC+9P1a3RyoPoddXSdIX KfpG6nj0Dk0bzLT2iHXP1808YidScKkuADvUrcPdNOrADZlqxnKkEFTvwv/x08a2r4EG HcQ3lgvjxsPZhrUKR6KYT/RnVzcINFgEI2VEjmlhMfoZPmDnuZBr/HS443kHne8HN1AF Lz3Ekp0VqcFBQ5yz4AafNI1ztAVh2TP8kIephiBqZZ4Jyzos6exkJGuKiQF5vLweYozU 29aqqDO419jrnx90scldZxGRsOOUG6QtogJcrA3NkZZSKaQhAIqV8zbpbsWe0hRiewz/ RWsQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:reply-to:dkim-signature; bh=7QvlKvGCf16Xr+uDo8e13B+5xMJ1JYvNFtX+dCaVrEc=; fh=lhteFENhZrfxRoH7K7/E/bqXvDWa/XLvUszFia9mLtM=; b=osBYd5LnqGqjF63etMcC9TuFjVD0d8dyxiQd+Jdq6YkCt+IfXw2j3RaoRWE0COLXlO mCpQWzKsLl9K0HmiXOXoNA2lcCGNPqHHajvweVisK/rwWHuj5T+PBUt+RVUFlphoAxJW 1nWphNllnsaEdYUBLbhZwtwShcGUGFIB2ZIQ/brtMNVfFIbQpc9uDcqTDjsJ4BRgDRj8 wefw71n8cpTmEKWGMT9iN/ZfJP/obxgC7Kqwi+kzSecMqvJGY/FN12IpX2EC4pLOcR5b yudWk0iXoyW1YIETTQPUDn85HddG5SsyyVvEI4nb2I+FvFAuvsM2WBnPbZXWU3MEZfve MXwA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=jyjoB6LW; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:2 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from agentk.vger.email (agentk.vger.email. [2620:137:e000::3:2]) by mx.google.com with ESMTPS id o128-20020a817386000000b005a247d6f44fsi3155957ywc.519.2023.10.27.11.24.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 27 Oct 2023 11:24:07 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:2 as permitted sender) client-ip=2620:137:e000::3:2; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=jyjoB6LW; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:2 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by agentk.vger.email (Postfix) with ESMTP id 2799C82CF123; Fri, 27 Oct 2023 11:24:00 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at agentk.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1346420AbjJ0SXQ (ORCPT <rfc822;a1648639935@gmail.com> + 25 others); Fri, 27 Oct 2023 14:23:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45662 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346361AbjJ0SWz (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 27 Oct 2023 14:22:55 -0400 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EF420D69 for <linux-kernel@vger.kernel.org>; Fri, 27 Oct 2023 11:22:42 -0700 (PDT) Received: by mail-yb1-xb49.google.com with SMTP id 3f1490d57ef6-d9a5a3f2d4fso1710466276.3 for <linux-kernel@vger.kernel.org>; Fri, 27 Oct 2023 11:22:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1698430961; x=1699035761; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=7QvlKvGCf16Xr+uDo8e13B+5xMJ1JYvNFtX+dCaVrEc=; b=jyjoB6LWat3FKiauSkBKryBDFjRNoXl0eRbelBuNmhZAgDPE1YP/jZlcPDcwztL5IQ RYCBZVzYzRivYeZgIINfVev0j47j/L/G9jLMWWrSLZnXcTv9lQYVRalG+Xx4tnwBIxaW jh8fZsRTl4aoAc1/KnlafDmqARyP3G4D6a+QavE1qA4HVuxfknmyTRQFxNnqXGOJrL5X 0UBy0Ym8b5ot0Q8D+rrEZijzA/DDH2yaAGZpQYIPZZJCJsY5fmSsIzfMtoEA28d0leeh 1X+ifIbmPw2cNYoN1FGnlMrx/znZYRPL2lTV3g6NlDQ7hgc0p2LMcLcLAu/rANssonRA Mx3w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698430961; x=1699035761; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=7QvlKvGCf16Xr+uDo8e13B+5xMJ1JYvNFtX+dCaVrEc=; b=CtFPWc0pnI+kiSi1Ql7qI7SEMjdxu3TVLsILxVWiifgtM1FvYqvr1cjfeC12Mpl7Gk HlGxbxfeXmYVPmJ5gntPyX7vwJpZXmrTYFT6/1b+J98vxZVjE7YK36dB1ZqznrtrSss2 +UY66Tw1Kr/MjHBd5HLJMAVs8S4IE+3xwcvJ62lKea2dXLo8VVCMQhaNO8dc6qwrtsKl 0xhtfuoti4zd/mUfVX0Ef1SGR6bDv3Er+c4MiUy7VLnqSo0jlqTY4blIb1rgdh/cWEmq irDg5wRLwgtoD2Jyv86NGpSYMuP3dU10QArFTavb+d2F/TgB/9+nkAL7l3YCvo2dNST9 pMMw== X-Gm-Message-State: AOJu0YyrenO+xFKNzDDniT4LZrg94NY+qCti18WSmG02VmmJAGGDxjdl fxzEVdHbhOt0SK0RlRbyl+DAJCRCTjI= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a25:abe2:0:b0:da0:48e1:5f46 with SMTP id v89-20020a25abe2000000b00da048e15f46mr66725ybi.9.1698430961446; Fri, 27 Oct 2023 11:22:41 -0700 (PDT) Reply-To: Sean Christopherson <seanjc@google.com> Date: Fri, 27 Oct 2023 11:21:50 -0700 In-Reply-To: <20231027182217.3615211-1-seanjc@google.com> Mime-Version: 1.0 References: <20231027182217.3615211-1-seanjc@google.com> X-Mailer: git-send-email 2.42.0.820.g83a721a137-goog Message-ID: <20231027182217.3615211-9-seanjc@google.com> Subject: [PATCH v13 08/35] KVM: Introduce KVM_SET_USER_MEMORY_REGION2 From: Sean Christopherson <seanjc@google.com> To: Paolo Bonzini <pbonzini@redhat.com>, Marc Zyngier <maz@kernel.org>, Oliver Upton <oliver.upton@linux.dev>, Huacai Chen <chenhuacai@kernel.org>, Michael Ellerman <mpe@ellerman.id.au>, Anup Patel <anup@brainfault.org>, Paul Walmsley <paul.walmsley@sifive.com>, Palmer Dabbelt <palmer@dabbelt.com>, Albert Ou <aou@eecs.berkeley.edu>, Sean Christopherson <seanjc@google.com>, Alexander Viro <viro@zeniv.linux.org.uk>, Christian Brauner <brauner@kernel.org>, "Matthew Wilcox (Oracle)" <willy@infradead.org>, Andrew Morton <akpm@linux-foundation.org> Cc: kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Xiaoyao Li <xiaoyao.li@intel.com>, Xu Yilun <yilun.xu@intel.com>, Chao Peng <chao.p.peng@linux.intel.com>, Fuad Tabba <tabba@google.com>, Jarkko Sakkinen <jarkko@kernel.org>, Anish Moorthy <amoorthy@google.com>, David Matlack <dmatlack@google.com>, Yu Zhang <yu.c.zhang@linux.intel.com>, Isaku Yamahata <isaku.yamahata@intel.com>, " =?utf-8?q?Micka=C3=ABl_Sala?= =?utf-8?q?=C3=BCn?= " <mic@digikod.net>, Vlastimil Babka <vbabka@suse.cz>, Vishal Annapurve <vannapurve@google.com>, Ackerley Tng <ackerleytng@google.com>, Maciej Szmigiero <mail@maciej.szmigiero.name>, David Hildenbrand <david@redhat.com>, Quentin Perret <qperret@google.com>, Michael Roth <michael.roth@amd.com>, Wang <wei.w.wang@intel.com>, Liam Merwick <liam.merwick@oracle.com>, Isaku Yamahata <isaku.yamahata@gmail.com>, "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-8.4 required=5.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on agentk.vger.email Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (agentk.vger.email [0.0.0.0]); Fri, 27 Oct 2023 11:24:00 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1780934034180029584 X-GMAIL-MSGID: 1780934034180029584 |
Series |
KVM: guest_memfd() and per-page attributes
|
|
Commit Message
Sean Christopherson
Oct. 27, 2023, 6:21 p.m. UTC
Introduce a "version 2" of KVM_SET_USER_MEMORY_REGION so that additional information can be supplied without setting userspace up to fail. The padding in the new kvm_userspace_memory_region2 structure will be used to pass a file descriptor in addition to the userspace_addr, i.e. allow userspace to point at a file descriptor and map memory into a guest that is NOT mapped into host userspace. Alternatively, KVM could simply add "struct kvm_userspace_memory_region2" without a new ioctl(), but as Paolo pointed out, adding a new ioctl() makes detection of bad flags a bit more robust, e.g. if the new fd field is guarded only by a flag and not a new ioctl(), then a userspace bug (setting a "bad" flag) would generate out-of-bounds access instead of an -EINVAL error. Cc: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> --- Documentation/virt/kvm/api.rst | 21 +++++++++++++++++++ arch/x86/kvm/x86.c | 2 +- include/linux/kvm_host.h | 4 ++-- include/uapi/linux/kvm.h | 13 ++++++++++++ virt/kvm/kvm_main.c | 38 +++++++++++++++++++++++++++------- 5 files changed, 67 insertions(+), 11 deletions(-)
Comments
On 10/27/23 20:21, Sean Christopherson wrote: > > + if (ioctl == KVM_SET_USER_MEMORY_REGION) > + size = sizeof(struct kvm_userspace_memory_region); This also needs a memset(&mem, 0, sizeof(mem)), otherwise the out-of-bounds access of the commit message becomes a kernel stack read. Probably worth adding a check on valid flags here. Paolo > + else > + size = sizeof(struct kvm_userspace_memory_region2); > + > + /* Ensure the common parts of the two structs are identical. */ > + SANITY_CHECK_MEM_REGION_FIELD(slot); > + SANITY_CHECK_MEM_REGION_FIELD(flags); > + SANITY_CHECK_MEM_REGION_FIELD(guest_phys_addr); > + SANITY_CHECK_MEM_REGION_FIELD(memory_size); > + SANITY_CHECK_MEM_REGION_FIELD(userspace_addr); >
On Mon, Oct 30, 2023, Paolo Bonzini wrote: > On 10/27/23 20:21, Sean Christopherson wrote: > > > > + if (ioctl == KVM_SET_USER_MEMORY_REGION) > > + size = sizeof(struct kvm_userspace_memory_region); > > This also needs a memset(&mem, 0, sizeof(mem)), otherwise the out-of-bounds > access of the commit message becomes a kernel stack read. Ouch. There's some irony. Might be worth doing memset(&mem, -1, sizeof(mem)) though as '0' is a valid file descriptor and a valid file offset. > Probably worth adding a check on valid flags here. Definitely needed. There's a very real bug here. But rather than duplicate flags checking or plumb @ioctl all the way to __kvm_set_memory_region(), now that we have the fancy guard(mutex) and there are no internal calls to kvm_set_memory_region(), what if we: 1. Acquire/release slots_lock in __kvm_set_memory_region() 2. Call kvm_set_memory_region() from x86 code for the internal memslots 3. Disallow *any* flags for internal memslots 4. Open code check_memory_region_flags in kvm_vm_ioctl_set_memory_region() 5. Pass @ioctl to kvm_vm_ioctl_set_memory_region() and allow KVM_MEM_PRIVATE only for KVM_SET_USER_MEMORY_REGION2 E.g. this over ~5 patches --- arch/x86/kvm/x86.c | 2 +- include/linux/kvm_host.h | 4 +-- virt/kvm/kvm_main.c | 65 +++++++++++++++++----------------------- 3 files changed, 29 insertions(+), 42 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index e3eb608b6692..dd3e2017366c 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -12478,7 +12478,7 @@ void __user * __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, m.guest_phys_addr = gpa; m.userspace_addr = hva; m.memory_size = size; - r = __kvm_set_memory_region(kvm, &m); + r = kvm_set_memory_region(kvm, &m); if (r < 0) return ERR_PTR_USR(r); } diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 687589ce9f63..fbb98efe8200 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1170,7 +1170,7 @@ static inline bool kvm_memslot_iter_is_valid(struct kvm_memslot_iter *iter, gfn_ * -- just change its flags * * Since flags can be changed by some of these operations, the following - * differentiation is the best we can do for __kvm_set_memory_region(): + * differentiation is the best we can do for __kvm_set_memory_region(). */ enum kvm_mr_change { KVM_MR_CREATE, @@ -1181,8 +1181,6 @@ enum kvm_mr_change { int kvm_set_memory_region(struct kvm *kvm, const struct kvm_userspace_memory_region2 *mem); -int __kvm_set_memory_region(struct kvm *kvm, - const struct kvm_userspace_memory_region2 *mem); void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot); void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen); int kvm_arch_prepare_memory_region(struct kvm *kvm, diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 23633984142f..39ceee2f67f2 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1608,28 +1608,6 @@ static void kvm_replace_memslot(struct kvm *kvm, } } -static int check_memory_region_flags(struct kvm *kvm, - const struct kvm_userspace_memory_region2 *mem) -{ - u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES; - - if (kvm_arch_has_private_mem(kvm)) - valid_flags |= KVM_MEM_PRIVATE; - - /* Dirty logging private memory is not currently supported. */ - if (mem->flags & KVM_MEM_PRIVATE) - valid_flags &= ~KVM_MEM_LOG_DIRTY_PAGES; - -#ifdef __KVM_HAVE_READONLY_MEM - valid_flags |= KVM_MEM_READONLY; -#endif - - if (mem->flags & ~valid_flags) - return -EINVAL; - - return 0; -} - static void kvm_swap_active_memslots(struct kvm *kvm, int as_id) { struct kvm_memslots *slots = kvm_get_inactive_memslots(kvm, as_id); @@ -2014,11 +1992,9 @@ static bool kvm_check_memslot_overlap(struct kvm_memslots *slots, int id, * space. * * Discontiguous memory is allowed, mostly for framebuffers. - * - * Must be called holding kvm->slots_lock for write. */ -int __kvm_set_memory_region(struct kvm *kvm, - const struct kvm_userspace_memory_region2 *mem) +static int __kvm_set_memory_region(struct kvm *kvm, + const struct kvm_userspace_memory_region2 *mem) { struct kvm_memory_slot *old, *new; struct kvm_memslots *slots; @@ -2028,9 +2004,7 @@ int __kvm_set_memory_region(struct kvm *kvm, int as_id, id; int r; - r = check_memory_region_flags(kvm, mem); - if (r) - return r; + guard(mutex)(&kvm->slots_lock); as_id = mem->slot >> 16; id = (u16)mem->slot; @@ -2139,27 +2113,42 @@ int __kvm_set_memory_region(struct kvm *kvm, kfree(new); return r; } -EXPORT_SYMBOL_GPL(__kvm_set_memory_region); int kvm_set_memory_region(struct kvm *kvm, const struct kvm_userspace_memory_region2 *mem) { - int r; + /* Flags aren't supported for KVM-internal memslots. */ + if (WARN_ON_ONCE(mem->flags)) + return -EINVAL; - mutex_lock(&kvm->slots_lock); - r = __kvm_set_memory_region(kvm, mem); - mutex_unlock(&kvm->slots_lock); - return r; + return __kvm_set_memory_region(kvm, mem); } EXPORT_SYMBOL_GPL(kvm_set_memory_region); -static int kvm_vm_ioctl_set_memory_region(struct kvm *kvm, +static int kvm_vm_ioctl_set_memory_region(struct kvm *kvm, unsigned int ioctl, struct kvm_userspace_memory_region2 *mem) { + u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES; + + if (ioctl == KVM_SET_USER_MEMORY_REGION2 && + kvm_arch_has_private_mem(kvm)) + valid_flags |= KVM_MEM_PRIVATE; + + /* Dirty logging private memory is not currently supported. */ + if (mem->flags & KVM_MEM_PRIVATE) + valid_flags &= ~KVM_MEM_LOG_DIRTY_PAGES; + +#ifdef __KVM_HAVE_READONLY_MEM + valid_flags |= KVM_MEM_READONLY; +#endif + + if (mem->flags & ~valid_flags) + return -EINVAL; + if ((u16)mem->slot >= KVM_USER_MEM_SLOTS) return -EINVAL; - return kvm_set_memory_region(kvm, mem); + return __kvm_set_memory_region(kvm, mem); } #ifndef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT @@ -5145,7 +5134,7 @@ static long kvm_vm_ioctl(struct file *filp, if (copy_from_user(&mem, argp, size)) goto out; - r = kvm_vm_ioctl_set_memory_region(kvm, &mem); + r = kvm_vm_ioctl_set_memory_region(kvm, ioctl, &mem); break; } case KVM_GET_DIRTY_LOG: { base-commit: 881375a408c0f4ea451ff14545b59216d2923881 --
On Mon, Oct 30, 2023, Sean Christopherson wrote: > On Mon, Oct 30, 2023, Paolo Bonzini wrote: > > On 10/27/23 20:21, Sean Christopherson wrote: > > > > > > + if (ioctl == KVM_SET_USER_MEMORY_REGION) > > > + size = sizeof(struct kvm_userspace_memory_region); > > > > This also needs a memset(&mem, 0, sizeof(mem)), otherwise the out-of-bounds > > access of the commit message becomes a kernel stack read. > > Ouch. There's some irony. Might be worth doing memset(&mem, -1, sizeof(mem)) > though as '0' is a valid file descriptor and a valid file offset. > > > Probably worth adding a check on valid flags here. > > Definitely needed. There's a very real bug here. But rather than duplicate flags > checking or plumb @ioctl all the way to __kvm_set_memory_region(), now that we > have the fancy guard(mutex) and there are no internal calls to kvm_set_memory_region(), > what if we: > > 1. Acquire/release slots_lock in __kvm_set_memory_region() Gah, this won't work with x86's usage, which acquire slots_lock quite far away from __kvm_set_memory_region() in several places. The rest of the idea still works, kvm_vm_ioctl_set_memory_region() will just need to take slots_lock manually. > 2. Call kvm_set_memory_region() from x86 code for the internal memslots > 3. Disallow *any* flags for internal memslots > 4. Open code check_memory_region_flags in kvm_vm_ioctl_set_memory_region() > 5. Pass @ioctl to kvm_vm_ioctl_set_memory_region() and allow KVM_MEM_PRIVATE > only for KVM_SET_USER_MEMORY_REGION2
On 10/30/23 21:25, Sean Christopherson wrote: > On Mon, Oct 30, 2023, Paolo Bonzini wrote: >> On 10/27/23 20:21, Sean Christopherson wrote: >>> >>> + if (ioctl == KVM_SET_USER_MEMORY_REGION) >>> + size = sizeof(struct kvm_userspace_memory_region); >> >> This also needs a memset(&mem, 0, sizeof(mem)), otherwise the out-of-bounds >> access of the commit message becomes a kernel stack read. > > Ouch. There's some irony. Might be worth doing memset(&mem, -1, sizeof(mem)) > though as '0' is a valid file descriptor and a valid file offset. Either is okay, because unless the flags check is screwed up it should not matter. The memset is actually unnecessary, though it may be a good idea anyway to keep it, aka belt-and-suspenders. >> Probably worth adding a check on valid flags here. > > Definitely needed. There's a very real bug here. But rather than duplicate flags > checking or plumb @ioctl all the way to __kvm_set_memory_region(), now that we > have the fancy guard(mutex) and there are no internal calls to kvm_set_memory_region(), > what if we: > > 1. Acquire/release slots_lock in __kvm_set_memory_region() > 2. Call kvm_set_memory_region() from x86 code for the internal memslots > 3. Disallow *any* flags for internal memslots > 4. Open code check_memory_region_flags in kvm_vm_ioctl_set_memory_region() I dislike this step, there is a clear point where all paths meet (ioctl/internal, locked/unlocked) and that's __kvm_set_memory_region(). I think that's the place where flags should be checked. (I don't mind the restriction on internal memslots; it's just that to me it's not a particularly natural way to structure the checks). On the other hand, the place where to protect from out-of-bounds accesses, is the place where you stop caring about struct kvm_userspace_memory_region vs kvm_userspace_memory_region2 (and your code gets it right, by dropping "ioctl" as soon as possible). diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 87f45aa91ced..fe5a2af14fff 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1635,6 +1635,14 @@ bool __weak kvm_arch_dirty_log_supported(struct kvm *kvm) return true; } +/* + * Flags that do not access any of the extra space of struct + * kvm_userspace_memory_region2. KVM_SET_USER_MEMORY_REGION_FLAGS + * only allows these. + */ +#define KVM_SET_USER_MEMORY_REGION_FLAGS \ + (KVM_MEM_LOG_DIRTY_PAGES | KVM_MEM_READONLY) + static int check_memory_region_flags(struct kvm *kvm, const struct kvm_userspace_memory_region2 *mem) { @@ -5149,10 +5149,16 @@ static long kvm_vm_ioctl(struct file *filp, struct kvm_userspace_memory_region2 mem; unsigned long size; - if (ioctl == KVM_SET_USER_MEMORY_REGION) + if (ioctl == KVM_SET_USER_MEMORY_REGION) { + /* + * Fields beyond struct kvm_userspace_memory_region shouldn't be + * accessed, but avoid leaking kernel memory in case of a bug. + */ + memset(&mem, 0, sizeof(mem)); size = sizeof(struct kvm_userspace_memory_region); - else + } else { size = sizeof(struct kvm_userspace_memory_region2); + } /* Ensure the common parts of the two structs are identical. */ SANITY_CHECK_MEM_REGION_FIELD(slot); @@ -5165,6 +5167,11 @@ static long kvm_vm_ioctl(struct file *filp, if (copy_from_user(&mem, argp, size)) goto out; + r = -EINVAL; + if (ioctl == KVM_SET_USER_MEMORY_REGION && + (mem->flags & ~KVM_SET_USER_MEMORY_REGION_FLAGS)) + goto out; + r = kvm_vm_ioctl_set_memory_region(kvm, &mem); break; } That's a kind of patch that you can't really get wrong (though I have the brown paper bag ready). Maintainance-wise it's fine, since flags are being added at a pace of roughly one every five years, and anyway it's also future proof: I placed the #define near check_memory_region_flags so that in five years we remember to keep it up to date. But worst case, the new flags will only be allowed by KVM_SET_USER_MEMORY_REGION2 unnecessarily; there are no security issues waiting to bite us. In sum, this is exactly the only kind of fix that should be in the v13->v14 delta. Paolo
On Tue, Oct 31, 2023, Paolo Bonzini wrote: > On 10/30/23 21:25, Sean Christopherson wrote: > > > Probably worth adding a check on valid flags here. > > > > Definitely needed. There's a very real bug here. But rather than duplicate flags > > checking or plumb @ioctl all the way to __kvm_set_memory_region(), now that we > > have the fancy guard(mutex) and there are no internal calls to kvm_set_memory_region(), > > what if we: > > > > 1. Acquire/release slots_lock in __kvm_set_memory_region() > > 2. Call kvm_set_memory_region() from x86 code for the internal memslots > > 3. Disallow *any* flags for internal memslots > > 4. Open code check_memory_region_flags in kvm_vm_ioctl_set_memory_region() > > I dislike this step, there is a clear point where all paths meet > (ioctl/internal, locked/unlocked) and that's __kvm_set_memory_region(). > I think that's the place where flags should be checked. (I don't mind > the restriction on internal memslots; it's just that to me it's not a > particularly natural way to structure the checks). Yeah, I just don't like the discrepancy it causes where some flags are explicitly checked and allowed, allowed and then later disallowed. > On the other hand, the place where to protect from out-of-bounds > accesses, is the place where you stop caring about struct > kvm_userspace_memory_region vs kvm_userspace_memory_region2 (and > your code gets it right, by dropping "ioctl" as soon as possible). > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 87f45aa91ced..fe5a2af14fff 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -1635,6 +1635,14 @@ bool __weak kvm_arch_dirty_log_supported(struct kvm *kvm) > return true; > } > +/* > + * Flags that do not access any of the extra space of struct > + * kvm_userspace_memory_region2. KVM_SET_USER_MEMORY_REGION_FLAGS > + * only allows these. > + */ > +#define KVM_SET_USER_MEMORY_REGION_FLAGS \ Can we name this KVM_SET_USER_MEMORY_REGION_LEGACY_FLAGS, or something equally horrific? As is, this sounds way too much like a generic "allowed flags for any memory region". Or maybe invert the macro? I.e. something to make it more obvious that it's effectively a versioning check, not a generic "what's supported?" check. #define KVM_SET_USER_MEMORY_FLAGS_V2_ONLY \ (~(KVM_MEM_LOG_DIRTY_PAGES | KVM_MEM_READONLY)) > + (KVM_MEM_LOG_DIRTY_PAGES | KVM_MEM_READONLY) > + > static int check_memory_region_flags(struct kvm *kvm, > const struct kvm_userspace_memory_region2 *mem) > { > @@ -5149,10 +5149,16 @@ static long kvm_vm_ioctl(struct file *filp, > struct kvm_userspace_memory_region2 mem; > unsigned long size; > - if (ioctl == KVM_SET_USER_MEMORY_REGION) > + if (ioctl == KVM_SET_USER_MEMORY_REGION) { > + /* > + * Fields beyond struct kvm_userspace_memory_region shouldn't be > + * accessed, but avoid leaking kernel memory in case of a bug. > + */ > + memset(&mem, 0, sizeof(mem)); > size = sizeof(struct kvm_userspace_memory_region); > - else > + } else { > size = sizeof(struct kvm_userspace_memory_region2); > + } > /* Ensure the common parts of the two structs are identical. */ > SANITY_CHECK_MEM_REGION_FIELD(slot); > @@ -5165,6 +5167,11 @@ static long kvm_vm_ioctl(struct file *filp, > if (copy_from_user(&mem, argp, size)) > goto out; > + r = -EINVAL; > + if (ioctl == KVM_SET_USER_MEMORY_REGION && > + (mem->flags & ~KVM_SET_USER_MEMORY_REGION_FLAGS)) > + goto out; > + > r = kvm_vm_ioctl_set_memory_region(kvm, &mem); > break; > } > > > That's a kind of patch that you can't really get wrong (though I have > the brown paper bag ready). > > Maintainance-wise it's fine, since flags are being added at a pace of > roughly one every five years, Heh, true. > and anyway it's also future proof: I placed the #define near > check_memory_region_flags so that in five years we remember to keep it up to > date. But worst case, the new flags will only be allowed by > KVM_SET_USER_MEMORY_REGION2 unnecessarily; there are no security issues > waiting to bite us. > > In sum, this is exactly the only kind of fix that should be in the v13->v14 > delta. Boiling the ocean can be fun too ;-)
On 10/28/2023 2:21 AM, Sean Christopherson wrote: > Introduce a "version 2" of KVM_SET_USER_MEMORY_REGION so that additional > information can be supplied without setting userspace up to fail. The > padding in the new kvm_userspace_memory_region2 structure will be used to > pass a file descriptor in addition to the userspace_addr, i.e. allow > userspace to point at a file descriptor and map memory into a guest that > is NOT mapped into host userspace. > > Alternatively, KVM could simply add "struct kvm_userspace_memory_region2" > without a new ioctl(), but as Paolo pointed out, adding a new ioctl() > makes detection of bad flags a bit more robust, e.g. if the new fd field > is guarded only by a flag and not a new ioctl(), then a userspace bug > (setting a "bad" flag) would generate out-of-bounds access instead of an > -EINVAL error. > > Cc: Jarkko Sakkinen <jarkko@kernel.org> > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> > Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> > Signed-off-by: Sean Christopherson <seanjc@google.com> > --- > Documentation/virt/kvm/api.rst | 21 +++++++++++++++++++ > arch/x86/kvm/x86.c | 2 +- > include/linux/kvm_host.h | 4 ++-- > include/uapi/linux/kvm.h | 13 ++++++++++++ > virt/kvm/kvm_main.c | 38 +++++++++++++++++++++++++++------- > 5 files changed, 67 insertions(+), 11 deletions(-) > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > index 21a7578142a1..ace984acc125 100644 > --- a/Documentation/virt/kvm/api.rst > +++ b/Documentation/virt/kvm/api.rst > @@ -6070,6 +6070,27 @@ writes to the CNTVCT_EL0 and CNTPCT_EL0 registers using the SET_ONE_REG > interface. No error will be returned, but the resulting offset will not be > applied. > > +4.139 KVM_SET_USER_MEMORY_REGION2 > +--------------------------------- > + > +:Capability: KVM_CAP_USER_MEMORY2 > +:Architectures: all > +:Type: vm ioctl > +:Parameters: struct kvm_userspace_memory_region2 (in) > +:Returns: 0 on success, -1 on error > + > +:: > + > + struct kvm_userspace_memory_region2 { > + __u32 slot; > + __u32 flags; > + __u64 guest_phys_addr; > + __u64 memory_size; /* bytes */ > + __u64 userspace_addr; /* start of the userspace allocated memory */ missing __u64 pad[16]; > + }; > + > +See KVM_SET_USER_MEMORY_REGION. > + > 5. The kvm_run structure > ======================== > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 41cce5031126..6409914428ca 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -12455,7 +12455,7 @@ void __user * __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, > } > > for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) { > - struct kvm_userspace_memory_region m; > + struct kvm_userspace_memory_region2 m; > > m.slot = id | (i << 16); > m.flags = 0; > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index 5faba69403ac..4e741ff27af3 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -1146,9 +1146,9 @@ enum kvm_mr_change { > }; > > int kvm_set_memory_region(struct kvm *kvm, > - const struct kvm_userspace_memory_region *mem); > + const struct kvm_userspace_memory_region2 *mem); > int __kvm_set_memory_region(struct kvm *kvm, > - const struct kvm_userspace_memory_region *mem); > + const struct kvm_userspace_memory_region2 *mem); > void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot); > void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen); > int kvm_arch_prepare_memory_region(struct kvm *kvm, > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index 13065dd96132..bd1abe067f28 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -95,6 +95,16 @@ struct kvm_userspace_memory_region { > __u64 userspace_addr; /* start of the userspace allocated memory */ > }; > > +/* for KVM_SET_USER_MEMORY_REGION2 */ > +struct kvm_userspace_memory_region2 { > + __u32 slot; > + __u32 flags; > + __u64 guest_phys_addr; > + __u64 memory_size; > + __u64 userspace_addr; > + __u64 pad[16]; > +}; > + > /* > * The bit 0 ~ bit 15 of kvm_userspace_memory_region::flags are visible for > * userspace, other bits are reserved for kvm internal use which are defined > @@ -1192,6 +1202,7 @@ struct kvm_ppc_resize_hpt { > #define KVM_CAP_COUNTER_OFFSET 227 > #define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228 > #define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229 > +#define KVM_CAP_USER_MEMORY2 230 > > #ifdef KVM_CAP_IRQ_ROUTING > > @@ -1473,6 +1484,8 @@ struct kvm_vfio_spapr_tce { > struct kvm_userspace_memory_region) > #define KVM_SET_TSS_ADDR _IO(KVMIO, 0x47) > #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO, 0x48, __u64) > +#define KVM_SET_USER_MEMORY_REGION2 _IOW(KVMIO, 0x49, \ > + struct kvm_userspace_memory_region2) > > /* enable ucontrol for s390 */ > struct kvm_s390_ucas_mapping { > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 6e708017064d..3f5b7c2c5327 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -1578,7 +1578,7 @@ static void kvm_replace_memslot(struct kvm *kvm, > } > } > > -static int check_memory_region_flags(const struct kvm_userspace_memory_region *mem) > +static int check_memory_region_flags(const struct kvm_userspace_memory_region2 *mem) > { > u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES; > > @@ -1980,7 +1980,7 @@ static bool kvm_check_memslot_overlap(struct kvm_memslots *slots, int id, > * Must be called holding kvm->slots_lock for write. > */ > int __kvm_set_memory_region(struct kvm *kvm, > - const struct kvm_userspace_memory_region *mem) > + const struct kvm_userspace_memory_region2 *mem) > { > struct kvm_memory_slot *old, *new; > struct kvm_memslots *slots; > @@ -2084,7 +2084,7 @@ int __kvm_set_memory_region(struct kvm *kvm, > EXPORT_SYMBOL_GPL(__kvm_set_memory_region); > > int kvm_set_memory_region(struct kvm *kvm, > - const struct kvm_userspace_memory_region *mem) > + const struct kvm_userspace_memory_region2 *mem) > { > int r; > > @@ -2096,7 +2096,7 @@ int kvm_set_memory_region(struct kvm *kvm, > EXPORT_SYMBOL_GPL(kvm_set_memory_region); > > static int kvm_vm_ioctl_set_memory_region(struct kvm *kvm, > - struct kvm_userspace_memory_region *mem) > + struct kvm_userspace_memory_region2 *mem) > { > if ((u16)mem->slot >= KVM_USER_MEM_SLOTS) > return -EINVAL; > @@ -4566,6 +4566,7 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg) > { > switch (arg) { > case KVM_CAP_USER_MEMORY: > + case KVM_CAP_USER_MEMORY2: > case KVM_CAP_DESTROY_MEMORY_REGION_WORKS: > case KVM_CAP_JOIN_MEMORY_REGIONS_WORKS: > case KVM_CAP_INTERNAL_ERROR_DATA: > @@ -4821,6 +4822,14 @@ static int kvm_vm_ioctl_get_stats_fd(struct kvm *kvm) > return fd; > } > > +#define SANITY_CHECK_MEM_REGION_FIELD(field) \ > +do { \ > + BUILD_BUG_ON(offsetof(struct kvm_userspace_memory_region, field) != \ > + offsetof(struct kvm_userspace_memory_region2, field)); \ > + BUILD_BUG_ON(sizeof_field(struct kvm_userspace_memory_region, field) != \ > + sizeof_field(struct kvm_userspace_memory_region2, field)); \ > +} while (0) > + > static long kvm_vm_ioctl(struct file *filp, > unsigned int ioctl, unsigned long arg) > { > @@ -4843,15 +4852,28 @@ static long kvm_vm_ioctl(struct file *filp, > r = kvm_vm_ioctl_enable_cap_generic(kvm, &cap); > break; > } > + case KVM_SET_USER_MEMORY_REGION2: > case KVM_SET_USER_MEMORY_REGION: { > - struct kvm_userspace_memory_region kvm_userspace_mem; > + struct kvm_userspace_memory_region2 mem; > + unsigned long size; > + > + if (ioctl == KVM_SET_USER_MEMORY_REGION) > + size = sizeof(struct kvm_userspace_memory_region); > + else > + size = sizeof(struct kvm_userspace_memory_region2); > + > + /* Ensure the common parts of the two structs are identical. */ > + SANITY_CHECK_MEM_REGION_FIELD(slot); > + SANITY_CHECK_MEM_REGION_FIELD(flags); > + SANITY_CHECK_MEM_REGION_FIELD(guest_phys_addr); > + SANITY_CHECK_MEM_REGION_FIELD(memory_size); > + SANITY_CHECK_MEM_REGION_FIELD(userspace_addr); > > r = -EFAULT; > - if (copy_from_user(&kvm_userspace_mem, argp, > - sizeof(kvm_userspace_mem))) > + if (copy_from_user(&mem, argp, size)) > goto out; > > - r = kvm_vm_ioctl_set_memory_region(kvm, &kvm_userspace_mem); > + r = kvm_vm_ioctl_set_memory_region(kvm, &mem); > break; > } > case KVM_GET_DIRTY_LOG: {
On Tue, Oct 31, 2023, Xiaoyao Li wrote: > On 10/28/2023 2:21 AM, Sean Christopherson wrote: > > Introduce a "version 2" of KVM_SET_USER_MEMORY_REGION so that additional > > information can be supplied without setting userspace up to fail. The > > padding in the new kvm_userspace_memory_region2 structure will be used to > > pass a file descriptor in addition to the userspace_addr, i.e. allow > > userspace to point at a file descriptor and map memory into a guest that > > is NOT mapped into host userspace. > > > > Alternatively, KVM could simply add "struct kvm_userspace_memory_region2" > > without a new ioctl(), but as Paolo pointed out, adding a new ioctl() > > makes detection of bad flags a bit more robust, e.g. if the new fd field > > is guarded only by a flag and not a new ioctl(), then a userspace bug > > (setting a "bad" flag) would generate out-of-bounds access instead of an > > -EINVAL error. > > > > Cc: Jarkko Sakkinen <jarkko@kernel.org> > > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> > > Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> > > Signed-off-by: Sean Christopherson <seanjc@google.com> > > --- > > Documentation/virt/kvm/api.rst | 21 +++++++++++++++++++ > > arch/x86/kvm/x86.c | 2 +- > > include/linux/kvm_host.h | 4 ++-- > > include/uapi/linux/kvm.h | 13 ++++++++++++ > > virt/kvm/kvm_main.c | 38 +++++++++++++++++++++++++++------- > > 5 files changed, 67 insertions(+), 11 deletions(-) > > > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > > index 21a7578142a1..ace984acc125 100644 > > --- a/Documentation/virt/kvm/api.rst > > +++ b/Documentation/virt/kvm/api.rst > > @@ -6070,6 +6070,27 @@ writes to the CNTVCT_EL0 and CNTPCT_EL0 registers using the SET_ONE_REG > > interface. No error will be returned, but the resulting offset will not be > > applied. > > +4.139 KVM_SET_USER_MEMORY_REGION2 > > +--------------------------------- > > + > > +:Capability: KVM_CAP_USER_MEMORY2 > > +:Architectures: all > > +:Type: vm ioctl > > +:Parameters: struct kvm_userspace_memory_region2 (in) > > +:Returns: 0 on success, -1 on error > > + > > +:: > > + > > + struct kvm_userspace_memory_region2 { > > + __u32 slot; > > + __u32 flags; > > + __u64 guest_phys_addr; > > + __u64 memory_size; /* bytes */ > > + __u64 userspace_addr; /* start of the userspace allocated memory */ > > missing > > __u64 pad[16]; I can't even copy+paste correctly :-(
On Fri, Oct 27, 2023 at 7:22 PM Sean Christopherson <seanjc@google.com> wrote: > > Introduce a "version 2" of KVM_SET_USER_MEMORY_REGION so that additional > information can be supplied without setting userspace up to fail. The > padding in the new kvm_userspace_memory_region2 structure will be used to > pass a file descriptor in addition to the userspace_addr, i.e. allow > userspace to point at a file descriptor and map memory into a guest that > is NOT mapped into host userspace. > > Alternatively, KVM could simply add "struct kvm_userspace_memory_region2" > without a new ioctl(), but as Paolo pointed out, adding a new ioctl() > makes detection of bad flags a bit more robust, e.g. if the new fd field > is guarded only by a flag and not a new ioctl(), then a userspace bug > (setting a "bad" flag) would generate out-of-bounds access instead of an > -EINVAL error. > > Cc: Jarkko Sakkinen <jarkko@kernel.org> > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> > Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> > Signed-off-by: Sean Christopherson <seanjc@google.com> > --- With the missing pad in api.rst fixed: Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Cheers, /fuad > Documentation/virt/kvm/api.rst | 21 +++++++++++++++++++ > arch/x86/kvm/x86.c | 2 +- > include/linux/kvm_host.h | 4 ++-- > include/uapi/linux/kvm.h | 13 ++++++++++++ > virt/kvm/kvm_main.c | 38 +++++++++++++++++++++++++++------- > 5 files changed, 67 insertions(+), 11 deletions(-) > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > index 21a7578142a1..ace984acc125 100644 > --- a/Documentation/virt/kvm/api.rst > +++ b/Documentation/virt/kvm/api.rst > @@ -6070,6 +6070,27 @@ writes to the CNTVCT_EL0 and CNTPCT_EL0 registers using the SET_ONE_REG > interface. No error will be returned, but the resulting offset will not be > applied. > > +4.139 KVM_SET_USER_MEMORY_REGION2 > +--------------------------------- > + > +:Capability: KVM_CAP_USER_MEMORY2 > +:Architectures: all > +:Type: vm ioctl > +:Parameters: struct kvm_userspace_memory_region2 (in) > +:Returns: 0 on success, -1 on error > + > +:: > + > + struct kvm_userspace_memory_region2 { > + __u32 slot; > + __u32 flags; > + __u64 guest_phys_addr; > + __u64 memory_size; /* bytes */ > + __u64 userspace_addr; /* start of the userspace allocated memory */ > + }; > + > +See KVM_SET_USER_MEMORY_REGION. > + > 5. The kvm_run structure > ======================== > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 41cce5031126..6409914428ca 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -12455,7 +12455,7 @@ void __user * __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, > } > > for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) { > - struct kvm_userspace_memory_region m; > + struct kvm_userspace_memory_region2 m; > > m.slot = id | (i << 16); > m.flags = 0; > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index 5faba69403ac..4e741ff27af3 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -1146,9 +1146,9 @@ enum kvm_mr_change { > }; > > int kvm_set_memory_region(struct kvm *kvm, > - const struct kvm_userspace_memory_region *mem); > + const struct kvm_userspace_memory_region2 *mem); > int __kvm_set_memory_region(struct kvm *kvm, > - const struct kvm_userspace_memory_region *mem); > + const struct kvm_userspace_memory_region2 *mem); > void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot); > void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen); > int kvm_arch_prepare_memory_region(struct kvm *kvm, > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index 13065dd96132..bd1abe067f28 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -95,6 +95,16 @@ struct kvm_userspace_memory_region { > __u64 userspace_addr; /* start of the userspace allocated memory */ > }; > > +/* for KVM_SET_USER_MEMORY_REGION2 */ > +struct kvm_userspace_memory_region2 { > + __u32 slot; > + __u32 flags; > + __u64 guest_phys_addr; > + __u64 memory_size; > + __u64 userspace_addr; > + __u64 pad[16]; > +}; > + > /* > * The bit 0 ~ bit 15 of kvm_userspace_memory_region::flags are visible for > * userspace, other bits are reserved for kvm internal use which are defined > @@ -1192,6 +1202,7 @@ struct kvm_ppc_resize_hpt { > #define KVM_CAP_COUNTER_OFFSET 227 > #define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228 > #define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229 > +#define KVM_CAP_USER_MEMORY2 230 > > #ifdef KVM_CAP_IRQ_ROUTING > > @@ -1473,6 +1484,8 @@ struct kvm_vfio_spapr_tce { > struct kvm_userspace_memory_region) > #define KVM_SET_TSS_ADDR _IO(KVMIO, 0x47) > #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO, 0x48, __u64) > +#define KVM_SET_USER_MEMORY_REGION2 _IOW(KVMIO, 0x49, \ > + struct kvm_userspace_memory_region2) > > /* enable ucontrol for s390 */ > struct kvm_s390_ucas_mapping { > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 6e708017064d..3f5b7c2c5327 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -1578,7 +1578,7 @@ static void kvm_replace_memslot(struct kvm *kvm, > } > } > > -static int check_memory_region_flags(const struct kvm_userspace_memory_region *mem) > +static int check_memory_region_flags(const struct kvm_userspace_memory_region2 *mem) > { > u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES; > > @@ -1980,7 +1980,7 @@ static bool kvm_check_memslot_overlap(struct kvm_memslots *slots, int id, > * Must be called holding kvm->slots_lock for write. > */ > int __kvm_set_memory_region(struct kvm *kvm, > - const struct kvm_userspace_memory_region *mem) > + const struct kvm_userspace_memory_region2 *mem) > { > struct kvm_memory_slot *old, *new; > struct kvm_memslots *slots; > @@ -2084,7 +2084,7 @@ int __kvm_set_memory_region(struct kvm *kvm, > EXPORT_SYMBOL_GPL(__kvm_set_memory_region); > > int kvm_set_memory_region(struct kvm *kvm, > - const struct kvm_userspace_memory_region *mem) > + const struct kvm_userspace_memory_region2 *mem) > { > int r; > > @@ -2096,7 +2096,7 @@ int kvm_set_memory_region(struct kvm *kvm, > EXPORT_SYMBOL_GPL(kvm_set_memory_region); > > static int kvm_vm_ioctl_set_memory_region(struct kvm *kvm, > - struct kvm_userspace_memory_region *mem) > + struct kvm_userspace_memory_region2 *mem) > { > if ((u16)mem->slot >= KVM_USER_MEM_SLOTS) > return -EINVAL; > @@ -4566,6 +4566,7 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg) > { > switch (arg) { > case KVM_CAP_USER_MEMORY: > + case KVM_CAP_USER_MEMORY2: > case KVM_CAP_DESTROY_MEMORY_REGION_WORKS: > case KVM_CAP_JOIN_MEMORY_REGIONS_WORKS: > case KVM_CAP_INTERNAL_ERROR_DATA: > @@ -4821,6 +4822,14 @@ static int kvm_vm_ioctl_get_stats_fd(struct kvm *kvm) > return fd; > } > > +#define SANITY_CHECK_MEM_REGION_FIELD(field) \ > +do { \ > + BUILD_BUG_ON(offsetof(struct kvm_userspace_memory_region, field) != \ > + offsetof(struct kvm_userspace_memory_region2, field)); \ > + BUILD_BUG_ON(sizeof_field(struct kvm_userspace_memory_region, field) != \ > + sizeof_field(struct kvm_userspace_memory_region2, field)); \ > +} while (0) > + > static long kvm_vm_ioctl(struct file *filp, > unsigned int ioctl, unsigned long arg) > { > @@ -4843,15 +4852,28 @@ static long kvm_vm_ioctl(struct file *filp, > r = kvm_vm_ioctl_enable_cap_generic(kvm, &cap); > break; > } > + case KVM_SET_USER_MEMORY_REGION2: > case KVM_SET_USER_MEMORY_REGION: { > - struct kvm_userspace_memory_region kvm_userspace_mem; > + struct kvm_userspace_memory_region2 mem; > + unsigned long size; > + > + if (ioctl == KVM_SET_USER_MEMORY_REGION) > + size = sizeof(struct kvm_userspace_memory_region); > + else > + size = sizeof(struct kvm_userspace_memory_region2); > + > + /* Ensure the common parts of the two structs are identical. */ > + SANITY_CHECK_MEM_REGION_FIELD(slot); > + SANITY_CHECK_MEM_REGION_FIELD(flags); > + SANITY_CHECK_MEM_REGION_FIELD(guest_phys_addr); > + SANITY_CHECK_MEM_REGION_FIELD(memory_size); > + SANITY_CHECK_MEM_REGION_FIELD(userspace_addr); > > r = -EFAULT; > - if (copy_from_user(&kvm_userspace_mem, argp, > - sizeof(kvm_userspace_mem))) > + if (copy_from_user(&mem, argp, size)) > goto out; > > - r = kvm_vm_ioctl_set_memory_region(kvm, &kvm_userspace_mem); > + r = kvm_vm_ioctl_set_memory_region(kvm, &mem); > break; > } > case KVM_GET_DIRTY_LOG: { > -- > 2.42.0.820.g83a721a137-goog >
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 21a7578142a1..ace984acc125 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6070,6 +6070,27 @@ writes to the CNTVCT_EL0 and CNTPCT_EL0 registers using the SET_ONE_REG interface. No error will be returned, but the resulting offset will not be applied. +4.139 KVM_SET_USER_MEMORY_REGION2 +--------------------------------- + +:Capability: KVM_CAP_USER_MEMORY2 +:Architectures: all +:Type: vm ioctl +:Parameters: struct kvm_userspace_memory_region2 (in) +:Returns: 0 on success, -1 on error + +:: + + struct kvm_userspace_memory_region2 { + __u32 slot; + __u32 flags; + __u64 guest_phys_addr; + __u64 memory_size; /* bytes */ + __u64 userspace_addr; /* start of the userspace allocated memory */ + }; + +See KVM_SET_USER_MEMORY_REGION. + 5. The kvm_run structure ======================== diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 41cce5031126..6409914428ca 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -12455,7 +12455,7 @@ void __user * __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, } for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) { - struct kvm_userspace_memory_region m; + struct kvm_userspace_memory_region2 m; m.slot = id | (i << 16); m.flags = 0; diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 5faba69403ac..4e741ff27af3 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1146,9 +1146,9 @@ enum kvm_mr_change { }; int kvm_set_memory_region(struct kvm *kvm, - const struct kvm_userspace_memory_region *mem); + const struct kvm_userspace_memory_region2 *mem); int __kvm_set_memory_region(struct kvm *kvm, - const struct kvm_userspace_memory_region *mem); + const struct kvm_userspace_memory_region2 *mem); void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot); void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen); int kvm_arch_prepare_memory_region(struct kvm *kvm, diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 13065dd96132..bd1abe067f28 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -95,6 +95,16 @@ struct kvm_userspace_memory_region { __u64 userspace_addr; /* start of the userspace allocated memory */ }; +/* for KVM_SET_USER_MEMORY_REGION2 */ +struct kvm_userspace_memory_region2 { + __u32 slot; + __u32 flags; + __u64 guest_phys_addr; + __u64 memory_size; + __u64 userspace_addr; + __u64 pad[16]; +}; + /* * The bit 0 ~ bit 15 of kvm_userspace_memory_region::flags are visible for * userspace, other bits are reserved for kvm internal use which are defined @@ -1192,6 +1202,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_COUNTER_OFFSET 227 #define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228 #define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229 +#define KVM_CAP_USER_MEMORY2 230 #ifdef KVM_CAP_IRQ_ROUTING @@ -1473,6 +1484,8 @@ struct kvm_vfio_spapr_tce { struct kvm_userspace_memory_region) #define KVM_SET_TSS_ADDR _IO(KVMIO, 0x47) #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO, 0x48, __u64) +#define KVM_SET_USER_MEMORY_REGION2 _IOW(KVMIO, 0x49, \ + struct kvm_userspace_memory_region2) /* enable ucontrol for s390 */ struct kvm_s390_ucas_mapping { diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 6e708017064d..3f5b7c2c5327 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1578,7 +1578,7 @@ static void kvm_replace_memslot(struct kvm *kvm, } } -static int check_memory_region_flags(const struct kvm_userspace_memory_region *mem) +static int check_memory_region_flags(const struct kvm_userspace_memory_region2 *mem) { u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES; @@ -1980,7 +1980,7 @@ static bool kvm_check_memslot_overlap(struct kvm_memslots *slots, int id, * Must be called holding kvm->slots_lock for write. */ int __kvm_set_memory_region(struct kvm *kvm, - const struct kvm_userspace_memory_region *mem) + const struct kvm_userspace_memory_region2 *mem) { struct kvm_memory_slot *old, *new; struct kvm_memslots *slots; @@ -2084,7 +2084,7 @@ int __kvm_set_memory_region(struct kvm *kvm, EXPORT_SYMBOL_GPL(__kvm_set_memory_region); int kvm_set_memory_region(struct kvm *kvm, - const struct kvm_userspace_memory_region *mem) + const struct kvm_userspace_memory_region2 *mem) { int r; @@ -2096,7 +2096,7 @@ int kvm_set_memory_region(struct kvm *kvm, EXPORT_SYMBOL_GPL(kvm_set_memory_region); static int kvm_vm_ioctl_set_memory_region(struct kvm *kvm, - struct kvm_userspace_memory_region *mem) + struct kvm_userspace_memory_region2 *mem) { if ((u16)mem->slot >= KVM_USER_MEM_SLOTS) return -EINVAL; @@ -4566,6 +4566,7 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg) { switch (arg) { case KVM_CAP_USER_MEMORY: + case KVM_CAP_USER_MEMORY2: case KVM_CAP_DESTROY_MEMORY_REGION_WORKS: case KVM_CAP_JOIN_MEMORY_REGIONS_WORKS: case KVM_CAP_INTERNAL_ERROR_DATA: @@ -4821,6 +4822,14 @@ static int kvm_vm_ioctl_get_stats_fd(struct kvm *kvm) return fd; } +#define SANITY_CHECK_MEM_REGION_FIELD(field) \ +do { \ + BUILD_BUG_ON(offsetof(struct kvm_userspace_memory_region, field) != \ + offsetof(struct kvm_userspace_memory_region2, field)); \ + BUILD_BUG_ON(sizeof_field(struct kvm_userspace_memory_region, field) != \ + sizeof_field(struct kvm_userspace_memory_region2, field)); \ +} while (0) + static long kvm_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { @@ -4843,15 +4852,28 @@ static long kvm_vm_ioctl(struct file *filp, r = kvm_vm_ioctl_enable_cap_generic(kvm, &cap); break; } + case KVM_SET_USER_MEMORY_REGION2: case KVM_SET_USER_MEMORY_REGION: { - struct kvm_userspace_memory_region kvm_userspace_mem; + struct kvm_userspace_memory_region2 mem; + unsigned long size; + + if (ioctl == KVM_SET_USER_MEMORY_REGION) + size = sizeof(struct kvm_userspace_memory_region); + else + size = sizeof(struct kvm_userspace_memory_region2); + + /* Ensure the common parts of the two structs are identical. */ + SANITY_CHECK_MEM_REGION_FIELD(slot); + SANITY_CHECK_MEM_REGION_FIELD(flags); + SANITY_CHECK_MEM_REGION_FIELD(guest_phys_addr); + SANITY_CHECK_MEM_REGION_FIELD(memory_size); + SANITY_CHECK_MEM_REGION_FIELD(userspace_addr); r = -EFAULT; - if (copy_from_user(&kvm_userspace_mem, argp, - sizeof(kvm_userspace_mem))) + if (copy_from_user(&mem, argp, size)) goto out; - r = kvm_vm_ioctl_set_memory_region(kvm, &kvm_userspace_mem); + r = kvm_vm_ioctl_set_memory_region(kvm, &mem); break; } case KVM_GET_DIRTY_LOG: {