[RFC,v12,10/33] KVM: Set the stage for handling only shared mappings in mmu_notifier events
Message ID: 20230914015531.1419405-11-seanjc@google.com
State: New
Series: KVM: guest_memfd() and per-page attributes
Commit Message
Sean Christopherson
Sept. 14, 2023, 1:55 a.m. UTC
Add flags to "struct kvm_gfn_range" to let notifier events target only
shared and only private mappings, and wire up the existing mmu_notifier
events to be shared-only (private memory is never associated with a
userspace virtual address, i.e. can't be reached via mmu_notifiers).

Add two flags so that KVM can handle the three possibilities (shared,
private, and shared+private) without needing something like a tri-state
enum.
Link: https://lore.kernel.org/all/ZJX0hk+KpQP0KUyB@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
include/linux/kvm_host.h | 2 ++
virt/kvm/kvm_main.c | 7 +++++++
2 files changed, 9 insertions(+)
Comments
On 9/14/2023 9:55 AM, Sean Christopherson wrote:
> Add flags to "struct kvm_gfn_range" to let notifier events target only
> shared and only private mappings, and wire up the existing mmu_notifier
> events to be shared-only (private memory is never associated with a
> userspace virtual address, i.e. can't be reached via mmu_notifiers).
>
> Add two flags so that KVM can handle the three possibilities (shared,
> private, and shared+private) without needing something like a tri-state
> enum.

How to understand the word "stage" in short log?

> Link: https://lore.kernel.org/all/ZJX0hk+KpQP0KUyB@google.com
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> [diff snipped; identical to the patch below]
On Mon, Sep 18, 2023, Binbin Wu wrote:
> On 9/14/2023 9:55 AM, Sean Christopherson wrote:
> > Add flags to "struct kvm_gfn_range" to let notifier events target only
> > shared and only private mappings, ...
>
> How to understand the word "stage" in short log?

Sorry, it's an idiom[*] that essentially means "to prepare for". I'll rephrase the shortlog to be more straightforward (I have a bad habit of using idioms).

[*] https://dictionary.cambridge.org/us/dictionary/english/set-the-stage-for
On Wed, Sep 13, 2023 at 06:55:08PM -0700, Sean Christopherson wrote:
> Add flags to "struct kvm_gfn_range" to let notifier events target only
> shared and only private mappings, ...
> [diff snipped; identical to the patch below, ending with]
> +
> +		/*
> +		 * HVA-based notifications aren't relevant to private
> +		 * mappings as they don't have a userspace mapping.
> +		 */
> +		gfn_range.only_private = false;
> +		gfn_range.only_shared = true;
> 		gfn_range.may_block = range->may_block;

Who is supposed to read only_private/only_shared? Is it supposed to be plumbed onto arch code and handled specially there?

I ask because I see elsewhere you have:

	/*
	 * If one or more memslots were found and thus zapped, notify arch code
	 * that guest memory has been reclaimed. This needs to be done *after*
	 * dropping mmu_lock, as x86's reclaim path is slooooow.
	 */
	if (__kvm_handle_hva_range(kvm, &hva_range).found_memslot)
		kvm_arch_guest_memory_reclaimed(kvm);

and if there are any MMU notifier events that touch HVAs, then kvm_arch_guest_memory_reclaimed()->wbinvd_on_all_cpus() will get called, which causes the performance issues for SEV and SNP that Ashish had brought up. Technically that would only need to happen if there are GPAs in that memslot that aren't currently backed by gmem pages (and then gmem could handle its own wbinvd_on_all_cpus() (or maybe clflush per-page)).

Actually, even if there are shared pages in the GPA range, the kvm_arch_guest_memory_reclaimed()->wbinvd_on_all_cpus() can be skipped for guests that only use gmem pages for private memory. Is that acceptable?

Just trying to figure out where this only_private/only_shared handling ties into that (or if it's a separate thing entirely).

-Mike
On Mon, Sep 18, 2023, Michael Roth wrote:
> On Wed, Sep 13, 2023 at 06:55:08PM -0700, Sean Christopherson wrote:
> > [...]
>
> Who is supposed to read only_private/only_shared? Is it supposed to be
> plumbed onto arch code and handled specially there?

Yeah, that's the idea. Though I don't know that it's worth using for SNP, the cost of checking the RMP may be higher than just eating the extra faults.

> I ask because I see elsewhere you have:
>
> 	/*
> 	 * If one or more memslots were found and thus zapped, notify arch code
> 	 * that guest memory has been reclaimed. This needs to be done *after*
> 	 * dropping mmu_lock, as x86's reclaim path is slooooow.
> 	 */
> 	if (__kvm_handle_hva_range(kvm, &hva_range).found_memslot)
> 		kvm_arch_guest_memory_reclaimed(kvm);
>
> and if there are any MMU notifier events that touch HVAs, then
> kvm_arch_guest_memory_reclaimed()->wbinvd_on_all_cpus() will get called,
> which causes the performance issues for SEV and SNP that Ashish had brought
> up. Technically that would only need to happen if there are GPAs in that
> memslot that aren't currently backed by gmem pages (and then gmem could
> handle its own wbinvd_on_all_cpus() (or maybe clflush per-page)).
>
> Actually, even if there are shared pages in the GPA range, the
> kvm_arch_guest_memory_reclaimed()->wbinvd_on_all_cpus() can be skipped for
> guests that only use gmem pages for private memory. Is that acceptable?

Yes, that was my original plan. I may have forgotten that exact plan at one point or another and not communicated it well. But the idea is definitely that if a VM type, a.k.a. SNP guests, is required to use gmem for private memory, then there's no need to blast WBINVD because barring a KVM bug, the mmu_notifier event can't have freed private memory, even if it *did* zap SPTEs.

For gmem, if KVM doesn't precisely zap only shared SPTEs for SNP (is that even possible to do race-free?), then KVM needs to blast WBINVD when freeing memory from gmem even if there are no SPTEs. But that seems like a non-issue for a well-behaved setup because the odds of there being *zero* SPTEs should be nil.

> Just trying to figure out where this only_private/only_shared handling ties
> into that (or if it's a separate thing entirely).

It's mostly a TDX thing. I threw it in this series mostly to "formally" document that the mmu_notifier path only affects shared mappings. If the code causes confusion without the TDX context, and won't be used by SNP, we can and should drop it from the initial merge and have it go along with the TDX series.
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d8c6ce6c8211..b5373cee2b08 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -263,6 +263,8 @@ struct kvm_gfn_range {
 	gfn_t start;
 	gfn_t end;
 	union kvm_mmu_notifier_arg arg;
+	bool only_private;
+	bool only_shared;
 	bool may_block;
 };
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 174de2789657..a41f8658dfe0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -635,6 +635,13 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
 		 * the second or later invocation of the handler).
 		 */
 		gfn_range.arg = range->arg;
+
+		/*
+		 * HVA-based notifications aren't relevant to private
+		 * mappings as they don't have a userspace mapping.
+		 */
+		gfn_range.only_private = false;
+		gfn_range.only_shared = true;
 		gfn_range.may_block = range->may_block;
 
 		/*