Message ID | 20230526234435.662652-10-yuzhao@google.com |
---|---|
State | New |
Subject | [PATCH mm-unstable v2 09/10] kvm/x86: add kvm_arch_test_clear_young() |
Series | mm/kvm: locklessly clear the accessed bit |
Commit Message
Yu Zhao
May 26, 2023, 11:44 p.m. UTC
Implement kvm_arch_test_clear_young() to support the fast path in
mmu_notifier_ops->test_clear_young().
It focuses on the simple case, i.e., the TDP MMU sets the accessed bit
in KVM PTEs and VMs are not nested, where it can rely on RCU and
clear_bit() to clear the accessed bit safely without taking
kvm->mmu_lock. Complex cases fall back to the existing slow path,
where kvm->mmu_lock is taken.
Signed-off-by: Yu Zhao <yuzhao@google.com>
---
arch/x86/include/asm/kvm_host.h | 7 +++++++
arch/x86/kvm/mmu/tdp_mmu.c | 34 +++++++++++++++++++++++++++++++++
2 files changed, 41 insertions(+)
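
Taken together, the fast/slow split described in the commit message has
roughly the shape sketched below. Only kvm_arch_has_test_clear_young() and
kvm_arch_test_clear_young() come from this patch; the wrapper and the
slow-path helper are hypothetical names invented for illustration, and the
return convention (true means "fall back") follows the implementation in
the diff at the bottom of this page.

	/* Illustrative sketch only; slow_path_test_clear_young() is hypothetical. */
	static void test_clear_young(struct kvm *kvm, struct kvm_gfn_range *range)
	{
		/* Fast path: lockless walk under RCU, clear_bit() on the SPTEs. */
		if (kvm_arch_has_test_clear_young() &&
		    !kvm_arch_test_clear_young(kvm, range))
			return;

		/* Complex cases (e.g. shadow roots, nested VMs) take kvm->mmu_lock. */
		slow_path_test_clear_young(kvm, range);
	}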
Comments
On 5/27/23 01:44, Yu Zhao wrote:
> +#define kvm_arch_has_test_clear_young kvm_arch_has_test_clear_young
> +static inline bool kvm_arch_has_test_clear_young(void)
> +{
> +	return IS_ENABLED(CONFIG_X86_64) &&
> +	       (!IS_REACHABLE(CONFIG_KVM) || (tdp_mmu_enabled && shadow_accessed_mask));
> +}

I don't think you need IS_REACHABLE(CONFIG_KVM) here; it would be a bug
if this were called from outside KVM code. Maybe make it a BUILD_BUG_ON?

Paolo
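
A minimal sketch of what Paolo's BUILD_BUG_ON suggestion might look like —
an assumption about his intent, not code from a posted revision. A
BUILD_BUG_ON() in a static inline only trips in translation units that
actually call the function, which is exactly the property the suggestion
relies on:

	#define kvm_arch_has_test_clear_young kvm_arch_has_test_clear_young
	static inline bool kvm_arch_has_test_clear_young(void)
	{
		/* Hypothetical: calling this from outside KVM would be a bug. */
		BUILD_BUG_ON(!IS_REACHABLE(CONFIG_KVM));

		return IS_ENABLED(CONFIG_X86_64) &&
		       tdp_mmu_enabled && shadow_accessed_mask;
	}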
On Fri, May 26, 2023, Yu Zhao wrote:
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 08340219c35a..6875a819e007 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -1232,6 +1232,40 @@ bool kvm_tdp_mmu_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
>  	return kvm_tdp_mmu_handle_gfn(kvm, range, test_age_gfn);
>  }
>  
> +bool kvm_arch_test_clear_young(struct kvm *kvm, struct kvm_gfn_range *range)
> +{
> +	struct kvm_mmu_page *root;
> +	int offset = ffs(shadow_accessed_mask) - 1;
> +
> +	if (kvm_shadow_root_allocated(kvm))

This needs a comment.

> +		return true;
> +
> +	rcu_read_lock();
> +
> +	list_for_each_entry_rcu(root, &kvm->arch.tdp_mmu_roots, link) {

As requested in v1[1], please add a macro for a lockless walk.

[1] https://lkml.kernel.org/r/Y%2Fed0XYAPx%2B7pukA%40google.com

> +		struct tdp_iter iter;
> +
> +		if (kvm_mmu_page_as_id(root) != range->slot->as_id)
> +			continue;
> +
> +		tdp_root_for_each_leaf_pte(iter, root, range->start, range->end) {
> +			u64 *sptep = rcu_dereference(iter.sptep);
> +
> +			VM_WARN_ON_ONCE(!page_count(virt_to_page(sptep)));

Hrm, I don't like adding this in KVM. The primary MMU might guarantee that
this callback is invoked if and only if the SPTE is backed by struct page
memory, but there's no reason to assume that's true in KVM.

If we want the sanity check, then this needs to use
kvm_pfn_to_refcounted_page().

And it should use KVM's MMU_WARN_ON(), which is a mess and effectively dead
code, but I'm working on changing that[2], i.e. by the time this gets to
Linus' tree, the sanity check should have a much cleaner implementation.

[2] https://lore.kernel.org/all/20230511235917.639770-8-seanjc@google.com

> +
> +			if (!(iter.old_spte & shadow_accessed_mask))
> +				continue;
> +
> +			if (kvm_should_clear_young(range, iter.gfn))
> +				clear_bit(offset, (unsigned long *)sptep);

If/when you rebase on https://github.com/kvm-x86/linux/tree/next, can you
pull out the atomic bits of tdp_mmu_clear_spte_bits() and use that new
helper? E.g.

diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index fae559559a80..914c34518829 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -58,15 +58,18 @@ static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte,
 	return old_spte;
 }
 
+static inline u64 tdp_mmu_clear_spte_bits_atomic(tdp_ptep_t sptep, u64 mask)
+{
+	atomic64_t *sptep_atomic = (atomic64_t *)rcu_dereference(sptep);
+
+	return (u64)atomic64_fetch_and(~mask, sptep_atomic);
+}
+
 static inline u64 tdp_mmu_clear_spte_bits(tdp_ptep_t sptep, u64 old_spte,
 					  u64 mask, int level)
 {
-	atomic64_t *sptep_atomic;
-
-	if (kvm_tdp_mmu_spte_need_atomic_write(old_spte, level)) {
-		sptep_atomic = (atomic64_t *)rcu_dereference(sptep);
-		return (u64)atomic64_fetch_and(~mask, sptep_atomic);
-	}
+	if (kvm_tdp_mmu_spte_need_atomic_write(old_spte, level))
+		return tdp_mmu_clear_spte_bits_atomic(sptep, mask);
 
 	__kvm_tdp_mmu_write_spte(sptep, old_spte & ~mask);
 
 	return old_spte;
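
If the patch were rebased on that tree, the clearing site in
kvm_arch_test_clear_young() could plausibly collapse to something like the
sketch below — an assumption about how Sean's suggested helper would be
applied, not a posted revision. It also drops the local sptep/offset
variables and the VM_WARN_ON_ONCE() he objected to:

	tdp_root_for_each_leaf_pte(iter, root, range->start, range->end) {
		if (!(iter.old_spte & shadow_accessed_mask))
			continue;

		/* Atomic fetch-and replaces the open-coded clear_bit(). */
		if (kvm_should_clear_young(range, iter.gfn))
			tdp_mmu_clear_spte_bits_atomic(iter.sptep, shadow_accessed_mask);
	}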
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 753c67072c47..d6dfdebe3d94 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2223,4 +2223,11 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
  */
 #define KVM_EXIT_HYPERCALL_MBZ		GENMASK_ULL(31, 1)
 
+#define kvm_arch_has_test_clear_young kvm_arch_has_test_clear_young
+static inline bool kvm_arch_has_test_clear_young(void)
+{
+	return IS_ENABLED(CONFIG_X86_64) &&
+	       (!IS_REACHABLE(CONFIG_KVM) || (tdp_mmu_enabled && shadow_accessed_mask));
+}
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 08340219c35a..6875a819e007 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1232,6 +1232,40 @@ bool kvm_tdp_mmu_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 	return kvm_tdp_mmu_handle_gfn(kvm, range, test_age_gfn);
 }
 
+bool kvm_arch_test_clear_young(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	struct kvm_mmu_page *root;
+	int offset = ffs(shadow_accessed_mask) - 1;
+
+	if (kvm_shadow_root_allocated(kvm))
+		return true;
+
+	rcu_read_lock();
+
+	list_for_each_entry_rcu(root, &kvm->arch.tdp_mmu_roots, link) {
+		struct tdp_iter iter;
+
+		if (kvm_mmu_page_as_id(root) != range->slot->as_id)
+			continue;
+
+		tdp_root_for_each_leaf_pte(iter, root, range->start, range->end) {
+			u64 *sptep = rcu_dereference(iter.sptep);
+
+			VM_WARN_ON_ONCE(!page_count(virt_to_page(sptep)));
+
+			if (!(iter.old_spte & shadow_accessed_mask))
+				continue;
+
+			if (kvm_should_clear_young(range, iter.gfn))
+				clear_bit(offset, (unsigned long *)sptep);
+		}
+	}
+
+	rcu_read_unlock();
+
+	return false;
+}
+
 static bool set_spte_gfn(struct kvm *kvm, struct tdp_iter *iter,
 			 struct kvm_gfn_range *range)
 {
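
As a footnote on the bit arithmetic in the patch: ffs(shadow_accessed_mask) - 1
turns a single-bit mask into the bit index that clear_bit() expects. A
standalone userspace illustration — the mask value is an assumption (EPT
keeps the accessed bit at bit 8), and __builtin_ffsll() plus a plain store
stand in for the kernel's ffs() and atomic clear_bit():

	#include <stdio.h>

	int main(void)
	{
		unsigned long long accessed_mask = 1ULL << 8;	/* assumed: EPT accessed bit */
		unsigned long long spte = (0xabcdULL << 12) | accessed_mask;
		int offset = __builtin_ffsll(accessed_mask) - 1;	/* bit index: 8 */

		spte &= ~(1ULL << offset);	/* what clear_bit() does, minus the atomicity */
		printf("offset=%d, accessed bit now %llu\n",
		       offset, (spte >> offset) & 1);
		return 0;
	}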