Message ID | 20221213033030.83345-6-seanjc@google.com |
---|---|
State | New |
Series | KVM: x86/mmu: TDP MMU fixes for 6.2 |
Commit Message
Sean Christopherson
Dec. 13, 2022, 3:30 a.m. UTC
Move the hugepage adjust, tracepoint, and RCU (un)lock logic out of
kvm_tdp_mmu_map() and into its sole caller, kvm_tdp_mmu_page_fault(), to
eliminate the gotos used to bounce through rcu_read_unlock() when bailing
from the walk.
Opportunistically mark kvm_mmu_hugepage_adjust() as static as
kvm_tdp_mmu_map() was the only external user.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 9 ++++++++-
arch/x86/kvm/mmu/mmu_internal.h | 1 -
arch/x86/kvm/mmu/tdp_mmu.c | 22 ++++------------------
3 files changed, 12 insertions(+), 20 deletions(-)
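The refactor in this patch is an instance of hoisting acquire/release out of a walker so that early bails become plain returns instead of gotos to a cleanup label. A minimal, self-contained sketch of the pattern (hypothetical names; a self-checking flag stands in for RCU — this is not the actual KVM code):

```c
#include <assert.h>

/* Toy stand-in for rcu_read_lock()/rcu_read_unlock(): a flag that
 * asserts the lock is never leaked or double-acquired. */
static int lock_held;
static void walk_lock(void)   { assert(!lock_held); lock_held = 1; }
static void walk_unlock(void) { assert(lock_held);  lock_held = 0; }

/* Before: the walker owns the lock, so every early bail must funnel
 * through a cleanup label to avoid leaking it. */
static int walk_with_gotos(const int *levels, int n)
{
	int ret = -1;	/* RET_PF_RETRY stand-in */
	int i;

	walk_lock();
	for (i = 0; i < n; i++) {
		if (levels[i] < 0)
			goto out;	/* frozen entry: bail and retry */
		if (levels[i] == 0) {
			ret = 0;	/* reached the target level */
			goto out;
		}
	}
out:
	walk_unlock();
	return ret;
}

/* After: the caller owns lock/unlock, so the walker can bail with
 * plain returns and needs no labels at all. */
static int walk_plain(const int *levels, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		if (levels[i] < 0)
			return -1;
		if (levels[i] == 0)
			return 0;
	}
	return -1;
}

static int walk_caller(const int *levels, int n)
{
	int r;

	walk_lock();
	r = walk_plain(levels, n);
	walk_unlock();
	return r;
}
```

Both variants behave identically; the difference is that walk_plain() carries no cleanup obligation, so adding a new early exit cannot leak the lock.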
Comments
On Tue, Dec 13, 2022 at 03:30:30AM +0000, Sean Christopherson wrote:
> Move the hugepage adjust, tracepoint, and RCU (un)lock logic out of
> kvm_tdp_mmu_map() and into its sole caller, kvm_tdp_mmu_page_fault(), to
> eliminate the gotos used to bounce through rcu_read_unlock() when bailing
> from the walk.
>
> [...]
>
> @@ -4405,7 +4406,13 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
>  	if (is_page_fault_stale(vcpu, fault))
>  		goto out_unlock;
>
> +	kvm_mmu_hugepage_adjust(vcpu, fault);

Can you also move the call to kvm_mmu_hugepage_adjust() from
direct_map() to direct_page_fault()? I do think it's worth the
maintenance burden to keep those functions consistent.

> +
> +	trace_kvm_mmu_spte_requested(fault);
> +
> +	rcu_read_lock();
>  	r = kvm_tdp_mmu_map(vcpu, fault);
> +	rcu_read_unlock();

I would prefer to keep these in tdp_mmu.c, to reduce the amount of TDP
MMU details that bleed into mmu.c (RCU) and for consistency with other
TDP MMU APIs that don't require the caller to acquire RCU. This will
also be helpful for the Common MMU, as the tracepoint and RCU will be
common. e.g.

static int __kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
	...
}

int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
	int r;

	trace_kvm_mmu_spte_requested(fault);

	rcu_read_lock();
	r = __kvm_tdp_mmu_map(vcpu, fault);
	rcu_read_unlock();

	return r;
}
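The layering suggested here — a thin exported wrapper that brackets a static worker with the lock — can be sketched in isolation (hypothetical names, a self-checking flag in place of RCU; not the actual tdp_mmu.c API):

```c
#include <assert.h>

/* "Module-internal" lock state: callers never see it. */
static int map_lock_held;
static int mapped_count;

static void map_lock(void)   { assert(!map_lock_held); map_lock_held = 1; }
static void map_unlock(void) { assert(map_lock_held);  map_lock_held = 0; }

/* Worker: assumes the lock is held and never touches it, mirroring the
 * proposed static __kvm_tdp_mmu_map(). */
static int __do_map(int pfn)
{
	assert(map_lock_held);	/* document and enforce the locking rule */
	if (pfn < 0)
		return -1;	/* invalid pfn: caller should retry */
	mapped_count++;
	return 0;
}

/* Exported entry point: the trivial wrapper acquires and releases the
 * lock, so the locking discipline stays private to this file. */
int do_map(int pfn)
{
	int r;

	map_lock();
	r = __do_map(pfn);
	map_unlock();
	return r;
}
```

External callers see only do_map(); the locking rule can change without touching any other file, which is the point of keeping the wrapper inside tdp_mmu.c.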
On Tue, Dec 20, 2022, David Matlack wrote:
> Can you also move the call to kvm_mmu_hugepage_adjust() from
> direct_map() to direct_page_fault()? I do think it's worth the
> maintenance burden to keep those functions consistent.

Sure.

> I would prefer to keep these in tdp_mmu.c, to reduce the amount of TDP
> MMU details that bleed into mmu.c (RCU) and for consistency with other
> TDP MMU APIs that don't require the caller to acquire RCU. This will
> also be helpful for the Common MMU, as the tracepoint and RCU will be
> common.

I did that originally, but it felt really silly to have the trivial
wrapper, especially because mmu.c already has TDP MMU details, e.g.
kvm_tdp_mmu_page_fault() takes mmu_lock for read and other flows acquire
rcu_read_lock() to protect the TDP MMU.

What about the below (split into multiple patches) instead?
kvm_tdp_mmu_page_fault() really should live in tdp_mmu.c; the only reason
it's in mmu.c is to get at various helpers, e.g. fast_page_fault() and
kvm_faultin_pfn(). Or is that doomed to fail because the TDP MMU will
want to add code before kvm_faultin_pfn()? (I can't remember what
motivated splitting out kvm_tdp_mmu_page_fault() in the first place.)
---
 arch/x86/kvm/mmu/mmu.c          | 132 ++++++++------------------------
 arch/x86/kvm/mmu/mmu_internal.h |  50 ++++++++++++
 arch/x86/kvm/mmu/spte.h         |   7 --
 arch/x86/kvm/mmu/tdp_mmu.c      |  41 ++++++----
 arch/x86/kvm/mmu/tdp_mmu.h      |   8 +-
 5 files changed, 108 insertions(+), 130 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 254bc46234e0..8203b1dd2753 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1927,16 +1927,6 @@ static bool kvm_mmu_remote_flush_or_zap(struct kvm *kvm,
 	return true;
 }
 
-static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
-{
-	if (sp->role.invalid)
-		return true;
-
-	/* TDP MMU pages do not use the MMU generation. */
-	return !is_tdp_mmu_page(sp) &&
-	       unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
-}
-
 struct mmu_page_path {
 	struct kvm_mmu_page *parent[PT64_ROOT_MAX_LEVEL];
 	unsigned int idx[PT64_ROOT_MAX_LEVEL];
@@ -3148,9 +3138,6 @@ static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	int ret;
 	gfn_t base_gfn = fault->gfn;
 
-	kvm_mmu_hugepage_adjust(vcpu, fault);
-
-	trace_kvm_mmu_spte_requested(fault);
 	for_each_shadow_entry(vcpu, fault->addr, it) {
 		/*
 		 * We cannot overwrite existing page tables with an NX
@@ -4270,54 +4257,10 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 	return RET_PF_CONTINUE;
 }
 
-/*
- * Returns true if the page fault is stale and needs to be retried, i.e. if the
- * root was invalidated by a memslot update or a relevant mmu_notifier fired.
- */
-static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
-				struct kvm_page_fault *fault)
+static int __direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
-	struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.mmu->root.hpa);
+	int r = RET_PF_RETRY;
 
-	/* Special roots, e.g. pae_root, are not backed by shadow pages. */
-	if (sp && is_obsolete_sp(vcpu->kvm, sp))
-		return true;
-
-	/*
-	 * Roots without an associated shadow page are considered invalid if
-	 * there is a pending request to free obsolete roots.  The request is
-	 * only a hint that the current root _may_ be obsolete and needs to be
-	 * reloaded, e.g. if the guest frees a PGD that KVM is tracking as a
-	 * previous root, then __kvm_mmu_prepare_zap_page() signals all vCPUs
-	 * to reload even if no vCPU is actively using the root.
-	 */
-	if (!sp && kvm_test_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu))
-		return true;
-
-	return fault->slot &&
-	       mmu_invalidate_retry_hva(vcpu->kvm, fault->mmu_seq, fault->hva);
-}
-
-static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
-{
-	int r;
-
-	if (page_fault_handle_page_track(vcpu, fault))
-		return RET_PF_EMULATE;
-
-	r = fast_page_fault(vcpu, fault);
-	if (r != RET_PF_INVALID)
-		return r;
-
-	r = mmu_topup_memory_caches(vcpu, false);
-	if (r)
-		return r;
-
-	r = kvm_faultin_pfn(vcpu, fault, ACC_ALL);
-	if (r != RET_PF_CONTINUE)
-		return r;
-
-	r = RET_PF_RETRY;
 	write_lock(&vcpu->kvm->mmu_lock);
 
 	if (is_page_fault_stale(vcpu, fault))
@@ -4327,6 +4270,10 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	if (r)
 		goto out_unlock;
 
+	kvm_mmu_hugepage_adjust(vcpu, fault);
+
+	trace_kvm_mmu_spte_requested(fault);
+
 	r = direct_map(vcpu, fault);
 
 out_unlock:
@@ -4335,6 +4282,32 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	return r;
 }
 
+static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
+{
+	int r;
+
+	if (page_fault_handle_page_track(vcpu, fault))
+		return RET_PF_EMULATE;
+
+	r = fast_page_fault(vcpu, fault);
+	if (r != RET_PF_INVALID)
+		return r;
+
+	r = mmu_topup_memory_caches(vcpu, false);
+	if (r)
+		return r;
+
+	r = kvm_faultin_pfn(vcpu, fault, ACC_ALL);
+	if (r != RET_PF_CONTINUE)
+		return r;
+
+#ifdef CONFIG_X86_64
+	if (tdp_mmu_enabled)
+		return kvm_tdp_mmu_page_fault(vcpu, fault);
+#endif
+	return __direct_page_fault(vcpu, fault);
+}
+
 static int nonpaging_page_fault(struct kvm_vcpu *vcpu,
 				struct kvm_page_fault *fault)
 {
@@ -4378,42 +4351,6 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 }
 EXPORT_SYMBOL_GPL(kvm_handle_page_fault);
 
-#ifdef CONFIG_X86_64
-static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
-				  struct kvm_page_fault *fault)
-{
-	int r;
-
-	if (page_fault_handle_page_track(vcpu, fault))
-		return RET_PF_EMULATE;
-
-	r = fast_page_fault(vcpu, fault);
-	if (r != RET_PF_INVALID)
-		return r;
-
-	r = mmu_topup_memory_caches(vcpu, false);
-	if (r)
-		return r;
-
-	r = kvm_faultin_pfn(vcpu, fault, ACC_ALL);
-	if (r != RET_PF_CONTINUE)
-		return r;
-
-	r = RET_PF_RETRY;
-	read_lock(&vcpu->kvm->mmu_lock);
-
-	if (is_page_fault_stale(vcpu, fault))
-		goto out_unlock;
-
-	r = kvm_tdp_mmu_map(vcpu, fault);
-
-out_unlock:
-	read_unlock(&vcpu->kvm->mmu_lock);
-	kvm_release_pfn_clean(fault->pfn);
-	return r;
-}
-#endif
-
 int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	/*
@@ -4438,11 +4375,6 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		}
 	}
 
-#ifdef CONFIG_X86_64
-	if (tdp_mmu_enabled)
-		return kvm_tdp_mmu_page_fault(vcpu, fault);
-#endif
-
 	return direct_page_fault(vcpu, fault);
 }
 
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index ac00bfbf32f6..2c7c2b49f719 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -133,6 +133,28 @@ struct kvm_mmu_page {
 
 extern struct kmem_cache *mmu_page_header_cache;
 
+static inline struct kvm_mmu_page *to_shadow_page(hpa_t shadow_page)
+{
+	struct page *page = pfn_to_page((shadow_page) >> PAGE_SHIFT);
+
+	return (struct kvm_mmu_page *)page_private(page);
+}
+
+static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp)
+{
+	return IS_ENABLED(CONFIG_X86_64) && sp->tdp_mmu_page;
+}
+
+static inline bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	if (sp->role.invalid)
+		return true;
+
+	/* TDP MMU pages do not use the MMU generation. */
+	return !is_tdp_mmu_page(sp) &&
+	       unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
+}
+
 static inline int kvm_mmu_role_as_id(union kvm_mmu_page_role role)
 {
 	return role.smm ? 1 : 0;
@@ -314,6 +336,34 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	return r;
 }
 
+/*
+ * Returns true if the page fault is stale and needs to be retried, i.e. if the
+ * root was invalidated by a memslot update or a relevant mmu_notifier fired.
+ */
+static inline bool is_page_fault_stale(struct kvm_vcpu *vcpu,
+				       struct kvm_page_fault *fault)
+{
+	struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.mmu->root.hpa);
+
+	/* Special roots, e.g. pae_root, are not backed by shadow pages. */
+	if (sp && is_obsolete_sp(vcpu->kvm, sp))
+		return true;
+
+	/*
+	 * Roots without an associated shadow page are considered invalid if
+	 * there is a pending request to free obsolete roots.  The request is
+	 * only a hint that the current root _may_ be obsolete and needs to be
+	 * reloaded, e.g. if the guest frees a PGD that KVM is tracking as a
+	 * previous root, then __kvm_mmu_prepare_zap_page() signals all vCPUs
+	 * to reload even if no vCPU is actively using the root.
+	 */
+	if (!sp && kvm_test_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu))
+		return true;
+
+	return fault->slot &&
+	       mmu_invalidate_retry_hva(vcpu->kvm, fault->mmu_seq, fault->hva);
+}
+
 int kvm_mmu_max_mapping_level(struct kvm *kvm,
 			      const struct kvm_memory_slot *slot, gfn_t gfn,
 			      int max_level);
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 1f03701b943a..23e8f8c152b5 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -219,13 +219,6 @@ static inline int spte_index(u64 *sptep)
  */
 extern u64 __read_mostly shadow_nonpresent_or_rsvd_lower_gfn_mask;
 
-static inline struct kvm_mmu_page *to_shadow_page(hpa_t shadow_page)
-{
-	struct page *page = pfn_to_page((shadow_page) >> PAGE_SHIFT);
-
-	return (struct kvm_mmu_page *)page_private(page);
-}
-
 static inline struct kvm_mmu_page *spte_to_child_sp(u64 spte)
 {
 	return to_shadow_page(spte & SPTE_BASE_ADDR_MASK);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index cc1fb9a65620..4bb58c0f465b 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1144,19 +1144,12 @@ static int tdp_mmu_split_huge_page(struct kvm *kvm, struct tdp_iter *iter,
  * Handle a TDP page fault (NPT/EPT violation/misconfiguration) by installing
  * page tables and SPTEs to translate the faulting guest physical address.
  */
-int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
+static int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
 	struct kvm *kvm = vcpu->kvm;
 	struct tdp_iter iter;
 	struct kvm_mmu_page *sp;
-	int ret = RET_PF_RETRY;
-
-	kvm_mmu_hugepage_adjust(vcpu, fault);
-
-	trace_kvm_mmu_spte_requested(fault);
-
-	rcu_read_lock();
 
 	tdp_mmu_for_each_pte(iter, mmu, fault->gfn, fault->gfn + 1) {
 		int r;
@@ -1169,10 +1162,10 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		 * retry, avoiding unnecessary page table allocation and free.
 		 */
 		if (is_removed_spte(iter.old_spte))
-			goto retry;
+			return RET_PF_RETRY;
 
 		if (iter.level == fault->goal_level)
-			goto map_target_level;
+			return tdp_mmu_map_handle_target_level(vcpu, fault, &iter);
 
 		/* Step down into the lower level page table if it exists. */
 		if (is_shadow_present_pte(iter.old_spte) &&
@@ -1199,7 +1192,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		 */
 		if (r) {
 			tdp_mmu_free_sp(sp);
-			goto retry;
+			return RET_PF_RETRY;
 		}
 
 		if (fault->huge_page_disallowed &&
@@ -1216,14 +1209,30 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	 * iterator detected an upper level SPTE was frozen during traversal.
 	 */
 	WARN_ON_ONCE(iter.level == fault->goal_level);
-	goto retry;
+	return RET_PF_RETRY;
+}
 
-map_target_level:
-	ret = tdp_mmu_map_handle_target_level(vcpu, fault, &iter);
+int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
+{
+	int r = RET_PF_RETRY;
 
-retry:
+	read_lock(&vcpu->kvm->mmu_lock);
+
+	if (is_page_fault_stale(vcpu, fault))
+		goto out_unlock;
+
+	kvm_mmu_hugepage_adjust(vcpu, fault);
+
+	trace_kvm_mmu_spte_requested(fault);
+
+	rcu_read_lock();
+	r = kvm_tdp_mmu_map(vcpu, fault);
 	rcu_read_unlock();
-	return ret;
+
+out_unlock:
+	read_unlock(&vcpu->kvm->mmu_lock);
+	kvm_release_pfn_clean(fault->pfn);
+	return r;
 }
 
 bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range,
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 0a63b1afabd3..697dca948d0a 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -27,7 +27,7 @@ void kvm_tdp_mmu_zap_all(struct kvm *kvm);
 void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm);
 void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm);
 
-int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
+int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
 
 bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range,
 				 bool flush);
@@ -70,10 +70,4 @@ int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
 u64 *kvm_tdp_mmu_fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, u64 addr,
 					u64 *spte);
 
-#ifdef CONFIG_X86_64
-static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return sp->tdp_mmu_page; }
-#else
-static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return false; }
-#endif
-
 #endif /* __KVM_X86_MMU_TDP_MMU_H */

base-commit: ffc4e70d9855921b740aee9bcbc8503cc753aee2
--
On Wed, Dec 21, 2022 at 06:32:05PM +0000, Sean Christopherson wrote:
> I did that originally, but it felt really silly to have the trivial
> wrapper, especially because mmu.c already has TDP MMU details, e.g.
> kvm_tdp_mmu_page_fault() takes mmu_lock for read and other flows acquire
> rcu_read_lock() to protect the TDP MMU.

A trivial wrapper is useful in this case. While mmu.c does already have
some TDP MMU RCU details, I'd like to decrease that, not increase it.

> What about the below (split into multiple patches) instead?
> kvm_tdp_mmu_page_fault() really should live in tdp_mmu.c; the only
> reason it's in mmu.c is to get at various helpers, e.g.
> fast_page_fault() and kvm_faultin_pfn().

Maybe, I'm not sure. The page fault handling routines have more to do
with mmu.c than tdp_mmu.c, i.e. it's more about integrating with the rest
of KVM/x86 (page fault tracking, MMU notifiers, etc.). We only go into
tdp_mmu.c to manipulate page tables.

> Or is that doomed to fail because the TDP MMU will want to add code
> before kvm_faultin_pfn()? (I can't remember what motivated splitting
> out kvm_tdp_mmu_page_fault() in the first place.)

To improve readability (fewer conditionals: if (tdp_mmu_enabled)) and to
prepare for more divergence. Your proposal (below) to split out the
"lower half" of the page fault handling routine works now because that's
where all the divergence is. But with the common MMU there's also going
to be divergence in the fast page fault handler, so I prefer to just keep
the routines separate to avoid thrashing down the road.

> [proposed diff snipped]
if the guest frees a PGD that KVM is tracking as a > + * previous root, then __kvm_mmu_prepare_zap_page() signals all vCPUs > + * to reload even if no vCPU is actively using the root. > + */ > + if (!sp && kvm_test_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu)) > + return true; > + > + return fault->slot && > + mmu_invalidate_retry_hva(vcpu->kvm, fault->mmu_seq, fault->hva); > +} is_page_fault_stale() is overkill for the TDP MMU and is KVM/x86-specific. If we do go with your way of splitting things, I'd prefer to have the TDP MMU not call is_page_fault_stale() so that more code can be shared across architectures when the TDP MMU is common. diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 99c40617d325..68db6805072a 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3085,8 +3085,7 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm, return min(host_level, max_level); } -static void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, - struct kvm_page_fault *fault) +void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) { struct kvm_memory_slot *slot = fault->slot; kvm_pfn_t mask; @@ -4295,8 +4294,28 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu, if (!sp && kvm_test_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu)) return true; - return fault->slot && - mmu_invalidate_retry_hva(vcpu->kvm, fault->mmu_seq, fault->hva); + return mmu_invalidate_retry_fault(vcpu, fault); +} + +static int __direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) +{ + int r = RET_PF_RETRY; + + write_lock(&vcpu->kvm->mmu_lock); + + if (is_page_fault_stale(vcpu, fault)) + goto out_unlock; + + r = make_mmu_pages_available(vcpu); + if (r) + goto out_unlock; + + r = direct_map(vcpu, fault); + +out_unlock: + write_unlock(&vcpu->kvm->mmu_lock); + kvm_release_pfn_clean(fault->pfn); + return r; } static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) @@ -4318,22 +4337,13 @@ static int 
direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault if (r != RET_PF_CONTINUE) return r; - r = RET_PF_RETRY; - write_lock(&vcpu->kvm->mmu_lock); - - if (is_page_fault_stale(vcpu, fault)) - goto out_unlock; - - r = make_mmu_pages_available(vcpu); - if (r) - goto out_unlock; +#ifdef CONFIG_X86_64 + if (tdp_mmu_enabled) + return kvm_tdp_mmu_page_fault(vcpu, fault); +#endif - r = direct_map(vcpu, fault); + return __direct_page_fault(vcpu, fault); -out_unlock: - write_unlock(&vcpu->kvm->mmu_lock); - kvm_release_pfn_clean(fault->pfn); - return r; } static int nonpaging_page_fault(struct kvm_vcpu *vcpu, @@ -4379,48 +4389,6 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code, } EXPORT_SYMBOL_GPL(kvm_handle_page_fault); -#ifdef CONFIG_X86_64 -static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu, - struct kvm_page_fault *fault) -{ - int r; - - if (page_fault_handle_page_track(vcpu, fault)) - return RET_PF_EMULATE; - - r = fast_page_fault(vcpu, fault); - if (r != RET_PF_INVALID) - return r; - - r = mmu_topup_memory_caches(vcpu, false); - if (r) - return r; - - r = kvm_faultin_pfn(vcpu, fault, ACC_ALL); - if (r != RET_PF_CONTINUE) - return r; - - r = RET_PF_RETRY; - read_lock(&vcpu->kvm->mmu_lock); - - if (is_page_fault_stale(vcpu, fault)) - goto out_unlock; - - kvm_mmu_hugepage_adjust(vcpu, fault); - - trace_kvm_mmu_spte_requested(fault); - - rcu_read_lock(); - r = kvm_tdp_mmu_map(vcpu, fault); - rcu_read_unlock(); - -out_unlock: - read_unlock(&vcpu->kvm->mmu_lock); - kvm_release_pfn_clean(fault->pfn); - return r; -} -#endif - int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) { /* @@ -4445,11 +4413,6 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) } } -#ifdef CONFIG_X86_64 - if (tdp_mmu_enabled) - return kvm_tdp_mmu_page_fault(vcpu, fault); -#endif - return direct_page_fault(vcpu, fault); } diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index 
66c294d67641..776b0ad4e58a 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -323,5 +323,13 @@ void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc); void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp); void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp); +void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault); + +static inline bool mmu_invalidate_retry_fault(struct kvm_vcpu *vcpu, + struct kvm_page_fault *fault) +{ + return fault->slot && + mmu_invalidate_retry_hva(vcpu->kvm, fault->mmu_seq, fault->hva); +} #endif /* __KVM_X86_MMU_INTERNAL_H */ diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 78f47eb74544..d4736cb91c9f 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -1144,7 +1144,7 @@ static int tdp_mmu_split_huge_page(struct kvm *kvm, struct tdp_iter *iter, * Handle a TDP page fault (NPT/EPT violation/misconfiguration) by installing * page tables and SPTEs to translate the faulting guest physical address. 
*/ -int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) +static int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) { struct kvm_mmu *mmu = vcpu->arch.mmu; struct kvm *kvm = vcpu->kvm; @@ -1212,6 +1212,33 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) return RET_PF_RETRY; } +int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) +{ + struct kvm_mmu_page *root = to_shadow_page(vcpu->arch.mmu->root.hpa); + int r = RET_PF_RETRY; + + read_lock(&vcpu->kvm->mmu_lock); + + if (root->role.invalid) + goto out; + + if (mmu_invalidate_retry_fault(vcpu, fault)) + goto out; + + kvm_mmu_hugepage_adjust(vcpu, fault); + + trace_kvm_mmu_spte_requested(fault); + + rcu_read_lock(); + r = kvm_tdp_mmu_map(vcpu, fault); + rcu_read_unlock(); + +out: + read_unlock(&vcpu->kvm->mmu_lock); + kvm_release_pfn_clean(fault->pfn); + return r; +} + bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range, bool flush) { diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index 0a63b1afabd3..849e5886e73b 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -27,7 +27,7 @@ void kvm_tdp_mmu_zap_all(struct kvm *kvm); void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm); void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm); -int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault); +int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault); bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range, bool flush);
On 12/29/22 20:51, David Matlack wrote:
> Your proposal (below) to split out the "lower half" of the page fault
> handling routine works now because that's where all the divergence is.
> But with the common MMU there's also going to be divergence in the fast
> page fault handler. So I prefer to just keep the routines separate to
> avoid thrashing down the road.

Can you put the changes at the beginning of the common MMU series?
Large parts of the whole common MMU refactoring can be merged piece by
piece, so they can be taken as soon as they're ready.

Paolo
On Thu, Dec 29, 2022 at 1:06 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 12/29/22 20:51, David Matlack wrote:
> > Your proposal (below) to split out the "lower half" of the page fault
> > handling routine works now because that's where all the divergence is.
> > But with the common MMU there's also going to be divergence in the fast
> > page fault handler. So I prefer to just keep the routines separate to
> > avoid thrashing down the road.
>
> Can you put the changes at the beginning of the common MMU series?

Can do. By "the changes" I assume you mean the yet-to-be-written
changes to split out a fast_page_fault() handler for the TDP MMU? Or
do you mean Sean's changes (this series)?

> Large parts of the whole common MMU refactoring can be merged piece by
> piece, so they can be taken as soon as they're ready.

Ack.
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 254bc46234e0..99c40617d325 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3085,7 +3085,8 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
 	return min(host_level, max_level);
 }
 
-void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
+static void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu,
+				    struct kvm_page_fault *fault)
 {
 	struct kvm_memory_slot *slot = fault->slot;
 	kvm_pfn_t mask;
@@ -4405,7 +4406,13 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
 	if (is_page_fault_stale(vcpu, fault))
 		goto out_unlock;
 
+	kvm_mmu_hugepage_adjust(vcpu, fault);
+
+	trace_kvm_mmu_spte_requested(fault);
+
+	rcu_read_lock();
 	r = kvm_tdp_mmu_map(vcpu, fault);
+	rcu_read_unlock();
 
 out_unlock:
 	read_unlock(&vcpu->kvm->mmu_lock);
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index ac00bfbf32f6..66c294d67641 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -317,7 +317,6 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 int kvm_mmu_max_mapping_level(struct kvm *kvm,
 			      const struct kvm_memory_slot *slot, gfn_t gfn,
 			      int max_level);
-void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
 void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_level);
 
 void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index cc1fb9a65620..78f47eb74544 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1150,13 +1150,6 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	struct kvm *kvm = vcpu->kvm;
 	struct tdp_iter iter;
 	struct kvm_mmu_page *sp;
-	int ret = RET_PF_RETRY;
-
-	kvm_mmu_hugepage_adjust(vcpu, fault);
-
-	trace_kvm_mmu_spte_requested(fault);
-
-	rcu_read_lock();
 
 	tdp_mmu_for_each_pte(iter, mmu, fault->gfn, fault->gfn + 1) {
 		int r;
@@ -1169,10 +1162,10 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		 * retry, avoiding unnecessary page table allocation and free.
 		 */
 		if (is_removed_spte(iter.old_spte))
-			goto retry;
+			return RET_PF_RETRY;
 
 		if (iter.level == fault->goal_level)
-			goto map_target_level;
+			return tdp_mmu_map_handle_target_level(vcpu, fault, &iter);
 
 		/* Step down into the lower level page table if it exists. */
 		if (is_shadow_present_pte(iter.old_spte) &&
@@ -1199,7 +1192,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		 */
 		if (r) {
 			tdp_mmu_free_sp(sp);
-			goto retry;
+			return RET_PF_RETRY;
 		}
 
 		if (fault->huge_page_disallowed &&
@@ -1216,14 +1209,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	 * iterator detected an upper level SPTE was frozen during traversal.
 	 */
 	WARN_ON_ONCE(iter.level == fault->goal_level);
-	goto retry;
-
-map_target_level:
-	ret = tdp_mmu_map_handle_target_level(vcpu, fault, &iter);
-
-retry:
-	rcu_read_unlock();
-	return ret;
+	return RET_PF_RETRY;
 }
 
 bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range,