From patchwork Thu Feb  2 18:27:49 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ben Gardon <bgardon@google.com>
X-Patchwork-Id: 52113
Date: Thu,  2 Feb 2023 18:27:49 +0000
In-Reply-To: <20230202182809.1929122-1-bgardon@google.com>
Mime-Version: 1.0
References: <20230202182809.1929122-1-bgardon@google.com>
X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog
Message-ID: <20230202182809.1929122-2-bgardon@google.com>
Subject: [PATCH 01/21] KVM: x86/mmu: Rename slot rmap walkers to add clarity
 and clean up code
From: Ben Gardon <bgardon@google.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
        Sean Christopherson <seanjc@google.com>,
        David Matlack <dmatlack@google.com>,
        Vipin Sharma <vipinsh@google.com>,
        Ricardo Koller <ricarkol@google.com>,
        Ben Gardon <bgardon@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Sean Christopherson <seanjc@google.com>

Replace "slot_handle_level" with "walk_slot_rmaps" to better capture what
the helpers are doing, and to slightly shorten the function names so that
each function's return type and attributes can be placed on the same line
as the function declaration.

No functional change intended.
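
For readers unfamiliar with the helpers, the layering behind the new names
can be sketched in user space roughly as follows. This is only an
illustration under simplifying assumptions: every toy_* name and the
one-int-per-gfn "rmap" layout are hypothetical, the level range and the
yield/flush_on_yield logic are omitted, and the inner helper is called
toy_walk_slot_rmaps_range() because a leading double underscore is reserved
in ordinary C programs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct kvm_memory_slot. */
struct toy_slot {
	unsigned long base_gfn;
	unsigned long npages;
	int *rmaps;		/* one toy "rmap" entry per gfn in the slot */
};

/* Plays the role of slot_rmaps_handler: returns true if a flush is needed. */
typedef bool (*toy_rmaps_handler)(int *rmap, const struct toy_slot *slot);

/* Core walker: visits [start_gfn, end_gfn] inclusive and ORs the results. */
static bool toy_walk_slot_rmaps_range(const struct toy_slot *slot,
				      toy_rmaps_handler fn,
				      unsigned long start_gfn,
				      unsigned long end_gfn, bool flush)
{
	unsigned long gfn;

	for (gfn = start_gfn; gfn <= end_gfn; gfn++)
		flush |= fn(&slot->rmaps[gfn - slot->base_gfn], slot);

	return flush;
}

/* Convenience wrapper: walk the whole slot, starting with flush == false. */
static bool toy_walk_slot_rmaps(const struct toy_slot *slot,
				toy_rmaps_handler fn)
{
	return toy_walk_slot_rmaps_range(slot, fn, slot->base_gfn,
					 slot->base_gfn + slot->npages - 1,
					 false);
}

/* Example handler: clear non-zero entries and request a flush for them. */
static bool toy_zap_rmap(int *rmap, const struct toy_slot *slot)
{
	(void)slot;
	if (!*rmap)
		return false;
	*rmap = 0;
	return true;
}
```

As in the real helpers, the wrapper only supplies the slot's full gfn range
and an initial flush value; callers that need a sub-range or want to carry
a pending flush call the inner walker directly.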

Link: https://lore.kernel.org/mm-commits/CAHk-=wjS-Jg7sGMwUPpDsjv392nDOOs0CtUtVkp=S6Q7JzFJRw@mail.gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 66 +++++++++++++++++++++---------------------
 1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index aeb240b339f54..09a0a2cc76bae 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5801,23 +5801,24 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
 EXPORT_SYMBOL_GPL(kvm_configure_mmu);
 
 /* The return value indicates if tlb flush on all vcpus is needed. */
-typedef bool (*slot_level_handler) (struct kvm *kvm,
+typedef bool (*slot_rmaps_handler) (struct kvm *kvm,
 				    struct kvm_rmap_head *rmap_head,
 				    const struct kvm_memory_slot *slot);
 
 /* The caller should hold mmu-lock before calling this function. */
-static __always_inline bool
-slot_handle_level_range(struct kvm *kvm, const struct kvm_memory_slot *memslot,
-			slot_level_handler fn, int start_level, int end_level,
-			gfn_t start_gfn, gfn_t end_gfn, bool flush_on_yield,
-			bool flush)
+static __always_inline bool __walk_slot_rmaps(struct kvm *kvm,
+					      const struct kvm_memory_slot *slot,
+					      slot_rmaps_handler fn,
+					      int start_level, int end_level,
+					      gfn_t start_gfn, gfn_t end_gfn,
+					      bool flush_on_yield, bool flush)
 {
 	struct slot_rmap_walk_iterator iterator;
 
-	for_each_slot_rmap_range(memslot, start_level, end_level, start_gfn,
+	for_each_slot_rmap_range(slot, start_level, end_level, start_gfn,
 			end_gfn, &iterator) {
 		if (iterator.rmap)
-			flush |= fn(kvm, iterator.rmap, memslot);
+			flush |= fn(kvm, iterator.rmap, slot);
 
 		if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) {
 			if (flush && flush_on_yield) {
@@ -5833,23 +5834,23 @@ slot_handle_level_range(struct kvm *kvm, const struct kvm_memory_slot *memslot,
 	return flush;
 }
 
-static __always_inline bool
-slot_handle_level(struct kvm *kvm, const struct kvm_memory_slot *memslot,
-		  slot_level_handler fn, int start_level, int end_level,
-		  bool flush_on_yield)
+static __always_inline bool walk_slot_rmaps(struct kvm *kvm,
+					    const struct kvm_memory_slot *slot,
+					    slot_rmaps_handler fn,
+					    int start_level, int end_level,
+					    bool flush_on_yield)
 {
-	return slot_handle_level_range(kvm, memslot, fn, start_level,
-			end_level, memslot->base_gfn,
-			memslot->base_gfn + memslot->npages - 1,
-			flush_on_yield, false);
+	return __walk_slot_rmaps(kvm, slot, fn, start_level, end_level,
+				 slot->base_gfn, slot->base_gfn + slot->npages - 1,
+				 flush_on_yield, false);
 }
 
-static __always_inline bool
-slot_handle_level_4k(struct kvm *kvm, const struct kvm_memory_slot *memslot,
-		     slot_level_handler fn, bool flush_on_yield)
+static __always_inline bool walk_slot_rmaps_4k(struct kvm *kvm,
+					       const struct kvm_memory_slot *slot,
+					       slot_rmaps_handler fn,
+					       bool flush_on_yield)
 {
-	return slot_handle_level(kvm, memslot, fn, PG_LEVEL_4K,
-				 PG_LEVEL_4K, flush_on_yield);
+	return walk_slot_rmaps(kvm, slot, fn, PG_LEVEL_4K, PG_LEVEL_4K, flush_on_yield);
 }
 
 static void free_mmu_pages(struct kvm_mmu *mmu)
@@ -6144,9 +6145,9 @@ static bool kvm_rmap_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_e
 			if (WARN_ON_ONCE(start >= end))
 				continue;
 
-			flush = slot_handle_level_range(kvm, memslot, __kvm_zap_rmap,
-							PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL,
-							start, end - 1, true, flush);
+			flush = __walk_slot_rmaps(kvm, memslot, __kvm_zap_rmap,
+						  PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL,
+						  start, end - 1, true, flush);
 		}
 	}
 
@@ -6199,8 +6200,8 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 {
 	if (kvm_memslots_have_rmaps(kvm)) {
 		write_lock(&kvm->mmu_lock);
-		slot_handle_level(kvm, memslot, slot_rmap_write_protect,
-				  start_level, KVM_MAX_HUGEPAGE_LEVEL, false);
+		walk_slot_rmaps(kvm, memslot, slot_rmap_write_protect,
+				start_level, KVM_MAX_HUGEPAGE_LEVEL, false);
 		write_unlock(&kvm->mmu_lock);
 	}
 
@@ -6435,10 +6436,9 @@ static void kvm_shadow_mmu_try_split_huge_pages(struct kvm *kvm,
 	 * all the way to the target level. There's no need to split pages
 	 * already at the target level.
 	 */
-	for (level = KVM_MAX_HUGEPAGE_LEVEL; level > target_level; level--) {
-		slot_handle_level_range(kvm, slot, shadow_mmu_try_split_huge_pages,
-					level, level, start, end - 1, true, false);
-	}
+	for (level = KVM_MAX_HUGEPAGE_LEVEL; level > target_level; level--)
+		__walk_slot_rmaps(kvm, slot, shadow_mmu_try_split_huge_pages,
+				  level, level, start, end - 1, true, false);
 }
 
 /* Must be called with the mmu_lock held in write-mode. */
@@ -6537,8 +6537,8 @@ static void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
 	 * Note, use KVM_MAX_HUGEPAGE_LEVEL - 1 since there's no need to zap
 	 * pages that are already mapped at the maximum hugepage level.
 	 */
-	if (slot_handle_level(kvm, slot, kvm_mmu_zap_collapsible_spte,
-			      PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL - 1, true))
+	if (walk_slot_rmaps(kvm, slot, kvm_mmu_zap_collapsible_spte,
+			    PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL - 1, true))
 		kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
 }
 
@@ -6582,7 +6582,7 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
 		 * Clear dirty bits only on 4k SPTEs since the legacy MMU only
 		 * support dirty logging at a 4k granularity.
 		 */
-		slot_handle_level_4k(kvm, memslot, __rmap_clear_dirty, false);
+		walk_slot_rmaps_4k(kvm, memslot, __rmap_clear_dirty, false);
 		write_unlock(&kvm->mmu_lock);
 	}
 

From patchwork Thu Feb  2 18:27:50 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ben Gardon <bgardon@google.com>
X-Patchwork-Id: 52122
Date: Thu,  2 Feb 2023 18:27:50 +0000
In-Reply-To: <20230202182809.1929122-1-bgardon@google.com>
Mime-Version: 1.0
References: <20230202182809.1929122-1-bgardon@google.com>
X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog
Message-ID: <20230202182809.1929122-3-bgardon@google.com>
Subject: [PATCH 02/21] KVM: x86/mmu: Replace comment with an actual lockdep
 assertion on mmu_lock
From: Ben Gardon <bgardon@google.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
        Sean Christopherson <seanjc@google.com>,
        David Matlack <dmatlack@google.com>,
        Vipin Sharma <vipinsh@google.com>,
        Ricardo Koller <ricarkol@google.com>,
        Ben Gardon <bgardon@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Sean Christopherson <seanjc@google.com>

Assert that mmu_lock is held for write in __walk_slot_rmaps() instead of
hoping the function comment will magically prevent introducing bugs.
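
The general idea, turning a locking rule from a comment into a check that
runs, can be sketched in user space as below. This is a toy model rather
than the kernel mechanism: struct toy_rwlock and all toy_* helpers are
hypothetical, and a plain assert() stands in for
lockdep_assert_held_write(), which in the kernel only performs its check
when lockdep is enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy lock: tracks only whether a writer currently holds it. */
struct toy_rwlock {
	bool held_for_write;
};

static void toy_write_lock(struct toy_rwlock *lock)
{
	lock->held_for_write = true;
}

static void toy_write_unlock(struct toy_rwlock *lock)
{
	lock->held_for_write = false;
}

/* Toy analogue of lockdep_assert_held_write(): abort if the rule is broken. */
static void toy_assert_held_write(const struct toy_rwlock *lock)
{
	assert(lock->held_for_write);
}

/* The locking rule is enforced in code instead of being stated in a comment. */
static int toy_walk(struct toy_rwlock *mmu_lock, const int *vals, int n)
{
	int i, sum = 0;

	toy_assert_held_write(mmu_lock);

	for (i = 0; i < n; i++)
		sum += vals[i];

	return sum;
}
```

Calling toy_walk() without first taking the lock trips the assertion
immediately, which is the behavior a comment alone cannot provide.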

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 09a0a2cc76bae..2ea8e58f83256 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5805,7 +5805,6 @@ typedef bool (*slot_rmaps_handler) (struct kvm *kvm,
 				    struct kvm_rmap_head *rmap_head,
 				    const struct kvm_memory_slot *slot);
 
-/* The caller should hold mmu-lock before calling this function. */
 static __always_inline bool __walk_slot_rmaps(struct kvm *kvm,
 					      const struct kvm_memory_slot *slot,
 					      slot_rmaps_handler fn,
@@ -5815,6 +5814,8 @@ static __always_inline bool __walk_slot_rmaps(struct kvm *kvm,
 {
 	struct slot_rmap_walk_iterator iterator;
 
+	lockdep_assert_held_write(&kvm->mmu_lock);
+
 	for_each_slot_rmap_range(slot, start_level, end_level, start_gfn,
 			end_gfn, &iterator) {
 		if (iterator.rmap)

From patchwork Thu Feb  2 18:27:51 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ben Gardon <bgardon@google.com>
X-Patchwork-Id: 52125
Date: Thu,  2 Feb 2023 18:27:51 +0000
In-Reply-To: <20230202182809.1929122-1-bgardon@google.com>
Mime-Version: 1.0
References: <20230202182809.1929122-1-bgardon@google.com>
X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog
Message-ID: <20230202182809.1929122-4-bgardon@google.com>
Subject: [PATCH 03/21] KVM: x86/mmu: Clean up mmu.c functions that put return
 type on separate line
From: Ben Gardon <bgardon@google.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
        Sean Christopherson <seanjc@google.com>,
        David Matlack <dmatlack@google.com>,
        Vipin Sharma <vipinsh@google.com>,
        Ricardo Koller <ricarkol@google.com>,
        Ben Gardon <bgardon@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Sean Christopherson <seanjc@google.com>

Adjust a variety of functions in mmu.c to put the function return type on
the same line as the function declaration.  As stated in the Linus
specification:

  But the "on their own line" is complete garbage to begin with. That
  will NEVER be a kernel rule. We should never have a rule that assumes
  things are so long that they need to be on multiple lines.

  We don't put function return types on their own lines either, even if
  some other projects have that rule (just to get function names at the
  beginning of lines or some other odd reason).

Leave the functions generated by BUILD_MMU_ROLE_REGS_ACCESSOR() as-is;
that code is basically illegible no matter how it's formatted.

No functional change intended.

Link: https://lore.kernel.org/mm-commits/CAHk-=wjS-Jg7sGMwUPpDsjv392nDOOs0CtUtVkp=S6Q7JzFJRw@mail.gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 59 ++++++++++++++++++++----------------------
 1 file changed, 28 insertions(+), 31 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2ea8e58f83256..3674bde2203b2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -876,9 +876,9 @@ static void unaccount_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 	untrack_possible_nx_huge_page(kvm, sp);
 }
 
-static struct kvm_memory_slot *
-gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn,
-			    bool no_dirty_log)
+static struct kvm_memory_slot *gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu,
+							   gfn_t gfn,
+							   bool no_dirty_log)
 {
 	struct kvm_memory_slot *slot;
 
@@ -938,10 +938,9 @@ static int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte,
 	return count;
 }
 
-static void
-pte_list_desc_remove_entry(struct kvm_rmap_head *rmap_head,
-			   struct pte_list_desc *desc, int i,
-			   struct pte_list_desc *prev_desc)
+static void pte_list_desc_remove_entry(struct kvm_rmap_head *rmap_head,
+				       struct pte_list_desc *desc, int i,
+				       struct pte_list_desc *prev_desc)
 {
 	int j = desc->spte_count - 1;
 
@@ -1493,8 +1492,8 @@ struct slot_rmap_walk_iterator {
 	struct kvm_rmap_head *end_rmap;
 };
 
-static void
-rmap_walk_init_level(struct slot_rmap_walk_iterator *iterator, int level)
+static void rmap_walk_init_level(struct slot_rmap_walk_iterator *iterator,
+				 int level)
 {
 	iterator->level = level;
 	iterator->gfn = iterator->start_gfn;
@@ -1502,10 +1501,10 @@ rmap_walk_init_level(struct slot_rmap_walk_iterator *iterator, int level)
 	iterator->end_rmap = gfn_to_rmap(iterator->end_gfn, level, iterator->slot);
 }
 
-static void
-slot_rmap_walk_init(struct slot_rmap_walk_iterator *iterator,
-		    const struct kvm_memory_slot *slot, int start_level,
-		    int end_level, gfn_t start_gfn, gfn_t end_gfn)
+static void slot_rmap_walk_init(struct slot_rmap_walk_iterator *iterator,
+				const struct kvm_memory_slot *slot,
+				int start_level, int end_level,
+				gfn_t start_gfn, gfn_t end_gfn)
 {
 	iterator->slot = slot;
 	iterator->start_level = start_level;
@@ -3295,9 +3294,9 @@ static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
  * Returns true if the SPTE was fixed successfully. Otherwise,
  * someone else modified the SPTE from its original value.
  */
-static bool
-fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
-			u64 *sptep, u64 old_spte, u64 new_spte)
+static bool fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu,
+				    struct kvm_page_fault *fault,
+				    u64 *sptep, u64 old_spte, u64 new_spte)
 {
 	/*
 	 * Theoretically we could also set dirty bit (and flush TLB) here in
@@ -4626,10 +4625,9 @@ static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
 #include "paging_tmpl.h"
 #undef PTTYPE
 
-static void
-__reset_rsvds_bits_mask(struct rsvd_bits_validate *rsvd_check,
-			u64 pa_bits_rsvd, int level, bool nx, bool gbpages,
-			bool pse, bool amd)
+static void __reset_rsvds_bits_mask(struct rsvd_bits_validate *rsvd_check,
+				    u64 pa_bits_rsvd, int level, bool nx,
+				    bool gbpages, bool pse, bool amd)
 {
 	u64 gbpages_bit_rsvd = 0;
 	u64 nonleaf_bit8_rsvd = 0;
@@ -4742,9 +4740,9 @@ static void reset_guest_rsvds_bits_mask(struct kvm_vcpu *vcpu,
 				guest_cpuid_is_amd_or_hygon(vcpu));
 }
 
-static void
-__reset_rsvds_bits_mask_ept(struct rsvd_bits_validate *rsvd_check,
-			    u64 pa_bits_rsvd, bool execonly, int huge_page_level)
+static void __reset_rsvds_bits_mask_ept(struct rsvd_bits_validate *rsvd_check,
+					u64 pa_bits_rsvd, bool execonly,
+					int huge_page_level)
 {
 	u64 high_bits_rsvd = pa_bits_rsvd & rsvd_bits(0, 51);
 	u64 large_1g_rsvd = 0, large_2m_rsvd = 0;
@@ -4844,8 +4842,7 @@ static inline bool boot_cpu_is_amd(void)
  * the direct page table on host, use as much mmu features as
  * possible, however, kvm currently does not do execution-protection.
  */
-static void
-reset_tdp_shadow_zero_bits_mask(struct kvm_mmu *context)
+static void reset_tdp_shadow_zero_bits_mask(struct kvm_mmu *context)
 {
 	struct rsvd_bits_validate *shadow_zero_check;
 	int i;
@@ -5060,8 +5057,8 @@ static void paging32_init_context(struct kvm_mmu *context)
 	context->invlpg = paging32_invlpg;
 }
 
-static union kvm_cpu_role
-kvm_calc_cpu_role(struct kvm_vcpu *vcpu, const struct kvm_mmu_role_regs *regs)
+static union kvm_cpu_role kvm_calc_cpu_role(struct kvm_vcpu *vcpu,
+					    const struct kvm_mmu_role_regs *regs)
 {
 	union kvm_cpu_role role = {0};
 
@@ -6653,8 +6650,8 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
 	}
 }
 
-static unsigned long
-mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
+static unsigned long mmu_shrink_scan(struct shrinker *shrink,
+				     struct shrink_control *sc)
 {
 	struct kvm *kvm;
 	int nr_to_scan = sc->nr_to_scan;
@@ -6712,8 +6709,8 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 	return freed;
 }
 
-static unsigned long
-mmu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
+static unsigned long mmu_shrink_count(struct shrinker *shrink,
+				      struct shrink_control *sc)
 {
 	return percpu_counter_read_positive(&kvm_total_used_mmu_pages);
 }

From patchwork Thu Feb  2 18:27:52 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ben Gardon <bgardon@google.com>
X-Patchwork-Id: 52116
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp400730wrn;
        Thu, 2 Feb 2023 10:30:01 -0800 (PST)
X-Google-Smtp-Source: 
 AK7set/nbdygJ7vn0+R8otfdmAkwOAfOIuiGmapaOoYHz7R/w3aUtnqItluDpYAG4rwpNsnL2eDf
X-Received: by 2002:a05:6a20:3d92:b0:b9:7a47:bca5 with SMTP id
 s18-20020a056a203d9200b000b97a47bca5mr9853992pzi.43.1675362601104;
        Thu, 02 Feb 2023 10:30:01 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1675362601; cv=none;
        d=google.com; s=arc-20160816;
        b=R5ZMZOZl51v16p/GBIn9zRcE/6dqay++y99zb1BxbxjefPBa1fb7RCuvHfLwCYFHv2
         YJGayL720APDMk1H8XGMztt6cQRJOF8i8ewiOXAlHbFtJhCmMVdny3Z8YjbsLWa3fBDC
         XdNbsqpPZCO36QUvq8EVitJfnC4qquOb0FrOWs/CUzMYJiKNNR3fKVoZ6pLDr8OFpFIv
         Z7KNs1NhFB1rYGBTw5IS7Pkthf9Ln1QdOChBDy5T0/sWVI3z4ydP7i6+p9bABoXZFEEE
         J/u95sZNlEbkfeYUR9D/GMlXJ29GLOWKOms8SdSh96lHSOqrSyFrII1cpf84t7Uilze9
         A4Fg==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:cc:to:from:subject:message-id:references
         :mime-version:in-reply-to:date:dkim-signature;
        bh=oI2rqe8UglBSqeQM/qZzRcQFn/AWh8f3tqKjZ0B80vs=;
        b=ZUakdTcj1CGYCWDBP6nFFDxLkIFD/EmBRzlRkvjKx40E2OIMzS9QGypyrs2rQV+5W+
         FTE4vgHrAhD7PfwRkADoNx4MLQGlfo1TtKIeeodjlAdsOElODJIiEqpEA6yxnpN90k7p
         kuJ0euLPd0zvF9lsfIm8RzyxslWogK8T5H96e77Fx6+wDMI8K8D/d1WvdJpsyQqgoAS8
         hSkxWPguY7JB1J8Po+bTkEinSb++mwgeV9T+mJRJhf3eiO3U+22EN6jzawlVrAi06vDl
         VQ86IShuySFZ9lQKiEo5QUyogKEFZuMm6xaqEXjSw3REn25LkjoF59t7GLA4zeSpAhTs
         3YOw==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@google.com header.s=20210112 header.b=n3D5F4ZK;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 c2-20020a637242000000b004dfce7d3decsi204712pgn.795.2023.02.02.10.29.49;
        Thu, 02 Feb 2023 10:30:01 -0800 (PST)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@google.com header.s=20210112 header.b=n3D5F4ZK;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S232530AbjBBS2k (ORCPT <rfc822;il.mystafa@gmail.com> + 99 others);
        Thu, 2 Feb 2023 13:28:40 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60758 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S232466AbjBBS2V (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 2 Feb 2023 13:28:21 -0500
Received: from mail-pl1-x64a.google.com (mail-pl1-x64a.google.com
 [IPv6:2607:f8b0:4864:20::64a])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 86B8342DE1
        for <linux-kernel@vger.kernel.org>;
 Thu,  2 Feb 2023 10:28:18 -0800 (PST)
Received: by mail-pl1-x64a.google.com with SMTP id
 y20-20020a170902cad400b001962668ef33so1315739pld.22
        for <linux-kernel@vger.kernel.org>;
 Thu, 02 Feb 2023 10:28:18 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=oI2rqe8UglBSqeQM/qZzRcQFn/AWh8f3tqKjZ0B80vs=;
        b=n3D5F4ZKgEu5F9+wwtibXHJcuhI7R+kG2XBY/kMf5C0UMb4zaihINeT5u8YcwnOcqr
         ZIG9YfwT6ihp8cAkJkJVxAfhYRAFsi2dxNtmuQ9bpesC1zYnFBXNZpJFtLjXFuhtXtWF
         /pmvmP7pUTj+gHJyRPdpAgXGXAtScU8e8+3I5Odg/hNTFgw5BMMS8+/bZxlUBJF3NZre
         ZZxEQGZMLpK/Fk+ZMI5QRLJ17txFiyi9/L27QuRD5+TbWTIxFiJYxzVkVDO1tgLWVQ5C
         3N3GPLOcRB3986GtQaZsknMzTXX4ytBsBLVZM8Inm2aOaMtXVO1LTUcJwfhSzGUUn+gY
         g7sg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=oI2rqe8UglBSqeQM/qZzRcQFn/AWh8f3tqKjZ0B80vs=;
        b=2Ts5RDeaVUFZ1U3OtYXUs8qEYw2bQZpiLYMNQFuFXx9Bhk1TXIAORVgBA8+nOvF/n2
         GA+1LM+MS4OuAgfJCollBkP3zE7/UY9tRWHB6WCxmi4ShVnkjErxOtr8aMt2hlNjdgGb
         PqlLVI+oKi7xk0V58w3OyHgjiCmaqrb9VUd8fW+paxNqGsI7tdKDJXyaQBd3y9iHPguI
         HPQQUj7x4u0bDENyWJwPhLlGOne61rzZzcy09ndZIpBz6uoeYlVrT4Y2EzvhTt2nyUnJ
         7hpgRegyRjxsSbRLgxz+WYcJ1QtrnOE8FdROrh6RPQuLiEdJjgljZ+tgLksE3ku3ijL4
         0xFg==
X-Gm-Message-State: AO0yUKXEBx4s0iWIAjxze48pfyxEmfW/slONBKP96e/bA/M5RpFHUsbl
        Vn4tpx3qg6sEEcwpvYloz0GsluTRt4TWpG8z2WzQRgVH2USE+HINWOHpFv75M7K8jDXdV8rsfuQ
        lU3fHNa2J3I244UxW06CHtTTrcmImAlr1PNi0/iYLfcwtvyk509cBLGH9EcudbN4BPiTKoemj
X-Received: from sweer.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:e45])
 (user=bgardon job=sendgmr) by 2002:a17:902:f552:b0:198:a5da:bf6c with SMTP id
 h18-20020a170902f55200b00198a5dabf6cmr1834157plf.9.1675362497874; Thu, 02 Feb
 2023 10:28:17 -0800 (PST)
Date: Thu,  2 Feb 2023 18:27:52 +0000
In-Reply-To: <20230202182809.1929122-1-bgardon@google.com>
Mime-Version: 1.0
References: <20230202182809.1929122-1-bgardon@google.com>
X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog
Message-ID: <20230202182809.1929122-5-bgardon@google.com>
Subject: [PATCH 04/21] KVM: x86/MMU: Add shadow_mmu.(c|h)
From: Ben Gardon <bgardon@google.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
        Sean Christopherson <seanjc@google.com>,
        David Matlack <dmatlack@google.com>,
        Vipin Sharma <vipinsh@google.com>,
        Ricardo Koller <ricarkol@google.com>,
        Ben Gardon <bgardon@google.com>
X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED,
        DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,
        SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=unavailable
        autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1756745014949397914?=
X-GMAIL-MSGID: =?utf-8?q?1756745014949397914?=

As a first step toward splitting the Shadow MMU out of the common KVM MMU
code, add separate files for it containing some of the boilerplate and
includes the Shadow MMU will need.

No functional change intended.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/Makefile         |  2 +-
 arch/x86/kvm/mmu/mmu.c        |  1 +
 arch/x86/kvm/mmu/shadow_mmu.c | 23 +++++++++++++++++++++++
 arch/x86/kvm/mmu/shadow_mmu.h | 21 +++++++++++++++++++++
 4 files changed, 46 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kvm/mmu/shadow_mmu.c
 create mode 100644 arch/x86/kvm/mmu/shadow_mmu.h

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 80e3fe184d17e..d6e94660b006e 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -12,7 +12,7 @@ include $(srctree)/virt/kvm/Makefile.kvm
 kvm-y			+= x86.o emulate.o i8259.o irq.o lapic.o \
 			   i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
 			   hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o \
-			   mmu/spte.o
+			   mmu/spte.o mmu/shadow_mmu.o
 
 ifdef CONFIG_HYPERV
 kvm-y			+= kvm_onhyperv.o
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3674bde2203b2..752c38d625a32 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -21,6 +21,7 @@
 #include "mmu.h"
 #include "mmu_internal.h"
 #include "tdp_mmu.h"
+#include "shadow_mmu.h"
 #include "x86.h"
 #include "kvm_cache_regs.h"
 #include "smm.h"
diff --git a/arch/x86/kvm/mmu/shadow_mmu.c b/arch/x86/kvm/mmu/shadow_mmu.c
new file mode 100644
index 0000000000000..eee5a6796d9b0
--- /dev/null
+++ b/arch/x86/kvm/mmu/shadow_mmu.c
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KVM Shadow MMU
+ *
+ * Extracted from mmu.c
+ *
+ * Copyright (C) 2006 Qumranet, Inc.
+ * Copyright 2010 Red Hat, Inc. and/or its affiliates.
+ * Copyright (C) 2023, Google, Inc.
+ *
+ * Original authors:
+ *   Yaniv Kamay  <yaniv@qumranet.com>
+ *   Avi Kivity   <avi@qumranet.com>
+ */
+#include "mmu.h"
+#include "mmu_internal.h"
+#include "mmutrace.h"
+#include "shadow_mmu.h"
+#include "spte.h"
+
+#include <asm/vmx.h>
+#include <asm/cmpxchg.h>
+#include <trace/events/kvm.h>
diff --git a/arch/x86/kvm/mmu/shadow_mmu.h b/arch/x86/kvm/mmu/shadow_mmu.h
new file mode 100644
index 0000000000000..2bfba6ad20688
--- /dev/null
+++ b/arch/x86/kvm/mmu/shadow_mmu.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * KVM Shadow MMU
+ *
+ * Extracted from mmu.c
+ *
+ * Copyright (C) 2006 Qumranet, Inc.
+ * Copyright 2010 Red Hat, Inc. and/or its affiliates.
+ * Copyright (C) 2023, Google, Inc.
+ *
+ * Original authors:
+ *   Yaniv Kamay  <yaniv@qumranet.com>
+ *   Avi Kivity   <avi@qumranet.com>
+ */
+
+#ifndef __KVM_X86_MMU_SHADOW_MMU_H
+#define __KVM_X86_MMU_SHADOW_MMU_H
+
+#include <linux/kvm_host.h>
+
+#endif /* __KVM_X86_MMU_SHADOW_MMU_H */

From patchwork Thu Feb  2 18:27:53 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ben Gardon <bgardon@google.com>
X-Patchwork-Id: 52115
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp400727wrn;
        Thu, 2 Feb 2023 10:30:01 -0800 (PST)
X-Google-Smtp-Source: 
 AK7set+KHKdI88tZKfD0h8Gkexd2end8miEyQvRnOJCqFhQU+tzZxOVAVnMH4QocvoIfBZ0bijbG
X-Received: by 2002:a05:6a20:e185:b0:bf:3def:16cd with SMTP id
 ks5-20020a056a20e18500b000bf3def16cdmr6946119pzb.59.1675362600801;
        Thu, 02 Feb 2023 10:30:00 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1675362600; cv=none;
        d=google.com; s=arc-20160816;
        b=X4UjpHTUE2odYixyJHNAUG07++MwtAc8p9nyBWnngLKP30uyeBNKL9951KZKMt5KSQ
         wyqR1v4mFKkfykayajuq48/h0rpy2+jXq4Du2f5xE4Io3Oh6iMuieGT0NzTGEzIebLgM
         Nk6KAor9KyXOHYrPpGhIT7Msfa/0yhdlcTRVIIWXTTfWhZLRLUgW4r8IVrSczSwY4jq+
         5MjQt8EdtYUAnAcZ1eglbWCwtIXVoNepF2ANA5OfYJzPXReG/wN9S+ujS7/5pHCae6ls
         qJC9Wdhl9WVCypcXx1fMNqQhrTtH/XZpPF1Lp6lbsWS3tKo9B6wGFTeyIsgB0vN1BEAF
         tZ7Q==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:cc:to:from:subject:message-id:references
         :mime-version:in-reply-to:date:dkim-signature;
        bh=IxKe7Lw8l+Mx0w7jZKX1pH/HEpUfPWqwIgLzjXVd3Vk=;
        b=GMvnAlk/bfvi+/95yvEf0FNtl9pb2ocurGgTL4WNnXGVewBKojWX0iMQTSzq/ERCQK
         0VZuZ8oSBPrnz9kcsOUmlqxUuptp1pAAYyXr/Vre5r5OfrEt7cHGXldb8gC+iLMcjc6Y
         wxV48t93NqXUNti6t0ZAsb4R8aE+sak5GsrgZwSFJH4zvKraIZcteyLTRMaQNu+spL8k
         3c5j8KYPkwsJ1geDZF3mn4bQvSpBMKc0zN6N21iObG6afjxQ2QtszouG611M3eXxdCUd
         coc5R7nqsWze65r+xuy8Hlmvf1UH8wwO+AUauq4oVR/jyo+43OV3hkGWnIfdB7AV68jD
         CDog==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@google.com header.s=20210112 header.b=rebAKGLv;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 g8-20020a636b08000000b004e407a446adsi228796pgc.647.2023.02.02.10.29.46;
        Thu, 02 Feb 2023 10:30:00 -0800 (PST)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@google.com header.s=20210112 header.b=rebAKGLv;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S232667AbjBBS2v (ORCPT <rfc822;il.mystafa@gmail.com> + 99 others);
        Thu, 2 Feb 2023 13:28:51 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32998 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S232494AbjBBS21 (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 2 Feb 2023 13:28:27 -0500
Received: from mail-pj1-x1049.google.com (mail-pj1-x1049.google.com
 [IPv6:2607:f8b0:4864:20::1049])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D58611E9DC
        for <linux-kernel@vger.kernel.org>;
 Thu,  2 Feb 2023 10:28:20 -0800 (PST)
Received: by mail-pj1-x1049.google.com with SMTP id
 ls15-20020a17090b350f00b0022e601fa9e6so3257947pjb.5
        for <linux-kernel@vger.kernel.org>;
 Thu, 02 Feb 2023 10:28:20 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=IxKe7Lw8l+Mx0w7jZKX1pH/HEpUfPWqwIgLzjXVd3Vk=;
        b=rebAKGLvNp+3ijcwuDqkO0k3UA+4y2RWEIs2g5bQcv3wWKbvJ6yGLbiJ3ICht38aq2
         MyN0zR8GK30mk+DrTJUNwc442qYjZtx8xJKdnGNW2bNbp1CHrmL4ML44zV/LyH/sJQvT
         ddK+3gv4X6aO0ctRLj+5Dt6PZIYQqcX/UHvSDm3S1xPWac+108ylZrqVkVIAOdDsHl0O
         8KU9+zKhVBQczB2IBPY4W0ASZjtW7kX9o04RqN4wx8FKQk9oETxmLuiDzHmmNwFKRwpF
         drKFftkZ32hj9UTmBECN5SVx7R8R/yyeGi6L/yU8cii2FzHk/oJTz4XJCpL9EfQp2qMX
         7taw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=IxKe7Lw8l+Mx0w7jZKX1pH/HEpUfPWqwIgLzjXVd3Vk=;
        b=aabiw3LBGDyZXcX6NCYHEA1RHjCohpHJ5wuekZQS7JxyxngBnuasOuMf1lwrtFTjYJ
         RzrChgQL7JhFJVCiL3NEmECzCU/xdkW3Yli8uES0YvWwqDjyQauY7niXWTrHu1A3tm8A
         F/gkbNeKoBxtQBVcLL9510V0UuAqE+f+JHGFRbEvJYJT8mYgxFBSrhfK2ZVGQmsDdwme
         l+oYT7aaf4cyZoFlI78A0Npv34iL7XR0/flcdGEbbM4zBkdutWidXO1s8SYjDixHHVTC
         Tk7C5dsIpg+KnUB6wBydUX9B7FtTZoBKXjehamCKXooQZOeF5QjxmTIbpFscxYyNX7dr
         JTxA==
X-Gm-Message-State: AO0yUKV4L8yrqM37oYRATVvL9oV9ta8gnHLTqVIkmjxKmF1ZWHCT5hli
        MyNdTwukf7AZuXGbCoepY/F5laRuK4N7q51OnNfwFt5ImzmV0qWEU49zVSux5oWCrHQuzaC3/Bb
        U5gDUcdzMJfjaHa/R81Bj8YhIOHBnZBpYLZWVf2l4gKEAoui5u2JcHmJ73lVELKOjNMuT4xd5
X-Received: from sweer.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:e45])
 (user=bgardon job=sendgmr) by 2002:a17:902:d508:b0:196:7127:2240 with SMTP id
 b8-20020a170902d50800b0019671272240mr1761378plg.11.1675362499606; Thu, 02 Feb
 2023 10:28:19 -0800 (PST)
Date: Thu,  2 Feb 2023 18:27:53 +0000
In-Reply-To: <20230202182809.1929122-1-bgardon@google.com>
Mime-Version: 1.0
References: <20230202182809.1929122-1-bgardon@google.com>
X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog
Message-ID: <20230202182809.1929122-6-bgardon@google.com>
Subject: [PATCH 05/21] KVM: x86/MMU: Expose functions for the Shadow MMU
From: Ben Gardon <bgardon@google.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
        Sean Christopherson <seanjc@google.com>,
        David Matlack <dmatlack@google.com>,
        Vipin Sharma <vipinsh@google.com>,
        Ricardo Koller <ricarkol@google.com>,
        Ben Gardon <bgardon@google.com>
X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED,
        DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,
        SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=unavailable
        autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1756745014653179935?=
X-GMAIL-MSGID: =?utf-8?q?1756745014653179935?=

Expose, via mmu_internal.h, various common MMU functions and variables
that the Shadow MMU will need. This slightly reduces the work needed to
move the Shadow MMU code out of mmu.c, which will already be a massive
change.

No functional change intended.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/mmu.c          | 41 ++++++++++++++-------------------
 arch/x86/kvm/mmu/mmu_internal.h | 24 +++++++++++++++++++
 2 files changed, 41 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 752c38d625a32..12d38a8772a80 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -164,9 +164,9 @@ struct kvm_shadow_walk_iterator {
 		({ spte = mmu_spte_get_lockless(_walker.sptep); 1; });	\
 	     __shadow_walk_next(&(_walker), spte))
 
-static struct kmem_cache *pte_list_desc_cache;
+struct kmem_cache *pte_list_desc_cache;
 struct kmem_cache *mmu_page_header_cache;
-static struct percpu_counter kvm_total_used_mmu_pages;
+struct percpu_counter kvm_total_used_mmu_pages;
 
 static void mmu_spte_set(u64 *sptep, u64 spte);
 
@@ -242,11 +242,6 @@ static struct kvm_mmu_role_regs vcpu_to_role_regs(struct kvm_vcpu *vcpu)
 	return regs;
 }
 
-static inline bool kvm_available_flush_tlb_with_range(void)
-{
-	return kvm_x86_ops.tlb_remote_flush_with_range;
-}
-
 static void kvm_flush_remote_tlbs_with_range(struct kvm *kvm,
 		struct kvm_tlb_range *range)
 {
@@ -270,8 +265,8 @@ void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
 	kvm_flush_remote_tlbs_with_range(kvm, &range);
 }
 
-static void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn,
-			   unsigned int access)
+void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn,
+		    unsigned int access)
 {
 	u64 spte = make_mmio_spte(vcpu, gfn, access);
 
@@ -623,7 +618,7 @@ static inline bool is_tdp_mmu_active(struct kvm_vcpu *vcpu)
 	return tdp_mmu_enabled && vcpu->arch.mmu->root_role.direct;
 }
 
-static void walk_shadow_page_lockless_begin(struct kvm_vcpu *vcpu)
+void walk_shadow_page_lockless_begin(struct kvm_vcpu *vcpu)
 {
 	if (is_tdp_mmu_active(vcpu)) {
 		kvm_tdp_mmu_walk_lockless_begin();
@@ -642,7 +637,7 @@ static void walk_shadow_page_lockless_begin(struct kvm_vcpu *vcpu)
 	}
 }
 
-static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
+void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
 {
 	if (is_tdp_mmu_active(vcpu)) {
 		kvm_tdp_mmu_walk_lockless_end();
@@ -835,8 +830,8 @@ void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 		      &kvm->arch.possible_nx_huge_pages);
 }
 
-static void account_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp,
-				 bool nx_huge_page_possible)
+void account_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+			  bool nx_huge_page_possible)
 {
 	sp->nx_huge_page_disallowed = true;
 
@@ -870,16 +865,15 @@ void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 	list_del_init(&sp->possible_nx_huge_page_link);
 }
 
-static void unaccount_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+void unaccount_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	sp->nx_huge_page_disallowed = false;
 
 	untrack_possible_nx_huge_page(kvm, sp);
 }
 
-static struct kvm_memory_slot *gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu,
-							   gfn_t gfn,
-							   bool no_dirty_log)
+struct kvm_memory_slot *gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu,
+						    gfn_t gfn, bool no_dirty_log)
 {
 	struct kvm_memory_slot *slot;
 
@@ -1415,7 +1409,7 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 	return write_protected;
 }
 
-static bool kvm_vcpu_write_protect_gfn(struct kvm_vcpu *vcpu, u64 gfn)
+bool kvm_vcpu_write_protect_gfn(struct kvm_vcpu *vcpu, u64 gfn)
 {
 	struct kvm_memory_slot *slot;
 
@@ -1914,9 +1908,8 @@ static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	return ret;
 }
 
-static bool kvm_mmu_remote_flush_or_zap(struct kvm *kvm,
-					struct list_head *invalid_list,
-					bool remote_flush)
+bool kvm_mmu_remote_flush_or_zap(struct kvm *kvm, struct list_head *invalid_list,
+				 bool remote_flush)
 {
 	if (!remote_flush && list_empty(invalid_list))
 		return false;
@@ -1928,7 +1921,7 @@ static bool kvm_mmu_remote_flush_or_zap(struct kvm *kvm,
 	return true;
 }
 
-static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
+bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	if (sp->role.invalid)
 		return true;
@@ -6216,7 +6209,7 @@ static inline bool need_topup(struct kvm_mmu_memory_cache *cache, int min)
 	return kvm_mmu_memory_cache_nr_free_objects(cache) < min;
 }
 
-static bool need_topup_split_caches_or_resched(struct kvm *kvm)
+bool need_topup_split_caches_or_resched(struct kvm *kvm)
 {
 	if (need_resched() || rwlock_needbreak(&kvm->mmu_lock))
 		return true;
@@ -6231,7 +6224,7 @@ static bool need_topup_split_caches_or_resched(struct kvm *kvm)
 	       need_topup(&kvm->arch.split_shadow_page_cache, 1);
 }
 
-static int topup_split_caches(struct kvm *kvm)
+int topup_split_caches(struct kvm *kvm)
 {
 	/*
 	 * Allocating rmap list entries when splitting huge pages for nested
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index ac00bfbf32f67..95f0adfb3b0b4 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -131,7 +131,9 @@ struct kvm_mmu_page {
 #endif
 };
 
+extern struct kmem_cache *pte_list_desc_cache;
 extern struct kmem_cache *mmu_page_header_cache;
+extern struct percpu_counter kvm_total_used_mmu_pages;
 
 static inline int kvm_mmu_role_as_id(union kvm_mmu_page_role role)
 {
@@ -323,6 +325,28 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
 void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
 
 void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
+void account_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+			  bool nx_huge_page_possible);
 void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
+void unaccount_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
 
+static inline bool kvm_available_flush_tlb_with_range(void)
+{
+	return kvm_x86_ops.tlb_remote_flush_with_range;
+}
+
+void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn,
+		    unsigned int access);
+struct kvm_memory_slot *gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu,
+						    gfn_t gfn, bool no_dirty_log);
+bool kvm_vcpu_write_protect_gfn(struct kvm_vcpu *vcpu, u64 gfn);
+bool kvm_mmu_remote_flush_or_zap(struct kvm *kvm, struct list_head *invalid_list,
+				 bool remote_flush);
+bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp);
+
+void walk_shadow_page_lockless_begin(struct kvm_vcpu *vcpu);
+void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu);
+
+bool need_topup_split_caches_or_resched(struct kvm *kvm);
+int topup_split_caches(struct kvm *kvm);
 #endif /* __KVM_X86_MMU_INTERNAL_H */

From patchwork Thu Feb  2 18:27:54 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ben Gardon <bgardon@google.com>
X-Patchwork-Id: 52112
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp400598wrn;
        Thu, 2 Feb 2023 10:29:44 -0800 (PST)
X-Google-Smtp-Source: 
 AK7set+Uuvs0kCXqFjhCx6oYZjeXINNqXPyXOWuwWcNsAJgCTyuWhcWrjRX5W57ha62f/U+WdnhO
X-Received: by 2002:a05:6a21:158e:b0:bc:9007:e53 with SMTP id
 nr14-20020a056a21158e00b000bc90070e53mr6171316pzb.0.1675362584172;
        Thu, 02 Feb 2023 10:29:44 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1675362584; cv=none;
        d=google.com; s=arc-20160816;
        b=gKGcIa0DruN2gjxqaqBU1/1WkF3Zgheb9BwforSwMhpol6b578+LA25M7iRUXEfYar
         PO1JWzZSXOfbxjtUYudZpNmW4oERZgGYiZVlXQf/xS/ZmUU1s1o3k5mcq7JW0i9dFjc5
         FL8pP41kmfYa9ppdW+p2PewRi/p2z9tUH3j4EkwTWD/LaUSWvLCFbzSe33sLuKWMX5ex
         g0gcBHjDLSL1scmtnCP7r34PbLZxLM191ojcMJWdB8J9c7mIfyFrkrRfrVyIJx7jLbHO
         /zdOfDMh06ISXsyFmq6t69f8NaSHNi1RJwaGMttCNTRaBUdml+aBwP8g0NLmOH+/ms+I
         Vk2w==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:cc:to:from:subject:message-id:references
         :mime-version:in-reply-to:date:dkim-signature;
        bh=VxkR2t/soCZeC9eyZl1mM8b7dI9FSV2mUjQI7FtOHrA=;
        b=k3RP7JaU2F5txJvaEMZrJCt3Z3qO0X3f009wp3fYhoUE4uS/EWeTe2rekLt8Kr1LvP
         Vka554WrUXjvM77JbqdPYxFCQJwSK43ESrVktD1mXKwJlVHUrJ7I9z8pJPID2+Z09uO7
         0dc9jJJpBRGQHzFqZ34gfLH84mNHXl6KE4oYFYm3g0PCjt3IIuJreT1j7xrtVdXyyRZj
         A3yb4UOP1Ls4t4jnVpq6sSAA8P04mwBcVAfbRfsknnXwTaveHtCWJuYPd0+tbSN0iOgW
         q2d7Bbj5lWT60bwwV/z5k9ooWT+t1sx08s1Z4fxvsK/eI7KM4vMinUKvb9fXFpxXHtQM
         wvHQ==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@google.com header.s=20210112 header.b=bRIm91U6;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 e24-20020a637458000000b004f235dc52c1si310882pgn.350.2023.02.02.10.29.29;
        Thu, 02 Feb 2023 10:29:44 -0800 (PST)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@google.com header.s=20210112 header.b=bRIm91U6;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S232341AbjBBS2o (ORCPT <rfc822;il.mystafa@gmail.com> + 99 others);
        Thu, 2 Feb 2023 13:28:44 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33262 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S232514AbjBBS2j (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 2 Feb 2023 13:28:39 -0500
Received: from mail-pg1-x54a.google.com (mail-pg1-x54a.google.com
 [IPv6:2607:f8b0:4864:20::54a])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 20520265B8
        for <linux-kernel@vger.kernel.org>;
 Thu,  2 Feb 2023 10:28:21 -0800 (PST)
Received: by mail-pg1-x54a.google.com with SMTP id
 g12-20020a656ccc000000b004ee62dadb95so1363039pgw.9
        for <linux-kernel@vger.kernel.org>;
 Thu, 02 Feb 2023 10:28:21 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=VxkR2t/soCZeC9eyZl1mM8b7dI9FSV2mUjQI7FtOHrA=;
        b=bRIm91U6CXHVWLtrnVZHyCbrrPa45LPwdl416lNZtl+FS4NbZUA/w8w3xI7dR3QaUJ
         uzEyVHHOYRIhLTQG+2FeNkgmc+azanymGtl1QjsMyWScmGGcZ4ulvAEN2hHl11MQnb5J
         eOl35AxgHUpuLLm/0yUkZ6/NrKDsh0/x+qZEWGU+6030SYpFKtDTj64pHtPMMmbY3Djo
         hurVP9eX5Osf7WNhIL0csIDXWHRBcJu+1F20CketRewTixj85Jz2/ph8G9qDLLd45WEW
         kk/bY69JhozRJVuKnj5DtXaDcPI7Ke+YuN+juZuw+du57/vbwq2+6mu5fKeeVzoB0WdE
         jUJA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=VxkR2t/soCZeC9eyZl1mM8b7dI9FSV2mUjQI7FtOHrA=;
        b=6vSQPnvyZ+EWTkVZ0Z6H+qWmuNhvzEbZOBJ+5n4cw5g6gioFhQUaWzVSyRXcDMaROK
         k6MlMA4/YwBRDHi3WO5TTnGcct0tyMjpp5p+ZFQnoDWdtpkeVBkMO1C0GXZD1VVxvRf1
         BHSUSlYw8cfyEq340LRs8iw6xD+vDq693y7PUx0M2IhQEyvlIjAcChERgKsACcvfgknf
         Asy3MpU9O00IAqvv0jKinhU+x8b+Xpac4b6LH8rZVadPW+cCsUwWQGfiYHAU82yBt3fr
         bsox+5XFtz6LbXHGLN/WvDUsc4ZV/UdEi8b50Dci/4Md4u0WCSjb8r4p7mfuqPOfutN+
         K0Fw==
X-Gm-Message-State: AO0yUKWQwMIvIlessG+oVUGpqPZ7Xf8XZ1Cf0DHPn2820gsAD+NfiaAB
        Hj4Evzqm6xxuc2RCnaKR4/8oh/n0j2WywCM1al3ELlhE52NDOuuczTcXk17jrL2/o8q0EN20wXu
        uxuyQB2M78ehfzC+FuEtztQWMBge0rOylDNU3iVBKZcl5DtVvlrt2jcBF9AQSYXZWsw99uWip
X-Received: from sweer.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:e45])
 (user=bgardon job=sendgmr) by 2002:aa7:9f51:0:b0:58b:c29a:87a6 with SMTP id
 h17-20020aa79f51000000b0058bc29a87a6mr1724887pfr.13.1675362500951; Thu, 02
 Feb 2023 10:28:20 -0800 (PST)
Date: Thu,  2 Feb 2023 18:27:54 +0000
In-Reply-To: <20230202182809.1929122-1-bgardon@google.com>
Mime-Version: 1.0
References: <20230202182809.1929122-1-bgardon@google.com>
X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog
Message-ID: <20230202182809.1929122-7-bgardon@google.com>
Subject: [PATCH 06/21] KVM: x86/mmu: Get rid of is_cpuid_PSE36()
From: Ben Gardon <bgardon@google.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
        Sean Christopherson <seanjc@google.com>,
        David Matlack <dmatlack@google.com>,
        Vipin Sharma <vipinsh@google.com>,
        Ricardo Koller <ricarkol@google.com>,
        Ben Gardon <bgardon@google.com>
X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED,
        DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,
        SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham
        autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1756744997003628115?=
X-GMAIL-MSGID: =?utf-8?q?1756744997003628115?=

is_cpuid_PSE36() always returns 1 and is never overridden, so just get
rid of the function. This saves having to export it in a future commit
in order to move the include of paging_tmpl.h out of mmu.c.

No functional change intended.

Suggested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
---
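Reviewer note (not part of the commit): a minimal userspace sketch of why
dropping the always-true check only removes the 32-bit PSE mask and leaves
the PSE-36 behavior unchanged. It assumes rsvd_bits() builds an inclusive
bit-range mask as the kernel helper does; the PSE36_* constants and the
local PAGE_SHIFT are illustrative stand-ins chosen for this sketch, not
copied from the tree.

```c
#include <stdint.h>

/* Mask with bits s..e (inclusive) set; assumed to mirror KVM's rsvd_bits(). */
static inline uint64_t rsvd_bits(int s, int e)
{
	if (e < s)
		return 0;
	return ((2ULL << (e - s)) - 1) << s;
}

/* Illustrative stand-ins (assumptions for this sketch). */
#define PAGE_SHIFT	12
#define PSE36_SHIFT	13	/* first PDE bit carrying PA bits 32+ */
#define PSE36_SIZE	4	/* number of extra PA bits (32..35) */
#define PSE36_MASK	(((1ULL << PSE36_SIZE) - 1) << PSE36_SHIFT)

/*
 * With PSE-36, PDE bits 13..16 of a 4MB page hold physical address bits
 * 32..35. Expressed as a gfn (address >> PAGE_SHIFT) those bits land at
 * gfn bits 20..23, hence the shift of 32 - 13 - 12 = 7.
 */
static inline uint64_t pse36_gfn_delta(uint32_t gpte)
{
	int shift = 32 - PSE36_SHIFT - PAGE_SHIFT;

	return (gpte & PSE36_MASK) << shift;
}
```

The removed 32-bit mask, rsvd_bits(13, 21), differs from the retained
rsvd_bits(17, 21) by exactly the four PSE-36 address bits, which is why
only the latter is kept once the check is gone.
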
 arch/x86/kvm/mmu/mmu.c         | 13 ++-----------
 arch/x86/kvm/mmu/paging_tmpl.h |  2 +-
 2 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 12d38a8772a80..35cb59737c0a3 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -304,11 +304,6 @@ static bool check_mmio_spte(struct kvm_vcpu *vcpu, u64 spte)
 	return likely(kvm_gen == spte_gen);
 }
 
-static int is_cpuid_PSE36(void)
-{
-	return 1;
-}
-
 #ifdef CONFIG_X86_64
 static void __set_spte(u64 *sptep, u64 spte)
 {
@@ -4661,12 +4656,8 @@ static void __reset_rsvds_bits_mask(struct rsvd_bits_validate *rsvd_check,
 			break;
 		}
 
-		if (is_cpuid_PSE36())
-			/* 36bits PSE 4MB page */
-			rsvd_check->rsvd_bits_mask[1][1] = rsvd_bits(17, 21);
-		else
-			/* 32 bits PSE 4MB page */
-			rsvd_check->rsvd_bits_mask[1][1] = rsvd_bits(13, 21);
+		/* 36bits PSE 4MB page */
+		rsvd_check->rsvd_bits_mask[1][1] = rsvd_bits(17, 21);
 		break;
 	case PT32E_ROOT_LEVEL:
 		rsvd_check->rsvd_bits_mask[0][2] = rsvd_bits(63, 63) |
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index e5662dbd519c4..730b413eebfde 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -426,7 +426,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	gfn += (addr & PT_LVL_OFFSET_MASK(walker->level)) >> PAGE_SHIFT;
 
 #if PTTYPE == 32
-	if (walker->level > PG_LEVEL_4K && is_cpuid_PSE36())
+	if (walker->level > PG_LEVEL_4K)
 		gfn += pse36_gfn_delta(pte);
 #endif
 

From patchwork Thu Feb  2 18:27:55 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ben Gardon <bgardon@google.com>
X-Patchwork-Id: 52119
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp400836wrn;
        Thu, 2 Feb 2023 10:30:14 -0800 (PST)
X-Google-Smtp-Source: 
 AK7set/QqqnZRTM9bmNUSTQ5Zw1ZGYwLI8vu1olkdqB9VB32myZyaovRXJdc/XKJjlp9QmkZM2zw
X-Received: by 2002:a05:6a20:439f:b0:be:c874:b7d9 with SMTP id
 i31-20020a056a20439f00b000bec874b7d9mr7857453pzl.21.1675362614365;
        Thu, 02 Feb 2023 10:30:14 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1675362614; cv=none;
        d=google.com; s=arc-20160816;
        b=JHT5mgkq4akIDCLyHkXmqzv7rEfr+iRrS69zVZeJtQrF9N1fxICOAl8vIhoPFvob8c
         SsECYm2Lm30k/wXfddWi6aiVAAkBUWQEgzcAVsjcZ9L8JluojUXxlD12M3B5u14BPT00
         y+lbbW+24mmutmBUlFTz8J4f9hnSxtWHrr3H3p/VxvMG/5Y/v/WG0Wo/3t7ONikA/PXT
         N/ADgpch62U8TnQb8CZxJWp57b62WR/UWv4VeBpmdKV+gV5LHvaGNRysg3dZWwQoY+Ih
         Q0qNeJLaU3RbROVCA+2orRptZWiDwKe6JXXELjkF74PuMcDDNoG03CWGmi36ovvCUzGt
         a/QQ==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:cc:to:from:subject:message-id:references
         :mime-version:in-reply-to:date:dkim-signature;
        bh=o9sD/xknztw1p3Rn8mfRPOEhjY5zpJi+eeqYJnkJHHs=;
        b=PFiKFXwC3bsNOAWZDQhfDww+7M1OOSOIeoccJnVlG9XbzK18f5ESLoMWsNnMLmP7jO
         nGIoqFQIboydfCupQL88BQoASit3bzp91yQjQ91zSXisTtN4F9A+YVhH+bU+C4abU4bU
         vAbk/Y2ZE/whgMTNX6bxPuNhNEj87JlfxQTpIHRJEMLS3/hN+4sx7PTztyi8XexLjVnE
         FSjsY4aKJGy0J4amAvzzqBXAvQnmB6LH/aoQh/u3KVXAQpMaaliaDOOYKMjEcHXgROj2
         XtSx82vbuNOxAO0e4PcVrqzTYF0QF8P0N3wOOKX7y1qSJJD4CCnPhqKOTRTdMb9ZzjqZ
         yqbg==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@google.com header.s=20210112 header.b=PhFmATdU;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 m8-20020a656a08000000b004e382954739si324499pgu.664.2023.02.02.10.30.00;
        Thu, 02 Feb 2023 10:30:14 -0800 (PST)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@google.com header.s=20210112 header.b=PhFmATdU;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S232628AbjBBS3X (ORCPT <rfc822;il.mystafa@gmail.com> + 99 others);
        Thu, 2 Feb 2023 13:29:23 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33370 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S232606AbjBBS2m (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 2 Feb 2023 13:28:42 -0500
Received: from mail-pg1-x549.google.com (mail-pg1-x549.google.com
 [IPv6:2607:f8b0:4864:20::549])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 75C9B66026
        for <linux-kernel@vger.kernel.org>;
 Thu,  2 Feb 2023 10:28:24 -0800 (PST)
Received: by mail-pg1-x549.google.com with SMTP id
 139-20020a630791000000b004ef5cf7541dso1373353pgh.15
        for <linux-kernel@vger.kernel.org>;
 Thu, 02 Feb 2023 10:28:24 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=o9sD/xknztw1p3Rn8mfRPOEhjY5zpJi+eeqYJnkJHHs=;
        b=PhFmATdUh9hLeEhiOLkU9m8hgFJRww9Y2poTDkLRLdPQXzX4lIO/GhFLs4NcOonIW1
         sMqpC6cFbeN8M7tYVtzNTuHbrZ4G4uaaSDaU2Fj+J5M3wp7XpdNTtCR/D20HWwNibhre
         OK28LRwtqdz4O+KT1UGL0/9B9JZ+zUypF6Pnca4caiJaA5Xs5jjq1/nYYo6GCeiRoUjJ
         QFO94roxqVTSHmISBt+EUbZ1XsFOpQU0+xyHzFnptxFIe/Xg2oz7h7v5BneJMm+RI5OL
         RCNtgW8Eol8BdsJycniM2T7KX07/2YBSFSnM1h8Lx+Mj8xQRQNDIWl+h3Aoxhk75E+WO
         GrEg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=o9sD/xknztw1p3Rn8mfRPOEhjY5zpJi+eeqYJnkJHHs=;
        b=IsJHN/Q+g9Bgl60H8duVIeVTTmaXXYmFCtXEkEJojNShFZ2q9fmEUrJvUwty7hf7p0
         Psft9W3JwKNvfrZy5EGS8GVHksANpQhV276vBIfHeeT4qRXBXB8SbczgdRnML3S+J1vR
         Qc3xwB0kDGHVIp9d3rUde6UCwdlUfF43J1g6aKuiF15M5coFTCZCCUMHTtzR8rymOo6D
         T/ysQJUGGhKA2qQhLpgolSI6klVo0kPrifMaZark1gNw7IaBNG4nYhD9dCvw2ackaZAy
         Qy2LvJ6o9/thdO9Jm5oH8+06K709XInyGkzYXkU3ITYnnvMSu7kqegDGCvJiCciJo9xM
         YA5g==
X-Gm-Message-State: AO0yUKUjcrApYVGuIfpUFcbZA4OFhZizfW94LSJ/YYfQfD/C3XrmoN8Z
        5ItL+6HttFNqnPeHvPLmnaoZy4RR6iongpcwnD0H0UNObC1l8IwyNSL1L+18N/34A6JMMFAjpCi
        +MIaxUWOuZUTOglG4ymBVQIhAO2lNcdr4RmpxlQxHPatMmWddUQLcdBEADP6KBE1fBGaEvJG2
X-Received: from sweer.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:e45])
 (user=bgardon job=sendgmr) by 2002:a17:903:2285:b0:196:1087:edfc with SMTP id
 b5-20020a170903228500b001961087edfcmr1747347plh.25.1675362502776; Thu, 02 Feb
 2023 10:28:22 -0800 (PST)
Date: Thu,  2 Feb 2023 18:27:55 +0000
In-Reply-To: <20230202182809.1929122-1-bgardon@google.com>
Mime-Version: 1.0
References: <20230202182809.1929122-1-bgardon@google.com>
X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog
Message-ID: <20230202182809.1929122-8-bgardon@google.com>
Subject: [PATCH 07/21] KVM: x86/MMU: Move the Shadow MMU implementation to
 shadow_mmu.c
From: Ben Gardon <bgardon@google.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
        Sean Christopherson <seanjc@google.com>,
        David Matlack <dmatlack@google.com>,
        Vipin Sharma <vipinsh@google.com>,
        Ricardo Koller <ricarkol@google.com>,
        Ben Gardon <bgardon@google.com>
X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED,
        DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,
        SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham
        autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1756745028688395759?=
X-GMAIL-MSGID: =?utf-8?q?1756745028688395759?=

Cut and paste the implementation of the Shadow MMU to shadow_mmu.(c|h).
This is a monstrously large commit, moving ~3500 lines. With such a
large move, there's no way to make it easy. Do the move in one massive
step to simplify dealing with merge conflicts and to make the git
history a little easier to dig through. Several cleanup commits follow
this one rather than precede it so that their git history will remain
easy to see.

No functional change intended.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
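Reviewer note (not part of the commit): since the rmap code is moved
verbatim, a quick reminder of the rmap_head encoding described in the moved
comment may help when skimming the diff. The sketch below is a userspace
approximation under stated assumptions: struct rmap_head and rmap_count()
are hypothetical names modeled on kvm_rmap_head and pte_list_count(), and
uintptr_t stands in for the kernel's unsigned long.

```c
#include <stddef.h>
#include <stdint.h>

#define PTE_LIST_EXT 14

/* Same layout as the descriptor in the diff, with userspace types. */
struct pte_list_desc {
	struct pte_list_desc *more;
	uint64_t spte_count;
	uint64_t *sptes[PTE_LIST_EXT];
};

/* Hypothetical stand-in for struct kvm_rmap_head. */
struct rmap_head {
	uintptr_t val;
};

/*
 * val == 0        : empty chain
 * bit 0 clear     : val points directly at the only spte
 * bit 0 set       : (val & ~1) points at a pte_list_desc chain
 */
static unsigned int rmap_count(const struct rmap_head *head)
{
	const struct pte_list_desc *desc;
	unsigned int count = 0;

	if (!head->val)
		return 0;
	if (!(head->val & 1))
		return 1;

	desc = (const struct pte_list_desc *)(head->val & ~(uintptr_t)1);
	for (; desc; desc = desc->more)
		count += desc->spte_count;

	return count;
}
```

The encoding relies on sptes and descriptors being at least 2-byte
aligned, so bit 0 of a real pointer is always free to act as the tag.
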
 arch/x86/kvm/debugfs.c          |    1 +
 arch/x86/kvm/mmu/mmu.c          | 4510 ++++---------------------------
 arch/x86/kvm/mmu/mmu_internal.h |    4 +-
 arch/x86/kvm/mmu/shadow_mmu.c   | 3418 +++++++++++++++++++++++
 arch/x86/kvm/mmu/shadow_mmu.h   |  145 +
 5 files changed, 4083 insertions(+), 3995 deletions(-)

diff --git a/arch/x86/kvm/debugfs.c b/arch/x86/kvm/debugfs.c
index ee8c4c3496edd..4825d7a56f39f 100644
--- a/arch/x86/kvm/debugfs.c
+++ b/arch/x86/kvm/debugfs.c
@@ -11,6 +11,7 @@
 #include "lapic.h"
 #include "mmu.h"
 #include "mmu/mmu_internal.h"
+#include "mmu/shadow_mmu.h"
 
 static int vcpu_get_timer_advance_ns(void *data, u64 *val)
 {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 35cb59737c0a3..2162dfda9601f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -117,59 +117,12 @@ bool dbg = 0;
 module_param(dbg, bool, 0644);
 #endif
 
-#define PTE_PREFETCH_NUM		8
-
 #include <trace/events/kvm.h>
 
-/* make pte_list_desc fit well in cache lines */
-#define PTE_LIST_EXT 14
-
-/*
- * Slight optimization of cacheline layout, by putting `more' and `spte_count'
- * at the start; then accessing it will only use one single cacheline for
- * either full (entries==PTE_LIST_EXT) case or entries<=6.
- */
-struct pte_list_desc {
-	struct pte_list_desc *more;
-	/*
-	 * Stores number of entries stored in the pte_list_desc.  No need to be
-	 * u64 but just for easier alignment.  When PTE_LIST_EXT, means full.
-	 */
-	u64 spte_count;
-	u64 *sptes[PTE_LIST_EXT];
-};
-
-struct kvm_shadow_walk_iterator {
-	u64 addr;
-	hpa_t shadow_addr;
-	u64 *sptep;
-	int level;
-	unsigned index;
-};
-
-#define for_each_shadow_entry_using_root(_vcpu, _root, _addr, _walker)     \
-	for (shadow_walk_init_using_root(&(_walker), (_vcpu),              \
-					 (_root), (_addr));                \
-	     shadow_walk_okay(&(_walker));			           \
-	     shadow_walk_next(&(_walker)))
-
-#define for_each_shadow_entry(_vcpu, _addr, _walker)            \
-	for (shadow_walk_init(&(_walker), _vcpu, _addr);	\
-	     shadow_walk_okay(&(_walker));			\
-	     shadow_walk_next(&(_walker)))
-
-#define for_each_shadow_entry_lockless(_vcpu, _addr, _walker, spte)	\
-	for (shadow_walk_init(&(_walker), _vcpu, _addr);		\
-	     shadow_walk_okay(&(_walker)) &&				\
-		({ spte = mmu_spte_get_lockless(_walker.sptep); 1; });	\
-	     __shadow_walk_next(&(_walker), spte))
-
 struct kmem_cache *pte_list_desc_cache;
 struct kmem_cache *mmu_page_header_cache;
 struct percpu_counter kvm_total_used_mmu_pages;
 
-static void mmu_spte_set(u64 *sptep, u64 spte);
-
 struct kvm_mmu_role_regs {
 	const unsigned long cr0;
 	const unsigned long cr4;
@@ -265,15 +218,6 @@ void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
 	kvm_flush_remote_tlbs_with_range(kvm, &range);
 }
 
-void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn,
-		    unsigned int access)
-{
-	u64 spte = make_mmio_spte(vcpu, gfn, access);
-
-	trace_mark_mmio_spte(sptep, gfn, spte);
-	mmu_spte_set(sptep, spte);
-}
-
 static gfn_t get_mmio_spte_gfn(u64 spte)
 {
 	u64 gpa = spte & shadow_nonpresent_or_rsvd_lower_gfn_mask;
@@ -304,310 +248,6 @@ static bool check_mmio_spte(struct kvm_vcpu *vcpu, u64 spte)
 	return likely(kvm_gen == spte_gen);
 }
 
-#ifdef CONFIG_X86_64
-static void __set_spte(u64 *sptep, u64 spte)
-{
-	WRITE_ONCE(*sptep, spte);
-}
-
-static void __update_clear_spte_fast(u64 *sptep, u64 spte)
-{
-	WRITE_ONCE(*sptep, spte);
-}
-
-static u64 __update_clear_spte_slow(u64 *sptep, u64 spte)
-{
-	return xchg(sptep, spte);
-}
-
-static u64 __get_spte_lockless(u64 *sptep)
-{
-	return READ_ONCE(*sptep);
-}
-#else
-union split_spte {
-	struct {
-		u32 spte_low;
-		u32 spte_high;
-	};
-	u64 spte;
-};
-
-static void count_spte_clear(u64 *sptep, u64 spte)
-{
-	struct kvm_mmu_page *sp =  sptep_to_sp(sptep);
-
-	if (is_shadow_present_pte(spte))
-		return;
-
-	/* Ensure the spte is completely set before we increase the count */
-	smp_wmb();
-	sp->clear_spte_count++;
-}
-
-static void __set_spte(u64 *sptep, u64 spte)
-{
-	union split_spte *ssptep, sspte;
-
-	ssptep = (union split_spte *)sptep;
-	sspte = (union split_spte)spte;
-
-	ssptep->spte_high = sspte.spte_high;
-
-	/*
-	 * If we map the spte from nonpresent to present, We should store
-	 * the high bits firstly, then set present bit, so cpu can not
-	 * fetch this spte while we are setting the spte.
-	 */
-	smp_wmb();
-
-	WRITE_ONCE(ssptep->spte_low, sspte.spte_low);
-}
-
-static void __update_clear_spte_fast(u64 *sptep, u64 spte)
-{
-	union split_spte *ssptep, sspte;
-
-	ssptep = (union split_spte *)sptep;
-	sspte = (union split_spte)spte;
-
-	WRITE_ONCE(ssptep->spte_low, sspte.spte_low);
-
-	/*
-	 * If we map the spte from present to nonpresent, we should clear
-	 * present bit firstly to avoid vcpu fetch the old high bits.
-	 */
-	smp_wmb();
-
-	ssptep->spte_high = sspte.spte_high;
-	count_spte_clear(sptep, spte);
-}
-
-static u64 __update_clear_spte_slow(u64 *sptep, u64 spte)
-{
-	union split_spte *ssptep, sspte, orig;
-
-	ssptep = (union split_spte *)sptep;
-	sspte = (union split_spte)spte;
-
-	/* xchg acts as a barrier before the setting of the high bits */
-	orig.spte_low = xchg(&ssptep->spte_low, sspte.spte_low);
-	orig.spte_high = ssptep->spte_high;
-	ssptep->spte_high = sspte.spte_high;
-	count_spte_clear(sptep, spte);
-
-	return orig.spte;
-}
-
-/*
- * The idea using the light way get the spte on x86_32 guest is from
- * gup_get_pte (mm/gup.c).
- *
- * An spte tlb flush may be pending, because kvm_set_pte_rmap
- * coalesces them and we are running out of the MMU lock.  Therefore
- * we need to protect against in-progress updates of the spte.
- *
- * Reading the spte while an update is in progress may get the old value
- * for the high part of the spte.  The race is fine for a present->non-present
- * change (because the high part of the spte is ignored for non-present spte),
- * but for a present->present change we must reread the spte.
- *
- * All such changes are done in two steps (present->non-present and
- * non-present->present), hence it is enough to count the number of
- * present->non-present updates: if it changed while reading the spte,
- * we might have hit the race.  This is done using clear_spte_count.
- */
-static u64 __get_spte_lockless(u64 *sptep)
-{
-	struct kvm_mmu_page *sp =  sptep_to_sp(sptep);
-	union split_spte spte, *orig = (union split_spte *)sptep;
-	int count;
-
-retry:
-	count = sp->clear_spte_count;
-	smp_rmb();
-
-	spte.spte_low = orig->spte_low;
-	smp_rmb();
-
-	spte.spte_high = orig->spte_high;
-	smp_rmb();
-
-	if (unlikely(spte.spte_low != orig->spte_low ||
-	      count != sp->clear_spte_count))
-		goto retry;
-
-	return spte.spte;
-}
-#endif
-
-/* Rules for using mmu_spte_set:
- * Set the sptep from nonpresent to present.
- * Note: the sptep being assigned *must* be either not present
- * or in a state where the hardware will not attempt to update
- * the spte.
- */
-static void mmu_spte_set(u64 *sptep, u64 new_spte)
-{
-	WARN_ON(is_shadow_present_pte(*sptep));
-	__set_spte(sptep, new_spte);
-}
-
-/*
- * Update the SPTE (excluding the PFN), but do not track changes in its
- * accessed/dirty status.
- */
-static u64 mmu_spte_update_no_track(u64 *sptep, u64 new_spte)
-{
-	u64 old_spte = *sptep;
-
-	WARN_ON(!is_shadow_present_pte(new_spte));
-	check_spte_writable_invariants(new_spte);
-
-	if (!is_shadow_present_pte(old_spte)) {
-		mmu_spte_set(sptep, new_spte);
-		return old_spte;
-	}
-
-	if (!spte_has_volatile_bits(old_spte))
-		__update_clear_spte_fast(sptep, new_spte);
-	else
-		old_spte = __update_clear_spte_slow(sptep, new_spte);
-
-	WARN_ON(spte_to_pfn(old_spte) != spte_to_pfn(new_spte));
-
-	return old_spte;
-}
-
-/* Rules for using mmu_spte_update:
- * Update the state bits, it means the mapped pfn is not changed.
- *
- * Whenever an MMU-writable SPTE is overwritten with a read-only SPTE, remote
- * TLBs must be flushed. Otherwise rmap_write_protect will find a read-only
- * spte, even though the writable spte might be cached on a CPU's TLB.
- *
- * Returns true if the TLB needs to be flushed
- */
-static bool mmu_spte_update(u64 *sptep, u64 new_spte)
-{
-	bool flush = false;
-	u64 old_spte = mmu_spte_update_no_track(sptep, new_spte);
-
-	if (!is_shadow_present_pte(old_spte))
-		return false;
-
-	/*
-	 * For the spte updated out of mmu-lock is safe, since
-	 * we always atomically update it, see the comments in
-	 * spte_has_volatile_bits().
-	 */
-	if (is_mmu_writable_spte(old_spte) &&
-	      !is_writable_pte(new_spte))
-		flush = true;
-
-	/*
-	 * Flush TLB when accessed/dirty states are changed in the page tables,
-	 * to guarantee consistency between TLB and page tables.
-	 */
-
-	if (is_accessed_spte(old_spte) && !is_accessed_spte(new_spte)) {
-		flush = true;
-		kvm_set_pfn_accessed(spte_to_pfn(old_spte));
-	}
-
-	if (is_dirty_spte(old_spte) && !is_dirty_spte(new_spte)) {
-		flush = true;
-		kvm_set_pfn_dirty(spte_to_pfn(old_spte));
-	}
-
-	return flush;
-}
-
-/*
- * Rules for using mmu_spte_clear_track_bits:
- * It sets the sptep from present to nonpresent, and track the
- * state bits, it is used to clear the last level sptep.
- * Returns the old PTE.
- */
-static u64 mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep)
-{
-	kvm_pfn_t pfn;
-	u64 old_spte = *sptep;
-	int level = sptep_to_sp(sptep)->role.level;
-	struct page *page;
-
-	if (!is_shadow_present_pte(old_spte) ||
-	    !spte_has_volatile_bits(old_spte))
-		__update_clear_spte_fast(sptep, 0ull);
-	else
-		old_spte = __update_clear_spte_slow(sptep, 0ull);
-
-	if (!is_shadow_present_pte(old_spte))
-		return old_spte;
-
-	kvm_update_page_stats(kvm, level, -1);
-
-	pfn = spte_to_pfn(old_spte);
-
-	/*
-	 * KVM doesn't hold a reference to any pages mapped into the guest, and
-	 * instead uses the mmu_notifier to ensure that KVM unmaps any pages
-	 * before they are reclaimed.  Sanity check that, if the pfn is backed
-	 * by a refcounted page, the refcount is elevated.
-	 */
-	page = kvm_pfn_to_refcounted_page(pfn);
-	WARN_ON(page && !page_count(page));
-
-	if (is_accessed_spte(old_spte))
-		kvm_set_pfn_accessed(pfn);
-
-	if (is_dirty_spte(old_spte))
-		kvm_set_pfn_dirty(pfn);
-
-	return old_spte;
-}
-
-/*
- * Rules for using mmu_spte_clear_no_track:
- * Directly clear spte without caring the state bits of sptep,
- * it is used to set the upper level spte.
- */
-static void mmu_spte_clear_no_track(u64 *sptep)
-{
-	__update_clear_spte_fast(sptep, 0ull);
-}
-
-static u64 mmu_spte_get_lockless(u64 *sptep)
-{
-	return __get_spte_lockless(sptep);
-}
-
-/* Returns the Accessed status of the PTE and resets it at the same time. */
-static bool mmu_spte_age(u64 *sptep)
-{
-	u64 spte = mmu_spte_get_lockless(sptep);
-
-	if (!is_accessed_spte(spte))
-		return false;
-
-	if (spte_ad_enabled(spte)) {
-		clear_bit((ffs(shadow_accessed_mask) - 1),
-			  (unsigned long *)sptep);
-	} else {
-		/*
-		 * Capture the dirty status of the page, so that it doesn't get
-		 * lost when the SPTE is marked for access tracking.
-		 */
-		if (is_writable_pte(spte))
-			kvm_set_pfn_dirty(spte_to_pfn(spte));
-
-		spte = mark_spte_for_access_track(spte);
-		mmu_spte_update_no_track(sptep, spte);
-	}
-
-	return true;
-}
-
 static inline bool is_tdp_mmu_active(struct kvm_vcpu *vcpu)
 {
 	return tdp_mmu_enabled && vcpu->arch.mmu->root_role.direct;
@@ -678,77 +318,6 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
 	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
 }
 
-static void mmu_free_pte_list_desc(struct pte_list_desc *pte_list_desc)
-{
-	kmem_cache_free(pte_list_desc_cache, pte_list_desc);
-}
-
-static bool sp_has_gptes(struct kvm_mmu_page *sp);
-
-static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
-{
-	if (sp->role.passthrough)
-		return sp->gfn;
-
-	if (!sp->role.direct)
-		return sp->shadowed_translation[index] >> PAGE_SHIFT;
-
-	return sp->gfn + (index << ((sp->role.level - 1) * SPTE_LEVEL_BITS));
-}
-
-/*
- * For leaf SPTEs, fetch the *guest* access permissions being shadowed. Note
- * that the SPTE itself may have a more constrained access permissions that
- * what the guest enforces. For example, a guest may create an executable
- * huge PTE but KVM may disallow execution to mitigate iTLB multihit.
- */
-static u32 kvm_mmu_page_get_access(struct kvm_mmu_page *sp, int index)
-{
-	if (sp_has_gptes(sp))
-		return sp->shadowed_translation[index] & ACC_ALL;
-
-	/*
-	 * For direct MMUs (e.g. TDP or non-paging guests) or passthrough SPs,
-	 * KVM is not shadowing any guest page tables, so the "guest access
-	 * permissions" are just ACC_ALL.
-	 *
-	 * For direct SPs in indirect MMUs (shadow paging), i.e. when KVM
-	 * is shadowing a guest huge page with small pages, the guest access
-	 * permissions being shadowed are the access permissions of the huge
-	 * page.
-	 *
-	 * In both cases, sp->role.access contains the correct access bits.
-	 */
-	return sp->role.access;
-}
-
-static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
-					 gfn_t gfn, unsigned int access)
-{
-	if (sp_has_gptes(sp)) {
-		sp->shadowed_translation[index] = (gfn << PAGE_SHIFT) | access;
-		return;
-	}
-
-	WARN_ONCE(access != kvm_mmu_page_get_access(sp, index),
-	          "access mismatch under %s page %llx (expected %u, got %u)\n",
-	          sp->role.passthrough ? "passthrough" : "direct",
-	          sp->gfn, kvm_mmu_page_get_access(sp, index), access);
-
-	WARN_ONCE(gfn != kvm_mmu_page_get_gfn(sp, index),
-	          "gfn mismatch under %s page %llx (expected %llx, got %llx)\n",
-	          sp->role.passthrough ? "passthrough" : "direct",
-	          sp->gfn, kvm_mmu_page_get_gfn(sp, index), gfn);
-}
-
-static void kvm_mmu_page_set_access(struct kvm_mmu_page *sp, int index,
-				    unsigned int access)
-{
-	gfn_t gfn = kvm_mmu_page_get_gfn(sp, index);
-
-	kvm_mmu_page_set_translation(sp, index, gfn, access);
-}
-
 /*
  * Return the pointer to the large page information for a given gfn,
  * handling slots that are not large page aligned.
@@ -785,28 +354,6 @@ void kvm_mmu_gfn_allow_lpage(const struct kvm_memory_slot *slot, gfn_t gfn)
 	update_gfn_disallow_lpage_count(slot, gfn, -1);
 }
 
-static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
-{
-	struct kvm_memslots *slots;
-	struct kvm_memory_slot *slot;
-	gfn_t gfn;
-
-	kvm->arch.indirect_shadow_pages++;
-	gfn = sp->gfn;
-	slots = kvm_memslots_for_spte_role(kvm, sp->role);
-	slot = __gfn_to_memslot(slots, gfn);
-
-	/* the non-leaf shadow pages are keeping readonly. */
-	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_slot_page_track_add_page(kvm, slot, gfn,
-						    KVM_PAGE_TRACK_WRITE);
-
-	kvm_mmu_gfn_disallow_lpage(slot, gfn);
-
-	if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn, PG_LEVEL_4K))
-		kvm_flush_remote_tlbs_with_address(kvm, gfn, 1);
-}
-
 void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	/*
@@ -834,23 +381,6 @@ void account_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp,
 		track_possible_nx_huge_page(kvm, sp);
 }
 
-static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
-{
-	struct kvm_memslots *slots;
-	struct kvm_memory_slot *slot;
-	gfn_t gfn;
-
-	kvm->arch.indirect_shadow_pages--;
-	gfn = sp->gfn;
-	slots = kvm_memslots_for_spte_role(kvm, sp->role);
-	slot = __gfn_to_memslot(slots, gfn);
-	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_slot_page_track_remove_page(kvm, slot, gfn,
-						       KVM_PAGE_TRACK_WRITE);
-
-	kvm_mmu_gfn_allow_lpage(slot, gfn);
-}
-
 void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	if (list_empty(&sp->possible_nx_huge_page_link))
@@ -881,436 +411,51 @@ struct kvm_memory_slot *gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu,
 	return slot;
 }
 
-/*
- * About rmap_head encoding:
+/**
+ * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages
+ * @kvm: kvm instance
+ * @slot: slot to protect
+ * @gfn_offset: start of the BITS_PER_LONG pages we care about
+ * @mask: indicates which pages we should protect
  *
- * If the bit zero of rmap_head->val is clear, then it points to the only spte
- * in this rmap chain. Otherwise, (rmap_head->val & ~1) points to a struct
- * pte_list_desc containing more mappings.
- */
-
-/*
- * Returns the number of pointers in the rmap chain, not counting the new one.
+ * Used when we do not need to care about huge page mappings.
  */
-static int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte,
-			struct kvm_rmap_head *rmap_head)
-{
-	struct pte_list_desc *desc;
-	int count = 0;
-
-	if (!rmap_head->val) {
-		rmap_printk("%p %llx 0->1\n", spte, *spte);
-		rmap_head->val = (unsigned long)spte;
-	} else if (!(rmap_head->val & 1)) {
-		rmap_printk("%p %llx 1->many\n", spte, *spte);
-		desc = kvm_mmu_memory_cache_alloc(cache);
-		desc->sptes[0] = (u64 *)rmap_head->val;
-		desc->sptes[1] = spte;
-		desc->spte_count = 2;
-		rmap_head->val = (unsigned long)desc | 1;
-		++count;
-	} else {
-		rmap_printk("%p %llx many->many\n", spte, *spte);
-		desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
-		while (desc->spte_count == PTE_LIST_EXT) {
-			count += PTE_LIST_EXT;
-			if (!desc->more) {
-				desc->more = kvm_mmu_memory_cache_alloc(cache);
-				desc = desc->more;
-				desc->spte_count = 0;
-				break;
-			}
-			desc = desc->more;
-		}
-		count += desc->spte_count;
-		desc->sptes[desc->spte_count++] = spte;
-	}
-	return count;
-}
-
-static void pte_list_desc_remove_entry(struct kvm_rmap_head *rmap_head,
-				       struct pte_list_desc *desc, int i,
-				       struct pte_list_desc *prev_desc)
+static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
+				     struct kvm_memory_slot *slot,
+				     gfn_t gfn_offset, unsigned long mask)
 {
-	int j = desc->spte_count - 1;
+	struct kvm_rmap_head *rmap_head;
+
+	if (tdp_mmu_enabled)
+		kvm_tdp_mmu_clear_dirty_pt_masked(kvm, slot,
+				slot->base_gfn + gfn_offset, mask, true);
 
-	desc->sptes[i] = desc->sptes[j];
-	desc->sptes[j] = NULL;
-	desc->spte_count--;
-	if (desc->spte_count)
+	if (!kvm_memslots_have_rmaps(kvm))
 		return;
-	if (!prev_desc && !desc->more)
-		rmap_head->val = 0;
-	else
-		if (prev_desc)
-			prev_desc->more = desc->more;
-		else
-			rmap_head->val = (unsigned long)desc->more | 1;
-	mmu_free_pte_list_desc(desc);
-}
 
-static void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
-{
-	struct pte_list_desc *desc;
-	struct pte_list_desc *prev_desc;
-	int i;
+	while (mask) {
+		rmap_head = gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
+					PG_LEVEL_4K, slot);
+		rmap_write_protect(rmap_head, false);
 
-	if (!rmap_head->val) {
-		pr_err("%s: %p 0->BUG\n", __func__, spte);
-		BUG();
-	} else if (!(rmap_head->val & 1)) {
-		rmap_printk("%p 1->0\n", spte);
-		if ((u64 *)rmap_head->val != spte) {
-			pr_err("%s:  %p 1->BUG\n", __func__, spte);
-			BUG();
-		}
-		rmap_head->val = 0;
-	} else {
-		rmap_printk("%p many->many\n", spte);
-		desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
-		prev_desc = NULL;
-		while (desc) {
-			for (i = 0; i < desc->spte_count; ++i) {
-				if (desc->sptes[i] == spte) {
-					pte_list_desc_remove_entry(rmap_head,
-							desc, i, prev_desc);
-					return;
-				}
-			}
-			prev_desc = desc;
-			desc = desc->more;
-		}
-		pr_err("%s: %p many->many\n", __func__, spte);
-		BUG();
+		/* clear the first set bit */
+		mask &= mask - 1;
 	}
 }
 
-static void kvm_zap_one_rmap_spte(struct kvm *kvm,
-				  struct kvm_rmap_head *rmap_head, u64 *sptep)
-{
-	mmu_spte_clear_track_bits(kvm, sptep);
-	pte_list_remove(sptep, rmap_head);
-}
-
-/* Return true if at least one SPTE was zapped, false otherwise */
-static bool kvm_zap_all_rmap_sptes(struct kvm *kvm,
-				   struct kvm_rmap_head *rmap_head)
-{
-	struct pte_list_desc *desc, *next;
-	int i;
-
-	if (!rmap_head->val)
-		return false;
-
-	if (!(rmap_head->val & 1)) {
-		mmu_spte_clear_track_bits(kvm, (u64 *)rmap_head->val);
-		goto out;
-	}
-
-	desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
-
-	for (; desc; desc = next) {
-		for (i = 0; i < desc->spte_count; i++)
-			mmu_spte_clear_track_bits(kvm, desc->sptes[i]);
-		next = desc->more;
-		mmu_free_pte_list_desc(desc);
-	}
-out:
-	/* rmap_head is meaningless now, remember to reset it */
-	rmap_head->val = 0;
-	return true;
-}
-
-unsigned int pte_list_count(struct kvm_rmap_head *rmap_head)
-{
-	struct pte_list_desc *desc;
-	unsigned int count = 0;
-
-	if (!rmap_head->val)
-		return 0;
-	else if (!(rmap_head->val & 1))
-		return 1;
-
-	desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
-
-	while (desc) {
-		count += desc->spte_count;
-		desc = desc->more;
-	}
-
-	return count;
-}
-
-static struct kvm_rmap_head *gfn_to_rmap(gfn_t gfn, int level,
-					 const struct kvm_memory_slot *slot)
-{
-	unsigned long idx;
-
-	idx = gfn_to_index(gfn, slot->base_gfn, level);
-	return &slot->arch.rmap[level - PG_LEVEL_4K][idx];
-}
-
-static bool rmap_can_add(struct kvm_vcpu *vcpu)
-{
-	struct kvm_mmu_memory_cache *mc;
-
-	mc = &vcpu->arch.mmu_pte_list_desc_cache;
-	return kvm_mmu_memory_cache_nr_free_objects(mc);
-}
-
-static void rmap_remove(struct kvm *kvm, u64 *spte)
-{
-	struct kvm_memslots *slots;
-	struct kvm_memory_slot *slot;
-	struct kvm_mmu_page *sp;
-	gfn_t gfn;
-	struct kvm_rmap_head *rmap_head;
-
-	sp = sptep_to_sp(spte);
-	gfn = kvm_mmu_page_get_gfn(sp, spte_index(spte));
-
-	/*
-	 * Unlike rmap_add, rmap_remove does not run in the context of a vCPU
-	 * so we have to determine which memslots to use based on context
-	 * information in sp->role.
-	 */
-	slots = kvm_memslots_for_spte_role(kvm, sp->role);
-
-	slot = __gfn_to_memslot(slots, gfn);
-	rmap_head = gfn_to_rmap(gfn, sp->role.level, slot);
-
-	pte_list_remove(spte, rmap_head);
-}
-
-/*
- * Used by the following functions to iterate through the sptes linked by a
- * rmap.  All fields are private and not assumed to be used outside.
- */
-struct rmap_iterator {
-	/* private fields */
-	struct pte_list_desc *desc;	/* holds the sptep if not NULL */
-	int pos;			/* index of the sptep */
-};
-
-/*
- * Iteration must be started by this function.  This should also be used after
- * removing/dropping sptes from the rmap link because in such cases the
- * information in the iterator may not be valid.
- *
- * Returns sptep if found, NULL otherwise.
- */
-static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head,
-			   struct rmap_iterator *iter)
-{
-	u64 *sptep;
-
-	if (!rmap_head->val)
-		return NULL;
-
-	if (!(rmap_head->val & 1)) {
-		iter->desc = NULL;
-		sptep = (u64 *)rmap_head->val;
-		goto out;
-	}
-
-	iter->desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
-	iter->pos = 0;
-	sptep = iter->desc->sptes[iter->pos];
-out:
-	BUG_ON(!is_shadow_present_pte(*sptep));
-	return sptep;
-}
-
-/*
- * Must be used with a valid iterator: e.g. after rmap_get_first().
- *
- * Returns sptep if found, NULL otherwise.
- */
-static u64 *rmap_get_next(struct rmap_iterator *iter)
-{
-	u64 *sptep;
-
-	if (iter->desc) {
-		if (iter->pos < PTE_LIST_EXT - 1) {
-			++iter->pos;
-			sptep = iter->desc->sptes[iter->pos];
-			if (sptep)
-				goto out;
-		}
-
-		iter->desc = iter->desc->more;
-
-		if (iter->desc) {
-			iter->pos = 0;
-			/* desc->sptes[0] cannot be NULL */
-			sptep = iter->desc->sptes[iter->pos];
-			goto out;
-		}
-	}
-
-	return NULL;
-out:
-	BUG_ON(!is_shadow_present_pte(*sptep));
-	return sptep;
-}
-
-#define for_each_rmap_spte(_rmap_head_, _iter_, _spte_)			\
-	for (_spte_ = rmap_get_first(_rmap_head_, _iter_);		\
-	     _spte_; _spte_ = rmap_get_next(_iter_))
-
-static void drop_spte(struct kvm *kvm, u64 *sptep)
-{
-	u64 old_spte = mmu_spte_clear_track_bits(kvm, sptep);
-
-	if (is_shadow_present_pte(old_spte))
-		rmap_remove(kvm, sptep);
-}
-
-static void drop_large_spte(struct kvm *kvm, u64 *sptep, bool flush)
-{
-	struct kvm_mmu_page *sp;
-
-	sp = sptep_to_sp(sptep);
-	WARN_ON(sp->role.level == PG_LEVEL_4K);
-
-	drop_spte(kvm, sptep);
-
-	if (flush)
-		kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
-			KVM_PAGES_PER_HPAGE(sp->role.level));
-}
-
-/*
- * Write-protect on the specified @sptep, @pt_protect indicates whether
- * spte write-protection is caused by protecting shadow page table.
- *
- * Note: write protection is difference between dirty logging and spte
- * protection:
- * - for dirty logging, the spte can be set to writable at anytime if
- *   its dirty bitmap is properly set.
- * - for spte protection, the spte can be writable only after unsync-ing
- *   shadow page.
- *
- * Return true if tlb need be flushed.
- */
-static bool spte_write_protect(u64 *sptep, bool pt_protect)
-{
-	u64 spte = *sptep;
-
-	if (!is_writable_pte(spte) &&
-	    !(pt_protect && is_mmu_writable_spte(spte)))
-		return false;
-
-	rmap_printk("spte %p %llx\n", sptep, *sptep);
-
-	if (pt_protect)
-		spte &= ~shadow_mmu_writable_mask;
-	spte = spte & ~PT_WRITABLE_MASK;
-
-	return mmu_spte_update(sptep, spte);
-}
-
-static bool rmap_write_protect(struct kvm_rmap_head *rmap_head,
-			       bool pt_protect)
-{
-	u64 *sptep;
-	struct rmap_iterator iter;
-	bool flush = false;
-
-	for_each_rmap_spte(rmap_head, &iter, sptep)
-		flush |= spte_write_protect(sptep, pt_protect);
-
-	return flush;
-}
-
-static bool spte_clear_dirty(u64 *sptep)
-{
-	u64 spte = *sptep;
-
-	rmap_printk("spte %p %llx\n", sptep, *sptep);
-
-	MMU_WARN_ON(!spte_ad_enabled(spte));
-	spte &= ~shadow_dirty_mask;
-	return mmu_spte_update(sptep, spte);
-}
-
-static bool spte_wrprot_for_clear_dirty(u64 *sptep)
-{
-	bool was_writable = test_and_clear_bit(PT_WRITABLE_SHIFT,
-					       (unsigned long *)sptep);
-	if (was_writable && !spte_ad_enabled(*sptep))
-		kvm_set_pfn_dirty(spte_to_pfn(*sptep));
-
-	return was_writable;
-}
-
-/*
- * Gets the GFN ready for another round of dirty logging by clearing the
- *	- D bit on ad-enabled SPTEs, and
- *	- W bit on ad-disabled SPTEs.
- * Returns true iff any D or W bits were cleared.
- */
-static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
-			       const struct kvm_memory_slot *slot)
-{
-	u64 *sptep;
-	struct rmap_iterator iter;
-	bool flush = false;
-
-	for_each_rmap_spte(rmap_head, &iter, sptep)
-		if (spte_ad_need_write_protect(*sptep))
-			flush |= spte_wrprot_for_clear_dirty(sptep);
-		else
-			flush |= spte_clear_dirty(sptep);
-
-	return flush;
-}
-
-/**
- * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages
- * @kvm: kvm instance
- * @slot: slot to protect
- * @gfn_offset: start of the BITS_PER_LONG pages we care about
- * @mask: indicates which pages we should protect
- *
- * Used when we do not need to care about huge page mappings.
- */
-static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
-				     struct kvm_memory_slot *slot,
-				     gfn_t gfn_offset, unsigned long mask)
-{
-	struct kvm_rmap_head *rmap_head;
-
-	if (tdp_mmu_enabled)
-		kvm_tdp_mmu_clear_dirty_pt_masked(kvm, slot,
-				slot->base_gfn + gfn_offset, mask, true);
-
-	if (!kvm_memslots_have_rmaps(kvm))
-		return;
-
-	while (mask) {
-		rmap_head = gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
-					PG_LEVEL_4K, slot);
-		rmap_write_protect(rmap_head, false);
-
-		/* clear the first set bit */
-		mask &= mask - 1;
-	}
-}
-
-/**
- * kvm_mmu_clear_dirty_pt_masked - clear MMU D-bit for PT level pages, or write
- * protect the page if the D-bit isn't supported.
- * @kvm: kvm instance
- * @slot: slot to clear D-bit
- * @gfn_offset: start of the BITS_PER_LONG pages we care about
- * @mask: indicates which pages we should clear D-bit
- *
- * Used for PML to re-log the dirty GPAs after userspace querying dirty_bitmap.
- */
-static void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
-					 struct kvm_memory_slot *slot,
-					 gfn_t gfn_offset, unsigned long mask)
+/**
+ * kvm_mmu_clear_dirty_pt_masked - clear MMU D-bit for PT level pages, or write
+ * protect the page if the D-bit isn't supported.
+ * @kvm: kvm instance
+ * @slot: slot to clear D-bit
+ * @gfn_offset: start of the BITS_PER_LONG pages we care about
+ * @mask: indicates which pages we should clear D-bit
+ *
+ * Used for PML to re-log the dirty GPAs after userspace querying dirty_bitmap.
+ */
+static void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
+					 struct kvm_memory_slot *slot,
+					 gfn_t gfn_offset, unsigned long mask)
 {
 	struct kvm_rmap_head *rmap_head;
 
@@ -1412,147 +557,6 @@ bool kvm_vcpu_write_protect_gfn(struct kvm_vcpu *vcpu, u64 gfn)
 	return kvm_mmu_slot_gfn_write_protect(vcpu->kvm, slot, gfn, PG_LEVEL_4K);
 }
 
-static bool __kvm_zap_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
-			   const struct kvm_memory_slot *slot)
-{
-	return kvm_zap_all_rmap_sptes(kvm, rmap_head);
-}
-
-static bool kvm_zap_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
-			 struct kvm_memory_slot *slot, gfn_t gfn, int level,
-			 pte_t unused)
-{
-	return __kvm_zap_rmap(kvm, rmap_head, slot);
-}
-
-static bool kvm_set_pte_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
-			     struct kvm_memory_slot *slot, gfn_t gfn, int level,
-			     pte_t pte)
-{
-	u64 *sptep;
-	struct rmap_iterator iter;
-	bool need_flush = false;
-	u64 new_spte;
-	kvm_pfn_t new_pfn;
-
-	WARN_ON(pte_huge(pte));
-	new_pfn = pte_pfn(pte);
-
-restart:
-	for_each_rmap_spte(rmap_head, &iter, sptep) {
-		rmap_printk("spte %p %llx gfn %llx (%d)\n",
-			    sptep, *sptep, gfn, level);
-
-		need_flush = true;
-
-		if (pte_write(pte)) {
-			kvm_zap_one_rmap_spte(kvm, rmap_head, sptep);
-			goto restart;
-		} else {
-			new_spte = kvm_mmu_changed_pte_notifier_make_spte(
-					*sptep, new_pfn);
-
-			mmu_spte_clear_track_bits(kvm, sptep);
-			mmu_spte_set(sptep, new_spte);
-		}
-	}
-
-	if (need_flush && kvm_available_flush_tlb_with_range()) {
-		kvm_flush_remote_tlbs_with_address(kvm, gfn, 1);
-		return false;
-	}
-
-	return need_flush;
-}
-
-struct slot_rmap_walk_iterator {
-	/* input fields. */
-	const struct kvm_memory_slot *slot;
-	gfn_t start_gfn;
-	gfn_t end_gfn;
-	int start_level;
-	int end_level;
-
-	/* output fields. */
-	gfn_t gfn;
-	struct kvm_rmap_head *rmap;
-	int level;
-
-	/* private field. */
-	struct kvm_rmap_head *end_rmap;
-};
-
-static void rmap_walk_init_level(struct slot_rmap_walk_iterator *iterator,
-				 int level)
-{
-	iterator->level = level;
-	iterator->gfn = iterator->start_gfn;
-	iterator->rmap = gfn_to_rmap(iterator->gfn, level, iterator->slot);
-	iterator->end_rmap = gfn_to_rmap(iterator->end_gfn, level, iterator->slot);
-}
-
-static void slot_rmap_walk_init(struct slot_rmap_walk_iterator *iterator,
-				const struct kvm_memory_slot *slot,
-				int start_level, int end_level,
-				gfn_t start_gfn, gfn_t end_gfn)
-{
-	iterator->slot = slot;
-	iterator->start_level = start_level;
-	iterator->end_level = end_level;
-	iterator->start_gfn = start_gfn;
-	iterator->end_gfn = end_gfn;
-
-	rmap_walk_init_level(iterator, iterator->start_level);
-}
-
-static bool slot_rmap_walk_okay(struct slot_rmap_walk_iterator *iterator)
-{
-	return !!iterator->rmap;
-}
-
-static void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
-{
-	while (++iterator->rmap <= iterator->end_rmap) {
-		iterator->gfn += (1UL << KVM_HPAGE_GFN_SHIFT(iterator->level));
-
-		if (iterator->rmap->val)
-			return;
-	}
-
-	if (++iterator->level > iterator->end_level) {
-		iterator->rmap = NULL;
-		return;
-	}
-
-	rmap_walk_init_level(iterator, iterator->level);
-}
-
-#define for_each_slot_rmap_range(_slot_, _start_level_, _end_level_,	\
-	   _start_gfn, _end_gfn, _iter_)				\
-	for (slot_rmap_walk_init(_iter_, _slot_, _start_level_,		\
-				 _end_level_, _start_gfn, _end_gfn);	\
-	     slot_rmap_walk_okay(_iter_);				\
-	     slot_rmap_walk_next(_iter_))
-
-typedef bool (*rmap_handler_t)(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
-			       struct kvm_memory_slot *slot, gfn_t gfn,
-			       int level, pte_t pte);
-
-static __always_inline bool kvm_handle_gfn_range(struct kvm *kvm,
-						 struct kvm_gfn_range *range,
-						 rmap_handler_t handler)
-{
-	struct slot_rmap_walk_iterator iterator;
-	bool ret = false;
-
-	for_each_slot_rmap_range(range->slot, PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL,
-				 range->start, range->end - 1, &iterator)
-		ret |= handler(kvm, iterator.rmap, range->slot, iterator.gfn,
-			       iterator.level, range->pte);
-
-	return ret;
-}
-
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 {
 	bool flush = false;
@@ -1579,68 +583,6 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 	return flush;
 }
 
-static bool kvm_age_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
-			 struct kvm_memory_slot *slot, gfn_t gfn, int level,
-			 pte_t unused)
-{
-	u64 *sptep;
-	struct rmap_iterator iter;
-	int young = 0;
-
-	for_each_rmap_spte(rmap_head, &iter, sptep)
-		young |= mmu_spte_age(sptep);
-
-	return young;
-}
-
-static bool kvm_test_age_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
-			      struct kvm_memory_slot *slot, gfn_t gfn,
-			      int level, pte_t unused)
-{
-	u64 *sptep;
-	struct rmap_iterator iter;
-
-	for_each_rmap_spte(rmap_head, &iter, sptep)
-		if (is_accessed_spte(*sptep))
-			return true;
-	return false;
-}
-
-#define RMAP_RECYCLE_THRESHOLD 1000
-
-static void __rmap_add(struct kvm *kvm,
-		       struct kvm_mmu_memory_cache *cache,
-		       const struct kvm_memory_slot *slot,
-		       u64 *spte, gfn_t gfn, unsigned int access)
-{
-	struct kvm_mmu_page *sp;
-	struct kvm_rmap_head *rmap_head;
-	int rmap_count;
-
-	sp = sptep_to_sp(spte);
-	kvm_mmu_page_set_translation(sp, spte_index(spte), gfn, access);
-	kvm_update_page_stats(kvm, sp->role.level, 1);
-
-	rmap_head = gfn_to_rmap(gfn, sp->role.level, slot);
-	rmap_count = pte_list_add(cache, spte, rmap_head);
-
-	if (rmap_count > kvm->stat.max_mmu_rmap_size)
-		kvm->stat.max_mmu_rmap_size = rmap_count;
-	if (rmap_count > RMAP_RECYCLE_THRESHOLD) {
-		kvm_zap_all_rmap_sptes(kvm, rmap_head);
-		kvm_flush_remote_tlbs_with_address(
-				kvm, sp->gfn, KVM_PAGES_PER_HPAGE(sp->role.level));
-	}
-}
-
-static void rmap_add(struct kvm_vcpu *vcpu, const struct kvm_memory_slot *slot,
-		     u64 *spte, gfn_t gfn, unsigned int access)
-{
-	struct kvm_mmu_memory_cache *cache = &vcpu->arch.mmu_pte_list_desc_cache;
-
-	__rmap_add(vcpu->kvm, cache, slot, spte, gfn, access);
-}
-
 bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
 	bool young = false;
@@ -1667,2315 +609,571 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 	return young;
 }
 
-#ifdef MMU_DEBUG
-static int is_empty_shadow_page(u64 *spt)
+bool kvm_mmu_remote_flush_or_zap(struct kvm *kvm, struct list_head *invalid_list,
+				 bool remote_flush)
 {
-	u64 *pos;
-	u64 *end;
+	if (!remote_flush && list_empty(invalid_list))
+		return false;
 
-	for (pos = spt, end = pos + SPTE_ENT_PER_PAGE; pos != end; pos++)
-		if (is_shadow_present_pte(*pos)) {
-			printk(KERN_ERR "%s: %p %llx\n", __func__,
-			       pos, *pos);
-			return 0;
-		}
-	return 1;
+	if (!list_empty(invalid_list))
+		kvm_mmu_commit_zap_page(kvm, invalid_list);
+	else
+		kvm_flush_remote_tlbs(kvm);
+	return true;
 }
-#endif
 
-/*
- * This value is the sum of all of the kvm instances's
- * kvm->arch.n_used_mmu_pages values.  We need a global,
- * aggregate version in order to make the slab shrinker
- * faster
- */
-static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, long nr)
+bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
-	kvm->arch.n_used_mmu_pages += nr;
-	percpu_counter_add(&kvm_total_used_mmu_pages, nr);
-}
+	if (sp->role.invalid)
+		return true;
 
-static void kvm_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
-{
-	kvm_mod_used_mmu_pages(kvm, +1);
-	kvm_account_pgtable_pages((void *)sp->spt, +1);
+	/* TDP MMU pages do not use the MMU generation. */
+	return !is_tdp_mmu_page(sp) &&
+	       unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
 }
 
-static void kvm_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+/*
+ * Lookup the mapping level for @gfn in the current mm.
+ *
+ * WARNING!  Use of host_pfn_mapping_level() requires the caller and the end
+ * consumer to be tied into KVM's handlers for MMU notifier events!
+ *
+ * There are several ways to safely use this helper:
+ *
+ * - Check mmu_invalidate_retry_hva() after grabbing the mapping level, before
+ *   consuming it.  In this case, mmu_lock doesn't need to be held during the
+ *   lookup, but it does need to be held while checking the MMU notifier.
+ *
+ * - Hold mmu_lock AND ensure there is no in-progress MMU notifier invalidation
+ *   event for the hva.  This can be done by explicitly checking the MMU notifier
+ *   or by ensuring that KVM already has a valid mapping that covers the hva.
+ *
+ * - Do not use the result to install new mappings, e.g. use the host mapping
+ *   level only to decide whether or not to zap an entry.  In this case, it's
+ *   not required to hold mmu_lock (though it's highly likely the caller will
+ *   want to hold mmu_lock anyways, e.g. to modify SPTEs).
+ *
+ * Note!  The lookup can still race with modifications to host page tables, but
+ * the above "rules" ensure KVM will not _consume_ the result of the walk if a
+ * race with the primary MMU occurs.
+ */
+static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
+				  const struct kvm_memory_slot *slot)
 {
-	kvm_mod_used_mmu_pages(kvm, -1);
-	kvm_account_pgtable_pages((void *)sp->spt, -1);
-}
+	int level = PG_LEVEL_4K;
+	unsigned long hva;
+	unsigned long flags;
+	pgd_t pgd;
+	p4d_t p4d;
+	pud_t pud;
+	pmd_t pmd;
 
-static void kvm_mmu_free_shadow_page(struct kvm_mmu_page *sp)
-{
-	MMU_WARN_ON(!is_empty_shadow_page(sp->spt));
-	hlist_del(&sp->hash_link);
-	list_del(&sp->link);
-	free_page((unsigned long)sp->spt);
-	if (!sp->role.direct)
-		free_page((unsigned long)sp->shadowed_translation);
-	kmem_cache_free(mmu_page_header_cache, sp);
-}
+	/*
+	 * Note, using the already-retrieved memslot and __gfn_to_hva_memslot()
+	 * is not solely for performance, it's also necessary to avoid the
+	 * "writable" check in __gfn_to_hva_many(), which will always fail on
+	 * read-only memslots due to gfn_to_hva() assuming writes.  Earlier
+	 * page fault steps have already verified the guest isn't writing a
+	 * read-only memslot.
+	 */
+	hva = __gfn_to_hva_memslot(slot, gfn);
 
-static unsigned kvm_page_table_hashfn(gfn_t gfn)
-{
-	return hash_64(gfn, KVM_MMU_HASH_SHIFT);
-}
+	/*
+	 * Disable IRQs to prevent concurrent tear down of host page tables,
+	 * e.g. if the primary MMU promotes a P*D to a huge page and then frees
+	 * the original page table.
+	 */
+	local_irq_save(flags);
 
-static void mmu_page_add_parent_pte(struct kvm_mmu_memory_cache *cache,
-				    struct kvm_mmu_page *sp, u64 *parent_pte)
-{
-	if (!parent_pte)
-		return;
+	/*
+	 * Read each entry once.  As above, a non-leaf entry can be promoted to
+	 * a huge page _during_ this walk.  Re-reading the entry could send the
+	 * walk into the weeds, e.g. p*d_large() returns false (sees the old
+	 * value) and then p*d_offset() walks into the target huge page instead
+	 * of the old page table (sees the new value).
+	 */
+	pgd = READ_ONCE(*pgd_offset(kvm->mm, hva));
+	if (pgd_none(pgd))
+		goto out;
 
-	pte_list_add(cache, parent_pte, &sp->parent_ptes);
-}
+	p4d = READ_ONCE(*p4d_offset(&pgd, hva));
+	if (p4d_none(p4d) || !p4d_present(p4d))
+		goto out;
 
-static void mmu_page_remove_parent_pte(struct kvm_mmu_page *sp,
-				       u64 *parent_pte)
-{
-	pte_list_remove(parent_pte, &sp->parent_ptes);
-}
+	pud = READ_ONCE(*pud_offset(&p4d, hva));
+	if (pud_none(pud) || !pud_present(pud))
+		goto out;
 
-static void drop_parent_pte(struct kvm_mmu_page *sp,
-			    u64 *parent_pte)
-{
-	mmu_page_remove_parent_pte(sp, parent_pte);
-	mmu_spte_clear_no_track(parent_pte);
+	if (pud_large(pud)) {
+		level = PG_LEVEL_1G;
+		goto out;
+	}
+
+	pmd = READ_ONCE(*pmd_offset(&pud, hva));
+	if (pmd_none(pmd) || !pmd_present(pmd))
+		goto out;
+
+	if (pmd_large(pmd))
+		level = PG_LEVEL_2M;
+
+out:
+	local_irq_restore(flags);
+	return level;
 }
 
-static void mark_unsync(u64 *spte);
-static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
+int kvm_mmu_max_mapping_level(struct kvm *kvm,
+			      const struct kvm_memory_slot *slot, gfn_t gfn,
+			      int max_level)
 {
-	u64 *sptep;
-	struct rmap_iterator iter;
+	struct kvm_lpage_info *linfo;
+	int host_level;
 
-	for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
-		mark_unsync(sptep);
+	max_level = min(max_level, max_huge_page_level);
+	for ( ; max_level > PG_LEVEL_4K; max_level--) {
+		linfo = lpage_info_slot(gfn, slot, max_level);
+		if (!linfo->disallow_lpage)
+			break;
 	}
-}
 
-static void mark_unsync(u64 *spte)
-{
-	struct kvm_mmu_page *sp;
+	if (max_level == PG_LEVEL_4K)
+		return PG_LEVEL_4K;
 
-	sp = sptep_to_sp(spte);
-	if (__test_and_set_bit(spte_index(spte), sp->unsync_child_bitmap))
-		return;
-	if (sp->unsync_children++)
-		return;
-	kvm_mmu_mark_parents_unsync(sp);
+	host_level = host_pfn_mapping_level(kvm, gfn, slot);
+	return min(host_level, max_level);
 }
 
-static int nonpaging_sync_page(struct kvm_vcpu *vcpu,
-			       struct kvm_mmu_page *sp)
+void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
-	return -1;
-}
+	struct kvm_memory_slot *slot = fault->slot;
+	kvm_pfn_t mask;
 
-#define KVM_PAGE_ARRAY_NR 16
+	fault->huge_page_disallowed = fault->exec && fault->nx_huge_page_workaround_enabled;
 
-struct kvm_mmu_pages {
-	struct mmu_page_and_offset {
-		struct kvm_mmu_page *sp;
-		unsigned int idx;
-	} page[KVM_PAGE_ARRAY_NR];
-	unsigned int nr;
-};
+	if (unlikely(fault->max_level == PG_LEVEL_4K))
+		return;
 
-static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp,
-			 int idx)
-{
-	int i;
+	if (is_error_noslot_pfn(fault->pfn))
+		return;
 
-	if (sp->unsync)
-		for (i=0; i < pvec->nr; i++)
-			if (pvec->page[i].sp == sp)
-				return 0;
+	if (kvm_slot_dirty_track_enabled(slot))
+		return;
 
-	pvec->page[pvec->nr].sp = sp;
-	pvec->page[pvec->nr].idx = idx;
-	pvec->nr++;
-	return (pvec->nr == KVM_PAGE_ARRAY_NR);
-}
+	/*
+	 * Enforce the iTLB multihit workaround after capturing the requested
+	 * level, which will be used to do precise, accurate accounting.
+	 */
+	fault->req_level = kvm_mmu_max_mapping_level(vcpu->kvm, slot,
+						     fault->gfn, fault->max_level);
+	if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed)
+		return;
 
-static inline void clear_unsync_child_bit(struct kvm_mmu_page *sp, int idx)
-{
-	--sp->unsync_children;
-	WARN_ON((int)sp->unsync_children < 0);
-	__clear_bit(idx, sp->unsync_child_bitmap);
+	/*
+	 * mmu_invalidate_retry() was successful and mmu_lock is held, so
+	 * the pmd can't be split from under us.
+	 */
+	fault->goal_level = fault->req_level;
+	mask = KVM_PAGES_PER_HPAGE(fault->goal_level) - 1;
+	VM_BUG_ON((fault->gfn & mask) != (fault->pfn & mask));
+	fault->pfn &= ~mask;
 }
 
-static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
-			   struct kvm_mmu_pages *pvec)
+void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_level)
 {
-	int i, ret, nr_unsync_leaf = 0;
-
-	for_each_set_bit(i, sp->unsync_child_bitmap, 512) {
-		struct kvm_mmu_page *child;
-		u64 ent = sp->spt[i];
-
-		if (!is_shadow_present_pte(ent) || is_large_pte(ent)) {
-			clear_unsync_child_bit(sp, i);
-			continue;
-		}
-
-		child = spte_to_child_sp(ent);
-
-		if (child->unsync_children) {
-			if (mmu_pages_add(pvec, child, i))
-				return -ENOSPC;
-
-			ret = __mmu_unsync_walk(child, pvec);
-			if (!ret) {
-				clear_unsync_child_bit(sp, i);
-				continue;
-			} else if (ret > 0) {
-				nr_unsync_leaf += ret;
-			} else
-				return ret;
-		} else if (child->unsync) {
-			nr_unsync_leaf++;
-			if (mmu_pages_add(pvec, child, i))
-				return -ENOSPC;
-		} else
-			clear_unsync_child_bit(sp, i);
+	if (cur_level > PG_LEVEL_4K &&
+	    cur_level == fault->goal_level &&
+	    is_shadow_present_pte(spte) &&
+	    !is_large_pte(spte) &&
+	    spte_to_child_sp(spte)->nx_huge_page_disallowed) {
+		/*
+		 * A small SPTE exists for this pfn, but FNAME(fetch),
+		 * direct_map(), or kvm_tdp_mmu_map() would like to create a
+		 * large PTE instead: just force them to go down another level,
+		 * patching back for them into pfn the next 9 bits of the
+		 * address.
+		 */
+		u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
+				KVM_PAGES_PER_HPAGE(cur_level - 1);
+		fault->pfn |= fault->gfn & page_mask;
+		fault->goal_level--;
 	}
-
-	return nr_unsync_leaf;
 }
 
-#define INVALID_INDEX (-1)
-
-static int mmu_unsync_walk(struct kvm_mmu_page *sp,
-			   struct kvm_mmu_pages *pvec)
+static void kvm_send_hwpoison_signal(struct kvm_memory_slot *slot, gfn_t gfn)
 {
-	pvec->nr = 0;
-	if (!sp->unsync_children)
-		return 0;
+	unsigned long hva = gfn_to_hva_memslot(slot, gfn);
 
-	mmu_pages_add(pvec, sp, INVALID_INDEX);
-	return __mmu_unsync_walk(sp, pvec);
+	send_sig_mceerr(BUS_MCEERR_AR, (void __user *)hva, PAGE_SHIFT, current);
 }
 
-static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
-	WARN_ON(!sp->unsync);
-	trace_kvm_mmu_sync_page(sp);
-	sp->unsync = 0;
-	--kvm->stat.mmu_unsync;
-}
-
-static bool kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
-				     struct list_head *invalid_list);
-static void kvm_mmu_commit_zap_page(struct kvm *kvm,
-				    struct list_head *invalid_list);
+	if (is_sigpending_pfn(fault->pfn)) {
+		kvm_handle_signal_exit(vcpu);
+		return -EINTR;
+	}
 
-static bool sp_has_gptes(struct kvm_mmu_page *sp)
-{
-	if (sp->role.direct)
-		return false;
+	/*
+	 * Do not cache the mmio info caused by writing the readonly gfn
+	 * into the spte, otherwise a read access on the readonly gfn could
+	 * also cause an mmio page fault and be treated as an mmio access.
+	 */
+	if (fault->pfn == KVM_PFN_ERR_RO_FAULT)
+		return RET_PF_EMULATE;
 
-	if (sp->role.passthrough)
-		return false;
+	if (fault->pfn == KVM_PFN_ERR_HWPOISON) {
+		kvm_send_hwpoison_signal(fault->slot, fault->gfn);
+		return RET_PF_RETRY;
+	}
 
-	return true;
+	return -EFAULT;
 }
 
-#define for_each_valid_sp(_kvm, _sp, _list)				\
-	hlist_for_each_entry(_sp, _list, hash_link)			\
-		if (is_obsolete_sp((_kvm), (_sp))) {			\
-		} else
+static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
+				   struct kvm_page_fault *fault,
+				   unsigned int access)
+{
+	gva_t gva = fault->is_tdp ? 0 : fault->addr;
 
-#define for_each_gfn_valid_sp_with_gptes(_kvm, _sp, _gfn)		\
-	for_each_valid_sp(_kvm, _sp,					\
-	  &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)])	\
-		if ((_sp)->gfn != (_gfn) || !sp_has_gptes(_sp)) {} else
+	vcpu_cache_mmio_info(vcpu, gva, fault->gfn,
+			     access & shadow_mmio_access_mask);
 
-static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
-			 struct list_head *invalid_list)
-{
-	int ret = vcpu->arch.mmu->sync_page(vcpu, sp);
+	/*
+	 * If MMIO caching is disabled, emulate immediately without
+	 * touching the shadow page tables as attempting to install an
+	 * MMIO SPTE will just be an expensive nop.
+	 */
+	if (unlikely(!enable_mmio_caching))
+		return RET_PF_EMULATE;
 
-	if (ret < 0)
-		kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
-	return ret;
+	/*
+	 * Do not create an MMIO SPTE for a gfn greater than host.MAXPHYADDR,
+	 * any guest that generates such gfns is running nested and is being
+	 * tricked by L0 userspace (you can observe gfn > L1.MAXPHYADDR if and
+	 * only if L1's MAXPHYADDR is inaccurate with respect to the
+	 * hardware's).
+	 */
+	if (unlikely(fault->gfn > kvm_mmu_max_gfn()))
+		return RET_PF_EMULATE;
+
+	return RET_PF_CONTINUE;
 }
 
-bool kvm_mmu_remote_flush_or_zap(struct kvm *kvm, struct list_head *invalid_list,
-				 bool remote_flush)
+static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
 {
-	if (!remote_flush && list_empty(invalid_list))
+	/*
+	 * Page faults with reserved bits set, i.e. faults on MMIO SPTEs, only
+	 * reach the common page fault handler if the SPTE has an invalid MMIO
+	 * generation number.  Refreshing the MMIO generation needs to go down
+	 * the slow path.  Note, EPT Misconfigs do NOT set the PRESENT flag!
+	 */
+	if (fault->rsvd)
 		return false;
 
-	if (!list_empty(invalid_list))
-		kvm_mmu_commit_zap_page(kvm, invalid_list);
-	else
-		kvm_flush_remote_tlbs(kvm);
-	return true;
-}
-
-bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
-{
-	if (sp->role.invalid)
-		return true;
-
-	/* TDP MMU pages do not use the MMU generation. */
-	return !is_tdp_mmu_page(sp) &&
-	       unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
-}
-
-struct mmu_page_path {
-	struct kvm_mmu_page *parent[PT64_ROOT_MAX_LEVEL];
-	unsigned int idx[PT64_ROOT_MAX_LEVEL];
-};
-
-#define for_each_sp(pvec, sp, parents, i)			\
-		for (i = mmu_pages_first(&pvec, &parents);	\
-			i < pvec.nr && ({ sp = pvec.page[i].sp; 1;});	\
-			i = mmu_pages_next(&pvec, &parents, i))
-
-static int mmu_pages_next(struct kvm_mmu_pages *pvec,
-			  struct mmu_page_path *parents,
-			  int i)
-{
-	int n;
-
-	for (n = i+1; n < pvec->nr; n++) {
-		struct kvm_mmu_page *sp = pvec->page[n].sp;
-		unsigned idx = pvec->page[n].idx;
-		int level = sp->role.level;
-
-		parents->idx[level-1] = idx;
-		if (level == PG_LEVEL_4K)
-			break;
-
-		parents->parent[level-2] = sp;
-	}
+	/*
+	 * #PF can be fast if:
+	 *
+	 * 1. The shadow page table entry is not present and A/D bits are
+	 *    disabled _by KVM_, which could mean that the fault is potentially
+	 *    caused by access tracking (if enabled).  If A/D bits are enabled
+	 *    by KVM, but disabled by L1 for L2, KVM is forced to disable A/D
+	 *    bits for L2 and employ access tracking, but the fast page fault
+	 *    mechanism only supports direct MMUs.
+	 * 2. The shadow page table entry is present, the access is a write,
+	 *    and no reserved bits are set (MMIO SPTEs cannot be "fixed"), i.e.
+	 *    the fault was caused by a write-protection violation.  If the
+	 *    SPTE is MMU-writable (determined later), the fault can be fixed
+	 *    by setting the Writable bit, which can be done out of mmu_lock.
+	 */
+	if (!fault->present)
+		return !kvm_ad_enabled();
 
-	return n;
+	/*
+	 * Note, instruction fetches and writes are mutually exclusive, ignore
+	 * the "exec" flag.
+	 */
+	return fault->write;
 }
 
-static int mmu_pages_first(struct kvm_mmu_pages *pvec,
-			   struct mmu_page_path *parents)
+/*
+ * Returns true if the SPTE was fixed successfully. Otherwise,
+ * someone else modified the SPTE from its original value.
+ */
+static bool fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu,
+				    struct kvm_page_fault *fault,
+				    u64 *sptep, u64 old_spte, u64 new_spte)
 {
-	struct kvm_mmu_page *sp;
-	int level;
-
-	if (pvec->nr == 0)
-		return 0;
-
-	WARN_ON(pvec->page[0].idx != INVALID_INDEX);
-
-	sp = pvec->page[0].sp;
-	level = sp->role.level;
-	WARN_ON(level == PG_LEVEL_4K);
+	/*
+	 * Theoretically we could also set dirty bit (and flush TLB) here in
+	 * order to eliminate unnecessary PML logging. See comments in
+	 * set_spte. But fast_page_fault is very unlikely to happen with PML
+	 * enabled, so we do not do this. This might result in the same GPA
+	 * to be logged in PML buffer again when the write really happens, and
+	 * eventually to be called by mark_page_dirty twice. But it's also no
+	 * harm. This also avoids the TLB flush needed after setting dirty bit
+	 * so non-PML cases won't be impacted.
+	 *
+	 * Compare with set_spte where instead shadow_dirty_mask is set.
+	 */
+	if (!try_cmpxchg64(sptep, &old_spte, new_spte))
+		return false;
 
-	parents->parent[level-2] = sp;
+	if (is_writable_pte(new_spte) && !is_writable_pte(old_spte))
+		mark_page_dirty_in_slot(vcpu->kvm, fault->slot, fault->gfn);
 
-	/* Also set up a sentinel.  Further entries in pvec are all
-	 * children of sp, so this element is never overwritten.
-	 */
-	parents->parent[level-1] = NULL;
-	return mmu_pages_next(pvec, parents, 0);
+	return true;
 }
 
-static void mmu_pages_clear_parents(struct mmu_page_path *parents)
+static bool is_access_allowed(struct kvm_page_fault *fault, u64 spte)
 {
-	struct kvm_mmu_page *sp;
-	unsigned int level = 0;
+	if (fault->exec)
+		return is_executable_pte(spte);
 
-	do {
-		unsigned int idx = parents->idx[level];
-		sp = parents->parent[level];
-		if (!sp)
-			return;
+	if (fault->write)
+		return is_writable_pte(spte);
 
-		WARN_ON(idx == INVALID_INDEX);
-		clear_unsync_child_bit(sp, idx);
-		level++;
-	} while (!sp->unsync_children);
+	/* Fault was on a read access. */
+	return spte & PT_PRESENT_MASK;
 }
 
-static int mmu_sync_children(struct kvm_vcpu *vcpu,
-			     struct kvm_mmu_page *parent, bool can_yield)
+/*
+ * Returns one of RET_PF_INVALID, RET_PF_FIXED or RET_PF_SPURIOUS.
+ */
+static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
-	int i;
 	struct kvm_mmu_page *sp;
-	struct mmu_page_path parents;
-	struct kvm_mmu_pages pages;
-	LIST_HEAD(invalid_list);
-	bool flush = false;
-
-	while (mmu_unsync_walk(parent, &pages)) {
-		bool protected = false;
-
-		for_each_sp(pages, sp, parents, i)
-			protected |= kvm_vcpu_write_protect_gfn(vcpu, sp->gfn);
-
-		if (protected) {
-			kvm_mmu_remote_flush_or_zap(vcpu->kvm, &invalid_list, true);
-			flush = false;
-		}
+	int ret = RET_PF_INVALID;
+	u64 spte = 0ull;
+	u64 *sptep = NULL;
+	uint retry_count = 0;
 
-		for_each_sp(pages, sp, parents, i) {
-			kvm_unlink_unsync_page(vcpu->kvm, sp);
-			flush |= kvm_sync_page(vcpu, sp, &invalid_list) > 0;
-			mmu_pages_clear_parents(&parents);
-		}
-		if (need_resched() || rwlock_needbreak(&vcpu->kvm->mmu_lock)) {
-			kvm_mmu_remote_flush_or_zap(vcpu->kvm, &invalid_list, flush);
-			if (!can_yield) {
-				kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
-				return -EINTR;
-			}
+	if (!page_fault_can_be_fast(fault))
+		return ret;
 
-			cond_resched_rwlock_write(&vcpu->kvm->mmu_lock);
-			flush = false;
-		}
-	}
+	walk_shadow_page_lockless_begin(vcpu);
 
-	kvm_mmu_remote_flush_or_zap(vcpu->kvm, &invalid_list, flush);
-	return 0;
-}
+	do {
+		u64 new_spte;
 
-static void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
-{
-	atomic_set(&sp->write_flooding_count,  0);
-}
+		if (tdp_mmu_enabled)
+			sptep = kvm_tdp_mmu_fast_pf_get_last_sptep(vcpu, fault->addr, &spte);
+		else
+			sptep = fast_pf_get_last_sptep(vcpu, fault->addr, &spte);
 
-static void clear_sp_write_flooding_count(u64 *spte)
-{
-	__clear_sp_write_flooding_count(sptep_to_sp(spte));
-}
+		if (!is_shadow_present_pte(spte))
+			break;
 
-/*
- * The vCPU is required when finding indirect shadow pages; the shadow
- * page may already exist and syncing it needs the vCPU pointer in
- * order to read guest page tables.  Direct shadow pages are never
- * unsync, thus @vcpu can be NULL if @role.direct is true.
- */
-static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
-						     struct kvm_vcpu *vcpu,
-						     gfn_t gfn,
-						     struct hlist_head *sp_list,
-						     union kvm_mmu_page_role role)
-{
-	struct kvm_mmu_page *sp;
-	int ret;
-	int collisions = 0;
-	LIST_HEAD(invalid_list);
+		sp = sptep_to_sp(sptep);
+		if (!is_last_spte(spte, sp->role.level))
+			break;
 
-	for_each_valid_sp(kvm, sp, sp_list) {
-		if (sp->gfn != gfn) {
-			collisions++;
-			continue;
+		/*
+		 * Check whether the memory access that caused the fault would
+		 * still cause it if it were to be performed right now. If not,
+		 * then this is a spurious fault caused by a lazily flushed TLB,
+		 * or some other CPU has already fixed the PTE after the
+		 * current CPU took the fault.
+		 *
+		 * Need not check the access of upper level table entries since
+		 * they are always ACC_ALL.
+		 */
+		if (is_access_allowed(fault, spte)) {
+			ret = RET_PF_SPURIOUS;
+			break;
 		}
 
-		if (sp->role.word != role.word) {
-			/*
-			 * If the guest is creating an upper-level page, zap
-			 * unsync pages for the same gfn.  While it's possible
-			 * the guest is using recursive page tables, in all
-			 * likelihood the guest has stopped using the unsync
-			 * page and is installing a completely unrelated page.
-			 * Unsync pages must not be left as is, because the new
-			 * upper-level page will be write-protected.
-			 */
-			if (role.level > PG_LEVEL_4K && sp->unsync)
-				kvm_mmu_prepare_zap_page(kvm, sp,
-							 &invalid_list);
-			continue;
-		}
+		new_spte = spte;
 
-		/* unsync and write-flooding only apply to indirect SPs. */
-		if (sp->role.direct)
-			goto out;
+		/*
+		 * KVM only supports fixing page faults outside of MMU lock for
+		 * direct MMUs, nested MMUs are always indirect, and KVM always
+		 * uses A/D bits for non-nested MMUs.  Thus, if A/D bits are
+		 * enabled, the SPTE can't be an access-tracked SPTE.
+		 */
+		if (unlikely(!kvm_ad_enabled()) && is_access_track_spte(spte))
+			new_spte = restore_acc_track_spte(new_spte);
 
-		if (sp->unsync) {
-			if (KVM_BUG_ON(!vcpu, kvm))
-				break;
+		/*
+		 * To keep things simple, only SPTEs that are MMU-writable can
+		 * be made fully writable outside of mmu_lock, e.g. only SPTEs
+		 * that were write-protected for dirty-logging or access
+		 * tracking are handled here.  Don't bother checking if the
+		 * SPTE is writable to prioritize running with A/D bits enabled.
+		 * The is_access_allowed() check above handles the common case
+		 * of the fault being spurious, and the SPTE is known to be
+		 * shadow-present, i.e. except for access tracking restoration
+		 * making the new SPTE writable, the check is wasteful.
+		 */
+		if (fault->write && is_mmu_writable_spte(spte)) {
+			new_spte |= PT_WRITABLE_MASK;
 
 			/*
-			 * The page is good, but is stale.  kvm_sync_page does
-			 * get the latest guest state, but (unlike mmu_unsync_children)
-			 * it doesn't write-protect the page or mark it synchronized!
-			 * This way the validity of the mapping is ensured, but the
-			 * overhead of write protection is not incurred until the
-			 * guest invalidates the TLB mapping.  This allows multiple
-			 * SPs for a single gfn to be unsync.
+			 * Do not fix write-permission on the large spte when
+			 * dirty logging is enabled. fast_pf_fix_direct_spte()
+			 * marks only the first page in the dirty bitmap, so
+			 * the other pages covered by the large spte would be
+			 * missed if its slot has dirty logging enabled.
 			 *
-			 * If the sync fails, the page is zapped.  If so, break
-			 * in order to rebuild it.
+			 * Instead, we let the slow page fault path create a
+			 * normal spte to fix the access.
 			 */
-			ret = kvm_sync_page(vcpu, sp, &invalid_list);
-			if (ret < 0)
+			if (sp->role.level > PG_LEVEL_4K &&
+			    kvm_slot_dirty_track_enabled(fault->slot))
 				break;
-
-			WARN_ON(!list_empty(&invalid_list));
-			if (ret > 0)
-				kvm_flush_remote_tlbs(kvm);
 		}
 
-		__clear_sp_write_flooding_count(sp);
-
-		goto out;
-	}
-
-	sp = NULL;
-	++kvm->stat.mmu_cache_miss;
-
-out:
-	kvm_mmu_commit_zap_page(kvm, &invalid_list);
-
-	if (collisions > kvm->stat.max_mmu_page_hash_collisions)
-		kvm->stat.max_mmu_page_hash_collisions = collisions;
-	return sp;
-}
-
-/* Caches used when allocating a new shadow page. */
-struct shadow_page_caches {
-	struct kvm_mmu_memory_cache *page_header_cache;
-	struct kvm_mmu_memory_cache *shadow_page_cache;
-	struct kvm_mmu_memory_cache *shadowed_info_cache;
-};
-
-static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
-						      struct shadow_page_caches *caches,
-						      gfn_t gfn,
-						      struct hlist_head *sp_list,
-						      union kvm_mmu_page_role role)
-{
-	struct kvm_mmu_page *sp;
-
-	sp = kvm_mmu_memory_cache_alloc(caches->page_header_cache);
-	sp->spt = kvm_mmu_memory_cache_alloc(caches->shadow_page_cache);
-	if (!role.direct)
-		sp->shadowed_translation = kvm_mmu_memory_cache_alloc(caches->shadowed_info_cache);
-
-	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
-
-	INIT_LIST_HEAD(&sp->possible_nx_huge_page_link);
-
-	/*
-	 * active_mmu_pages must be a FIFO list, as kvm_zap_obsolete_pages()
-	 * depends on valid pages being added to the head of the list.  See
-	 * comments in kvm_zap_obsolete_pages().
-	 */
-	sp->mmu_valid_gen = kvm->arch.mmu_valid_gen;
-	list_add(&sp->link, &kvm->arch.active_mmu_pages);
-	kvm_account_mmu_page(kvm, sp);
-
-	sp->gfn = gfn;
-	sp->role = role;
-	hlist_add_head(&sp->hash_link, sp_list);
-	if (sp_has_gptes(sp))
-		account_shadowed(kvm, sp);
-
-	return sp;
-}
-
-/* Note, @vcpu may be NULL if @role.direct is true; see kvm_mmu_find_shadow_page. */
-static struct kvm_mmu_page *__kvm_mmu_get_shadow_page(struct kvm *kvm,
-						      struct kvm_vcpu *vcpu,
-						      struct shadow_page_caches *caches,
-						      gfn_t gfn,
-						      union kvm_mmu_page_role role)
-{
-	struct hlist_head *sp_list;
-	struct kvm_mmu_page *sp;
-	bool created = false;
-
-	sp_list = &kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)];
-
-	sp = kvm_mmu_find_shadow_page(kvm, vcpu, gfn, sp_list, role);
-	if (!sp) {
-		created = true;
-		sp = kvm_mmu_alloc_shadow_page(kvm, caches, gfn, sp_list, role);
-	}
-
-	trace_kvm_mmu_get_page(sp, created);
-	return sp;
-}
-
-static struct kvm_mmu_page *kvm_mmu_get_shadow_page(struct kvm_vcpu *vcpu,
-						    gfn_t gfn,
-						    union kvm_mmu_page_role role)
-{
-	struct shadow_page_caches caches = {
-		.page_header_cache = &vcpu->arch.mmu_page_header_cache,
-		.shadow_page_cache = &vcpu->arch.mmu_shadow_page_cache,
-		.shadowed_info_cache = &vcpu->arch.mmu_shadowed_info_cache,
-	};
-
-	return __kvm_mmu_get_shadow_page(vcpu->kvm, vcpu, &caches, gfn, role);
-}
-
-static union kvm_mmu_page_role kvm_mmu_child_role(u64 *sptep, bool direct,
-						  unsigned int access)
-{
-	struct kvm_mmu_page *parent_sp = sptep_to_sp(sptep);
-	union kvm_mmu_page_role role;
-
-	role = parent_sp->role;
-	role.level--;
-	role.access = access;
-	role.direct = direct;
-	role.passthrough = 0;
-
-	/*
-	 * If the guest has 4-byte PTEs then that means it's using 32-bit,
-	 * 2-level, non-PAE paging. KVM shadows such guests with PAE paging
-	 * (i.e. 8-byte PTEs). The difference in PTE size means that KVM must
-	 * shadow each guest page table with multiple shadow page tables, which
-	 * requires extra bookkeeping in the role.
-	 *
-	 * Specifically, to shadow the guest's page directory (which covers a
-	 * 4GiB address space), KVM uses 4 PAE page directories, each mapping
-	 * 1GiB of the address space. @role.quadrant encodes which quarter of
-	 * the address space each maps.
-	 *
-	 * To shadow the guest's page tables (which each map a 4MiB region), KVM
-	 * uses 2 PAE page tables, each mapping a 2MiB region. For these,
-	 * @role.quadrant encodes which half of the region they map.
-	 *
-	 * Concretely, a 4-byte PDE consumes bits 31:22, while an 8-byte PDE
-	 * consumes bits 29:21.  To consume bits 31:30, KVM's uses 4 shadow
-	 * PDPTEs; those 4 PAE page directories are pre-allocated and their
-	 * quadrant is assigned in mmu_alloc_root().   A 4-byte PTE consumes
-	 * bits 21:12, while an 8-byte PTE consumes bits 20:12.  To consume
-	 * bit 21 in the PTE (the child here), KVM propagates that bit to the
-	 * quadrant, i.e. sets quadrant to '0' or '1'.  The parent 8-byte PDE
-	 * covers bit 21 (see above), thus the quadrant is calculated from the
-	 * _least_ significant bit of the PDE index.
-	 */
-	if (role.has_4_byte_gpte) {
-		WARN_ON_ONCE(role.level != PG_LEVEL_4K);
-		role.quadrant = spte_index(sptep) & 1;
-	}
-
-	return role;
-}
-
-static struct kvm_mmu_page *kvm_mmu_get_child_sp(struct kvm_vcpu *vcpu,
-						 u64 *sptep, gfn_t gfn,
-						 bool direct, unsigned int access)
-{
-	union kvm_mmu_page_role role;
-
-	if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
-		return ERR_PTR(-EEXIST);
-
-	role = kvm_mmu_child_role(sptep, direct, access);
-	return kvm_mmu_get_shadow_page(vcpu, gfn, role);
-}
-
-static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterator,
-					struct kvm_vcpu *vcpu, hpa_t root,
-					u64 addr)
-{
-	iterator->addr = addr;
-	iterator->shadow_addr = root;
-	iterator->level = vcpu->arch.mmu->root_role.level;
-
-	if (iterator->level >= PT64_ROOT_4LEVEL &&
-	    vcpu->arch.mmu->cpu_role.base.level < PT64_ROOT_4LEVEL &&
-	    !vcpu->arch.mmu->root_role.direct)
-		iterator->level = PT32E_ROOT_LEVEL;
+		/* Verify that the fault can be handled in the fast path */
+		if (new_spte == spte ||
+		    !is_access_allowed(fault, new_spte))
+			break;
 
-	if (iterator->level == PT32E_ROOT_LEVEL) {
 		/*
-		 * prev_root is currently only used for 64-bit hosts. So only
-		 * the active root_hpa is valid here.
+		 * Currently, fast page fault only works for direct mappings
+		 * since the gfn is not stable for indirect shadow pages. See
+		 * Documentation/virt/kvm/locking.rst for more detail.
 		 */
-		BUG_ON(root != vcpu->arch.mmu->root.hpa);
-
-		iterator->shadow_addr
-			= vcpu->arch.mmu->pae_root[(addr >> 30) & 3];
-		iterator->shadow_addr &= SPTE_BASE_ADDR_MASK;
-		--iterator->level;
-		if (!iterator->shadow_addr)
-			iterator->level = 0;
-	}
-}
+		if (fast_pf_fix_direct_spte(vcpu, fault, sptep, spte, new_spte)) {
+			ret = RET_PF_FIXED;
+			break;
+		}
 
-static void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator,
-			     struct kvm_vcpu *vcpu, u64 addr)
-{
-	shadow_walk_init_using_root(iterator, vcpu, vcpu->arch.mmu->root.hpa,
-				    addr);
-}
-
-static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
-{
-	if (iterator->level < PG_LEVEL_4K)
-		return false;
-
-	iterator->index = SPTE_INDEX(iterator->addr, iterator->level);
-	iterator->sptep	= ((u64 *)__va(iterator->shadow_addr)) + iterator->index;
-	return true;
-}
-
-static void __shadow_walk_next(struct kvm_shadow_walk_iterator *iterator,
-			       u64 spte)
-{
-	if (!is_shadow_present_pte(spte) || is_last_spte(spte, iterator->level)) {
-		iterator->level = 0;
-		return;
-	}
-
-	iterator->shadow_addr = spte & SPTE_BASE_ADDR_MASK;
-	--iterator->level;
-}
-
-static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator)
-{
-	__shadow_walk_next(iterator, *iterator->sptep);
-}
-
-static void __link_shadow_page(struct kvm *kvm,
-			       struct kvm_mmu_memory_cache *cache, u64 *sptep,
-			       struct kvm_mmu_page *sp, bool flush)
-{
-	u64 spte;
-
-	BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK);
-
-	/*
-	 * If an SPTE is present already, it must be a leaf and therefore
-	 * a large one.  Drop it, and flush the TLB if needed, before
-	 * installing sp.
-	 */
-	if (is_shadow_present_pte(*sptep))
-		drop_large_spte(kvm, sptep, flush);
-
-	spte = make_nonleaf_spte(sp->spt, sp_ad_disabled(sp));
-
-	mmu_spte_set(sptep, spte);
-
-	mmu_page_add_parent_pte(cache, sp, sptep);
-
-	/*
-	 * The non-direct sub-pagetable must be updated before linking.  For
-	 * L1 sp, the pagetable is updated via kvm_sync_page() in
-	 * kvm_mmu_find_shadow_page() without write-protecting the gfn,
-	 * so sp->unsync can be true or false.  For higher level non-direct
-	 * sp, the pagetable is updated/synced via mmu_sync_children() in
-	 * FNAME(fetch)(), so sp->unsync_children can only be false.
-	 * WARN_ON_ONCE() if anything happens unexpectedly.
-	 */
-	if (WARN_ON_ONCE(sp->unsync_children) || sp->unsync)
-		mark_unsync(sptep);
-}
-
-static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
-			     struct kvm_mmu_page *sp)
-{
-	__link_shadow_page(vcpu->kvm, &vcpu->arch.mmu_pte_list_desc_cache, sptep, sp, true);
-}
-
-static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
-				   unsigned direct_access)
-{
-	if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) {
-		struct kvm_mmu_page *child;
-
-		/*
-		 * For the direct sp, if the guest pte's dirty bit
-		 * changed form clean to dirty, it will corrupt the
-		 * sp's access: allow writable in the read-only sp,
-		 * so we should update the spte at this point to get
-		 * a new sp with the correct access.
-		 */
-		child = spte_to_child_sp(*sptep);
-		if (child->role.access == direct_access)
-			return;
-
-		drop_parent_pte(child, sptep);
-		kvm_flush_remote_tlbs_with_address(vcpu->kvm, child->gfn, 1);
-	}
-}
-
-/* Returns the number of zapped non-leaf child shadow pages. */
-static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
-			    u64 *spte, struct list_head *invalid_list)
-{
-	u64 pte;
-	struct kvm_mmu_page *child;
-
-	pte = *spte;
-	if (is_shadow_present_pte(pte)) {
-		if (is_last_spte(pte, sp->role.level)) {
-			drop_spte(kvm, spte);
-		} else {
-			child = spte_to_child_sp(pte);
-			drop_parent_pte(child, spte);
-
-			/*
-			 * Recursively zap nested TDP SPs, parentless SPs are
-			 * unlikely to be used again in the near future.  This
-			 * avoids retaining a large number of stale nested SPs.
-			 */
-			if (tdp_enabled && invalid_list &&
-			    child->role.guest_mode && !child->parent_ptes.val)
-				return kvm_mmu_prepare_zap_page(kvm, child,
-								invalid_list);
-		}
-	} else if (is_mmio_spte(pte)) {
-		mmu_spte_clear_no_track(spte);
-	}
-	return 0;
-}
-
-static int kvm_mmu_page_unlink_children(struct kvm *kvm,
-					struct kvm_mmu_page *sp,
-					struct list_head *invalid_list)
-{
-	int zapped = 0;
-	unsigned i;
-
-	for (i = 0; i < SPTE_ENT_PER_PAGE; ++i)
-		zapped += mmu_page_zap_pte(kvm, sp, sp->spt + i, invalid_list);
-
-	return zapped;
-}
-
-static void kvm_mmu_unlink_parents(struct kvm_mmu_page *sp)
-{
-	u64 *sptep;
-	struct rmap_iterator iter;
-
-	while ((sptep = rmap_get_first(&sp->parent_ptes, &iter)))
-		drop_parent_pte(sp, sptep);
-}
-
-static int mmu_zap_unsync_children(struct kvm *kvm,
-				   struct kvm_mmu_page *parent,
-				   struct list_head *invalid_list)
-{
-	int i, zapped = 0;
-	struct mmu_page_path parents;
-	struct kvm_mmu_pages pages;
-
-	if (parent->role.level == PG_LEVEL_4K)
-		return 0;
-
-	while (mmu_unsync_walk(parent, &pages)) {
-		struct kvm_mmu_page *sp;
-
-		for_each_sp(pages, sp, parents, i) {
-			kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
-			mmu_pages_clear_parents(&parents);
-			zapped++;
-		}
-	}
-
-	return zapped;
-}
-
-static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
-				       struct kvm_mmu_page *sp,
-				       struct list_head *invalid_list,
-				       int *nr_zapped)
-{
-	bool list_unstable, zapped_root = false;
-
-	lockdep_assert_held_write(&kvm->mmu_lock);
-	trace_kvm_mmu_prepare_zap_page(sp);
-	++kvm->stat.mmu_shadow_zapped;
-	*nr_zapped = mmu_zap_unsync_children(kvm, sp, invalid_list);
-	*nr_zapped += kvm_mmu_page_unlink_children(kvm, sp, invalid_list);
-	kvm_mmu_unlink_parents(sp);
-
-	/* Zapping children means active_mmu_pages has become unstable. */
-	list_unstable = *nr_zapped;
-
-	if (!sp->role.invalid && sp_has_gptes(sp))
-		unaccount_shadowed(kvm, sp);
-
-	if (sp->unsync)
-		kvm_unlink_unsync_page(kvm, sp);
-	if (!sp->root_count) {
-		/* Count self */
-		(*nr_zapped)++;
-
-		/*
-		 * Already invalid pages (previously active roots) are not on
-		 * the active page list.  See list_del() in the "else" case of
-		 * !sp->root_count.
-		 */
-		if (sp->role.invalid)
-			list_add(&sp->link, invalid_list);
-		else
-			list_move(&sp->link, invalid_list);
-		kvm_unaccount_mmu_page(kvm, sp);
-	} else {
-		/*
-		 * Remove the active root from the active page list, the root
-		 * will be explicitly freed when the root_count hits zero.
-		 */
-		list_del(&sp->link);
-
-		/*
-		 * Obsolete pages cannot be used on any vCPUs, see the comment
-		 * in kvm_mmu_zap_all_fast().  Note, is_obsolete_sp() also
-		 * treats invalid shadow pages as being obsolete.
-		 */
-		zapped_root = !is_obsolete_sp(kvm, sp);
-	}
-
-	if (sp->nx_huge_page_disallowed)
-		unaccount_nx_huge_page(kvm, sp);
-
-	sp->role.invalid = 1;
-
-	/*
-	 * Make the request to free obsolete roots after marking the root
-	 * invalid, otherwise other vCPUs may not see it as invalid.
-	 */
-	if (zapped_root)
-		kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_FREE_OBSOLETE_ROOTS);
-	return list_unstable;
-}
-
-static bool kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
-				     struct list_head *invalid_list)
-{
-	int nr_zapped;
-
-	__kvm_mmu_prepare_zap_page(kvm, sp, invalid_list, &nr_zapped);
-	return nr_zapped;
-}
-
-static void kvm_mmu_commit_zap_page(struct kvm *kvm,
-				    struct list_head *invalid_list)
-{
-	struct kvm_mmu_page *sp, *nsp;
-
-	if (list_empty(invalid_list))
-		return;
-
-	/*
-	 * We need to make sure everyone sees our modifications to
-	 * the page tables and see changes to vcpu->mode here. The barrier
-	 * in the kvm_flush_remote_tlbs() achieves this. This pairs
-	 * with vcpu_enter_guest and walk_shadow_page_lockless_begin/end.
-	 *
-	 * In addition, kvm_flush_remote_tlbs waits for all vcpus to exit
-	 * guest mode and/or lockless shadow page table walks.
-	 */
-	kvm_flush_remote_tlbs(kvm);
-
-	list_for_each_entry_safe(sp, nsp, invalid_list, link) {
-		WARN_ON(!sp->role.invalid || sp->root_count);
-		kvm_mmu_free_shadow_page(sp);
-	}
-}
-
-static unsigned long kvm_mmu_zap_oldest_mmu_pages(struct kvm *kvm,
-						  unsigned long nr_to_zap)
-{
-	unsigned long total_zapped = 0;
-	struct kvm_mmu_page *sp, *tmp;
-	LIST_HEAD(invalid_list);
-	bool unstable;
-	int nr_zapped;
-
-	if (list_empty(&kvm->arch.active_mmu_pages))
-		return 0;
-
-restart:
-	list_for_each_entry_safe_reverse(sp, tmp, &kvm->arch.active_mmu_pages, link) {
-		/*
-		 * Don't zap active root pages, the page itself can't be freed
-		 * and zapping it will just force vCPUs to realloc and reload.
-		 */
-		if (sp->root_count)
-			continue;
-
-		unstable = __kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list,
-						      &nr_zapped);
-		total_zapped += nr_zapped;
-		if (total_zapped >= nr_to_zap)
-			break;
-
-		if (unstable)
-			goto restart;
-	}
-
-	kvm_mmu_commit_zap_page(kvm, &invalid_list);
-
-	kvm->stat.mmu_recycled += total_zapped;
-	return total_zapped;
-}
-
-static inline unsigned long kvm_mmu_available_pages(struct kvm *kvm)
-{
-	if (kvm->arch.n_max_mmu_pages > kvm->arch.n_used_mmu_pages)
-		return kvm->arch.n_max_mmu_pages -
-			kvm->arch.n_used_mmu_pages;
-
-	return 0;
-}
-
-static int make_mmu_pages_available(struct kvm_vcpu *vcpu)
-{
-	unsigned long avail = kvm_mmu_available_pages(vcpu->kvm);
-
-	if (likely(avail >= KVM_MIN_FREE_MMU_PAGES))
-		return 0;
-
-	kvm_mmu_zap_oldest_mmu_pages(vcpu->kvm, KVM_REFILL_PAGES - avail);
-
-	/*
-	 * Note, this check is intentionally soft, it only guarantees that one
-	 * page is available, while the caller may end up allocating as many as
-	 * four pages, e.g. for PAE roots or for 5-level paging.  Temporarily
-	 * exceeding the (arbitrary by default) limit will not harm the host,
-	 * being too aggressive may unnecessarily kill the guest, and getting an
-	 * exact count is far more trouble than it's worth, especially in the
-	 * page fault paths.
-	 */
-	if (!kvm_mmu_available_pages(vcpu->kvm))
-		return -ENOSPC;
-	return 0;
-}
-
-/*
- * Changing the number of mmu pages allocated to the vm
- * Note: if goal_nr_mmu_pages is too small, you will get dead lock
- */
-void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long goal_nr_mmu_pages)
-{
-	write_lock(&kvm->mmu_lock);
-
-	if (kvm->arch.n_used_mmu_pages > goal_nr_mmu_pages) {
-		kvm_mmu_zap_oldest_mmu_pages(kvm, kvm->arch.n_used_mmu_pages -
-						  goal_nr_mmu_pages);
-
-		goal_nr_mmu_pages = kvm->arch.n_used_mmu_pages;
-	}
-
-	kvm->arch.n_max_mmu_pages = goal_nr_mmu_pages;
-
-	write_unlock(&kvm->mmu_lock);
-}
-
-int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
-{
-	struct kvm_mmu_page *sp;
-	LIST_HEAD(invalid_list);
-	int r;
-
-	pgprintk("%s: looking for gfn %llx\n", __func__, gfn);
-	r = 0;
-	write_lock(&kvm->mmu_lock);
-	for_each_gfn_valid_sp_with_gptes(kvm, sp, gfn) {
-		pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
-			 sp->role.word);
-		r = 1;
-		kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
-	}
-	kvm_mmu_commit_zap_page(kvm, &invalid_list);
-	write_unlock(&kvm->mmu_lock);
-
-	return r;
-}
-
-static int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
-{
-	gpa_t gpa;
-	int r;
-
-	if (vcpu->arch.mmu->root_role.direct)
-		return 0;
-
-	gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
-
-	r = kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT);
-
-	return r;
-}
-
-static void kvm_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
-{
-	trace_kvm_mmu_unsync_page(sp);
-	++kvm->stat.mmu_unsync;
-	sp->unsync = 1;
-
-	kvm_mmu_mark_parents_unsync(sp);
-}
-
-/*
- * Attempt to unsync any shadow pages that can be reached by the specified gfn,
- * KVM is creating a writable mapping for said gfn.  Returns 0 if all pages
- * were marked unsync (or if there is no shadow page), -EPERM if the SPTE must
- * be write-protected.
- */
-int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
-			    gfn_t gfn, bool can_unsync, bool prefetch)
-{
-	struct kvm_mmu_page *sp;
-	bool locked = false;
-
-	/*
-	 * Force write-protection if the page is being tracked.  Note, the page
-	 * track machinery is used to write-protect upper-level shadow pages,
-	 * i.e. this guards the role.level == 4K assertion below!
-	 */
-	if (kvm_slot_page_track_is_active(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE))
-		return -EPERM;
-
-	/*
-	 * The page is not write-tracked, mark existing shadow pages unsync
-	 * unless KVM is synchronizing an unsync SP (can_unsync = false).  In
-	 * that case, KVM must complete emulation of the guest TLB flush before
-	 * allowing shadow pages to become unsync (writable by the guest).
-	 */
-	for_each_gfn_valid_sp_with_gptes(kvm, sp, gfn) {
-		if (!can_unsync)
-			return -EPERM;
-
-		if (sp->unsync)
-			continue;
-
-		if (prefetch)
-			return -EEXIST;
-
-		/*
-		 * TDP MMU page faults require an additional spinlock as they
-		 * run with mmu_lock held for read, not write, and the unsync
-		 * logic is not thread safe.  Take the spinklock regardless of
-		 * the MMU type to avoid extra conditionals/parameters, there's
-		 * no meaningful penalty if mmu_lock is held for write.
-		 */
-		if (!locked) {
-			locked = true;
-			spin_lock(&kvm->arch.mmu_unsync_pages_lock);
-
-			/*
-			 * Recheck after taking the spinlock, a different vCPU
-			 * may have since marked the page unsync.  A false
-			 * positive on the unprotected check above is not
-			 * possible as clearing sp->unsync _must_ hold mmu_lock
-			 * for write, i.e. unsync cannot transition from 0->1
-			 * while this CPU holds mmu_lock for read (or write).
-			 */
-			if (READ_ONCE(sp->unsync))
-				continue;
-		}
-
-		WARN_ON(sp->role.level != PG_LEVEL_4K);
-		kvm_unsync_page(kvm, sp);
-	}
-	if (locked)
-		spin_unlock(&kvm->arch.mmu_unsync_pages_lock);
-
-	/*
-	 * We need to ensure that the marking of unsync pages is visible
-	 * before the SPTE is updated to allow writes because
-	 * kvm_mmu_sync_roots() checks the unsync flags without holding
-	 * the MMU lock and so can race with this. If the SPTE was updated
-	 * before the page had been marked as unsync-ed, something like the
-	 * following could happen:
-	 *
-	 * CPU 1                    CPU 2
-	 * ---------------------------------------------------------------------
-	 * 1.2 Host updates SPTE
-	 *     to be writable
-	 *                      2.1 Guest writes a GPTE for GVA X.
-	 *                          (GPTE being in the guest page table shadowed
-	 *                           by the SP from CPU 1.)
-	 *                          This reads SPTE during the page table walk.
-	 *                          Since SPTE.W is read as 1, there is no
-	 *                          fault.
-	 *
-	 *                      2.2 Guest issues TLB flush.
-	 *                          That causes a VM Exit.
-	 *
-	 *                      2.3 Walking of unsync pages sees sp->unsync is
-	 *                          false and skips the page.
-	 *
-	 *                      2.4 Guest accesses GVA X.
-	 *                          Since the mapping in the SP was not updated,
-	 *                          so the old mapping for GVA X incorrectly
-	 *                          gets used.
-	 * 1.1 Host marks SP
-	 *     as unsync
-	 *     (sp->unsync = true)
-	 *
-	 * The write barrier below ensures that 1.1 happens before 1.2 and thus
-	 * the situation in 2.4 does not arise.  It pairs with the read barrier
-	 * in is_unsync_root(), placed between 2.1's load of SPTE.W and 2.3.
-	 */
-	smp_wmb();
-
-	return 0;
-}
-
-static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
-			u64 *sptep, unsigned int pte_access, gfn_t gfn,
-			kvm_pfn_t pfn, struct kvm_page_fault *fault)
-{
-	struct kvm_mmu_page *sp = sptep_to_sp(sptep);
-	int level = sp->role.level;
-	int was_rmapped = 0;
-	int ret = RET_PF_FIXED;
-	bool flush = false;
-	bool wrprot;
-	u64 spte;
-
-	/* Prefetching always gets a writable pfn.  */
-	bool host_writable = !fault || fault->map_writable;
-	bool prefetch = !fault || fault->prefetch;
-	bool write_fault = fault && fault->write;
-
-	pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
-		 *sptep, write_fault, gfn);
-
-	if (unlikely(is_noslot_pfn(pfn))) {
-		vcpu->stat.pf_mmio_spte_created++;
-		mark_mmio_spte(vcpu, sptep, gfn, pte_access);
-		return RET_PF_EMULATE;
-	}
-
-	if (is_shadow_present_pte(*sptep)) {
-		/*
-		 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
-		 * the parent of the now unreachable PTE.
-		 */
-		if (level > PG_LEVEL_4K && !is_large_pte(*sptep)) {
-			struct kvm_mmu_page *child;
-			u64 pte = *sptep;
-
-			child = spte_to_child_sp(pte);
-			drop_parent_pte(child, sptep);
-			flush = true;
-		} else if (pfn != spte_to_pfn(*sptep)) {
-			pgprintk("hfn old %llx new %llx\n",
-				 spte_to_pfn(*sptep), pfn);
-			drop_spte(vcpu->kvm, sptep);
-			flush = true;
-		} else
-			was_rmapped = 1;
-	}
-
-	wrprot = make_spte(vcpu, sp, slot, pte_access, gfn, pfn, *sptep, prefetch,
-			   true, host_writable, &spte);
-
-	if (*sptep == spte) {
-		ret = RET_PF_SPURIOUS;
-	} else {
-		flush |= mmu_spte_update(sptep, spte);
-		trace_kvm_mmu_set_spte(level, gfn, sptep);
-	}
-
-	if (wrprot) {
-		if (write_fault)
-			ret = RET_PF_EMULATE;
-	}
-
-	if (flush)
-		kvm_flush_remote_tlbs_with_address(vcpu->kvm, gfn,
-				KVM_PAGES_PER_HPAGE(level));
-
-	pgprintk("%s: setting spte %llx\n", __func__, *sptep);
-
-	if (!was_rmapped) {
-		WARN_ON_ONCE(ret == RET_PF_SPURIOUS);
-		rmap_add(vcpu, slot, sptep, gfn, pte_access);
-	} else {
-		/* Already rmapped but the pte_access bits may have changed. */
-		kvm_mmu_page_set_access(sp, spte_index(sptep), pte_access);
-	}
-
-	return ret;
-}
-
-static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
-				    struct kvm_mmu_page *sp,
-				    u64 *start, u64 *end)
-{
-	struct page *pages[PTE_PREFETCH_NUM];
-	struct kvm_memory_slot *slot;
-	unsigned int access = sp->role.access;
-	int i, ret;
-	gfn_t gfn;
-
-	gfn = kvm_mmu_page_get_gfn(sp, spte_index(start));
-	slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, access & ACC_WRITE_MASK);
-	if (!slot)
-		return -1;
-
-	ret = gfn_to_page_many_atomic(slot, gfn, pages, end - start);
-	if (ret <= 0)
-		return -1;
-
-	for (i = 0; i < ret; i++, gfn++, start++) {
-		mmu_set_spte(vcpu, slot, start, access, gfn,
-			     page_to_pfn(pages[i]), NULL);
-		put_page(pages[i]);
-	}
-
-	return 0;
-}
-
-static void __direct_pte_prefetch(struct kvm_vcpu *vcpu,
-				  struct kvm_mmu_page *sp, u64 *sptep)
-{
-	u64 *spte, *start = NULL;
-	int i;
-
-	WARN_ON(!sp->role.direct);
-
-	i = spte_index(sptep) & ~(PTE_PREFETCH_NUM - 1);
-	spte = sp->spt + i;
-
-	for (i = 0; i < PTE_PREFETCH_NUM; i++, spte++) {
-		if (is_shadow_present_pte(*spte) || spte == sptep) {
-			if (!start)
-				continue;
-			if (direct_pte_prefetch_many(vcpu, sp, start, spte) < 0)
-				return;
-			start = NULL;
-		} else if (!start)
-			start = spte;
-	}
-	if (start)
-		direct_pte_prefetch_many(vcpu, sp, start, spte);
-}
-
-static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
-{
-	struct kvm_mmu_page *sp;
-
-	sp = sptep_to_sp(sptep);
-
-	/*
-	 * Without accessed bits, there's no way to distinguish between
-	 * actually accessed translations and prefetched, so disable pte
-	 * prefetch if accessed bits aren't available.
-	 */
-	if (sp_ad_disabled(sp))
-		return;
-
-	if (sp->role.level > PG_LEVEL_4K)
-		return;
-
-	/*
-	 * If addresses are being invalidated, skip prefetching to avoid
-	 * accidentally prefetching those addresses.
-	 */
-	if (unlikely(vcpu->kvm->mmu_invalidate_in_progress))
-		return;
-
-	__direct_pte_prefetch(vcpu, sp, sptep);
-}
-
-/*
- * Lookup the mapping level for @gfn in the current mm.
- *
- * WARNING!  Use of host_pfn_mapping_level() requires the caller and the end
- * consumer to be tied into KVM's handlers for MMU notifier events!
- *
- * There are several ways to safely use this helper:
- *
- * - Check mmu_invalidate_retry_hva() after grabbing the mapping level, before
- *   consuming it.  In this case, mmu_lock doesn't need to be held during the
- *   lookup, but it does need to be held while checking the MMU notifier.
- *
- * - Hold mmu_lock AND ensure there is no in-progress MMU notifier invalidation
- *   event for the hva.  This can be done by explicit checking the MMU notifier
- *   or by ensuring that KVM already has a valid mapping that covers the hva.
- *
- * - Do not use the result to install new mappings, e.g. use the host mapping
- *   level only to decide whether or not to zap an entry.  In this case, it's
- *   not required to hold mmu_lock (though it's highly likely the caller will
- *   want to hold mmu_lock anyways, e.g. to modify SPTEs).
- *
- * Note!  The lookup can still race with modifications to host page tables, but
- * the above "rules" ensure KVM will not _consume_ the result of the walk if a
- * race with the primary MMU occurs.
- */
-static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
-				  const struct kvm_memory_slot *slot)
-{
-	int level = PG_LEVEL_4K;
-	unsigned long hva;
-	unsigned long flags;
-	pgd_t pgd;
-	p4d_t p4d;
-	pud_t pud;
-	pmd_t pmd;
-
-	/*
-	 * Note, using the already-retrieved memslot and __gfn_to_hva_memslot()
-	 * is not solely for performance, it's also necessary to avoid the
-	 * "writable" check in __gfn_to_hva_many(), which will always fail on
-	 * read-only memslots due to gfn_to_hva() assuming writes.  Earlier
-	 * page fault steps have already verified the guest isn't writing a
-	 * read-only memslot.
-	 */
-	hva = __gfn_to_hva_memslot(slot, gfn);
-
-	/*
-	 * Disable IRQs to prevent concurrent tear down of host page tables,
-	 * e.g. if the primary MMU promotes a P*D to a huge page and then frees
-	 * the original page table.
-	 */
-	local_irq_save(flags);
-
-	/*
-	 * Read each entry once.  As above, a non-leaf entry can be promoted to
-	 * a huge page _during_ this walk.  Re-reading the entry could send the
-	 * walk into the weeks, e.g. p*d_large() returns false (sees the old
-	 * value) and then p*d_offset() walks into the target huge page instead
-	 * of the old page table (sees the new value).
-	 */
-	pgd = READ_ONCE(*pgd_offset(kvm->mm, hva));
-	if (pgd_none(pgd))
-		goto out;
-
-	p4d = READ_ONCE(*p4d_offset(&pgd, hva));
-	if (p4d_none(p4d) || !p4d_present(p4d))
-		goto out;
-
-	pud = READ_ONCE(*pud_offset(&p4d, hva));
-	if (pud_none(pud) || !pud_present(pud))
-		goto out;
-
-	if (pud_large(pud)) {
-		level = PG_LEVEL_1G;
-		goto out;
-	}
-
-	pmd = READ_ONCE(*pmd_offset(&pud, hva));
-	if (pmd_none(pmd) || !pmd_present(pmd))
-		goto out;
-
-	if (pmd_large(pmd))
-		level = PG_LEVEL_2M;
-
-out:
-	local_irq_restore(flags);
-	return level;
-}
-
-int kvm_mmu_max_mapping_level(struct kvm *kvm,
-			      const struct kvm_memory_slot *slot, gfn_t gfn,
-			      int max_level)
-{
-	struct kvm_lpage_info *linfo;
-	int host_level;
-
-	max_level = min(max_level, max_huge_page_level);
-	for ( ; max_level > PG_LEVEL_4K; max_level--) {
-		linfo = lpage_info_slot(gfn, slot, max_level);
-		if (!linfo->disallow_lpage)
-			break;
-	}
-
-	if (max_level == PG_LEVEL_4K)
-		return PG_LEVEL_4K;
-
-	host_level = host_pfn_mapping_level(kvm, gfn, slot);
-	return min(host_level, max_level);
-}
-
-void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
-{
-	struct kvm_memory_slot *slot = fault->slot;
-	kvm_pfn_t mask;
-
-	fault->huge_page_disallowed = fault->exec && fault->nx_huge_page_workaround_enabled;
-
-	if (unlikely(fault->max_level == PG_LEVEL_4K))
-		return;
-
-	if (is_error_noslot_pfn(fault->pfn))
-		return;
-
-	if (kvm_slot_dirty_track_enabled(slot))
-		return;
-
-	/*
-	 * Enforce the iTLB multihit workaround after capturing the requested
-	 * level, which will be used to do precise, accurate accounting.
-	 */
-	fault->req_level = kvm_mmu_max_mapping_level(vcpu->kvm, slot,
-						     fault->gfn, fault->max_level);
-	if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed)
-		return;
-
-	/*
-	 * mmu_invalidate_retry() was successful and mmu_lock is held, so
-	 * the pmd can't be split from under us.
-	 */
-	fault->goal_level = fault->req_level;
-	mask = KVM_PAGES_PER_HPAGE(fault->goal_level) - 1;
-	VM_BUG_ON((fault->gfn & mask) != (fault->pfn & mask));
-	fault->pfn &= ~mask;
-}
-
-void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_level)
-{
-	if (cur_level > PG_LEVEL_4K &&
-	    cur_level == fault->goal_level &&
-	    is_shadow_present_pte(spte) &&
-	    !is_large_pte(spte) &&
-	    spte_to_child_sp(spte)->nx_huge_page_disallowed) {
-		/*
-		 * A small SPTE exists for this pfn, but FNAME(fetch),
-		 * direct_map(), or kvm_tdp_mmu_map() would like to create a
-		 * large PTE instead: just force them to go down another level,
-		 * patching back for them into pfn the next 9 bits of the
-		 * address.
-		 */
-		u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
-				KVM_PAGES_PER_HPAGE(cur_level - 1);
-		fault->pfn |= fault->gfn & page_mask;
-		fault->goal_level--;
-	}
-}
-
-static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
-{
-	struct kvm_shadow_walk_iterator it;
-	struct kvm_mmu_page *sp;
-	int ret;
-	gfn_t base_gfn = fault->gfn;
-
-	kvm_mmu_hugepage_adjust(vcpu, fault);
-
-	trace_kvm_mmu_spte_requested(fault);
-	for_each_shadow_entry(vcpu, fault->addr, it) {
-		/*
-		 * We cannot overwrite existing page tables with an NX
-		 * large page, as the leaf could be executable.
-		 */
-		if (fault->nx_huge_page_workaround_enabled)
-			disallowed_hugepage_adjust(fault, *it.sptep, it.level);
-
-		base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
-		if (it.level == fault->goal_level)
-			break;
-
-		sp = kvm_mmu_get_child_sp(vcpu, it.sptep, base_gfn, true, ACC_ALL);
-		if (sp == ERR_PTR(-EEXIST))
-			continue;
-
-		link_shadow_page(vcpu, it.sptep, sp);
-		if (fault->huge_page_disallowed)
-			account_nx_huge_page(vcpu->kvm, sp,
-					     fault->req_level >= it.level);
-	}
-
-	if (WARN_ON_ONCE(it.level != fault->goal_level))
-		return -EFAULT;
-
-	ret = mmu_set_spte(vcpu, fault->slot, it.sptep, ACC_ALL,
-			   base_gfn, fault->pfn, fault);
-	if (ret == RET_PF_SPURIOUS)
-		return ret;
-
-	direct_pte_prefetch(vcpu, it.sptep);
-	return ret;
-}
-
-static void kvm_send_hwpoison_signal(struct kvm_memory_slot *slot, gfn_t gfn)
-{
-	unsigned long hva = gfn_to_hva_memslot(slot, gfn);
-
-	send_sig_mceerr(BUS_MCEERR_AR, (void __user *)hva, PAGE_SHIFT, current);
-}
-
-static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
-{
-	if (is_sigpending_pfn(fault->pfn)) {
-		kvm_handle_signal_exit(vcpu);
-		return -EINTR;
-	}
-
-	/*
-	 * Do not cache the mmio info caused by writing the readonly gfn
-	 * into the spte otherwise read access on readonly gfn also can
-	 * caused mmio page fault and treat it as mmio access.
-	 */
-	if (fault->pfn == KVM_PFN_ERR_RO_FAULT)
-		return RET_PF_EMULATE;
-
-	if (fault->pfn == KVM_PFN_ERR_HWPOISON) {
-		kvm_send_hwpoison_signal(fault->slot, fault->gfn);
-		return RET_PF_RETRY;
-	}
-
-	return -EFAULT;
-}
-
-static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
-				   struct kvm_page_fault *fault,
-				   unsigned int access)
-{
-	gva_t gva = fault->is_tdp ? 0 : fault->addr;
-
-	vcpu_cache_mmio_info(vcpu, gva, fault->gfn,
-			     access & shadow_mmio_access_mask);
-
-	/*
-	 * If MMIO caching is disabled, emulate immediately without
-	 * touching the shadow page tables as attempting to install an
-	 * MMIO SPTE will just be an expensive nop.
-	 */
-	if (unlikely(!enable_mmio_caching))
-		return RET_PF_EMULATE;
-
-	/*
-	 * Do not create an MMIO SPTE for a gfn greater than host.MAXPHYADDR,
-	 * any guest that generates such gfns is running nested and is being
-	 * tricked by L0 userspace (you can observe gfn > L1.MAXPHYADDR if and
-	 * only if L1's MAXPHYADDR is inaccurate with respect to the
-	 * hardware's).
-	 */
-	if (unlikely(fault->gfn > kvm_mmu_max_gfn()))
-		return RET_PF_EMULATE;
-
-	return RET_PF_CONTINUE;
-}
-
-static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
-{
-	/*
-	 * Page faults with reserved bits set, i.e. faults on MMIO SPTEs, only
-	 * reach the common page fault handler if the SPTE has an invalid MMIO
-	 * generation number.  Refreshing the MMIO generation needs to go down
-	 * the slow path.  Note, EPT Misconfigs do NOT set the PRESENT flag!
-	 */
-	if (fault->rsvd)
-		return false;
-
-	/*
-	 * #PF can be fast if:
-	 *
-	 * 1. The shadow page table entry is not present and A/D bits are
-	 *    disabled _by KVM_, which could mean that the fault is potentially
-	 *    caused by access tracking (if enabled).  If A/D bits are enabled
-	 *    by KVM, but disabled by L1 for L2, KVM is forced to disable A/D
-	 *    bits for L2 and employ access tracking, but the fast page fault
-	 *    mechanism only supports direct MMUs.
-	 * 2. The shadow page table entry is present, the access is a write,
-	 *    and no reserved bits are set (MMIO SPTEs cannot be "fixed"), i.e.
-	 *    the fault was caused by a write-protection violation.  If the
-	 *    SPTE is MMU-writable (determined later), the fault can be fixed
-	 *    by setting the Writable bit, which can be done out of mmu_lock.
-	 */
-	if (!fault->present)
-		return !kvm_ad_enabled();
-
-	/*
-	 * Note, instruction fetches and writes are mutually exclusive, ignore
-	 * the "exec" flag.
-	 */
-	return fault->write;
-}
-
-/*
- * Returns true if the SPTE was fixed successfully. Otherwise,
- * someone else modified the SPTE from its original value.
- */
-static bool fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu,
-				    struct kvm_page_fault *fault,
-				    u64 *sptep, u64 old_spte, u64 new_spte)
-{
-	/*
-	 * Theoretically we could also set dirty bit (and flush TLB) here in
-	 * order to eliminate unnecessary PML logging. See comments in
-	 * set_spte. But fast_page_fault is very unlikely to happen with PML
-	 * enabled, so we do not do this. This might result in the same GPA
-	 * to be logged in PML buffer again when the write really happens, and
-	 * eventually to be called by mark_page_dirty twice. But it's also no
-	 * harm. This also avoids the TLB flush needed after setting dirty bit
-	 * so non-PML cases won't be impacted.
-	 *
-	 * Compare with set_spte where instead shadow_dirty_mask is set.
-	 */
-	if (!try_cmpxchg64(sptep, &old_spte, new_spte))
-		return false;
-
-	if (is_writable_pte(new_spte) && !is_writable_pte(old_spte))
-		mark_page_dirty_in_slot(vcpu->kvm, fault->slot, fault->gfn);
-
-	return true;
-}
-
-static bool is_access_allowed(struct kvm_page_fault *fault, u64 spte)
-{
-	if (fault->exec)
-		return is_executable_pte(spte);
-
-	if (fault->write)
-		return is_writable_pte(spte);
-
-	/* Fault was on Read access */
-	return spte & PT_PRESENT_MASK;
-}
-
-/*
- * Returns the last level spte pointer of the shadow page walk for the given
- * gpa, and sets *spte to the spte value. This spte may be non-preset. If no
- * walk could be performed, returns NULL and *spte does not contain valid data.
- *
- * Contract:
- *  - Must be called between walk_shadow_page_lockless_{begin,end}.
- *  - The returned sptep must not be used after walk_shadow_page_lockless_end.
- */
-static u64 *fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, gpa_t gpa, u64 *spte)
-{
-	struct kvm_shadow_walk_iterator iterator;
-	u64 old_spte;
-	u64 *sptep = NULL;
-
-	for_each_shadow_entry_lockless(vcpu, gpa, iterator, old_spte) {
-		sptep = iterator.sptep;
-		*spte = old_spte;
-	}
-
-	return sptep;
-}
-
-/*
- * Returns one of RET_PF_INVALID, RET_PF_FIXED or RET_PF_SPURIOUS.
- */
-static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
-{
-	struct kvm_mmu_page *sp;
-	int ret = RET_PF_INVALID;
-	u64 spte = 0ull;
-	u64 *sptep = NULL;
-	uint retry_count = 0;
-
-	if (!page_fault_can_be_fast(fault))
-		return ret;
-
-	walk_shadow_page_lockless_begin(vcpu);
-
-	do {
-		u64 new_spte;
-
-		if (tdp_mmu_enabled)
-			sptep = kvm_tdp_mmu_fast_pf_get_last_sptep(vcpu, fault->addr, &spte);
-		else
-			sptep = fast_pf_get_last_sptep(vcpu, fault->addr, &spte);
-
-		if (!is_shadow_present_pte(spte))
-			break;
-
-		sp = sptep_to_sp(sptep);
-		if (!is_last_spte(spte, sp->role.level))
-			break;
-
-		/*
-		 * Check whether the memory access that caused the fault would
-		 * still cause it if it were to be performed right now. If not,
-		 * then this is a spurious fault caused by TLB lazily flushed,
-		 * or some other CPU has already fixed the PTE after the
-		 * current CPU took the fault.
-		 *
-		 * Need not check the access of upper level table entries since
-		 * they are always ACC_ALL.
-		 */
-		if (is_access_allowed(fault, spte)) {
-			ret = RET_PF_SPURIOUS;
-			break;
-		}
-
-		new_spte = spte;
-
-		/*
-		 * KVM only supports fixing page faults outside of MMU lock for
-		 * direct MMUs, nested MMUs are always indirect, and KVM always
-		 * uses A/D bits for non-nested MMUs.  Thus, if A/D bits are
-		 * enabled, the SPTE can't be an access-tracked SPTE.
-		 */
-		if (unlikely(!kvm_ad_enabled()) && is_access_track_spte(spte))
-			new_spte = restore_acc_track_spte(new_spte);
-
-		/*
-		 * To keep things simple, only SPTEs that are MMU-writable can
-		 * be made fully writable outside of mmu_lock, e.g. only SPTEs
-		 * that were write-protected for dirty-logging or access
-		 * tracking are handled here.  Don't bother checking if the
-		 * SPTE is writable to prioritize running with A/D bits enabled.
-		 * The is_access_allowed() check above handles the common case
-		 * of the fault being spurious, and the SPTE is known to be
-		 * shadow-present, i.e. except for access tracking restoration
-		 * making the new SPTE writable, the check is wasteful.
-		 */
-		if (fault->write && is_mmu_writable_spte(spte)) {
-			new_spte |= PT_WRITABLE_MASK;
-
-			/*
-			 * Do not fix write-permission on the large spte when
-			 * dirty logging is enabled. Since we only dirty the
-			 * first page into the dirty-bitmap in
-			 * fast_pf_fix_direct_spte(), other pages are missed
-			 * if its slot has dirty logging enabled.
-			 *
-			 * Instead, we let the slow page fault path create a
-			 * normal spte to fix the access.
-			 */
-			if (sp->role.level > PG_LEVEL_4K &&
-			    kvm_slot_dirty_track_enabled(fault->slot))
-				break;
-		}
-
-		/* Verify that the fault can be handled in the fast path */
-		if (new_spte == spte ||
-		    !is_access_allowed(fault, new_spte))
-			break;
-
-		/*
-		 * Currently, fast page fault only works for direct mapping
-		 * since the gfn is not stable for indirect shadow page. See
-		 * Documentation/virt/kvm/locking.rst to get more detail.
-		 */
-		if (fast_pf_fix_direct_spte(vcpu, fault, sptep, spte, new_spte)) {
-			ret = RET_PF_FIXED;
-			break;
-		}
-
-		if (++retry_count > 4) {
-			pr_warn_once("Fast #PF retrying more than 4 times.\n");
-			break;
-		}
-
-	} while (true);
-
-	trace_fast_page_fault(vcpu, fault, sptep, spte, ret);
-	walk_shadow_page_lockless_end(vcpu);
-
-	if (ret != RET_PF_INVALID)
-		vcpu->stat.pf_fast++;
-
-	return ret;
-}
-
-static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
-			       struct list_head *invalid_list)
-{
-	struct kvm_mmu_page *sp;
-
-	if (!VALID_PAGE(*root_hpa))
-		return;
-
-	/*
-	 * The "root" may be a special root, e.g. a PAE entry, treat it as a
-	 * SPTE to ensure any non-PA bits are dropped.
-	 */
-	sp = spte_to_child_sp(*root_hpa);
-	if (WARN_ON(!sp))
-		return;
-
-	if (is_tdp_mmu_page(sp))
-		kvm_tdp_mmu_put_root(kvm, sp, false);
-	else if (!--sp->root_count && sp->role.invalid)
-		kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
-
-	*root_hpa = INVALID_PAGE;
-}
-
-/* roots_to_free must be some combination of the KVM_MMU_ROOT_* flags */
-void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
-			ulong roots_to_free)
-{
-	int i;
-	LIST_HEAD(invalid_list);
-	bool free_active_root;
-
-	BUILD_BUG_ON(KVM_MMU_NUM_PREV_ROOTS >= BITS_PER_LONG);
-
-	/* Before acquiring the MMU lock, see if we need to do any real work. */
-	free_active_root = (roots_to_free & KVM_MMU_ROOT_CURRENT)
-		&& VALID_PAGE(mmu->root.hpa);
-
-	if (!free_active_root) {
-		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
-			if ((roots_to_free & KVM_MMU_ROOT_PREVIOUS(i)) &&
-			    VALID_PAGE(mmu->prev_roots[i].hpa))
-				break;
-
-		if (i == KVM_MMU_NUM_PREV_ROOTS)
-			return;
-	}
-
-	write_lock(&kvm->mmu_lock);
-
-	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
-		if (roots_to_free & KVM_MMU_ROOT_PREVIOUS(i))
-			mmu_free_root_page(kvm, &mmu->prev_roots[i].hpa,
-					   &invalid_list);
-
-	if (free_active_root) {
-		if (to_shadow_page(mmu->root.hpa)) {
-			mmu_free_root_page(kvm, &mmu->root.hpa, &invalid_list);
-		} else if (mmu->pae_root) {
-			for (i = 0; i < 4; ++i) {
-				if (!IS_VALID_PAE_ROOT(mmu->pae_root[i]))
-					continue;
-
-				mmu_free_root_page(kvm, &mmu->pae_root[i],
-						   &invalid_list);
-				mmu->pae_root[i] = INVALID_PAE_ROOT;
-			}
-		}
-		mmu->root.hpa = INVALID_PAGE;
-		mmu->root.pgd = 0;
-	}
-
-	kvm_mmu_commit_zap_page(kvm, &invalid_list);
-	write_unlock(&kvm->mmu_lock);
-}
-EXPORT_SYMBOL_GPL(kvm_mmu_free_roots);
-
-void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu)
-{
-	unsigned long roots_to_free = 0;
-	hpa_t root_hpa;
-	int i;
-
-	/*
-	 * This should not be called while L2 is active, L2 can't invalidate
-	 * _only_ its own roots, e.g. INVVPID unconditionally exits.
-	 */
-	WARN_ON_ONCE(mmu->root_role.guest_mode);
-
-	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
-		root_hpa = mmu->prev_roots[i].hpa;
-		if (!VALID_PAGE(root_hpa))
-			continue;
-
-		if (!to_shadow_page(root_hpa) ||
-			to_shadow_page(root_hpa)->role.guest_mode)
-			roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
-	}
-
-	kvm_mmu_free_roots(kvm, mmu, roots_to_free);
-}
-EXPORT_SYMBOL_GPL(kvm_mmu_free_guest_mode_roots);
-
-
-static int mmu_check_root(struct kvm_vcpu *vcpu, gfn_t root_gfn)
-{
-	int ret = 0;
-
-	if (!kvm_vcpu_is_visible_gfn(vcpu, root_gfn)) {
-		kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
-		ret = 1;
-	}
-
-	return ret;
-}
-
-static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant,
-			    u8 level)
-{
-	union kvm_mmu_page_role role = vcpu->arch.mmu->root_role;
-	struct kvm_mmu_page *sp;
-
-	role.level = level;
-	role.quadrant = quadrant;
-
-	WARN_ON_ONCE(quadrant && !role.has_4_byte_gpte);
-	WARN_ON_ONCE(role.direct && role.has_4_byte_gpte);
-
-	sp = kvm_mmu_get_shadow_page(vcpu, gfn, role);
-	++sp->root_count;
-
-	return __pa(sp->spt);
-}
-
-static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
-{
-	struct kvm_mmu *mmu = vcpu->arch.mmu;
-	u8 shadow_root_level = mmu->root_role.level;
-	hpa_t root;
-	unsigned i;
-	int r;
-
-	write_lock(&vcpu->kvm->mmu_lock);
-	r = make_mmu_pages_available(vcpu);
-	if (r < 0)
-		goto out_unlock;
-
-	if (tdp_mmu_enabled) {
-		root = kvm_tdp_mmu_get_vcpu_root_hpa(vcpu);
-		mmu->root.hpa = root;
-	} else if (shadow_root_level >= PT64_ROOT_4LEVEL) {
-		root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level);
-		mmu->root.hpa = root;
-	} else if (shadow_root_level == PT32E_ROOT_LEVEL) {
-		if (WARN_ON_ONCE(!mmu->pae_root)) {
-			r = -EIO;
-			goto out_unlock;
-		}
-
-		for (i = 0; i < 4; ++i) {
-			WARN_ON_ONCE(IS_VALID_PAE_ROOT(mmu->pae_root[i]));
-
-			root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT), 0,
-					      PT32_ROOT_LEVEL);
-			mmu->pae_root[i] = root | PT_PRESENT_MASK |
-					   shadow_me_value;
-		}
-		mmu->root.hpa = __pa(mmu->pae_root);
-	} else {
-		WARN_ONCE(1, "Bad TDP root level = %d\n", shadow_root_level);
-		r = -EIO;
-		goto out_unlock;
-	}
-
-	/* root.pgd is ignored for direct MMUs. */
-	mmu->root.pgd = 0;
-out_unlock:
-	write_unlock(&vcpu->kvm->mmu_lock);
-	return r;
-}
-
-static int mmu_first_shadow_root_alloc(struct kvm *kvm)
-{
-	struct kvm_memslots *slots;
-	struct kvm_memory_slot *slot;
-	int r = 0, i, bkt;
-
-	/*
-	 * Check if this is the first shadow root being allocated before
-	 * taking the lock.
-	 */
-	if (kvm_shadow_root_allocated(kvm))
-		return 0;
-
-	mutex_lock(&kvm->slots_arch_lock);
-
-	/* Recheck, under the lock, whether this is the first shadow root. */
-	if (kvm_shadow_root_allocated(kvm))
-		goto out_unlock;
-
-	/*
-	 * Check if anything actually needs to be allocated, e.g. all metadata
-	 * will be allocated upfront if TDP is disabled.
-	 */
-	if (kvm_memslots_have_rmaps(kvm) &&
-	    kvm_page_track_write_tracking_enabled(kvm))
-		goto out_success;
-
-	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
-		slots = __kvm_memslots(kvm, i);
-		kvm_for_each_memslot(slot, bkt, slots) {
-			/*
-			 * Both of these functions are no-ops if the target is
-			 * already allocated, so unconditionally calling both
-			 * is safe.  Intentionally do NOT free allocations on
-			 * failure to avoid having to track which allocations
-			 * were made now versus when the memslot was created.
-			 * The metadata is guaranteed to be freed when the slot
-			 * is freed, and will be kept/used if userspace retries
-			 * KVM_RUN instead of killing the VM.
-			 */
-			r = memslot_rmap_alloc(slot, slot->npages);
-			if (r)
-				goto out_unlock;
-			r = kvm_page_track_write_tracking_alloc(slot);
-			if (r)
-				goto out_unlock;
-		}
-	}
-
-	/*
-	 * Ensure that shadow_root_allocated becomes true strictly after
-	 * all the related pointers are set.
-	 */
-out_success:
-	smp_store_release(&kvm->arch.shadow_root_allocated, true);
-
-out_unlock:
-	mutex_unlock(&kvm->slots_arch_lock);
-	return r;
-}
-
-static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
-{
-	struct kvm_mmu *mmu = vcpu->arch.mmu;
-	u64 pdptrs[4], pm_mask;
-	gfn_t root_gfn, root_pgd;
-	int quadrant, i, r;
-	hpa_t root;
-
-	root_pgd = mmu->get_guest_pgd(vcpu);
-	root_gfn = root_pgd >> PAGE_SHIFT;
-
-	if (mmu_check_root(vcpu, root_gfn))
-		return 1;
-
-	/*
-	 * On SVM, reading PDPTRs might access guest memory, which might fault
-	 * and thus might sleep.  Grab the PDPTRs before acquiring mmu_lock.
-	 */
-	if (mmu->cpu_role.base.level == PT32E_ROOT_LEVEL) {
-		for (i = 0; i < 4; ++i) {
-			pdptrs[i] = mmu->get_pdptr(vcpu, i);
-			if (!(pdptrs[i] & PT_PRESENT_MASK))
-				continue;
-
-			if (mmu_check_root(vcpu, pdptrs[i] >> PAGE_SHIFT))
-				return 1;
-		}
-	}
-
-	r = mmu_first_shadow_root_alloc(vcpu->kvm);
-	if (r)
-		return r;
-
-	write_lock(&vcpu->kvm->mmu_lock);
-	r = make_mmu_pages_available(vcpu);
-	if (r < 0)
-		goto out_unlock;
-
-	/*
-	 * Do we shadow a long mode page table? If so we need to
-	 * write-protect the guests page table root.
-	 */
-	if (mmu->cpu_role.base.level >= PT64_ROOT_4LEVEL) {
-		root = mmu_alloc_root(vcpu, root_gfn, 0,
-				      mmu->root_role.level);
-		mmu->root.hpa = root;
-		goto set_root_pgd;
-	}
-
-	if (WARN_ON_ONCE(!mmu->pae_root)) {
-		r = -EIO;
-		goto out_unlock;
-	}
-
-	/*
-	 * We shadow a 32 bit page table. This may be a legacy 2-level
-	 * or a PAE 3-level page table. In either case we need to be aware that
-	 * the shadow page table may be a PAE or a long mode page table.
-	 */
-	pm_mask = PT_PRESENT_MASK | shadow_me_value;
-	if (mmu->root_role.level >= PT64_ROOT_4LEVEL) {
-		pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;
-
-		if (WARN_ON_ONCE(!mmu->pml4_root)) {
-			r = -EIO;
-			goto out_unlock;
-		}
-		mmu->pml4_root[0] = __pa(mmu->pae_root) | pm_mask;
-
-		if (mmu->root_role.level == PT64_ROOT_5LEVEL) {
-			if (WARN_ON_ONCE(!mmu->pml5_root)) {
-				r = -EIO;
-				goto out_unlock;
-			}
-			mmu->pml5_root[0] = __pa(mmu->pml4_root) | pm_mask;
-		}
-	}
-
-	for (i = 0; i < 4; ++i) {
-		WARN_ON_ONCE(IS_VALID_PAE_ROOT(mmu->pae_root[i]));
-
-		if (mmu->cpu_role.base.level == PT32E_ROOT_LEVEL) {
-			if (!(pdptrs[i] & PT_PRESENT_MASK)) {
-				mmu->pae_root[i] = INVALID_PAE_ROOT;
-				continue;
-			}
-			root_gfn = pdptrs[i] >> PAGE_SHIFT;
-		}
-
-		/*
-		 * If shadowing 32-bit non-PAE page tables, each PAE page
-		 * directory maps one quarter of the guest's non-PAE page
-		 * directory. Othwerise each PAE page direct shadows one guest
-		 * PAE page directory so that quadrant should be 0.
-		 */
-		quadrant = (mmu->cpu_role.base.level == PT32_ROOT_LEVEL) ? i : 0;
-
-		root = mmu_alloc_root(vcpu, root_gfn, quadrant, PT32_ROOT_LEVEL);
-		mmu->pae_root[i] = root | pm_mask;
-	}
-
-	if (mmu->root_role.level == PT64_ROOT_5LEVEL)
-		mmu->root.hpa = __pa(mmu->pml5_root);
-	else if (mmu->root_role.level == PT64_ROOT_4LEVEL)
-		mmu->root.hpa = __pa(mmu->pml4_root);
-	else
-		mmu->root.hpa = __pa(mmu->pae_root);
-
-set_root_pgd:
-	mmu->root.pgd = root_pgd;
-out_unlock:
-	write_unlock(&vcpu->kvm->mmu_lock);
-
-	return r;
-}
-
-static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
-{
-	struct kvm_mmu *mmu = vcpu->arch.mmu;
-	bool need_pml5 = mmu->root_role.level > PT64_ROOT_4LEVEL;
-	u64 *pml5_root = NULL;
-	u64 *pml4_root = NULL;
-	u64 *pae_root;
-
-	/*
-	 * When shadowing 32-bit or PAE NPT with 64-bit NPT, the PML4 and PDP
-	 * tables are allocated and initialized at root creation as there is no
-	 * equivalent level in the guest's NPT to shadow.  Allocate the tables
-	 * on demand, as running a 32-bit L1 VMM on 64-bit KVM is very rare.
-	 */
-	if (mmu->root_role.direct ||
-	    mmu->cpu_role.base.level >= PT64_ROOT_4LEVEL ||
-	    mmu->root_role.level < PT64_ROOT_4LEVEL)
-		return 0;
-
-	/*
-	 * NPT, the only paging mode that uses this horror, uses a fixed number
-	 * of levels for the shadow page tables, e.g. all MMUs are 4-level or
-	 * all MMus are 5-level.  Thus, this can safely require that pml5_root
-	 * is allocated if the other roots are valid and pml5 is needed, as any
-	 * prior MMU would also have required pml5.
-	 */
-	if (mmu->pae_root && mmu->pml4_root && (!need_pml5 || mmu->pml5_root))
-		return 0;
-
-	/*
-	 * The special roots should always be allocated in concert.  Yell and
-	 * bail if KVM ends up in a state where only one of the roots is valid.
-	 */
-	if (WARN_ON_ONCE(!tdp_enabled || mmu->pae_root || mmu->pml4_root ||
-			 (need_pml5 && mmu->pml5_root)))
-		return -EIO;
-
-	/*
-	 * Unlike 32-bit NPT, the PDP table doesn't need to be in low mem, and
-	 * doesn't need to be decrypted.
-	 */
-	pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
-	if (!pae_root)
-		return -ENOMEM;
+		if (++retry_count > 4) {
+			pr_warn_once("Fast #PF retrying more than 4 times.\n");
+			break;
+		}
 
-#ifdef CONFIG_X86_64
-	pml4_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
-	if (!pml4_root)
-		goto err_pml4;
-
-	if (need_pml5) {
-		pml5_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
-		if (!pml5_root)
-			goto err_pml5;
-	}
-#endif
+	} while (true);
 
-	mmu->pae_root = pae_root;
-	mmu->pml4_root = pml4_root;
-	mmu->pml5_root = pml5_root;
+	trace_fast_page_fault(vcpu, fault, sptep, spte, ret);
+	walk_shadow_page_lockless_end(vcpu);
 
-	return 0;
+	if (ret != RET_PF_INVALID)
+		vcpu->stat.pf_fast++;
 
-#ifdef CONFIG_X86_64
-err_pml5:
-	free_page((unsigned long)pml4_root);
-err_pml4:
-	free_page((unsigned long)pae_root);
-	return -ENOMEM;
-#endif
+	return ret;
 }
 
-static bool is_unsync_root(hpa_t root)
+static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
+			       struct list_head *invalid_list)
 {
 	struct kvm_mmu_page *sp;
 
-	if (!VALID_PAGE(root))
-		return false;
-
-	/*
-	 * The read barrier orders the CPU's read of SPTE.W during the page table
-	 * walk before the reads of sp->unsync/sp->unsync_children here.
-	 *
-	 * Even if another CPU was marking the SP as unsync-ed simultaneously,
-	 * any guest page table changes are not guaranteed to be visible anyway
-	 * until this VCPU issues a TLB flush strictly after those changes are
-	 * made.  We only need to ensure that the other CPU sets these flags
-	 * before any actual changes to the page tables are made.  The comments
-	 * in mmu_try_to_unsync_pages() describe what could go wrong if this
-	 * requirement isn't satisfied.
-	 */
-	smp_rmb();
-	sp = to_shadow_page(root);
+	if (!VALID_PAGE(*root_hpa))
+		return;
 
 	/*
-	 * PAE roots (somewhat arbitrarily) aren't backed by shadow pages, the
-	 * PDPTEs for a given PAE root need to be synchronized individually.
+	 * The "root" may be a special root, e.g. a PAE entry, treat it as a
+	 * SPTE to ensure any non-PA bits are dropped.
 	 */
-	if (WARN_ON_ONCE(!sp))
-		return false;
+	sp = spte_to_child_sp(*root_hpa);
+	if (WARN_ON(!sp))
+		return;
 
-	if (sp->unsync || sp->unsync_children)
-		return true;
+	if (is_tdp_mmu_page(sp))
+		kvm_tdp_mmu_put_root(kvm, sp, false);
+	else if (!--sp->root_count && sp->role.invalid)
+		kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
 
-	return false;
+	*root_hpa = INVALID_PAGE;
 }
 
-void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
+/* roots_to_free must be some combination of the KVM_MMU_ROOT_* flags */
+void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
+			ulong roots_to_free)
 {
 	int i;
-	struct kvm_mmu_page *sp;
-
-	if (vcpu->arch.mmu->root_role.direct)
-		return;
+	LIST_HEAD(invalid_list);
+	bool free_active_root;
 
-	if (!VALID_PAGE(vcpu->arch.mmu->root.hpa))
-		return;
+	BUILD_BUG_ON(KVM_MMU_NUM_PREV_ROOTS >= BITS_PER_LONG);
 
-	vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);
+	/* Before acquiring the MMU lock, see if we need to do any real work. */
+	free_active_root = (roots_to_free & KVM_MMU_ROOT_CURRENT)
+		&& VALID_PAGE(mmu->root.hpa);
 
-	if (vcpu->arch.mmu->cpu_role.base.level >= PT64_ROOT_4LEVEL) {
-		hpa_t root = vcpu->arch.mmu->root.hpa;
-		sp = to_shadow_page(root);
+	if (!free_active_root) {
+		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
+			if ((roots_to_free & KVM_MMU_ROOT_PREVIOUS(i)) &&
+			    VALID_PAGE(mmu->prev_roots[i].hpa))
+				break;
 
-		if (!is_unsync_root(root))
+		if (i == KVM_MMU_NUM_PREV_ROOTS)
 			return;
-
-		write_lock(&vcpu->kvm->mmu_lock);
-		mmu_sync_children(vcpu, sp, true);
-		write_unlock(&vcpu->kvm->mmu_lock);
-		return;
 	}
 
-	write_lock(&vcpu->kvm->mmu_lock);
+	write_lock(&kvm->mmu_lock);
+
+	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
+		if (roots_to_free & KVM_MMU_ROOT_PREVIOUS(i))
+			mmu_free_root_page(kvm, &mmu->prev_roots[i].hpa,
+					   &invalid_list);
 
-	for (i = 0; i < 4; ++i) {
-		hpa_t root = vcpu->arch.mmu->pae_root[i];
+	if (free_active_root) {
+		if (to_shadow_page(mmu->root.hpa)) {
+			mmu_free_root_page(kvm, &mmu->root.hpa, &invalid_list);
+		} else if (mmu->pae_root) {
+			for (i = 0; i < 4; ++i) {
+				if (!IS_VALID_PAE_ROOT(mmu->pae_root[i]))
+					continue;
 
-		if (IS_VALID_PAE_ROOT(root)) {
-			sp = spte_to_child_sp(root);
-			mmu_sync_children(vcpu, sp, true);
+				mmu_free_root_page(kvm, &mmu->pae_root[i],
+						   &invalid_list);
+				mmu->pae_root[i] = INVALID_PAE_ROOT;
+			}
 		}
+		mmu->root.hpa = INVALID_PAGE;
+		mmu->root.pgd = 0;
 	}
 
-	write_unlock(&vcpu->kvm->mmu_lock);
+	kvm_mmu_commit_zap_page(kvm, &invalid_list);
+	write_unlock(&kvm->mmu_lock);
 }
+EXPORT_SYMBOL_GPL(kvm_mmu_free_roots);
 
-void kvm_mmu_sync_prev_roots(struct kvm_vcpu *vcpu)
+static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 {
-	unsigned long roots_to_free = 0;
-	int i;
+	struct kvm_mmu *mmu = vcpu->arch.mmu;
+	u8 shadow_root_level = mmu->root_role.level;
+	hpa_t root;
+	unsigned i;
+	int r;
 
-	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
-		if (is_unsync_root(vcpu->arch.mmu->prev_roots[i].hpa))
-			roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
+	write_lock(&vcpu->kvm->mmu_lock);
+	r = make_mmu_pages_available(vcpu);
+	if (r < 0)
+		goto out_unlock;
+
+	if (tdp_mmu_enabled) {
+		root = kvm_tdp_mmu_get_vcpu_root_hpa(vcpu);
+		mmu->root.hpa = root;
+	} else if (shadow_root_level >= PT64_ROOT_4LEVEL) {
+		root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level);
+		mmu->root.hpa = root;
+	} else if (shadow_root_level == PT32E_ROOT_LEVEL) {
+		if (WARN_ON_ONCE(!mmu->pae_root)) {
+			r = -EIO;
+			goto out_unlock;
+		}
+
+		for (i = 0; i < 4; ++i) {
+			WARN_ON_ONCE(IS_VALID_PAE_ROOT(mmu->pae_root[i]));
 
-	/* sync prev_roots by simply freeing them */
-	kvm_mmu_free_roots(vcpu->kvm, vcpu->arch.mmu, roots_to_free);
+			root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT), 0,
+					      PT32_ROOT_LEVEL);
+			mmu->pae_root[i] = root | PT_PRESENT_MASK |
+					   shadow_me_value;
+		}
+		mmu->root.hpa = __pa(mmu->pae_root);
+	} else {
+		WARN_ONCE(1, "Bad TDP root level = %d\n", shadow_root_level);
+		r = -EIO;
+		goto out_unlock;
+	}
+
+	/* root.pgd is ignored for direct MMUs. */
+	mmu->root.pgd = 0;
+out_unlock:
+	write_unlock(&vcpu->kvm->mmu_lock);
+	return r;
 }
 
 static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
@@ -4002,31 +1200,6 @@ static bool mmio_info_in_cache(struct kvm_vcpu *vcpu, u64 addr, bool direct)
 	return vcpu_match_mmio_gva(vcpu, addr);
 }
 
-/*
- * Return the level of the lowest level SPTE added to sptes.
- * That SPTE may be non-present.
- *
- * Must be called between walk_shadow_page_lockless_{begin,end}.
- */
-static int get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, int *root_level)
-{
-	struct kvm_shadow_walk_iterator iterator;
-	int leaf = -1;
-	u64 spte;
-
-	for (shadow_walk_init(&iterator, vcpu, addr),
-	     *root_level = iterator.level;
-	     shadow_walk_okay(&iterator);
-	     __shadow_walk_next(&iterator, spte)) {
-		leaf = iterator.level;
-		spte = mmu_spte_get_lockless(iterator.sptep);
-
-		sptes[leaf] = spte;
-	}
-
-	return leaf;
-}
-
 /* return true if reserved bit(s) are detected on a valid, non-MMIO SPTE. */
 static bool get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep)
 {
@@ -4130,17 +1303,6 @@ static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
 	return false;
 }
 
-static void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr)
-{
-	struct kvm_shadow_walk_iterator iterator;
-	u64 spte;
-
-	walk_shadow_page_lockless_begin(vcpu);
-	for_each_shadow_entry_lockless(vcpu, addr, iterator, spte)
-		clear_sp_write_flooding_count(iterator.sptep);
-	walk_shadow_page_lockless_end(vcpu);
-}
-
 static u32 alloc_apf_token(struct kvm_vcpu *vcpu)
 {
 	/* make sure the token value is not 0 */
@@ -5356,264 +2518,65 @@ void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	vcpu->arch.nested_mmu.root_role.word = 0;
 	vcpu->arch.root_mmu.cpu_role.ext.valid = 0;
 	vcpu->arch.guest_mmu.cpu_role.ext.valid = 0;
-	vcpu->arch.nested_mmu.cpu_role.ext.valid = 0;
-	kvm_mmu_reset_context(vcpu);
-
-	/*
-	 * Changing guest CPUID after KVM_RUN is forbidden, see the comment in
-	 * kvm_arch_vcpu_ioctl().
-	 */
-	KVM_BUG_ON(vcpu->arch.last_vmentry_cpu != -1, vcpu->kvm);
-}
-
-void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
-{
-	kvm_mmu_unload(vcpu);
-	kvm_init_mmu(vcpu);
-}
-EXPORT_SYMBOL_GPL(kvm_mmu_reset_context);
-
-int kvm_mmu_load(struct kvm_vcpu *vcpu)
-{
-	int r;
-
-	r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->root_role.direct);
-	if (r)
-		goto out;
-	r = mmu_alloc_special_roots(vcpu);
-	if (r)
-		goto out;
-	if (vcpu->arch.mmu->root_role.direct)
-		r = mmu_alloc_direct_roots(vcpu);
-	else
-		r = mmu_alloc_shadow_roots(vcpu);
-	if (r)
-		goto out;
-
-	kvm_mmu_sync_roots(vcpu);
-
-	kvm_mmu_load_pgd(vcpu);
-
-	/*
-	 * Flush any TLB entries for the new root, the provenance of the root
-	 * is unknown.  Even if KVM ensures there are no stale TLB entries
-	 * for a freed root, in theory another hypervisor could have left
-	 * stale entries.  Flushing on alloc also allows KVM to skip the TLB
-	 * flush when freeing a root (see kvm_tdp_mmu_put_root()).
-	 */
-	static_call(kvm_x86_flush_tlb_current)(vcpu);
-out:
-	return r;
-}
-
-void kvm_mmu_unload(struct kvm_vcpu *vcpu)
-{
-	struct kvm *kvm = vcpu->kvm;
-
-	kvm_mmu_free_roots(kvm, &vcpu->arch.root_mmu, KVM_MMU_ROOTS_ALL);
-	WARN_ON(VALID_PAGE(vcpu->arch.root_mmu.root.hpa));
-	kvm_mmu_free_roots(kvm, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);
-	WARN_ON(VALID_PAGE(vcpu->arch.guest_mmu.root.hpa));
-	vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);
-}
-
-static bool is_obsolete_root(struct kvm *kvm, hpa_t root_hpa)
-{
-	struct kvm_mmu_page *sp;
-
-	if (!VALID_PAGE(root_hpa))
-		return false;
-
-	/*
-	 * When freeing obsolete roots, treat roots as obsolete if they don't
-	 * have an associated shadow page.  This does mean KVM will get false
-	 * positives and free roots that don't strictly need to be freed, but
-	 * such false positives are relatively rare:
-	 *
-	 *  (a) only PAE paging and nested NPT has roots without shadow pages
-	 *  (b) remote reloads due to a memslot update obsoletes _all_ roots
-	 *  (c) KVM doesn't track previous roots for PAE paging, and the guest
-	 *      is unlikely to zap an in-use PGD.
-	 */
-	sp = to_shadow_page(root_hpa);
-	return !sp || is_obsolete_sp(kvm, sp);
-}
-
-static void __kvm_mmu_free_obsolete_roots(struct kvm *kvm, struct kvm_mmu *mmu)
-{
-	unsigned long roots_to_free = 0;
-	int i;
-
-	if (is_obsolete_root(kvm, mmu->root.hpa))
-		roots_to_free |= KVM_MMU_ROOT_CURRENT;
-
-	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
-		if (is_obsolete_root(kvm, mmu->prev_roots[i].hpa))
-			roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
-	}
-
-	if (roots_to_free)
-		kvm_mmu_free_roots(kvm, mmu, roots_to_free);
-}
-
-void kvm_mmu_free_obsolete_roots(struct kvm_vcpu *vcpu)
-{
-	__kvm_mmu_free_obsolete_roots(vcpu->kvm, &vcpu->arch.root_mmu);
-	__kvm_mmu_free_obsolete_roots(vcpu->kvm, &vcpu->arch.guest_mmu);
-}
-
-static u64 mmu_pte_write_fetch_gpte(struct kvm_vcpu *vcpu, gpa_t *gpa,
-				    int *bytes)
-{
-	u64 gentry = 0;
-	int r;
-
-	/*
-	 * Assume that the pte write on a page table of the same type
-	 * as the current vcpu paging mode since we update the sptes only
-	 * when they have the same mode.
-	 */
-	if (is_pae(vcpu) && *bytes == 4) {
-		/* Handle a 32-bit guest writing two halves of a 64-bit gpte */
-		*gpa &= ~(gpa_t)7;
-		*bytes = 8;
-	}
-
-	if (*bytes == 4 || *bytes == 8) {
-		r = kvm_vcpu_read_guest_atomic(vcpu, *gpa, &gentry, *bytes);
-		if (r)
-			gentry = 0;
-	}
-
-	return gentry;
-}
-
-/*
- * If we're seeing too many writes to a page, it may no longer be a page table,
- * or we may be forking, in which case it is better to unmap the page.
- */
-static bool detect_write_flooding(struct kvm_mmu_page *sp)
-{
-	/*
-	 * Skip write-flooding detected for the sp whose level is 1, because
-	 * it can become unsync, then the guest page is not write-protected.
-	 */
-	if (sp->role.level == PG_LEVEL_4K)
-		return false;
-
-	atomic_inc(&sp->write_flooding_count);
-	return atomic_read(&sp->write_flooding_count) >= 3;
-}
-
-/*
- * Misaligned accesses are too much trouble to fix up; also, they usually
- * indicate a page is not used as a page table.
- */
-static bool detect_write_misaligned(struct kvm_mmu_page *sp, gpa_t gpa,
-				    int bytes)
-{
-	unsigned offset, pte_size, misaligned;
-
-	pgprintk("misaligned: gpa %llx bytes %d role %x\n",
-		 gpa, bytes, sp->role.word);
-
-	offset = offset_in_page(gpa);
-	pte_size = sp->role.has_4_byte_gpte ? 4 : 8;
+	vcpu->arch.nested_mmu.cpu_role.ext.valid = 0;
+	kvm_mmu_reset_context(vcpu);
 
 	/*
-	 * Sometimes, the OS only writes the last one bytes to update status
-	 * bits, for example, in linux, andb instruction is used in clear_bit().
+	 * Changing guest CPUID after KVM_RUN is forbidden, see the comment in
+	 * kvm_arch_vcpu_ioctl().
 	 */
-	if (!(offset & (pte_size - 1)) && bytes == 1)
-		return false;
-
-	misaligned = (offset ^ (offset + bytes - 1)) & ~(pte_size - 1);
-	misaligned |= bytes < 4;
-
-	return misaligned;
+	KVM_BUG_ON(vcpu->arch.last_vmentry_cpu != -1, vcpu->kvm);
 }
 
-static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
+void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
 {
-	unsigned page_offset, quadrant;
-	u64 *spte;
-	int level;
-
-	page_offset = offset_in_page(gpa);
-	level = sp->role.level;
-	*nspte = 1;
-	if (sp->role.has_4_byte_gpte) {
-		page_offset <<= 1;	/* 32->64 */
-		/*
-		 * A 32-bit pde maps 4MB while the shadow pdes map
-		 * only 2MB.  So we need to double the offset again
-		 * and zap two pdes instead of one.
-		 */
-		if (level == PT32_ROOT_LEVEL) {
-			page_offset &= ~7; /* kill rounding error */
-			page_offset <<= 1;
-			*nspte = 2;
-		}
-		quadrant = page_offset >> PAGE_SHIFT;
-		page_offset &= ~PAGE_MASK;
-		if (quadrant != sp->role.quadrant)
-			return NULL;
-	}
-
-	spte = &sp->spt[page_offset / sizeof(*spte)];
-	return spte;
+	kvm_mmu_unload(vcpu);
+	kvm_init_mmu(vcpu);
 }
+EXPORT_SYMBOL_GPL(kvm_mmu_reset_context);
 
-static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
-			      const u8 *new, int bytes,
-			      struct kvm_page_track_notifier_node *node)
+int kvm_mmu_load(struct kvm_vcpu *vcpu)
 {
-	gfn_t gfn = gpa >> PAGE_SHIFT;
-	struct kvm_mmu_page *sp;
-	LIST_HEAD(invalid_list);
-	u64 entry, gentry, *spte;
-	int npte;
-	bool flush = false;
-
-	/*
-	 * If we don't have indirect shadow pages, it means no page is
-	 * write-protected, so we can exit simply.
-	 */
-	if (!READ_ONCE(vcpu->kvm->arch.indirect_shadow_pages))
-		return;
-
-	pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes);
+	int r;
 
-	write_lock(&vcpu->kvm->mmu_lock);
+	r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->root_role.direct);
+	if (r)
+		goto out;
+	r = mmu_alloc_special_roots(vcpu);
+	if (r)
+		goto out;
+	if (vcpu->arch.mmu->root_role.direct)
+		r = mmu_alloc_direct_roots(vcpu);
+	else
+		r = mmu_alloc_shadow_roots(vcpu);
+	if (r)
+		goto out;
 
-	gentry = mmu_pte_write_fetch_gpte(vcpu, &gpa, &bytes);
+	kvm_mmu_sync_roots(vcpu);
 
-	++vcpu->kvm->stat.mmu_pte_write;
+	kvm_mmu_load_pgd(vcpu);
 
-	for_each_gfn_valid_sp_with_gptes(vcpu->kvm, sp, gfn) {
-		if (detect_write_misaligned(sp, gpa, bytes) ||
-		      detect_write_flooding(sp)) {
-			kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list);
-			++vcpu->kvm->stat.mmu_flooded;
-			continue;
-		}
+	/*
+	 * Flush any TLB entries for the new root, the provenance of the root
+	 * is unknown.  Even if KVM ensures there are no stale TLB entries
+	 * for a freed root, in theory another hypervisor could have left
+	 * stale entries.  Flushing on alloc also allows KVM to skip the TLB
+	 * flush when freeing a root (see kvm_tdp_mmu_put_root()).
+	 */
+	static_call(kvm_x86_flush_tlb_current)(vcpu);
+out:
+	return r;
+}
 
-		spte = get_written_sptes(sp, gpa, &npte);
-		if (!spte)
-			continue;
+void kvm_mmu_unload(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
 
-		while (npte--) {
-			entry = *spte;
-			mmu_page_zap_pte(vcpu->kvm, sp, spte, NULL);
-			if (gentry && sp->role.level != PG_LEVEL_4K)
-				++vcpu->kvm->stat.mmu_pde_zapped;
-			if (is_shadow_present_pte(entry))
-				flush = true;
-			++spte;
-		}
-	}
-	kvm_mmu_remote_flush_or_zap(vcpu->kvm, &invalid_list, flush);
-	write_unlock(&vcpu->kvm->mmu_lock);
+	kvm_mmu_free_roots(kvm, &vcpu->arch.root_mmu, KVM_MMU_ROOTS_ALL);
+	WARN_ON(VALID_PAGE(vcpu->arch.root_mmu.root.hpa));
+	kvm_mmu_free_roots(kvm, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);
+	WARN_ON(VALID_PAGE(vcpu->arch.guest_mmu.root.hpa));
+	vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);
 }
 
 int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
@@ -5782,60 +2745,6 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
 }
 EXPORT_SYMBOL_GPL(kvm_configure_mmu);
 
-/* The return value indicates if tlb flush on all vcpus is needed. */
-typedef bool (*slot_rmaps_handler) (struct kvm *kvm,
-				    struct kvm_rmap_head *rmap_head,
-				    const struct kvm_memory_slot *slot);
-
-static __always_inline bool __walk_slot_rmaps(struct kvm *kvm,
-					      const struct kvm_memory_slot *slot,
-					      slot_rmaps_handler fn,
-					      int start_level, int end_level,
-					      gfn_t start_gfn, gfn_t end_gfn,
-					      bool flush_on_yield, bool flush)
-{
-	struct slot_rmap_walk_iterator iterator;
-
-	lockdep_assert_held_write(&kvm->mmu_lock);
-
-	for_each_slot_rmap_range(slot, start_level, end_level, start_gfn,
-			end_gfn, &iterator) {
-		if (iterator.rmap)
-			flush |= fn(kvm, iterator.rmap, slot);
-
-		if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) {
-			if (flush && flush_on_yield) {
-				kvm_flush_remote_tlbs_with_address(kvm,
-						start_gfn,
-						iterator.gfn - start_gfn + 1);
-				flush = false;
-			}
-			cond_resched_rwlock_write(&kvm->mmu_lock);
-		}
-	}
-
-	return flush;
-}
-
-static __always_inline bool walk_slot_rmaps(struct kvm *kvm,
-					    const struct kvm_memory_slot *slot,
-					    slot_rmaps_handler fn,
-					    int start_level, int end_level,
-					    bool flush_on_yield)
-{
-	return __walk_slot_rmaps(kvm, slot, fn, start_level, end_level,
-				 slot->base_gfn, slot->base_gfn + slot->npages - 1,
-				 flush_on_yield, false);
-}
-
-static __always_inline bool walk_slot_rmaps_4k(struct kvm *kvm,
-					       const struct kvm_memory_slot *slot,
-					       slot_rmaps_handler fn,
-					       bool flush_on_yield)
-{
-	return walk_slot_rmaps(kvm, slot, fn, PG_LEVEL_4K, PG_LEVEL_4K, flush_on_yield);
-}
-
 static void free_mmu_pages(struct kvm_mmu *mmu)
 {
 	if (!tdp_enabled && mmu->pae_root)
@@ -5927,63 +2836,6 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
 	return ret;
 }
 
-#define BATCH_ZAP_PAGES	10
-static void kvm_zap_obsolete_pages(struct kvm *kvm)
-{
-	struct kvm_mmu_page *sp, *node;
-	int nr_zapped, batch = 0;
-	bool unstable;
-
-restart:
-	list_for_each_entry_safe_reverse(sp, node,
-	      &kvm->arch.active_mmu_pages, link) {
-		/*
-		 * No obsolete valid page exists before a newly created page
-		 * since active_mmu_pages is a FIFO list.
-		 */
-		if (!is_obsolete_sp(kvm, sp))
-			break;
-
-		/*
-		 * Invalid pages should never land back on the list of active
-		 * pages.  Skip the bogus page, otherwise we'll get stuck in an
-		 * infinite loop if the page gets put back on the list (again).
-		 */
-		if (WARN_ON(sp->role.invalid))
-			continue;
-
-		/*
-		 * No need to flush the TLB since we're only zapping shadow
-		 * pages with an obsolete generation number and all vCPUS have
-		 * loaded a new root, i.e. the shadow pages being zapped cannot
-		 * be in active use by the guest.
-		 */
-		if (batch >= BATCH_ZAP_PAGES &&
-		    cond_resched_rwlock_write(&kvm->mmu_lock)) {
-			batch = 0;
-			goto restart;
-		}
-
-		unstable = __kvm_mmu_prepare_zap_page(kvm, sp,
-				&kvm->arch.zapped_obsolete_pages, &nr_zapped);
-		batch += nr_zapped;
-
-		if (unstable)
-			goto restart;
-	}
-
-	/*
-	 * Kick all vCPUs (via remote TLB flush) before freeing the page tables
-	 * to ensure KVM is not in the middle of a lockless shadow page table
-	 * walk, which may reference the pages.  The remote TLB flush itself is
-	 * not required and is simply a convenient way to kick vCPUs as needed.
-	 * KVM performs a local TLB flush when allocating a new root (see
-	 * kvm_mmu_load()), and the reload in the caller ensure no vCPUs are
-	 * running with an obsolete MMU.
-	 */
-	kvm_mmu_commit_zap_page(kvm, &kvm->arch.zapped_obsolete_pages);
-}
-
 /*
  * Fast invalidate all shadow pages and use lock-break technique
  * to zap obsolete pages.
@@ -6044,11 +2896,6 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
 		kvm_tdp_mmu_zap_invalidated_roots(kvm);
 }
 
-static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
-{
-	return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
-}
-
 static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
 			struct kvm_memory_slot *slot,
 			struct kvm_page_track_notifier_node *node)
@@ -6106,37 +2953,6 @@ void kvm_mmu_uninit_vm(struct kvm *kvm)
 	mmu_free_vm_memory_caches(kvm);
 }
 
-static bool kvm_rmap_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
-{
-	const struct kvm_memory_slot *memslot;
-	struct kvm_memslots *slots;
-	struct kvm_memslot_iter iter;
-	bool flush = false;
-	gfn_t start, end;
-	int i;
-
-	if (!kvm_memslots_have_rmaps(kvm))
-		return flush;
-
-	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
-		slots = __kvm_memslots(kvm, i);
-
-		kvm_for_each_memslot_in_gfn_range(&iter, slots, gfn_start, gfn_end) {
-			memslot = iter.slot;
-			start = max(gfn_start, memslot->base_gfn);
-			end = min(gfn_end, memslot->base_gfn + memslot->npages);
-			if (WARN_ON_ONCE(start >= end))
-				continue;
-
-			flush = __walk_slot_rmaps(kvm, memslot, __kvm_zap_rmap,
-						  PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL,
-						  start, end - 1, true, flush);
-		}
-	}
-
-	return flush;
-}
-
 /*
  * Invalidate (zap) SPTEs that cover GFNs from gfn_start and up to gfn_end
  * (not including it)
@@ -6170,13 +2986,6 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
 	write_unlock(&kvm->mmu_lock);
 }
 
-static bool slot_rmap_write_protect(struct kvm *kvm,
-				    struct kvm_rmap_head *rmap_head,
-				    const struct kvm_memory_slot *slot)
-{
-	return rmap_write_protect(rmap_head, false);
-}
-
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 				      const struct kvm_memory_slot *memslot,
 				      int start_level)
@@ -6248,182 +3057,6 @@ int topup_split_caches(struct kvm *kvm)
 	return kvm_mmu_topup_memory_cache(&kvm->arch.split_shadow_page_cache, 1);
 }
 
-static struct kvm_mmu_page *shadow_mmu_get_sp_for_split(struct kvm *kvm, u64 *huge_sptep)
-{
-	struct kvm_mmu_page *huge_sp = sptep_to_sp(huge_sptep);
-	struct shadow_page_caches caches = {};
-	union kvm_mmu_page_role role;
-	unsigned int access;
-	gfn_t gfn;
-
-	gfn = kvm_mmu_page_get_gfn(huge_sp, spte_index(huge_sptep));
-	access = kvm_mmu_page_get_access(huge_sp, spte_index(huge_sptep));
-
-	/*
-	 * Note, huge page splitting always uses direct shadow pages, regardless
-	 * of whether the huge page itself is mapped by a direct or indirect
-	 * shadow page, since the huge page region itself is being directly
-	 * mapped with smaller pages.
-	 */
-	role = kvm_mmu_child_role(huge_sptep, /*direct=*/true, access);
-
-	/* Direct SPs do not require a shadowed_info_cache. */
-	caches.page_header_cache = &kvm->arch.split_page_header_cache;
-	caches.shadow_page_cache = &kvm->arch.split_shadow_page_cache;
-
-	/* Safe to pass NULL for vCPU since requesting a direct SP. */
-	return __kvm_mmu_get_shadow_page(kvm, NULL, &caches, gfn, role);
-}
-
-static void shadow_mmu_split_huge_page(struct kvm *kvm,
-				       const struct kvm_memory_slot *slot,
-				       u64 *huge_sptep)
-
-{
-	struct kvm_mmu_memory_cache *cache = &kvm->arch.split_desc_cache;
-	u64 huge_spte = READ_ONCE(*huge_sptep);
-	struct kvm_mmu_page *sp;
-	bool flush = false;
-	u64 *sptep, spte;
-	gfn_t gfn;
-	int index;
-
-	sp = shadow_mmu_get_sp_for_split(kvm, huge_sptep);
-
-	for (index = 0; index < SPTE_ENT_PER_PAGE; index++) {
-		sptep = &sp->spt[index];
-		gfn = kvm_mmu_page_get_gfn(sp, index);
-
-		/*
-		 * The SP may already have populated SPTEs, e.g. if this huge
-		 * page is aliased by multiple sptes with the same access
-		 * permissions. These entries are guaranteed to map the same
-		 * gfn-to-pfn translation since the SP is direct, so no need to
-		 * modify them.
-		 *
-		 * However, if a given SPTE points to a lower level page table,
-		 * that lower level page table may only be partially populated.
-		 * Installing such SPTEs would effectively unmap a potion of the
-		 * huge page. Unmapping guest memory always requires a TLB flush
-		 * since a subsequent operation on the unmapped regions would
-		 * fail to detect the need to flush.
-		 */
-		if (is_shadow_present_pte(*sptep)) {
-			flush |= !is_last_spte(*sptep, sp->role.level);
-			continue;
-		}
-
-		spte = make_huge_page_split_spte(kvm, huge_spte, sp->role, index);
-		mmu_spte_set(sptep, spte);
-		__rmap_add(kvm, cache, slot, sptep, gfn, sp->role.access);
-	}
-
-	__link_shadow_page(kvm, cache, huge_sptep, sp, flush);
-}
-
-static int shadow_mmu_try_split_huge_page(struct kvm *kvm,
-					  const struct kvm_memory_slot *slot,
-					  u64 *huge_sptep)
-{
-	struct kvm_mmu_page *huge_sp = sptep_to_sp(huge_sptep);
-	int level, r = 0;
-	gfn_t gfn;
-	u64 spte;
-
-	/* Grab information for the tracepoint before dropping the MMU lock. */
-	gfn = kvm_mmu_page_get_gfn(huge_sp, spte_index(huge_sptep));
-	level = huge_sp->role.level;
-	spte = *huge_sptep;
-
-	if (kvm_mmu_available_pages(kvm) <= KVM_MIN_FREE_MMU_PAGES) {
-		r = -ENOSPC;
-		goto out;
-	}
-
-	if (need_topup_split_caches_or_resched(kvm)) {
-		write_unlock(&kvm->mmu_lock);
-		cond_resched();
-		/*
-		 * If the topup succeeds, return -EAGAIN to indicate that the
-		 * rmap iterator should be restarted because the MMU lock was
-		 * dropped.
-		 */
-		r = topup_split_caches(kvm) ?: -EAGAIN;
-		write_lock(&kvm->mmu_lock);
-		goto out;
-	}
-
-	shadow_mmu_split_huge_page(kvm, slot, huge_sptep);
-
-out:
-	trace_kvm_mmu_split_huge_page(gfn, spte, level, r);
-	return r;
-}
-
-static bool shadow_mmu_try_split_huge_pages(struct kvm *kvm,
-					    struct kvm_rmap_head *rmap_head,
-					    const struct kvm_memory_slot *slot)
-{
-	struct rmap_iterator iter;
-	struct kvm_mmu_page *sp;
-	u64 *huge_sptep;
-	int r;
-
-restart:
-	for_each_rmap_spte(rmap_head, &iter, huge_sptep) {
-		sp = sptep_to_sp(huge_sptep);
-
-		/* TDP MMU is enabled, so rmap only contains nested MMU SPs. */
-		if (WARN_ON_ONCE(!sp->role.guest_mode))
-			continue;
-
-		/* The rmaps should never contain non-leaf SPTEs. */
-		if (WARN_ON_ONCE(!is_large_pte(*huge_sptep)))
-			continue;
-
-		/* SPs with level >PG_LEVEL_4K should never by unsync. */
-		if (WARN_ON_ONCE(sp->unsync))
-			continue;
-
-		/* Don't bother splitting huge pages on invalid SPs. */
-		if (sp->role.invalid)
-			continue;
-
-		r = shadow_mmu_try_split_huge_page(kvm, slot, huge_sptep);
-
-		/*
-		 * The split succeeded or needs to be retried because the MMU
-		 * lock was dropped. Either way, restart the iterator to get it
-		 * back into a consistent state.
-		 */
-		if (!r || r == -EAGAIN)
-			goto restart;
-
-		/* The split failed and shouldn't be retried (e.g. -ENOMEM). */
-		break;
-	}
-
-	return false;
-}
-
-static void kvm_shadow_mmu_try_split_huge_pages(struct kvm *kvm,
-						const struct kvm_memory_slot *slot,
-						gfn_t start, gfn_t end,
-						int target_level)
-{
-	int level;
-
-	/*
-	 * Split huge pages starting with KVM_MAX_HUGEPAGE_LEVEL and working
-	 * down to the target level. This ensures pages are recursively split
-	 * all the way to the target level. There's no need to split pages
-	 * already at the target level.
-	 */
-	for (level = KVM_MAX_HUGEPAGE_LEVEL; level > target_level; level--)
-		__walk_slot_rmaps(kvm, slot, shadow_mmu_try_split_huge_pages,
-				  level, level, start, end - 1, true, false);
-}
-
 /* Must be called with the mmu_lock held in write-mode. */
 void kvm_mmu_try_split_huge_pages(struct kvm *kvm,
 				   const struct kvm_memory_slot *memslot,
@@ -6475,56 +3108,6 @@ void kvm_mmu_slot_try_split_huge_pages(struct kvm *kvm,
 	 */
 }
 
-static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
-					 struct kvm_rmap_head *rmap_head,
-					 const struct kvm_memory_slot *slot)
-{
-	u64 *sptep;
-	struct rmap_iterator iter;
-	int need_tlb_flush = 0;
-	struct kvm_mmu_page *sp;
-
-restart:
-	for_each_rmap_spte(rmap_head, &iter, sptep) {
-		sp = sptep_to_sp(sptep);
-
-		/*
-		 * We cannot do huge page mapping for indirect shadow pages,
-		 * which are found on the last rmap (level = 1) when not using
-		 * tdp; such shadow pages are synced with the page table in
-		 * the guest, and the guest page table is using 4K page size
-		 * mapping if the indirect sp has level = 1.
-		 */
-		if (sp->role.direct &&
-		    sp->role.level < kvm_mmu_max_mapping_level(kvm, slot, sp->gfn,
-							       PG_LEVEL_NUM)) {
-			kvm_zap_one_rmap_spte(kvm, rmap_head, sptep);
-
-			if (kvm_available_flush_tlb_with_range())
-				kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
-					KVM_PAGES_PER_HPAGE(sp->role.level));
-			else
-				need_tlb_flush = 1;
-
-			goto restart;
-		}
-	}
-
-	return need_tlb_flush;
-}
-
-static void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
-					   const struct kvm_memory_slot *slot)
-{
-	/*
-	 * Note, use KVM_MAX_HUGEPAGE_LEVEL - 1 since there's no need to zap
-	 * pages that are already mapped at the maximum hugepage level.
-	 */
-	if (walk_slot_rmaps(kvm, slot, kvm_mmu_zap_collapsible_spte,
-			    PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL - 1, true))
-		kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
-}
-
 void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
 				   const struct kvm_memory_slot *slot)
 {
@@ -6635,65 +3218,6 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
 	}
 }
 
-static unsigned long mmu_shrink_scan(struct shrinker *shrink,
-				     struct shrink_control *sc)
-{
-	struct kvm *kvm;
-	int nr_to_scan = sc->nr_to_scan;
-	unsigned long freed = 0;
-
-	mutex_lock(&kvm_lock);
-
-	list_for_each_entry(kvm, &vm_list, vm_list) {
-		int idx;
-		LIST_HEAD(invalid_list);
-
-		/*
-		 * Never scan more than sc->nr_to_scan VM instances.
-		 * Will not hit this condition practically since we do not try
-		 * to shrink more than one VM and it is very unlikely to see
-		 * !n_used_mmu_pages so many times.
-		 */
-		if (!nr_to_scan--)
-			break;
-		/*
-		 * n_used_mmu_pages is accessed without holding kvm->mmu_lock
-		 * here. We may skip a VM instance errorneosly, but we do not
-		 * want to shrink a VM that only started to populate its MMU
-		 * anyway.
-		 */
-		if (!kvm->arch.n_used_mmu_pages &&
-		    !kvm_has_zapped_obsolete_pages(kvm))
-			continue;
-
-		idx = srcu_read_lock(&kvm->srcu);
-		write_lock(&kvm->mmu_lock);
-
-		if (kvm_has_zapped_obsolete_pages(kvm)) {
-			kvm_mmu_commit_zap_page(kvm,
-			      &kvm->arch.zapped_obsolete_pages);
-			goto unlock;
-		}
-
-		freed = kvm_mmu_zap_oldest_mmu_pages(kvm, sc->nr_to_scan);
-
-unlock:
-		write_unlock(&kvm->mmu_lock);
-		srcu_read_unlock(&kvm->srcu, idx);
-
-		/*
-		 * unfair on small ones
-		 * per-vm shrinkers cry out
-		 * sadness comes quickly
-		 */
-		list_move_tail(&kvm->vm_list, &vm_list);
-		break;
-	}
-
-	mutex_unlock(&kvm_lock);
-	return freed;
-}
-
 static unsigned long mmu_shrink_count(struct shrinker *shrink,
 				      struct shrink_control *sc)
 {
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 95f0adfb3b0b4..9c1399762496b 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -44,6 +44,8 @@ extern bool dbg;
 #define INVALID_PAE_ROOT	0
 #define IS_VALID_PAE_ROOT(x)	(!!(x))
 
+#define PTE_PREFETCH_NUM		8
+
 typedef u64 __rcu *tdp_ptep_t;
 
 struct kvm_mmu_page {
@@ -168,8 +170,6 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 				    int min_level);
 void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
 					u64 start_gfn, u64 pages);
-unsigned int pte_list_count(struct kvm_rmap_head *rmap_head);
-
 extern int nx_huge_pages;
 static inline bool is_nx_huge_page_enabled(struct kvm *kvm)
 {
diff --git a/arch/x86/kvm/mmu/shadow_mmu.c b/arch/x86/kvm/mmu/shadow_mmu.c
index eee5a6796d9b0..f3e2ed5b675eb 100644
--- a/arch/x86/kvm/mmu/shadow_mmu.c
+++ b/arch/x86/kvm/mmu/shadow_mmu.c
@@ -21,3 +21,3421 @@
 #include <asm/vmx.h>
 #include <asm/cmpxchg.h>
 #include <trace/events/kvm.h>
+
+#define for_each_shadow_entry(_vcpu, _addr, _walker)            \
+	for (shadow_walk_init(&(_walker), _vcpu, _addr);	\
+	     shadow_walk_okay(&(_walker));			\
+	     shadow_walk_next(&(_walker)))
+
+#define for_each_shadow_entry_lockless(_vcpu, _addr, _walker, spte)	\
+	for (shadow_walk_init(&(_walker), _vcpu, _addr);		\
+	     shadow_walk_okay(&(_walker)) &&				\
+		({ spte = mmu_spte_get_lockless(_walker.sptep); 1; });	\
+	     __shadow_walk_next(&(_walker), spte))
+
+static void mmu_spte_set(u64 *sptep, u64 spte);
+
+void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn,
+		    unsigned int access)
+{
+	u64 spte = make_mmio_spte(vcpu, gfn, access);
+
+	trace_mark_mmio_spte(sptep, gfn, spte);
+	mmu_spte_set(sptep, spte);
+}
+
+#ifdef CONFIG_X86_64
+static void __set_spte(u64 *sptep, u64 spte)
+{
+	WRITE_ONCE(*sptep, spte);
+}
+
+static void __update_clear_spte_fast(u64 *sptep, u64 spte)
+{
+	WRITE_ONCE(*sptep, spte);
+}
+
+static u64 __update_clear_spte_slow(u64 *sptep, u64 spte)
+{
+	return xchg(sptep, spte);
+}
+
+static u64 __get_spte_lockless(u64 *sptep)
+{
+	return READ_ONCE(*sptep);
+}
+#else
+union split_spte {
+	struct {
+		u32 spte_low;
+		u32 spte_high;
+	};
+	u64 spte;
+};
+
+static void count_spte_clear(u64 *sptep, u64 spte)
+{
+	struct kvm_mmu_page *sp =  sptep_to_sp(sptep);
+
+	if (is_shadow_present_pte(spte))
+		return;
+
+	/* Ensure the spte is completely set before we increase the count */
+	smp_wmb();
+	sp->clear_spte_count++;
+}
+
+static void __set_spte(u64 *sptep, u64 spte)
+{
+	union split_spte *ssptep, sspte;
+
+	ssptep = (union split_spte *)sptep;
+	sspte = (union split_spte)spte;
+
+	ssptep->spte_high = sspte.spte_high;
+
+	/*
+	 * When changing the spte from nonpresent to present, store the
+	 * high bits first and then set the present bit, so that the CPU
+	 * cannot fetch this spte while it is being set.
+	 */
+	smp_wmb();
+
+	WRITE_ONCE(ssptep->spte_low, sspte.spte_low);
+}
+
+static void __update_clear_spte_fast(u64 *sptep, u64 spte)
+{
+	union split_spte *ssptep, sspte;
+
+	ssptep = (union split_spte *)sptep;
+	sspte = (union split_spte)spte;
+
+	WRITE_ONCE(ssptep->spte_low, sspte.spte_low);
+
+	/*
+	 * When changing the spte from present to nonpresent, clear the
+	 * present bit first so that the vCPU cannot fetch the old high bits.
+	 */
+	smp_wmb();
+
+	ssptep->spte_high = sspte.spte_high;
+	count_spte_clear(sptep, spte);
+}
+
+static u64 __update_clear_spte_slow(u64 *sptep, u64 spte)
+{
+	union split_spte *ssptep, sspte, orig;
+
+	ssptep = (union split_spte *)sptep;
+	sspte = (union split_spte)spte;
+
+	/* xchg acts as a barrier before the setting of the high bits */
+	orig.spte_low = xchg(&ssptep->spte_low, sspte.spte_low);
+	orig.spte_high = ssptep->spte_high;
+	ssptep->spte_high = sspte.spte_high;
+	count_spte_clear(sptep, spte);
+
+	return orig.spte;
+}
+
+/*
+ * The idea of getting the spte in this lightweight way on an x86_32
+ * guest comes from gup_get_pte (mm/gup.c).
+ *
+ * An spte tlb flush may be pending, because kvm_set_pte_rmap
+ * coalesces them and we are running outside of the MMU lock.  Therefore
+ * we need to protect against in-progress updates of the spte.
+ *
+ * Reading the spte while an update is in progress may get the old value
+ * for the high part of the spte.  The race is fine for a present->non-present
+ * change (because the high part of the spte is ignored for non-present spte),
+ * but for a present->present change we must reread the spte.
+ *
+ * All such changes are done in two steps (present->non-present and
+ * non-present->present), hence it is enough to count the number of
+ * present->non-present updates: if it changed while reading the spte,
+ * we might have hit the race.  This is done using clear_spte_count.
+ */
+static u64 __get_spte_lockless(u64 *sptep)
+{
+	struct kvm_mmu_page *sp =  sptep_to_sp(sptep);
+	union split_spte spte, *orig = (union split_spte *)sptep;
+	int count;
+
+retry:
+	count = sp->clear_spte_count;
+	smp_rmb();
+
+	spte.spte_low = orig->spte_low;
+	smp_rmb();
+
+	spte.spte_high = orig->spte_high;
+	smp_rmb();
+
+	if (unlikely(spte.spte_low != orig->spte_low ||
+	      count != sp->clear_spte_count))
+		goto retry;
+
+	return spte.spte;
+}
+#endif
+
+/* Rules for using mmu_spte_set:
+ * Set the sptep from nonpresent to present.
+ * Note: the sptep being assigned *must* be either not present
+ * or in a state where the hardware will not attempt to update
+ * the spte.
+ */
+static void mmu_spte_set(u64 *sptep, u64 new_spte)
+{
+	WARN_ON(is_shadow_present_pte(*sptep));
+	__set_spte(sptep, new_spte);
+}
+
+/*
+ * Update the SPTE (excluding the PFN), but do not track changes in its
+ * accessed/dirty status.
+ */
+static u64 mmu_spte_update_no_track(u64 *sptep, u64 new_spte)
+{
+	u64 old_spte = *sptep;
+
+	WARN_ON(!is_shadow_present_pte(new_spte));
+	check_spte_writable_invariants(new_spte);
+
+	if (!is_shadow_present_pte(old_spte)) {
+		mmu_spte_set(sptep, new_spte);
+		return old_spte;
+	}
+
+	if (!spte_has_volatile_bits(old_spte))
+		__update_clear_spte_fast(sptep, new_spte);
+	else
+		old_spte = __update_clear_spte_slow(sptep, new_spte);
+
+	WARN_ON(spte_to_pfn(old_spte) != spte_to_pfn(new_spte));
+
+	return old_spte;
+}
+
+/* Rules for using mmu_spte_update:
+ * Update the state bits; the mapped pfn is not changed.
+ *
+ * Whenever an MMU-writable SPTE is overwritten with a read-only SPTE, remote
+ * TLBs must be flushed. Otherwise rmap_write_protect will find a read-only
+ * spte, even though the writable spte might be cached on a CPU's TLB.
+ *
+ * Returns true if the TLB needs to be flushed
+ */
+bool mmu_spte_update(u64 *sptep, u64 new_spte)
+{
+	bool flush = false;
+	u64 old_spte = mmu_spte_update_no_track(sptep, new_spte);
+
+	if (!is_shadow_present_pte(old_spte))
+		return false;
+
+	/*
+	 * An spte updated outside of mmu-lock is safe, since
+	 * we always update it atomically; see the comments in
+	 * spte_has_volatile_bits().
+	 */
+	if (is_mmu_writable_spte(old_spte) &&
+	      !is_writable_pte(new_spte))
+		flush = true;
+
+	/*
+	 * Flush TLB when accessed/dirty states are changed in the page tables,
+	 * to guarantee consistency between TLB and page tables.
+	 */
+
+	if (is_accessed_spte(old_spte) && !is_accessed_spte(new_spte)) {
+		flush = true;
+		kvm_set_pfn_accessed(spte_to_pfn(old_spte));
+	}
+
+	if (is_dirty_spte(old_spte) && !is_dirty_spte(new_spte)) {
+		flush = true;
+		kvm_set_pfn_dirty(spte_to_pfn(old_spte));
+	}
+
+	return flush;
+}
+
+/*
+ * Rules for using mmu_spte_clear_track_bits:
+ * It sets the sptep from present to nonpresent and tracks the
+ * state bits; it is used to clear the last-level sptep.
+ * Returns the old PTE.
+ */
+static u64 mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep)
+{
+	kvm_pfn_t pfn;
+	u64 old_spte = *sptep;
+	int level = sptep_to_sp(sptep)->role.level;
+	struct page *page;
+
+	if (!is_shadow_present_pte(old_spte) ||
+	    !spte_has_volatile_bits(old_spte))
+		__update_clear_spte_fast(sptep, 0ull);
+	else
+		old_spte = __update_clear_spte_slow(sptep, 0ull);
+
+	if (!is_shadow_present_pte(old_spte))
+		return old_spte;
+
+	kvm_update_page_stats(kvm, level, -1);
+
+	pfn = spte_to_pfn(old_spte);
+
+	/*
+	 * KVM doesn't hold a reference to any pages mapped into the guest, and
+	 * instead uses the mmu_notifier to ensure that KVM unmaps any pages
+	 * before they are reclaimed.  Sanity check that, if the pfn is backed
+	 * by a refcounted page, the refcount is elevated.
+	 */
+	page = kvm_pfn_to_refcounted_page(pfn);
+	WARN_ON(page && !page_count(page));
+
+	if (is_accessed_spte(old_spte))
+		kvm_set_pfn_accessed(pfn);
+
+	if (is_dirty_spte(old_spte))
+		kvm_set_pfn_dirty(pfn);
+
+	return old_spte;
+}
+
+/*
+ * Rules for using mmu_spte_clear_no_track:
+ * Directly clear the spte without caring about the state bits of the
+ * sptep; it is used to set the upper-level spte.
+ */
+void mmu_spte_clear_no_track(u64 *sptep)
+{
+	__update_clear_spte_fast(sptep, 0ull);
+}
+
+static u64 mmu_spte_get_lockless(u64 *sptep)
+{
+	return __get_spte_lockless(sptep);
+}
+
+/* Returns the Accessed status of the PTE and resets it at the same time. */
+static bool mmu_spte_age(u64 *sptep)
+{
+	u64 spte = mmu_spte_get_lockless(sptep);
+
+	if (!is_accessed_spte(spte))
+		return false;
+
+	if (spte_ad_enabled(spte)) {
+		clear_bit((ffs(shadow_accessed_mask) - 1),
+			  (unsigned long *)sptep);
+	} else {
+		/*
+		 * Capture the dirty status of the page, so that it doesn't get
+		 * lost when the SPTE is marked for access tracking.
+		 */
+		if (is_writable_pte(spte))
+			kvm_set_pfn_dirty(spte_to_pfn(spte));
+
+		spte = mark_spte_for_access_track(spte);
+		mmu_spte_update_no_track(sptep, spte);
+	}
+
+	return true;
+}
+
+static void mmu_free_pte_list_desc(struct pte_list_desc *pte_list_desc)
+{
+	kmem_cache_free(pte_list_desc_cache, pte_list_desc);
+}
+
+static bool sp_has_gptes(struct kvm_mmu_page *sp);
+
+gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
+{
+	if (sp->role.passthrough)
+		return sp->gfn;
+
+	if (!sp->role.direct)
+		return sp->shadowed_translation[index] >> PAGE_SHIFT;
+
+	return sp->gfn + (index << ((sp->role.level - 1) * SPTE_LEVEL_BITS));
+}
+
+/*
+ * For leaf SPTEs, fetch the *guest* access permissions being shadowed. Note
+ * that the SPTE itself may have more constrained access permissions than
+ * what the guest enforces. For example, a guest may create an executable
+ * huge PTE but KVM may disallow execution to mitigate iTLB multihit.
+ */
+static u32 kvm_mmu_page_get_access(struct kvm_mmu_page *sp, int index)
+{
+	if (sp_has_gptes(sp))
+		return sp->shadowed_translation[index] & ACC_ALL;
+
+	/*
+	 * For direct MMUs (e.g. TDP or non-paging guests) or passthrough SPs,
+	 * KVM is not shadowing any guest page tables, so the "guest access
+	 * permissions" are just ACC_ALL.
+	 *
+	 * For direct SPs in indirect MMUs (shadow paging), i.e. when KVM
+	 * is shadowing a guest huge page with small pages, the guest access
+	 * permissions being shadowed are the access permissions of the huge
+	 * page.
+	 *
+	 * In both cases, sp->role.access contains the correct access bits.
+	 */
+	return sp->role.access;
+}
+
+static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
+					 gfn_t gfn, unsigned int access)
+{
+	if (sp_has_gptes(sp)) {
+		sp->shadowed_translation[index] = (gfn << PAGE_SHIFT) | access;
+		return;
+	}
+
+	WARN_ONCE(access != kvm_mmu_page_get_access(sp, index),
+	          "access mismatch under %s page %llx (expected %u, got %u)\n",
+	          sp->role.passthrough ? "passthrough" : "direct",
+	          sp->gfn, kvm_mmu_page_get_access(sp, index), access);
+
+	WARN_ONCE(gfn != kvm_mmu_page_get_gfn(sp, index),
+	          "gfn mismatch under %s page %llx (expected %llx, got %llx)\n",
+	          sp->role.passthrough ? "passthrough" : "direct",
+	          sp->gfn, kvm_mmu_page_get_gfn(sp, index), gfn);
+}
+
+void kvm_mmu_page_set_access(struct kvm_mmu_page *sp, int index,
+			     unsigned int access)
+{
+	gfn_t gfn = kvm_mmu_page_get_gfn(sp, index);
+
+	kvm_mmu_page_set_translation(sp, index, gfn, access);
+}
+
+static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *slot;
+	gfn_t gfn;
+
+	kvm->arch.indirect_shadow_pages++;
+	gfn = sp->gfn;
+	slots = kvm_memslots_for_spte_role(kvm, sp->role);
+	slot = __gfn_to_memslot(slots, gfn);
+
+	/* Non-leaf shadow pages are kept read-only. */
+	if (sp->role.level > PG_LEVEL_4K)
+		return kvm_slot_page_track_add_page(kvm, slot, gfn,
+						    KVM_PAGE_TRACK_WRITE);
+
+	kvm_mmu_gfn_disallow_lpage(slot, gfn);
+
+	if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn, PG_LEVEL_4K))
+		kvm_flush_remote_tlbs_with_address(kvm, gfn, 1);
+}
+
+static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *slot;
+	gfn_t gfn;
+
+	kvm->arch.indirect_shadow_pages--;
+	gfn = sp->gfn;
+	slots = kvm_memslots_for_spte_role(kvm, sp->role);
+	slot = __gfn_to_memslot(slots, gfn);
+	if (sp->role.level > PG_LEVEL_4K)
+		return kvm_slot_page_track_remove_page(kvm, slot, gfn,
+						       KVM_PAGE_TRACK_WRITE);
+
+	kvm_mmu_gfn_allow_lpage(slot, gfn);
+}
+
+/*
+ * About rmap_head encoding:
+ *
+ * If bit zero of rmap_head->val is clear, then it points to the only spte
+ * in this rmap chain. Otherwise, (rmap_head->val & ~1) points to a struct
+ * pte_list_desc containing more mappings.
+ */
+
+/*
+ * Returns the number of pointers in the rmap chain, not counting the new one.
+ */
+static int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte,
+			struct kvm_rmap_head *rmap_head)
+{
+	struct pte_list_desc *desc;
+	int count = 0;
+
+	if (!rmap_head->val) {
+		rmap_printk("%p %llx 0->1\n", spte, *spte);
+		rmap_head->val = (unsigned long)spte;
+	} else if (!(rmap_head->val & 1)) {
+		rmap_printk("%p %llx 1->many\n", spte, *spte);
+		desc = kvm_mmu_memory_cache_alloc(cache);
+		desc->sptes[0] = (u64 *)rmap_head->val;
+		desc->sptes[1] = spte;
+		desc->spte_count = 2;
+		rmap_head->val = (unsigned long)desc | 1;
+		++count;
+	} else {
+		rmap_printk("%p %llx many->many\n", spte, *spte);
+		desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
+		while (desc->spte_count == PTE_LIST_EXT) {
+			count += PTE_LIST_EXT;
+			if (!desc->more) {
+				desc->more = kvm_mmu_memory_cache_alloc(cache);
+				desc = desc->more;
+				desc->spte_count = 0;
+				break;
+			}
+			desc = desc->more;
+		}
+		count += desc->spte_count;
+		desc->sptes[desc->spte_count++] = spte;
+	}
+	return count;
+}
+
+static void pte_list_desc_remove_entry(struct kvm_rmap_head *rmap_head,
+				       struct pte_list_desc *desc, int i,
+				       struct pte_list_desc *prev_desc)
+{
+	int j = desc->spte_count - 1;
+
+	desc->sptes[i] = desc->sptes[j];
+	desc->sptes[j] = NULL;
+	desc->spte_count--;
+	if (desc->spte_count)
+		return;
+	if (!prev_desc && !desc->more)
+		rmap_head->val = 0;
+	else
+		if (prev_desc)
+			prev_desc->more = desc->more;
+		else
+			rmap_head->val = (unsigned long)desc->more | 1;
+	mmu_free_pte_list_desc(desc);
+}
+
+static void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
+{
+	struct pte_list_desc *desc;
+	struct pte_list_desc *prev_desc;
+	int i;
+
+	if (!rmap_head->val) {
+		pr_err("%s: %p 0->BUG\n", __func__, spte);
+		BUG();
+	} else if (!(rmap_head->val & 1)) {
+		rmap_printk("%p 1->0\n", spte);
+		if ((u64 *)rmap_head->val != spte) {
+			pr_err("%s:  %p 1->BUG\n", __func__, spte);
+			BUG();
+		}
+		rmap_head->val = 0;
+	} else {
+		rmap_printk("%p many->many\n", spte);
+		desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
+		prev_desc = NULL;
+		while (desc) {
+			for (i = 0; i < desc->spte_count; ++i) {
+				if (desc->sptes[i] == spte) {
+					pte_list_desc_remove_entry(rmap_head,
+							desc, i, prev_desc);
+					return;
+				}
+			}
+			prev_desc = desc;
+			desc = desc->more;
+		}
+		pr_err("%s: %p many->many\n", __func__, spte);
+		BUG();
+	}
+}
+
+static void kvm_zap_one_rmap_spte(struct kvm *kvm,
+				  struct kvm_rmap_head *rmap_head, u64 *sptep)
+{
+	mmu_spte_clear_track_bits(kvm, sptep);
+	pte_list_remove(sptep, rmap_head);
+}
+
+/* Return true if at least one SPTE was zapped, false otherwise */
+static bool kvm_zap_all_rmap_sptes(struct kvm *kvm,
+				   struct kvm_rmap_head *rmap_head)
+{
+	struct pte_list_desc *desc, *next;
+	int i;
+
+	if (!rmap_head->val)
+		return false;
+
+	if (!(rmap_head->val & 1)) {
+		mmu_spte_clear_track_bits(kvm, (u64 *)rmap_head->val);
+		goto out;
+	}
+
+	desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
+
+	for (; desc; desc = next) {
+		for (i = 0; i < desc->spte_count; i++)
+			mmu_spte_clear_track_bits(kvm, desc->sptes[i]);
+		next = desc->more;
+		mmu_free_pte_list_desc(desc);
+	}
+out:
+	/* rmap_head is meaningless now, remember to reset it */
+	rmap_head->val = 0;
+	return true;
+}
+
+unsigned int pte_list_count(struct kvm_rmap_head *rmap_head)
+{
+	struct pte_list_desc *desc;
+	unsigned int count = 0;
+
+	if (!rmap_head->val)
+		return 0;
+	else if (!(rmap_head->val & 1))
+		return 1;
+
+	desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
+
+	while (desc) {
+		count += desc->spte_count;
+		desc = desc->more;
+	}
+
+	return count;
+}
+
+struct kvm_rmap_head *gfn_to_rmap(gfn_t gfn, int level,
+				  const struct kvm_memory_slot *slot)
+{
+	unsigned long idx;
+
+	idx = gfn_to_index(gfn, slot->base_gfn, level);
+	return &slot->arch.rmap[level - PG_LEVEL_4K][idx];
+}
+
+bool rmap_can_add(struct kvm_vcpu *vcpu)
+{
+	struct kvm_mmu_memory_cache *mc;
+
+	mc = &vcpu->arch.mmu_pte_list_desc_cache;
+	return kvm_mmu_memory_cache_nr_free_objects(mc);
+}
+
+static void rmap_remove(struct kvm *kvm, u64 *spte)
+{
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *slot;
+	struct kvm_mmu_page *sp;
+	gfn_t gfn;
+	struct kvm_rmap_head *rmap_head;
+
+	sp = sptep_to_sp(spte);
+	gfn = kvm_mmu_page_get_gfn(sp, spte_index(spte));
+
+	/*
+	 * Unlike rmap_add, rmap_remove does not run in the context of a vCPU
+	 * so we have to determine which memslots to use based on context
+	 * information in sp->role.
+	 */
+	slots = kvm_memslots_for_spte_role(kvm, sp->role);
+
+	slot = __gfn_to_memslot(slots, gfn);
+	rmap_head = gfn_to_rmap(gfn, sp->role.level, slot);
+
+	pte_list_remove(spte, rmap_head);
+}
+
+/*
+ * Used by the following functions to iterate through the sptes linked by an
+ * rmap.  All fields are private and not meant to be used outside.
+ */
+struct rmap_iterator {
+	/* private fields */
+	struct pte_list_desc *desc;	/* holds the sptep if not NULL */
+	int pos;			/* index of the sptep */
+};
+
+/*
+ * Iteration must be started by this function.  This should also be used after
+ * removing/dropping sptes from the rmap link because in such cases the
+ * information in the iterator may not be valid.
+ *
+ * Returns sptep if found, NULL otherwise.
+ */
+static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head,
+			   struct rmap_iterator *iter)
+{
+	u64 *sptep;
+
+	if (!rmap_head->val)
+		return NULL;
+
+	if (!(rmap_head->val & 1)) {
+		iter->desc = NULL;
+		sptep = (u64 *)rmap_head->val;
+		goto out;
+	}
+
+	iter->desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
+	iter->pos = 0;
+	sptep = iter->desc->sptes[iter->pos];
+out:
+	BUG_ON(!is_shadow_present_pte(*sptep));
+	return sptep;
+}
+
+/*
+ * Must be used with a valid iterator: e.g. after rmap_get_first().
+ *
+ * Returns sptep if found, NULL otherwise.
+ */
+static u64 *rmap_get_next(struct rmap_iterator *iter)
+{
+	u64 *sptep;
+
+	if (iter->desc) {
+		if (iter->pos < PTE_LIST_EXT - 1) {
+			++iter->pos;
+			sptep = iter->desc->sptes[iter->pos];
+			if (sptep)
+				goto out;
+		}
+
+		iter->desc = iter->desc->more;
+
+		if (iter->desc) {
+			iter->pos = 0;
+			/* desc->sptes[0] cannot be NULL */
+			sptep = iter->desc->sptes[iter->pos];
+			goto out;
+		}
+	}
+
+	return NULL;
+out:
+	BUG_ON(!is_shadow_present_pte(*sptep));
+	return sptep;
+}
+
+#define for_each_rmap_spte(_rmap_head_, _iter_, _spte_)			\
+	for (_spte_ = rmap_get_first(_rmap_head_, _iter_);		\
+	     _spte_; _spte_ = rmap_get_next(_iter_))
+
+void drop_spte(struct kvm *kvm, u64 *sptep)
+{
+	u64 old_spte = mmu_spte_clear_track_bits(kvm, sptep);
+
+	if (is_shadow_present_pte(old_spte))
+		rmap_remove(kvm, sptep);
+}
+
+static void drop_large_spte(struct kvm *kvm, u64 *sptep, bool flush)
+{
+	struct kvm_mmu_page *sp;
+
+	sp = sptep_to_sp(sptep);
+	WARN_ON(sp->role.level == PG_LEVEL_4K);
+
+	drop_spte(kvm, sptep);
+
+	if (flush)
+		kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
+			KVM_PAGES_PER_HPAGE(sp->role.level));
+}
+
+/*
+ * Write-protect the specified @sptep; @pt_protect indicates whether
+ * spte write-protection is caused by protecting a shadow page table.
+ *
+ * Note: write protection differs between dirty logging and spte
+ * protection:
+ * - for dirty logging, the spte can be set to writable at any time if
+ *   its dirty bitmap is properly set.
+ * - for spte protection, the spte can be writable only after unsync-ing
+ *   the shadow page.
+ *
+ * Return true if the TLB needs to be flushed.
+ */
+static bool spte_write_protect(u64 *sptep, bool pt_protect)
+{
+	u64 spte = *sptep;
+
+	if (!is_writable_pte(spte) &&
+	    !(pt_protect && is_mmu_writable_spte(spte)))
+		return false;
+
+	rmap_printk("spte %p %llx\n", sptep, *sptep);
+
+	if (pt_protect)
+		spte &= ~shadow_mmu_writable_mask;
+	spte = spte & ~PT_WRITABLE_MASK;
+
+	return mmu_spte_update(sptep, spte);
+}
+
+bool rmap_write_protect(struct kvm_rmap_head *rmap_head, bool pt_protect)
+{
+	u64 *sptep;
+	struct rmap_iterator iter;
+	bool flush = false;
+
+	for_each_rmap_spte(rmap_head, &iter, sptep)
+		flush |= spte_write_protect(sptep, pt_protect);
+
+	return flush;
+}
+
+static bool spte_clear_dirty(u64 *sptep)
+{
+	u64 spte = *sptep;
+
+	rmap_printk("spte %p %llx\n", sptep, *sptep);
+
+	MMU_WARN_ON(!spte_ad_enabled(spte));
+	spte &= ~shadow_dirty_mask;
+	return mmu_spte_update(sptep, spte);
+}
+
+static bool spte_wrprot_for_clear_dirty(u64 *sptep)
+{
+	bool was_writable = test_and_clear_bit(PT_WRITABLE_SHIFT,
+					       (unsigned long *)sptep);
+	if (was_writable && !spte_ad_enabled(*sptep))
+		kvm_set_pfn_dirty(spte_to_pfn(*sptep));
+
+	return was_writable;
+}
+
+/*
+ * Gets the GFN ready for another round of dirty logging by clearing the
+ *	- D bit on ad-enabled SPTEs, and
+ *	- W bit on ad-disabled SPTEs.
+ * Returns true iff any D or W bits were cleared.
+ */
+bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+			const struct kvm_memory_slot *slot)
+{
+	u64 *sptep;
+	struct rmap_iterator iter;
+	bool flush = false;
+
+	for_each_rmap_spte(rmap_head, &iter, sptep)
+		if (spte_ad_need_write_protect(*sptep))
+			flush |= spte_wrprot_for_clear_dirty(sptep);
+		else
+			flush |= spte_clear_dirty(sptep);
+
+	return flush;
+}
+
+static bool __kvm_zap_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+			   const struct kvm_memory_slot *slot)
+{
+	return kvm_zap_all_rmap_sptes(kvm, rmap_head);
+}
+
+bool kvm_zap_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+		  struct kvm_memory_slot *slot, gfn_t gfn, int level,
+		  pte_t unused)
+{
+	return __kvm_zap_rmap(kvm, rmap_head, slot);
+}
+
+bool kvm_set_pte_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+		      struct kvm_memory_slot *slot, gfn_t gfn, int level,
+		      pte_t pte)
+{
+	u64 *sptep;
+	struct rmap_iterator iter;
+	bool need_flush = false;
+	u64 new_spte;
+	kvm_pfn_t new_pfn;
+
+	WARN_ON(pte_huge(pte));
+	new_pfn = pte_pfn(pte);
+
+restart:
+	for_each_rmap_spte(rmap_head, &iter, sptep) {
+		rmap_printk("spte %p %llx gfn %llx (%d)\n",
+			    sptep, *sptep, gfn, level);
+
+		need_flush = true;
+
+		if (pte_write(pte)) {
+			kvm_zap_one_rmap_spte(kvm, rmap_head, sptep);
+			goto restart;
+		} else {
+			new_spte = kvm_mmu_changed_pte_notifier_make_spte(
+					*sptep, new_pfn);
+
+			mmu_spte_clear_track_bits(kvm, sptep);
+			mmu_spte_set(sptep, new_spte);
+		}
+	}
+
+	if (need_flush && kvm_available_flush_tlb_with_range()) {
+		kvm_flush_remote_tlbs_with_address(kvm, gfn, 1);
+		return false;
+	}
+
+	return need_flush;
+}
+
+struct slot_rmap_walk_iterator {
+	/* input fields. */
+	const struct kvm_memory_slot *slot;
+	gfn_t start_gfn;
+	gfn_t end_gfn;
+	int start_level;
+	int end_level;
+
+	/* output fields. */
+	gfn_t gfn;
+	struct kvm_rmap_head *rmap;
+	int level;
+
+	/* private field. */
+	struct kvm_rmap_head *end_rmap;
+};
+
+static void rmap_walk_init_level(struct slot_rmap_walk_iterator *iterator,
+				 int level)
+{
+	iterator->level = level;
+	iterator->gfn = iterator->start_gfn;
+	iterator->rmap = gfn_to_rmap(iterator->gfn, level, iterator->slot);
+	iterator->end_rmap = gfn_to_rmap(iterator->end_gfn, level, iterator->slot);
+}
+
+static void slot_rmap_walk_init(struct slot_rmap_walk_iterator *iterator,
+				const struct kvm_memory_slot *slot,
+				int start_level, int end_level,
+				gfn_t start_gfn, gfn_t end_gfn)
+{
+	iterator->slot = slot;
+	iterator->start_level = start_level;
+	iterator->end_level = end_level;
+	iterator->start_gfn = start_gfn;
+	iterator->end_gfn = end_gfn;
+
+	rmap_walk_init_level(iterator, iterator->start_level);
+}
+
+static bool slot_rmap_walk_okay(struct slot_rmap_walk_iterator *iterator)
+{
+	return !!iterator->rmap;
+}
+
+static void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
+{
+	while (++iterator->rmap <= iterator->end_rmap) {
+		iterator->gfn += (1UL << KVM_HPAGE_GFN_SHIFT(iterator->level));
+
+		if (iterator->rmap->val)
+			return;
+	}
+
+	if (++iterator->level > iterator->end_level) {
+		iterator->rmap = NULL;
+		return;
+	}
+
+	rmap_walk_init_level(iterator, iterator->level);
+}
+
+#define for_each_slot_rmap_range(_slot_, _start_level_, _end_level_,	\
+	   _start_gfn, _end_gfn, _iter_)				\
+	for (slot_rmap_walk_init(_iter_, _slot_, _start_level_,		\
+				 _end_level_, _start_gfn, _end_gfn);	\
+	     slot_rmap_walk_okay(_iter_);				\
+	     slot_rmap_walk_next(_iter_))
+
+__always_inline bool kvm_handle_gfn_range(struct kvm *kvm,
+					  struct kvm_gfn_range *range,
+					  rmap_handler_t handler)
+{
+	struct slot_rmap_walk_iterator iterator;
+	bool ret = false;
+
+	for_each_slot_rmap_range(range->slot, PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL,
+				 range->start, range->end - 1, &iterator)
+		ret |= handler(kvm, iterator.rmap, range->slot, iterator.gfn,
+			       iterator.level, range->pte);
+
+	return ret;
+}
+
+bool kvm_age_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+		  struct kvm_memory_slot *slot, gfn_t gfn, int level,
+		  pte_t unused)
+{
+	u64 *sptep;
+	struct rmap_iterator iter;
+	int young = 0;
+
+	for_each_rmap_spte(rmap_head, &iter, sptep)
+		young |= mmu_spte_age(sptep);
+
+	return young;
+}
+
+bool kvm_test_age_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+		       struct kvm_memory_slot *slot, gfn_t gfn,
+		       int level, pte_t unused)
+{
+	u64 *sptep;
+	struct rmap_iterator iter;
+
+	for_each_rmap_spte(rmap_head, &iter, sptep)
+		if (is_accessed_spte(*sptep))
+			return true;
+	return false;
+}
+
+#define RMAP_RECYCLE_THRESHOLD 1000
+
+static void __rmap_add(struct kvm *kvm,
+		       struct kvm_mmu_memory_cache *cache,
+		       const struct kvm_memory_slot *slot,
+		       u64 *spte, gfn_t gfn, unsigned int access)
+{
+	struct kvm_mmu_page *sp;
+	struct kvm_rmap_head *rmap_head;
+	int rmap_count;
+
+	sp = sptep_to_sp(spte);
+	kvm_mmu_page_set_translation(sp, spte_index(spte), gfn, access);
+	kvm_update_page_stats(kvm, sp->role.level, 1);
+
+	rmap_head = gfn_to_rmap(gfn, sp->role.level, slot);
+	rmap_count = pte_list_add(cache, spte, rmap_head);
+
+	if (rmap_count > kvm->stat.max_mmu_rmap_size)
+		kvm->stat.max_mmu_rmap_size = rmap_count;
+	if (rmap_count > RMAP_RECYCLE_THRESHOLD) {
+		kvm_zap_all_rmap_sptes(kvm, rmap_head);
+		kvm_flush_remote_tlbs_with_address(
+				kvm, sp->gfn, KVM_PAGES_PER_HPAGE(sp->role.level));
+	}
+}
+
+static void rmap_add(struct kvm_vcpu *vcpu, const struct kvm_memory_slot *slot,
+		     u64 *spte, gfn_t gfn, unsigned int access)
+{
+	struct kvm_mmu_memory_cache *cache = &vcpu->arch.mmu_pte_list_desc_cache;
+
+	__rmap_add(vcpu->kvm, cache, slot, spte, gfn, access);
+}
+
+#ifdef MMU_DEBUG
+static int is_empty_shadow_page(u64 *spt)
+{
+	u64 *pos;
+	u64 *end;
+
+	for (pos = spt, end = pos + SPTE_ENT_PER_PAGE; pos != end; pos++)
+		if (is_shadow_present_pte(*pos)) {
+			printk(KERN_ERR "%s: %p %llx\n", __func__,
+			       pos, *pos);
+			return 0;
+		}
+	return 1;
+}
+#endif
+
+/*
+ * This value is the sum of all of the kvm instances'
+ * kvm->arch.n_used_mmu_pages values.  We need a global,
+ * aggregate version in order to make the slab shrinker
+ * faster.
+ */
+static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, long nr)
+{
+	kvm->arch.n_used_mmu_pages += nr;
+	percpu_counter_add(&kvm_total_used_mmu_pages, nr);
+}
+
+static void kvm_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	kvm_mod_used_mmu_pages(kvm, +1);
+	kvm_account_pgtable_pages((void *)sp->spt, +1);
+}
+
+static void kvm_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	kvm_mod_used_mmu_pages(kvm, -1);
+	kvm_account_pgtable_pages((void *)sp->spt, -1);
+}
+
+static void kvm_mmu_free_shadow_page(struct kvm_mmu_page *sp)
+{
+	MMU_WARN_ON(!is_empty_shadow_page(sp->spt));
+	hlist_del(&sp->hash_link);
+	list_del(&sp->link);
+	free_page((unsigned long)sp->spt);
+	if (!sp->role.direct)
+		free_page((unsigned long)sp->shadowed_translation);
+	kmem_cache_free(mmu_page_header_cache, sp);
+}
+
+static unsigned kvm_page_table_hashfn(gfn_t gfn)
+{
+	return hash_64(gfn, KVM_MMU_HASH_SHIFT);
+}
+
+static void mmu_page_add_parent_pte(struct kvm_mmu_memory_cache *cache,
+				    struct kvm_mmu_page *sp, u64 *parent_pte)
+{
+	if (!parent_pte)
+		return;
+
+	pte_list_add(cache, parent_pte, &sp->parent_ptes);
+}
+
+static void mmu_page_remove_parent_pte(struct kvm_mmu_page *sp,
+				       u64 *parent_pte)
+{
+	pte_list_remove(parent_pte, &sp->parent_ptes);
+}
+
+void drop_parent_pte(struct kvm_mmu_page *sp, u64 *parent_pte)
+{
+	mmu_page_remove_parent_pte(sp, parent_pte);
+	mmu_spte_clear_no_track(parent_pte);
+}
+
+static void mark_unsync(u64 *spte);
+static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
+{
+	u64 *sptep;
+	struct rmap_iterator iter;
+
+	for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
+		mark_unsync(sptep);
+	}
+}
+
+static void mark_unsync(u64 *spte)
+{
+	struct kvm_mmu_page *sp;
+
+	sp = sptep_to_sp(spte);
+	if (__test_and_set_bit(spte_index(spte), sp->unsync_child_bitmap))
+		return;
+	if (sp->unsync_children++)
+		return;
+	kvm_mmu_mark_parents_unsync(sp);
+}
+
+int nonpaging_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+{
+	return -1;
+}
+
+#define KVM_PAGE_ARRAY_NR 16
+
+struct kvm_mmu_pages {
+	struct mmu_page_and_offset {
+		struct kvm_mmu_page *sp;
+		unsigned int idx;
+	} page[KVM_PAGE_ARRAY_NR];
+	unsigned int nr;
+};
+
+static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp,
+			 int idx)
+{
+	int i;
+
+	if (sp->unsync)
+		for (i=0; i < pvec->nr; i++)
+			if (pvec->page[i].sp == sp)
+				return 0;
+
+	pvec->page[pvec->nr].sp = sp;
+	pvec->page[pvec->nr].idx = idx;
+	pvec->nr++;
+	return (pvec->nr == KVM_PAGE_ARRAY_NR);
+}
+
+static inline void clear_unsync_child_bit(struct kvm_mmu_page *sp, int idx)
+{
+	--sp->unsync_children;
+	WARN_ON((int)sp->unsync_children < 0);
+	__clear_bit(idx, sp->unsync_child_bitmap);
+}
+
+static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
+			   struct kvm_mmu_pages *pvec)
+{
+	int i, ret, nr_unsync_leaf = 0;
+
+	for_each_set_bit(i, sp->unsync_child_bitmap, 512) {
+		struct kvm_mmu_page *child;
+		u64 ent = sp->spt[i];
+
+		if (!is_shadow_present_pte(ent) || is_large_pte(ent)) {
+			clear_unsync_child_bit(sp, i);
+			continue;
+		}
+
+		child = spte_to_child_sp(ent);
+
+		if (child->unsync_children) {
+			if (mmu_pages_add(pvec, child, i))
+				return -ENOSPC;
+
+			ret = __mmu_unsync_walk(child, pvec);
+			if (!ret) {
+				clear_unsync_child_bit(sp, i);
+				continue;
+			} else if (ret > 0) {
+				nr_unsync_leaf += ret;
+			} else
+				return ret;
+		} else if (child->unsync) {
+			nr_unsync_leaf++;
+			if (mmu_pages_add(pvec, child, i))
+				return -ENOSPC;
+		} else
+			clear_unsync_child_bit(sp, i);
+	}
+
+	return nr_unsync_leaf;
+}
+
+#define INVALID_INDEX (-1)
+
+static int mmu_unsync_walk(struct kvm_mmu_page *sp,
+			   struct kvm_mmu_pages *pvec)
+{
+	pvec->nr = 0;
+	if (!sp->unsync_children)
+		return 0;
+
+	mmu_pages_add(pvec, sp, INVALID_INDEX);
+	return __mmu_unsync_walk(sp, pvec);
+}
+
+static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	WARN_ON(!sp->unsync);
+	trace_kvm_mmu_sync_page(sp);
+	sp->unsync = 0;
+	--kvm->stat.mmu_unsync;
+}
+
+static bool sp_has_gptes(struct kvm_mmu_page *sp)
+{
+	if (sp->role.direct)
+		return false;
+
+	if (sp->role.passthrough)
+		return false;
+
+	return true;
+}
+
+#define for_each_valid_sp(_kvm, _sp, _list)				\
+	hlist_for_each_entry(_sp, _list, hash_link)			\
+		if (is_obsolete_sp((_kvm), (_sp))) {			\
+		} else
+
+#define for_each_gfn_valid_sp_with_gptes(_kvm, _sp, _gfn)		\
+	for_each_valid_sp(_kvm, _sp,					\
+	  &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)])	\
+		if ((_sp)->gfn != (_gfn) || !sp_has_gptes(_sp)) {} else
+
+static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+			 struct list_head *invalid_list)
+{
+	int ret = vcpu->arch.mmu->sync_page(vcpu, sp);
+
+	if (ret < 0)
+		kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
+	return ret;
+}
+
+struct mmu_page_path {
+	struct kvm_mmu_page *parent[PT64_ROOT_MAX_LEVEL];
+	unsigned int idx[PT64_ROOT_MAX_LEVEL];
+};
+
+#define for_each_sp(pvec, sp, parents, i)			\
+		for (i = mmu_pages_first(&pvec, &parents);	\
+			i < pvec.nr && ({ sp = pvec.page[i].sp; 1;});	\
+			i = mmu_pages_next(&pvec, &parents, i))
+
+static int mmu_pages_next(struct kvm_mmu_pages *pvec,
+			  struct mmu_page_path *parents,
+			  int i)
+{
+	int n;
+
+	for (n = i+1; n < pvec->nr; n++) {
+		struct kvm_mmu_page *sp = pvec->page[n].sp;
+		unsigned idx = pvec->page[n].idx;
+		int level = sp->role.level;
+
+		parents->idx[level-1] = idx;
+		if (level == PG_LEVEL_4K)
+			break;
+
+		parents->parent[level-2] = sp;
+	}
+
+	return n;
+}
+
+static int mmu_pages_first(struct kvm_mmu_pages *pvec,
+			   struct mmu_page_path *parents)
+{
+	struct kvm_mmu_page *sp;
+	int level;
+
+	if (pvec->nr == 0)
+		return 0;
+
+	WARN_ON(pvec->page[0].idx != INVALID_INDEX);
+
+	sp = pvec->page[0].sp;
+	level = sp->role.level;
+	WARN_ON(level == PG_LEVEL_4K);
+
+	parents->parent[level-2] = sp;
+
+	/* Also set up a sentinel.  Further entries in pvec are all
+	 * children of sp, so this element is never overwritten.
+	 */
+	parents->parent[level-1] = NULL;
+	return mmu_pages_next(pvec, parents, 0);
+}
+
+static void mmu_pages_clear_parents(struct mmu_page_path *parents)
+{
+	struct kvm_mmu_page *sp;
+	unsigned int level = 0;
+
+	do {
+		unsigned int idx = parents->idx[level];
+		sp = parents->parent[level];
+		if (!sp)
+			return;
+
+		WARN_ON(idx == INVALID_INDEX);
+		clear_unsync_child_bit(sp, idx);
+		level++;
+	} while (!sp->unsync_children);
+}
+
+int mmu_sync_children(struct kvm_vcpu *vcpu, struct kvm_mmu_page *parent,
+		      bool can_yield)
+{
+	int i;
+	struct kvm_mmu_page *sp;
+	struct mmu_page_path parents;
+	struct kvm_mmu_pages pages;
+	LIST_HEAD(invalid_list);
+	bool flush = false;
+
+	while (mmu_unsync_walk(parent, &pages)) {
+		bool protected = false;
+
+		for_each_sp(pages, sp, parents, i)
+			protected |= kvm_vcpu_write_protect_gfn(vcpu, sp->gfn);
+
+		if (protected) {
+			kvm_mmu_remote_flush_or_zap(vcpu->kvm, &invalid_list, true);
+			flush = false;
+		}
+
+		for_each_sp(pages, sp, parents, i) {
+			kvm_unlink_unsync_page(vcpu->kvm, sp);
+			flush |= kvm_sync_page(vcpu, sp, &invalid_list) > 0;
+			mmu_pages_clear_parents(&parents);
+		}
+		if (need_resched() || rwlock_needbreak(&vcpu->kvm->mmu_lock)) {
+			kvm_mmu_remote_flush_or_zap(vcpu->kvm, &invalid_list, flush);
+			if (!can_yield) {
+				kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
+				return -EINTR;
+			}
+
+			cond_resched_rwlock_write(&vcpu->kvm->mmu_lock);
+			flush = false;
+		}
+	}
+
+	kvm_mmu_remote_flush_or_zap(vcpu->kvm, &invalid_list, flush);
+	return 0;
+}
+
+void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
+{
+	atomic_set(&sp->write_flooding_count,  0);
+}
+
+void clear_sp_write_flooding_count(u64 *spte)
+{
+	__clear_sp_write_flooding_count(sptep_to_sp(spte));
+}
+
+/*
+ * The vCPU is required when finding indirect shadow pages; the shadow
+ * page may already exist and syncing it needs the vCPU pointer in
+ * order to read guest page tables.  Direct shadow pages are never
+ * unsync, thus @vcpu can be NULL if @role.direct is true.
+ */
+static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
+						     struct kvm_vcpu *vcpu,
+						     gfn_t gfn,
+						     struct hlist_head *sp_list,
+						     union kvm_mmu_page_role role)
+{
+	struct kvm_mmu_page *sp;
+	int ret;
+	int collisions = 0;
+	LIST_HEAD(invalid_list);
+
+	for_each_valid_sp(kvm, sp, sp_list) {
+		if (sp->gfn != gfn) {
+			collisions++;
+			continue;
+		}
+
+		if (sp->role.word != role.word) {
+			/*
+			 * If the guest is creating an upper-level page, zap
+			 * unsync pages for the same gfn.  While it's possible
+			 * the guest is using recursive page tables, in all
+			 * likelihood the guest has stopped using the unsync
+			 * page and is installing a completely unrelated page.
+			 * Unsync pages must not be left as is, because the new
+			 * upper-level page will be write-protected.
+			 */
+			if (role.level > PG_LEVEL_4K && sp->unsync)
+				kvm_mmu_prepare_zap_page(kvm, sp,
+							 &invalid_list);
+			continue;
+		}
+
+		/* unsync and write-flooding only apply to indirect SPs. */
+		if (sp->role.direct)
+			goto out;
+
+		if (sp->unsync) {
+			if (KVM_BUG_ON(!vcpu, kvm))
+				break;
+
+			/*
+			 * The page is good, but is stale.  kvm_sync_page does
+			 * get the latest guest state, but (unlike mmu_unsync_children)
+			 * it doesn't write-protect the page or mark it synchronized!
+			 * This way the validity of the mapping is ensured, but the
+			 * overhead of write protection is not incurred until the
+			 * guest invalidates the TLB mapping.  This allows multiple
+			 * SPs for a single gfn to be unsync.
+			 *
+			 * If the sync fails, the page is zapped.  If so, break
+			 * in order to rebuild it.
+			 */
+			ret = kvm_sync_page(vcpu, sp, &invalid_list);
+			if (ret < 0)
+				break;
+
+			WARN_ON(!list_empty(&invalid_list));
+			if (ret > 0)
+				kvm_flush_remote_tlbs(kvm);
+		}
+
+		__clear_sp_write_flooding_count(sp);
+
+		goto out;
+	}
+
+	sp = NULL;
+	++kvm->stat.mmu_cache_miss;
+
+out:
+	kvm_mmu_commit_zap_page(kvm, &invalid_list);
+
+	if (collisions > kvm->stat.max_mmu_page_hash_collisions)
+		kvm->stat.max_mmu_page_hash_collisions = collisions;
+	return sp;
+}
+
+/* Caches used when allocating a new shadow page. */
+struct shadow_page_caches {
+	struct kvm_mmu_memory_cache *page_header_cache;
+	struct kvm_mmu_memory_cache *shadow_page_cache;
+	struct kvm_mmu_memory_cache *shadowed_info_cache;
+};
+
+static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
+						      struct shadow_page_caches *caches,
+						      gfn_t gfn,
+						      struct hlist_head *sp_list,
+						      union kvm_mmu_page_role role)
+{
+	struct kvm_mmu_page *sp;
+
+	sp = kvm_mmu_memory_cache_alloc(caches->page_header_cache);
+	sp->spt = kvm_mmu_memory_cache_alloc(caches->shadow_page_cache);
+	if (!role.direct)
+		sp->shadowed_translation = kvm_mmu_memory_cache_alloc(caches->shadowed_info_cache);
+
+	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
+
+	INIT_LIST_HEAD(&sp->possible_nx_huge_page_link);
+
+	/*
+	 * active_mmu_pages must be a FIFO list, as kvm_zap_obsolete_pages()
+	 * depends on valid pages being added to the head of the list.  See
+	 * comments in kvm_zap_obsolete_pages().
+	 */
+	sp->mmu_valid_gen = kvm->arch.mmu_valid_gen;
+	list_add(&sp->link, &kvm->arch.active_mmu_pages);
+	kvm_account_mmu_page(kvm, sp);
+
+	sp->gfn = gfn;
+	sp->role = role;
+	hlist_add_head(&sp->hash_link, sp_list);
+	if (sp_has_gptes(sp))
+		account_shadowed(kvm, sp);
+
+	return sp;
+}
+
+/* Note, @vcpu may be NULL if @role.direct is true; see kvm_mmu_find_shadow_page. */
+static struct kvm_mmu_page *__kvm_mmu_get_shadow_page(struct kvm *kvm,
+						      struct kvm_vcpu *vcpu,
+						      struct shadow_page_caches *caches,
+						      gfn_t gfn,
+						      union kvm_mmu_page_role role)
+{
+	struct hlist_head *sp_list;
+	struct kvm_mmu_page *sp;
+	bool created = false;
+
+	sp_list = &kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)];
+
+	sp = kvm_mmu_find_shadow_page(kvm, vcpu, gfn, sp_list, role);
+	if (!sp) {
+		created = true;
+		sp = kvm_mmu_alloc_shadow_page(kvm, caches, gfn, sp_list, role);
+	}
+
+	trace_kvm_mmu_get_page(sp, created);
+	return sp;
+}
+
+static struct kvm_mmu_page *kvm_mmu_get_shadow_page(struct kvm_vcpu *vcpu,
+						    gfn_t gfn,
+						    union kvm_mmu_page_role role)
+{
+	struct shadow_page_caches caches = {
+		.page_header_cache = &vcpu->arch.mmu_page_header_cache,
+		.shadow_page_cache = &vcpu->arch.mmu_shadow_page_cache,
+		.shadowed_info_cache = &vcpu->arch.mmu_shadowed_info_cache,
+	};
+
+	return __kvm_mmu_get_shadow_page(vcpu->kvm, vcpu, &caches, gfn, role);
+}
+
+static union kvm_mmu_page_role kvm_mmu_child_role(u64 *sptep, bool direct,
+						  unsigned int access)
+{
+	struct kvm_mmu_page *parent_sp = sptep_to_sp(sptep);
+	union kvm_mmu_page_role role;
+
+	role = parent_sp->role;
+	role.level--;
+	role.access = access;
+	role.direct = direct;
+	role.passthrough = 0;
+
+	/*
+	 * If the guest has 4-byte PTEs then that means it's using 32-bit,
+	 * 2-level, non-PAE paging. KVM shadows such guests with PAE paging
+	 * (i.e. 8-byte PTEs). The difference in PTE size means that KVM must
+	 * shadow each guest page table with multiple shadow page tables, which
+	 * requires extra bookkeeping in the role.
+	 *
+	 * Specifically, to shadow the guest's page directory (which covers a
+	 * 4GiB address space), KVM uses 4 PAE page directories, each mapping
+	 * 1GiB of the address space. @role.quadrant encodes which quarter of
+	 * the address space each maps.
+	 *
+	 * To shadow the guest's page tables (which each map a 4MiB region), KVM
+	 * uses 2 PAE page tables, each mapping a 2MiB region. For these,
+	 * @role.quadrant encodes which half of the region they map.
+	 *
+	 * Concretely, a 4-byte PDE consumes bits 31:22, while an 8-byte PDE
+	 * consumes bits 29:21.  To consume bits 31:30, KVM uses 4 shadow
+	 * PDPTEs; those 4 PAE page directories are pre-allocated and their
+	 * quadrant is assigned in mmu_alloc_root().  A 4-byte PTE consumes
+	 * bits 21:12, while an 8-byte PTE consumes bits 20:12.  To consume
+	 * bit 21 in the PTE (the child here), KVM propagates that bit to the
+	 * quadrant, i.e. sets quadrant to '0' or '1'.  The parent 8-byte PDE
+	 * covers bit 21 (see above), thus the quadrant is calculated from the
+	 * _least_ significant bit of the PDE index.
+	 */
+	if (role.has_4_byte_gpte) {
+		WARN_ON_ONCE(role.level != PG_LEVEL_4K);
+		role.quadrant = spte_index(sptep) & 1;
+	}
+
+	return role;
+}
+
+struct kvm_mmu_page *kvm_mmu_get_child_sp(struct kvm_vcpu *vcpu, u64 *sptep,
+					  gfn_t gfn, bool direct,
+					  unsigned int access)
+{
+	union kvm_mmu_page_role role;
+
+	if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
+		return ERR_PTR(-EEXIST);
+
+	role = kvm_mmu_child_role(sptep, direct, access);
+	return kvm_mmu_get_shadow_page(vcpu, gfn, role);
+}
+
+void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterator,
+				 struct kvm_vcpu *vcpu, hpa_t root, u64 addr)
+{
+	iterator->addr = addr;
+	iterator->shadow_addr = root;
+	iterator->level = vcpu->arch.mmu->root_role.level;
+
+	if (iterator->level >= PT64_ROOT_4LEVEL &&
+	    vcpu->arch.mmu->cpu_role.base.level < PT64_ROOT_4LEVEL &&
+	    !vcpu->arch.mmu->root_role.direct)
+		iterator->level = PT32E_ROOT_LEVEL;
+
+	if (iterator->level == PT32E_ROOT_LEVEL) {
+		/*
+		 * prev_root is currently only used for 64-bit hosts. So only
+		 * the active root_hpa is valid here.
+		 */
+		BUG_ON(root != vcpu->arch.mmu->root.hpa);
+
+		iterator->shadow_addr
+			= vcpu->arch.mmu->pae_root[(addr >> 30) & 3];
+		iterator->shadow_addr &= SPTE_BASE_ADDR_MASK;
+		--iterator->level;
+		if (!iterator->shadow_addr)
+			iterator->level = 0;
+	}
+}
+
+void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator,
+		      struct kvm_vcpu *vcpu, u64 addr)
+{
+	shadow_walk_init_using_root(iterator, vcpu, vcpu->arch.mmu->root.hpa,
+				    addr);
+}
+
+bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
+{
+	if (iterator->level < PG_LEVEL_4K)
+		return false;
+
+	iterator->index = SPTE_INDEX(iterator->addr, iterator->level);
+	iterator->sptep	= ((u64 *)__va(iterator->shadow_addr)) + iterator->index;
+	return true;
+}
+
+static void __shadow_walk_next(struct kvm_shadow_walk_iterator *iterator,
+			       u64 spte)
+{
+	if (!is_shadow_present_pte(spte) || is_last_spte(spte, iterator->level)) {
+		iterator->level = 0;
+		return;
+	}
+
+	iterator->shadow_addr = spte & SPTE_BASE_ADDR_MASK;
+	--iterator->level;
+}
+
+void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator)
+{
+	__shadow_walk_next(iterator, *iterator->sptep);
+}
+
+static void __link_shadow_page(struct kvm *kvm,
+			       struct kvm_mmu_memory_cache *cache, u64 *sptep,
+			       struct kvm_mmu_page *sp, bool flush)
+{
+	u64 spte;
+
+	BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK);
+
+	/*
+	 * If an SPTE is present already, it must be a leaf and therefore
+	 * a large one.  Drop it, and flush the TLB if needed, before
+	 * installing sp.
+	 */
+	if (is_shadow_present_pte(*sptep))
+		drop_large_spte(kvm, sptep, flush);
+
+	spte = make_nonleaf_spte(sp->spt, sp_ad_disabled(sp));
+
+	mmu_spte_set(sptep, spte);
+
+	mmu_page_add_parent_pte(cache, sp, sptep);
+
+	/*
+	 * The non-direct sub-pagetable must be updated before linking.  For
+	 * L1 sp, the pagetable is updated via kvm_sync_page() in
+	 * kvm_mmu_find_shadow_page() without write-protecting the gfn,
+	 * so sp->unsync can be true or false.  For higher level non-direct
+	 * sp, the pagetable is updated/synced via mmu_sync_children() in
+	 * FNAME(fetch)(), so sp->unsync_children can only be false.
+	 * WARN_ON_ONCE() if anything happens unexpectedly.
+	 */
+	if (WARN_ON_ONCE(sp->unsync_children) || sp->unsync)
+		mark_unsync(sptep);
+}
+
+void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep, struct kvm_mmu_page *sp)
+{
+	__link_shadow_page(vcpu->kvm, &vcpu->arch.mmu_pte_list_desc_cache, sptep, sp, true);
+}
+
+void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
+			  unsigned direct_access)
+{
+	if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) {
+		struct kvm_mmu_page *child;
+
+		/*
+		 * For the direct sp, if the guest pte's dirty bit
+		 * changed from clean to dirty, it will corrupt the
+		 * sp's access: allow writable in the read-only sp,
+		 * so we should update the spte at this point to get
+		 * a new sp with the correct access.
+		 */
+		child = spte_to_child_sp(*sptep);
+		if (child->role.access == direct_access)
+			return;
+
+		drop_parent_pte(child, sptep);
+		kvm_flush_remote_tlbs_with_address(vcpu->kvm, child->gfn, 1);
+	}
+}
+
+/* Returns the number of zapped non-leaf child shadow pages. */
+int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp, u64 *spte,
+		     struct list_head *invalid_list)
+{
+	u64 pte;
+	struct kvm_mmu_page *child;
+
+	pte = *spte;
+	if (is_shadow_present_pte(pte)) {
+		if (is_last_spte(pte, sp->role.level)) {
+			drop_spte(kvm, spte);
+		} else {
+			child = spte_to_child_sp(pte);
+			drop_parent_pte(child, spte);
+
+			/*
+			 * Recursively zap nested TDP SPs, parentless SPs are
+			 * unlikely to be used again in the near future.  This
+			 * avoids retaining a large number of stale nested SPs.
+			 */
+			if (tdp_enabled && invalid_list &&
+			    child->role.guest_mode && !child->parent_ptes.val)
+				return kvm_mmu_prepare_zap_page(kvm, child,
+								invalid_list);
+		}
+	} else if (is_mmio_spte(pte)) {
+		mmu_spte_clear_no_track(spte);
+	}
+	return 0;
+}
+
+static int kvm_mmu_page_unlink_children(struct kvm *kvm,
+					struct kvm_mmu_page *sp,
+					struct list_head *invalid_list)
+{
+	int zapped = 0;
+	unsigned i;
+
+	for (i = 0; i < SPTE_ENT_PER_PAGE; ++i)
+		zapped += mmu_page_zap_pte(kvm, sp, sp->spt + i, invalid_list);
+
+	return zapped;
+}
+
+static void kvm_mmu_unlink_parents(struct kvm_mmu_page *sp)
+{
+	u64 *sptep;
+	struct rmap_iterator iter;
+
+	while ((sptep = rmap_get_first(&sp->parent_ptes, &iter)))
+		drop_parent_pte(sp, sptep);
+}
+
+static int mmu_zap_unsync_children(struct kvm *kvm,
+				   struct kvm_mmu_page *parent,
+				   struct list_head *invalid_list)
+{
+	int i, zapped = 0;
+	struct mmu_page_path parents;
+	struct kvm_mmu_pages pages;
+
+	if (parent->role.level == PG_LEVEL_4K)
+		return 0;
+
+	while (mmu_unsync_walk(parent, &pages)) {
+		struct kvm_mmu_page *sp;
+
+		for_each_sp(pages, sp, parents, i) {
+			kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
+			mmu_pages_clear_parents(&parents);
+			zapped++;
+		}
+	}
+
+	return zapped;
+}
+
+bool __kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+				struct list_head *invalid_list,
+				int *nr_zapped)
+{
+	bool list_unstable, zapped_root = false;
+
+	lockdep_assert_held_write(&kvm->mmu_lock);
+	trace_kvm_mmu_prepare_zap_page(sp);
+	++kvm->stat.mmu_shadow_zapped;
+	*nr_zapped = mmu_zap_unsync_children(kvm, sp, invalid_list);
+	*nr_zapped += kvm_mmu_page_unlink_children(kvm, sp, invalid_list);
+	kvm_mmu_unlink_parents(sp);
+
+	/* Zapping children means active_mmu_pages has become unstable. */
+	list_unstable = *nr_zapped;
+
+	if (!sp->role.invalid && sp_has_gptes(sp))
+		unaccount_shadowed(kvm, sp);
+
+	if (sp->unsync)
+		kvm_unlink_unsync_page(kvm, sp);
+	if (!sp->root_count) {
+		/* Count self */
+		(*nr_zapped)++;
+
+		/*
+		 * Already invalid pages (previously active roots) are not on
+		 * the active page list.  See list_del() in the "else" case of
+		 * !sp->root_count.
+		 */
+		if (sp->role.invalid)
+			list_add(&sp->link, invalid_list);
+		else
+			list_move(&sp->link, invalid_list);
+		kvm_unaccount_mmu_page(kvm, sp);
+	} else {
+		/*
+		 * Remove the active root from the active page list, the root
+		 * will be explicitly freed when the root_count hits zero.
+		 */
+		list_del(&sp->link);
+
+		/*
+		 * Obsolete pages cannot be used on any vCPUs, see the comment
+		 * in kvm_mmu_zap_all_fast().  Note, is_obsolete_sp() also
+		 * treats invalid shadow pages as being obsolete.
+		 */
+		zapped_root = !is_obsolete_sp(kvm, sp);
+	}
+
+	if (sp->nx_huge_page_disallowed)
+		unaccount_nx_huge_page(kvm, sp);
+
+	sp->role.invalid = 1;
+
+	/*
+	 * Make the request to free obsolete roots after marking the root
+	 * invalid, otherwise other vCPUs may not see it as invalid.
+	 */
+	if (zapped_root)
+		kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_FREE_OBSOLETE_ROOTS);
+	return list_unstable;
+}
+
+bool kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+			      struct list_head *invalid_list)
+{
+	int nr_zapped;
+
+	__kvm_mmu_prepare_zap_page(kvm, sp, invalid_list, &nr_zapped);
+	return nr_zapped;
+}
+
+void kvm_mmu_commit_zap_page(struct kvm *kvm, struct list_head *invalid_list)
+{
+	struct kvm_mmu_page *sp, *nsp;
+
+	if (list_empty(invalid_list))
+		return;
+
+	/*
+	 * We need to make sure everyone sees our modifications to
+	 * the page tables and see changes to vcpu->mode here. The barrier
+	 * in the kvm_flush_remote_tlbs() achieves this. This pairs
+	 * with vcpu_enter_guest and walk_shadow_page_lockless_begin/end.
+	 *
+	 * In addition, kvm_flush_remote_tlbs waits for all vcpus to exit
+	 * guest mode and/or lockless shadow page table walks.
+	 */
+	kvm_flush_remote_tlbs(kvm);
+
+	list_for_each_entry_safe(sp, nsp, invalid_list, link) {
+		WARN_ON(!sp->role.invalid || sp->root_count);
+		kvm_mmu_free_shadow_page(sp);
+	}
+}
+
+static unsigned long kvm_mmu_zap_oldest_mmu_pages(struct kvm *kvm,
+						  unsigned long nr_to_zap)
+{
+	unsigned long total_zapped = 0;
+	struct kvm_mmu_page *sp, *tmp;
+	LIST_HEAD(invalid_list);
+	bool unstable;
+	int nr_zapped;
+
+	if (list_empty(&kvm->arch.active_mmu_pages))
+		return 0;
+
+restart:
+	list_for_each_entry_safe_reverse(sp, tmp, &kvm->arch.active_mmu_pages, link) {
+		/*
+		 * Don't zap active root pages, the page itself can't be freed
+		 * and zapping it will just force vCPUs to realloc and reload.
+		 */
+		if (sp->root_count)
+			continue;
+
+		unstable = __kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list,
+						      &nr_zapped);
+		total_zapped += nr_zapped;
+		if (total_zapped >= nr_to_zap)
+			break;
+
+		if (unstable)
+			goto restart;
+	}
+
+	kvm_mmu_commit_zap_page(kvm, &invalid_list);
+
+	kvm->stat.mmu_recycled += total_zapped;
+	return total_zapped;
+}
+
+static inline unsigned long kvm_mmu_available_pages(struct kvm *kvm)
+{
+	if (kvm->arch.n_max_mmu_pages > kvm->arch.n_used_mmu_pages)
+		return kvm->arch.n_max_mmu_pages -
+			kvm->arch.n_used_mmu_pages;
+
+	return 0;
+}
+
+int make_mmu_pages_available(struct kvm_vcpu *vcpu)
+{
+	unsigned long avail = kvm_mmu_available_pages(vcpu->kvm);
+
+	if (likely(avail >= KVM_MIN_FREE_MMU_PAGES))
+		return 0;
+
+	kvm_mmu_zap_oldest_mmu_pages(vcpu->kvm, KVM_REFILL_PAGES - avail);
+
+	/*
+	 * Note, this check is intentionally soft, it only guarantees that one
+	 * page is available, while the caller may end up allocating as many as
+	 * four pages, e.g. for PAE roots or for 5-level paging.  Temporarily
+	 * exceeding the (arbitrary by default) limit will not harm the host,
+	 * being too aggressive may unnecessarily kill the guest, and getting an
+	 * exact count is far more trouble than it's worth, especially in the
+	 * page fault paths.
+	 */
+	if (!kvm_mmu_available_pages(vcpu->kvm))
+		return -ENOSPC;
+	return 0;
+}
+
+/*
+ * Change the number of MMU pages allocated to the VM.
+ * Note: if goal_nr_mmu_pages is too small, you will get a deadlock.
+ */
+void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long goal_nr_mmu_pages)
+{
+	write_lock(&kvm->mmu_lock);
+
+	if (kvm->arch.n_used_mmu_pages > goal_nr_mmu_pages) {
+		kvm_mmu_zap_oldest_mmu_pages(kvm, kvm->arch.n_used_mmu_pages -
+						  goal_nr_mmu_pages);
+
+		goal_nr_mmu_pages = kvm->arch.n_used_mmu_pages;
+	}
+
+	kvm->arch.n_max_mmu_pages = goal_nr_mmu_pages;
+
+	write_unlock(&kvm->mmu_lock);
+}
+
+int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
+{
+	struct kvm_mmu_page *sp;
+	LIST_HEAD(invalid_list);
+	int r;
+
+	pgprintk("%s: looking for gfn %llx\n", __func__, gfn);
+	r = 0;
+	write_lock(&kvm->mmu_lock);
+	for_each_gfn_valid_sp_with_gptes(kvm, sp, gfn) {
+		pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
+			 sp->role.word);
+		r = 1;
+		kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
+	}
+	kvm_mmu_commit_zap_page(kvm, &invalid_list);
+	write_unlock(&kvm->mmu_lock);
+
+	return r;
+}
+
+int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
+{
+	gpa_t gpa;
+	int r;
+
+	if (vcpu->arch.mmu->root_role.direct)
+		return 0;
+
+	gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
+
+	r = kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT);
+
+	return r;
+}
+
+static void kvm_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	trace_kvm_mmu_unsync_page(sp);
+	++kvm->stat.mmu_unsync;
+	sp->unsync = 1;
+
+	kvm_mmu_mark_parents_unsync(sp);
+}
+
+/*
+ * Attempt to unsync any shadow pages that can be reached by the specified gfn,
+ * KVM is creating a writable mapping for said gfn.  Returns 0 if all pages
+ * were marked unsync (or if there is no shadow page), -EPERM if the SPTE must
+ * be write-protected.
+ */
+int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
+			    gfn_t gfn, bool can_unsync, bool prefetch)
+{
+	struct kvm_mmu_page *sp;
+	bool locked = false;
+
+	/*
+	 * Force write-protection if the page is being tracked.  Note, the page
+	 * track machinery is used to write-protect upper-level shadow pages,
+	 * i.e. this guards the role.level == 4K assertion below!
+	 */
+	if (kvm_slot_page_track_is_active(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE))
+		return -EPERM;
+
+	/*
+	 * The page is not write-tracked, mark existing shadow pages unsync
+	 * unless KVM is synchronizing an unsync SP (can_unsync = false).  In
+	 * that case, KVM must complete emulation of the guest TLB flush before
+	 * allowing shadow pages to become unsync (writable by the guest).
+	 */
+	for_each_gfn_valid_sp_with_gptes(kvm, sp, gfn) {
+		if (!can_unsync)
+			return -EPERM;
+
+		if (sp->unsync)
+			continue;
+
+		if (prefetch)
+			return -EEXIST;
+
+		/*
+		 * TDP MMU page faults require an additional spinlock as they
+		 * run with mmu_lock held for read, not write, and the unsync
+		 * logic is not thread safe.  Take the spinlock regardless of
+		 * the MMU type to avoid extra conditionals/parameters, there's
+		 * no meaningful penalty if mmu_lock is held for write.
+		 */
+		if (!locked) {
+			locked = true;
+			spin_lock(&kvm->arch.mmu_unsync_pages_lock);
+
+			/*
+			 * Recheck after taking the spinlock, a different vCPU
+			 * may have since marked the page unsync.  A false
+			 * positive on the unprotected check above is not
+			 * possible as clearing sp->unsync _must_ hold mmu_lock
+			 * for write, i.e. unsync cannot transition from 1->0
+			 * while this CPU holds mmu_lock for read (or write).
+			 */
+			if (READ_ONCE(sp->unsync))
+				continue;
+		}
+
+		WARN_ON(sp->role.level != PG_LEVEL_4K);
+		kvm_unsync_page(kvm, sp);
+	}
+	if (locked)
+		spin_unlock(&kvm->arch.mmu_unsync_pages_lock);
+
+	/*
+	 * We need to ensure that the marking of unsync pages is visible
+	 * before the SPTE is updated to allow writes because
+	 * kvm_mmu_sync_roots() checks the unsync flags without holding
+	 * the MMU lock and so can race with this. If the SPTE was updated
+	 * before the page had been marked as unsync-ed, something like the
+	 * following could happen:
+	 *
+	 * CPU 1                    CPU 2
+	 * ---------------------------------------------------------------------
+	 * 1.2 Host updates SPTE
+	 *     to be writable
+	 *                      2.1 Guest writes a GPTE for GVA X.
+	 *                          (GPTE being in the guest page table shadowed
+	 *                           by the SP from CPU 1.)
+	 *                          This reads SPTE during the page table walk.
+	 *                          Since SPTE.W is read as 1, there is no
+	 *                          fault.
+	 *
+	 *                      2.2 Guest issues TLB flush.
+	 *                          That causes a VM Exit.
+	 *
+	 *                      2.3 Walking of unsync pages sees sp->unsync is
+	 *                          false and skips the page.
+	 *
+	 *                      2.4 Guest accesses GVA X.
+	 *                          Since the mapping in the SP was not updated,
+	 *                          the old mapping for GVA X incorrectly
+	 *                          gets used.
+	 * 1.1 Host marks SP
+	 *     as unsync
+	 *     (sp->unsync = true)
+	 *
+	 * The write barrier below ensures that 1.1 happens before 1.2 and thus
+	 * the situation in 2.4 does not arise.  It pairs with the read barrier
+	 * in is_unsync_root(), placed between 2.1's load of SPTE.W and 2.3.
+	 */
+	smp_wmb();
+
+	return 0;
+}
+
+int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
+		 u64 *sptep, unsigned int pte_access, gfn_t gfn,
+		 kvm_pfn_t pfn, struct kvm_page_fault *fault)
+{
+	struct kvm_mmu_page *sp = sptep_to_sp(sptep);
+	int level = sp->role.level;
+	int was_rmapped = 0;
+	int ret = RET_PF_FIXED;
+	bool flush = false;
+	bool wrprot;
+	u64 spte;
+
+	/* Prefetching always gets a writable pfn.  */
+	bool host_writable = !fault || fault->map_writable;
+	bool prefetch = !fault || fault->prefetch;
+	bool write_fault = fault && fault->write;
+
+	pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
+		 *sptep, write_fault, gfn);
+
+	if (unlikely(is_noslot_pfn(pfn))) {
+		vcpu->stat.pf_mmio_spte_created++;
+		mark_mmio_spte(vcpu, sptep, gfn, pte_access);
+		return RET_PF_EMULATE;
+	}
+
+	if (is_shadow_present_pte(*sptep)) {
+		/*
+		 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
+		 * the parent of the now unreachable PTE.
+		 */
+		if (level > PG_LEVEL_4K && !is_large_pte(*sptep)) {
+			struct kvm_mmu_page *child;
+			u64 pte = *sptep;
+
+			child = spte_to_child_sp(pte);
+			drop_parent_pte(child, sptep);
+			flush = true;
+		} else if (pfn != spte_to_pfn(*sptep)) {
+			pgprintk("hfn old %llx new %llx\n",
+				 spte_to_pfn(*sptep), pfn);
+			drop_spte(vcpu->kvm, sptep);
+			flush = true;
+		} else
+			was_rmapped = 1;
+	}
+
+	wrprot = make_spte(vcpu, sp, slot, pte_access, gfn, pfn, *sptep, prefetch,
+			   true, host_writable, &spte);
+
+	if (*sptep == spte) {
+		ret = RET_PF_SPURIOUS;
+	} else {
+		flush |= mmu_spte_update(sptep, spte);
+		trace_kvm_mmu_set_spte(level, gfn, sptep);
+	}
+
+	if (wrprot) {
+		if (write_fault)
+			ret = RET_PF_EMULATE;
+	}
+
+	if (flush)
+		kvm_flush_remote_tlbs_with_address(vcpu->kvm, gfn,
+				KVM_PAGES_PER_HPAGE(level));
+
+	pgprintk("%s: setting spte %llx\n", __func__, *sptep);
+
+	if (!was_rmapped) {
+		WARN_ON_ONCE(ret == RET_PF_SPURIOUS);
+		rmap_add(vcpu, slot, sptep, gfn, pte_access);
+	} else {
+		/* Already rmapped but the pte_access bits may have changed. */
+		kvm_mmu_page_set_access(sp, spte_index(sptep), pte_access);
+	}
+
+	return ret;
+}
+
+static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
+				    struct kvm_mmu_page *sp,
+				    u64 *start, u64 *end)
+{
+	struct page *pages[PTE_PREFETCH_NUM];
+	struct kvm_memory_slot *slot;
+	unsigned int access = sp->role.access;
+	int i, ret;
+	gfn_t gfn;
+
+	gfn = kvm_mmu_page_get_gfn(sp, spte_index(start));
+	slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, access & ACC_WRITE_MASK);
+	if (!slot)
+		return -1;
+
+	ret = gfn_to_page_many_atomic(slot, gfn, pages, end - start);
+	if (ret <= 0)
+		return -1;
+
+	for (i = 0; i < ret; i++, gfn++, start++) {
+		mmu_set_spte(vcpu, slot, start, access, gfn,
+			     page_to_pfn(pages[i]), NULL);
+		put_page(pages[i]);
+	}
+
+	return 0;
+}
+
+void __direct_pte_prefetch(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+			   u64 *sptep)
+{
+	u64 *spte, *start = NULL;
+	int i;
+
+	WARN_ON(!sp->role.direct);
+
+	i = spte_index(sptep) & ~(PTE_PREFETCH_NUM - 1);
+	spte = sp->spt + i;
+
+	for (i = 0; i < PTE_PREFETCH_NUM; i++, spte++) {
+		if (is_shadow_present_pte(*spte) || spte == sptep) {
+			if (!start)
+				continue;
+			if (direct_pte_prefetch_many(vcpu, sp, start, spte) < 0)
+				return;
+			start = NULL;
+		} else if (!start)
+			start = spte;
+	}
+	if (start)
+		direct_pte_prefetch_many(vcpu, sp, start, spte);
+}
+
+static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
+{
+	struct kvm_mmu_page *sp;
+
+	sp = sptep_to_sp(sptep);
+
+	/*
+	 * Without accessed bits, there's no way to distinguish between
+	 * actually accessed translations and prefetched, so disable pte
+	 * prefetch if accessed bits aren't available.
+	 */
+	if (sp_ad_disabled(sp))
+		return;
+
+	if (sp->role.level > PG_LEVEL_4K)
+		return;
+
+	/*
+	 * If addresses are being invalidated, skip prefetching to avoid
+	 * accidentally prefetching those addresses.
+	 */
+	if (unlikely(vcpu->kvm->mmu_invalidate_in_progress))
+		return;
+
+	__direct_pte_prefetch(vcpu, sp, sptep);
+}
+
+int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
+{
+	struct kvm_shadow_walk_iterator it;
+	struct kvm_mmu_page *sp;
+	int ret;
+	gfn_t base_gfn = fault->gfn;
+
+	kvm_mmu_hugepage_adjust(vcpu, fault);
+
+	trace_kvm_mmu_spte_requested(fault);
+	for_each_shadow_entry(vcpu, fault->addr, it) {
+		/*
+		 * We cannot overwrite existing page tables with an NX
+		 * large page, as the leaf could be executable.
+		 */
+		if (fault->nx_huge_page_workaround_enabled)
+			disallowed_hugepage_adjust(fault, *it.sptep, it.level);
+
+		base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
+		if (it.level == fault->goal_level)
+			break;
+
+		sp = kvm_mmu_get_child_sp(vcpu, it.sptep, base_gfn, true, ACC_ALL);
+		if (sp == ERR_PTR(-EEXIST))
+			continue;
+
+		link_shadow_page(vcpu, it.sptep, sp);
+		if (fault->huge_page_disallowed)
+			account_nx_huge_page(vcpu->kvm, sp,
+					     fault->req_level >= it.level);
+	}
+
+	if (WARN_ON_ONCE(it.level != fault->goal_level))
+		return -EFAULT;
+
+	ret = mmu_set_spte(vcpu, fault->slot, it.sptep, ACC_ALL,
+			   base_gfn, fault->pfn, fault);
+	if (ret == RET_PF_SPURIOUS)
+		return ret;
+
+	direct_pte_prefetch(vcpu, it.sptep);
+	return ret;
+}
+
+/*
+ * Returns the last level spte pointer of the shadow page walk for the given
+ * gpa, and sets *spte to the spte value. This spte may be non-present. If no
+ * walk could be performed, returns NULL and *spte does not contain valid data.
+ *
+ * Contract:
+ *  - Must be called between walk_shadow_page_lockless_{begin,end}.
+ *  - The returned sptep must not be used after walk_shadow_page_lockless_end.
+ */
+u64 *fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, gpa_t gpa, u64 *spte)
+{
+	struct kvm_shadow_walk_iterator iterator;
+	u64 old_spte;
+	u64 *sptep = NULL;
+
+	for_each_shadow_entry_lockless(vcpu, gpa, iterator, old_spte) {
+		sptep = iterator.sptep;
+		*spte = old_spte;
+	}
+
+	return sptep;
+}
+
+void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu)
+{
+	unsigned long roots_to_free = 0;
+	hpa_t root_hpa;
+	int i;
+
+	/*
+	 * This should not be called while L2 is active, L2 can't invalidate
+	 * _only_ its own roots, e.g. INVVPID unconditionally exits.
+	 */
+	WARN_ON_ONCE(mmu->root_role.guest_mode);
+
+	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
+		root_hpa = mmu->prev_roots[i].hpa;
+		if (!VALID_PAGE(root_hpa))
+			continue;
+
+		if (!to_shadow_page(root_hpa) ||
+			to_shadow_page(root_hpa)->role.guest_mode)
+			roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
+	}
+
+	kvm_mmu_free_roots(kvm, mmu, roots_to_free);
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_free_guest_mode_roots);
+
+
+static int mmu_check_root(struct kvm_vcpu *vcpu, gfn_t root_gfn)
+{
+	int ret = 0;
+
+	if (!kvm_vcpu_is_visible_gfn(vcpu, root_gfn)) {
+		kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
+		ret = 1;
+	}
+
+	return ret;
+}
+
+hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant, u8 level)
+{
+	union kvm_mmu_page_role role = vcpu->arch.mmu->root_role;
+	struct kvm_mmu_page *sp;
+
+	role.level = level;
+	role.quadrant = quadrant;
+
+	WARN_ON_ONCE(quadrant && !role.has_4_byte_gpte);
+	WARN_ON_ONCE(role.direct && role.has_4_byte_gpte);
+
+	sp = kvm_mmu_get_shadow_page(vcpu, gfn, role);
+	++sp->root_count;
+
+	return __pa(sp->spt);
+}
+
+static int mmu_first_shadow_root_alloc(struct kvm *kvm)
+{
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *slot;
+	int r = 0, i, bkt;
+
+	/*
+	 * Check if this is the first shadow root being allocated before
+	 * taking the lock.
+	 */
+	if (kvm_shadow_root_allocated(kvm))
+		return 0;
+
+	mutex_lock(&kvm->slots_arch_lock);
+
+	/* Recheck, under the lock, whether this is the first shadow root. */
+	if (kvm_shadow_root_allocated(kvm))
+		goto out_unlock;
+
+	/*
+	 * Check if anything actually needs to be allocated, e.g. all metadata
+	 * will be allocated upfront if TDP is disabled.
+	 */
+	if (kvm_memslots_have_rmaps(kvm) &&
+	    kvm_page_track_write_tracking_enabled(kvm))
+		goto out_success;
+
+	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+		slots = __kvm_memslots(kvm, i);
+		kvm_for_each_memslot(slot, bkt, slots) {
+			/*
+			 * Both of these functions are no-ops if the target is
+			 * already allocated, so unconditionally calling both
+			 * is safe.  Intentionally do NOT free allocations on
+			 * failure to avoid having to track which allocations
+			 * were made now versus when the memslot was created.
+			 * The metadata is guaranteed to be freed when the slot
+			 * is freed, and will be kept/used if userspace retries
+			 * KVM_RUN instead of killing the VM.
+			 */
+			r = memslot_rmap_alloc(slot, slot->npages);
+			if (r)
+				goto out_unlock;
+			r = kvm_page_track_write_tracking_alloc(slot);
+			if (r)
+				goto out_unlock;
+		}
+	}
+
+	/*
+	 * Ensure that shadow_root_allocated becomes true strictly after
+	 * all the related pointers are set.
+	 */
+out_success:
+	smp_store_release(&kvm->arch.shadow_root_allocated, true);
+
+out_unlock:
+	mutex_unlock(&kvm->slots_arch_lock);
+	return r;
+}
+
+int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
+{
+	struct kvm_mmu *mmu = vcpu->arch.mmu;
+	u64 pdptrs[4], pm_mask;
+	gfn_t root_gfn, root_pgd;
+	int quadrant, i, r;
+	hpa_t root;
+
+	root_pgd = mmu->get_guest_pgd(vcpu);
+	root_gfn = root_pgd >> PAGE_SHIFT;
+
+	if (mmu_check_root(vcpu, root_gfn))
+		return 1;
+
+	/*
+	 * On SVM, reading PDPTRs might access guest memory, which might fault
+	 * and thus might sleep.  Grab the PDPTRs before acquiring mmu_lock.
+	 */
+	if (mmu->cpu_role.base.level == PT32E_ROOT_LEVEL) {
+		for (i = 0; i < 4; ++i) {
+			pdptrs[i] = mmu->get_pdptr(vcpu, i);
+			if (!(pdptrs[i] & PT_PRESENT_MASK))
+				continue;
+
+			if (mmu_check_root(vcpu, pdptrs[i] >> PAGE_SHIFT))
+				return 1;
+		}
+	}
+
+	r = mmu_first_shadow_root_alloc(vcpu->kvm);
+	if (r)
+		return r;
+
+	write_lock(&vcpu->kvm->mmu_lock);
+	r = make_mmu_pages_available(vcpu);
+	if (r < 0)
+		goto out_unlock;
+
+	/*
+	 * Do we shadow a long mode page table? If so we need to
+	 * write-protect the guest's page table root.
+	 */
+	if (mmu->cpu_role.base.level >= PT64_ROOT_4LEVEL) {
+		root = mmu_alloc_root(vcpu, root_gfn, 0,
+				      mmu->root_role.level);
+		mmu->root.hpa = root;
+		goto set_root_pgd;
+	}
+
+	if (WARN_ON_ONCE(!mmu->pae_root)) {
+		r = -EIO;
+		goto out_unlock;
+	}
+
+	/*
+	 * We shadow a 32 bit page table. This may be a legacy 2-level
+	 * or a PAE 3-level page table. In either case we need to be aware that
+	 * the shadow page table may be a PAE or a long mode page table.
+	 */
+	pm_mask = PT_PRESENT_MASK | shadow_me_value;
+	if (mmu->root_role.level >= PT64_ROOT_4LEVEL) {
+		pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;
+
+		if (WARN_ON_ONCE(!mmu->pml4_root)) {
+			r = -EIO;
+			goto out_unlock;
+		}
+		mmu->pml4_root[0] = __pa(mmu->pae_root) | pm_mask;
+
+		if (mmu->root_role.level == PT64_ROOT_5LEVEL) {
+			if (WARN_ON_ONCE(!mmu->pml5_root)) {
+				r = -EIO;
+				goto out_unlock;
+			}
+			mmu->pml5_root[0] = __pa(mmu->pml4_root) | pm_mask;
+		}
+	}
+
+	for (i = 0; i < 4; ++i) {
+		WARN_ON_ONCE(IS_VALID_PAE_ROOT(mmu->pae_root[i]));
+
+		if (mmu->cpu_role.base.level == PT32E_ROOT_LEVEL) {
+			if (!(pdptrs[i] & PT_PRESENT_MASK)) {
+				mmu->pae_root[i] = INVALID_PAE_ROOT;
+				continue;
+			}
+			root_gfn = pdptrs[i] >> PAGE_SHIFT;
+		}
+
+		/*
+		 * If shadowing 32-bit non-PAE page tables, each PAE page
+		 * directory maps one quarter of the guest's non-PAE page
+		 * directory. Otherwise, each PAE page directory shadows one
+		 * guest PAE page directory, so the quadrant should be 0.
+		 */
+		quadrant = (mmu->cpu_role.base.level == PT32_ROOT_LEVEL) ? i : 0;
+
+		root = mmu_alloc_root(vcpu, root_gfn, quadrant, PT32_ROOT_LEVEL);
+		mmu->pae_root[i] = root | pm_mask;
+	}
+
+	if (mmu->root_role.level == PT64_ROOT_5LEVEL)
+		mmu->root.hpa = __pa(mmu->pml5_root);
+	else if (mmu->root_role.level == PT64_ROOT_4LEVEL)
+		mmu->root.hpa = __pa(mmu->pml4_root);
+	else
+		mmu->root.hpa = __pa(mmu->pae_root);
+
+set_root_pgd:
+	mmu->root.pgd = root_pgd;
+out_unlock:
+	write_unlock(&vcpu->kvm->mmu_lock);
+
+	return r;
+}
+
+int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
+{
+	struct kvm_mmu *mmu = vcpu->arch.mmu;
+	bool need_pml5 = mmu->root_role.level > PT64_ROOT_4LEVEL;
+	u64 *pml5_root = NULL;
+	u64 *pml4_root = NULL;
+	u64 *pae_root;
+
+	/*
+	 * When shadowing 32-bit or PAE NPT with 64-bit NPT, the PML4 and PDP
+	 * tables are allocated and initialized at root creation as there is no
+	 * equivalent level in the guest's NPT to shadow.  Allocate the tables
+	 * on demand, as running a 32-bit L1 VMM on 64-bit KVM is very rare.
+	 */
+	if (mmu->root_role.direct ||
+	    mmu->cpu_role.base.level >= PT64_ROOT_4LEVEL ||
+	    mmu->root_role.level < PT64_ROOT_4LEVEL)
+		return 0;
+
+	/*
+	 * NPT, the only paging mode that uses this horror, uses a fixed number
+	 * of levels for the shadow page tables, e.g. all MMUs are 4-level or
+	 * all MMUs are 5-level.  Thus, this can safely require that pml5_root
+	 * is allocated if the other roots are valid and pml5 is needed, as any
+	 * prior MMU would also have required pml5.
+	 */
+	if (mmu->pae_root && mmu->pml4_root && (!need_pml5 || mmu->pml5_root))
+		return 0;
+
+	/*
+	 * The special roots should always be allocated in concert.  Yell and
+	 * bail if KVM ends up in a state where only one of the roots is valid.
+	 */
+	if (WARN_ON_ONCE(!tdp_enabled || mmu->pae_root || mmu->pml4_root ||
+			 (need_pml5 && mmu->pml5_root)))
+		return -EIO;
+
+	/*
+	 * Unlike 32-bit NPT, the PDP table doesn't need to be in low mem, and
+	 * doesn't need to be decrypted.
+	 */
+	pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+	if (!pae_root)
+		return -ENOMEM;
+
+#ifdef CONFIG_X86_64
+	pml4_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+	if (!pml4_root)
+		goto err_pml4;
+
+	if (need_pml5) {
+		pml5_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+		if (!pml5_root)
+			goto err_pml5;
+	}
+#endif
+
+	mmu->pae_root = pae_root;
+	mmu->pml4_root = pml4_root;
+	mmu->pml5_root = pml5_root;
+
+	return 0;
+
+#ifdef CONFIG_X86_64
+err_pml5:
+	free_page((unsigned long)pml4_root);
+err_pml4:
+	free_page((unsigned long)pae_root);
+	return -ENOMEM;
+#endif
+}
+
+static bool is_unsync_root(hpa_t root)
+{
+	struct kvm_mmu_page *sp;
+
+	if (!VALID_PAGE(root))
+		return false;
+
+	/*
+	 * The read barrier orders the CPU's read of SPTE.W during the page table
+	 * walk before the reads of sp->unsync/sp->unsync_children here.
+	 *
+	 * Even if another CPU was marking the SP as unsync-ed simultaneously,
+	 * any guest page table changes are not guaranteed to be visible anyway
+	 * until this VCPU issues a TLB flush strictly after those changes are
+	 * made.  We only need to ensure that the other CPU sets these flags
+	 * before any actual changes to the page tables are made.  The comments
+	 * in mmu_try_to_unsync_pages() describe what could go wrong if this
+	 * requirement isn't satisfied.
+	 */
+	smp_rmb();
+	sp = to_shadow_page(root);
+
+	/*
+	 * PAE roots (somewhat arbitrarily) aren't backed by shadow pages; the
+	 * PDPTEs for a given PAE root need to be synchronized individually.
+	 */
+	if (WARN_ON_ONCE(!sp))
+		return false;
+
+	if (sp->unsync || sp->unsync_children)
+		return true;
+
+	return false;
+}
+
+void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
+{
+	int i;
+	struct kvm_mmu_page *sp;
+
+	if (vcpu->arch.mmu->root_role.direct)
+		return;
+
+	if (!VALID_PAGE(vcpu->arch.mmu->root.hpa))
+		return;
+
+	vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);
+
+	if (vcpu->arch.mmu->cpu_role.base.level >= PT64_ROOT_4LEVEL) {
+		hpa_t root = vcpu->arch.mmu->root.hpa;
+		sp = to_shadow_page(root);
+
+		if (!is_unsync_root(root))
+			return;
+
+		write_lock(&vcpu->kvm->mmu_lock);
+		mmu_sync_children(vcpu, sp, true);
+		write_unlock(&vcpu->kvm->mmu_lock);
+		return;
+	}
+
+	write_lock(&vcpu->kvm->mmu_lock);
+
+	for (i = 0; i < 4; ++i) {
+		hpa_t root = vcpu->arch.mmu->pae_root[i];
+
+		if (IS_VALID_PAE_ROOT(root)) {
+			sp = spte_to_child_sp(root);
+			mmu_sync_children(vcpu, sp, true);
+		}
+	}
+
+	write_unlock(&vcpu->kvm->mmu_lock);
+}
+
+void kvm_mmu_sync_prev_roots(struct kvm_vcpu *vcpu)
+{
+	unsigned long roots_to_free = 0;
+	int i;
+
+	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
+		if (is_unsync_root(vcpu->arch.mmu->prev_roots[i].hpa))
+			roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
+
+	/* sync prev_roots by simply freeing them */
+	kvm_mmu_free_roots(vcpu->kvm, vcpu->arch.mmu, roots_to_free);
+}
+
+/*
+ * Return the level of the lowest level SPTE added to sptes.
+ * That SPTE may be non-present.
+ *
+ * Must be called between walk_shadow_page_lockless_{begin,end}.
+ */
+int get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, int *root_level)
+{
+	struct kvm_shadow_walk_iterator iterator;
+	int leaf = -1;
+	u64 spte;
+
+	for (shadow_walk_init(&iterator, vcpu, addr),
+	     *root_level = iterator.level;
+	     shadow_walk_okay(&iterator);
+	     __shadow_walk_next(&iterator, spte)) {
+		leaf = iterator.level;
+		spte = mmu_spte_get_lockless(iterator.sptep);
+
+		sptes[leaf] = spte;
+	}
+
+	return leaf;
+}
+
+void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr)
+{
+	struct kvm_shadow_walk_iterator iterator;
+	u64 spte;
+
+	walk_shadow_page_lockless_begin(vcpu);
+	for_each_shadow_entry_lockless(vcpu, addr, iterator, spte)
+		clear_sp_write_flooding_count(iterator.sptep);
+	walk_shadow_page_lockless_end(vcpu);
+}
+
+static bool is_obsolete_root(struct kvm *kvm, hpa_t root_hpa)
+{
+	struct kvm_mmu_page *sp;
+
+	if (!VALID_PAGE(root_hpa))
+		return false;
+
+	/*
+	 * When freeing obsolete roots, treat roots as obsolete if they don't
+	 * have an associated shadow page.  This does mean KVM will get false
+	 * positives and free roots that don't strictly need to be freed, but
+	 * such false positives are relatively rare:
+	 *
+	 *  (a) only PAE paging and nested NPT have roots without shadow pages
+	 *  (b) remote reloads due to a memslot update obsoletes _all_ roots
+	 *  (c) KVM doesn't track previous roots for PAE paging, and the guest
+	 *      is unlikely to zap an in-use PGD.
+	 */
+	sp = to_shadow_page(root_hpa);
+	return !sp || is_obsolete_sp(kvm, sp);
+}
+
+static void __kvm_mmu_free_obsolete_roots(struct kvm *kvm, struct kvm_mmu *mmu)
+{
+	unsigned long roots_to_free = 0;
+	int i;
+
+	if (is_obsolete_root(kvm, mmu->root.hpa))
+		roots_to_free |= KVM_MMU_ROOT_CURRENT;
+
+	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
+		if (is_obsolete_root(kvm, mmu->prev_roots[i].hpa))
+			roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
+	}
+
+	if (roots_to_free)
+		kvm_mmu_free_roots(kvm, mmu, roots_to_free);
+}
+
+void kvm_mmu_free_obsolete_roots(struct kvm_vcpu *vcpu)
+{
+	__kvm_mmu_free_obsolete_roots(vcpu->kvm, &vcpu->arch.root_mmu);
+	__kvm_mmu_free_obsolete_roots(vcpu->kvm, &vcpu->arch.guest_mmu);
+}
+
+static u64 mmu_pte_write_fetch_gpte(struct kvm_vcpu *vcpu, gpa_t *gpa,
+				    int *bytes)
+{
+	u64 gentry = 0;
+	int r;
+
+	/*
+	 * Assume that the pte write is to a page table of the same type
+	 * as the current vcpu paging mode, since we update the sptes only
+	 * when they have the same mode.
+	 */
+	if (is_pae(vcpu) && *bytes == 4) {
+		/* Handle a 32-bit guest writing two halves of a 64-bit gpte */
+		*gpa &= ~(gpa_t)7;
+		*bytes = 8;
+	}
+
+	if (*bytes == 4 || *bytes == 8) {
+		r = kvm_vcpu_read_guest_atomic(vcpu, *gpa, &gentry, *bytes);
+		if (r)
+			gentry = 0;
+	}
+
+	return gentry;
+}
+
+/*
+ * If we're seeing too many writes to a page, it may no longer be a page table,
+ * or we may be forking, in which case it is better to unmap the page.
+ */
+static bool detect_write_flooding(struct kvm_mmu_page *sp)
+{
+	/*
+	 * Skip write-flooding detection for an sp whose level is 1, because
+	 * it can become unsync, and then the guest page is not write-protected.
+	 */
+	if (sp->role.level == PG_LEVEL_4K)
+		return false;
+
+	atomic_inc(&sp->write_flooding_count);
+	return atomic_read(&sp->write_flooding_count) >= 3;
+}
+
+/*
+ * Misaligned accesses are too much trouble to fix up; also, they usually
+ * indicate a page is not used as a page table.
+ */
+static bool detect_write_misaligned(struct kvm_mmu_page *sp, gpa_t gpa,
+				    int bytes)
+{
+	unsigned offset, pte_size, misaligned;
+
+	pgprintk("misaligned: gpa %llx bytes %d role %x\n",
+		 gpa, bytes, sp->role.word);
+
+	offset = offset_in_page(gpa);
+	pte_size = sp->role.has_4_byte_gpte ? 4 : 8;
+
+	/*
+	 * Sometimes, the OS only writes the last byte to update status bits;
+	 * for example, in Linux the andb instruction is used in clear_bit().
+	 */
+	if (!(offset & (pte_size - 1)) && bytes == 1)
+		return false;
+
+	misaligned = (offset ^ (offset + bytes - 1)) & ~(pte_size - 1);
+	misaligned |= bytes < 4;
+
+	return misaligned;
+}
+
+static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
+{
+	unsigned page_offset, quadrant;
+	u64 *spte;
+	int level;
+
+	page_offset = offset_in_page(gpa);
+	level = sp->role.level;
+	*nspte = 1;
+	if (sp->role.has_4_byte_gpte) {
+		page_offset <<= 1;	/* 32->64 */
+		/*
+		 * A 32-bit pde maps 4MB while the shadow pdes map
+		 * only 2MB.  So we need to double the offset again
+		 * and zap two pdes instead of one.
+		 */
+		if (level == PT32_ROOT_LEVEL) {
+			page_offset &= ~7; /* kill rounding error */
+			page_offset <<= 1;
+			*nspte = 2;
+		}
+		quadrant = page_offset >> PAGE_SHIFT;
+		page_offset &= ~PAGE_MASK;
+		if (quadrant != sp->role.quadrant)
+			return NULL;
+	}
+
+	spte = &sp->spt[page_offset / sizeof(*spte)];
+	return spte;
+}
+
+void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+		       int bytes, struct kvm_page_track_notifier_node *node)
+{
+	gfn_t gfn = gpa >> PAGE_SHIFT;
+	struct kvm_mmu_page *sp;
+	LIST_HEAD(invalid_list);
+	u64 entry, gentry, *spte;
+	int npte;
+	bool flush = false;
+
+	/*
+	 * If we don't have indirect shadow pages, it means no page is
+	 * write-protected, so we can exit simply.
+	 */
+	if (!READ_ONCE(vcpu->kvm->arch.indirect_shadow_pages))
+		return;
+
+	pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes);
+
+	write_lock(&vcpu->kvm->mmu_lock);
+
+	gentry = mmu_pte_write_fetch_gpte(vcpu, &gpa, &bytes);
+
+	++vcpu->kvm->stat.mmu_pte_write;
+
+	for_each_gfn_valid_sp_with_gptes(vcpu->kvm, sp, gfn) {
+		if (detect_write_misaligned(sp, gpa, bytes) ||
+		      detect_write_flooding(sp)) {
+			kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list);
+			++vcpu->kvm->stat.mmu_flooded;
+			continue;
+		}
+
+		spte = get_written_sptes(sp, gpa, &npte);
+		if (!spte)
+			continue;
+
+		while (npte--) {
+			entry = *spte;
+			mmu_page_zap_pte(vcpu->kvm, sp, spte, NULL);
+			if (gentry && sp->role.level != PG_LEVEL_4K)
+				++vcpu->kvm->stat.mmu_pde_zapped;
+			if (is_shadow_present_pte(entry))
+				flush = true;
+			++spte;
+		}
+	}
+	kvm_mmu_remote_flush_or_zap(vcpu->kvm, &invalid_list, flush);
+	write_unlock(&vcpu->kvm->mmu_lock);
+}
+
+static __always_inline bool __walk_slot_rmaps(struct kvm *kvm,
+					      const struct kvm_memory_slot *slot,
+					      slot_rmaps_handler fn,
+					      int start_level, int end_level,
+					      gfn_t start_gfn, gfn_t end_gfn,
+					      bool flush_on_yield, bool flush)
+{
+	struct slot_rmap_walk_iterator iterator;
+
+	lockdep_assert_held_write(&kvm->mmu_lock);
+
+	for_each_slot_rmap_range(slot, start_level, end_level, start_gfn,
+				 end_gfn, &iterator) {
+		if (iterator.rmap)
+			flush |= fn(kvm, iterator.rmap, slot);
+
+		if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) {
+			if (flush && flush_on_yield) {
+				kvm_flush_remote_tlbs_with_address(kvm,
+						start_gfn,
+						iterator.gfn - start_gfn + 1);
+				flush = false;
+			}
+			cond_resched_rwlock_write(&kvm->mmu_lock);
+		}
+	}
+
+	return flush;
+}
+
+__always_inline bool walk_slot_rmaps(struct kvm *kvm,
+				     const struct kvm_memory_slot *slot,
+				     slot_rmaps_handler fn, int start_level,
+				     int end_level, bool flush_on_yield)
+{
+	return __walk_slot_rmaps(kvm, slot, fn, start_level, end_level,
+				 slot->base_gfn, slot->base_gfn + slot->npages - 1,
+				 flush_on_yield, false);
+}
+
+__always_inline bool walk_slot_rmaps_4k(struct kvm *kvm,
+					const struct kvm_memory_slot *slot,
+					slot_rmaps_handler fn,
+					bool flush_on_yield)
+{
+	return walk_slot_rmaps(kvm, slot, fn, PG_LEVEL_4K,
+			       PG_LEVEL_4K, flush_on_yield);
+}
+
+#define BATCH_ZAP_PAGES	10
+void kvm_zap_obsolete_pages(struct kvm *kvm)
+{
+	struct kvm_mmu_page *sp, *node;
+	int nr_zapped, batch = 0;
+	bool unstable;
+
+restart:
+	list_for_each_entry_safe_reverse(sp, node,
+	      &kvm->arch.active_mmu_pages, link) {
+		/*
+		 * No obsolete valid page exists before a newly created page
+		 * since active_mmu_pages is a FIFO list.
+		 */
+		if (!is_obsolete_sp(kvm, sp))
+			break;
+
+		/*
+		 * Invalid pages should never land back on the list of active
+		 * pages.  Skip the bogus page, otherwise we'll get stuck in an
+		 * infinite loop if the page gets put back on the list (again).
+		 */
+		if (WARN_ON(sp->role.invalid))
+			continue;
+
+		/*
+		 * No need to flush the TLB since we're only zapping shadow
+		 * pages with an obsolete generation number and all vCPUs have
+		 * loaded a new root, i.e. the shadow pages being zapped cannot
+		 * be in active use by the guest.
+		 */
+		if (batch >= BATCH_ZAP_PAGES &&
+		    cond_resched_rwlock_write(&kvm->mmu_lock)) {
+			batch = 0;
+			goto restart;
+		}
+
+		unstable = __kvm_mmu_prepare_zap_page(kvm, sp,
+				&kvm->arch.zapped_obsolete_pages, &nr_zapped);
+		batch += nr_zapped;
+
+		if (unstable)
+			goto restart;
+	}
+
+	/*
+	 * Kick all vCPUs (via remote TLB flush) before freeing the page tables
+	 * to ensure KVM is not in the middle of a lockless shadow page table
+	 * walk, which may reference the pages.  The remote TLB flush itself is
+	 * not required and is simply a convenient way to kick vCPUs as needed.
+	 * KVM performs a local TLB flush when allocating a new root (see
+	 * kvm_mmu_load()), and the reload in the caller ensures no vCPUs are
+	 * running with an obsolete MMU.
+	 */
+	kvm_mmu_commit_zap_page(kvm, &kvm->arch.zapped_obsolete_pages);
+}
+
+static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
+{
+	return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
+}
+
+bool kvm_rmap_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
+{
+	const struct kvm_memory_slot *memslot;
+	struct kvm_memslots *slots;
+	struct kvm_memslot_iter iter;
+	bool flush = false;
+	gfn_t start, end;
+	int i;
+
+	if (!kvm_memslots_have_rmaps(kvm))
+		return flush;
+
+	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+		slots = __kvm_memslots(kvm, i);
+
+		kvm_for_each_memslot_in_gfn_range(&iter, slots, gfn_start, gfn_end) {
+			memslot = iter.slot;
+			start = max(gfn_start, memslot->base_gfn);
+			end = min(gfn_end, memslot->base_gfn + memslot->npages);
+			if (WARN_ON_ONCE(start >= end))
+				continue;
+
+			flush = __walk_slot_rmaps(kvm, memslot, __kvm_zap_rmap,
+						  PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL,
+						  start, end - 1, true, flush);
+		}
+	}
+
+	return flush;
+}
+
+bool slot_rmap_write_protect(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+			     const struct kvm_memory_slot *slot)
+{
+	return rmap_write_protect(rmap_head, false);
+}
+
+static struct kvm_mmu_page *shadow_mmu_get_sp_for_split(struct kvm *kvm, u64 *huge_sptep)
+{
+	struct kvm_mmu_page *huge_sp = sptep_to_sp(huge_sptep);
+	struct shadow_page_caches caches = {};
+	union kvm_mmu_page_role role;
+	unsigned int access;
+	gfn_t gfn;
+
+	gfn = kvm_mmu_page_get_gfn(huge_sp, spte_index(huge_sptep));
+	access = kvm_mmu_page_get_access(huge_sp, spte_index(huge_sptep));
+
+	/*
+	 * Note, huge page splitting always uses direct shadow pages, regardless
+	 * of whether the huge page itself is mapped by a direct or indirect
+	 * shadow page, since the huge page region itself is being directly
+	 * mapped with smaller pages.
+	 */
+	role = kvm_mmu_child_role(huge_sptep, /*direct=*/true, access);
+
+	/* Direct SPs do not require a shadowed_info_cache. */
+	caches.page_header_cache = &kvm->arch.split_page_header_cache;
+	caches.shadow_page_cache = &kvm->arch.split_shadow_page_cache;
+
+	/* Safe to pass NULL for vCPU since requesting a direct SP. */
+	return __kvm_mmu_get_shadow_page(kvm, NULL, &caches, gfn, role);
+}
+
+static void shadow_mmu_split_huge_page(struct kvm *kvm,
+				       const struct kvm_memory_slot *slot,
+				       u64 *huge_sptep)
+
+{
+	struct kvm_mmu_memory_cache *cache = &kvm->arch.split_desc_cache;
+	u64 huge_spte = READ_ONCE(*huge_sptep);
+	struct kvm_mmu_page *sp;
+	bool flush = false;
+	u64 *sptep, spte;
+	gfn_t gfn;
+	int index;
+
+	sp = shadow_mmu_get_sp_for_split(kvm, huge_sptep);
+
+	for (index = 0; index < SPTE_ENT_PER_PAGE; index++) {
+		sptep = &sp->spt[index];
+		gfn = kvm_mmu_page_get_gfn(sp, index);
+
+		/*
+		 * The SP may already have populated SPTEs, e.g. if this huge
+		 * page is aliased by multiple sptes with the same access
+		 * permissions. These entries are guaranteed to map the same
+		 * gfn-to-pfn translation since the SP is direct, so no need to
+		 * modify them.
+		 *
+		 * However, if a given SPTE points to a lower level page table,
+		 * that lower level page table may only be partially populated.
+		 * Installing such SPTEs would effectively unmap a portion of the
+		 * huge page. Unmapping guest memory always requires a TLB flush
+		 * since a subsequent operation on the unmapped regions would
+		 * fail to detect the need to flush.
+		 */
+		if (is_shadow_present_pte(*sptep)) {
+			flush |= !is_last_spte(*sptep, sp->role.level);
+			continue;
+		}
+
+		spte = make_huge_page_split_spte(kvm, huge_spte, sp->role, index);
+		mmu_spte_set(sptep, spte);
+		__rmap_add(kvm, cache, slot, sptep, gfn, sp->role.access);
+	}
+
+	__link_shadow_page(kvm, cache, huge_sptep, sp, flush);
+}
+
+static int shadow_mmu_try_split_huge_page(struct kvm *kvm,
+					  const struct kvm_memory_slot *slot,
+					  u64 *huge_sptep)
+{
+	struct kvm_mmu_page *huge_sp = sptep_to_sp(huge_sptep);
+	int level, r = 0;
+	gfn_t gfn;
+	u64 spte;
+
+	/* Grab information for the tracepoint before dropping the MMU lock. */
+	gfn = kvm_mmu_page_get_gfn(huge_sp, spte_index(huge_sptep));
+	level = huge_sp->role.level;
+	spte = *huge_sptep;
+
+	if (kvm_mmu_available_pages(kvm) <= KVM_MIN_FREE_MMU_PAGES) {
+		r = -ENOSPC;
+		goto out;
+	}
+
+	if (need_topup_split_caches_or_resched(kvm)) {
+		write_unlock(&kvm->mmu_lock);
+		cond_resched();
+		/*
+		 * If the topup succeeds, return -EAGAIN to indicate that the
+		 * rmap iterator should be restarted because the MMU lock was
+		 * dropped.
+		 */
+		r = topup_split_caches(kvm) ?: -EAGAIN;
+		write_lock(&kvm->mmu_lock);
+		goto out;
+	}
+
+	shadow_mmu_split_huge_page(kvm, slot, huge_sptep);
+
+out:
+	trace_kvm_mmu_split_huge_page(gfn, spte, level, r);
+	return r;
+}
+
+static bool shadow_mmu_try_split_huge_pages(struct kvm *kvm,
+					    struct kvm_rmap_head *rmap_head,
+					    const struct kvm_memory_slot *slot)
+{
+	struct rmap_iterator iter;
+	struct kvm_mmu_page *sp;
+	u64 *huge_sptep;
+	int r;
+
+restart:
+	for_each_rmap_spte(rmap_head, &iter, huge_sptep) {
+		sp = sptep_to_sp(huge_sptep);
+
+		/* TDP MMU is enabled, so rmap only contains nested MMU SPs. */
+		if (WARN_ON_ONCE(!sp->role.guest_mode))
+			continue;
+
+		/* The rmaps should never contain non-leaf SPTEs. */
+		if (WARN_ON_ONCE(!is_large_pte(*huge_sptep)))
+			continue;
+
+		/* SPs with level >PG_LEVEL_4K should never be unsync. */
+		if (WARN_ON_ONCE(sp->unsync))
+			continue;
+
+		/* Don't bother splitting huge pages on invalid SPs. */
+		if (sp->role.invalid)
+			continue;
+
+		r = shadow_mmu_try_split_huge_page(kvm, slot, huge_sptep);
+
+		/*
+		 * The split succeeded or needs to be retried because the MMU
+		 * lock was dropped. Either way, restart the iterator to get it
+		 * back into a consistent state.
+		 */
+		if (!r || r == -EAGAIN)
+			goto restart;
+
+		/* The split failed and shouldn't be retried (e.g. -ENOMEM). */
+		break;
+	}
+
+	return false;
+}
+
+void kvm_shadow_mmu_try_split_huge_pages(struct kvm *kvm,
+					 const struct kvm_memory_slot *slot,
+					 gfn_t start, gfn_t end,
+					 int target_level)
+{
+	int level;
+
+	/*
+	 * Split huge pages starting with KVM_MAX_HUGEPAGE_LEVEL and working
+	 * down to the target level. This ensures pages are recursively split
+	 * all the way to the target level. There's no need to split pages
+	 * already at the target level.
+	 */
+	for (level = KVM_MAX_HUGEPAGE_LEVEL; level > target_level; level--) {
+		__walk_slot_rmaps(kvm, slot, shadow_mmu_try_split_huge_pages,
+				  level, level, start, end - 1, true, false);
+	}
+}
+
+static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
+					 struct kvm_rmap_head *rmap_head,
+					 const struct kvm_memory_slot *slot)
+{
+	u64 *sptep;
+	struct rmap_iterator iter;
+	int need_tlb_flush = 0;
+	struct kvm_mmu_page *sp;
+
+restart:
+	for_each_rmap_spte(rmap_head, &iter, sptep) {
+		sp = sptep_to_sp(sptep);
+
+		/*
+		 * We cannot do huge page mapping for indirect shadow pages,
+		 * which are found on the last rmap (level = 1) when not using
+		 * tdp; such shadow pages are synced with the page table in
+		 * the guest, and the guest page table is using 4K page size
+		 * mapping if the indirect sp has level = 1.
+		 */
+		if (sp->role.direct &&
+		    sp->role.level < kvm_mmu_max_mapping_level(kvm, slot, sp->gfn,
+							       PG_LEVEL_NUM)) {
+			kvm_zap_one_rmap_spte(kvm, rmap_head, sptep);
+
+			if (kvm_available_flush_tlb_with_range())
+				kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
+					KVM_PAGES_PER_HPAGE(sp->role.level));
+			else
+				need_tlb_flush = 1;
+
+			goto restart;
+		}
+	}
+
+	return need_tlb_flush;
+}
+
+void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
+				    const struct kvm_memory_slot *slot)
+{
+	/*
+	 * Note, use KVM_MAX_HUGEPAGE_LEVEL - 1 since there's no need to zap
+	 * pages that are already mapped at the maximum hugepage level.
+	 */
+	if (walk_slot_rmaps(kvm, slot, kvm_mmu_zap_collapsible_spte,
+			    PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL - 1, true))
+		kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
+}
+
+unsigned long mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
+{
+	struct kvm *kvm;
+	int nr_to_scan = sc->nr_to_scan;
+	unsigned long freed = 0;
+
+	mutex_lock(&kvm_lock);
+
+	list_for_each_entry(kvm, &vm_list, vm_list) {
+		int idx;
+		LIST_HEAD(invalid_list);
+
+		/*
+		 * Never scan more than sc->nr_to_scan VM instances.
+		 * Will not hit this condition practically since we do not try
+		 * to shrink more than one VM and it is very unlikely to see
+		 * !n_used_mmu_pages so many times.
+		 */
+		if (!nr_to_scan--)
+			break;
+		/*
+		 * n_used_mmu_pages is accessed without holding kvm->mmu_lock
+		 * here. We may skip a VM instance erroneously, but we do not
+		 * want to shrink a VM that only started to populate its MMU
+		 * anyway.
+		 */
+		if (!kvm->arch.n_used_mmu_pages &&
+		    !kvm_has_zapped_obsolete_pages(kvm))
+			continue;
+
+		idx = srcu_read_lock(&kvm->srcu);
+		write_lock(&kvm->mmu_lock);
+
+		if (kvm_has_zapped_obsolete_pages(kvm)) {
+			kvm_mmu_commit_zap_page(kvm,
+			      &kvm->arch.zapped_obsolete_pages);
+			goto unlock;
+		}
+
+		freed = kvm_mmu_zap_oldest_mmu_pages(kvm, sc->nr_to_scan);
+
+unlock:
+		write_unlock(&kvm->mmu_lock);
+		srcu_read_unlock(&kvm->srcu, idx);
+
+		/*
+		 * unfair on small ones
+		 * per-vm shrinkers cry out
+		 * sadness comes quickly
+		 */
+		list_move_tail(&kvm->vm_list, &vm_list);
+		break;
+	}
+
+	mutex_unlock(&kvm_lock);
+	return freed;
+}
diff --git a/arch/x86/kvm/mmu/shadow_mmu.h b/arch/x86/kvm/mmu/shadow_mmu.h
index 2bfba6ad20688..4534eadc9a17c 100644
--- a/arch/x86/kvm/mmu/shadow_mmu.h
+++ b/arch/x86/kvm/mmu/shadow_mmu.h
@@ -18,4 +18,149 @@
 
 #include <linux/kvm_host.h>
 
+/* make pte_list_desc fit well in cache lines */
+#define PTE_LIST_EXT 14
+
+/*
+ * Slight optimization of cacheline layout, by putting `more' and `spte_count'
+ * at the start; then accessing it will only use one single cacheline for
+ * either full (entries==PTE_LIST_EXT) case or entries<=6.
+ */
+struct pte_list_desc {
+	struct pte_list_desc *more;
+	/*
+	 * Number of entries stored in the pte_list_desc.  It need not be a u64;
+	 * that width just eases alignment.  PTE_LIST_EXT means the desc is full.
+	 */
+	u64 spte_count;
+	u64 *sptes[PTE_LIST_EXT];
+};
+
+unsigned int pte_list_count(struct kvm_rmap_head *rmap_head);
+
+struct kvm_shadow_walk_iterator {
+	u64 addr;
+	hpa_t shadow_addr;
+	u64 *sptep;
+	int level;
+	unsigned index;
+};
+
+#define for_each_shadow_entry_using_root(_vcpu, _root, _addr, _walker)     \
+	for (shadow_walk_init_using_root(&(_walker), (_vcpu),              \
+					 (_root), (_addr));                \
+	     shadow_walk_okay(&(_walker));			           \
+	     shadow_walk_next(&(_walker)))
+
+bool mmu_spte_update(u64 *sptep, u64 new_spte);
+void mmu_spte_clear_no_track(u64 *sptep);
+gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index);
+void kvm_mmu_page_set_access(struct kvm_mmu_page *sp, int index,
+			     unsigned int access);
+
+struct kvm_rmap_head *gfn_to_rmap(gfn_t gfn, int level,
+				  const struct kvm_memory_slot *slot);
+bool rmap_can_add(struct kvm_vcpu *vcpu);
+void drop_spte(struct kvm *kvm, u64 *sptep);
+bool rmap_write_protect(struct kvm_rmap_head *rmap_head, bool pt_protect);
+bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+			const struct kvm_memory_slot *slot);
+bool kvm_zap_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+		  struct kvm_memory_slot *slot, gfn_t gfn, int level,
+		  pte_t unused);
+bool kvm_set_pte_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+		      struct kvm_memory_slot *slot, gfn_t gfn, int level,
+		      pte_t pte);
+
+typedef bool (*rmap_handler_t)(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+			       struct kvm_memory_slot *slot, gfn_t gfn,
+			       int level, pte_t pte);
+bool kvm_handle_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range,
+			  rmap_handler_t handler);
+
+bool kvm_age_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+		  struct kvm_memory_slot *slot, gfn_t gfn, int level,
+		  pte_t unused);
+bool kvm_test_age_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+		       struct kvm_memory_slot *slot, gfn_t gfn,
+		       int level, pte_t unused);
+
+void drop_parent_pte(struct kvm_mmu_page *sp, u64 *parent_pte);
+int nonpaging_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp);
+int mmu_sync_children(struct kvm_vcpu *vcpu, struct kvm_mmu_page *parent,
+		      bool can_yield);
+void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp);
+void clear_sp_write_flooding_count(u64 *spte);
+
+struct kvm_mmu_page *kvm_mmu_get_child_sp(struct kvm_vcpu *vcpu, u64 *sptep,
+					  gfn_t gfn, bool direct,
+					  unsigned int access);
+
+void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterator,
+				 struct kvm_vcpu *vcpu, hpa_t root, u64 addr);
+void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator,
+		      struct kvm_vcpu *vcpu, u64 addr);
+bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator);
+void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator);
+
+void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep, struct kvm_mmu_page *sp);
+
+void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
+			  unsigned direct_access);
+
+int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp, u64 *spte,
+		     struct list_head *invalid_list);
+bool __kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+				struct list_head *invalid_list,
+				int *nr_zapped);
+bool kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+			      struct list_head *invalid_list);
+void kvm_mmu_commit_zap_page(struct kvm *kvm, struct list_head *invalid_list);
+
+int make_mmu_pages_available(struct kvm_vcpu *vcpu);
+
+int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva);
+
+int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
+		 u64 *sptep, unsigned int pte_access, gfn_t gfn,
+		 kvm_pfn_t pfn, struct kvm_page_fault *fault);
+void __direct_pte_prefetch(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+			   u64 *sptep);
+int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
+u64 *fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, gpa_t gpa, u64 *spte);
+
+hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant, u8 level);
+int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu);
+int mmu_alloc_special_roots(struct kvm_vcpu *vcpu);
+
+int get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, int *root_level);
+
+void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr);
+void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+		       int bytes, struct kvm_page_track_notifier_node *node);
+
+/* The return value indicates if tlb flush on all vcpus is needed. */
+typedef bool (*slot_rmaps_handler) (struct kvm *kvm,
+				    struct kvm_rmap_head *rmap_head,
+				    const struct kvm_memory_slot *slot);
+bool walk_slot_rmaps(struct kvm *kvm, const struct kvm_memory_slot *slot,
+		       slot_rmaps_handler fn, int start_level, int end_level,
+		       bool flush_on_yield);
+bool walk_slot_rmaps_4k(struct kvm *kvm, const struct kvm_memory_slot *slot,
+			slot_rmaps_handler fn, bool flush_on_yield);
+
+void kvm_zap_obsolete_pages(struct kvm *kvm);
+bool kvm_rmap_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
+
+bool slot_rmap_write_protect(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+			     const struct kvm_memory_slot *slot);
+
+void kvm_shadow_mmu_try_split_huge_pages(struct kvm *kvm,
+					 const struct kvm_memory_slot *slot,
+					 gfn_t start, gfn_t end,
+					 int target_level);
+void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
+				    const struct kvm_memory_slot *slot);
+
+unsigned long mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc);
 #endif /* __KVM_X86_MMU_SHADOW_MMU_H */

From patchwork Thu Feb  2 18:27:56 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ben Gardon <bgardon@google.com>
X-Patchwork-Id: 52118
Date: Thu,  2 Feb 2023 18:27:56 +0000
In-Reply-To: <20230202182809.1929122-1-bgardon@google.com>
Mime-Version: 1.0
References: <20230202182809.1929122-1-bgardon@google.com>
X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog
Message-ID: <20230202182809.1929122-9-bgardon@google.com>
Subject: [PATCH 08/21] KVM: x86/MMU: Expose functions for paging_tmpl.h
From: Ben Gardon <bgardon@google.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
        Sean Christopherson <seanjc@google.com>,
        David Matlack <dmatlack@google.com>,
        Vipin Sharma <vipinsh@google.com>,
        Ricardo Koller <ricarkol@google.com>,
        Ben Gardon <bgardon@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

In preparation for moving the paging_tmpl.h includes to shadow_mmu.c,
expose the functions paging_tmpl.h needs through mmu_internal.h. This
includes moving struct kvm_mmu_role_regs and all of the
BUILD_MMU_ROLE_*() accessor macros. Not all of those macros are strictly
needed by paging_tmpl.h, but it is cleaner to keep them together.

No functional change intended.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
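For readers unfamiliar with the accessor macros moved here: each
BUILD_MMU_ROLE_*() invocation uses token pasting to stamp out one small
static inline predicate, which is what lets them live in a header that
several .c files include. Below is a minimal standalone sketch of that
pattern. It is not the kernel code: every demo_* and DEMO_* name is
hypothetical, and the struct is a cut-down stand-in for
struct kvm_mmu_role_regs.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins, not the kernel definitions. */
#define DEMO_CR0_WP	(1UL << 16)
#define DEMO_CR4_PAE	(1UL << 5)

struct demo_role_regs {
	const unsigned long cr0;
	const unsigned long cr4;
};

/*
 * Token pasting (##) builds the function name from the register and bit
 * names; "regs->reg" selects the struct member named by the first argument.
 */
#define DEMO_BUILD_REGS_ACCESSOR(reg, name, flag)			\
static inline bool							\
demo_is_##reg##_##name(const struct demo_role_regs *regs)		\
{									\
	return !!(regs->reg & (flag));					\
}

DEMO_BUILD_REGS_ACCESSOR(cr0, wp, DEMO_CR0_WP)	/* demo_is_cr0_wp() */
DEMO_BUILD_REGS_ACCESSOR(cr4, pae, DEMO_CR4_PAE)	/* demo_is_cr4_pae() */

/* Returns 0 if the generated accessors report the expected bits. */
static inline int demo_check_accessors(void)
{
	const struct demo_role_regs regs = { .cr0 = DEMO_CR0_WP, .cr4 = 0 };

	if (!demo_is_cr0_wp(&regs))
		return 1;
	if (demo_is_cr4_pae(&regs))
		return 2;
	return 0;
}
```

Because the generated functions are static inline, each file that
includes the header gets its own copy and no symbol is exported; in the
patch, __maybe_unused additionally marks accessors that a given file
never calls.
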
 arch/x86/kvm/mmu/mmu.c          | 68 +++++----------------------------
 arch/x86/kvm/mmu/mmu_internal.h | 59 ++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+), 59 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2162dfda9601f..da290bfca0137 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -123,57 +123,9 @@ struct kmem_cache *pte_list_desc_cache;
 struct kmem_cache *mmu_page_header_cache;
 struct percpu_counter kvm_total_used_mmu_pages;
 
-struct kvm_mmu_role_regs {
-	const unsigned long cr0;
-	const unsigned long cr4;
-	const u64 efer;
-};
-
 #define CREATE_TRACE_POINTS
 #include "mmutrace.h"
 
-/*
- * Yes, lot's of underscores.  They're a hint that you probably shouldn't be
- * reading from the role_regs.  Once the root_role is constructed, it becomes
- * the single source of truth for the MMU's state.
- */
-#define BUILD_MMU_ROLE_REGS_ACCESSOR(reg, name, flag)			\
-static inline bool __maybe_unused					\
-____is_##reg##_##name(const struct kvm_mmu_role_regs *regs)		\
-{									\
-	return !!(regs->reg & flag);					\
-}
-BUILD_MMU_ROLE_REGS_ACCESSOR(cr0, pg, X86_CR0_PG);
-BUILD_MMU_ROLE_REGS_ACCESSOR(cr0, wp, X86_CR0_WP);
-BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, pse, X86_CR4_PSE);
-BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, pae, X86_CR4_PAE);
-BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, smep, X86_CR4_SMEP);
-BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, smap, X86_CR4_SMAP);
-BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, pke, X86_CR4_PKE);
-BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, la57, X86_CR4_LA57);
-BUILD_MMU_ROLE_REGS_ACCESSOR(efer, nx, EFER_NX);
-BUILD_MMU_ROLE_REGS_ACCESSOR(efer, lma, EFER_LMA);
-
-/*
- * The MMU itself (with a valid role) is the single source of truth for the
- * MMU.  Do not use the regs used to build the MMU/role, nor the vCPU.  The
- * regs don't account for dependencies, e.g. clearing CR4 bits if CR0.PG=1,
- * and the vCPU may be incorrect/irrelevant.
- */
-#define BUILD_MMU_ROLE_ACCESSOR(base_or_ext, reg, name)		\
-static inline bool __maybe_unused is_##reg##_##name(struct kvm_mmu *mmu)	\
-{								\
-	return !!(mmu->cpu_role. base_or_ext . reg##_##name);	\
-}
-BUILD_MMU_ROLE_ACCESSOR(base, cr0, wp);
-BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, pse);
-BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, smep);
-BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, smap);
-BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, pke);
-BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, la57);
-BUILD_MMU_ROLE_ACCESSOR(base, efer, nx);
-BUILD_MMU_ROLE_ACCESSOR(ext,  efer, lma);
-
 static inline bool is_cr0_pg(struct kvm_mmu *mmu)
 {
         return mmu->cpu_role.base.level > 0;
@@ -218,7 +170,7 @@ void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
 	kvm_flush_remote_tlbs_with_range(kvm, &range);
 }
 
-static gfn_t get_mmio_spte_gfn(u64 spte)
+gfn_t get_mmio_spte_gfn(u64 spte)
 {
 	u64 gpa = spte & shadow_nonpresent_or_rsvd_lower_gfn_mask;
 
@@ -287,7 +239,7 @@ void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
 	}
 }
 
-static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
+int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
 {
 	int r;
 
@@ -828,9 +780,8 @@ static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fa
 	return -EFAULT;
 }
 
-static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
-				   struct kvm_page_fault *fault,
-				   unsigned int access)
+int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
+			    unsigned int access)
 {
 	gva_t gva = fault->is_tdp ? 0 : fault->addr;
 
@@ -1284,8 +1235,8 @@ static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct)
 	return RET_PF_RETRY;
 }
 
-static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
-					 struct kvm_page_fault *fault)
+bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
+				  struct kvm_page_fault *fault)
 {
 	if (unlikely(fault->rsvd))
 		return false;
@@ -1408,8 +1359,8 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	return RET_PF_CONTINUE;
 }
 
-static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
-			   unsigned int access)
+int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
+		    unsigned int access)
 {
 	int ret;
 
@@ -1433,8 +1384,7 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
  * Returns true if the page fault is stale and needs to be retried, i.e. if the
  * root was invalidated by a memslot update or a relevant mmu_notifier fired.
  */
-static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
-				struct kvm_page_fault *fault)
+bool is_page_fault_stale(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.mmu->root.hpa);
 
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 9c1399762496b..349d4a300ad34 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -347,6 +347,65 @@ bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp);
 void walk_shadow_page_lockless_begin(struct kvm_vcpu *vcpu);
 void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu);
 
+int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect);
 bool need_topup_split_caches_or_resched(struct kvm *kvm);
 int topup_split_caches(struct kvm *kvm);
+
+bool is_page_fault_stale(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
+bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
+				  struct kvm_page_fault *fault);
+int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
+		    unsigned int access);
+int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
+			    unsigned int access);
+
+gfn_t get_mmio_spte_gfn(u64 spte);
+
+struct kvm_mmu_role_regs {
+	const unsigned long cr0;
+	const unsigned long cr4;
+	const u64 efer;
+};
+
+/*
+ * Yes, lots of underscores.  They're a hint that you probably shouldn't be
+ * reading from the role_regs.  Once the root_role is constructed, it becomes
+ * the single source of truth for the MMU's state.
+ */
+#define BUILD_MMU_ROLE_REGS_ACCESSOR(reg, name, flag)			\
+static inline bool __maybe_unused					\
+____is_##reg##_##name(const struct kvm_mmu_role_regs *regs)		\
+{									\
+	return !!(regs->reg & flag);					\
+}
+BUILD_MMU_ROLE_REGS_ACCESSOR(cr0, pg, X86_CR0_PG);
+BUILD_MMU_ROLE_REGS_ACCESSOR(cr0, wp, X86_CR0_WP);
+BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, pse, X86_CR4_PSE);
+BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, pae, X86_CR4_PAE);
+BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, smep, X86_CR4_SMEP);
+BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, smap, X86_CR4_SMAP);
+BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, pke, X86_CR4_PKE);
+BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, la57, X86_CR4_LA57);
+BUILD_MMU_ROLE_REGS_ACCESSOR(efer, nx, EFER_NX);
+BUILD_MMU_ROLE_REGS_ACCESSOR(efer, lma, EFER_LMA);
+
+/*
+ * The MMU itself (with a valid role) is the single source of truth for the
+ * MMU.  Do not use the regs used to build the MMU/role, nor the vCPU.  The
+ * regs don't account for dependencies, e.g. clearing CR4 bits if CR0.PG=1,
+ * and the vCPU may be incorrect/irrelevant.
+ */
+#define BUILD_MMU_ROLE_ACCESSOR(base_or_ext, reg, name)		\
+static inline bool __maybe_unused is_##reg##_##name(struct kvm_mmu *mmu)	\
+{								\
+	return !!(mmu->cpu_role. base_or_ext . reg##_##name);	\
+}
+BUILD_MMU_ROLE_ACCESSOR(base, cr0, wp);
+BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, pse);
+BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, smep);
+BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, smap);
+BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, pke);
+BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, la57);
+BUILD_MMU_ROLE_ACCESSOR(base, efer, nx);
+BUILD_MMU_ROLE_ACCESSOR(ext,  efer, lma);
 #endif /* __KVM_X86_MMU_INTERNAL_H */

From patchwork Thu Feb  2 18:27:57 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ben Gardon <bgardon@google.com>
X-Patchwork-Id: 52124
Date: Thu,  2 Feb 2023 18:27:57 +0000
In-Reply-To: <20230202182809.1929122-1-bgardon@google.com>
Mime-Version: 1.0
References: <20230202182809.1929122-1-bgardon@google.com>
X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog
Message-ID: <20230202182809.1929122-10-bgardon@google.com>
Subject: [PATCH 09/21] KVM: x86/MMU: Move paging_tmpl.h includes to
 shadow_mmu.c
From: Ben Gardon <bgardon@google.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
        Sean Christopherson <seanjc@google.com>,
        David Matlack <dmatlack@google.com>,
        Vipin Sharma <vipinsh@google.com>,
        Ricardo Koller <ricarkol@google.com>,
        Ben Gardon <bgardon@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Move the #include sites for paging_tmpl.h to shadow_mmu.c since
paging_tmpl.h is ostensibly part of the Shadow MMU. This requires making
some of the template's definitions non-static and declaring the
macro-expanded function names (paging32_*, paging64_* and ept_*) in
shadow_mmu.h, since they are needed for the MMU context callbacks in
mmu.c. This will facilitate cleanups in following commits: many of the
functions currently exposed by shadow_mmu.h are only needed by
paging_tmpl.h, and will no longer need to be exported now that the
template is built as part of shadow_mmu.c.

sync_mmio_spte() is only used by paging_tmpl.h, so move it along with
the includes.

No functional change intended.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
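The reason shadow_mmu.h has to spell out paging32_*, paging64_* and
ept_* by hand is that those names only exist after the preprocessor has
expanded FNAME() in each inclusion of the template. The sketch below
illustrates that name generation in a single file. It is a simplified
stand-in under stated assumptions: the real template is a header
included once per PTTYPE, whereas here a hypothetical DEMO_TEMPLATE()
macro plays that role, and all demo* names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* What a shared header would carry: prototypes for the expanded names. */
unsigned int demo32_pte_size(void);
unsigned int demo64_pte_size(void);

/*
 * Stand-in for the template body. A macro is used here only so the
 * sketch fits in one file; it is not how paging_tmpl.h is structured.
 */
#define DEMO_FNAME(prefix, name) prefix##_##name
#define DEMO_TEMPLATE(prefix, pte_type)					\
unsigned int DEMO_FNAME(prefix, pte_size)(void)				\
{									\
	return (unsigned int)sizeof(pte_type);				\
}

/* Two instantiations, each defining a non-static function. */
DEMO_TEMPLATE(demo32, uint32_t)
DEMO_TEMPLATE(demo64, uint64_t)

/* A caller in another file would fill its callback table like this. */
struct demo_ops {
	unsigned int (*pte_size)(void);
};

static const struct demo_ops demo_ops_32 = { .pte_size = demo32_pte_size };
static const struct demo_ops demo_ops_64 = { .pte_size = demo64_pte_size };
```

If the generated functions stayed static, the callback tables could only
be filled in from the file that instantiates the template, which is the
constraint this patch lifts for mmu.c.
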
 arch/x86/kvm/mmu/mmu.c         | 29 -----------------------------
 arch/x86/kvm/mmu/paging_tmpl.h | 11 +++++------
 arch/x86/kvm/mmu/shadow_mmu.c  | 31 +++++++++++++++++++++++++++++++
 arch/x86/kvm/mmu/shadow_mmu.h  | 25 ++++++++++++++++++++++++-
 4 files changed, 60 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index da290bfca0137..cef481a17a519 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1697,35 +1697,6 @@ static unsigned long get_cr3(struct kvm_vcpu *vcpu)
 	return kvm_read_cr3(vcpu);
 }
 
-static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
-			   unsigned int access)
-{
-	if (unlikely(is_mmio_spte(*sptep))) {
-		if (gfn != get_mmio_spte_gfn(*sptep)) {
-			mmu_spte_clear_no_track(sptep);
-			return true;
-		}
-
-		mark_mmio_spte(vcpu, sptep, gfn, access);
-		return true;
-	}
-
-	return false;
-}
-
-#define PTTYPE_EPT 18 /* arbitrary */
-#define PTTYPE PTTYPE_EPT
-#include "paging_tmpl.h"
-#undef PTTYPE
-
-#define PTTYPE 64
-#include "paging_tmpl.h"
-#undef PTTYPE
-
-#define PTTYPE 32
-#include "paging_tmpl.h"
-#undef PTTYPE
-
 static void __reset_rsvds_bits_mask(struct rsvd_bits_validate *rsvd_check,
 				    u64 pa_bits_rsvd, int level, bool nx,
 				    bool gbpages, bool pse, bool amd)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 730b413eebfde..1251357794538 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -787,7 +787,7 @@ FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu,
  *  Returns: 1 if we need to emulate the instruction, 0 otherwise, or
  *           a negative value on error.
  */
-static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
+int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	struct guest_walker walker;
 	int r;
@@ -889,7 +889,7 @@ static gpa_t FNAME(get_level1_sp_gpa)(struct kvm_mmu_page *sp)
 	return gfn_to_gpa(sp->gfn) + offset * sizeof(pt_element_t);
 }
 
-static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
+void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
 {
 	struct kvm_shadow_walk_iterator iterator;
 	struct kvm_mmu_page *sp;
@@ -949,9 +949,8 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
 }
 
 /* Note, @addr is a GPA when gva_to_gpa() translates an L2 GPA to an L1 GPA. */
-static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
-			       gpa_t addr, u64 access,
-			       struct x86_exception *exception)
+gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, gpa_t addr,
+			u64 access, struct x86_exception *exception)
 {
 	struct guest_walker walker;
 	gpa_t gpa = INVALID_GPA;
@@ -984,7 +983,7 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
  *   0: the sp is synced and no tlb flushing is required
  * > 0: the sp is synced and tlb flushing is required
  */
-static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 {
 	union kvm_mmu_page_role root_role = vcpu->arch.mmu->root_role;
 	int i;
diff --git a/arch/x86/kvm/mmu/shadow_mmu.c b/arch/x86/kvm/mmu/shadow_mmu.c
index f3e2ed5b675eb..c7cfdc6f51b53 100644
--- a/arch/x86/kvm/mmu/shadow_mmu.c
+++ b/arch/x86/kvm/mmu/shadow_mmu.c
@@ -12,6 +12,8 @@
  *   Yaniv Kamay  <yaniv@qumranet.com>
  *   Avi Kivity   <avi@qumranet.com>
  */
+
+#include "ioapic.h"
 #include "mmu.h"
 #include "mmu_internal.h"
 #include "mmutrace.h"
@@ -2809,6 +2811,35 @@ void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr)
 	walk_shadow_page_lockless_end(vcpu);
 }
 
+static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
+			   unsigned int access)
+{
+	if (unlikely(is_mmio_spte(*sptep))) {
+		if (gfn != get_mmio_spte_gfn(*sptep)) {
+			mmu_spte_clear_no_track(sptep);
+			return true;
+		}
+
+		mark_mmio_spte(vcpu, sptep, gfn, access);
+		return true;
+	}
+
+	return false;
+}
+
+#define PTTYPE_EPT 18 /* arbitrary */
+#define PTTYPE PTTYPE_EPT
+#include "paging_tmpl.h"
+#undef PTTYPE
+
+#define PTTYPE 64
+#include "paging_tmpl.h"
+#undef PTTYPE
+
+#define PTTYPE 32
+#include "paging_tmpl.h"
+#undef PTTYPE
+
 static bool is_obsolete_root(struct kvm *kvm, hpa_t root_hpa)
 {
 	struct kvm_mmu_page *sp;
diff --git a/arch/x86/kvm/mmu/shadow_mmu.h b/arch/x86/kvm/mmu/shadow_mmu.h
index 4534eadc9a17c..7faf8b06e68f1 100644
--- a/arch/x86/kvm/mmu/shadow_mmu.h
+++ b/arch/x86/kvm/mmu/shadow_mmu.h
@@ -86,7 +86,6 @@ bool kvm_test_age_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 		       int level, pte_t unused);
 
 void drop_parent_pte(struct kvm_mmu_page *sp, u64 *parent_pte);
-int nonpaging_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp);
 int mmu_sync_children(struct kvm_vcpu *vcpu, struct kvm_mmu_page *parent,
 		      bool can_yield);
 void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp);
@@ -163,4 +162,28 @@ void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
 				    const struct kvm_memory_slot *slot);
 
 unsigned long mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc);
+
+/* Exports from paging_tmpl.h */
+gpa_t paging32_gva_to_gpa(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+			  gpa_t vaddr, u64 access,
+			  struct x86_exception *exception);
+gpa_t paging64_gva_to_gpa(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+			  gpa_t vaddr, u64 access,
+			  struct x86_exception *exception);
+gpa_t ept_gva_to_gpa(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, gpa_t vaddr,
+		     u64 access, struct x86_exception *exception);
+
+int paging32_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
+int paging64_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
+int ept_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
+
+int paging32_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp);
+int paging64_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp);
+int ept_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp);
+/* Defined in shadow_mmu.c. */
+int nonpaging_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp);
+
+void paging32_invlpg(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root);
+void paging64_invlpg(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root);
+void ept_invlpg(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root);
 #endif /* __KVM_X86_MMU_SHADOW_MMU_H */

From patchwork Thu Feb  2 18:27:58 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ben Gardon <bgardon@google.com>
X-Patchwork-Id: 52129
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=MglY/RjpYcqUADFANaRWEQQDd9cl7EMg6LVL3/WV9Zo=;
        b=PnUKay1IZj95QB6+OCa89Nu7SKUG4PXK9M9N56Ez/hDS/DkMKQChuX/zpQ+LBk/3yi
         TKjRDdjkSHcfHej5A/WDWz41y4pOpl0eHKg6HXGzddNG4DAsUgyC3AmNOx8Tzc6tnbrF
         MsXnhWT+jJM8XoIfHhmgjAbEOduXrmBsfukeasKfxYszj0+ucVL1lIuON9xNf6rmZcnO
         WKcCTbLxtTE+QVoazNp9YK6Maw1TCo576hnKdzOSGb0yXURJS8a0UW8lefG82MLl9C1U
         jY+UJkj0/wwuLfO+C6nWy8J1HTnqz840YqlEf4xqtgM7VDnNfKqg18uugg9UrvWf6WYk
         wLLw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=MglY/RjpYcqUADFANaRWEQQDd9cl7EMg6LVL3/WV9Zo=;
        b=mHhRzAOa9d+/MxkYvzZP0cCAmN6FZc4LJ8SIL8lD2i8RC59YnMNuRqv4z1SpZY46EJ
         BMFbroQFIOv+xyWxgl4qMG0/AwFGUzVfULHCplKETPrdqFrzRYyFFLjsfwOBhYaFtKn/
         6cyKWRR0khtD06OYDIiMVHRy8DEb6+U2JkblLhu8g8koJpXD+LMDWWomcIOK6zwA/azF
         7tDju8+JDj6BVSOAtPqN+A9/s0adxe4hDtqC7/s+Yk5eUNXFrvH6uV55GoaSf8yIAYXR
         sbGv3easUIXPF7Iv5MtdhUF60kGMNPxfY9WmIZJl83nH0fE9HJ4ofJD2TXNiVbFx+7pp
         PzRQ==
X-Gm-Message-State: AO0yUKURm/fKjcUUAhodWBFgpnrjCQbd7Gw+Bf402Gw5Fs49AJWRkrh7
        +qYEJcZwfe4mmKu7PE4Ze+qrfSWd+NiiGL/k8rqjWdZfddwGhLS2h+6BlYzaOCe1ZWEhVBEDaLT
        zHrFTmw3jRH+KvkINfUN5UHdMvwlXcCnM20aDPFmfbQAfxMePtc30y68edVZirSgD0i0Q/Mqi
X-Received: from sweer.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:e45])
 (user=bgardon job=sendgmr) by 2002:a17:90a:600d:b0:229:2296:4be3 with SMTP id
 y13-20020a17090a600d00b0022922964be3mr771911pji.5.1675362507870; Thu, 02 Feb
 2023 10:28:27 -0800 (PST)
Date: Thu,  2 Feb 2023 18:27:58 +0000
In-Reply-To: <20230202182809.1929122-1-bgardon@google.com>
Mime-Version: 1.0
References: <20230202182809.1929122-1-bgardon@google.com>
X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog
Message-ID: <20230202182809.1929122-11-bgardon@google.com>
Subject: [PATCH 10/21] KVM: x86/MMU: Clean up Shadow MMU exports
From: Ben Gardon <bgardon@google.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
        Sean Christopherson <seanjc@google.com>,
        David Matlack <dmatlack@google.com>,
        Vipin Sharma <vipinsh@google.com>,
        Ricardo Koller <ricarkol@google.com>,
        Ben Gardon <bgardon@google.com>
X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED,
        DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,
        SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=unavailable
        autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1756745069145286756?=
X-GMAIL-MSGID: =?utf-8?q?1756745069145286756?=

Now that paging_tmpl.h is included from shadow_mmu.c, there's no need to
export many of the functions currently in shadow_mmu.h, so remove those
exports and mark the functions static. This cleans up the interface
of the Shadow MMU, and will allow the implementation to keep the details
of rmap_heads internal.

No functional change intended.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/shadow_mmu.c | 78 +++++++++++++++++++++--------------
 arch/x86/kvm/mmu/shadow_mmu.h | 51 +----------------------
 2 files changed, 48 insertions(+), 81 deletions(-)
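
Note for readers: the pattern this patch applies can be sketched outside
the kernel as below. This is a minimal userspace illustration, not code
from the series. The names walk_iterator, walk_init, walk_okay, walk_next,
for_each_level and sum_levels are all hypothetical stand-ins. The iterator
struct, its helpers and the loop macro live in one .c file with internal
linkage, and only one function is visible to other translation units.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for kvm_shadow_walk_iterator; private to this file. */
struct walk_iterator {
	const unsigned long *table;	/* one entry per level, index 0 unused */
	int level;			/* current level, counts down to 1 */
	unsigned long entry;		/* entry read at the current level */
};

/* static: none of these helpers is visible outside this translation unit. */
static void walk_init(struct walk_iterator *it, const unsigned long *table,
		      int top_level)
{
	assert(top_level >= 1);
	it->table = table;
	it->level = top_level;
	it->entry = 0;
}

static bool walk_okay(struct walk_iterator *it)
{
	if (it->level < 1)
		return false;
	it->entry = it->table[it->level];
	return true;
}

static void walk_next(struct walk_iterator *it)
{
	--it->level;
}

/* Same shape as for_each_shadow_entry(): init, test, advance. */
#define for_each_level(_table, _top, _walker)		\
	for (walk_init(&(_walker), (_table), (_top));	\
	     walk_okay(&(_walker));			\
	     walk_next(&(_walker)))

/* The only symbol with external linkage; callers never see the iterator. */
unsigned long sum_levels(const unsigned long *table, int top_level)
{
	struct walk_iterator it;
	unsigned long sum = 0;

	for_each_level(table, top_level, it)
		sum += it.entry;

	return sum;
}
```

Because the struct and macro are defined in the .c file rather than the
header, their layout can change later without touching any other file,
which is the same property the commit message wants for rmap_heads.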

diff --git a/arch/x86/kvm/mmu/shadow_mmu.c b/arch/x86/kvm/mmu/shadow_mmu.c
index c7cfdc6f51b53..1be680bce15a6 100644
--- a/arch/x86/kvm/mmu/shadow_mmu.c
+++ b/arch/x86/kvm/mmu/shadow_mmu.c
@@ -24,6 +24,20 @@
 #include <asm/cmpxchg.h>
 #include <trace/events/kvm.h>
 
+struct kvm_shadow_walk_iterator {
+	u64 addr;
+	hpa_t shadow_addr;
+	u64 *sptep;
+	int level;
+	unsigned index;
+};
+
+#define for_each_shadow_entry_using_root(_vcpu, _root, _addr, _walker)     \
+	for (shadow_walk_init_using_root(&(_walker), (_vcpu),              \
+					 (_root), (_addr));                \
+	     shadow_walk_okay(&(_walker));			           \
+	     shadow_walk_next(&(_walker)))
+
 #define for_each_shadow_entry(_vcpu, _addr, _walker)            \
 	for (shadow_walk_init(&(_walker), _vcpu, _addr);	\
 	     shadow_walk_okay(&(_walker));			\
@@ -230,7 +244,7 @@ static u64 mmu_spte_update_no_track(u64 *sptep, u64 new_spte)
  *
  * Returns true if the TLB needs to be flushed
  */
-bool mmu_spte_update(u64 *sptep, u64 new_spte)
+static bool mmu_spte_update(u64 *sptep, u64 new_spte)
 {
 	bool flush = false;
 	u64 old_spte = mmu_spte_update_no_track(sptep, new_spte);
@@ -314,7 +328,7 @@ static u64 mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep)
  * Directly clear spte without caring the state bits of sptep,
  * it is used to set the upper level spte.
  */
-void mmu_spte_clear_no_track(u64 *sptep)
+static void mmu_spte_clear_no_track(u64 *sptep)
 {
 	__update_clear_spte_fast(sptep, 0ull);
 }
@@ -357,7 +371,7 @@ static void mmu_free_pte_list_desc(struct pte_list_desc *pte_list_desc)
 
 static bool sp_has_gptes(struct kvm_mmu_page *sp);
 
-gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
+static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
 {
 	if (sp->role.passthrough)
 		return sp->gfn;
@@ -413,8 +427,8 @@ static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
 	          sp->gfn, kvm_mmu_page_get_gfn(sp, index), gfn);
 }
 
-void kvm_mmu_page_set_access(struct kvm_mmu_page *sp, int index,
-			     unsigned int access)
+static void kvm_mmu_page_set_access(struct kvm_mmu_page *sp, int index,
+				    unsigned int access)
 {
 	gfn_t gfn = kvm_mmu_page_get_gfn(sp, index);
 
@@ -629,7 +643,7 @@ struct kvm_rmap_head *gfn_to_rmap(gfn_t gfn, int level,
 	return &slot->arch.rmap[level - PG_LEVEL_4K][idx];
 }
 
-bool rmap_can_add(struct kvm_vcpu *vcpu)
+static bool rmap_can_add(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu_memory_cache *mc;
 
@@ -737,7 +751,7 @@ static u64 *rmap_get_next(struct rmap_iterator *iter)
 	for (_spte_ = rmap_get_first(_rmap_head_, _iter_);		\
 	     _spte_; _spte_ = rmap_get_next(_iter_))
 
-void drop_spte(struct kvm *kvm, u64 *sptep)
+static void drop_spte(struct kvm *kvm, u64 *sptep)
 {
 	u64 old_spte = mmu_spte_clear_track_bits(kvm, sptep);
 
@@ -1114,7 +1128,7 @@ static void mmu_page_remove_parent_pte(struct kvm_mmu_page *sp,
 	pte_list_remove(parent_pte, &sp->parent_ptes);
 }
 
-void drop_parent_pte(struct kvm_mmu_page *sp, u64 *parent_pte)
+static void drop_parent_pte(struct kvm_mmu_page *sp, u64 *parent_pte)
 {
 	mmu_page_remove_parent_pte(sp, parent_pte);
 	mmu_spte_clear_no_track(parent_pte);
@@ -1344,8 +1358,8 @@ static void mmu_pages_clear_parents(struct mmu_page_path *parents)
 	} while (!sp->unsync_children);
 }
 
-int mmu_sync_children(struct kvm_vcpu *vcpu, struct kvm_mmu_page *parent,
-		      bool can_yield)
+static int mmu_sync_children(struct kvm_vcpu *vcpu, struct kvm_mmu_page *parent,
+			     bool can_yield)
 {
 	int i;
 	struct kvm_mmu_page *sp;
@@ -1391,7 +1405,7 @@ void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
 	atomic_set(&sp->write_flooding_count,  0);
 }
 
-void clear_sp_write_flooding_count(u64 *spte)
+static void clear_sp_write_flooding_count(u64 *spte)
 {
 	__clear_sp_write_flooding_count(sptep_to_sp(spte));
 }
@@ -1604,9 +1618,9 @@ static union kvm_mmu_page_role kvm_mmu_child_role(u64 *sptep, bool direct,
 	return role;
 }
 
-struct kvm_mmu_page *kvm_mmu_get_child_sp(struct kvm_vcpu *vcpu, u64 *sptep,
-					  gfn_t gfn, bool direct,
-					  unsigned int access)
+static struct kvm_mmu_page *kvm_mmu_get_child_sp(struct kvm_vcpu *vcpu,
+						 u64 *sptep, gfn_t gfn,
+						 bool direct, unsigned int access)
 {
 	union kvm_mmu_page_role role;
 
@@ -1617,8 +1631,9 @@ struct kvm_mmu_page *kvm_mmu_get_child_sp(struct kvm_vcpu *vcpu, u64 *sptep,
 	return kvm_mmu_get_shadow_page(vcpu, gfn, role);
 }
 
-void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterator,
-				 struct kvm_vcpu *vcpu, hpa_t root, u64 addr)
+static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterator,
+					struct kvm_vcpu *vcpu, hpa_t root,
+					u64 addr)
 {
 	iterator->addr = addr;
 	iterator->shadow_addr = root;
@@ -1645,14 +1660,14 @@ void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterator,
 	}
 }
 
-void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator,
-		      struct kvm_vcpu *vcpu, u64 addr)
+static void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator,
+			     struct kvm_vcpu *vcpu, u64 addr)
 {
 	shadow_walk_init_using_root(iterator, vcpu, vcpu->arch.mmu->root.hpa,
 				    addr);
 }
 
-bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
+static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
 {
 	if (iterator->level < PG_LEVEL_4K)
 		return false;
@@ -1674,7 +1689,7 @@ static void __shadow_walk_next(struct kvm_shadow_walk_iterator *iterator,
 	--iterator->level;
 }
 
-void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator)
+static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator)
 {
 	__shadow_walk_next(iterator, *iterator->sptep);
 }
@@ -1714,13 +1729,14 @@ static void __link_shadow_page(struct kvm *kvm,
 		mark_unsync(sptep);
 }
 
-void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep, struct kvm_mmu_page *sp)
+static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
+			     struct kvm_mmu_page *sp)
 {
 	__link_shadow_page(vcpu->kvm, &vcpu->arch.mmu_pte_list_desc_cache, sptep, sp, true);
 }
 
-void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
-			  unsigned direct_access)
+static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
+				 unsigned direct_access)
 {
 	if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) {
 		struct kvm_mmu_page *child;
@@ -1742,8 +1758,8 @@ void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 }
 
 /* Returns the number of zapped non-leaf child shadow pages. */
-int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp, u64 *spte,
-		     struct list_head *invalid_list)
+static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp, u64 *spte,
+			    struct list_head *invalid_list)
 {
 	u64 pte;
 	struct kvm_mmu_page *child;
@@ -2156,9 +2172,9 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
 	return 0;
 }
 
-int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
-		 u64 *sptep, unsigned int pte_access, gfn_t gfn,
-		 kvm_pfn_t pfn, struct kvm_page_fault *fault)
+static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
+			u64 *sptep, unsigned int pte_access, gfn_t gfn,
+			kvm_pfn_t pfn, struct kvm_page_fault *fault)
 {
 	struct kvm_mmu_page *sp = sptep_to_sp(sptep);
 	int level = sp->role.level;
@@ -2263,8 +2279,8 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
 	return 0;
 }
 
-void __direct_pte_prefetch(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
-			   u64 *sptep)
+static void __direct_pte_prefetch(struct kvm_vcpu *vcpu,
+				  struct kvm_mmu_page *sp, u64 *sptep)
 {
 	u64 *spte, *start = NULL;
 	int i;
@@ -2800,7 +2816,7 @@ int get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, int *root_level)
 	return leaf;
 }
 
-void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr)
+static void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr)
 {
 	struct kvm_shadow_walk_iterator iterator;
 	u64 spte;
diff --git a/arch/x86/kvm/mmu/shadow_mmu.h b/arch/x86/kvm/mmu/shadow_mmu.h
index 7faf8b06e68f1..9f16c4782bfbf 100644
--- a/arch/x86/kvm/mmu/shadow_mmu.h
+++ b/arch/x86/kvm/mmu/shadow_mmu.h
@@ -36,32 +36,11 @@ struct pte_list_desc {
 	u64 *sptes[PTE_LIST_EXT];
 };
 
+/* Only exported for debugfs.c. */
 unsigned int pte_list_count(struct kvm_rmap_head *rmap_head);
 
-struct kvm_shadow_walk_iterator {
-	u64 addr;
-	hpa_t shadow_addr;
-	u64 *sptep;
-	int level;
-	unsigned index;
-};
-
-#define for_each_shadow_entry_using_root(_vcpu, _root, _addr, _walker)     \
-	for (shadow_walk_init_using_root(&(_walker), (_vcpu),              \
-					 (_root), (_addr));                \
-	     shadow_walk_okay(&(_walker));			           \
-	     shadow_walk_next(&(_walker)))
-
-bool mmu_spte_update(u64 *sptep, u64 new_spte);
-void mmu_spte_clear_no_track(u64 *sptep);
-gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index);
-void kvm_mmu_page_set_access(struct kvm_mmu_page *sp, int index,
-			     unsigned int access);
-
 struct kvm_rmap_head *gfn_to_rmap(gfn_t gfn, int level,
 				  const struct kvm_memory_slot *slot);
-bool rmap_can_add(struct kvm_vcpu *vcpu);
-void drop_spte(struct kvm *kvm, u64 *sptep);
 bool rmap_write_protect(struct kvm_rmap_head *rmap_head, bool pt_protect);
 bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 			const struct kvm_memory_slot *slot);
@@ -85,30 +64,8 @@ bool kvm_test_age_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 		       struct kvm_memory_slot *slot, gfn_t gfn,
 		       int level, pte_t unused);
 
-void drop_parent_pte(struct kvm_mmu_page *sp, u64 *parent_pte);
-int mmu_sync_children(struct kvm_vcpu *vcpu, struct kvm_mmu_page *parent,
-		      bool can_yield);
 void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp);
-void clear_sp_write_flooding_count(u64 *spte);
-
-struct kvm_mmu_page *kvm_mmu_get_child_sp(struct kvm_vcpu *vcpu, u64 *sptep,
-					  gfn_t gfn, bool direct,
-					  unsigned int access);
-
-void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterator,
-				 struct kvm_vcpu *vcpu, hpa_t root, u64 addr);
-void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator,
-		      struct kvm_vcpu *vcpu, u64 addr);
-bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator);
-void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator);
-
-void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep, struct kvm_mmu_page *sp);
-
-void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
-			  unsigned direct_access);
 
-int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp, u64 *spte,
-		     struct list_head *invalid_list);
 bool __kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
 				struct list_head *invalid_list,
 				int *nr_zapped);
@@ -120,11 +77,6 @@ int make_mmu_pages_available(struct kvm_vcpu *vcpu);
 
 int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva);
 
-int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
-		 u64 *sptep, unsigned int pte_access, gfn_t gfn,
-		 kvm_pfn_t pfn, struct kvm_page_fault *fault);
-void __direct_pte_prefetch(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
-			   u64 *sptep);
 int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
 u64 *fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, gpa_t gpa, u64 *spte);
 
@@ -134,7 +86,6 @@ int mmu_alloc_special_roots(struct kvm_vcpu *vcpu);
 
 int get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, int *root_level);
 
-void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr);
 void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 		       int bytes, struct kvm_page_track_notifier_node *node);
 

From patchwork Thu Feb  2 18:27:59 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ben Gardon <bgardon@google.com>
X-Patchwork-Id: 52127
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp401217wrn;
        Thu, 2 Feb 2023 10:30:49 -0800 (PST)
X-Google-Smtp-Source: 
 AK7set/7rRjYNkoKwNKQh4z0UD3EW9CyRVLhhDnB4Mzicg3/asTIO6zh6tAsBEeOiL6KUXIhPtW0
X-Received: by 2002:a17:902:e285:b0:196:5f76:4c5 with SMTP id
 o5-20020a170902e28500b001965f7604c5mr6173094plc.61.1675362649344;
        Thu, 02 Feb 2023 10:30:49 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1675362649; cv=none;
        d=google.com; s=arc-20160816;
        b=C1bKLh4OhOdA24i2LSkgKX7ow7BJ5NSRMdI5Aa4hqpVmllosdd2hOBRm1nhgEouowt
         c+QkTpCKCs+YoqdUIDmh+3zZ7hJKqk9JPkloEj4sE/ibUeSFzvpBswO0oIwxbW0ePrkn
         iKpA3EQbUWPMc8rvDBPVEDkOPNmyiLTCtwkbNKkNCKdQpA+xqqOGvS/9OTpOREw98KyL
         NWXmfjbqFZE+8Y0pg/I+k7HxMFB3CmC7Leqa07QS4CI5ef/ZfgnjcrUJDTuB9xgO4CBP
         zvf8NmTc8ajvUrgULACJoIf/bfBeiwpXxu9bx/MuWyuCsHn0giveBpQaQA20uGz+sqSj
         4d2w==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:cc:to:from:subject:message-id:references
         :mime-version:in-reply-to:date:dkim-signature;
        bh=QZ1HEwEMVxqlhDuM7XEGMF2OMxJrvw5/FJ7SR+T/ad8=;
        b=sXceo7Slklgx0XORElLq2/WH3gDdrSwFnB6HRWXPAEIMEDrNPP1dajdwleWcs3QOTs
         Os3gsDrVAojTm2W8rSx9z1BXjQSr3raYhtwQLKuBAmctgjsyDPhp7rpg4dwisO3NOkRa
         xxKKok0I0rARuP9p6r/bn+/Gm6SroIYGI0F7/o+U3Zs43biTZGr6JDpT8DImbqwwOxVQ
         0XYZmhKpai9t/kJJAqhAt5vUdGhCqJLodBuepC8vQUMIAawKAaWWv6HbFodmXzjKUQT5
         uZtmpKB8tkaXmvEGDe15ZnAeESSaZounsHHPEH+kXEBU+IHqIl1y+aJ4STHIdTh15bEO
         1pww==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@google.com header.s=20210112 header.b=OHVXVGpC;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 ba10-20020a170902720a00b00196189e1695si15220750plb.117.2023.02.02.10.30.36;
        Thu, 02 Feb 2023 10:30:49 -0800 (PST)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@google.com header.s=20210112 header.b=OHVXVGpC;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S232603AbjBBS35 (ORCPT <rfc822;il.mystafa@gmail.com> + 99 others);
        Thu, 2 Feb 2023 13:29:57 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60752 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S232633AbjBBS3Y (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 2 Feb 2023 13:29:24 -0500
Received: from mail-pl1-x649.google.com (mail-pl1-x649.google.com
 [IPv6:2607:f8b0:4864:20::649])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7A62D3F285
        for <linux-kernel@vger.kernel.org>;
 Thu,  2 Feb 2023 10:28:40 -0800 (PST)
Received: by mail-pl1-x649.google.com with SMTP id
 u6-20020a170903124600b00188cd4769bcso1305325plh.0
        for <linux-kernel@vger.kernel.org>;
 Thu, 02 Feb 2023 10:28:40 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=QZ1HEwEMVxqlhDuM7XEGMF2OMxJrvw5/FJ7SR+T/ad8=;
        b=OHVXVGpCVzM0EbGrsrKhehaH5ULWml3IrlfPY6Jo4UAMn5inCsOWXYB1cvUeJNtXdA
         UfS9yb/fZXTG28yX9xD6JZmcXLur/xzJBjrs0e+fhd2ilyDa9RMkIxY603RVrBtLV+81
         CsRlC2gYChVYF0Rqbx2LMh2dByZ2Vg8SwasajDI69Dnpd6uStiFbqZlheIBgT4E+ySl7
         MrK2/2OBoUebq36L17SJcVofmFayxKfwx/AfF7nLa+/xgF5xEMAUQShZnSLGSXWYBkAX
         5RSISurkxzM20nJtaIAZEre9jUzF7lIW6r8H8Ywn7Ssy8Q6SJQpAgYvlorPdxrE/1VeT
         N6KQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=QZ1HEwEMVxqlhDuM7XEGMF2OMxJrvw5/FJ7SR+T/ad8=;
        b=oVJknqdYHkYieuLmirrG6UrXJaJDs6i2D6c0/ggr9f01DVT8yWPjSMYMK4GoTbMNws
         YyINbnuETDvurHeBKWW/9CibQtrQn774IhVVpezJ3krmDubaVoYnf4XJ9M9DVY2VMIIF
         pzUAenLLcUkeIU+ZE50OyeA6G/uMX7Y4uWo9qoUkTEy8cKS8dIbYr6Qid9Szb+SvjvN0
         roG4Nv40qdkYt/xZfHCFzHbSz+2NnSLci1uqKkxfM80UFVXe0PvAneFA2/WBjwp7iexB
         ZnuHpl5kwLDD7xm/Q8Ym99KjmervXNXkYEfkBXt2bUbC0/JMKiT+434bd4E00Ncz4wQa
         V3yQ==
X-Gm-Message-State: AO0yUKUKNFyOYYAO4ZRgEJbwA0daluDte8o2sdxMmRnNplHhsuYMnyjJ
        yfEGJplJB1ksuYVaEoxoTl+KFl2b+Qt1+mxNAc32TNOO2HnNfVfCUUpS3fuROmOtNtnG26Agz1v
        gXPrcyRldlKlPQVY25euTdmBnIVMFQNpYy++3k6YL7qPcTXChfthXt7sg7k3/lWjpQObRJs1V
X-Received: from sweer.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:e45])
 (user=bgardon job=sendgmr) by 2002:a17:90b:946:b0:22c:1bd6:77d5 with SMTP id
 dw6-20020a17090b094600b0022c1bd677d5mr815107pjb.18.1675362509401; Thu, 02 Feb
 2023 10:28:29 -0800 (PST)
Date: Thu,  2 Feb 2023 18:27:59 +0000
In-Reply-To: <20230202182809.1929122-1-bgardon@google.com>
Mime-Version: 1.0
References: <20230202182809.1929122-1-bgardon@google.com>
X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog
Message-ID: <20230202182809.1929122-12-bgardon@google.com>
Subject: [PATCH 11/21] KVM: x86/MMU: Cleanup shrinker interface with Shadow
 MMU
From: Ben Gardon <bgardon@google.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
        Sean Christopherson <seanjc@google.com>,
        David Matlack <dmatlack@google.com>,
        Vipin Sharma <vipinsh@google.com>,
        Ricardo Koller <ricarkol@google.com>,
        Ben Gardon <bgardon@google.com>
X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED,
        DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,
        SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=unavailable
        autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1756745065387596192?=
X-GMAIL-MSGID: =?utf-8?q?1756745065387596192?=

The MMU shrinker currently operates only on the Shadow MMU, but keeping
the entire implementation in shadow_mmu.c is awkward since much of the
function isn't Shadow MMU-specific. There has also been talk of changing
the target of the shrinker to the MMU caches rather than already-allocated
page tables. As a result, it makes sense to move some of the implementation
back to mmu.c.

No functional change intended.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/mmu.c        | 43 ++++++++++++++++++++++++
 arch/x86/kvm/mmu/shadow_mmu.c | 62 ++++++++---------------------------
 arch/x86/kvm/mmu/shadow_mmu.h |  3 +-
 3 files changed, 58 insertions(+), 50 deletions(-)
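
Note for readers: the split this patch makes can be sketched in userspace
as below. This is a simplified model under stated assumptions, not the
kernel code. struct vm, struct vm_list, vm_shrink_scan, move_to_tail and
shrink_scan are hypothetical names, and all locking (kvm_lock, SRCU,
mmu_lock) and the zapped_obsolete_pages path are left out. The generic
half walks the VM list and picks a victim; the per-VM half frees pages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm: only the fields the sketch needs. */
struct vm {
	unsigned long used_pages;	/* models kvm->arch.n_used_mmu_pages */
	struct vm *next;
};

/* Singly linked list standing in for vm_list. */
struct vm_list {
	struct vm *head;
	struct vm *tail;
};

/* Per-VM half: models kvm_shadow_mmu_shrink_scan(), minus the locking. */
static unsigned long vm_shrink_scan(struct vm *vm, unsigned long pages_to_free)
{
	unsigned long freed = vm->used_pages < pages_to_free ?
			      vm->used_pages : pages_to_free;

	vm->used_pages -= freed;
	return freed;
}

/* Unlink vm (whose predecessor is prev, or NULL for the head) and append it. */
static void move_to_tail(struct vm_list *list, struct vm *prev, struct vm *vm)
{
	if (list->tail == vm)
		return;
	if (prev)
		prev->next = vm->next;
	else
		list->head = vm->next;
	vm->next = NULL;
	list->tail->next = vm;
	list->tail = vm;
}

/* Generic half: models mmu_shrink_scan(); shrinks at most one VM per call. */
unsigned long shrink_scan(struct vm_list *list, unsigned long nr_to_scan)
{
	unsigned long budget = nr_to_scan;
	unsigned long freed = 0;
	struct vm *prev = NULL;
	struct vm *vm;

	for (vm = list->head; vm; prev = vm, vm = vm->next) {
		/* Never look at more than nr_to_scan VMs. */
		if (!budget--)
			break;

		/* Skip VMs with nothing to free. */
		if (!vm->used_pages)
			continue;

		freed = vm_shrink_scan(vm, nr_to_scan);

		/* Rotate so the next call starts with a different VM. */
		move_to_tail(list, prev, vm);
		break;
	}

	return freed;
}
```

With this shape, the list walk and victim selection do not need to know
how a VM frees its pages, so the per-VM helper could later be pointed at
a different target without changing the generic loop.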

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index cef481a17a519..3ea54b08239aa 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3145,6 +3145,49 @@ static unsigned long mmu_shrink_count(struct shrinker *shrink,
 	return percpu_counter_read_positive(&kvm_total_used_mmu_pages);
 }
 
+unsigned long mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
+{
+	struct kvm *kvm;
+	int nr_to_scan = sc->nr_to_scan;
+	unsigned long freed = 0;
+
+	mutex_lock(&kvm_lock);
+
+	list_for_each_entry(kvm, &vm_list, vm_list) {
+		/*
+		 * Never scan more than sc->nr_to_scan VM instances.
+		 * Will not hit this condition practically since we do not try
+		 * to shrink more than one VM and it is very unlikely to see
+		 * !n_used_mmu_pages so many times.
+		 */
+		if (!nr_to_scan--)
+			break;
+
+		/*
+		 * n_used_mmu_pages is accessed without holding kvm->mmu_lock
+		 * here. We may skip a VM instance erroneously, but we do not
+		 * want to shrink a VM that only started to populate its MMU
+		 * anyway.
+		 */
+		if (!kvm->arch.n_used_mmu_pages &&
+		    !kvm_shadow_mmu_has_zapped_obsolete_pages(kvm))
+			continue;
+
+		freed = kvm_shadow_mmu_shrink_scan(kvm, sc->nr_to_scan);
+
+		/*
+		 * unfair on small ones
+		 * per-vm shrinkers cry out
+		 * sadness comes quickly
+		 */
+		list_move_tail(&kvm->vm_list, &vm_list);
+		break;
+	}
+
+	mutex_unlock(&kvm_lock);
+	return freed;
+}
+
 static struct shrinker mmu_shrinker = {
 	.count_objects = mmu_shrink_count,
 	.scan_objects = mmu_shrink_scan,
diff --git a/arch/x86/kvm/mmu/shadow_mmu.c b/arch/x86/kvm/mmu/shadow_mmu.c
index 1be680bce15a6..76c50aca3c487 100644
--- a/arch/x86/kvm/mmu/shadow_mmu.c
+++ b/arch/x86/kvm/mmu/shadow_mmu.c
@@ -3160,7 +3160,7 @@ void kvm_zap_obsolete_pages(struct kvm *kvm)
 	kvm_mmu_commit_zap_page(kvm, &kvm->arch.zapped_obsolete_pages);
 }
 
-static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
+bool kvm_shadow_mmu_has_zapped_obsolete_pages(struct kvm *kvm)
 {
 	return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
 }
@@ -3429,60 +3429,24 @@ void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
 		kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
 }
 
-unsigned long mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
+unsigned long kvm_shadow_mmu_shrink_scan(struct kvm *kvm, int pages_to_free)
 {
-	struct kvm *kvm;
-	int nr_to_scan = sc->nr_to_scan;
 	unsigned long freed = 0;
+	int idx;
 
-	mutex_lock(&kvm_lock);
-
-	list_for_each_entry(kvm, &vm_list, vm_list) {
-		int idx;
-		LIST_HEAD(invalid_list);
-
-		/*
-		 * Never scan more than sc->nr_to_scan VM instances.
-		 * Will not hit this condition practically since we do not try
-		 * to shrink more than one VM and it is very unlikely to see
-		 * !n_used_mmu_pages so many times.
-		 */
-		if (!nr_to_scan--)
-			break;
-		/*
-		 * n_used_mmu_pages is accessed without holding kvm->mmu_lock
-		 * here. We may skip a VM instance errorneosly, but we do not
-		 * want to shrink a VM that only started to populate its MMU
-		 * anyway.
-		 */
-		if (!kvm->arch.n_used_mmu_pages &&
-		    !kvm_has_zapped_obsolete_pages(kvm))
-			continue;
-
-		idx = srcu_read_lock(&kvm->srcu);
-		write_lock(&kvm->mmu_lock);
-
-		if (kvm_has_zapped_obsolete_pages(kvm)) {
-			kvm_mmu_commit_zap_page(kvm,
-			      &kvm->arch.zapped_obsolete_pages);
-			goto unlock;
-		}
+	idx = srcu_read_lock(&kvm->srcu);
+	write_lock(&kvm->mmu_lock);
 
-		freed = kvm_mmu_zap_oldest_mmu_pages(kvm, sc->nr_to_scan);
+	if (kvm_shadow_mmu_has_zapped_obsolete_pages(kvm)) {
+		kvm_mmu_commit_zap_page(kvm, &kvm->arch.zapped_obsolete_pages);
+		goto out;
+	}
 
-unlock:
-		write_unlock(&kvm->mmu_lock);
-		srcu_read_unlock(&kvm->srcu, idx);
+	freed = kvm_mmu_zap_oldest_mmu_pages(kvm, pages_to_free);
 
-		/*
-		 * unfair on small ones
-		 * per-vm shrinkers cry out
-		 * sadness comes quickly
-		 */
-		list_move_tail(&kvm->vm_list, &vm_list);
-		break;
-	}
+out:
+	write_unlock(&kvm->mmu_lock);
+	srcu_read_unlock(&kvm->srcu, idx);
 
-	mutex_unlock(&kvm_lock);
 	return freed;
 }
diff --git a/arch/x86/kvm/mmu/shadow_mmu.h b/arch/x86/kvm/mmu/shadow_mmu.h
index 9f16c4782bfbf..9e27d03fbe368 100644
--- a/arch/x86/kvm/mmu/shadow_mmu.h
+++ b/arch/x86/kvm/mmu/shadow_mmu.h
@@ -112,7 +112,8 @@ void kvm_shadow_mmu_try_split_huge_pages(struct kvm *kvm,
 void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
 				    const struct kvm_memory_slot *slot);
 
-unsigned long mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc);
+bool kvm_shadow_mmu_has_zapped_obsolete_pages(struct kvm *kvm);
+unsigned long kvm_shadow_mmu_shrink_scan(struct kvm *kvm, int pages_to_free);
 
 /* Exports from paging_tmpl.h */
 gpa_t paging32_gva_to_gpa(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,

From patchwork Thu Feb  2 18:28:00 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ben Gardon <bgardon@google.com>
X-Patchwork-Id: 52126
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp401138wrn;
        Thu, 2 Feb 2023 10:30:40 -0800 (PST)
X-Google-Smtp-Source: 
 AK7set9wvWyhBIQCRzHCkb1lJK4iivhERy3JrfgK4ha8xg9QL3ueeuowe/aTr4Px+w4zUBJGwlUm
X-Received: by 2002:a17:902:e313:b0:196:47f0:50b6 with SMTP id
 q19-20020a170902e31300b0019647f050b6mr5430674plc.47.1675362640203;
        Thu, 02 Feb 2023 10:30:40 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1675362640; cv=none;
        d=google.com; s=arc-20160816;
        b=WzLHXk5qAA4Vn26o9/z4oTqPW5a83RvE38T8kbRWbbHHFPZpNb11NRlKJSbpbrPAks
         R1yxGM4hCNe8bpNYySA6UJtQJPAYpshLG1kN5K+v4EQMlHy96Fq+s4z71jmfSetZyw6v
         q+nYfq6ThDrdtAprU/f+LGimOvYOA6NKy5pb6CJmV8L+w1Fm5pJZm8CIBHJ2MPN4/oO4
         uYvLc/hx6WDgst6htJa4InsJ5YfBQsqNtrtOcBQ2Zh/lcHutuBJmeG5yXWziyB1GOxfE
         P95ngSA6VDeS6AEd9b9s/BrZr7rUT/NFcJ/tU0hZA7o92tnUU2CyXtVyoaWRZdkCyLMz
         rEFA==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:cc:to:from:subject:message-id:references
         :mime-version:in-reply-to:date:dkim-signature;
        bh=F2DHKgw0dJCtwQxpbZaWFIOX07AJ3Q8QHUf3isrBY5Q=;
        b=x+gAQuM2CucWAsjTxxDhiZ/lIfk1g+kJv6REAy4umPs8M+mqkdSijniTFiS0u+jpDM
         lt4BrlCmVnb82km+wIqT7EZV79HQ0TCkPQR+/LBpmZp3ZqpvRRd/CH1aTlDHYlKxWWc6
         54yKzxPETDv2ysOq/afESOBlwG8IDB9nBDU8Z/cm9HXGytQpd//FP3R7qOLSj7sbbmxK
         Ez1DTUEcPJjQucBsTWN8zcKCZy5EmUNArs+0cpvH4sM4zHM2PJ+hQvPAOHhTq9JI0Q7j
         uIk4NiYRV8FMq3JRJiKfmSi5SK65/HKShUfN+M1EpWcO1vsqbaeDCOFH8vOQvQFnjuGC
         ak/g==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@google.com header.s=20210112 header.b="hp3/EcJm";
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 ik9-20020a170902ab0900b00192a2923f37si22466795plb.359.2023.02.02.10.30.26;
        Thu, 02 Feb 2023 10:30:40 -0800 (PST)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@google.com header.s=20210112 header.b="hp3/EcJm";
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S232795AbjBBS3u (ORCPT <rfc822;il.mystafa@gmail.com> + 99 others);
        Thu, 2 Feb 2023 13:29:50 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60808 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S232732AbjBBS3K (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 2 Feb 2023 13:29:10 -0500
Received: from mail-pg1-x549.google.com (mail-pg1-x549.google.com
 [IPv6:2607:f8b0:4864:20::549])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C53F179F25
        for <linux-kernel@vger.kernel.org>;
 Thu,  2 Feb 2023 10:28:31 -0800 (PST)
Received: by mail-pg1-x549.google.com with SMTP id
 127-20020a630685000000b004ec5996dcc0so1365244pgg.8
        for <linux-kernel@vger.kernel.org>;
 Thu, 02 Feb 2023 10:28:31 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=F2DHKgw0dJCtwQxpbZaWFIOX07AJ3Q8QHUf3isrBY5Q=;
        b=hp3/EcJmYWEvxFjMK6a/uHpM1/syENeJe8Vvon0/F0y95LHB1UdDLRIDLy8N2fbnGz
         aLVOkNCYi1c49Z3AvQargVa8ZHbDsPSERWCFQbdrUusIodCCUkfqI0SNdPidsNnRfPm0
         JAZDm+rCgHNoBeAXuhhHiZy6TKc1Dou8DmV4tVp+bExmmdqerC28oE/BkhYDUMS+mfi6
         vtMcr/v80IWmTBj6AV/kY5Y7thWKzdPPsI5jxHsgvYNxug0EmP7ubS9aNVRGYj/m3k7j
         DTv45ZxkQ1pxSQwGY4zXeQgCvjHb46/SQTAemmsze3WrW0UbLW/5lV5NhGSw4dtjLLqz
         8PSA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=F2DHKgw0dJCtwQxpbZaWFIOX07AJ3Q8QHUf3isrBY5Q=;
        b=cKzxy6IubwcU+VRyqW+1+gUfUyDQzQqQsoK/7Hns9kW7EQP/xwEZM3MOYi2eZDrB7F
         9QTesmCfKYVUVnmsDzCWoJrPI3lmLtGuHRpIDdZD074JXAciDbmxWz0jUFEqgaIHmBav
         O2C6xY/pxKs5MT4qlOPYHOgB0+SDCgZ6Ld/YDYFlltK72znaJEbbIvFjHvW7voJ1+S3x
         8UweWLl0aa8vAoEaVKyat/Yzwke8Ms905YvQrDgGPR+4PuH9fG3gk+sQ3gqZ4abcHvSS
         uQFJJ3oXvZ5yOcioEQal0n8Sil2WO4KNUlhyz2dnHUmzOxKM7MyqKwDXjGJN5T8Ychup
         I/jA==
X-Gm-Message-State: AO0yUKVA1XDnBfsYjHbVeS0T1FdBB8aWcnm/zemDQiCk0w1/fviQJ81M
        SmENeYjd1lVxdrmyUCf45iOIKcmcfdfEO18xf9MY8tmTcdGOvTjsLjn/Io9j2t3bPakQw3BdmWH
        lL2ULGftcY1UhzqHkxBUIbGfErrc57z6kVf1Uy+F0GMjJRKdbLJt6AAQDVJsRsnb4opUXLKQ8
X-Received: from sweer.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:e45])
 (user=bgardon job=sendgmr) by 2002:a17:90a:6388:b0:225:eaa2:3f5d with SMTP id
 f8-20020a17090a638800b00225eaa23f5dmr105787pjj.2.1675362510924; Thu, 02 Feb
 2023 10:28:30 -0800 (PST)
Date: Thu,  2 Feb 2023 18:28:00 +0000
In-Reply-To: <20230202182809.1929122-1-bgardon@google.com>
Mime-Version: 1.0
References: <20230202182809.1929122-1-bgardon@google.com>
X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog
Message-ID: <20230202182809.1929122-13-bgardon@google.com>
Subject: [PATCH 12/21] KVM: x86/MMU: Clean up naming of exported Shadow MMU
 functions
From: Ben Gardon <bgardon@google.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
        Sean Christopherson <seanjc@google.com>,
        David Matlack <dmatlack@google.com>,
        Vipin Sharma <vipinsh@google.com>,
        Ricardo Koller <ricarkol@google.com>,
        Ben Gardon <bgardon@google.com>
X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED,
        DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,
        SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham
        autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1756745055733314817?=
X-GMAIL-MSGID: =?utf-8?q?1756745055733314817?=

Rename several functions exported from the shadow MMU so that they carry
a kvm_shadow_mmu_ prefix, mirroring the kvm_tdp_mmu_ prefix used by the
TDP MMU. More cleanups will follow to convert the remaining functions to
the same naming scheme, but for now, start with the trivial renames.

No functional change intended.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/mmu.c         | 19 ++++++++++---------
 arch/x86/kvm/mmu/paging_tmpl.h |  2 +-
 arch/x86/kvm/mmu/shadow_mmu.c  | 19 ++++++++++---------
 arch/x86/kvm/mmu/shadow_mmu.h  | 17 +++++++++--------
 4 files changed, 30 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3ea54b08239aa..9308ab8102f9b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1089,7 +1089,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 	int r;
 
 	write_lock(&vcpu->kvm->mmu_lock);
-	r = make_mmu_pages_available(vcpu);
+	r = kvm_shadow_mmu_make_pages_available(vcpu);
 	if (r < 0)
 		goto out_unlock;
 
@@ -1164,7 +1164,7 @@ static bool get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep)
 	if (is_tdp_mmu_active(vcpu))
 		leaf = kvm_tdp_mmu_get_walk(vcpu, addr, sptes, &root);
 	else
-		leaf = get_walk(vcpu, addr, sptes, &root);
+		leaf = kvm_shadow_mmu_get_walk(vcpu, addr, sptes, &root);
 
 	walk_shadow_page_lockless_end(vcpu);
 
@@ -1432,11 +1432,11 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	if (is_page_fault_stale(vcpu, fault))
 		goto out_unlock;
 
-	r = make_mmu_pages_available(vcpu);
+	r = kvm_shadow_mmu_make_pages_available(vcpu);
 	if (r)
 		goto out_unlock;
 
-	r = direct_map(vcpu, fault);
+	r = kvm_shadow_mmu_direct_map(vcpu, fault);
 
 out_unlock:
 	write_unlock(&vcpu->kvm->mmu_lock);
@@ -1471,7 +1471,7 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 		trace_kvm_page_fault(vcpu, fault_address, error_code);
 
 		if (kvm_event_needs_reinjection(vcpu))
-			kvm_mmu_unprotect_page_virt(vcpu, fault_address);
+			kvm_shadow_mmu_unprotect_page_virt(vcpu, fault_address);
 		r = kvm_mmu_page_fault(vcpu, fault_address, error_code, insn,
 				insn_len);
 	} else if (flags & KVM_PV_REASON_PAGE_NOT_PRESENT) {
@@ -2786,7 +2786,8 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
 	 * In order to ensure all vCPUs drop their soon-to-be invalid roots,
 	 * invalidating TDP MMU roots must be done while holding mmu_lock for
 	 * write and in the same critical section as making the reload request,
-	 * e.g. before kvm_zap_obsolete_pages() could drop mmu_lock and yield.
+	 * e.g. before kvm_shadow_mmu_zap_obsolete_pages() could drop mmu_lock
+	 * and yield.
 	 */
 	if (tdp_mmu_enabled)
 		kvm_tdp_mmu_invalidate_all_roots(kvm);
@@ -2801,7 +2802,7 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
 	 */
 	kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_FREE_OBSOLETE_ROOTS);
 
-	kvm_zap_obsolete_pages(kvm);
+	kvm_shadow_mmu_zap_obsolete_pages(kvm);
 
 	write_unlock(&kvm->mmu_lock);
 
@@ -2890,7 +2891,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
 
 	kvm_mmu_invalidate_begin(kvm, 0, -1ul);
 
-	flush = kvm_rmap_zap_gfn_range(kvm, gfn_start, gfn_end);
+	flush = kvm_shadow_mmu_zap_gfn_range(kvm, gfn_start, gfn_end);
 
 	if (tdp_mmu_enabled) {
 		for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
@@ -3034,7 +3035,7 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
 {
 	if (kvm_memslots_have_rmaps(kvm)) {
 		write_lock(&kvm->mmu_lock);
-		kvm_rmap_zap_collapsible_sptes(kvm, slot);
+		kvm_shadow_mmu_zap_collapsible_sptes(kvm, slot);
 		write_unlock(&kvm->mmu_lock);
 	}
 
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 1251357794538..14a8c8217c4cf 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -866,7 +866,7 @@ int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	if (is_page_fault_stale(vcpu, fault))
 		goto out_unlock;
 
-	r = make_mmu_pages_available(vcpu);
+	r = kvm_shadow_mmu_make_pages_available(vcpu);
 	if (r)
 		goto out_unlock;
 	r = FNAME(fetch)(vcpu, fault, &walker);
diff --git a/arch/x86/kvm/mmu/shadow_mmu.c b/arch/x86/kvm/mmu/shadow_mmu.c
index 76c50aca3c487..36b335d75aee2 100644
--- a/arch/x86/kvm/mmu/shadow_mmu.c
+++ b/arch/x86/kvm/mmu/shadow_mmu.c
@@ -1977,7 +1977,7 @@ static inline unsigned long kvm_mmu_available_pages(struct kvm *kvm)
 	return 0;
 }
 
-int make_mmu_pages_available(struct kvm_vcpu *vcpu)
+int kvm_shadow_mmu_make_pages_available(struct kvm_vcpu *vcpu)
 {
 	unsigned long avail = kvm_mmu_available_pages(vcpu->kvm);
 
@@ -2041,7 +2041,7 @@ int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
 	return r;
 }
 
-int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
+int kvm_shadow_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
 {
 	gpa_t gpa;
 	int r;
@@ -2331,7 +2331,7 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
 	__direct_pte_prefetch(vcpu, sp, sptep);
 }
 
-int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
+int kvm_shadow_mmu_direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	struct kvm_shadow_walk_iterator it;
 	struct kvm_mmu_page *sp;
@@ -2549,7 +2549,7 @@ int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		return r;
 
 	write_lock(&vcpu->kvm->mmu_lock);
-	r = make_mmu_pages_available(vcpu);
+	r = kvm_shadow_mmu_make_pages_available(vcpu);
 	if (r < 0)
 		goto out_unlock;
 
@@ -2797,7 +2797,8 @@ void kvm_mmu_sync_prev_roots(struct kvm_vcpu *vcpu)
  *
  * Must be called between walk_shadow_page_lockless_{begin,end}.
  */
-int get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, int *root_level)
+int kvm_shadow_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
+			    int *root_level)
 {
 	struct kvm_shadow_walk_iterator iterator;
 	int leaf = -1;
@@ -3104,7 +3105,7 @@ __always_inline bool walk_slot_rmaps_4k(struct kvm *kvm,
 }
 
 #define BATCH_ZAP_PAGES	10
-void kvm_zap_obsolete_pages(struct kvm *kvm)
+void kvm_shadow_mmu_zap_obsolete_pages(struct kvm *kvm)
 {
 	struct kvm_mmu_page *sp, *node;
 	int nr_zapped, batch = 0;
@@ -3165,7 +3166,7 @@ bool kvm_shadow_mmu_has_zapped_obsolete_pages(struct kvm *kvm)
 	return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
 }
 
-bool kvm_rmap_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
+bool kvm_shadow_mmu_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
 {
 	const struct kvm_memory_slot *memslot;
 	struct kvm_memslots *slots;
@@ -3417,8 +3418,8 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 	return need_tlb_flush;
 }
 
-void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
-				    const struct kvm_memory_slot *slot)
+void kvm_shadow_mmu_zap_collapsible_sptes(struct kvm *kvm,
+					  const struct kvm_memory_slot *slot)
 {
 	/*
 	 * Note, use KVM_MAX_HUGEPAGE_LEVEL - 1 since there's no need to zap
diff --git a/arch/x86/kvm/mmu/shadow_mmu.h b/arch/x86/kvm/mmu/shadow_mmu.h
index 9e27d03fbe368..cc28895d2a24f 100644
--- a/arch/x86/kvm/mmu/shadow_mmu.h
+++ b/arch/x86/kvm/mmu/shadow_mmu.h
@@ -73,18 +73,19 @@ bool kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
 			      struct list_head *invalid_list);
 void kvm_mmu_commit_zap_page(struct kvm *kvm, struct list_head *invalid_list);
 
-int make_mmu_pages_available(struct kvm_vcpu *vcpu);
+int kvm_shadow_mmu_make_pages_available(struct kvm_vcpu *vcpu);
 
-int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva);
+int kvm_shadow_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva);
 
-int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
+int kvm_shadow_mmu_direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
 u64 *fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, gpa_t gpa, u64 *spte);
 
 hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant, u8 level);
 int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu);
 int mmu_alloc_special_roots(struct kvm_vcpu *vcpu);
 
-int get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, int *root_level);
+int kvm_shadow_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
+			    int *root_level);
 
 void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 		       int bytes, struct kvm_page_track_notifier_node *node);
@@ -99,8 +100,8 @@ bool walk_slot_rmaps(struct kvm *kvm, const struct kvm_memory_slot *slot,
 bool walk_slot_rmaps_4k(struct kvm *kvm, const struct kvm_memory_slot *slot,
 			slot_rmaps_handler fn, bool flush_on_yield);
 
-void kvm_zap_obsolete_pages(struct kvm *kvm);
-bool kvm_rmap_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
+void kvm_shadow_mmu_zap_obsolete_pages(struct kvm *kvm);
+bool kvm_shadow_mmu_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
 
 bool slot_rmap_write_protect(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 			     const struct kvm_memory_slot *slot);
@@ -109,8 +110,8 @@ void kvm_shadow_mmu_try_split_huge_pages(struct kvm *kvm,
 					 const struct kvm_memory_slot *slot,
 					 gfn_t start, gfn_t end,
 					 int target_level);
-void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
-				    const struct kvm_memory_slot *slot);
+void kvm_shadow_mmu_zap_collapsible_sptes(struct kvm *kvm,
+					  const struct kvm_memory_slot *slot);
 
 bool kvm_shadow_mmu_has_zapped_obsolete_pages(struct kvm *kvm);
 unsigned long kvm_shadow_mmu_shrink_scan(struct kvm *kvm, int pages_to_free);

From patchwork Thu Feb  2 18:28:01 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ben Gardon <bgardon@google.com>
X-Patchwork-Id: 52130
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp401322wrn;
        Thu, 2 Feb 2023 10:30:58 -0800 (PST)
X-Google-Smtp-Source: 
 AK7set8tAJYN1kGl8UlyC3wHOVgrjQrj5YJHYcCKoRQg3HEyRkWTL7mfubuez+9tMTrHgxKm2z65
X-Received: by 2002:a05:6a20:3d27:b0:bf:6602:edde with SMTP id
 y39-20020a056a203d2700b000bf6602eddemr6213667pzi.60.1675362657796;
        Thu, 02 Feb 2023 10:30:57 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1675362657; cv=none;
        d=google.com; s=arc-20160816;
        b=DmZ4TUdO1Sd8PwPGgIJfNet23CApSBHaPp65w50Fz2/+P0f0r0Nxgo3n6q9Yy1AX7P
         QF+BCgTNEo0ykpmdpeLrSwulWxjCnF11UEOOBsJ4lysoOYKQU68mvCBpW5873KHkGYxN
         v+Sm70U8FRI41TXXYdH2rCNVoep9amLuY0WBJVQYJxTIzfJEEd5E+RJ2+R/Vdr/GIf09
         BUefQwUm2V1tdCSCasR7VIaUcOiFBTyB6pJI/1M4OoNd3GmezYWNM99mjo3gv2D9b2yh
         Ee7eowuyiVeiwKvj3UNlIEaITWH8pJqCjEgVyMJseUtj+YAZPOM0NlK8+P6lAIcxOXGN
         THyw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:cc:to:from:subject:message-id:references
         :mime-version:in-reply-to:date:dkim-signature;
        bh=TamQ7OunAE4JUpYasRsLdFf4L+G/oCkA7IYvB0Ze0GI=;
        b=nTocE78tsLfaro9hjuAPFXrx34S6oJAUuO/hDwk3vIOfGwtzqJsxa+FNw5OvT/rh8f
         ycDaQP7Q4MRtOkg2M0VNFxbwuBDvdbD9t9kUDhNDGQAi/Im/KlI7qTdoxC2c2QS74RzV
         Fu5LJ94co+aHP3NO4xRgzB8cK0Qe7SpfgqYIbqXyFQGqDUtUxjjdb3duwwRifGzUftPF
         0WtV3A/Nie923WPxbNVpCPbnifKKpSQbtiPVmwMkzA5Xv/q84I1GKR2uOklIF7Dpf9H8
         EaRCegmkbl/Rk2EmXPeKYGGAGx1yxAKOHwJixZyhRdVNtuUpSmMPkWhCVh+vl4bLT+nt
         eVSg==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@google.com header.s=20210112 header.b=Kd4q0HGp;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 y128-20020a623286000000b00590712dd84bsi27000pfy.81.2023.02.02.10.30.44;
        Thu, 02 Feb 2023 10:30:57 -0800 (PST)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@google.com header.s=20210112 header.b=Kd4q0HGp;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S232890AbjBBSaP (ORCPT <rfc822;il.mystafa@gmail.com> + 99 others);
        Thu, 2 Feb 2023 13:30:15 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33462 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S232766AbjBBS30 (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 2 Feb 2023 13:29:26 -0500
Received: from mail-pl1-x64a.google.com (mail-pl1-x64a.google.com
 [IPv6:2607:f8b0:4864:20::64a])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6F6D91E9C2
        for <linux-kernel@vger.kernel.org>;
 Thu,  2 Feb 2023 10:28:43 -0800 (PST)
Received: by mail-pl1-x64a.google.com with SMTP id
 q5-20020a170902e30500b001967df62d0dso1307108plc.18
        for <linux-kernel@vger.kernel.org>;
 Thu, 02 Feb 2023 10:28:43 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=TamQ7OunAE4JUpYasRsLdFf4L+G/oCkA7IYvB0Ze0GI=;
        b=Kd4q0HGp6rXEqYvuPFpu14ySATnUsyqeHtimcLiusOsHbo+XUUY11/D9w+KTWi32TN
         eitny6WUWUhqgnQqH0KcvvXlL/EezRN4RI+67mfbfnQxnYDh456aRbNvCBrN49NPBHUt
         q7WIMhCS8v9qymyctVGJerM42ANGqAz3pfNWH6UzSHhCf/Gy0rXHU28MHIGrAR8rp2j7
         8W6wmy6tZ74gfniRy27HLIYoNSkeQ/vJsE/HGRuMzhXwvsLVRf9ik8gDPFyZ32UfOFFD
         goJ2VcXA+GqEiETlVMOgd+HJT/WQI8V1VQ40S3eYYtEMU0Dy4cXZ8aPC9NVZxpeupxdO
         ke+Q==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=TamQ7OunAE4JUpYasRsLdFf4L+G/oCkA7IYvB0Ze0GI=;
        b=b4zGRFgAIb3hxiF2+1qWMKIqmjph4FJqP8Ci/cFVlvbPRA4tuBc+Xvk4O/SlG8LsjF
         oFOPCrRddzSAcWoncV17MN/tSseMmLbrdKk0DwLSRQGKGBXcfveulGOfaR2IVyixrITo
         3lgEYnxetw4VN09qI82rZZa3wnFDZqUSX4PEuatsY7/1Ck1KH0u19G3giYgM4FlFclaG
         XWVfE2VjnAcd7f0pDuCTmQ5h3HrIchXDDkXyx9tvx/LoGZ7pt/UtWO32QLPvkAkTZ+Hz
         FOrd2rzx5VmcGqNOFuP8oZi5UD8BWfPNC94PtsR9rkwPFjvKMqeXuKpWihXXDNPxoU8o
         yMXg==
X-Gm-Message-State: AO0yUKVptGyZiUZMVL6qxFdfwyHlgBLZEFvf6SNsO1sMRudzeDOlcTiz
        RqWo48u2OZ4eB51Y5mqtqWP4jEr8tbvkWYqk965N3nDpWsC2OH0oPSyCa04AvkTQYMcrQ8Q/H6w
        VG9kH80RjId9C6wj5CA4teEqpSYmtsmjFIs4yx62CrQw+Syp5TkQg6tLLmVv3rXaal/JTVump
X-Received: from sweer.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:e45])
 (user=bgardon job=sendgmr) by 2002:a17:90a:38e3:b0:230:815:ff39 with SMTP id
 x90-20020a17090a38e300b002300815ff39mr676156pjb.141.1675362512730; Thu, 02
 Feb 2023 10:28:32 -0800 (PST)
Date: Thu,  2 Feb 2023 18:28:01 +0000
In-Reply-To: <20230202182809.1929122-1-bgardon@google.com>
Mime-Version: 1.0
References: <20230202182809.1929122-1-bgardon@google.com>
X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog
Message-ID: <20230202182809.1929122-14-bgardon@google.com>
Subject: [PATCH 13/21] KVM: x86/MMU: Fix naming on prepare / commit zap page
 functions
From: Ben Gardon <bgardon@google.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
        Sean Christopherson <seanjc@google.com>,
        David Matlack <dmatlack@google.com>,
        Vipin Sharma <vipinsh@google.com>,
        Ricardo Koller <ricarkol@google.com>,
        Ben Gardon <bgardon@google.com>
X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED,
        DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,
        SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=unavailable
        autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1756745073787738094?=
X-GMAIL-MSGID: =?utf-8?q?1756745073787738094?=

The various prepare / commit zap page functions are part of the Shadow
MMU and are used throughout both shadow_mmu.c and mmu.c. Add _shadow_ to
the function names to match the rest of the Shadow MMU interface.
Because these functions have so many callers, the rename gets its own
commit.

No functional change intended.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/mmu.c        | 21 +++++++--------
 arch/x86/kvm/mmu/shadow_mmu.c | 48 ++++++++++++++++++-----------------
 arch/x86/kvm/mmu/shadow_mmu.h | 13 +++++-----
 3 files changed, 43 insertions(+), 39 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 9308ab8102f9b..9b217e04cab0e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -230,8 +230,9 @@ void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
 		kvm_tdp_mmu_walk_lockless_end();
 	} else {
 		/*
-		 * Make sure the write to vcpu->mode is not reordered in front of
-		 * reads to sptes.  If it does, kvm_mmu_commit_zap_page() can see us
+		 * Make sure the write to vcpu->mode is not reordered in front
+		 * of reads to sptes.  If it does,
+		 * kvm_shadow_mmu_commit_zap_page() can see us
 		 * OUTSIDE_GUEST_MODE and proceed to free the shadow page table.
 		 */
 		smp_store_release(&vcpu->mode, OUTSIDE_GUEST_MODE);
@@ -568,7 +569,7 @@ bool kvm_mmu_remote_flush_or_zap(struct kvm *kvm, struct list_head *invalid_list
 		return false;
 
 	if (!list_empty(invalid_list))
-		kvm_mmu_commit_zap_page(kvm, invalid_list);
+		kvm_shadow_mmu_commit_zap_page(kvm, invalid_list);
 	else
 		kvm_flush_remote_tlbs(kvm);
 	return true;
@@ -1022,7 +1023,7 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
 	if (is_tdp_mmu_page(sp))
 		kvm_tdp_mmu_put_root(kvm, sp, false);
 	else if (!--sp->root_count && sp->role.invalid)
-		kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
+		kvm_shadow_mmu_prepare_zap_page(kvm, sp, invalid_list);
 
 	*root_hpa = INVALID_PAGE;
 }
@@ -1075,7 +1076,7 @@ void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
 		mmu->root.pgd = 0;
 	}
 
-	kvm_mmu_commit_zap_page(kvm, &invalid_list);
+	kvm_shadow_mmu_commit_zap_page(kvm, &invalid_list);
 	write_unlock(&kvm->mmu_lock);
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_free_roots);
@@ -1397,8 +1398,8 @@ bool is_page_fault_stale(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	 * there is a pending request to free obsolete roots.  The request is
 	 * only a hint that the current root _may_ be obsolete and needs to be
 	 * reloaded, e.g. if the guest frees a PGD that KVM is tracking as a
-	 * previous root, then __kvm_mmu_prepare_zap_page() signals all vCPUs
-	 * to reload even if no vCPU is actively using the root.
+	 * previous root, then __kvm_shadow_mmu_prepare_zap_page() signals all
+	 * vCPUs to reload even if no vCPU is actively using the root.
 	 */
 	if (!sp && kvm_test_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu))
 		return true;
@@ -3101,13 +3102,13 @@ void kvm_mmu_zap_all(struct kvm *kvm)
 	list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
 		if (WARN_ON(sp->role.invalid))
 			continue;
-		if (__kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list, &ign))
+		if (__kvm_shadow_mmu_prepare_zap_page(kvm, sp, &invalid_list, &ign))
 			goto restart;
 		if (cond_resched_rwlock_write(&kvm->mmu_lock))
 			goto restart;
 	}
 
-	kvm_mmu_commit_zap_page(kvm, &invalid_list);
+	kvm_shadow_mmu_commit_zap_page(kvm, &invalid_list);
 
 	if (tdp_mmu_enabled)
 		kvm_tdp_mmu_zap_all(kvm);
@@ -3457,7 +3458,7 @@ static void kvm_recover_nx_huge_pages(struct kvm *kvm)
 		else if (is_tdp_mmu_page(sp))
 			flush |= kvm_tdp_mmu_zap_sp(kvm, sp);
 		else
-			kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
+			kvm_shadow_mmu_prepare_zap_page(kvm, sp, &invalid_list);
 		WARN_ON_ONCE(sp->nx_huge_page_disallowed);
 
 		if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) {
diff --git a/arch/x86/kvm/mmu/shadow_mmu.c b/arch/x86/kvm/mmu/shadow_mmu.c
index 36b335d75aee2..32a24530cf19a 100644
--- a/arch/x86/kvm/mmu/shadow_mmu.c
+++ b/arch/x86/kvm/mmu/shadow_mmu.c
@@ -1282,7 +1282,7 @@ static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	int ret = vcpu->arch.mmu->sync_page(vcpu, sp);
 
 	if (ret < 0)
-		kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
+		kvm_shadow_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
 	return ret;
 }
 
@@ -1444,8 +1444,8 @@ static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
 			 * upper-level page will be write-protected.
 			 */
 			if (role.level > PG_LEVEL_4K && sp->unsync)
-				kvm_mmu_prepare_zap_page(kvm, sp,
-							 &invalid_list);
+				kvm_shadow_mmu_prepare_zap_page(kvm, sp,
+								&invalid_list);
 			continue;
 		}
 
@@ -1487,7 +1487,7 @@ static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
 	++kvm->stat.mmu_cache_miss;
 
 out:
-	kvm_mmu_commit_zap_page(kvm, &invalid_list);
+	kvm_shadow_mmu_commit_zap_page(kvm, &invalid_list);
 
 	if (collisions > kvm->stat.max_mmu_page_hash_collisions)
 		kvm->stat.max_mmu_page_hash_collisions = collisions;
@@ -1779,8 +1779,8 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp, u64 *spte,
 			 */
 			if (tdp_enabled && invalid_list &&
 			    child->role.guest_mode && !child->parent_ptes.val)
-				return kvm_mmu_prepare_zap_page(kvm, child,
-								invalid_list);
+				return kvm_shadow_mmu_prepare_zap_page(kvm,
+							child, invalid_list);
 		}
 	} else if (is_mmio_spte(pte)) {
 		mmu_spte_clear_no_track(spte);
@@ -1825,7 +1825,7 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
 		struct kvm_mmu_page *sp;
 
 		for_each_sp(pages, sp, parents, i) {
-			kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
+			kvm_shadow_mmu_prepare_zap_page(kvm, sp, invalid_list);
 			mmu_pages_clear_parents(&parents);
 			zapped++;
 		}
@@ -1834,9 +1834,9 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
 	return zapped;
 }
 
-bool __kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
-				struct list_head *invalid_list,
-				int *nr_zapped)
+bool __kvm_shadow_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+				       struct list_head *invalid_list,
+				       int *nr_zapped)
 {
 	bool list_unstable, zapped_root = false;
 
@@ -1898,16 +1898,17 @@ bool __kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
 	return list_unstable;
 }
 
-bool kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
-			      struct list_head *invalid_list)
+bool kvm_shadow_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+				     struct list_head *invalid_list)
 {
 	int nr_zapped;
 
-	__kvm_mmu_prepare_zap_page(kvm, sp, invalid_list, &nr_zapped);
+	__kvm_shadow_mmu_prepare_zap_page(kvm, sp, invalid_list, &nr_zapped);
 	return nr_zapped;
 }
 
-void kvm_mmu_commit_zap_page(struct kvm *kvm, struct list_head *invalid_list)
+void kvm_shadow_mmu_commit_zap_page(struct kvm *kvm,
+				    struct list_head *invalid_list)
 {
 	struct kvm_mmu_page *sp, *nsp;
 
@@ -1952,8 +1953,8 @@ static unsigned long kvm_mmu_zap_oldest_mmu_pages(struct kvm *kvm,
 		if (sp->root_count)
 			continue;
 
-		unstable = __kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list,
-						      &nr_zapped);
+		unstable = __kvm_shadow_mmu_prepare_zap_page(kvm, sp,
+						&invalid_list, &nr_zapped);
 		total_zapped += nr_zapped;
 		if (total_zapped >= nr_to_zap)
 			break;
@@ -1962,7 +1963,7 @@ static unsigned long kvm_mmu_zap_oldest_mmu_pages(struct kvm *kvm,
 			goto restart;
 	}
 
-	kvm_mmu_commit_zap_page(kvm, &invalid_list);
+	kvm_shadow_mmu_commit_zap_page(kvm, &invalid_list);
 
 	kvm->stat.mmu_recycled += total_zapped;
 	return total_zapped;
@@ -2033,9 +2034,9 @@ int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
 		pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
 			 sp->role.word);
 		r = 1;
-		kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
+		kvm_shadow_mmu_prepare_zap_page(kvm, sp, &invalid_list);
 	}
-	kvm_mmu_commit_zap_page(kvm, &invalid_list);
+	kvm_shadow_mmu_commit_zap_page(kvm, &invalid_list);
 	write_unlock(&kvm->mmu_lock);
 
 	return r;
@@ -3032,7 +3033,8 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 	for_each_gfn_valid_sp_with_gptes(vcpu->kvm, sp, gfn) {
 		if (detect_write_misaligned(sp, gpa, bytes) ||
 		      detect_write_flooding(sp)) {
-			kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list);
+			kvm_shadow_mmu_prepare_zap_page(vcpu->kvm, sp,
+							&invalid_list);
 			++vcpu->kvm->stat.mmu_flooded;
 			continue;
 		}
@@ -3141,7 +3143,7 @@ void kvm_shadow_mmu_zap_obsolete_pages(struct kvm *kvm)
 			goto restart;
 		}
 
-		unstable = __kvm_mmu_prepare_zap_page(kvm, sp,
+		unstable = __kvm_shadow_mmu_prepare_zap_page(kvm, sp,
 				&kvm->arch.zapped_obsolete_pages, &nr_zapped);
 		batch += nr_zapped;
 
@@ -3158,7 +3160,7 @@ void kvm_shadow_mmu_zap_obsolete_pages(struct kvm *kvm)
 	 * kvm_mmu_load()), and the reload in the caller ensure no vCPUs are
 	 * running with an obsolete MMU.
 	 */
-	kvm_mmu_commit_zap_page(kvm, &kvm->arch.zapped_obsolete_pages);
+	kvm_shadow_mmu_commit_zap_page(kvm, &kvm->arch.zapped_obsolete_pages);
 }
 
 bool kvm_shadow_mmu_has_zapped_obsolete_pages(struct kvm *kvm)
@@ -3439,7 +3441,7 @@ unsigned long kvm_shadow_mmu_shrink_scan(struct kvm *kvm, int pages_to_free)
 	write_lock(&kvm->mmu_lock);
 
 	if (kvm_shadow_mmu_has_zapped_obsolete_pages(kvm)) {
-		kvm_mmu_commit_zap_page(kvm, &kvm->arch.zapped_obsolete_pages);
+		kvm_shadow_mmu_commit_zap_page(kvm, &kvm->arch.zapped_obsolete_pages);
 		goto out;
 	}
 
diff --git a/arch/x86/kvm/mmu/shadow_mmu.h b/arch/x86/kvm/mmu/shadow_mmu.h
index cc28895d2a24f..82eed9bb9bc9a 100644
--- a/arch/x86/kvm/mmu/shadow_mmu.h
+++ b/arch/x86/kvm/mmu/shadow_mmu.h
@@ -66,12 +66,13 @@ bool kvm_test_age_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 
 void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp);
 
-bool __kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
-				struct list_head *invalid_list,
-				int *nr_zapped);
-bool kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
-			      struct list_head *invalid_list);
-void kvm_mmu_commit_zap_page(struct kvm *kvm, struct list_head *invalid_list);
+bool __kvm_shadow_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+				       struct list_head *invalid_list,
+				       int *nr_zapped);
+bool kvm_shadow_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+				     struct list_head *invalid_list);
+void kvm_shadow_mmu_commit_zap_page(struct kvm *kvm,
+				    struct list_head *invalid_list);
 
 int kvm_shadow_mmu_make_pages_available(struct kvm_vcpu *vcpu);
 

From patchwork Thu Feb  2 18:28:02 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ben Gardon <bgardon@google.com>
X-Patchwork-Id: 52131
Date: Thu,  2 Feb 2023 18:28:02 +0000
In-Reply-To: <20230202182809.1929122-1-bgardon@google.com>
Mime-Version: 1.0
References: <20230202182809.1929122-1-bgardon@google.com>
X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog
Message-ID: <20230202182809.1929122-15-bgardon@google.com>
Subject: [PATCH 14/21] KVM: x86/MMU: Factor Shadow MMU wrprot / clear dirty
 ops out of mmu.c
From: Ben Gardon <bgardon@google.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
        Sean Christopherson <seanjc@google.com>,
        David Matlack <dmatlack@google.com>,
        Vipin Sharma <vipinsh@google.com>,
        Ricardo Koller <ricarkol@google.com>,
        Ben Gardon <bgardon@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Several functions in mmu.c bifurcate to the Shadow and/or TDP MMU
implementations. In most of these, the Shadow MMU implementation is
open-coded. Wrap each open-coded instance in a helper function which
takes only kvm and slot arguments, or similar. This matches the TDP MMU
interface and will allow for some nice cleanups in a following commit.

No functional change intended.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
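Note (illustrative only, not part of the commit): the *_pt_masked helpers
moved by this patch visit one 4K gfn per set bit in mask, lowest bit
first, clearing each bit with "mask &= mask - 1". Below is a minimal
userspace sketch of that loop. It assumes GCC/Clang's __builtin_ctzl()
as a stand-in for the kernel's __ffs(); lowest_set_bit() and
collect_masked_gfns() are hypothetical names, not kernel functions.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the kernel's __ffs(): index of the lowest
 * set bit. Like __ffs(), the result is undefined for mask == 0.
 */
static unsigned int lowest_set_bit(unsigned long mask)
{
	return (unsigned int)__builtin_ctzl(mask);
}

/*
 * Store base + bit index for every set bit in mask, lowest bit first,
 * writing at most out_len entries. Returns the number of entries stored.
 */
static size_t collect_masked_gfns(unsigned long base, unsigned long mask,
				  unsigned long *out, size_t out_len)
{
	size_t n = 0;

	while (mask && n < out_len) {
		out[n++] = base + lowest_set_bit(mask);

		/* clear the lowest set bit */
		mask &= mask - 1;
	}

	return n;
}
```

In the patch itself, base corresponds to slot->base_gfn + gfn_offset and
each visited gfn is passed to gfn_to_rmap() at PG_LEVEL_4K.
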
 arch/x86/kvm/mmu/mmu.c        | 52 ++++++----------------------
 arch/x86/kvm/mmu/shadow_mmu.c | 64 +++++++++++++++++++++++++++++++++++
 arch/x86/kvm/mmu/shadow_mmu.h | 15 ++++++++
 3 files changed, 90 insertions(+), 41 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 9b217e04cab0e..44a00396284d5 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -377,23 +377,13 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 				     struct kvm_memory_slot *slot,
 				     gfn_t gfn_offset, unsigned long mask)
 {
-	struct kvm_rmap_head *rmap_head;
-
 	if (tdp_mmu_enabled)
 		kvm_tdp_mmu_clear_dirty_pt_masked(kvm, slot,
 				slot->base_gfn + gfn_offset, mask, true);
 
-	if (!kvm_memslots_have_rmaps(kvm))
-		return;
-
-	while (mask) {
-		rmap_head = gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
-					PG_LEVEL_4K, slot);
-		rmap_write_protect(rmap_head, false);
-
-		/* clear the first set bit */
-		mask &= mask - 1;
-	}
+	if (kvm_memslots_have_rmaps(kvm))
+		kvm_shadow_mmu_write_protect_pt_masked(kvm, slot, gfn_offset,
+						       mask);
 }
 
 /**
@@ -410,23 +400,13 @@ static void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
 					 struct kvm_memory_slot *slot,
 					 gfn_t gfn_offset, unsigned long mask)
 {
-	struct kvm_rmap_head *rmap_head;
-
 	if (tdp_mmu_enabled)
 		kvm_tdp_mmu_clear_dirty_pt_masked(kvm, slot,
 				slot->base_gfn + gfn_offset, mask, false);
 
-	if (!kvm_memslots_have_rmaps(kvm))
-		return;
-
-	while (mask) {
-		rmap_head = gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
-					PG_LEVEL_4K, slot);
-		__rmap_clear_dirty(kvm, rmap_head, slot);
-
-		/* clear the first set bit */
-		mask &= mask - 1;
-	}
+	if (kvm_memslots_have_rmaps(kvm))
+		kvm_shadow_mmu_clear_dirty_pt_masked(kvm, slot, gfn_offset,
+						     mask);
 }
 
 /**
@@ -484,16 +464,11 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 				    struct kvm_memory_slot *slot, u64 gfn,
 				    int min_level)
 {
-	struct kvm_rmap_head *rmap_head;
-	int i;
 	bool write_protected = false;
 
-	if (kvm_memslots_have_rmaps(kvm)) {
-		for (i = min_level; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) {
-			rmap_head = gfn_to_rmap(gfn, i, slot);
-			write_protected |= rmap_write_protect(rmap_head, true);
-		}
-	}
+	if (kvm_memslots_have_rmaps(kvm))
+		write_protected |=
+			kvm_shadow_mmu_write_protect_gfn(kvm, slot, gfn, min_level);
 
 	if (tdp_mmu_enabled)
 		write_protected |=
@@ -2915,8 +2890,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 {
 	if (kvm_memslots_have_rmaps(kvm)) {
 		write_lock(&kvm->mmu_lock);
-		walk_slot_rmaps(kvm, memslot, slot_rmap_write_protect,
-				start_level, KVM_MAX_HUGEPAGE_LEVEL, false);
+		kvm_shadow_mmu_wrprot_slot(kvm, memslot, start_level);
 		write_unlock(&kvm->mmu_lock);
 	}
 
@@ -3067,11 +3041,7 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
 {
 	if (kvm_memslots_have_rmaps(kvm)) {
 		write_lock(&kvm->mmu_lock);
-		/*
-		 * Clear dirty bits only on 4k SPTEs since the legacy MMU only
-		 * support dirty logging at a 4k granularity.
-		 */
-		walk_slot_rmaps_4k(kvm, memslot, __rmap_clear_dirty, false);
+		kvm_shadow_mmu_clear_dirty_slot(kvm, memslot);
 		write_unlock(&kvm->mmu_lock);
 	}
 
diff --git a/arch/x86/kvm/mmu/shadow_mmu.c b/arch/x86/kvm/mmu/shadow_mmu.c
index 32a24530cf19a..b93a6174717d3 100644
--- a/arch/x86/kvm/mmu/shadow_mmu.c
+++ b/arch/x86/kvm/mmu/shadow_mmu.c
@@ -3453,3 +3453,67 @@ unsigned long kvm_shadow_mmu_shrink_scan(struct kvm *kvm, int pages_to_free)
 
 	return freed;
 }
+
+void kvm_shadow_mmu_write_protect_pt_masked(struct kvm *kvm,
+					    struct kvm_memory_slot *slot,
+					    gfn_t gfn_offset, unsigned long mask)
+{
+	struct kvm_rmap_head *rmap_head;
+
+	while (mask) {
+		rmap_head = gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
+					PG_LEVEL_4K, slot);
+		rmap_write_protect(rmap_head, false);
+
+		/* clear the first set bit */
+		mask &= mask - 1;
+	}
+}
+
+void kvm_shadow_mmu_clear_dirty_pt_masked(struct kvm *kvm,
+					  struct kvm_memory_slot *slot,
+					  gfn_t gfn_offset, unsigned long mask)
+{
+	struct kvm_rmap_head *rmap_head;
+
+	while (mask) {
+		rmap_head = gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
+					PG_LEVEL_4K, slot);
+		__rmap_clear_dirty(kvm, rmap_head, slot);
+
+		/* clear the first set bit */
+		mask &= mask - 1;
+	}
+}
+
+bool kvm_shadow_mmu_write_protect_gfn(struct kvm *kvm,
+				      struct kvm_memory_slot *slot,
+				      u64 gfn, int min_level)
+{
+	struct kvm_rmap_head *rmap_head;
+	int i;
+	bool write_protected = false;
+
+	if (kvm_memslots_have_rmaps(kvm)) {
+		for (i = min_level; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) {
+			rmap_head = gfn_to_rmap(gfn, i, slot);
+			write_protected |= rmap_write_protect(rmap_head, true);
+		}
+	}
+
+	return write_protected;
+}
+
+void kvm_shadow_mmu_clear_dirty_slot(struct kvm *kvm,
+				     const struct kvm_memory_slot *memslot)
+{
+	walk_slot_rmaps_4k(kvm, memslot, __rmap_clear_dirty, false);
+}
+
+void kvm_shadow_mmu_wrprot_slot(struct kvm *kvm,
+				const struct kvm_memory_slot *memslot,
+				int start_level)
+{
+	walk_slot_rmaps(kvm, memslot, slot_rmap_write_protect,
+			start_level, KVM_MAX_HUGEPAGE_LEVEL, false);
+}
diff --git a/arch/x86/kvm/mmu/shadow_mmu.h b/arch/x86/kvm/mmu/shadow_mmu.h
index 82eed9bb9bc9a..58f48293b4773 100644
--- a/arch/x86/kvm/mmu/shadow_mmu.h
+++ b/arch/x86/kvm/mmu/shadow_mmu.h
@@ -117,6 +117,21 @@ void kvm_shadow_mmu_zap_collapsible_sptes(struct kvm *kvm,
 bool kvm_shadow_mmu_has_zapped_obsolete_pages(struct kvm *kvm);
 unsigned long kvm_shadow_mmu_shrink_scan(struct kvm *kvm, int pages_to_free);
 
+void kvm_shadow_mmu_write_protect_pt_masked(struct kvm *kvm,
+					    struct kvm_memory_slot *slot,
+					    gfn_t gfn_offset, unsigned long mask);
+void kvm_shadow_mmu_clear_dirty_pt_masked(struct kvm *kvm,
+					  struct kvm_memory_slot *slot,
+					  gfn_t gfn_offset, unsigned long mask);
+bool kvm_shadow_mmu_write_protect_gfn(struct kvm *kvm,
+				      struct kvm_memory_slot *slot,
+				      u64 gfn, int min_level);
+void kvm_shadow_mmu_clear_dirty_slot(struct kvm *kvm,
+				     const struct kvm_memory_slot *memslot);
+void kvm_shadow_mmu_wrprot_slot(struct kvm *kvm,
+				const struct kvm_memory_slot *memslot,
+				int start_level);
+
 /* Exports from paging_tmpl.h */
 gpa_t paging32_gva_to_gpa(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 			  gpa_t vaddr, u64 access,

From patchwork Thu Feb  2 18:28:03 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ben Gardon <bgardon@google.com>
X-Patchwork-Id: 52132
Date: Thu,  2 Feb 2023 18:28:03 +0000
In-Reply-To: <20230202182809.1929122-1-bgardon@google.com>
Mime-Version: 1.0
References: <20230202182809.1929122-1-bgardon@google.com>
X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog
Message-ID: <20230202182809.1929122-16-bgardon@google.com>
Subject: [PATCH 15/21] KVM: x86/MMU: Remove unneeded exports from shadow_mmu.c
From: Ben Gardon <bgardon@google.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
        Sean Christopherson <seanjc@google.com>,
        David Matlack <dmatlack@google.com>,
        Vipin Sharma <vipinsh@google.com>,
        Ricardo Koller <ricarkol@google.com>,
        Ben Gardon <bgardon@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Now that the various dirty logging / wrprot function implementations are
in shadow_mmu.c, do another round of cleanups: functions which no longer
need to be exposed are dropped from shadow_mmu.h and marked static.

No functional change intended.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/shadow_mmu.c | 32 +++++++++++++++++++-------------
 arch/x86/kvm/mmu/shadow_mmu.h | 18 ------------------
 2 files changed, 19 insertions(+), 31 deletions(-)

diff --git a/arch/x86/kvm/mmu/shadow_mmu.c b/arch/x86/kvm/mmu/shadow_mmu.c
index b93a6174717d3..dc5c4b9899cc6 100644
--- a/arch/x86/kvm/mmu/shadow_mmu.c
+++ b/arch/x86/kvm/mmu/shadow_mmu.c
@@ -634,8 +634,8 @@ unsigned int pte_list_count(struct kvm_rmap_head *rmap_head)
 	return count;
 }
 
-struct kvm_rmap_head *gfn_to_rmap(gfn_t gfn, int level,
-				  const struct kvm_memory_slot *slot)
+static struct kvm_rmap_head *gfn_to_rmap(gfn_t gfn, int level,
+					 const struct kvm_memory_slot *slot)
 {
 	unsigned long idx;
 
@@ -803,7 +803,7 @@ static bool spte_write_protect(u64 *sptep, bool pt_protect)
 	return mmu_spte_update(sptep, spte);
 }
 
-bool rmap_write_protect(struct kvm_rmap_head *rmap_head, bool pt_protect)
+static bool rmap_write_protect(struct kvm_rmap_head *rmap_head, bool pt_protect)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
@@ -842,8 +842,8 @@ static bool spte_wrprot_for_clear_dirty(u64 *sptep)
  *	- W bit on ad-disabled SPTEs.
  * Returns true iff any D or W bits were cleared.
  */
-bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
-			const struct kvm_memory_slot *slot)
+static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+			       const struct kvm_memory_slot *slot)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
@@ -3057,6 +3057,11 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 	write_unlock(&vcpu->kvm->mmu_lock);
 }
 
+/* The return value indicates if tlb flush on all vcpus is needed. */
+typedef bool (*slot_rmaps_handler) (struct kvm *kvm,
+				    struct kvm_rmap_head *rmap_head,
+				    const struct kvm_memory_slot *slot);
+
 static __always_inline bool __walk_slot_rmaps(struct kvm *kvm,
 					      const struct kvm_memory_slot *slot,
 					      slot_rmaps_handler fn,
@@ -3087,20 +3092,21 @@ static __always_inline bool __walk_slot_rmaps(struct kvm *kvm,
 	return flush;
 }
 
-__always_inline bool walk_slot_rmaps(struct kvm *kvm,
-				     const struct kvm_memory_slot *slot,
-				     slot_rmaps_handler fn, int start_level,
-				     int end_level, bool flush_on_yield)
+static __always_inline bool walk_slot_rmaps(struct kvm *kvm,
+					    const struct kvm_memory_slot *slot,
+					    slot_rmaps_handler fn,
+					    int start_level, int end_level,
+					    bool flush_on_yield)
 {
 	return __walk_slot_rmaps(kvm, slot, fn, start_level, end_level,
 				 slot->base_gfn, slot->base_gfn + slot->npages - 1,
 				 flush_on_yield, false);
 }
 
-__always_inline bool walk_slot_rmaps_4k(struct kvm *kvm,
-					const struct kvm_memory_slot *slot,
-					slot_rmaps_handler fn,
-					bool flush_on_yield)
+static __always_inline bool walk_slot_rmaps_4k(struct kvm *kvm,
+					       const struct kvm_memory_slot *slot,
+					       slot_rmaps_handler fn,
+					       bool flush_on_yield)
 {
 	return walk_slot_rmaps(kvm, slot, fn, PG_LEVEL_4K,
 			       PG_LEVEL_4K, flush_on_yield);
diff --git a/arch/x86/kvm/mmu/shadow_mmu.h b/arch/x86/kvm/mmu/shadow_mmu.h
index 58f48293b4773..36fe8013931d2 100644
--- a/arch/x86/kvm/mmu/shadow_mmu.h
+++ b/arch/x86/kvm/mmu/shadow_mmu.h
@@ -39,11 +39,6 @@ struct pte_list_desc {
 /* Only exported for debugfs.c. */
 unsigned int pte_list_count(struct kvm_rmap_head *rmap_head);
 
-struct kvm_rmap_head *gfn_to_rmap(gfn_t gfn, int level,
-				  const struct kvm_memory_slot *slot);
-bool rmap_write_protect(struct kvm_rmap_head *rmap_head, bool pt_protect);
-bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
-			const struct kvm_memory_slot *slot);
 bool kvm_zap_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 		  struct kvm_memory_slot *slot, gfn_t gfn, int level,
 		  pte_t unused);
@@ -91,22 +86,9 @@ int kvm_shadow_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
 void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 		       int bytes, struct kvm_page_track_notifier_node *node);
 
-/* The return value indicates if tlb flush on all vcpus is needed. */
-typedef bool (*slot_rmaps_handler) (struct kvm *kvm,
-				    struct kvm_rmap_head *rmap_head,
-				    const struct kvm_memory_slot *slot);
-bool walk_slot_rmaps(struct kvm *kvm, const struct kvm_memory_slot *slot,
-		       slot_rmaps_handler fn, int start_level, int end_level,
-		       bool flush_on_yield);
-bool walk_slot_rmaps_4k(struct kvm *kvm, const struct kvm_memory_slot *slot,
-			slot_rmaps_handler fn, bool flush_on_yield);
-
 void kvm_shadow_mmu_zap_obsolete_pages(struct kvm *kvm);
 bool kvm_shadow_mmu_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
 
-bool slot_rmap_write_protect(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
-			     const struct kvm_memory_slot *slot);
-
 void kvm_shadow_mmu_try_split_huge_pages(struct kvm *kvm,
 					 const struct kvm_memory_slot *slot,
 					 gfn_t start, gfn_t end,

From patchwork Thu Feb  2 18:28:04 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ben Gardon <bgardon@google.com>
X-Patchwork-Id: 52133
Date: Thu,  2 Feb 2023 18:28:04 +0000
In-Reply-To: <20230202182809.1929122-1-bgardon@google.com>
Mime-Version: 1.0
References: <20230202182809.1929122-1-bgardon@google.com>
X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog
Message-ID: <20230202182809.1929122-17-bgardon@google.com>
Subject: [PATCH 16/21] KVM: x86/MMU: Wrap uses of kvm_handle_gfn_range in
 mmu.c
From: Ben Gardon <bgardon@google.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
        Sean Christopherson <seanjc@google.com>,
        David Matlack <dmatlack@google.com>,
        Vipin Sharma <vipinsh@google.com>,
        Ricardo Koller <ricarkol@google.com>,
        Ben Gardon <bgardon@google.com>
X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED,
        DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,
        SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham
        autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1756745318238948593?=
X-GMAIL-MSGID: =?utf-8?q?1756745318238948593?=

kvm_handle_gfn_range() with a handler callback is not a bad interface,
but using it from mmu.c requires exporting the whole callback scheme
(rmap_handler_t and each rmap handler) from shadow_mmu.c. Simplify the
interface with basic wrapper functions, one per call site in mmu.c,
making the callback scheme internal to shadow_mmu.c.

No functional change intended.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
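Illustration only, not part of the patch: a minimal userspace sketch of
the wrapping pattern described above. All names here (struct gfn_range,
range_handler_t, zap_one, test_age_one, shadow_unmap_gfn_range,
shadow_test_age_gfn) are hypothetical stand-ins rather than KVM symbols,
and the assumption that the walker ORs the per-gfn handler results is a
simplification for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm_gfn_range: half-open [start, end). */
struct gfn_range {
	unsigned long start;
	unsigned long end;
};

/* Callback type; after the change it is visible only inside this file. */
typedef bool (*range_handler_t)(unsigned long gfn);

/* Toy handler: pretend every gfn had a mapping that was zapped. */
static bool zap_one(unsigned long gfn)
{
	(void)gfn;
	return true;
}

/* Toy handler: pretend only odd gfns were accessed. */
static bool test_age_one(unsigned long gfn)
{
	return (gfn & 1) != 0;
}

/* Generic walker (file-local): ORs the handler result over the range. */
static bool handle_gfn_range(const struct gfn_range *range,
			     range_handler_t handler)
{
	bool ret = false;
	unsigned long gfn;

	assert(handler != NULL);
	for (gfn = range->start; gfn < range->end; gfn++)
		ret |= handler(gfn);

	return ret;
}

/* Public wrappers: callers never see the handler type or the handlers. */
bool shadow_unmap_gfn_range(const struct gfn_range *range)
{
	return handle_gfn_range(range, zap_one);
}

bool shadow_test_age_gfn(const struct gfn_range *range)
{
	return handle_gfn_range(range, test_age_one);
}
```

With this shape, a header would only need to declare the two wrappers;
the typedef, the handlers and the walker can all be static.
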
 arch/x86/kvm/mmu/mmu.c        |  8 +++---
 arch/x86/kvm/mmu/shadow_mmu.c | 54 +++++++++++++++++++++++++----------
 arch/x86/kvm/mmu/shadow_mmu.h | 25 ++++------------
 3 files changed, 48 insertions(+), 39 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 44a00396284d5..156ab2e4cd811 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -490,7 +490,7 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 	bool flush = false;
 
 	if (kvm_memslots_have_rmaps(kvm))
-		flush = kvm_handle_gfn_range(kvm, range, kvm_zap_rmap);
+		flush = kvm_shadow_mmu_unmap_gfn_range(kvm, range);
 
 	if (tdp_mmu_enabled)
 		flush = kvm_tdp_mmu_unmap_gfn_range(kvm, range, flush);
@@ -503,7 +503,7 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 	bool flush = false;
 
 	if (kvm_memslots_have_rmaps(kvm))
-		flush = kvm_handle_gfn_range(kvm, range, kvm_set_pte_rmap);
+		flush = kvm_shadow_mmu_set_spte_gfn(kvm, range);
 
 	if (tdp_mmu_enabled)
 		flush |= kvm_tdp_mmu_set_spte_gfn(kvm, range);
@@ -516,7 +516,7 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 	bool young = false;
 
 	if (kvm_memslots_have_rmaps(kvm))
-		young = kvm_handle_gfn_range(kvm, range, kvm_age_rmap);
+		young = kvm_shadow_mmu_age_gfn_range(kvm, range);
 
 	if (tdp_mmu_enabled)
 		young |= kvm_tdp_mmu_age_gfn_range(kvm, range);
@@ -529,7 +529,7 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 	bool young = false;
 
 	if (kvm_memslots_have_rmaps(kvm))
-		young = kvm_handle_gfn_range(kvm, range, kvm_test_age_rmap);
+		young = kvm_shadow_mmu_test_age_gfn(kvm, range);
 
 	if (tdp_mmu_enabled)
 		young |= kvm_tdp_mmu_test_age_gfn(kvm, range);
diff --git a/arch/x86/kvm/mmu/shadow_mmu.c b/arch/x86/kvm/mmu/shadow_mmu.c
index dc5c4b9899cc6..dfff65db97c3b 100644
--- a/arch/x86/kvm/mmu/shadow_mmu.c
+++ b/arch/x86/kvm/mmu/shadow_mmu.c
@@ -864,16 +864,16 @@ static bool __kvm_zap_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 	return kvm_zap_all_rmap_sptes(kvm, rmap_head);
 }
 
-bool kvm_zap_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
-		  struct kvm_memory_slot *slot, gfn_t gfn, int level,
-		  pte_t unused)
+static bool kvm_zap_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+			 struct kvm_memory_slot *slot, gfn_t gfn, int level,
+			 pte_t unused)
 {
 	return __kvm_zap_rmap(kvm, rmap_head, slot);
 }
 
-bool kvm_set_pte_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
-		      struct kvm_memory_slot *slot, gfn_t gfn, int level,
-		      pte_t pte)
+static bool kvm_set_pte_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+			     struct kvm_memory_slot *slot, gfn_t gfn, int level,
+			     pte_t pte)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
@@ -980,9 +980,13 @@ static void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
 	     slot_rmap_walk_okay(_iter_);				\
 	     slot_rmap_walk_next(_iter_))
 
-__always_inline bool kvm_handle_gfn_range(struct kvm *kvm,
-					  struct kvm_gfn_range *range,
-					  rmap_handler_t handler)
+typedef bool (*rmap_handler_t)(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+			       struct kvm_memory_slot *slot, gfn_t gfn,
+			       int level, pte_t pte);
+
+static __always_inline bool
+kvm_handle_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range,
+		     rmap_handler_t handler)
 {
 	struct slot_rmap_walk_iterator iterator;
 	bool ret = false;
@@ -995,9 +999,9 @@ __always_inline bool kvm_handle_gfn_range(struct kvm *kvm,
 	return ret;
 }
 
-bool kvm_age_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
-		  struct kvm_memory_slot *slot, gfn_t gfn, int level,
-		  pte_t unused)
+static bool kvm_age_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+			 struct kvm_memory_slot *slot, gfn_t gfn, int level,
+			 pte_t unused)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
@@ -1009,9 +1013,9 @@ bool kvm_age_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 	return young;
 }
 
-bool kvm_test_age_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
-		       struct kvm_memory_slot *slot, gfn_t gfn,
-		       int level, pte_t unused)
+static bool kvm_test_age_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+			      struct kvm_memory_slot *slot, gfn_t gfn,
+			      int level, pte_t unused)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
@@ -3523,3 +3527,23 @@ void kvm_shadow_mmu_wrprot_slot(struct kvm *kvm,
 	walk_slot_rmaps(kvm, memslot, slot_rmap_write_protect,
 			start_level, KVM_MAX_HUGEPAGE_LEVEL, false);
 }
+
+bool kvm_shadow_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	return kvm_handle_gfn_range(kvm, range, kvm_zap_rmap);
+}
+
+bool kvm_shadow_mmu_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	return kvm_handle_gfn_range(kvm, range, kvm_set_pte_rmap);
+}
+
+bool kvm_shadow_mmu_age_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	return kvm_handle_gfn_range(kvm, range, kvm_age_rmap);
+}
+
+bool kvm_shadow_mmu_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	return kvm_handle_gfn_range(kvm, range, kvm_test_age_rmap);
+}
diff --git a/arch/x86/kvm/mmu/shadow_mmu.h b/arch/x86/kvm/mmu/shadow_mmu.h
index 36fe8013931d2..e4fbc842f524e 100644
--- a/arch/x86/kvm/mmu/shadow_mmu.h
+++ b/arch/x86/kvm/mmu/shadow_mmu.h
@@ -39,26 +39,6 @@ struct pte_list_desc {
 /* Only exported for debugfs.c. */
 unsigned int pte_list_count(struct kvm_rmap_head *rmap_head);
 
-bool kvm_zap_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
-		  struct kvm_memory_slot *slot, gfn_t gfn, int level,
-		  pte_t unused);
-bool kvm_set_pte_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
-		      struct kvm_memory_slot *slot, gfn_t gfn, int level,
-		      pte_t pte);
-
-typedef bool (*rmap_handler_t)(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
-			       struct kvm_memory_slot *slot, gfn_t gfn,
-			       int level, pte_t pte);
-bool kvm_handle_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range,
-			  rmap_handler_t handler);
-
-bool kvm_age_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
-		  struct kvm_memory_slot *slot, gfn_t gfn, int level,
-		  pte_t unused);
-bool kvm_test_age_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
-		       struct kvm_memory_slot *slot, gfn_t gfn,
-		       int level, pte_t unused);
-
 void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp);
 
 bool __kvm_shadow_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
@@ -114,6 +94,11 @@ void kvm_shadow_mmu_wrprot_slot(struct kvm *kvm,
 				const struct kvm_memory_slot *memslot,
 				int start_level);
 
+bool kvm_shadow_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
+bool kvm_shadow_mmu_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
+bool kvm_shadow_mmu_age_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
+bool kvm_shadow_mmu_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
+
 /* Exports from paging_tmpl.h */
 gpa_t paging32_gva_to_gpa(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 			  gpa_t vaddr, u64 access,

From patchwork Thu Feb  2 18:28:05 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ben Gardon <bgardon@google.com>
X-Patchwork-Id: 52128
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp401260wrn;
        Thu, 2 Feb 2023 10:30:53 -0800 (PST)
X-Google-Smtp-Source: 
 AK7set+yCM17u7WxkYlR/59Hj75LUw+S0/+IONNynM02UJwmgxtoDmoRYJU29/loM+aBQMnd1IAo
X-Received: by 2002:a17:90b:1d89:b0:22b:b346:4d86 with SMTP id
 pf9-20020a17090b1d8900b0022bb3464d86mr7334993pjb.43.1675362653189;
        Thu, 02 Feb 2023 10:30:53 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1675362653; cv=none;
        d=google.com; s=arc-20160816;
        b=ktIL2Dipn+jZzubsiWNMUl6zWz0sItFQHV5Ea52KafRx/ky62Kyl6cL/BadYiV8Kte
         gis7RKFqYHkks+B952HoODvp7/Zb5ijc8CpN+7kLS2hBiemoitkx9iWDMXlxP8L8aprJ
         +R52R1VgQ/XBfL58rwuIQxDX2WmZKRIsG3GmSDR8aS650ihs27gjambEDtdEUwnErvbU
         cJrYOibPGOjNrHYeb3WMgaC23AmdjkFhoa68kB6oKj+vCimi8U/r5oNUVND3hI+H58rf
         Hisqmq3yXfajep3AsVurl4lQf+Mi4JQvJ0zrguB4aYWOUc9sO3rd5nn6lF5trBH/T5zX
         DKgw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:cc:to:from:subject:message-id:references
         :mime-version:in-reply-to:date:dkim-signature;
        bh=qC/QhOftpR+PdNazZD7MOW3k7jLGZsIPHB5S26OPowU=;
        b=ZL7+ZrVLPvYnlcerN2ycsTl1LMkD0/60doUMqxiIr06Citudh/4mIgyASI0RYWDNyJ
         RGP96LYq72Zxyk2d5xdyBapDgnuNXPoO997GI9xkNXcfozCRLlEveuB/YaXBjs2VbQRn
         +/0F23Q4qJ5+UaPREJNBVjCv+HROCJRoVHXnruOftFvr8x4Pv7l23GtLxrXiyL15GJ+P
         b6dq+oPySwBtzdZGK+lHsX/vSVc+4+I5iFsSMISEFnRIFbVGEE5aDw4r+CwXpsZktdTS
         0mA7NU5Yf2+Qb0QEFb2Abf1YEPwGVBfY+HHr84x3VrpCUw7fGkiJqYJhoVcuMqttX+kT
         fFHA==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@google.com header.s=20210112 header.b=r1a9O5Vv;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 b20-20020a17090a551400b0022c8ba1bd77si5162157pji.174.2023.02.02.10.30.39;
        Thu, 02 Feb 2023 10:30:53 -0800 (PST)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@google.com header.s=20210112 header.b=r1a9O5Vv;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S232733AbjBBSaA (ORCPT <rfc822;il.mystafa@gmail.com> + 99 others);
        Thu, 2 Feb 2023 13:30:00 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33040 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S232622AbjBBS3X (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 2 Feb 2023 13:29:23 -0500
Received: from mail-pf1-x449.google.com (mail-pf1-x449.google.com
 [IPv6:2607:f8b0:4864:20::449])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DBE933CE3E
        for <linux-kernel@vger.kernel.org>;
 Thu,  2 Feb 2023 10:28:39 -0800 (PST)
Received: by mail-pf1-x449.google.com with SMTP id
 a27-20020aa78e9b000000b00593f636220cso1361496pfr.11
        for <linux-kernel@vger.kernel.org>;
 Thu, 02 Feb 2023 10:28:39 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=qC/QhOftpR+PdNazZD7MOW3k7jLGZsIPHB5S26OPowU=;
        b=r1a9O5Vve17d01xB8zezcTdgAlg0zf+sD9UAz4qcAMUw6GmPEg9i6PFdfS0Qo2hbZz
         dmH62gkel4+bp++gx3KsQzO9hkJ3PfXwqOeVcG04pAA4INKjcm9IZXfVmvFdX0ZFdX7q
         RRbS8huuFX/aPZzBu6FQ6BSGfdggwGarEUFvGhMUKKP3QTZ7/d9dVCLgqFGkLPM45bSm
         mT4caN6RZDes88eDM/TIXJvSsg3HneHoRGMup7f44fn4YiqtA7NlyZxR2pRDtEXaLSIs
         1wB2aQR7pBWHFsNLVz6Idp3Czx1UYu5T4aH1YZJ3kO8MiOmDKPARWd1CmaGiacbUZFoE
         0Pvw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=qC/QhOftpR+PdNazZD7MOW3k7jLGZsIPHB5S26OPowU=;
        b=uGh3jrEoTvb1Ett6hUSfBprLwxzXF3jSQND/wR60sx8ydAZ4ezomSYBoRVpp6F1U73
         qET+J1LjTnsu8xB8V8QyelaBJInIGy+4rhiQdOgUfhTmZPLuodUtH7bAoMWSdXpJG6rK
         S6ce4f3A3Zc+VWGIVHHfNG6CvchWVd76X2JonXaEpUs1ln/E9TODVg6xpaSwM3vjG+7y
         30mXkW8iWB2pqd5cfWfoY71Rkpdoocgg7IRfgUljv4fff2RJWsljfONfMGEDZJRzZeBk
         cE0ROWOG9cpfEEzUNCh5D7GRkHS4Tk0FJlLP63GZ17ozIReHMj3jahP/O4fUc4Zu27GY
         EfLw==
X-Gm-Message-State: AO0yUKWy6hGQDveLk2cJHUqwvzJx/usUm9ka7jydn+/72rbZt6+VA2+f
        +BYXJRyJxMBJIYYH2mquGoakRSh3UWbNe0k52IOdyH+j4y+dwwSuUPgv5IUsu8QPHFq2oE87fwI
        vFb7jiBpMPSP8Murhr65Lk00SOlrT9GwMAmqy/iGtpOBY2EWmaF1r9bW4z2CUagPoBQ4Z13x0
X-Received: from sweer.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:e45])
 (user=bgardon job=sendgmr) by 2002:aa7:8104:0:b0:592:591c:f6dd with SMTP id
 b4-20020aa78104000000b00592591cf6ddmr1613722pfi.7.1675362519372; Thu, 02 Feb
 2023 10:28:39 -0800 (PST)
Date: Thu,  2 Feb 2023 18:28:05 +0000
In-Reply-To: <20230202182809.1929122-1-bgardon@google.com>
Mime-Version: 1.0
References: <20230202182809.1929122-1-bgardon@google.com>
X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog
Message-ID: <20230202182809.1929122-18-bgardon@google.com>
Subject: [PATCH 17/21] KVM: x86/MMU: Add kvm_shadow_mmu_ to the last few
 functions in shadow_mmu.h
From: Ben Gardon <bgardon@google.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
        Sean Christopherson <seanjc@google.com>,
        David Matlack <dmatlack@google.com>,
        Vipin Sharma <vipinsh@google.com>,
        Ricardo Koller <ricarkol@google.com>,
        Ben Gardon <bgardon@google.com>
X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED,
        DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,
        SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=unavailable
        autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1756745069180904849?=
X-GMAIL-MSGID: =?utf-8?q?1756745069180904849?=

Add the kvm_shadow_mmu_ prefix to the last few Shadow MMU functions
exported in shadow_mmu.h. This gives a clean and obvious interface
between the shared x86 MMU code and the Shadow MMU. A few functions
exported from paging_tmpl.h are left as-is; renaming those will need to
be done separately, if at all.

No functional change intended.

Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/mmu.c        | 19 ++++++++-------
 arch/x86/kvm/mmu/shadow_mmu.c | 44 +++++++++++++++++++----------------
 arch/x86/kvm/mmu/shadow_mmu.h | 16 +++++++------
 3 files changed, 43 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 156ab2e4cd811..f5b9db00eff99 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -884,7 +884,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		if (tdp_mmu_enabled)
 			sptep = kvm_tdp_mmu_fast_pf_get_last_sptep(vcpu, fault->addr, &spte);
 		else
-			sptep = fast_pf_get_last_sptep(vcpu, fault->addr, &spte);
+			sptep = kvm_shadow_mmu_fast_pf_get_last_sptep(vcpu, fault->addr, &spte);
 
 		if (!is_shadow_present_pte(spte))
 			break;
@@ -1073,7 +1073,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 		root = kvm_tdp_mmu_get_vcpu_root_hpa(vcpu);
 		mmu->root.hpa = root;
 	} else if (shadow_root_level >= PT64_ROOT_4LEVEL) {
-		root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level);
+		root = kvm_shadow_mmu_alloc_root(vcpu, 0, 0, shadow_root_level);
 		mmu->root.hpa = root;
 	} else if (shadow_root_level == PT32E_ROOT_LEVEL) {
 		if (WARN_ON_ONCE(!mmu->pae_root)) {
@@ -1084,8 +1084,8 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 		for (i = 0; i < 4; ++i) {
 			WARN_ON_ONCE(IS_VALID_PAE_ROOT(mmu->pae_root[i]));
 
-			root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT), 0,
-					      PT32_ROOT_LEVEL);
+			root = kvm_shadow_mmu_alloc_root(vcpu,
+					i << (30 - PAGE_SHIFT), 0, PT32_ROOT_LEVEL);
 			mmu->pae_root[i] = root | PT_PRESENT_MASK |
 					   shadow_me_value;
 		}
@@ -1663,7 +1663,7 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd)
 	 * count. Otherwise, clear the write flooding count.
 	 */
 	if (!new_role.direct)
-		__clear_sp_write_flooding_count(
+		kvm_shadow_mmu_clear_sp_write_flooding_count(
 				to_shadow_page(vcpu->arch.mmu->root.hpa));
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_new_pgd);
@@ -2439,13 +2439,13 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 	r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->root_role.direct);
 	if (r)
 		goto out;
-	r = mmu_alloc_special_roots(vcpu);
+	r = kvm_shadow_mmu_alloc_special_roots(vcpu);
 	if (r)
 		goto out;
 	if (vcpu->arch.mmu->root_role.direct)
 		r = mmu_alloc_direct_roots(vcpu);
 	else
-		r = mmu_alloc_shadow_roots(vcpu);
+		r = kvm_shadow_mmu_alloc_shadow_roots(vcpu);
 	if (r)
 		goto out;
 
@@ -2674,7 +2674,8 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
 	 * generally doesn't use PAE paging and can skip allocating the PDP
 	 * table.  The main exception, handled here, is SVM's 32-bit NPT.  The
 	 * other exception is for shadowing L1's 32-bit or PAE NPT on 64-bit
-	 * KVM; that horror is handled on-demand by mmu_alloc_special_roots().
+	 * KVM; that horror is handled on-demand by
+	 * kvm_shadow_mmu_alloc_special_roots().
 	 */
 	if (tdp_enabled && kvm_mmu_get_tdp_level(vcpu) > PT32E_ROOT_LEVEL)
 		return 0;
@@ -2817,7 +2818,7 @@ int kvm_mmu_init_vm(struct kvm *kvm)
 			return r;
 	}
 
-	node->track_write = kvm_mmu_pte_write;
+	node->track_write = kvm_shadow_mmu_pte_write;
 	node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
 	kvm_page_track_register_notifier(kvm, node);
 
diff --git a/arch/x86/kvm/mmu/shadow_mmu.c b/arch/x86/kvm/mmu/shadow_mmu.c
index dfff65db97c3b..eb4424fedd73a 100644
--- a/arch/x86/kvm/mmu/shadow_mmu.c
+++ b/arch/x86/kvm/mmu/shadow_mmu.c
@@ -1404,14 +1404,14 @@ static int mmu_sync_children(struct kvm_vcpu *vcpu, struct kvm_mmu_page *parent,
 	return 0;
 }
 
-void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
+void kvm_shadow_mmu_clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
 {
 	atomic_set(&sp->write_flooding_count,  0);
 }
 
 static void clear_sp_write_flooding_count(u64 *spte)
 {
-	__clear_sp_write_flooding_count(sptep_to_sp(spte));
+	kvm_shadow_mmu_clear_sp_write_flooding_count(sptep_to_sp(spte));
 }
 
 /*
@@ -1482,7 +1482,7 @@ static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
 				kvm_flush_remote_tlbs(kvm);
 		}
 
-		__clear_sp_write_flooding_count(sp);
+		kvm_shadow_mmu_clear_sp_write_flooding_count(sp);
 
 		goto out;
 	}
@@ -1607,12 +1607,13 @@ static union kvm_mmu_page_role kvm_mmu_child_role(u64 *sptep, bool direct,
 	 * Concretely, a 4-byte PDE consumes bits 31:22, while an 8-byte PDE
 	 * consumes bits 29:21.  To consume bits 31:30, KVM's uses 4 shadow
 	 * PDPTEs; those 4 PAE page directories are pre-allocated and their
-	 * quadrant is assigned in mmu_alloc_root().   A 4-byte PTE consumes
-	 * bits 21:12, while an 8-byte PTE consumes bits 20:12.  To consume
-	 * bit 21 in the PTE (the child here), KVM propagates that bit to the
-	 * quadrant, i.e. sets quadrant to '0' or '1'.  The parent 8-byte PDE
-	 * covers bit 21 (see above), thus the quadrant is calculated from the
-	 * _least_ significant bit of the PDE index.
+	 * quadrant is assigned in kvm_shadow_mmu_alloc_root().
+	 * A 4-byte PTE consumes bits 21:12, while an 8-byte PTE consumes
+	 * bits 20:12.  To consume bit 21 in the PTE (the child here), KVM
+	 * propagates that bit to the quadrant, i.e. sets quadrant to
+	 * '0' or '1'.  The parent 8-byte PDE covers bit 21 (see above), thus
+	 * the quadrant is calculated from the _least_ significant bit of the
+	 * PDE index.
 	 */
 	if (role.has_4_byte_gpte) {
 		WARN_ON_ONCE(role.level != PG_LEVEL_4K);
@@ -2389,7 +2390,8 @@ int kvm_shadow_mmu_direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *faul
  *  - Must be called between walk_shadow_page_lockless_{begin,end}.
  *  - The returned sptep must not be used after walk_shadow_page_lockless_end.
  */
-u64 *fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, gpa_t gpa, u64 *spte)
+u64 *kvm_shadow_mmu_fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, gpa_t gpa,
+					   u64 *spte)
 {
 	struct kvm_shadow_walk_iterator iterator;
 	u64 old_spte;
@@ -2442,7 +2444,8 @@ static int mmu_check_root(struct kvm_vcpu *vcpu, gfn_t root_gfn)
 	return ret;
 }
 
-hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant, u8 level)
+hpa_t kvm_shadow_mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant,
+				u8 level)
 {
 	union kvm_mmu_page_role role = vcpu->arch.mmu->root_role;
 	struct kvm_mmu_page *sp;
@@ -2459,7 +2462,7 @@ hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant, u8 level)
 	return __pa(sp->spt);
 }
 
-static int mmu_first_shadow_root_alloc(struct kvm *kvm)
+static int kvm_shadow_mmu_first_shadow_root_alloc(struct kvm *kvm)
 {
 	struct kvm_memslots *slots;
 	struct kvm_memory_slot *slot;
@@ -2520,7 +2523,7 @@ static int mmu_first_shadow_root_alloc(struct kvm *kvm)
 	return r;
 }
 
-int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
+int kvm_shadow_mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
 	u64 pdptrs[4], pm_mask;
@@ -2549,7 +2552,7 @@ int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		}
 	}
 
-	r = mmu_first_shadow_root_alloc(vcpu->kvm);
+	r = kvm_shadow_mmu_first_shadow_root_alloc(vcpu->kvm);
 	if (r)
 		return r;
 
@@ -2563,8 +2566,8 @@ int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	 * write-protect the guests page table root.
 	 */
 	if (mmu->cpu_role.base.level >= PT64_ROOT_4LEVEL) {
-		root = mmu_alloc_root(vcpu, root_gfn, 0,
-				      mmu->root_role.level);
+		root = kvm_shadow_mmu_alloc_root(vcpu, root_gfn, 0,
+						 mmu->root_role.level);
 		mmu->root.hpa = root;
 		goto set_root_pgd;
 	}
@@ -2617,7 +2620,8 @@ int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		 */
 		quadrant = (mmu->cpu_role.base.level == PT32_ROOT_LEVEL) ? i : 0;
 
-		root = mmu_alloc_root(vcpu, root_gfn, quadrant, PT32_ROOT_LEVEL);
+		root = kvm_shadow_mmu_alloc_root(vcpu, root_gfn, quadrant,
+						 PT32_ROOT_LEVEL);
 		mmu->pae_root[i] = root | pm_mask;
 	}
 
@@ -2636,7 +2640,7 @@ int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	return r;
 }
 
-int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
+int kvm_shadow_mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
 	bool need_pml5 = mmu->root_role.level > PT64_ROOT_4LEVEL;
@@ -3009,8 +3013,8 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
 	return spte;
 }
 
-void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
-		       int bytes, struct kvm_page_track_notifier_node *node)
+void kvm_shadow_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+			      int bytes, struct kvm_page_track_notifier_node *node)
 {
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	struct kvm_mmu_page *sp;
diff --git a/arch/x86/kvm/mmu/shadow_mmu.h b/arch/x86/kvm/mmu/shadow_mmu.h
index e4fbc842f524e..4d39017873aa6 100644
--- a/arch/x86/kvm/mmu/shadow_mmu.h
+++ b/arch/x86/kvm/mmu/shadow_mmu.h
@@ -39,7 +39,7 @@ struct pte_list_desc {
 /* Only exported for debugfs.c. */
 unsigned int pte_list_count(struct kvm_rmap_head *rmap_head);
 
-void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp);
+void kvm_shadow_mmu_clear_sp_write_flooding_count(struct kvm_mmu_page *sp);
 
 bool __kvm_shadow_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
 				       struct list_head *invalid_list,
@@ -54,17 +54,19 @@ int kvm_shadow_mmu_make_pages_available(struct kvm_vcpu *vcpu);
 int kvm_shadow_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva);
 
 int kvm_shadow_mmu_direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
-u64 *fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, gpa_t gpa, u64 *spte);
+u64 *kvm_shadow_mmu_fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, gpa_t gpa,
+					   u64 *spte);
 
-hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant, u8 level);
-int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu);
-int mmu_alloc_special_roots(struct kvm_vcpu *vcpu);
+hpa_t kvm_shadow_mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant,
+				u8 level);
+int kvm_shadow_mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu);
+int kvm_shadow_mmu_alloc_special_roots(struct kvm_vcpu *vcpu);
 
 int kvm_shadow_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
 			    int *root_level);
 
-void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
-		       int bytes, struct kvm_page_track_notifier_node *node);
+void kvm_shadow_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+			      int bytes, struct kvm_page_track_notifier_node *node);
 
 void kvm_shadow_mmu_zap_obsolete_pages(struct kvm *kvm);
 bool kvm_shadow_mmu_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);

From patchwork Thu Feb  2 18:28:06 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ben Gardon <bgardon@google.com>
X-Patchwork-Id: 52134
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp404814wrn;
        Thu, 2 Feb 2023 10:38:25 -0800 (PST)
X-Google-Smtp-Source: 
 AK7set9vjyqp9Q4Men3pl2YobpoPYRWWShzQu9qby2mll8A0Pea4nzQox3sNDCA/mzWCu7iPVftq
X-Received: by 2002:aa7:9258:0:b0:593:1bab:1501 with SMTP id
 24-20020aa79258000000b005931bab1501mr5848715pfp.7.1675363105358;
        Thu, 02 Feb 2023 10:38:25 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1675363105; cv=none;
        d=google.com; s=arc-20160816;
        b=PxHCfmZhT9SH8zFCdMez0QZ6Ro/6n1+OsTVuPEzoUB06LjFwUISlJ8SlBPc1VL/5Ww
         byKL+azwiNHS2qu7wdOu6Jl5HAc+S3rrsjo0AfN1f5kgCt1owyuDrZpUZvMmRz0A729C
         wA9wr3E+hL+UykcvHWdMpmXJwgsEktj5jYd6EzCza0AMnAIe0TNxnQhjmn24j64UsBsj
         eqX4YR2bLHUH3TeCm3+aUeYjt+Cn/Coyra6b+3K8NVctIMydAB0SoH6lfDx027EVH3eJ
         IHrVhHksTOXiRmPcEQPTWBy7RgeF329bewMya3Lu7hv/LBKe9pugAATGJfYkWPxtPrUm
         58vw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:cc:to:from:subject:message-id:references
         :mime-version:in-reply-to:date:dkim-signature;
        bh=hIkQB5ezFR1E5ltI975hig3hYSC7kHPqCc3qy20DN/I=;
        b=aRKiNX6QrRG0Xn26edKr+mpmC8XHAse8f/hBXLhRdJgaOyt5g/cEED/g3JJNVEKlu2
         EpCLJCBfILggt4X8mLp8PqqHBQcbF8ICsubBkMg63RFZ3kne3Cnw3YB+nVUFVakTgYys
         SlJUVclFYzaIv2mOefQ7p7nNGs1QJQwCOr0j8RB18h7+lpPVDASEGtB5GIXNDLgBTubt
         uvRZYCe1WhzSlmW0LFhNdGNu6gNTHmtMx1lAdpgostZKgiqDYhgGjG7Pvqq871DYQ08S
         o9rTFPb3R0MJGHH8FP/Q21YsiJM7rQPe8i25/FLJuOVqsT9oa0t2dq2rYx++PfkN3co0
         mPkA==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@google.com header.s=20210112 header.b=Nah7cPJK;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 m65-20020a625844000000b0057629288720si12809pfb.176.2023.02.02.10.38.12;
        Thu, 02 Feb 2023 10:38:25 -0800 (PST)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@google.com header.s=20210112 header.b=Nah7cPJK;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S232875AbjBBSal (ORCPT <rfc822;il.mystafa@gmail.com> + 99 others);
        Thu, 2 Feb 2023 13:30:41 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33408 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S232845AbjBBS3y (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 2 Feb 2023 13:29:54 -0500
Received: from mail-pj1-x104a.google.com (mail-pj1-x104a.google.com
 [IPv6:2607:f8b0:4864:20::104a])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4F4747B7A2
        for <linux-kernel@vger.kernel.org>;
 Thu,  2 Feb 2023 10:28:57 -0800 (PST)
Received: by mail-pj1-x104a.google.com with SMTP id
 h1-20020a17090a9c0100b00230353d4d2aso1329382pjp.8
        for <linux-kernel@vger.kernel.org>;
 Thu, 02 Feb 2023 10:28:57 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=hIkQB5ezFR1E5ltI975hig3hYSC7kHPqCc3qy20DN/I=;
        b=Nah7cPJKZohGO4FLqLOCT6KxuTNsfV+DRfD03Pfh3QmDkv+5covs2CWSHe2yIVwXfy
         nzsvQ21unrgkeewgaki5RAYrp+L/N+1is0nm9d2enSCBuIHvCXgDGxzz9HNmPbxn8Dve
         Ry+C8Z0SasVDvtXWpPp+/1ju59GA4D2DoNm0HVzRzymsI9kvfweHQBu3f77CwZ8IKG9x
         xSji2nCTLE5LYWL/t3vN1QZeUARql/yPqoZFaiUjxJ+fI9HQwJhbcuMcQxe3VIjX4xL9
         JJHmZtsIale/8f3UdcBBR1n9edoP1NwTFR4num2+Uw9mrZyvFKNaMm6crnIqltr3EDFG
         7x8A==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=hIkQB5ezFR1E5ltI975hig3hYSC7kHPqCc3qy20DN/I=;
        b=jKxmDw6I7mOzIauBiolPy+eHr+wCpvAKgC6/BYVggDFAuFkmhcKeLG6fPC7l2qZTaM
         KgSFDv5GXen9VWde1+9BZhHXpoUb3ZIRpMJeBipicbDkEHQAtYZV1EJFBrP5IYnBrH5Q
         p1z1wnjOsGVYsdiCtmZj5MRorS5I29SYzueZz4jabmddzpl0h847LQyCfXndQ1hwWUg8
         kns9Dj8pWQ+wDpFXwONtRjs9aCw5aUon4w3Ng4rEfYqb2Zd9In+MFWb1BWrzyTxVfXrh
         viGE+gJJSbUFruq3cd9apymYIwE5R6OVSvMOC6CdwRZ6U3igslIm1w5mL6LBNCQ9upqj
         3dhw==
X-Gm-Message-State: AO0yUKV5XMZhHHx0i743VxBcZIrG+cFL2zUTMbhvDJef+2qbMPO0VxdK
        QGf3uo+ZUMme6IeBNeyWapkd2Ey+scazWPW5oj2LEQbC8sg/7RPxYB1dxejACE9cx9AZgK1EBbg
        x9su9hh393+3tXW3hFsoThX2WvWKNwOrAB+iEuFOe2AR+fBlY31Ccjo5tQl2vGM4eHz5CXM2J
X-Received: from sweer.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:e45])
 (user=bgardon job=sendgmr) by 2002:a17:902:f68e:b0:198:d453:68ef with SMTP id
 l14-20020a170902f68e00b00198d45368efmr213738plg.0.1675362521139; Thu, 02 Feb
 2023 10:28:41 -0800 (PST)
Date: Thu,  2 Feb 2023 18:28:06 +0000
In-Reply-To: <20230202182809.1929122-1-bgardon@google.com>
Mime-Version: 1.0
References: <20230202182809.1929122-1-bgardon@google.com>
X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog
Message-ID: <20230202182809.1929122-19-bgardon@google.com>
Subject: [PATCH 18/21] KVM: x86/mmu: Move split cache topup functions to
 shadow_mmu.c
From: Ben Gardon <bgardon@google.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
        Sean Christopherson <seanjc@google.com>,
        David Matlack <dmatlack@google.com>,
        Vipin Sharma <vipinsh@google.com>,
        Ricardo Koller <ricarkol@google.com>,
        Ben Gardon <bgardon@google.com>
X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED,
        DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,
        SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=unavailable
        autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1756745543759748100?=
X-GMAIL-MSGID: =?utf-8?q?1756745543759748100?=

The split cache topup functions are only used by the Shadow MMU and were
left behind in mmu.c when splitting the Shadow MMU out to a separate
file. Move them over as well, and make them static since they no longer
need to be declared in mmu_internal.h.

No functional change intended.

Suggested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
---
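Note (not part of the commit): a minimal userspace sketch of the check
that moves in this patch, assuming a cache can be modeled by nothing more
than its count of free objects. Every toy_* name and the value of
TOY_SPLIT_DESC_MIN are hypothetical stand-ins, not kernel definitions; the
need_resched()/rwlock_needbreak() half of the real check is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kvm_mmu_memory_cache: only the count matters. */
struct toy_cache {
	int nobjs;	/* objects available without allocating */
};

/* Illustrative value only; the kernel defines its own SPLIT_DESC_CACHE_MIN_NR_OBJECTS. */
#define TOY_SPLIT_DESC_MIN 4

/* Hypothetical stand-in for the three per-VM split caches. */
struct toy_split_caches {
	struct toy_cache desc;
	struct toy_cache page_header;
	struct toy_cache shadow_page;
};

/* Mirrors need_topup(): true when fewer than min objects are free. */
static bool toy_need_topup(const struct toy_cache *cache, int min)
{
	assert(min > 0);
	return cache->nobjs < min;
}

/*
 * Mirrors the cache half of need_topup_split_caches_or_resched(): the
 * descriptor cache must cover the worst case for one huge page, the other
 * two caches need a single object each.
 */
static bool toy_need_topup_split_caches(const struct toy_split_caches *c)
{
	return toy_need_topup(&c->desc, TOY_SPLIT_DESC_MIN) ||
	       toy_need_topup(&c->page_header, 1) ||
	       toy_need_topup(&c->shadow_page, 1);
}
```

If any one of the three caches falls below its minimum, the sketch reports
that a topup is needed, which in the real code is the point where the
caller would drop mmu_lock before allocating.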
 arch/x86/kvm/mmu/mmu.c          | 53 ---------------------------------
 arch/x86/kvm/mmu/mmu_internal.h |  2 --
 arch/x86/kvm/mmu/shadow_mmu.c   | 53 +++++++++++++++++++++++++++++++++
 3 files changed, 53 insertions(+), 55 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f5b9db00eff99..8514e998e2127 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2902,59 +2902,6 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 	}
 }
 
-static inline bool need_topup(struct kvm_mmu_memory_cache *cache, int min)
-{
-	return kvm_mmu_memory_cache_nr_free_objects(cache) < min;
-}
-
-bool need_topup_split_caches_or_resched(struct kvm *kvm)
-{
-	if (need_resched() || rwlock_needbreak(&kvm->mmu_lock))
-		return true;
-
-	/*
-	 * In the worst case, SPLIT_DESC_CACHE_MIN_NR_OBJECTS descriptors are needed
-	 * to split a single huge page. Calculating how many are actually needed
-	 * is possible but not worth the complexity.
-	 */
-	return need_topup(&kvm->arch.split_desc_cache, SPLIT_DESC_CACHE_MIN_NR_OBJECTS) ||
-	       need_topup(&kvm->arch.split_page_header_cache, 1) ||
-	       need_topup(&kvm->arch.split_shadow_page_cache, 1);
-}
-
-int topup_split_caches(struct kvm *kvm)
-{
-	/*
-	 * Allocating rmap list entries when splitting huge pages for nested
-	 * MMUs is uncommon as KVM needs to use a list if and only if there is
-	 * more than one rmap entry for a gfn, i.e. requires an L1 gfn to be
-	 * aliased by multiple L2 gfns and/or from multiple nested roots with
-	 * different roles.  Aliasing gfns when using TDP is atypical for VMMs;
-	 * a few gfns are often aliased during boot, e.g. when remapping BIOS,
-	 * but aliasing rarely occurs post-boot or for many gfns.  If there is
-	 * only one rmap entry, rmap->val points directly at that one entry and
-	 * doesn't need to allocate a list.  Buffer the cache by the default
-	 * capacity so that KVM doesn't have to drop mmu_lock to topup if KVM
-	 * encounters an aliased gfn or two.
-	 */
-	const int capacity = SPLIT_DESC_CACHE_MIN_NR_OBJECTS +
-			     KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE;
-	int r;
-
-	lockdep_assert_held(&kvm->slots_lock);
-
-	r = __kvm_mmu_topup_memory_cache(&kvm->arch.split_desc_cache, capacity,
-					 SPLIT_DESC_CACHE_MIN_NR_OBJECTS);
-	if (r)
-		return r;
-
-	r = kvm_mmu_topup_memory_cache(&kvm->arch.split_page_header_cache, 1);
-	if (r)
-		return r;
-
-	return kvm_mmu_topup_memory_cache(&kvm->arch.split_shadow_page_cache, 1);
-}
-
 /* Must be called with the mmu_lock held in write-mode. */
 void kvm_mmu_try_split_huge_pages(struct kvm *kvm,
 				   const struct kvm_memory_slot *memslot,
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 349d4a300ad34..2273c6263faf0 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -348,8 +348,6 @@ void walk_shadow_page_lockless_begin(struct kvm_vcpu *vcpu);
 void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu);
 
 int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect);
-bool need_topup_split_caches_or_resched(struct kvm *kvm);
-int topup_split_caches(struct kvm *kvm);
 
 bool is_page_fault_stale(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
 bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/mmu/shadow_mmu.c b/arch/x86/kvm/mmu/shadow_mmu.c
index eb4424fedd73a..bb23692d34a73 100644
--- a/arch/x86/kvm/mmu/shadow_mmu.c
+++ b/arch/x86/kvm/mmu/shadow_mmu.c
@@ -3219,6 +3219,59 @@ bool slot_rmap_write_protect(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 	return rmap_write_protect(rmap_head, false);
 }
 
+static inline bool need_topup(struct kvm_mmu_memory_cache *cache, int min)
+{
+	return kvm_mmu_memory_cache_nr_free_objects(cache) < min;
+}
+
+static bool need_topup_split_caches_or_resched(struct kvm *kvm)
+{
+	if (need_resched() || rwlock_needbreak(&kvm->mmu_lock))
+		return true;
+
+	/*
+	 * In the worst case, SPLIT_DESC_CACHE_MIN_NR_OBJECTS descriptors are needed
+	 * to split a single huge page. Calculating how many are actually needed
+	 * is possible but not worth the complexity.
+	 */
+	return need_topup(&kvm->arch.split_desc_cache, SPLIT_DESC_CACHE_MIN_NR_OBJECTS) ||
+	       need_topup(&kvm->arch.split_page_header_cache, 1) ||
+	       need_topup(&kvm->arch.split_shadow_page_cache, 1);
+}
+
+static int topup_split_caches(struct kvm *kvm)
+{
+	/*
+	 * Allocating rmap list entries when splitting huge pages for nested
+	 * MMUs is uncommon as KVM needs to use a list if and only if there is
+	 * more than one rmap entry for a gfn, i.e. requires an L1 gfn to be
+	 * aliased by multiple L2 gfns and/or from multiple nested roots with
+	 * different roles.  Aliasing gfns when using TDP is atypical for VMMs;
+	 * a few gfns are often aliased during boot, e.g. when remapping BIOS,
+	 * but aliasing rarely occurs post-boot or for many gfns.  If there is
+	 * only one rmap entry, rmap->val points directly at that one entry and
+	 * doesn't need to allocate a list.  Buffer the cache by the default
+	 * capacity so that KVM doesn't have to drop mmu_lock to topup if KVM
+	 * encounters an aliased gfn or two.
+	 */
+	const int capacity = SPLIT_DESC_CACHE_MIN_NR_OBJECTS +
+			     KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE;
+	int r;
+
+	lockdep_assert_held(&kvm->slots_lock);
+
+	r = __kvm_mmu_topup_memory_cache(&kvm->arch.split_desc_cache, capacity,
+					 SPLIT_DESC_CACHE_MIN_NR_OBJECTS);
+	if (r)
+		return r;
+
+	r = kvm_mmu_topup_memory_cache(&kvm->arch.split_page_header_cache, 1);
+	if (r)
+		return r;
+
+	return kvm_mmu_topup_memory_cache(&kvm->arch.split_shadow_page_cache, 1);
+}
+
 static struct kvm_mmu_page *shadow_mmu_get_sp_for_split(struct kvm *kvm, u64 *huge_sptep)
 {
 	struct kvm_mmu_page *huge_sp = sptep_to_sp(huge_sptep);

From patchwork Thu Feb  2 18:28:07 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ben Gardon <bgardon@google.com>
X-Patchwork-Id: 52136
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp405077wrn;
        Thu, 2 Feb 2023 10:39:02 -0800 (PST)
X-Google-Smtp-Source: 
 AK7set+uh84ULXQ+r+YY6LKiGM32WoucxxCmvoWTpzLBeBVkyM7FHuKCnC3qVU6l99AHtfPgB4qv
X-Received: by 2002:a17:902:dac6:b0:196:8124:dbe8 with SMTP id
 q6-20020a170902dac600b001968124dbe8mr8961450plx.61.1675363142006;
        Thu, 02 Feb 2023 10:39:02 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1675363141; cv=none;
        d=google.com; s=arc-20160816;
        b=kqxLHTCAFN+JzNymeYQiQs0s7/L/rwgOXaQl5Udzxv+HsLGrLU9Ezb+6n6zNmHxW1I
         mRikI6LSmkd/3UGpkAzyfSEgEmTNJuNRSOj81ZTj4fEvEoBvskOQmvtVegF5/FUU3dmf
         CgGxlVYYIE1mIHNmu+wRFTltE24C0L6edMzPSmFy3cHhNOdMqigpRNQfmc0D/u82AODM
         6K9qaotyrCWYZVGYyn2O0tvhuoMRLl1tLxoyE3Two1McTND/Qie63+9SvcKjFRls3fLN
         Foeoji8ufte7C70qaMxJJeXpVmbR74vt0ygcP0HQM8WCf9ed48qTv3/16bg6MBGZoeQf
         KDiw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:cc:to:from:subject:message-id:references
         :mime-version:in-reply-to:date:dkim-signature;
        bh=+tw1hsgT63166ej2zhrQQOmMvKwFEkphuz02olq7+08=;
        b=QKGrDGm4+bCDPhmwgPgfOGh4Ku50lqu5TKmYp72RviIQs9RdHR7jOEf1SVNE5+NnWl
         ZYc6OTAt5XtBqo+1Y0xsDjPv7Idq/jTJIE+Vh4+S07jZWmzzeh7GXeq8IGVnetkiUWib
         z5nxu8NDNjc3MwYnHPsajF0hPfQienx7QvpLkzy66slnxO0WkDZdAcs+LgNK1MTX8+TN
         aMVIKDoYk4L6jq54DfGKuZHLew3ppX7ZE8EaFHstozqaH96Ph6eBzKtvIa3n3xRBNCdi
         xkpFFl8G/eH95ViR1fj7Th2Vru/dyULBhneM+L/nTBvOD6C+yqxu6JLp6O86sSUbF1Wr
         U2Jg==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@google.com header.s=20210112 header.b=XUzVSAJx;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 w22-20020a170902a71600b00194b1c44205si21150988plq.523.2023.02.02.10.38.49;
        Thu, 02 Feb 2023 10:39:01 -0800 (PST)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@google.com header.s=20210112 header.b=XUzVSAJx;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S232955AbjBBSao (ORCPT <rfc822;il.mystafa@gmail.com> + 99 others);
        Thu, 2 Feb 2023 13:30:44 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33430 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S232861AbjBBS3z (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 2 Feb 2023 13:29:55 -0500
Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com
 [IPv6:2607:f8b0:4864:20::114a])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 42BC87BE42
        for <linux-kernel@vger.kernel.org>;
 Thu,  2 Feb 2023 10:28:58 -0800 (PST)
Received: by mail-yw1-x114a.google.com with SMTP id
 00721157ae682-524c76beb5eso1759077b3.21
        for <linux-kernel@vger.kernel.org>;
 Thu, 02 Feb 2023 10:28:58 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=+tw1hsgT63166ej2zhrQQOmMvKwFEkphuz02olq7+08=;
        b=XUzVSAJxWqPbzwVpr2pag5CYXwETYKK0ALTrMWIMIaMu9FjXqHaBxWsq6xe7VlIz30
         yFIDNAFg1bM/ofmE4lU/7FZABxu12+JMdYE3NzAf09Ckl4wguykHS6VSPMPFZr1saL69
         Sl4KrpQGIX4AM9TkxrEbGqTvxEfsnRe0OD4hqMCA/jj8jRZM8ldKBHJvudyJzDH1R/7u
         h+8PFXEa/1NMjUUolvl18NX70IoW2UA1H3yQ0qPMFf+AWxvhU6+ruNAxOey9YOhoWE89
         QoaDG2ZOqizZI1ALnaHRHMb7Fop0Lph4P72Lzd/W6jJ2B+6vqIBmoOWjaoEblZ0BpWuh
         2Xfw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=+tw1hsgT63166ej2zhrQQOmMvKwFEkphuz02olq7+08=;
        b=xIkhCf2h0WvNH1ArWZZACFNYGoaMFKLpd9QjH+vnos541Nn3IWqTt8ubbJeWRQVPUx
         RPEYu9ucUBs0kBphSOMcROgd/XMSZTyAM77VR9MVySgsOsf5TfkoGGZipHKBVThX/OIK
         wHShUV/gHq7NJHu9aFd6XoQVjJfzy0JD/Igf0gQfGI6yhyD98mYyQc+tAWagFps/1A6G
         nISKxkR3qrw/HKKcnvht4FJl4sb+iIh++Ld5mPK6FoF0JECU+JWuaxk6CGCMBd63u+pk
         nPuAx11szHDRGu5d1BQApSEiJCmj3s3H61RS7z0LVp/iW9etsQuJ1Nj7dleewmz/YKW/
         iLqQ==
X-Gm-Message-State: AO0yUKVyA1c+3NMWhGQbzwVXyFFa98/4REXyEQQYHjPD1NLjMqgyeGGt
        xIRJVQOHWGykiSUfeGCFwxVr1Bn1097whpMfPopAGd2BUEkBqYFF/fbC9WhgcVBGqY1ZbXt2/Xi
        ubB7Rxmau2/lTJ9LYf3BmpdAqRhBvQm8EOLo3HG7Xc2BLFlx4H0w1BiZimnIgT+Hnl09BcN/j
X-Received: from sweer.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:e45])
 (user=bgardon job=sendgmr) by 2002:a81:78ca:0:b0:521:db3f:a11f with SMTP id
 t193-20020a8178ca000000b00521db3fa11fmr5ywc.9.1675362522851; Thu, 02 Feb 2023
 10:28:42 -0800 (PST)
Date: Thu,  2 Feb 2023 18:28:07 +0000
In-Reply-To: <20230202182809.1929122-1-bgardon@google.com>
Mime-Version: 1.0
References: <20230202182809.1929122-1-bgardon@google.com>
X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog
Message-ID: <20230202182809.1929122-20-bgardon@google.com>
Subject: [PATCH 19/21] KVM: x86/mmu: Move Shadow MMU part of kvm_mmu_zap_all()
 to shadow_mmu.h
From: Ben Gardon <bgardon@google.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
        Sean Christopherson <seanjc@google.com>,
        David Matlack <dmatlack@google.com>,
        Vipin Sharma <vipinsh@google.com>,
        Ricardo Koller <ricarkol@google.com>,
        Ben Gardon <bgardon@google.com>
X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED,
        DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,
        SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=unavailable
        autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1756745582061117919?=
X-GMAIL-MSGID: =?utf-8?q?1756745582061117919?=

Move the Shadow MMU part of kvm_mmu_zap_all() into a helper function,
kvm_shadow_mmu_zap_all(), defined in shadow_mmu.c and declared in
shadow_mmu.h. Also check kvm_memslots_have_rmaps() so the Shadow MMU
operation can be skipped entirely if it's not needed. This could present
an opportunity to move the TDP MMU portion of the function under the
MMU lock in read mode, but since zapping all paging structures should be
very rare and thus not performance sensitive, it's not necessary.

Suggested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
---
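Note (not part of the commit): a minimal userspace sketch of the control
flow kvm_mmu_zap_all() has after this patch, assuming the lock, the rmap
check and the two zap paths can be modeled with plain flags and counters.
All toy_* names are hypothetical, and the unlock at the end is an
assumption because the hunk below does not show the end of the function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the bits of struct kvm this flow looks at. */
struct toy_vm {
	bool memslots_have_rmaps;	/* models kvm_memslots_have_rmaps(kvm) */
	bool tdp_mmu_enabled;		/* models the tdp_mmu_enabled global */
	bool mmu_lock_held_for_write;	/* models holding mmu_lock in write mode */
	int shadow_zaps;
	int tdp_zaps;
};

/* The helper no longer takes the lock itself; the caller must hold it. */
static void toy_shadow_mmu_zap_all(struct toy_vm *vm)
{
	assert(vm->mmu_lock_held_for_write);
	vm->shadow_zaps++;
}

static void toy_tdp_mmu_zap_all(struct toy_vm *vm)
{
	assert(vm->mmu_lock_held_for_write);
	vm->tdp_zaps++;
}

static void toy_mmu_zap_all(struct toy_vm *vm)
{
	vm->mmu_lock_held_for_write = true;	/* models write_lock() */

	/* Skip the Shadow MMU walk entirely when no rmaps exist. */
	if (vm->memslots_have_rmaps)
		toy_shadow_mmu_zap_all(vm);

	if (vm->tdp_mmu_enabled)
		toy_tdp_mmu_zap_all(vm);

	vm->mmu_lock_held_for_write = false;	/* assumed unlock */
}
```

With memslots_have_rmaps false and tdp_mmu_enabled true, only the TDP
counter advances, which is the case the new check is meant to make cheaper.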
 arch/x86/kvm/mmu/mmu.c        | 17 ++---------------
 arch/x86/kvm/mmu/shadow_mmu.c | 19 +++++++++++++++++++
 arch/x86/kvm/mmu/shadow_mmu.h |  2 ++
 3 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8514e998e2127..63b928bded9d1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3011,22 +3011,9 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
 
 void kvm_mmu_zap_all(struct kvm *kvm)
 {
-	struct kvm_mmu_page *sp, *node;
-	LIST_HEAD(invalid_list);
-	int ign;
-
 	write_lock(&kvm->mmu_lock);
-restart:
-	list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
-		if (WARN_ON(sp->role.invalid))
-			continue;
-		if (__kvm_shadow_mmu_prepare_zap_page(kvm, sp, &invalid_list, &ign))
-			goto restart;
-		if (cond_resched_rwlock_write(&kvm->mmu_lock))
-			goto restart;
-	}
-
-	kvm_shadow_mmu_commit_zap_page(kvm, &invalid_list);
+	if (kvm_memslots_have_rmaps(kvm))
+		kvm_shadow_mmu_zap_all(kvm);
 
 	if (tdp_mmu_enabled)
 		kvm_tdp_mmu_zap_all(kvm);
diff --git a/arch/x86/kvm/mmu/shadow_mmu.c b/arch/x86/kvm/mmu/shadow_mmu.c
index bb23692d34a73..c6d3da795992e 100644
--- a/arch/x86/kvm/mmu/shadow_mmu.c
+++ b/arch/x86/kvm/mmu/shadow_mmu.c
@@ -3604,3 +3604,22 @@ bool kvm_shadow_mmu_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
 	return kvm_handle_gfn_range(kvm, range, kvm_test_age_rmap);
 }
+
+void kvm_shadow_mmu_zap_all(struct kvm *kvm)
+{
+	struct kvm_mmu_page *sp, *node;
+	LIST_HEAD(invalid_list);
+	int ign;
+
+restart:
+	list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
+		if (WARN_ON(sp->role.invalid))
+			continue;
+		if (__kvm_shadow_mmu_prepare_zap_page(kvm, sp, &invalid_list, &ign))
+			goto restart;
+		if (cond_resched_rwlock_write(&kvm->mmu_lock))
+			goto restart;
+	}
+
+	kvm_shadow_mmu_commit_zap_page(kvm, &invalid_list);
+}
diff --git a/arch/x86/kvm/mmu/shadow_mmu.h b/arch/x86/kvm/mmu/shadow_mmu.h
index 4d39017873aa6..ab01636373bda 100644
--- a/arch/x86/kvm/mmu/shadow_mmu.h
+++ b/arch/x86/kvm/mmu/shadow_mmu.h
@@ -101,6 +101,8 @@ bool kvm_shadow_mmu_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
 bool kvm_shadow_mmu_age_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
 bool kvm_shadow_mmu_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
 
+void kvm_shadow_mmu_zap_all(struct kvm *kvm);
+
 /* Exports from paging_tmpl.h */
 gpa_t paging32_gva_to_gpa(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 			  gpa_t vaddr, u64 access,

From patchwork Thu Feb  2 18:28:08 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ben Gardon <bgardon@google.com>
X-Patchwork-Id: 52139
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp410381wrn;
        Thu, 2 Feb 2023 10:50:24 -0800 (PST)
X-Google-Smtp-Source: 
 AK7set8Sj/aNAxq+jYxMluCUghwhUcYzGeufuw4IH8RUEqACIsSKIId7CTafo3jOJsgBUj+wkFOy
X-Received: by 2002:a17:902:d510:b0:198:b945:4108 with SMTP id
 b16-20020a170902d51000b00198b9454108mr5503208plg.0.1675363824233;
        Thu, 02 Feb 2023 10:50:24 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1675363824; cv=none;
        d=google.com; s=arc-20160816;
        b=j4GrYoPdGuSWeXmxK1wByPObgfFJv/gIPFVyjjL1Gsnah5+O+Sd6A8TJ8HgM8PrrW5
         MKIjWFs0Oz3SQhwU+d5k6k+qYabS9gfjviM/ywfz1K08uLtHXS/j9iibTbEUtwBanra9
         MSjf52j3bZQOdJ1k877dNhGSAypaiQbJaPkRFgwhm5hHiHhca5witvSvHMcdpEI4vwxs
         934Id5+jL+3zRCJ2YeUSlJVKyVPep1HW0Dl50jhh041m8a1fyFpjYVcwCR68F9rt5KUh
         LBBzDrqK64OXFUHXRneoJwhv1NIU1FbbNmvE4eCEfwaAzmQIgDmJCMNjDjfVIUThg7pV
         g01w==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:cc:to:from:subject:message-id:references
         :mime-version:in-reply-to:date:dkim-signature;
        bh=j1L53IKzbh0q2dj44QZHCfw/umfVO+10rbiNwcUgEhA=;
        b=NWp3wNPplR7NaFvRulZatJCajw1PdcBFb/xYXJMy6k6UYL8Eap09BB0uU7UM+EAfAD
         qiBZ/JSKHiWLAG4LBTOoIX6KTg0qRhDg0iUNc2pSy1kerilR8HhptrVWgqgTYa6d/UY7
         1AxuIDyqALFrGzLG+XioUZSvXAW2jSJkmie2vcs/3Hmz4+1EAuKvF5ddD6m/f3RuuN/k
         FNk6sMRd0v/s+6hi+4voo3S/FaeCP86vVCu+H4N5fz6nxrh1LDzB5vDHfIeU+eZNthQd
         EprnNHTxhtu1yhWqI1fLT+2/pN/wVCavzLSo+ZY2Ft6S6LAEKd2qrSi0CzYsuzWO0yzv
         TNNg==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@google.com header.s=20210112 header.b=W1S45eT2;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 m3-20020a1709026bc300b001967afdbc94si56404plt.27.2023.02.02.10.50.11;
        Thu, 02 Feb 2023 10:50:24 -0800 (PST)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@google.com header.s=20210112 header.b=W1S45eT2;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S232960AbjBBSar (ORCPT <rfc822;il.mystafa@gmail.com> + 99 others);
        Thu, 2 Feb 2023 13:30:47 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33438 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S232869AbjBBS34 (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 2 Feb 2023 13:29:56 -0500
Received: from mail-pf1-x44a.google.com (mail-pf1-x44a.google.com
 [IPv6:2607:f8b0:4864:20::44a])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2111054542
        for <linux-kernel@vger.kernel.org>;
 Thu,  2 Feb 2023 10:29:00 -0800 (PST)
Received: by mail-pf1-x44a.google.com with SMTP id
 x9-20020aa79409000000b00593a1f7c3d8so1367486pfo.14
        for <linux-kernel@vger.kernel.org>;
 Thu, 02 Feb 2023 10:29:00 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=j1L53IKzbh0q2dj44QZHCfw/umfVO+10rbiNwcUgEhA=;
        b=W1S45eT23c0+qcokTUS8aeHe9NQ0UJlsA8P8amVpYEG5na6e9bWDZkdSemeWjy5s4c
         Q3x5YQdMqMG9sOw+jpBa0BuqS4vl1h5JFeEVTcscL7zQrdvi83yZU0mdweM3jK3Sybnc
         7bn1PGgFgtR7pQQ5yibMzVq38fz84GEMPIbRGY42hhwRwHkWdDXvI0LsCl1A+21U3dmT
         WbDhSnwzRmI8cVTzq19EpBrUIsXhdGHet3yG0DbiFyYYBZBQ9da39DANL1F2QxLZPpSf
         ifC9/mJFXwbWze15ZHLVPt1tb78Ksk9gqibxUCSuCLGD6JNE63FFu/w3e6I50lLnfMMG
         XbQA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=j1L53IKzbh0q2dj44QZHCfw/umfVO+10rbiNwcUgEhA=;
        b=Ztr/pWU97xUxBqlnI+awqDefRfeepK84LNIMZ27b81hRqxnborw5O/U2BsITpL7kSf
         fZGYJrtkUQxtap88ReD1hrZEs0RJTjsvmEjvsvBeRWsr7mCslgzqafbEPFZhbk9BO4Ca
         8BG/LK+4wqqJB6HUsb6UvMmEp4rbYbefr6bwEN9RWM3EZk9PYv6g36VsdN/S1fuBOLep
         /Z4gmhn3Lpg65w7Ri3ue8TlzNLgoAqNjEHh9Gg2udAxBvR4pCVZ9rZUgzXw5oQb9ex1/
         O+B/l/LxZ9yskht+GtDtaIOsP4PJizSmqGA4cjeSqN5BvxjHHIOUzDDJzOaV4qeeOVfV
         dDqA==
X-Gm-Message-State: AO0yUKXtJwbcTD2myOGkRC0KBeRfwTmCrSudJ+w/9HgrAm9+eZmeOhCp
        HcnxOq4yahlMNVh6InbroKNF6xaxqptZFwj0bGNzslnU+5tOyK9zf72qr3Gz/rK94Sf+s8zbQyn
        Uh7Cydmd4eqlf+5UQmOGRWfMD1zBHqxNj6b1Xfny3tpW9dqRExeqkBucG8zGqvEBl7QM7BWRS
X-Received: from sweer.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:e45])
 (user=bgardon job=sendgmr) by 2002:a62:1a57:0:b0:593:bac2:b49 with SMTP id
 a84-20020a621a57000000b00593bac20b49mr1825572pfa.44.1675362524733; Thu, 02
 Feb 2023 10:28:44 -0800 (PST)
Date: Thu,  2 Feb 2023 18:28:08 +0000
In-Reply-To: <20230202182809.1929122-1-bgardon@google.com>
Mime-Version: 1.0
References: <20230202182809.1929122-1-bgardon@google.com>
X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog
Message-ID: <20230202182809.1929122-21-bgardon@google.com>
Subject: [PATCH 20/21] KVM: x86/mmu: Move Shadow MMU init/teardown to
 shadow_mmu.c
From: Ben Gardon <bgardon@google.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
        Sean Christopherson <seanjc@google.com>,
        David Matlack <dmatlack@google.com>,
        Vipin Sharma <vipinsh@google.com>,
        Ricardo Koller <ricarkol@google.com>,
        Ben Gardon <bgardon@google.com>
X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED,
        DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,
        SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=unavailable
        autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1756746297473338110?=
X-GMAIL-MSGID: =?utf-8?q?1756746297473338110?=

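Move the parts of kvm_mmu_init_vm() and kvm_mmu_uninit_vm() pertaining to
the Shadow MMU to shadow_mmu.c, as kvm_mmu_init_shadow_mmu() and
kvm_mmu_uninit_shadow_mmu(). To support this, declare
kvm_mmu_zap_all_fast() in mmu_internal.h and make
kvm_shadow_mmu_pte_write() static.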

Suggested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
---
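Note (not part of the commit): a minimal userspace sketch of the
init/teardown pairing this patch collects in one place, assuming the page
track notifier can be modeled as a callback pointer plus a registered
flag. All toy_* names are hypothetical; the memory cache setup and the
track_write callback are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_vm;

/* Hypothetical stand-in for struct kvm_page_track_notifier_node. */
struct toy_notifier {
	void (*track_flush_slot)(struct toy_vm *vm);
	bool registered;
};

struct toy_vm {
	struct toy_notifier mmu_sp_tracker;
	int fast_zaps;
};

/* Stays with the common MMU code; only counts calls in this sketch. */
static void toy_mmu_zap_all_fast(struct toy_vm *vm)
{
	vm->fast_zaps++;
}

/* Shadow MMU callback that forwards a slot flush to the common fast zap. */
static void toy_shadow_flush_slot(struct toy_vm *vm)
{
	toy_mmu_zap_all_fast(vm);
}

static void toy_init_shadow_mmu(struct toy_vm *vm)
{
	struct toy_notifier *node = &vm->mmu_sp_tracker;

	node->track_flush_slot = toy_shadow_flush_slot;
	node->registered = true;	/* models registering the notifier */
}

static void toy_uninit_shadow_mmu(struct toy_vm *vm)
{
	struct toy_notifier *node = &vm->mmu_sp_tracker;

	assert(node->registered);
	node->registered = false;	/* models unregistering the notifier */
	node->track_flush_slot = NULL;
}

/* Models a memslot flush event reaching whatever notifier is registered. */
static void toy_flush_slot(struct toy_vm *vm)
{
	struct toy_notifier *node = &vm->mmu_sp_tracker;

	if (node->registered && node->track_flush_slot)
		node->track_flush_slot(vm);
}
```

The callback lives next to the Shadow MMU setup while the fast zap it
calls stays in the common code, which is why the real patch has to make
kvm_mmu_zap_all_fast() visible outside mmu.c.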
 arch/x86/kvm/mmu/mmu.c          | 41 +++---------------------------
 arch/x86/kvm/mmu/mmu_internal.h |  2 ++
 arch/x86/kvm/mmu/shadow_mmu.c   | 44 +++++++++++++++++++++++++++++++--
 arch/x86/kvm/mmu/shadow_mmu.h   |  6 ++---
 4 files changed, 51 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 63b928bded9d1..10aff23dea75d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2743,7 +2743,7 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
  * not use any resource of the being-deleted slot or all slots
  * after calling the function.
  */
-static void kvm_mmu_zap_all_fast(struct kvm *kvm)
+void kvm_mmu_zap_all_fast(struct kvm *kvm)
 {
 	lockdep_assert_held(&kvm->slots_lock);
 
@@ -2795,22 +2795,13 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
 		kvm_tdp_mmu_zap_invalidated_roots(kvm);
 }
 
-static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
-			struct kvm_memory_slot *slot,
-			struct kvm_page_track_notifier_node *node)
-{
-	kvm_mmu_zap_all_fast(kvm);
-}
-
 int kvm_mmu_init_vm(struct kvm *kvm)
 {
-	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
 	int r;
 
-	INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
-	INIT_LIST_HEAD(&kvm->arch.zapped_obsolete_pages);
 	INIT_LIST_HEAD(&kvm->arch.possible_nx_huge_pages);
-	spin_lock_init(&kvm->arch.mmu_unsync_pages_lock);
+
+	kvm_mmu_init_shadow_mmu(kvm);
 
 	if (tdp_mmu_enabled) {
 		r = kvm_mmu_init_tdp_mmu(kvm);
@@ -2818,38 +2809,14 @@ int kvm_mmu_init_vm(struct kvm *kvm)
 			return r;
 	}
 
-	node->track_write = kvm_shadow_mmu_pte_write;
-	node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
-	kvm_page_track_register_notifier(kvm, node);
-
-	kvm->arch.split_page_header_cache.kmem_cache = mmu_page_header_cache;
-	kvm->arch.split_page_header_cache.gfp_zero = __GFP_ZERO;
-
-	kvm->arch.split_shadow_page_cache.gfp_zero = __GFP_ZERO;
-
-	kvm->arch.split_desc_cache.kmem_cache = pte_list_desc_cache;
-	kvm->arch.split_desc_cache.gfp_zero = __GFP_ZERO;
-
 	return 0;
 }
 
-static void mmu_free_vm_memory_caches(struct kvm *kvm)
-{
-	kvm_mmu_free_memory_cache(&kvm->arch.split_desc_cache);
-	kvm_mmu_free_memory_cache(&kvm->arch.split_page_header_cache);
-	kvm_mmu_free_memory_cache(&kvm->arch.split_shadow_page_cache);
-}
-
 void kvm_mmu_uninit_vm(struct kvm *kvm)
 {
-	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
-
-	kvm_page_track_unregister_notifier(kvm, node);
-
+	kvm_mmu_uninit_shadow_mmu(kvm);
 	if (tdp_mmu_enabled)
 		kvm_mmu_uninit_tdp_mmu(kvm);
-
-	mmu_free_vm_memory_caches(kvm);
 }
 
 /*
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 2273c6263faf0..c49d302b037ec 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -406,4 +406,6 @@ BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, pke);
 BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, la57);
 BUILD_MMU_ROLE_ACCESSOR(base, efer, nx);
 BUILD_MMU_ROLE_ACCESSOR(ext,  efer, lma);
+
+void kvm_mmu_zap_all_fast(struct kvm *kvm);
 #endif /* __KVM_X86_MMU_INTERNAL_H */
diff --git a/arch/x86/kvm/mmu/shadow_mmu.c b/arch/x86/kvm/mmu/shadow_mmu.c
index c6d3da795992e..6449ac4de4883 100644
--- a/arch/x86/kvm/mmu/shadow_mmu.c
+++ b/arch/x86/kvm/mmu/shadow_mmu.c
@@ -3013,8 +3013,9 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
 	return spte;
 }
 
-void kvm_shadow_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
-			      int bytes, struct kvm_page_track_notifier_node *node)
+static void kvm_shadow_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
+				     const u8 *new, int bytes,
+				     struct kvm_page_track_notifier_node *node)
 {
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	struct kvm_mmu_page *sp;
@@ -3623,3 +3624,42 @@ void kvm_shadow_mmu_zap_all(struct kvm *kvm)
 
 	kvm_shadow_mmu_commit_zap_page(kvm, &invalid_list);
 }
+
+static void kvm_shadow_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
+			struct kvm_memory_slot *slot,
+			struct kvm_page_track_notifier_node *node)
+{
+	kvm_mmu_zap_all_fast(kvm);
+}
+
+void kvm_mmu_init_shadow_mmu(struct kvm *kvm)
+{
+	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
+
+	INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
+	INIT_LIST_HEAD(&kvm->arch.zapped_obsolete_pages);
+	spin_lock_init(&kvm->arch.mmu_unsync_pages_lock);
+
+	node->track_write = kvm_shadow_mmu_pte_write;
+	node->track_flush_slot = kvm_shadow_mmu_invalidate_zap_pages_in_memslot;
+	kvm_page_track_register_notifier(kvm, node);
+
+	kvm->arch.split_page_header_cache.kmem_cache = mmu_page_header_cache;
+	kvm->arch.split_page_header_cache.gfp_zero = __GFP_ZERO;
+
+	kvm->arch.split_shadow_page_cache.gfp_zero = __GFP_ZERO;
+
+	kvm->arch.split_desc_cache.kmem_cache = pte_list_desc_cache;
+	kvm->arch.split_desc_cache.gfp_zero = __GFP_ZERO;
+}
+
+void kvm_mmu_uninit_shadow_mmu(struct kvm *kvm)
+{
+	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
+
+	kvm_page_track_unregister_notifier(kvm, node);
+
+	kvm_mmu_free_memory_cache(&kvm->arch.split_desc_cache);
+	kvm_mmu_free_memory_cache(&kvm->arch.split_page_header_cache);
+	kvm_mmu_free_memory_cache(&kvm->arch.split_shadow_page_cache);
+}
diff --git a/arch/x86/kvm/mmu/shadow_mmu.h b/arch/x86/kvm/mmu/shadow_mmu.h
index ab01636373bda..f2e54355ebb19 100644
--- a/arch/x86/kvm/mmu/shadow_mmu.h
+++ b/arch/x86/kvm/mmu/shadow_mmu.h
@@ -65,9 +65,6 @@ int kvm_shadow_mmu_alloc_special_roots(struct kvm_vcpu *vcpu);
 int kvm_shadow_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
 			    int *root_level);
 
-void kvm_shadow_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
-			      int bytes, struct kvm_page_track_notifier_node *node);
-
 void kvm_shadow_mmu_zap_obsolete_pages(struct kvm *kvm);
 bool kvm_shadow_mmu_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
 
@@ -103,6 +100,9 @@ bool kvm_shadow_mmu_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
 
 void kvm_shadow_mmu_zap_all(struct kvm *kvm);
 
+void kvm_mmu_init_shadow_mmu(struct kvm *kvm);
+void kvm_mmu_uninit_shadow_mmu(struct kvm *kvm);
+
 /* Exports from paging_tmpl.h */
 gpa_t paging32_gva_to_gpa(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 			  gpa_t vaddr, u64 access,

From patchwork Thu Feb  2 18:28:09 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ben Gardon <bgardon@google.com>
X-Patchwork-Id: 52135
Date: Thu,  2 Feb 2023 18:28:09 +0000
In-Reply-To: <20230202182809.1929122-1-bgardon@google.com>
Mime-Version: 1.0
References: <20230202182809.1929122-1-bgardon@google.com>
X-Mailer: git-send-email 2.39.1.519.gcb327c4b5f-goog
Message-ID: <20230202182809.1929122-22-bgardon@google.com>
Subject: [PATCH 21/21] KVM: x86/mmu: Split out Shadow MMU lockless walk
 begin/end
From: Ben Gardon <bgardon@google.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
        Sean Christopherson <seanjc@google.com>,
        David Matlack <dmatlack@google.com>,
        Vipin Sharma <vipinsh@google.com>,
        Ricardo Koller <ricarkol@google.com>,
        Ben Gardon <bgardon@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Split out the meat of walk_shadow_page_lockless_begin/end() into new
kvm_shadow_mmu_walk_lockless_begin/end() functions in shadow_mmu.c, since
there's no need for the Shadow MMU specifics in the common MMU code.

Suggested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/mmu.c        | 31 ++++++-------------------------
 arch/x86/kvm/mmu/shadow_mmu.c | 27 +++++++++++++++++++++++++++
 arch/x86/kvm/mmu/shadow_mmu.h |  3 +++
 3 files changed, 36 insertions(+), 25 deletions(-)
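
Note, not part of the commit: the ordering that the moved comments
describe can be modelled in user space roughly as below. This is only a
sketch using C11 atomics. Every model_* name is hypothetical and does not
exist in KVM. The relaxed store plus seq_cst fence is an assumed
approximation of smp_store_mb(), the release store stands in for
smp_store_release(), and local_irq_disable()/local_irq_enable() are
omitted because user space has no equivalent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for the vcpu->mode values used by the patch. */
enum model_mode {
	MODEL_OUTSIDE_GUEST_MODE,
	MODEL_READING_SHADOW_PAGE_TABLES,
};

/* Hypothetical, heavily reduced model of struct kvm_vcpu. */
struct model_vcpu {
	atomic_int mode;
	_Atomic uint64_t spte;	/* one fake shadow PTE */
};

/*
 * Models kvm_shadow_mmu_walk_lockless_begin(). The kernel also calls
 * local_irq_disable() here; that part is left out of the model.
 */
static void model_walk_lockless_begin(struct model_vcpu *vcpu)
{
	/* Relaxed store plus full fence approximates smp_store_mb(). */
	atomic_store_explicit(&vcpu->mode, MODEL_READING_SHADOW_PAGE_TABLES,
			      memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
}

/* Models kvm_shadow_mmu_walk_lockless_end(), minus local_irq_enable(). */
static void model_walk_lockless_end(struct model_vcpu *vcpu)
{
	/* Release store: earlier SPTE reads cannot be reordered after it. */
	atomic_store_explicit(&vcpu->mode, MODEL_OUTSIDE_GUEST_MODE,
			      memory_order_release);
}

/* Read the fake SPTE inside a begin/end bracket and return its value. */
static uint64_t model_read_spte(struct model_vcpu *vcpu)
{
	uint64_t spte;

	model_walk_lockless_begin(vcpu);
	assert(atomic_load(&vcpu->mode) == MODEL_READING_SHADOW_PAGE_TABLES);
	spte = atomic_load_explicit(&vcpu->spte, memory_order_relaxed);
	model_walk_lockless_end(vcpu);

	return spte;
}
```

A caller would initialise a struct model_vcpu with atomic_init() and then
call model_read_spte(); on return the mode is back to
MODEL_OUTSIDE_GUEST_MODE. The model only shows where the barriers sit
relative to the SPTE read. It does not reproduce the IPI-based wait that
protects the page tables from being freed in the kernel.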

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 10aff23dea75d..cfccc4c7a1427 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -207,37 +207,18 @@ static inline bool is_tdp_mmu_active(struct kvm_vcpu *vcpu)
 
 void walk_shadow_page_lockless_begin(struct kvm_vcpu *vcpu)
 {
-	if (is_tdp_mmu_active(vcpu)) {
+	if (is_tdp_mmu_active(vcpu))
 		kvm_tdp_mmu_walk_lockless_begin();
-	} else {
-		/*
-		 * Prevent page table teardown by making any free-er wait during
-		 * kvm_flush_remote_tlbs() IPI to all active vcpus.
-		 */
-		local_irq_disable();
-
-		/*
-		 * Make sure a following spte read is not reordered ahead of the write
-		 * to vcpu->mode.
-		 */
-		smp_store_mb(vcpu->mode, READING_SHADOW_PAGE_TABLES);
-	}
+	else
+		kvm_shadow_mmu_walk_lockless_begin(vcpu);
 }
 
 void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
 {
-	if (is_tdp_mmu_active(vcpu)) {
+	if (is_tdp_mmu_active(vcpu))
 		kvm_tdp_mmu_walk_lockless_end();
-	} else {
-		/*
-		 * Make sure the write to vcpu->mode is not reordered in front
-		 * of reads to sptes.  If it does,
-		 * kvm_shadow_mmu_commit_zap_page() can see us
-		 * OUTSIDE_GUEST_MODE and proceed to free the shadow page table.
-		 */
-		smp_store_release(&vcpu->mode, OUTSIDE_GUEST_MODE);
-		local_irq_enable();
-	}
+	else
+		kvm_shadow_mmu_walk_lockless_end(vcpu);
 }
 
 int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
diff --git a/arch/x86/kvm/mmu/shadow_mmu.c b/arch/x86/kvm/mmu/shadow_mmu.c
index 6449ac4de4883..c5d0accd6e057 100644
--- a/arch/x86/kvm/mmu/shadow_mmu.c
+++ b/arch/x86/kvm/mmu/shadow_mmu.c
@@ -3663,3 +3663,30 @@ void kvm_mmu_uninit_shadow_mmu(struct kvm *kvm)
 	kvm_mmu_free_memory_cache(&kvm->arch.split_page_header_cache);
 	kvm_mmu_free_memory_cache(&kvm->arch.split_shadow_page_cache);
 }
+
+void kvm_shadow_mmu_walk_lockless_begin(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * Prevent page table teardown by making any free-er wait during
+	 * kvm_flush_remote_tlbs() IPI to all active vcpus.
+	 */
+	local_irq_disable();
+
+	/*
+	 * Make sure a following spte read is not reordered ahead of the write
+	 * to vcpu->mode.
+	 */
+	smp_store_mb(vcpu->mode, READING_SHADOW_PAGE_TABLES);
+}
+
+void kvm_shadow_mmu_walk_lockless_end(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * Make sure the write to vcpu->mode is not reordered in front
+	 * of reads to sptes.  If it is,
+	 * kvm_shadow_mmu_commit_zap_page() can see us
+	 * OUTSIDE_GUEST_MODE and proceed to free the shadow page table.
+	 */
+	smp_store_release(&vcpu->mode, OUTSIDE_GUEST_MODE);
+	local_irq_enable();
+}
diff --git a/arch/x86/kvm/mmu/shadow_mmu.h b/arch/x86/kvm/mmu/shadow_mmu.h
index f2e54355ebb19..12835872bda34 100644
--- a/arch/x86/kvm/mmu/shadow_mmu.h
+++ b/arch/x86/kvm/mmu/shadow_mmu.h
@@ -103,6 +103,9 @@ void kvm_shadow_mmu_zap_all(struct kvm *kvm);
 void kvm_mmu_init_shadow_mmu(struct kvm *kvm);
 void kvm_mmu_uninit_shadow_mmu(struct kvm *kvm);
 
+void kvm_shadow_mmu_walk_lockless_begin(struct kvm_vcpu *vcpu);
+void kvm_shadow_mmu_walk_lockless_end(struct kvm_vcpu *vcpu);
+
 /* Exports from paging_tmpl.h */
 gpa_t paging32_gva_to_gpa(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 			  gpa_t vaddr, u64 access,