From patchwork Mon Feb 27 17:17:51 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
X-Patchwork-Id: 62006
From: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
To: linux-kernel@vger.kernel.org
Cc: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>,
        kvm@vger.kernel.org, Vitaly Kuznetsov <vkuznets@redhat.com>,
        Sean Christopherson <seanjc@google.com>,
        Paolo Bonzini <pbonzini@redhat.com>,
        Tianyu Lan <ltykernel@gmail.com>,
        Michael Kelley <mikelley@microsoft.com>
Subject: [PATCH] KVM: SVM: Disable TDP MMU when running on Hyper-V
Date: Mon, 27 Feb 2023 17:17:51 +0000
Message-Id: <20230227171751.1211786-1-jpiotrowski@linux.microsoft.com>
X-Mailer: git-send-email 2.25.1

TDP MMU has been broken on AMD CPUs when running on Hyper-V since v5.17.
The issue was first introduced by two commits:

- bb95dfb9e2dfbe6b3f5eb5e8a20e0259dadbe906 "KVM: x86/mmu: Defer TLB
  flush to caller when freeing TDP MMU shadow pages"
- efd995dae5eba57c5d28d6886a85298b390a4f07 "KVM: x86/mmu: Zap defunct
  roots via asynchronous worker"

The root cause is that, since those commits, TLB flushes required by
HV_X64_NESTED_ENLIGHTENED_TLB are missing. The failure manifests as L2
guest VMs being unable to complete boot due to memory inconsistencies
between L1 and L2 guests, which lead to various assertion/emulation
failures.

The HV_X64_NESTED_ENLIGHTENED_TLB enlightenment is always exposed by
Hyper-V on AMD and is always used by Linux. The TLB flush required by
HV_X64_NESTED_ENLIGHTENED_TLB is much stricter than the local TLB flush
that TDP MMU wants to issue. We have also found that L2 guest boot on AMD
is reproducibly slower with TDP MMU enabled than with TDP MMU disabled.

Disable TDP MMU for SVM when running on Hyper-V with NPT enabled and
HV_X64_NESTED_ENLIGHTENED_TLB exposed, for the time being, while we
search for a better fix.

Link: https://lore.kernel.org/lkml/43980946-7bbf-dcef-7e40-af904c456250@linux.microsoft.com/t/#u
Signed-off-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
---
Based on kvm-x86-mmu-6.3. The approach used here does not apply cleanly to
v6.2 and earlier. This fix would be needed in stable too, and I'm not sure
whether to add Fixes: tags.

Jeremi

 arch/x86/include/asm/kvm_host.h |  3 ++-
 arch/x86/kvm/mmu/mmu.c          |  5 +++--
 arch/x86/kvm/svm/svm.c          |  6 +++++-
 arch/x86/kvm/svm/svm_onhyperv.h | 10 ++++++++++
 arch/x86/kvm/vmx/vmx.c          |  3 ++-
 5 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4d2bc08794e4..a0868ae3688d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2031,7 +2031,8 @@ void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid);
 void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd);
 
 void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
-		       int tdp_max_root_level, int tdp_huge_page_level);
+		       int tdp_max_root_level, int tdp_huge_page_level,
+		       bool enable_tdp_mmu);
 
 static inline u16 kvm_read_ldt(void)
 {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c91ee2927dd7..5c0e28a7a3bc 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5787,14 +5787,15 @@ void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid)
 }
 
 void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
-		       int tdp_max_root_level, int tdp_huge_page_level)
+		       int tdp_max_root_level, int tdp_huge_page_level,
+		       bool enable_tdp_mmu)
 {
 	tdp_enabled = enable_tdp;
 	tdp_root_level = tdp_forced_root_level;
 	max_tdp_level = tdp_max_root_level;
 
 #ifdef CONFIG_X86_64
-	tdp_mmu_enabled = tdp_mmu_allowed && tdp_enabled;
+	tdp_mmu_enabled = tdp_mmu_allowed && tdp_enabled && enable_tdp_mmu;
 #endif
 	/*
 	 * max_huge_page_level reflects KVM's MMU capabilities irrespective
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d13cf53e7390..070c3f7f8c9f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4925,6 +4925,7 @@ static __init int svm_hardware_setup(void)
 	struct page *iopm_pages;
 	void *iopm_va;
 	int r;
+	bool enable_tdp_mmu;
 	unsigned int order = get_order(IOPM_SIZE);
 
 	/*
@@ -4991,9 +4992,12 @@ static __init int svm_hardware_setup(void)
 	if (!boot_cpu_has(X86_FEATURE_NPT))
 		npt_enabled = false;
 
+	enable_tdp_mmu = svm_hv_enable_tdp_mmu();
+
 	/* Force VM NPT level equal to the host's paging level */
 	kvm_configure_mmu(npt_enabled, get_npt_level(),
-			  get_npt_level(), PG_LEVEL_1G);
+			  get_npt_level(), PG_LEVEL_1G,
+			  enable_tdp_mmu);
 	pr_info("Nested Paging %sabled\n", npt_enabled ? "en" : "dis");
 
 	/* Setup shadow_me_value and shadow_me_mask */
diff --git a/arch/x86/kvm/svm/svm_onhyperv.h b/arch/x86/kvm/svm/svm_onhyperv.h
index 6981c1e9a809..aa49ac5d66bc 100644
--- a/arch/x86/kvm/svm/svm_onhyperv.h
+++ b/arch/x86/kvm/svm/svm_onhyperv.h
@@ -30,6 +30,11 @@ static inline void svm_hv_init_vmcb(struct vmcb *vmcb)
 		hve->hv_enlightenments_control.msr_bitmap = 1;
 }
 
+static inline bool svm_hv_enable_tdp_mmu(void)
+{
+	return !(npt_enabled && ms_hyperv.nested_features & HV_X64_NESTED_ENLIGHTENED_TLB);
+}
+
 static inline void svm_hv_hardware_setup(void)
 {
 	if (npt_enabled &&
@@ -84,6 +89,11 @@ static inline void svm_hv_init_vmcb(struct vmcb *vmcb)
 {
 }
 
+static inline bool svm_hv_enable_tdp_mmu(void)
+{
+	return true;
+}
+
 static inline void svm_hv_hardware_setup(void)
 {
 }
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c788aa382611..4d3808755d39 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8442,7 +8442,8 @@ static __init int hardware_setup(void)
 	vmx_setup_me_spte_mask();
 
 	kvm_configure_mmu(enable_ept, 0, vmx_get_max_tdp_level(),
-			  ept_caps_to_lpage_level(vmx_capability.ept));
+			  ept_caps_to_lpage_level(vmx_capability.ept),
+			  true);
 
 	/*
 	 * Only enable PML when hardware supports PML feature, and both EPT