KVM: SVM: Flush Hyper-V TLB when required

Message ID 20230320185110.1346829-1-jpiotrowski@linux.microsoft.com
State New

Commit Message

Jeremi Piotrowski March 20, 2023, 6:51 p.m. UTC
The Hyper-V "EnlightenedNptTlb" enlightenment is always enabled when KVM
is running on top of Hyper-V and Hyper-V exposes support for it (which
is always). On AMD CPUs this enlightenment results in ASID invalidations
not flushing TLB entries derived from the NPT. To force the underlying
(L0) hypervisor to rebuild its shadow page tables, an explicit hypercall
is needed.

The original KVM implementation of Hyper-V's "EnlightenedNptTlb" on SVM
only added remote TLB flush hooks. This worked out fine for a while, as
sufficient remote TLB flushes were being issued in KVM to mask the
problem. Since v5.17, changes in the TDP code reduced the number of
flushes and the out-of-sync TLB prevents guests from booting
successfully.

Split svm_flush_tlb_current() into separate callbacks for the 3 cases
(guest/all/current), and issue the required Hyper-V hypercall when a
Hyper-V TLB flush is needed. The most important case where the TLB flush
was missing is when loading a new PGD, which is followed by what is now
svm_flush_tlb_current(). Since the hypercall acts on all CPUs, cache the
last flushed root in kvm_arch->hv_root_tdp. This prevents the shadow
NPTs from being unnecessarily rebuilt for multiple vcpus and when the
same root is flushed multiple times in a row on a single vcpu.

Cc: stable@vger.kernel.org # v5.17+
Fixes: 1e0c7d40758b ("KVM: SVM: hyper-v: Remote TLB flush for SVM")
Link: https://lore.kernel.org/lkml/43980946-7bbf-dcef-7e40-af904c456250@linux.microsoft.com/
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
---
 arch/x86/kvm/kvm_onhyperv.c | 23 +++++++++++++++++++++++
 arch/x86/kvm/kvm_onhyperv.h |  5 +++++
 arch/x86/kvm/svm/svm.c      | 18 +++++++++++++++---
 3 files changed, 43 insertions(+), 3 deletions(-)
  

Comments

Sean Christopherson March 22, 2023, 4:20 p.m. UTC | #1
On Mon, Mar 20, 2023, Jeremi Piotrowski wrote:
> ---
>  arch/x86/kvm/kvm_onhyperv.c | 23 +++++++++++++++++++++++
>  arch/x86/kvm/kvm_onhyperv.h |  5 +++++
>  arch/x86/kvm/svm/svm.c      | 18 +++++++++++++++---
>  3 files changed, 43 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/kvm_onhyperv.c b/arch/x86/kvm/kvm_onhyperv.c
> index 482d6639ef88..036e04c0a161 100644
> --- a/arch/x86/kvm/kvm_onhyperv.c
> +++ b/arch/x86/kvm/kvm_onhyperv.c
> @@ -94,6 +94,29 @@ int hv_remote_flush_tlb(struct kvm *kvm)
>  }
>  EXPORT_SYMBOL_GPL(hv_remote_flush_tlb);
>  
> +void hv_flush_tlb_current(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_arch *kvm_arch = &vcpu->kvm->arch;
> +	hpa_t root_tdp = vcpu->arch.mmu->root.hpa;
> +
> +	if (kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb && VALID_PAGE(root_tdp)) {
> +		spin_lock(&kvm_arch->hv_root_tdp_lock);
> +		if (kvm_arch->hv_root_tdp != root_tdp) {
> +			hyperv_flush_guest_mapping(root_tdp);
> +			kvm_arch->hv_root_tdp = root_tdp;

In a vacuum, accessing kvm_arch->hv_root_tdp in the flush path is wrong.  This
likely fixes the issues you are seeing because the KVM bug only affects the case
when KVM is loading a new root (that used to be valid), in which case hv_root_tdp
is guaranteed to be different.  But KVM should not rely on that behavior, i.e. if
KVM says flush, then we flush.  There might be scenarios where the flush is
unnecessary, but those flushes should be elided by the code that knows the flush
is unnecessary, not in this common code just because the target root is the
globally shared root.

Somewhat of a moot point, but setting hv_root_tdp to root_tdp is also wrong.  KVM's
behavior is that hv_root_tdp points at a valid root if and only if all vCPUs share
said root.  E.g. invoking this when vCPUs have different roots will "corrupt"
hv_root_tdp and possibly cause a remote flush to do the wrong thing.
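
A minimal user-space sketch of the invariant described above, assuming a two-vCPU VM. Every model_* name is hypothetical and the remote-flush logic is a simplification, not the kernel's implementation. It shows how caching the flushed root in the shared field can make a later remote flush skip a vCPU whose root differs:

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_INVALID_ROOT UINT64_MAX
#define MODEL_NR_VCPUS 2

/* Hypothetical stand-in for per-VM state; not the kernel's struct kvm_arch. */
struct model_vm {
	uint64_t shared_root;               /* valid only if every vCPU uses this root */
	uint64_t vcpu_root[MODEL_NR_VCPUS]; /* root currently loaded on each vCPU */
	bool flushed[MODEL_NR_VCPUS];       /* was vCPU i's root covered by a flush? */
};

/* Record a flush of 'root' by marking every vCPU that uses it. */
static void model_flush_root(struct model_vm *vm, uint64_t root)
{
	for (int i = 0; i < MODEL_NR_VCPUS; i++)
		if (vm->vcpu_root[i] == root)
			vm->flushed[i] = true;
}

/* Remote flush: trust shared_root when valid, otherwise flush each vCPU's root. */
static void model_remote_flush(struct model_vm *vm)
{
	if (vm->shared_root != MODEL_INVALID_ROOT) {
		model_flush_root(vm, vm->shared_root);
		return;
	}
	for (int i = 0; i < MODEL_NR_VCPUS; i++)
		model_flush_root(vm, vm->vcpu_root[i]);
}

/* Models the caching under review: flush the root and remember it. */
static void model_flush_current_cached(struct model_vm *vm, int vcpu)
{
	uint64_t root = vm->vcpu_root[vcpu];

	if (vm->shared_root != root) {
		model_flush_root(vm, root);
		/* Breaks the invariant if another vCPU uses a different root. */
		vm->shared_root = root;
	}
}

/* Returns true if a remote flush covers the root of every vCPU. */
bool model_remote_flush_covers_all(bool cached_flush_first)
{
	struct model_vm vm = {
		.shared_root = MODEL_INVALID_ROOT,
		.vcpu_root = { 0x1000, 0x2000 },
	};

	if (cached_flush_first)
		model_flush_current_cached(&vm, 0);

	/* Only the remote flush below should count. */
	for (int i = 0; i < MODEL_NR_VCPUS; i++)
		vm.flushed[i] = false;

	model_remote_flush(&vm);

	for (int i = 0; i < MODEL_NR_VCPUS; i++)
		if (!vm.flushed[i])
			return false;
	return true;
}
```

In this model, model_remote_flush_covers_all(false) reports full coverage, while model_remote_flush_covers_all(true) does not, because the shared field was overwritten with vCPU 0's root even though vCPU 1 uses a different one.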

> +		}
> +		spin_unlock(&kvm_arch->hv_root_tdp_lock);
> +	}
> +}
> +EXPORT_SYMBOL_GPL(hv_flush_tlb_current);
> +
> +void hv_flush_tlb_all(struct kvm_vcpu *vcpu)
> +{
> +	if (WARN_ON_ONCE(kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb))

Hmm, looking at the KVM code, AFAICT KVM only enables enlightened_npt_tlb for L1
(L1 from KVM's perspective) as svm_hv_init_vmcb() is only ever called with vmcb01,
never with vmcb02.  I don't know if that's intentional, but I do think it means
KVM can skip the Hyper-V flush for vmcb02 and instead rely on the ASID flush,
i.e. KVM can do the Hyper-V flush iff enlightened_npt_tlb is set in the current VMCB.
And that should continue to work if KVM does ever enable enlightened_npt_tlb for L2.
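
A rough user-space sketch of that per-VMCB decision, assuming the enlightenment bit is set only in vmcb01. The model_* types and counters are hypothetical; they stand in for the hypercall and the ASID flush and are not KVM code:

```c
#include <stdbool.h>

/* Hypothetical model of a VMCB that carries only the enlightenment bit. */
struct model_vmcb {
	bool enlightened_npt_tlb;
};

/* Hypothetical vCPU model: two VMCBs plus counters replacing the real flushes. */
struct model_vcpu {
	struct model_vmcb vmcb01;   /* used while running L1 */
	struct model_vmcb vmcb02;   /* used while running L2 */
	struct model_vmcb *current; /* VMCB that is currently active */
	int hv_flushes;             /* stands in for the Hyper-V hypercall */
	int asid_flushes;           /* stands in for the ASID flush */
};

/* Issue the Hyper-V flush only if the current VMCB has the enlightenment set. */
void model_flush_current_vmcb(struct model_vcpu *vcpu)
{
	if (vcpu->current->enlightened_npt_tlb)
		vcpu->hv_flushes++;

	/* The ASID flush is always performed. */
	vcpu->asid_flushes++;
}
```

With current pointing at vmcb02 only the ASID counter advances; with current pointing at vmcb01 both advance. Keying the check on the active VMCB means the same code would keep working if the bit were ever set in vmcb02 as well.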

> +		hv_remote_flush_tlb(vcpu->kvm);
> +}
> +EXPORT_SYMBOL_GPL(hv_flush_tlb_all);

I'd rather not add helpers to the common KVM code.  I do like minimizing the amount
of #ifdeffery, but defining these as common helpers makes it seem like VMX-on-HyperV
is broken, i.e. raises the question of why VMX doesn't use these helpers when running
on Hyper-V.

I'm thinking this?

---
 arch/x86/kvm/svm/svm.c          | 39 ++++++++++++++++++++++++++++++---
 arch/x86/kvm/svm/svm_onhyperv.h |  7 ++++++
 2 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 70183d2271b5..ab97fe8f1d81 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3746,7 +3746,7 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
 	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
 }
 
-static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
+static void svm_flush_tlb_asid(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
@@ -3770,6 +3770,39 @@ static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
 		svm->current_vmcb->asid_generation--;
 }
 
+static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
+{
+#if IS_ENABLED(CONFIG_HYPERV)
+	hpa_t root_tdp = vcpu->arch.mmu->root.hpa;
+
+	/*
+	 * When running on Hyper-V with EnlightenedNptTlb enabled, explicitly
+	 * flush the NPT mappings via hypercall as flushing the ASID only
+	 * affects virtual to physical mappings, it does not invalidate guest
+	 * physical to host physical mappings.
+	 */
+	if (svm_hv_is_enlightened_tlb_enabled(vcpu) && VALID_PAGE(root_tdp))
+		hyperv_flush_guest_mapping(root_tdp);
+#endif
+	svm_flush_tlb_asid(vcpu);
+}
+
+static void svm_flush_tlb_all(struct kvm_vcpu *vcpu)
+{
+#if IS_ENABLED(CONFIG_HYPERV)
+	/*
+	 * When running on Hyper-V with EnlightenedNptTlb enabled, remote TLB
+	 * flushes should be routed to hv_remote_flush_tlb() without requesting
+	 * a "regular" remote flush.  Reaching this point means either there's
+	 * a KVM bug or a prior hv_remote_flush_tlb() call failed, both of
+	 * which might be fatal to the guest.  Yell, but try to recover.
+	 */
+	if (WARN_ON_ONCE(svm_hv_is_enlightened_tlb_enabled(vcpu)))
+		hv_remote_flush_tlb(vcpu->kvm);
+#endif
+	svm_flush_tlb_asid(vcpu);
+}
+
 static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -4762,10 +4795,10 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.set_rflags = svm_set_rflags,
 	.get_if_flag = svm_get_if_flag,
 
-	.flush_tlb_all = svm_flush_tlb_current,
+	.flush_tlb_all = svm_flush_tlb_all,
 	.flush_tlb_current = svm_flush_tlb_current,
 	.flush_tlb_gva = svm_flush_tlb_gva,
-	.flush_tlb_guest = svm_flush_tlb_current,
+	.flush_tlb_guest = svm_flush_tlb_asid,
 
 	.vcpu_pre_run = svm_vcpu_pre_run,
 	.vcpu_run = svm_vcpu_run,
diff --git a/arch/x86/kvm/svm/svm_onhyperv.h b/arch/x86/kvm/svm/svm_onhyperv.h
index cff838f15db5..d91e019fb7da 100644
--- a/arch/x86/kvm/svm/svm_onhyperv.h
+++ b/arch/x86/kvm/svm/svm_onhyperv.h
@@ -15,6 +15,13 @@ static struct kvm_x86_ops svm_x86_ops;
 
 int svm_hv_enable_l2_tlb_flush(struct kvm_vcpu *vcpu);
 
+static inline bool svm_hv_is_enlightened_tlb_enabled(struct kvm_vcpu *vcpu)
+{
+	struct hv_vmcb_enlightenments *hve = &to_svm(vcpu)->vmcb->control.hv_enlightenments;
+
+	return !!hve->hv_enlightenments_control.enlightened_npt_tlb;
+}
+
 static inline void svm_hv_init_vmcb(struct vmcb *vmcb)
 {
 	struct hv_vmcb_enlightenments *hve = &vmcb->control.hv_enlightenments;

base-commit: 50f13998451effea5c5fdc70fe576f8b435d6224
--
  
Vitaly Kuznetsov March 22, 2023, 4:53 p.m. UTC | #2
Sean Christopherson <seanjc@google.com> writes:

> On Mon, Mar 20, 2023, Jeremi Piotrowski wrote:
>> ---
>>  arch/x86/kvm/kvm_onhyperv.c | 23 +++++++++++++++++++++++
>>  arch/x86/kvm/kvm_onhyperv.h |  5 +++++
>>  arch/x86/kvm/svm/svm.c      | 18 +++++++++++++++---
>>  3 files changed, 43 insertions(+), 3 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/kvm_onhyperv.c b/arch/x86/kvm/kvm_onhyperv.c
>> index 482d6639ef88..036e04c0a161 100644
>> --- a/arch/x86/kvm/kvm_onhyperv.c
>> +++ b/arch/x86/kvm/kvm_onhyperv.c
>> @@ -94,6 +94,29 @@ int hv_remote_flush_tlb(struct kvm *kvm)
>>  }
>>  EXPORT_SYMBOL_GPL(hv_remote_flush_tlb);
>>  
>> +void hv_flush_tlb_current(struct kvm_vcpu *vcpu)
>> +{
>> +	struct kvm_arch *kvm_arch = &vcpu->kvm->arch;
>> +	hpa_t root_tdp = vcpu->arch.mmu->root.hpa;
>> +
>> +	if (kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb && VALID_PAGE(root_tdp)) {
>> +		spin_lock(&kvm_arch->hv_root_tdp_lock);
>> +		if (kvm_arch->hv_root_tdp != root_tdp) {
>> +			hyperv_flush_guest_mapping(root_tdp);
>> +			kvm_arch->hv_root_tdp = root_tdp;
>
> In a vacuum, accessing kvm_arch->hv_root_tdp in the flush path is wrong.  This
> likely fixes the issues you are seeing because the KVM bug only affects the case
> when KVM is loading a new root (that used to be valid), in which case hv_root_tdp
> is guaranteed to be different.  But KVM should not rely on that behavior, i.e. if
> KVM says flush, then we flush.  There might be scenarios where the flush is
> unnecessary, but those flushes should be elided by the code that knows the flush
> is unnecessary, not in this common code just because the target root is the
> globally shared root.
>
> Somewhat of a moot point, but setting hv_root_tdp to root_tdp is also wrong.  KVM's
> behavior is that hv_root_tdp points at a valid root if and only if all vCPUs share
> said root.  E.g. invoking this when vCPUs have different roots will "corrupt"
> hv_root_tdp and possibly cause a remote flush to do the wrong thing.
>
>> +		}
>> +		spin_unlock(&kvm_arch->hv_root_tdp_lock);
>> +	}
>> +}
>> +EXPORT_SYMBOL_GPL(hv_flush_tlb_current);
>> +
>> +void hv_flush_tlb_all(struct kvm_vcpu *vcpu)
>> +{
>> +	if (WARN_ON_ONCE(kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb))
>
> Hmm, looking at the KVM code, AFAICT KVM only enables enlightened_npt_tlb for L1
> (L1 from KVM's perspective) as svm_hv_init_vmcb() is only ever called with vmcb01,
> never with vmcb02.  I don't know if that's intentional, but I do think it means
> KVM can skip the Hyper-V flush for vmcb02 and instead rely on the ASID flush,
> i.e. KVM can do the Hyper-V flush iff enlightened_npt_tlb is set in the current VMCB.
> And that should continue to work if KVM does ever enable enlightened_npt_tlb for L2.
>
>> +		hv_remote_flush_tlb(vcpu->kvm);
>> +}
>> +EXPORT_SYMBOL_GPL(hv_flush_tlb_all);
>
> I'd rather not add helpers to the common KVM code.  I do like minimizing the amount
> of #ifdeffery, but defining these as common helpers makes it seem like VMX-on-HyperV
> is broken, i.e. raises the question of why VMX doesn't use these helpers when running
> on Hyper-V.
>
> I'm thinking this?
>
> ---
>  arch/x86/kvm/svm/svm.c          | 39 ++++++++++++++++++++++++++++++---
>  arch/x86/kvm/svm/svm_onhyperv.h |  7 ++++++
>  2 files changed, 43 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 70183d2271b5..ab97fe8f1d81 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3746,7 +3746,7 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
>  	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
>  }
>  
> -static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
> +static void svm_flush_tlb_asid(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  
> @@ -3770,6 +3770,39 @@ static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
>  		svm->current_vmcb->asid_generation--;
>  }
>  
> +static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
> +{
> +#if IS_ENABLED(CONFIG_HYPERV)
> +	hpa_t root_tdp = vcpu->arch.mmu->root.hpa;
> +
> +	/*
> +	 * When running on Hyper-V with EnlightenedNptTlb enabled, explicitly
> +	 * flush the NPT mappings via hypercall as flushing the ASID only
> +	 * affects virtual to physical mappings, it does not invalidate guest
> +	 * physical to host physical mappings.
> +	 */
> +	if (svm_hv_is_enlightened_tlb_enabled(vcpu) && VALID_PAGE(root_tdp))
> +		hyperv_flush_guest_mapping(root_tdp);
> +#endif
> +	svm_flush_tlb_asid(vcpu);
> +}
> +
> +static void svm_flush_tlb_all(struct kvm_vcpu *vcpu)
> +{
> +#if IS_ENABLED(CONFIG_HYPERV)
> +	/*
> +	 * When running on Hyper-V with EnlightenedNptTlb enabled, remote TLB
> +	 * flushes should be routed to hv_remote_flush_tlb() without requesting
> +	 * a "regular" remote flush.  Reaching this point means either there's
> +	 * a KVM bug or a prior hv_remote_flush_tlb() call failed, both of
> +	 * which might be fatal to the guest.  Yell, but try to recover.
> +	 */
> +	if (WARN_ON_ONCE(svm_hv_is_enlightened_tlb_enabled(vcpu)))
> +		hv_remote_flush_tlb(vcpu->kvm);
> +#endif
> +	svm_flush_tlb_asid(vcpu);
> +}
> +
>  static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -4762,10 +4795,10 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>  	.set_rflags = svm_set_rflags,
>  	.get_if_flag = svm_get_if_flag,
>  
> -	.flush_tlb_all = svm_flush_tlb_current,
> +	.flush_tlb_all = svm_flush_tlb_all,
>  	.flush_tlb_current = svm_flush_tlb_current,
>  	.flush_tlb_gva = svm_flush_tlb_gva,
> -	.flush_tlb_guest = svm_flush_tlb_current,
> +	.flush_tlb_guest = svm_flush_tlb_asid,
>  
>  	.vcpu_pre_run = svm_vcpu_pre_run,
>  	.vcpu_run = svm_vcpu_run,
> diff --git a/arch/x86/kvm/svm/svm_onhyperv.h b/arch/x86/kvm/svm/svm_onhyperv.h
> index cff838f15db5..d91e019fb7da 100644
> --- a/arch/x86/kvm/svm/svm_onhyperv.h
> +++ b/arch/x86/kvm/svm/svm_onhyperv.h
> @@ -15,6 +15,13 @@ static struct kvm_x86_ops svm_x86_ops;
>  
>  int svm_hv_enable_l2_tlb_flush(struct kvm_vcpu *vcpu);
>  
> +static inline bool svm_hv_is_enlightened_tlb_enabled(struct kvm_vcpu *vcpu)
> +{
> +	struct hv_vmcb_enlightenments *hve = &to_svm(vcpu)->vmcb->control.hv_enlightenments;
> +
> +	return !!hve->hv_enlightenments_control.enlightened_npt_tlb;

In theory, we should not look at Hyper-V enlightenments in VMCB control
just because our kernel has CONFIG_HYPERV enabled. I'd suggest we add a
real check that we're running on Hyper-V and we can do it the same way
it is done in svm_hv_hardware_setup()/svm_hv_init_vmcb():

	return (ms_hyperv.nested_features & HV_X64_NESTED_ENLIGHTENED_TLB)
		&& !!hve->hv_enlightenments_control.enlightened_npt_tlb;

(untested).

> +}
> +
>  static inline void svm_hv_init_vmcb(struct vmcb *vmcb)
>  {
>  	struct hv_vmcb_enlightenments *hve = &vmcb->control.hv_enlightenments;
>
> base-commit: 50f13998451effea5c5fdc70fe576f8b435d6224
  
Sean Christopherson March 22, 2023, 5:01 p.m. UTC | #3
On Wed, Mar 22, 2023, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@google.com> writes:
> > diff --git a/arch/x86/kvm/svm/svm_onhyperv.h b/arch/x86/kvm/svm/svm_onhyperv.h
> > index cff838f15db5..d91e019fb7da 100644
> > --- a/arch/x86/kvm/svm/svm_onhyperv.h
> > +++ b/arch/x86/kvm/svm/svm_onhyperv.h
> > @@ -15,6 +15,13 @@ static struct kvm_x86_ops svm_x86_ops;
> >  
> >  int svm_hv_enable_l2_tlb_flush(struct kvm_vcpu *vcpu);
> >  
> > +static inline bool svm_hv_is_enlightened_tlb_enabled(struct kvm_vcpu *vcpu)
> > +{
> > +	struct hv_vmcb_enlightenments *hve = &to_svm(vcpu)->vmcb->control.hv_enlightenments;
> > +
> > +	return !!hve->hv_enlightenments_control.enlightened_npt_tlb;
> 
> In theory, we should not look at Hyper-V enlightenments in VMCB control
> just because our kernel has CONFIG_HYPERV enabled.

Oooh, right, because hv_enlightenments uses software reserved bits, and in theory
KVM could be running on a different hypervisor that uses those bits for something
completely different.
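
A small sketch of why the host-feature check matters, assuming the software-reserved area is modelled as one 32-bit word. Both bit positions below are placeholders chosen for illustration and all model_* names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions for illustration; not taken from any header. */
#define MODEL_NESTED_ENLIGHTENED_TLB (1u << 22) /* feature advertised by the host */
#define MODEL_ENLIGHTENED_NPT_TLB    (1u << 0)  /* bit in the software-reserved word */

/*
 * Interpret the software-reserved word as a Hyper-V enlightenment only when
 * the underlying hypervisor advertises the feature. On any other hypervisor
 * the same bit may mean something unrelated, so it must be ignored.
 */
bool model_enlightened_tlb_enabled(uint32_t nested_features, uint32_t sw_reserved)
{
	return (nested_features & MODEL_NESTED_ENLIGHTENED_TLB) &&
	       (sw_reserved & MODEL_ENLIGHTENED_NPT_TLB);
}
```

With nested_features equal to 0 (a host that does not advertise the feature) the function returns false even if the bit in the reserved word happens to be set.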

> I'd suggest we add a
> real check that we're running on Hyper-V and we can do it the same way
> it is done in svm_hv_hardware_setup()/svm_hv_init_vmcb():
> 
> 	return (ms_hyperv.nested_features & HV_X64_NESTED_ENLIGHTENED_TLB)
> 		&& !!hve->hv_enlightenments_control.enlightened_npt_tlb;

Jeremi, if you grab this, can you put the && on the previous line?  I.e.

	return (ms_hyperv.nested_features & HV_X64_NESTED_ENLIGHTENED_TLB) &&
	       !!hve->hv_enlightenments_control.enlightened_npt_tlb;
  
Jeremi Piotrowski March 22, 2023, 5:07 p.m. UTC | #4
On 22/03/2023 18:01, Sean Christopherson wrote:
> On Wed, Mar 22, 2023, Vitaly Kuznetsov wrote:
>> Sean Christopherson <seanjc@google.com> writes:
>>> diff --git a/arch/x86/kvm/svm/svm_onhyperv.h b/arch/x86/kvm/svm/svm_onhyperv.h
>>> index cff838f15db5..d91e019fb7da 100644
>>> --- a/arch/x86/kvm/svm/svm_onhyperv.h
>>> +++ b/arch/x86/kvm/svm/svm_onhyperv.h
>>> @@ -15,6 +15,13 @@ static struct kvm_x86_ops svm_x86_ops;
>>>  
>>>  int svm_hv_enable_l2_tlb_flush(struct kvm_vcpu *vcpu);
>>>  
>>> +static inline bool svm_hv_is_enlightened_tlb_enabled(struct kvm_vcpu *vcpu)
>>> +{
>>> +	struct hv_vmcb_enlightenments *hve = &to_svm(vcpu)->vmcb->control.hv_enlightenments;
>>> +
>>> +	return !!hve->hv_enlightenments_control.enlightened_npt_tlb;
>>
>> In theory, we should not look at Hyper-V enlightenments in VMCB control
>> just because our kernel has CONFIG_HYPERV enabled.
> 
> Oooh, right, because hv_enlightenments uses software reserved bits, and in theory
> KVM could be running on a different hypervisor that uses those bits for something
> completely different.
> 
>> I'd suggest we add a
>> real check that we're running on Hyper-V and we can do it the same way
>> it is done in svm_hv_hardware_setup()/svm_hv_init_vmcb():
>>
>> 	return (ms_hyperv.nested_features & HV_X64_NESTED_ENLIGHTENED_TLB)
>> 		&& !!hve->hv_enlightenments_control.enlightened_npt_tlb;
> 
> Jeremi, if you grab this, can you put the && on the previous line?  I.e.
> 
> 	return (ms_hyperv.nested_features & HV_X64_NESTED_ENLIGHTENED_TLB) &&
> 	       !!hve->hv_enlightenments_control.enlightened_npt_tlb;

Will do. I'll need to read the replies in more detail tomorrow.
  
Jeremi Piotrowski March 24, 2023, 1:42 p.m. UTC | #5
On 3/22/2023 5:20 PM, Sean Christopherson wrote:
> On Mon, Mar 20, 2023, Jeremi Piotrowski wrote:
>> ---
>>  arch/x86/kvm/kvm_onhyperv.c | 23 +++++++++++++++++++++++
>>  arch/x86/kvm/kvm_onhyperv.h |  5 +++++
>>  arch/x86/kvm/svm/svm.c      | 18 +++++++++++++++---
>>  3 files changed, 43 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/kvm_onhyperv.c b/arch/x86/kvm/kvm_onhyperv.c
>> index 482d6639ef88..036e04c0a161 100644
>> --- a/arch/x86/kvm/kvm_onhyperv.c
>> +++ b/arch/x86/kvm/kvm_onhyperv.c
>> @@ -94,6 +94,29 @@ int hv_remote_flush_tlb(struct kvm *kvm)
>>  }
>>  EXPORT_SYMBOL_GPL(hv_remote_flush_tlb);
>>  
>> +void hv_flush_tlb_current(struct kvm_vcpu *vcpu)
>> +{
>> +	struct kvm_arch *kvm_arch = &vcpu->kvm->arch;
>> +	hpa_t root_tdp = vcpu->arch.mmu->root.hpa;
>> +
>> +	if (kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb && VALID_PAGE(root_tdp)) {
>> +		spin_lock(&kvm_arch->hv_root_tdp_lock);
>> +		if (kvm_arch->hv_root_tdp != root_tdp) {
>> +			hyperv_flush_guest_mapping(root_tdp);
>> +			kvm_arch->hv_root_tdp = root_tdp;
> 
> In a vacuum, accessing kvm_arch->hv_root_tdp in the flush path is wrong.  This
> likely fixes the issues you are seeing because the KVM bug only affects the case
> when KVM is loading a new root (that used to be valid), in which case hv_root_tdp
> is guaranteed to be different.  But KVM should not rely on that behavior, i.e. if
> KVM says flush, then we flush.  There might be scenarios where the flush is
> unnecessary, but those flushes should be elided by the code that knows the flush
> is unnecessary, not in this common code just because the target root is the
> globally shared root.
>

That's fair, and I'm fine with doing the flush unconditionally to fix the issue at
this time.

But eliding the flushes higher up will require bubbling up more knowledge about
the enlightened TLB and the fact that hyperv_flush_guest_mapping() already acts
across all cpus. And we would still want to call svm_flush_tlb_asid() anyway, right?

> Somewhat of a moot point, but setting hv_root_tdp to root_tdp is also wrong.  KVM's
> behavior is that hv_root_tdp points at a valid root if and only if all vCPUs share
> said root.  E.g. invoking this when vCPUs have different roots will "corrupt"
> hv_root_tdp and possibly cause a remote flush to do the wrong thing.
>

Oh, that's right. I'm dropping this for now.

>> +		}
>> +		spin_unlock(&kvm_arch->hv_root_tdp_lock);
>> +	}
>> +}
>> +EXPORT_SYMBOL_GPL(hv_flush_tlb_current);
>> +
>> +void hv_flush_tlb_all(struct kvm_vcpu *vcpu)
>> +{
>> +	if (WARN_ON_ONCE(kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb))
> 
> Hmm, looking at the KVM code, AFAICT KVM only enables enlightened_npt_tlb for L1
> (L1 from KVM's perspective) as svm_hv_init_vmcb() is only ever called with vmcb01,
> never with vmcb02.  I don't know if that's intentional, but I do think it means
> KVM can skip the Hyper-V flush for vmcb02 and instead rely on the ASID flush,
> i.e. KVM can do the Hyper-V flush iff enlightened_npt_tlb is set in the current VMCB.
> And that should continue to work if KVM does ever enable enlightened_npt_tlb for L2.
> 
>> +		hv_remote_flush_tlb(vcpu->kvm);
>> +}
>> +EXPORT_SYMBOL_GPL(hv_flush_tlb_all);
> 
> I'd rather not add helpers to the common KVM code.  I do like minimizing the amount
> of #ifdeffery, but defining these as common helpers makes it seem like VMX-on-HyperV
> is broken, i.e. raises the question of why VMX doesn't use these helpers when running
> on Hyper-V.
> 
> I'm thinking this?
> 

I have the #ifdef version ready to send out, but what do you think about this:

diff --git a/arch/x86/kvm/kvm_onhyperv.h b/arch/x86/kvm/kvm_onhyperv.h
index 287e98ef9df3..b3ee0bb7e95f 100644
--- a/arch/x86/kvm/kvm_onhyperv.h
+++ b/arch/x86/kvm/kvm_onhyperv.h
@@ -12,6 +12,11 @@ int hv_remote_flush_tlb_with_range(struct kvm *kvm,
 int hv_remote_flush_tlb(struct kvm *kvm);
 void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp);
 #else /* !CONFIG_HYPERV */
+static inline int hv_remote_flush_tlb(struct kvm *kvm)
+{
+	return -1;
+}
+
 static inline void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp)
 {
 }
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 252e7f37e4e2..e707511a91e3 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3729,7 +3729,7 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
 	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
 }
 
-static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
+static void svm_flush_tlb_asid(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
@@ -3753,6 +3753,39 @@ static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
 		svm->current_vmcb->asid_generation--;
 }
 
+static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
+{
+	hpa_t root_tdp = vcpu->arch.mmu->root.hpa;
+
+	/*
+	 * When running on Hyper-V with EnlightenedNptTlb enabled, explicitly
+	 * flush the NPT mappings via hypercall as flushing the ASID only
+	 * affects virtual to physical mappings, it does not invalidate guest
+	 * physical to host physical mappings.
+	 */
+	if (IS_ENABLED(CONFIG_HYPERV) &&
+	    svm_hv_is_enlightened_tlb_enabled(vcpu) &&
+	    VALID_PAGE(root_tdp))
+		hyperv_flush_guest_mapping(root_tdp);
+
+	svm_flush_tlb_asid(vcpu);
+}
+
+static void svm_flush_tlb_all(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * When running on Hyper-V with EnlightenedNptTlb enabled, remote TLB
+	 * flushes should be routed to hv_remote_flush_tlb() without requesting
+	 * a "regular" remote flush.  Reaching this point means either there's
+	 * a KVM bug or a prior hv_remote_flush_tlb() call failed, both of
+	 * which might be fatal to the guest.  Yell, but try to recover.
+	 */
+	if (IS_ENABLED(CONFIG_HYPERV) && WARN_ON_ONCE(svm_hv_is_enlightened_tlb_enabled(vcpu)))
+		hv_remote_flush_tlb(vcpu->kvm);
+
+	svm_flush_tlb_asid(vcpu);
+}
+
 static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -4745,10 +4778,10 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.set_rflags = svm_set_rflags,
 	.get_if_flag = svm_get_if_flag,
 
-	.flush_tlb_all = svm_flush_tlb_current,
+	.flush_tlb_all = svm_flush_tlb_all,
 	.flush_tlb_current = svm_flush_tlb_current,
 	.flush_tlb_gva = svm_flush_tlb_gva,
-	.flush_tlb_guest = svm_flush_tlb_current,
+	.flush_tlb_guest = svm_flush_tlb_asid,
 
 	.vcpu_pre_run = svm_vcpu_pre_run,
 	.vcpu_run = svm_vcpu_run,
diff --git a/arch/x86/kvm/svm/svm_onhyperv.h b/arch/x86/kvm/svm/svm_onhyperv.h
index cff838f15db5..4c9e0d4ba3dd 100644
--- a/arch/x86/kvm/svm/svm_onhyperv.h
+++ b/arch/x86/kvm/svm/svm_onhyperv.h
@@ -6,6 +6,8 @@
 #ifndef __ARCH_X86_KVM_SVM_ONHYPERV_H__
 #define __ARCH_X86_KVM_SVM_ONHYPERV_H__

+#include <asm/mshyperv.h>
+
 #if IS_ENABLED(CONFIG_HYPERV)

 #include "kvm_onhyperv.h"
@@ -15,6 +17,14 @@ static struct kvm_x86_ops svm_x86_ops;
 
 int svm_hv_enable_l2_tlb_flush(struct kvm_vcpu *vcpu);
 
+static inline bool svm_hv_is_enlightened_tlb_enabled(struct kvm_vcpu *vcpu)
+{
+	struct hv_vmcb_enlightenments *hve = &to_svm(vcpu)->vmcb->control.hv_enlightenments;
+
+	return ms_hyperv.nested_features & HV_X64_NESTED_ENLIGHTENED_TLB &&
+		!!hve->hv_enlightenments_control.enlightened_npt_tlb;
+}
+
 static inline void svm_hv_init_vmcb(struct vmcb *vmcb)
 {
 	struct hv_vmcb_enlightenments *hve = &vmcb->control.hv_enlightenments;
@@ -80,6 +90,11 @@ static inline void svm_hv_update_vp_id(struct vmcb *vmcb, struct kvm_vcpu *vcpu)
 }
 #else
 
+static inline bool svm_hv_is_enlightened_tlb_enabled(struct kvm_vcpu *vcpu)
+{
+	return false;
+}
+
 static inline void svm_hv_init_vmcb(struct vmcb *vmcb)
 {
 }
  
Sean Christopherson March 24, 2023, 2:10 p.m. UTC | #6
On Fri, Mar 24, 2023, Jeremi Piotrowski wrote:
> I have the #ifdef version ready to send out, but what do you think about this:

Oh, nice!  Yeah, that works, I didn't see the stub for hyperv_flush_guest_mapping().

> @@ -3753,6 +3753,39 @@ static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
>  		svm->current_vmcb->asid_generation--;
>  }
>  
> +static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
> +{
> +	hpa_t root_tdp = vcpu->arch.mmu->root.hpa;
> +
> +	/*
> +	 * When running on Hyper-V with EnlightenedNptTlb enabled, explicitly
> +	 * flush the NPT mappings via hypercall as flushing the ASID only
> +	 * affects virtual to physical mappings, it does not invalidate guest
> +	 * physical to host physical mappings.
> +	 */
> +	if (IS_ENABLED(CONFIG_HYPERV) &&

No need for the IS_ENABLED(CONFIG_HYPERV) check here; the svm_hv_is_enlightened_tlb_enabled()
stub that's provided for CONFIG_HYPERV=n will guard this properly:

	if (svm_hv_is_enlightened_tlb_enabled(vcpu) && VALID_PAGE(root_tdp))
		hyperv_flush_guest_mapping(root_tdp);
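
The stub pattern relied on here can be sketched outside the kernel as follows. MODEL_CONFIG_HYPERV is a hypothetical stand-in for the Kconfig option, and the model_* names and counters are illustrative only:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the Kconfig option; defaults to "disabled". */
#ifndef MODEL_CONFIG_HYPERV
#define MODEL_CONFIG_HYPERV 0
#endif

/* Hypothetical vCPU model; counters replace the real hypercall and ASID flush. */
struct model_vcpu {
	bool enlightened_npt_tlb;
	bool root_valid;
	int hv_flushes;
	int asid_flushes;
};

#if MODEL_CONFIG_HYPERV
static inline bool model_enlightened_tlb_enabled(const struct model_vcpu *vcpu)
{
	return vcpu->enlightened_npt_tlb;
}
#else
/* Stub for the disabled configuration: always false, so callers need no #ifdef. */
static inline bool model_enlightened_tlb_enabled(const struct model_vcpu *vcpu)
{
	(void)vcpu;
	return false;
}
#endif

/* The caller has no configuration check; the stub makes the branch dead code. */
void model_flush_tlb_current(struct model_vcpu *vcpu)
{
	if (model_enlightened_tlb_enabled(vcpu) && vcpu->root_valid)
		vcpu->hv_flushes++;

	vcpu->asid_flushes++;
}
```

Built with the default MODEL_CONFIG_HYPERV of 0, the Hyper-V branch is never taken regardless of the vCPU's fields, which is the property that makes an extra IS_ENABLED() test in the caller redundant.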

> +	    svm_hv_is_enlightened_tlb_enabled(vcpu) &&
> +	    VALID_PAGE(root_tdp))
> +		hyperv_flush_guest_mapping(root_tdp);
> +
> +	svm_flush_tlb_asid(vcpu);
> +}
> +
> +static void svm_flush_tlb_all(struct kvm_vcpu *vcpu)
> +{
> +	/*
> +	 * When running on Hyper-V with EnlightenedNptTlb enabled, remote TLB
> +	 * flushes should be routed to hv_remote_flush_tlb() without requesting
> +	 * a "regular" remote flush.  Reaching this point means either there's
> +	 * a KVM bug or a prior hv_remote_flush_tlb() call failed, both of
> +	 * which might be fatal to the guest.  Yell, but try to recover.
> +	 */
> +	if (IS_ENABLED(CONFIG_HYPERV) && WARN_ON_ONCE(svm_hv_is_enlightened_tlb_enabled(vcpu)))
> +		hv_remote_flush_tlb(vcpu->kvm);

And then

	if (WARN_ON_ONCE(svm_hv_is_enlightened_tlb_enabled(vcpu)))
		hv_remote_flush_tlb(vcpu->kvm);


> +
> +	svm_flush_tlb_asid(vcpu);
> +}
> +
>  static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -4745,10 +4778,10 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>  	.set_rflags = svm_set_rflags,
>  	.get_if_flag = svm_get_if_flag,
>  
> -	.flush_tlb_all = svm_flush_tlb_current,
> +	.flush_tlb_all = svm_flush_tlb_all,
>  	.flush_tlb_current = svm_flush_tlb_current,
>  	.flush_tlb_gva = svm_flush_tlb_gva,
> -	.flush_tlb_guest = svm_flush_tlb_current,
> +	.flush_tlb_guest = svm_flush_tlb_asid,
>  
>  	.vcpu_pre_run = svm_vcpu_pre_run,
>  	.vcpu_run = svm_vcpu_run,
> diff --git a/arch/x86/kvm/svm/svm_onhyperv.h b/arch/x86/kvm/svm/svm_onhyperv.h
> index cff838f15db5..4c9e0d4ba3dd 100644
> --- a/arch/x86/kvm/svm/svm_onhyperv.h
> +++ b/arch/x86/kvm/svm/svm_onhyperv.h
> @@ -6,6 +6,8 @@
>  #ifndef __ARCH_X86_KVM_SVM_ONHYPERV_H__
>  #define __ARCH_X86_KVM_SVM_ONHYPERV_H__
> 
> +#include <asm/mshyperv.h>
> +
>  #if IS_ENABLED(CONFIG_HYPERV)
> 
>  #include "kvm_onhyperv.h"
> @@ -15,6 +17,14 @@ static struct kvm_x86_ops svm_x86_ops;
>  
>  int svm_hv_enable_l2_tlb_flush(struct kvm_vcpu *vcpu);
>  
> +static inline bool svm_hv_is_enlightened_tlb_enabled(struct kvm_vcpu *vcpu)
> +{
> +	struct hv_vmcb_enlightenments *hve = &to_svm(vcpu)->vmcb->control.hv_enlightenments;
> +
> +	return ms_hyperv.nested_features & HV_X64_NESTED_ENLIGHTENED_TLB &&
> +		!!hve->hv_enlightenments_control.enlightened_npt_tlb;

Uber nit, align the indentation (7 spaces instead of 1 tab):

	return ms_hyperv.nested_features & HV_X64_NESTED_ENLIGHTENED_TLB &&
	       !!hve->hv_enlightenments_control.enlightened_npt_tlb;
  

Patch

diff --git a/arch/x86/kvm/kvm_onhyperv.c b/arch/x86/kvm/kvm_onhyperv.c
index 482d6639ef88..036e04c0a161 100644
--- a/arch/x86/kvm/kvm_onhyperv.c
+++ b/arch/x86/kvm/kvm_onhyperv.c
@@ -94,6 +94,29 @@  int hv_remote_flush_tlb(struct kvm *kvm)
 }
 EXPORT_SYMBOL_GPL(hv_remote_flush_tlb);
 
+void hv_flush_tlb_current(struct kvm_vcpu *vcpu)
+{
+	struct kvm_arch *kvm_arch = &vcpu->kvm->arch;
+	hpa_t root_tdp = vcpu->arch.mmu->root.hpa;
+
+	if (kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb && VALID_PAGE(root_tdp)) {
+		spin_lock(&kvm_arch->hv_root_tdp_lock);
+		if (kvm_arch->hv_root_tdp != root_tdp) {
+			hyperv_flush_guest_mapping(root_tdp);
+			kvm_arch->hv_root_tdp = root_tdp;
+		}
+		spin_unlock(&kvm_arch->hv_root_tdp_lock);
+	}
+}
+EXPORT_SYMBOL_GPL(hv_flush_tlb_current);
+
+void hv_flush_tlb_all(struct kvm_vcpu *vcpu)
+{
+	if (WARN_ON_ONCE(kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb))
+		hv_remote_flush_tlb(vcpu->kvm);
+}
+EXPORT_SYMBOL_GPL(hv_flush_tlb_all);
+
 void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp)
 {
 	struct kvm_arch *kvm_arch = &vcpu->kvm->arch;
diff --git a/arch/x86/kvm/kvm_onhyperv.h b/arch/x86/kvm/kvm_onhyperv.h
index 287e98ef9df3..f24d0ca41d2b 100644
--- a/arch/x86/kvm/kvm_onhyperv.h
+++ b/arch/x86/kvm/kvm_onhyperv.h
@@ -10,11 +10,16 @@ 
 int hv_remote_flush_tlb_with_range(struct kvm *kvm,
 		struct kvm_tlb_range *range);
 int hv_remote_flush_tlb(struct kvm *kvm);
+void hv_flush_tlb_current(struct kvm_vcpu *vcpu);
+void hv_flush_tlb_all(struct kvm_vcpu *vcpu);
 void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp);
 #else /* !CONFIG_HYPERV */
 static inline void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp)
 {
 }
+
+static inline void hv_flush_tlb_current(struct kvm_vcpu *vcpu) { }
+static inline void hv_flush_tlb_all(struct kvm_vcpu *vcpu) { }
 #endif /* !CONFIG_HYPERV */
 
 #endif
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 252e7f37e4e2..8da6740ef595 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3729,7 +3729,7 @@  static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
 	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
 }
 
-static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
+static void svm_flush_tlb_asid(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
@@ -3753,6 +3753,18 @@  static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
 		svm->current_vmcb->asid_generation--;
 }
 
+static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
+{
+	hv_flush_tlb_current(vcpu);
+	svm_flush_tlb_asid(vcpu);
+}
+
+static void svm_flush_tlb_all(struct kvm_vcpu *vcpu)
+{
+	hv_flush_tlb_all(vcpu);
+	svm_flush_tlb_asid(vcpu);
+}
+
 static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -4745,10 +4757,10 @@  static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.set_rflags = svm_set_rflags,
 	.get_if_flag = svm_get_if_flag,
 
-	.flush_tlb_all = svm_flush_tlb_current,
+	.flush_tlb_all = svm_flush_tlb_all,
 	.flush_tlb_current = svm_flush_tlb_current,
 	.flush_tlb_gva = svm_flush_tlb_gva,
-	.flush_tlb_guest = svm_flush_tlb_current,
+	.flush_tlb_guest = svm_flush_tlb_asid,
 
 	.vcpu_pre_run = svm_vcpu_pre_run,
 	.vcpu_run = svm_vcpu_run,