[v3,08/11] KVM: x86: move vmx code to get EPT memtype when CR0.CD=1 to x86 common code

Message ID 20230616023858.7503-1-yan.y.zhao@intel.com
State New
Series KVM: x86/mmu: refine memtype related mmu zap

Commit Message

Yan Zhao June 16, 2023, 2:38 a.m. UTC
  Move code in vmx.c to get the cache-disabled memtype when non-coherent DMA is
present to x86 common code.

This is a preparation patch for the later implementation of fine-grained gfn
zap for CR0.CD toggles when guest MTRRs are honored.

No functional change intended.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 arch/x86/kvm/mtrr.c    | 19 +++++++++++++++++++
 arch/x86/kvm/vmx/vmx.c | 10 +++++-----
 arch/x86/kvm/x86.h     |  1 +
 3 files changed, 25 insertions(+), 5 deletions(-)
  

Comments

Sean Christopherson June 28, 2023, 10:57 p.m. UTC | #1
On Fri, Jun 16, 2023, Yan Zhao wrote:
> Move code in vmx.c to get cache disabled memtype when non-coherent DMA
> present to x86 common code.
> 
> This is the preparation patch for later implementation of fine-grained gfn
> zap for CR0.CD toggles when guest MTRRs are honored.
> 
> No functional change intended.
> 
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>  arch/x86/kvm/mtrr.c    | 19 +++++++++++++++++++
>  arch/x86/kvm/vmx/vmx.c | 10 +++++-----
>  arch/x86/kvm/x86.h     |  1 +
>  3 files changed, 25 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
> index 3ce58734ad22..b35dd0bc9cad 100644
> --- a/arch/x86/kvm/mtrr.c
> +++ b/arch/x86/kvm/mtrr.c
> @@ -721,3 +721,22 @@ bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
>  
>  	return type == mtrr_default_type(mtrr_state);
>  }
> +
> +void kvm_mtrr_get_cd_memory_type(struct kvm_vcpu *vcpu, u8 *type, bool *ipat)

Hmm, I'm not convinced that this logic is subtle enough to warrant a common
helper with out params (I *really* don't like out params :-) ).

UC, or more specifically CR0.CD=1 on VMX without the quirk, is a super special case,
because to faithfully emulate "No Fill" mode, KVM needs to ignore guest PAT (stupid WC).

I don't love having the same logic/assumptions in multiple places, but the CR0.CD=1
behavior is so rigidly tied to what KVM must do that I think trying to provide a
common helper makes the code more complex than it needs to be.

If we open code the logic in the MTRR helper, then I think it can be distilled
down to:

	struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
	bool mtrr_enabled = mtrr_is_enabled(mtrr_state);
	u8 default_type;

	/*
	 * Faithfully emulating CR0.CD=1 on VMX requires ignoring guest PAT, as
	 * WC in the PAT overrides UC in the MTRRs.  Zap all SPTEs so that KVM
	 * will once again start honoring guest PAT.
	 */
	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
		return kvm_mtrr_zap_gfn_range(vcpu, gpa_to_gfn(0), gpa_to_gfn(~0ULL));

	default_type = mtrr_enabled ? mtrr_default_type(mtrr_state) :
				      mtrr_disabled_type(vcpu);

	if (default_type != MTRR_TYPE_WRBACK)
		return kvm_mtrr_zap_gfn_range(vcpu, gpa_to_gfn(0), gpa_to_gfn(~0ULL));

	if (mtrr_enabled) {
		if (gather_non_wb_fixed_mtrrs(vcpu, MTRR_TYPE_WRBACK))
			goto fail;

		if (gather_non_wb_var_mtrrs(vcpu, MTRR_TYPE_WRBACK))
			goto fail;

		kvm_zap_or_wait_mtrrs(vcpu->kvm);
	}

and this patch goes away.

> +{
> +	/*
> +	 * this routine is supposed to be called when guest mtrrs are honored
> +	 */
> +	if (unlikely(!kvm_mmu_honors_guest_mtrrs(vcpu->kvm))) {

I don't think this is worth checking, e.g. this would be WARN-worthy if it weren't
for an otherwise benign race with device (un)assignment.

> +		*type = MTRR_TYPE_WRBACK;
> +		*ipat = true;

> +	} else if (unlikely(!kvm_check_has_quirk(vcpu->kvm,

Eh, drop the "unlikely()" annotations.  AIUI, they almost never provide actual
performance benefits, and I dislike unnecessarily speculating on what userspace
is doing when it comes to code (though I 100% agree that this is definitely unlikely).
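For concreteness, a helper with both pieces of feedback applied (a WARN instead of the silent fallback, and the unlikely() hints dropped) could look roughly like the sketch below. This is not the actual kernel code: the KVM structs, the quirk check, and the WARN are stubbed out for illustration; only the three-way memtype selection mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for kernel definitions; the MTRR type values match the real
 * encodings (UC = 0, WB = 6), everything else here is stubbed. */
#define MTRR_TYPE_UNCACHABLE 0
#define MTRR_TYPE_WRBACK     6

struct kvm_stub {
	bool honors_guest_mtrrs;  /* kvm_mmu_honors_guest_mtrrs() stand-in */
	bool cd_nw_cleared_quirk; /* KVM_X86_QUIRK_CD_NW_CLEARED enabled? */
};

/*
 * Helper reworked per the review feedback: warn (here via stderr) instead of
 * silently falling back when guest MTRRs are not honored, no unlikely() hints.
 */
static void get_cd_memory_type(struct kvm_stub *kvm, unsigned char *type,
			       bool *ipat)
{
	if (!kvm->honors_guest_mtrrs) {
		fprintf(stderr, "WARN: called while not honoring guest MTRRs\n");
		*type = MTRR_TYPE_WRBACK;
		*ipat = true;
	} else if (!kvm->cd_nw_cleared_quirk) {
		/* Faithful CR0.CD=1 emulation: force UC, ignore guest PAT. */
		*type = MTRR_TYPE_UNCACHABLE;
		*ipat = true;
	} else {
		/* Quirk enabled: WB, and let guest PAT be honored. */
		*type = MTRR_TYPE_WRBACK;
		*ipat = false;
	}
}
```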
  
Yan Zhao June 29, 2023, 12:55 a.m. UTC | #2
On Wed, Jun 28, 2023 at 03:57:09PM -0700, Sean Christopherson wrote:
> On Fri, Jun 16, 2023, Yan Zhao wrote:
> > Move code in vmx.c to get cache disabled memtype when non-coherent DMA
> > present to x86 common code.
> > 
> > This is the preparation patch for later implementation of fine-grained gfn
> > zap for CR0.CD toggles when guest MTRRs are honored.
> > 
> > No functional change intended.
> > 
> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > ---
> >  arch/x86/kvm/mtrr.c    | 19 +++++++++++++++++++
> >  arch/x86/kvm/vmx/vmx.c | 10 +++++-----
> >  arch/x86/kvm/x86.h     |  1 +
> >  3 files changed, 25 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
> > index 3ce58734ad22..b35dd0bc9cad 100644
> > --- a/arch/x86/kvm/mtrr.c
> > +++ b/arch/x86/kvm/mtrr.c
> > @@ -721,3 +721,22 @@ bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
> >  
> >  	return type == mtrr_default_type(mtrr_state);
> >  }
> > +
> > +void kvm_mtrr_get_cd_memory_type(struct kvm_vcpu *vcpu, u8 *type, bool *ipat)
> 
> Hmm, I'm not convinced that this logic is subtle enough to warrant a common
I added this patch because the memtype to use under CR0.CD=1 is determined by
vmx specific code (i.e. vmx.c), while mtrr.c is common code for x86.

I don't know if it's good for the open code below to assume what vmx.c will
return (e.g. if someone added the IPAT bit for CR0.CD=1 under the quirk and
forgot to update the code here, we would actually need to zap everything
rather than only the non-WB ranges).

That's why I want to introduce a helper and let vmx.c call into it.
(Sorry, I didn't write a good commit message to explain the real intent.)

> helper with out params (I *really* don't like out params :-) ).
I don't like the out params either. :)
I just don't know a better way to return the ipat in the helper.

> 
> UC, or more specifically CR0.CD=1 on VMX without the quirk, is a super special case,
> because to faithfully emulate "No Fill" mode, KVM needs to ignore guest PAT (stupid WC).
> 
> I don't love having the same logic/assumptions in multiple places, but the CR0.CD=1
> behavior is so rigidly tied to what KVM must do that I think trying to provide a
> common helper makes the code more complex than it needs to be.
> 
> If we open code the logic in the MTRR helper, then I think it can be distilled
> down to:
> 
> 	struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
> 	bool mtrr_enabled = mtrr_is_enabled(mtrr_state);
> 	u8 default_type;
> 
> 	/*
> 	 * Faithfully emulating CR0.CD=1 on VMX requires ignoring guest PAT, as
> 	 * WC in the PAT overrides UC in the MTRRs.  Zap all SPTEs so that KVM
> 	 * will once again start honoring guest PAT.
> 	 */
> 	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
> 		return kvm_mtrr_zap_gfn_range(vcpu, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
> 
> 	default_type = mtrr_enabled ? mtrr_default_type(mtrr_state) :
> 				      mtrr_disabled_type(vcpu);
> 
> 	if (default_type != MTRR_TYPE_WRBACK)
> 		return kvm_mtrr_zap_gfn_range(vcpu, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
> 
> 	if (mtrr_enabled) {
> 		if (gather_non_wb_fixed_mtrrs(vcpu, MTRR_TYPE_WRBACK))
> 			goto fail;
> 
> 		if (gather_non_wb_var_mtrrs(vcpu, MTRR_TYPE_WRBACK))
> 			goto fail;
> 
> 		kvm_zap_or_wait_mtrrs(vcpu->kvm);
> 	}
> 
> and this patch goes away.
> 
> > +{
> > +	/*
> > +	 * this routine is supposed to be called when guest mtrrs are honored
> > +	 */
> > +	if (unlikely(!kvm_mmu_honors_guest_mtrrs(vcpu->kvm))) {
> 
> I don't think this is worth checking, e.g. this would be WARN-worthy if it weren't
> for an otherwise benign race with device (un)assignment.
Yes, WARN_ON is a better way.

> 
> > +		*type = MTRR_TYPE_WRBACK;
> > +		*ipat = true;
> 
> > +	} else if (unlikely(!kvm_check_has_quirk(vcpu->kvm,
> 
> Eh, drop the "unlikely()" annotations.  AIUI, they almost never provide actual
> performance benefits, and I dislike unnecessarily speculating on what userspace
> is doing when it comes to code (though I 100% agree that this is definitely unlikely).
It makes sense!

Thanks!
  
Sean Christopherson June 29, 2023, 8:42 p.m. UTC | #3
On Thu, Jun 29, 2023, Yan Zhao wrote:
> On Wed, Jun 28, 2023 at 03:57:09PM -0700, Sean Christopherson wrote:
> > On Fri, Jun 16, 2023, Yan Zhao wrote:
> > > Move code in vmx.c to get cache disabled memtype when non-coherent DMA
> > > present to x86 common code.
> > > 
> > > This is the preparation patch for later implementation of fine-grained gfn
> > > zap for CR0.CD toggles when guest MTRRs are honored.
> > > 
> > > No functional change intended.
> > > 
> > > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > > ---
> > >  arch/x86/kvm/mtrr.c    | 19 +++++++++++++++++++
> > >  arch/x86/kvm/vmx/vmx.c | 10 +++++-----
> > >  arch/x86/kvm/x86.h     |  1 +
> > >  3 files changed, 25 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
> > > index 3ce58734ad22..b35dd0bc9cad 100644
> > > --- a/arch/x86/kvm/mtrr.c
> > > +++ b/arch/x86/kvm/mtrr.c
> > > @@ -721,3 +721,22 @@ bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
> > >  
> > >  	return type == mtrr_default_type(mtrr_state);
> > >  }
> > > +
> > > +void kvm_mtrr_get_cd_memory_type(struct kvm_vcpu *vcpu, u8 *type, bool *ipat)
> > 
> > Hmm, I'm not convinced that this logic is subtle enough to warrant a common
> I added this patch because the memtype to use under CR0.CD=1 is determined by
> vmx specific code (i.e. vmx.c), while mtrr.c is a common code for x86.
> 
> I don't know if it's good to assume what vmx.c will return as in below open code. 
> (e.g. if someone added IPAT bit for CR0.CD=1 under the quirk, and forgot
> to update the code here, we actually need to zap everything rather than
> zap only non-WB ranges).
> 
> That's why I want to introduce a helper and let vmx.c call into it.
> (sorry, I didn't write a good commit message to explain the real intent).

No need to apologize, I fully understood the intent.  I'm just not convinced that
the risk of us screwing up this particular case is worth the extra layers of crud
that are necessary to let VMX and MTRRs share the core logic.

Absent emulating CR0.CD=1 with UC, setting IPAT is complete nonsense when KVM is
honoring the guest memtype.

I 100% agree that splitting the logic is less than ideal, but providing a common
helper feels forced and IMO yields significantly less readable code.  And exporting
kvm_mtrr_get_cd_memory_type() only adds to the confusion because calling it on
SVM, which can't fully ignore gPAT, is also nonsensical.
  
Yan Zhao June 30, 2023, 7:49 a.m. UTC | #4
On Thu, Jun 29, 2023 at 01:42:46PM -0700, Sean Christopherson wrote:
> On Thu, Jun 29, 2023, Yan Zhao wrote:
> > On Wed, Jun 28, 2023 at 03:57:09PM -0700, Sean Christopherson wrote:
> > > On Fri, Jun 16, 2023, Yan Zhao wrote:
...
> > > > +void kvm_mtrr_get_cd_memory_type(struct kvm_vcpu *vcpu, u8 *type, bool *ipat)
> > > 
> > > Hmm, I'm not convinced that this logic is subtle enough to warrant a common
> > I added this patch because the memtype to use under CR0.CD=1 is determined by
> > vmx specific code (i.e. vmx.c), while mtrr.c is a common code for x86.
> > 
> > I don't know if it's good to assume what vmx.c will return as in below open code. 
> > (e.g. if someone added IPAT bit for CR0.CD=1 under the quirk, and forgot
> > to update the code here, we actually need to zap everything rather than
> > zap only non-WB ranges).
> > 
> > That's why I want to introduce a helper and let vmx.c call into it.
> > (sorry, I didn't write a good commit message to explain the real intent).
> 
> No need to apologize, I fully understood the intent.  I'm just not convinced that
> the risk of us screwing up this particular case is worth the extra layers of crud
> that are necessary to let VMX and MTRRs share the core logic.
> 
> Absent emulating CR0.CD=1 with UC, setting IPAT is complete nonsense when KVM is
> honoring the guest memtype.
Yes, I'm just paranoid :)

> 
> I 100% agree that splitting the logic is less than ideal, but providing a common
> helper feels forced and IMO yields significantly less readable code.  And exporting
> kvm_mtrr_get_cd_memory_type() only adds to the confusion because calling it on
> SVM, which can't fully ignore gPAT, is also nonsensical.
Ok. I get your concern now. You are right.
Looks like the easiest way now is to add some comments in the VMM to caution
that changes to the memtype used when non-coherent DMA is present and CR0.CD=1
may require updating the GFN zap code.
Or, do you think it's worth adding a new callback in kvm_x86_ops, e.g.
static_call_cond(kvm_x86_get_cd_mt_honor_guest_mtrr)()?
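As a sketch of the optional-callback idea floated above (the op name is the hypothetical one from the email, and static_call_cond() is modeled here as a plain NULL-checked function pointer rather than the kernel's static-call machinery):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MTRR_TYPE_UNCACHABLE 0
#define MTRR_TYPE_WRBACK     6

/* Hypothetical ops table; only VMX would install the callback. */
struct x86_ops_stub {
	/* Returns the CR0.CD=1 memtype and sets *ipat; optional (may be NULL). */
	unsigned char (*get_cd_mt_honor_guest_mtrr)(bool quirk, bool *ipat);
};

/* VMX implementation, mirroring the logic in vmx_get_mt_mask(). */
static unsigned char vmx_get_cd_mt(bool quirk, bool *ipat)
{
	if (!quirk) {
		*ipat = true;
		return MTRR_TYPE_UNCACHABLE;
	}
	*ipat = false;
	return MTRR_TYPE_WRBACK;
}

/* Common code: invoke the op if present (static_call_cond() stand-in),
 * otherwise fall back to WB without forcing IPAT. */
static unsigned char get_cd_memtype(struct x86_ops_stub *ops, bool quirk,
				    bool *ipat)
{
	if (ops->get_cd_mt_honor_guest_mtrr)
		return ops->get_cd_mt_honor_guest_mtrr(quirk, ipat);
	*ipat = false;
	return MTRR_TYPE_WRBACK;
}
```

The design question in the thread is whether this indirection is worth it when only VMX would ever install the op; Sean's position is that it is not.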
  
Yan Zhao July 14, 2023, 7 a.m. UTC | #5
On Fri, Jun 30, 2023 at 03:49:10PM +0800, Yan Zhao wrote:
> On Thu, Jun 29, 2023 at 01:42:46PM -0700, Sean Christopherson wrote:
> > On Thu, Jun 29, 2023, Yan Zhao wrote:
> > > On Wed, Jun 28, 2023 at 03:57:09PM -0700, Sean Christopherson wrote:
> > > > On Fri, Jun 16, 2023, Yan Zhao wrote:
> ...
> > > > > +void kvm_mtrr_get_cd_memory_type(struct kvm_vcpu *vcpu, u8 *type, bool *ipat)
> > > > 
> > > > Hmm, I'm not convinced that this logic is subtle enough to warrant a common
> > > I added this patch because the memtype to use under CR0.CD=1 is determined by
> > > vmx specific code (i.e. vmx.c), while mtrr.c is a common code for x86.
> > > 
> > > I don't know if it's good to assume what vmx.c will return as in below open code. 
> > > (e.g. if someone added IPAT bit for CR0.CD=1 under the quirk, and forgot
> > > to update the code here, we actually need to zap everything rather than
> > > zap only non-WB ranges).
> > > 
> > > That's why I want to introduce a helper and let vmx.c call into it.
> > > (sorry, I didn't write a good commit message to explain the real intent).
> > 
> > No need to apologize, I fully understood the intent.  I'm just not convinced that
> > the risk of us screwing up this particular case is worth the extra layers of crud
> > that are necessary to let VMX and MTRRs share the core logic.
> > 
> > Absent emulating CR0.CD=1 with UC, setting IPAT is complete nonsense when KVM is
> > honoring the guest memtype.
> Yes, I'm just paranoid :)
> 
> > 
> > I 100% agree that splitting the logic is less than ideal, but providing a common
> > helper feels forced and IMO yields significantly less readable code.  And exporting
What about renaming it to kvm_honors_guest_mtrrs_get_cd_memtype()?
Then the name enforces that it's only called when guest MTRRs are honored, so
that if there are other x86 users (besides VMX/SVM) that want to honor guest
MTRRs, the same memtype is used with CR0.CD=1.
(I know there might never be such users, or you may prefer not to update the
code until they appear.)

I tried this approach in v4 here:
https://lore.kernel.org/all/20230714065356.20620-1-yan.y.zhao@intel.com/.
Feel free to ask me to drop it if you still don't like it :)

> > kvm_mtrr_get_cd_memory_type() only adds to the confusion because calling it on
> > SVM, which can't fully ignore gPAT, is also nonsensical.
> Ok. I get your concern now. You are right.
> Looks the easiest way now is to add some comments in VMM to caution that
> changes in memtype when noncoherent DMA present and CR0.CD=1 may lead to
> update of code for GFN zap.
> Or, do you think it's worth adding a new callback in kvm_x86_ops, e.g.
> static_call_cond(kvm_x86_get_cd_mt_honor_guest_mtrr)()?
  

Patch

diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index 3ce58734ad22..b35dd0bc9cad 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -721,3 +721,22 @@  bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
 
 	return type == mtrr_default_type(mtrr_state);
 }
+
+void kvm_mtrr_get_cd_memory_type(struct kvm_vcpu *vcpu, u8 *type, bool *ipat)
+{
+	/*
+	 * this routine is supposed to be called when guest mtrrs are honored
+	 */
+	if (unlikely(!kvm_mmu_honors_guest_mtrrs(vcpu->kvm))) {
+		*type = MTRR_TYPE_WRBACK;
+		*ipat = true;
+	} else if (unlikely(!kvm_check_has_quirk(vcpu->kvm,
+			    KVM_X86_QUIRK_CD_NW_CLEARED))) {
+		*type = MTRR_TYPE_UNCACHABLE;
+		*ipat = true;
+	} else {
+		*type = MTRR_TYPE_WRBACK;
+		*ipat = false;
+	}
+}
+EXPORT_SYMBOL_GPL(kvm_mtrr_get_cd_memory_type);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c1e93678cea4..6414c5a6e892 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7573,11 +7573,11 @@  static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
 
 	if (kvm_read_cr0_bits(vcpu, X86_CR0_CD)) {
-		if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
-			return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
-		else
-			return (MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT) |
-				VMX_EPT_IPAT_BIT;
+		bool ipat;
+		u8 cache;
+
+		kvm_mtrr_get_cd_memory_type(vcpu, &cache, &ipat);
+		return cache << VMX_EPT_MT_EPTE_SHIFT | (ipat ? VMX_EPT_IPAT_BIT : 0);
 	}
 
 	return kvm_mtrr_get_guest_memory_type(vcpu, gfn) << VMX_EPT_MT_EPTE_SHIFT;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 82e3dafc5453..9781b4b32d68 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -313,6 +313,7 @@  int kvm_mtrr_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data);
 int kvm_mtrr_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
 bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
 					  int page_num);
+void kvm_mtrr_get_cd_memory_type(struct kvm_vcpu *vcpu, u8 *type, bool *ipat);
 bool kvm_vector_hashing_enabled(void);
 void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_code);
 int x86_decode_emulated_instruction(struct kvm_vcpu *vcpu, int emulation_type,
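For readers unfamiliar with the EPT-entry encoding the vmx.c hunk manipulates: the memory type occupies bits 5:3 of an EPT leaf entry (hence VMX_EPT_MT_EPTE_SHIFT) and IPAT ("ignore PAT") is bit 6. A standalone sketch with those constant values copied in (not kernel code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in KVM's VMX headers. */
#define VMX_EPT_MT_EPTE_SHIFT 3
#define VMX_EPT_IPAT_BIT      (1ULL << 6)
#define MTRR_TYPE_UNCACHABLE  0
#define MTRR_TYPE_WRBACK      6

/* Mirrors the return expression in the patched vmx_get_mt_mask(). */
static uint64_t ept_memtype_bits(uint8_t cache, bool ipat)
{
	return ((uint64_t)cache << VMX_EPT_MT_EPTE_SHIFT) |
	       (ipat ? VMX_EPT_IPAT_BIT : 0);
}
```

With the quirk enabled the patch yields WB without IPAT (0x30), and without the quirk UC with IPAT (0x40), matching the two branches the hunk replaces.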