[v2,1/6] KVM: x86/mmu: add a new mmu zap helper to indicate memtype changes

Message ID 20230509135006.1604-1-yan.y.zhao@intel.com
State New
Series: KVM: x86/mmu: refine memtype related mmu zap

Commit Message

Yan Zhao May 9, 2023, 1:50 p.m. UTC
  Add a helper to indicate that a kvm_zap_gfn_range() request is being made
to update the memory type.

The zap can then be avoided in the following cases:
1. TDP is not enabled.
2. EPT is not enabled.

This is because only the memory type of EPT leaf entries is subject to
change when non-coherent DMA/guest CR0.CD/guest MTRR settings change.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 arch/x86/kvm/mmu.h     |  1 +
 arch/x86/kvm/mmu/mmu.c | 16 ++++++++++++++++
 2 files changed, 17 insertions(+)
  

Comments

Chao Gao May 10, 2023, 5:30 a.m. UTC | #1
On Tue, May 09, 2023 at 09:50:06PM +0800, Yan Zhao wrote:
>Add a helper to indicate that a kvm_zap_gfn_range() request is being made
>to update the memory type.
>
>The zap can then be avoided in the following cases:
>1. TDP is not enabled.
>2. EPT is not enabled.
>
>This is because only the memory type of EPT leaf entries is subject to
>change when non-coherent DMA/guest CR0.CD/guest MTRR settings change.
>
>Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
>---
> arch/x86/kvm/mmu.h     |  1 +
> arch/x86/kvm/mmu/mmu.c | 16 ++++++++++++++++
> 2 files changed, 17 insertions(+)
>
>diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
>index 92d5a1924fc1..a04577afbc71 100644
>--- a/arch/x86/kvm/mmu.h
>+++ b/arch/x86/kvm/mmu.h
>@@ -236,6 +236,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> }
> 
> void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
>+void kvm_zap_gfn_for_memtype(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
> 
> int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
> 
>diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>index c8961f45e3b1..2706754794d1 100644
>--- a/arch/x86/kvm/mmu/mmu.c
>+++ b/arch/x86/kvm/mmu/mmu.c
>@@ -6272,6 +6272,22 @@ static bool kvm_rmap_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_e
> 	return flush;
> }
> 
>+/*
>+ * Invalidate (zap) TDP SPTEs that cover GFNs from gfn_start and up to gfn_end
>+ * (not including it) for reason of memory type being updated.
>+ */
>+void kvm_zap_gfn_for_memtype(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
>+{
>+	/* Currently only memory type of EPT leaf entries are affected by
>+	 * guest CR0.CD and guest MTRR.
>+	 * So skip invalidation (zap) in other cases
>+	 */
>+	if (!shadow_memtype_mask)

Do you need to check kvm_arch_has_noncoherent_dma()? if yes, maybe just extract
the first if() statement and its associated comment from kvm_tdp_page_fault()?

>+		return;
>+
>+	kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));

This should be:

	kvm_zap_gfn_range(kvm, start, end);

>+}
>+
> /*
>  * Invalidate (zap) SPTEs that cover GFNs from gfn_start and up to gfn_end
>  * (not including it)
>-- 
>2.17.1
>
  
Yan Zhao May 10, 2023, 8:06 a.m. UTC | #2
On Wed, May 10, 2023 at 01:30:29PM +0800, Chao Gao wrote:
> On Tue, May 09, 2023 at 09:50:06PM +0800, Yan Zhao wrote:
> >Add a helper to indicate that a kvm_zap_gfn_range() request is being made
> >to update the memory type.
> >
> >The zap can then be avoided in the following cases:
> >1. TDP is not enabled.
> >2. EPT is not enabled.
> >
> >This is because only the memory type of EPT leaf entries is subject to
> >change when non-coherent DMA/guest CR0.CD/guest MTRR settings change.
> >
> >Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> >---
> > arch/x86/kvm/mmu.h     |  1 +
> > arch/x86/kvm/mmu/mmu.c | 16 ++++++++++++++++
> > 2 files changed, 17 insertions(+)
> >
> >diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> >index 92d5a1924fc1..a04577afbc71 100644
> >--- a/arch/x86/kvm/mmu.h
> >+++ b/arch/x86/kvm/mmu.h
> >@@ -236,6 +236,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> > }
> > 
> > void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
> >+void kvm_zap_gfn_for_memtype(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
> > 
> > int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
> > 
> >diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> >index c8961f45e3b1..2706754794d1 100644
> >--- a/arch/x86/kvm/mmu/mmu.c
> >+++ b/arch/x86/kvm/mmu/mmu.c
> >@@ -6272,6 +6272,22 @@ static bool kvm_rmap_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_e
> > 	return flush;
> > }
> > 
> >+/*
> >+ * Invalidate (zap) TDP SPTEs that cover GFNs from gfn_start and up to gfn_end
> >+ * (not including it) for reason of memory type being updated.
> >+ */
> >+void kvm_zap_gfn_for_memtype(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
> >+{
> >+	/* Currently only memory type of EPT leaf entries are affected by
> >+	 * guest CR0.CD and guest MTRR.
> >+	 * So skip invalidation (zap) in other cases
> >+	 */
> >+	if (!shadow_memtype_mask)
> 
> Do you need to check kvm_arch_has_noncoherent_dma()? if yes, maybe just extract
> the first if() statement and its associated comment from kvm_tdp_page_fault()?
>
This kvm_zap_gfn_for_memtype() helper will be called in
kvm_arch_unregister_noncoherent_dma() as well when noncoherent_dma_count
goes from 1 to 0.
So, checking kvm_arch_has_noncoherent_dma() is not desired.
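
A rough sketch of that call site (patch 4 isn't posted yet, so the exact
shape below is an assumption): by the time the helper runs on unregister,
the count has already dropped to 0, so a kvm_arch_has_noncoherent_dma()
check inside the helper would wrongly skip the zap.

void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm)
{
	/* the count is already 0 here, so kvm_arch_has_noncoherent_dma() is false */
	if (!atomic_dec_return(&kvm->arch.noncoherent_dma_count))
		kvm_zap_gfn_for_memtype(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
}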

> >+		return;
> >+
> >+	kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
> 
> This should be:
> 
> 	kvm_zap_gfn_range(kvm, start, end);
>
Oops. sorry about this mistake. Will fix it in next version.

> >+}
> >+
> > /*
> >  * Invalidate (zap) SPTEs that cover GFNs from gfn_start and up to gfn_end
> >  * (not including it)
> >-- 
> >2.17.1
> >
  
Sean Christopherson May 23, 2023, 10:51 p.m. UTC | #3
On Wed, May 10, 2023, Yan Zhao wrote:
> On Wed, May 10, 2023 at 01:30:29PM +0800, Chao Gao wrote:
> > On Tue, May 09, 2023 at 09:50:06PM +0800, Yan Zhao wrote:
> > >Add a helper to indicate that a kvm_zap_gfn_range() request is being made
> > >to update the memory type.
> > >
> > >The zap can then be avoided in the following cases:
> > >1. TDP is not enabled.
> > >2. EPT is not enabled.
> > >
> > >This is because only the memory type of EPT leaf entries is subject to
> > >change when non-coherent DMA/guest CR0.CD/guest MTRR settings change.
> > >
> > >Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > >---
> > > arch/x86/kvm/mmu.h     |  1 +
> > > arch/x86/kvm/mmu/mmu.c | 16 ++++++++++++++++
> > > 2 files changed, 17 insertions(+)
> > >
> > >diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> > >index 92d5a1924fc1..a04577afbc71 100644
> > >--- a/arch/x86/kvm/mmu.h
> > >+++ b/arch/x86/kvm/mmu.h
> > >@@ -236,6 +236,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> > > }
> > > 
> > > void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
> > >+void kvm_zap_gfn_for_memtype(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
> > > 
> > > int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
> > > 
> > >diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > >index c8961f45e3b1..2706754794d1 100644
> > >--- a/arch/x86/kvm/mmu/mmu.c
> > >+++ b/arch/x86/kvm/mmu/mmu.c
> > >@@ -6272,6 +6272,22 @@ static bool kvm_rmap_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_e
> > > 	return flush;
> > > }
> > > 
> > >+/*
> > >+ * Invalidate (zap) TDP SPTEs that cover GFNs from gfn_start and up to gfn_end
> > >+ * (not including it) for reason of memory type being updated.
> > >+ */
> > >+void kvm_zap_gfn_for_memtype(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
> > >+{
> > >+	/* Currently only memory type of EPT leaf entries are affected by
> > >+	 * guest CR0.CD and guest MTRR.
> > >+	 * So skip invalidation (zap) in other cases
> > >+	 */
> > >+	if (!shadow_memtype_mask)
> > 
> > Do you need to check kvm_arch_has_noncoherent_dma()? if yes, maybe just extract
> > the first if() statement and its associated comment from kvm_tdp_page_fault()?
> >
> This kvm_zap_gfn_for_memtype() helper will be called in
> kvm_arch_unregister_noncoherent_dma() as well when noncoherent_dma_count
> goes from 1 to 0.
> So, checking kvm_arch_has_noncoherent_dma() is not desired.
> 
> > >+		return;
> > >+
> > >+	kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
> > 
> > This should be:
> > 
> > 	kvm_zap_gfn_range(kvm, start, end);
> >
> Oops. sorry about this mistake. Will fix it in next version.

Rather than add a helper to zap "for memtype", add a helper to query if KVM honors
guest MTRRs.  The "for memtype" isn't intuitive and ends up being incomplete.
That will also allow for deduplicating other code (and a comment).  Waiting until
the zap also wastes cycles in the MTRR case, there's no need to compute start and
end if the zap will ultimately be skipped.

E.g. over two or three patches:

---
 arch/x86/kvm/mmu.h     |  1 +
 arch/x86/kvm/mmu/mmu.c | 19 ++++++++++++++-----
 arch/x86/kvm/mtrr.c    |  2 +-
 arch/x86/kvm/x86.c     |  2 +-
 4 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 92d5a1924fc1..c3c50ec9d9db 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -103,6 +103,7 @@ static inline u8 kvm_get_shadow_phys_bits(void)
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
 void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask);
 void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only);
+bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm);
 
 void kvm_init_mmu(struct kvm_vcpu *vcpu);
 void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c8961f45e3b1..a2b1723bc6a4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4512,12 +4512,10 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
 }
 #endif
 
-int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
+bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm)
 {
 	/*
-	 * If the guest's MTRRs may be used to compute the "real" memtype,
-	 * restrict the mapping level to ensure KVM uses a consistent memtype
-	 * across the entire mapping.  If the host MTRRs are ignored by TDP
+	 * If the TDP is enabled, the host MTRRs are ignored by TDP
 	 * (shadow_memtype_mask is non-zero), and the VM has non-coherent DMA
 	 * (DMA doesn't snoop CPU caches), KVM's ABI is to honor the memtype
 	 * from the guest's MTRRs so that guest accesses to memory that is
@@ -4526,7 +4524,18 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	 * Note, KVM may still ultimately ignore guest MTRRs for certain PFNs,
 	 * e.g. KVM will force UC memtype for host MMIO.
 	 */
-	if (shadow_memtype_mask && kvm_arch_has_noncoherent_dma(vcpu->kvm)) {
+	return tdp_enabled && shadow_memtype_mask &&
+	       kvm_arch_has_noncoherent_dma(kvm);
+}
+
+int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
+{
+	/*
+	 * If the guest's MTRRs may be used to compute the "real" memtype,
+	 * restrict the mapping level to ensure KVM uses a consistent memtype
+	 * across the entire mapping.
+	 */
+	if (kvm_mmu_honors_guest_mtrrs(vcpu->kvm)) {
 		for ( ; fault->max_level > PG_LEVEL_4K; --fault->max_level) {
 			int page_num = KVM_PAGES_PER_HPAGE(fault->max_level);
 			gfn_t base = gfn_round_for_level(fault->gfn,
diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index 3eb6e7f47e96..a67c28a56417 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -320,7 +320,7 @@ static void update_mtrr(struct kvm_vcpu *vcpu, u32 msr)
 	struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
 	gfn_t start, end;
 
-	if (!tdp_enabled || !kvm_arch_has_noncoherent_dma(vcpu->kvm))
+	if (!kvm_mmu_honors_guest_mtrrs(vcpu->kvm))
 		return;
 
 	if (!mtrr_is_enabled(mtrr_state) && msr != MSR_MTRRdefType)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 076d50f6321b..977dceb7ba7e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -942,7 +942,7 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned lon
 		kvm_mmu_reset_context(vcpu);
 
 	if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
-	    kvm_arch_has_noncoherent_dma(vcpu->kvm) &&
+	    kvm_mmu_honors_guest_mtrrs(vcpu->kvm) &&
 	    !kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
 		kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
 }

base-commit: 8d4a1b4011c125b1510254b724433aaae746ce14
--
  
Yan Zhao May 24, 2023, 2:22 a.m. UTC | #4
On Tue, May 23, 2023 at 03:51:49PM -0700, Sean Christopherson wrote:
> Rather than add a helper to zap "for memtype", add a helper to query if KVM honors
> guest MTRRs.  The "for memtype" isn't intuitive and ends up being incomplete.
> That will also allow for deduplicating other code (and a comment).  Waiting until
> the zap also wastes cycles in the MTRR case, there's no need to compute start and
> end if the zap will ultimately be skipped.
Agreed.

> 
> E.g. over two or three patches:
> 
> ---
>  arch/x86/kvm/mmu.h     |  1 +
>  arch/x86/kvm/mmu/mmu.c | 19 ++++++++++++++-----
>  arch/x86/kvm/mtrr.c    |  2 +-
>  arch/x86/kvm/x86.c     |  2 +-
>  4 files changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 92d5a1924fc1..c3c50ec9d9db 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -103,6 +103,7 @@ static inline u8 kvm_get_shadow_phys_bits(void)
>  void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
>  void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask);
>  void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only);
> +bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm);
>  
>  void kvm_init_mmu(struct kvm_vcpu *vcpu);
>  void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index c8961f45e3b1..a2b1723bc6a4 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4512,12 +4512,10 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
>  }
>  #endif
>  
> -int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> +bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm)
>  {
>  	/*
> -	 * If the guest's MTRRs may be used to compute the "real" memtype,
> -	 * restrict the mapping level to ensure KVM uses a consistent memtype
> -	 * across the entire mapping.  If the host MTRRs are ignored by TDP
> +	 * If the TDP is enabled, the host MTRRs are ignored by TDP
>  	 * (shadow_memtype_mask is non-zero), and the VM has non-coherent DMA
>  	 * (DMA doesn't snoop CPU caches), KVM's ABI is to honor the memtype
>  	 * from the guest's MTRRs so that guest accesses to memory that is
> @@ -4526,7 +4524,18 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  	 * Note, KVM may still ultimately ignore guest MTRRs for certain PFNs,
>  	 * e.g. KVM will force UC memtype for host MMIO.
>  	 */
> -	if (shadow_memtype_mask && kvm_arch_has_noncoherent_dma(vcpu->kvm)) {
> +	return tdp_enabled && shadow_memtype_mask &&
> +	       kvm_arch_has_noncoherent_dma(kvm);
> +}
> +
> +int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> +{
> +	/*
> +	 * If the guest's MTRRs may be used to compute the "real" memtype,
> +	 * restrict the mapping level to ensure KVM uses a consistent memtype
> +	 * across the entire mapping.
> +	 */
> +	if (kvm_mmu_honors_guest_mtrrs(vcpu->kvm)) {
>  		for ( ; fault->max_level > PG_LEVEL_4K; --fault->max_level) {
>  			int page_num = KVM_PAGES_PER_HPAGE(fault->max_level);
>  			gfn_t base = gfn_round_for_level(fault->gfn,
> diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
> index 3eb6e7f47e96..a67c28a56417 100644
> --- a/arch/x86/kvm/mtrr.c
> +++ b/arch/x86/kvm/mtrr.c
> @@ -320,7 +320,7 @@ static void update_mtrr(struct kvm_vcpu *vcpu, u32 msr)
>  	struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
>  	gfn_t start, end;
>  
> -	if (!tdp_enabled || !kvm_arch_has_noncoherent_dma(vcpu->kvm))
> +	if (!kvm_mmu_honors_guest_mtrrs(vcpu->kvm))
Could we also add another helper kvm_mmu_cap_honors_guest_mtrrs(), which
does not check kvm_arch_has_noncoherent_dma()?

+static inline bool kvm_mmu_cap_honors_guest_mtrrs(struct kvm *kvm)
+{
+       return !!shadow_memtype_mask;
+}

This is because in patch 4 I plan to do the EPT zap when
noncoherent_dma_count goes from 1 to 0.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 41d7bb51a297..ad0c43d7f532 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13146,13 +13146,19 @@ EXPORT_SYMBOL_GPL(kvm_arch_has_assigned_device);

 void kvm_arch_register_noncoherent_dma(struct kvm *kvm)
 {
-       atomic_inc(&kvm->arch.noncoherent_dma_count);
+       if (atomic_inc_return(&kvm->arch.noncoherent_dma_count) == 1) {
+               if (kvm_mmu_cap_honors_guest_mtrrs(kvm))
+                       kvm_zap_gfn_range(kvm, 0, ~0ULL);
+       }
 }
 EXPORT_SYMBOL_GPL(kvm_arch_register_noncoherent_dma);

 void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm)
 {
-       atomic_dec(&kvm->arch.noncoherent_dma_count);
+       if (!atomic_dec_return(&kvm->arch.noncoherent_dma_count)) {
+               if (kvm_mmu_cap_honors_guest_mtrrs(kvm))
+                       kvm_zap_gfn_range(kvm, 0, ~0ULL);
+       }
 }
 EXPORT_SYMBOL_GPL(kvm_arch_unregister_noncoherent_dma);


>  		return;
>  
>  	if (!mtrr_is_enabled(mtrr_state) && msr != MSR_MTRRdefType)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 076d50f6321b..977dceb7ba7e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -942,7 +942,7 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned lon
>  		kvm_mmu_reset_context(vcpu);
>  
>  	if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
> -	    kvm_arch_has_noncoherent_dma(vcpu->kvm) &&
> +	    kvm_mmu_honors_guest_mtrrs(vcpu->kvm) &&
>  	    !kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
>  		kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
>  }
> 
> base-commit: 8d4a1b4011c125b1510254b724433aaae746ce14
> -- 
>
  
Sean Christopherson May 24, 2023, 2:50 p.m. UTC | #5
On Wed, May 24, 2023, Yan Zhao wrote:
> On Tue, May 23, 2023 at 03:51:49PM -0700, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
> > index 3eb6e7f47e96..a67c28a56417 100644
> > --- a/arch/x86/kvm/mtrr.c
> > +++ b/arch/x86/kvm/mtrr.c
> > @@ -320,7 +320,7 @@ static void update_mtrr(struct kvm_vcpu *vcpu, u32 msr)
> >  	struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
> >  	gfn_t start, end;
> >  
> > -	if (!tdp_enabled || !kvm_arch_has_noncoherent_dma(vcpu->kvm))
> > +	if (!kvm_mmu_honors_guest_mtrrs(vcpu->kvm))
> Could we also add another helper kvm_mmu_cap_honors_guest_mtrrs(), which
> does not check kvm_arch_has_noncoherent_dma()?
> 
> +static inline bool kvm_mmu_cap_honors_guest_mtrrs(struct kvm *kvm)
> +{
> +       return !!shadow_memtype_mask;
> +}
> 
> This is because in patch 4 I plan to do the EPT zap when
> noncoherent_dma_count goes from 1 to 0.

Hrm, the 1->0 transition is annoying.  Rather than trying to capture the "everything
except non-coherent DMA" aspect, what about this?

mmu.c:

bool __kvm_mmu_honors_guest_mtrrs(struct kvm *kvm, bool vm_has_noncoherent_dma)
{
	/*
	 * If the TDP is enabled, the host MTRRs are ignored by TDP
	 * (shadow_memtype_mask is non-zero), and the VM has non-coherent DMA
	 * (DMA doesn't snoop CPU caches), KVM's ABI is to honor the memtype
	 * from the guest's MTRRs so that guest accesses to memory that is
	 * DMA'd aren't cached against the guest's wishes.
	 *
	 * Note, KVM may still ultimately ignore guest MTRRs for certain PFNs,
	 * e.g. KVM will force UC memtype for host MMIO.
	 */
	return vm_has_noncoherent_dma && tdp_enabled && shadow_memtype_mask;
}

mmu.h:

bool __kvm_mmu_honors_guest_mtrrs(struct kvm *kvm, bool vm_has_noncoherent_dma);

static inline bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm)
{
	
	return __kvm_mmu_honors_guest_mtrrs(kvm, kvm_arch_has_noncoherent_dma(kvm));
}

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 41d7bb51a297..ad0c43d7f532 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -13146,13 +13146,19 @@ EXPORT_SYMBOL_GPL(kvm_arch_has_assigned_device);
> 
>  void kvm_arch_register_noncoherent_dma(struct kvm *kvm)
>  {
> -       atomic_inc(&kvm->arch.noncoherent_dma_count);
> +       if (atomic_inc_return(&kvm->arch.noncoherent_dma_count) == 1) {
> +               if (kvm_mmu_cap_honors_guest_mtrrs(kvm))
> +                       kvm_zap_gfn_range(kvm, 0, ~0ULL);

No need for multiple if statements.  Though rather than have identical code in
both the start/end paths, how about this?  That provides a single location for a
comment.  Or maybe first/last instead of start/end?

static void kvm_noncoherent_dma_start_or_end(struct kvm *kvm)
{
	/* comment goes here. */
	if (__kvm_mmu_honors_guest_mtrrs(kvm, true))
		kvm_zap_gfn_range(kvm, 0, ~0ULL);
}

void kvm_arch_register_noncoherent_dma(struct kvm *kvm)
{
	if (atomic_inc_return(&kvm->arch.noncoherent_dma_count) == 1)
		kvm_noncoherent_dma_start_or_end(kvm);
}
EXPORT_SYMBOL_GPL(kvm_arch_register_noncoherent_dma);

void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm)
{
	if (!atomic_dec_return(&kvm->arch.noncoherent_dma_count))
		kvm_noncoherent_dma_start_or_end(kvm);
}
EXPORT_SYMBOL_GPL(kvm_arch_unregister_noncoherent_dma);
  
Yan Zhao May 25, 2023, 10:14 a.m. UTC | #6
On Wed, May 24, 2023 at 07:50:24AM -0700, Sean Christopherson wrote:
> On Wed, May 24, 2023, Yan Zhao wrote:
> > On Tue, May 23, 2023 at 03:51:49PM -0700, Sean Christopherson wrote:
> > > diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
> > > index 3eb6e7f47e96..a67c28a56417 100644
> > > --- a/arch/x86/kvm/mtrr.c
> > > +++ b/arch/x86/kvm/mtrr.c
> > > @@ -320,7 +320,7 @@ static void update_mtrr(struct kvm_vcpu *vcpu, u32 msr)
> > >  	struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
> > >  	gfn_t start, end;
> > >  
> > > -	if (!tdp_enabled || !kvm_arch_has_noncoherent_dma(vcpu->kvm))
> > > +	if (!kvm_mmu_honors_guest_mtrrs(vcpu->kvm))
> > Could we also add another helper kvm_mmu_cap_honors_guest_mtrrs(), which
> > does not check kvm_arch_has_noncoherent_dma()?
> > 
> > +static inline bool kvm_mmu_cap_honors_guest_mtrrs(struct kvm *kvm)
> > +{
> > +       return !!shadow_memtype_mask;
> > +}
> > 
> > This is because in patch 4 I plan to do the EPT zap when
> > noncoherent_dma_count goes from 1 to 0.
> 
> Hrm, the 1->0 transition is annoying.  Rather than trying to capture the "everything
> except non-coherent DMA" aspect, what about this?
> 
> mmu.c:
> 
> bool __kvm_mmu_honors_guest_mtrrs(struct kvm *kvm, bool vm_has_noncoherent_dma)
> {
> 	/*
> 	 * If the TDP is enabled, the host MTRRs are ignored by TDP
> 	 * (shadow_memtype_mask is non-zero), and the VM has non-coherent DMA
> 	 * (DMA doesn't snoop CPU caches), KVM's ABI is to honor the memtype
> 	 * from the guest's MTRRs so that guest accesses to memory that is
> 	 * DMA'd aren't cached against the guest's wishes.
> 	 *
> 	 * Note, KVM may still ultimately ignore guest MTRRs for certain PFNs,
> 	 * e.g. KVM will force UC memtype for host MMIO.
> 	 */
> 	return vm_has_noncoherent_dma && tdp_enabled && shadow_memtype_mask;
> }
> 
> mmu.h:
> 
> bool __kvm_mmu_honors_guest_mtrrs(struct kvm *kvm, bool vm_has_noncoherent_dma);
> 
> static inline bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm)
> {
> 	
> 	return __kvm_mmu_honors_guest_mtrrs(kvm, kvm_arch_has_noncoherent_dma(kvm));
> }

This should work and it centralizes the comments into one place, though I dislike
having to pass true as vm_has_noncoherent_dma in case of 1->0 transition. :)

> 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 41d7bb51a297..ad0c43d7f532 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -13146,13 +13146,19 @@ EXPORT_SYMBOL_GPL(kvm_arch_has_assigned_device);
> > 
> >  void kvm_arch_register_noncoherent_dma(struct kvm *kvm)
> >  {
> > -       atomic_inc(&kvm->arch.noncoherent_dma_count);
> > +       if (atomic_inc_return(&kvm->arch.noncoherent_dma_count) == 1) {
> > +               if (kvm_mmu_cap_honors_guest_mtrrs(kvm))
> > +                       kvm_zap_gfn_range(kvm, 0, ~0ULL);
> 
> No need for multiple if statements.  Though rather than have identical code in
> both the start/end paths, how about this?  That provides a single location for a
> comment.  Or maybe first/last instead of start/end?
> 
> static void kvm_noncoherent_dma_start_or_end(struct kvm *kvm)
What does start_or_end or first_or_last stand for? 

> {
> 	/* comment goes here. */
> 	if (__kvm_mmu_honors_guest_mtrrs(kvm, true))
> 		kvm_zap_gfn_range(kvm, 0, ~0ULL);
> }
> 
> void kvm_arch_register_noncoherent_dma(struct kvm *kvm)
> {
> 	if (atomic_inc_return(&kvm->arch.noncoherent_dma_count) == 1)
> 		kvm_noncoherent_dma_start_or_end(kvm);
> }
> EXPORT_SYMBOL_GPL(kvm_arch_register_noncoherent_dma);
> 
> void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm)
> {
> 	if (!atomic_dec_return(&kvm->arch.noncoherent_dma_count))
> 		kvm_noncoherent_dma_start_or_end(kvm);
> }
> EXPORT_SYMBOL_GPL(kvm_arch_unregister_noncoherent_dma);
>
  
Sean Christopherson May 25, 2023, 3:54 p.m. UTC | #7
On Thu, May 25, 2023, Yan Zhao wrote:
> On Wed, May 24, 2023 at 07:50:24AM -0700, Sean Christopherson wrote:
> > bool __kvm_mmu_honors_guest_mtrrs(struct kvm *kvm, bool vm_has_noncoherent_dma);
> > 
> > static inline bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm)
> > {
> > 	
> > 	return __kvm_mmu_honors_guest_mtrrs(kvm, kvm_arch_has_noncoherent_dma(kvm));
> > }
> 
> This should work and it centralizes the comments into one place, though I dislike
> having to pass true as vm_has_noncoherent_dma in case of 1->0 transition. :)

Yeah, I don't love it either, but the whole 1=>0 transition is awkward.  FWIW,
KVM doesn't strictly need to zap in that case since the guest isn't relying on
WB for functionality, i.e. we could just skip it.

> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 41d7bb51a297..ad0c43d7f532 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -13146,13 +13146,19 @@ EXPORT_SYMBOL_GPL(kvm_arch_has_assigned_device);
> > > 
> > >  void kvm_arch_register_noncoherent_dma(struct kvm *kvm)
> > >  {
> > > -       atomic_inc(&kvm->arch.noncoherent_dma_count);
> > > +       if (atomic_inc_return(&kvm->arch.noncoherent_dma_count) == 1) {
> > > +               if (kvm_mmu_cap_honors_guest_mtrrs(kvm))
> > > +                       kvm_zap_gfn_range(kvm, 0, ~0ULL);
> > 
> > No need for multiple if statements.  Though rather than have identical code in
> > both the start/end paths, how about this?  That provides a single location for a
> > comment.  Or maybe first/last instead of start/end?
> > 
> > static void kvm_noncoherent_dma_start_or_end(struct kvm *kvm)
> What does start_or_end or first_or_last stand for? 

Start/End of device (un)assignment, or First/Last device (un)assigned.  Definitely
feel free to pick a better name.
  
Yan Zhao May 30, 2023, 1:32 a.m. UTC | #8
On Thu, May 25, 2023 at 08:54:15AM -0700, Sean Christopherson wrote:
> On Thu, May 25, 2023, Yan Zhao wrote:
> > On Wed, May 24, 2023 at 07:50:24AM -0700, Sean Christopherson wrote:
> > > bool __kvm_mmu_honors_guest_mtrrs(struct kvm *kvm, bool vm_has_noncoherent_dma);
> > > 
> > > static inline bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm)
> > > {
> > > 	
> > > 	return __kvm_mmu_honors_guest_mtrrs(kvm, kvm_arch_has_noncoherent_dma(kvm));
> > > }
> > 
> > This should work and it centralizes the comments into one place, though I dislike
> > having to pass true as vm_has_noncoherent_dma in case of 1->0 transition. :)
> 
> Yeah, I don't love it either, but the whole 1=>0 transition is awkward.  FWIW,
> KVM doesn't strictly need to zap in that case since the guest isn't relying on
> WB for functionality, i.e. we could just skip it.
I think the zap on the 1=>0 transition is still useful.
E.g. if the non-coherent DMA device is unassigned
(1) when CR0.CD=1 (and KVM_X86_QUIRK_CD_NW_CLEARED is not enabled), or
(2) when CR0.CD=0 and MTRRs are disabled,
it's better to zap the UC ranges for better performance.
  
Yan Zhao May 30, 2023, 9:48 a.m. UTC | #9
On Thu, May 25, 2023 at 08:54:15AM -0700, Sean Christopherson wrote:
> > > static void kvm_noncoherent_dma_start_or_end(struct kvm *kvm)
> > What does start_or_end or first_or_last stand for? 
> 
> Start/End of device (un)assignment, or First/Last device (un)assigned.  Definitely
> feel free to pick a better name.
Hi Sean,
start_or_end gave me the first impression that it stands for gfn_start/gfn_end,
so I changed it to start/stop to avoid confusion.

+static void kvm_noncoherent_dma_assignment_start_or_stop(struct kvm *kvm)
+{
+       /*
+        * Zapping the KVM TDP is only necessary if guest MTRRs are honored.
+        * Whether a VM has non-coherent DMA affects whether KVM honors
+        * guest MTRRs (and thus the resulting memory type in the TDP), so
+        * pass true here to check whether guest MTRRs are honored after
+        * the assignment starts, or were honored before the assignment
+        * stops.
+        */
+       if (kvm_mmu_honors_guest_mtrrs(kvm, true))
+               kvm_zap_gfn_range(kvm, 0, ~0ULL);
+}

And I combined the __kvm_mmu_honors_guest_mtrrs() into
kvm_mmu_honors_guest_mtrrs(). Not sure if you like it :)

+/*
+ * Returns if KVM honors guest MTRRs
+ * @override_vm_has_noncoherent_dma: Allow caller to override non-coherent DMA
+ *                                   status returned from
+ *                                   kvm_arch_has_noncoherent_dma()
+ */
+bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm,
+                               bool override_vm_has_noncoherent_dma)
+{
+       bool noncoherent_dma = override_vm_has_noncoherent_dma ? true :
+                              kvm_arch_has_noncoherent_dma(kvm);
+
+       /*
+        * If the TDP is enabled, the host MTRRs are ignored by TDP
+        * (shadow_memtype_mask is non-zero), and the VM has non-coherent DMA
+        * (DMA doesn't snoop CPU caches), KVM's ABI is to honor the memtype
+        * from the guest's MTRRs so that guest accesses to memory that is
+        * DMA'd aren't cached against the guest's wishes.
+        *
+        * Note, KVM may still ultimately ignore guest MTRRs for certain PFNs,
+        * e.g. KVM will force UC memtype for host MMIO.
+        */
+       return noncoherent_dma && tdp_enabled && shadow_memtype_mask;
+}
+

Thanks
Yan
  
Sean Christopherson May 30, 2023, 11:51 p.m. UTC | #10
On Tue, May 30, 2023, Yan Zhao wrote:
> On Thu, May 25, 2023 at 08:54:15AM -0700, Sean Christopherson wrote:
> And I combined the __kvm_mmu_honors_guest_mtrrs() into
> kvm_mmu_honors_guest_mtrrs(). Not sure if you like it :)

I would prefer to provide a separate inner helper, mainly so that the common
case callers don't need to pass %false.  I don't love passing bools, but it's
tolerable for a one-off use of an inner helper.

> +/*
> + * Returns if KVM honors guest MTRRs
> + * @override_vm_has_noncoherent_dma: Allow caller to override non-coherent DMA
> + *                                   status returned from
> + *                                   kvm_arch_has_noncoherent_dma()
> + */
> +bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm,
> +                               bool override_vm_has_noncoherent_dma)
> +{
> +       bool noncoherent_dma = override_vm_has_noncoherent_dma ? true :
> +                              kvm_arch_has_noncoherent_dma(kvm);

The "override" name is confusing, e.g. it won't be clear when it's safe/correct
for a new caller to override kvm_arch_has_noncoherent_dma().  If we go with a
single helper, I could live with:

bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm, bool stopping_noncoherent_dma)
{
	bool noncoherent_dma = stopping_noncoherent_dma ||
			       kvm_arch_has_noncoherent_dma(kvm);

	...
}

but that makes it awkward to use common code for start+stop assignment, and as
above there are three "normal" callers that would have to pass magic %false
values regardless of the name.
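
For comparison, a quick sketch of how the call sites read with the inner
helper (purely illustrative, reusing the names from my earlier message):

	/* common-case callers, e.g. update_mtrr(), stay clean: */
	if (!kvm_mmu_honors_guest_mtrrs(vcpu->kvm))
		return;

	/* only the device (un)assignment path passes the override: */
	if (__kvm_mmu_honors_guest_mtrrs(kvm, true))
		kvm_zap_gfn_range(kvm, 0, ~0ULL);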
  
Yan Zhao May 31, 2023, 12:18 a.m. UTC | #11
On Tue, May 30, 2023 at 04:51:25PM -0700, Sean Christopherson wrote:
> On Tue, May 30, 2023, Yan Zhao wrote:
> > On Thu, May 25, 2023 at 08:54:15AM -0700, Sean Christopherson wrote:
> > And I combined the __kvm_mmu_honors_guest_mtrrs() into
> > kvm_mmu_honors_guest_mtrrs(). Not sure if you like it :)
> 
> I would prefer to provide a separate inner helper, mainly so that the common
> case callers don't need to pass %false.  I don't love passing bools, but it's
> tolerable for a one-off use of an inner helper.

Ok. Will follow this way.
It's reasonable :)

> > +/*
> > + * Returns if KVM honors guest MTRRs
> > + * @override_vm_has_noncoherent_dma: Allow caller to override non-coherent DMA
> > + *                                   status returned from
> > + *                                   kvm_arch_has_noncoherent_dma()
> > + */
> > +bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm,
> > +                               bool override_vm_has_noncoherent_dma)
> > +{
> > +       bool noncoherent_dma = override_vm_has_noncoherent_dma ? true :
> > +                              kvm_arch_has_noncoherent_dma(kvm);
> 
> The "override" name is confusing, e.g. it won't be clear when it's safe/correct
> for a new caller to override kvm_arch_has_noncoherent_dma().  If we go with a
> single helper, I could live with:
> 
> bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm, bool stopping_noncoherent_dma)
> {
> 	bool noncoherent_dma = stopping_noncoherent_dma ||
> 			       kvm_arch_has_noncoherent_dma(kvm);
> 
> 	...
> }
> 
> but that makes it awkward to use common code for start+stop assignment, and as
> above there are three "normal" callers that would have to pass magic %false
> values regardless of the name.
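
To summarize, a sketch of the shape I plan to use in the next version
(names are still tentative, per the discussion above):

/* mmu.c: the inner helper takes the non-coherent DMA state explicitly */
bool __kvm_mmu_honors_guest_mtrrs(struct kvm *kvm, bool vm_has_noncoherent_dma)
{
	/* see the comment block proposed earlier for the full rationale */
	return vm_has_noncoherent_dma && tdp_enabled && shadow_memtype_mask;
}

/* mmu.h: common callers use the wrapper and never pass a magic bool */
static inline bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm)
{
	return __kvm_mmu_honors_guest_mtrrs(kvm, kvm_arch_has_noncoherent_dma(kvm));
}

/* x86.c: the device (un)assignment paths override via the inner helper */
static void kvm_noncoherent_dma_assignment_start_or_stop(struct kvm *kvm)
{
	if (__kvm_mmu_honors_guest_mtrrs(kvm, true))
		kvm_zap_gfn_range(kvm, 0, ~0ULL);
}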
  

Patch

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 92d5a1924fc1..a04577afbc71 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -236,6 +236,7 @@  static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 }
 
 void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
+void kvm_zap_gfn_for_memtype(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
 
 int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c8961f45e3b1..2706754794d1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6272,6 +6272,22 @@  static bool kvm_rmap_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_e
 	return flush;
 }
 
+/*
+ * Invalidate (zap) TDP SPTEs that cover GFNs from gfn_start and up to gfn_end
+ * (not including it) for reason of memory type being updated.
+ */
+void kvm_zap_gfn_for_memtype(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
+{
+	/* Currently only memory type of EPT leaf entries are affected by
+	 * guest CR0.CD and guest MTRR.
+	 * So skip invalidation (zap) in other cases
+	 */
+	if (!shadow_memtype_mask)
+		return;
+
+	kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
+}
+
 /*
  * Invalidate (zap) SPTEs that cover GFNs from gfn_start and up to gfn_end
  * (not including it)