[v3,07/11] KVM: VMX: drop IPAT in memtype when CD=1 for KVM_X86_QUIRK_CD_NW_CLEARED

Message ID 20230616023815.7439-1-yan.y.zhao@intel.com
State New
Headers
Series KVM: x86/mmu: refine memtype related mmu zap |

Commit Message

Yan Zhao June 16, 2023, 2:38 a.m. UTC
  For KVM_X86_QUIRK_CD_NW_CLEARED, remove the ignore PAT bit in EPT memory
types when cache is disabled and non-coherent DMA are present.

With the quirk KVM_X86_QUIRK_CD_NW_CLEARED, WB + IPAT are returned as the
EPT memory type when guest cache is disabled before this patch.
Removing the IPAT bit in this patch will allow effective memory type to
honor PAT values as well, which will make the effective memory type
stronger than WB as WB is the weakest memtype. However, this change is
acceptable as it doesn't make the memory type weaker, consider without
this quirk, the original memtype for cache disabled is UC + IPAT.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)
  

Comments

Yan Zhao June 20, 2023, 2:34 a.m. UTC | #1
On Tue, Jun 20, 2023 at 10:42:57AM +0800, Chao Gao wrote:
> On Fri, Jun 16, 2023 at 10:38:15AM +0800, Yan Zhao wrote:
> >For KVM_X86_QUIRK_CD_NW_CLEARED, remove the ignore PAT bit in EPT memory
> >types when cache is disabled and non-coherent DMA are present.
> >
> >With the quirk KVM_X86_QUIRK_CD_NW_CLEARED, WB + IPAT are returned as the
> >EPT memory type when guest cache is disabled before this patch.
> >Removing the IPAT bit in this patch will allow effective memory type to
> >honor PAT values as well, which will make the effective memory type
> 
> Given guest sets CR0.CD, what's the point of honoring (guest) PAT? e.g.,
> which guests can benefit from this change?
This patch is actually a preparation for later patch 10 to implement
fine-grained zap.
If when CR0.CD=1 the EPT type is WB + IPAT, and
when CR0.CD=0 + mtrr enabled, EPT type is WB or UC or ..., which are
without IPAT, then we have to always zap all EPT entries.

Given removing the IPAT bit when CR0.CD=1 only makes the quirk
KVM_X86_QUIRK_CD_NW_CLEARED more strict (meaning it could be WC/UC... if
the guest PAT overwrites it), it's still acceptable.

On the other hand, from the guest's point of view, it sees the GPA is UC
with CR0.CD=1. So, if we want to align host EPT memtype with guest's
view, we have to drop the quirk KVM_X86_QUIRK_CD_NW_CLEARED, which,
however, will impact the guest bootup performance dramatically and is
not adopted in any VM.

So, we remove the IPAT bit when CR0.CD=1 with the quirk
KVM_X86_QUIRK_CD_NW_CLEARED to make it stricter and to enable later
optimization.

> 
> >stronger than WB as WB is the weakest memtype. However, this change is
> >acceptable as it doesn't make the memory type weaker,
> 
> >consider without
> >this quirk, the original memtype for cache disabled is UC + IPAT.
> 
> This change impacts only the case where the quirk is enabled. Maybe you
> mean:
> 
> _with_ the quirk, the original memtype for cached disabled is _WB_ + IPAT.
> 
Uh, I mean originally without the quirk, UC + IPAT should be returned,
which is the correct value to return.

I can explain the full background more clearly in the next version.

Thanks

> 
> >
> >Suggested-by: Sean Christopherson <seanjc@google.com>
> >Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> >---
> > arch/x86/kvm/vmx/vmx.c | 9 +++------
> > 1 file changed, 3 insertions(+), 6 deletions(-)
> >
> >diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> >index 0ecf4be2c6af..c1e93678cea4 100644
> >--- a/arch/x86/kvm/vmx/vmx.c
> >+++ b/arch/x86/kvm/vmx/vmx.c
> >@@ -7548,8 +7548,6 @@ static int vmx_vm_init(struct kvm *kvm)
> > 
> > static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
> > {
> >-	u8 cache;
> >-
> > 	/* We wanted to honor guest CD/MTRR/PAT, but doing so could result in
> > 	 * memory aliases with conflicting memory types and sometimes MCEs.
> > 	 * We have to be careful as to what are honored and when.
> >@@ -7576,11 +7574,10 @@ static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
> > 
> > 	if (kvm_read_cr0_bits(vcpu, X86_CR0_CD)) {
> > 		if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
> >-			cache = MTRR_TYPE_WRBACK;
> >+			return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
> 
> Shouldn't KVM honor guest MTRRs as well? i.e., as if CR0.CD isn't set.
> 
> > 		else
> >-			cache = MTRR_TYPE_UNCACHABLE;
> >-
> >-		return (cache << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
> >+			return (MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT) |
> >+				VMX_EPT_IPAT_BIT;
> > 	}
> > 
> > 	return kvm_mtrr_get_guest_memory_type(vcpu, gfn) << VMX_EPT_MT_EPTE_SHIFT;
> >-- 
> >2.17.1
> >
  
Chao Gao June 20, 2023, 2:42 a.m. UTC | #2
On Fri, Jun 16, 2023 at 10:38:15AM +0800, Yan Zhao wrote:
>For KVM_X86_QUIRK_CD_NW_CLEARED, remove the ignore PAT bit in EPT memory
>types when cache is disabled and non-coherent DMA are present.
>
>With the quirk KVM_X86_QUIRK_CD_NW_CLEARED, WB + IPAT are returned as the
>EPT memory type when guest cache is disabled before this patch.
>Removing the IPAT bit in this patch will allow effective memory type to
>honor PAT values as well, which will make the effective memory type

Given guest sets CR0.CD, what's the point of honoring (guest) PAT? e.g.,
which guests can benefit from this change?

>stronger than WB as WB is the weakest memtype. However, this change is
>acceptable as it doesn't make the memory type weaker,

>consider without
>this quirk, the original memtype for cache disabled is UC + IPAT.

This change impacts only the case where the quirk is enabled. Maybe you
mean:

_with_ the quirk, the original memtype for cached disabled is _WB_ + IPAT.


>
>Suggested-by: Sean Christopherson <seanjc@google.com>
>Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
>---
> arch/x86/kvm/vmx/vmx.c | 9 +++------
> 1 file changed, 3 insertions(+), 6 deletions(-)
>
>diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>index 0ecf4be2c6af..c1e93678cea4 100644
>--- a/arch/x86/kvm/vmx/vmx.c
>+++ b/arch/x86/kvm/vmx/vmx.c
>@@ -7548,8 +7548,6 @@ static int vmx_vm_init(struct kvm *kvm)
> 
> static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
> {
>-	u8 cache;
>-
> 	/* We wanted to honor guest CD/MTRR/PAT, but doing so could result in
> 	 * memory aliases with conflicting memory types and sometimes MCEs.
> 	 * We have to be careful as to what are honored and when.
>@@ -7576,11 +7574,10 @@ static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
> 
> 	if (kvm_read_cr0_bits(vcpu, X86_CR0_CD)) {
> 		if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
>-			cache = MTRR_TYPE_WRBACK;
>+			return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;

Shouldn't KVM honor guest MTRRs as well? i.e., as if CR0.CD isn't set.

> 		else
>-			cache = MTRR_TYPE_UNCACHABLE;
>-
>-		return (cache << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
>+			return (MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT) |
>+				VMX_EPT_IPAT_BIT;
> 	}
> 
> 	return kvm_mtrr_get_guest_memory_type(vcpu, gfn) << VMX_EPT_MT_EPTE_SHIFT;
>-- 
>2.17.1
>
  
Yan Zhao June 20, 2023, 3:17 a.m. UTC | #3
On Tue, Jun 20, 2023 at 10:42:57AM +0800, Chao Gao wrote:
> >
> >diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> >index 0ecf4be2c6af..c1e93678cea4 100644
> >--- a/arch/x86/kvm/vmx/vmx.c
> >+++ b/arch/x86/kvm/vmx/vmx.c
> >@@ -7548,8 +7548,6 @@ static int vmx_vm_init(struct kvm *kvm)
> > 
> > static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
> > {
> >-	u8 cache;
> >-
> > 	/* We wanted to honor guest CD/MTRR/PAT, but doing so could result in
> > 	 * memory aliases with conflicting memory types and sometimes MCEs.
> > 	 * We have to be careful as to what are honored and when.
> >@@ -7576,11 +7574,10 @@ static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
> > 
> > 	if (kvm_read_cr0_bits(vcpu, X86_CR0_CD)) {
> > 		if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
> >-			cache = MTRR_TYPE_WRBACK;
> >+			return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
> 
> Shouldn't KVM honor guest MTRRs as well? i.e., as if CR0.CD isn't set.

I proposed to return guest mtrr type when CR0.CD=1.
https://lore.kernel.org/all/ZHDZpHYQhPtkNnQe@google.com/

Sean rejected it because before guest MTRR is first time enabled, we
have to return WB with the quirk, otherwise the VM bootup will be super
slow.
Or maybe we can return WB when guest MTRR is not enabled and guest MTRR
type when guest MTRR is anabled to woraround it. e.g.

if (kvm_read_cr0_bits(vcpu, X86_CR0_CD)) {
	if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
	{
		cache = !mtrr_is_enabled(mtrr_state) ? MTRR_TYPE_WRBACK :
		kvm_mtrr_get_guest_memory_type(vcpu, gfn);
		return cache << VMX_EPT_MT_EPTE_SHIFT;
	}
	...
}

But Sean does not like piling too much on top of the quirk and I think it is right
to keep the quirk simple.

"In short, I am steadfastly against any change that piles more arbitrary behavior
functional behavior on top of the quirk, especially when that behavior relies on
heuristics to try and guess what the guest is doing."

> 
> > 		else
> >-			cache = MTRR_TYPE_UNCACHABLE;
> >-
> >-		return (cache << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
> >+			return (MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT) |
> >+				VMX_EPT_IPAT_BIT;
> > 	}
> > 
> > 	return kvm_mtrr_get_guest_memory_type(vcpu, gfn) << VMX_EPT_MT_EPTE_SHIFT;
> >-- 
> >2.17.1
> >
  
Yan Zhao June 20, 2023, 3:19 a.m. UTC | #4
On Tue, Jun 20, 2023 at 11:34:43AM +0800, Chao Gao wrote:
> On Tue, Jun 20, 2023 at 10:34:29AM +0800, Yan Zhao wrote:
> >On Tue, Jun 20, 2023 at 10:42:57AM +0800, Chao Gao wrote:
> >> On Fri, Jun 16, 2023 at 10:38:15AM +0800, Yan Zhao wrote:
> >> >For KVM_X86_QUIRK_CD_NW_CLEARED, remove the ignore PAT bit in EPT memory
> >> >types when cache is disabled and non-coherent DMA are present.
> >> >
> >> >With the quirk KVM_X86_QUIRK_CD_NW_CLEARED, WB + IPAT are returned as the
> >> >EPT memory type when guest cache is disabled before this patch.
> >> >Removing the IPAT bit in this patch will allow effective memory type to
> >> >honor PAT values as well, which will make the effective memory type
> >> 
> >> Given guest sets CR0.CD, what's the point of honoring (guest) PAT? e.g.,
> >> which guests can benefit from this change?
> >This patch is actually a preparation for later patch 10 to implement
> >fine-grained zap.
> >If when CR0.CD=1 the EPT type is WB + IPAT, and
> >when CR0.CD=0 + mtrr enabled, EPT type is WB or UC or ..., which are
> >without IPAT, then we have to always zap all EPT entries.
> 
> OK. The goal is to reduce the cost of toggling CR0.CD. The key is if KVM sets
> the IPAT, then when CR0.CD is cleared by guest, KVM has to zap _all_ EPT entries
> at least to clear IPAT.
> 
> Can kvm honor guest MTRRs as well when CR0.CD=1 && with the quirk? then later
> clearing CR0.CD needn't zap _any_ EPT entry. But the optimization is exactly the
> one removed in patch 6. Maybe I miss something important.
I replied it in another mail.
  
Chao Gao June 20, 2023, 3:34 a.m. UTC | #5
On Tue, Jun 20, 2023 at 10:34:29AM +0800, Yan Zhao wrote:
>On Tue, Jun 20, 2023 at 10:42:57AM +0800, Chao Gao wrote:
>> On Fri, Jun 16, 2023 at 10:38:15AM +0800, Yan Zhao wrote:
>> >For KVM_X86_QUIRK_CD_NW_CLEARED, remove the ignore PAT bit in EPT memory
>> >types when cache is disabled and non-coherent DMA are present.
>> >
>> >With the quirk KVM_X86_QUIRK_CD_NW_CLEARED, WB + IPAT are returned as the
>> >EPT memory type when guest cache is disabled before this patch.
>> >Removing the IPAT bit in this patch will allow effective memory type to
>> >honor PAT values as well, which will make the effective memory type
>> 
>> Given guest sets CR0.CD, what's the point of honoring (guest) PAT? e.g.,
>> which guests can benefit from this change?
>This patch is actually a preparation for later patch 10 to implement
>fine-grained zap.
>If when CR0.CD=1 the EPT type is WB + IPAT, and
>when CR0.CD=0 + mtrr enabled, EPT type is WB or UC or ..., which are
>without IPAT, then we have to always zap all EPT entries.

OK. The goal is to reduce the cost of toggling CR0.CD. The key is if KVM sets
the IPAT, then when CR0.CD is cleared by guest, KVM has to zap _all_ EPT entries
at least to clear IPAT.

Can kvm honor guest MTRRs as well when CR0.CD=1 && with the quirk? then later
clearing CR0.CD needn't zap _any_ EPT entry. But the optimization is exactly the
one removed in patch 6. Maybe I miss something important.
  
Xiaoyao Li June 25, 2023, 7:14 a.m. UTC | #6
On 6/20/2023 10:34 AM, Yan Zhao wrote:
> On Tue, Jun 20, 2023 at 10:42:57AM +0800, Chao Gao wrote:
>> On Fri, Jun 16, 2023 at 10:38:15AM +0800, Yan Zhao wrote:
>>> For KVM_X86_QUIRK_CD_NW_CLEARED, remove the ignore PAT bit in EPT memory
>>> types when cache is disabled and non-coherent DMA are present.
>>>
>>> With the quirk KVM_X86_QUIRK_CD_NW_CLEARED, WB + IPAT are returned as the
>>> EPT memory type when guest cache is disabled before this patch.
>>> Removing the IPAT bit in this patch will allow effective memory type to
>>> honor PAT values as well, which will make the effective memory type
>> Given guest sets CR0.CD, what's the point of honoring (guest) PAT? e.g.,
>> which guests can benefit from this change?
> This patch is actually a preparation for later patch 10 to implement
> fine-grained zap.
> If when CR0.CD=1 the EPT type is WB + IPAT, and
> when CR0.CD=0 + mtrr enabled, EPT type is WB or UC or ..., which are
> without IPAT, then we have to always zap all EPT entries.
> 
> Given removing the IPAT bit when CR0.CD=1 only makes the quirk
> KVM_X86_QUIRK_CD_NW_CLEARED more strict (meaning it could be WC/UC... if
> the guest PAT overwrites it), it's still acceptable.

Per my understanding, the reason why KVM had KVM_X86_QUIRK_CD_NW_CLEARED 
is to ensure the memory type is WB to achieve better boot performance 
for old OVMF.

you need to justify the original purpose is not broken by this patch.
  
Yan Zhao June 26, 2023, 12:08 a.m. UTC | #7
On Sun, Jun 25, 2023 at 03:14:37PM +0800, Xiaoyao Li wrote:
> On 6/20/2023 10:34 AM, Yan Zhao wrote:
> > On Tue, Jun 20, 2023 at 10:42:57AM +0800, Chao Gao wrote:
> > > On Fri, Jun 16, 2023 at 10:38:15AM +0800, Yan Zhao wrote:
> > > > For KVM_X86_QUIRK_CD_NW_CLEARED, remove the ignore PAT bit in EPT memory
> > > > types when cache is disabled and non-coherent DMA are present.
> > > > 
> > > > With the quirk KVM_X86_QUIRK_CD_NW_CLEARED, WB + IPAT are returned as the
> > > > EPT memory type when guest cache is disabled before this patch.
> > > > Removing the IPAT bit in this patch will allow effective memory type to
> > > > honor PAT values as well, which will make the effective memory type
> > > Given guest sets CR0.CD, what's the point of honoring (guest) PAT? e.g.,
> > > which guests can benefit from this change?
> > This patch is actually a preparation for later patch 10 to implement
> > fine-grained zap.
> > If when CR0.CD=1 the EPT type is WB + IPAT, and
> > when CR0.CD=0 + mtrr enabled, EPT type is WB or UC or ..., which are
> > without IPAT, then we have to always zap all EPT entries.
> > 
> > Given removing the IPAT bit when CR0.CD=1 only makes the quirk
> > KVM_X86_QUIRK_CD_NW_CLEARED more strict (meaning it could be WC/UC... if
> > the guest PAT overwrites it), it's still acceptable.
> 
> Per my understanding, the reason why KVM had KVM_X86_QUIRK_CD_NW_CLEARED is
> to ensure the memory type is WB to achieve better boot performance for old
> OVMF.
It works well for OVMF c9e5618f84b0cb54a9ac2d7604f7b7e7859b45a7,
which is Apr 14 2015.


> you need to justify the original purpose is not broken by this patch.

Hmm, to dig into the history, the reason for this quirk is explained below:

commit fb279950ba02e3210a16b11ecfa8871f3ee0ca49
Author: Xiao Guangrong <guangrong.xiao@intel.com>
Date:   Thu Jul 16 03:25:56 2015 +0800

    KVM: vmx: obey KVM_QUIRK_CD_NW_CLEARED

    OVMF depends on WB to boot fast, because it only clears caches after
    it has set up MTRRs---which is too late.

    Let's do writeback if CR0.CD is set to make it happy, similar to what
    SVM is already doing.


which means WB is only a must for fast boot before OVMF has set up MTRRs.
At that period, PAT is default to WB.

After OVMF setting up MTRR, according to the definition of no-fill cache
mode, "Strict memory ordering is not enforced unless the MTRRs are
disabled and/or all memory is referenced as uncached", it's valid to
honor PAT in no-fill cache mode.
Besides, if the guest explicitly claim UC via PAT, why should KVM return
WB?
In other words, if it's still slow caused by a UC value in guest PAT,
it's desired to be fixed in guest instead of a workaround in KVM.
  
Yan Zhao June 26, 2023, 3:38 a.m. UTC | #8
On Mon, Jun 26, 2023 at 11:40:28AM +0800, Yuan Yao wrote:
> On Mon, Jun 26, 2023 at 08:08:20AM +0800, Yan Zhao wrote:
> > On Sun, Jun 25, 2023 at 03:14:37PM +0800, Xiaoyao Li wrote:
> > > On 6/20/2023 10:34 AM, Yan Zhao wrote:
> > > > On Tue, Jun 20, 2023 at 10:42:57AM +0800, Chao Gao wrote:
> > > > > On Fri, Jun 16, 2023 at 10:38:15AM +0800, Yan Zhao wrote:
> > > > > > For KVM_X86_QUIRK_CD_NW_CLEARED, remove the ignore PAT bit in EPT memory
> > > > > > types when cache is disabled and non-coherent DMA are present.
> > > > > >
> > > > > > With the quirk KVM_X86_QUIRK_CD_NW_CLEARED, WB + IPAT are returned as the
> > > > > > EPT memory type when guest cache is disabled before this patch.
> > > > > > Removing the IPAT bit in this patch will allow effective memory type to
> > > > > > honor PAT values as well, which will make the effective memory type
> > > > > Given guest sets CR0.CD, what's the point of honoring (guest) PAT? e.g.,
> > > > > which guests can benefit from this change?
> > > > This patch is actually a preparation for later patch 10 to implement
> > > > fine-grained zap.
> > > > If when CR0.CD=1 the EPT type is WB + IPAT, and
> > > > when CR0.CD=0 + mtrr enabled, EPT type is WB or UC or ..., which are
> > > > without IPAT, then we have to always zap all EPT entries.
> > > >
> > > > Given removing the IPAT bit when CR0.CD=1 only makes the quirk
> > > > KVM_X86_QUIRK_CD_NW_CLEARED more strict (meaning it could be WC/UC... if
> > > > the guest PAT overwrites it), it's still acceptable.
> > >
> > > Per my understanding, the reason why KVM had KVM_X86_QUIRK_CD_NW_CLEARED is
> > > to ensure the memory type is WB to achieve better boot performance for old
> > > OVMF.
> > It works well for OVMF c9e5618f84b0cb54a9ac2d7604f7b7e7859b45a7,
> > which is Apr 14 2015.
> >
> >
> > > you need to justify the original purpose is not broken by this patch.
> >
> > Hmm, to dig into the history, the reason for this quirk is explained below:
> >
> > commit fb279950ba02e3210a16b11ecfa8871f3ee0ca49
> > Author: Xiao Guangrong <guangrong.xiao@intel.com>
> > Date:   Thu Jul 16 03:25:56 2015 +0800
> >
> >     KVM: vmx: obey KVM_QUIRK_CD_NW_CLEARED
> >
> >     OVMF depends on WB to boot fast, because it only clears caches after
> >     it has set up MTRRs---which is too late.
> >
> >     Let's do writeback if CR0.CD is set to make it happy, similar to what
> >     SVM is already doing.
> >
> >
> > which means WB is only a must for fast boot before OVMF has set up MTRRs.
> > At that period, PAT is default to WB.
> >
> > After OVMF setting up MTRR, according to the definition of no-fill cache
> > mode, "Strict memory ordering is not enforced unless the MTRRs are
> > disabled and/or all memory is referenced as uncached", it's valid to
> > honor PAT in no-fill cache mode.
> 
> Does it also mean that, the honor PAT in such no-fill cache mode should
> also happen for non-quirk case ? e.g. the effective memory type can be
> WC if EPT is UC + guest PAT is WC for CD=1.
No. Only the quirk KVM_X86_QUIRK_CD_NW_CLEARED indicates no-fill cache
mode (CD=1 and NW=0).
Without the quirk, UC + IPAT is desired.

> 
> > Besides, if the guest explicitly claim UC via PAT, why should KVM return
> > WB?
> > In other words, if it's still slow caused by a UC value in guest PAT,
> > it's desired to be fixed in guest instead of a workaround in KVM.
> 
> the quirk may not work after this patch if the guest PAT is
> stronger than WB for CD=1, we don't if any guest "works correctly" based
> on this quirk, I hope no. How about highlight this in commit message
At least for Seabios and OVMF, the PAT is WB by default.
Even after MTRRs enabled, if there are UC ranges, they are small in size
and are desired to be UC.
So, I think it's ok.

> explicitly ?
Will try to explain the background and possible influence.

Thanks

> 
> Also I agree that such issue should be fixed in guest not in KVM.
>
  
Yuan Yao June 26, 2023, 3:40 a.m. UTC | #9
On Mon, Jun 26, 2023 at 08:08:20AM +0800, Yan Zhao wrote:
> On Sun, Jun 25, 2023 at 03:14:37PM +0800, Xiaoyao Li wrote:
> > On 6/20/2023 10:34 AM, Yan Zhao wrote:
> > > On Tue, Jun 20, 2023 at 10:42:57AM +0800, Chao Gao wrote:
> > > > On Fri, Jun 16, 2023 at 10:38:15AM +0800, Yan Zhao wrote:
> > > > > For KVM_X86_QUIRK_CD_NW_CLEARED, remove the ignore PAT bit in EPT memory
> > > > > types when cache is disabled and non-coherent DMA are present.
> > > > >
> > > > > With the quirk KVM_X86_QUIRK_CD_NW_CLEARED, WB + IPAT are returned as the
> > > > > EPT memory type when guest cache is disabled before this patch.
> > > > > Removing the IPAT bit in this patch will allow effective memory type to
> > > > > honor PAT values as well, which will make the effective memory type
> > > > Given guest sets CR0.CD, what's the point of honoring (guest) PAT? e.g.,
> > > > which guests can benefit from this change?
> > > This patch is actually a preparation for later patch 10 to implement
> > > fine-grained zap.
> > > If when CR0.CD=1 the EPT type is WB + IPAT, and
> > > when CR0.CD=0 + mtrr enabled, EPT type is WB or UC or ..., which are
> > > without IPAT, then we have to always zap all EPT entries.
> > >
> > > Given removing the IPAT bit when CR0.CD=1 only makes the quirk
> > > KVM_X86_QUIRK_CD_NW_CLEARED more strict (meaning it could be WC/UC... if
> > > the guest PAT overwrites it), it's still acceptable.
> >
> > Per my understanding, the reason why KVM had KVM_X86_QUIRK_CD_NW_CLEARED is
> > to ensure the memory type is WB to achieve better boot performance for old
> > OVMF.
> It works well for OVMF c9e5618f84b0cb54a9ac2d7604f7b7e7859b45a7,
> which is Apr 14 2015.
>
>
> > you need to justify the original purpose is not broken by this patch.
>
> Hmm, to dig into the history, the reason for this quirk is explained below:
>
> commit fb279950ba02e3210a16b11ecfa8871f3ee0ca49
> Author: Xiao Guangrong <guangrong.xiao@intel.com>
> Date:   Thu Jul 16 03:25:56 2015 +0800
>
>     KVM: vmx: obey KVM_QUIRK_CD_NW_CLEARED
>
>     OVMF depends on WB to boot fast, because it only clears caches after
>     it has set up MTRRs---which is too late.
>
>     Let's do writeback if CR0.CD is set to make it happy, similar to what
>     SVM is already doing.
>
>
> which means WB is only a must for fast boot before OVMF has set up MTRRs.
> At that period, PAT is default to WB.
>
> After OVMF setting up MTRR, according to the definition of no-fill cache
> mode, "Strict memory ordering is not enforced unless the MTRRs are
> disabled and/or all memory is referenced as uncached", it's valid to
> honor PAT in no-fill cache mode.

Does it also mean that, the honor PAT in such no-fill cache mode should
also happen for non-quirk case ? e.g. the effective memory type can be
WC if EPT is UC + guest PAT is WC for CD=1.

> Besides, if the guest explicitly claim UC via PAT, why should KVM return
> WB?
> In other words, if it's still slow caused by a UC value in guest PAT,
> it's desired to be fixed in guest instead of a workaround in KVM.

the quirk may not work after this patch if the guest PAT is
stronger than WB for CD=1, we don't if any guest "works correctly" based
on this quirk, I hope no. How about highlight this in commit message
explicitly ?

Also I agree that such issue should be fixed in guest not in KVM.

>
>
>
>
  

Patch

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0ecf4be2c6af..c1e93678cea4 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7548,8 +7548,6 @@  static int vmx_vm_init(struct kvm *kvm)
 
 static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 {
-	u8 cache;
-
 	/* We wanted to honor guest CD/MTRR/PAT, but doing so could result in
 	 * memory aliases with conflicting memory types and sometimes MCEs.
 	 * We have to be careful as to what are honored and when.
@@ -7576,11 +7574,10 @@  static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 
 	if (kvm_read_cr0_bits(vcpu, X86_CR0_CD)) {
 		if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
-			cache = MTRR_TYPE_WRBACK;
+			return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
 		else
-			cache = MTRR_TYPE_UNCACHABLE;
-
-		return (cache << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
+			return (MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT) |
+				VMX_EPT_IPAT_BIT;
 	}
 
 	return kvm_mtrr_get_guest_memory_type(vcpu, gfn) << VMX_EPT_MT_EPTE_SHIFT;