[2/5] KVM: SVM: Use kvm_pat_valid() directly instead of kvm_mtrr_valid()

Message ID 20230503182852.3431281-3-seanjc@google.com
State New
Series KVM: x86: Clean up MSR PAT handling

Commit Message

Sean Christopherson May 3, 2023, 6:28 p.m. UTC
  From: Ke Guo <guoke@uniontech.com>

Use kvm_pat_valid() directly instead of bouncing through kvm_mtrr_valid().
The PAT is not an MTRR, and kvm_mtrr_valid() just redirects to
kvm_pat_valid(), i.e. for better or worse, KVM doesn't apply the "zap
SPTEs" logic to guest PAT changes when the VM has a passthrough device
with non-coherent DMA.

Signed-off-by: Ke Guo <guoke@uniontech.com>
[sean: massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
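
For reference, the "kvm_mtrr_valid() just redirects to kvm_pat_valid()" point
boils down to two small helpers.  A rough sketch of the relevant checks
(paraphrased, not the verbatim kernel source):

	/* Each of the eight PAT entries must be a defined memory type
	 * (0, 1, 4, 5, 6, 7); types 2 and 3 and all reserved high bits
	 * are rejected.
	 */
	static inline bool kvm_pat_valid(u64 data)
	{
		if (data & 0xF8F8F8F8F8F8F8F8ull)
			return false;
		/* 0, 1, 4, 5, 6, 7 are valid values. */
		return (data | ((data & 0x0202020202020202ull) << 1)) == data;
	}

	/* Pre-cleanup indirection: the MTRR helper special-cases the PAT MSR
	 * and forwards straight to kvm_pat_valid(), so svm_set_msr() can call
	 * kvm_pat_valid() directly with no functional change.
	 */
	bool kvm_mtrr_valid(struct kvm_vcpu *vcpu, u32 msr, u64 data)
	{
		if (msr == MSR_IA32_CR_PAT)
			return kvm_pat_valid(data);

		/* ... validation of the actual MTRR MSRs elided ... */
		return true;
	}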
  

Comments

Kai Huang May 3, 2023, 11:04 p.m. UTC | #1
> for better or worse, KVM doesn't apply the "zap
> SPTEs" logic to guest PAT changes when the VM has a passthrough device
> with non-coherent DMA.

Is it a bug?

> 
> Signed-off-by: Ke Guo <guoke@uniontech.com>
> [sean: massage changelog]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/svm/svm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index eb308c9994f9..db237ccdc957 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2935,7 +2935,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>  
>  		break;
>  	case MSR_IA32_CR_PAT:
> -		if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
> +		if (!kvm_pat_valid(data))
>  			return 1;
>  		vcpu->arch.pat = data;
>  		svm->vmcb01.ptr->save.g_pat = data;

Anyway for this change,

Reviewed-by: Kai Huang <kai.huang@intel.com>
  
Sean Christopherson May 4, 2023, 3:34 p.m. UTC | #2
On Wed, May 03, 2023, Kai Huang wrote:
> > for better or worse, KVM doesn't apply the "zap
> > SPTEs" logic to guest PAT changes when the VM has a passthrough device
> > with non-coherent DMA.
> 
> Is it a bug?

No.  KVM's MTRR behavior uses a heuristic to try not to break the VM: if the
VM has non-coherent DMA, then honor UC mappings in the MTRRs, as such mappings may
be covering the non-coherent DMA.

From vmx_get_mt_mask():

	/* We wanted to honor guest CD/MTRR/PAT, but doing so could result in
	 * memory aliases with conflicting memory types and sometimes MCEs.
	 * We have to be careful as to what are honored and when.

The PAT is problematic because it is referenced via the guest PTEs, versus the
MTRRs being tied to the guest physical address, e.g. different virtual mappings
for the same physical address can yield different memtypes via the PAT.  My head
hurts just thinking about how that might interact with shadow paging :-)

Even the MTRRs are somewhat sketchy because they are technically per-CPU, i.e.
two vCPUs could have different memtypes for the same physical address.  But in
practice, sane software/firmware uses consistent MTRRs across all CPUs.
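
To make that heuristic concrete, here is a simplified sketch of how
vmx_get_mt_mask() chooses the EPT memtype (paraphrased from the upstream
helper; the CR0.CD quirk handling is elided): without non-coherent DMA, KVM
forces WB and sets the "ignore PAT" (IPAT) bit; with non-coherent DMA it
defers to the guest's CD/MTRRs and leaves IPAT clear so guest PAT is honored.

	static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
	{
		if (is_mmio)
			return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;

		/* No non-coherent DMA: force WB and ignore guest PAT/MTRRs. */
		if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
			return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) |
			       VMX_EPT_IPAT_BIT;

		/* Non-coherent DMA: honor guest CD/MTRRs (CR0.CD quirk
		 * handling elided), and leave IPAT clear so guest PAT is
		 * honored too.
		 */
		if (kvm_read_cr0_bits(vcpu, X86_CR0_CD))
			return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;

		return kvm_mtrr_get_guest_memory_type(vcpu, gfn) <<
		       VMX_EPT_MT_EPTE_SHIFT;
	}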
  
Kai Huang May 5, 2023, 11:20 a.m. UTC | #3
On Thu, 2023-05-04 at 08:34 -0700, Sean Christopherson wrote:
> On Wed, May 03, 2023, Kai Huang wrote:
> > > for better or worse, KVM doesn't apply the "zap
> > > SPTEs" logic to guest PAT changes when the VM has a passthrough device
> > > with non-coherent DMA.
> > 
> > Is it a bug?
> 
> No.  KVM's MTRR behavior uses a heuristic to try not to break the VM: if the
> VM has non-coherent DMA, then honor UC mappings in the MTRRs, as such mappings may
> be covering the non-coherent DMA.
> 
> From vmx_get_mt_mask():
> 
> 	/* We wanted to honor guest CD/MTRR/PAT, but doing so could result in
> 	 * memory aliases with conflicting memory types and sometimes MCEs.
> 	 * We have to be careful as to what are honored and when.
> 
> The PAT is problematic because it is referenced via the guest PTEs, versus the
> MTRRs being tied to the guest physical address, e.g. different virtual mappings
> for the same physical address can yield different memtypes via the PAT.  My head
> hurts just thinking about how that might interact with shadow paging :-)
> 
> Even the MTRRs are somewhat sketchy because they are technically per-CPU, i.e.
> two vCPUs could have different memtypes for the same physical address.  But in
> practice, sane software/firmware uses consistent MTRRs across all CPUs.

Agreed on all of the above oddities.

But I think the answer to my question is actually that we simply don't _need_ to zap
SPTEs (with non-coherent DMA) when the guest's IA32_PAT is changed:

1) If EPT is enabled, IIUC the guest's PAT is already honored.  VMCS's GUEST_IA32_PAT
always reflects the IA32_PAT that the guest wants to set.  EPT's memtype bits are
set according to the guest's MTRRs.  That means the guest changing IA32_PAT doesn't need
to zap EPT PTEs, as "EPT PTEs essentially only replace the guest's MTRRs".

2) If EPT is disabled, looking at the code, if I read correctly, the
'shadow_memtype_mask' is 0 for Intel, in which case KVM won't try to set any PAT
memtype bits in the shadow MMU PTEs, which means the true PAT memtype is always WB and
the guest's memtype is never honored (the guest's MTRRs are also never actually used by
HW), which should be fine I guess??  My brain refused to go further :)

But anyway back to my question, I think "changing guest's IA32_PAT" shouldn't
result in needing to "zap SPTEs".
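
A simplified sketch of the 'shadow_memtype_mask' gate described in 2),
paraphrased from the SPTE construction path (not the verbatim make_spte()
code): the vendor memtype callback is consulted only when the vendor declared
memtype bits, so with EPT disabled (mask == 0) no guest memtype ever reaches
the SPTE and the effective type stays host WB.

	/* Paraphrased fragment of make_spte() (arch/x86/kvm/mmu/spte.c): */
	if (shadow_memtype_mask)
		spte |= static_call(kvm_x86_get_mt_mask)(vcpu, gfn,
							 kvm_is_mmio_pfn(pfn));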
  
Sean Christopherson May 11, 2023, 11:03 p.m. UTC | #4
On Fri, May 05, 2023, Kai Huang wrote:
> On Thu, 2023-05-04 at 08:34 -0700, Sean Christopherson wrote:
> > On Wed, May 03, 2023, Kai Huang wrote:
> > > > for better or worse, KVM doesn't apply the "zap
> > > > SPTEs" logic to guest PAT changes when the VM has a passthrough device
> > > > with non-coherent DMA.
> > > 
> > > Is it a bug?
> > 
> > No.  KVM's MTRR behavior uses a heuristic to try not to break the VM: if the
> > VM has non-coherent DMA, then honor UC mappings in the MTRRs, as such mappings may
> > be covering the non-coherent DMA.
> > 
> > From vmx_get_mt_mask():
> > 
> > 	/* We wanted to honor guest CD/MTRR/PAT, but doing so could result in
> > 	 * memory aliases with conflicting memory types and sometimes MCEs.
> > 	 * We have to be careful as to what are honored and when.
> > 
> > The PAT is problematic because it is referenced via the guest PTEs, versus the
> > MTRRs being tied to the guest physical address, e.g. different virtual mappings
> > for the same physical address can yield different memtypes via the PAT.  My head
> > hurts just thinking about how that might interact with shadow paging :-)
> > 
> > Even the MTRRs are somewhat sketchy because they are technically per-CPU, i.e.
> > two vCPUs could have different memtypes for the same physical address.  But in
> > practice, sane software/firmware uses consistent MTRRs across all CPUs.
> 
> Agreed on all of the above oddities.
> 
> But I think the answer to my question is actually that we simply don't _need_ to zap
> SPTEs (with non-coherent DMA) when the guest's IA32_PAT is changed:
> 
> 1) If EPT is enabled, IIUC the guest's PAT is already honored.  VMCS's GUEST_IA32_PAT
> always reflects the IA32_PAT that the guest wants to set.  EPT's memtype bits are
> set according to the guest's MTRRs.  That means the guest changing IA32_PAT doesn't need
> to zap EPT PTEs, as "EPT PTEs essentially only replace the guest's MTRRs".

Ah, yes, you're correct.  I thought KVM _always_ set the "ignore guest PAT" bit
in the EPT PTEs, but KVM honors guest PAT when non-coherent DMA is present and
CR0.CD=0.

> 2) If EPT is disabled, looking at the code, if I read correctly, the
> 'shadow_memtype_mask' is 0 for Intel, in which case KVM won't try to set any PAT
> memtype bits in the shadow MMU PTEs, which means the true PAT memtype is always WB and
> the guest's memtype is never honored (the guest's MTRRs are also never actually used by
> HW), which should be fine I guess??  My brain refused to go further :)

Yep.  It's entirely possible that VT-d without snoop control simply doesn't work
with shadow paging, but no one has ever cared.

> But anyway back to my question, I think "changing guest's IA32_PAT" shouldn't
> result in needing to "zap SPTEs".
  

Patch

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index eb308c9994f9..db237ccdc957 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2935,7 +2935,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 
 		break;
 	case MSR_IA32_CR_PAT:
-		if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
+		if (!kvm_pat_valid(data))
 			return 1;
 		vcpu->arch.pat = data;
 		svm->vmcb01.ptr->save.g_pat = data;