[05/16] KVM: x86/mmu: Use synthetic page fault error code to indicate private faults

Message ID 20240228024147.41573-6-seanjc@google.com
State New
Headers
Series KVM: x86/mmu: Page fault and MMIO cleanups |

Commit Message

Sean Christopherson Feb. 28, 2024, 2:41 a.m. UTC
  Add and use a synthetic, KVM-defined page fault error code to indicate
whether a fault is to private vs. shared memory.  TDX and SNP have
different mechanisms for reporting private vs. shared, and KVM's
software-protected VMs have no mechanism at all.  Usurp an error code
flag to avoid having to plumb another parameter to kvm_mmu_page_fault()
and friends.

Alternatively, KVM could borrow AMD's PFERR_GUEST_ENC_MASK, i.e. set it
for TDX and software-protected VMs as appropriate, but that would require
*clearing* the flag for SEV and SEV-ES VMs, which support encrypted
memory at the hardware layer, but don't utilize private memory at the
KVM layer.

Opportunistically add a comment to call out that the logic for software-
protected VMs is (and was before this commit) broken for nested MMUs, i.e.
for nested TDP, as the GPA is an L2 GPA.  Punt on trying to play nice with
nested MMUs as there is a _lot_ of functionality that simply doesn't work
for software-protected VMs, e.g. all of the paths where KVM accesses guest
memory need to be updated to be aware of private vs. shared memory.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h | 11 +++++++++++
 arch/x86/kvm/mmu/mmu.c          | 26 +++++++++++++++++++-------
 arch/x86/kvm/mmu/mmu_internal.h |  2 +-
 3 files changed, 31 insertions(+), 8 deletions(-)
  

Comments

Kai Huang Feb. 29, 2024, 11:16 a.m. UTC | #1
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 408969ac1291..7807bdcd87e8 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5839,19 +5839,31 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
>  	bool direct = vcpu->arch.mmu->root_role.direct;
>  
>  	/*
> -	 * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP
> -	 * checks when emulating instructions that triggers implicit access.
>  	 * WARN if hardware generates a fault with an error code that collides
> -	 * with the KVM-defined value.  Clear the flag and continue on, i.e.
> -	 * don't terminate the VM, as KVM can't possibly be relying on a flag
> -	 * that KVM doesn't know about.
> +	 * with KVM-defined sythentic flags.  Clear the flags and continue on,
> +	 * i.e. don't terminate the VM, as KVM can't possibly be relying on a
> +	 * flag that KVM doesn't know about.
>  	 */
> -	if (WARN_ON_ONCE(error_code & PFERR_IMPLICIT_ACCESS))
> -		error_code &= ~PFERR_IMPLICIT_ACCESS;
> +	if (WARN_ON_ONCE(error_code & PFERR_SYNTHETIC_MASK))
> +		error_code &= ~PFERR_SYNTHETIC_MASK;
>  

Hmm.. I thought for TDX the caller -- handle_ept_violation() -- should
explicitly set the PFERR_PRIVATE_ACCESS so that here the fault handler can
figure out the fault is private.

Now it seems the caller should never pass PFERR_PRIVATE_ACCESS, then ...

>  	if (WARN_ON_ONCE(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
>  		return RET_PF_RETRY;
>  
> +	/*
> +	 * Except for reserved faults (emulated MMIO is shared-only), set the
> +	 * private flag for software-protected VMs based on the gfn's current
> +	 * attributes, which are the source of truth for such VMs.  Note, this
> +	 * wrong for nested MMUs as the GPA is an L2 GPA, but KVM doesn't
> +	 * currently supported nested virtualization (among many other things)
> +	 * for software-protected VMs.
> +	 */
> +	if (IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) &&
> +	    !(error_code & PFERR_RSVD_MASK) &&
> +	    vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM &&
> +	    kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(cr2_or_gpa)))
> +		error_code |= PFERR_PRIVATE_ACCESS;
> +
> 

... I am wondering how we figure out whether a fault is private for TDX?
  
Sean Christopherson Feb. 29, 2024, 3:17 p.m. UTC | #2
On Thu, Feb 29, 2024, Kai Huang wrote:
> 
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 408969ac1291..7807bdcd87e8 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -5839,19 +5839,31 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
> >  	bool direct = vcpu->arch.mmu->root_role.direct;
> >  
> >  	/*
> > -	 * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP
> > -	 * checks when emulating instructions that triggers implicit access.
> >  	 * WARN if hardware generates a fault with an error code that collides
> > -	 * with the KVM-defined value.  Clear the flag and continue on, i.e.
> > -	 * don't terminate the VM, as KVM can't possibly be relying on a flag
> > -	 * that KVM doesn't know about.
> > +	 * with KVM-defined sythentic flags.  Clear the flags and continue on,
> > +	 * i.e. don't terminate the VM, as KVM can't possibly be relying on a
> > +	 * flag that KVM doesn't know about.
> >  	 */
> > -	if (WARN_ON_ONCE(error_code & PFERR_IMPLICIT_ACCESS))
> > -		error_code &= ~PFERR_IMPLICIT_ACCESS;
> > +	if (WARN_ON_ONCE(error_code & PFERR_SYNTHETIC_MASK))
> > +		error_code &= ~PFERR_SYNTHETIC_MASK;
> >  
> 
> Hmm.. I thought for TDX the caller -- handle_ept_violation() -- should
> explicitly set the PFERR_PRIVATE_ACCESS so that here the fault handler can
> figure out the fault is private.
> 
> Now it seems the caller should never pass PFERR_PRIVATE_ACCESS, then ...
> 
> >  	if (WARN_ON_ONCE(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
> >  		return RET_PF_RETRY;
> >  
> > +	/*
> > +	 * Except for reserved faults (emulated MMIO is shared-only), set the
> > +	 * private flag for software-protected VMs based on the gfn's current
> > +	 * attributes, which are the source of truth for such VMs.  Note, this
> > +	 * wrong for nested MMUs as the GPA is an L2 GPA, but KVM doesn't
> > +	 * currently supported nested virtualization (among many other things)
> > +	 * for software-protected VMs.
> > +	 */
> > +	if (IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) &&
> > +	    !(error_code & PFERR_RSVD_MASK) &&
> > +	    vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM &&
> > +	    kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(cr2_or_gpa)))
> > +		error_code |= PFERR_PRIVATE_ACCESS;
> > +
> > 
> 
> ... I am wondering how we figure out whether a fault is private for TDX?

Read the next few patches :-)

The sanity check gets moved to the legacy #PF handler (any error code with bits
63:32!=0 yells) and SVM's #NPF handler (error code with synthetic bits set yells),
leaving VMX free and clear to stuff PFERR_PRIVATE_ACCESS as appropriate.
  

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1e69743ef0fb..4077c46c61ab 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -267,7 +267,18 @@  enum x86_intercept_stage;
 #define PFERR_GUEST_ENC_MASK	BIT_ULL(34)
 #define PFERR_GUEST_SIZEM_MASK	BIT_ULL(35)
 #define PFERR_GUEST_VMPL_MASK	BIT_ULL(36)
+
+/*
+ * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP checks
+ * when emulating instructions that triggers implicit access.
+ */
 #define PFERR_IMPLICIT_ACCESS	BIT_ULL(48)
+/*
+ * PRIVATE_ACCESS is a KVM-defined flag us to indicate that a fault occurred
+ * when the guest was accessing private memory.
+ */
+#define PFERR_PRIVATE_ACCESS	BIT_ULL(49)
+#define PFERR_SYNTHETIC_MASK	(PFERR_IMPLICIT_ACCESS | PFERR_PRIVATE_ACCESS)
 
 #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK |	\
 				 PFERR_WRITE_MASK |		\
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 408969ac1291..7807bdcd87e8 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5839,19 +5839,31 @@  int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 	bool direct = vcpu->arch.mmu->root_role.direct;
 
 	/*
-	 * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP
-	 * checks when emulating instructions that triggers implicit access.
 	 * WARN if hardware generates a fault with an error code that collides
-	 * with the KVM-defined value.  Clear the flag and continue on, i.e.
-	 * don't terminate the VM, as KVM can't possibly be relying on a flag
-	 * that KVM doesn't know about.
+	 * with KVM-defined sythentic flags.  Clear the flags and continue on,
+	 * i.e. don't terminate the VM, as KVM can't possibly be relying on a
+	 * flag that KVM doesn't know about.
 	 */
-	if (WARN_ON_ONCE(error_code & PFERR_IMPLICIT_ACCESS))
-		error_code &= ~PFERR_IMPLICIT_ACCESS;
+	if (WARN_ON_ONCE(error_code & PFERR_SYNTHETIC_MASK))
+		error_code &= ~PFERR_SYNTHETIC_MASK;
 
 	if (WARN_ON_ONCE(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
 		return RET_PF_RETRY;
 
+	/*
+	 * Except for reserved faults (emulated MMIO is shared-only), set the
+	 * private flag for software-protected VMs based on the gfn's current
+	 * attributes, which are the source of truth for such VMs.  Note, this
+	 * wrong for nested MMUs as the GPA is an L2 GPA, but KVM doesn't
+	 * currently supported nested virtualization (among many other things)
+	 * for software-protected VMs.
+	 */
+	if (IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) &&
+	    !(error_code & PFERR_RSVD_MASK) &&
+	    vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM &&
+	    kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(cr2_or_gpa)))
+		error_code |= PFERR_PRIVATE_ACCESS;
+
 	r = RET_PF_INVALID;
 	if (unlikely(error_code & PFERR_RSVD_MASK)) {
 		r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 1fab1f2359b5..d7c10d338f14 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -306,7 +306,7 @@  static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		.max_level = KVM_MAX_HUGEPAGE_LEVEL,
 		.req_level = PG_LEVEL_4K,
 		.goal_level = PG_LEVEL_4K,
-		.is_private = kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
+		.is_private = err & PFERR_PRIVATE_ACCESS,
 	};
 	int r;