[08/16] KVM: x86/mmu: WARN and skip MMIO cache on private, reserved page faults

Message ID 20240228024147.41573-9-seanjc@google.com
State New
Headers
Series KVM: x86/mmu: Page fault and MMIO cleanups |

Commit Message

Sean Christopherson Feb. 28, 2024, 2:41 a.m. UTC
  WARN and skip the emulated MMIO fastpath if a private, reserved page fault
is encountered, as private+reserved should be an impossible combination
(KVM should never create an MMIO SPTE for a private access).

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
  

Comments

Kai Huang Feb. 29, 2024, 10:26 p.m. UTC | #1
On 28/02/2024 3:41 pm, Sean Christopherson wrote:
> WARN and skip the emulated MMIO fastpath if a private, reserved page fault
> is encountered, as private+reserved should be an impossible combination
> (KVM should never create an MMIO SPTE for a private access).
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/mmu/mmu.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index bd342ebd0809..9206cfa58feb 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5866,7 +5866,8 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
>   		error_code |= PFERR_PRIVATE_ACCESS;
>   
>   	r = RET_PF_INVALID;
> -	if (unlikely(error_code & PFERR_RSVD_MASK)) {
> +	if (unlikely((error_code & PFERR_RSVD_MASK) &&
> +		     !WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))) {
>   		r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
>   		if (r == RET_PF_EMULATE)
>   			goto emulate;

It seems this will make KVM continue to call kvm_mmu_do_page_fault() 
when such private+reserve error code actually happens (e.g., due to 
bug), because @r is still RET_PF_INVALID in such case.

Is it better to just return error, e.g., -EINVAL, and give up?
  
Sean Christopherson Feb. 29, 2024, 11:06 p.m. UTC | #2
On Fri, Mar 01, 2024, Kai Huang wrote:
> 
> 
> On 28/02/2024 3:41 pm, Sean Christopherson wrote:
> > WARN and skip the emulated MMIO fastpath if a private, reserved page fault
> > is encountered, as private+reserved should be an impossible combination
> > (KVM should never create an MMIO SPTE for a private access).
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >   arch/x86/kvm/mmu/mmu.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index bd342ebd0809..9206cfa58feb 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -5866,7 +5866,8 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
> >   		error_code |= PFERR_PRIVATE_ACCESS;
> >   	r = RET_PF_INVALID;
> > -	if (unlikely(error_code & PFERR_RSVD_MASK)) {
> > +	if (unlikely((error_code & PFERR_RSVD_MASK) &&
> > +		     !WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))) {
> >   		r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
> >   		if (r == RET_PF_EMULATE)
> >   			goto emulate;
> 
> It seems this will make KVM continue to call kvm_mmu_do_page_fault() when
> such private+reserve error code actually happens (e.g., due to bug), because
> @r is still RET_PF_INVALID in such case.

Yep.

> Is it better to just return error, e.g., -EINVAL, and give up?

As long as there is no obvious/immediate danger to the host, no obvious way for
the "bad" behavior to cause data corruption for the guest, and continuing on has
a plausible chance of working, then KVM should generally try to continue on and
not terminate the VM.

E.g. in this case, KVM will just skip various fast paths because of the RSVD flag,
and treat the fault like a PRIVATE fault.  Hmm, but page_fault_handle_page_track()
would skip write tracking, which could theoretically cause data corruption, so I
guess arguably it would be safer to bail?

Anyone else have an opinion?  This type of bug should never escape development,
so I'm a-ok effectively killing the VM.  Unless someone has a good argument for
continuing on, I'll go with Kai's suggestion and squash this:

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index cedacb1b89c5..d796a162b2da 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5892,8 +5892,10 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
                error_code |= PFERR_PRIVATE_ACCESS;
 
        r = RET_PF_INVALID;
-       if (unlikely((error_code & PFERR_RSVD_MASK) &&
-                    !WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))) {
+       if (unlikely(error_code & PFERR_RSVD_MASK)) {
+               if (WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))
+                       return -EFAULT;
+
                r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
                if (r == RET_PF_EMULATE)
                        goto emulate;
  
Kai Huang Feb. 29, 2024, 11:21 p.m. UTC | #3
On 1/03/2024 12:06 pm, Sean Christopherson wrote:
> On Fri, Mar 01, 2024, Kai Huang wrote:
>>
>>
>> On 28/02/2024 3:41 pm, Sean Christopherson wrote:
>>> WARN and skip the emulated MMIO fastpath if a private, reserved page fault
>>> is encountered, as private+reserved should be an impossible combination
>>> (KVM should never create an MMIO SPTE for a private access).
>>>
>>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>>> ---
>>>    arch/x86/kvm/mmu/mmu.c | 3 ++-
>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>>> index bd342ebd0809..9206cfa58feb 100644
>>> --- a/arch/x86/kvm/mmu/mmu.c
>>> +++ b/arch/x86/kvm/mmu/mmu.c
>>> @@ -5866,7 +5866,8 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
>>>    		error_code |= PFERR_PRIVATE_ACCESS;
>>>    	r = RET_PF_INVALID;
>>> -	if (unlikely(error_code & PFERR_RSVD_MASK)) {
>>> +	if (unlikely((error_code & PFERR_RSVD_MASK) &&
>>> +		     !WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))) {
>>>    		r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
>>>    		if (r == RET_PF_EMULATE)
>>>    			goto emulate;
>>
>> It seems this will make KVM continue to call kvm_mmu_do_page_fault() when
>> such private+reserve error code actually happens (e.g., due to bug), because
>> @r is still RET_PF_INVALID in such case.
> 
> Yep.
> 
>> Is it better to just return error, e.g., -EINVAL, and give up?
> 
> As long as there is no obvious/immediate danger to the host, no obvious way for
> the "bad" behavior to cause data corruption for the guest, and continuing on has
> a plausible chance of working, then KVM should generally try to continue on and
> not terminate the VM.

Agreed.  But I think sometimes it is hard to tell whether there's any 
dangerous things waiting to happen, because that means we have to sanity 
check a lot of code, and when new patches arrive we need to keep that in 
mind too, which could be a nightmare in terms of maintenance.

> 
> E.g. in this case, KVM will just skip various fast paths because of the RSVD flag,
> and treat the fault like a PRIVATE fault.  Hmm, but page_fault_handle_page_track()
> would skip write tracking, which could theoretically cause data corruption, so I
> guess arguably it would be safer to bail?
> 
> Anyone else have an opinion?  This type of bug should never escape development,
> so I'm a-ok effectively killing the VM.  Unless someone has a good argument for
> continuing on, I'll go with Kai's suggestion and squash this:
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index cedacb1b89c5..d796a162b2da 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5892,8 +5892,10 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
>                  error_code |= PFERR_PRIVATE_ACCESS;
>   
>          r = RET_PF_INVALID;
> -       if (unlikely((error_code & PFERR_RSVD_MASK) &&
> -                    !WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))) {
> +       if (unlikely(error_code & PFERR_RSVD_MASK)) {
> +               if (WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))
> +                       return -EFAULT;

-EFAULT is part of guest_memfd() memory fault ABI.  I didn't think over 
this thoroughly but do you want to return -EFAULT here?
  
Sean Christopherson March 4, 2024, 3:51 p.m. UTC | #4
On Fri, Mar 01, 2024, Kai Huang wrote:
> On 1/03/2024 12:06 pm, Sean Christopherson wrote:
> > E.g. in this case, KVM will just skip various fast paths because of the RSVD flag,
> > and treat the fault like a PRIVATE fault.  Hmm, but page_fault_handle_page_track()
> > would skip write tracking, which could theoretically cause data corruption, so I
> > guess arguably it would be safer to bail?
> > 
> > Anyone else have an opinion?  This type of bug should never escape development,
> > so I'm a-ok effectively killing the VM.  Unless someone has a good argument for
> > continuing on, I'll go with Kai's suggestion and squash this:
> > 
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index cedacb1b89c5..d796a162b2da 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -5892,8 +5892,10 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
> >                  error_code |= PFERR_PRIVATE_ACCESS;
> >          r = RET_PF_INVALID;
> > -       if (unlikely((error_code & PFERR_RSVD_MASK) &&
> > -                    !WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))) {
> > +       if (unlikely(error_code & PFERR_RSVD_MASK)) {
> > +               if (WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))
> > +                       return -EFAULT;
> 
> -EFAULT is part of guest_memfd() memory fault ABI.  I didn't think over this
> thoroughly but do you want to return -EFAULT here?

Yes, I/we do.  There are many existing paths that can return -EFAULT from KVM_RUN
without setting run->exit_reason to KVM_EXIT_MEMORY_FAULT.  Userspace is responsible
for checking run->exit_reason on -EFAULT (and -EHWPOISON), i.e. must be prepared
to handle a "bare" -EFAULT, where for all intents and purposes "handle" means
"terminate the guest".

That's actually one of the reasons why KVM_EXIT_MEMORY_FAULT exists, it'd require
an absurd amount of work and churn in KVM to *safely* return useful information
on *all* -EFAULTs.  FWIW, I had hopes and dreams of actually doing exactly this,
but have long since abandoned those dreams.

In other words, KVM_EXIT_MEMORY_FAULT essentially communicates to userspace that
(a) userspace can likely fix whatever badness triggered the -EFAULT, and (b) that
KVM is in a state where fixing the underlying problem and resuming the guest is
safe, e.g. won't corrupt the guest (because KVM is in a half-baked state).
  

Patch

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index bd342ebd0809..9206cfa58feb 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5866,7 +5866,8 @@  int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 		error_code |= PFERR_PRIVATE_ACCESS;
 
 	r = RET_PF_INVALID;
-	if (unlikely(error_code & PFERR_RSVD_MASK)) {
+	if (unlikely((error_code & PFERR_RSVD_MASK) &&
+		     !WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))) {
 		r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
 		if (r == RET_PF_EMULATE)
 			goto emulate;