[v2,03/16] KVM: x86: Always use non-compat vcpu_runstate_info size for gfn=>pfn cache

Message ID 20221013211234.1318131-4-seanjc@google.com
State New
Headers
Series KVM: x86: gfn_to_pfn_cache fixes and cleanups |

Commit Message

Sean Christopherson Oct. 13, 2022, 9:12 p.m. UTC
  Always use the size of Xen's non-compat vcpu_runstate_info struct when
checking that the GPA+size doesn't cross a page boundary.  Conceptually,
using the current mode is more correct, but KVM isn't consistent with
itself as kvm_xen_vcpu_set_attr() unconditionally uses the "full" size
when activating the cache.  More importantly, prior to the introduction
of the gfn_to_pfn_cache, KVM _always_ used the full size, i.e. allowing
the guest (userspace?) to use a poorly aligned GPA in 32-bit mode but not
64-bit mode is more of a bug than a feature, and fixing the bug doesn't
break KVM's historical ABI.

Always using the non-compat size will allow for future gfn_to_pfn_cache
clenups as this is (was) the only case where KVM uses a different size at
check()+refresh() than at activate().  E.g. the length/size of the cache
can be made immutable and dropped from check()+refresh(), which yields a
cleaner set of APIs and avoids potential bugs that could occur if check()
where invoked with a different size than refresh().

Fixes: a795cd43c5b5 ("KVM: x86/xen: Use gfn_to_pfn_cache for runstate area")
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/xen.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)
  

Comments

Paolo Bonzini Oct. 27, 2022, 11:11 a.m. UTC | #1
On 10/13/22 23:12, Sean Christopherson wrote:
> Always use the size of Xen's non-compat vcpu_runstate_info struct when
> checking that the GPA+size doesn't cross a page boundary.  Conceptually,
> using the current mode is more correct, but KVM isn't consistent with
> itself as kvm_xen_vcpu_set_attr() unconditionally uses the "full" size
> when activating the cache.  More importantly, prior to the introduction
> of the gfn_to_pfn_cache, KVM _always_ used the full size, i.e. allowing
> the guest (userspace?) to use a poorly aligned GPA in 32-bit mode but not
> 64-bit mode is more of a bug than a feature, and fixing the bug doesn't
> break KVM's historical ABI.

I'd rather not introduce additional restrictions in KVM, mostly because 
it's actually easy to avoid this patch by instead enforcing that 
attributes are set in a sensible order:

- long mode cannot be changed after the shared info page is enabled 
(which makes sense because the shared info page also has a compat version)

- the caches must be activated after the shared info page (which 
enforces that the vCPU attributes are set after the VM attributes)

This is technically a userspace API break, but nobody is really using 
this API outside Amazon so...  Patches coming after I finish testing.

Paolo


> Always using the non-compat size will allow for future gfn_to_pfn_cache
> clenups as this is (was) the only case where KVM uses a different size at
> check()+refresh() than at activate().  E.g. the length/size of the cache
> can be made immutable and dropped from check()+refresh(), which yields a
> cleaner set of APIs and avoids potential bugs that could occur if check()
> where invoked with a different size than refresh().
> 
> Fixes: a795cd43c5b5 ("KVM: x86/xen: Use gfn_to_pfn_cache for runstate area")
> Cc: David Woodhouse <dwmw2@infradead.org>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/xen.c | 5 +----
>   1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> index b2be60c6efa4..9e79ef2cca99 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -212,10 +212,7 @@ void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state)
>   	if (!vx->runstate_cache.active)
>   		return;
>   
> -	if (IS_ENABLED(CONFIG_64BIT) && v->kvm->arch.xen.long_mode)
> -		user_len = sizeof(struct vcpu_runstate_info);
> -	else
> -		user_len = sizeof(struct compat_vcpu_runstate_info);
> +	user_len = sizeof(struct vcpu_runstate_info);
>   
>   	read_lock_irqsave(&gpc->lock, flags);
>   	while (!kvm_gfn_to_pfn_cache_check(v->kvm, gpc, gpc->gpa,
  
Sean Christopherson Oct. 27, 2022, 2:44 p.m. UTC | #2
On Thu, Oct 27, 2022, Paolo Bonzini wrote:
> On 10/13/22 23:12, Sean Christopherson wrote:
> > Always use the size of Xen's non-compat vcpu_runstate_info struct when
> > checking that the GPA+size doesn't cross a page boundary.  Conceptually,
> > using the current mode is more correct, but KVM isn't consistent with
> > itself as kvm_xen_vcpu_set_attr() unconditionally uses the "full" size
> > when activating the cache.  More importantly, prior to the introduction
> > of the gfn_to_pfn_cache, KVM _always_ used the full size, i.e. allowing
> > the guest (userspace?) to use a poorly aligned GPA in 32-bit mode but not
> > 64-bit mode is more of a bug than a feature, and fixing the bug doesn't
> > break KVM's historical ABI.
> 
> I'd rather not introduce additional restrictions in KVM,

But KVM already has this restriction.  "struct vcpu_info" is always checked for
the non-compat size, and as above, "struct vcpu_runstate_info" is checked for the
non-compat size during its initialization.

> actually easy to avoid this patch by instead enforcing that attributes are
> set in a sensible order:

I don't care about fixing XEN support, I care about forcing "length" to be immutable
in order to simplify the gfn_to_pfn_cache implementation.  Avoiding this patch
prevents doing that later in this series.

> - long mode cannot be changed after the shared info page is enabled (which
> makes sense because the shared info page also has a compat version)

How is this not introducing an additional restriction?  This seems way more
onerous than what is effectively a revert.

> - the caches must be activated after the shared info page (which enforces
> that the vCPU attributes are set after the VM attributes)
> 
> This is technically a userspace API break, but nobody is really using this
> API outside Amazon so...  Patches coming after I finish testing.

It's not just userspace break, it affects the guest ABI as well.  arch.xen.long_mode
isn't set just by userspace, it's also snapshot when the guest changes the hypercall
page.  Maybe there's something in the XEN ABI that says the hypercall page can never
be changed, but barring that I don't see how to prevent ending up with a misaligned
cache due to the guest enabling long mode.

  int kvm_xen_write_hypercall_page(struct kvm_vcpu *vcpu, u64 data)
  {
	struct kvm *kvm = vcpu->kvm;
	u32 page_num = data & ~PAGE_MASK;
	u64 page_addr = data & PAGE_MASK;
	bool lm = is_long_mode(vcpu);

	/* Latch long_mode for shared_info pages etc. */
	vcpu->kvm->arch.xen.long_mode = lm;
  
Paolo Bonzini Oct. 27, 2022, 3:03 p.m. UTC | #3
On 10/27/22 16:44, Sean Christopherson wrote:
>> - long mode cannot be changed after the shared info page is enabled (which
>> makes sense because the shared info page also has a compat version)
> 
> How is this not introducing an additional restriction?  This seems way more
> onerous than what is effectively a revert.
>
>> - the caches must be activated after the shared info page (which enforces
>> that the vCPU attributes are set after the VM attributes)
>>
>> This is technically a userspace API break, but nobody is really using this
>> API outside Amazon so...  Patches coming after I finish testing.
> 
> It's not just userspace break, it affects the guest ABI as well.

Yes, I was talking of the VMM here; additional restrictions are fine there.

The guests however should be compatible with Xen, so you also need to 
re-activate the cache after the hypercall page is written, but that's 
two lines of code.

Paolo
  
Sean Christopherson Oct. 27, 2022, 3:10 p.m. UTC | #4
On Thu, Oct 27, 2022, Paolo Bonzini wrote:
> On 10/27/22 16:44, Sean Christopherson wrote:
> > > - long mode cannot be changed after the shared info page is enabled (which
> > > makes sense because the shared info page also has a compat version)
> > 
> > How is this not introducing an additional restriction?  This seems way more
> > onerous than what is effectively a revert.
> > 
> > > - the caches must be activated after the shared info page (which enforces
> > > that the vCPU attributes are set after the VM attributes)
> > > 
> > > This is technically a userspace API break, but nobody is really using this
> > > API outside Amazon so...  Patches coming after I finish testing.
> > 
> > It's not just userspace break, it affects the guest ABI as well.
> 
> Yes, I was talking of the VMM here; additional restrictions are fine there.

Additional restrictions are fine where?

> The guests however should be compatible with Xen, so you also need to
> re-activate the cache after the hypercall page is written, but that's two
> lines of code.

And do what if the guest transitions from 32-bit => 64-bit and the cache isn't
aligned for 64-bit?  E.g. kvm_xen_set_evtchn() will silently drop events no matter
what KVM does.  In other words, I don't see how KVM can provide a same ABI without
forcing the cached pages to be aligned for the largets possible size.
  
Sean Christopherson Oct. 27, 2022, 4:56 p.m. UTC | #5
On Thu, Oct 27, 2022, Sean Christopherson wrote:
> On Thu, Oct 27, 2022, Paolo Bonzini wrote:
> > On 10/13/22 23:12, Sean Christopherson wrote:
> > > Always use the size of Xen's non-compat vcpu_runstate_info struct when
> > > checking that the GPA+size doesn't cross a page boundary.  Conceptually,
> > > using the current mode is more correct, but KVM isn't consistent with
> > > itself as kvm_xen_vcpu_set_attr() unconditionally uses the "full" size
> > > when activating the cache.  More importantly, prior to the introduction
> > > of the gfn_to_pfn_cache, KVM _always_ used the full size, i.e. allowing
> > > the guest (userspace?) to use a poorly aligned GPA in 32-bit mode but not
> > > 64-bit mode is more of a bug than a feature, and fixing the bug doesn't
> > > break KVM's historical ABI.
> > 
> > I'd rather not introduce additional restrictions in KVM,
> 
> But KVM already has this restriction.  "struct vcpu_info" is always checked for
> the non-compat size, and as above, "struct vcpu_runstate_info" is checked for the
> non-compat size during its initialization.

Ah, I forgot those are the same size:

		BUILD_BUG_ON(sizeof(struct vcpu_info) !=
			     sizeof(struct compat_vcpu_info));
  

Patch

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index b2be60c6efa4..9e79ef2cca99 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -212,10 +212,7 @@  void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state)
 	if (!vx->runstate_cache.active)
 		return;
 
-	if (IS_ENABLED(CONFIG_64BIT) && v->kvm->arch.xen.long_mode)
-		user_len = sizeof(struct vcpu_runstate_info);
-	else
-		user_len = sizeof(struct compat_vcpu_runstate_info);
+	user_len = sizeof(struct vcpu_runstate_info);
 
 	read_lock_irqsave(&gpc->lock, flags);
 	while (!kvm_gfn_to_pfn_cache_check(v->kvm, gpc, gpc->gpa,