[v2,14/27] KVM: x86: Reject memslot MOVE operations if KVMGT is attached

Message ID 20230311002258.852397-15-seanjc@google.com
State New
Headers
Series drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups |

Commit Message

Sean Christopherson March 11, 2023, 12:22 a.m. UTC
  Disallow moving memslots if the VM has external page-track users, i.e. if
KVMGT is being used to expose a virtual GPU to the guest, as KVM doesn't
correctly handle moving memory regions.

Note, this is potential ABI breakage!  E.g. userspace could move regions
that aren't shadowed by KVMGT without harming the guest.  However, the
only known user of KVMGT is QEMU, and QEMU doesn't move generic memory
regions.  KVM's own support for moving memory regions was also broken for
multiple years (albeit for an edge case, but arguably moving RAM is
itself an edge case), e.g. see commit edd4fa37baa6 ("KVM: x86: Allocate
new rmap and large page tracking when moving memslot").

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h | 3 +++
 arch/x86/kvm/mmu/page_track.c         | 5 +++++
 arch/x86/kvm/x86.c                    | 7 +++++++
 3 files changed, 15 insertions(+)
  

Comments

Yan Zhao March 15, 2023, 8:03 a.m. UTC | #1
On Fri, Mar 10, 2023 at 04:22:45PM -0800, Sean Christopherson wrote:
> Disallow moving memslots if the VM has external page-track users, i.e. if
> KVMGT is being used to expose a virtual GPU to the guest, as KVM doesn't
> correctly handle moving memory regions.
> 
> Note, this is potential ABI breakage!  E.g. userspace could move regions
> that aren't shadowed by KVMGT without harming the guest.  However, the
> only known user of KVMGT is QEMU, and QEMU doesn't move generic memory
> regions.  KVM's own support for moving memory regions was also broken for
> multiple years (albeit for an edge case, but arguably moving RAM is
> itself an edge case), e.g. see commit edd4fa37baa6 ("KVM: x86: Allocate
> new rmap and large page tracking when moving memslot").
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
...
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 29dd6c97d145..47ac9291cd43 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12484,6 +12484,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>  				   struct kvm_memory_slot *new,
>  				   enum kvm_mr_change change)
>  {
> +	/*
> +	 * KVM doesn't support moving memslots when there are external page
> +	 * trackers attached to the VM, i.e. if KVMGT is in use.
> +	 */
> +	if (change == KVM_MR_MOVE && kvm_page_track_has_external_user(kvm))
> +		return -EINVAL;
Hmm, will page track work correctly on moving memslots when there's no
external users?

in case of KVM_MR_MOVE,
kvm_prepare_memory_region(kvm, old, new, change)
  |->kvm_arch_prepare_memory_region(kvm, old, new, change)
       |->kvm_alloc_memslot_metadata(kvm, new)
            |->memset(&slot->arch, 0, sizeof(slot->arch));
            |->kvm_page_track_create_memslot(kvm, slot, npages)
The new->arch.arch.gfn_write_track will be fresh empty.


kvm_arch_commit_memory_region(kvm, old, new, change);
  |->kvm_arch_free_memslot(kvm, old);
       |->kvm_page_track_free_memslot(slot);
The old->arch.gfn_write_track is freed afterwards.

So, in theory, the new GFNs are not write tracked though the old ones are.

Is that acceptable for the internal page-track user?

>  	if (change == KVM_MR_CREATE || change == KVM_MR_MOVE) {
>  		if ((new->base_gfn + new->npages - 1) > kvm_mmu_max_gfn())
>  			return -EINVAL;
> -- 
> 2.40.0.rc1.284.g88254d51c5-goog
>
  
Sean Christopherson March 15, 2023, 3:43 p.m. UTC | #2
On Wed, Mar 15, 2023, Yan Zhao wrote:
> On Fri, Mar 10, 2023 at 04:22:45PM -0800, Sean Christopherson wrote:
> > Disallow moving memslots if the VM has external page-track users, i.e. if
> > KVMGT is being used to expose a virtual GPU to the guest, as KVM doesn't
> > correctly handle moving memory regions.
> > 
> > Note, this is potential ABI breakage!  E.g. userspace could move regions
> > that aren't shadowed by KVMGT without harming the guest.  However, the
> > only known user of KVMGT is QEMU, and QEMU doesn't move generic memory
> > regions.  KVM's own support for moving memory regions was also broken for
> > multiple years (albeit for an edge case, but arguably moving RAM is
> > itself an edge case), e.g. see commit edd4fa37baa6 ("KVM: x86: Allocate
> > new rmap and large page tracking when moving memslot").
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> ...
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 29dd6c97d145..47ac9291cd43 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -12484,6 +12484,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> >  				   struct kvm_memory_slot *new,
> >  				   enum kvm_mr_change change)
> >  {
> > +	/*
> > +	 * KVM doesn't support moving memslots when there are external page
> > +	 * trackers attached to the VM, i.e. if KVMGT is in use.
> > +	 */
> > +	if (change == KVM_MR_MOVE && kvm_page_track_has_external_user(kvm))
> > +		return -EINVAL;
> Hmm, will page track work correctly on moving memslots when there's no
> external users?
> 
> in case of KVM_MR_MOVE,
> kvm_prepare_memory_region(kvm, old, new, change)
>   |->kvm_arch_prepare_memory_region(kvm, old, new, change)
>        |->kvm_alloc_memslot_metadata(kvm, new)
>             |->memset(&slot->arch, 0, sizeof(slot->arch));
>             |->kvm_page_track_create_memslot(kvm, slot, npages)
> The new->arch.arch.gfn_write_track will be fresh empty.
> 
> 
> kvm_arch_commit_memory_region(kvm, old, new, change);
>   |->kvm_arch_free_memslot(kvm, old);
>        |->kvm_page_track_free_memslot(slot);
> The old->arch.gfn_write_track is freed afterwards.
> 
> So, in theory, the new GFNs are not write tracked though the old ones are.
> 
> Is that acceptable for the internal page-track user?

It works because KVM zaps all SPTEs when a memslot is moved, i.e. the fact that
KVM loses the write-tracking counts is benign.  I suspect no VMM actually does
does KVM_MR_MOVE in conjunction with shadow paging, but the ongoing maintenance
cost of supporting KVM_MR_MOVE is quite low at this point, so trying to rip it
out isn't worth the pain of having to deal with potential ABI breakage.

Though in hindsight I wish I had tried disallowed moving memslots instead of
fixing the various bugs a few years back. :-(
  
Yan Zhao March 16, 2023, 9:27 a.m. UTC | #3
On Wed, Mar 15, 2023 at 08:43:54AM -0700, Sean Christopherson wrote:
> > So, in theory, the new GFNs are not write tracked though the old ones are.
> > 
> > Is that acceptable for the internal page-track user?
> 
> It works because KVM zaps all SPTEs when a memslot is moved, i.e. the fact that
Oh, yes!
And KVM will not shadow SPTEs for a invalid memslot, so there's no
problem.
Thanks~

> KVM loses the write-tracking counts is benign.  I suspect no VMM actually does
> does KVM_MR_MOVE in conjunction with shadow paging, but the ongoing maintenance
> cost of supporting KVM_MR_MOVE is quite low at this point, so trying to rip it
> out isn't worth the pain of having to deal with potential ABI breakage.
> 
> Though in hindsight I wish I had tried disallowed moving memslots instead of
> fixing the various bugs a few years back. :-(
  
Yan Zhao March 17, 2023, 7:29 a.m. UTC | #4
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>

On Fri, Mar 10, 2023 at 04:22:45PM -0800, Sean Christopherson wrote:
> Disallow moving memslots if the VM has external page-track users, i.e. if
> KVMGT is being used to expose a virtual GPU to the guest, as KVM doesn't
> correctly handle moving memory regions.
> 
> Note, this is potential ABI breakage!  E.g. userspace could move regions
> that aren't shadowed by KVMGT without harming the guest.  However, the
> only known user of KVMGT is QEMU, and QEMU doesn't move generic memory
> regions.  KVM's own support for moving memory regions was also broken for
> multiple years (albeit for an edge case, but arguably moving RAM is
> itself an edge case), e.g. see commit edd4fa37baa6 ("KVM: x86: Allocate
> new rmap and large page tracking when moving memslot").
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/kvm_page_track.h | 3 +++
>  arch/x86/kvm/mmu/page_track.c         | 5 +++++
>  arch/x86/kvm/x86.c                    | 7 +++++++
>  3 files changed, 15 insertions(+)
> 
> diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> index 0d65ae203fd6..6a287bcbe8a9 100644
> --- a/arch/x86/include/asm/kvm_page_track.h
> +++ b/arch/x86/include/asm/kvm_page_track.h
> @@ -77,4 +77,7 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
>  void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
>  			  int bytes);
>  void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
> +
> +bool kvm_page_track_has_external_user(struct kvm *kvm);
> +
>  #endif
> diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
> index 39a0863af8b4..1cfc0a0ccc23 100644
> --- a/arch/x86/kvm/mmu/page_track.c
> +++ b/arch/x86/kvm/mmu/page_track.c
> @@ -321,3 +321,8 @@ enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
>  	return max_level;
>  }
>  EXPORT_SYMBOL_GPL(kvm_page_track_max_mapping_level);
> +
> +bool kvm_page_track_has_external_user(struct kvm *kvm)
> +{
> +	return hlist_empty(&kvm->arch.track_notifier_head.track_notifier_list);
> +}
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 29dd6c97d145..47ac9291cd43 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12484,6 +12484,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>  				   struct kvm_memory_slot *new,
>  				   enum kvm_mr_change change)
>  {
> +	/*
> +	 * KVM doesn't support moving memslots when there are external page
> +	 * trackers attached to the VM, i.e. if KVMGT is in use.
> +	 */
> +	if (change == KVM_MR_MOVE && kvm_page_track_has_external_user(kvm))
> +		return -EINVAL;
> +
>  	if (change == KVM_MR_CREATE || change == KVM_MR_MOVE) {
>  		if ((new->base_gfn + new->npages - 1) > kvm_mmu_max_gfn())
>  			return -EINVAL;
> -- 
> 2.40.0.rc1.284.g88254d51c5-goog
>
  

Patch

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 0d65ae203fd6..6a287bcbe8a9 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -77,4 +77,7 @@  kvm_page_track_unregister_notifier(struct kvm *kvm,
 void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 			  int bytes);
 void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
+
+bool kvm_page_track_has_external_user(struct kvm *kvm);
+
 #endif
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 39a0863af8b4..1cfc0a0ccc23 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -321,3 +321,8 @@  enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
 	return max_level;
 }
 EXPORT_SYMBOL_GPL(kvm_page_track_max_mapping_level);
+
+bool kvm_page_track_has_external_user(struct kvm *kvm)
+{
+	return hlist_empty(&kvm->arch.track_notifier_head.track_notifier_list);
+}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 29dd6c97d145..47ac9291cd43 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12484,6 +12484,13 @@  int kvm_arch_prepare_memory_region(struct kvm *kvm,
 				   struct kvm_memory_slot *new,
 				   enum kvm_mr_change change)
 {
+	/*
+	 * KVM doesn't support moving memslots when there are external page
+	 * trackers attached to the VM, i.e. if KVMGT is in use.
+	 */
+	if (change == KVM_MR_MOVE && kvm_page_track_has_external_user(kvm))
+		return -EINVAL;
+
 	if (change == KVM_MR_CREATE || change == KVM_MR_MOVE) {
 		if ((new->base_gfn + new->npages - 1) > kvm_mmu_max_gfn())
 			return -EINVAL;