kvm/x86: add capability to disable the write-track mechanism
Commit Message
The write-track is used externally only by the gpu/drm/i915 driver.
Currently, it is always enabled, if a kernel has been compiled with this
driver.
Enabling the write-track mechanism adds a two-byte overhead per page across
all memory slots. It isn't significant for regular VMs. However in gVisor,
where the entire process virtual address space is mapped into the VM, even
with a 39-bit address space, the overhead amounts to 256MB.
This change introduces the new KVM_CAP_PAGE_WRITE_TRACKING capability,
allowing users to enable/disable the write-track mechanism. It is enabled
by default for backward compatibility.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Andrei Vagin <avagin@google.com>
---
Documentation/virt/kvm/api.rst | 16 ++++++++++++++++
arch/x86/include/asm/kvm_host.h | 4 ++++
arch/x86/kvm/mmu/page_track.c | 13 +++++++++++--
arch/x86/kvm/x86.c | 25 +++++++++++++++++++++++++
include/uapi/linux/kvm.h | 1 +
5 files changed, 57 insertions(+), 2 deletions(-)
Comments
On Mon, Feb 05, 2024, Andrei Vagin wrote:
> The write-track is used externally only by the gpu/drm/i915 driver.
> Currently, it is always enabled, if a kernel has been compiled with this
> driver.
>
> Enabling the write-track mechanism adds a two-byte overhead per page across
> all memory slots. It isn't significant for regular VMs. However in gVisor,
> where the entire process virtual address space is mapped into the VM, even
> with a 39-bit address space, the overhead amounts to 256MB.
>
> This change introduces the new KVM_CAP_PAGE_WRITE_TRACKING capability,
> allowing users to enable/disable the write-track mechanism. It is enabled
> by default for backward compatibility.
I would much prefer to allocate the write-tracking metadata on-demand in
kvm_page_track_register_notifier(), i.e. do the same as mmu_first_shadow_root_alloc(),
except for just gfn_write_track.
The only potential hiccup would be if taking slots_arch_lock would deadlock, but
it should be impossible for slots_arch_lock to be taken in any other path that
involves VFIO and/or KVMGT *and* can be coincident. Except for kvm_arch_destroy_vm()
(which deletes KVM's internal memslots), slots_arch_lock is taken only through
KVM ioctls(), and the caller of kvm_page_track_register_notifier() *must* hold
a reference to the VM.
That way there's no need for new uAPI and no need for userspace changes.
On Mon, Feb 5, 2024 at 10:41 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Feb 05, 2024, Andrei Vagin wrote:
> > The write-track is used externally only by the gpu/drm/i915 driver.
> > Currently, it is always enabled, if a kernel has been compiled with this
> > driver.
> >
> > Enabling the write-track mechanism adds a two-byte overhead per page across
> > all memory slots. It isn't significant for regular VMs. However in gVisor,
> > where the entire process virtual address space is mapped into the VM, even
> > with a 39-bit address space, the overhead amounts to 256MB.
> >
> > This change introduces the new KVM_CAP_PAGE_WRITE_TRACKING capability,
> > allowing users to enable/disable the write-track mechanism. It is enabled
> > by default for backward compatibility.
>
> I would much prefer to allocate the write-tracking metadata on-demand in
> kvm_page_track_register_notifier(), i.e. do the same as mmu_first_shadow_root_alloc(),
> except for just gfn_write_track.
>
> The only potential hiccup would be if taking slots_arch_lock would deadlock, but
> it should be impossible for slots_arch_lock to be taken in any other path that
> involves VFIO and/or KVMGT *and* can be coincident. Except for kvm_arch_destroy_vm()
> (which deletes KVM's internal memslots), slots_arch_lock is taken only through
> KVM ioctls(), and the caller of kvm_page_track_register_notifier() *must* hold
> a reference to the VM.
>
> That way there's no need for new uAPI and no need for userspace changes.
I think it is a good idea, I don't know why I didn't consider it.
Thanks. I will prepare a new patch.
Thanks,
Andrei
On Mon, Feb 05, 2024, Andrei Vagin wrote:
> On Mon, Feb 5, 2024 at 10:41 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Mon, Feb 05, 2024, Andrei Vagin wrote:
> > > The write-track is used externally only by the gpu/drm/i915 driver.
> > > Currently, it is always enabled, if a kernel has been compiled with this
> > > driver.
> > >
> > > Enabling the write-track mechanism adds a two-byte overhead per page across
> > > all memory slots. It isn't significant for regular VMs. However in gVisor,
> > > where the entire process virtual address space is mapped into the VM, even
> > > with a 39-bit address space, the overhead amounts to 256MB.
> > >
> > > This change introduces the new KVM_CAP_PAGE_WRITE_TRACKING capability,
> > > allowing users to enable/disable the write-track mechanism. It is enabled
> > > by default for backward compatibility.
> >
> > I would much prefer to allocate the write-tracking metadata on-demand in
> > kvm_page_track_register_notifier(), i.e. do the same as mmu_first_shadow_root_alloc(),
> > except for just gfn_write_track.
> >
> > The only potential hiccup would be if taking slots_arch_lock would deadlock, but
> > it should be impossible for slots_arch_lock to be taken in any other path that
> > involves VFIO and/or KVMGT *and* can be coincident. Except for kvm_arch_destroy_vm()
> > (which deletes KVM's internal memslots), slots_arch_lock is taken only through
> > KVM ioctls(), and the caller of kvm_page_track_register_notifier() *must* hold
> > a reference to the VM.
> >
> > That way there's no need for new uAPI and no need for userspace changes.
>
> I think it is a good idea, I don't know why I didn't consider it.
Because you wanted to make me look smart ;-)
@@ -8791,6 +8791,22 @@ means the VM type with value @n is supported. Possible values of @n are::
#define KVM_X86_DEFAULT_VM 0
#define KVM_X86_SW_PROTECTED_VM 1
+8.38 KVM_CAP_PAGE_WRITE_TRACKING
+-------------------------------------
+
+:Capability: KVM_CAP_PAGE_WRITE_TRACKING
+:Architectures: x86
+:Type: system ioctl
+:Parameters: args[0] defines whether the feature should be enabled or not
+:Returns: 0 on success, -EINVAL if args[0] isn't 1 or 0, -EINVAL if any memslot
+ was already created or any users was registered, -EBUSY if write
+ tracking is used for internal needs and can't be disabled.
+
+This capability enables/disables the page write-track mechanism. It is enabled
+by default for backward compatibility.
+
+This capability may only be set before any mem slots are created.
+
9. Known KVM API problems
=========================
@@ -1503,6 +1503,10 @@ struct kvm_arch {
*/
#define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1)
struct kvm_mmu_memory_cache split_desc_cache;
+
+#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
+ bool page_write_tracking_disabled;
+#endif
};
struct kvm_vm_stat {
@@ -20,9 +20,15 @@
#include "mmu_internal.h"
#include "page_track.h"
+#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
+#define KVM_EXTERNAL_WRITE_TRACKING_ENABLED(kvm) \
+ (!kvm->arch.page_write_tracking_disabled)
+#else
+#define KVM_EXTERNAL_WRITE_TRACKING_ENABLED(kvm) false
+#endif
+
bool kvm_page_track_write_tracking_enabled(struct kvm *kvm)
-{
- return IS_ENABLED(CONFIG_KVM_EXTERNAL_WRITE_TRACKING) ||
+{ return KVM_EXTERNAL_WRITE_TRACKING_ENABLED(kvm) ||
!tdp_enabled || kvm_shadow_root_allocated(kvm);
}
@@ -165,6 +171,9 @@ int kvm_page_track_register_notifier(struct kvm *kvm,
if (!kvm || kvm->mm != current->mm)
return -ESRCH;
+ if (kvm->arch.page_write_tracking_disabled)
+ return -EINVAL;
+
kvm_get_kvm(kvm);
head = &kvm->arch.track_notifier_head;
@@ -4668,6 +4668,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
case KVM_CAP_IRQFD_RESAMPLE:
case KVM_CAP_MEMORY_FAULT_INFO:
+ case KVM_CAP_PAGE_WRITE_TRACKING:
r = 1;
break;
case KVM_CAP_EXIT_HYPERCALL:
@@ -6675,6 +6676,30 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
}
mutex_unlock(&kvm->lock);
break;
+ case KVM_CAP_PAGE_WRITE_TRACKING: {
+ bool enabling = cap->args[0];
+
+ r = -EINVAL;
+ if (cap->args[0] & ~1)
+ break;
+
+ r = -EBUSY;
+ if (!enabling && (!tdp_enabled || kvm_shadow_root_allocated(kvm)))
+ break;
+#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
+ write_lock(&kvm->mmu_lock);
+ if (!kvm_memslots_empty(kvm_memslots(kvm)) ||
+ kvm_page_track_has_external_user(kvm)) {
+ write_unlock(&kvm->mmu_lock);
+ r = -EINVAL;
+ break;
+ }
+ kvm->arch.page_write_tracking_disabled = !enabling;
+ write_unlock(&kvm->mmu_lock);
+#endif
+ r = 0;
+ break;
+ }
default:
r = -EINVAL;
break;
@@ -1155,6 +1155,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_MEMORY_ATTRIBUTES 233
#define KVM_CAP_GUEST_MEMFD 234
#define KVM_CAP_VM_TYPES 235
+#define KVM_CAP_PAGE_WRITE_TRACKING 236
#ifdef KVM_CAP_IRQ_ROUTING