KVM: x86: avoid memslot check in NX hugepage recovery if it cannot be true

Message ID 20221117173109.3126912-1-pbonzini@redhat.com
State New
Series KVM: x86: avoid memslot check in NX hugepage recovery if it cannot be true

Commit Message

Paolo Bonzini Nov. 17, 2022, 5:31 p.m. UTC
Since gfn_to_memslot() is relatively expensive, it helps to
skip it if the memslot cannot possibly have dirty logging
enabled.  In order to do this, add to struct kvm a counter
of the number of log-page memslots.  While the correct value
can only be read with slots_lock taken, the NX recovery thread
is content with using an approximate value.  Therefore, the
counter is an atomic_t.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c   | 22 +++++++++++++++++++---
 include/linux/kvm_host.h |  5 +++++
 virt/kvm/kvm_main.c      |  5 +++++
 3 files changed, 29 insertions(+), 3 deletions(-)
  

Comments

David Matlack Nov. 17, 2022, 6:16 p.m. UTC | #1
On Thu, Nov 17, 2022 at 9:31 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> Since gfn_to_memslot() is relatively expensive, it helps to
> skip it if the memslot cannot possibly have dirty logging
> enabled.  In order to do this, add to struct kvm a counter
> of the number of log-page memslots.  While the correct value
> can only be read with slots_lock taken, the NX recovery thread
> is content with using an approximate value.  Therefore, the
> counter is an atomic_t.

Oo, good idea to use the counter to skip gfn_to_memslot() in the steady state.

FYI I sent an earlier patch to add an equivalent counter in case
you want to use that and apply the change to
kvm_recover_nx_huge_pages() as a separate patch.

https://lore.kernel.org/kvm/20221027200316.2221027-2-dmatlack@google.com/

>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/mmu/mmu.c   | 22 +++++++++++++++++++---
>  include/linux/kvm_host.h |  5 +++++
>  virt/kvm/kvm_main.c      |  5 +++++
>  3 files changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index cfff74685a25..d4ec9491d468 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -6878,16 +6878,32 @@ static void kvm_recover_nx_huge_pages(struct kvm *kvm)
>                 WARN_ON_ONCE(!sp->nx_huge_page_disallowed);
>                 WARN_ON_ONCE(!sp->role.direct);
>
> -               slot = gfn_to_memslot(kvm, sp->gfn);
> -               WARN_ON_ONCE(!slot);
> -
>                 /*
>                  * Unaccount and do not attempt to recover any NX Huge Pages
>                  * that are being dirty tracked, as they would just be faulted
>                  * back in as 4KiB pages. The NX Huge Pages in this slot will be
>                  * recovered, along with all the other huge pages in the slot,
>                  * when dirty logging is disabled.
> +                *
> +                * Since gfn_to_memslot() is relatively expensive, it helps to
> +                * skip it if the test cannot possibly return true.  On the
> +                * other hand, if any memslot has logging enabled, chances are
> +                * good that all of them do, in which case unaccount_nx_huge_page()
> +                * is much cheaper than zapping the page.
> +                *
> +                * If a memslot update is in progress, reading an incorrect value
> +                * of kvm->nr_logpage_memslots is not a problem: if it is becoming
> +                * zero, gfn_to_memslot() will be done unnecessarily; if it is
> +                * becoming nonzero, the page will be zapped unnecessarily.
> +                * Either way, this only affects efficiency in racy situations,
> +                * and not correctness.
>                  */
> +               slot = NULL;
> +               if (atomic_read(&kvm->nr_logpage_memslots)) {
> +                       slot = gfn_to_memslot(kvm, sp->gfn);
> +                       WARN_ON_ONCE(!slot);
> +               }
> +
>                 if (slot && kvm_slot_dirty_track_enabled(slot))
>                         unaccount_nx_huge_page(kvm, sp);
>                 else if (is_tdp_mmu_page(sp))
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index e6e66c5e56f2..b3c2b975e737 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -722,6 +722,11 @@ struct kvm {
>         /* The current active memslot set for each address space */
>         struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM];
>         struct xarray vcpu_array;
> +       /*
> +        * Protected by slots_lock, but can be read outside if an
> +        * incorrect answer is acceptable.
> +        */
> +       atomic_t nr_logpage_memslots;

Can also be int + READ_ONCE(), but I do like that atomic_t forces the
reader to use atomic_read().
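
For illustration, a minimal userspace sketch of the int + READ_ONCE()
alternative. The MODEL_* macros only approximate the kernel's
READ_ONCE()/WRITE_ONCE() with a volatile access, and struct
model_vm_plain and both helpers are hypothetical names, not KVM code.

```c
#include <assert.h>

/*
 * Userspace approximations of READ_ONCE()/WRITE_ONCE(): a single
 * volatile access. Assumption: __typeof__ is available (GCC/Clang).
 */
#define MODEL_READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define MODEL_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical stand-in for the counter embedded in struct kvm. */
struct model_vm_plain {
	int nr_logpage_memslots;	/* writers are assumed to hold a lock */
};

/* Writer side: in this model the caller serializes updates externally. */
static void model_plain_add(struct model_vm_plain *vm, int delta)
{
	int next = vm->nr_logpage_memslots + delta;

	assert(next >= 0);
	MODEL_WRITE_ONCE(vm->nr_logpage_memslots, next);
}

/* Lockless reader: a stale answer is acceptable, a torn one is not. */
static int model_plain_any_logged(struct model_vm_plain *vm)
{
	return MODEL_READ_ONCE(vm->nr_logpage_memslots) != 0;
}
```

Nothing in this version stops a caller from reading
vm->nr_logpage_memslots directly, which is the practical argument for
atomic_t.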

>
>         /* Used to wait for completion of MMU notifiers.  */
>         spinlock_t mn_invalidate_lock;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 43bbe4fde078..7670ebd29bcf 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1627,6 +1627,11 @@ static int kvm_prepare_memory_region(struct kvm *kvm,
>                 }
>         }
>
> +       atomic_set(&kvm->nr_logpage_memslots,
> +                  atomic_read(&kvm->nr_logpage_memslots)
> +                  + !!(new->flags & KVM_MEM_LOG_DIRTY_PAGES)
> +                  - !!(old->flags & KVM_MEM_LOG_DIRTY_PAGES));

@new and @old can be NULL here if creating or destroying a memslot.
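
A minimal userspace sketch of one way to make the delta NULL-safe.
struct model_memslot, MODEL_MEM_LOG_DIRTY_PAGES and both helpers are
hypothetical stand-ins for the kernel definitions, and the model
assumes that at least one of the two slots always exists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for KVM_MEM_LOG_DIRTY_PAGES; the value is arbitrary. */
#define MODEL_MEM_LOG_DIRTY_PAGES (1u << 0)

/* Hypothetical, heavily reduced model of a memslot. */
struct model_memslot {
	uint32_t flags;
};

/* A missing slot (no old on create, no new on delete) has no flags set. */
static uint32_t model_slot_flags(const struct model_memslot *slot)
{
	return slot ? slot->flags : 0;
}

/* Net change in the number of dirty-logged slots: -1, 0 or +1. */
static int model_dirty_log_delta(const struct model_memslot *old,
				 const struct model_memslot *new_slot)
{
	/* Assumption of this model: a change involves at least one slot. */
	assert(old || new_slot);

	return !!(model_slot_flags(new_slot) & MODEL_MEM_LOG_DIRTY_PAGES) -
	       !!(model_slot_flags(old) & MODEL_MEM_LOG_DIRTY_PAGES);
}
```

Treating a NULL slot as having no flags lets create, delete and
flags-only updates share a single expression.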


> +
>         r = kvm_arch_prepare_memory_region(kvm, old, new, change);
>
>         /* Free the bitmap on failure if it was allocated above. */
> --
> 2.31.1
>
  
Sean Christopherson Nov. 18, 2022, 12:37 a.m. UTC | #2
On Thu, Nov 17, 2022, Paolo Bonzini wrote:
> +		if (atomic_read(&kvm->nr_logpage_memslots)) {

Can we use something like nr_dirty_logged_memslots?  logpage doesn't precisely
capture the "dirty log" aspect, e.g. for a (very brief) second I thought this was
log(nr_memslot_pages).

> +			slot = gfn_to_memslot(kvm, sp->gfn);
> +			WARN_ON_ONCE(!slot);
> +		}
> +
>  		if (slot && kvm_slot_dirty_track_enabled(slot))
>  			unaccount_nx_huge_page(kvm, sp);
>  		else if (is_tdp_mmu_page(sp))
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index e6e66c5e56f2..b3c2b975e737 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -722,6 +722,11 @@ struct kvm {
>  	/* The current active memslot set for each address space */
>  	struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM];
>  	struct xarray vcpu_array;
> +	/*
> +	 * Protected by slots_lock, but can be read outside if an
> +	 * incorrect answer is acceptable.
> +	 */
> +	atomic_t nr_logpage_memslots;
>  
>  	/* Used to wait for completion of MMU notifiers.  */
>  	spinlock_t mn_invalidate_lock;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 43bbe4fde078..7670ebd29bcf 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1627,6 +1627,11 @@ static int kvm_prepare_memory_region(struct kvm *kvm,

This needs to be done in the commit stage, e.g. if kvm_arch_prepare_memory_region()
fails the count will be all kinds of wrong.  Even better, since this seems to be
x86-centric, put it in kvm_mmu_slot_apply_flags() under the

	if ((old_flags ^ new_flags) & KVM_MEM_LOG_DIRTY_PAGES)

to avoid atomic operations if dirty logging isn't being toggled.  That would also
deal with the NULL pointer issues David pointed out.

>  		}
>  	}
>  
> +	atomic_set(&kvm->nr_logpage_memslots,
> +		   atomic_read(&kvm->nr_logpage_memslots)
> +		   + !!(new->flags & KVM_MEM_LOG_DIRTY_PAGES)
> +		   - !!(old->flags & KVM_MEM_LOG_DIRTY_PAGES));

I believe this can be:

	atomic_add(+ !!(new_flags & KVM_MEM_LOG_DIRTY_PAGES)
		   - !!(old_flags & KVM_MEM_LOG_DIRTY_PAGES), ...);

or less weirdly...

	if ((old_flags ^ new_flags) & KVM_MEM_LOG_DIRTY_PAGES) {
		...

		if (new_flags & KVM_MEM_LOG_DIRTY_PAGES)
			atomic_inc(...);
		else
			atomic_dec(...);
	}
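
For illustration, the inc/dec variant can be modeled in userspace with
C11 atomics. This is only a sketch: struct model_vm, the MODEL_ flag
and both helpers are hypothetical, relaxed ordering is an assumption
based on the reader tolerating a stale value, and the field name is
borrowed from the naming suggestion above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative stand-in for KVM_MEM_LOG_DIRTY_PAGES; the value is arbitrary. */
#define MODEL_MEM_LOG_DIRTY_PAGES (1u << 0)

/* Hypothetical, heavily reduced model of struct kvm. */
struct model_vm {
	atomic_int nr_dirty_logged_memslots;
};

/*
 * Meant to run only once the memslot change can no longer fail, so
 * the count is never adjusted for an update that gets rolled back.
 */
static void model_commit_flags(struct model_vm *vm,
			       uint32_t old_flags, uint32_t new_flags)
{
	/* Skip the atomic operation unless dirty logging is toggled. */
	if (!((old_flags ^ new_flags) & MODEL_MEM_LOG_DIRTY_PAGES))
		return;

	if (new_flags & MODEL_MEM_LOG_DIRTY_PAGES) {
		atomic_fetch_add_explicit(&vm->nr_dirty_logged_memslots, 1,
					  memory_order_relaxed);
	} else {
		int prev = atomic_fetch_sub_explicit(&vm->nr_dirty_logged_memslots,
						     1, memory_order_relaxed);

		/* The count must never drop below zero. */
		assert(prev > 0);
	}
}

/* Lockless reader: may be stale while an update is in flight. */
static int model_any_slot_dirty_logged(struct model_vm *vm)
{
	return atomic_load_explicit(&vm->nr_dirty_logged_memslots,
				    memory_order_relaxed) != 0;
}
```

Passing flags rather than slot pointers means a created or deleted
slot can be represented as flags == 0, so no NULL check is needed at
this level.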
  

Patch

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index cfff74685a25..d4ec9491d468 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6878,16 +6878,32 @@  static void kvm_recover_nx_huge_pages(struct kvm *kvm)
 		WARN_ON_ONCE(!sp->nx_huge_page_disallowed);
 		WARN_ON_ONCE(!sp->role.direct);
 
-		slot = gfn_to_memslot(kvm, sp->gfn);
-		WARN_ON_ONCE(!slot);
-
 		/*
 		 * Unaccount and do not attempt to recover any NX Huge Pages
 		 * that are being dirty tracked, as they would just be faulted
 		 * back in as 4KiB pages. The NX Huge Pages in this slot will be
 		 * recovered, along with all the other huge pages in the slot,
 		 * when dirty logging is disabled.
+		 *
+		 * Since gfn_to_memslot() is relatively expensive, it helps to
+		 * skip it if the test cannot possibly return true.  On the
+		 * other hand, if any memslot has logging enabled, chances are
+		 * good that all of them do, in which case unaccount_nx_huge_page()
+		 * is much cheaper than zapping the page.
+		 *
+		 * If a memslot update is in progress, reading an incorrect value
+		 * of kvm->nr_logpage_memslots is not a problem: if it is becoming
+		 * zero, gfn_to_memslot() will be done unnecessarily; if it is
+		 * becoming nonzero, the page will be zapped unnecessarily.
+		 * Either way, this only affects efficiency in racy situations,
+		 * and not correctness.
 		 */
+		slot = NULL;
+		if (atomic_read(&kvm->nr_logpage_memslots)) {
+			slot = gfn_to_memslot(kvm, sp->gfn);
+			WARN_ON_ONCE(!slot);
+		}
+
 		if (slot && kvm_slot_dirty_track_enabled(slot))
 			unaccount_nx_huge_page(kvm, sp);
 		else if (is_tdp_mmu_page(sp))
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e6e66c5e56f2..b3c2b975e737 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -722,6 +722,11 @@  struct kvm {
 	/* The current active memslot set for each address space */
 	struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM];
 	struct xarray vcpu_array;
+	/*
+	 * Protected by slots_lock, but can be read outside if an
+	 * incorrect answer is acceptable.
+	 */
+	atomic_t nr_logpage_memslots;
 
 	/* Used to wait for completion of MMU notifiers.  */
 	spinlock_t mn_invalidate_lock;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 43bbe4fde078..7670ebd29bcf 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1627,6 +1627,11 @@  static int kvm_prepare_memory_region(struct kvm *kvm,
 		}
 	}
 
+	atomic_set(&kvm->nr_logpage_memslots,
+		   atomic_read(&kvm->nr_logpage_memslots)
+		   + !!(new->flags & KVM_MEM_LOG_DIRTY_PAGES)
+		   - !!(old->flags & KVM_MEM_LOG_DIRTY_PAGES));
+
 	r = kvm_arch_prepare_memory_region(kvm, old, new, change);
 
 	/* Free the bitmap on failure if it was allocated above. */