[v4,06/13] KVM: x86/mmu: Bypass __handle_changed_spte() when clearing TDP MMU dirty bits

Message ID 20230321220021.2119033-7-seanjc@google.com
State New
Series KVM: x86/mmu: Optimize clear dirty log

Commit Message

Sean Christopherson March 21, 2023, 10 p.m. UTC
  From: Vipin Sharma <vipinsh@google.com>

Drop everything except marking the PFN dirty and the relevant tracepoint
parts of __handle_changed_spte() when clearing the dirty status of gfns in
the TDP MMU.  Clearing only the Dirty (or Writable) bit doesn't affect
the SPTE's shadow-present status, whether or not the SPTE is a leaf, or
change the SPTE's PFN.  I.e. other than marking the PFN dirty, none of the
functional updates handled by __handle_changed_spte() are relevant.
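
Concretely, the per-SPTE loop body in clear_dirty_pt_masked() reduces to the
tracepoint plus the PFN-dirty update (excerpted from the diff below):

		trace_kvm_tdp_mmu_spte_changed(iter.as_id, iter.gfn, iter.level,
					       iter.old_spte,
					       iter.old_spte & ~dbit);
		kvm_set_pfn_dirty(spte_to_pfn(iter.old_spte));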

Losing __handle_changed_spte()'s sanity checks does mean that a bug could
theoretically go unnoticed, but that scenario is extremely unlikely, e.g. it
would effectively require a misconfiguration or a locking bug elsewhere.

Opportunistically remove a comment blurb from __handle_changed_spte()
about all modifications to TDP MMU SPTEs needing to invoke said function;
that "rule" hasn't been true since fast page fault support was added for
the TDP MMU (and perhaps even before).

Tested on a VM (160 vCPUs, 160 GB memory) and found that performance of
the clear dirty log stage improved by ~40% in dirty_log_perf_test (with the
full optimization applied).

Before optimization:
--------------------
Iteration 1 clear dirty log time: 3.638543593s
Iteration 2 clear dirty log time: 3.145032742s
Iteration 3 clear dirty log time: 3.142340358s
Clear dirty log over 3 iterations took 9.925916693s. (Avg 3.308638897s/iteration)

After optimization:
-------------------
Iteration 1 clear dirty log time: 2.318988110s
Iteration 2 clear dirty log time: 1.794470164s
Iteration 3 clear dirty log time: 1.791668628s
Clear dirty log over 3 iterations took 5.905126902s. (Avg 1.968375634s/iteration)
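
For reference, the numbers above come from the dirty_log_perf_test selftest
(tools/testing/selftests/kvm).  A run roughly matching that configuration
would look something like the command below; the flag spellings and values
are illustrative (check the selftest's -h output on your tree), not taken
from this posting:

  ./dirty_log_perf_test -i 3 -v 160 -b 1G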

Link: https://lore.kernel.org/all/Y9hXmz%2FnDOr1hQal@google.com
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
[sean: split the switch to atomic-AND to a separate patch]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
  

Comments

Like Xu June 25, 2023, 7:44 a.m. UTC | #1
On 22/3/2023 6:00 am, Sean Christopherson wrote:
> From: Vipin Sharma <vipinsh@google.com>
> 
> Drop everything except marking the PFN dirty and the relevant tracepoint
> parts of __handle_changed_spte() when clearing the dirty status of gfns in
> the TDP MMU.  Clearing only the Dirty (or Writable) bit doesn't affect
> the SPTE's shadow-present status, whether or not the SPTE is a leaf, or
> change the SPTE's PFN.  I.e. other than marking the PFN dirty, none of the
> functional updates handled by __handle_changed_spte() are relevant.
> 
> Losing __handle_changed_spte()'s sanity checks does mean that a bug could
> theoretically go unnoticed, but that scenario is extremely unlikely, e.g. it
> would effectively require a misconfiguration or a locking bug elsewhere.
> 
> Opportunistically remove a comment blurb from __handle_changed_spte()
> about all modifications to TDP MMU SPTEs needing to invoke said function;
> that "rule" hasn't been true since fast page fault support was added for
> the TDP MMU (and perhaps even before).
> 
> Tested on a VM (160 vCPUs, 160 GB memory) and found that performance of
> the clear dirty log stage improved by ~40% in dirty_log_perf_test (with the
> full optimization applied).
> 
> Before optimization:
> --------------------
> Iteration 1 clear dirty log time: 3.638543593s
> Iteration 2 clear dirty log time: 3.145032742s
> Iteration 3 clear dirty log time: 3.142340358s
> Clear dirty log over 3 iterations took 9.925916693s. (Avg 3.308638897s/iteration)
> 
> After optimization:
> -------------------
> Iteration 1 clear dirty log time: 2.318988110s
> Iteration 2 clear dirty log time: 1.794470164s
> Iteration 3 clear dirty log time: 1.791668628s
> Clear dirty log over 3 iterations took 5.905126902s. (Avg 1.968375634s/iteration)
> 
> Link: https://lore.kernel.org/all/Y9hXmz%2FnDOr1hQal@google.com
> Signed-off-by: Vipin Sharma <vipinsh@google.com>
> Reviewed-by: David Matlack <dmatlack@google.com>
> [sean: split the switch to atomic-AND to a separate patch]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/mmu/tdp_mmu.c | 7 ++++---
>   1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 950c5d23ecee..467931c43968 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -517,7 +517,6 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
>    *	    threads that might be modifying SPTEs.
>    *
>    * Handle bookkeeping that might result from the modification of a SPTE.
> - * This function must be called for all TDP SPTE modifications.
>    */
>   static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
>   				  u64 old_spte, u64 new_spte, int level,
> @@ -1689,8 +1688,10 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
>   							iter.old_spte, dbit,
>   							iter.level);
>   
> -		__handle_changed_spte(kvm, iter.as_id, iter.gfn, iter.old_spte,
> -				      iter.old_spte & ~dbit, iter.level, false);
> +		trace_kvm_tdp_mmu_spte_changed(iter.as_id, iter.gfn, iter.level,

Here the first parameter "kvm" is no longer used in this context.

Please confirm: for clear_dirty_pt_masked(), should the "struct kvm *kvm"
parameter be dropped from the list of incoming parameters?

> +					       iter.old_spte,
> +					       iter.old_spte & ~dbit);
> +		kvm_set_pfn_dirty(spte_to_pfn(iter.old_spte));
>   	}
>   
>   	rcu_read_unlock();
  
Sean Christopherson June 26, 2023, 9:37 p.m. UTC | #2
On Sun, Jun 25, 2023, Like Xu wrote:
> On 22/3/2023 6:00 am, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index 950c5d23ecee..467931c43968 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -517,7 +517,6 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
> >    *	    threads that might be modifying SPTEs.
> >    *
> >    * Handle bookkeeping that might result from the modification of a SPTE.
> > - * This function must be called for all TDP SPTE modifications.
> >    */
> >   static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
> >   				  u64 old_spte, u64 new_spte, int level,
> > @@ -1689,8 +1688,10 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
> >   							iter.old_spte, dbit,
> >   							iter.level);
> > -		__handle_changed_spte(kvm, iter.as_id, iter.gfn, iter.old_spte,
> > -				      iter.old_spte & ~dbit, iter.level, false);
> > +		trace_kvm_tdp_mmu_spte_changed(iter.as_id, iter.gfn, iter.level,
> 
> Here the first parameter "kvm" is no longer used in this context.
> 
> Please confirm: for clear_dirty_pt_masked(), should the "struct kvm *kvm"
> parameter be dropped from the list of incoming parameters?

Hmm, there's only one caller, so keeping @kvm around "just in case" probably
doesn't make sense; adding it back so that we could do KVM_BUG_ON() in the
future wouldn't require much churn.
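
Purely as a hypothetical illustration (not part of this patch or the
suggestion below), the kind of sanity check that might want @kvm back some
day could be something like the following in the loop body, using the
existing KVM_BUG_ON() and is_shadow_present_pte() helpers:

		KVM_BUG_ON(!is_shadow_present_pte(iter.old_spte), kvm);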

That said, I'm tempted to move the lockdep so that it's more obvious why it's safe
for clear_dirty_pt_masked() to use the non-atomic (for non-volatile SPTEs)
tdp_mmu_clear_spte_bits() helper.  for_each_tdp_mmu_root() does its own lockdep,
so the only "loss" in lockdep coverage is if the list is completely empty.

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 512163d52194..0b4f03bef70e 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1600,6 +1600,8 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
                                                   shadow_dirty_mask;
        struct tdp_iter iter;
 
+       lockdep_assert_held_write(&kvm->mmu_lock);
+
        rcu_read_lock();
 
        tdp_root_for_each_leaf_pte(iter, root, gfn + __ffs(mask),
@@ -1646,7 +1648,6 @@ void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *kvm,
 {
        struct kvm_mmu_page *root;
 
-       lockdep_assert_held_write(&kvm->mmu_lock);
        for_each_tdp_mmu_root(kvm, root, slot->as_id)
                clear_dirty_pt_masked(kvm, root, gfn, mask, wrprot);
 }
  

Patch

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 950c5d23ecee..467931c43968 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -517,7 +517,6 @@  static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
  *	    threads that might be modifying SPTEs.
  *
  * Handle bookkeeping that might result from the modification of a SPTE.
- * This function must be called for all TDP SPTE modifications.
  */
 static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 				  u64 old_spte, u64 new_spte, int level,
@@ -1689,8 +1688,10 @@  static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
 							iter.old_spte, dbit,
 							iter.level);
 
-		__handle_changed_spte(kvm, iter.as_id, iter.gfn, iter.old_spte,
-				      iter.old_spte & ~dbit, iter.level, false);
+		trace_kvm_tdp_mmu_spte_changed(iter.as_id, iter.gfn, iter.level,
+					       iter.old_spte,
+					       iter.old_spte & ~dbit);
+		kvm_set_pfn_dirty(spte_to_pfn(iter.old_spte));
 	}
 
 	rcu_read_unlock();