[v12,18/20] KVM: pfncache: check the need for invalidation under read lock first

Message ID 20240115125707.1183-19-paul@xen.org
State New
Series KVM: xen: update shared_info and vcpu_info handling

Commit Message

Paul Durrant Jan. 15, 2024, 12:57 p.m. UTC
  From: Paul Durrant <pdurrant@amazon.com>

Taking a write lock on a pfncache will be disruptive if the cache is
heavily used (which only requires a read lock). Hence, in the MMU notifier
callback, take read locks on caches to check for a match; only taking a
write lock to actually perform an invalidation (after another check).

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
---
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>

v10:
 - New in this version.
---
 virt/kvm/pfncache.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)
  

Comments

Sean Christopherson Feb. 7, 2024, 4:22 a.m. UTC | #1
On Mon, Jan 15, 2024, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@amazon.com>
> 
> Taking a write lock on a pfncache will be disruptive if the cache is

*Unnecessarily* taking a write lock.  Please save readers a bit of brain power
and explain that this is beneficial when there are _unrelated_ invalidations.

> heavily used (which only requires a read lock). Hence, in the MMU notifier
> callback, take read locks on caches to check for a match; only taking a
> write lock to actually perform an invalidation (after another check).

This doesn't have any dependency on this series, does it?  I.e. this should be
posted separately, and preferably with some performance data.  Not having data
isn't a sticking point, but it would be nice to verify that this isn't a
pointless optimization.
  
David Woodhouse Feb. 7, 2024, 4:27 a.m. UTC | #2
On Tue, 2024-02-06 at 20:22 -0800, Sean Christopherson wrote:
> On Mon, Jan 15, 2024, Paul Durrant wrote:
> > From: Paul Durrant <pdurrant@amazon.com>
> > 
> > Taking a write lock on a pfncache will be disruptive if the cache is
> 
> *Unnecessarily* taking a write lock.

No. Taking a write lock will be disrupting.

Unnecessarily taking a write lock will be unnecessarily disrupting.

Taking a write lock on a Thursday will be disrupting on a Thursday.

But the key is that if the cache is heavily used, the user gets
disrupted.


>   Please save readers a bit of brain power
> and explain that this is beneficial when there are _unrelated_ invalidations.

I don't understand what you're saying there. Paul's sentence did have
an implicit "...so do that less then", but that didn't take much brain
power to infer.

> > heavily used (which only requires a read lock). Hence, in the MMU notifier
> > callback, take read locks on caches to check for a match; only taking a
> > write lock to actually perform an invalidation (after another check).
> 
> This doesn't have any dependency on this series, does it?  I.e. this should be
> posted separately, and preferably with some performance data.  Not having data
> isn't a sticking point, but it would be nice to verify that this isn't a
> pointless optimization.

No fundamental dependency, no. But it was triggered by the previous
patch, which makes kvm_xen_set_evtchn_fast() use read_trylock() and
makes it take the slow path when there's contention. It lives here just
fine as part of the series.
  
Sean Christopherson Feb. 7, 2024, 4:47 a.m. UTC | #3
On Tue, Feb 06, 2024, David Woodhouse wrote:
> On Tue, 2024-02-06 at 20:22 -0800, Sean Christopherson wrote:
> > On Mon, Jan 15, 2024, Paul Durrant wrote:
> > > From: Paul Durrant <pdurrant@amazon.com>
> > > 
> > > Taking a write lock on a pfncache will be disruptive if the cache is
> > 
> > *Unnecessarily* taking a write lock.
> 
> No. Taking a write lock will be disrupting.
> 
> Unnecessarily taking a write lock will be unnecessarily disrupting.
> 
> Taking a write lock on a Thursday will be disrupting on a Thursday.
> 
> But the key is that if the cache is heavily used, the user gets
> disrupted.

If the invalidation is relevant, then this code is taking gpc->lock for write no
matter what.  The purpose of the changelog is to explain _why_ a patch adds value.

> >   Please save readers a bit of brain power
> > and explain that this is beneficial when there are _unrelated_ invalidations.
> 
> I don't understand what you're saying there. Paul's sentence did have
> an implicit "...so do that less then", but that didn't take much brain
> power to infer.

I'm saying this:

  When processing mmu_notifier invalidations for gpc caches, pre-check for
  overlap with the invalidation event while holding gpc->lock for read, and
  only take gpc->lock for write if the cache needs to be invalidated.  Doing
  a pre-check without taking gpc->lock for write avoids unnecessarily
  contending the lock for unrelated invalidations, which is very beneficial
  for caches that are heavily used (but rarely subjected to mmu_notifier
  invalidations).

is much friendlier to readers than this:

  Taking a write lock on a pfncache will be disruptive if the cache is
  heavily used (which only requires a read lock). Hence, in the MMU notifier
  callback, take read locks on caches to check for a match; only taking a
  write lock to actually perform an invalidation (after another check).

Is it too much hand-holding, and bordering on stating the obvious?  Maybe.  But
(a) a lot of people that read mailing lists and KVM code are *not* kernel experts,
and (b) a changelog is written _once_, and read hundreds if not thousands of times.

If we can save each reader even a few seconds, then taking an extra minute or two
to write a more verbose changelog is a net win.
  
David Woodhouse Feb. 7, 2024, 4:59 a.m. UTC | #4
On Tue, 2024-02-06 at 20:47 -0800, Sean Christopherson wrote:
> 
> I'm saying this:
> 
>   When processing mmu_notifier invalidations for gpc caches, pre-check for
>   overlap with the invalidation event while holding gpc->lock for read, and
>   only take gpc->lock for write if the cache needs to be invalidated.  Doing
>   a pre-check without taking gpc->lock for write avoids unnecessarily
>   contending the lock for unrelated invalidations, which is very beneficial
>   for caches that are heavily used (but rarely subjected to mmu_notifier
>   invalidations).
> 
> is much friendlier to readers than this:
> 
>   Taking a write lock on a pfncache will be disruptive if the cache is
>   heavily used (which only requires a read lock). Hence, in the MMU notifier
>   callback, take read locks on caches to check for a match; only taking a
>   write lock to actually perform an invalidation (after another check).

That's a somewhat subjective observation. I actually find the latter to
be far more succinct and obvious.

Actually... maybe I find yours harder because it isn't actually stating
the situation as I understand it. You said "unrelated invalidation" in
your first email, and "overlap with the invalidation event" in this
one... neither of which makes sense to me because there is no *other*
invalidation here.

We're only talking about the MMU notifier gratuitously taking the write
lock on a GPC that it *isn't* going to invalidate (the common case),
and that disrupting users which are trying to take the read lock on
that GPC.
  
Sean Christopherson Feb. 7, 2024, 3:10 p.m. UTC | #5
On Tue, Feb 06, 2024, David Woodhouse wrote:
> On Tue, 2024-02-06 at 20:47 -0800, Sean Christopherson wrote:
> > 
> > I'm saying this:
> > 
> >   When processing mmu_notifier invalidations for gpc caches, pre-check for
> >   overlap with the invalidation event while holding gpc->lock for read, and
> >   only take gpc->lock for write if the cache needs to be invalidated.  Doing
> >   a pre-check without taking gpc->lock for write avoids unnecessarily
> >   contending the lock for unrelated invalidations, which is very beneficial
> >   for caches that are heavily used (but rarely subjected to mmu_notifier
> >   invalidations).
> > 
> > is much friendlier to readers than this:
> > 
> >   Taking a write lock on a pfncache will be disruptive if the cache is
> >   heavily used (which only requires a read lock). Hence, in the MMU notifier
> >   callback, take read locks on caches to check for a match; only taking a
> >   write lock to actually perform an invalidation (after another check).
> 
> That's a somewhat subjective observation. I actually find the latter to
> be far more succinct and obvious.
> 
> Actually... maybe I find yours harder because it isn't actually stating
> the situation as I understand it. You said "unrelated invalidation" in
> your first email, and "overlap with the invalidation event" in this
> one... neither of which makes sense to me because there is no *other*
> invalidation here.

I am referring to the "mmu_notifier invalidation event".  While a particular GPC
may not be affected by the invalidation, it's entirely possible that a different
GPC and/or some chunk of guest memory does need to be invalidated/zapped.

> We're only talking about the MMU notifier gratuitously taking the write

It's not "the MMU notifier" though, it's KVM that unnecessarily takes a lock.  I
know I'm being somewhat pedantic, but the distinction does matter.  E.g. with
guest_memfd, there will be invalidations that get routed through this code, but
that do not originate in the mmu_notifier.

And I think it's important to make it clear to readers that an mmu_notifier really
just is a notification from the primary MMU, albeit a notification that comes with
a rather strict contract.

> lock on a GPC that it *isn't* going to invalidate (the common case),
> and that disrupting users which are trying to take the read lock on
> that GPC.
  

Patch

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index ae822bff812f..70394d7c9a38 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -29,14 +29,30 @@  void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, unsigned long start,
 
 	spin_lock(&kvm->gpc_lock);
 	list_for_each_entry(gpc, &kvm->gpc_list, list) {
-		write_lock_irq(&gpc->lock);
+		read_lock_irq(&gpc->lock);
 
 		/* Only a single page so no need to care about length */
 		if (gpc->valid && !is_error_noslot_pfn(gpc->pfn) &&
 		    gpc->uhva >= start && gpc->uhva < end) {
-			gpc->valid = false;
+			read_unlock_irq(&gpc->lock);
+
+			/*
+			 * There is a small window here where the cache could
+			 * be modified, and invalidation would no longer be
+			 * necessary. Hence check again whether invalidation
+			 * is still necessary once the write lock has been
+			 * acquired.
+			 */
+
+			write_lock_irq(&gpc->lock);
+			if (gpc->valid && !is_error_noslot_pfn(gpc->pfn) &&
+			    gpc->uhva >= start && gpc->uhva < end)
+				gpc->valid = false;
+			write_unlock_irq(&gpc->lock);
+			continue;
 		}
-		write_unlock_irq(&gpc->lock);
+
+		read_unlock_irq(&gpc->lock);
 	}
 	spin_unlock(&kvm->gpc_lock);
 }