Message ID | 20240115125707.1183-19-paul@xen.org |
---|---|
State | New |
From | Paul Durrant <paul@xen.org> |
Subject | [PATCH v12 18/20] KVM: pfncache: check the need for invalidation under read lock first |
Date | Mon, 15 Jan 2024 12:57:05 +0000 |
Series | KVM: xen: update shared_info and vcpu_info handling |
Commit Message
Paul Durrant
Jan. 15, 2024, 12:57 p.m. UTC
From: Paul Durrant <pdurrant@amazon.com>

Taking a write lock on a pfncache will be disruptive if the cache is
heavily used (which only requires a read lock). Hence, in the MMU notifier
callback, take read locks on caches to check for a match; only taking a
write lock to actually perform an invalidation (after another check).

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
---
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>

v10:
 - New in this version.
---
 virt/kvm/pfncache.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)
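The change described above is a double-checked, optimistic locking pattern: test the condition under the shared (read) lock, and only if it matches take the exclusive (write) lock, repeating the test because the cache may have changed while no lock was held. Below is a minimal userspace sketch of the same pattern using POSIX rwlocks rather than the kernel's rwlock_t; the struct and function names are illustrative assumptions, not KVM code.

```c
#include <pthread.h>
#include <stdbool.h>

struct cache {
	pthread_rwlock_t lock;
	bool valid;
	unsigned long uhva;	/* start address of the cached mapping */
};

/* Returns true if the cache was invalidated for the range [start, end). */
static bool maybe_invalidate(struct cache *c, unsigned long start,
			     unsigned long end)
{
	bool invalidated = false;

	/* Cheap pre-check under the shared (read) lock. */
	pthread_rwlock_rdlock(&c->lock);
	bool match = c->valid && c->uhva >= start && c->uhva < end;
	pthread_rwlock_unlock(&c->lock);

	if (!match)
		return false;	/* common case: no writer contention at all */

	/*
	 * The cache may have been refreshed or invalidated between dropping
	 * the read lock and taking the write lock, so re-check before acting.
	 */
	pthread_rwlock_wrlock(&c->lock);
	if (c->valid && c->uhva >= start && c->uhva < end) {
		c->valid = false;
		invalidated = true;
	}
	pthread_rwlock_unlock(&c->lock);

	return invalidated;
}
```

The re-check under the write lock is what keeps the optimization safe: dropping the read lock before acquiring the write lock opens a window in which another thread may refresh or invalidate the cache.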
Comments
On Mon, Jan 15, 2024, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@amazon.com>
>
> Taking a write lock on a pfncache will be disruptive if the cache is

*Unnecessarily* taking a write lock. Please save readers a bit of brain power
and explain that this is beneficial when there are _unrelated_ invalidation.

> heavily used (which only requires a read lock). Hence, in the MMU notifier
> callback, take read locks on caches to check for a match; only taking a
> write lock to actually perform an invalidation (after another check).

This doesn't have any dependency on this series, does it? I.e. this should be
posted separately, and preferably with some performance data. Not having data
isn't a sticking point, but it would be nice to verify that this isn't a
pointless optimization.
On Tue, 2024-02-06 at 20:22 -0800, Sean Christopherson wrote:
> On Mon, Jan 15, 2024, Paul Durrant wrote:
> > From: Paul Durrant <pdurrant@amazon.com>
> >
> > Taking a write lock on a pfncache will be disruptive if the cache is
>
> *Unnecessarily* taking a write lock.

No. Taking a write lock will be disrupting.

Unnecessarily taking a write lock will be unnecessarily disrupting.

Taking a write lock on a Thursday will be disrupting on a Thursday.

But the key is that if the cache is heavily used, the user gets disrupted.

> Please save readers a bit of brain power
> and explain that this is beneficial when there are _unrelated_ invalidation.

I don't understand what you're saying there. Paul's sentence did have an
implicit "...so do that less then", but that didn't take much brain power
to infer.

> > heavily used (which only requires a read lock). Hence, in the MMU notifier
> > callback, take read locks on caches to check for a match; only taking a
> > write lock to actually perform an invalidation (after another check).
>
> This doesn't have any dependency on this series, does it? I.e. this should be
> posted separately, and preferably with some performance data. Not having data
> isn't a sticking point, but it would be nice to verify that this isn't a
> pointless optimization.

No fundamental dependency, no. But it was triggered by the previous patch,
which makes kvm_xen_set_evtchn_fast() use read_trylock() and makes it take
the slow path when there's contention. It lives here just fine as part of
the series.
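For readers following the cross-reference to the previous patch: the fast-path behaviour David describes is a read-side trylock that simply gives up on contention and lets the caller fall back to a slower deferred path. A rough, self-contained sketch of that shape in userspace C with POSIX rwlocks follows; the names are placeholders, not the actual kvm_xen_set_evtchn_fast() code.

```c
#include <pthread.h>
#include <stdbool.h>

struct event_cache {
	pthread_rwlock_t lock;
	bool valid;		/* whether the cached mapping is usable */
};

/*
 * Fast path: only proceed if the read lock can be taken without blocking.
 * If a writer (e.g. an invalidation) currently holds the lock, return false
 * so the caller can queue the work for a slow path instead of waiting here.
 */
static bool try_deliver_fast(struct event_cache *c)
{
	if (pthread_rwlock_tryrdlock(&c->lock) != 0)
		return false;	/* contended: defer to the slow path */

	bool delivered = c->valid;	/* use the mapping only if still valid */
	/* ... deliver the event via the cached mapping here ... */
	pthread_rwlock_unlock(&c->lock);
	return delivered;
}
```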
On Tue, Feb 06, 2024, David Woodhouse wrote:
> On Tue, 2024-02-06 at 20:22 -0800, Sean Christopherson wrote:
> > On Mon, Jan 15, 2024, Paul Durrant wrote:
> > > From: Paul Durrant <pdurrant@amazon.com>
> > >
> > > Taking a write lock on a pfncache will be disruptive if the cache is
> >
> > *Unnecessarily* taking a write lock.
>
> No. Taking a write lock will be disrupting.
>
> Unnecessarily taking a write lock will be unnecessarily disrupting.
>
> Taking a write lock on a Thursday will be disrupting on a Thursday.
>
> But the key is that if the cache is heavily used, the user gets
> disrupted.

If the invalidation is relevant, then this code is taking gpc->lock for write
no matter what. The purpose of the changelog is to explain _why_ a patch adds
value.

> > Please save readers a bit of brain power
> > and explain that this is beneficial when there are _unrelated_ invalidation.
>
> I don't understand what you're saying there. Paul's sentence did have
> an implicit "...so do that less then", but that didn't take much brain
> power to infer.

I'm saying this:

  When processing mmu_notifier invalidations for gpc caches, pre-check for
  overlap with the invalidation event while holding gpc->lock for read, and
  only take gpc->lock for write if the cache needs to be invalidated. Doing
  a pre-check without taking gpc->lock for write avoids unnecessarily
  contending the lock for unrelated invalidations, which is very beneficial
  for caches that are heavily used (but rarely subjected to mmu_notifier
  invalidations).

is much friendlier to readers than this:

  Taking a write lock on a pfncache will be disruptive if the cache is
  heavily used (which only requires a read lock). Hence, in the MMU notifier
  callback, take read locks on caches to check for a match; only taking a
  write lock to actually perform an invalidation (after another check).

Is it too much hand-holding, and bordering on stating the obvious? Maybe.
But (a) a lot of people that read mailing lists and KVM code are *not* kernel
experts, and (b) a changelog is written _once_, and read hundreds if not
thousands of times. If we can save each reader even a few seconds, then
taking an extra minute or two to write a more verbose changelog is a net win.
On Tue, 2024-02-06 at 20:47 -0800, Sean Christopherson wrote:
>
> I'm saying this:
>
>   When processing mmu_notifier invalidations for gpc caches, pre-check for
>   overlap with the invalidation event while holding gpc->lock for read, and
>   only take gpc->lock for write if the cache needs to be invalidated. Doing
>   a pre-check without taking gpc->lock for write avoids unnecessarily
>   contending the lock for unrelated invalidations, which is very beneficial
>   for caches that are heavily used (but rarely subjected to mmu_notifier
>   invalidations).
>
> is much friendlier to readers than this:
>
>   Taking a write lock on a pfncache will be disruptive if the cache is
>   heavily used (which only requires a read lock). Hence, in the MMU notifier
>   callback, take read locks on caches to check for a match; only taking a
>   write lock to actually perform an invalidation (after another check).

That's a somewhat subjective observation. I actually find the latter to
be far more succinct and obvious.

Actually... maybe I find yours harder because it isn't actually stating
the situation as I understand it. You said "unrelated invalidation" in
your first email, and "overlap with the invalidation event" in this
one... neither of which makes sense to me because there is no *other*
invalidation here. We're only talking about the MMU notifier gratuitously
taking the write lock on a GPC that it *isn't* going to invalidate (the
common case), and that disrupting users which are trying to take the read
lock on that GPC.
On Tue, Feb 06, 2024, David Woodhouse wrote:
> On Tue, 2024-02-06 at 20:47 -0800, Sean Christopherson wrote:
> >
> > I'm saying this:
> >
> >   When processing mmu_notifier invalidations for gpc caches, pre-check for
> >   overlap with the invalidation event while holding gpc->lock for read, and
> >   only take gpc->lock for write if the cache needs to be invalidated. Doing
> >   a pre-check without taking gpc->lock for write avoids unnecessarily
> >   contending the lock for unrelated invalidations, which is very beneficial
> >   for caches that are heavily used (but rarely subjected to mmu_notifier
> >   invalidations).
> >
> > is much friendlier to readers than this:
> >
> >   Taking a write lock on a pfncache will be disruptive if the cache is
> >   heavily used (which only requires a read lock). Hence, in the MMU notifier
> >   callback, take read locks on caches to check for a match; only taking a
> >   write lock to actually perform an invalidation (after another check).
>
> That's a somewhat subjective observation. I actually find the latter to
> be far more succinct and obvious.
>
> Actually... maybe I find yours harder because it isn't actually stating
> the situation as I understand it. You said "unrelated invalidation" in
> your first email, and "overlap with the invalidation event" in this
> one... neither of which makes sense to me because there is no *other*
> invalidation here.

I am referring to the "mmu_notifier invalidation event". While a particular
GPC may not be affected by the invalidation, it's entirely possible that a
different GPC and/or some chunk of guest memory does need to be
invalidated/zapped.

> We're only talking about the MMU notifier gratuitously taking the write

It's not "the MMU notifier" though, it's KVM that unnecessarily takes a lock.
I know I'm being somewhat pedantic, but the distinction does matter. E.g.
with guest_memfd, there will be invalidations that get routed through this
code, but that do not originate in the mmu_notifier. And I think it's
important to make it clear to readers that an mmu_notifier really just is a
notification from the primary MMU, albeit a notification that comes with a
rather strict contract.

> lock on a GPC that it *isn't* going to invalidate (the common case),
> and that disrupting users which are trying to take the read lock on
> that GPC.
```diff
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index ae822bff812f..70394d7c9a38 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -29,14 +29,30 @@ void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, unsigned long start,
 
 	spin_lock(&kvm->gpc_lock);
 	list_for_each_entry(gpc, &kvm->gpc_list, list) {
-		write_lock_irq(&gpc->lock);
+		read_lock_irq(&gpc->lock);
 
 		/* Only a single page so no need to care about length */
 		if (gpc->valid && !is_error_noslot_pfn(gpc->pfn) &&
 		    gpc->uhva >= start && gpc->uhva < end) {
-			gpc->valid = false;
+			read_unlock_irq(&gpc->lock);
+
+			/*
+			 * There is a small window here where the cache could
+			 * be modified, and invalidation would no longer be
+			 * necessary. Hence check again whether invalidation
+			 * is still necessary once the write lock has been
+			 * acquired.
+			 */
+
+			write_lock_irq(&gpc->lock);
+			if (gpc->valid && !is_error_noslot_pfn(gpc->pfn) &&
+			    gpc->uhva >= start && gpc->uhva < end)
+				gpc->valid = false;
+			write_unlock_irq(&gpc->lock);
+			continue;
 		}
-		write_unlock_irq(&gpc->lock);
+
+		read_unlock_irq(&gpc->lock);
 	}
 	spin_unlock(&kvm->gpc_lock);
 }
```