Message ID | 20221228110410.1682852-2-pbonzini@redhat.com |
---|---|
State | New |
Series | Documentation: kvm: clarify SRCU locking order |
Commit Message
Paolo Bonzini
Dec. 28, 2022, 11:04 a.m. UTC
Currently only the locking order of SRCU vs kvm->slots_arch_lock
and kvm->slots_lock is documented. Extend this to kvm->lock
since Xen emulation got it terribly wrong.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
Documentation/virt/kvm/locking.rst | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
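Before the review discussion, a minimal, hypothetical sketch (kernel-style C; the lock and SRCU names are stand-ins, not actual KVM symbols) of the deadlock shape the extended documentation is meant to rule out: one path takes a mutex inside an SRCU read-side critical section while another calls synchronize_srcu() with that same mutex held.

#include <linux/mutex.h>
#include <linux/srcu.h>

/* Hypothetical stand-ins for kvm->lock and kvm->srcu. */
static DEFINE_MUTEX(example_lock);
DEFINE_STATIC_SRCU(example_srcu);

/*
 * Path A: the mutex is acquired inside the SRCU read-side critical
 * section (the shape the Xen emulation ended up with for kvm->lock).
 */
static void path_a(void)
{
	int idx = srcu_read_lock(&example_srcu);

	mutex_lock(&example_lock);
	/* ... touch state protected by the mutex ... */
	mutex_unlock(&example_lock);

	srcu_read_unlock(&example_srcu, idx);
}

/*
 * Path B: synchronize_srcu() is called with the mutex held.  If path A
 * is blocked on the mutex from within its read-side critical section,
 * the grace period can never end and both paths deadlock.
 */
static void path_b(void)
{
	mutex_lock(&example_lock);
	synchronize_srcu(&example_srcu);
	mutex_unlock(&example_lock);
}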
Comments
On Wed, Dec 28, 2022, Paolo Bonzini wrote:
> Currently only the locking order of SRCU vs kvm->slots_arch_lock
> and kvm->slots_lock is documented. Extend this to kvm->lock
> since Xen emulation got it terribly wrong.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  Documentation/virt/kvm/locking.rst | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
> index 845a561629f1..a3ca76f9be75 100644
> --- a/Documentation/virt/kvm/locking.rst
> +++ b/Documentation/virt/kvm/locking.rst
> @@ -16,17 +16,26 @@ The acquisition orders for mutexes are as follows:
>  - kvm->slots_lock is taken outside kvm->irq_lock, though acquiring
>    them together is quite rare.
>
> -- Unlike kvm->slots_lock, kvm->slots_arch_lock is released before
> -  synchronize_srcu(&kvm->srcu).  Therefore kvm->slots_arch_lock
> -  can be taken inside a kvm->srcu read-side critical section,
> -  while kvm->slots_lock cannot.
> -
>  - kvm->mn_active_invalidate_count ensures that pairs of
>    invalidate_range_start() and invalidate_range_end() callbacks
>    use the same memslots array.  kvm->slots_lock and kvm->slots_arch_lock
>    are taken on the waiting side in install_new_memslots, so MMU notifiers
>    must not take either kvm->slots_lock or kvm->slots_arch_lock.
>
> +For SRCU:
> +
> +- ``synchronize_srcu(&kvm->srcu)`` is called _inside_
> +  the kvm->slots_lock critical section, therefore kvm->slots_lock
> +  cannot be taken inside a kvm->srcu read-side critical section.
> +  Instead, kvm->slots_arch_lock is released before the call
> +  to ``synchronize_srcu()`` and _can_ be taken inside a
> +  kvm->srcu read-side critical section.
> +
> +- kvm->lock is taken inside kvm->srcu, therefore

Prior to the recent Xen change, is this actually true?  There are many instances
where kvm->srcu is taken inside kvm->lock, but I can't find any existing cases
where the reverse is true.  Logically, it makes sense to take kvm->lock first
since kvm->srcu can be taken deep in helpers, e.g. for accessing guest memory.
It's also more consistent to take kvm->lock first since kvm->srcu is taken inside
vcpu->mutex, and vcpu->mutex is taken inside kvm->lock.

Disallowing synchronize_srcu(kvm->srcu) inside kvm->lock isn't problematic per se,
but it's going to result in a weird set of rules because synchronize_srcu() can,
and is, called while holding a variety of other locks.

In other words, IMO taking kvm->srcu outside of kvm->lock in the Xen code is the
real bug.

> +  ``synchronize_srcu(&kvm->srcu)`` cannot be called inside
> +  a kvm->lock critical section.  If you cannot delay the
> +  call until after kvm->lock is released, use ``call_srcu``.
> +
>  On x86:
>
>  - vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock
> --
> 2.31.1
>
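The nesting Sean argues for above, with the sleeping locks first and the SRCU read side innermost, can be sketched as follows (hypothetical stand-in names, not KVM code):

#include <linux/mutex.h>
#include <linux/srcu.h>

static DEFINE_MUTEX(vm_lock);      /* analogue of kvm->lock   */
static DEFINE_MUTEX(vcpu_mutex);   /* analogue of vcpu->mutex */
DEFINE_STATIC_SRCU(vm_srcu);       /* analogue of kvm->srcu   */

static void update_vcpu_state(void)
{
	int idx;

	mutex_lock(&vm_lock);      /* outermost: VM-wide state */
	mutex_lock(&vcpu_mutex);   /* then the per-vCPU mutex  */

	/*
	 * Innermost: the SRCU read side, typically entered deep in a
	 * helper that needs a stable memslots array to access guest
	 * memory.  Because no reader blocks on the mutexes from inside
	 * its read-side section, synchronize_srcu() remains safe to
	 * call while either mutex is held.
	 */
	idx = srcu_read_lock(&vm_srcu);
	/* ... read or write guest memory ... */
	srcu_read_unlock(&vm_srcu, idx);

	mutex_unlock(&vcpu_mutex);
	mutex_unlock(&vm_lock);
}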
On Tue, Jan 03, 2023, Sean Christopherson wrote:
> On Wed, Dec 28, 2022, Paolo Bonzini wrote:
> > Currently only the locking order of SRCU vs kvm->slots_arch_lock
> > and kvm->slots_lock is documented. Extend this to kvm->lock
> > since Xen emulation got it terribly wrong.
> >
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >  Documentation/virt/kvm/locking.rst | 19 ++++++++++++++-----
> >  1 file changed, 14 insertions(+), 5 deletions(-)
> >
> > diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
> > index 845a561629f1..a3ca76f9be75 100644
> > --- a/Documentation/virt/kvm/locking.rst
> > +++ b/Documentation/virt/kvm/locking.rst
> > @@ -16,17 +16,26 @@ The acquisition orders for mutexes are as follows:
> >  - kvm->slots_lock is taken outside kvm->irq_lock, though acquiring
> >    them together is quite rare.
> >
> > -- Unlike kvm->slots_lock, kvm->slots_arch_lock is released before
> > -  synchronize_srcu(&kvm->srcu).  Therefore kvm->slots_arch_lock
> > -  can be taken inside a kvm->srcu read-side critical section,
> > -  while kvm->slots_lock cannot.
> > -
> >  - kvm->mn_active_invalidate_count ensures that pairs of
> >    invalidate_range_start() and invalidate_range_end() callbacks
> >    use the same memslots array.  kvm->slots_lock and kvm->slots_arch_lock
> >    are taken on the waiting side in install_new_memslots, so MMU notifiers
> >    must not take either kvm->slots_lock or kvm->slots_arch_lock.
> >
> > +For SRCU:
> > +
> > +- ``synchronize_srcu(&kvm->srcu)`` is called _inside_
> > +  the kvm->slots_lock critical section, therefore kvm->slots_lock
> > +  cannot be taken inside a kvm->srcu read-side critical section.
> > +  Instead, kvm->slots_arch_lock is released before the call
> > +  to ``synchronize_srcu()`` and _can_ be taken inside a
> > +  kvm->srcu read-side critical section.
> > +
> > +- kvm->lock is taken inside kvm->srcu, therefore
>
> Prior to the recent Xen change, is this actually true?

I was thinking of a different change, but v5.19 is still kinda recent, so I'll
count it.  Better to be lucky than good :-)

Commit 2fd6df2f2b47 ("KVM: x86/xen: intercept EVTCHNOP_send from guests") introduced
the only case I can find where kvm->srcu is taken inside kvm->lock.

> There are many instances where kvm->srcu is taken inside kvm->lock, but I
> can't find any existing cases where the reverse is true.  Logically, it makes
> sense to take kvm->lock first since kvm->srcu can be taken deep in helpers,
> e.g. for accessing guest memory.  It's also more consistent to take kvm->lock
> first since kvm->srcu is taken inside vcpu->mutex, and vcpu->mutex is taken
> inside kvm->lock.
>
> Disallowing synchronize_srcu(kvm->srcu) inside kvm->lock isn't problematic per se,
> but it's going to result in a weird set of rules because synchronize_srcu() can,
> and is, called while holding a variety of other locks.
>
> In other words, IMO taking kvm->srcu outside of kvm->lock in the Xen code is the
> real bug.

I'm doubling down on this.  Taking kvm->srcu outside of kvm->lock is all kinds of
sketchy, and likely indicates a larger problem.  The aforementioned commit that
introduced the problematic kvm->srcu vs. kvm->lock also blatantly violates ordering
between kvm->lock and vcpu->mutex.  Details in the link[*].

The vast majority of flows that take kvm->srcu will already hold a lock of some
kind, otherwise the task can't safely dereference any VM/vCPU/device data and thus
has no reason to acquire kvm->srcu.  E.g. taking kvm->srcu to read guest memory
is nonsensical without a stable guest physical address to work with.

There are exceptions, e.g. evtchn_set_fn() and maybe some ioctls(), but in most
cases, taking kvm->lock inside kvm->srcu is just asking for problems.

[*] https://lore.kernel.org/all/Y7dN0Negds7XUbvI@google.com
On Thu, 2023-01-05 at 22:38 +0000, Sean Christopherson wrote:
> > > +- kvm->lock is taken inside kvm->srcu, therefore
> >
> > Prior to the recent Xen change, is this actually true?
>
> I was thinking of a different change, but v5.19 is still kinda recent, so I'll
> count it.  Better to be lucky than good :-)
>
> Commit 2fd6df2f2b47 ("KVM: x86/xen: intercept EVTCHNOP_send from guests") introduced
> the only case I can find where kvm->srcu is taken inside kvm->lock.
>
> > There are many instances where kvm->srcu is taken inside kvm->lock, but I
> > can't find any existing cases where the reverse is true.  Logically, it makes
> > sense to take kvm->lock first since kvm->srcu can be taken deep in helpers,
> > e.g. for accessing guest memory.  It's also more consistent to take kvm->lock
> > first since kvm->srcu is taken inside vcpu->mutex, and vcpu->mutex is taken
> > inside kvm->lock.
> >
> > Disallowing synchronize_srcu(kvm->srcu) inside kvm->lock isn't problematic per se,
> > but it's going to result in a weird set of rules because synchronize_srcu() can,
> > and is, called while holding a variety of other locks.
> >
> > In other words, IMO taking kvm->srcu outside of kvm->lock in the Xen code is the
> > real bug.
>
> I'm doubling down on this.  Taking kvm->srcu outside of kvm->lock is all kinds of
> sketchy, and likely indicates a larger problem.  The aforementioned commit that
> introduced the problematic kvm->srcu vs. kvm->lock also blatantly violates ordering
> between kvm->lock and vcpu->mutex.  Details in the link[*].
>
> The vast majority of flows that take kvm->srcu will already hold a lock of some
> kind, otherwise the task can't safely dereference any VM/vCPU/device data and thus
> has no reason to acquire kvm->srcu.  E.g. taking kvm->srcu to read guest memory
> is nonsensical without a stable guest physical address to work with.
>
> There are exceptions, e.g. evtchn_set_fn() and maybe some ioctls(), but in most
> cases, taking kvm->lock inside kvm->srcu is just asking for problems.
>
> [*] https://lore.kernel.org/all/Y7dN0Negds7XUbvI@google.com

So... If we apply the patch below, then the Xen code never takes kvm->lock
inside kvm->srcu.  (Never takes kvm->lock at all, in fact; I'm just rereading
the manual search/replace to make sure the change is valid in all cases.)
It does take kvm->srcu *without* holding kvm->lock, just not "outside" it.

I think we can probably revert commit a79b53aaaa ("KVM: x86: fix deadlock for
KVM_XEN_EVTCHN_RESET") after doing this too?

From: David Woodhouse <dwmw@amazon.co.uk>
Date: Wed, 11 Jan 2023 10:04:38 +0000
Subject: [PATCH] KVM: x86/xen: Avoid deadlock by adding kvm->arch.xen.xen_lock leaf node lock

In commit 14243b387137a ("KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN and
event channel delivery") the clever version of me left some helpful notes
for those who would come after him:

	/*
	 * For the irqfd workqueue, using the main kvm->lock mutex is
	 * fine since this function is invoked from kvm_set_irq() with
	 * no other lock held, no srcu. In future if it will be called
	 * directly from a vCPU thread (e.g. on hypercall for an IPI)
	 * then it may need to switch to using a leaf-node mutex for
	 * serializing the shared_info mapping.
	 */
	mutex_lock(&kvm->lock);

In commit 2fd6df2f2b47 ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
the other version of me ran straight past that comment without reading it,
and introduced a potential deadlock by taking vcpu->mutex and kvm->lock in
the wrong order.
Solve this as originally suggested, by adding a leaf-node lock in the Xen
state rather than using kvm->lock for it.

Fixes: 2fd6df2f2b47 ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/xen.c              | 65 +++++++++++++++------------------
 2 files changed, 30 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2f5bf581d00a..4ef0143fa682 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1115,6 +1115,7 @@ struct msr_bitmap_range {
 
 /* Xen emulation context */
 struct kvm_xen {
+	struct mutex xen_lock;
 	u32 xen_version;
 	bool long_mode;
 	bool runstate_update_flag;
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index c444948ab1ac..713241808a3d 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -604,26 +604,26 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 		if (!IS_ENABLED(CONFIG_64BIT) && data->u.long_mode) {
 			r = -EINVAL;
 		} else {
-			mutex_lock(&kvm->lock);
+			mutex_lock(&kvm->arch.xen.xen_lock);
 			kvm->arch.xen.long_mode = !!data->u.long_mode;
-			mutex_unlock(&kvm->lock);
+			mutex_unlock(&kvm->arch.xen.xen_lock);
 			r = 0;
 		}
 		break;
 
 	case KVM_XEN_ATTR_TYPE_SHARED_INFO:
-		mutex_lock(&kvm->lock);
+		mutex_lock(&kvm->arch.xen.xen_lock);
 		r = kvm_xen_shared_info_init(kvm, data->u.shared_info.gfn);
-		mutex_unlock(&kvm->lock);
+		mutex_unlock(&kvm->arch.xen.xen_lock);
 		break;
 
 	case KVM_XEN_ATTR_TYPE_UPCALL_VECTOR:
 		if (data->u.vector && data->u.vector < 0x10)
 			r = -EINVAL;
 		else {
-			mutex_lock(&kvm->lock);
+			mutex_lock(&kvm->arch.xen.xen_lock);
 			kvm->arch.xen.upcall_vector = data->u.vector;
-			mutex_unlock(&kvm->lock);
+			mutex_unlock(&kvm->arch.xen.xen_lock);
 			r = 0;
 		}
 		break;
@@ -633,9 +633,9 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 		break;
 
 	case KVM_XEN_ATTR_TYPE_XEN_VERSION:
-		mutex_lock(&kvm->lock);
+		mutex_lock(&kvm->arch.xen.xen_lock);
 		kvm->arch.xen.xen_version = data->u.xen_version;
-		mutex_unlock(&kvm->lock);
+		mutex_unlock(&kvm->arch.xen.xen_lock);
 		r = 0;
 		break;
 
@@ -644,9 +644,9 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 			r = -EOPNOTSUPP;
 			break;
 		}
-		mutex_lock(&kvm->lock);
+		mutex_lock(&kvm->arch.xen.xen_lock);
 		kvm->arch.xen.runstate_update_flag = !!data->u.runstate_update_flag;
-		mutex_unlock(&kvm->lock);
+		mutex_unlock(&kvm->arch.xen.xen_lock);
 		r = 0;
 		break;
 
@@ -661,7 +661,7 @@ int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 {
 	int r = -ENOENT;
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.xen.xen_lock);
 
 	switch (data->type) {
 	case KVM_XEN_ATTR_TYPE_LONG_MODE:
@@ -700,7 +700,7 @@ int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 		break;
 	}
 
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.xen.xen_lock);
 	return r;
 }
 
@@ -708,7 +708,7 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
 {
 	int idx, r = -ENOENT;
 
-	mutex_lock(&vcpu->kvm->lock);
+	mutex_lock(&vcpu->kvm->arch.xen.xen_lock);
 	idx = srcu_read_lock(&vcpu->kvm->srcu);
 
 	switch (data->type) {
@@ -936,7 +936,7 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
 	}
 
 	srcu_read_unlock(&vcpu->kvm->srcu, idx);
-	mutex_unlock(&vcpu->kvm->lock);
+	mutex_unlock(&vcpu->kvm->arch.xen.xen_lock);
 	return r;
 }
 
@@ -944,7 +944,7 @@ int kvm_xen_vcpu_get_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
 {
 	int r = -ENOENT;
 
-	mutex_lock(&vcpu->kvm->lock);
+	mutex_lock(&vcpu->kvm->arch.xen.xen_lock);
 
 	switch (data->type) {
 	case KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO:
@@ -1027,7 +1027,7 @@ int kvm_xen_vcpu_get_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
 		break;
 	}
 
-	mutex_unlock(&vcpu->kvm->lock);
+	mutex_unlock(&vcpu->kvm->arch.xen.xen_lock);
 	return r;
 }
 
@@ -1120,7 +1120,7 @@ int kvm_xen_hvm_config(struct kvm *kvm, struct kvm_xen_hvm_config *xhc)
 		     xhc->blob_size_32 || xhc->blob_size_64))
 		return -EINVAL;
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.xen.xen_lock);
 
 	if (xhc->msr && !kvm->arch.xen_hvm_config.msr)
 		static_branch_inc(&kvm_xen_enabled.key);
@@ -1129,7 +1129,7 @@ int kvm_xen_hvm_config(struct kvm *kvm, struct kvm_xen_hvm_config *xhc)
 
 	memcpy(&kvm->arch.xen_hvm_config, xhc, sizeof(*xhc));
 
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.xen.xen_lock);
 	return 0;
 }
 
@@ -1672,15 +1672,7 @@ static int kvm_xen_set_evtchn(struct kvm_xen_evtchn *xe, struct kvm *kvm)
 		mm_borrowed = true;
 	}
 
-	/*
-	 * For the irqfd workqueue, using the main kvm->lock mutex is
-	 * fine since this function is invoked from kvm_set_irq() with
-	 * no other lock held, no srcu. In future if it will be called
-	 * directly from a vCPU thread (e.g. on hypercall for an IPI)
-	 * then it may need to switch to using a leaf-node mutex for
-	 * serializing the shared_info mapping.
-	 */
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.xen.xen_lock);
 
 	/*
 	 * It is theoretically possible for the page to be unmapped
@@ -1709,7 +1701,7 @@ static int kvm_xen_set_evtchn(struct kvm_xen_evtchn *xe, struct kvm *kvm)
 		srcu_read_unlock(&kvm->srcu, idx);
 	} while(!rc);
 
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.xen.xen_lock);
 
 	if (mm_borrowed)
 		kthread_unuse_mm(kvm->mm);
@@ -1825,7 +1817,7 @@ static int kvm_xen_eventfd_update(struct kvm *kvm,
 	int ret;
 
 	/* Protect writes to evtchnfd as well as the idr lookup. */
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.xen.xen_lock);
 	evtchnfd = idr_find(&kvm->arch.xen.evtchn_ports, port);
 
 	ret = -ENOENT;
@@ -1856,7 +1848,7 @@ static int kvm_xen_eventfd_update(struct kvm *kvm,
 	}
 	ret = 0;
 out_unlock:
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.xen.xen_lock);
 	return ret;
 }
 
@@ -1919,10 +1911,10 @@ static int kvm_xen_eventfd_assign(struct kvm *kvm,
 		evtchnfd->deliver.port.priority = data->u.evtchn.deliver.port.priority;
 	}
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.xen.xen_lock);
 	ret = idr_alloc(&kvm->arch.xen.evtchn_ports, evtchnfd, port, port + 1,
 			GFP_KERNEL);
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.xen.xen_lock);
 	if (ret >= 0)
 		return 0;
 
@@ -1940,9 +1932,9 @@ static int kvm_xen_eventfd_deassign(struct kvm *kvm, u32 port)
 {
 	struct evtchnfd *evtchnfd;
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.xen.xen_lock);
 	evtchnfd = idr_remove(&kvm->arch.xen.evtchn_ports, port);
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.xen.xen_lock);
 
 	if (!evtchnfd)
 		return -ENOENT;
@@ -1959,7 +1951,7 @@ static int kvm_xen_eventfd_reset(struct kvm *kvm)
 	struct evtchnfd *evtchnfd;
 	int i;
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.xen.xen_lock);
 	idr_for_each_entry(&kvm->arch.xen.evtchn_ports, evtchnfd, i) {
 		idr_remove(&kvm->arch.xen.evtchn_ports, evtchnfd->send_port);
 		synchronize_srcu(&kvm->srcu);
@@ -1967,7 +1959,7 @@ static int kvm_xen_eventfd_reset(struct kvm *kvm)
 		eventfd_ctx_put(evtchnfd->deliver.eventfd.ctx);
 		kfree(evtchnfd);
 	}
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.xen.xen_lock);
 
 	return 0;
 }
 
@@ -2059,6 +2051,7 @@ void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu)
 
 void kvm_xen_init_vm(struct kvm *kvm)
 {
+	mutex_init(&kvm->arch.xen.xen_lock);
 	idr_init(&kvm->arch.xen.evtchn_ports);
 	kvm_gpc_init(&kvm->arch.xen.shinfo_cache, kvm, NULL, KVM_HOST_USES_PFN);
 }
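Condensed to its core, the leaf-node-lock pattern the patch above applies looks roughly like this (hypothetical names, not the actual KVM code): the Xen-like state carries its own mutex under which nothing else is acquired, so paths that already sit inside the kvm->srcu read side never need kvm->lock.

#include <linux/mutex.h>
#include <linux/srcu.h>

DEFINE_STATIC_SRCU(vm_srcu);         /* analogue of kvm->srcu */

/* Hypothetical per-subsystem state with its own leaf-node lock. */
struct xen_like_state {
	struct mutex lock;           /* leaf: no other lock is taken under it */
	unsigned long shared_info_gfn;
};

static void xen_like_init(struct xen_like_state *s)
{
	mutex_init(&s->lock);
}

/*
 * Callers may already hold the SRCU read lock (e.g. a vCPU thread
 * handling a hypercall) or not (e.g. an irqfd workqueue).  Only the
 * leaf lock is taken here, so the VM-wide mutex never nests inside
 * the SRCU read-side critical section.
 */
static void xen_like_deliver(struct xen_like_state *s)
{
	int idx = srcu_read_lock(&vm_srcu);

	mutex_lock(&s->lock);
	/* ... map shared_info and set the event-channel bit ... */
	mutex_unlock(&s->lock);

	srcu_read_unlock(&vm_srcu, idx);
}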
diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index 845a561629f1..a3ca76f9be75 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -16,17 +16,26 @@ The acquisition orders for mutexes are as follows:
 - kvm->slots_lock is taken outside kvm->irq_lock, though acquiring
   them together is quite rare.
 
-- Unlike kvm->slots_lock, kvm->slots_arch_lock is released before
-  synchronize_srcu(&kvm->srcu).  Therefore kvm->slots_arch_lock
-  can be taken inside a kvm->srcu read-side critical section,
-  while kvm->slots_lock cannot.
-
 - kvm->mn_active_invalidate_count ensures that pairs of
   invalidate_range_start() and invalidate_range_end() callbacks
   use the same memslots array.  kvm->slots_lock and kvm->slots_arch_lock
   are taken on the waiting side in install_new_memslots, so MMU notifiers
   must not take either kvm->slots_lock or kvm->slots_arch_lock.
 
+For SRCU:
+
+- ``synchronize_srcu(&kvm->srcu)`` is called _inside_
+  the kvm->slots_lock critical section, therefore kvm->slots_lock
+  cannot be taken inside a kvm->srcu read-side critical section.
+  Instead, kvm->slots_arch_lock is released before the call
+  to ``synchronize_srcu()`` and _can_ be taken inside a
+  kvm->srcu read-side critical section.
+
+- kvm->lock is taken inside kvm->srcu, therefore
+  ``synchronize_srcu(&kvm->srcu)`` cannot be called inside
+  a kvm->lock critical section.  If you cannot delay the
+  call until after kvm->lock is released, use ``call_srcu``.
+
 On x86:
 
 - vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock
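As a final illustration of the documentation text added above, a hedged sketch (stand-in names, not KVM code) of the ``call_srcu`` escape hatch it recommends when the grace-period wait cannot be postponed until the mutex is dropped:

#include <linux/kernel.h>
#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/srcu.h>

static DEFINE_MUTEX(vm_lock);   /* analogue of kvm->lock */
DEFINE_STATIC_SRCU(vm_srcu);    /* analogue of kvm->srcu */

struct stale_object {
	struct rcu_head rcu;
	/* ... payload that SRCU readers may still be using ... */
};

static void free_stale_object(struct rcu_head *rcu)
{
	kfree(container_of(rcu, struct stale_object, rcu));
}

/*
 * synchronize_srcu(&vm_srcu) must not be called here because the mutex
 * is held and readers may take it inside their read-side sections.
 * call_srcu() queues the cleanup to run after the grace period instead,
 * without blocking.
 */
static void retire_object(struct stale_object *obj)
{
	mutex_lock(&vm_lock);
	/* ... unlink obj so new readers can no longer find it ... */
	call_srcu(&vm_srcu, &obj->rcu, free_stale_object);
	mutex_unlock(&vm_lock);
}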