Message ID | 20230310155955.29652-1-yan.y.zhao@intel.com |
---|---|
State | New |
Headers |
From: Yan Zhao <yan.y.zhao@intel.com>
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: seanjc@google.com, pbonzini@redhat.com, Yan Zhao <yan.y.zhao@intel.com>
Subject: [PATCH] KVM: VMX: fix lockdep warning on posted intr wakeup
Date: Fri, 10 Mar 2023 23:59:55 +0800
Message-Id: <20230310155955.29652-1-yan.y.zhao@intel.com>
X-Mailer: git-send-email 2.17.1
Series |
KVM: VMX: fix lockdep warning on posted intr wakeup
Commit Message
Yan Zhao
March 10, 2023, 3:59 p.m. UTC
Use rcu list to break the possible circular locking dependency reported
by lockdep.

path 1, ``sysvec_kvm_posted_intr_wakeup_ipi()`` --> ``pi_wakeup_handler()``
--> ``kvm_vcpu_wake_up()`` --> ``try_to_wake_up()``,
the lock sequence is
&per_cpu(wakeup_vcpus_on_cpu_lock, cpu) --> &p->pi_lock.

path 2, ``schedule()`` --> ``kvm_sched_out()`` --> ``vmx_vcpu_put()`` -->
``vmx_vcpu_pi_put()`` --> ``pi_enable_wakeup_handler()``,
the lock sequence is
&rq->__lock --> &per_cpu(wakeup_vcpus_on_cpu_lock, cpu).

path 3, ``task_rq_lock()``,
the lock sequence is
&p->pi_lock --> &rq->__lock.

lockdep report:

Chain exists of:
&p->pi_lock --> &rq->__lock --> &per_cpu(wakeup_vcpus_on_cpu_lock, cpu)

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
                               lock(&rq->__lock);
                               lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
  lock(&p->pi_lock);

 *** DEADLOCK ***
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
arch/x86/kvm/vmx/posted_intr.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
base-commit: 89400df96a7570b651404bbc3b7afe627c52a192
Comments
On Fri, Mar 10, 2023, Yan Zhao wrote:
> Use rcu list to break the possible circular locking dependency reported
> by lockdep.
>
> path 1, ``sysvec_kvm_posted_intr_wakeup_ipi()`` --> ``pi_wakeup_handler()``
> --> ``kvm_vcpu_wake_up()`` --> ``try_to_wake_up()``,
> the lock sequence is
> &per_cpu(wakeup_vcpus_on_cpu_lock, cpu) --> &p->pi_lock.

Heh, that's an unfortunate naming collision.  It took me a bit of staring to
realize pi_lock is a scheduler lock, not a posted interrupt lock.

> path 2, ``schedule()`` --> ``kvm_sched_out()`` --> ``vmx_vcpu_put()`` -->
> ``vmx_vcpu_pi_put()`` --> ``pi_enable_wakeup_handler()``,
> the lock sequence is
> &rq->__lock --> &per_cpu(wakeup_vcpus_on_cpu_lock, cpu).
>
> path 3, ``task_rq_lock()``,
> the lock sequence is &p->pi_lock --> &rq->__lock
>
> lockdep report:
> Chain exists of:
> &p->pi_lock --> &rq->__lock --> &per_cpu(wakeup_vcpus_on_cpu_lock, cpu)
>
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
>                                lock(&rq->__lock);
>                                lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
>   lock(&p->pi_lock);
>
>  *** DEADLOCK ***

I don't think there's a deadlock here.  pi_wakeup_handler() is called from IRQ
context, pi_enable_wakeup_handler() disables IRQs before acquiring
wakeup_vcpus_on_cpu_lock, and "cpu" in pi_enable_wakeup_handler() is guaranteed
to be the current CPU, i.e. the same CPU.  So CPU0 and CPU1 can't be contending
for the same wakeup_vcpus_on_cpu_lock in this scenario.

vmx_vcpu_pi_load() does do cross-CPU locking, but finish_task_switch() drops
rq->__lock before invoking the sched_in notifiers.
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>  arch/x86/kvm/vmx/posted_intr.c | 12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
> index 94c38bea60e7..e3ffc45c0a7b 100644
> --- a/arch/x86/kvm/vmx/posted_intr.c
> +++ b/arch/x86/kvm/vmx/posted_intr.c
> @@ -90,7 +90,7 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
>  	 */
>  	if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR) {
>  		raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
> -		list_del(&vmx->pi_wakeup_list);
> +		list_del_rcu(&vmx->pi_wakeup_list);
>  		raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));

_If_ there is indeed a possible deadlock, there technically needs to be an
explicit synchronize_rcu() before freeing the vCPU.  In practice, there are
probably multiple synchronize_rcu() calls in the destruction path, not to
mention that it would take a minor miracle for pi_wakeup_handler() to get
stalled long enough to achieve a use-after-free.

>  	}
>
> @@ -153,7 +153,7 @@ static void pi_enable_wakeup_handler(struct kvm_vcpu *vcpu)
>  	local_irq_save(flags);
>
>  	raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
> -	list_add_tail(&vmx->pi_wakeup_list,
> +	list_add_tail_rcu(&vmx->pi_wakeup_list,
>  		      &per_cpu(wakeup_vcpus_on_cpu, vcpu->cpu));
>  	raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
>
> @@ -219,16 +219,14 @@ void pi_wakeup_handler(void)
>  {
>  	int cpu = smp_processor_id();
>  	struct list_head *wakeup_list = &per_cpu(wakeup_vcpus_on_cpu, cpu);
> -	raw_spinlock_t *spinlock = &per_cpu(wakeup_vcpus_on_cpu_lock, cpu);
>  	struct vcpu_vmx *vmx;
>
> -	raw_spin_lock(spinlock);
> -	list_for_each_entry(vmx, wakeup_list, pi_wakeup_list) {
> -
> +	rcu_read_lock();

This isn't strictly necessary, IRQs are disabled.

> +	list_for_each_entry_rcu(vmx, wakeup_list, pi_wakeup_list) {
>  		if (pi_test_on(&vmx->pi_desc))
>  			kvm_vcpu_wake_up(&vmx->vcpu);
>  	}
> -	raw_spin_unlock(spinlock);
> +	rcu_read_unlock();
>  }
>
>  void __init pi_init_cpu(int cpu)
>
> base-commit: 89400df96a7570b651404bbc3b7afe627c52a192
> --
> 2.17.1
>
On Fri, Mar 10, 2023 at 09:00:00AM -0800, Sean Christopherson wrote:
> On Fri, Mar 10, 2023, Yan Zhao wrote:
> > Use rcu list to break the possible circular locking dependency reported
> > by lockdep.
> >
> > path 1, ``sysvec_kvm_posted_intr_wakeup_ipi()`` --> ``pi_wakeup_handler()``
> > --> ``kvm_vcpu_wake_up()`` --> ``try_to_wake_up()``,
> > the lock sequence is
> > &per_cpu(wakeup_vcpus_on_cpu_lock, cpu) --> &p->pi_lock.
>
> Heh, that's an unfortunate naming collision.  It took me a bit of staring to
> realize pi_lock is a scheduler lock, not a posted interrupt lock.

me too :)

> > path 2, ``schedule()`` --> ``kvm_sched_out()`` --> ``vmx_vcpu_put()`` -->
> > ``vmx_vcpu_pi_put()`` --> ``pi_enable_wakeup_handler()``,
> > the lock sequence is
> > &rq->__lock --> &per_cpu(wakeup_vcpus_on_cpu_lock, cpu).
> >
> > path 3, ``task_rq_lock()``,
> > the lock sequence is &p->pi_lock --> &rq->__lock
> >
> > lockdep report:
> > Chain exists of:
> > &p->pi_lock --> &rq->__lock --> &per_cpu(wakeup_vcpus_on_cpu_lock, cpu)
> >
> > Possible unsafe locking scenario:
> >
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
> >                                lock(&rq->__lock);
> >                                lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
> >   lock(&p->pi_lock);
> >
> >  *** DEADLOCK ***
>
> I don't think there's a deadlock here.  pi_wakeup_handler() is called from IRQ
> context, pi_enable_wakeup_handler() disables IRQs before acquiring
> wakeup_vcpus_on_cpu_lock, and "cpu" in pi_enable_wakeup_handler() is guaranteed
> to be the current CPU, i.e. the same CPU.  So CPU0 and CPU1 can't be contending
> for the same wakeup_vcpus_on_cpu_lock in this scenario.
>
> vmx_vcpu_pi_load() does do cross-CPU locking, but finish_task_switch() drops
> rq->__lock before invoking the sched_in notifiers.

Right. Thanks for this analysis!
But the path of pi_wakeup_handler() tells lockdep that the lock ordering is
&p->pi_lock --> &rq->__lock --> &per_cpu(wakeup_vcpus_on_cpu_lock, cpu),
so lockdep just complains about it.
>
> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > ---
> >  arch/x86/kvm/vmx/posted_intr.c | 12 +++++-------
> >  1 file changed, 5 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
> > index 94c38bea60e7..e3ffc45c0a7b 100644
> > --- a/arch/x86/kvm/vmx/posted_intr.c
> > +++ b/arch/x86/kvm/vmx/posted_intr.c
> > @@ -90,7 +90,7 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
> >  	 */
> >  	if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR) {
> >  		raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
> > -		list_del(&vmx->pi_wakeup_list);
> > +		list_del_rcu(&vmx->pi_wakeup_list);
> >  		raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
>
> _If_ there is indeed a possible deadlock, there technically needs to be an
> explicit synchronize_rcu() before freeing the vCPU.  In practice, there are
> probably multiple synchronize_rcu() calls in the destruction path, not to
> mention that it would take a minor miracle for pi_wakeup_handler() to get
> stalled long enough to achieve a use-after-free.
>
Yes, I neglected it.
Thanks for the quick and detailed review!
I will post v2 to fix it.

Yan
diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
index 94c38bea60e7..e3ffc45c0a7b 100644
--- a/arch/x86/kvm/vmx/posted_intr.c
+++ b/arch/x86/kvm/vmx/posted_intr.c
@@ -90,7 +90,7 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
 	 */
 	if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR) {
 		raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
-		list_del(&vmx->pi_wakeup_list);
+		list_del_rcu(&vmx->pi_wakeup_list);
 		raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
 	}
 
@@ -153,7 +153,7 @@ static void pi_enable_wakeup_handler(struct kvm_vcpu *vcpu)
 	local_irq_save(flags);
 
 	raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
-	list_add_tail(&vmx->pi_wakeup_list,
+	list_add_tail_rcu(&vmx->pi_wakeup_list,
 		      &per_cpu(wakeup_vcpus_on_cpu, vcpu->cpu));
 	raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
 
@@ -219,16 +219,14 @@ void pi_wakeup_handler(void)
 {
 	int cpu = smp_processor_id();
 	struct list_head *wakeup_list = &per_cpu(wakeup_vcpus_on_cpu, cpu);
-	raw_spinlock_t *spinlock = &per_cpu(wakeup_vcpus_on_cpu_lock, cpu);
 	struct vcpu_vmx *vmx;
 
-	raw_spin_lock(spinlock);
-	list_for_each_entry(vmx, wakeup_list, pi_wakeup_list) {
-
+	rcu_read_lock();
+	list_for_each_entry_rcu(vmx, wakeup_list, pi_wakeup_list) {
 		if (pi_test_on(&vmx->pi_desc))
 			kvm_vcpu_wake_up(&vmx->vcpu);
 	}
-	raw_spin_unlock(spinlock);
+	rcu_read_unlock();
 }
 
 void __init pi_init_cpu(int cpu)