Message ID | 20230421075636.24946-1-alice.chao@mediatek.com |
---|---|
State | New |
Headers | From: Alice Chao <alice.chao@mediatek.com> To: <alim.akhtar@samsung.com>, <avri.altman@wdc.com>, <bvanassche@acm.org>, <jejb@linux.ibm.com>, <martin.petersen@oracle.com>, <matthias.bgg@gmail.com>, <angelogioacchino.delregno@collabora.com>, <quic_asutoshd@quicinc.com>, <quic_cang@quicinc.com>, <mani@kernel.org>, <linux-scsi@vger.kernel.org>, <linux-kernel@vger.kernel.org>, <linux-arm-kernel@lists.infradead.org>, <linux-mediatek@lists.infradead.org> CC: <stanley.chu@mediatek.com>, <peter.wang@mediatek.com>, <chun-hung.wu@mediatek.com>, <alice.chao@mediatek.com>, <powen.kao@mediatek.com>, <naomi.chu@mediatek.com>, <cc.chou@mediatek.com>, <chaotian.jing@mediatek.com>, <jiajie.hao@mediatek.com>, <tun-yu.yu@mediatek.com>, <eddie.huang@mediatek.com>, <wsd_upstream@mediatek.com> Subject: [PATCH 1/1] ufs: core: fix &hwq->cq_lock deadlock issue Date: Fri, 21 Apr 2023 15:56:36 +0800 Message-ID: <20230421075636.24946-1-alice.chao@mediatek.com> |
Series | [1/1] ufs: core: fix &hwq->cq_lock deadlock issue |
Commit Message
Alice Chao
April 21, 2023, 7:56 a.m. UTC
[name:lockdep&]WARNING: inconsistent lock state
[name:lockdep&]--------------------------------
[name:lockdep&]inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[name:lockdep&]kworker/u16:4/260 [HC0[0]:SC0[0]:HE1:SE1] takes:
ffffff8028444600 (&hwq->cq_lock){?.-.}-{2:2}, at:
ufshcd_mcq_poll_cqe_lock+0x30/0xe0
[name:lockdep&]{IN-HARDIRQ-W} state was registered at:
lock_acquire+0x17c/0x33c
_raw_spin_lock+0x5c/0x7c
ufshcd_mcq_poll_cqe_lock+0x30/0xe0
ufs_mtk_mcq_intr+0x60/0x1bc [ufs_mediatek_mod]
__handle_irq_event_percpu+0x140/0x3ec
handle_irq_event+0x50/0xd8
handle_fasteoi_irq+0x148/0x2b0
generic_handle_domain_irq+0x4c/0x6c
gic_handle_irq+0x58/0x134
call_on_irq_stack+0x40/0x74
do_interrupt_handler+0x84/0xe4
el1_interrupt+0x3c/0x78
<snip>
Possible unsafe locking scenario:
CPU0
----
lock(&hwq->cq_lock);
<Interrupt>
lock(&hwq->cq_lock);
*** DEADLOCK ***
2 locks held by kworker/u16:4/260:
[name:lockdep&]
stack backtrace:
CPU: 7 PID: 260 Comm: kworker/u16:4 Tainted: G S W OE
6.1.17-mainline-android14-2-g277223301adb #1
Workqueue: ufs_eh_wq_0 ufshcd_err_handler
Call trace:
dump_backtrace+0x10c/0x160
show_stack+0x20/0x30
dump_stack_lvl+0x98/0xd8
dump_stack+0x20/0x60
print_usage_bug+0x584/0x76c
mark_lock_irq+0x488/0x510
mark_lock+0x1ec/0x25c
__lock_acquire+0x4d8/0xffc
lock_acquire+0x17c/0x33c
_raw_spin_lock+0x5c/0x7c
ufshcd_mcq_poll_cqe_lock+0x30/0xe0
ufshcd_poll+0x68/0x1b0
ufshcd_transfer_req_compl+0x9c/0xc8
ufshcd_err_handler+0x3bc/0xea0
process_one_work+0x2f4/0x7e8
worker_thread+0x234/0x450
kthread+0x110/0x134
ret_from_fork+0x10/0x20
For ufs_mtk_mcq_intr(), refer to:
https://lore.kernel.org/all/20230328103423.10970-3-powen.kao@mediatek.com/
While ufshcd_err_handler() is running, a CQ event interrupt can fire and
then wait for the same lock. This can happen in the upstream code path
ufshcd_handle_mcq_cq_events() as well as in ufs_mtk_mcq_intr(). lockdep
emits this warning because &hwq->cq_lock is taken in hardirq context and
is also taken in process context with IRQs enabled. Make
ufshcd_mcq_poll_cqe_lock() use spin_lock_irqsave() instead of spin_lock()
to resolve the deadlock.
Signed-off-by: Alice Chao <alice.chao@mediatek.com>
---
drivers/ufs/core/ufs-mcq.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
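The scenario in the lockdep report can be illustrated with a minimal userspace sketch. This is a toy single-CPU model, not kernel code: every toy_* name is hypothetical, the "interrupt" is an ordinary function call, and the "lock" is a plain flag. It only shows why holding a lock with interrupts enabled is unsafe when the interrupt handler takes the same lock, and why saving and disabling the interrupt state first avoids the problem.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model. Every toy_* name is hypothetical, not a kernel API. */
struct toy_cpu {
	bool irqs_enabled;  /* may an interrupt handler run right now? */
	bool cq_lock_held;  /* non-recursive lock standing in for hwq->cq_lock */
	bool deadlocked;    /* set once a handler would spin forever */
};

static void toy_lock(struct toy_cpu *cpu)
{
	assert(!cpu->cq_lock_held);
	cpu->cq_lock_held = true;
}

static void toy_unlock(struct toy_cpu *cpu)
{
	assert(cpu->cq_lock_held);
	cpu->cq_lock_held = false;
}

/* A CQ interrupt arrives; its handler wants the same lock. */
static void toy_cq_irq(struct toy_cpu *cpu)
{
	if (!cpu->irqs_enabled)
		return;                 /* masked: the handler does not run yet */
	if (cpu->cq_lock_held) {
		cpu->deadlocked = true; /* would spin on a lock this CPU already holds */
		return;
	}
	toy_lock(cpu);              /* lock is free: poll and release */
	toy_unlock(cpu);
}

/* Process-context poll in the style of spin_lock(): IRQs stay enabled. */
static bool toy_poll_plain(struct toy_cpu *cpu)
{
	toy_lock(cpu);
	toy_cq_irq(cpu);            /* interrupt lands inside the critical section */
	toy_unlock(cpu);
	return cpu->deadlocked;
}

/* Process-context poll in the style of spin_lock_irqsave(). */
static bool toy_poll_irqsave(struct toy_cpu *cpu)
{
	bool flags = cpu->irqs_enabled; /* save the current IRQ state */

	cpu->irqs_enabled = false;      /* mask interrupts before locking */
	toy_lock(cpu);
	toy_cq_irq(cpu);                /* masked, so the handler is deferred */
	toy_unlock(cpu);
	cpu->irqs_enabled = flags;      /* restore the saved IRQ state */
	toy_cq_irq(cpu);                /* deferred handler now finds the lock free */
	return cpu->deadlocked;
}
```

On a toy_cpu that starts with irqs_enabled set, toy_poll_plain() reports a deadlock and toy_poll_irqsave() does not. Saving and restoring the flag, rather than unconditionally re-enabling interrupts, keeps the helper correct for callers that already run with interrupts disabled.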
Comments
On 4/21/2023 3:56 PM, Alice Chao wrote:
> <snip>
Reviewed-by: Can Guo <quic_cang@quicinc.com>
Thanks for the fix.
Regards,
Can Guo.
On 4/21/23 00:56, Alice Chao wrote:
> When ufshcd_err_handler() is executed, CQ event interrupt can enter
> waiting for the same lock. It could happened in upstream code path
> ufshcd_handle_mcq_cq_events() and also in ufs_mtk_mcq_intr(). This
> warning message will be generated when &hwq->cq_lock is used in IRQ
> context with IRQ enabled. Use ufshcd_mcq_poll_cqe_lock() with
> spin_lock_irqsave instead of spin_lock to resolve the deadlock issue.
Please add a Fixes: tag.
Thanks,
Bart.
Hi Alice,
Alice Chao <alice.chao@mediatek.com> wrote on Fri, Apr 21, 2023 at 3:58 PM:
> <snip>
After adding a Fixes: tag, feel free to add,
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-mcq.c
index 31df052fbc41..202ff71e1b58 100644
--- a/drivers/ufs/core/ufs-mcq.c
+++ b/drivers/ufs/core/ufs-mcq.c
@@ -299,11 +299,11 @@ EXPORT_SYMBOL_GPL(ufshcd_mcq_poll_cqe_nolock);
 unsigned long ufshcd_mcq_poll_cqe_lock(struct ufs_hba *hba,
 				       struct ufs_hw_queue *hwq)
 {
-	unsigned long completed_reqs;
+	unsigned long completed_reqs, flags;
 
-	spin_lock(&hwq->cq_lock);
+	spin_lock_irqsave(&hwq->cq_lock, flags);
 	completed_reqs = ufshcd_mcq_poll_cqe_nolock(hba, hwq);
-	spin_unlock(&hwq->cq_lock);
+	spin_unlock_irqrestore(&hwq->cq_lock, flags);
 
 	return completed_reqs;
 }