Message ID | 20230318081303.792969-1-zyytlz.wz@163.com |
---|---|
State | New |
Series | [RESEND] scsi: qedi: Fix use after free bug in qedi_remove due to race condition |
Commit Message
Zheng Wang
March 18, 2023, 8:13 a.m. UTC
In qedi_probe(), the driver calls __qedi_probe(), which binds
&qedi->recovery_work to qedi_recovery_handler() and
&qedi->board_disable_work to qedi_board_disable_work().

When qedi_schedule_recovery_handler() is invoked, it eventually calls
schedule_delayed_work() to queue the recovery work.

If qedi_remove() is called to detach the driver while that work is
still pending, the following sequence is possible:

CPU0                      CPU1
                         |qedi_recovery_handler
qedi_remove              |
  __qedi_remove          |
    iscsi_host_free      |
      scsi_host_put      |
      //free shost       |
                         |iscsi_host_for_each_session
                         |//use qedi->shost

The work item on CPU1 dereferences qedi->shost after CPU0 has freed it,
resulting in a use-after-free. Fix it by finishing the work before the
cleanup in qedi_remove().
Fixes: 4b1068f5d74b ("scsi: qedi: Add MFW error recovery process")
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
---
drivers/scsi/qedi/qedi_main.c | 4 ++++
1 file changed, 4 insertions(+)
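
For readers less familiar with the workqueue API, here is a minimal,
self-contained sketch of the delayed-work pattern the commit message
describes. Only the INIT_DELAYED_WORK()/schedule_delayed_work()/
cancel_delayed_work_sync() calls mirror what qedi does; the module
scaffolding, names, and handler body are illustrative:

#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/jiffies.h>

static struct delayed_work recovery_work;

/* Runs later on a kworker thread. In qedi the real handler walks the
 * sessions of qedi->shost, which is why the shost must still be alive. */
static void recovery_handler(struct work_struct *work)
{
	pr_info("recovery work running\n");
}

static int __init demo_init(void)
{
	/* probe side: bind the work item to its handler ... */
	INIT_DELAYED_WORK(&recovery_work, recovery_handler);

	/* ... and later some event queues it with a delay */
	schedule_delayed_work(&recovery_work, msecs_to_jiffies(100));
	return 0;
}

static void __exit demo_exit(void)
{
	/* remove side: dequeue the work and, if the handler is already
	 * running, wait for it to finish before tearing anything down */
	cancel_delayed_work_sync(&recovery_work);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");

Note that cancel_delayed_work_sync() only flushes what is queued or
running at the moment of the call; it does not prevent a later re-queue,
which is exactly the gap discussed in the comments below.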
Comments
On 3/18/23 3:13 AM, Zheng Wang wrote:
> [...]
>
> diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
> index f2ee49756df8..25223f6f5344 100644
> --- a/drivers/scsi/qedi/qedi_main.c
> +++ b/drivers/scsi/qedi/qedi_main.c
> @@ -2414,6 +2414,10 @@ static void __qedi_remove(struct pci_dev *pdev, int mode)
>  	int rval;
>  	u16 retry = 10;
>
> +	/*cancel work*/

This comment is not needed. The name of the functions you are calling have
"cancel" and "work" in them so we know. If you want to add a comment explain
why the cancel calls are needed here.

> +	cancel_delayed_work_sync(&qedi->recovery_work);
> +	cancel_delayed_work_sync(&qedi->board_disable_work);

How do you know after you have called cancel_delayed_work_sync that
schedule_recovery_handler or schedule_hw_err_handler can't be called?
I don't know the qed driver well, but it looks like you could have
operations still running, so after you cancel here one of those ops
could lead to them scheduling the work again.

> +
>  	if (mode == QEDI_MODE_NORMAL)
>  		iscsi_host_remove(qedi->shost, false);
>  	else if (mode == QEDI_MODE_SHUTDOWN)
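
Restated as a timeline, the hazard Mike describes would look something
like this (hypothetical — not traced through the real qed code; the
callback names are the ones mentioned in his mail):

/*
 *   CPU0 (__qedi_remove)                CPU1 (qed core, still live)
 *
 *   cancel_delayed_work_sync()            |
 *   // work idle, handler finished        | qedi_schedule_recovery_handler()
 *                                         |   schedule_delayed_work() // re-armed
 *   iscsi_host_free()
 *     scsi_host_put()  // frees shost
 *                                         qedi_recovery_handler()
 *                                         // uses freed qedi->shost: UAF again
 */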
Mike Christie <michael.christie@oracle.com> wrote on Tue, 21 Mar 2023 at 00:11:
>
> On 3/18/23 3:13 AM, Zheng Wang wrote:
> > [...]
> > +	/*cancel work*/
>
> This comment is not needed. The name of the functions you are calling have
> "cancel" and "work" in them so we know. If you want to add a comment explain
> why the cancel calls are needed here.
>

Hi,

Sorry for my late reply, and thanks for your advice. I will remove the
comment in the next version of the patch.

> > +	cancel_delayed_work_sync(&qedi->recovery_work);
> > +	cancel_delayed_work_sync(&qedi->board_disable_work);
>
> How do you know after you have called cancel_delayed_work_sync that
> schedule_recovery_handler or schedule_hw_err_handler can't be called?
> I don't know the qed driver well, but it looks like you could have
> operations still running, so after you cancel here one of those ops
> could lead to them scheduling the work again.
>

Sorry, I don't know how to make sure there are no more schedules, but I do
think this is important. Maybe someone else can give us advice.

Best regards,
Zheng
> -----Original Message-----
> From: Zheng Hacker <hackerzheng666@gmail.com>
> Sent: Thursday, March 23, 2023 9:15 AM
> Subject: [EXT] Re: [PATCH RESEND] scsi: qedi: Fix use after free bug in
> qedi_remove due to race condition
>
> [...]
> > How do you know after you have called cancel_delayed_work_sync that
> > schedule_recovery_handler or schedule_hw_err_handler can't be called?
> > [...]
>
> Sorry, I don't know how to make sure there are no more schedules, but I do
> think this is important. Maybe someone else can give us advice.

The best place to call cancel_delayed_work_sync() is after
qedi_ops->stop(qedi->cdev) and qedi_ops->ll2->stop(qedi->cdev); after these
qed calls, firmware will not post any events to the qedi driver.

Thanks,
Manish
Manish Rangankar <mrangankar@marvell.com> wrote on Thu, 23 Mar 2023 at 18:17:
>
> [...]
>
> The best place to call cancel_delayed_work_sync() is after
> qedi_ops->stop(qedi->cdev) and qedi_ops->ll2->stop(qedi->cdev); after these
> qed calls, firmware will not post any events to the qedi driver.
>
> Thanks,
> Manish

Sorry for my late reply. I will apply that in the next version.

Best regards,
Zheng
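
Folded into the patch, Manish's suggestion would look roughly like the
sketch below. The surrounding teardown steps are abbreviated and their
exact placement in __qedi_remove() is an assumption; only the stop and
cancel calls come from this thread:

static void __qedi_remove(struct pci_dev *pdev, int mode)
{
	struct qedi_ctx *qedi = pci_get_drvdata(pdev);

	/* ... iscsi host removal, connection cleanup, etc. ... */

	/* Stop the qed device first: per Manish, after these calls the
	 * firmware can no longer post events to qedi, so nothing can
	 * re-schedule recovery_work or board_disable_work. */
	qedi_ops->stop(qedi->cdev);
	qedi_ops->ll2->stop(qedi->cdev);

	/* Now the cancels are race-free: they wait out any handler that
	 * is already running, and no new one can be queued. */
	cancel_delayed_work_sync(&qedi->recovery_work);
	cancel_delayed_work_sync(&qedi->board_disable_work);

	/* ... now it is safe to free the shost and the qedi context ... */
}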
diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index f2ee49756df8..25223f6f5344 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -2414,6 +2414,10 @@ static void __qedi_remove(struct pci_dev *pdev, int mode)
 	int rval;
 	u16 retry = 10;
 
+	/*cancel work*/
+	cancel_delayed_work_sync(&qedi->recovery_work);
+	cancel_delayed_work_sync(&qedi->board_disable_work);
+
 	if (mode == QEDI_MODE_NORMAL)
 		iscsi_host_remove(qedi->shost, false);
 	else if (mode == QEDI_MODE_SHUTDOWN)