Message ID | 20230725030026.1664873-1-zyytlz.wz@163.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:9010:0:b0:3e4:2afc:c1 with SMTP id l16csp2205887vqg; Mon, 24 Jul 2023 20:26:10 -0700 (PDT) X-Google-Smtp-Source: APBJJlH5Z9Tvme8yESuojokcBFAuIHgstyQq6ZAWNKNynY2KrCsjKtcIW9U3iROe/nb0ZxiUJO7/ X-Received: by 2002:a17:902:c412:b0:1b8:6245:1235 with SMTP id k18-20020a170902c41200b001b862451235mr1678870plk.13.1690255570117; Mon, 24 Jul 2023 20:26:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690255570; cv=none; d=google.com; s=arc-20160816; b=LRt8rnQvkVFRgaVx3fKjouunFRs4jsqGcCZaOmnTP3WFFcY3RJCUBcVRn8Ugehn/TG +WkEQn6vn11nWxukfdoVFe2ztA5rVFzphSVoy9fcv1m4NLDJxg+s5Y/2rhPZJF8f6ZFb VBW0Rzyfe+3BUrF0eRpkZG2nqW4g6NsjQtHplBeg86qGbwtC2nb/vXejD/qYq3OUOTXP 0SIe2FB21BG2GZO0aPnV2zt6AOSQeA7kI0fMyKR7UQvJaw+4ow0Z6Wef6gxXJ3QiSyDm jXW08TJ3033TIAU8YyLA2kNhsFLpQTAbKUu5rKDnJnKyLXb9LGbocUdUU6YwMT4R1fQr zbrQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=Y7Db9CtYu9IW3X/wWavIFjW3pwP0JBOPHcat5DJxUPQ=; fh=0qZbkesdVDOwLQAQ4/SHw1aEi4sIgeTGayoY46taFq4=; b=SjYfyC16+TX4s4DxuC7aflkxkh4eTlJ/15fzzQJ83fSE7/fPtvyMU+HTwNBd9jnDrJ Zmu1aYo4YGoLbgpwsi80XSCKOOsAwV6KQSJl9JDJ9KNClYgsM0emferwFfDbOVRP6epJ XSrLF8TRkdr19IhAvfsO/P7HSqFjQ+seu/oL4AWi5GnKSsAfo/KJszUZUaaSM1FdHidR eBeMIG4pKOebjxvwXvnbBhoHcgVhwrBHP7xpXZp5p/g52jCuIb4YA1/1WT4O8vgOkZMK E30EvXfPlVqYgFVQbkvXKwpTpezDKXnQJUE+vN+ot0eRlCo0ve7wBLQnBSyw29f2Pn3J Uy5g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@163.com header.s=s110527 header.b=RIRF2W4a; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=163.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id g191-20020a636bc8000000b0056341268ed7si10264427pgc.853.2023.07.24.20.25.57; Mon, 24 Jul 2023 20:26:10 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@163.com header.s=s110527 header.b=RIRF2W4a; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=163.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231549AbjGYDCH (ORCPT <rfc822;kautuk.consul.80@gmail.com> + 99 others); Mon, 24 Jul 2023 23:02:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55378 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231578AbjGYDBv (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 24 Jul 2023 23:01:51 -0400 Received: from m12.mail.163.com (m12.mail.163.com [220.181.12.196]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 83DBD173D; Mon, 24 Jul 2023 20:01:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=163.com; s=s110527; h=From:Subject:Date:Message-Id:MIME-Version; bh=Y7Db9 CtYu9IW3X/wWavIFjW3pwP0JBOPHcat5DJxUPQ=; b=RIRF2W4aI3DUUx39z4OG9 6v5Rz2dG6AjeONNuuML9MW2NRCODYyvWmVmZpNYpiZOWefl+5OT8yGrRKmgyEnac IRiTO13k/Aa8QajqA79MbT55ZwFSnhz38Bi55aelv5dFicALob56UF+FwOtBVUc6 ORWNfK4LtN2IpIIIXTiSik= Received: from leanderwang-LC2.localdomain (unknown [111.206.145.21]) by zwqz-smtp-mta-g1-2 (Coremail) with SMTP id _____wC3TZbMOr9kRW7dBA--.23188S2; Tue, 25 Jul 2023 11:00:29 +0800 (CST) From: Zheng Wang <zyytlz.wz@163.com> To: s.shtylyov@omp.ru Cc: lee@kernel.org, linyunsheng@huawei.com, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, richardcochran@gmail.com, p.zabel@pengutronix.de, geert+renesas@glider.be, magnus.damm@gmail.com, yoshihiro.shimoda.uh@renesas.com, biju.das.jz@bp.renesas.com, wsa+renesas@sang-engineering.com, netdev@vger.kernel.org, linux-renesas-soc@vger.kernel.org, linux-kernel@vger.kernel.org, hackerzheng666@gmail.com, 1395428693sheep@gmail.com, alex000young@gmail.com, Zheng Wang <zyytlz.wz@163.com> Subject: [PATCH v4] net: ravb: Fix possible UAF bug in ravb_remove Date: Tue, 25 Jul 2023 11:00:26 +0800 Message-Id: <20230725030026.1664873-1-zyytlz.wz@163.com> X-Mailer: git-send-email 2.25.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-CM-TRANSID: _____wC3TZbMOr9kRW7dBA--.23188S2 X-Coremail-Antispam: 1Uf129KBjvJXoW7CF1DZF48JrWrZryUtr1xGrg_yoW8Xry3p3 9xKa4F9ws5J3WUWa1xJFs7ZFWrCw17Kr909FZ7Aw1rZ3Zay3WDXr1FgFy8Aw1UJFZ5t3Wa vrWUZ3Wxu3WDAa7anT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDUYxBIdaVFxhVjvjDU0xZFpf9x0pM4E_DUUUUU= X-Originating-IP: [111.206.145.21] X-CM-SenderInfo: h2113zf2oz6qqrwthudrp/xtbBRw+3U2I0Ut2f+wAAsR X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1772361424657870529 X-GMAIL-MSGID: 1772361424657870529 |
Series |
[v4] net: ravb: Fix possible UAF bug in ravb_remove
|
|
Commit Message
Zheng Wang
July 25, 2023, 3 a.m. UTC
In ravb_probe, priv->work was bound with ravb_tx_timeout_work. If timeout occurs, it will start the work. And if we call ravb_remove without finishing the work, there may be a use-after-free bug on ndev. Fix it by finishing the job before cleanup in ravb_remove. Note that this bug is found by static analysis, it might be false positive. Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") Signed-off-by: Zheng Wang <zyytlz.wz@163.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> --- v4: - add information about the bug was found suggested by Yunsheng Lin v3: - fix typo in commit message v2: - stop dev_watchdog so that handle no more timeout work suggested by Yunsheng Lin, add an empty line to make code clear suggested by Sergey Shtylyov --- drivers/net/ethernet/renesas/ravb_main.c | 3 +++ 1 file changed, 3 insertions(+)
Comments
On Tue, 25 Jul 2023 11:00:26 +0800 Zheng Wang wrote: > diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c > index 4d6b3b7d6abb..ce2da5101e51 100644 > --- a/drivers/net/ethernet/renesas/ravb_main.c > +++ b/drivers/net/ethernet/renesas/ravb_main.c > @@ -2885,6 +2885,9 @@ static int ravb_remove(struct platform_device *pdev) > struct ravb_private *priv = netdev_priv(ndev); > const struct ravb_hw_info *info = priv->info; > > + netif_carrier_off(ndev); > + netif_tx_disable(ndev); > + cancel_work_sync(&priv->work); Still racy, the carrier can come back up after canceling the work. But whatever, this is a non-issue in the first place. The fact that ravb_tx_timeout_work doesn't take any locks seems much more suspicious.
On Tue, 2023-07-25 at 20:19 -0700, Jakub Kicinski wrote: > On Tue, 25 Jul 2023 11:00:26 +0800 Zheng Wang wrote: > > diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c > > index 4d6b3b7d6abb..ce2da5101e51 100644 > > --- a/drivers/net/ethernet/renesas/ravb_main.c > > +++ b/drivers/net/ethernet/renesas/ravb_main.c > > @@ -2885,6 +2885,9 @@ static int ravb_remove(struct platform_device *pdev) > > struct ravb_private *priv = netdev_priv(ndev); > > const struct ravb_hw_info *info = priv->info; > > > > + netif_carrier_off(ndev); > > + netif_tx_disable(ndev); > > + cancel_work_sync(&priv->work); > > Still racy, the carrier can come back up after canceling the work. I must admit I don't see how/when this driver sets the carrier on ?!? > But whatever, this is a non-issue in the first place. Do you mean the UaF can't happen? I think that is real. > The fact that ravb_tx_timeout_work doesn't take any locks seems much > more suspicious. Indeed! But that should be a different patch, right? Waiting a little more for feedback from renesas. /P
Hello! On 7/27/23 11:21 AM, Paolo Abeni wrote: [...] >>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c >>> index 4d6b3b7d6abb..ce2da5101e51 100644 >>> --- a/drivers/net/ethernet/renesas/ravb_main.c >>> +++ b/drivers/net/ethernet/renesas/ravb_main.c >>> @@ -2885,6 +2885,9 @@ static int ravb_remove(struct platform_device *pdev) >>> struct ravb_private *priv = netdev_priv(ndev); >>> const struct ravb_hw_info *info = priv->info; >>> >>> + netif_carrier_off(ndev); >>> + netif_tx_disable(ndev); >>> + cancel_work_sync(&priv->work); >> >> Still racy, the carrier can come back up after canceling the work. > > I must admit I don't see how/when this driver sets the carrier on ?!? The phylib code does it for this MAC driver, see the call tree of phy_link_change(), on e.g. https://elixir.bootlin.com/linux/v6.5-rc3/source/... >> But whatever, this is a non-issue in the first place. > > Do you mean the UaF can't happen? I think that is real. Looks possible to me, at least now... and anyway, shouldn't we clean up after ourselves if we call schedule_work()?However my current impression is that cancel_work_sync() should be called from ravb_close(), after calling phy_{stop|disconnect}()... >> The fact that ravb_tx_timeout_work doesn't take any locks seems much >> more suspicious. > > Indeed! But that should be a different patch, right? Yes. > Waiting a little more for feedback from renesas. Renesas historically hasn't shown much interest to reviewing the sh_eth/ravb driver patches, so I took that task upon myself. I also happen to be a nominal author of this driver... :-) > /P MBR, Sergey
On Thu, 27 Jul 2023 21:48:41 +0300 Sergey Shtylyov wrote: > >> Still racy, the carrier can come back up after canceling the work. > > > > I must admit I don't see how/when this driver sets the carrier on ?!? > > The phylib code does it for this MAC driver, see the call tree of > phy_link_change(), on e.g. https://elixir.bootlin.com/linux/v6.5-rc3/source/... > > >> But whatever, this is a non-issue in the first place. > > > > Do you mean the UaF can't happen? I think that is real. > > Looks possible to me, at least now... and anyway, shouldn't we clean up > after ourselves if we call schedule_work()?However my current impression is > that cancel_work_sync() should be called from ravb_close(), after calling > phy_{stop|disconnect}()... > > >> The fact that ravb_tx_timeout_work doesn't take any locks seems much > >> more suspicious. > > > > Indeed! But that should be a different patch, right? > > Yes. > > > Waiting a little more for feedback from renesas. > > Renesas historically hasn't shown much interest to reviewing the sh_eth/ravb > driver patches, so I took that task upon myself. I also happen to be a nominal > author of this driver... :-) Simplest fix I can think of is to take a reference on the netdev before scheduling the work, and then check if it's still registered in the work itself. Wrap the timeout work in rtnl_lock() to avoid any races there.
On Thu, Jul 27, 2023 at 09:48:41PM +0300, Sergey Shtylyov wrote: > Hello! > > On 7/27/23 11:21 AM, Paolo Abeni wrote: > [...] > >>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c > >>> index 4d6b3b7d6abb..ce2da5101e51 100644 > >>> --- a/drivers/net/ethernet/renesas/ravb_main.c > >>> +++ b/drivers/net/ethernet/renesas/ravb_main.c > >>> @@ -2885,6 +2885,9 @@ static int ravb_remove(struct platform_device *pdev) > >>> struct ravb_private *priv = netdev_priv(ndev); > >>> const struct ravb_hw_info *info = priv->info; > >>> > >>> + netif_carrier_off(ndev); > >>> + netif_tx_disable(ndev); > >>> + cancel_work_sync(&priv->work); > >> > >> Still racy, the carrier can come back up after canceling the work. > > > > I must admit I don't see how/when this driver sets the carrier on ?!? > > The phylib code does it for this MAC driver, see the call tree of > phy_link_change(), on e.g. https://elixir.bootlin.com/linux/v6.5-rc3/source/... > > >> But whatever, this is a non-issue in the first place. > > > > Do you mean the UaF can't happen? I think that is real. > > Looks possible to me, at least now... and anyway, shouldn't we clean up > after ourselves if we call schedule_work()?However my current impression is > that cancel_work_sync() should be called from ravb_close(), after calling > phy_{stop|disconnect}()... > > >> The fact that ravb_tx_timeout_work doesn't take any locks seems much > >> more suspicious. > > > > Indeed! But that should be a different patch, right? > > Yes. > > > Waiting a little more for feedback from renesas. > > Renesas historically hasn't shown much interest to reviewing the sh_eth/ravb > driver patches, so I took that task upon myself. I also happen to be a nominal > author of this driver... :-) FWIIW, that matches my recollection. Although it may be out of date by now.
On Tue, 25 Jul 2023, Zheng Wang wrote: > In ravb_probe, priv->work was bound with ravb_tx_timeout_work. > If timeout occurs, it will start the work. And if we call > ravb_remove without finishing the work, there may be a > use-after-free bug on ndev. > > Fix it by finishing the job before cleanup in ravb_remove. > > Note that this bug is found by static analysis, it might be > false positive. > > Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") > Signed-off-by: Zheng Wang <zyytlz.wz@163.com> > Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> > --- > v4: > - add information about the bug was found suggested by Yunsheng Lin > v3: > - fix typo in commit message > v2: > - stop dev_watchdog so that handle no more timeout work suggested by Yunsheng Lin, > add an empty line to make code clear suggested by Sergey Shtylyov > --- > drivers/net/ethernet/renesas/ravb_main.c | 3 +++ > 1 file changed, 3 insertions(+) Trying my best not to sound like a broken record, but ... What's the latest with this fix? Is a v5 en route?
Hello Sergey! > From: Sergey Shtylyov, Sent: Friday, July 28, 2023 3:49 AM > > Hello! > > On 7/27/23 11:21 AM, Paolo Abeni wrote: > [...] > >>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c > >>> index 4d6b3b7d6abb..ce2da5101e51 100644 > >>> --- a/drivers/net/ethernet/renesas/ravb_main.c > >>> +++ b/drivers/net/ethernet/renesas/ravb_main.c > >>> @@ -2885,6 +2885,9 @@ static int ravb_remove(struct platform_device *pdev) > >>> struct ravb_private *priv = netdev_priv(ndev); > >>> const struct ravb_hw_info *info = priv->info; > >>> > >>> + netif_carrier_off(ndev); > >>> + netif_tx_disable(ndev); > >>> + cancel_work_sync(&priv->work); > >> > >> Still racy, the carrier can come back up after canceling the work. > > > > I must admit I don't see how/when this driver sets the carrier on ?!? > > The phylib code does it for this MAC driver, see the call tree of > phy_link_change(), on e.g. > https://elixir.bootlin.com/linux/v6.5-rc3/source > > >> But whatever, this is a non-issue in the first place. > > > > Do you mean the UaF can't happen? I think that is real. > > Looks possible to me, at least now... and anyway, shouldn't we clean up > after ourselves if we call schedule_work()?However my current impression is > that cancel_work_sync() should be called from ravb_close(), after calling > phy_{stop|disconnect}()... I also think so. ravb_remove() calls unregister_netdev(). -> unregister_netdev() calls rtnl_lock() and unregister_netdevice(). --> unregiter_netdevice_queue() ---> unregiter_netdevice_many() ----> unregiter_netdevice_many_notify(). -----> dev_close_many() ------> __dev_close_many() -------> ops->ndo_stop() ravb_close() calls phy_stop() -> phy_state_machine() with PHY_HALTED --> phy_link_down() ---> phy_link_change() ----> netif_carrier_off() The patch will be the following: --- drivers/net/ethernet/renesas/ravb_main.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c index 7df9f9f8e134..e452d90de7c2 100644 --- a/drivers/net/ethernet/renesas/ravb_main.c +++ b/drivers/net/ethernet/renesas/ravb_main.c @@ -2167,6 +2167,8 @@ static int ravb_close(struct net_device *ndev) of_phy_deregister_fixed_link(np); } + cancel_work_sync(&priv->work); + if (info->multi_irqs) { free_irq(priv->tx_irqs[RAVB_NC], ndev); free_irq(priv->rx_irqs[RAVB_NC], ndev); --- If this patch is acceptable, I'll submit it. But, what do you think? Best regards, Yoshihiro Shimoda > >> The fact that ravb_tx_timeout_work doesn't take any locks seems much > >> more suspicious. > > > > Indeed! But that should be a different patch, right? > > Yes. > > > Waiting a little more for feedback from renesas. > > Renesas historically hasn't shown much interest to reviewing the sh_eth/ravb > driver patches, so I took that task upon myself. I also happen to be a nominal > author of this driver... :-) > > > /P > > MBR, Sergey
Hello! On 9/20/23 5:37 AM, Yoshihiro Shimoda wrote: Sorry, I got ill that same day and still have subfebrile temperature, and I forgot about your mail. I'll try replying to it on this weekend... [...] MBR, Sergey
Hello Sergey, > From: Sergey Shtylyov, Sent: Saturday, September 30, 2023 5:23 AM > > Hello! > > On 9/20/23 5:37 AM, Yoshihiro Shimoda wrote: > > Sorry, I got ill that same day and still have subfebrile temperature, > and I forgot about your mail. I'll try replying to it on this weekend... Thank you for your reply! I understood it. Please take care of yourself. I hope you will get well soon. Best regards, Yoshihiro Shimoda > [...] > > MBR, Sergey
Hello! Concerning the subject: I doubt that UAF acronym is known to everybody (e.g. it wasn't known to me), I think we should be able to afford spelling out use-after-free there... On 9/20/23 5:37 AM, Yoshihiro Shimoda wrote: [...] >>>>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c >>>>> index 4d6b3b7d6abb..ce2da5101e51 100644 >>>>> --- a/drivers/net/ethernet/renesas/ravb_main.c >>>>> +++ b/drivers/net/ethernet/renesas/ravb_main.c >>>>> @@ -2885,6 +2885,9 @@ static int ravb_remove(struct platform_device *pdev) >>>>> struct ravb_private *priv = netdev_priv(ndev); >>>>> const struct ravb_hw_info *info = priv->info; >>>>> >>>>> + netif_carrier_off(ndev); >>>>> + netif_tx_disable(ndev); >>>>> + cancel_work_sync(&priv->work); >>>> >>>> Still racy, the carrier can come back up after canceling the work. >>> >>> I must admit I don't see how/when this driver sets the carrier on ?!? >> >> The phylib code does it for this MAC driver, see the call tree of >> phy_link_change(), on e.g. >> https://elixir.bootlin.com/linux/v6.5-rc3/source >> >>>> But whatever, this is a non-issue in the first place. >>> >>> Do you mean the UaF can't happen? I think that is real. >> >> Looks possible to me, at least now... and anyway, shouldn't we clean up >> after ourselves if we call schedule_work()?However my current impression is >> that cancel_work_sync() should be called from ravb_close(), after calling >> phy_{stop|disconnect}()... > > I also think so. > > ravb_remove() calls unregister_netdev(). > -> unregister_netdev() calls rtnl_lock() and unregister_netdevice(). > --> unregiter_netdevice_queue() > ---> unregiter_netdevice_many() > ----> unregiter_netdevice_many_notify(). > -----> dev_close_many() > ------> __dev_close_many() > -------> ops->ndo_stop() > > ravb_close() calls phy_stop() > -> phy_state_machine() with PHY_HALTED > --> phy_link_down() > ---> phy_link_change() > ----> netif_carrier_off() Thanks for sharing the call chain, I've followed it once again... :-) > The patch will be the following: > --- > drivers/net/ethernet/renesas/ravb_main.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c > index 7df9f9f8e134..e452d90de7c2 100644 > --- a/drivers/net/ethernet/renesas/ravb_main.c > +++ b/drivers/net/ethernet/renesas/ravb_main.c > @@ -2167,6 +2167,8 @@ static int ravb_close(struct net_device *ndev) > of_phy_deregister_fixed_link(np); > } > > + cancel_work_sync(&priv->work); > + > if (info->multi_irqs) { > free_irq(priv->tx_irqs[RAVB_NC], ndev); > free_irq(priv->rx_irqs[RAVB_NC], ndev); > --- > > If this patch is acceptable, I'll submit it. But, what do you think? I think it should do the job. And I suspect you can even test it... :-) > Best regards, > Yoshihiro Shimoda [...] MBR, Sergey
Hello Sergey, > From: Sergey Shtylyov, Sent: Wednesday, October 4, 2023 4:39 AM > > Hello! > > Concerning the subject: I doubt that UAF acronym is known to > everybody (e.g. it wasn't known to me), I think we should be able > to afford spelling out use-after-free there... I got it. I'll change the subject. > On 9/20/23 5:37 AM, Yoshihiro Shimoda wrote: > [...] > > >>>>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c > >>>>> index 4d6b3b7d6abb..ce2da5101e51 100644 > >>>>> --- a/drivers/net/ethernet/renesas/ravb_main.c > >>>>> +++ b/drivers/net/ethernet/renesas/ravb_main.c > >>>>> @@ -2885,6 +2885,9 @@ static int ravb_remove(struct platform_device *pdev) > >>>>> struct ravb_private *priv = netdev_priv(ndev); > >>>>> const struct ravb_hw_info *info = priv->info; > >>>>> > >>>>> + netif_carrier_off(ndev); > >>>>> + netif_tx_disable(ndev); > >>>>> + cancel_work_sync(&priv->work); > >>>> > >>>> Still racy, the carrier can come back up after canceling the work. > >>> > >>> I must admit I don't see how/when this driver sets the carrier on ?!? > >> > >> The phylib code does it for this MAC driver, see the call tree of > >> phy_link_change(), on e.g. > >> <snip URL> > >> > >>>> But whatever, this is a non-issue in the first place. > >>> > >>> Do you mean the UaF can't happen? I think that is real. > >> > >> Looks possible to me, at least now... and anyway, shouldn't we clean up > >> after ourselves if we call schedule_work()?However my current impression is > >> that cancel_work_sync() should be called from ravb_close(), after calling > >> phy_{stop|disconnect}()... > > > > I also think so. > > > > ravb_remove() calls unregister_netdev(). > > -> unregister_netdev() calls rtnl_lock() and unregister_netdevice(). > > --> unregiter_netdevice_queue() > > ---> unregiter_netdevice_many() > > ----> unregiter_netdevice_many_notify(). > > -----> dev_close_many() > > ------> __dev_close_many() > > -------> ops->ndo_stop() > > > > ravb_close() calls phy_stop() > > -> phy_state_machine() with PHY_HALTED > > --> phy_link_down() > > ---> phy_link_change() > > ----> netif_carrier_off() > > Thanks for sharing the call chain, I've followed it once again... :-) Thank you :) > > The patch will be the following: > > --- > > drivers/net/ethernet/renesas/ravb_main.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c > > index 7df9f9f8e134..e452d90de7c2 100644 > > --- a/drivers/net/ethernet/renesas/ravb_main.c > > +++ b/drivers/net/ethernet/renesas/ravb_main.c > > @@ -2167,6 +2167,8 @@ static int ravb_close(struct net_device *ndev) > > of_phy_deregister_fixed_link(np); > > } > > > > + cancel_work_sync(&priv->work); > > + > > if (info->multi_irqs) { > > free_irq(priv->tx_irqs[RAVB_NC], ndev); > > free_irq(priv->rx_irqs[RAVB_NC], ndev); > > --- > > > > If this patch is acceptable, I'll submit it. But, what do you think? > > I think it should do the job. Thank you for your comment! I'll make such a patch. > And I suspect you can even test it... :-) IIUC, causing tx timeout is difficult. But, I guess we can add a fault injection code somehow. Best regards, Yoshihiro Shimoda > > Best regards, > > Yoshihiro Shimoda > > [...] > > MBR, Sergey
On 26.07.2023 05:19, Jakub Kicinski wrote: ... > The fact that ravb_tx_timeout_work doesn't take any locks seems much > more suspicious. Does anybody plan to look into this, too? Best regards Dirk
Hello Behme, > From: Behme Dirk (CM/ESO2), Sent: Tuesday, October 10, 2023 9:59 PM > > On 26.07.2023 05:19, Jakub Kicinski wrote: > ... > > The fact that ravb_tx_timeout_work doesn't take any locks seems much > > more suspicious. > Does anybody plan to look into this, too? I believe my fixed patch [1] resolved this issue too. Let me explain it in detail below. In the thread, Jakub also mentioned [2] like below: --- Simplest fix I can think of is to take a reference on the netdev before scheduling the work, and then check if it's still registered in the work itself. Wrap the timeout work in rtnl_lock() to avoid any races there. --- Sergey suggested to add cancel_work_sync() into the ravb_close () [3]. And I investigated calltrace, and then the ravb_close() is under rtnl_lock() [4] like below: ----------------------------------------------------------------------- ravb_remove() calls unregister_netdev(). -> unregister_netdev() calls rtnl_lock() and unregister_netdevice(). --> unregiter_netdevice_queue() ---> unregiter_netdevice_many() ----> unregiter_netdevice_many_notify(). -----> dev_close_many() ------> __dev_close_many() -------> ops->ndo_stop() ravb_close() calls phy_stop() -> phy_state_machine() with PHY_HALTED --> phy_link_down() ---> phy_link_change() ----> netif_carrier_off() ----------------------------------------------------------------------- So, during cancel_work_sync() is waiting for canceling the workqueue in ravb_close(), it's under rtnl_lock() so that no additional locks are needed in ravb_tx_timeout_work(). [1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=3971442870713de527684398416970cf025b4f89 [2] https://lore.kernel.org/netdev/20230727164820.48c9e685@kernel.org/ [3] https://lore.kernel.org/netdev/607f4fe4-5a59-39dd-71c2-0cf769b48187@omp.ru/ [4] https://lore.kernel.org/netdev/OSYPR01MB53341CFDBB49A3BA41A6752CD8F9A@OSYPR01MB5334.jpnprd01.prod.outlook.com/ Best regards, Yoshihiro Shimoda > Best regards > > Dirk
Hi, On 12.10.2023 10:39, Yoshihiro Shimoda wrote: > Hello Behme, > >> From: Behme Dirk (CM/ESO2), Sent: Tuesday, October 10, 2023 9:59 PM >> >> On 26.07.2023 05:19, Jakub Kicinski wrote: >> ... >>> The fact that ravb_tx_timeout_work doesn't take any locks seems much >>> more suspicious. >> Does anybody plan to look into this, too? > > I believe my fixed patch [1] resolved this issue too. I'm not an expert of this driver nor the network stack, so sorry if I'm totally wrong here ;) But somehow this answer confuses me. Let me explain: What you did with [1] is to stop/cancel the workqueue in ravb_close(). That's fine. But that is at driver's close time. What's about driver's normal runtime? What I understood is that ravb_tx_timeout_work() might run totally asynchronously to the rest of the driver. And therefore needs locking/protection/synchronization if it uses resources of the driver which are used elsewhere in the driver, too. I think this is exactly what is described with: > --- > Simplest fix I can think of is to take a reference on the netdev before > scheduling the work, and then check if it's still registered in the work > itself. Wrap the timeout work in rtnl_lock() to avoid any races there. > --- So, where is above done? Not at close time, but at normal run time of the driver? Best regards Dirk > Sergey suggested to add cancel_work_sync() into the ravb_close () [3]. > And I investigated calltrace, and then the ravb_close() is under rtnl_lock() [4] > like below: > ----------------------------------------------------------------------- > ravb_remove() calls unregister_netdev(). > -> unregister_netdev() calls rtnl_lock() and unregister_netdevice(). > --> unregiter_netdevice_queue() > ---> unregiter_netdevice_many() > ----> unregiter_netdevice_many_notify(). > -----> dev_close_many() > ------> __dev_close_many() > -------> ops->ndo_stop() > > ravb_close() calls phy_stop() > -> phy_state_machine() with PHY_HALTED > --> phy_link_down() > ---> phy_link_change() > ----> netif_carrier_off() > ----------------------------------------------------------------------- > > So, during cancel_work_sync() is waiting for canceling the workqueue in ravb_close(), > it's under rtnl_lock() so that no additional locks are needed in ravb_tx_timeout_work(). > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=3971442870713de527684398416970cf025b4f89 > [2] https://lore.kernel.org/netdev/20230727164820.48c9e685@kernel.org/ > [3] https://lore.kernel.org/netdev/607f4fe4-5a59-39dd-71c2-0cf769b48187@omp.ru/ > [4] https://lore.kernel.org/netdev/OSYPR01MB53341CFDBB49A3BA41A6752CD8F9A@OSYPR01MB5334.jpnprd01.prod.outlook.com/ > > Best regards, > Yoshihiro Shimoda > >> Best regards >> >> Dirk
Hi, > From: Behme Dirk (CM/ESO2), Sent: Friday, October 13, 2023 3:05 PM > > Hi, > > On 12.10.2023 10:39, Yoshihiro Shimoda wrote: > > Hello Behme, > > > >> From: Behme Dirk (CM/ESO2), Sent: Tuesday, October 10, 2023 9:59 PM > >> > >> On 26.07.2023 05:19, Jakub Kicinski wrote: > >> ... > >>> The fact that ravb_tx_timeout_work doesn't take any locks seems much > >>> more suspicious. > >> Does anybody plan to look into this, too? > > > > I believe my fixed patch [1] resolved this issue too. > > > I'm not an expert of this driver nor the network stack, so sorry if I'm > totally wrong here ;) But somehow this answer confuses me. Let me explain: > > What you did with [1] is to stop/cancel the workqueue in ravb_close(). > That's fine. But that is at driver's close time. > > What's about driver's normal runtime? What I understood is that > ravb_tx_timeout_work() might run totally asynchronously to the rest of > the driver. And therefore needs locking/protection/synchronization if it > uses resources of the driver which are used elsewhere in the driver, too. > > I think this is exactly what is described with: > > > --- > > Simplest fix I can think of is to take a reference on the netdev before > > scheduling the work, and then check if it's still registered in the work > > itself. Wrap the timeout work in rtnl_lock() to avoid any races there. > > --- > > So, where is above done? Not at close time, but at normal run time of > the driver? Thank you very much for your detailed explanation. I understood it. ravb_tx_timeout_work() still has races between ethtool ops for instance. So, I'll make a patch for it by early next week. However, IIUC, using rtnl_lock() in ravb_tx_timeout_work() is possible to cause deadlock from cancel_work_sync() in ravb_close(). So, I'll use rtnl_trylock() instead. Best regards, Yoshihiro Shimoda > Best regards > > Dirk > > > Sergey suggested to add cancel_work_sync() into the ravb_close () [3]. > > And I investigated calltrace, and then the ravb_close() is under rtnl_lock() [4] > > like below: > > ----------------------------------------------------------------------- > > ravb_remove() calls unregister_netdev(). > > -> unregister_netdev() calls rtnl_lock() and unregister_netdevice(). > > --> unregiter_netdevice_queue() > > ---> unregiter_netdevice_many() > > ----> unregiter_netdevice_many_notify(). > > -----> dev_close_many() > > ------> __dev_close_many() > > -------> ops->ndo_stop() > > > > ravb_close() calls phy_stop() > > -> phy_state_machine() with PHY_HALTED > > --> phy_link_down() > > ---> phy_link_change() > > ----> netif_carrier_off() > > ----------------------------------------------------------------------- > > > > So, during cancel_work_sync() is waiting for canceling the workqueue in ravb_close(), > > it's under rtnl_lock() so that no additional locks are needed in ravb_tx_timeout_work(). > > > > [1] > https://git.kernel.org/pub/scm/linux/kernel/git%25 > 2Fnetdev%2Fnet.git%2Fcommit%2F%3Fid%3D3971442870713de527684398416970cf025b4f89&data=05%7C01%7Cyoshihiro.shimoda.uh%4 > 0renesas.com%7C466e046b20b548b264f808dbcbb255f6%7C53d82571da1947e49cb4625a166a4a2a%7C0%7C0%7C638327739033548199%7CUn > known%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=HkA8f5a > gawjXMvAGkaE6tELaSpjpbIn7M3mU5xbDTD0%3D&reserved=0 > > [2] > https://lore.kernel.org/netdev/20230727164820.48c9e685 > %40kernel.org%2F&data=05%7C01%7Cyoshihiro.shimoda.uh%40renesas.com%7C466e046b20b548b264f808dbcbb255f6%7C53d82571da19 > 47e49cb4625a166a4a2a%7C0%7C0%7C638327739033548199%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTi > I6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=cGvnA8WqxM%2FUDa%2FNS2OBztr1IWgjCX4IzBYXe1LGkZU%3D&reserved=0 > > [3] > https://lore.kernel.org/netdev/607f4fe4-5a59-39dd-71c2 > -0cf769b48187%40omp.ru%2F&data=05%7C01%7Cyoshihiro.shimoda.uh%40renesas.com%7C466e046b20b548b264f808dbcbb255f6%7C53d > 82571da1947e49cb4625a166a4a2a%7C0%7C0%7C638327739033548199%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luM > zIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=OWwBKy%2Fdckgo3clPPfn2hxE4H6ToyqdcbhPhGoqoo30%3D&reserved=0 > > [4] > https://lore.kernel.org/netdev/OSYPR01MB53341CFDBB49A3 > BA41A6752CD8F9A%40OSYPR01MB5334.jpnprd01.prod.outlook.com%2F&data=05%7C01%7Cyoshihiro.shimoda.uh%40renesas.com%7C466 > e046b20b548b264f808dbcbb255f6%7C53d82571da1947e49cb4625a166a4a2a%7C0%7C0%7C638327739033548199%7CUnknown%7CTWFpbGZsb3 > d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=Jfypf10jiUfTqWUAukjnPzIQp > urx7m0ETF5N2Toq8wE%3D&reserved=0 > > > > Best regards, > > Yoshihiro Shimoda > > > >> Best regards > >> > >> Dirk > > -- > ====================================================================== > Dirk Behme Robert Bosch Car Multimedia GmbH > CM/ESO2 > Phone: +49 5121 49-3274 Dirk Behme > Fax: +49 711 811 5053274 PO Box 77 77 77 > mailto:dirk.behme@de.bosch.com D-31132 Hildesheim - Germany > > Bosch Group, Car Multimedia (CM) > Engineering SW Operating Systems 2 (ESO2) > > Robert Bosch Car Multimedia GmbH - Ein Unternehmen der Bosch Gruppe > Sitz: Hildesheim > Registergericht: Amtsgericht Hildesheim HRB 201334 > Aufsichtsratsvorsitzender: Dr. Dirk Hoheisel > Geschäftsführung: Dr. Steffen Berns; > Dr. Sven Ost, Jörg Pollak, Dr. Walter Schirm > ======================================================================
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c index 4d6b3b7d6abb..ce2da5101e51 100644 --- a/drivers/net/ethernet/renesas/ravb_main.c +++ b/drivers/net/ethernet/renesas/ravb_main.c @@ -2885,6 +2885,9 @@ static int ravb_remove(struct platform_device *pdev) struct ravb_private *priv = netdev_priv(ndev); const struct ravb_hw_info *info = priv->info; + netif_carrier_off(ndev); + netif_tx_disable(ndev); + cancel_work_sync(&priv->work); /* Stop PTP Clock driver */ if (info->ccc_gac) ravb_ptp_stop(ndev);