Message ID | 20231130174126.688486-3-herve.codina@bootlin.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:bcd1:0:b0:403:3b70:6f57 with SMTP id r17csp563731vqy; Thu, 30 Nov 2023 09:41:52 -0800 (PST) X-Google-Smtp-Source: AGHT+IGt+MZ0+6RxzTIik7W/pSFzzWs+JAboPBRKW6POEmAMNL7h3qjA4G8pOsZa3cxRMLTJDO0X X-Received: by 2002:a05:6a20:549b:b0:187:758e:9781 with SMTP id i27-20020a056a20549b00b00187758e9781mr34762377pzk.17.1701366112292; Thu, 30 Nov 2023 09:41:52 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1701366112; cv=none; d=google.com; s=arc-20160816; b=y4juQxIKyrspMGFd57nqD1997u1qPY96YFZavVJI2MKY7q2D0P+uwY48pY42PKTofw fgZ/vr5RLHZoCv3slaz0nTRhlbMz6yQg74T2Lu0JKytLQJJiAz3tvsYodqw7hm2QzcYq Z2gGAhZaHcr1NJ+RfdPya3S9Q/KYhGp5xFG/iPTfSRRMMIYgRi1QEh7CEsg5nLD/q3mm K11NRI3fzIgChHAjo1vNniqRVF/S0Np3nrcHyNMqiKJNqE22dKBCi6AsLGPZDD4M4QkP /1/p3M+0ok1GB339gEC8pQvklwGRtyO0zLhN3XOXBDs3Q3/pIPdq5EBpFGC4B8d0ECC0 FLNA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=YCtXOEjZA4vVhqRvZxyqSJ/sJ1S1+pskPHyTx+aj7Bo=; fh=hxCb8OzaXBuSbXNrdSfbxd9E+OOUaTeXTLLu8usFTUA=; b=RIO5VMpvcfSwzi0YYiwxraYAkWKnLSsZMVXcw2/hzAz2T9Eo3CMcU9LlQwsI2hnxwK 2hP9TDGkMG8N01b1HxXjx6OWQotr7F4LKF1ntXj/+F4ggWeUnvKnDLoU0cjUIHIYHvku +GGiQ+YAGRJx2v01KSbDJh5G+fa2cxDkA+ApSQ7yvMrq7iQYj2GhcHn9oVgeBdKF2wff hbaj8Hf90KrEvSyquir0wypQamNjCCEO2Gqgp1i7c5J+GB41GqqrC1HJJ6CQx8jeeVXk jagatHDANKdgAJutgliRMeSGSfJrhXes+uxXIFlX5sQl7c+2scWk5EFxGhG0ozQMXaBI aJHw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bootlin.com header.s=gm1 header.b=LPx1+fRX; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:6 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=bootlin.com Received: from pete.vger.email (pete.vger.email. [2620:137:e000::3:6]) by mx.google.com with ESMTPS id dh3-20020a056a020b8300b005b999968b85si1825830pgb.666.2023.11.30.09.41.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 30 Nov 2023 09:41:52 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:6 as permitted sender) client-ip=2620:137:e000::3:6; Authentication-Results: mx.google.com; dkim=pass header.i=@bootlin.com header.s=gm1 header.b=LPx1+fRX; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:6 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=bootlin.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by pete.vger.email (Postfix) with ESMTP id 351EC82CB858; Thu, 30 Nov 2023 09:41:48 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at pete.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1346604AbjK3Rlc (ORCPT <rfc822;ruipengqi7@gmail.com> + 99 others); Thu, 30 Nov 2023 12:41:32 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53310 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346496AbjK3Rl1 (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 30 Nov 2023 12:41:27 -0500 Received: from relay8-d.mail.gandi.net (relay8-d.mail.gandi.net [IPv6:2001:4b98:dc4:8::228]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 363A810DE; Thu, 30 Nov 2023 09:41:33 -0800 (PST) Received: by mail.gandi.net (Postfix) with ESMTPA id 32A621BF206; Thu, 30 Nov 2023 17:41:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bootlin.com; s=gm1; t=1701366091; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=YCtXOEjZA4vVhqRvZxyqSJ/sJ1S1+pskPHyTx+aj7Bo=; b=LPx1+fRXYo9UizKn96Mo1PH2HBts4hfEy67nbnAsjK2WsGwfe2AqQ1twJi1tJiOE5RZ4C5 gaSWtIScGmcLpC8O+IbgeysUdhiqvbpILR9O1//gEskk4TKHXqDP1FCYd4nJdPuRyRyAgd NSVPU2xlfSCr3WLH1Xe0uUpzvCsH6XGMUxrtfyFn/UgZniaRrDzf1gGBOCJ0OMVPehGvhO pNJAlvN8ZOLQTzremWQjjtGbH7eTtKiUNV7heBDwknRhEZfXR1k7aVESlwktB7HsdUyVFk S449nqA/ggrQQOzKd5s5rvetaK5RNG58hVkjC0PxrrJdkTx03lxYC7XAzgweNw== From: Herve Codina <herve.codina@bootlin.com> To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>, "Rafael J. Wysocki" <rafael@kernel.org>, Rob Herring <robh+dt@kernel.org>, Frank Rowand <frowand.list@gmail.com> Cc: Lizhi Hou <lizhi.hou@amd.com>, Max Zhen <max.zhen@amd.com>, Sonal Santan <sonal.santan@amd.com>, Stefano Stabellini <stefano.stabellini@xilinx.com>, Jonathan Cameron <Jonathan.Cameron@Huawei.com>, linux-kernel@vger.kernel.org, devicetree@vger.kernel.org, Allan Nielsen <allan.nielsen@microchip.com>, Horatiu Vultur <horatiu.vultur@microchip.com>, Steen Hegelund <steen.hegelund@microchip.com>, Thomas Petazzoni <thomas.petazzoni@bootlin.com>, Herve Codina <herve.codina@bootlin.com> Subject: [PATCH 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals Date: Thu, 30 Nov 2023 18:41:09 +0100 Message-ID: <20231130174126.688486-3-herve.codina@bootlin.com> X-Mailer: git-send-email 2.42.0 In-Reply-To: <20231130174126.688486-1-herve.codina@bootlin.com> References: <20231130174126.688486-1-herve.codina@bootlin.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-GND-Sasl: herve.codina@bootlin.com X-Spam-Status: No, score=-0.9 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on pete.vger.email Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (pete.vger.email [0.0.0.0]); Thu, 30 Nov 2023 09:41:48 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1784011672933693390 X-GMAIL-MSGID: 1784011672933693390 |
Series |
Synchronize DT overlay removal with devlink removals
|
|
Commit Message
Herve Codina
Nov. 30, 2023, 5:41 p.m. UTC
In the following sequence:
1) of_platform_depopulate()
2) of_overlay_remove()
During the step 1, devices are destroyed and devlinks are removed.
During the step 2, OF nodes are destroyed but
__of_changeset_entry_destroy() can raise warnings related to missing
of_node_put():
ERROR: memory leak, expected refcount 1 instead of 2 ...
Indeed, during the devlink removals performed at step 1, the removal
itself releasing the device (and the attached of_node) is done by a job
queued in a workqueue and so, it is done asynchronously with respect to
function calls.
When the warning is present, of_node_put() will be called but wrongly
too late from the workqueue job.
In order to be sure that any ongoing devlink removals are done before
the of_node destruction, synchronize the of_overlay_remove() with the
devlink removals.
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
---
drivers/of/overlay.c | 6 ++++++
1 file changed, 6 insertions(+)
Comments
On Thu, Nov 30, 2023 at 9:41 AM Herve Codina <herve.codina@bootlin.com> wrote: > > In the following sequence: > 1) of_platform_depopulate() > 2) of_overlay_remove() > > During the step 1, devices are destroyed and devlinks are removed. > During the step 2, OF nodes are destroyed but > __of_changeset_entry_destroy() can raise warnings related to missing > of_node_put(): > ERROR: memory leak, expected refcount 1 instead of 2 ... > > Indeed, during the devlink removals performed at step 1, the removal > itself releasing the device (and the attached of_node) is done by a job > queued in a workqueue and so, it is done asynchronously with respect to > function calls. > When the warning is present, of_node_put() will be called but wrongly > too late from the workqueue job. > > In order to be sure that any ongoing devlink removals are done before > the of_node destruction, synchronize the of_overlay_remove() with the > devlink removals. > Add Fixes tag for this one too to point to the change that added the workqueue. Please CC Nuno and Luca in your v2 series. > Signed-off-by: Herve Codina <herve.codina@bootlin.com> > --- > drivers/of/overlay.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c > index a9a292d6d59b..5c5f808b163e 100644 > --- a/drivers/of/overlay.c > +++ b/drivers/of/overlay.c > @@ -1202,6 +1202,12 @@ int of_overlay_remove(int *ovcs_id) > goto out; > } > > + /* > + * Wait for any ongoing device link removals before removing some of > + * nodes > + */ > + device_link_wait_removal(); > + Nuno in his patch[1] had this "wait" happen inside __of_changeset_entry_destroy(). Which seems to be necessary to not hit the issue that Luca reported[2] in this patch series. Is there any problem with doing that? Luca for some reason did a unlock/lock(of_mutex) in his test patch and I don't think that's necessary. Can you move this call to where Nuno did it and see if that works for all of you? [1] - https://lore.kernel.org/all/20240205-fix-device-links-overlays-v2-2-5344f8c79d57@analog.com/ [2] - https://lore.kernel.org/all/20231220181627.341e8789@booty/ Thank, Saravana > mutex_lock(&of_mutex); > > ovcs = idr_find(&ovcs_idr, *ovcs_id); > -- > 2.42.0 > >
On Tue, 2024-02-20 at 16:37 -0800, Saravana Kannan wrote: > On Thu, Nov 30, 2023 at 9:41 AM Herve Codina <herve.codina@bootlin.com> wrote: > > > > In the following sequence: > > 1) of_platform_depopulate() > > 2) of_overlay_remove() > > > > During the step 1, devices are destroyed and devlinks are removed. > > During the step 2, OF nodes are destroyed but > > __of_changeset_entry_destroy() can raise warnings related to missing > > of_node_put(): > > ERROR: memory leak, expected refcount 1 instead of 2 ... > > > > Indeed, during the devlink removals performed at step 1, the removal > > itself releasing the device (and the attached of_node) is done by a job > > queued in a workqueue and so, it is done asynchronously with respect to > > function calls. > > When the warning is present, of_node_put() will be called but wrongly > > too late from the workqueue job. > > > > In order to be sure that any ongoing devlink removals are done before > > the of_node destruction, synchronize the of_overlay_remove() with the > > devlink removals. > > > > Add Fixes tag for this one too to point to the change that added the workqueue. > > Please CC Nuno and Luca in your v2 series. > > > Signed-off-by: Herve Codina <herve.codina@bootlin.com> > > --- > > drivers/of/overlay.c | 6 ++++++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c > > index a9a292d6d59b..5c5f808b163e 100644 > > --- a/drivers/of/overlay.c > > +++ b/drivers/of/overlay.c > > @@ -1202,6 +1202,12 @@ int of_overlay_remove(int *ovcs_id) > > goto out; > > } > > > > + /* > > + * Wait for any ongoing device link removals before removing some of > > + * nodes > > + */ > > + device_link_wait_removal(); > > + > > Nuno in his patch[1] had this "wait" happen inside > __of_changeset_entry_destroy(). Which seems to be necessary to not hit > the issue that Luca reported[2] in this patch series. Is there any > problem with doing that? > In my tests, I did not saw any issue. Logically it also makes sense as you wait for all possible refcount drops just before checking your assumptions. But I might be missing something though... > Luca for some reason did a unlock/lock(of_mutex) in his test patch and > I don't think that's necessary. Yes, I agree. queue_work() and flush_worqueue() should already have all the synchronization semantics internally. - Nuno Sá
On Fri, 2024-02-23 at 10:45 +0100, Herve Codina wrote: > Hi Saravana, Nuno, > > On Tue, 20 Feb 2024 16:37:05 -0800 > Saravana Kannan <saravanak@google.com> wrote: > > ... > > > @@ -1202,6 +1202,12 @@ int of_overlay_remove(int *ovcs_id) > > > goto out; > > > } > > > > > > + /* > > > + * Wait for any ongoing device link removals before removing some > > > of > > > + * nodes > > > + */ > > > + device_link_wait_removal(); > > > + > > > > Nuno in his patch[1] had this "wait" happen inside > > __of_changeset_entry_destroy(). Which seems to be necessary to not hit > > the issue that Luca reported[2] in this patch series. Is there any > > problem with doing that? > > Is it the right place to wait ? > > __of_changeset_entry_destroy() can do some of_node_put() and I am not sure > that of_node_put() can call device_put() when the of_node refcount reachs > zero. > I don't think of_node_put() can call device_put(). At least by looking at: https://elixir.bootlin.com/linux/v6.8-rc5/source/drivers/of/dynamic.c#L326 > If of_node_put() cannot call device_put(), I think we can wait in the > of_changeset_destroy(). I.e. the __of_changeset_entry_destroy() caller. > https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/of/dynamic.c#L670 > > What do you think about this ? > Does it make sense ? I think it makes sense from a logical point of view. Like, let's flush the queue right before checking our assumptions... In my tests, I did not saw any issue (Hopefully I was not missing any subtlety). - Nuno Sá
On Tue, 2024-02-27 at 16:24 +0100, Herve Codina wrote: > Hi Saravana, Luca, Nuno, > > On Tue, 20 Feb 2024 16:37:05 -0800 > Saravana Kannan <saravanak@google.com> wrote: > > ... > > > > > > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c > > > index a9a292d6d59b..5c5f808b163e 100644 > > > --- a/drivers/of/overlay.c > > > +++ b/drivers/of/overlay.c > > > @@ -1202,6 +1202,12 @@ int of_overlay_remove(int *ovcs_id) > > > goto out; > > > } > > > > > > + /* > > > + * Wait for any ongoing device link removals before removing some of > > > + * nodes > > > + */ > > > + device_link_wait_removal(); > > > + > > > > Nuno in his patch[1] had this "wait" happen inside > > __of_changeset_entry_destroy(). Which seems to be necessary to not hit > > the issue that Luca reported[2] in this patch series. Is there any > > problem with doing that? > > > > Luca for some reason did a unlock/lock(of_mutex) in his test patch and > > I don't think that's necessary. > > I think the unlock/lock in Luca's case and so in Nuno's case is needed. > > I do the device_link_wait_removal() wihout having the of_mutex locked. > > Now, suppose I do the device_link_wait_removal() call with the of_mutex locked. > The following flow is allowed and a deadlock is present. > > of_overlay_remove() > lock(of_mutex) > device_link_wait_removal() > > And, from the workqueue jobs execution: > ... > device_put() > some_driver->remove() > of_overlay_remove() <--- The job will never end. > It is waiting for of_mutex. > Deadlock > We may need some input from Saravana (and others) on this. I might be missing something but can a put_device() lead into a driver remove callback? Driver code is not device code and put_device() leads to device_release() which will either call the device ->release(), ->type->release() or the class ->dev_release(). And, IMO, calling of_overlay_remove() or something like that (like something that would lead to unbinding a device from it's driver) in a device release callback would be at the very least very questionable. Typically, what you see in there is of_node_put() and things like kfree() of the device itself or any other data. The driver remove callback should be called when unbinding the device from it's drivers and devlinks should also be removed after device_unbind_cleanup() (i.e, after the driver remove callback). Having said the above, the driver core has lots of subtleties so, again, I can be missing something. But at this point I'm still not seeing any deadlock... - Nuno Sá
Hi Nuno, On Tue, 27 Feb 2024 17:55:07 +0100 Nuno Sá <noname.nuno@gmail.com> wrote: > On Tue, 2024-02-27 at 16:24 +0100, Herve Codina wrote: > > Hi Saravana, Luca, Nuno, > > > > On Tue, 20 Feb 2024 16:37:05 -0800 > > Saravana Kannan <saravanak@google.com> wrote: > > > > ... > > > > > > > > > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c > > > > index a9a292d6d59b..5c5f808b163e 100644 > > > > --- a/drivers/of/overlay.c > > > > +++ b/drivers/of/overlay.c > > > > @@ -1202,6 +1202,12 @@ int of_overlay_remove(int *ovcs_id) > > > > goto out; > > > > } > > > > > > > > + /* > > > > + * Wait for any ongoing device link removals before removing some of > > > > + * nodes > > > > + */ > > > > + device_link_wait_removal(); > > > > + > > > > > > Nuno in his patch[1] had this "wait" happen inside > > > __of_changeset_entry_destroy(). Which seems to be necessary to not hit > > > the issue that Luca reported[2] in this patch series. Is there any > > > problem with doing that? > > > > > > Luca for some reason did a unlock/lock(of_mutex) in his test patch and > > > I don't think that's necessary. > > > > I think the unlock/lock in Luca's case and so in Nuno's case is needed. > > > > I do the device_link_wait_removal() wihout having the of_mutex locked. > > > > Now, suppose I do the device_link_wait_removal() call with the of_mutex locked. > > The following flow is allowed and a deadlock is present. > > > > of_overlay_remove() > > lock(of_mutex) > > device_link_wait_removal() > > > > And, from the workqueue jobs execution: > > ... > > device_put() > > some_driver->remove() > > of_overlay_remove() <--- The job will never end. > > It is waiting for of_mutex. > > Deadlock > > > > We may need some input from Saravana (and others) on this. I might be missing > something but can a put_device() lead into a driver remove callback? Driver code is > not device code and put_device() leads to device_release() which will either call the > device ->release(), ->type->release() or the class ->dev_release(). And, IMO, calling > of_overlay_remove() or something like that (like something that would lead to > unbinding a device from it's driver) in a device release callback would be at the > very least very questionable. Typically, what you see in there is of_node_put() and > things like kfree() of the device itself or any other data. I think that calling of_overlay_remove() in a device release callback makes sense. The overlay is used to declare sub-nodes from the device node. It does not add/remove the device node itself but sub-nodes. The use case is the use of DT overlays to describe PCI devices. https://lore.kernel.org/all/1692120000-46900-1-git-send-email-lizhi.hou@amd.com/ https://lore.kernel.org/lkml/20220427094502.456111-1-clement.leger@bootlin.com/ --- 8< --- The lan966x SoCs can be used in two different ways: - It can run Linux by itself, on ARM64 cores included in the SoC. This use-case of the lan966x is currently being upstreamed, using a traditional Device Tree representation of the lan996x HW blocks [1] A number of drivers for the different IPs of the SoC have already been merged in upstream Linux. - It can be used as a PCIe endpoint, connected to a separate platform that acts as the PCIe root complex. In this case, all the devices that are embedded on this SoC are exposed through PCIe BARs and the ARM64 cores of the SoC are not used. Since this is a PCIe card, it can be plugged on any platform, of any architecture supporting PCIe. --- 8< --- This quite long story led to DT overlay support for PCI devices and so the unittest I mentioned: https://elixir.bootlin.com/linux/v6.8-rc6/source/drivers/of/unittest.c#L3946 So, I have a PCI driver that bind to the lan966x PCI board. This driver loads an overlay at probe() and unload it at remove(). Also, this driver can be module. A simple rmmod leads to the remove() call. This driver is not yet upstream because I haven't yet fixed all the issues I encountered that's why of now, I can point only the unittest related to overlay support for PCI. > > The driver remove callback should be called when unbinding the device from it's > drivers and devlinks should also be removed after device_unbind_cleanup() (i.e, after > the driver remove callback). > > Having said the above, the driver core has lots of subtleties so, again, I can be > missing something. But at this point I'm still not seeing any deadlock... > I gave a wrong example. Based on Luca's sequence he gave in https://lore.kernel.org/all/20231220181627.341e8789@booty/ We can have the following: --- 8< --- int of_overlay_remove(int *ovcs_id) { ... device_link_wait_removal(); // proposed by this patch series mutex_lock(&of_mutex); ... ret = __of_changeset_revert_notify(&ovcs->cset); // this ends up calling (excerpt from a long stack trace): // -> of_i2c_notify // -> device_remove // -> devm_regulator_release // -> device_link_remove // -> devlink_dev_release, which queues work for // device_link_release_fn, which in turn calls: // -> device_put // -> device_release // -> {platform,regulator,...}_dev*_release // -> of_node_put() [**] ... free_overlay_changeset(ovcs); // calls: // -> of_changeset_destroy // -> __of_changeset_entry_destroy // -> pr_err("ERROR: memory leak, expected refcount 1 instead of %d... // The error appears or not, based on when the workqueue runs err_unlock: mutex_unlock(&of_mutex); ... } --- 8< --- I think, on your side, you can have something similar. I was wrong (sorry for my mistake). the problem is not device_put() but device_remove(). In my deadlog example, s/device_put()/device_remove()/ Regards, Hervé
Hi Herve, On Tue, 2024-02-27 at 18:54 +0100, Herve Codina wrote: > Hi Nuno, > > On Tue, 27 Feb 2024 17:55:07 +0100 > Nuno Sá <noname.nuno@gmail.com> wrote: > > > On Tue, 2024-02-27 at 16:24 +0100, Herve Codina wrote: > > > Hi Saravana, Luca, Nuno, > > > > > > On Tue, 20 Feb 2024 16:37:05 -0800 > > > Saravana Kannan <saravanak@google.com> wrote: > > > > > > ... > > > > > > > > > > > > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c > > > > > index a9a292d6d59b..5c5f808b163e 100644 > > > > > --- a/drivers/of/overlay.c > > > > > +++ b/drivers/of/overlay.c > > > > > @@ -1202,6 +1202,12 @@ int of_overlay_remove(int *ovcs_id) > > > > > goto out; > > > > > } > > > > > > > > > > + /* > > > > > + * Wait for any ongoing device link removals before removing some > > > > > of > > > > > + * nodes > > > > > + */ > > > > > + device_link_wait_removal(); > > > > > + > > > > > > > > Nuno in his patch[1] had this "wait" happen inside > > > > __of_changeset_entry_destroy(). Which seems to be necessary to not hit > > > > the issue that Luca reported[2] in this patch series. Is there any > > > > problem with doing that? > > > > > > > > Luca for some reason did a unlock/lock(of_mutex) in his test patch and > > > > I don't think that's necessary. > > > > > > I think the unlock/lock in Luca's case and so in Nuno's case is needed. > > > > > > I do the device_link_wait_removal() wihout having the of_mutex locked. > > > > > > Now, suppose I do the device_link_wait_removal() call with the of_mutex locked. > > > The following flow is allowed and a deadlock is present. > > > > > > of_overlay_remove() > > > lock(of_mutex) > > > device_link_wait_removal() > > > > > > And, from the workqueue jobs execution: > > > ... > > > device_put() > > > some_driver->remove() > > > of_overlay_remove() <--- The job will never end. > > > It is waiting for of_mutex. > > > Deadlock > > > > > > > We may need some input from Saravana (and others) on this. I might be missing > > something but can a put_device() lead into a driver remove callback? Driver code > > is > > not device code and put_device() leads to device_release() which will either call > > the > > device ->release(), ->type->release() or the class ->dev_release() And, IMO, > > calling > > of_overlay_remove() or something like that (like something that would lead to > > unbinding a device from it's driver) in a device release callback would be at the > > very least very questionable. Typically, what you see in there is of_node_put() > > and > > things like kfree() of the device itself or any other data. > > I think that calling of_overlay_remove() in a device release callback makes > sense. The overlay is used to declare sub-nodes from the device node. It > does not add/remove the device node itself but sub-nodes. > I think we are speaking about two different things... device release is not the same as the driver remove callback. I admit the pci case seems to be a beast of it's own and I just spent some time (given your links) on it so I can't surely be sure about what I'm about to say... But, AFAICT, I did not saw any overlay or changeset being removed from a kobj_type release callback. > The use case is the use of DT overlays to describe PCI devices. > https://lore.kernel.org/all/1692120000-46900-1-git-send-email-lizhi.hou@amd.com/ > https://lore.kernel.org/lkml/20220427094502.456111-1-clement.leger@bootlin.com/ > --- 8< --- > The lan966x SoCs can be used in two different ways: > > - It can run Linux by itself, on ARM64 cores included in the SoC. This > use-case of the lan966x is currently being upstreamed, using a > traditional Device Tree representation of the lan996x HW blocks [1] > A number of drivers for the different IPs of the SoC have already > been merged in upstream Linux. > > - It can be used as a PCIe endpoint, connected to a separate platform > that acts as the PCIe root complex. In this case, all the devices > that are embedded on this SoC are exposed through PCIe BARs and the > ARM64 cores of the SoC are not used. Since this is a PCIe card, it > can be plugged on any platform, of any architecture supporting PCIe. > --- 8< --- > > This quite long story led to DT overlay support for PCI devices and so the > unittest I mentioned: > https://elixir.bootlin.com/linux/v6.8-rc6/source/drivers/of/unittest.c#L3946 > > > So, I have a PCI driver that bind to the lan966x PCI board. > This driver loads an overlay at probe() and unload it at remove(). > Also, this driver can be module. A simple rmmod leads to the remove() call. > Hmm, and I think that would not be an issue... Note that the code that runs in device_link_release_fn() is doing put_device() which ends ups on the kobj_type release callback and so far I could not see any evidence of such a callback being responsible of calling device_remove() on another device. That would be weird (I think) since I would expect such a call to happen in a kind of unregister function. > This driver is not yet upstream because I haven't yet fixed all the issues I > encountered that's why of now, I can point only the unittest related to overlay > support for PCI. > > > > > The driver remove callback should be called when unbinding the device from it's > > drivers and devlinks should also be removed after device_unbind_cleanup() (i.e, > > after > > the driver remove callback). > > > > Having said the above, the driver core has lots of subtleties so, again, I can be > > missing something. But at this point I'm still not seeing any deadlock.. > > > > I gave a wrong example. > Based on Luca's sequence he gave in > https://lore.kernel.org/all/20231220181627.341e8789@booty/ Regarding Luca's comments, my first approach was actually to just make the devlink removal synchronously... I'm still not sure what would be the issue of doing that (other than potentially waiting some time for the srcu synchronization). About the unlock, I'm just not sure what could happen if someone else (other than us) sneaks in and grabs the of_mutex while we are in the middle of removing an overlay... > > We can have the following: > > --- 8< --- > int of_overlay_remove(int *ovcs_id) > { > ... > > device_link_wait_removal(); // proposed by this patch series > > mutex_lock(&of_mutex); > > ... > > ret = __of_changeset_revert_notify(&ovcs->cset); > // this ends up calling (excerpt from a long stack trace): > // -> of_i2c_notify > // -> device_remove > // -> devm_regulator_release > // -> device_link_remove > // -> devlink_dev_release, which queues work for > // device_link_release_fn, which in turn calls: > // -> device_put > // -> device_release > // -> {platform,regulator,...}_dev*_release > // -> of_node_put() [**] > > ... > > free_overlay_changeset(ovcs); > // calls: > // -> of_changeset_destroy > // -> __of_changeset_entry_destroy > // -> pr_err("ERROR: memory leak, expected refcount 1 instead of %d... > // The error appears or not, based on when the workqueue runs > > err_unlock: > mutex_unlock(&of_mutex); > > ... > } > --- 8< --- > > I think, on your side, you can have something similar. > I was wrong (sorry for my mistake). the problem is not device_put() but > device_remove(). But I'm not seeing how device_remove() can deadlock since I'm not sure we can go from device_link_release_fn() to device_remove(). If there's such a path, then I'll agree on the deadlock. > > In my deadlog example, s/device_put()/device_remove()/ > Exactly... and that is why my first question was to wonder about put_device() being able to call any overlay removal code. So, do you know if it's really possible for a device release callback to end up calling device_remove()? Because, then I could see the deadlock as device_remove() can end up unbinding the device from it's driver and hence calling drv->remove(). - Nuno Sá
On Tue, Feb 27, 2024 at 8:08 PM Nuno Sá <noname.nuno@gmail.com> wrote: > > Hi Herve, > > On Tue, 2024-02-27 at 18:54 +0100, Herve Codina wrote: > > Hi Nuno, > > > > On Tue, 27 Feb 2024 17:55:07 +0100 > > Nuno Sá <noname.nuno@gmail.com> wrote: > > > > > On Tue, 2024-02-27 at 16:24 +0100, Herve Codina wrote: > > > > Hi Saravana, Luca, Nuno, > > > > > > > > On Tue, 20 Feb 2024 16:37:05 -0800 > > > > Saravana Kannan <saravanak@google.com> wrote: > > > > > > > > ... > > > > > > > > > > > > > > > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c > > > > > > index a9a292d6d59b..5c5f808b163e 100644 > > > > > > --- a/drivers/of/overlay.c > > > > > > +++ b/drivers/of/overlay.c > > > > > > @@ -1202,6 +1202,12 @@ int of_overlay_remove(int *ovcs_id) > > > > > > goto out; > > > > > > } > > > > > > > > > > > > + /* > > > > > > + * Wait for any ongoing device link removals before removing some > > > > > > of > > > > > > + * nodes > > > > > > + */ > > > > > > + device_link_wait_removal(); > > > > > > + > > > > > > > > > > Nuno in his patch[1] had this "wait" happen inside > > > > > __of_changeset_entry_destroy(). Which seems to be necessary to not hit > > > > > the issue that Luca reported[2] in this patch series. Is there any > > > > > problem with doing that? > > > > > > > > > > Luca for some reason did a unlock/lock(of_mutex) in his test patch and > > > > > I don't think that's necessary. > > > > > > > > I think the unlock/lock in Luca's case and so in Nuno's case is needed. > > > > > > > > I do the device_link_wait_removal() wihout having the of_mutex locked. > > > > > > > > Now, suppose I do the device_link_wait_removal() call with the of_mutex locked. > > > > The following flow is allowed and a deadlock is present. > > > > > > > > of_overlay_remove() > > > > lock(of_mutex) > > > > device_link_wait_removal() > > > > > > > > And, from the workqueue jobs execution: > > > > ... > > > > device_put() > > > > some_driver->remove() > > > > of_overlay_remove() <--- The job will never end. > > > > It is waiting for of_mutex. > > > > Deadlock > > > > > > > > > > We may need some input from Saravana (and others) on this. I might be missing > > > something but can a put_device() lead into a driver remove callback? Driver code > > > is > > > not device code and put_device() leads to device_release() which will either call > > > the > > > device ->release(), ->type->release() or the class ->dev_release(). And, IMO, > > > calling > > > of_overlay_remove() or something like that (like something that would lead to > > > unbinding a device from it's driver) in a device release callback would be at the > > > very least very questionable. Typically, what you see in there is of_node_put() > > > and > > > things like kfree() of the device itself or any other data. > > > > I think that calling of_overlay_remove() in a device release callback makes > > sense. The overlay is used to declare sub-nodes from the device node. It > > does not add/remove the device node itself but sub-nodes. > > > > I think we are speaking about two different things... device release is not the same > as the driver remove callback. I admit the pci case seems to be a beast of it's own > and I just spent some time (given your links) on it so I can't surely be sure about > what I'm about to say... But, AFAICT, I did not saw any overlay or changeset being > removed from a kobj_type release callback. > > > The use case is the use of DT overlays to describe PCI devices. > > https://lore.kernel.org/all/1692120000-46900-1-git-send-email-lizhi.hou@amd.com/ > > https://lore.kernel.org/lkml/20220427094502.456111-1-clement.leger@bootlin.com/ > > --- 8< --- > > The lan966x SoCs can be used in two different ways: > > > > - It can run Linux by itself, on ARM64 cores included in the SoC. This > > use-case of the lan966x is currently being upstreamed, using a > > traditional Device Tree representation of the lan996x HW blocks [1] > > A number of drivers for the different IPs of the SoC have already > > been merged in upstream Linux. > > > > - It can be used as a PCIe endpoint, connected to a separate platform > > that acts as the PCIe root complex. In this case, all the devices > > that are embedded on this SoC are exposed through PCIe BARs and the > > ARM64 cores of the SoC are not used. Since this is a PCIe card, it > > can be plugged on any platform, of any architecture supporting PCIe. > > --- 8< --- > > > > This quite long story led to DT overlay support for PCI devices and so the > > unittest I mentioned: > > https://elixir.bootlin.com/linux/v6.8-rc6/source/drivers/of/unittest.c#L3946 > > > > > > So, I have a PCI driver that bind to the lan966x PCI board. > > This driver loads an overlay at probe() and unload it at remove(). > > Also, this driver can be module. A simple rmmod leads to the remove() call. > > > > Hmm, and I think that would not be an issue... Note that the code that runs in > device_link_release_fn() is doing put_device() which ends ups on the kobj_type > release callback and so far I could not see any evidence of such a callback being > responsible of calling device_remove() on another device. That would be weird (I > think) since I would expect such a call to happen in a kind of unregister function. > > > This driver is not yet upstream because I haven't yet fixed all the issues I > > encountered that's why of now, I can point only the unittest related to overlay > > support for PCI. > > > > > > > > The driver remove callback should be called when unbinding the device from it's > > > drivers and devlinks should also be removed after device_unbind_cleanup() (i.e, > > > after > > > the driver remove callback). > > > > > > Having said the above, the driver core has lots of subtleties so, again, I can be > > > missing something. But at this point I'm still not seeing any deadlock... > > > > > > > I gave a wrong example. > > Based on Luca's sequence he gave in > > https://lore.kernel.org/all/20231220181627.341e8789@booty/ > > Regarding Luca's comments, my first approach was actually to just make the devlink > removal synchronously... I'm still not sure what would be the issue of doing that > (other than potentially waiting some time for the srcu synchronization). It would allow forward progress to be made, but it would add potential delay for everybody, which is only really needed in the DT overlay case. Sounds like something to avoid to me. > About the unlock, I'm just not sure what could happen if someone else (other than us) sneaks in > and grabs the of_mutex while we are in the middle of removing an overlay.. If that is possible at all, I'd call it a bug.
On Tue, 2024-02-27 at 20:13 +0100, Rafael J. Wysocki wrote: > On Tue, Feb 27, 2024 at 8:08 PM Nuno Sá <noname.nuno@gmail.com> wrote: > > > > Hi Herve, > > > > On Tue, 2024-02-27 at 18:54 +0100, Herve Codina wrote: > > > Hi Nuno, > > > > > > On Tue, 27 Feb 2024 17:55:07 +0100 > > > Nuno Sá <noname.nuno@gmail.com> wrote: > > > > > > > On Tue, 2024-02-27 at 16:24 +0100, Herve Codina wrote: > > > > > Hi Saravana, Luca, Nuno, > > > > > > > > > > On Tue, 20 Feb 2024 16:37:05 -0800 > > > > > Saravana Kannan <saravanak@google.com> wrote: > > > > > > > > > > ... > > > > > > > > > > > > > > > > > > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c > > > > > > > index a9a292d6d59b..5c5f808b163e 100644 > > > > > > > --- a/drivers/of/overlay.c > > > > > > > +++ b/drivers/of/overlay.c > > > > > > > @@ -1202,6 +1202,12 @@ int of_overlay_remove(int *ovcs_id) > > > > > > > goto out; > > > > > > > } > > > > > > > > > > > > > > + /* > > > > > > > + * Wait for any ongoing device link removals before removing > > > > > > > some > > > > > > > of > > > > > > > + * nodes > > > > > > > + */ > > > > > > > + device_link_wait_removal(); > > > > > > > + > > > > > > > > > > > > Nuno in his patch[1] had this "wait" happen inside > > > > > > __of_changeset_entry_destroy(). Which seems to be necessary to not hit > > > > > > the issue that Luca reported[2] in this patch series. Is there any > > > > > > problem with doing that? > > > > > > > > > > > > Luca for some reason did a unlock/lock(of_mutex) in his test patch and > > > > > > I don't think that's necessary. > > > > > > > > > > I think the unlock/lock in Luca's case and so in Nuno's case is needed. > > > > > > > > > > I do the device_link_wait_removal() wihout having the of_mutex locked. > > > > > > > > > > Now, suppose I do the device_link_wait_removal() call with the of_mutex > > > > > locked. > > > > > The following flow is allowed and a deadlock is present. > > > > > > > > > > of_overlay_remove() > > > > > lock(of_mutex) > > > > > device_link_wait_removal() > > > > > > > > > > And, from the workqueue jobs execution: > > > > > ... > > > > > device_put() > > > > > some_driver->remove() > > > > > of_overlay_remove() <--- The job will never end. > > > > > It is waiting for of_mutex. > > > > > Deadlock > > > > > > > > > > > > > We may need some input from Saravana (and others) on this. I might be missing > > > > something but can a put_device() lead into a driver remove callback? Driver > > > > code > > > > is > > > > not device code and put_device() leads to device_release() which will either > > > > call > > > > the > > > > device ->release(), ->type->release() or the class ->dev_release(). And, IMO, > > > > calling > > > > of_overlay_remove() or something like that (like something that would lead to > > > > unbinding a device from it's driver) in a device release callback would be at > > > > the > > > > very least very questionable. Typically, what you see in there is > > > > of_node_put() > > > > and > > > > things like kfree() of the device itself or any other data. > > > > > > I think that calling of_overlay_remove() in a device release callback makes > > > sense. The overlay is used to declare sub-nodes from the device node. It > > > does not add/remove the device node itself but sub-nodes. > > > > > > > I think we are speaking about two different things... device release is not the > > same > > as the driver remove callback. I admit the pci case seems to be a beast of it's > > own > > and I just spent some time (given your links) on it so I can't surely be sure > > about > > what I'm about to say... But, AFAICT, I did not saw any overlay or changeset > > being > > removed from a kobj_type release callback. > > > > > The use case is the use of DT overlays to describe PCI devices. > > > https://lore.kernel.org/all/1692120000-46900-1-git-send-email-lizhi.hou@amd.com/ > > > https://lore.kernel.org/lkml/20220427094502.456111-1-clement.leger@bootlin.com/ > > > --- 8< --- > > > The lan966x SoCs can be used in two different ways: > > > > > > - It can run Linux by itself, on ARM64 cores included in the SoC. This > > > use-case of the lan966x is currently being upstreamed, using a > > > traditional Device Tree representation of the lan996x HW blocks [1] > > > A number of drivers for the different IPs of the SoC have already > > > been merged in upstream Linux. > > > > > > - It can be used as a PCIe endpoint, connected to a separate platform > > > that acts as the PCIe root complex. In this case, all the devices > > > that are embedded on this SoC are exposed through PCIe BARs and the > > > ARM64 cores of the SoC are not used. Since this is a PCIe card, it > > > can be plugged on any platform, of any architecture supporting PCIe. > > > --- 8< --- > > > > > > This quite long story led to DT overlay support for PCI devices and so the > > > unittest I mentioned: > > > https://elixir.bootlin.com/linux/v6.8-rc6/source/drivers/of/unittest.c#L3946 > > > > > > > > > So, I have a PCI driver that bind to the lan966x PCI board. > > > This driver loads an overlay at probe() and unload it at remove(). > > > Also, this driver can be module. A simple rmmod leads to the remove() call. > > > > > > > Hmm, and I think that would not be an issue... Note that the code that runs in > > device_link_release_fn() is doing put_device() which ends ups on the kobj_type > > release callback and so far I could not see any evidence of such a callback being > > responsible of calling device_remove() on another device. That would be weird (I > > think) since I would expect such a call to happen in a kind of unregister > > function. > > > > > This driver is not yet upstream because I haven't yet fixed all the issues I > > > encountered that's why of now, I can point only the unittest related to overlay > > > support for PCI. > > > > > > > > > > > The driver remove callback should be called when unbinding the device from > > > > it's > > > > drivers and devlinks should also be removed after device_unbind_cleanup() > > > > (i.e, > > > > after > > > > the driver remove callback). > > > > > > > > Having said the above, the driver core has lots of subtleties so, again, I > > > > can be > > > > missing something. But at this point I'm still not seeing any deadlock... > > > > > > > > > > I gave a wrong example. > > > Based on Luca's sequence he gave in > > > https://lore.kernel.org/all/20231220181627.341e8789@booty/ > > > > Regarding Luca's comments, my first approach was actually to just make the > > devlink > > removal synchronously... I'm still not sure what would be the issue of doing that > > (other than potentially waiting some time for the srcu synchronization). > > It would allow forward progress to be made, but it would add potential > delay for everybody, which is only really needed in the DT overlay > case. Sounds like something to avoid to me. I mean, we could surely detect (or have a way to do it) if the devlink is being removed as part of an overlay removal and only call device_link_release_fn() synchronously in that case. It would minimize the effects I guess. But yeah, we still need to make sure there's an actual deadlock... I'm still not seeing it but Herve may very well be correct about it. > > > About the unlock, I'm just not sure what could happen if someone else (other than > > us) sneaks in > > and grabs the of_mutex while we are in the middle of removing an overlay... > > If that is possible at all, I'd call it a bug. I think, in theory, it could happen as it looks to be a fairly coarse grained lock for OF: https://elixir.bootlin.com/linux/latest/source/drivers/of/of_private.h#L40 - Nuno Sá
diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c index a9a292d6d59b..5c5f808b163e 100644 --- a/drivers/of/overlay.c +++ b/drivers/of/overlay.c @@ -1202,6 +1202,12 @@ int of_overlay_remove(int *ovcs_id) goto out; } + /* + * Wait for any ongoing device link removals before removing some of + * nodes + */ + device_link_wait_removal(); + mutex_lock(&of_mutex); ovcs = idr_find(&ovcs_idr, *ovcs_id);