Message ID | 20240123-fix-device-links-overlays-v1-1-9e4f6acaab6c@analog.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel+bounces-35577-ouuuleilei=gmail.com@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:2553:b0:103:945f:af90 with SMTP id p19csp431442dyi; Tue, 23 Jan 2024 08:01:00 -0800 (PST) X-Google-Smtp-Source: AGHT+IGZ2luXoH5Qwe3IQ/H6n7eU0XgNpKWv8XE6WQSdqRbuzAzwYRuvAfQ5Sn9r8vfIEzqy6hvT X-Received: by 2002:a62:d10b:0:b0:6db:7038:fcc1 with SMTP id z11-20020a62d10b000000b006db7038fcc1mr6388561pfg.11.1706025659958; Tue, 23 Jan 2024 08:00:59 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1706025659; cv=pass; d=google.com; s=arc-20160816; b=trQVGmc8KkxyCEoJob3VlNLy6OV3xb9u9qcKCibckUTPov1vQiTp6uQ5AzxENMI2xw Jl3lRyWUSB8hq2kxc8eWisI5LmwCJmZI1crslcu6fMcMfZbW7J08/EKiU/kLVYrylgYe dx05ojHQk+yui5ZqBxVswZA5+23kE9DgvcRPW3EVtg0gjP+VldCgqmzMmye60RvmF6RA PIIrgksXrUJZUo52P0KlBUm04f4kLwx1ETdsaZ3Nkt+In4sTSVHgAqfG3BXTna6LHQX5 JLhy8Ivb774VaWukxo4w2/uX4MvWIMmSr5yEXvqWfTteYHo8we34+goIDG7Kjab85nPg 0uWA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=reply-to:cc:to:message-id:content-transfer-encoding:mime-version :list-unsubscribe:list-subscribe:list-id:precedence:subject:date :from:dkim-signature; bh=KkFgmpT/7JuUocT48SO0zSmBdmQ7ZuO3yWdC9Woka6s=; fh=3Mfh9GavZfmFyJxnA6ZvIAXM1QiCCC7al/GpPhvvgzo=; b=AUWMkW1Pvjz5/ur6tYiKddHSZOr9ybcRvPt6ZPyyqkFOTNQ1xNPfMOPNJ8fvqeSW81 CCLuaGo3m4zHBYMjHi0pA883ASbwLNWA+igoJ2bnm4KJwcXTTdwIzEMzmElbBUVTLkKj D5FQuX31BJHkTRuxECMGIRbhr0LChaVeFEFWdYCk9ELjuauys+FUlgESl6KZ2dehEwey y3rCrmW60bztPTqP8H2xOD8jskjOfCYujA4phkEVPm0RADmWG6g3YC0+P9CBKSA6E8bN y1g3DQjxZTxLgNhytevMyxeuXw4C1KC2h5KHgyvKEEyBGDvNBwDazXZJKD127YM9UHJO Vptw== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=fI2B7tbe; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-35577-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-35577-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [139.178.88.99]) by mx.google.com with ESMTPS id g22-20020a635656000000b005ce801d8713si10126128pgm.822.2024.01.23.08.00.59 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 23 Jan 2024 08:00:59 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-35577-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) client-ip=139.178.88.99; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=fI2B7tbe; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-35577-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-35577-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 347CA294409 for <ouuuleilei@gmail.com>; Tue, 23 Jan 2024 15:48:29 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 76D3F62A19; Tue, 23 Jan 2024 15:40:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="fI2B7tbe" Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 93E2A60263 for <linux-kernel@vger.kernel.org>; Tue, 23 Jan 2024 15:40:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706024431; cv=none; b=b8k0x7epXb9vDHlgb3N+PY97ZZl9CkSh5ALG7fq/dybFqW8xq3om7b5KCmtlAz+fwZCDryg9EHUyHUYDawfy4QzDPRSa99RC8Toh7eeXD/DV0IQtjACXLL8YWUNUchFO76fhjOldx7UsNAMrvNm4w/4SR8sv41qwpF+eCIMIG8Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706024431; c=relaxed/simple; bh=xomYPaRSCJ17SFHQpi42vpCKL2kFTiH6LcE2pydaTNY=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:To:Cc; b=RQn2Gj9+z2uc2+N/T5Xkme9jKaTmgtslrGDHCJSo3vLfqmfCrFiZk8siECGHpz8S17m71SBCqEVp7U/bLdbKsa9suwvvxHgeFa6j4MZbGlpVP2+XB2k++fdPKQEOSaiTlMRaXR1j8hJ+im+6dlIzV2A4n1EiH+PdNpHC3xGdSmU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=fI2B7tbe; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPS id B8400C433C7; Tue, 23 Jan 2024 15:40:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1706024430; bh=xomYPaRSCJ17SFHQpi42vpCKL2kFTiH6LcE2pydaTNY=; h=From:Date:Subject:To:Cc:Reply-To:From; b=fI2B7tbevQiY4Kb/UkiDt7ymq7wWm00jflqq3B62ydvJx9RRyixXhjHqkK7RMLpf/ di8A+wlgTcbHeC7MHiQvnYrWoiZb68iqRC1bPUV4OFeOal3hay1YCkTpaj/Oqplr0D VLlasYfl1Z/snOz6/Ratx/ExzYGDkajUm5T2DTcZPw3wIEwnZsE2F80DnsvxuFf/bK db5nKZK0O4tkN0uVf4Yodvjs9KuifgrwQj38IsPg/d19gdjn0AEgchUeZOucthrZC8 B9os6RxnyeMc+hxvTofg/xwkZm9D2H+FG6VSoZkzjyQLl+3urgTXMRZksR1eutcv0Q HbfSuSUjVFz9w== Received: from aws-us-west-2-korg-lkml-1.web.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.lore.kernel.org (Postfix) with ESMTP id A18A2C47258; Tue, 23 Jan 2024 15:40:30 +0000 (UTC) From: Nuno Sa via B4 Relay <devnull+nuno.sa.analog.com@kernel.org> Date: Tue, 23 Jan 2024 16:40:23 +0100 Subject: [PATCH RESEND RFC] driver: core: don't queue device links removal for dt overlays Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: <linux-kernel.vger.kernel.org> List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit Message-Id: <20240123-fix-device-links-overlays-v1-1-9e4f6acaab6c@analog.com> X-B4-Tracking: v=1; b=H4sIAObdr2UC/6tWKk4tykwtVrJSqFYqSi3LLM7MzwNyDHUUlJIzE vPSU3UzU4B8JSMDIxMDQyNj3bTMCt0UoMLkVN2czLzsYt38stSinMTKYl1TEyOjVANj40QDyyQ loP6ColSgYrDZ0UpBrsGufi4KQW7OSrG1tQCNgijYdwAAAA== To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>, "Rafael J. Wysocki" <rafael@kernel.org>, Frank Rowand <frowand.list@gmail.com>, Rob Herring <robh+dt@kernel.org> Cc: linux-kernel@vger.kernel.org X-Mailer: b4 0.12.3 X-Developer-Signature: v=1; a=ed25519-sha256; t=1706024429; l=4328; i=nuno.sa@analog.com; s=20231116; h=from:subject:message-id; bh=JVOSn21OvtpDUkPOC5fF8puS0bgK3Rufn2sGa79Madg=; b=cfODMPRpL5pVgH6gCwrcVuOzf9hR9c2ZnI52FZAG+HavO68Un6vPiv87iNTZsl81JzVaQTyUi fKkWyAvmDM7Ds5gFXS3ylvbYDGtrMwq6oP9l6UhMUZ3uCoVhclfhgGP X-Developer-Key: i=nuno.sa@analog.com; a=ed25519; pk=3NQwYA013OUYZsmDFBf8rmyyr5iQlxV/9H4/Df83o1E= X-Endpoint-Received: by B4 Relay for nuno.sa@analog.com/20231116 with auth_id=100 X-Original-From: Nuno Sa <nuno.sa@analog.com> Reply-To: <nuno.sa@analog.com> X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1788897562553530350 X-GMAIL-MSGID: 1788897562553530350 |
Series |
[RESEND,RFC] driver: core: don't queue device links removal for dt overlays
|
|
Commit Message
Nuno Sa via B4 Relay
Jan. 23, 2024, 3:40 p.m. UTC
From: Nuno Sa <nuno.sa@analog.com> For device links, releasing the supplier/consumer devices references happens asynchronously in device_link_release_fn(). Hence, the possible release of an of_node is also asynchronous. If these nodes were added through overlays we have a problem because this does not respect the devicetree overlays assumptions that when a changeset is being removed in __of_changeset_entry_destroy(), it must hold the last reference to that node. Due to the async nature of device links that cannot be guaranteed. Given the above, in case one of the link consumer/supplier is part of an overlay node we call directly device_link_release_fn() instead of queueing it. Yes, it might take some significant time for device_link_release_fn() to complete because of synchronize_srcu() but we would need to, anyways, wait for all OF references to be released if we want to respect overlays assumptions. Signed-off-by: Nuno Sa <nuno.sa@analog.com> --- This RFC is a follow up of a previous one that I sent to the devicetree folks [1]. It got rejected because it was not really fixing the root cause of the issue (which I do agree). Please see the link where I fully explain what the issue is. I did also some git blaming and did saw that commit 80dd33cf72d1 ("drivers: base: Fix device link removal") introduced queue_work() as we could be releasing the last device reference and hence sleeping which is against SRCU callback requirements. However, that same commit is now making use of synchronize_srcu() which may take significant time (and I think that's the reason for the work item?). However, given the dt overlays requirements, I'm not seeing any reason to not be able to run device_link_release_fn() synchronously if we detect an OVERLAY node is being released. I mean, even if we come up (and I did some experiments in this regard) with some async mechanism to release the OF nodes refcounts, we still need a synchronization point somewhere. Anyways, I would like to have some feedback on how acceptable would this be or what else could I do so we can have a "clean" dt overlay removal. I'm also including dt folks so they can give some comments on the new device_node_overlay_removal() function. My goal is to try to detect when an overlay is being removed (maybe we could even have an explicit flag for it?) and only directly call device_link_release_fn() in that case. [1]: https://lore.kernel.org/linux-devicetree/20230511151047.1779841-1-nuno.sa@analog.com/ --- drivers/base/core.c | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) --- base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d change-id: 20240123-fix-device-links-overlays-5422e033a09b -- Thanks! - Nuno Sá
Comments
On Wed, Jan 31, 2024 at 1:20 PM Nuno Sá <noname.nuno@gmail.com> wrote: > > On Tue, 2024-01-23 at 16:40 +0100, Nuno Sa via B4 Relay wrote: > > From: Nuno Sa <nuno.sa@analog.com> > > > > For device links, releasing the supplier/consumer devices references > > happens asynchronously in device_link_release_fn(). Hence, the possible > > release of an of_node is also asynchronous. If these nodes were added > > through overlays we have a problem because this does not respect the > > devicetree overlays assumptions that when a changeset is > > being removed in __of_changeset_entry_destroy(), it must hold the last > > reference to that node. Due to the async nature of device links that > > cannot be guaranteed. > > > > Given the above, in case one of the link consumer/supplier is part of > > an overlay node we call directly device_link_release_fn() instead of > > queueing it. Yes, it might take some significant time for > > device_link_release_fn() to complete because of synchronize_srcu() but > > we would need to, anyways, wait for all OF references to be released if > > we want to respect overlays assumptions. > > > > Signed-off-by: Nuno Sa <nuno.sa@analog.com> > > --- > > This RFC is a follow up of a previous one that I sent to the devicetree > > folks [1]. It got rejected because it was not really fixing the root > > cause of the issue (which I do agree). Please see the link where I > > fully explain what the issue is. > > > > I did also some git blaming and did saw that commit > > 80dd33cf72d1 ("drivers: base: Fix device link removal") introduced > > queue_work() as we could be releasing the last device reference and hence > > sleeping which is against SRCU callback requirements. However, that same > > commit is now making use of synchronize_srcu() which may take > > significant time (and I think that's the reason for the work item?). > > > > However, given the dt overlays requirements, I'm not seeing any > > reason to not be able to run device_link_release_fn() synchronously if we > > detect an OVERLAY node is being released. I mean, even if we come up > > (and I did some experiments in this regard) with some async mechanism to > > release the OF nodes refcounts, we still need a synchronization point > > somewhere. > > > > Anyways, I would like to have some feedback on how acceptable would this > > be or what else could I do so we can have a "clean" dt overlay removal. > > > > I'm also including dt folks so they can give some comments on the new > > device_node_overlay_removal() function. My goal is to try to detect when an > > overlay is being removed (maybe we could even have an explicit flag for > > it?) and only directly call device_link_release_fn() in that case. > > > > [1]: > > https://lore.kernel.org/linux-devicetree/20230511151047.1779841-1-nuno.sa@analog.com/ > > --- > > drivers/base/core.c | 25 ++++++++++++++++++++++++- > > 1 file changed, 24 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/base/core.c b/drivers/base/core.c > > index 14d46af40f9a..31ea001f6142 100644 > > --- a/drivers/base/core.c > > +++ b/drivers/base/core.c > > @@ -497,6 +497,18 @@ static struct attribute *devlink_attrs[] = { > > }; > > ATTRIBUTE_GROUPS(devlink); > > > > +static bool device_node_overlay_removal(struct device *dev) > > +{ > > + if (!dev_of_node(dev)) > > + return false; > > + if (!of_node_check_flag(dev->of_node, OF_DETACHED)) > > + return false; > > + if (!of_node_check_flag(dev->of_node, OF_OVERLAY)) > > + return false; > > + > > + return true; > > +} > > + > > static void device_link_release_fn(struct work_struct *work) > > { > > struct device_link *link = container_of(work, struct device_link, > > rm_work); > > @@ -532,8 +544,19 @@ static void devlink_dev_release(struct device *dev) > > * synchronization in device_link_release_fn() and if the consumer or > > * supplier devices get deleted when it runs, so put it into the > > "long" > > * workqueue. > > + * > > + * However, if any of the supplier, consumer nodes is being removed > > + * through overlay removal, the expectation in > > + * __of_changeset_entry_destroy() is for the node 'kref' to be 1 > > which > > + * cannot be guaranteed with the async nature of > > + * device_link_release_fn(). Hence, do it synchronously for the > > overlay > > + * case. > > */ > > - queue_work(system_long_wq, &link->rm_work); > > + if (device_node_overlay_removal(link->consumer) || > > + device_node_overlay_removal(link->supplier)) > > + device_link_release_fn(&link->rm_work); > > + else > > + queue_work(system_long_wq, &link->rm_work); > > } > > > > static struct class devlink_class = { > > > > --- > > base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d > > change-id: 20240123-fix-device-links-overlays-5422e033a09b > > -- > > > > Thanks! > > - Nuno Sá > > > > Hi Rafael, > > Would be nice to have your feedback on this one or if this is a complete nack... > I think calling device_link_release_fn() synchronously is ok but I might be > completely wrong. Well, it sounds like you are expecting me to confirm that what you are doing makes sense, but I cannot do that, because I am not sufficiently familiar with DT overlays. You first need to convince yourself that you are not completely wrong. > +Cc Saravan as he should also be very familiar with device_links and see if the > above fairly simple solution is sane. > > I also don't want to be pushy as I know you guys are all very busy but it's (i > think) the third time I resend the patch :) Sorry about that, I haven't realized that my input is requisite. So the patch not only calls device_link_release_fn() synchronously, but it also calls this function directly and I, personally, wouldn't do at least the latter. It should be fine to run it synchronously from within devlink_dev_release(), it will just take time for the SRCU synchronization, but AFAICS it is not generally safe to run it without dropping the last reference to the device link.
On Wed, Jan 31, 2024 at 3:18 PM Nuno Sá <noname.nuno@gmail.com> wrote: > > On Wed, 2024-01-31 at 14:30 +0100, Rafael J. Wysocki wrote: > > On Wed, Jan 31, 2024 at 1:20 PM Nuno Sá <noname.nuno@gmailcom> wrote: > > > > > > On Tue, 2024-01-23 at 16:40 +0100, Nuno Sa via B4 Relay wrote: > > > > From: Nuno Sa <nuno.sa@analog.com> > > > > > > > > For device links, releasing the supplier/consumer devices references > > > > happens asynchronously in device_link_release_fn(). Hence, the possible > > > > release of an of_node is also asynchronous. If these nodes were added > > > > through overlays we have a problem because this does not respect the > > > > devicetree overlays assumptions that when a changeset is > > > > being removed in __of_changeset_entry_destroy(), it must hold the last > > > > reference to that node. Due to the async nature of device links that > > > > cannot be guaranteed. > > > > > > > > Given the above, in case one of the link consumer/supplier is part of > > > > an overlay node we call directly device_link_release_fn() instead of > > > > queueing it. Yes, it might take some significant time for > > > > device_link_release_fn() to complete because of synchronize_srcu() but > > > > we would need to, anyways, wait for all OF references to be released if > > > > we want to respect overlays assumptions. > > > > > > > > Signed-off-by: Nuno Sa <nuno.sa@analog.com> > > > > --- > > > > This RFC is a follow up of a previous one that I sent to the devicetree > > > > folks [1]. It got rejected because it was not really fixing the root > > > > cause of the issue (which I do agree). Please see the link where I > > > > fully explain what the issue is. > > > > > > > > I did also some git blaming and did saw that commit > > > > 80dd33cf72d1 ("drivers: base: Fix device link removal") introduced > > > > queue_work() as we could be releasing the last device reference and hence > > > > sleeping which is against SRCU callback requirements. However, that same > > > > commit is now making use of synchronize_srcu() which may take > > > > significant time (and I think that's the reason for the work item?). > > > > > > > > However, given the dt overlays requirements, I'm not seeing any > > > > reason to not be able to run device_link_release_fn() synchronously if we > > > > detect an OVERLAY node is being released. I mean, even if we come up > > > > (and I did some experiments in this regard) with some async mechanism to > > > > release the OF nodes refcounts, we still need a synchronization point > > > > somewhere. > > > > > > > > Anyways, I would like to have some feedback on how acceptable would this > > > > be or what else could I do so we can have a "clean" dt overlay removal. > > > > > > > > I'm also including dt folks so they can give some comments on the new > > > > device_node_overlay_removal() function. My goal is to try to detect when > > > > an > > > > overlay is being removed (maybe we could even have an explicit flag for > > > > it?) and only directly call device_link_release_fn() in that case. > > > > > > > > [1]: > > > > https://lore.kernel.org/linux-devicetree/20230511151047.1779841-1-nuno.sa@analog.com/ > > > > --- > > > > drivers/base/core.c | 25 ++++++++++++++++++++++++- > > > > 1 file changed, 24 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/base/core.c b/drivers/base/core.c > > > > index 14d46af40f9a..31ea001f6142 100644 > > > > --- a/drivers/base/core.c > > > > +++ b/drivers/base/core.c > > > > @@ -497,6 +497,18 @@ static struct attribute *devlink_attrs[] = { > > > > }; > > > > ATTRIBUTE_GROUPS(devlink); > > > > > > > > +static bool device_node_overlay_removal(struct device *dev) > > > > +{ > > > > + if (!dev_of_node(dev)) > > > > + return false; > > > > + if (!of_node_check_flag(dev->of_node, OF_DETACHED)) > > > > + return false; > > > > + if (!of_node_check_flag(dev->of_node, OF_OVERLAY)) > > > > + return false; > > > > + > > > > + return true; > > > > +} > > > > + > > > > static void device_link_release_fn(struct work_struct *work) > > > > { > > > > struct device_link *link = container_of(work, struct device_link, > > > > rm_work); > > > > @@ -532,8 +544,19 @@ static void devlink_dev_release(struct device *dev) > > > > * synchronization in device_link_release_fn() and if the consumer > > > > or > > > > * supplier devices get deleted when it runs, so put it into the > > > > "long" > > > > * workqueue. > > > > + * > > > > + * However, if any of the supplier, consumer nodes is being removed > > > > + * through overlay removal, the expectation in > > > > + * __of_changeset_entry_destroy() is for the node 'kref' to be 1 > > > > which > > > > + * cannot be guaranteed with the async nature of > > > > + * device_link_release_fn(). Hence, do it synchronously for the > > > > overlay > > > > + * case. > > > > */ > > > > - queue_work(system_long_wq, &link->rm_work); > > > > + if (device_node_overlay_removal(link->consumer) || > > > > + device_node_overlay_removal(link->supplier)) > > > > + device_link_release_fn(&link->rm_work); > > > > + else > > > > + queue_work(system_long_wq, &link->rm_work); > > > > } > > > > > > > > static struct class devlink_class = { > > > > > > > > --- > > > > base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d > > > > change-id: 20240123-fix-device-links-overlays-5422e033a09b > > > > -- > > > > > > > > Thanks! > > > > - Nuno Sá > > > > > > > > > > Hi Rafael, > > > > > > Would be nice to have your feedback on this one or if this is a complete > > > nack... > > > I think calling device_link_release_fn() synchronously is ok but I might be > > > completely wrong. > > > > Well, it sounds like you are expecting me to confirm that what you are > > doing makes sense, but I cannot do that, because I am not sufficiently > > familiar with DT overlays. > > > > I'm trying to understand if there's no hidden issue by calling it synchronously. > (don't think there is but this is rather core stuff :)). > > From the DT guys, it would be helpful to get feedback on the new > device_node_overlay_removal() helper I'm introducing. The goal is to just do the > sync release in case we detect a node being removed as a result of an overlay > removal. > > > You first need to convince yourself that you are not completely wrong. > > I mean, the problem is definitely real and if you see the link I pasted in the > cover, this will all lead to big splats. > > > > > > +Cc Saravan as he should also be very familiar with device_links and see if > > > the > > > above fairly simple solution is sane. > > > > > > I also don't want to be pushy as I know you guys are all very busy but it's > > > (i > > > think) the third time I resend the patch :) > > > > Sorry about that, I haven't realized that my input is requisite. > > > > Yeah, get_mantainers gives me you and Greg but I think you're the main dev on > dev_links right? > > > So the patch not only calls device_link_release_fn() synchronously, > > but it also calls this function directly and I, personally, wouldn't > > do at least the latter. > > > > So you mean adding something like adding a new > > device_link_release(struct device_link *link) helper > and either call it synchronously from devlink_dev_release() or asynchronously > from device_link_release_fn()? > > I can drop the RFC and send a patch with the above... No, IMV devlink_dev_release() needs to be called via device_link_put_kref(), but it may run device_link_release_fn() directly if the link is marked in a special way or something like this. AFAICS, this is the only way to do it and be sure that all of the references to the link have been dropped when it is freed.
diff --git a/drivers/base/core.c b/drivers/base/core.c index 14d46af40f9a..31ea001f6142 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -497,6 +497,18 @@ static struct attribute *devlink_attrs[] = { }; ATTRIBUTE_GROUPS(devlink); +static bool device_node_overlay_removal(struct device *dev) +{ + if (!dev_of_node(dev)) + return false; + if (!of_node_check_flag(dev->of_node, OF_DETACHED)) + return false; + if (!of_node_check_flag(dev->of_node, OF_OVERLAY)) + return false; + + return true; +} + static void device_link_release_fn(struct work_struct *work) { struct device_link *link = container_of(work, struct device_link, rm_work); @@ -532,8 +544,19 @@ static void devlink_dev_release(struct device *dev) * synchronization in device_link_release_fn() and if the consumer or * supplier devices get deleted when it runs, so put it into the "long" * workqueue. + * + * However, if any of the supplier, consumer nodes is being removed + * through overlay removal, the expectation in + * __of_changeset_entry_destroy() is for the node 'kref' to be 1 which + * cannot be guaranteed with the async nature of + * device_link_release_fn(). Hence, do it synchronously for the overlay + * case. */ - queue_work(system_long_wq, &link->rm_work); + if (device_node_overlay_removal(link->consumer) || + device_node_overlay_removal(link->supplier)) + device_link_release_fn(&link->rm_work); + else + queue_work(system_long_wq, &link->rm_work); } static struct class devlink_class = {