Message ID | 20240229105204.720717-3-herve.codina@bootlin.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel+bounces-86580-ouuuleilei=gmail.com@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7301:2097:b0:108:e6aa:91d0 with SMTP id gs23csp308728dyb; Thu, 29 Feb 2024 02:53:21 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCVS8qb2PZMiEmuDW6/+HcPbANfq2d7chMjhf6m7XkftY6QeWXFksJT1e8Sm2+TFbBr4Jvv4Edu758q9OkARzT6xr0gGCA== X-Google-Smtp-Source: AGHT+IHWteYED+lhqUj0Xr78g3/5qag4uFFGeL1J9PXXpsEcfpF985QjT8BkZvDmsYT6CaXmeu1v X-Received: by 2002:a05:6a00:194e:b0:6e3:d857:d35e with SMTP id s14-20020a056a00194e00b006e3d857d35emr1974186pfk.12.1709204001576; Thu, 29 Feb 2024 02:53:21 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1709204001; cv=pass; d=google.com; s=arc-20160816; b=vdw4WxVT21ka8+l1IOIKSZTpY55k29fR9+NGx/O0sK6Hfw8I82cfjyJPDQfkpX165k fxOTyWajA+oNmW7zu8fKn1VpRs4/9J6GT98aLSsA7ApyDUQWz46Dstkm4ogw2+H/qvzT 8voWiMQFrxJR9u63L0bydugg/VRNaeGorCBN76c8aj/PdK4m3pBK5W6gpYXJi1vUngzw gr0S/wcyhmr+EPYoIRpDMmMl0IZctBPCdfHYr3j7M2j7bVZh+ItcEn6AYKUyuv9pBlSr U1/CpIehdrUv5mzGhTHuzY71wLxaCkTT/Vq/lJNmli3tmXvgXcuQc4OxSfAFfmGJigbu Ei3g== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=yCXysWU34VdlLu8PHd2qD3CH0eUBVQ9+sisrVNEEJMU=; fh=kyPkL1xU3CVKvlOH7po29zw99gIYtWokMlxj62AX3dE=; b=yE3vfVNy1uSdedC+B/lQv0bLG4NOhIExLMPSGzr9C9nT84jMEP7+cOJieDk0GUCFlq LKhWWys6ZEpIh7n89f0Pj9npmvzhwT92vVDI5U5STMuNUzOA1MkpqTo0mNcckA2+0dMd 4BaLiecOhKD9/+fWKTth0v868EQSYBIUotjkGrKKgLPjlEyIvZrm3fUEzX0VRr+li4Bz wROHGlcQ/kWxoMpYC4SY6b97T/+8kD0H2PJTyWS1JHNlOozch9TTPvVQ9Nk4Q2I9rzKY 83MQnkbWq8VF1m8dxOSWj+kKAXilBnS6TXh4XOC84a7v/jRTaxAD2CCG9Ji4SBo5Dwnz ZkAA==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@bootlin.com header.s=gm1 header.b=A3ZE4+ob; arc=pass (i=1 spf=pass spfdomain=bootlin.com dkim=pass dkdomain=bootlin.com dmarc=pass fromdomain=bootlin.com); spf=pass (google.com: domain of linux-kernel+bounces-86580-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-86580-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=bootlin.com Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [139.178.88.99]) by mx.google.com with ESMTPS id v5-20020a655685000000b005e49d2a67e2si1136830pgs.555.2024.02.29.02.53.21 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 29 Feb 2024 02:53:21 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-86580-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) client-ip=139.178.88.99; Authentication-Results: mx.google.com; dkim=pass header.i=@bootlin.com header.s=gm1 header.b=A3ZE4+ob; arc=pass (i=1 spf=pass spfdomain=bootlin.com dkim=pass dkdomain=bootlin.com dmarc=pass fromdomain=bootlin.com); spf=pass (google.com: domain of linux-kernel+bounces-86580-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-86580-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=bootlin.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 605BF28758D for <ouuuleilei@gmail.com>; Thu, 29 Feb 2024 10:53:21 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 1408A7C0A5; Thu, 29 Feb 2024 10:52:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bootlin.com header.i=@bootlin.com header.b="A3ZE4+ob" Received: from relay6-d.mail.gandi.net (relay6-d.mail.gandi.net [217.70.183.198]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 279007AE73; Thu, 29 Feb 2024 10:52:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.70.183.198 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709203941; cv=none; b=fE9ZQ0pl9P8ev6Kp0SH7dDBDKtSRsKie96Ew6OcfJxCqDWWCwD0cAWk3Tie0wPh5cp8rTXBnyLjtFo9naMhYiBWb62ObOpILepk8g+DM8SMv4cZOp33gfqZLZGN2UAA4lmmIMs5ika0tbPMbrqjJ++8BTvwffZRp/qnJ5i/079U= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709203941; c=relaxed/simple; bh=zfGeDfyRmaFp1JOEzDfUGcjVtP7tQN/s50BcDdDqvD0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=D2RK6mIXr2WcN++WtosjSXQ4yIzRBIlaQpAGDILuuDQWk7IZmt759c9/364ZwtMpsCpMsa74xe4xSdEVYsXNJQruFhnFeOxurSOHkvv1R0ENuzs54SasiDwywTuklyTxrTXNHcJ0MhFcT8BjNqyPk6P4H9H39Dkpa4nXJDEZSng= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=bootlin.com; spf=pass smtp.mailfrom=bootlin.com; dkim=pass (2048-bit key) header.d=bootlin.com header.i=@bootlin.com header.b=A3ZE4+ob; arc=none smtp.client-ip=217.70.183.198 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=bootlin.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bootlin.com Received: by mail.gandi.net (Postfix) with ESMTPA id 821C9C0003; Thu, 29 Feb 2024 10:52:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bootlin.com; s=gm1; t=1709203937; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=yCXysWU34VdlLu8PHd2qD3CH0eUBVQ9+sisrVNEEJMU=; b=A3ZE4+obaUKBUYrzYYpw7/+4tzUBwLP3K+U4Z+PkVz+LyIlBL0SEinir8alS1UsUr7QYu2 1+okgDWJFW+K4k8eZYmxcbmfl4vihNEvnKl402+rm1RWHb+4TNnm7K/RaoGmrAWj6BGSHB bY2bT4f+1yJj9EP1cD2ujhBJRG64iu8yzudkCRBeuNpF6+IpDoG/KBrqERFPRfmY/65nSF hgyuLNA5+HSCUBqQk4B2AW8m+g7TyJv2Z0uZxR9IlKhCYdtaSErIBVIh4fxWgdX5G/DciB hbkIcOoZF8FXonqU7Z+fYaSGqu2AZqI7YA0bwgXBnHF41MZ2jzVK1WaXb4BfeQ== From: Herve Codina <herve.codina@bootlin.com> To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>, "Rafael J. Wysocki" <rafael@kernel.org>, Rob Herring <robh+dt@kernel.org>, Frank Rowand <frowand.list@gmail.com> Cc: Lizhi Hou <lizhi.hou@amd.com>, Max Zhen <max.zhen@amd.com>, Sonal Santan <sonal.santan@amd.com>, Stefano Stabellini <stefano.stabellini@xilinx.com>, Jonathan Cameron <Jonathan.Cameron@Huawei.com>, linux-kernel@vger.kernel.org, devicetree@vger.kernel.org, Allan Nielsen <allan.nielsen@microchip.com>, Horatiu Vultur <horatiu.vultur@microchip.com>, Steen Hegelund <steen.hegelund@microchip.com>, Luca Ceresoli <luca.ceresoli@bootlin.com>, Nuno Sa <nuno.sa@analog.com>, Thomas Petazzoni <thomas.petazzoni@bootlin.com>, Herve Codina <herve.codina@bootlin.com>, stable@vger.kernel.org Subject: [PATCH v3 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals Date: Thu, 29 Feb 2024 11:52:03 +0100 Message-ID: <20240229105204.720717-3-herve.codina@bootlin.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240229105204.720717-1-herve.codina@bootlin.com> References: <20240229105204.720717-1-herve.codina@bootlin.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: <linux-kernel.vger.kernel.org> List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-GND-Sasl: herve.codina@bootlin.com X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1792230294988011682 X-GMAIL-MSGID: 1792230294988011682 |
Series |
Synchronize DT overlay removal with devlink removals
|
|
Commit Message
Herve Codina
Feb. 29, 2024, 10:52 a.m. UTC
In the following sequence:
1) of_platform_depopulate()
2) of_overlay_remove()
During the step 1, devices are destroyed and devlinks are removed.
During the step 2, OF nodes are destroyed but
__of_changeset_entry_destroy() can raise warnings related to missing
of_node_put():
ERROR: memory leak, expected refcount 1 instead of 2 ...
Indeed, during the devlink removals performed at step 1, the removal
itself releasing the device (and the attached of_node) is done by a job
queued in a workqueue and so, it is done asynchronously with respect to
function calls.
When the warning is present, of_node_put() will be called but wrongly
too late from the workqueue job.
In order to be sure that any ongoing devlink removals are done before
the of_node destruction, synchronize the of_overlay_remove() with the
devlink removals.
Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
Cc: stable@vger.kernel.org
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
---
drivers/of/overlay.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
Comments
On Thu, 2024-02-29 at 11:52 +0100, Herve Codina wrote: > In the following sequence: > 1) of_platform_depopulate() > 2) of_overlay_remove() > > During the step 1, devices are destroyed and devlinks are removed. > During the step 2, OF nodes are destroyed but > __of_changeset_entry_destroy() can raise warnings related to missing > of_node_put(): > ERROR: memory leak, expected refcount 1 instead of 2 ... > > Indeed, during the devlink removals performed at step 1, the removal > itself releasing the device (and the attached of_node) is done by a job > queued in a workqueue and so, it is done asynchronously with respect to > function calls. > When the warning is present, of_node_put() will be called but wrongly > too late from the workqueue job. > > In order to be sure that any ongoing devlink removals are done before > the of_node destruction, synchronize the of_overlay_remove() with the > devlink removals. > > Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal") > Cc: stable@vger.kernel.org > Signed-off-by: Herve Codina <herve.codina@bootlin.com> > --- > drivers/of/overlay.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c > index 2ae7e9d24a64..7a010a62b9d8 100644 > --- a/drivers/of/overlay.c > +++ b/drivers/of/overlay.c > @@ -8,6 +8,7 @@ > > #define pr_fmt(fmt) "OF: overlay: " fmt > > +#include <linux/device.h> This is clearly up to the DT maintainers to decide but, IMHO, I would very much prefer to see fwnode.h included in here rather than directly device.h (so yeah, renaming the function to fwnode_*). But yeah, I might be biased by own series :) > #include <linux/kernel.h> > #include <linux/module.h> > #include <linux/of.h> > @@ -853,6 +854,14 @@ static void free_overlay_changeset(struct > overlay_changeset *ovcs) > { > int i; > > + /* > + * Wait for any ongoing device link removals before removing some of > + * nodes. Drop the global lock while waiting > + */ > + mutex_unlock(&of_mutex); > + device_link_wait_removal(); > + mutex_lock(&of_mutex); I'm still not convinced we need to drop the lock. What happens if someone else grabs the lock while we are in device_link_wait_removal()? Can we guarantee that we can't screw things badly? The question is, do you have a system/use case where you can really see the deadlock happening? Until I see one, I'm very skeptical about this. And if we have one, I'm not really sure this is also the right solution for it. - Nuno Sá
On Thu, Feb 29, 2024 at 12:18:49PM +0100, Nuno Sá wrote: > On Thu, 2024-02-29 at 11:52 +0100, Herve Codina wrote: > > In the following sequence: > > 1) of_platform_depopulate() > > 2) of_overlay_remove() > > > > During the step 1, devices are destroyed and devlinks are removed. > > During the step 2, OF nodes are destroyed but > > __of_changeset_entry_destroy() can raise warnings related to missing > > of_node_put(): > > ERROR: memory leak, expected refcount 1 instead of 2 ... > > > > Indeed, during the devlink removals performed at step 1, the removal > > itself releasing the device (and the attached of_node) is done by a job > > queued in a workqueue and so, it is done asynchronously with respect to > > function calls. > > When the warning is present, of_node_put() will be called but wrongly > > too late from the workqueue job. > > > > In order to be sure that any ongoing devlink removals are done before > > the of_node destruction, synchronize the of_overlay_remove() with the > > devlink removals. > > > > Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal") > > Cc: stable@vger.kernel.org > > Signed-off-by: Herve Codina <herve.codina@bootlin.com> > > --- > > drivers/of/overlay.c | 10 +++++++++- > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c > > index 2ae7e9d24a64..7a010a62b9d8 100644 > > --- a/drivers/of/overlay.c > > +++ b/drivers/of/overlay.c > > @@ -8,6 +8,7 @@ > > > > #define pr_fmt(fmt) "OF: overlay: " fmt > > > > +#include <linux/device.h> > > This is clearly up to the DT maintainers to decide but, IMHO, I would very much > prefer to see fwnode.h included in here rather than directly device.h (so yeah, > renaming the function to fwnode_*). IMO, the DT code should know almost nothing about fwnode because that's the layer above it. But then overlay stuff is kind of a layer above the core DT code too. > But yeah, I might be biased by own series :) > > > #include <linux/kernel.h> > > #include <linux/module.h> > > #include <linux/of.h> > > @@ -853,6 +854,14 @@ static void free_overlay_changeset(struct > > overlay_changeset *ovcs) > > { > > int i; > > > > + /* > > + * Wait for any ongoing device link removals before removing some of > > + * nodes. Drop the global lock while waiting > > + */ > > + mutex_unlock(&of_mutex); > > + device_link_wait_removal(); > > + mutex_lock(&of_mutex); > > I'm still not convinced we need to drop the lock. What happens if someone else > grabs the lock while we are in device_link_wait_removal()? Can we guarantee that > we can't screw things badly? It is also just ugly because it's the callers of free_overlay_changeset() that hold the lock and now we're releasing it behind their back. As device_link_wait_removal() is called before we touch anything, can't it be called before we take the lock? And do we need to call it if applying the overlay fails? Rob
On Mon, 2024-03-04 at 09:22 -0600, Rob Herring wrote: > On Thu, Feb 29, 2024 at 12:18:49PM +0100, Nuno Sá wrote: > > On Thu, 2024-02-29 at 11:52 +0100, Herve Codina wrote: > > > In the following sequence: > > > 1) of_platform_depopulate() > > > 2) of_overlay_remove() > > > > > > During the step 1, devices are destroyed and devlinks are removed. > > > During the step 2, OF nodes are destroyed but > > > __of_changeset_entry_destroy() can raise warnings related to missing > > > of_node_put(): > > > ERROR: memory leak, expected refcount 1 instead of 2 ... > > > > > > Indeed, during the devlink removals performed at step 1, the removal > > > itself releasing the device (and the attached of_node) is done by a job > > > queued in a workqueue and so, it is done asynchronously with respect to > > > function calls. > > > When the warning is present, of_node_put() will be called but wrongly > > > too late from the workqueue job. > > > > > > In order to be sure that any ongoing devlink removals are done before > > > the of_node destruction, synchronize the of_overlay_remove() with the > > > devlink removals. > > > > > > Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal") > > > Cc: stable@vger.kernel.org > > > Signed-off-by: Herve Codina <herve.codina@bootlin.com> > > > --- > > > drivers/of/overlay.c | 10 +++++++++- > > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c > > > index 2ae7e9d24a64..7a010a62b9d8 100644 > > > --- a/drivers/of/overlay.c > > > +++ b/drivers/of/overlay.c > > > @@ -8,6 +8,7 @@ > > > > > > #define pr_fmt(fmt) "OF: overlay: " fmt > > > > > > +#include <linux/device.h> > > > > This is clearly up to the DT maintainers to decide but, IMHO, I would very > > much > > prefer to see fwnode.h included in here rather than directly device.h (so > > yeah, > > renaming the function to fwnode_*). > > IMO, the DT code should know almost nothing about fwnode because that's > the layer above it. But then overlay stuff is kind of a layer above the > core DT code too. Yeah, my reasoning is just that it may be better than knowing about device.h code... But maybe I'm wrong :) > > > But yeah, I might be biased by own series :) > > > > > #include <linux/kernel.h> > > > #include <linux/module.h> > > > #include <linux/of.h> > > > @@ -853,6 +854,14 @@ static void free_overlay_changeset(struct > > > overlay_changeset *ovcs) > > > { > > > int i; > > > > > > + /* > > > + * Wait for any ongoing device link removals before removing some > > > of > > > + * nodes. Drop the global lock while waiting > > > + */ > > > + mutex_unlock(&of_mutex); > > > + device_link_wait_removal(); > > > + mutex_lock(&of_mutex); > > > > I'm still not convinced we need to drop the lock. What happens if someone > > else > > grabs the lock while we are in device_link_wait_removal()? Can we guarantee > > that > > we can't screw things badly? > > It is also just ugly because it's the callers of > free_overlay_changeset() that hold the lock and now we're releasing it > behind their back. > > As device_link_wait_removal() is called before we touch anything, can't > it be called before we take the lock? And do we need to call it if > applying the overlay fails? > My natural feeling was to put it right before checking the node refcount... and I would like to still see proof that there's any potential deadlock. I did not checked the code but the issue with calling it before we take the lock is that likely the device links wont be removed because the overlay removal path (which unbinds devices from drivers) needs to run under the lock? - Nuno Sá
diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c index 2ae7e9d24a64..7a010a62b9d8 100644 --- a/drivers/of/overlay.c +++ b/drivers/of/overlay.c @@ -8,6 +8,7 @@ #define pr_fmt(fmt) "OF: overlay: " fmt +#include <linux/device.h> #include <linux/kernel.h> #include <linux/module.h> #include <linux/of.h> @@ -853,6 +854,14 @@ static void free_overlay_changeset(struct overlay_changeset *ovcs) { int i; + /* + * Wait for any ongoing device link removals before removing some of + * nodes. Drop the global lock while waiting + */ + mutex_unlock(&of_mutex); + device_link_wait_removal(); + mutex_lock(&of_mutex); + if (ovcs->cset.entries.next) of_changeset_destroy(&ovcs->cset); @@ -862,7 +871,6 @@ static void free_overlay_changeset(struct overlay_changeset *ovcs) ovcs->id = 0; } - for (i = 0; i < ovcs->count; i++) { of_node_put(ovcs->fragments[i].target); of_node_put(ovcs->fragments[i].overlay);