Message ID | 1702093576-30405-1-git-send-email-ssengar@linux.microsoft.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:bcd1:0:b0:403:3b70:6f57 with SMTP id r17csp5863794vqy; Fri, 8 Dec 2023 19:46:41 -0800 (PST) X-Google-Smtp-Source: AGHT+IFHEmUe+QRR1jJzd+1P8VbPL7OKOCQPJBqY5FpgD/3jqOywzsXgyqAruzqeD6zzy+QnBn2d X-Received: by 2002:a05:6a21:6da1:b0:190:5d48:ddd9 with SMTP id wl33-20020a056a216da100b001905d48ddd9mr1406495pzb.59.1702093600705; Fri, 08 Dec 2023 19:46:40 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702093600; cv=none; d=google.com; s=arc-20160816; b=KsD4lxEf8oh8ATBz/7ZEulpzVSiOT3FkKDBniHIJX9d6oq70X08YvEGKu6RFWnoZ7l D5gGEuoAW0kp1OlfnK0X8ixm/cOrD7MDq/5kXj08xitdM/4XoPOvtYpBMrrCLq4yWeDg NS6/7z2CgTnNwMlWHquGQk4Tutw4n/Su0s8IWUbrAGRJA/WW22qLKre19ufjVlmprC40 yEAEuszceO33T6II1caVTnLppz5xI+s4YN++4NlZR3ckKMuXxKme8XfYQwtu2N4o6hET v3dpw7DdcVDO73j1AqvSa1DOVR1tTpeH4aoPaANs5SV4/LL82B1zFLtRUAr+3TGBEQrK M/fA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:message-id:date:subject:cc:to:from :dkim-signature:dkim-filter; bh=CndfACro1vB4lqJpl3P/DHaR4O77a0Wj2l2e2pQn8hE=; fh=YWw2jpNt/uItMw8BDFu6bfdh2ZkQWl5/B1Llo++rxpI=; b=f2aGh21J0eeoELDvOE940AogMFPDHwf6f/eQyG9wNjMuQ25LcXxd5x966QE9u+AfWO NwCOfdA5iSTPLWaoCKs/hxxuKS0powBHYEl+4jtDOHq3ijqFZJZ7Gfx2iuU6golqVlYF rOIjZB8XmNvu6O7s3m0HKF2GMMZS5rMEVFXEFq1CkKZ6NnLEN5vR2YgEhjhT4nHc7jLB WFHQxDvmOVyGVQN/jyh/tzhzZDu/Cqc0EOJAE8QnEQBAwHAbRWkMtJoiRoQDixWOnEHm Pxf+LD7P5U0PpNTSEtZ6uQdHk7shGSnXKJBcdCeEldyf4RvSdlkPNWqL0bBlD6AusNzB +Ynw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linux.microsoft.com header.s=default header.b=N1XGDzHm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:4 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linux.microsoft.com Received: from howler.vger.email (howler.vger.email. [2620:137:e000::3:4]) by mx.google.com with ESMTPS id n30-20020a634d5e000000b005be264316d8si2536670pgl.417.2023.12.08.19.46.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 08 Dec 2023 19:46:40 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:4 as permitted sender) client-ip=2620:137:e000::3:4; Authentication-Results: mx.google.com; dkim=pass header.i=@linux.microsoft.com header.s=default header.b=N1XGDzHm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:4 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linux.microsoft.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by howler.vger.email (Postfix) with ESMTP id AC0AA808DDAC; Fri, 8 Dec 2023 19:46:37 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at howler.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229868AbjLIDqU (ORCPT <rfc822;makky5685@gmail.com> + 99 others); Fri, 8 Dec 2023 22:46:20 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36970 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229731AbjLIDqS (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 8 Dec 2023 22:46:18 -0500 Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 08FB210E6; Fri, 8 Dec 2023 19:46:25 -0800 (PST) Received: from linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net (linux.microsoft.com [13.77.154.182]) by linux.microsoft.com (Postfix) with ESMTPSA id 5802320B74C0; Fri, 8 Dec 2023 19:46:24 -0800 (PST) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com 5802320B74C0 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; t=1702093584; bh=CndfACro1vB4lqJpl3P/DHaR4O77a0Wj2l2e2pQn8hE=; h=From:To:Cc:Subject:Date:From; b=N1XGDzHmrjNwtgrvUfVz9ax9DqVgQnm1uI9kAMBPBmgegGR4k4LxktFm8eWq7h1dH wlX3aPIPKtFSYJcAGIfumerdUIFjbsppHq393Be3g3o5hfpSEAnqURpEaE7Hy+VmfP CmJqmLSbqhoRxZW0Szc/uBc/Hq1Ho6fC8A9N0NOY= From: Saurabh Sengar <ssengar@linux.microsoft.com> To: bhelgaas@google.com, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org Cc: alexander.stein@ew.tq-group.com, ssengar@linux.microsoft.com, decui@microsoft.com Subject: [PATCH] PCI/sysfs: Fix race in pci sysfs creation Date: Fri, 8 Dec 2023 19:46:16 -0800 Message-Id: <1702093576-30405-1-git-send-email-ssengar@linux.microsoft.com> X-Mailer: git-send-email 1.8.3.1 X-Spam-Status: No, score=-8.4 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on howler.vger.email Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (howler.vger.email [0.0.0.0]); Fri, 08 Dec 2023 19:46:37 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1784774499254916835 X-GMAIL-MSGID: 1784774499254916835 |
Series |
PCI/sysfs: Fix race in pci sysfs creation
|
|
Commit Message
Saurabh Singh Sengar
Dec. 9, 2023, 3:46 a.m. UTC
Currently there is a race in calling pci_create_resource_files function
from two different therads, first therad is triggered by pci_sysfs_init
from the late initcall where as the second thread is initiated by
pci_bus_add_devices from the respective PCI drivers probe.
The synchronization between these threads relies on the sysfs_initialized
flag. However, in pci_sysfs_init, sysfs_initialized is set right before
calling pci_create_resource_files which is wrong as it can create race
condition with pci_bus_add_devices threads. Fix this by setting
sysfs_initialized flag at the end of pci_sysfs_init and direecly call the
pci_create_resource_files function from it.
There can be an additional case where driver probe is so delayed that
pci_bus_add_devices is called after the sysfs is created by pci_sysfs_init.
In such cases, attempting to access already existing sysfs resources is
unnecessary. Fix this by adding a check for sysfs attributes and return
if they are already allocated.
In both cases, the consequence will be the removal of sysfs resources that
were appropriately allocated by pci_sysfs_init following the warning below.
[ 3.376688] sysfs: cannot create duplicate filename '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/47505500-0001-0000-3130-444531454238/pci0001:00/0001:00:00.0/resource0'
[ 3.385103] CPU: 3 PID: 9 Comm: kworker/u8:0 Not tainted 5.15.0-1046-azure #53~20.04.1-Ubuntu
[ 3.389585] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
[ 3.394663] Workqueue: events_unbound async_run_entry_fn
[ 3.397687] Call Trace:
[ 3.399312] <TASK>
[ 3.400780] dump_stack_lvl+0x38/0x4d
[ 3.402998] dump_stack+0x10/0x16
[ 3.406050] sysfs_warn_dup.cold+0x17/0x2b
[ 3.408476] sysfs_add_file_mode_ns+0x17b/0x190
[ 3.411072] sysfs_create_bin_file+0x64/0x90
[ 3.413514] pci_create_attr+0xc7/0x260
[ 3.415827] pci_create_resource_files+0x6f/0x150
[ 3.418455] pci_create_sysfs_dev_files+0x18/0x30
[ 3.421136] pci_bus_add_device+0x30/0x70
[ 3.423512] pci_bus_add_devices+0x31/0x70
[ 3.425958] hv_pci_probe+0x4ce/0x640
[ 3.428106] vmbus_probe+0x67/0x90
[ 3.430121] really_probe.part.0+0xcb/0x380
[ 3.432516] really_probe+0x40/0x80
[ 3.434581] __driver_probe_device+0xe8/0x140
[ 3.437119] driver_probe_device+0x23/0xb0
[ 3.439504] __driver_attach_async_helper+0x31/0x90
[ 3.442296] async_run_entry_fn+0x33/0x120
[ 3.444666] process_one_work+0x225/0x3d0
[ 3.447043] worker_thread+0x4d/0x3e0
[ 3.449233] ? process_one_work+0x3d0/0x3d0
[ 3.451632] kthread+0x12a/0x150
[ 3.453583] ? set_kthread_struct+0x50/0x50
[ 3.456103] ret_from_fork+0x22/0x30
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
---
There has been earlier attempts to fix this problem, below are the patches
for reference of these attempts.
1. https://lore.kernel.org/linux-pci/20230316103036.1837869-1-alexander.stein@ew.tq-group.com/T/#u
2. https://lwn.net/ml/linux-kernel/20230316091540.494366-1-alexander.stein@ew.tq-group.com/
Bug details: https://bugzilla.kernel.org/show_bug.cgi?id=215515
drivers/pci/pci-sysfs.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
Comments
Hi Saurabh, thanks for the patch. Am Samstag, 9. Dezember 2023, 04:46:16 CET schrieb Saurabh Sengar: > Currently there is a race in calling pci_create_resource_files function > from two different therads, first therad is triggered by pci_sysfs_init > from the late initcall where as the second thread is initiated by > pci_bus_add_devices from the respective PCI drivers probe. > > The synchronization between these threads relies on the sysfs_initialized > flag. However, in pci_sysfs_init, sysfs_initialized is set right before > calling pci_create_resource_files which is wrong as it can create race > condition with pci_bus_add_devices threads. Fix this by setting > sysfs_initialized flag at the end of pci_sysfs_init and direecly call the Small typo here: direecly -> directly > pci_create_resource_files function from it. > > There can be an additional case where driver probe is so delayed that > pci_bus_add_devices is called after the sysfs is created by pci_sysfs_init. > In such cases, attempting to access already existing sysfs resources is > unnecessary. Fix this by adding a check for sysfs attributes and return > if they are already allocated. > > In both cases, the consequence will be the removal of sysfs resources that > were appropriately allocated by pci_sysfs_init following the warning below. I'm not sure if this is the way to go. Unfortunately I can't trigger this error on my imx6 platform at the moment (apparently timing is off). But reading [1] again, the most expressive way is that pci_bus_add_devices() needs to wait until pci_sysfs_init() has passed. Best regards, Alexander [1] https://lore.kernel.org/lkml/a1cca367-52b6-a6b1-fb01-890cad39fd29@suse.com/ > > [ 3.376688] sysfs: cannot create duplicate filename > '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/47505500-00 > 01-0000-3130-444531454238/pci0001:00/0001:00:00.0/resource0' [ 3.385103] > CPU: 3 PID: 9 Comm: kworker/u8:0 Not tainted 5.15.0-1046-azure > #53~20.04.1-Ubuntu [ 3.389585] Hardware name: Microsoft Corporation > Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 [ 3.394663] > Workqueue: events_unbound async_run_entry_fn > [ 3.397687] Call Trace: > [ 3.399312] <TASK> > [ 3.400780] dump_stack_lvl+0x38/0x4d > [ 3.402998] dump_stack+0x10/0x16 > [ 3.406050] sysfs_warn_dup.cold+0x17/0x2b > [ 3.408476] sysfs_add_file_mode_ns+0x17b/0x190 > [ 3.411072] sysfs_create_bin_file+0x64/0x90 > [ 3.413514] pci_create_attr+0xc7/0x260 > [ 3.415827] pci_create_resource_files+0x6f/0x150 > [ 3.418455] pci_create_sysfs_dev_files+0x18/0x30 > [ 3.421136] pci_bus_add_device+0x30/0x70 > [ 3.423512] pci_bus_add_devices+0x31/0x70 > [ 3.425958] hv_pci_probe+0x4ce/0x640 > [ 3.428106] vmbus_probe+0x67/0x90 > [ 3.430121] really_probe.part.0+0xcb/0x380 > [ 3.432516] really_probe+0x40/0x80 > [ 3.434581] __driver_probe_device+0xe8/0x140 > [ 3.437119] driver_probe_device+0x23/0xb0 > [ 3.439504] __driver_attach_async_helper+0x31/0x90 > [ 3.442296] async_run_entry_fn+0x33/0x120 > [ 3.444666] process_one_work+0x225/0x3d0 > [ 3.447043] worker_thread+0x4d/0x3e0 > [ 3.449233] ? process_one_work+0x3d0/0x3d0 > [ 3.451632] kthread+0x12a/0x150 > [ 3.453583] ? set_kthread_struct+0x50/0x50 > [ 3.456103] ret_from_fork+0x22/0x30 > > Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> > --- > There has been earlier attempts to fix this problem, below are the patches > for reference of these attempts. > 1. > https://lore.kernel.org/linux-pci/20230316103036.1837869-1-alexander.stein@ > ew.tq-group.com/T/#u 2. > https://lwn.net/ml/linux-kernel/20230316091540.494366-1-alexander.stein@ew. > tq-group.com/ > > Bug details: https://bugzilla.kernel.org/show_bug.cgi?id=215515 > > drivers/pci/pci-sysfs.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c > index f2909ae93f2f..a31f6f2cf309 100644 > --- a/drivers/pci/pci-sysfs.c > +++ b/drivers/pci/pci-sysfs.c > @@ -1230,6 +1230,10 @@ static int pci_create_resource_files(struct pci_dev > *pdev) if (!pci_resource_len(pdev, i)) > continue; > > + /* Check if resource already allocated and proceed no further */ > + if (pdev->res_attr[i] || pdev->res_attr_wc[i]) > + return 0; > + > retval = pci_create_attr(pdev, i, 0); > /* for prefetchable resources, create a WC mappable file */ > if (!retval && arch_can_pci_mmap_wc() && > @@ -1411,9 +1415,8 @@ static int __init pci_sysfs_init(void) > struct pci_bus *pbus = NULL; > int retval; > > - sysfs_initialized = 1; > for_each_pci_dev(pdev) { > - retval = pci_create_sysfs_dev_files(pdev); > + retval = pci_create_resource_files(pdev); > if (retval) { > pci_dev_put(pdev); > return retval; > @@ -1423,6 +1426,8 @@ static int __init pci_sysfs_init(void) > while ((pbus = pci_find_next_bus(pbus))) > pci_create_legacy_files(pbus); > > + sysfs_initialized = 1; > + > return 0; > } > late_initcall(pci_sysfs_init);
On Tue, Dec 12, 2023 at 08:19:11AM +0100, Alexander Stein wrote: > Hi Saurabh, > > thanks for the patch. > > Am Samstag, 9. Dezember 2023, 04:46:16 CET schrieb Saurabh Sengar: > > Currently there is a race in calling pci_create_resource_files function > > from two different therads, first therad is triggered by pci_sysfs_init > > from the late initcall where as the second thread is initiated by > > pci_bus_add_devices from the respective PCI drivers probe. > > > > The synchronization between these threads relies on the sysfs_initialized > > flag. However, in pci_sysfs_init, sysfs_initialized is set right before > > calling pci_create_resource_files which is wrong as it can create race > > condition with pci_bus_add_devices threads. Fix this by setting > > sysfs_initialized flag at the end of pci_sysfs_init and direecly call the > > Small typo here: direecly -> directly thanks > > > pci_create_resource_files function from it. > > > > There can be an additional case where driver probe is so delayed that > > pci_bus_add_devices is called after the sysfs is created by pci_sysfs_init. > > In such cases, attempting to access already existing sysfs resources is > > unnecessary. Fix this by adding a check for sysfs attributes and return > > if they are already allocated. > > > > In both cases, the consequence will be the removal of sysfs resources that > > were appropriately allocated by pci_sysfs_init following the warning below. > > I'm not sure if this is the way to go. Unfortunately I can't trigger this > error on my imx6 platform at the moment (apparently timing is off). The first case in the commit message is the issue which motivated me to write this patch. The additional case I am explaining in the commit message is not happening for me as well, I have hacked my driver to add a big sleep (10 second) before pci_bus_add_devices to create this scenario. Probably you can try the same as well. The check added for "resource already allocated" is for this additional case only. > But reading [1] again, the most expressive way is that pci_bus_add_devices() > needs to wait until pci_sysfs_init() has passed. For the first case I agree with you. This patch is doing exactly the same by moving sysfs_initialized flag setting at the end of pci_sysfs_init function. Is there anything I might be ovelooking ? - Saurabh > > Best regards, > Alexander > > [1] https://lore.kernel.org/lkml/a1cca367-52b6-a6b1-fb01-890cad39fd29@suse.com/ > > > > > [ 3.376688] sysfs: cannot create duplicate filename > > '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/47505500-00 > > 01-0000-3130-444531454238/pci0001:00/0001:00:00.0/resource0' [ 3.385103] > > CPU: 3 PID: 9 Comm: kworker/u8:0 Not tainted 5.15.0-1046-azure > > #53~20.04.1-Ubuntu [ 3.389585] Hardware name: Microsoft Corporation > > Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 [ 3.394663] > > Workqueue: events_unbound async_run_entry_fn > > [ 3.397687] Call Trace: > > [ 3.399312] <TASK> > > [ 3.400780] dump_stack_lvl+0x38/0x4d > > [ 3.402998] dump_stack+0x10/0x16 > > [ 3.406050] sysfs_warn_dup.cold+0x17/0x2b > > [ 3.408476] sysfs_add_file_mode_ns+0x17b/0x190 > > [ 3.411072] sysfs_create_bin_file+0x64/0x90 > > [ 3.413514] pci_create_attr+0xc7/0x260 > > [ 3.415827] pci_create_resource_files+0x6f/0x150 > > [ 3.418455] pci_create_sysfs_dev_files+0x18/0x30 > > [ 3.421136] pci_bus_add_device+0x30/0x70 > > [ 3.423512] pci_bus_add_devices+0x31/0x70 > > [ 3.425958] hv_pci_probe+0x4ce/0x640 > > [ 3.428106] vmbus_probe+0x67/0x90 > > [ 3.430121] really_probe.part.0+0xcb/0x380 > > [ 3.432516] really_probe+0x40/0x80 > > [ 3.434581] __driver_probe_device+0xe8/0x140 > > [ 3.437119] driver_probe_device+0x23/0xb0 > > [ 3.439504] __driver_attach_async_helper+0x31/0x90 > > [ 3.442296] async_run_entry_fn+0x33/0x120 > > [ 3.444666] process_one_work+0x225/0x3d0 > > [ 3.447043] worker_thread+0x4d/0x3e0 > > [ 3.449233] ? process_one_work+0x3d0/0x3d0 > > [ 3.451632] kthread+0x12a/0x150 > > [ 3.453583] ? set_kthread_struct+0x50/0x50 > > [ 3.456103] ret_from_fork+0x22/0x30 > > > > Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> > > --- > > There has been earlier attempts to fix this problem, below are the patches > > for reference of these attempts. > > 1. > > https://lore.kernel.org/linux-pci/20230316103036.1837869-1-alexander.stein@ > > ew.tq-group.com/T/#u 2. > > https://lwn.net/ml/linux-kernel/20230316091540.494366-1-alexander.stein@ew. > > tq-group.com/ > > > > Bug details: https://bugzilla.kernel.org/show_bug.cgi?id=215515 > > > > drivers/pci/pci-sysfs.c | 9 +++++++-- > > 1 file changed, 7 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c > > index f2909ae93f2f..a31f6f2cf309 100644 > > --- a/drivers/pci/pci-sysfs.c > > +++ b/drivers/pci/pci-sysfs.c > > @@ -1230,6 +1230,10 @@ static int pci_create_resource_files(struct pci_dev > > *pdev) if (!pci_resource_len(pdev, i)) > > continue; > > > > + /* Check if resource already allocated and proceed no > further */ > > + if (pdev->res_attr[i] || pdev->res_attr_wc[i]) > > + return 0; > > + > > retval = pci_create_attr(pdev, i, 0); > > /* for prefetchable resources, create a WC mappable file > */ > > if (!retval && arch_can_pci_mmap_wc() && > > @@ -1411,9 +1415,8 @@ static int __init pci_sysfs_init(void) > > struct pci_bus *pbus = NULL; > > int retval; > > > > - sysfs_initialized = 1; > > for_each_pci_dev(pdev) { > > - retval = pci_create_sysfs_dev_files(pdev); > > + retval = pci_create_resource_files(pdev); > > if (retval) { > > pci_dev_put(pdev); > > return retval; > > @@ -1423,6 +1426,8 @@ static int __init pci_sysfs_init(void) > > while ((pbus = pci_find_next_bus(pbus))) > > pci_create_legacy_files(pbus); > > > > + sysfs_initialized = 1; > > + > > return 0; > > } > > late_initcall(pci_sysfs_init); > > > -- > TQ-Systems GmbH | Mühlstraße 2, Gut Delling | 82229 Seefeld, Germany > Amtsgericht München, HRB 105018 > Geschäftsführer: Detlef Schneider, Rüdiger Stahl, Stefan Schneider > http://www.tq-group.com/ >
On Tue, Dec 12, 2023 at 08:19:11AM +0100, Alexander Stein wrote: > Hi Saurabh, > > thanks for the patch. > > Am Samstag, 9. Dezember 2023, 04:46:16 CET schrieb Saurabh Sengar: > > Currently there is a race in calling pci_create_resource_files function > > from two different therads, first therad is triggered by pci_sysfs_init > > from the late initcall where as the second thread is initiated by > > pci_bus_add_devices from the respective PCI drivers probe. > > > > The synchronization between these threads relies on the sysfs_initialized > > flag. However, in pci_sysfs_init, sysfs_initialized is set right before > > calling pci_create_resource_files which is wrong as it can create race > > condition with pci_bus_add_devices threads. Fix this by setting > > sysfs_initialized flag at the end of pci_sysfs_init and direecly call the > > Small typo here: direecly -> directly > > > pci_create_resource_files function from it. > > > > There can be an additional case where driver probe is so delayed that > > pci_bus_add_devices is called after the sysfs is created by pci_sysfs_init. > > In such cases, attempting to access already existing sysfs resources is > > unnecessary. Fix this by adding a check for sysfs attributes and return > > if they are already allocated. > > > > In both cases, the consequence will be the removal of sysfs resources that > > were appropriately allocated by pci_sysfs_init following the warning below. > > I'm not sure if this is the way to go. Unfortunately I can't trigger this > error on my imx6 platform at the moment (apparently timing is off). > But reading [1] again, the most expressive way is that pci_bus_add_devices() > needs to wait until pci_sysfs_init() has passed. (I correct my self a bit in my earlier reply) The problem with waiting is that sysfs entries will be created by pci_sysfs_init already and when pci_bus_add_devices try to create it will observe that the entries are already existing and in such case PCI code will remove the sysfs entries created by pci_sysfs_init. Resulting system will be having no sysfs entries. - Saurabh
On Tue, Dec 12, 2023 at 12:28:05AM -0800, Saurabh Singh Sengar wrote: > On Tue, Dec 12, 2023 at 08:19:11AM +0100, Alexander Stein wrote: > > Hi Saurabh, > > > > thanks for the patch. > > > > Am Samstag, 9. Dezember 2023, 04:46:16 CET schrieb Saurabh Sengar: > > > Currently there is a race in calling pci_create_resource_files function > > > from two different therads, first therad is triggered by pci_sysfs_init > > > from the late initcall where as the second thread is initiated by > > > pci_bus_add_devices from the respective PCI drivers probe. > > > > > > The synchronization between these threads relies on the sysfs_initialized > > > flag. However, in pci_sysfs_init, sysfs_initialized is set right before > > > calling pci_create_resource_files which is wrong as it can create race > > > condition with pci_bus_add_devices threads. Fix this by setting > > > sysfs_initialized flag at the end of pci_sysfs_init and direecly call the > > > > Small typo here: direecly -> directly > > > > > pci_create_resource_files function from it. > > > > > > There can be an additional case where driver probe is so delayed that > > > pci_bus_add_devices is called after the sysfs is created by pci_sysfs_init. > > > In such cases, attempting to access already existing sysfs resources is > > > unnecessary. Fix this by adding a check for sysfs attributes and return > > > if they are already allocated. > > > > > > In both cases, the consequence will be the removal of sysfs resources that > > > were appropriately allocated by pci_sysfs_init following the warning below. > > > > I'm not sure if this is the way to go. Unfortunately I can't trigger this > > error on my imx6 platform at the moment (apparently timing is off). > > But reading [1] again, the most expressive way is that pci_bus_add_devices() > > needs to wait until pci_sysfs_init() has passed. > > (I correct my self a bit in my earlier reply) > The problem with waiting is that sysfs entries will be created by pci_sysfs_init > already and when pci_bus_add_devices try to create it will observe that the > entries are already existing and in such case PCI code will remove the sysfs > entries created by pci_sysfs_init. Resulting system will be having no sysfs > entries. Hi Alexander, Have you got time to check this ? Please let me know if you think there is any concern left with this patch. - Saurabh
On Wed, Jan 03, 2024 at 09:38:03PM -0800, Saurabh Singh Sengar wrote: > On Tue, Dec 12, 2023 at 12:28:05AM -0800, Saurabh Singh Sengar wrote: > > On Tue, Dec 12, 2023 at 08:19:11AM +0100, Alexander Stein wrote: > > > Hi Saurabh, > > > > > > thanks for the patch. > > > > > > Am Samstag, 9. Dezember 2023, 04:46:16 CET schrieb Saurabh Sengar: > > > > Currently there is a race in calling pci_create_resource_files function > > > > from two different therads, first therad is triggered by pci_sysfs_init > > > > from the late initcall where as the second thread is initiated by > > > > pci_bus_add_devices from the respective PCI drivers probe. > > > > > > > > The synchronization between these threads relies on the sysfs_initialized > > > > flag. However, in pci_sysfs_init, sysfs_initialized is set right before > > > > calling pci_create_resource_files which is wrong as it can create race > > > > condition with pci_bus_add_devices threads. Fix this by setting > > > > sysfs_initialized flag at the end of pci_sysfs_init and direecly call the > > > > > > Small typo here: direecly -> directly > > > > > > > pci_create_resource_files function from it. > > > > > > > > There can be an additional case where driver probe is so delayed that > > > > pci_bus_add_devices is called after the sysfs is created by pci_sysfs_init. > > > > In such cases, attempting to access already existing sysfs resources is > > > > unnecessary. Fix this by adding a check for sysfs attributes and return > > > > if they are already allocated. > > > > > > > > In both cases, the consequence will be the removal of sysfs resources that > > > > were appropriately allocated by pci_sysfs_init following the warning below. > > > > > > I'm not sure if this is the way to go. Unfortunately I can't trigger this > > > error on my imx6 platform at the moment (apparently timing is off). > > > But reading [1] again, the most expressive way is that pci_bus_add_devices() > > > needs to wait until pci_sysfs_init() has passed. > > > > (I correct my self a bit in my earlier reply) > > The problem with waiting is that sysfs entries will be created by pci_sysfs_init > > already and when pci_bus_add_devices try to create it will observe that the > > entries are already existing and in such case PCI code will remove the sysfs > > entries created by pci_sysfs_init. Resulting system will be having no sysfs > > entries. > > > Hi Alexander, > Have you got time to check this ? Please let me know if you think there is any > concern left with this patch. Hi PCI Maintainers, If there is no objection, can we take this patch in ? Regards, Saurabh
On Fri, Jan 19, 2024 at 10:41:56PM -0800, Saurabh Singh Sengar wrote: > On Wed, Jan 03, 2024 at 09:38:03PM -0800, Saurabh Singh Sengar wrote: > > On Tue, Dec 12, 2023 at 12:28:05AM -0800, Saurabh Singh Sengar wrote: > > > On Tue, Dec 12, 2023 at 08:19:11AM +0100, Alexander Stein wrote: > > > > Hi Saurabh, > > > > > > > > thanks for the patch. > > > > > > > > Am Samstag, 9. Dezember 2023, 04:46:16 CET schrieb Saurabh Sengar: > > > > > Currently there is a race in calling pci_create_resource_files function > > > > > from two different therads, first therad is triggered by pci_sysfs_init > > > > > from the late initcall where as the second thread is initiated by > > > > > pci_bus_add_devices from the respective PCI drivers probe. > > > > > > > > > > The synchronization between these threads relies on the sysfs_initialized > > > > > flag. However, in pci_sysfs_init, sysfs_initialized is set right before > > > > > calling pci_create_resource_files which is wrong as it can create race > > > > > condition with pci_bus_add_devices threads. Fix this by setting > > > > > sysfs_initialized flag at the end of pci_sysfs_init and direecly call the > > > > > > > > Small typo here: direecly -> directly > > > > > > > > > pci_create_resource_files function from it. > > > > > > > > > > There can be an additional case where driver probe is so delayed that > > > > > pci_bus_add_devices is called after the sysfs is created by pci_sysfs_init. > > > > > In such cases, attempting to access already existing sysfs resources is > > > > > unnecessary. Fix this by adding a check for sysfs attributes and return > > > > > if they are already allocated. > > > > > > > > > > In both cases, the consequence will be the removal of sysfs resources that > > > > > were appropriately allocated by pci_sysfs_init following the warning below. > > > > > > > > I'm not sure if this is the way to go. Unfortunately I can't trigger this > > > > error on my imx6 platform at the moment (apparently timing is off). > > > > But reading [1] again, the most expressive way is that pci_bus_add_devices() > > > > needs to wait until pci_sysfs_init() has passed. > > > > > > (I correct my self a bit in my earlier reply) > > > The problem with waiting is that sysfs entries will be created by pci_sysfs_init > > > already and when pci_bus_add_devices try to create it will observe that the > > > entries are already existing and in such case PCI code will remove the sysfs > > > entries created by pci_sysfs_init. Resulting system will be having no sysfs > > > entries. > > > > > > Hi Alexander, > > Have you got time to check this ? Please let me know if you think there is any > > concern left with this patch. > > Hi PCI Maintainers, > > If there is no objection, can we take this patch in ? > ping > > Regards, > Saurabh > > > >
[+cc Krzysztof] On Fri, Dec 08, 2023 at 07:46:16PM -0800, Saurabh Sengar wrote: > Currently there is a race in calling pci_create_resource_files function > from two different therads, first therad is triggered by pci_sysfs_init > from the late initcall where as the second thread is initiated by > pci_bus_add_devices from the respective PCI drivers probe. > > The synchronization between these threads relies on the sysfs_initialized > flag. However, in pci_sysfs_init, sysfs_initialized is set right before > calling pci_create_resource_files which is wrong as it can create race > condition with pci_bus_add_devices threads. Fix this by setting > sysfs_initialized flag at the end of pci_sysfs_init and direecly call the > pci_create_resource_files function from it. > > There can be an additional case where driver probe is so delayed that > pci_bus_add_devices is called after the sysfs is created by pci_sysfs_init. > In such cases, attempting to access already existing sysfs resources is > unnecessary. Fix this by adding a check for sysfs attributes and return > if they are already allocated. > > In both cases, the consequence will be the removal of sysfs resources that > were appropriately allocated by pci_sysfs_init following the warning below. > > [ 3.376688] sysfs: cannot create duplicate filename '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/47505500-0001-0000-3130-444531454238/pci0001:00/0001:00:00.0/resource0' > [ 3.385103] CPU: 3 PID: 9 Comm: kworker/u8:0 Not tainted 5.15.0-1046-azure #53~20.04.1-Ubuntu > [ 3.389585] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 > [ 3.394663] Workqueue: events_unbound async_run_entry_fn > [ 3.397687] Call Trace: > [ 3.399312] <TASK> > [ 3.400780] dump_stack_lvl+0x38/0x4d > [ 3.402998] dump_stack+0x10/0x16 > [ 3.406050] sysfs_warn_dup.cold+0x17/0x2b > [ 3.408476] sysfs_add_file_mode_ns+0x17b/0x190 > [ 3.411072] sysfs_create_bin_file+0x64/0x90 > [ 3.413514] pci_create_attr+0xc7/0x260 > [ 3.415827] pci_create_resource_files+0x6f/0x150 > [ 3.418455] pci_create_sysfs_dev_files+0x18/0x30 > [ 3.421136] pci_bus_add_device+0x30/0x70 > [ 3.423512] pci_bus_add_devices+0x31/0x70 > [ 3.425958] hv_pci_probe+0x4ce/0x640 > [ 3.428106] vmbus_probe+0x67/0x90 > [ 3.430121] really_probe.part.0+0xcb/0x380 > [ 3.432516] really_probe+0x40/0x80 > [ 3.434581] __driver_probe_device+0xe8/0x140 > [ 3.437119] driver_probe_device+0x23/0xb0 > [ 3.439504] __driver_attach_async_helper+0x31/0x90 > [ 3.442296] async_run_entry_fn+0x33/0x120 > [ 3.444666] process_one_work+0x225/0x3d0 > [ 3.447043] worker_thread+0x4d/0x3e0 > [ 3.449233] ? process_one_work+0x3d0/0x3d0 > [ 3.451632] kthread+0x12a/0x150 > [ 3.453583] ? set_kthread_struct+0x50/0x50 > [ 3.456103] ret_from_fork+0x22/0x30 > > Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> > --- > There has been earlier attempts to fix this problem, below are the patches > for reference of these attempts. > 1. https://lore.kernel.org/linux-pci/20230316103036.1837869-1-alexander.stein@ew.tq-group.com/T/#u > 2. https://lwn.net/ml/linux-kernel/20230316091540.494366-1-alexander.stein@ew.tq-group.com/ > > Bug details: https://bugzilla.kernel.org/show_bug.cgi?id=215515 > > drivers/pci/pci-sysfs.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c > index f2909ae93f2f..a31f6f2cf309 100644 > --- a/drivers/pci/pci-sysfs.c > +++ b/drivers/pci/pci-sysfs.c > @@ -1230,6 +1230,10 @@ static int pci_create_resource_files(struct pci_dev *pdev) > if (!pci_resource_len(pdev, i)) > continue; > > + /* Check if resource already allocated and proceed no further */ > + if (pdev->res_attr[i] || pdev->res_attr_wc[i]) > + return 0; > + > retval = pci_create_attr(pdev, i, 0); > /* for prefetchable resources, create a WC mappable file */ > if (!retval && arch_can_pci_mmap_wc() && > @@ -1411,9 +1415,8 @@ static int __init pci_sysfs_init(void) > struct pci_bus *pbus = NULL; > int retval; > > - sysfs_initialized = 1; > for_each_pci_dev(pdev) { > - retval = pci_create_sysfs_dev_files(pdev); > + retval = pci_create_resource_files(pdev); > if (retval) { > pci_dev_put(pdev); > return retval; > @@ -1423,6 +1426,8 @@ static int __init pci_sysfs_init(void) > while ((pbus = pci_find_next_bus(pbus))) > pci_create_legacy_files(pbus); > > + sysfs_initialized = 1; > + > return 0; > } > late_initcall(pci_sysfs_init); Sorry for the delay in looking at this. Consider the following sequence where thread A is executing pci_sysfs_init() at the same time as thread B enumerates and adds device X: Thread A: pci_sysfs_init for_each_pci_dev(pdev) { # device X not included pci_create_resource_files(pdev); } Thread B: pci_bus_add_device # add device X pci_create_sysfs_dev_files if (!sysfs_initialized) # sysfs_initialized still zero return -EACCES; pci_create_resource_files(pdev); # not executed Thread A: while ((pbus = pci_find_next_bus(pbus))) pci_create_legacy_files(pbus); sysfs_initialized = 1; Doesn't this have a similar race where instead of the duplicate filename from having two threads try to create the resource files, neither thread creates them and device X ends up with no resource files at all? Krzysztof has done a ton of work to convert these files to static attributes, where the device model prevents most of these races: 506140f9c06b ("PCI/sysfs: Convert "index", "acpi_index", "label" to static attributes") d93f8399053d ("PCI/sysfs: Convert "vpd" to static attribute") f42c35ea3b13 ("PCI/sysfs: Convert "reset" to static attribute") 527139d738d7 ("PCI/sysfs: Convert "rom" to static attribute") e1d3f3268b0e ("PCI/sysfs: Convert "config" to static attribute") and he even posted a series to do the same for the resource files: https://lore.kernel.org/linux-pci/20210910202623.2293708-1-kw@linux.com/ I can't remember why we didn't apply that at the time, and it no longer applies cleanly, but I think that's the direction we should go. Bjorn
On Tue, Feb 06, 2024 at 04:07:15PM -0600, Bjorn Helgaas wrote: > [+cc Krzysztof] > > On Fri, Dec 08, 2023 at 07:46:16PM -0800, Saurabh Sengar wrote: > > Currently there is a race in calling pci_create_resource_files function > > from two different therads, first therad is triggered by pci_sysfs_init > > from the late initcall where as the second thread is initiated by > > pci_bus_add_devices from the respective PCI drivers probe. > > > > The synchronization between these threads relies on the sysfs_initialized > > flag. However, in pci_sysfs_init, sysfs_initialized is set right before > > calling pci_create_resource_files which is wrong as it can create race > > condition with pci_bus_add_devices threads. Fix this by setting > > sysfs_initialized flag at the end of pci_sysfs_init and direecly call the > > pci_create_resource_files function from it. > > > > There can be an additional case where driver probe is so delayed that > > pci_bus_add_devices is called after the sysfs is created by pci_sysfs_init. > > In such cases, attempting to access already existing sysfs resources is > > unnecessary. Fix this by adding a check for sysfs attributes and return > > if they are already allocated. > > > > In both cases, the consequence will be the removal of sysfs resources that > > were appropriately allocated by pci_sysfs_init following the warning below. > > > > [ 3.376688] sysfs: cannot create duplicate filename '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/47505500-0001-0000-3130-444531454238/pci0001:00/0001:00:00.0/resource0' > > [ 3.385103] CPU: 3 PID: 9 Comm: kworker/u8:0 Not tainted 5.15.0-1046-azure #53~20.04.1-Ubuntu > > [ 3.389585] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 > > [ 3.394663] Workqueue: events_unbound async_run_entry_fn > > [ 3.397687] Call Trace: > > [ 3.399312] <TASK> > > [ 3.400780] dump_stack_lvl+0x38/0x4d > > [ 3.402998] dump_stack+0x10/0x16 > > [ 3.406050] sysfs_warn_dup.cold+0x17/0x2b > > [ 3.408476] sysfs_add_file_mode_ns+0x17b/0x190 > > [ 3.411072] sysfs_create_bin_file+0x64/0x90 > > [ 3.413514] pci_create_attr+0xc7/0x260 > > [ 3.415827] pci_create_resource_files+0x6f/0x150 > > [ 3.418455] pci_create_sysfs_dev_files+0x18/0x30 > > [ 3.421136] pci_bus_add_device+0x30/0x70 > > [ 3.423512] pci_bus_add_devices+0x31/0x70 > > [ 3.425958] hv_pci_probe+0x4ce/0x640 > > [ 3.428106] vmbus_probe+0x67/0x90 > > [ 3.430121] really_probe.part.0+0xcb/0x380 > > [ 3.432516] really_probe+0x40/0x80 > > [ 3.434581] __driver_probe_device+0xe8/0x140 > > [ 3.437119] driver_probe_device+0x23/0xb0 > > [ 3.439504] __driver_attach_async_helper+0x31/0x90 > > [ 3.442296] async_run_entry_fn+0x33/0x120 > > [ 3.444666] process_one_work+0x225/0x3d0 > > [ 3.447043] worker_thread+0x4d/0x3e0 > > [ 3.449233] ? process_one_work+0x3d0/0x3d0 > > [ 3.451632] kthread+0x12a/0x150 > > [ 3.453583] ? set_kthread_struct+0x50/0x50 > > [ 3.456103] ret_from_fork+0x22/0x30 > > > > Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> > > --- > > There has been earlier attempts to fix this problem, below are the patches > > for reference of these attempts. > > 1. https://lore.kernel.org/linux-pci/20230316103036.1837869-1-alexander.stein@ew.tq-group.com/T/#u > > 2. https://lwn.net/ml/linux-kernel/20230316091540.494366-1-alexander.stein@ew.tq-group.com/ > > > > Bug details: https://bugzilla.kernel.org/show_bug.cgi?id=215515 > > > > drivers/pci/pci-sysfs.c | 9 +++++++-- > > 1 file changed, 7 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c > > index f2909ae93f2f..a31f6f2cf309 100644 > > --- a/drivers/pci/pci-sysfs.c > > +++ b/drivers/pci/pci-sysfs.c > > @@ -1230,6 +1230,10 @@ static int pci_create_resource_files(struct pci_dev *pdev) > > if (!pci_resource_len(pdev, i)) > > continue; > > > > + /* Check if resource already allocated and proceed no further */ > > + if (pdev->res_attr[i] || pdev->res_attr_wc[i]) > > + return 0; > > + > > retval = pci_create_attr(pdev, i, 0); > > /* for prefetchable resources, create a WC mappable file */ > > if (!retval && arch_can_pci_mmap_wc() && > > @@ -1411,9 +1415,8 @@ static int __init pci_sysfs_init(void) > > struct pci_bus *pbus = NULL; > > int retval; > > > > - sysfs_initialized = 1; > > for_each_pci_dev(pdev) { > > - retval = pci_create_sysfs_dev_files(pdev); > > + retval = pci_create_resource_files(pdev); > > if (retval) { > > pci_dev_put(pdev); > > return retval; > > @@ -1423,6 +1426,8 @@ static int __init pci_sysfs_init(void) > > while ((pbus = pci_find_next_bus(pbus))) > > pci_create_legacy_files(pbus); > > > > + sysfs_initialized = 1; > > + > > return 0; > > } > > late_initcall(pci_sysfs_init); > > Sorry for the delay in looking at this. Consider the following > sequence where thread A is executing pci_sysfs_init() at the same time > as thread B enumerates and adds device X: > > Thread A: > > pci_sysfs_init > for_each_pci_dev(pdev) { # device X not included > pci_create_resource_files(pdev); > } > > Thread B: > > pci_bus_add_device # add device X > pci_create_sysfs_dev_files > if (!sysfs_initialized) # sysfs_initialized still zero > return -EACCES; > pci_create_resource_files(pdev); # not executed > > Thread A: > > while ((pbus = pci_find_next_bus(pbus))) > pci_create_legacy_files(pbus); > > sysfs_initialized = 1; > > Doesn't this have a similar race where instead of the duplicate > filename from having two threads try to create the resource files, > neither thread creates them and device X ends up with no resource > files at all? > > Krzysztof has done a ton of work to convert these files to static > attributes, where the device model prevents most of these races: > > 506140f9c06b ("PCI/sysfs: Convert "index", "acpi_index", "label" to static attributes") > d93f8399053d ("PCI/sysfs: Convert "vpd" to static attribute") > f42c35ea3b13 ("PCI/sysfs: Convert "reset" to static attribute") > 527139d738d7 ("PCI/sysfs: Convert "rom" to static attribute") > e1d3f3268b0e ("PCI/sysfs: Convert "config" to static attribute") > > and he even posted a series to do the same for the resource files: > > https://lore.kernel.org/linux-pci/20210910202623.2293708-1-kw@linux.com/ > > I can't remember why we didn't apply that at the time, and it no > longer applies cleanly, but I think that's the direction we should go. > > Bjorn Thanks for you review. Krzysztof, Please inform me if there's existing feedback explaining why this series hasn't been merged yet. I am willing to further improve it if necessary. - Saurabh
On Wed, Feb 07, 2024 at 08:30:27AM -0800, Saurabh Singh Sengar wrote: > On Tue, Feb 06, 2024 at 04:07:15PM -0600, Bjorn Helgaas wrote: > > [+cc Krzysztof] > > > > On Fri, Dec 08, 2023 at 07:46:16PM -0800, Saurabh Sengar wrote: > > > Currently there is a race in calling pci_create_resource_files function > > > from two different therads, first therad is triggered by pci_sysfs_init > > > from the late initcall where as the second thread is initiated by > > > pci_bus_add_devices from the respective PCI drivers probe. > > > > > > The synchronization between these threads relies on the sysfs_initialized > > > flag. However, in pci_sysfs_init, sysfs_initialized is set right before > > > calling pci_create_resource_files which is wrong as it can create race > > > condition with pci_bus_add_devices threads. Fix this by setting > > > sysfs_initialized flag at the end of pci_sysfs_init and direecly call the > > > pci_create_resource_files function from it. > > > > > > There can be an additional case where driver probe is so delayed that > > > pci_bus_add_devices is called after the sysfs is created by pci_sysfs_init. > > > In such cases, attempting to access already existing sysfs resources is > > > unnecessary. Fix this by adding a check for sysfs attributes and return > > > if they are already allocated. > > > > > > In both cases, the consequence will be the removal of sysfs resources that > > > were appropriately allocated by pci_sysfs_init following the warning below. > > > > > > [ 3.376688] sysfs: cannot create duplicate filename '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/47505500-0001-0000-3130-444531454238/pci0001:00/0001:00:00.0/resource0' > > > [ 3.385103] CPU: 3 PID: 9 Comm: kworker/u8:0 Not tainted 5.15.0-1046-azure #53~20.04.1-Ubuntu > > > [ 3.389585] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 > > > [ 3.394663] Workqueue: events_unbound async_run_entry_fn > > > [ 3.397687] Call Trace: > > > [ 3.399312] <TASK> > > > [ 3.400780] dump_stack_lvl+0x38/0x4d > > > [ 3.402998] dump_stack+0x10/0x16 > > > [ 3.406050] sysfs_warn_dup.cold+0x17/0x2b > > > [ 3.408476] sysfs_add_file_mode_ns+0x17b/0x190 > > > [ 3.411072] sysfs_create_bin_file+0x64/0x90 > > > [ 3.413514] pci_create_attr+0xc7/0x260 > > > [ 3.415827] pci_create_resource_files+0x6f/0x150 > > > [ 3.418455] pci_create_sysfs_dev_files+0x18/0x30 > > > [ 3.421136] pci_bus_add_device+0x30/0x70 > > > [ 3.423512] pci_bus_add_devices+0x31/0x70 > > > [ 3.425958] hv_pci_probe+0x4ce/0x640 > > > [ 3.428106] vmbus_probe+0x67/0x90 > > > [ 3.430121] really_probe.part.0+0xcb/0x380 > > > [ 3.432516] really_probe+0x40/0x80 > > > [ 3.434581] __driver_probe_device+0xe8/0x140 > > > [ 3.437119] driver_probe_device+0x23/0xb0 > > > [ 3.439504] __driver_attach_async_helper+0x31/0x90 > > > [ 3.442296] async_run_entry_fn+0x33/0x120 > > > [ 3.444666] process_one_work+0x225/0x3d0 > > > [ 3.447043] worker_thread+0x4d/0x3e0 > > > [ 3.449233] ? process_one_work+0x3d0/0x3d0 > > > [ 3.451632] kthread+0x12a/0x150 > > > [ 3.453583] ? set_kthread_struct+0x50/0x50 > > > [ 3.456103] ret_from_fork+0x22/0x30 > > > > > > Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> > > > --- > > > There has been earlier attempts to fix this problem, below are the patches > > > for reference of these attempts. > > > 1. https://lore.kernel.org/linux-pci/20230316103036.1837869-1-alexander.stein@ew.tq-group.com/T/#u > > > 2. https://lwn.net/ml/linux-kernel/20230316091540.494366-1-alexander.stein@ew.tq-group.com/ > > > > > > Bug details: https://bugzilla.kernel.org/show_bug.cgi?id=215515 > > > > > > drivers/pci/pci-sysfs.c | 9 +++++++-- > > > 1 file changed, 7 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c > > > index f2909ae93f2f..a31f6f2cf309 100644 > > > --- a/drivers/pci/pci-sysfs.c > > > +++ b/drivers/pci/pci-sysfs.c > > > @@ -1230,6 +1230,10 @@ static int pci_create_resource_files(struct pci_dev *pdev) > > > if (!pci_resource_len(pdev, i)) > > > continue; > > > > > > + /* Check if resource already allocated and proceed no further */ > > > + if (pdev->res_attr[i] || pdev->res_attr_wc[i]) > > > + return 0; > > > + > > > retval = pci_create_attr(pdev, i, 0); > > > /* for prefetchable resources, create a WC mappable file */ > > > if (!retval && arch_can_pci_mmap_wc() && > > > @@ -1411,9 +1415,8 @@ static int __init pci_sysfs_init(void) > > > struct pci_bus *pbus = NULL; > > > int retval; > > > > > > - sysfs_initialized = 1; > > > for_each_pci_dev(pdev) { > > > - retval = pci_create_sysfs_dev_files(pdev); > > > + retval = pci_create_resource_files(pdev); > > > if (retval) { > > > pci_dev_put(pdev); > > > return retval; > > > @@ -1423,6 +1426,8 @@ static int __init pci_sysfs_init(void) > > > while ((pbus = pci_find_next_bus(pbus))) > > > pci_create_legacy_files(pbus); > > > > > > + sysfs_initialized = 1; > > > + > > > return 0; > > > } > > > late_initcall(pci_sysfs_init); > > > > Sorry for the delay in looking at this. Consider the following > > sequence where thread A is executing pci_sysfs_init() at the same time > > as thread B enumerates and adds device X: > > > > Thread A: > > > > pci_sysfs_init > > for_each_pci_dev(pdev) { # device X not included > > pci_create_resource_files(pdev); > > } > > > > Thread B: > > > > pci_bus_add_device # add device X > > pci_create_sysfs_dev_files > > if (!sysfs_initialized) # sysfs_initialized still zero > > return -EACCES; > > pci_create_resource_files(pdev); # not executed > > > > Thread A: > > > > while ((pbus = pci_find_next_bus(pbus))) > > pci_create_legacy_files(pbus); > > > > sysfs_initialized = 1; > > > > Doesn't this have a similar race where instead of the duplicate > > filename from having two threads try to create the resource files, > > neither thread creates them and device X ends up with no resource > > files at all? > > > > Krzysztof has done a ton of work to convert these files to static > > attributes, where the device model prevents most of these races: > > > > 506140f9c06b ("PCI/sysfs: Convert "index", "acpi_index", "label" to static attributes") > > d93f8399053d ("PCI/sysfs: Convert "vpd" to static attribute") > > f42c35ea3b13 ("PCI/sysfs: Convert "reset" to static attribute") > > 527139d738d7 ("PCI/sysfs: Convert "rom" to static attribute") > > e1d3f3268b0e ("PCI/sysfs: Convert "config" to static attribute") > > > > and he even posted a series to do the same for the resource files: > > > > https://lore.kernel.org/linux-pci/20210910202623.2293708-1-kw@linux.com/ > > > > I can't remember why we didn't apply that at the time, and it no > > longer applies cleanly, but I think that's the direction we should go. > > > > Bjorn > > > Thanks for you review. > > Krzysztof, > > Please inform me if there's existing feedback explaining why this > series hasn't been merged yet. I am willing to further improve it > if necessary. Krzysztof Wilczyński, Let us know your opinion so that we can move ahead in fixing this long pending bug. - Saurabh > > - Saurabh
On Tue, Feb 27, 2024 at 09:14:58AM -0800, Saurabh Singh Sengar wrote: > On Wed, Feb 07, 2024 at 08:30:27AM -0800, Saurabh Singh Sengar wrote: > > On Tue, Feb 06, 2024 at 04:07:15PM -0600, Bjorn Helgaas wrote: > > > On Fri, Dec 08, 2023 at 07:46:16PM -0800, Saurabh Sengar wrote: > > > > Currently there is a race in calling pci_create_resource_files function > > > > from two different therads, first therad is triggered by pci_sysfs_init > > > > from the late initcall where as the second thread is initiated by > > > > pci_bus_add_devices from the respective PCI drivers probe. > ... > > > Krzysztof has done a ton of work to convert these files to static > > > attributes, where the device model prevents most of these races: > > > > > > 506140f9c06b ("PCI/sysfs: Convert "index", "acpi_index", "label" to static attributes") > > > d93f8399053d ("PCI/sysfs: Convert "vpd" to static attribute") > > > f42c35ea3b13 ("PCI/sysfs: Convert "reset" to static attribute") > > > 527139d738d7 ("PCI/sysfs: Convert "rom" to static attribute") > > > e1d3f3268b0e ("PCI/sysfs: Convert "config" to static attribute") > > > > > > and he even posted a series to do the same for the resource files: > > > > > > https://lore.kernel.org/linux-pci/20210910202623.2293708-1-kw@linux.com/ > > > > > > I can't remember why we didn't apply that at the time, and it no > > > longer applies cleanly, but I think that's the direction we should go. > > > > Thanks for you review. > > > > Please inform me if there's existing feedback explaining why this > > series hasn't been merged yet. I am willing to further improve it > > if necessary. > > Let us know your opinion so that we can move ahead in fixing this > long pending bug. There's no feedback on the mailing list (I checked the link above), so the way forward is to update the series so it applies cleanly again and post it as a v3. There's no need to wait for Krzysztof to refresh it, and if you have time to do it, it would be very welcomed! The best base would be v6.8-rc1. Bjorn
Hello, Sorry for late reply. [...] > > > > Krzysztof has done a ton of work to convert these files to static > > > > attributes, where the device model prevents most of these races: > > > > > > > > 506140f9c06b ("PCI/sysfs: Convert "index", "acpi_index", "label" to static attributes") > > > > d93f8399053d ("PCI/sysfs: Convert "vpd" to static attribute") > > > > f42c35ea3b13 ("PCI/sysfs: Convert "reset" to static attribute") > > > > 527139d738d7 ("PCI/sysfs: Convert "rom" to static attribute") > > > > e1d3f3268b0e ("PCI/sysfs: Convert "config" to static attribute") > > > > > > > > and he even posted a series to do the same for the resource files: > > > > > > > > https://lore.kernel.org/linux-pci/20210910202623.2293708-1-kw@linux.com/ > > > > > > > > I can't remember why we didn't apply that at the time, and it no > > > > longer applies cleanly, but I think that's the direction we should go. > > > > > > Thanks for you review. > > > > > > Please inform me if there's existing feedback explaining why this > > > series hasn't been merged yet. I am willing to further improve it > > > if necessary. > > > > Let us know your opinion so that we can move ahead in fixing this > > long pending bug. I really thought you were asking me about your patch. So, I didn't reply since Bjorn added his review. > There's no feedback on the mailing list (I checked the link above), so > the way forward is to update the series so it applies cleanly again > and post it as a v3. Start with a review, if you have some time. Perhaps we can make it better before sending another revision. There are two areas which this series decided not to tackle initially: - Support for the Alpha platform - Support for legacy PCI platforms Feel free to have a look at the above. Perhaps you will have some ideas on how to best convert both of these to use static attributes, so that we could convert everything at the same time. > There's no need to wait for Krzysztof to refresh it, and if you have > time to do it, it would be very welcomed! The best base would be > v6.8-rc1. That I can do, perhaps this coming weekend. Or even sooner when I find some time this week. Krzysztof
On Thu, Feb 29, 2024 at 02:22:55AM +0900, Krzysztof Wilczyński wrote: > Hello, > > Sorry for late reply. > > [...] > > > > > Krzysztof has done a ton of work to convert these files to static > > > > > attributes, where the device model prevents most of these races: > > > > > > > > > > 506140f9c06b ("PCI/sysfs: Convert "index", "acpi_index", "label" to static attributes") > > > > > d93f8399053d ("PCI/sysfs: Convert "vpd" to static attribute") > > > > > f42c35ea3b13 ("PCI/sysfs: Convert "reset" to static attribute") > > > > > 527139d738d7 ("PCI/sysfs: Convert "rom" to static attribute") > > > > > e1d3f3268b0e ("PCI/sysfs: Convert "config" to static attribute") > > > > > > > > > > and he even posted a series to do the same for the resource files: > > > > > > > > > > https://lore.kernel.org/linux-pci/20210910202623.2293708-1-kw@linux.com/ > > > > > > > > > > I can't remember why we didn't apply that at the time, and it no > > > > > longer applies cleanly, but I think that's the direction we should go. > > > > > > > > Thanks for you review. > > > > > > > > Please inform me if there's existing feedback explaining why this > > > > series hasn't been merged yet. I am willing to further improve it > > > > if necessary. > > > > > > Let us know your opinion so that we can move ahead in fixing this > > > long pending bug. > > I really thought you were asking me about your patch. So, I didn't reply > since Bjorn added his review. > > > There's no feedback on the mailing list (I checked the link above), so > > the way forward is to update the series so it applies cleanly again > > and post it as a v3. > > Start with a review, if you have some time. Perhaps we can make it better > before sending another revision. > > There are two areas which this series decided not to tackle initially: > > - Support for the Alpha platform > - Support for legacy PCI platforms > > Feel free to have a look at the above. Perhaps you will have some ideas on how > to best convert both of these to use static attributes, so that we could convert > everything at the same time. > > > There's no need to wait for Krzysztof to refresh it, and if you have > > time to do it, it would be very welcomed! The best base would be > > v6.8-rc1. > > That I can do, perhaps this coming weekend. Or even sooner when I find some > time this week. > > Krzysztof Thank you ! I will look forward to it. I am happy to review and offer contributions if required. - Saurabh
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index f2909ae93f2f..a31f6f2cf309 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -1230,6 +1230,10 @@ static int pci_create_resource_files(struct pci_dev *pdev) if (!pci_resource_len(pdev, i)) continue; + /* Check if resource already allocated and proceed no further */ + if (pdev->res_attr[i] || pdev->res_attr_wc[i]) + return 0; + retval = pci_create_attr(pdev, i, 0); /* for prefetchable resources, create a WC mappable file */ if (!retval && arch_can_pci_mmap_wc() && @@ -1411,9 +1415,8 @@ static int __init pci_sysfs_init(void) struct pci_bus *pbus = NULL; int retval; - sysfs_initialized = 1; for_each_pci_dev(pdev) { - retval = pci_create_sysfs_dev_files(pdev); + retval = pci_create_resource_files(pdev); if (retval) { pci_dev_put(pdev); return retval; @@ -1423,6 +1426,8 @@ static int __init pci_sysfs_init(void) while ((pbus = pci_find_next_bus(pbus))) pci_create_legacy_files(pbus); + sysfs_initialized = 1; + return 0; } late_initcall(pci_sysfs_init);