Message ID | 20230316091540.494366-1-alexander.stein@ew.tq-group.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:604a:0:0:0:0:0 with SMTP id j10csp380312wrt; Thu, 16 Mar 2023 02:39:10 -0700 (PDT) X-Google-Smtp-Source: AK7set9vIk9fOl9WG8R2kmEZlHOhG4Uw74gdS/kpIl5d31QCdpCGwcsR+qEkKAfK8YLd9uwI0R0R X-Received: by 2002:a17:902:db08:b0:19e:866c:3547 with SMTP id m8-20020a170902db0800b0019e866c3547mr3129380plx.65.1678959550355; Thu, 16 Mar 2023 02:39:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1678959550; cv=none; d=google.com; s=arc-20160816; b=Eqf75fwbGDLmegDofJ/YV7mv5w4Ja9Cv0rSBbA8iDq9WAaTkspVJS131gQXIl58utV 4gjlOwbwlDT/SF4MfwpijpZi61Jff5p2eQ6RE2Zg/35JUia63DrUAwxSeaWv0HfAlq0I muffdbVqiF2dIJRuSQmGkvLiwYYa3MBzS8zthNTttausnYAhVXQJcXw9WbqqfOTabHPc 7m98iJxbvDC19ULg+DgveqtqYgqE6Rue6qEK8036zstdFoD8fLd9hZFlLHpRO5F+u2/H Y0hBW3NP5a6f3514xVxeCEnLXHLrf6BbKmTknPt5HRDSSxCTVcbZAcyc5+wel4eehsGA slTw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature:dkim-signature; bh=vNBKUCn+oSOubieZrKRWxbCHp3w8zx3bOHe3umDYycw=; b=I9QTy1oCSvEdE4IgUXqiDE+tvm7R8HsCM8RdjBAy5CAel88qMkWf+/ezuL3qs7MDzN Zw8tg9e5FZ79dmELO/ldk0hOsSWNuwzDAITMK96B+7Mh5vpg7vl5R7C/qEsbvRz7SX3C TWGEMALFhu3IgiZaTq6BKkiawsjNxwA0fH2fQpOc4yqnKkdwhFlby8hD4t4UhQvHr7Hq PJhecsY1EZpS6k+A+/nUYo/A8WK2IT4thkOmRX9ijBmw6Pu526syE7Fp38Niw6t4+YWi 9qMgLJd0aJEL2a2/wTjdh3K52zqxmjNOJxFXgA4z0ReUoCwIO2INF0zfLBSA2ofuPKdO 8lxA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@tq-group.com header.s=key1 header.b=aDstGSYa; dkim=pass header.i=@tq-group.com header.s=key1 header.b=Q4h+tnXT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=tq-group.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id j9-20020a170902da8900b001a0a44fa57csi5074989plx.389.2023.03.16.02.38.57; Thu, 16 Mar 2023 02:39:10 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@tq-group.com header.s=key1 header.b=aDstGSYa; dkim=pass header.i=@tq-group.com header.s=key1 header.b=Q4h+tnXT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=tq-group.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230283AbjCPJP5 (ORCPT <rfc822;pwkd43@gmail.com> + 99 others); Thu, 16 Mar 2023 05:15:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58104 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230177AbjCPJPy (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 16 Mar 2023 05:15:54 -0400 Received: from mx1.tq-group.com (mx1.tq-group.com [93.104.207.81]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5A0805AB4D; Thu, 16 Mar 2023 02:15:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tq-group.com; i=@tq-group.com; q=dns/txt; s=key1; t=1678958152; x=1710494152; h=from:to:cc:subject:date:message-id:mime-version: content-transfer-encoding; bh=vNBKUCn+oSOubieZrKRWxbCHp3w8zx3bOHe3umDYycw=; b=aDstGSYayo0Uez0xz1sw0mJ3Zqsb7qmgeec3GwAjf4CrwsAexESBYOmY g3bHZzzp7UF5oaEt2shEP1QWlVYmJ7Lbf3NVg3FwhBMAsKCC0rxSnqAeA rr2xllU+6Wl5NsEKyGPfeLQnF3zoaHHnPqpzKyjdgYpa2qsEfaVNkNuCN zyDp/4GfrGpAnAdwvUl02v9So1ymvO7CuT0NAm2sQo8kYiIxiGDFRncl2 BgcwgHn3KgLjoxvazMUPTtwT2tBztIvjR/ELUB1I0TEcS2Qb8LEvNk1CU YP3uC9Cicaq8zpKkHr4OnbTJe5NjjQcs4G8Q66Cdletojg6hUq0X59Nlj A==; X-IronPort-AV: E=Sophos;i="5.98,265,1673910000"; d="scan'208";a="29730469" Received: from unknown (HELO tq-pgp-pr1.tq-net.de) ([192.168.6.15]) by mx1-pgp.tq-group.com with ESMTP; 16 Mar 2023 10:15:49 +0100 Received: from mx1.tq-group.com ([192.168.6.7]) by tq-pgp-pr1.tq-net.de (PGP Universal service); Thu, 16 Mar 2023 10:15:49 +0100 X-PGP-Universal: processed; by tq-pgp-pr1.tq-net.de on Thu, 16 Mar 2023 10:15:49 +0100 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tq-group.com; i=@tq-group.com; q=dns/txt; s=key1; t=1678958149; x=1710494149; h=from:to:cc:subject:date:message-id:mime-version: content-transfer-encoding; bh=vNBKUCn+oSOubieZrKRWxbCHp3w8zx3bOHe3umDYycw=; b=Q4h+tnXTJvDIjbGBHT/Ri4SboGxaSMFFTAKh3SM6/NJITqg1XKuptYgu uylJwGqVYU72CEntyARNxePw7usmJe/tVkeZxi2LRwME08vlUR2pd9plL dWSOO6uXdpTNzE5zZdlFULRiW6ZSPbhflJHlYpxIc2gBfz7N2sYtiFrXM DOxKI455z4K+HsfXemK1CAjflfZq9TJEPF/giHWpIakZ9Eg+AboZsaynQ NlZZykyNugPQ1zyhMFaSxKKvux59jnSYcy8GhvBPheLFTLQtbRKPd41iG 0EiV7t581d3yLbTXSF5cb0ATY2RCv78yI86/JjvRegx6WouKHezC2RZkB w==; X-IronPort-AV: E=Sophos;i="5.98,265,1673910000"; d="scan'208";a="29730468" Received: from vtuxmail01.tq-net.de ([10.115.0.20]) by mx1.tq-group.com with ESMTP; 16 Mar 2023 10:15:49 +0100 Received: from steina-w.tq-net.de (unknown [10.123.53.21]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by vtuxmail01.tq-net.de (Postfix) with ESMTPSA id 3ABF8280056; Thu, 16 Mar 2023 10:15:49 +0100 (CET) From: Alexander Stein <alexander.stein@ew.tq-group.com> To: Bjorn Helgaas <bhelgaas@google.com> Cc: Korneliusz Osmenda <korneliuszo@gmail.com>, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Alexander Stein <alexander.stein@ew.tq-group.com> Subject: [PATCH v2 1/1] Guard pci_create_sysfs_dev_files with atomic value Date: Thu, 16 Mar 2023 10:15:40 +0100 Message-Id: <20230316091540.494366-1-alexander.stein@ew.tq-group.com> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_EF,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760515837549639695?= X-GMAIL-MSGID: =?utf-8?q?1760516689472021501?= |
Series |
[v2,1/1] Guard pci_create_sysfs_dev_files with atomic value
|
|
Commit Message
Alexander Stein
March 16, 2023, 9:15 a.m. UTC
From: Korneliusz Osmenda <korneliuszo@gmail.com> On Gateworks Ventana there is a number of PCI devices and: - imx6_pcie_probe takes longer than start of late init - pci_sysfs_init sets up flag sysfs_initialized - pci_sysfs_init initializes already found devices - imx6_pcie_probe tries to reinitialize device Link: https://bugzilla.kernel.org/show_bug.cgi?id=215515 Signed-off-by: Korneliusz Osmenda <korneliuszo@gmail.com> Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> --- drivers/pci/pci-sysfs.c | 6 ++++++ include/linux/pci.h | 2 ++ 2 files changed, 8 insertions(+)
Comments
On 16.03.23 10:15, Alexander Stein wrote: > From: Korneliusz Osmenda <korneliuszo@gmail.com> > > On Gateworks Ventana there is a number of PCI devices and: > - imx6_pcie_probe takes longer than start of late init > - pci_sysfs_init sets up flag sysfs_initialized > - pci_sysfs_init initializes already found devices > - imx6_pcie_probe tries to reinitialize device > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=215515 > > Signed-off-by: Korneliusz Osmenda <korneliuszo@gmail.com> > Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> > --- > drivers/pci/pci-sysfs.c | 6 ++++++ > include/linux/pci.h | 2 ++ > 2 files changed, 8 insertions(+) > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c > index dd0d9d9bc509..998e44716b6f 100644 > --- a/drivers/pci/pci-sysfs.c > +++ b/drivers/pci/pci-sysfs.c > @@ -1497,6 +1497,9 @@ int __must_check pci_create_sysfs_dev_files(struct pci_dev *pdev) > if (!sysfs_initialized) > return -EACCES; > > + if (atomic_cmpxchg(&pdev->sysfs_init_cnt, 0, 1) == 1) > + return 0; /* already added */ > + > return pci_create_resource_files(pdev); This is very likely a bug. You are returning an error in the error case. Yet the flag stays. And simply resetting it in the error case would be a race. There is something fishy in that design. Regards Oliver
Hi Oliver, Am Donnerstag, 16. März 2023, 10:23:54 CET schrieb Oliver Neukum: > On 16.03.23 10:15, Alexander Stein wrote: > > From: Korneliusz Osmenda <korneliuszo@gmail.com> > > > > On Gateworks Ventana there is a number of PCI devices and: > > - imx6_pcie_probe takes longer than start of late init > > - pci_sysfs_init sets up flag sysfs_initialized > > - pci_sysfs_init initializes already found devices > > - imx6_pcie_probe tries to reinitialize device > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=215515 > > > > Signed-off-by: Korneliusz Osmenda <korneliuszo@gmail.com> > > Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> > > --- > > > > drivers/pci/pci-sysfs.c | 6 ++++++ > > include/linux/pci.h | 2 ++ > > 2 files changed, 8 insertions(+) > > > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c > > index dd0d9d9bc509..998e44716b6f 100644 > > --- a/drivers/pci/pci-sysfs.c > > +++ b/drivers/pci/pci-sysfs.c > > @@ -1497,6 +1497,9 @@ int __must_check pci_create_sysfs_dev_files(struct > > pci_dev *pdev)> > > if (!sysfs_initialized) > > > > return -EACCES; > > > > + if (atomic_cmpxchg(&pdev->sysfs_init_cnt, 0, 1) == 1) > > + return 0; /* already added */ > > + > > > > return pci_create_resource_files(pdev); > > This is very likely a bug. You are returning an error in the error > case. Yet the flag stays. Ah, you are right. This is something needed to address. > And simply resetting it in the error case > would be a race. There is something fishy in that design. Admittedly I would like to get rid of these two pathes for creating sysfs files in the first place, but I do not know the pci subsystem very well. IMHO for_each_pci_dev(pdev) in pci_sysfs_init is part of the problem as it unconditionally iterates over the bus, without any locks, thus creating sysfs files for each device added to the bus. Any ideas? Best regards, Alexander
On 16.03.23 10:33, Alexander Stein wrote: > Hi Oliver, Hi, > > Admittedly > I would like to get rid of these two pathes for creating sysfs files in the > first place, but I do not know the pci subsystem very well. > IMHO for_each_pci_dev(pdev) in pci_sysfs_init is part of the problem as it > unconditionally iterates over the bus, without any locks, thus creating sysfs > files for each device added to the bus. > Any ideas? First of all, this existing code is a mess. If I understand you have the issue that your driver adds a bridge in dw_pcie_host_init() and the generic code in pci_create_sysfs_dev_files() populates the directory before or while your driver does so and the devices are effectively discovered twice. It seems to me that you must not add a bridge before pci_create_sysfs_dev_files() has finished. Now you could add a wait_queue and a flag and wait for it to finish. But that is not very elegant. From which initcall is your driver probed? Regards Oliver
Hi Oliver, Am Donnerstag, 16. März 2023, 12:17:32 CET schrieb Oliver Neukum: > On 16.03.23 10:33, Alexander Stein wrote: > > Hi Oliver, > > Hi, > > > Admittedly > > I would like to get rid of these two pathes for creating sysfs files in > > the > > first place, but I do not know the pci subsystem very well. > > IMHO for_each_pci_dev(pdev) in pci_sysfs_init is part of the problem as it > > unconditionally iterates over the bus, without any locks, thus creating > > sysfs files for each device added to the bus. > > Any ideas? > > First of all, this existing code is a mess. > > If I understand you have the issue that your driver adds a bridge > in dw_pcie_host_init() and the generic code in pci_create_sysfs_dev_files() > populates the directory before or while your driver does so and > the devices are effectively discovered twice. Yep, that's my observation as well. > It seems to me that you must not add a bridge before > pci_create_sysfs_dev_files() has finished. Now you could add a wait_queue > and a flag and wait for it to finish. But that is not very elegant. Do we need the pci_sysfs_init initcall at all? Or to put it in other words, what does this initcall solve? See my different approach eliminating this race at all. > From which initcall is your driver probed? The callstack looks like this: > imx6_pcie_probe from platform_probe+0x5c/0xb8 > platform_probe from call_driver_probe+0x24/0x118 > call_driver_probe from really_probe+0xc4/0x31c > really_probe from __driver_probe_device+0x8c/0x120 > __driver_probe_device from driver_probe_device+0x30/0xc0 > driver_probe_device from __driver_attach_async_helper+0x50/0xd8 > __driver_attach_async_helper from async_run_entry_fn+0x30/0x144 > async_run_entry_fn from process_one_work+0x1c4/0x3d0 > process_one_work from worker_thread+0x50/0x41c > worker_thread from kthread+0xec/0x104 > kthread from ret_from_fork+0x14/0x2c So technically the device is not probed from within a initcall but a kthread. It is set to be probed asynchronous in imx6_pcie_driver. This async call is scheduled in __driver_attach, from this callstack: > __driver_attach from bus_for_each_dev+0x74/0xc8 > bus_for_each_dev from bus_add_driver+0xf0/0x1f4 > bus_add_driver from driver_register+0x7c/0x118 > driver_register from do_one_initcall+0x4c/0x180 > do_one_initcall from do_initcalls+0xe0/0x114 > do_initcalls from kernel_init_freeable+0xd8/0x100 > kernel_init_freeable from kernel_init+0x18/0x12c > kernel_init from ret_from_fork+0x14/0x2c Best regards, Alexander
On 16.03.23 12:58, Alexander Stein wrote: > Hi Oliver, > > Am Donnerstag, 16. März 2023, 12:17:32 CET schrieb Oliver Neukum: >> It seems to me that you must not add a bridge before >> pci_create_sysfs_dev_files() has finished. Now you could add a wait_queue >> and a flag and wait for it to finish. But that is not very elegant. > > Do we need the pci_sysfs_init initcall at all? Or to put it in other words, > what does this initcall solve? Fundamentally something has to discover the root bridge. Secondly your system has to boot. The device right behind the root bridge will already be up and running when the kernel takes control. IMHO treating such devices differently from other devices makes sense. > See my different approach eliminating this race at all. Please elaborate >> From which initcall is your driver probed? > > The callstack looks like this: >> imx6_pcie_probe from platform_probe+0x5c/0xb8 >> platform_probe from call_driver_probe+0x24/0x118 >> call_driver_probe from really_probe+0xc4/0x31c >> really_probe from __driver_probe_device+0x8c/0x120 >> __driver_probe_device from driver_probe_device+0x30/0xc0 >> driver_probe_device from __driver_attach_async_helper+0x50/0xd8 >> __driver_attach_async_helper from async_run_entry_fn+0x30/0x144 >> async_run_entry_fn from process_one_work+0x1c4/0x3d0 >> process_one_work from worker_thread+0x50/0x41c >> worker_thread from kthread+0xec/0x104 >> kthread from ret_from_fork+0x14/0x2c > > So technically the device is not probed from within a initcall but a kthread. > It is set to be probed asynchronous in imx6_pcie_driver. That may be the problem, respectively that system is incomplete You are registering a PCI bridge. The PCI subsystem should be done setting up when you run. That is just a simple dependency. Regards Oliver
Hi Oliver, Am Donnerstag, 16. März 2023, 13:23:25 CET schrieb Oliver Neukum: > On 16.03.23 12:58, Alexander Stein wrote: > > Hi Oliver, > > > > Am Donnerstag, 16. März 2023, 12:17:32 CET schrieb Oliver Neukum: > >> It seems to me that you must not add a bridge before > >> pci_create_sysfs_dev_files() has finished. Now you could add a wait_queue > >> and a flag and wait for it to finish. But that is not very elegant. > > > > Do we need the pci_sysfs_init initcall at all? Or to put it in other > > words, > > what does this initcall solve? > > Fundamentally something has to discover the root bridge. > Secondly your system has to boot. The device right behind > the root bridge will already be up and running when the kernel > takes control. IMHO treating such devices differently from > other devices makes sense. But isn't the root bridge discovered by the driver (pci-imx6 in this case) for that? And the driver probe path eventually calls into the sysfs file creation. I compared the file creation to usb, as this is a discoverable bus as well. There is no special initialization regarding sysfs. > > See my different approach eliminating this race at all. > > Please elaborate Currently the initcall pci_sysfs_init and the PCIe root bridge driver probe paths are competing for file creation. If, for some reason, the device enumeration for PCI bus during imx6_pcie_probe is delayed after pci_sysfs_init initcall, this initcall essentially does nothing, no devices or busses to iterate. Which means the complete pcie sysfs creation is done from bridge probe path. There is no reason to iterate over discovered PCIe devices/busses separately. I assume this issue is not that prominent, if at all, as other platforms vary in speed a lot. I was not able to reproduce on i.MX8MP which uses the same PCIe bridge driver. Due to improved speed performance, I guess on this platform pci_sysfs_init finishes, without doing anything, before PCIe bridge is probed. I might be missing something (ACPI systems, etc.), I do not know the details within pci subsystem, but from my point of view this initcall is superfluous. For the record the patch is at [1] [1] https://lore.kernel.org/linux-pci/20230316103036.1837869-1-alexander.stein@ew.tq-group.com/T/#u > >> From which initcall is your driver probed? > > > > The callstack looks like this: > >> imx6_pcie_probe from platform_probe+0x5c/0xb8 > >> platform_probe from call_driver_probe+0x24/0x118 > >> call_driver_probe from really_probe+0xc4/0x31c > >> really_probe from __driver_probe_device+0x8c/0x120 > >> __driver_probe_device from driver_probe_device+0x30/0xc0 > >> driver_probe_device from __driver_attach_async_helper+0x50/0xd8 > >> __driver_attach_async_helper from async_run_entry_fn+0x30/0x144 > >> async_run_entry_fn from process_one_work+0x1c4/0x3d0 > >> process_one_work from worker_thread+0x50/0x41c > >> worker_thread from kthread+0xec/0x104 > >> kthread from ret_from_fork+0x14/0x2c > > > > So technically the device is not probed from within a initcall but a > > kthread. It is set to be probed asynchronous in imx6_pcie_driver. > > That may be the problem, respectively that system is incomplete > You are registering a PCI bridge. The PCI subsystem should be > done setting up when you run. That is just a simple dependency. Is there such an dependency in the first place? I can't see anything, even the late_initcall to pci_resource_alignment_sysfs_init is a different matter. Best regards, Alexander
On 16.03.23 14:16, Alexander Stein wrote: > But isn't the root bridge discovered by the driver (pci-imx6 in this case) for > that? And the driver probe path eventually calls into the sysfs file creation. > I compared the file creation to usb, as this is a discoverable bus as well. > There is no special initialization regarding sysfs. If you discover a bus system you always have the option of creating of virtual hotplug event for the root hub or host controller. But for PCI that is a bad design choice. USB is different. > If, for some reason, the device enumeration for PCI bus during imx6_pcie_probe > is delayed after pci_sysfs_init initcall, this initcall essentially does > nothing, no devices or busses to iterate. Which means the complete pcie sysfs On your specific system. You cannot use that as a model for all systems. > creation is done from bridge probe path. There is no reason to iterate over > discovered PCIe devices/busses separately. If there is no other PCI device, the loop is a nop. But otherwise it is necessary. >>> So technically the device is not probed from within a initcall but a >>> kthread. It is set to be probed asynchronous in imx6_pcie_driver. >> >> That may be the problem, respectively that system is incomplete >> You are registering a PCI bridge. The PCI subsystem should be >> done setting up when you run. That is just a simple dependency. > > Is there such an dependency in the first place? I can't see anything, even the > late_initcall to pci_resource_alignment_sysfs_init is a different matter. On your hardware, yes. In the kernel, no. That is the very point. The kernel is missing a way to represent a dependency. Regards Oliver
Hi Oliver, Am Donnerstag, 16. März 2023, 15:01:25 CET schrieb Oliver Neukum: > On 16.03.23 14:16, Alexander Stein wrote: > > But isn't the root bridge discovered by the driver (pci-imx6 in this case) > > for that? And the driver probe path eventually calls into the sysfs file > > creation. I compared the file creation to usb, as this is a discoverable > > bus as well. There is no special initialization regarding sysfs. > > If you discover a bus system you always have the option of creating of > virtual hotplug event for the root hub or host controller. > But for PCI that is a bad design choice. USB is different. I'm not sure if I can follow you here. Can you elaborate? > > If, for some reason, the device enumeration for PCI bus during > > imx6_pcie_probe is delayed after pci_sysfs_init initcall, this initcall > > essentially does nothing, no devices or busses to iterate. Which means > > the complete pcie sysfs > On your specific system. You cannot use that as a model for all systems. I am aware that my platform is not a role model for the others. But I've yet to get information what is actually different on other platforms. > > creation is done from bridge probe path. There is no reason to iterate > > over > > discovered PCIe devices/busses separately. > > If there is no other PCI device, the loop is a nop. But otherwise it is > necessary. How is it necessary? How do these PCI devices get attaches to the pci_bus_type bus without calling pci_bus_add_device? > >>> So technically the device is not probed from within a initcall but a > >>> kthread. It is set to be probed asynchronous in imx6_pcie_driver. > >> > >> That may be the problem, respectively that system is incomplete > >> You are registering a PCI bridge. The PCI subsystem should be > >> done setting up when you run. That is just a simple dependency. > > > > Is there such an dependency in the first place? I can't see anything, even > > the late_initcall to pci_resource_alignment_sysfs_init is a different > > matter. > On your hardware, yes. In the kernel, no. > That is the very point. The kernel is missing a way to represent a > dependency. Okay, so which dependency is provided by pci_sysfs_init, which are required by drivers then? Best regards, Alexander
Am Donnerstag, 16. März 2023, 15:01:25 CET schrieb Oliver Neukum: > I'm not sure if I can follow you here. Can you elaborate? There are far better reasons to leave the setup of a PCI bus as the firmware has done it than to leave a USB as is. > How is it necessary? How do these PCI devices get attaches to the pci_bus_type > bus without calling pci_bus_add_device? AFAICT they don't. But somebody has to call it. > Okay, so which dependency is provided by pci_sysfs_init, which are required by > drivers then? It isn't. It is missing from the code. But it exists in reality. That is the point. You have a race condition between two probes. We cannot have that. Hence IMHO the dependency would be best expressed by waiting for pci_sysfs_init() to finish in the init sequence before you add any PCI bridges or devices to the system. Regards Oliver
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index dd0d9d9bc509..998e44716b6f 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -1497,6 +1497,9 @@ int __must_check pci_create_sysfs_dev_files(struct pci_dev *pdev) if (!sysfs_initialized) return -EACCES; + if (atomic_cmpxchg(&pdev->sysfs_init_cnt, 0, 1) == 1) + return 0; /* already added */ + return pci_create_resource_files(pdev); } @@ -1511,6 +1514,9 @@ void pci_remove_sysfs_dev_files(struct pci_dev *pdev) if (!sysfs_initialized) return; + if (atomic_cmpxchg(&pdev->sysfs_init_cnt, 1, 0) == 0) + return; /* already removed */ + pci_remove_resource_files(pdev); } diff --git a/include/linux/pci.h b/include/linux/pci.h index b50e5c79f7e3..024313a7a90a 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -467,6 +467,8 @@ struct pci_dev { pci_dev_flags_t dev_flags; atomic_t enable_cnt; /* pci_enable_device has been called */ + atomic_t sysfs_init_cnt; /* pci_create_sysfs_dev_files has been called */ + u32 saved_config_space[16]; /* Config space saved at suspend time */ struct hlist_head saved_cap_space; int rom_attr_enabled; /* Display of ROM attribute enabled? */