[0/3] PCI: Fix race condition upon sysfs init

Message ID 20230427142901.3570536-1-alexander.stein@ew.tq-group.com

Message

Alexander Stein April 27, 2023, 2:28 p.m. UTC
  Hi everyone,

this series takes a completely different approach to fixing the sysfs init
race condition. The initial problem is described in [1]. Previous proposals
([2] and [3]) were rejected. Here is what is happening:


        CPU 0                                  CPU 1

                                        imx6_pcie_probe()
                                        dw_pcie_host_init()
                                        pci_host_probe()
                                        pci_scan_root_bus_bridge()
                                          pci_scan_child_bus_extend()
                                          pci_scan_slot()
                                          pci_scan_single_device()
                                          pci_device_add()
pci_sysfs_init()                          device_add()
  sysfs_initialized = 1;                  bus_add_device()
  for_each_pci_dev()                          ...
    pci_create_sysfs_dev_files()                  
                                        pci_bus_add_devices()
                                        pci_bus_add_device()
                                        pci_create_sysfs_dev_files()

As a result, pci_create_sysfs_dev_files() is eventually called twice on the
same pci_dev. The window is very tight, but deeper PCIe trees widen it
during host probe. Asynchronous PCIe host probe (PROBE_PREFER_ASYNCHRONOUS)
is a prerequisite for hitting the race.

The first two patches prepare for the last one, which actually fixes the
race. Because functions like pci_create_sysfs_dev_files() are called both
from outside and from within pci-sysfs.c, an internal version that does
not check sysfs_initialized is required.
For the fix, a wait queue is introduced on which all callers from outside
pci-sysfs.c wait until the pci_sysfs_init() initcall has finished and
woken up all waiters.

A subtlety is that within __pci_create_sysfs_dev_files() the resource files
(created by pci_sysfs_init()) need to be removed, so that they can be
created again from the pci_host_probe() call.
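
For illustration only (this is not code from the patches): the gating and
the remove-then-recreate step described above can be modelled in userspace.
The sketch below assumes a pthread mutex and condition variable as a
stand-in for the kernel wait queue. All names in it (run_demo,
files_present, duplicate_creates, the *_locked helpers) are hypothetical
and do not appear in pci-sysfs.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NUM_DEVS 4

/* Userspace stand-ins (assumptions): mutex + condvar model the wait queue. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t init_done_cv = PTHREAD_COND_INITIALIZER;
static bool sysfs_init_done;         /* set once by the init path */
static bool files_present[NUM_DEVS]; /* models per-device sysfs files */
static int duplicate_creates;        /* creations on top of existing files */

/* Internal variant: no gating on init completion; caller holds 'lock'. */
static void create_files_locked(int dev)
{
	if (files_present[dev])
		duplicate_creates++;
	files_present[dev] = true;
}

static void remove_files_locked(int dev)
{
	files_present[dev] = false;
}

/* Init path (CPU 0 in the diagram): create files, then wake all waiters. */
static void *sysfs_init_thread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	for (int dev = 0; dev < NUM_DEVS; dev++)
		create_files_locked(dev);
	sysfs_init_done = true;
	pthread_cond_broadcast(&init_done_cv);
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* External path (CPU 1): wait until the init path has finished. */
static void *host_probe_thread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	while (!sysfs_init_done)
		pthread_cond_wait(&init_done_cv, &lock);
	assert(sysfs_init_done);
	for (int dev = 0; dev < NUM_DEVS; dev++) {
		remove_files_locked(dev); /* drop what the init path created */
		create_files_locked(dev); /* then create the files again */
	}
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* Runs both paths concurrently; returns the duplicate count, -1 on error. */
int run_demo(void)
{
	pthread_t init, probe;

	pthread_mutex_lock(&lock);
	sysfs_init_done = false;
	duplicate_creates = 0;
	for (int dev = 0; dev < NUM_DEVS; dev++)
		files_present[dev] = false;
	pthread_mutex_unlock(&lock);

	if (pthread_create(&init, NULL, sysfs_init_thread, NULL) != 0)
		return -1;
	if (pthread_create(&probe, NULL, host_probe_thread, NULL) != 0) {
		pthread_join(init, NULL);
		return -1;
	}
	pthread_join(init, NULL);
	pthread_join(probe, NULL);
	return duplicate_creates;
}
```

Calling run_demo() from a small main() (built with -pthread) should report
no duplicate creations regardless of which thread runs first, because the
external path cannot proceed until the init path has signalled completion.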

Best regards,
Alexander

Links:
[1] https://bugzilla.kernel.org/show_bug.cgi?id=215515
[2] https://lore.kernel.org/linux-pci/20230316091540.494366-1-alexander.stein@ew.tq-group.com/
[3] https://lore.kernel.org/linux-pci/20230316103036.1837869-1-alexander.stein@ew.tq-group.com/

Alexander Stein (3):
  PCI/sysfs: sort headers alphabetically
  PCI/sysfs: create private functions for
    pci_create_legacy_files/pci_create_sysfs_dev_files
  PCI/sysfs: Fix sysfs init race condition

 drivers/pci/pci-sysfs.c | 87 +++++++++++++++++++++++++----------------
 1 file changed, 53 insertions(+), 34 deletions(-)
  

Comments

Bjorn Helgaas April 27, 2023, 4:14 p.m. UTC | #1
On Thu, Apr 27, 2023 at 04:28:58PM +0200, Alexander Stein wrote:
> Hi everyone,
> 
> this series is a totally different approach for fixing the sysfs init race
> condition. The initial problem is stated at [1]. Previous proposals were
> rejected ([2] and [3]). Here is what's happening
> 
> 
>         CPU 0                                  CPU 1
> 
>                                         imx6_pcie_probe()
>                                         dw_pcie_host_init()
>                                         pci_host_probe()
>                                         pci_scan_root_bus_bridge()
>                                           pci_scan_child_bus_extend()
>                                           pci_scan_slot()
>                                           pci_scan_single_device()
>                                           pci_device_add()
> pci_sysfs_init()                          device_add()
>   sysfs_initialized = 1;                  bus_add_device()
>   for_each_pci_dev()                          ...
>     pci_create_sysfs_dev_files()                  
>                                         pci_bus_add_devices()
>                                         pci_bus_add_device()
>                                         pci_create_sysfs_dev_files()
> 
> Eventually calling pci_create_sysfs_dev_files() twice on the same pci_dev.
> It's a very tight window, deeper PCIe trees increase that window during
> host probe. Asynchronous PCIe host probe is a necessity
> (PROBE_PREFER_ASYNCHRONOUS).
> 
> The first two patches are preparations for the last one actually fixing
> the race. As functions like pci_create_sysfs_dev_files() are called from
> external and internal to pci-sysfs, an internal version without checking
> for sysfs_initialized is required.
> For the fix a wait queue is introduced where all callers from external
> callsites (regarding pci-sysfs.c) are waiting until pci_sysfs_init
> initcall has finished and woken up all waiters.
> 
> A subtlety is that within __pci_create_sysfs_dev_files the resource files
> (created by pci_sysfs_init) need to be removed, so they can be created
> again from pci_host_probe call.

I'll look at this in more detail, but if there's any way at all that
we could get rid of pci_sysfs_init() completely and do this with
static attributes or some other existing sysfs infrastructure, I would
STRONGLY prefer it because that infrastructure has already solved this
problem.

Maybe that's impossible and we really need to make a one-off solution
just for PCI, but ... I haven't been convinced yet.

> Links:
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=215515
> [2] https://lore.kernel.org/linux-pci/20230316091540.494366-1-alexander.stein@ew.tq-group.com/
> [3] https://lore.kernel.org/linux-pci/20230316103036.1837869-1-alexander.stein@ew.tq-group.com/
> 
> Alexander Stein (3):
>   PCI/sysfs: sort headers alphabetically
>   PCI/sysfs: create private functions for
>     pci_create_legacy_files/pci_create_sysfs_dev_files
>   PCI/sysfs: Fix sysfs init race condition
> 
>  drivers/pci/pci-sysfs.c | 87 +++++++++++++++++++++++++----------------
>  1 file changed, 53 insertions(+), 34 deletions(-)
> 
> -- 
> 2.34.1
>