[v2,1/1] Guard pci_create_sysfs_dev_files with atomic value

Message ID 20230316091540.494366-1-alexander.stein@ew.tq-group.com
State New
Headers
Series [v2,1/1] Guard pci_create_sysfs_dev_files with atomic value |

Commit Message

Alexander Stein March 16, 2023, 9:15 a.m. UTC
  From: Korneliusz Osmenda <korneliuszo@gmail.com>

On Gateworks Ventana there is a number of PCI devices and:
  - imx6_pcie_probe takes longer than start of late init
  - pci_sysfs_init sets up flag sysfs_initialized
  - pci_sysfs_init initializes already found devices
  - imx6_pcie_probe tries to reinitialize device

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215515

Signed-off-by: Korneliusz Osmenda <korneliuszo@gmail.com>
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
---
 drivers/pci/pci-sysfs.c | 6 ++++++
 include/linux/pci.h     | 2 ++
 2 files changed, 8 insertions(+)
  

Comments

Oliver Neukum March 16, 2023, 9:23 a.m. UTC | #1
On 16.03.23 10:15, Alexander Stein wrote:
> From: Korneliusz Osmenda <korneliuszo@gmail.com>
> 
> On Gateworks Ventana there is a number of PCI devices and:
>    - imx6_pcie_probe takes longer than start of late init
>    - pci_sysfs_init sets up flag sysfs_initialized
>    - pci_sysfs_init initializes already found devices
>    - imx6_pcie_probe tries to reinitialize device
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215515
> 
> Signed-off-by: Korneliusz Osmenda <korneliuszo@gmail.com>
> Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
> ---
>   drivers/pci/pci-sysfs.c | 6 ++++++
>   include/linux/pci.h     | 2 ++
>   2 files changed, 8 insertions(+)
> 
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index dd0d9d9bc509..998e44716b6f 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -1497,6 +1497,9 @@ int __must_check pci_create_sysfs_dev_files(struct pci_dev *pdev)
>   	if (!sysfs_initialized)
>   		return -EACCES;
>   
> +	if (atomic_cmpxchg(&pdev->sysfs_init_cnt, 0, 1) == 1)
> +		return 0;		/* already added */
> +
>   	return pci_create_resource_files(pdev);

This is very likely a bug. You are returning an error in the error
case. Yet the flag stays. And simply resetting it in the error case
would be a race. There is something fishy in that design.

	Regards
		Oliver
  
Alexander Stein March 16, 2023, 9:33 a.m. UTC | #2
Hi Oliver,

Am Donnerstag, 16. März 2023, 10:23:54 CET schrieb Oliver Neukum:
> On 16.03.23 10:15, Alexander Stein wrote:
> > From: Korneliusz Osmenda <korneliuszo@gmail.com>
> > 
> > On Gateworks Ventana there is a number of PCI devices and:
> >    - imx6_pcie_probe takes longer than start of late init
> >    - pci_sysfs_init sets up flag sysfs_initialized
> >    - pci_sysfs_init initializes already found devices
> >    - imx6_pcie_probe tries to reinitialize device
> > 
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=215515
> > 
> > Signed-off-by: Korneliusz Osmenda <korneliuszo@gmail.com>
> > Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
> > ---
> > 
> >   drivers/pci/pci-sysfs.c | 6 ++++++
> >   include/linux/pci.h     | 2 ++
> >   2 files changed, 8 insertions(+)
> > 
> > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> > index dd0d9d9bc509..998e44716b6f 100644
> > --- a/drivers/pci/pci-sysfs.c
> > +++ b/drivers/pci/pci-sysfs.c
> > @@ -1497,6 +1497,9 @@ int __must_check pci_create_sysfs_dev_files(struct
> > pci_dev *pdev)> 
> >   	if (!sysfs_initialized)
> >   	
> >   		return -EACCES;
> > 
> > +	if (atomic_cmpxchg(&pdev->sysfs_init_cnt, 0, 1) == 1)
> > +		return 0;		/* already added */
> > +
> > 
> >   	return pci_create_resource_files(pdev);
> 
> This is very likely a bug. You are returning an error in the error
> case. Yet the flag stays.

Ah, you are right. This is something needed to address.

> And simply resetting it in the error case
> would be a race. There is something fishy in that design.

Admittedly 
I would like to get rid of these two pathes for creating sysfs files in the 
first place, but I do not know the pci subsystem very well.
IMHO for_each_pci_dev(pdev) in pci_sysfs_init is part of the problem as it 
unconditionally iterates over the bus, without any locks, thus creating sysfs 
files for each device added to the bus.
Any ideas?

Best regards,
Alexander
  
Oliver Neukum March 16, 2023, 11:17 a.m. UTC | #3
On 16.03.23 10:33, Alexander Stein wrote:
> Hi Oliver,

Hi,

> 
> Admittedly
> I would like to get rid of these two pathes for creating sysfs files in the
> first place, but I do not know the pci subsystem very well.
> IMHO for_each_pci_dev(pdev) in pci_sysfs_init is part of the problem as it
> unconditionally iterates over the bus, without any locks, thus creating sysfs
> files for each device added to the bus.
> Any ideas?

First of all, this existing code is a mess.

If I understand you have the issue that your driver adds a bridge
in dw_pcie_host_init() and the generic code in pci_create_sysfs_dev_files()
populates the directory before or while your driver does so and
the devices are effectively discovered twice.

It seems to me that you must not add a bridge before pci_create_sysfs_dev_files()
has finished. Now you could add a wait_queue and a flag and wait for it to
finish. But that is not very elegant. From which initcall is your driver
probed?

	Regards
		Oliver
  
Alexander Stein March 16, 2023, 11:58 a.m. UTC | #4
Hi Oliver,

Am Donnerstag, 16. März 2023, 12:17:32 CET schrieb Oliver Neukum:
> On 16.03.23 10:33, Alexander Stein wrote:
> > Hi Oliver,
> 
> Hi,
> 
> > Admittedly
> > I would like to get rid of these two pathes for creating sysfs files in
> > the
> > first place, but I do not know the pci subsystem very well.
> > IMHO for_each_pci_dev(pdev) in pci_sysfs_init is part of the problem as it
> > unconditionally iterates over the bus, without any locks, thus creating
> > sysfs files for each device added to the bus.
> > Any ideas?
> 
> First of all, this existing code is a mess.
> 
> If I understand you have the issue that your driver adds a bridge
> in dw_pcie_host_init() and the generic code in pci_create_sysfs_dev_files()
> populates the directory before or while your driver does so and
> the devices are effectively discovered twice.

Yep, that's my observation as well.

> It seems to me that you must not add a bridge before
> pci_create_sysfs_dev_files() has finished. Now you could add a wait_queue
> and a flag and wait for it to finish. But that is not very elegant.

Do we need the pci_sysfs_init initcall at all? Or to put it in other words, 
what does this initcall solve?
See my different approach eliminating this race at all.

> From which initcall is your driver probed?

The callstack looks like this:
> imx6_pcie_probe from platform_probe+0x5c/0xb8
> platform_probe from call_driver_probe+0x24/0x118
> call_driver_probe from really_probe+0xc4/0x31c
> really_probe from __driver_probe_device+0x8c/0x120
> __driver_probe_device from driver_probe_device+0x30/0xc0
> driver_probe_device from __driver_attach_async_helper+0x50/0xd8
> __driver_attach_async_helper from async_run_entry_fn+0x30/0x144
> async_run_entry_fn from process_one_work+0x1c4/0x3d0
> process_one_work from worker_thread+0x50/0x41c
> worker_thread from kthread+0xec/0x104
> kthread from ret_from_fork+0x14/0x2c

So technically the device is not probed from within a initcall but a kthread. 
It is set to be probed asynchronous in imx6_pcie_driver.

This async call is scheduled in __driver_attach, from this callstack:
> __driver_attach from bus_for_each_dev+0x74/0xc8
> bus_for_each_dev from bus_add_driver+0xf0/0x1f4
> bus_add_driver from driver_register+0x7c/0x118
> driver_register from do_one_initcall+0x4c/0x180
> do_one_initcall from do_initcalls+0xe0/0x114
> do_initcalls from kernel_init_freeable+0xd8/0x100
> kernel_init_freeable from kernel_init+0x18/0x12c
> kernel_init from ret_from_fork+0x14/0x2c

Best regards,
Alexander
  
Oliver Neukum March 16, 2023, 12:23 p.m. UTC | #5
On 16.03.23 12:58, Alexander Stein wrote:
> Hi Oliver,
> 
> Am Donnerstag, 16. März 2023, 12:17:32 CET schrieb Oliver Neukum:

>> It seems to me that you must not add a bridge before
>> pci_create_sysfs_dev_files() has finished. Now you could add a wait_queue
>> and a flag and wait for it to finish. But that is not very elegant.
> 
> Do we need the pci_sysfs_init initcall at all? Or to put it in other words,
> what does this initcall solve?

Fundamentally something has to discover the root bridge.
Secondly your system has to boot. The device right behind
the root bridge will already be up and running when the kernel
takes control. IMHO treating such devices differently from
other devices makes sense.

> See my different approach eliminating this race at all.

Please elaborate
  
>>  From which initcall is your driver probed?
> 
> The callstack looks like this:
>> imx6_pcie_probe from platform_probe+0x5c/0xb8
>> platform_probe from call_driver_probe+0x24/0x118
>> call_driver_probe from really_probe+0xc4/0x31c
>> really_probe from __driver_probe_device+0x8c/0x120
>> __driver_probe_device from driver_probe_device+0x30/0xc0
>> driver_probe_device from __driver_attach_async_helper+0x50/0xd8
>> __driver_attach_async_helper from async_run_entry_fn+0x30/0x144
>> async_run_entry_fn from process_one_work+0x1c4/0x3d0
>> process_one_work from worker_thread+0x50/0x41c
>> worker_thread from kthread+0xec/0x104
>> kthread from ret_from_fork+0x14/0x2c
> 
> So technically the device is not probed from within a initcall but a kthread.
> It is set to be probed asynchronous in imx6_pcie_driver.

That may be the problem, respectively that system is incomplete
You are registering a PCI bridge. The PCI subsystem should be
done setting up when you run. That is just a simple dependency.

	Regards
		Oliver
  
Alexander Stein March 16, 2023, 1:16 p.m. UTC | #6
Hi Oliver,

Am Donnerstag, 16. März 2023, 13:23:25 CET schrieb Oliver Neukum:
> On 16.03.23 12:58, Alexander Stein wrote:
> > Hi Oliver,
> > 
> > Am Donnerstag, 16. März 2023, 12:17:32 CET schrieb Oliver Neukum:
> >> It seems to me that you must not add a bridge before
> >> pci_create_sysfs_dev_files() has finished. Now you could add a wait_queue
> >> and a flag and wait for it to finish. But that is not very elegant.
> > 
> > Do we need the pci_sysfs_init initcall at all? Or to put it in other
> > words,
> > what does this initcall solve?
> 
> Fundamentally something has to discover the root bridge.
> Secondly your system has to boot. The device right behind
> the root bridge will already be up and running when the kernel
> takes control. IMHO treating such devices differently from
> other devices makes sense.

But isn't the root bridge discovered by the driver (pci-imx6 in this case) for 
that? And the driver probe path eventually calls into the sysfs file creation.
I compared the file creation to usb, as this is a discoverable bus as well. 
There is no special initialization regarding sysfs.

> > See my different approach eliminating this race at all.
> 
> Please elaborate

Currently the initcall pci_sysfs_init and the PCIe root bridge driver probe
paths are competing for file creation.
If, for some reason, the device enumeration for PCI bus during imx6_pcie_probe 
is delayed after pci_sysfs_init initcall, this initcall essentially does 
nothing, no devices or busses to iterate. Which means the complete pcie sysfs 
creation is done from bridge probe path. There is no reason to iterate over 
discovered PCIe devices/busses separately.

I assume this issue is not that prominent, if at all, as other platforms vary 
in speed a lot. I was not able to reproduce on i.MX8MP which uses the same 
PCIe bridge driver. Due to improved speed performance, I guess on this 
platform pci_sysfs_init finishes, without doing anything, before PCIe bridge 
is probed.

I might be missing something (ACPI systems, etc.), I do not know the details 
within pci subsystem, but from my point of view this initcall is superfluous.

For the record the patch is at [1]

[1] https://lore.kernel.org/linux-pci/20230316103036.1837869-1-alexander.stein@ew.tq-group.com/T/#u

> >>  From which initcall is your driver probed?
> > 
> > The callstack looks like this:
> >> imx6_pcie_probe from platform_probe+0x5c/0xb8
> >> platform_probe from call_driver_probe+0x24/0x118
> >> call_driver_probe from really_probe+0xc4/0x31c
> >> really_probe from __driver_probe_device+0x8c/0x120
> >> __driver_probe_device from driver_probe_device+0x30/0xc0
> >> driver_probe_device from __driver_attach_async_helper+0x50/0xd8
> >> __driver_attach_async_helper from async_run_entry_fn+0x30/0x144
> >> async_run_entry_fn from process_one_work+0x1c4/0x3d0
> >> process_one_work from worker_thread+0x50/0x41c
> >> worker_thread from kthread+0xec/0x104
> >> kthread from ret_from_fork+0x14/0x2c
> > 
> > So technically the device is not probed from within a initcall but a
> > kthread. It is set to be probed asynchronous in imx6_pcie_driver.
> 
> That may be the problem, respectively that system is incomplete
> You are registering a PCI bridge. The PCI subsystem should be
> done setting up when you run. That is just a simple dependency.

Is there such an dependency in the first place? I can't see anything, even the 
late_initcall to pci_resource_alignment_sysfs_init is a different matter.

Best regards,
Alexander
  
Oliver Neukum March 16, 2023, 2:01 p.m. UTC | #7
On 16.03.23 14:16, Alexander Stein wrote:

> But isn't the root bridge discovered by the driver (pci-imx6 in this case) for
> that? And the driver probe path eventually calls into the sysfs file creation.
> I compared the file creation to usb, as this is a discoverable bus as well.
> There is no special initialization regarding sysfs.

If you discover a bus system you always have the option of creating of virtual
hotplug event for the root hub or host controller.
But for PCI that is a bad design choice. USB is different.

> If, for some reason, the device enumeration for PCI bus during imx6_pcie_probe
> is delayed after pci_sysfs_init initcall, this initcall essentially does
> nothing, no devices or busses to iterate. Which means the complete pcie sysfs

On your specific system. You cannot use that as a model for all systems.

> creation is done from bridge probe path. There is no reason to iterate over
> discovered PCIe devices/busses separately.

If there is no other PCI device, the loop is a nop. But otherwise it is necessary.

>>> So technically the device is not probed from within a initcall but a
>>> kthread. It is set to be probed asynchronous in imx6_pcie_driver.
>>
>> That may be the problem, respectively that system is incomplete
>> You are registering a PCI bridge. The PCI subsystem should be
>> done setting up when you run. That is just a simple dependency.
> 
> Is there such an dependency in the first place? I can't see anything, even the
> late_initcall to pci_resource_alignment_sysfs_init is a different matter.

On your hardware, yes. In the kernel, no.
That is the very point. The kernel is missing a way to represent a dependency.

	Regards
		Oliver
  
Alexander Stein March 16, 2023, 3 p.m. UTC | #8
Hi Oliver,

Am Donnerstag, 16. März 2023, 15:01:25 CET schrieb Oliver Neukum:
> On 16.03.23 14:16, Alexander Stein wrote:
> > But isn't the root bridge discovered by the driver (pci-imx6 in this case)
> > for that? And the driver probe path eventually calls into the sysfs file
> > creation. I compared the file creation to usb, as this is a discoverable
> > bus as well. There is no special initialization regarding sysfs.
> 
> If you discover a bus system you always have the option of creating of
> virtual hotplug event for the root hub or host controller.
> But for PCI that is a bad design choice. USB is different.

I'm not sure if I can follow you here. Can you elaborate?

> > If, for some reason, the device enumeration for PCI bus during
> > imx6_pcie_probe is delayed after pci_sysfs_init initcall, this initcall
> > essentially does nothing, no devices or busses to iterate. Which means
> > the complete pcie sysfs
> On your specific system. You cannot use that as a model for all systems.

I am aware that my platform is not a role model for the others. But I've yet 
to get information what is actually different on other platforms.

> > creation is done from bridge probe path. There is no reason to iterate
> > over
> > discovered PCIe devices/busses separately.
> 
> If there is no other PCI device, the loop is a nop. But otherwise it is
> necessary.

How is it necessary? How do these PCI devices get attaches to the pci_bus_type 
bus without calling pci_bus_add_device?

> >>> So technically the device is not probed from within a initcall but a
> >>> kthread. It is set to be probed asynchronous in imx6_pcie_driver.
> >> 
> >> That may be the problem, respectively that system is incomplete
> >> You are registering a PCI bridge. The PCI subsystem should be
> >> done setting up when you run. That is just a simple dependency.
> > 
> > Is there such an dependency in the first place? I can't see anything, even
> > the late_initcall to pci_resource_alignment_sysfs_init is a different
> > matter.
> On your hardware, yes. In the kernel, no.
> That is the very point. The kernel is missing a way to represent a
> dependency.

Okay, so which dependency is provided by pci_sysfs_init, which are required by 
drivers then?

Best regards,
Alexander
  
Oliver Neukum March 21, 2023, 9:09 a.m. UTC | #9
Am Donnerstag, 16. März 2023, 15:01:25 CET schrieb Oliver Neukum:
> I'm not sure if I can follow you here. Can you elaborate?

There are far better reasons to leave the setup of a PCI bus as the firmware
has done it than to leave a USB as is.


> How is it necessary? How do these PCI devices get attaches to the pci_bus_type 
> bus without calling pci_bus_add_device?

AFAICT they don't. But somebody has to call it.


> Okay, so which dependency is provided by pci_sysfs_init, which are required by 
> drivers then?

It isn't. It is missing from the code. But it exists in reality. That is the point.
You have a race condition between two probes. We cannot have that.
Hence IMHO the dependency would be best expressed by waiting for pci_sysfs_init()
to finish in the init sequence before you add any PCI bridges or devices to the system.

	Regards
		Oliver
  

Patch

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index dd0d9d9bc509..998e44716b6f 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1497,6 +1497,9 @@  int __must_check pci_create_sysfs_dev_files(struct pci_dev *pdev)
 	if (!sysfs_initialized)
 		return -EACCES;
 
+	if (atomic_cmpxchg(&pdev->sysfs_init_cnt, 0, 1) == 1)
+		return 0;		/* already added */
+
 	return pci_create_resource_files(pdev);
 }
 
@@ -1511,6 +1514,9 @@  void pci_remove_sysfs_dev_files(struct pci_dev *pdev)
 	if (!sysfs_initialized)
 		return;
 
+	if (atomic_cmpxchg(&pdev->sysfs_init_cnt, 1, 0) == 0)
+		return;		/* already removed */
+
 	pci_remove_resource_files(pdev);
 }
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index b50e5c79f7e3..024313a7a90a 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -467,6 +467,8 @@  struct pci_dev {
 	pci_dev_flags_t dev_flags;
 	atomic_t	enable_cnt;	/* pci_enable_device has been called */
 
+	atomic_t	sysfs_init_cnt;	/* pci_create_sysfs_dev_files has been called */
+
 	u32		saved_config_space[16]; /* Config space saved at suspend time */
 	struct hlist_head saved_cap_space;
 	int		rom_attr_enabled;	/* Display of ROM attribute enabled? */