[v1,1/1] Revert "pinctrl: avoid unsafe code pattern in find_pinctrl()"

Message ID 20231017141806.535191-1-andriy.shevchenko@linux.intel.com
State New
Headers
Series [v1,1/1] Revert "pinctrl: avoid unsafe code pattern in find_pinctrl()" |

Commit Message

Andy Shevchenko Oct. 17, 2023, 2:18 p.m. UTC
  The commit breaks MMC enumeration on the Intel Merrifield
plaform.

Before:
[   36.439057] mmc0: SDHCI controller on PCI [0000:00:01.0] using ADMA
[   36.450924] mmc2: SDHCI controller on PCI [0000:00:01.3] using ADMA
[   36.459355] mmc1: SDHCI controller on PCI [0000:00:01.2] using ADMA
[   36.706399] mmc0: new DDR MMC card at address 0001
[   37.058972] mmc2: new ultra high speed DDR50 SDIO card at address 0001
[   37.278977] mmcblk0: mmc0:0001 H4G1d 3.64 GiB
[   37.297300]  mmcblk0: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10

After:
[   36.436704] mmc2: SDHCI controller on PCI [0000:00:01.3] using ADMA
[   36.436720] mmc1: SDHCI controller on PCI [0000:00:01.0] using ADMA
[   36.463685] mmc0: SDHCI controller on PCI [0000:00:01.2] using ADMA
[   36.720627] mmc1: new DDR MMC card at address 0001
[   37.068181] mmc2: new ultra high speed DDR50 SDIO card at address 0001
[   37.279998] mmcblk1: mmc1:0001 H4G1d 3.64 GiB
[   37.302670]  mmcblk1: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10

This reverts commit c153a4edff6ab01370fcac8e46f9c89cca1060c2.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
 drivers/pinctrl/core.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)
  

Comments

Linus Walleij Oct. 17, 2023, 6:18 p.m. UTC | #1
On Tue, Oct 17, 2023 at 4:18 PM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:

> The commit breaks MMC enumeration on the Intel Merrifield
> plaform.

The enumeration works, just that the probe order is different, right?

> Before:
> [   36.439057] mmc0: SDHCI controller on PCI [0000:00:01.0] using ADMA
> [   36.450924] mmc2: SDHCI controller on PCI [0000:00:01.3] using ADMA
> [   36.459355] mmc1: SDHCI controller on PCI [0000:00:01.2] using ADMA
> [   36.706399] mmc0: new DDR MMC card at address 0001
> [   37.058972] mmc2: new ultra high speed DDR50 SDIO card at address 0001
> [   37.278977] mmcblk0: mmc0:0001 H4G1d 3.64 GiB
> [   37.297300]  mmcblk0: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10
>
> After:
> [   36.436704] mmc2: SDHCI controller on PCI [0000:00:01.3] using ADMA
> [   36.436720] mmc1: SDHCI controller on PCI [0000:00:01.0] using ADMA
> [   36.463685] mmc0: SDHCI controller on PCI [0000:00:01.2] using ADMA
> [   36.720627] mmc1: new DDR MMC card at address 0001
> [   37.068181] mmc2: new ultra high speed DDR50 SDIO card at address 0001
> [   37.279998] mmcblk1: mmc1:0001 H4G1d 3.64 GiB
> [   37.302670]  mmcblk1: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10
>
> This reverts commit c153a4edff6ab01370fcac8e46f9c89cca1060c2.
>
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

Relying on this probe order or whatever it is causing one or the other
to be enumerated first seems very fragile, I think this condition can be
caused by other much more random things in the probe path as well,
so it would be great if we could just hammer this down for good, as
it is apparently ABI.

In the past some file system developers have told us (Ulf will know)
that we can't rely on the block device enumeration to identify
devices, and requires that we use things such as sysfs or the
UUID volume label in ext4 to identify storage.

That said, device trees are full of stuff like this:

        aliases {
                serial0 = &uart_AO;
                mmc0 = &sd_card_slot;
                mmc1 = &sdhc;
        };

Notice how this enumeration gets defined by the aliases.

Can you do the same with device properties? (If anyone can
answer that question it's Dmitry!)

Yours,
Linus Walleij
  
Andy Shevchenko Oct. 17, 2023, 6:34 p.m. UTC | #2
On Tue, Oct 17, 2023 at 08:18:23PM +0200, Linus Walleij wrote:
> On Tue, Oct 17, 2023 at 4:18 PM Andy Shevchenko
> <andriy.shevchenko@linux.intel.com> wrote:
> 
> > The commit breaks MMC enumeration on the Intel Merrifield
> > plaform.
> 
> The enumeration works, just that the probe order is different, right?
> 
> > Before:
> > [   36.439057] mmc0: SDHCI controller on PCI [0000:00:01.0] using ADMA
> > [   36.450924] mmc2: SDHCI controller on PCI [0000:00:01.3] using ADMA
> > [   36.459355] mmc1: SDHCI controller on PCI [0000:00:01.2] using ADMA
> > [   36.706399] mmc0: new DDR MMC card at address 0001
> > [   37.058972] mmc2: new ultra high speed DDR50 SDIO card at address 0001
> > [   37.278977] mmcblk0: mmc0:0001 H4G1d 3.64 GiB
> > [   37.297300]  mmcblk0: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10
> >
> > After:
> > [   36.436704] mmc2: SDHCI controller on PCI [0000:00:01.3] using ADMA
> > [   36.436720] mmc1: SDHCI controller on PCI [0000:00:01.0] using ADMA
> > [   36.463685] mmc0: SDHCI controller on PCI [0000:00:01.2] using ADMA
> > [   36.720627] mmc1: new DDR MMC card at address 0001
> > [   37.068181] mmc2: new ultra high speed DDR50 SDIO card at address 0001
> > [   37.279998] mmcblk1: mmc1:0001 H4G1d 3.64 GiB
> > [   37.302670]  mmcblk1: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10
> >
> > This reverts commit c153a4edff6ab01370fcac8e46f9c89cca1060c2.
> >
> > Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> 
> Relying on this probe order or whatever it is causing one or the other
> to be enumerated first seems very fragile, I think this condition can be
> caused by other much more random things in the probe path as well,
> so it would be great if we could just hammer this down for good, as
> it is apparently ABI.
> 
> In the past some file system developers have told us (Ulf will know)
> that we can't rely on the block device enumeration to identify
> devices, and requires that we use things such as sysfs or the
> UUID volume label in ext4 to identify storage.

While I technically might agree with you, this was working for everybody
since day 1 of support of Intel Merrifield added (circa v4.8), now _user
space_ is broken.

Note, I'm having _simple_ setup, no fancy UDEV or DBUS there, and I want
my scripts simply continue working. As I mentioned, this is Buildroot
+ Busybox which I haven't touched in the area of how they treat MMC
devices in _user space_.

Since we are at rc6 I prefer to get this reverted first and next cycle we can
discuss better solutions. I'm all for testing any.

> That said, device trees are full of stuff like this:
> 
>         aliases {
>                 serial0 = &uart_AO;
>                 mmc0 = &sd_card_slot;
>                 mmc1 = &sdhc;
>         };

And Rob, AFAIU, is against aliases.

> Notice how this enumeration gets defined by the aliases.
> 
> Can you do the same with device properties? (If anyone can
> answer that question it's Dmitry!)

No, and why should we?
  
Andy Shevchenko Oct. 17, 2023, 6:39 p.m. UTC | #3
On Tue, Oct 17, 2023 at 09:34:34PM +0300, Andy Shevchenko wrote:
> On Tue, Oct 17, 2023 at 08:18:23PM +0200, Linus Walleij wrote:
> > On Tue, Oct 17, 2023 at 4:18 PM Andy Shevchenko
> > <andriy.shevchenko@linux.intel.com> wrote:
> > 
> > > The commit breaks MMC enumeration on the Intel Merrifield
> > > plaform.
> > 
> > The enumeration works, just that the probe order is different, right?
> > 
> > > Before:
> > > [   36.439057] mmc0: SDHCI controller on PCI [0000:00:01.0] using ADMA
> > > [   36.450924] mmc2: SDHCI controller on PCI [0000:00:01.3] using ADMA
> > > [   36.459355] mmc1: SDHCI controller on PCI [0000:00:01.2] using ADMA
> > > [   36.706399] mmc0: new DDR MMC card at address 0001
> > > [   37.058972] mmc2: new ultra high speed DDR50 SDIO card at address 0001
> > > [   37.278977] mmcblk0: mmc0:0001 H4G1d 3.64 GiB
> > > [   37.297300]  mmcblk0: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10
> > >
> > > After:
> > > [   36.436704] mmc2: SDHCI controller on PCI [0000:00:01.3] using ADMA
> > > [   36.436720] mmc1: SDHCI controller on PCI [0000:00:01.0] using ADMA
> > > [   36.463685] mmc0: SDHCI controller on PCI [0000:00:01.2] using ADMA
> > > [   36.720627] mmc1: new DDR MMC card at address 0001
> > > [   37.068181] mmc2: new ultra high speed DDR50 SDIO card at address 0001
> > > [   37.279998] mmcblk1: mmc1:0001 H4G1d 3.64 GiB
> > > [   37.302670]  mmcblk1: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10
> > >
> > > This reverts commit c153a4edff6ab01370fcac8e46f9c89cca1060c2.
> > >
> > > Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > 
> > Relying on this probe order or whatever it is causing one or the other
> > to be enumerated first seems very fragile, I think this condition can be
> > caused by other much more random things in the probe path as well,
> > so it would be great if we could just hammer this down for good, as
> > it is apparently ABI.

And as I mentioned in the reply to the patch, I have 100% reproducibility
of the issue, I never have "random" or arbitrary numbers. While it might
be fragile, it very well works reliably for _years_.

> > In the past some file system developers have told us (Ulf will know)
> > that we can't rely on the block device enumeration to identify
> > devices, and requires that we use things such as sysfs or the
> > UUID volume label in ext4 to identify storage.
> 
> While I technically might agree with you, this was working for everybody
> since day 1 of support of Intel Merrifield added (circa v4.8), now _user
> space_ is broken.
> 
> Note, I'm having _simple_ setup, no fancy UDEV or DBUS there, and I want
> my scripts simply continue working. As I mentioned, this is Buildroot
> + Busybox which I haven't touched in the area of how they treat MMC
> devices in _user space_.
> 
> Since we are at rc6 I prefer to get this reverted first and next cycle we can
> discuss better solutions. I'm all for testing any.
> 
> > That said, device trees are full of stuff like this:
> > 
> >         aliases {
> >                 serial0 = &uart_AO;
> >                 mmc0 = &sd_card_slot;
> >                 mmc1 = &sdhc;
> >         };
> 
> And Rob, AFAIU, is against aliases.
> 
> > Notice how this enumeration gets defined by the aliases.
> > 
> > Can you do the same with device properties? (If anyone can
> > answer that question it's Dmitry!)
> 
> No, and why should we?
  
Linus Walleij Oct. 17, 2023, 6:59 p.m. UTC | #4
On Tue, Oct 17, 2023 at 8:34 PM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
> On Tue, Oct 17, 2023 at 08:18:23PM +0200, Linus Walleij wrote:

> > In the past some file system developers have told us (Ulf will know)
> > that we can't rely on the block device enumeration to identify
> > devices, and requires that we use things such as sysfs or the
> > UUID volume label in ext4 to identify storage.
>
> While I technically might agree with you, this was working for everybody
> since day 1 of support of Intel Merrifield added (circa v4.8), now _user
> space_ is broken.

Actually, I don't agree with that, just relaying it. I would prefer that we
solve exactly the problem that we are facing here: some random unrelated
code or similar affecting enumeration order of mmc devices.

It's not the first time it happens to me, I have several devices that change
this enumeration order depending on whether an SD card is plugged
in or not, and in a *BIG* way: the boot partition on the soldered eMMC
changes enumeration depending on whether an SD card is inserted
or not, and that has never been fixed (because above).

> > That said, device trees are full of stuff like this:
> >
> >         aliases {
> >                 serial0 = &uart_AO;
> >                 mmc0 = &sd_card_slot;
> >                 mmc1 = &sdhc;
> >         };
>
> And Rob, AFAIU, is against aliases.
>
> > Notice how this enumeration gets defined by the aliases.
> >
> > Can you do the same with device properties? (If anyone can
> > answer that question it's Dmitry!)
>
> No, and why should we?

Because device properties are not device tree, they are just some
Linux thing so we can do whatever we want. Just checking if
Dmitry has some idea that would solve this for good, he usually
replies quickly.

Yours,
Linus Walleij
  
Andy Shevchenko Oct. 17, 2023, 7:45 p.m. UTC | #5
On Tue, Oct 17, 2023 at 08:59:05PM +0200, Linus Walleij wrote:
> On Tue, Oct 17, 2023 at 8:34 PM Andy Shevchenko
> <andriy.shevchenko@linux.intel.com> wrote:
> > On Tue, Oct 17, 2023 at 08:18:23PM +0200, Linus Walleij wrote:
> 
> > > In the past some file system developers have told us (Ulf will know)
> > > that we can't rely on the block device enumeration to identify
> > > devices, and requires that we use things such as sysfs or the
> > > UUID volume label in ext4 to identify storage.
> >
> > While I technically might agree with you, this was working for everybody
> > since day 1 of support of Intel Merrifield added (circa v4.8), now _user
> > space_ is broken.
> 
> Actually, I don't agree with that, just relaying it. I would prefer that we
> solve exactly the problem that we are facing here: some random unrelated
> code or similar affecting enumeration order of mmc devices.
> 
> It's not the first time it happens to me, I have several devices that change
> this enumeration order depending on whether an SD card is plugged
> in or not, and in a *BIG* way: the boot partition on the soldered eMMC
> changes enumeration depending on whether an SD card is inserted
> or not, and that has never been fixed (because above).

This is not the problem I have. I haven't added any SD card, hardware
configuration is the same. The solely difference in the whole setup is
this revert applied or not.

> > > That said, device trees are full of stuff like this:
> > >
> > >         aliases {
> > >                 serial0 = &uart_AO;
> > >                 mmc0 = &sd_card_slot;
> > >                 mmc1 = &sdhc;
> > >         };
> >
> > And Rob, AFAIU, is against aliases.
> >
> > > Notice how this enumeration gets defined by the aliases.
> > >
> > > Can you do the same with device properties? (If anyone can
> > > answer that question it's Dmitry!)
> >
> > No, and why should we?
> 
> Because device properties are not device tree, they are just some
> Linux thing so we can do whatever we want. Just checking if
> Dmitry has some idea that would solve this for good, he usually
> replies quickly.

OK.
  
Dmitry Torokhov Oct. 17, 2023, 9:43 p.m. UTC | #6
Hi Andy,

On Tue, Oct 17, 2023 at 10:45:39PM +0300, Andy Shevchenko wrote:
> On Tue, Oct 17, 2023 at 08:59:05PM +0200, Linus Walleij wrote:
> > On Tue, Oct 17, 2023 at 8:34 PM Andy Shevchenko
> > <andriy.shevchenko@linux.intel.com> wrote:
> > > On Tue, Oct 17, 2023 at 08:18:23PM +0200, Linus Walleij wrote:
> > 
> > > > In the past some file system developers have told us (Ulf will know)
> > > > that we can't rely on the block device enumeration to identify
> > > > devices, and requires that we use things such as sysfs or the
> > > > UUID volume label in ext4 to identify storage.
> > >
> > > While I technically might agree with you, this was working for everybody
> > > since day 1 of support of Intel Merrifield added (circa v4.8), now _user
> > > space_ is broken.
> > 
> > Actually, I don't agree with that, just relaying it. I would prefer that we
> > solve exactly the problem that we are facing here: some random unrelated
> > code or similar affecting enumeration order of mmc devices.

Sorry, but the era of static configuration where one has a well defined
order in which things are probed and numbered has long gone. The right
answer is either device aliases that provides stable numbering on a
board that is not dependent on scheduler behavior, mutexes
implementation (how they deal with writer starvation, etc),
kernel/driver/subsystem linking order and myriad other things, or
mounting by UUID. The kernel does not provide any guarantees on the
stability of device probe and instantiation order.

If you think about it it is the same issue as legacy GPIO numbering.
It was convenient some time ago, but now it is no longer suitable or
sufficient and could change when kernel is uprevved.

> > 
> > It's not the first time it happens to me, I have several devices that change
> > this enumeration order depending on whether an SD card is plugged
> > in or not, and in a *BIG* way: the boot partition on the soldered eMMC
> > changes enumeration depending on whether an SD card is inserted
> > or not, and that has never been fixed (because above).
> 
> This is not the problem I have. I haven't added any SD card, hardware
> configuration is the same. The solely difference in the whole setup is
> this revert applied or not.

Yes, I guess there is a contention on this mutex and the fact that we
are now taking it once and not twice makes difference in which probes
happen. If you look at the logs, you will see that even before the patch
controllers did not enumerate on the order of PCI functions:

[   36.439057] mmc0: SDHCI controller on PCI [0000:00:01.0] using ADMA
[   36.450924] mmc2: SDHCI controller on PCI [0000:00:01.3] using ADMA
[   36.459355] mmc1: SDHCI controller on PCI [0000:00:01.2] using ADMA

So you have mmc2 instantiated before mmc1 even before the patch. This
happens because we now have

		.probe_type = PROBE_PREFER_ASYNCHRONOUS,

in sdhci_driver structure in drivers/mmc/host/sdhci-pci-core.c. It just
happened that even with asynchronous probing your storage did end up on
mmc0 originally and you were happy.

I wonder, could you please post entire dmesg for your system?

> 
> > > > That said, device trees are full of stuff like this:
> > > >
> > > >         aliases {
> > > >                 serial0 = &uart_AO;
> > > >                 mmc0 = &sd_card_slot;
> > > >                 mmc1 = &sdhc;
> > > >         };
> > >
> > > And Rob, AFAIU, is against aliases.

Rob might not want them, but they are the reality and are present for
multiple classes of devices and I believe are here to stay.

> > >
> > > > Notice how this enumeration gets defined by the aliases.
> > > >
> > > > Can you do the same with device properties? (If anyone can
> > > > answer that question it's Dmitry!)
> > >
> > > No, and why should we?
> > 
> > Because device properties are not device tree, they are just some
> > Linux thing so we can do whatever we want. Just checking if
> > Dmitry has some idea that would solve this for good, he usually
> > replies quickly.
> 
> OK.

I think the right answer is "fix the userspace" really in this case. We
could also try extend of_alias_get_id() to see if we could pass some
preferred numbering on x86. But this will again be fragile if the
knowledge resides in the driver and is not tied to a particular board
(as it is in DT case): there could be multiple controllers, things will
be shifting board to board...

Thanks.
  
Andy Shevchenko Oct. 18, 2023, 5:01 a.m. UTC | #7
On Tue, Oct 17, 2023 at 02:43:01PM -0700, Dmitry Torokhov wrote:
> On Tue, Oct 17, 2023 at 10:45:39PM +0300, Andy Shevchenko wrote:

Thanks for your response.

...

> I wonder, could you please post entire dmesg for your system?

Working, non-working or both?

...

> I think the right answer is "fix the userspace" really in this case. We
> could also try extend of_alias_get_id() to see if we could pass some
> preferred numbering on x86. But this will again be fragile if the
> knowledge resides in the driver and is not tied to a particular board
> (as it is in DT case): there could be multiple controllers, things will
> be shifting board to board...

Any suggestion how should it be properly done in the minimum shell environment?
(Busybox uses mdev with static tables IIRC and there is no fancy udev or so)
  
Ferry Toth Oct. 18, 2023, 7:56 a.m. UTC | #8
Hi,

(resend due to html reject)

On 17-10-2023 23:43, Dmitry Torokhov wrote:
> Hi Andy,
> 
> On Tue, Oct 17, 2023 at 10:45:39PM +0300, Andy Shevchenko wrote:
>> On Tue, Oct 17, 2023 at 08:59:05PM +0200, Linus Walleij wrote:
>>> On Tue, Oct 17, 2023 at 8:34 PM Andy Shevchenko
>>> <andriy.shevchenko@linux.intel.com> wrote:
>>>> On Tue, Oct 17, 2023 at 08:18:23PM +0200, Linus Walleij wrote:
>>>
>>>>> In the past some file system developers have told us (Ulf will know)
>>>>> that we can't rely on the block device enumeration to identify
>>>>> devices, and requires that we use things such as sysfs or the
>>>>> UUID volume label in ext4 to identify storage.
>>>>
>>>> While I technically might agree with you, this was working for everybody
>>>> since day 1 of support of Intel Merrifield added (circa v4.8), now _user
>>>> space_ is broken.
>>>
>>> Actually, I don't agree with that, just relaying it. I would prefer that we
>>> solve exactly the problem that we are facing here: some random unrelated
>>> code or similar affecting enumeration order of mmc devices.
> 
> Sorry, but the era of static configuration where one has a well defined
> order in which things are probed and numbered has long gone. The right
> answer is either device aliases that provides stable numbering on a
> board that is not dependent on scheduler behavior, mutexes
> implementation (how they deal with writer starvation, etc),
> kernel/driver/subsystem linking order and myriad other things, or
> mounting by UUID. The kernel does not provide any guarantees on the
> stability of device probe and instantiation order.
> 
> If you think about it it is the same issue as legacy GPIO numbering.
> It was convenient some time ago, but now it is no longer suitable or
> sufficient and could change when kernel is uprevved.
> 
>>>
>>> It's not the first time it happens to me, I have several devices that change
>>> this enumeration order depending on whether an SD card is plugged
>>> in or not, and in a *BIG* way: the boot partition on the soldered eMMC
>>> changes enumeration depending on whether an SD card is inserted
>>> or not, and that has never been fixed (because above).
>>
>> This is not the problem I have. I haven't added any SD card, hardware
>> configuration is the same. The solely difference in the whole setup is
>> this revert applied or not.
> 
> Yes, I guess there is a contention on this mutex and the fact that we
> are now taking it once and not twice makes difference in which probes
> happen. If you look at the logs, you will see that even before the patch
> controllers did not enumerate on the order of PCI functions:
> 
> [   36.439057] mmc0: SDHCI controller on PCI [0000:00:01.0] using ADMA
> [   36.450924] mmc2: SDHCI controller on PCI [0000:00:01.3] using ADMA
> [   36.459355] mmc1: SDHCI controller on PCI [0000:00:01.2] using ADMA

You are referring to the order printed in dmesg. But actually

mmc0 = 0000:00:01.0
mmc1 = 0000:00:01.2
mmc2 = 0000:00:01.3

And this has been so for like 8 years. See f.i. 
https://github.com/edison-fw/meta-intel-edison/issues/135
(this is with Yocto, so using systemd, the issue discussed there is not 
related to this but to card detection iirc)

> So you have mmc2 instantiated before mmc1 even before the patch. This
> happens because we now have
> 
> 		.probe_type = PROBE_PREFER_ASYNCHRONOUS,
> 
> in sdhci_driver structure in drivers/mmc/host/sdhci-pci-core.c. It just
> happened that even with asynchronous probing your storage did end up on
> mmc0 originally and you were happy.
> 
> I wonder, could you please post entire dmesg for your system?
> 
>>
>>>>> That said, device trees are full of stuff like this:
>>>>>
>>>>>          aliases {
>>>>>                  serial0 = &uart_AO;
>>>>>                  mmc0 = &sd_card_slot;
>>>>>                  mmc1 = &sdhc;
>>>>>          };
>>>>
>>>> And Rob, AFAIU, is against aliases.
> 
> Rob might not want them, but they are the reality and are present for
> multiple classes of devices and I believe are here to stay.
> 
>>>>
>>>>> Notice how this enumeration gets defined by the aliases.
>>>>>
>>>>> Can you do the same with device properties? (If anyone can
>>>>> answer that question it's Dmitry!)
>>>>
>>>> No, and why should we?
>>>
>>> Because device properties are not device tree, they are just some
>>> Linux thing so we can do whatever we want. Just checking if
>>> Dmitry has some idea that would solve this for good, he usually
>>> replies quickly.
>>
>> OK.
> 
> I think the right answer is "fix the userspace" really in this case. We
> could also try extend of_alias_get_id() to see if we could pass some
> preferred numbering on x86. But this will again be fragile if the
> knowledge resides in the driver and is not tied to a particular board
> (as it is in DT case): there could be multiple controllers, things will
> be shifting board to board...
> 
> Thanks.
>
  
Dmitry Torokhov Oct. 18, 2023, 10:41 p.m. UTC | #9
On Wed, Oct 18, 2023 at 08:01:23AM +0300, Andy Shevchenko wrote:
> On Tue, Oct 17, 2023 at 02:43:01PM -0700, Dmitry Torokhov wrote:
> > On Tue, Oct 17, 2023 at 10:45:39PM +0300, Andy Shevchenko wrote:
> 
> Thanks for your response.
> 
> ...
> 
> > I wonder, could you please post entire dmesg for your system?
> 
> Working, non-working or both?

Non working, especially if you also enable debug logs in
drivers/mmc/host/sdhci-pci-core.c.

What I do not quite understand is that I think we should not be hitting
the case where pinctrl is already created for the device, which is the
code path my patch was changing. IIUIC we should be mostly executing the
"pinctrl not found" path and that did not really change. Maybe you could
also put some more annotations to show how/at what exact point the probe
order changed? Maybe log find_pinctrl() calls and compare?

Linus, BTW, I think there are more problems there with pinctrl lookup,
because, if we assume there are concurrent accesses to pinctrl_get(),
the fact that we did not find an instance while scanning the list does
not mean we will not find it when we go to insert a newly created one.

Another problem, as far as I can see, that there is not really a defined
owner of pinctrl structure, it is created on demand, and destroyed when
last user is gone. So if we execute last pintctrl_put() and there is
another pinctrl_get() running simultaneously, we may get and bump up the
refcount, and then release (pinctrl_free) will acquire the mutex, and
zap the structure.

Given that there are more issues in that code, maybe we should revert
the patch for now so Andy has a chance to convert to UUID/LABEL booting?

> 
> ...
> 
> > I think the right answer is "fix the userspace" really in this case. We
> > could also try extend of_alias_get_id() to see if we could pass some
> > preferred numbering on x86. But this will again be fragile if the
> > knowledge resides in the driver and is not tied to a particular board
> > (as it is in DT case): there could be multiple controllers, things will
> > be shifting board to board...
> 
> Any suggestion how should it be properly done in the minimum shell environment?
> (Busybox uses mdev with static tables IIRC and there is no fancy udev or so)

I'm not sure, so you have something like blkid running? You just need to
locate the device and chroot there. This assumes you do have initramfs.

Thanks.
  
Linus Walleij Oct. 19, 2023, 8:12 a.m. UTC | #10
On Thu, Oct 19, 2023 at 12:41 AM Dmitry Torokhov
<dmitry.torokhov@gmail.com> wrote:

> Linus, BTW, I think there are more problems there with pinctrl lookup,
> because, if we assume there are concurrent accesses to pinctrl_get(),
> the fact that we did not find an instance while scanning the list does
> not mean we will not find it when we go to insert a newly created one.

I'm not surprised. Pinctrl comes from a time when the probing was
mostly serial, and since subsystems (such as MMC) has increasingly
added asynchronous probing, accompanied by rework of device tree
core etc (device tree didn't even exist when we started pin control)
to parallelize probing based on device hierarchy topology etc.

The people making probes ever more asynchronous probably just
tested if the system still boots and did not bother to go and look at
any rough edges in resource supplying subsystems, including clk,
regulator, gpio, reset, pin control...

There is a reason to why it mostly works, which I explain below:

> Another problem, as far as I can see, that there is not really a defined
> owner of pinctrl structure, it is created on demand, and destroyed when
> last user is gone. So if we execute last pintctrl_put() and there is
> another pinctrl_get() running simultaneously, we may get and bump up the
> refcount, and then release (pinctrl_free) will acquire the mutex, and
> zap the structure.

You mean we need to acquire the mutex in the code that calls
find_pinctrl() instead of inside find_pinctrl()? Yes I think you're right,
wanna do a patch?

It is largely theoretical because of the following:

A pin control handle is
usually taken by a driver for a device, it is usually unique
for that exact hardware (in difference from a clock, or a regulator,
which is often shared), so the scenario you are designing for here
would be that the driver for a device is probing the *same* hardware
on two runpaths, which is not going happen, right?

So while the software is not race-safe, the nature of the hardware
makes it safe: there is just one device instance per pin control
handle.

I haven't thought it through in detail so there may be corner
cases.

> Given that there are more issues in that code, maybe we should revert
> the patch for now so Andy has a chance to convert to UUID/LABEL booting?

Yeah I reverted it, the above elaboration may apply to this patch
too and makes me feel we are "mostly safe" in this regard anyway.

Yours,
Linus Walleij
  
Linus Walleij Oct. 19, 2023, 8:13 a.m. UTC | #11
On Tue, Oct 17, 2023 at 4:18 PM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:


> The commit breaks MMC enumeration on the Intel Merrifield
> plaform.
>
> Before:
> [   36.439057] mmc0: SDHCI controller on PCI [0000:00:01.0] using ADMA
> [   36.450924] mmc2: SDHCI controller on PCI [0000:00:01.3] using ADMA
> [   36.459355] mmc1: SDHCI controller on PCI [0000:00:01.2] using ADMA
> [   36.706399] mmc0: new DDR MMC card at address 0001
> [   37.058972] mmc2: new ultra high speed DDR50 SDIO card at address 0001
> [   37.278977] mmcblk0: mmc0:0001 H4G1d 3.64 GiB
> [   37.297300]  mmcblk0: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10
>
> After:
> [   36.436704] mmc2: SDHCI controller on PCI [0000:00:01.3] using ADMA
> [   36.436720] mmc1: SDHCI controller on PCI [0000:00:01.0] using ADMA
> [   36.463685] mmc0: SDHCI controller on PCI [0000:00:01.2] using ADMA
> [   36.720627] mmc1: new DDR MMC card at address 0001
> [   37.068181] mmc2: new ultra high speed DDR50 SDIO card at address 0001
> [   37.279998] mmcblk1: mmc1:0001 H4G1d 3.64 GiB
> [   37.302670]  mmcblk1: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10
>
> This reverts commit c153a4edff6ab01370fcac8e46f9c89cca1060c2.
>
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

Patch applied for pronto fixes!

Yours,
Linus Walleij
  
Andy Shevchenko Oct. 19, 2023, 12:13 p.m. UTC | #12
On Wed, Oct 18, 2023 at 03:41:24PM -0700, Dmitry Torokhov wrote:
> On Wed, Oct 18, 2023 at 08:01:23AM +0300, Andy Shevchenko wrote:
> > On Tue, Oct 17, 2023 at 02:43:01PM -0700, Dmitry Torokhov wrote:
> > > On Tue, Oct 17, 2023 at 10:45:39PM +0300, Andy Shevchenko wrote:
> > 
> > Thanks for your response.

...

> > > I wonder, could you please post entire dmesg for your system?
> > 
> > Working, non-working or both?
> 
> Non working, especially if you also enable debug logs in
> drivers/mmc/host/sdhci-pci-core.c.
> 
> What I do not quite understand is that I think we should not be hitting
> the case where pinctrl is already created for the device, which is the
> code path my patch was changing. IIUIC we should be mostly executing the
> "pinctrl not found" path and that did not really change. Maybe you could
> also put some more annotations to show how/at what exact point the probe
> order changed? Maybe log find_pinctrl() calls and compare?
> 
> Linus, BTW, I think there are more problems there with pinctrl lookup,
> because, if we assume there are concurrent accesses to pinctrl_get(),
> the fact that we did not find an instance while scanning the list does
> not mean we will not find it when we go to insert a newly created one.
> 
> Another problem, as far as I can see, that there is not really a defined
> owner of pinctrl structure, it is created on demand, and destroyed when
> last user is gone. So if we execute last pintctrl_put() and there is
> another pinctrl_get() running simultaneously, we may get and bump up the
> refcount, and then release (pinctrl_free) will acquire the mutex, and
> zap the structure.

Oh, that's a lot of fixing ahead! But if you send anything to test, I would
happy do it.

> Given that there are more issues in that code, maybe we should revert
> the patch for now so Andy has a chance to convert to UUID/LABEL booting?

I believe it's not feasible, see below why.

...

> > > I think the right answer is "fix the userspace" really in this case. We
> > > could also try extend of_alias_get_id() to see if we could pass some
> > > preferred numbering on x86. But this will again be fragile if the
> > > knowledge resides in the driver and is not tied to a particular board
> > > (as it is in DT case): there could be multiple controllers, things will
> > > be shifting board to board...
> > 
> > Any suggestion how should it be properly done in the minimum shell environment?
> > (Busybox uses mdev with static tables IIRC and there is no fancy udev or so)
> 
> I'm not sure, so you have something like blkid running? You just need to
> locate the device and chroot there. This assumes you do have initramfs.

I don't think this is working solution.

My case is: I have build an environment with a script that has hardcoded
mmcblk0 to mount from. When I run this script I do not know _which_ exact
board I run on, it should work on any of them (same boards, but different
UUIDs).

While writing this I realised that the common denominator I have here is
the physical device (as it's a PCI one), and it's on-SoC, so can't change
its BDF. So, there seem to be a solution. Let me try to implement that.
  
Andy Shevchenko Oct. 19, 2023, 12:15 p.m. UTC | #13
On Thu, Oct 19, 2023 at 10:12:57AM +0200, Linus Walleij wrote:
> On Thu, Oct 19, 2023 at 12:41 AM Dmitry Torokhov
> <dmitry.torokhov@gmail.com> wrote:

...

> > Given that there are more issues in that code, maybe we should revert
> > the patch for now so Andy has a chance to convert to UUID/LABEL booting?
> 
> Yeah I reverted it, the above elaboration may apply to this patch
> too and makes me feel we are "mostly safe" in this regard anyway.

Thank you!

And on my side I'll try my best to avoid static enumerations.
  
Andy Shevchenko Oct. 19, 2023, 4:52 p.m. UTC | #14
On Wed, Oct 18, 2023 at 03:41:24PM -0700, Dmitry Torokhov wrote:
> On Wed, Oct 18, 2023 at 08:01:23AM +0300, Andy Shevchenko wrote:
> > On Tue, Oct 17, 2023 at 02:43:01PM -0700, Dmitry Torokhov wrote:
> > > On Tue, Oct 17, 2023 at 10:45:39PM +0300, Andy Shevchenko wrote:
> > 
> > Thanks for your response.

...

> > > I wonder, could you please post entire dmesg for your system?
> > 
> > Working, non-working or both?
> 
> Non working, especially if you also enable debug logs in
> drivers/mmc/host/sdhci-pci-core.c.

Here we are
https://paste.debian.net/hidden/5d778105/

> What I do not quite understand is that I think we should not be hitting
> the case where pinctrl is already created for the device, which is the
> code path my patch was changing. IIUIC we should be mostly executing the
> "pinctrl not found" path and that did not really change. Maybe you could
> also put some more annotations to show how/at what exact point the probe
> order changed? Maybe log find_pinctrl() calls and compare?

I see this order in dmesg
[   48.429681] sdhci-pci 0000:00:01.2: Mapped GSI37 to IRQ79
[   48.436219] sdhci-pci 0000:00:01.0: Mapped GSI0 to IRQ80
[   48.450347] sdhci-pci 0000:00:01.3: Mapped GSI38 to IRQ81

which suggests that PCI enabling devices are happening in parallel
(pcim_enable_device() in SDHCI PCI driver) and whoever wins first gets
the ID via IDA (see mmc_alloc_host() implementation). But PCI itself
guarantees that function 0 has to be always present, so the PCI itself
enumerates it _always_ in the same order (and we are talking about exactly
BDF == x:y.0 in this case).

> Linus, BTW, I think there are more problems there with pinctrl lookup,
> because, if we assume there are concurrent accesses to pinctrl_get(),
> the fact that we did not find an instance while scanning the list does
> not mean we will not find it when we go to insert a newly created one.
> 
> Another problem, as far as I can see, that there is not really a defined
> owner of pinctrl structure, it is created on demand, and destroyed when
> last user is gone. So if we execute last pintctrl_put() and there is
> another pinctrl_get() running simultaneously, we may get and bump up the
> refcount, and then release (pinctrl_free) will acquire the mutex, and
> zap the structure.
> 
> Given that there are more issues in that code, maybe we should revert
> the patch for now so Andy has a chance to convert to UUID/LABEL booting?

I'm testing a PoC of the script, so looks promising, but needs more time to
check other possibilities (see below) and deploy.

...

> > > I think the right answer is "fix the userspace" really in this case. We
> > > could also try extend of_alias_get_id() to see if we could pass some
> > > preferred numbering on x86. But this will again be fragile if the
> > > knowledge resides in the driver and is not tied to a particular board
> > > (as it is in DT case): there could be multiple controllers, things will
> > > be shifting board to board...
> > 
> > Any suggestion how should it be properly done in the minimum shell environment?
> > (Busybox uses mdev with static tables IIRC and there is no fancy udev or so)
> 
> I'm not sure, so you have something like blkid running? You just need to
> locate the device and chroot there. This assumes you do have initramfs.

blkid shows UUID for the partition of interest and it doesn't have any label,
OTOH I could parse it for the specific template, while it's less reliable than
going via sysfs from PCI device name, that's defined by hardware and may not be
changed.
  
Andy Shevchenko Oct. 19, 2023, 5:19 p.m. UTC | #15
On Thu, Oct 19, 2023 at 07:52:26PM +0300, Andy Shevchenko wrote:
> On Wed, Oct 18, 2023 at 03:41:24PM -0700, Dmitry Torokhov wrote:
> > On Wed, Oct 18, 2023 at 08:01:23AM +0300, Andy Shevchenko wrote:
> > > On Tue, Oct 17, 2023 at 02:43:01PM -0700, Dmitry Torokhov wrote:
> > > > On Tue, Oct 17, 2023 at 10:45:39PM +0300, Andy Shevchenko wrote:
> > > 
> > > Thanks for your response.

...

> > > > I wonder, could you please post entire dmesg for your system?
> > > 
> > > Working, non-working or both?
> > 
> > Non working, especially if you also enable debug logs in
> > drivers/mmc/host/sdhci-pci-core.c.
> 
> Here we are
> https://paste.debian.net/hidden/5d778105/

For the sake of completeness

https://paste.debian.net/hidden/149933ac/

the working case on the same codebase (the hash is different due to patch that
changes couple of BUG*() to WARN*(), other than that the code is identical).

> > What I do not quite understand is that I think we should not be hitting
> > the case where pinctrl is already created for the device, which is the
> > code path my patch was changing. IIUIC we should be mostly executing the
> > "pinctrl not found" path and that did not really change. Maybe you could
> > also put some more annotations to show how/at what exact point the probe
> > order changed? Maybe log find_pinctrl() calls and compare?
> 
> I see this order in dmesg
> [   48.429681] sdhci-pci 0000:00:01.2: Mapped GSI37 to IRQ79
> [   48.436219] sdhci-pci 0000:00:01.0: Mapped GSI0 to IRQ80
> [   48.450347] sdhci-pci 0000:00:01.3: Mapped GSI38 to IRQ81
> 
> which suggests that PCI enabling devices are happening in parallel
> (pcim_enable_device() in SDHCI PCI driver) and whoever wins first gets
> the ID via IDA (see mmc_alloc_host() implementation). But PCI itself
> guarantees that function 0 has to be always present, so the PCI itself
> enumerates it _always_ in the same order (and we are talking about exactly
> BDF == x:y.0 in this case).
> 
> > Linus, BTW, I think there are more problems there with pinctrl lookup,
> > because, if we assume there are concurrent accesses to pinctrl_get(),
> > the fact that we did not find an instance while scanning the list does
> > not mean we will not find it when we go to insert a newly created one.
> > 
> > Another problem, as far as I can see, that there is not really a defined
> > owner of pinctrl structure, it is created on demand, and destroyed when
> > last user is gone. So if we execute last pintctrl_put() and there is
> > another pinctrl_get() running simultaneously, we may get and bump up the
> > refcount, and then release (pinctrl_free) will acquire the mutex, and
> > zap the structure.
> > 
> > Given that there are more issues in that code, maybe we should revert
> > the patch for now so Andy has a chance to convert to UUID/LABEL booting?
> 
> I'm testing a PoC of the script, so looks promising, but needs more time to
> check other possibilities (see below) and deploy.

...

> > > > I think the right answer is "fix the userspace" really in this case. We
> > > > could also try extend of_alias_get_id() to see if we could pass some
> > > > preferred numbering on x86. But this will again be fragile if the
> > > > knowledge resides in the driver and is not tied to a particular board
> > > > (as it is in DT case): there could be multiple controllers, things will
> > > > be shifting board to board...
> > > 
> > > Any suggestion how should it be properly done in the minimum shell environment?
> > > (Busybox uses mdev with static tables IIRC and there is no fancy udev or so)
> > 
> > I'm not sure, so you have something like blkid running? You just need to
> > locate the device and chroot there. This assumes you do have initramfs.
> 
> blkid shows UUID for the partition of interest and it doesn't have any label,
> OTOH I could parse it for the specific template, while it's less reliable than
> going via sysfs from PCI device name, that's defined by hardware and may not be
> changed.
  

Patch

diff --git a/drivers/pinctrl/core.c b/drivers/pinctrl/core.c
index e2f7519bef04..e9dc9638120a 100644
--- a/drivers/pinctrl/core.c
+++ b/drivers/pinctrl/core.c
@@ -1022,20 +1022,17 @@  static int add_setting(struct pinctrl *p, struct pinctrl_dev *pctldev,
 
 static struct pinctrl *find_pinctrl(struct device *dev)
 {
-	struct pinctrl *entry, *p = NULL;
+	struct pinctrl *p;
 
 	mutex_lock(&pinctrl_list_mutex);
-
-	list_for_each_entry(entry, &pinctrl_list, node) {
-		if (entry->dev == dev) {
-			p = entry;
-			kref_get(&p->users);
-			break;
+	list_for_each_entry(p, &pinctrl_list, node)
+		if (p->dev == dev) {
+			mutex_unlock(&pinctrl_list_mutex);
+			return p;
 		}
-	}
 
 	mutex_unlock(&pinctrl_list_mutex);
-	return p;
+	return NULL;
 }
 
 static void pinctrl_free(struct pinctrl *p, bool inlist);
@@ -1143,6 +1140,7 @@  struct pinctrl *pinctrl_get(struct device *dev)
 	p = find_pinctrl(dev);
 	if (p) {
 		dev_dbg(dev, "obtain a copy of previously claimed pinctrl\n");
+		kref_get(&p->users);
 		return p;
 	}