[GIT,PULL] gpio: fixes for v6.1-rc8 - take 2

Message ID 20221203150539.11483-1-brgl@bgdev.pl
State New
Headers
Series [GIT,PULL] gpio: fixes for v6.1-rc8 - take 2 |

Pull-request

git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux.git gpio-fixes-for-v6.1-rc8-take-2

Message

Bartosz Golaszewski Dec. 3, 2022, 3:05 p.m. UTC
  From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

Linus,

Here's a fixed PR from the GPIO subsystem for the next rc.

Here's the last round of fixes for the upcoming release. The two resource leak
fixes are self-explanatory. The two character device commits need some more
backstory:

I recently listened to Laurent Pinchart's talk from this year's LPC[1] where he
discussed an issue with many subsystems that export device nodes to user-space
where one can open the device file, unbind the underlying device from the
driver, then call any of the relevant system calls and observe the kernel crash
with a NULL-pointer derefence.

I verified that the problem exists with the GPIO subsystem as well. The reason
for that is: when a GPIO chip is removed, we drop the chip's data from the
GPIO device and set the relevant pointer to NULL but don't check it in syscall
callbacks in the character device's code. That's fixed by the first patch.

However that fix alone leaves the character device vulnerable to a race
condition in which the GPIO chip is removed when a syscall is in progress.
To avoid that, we have a second fix that uses a lock to protect the device's
structure from being disabled before all syscall callbacks returned.

Laurent blamed the issue on devres but I believe the problem comes from the
fact that many driver developers are simply unaware that resources exported
to user-space need to survive the driver unbind and only be released when the
device's reference count goes down to 0. Devres' docs are pretty clear on that:
the resources get freed on driver unbind. Resources that should survive it,
must not be managed. This is however a topic for a separate discussion which
I intend to start soon.

Please pull,
Bartosz Golaszewski

[1] https://www.youtube.com/watch?v=kW8LHWlJPTU

The following changes since commit b7b275e60bcd5f89771e865a8239325f86d9927d:

  Linux 6.1-rc7 (2022-11-27 13:31:48 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux.git gpio-fixes-for-v6.1-rc8-take-2

for you to fetch changes up to 450571883735e9a7c3b38691225531d54773e9a2:

  gpiolib: protect the GPIO device against being dropped while in use by user-space (2022-12-02 17:01:37 +0100)

----------------------------------------------------------------
gpio fixes for v6.1-rc8

- fix a memory leak in gpiochip_setup_dev() in core gpiolib
- fix a reference leak in gpio-amd8111
- fix a problem with user-space being able to trigger a NULL-pointer
  dereference ovet the GPIO character device

----------------------------------------------------------------
Bartosz Golaszewski (2):
      gpiolib: cdev: fix NULL-pointer dereferences
      gpiolib: protect the GPIO device against being dropped while in use by user-space

Xiongfeng Wang (1):
      gpio: amd8111: Fix PCI device reference count leak

Zeng Heng (1):
      gpiolib: fix memory leak in gpiochip_setup_dev()

 drivers/gpio/gpio-amd8111.c |   4 +
 drivers/gpio/gpiolib-cdev.c | 207 ++++++++++++++++++++++++++++++++++++++------
 drivers/gpio/gpiolib.c      |  46 ++++++----
 drivers/gpio/gpiolib.h      |   5 ++
 4 files changed, 221 insertions(+), 41 deletions(-)
  

Comments

Linus Torvalds Dec. 4, 2022, 8:17 p.m. UTC | #1
On Sat, Dec 3, 2022 at 7:05 AM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
>
> Here's a fixed PR from the GPIO subsystem for the next rc.

No, this cannot be right.

That last commit seems *very* dubious, and in particular all those

        if (!down_read_trylock(&gdev->sem))
                return EPOLLHUP | EPOLLERR;

are a sign that something is very very wrong there.

Either the lock is necessary or it isn't, and "trylock" isn't the way
to deal with it, with random failures if you cannot take the lock.

If you are using "trylock" because the data structure might go away
from under you, you have already lost, and the code is buggy.

And if the data structure cannot go away from under  you,  you should
do an unconditional lock, and then check "gdev->chip" for being NULL
once you have gotten the lock (the same way you did in open()).

But a "trylock and return error if it failed" just means that now you
are randomly returning errors to user space, which is entirely
undebuggable and makes no sense.

Or, alternatively, the trylock succeeds - because it hits fully
*after* gpiochip_remove() has finished, and now ->chip is NULL anyway,
which is what you claim to protect against.

End result: "trylock" can never be right in this kind of context.

That "call_locked() helper might make sense more along the lines of

        ret = -ENODEV;

        down_read(&gdev->sem))
        // Does the device still exist?
        if (gdev->chip)
                ret = func(file, cmd, arg);
        up_read(&gdev->sem);

        return ret;

or similar. Not with that odd "try to lock, and if that fails, assume error".

And again - if the trylock is there because 'gdev' itself might go
away at any time and you can't afford to wait on the lock, then it's
broken regardless (and the above suggestion won't help either)

Anyway: the end result of this all is that I think this is a
fundamental bug in the gpio layer, and rc7 (soon to be rc8) is too
late to try these kinds of unfinished games.

Fix it properly for 6.2, and make it back-portable, because I'm not
pulling something like this right now.

               Linus
  
Linus Torvalds Dec. 4, 2022, 8:30 p.m. UTC | #2
On Sun, Dec 4, 2022 at 12:17 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> And again - if the trylock is there because 'gdev' itself might go
> away at any time and you can't afford to wait on the lock, then it's
> broken regardless (and the above suggestion won't help either)

.. another reason I can see is that you are holding other locks, and
the trylock is either for deadlock avoidance or because the other
locks are spinning locks and you cannot sleep.

But if that's the case, then the trylock is basically a hacky
workaround for broken locking, together with a "I know the only reason
it fails is because we've already entered the shutdown phase".

Again, that kind of hacky thing is not for this late in the rc game.

It might be an acceptable workaround for backporting if it has *huge*
comments about why it's done that way, but it's not acceptable as some
kind of fix without that kind of documentation for why it's done that
hacky way rather than with proper locking.

               Linus
  
Bartosz Golaszewski Dec. 4, 2022, 8:46 p.m. UTC | #3
On Sun, Dec 4, 2022 at 9:17 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Sat, Dec 3, 2022 at 7:05 AM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
> >
> > Here's a fixed PR from the GPIO subsystem for the next rc.
>
> No, this cannot be right.
>
> That last commit seems *very* dubious, and in particular all those
>
>         if (!down_read_trylock(&gdev->sem))
>                 return EPOLLHUP | EPOLLERR;
>
> are a sign that something is very very wrong there.
>
> Either the lock is necessary or it isn't, and "trylock" isn't the way
> to deal with it, with random failures if you cannot take the lock.
>
> If you are using "trylock" because the data structure might go away
> from under you, you have already lost, and the code is buggy.
>
> And if the data structure cannot go away from under  you,  you should
> do an unconditional lock, and then check "gdev->chip" for being NULL
> once you have gotten the lock (the same way you did in open()).
>

No, the data can't be removed with these locks in place. It's just to
avoid going into the callbacks if gpiochip_remove() is already in
progress (the only reason why trylock would fail).

> But a "trylock and return error if it failed" just means that now you
> are randomly returning errors to user space, which is entirely
> undebuggable and makes no sense.
>

Technically these are the same errors we'll return later if gdev->chip
is NULL but I get your point.

> Or, alternatively, the trylock succeeds - because it hits fully
> *after* gpiochip_remove() has finished, and now ->chip is NULL anyway,
> which is what you claim to protect against.
>
> End result: "trylock" can never be right in this kind of context.
>
> That "call_locked() helper might make sense more along the lines of
>
>         ret = -ENODEV;
>
>         down_read(&gdev->sem))
>         // Does the device still exist?
>         if (gdev->chip)
>                 ret = func(file, cmd, arg);
>         up_read(&gdev->sem);
>
>         return ret;
>

This is a good suggestion, thanks. And with it, the two patches can
get squashed into one for easier backporting.

> or similar. Not with that odd "try to lock, and if that fails, assume error".
>
> And again - if the trylock is there because 'gdev' itself might go
> away at any time and you can't afford to wait on the lock, then it's
> broken regardless (and the above suggestion won't help either)
>
> Anyway: the end result of this all is that I think this is a
> fundamental bug in the gpio layer, and rc7 (soon to be rc8) is too
> late to try these kinds of unfinished games.
>
> Fix it properly for 6.2, and make it back-portable, because I'm not
> pulling something like this right now.
>
>                Linus

Will do. I will still resend the PR with only the resource leak fixes.

Bartosz
  
Linus Torvalds Dec. 4, 2022, 9:24 p.m. UTC | #4
On Sun, Dec 4, 2022 at 12:47 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
>
> No, the data can't be removed with these locks in place. It's just to
> avoid going into the callbacks if gpiochip_remove() is already in
> progress (the only reason why trylock would fail).

That "the only reason why trylock would fail" may be true in practice,
but it's really just an implementation detail.

Other issues *could* make a trylock fail.

For example, assuming the trylock is just implemented as a non-looping
"cmpxchg" (which may or may not be the case), even another reader
coming in and racing with a trylock could make the cmpxchg fail.

That "do one single cmpxchg" is what the spinlock trylock code does,
for example.

Now, that's not actually what we do for the down_read_trylock() case -
we will actually loop over it - but locking is complicated and you
absolutely should not make assumptions about the exact implementation
details.

And even with the loop, if you ever have *any* other reason to get the
write-lock (or even just do a "try_write", suddenly the details of
when the trylock fails changes entirely.

So that's why I really don't want some random driver layer to depend
on some semantics for trylock that aren't actually guaranteed in the
bigger picture.

The only thing "trylock" says is "I tried to get the lock". It really
could easily fail for random reasons that aren't about actual writers.

For example, even aside from any "do a single cmpxchg" issue: a
sleeping lock could be implemented using a spinlock to protect the
"real" locking data. That kind of implementation is particularly
likely for debugging.

And then, in order to be usable in a recursive environment, a
"trylock" would quite likely be implemented without spinning on that
spinlock, even before it gets to the actual lock internal
implementation. So just contention on the spinlock - by other readers,
not writers - could make the trylock fail.

See?

             Linus