[0/8] can: usb: remove all usb_set_intfdata(intf, NULL) in drivers' disconnect()

Message ID 20221203133159.94414-1-mailhol.vincent@wanadoo.fr
Series can: usb: remove all usb_set_intfdata(intf, NULL) in drivers' disconnect()

Message

Vincent Mailhol Dec. 3, 2022, 1:31 p.m. UTC
The core sets the usb_interface to NULL in [1]. Also setting it to
NULL in usb_driver::disconnect() is at best useless, at worst risky.

Indeed, if a driver sets the usb interface to NULL before all actions
relying on the interface-data pointer complete, there is a risk of
NULL pointer dereference. Typically, this is the case if there are
outstanding URBs which have not yet completed when entering
disconnect().

If all actions are already completed, doing usb_set_intfdata(intf,
NULL) is useless because the core does it at [1].

The first seven patches fix all drivers which set their usb_interface
to NULL while outstanding URBs might still exist. There is one patch
per driver in order to add the relevant "Fixes:" tag to each of them.

The last patch removes in bulk the remaining benign calls to
usb_set_intfdata(intf, NULL) in etas_es58x and peak_usb.

N.B. some other usb drivers outside of the can tree also have the same
issue, but this is out of scope of this series.

[1] function usb_unbind_interface() from drivers/usb/core/driver.c
Link: https://elixir.bootlin.com/linux/v6.0/source/drivers/usb/core/driver.c#L497
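
A minimal userspace sketch of the ordering referred to in [1], assuming
the core clears the interface data right after the driver's disconnect()
returns. The struct device and struct usb_interface types and the two
accessors below are simplified stand-ins written for illustration, not
the kernel definitions; struct model_driver, struct toy_priv,
toy_disconnect() and model_unbind_interface() are hypothetical names.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types (assumption: only the
 * driver-data pointer matters for this illustration). */
struct device {
	void *driver_data;
};

struct usb_interface {
	struct device dev;
};

/* Same shape as the kernel accessors, reimplemented for userspace. */
static void *usb_get_intfdata(struct usb_interface *intf)
{
	return intf->dev.driver_data;
}

static void usb_set_intfdata(struct usb_interface *intf, void *data)
{
	intf->dev.driver_data = data;
}

/* Hypothetical driver model: only the disconnect() callback. */
struct model_driver {
	void (*disconnect)(struct usb_interface *intf);
};

/* Hypothetical private data of a toy driver. */
struct toy_priv {
	int disconnected;
};

/* The toy disconnect() does its teardown and does NOT clear intfdata. */
static void toy_disconnect(struct usb_interface *intf)
{
	struct toy_priv *priv = usb_get_intfdata(intf);

	priv->disconnected = 1;
}

/* Models the ordering in usb_unbind_interface(): call the driver's
 * disconnect() first, then clear the interface data. */
static void model_unbind_interface(struct usb_interface *intf,
				   const struct model_driver *drv)
{
	drv->disconnect(intf);
	usb_set_intfdata(intf, NULL);
}
```

With this ordering, the data pointer is NULL after unbind whether or not
the driver's own disconnect() cleared it.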

Vincent Mailhol (8):
  can: ems_usb: ems_usb_disconnect(): fix NULL pointer dereference
  can: esd_usb: esd_usb_disconnect(): fix NULL pointer dereference
  can: gs_usb: gs_usb_disconnect(): fix NULL pointer dereference
  can: kvaser_usb: kvaser_usb_disconnect(): fix NULL pointer dereference
  can: mcba_usb: mcba_usb_disconnect(): fix NULL pointer dereference
  can: ucan: ucan_disconnect(): fix NULL pointer dereference
  can: usb_8dev: usb_8dev_disconnect(): fix NULL pointer dereference
  can: etas_es58x and peak_usb: remove useless call to
    usb_set_intfdata()

 drivers/net/can/usb/ems_usb.c                    | 2 --
 drivers/net/can/usb/esd_usb.c                    | 2 --
 drivers/net/can/usb/etas_es58x/es58x_core.c      | 1 -
 drivers/net/can/usb/gs_usb.c                     | 2 --
 drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c | 2 --
 drivers/net/can/usb/mcba_usb.c                   | 2 --
 drivers/net/can/usb/peak_usb/pcan_usb_core.c     | 2 --
 drivers/net/can/usb/ucan.c                       | 2 --
 drivers/net/can/usb/usb_8dev.c                   | 2 --
 9 files changed, 17 deletions(-)
  

Comments

Oliver Neukum Dec. 5, 2022, 8:35 a.m. UTC | #1
On 03.12.22 14:31, Vincent Mailhol wrote:
> The core sets the usb_interface to NULL in [1]. Also setting it to
> NULL in usb_driver::disconnect() is at best useless, at worst risky.

Hi,

I am afraid there is a major issue with your series of patches.
The drivers you are removing this from often have a subsequent check
for the data they got from usb_get_intfdata() being NULL.

That pattern is taken from drivers like btusb or CDC-ACM, which
claim secondary interfaces disconnect() will be called a second time
for.
In addition, a driver can use setting intfdata to NULL as a flag
for disconnect() having proceeded to a point where certain things
can no longer be safely done. You need to check for that in every driver
you remove this code from and if you decide that it can safely be removed,
which is likely, then please also remove checks like this:

	struct ems_usb *dev = usb_get_intfdata(intf);

	usb_set_intfdata(intf, NULL);

	if (dev) {
		unregister_netdev(dev->netdev);

Either it can be called a second time, then you need to leave it
as is, or the check for NULL is superfluous. But only removing setting
the pointer to NULL never makes sense.

	Regards
		Oliver
  
Vincent Mailhol Dec. 8, 2022, 9 a.m. UTC | #2
On Mon. 5 Dec. 2022 at 17:39, Oliver Neukum <oneukum@suse.com> wrote:
> On 03.12.22 14:31, Vincent Mailhol wrote:
> > The core sets the usb_interface to NULL in [1]. Also setting it to
> > NULL in usb_driver::disconnect() is at best useless, at worst risky.
>
> Hi,
>
> I am afraid there is a major issue with your series of patches.
> The drivers you are removing this from often have a subsequent check
> for the data they got from usb_get_intfdata() being NULL.

ACK, but I do not see the connection.

> That pattern is taken from drivers like btusb or CDC-ACM

Where does CDC-ACM set *its* interface to NULL? Looking at:

  https://elixir.bootlin.com/linux/v6.0/source/drivers/usb/class/cdc-acm.c#L1531

I can see that cdc-acm sets acm->control and acm->data to NULL in its
disconnect(), but it doesn't set its own usb_interface to NULL.

> which claim secondary interfaces disconnect() will be called a second time
> for.

Are you saying that the disconnect() of those CAN USB drivers is being
called twice? I do not see this in the source code. The only caller of
usb_driver::disconnect() I can see is:

  https://elixir.bootlin.com/linux/v6.0/source/drivers/usb/core/driver.c#L458

> In addition, a driver can use setting intfdata to NULL as a flag
> for disconnect() having proceeded to a point where certain things
> can no longer be safely done.

Any reference that a driver can do that? This pattern seems racy.

By the way, I did check all the drivers:

  * ems_usb: intf is only used in ems_usb_probe() and
    ems_usb_disconnect() functions.

  * esd_usb: intf is only used in the esd_usb_probe(),
    esd_usb_probe_one_net() (which is part of probing),
    esd_usb_disconnect() and a couple of sysfs functions (which only
    use intf to get a pointer to struct esd_usb).

  * gs_usb: intf is used several times but only to retrieve struct
    usb_device. This seems useless, I sent this patch to remove
    it:
    https://lore.kernel.org/linux-can/20221208081142.16936-3-mailhol.vincent@wanadoo.fr/
    Aside from that, intf is only used in gs_usb_probe(),
    gs_make_candev() (which is part of probing) and
    gs_usb_disconnect() functions.

  * kvaser_usb: intf is only used in kvaser_usb_probe() and
    kvaser_usb_disconnect() functions.

  * mcba_usb: intf is only used in mcba_usb_probe() and
    mcba_usb_disconnect() functions.

  * ucan: intf is only used in ucan_probe() and
    ucan_disconnect(). struct ucan_priv also has a pointer to intf but
    it is never used. I sent this patch to remove it:
    https://lore.kernel.org/linux-can/20221208081142.16936-2-mailhol.vincent@wanadoo.fr/

  * usb_8dev: intf is only used in usb_8dev_probe() and
    usb_8dev_disconnect().

With no significant use of intf outside of the probe() and
disconnect(), there is definitely no such "use intf as a flag" in any
of these drivers.

> You need to check for that in every driver
> you remove this code from and if you decide that it can safely be removed,

What makes you assume that I didn't check this in the first place? Or
do you see something I missed?

> which is likely, then please also remove checks like this:
>
>         struct ems_usb *dev = usb_get_intfdata(intf);
>
>         usb_set_intfdata(intf, NULL);
>
>         if (dev) {
>                 unregister_netdev(dev->netdev);

How is the if (dev) check related? There is no correlation between
setting intf to NULL and dev not being NULL.

I think dev is never NULL, but I did not verify that dev can never be NULL.

> Either it can be called a second time, then you need to leave it
> as is,

Really?! The first thing disconnect() does is to call
usb_get_intfdata(intf) which dereferences intf without checking if it
is NULL, cf.:

  https://elixir.bootlin.com/linux/v6.0/source/include/linux/usb.h#L265

Then it sets intf to NULL.

The second time you call disconnect(), the usb_get_intfdata(intf)
would be a NULL pointer dereference.

> or the check for NULL is superfluous. But only removing setting
> the pointer to NULL never makes sense.


Yours sincerely,
Vincent Mailhol
  
Oliver Neukum Dec. 8, 2022, 10:55 a.m. UTC | #3
On 08.12.22 10:00, Vincent MAILHOL wrote:
> On Mon. 5 Dec. 2022 at 17:39, Oliver Neukum <oneukum@suse.com> wrote:
>> On 03.12.22 14:31, Vincent Mailhol wrote:

Good Morning!

> ACK, but I do not see the connection.
Well, useless checks are bad. In particular, we should always
make it clear whether a pointer may or may not be NULL.
That is, I have no problem with what you were trying to do
with your patch set. It is a good idea and possibly slightly
overdue. The problem is the method.

> I can see that cdc-acm sets acm->control and acm->data to NULL in its
> disconnect(), but it doesn't set its own usb_interface to NULL.

You don't have to, but you can. I was explaining the two patterns for doing so.

>> which claim secondary interfaces disconnect() will be called a second time
>> for.
> 
> Are you saying that the disconnect() of those CAN USB drivers is being
> called twice? I do not see this in the source code. The only caller of
> usb_driver::disconnect() I can see is:
> 
>    https://elixir.bootlin.com/linux/v6.0/source/drivers/usb/core/driver.c#L458

If they use usb_claim_interface(), yes it is called twice. Once per
interface. That is in the case of ACM once for the originally probed
interface and a second time for the claimed interface.
But not necessarily in that order, as you can be kicked off an interface
via sysfs. Yet you need to cease operations as soon as you are disconnected
from any interface. That is annoying because it means you cannot use a
refcount. From that stems the widespread use of intfdata as a flag.

>> In addition, a driver can use setting intfdata to NULL as a flag
>> for disconnect() having proceeded to a point where certain things
>> can no longer be safely done.
> 
> Any reference that a driver can do that? This pattern seems racy.

Technically that is exactly what drivers that use usb_claim_interface()
do. You free everything at the first call and use intfdata as a flag
to prevent a double free.
The race is prevented by usbcore locking, which guarantees that probe()
and disconnect() have mutual exclusion.
If you use intfdata in sysfs, yes additional locking is needed.

> What makes you assume that I didn't check this in the first place? Or
> do you see something I missed?

That you did not put it into the changelogs.
That reads like the drivers are doing something obsolete or stupid.
They do not. They copied something that is necessary only under
some circumstances.

And that you did not remove the checks.

>> which is likely, then please also remove checks like this:
>>
>>          struct ems_usb *dev = usb_get_intfdata(intf);
>>
>>          usb_set_intfdata(intf, NULL);
>>
>>          if (dev) {

Here. If you have a driver that uses usb_claim_interface().
You need this check or you unregister an already unregistered
netdev.

The way this disconnect() method is coded is extremely defensive.
Most drivers do not need this check. But it is never
wrong in the strict sense.

Hence doing a mass removal with a change log that does
not say that this driver is using only a single interface
hence the check can be dropped to reduce code size
is not good.

	Regards
		Oliver
  
Vincent Mailhol Dec. 8, 2022, 3:44 p.m. UTC | #4
On Thu. 8 Dec. 2022 at 20:04, Oliver Neukum <oneukum@suse.com> wrote:
> On 08.12.22 10:00, Vincent MAILHOL wrote:
> > On Mon. 5 Dec. 2022 at 17:39, Oliver Neukum <oneukum@suse.com> wrote:
> >> On 03.12.22 14:31, Vincent Mailhol wrote:
>
> Good Morning!

Good night! (different time zone :))

> > ACK, but I do not see the connection.
> Well, useless checks are bad. In particular, we should always
> make it clear whether a pointer may or may not be NULL.
> That is, I have no problem with what you were trying to do
> with your patch set. It is a good idea and possibly slightly
> overdue. The problem is the method.
>
> > I can see that cdc-acm sets acm->control and acm->data to NULL in its
> > disconnect(), but it doesn't set its own usb_interface to NULL.
>
> You don't have to, but you can. I was explaining the two patterns for doing so.
>
> >> which claim secondary interfaces disconnect() will be called a second time
> >> for.
> >
> > Are you saying that the disconnect() of those CAN USB drivers is being
> > called twice? I do not see this in the source code. The only caller of
> > usb_driver::disconnect() I can see is:
> >
> >    https://elixir.bootlin.com/linux/v6.0/source/drivers/usb/core/driver.c#L458
>
> If they use usb_claim_interface(), yes it is called twice. Once per
> interface. That is in the case of ACM once for the originally probed
> interface and a second time for the claimed interface.
> But not necessarily in that order, as you can be kicked off an interface
> via sysfs. Yet you need to cease operations as soon as you are disconnected
> from any interface. That is annoying because it means you cannot use a
> refcount. From that stems the widespread use of intfdata as a flag.

Thank you for the details! I better understand this part now.

> >> In addition, a driver can use setting intfdata to NULL as a flag
> >> for disconnect() having proceeded to a point where certain things
> >> can no longer be safely done.
> >
> > Any reference that a driver can do that? This pattern seems racy.
>
> Technically that is exactly what drivers that use usb_claim_interface()
> do. You free everything at the first call and use intfdata as a flag
> to prevent a double free.
> The race is prevented by usbcore locking, which guarantees that probe()
> and disconnect() have mutual exclusion.
> If you use intfdata in sysfs, yes additional locking is needed.

ACK for the mutual exclusion. My question was about what you said in
your previous message:

| In addition, a driver can use setting intfdata to NULL as a flag
| for *disconnect() having proceeded to a point* where certain things
| can no longer be safely done.

How do you check that disconnect() has proceeded *to a given point*
using intf without being racy? You can check if it has already
completed once but not check how far it has proceeded, right?

> > What makes you assume that I didn't check this in the first place? Or
> > do you see something I missed?
>
> That you did not put it into the changelogs.
> That reads like the drivers are doing something obsolete or stupid.
> They do not. They copied something that is necessary only under
> some circumstances.
>
> And that you did not remove the checks.
>
> >> which is likely, then please also remove checks like this:
> >>
> >>          struct ems_usb *dev = usb_get_intfdata(intf);
> >>
> >>          usb_set_intfdata(intf, NULL);
> >>
> >>          if (dev) {
>
> Here. If you have a driver that uses usb_claim_interface().
> You need this check or you unregister an already unregistered
> netdev.

Sorry, but with all my best intentions, I still do not get it. During
the second iteration, intf is NULL and:

        /* equivalent to dev = intf->dev.data. Because intf is NULL,
         * this is a NULL pointer dereference */
        struct ems_usb *dev = usb_get_intfdata(intf);

        /* OK, intf is already NULL */
        usb_set_intfdata(intf, NULL);

        /* follows a NULL pointer dereference so this is undefined
         * behaviour */
       if (dev) {

How is this a valid check that you entered the function for the second
time? If intf is the flag, you should check intf, not dev? Something
like this:

        struct ems_usb *dev;

        if (!intf)
                return;

        dev = usb_get_intfdata(intf);
        /* ... */

I just can not see the connection between intf being NULL and the if
(dev) check. All I see is some undefined behaviour, sorry.

> The way this disconnect() method is coded is extremely defensive.
> Most drivers do not need this check. But it is never
> wrong in the strict sense.
>
> Hence doing a mass removal with a change log that does
> not say that this driver is using only a single interface
> hence the check can be dropped to reduce code size
> is not good.
>
>         Regards
>                 Oliver
  
Alan Stern Dec. 8, 2022, 4:28 p.m. UTC | #5
On Fri, Dec 09, 2022 at 12:44:51AM +0900, Vincent MAILHOL wrote:
> On Thu. 8 Dec. 2022 at 20:04, Oliver Neukum <oneukum@suse.com> wrote:

> > >> which is likely, then please also remove checks like this:
> > >>
> > >>          struct ems_usb *dev = usb_get_intfdata(intf);
> > >>
> > >>          usb_set_intfdata(intf, NULL);
> > >>
> > >>          if (dev) {
> >
> > Here. If you have a driver that uses usb_claim_interface().
> > You need this check or you unregister an already unregistered
> > netdev.
> 
> Sorry, but with all my best intentions, I still do not get it. During
> the second iteration, intf is NULL and:

No, intf is never NULL.  Rather, the driver-specific pointer stored in 
intfdata may be NULL.

You seem to be confusing intf with intfdata(intf).

>         /* equivalent to dev = intf->dev.data. Because intf is NULL,
>          * this is a NULL pointer dereference */
>         struct ems_usb *dev = usb_get_intfdata(intf);

So here dev will be NULL when the second interface's disconnect routine 
runs, because the first time through the routine sets the intfdata to 
NULL for both interfaces:

	USB core calls ->disconnect(intf1)

		disconnect routine sets intfdata(intf1) and 
		intfdata(intf2) both to NULL and handles the
		disconnection

	USB core calls ->disconnect(intf2)

		disconnect routine sees that intfdata(intf2) is
		already NULL, so it knows that it doesn't need
		to do anything more.

As you can see in this scenario, neither intf1 nor intf2 is ever NULL.
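
The two-interface scenario above can be modelled in userspace as
follows. This is only a sketch: the types and accessors are simplified
stand-ins rather than the kernel definitions, and struct toy_dev,
toy_disconnect() and the teardown_count field are hypothetical names
introduced for illustration. In the kernel, a driver would typically
bind the second interface with usb_driver_claim_interface().

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types (assumption: only the
 * driver-data pointer matters for this illustration). */
struct device {
	void *driver_data;
};

struct usb_interface {
	struct device dev;
};

/* Same shape as the kernel accessors, reimplemented for userspace. */
static void *usb_get_intfdata(struct usb_interface *intf)
{
	return intf->dev.driver_data;
}

static void usb_set_intfdata(struct usb_interface *intf, void *data)
{
	intf->dev.driver_data = data;
}

/* Hypothetical driver state shared by the two interfaces it is bound to. */
struct toy_dev {
	struct usb_interface *intf1;
	struct usb_interface *intf2;
	int teardown_count;	/* how many times the teardown really ran */
};

static void toy_disconnect(struct usb_interface *intf)
{
	struct toy_dev *dev = usb_get_intfdata(intf);

	/* intf itself is always valid; only its data pointer may be NULL,
	 * which means an earlier call already handled the disconnection. */
	if (!dev)
		return;

	/* First call: flag both interfaces as handled, then tear down once. */
	usb_set_intfdata(dev->intf1, NULL);
	usb_set_intfdata(dev->intf2, NULL);
	dev->teardown_count++;
}
```

Calling toy_disconnect() once per interface, in either order, runs the
teardown a single time; the second call returns early because the data
pointer, not intf, is NULL.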

>         /* OK, intf is already NULL */
>         usb_set_intfdata(intf, NULL);
> 
>         /* follows a NULL pointer dereference so this is undefined
>          * behaviour */
>        if (dev) {
> 
> How is this a valid check that you entered the function for the second
> time? If intf is the flag, you should check intf, not dev? Something
> like this:

intf is not a flag; it is the argument to the function and is never 
NULL.  The flag is the intfdata.

>         struct ems_usb *dev;
> 
>         if (!intf)
>                 return;
> 
>         dev = usb_get_intfdata(intf);
>         /* ... */
> 
> I just can not see the connection between intf being NULL and the if
> (dev) check. All I see is some undefined behaviour, sorry.

Once you get it straightened out in your head, you will understand.

Alan Stern
  
Oliver Neukum Dec. 8, 2022, 4:51 p.m. UTC | #6
On 08.12.22 16:44, Vincent MAILHOL wrote:
> On Thu. 8 Dec. 2022 at 20:04, Oliver Neukum <oneukum@suse.com> wrote:
>> On 08.12.22 10:00, Vincent MAILHOL wrote:
>>> On Mon. 5 Dec. 2022 at 17:39, Oliver Neukum <oneukum@suse.com> wrote:
>>>> On 03.12.22 14:31, Vincent Mailhol wrote:
>>
>> Good Morning!
> 
> Good night! (different time zone :))

Good evening!

> 
> How do you check that disconnect() has proceeded *to a given point*
> using intf without being racy? You can check if it has already
> completed once but not check how far it has proceeded, right?

You'd use intfdata, which is a pointer stored in intf.

But other than that the simplest way would be to use a mutex.


	Regards
		Oliver
  
Vincent Mailhol Dec. 10, 2022, 9:02 a.m. UTC | #7
Hi,

Thanks Alan and Oliver for your patience, really appreciated. And
sorry that it took me four messages to realize my mistake.

I will send a v2 right now.