platform: Provide a remove callback that returns no value

Message ID 20221209150914.3557650-1-u.kleine-koenig@pengutronix.de
State New
Series platform: Provide a remove callback that returns no value

Commit Message

Uwe Kleine-König Dec. 9, 2022, 3:09 p.m. UTC
struct platform_driver::remove returning an integer made driver authors
expect that returning an error code was proper error handling. However,
the driver core ignores the error and continues to remove the device,
because there is nothing the core could do anyhow and calling the
remove callback a second time would only invite trouble.

So this is a source of errors, typically yielding resource leaks in the
error path.
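
The failure mode described above can be illustrated with a small userspace
sketch. Everything here is a stand-in and an assumption for illustration:
the two structs are cut-down mocks rather than the kernel definitions, and
mock_platform_remove(), leaky_remove(), thorough_remove() and
leaks_after_remove() are hypothetical names. The mock core only reports a
non-zero return value and then carries on, which is the behaviour the
commit message describes.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace stand-ins; these are NOT the kernel definitions. */
struct platform_device {
	void *buf;	/* resource acquired at probe time */
	int hw_busy;	/* pretend the hardware refuses to stop */
};

struct platform_driver {
	int (*remove)(struct platform_device *);
};

/* Mock core: report a non-zero return value, then carry on regardless. */
void mock_platform_remove(struct platform_driver *drv,
			  struct platform_device *dev)
{
	assert(drv && dev);

	if (drv->remove) {
		int ret = drv->remove(dev);

		if (ret)
			fprintf(stderr, "remove returned %d, ignored\n", ret);
	}
	/* From here on the device counts as removed. */
}

/* Anti-pattern: bail out early and expect the core to cope. */
int leaky_remove(struct platform_device *dev)
{
	if (dev->hw_busy)
		return -EBUSY;	/* dev->buf is never freed */

	free(dev->buf);
	dev->buf = NULL;
	return 0;
}

/* Safer: note the problem, but release everything anyway. */
int thorough_remove(struct platform_device *dev)
{
	if (dev->hw_busy)
		fprintf(stderr, "hardware busy, cleaning up anyway\n");

	free(dev->buf);
	dev->buf = NULL;
	return 0;
}

/* Returns 1 if the buffer is still allocated after removal, else 0. */
int leaks_after_remove(int (*remove)(struct platform_device *))
{
	struct platform_device dev = { .buf = malloc(16), .hw_busy = 1 };
	struct platform_driver drv = { .remove = remove };
	int leaked;

	if (!dev.buf)
		return -1;	/* allocation failed, nothing to show */

	mock_platform_remove(&drv, &dev);
	leaked = dev.buf != NULL;
	free(dev.buf);		/* keep the demo itself leak-free */
	return leaked;
}
```

Calling leaks_after_remove(leaky_remove) reports a leak, while
leaks_after_remove(thorough_remove) does not, although the mock core
behaves identically in both cases.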

As there are too many platform drivers to neatly convert them all to
return void in a single go, do it in several steps after this patch:

 a) Convert all drivers to implement .remove_new() returning void instead
    of .remove() returning int;
 b) Change struct platform_driver::remove() to return void and so make
    it identical to .remove_new();
 c) Change all drivers back to .remove() now with the better prototype;
 d) drop struct platform_driver::remove_new().

While this eventually touches every driver twice, steps a) and c) can be
done one driver after another, which reduces coordination effort
immensely and simplifies review.
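
As a rough userspace sketch of the transition period between steps a) and
b): an unconverted driver and a converted one can coexist, because the
core prefers .remove_new() and falls back to .remove(). The structs below
are simplified mocks, and the foo_* names, mock_platform_remove() and
removed_by() are hypothetical; only the two callback members and the
dispatch order are modeled on the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified mocks, not the kernel definitions. */
struct platform_device {
	int removed;
};

struct platform_driver {
	int (*remove)(struct platform_device *);	/* legacy */
	void (*remove_new)(struct platform_device *);	/* transitional */
};

/* Dispatch modeled on the patch: prefer .remove_new(). */
void mock_platform_remove(struct platform_driver *drv,
			  struct platform_device *dev)
{
	assert(drv && dev);

	if (drv->remove_new) {
		drv->remove_new(dev);
	} else if (drv->remove) {
		int ret = drv->remove(dev);

		(void)ret;	/* the return value changes nothing */
	}
}

/* Before step a): the callback returns int. */
int foo_remove_old(struct platform_device *dev)
{
	dev->removed = 1;
	return 0;
}

struct platform_driver foo_driver_old = {
	.remove = foo_remove_old,
};

/* After step a): same body, void return, hooked up via .remove_new. */
void foo_remove(struct platform_device *dev)
{
	dev->removed = 1;
}

struct platform_driver foo_driver_converted = {
	.remove_new = foo_remove,
};

/* Runs the mock removal path and reports whether the callback ran. */
int removed_by(struct platform_driver *drv)
{
	struct platform_device dev = { .removed = 0 };

	mock_platform_remove(drv, &dev);
	return dev.removed;
}
```

Both removed_by(&foo_driver_old) and removed_by(&foo_driver_converted)
return 1 in this model, which is what allows drivers to be converted one
at a time.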

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
---
 drivers/base/platform.c         |  4 +++-
 include/linux/platform_device.h | 11 +++++++++++
 2 files changed, 14 insertions(+), 1 deletion(-)
  

Comments

Greg KH Dec. 9, 2022, 3:21 p.m. UTC | #1
On Fri, Dec 09, 2022 at 04:09:14PM +0100, Uwe Kleine-König wrote:
>  a) Convert all drivers to implement .remove_new() returning void instead
>     of .remove() returning int;
>  b) Change struct platform_driver::remove() to return void and so make
>     it identical to .remove_new();
>  c) Change all drivers back to .remove() now with the better prototype;

Change c) seems like it will be just as much work as a), right?

>  d) drop struct platform_driver::remove_new().
> 
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index 968f3d71eeab..a4938d1c8fe1 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -1416,7 +1416,9 @@ static void platform_remove(struct device *_dev)
>  	struct platform_driver *drv = to_platform_driver(_dev->driver);
>  	struct platform_device *dev = to_platform_device(_dev);
>  
> -	if (drv->remove) {
> +	if (drv->remove_new) {
> +		drv->remove_new(dev);
> +	} else if (drv->remove) {
>  		int ret = drv->remove(dev);
>  
>  		if (ret)
> diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
> index b0d5a253156e..b845fd83f429 100644
> --- a/include/linux/platform_device.h
> +++ b/include/linux/platform_device.h
> @@ -207,7 +207,18 @@ extern void platform_device_put(struct platform_device *pdev);
>  
>  struct platform_driver {
>  	int (*probe)(struct platform_device *);
> +
> +	/*
> +	 * Traditionally the remove callback returned an int which however is
> +	 * ignored by the driver core. This led to wrong expectations by driver
> +	 * authors who thought returning an error code was a valid error
> +	 * handling strategy. To convert to a callback returning void, new
> +	 * drivers should implement .remove_new() until the conversion it done
> +	 * that eventually makes .remove() return void.
> +	 */
>  	int (*remove)(struct platform_device *);
> +	void (*remove_new)(struct platform_device *);
> +

Who is going to do the work of the conversion to this new prototype?
I'll be glad to take this, but I don't want to see a half-finished
conversion happen and us stuck with a "new" and "old" call, as that
would just be a mess.

thanks,

greg k-h
  
Uwe Kleine-König Dec. 9, 2022, 3:52 p.m. UTC | #2
Hello Greg,

On Fri, Dec 09, 2022 at 04:21:30PM +0100, Greg Kroah-Hartman wrote:
> On Fri, Dec 09, 2022 at 04:09:14PM +0100, Uwe Kleine-König wrote:
> >  c) Change all drivers back to .remove() now with the better prototype;
> 
> Change c) seems like it will be just as much work as a), right?

Yeah, but c) should be trivially doable per subsystem using coccinelle.
So my plan is to do a) per subsystem with one patch per driver and c)
with one patch per subsystem.

> Who is going to do the work of the conversion to this new prototype?
> I'll be glad to take this, but I don't want to see a half-finished
> conversion happen and us stuck with a "new" and "old" call, as that
> would just be a mess.

The idea is that this becomes my new pet project once 
https://lore.kernel.org/lkml/20221118224540.619276-1-uwe@kleine-koenig.org
is complete. :-)

I intend to work on that once the patch under discussion is included in
an -rc1.

Best regards
Uwe
  
Greg KH Dec. 9, 2022, 4:15 p.m. UTC | #3
On Fri, Dec 09, 2022 at 04:52:07PM +0100, Uwe Kleine-König wrote:
> Hello Greg,
> 
> On Fri, Dec 09, 2022 at 04:21:30PM +0100, Greg Kroah-Hartman wrote:
> > Who is going to do the work of the conversion to this new prototype?
> > I'll be glad to take this, but I don't want to see a half-finished
> > conversion happen and us stuck with a "new" and "old" call, as that
> > would just be a mess.
> 
> The idea is that this becomes my new pet project once 
> https://lore.kernel.org/lkml/20221118224540.619276-1-uwe@kleine-koenig.org
> is complete. :-)
> 
> I intend to work on that once the patch under discussion is included in
> an -rc1.

Ok, I'll wait to queue this up to my tree until after 6.2-rc1 is out,
thanks.

greg k-h
  
Uwe Kleine-König Jan. 12, 2023, 8:20 a.m. UTC | #4
Hello Greg,

On Fri, Dec 09, 2022 at 05:15:42PM +0100, Greg Kroah-Hartman wrote:
> Ok, I'll wait to queue this up to my tree until after 6.2-rc1 is out,
> thanks.

We're at v6.2-rc3 now. Is this patch still in your queue and you just
haven't gotten around to applying it yet, or did it fall through the cracks?

Best regards
Uwe
  
Greg KH Jan. 13, 2023, 11:43 a.m. UTC | #5
On Thu, Jan 12, 2023 at 09:20:29AM +0100, Uwe Kleine-König wrote:
> Hello Greg,
> 
> We're at v6.2-rc3 now. Is this patch still in your queue and you just
> haven't gotten around to applying it yet, or did it fall through the cracks?

My queue is huge right now.

I'll work on this "soon".  Do you want this on a tag that others can
pull into their trees, or just in my normal driver-core-next branch?
Either is fine for me.

thanks,

greg k-h
  
Uwe Kleine-König Jan. 13, 2023, 5:40 p.m. UTC | #6
On Fri, Jan 13, 2023 at 12:43:39PM +0100, Greg Kroah-Hartman wrote:
> My queue is huge right now.
> 
> I'll work on this "soon".  Do you want this on a tag that others can
> pull into their trees, or just in my normal driver-core-next branch?
> Either is fine for me.

In my experience maintainers stumble when patches depend on patches that
are not in -rc1. So I will be patient until this hits an -rc1. Thanks
for the offer.

Best regards
Uwe
  
Greg KH Jan. 17, 2023, 6:05 p.m. UTC | #7
On Fri, Jan 13, 2023 at 06:40:04PM +0100, Uwe Kleine-König wrote:
> In my experience maintainers stumble when patches depend on patches that
> are not in -rc1. So I will be patient until this hits an -rc1. Thanks
> for the offer.

Fair enough, now added to my tree, sorry for the delay.  Feel free to
start flooding me with these types of changes, I'll be glad to take them
through my tree if at all possible.

greg k-h
  

Patch

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 968f3d71eeab..a4938d1c8fe1 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1416,7 +1416,9 @@  static void platform_remove(struct device *_dev)
 	struct platform_driver *drv = to_platform_driver(_dev->driver);
 	struct platform_device *dev = to_platform_device(_dev);
 
-	if (drv->remove) {
+	if (drv->remove_new) {
+		drv->remove_new(dev);
+	} else if (drv->remove) {
 		int ret = drv->remove(dev);
 
 		if (ret)
diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index b0d5a253156e..b845fd83f429 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -207,7 +207,18 @@  extern void platform_device_put(struct platform_device *pdev);
 
 struct platform_driver {
 	int (*probe)(struct platform_device *);
+
+	/*
+	 * Traditionally the remove callback returned an int which however is
+	 * ignored by the driver core. This led to wrong expectations by driver
+	 * authors who thought returning an error code was a valid error
+	 * handling strategy. To convert to a callback returning void, new
+	 * drivers should implement .remove_new() until the conversion it done
+	 * that eventually makes .remove() return void.
+	 */
 	int (*remove)(struct platform_device *);
+	void (*remove_new)(struct platform_device *);
+
 	void (*shutdown)(struct platform_device *);
 	int (*suspend)(struct platform_device *, pm_message_t state);
 	int (*resume)(struct platform_device *);