[02/13] leds: el15203000: Fix devm vs. non-devm ordering

Message ID 1667983694-15040-3-git-send-email-wangyufen@huawei.com
State New
Series leds: Fix devm vs. non-devm ordering

Commit Message

wangyufen Nov. 9, 2022, 8:48 a.m. UTC
  When non-devm resources are allocated they mustn't be followed by devm
allocations, otherwise it will break the tear down ordering and might
lead to crashes or other bugs during ->remove() stage. Fix this by
wrapping mutex_destroy() call with devm_add_action_or_reset().

Fixes: fc19967bcb8f ("leds: add LED driver for EL15203000 board")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Oleh Kravchenko <oleg@kaa.org.ua>
---
 drivers/leds/leds-el15203000.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)
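
The ordering rule in the commit message can be illustrated outside the kernel. The following is a minimal userspace sketch in C, not kernel code: the model_* names, the fixed-size action list and the one-character log are all assumptions made for illustration. It mimics two properties the fix relies on: managed cleanup actions run in reverse order of registration, and an "or_reset" style helper runs the action immediately if registration fails.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model only. These names are hypothetical and are NOT the
 * kernel's devres API; they only mimic its release ordering.
 */
#define MODEL_MAX_ACTIONS 2

typedef void (*model_action_fn)(void *data);

struct model_devres {
	model_action_fn fn[MODEL_MAX_ACTIONS];
	void *data[MODEL_MAX_ACTIONS];
	size_t count;
};

/* Register a cleanup action; returns 0 on success, -1 if the list is full. */
static int model_add_action(struct model_devres *res, model_action_fn fn,
			    void *data)
{
	if (res->count >= MODEL_MAX_ACTIONS)
		return -1;
	res->fn[res->count] = fn;
	res->data[res->count] = data;
	res->count++;
	return 0;
}

/*
 * Like model_add_action(), but if registration fails the action runs
 * immediately, so the caller can simply return the error.
 */
static int model_add_action_or_reset(struct model_devres *res,
				     model_action_fn fn, void *data)
{
	int ret = model_add_action(res, fn, data);

	if (ret)
		fn(data);
	return ret;
}

/* Run all registered actions in reverse order of registration (LIFO). */
static void model_release_all(struct model_devres *res)
{
	while (res->count > 0) {
		res->count--;
		assert(res->fn[res->count] != NULL);
		res->fn[res->count](res->data[res->count]);
	}
}

/* Example action: append the first character of a tag string to a log. */
static char model_log[8];
static size_t model_log_len;

static void model_record(void *data)
{
	const char *tag = data;

	if (model_log_len + 1 < sizeof(model_log)) {
		model_log[model_log_len++] = tag[0];
		model_log[model_log_len] = '\0';
	}
}
```

In this model, registering the mutex cleanup first and the LED cleanup second yields the release order LED, then mutex, which is the order the patch aims for.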
  

Comments

Oleh Kravchenko Nov. 9, 2022, 9:39 a.m. UTC | #1
Hello all!

> On Nov 9, 2022, at 10:48, Wang Yufen <wangyufen@huawei.com> wrote:
> 
> When non-devm resources are allocated they mustn't be followed by devm
> allocations, otherwise it will break the tear down ordering and might
> lead to crashes or other bugs during ->remove() stage. Fix this by
> wrapping mutex_destroy() call with devm_add_action_or_reset().
> 
> Fixes: fc19967bcb8f ("leds: add LED driver for EL15203000 board")
> Signed-off-by: Wang Yufen <wangyufen@huawei.com>
> Cc: Oleh Kravchenko <oleg@kaa.org.ua>
> ---
> drivers/leds/leds-el15203000.c | 18 +++++++++++-------
> 1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/leds/leds-el15203000.c b/drivers/leds/leds-el15203000.c
> index 7e7b617..9be934e 100644
> --- a/drivers/leds/leds-el15203000.c
> +++ b/drivers/leds/leds-el15203000.c
> @@ -287,10 +287,16 @@ static int el15203000_probe_dt(struct el15203000 *priv)
> return ret;
> }
> 
> +static void el15203000_mutex_destroy(void *lock)
> +{
> + mutex_destroy(lock);
> +}
> +
> static int el15203000_probe(struct spi_device *spi)
> {
> struct el15203000 *priv;
> size_t count;
> + int ret;
> 
> count = device_get_child_node_count(&spi->dev);
> if (!count) {
> @@ -312,15 +318,14 @@ static int el15203000_probe(struct spi_device *spi)
> 
> spi_set_drvdata(spi, priv);
> 
> + ret = devm_add_action_or_reset(&spi->dev, el15203000_mutex_destroy,
> +       &priv->lock);
> + if (ret)
> + return ret;
> +
> return el15203000_probe_dt(priv);
> }
> 
> -static void el15203000_remove(struct spi_device *spi)

Is the remove() callback in struct spi_driver deprecated?

> -{
> - struct el15203000 *priv = spi_get_drvdata(spi);
> -
> - mutex_destroy(&priv->lock);
> -}
> 
> static const struct of_device_id el15203000_dt_ids[] = {
> { .compatible = "crane,el15203000", },
> @@ -331,7 +336,6 @@ static void el15203000_remove(struct spi_device *spi)
> 
> static struct spi_driver el15203000_driver = {
> .probe = el15203000_probe,
> - .remove = el15203000_remove,
> .driver = {
> .name = KBUILD_MODNAME,
> .of_match_table = el15203000_dt_ids,
> -- 
> 1.8.3.1
>
  
wangyufen Nov. 9, 2022, 10:25 a.m. UTC | #2
On 2022/11/9 17:39, Oleh Kravchenko wrote:
> Hello all!
>
>> On Nov 9, 2022, at 10:48, Wang Yufen <wangyufen@huawei.com> wrote:
>>
>> When non-devm resources are allocated they mustn't be followed by devm
>> allocations, otherwise it will break the tear down ordering and might
>> lead to crashes or other bugs during ->remove() stage. Fix this by
>> wrapping mutex_destroy() call with devm_add_action_or_reset().
>>
>> Fixes: fc19967bcb8f ("leds: add LED driver for EL15203000 board")
>> Signed-off-by: Wang Yufen <wangyufen@huawei.com>
>> Cc: Oleh Kravchenko <oleg@kaa.org.ua>
>> ---
>> drivers/leds/leds-el15203000.c | 18 +++++++++++-------
>> 1 file changed, 11 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/leds/leds-el15203000.c b/drivers/leds/leds-el15203000.c
>> index 7e7b617..9be934e 100644
>> --- a/drivers/leds/leds-el15203000.c
>> +++ b/drivers/leds/leds-el15203000.c
>> @@ -287,10 +287,16 @@ static int el15203000_probe_dt(struct el15203000 *priv)
>> return ret;
>> }
>>
>> +static void el15203000_mutex_destroy(void *lock)
>> +{
>> + mutex_destroy(lock);
>> +}
>> +
>> static int el15203000_probe(struct spi_device *spi)
>> {
>> struct el15203000 *priv;
>> size_t count;
>> + int ret;
>>
>> count = device_get_child_node_count(&spi->dev);
>> if (!count) {
>> @@ -312,15 +318,14 @@ static int el15203000_probe(struct spi_device *spi)
>>
>> spi_set_drvdata(spi, priv);
>>
>> + ret = devm_add_action_or_reset(&spi->dev, el15203000_mutex_destroy,
>> +       &priv->lock);
>> + if (ret)
>> + return ret;
>> +
>> return el15203000_probe_dt(priv);
>> }
>>
>> -static void el15203000_remove(struct spi_device *spi)
> Is the remove() callback in struct spi_driver deprecated?

It is not that the remove() callback is deprecated;
after wrapping the mutex_destroy() call with devm_add_action_or_reset(),
the remove() callback is simply unnecessary here.

>
>> -{
>> - struct el15203000 *priv = spi_get_drvdata(spi);
>> -
>> - mutex_destroy(&priv->lock);
>> -}
>>
>> static const struct of_device_id el15203000_dt_ids[] = {
>> { .compatible = "crane,el15203000", },
>> @@ -331,7 +336,6 @@ static void el15203000_remove(struct spi_device *spi)
>>
>> static struct spi_driver el15203000_driver = {
>> .probe = el15203000_probe,
>> - .remove = el15203000_remove,
>> .driver = {
>> .name = KBUILD_MODNAME,
>> .of_match_table = el15203000_dt_ids,
>> -- 
>> 1.8.3.1
>>
>
  
Oleh Kravchenko Nov. 9, 2022, 10:43 a.m. UTC | #3
> On Nov 9, 2022, at 12:25, wangyufen <wangyufen@huawei.com> wrote:
> 
> 
> On 2022/11/9 17:39, Oleh Kravchenko wrote:
>> Hello all!
>> 
>>> On Nov 9, 2022, at 10:48, Wang Yufen <wangyufen@huawei.com> wrote:
>>> 
>>> return el15203000_probe_dt(priv);
>>> }
>>> 
>>> -static void el15203000_remove(struct spi_device *spi)
>> Is the remove() callback in struct spi_driver deprecated?
> 
> It is not that the remove() callback is deprecated;
> after wrapping the mutex_destroy() call with devm_add_action_or_reset(),
> the remove() callback is simply unnecessary here.

When remove() is called, the memory allocated by devm_*() is still valid.
So what are you trying to fix here?

> 
>> 
>>> -{
>>> - struct el15203000 *priv = spi_get_drvdata(spi);
>>> -
>>> - mutex_destroy(&priv->lock);
>>> -}
>>> 
>>> static const struct of_device_id el15203000_dt_ids[] = {
>>> { .compatible = "crane,el15203000", },
>>> @@ -331,7 +336,6 @@ static void el15203000_remove(struct spi_device *spi)
>>> 
>>> static struct spi_driver el15203000_driver = {
>>> .probe = el15203000_probe,
>>> - .remove = el15203000_remove,
>>> .driver = {
>>> .name = KBUILD_MODNAME,
>>> .of_match_table = el15203000_dt_ids,
>>> -- 
>>> 1.8.3.1
>>> 
>>
  
Oleh Kravchenko Nov. 11, 2022, 10:39 a.m. UTC | #4
Hello Wang,

> On Nov 11, 2022, at 11:21, wangyufen <wangyufen@huawei.com> wrote:
> 
> 
> On 2022/11/9 18:43, Oleh Kravchenko wrote:
>> 
>> 
>>> On Nov 9, 2022, at 12:25, wangyufen <wangyufen@huawei.com> wrote:
>>> 
>>> 
>>> On 2022/11/9 17:39, Oleh Kravchenko wrote:
>>> 
>>>>> -static void el15203000_remove(struct spi_device *spi)
>>>>> 
>>>> Is the remove() callback in struct spi_driver deprecated?
>>>> 
>>> It is not that the remove() callback is deprecated;
>>> after wrapping the mutex_destroy() call with devm_add_action_or_reset(),
>>> the remove() callback is simply unnecessary here.
>>> 
>> When remove() is called, the memory allocated by devm_*() is still valid.
>> So what are you trying to fix here?
> 
> It fixes &priv->lock being used after it is destroyed. For details, please see patch #0:
> LKML: Wang Yufen: [PATCH 00/13] leds: Fix devm vs. non-devm ordering

It doesn’t make any sense to me.
You are saying that remove() is called before the devm_*() deallocation.
If that is true, then set_brightness_delayed() will crash the system anyway.

The LED device has a parent SPI device; the LED device can’t exist without the SPI device.

So the deallocation order should be as follows:
1. LED device devm_*()
2. SPI device remove()
  
Wei Yongjun Nov. 15, 2022, 2:06 a.m. UTC | #5
Hi Oleh,

On 2022/11/11 18:39, Oleh Kravchenko wrote:
> Hello Wang,
> 
>> On Nov 11, 2022, at 11:21, wangyufen <wangyufen@huawei.com> wrote:
>>
>>
>> On 2022/11/9 18:43, Oleh Kravchenko wrote:
>>>
>>>
>>>> On Nov 9, 2022, at 12:25, wangyufen <wangyufen@huawei.com> wrote:
>>>>
>>>>
>>>> On 2022/11/9 17:39, Oleh Kravchenko wrote:
>>>>
>>>>>> -static void el15203000_remove(struct spi_device *spi)
>>>>>>
>>>>> Is the remove() callback in struct spi_driver deprecated?
>>>>>
>>>> It is not that the remove() callback is deprecated;
>>>> after wrapping the mutex_destroy() call with devm_add_action_or_reset(),
>>>> the remove() callback is simply unnecessary here.
>>>>
>>> When remove() is called, the memory allocated by devm_*() is still valid.
>>> So what are you trying to fix here?
>>
>> It fixes &priv->lock being used after it is destroyed. For details, please see patch #0:
>> LKML: Wang Yufen: [PATCH 00/13] leds: Fix devm vs. non-devm ordering
> 
> It doesn’t make any sense to me.
> You are saying that remove() is called before the devm_*() deallocation.
> If that is true, then set_brightness_delayed() will crash the system anyway.
> 
> The LED device has a parent SPI device; the LED device can’t exist without the SPI device.
> 
> So the deallocation order should be as follows:
> 1. LED device devm_*()
> 2. SPI device remove()

The allocation order is as follows:

el15203000_probe()
  mutex_init(&priv->lock);
  el15203000_probe_dt(priv)
    device_for_each_child_node(priv->dev, child) {
      ...
      led->ldev.brightness_set_blocking = el15203000_set_blocking;
      ...
      devm_led_classdev_register_ext(priv->dev, &led->ldev, &init_data);
         dr = devres_alloc(devm_led_classdev_release, sizeof(*dr), GFP_KERNEL);
         <-- dr->node.release = devm_led_classdev_release()
         ...
         devres_add(parent, dr);
         <-- add dr->node to &priv->dev->devres_head

And the full deallocation order should be this:

1. SPI device .remove callback
2. LED device devm_*()
3. SPI device deallocation

spi_unregister_device()
  device_del()
    bus_remove_device()
      device_release_driver_internal()
        __device_release_driver()
          ...
          device_remove()
            spi_remove()   <-- calls el15203000_remove() here, which destroys the lock via mutex_destroy(&priv->lock)
          ...
          device_unbind_cleanup()
            devres_release_all()
              release_nodes()
              <-- traverses the spi->dev.devres_head list and calls each dr->node.release in sequence
                devm_led_classdev_release()
                  led_classdev_unregister()
                  <-- flushes set_brightness_work here; before the flush, set_brightness_work may already be scheduled,
                  <-- in which case el15203000_set_blocking() ... -> mutex_lock(&led->priv->lock) is called,
                  <-- so priv->lock is used after it has been destroyed
  put_device(&spi->dev) <-- the SPI device is deallocated here


Regards,
Wei Yongjun
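
The sequence Wei describes can be modelled in a few lines of userspace C. This is a rough sketch under stated assumptions, not the driver's code: model_priv, model_led_release() and model_unbind() are hypothetical stand-ins, and the pending work item is reduced to a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model of the unbind sequence described above. All names are
 * hypothetical stand-ins, not kernel APIs.
 */
struct model_priv {
	bool lock_alive;	/* stands in for priv->lock */
	bool work_pending;	/* stands in for a queued set_brightness_work */
	int bad_lock_uses;	/* times the lock was taken after destruction */
};

typedef void (*model_release_fn)(struct model_priv *priv);

/* Stands in for mutex_destroy(&priv->lock). */
static void model_lock_destroy(struct model_priv *priv)
{
	priv->lock_alive = false;
}

/*
 * Stands in for devm_led_classdev_release(): flushing pending work ends
 * up taking the lock, as el15203000_set_blocking() does.
 */
static void model_led_release(struct model_priv *priv)
{
	if (priv->work_pending) {
		if (!priv->lock_alive)
			priv->bad_lock_uses++;
		priv->work_pending = false;
	}
}

/*
 * Simulate one bind/unbind cycle and return how often the lock was used
 * after destruction. With use_devm_action false, the lock is destroyed
 * from a .remove() callback; with true, it is destroyed by a managed
 * action registered before the LED.
 */
static int model_unbind(bool use_devm_action)
{
	struct model_priv priv = { .lock_alive = true };
	model_release_fn devres[2];
	size_t count = 0;

	/* probe(): registration order */
	if (use_devm_action)
		devres[count++] = model_lock_destroy;
	devres[count++] = model_led_release;

	/* A brightness change is queued just before the device goes away. */
	priv.work_pending = true;

	/* Unbind step 1: the driver's .remove() callback, if it has one. */
	if (!use_devm_action)
		model_lock_destroy(&priv);

	/* Unbind step 2: managed resources, released in reverse order. */
	while (count > 0)
		devres[--count](&priv);

	assert(!priv.work_pending);
	return priv.bad_lock_uses;
}
```

Under these assumptions, model_unbind(false) reports one use of the lock after destruction, while model_unbind(true) reports none, because the lock cleanup was registered before the LED and therefore runs after it.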
  
Wang Yufen Nov. 22, 2022, 1:10 a.m. UTC | #6
On 2022/11/15 10:06, Wei Yongjun wrote:
> Hi Oleh,
> 
> On 2022/11/11 18:39, Oleh Kravchenko wrote:
>> Hello Wang,
>>
>>> On Nov 11, 2022, at 11:21, wangyufen <wangyufen@huawei.com> wrote:
>>>
>>>
>>> On 2022/11/9 18:43, Oleh Kravchenko wrote:
>>>>
>>>>
>>>>> On Nov 9, 2022, at 12:25, wangyufen <wangyufen@huawei.com> wrote:
>>>>>
>>>>>
>>>>> On 2022/11/9 17:39, Oleh Kravchenko wrote:
>>>>>
>>>>>>> -static void el15203000_remove(struct spi_device *spi)
>>>>>>>
>>>>>> Is the remove() callback in struct spi_driver deprecated?
>>>>>>
>>>>> It is not that the remove() callback is deprecated;
>>>>> after wrapping the mutex_destroy() call with devm_add_action_or_reset(),
>>>>> the remove() callback is simply unnecessary here.
>>>>>
>>>> When remove() is called, the memory allocated by devm_*() is still valid.
>>>> So what are you trying to fix here?
>>>
>>> It fixes &priv->lock being used after it is destroyed. For details, please see patch #0:
>>> LKML: Wang Yufen: [PATCH 00/13] leds: Fix devm vs. non-devm ordering
>>
>> It doesn’t make any sense to me.
>> You are saying that remove() is called before the devm_*() deallocation.
>> If that is true, then set_brightness_delayed() will crash the system anyway.
>>
>> The LED device has a parent SPI device; the LED device can’t exist without the SPI device.
>>
>> So the deallocation order should be as follows:
>> 1. LED device devm_*()
>> 2. SPI device remove()
> 
> The allocation order is as follows:
> 
> el15203000_probe()
>    mutex_init(&priv->lock);
>    el15203000_probe_dt(priv)
>      device_for_each_child_node(priv->dev, child) {
>        ...
>        led->ldev.brightness_set_blocking = el15203000_set_blocking;
>        ...
>        devm_led_classdev_register_ext(priv->dev, &led->ldev, &init_data);
>           dr = devres_alloc(devm_led_classdev_release, sizeof(*dr), GFP_KERNEL);
>           <-- dr->node.release = devm_led_classdev_release()
>           ...
>           devres_add(parent, dr);
>           <-- add dr->node to &priv->dev->devres_head
> 
> And the full deallocation order should be this:
> 
> 1. SPI device .remove callback
> 2. LED device devm_*()
> 3. SPI device deallocation
> 
> spi_unregister_device()
>   device_del()
>     bus_remove_device()
>       device_release_driver_internal()
>         __device_release_driver()
>           ...
>           device_remove()
>             spi_remove()   <-- calls el15203000_remove() here, which destroys the lock via mutex_destroy(&priv->lock)
>           ...
>           device_unbind_cleanup()
>             devres_release_all()
>               release_nodes()
>               <-- traverses the spi->dev.devres_head list and calls each dr->node.release in sequence
>                 devm_led_classdev_release()
>                   led_classdev_unregister()
>                   <-- flushes set_brightness_work here; before the flush, set_brightness_work may already be scheduled,
>                   <-- in which case el15203000_set_blocking() ... -> mutex_lock(&led->priv->lock) is called,
>                   <-- so priv->lock is used after it has been destroyed
>   put_device(&spi->dev) <-- the SPI device is deallocated here
> 
>
Hi Oleh,

Judging from the deallocation order above, there is an issue in that
&priv->lock is used after it is destroyed, right?

And thanks Wei for the detailed explanation.

Thanks,
Wang


> Regards,
> Wei Yongjun
>
  
Oleh Kravchenko Nov. 25, 2022, 10:21 p.m. UTC | #7
Hello Wang,

On 22.11.22 03:10, Wang Yufen wrote:
> 
> Hi Oleh,
> 
> Judging from the deallocation order above, there is an issue in that &priv->lock is used after it is destroyed, right?
> 
> And thanks Wei for the detailed explanation.
> 
> Thanks,
> Wang

Sorry, guys.
The last Russian missile attack made my work impossible.
I will try to verify everything when I am able to.
  

Patch

diff --git a/drivers/leds/leds-el15203000.c b/drivers/leds/leds-el15203000.c
index 7e7b617..9be934e 100644
--- a/drivers/leds/leds-el15203000.c
+++ b/drivers/leds/leds-el15203000.c
@@ -287,10 +287,16 @@  static int el15203000_probe_dt(struct el15203000 *priv)
 	return ret;
 }
 
+static void el15203000_mutex_destroy(void *lock)
+{
+	mutex_destroy(lock);
+}
+
 static int el15203000_probe(struct spi_device *spi)
 {
 	struct el15203000	*priv;
 	size_t			count;
+	int			ret;
 
 	count = device_get_child_node_count(&spi->dev);
 	if (!count) {
@@ -312,15 +318,14 @@  static int el15203000_probe(struct spi_device *spi)
 
 	spi_set_drvdata(spi, priv);
 
+	ret = devm_add_action_or_reset(&spi->dev, el15203000_mutex_destroy,
+				       &priv->lock);
+	if (ret)
+		return ret;
+
 	return el15203000_probe_dt(priv);
 }
 
-static void el15203000_remove(struct spi_device *spi)
-{
-	struct el15203000 *priv = spi_get_drvdata(spi);
-
-	mutex_destroy(&priv->lock);
-}
 
 static const struct of_device_id el15203000_dt_ids[] = {
 	{ .compatible = "crane,el15203000", },
@@ -331,7 +336,6 @@  static void el15203000_remove(struct spi_device *spi)
 
 static struct spi_driver el15203000_driver = {
 	.probe		= el15203000_probe,
-	.remove		= el15203000_remove,
 	.driver = {
 		.name		= KBUILD_MODNAME,
 		.of_match_table	= el15203000_dt_ids,