scsi: sd: unregister device if device_add_disk() failed in sd_probe()

Message ID 20231208082335.1754205-1-linan666@huaweicloud.com
State New
Series scsi: sd: unregister device if device_add_disk() failed in sd_probe()

Commit Message

Li Nan Dec. 8, 2023, 8:23 a.m. UTC
  From: Li Nan <linan122@huawei.com>

"if device_add() succeeds, you should call device_del() when you want to
get rid of it."

In sd_probe(), device_add_disk() fails when device_add() has already
succeeded, so change put_device() to device_unregister() to ensure device
resources are released.

Fixes: 2a7a891f4c40 ("scsi: sd: Add error handling support for add_disk()")
Signed-off-by: Li Nan <linan122@huawei.com>
---
 drivers/scsi/sd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Luis Chamberlain Dec. 22, 2023, 6:49 a.m. UTC | #1
On Fri, Dec 08, 2023 at 04:23:35PM +0800, linan666@huaweicloud.com wrote:
> From: Li Nan <linan122@huawei.com>
> 
> "if device_add() succeeds, you should call device_del() when you want to
> get rid of it."
> 
> In sd_probe(), device_add_disk() fails when device_add() has already
> succeeded, so change put_device() to device_unregister() to ensure device
> resources are released.
> 
> Fixes: 2a7a891f4c40 ("scsi: sd: Add error handling support for add_disk()")
> Signed-off-by: Li Nan <linan122@huawei.com>

Nacked-by: Luis Chamberlain <mcgrof@kernel.org>

> ---
>  drivers/scsi/sd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 542a4bbb21bc..d81cbeee06eb 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3736,7 +3736,7 @@ static int sd_probe(struct device *dev)
>  
>  	error = device_add_disk(dev, gd, NULL);
>  	if (error) {
> -		put_device(&sdkp->disk_dev);
> +		device_unregister(&sdkp->disk_dev);
>  		put_disk(gd);
>  		goto out;
>  	}

This is incorrect, device_unregister() calls:

void device_unregister(struct device *dev)
{
	pr_debug("device: '%s': %s\n", dev_name(dev), __func__);
	device_del(dev);
	put_device(dev);
}

So you're adding what you believe to be a correct missing device_del().
But what you missed is that if device_add_disk() fails then device_add()
did not succeed because the new code we have in the kernel *today* unwinds
this for us now.

What you missed is that in today's code inside device_add_disk(), if
device_add() succeeds we now unwind and call device_del() for the
device for you. And so, quoting the sentence that follows the one you
took from device_add():

"If device_add() has *not* succeeded, use *only* put_device() to drop the
 reference count."

In the future, please reference a crash dump, or explain how you
reached your conclusions if you do not have a crash dump to prove an
issue, especially if you are suggesting the patch Fixes a commit.

  Luis
  
Yu Kuai Dec. 22, 2023, 8:27 a.m. UTC | #2
Hi,

On 2023/12/22 14:49, Luis Chamberlain wrote:
> On Fri, Dec 08, 2023 at 04:23:35PM +0800, linan666@huaweicloud.com wrote:
>> From: Li Nan <linan122@huawei.com>
>>
>> "if device_add() succeeds, you should call device_del() when you want to
>> get rid of it."
>>
>> In sd_probe(), device_add_disk() fails when device_add() has already
>> succeeded, so change put_device() to device_unregister() to ensure device
>> resources are released.
>>
>> Fixes: 2a7a891f4c40 ("scsi: sd: Add error handling support for add_disk()")
>> Signed-off-by: Li Nan <linan122@huawei.com>
> 
> Nacked-by: Luis Chamberlain <mcgrof@kernel.org>
> 
>> ---
>>   drivers/scsi/sd.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index 542a4bbb21bc..d81cbeee06eb 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -3736,7 +3736,7 @@ static int sd_probe(struct device *dev)
>>   
>>   	error = device_add_disk(dev, gd, NULL);
>>   	if (error) {
>> -		put_device(&sdkp->disk_dev);
>> +		device_unregister(&sdkp->disk_dev);
>>   		put_disk(gd);
>>   		goto out;
>>   	}
> 
> This is incorrect, device_unregister() calls:
> 
> void device_unregister(struct device *dev)
> {
> 	pr_debug("device: '%s': %s\n", dev_name(dev), __func__);
> 	device_del(dev);
> 	put_device(dev);
> }
> 
> So you're adding what you believe to be a correct missing device_del().
> But what you missed is that if device_add_disk() fails then device_add()
> did not succeed because the new code we have in the kernel *today* unwinds
> this for us now.

I'm confused here. There are two devices: one is 'sdkp->disk_dev', the
other is gendisk->part0->bd_device. They are initialized in this order:

sd_probe
 device_add(&sdkp->disk_dev) -> succeeds
 device_add_disk -> fails, and device_add(bd_device) did not succeed
 put_device(&sdkp->disk_dev) -> device_del is missed

If device_add_disk() fails, I don't see device_del() being called for
'sdkp->disk_dev' from anywhere. Am I missing anything?
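
To make the ordering above concrete, here is a minimal userspace sketch of
the error path for the first device. It is a toy model, not kernel code:
every model_* name is hypothetical, and the assumption that a successful
add pins the device with one extra reference is a simplification of what
the driver core does.

```c
/* Userspace model only; this is not kernel code. */
#include <assert.h>
#include <stdbool.h>

struct model_dev {
	int refcount;      /* stands in for the kobject reference count */
	bool registered;   /* true between model_device_add() and model_device_del() */
	bool released;     /* set once the last reference is dropped */
};

static void model_device_initialize(struct model_dev *d)
{
	d->refcount = 1;   /* the caller's initial reference */
	d->registered = false;
	d->released = false;
}

/* Assumption: a successful add pins the device with one extra reference. */
static int model_device_add(struct model_dev *d)
{
	d->refcount++;
	d->registered = true;
	return 0;
}

static void model_put_device(struct model_dev *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0)
		d->released = true;
}

/* Undo model_device_add(): unregister and drop the reference it took. */
static void model_device_del(struct model_dev *d)
{
	d->registered = false;
	model_put_device(d);
}

static void model_device_unregister(struct model_dev *d)
{
	model_device_del(d);
	model_put_device(d);
}

/*
 * Replays the sd_probe() error path for the sdkp->disk_dev stand-in.
 * Returns true if the device is left registered or unreleased.
 */
bool error_path_leaks(bool use_unregister)
{
	struct model_dev disk_dev;

	model_device_initialize(&disk_dev);
	model_device_add(&disk_dev);    /* device_add() succeeded */
	/* ... device_add_disk() fails at this point ... */
	if (use_unregister)
		model_device_unregister(&disk_dev);
	else
		model_put_device(&disk_dev);

	return disk_dev.registered || !disk_dev.released;
}
```

Under these assumptions, error_path_leaks(false) models the current
put_device()-only path and error_path_leaks(true) models the path with
device_unregister().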

Thanks,
Kuai

> 
> What you missed is that in today's code inside device_add_disk(), if
> device_add() succeeds we now unwind and call device_del() for the
> device for you. And so, quoting the sentence that follows the one you
> took from device_add():
> 
> "If device_add() has *not* succeeded, use *only* put_device() to drop the
>   reference count."
> 
> In the future, please reference a crash dump, or explain how you
> reached your conclusions if you do not have a crash dump to prove an
> issue, especially if you are suggesting the patch Fixes a commit.
> 
>    Luis
> 
  
Li Nan Jan. 29, 2024, 1:26 p.m. UTC | #3
friendly ping ...

On 2023/12/8 16:23, linan666@huaweicloud.com wrote:
> From: Li Nan <linan122@huawei.com>
> 
> "if device_add() succeeds, you should call device_del() when you want to
> get rid of it."
> 
> In sd_probe(), device_add_disk() fails when device_add() has already
> succeeded, so change put_device() to device_unregister() to ensure device
> resources are released.
> 
> Fixes: 2a7a891f4c40 ("scsi: sd: Add error handling support for add_disk()")
> Signed-off-by: Li Nan <linan122@huawei.com>
> ---
>   drivers/scsi/sd.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 542a4bbb21bc..d81cbeee06eb 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3736,7 +3736,7 @@ static int sd_probe(struct device *dev)
>   
>   	error = device_add_disk(dev, gd, NULL);
>   	if (error) {
> -		put_device(&sdkp->disk_dev);
> +		device_unregister(&sdkp->disk_dev);
>   		put_disk(gd);
>   		goto out;
>   	}
  
Luis Chamberlain Jan. 29, 2024, 5:46 p.m. UTC | #4
On Fri, Dec 22, 2023 at 04:27:16PM +0800, Yu Kuai wrote:
> Hi,
> 
> On 2023/12/22 14:49, Luis Chamberlain wrote:
> > On Fri, Dec 08, 2023 at 04:23:35PM +0800, linan666@huaweicloud.com wrote:
> > > From: Li Nan <linan122@huawei.com>
> > > 
> > > "if device_add() succeeds, you should call device_del() when you want to
> > > get rid of it."
> > > 
> > > In sd_probe(), device_add_disk() fails when device_add() has already
> > > succeeded, so change put_device() to device_unregister() to ensure device
> > > resources are released.
> > > 
> > > Fixes: 2a7a891f4c40 ("scsi: sd: Add error handling support for add_disk()")
> > > Signed-off-by: Li Nan <linan122@huawei.com>
> > 
> > Nacked-by: Luis Chamberlain <mcgrof@kernel.org>
> > 
> > > ---
> > >   drivers/scsi/sd.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> > > index 542a4bbb21bc..d81cbeee06eb 100644
> > > --- a/drivers/scsi/sd.c
> > > +++ b/drivers/scsi/sd.c
> > > @@ -3736,7 +3736,7 @@ static int sd_probe(struct device *dev)
> > >   	error = device_add_disk(dev, gd, NULL);
> > >   	if (error) {
> > > -		put_device(&sdkp->disk_dev);
> > > +		device_unregister(&sdkp->disk_dev);
> > >   		put_disk(gd);
> > >   		goto out;
> > >   	}
> > 
> > This is incorrect, device_unregister() calls:
> > 
> > void device_unregister(struct device *dev)
> > {
> > 	pr_debug("device: '%s': %s\n", dev_name(dev), __func__);
> > 	device_del(dev);
> > 	put_device(dev);
> > }
> > 
> > So you're adding what you believe to be a correct missing device_del().
> > But what you missed is that if device_add_disk() fails then device_add()
> > did not succeed because the new code we have in the kernel *today* unwinds
> > this for us now.
> 
> I'm confused here. There are two devices: one is 'sdkp->disk_dev', the
> other is gendisk->part0->bd_device. They are initialized in this order:
> 
> sd_probe
>  device_add(&sdkp->disk_dev) -> succeeds
>  device_add_disk -> fails, and device_add(bd_device) did not succeed
>  put_device(&sdkp->disk_dev) -> device_del is missed
> 
> If device_add_disk() fails, I don't see device_del() being called for
> 'sdkp->disk_dev' from anywhere. Am I missing anything?

Ah then the fix is still incorrect and the commit log should
describe that this is for another device.

How about this instead?

From c3f6e03f4a82aa253b6c487a293dcd576393b606 Mon Sep 17 00:00:00 2001
From: Luis Chamberlain <mcgrof@kernel.org>
Date: Mon, 29 Jan 2024 09:25:18 -0800
Subject: [PATCH] sd: remove extra put_device() for extra scsi device

The sd driver first calls device_add() on its own device, and later uses
device_add_disk() with another device. When we added error handling
for device_add_disk() we started calling put_disk(), which triggers
disk_release() when the refcount drops to 0. That ends up calling
the block driver's disk->fops->free_disk() if one is defined. The
sd driver has scsi_disk_free_disk() as its free_disk() and that
does the proper put_device(&sdkp->disk_dev) for us, so we should not
need to call it. However, we are still missing the device_del()
for it.

While at it, unwind with scsi_autopm_put_device(sdp) *prior* to
putting the device, as we do in sd_remove().

Reported-by: Li Nan <linan122@huawei.com>
Reported-by: Yu Kuai <yukuai1@huaweicloud.com>
Fixes: 2a7a891f4c40 ("scsi: sd: Add error handling support for add_disk()")
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 drivers/scsi/sd.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 7f949adbadfd..6475a3c947f8 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3693,8 +3693,9 @@ static int sd_probe(struct device *dev)
 
 	error = device_add(&sdkp->disk_dev);
 	if (error) {
+		scsi_autopm_put_device(sdp);
 		put_device(&sdkp->disk_dev);
-		goto out;
+		return error;
 	}
 
 	dev_set_drvdata(dev, sdkp);
@@ -3734,9 +3735,10 @@ static int sd_probe(struct device *dev)
 
 	error = device_add_disk(dev, gd, NULL);
 	if (error) {
-		put_device(&sdkp->disk_dev);
+		scsi_autopm_put_device(sdp);
+		device_del(&sdkp->disk_dev);
 		put_disk(gd);
-		goto out;
+		return error;
 	}
 
 	if (sdkp->security) {
  
Yu Kuai Jan. 30, 2024, 1:30 a.m. UTC | #5
Hi,

On 2024/01/30 1:46, Luis Chamberlain wrote:
> On Fri, Dec 22, 2023 at 04:27:16PM +0800, Yu Kuai wrote:
>> Hi,
>>
>> On 2023/12/22 14:49, Luis Chamberlain wrote:
>>> On Fri, Dec 08, 2023 at 04:23:35PM +0800, linan666@huaweicloud.com wrote:
>>>> From: Li Nan <linan122@huawei.com>
>>>>
>>>> "if device_add() succeeds, you should call device_del() when you want to
>>>> get rid of it."
>>>>
>>>> In sd_probe(), device_add_disk() fails when device_add() has already
>>>> succeeded, so change put_device() to device_unregister() to ensure device
>>>> resources are released.
>>>>
>>>> Fixes: 2a7a891f4c40 ("scsi: sd: Add error handling support for add_disk()")
>>>> Signed-off-by: Li Nan <linan122@huawei.com>
>>>
>>> Nacked-by: Luis Chamberlain <mcgrof@kernel.org>
>>>
>>>> ---
>>>>    drivers/scsi/sd.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>>>> index 542a4bbb21bc..d81cbeee06eb 100644
>>>> --- a/drivers/scsi/sd.c
>>>> +++ b/drivers/scsi/sd.c
>>>> @@ -3736,7 +3736,7 @@ static int sd_probe(struct device *dev)
>>>>    	error = device_add_disk(dev, gd, NULL);
>>>>    	if (error) {
>>>> -		put_device(&sdkp->disk_dev);
>>>> +		device_unregister(&sdkp->disk_dev);
>>>>    		put_disk(gd);
>>>>    		goto out;
>>>>    	}
>>>
>>> This is incorrect, device_unregister() calls:
>>>
>>> void device_unregister(struct device *dev)
>>> {
>>> 	pr_debug("device: '%s': %s\n", dev_name(dev), __func__);
>>> 	device_del(dev);
>>> 	put_device(dev);
>>> }
>>>
>>> So you're adding what you believe to be a correct missing device_del().
>>> But what you missed is that if device_add_disk() fails then device_add()
>>> did not succeed because the new code we have in the kernel *today* unwinds
>>> this for us now.
>>
>> I'm confused here. There are two devices: one is 'sdkp->disk_dev', the
>> other is gendisk->part0->bd_device. They are initialized in this order:
>>
>> sd_probe
>>  device_add(&sdkp->disk_dev) -> succeeds
>>  device_add_disk -> fails, and device_add(bd_device) did not succeed
>>  put_device(&sdkp->disk_dev) -> device_del is missed
>>
>> If device_add_disk() fails, I don't see device_del() being called for
>> 'sdkp->disk_dev' from anywhere. Am I missing anything?
> 
> Ah then the fix is still incorrect and the commit log should
> describe that this is for another device.
> 
> How about this instead?
> 
> From c3f6e03f4a82aa253b6c487a293dcd576393b606 Mon Sep 17 00:00:00 2001
> From: Luis Chamberlain <mcgrof@kernel.org>
> Date: Mon, 29 Jan 2024 09:25:18 -0800
> Subject: [PATCH] sd: remove extra put_device() for extra scsi device
> 
> The sd driver first calls device_add() on its own device, and later uses
> device_add_disk() with another device. When we added error handling
> for device_add_disk() we started calling put_disk(), which triggers
> disk_release() when the refcount drops to 0. That ends up calling
> the block driver's disk->fops->free_disk() if one is defined. The

This is incorrect. GD_ADDED is only set when device_add_disk()
succeeds, and free_disk() is only called from disk_release() if
GD_ADDED is set. I think Li Nan's patch is correct.
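
The gating described above can be illustrated with a small userspace
sketch. It is a toy model, not the block layer: the model_* names are
hypothetical, and the `added` flag merely stands in for the GD_ADDED
state bit.

```c
/* Userspace model only; this is not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_disk;

struct model_fops {
	void (*free_disk)(struct model_disk *disk);
};

struct model_disk {
	bool added;                     /* stands in for the GD_ADDED state bit */
	int free_disk_calls;            /* how often the driver callback ran */
	const struct model_fops *fops;
};

/* Stand-in for a driver's free_disk() callback. */
static void model_free_disk(struct model_disk *disk)
{
	disk->free_disk_calls++;
}

static const struct model_fops model_sd_fops = {
	.free_disk = model_free_disk,
};

/* Models the release step: free_disk() runs only for a disk that was added. */
static void model_disk_release(struct model_disk *disk)
{
	assert(disk->fops != NULL);
	if (disk->added && disk->fops->free_disk)
		disk->fops->free_disk(disk);
}

/* Returns how many times free_disk() ran when the last reference went away. */
int free_disk_calls_after_release(bool add_disk_succeeded)
{
	struct model_disk disk = {
		.added = false,
		.free_disk_calls = 0,
		.fops = &model_sd_fops,
	};

	if (add_disk_succeeded)
		disk.added = true;      /* set only on the success path */

	model_disk_release(&disk);      /* what the final put_disk() leads to */
	return disk.free_disk_calls;
}
```

In this model the failed-add case never reaches the driver callback, so
nothing drops the reference on the first device on the caller's behalf.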

> sd driver has scsi_disk_free_disk() as its free_disk() and that
> does the proper put_device(&sdkp->disk_dev) for us, so we should not
> need to call it. However, we are still missing the device_del()
> for it.
> 
> While at it, unwind with scsi_autopm_put_device(sdp) *prior* to
> putting the device, as we do in sd_remove().
> 
> Reported-by: Li Nan <linan122@huawei.com>
> Reported-by: Yu Kuai <yukuai1@huaweicloud.com>
> Fixes: 2a7a891f4c40 ("scsi: sd: Add error handling support for add_disk()")
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>   drivers/scsi/sd.c | 8 +++++---
>   1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 7f949adbadfd..6475a3c947f8 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3693,8 +3693,9 @@ static int sd_probe(struct device *dev)
>   
>   	error = device_add(&sdkp->disk_dev);
>   	if (error) {
> +		scsi_autopm_put_device(sdp);
>   		put_device(&sdkp->disk_dev);
> -		goto out;
> +		return error;

I don't see why this is necessary; the 'out' label is still there. If
you think it is a problem, I think you need a separate patch to call
scsi_autopm_put_device() before putting the device.

Thanks,
Kuai

>   	}
>   
>   	dev_set_drvdata(dev, sdkp);
> @@ -3734,9 +3735,10 @@ static int sd_probe(struct device *dev)
>   
>   	error = device_add_disk(dev, gd, NULL);
>   	if (error) {
> -		put_device(&sdkp->disk_dev);
> +		scsi_autopm_put_device(sdp);
> +		device_del(&sdkp->disk_dev);
>   		put_disk(gd);
> -		goto out;
> +		return error;
>   	}
>   
>   	if (sdkp->security) {
>
  
Li Nan Feb. 22, 2024, 9:24 a.m. UTC | #6
friendly ping...

On 2023/12/8 16:23, linan666@huaweicloud.com wrote:
> From: Li Nan <linan122@huawei.com>
> 
> "if device_add() succeeds, you should call device_del() when you want to
> get rid of it."
> 
> In sd_probe(), device_add_disk() fails when device_add() has already
> succeeded, so change put_device() to device_unregister() to ensure device
> resources are released.
> 
> Fixes: 2a7a891f4c40 ("scsi: sd: Add error handling support for add_disk()")
> Signed-off-by: Li Nan <linan122@huawei.com>
> ---
>   drivers/scsi/sd.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 542a4bbb21bc..d81cbeee06eb 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3736,7 +3736,7 @@ static int sd_probe(struct device *dev)
>   
>   	error = device_add_disk(dev, gd, NULL);
>   	if (error) {
> -		put_device(&sdkp->disk_dev);
> +		device_unregister(&sdkp->disk_dev);
>   		put_disk(gd);
>   		goto out;
>   	}
  

Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 542a4bbb21bc..d81cbeee06eb 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3736,7 +3736,7 @@  static int sd_probe(struct device *dev)
 
 	error = device_add_disk(dev, gd, NULL);
 	if (error) {
-		put_device(&sdkp->disk_dev);
+		device_unregister(&sdkp->disk_dev);
 		put_disk(gd);
 		goto out;
 	}