dax/hmem: Fix refcount leak in dax_hmem_probe()

Message ID 20221203095858.612027-1-liuyongqiang13@huawei.com
State New
Headers
Series dax/hmem: Fix refcount leak in dax_hmem_probe() |

Commit Message

Yongqiang Liu Dec. 3, 2022, 9:58 a.m. UTC
  We should always call dax_region_put() whenever devm_create_dev_dax()
succeed or fail to avoid refcount leak of dax_region. Move the return
value check after dax_region_put().

Fixes: c01044cc8191 ("ACPI: HMAT: refactor hmat_register_target_device to hmem_register_device")
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 drivers/dax/hmem/hmem.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
  

Comments

Ira Weiny Dec. 3, 2022, 8:49 p.m. UTC | #1
On Sat, Dec 03, 2022 at 09:58:58AM +0000, Yongqiang Liu wrote:
> We should always call dax_region_put() whenever devm_create_dev_dax()
> succeed or fail to avoid refcount leak of dax_region. Move the return
> value check after dax_region_put().

I think dax_region_put is called from dax_region_unregister() automatically on
tear down.

> 
> Fixes: c01044cc8191 ("ACPI: HMAT: refactor hmat_register_target_device to hmem_register_device")

I'm also not sure how this patch is related to this fix.

Ira

> Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
> ---
>  drivers/dax/hmem/hmem.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index 1bf040dbc834..09f5cd7b6c8e 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -36,12 +36,11 @@ static int dax_hmem_probe(struct platform_device *pdev)
>  		.size = region_idle ? 0 : resource_size(res),
>  	};
>  	dev_dax = devm_create_dev_dax(&data);
> -	if (IS_ERR(dev_dax))
> -		return PTR_ERR(dev_dax);
>  
>  	/* child dev_dax instances now own the lifetime of the dax_region */
>  	dax_region_put(dax_region);
> -	return 0;
> +
> +	return IS_ERR(dev_dax) ? PTR_ERR(dev_dax) : 0;
>  }
>  
>  static int dax_hmem_remove(struct platform_device *pdev)
> -- 
> 2.25.1
> 
>
  
Yongqiang Liu Dec. 6, 2022, 9:17 a.m. UTC | #2
在 2022/12/4 4:49, Ira Weiny 写道:
> On Sat, Dec 03, 2022 at 09:58:58AM +0000, Yongqiang Liu wrote:
>> We should always call dax_region_put() whenever devm_create_dev_dax()
>> succeed or fail to avoid refcount leak of dax_region. Move the return
>> value check after dax_region_put().
> I think dax_region_put is called from dax_region_unregister() automatically on
> tear down.
Yes, Thanks for your explanation.
>> Fixes: c01044cc8191 ("ACPI: HMAT: refactor hmat_register_target_device to hmem_register_device")
> I'm also not sure how this patch is related to this fix.
>
> Ira
>
>> Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
>> ---
>>   drivers/dax/hmem/hmem.c | 5 ++---
>>   1 file changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
>> index 1bf040dbc834..09f5cd7b6c8e 100644
>> --- a/drivers/dax/hmem/hmem.c
>> +++ b/drivers/dax/hmem/hmem.c
>> @@ -36,12 +36,11 @@ static int dax_hmem_probe(struct platform_device *pdev)
>>   		.size = region_idle ? 0 : resource_size(res),
>>   	};
>>   	dev_dax = devm_create_dev_dax(&data);
>> -	if (IS_ERR(dev_dax))
>> -		return PTR_ERR(dev_dax);
>>   
>>   	/* child dev_dax instances now own the lifetime of the dax_region */
>>   	dax_region_put(dax_region);
>> -	return 0;
>> +
>> +	return IS_ERR(dev_dax) ? PTR_ERR(dev_dax) : 0;
>>   }
>>   
>>   static int dax_hmem_remove(struct platform_device *pdev)
>> -- 
>> 2.25.1
>>
>>
> .
  
Paul Cassella June 2, 2023, 2:31 a.m. UTC | #3
On Sat, 3 Dec 2022, Ira Weiny wrote:
> On Sat, Dec 03, 2022 at 09:58:58AM +0000, Yongqiang Liu wrote:

> > We should always call dax_region_put() whenever devm_create_dev_dax()
> > succeed or fail to avoid refcount leak of dax_region. Move the return
> > value check after dax_region_put().

> I think dax_region_put is called from dax_region_unregister() automatically on
> tear down.

Hi Ira,

Note the reference dax_region_unregister() will be putting is the one 
devm_create_dev_dax() takes by kref_get(&dax_region->kref). I think 
dax_hmem_probe() needs to put its reference in the error case, as in the 
successful case.

Consider, devm_create_dev_dax() has error paths that return without 
involving dax_region_unregister(), prior to kref_get() and device_add().  
dax_hmem_probe() is clearly responsible for freeing the region in those 
cases.


dax_hmem_probe() drops its own reference in the successful case because 
(per the comment) "child dev_dax instances now own the lifetime of the 
dax_region".  That ownership is the reference that the error-case 
dax_region_unregister() is dropping.  dax_hmem_probe()'s initial reference 
also needs to be dropped in the error case, as in the successful case.


> > Fixes: c01044cc8191 ("ACPI: HMAT: refactor hmat_register_target_device to hmem_register_device")
> 
> I'm also not sure how this patch is related to this fix.
> 
> Ira
> 
> > Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
> > ---
> >  drivers/dax/hmem/hmem.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> > index 1bf040dbc834..09f5cd7b6c8e 100644
> > --- a/drivers/dax/hmem/hmem.c
> > +++ b/drivers/dax/hmem/hmem.c
> > @@ -36,12 +36,11 @@ static int dax_hmem_probe(struct platform_device *pdev)
> >  		.size = region_idle ? 0 : resource_size(res),
> >  	};
> >  	dev_dax = devm_create_dev_dax(&data);
> > -	if (IS_ERR(dev_dax))
> > -		return PTR_ERR(dev_dax);
> >  
> >  	/* child dev_dax instances now own the lifetime of the dax_region */

This comment should probably be updated now.  :)
  
Ira Weiny June 2, 2023, 6:43 p.m. UTC | #4
Paul Cassella wrote:
> On Sat, 3 Dec 2022, Ira Weiny wrote:
> > On Sat, Dec 03, 2022 at 09:58:58AM +0000, Yongqiang Liu wrote:
> 
> > > We should always call dax_region_put() whenever devm_create_dev_dax()
> > > succeed or fail to avoid refcount leak of dax_region. Move the return
> > > value check after dax_region_put().
> 
> > I think dax_region_put is called from dax_region_unregister() automatically on
> > tear down.
> 
> Hi Ira,
> 
> Note the reference dax_region_unregister() will be putting is the one 
> devm_create_dev_dax() takes by kref_get(&dax_region->kref).   I think 
> dax_hmem_probe() needs to put its reference in the error case, as in the 
> successful case.

Looking at this again I'm inclined to agree that something is wrong.  But
I'm not sure this patch fixes it. anything.

When you say:

> ...   I think 
> dax_hmem_probe() needs to put its reference in the error case, as in the 
> successful case.

... which kref_get() is dax_hmem_probe() letting go?

Here is what I see with the current code.

dax_hmem_probe()
	alloc_dax_region()
		kref_get(&dax_region->kref)
		devm_add_action_or_reset(... dax_region_unregister, ...)
				... kref_put() later...

	devm_create_dev_dax()
		...may return error...
		kref_get()
		[dev_dax_release() set to call kref_put() later]
		...may return error...

	if not error
		dax_region_put() => kref_put()

I think this is an extra unneeded put???

Dan I see this pattern repeated in cxl and pmem.  Is the intent to remove
the need for dax_region_unregister() to be called when the platform device
unwinds because the platform device is not intended to own the dax_region
after success?  If so it seems like the device managed call should be
removed too.  Not just calling dax_region_put()?  :-/

But wouldn't that cause an issue with the sysfs entries created?

> 
> Consider, devm_create_dev_dax() has error paths that return without 
> involving dax_region_unregister(), prior to kref_get() and device_add().  
> dax_hmem_probe() is clearly responsible for freeing the region in those 
> cases.

No the devm_add_action_or_reset(... dax_region_unregister, ...) will cause
dax_region_unregister() to release the reference when the platform device
unwinds.

devm_create_dev_dax() configures a reference release through the
dev_dax->type release.  So when the dev_dax device is put the dax_region
will be released through dev_dax_release()->dax_region_put().

> 
> 
> dax_hmem_probe() drops its own reference in the successful case because 
> (per the comment) "child dev_dax instances now own the lifetime of the 
> dax_region".  That ownership is the reference that the error-case 
> dax_region_unregister() is dropping.

No dax_region_unregister() is not just an error case flow.

>
> dax_hmem_probe()'s initial reference 
> also needs to be dropped in the error case, as in the successful case.
> 

I don't follow this.  Doesn't this now result in an invalid reference
release in dax_region_unregister() when the platform device is unwound?
Furthermore, that reference is required I think for the sysfs entries.

> 
> > > Fixes: c01044cc8191 ("ACPI: HMAT: refactor hmat_register_target_device to hmem_register_device")
> > 
> > I'm also not sure how this patch is related to this fix.
> > 
> > Ira
> > 
> > > Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
> > > ---
> > >  drivers/dax/hmem/hmem.c | 5 ++---
> > >  1 file changed, 2 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> > > index 1bf040dbc834..09f5cd7b6c8e 100644
> > > --- a/drivers/dax/hmem/hmem.c
> > > +++ b/drivers/dax/hmem/hmem.c
> > > @@ -36,12 +36,11 @@ static int dax_hmem_probe(struct platform_device *pdev)
> > >  		.size = region_idle ? 0 : resource_size(res),
> > >  	};
> > >  	dev_dax = devm_create_dev_dax(&data);
> > > -	if (IS_ERR(dev_dax))
> > > -		return PTR_ERR(dev_dax);
> > >  
> > >  	/* child dev_dax instances now own the lifetime of the dax_region */
> 
> This comment should probably be updated now.  :)
> 

I think removed...

Dan what do you think of this patch?  Am I seriously off the rails here?
I worry about this code being around for so long.  But since it is in tear
down perhaps it is just a race which has never been lost.

Ira

---- 8< ---------------------

From f63969c761b04fb5e646e7ba7df77a48bc26ba1c Mon Sep 17 00:00:00 2001
From: Ira Weiny <ira.weiny@intel.com>
Date: Fri, 2 Jun 2023 11:17:10 -0700
Subject: [PATCH] dax: Avoid extra kref_put() of dax_regions

In alloc_dax_region() sysfs attribute groups are created against the
parent object the dax_region is being created under.  A reference to the
dax_region was thus obtained for the lifetime of the parent device via
kobj_get() and released via dax_region_unregister().

The ownership of the dax_region was intended to be transferred to the
device dax device but this transfer is not necessary and could be
problematic if the sysfs entries are used after the dax device is
unwound but before the parent device is.

For the dax device the dax_region reference taken during creation in
devm_create_dev_dax() is sufficient to ensure the dax_region lifetime
for both the parent device and the dax device.

Remove the extraneous dax_region_put() from all call paths with this
pattern.

Not-Yet-Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 drivers/dax/cxl.c       | 3 ---
 drivers/dax/hmem/hmem.c | 3 ---
 drivers/dax/pmem.c      | 7 +------
 3 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
index ccdf8de85bd5..d3c3654842ba 100644
--- a/drivers/dax/cxl.c
+++ b/drivers/dax/cxl.c
@@ -31,9 +31,6 @@ static int cxl_dax_region_probe(struct device *dev)
 	dev_dax = devm_create_dev_dax(&data);
 	if (IS_ERR(dev_dax))
 		return PTR_ERR(dev_dax);
-
-	/* child dev_dax instances now own the lifetime of the dax_region */
-	dax_region_put(dax_region);
 	return 0;
 }
 
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index e5fe8b39fb94..d22d56964120 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -41,9 +41,6 @@ static int dax_hmem_probe(struct platform_device *pdev)
 	dev_dax = devm_create_dev_dax(&data);
 	if (IS_ERR(dev_dax))
 		return PTR_ERR(dev_dax);
-
-	/* child dev_dax instances now own the lifetime of the dax_region */
-	dax_region_put(dax_region);
 	return 0;
 }
 
diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c
index f050ea78bb83..efbdaac51e5f 100644
--- a/drivers/dax/pmem.c
+++ b/drivers/dax/pmem.c
@@ -65,12 +65,7 @@ static struct dev_dax *__dax_pmem_probe(struct device *dev)
 		.pgmap = &pgmap,
 		.size = range_len(&range),
 	};
-	dev_dax = devm_create_dev_dax(&data);
-
-	/* child dev_dax instances now own the lifetime of the dax_region */
-	dax_region_put(dax_region);
-
-	return dev_dax;
+	return devm_create_dev_dax(&data);
 }
 
 static int dax_pmem_probe(struct device *dev)
  
Paul Cassella June 2, 2023, 9:11 p.m. UTC | #5
On Fri, 2 Jun 2023, Ira Weiny wrote:
> Paul Cassella wrote:
> > On Sat, 3 Dec 2022, Ira Weiny wrote:
> > > On Sat, Dec 03, 2022 at 09:58:58AM +0000, Yongqiang Liu wrote:

> > > > We should always call dax_region_put() whenever devm_create_dev_dax()
> > > > succeed or fail to avoid refcount leak of dax_region. Move the return
> > > > value check after dax_region_put().

> > > I think dax_region_put is called from dax_region_unregister() automatically on
> > > tear down.

> > Note the reference dax_region_unregister() will be putting is the one 
> > devm_create_dev_dax() takes by kref_get(&dax_region->kref).   I think 
> > dax_hmem_probe() needs to put its reference in the error case, as in the 
> > successful case.

> Looking at this again I'm inclined to agree that something is wrong.  But
> I'm not sure this patch fixes it. anything.

> When you say:
> 
> > ...   I think 
> > dax_hmem_probe() needs to put its reference in the error case, as in the 
> > successful case.
> 
> ... which kref_get() is dax_hmem_probe() letting go?

Isn't it letting go of the initial kref_init() reference from 
alloc_dax_region()?

Sorry, I had gone through the code a little carelessly yesterday.  Now I 
think that kref_init() reference is the one that dax_hmem_probe() is 
dropping in the success case, and which still needs to be dropped in the 
error case.

(If so, I think the alloc_dax_region() path that returns NULL on 
devm_add_action_or_reset() failure, releasing the kref_get reference, will 
leak the kref_init reference and the region.)
  
Ira Weiny June 2, 2023, 11:34 p.m. UTC | #6
Paul Cassella wrote:
> On Fri, 2 Jun 2023, Ira Weiny wrote:
> > Paul Cassella wrote:
> > > On Sat, 3 Dec 2022, Ira Weiny wrote:
> > > > On Sat, Dec 03, 2022 at 09:58:58AM +0000, Yongqiang Liu wrote:
> 
> > > > > We should always call dax_region_put() whenever devm_create_dev_dax()
> > > > > succeed or fail to avoid refcount leak of dax_region. Move the return
> > > > > value check after dax_region_put().
> 
> > > > I think dax_region_put is called from dax_region_unregister() automatically on
> > > > tear down.
> 
> > > Note the reference dax_region_unregister() will be putting is the one 
> > > devm_create_dev_dax() takes by kref_get(&dax_region->kref).   I think 
> > > dax_hmem_probe() needs to put its reference in the error case, as in the 
> > > successful case.
> 
> > Looking at this again I'm inclined to agree that something is wrong.  But
> > I'm not sure this patch fixes it. anything.
> 
> > When you say:
> > 
> > > ...   I think 
> > > dax_hmem_probe() needs to put its reference in the error case, as in the 
> > > successful case.
> > 
> > ... which kref_get() is dax_hmem_probe() letting go?
> 
> Isn't it letting go of the initial kref_init() reference from 
> alloc_dax_region()?

Oh wow!  I did not realize that about kref_init()...  :-(

> 
> Sorry, I had gone through the code a little carelessly yesterday.  Now I 
> think that kref_init() reference is the one that dax_hmem_probe() is 
> dropping in the success case, and which still needs to be dropped in the 
> error case.

yea ok...

> 
> (If so, I think the alloc_dax_region() path that returns NULL on 
> devm_add_action_or_reset() failure, releasing the kref_get reference, will 
> leak the kref_init reference and the region.)

I see now.  The ref is taken prior to the add action or reset to ensure
the dax_region does not go away should the platform device go away...

However, in all the call paths currently the device passed to
alloc_dax_region() can't go away prior to the dev_dax taking a reference.
So I don't think this extra reference is required.  :-/

As you say the ref counting right now is not correct on error flows.  But
I'm torn on the correct solution.

I think a small series broken out so it can be backported if needed would
be best.

Ira
  
Dan Williams June 3, 2023, 4:11 a.m. UTC | #7
Paul Cassella wrote:
> On Fri, 2 Jun 2023, Ira Weiny wrote:
> > Paul Cassella wrote:
> > > On Sat, 3 Dec 2022, Ira Weiny wrote:
> > > > On Sat, Dec 03, 2022 at 09:58:58AM +0000, Yongqiang Liu wrote:
> 
> > > > > We should always call dax_region_put() whenever devm_create_dev_dax()
> > > > > succeed or fail to avoid refcount leak of dax_region. Move the return
> > > > > value check after dax_region_put().
> 
> > > > I think dax_region_put is called from dax_region_unregister() automatically on
> > > > tear down.
> 
> > > Note the reference dax_region_unregister() will be putting is the one 
> > > devm_create_dev_dax() takes by kref_get(&dax_region->kref).   I think 
> > > dax_hmem_probe() needs to put its reference in the error case, as in the 
> > > successful case.
> 
> > Looking at this again I'm inclined to agree that something is wrong.  But
> > I'm not sure this patch fixes it. anything.
> 
> > When you say:
> > 
> > > ...   I think 
> > > dax_hmem_probe() needs to put its reference in the error case, as in the 
> > > successful case.
> > 
> > ... which kref_get() is dax_hmem_probe() letting go?
> 
> Isn't it letting go of the initial kref_init() reference from 
> alloc_dax_region()?
> 
> Sorry, I had gone through the code a little carelessly yesterday.  Now I 
> think that kref_init() reference is the one that dax_hmem_probe() is 
> dropping in the success case, and which still needs to be dropped in the 
> error case.
> 
> (If so, I think the alloc_dax_region() path that returns NULL on 
> devm_add_action_or_reset() failure, releasing the kref_get reference, will 
> leak the kref_init reference and the region.)

Yes, *this* one looks valid. All the other patches in this thread had me
asking, "um, did you try it?".

Now, that said, the games that this init path is playing are hard to
follow and it should not be the case that alloc_dax_region() needs to
hold a reference on behalf of some future code path that might need it.

Lets make this clearer by moving the reference count management closer
to the object that needs it dax_region->ida. Where the extra references
were being taken on behalf of "static" regions just in case they needed
to be reference in dev_dax_release().

I also found a dax_mapping lifetime issue while testing this, so I
appreciate the focus here.
  

Patch

diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 1bf040dbc834..09f5cd7b6c8e 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -36,12 +36,11 @@  static int dax_hmem_probe(struct platform_device *pdev)
 		.size = region_idle ? 0 : resource_size(res),
 	};
 	dev_dax = devm_create_dev_dax(&data);
-	if (IS_ERR(dev_dax))
-		return PTR_ERR(dev_dax);
 
 	/* child dev_dax instances now own the lifetime of the dax_region */
 	dax_region_put(dax_region);
-	return 0;
+
+	return IS_ERR(dev_dax) ? PTR_ERR(dev_dax) : 0;
 }
 
 static int dax_hmem_remove(struct platform_device *pdev)