[8/8] mm: memory-failure: fix race window when trying to get hugetlb folio

Message ID 20230708085744.3599311-9-linmiaohe@huawei.com
State New
Headers
Series A few fixup and cleanup patches for memory-failure |

Commit Message

Miaohe Lin July 8, 2023, 8:57 a.m. UTC
  page_folio() is fetched before calling get_hwpoison_hugetlb_folio()
without hugetlb_lock being held. So hugetlb page could be demoted
before get_hwpoison_hugetlb_folio() holding hugetlb_lock but after
page_folio() is fetched. So get_hwpoison_hugetlb_folio() will hold
unexpected extra refcnt of hugetlb folio while leaving demoted page
un-refcnted.

Fixes: 25182f05ffed ("mm,hwpoison: fix race with hugetlb page allocation")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/memory-failure.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)
  

Comments

Naoya Horiguchi July 10, 2023, 7:58 a.m. UTC | #1
On Sat, Jul 08, 2023 at 04:57:44PM +0800, Miaohe Lin wrote:
> page_folio() is fetched before calling get_hwpoison_hugetlb_folio()
> without hugetlb_lock being held. So hugetlb page could be demoted
> before get_hwpoison_hugetlb_folio() holding hugetlb_lock but after
> page_folio() is fetched. So get_hwpoison_hugetlb_folio() will hold
> unexpected extra refcnt of hugetlb folio while leaving demoted page
> un-refcnted.

Very nice, thank you for finding the issue.

> 
> Fixes: 25182f05ffed ("mm,hwpoison: fix race with hugetlb page allocation")
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/memory-failure.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 76d88d27cdbe..066bf57f2d22 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1388,8 +1388,14 @@ static int __get_hwpoison_page(struct page *page, unsigned long flags)
>  	bool hugetlb = false;
>  
>  	ret = get_hwpoison_hugetlb_folio(folio, &hugetlb, false);
> -	if (hugetlb)
> -		return ret;
> +	if (hugetlb) {
> +		if (folio == page_folio(page))
> +			return ret;

Some short comment about the race against demotion here is helpful.

Anyway, the patch looks good to me.

Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

> +		if (ret > 0) {
> +			folio_put(folio);
> +			folio = page_folio(page);
> +		}
> +	}
>  
>  	/*
>  	 * This check prevents from calling folio_try_get() for any
> @@ -1478,8 +1484,12 @@ static int __get_unpoison_page(struct page *page)
>  	bool hugetlb = false;
>  
>  	ret = get_hwpoison_hugetlb_folio(folio, &hugetlb, true);
> -	if (hugetlb)
> -		return ret;
> +	if (hugetlb) {
> +		if (folio == page_folio(page))
> +			return ret;
> +		if (ret > 0)
> +			folio_put(folio);
> +	}
>  
>  	/*
>  	 * PageHWPoisonTakenOff pages are not only marked as PG_hwpoison,
> -- 
> 2.33.0
> 
> 
>
  
Miaohe Lin July 10, 2023, 8:32 a.m. UTC | #2
On 2023/7/10 15:58, Naoya Horiguchi wrote:
> On Sat, Jul 08, 2023 at 04:57:44PM +0800, Miaohe Lin wrote:
>> page_folio() is fetched before calling get_hwpoison_hugetlb_folio()
>> without hugetlb_lock being held. So hugetlb page could be demoted
>> before get_hwpoison_hugetlb_folio() holding hugetlb_lock but after
>> page_folio() is fetched. So get_hwpoison_hugetlb_folio() will hold
>> unexpected extra refcnt of hugetlb folio while leaving demoted page
>> un-refcnted.
> 
> Very nice, thank you for finding the issue.
> 
>>
>> Fixes: 25182f05ffed ("mm,hwpoison: fix race with hugetlb page allocation")
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/memory-failure.c | 18 ++++++++++++++----
>>  1 file changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 76d88d27cdbe..066bf57f2d22 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1388,8 +1388,14 @@ static int __get_hwpoison_page(struct page *page, unsigned long flags)
>>  	bool hugetlb = false;
>>  
>>  	ret = get_hwpoison_hugetlb_folio(folio, &hugetlb, false);
>> -	if (hugetlb)
>> -		return ret;
>> +	if (hugetlb) {
>> +		if (folio == page_folio(page))
>> +			return ret;
> 
> Some short comment about the race against demotion here is helpful.

Does the below comment makes sense to you?

"
  Make sure hugetlb demotion did not happen from under us.
"

> 
> Anyway, the patch looks good to me.
> 
> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

Many thanks for your review and comment, Naoya.
  
Naoya Horiguchi July 10, 2023, 8:39 a.m. UTC | #3
On Mon, Jul 10, 2023 at 04:32:27PM +0800, Miaohe Lin wrote:
> On 2023/7/10 15:58, Naoya Horiguchi wrote:
> > On Sat, Jul 08, 2023 at 04:57:44PM +0800, Miaohe Lin wrote:
> >> page_folio() is fetched before calling get_hwpoison_hugetlb_folio()
> >> without hugetlb_lock being held. So hugetlb page could be demoted
> >> before get_hwpoison_hugetlb_folio() holding hugetlb_lock but after
> >> page_folio() is fetched. So get_hwpoison_hugetlb_folio() will hold
> >> unexpected extra refcnt of hugetlb folio while leaving demoted page
> >> un-refcnted.
> > 
> > Very nice, thank you for finding the issue.
> > 
> >>
> >> Fixes: 25182f05ffed ("mm,hwpoison: fix race with hugetlb page allocation")
> >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> >> ---
> >>  mm/memory-failure.c | 18 ++++++++++++++----
> >>  1 file changed, 14 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >> index 76d88d27cdbe..066bf57f2d22 100644
> >> --- a/mm/memory-failure.c
> >> +++ b/mm/memory-failure.c
> >> @@ -1388,8 +1388,14 @@ static int __get_hwpoison_page(struct page *page, unsigned long flags)
> >>  	bool hugetlb = false;
> >>  
> >>  	ret = get_hwpoison_hugetlb_folio(folio, &hugetlb, false);
> >> -	if (hugetlb)
> >> -		return ret;
> >> +	if (hugetlb) {
> >> +		if (folio == page_folio(page))
> >> +			return ret;
> > 
> > Some short comment about the race against demotion here is helpful.
> 
> Does the below comment makes sense to you?
> 
> "
>   Make sure hugetlb demotion did not happen from under us.
> "

Yes, this sounds fine.

Thanks,
Naoya Horiguchi
  
Miaohe Lin July 10, 2023, 8:59 a.m. UTC | #4
On 2023/7/10 16:39, Naoya Horiguchi wrote:
> On Mon, Jul 10, 2023 at 04:32:27PM +0800, Miaohe Lin wrote:
>> On 2023/7/10 15:58, Naoya Horiguchi wrote:
>>> On Sat, Jul 08, 2023 at 04:57:44PM +0800, Miaohe Lin wrote:
>>>> page_folio() is fetched before calling get_hwpoison_hugetlb_folio()
>>>> without hugetlb_lock being held. So hugetlb page could be demoted
>>>> before get_hwpoison_hugetlb_folio() holding hugetlb_lock but after
>>>> page_folio() is fetched. So get_hwpoison_hugetlb_folio() will hold
>>>> unexpected extra refcnt of hugetlb folio while leaving demoted page
>>>> un-refcnted.
>>>
>>> Very nice, thank you for finding the issue.
>>>
>>>>
>>>> Fixes: 25182f05ffed ("mm,hwpoison: fix race with hugetlb page allocation")
>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>> ---
>>>>  mm/memory-failure.c | 18 ++++++++++++++----
>>>>  1 file changed, 14 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>> index 76d88d27cdbe..066bf57f2d22 100644
>>>> --- a/mm/memory-failure.c
>>>> +++ b/mm/memory-failure.c
>>>> @@ -1388,8 +1388,14 @@ static int __get_hwpoison_page(struct page *page, unsigned long flags)
>>>>  	bool hugetlb = false;
>>>>  
>>>>  	ret = get_hwpoison_hugetlb_folio(folio, &hugetlb, false);
>>>> -	if (hugetlb)
>>>> -		return ret;
>>>> +	if (hugetlb) {
>>>> +		if (folio == page_folio(page))
>>>> +			return ret;
>>>
>>> Some short comment about the race against demotion here is helpful.
>>
>> Does the below comment makes sense to you?
>>
>> "
>>   Make sure hugetlb demotion did not happen from under us.
>> "
> 
> Yes, this sounds fine.

Will do it in v2. Thanks.

> 
> Thanks,
> Naoya Horiguchi
> 
> .
>
  

Patch

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 76d88d27cdbe..066bf57f2d22 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1388,8 +1388,14 @@  static int __get_hwpoison_page(struct page *page, unsigned long flags)
 	bool hugetlb = false;
 
 	ret = get_hwpoison_hugetlb_folio(folio, &hugetlb, false);
-	if (hugetlb)
-		return ret;
+	if (hugetlb) {
+		if (folio == page_folio(page))
+			return ret;
+		if (ret > 0) {
+			folio_put(folio);
+			folio = page_folio(page);
+		}
+	}
 
 	/*
 	 * This check prevents from calling folio_try_get() for any
@@ -1478,8 +1484,12 @@  static int __get_unpoison_page(struct page *page)
 	bool hugetlb = false;
 
 	ret = get_hwpoison_hugetlb_folio(folio, &hugetlb, true);
-	if (hugetlb)
-		return ret;
+	if (hugetlb) {
+		if (folio == page_folio(page))
+			return ret;
+		if (ret > 0)
+			folio_put(folio);
+	}
 
 	/*
 	 * PageHWPoisonTakenOff pages are not only marked as PG_hwpoison,