mm/memory-failure: fix hardware poison check in unpoison_memory()

Message ID 20230717181812.167757-1-sidhartha.kumar@oracle.com
State New
Series mm/memory-failure: fix hardware poison check in unpoison_memory()

Commit Message

Sidhartha Kumar July 17, 2023, 6:18 p.m. UTC
It was pointed out[1] that using folio_test_hwpoison() is wrong
as we need to check the individual page that has poison.
folio_test_hwpoison() only checks the head page so go back to using
PageHWPoison().

Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Fixes: a6fddef49eef ("mm/memory-failure: convert unpoison_memory() to folios")
Cc: stable@vger.kernel.org #v6.4
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>

[1]: https://lore.kernel.org/lkml/ZLIbZygG7LqSI9xe@casper.infradead.org/
---
 mm/memory-failure.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
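To make the distinction in the commit message concrete, here is a minimal userspace sketch. It is a hypothetical model, not kernel code: struct page and struct folio are reduced to a head pointer and one boolean standing in for PG_hwpoison, and the model_* names are invented for illustration. It only shows that a head-page query and a per-page query can disagree when the poison sits on a tail page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: NOT the kernel's struct page or struct folio. */
struct page {
	struct page *head;  /* points to itself for a head or order-0 page */
	bool hwpoison;      /* stands in for the PG_hwpoison bit */
};

struct folio {
	struct page *pages; /* pages[0] is the head page */
	size_t nr_pages;
};

/* Build a compound "folio" whose pages all point at pages[0]. */
static void model_init_folio(struct folio *folio, struct page *pages, size_t nr)
{
	assert(nr > 0);
	folio->pages = pages;
	folio->nr_pages = nr;
	for (size_t i = 0; i < nr; i++) {
		pages[i].head = &pages[0];
		pages[i].hwpoison = false;
	}
}

/* Models folio_test_hwpoison(): only the head page's bit is consulted. */
static bool model_folio_test_hwpoison(const struct folio *folio)
{
	return folio->pages[0].hwpoison;
}

/* Models PageHWPoison(p): the bit of exactly the page that was passed in. */
static bool model_PageHWPoison(const struct page *p)
{
	return p->hwpoison;
}
```

With four pages and the poison set only on pages[2] (the THP-like case), model_folio_test_hwpoison() reports false while model_PageHWPoison(&pages[2]) reports true, which is the mismatch the patch addresses.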
  

Comments

Andrew Morton July 17, 2023, 7 p.m. UTC | #1
On Mon, 17 Jul 2023 11:18:12 -0700 Sidhartha Kumar <sidhartha.kumar@oracle.com> wrote:

> It was pointed out[1] that using folio_test_hwpoison() is wrong
> as we need to check the individual page that has poison.
> folio_test_hwpoison() only checks the head page so go back to using
> PageHWPoison().

Please describe the user-visible effects of the bug, especially
when proposing a -stable backport.
  
Jane Chu July 17, 2023, 11:21 p.m. UTC | #2
On 7/17/2023 11:18 AM, Sidhartha Kumar wrote:
> It was pointed out[1] that using folio_test_hwpoison() is wrong
> as we need to check the individual page that has poison.
> folio_test_hwpoison() only checks the head page so go back to using
> PageHWPoison().
> 
> Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Fixes: a6fddef49eef ("mm/memory-failure: convert unpoison_memory() to folios")
> Cc: stable@vger.kernel.org #v6.4
> Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> 
> [1]: https://lore.kernel.org/lkml/ZLIbZygG7LqSI9xe@casper.infradead.org/
> ---
>   mm/memory-failure.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 02b1d8f104d51..a114c8c3039cd 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2523,7 +2523,7 @@ int unpoison_memory(unsigned long pfn)
>   		goto unlock_mutex;
>   	}
>   
> -	if (!folio_test_hwpoison(folio)) {
> +	if (!PageHWPoison(p)) {
>   		unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n",
>   				 pfn, &unpoison_rs);
>   		goto unlock_mutex;

Would it be worth the trouble to create folio_page_test_##lname(folio,
index) macros to address folio subpages?

thanks!
-jane
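Jane's suggestion could be sketched roughly as below. This is a hypothetical userspace model: FOLIO_PAGE_TEST_FLAG, the generated folio_page_test_hwpoison() and folio_page_test_dirty() helpers, and the MODEL_PG_* bit numbers are invented for illustration and are not an existing kernel API; model_folio_page() only mimics the idea behind the kernel's folio_page(folio, n).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit numbers for a per-page flags word. */
enum model_pageflags { MODEL_PG_hwpoison = 0, MODEL_PG_dirty = 1 };

struct page { unsigned long flags; };
struct folio { struct page *pages; size_t nr_pages; };

/* Return the n-th page of the folio, in the spirit of folio_page(). */
static inline struct page *model_folio_page(struct folio *folio, size_t n)
{
	assert(n < folio->nr_pages);
	return &folio->pages[n];
}

/*
 * Hypothetical generator: folio_page_test_<lname>(folio, n) tests a flag
 * on the n-th subpage instead of on the head page.
 */
#define FOLIO_PAGE_TEST_FLAG(lname, bit)                                  \
static inline bool folio_page_test_##lname(struct folio *folio, size_t n) \
{                                                                         \
	return (model_folio_page(folio, n)->flags & (1UL << (bit))) != 0; \
}

FOLIO_PAGE_TEST_FLAG(hwpoison, MODEL_PG_hwpoison)
FOLIO_PAGE_TEST_FLAG(dirty, MODEL_PG_dirty)
```

The design choice is simply that the caller must name the subpage index explicitly, so a test cannot silently fall back to reading the head page.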
  
Naoya Horiguchi July 18, 2023, 12:14 a.m. UTC | #3
On Mon, Jul 17, 2023 at 11:18:12AM -0700, Sidhartha Kumar wrote:
> It was pointed out[1] that using folio_test_hwpoison() is wrong
> as we need to check the individual page that has poison.
> folio_test_hwpoison() only checks the head page so go back to using
> PageHWPoison().
> 
> Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Fixes: a6fddef49eef ("mm/memory-failure: convert unpoison_memory() to folios")
> Cc: stable@vger.kernel.org #v6.4
> Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> 
> [1]: https://lore.kernel.org/lkml/ZLIbZygG7LqSI9xe@casper.infradead.org/
> ---
>  mm/memory-failure.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 02b1d8f104d51..a114c8c3039cd 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2523,7 +2523,7 @@ int unpoison_memory(unsigned long pfn)
>  		goto unlock_mutex;
>  	}
>  
> -	if (!folio_test_hwpoison(folio)) {
> +	if (!PageHWPoison(p)) {


I don't think this works for hwpoisoned hugetlb pages that have PageHWPoison
set on the head page, rather than on the raw subpage. In the case of
hwpoisoned thps, PageHWPoison is set on the raw subpage, not on the head
pages.  (I believe this is not detected because no one considers the
scenario of unpoisoning hwpoisoned thps, which is a rare case).  Perhaps the
function is_page_hwpoison() would be useful for this purpose?

Thanks,
Naoya Horiguchi

>  		unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n",
>  				 pfn, &unpoison_rs);
>  		goto unlock_mutex;
> -- 
> 2.41.0
> 
> 
>
  
Naoya Horiguchi July 18, 2023, 12:39 a.m. UTC | #4
On Tue, Jul 18, 2023 at 09:14:09AM +0900, Naoya Horiguchi wrote:
> On Mon, Jul 17, 2023 at 11:18:12AM -0700, Sidhartha Kumar wrote:
> > It was pointed out[1] that using folio_test_hwpoison() is wrong
> > as we need to check the individual page that has poison.
> > folio_test_hwpoison() only checks the head page so go back to using
> > PageHWPoison().
> > 
> > Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > Fixes: a6fddef49eef ("mm/memory-failure: convert unpoison_memory() to folios")
> > Cc: stable@vger.kernel.org #v6.4
> > Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> > 
> > [1]: https://lore.kernel.org/lkml/ZLIbZygG7LqSI9xe@casper.infradead.org/
> > ---
> >  mm/memory-failure.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index 02b1d8f104d51..a114c8c3039cd 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -2523,7 +2523,7 @@ int unpoison_memory(unsigned long pfn)
> >  		goto unlock_mutex;
> >  	}
> >  
> > -	if (!folio_test_hwpoison(folio)) {
> > +	if (!PageHWPoison(p)) {
> 
> 
> I don't think this works for hwpoisoned hugetlb pages that have PageHWPoison
> set on the head page, rather than on the raw subpage. In the case of
> hwpoisoned thps, PageHWPoison is set on the raw subpage, not on the head
> pages.  (I believe this is not detected because no one considers the
> scenario of unpoisoning hwpoisoned thps, which is a rare case).  Perhaps the
> function is_page_hwpoison() would be useful for this purpose?

Sorry, I was wrong.  Checking PageHWPoison() is fine because the users of
unpoison should know where the PageHWPoison is set via /proc/kpageflags.
So this patch is OK to me after comments from other reviewers are resolved.

Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

Thanks,
Naoya Horiguchi
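Naoya's point is that whoever unpoisons a page can first look up where PG_hwpoison actually sits. A minimal sketch of that lookup from userspace, assuming the documented /proc/kpageflags layout (one 64-bit flag word per PFN, at offset pfn * 8) and the KPF_HWPOISON bit number (19) from include/uapi/linux/kernel-page-flags.h. Reading the real file normally requires CAP_SYS_ADMIN; the function names here are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Bit number from include/uapi/linux/kernel-page-flags.h. */
#define KPF_HWPOISON 19

/* True if the kpageflags word has KPF_HWPOISON set. */
static inline bool kpf_has_hwpoison(uint64_t flags)
{
	return (flags >> KPF_HWPOISON) & 1;
}

/*
 * Read the 64-bit flag word for one PFN from a kpageflags-style file.
 * Returns 0 on success and -1 on any error (open failure, short read).
 */
static int read_kpageflags(const char *path, uint64_t pfn, uint64_t *flags)
{
	assert(flags != NULL);
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	ssize_t n = pread(fd, flags, sizeof(*flags),
			  (off_t)(pfn * sizeof(*flags)));
	close(fd);
	return n == (ssize_t)sizeof(*flags) ? 0 : -1;
}
```

For a THP, running this over each PFN in its range shows which subpage carries KPF_HWPOISON, and that is the PFN the unpoison request should name.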
  
Sidhartha Kumar July 18, 2023, 2:25 p.m. UTC | #5
On 7/17/23 12:00 PM, Andrew Morton wrote:
> On Mon, 17 Jul 2023 11:18:12 -0700 Sidhartha Kumar <sidhartha.kumar@oracle.com> wrote:
> 
>> It was pointed out[1] that using folio_test_hwpoison() is wrong
>> as we need to check the individual page that has poison.
>> folio_test_hwpoison() only checks the head page so go back to using
>> PageHWPoison().
> 
> Please describe the user-visible effects of the bug, especially
> when proposing a -stable backport.

User-visible effects include existing hwpoison-inject tests possibly
failing, as unpoisoning a single subpage could lead to unpoisoning an
entire folio. Memory unpoisoning may also not work as expected, as the
function bails out early because it checks only the head page and not
the actually poisoned subpage.
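The early bail-out described above can be modeled in a few lines. This is a hypothetical sketch, not the body of unpoison_memory(): check_before_fix() mimics the head-page test introduced by the folio conversion, check_after_fix() mimics the PageHWPoison(p) test restored by this patch, and the enum names are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model page: head pointer plus a stand-in PG_hwpoison bit. */
struct page {
	struct page *head;
	bool hwpoison;
};

enum model_unpoison_step {
	MODEL_ALREADY_UNPOISONED, /* would log "Page was already unpoisoned" and bail */
	MODEL_CONTINUE,           /* would go on to actually clear the poison */
};

/* Before the fix: the decision depends only on the head page. */
static enum model_unpoison_step check_before_fix(const struct page *p)
{
	assert(p != NULL && p->head != NULL);
	return p->head->hwpoison ? MODEL_CONTINUE : MODEL_ALREADY_UNPOISONED;
}

/* After the fix: the decision depends on the page the PFN refers to. */
static enum model_unpoison_step check_after_fix(const struct page *p)
{
	assert(p != NULL);
	return p->hwpoison ? MODEL_CONTINUE : MODEL_ALREADY_UNPOISONED;
}
```

For a THP-like folio poisoned only on a tail page, the old check reports the page as already unpoisoned, so the poison is never cleared; the new check proceeds.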
  
Sidhartha Kumar July 18, 2023, 2:30 p.m. UTC | #6
On 7/17/23 5:39 PM, Naoya Horiguchi wrote:
> On Tue, Jul 18, 2023 at 09:14:09AM +0900, Naoya Horiguchi wrote:
>> On Mon, Jul 17, 2023 at 11:18:12AM -0700, Sidhartha Kumar wrote:
>>> It was pointed out[1] that using folio_test_hwpoison() is wrong
>>> as we need to check the individual page that has poison.
>>> folio_test_hwpoison() only checks the head page so go back to using
>>> PageHWPoison().
>>>
>>> Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>>> Fixes: a6fddef49eef ("mm/memory-failure: convert unpoison_memory() to folios")
>>> Cc: stable@vger.kernel.org #v6.4
>>> Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
>>>
>>> [1]: https://lore.kernel.org/lkml/ZLIbZygG7LqSI9xe@casper.infradead.org/
>>> ---
>>>   mm/memory-failure.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index 02b1d8f104d51..a114c8c3039cd 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -2523,7 +2523,7 @@ int unpoison_memory(unsigned long pfn)
>>>   		goto unlock_mutex;
>>>   	}
>>>   
>>> -	if (!folio_test_hwpoison(folio)) {
>>> +	if (!PageHWPoison(p)) {
>>
>>
>> I don't think this works for hwpoisoned hugetlb pages that have PageHWPoison
>> set on the head page, rather than on the raw subpage. In the case of
>> hwpoisoned thps, PageHWPoison is set on the raw subpage, not on the head
>> pages.  (I believe this is not detected because no one considers the
>> scenario of unpoisoning hwpoisoned thps, which is a rare case).  Perhaps the
>> function is_page_hwpoison() would be useful for this purpose?
> 
> Sorry, I was wrong.  Checking PageHWPoison() is fine because the users of
> unpoison should know where the PageHWPoison is set via /proc/kpageflags.
> So this patch is OK to me after comments from other reviewers are resolved.
> 

Hi Naoya,

While taking a closer look at the patch, later in unpoison_memory() there is
also:

-               ret = TestClearPageHWPoison(page) ? 0 : -EBUSY;
+               ret = folio_test_clear_hwpoison(folio) ? 0 : -EBUSY;

I thought this folio conversion would be safe because page is the result 
of a compound_head() call but I'm wondering if the same issue exists 
here and we should be calling TestClearPageHWPoison() on the specific 
subpage by doing TestClearPageHWPoison(p).

Thanks,
Sidhartha Kumar

> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> Thanks,
> Naoya Horiguchi
>
  
Naoya Horiguchi July 18, 2023, 11:59 p.m. UTC | #7
On Tue, Jul 18, 2023 at 07:30:23AM -0700, Sidhartha Kumar wrote:
> On 7/17/23 5:39 PM, Naoya Horiguchi wrote:
> > On Tue, Jul 18, 2023 at 09:14:09AM +0900, Naoya Horiguchi wrote:
> > > On Mon, Jul 17, 2023 at 11:18:12AM -0700, Sidhartha Kumar wrote:
> > > > It was pointed out[1] that using folio_test_hwpoison() is wrong
> > > > as we need to check the individual page that has poison.
> > > > folio_test_hwpoison() only checks the head page so go back to using
> > > > PageHWPoison().
> > > > 
> > > > Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > > Fixes: a6fddef49eef ("mm/memory-failure: convert unpoison_memory() to folios")
> > > > Cc: stable@vger.kernel.org #v6.4
> > > > Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> > > > 
> > > > [1]: https://lore.kernel.org/lkml/ZLIbZygG7LqSI9xe@casper.infradead.org/
> > > > ---
> > > >   mm/memory-failure.c | 2 +-
> > > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > > > index 02b1d8f104d51..a114c8c3039cd 100644
> > > > --- a/mm/memory-failure.c
> > > > +++ b/mm/memory-failure.c
> > > > @@ -2523,7 +2523,7 @@ int unpoison_memory(unsigned long pfn)
> > > >   		goto unlock_mutex;
> > > >   	}
> > > > -	if (!folio_test_hwpoison(folio)) {
> > > > +	if (!PageHWPoison(p)) {
> > > 
> > > 
> > > I don't think this works for hwpoisoned hugetlb pages that have PageHWPoison
> > > set on the head page, rather than on the raw subpage. In the case of
> > > hwpoisoned thps, PageHWPoison is set on the raw subpage, not on the head
> > > pages.  (I believe this is not detected because no one considers the
> > > scenario of unpoisoning hwpoisoned thps, which is a rare case).  Perhaps the
> > > function is_page_hwpoison() would be useful for this purpose?
> > 
> > Sorry, I was wrong.  Checking PageHWPoison() is fine because the users of
> > unpoison should know where the PageHWPoison is set via /proc/kpageflags.
> > So this patch is OK to me after comments from other reviewers are resolved.
> > 
> 
> Hi Naoya,
> 
> While taking a closer look at the patch, later in unpoison_memory() there is
> also:
> 
> -               ret = TestClearPageHWPoison(page) ? 0 : -EBUSY;
> +               ret = folio_test_clear_hwpoison(folio) ? 0 : -EBUSY;
> 
> I thought this folio conversion would be safe because page is the result of
> a compound_head() call but I'm wondering if the same issue exists here and
> we should be calling TestClearPageHWPoison() on the specific subpage by
> doing TestClearPageHWPoison(p).

In this case (get_hwpoison_page() returns 0), the target of unpoison_memory() was
a buddy page or a free huge page, so there does not seem to be any realistic problem.
But switching back to TestClearPageHWPoison() looks more consistent, so I'm fine with it.

Thanks,
Naoya Horiguchi
  
Miaohe Lin July 20, 2023, 9:06 a.m. UTC | #8
On 2023/7/18 2:18, Sidhartha Kumar wrote:
> It was pointed out[1] that using folio_test_hwpoison() is wrong
> as we need to check the individual page that has poison.
> folio_test_hwpoison() only checks the head page so go back to using
> PageHWPoison().
> 
> Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Fixes: a6fddef49eef ("mm/memory-failure: convert unpoison_memory() to folios")
> Cc: stable@vger.kernel.org #v6.4
> Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> 
> [1]: https://lore.kernel.org/lkml/ZLIbZygG7LqSI9xe@casper.infradead.org/
> ---
>  mm/memory-failure.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 02b1d8f104d51..a114c8c3039cd 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2523,7 +2523,7 @@ int unpoison_memory(unsigned long pfn)
>  		goto unlock_mutex;
>  	}
>  
> -	if (!folio_test_hwpoison(folio)) {
> +	if (!PageHWPoison(p)) {

Successfully handled pages should be non-compound pages (dissolved, split or normal pages),
so this patch makes no change for them. But for THP and hugetlb pages whose hwpoison handling failed,
there is some difference. Since Naoya points out that "the users of unpoison should know where the
PageHWPoison is set via /proc/kpageflags", I'm fine with this patch.

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>

Thanks.
  

Patch

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 02b1d8f104d51..a114c8c3039cd 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2523,7 +2523,7 @@ int unpoison_memory(unsigned long pfn)
 		goto unlock_mutex;
 	}
 
-	if (!folio_test_hwpoison(folio)) {
+	if (!PageHWPoison(p)) {
 		unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n",
 				 pfn, &unpoison_rs);
 		goto unlock_mutex;