[1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount

Message ID 20230125015738.912924-1-zokeefe@google.com
State New
Headers
Series [1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount |

Commit Message

Zach O'Keefe Jan. 25, 2023, 1:57 a.m. UTC
  During collapse, in a few places we check to see if a given small page
has any unaccounted references.  If the refcount on the page doesn't
match our expectations, it must be there is an unknown user concurrently
interested in the page, and so it's not safe to move the contents
elsewhere. However, the unaccounted pins are likely an ephemeral state.

In such a situation, make MADV_COLLAPSE set EAGAIN errno, indicating that
collapse may succeed on retry.

Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Zach O'Keefe <zokeefe@google.com>

---
 mm/khugepaged.c | 1 +
 1 file changed, 1 insertion(+)
  

Comments

Yang Shi Jan. 25, 2023, 6:06 p.m. UTC | #1
On Tue, Jan 24, 2023 at 5:58 PM Zach O'Keefe <zokeefe@google.com> wrote:
>
> During collapse, in a few places we check to see if a given small page
> has any unaccounted references.  If the refcount on the page doesn't
> match our expectations, it must be there is an unknown user concurrently
> interested in the page, and so it's not safe to move the contents
> elsewhere. However, the unaccounted pins are likely an ephemeral state.
>
> In such a situation, make MADV_COLLAPSE set EAGAIN errno, indicating that
> collapse may succeed on retry.

The page may be DMA pinned (for example, pin_user_pages()), it is not
worth retrying for such pages. But it may also not be worth optimizing
for this case at this point.

So the patch looks good to me. Reviewed-by: Yang Shi <shy828301@gmail.com>

>
> Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
> Reported-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
>
> ---
>  mm/khugepaged.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index e23619bfecc4..fa38cae240b9 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2712,6 +2712,7 @@ static int madvise_collapse_errno(enum scan_result r)
>         case SCAN_CGROUP_CHARGE_FAIL:
>                 return -EBUSY;
>         /* Resource temporary unavailable - trying again might succeed */
> +       case SCAN_PAGE_COUNT:
>         case SCAN_PAGE_LOCK:
>         case SCAN_PAGE_LRU:
>         case SCAN_DEL_PAGE_LRU:
> --
> 2.39.1.405.gd4c25cc71f-goog
>
  
Zach O'Keefe Jan. 25, 2023, 7:15 p.m. UTC | #2
On Wed, Jan 25, 2023 at 10:06 AM Yang Shi <shy828301@gmail.com> wrote:
>
> On Tue, Jan 24, 2023 at 5:58 PM Zach O'Keefe <zokeefe@google.com> wrote:
> >
> > During collapse, in a few places we check to see if a given small page
> > has any unaccounted references.  If the refcount on the page doesn't
> > match our expectations, it must be there is an unknown user concurrently
> > interested in the page, and so it's not safe to move the contents
> > elsewhere. However, the unaccounted pins are likely an ephemeral state.
> >
> > In such a situation, make MADV_COLLAPSE set EAGAIN errno, indicating that
> > collapse may succeed on retry.
>
> The page may be DMA pinned (for example, pin_user_pages()), it is not
> worth retrying for such pages. But it may also not be worth optimizing
> for this case at this point.
>
> So the patch looks good to me. Reviewed-by: Yang Shi <shy828301@gmail.com>

Thanks as always, Yang, and good point about DMA pinning. As you
mentioned, I don't know if it's worth considering that too much right
now, as it's unlikely these two uses (MADV_COLLAPSE and DMA pining)
would be used together. We can revisit if necessary later if it's an
issue, but for now, I think it's a win that MADV_COLLAPSE (+ a bounded
userspace retry loop based off erno) is more likely to succeed.
  
Hugh Dickins Feb. 9, 2023, 5:09 a.m. UTC | #3
On Tue, 24 Jan 2023, Zach O'Keefe wrote:

> During collapse, in a few places we check to see if a given small page
> has any unaccounted references.  If the refcount on the page doesn't
> match our expectations, it must be there is an unknown user concurrently
> interested in the page, and so it's not safe to move the contents
> elsewhere. However, the unaccounted pins are likely an ephemeral state.
> 
> In such a situation, make MADV_COLLAPSE set EAGAIN errno, indicating that
> collapse may succeed on retry.
> 
> Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
> Reported-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>

This was
Reviewed-by: Yang Shi <shy828301@gmail.com>
and now I'll give it a nudge with
Acked-by: Hugh Dickins <hughd@google.com>
since it hasn't appeared in mm-unstable or linux-next yet:
I think its Cc:stable sibling 2/2, already in 6.2-rc, got all the attention.

Thanks!
Hugh

> 
> ---
>  mm/khugepaged.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index e23619bfecc4..fa38cae240b9 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2712,6 +2712,7 @@ static int madvise_collapse_errno(enum scan_result r)
>  	case SCAN_CGROUP_CHARGE_FAIL:
>  		return -EBUSY;
>  	/* Resource temporary unavailable - trying again might succeed */
> +	case SCAN_PAGE_COUNT:
>  	case SCAN_PAGE_LOCK:
>  	case SCAN_PAGE_LRU:
>  	case SCAN_DEL_PAGE_LRU:
> -- 
> 2.39.1.405.gd4c25cc71f-goog
  
Andrew Morton Feb. 9, 2023, 9:28 p.m. UTC | #4
On Wed, 8 Feb 2023 21:09:04 -0800 (PST) Hugh Dickins <hughd@google.com> wrote:

> On Tue, 24 Jan 2023, Zach O'Keefe wrote:
> 
> > During collapse, in a few places we check to see if a given small page
> > has any unaccounted references.  If the refcount on the page doesn't
> > match our expectations, it must be there is an unknown user concurrently
> > interested in the page, and so it's not safe to move the contents
> > elsewhere. However, the unaccounted pins are likely an ephemeral state.
> > 
> > In such a situation, make MADV_COLLAPSE set EAGAIN errno, indicating that
> > collapse may succeed on retry.
> > 
> > Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
> > Reported-by: Hugh Dickins <hughd@google.com>
> > Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> 
> This was
> Reviewed-by: Yang Shi <shy828301@gmail.com>
> and now I'll give it a nudge with
> Acked-by: Hugh Dickins <hughd@google.com>
> since it hasn't appeared in mm-unstable or linux-next yet:

Buildbot failed on [2/2] so I skipped the whole series in expectation
of a v2 series, which didn't happen.  Instead, Zach trickily sent what
was [2/2] as a standalone patch.  So [1/2] got lost.  Sigh, poor me.

Thanks, I'll merge [1/2] into mm-hotfixes.

> I think its Cc:stable sibling 2/2, already in 6.2-rc, got all the attention.

I'm not seeing anything in the [1/2] changelog which indicates that a
backport is needed.  IOW,

# cat .signature
When fixing a bug, please describe the end-user visible effects of that bug.
  
Hugh Dickins Feb. 9, 2023, 9:50 p.m. UTC | #5
On Thu, 9 Feb 2023, Andrew Morton wrote:
> 
> Thanks, I'll merge [1/2] into mm-hotfixes.

Great, thanks.

> 
> I'm not seeing anything in the [1/2] changelog which indicates that a
> backport is needed.  IOW,

Correct: it's just changing the errno for some racy cases from "you're
wrong, don't bother me again" to "it might be worth having another go":
not fixing an instability, as 2/2 was.

> 
> # cat .signature
> When fixing a bug, please describe the end-user visible effects of that bug.

If whatever's being run by the end-user is coded to try again on -EAGAIN,
then the end-user will less often see occasional unexplained failures.

Hugh
  
Andrew Morton Feb. 9, 2023, 10:12 p.m. UTC | #6
On Thu, 9 Feb 2023 13:50:30 -0800 (PST) Hugh Dickins <hughd@google.com> wrote:

> 
> > 
> > I'm not seeing anything in the [1/2] changelog which indicates that a
> > backport is needed.  IOW,
> 
> Correct: it's just changing the errno for some racy cases from "you're
> wrong, don't bother me again" to "it might be worth having another go":
> not fixing an instability, as 2/2 was.
> 
> > 
> > # cat .signature
> > When fixing a bug, please describe the end-user visible effects of that bug.
> 
> If whatever's being run by the end-user is coded to try again on -EAGAIN,
> then the end-user will less often see occasional unexplained failures.
> 

OK, thanks.  I redid the changelog's final paragraph thusly:

: In this situation, MADV_COLLAPSE returns -EINVAL when it should return
: -EAGAIN.  This could cause userspace to conclude that the syscall failed,
: when it in fact could succeed by retrying.
  
Zach O'Keefe Feb. 9, 2023, 10:29 p.m. UTC | #7
On Thu, Feb 9, 2023 at 2:12 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Thu, 9 Feb 2023 13:50:30 -0800 (PST) Hugh Dickins <hughd@google.com> wrote:
>
> >
> > >
> > > I'm not seeing anything in the [1/2] changelog which indicates that a
> > > backport is needed.  IOW,
> >
> > Correct: it's just changing the errno for some racy cases from "you're
> > wrong, don't bother me again" to "it might be worth having another go":
> > not fixing an instability, as 2/2 was.
> >
> > >
> > > # cat .signature
> > > When fixing a bug, please describe the end-user visible effects of that bug.
> >
> > If whatever's being run by the end-user is coded to try again on -EAGAIN,
> > then the end-user will less often see occasional unexplained failures.
> >
>
> OK, thanks.  I redid the changelog's final paragraph thusly:
>
> : In this situation, MADV_COLLAPSE returns -EINVAL when it should return
> : -EAGAIN.  This could cause userspace to conclude that the syscall failed,
> : when it in fact could succeed by retrying.
>

This looks good to me. Thanks Andrew! Also thanks Hugh for being on
the lookout for this patch -- I hastily read through my emails
regarding which patches were merged where and had assumed this merged
with 2/2.

Also, apologies about the confusing v1 [1/2] and v2 [2/2] fiasco; in
hindsight that probably wasn't the most decipherable thing to do :)

Best,
Zach
  

Patch

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index e23619bfecc4..fa38cae240b9 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2712,6 +2712,7 @@  static int madvise_collapse_errno(enum scan_result r)
 	case SCAN_CGROUP_CHARGE_FAIL:
 		return -EBUSY;
 	/* Resource temporary unavailable - trying again might succeed */
+	case SCAN_PAGE_COUNT:
 	case SCAN_PAGE_LOCK:
 	case SCAN_PAGE_LRU:
 	case SCAN_DEL_PAGE_LRU: