[6.0,19/20] mm/huge_memory: do not clobber swp_entry_t during THP split

Message ID 20221024112935.173960536@linuxfoundation.org
State New
Headers
Series None |

Commit Message

Greg KH Oct. 24, 2022, 11:31 a.m. UTC
  From: Mel Gorman <mgorman@techsingularity.net>

commit 71e2d666ef85d51834d658830f823560c402b8b6 upstream.

The following has been observed when running stressng mmap since commit
b653db77350c ("mm: Clear page->private when splitting or migrating a page")

   watchdog: BUG: soft lockup - CPU#75 stuck for 26s! [stress-ng:9546]
   CPU: 75 PID: 9546 Comm: stress-ng Tainted: G            E      6.0.0-revert-b653db77-fix+ #29 0357d79b60fb09775f678e4f3f64ef0579ad1374
   Hardware name: SGI.COM C2112-4GP3/X10DRT-P-Series, BIOS 2.0a 05/09/2016
   RIP: 0010:xas_descend+0x28/0x80
   Code: cc cc 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 <48> 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc 48 c1 e8 02 89 c2
   RSP: 0018:ffffbbf02a2236a8 EFLAGS: 00000246
   RAX: ffff9cab7d6a0002 RBX: ffffe04b0af88040 RCX: 0000000000000002
   RDX: 0000000000000030 RSI: ffff9cab60509b60 RDI: ffffbbf02a2236c0
   RBP: 0000000000000000 R08: ffff9cab60509b60 R09: ffffbbf02a2236c0
   R10: 0000000000000001 R11: ffffbbf02a223698 R12: 0000000000000000
   R13: ffff9cab4e28da80 R14: 0000000000039c01 R15: ffff9cab4e28da88
   FS:  00007fab89b85e40(0000) GS:ffff9cea3fcc0000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   CR2: 00007fab84e00000 CR3: 00000040b73a4003 CR4: 00000000003706e0
   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
   Call Trace:
    <TASK>
    xas_load+0x3a/0x50
    __filemap_get_folio+0x80/0x370
    ? put_swap_page+0x163/0x360
    pagecache_get_page+0x13/0x90
    __try_to_reclaim_swap+0x50/0x190
    scan_swap_map_slots+0x31e/0x670
    get_swap_pages+0x226/0x3c0
    folio_alloc_swap+0x1cc/0x240
    add_to_swap+0x14/0x70
    shrink_page_list+0x968/0xbc0
    reclaim_page_list+0x70/0xf0
    reclaim_pages+0xdd/0x120
    madvise_cold_or_pageout_pte_range+0x814/0xf30
    walk_pgd_range+0x637/0xa30
    __walk_page_range+0x142/0x170
    walk_page_range+0x146/0x170
    madvise_pageout+0xb7/0x280
    ? asm_common_interrupt+0x22/0x40
    madvise_vma_behavior+0x3b7/0xac0
    ? find_vma+0x4a/0x70
    ? find_vma+0x64/0x70
    ? madvise_vma_anon_name+0x40/0x40
    madvise_walk_vmas+0xa6/0x130
    do_madvise+0x2f4/0x360
    __x64_sys_madvise+0x26/0x30
    do_syscall_64+0x5b/0x80
    ? do_syscall_64+0x67/0x80
    ? syscall_exit_to_user_mode+0x17/0x40
    ? do_syscall_64+0x67/0x80
    ? syscall_exit_to_user_mode+0x17/0x40
    ? do_syscall_64+0x67/0x80
    ? do_syscall_64+0x67/0x80
    ? common_interrupt+0x8b/0xa0
    entry_SYSCALL_64_after_hwframe+0x63/0xcd

The problem can be reproduced with the mmtests config
config-workload-stressng-mmap.  It does not always happen and when it
triggers is variable but it has happened on multiple machines.

The intent of commit b653db77350c patch was to avoid the case where
PG_private is clear but folio->private is not-NULL.  However, THP tail
pages uses page->private for "swp_entry_t if folio_test_swapcache()" as
stated in the documentation for struct folio.  This patch only clobbers
page->private for tail pages if the head page was not in swapcache and
warns once if page->private had an unexpected value.

Link: https://lkml.kernel.org/r/20221019134156.zjyyn5aownakvztf@techsingularity.net
Fixes: b653db77350c ("mm: Clear page->private when splitting or migrating a page")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Brian Foster <bfoster@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/huge_memory.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)
  

Comments

Hugh Dickins Oct. 25, 2022, 3:11 p.m. UTC | #1
On Mon, 24 Oct 2022, Greg Kroah-Hartman wrote:
> From: Mel Gorman <mgorman@techsingularity.net>
> 
> commit 71e2d666ef85d51834d658830f823560c402b8b6 upstream.
> 
> The following has been observed when running stressng mmap since commit
> b653db77350c ("mm: Clear page->private when splitting or migrating a page")
> 
>    watchdog: BUG: soft lockup - CPU#75 stuck for 26s! [stress-ng:9546]
>    CPU: 75 PID: 9546 Comm: stress-ng Tainted: G            E      6.0.0-revert-b653db77-fix+ #29 0357d79b60fb09775f678e4f3f64ef0579ad1374
>    Hardware name: SGI.COM C2112-4GP3/X10DRT-P-Series, BIOS 2.0a 05/09/2016
>    RIP: 0010:xas_descend+0x28/0x80
>    Code: cc cc 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 <48> 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc 48 c1 e8 02 89 c2
>    RSP: 0018:ffffbbf02a2236a8 EFLAGS: 00000246
>    RAX: ffff9cab7d6a0002 RBX: ffffe04b0af88040 RCX: 0000000000000002
>    RDX: 0000000000000030 RSI: ffff9cab60509b60 RDI: ffffbbf02a2236c0
>    RBP: 0000000000000000 R08: ffff9cab60509b60 R09: ffffbbf02a2236c0
>    R10: 0000000000000001 R11: ffffbbf02a223698 R12: 0000000000000000
>    R13: ffff9cab4e28da80 R14: 0000000000039c01 R15: ffff9cab4e28da88
>    FS:  00007fab89b85e40(0000) GS:ffff9cea3fcc0000(0000) knlGS:0000000000000000
>    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>    CR2: 00007fab84e00000 CR3: 00000040b73a4003 CR4: 00000000003706e0
>    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>    Call Trace:
>     <TASK>
>     xas_load+0x3a/0x50
>     __filemap_get_folio+0x80/0x370
>     ? put_swap_page+0x163/0x360
>     pagecache_get_page+0x13/0x90
>     __try_to_reclaim_swap+0x50/0x190
>     scan_swap_map_slots+0x31e/0x670
>     get_swap_pages+0x226/0x3c0
>     folio_alloc_swap+0x1cc/0x240
>     add_to_swap+0x14/0x70
>     shrink_page_list+0x968/0xbc0
>     reclaim_page_list+0x70/0xf0
>     reclaim_pages+0xdd/0x120
>     madvise_cold_or_pageout_pte_range+0x814/0xf30
>     walk_pgd_range+0x637/0xa30
>     __walk_page_range+0x142/0x170
>     walk_page_range+0x146/0x170
>     madvise_pageout+0xb7/0x280
>     ? asm_common_interrupt+0x22/0x40
>     madvise_vma_behavior+0x3b7/0xac0
>     ? find_vma+0x4a/0x70
>     ? find_vma+0x64/0x70
>     ? madvise_vma_anon_name+0x40/0x40
>     madvise_walk_vmas+0xa6/0x130
>     do_madvise+0x2f4/0x360
>     __x64_sys_madvise+0x26/0x30
>     do_syscall_64+0x5b/0x80
>     ? do_syscall_64+0x67/0x80
>     ? syscall_exit_to_user_mode+0x17/0x40
>     ? do_syscall_64+0x67/0x80
>     ? syscall_exit_to_user_mode+0x17/0x40
>     ? do_syscall_64+0x67/0x80
>     ? do_syscall_64+0x67/0x80
>     ? common_interrupt+0x8b/0xa0
>     entry_SYSCALL_64_after_hwframe+0x63/0xcd
> 
> The problem can be reproduced with the mmtests config
> config-workload-stressng-mmap.  It does not always happen and when it
> triggers is variable but it has happened on multiple machines.
> 
> The intent of commit b653db77350c patch was to avoid the case where
> PG_private is clear but folio->private is not-NULL.  However, THP tail
> pages uses page->private for "swp_entry_t if folio_test_swapcache()" as
> stated in the documentation for struct folio.  This patch only clobbers
> page->private for tail pages if the head page was not in swapcache and
> warns once if page->private had an unexpected value.
> 
> Link: https://lkml.kernel.org/r/20221019134156.zjyyn5aownakvztf@techsingularity.net
> Fixes: b653db77350c ("mm: Clear page->private when splitting or migrating a page")
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Yang Shi <shy828301@gmail.com>
> Cc: Brian Foster <bfoster@redhat.com>
> Cc: Dan Streetman <ddstreet@ieee.org>
> Cc: Miaohe Lin <linmiaohe@huawei.com>
> Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
> Cc: Seth Jennings <sjenning@redhat.com>
> Cc: Vitaly Wool <vitaly.wool@konsulko.com>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Greg, this patch from Mel is important,
but introduces a warning which is giving Tvrtko trouble - see linux-mm
https://lore.kernel.org/linux-mm/1596edbb-02ad-6bdf-51b8-15c2d2e08b76@linux.intel.com/

We already have the fix for the warning, it's making its way through the
system, and is marked for stable, but it has not reached Linus's tree yet.

Please drop this 19/20 from 6.0.4, then I'll reply to this to let you know
when the fix does reach Linus's tree - hopefully the two can go together
in the next 6.0-stable.

I apologize for not writing yesterday: gmail had gathered together
different threads with the same subject, I thought you and stable
were Cc'ed on the linux-mm mail and you would immediately drop it
yourself, but in fact you were not on that thread at all.

Thanks,
Hugh
  
Greg KH Oct. 25, 2022, 3:58 p.m. UTC | #2
On Tue, Oct 25, 2022 at 08:11:44AM -0700, Hugh Dickins wrote:
> On Mon, 24 Oct 2022, Greg Kroah-Hartman wrote:
> > From: Mel Gorman <mgorman@techsingularity.net>
> > 
> > commit 71e2d666ef85d51834d658830f823560c402b8b6 upstream.
> > 
> > The following has been observed when running stressng mmap since commit
> > b653db77350c ("mm: Clear page->private when splitting or migrating a page")
> > 
> >    watchdog: BUG: soft lockup - CPU#75 stuck for 26s! [stress-ng:9546]
> >    CPU: 75 PID: 9546 Comm: stress-ng Tainted: G            E      6.0.0-revert-b653db77-fix+ #29 0357d79b60fb09775f678e4f3f64ef0579ad1374
> >    Hardware name: SGI.COM C2112-4GP3/X10DRT-P-Series, BIOS 2.0a 05/09/2016
> >    RIP: 0010:xas_descend+0x28/0x80
> >    Code: cc cc 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 <48> 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc 48 c1 e8 02 89 c2
> >    RSP: 0018:ffffbbf02a2236a8 EFLAGS: 00000246
> >    RAX: ffff9cab7d6a0002 RBX: ffffe04b0af88040 RCX: 0000000000000002
> >    RDX: 0000000000000030 RSI: ffff9cab60509b60 RDI: ffffbbf02a2236c0
> >    RBP: 0000000000000000 R08: ffff9cab60509b60 R09: ffffbbf02a2236c0
> >    R10: 0000000000000001 R11: ffffbbf02a223698 R12: 0000000000000000
> >    R13: ffff9cab4e28da80 R14: 0000000000039c01 R15: ffff9cab4e28da88
> >    FS:  00007fab89b85e40(0000) GS:ffff9cea3fcc0000(0000) knlGS:0000000000000000
> >    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >    CR2: 00007fab84e00000 CR3: 00000040b73a4003 CR4: 00000000003706e0
> >    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >    Call Trace:
> >     <TASK>
> >     xas_load+0x3a/0x50
> >     __filemap_get_folio+0x80/0x370
> >     ? put_swap_page+0x163/0x360
> >     pagecache_get_page+0x13/0x90
> >     __try_to_reclaim_swap+0x50/0x190
> >     scan_swap_map_slots+0x31e/0x670
> >     get_swap_pages+0x226/0x3c0
> >     folio_alloc_swap+0x1cc/0x240
> >     add_to_swap+0x14/0x70
> >     shrink_page_list+0x968/0xbc0
> >     reclaim_page_list+0x70/0xf0
> >     reclaim_pages+0xdd/0x120
> >     madvise_cold_or_pageout_pte_range+0x814/0xf30
> >     walk_pgd_range+0x637/0xa30
> >     __walk_page_range+0x142/0x170
> >     walk_page_range+0x146/0x170
> >     madvise_pageout+0xb7/0x280
> >     ? asm_common_interrupt+0x22/0x40
> >     madvise_vma_behavior+0x3b7/0xac0
> >     ? find_vma+0x4a/0x70
> >     ? find_vma+0x64/0x70
> >     ? madvise_vma_anon_name+0x40/0x40
> >     madvise_walk_vmas+0xa6/0x130
> >     do_madvise+0x2f4/0x360
> >     __x64_sys_madvise+0x26/0x30
> >     do_syscall_64+0x5b/0x80
> >     ? do_syscall_64+0x67/0x80
> >     ? syscall_exit_to_user_mode+0x17/0x40
> >     ? do_syscall_64+0x67/0x80
> >     ? syscall_exit_to_user_mode+0x17/0x40
> >     ? do_syscall_64+0x67/0x80
> >     ? do_syscall_64+0x67/0x80
> >     ? common_interrupt+0x8b/0xa0
> >     entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > 
> > The problem can be reproduced with the mmtests config
> > config-workload-stressng-mmap.  It does not always happen and when it
> > triggers is variable but it has happened on multiple machines.
> > 
> > The intent of commit b653db77350c patch was to avoid the case where
> > PG_private is clear but folio->private is not-NULL.  However, THP tail
> > pages uses page->private for "swp_entry_t if folio_test_swapcache()" as
> > stated in the documentation for struct folio.  This patch only clobbers
> > page->private for tail pages if the head page was not in swapcache and
> > warns once if page->private had an unexpected value.
> > 
> > Link: https://lkml.kernel.org/r/20221019134156.zjyyn5aownakvztf@techsingularity.net
> > Fixes: b653db77350c ("mm: Clear page->private when splitting or migrating a page")
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> > Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> > Cc: Mel Gorman <mgorman@techsingularity.net>
> > Cc: Yang Shi <shy828301@gmail.com>
> > Cc: Brian Foster <bfoster@redhat.com>
> > Cc: Dan Streetman <ddstreet@ieee.org>
> > Cc: Miaohe Lin <linmiaohe@huawei.com>
> > Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
> > Cc: Seth Jennings <sjenning@redhat.com>
> > Cc: Vitaly Wool <vitaly.wool@konsulko.com>
> > Cc: <stable@vger.kernel.org>
> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> Greg, this patch from Mel is important,
> but introduces a warning which is giving Tvrtko trouble - see linux-mm
> https://lore.kernel.org/linux-mm/1596edbb-02ad-6bdf-51b8-15c2d2e08b76@linux.intel.com/
> 
> We already have the fix for the warning, it's making its way through the
> system, and is marked for stable, but it has not reached Linus's tree yet.
> 
> Please drop this 19/20 from 6.0.4, then I'll reply to this to let you know
> when the fix does reach Linus's tree - hopefully the two can go together
> in the next 6.0-stable.
> 
> I apologize for not writing yesterday: gmail had gathered together
> different threads with the same subject, I thought you and stable
> were Cc'ed on the linux-mm mail and you would immediately drop it
> yourself, but in fact you were not on that thread at all.

No worries, now dropped, thanks.

greg k-h
  
Hugh Dickins Oct. 30, 2022, 3:33 a.m. UTC | #3
On Tue, 25 Oct 2022, Greg Kroah-Hartman wrote:
> On Tue, Oct 25, 2022 at 08:11:44AM -0700, Hugh Dickins wrote:
> > On Mon, 24 Oct 2022, Greg Kroah-Hartman wrote:
> > > From: Mel Gorman <mgorman@techsingularity.net>
> > > 
> > > commit 71e2d666ef85d51834d658830f823560c402b8b6 upstream.
> > >...
> > 
> > Greg, this patch from Mel is important,
> > but introduces a warning which is giving Tvrtko trouble - see linux-mm
> > https://lore.kernel.org/linux-mm/1596edbb-02ad-6bdf-51b8-15c2d2e08b76@linux.intel.com/
> > 
> > We already have the fix for the warning, it's making its way through the
> > system, and is marked for stable, but it has not reached Linus's tree yet.
> > 
> > Please drop this 19/20 from 6.0.4, then I'll reply to this to let you know
> > when the fix does reach Linus's tree - hopefully the two can go together
> > in the next 6.0-stable.
> > 
> > I apologize for not writing yesterday: gmail had gathered together
> > different threads with the same subject, I thought you and stable
> > were Cc'ed on the linux-mm mail and you would immediately drop it
> > yourself, but in fact you were not on that thread at all.
> 
> No worries, now dropped, thanks.

Thanks Greg.  Linus's tree now contains my fix
5aae9265ee1a ("mm: prep_compound_tail() clear page->private")
to Mel's fix
71e2d666ef85 ("mm/huge_memory: do not clobber swp_entry_t during THP split")
so they can now go on together into 6.0 stable.

They would also have been good in 5.19 stable: but too late now, it's EOL.

Hugh
  
Greg KH Oct. 31, 2022, 6:44 a.m. UTC | #4
On Sat, Oct 29, 2022 at 08:33:01PM -0700, Hugh Dickins wrote:
> On Tue, 25 Oct 2022, Greg Kroah-Hartman wrote:
> > On Tue, Oct 25, 2022 at 08:11:44AM -0700, Hugh Dickins wrote:
> > > On Mon, 24 Oct 2022, Greg Kroah-Hartman wrote:
> > > > From: Mel Gorman <mgorman@techsingularity.net>
> > > > 
> > > > commit 71e2d666ef85d51834d658830f823560c402b8b6 upstream.
> > > >...
> > > 
> > > Greg, this patch from Mel is important,
> > > but introduces a warning which is giving Tvrtko trouble - see linux-mm
> > > https://lore.kernel.org/linux-mm/1596edbb-02ad-6bdf-51b8-15c2d2e08b76@linux.intel.com/
> > > 
> > > We already have the fix for the warning, it's making its way through the
> > > system, and is marked for stable, but it has not reached Linus's tree yet.
> > > 
> > > Please drop this 19/20 from 6.0.4, then I'll reply to this to let you know
> > > when the fix does reach Linus's tree - hopefully the two can go together
> > > in the next 6.0-stable.
> > > 
> > > I apologize for not writing yesterday: gmail had gathered together
> > > different threads with the same subject, I thought you and stable
> > > were Cc'ed on the linux-mm mail and you would immediately drop it
> > > yourself, but in fact you were not on that thread at all.
> > 
> > No worries, now dropped, thanks.
> 
> Thanks Greg.  Linus's tree now contains my fix
> 5aae9265ee1a ("mm: prep_compound_tail() clear page->private")
> to Mel's fix
> 71e2d666ef85 ("mm/huge_memory: do not clobber swp_entry_t during THP split")
> so they can now go on together into 6.0 stable.
> 
> They would also have been good in 5.19 stable: but too late now, it's EOL.

Thanks, both now queued up for 6.0

greg k-h
  

Patch

--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2445,7 +2445,16 @@  static void __split_huge_page_tail(struc
 			page_tail);
 	page_tail->mapping = head->mapping;
 	page_tail->index = head->index + tail;
-	page_tail->private = 0;
+
+	/*
+	 * page->private should not be set in tail pages with the exception
+	 * of swap cache pages that store the swp_entry_t in tail pages.
+	 * Fix up and warn once if private is unexpectedly set.
+	 */
+	if (!folio_test_swapcache(page_folio(head))) {
+		VM_WARN_ON_ONCE_PAGE(page_tail->private != 0, head);
+		page_tail->private = 0;
+	}
 
 	/* Page flags must be visible before we make the page non-compound. */
 	smp_wmb();