Message ID | 20230529123705.955378-1-mawupeng1@huawei.com |
---|---|
State | New |
Headers |
From: Wupeng Ma <mawupeng1@huawei.com>
To: <akpm@linux-foundation.org>, <kirill.shutemov@linux.intel.com>, <hughd@google.com>
CC: <n-horiguchi@ah.jp.nec.com>, <jmarchan@redhat.com>, <willy@infradead.org>, <linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>, <mawupeng1@huawei.com>
Subject: [RFC PATCH stable 5.10/5.15] mm: Pass head page to clear_page_mlock for page_remove_rmap
Date: Mon, 29 May 2023 20:37:05 +0800
Message-ID: <20230529123705.955378-1-mawupeng1@huawei.com>
X-Mailer: git-send-email 2.25.1
MIME-Version: 1.0
Content-Transfer-Encoding: 7BIT
Content-Type: text/plain; charset=US-ASCII
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org |
Series |
[RFC,stable,5.10/5.15] mm: Pass head page to clear_page_mlock for page_remove_rmap
|
|
Commit Message
mawupeng
May 29, 2023, 12:37 p.m. UTC
From: Ma Wupeng <mawupeng1@huawei.com>

Our syzbot reported an mlock-related problem. During exit_mm, a tail page is
passed to clear_page_mlock(), which finally leads to a kernel panic.

During unmap_page_range, if compound is false, the page is treated as a
small page. If this page is PageMlocked, it is passed to isolate_lru_page,
which finally leads to the "trying to isolate tail page" warning.

Here is the simplified call trace:

unmap_page_range
  zap_pte_range
    page_remove_rmap(page, false); // compound is false: handle a small page, not a compound page
      nr_pages = thp_nr_pages(page);
      clear_page_mlock(page) // maybe tail page here
        isolate_lru_page
          WARN_RATELIMIT(PageTail(page), "trying to isolate tail page");

Since mlock is not supposed to handle tail pages, we pass the head page to
clear_page_mlock() to solve this problem.

This bug can lead to multiple reports. Here are the simplified reports:

------------[ cut here ]------------
trying to isolate tail page
WARNING: CPU: 1 PID: 24489 at mm/vmscan.c:2031 isolate_lru_page+0x574/0x660

page:fffffc000eb7a300 refcount:512 mapcount:0 mapping:0000000000000000 index:0x2008c pfn:0x3ede8c
head:fffffc000eb78000 order:9 compound_mapcount:0 compound_pincount:0
memcg:ffff0000d24bc000
anon flags: 0x37ffff80009080c(uptodate|dirty|arch_1|head|swapbacked|node=1|zone=2|lastcpupid=0xfffff)
raw: 037ffff800000800 fffffc000eb78001 fffffc000eb7a308 dead000000000400
raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
head: 037ffff80009080c fffffc000eb70008 fffffc000e350708 ffff0003829eb839
head: 0000000000020000 0000000000000000 00000200ffffffff ffff0000d24bc000
page dumped because: VM_WARN_ON_ONCE_PAGE(!memcg && !mem_cgroup_disabled())
------------[ cut here ]------------
WARNING: CPU: 1 PID: 24489 at include/linux/memcontrol.h:767 lock_page_lruvec_irq+0x148/0x190

page:fffffc000eb7a300 refcount:0 mapcount:0 mapping:dead000000000400 index:0x0 pfn:0x3ede8c
failed to read mapping contents, not a valid kernel address?
flags: 0x37ffff800000800(arch_1|node=1|zone=2|lastcpupid=0xfffff)
raw: 037ffff800000800 dead000000000100 dead000000000122 dead000000000400
raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: VM_BUG_ON_PAGE(((unsigned int) page_ref_count(page) + 127u <= 127u))
------------[ cut here ]------------
kernel BUG at include/linux/mm.h:1213!
Call trace:
 lru_cache_add+0x2d4/0x2e8
 putback_lru_page+0x2c/0x168
 clear_page_mlock+0x254/0x318
 page_remove_rmap+0x900/0x9c0
 unmap_page_range+0xa78/0x16a0
 unmap_single_vma+0x114/0x1a0
 unmap_vmas+0x100/0x220
 exit_mmap+0x120/0x410
 mmput+0x174/0x498
 exit_mm+0x33c/0x460
 do_exit+0x3c0/0x1310
 do_group_exit+0x98/0x170
 get_signal+0x370/0x13d0
 do_notify_resume+0x5a0/0x968
 el0_da+0x154/0x188
 el0t_64_sync_handler+0x88/0xb8
 el0t_64_sync+0x1a0/0x1a4
Code: 912b0021 aa1503e0 910c0021 9401a49c (d4210000)

This bug can be reproduced in both linux-5.10.y and linux-5.15.y, and may be
fixed after commit 889a3747b3b7 ("mm/lru: Add folio LRU functions").
That commit converts pages to folios for LRU-related operations, so all
such operations act on the folio, which means the head page, after that
commit.

Fixes: d281ee614518 ("rmap: add argument to charge compound page")
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
---
 mm/rmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
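Editorial note: the proposed fix relies on compound_head() mapping a tail page back to its head. The sketch below is a userspace toy model of that lookup, not kernel code. It assumes the encoding visible in the first dump above, where the tail page's second raw word (fffffc000eb78001) is the head address (fffffc000eb78000) with bit 0 set. The names toy_page, toy_page_tail(), toy_compound_head() and toy_set_compound_head() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for struct page: only the word that encodes the head pointer. */
struct toy_page {
	uintptr_t compound_head; /* tail pages: address of the head page, bit 0 set */
};

/* A page is a tail page when bit 0 of the word is set. */
static int toy_page_tail(const struct toy_page *page)
{
	return (page->compound_head & 1) != 0;
}

/* Return the head page for a tail page, or the page itself otherwise. */
static struct toy_page *toy_compound_head(struct toy_page *page)
{
	if (toy_page_tail(page))
		return (struct toy_page *)(page->compound_head - 1);
	return page;
}

/* Mark 'tail' as a tail page of the compound page starting at 'head'. */
static void toy_set_compound_head(struct toy_page *tail, struct toy_page *head)
{
	tail->compound_head = (uintptr_t)head | 1;
}
```

In this model, passing toy_compound_head(page) instead of page to a callee guarantees the callee never sees a tail page, which is the effect the one-line patch aims for.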
Comments
On Mon, 29 May 2023, Wupeng Ma wrote:

> From: Ma Wupeng <mawupeng1@huawei.com>
>
> Our syzbot report a mlock related problem. During exit_mm, tail page is
> passed to clear_page_mlock which final lead to kernel panic.
>
> During unmap_page_range, if compound is false, it means this page is
> seen as a small page. This page is passed to isolate_lru_page if this
> page is PageMlocked and finally lead to "trying to isolate tail page"
> warning.
>
> Here is the simplified calltrace:
>
> unmap_page_range
>   zap_pte_range
>     page_remove_rmap(page, false); // compound is false means to handle
>                                       to small page not compound page
>       nr_pages = thp_nr_pages(page);
>       clear_page_mlock(page) // maybe tail page here
>         isolate_lru_page
>           WARN_RATELIMIT(PageTail(page), "trying to isolate tail page");
>
> Since mlock is not supposed to handle tail, we pass head page to
> clear_page_mlock() to slove this problem.

Your patch looks plausible for stable, and might even end up as the best
that can be done; but I think you have not root-caused the problem yet
(and until it's root-caused, there is likely to be other damage).

5.15 and 5.10 were releases with the PageDoubleMap flag, and the intention
then was that a compound page with PageDoubleMap set could not be Mlocked,
and PageMlocked had to be cleared when setting PageDoubleMap.

See, for example, the line in the old mlock_vma_page()
	VM_BUG_ON_PAGE(PageCompound(page) && PageDoubleMap(page), page);
before it did the TestSetPageMlocked().

So it should have been impossible to find PageMlocked on a Tail page
(even with PageMlocked redirecting to the head page to look up the flag)
there; so unnecessary for clear_page_mlock() to use compound_head().

Since nobody reported this problem before, my suspicion is that a commit
has been backported to 5.15 and 5.10 stable, which does not belong there.
Or perhaps the stable trees are okay, but your own tree has an unsuitable
backport in it?

>
> This bug can lead to multiple reports. Here ares the simplified reports:
>
> ------------[ cut here ]------------
> trying to isolate tail page
> WARNING: CPU: 1 PID: 24489 at mm/vmscan.c:2031 isolate_lru_page+0x574/0x660
>
> page:fffffc000eb7a300 refcount:512 mapcount:0 mapping:0000000000000000 index:0x2008c pfn:0x3ede8c
> head:fffffc000eb78000 order:9 compound_mapcount:0 compound_pincount:0
> memcg:ffff0000d24bc000
> anon flags: 0x37ffff80009080c(uptodate|dirty|arch_1|head|swapbacked|node=1|zone=2|lastcpupid=0xfffff)
> raw: 037ffff800000800 fffffc000eb78001 fffffc000eb7a308 dead000000000400
> raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> head: 037ffff80009080c fffffc000eb70008 fffffc000e350708 ffff0003829eb839
> head: 0000000000020000 0000000000000000 00000200ffffffff ffff0000d24bc000
> page dumped because: VM_WARN_ON_ONCE_PAGE(!memcg && !mem_cgroup_disabled())
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 24489 at include/linux/memcontrol.h:767 lock_page_lruvec_irq+0x148/0x190
>
> page:fffffc000eb7a300 refcount:0 mapcount:0 mapping:dead000000000400 index:0x0 pfn:0x3ede8c
> failed to read mapping contents, not a valid kernel address?
> flags: 0x37ffff800000800(arch_1|node=1|zone=2|lastcpupid=0xfffff)
> raw: 037ffff800000800 dead000000000100 dead000000000122 dead000000000400
> raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> page dumped because: VM_BUG_ON_PAGE(((unsigned int) page_ref_count(page) + 127u <= 127u))
> ------------[ cut here ]------------
> kernel BUG at include/linux/mm.h:1213!
> Call trace:
>  lru_cache_add+0x2d4/0x2e8
>  putback_lru_page+0x2c/0x168
>  clear_page_mlock+0x254/0x318
>  page_remove_rmap+0x900/0x9c0
>  unmap_page_range+0xa78/0x16a0
>  unmap_single_vma+0x114/0x1a0
>  unmap_vmas+0x100/0x220
>  exit_mmap+0x120/0x410
>  mmput+0x174/0x498
>  exit_mm+0x33c/0x460
>  do_exit+0x3c0/0x1310
>  do_group_exit+0x98/0x170
>  get_signal+0x370/0x13d0
>  do_notify_resume+0x5a0/0x968
>  el0_da+0x154/0x188
>  el0t_64_sync_handler+0x88/0xb8
>  el0t_64_sync+0x1a0/0x1a4
> Code: 912b0021 aa1503e0 910c0021 9401a49c (d4210000)
>
> This bug can be reproduced in both linux-5.10.y & linux-5.15.y and maybe
> fixed after commit 889a3747b3b7 ("mm/lru: Add folio LRU functions").
> This patch turn page into folio for LRU related operations, all
> operations to page is turn to folio which means head page after this
> patch.

No, that commit is not likely to have been a fix for this issue.
If there ever was such an issue in the 5.15 and 5.10 trees, it would
more likely have been fixed by the munlock changes in 5.18, or by the
removal of PageDoubleMap in 6.2.

>
> Fixes: d281ee614518 ("rmap: add argument to charge compound page")

Perhaps, but I think an inappropriate backport is more likely.

> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
> ---
>  mm/rmap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 330b361a460e..8838f6a9d65d 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1372,7 +1372,7 @@ void page_remove_rmap(struct page *page, bool compound)
>  	__dec_lruvec_page_state(page, NR_ANON_MAPPED);
>
>  	if (unlikely(PageMlocked(page)))
> -		clear_page_mlock(page);
> +		clear_page_mlock(compound_head(page));
>
>  	if (PageTransCompound(page))
>  		deferred_split_huge_page(compound_head(page));

And what about the clear_page_mlock() in page_remove_file_rmap()?

Thanks,
Hugh
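Editorial note: Hugh's remark that PageMlocked redirects to the head page explains how a tail page can pass the PageMlocked() check and still reach isolate_lru_page() as a tail. The following is a hedged userspace model of that interaction only. The names model_page, model_page_mlocked() and model_remove_rmap() are hypothetical, and the model deliberately ignores PageDoubleMap, which in the real 5.10/5.15 code is what should have made this state impossible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy page: a tail page points at its head; the mlocked flag lives on the head. */
struct model_page {
	struct model_page *head; /* NULL for a head page or a small page */
	bool mlocked;            /* only meaningful on the head */
};

static struct model_page *model_head(struct model_page *page)
{
	return page->head ? page->head : page;
}

/* Flag lookup is redirected to the head, as described in the reply above. */
static bool model_page_mlocked(struct model_page *page)
{
	return model_head(page)->mlocked;
}

/*
 * Returns the page that would be handed on towards isolate_lru_page(),
 * or NULL when the page is not mlocked. use_head selects the patched
 * behaviour (pass the head) versus the unpatched one (pass the page as is).
 */
static struct model_page *model_remove_rmap(struct model_page *page, bool use_head)
{
	if (!model_page_mlocked(page))
		return NULL;
	return use_head ? model_head(page) : page;
}
```

With a mlocked head and one of its tails, the unpatched path (use_head false) forwards the tail itself, while the patched path forwards the head. The model says nothing about how the head came to be mlocked in the first place, which is the open root-cause question raised in the reply.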
On 2023/6/2 10:04, Hugh Dickins wrote:

> On Mon, 29 May 2023, Wupeng Ma wrote:
>> From: Ma Wupeng <mawupeng1@huawei.com>
>>
>> Our syzbot report a mlock related problem. During exit_mm, tail page is
>> passed to clear_page_mlock which final lead to kernel panic.
>>
>> During unmap_page_range, if compound is false, it means this page is
>> seen as a small page. This page is passed to isolate_lru_page if this
>> page is PageMlocked and finally lead to "trying to isolate tail page"
>> warning.
>>
>> Here is the simplified calltrace:
>>
>> unmap_page_range
>>   zap_pte_range
>>     page_remove_rmap(page, false); // compound is false means to handle
>>                                       to small page not compound page
>>       nr_pages = thp_nr_pages(page);
>>       clear_page_mlock(page) // maybe tail page here
>>         isolate_lru_page
>>           WARN_RATELIMIT(PageTail(page), "trying to isolate tail page");
>>
>> Since mlock is not supposed to handle tail, we pass head page to
>> clear_page_mlock() to slove this problem.
>
> Your patch looks plausible for stable, and might even end up as the best
> that can be done; but I think you have not root-caused the problem yet
> (and until it's root-caused, there is likely to be other damage).

I agree with this. The root cause of this problem is still unknown.

>
> 5.15 and 5.10 were releases with the PageDoubleMap flag, and the intention
> then was that a compound page with PageDoubleMap set could not be Mlocked,
> and PageMlocked had to be cleared when setting PageDoubleMap.
>
> See, for example, the line in the old mlock_vma_page()
> 	VM_BUG_ON_PAGE(PageCompound(page) && PageDoubleMap(page), page);
> before it did the TestSetPageMlocked().
>
> So it should have been impossible to find PageMlocked on a Tail page
> (even with PageMlocked redirecting to the head page to look up the flag)
> there; so unnecessary for clear_page_mlock() to use compound_head().
>
> Since nobody reported this problem before, my suspicion is that a commit
> has been backported to 5.15 and 5.10 stable, which does not belong there.
> Or perhaps the stable trees are okay, but your own tree has an unsuitable
> backport in it?

We are testing with the latest 5.10/5.15, without any patches of our own.
The corresponding C reproducer is attached below.

We are testing on ARM64 with the following config options enabled:

CONFIG_KASAN=y
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_LIST=y

>
>>
>> This bug can lead to multiple reports. Here ares the simplified reports:
>>
>> ------------[ cut here ]------------
>> trying to isolate tail page
>> WARNING: CPU: 1 PID: 24489 at mm/vmscan.c:2031 isolate_lru_page+0x574/0x660
>>
>> page:fffffc000eb7a300 refcount:512 mapcount:0 mapping:0000000000000000 index:0x2008c pfn:0x3ede8c
>> head:fffffc000eb78000 order:9 compound_mapcount:0 compound_pincount:0
>> memcg:ffff0000d24bc000
>> anon flags: 0x37ffff80009080c(uptodate|dirty|arch_1|head|swapbacked|node=1|zone=2|lastcpupid=0xfffff)
>> raw: 037ffff800000800 fffffc000eb78001 fffffc000eb7a308 dead000000000400
>> raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
>> head: 037ffff80009080c fffffc000eb70008 fffffc000e350708 ffff0003829eb839
>> head: 0000000000020000 0000000000000000 00000200ffffffff ffff0000d24bc000
>> page dumped because: VM_WARN_ON_ONCE_PAGE(!memcg && !mem_cgroup_disabled())
>> ------------[ cut here ]------------
>> WARNING: CPU: 1 PID: 24489 at include/linux/memcontrol.h:767 lock_page_lruvec_irq+0x148/0x190
>>
>> page:fffffc000eb7a300 refcount:0 mapcount:0 mapping:dead000000000400 index:0x0 pfn:0x3ede8c
>> failed to read mapping contents, not a valid kernel address?
>> flags: 0x37ffff800000800(arch_1|node=1|zone=2|lastcpupid=0xfffff)
>> raw: 037ffff800000800 dead000000000100 dead000000000122 dead000000000400
>> raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
>> page dumped because: VM_BUG_ON_PAGE(((unsigned int) page_ref_count(page) + 127u <= 127u))
>> ------------[ cut here ]------------
>> kernel BUG at include/linux/mm.h:1213!
>> Call trace:
>>  lru_cache_add+0x2d4/0x2e8
>>  putback_lru_page+0x2c/0x168
>>  clear_page_mlock+0x254/0x318
>>  page_remove_rmap+0x900/0x9c0
>>  unmap_page_range+0xa78/0x16a0
>>  unmap_single_vma+0x114/0x1a0
>>  unmap_vmas+0x100/0x220
>>  exit_mmap+0x120/0x410
>>  mmput+0x174/0x498
>>  exit_mm+0x33c/0x460
>>  do_exit+0x3c0/0x1310
>>  do_group_exit+0x98/0x170
>>  get_signal+0x370/0x13d0
>>  do_notify_resume+0x5a0/0x968
>>  el0_da+0x154/0x188
>>  el0t_64_sync_handler+0x88/0xb8
>>  el0t_64_sync+0x1a0/0x1a4
>> Code: 912b0021 aa1503e0 910c0021 9401a49c (d4210000)
>>
>> This bug can be reproduced in both linux-5.10.y & linux-5.15.y and maybe
>> fixed after commit 889a3747b3b7 ("mm/lru: Add folio LRU functions").
>> This patch turn page into folio for LRU related operations, all
>> operations to page is turn to folio which means head page after this
>> patch.
>
> No, that commit is not likely to have been a fix for this issue.
> If there ever was such an issue in the 5.15 and 5.10 trees, it would
> more likely have been fixed by the munlock changes in 5.18, or by the
> removal of PageDoubleMap in 6.2.

Sorry, my mistake. Commit 889a3747b3b7 ("mm/lru: Add folio LRU functions")
only fixes one warning; the real fix is commit b109b87050df ("mm/munlock:
replace clear_page_mlock() by final clearance").

>
>>
>> Fixes: d281ee614518 ("rmap: add argument to charge compound page")
>
> Perhaps, but I think an inappropriate backport is more likely.
>
>> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
>> ---
>>  mm/rmap.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 330b361a460e..8838f6a9d65d 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1372,7 +1372,7 @@ void page_remove_rmap(struct page *page, bool compound)
>>  	__dec_lruvec_page_state(page, NR_ANON_MAPPED);
>>
>>  	if (unlikely(PageMlocked(page)))
>> -		clear_page_mlock(page);
>> +		clear_page_mlock(compound_head(page));
>>
>>  	if (PageTransCompound(page))
>>  		deferred_split_huge_page(compound_head(page));
>
> And what about the clear_page_mlock() in page_remove_file_rmap()?

According to the same logic, this should be fixed too.

Thanks for your reply,
Wupeng

>
> Thanks,
> Hugh

// autogenerated by syzkaller (https://github.com/google/syzkaller)

#define _GNU_SOURCE

#include <dirent.h>
#include <endian.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <setjmp.h>
#include <signal.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#ifndef __NR_clone
#define __NR_clone 220
#endif
#ifndef __NR_exit
#define __NR_exit 93
#endif
#ifndef __NR_mbind
#define __NR_mbind 235
#endif
#ifndef __NR_mlockall
#define __NR_mlockall 230
#endif
#ifndef __NR_mmap
#define __NR_mmap 222
#endif
#ifndef __NR_shmat
#define __NR_shmat 196
#endif
#ifndef __NR_shmget
#define __NR_shmget 194
#endif

static __thread int clone_ongoing;
static __thread int skip_segv;
static __thread jmp_buf segv_env;

static void segv_handler(int sig, siginfo_t* info, void* ctx)
{
	if (__atomic_load_n(&clone_ongoing, __ATOMIC_RELAXED) != 0) {
		exit(sig);
	}
	uintptr_t addr = (uintptr_t)info->si_addr;
	const uintptr_t prog_start = 1 << 20;
	const uintptr_t prog_end = 100 << 20;
	int skip = __atomic_load_n(&skip_segv, __ATOMIC_RELAXED) != 0;
	int valid = addr < prog_start || addr > prog_end;
	if (skip && valid) {
		_longjmp(segv_env, 1);
	}
	exit(sig);
}

static void install_segv_handler(void)
{
	struct sigaction sa;
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = SIG_IGN;
	syscall(SYS_rt_sigaction, 0x20, &sa, NULL, 8);
	syscall(SYS_rt_sigaction, 0x21, &sa, NULL, 8);
	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = segv_handler;
	sa.sa_flags = SA_NODEFER | SA_SIGINFO;
	sigaction(SIGSEGV, &sa, NULL);
	sigaction(SIGBUS, &sa, NULL);
}

#define NONFAILING(...)                                              \
	({                                                           \
		int ok = 1;                                          \
		__atomic_fetch_add(&skip_segv, 1, __ATOMIC_SEQ_CST); \
		if (_setjmp(segv_env) == 0) {                        \
			__VA_ARGS__;                                 \
		} else                                               \
			ok = 0;                                      \
		__atomic_fetch_sub(&skip_segv, 1, __ATOMIC_SEQ_CST); \
		ok;                                                  \
	})

static void sleep_ms(uint64_t ms)
{
	usleep(ms * 1000);
}

static uint64_t current_time_ms(void)
{
	struct timespec ts;
	if (clock_gettime(CLOCK_MONOTONIC, &ts))
		exit(1);
	return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}

static bool write_file(const char* file, const char* what, ...)
{
	char buf[1024];
	va_list args;
	va_start(args, what);
	vsnprintf(buf, sizeof(buf), what, args);
	va_end(args);
	buf[sizeof(buf) - 1] = 0;
	int len = strlen(buf);
	int fd = open(file, O_WRONLY | O_CLOEXEC);
	if (fd == -1)
		return false;
	if (write(fd, buf, len) != len) {
		int err = errno;
		close(fd);
		errno = err;
		return false;
	}
	close(fd);
	return true;
}

static void kill_and_wait(int pid, int* status)
{
	kill(-pid, SIGKILL);
	kill(pid, SIGKILL);
	for (int i = 0; i < 100; i++) {
		if (waitpid(-1, status, WNOHANG | __WALL) == pid)
			return;
		usleep(1000);
	}
	DIR* dir = opendir("/sys/fs/fuse/connections");
	if (dir) {
		for (;;) {
			struct dirent* ent = readdir(dir);
			if (!ent)
				break;
			if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
				continue;
			char abort[300];
			snprintf(abort, sizeof(abort), "/sys/fs/fuse/connections/%s/abort", ent->d_name);
			int fd = open(abort, O_WRONLY);
			if (fd == -1) {
				continue;
			}
			if (write(fd, abort, 1) < 0) {
			}
			close(fd);
		}
		closedir(dir);
	} else {
	}
	while (waitpid(-1, status, __WALL) != pid) {
	}
}

static void setup_test()
{
	prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
	setpgrp();
	write_file("/proc/self/oom_score_adj", "1000");
}

#define USLEEP_FORKED_CHILD (3 * 50 * 1000)

static long handle_clone_ret(long ret)
{
	if (ret != 0) {
		__atomic_store_n(&clone_ongoing, 0, __ATOMIC_RELAXED);
		return ret;
	}
	usleep(USLEEP_FORKED_CHILD);
	syscall(__NR_exit, 0);
	while (1) {
	}
}

static long syz_clone(volatile long flags, volatile long stack, volatile long stack_len,
		      volatile long ptid, volatile long ctid, volatile long tls)
{
	long sp = (stack + stack_len) & ~15;
	__atomic_store_n(&clone_ongoing, 1, __ATOMIC_RELAXED);
	long ret = (long)syscall(__NR_clone, flags & ~CLONE_VM, sp, ptid, ctid, tls);
	return handle_clone_ret(ret);
}

static void execute_one(void);

#define WAIT_FLAGS __WALL

static void loop(void)
{
	int iter = 0;
	for (;; iter++) {
		int pid = fork();
		if (pid < 0)
			exit(1);
		if (pid == 0) {
			setup_test();
			execute_one();
			exit(0);
		}
		int status = 0;
		uint64_t start = current_time_ms();
		for (;;) {
			if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid)
				break;
			sleep_ms(1);
			if (current_time_ms() - start < 5000)
				continue;
			kill_and_wait(pid, &status);
			break;
		}
	}
}

uint64_t r[1] = {0x0};

void execute_one(void)
{
	intptr_t res = 0;
	syscall(__NR_mlockall, 1ul);
	NONFAILING(memcpy(
	    (void*)0x20000000,
	    "\xe8\x55\xac\x84\x1a\xea\xd7\xfe\x5d\xcb\x7e\x83\x72\x7b\xad\xd2", 16));
	NONFAILING(syz_clone(0, 0x20000000, 0x10, 0, 0, 0));
	res = syscall(__NR_shmget, 0ul, 0x3000ul, 0ul, 0x20ffc000ul);
	if (res != -1)
		r[0] = res;
	syscall(__NR_shmat, r[0], 0x20ffe000ul, 0x4000ul);
	syscall(__NR_mbind, 0x200e7000ul, 0x400000ul, 0ul, 0ul, 0ul, 2ul);
}

int main(void)
{
	syscall(__NR_mmap, 0x1ffff000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);
	syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 7ul, 0x32ul, -1, 0ul);
	syscall(__NR_mmap, 0x21000000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);
	install_segv_handler();
	loop();
	return 0;
}
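Editorial note: the reproducer passes raw numbers to mmap() and mlockall(). The sketch below decodes them, assuming the Linux asm-generic constant values (restated locally under hypothetical LNX_ names so the snippet does not depend on the host headers). Under that assumption, 0x32 is MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, protection 7 is read/write/execute, protection 0 is PROT_NONE (the two one-page guard mappings on either side of the 16 MiB region), and mlockall(1) is MCL_CURRENT.

```c
#include <assert.h>

/*
 * Assumed Linux asm-generic values, restated here rather than taken from
 * <sys/mman.h> so the sketch gives the same answer on any host.
 */
enum {
	LNX_PROT_NONE     = 0x0,
	LNX_PROT_READ     = 0x1,
	LNX_PROT_WRITE    = 0x2,
	LNX_PROT_EXEC     = 0x4,
	LNX_MAP_PRIVATE   = 0x02,
	LNX_MAP_FIXED     = 0x10,
	LNX_MAP_ANONYMOUS = 0x20,
	LNX_MCL_CURRENT   = 1,
};

/* Flags argument used by all three mmap() calls in the reproducer. */
static unsigned long repro_map_flags(void)
{
	return LNX_MAP_PRIVATE | LNX_MAP_FIXED | LNX_MAP_ANONYMOUS;
}

/* Protection of the 16 MiB working region at 0x20000000. */
static unsigned long repro_rwx_prot(void)
{
	return LNX_PROT_READ | LNX_PROT_WRITE | LNX_PROT_EXEC;
}

/* Argument passed to mlockall() at the start of execute_one(). */
static unsigned long repro_mlockall_flags(void)
{
	return LNX_MCL_CURRENT;
}
```

Read this way, each test iteration locks the existing anonymous mappings of the process (MCL_CURRENT), then exercises clone, shmat and mbind against the locked 16 MiB region before exiting, which matches the exit_mmap path in the reported call trace.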
diff --git a/mm/rmap.c b/mm/rmap.c
index 330b361a460e..8838f6a9d65d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1372,7 +1372,7 @@ void page_remove_rmap(struct page *page, bool compound)
 	__dec_lruvec_page_state(page, NR_ANON_MAPPED);
 
 	if (unlikely(PageMlocked(page)))
-		clear_page_mlock(page);
+		clear_page_mlock(compound_head(page));
 
 	if (PageTransCompound(page))
 		deferred_split_huge_page(compound_head(page));