Message ID | 20221116083811.464678-3-liushixin2@huawei.com |
---|---|
State | New |
Headers |
From: Liu Shixin <liushixin2@huawei.com>
To: Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will@kernel.org>, Denys Vlasenko <dvlasenk@redhat.com>, Kefeng Wang <wangkefeng.wang@huawei.com>, Anshuman Khandual <anshuman.khandual@arm.com>, David Hildenbrand <dhildenb@redhat.com>, Rafael Aquini <raquini@redhat.com>, Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, Liu Shixin <liushixin2@huawei.com>
Subject: [PATCH 2/2] arm64/mm: fix incorrect file_map_count for invalid pmd/pud
Date: Wed, 16 Nov 2022 16:38:11 +0800
Message-ID: <20221116083811.464678-3-liushixin2@huawei.com>
In-Reply-To: <20221116083811.464678-1-liushixin2@huawei.com>
References: <20221116083811.464678-1-liushixin2@huawei.com> |
Series |
arm64: fix two bug about page table check
|
|
Commit Message
Liu Shixin
Nov. 16, 2022, 8:38 a.m. UTC
The page table check triggers BUG_ON() unexpectedly when splitting a hugepage:
------------[ cut here ]------------
kernel BUG at mm/page_table_check.c:119!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 7 PID: 210 Comm: transhuge-stres Not tainted 6.1.0-rc3+ #748
Hardware name: linux,dummy-virt (DT)
pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : page_table_check_set.isra.0+0x398/0x468
lr : page_table_check_set.isra.0+0x1c0/0x468
[...]
Call trace:
page_table_check_set.isra.0+0x398/0x468
__page_table_check_pte_set+0x160/0x1c0
__split_huge_pmd_locked+0x900/0x1648
__split_huge_pmd+0x28c/0x3b8
unmap_page_range+0x428/0x858
unmap_single_vma+0xf4/0x1c8
zap_page_range+0x2b0/0x410
madvise_vma_behavior+0xc44/0xe78
do_madvise+0x280/0x698
__arm64_sys_madvise+0x90/0xe8
invoke_syscall.constprop.0+0xdc/0x1d8
do_el0_svc+0xf4/0x3f8
el0_svc+0x58/0x120
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x19c/0x1a0
[...]
On arm64, pmd_present() will return true even if the pmd is invalid. So
in pmdp_invalidate() the file_map_count is not only decreased once but
also increased once. Then in set_pte_at() the file_map_count is increased
again, and so BUG_ON() is triggered unexpectedly.
Fix this problem by adding a pmd_valid() check in pmd_user_accessible_page().
Moreover, add a pud_valid() check in pud_user_accessible_page() too.
Fixes: 42b2547137f5 ("arm64/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK")
Reported-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
---
arch/arm64/include/asm/pgtable.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
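
To make the accounting failure easier to follow, here is a minimal, self-contained C sketch of the sequence the commit message describes. The counter model, the struct, and the function names are illustrative assumptions only; the real logic lives in mm/page_table_check.c and the arm64 pgtable helpers and is more involved.

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical, simplified model of the page_table_check accounting
 * (one plain counter per base page; the real kernel tracks per-page
 * anon/file map counts and uses BUG_ON() instead of assert()).
 */
#define HPAGE_NR 4				/* pretend a huge page covers 4 base pages */
static int map_count[HPAGE_NR];

struct pmd_model { bool leaf, user, valid; };	/* stand-in for a pmd_t */

/* Mirrors the idea of pmd_user_accessible_page(); check_valid toggles the fix. */
static bool user_accessible(struct pmd_model p, bool check_valid)
{
	return p.leaf && p.user && (!check_valid || p.valid);
}

static void account_pmd_clear(struct pmd_model old, bool check_valid)
{
	if (user_accessible(old, check_valid))
		for (int i = 0; i < HPAGE_NR; i++)
			map_count[i]--;
}

static void account_pmd_set(struct pmd_model new, bool check_valid)
{
	if (user_accessible(new, check_valid))
		for (int i = 0; i < HPAGE_NR; i++) {
			assert(map_count[i] == 0);	/* a page may only be mapped once */
			map_count[i]++;
		}
}

static void simulate_split(bool check_valid)
{
	struct pmd_model huge    = { .leaf = true, .user = true, .valid = true };
	struct pmd_model invalid = { .leaf = true, .user = true, .valid = false };

	for (int i = 0; i < HPAGE_NR; i++)
		map_count[i] = 0;

	account_pmd_set(huge, check_valid);	/* huge mapping installed */

	/* pmdp_invalidate(): the old entry is accounted away and the
	 * invalidated copy is accounted back in -- wrongly so, unless
	 * validity is part of the "user accessible" test. */
	account_pmd_clear(huge, check_valid);
	account_pmd_set(invalid, check_valid);

	/* __split_huge_pmd_locked(): one set_pte_at() per base page. */
	for (int i = 0; i < HPAGE_NR; i++) {
		assert(map_count[i] == 0);	/* trips here without the fix */
		map_count[i]++;
	}
	printf("split succeeded (check_valid=%d)\n", check_valid);
}

int main(void)
{
	simulate_split(true);	/* with the validity check: fine */
	simulate_split(false);	/* without it: aborts, like the BUG_ON() in the report */
	return 0;
}

Running it, the first call completes while the second aborts on the per-page assertion, which is the same imbalance the BUG_ON() at mm/page_table_check.c:119 catches.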
Comments
On 16.11.22 09:38, Liu Shixin wrote:
> The page table check triggers BUG_ON() unexpectedly when splitting a hugepage:
>
> [... crash log and call trace as above ...]
>
> On arm64, pmd_present() will return true even if the pmd is invalid.

I assume that's because of the pmd_present_invalid() check.

... I wonder why that behavior was chosen. Sounds error-prone to me.

Fix LGTM, but I am not an arm64 pgtable expert.
On Wed, Nov 16, 2022 at 2:51 AM Liu Shixin <liushixin2@huawei.com> wrote:
>
> The page table check triggers BUG_ON() unexpectedly when splitting a hugepage:
>
> [... crash log and call trace as above ...]
>
> On arm64, pmd_present() will return true even if the pmd is invalid. So
> in pmdp_invalidate() the file_map_count is not only decreased once but
> also increased once. Then in set_pte_at() the file_map_count is increased
> again, and so BUG_ON() is triggered unexpectedly.
>
> static inline bool pmd_user_accessible_page(pmd_t pmd)
> {
> -	return pmd_leaf(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
> +	return pmd_leaf(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd)) && pmd_valid(pmd);
> }
>
> static inline bool pud_user_accessible_page(pud_t pud)
> {
> -	return pud_leaf(pud) && pud_user(pud);
> +	return pud_leaf(pud) && pud_user(pud) && pud_valid(pud);

This looks closer to x86, where the check is directly
pmd_val(pmd) & _PAGE_PRESENT, without the PTE_PROT_NONE part of pmd_present().

Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com>

Thanks,
Pasha
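
For reference, the x86 check Pasha describes has roughly this shape. It is paraphrased from his description above rather than copied from arch/x86/include/asm/pgtable.h, so treat the exact form as an approximation:

static inline bool pmd_user_accessible_page(pmd_t pmd)
{
	/* presence is tested on the raw entry bits, so PROT_NONE /
	 * invalidated entries are not counted as user-accessible mappings */
	return pmd_leaf(pmd) && (pmd_val(pmd) & _PAGE_PRESENT) && pmd_user(pmd);
}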
On Wed, Nov 16, 2022 at 10:08:27AM +0100, David Hildenbrand wrote:
> On 16.11.22 09:38, Liu Shixin wrote:
> > The page table check triggers BUG_ON() unexpectedly when splitting a hugepage:
> >
> > [... crash log and call trace as above ...]
> >
> > On arm64, pmd_present() will return true even if the pmd is invalid.
>
> I assume that's because of the pmd_present_invalid() check.
>
> ... I wonder why that behavior was chosen. Sounds error-prone to me.

That seems to be down to commit:

  b65399f6111b03df ("arm64/mm: Change THP helpers to comply with generic MM semantics")

... apparently because Andrea Arcangeli said this was necessary in:

  https://lore.kernel.org/lkml/20181017020930.GN30832@redhat.com/

... but that does seem to contradict what's said in:

  Documentation/mm/arch_pgtable_helpers.rst

... which just says:

  pmd_present	Tests a valid mapped PMD

... and it's not clear to me why this *only* applies to the PMD level.

Anshuman?

Thanks,
Mark.
On Wed, Nov 16, 2022 at 04:38:11PM +0800, Liu Shixin wrote:
> The page table check triggers BUG_ON() unexpectedly when splitting a hugepage:
>
> [... crash log and call trace as above ...]
>
> On arm64, pmd_present() will return true even if the pmd is invalid. So
> in pmdp_invalidate() the file_map_count is not only decreased once but
> also increased once. Then in set_pte_at() the file_map_count is increased
> again, and so BUG_ON() is triggered unexpectedly.

It's not clear to me how pmd_present() relates to p?d_user_accessible_page()
below. How are they related? (or is this a copy-paste error?)

> Fix this problem by adding a pmd_valid() check in pmd_user_accessible_page().
> Moreover, add a pud_valid() check in pud_user_accessible_page() too.
>
> Fixes: 42b2547137f5 ("arm64/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK")
> Reported-by: Denys Vlasenko <dvlasenk@redhat.com>
> Signed-off-by: Liu Shixin <liushixin2@huawei.com>
> ---
>  arch/arm64/include/asm/pgtable.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index edf6625ce965..56e178de75e7 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -863,12 +863,12 @@ static inline bool pte_user_accessible_page(pte_t pte)
>
>  static inline bool pmd_user_accessible_page(pmd_t pmd)
>  {
> -	return pmd_leaf(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
> +	return pmd_leaf(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd)) && pmd_valid(pmd);
>  }
>
>  static inline bool pud_user_accessible_page(pud_t pud)
>  {
> -	return pud_leaf(pud) && pud_user(pud);
> +	return pud_leaf(pud) && pud_user(pud) && pud_valid(pud);

I think these p?d_valid() checks should be first for clarity, since the other
bits aren't necessarily meaningful for !valid entries.

Thanks,
Mark.
On 2022/11/16 23:52, Mark Rutland wrote:
> On Wed, Nov 16, 2022 at 04:38:11PM +0800, Liu Shixin wrote:
>> The page table check triggers BUG_ON() unexpectedly when splitting a hugepage:
>>
>> [... crash log and call trace as above ...]
>>
>> On arm64, pmd_present() will return true even if the pmd is invalid. So
>> in pmdp_invalidate() the file_map_count is not only decreased once but
>> also increased once. Then in set_pte_at() the file_map_count is increased
>> again, and so BUG_ON() is triggered unexpectedly.
>
> It's not clear to me how pmd_present() relates to p?d_user_accessible_page()
> below. How are they related? (or is this a copy-paste error?)

Yes, it should be pmd_leaf(). In the previous patch, pmd_present() has already
been replaced with pmd_leaf(). Thanks for your careful discovery. Will fix in
the next version.

>> Fix this problem by adding a pmd_valid() check in pmd_user_accessible_page().
>> Moreover, add a pud_valid() check in pud_user_accessible_page() too.
>>
>> Fixes: 42b2547137f5 ("arm64/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK")
>> Reported-by: Denys Vlasenko <dvlasenk@redhat.com>
>> Signed-off-by: Liu Shixin <liushixin2@huawei.com>
>> ---
>>  arch/arm64/include/asm/pgtable.h | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> [... diff as above ...]
>
> I think these p?d_valid() checks should be first for clarity, since the other
> bits aren't necessarily meaningful for !valid entries.

Thanks for your advice.

> Thanks,
> Mark.
On 11/16/22 21:16, Mark Rutland wrote:
> On Wed, Nov 16, 2022 at 10:08:27AM +0100, David Hildenbrand wrote:
>> On 16.11.22 09:38, Liu Shixin wrote:
>>> The page table check triggers BUG_ON() unexpectedly when splitting a hugepage:
>>>
>>> [... crash log and call trace as above ...]
>>>
>>> On arm64, pmd_present() will return true even if the pmd is invalid.
>>
>> I assume that's because of the pmd_present_invalid() check.
>>
>> ... I wonder why that behavior was chosen. Sounds error-prone to me.
>
> That seems to be down to commit:
>
>   b65399f6111b03df ("arm64/mm: Change THP helpers to comply with generic MM semantics")
>
> ... apparently because Andrea Arcangeli said this was necessary in:
>
>   https://lore.kernel.org/lkml/20181017020930.GN30832@redhat.com/
>
> ... but that does seem to contradict what's said in:
>
>   Documentation/mm/arch_pgtable_helpers.rst
>
> ... which just says:
>
>   pmd_present	Tests a valid mapped PMD

It should be as follows instead, will update. Not sure about the PUD level
though, where anon THP is not supported (AFAIK).

+---------------------------+----------------------------------------------------+
| pmd_present               | Tests if pmd_page() points to a valid memory page  |
+---------------------------+----------------------------------------------------+

> ... and it's not clear to me why this *only* applies to the PMD level.
>
> Anshuman?

Because THP is supported at the PMD level. As Andrea had explained earlier,
pmd_present() should return positive if pmd_page() on the entry points to
valid memory, irrespective of whether the entry itself is valid/mapped or not.
That is the semantics expected by generic THP code during PMD split, collapse,
migration etc., and by other memory code walking past such PMD entries. That
was my understanding.
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index edf6625ce965..56e178de75e7 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -863,12 +863,12 @@ static inline bool pte_user_accessible_page(pte_t pte)
 
 static inline bool pmd_user_accessible_page(pmd_t pmd)
 {
-	return pmd_leaf(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
+	return pmd_leaf(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd)) && pmd_valid(pmd);
 }
 
 static inline bool pud_user_accessible_page(pud_t pud)
 {
-	return pud_leaf(pud) && pud_user(pud);
+	return pud_leaf(pud) && pud_user(pud) && pud_valid(pud);
 }
 #endif
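
As background for why the thread above distinguishes p?d_valid() from pmd_present() on arm64, the relevant helpers look roughly like the sketch below. This is a simplified paraphrase assembled from the discussion, not a verbatim copy of the 6.1-era arch/arm64/include/asm/pgtable.h, so consult the header for the authoritative definitions.

/* "Valid" means the hardware valid bit is set, i.e. the MMU will walk it. */
#define pmd_valid(pmd)		pte_valid(pmd_pte(pmd))
#define pud_valid(pud)		pte_valid(pud_pte(pud))

/*
 * pmd_present() is deliberately wider: an entry knocked out by
 * pmd_mkinvalid() keeps a software PMD_PRESENT_INVALID marker, so it still
 * reports "present" and generic THP code keeps treating pmd_page() as a
 * live huge page across split/collapse/migration.  That is why a
 * presence-style test is the wrong gate for the page table check
 * accounting, and why the fix keys p?d_user_accessible_page() off
 * p?d_valid() instead.
 */
static inline int pmd_present(pmd_t pmd)
{
	return pte_present(pmd_pte(pmd)) || pmd_present_invalid(pmd);
}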