[2/2] arm64/mm: fix incorrect file_map_count for invalid pmd/pud

Message ID 20221116083811.464678-3-liushixin2@huawei.com
State New
Headers
Series arm64: fix two bug about page table check |

Commit Message

Liu Shixin Nov. 16, 2022, 8:38 a.m. UTC
  The page table check trigger BUG_ON() unexpectedly when split hugepage:

 ------------[ cut here ]------------
 kernel BUG at mm/page_table_check.c:119!
 Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
 Dumping ftrace buffer:
    (ftrace buffer empty)
 Modules linked in:
 CPU: 7 PID: 210 Comm: transhuge-stres Not tainted 6.1.0-rc3+ #748
 Hardware name: linux,dummy-virt (DT)
 pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : page_table_check_set.isra.0+0x398/0x468
 lr : page_table_check_set.isra.0+0x1c0/0x468
[...]
 Call trace:
  page_table_check_set.isra.0+0x398/0x468
  __page_table_check_pte_set+0x160/0x1c0
  __split_huge_pmd_locked+0x900/0x1648
  __split_huge_pmd+0x28c/0x3b8
  unmap_page_range+0x428/0x858
  unmap_single_vma+0xf4/0x1c8
  zap_page_range+0x2b0/0x410
  madvise_vma_behavior+0xc44/0xe78
  do_madvise+0x280/0x698
  __arm64_sys_madvise+0x90/0xe8
  invoke_syscall.constprop.0+0xdc/0x1d8
  do_el0_svc+0xf4/0x3f8
  el0_svc+0x58/0x120
  el0t_64_sync_handler+0xb8/0xc0
  el0t_64_sync+0x19c/0x1a0
[...]

On arm64, pmd_present() will return true even if the pmd is invalid. So
in pmdp_invalidate() the file_map_count will not only decrease once but
also increase once. Then in set_pte_at(), the file_map_count increase
again, and so trigger BUG_ON() unexpectedly.

Fix this problem by adding pmd_valid() in pmd_user_accessible_page().
Moreover, add pud_valid() for pud_user_accessible_page() too.

Fixes: 42b2547137f5 ("arm64/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK")
Reported-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
---
 arch/arm64/include/asm/pgtable.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
  

Comments

David Hildenbrand Nov. 16, 2022, 9:08 a.m. UTC | #1
On 16.11.22 09:38, Liu Shixin wrote:
> The page table check trigger BUG_ON() unexpectedly when split hugepage:
> 
>   ------------[ cut here ]------------
>   kernel BUG at mm/page_table_check.c:119!
>   Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>   Dumping ftrace buffer:
>      (ftrace buffer empty)
>   Modules linked in:
>   CPU: 7 PID: 210 Comm: transhuge-stres Not tainted 6.1.0-rc3+ #748
>   Hardware name: linux,dummy-virt (DT)
>   pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>   pc : page_table_check_set.isra.0+0x398/0x468
>   lr : page_table_check_set.isra.0+0x1c0/0x468
> [...]
>   Call trace:
>    page_table_check_set.isra.0+0x398/0x468
>    __page_table_check_pte_set+0x160/0x1c0
>    __split_huge_pmd_locked+0x900/0x1648
>    __split_huge_pmd+0x28c/0x3b8
>    unmap_page_range+0x428/0x858
>    unmap_single_vma+0xf4/0x1c8
>    zap_page_range+0x2b0/0x410
>    madvise_vma_behavior+0xc44/0xe78
>    do_madvise+0x280/0x698
>    __arm64_sys_madvise+0x90/0xe8
>    invoke_syscall.constprop.0+0xdc/0x1d8
>    do_el0_svc+0xf4/0x3f8
>    el0_svc+0x58/0x120
>    el0t_64_sync_handler+0xb8/0xc0
>    el0t_64_sync+0x19c/0x1a0
> [...]
> 
> On arm64, pmd_present() will return true even if the pmd is invalid.

I assume that's because of the pmd_present_invalid() check.

... I wonder why that behavior was chosen. Sounds error-prone to me.

Fix LGHTM, but I am not an arm64 pgtable expert.
  
Pasha Tatashin Nov. 16, 2022, 3:18 p.m. UTC | #2
On Wed, Nov 16, 2022 at 2:51 AM Liu Shixin <liushixin2@huawei.com> wrote:
>
> The page table check trigger BUG_ON() unexpectedly when split hugepage:
>
>  ------------[ cut here ]------------
>  kernel BUG at mm/page_table_check.c:119!
>  Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>  Dumping ftrace buffer:
>     (ftrace buffer empty)
>  Modules linked in:
>  CPU: 7 PID: 210 Comm: transhuge-stres Not tainted 6.1.0-rc3+ #748
>  Hardware name: linux,dummy-virt (DT)
>  pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>  pc : page_table_check_set.isra.0+0x398/0x468
>  lr : page_table_check_set.isra.0+0x1c0/0x468
> [...]
>  Call trace:
>   page_table_check_set.isra.0+0x398/0x468
>   __page_table_check_pte_set+0x160/0x1c0
>   __split_huge_pmd_locked+0x900/0x1648
>   __split_huge_pmd+0x28c/0x3b8
>   unmap_page_range+0x428/0x858
>   unmap_single_vma+0xf4/0x1c8
>   zap_page_range+0x2b0/0x410
>   madvise_vma_behavior+0xc44/0xe78
>   do_madvise+0x280/0x698
>   __arm64_sys_madvise+0x90/0xe8
>   invoke_syscall.constprop.0+0xdc/0x1d8
>   do_el0_svc+0xf4/0x3f8
>   el0_svc+0x58/0x120
>   el0t_64_sync_handler+0xb8/0xc0
>   el0t_64_sync+0x19c/0x1a0
> [...]
>
> On arm64, pmd_present() will return true even if the pmd is invalid.
> in pmdp_invalidate() the file_map_count will not only decrease once but
> also increase once. Then in set_pte_at(), the file_map_count increase
> again, and so trigger BUG_ON() unexpectedly.

>  static inline bool pmd_user_accessible_page(pmd_t pmd)
>  {
> -       return pmd_leaf(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
> +       return pmd_leaf(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd)) && pmd_valid(pmd);
>  }
>
>  static inline bool pud_user_accessible_page(pud_t pud)
>  {
> -       return pud_leaf(pud) && pud_user(pud);
> +       return pud_leaf(pud) && pud_user(pud) && pud_valid(pud);

This looks closer to x86 where the check is directly: pmd_val(pmd) &
_PAGE_PRESENT, without PTE_PROT_NONE that is part of pmd_present()

Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com>

Thanks,
Pasha
  
Mark Rutland Nov. 16, 2022, 3:46 p.m. UTC | #3
On Wed, Nov 16, 2022 at 10:08:27AM +0100, David Hildenbrand wrote:
> On 16.11.22 09:38, Liu Shixin wrote:
> > The page table check trigger BUG_ON() unexpectedly when split hugepage:
> > 
> >   ------------[ cut here ]------------
> >   kernel BUG at mm/page_table_check.c:119!
> >   Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
> >   Dumping ftrace buffer:
> >      (ftrace buffer empty)
> >   Modules linked in:
> >   CPU: 7 PID: 210 Comm: transhuge-stres Not tainted 6.1.0-rc3+ #748
> >   Hardware name: linux,dummy-virt (DT)
> >   pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >   pc : page_table_check_set.isra.0+0x398/0x468
> >   lr : page_table_check_set.isra.0+0x1c0/0x468
> > [...]
> >   Call trace:
> >    page_table_check_set.isra.0+0x398/0x468
> >    __page_table_check_pte_set+0x160/0x1c0
> >    __split_huge_pmd_locked+0x900/0x1648
> >    __split_huge_pmd+0x28c/0x3b8
> >    unmap_page_range+0x428/0x858
> >    unmap_single_vma+0xf4/0x1c8
> >    zap_page_range+0x2b0/0x410
> >    madvise_vma_behavior+0xc44/0xe78
> >    do_madvise+0x280/0x698
> >    __arm64_sys_madvise+0x90/0xe8
> >    invoke_syscall.constprop.0+0xdc/0x1d8
> >    do_el0_svc+0xf4/0x3f8
> >    el0_svc+0x58/0x120
> >    el0t_64_sync_handler+0xb8/0xc0
> >    el0t_64_sync+0x19c/0x1a0
> > [...]
> > 
> > On arm64, pmd_present() will return true even if the pmd is invalid.
> 
> I assume that's because of the pmd_present_invalid() check.
> 
> ... I wonder why that behavior was chosen. Sounds error-prone to me.

That seems to be down to commit:

  b65399f6111b03df ("arm64/mm: Change THP helpers to comply with generic MM semantics")

... apparently because Andrea Arcangelli said this was necessary in:

  https://lore.kernel.org/lkml/20181017020930.GN30832@redhat.com/

... but that does see to contradict what's said in:

  Documentation/mm/arch_pgtable_helpers.rst

... which just says:

  pmd_present  Tests a valid mapped PMD 

... and it's not clear to me why this *only* applies to the PMD level.

Anshuman?

Thanks,
Mark.
  
Mark Rutland Nov. 16, 2022, 3:52 p.m. UTC | #4
On Wed, Nov 16, 2022 at 04:38:11PM +0800, Liu Shixin wrote:
> The page table check trigger BUG_ON() unexpectedly when split hugepage:
> 
>  ------------[ cut here ]------------
>  kernel BUG at mm/page_table_check.c:119!
>  Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>  Dumping ftrace buffer:
>     (ftrace buffer empty)
>  Modules linked in:
>  CPU: 7 PID: 210 Comm: transhuge-stres Not tainted 6.1.0-rc3+ #748
>  Hardware name: linux,dummy-virt (DT)
>  pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>  pc : page_table_check_set.isra.0+0x398/0x468
>  lr : page_table_check_set.isra.0+0x1c0/0x468
> [...]
>  Call trace:
>   page_table_check_set.isra.0+0x398/0x468
>   __page_table_check_pte_set+0x160/0x1c0
>   __split_huge_pmd_locked+0x900/0x1648
>   __split_huge_pmd+0x28c/0x3b8
>   unmap_page_range+0x428/0x858
>   unmap_single_vma+0xf4/0x1c8
>   zap_page_range+0x2b0/0x410
>   madvise_vma_behavior+0xc44/0xe78
>   do_madvise+0x280/0x698
>   __arm64_sys_madvise+0x90/0xe8
>   invoke_syscall.constprop.0+0xdc/0x1d8
>   do_el0_svc+0xf4/0x3f8
>   el0_svc+0x58/0x120
>   el0t_64_sync_handler+0xb8/0xc0
>   el0t_64_sync+0x19c/0x1a0
> [...]
> 
> On arm64, pmd_present() will return true even if the pmd is invalid. So
> in pmdp_invalidate() the file_map_count will not only decrease once but
> also increase once. Then in set_pte_at(), the file_map_count increase
> again, and so trigger BUG_ON() unexpectedly.

It's not clear to me how pmd_present() relates to p?d_user_accessible_page()
below. How are they related? (or is this a copy-paste error)?

> Fix this problem by adding pmd_valid() in pmd_user_accessible_page().
> Moreover, add pud_valid() for pud_user_accessible_page() too.
> 
> Fixes: 42b2547137f5 ("arm64/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK")
> Reported-by: Denys Vlasenko <dvlasenk@redhat.com>
> Signed-off-by: Liu Shixin <liushixin2@huawei.com>
> ---
>  arch/arm64/include/asm/pgtable.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index edf6625ce965..56e178de75e7 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -863,12 +863,12 @@ static inline bool pte_user_accessible_page(pte_t pte)
>  
>  static inline bool pmd_user_accessible_page(pmd_t pmd)
>  {
> -	return pmd_leaf(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
> +	return pmd_leaf(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd)) && pmd_valid(pmd);
>  }
>  
>  static inline bool pud_user_accessible_page(pud_t pud)
>  {
> -	return pud_leaf(pud) && pud_user(pud);
> +	return pud_leaf(pud) && pud_user(pud) && pud_valid(pud);

I think these p?d_valid() checks should be first for clarity, since the other
bits aren't necessarily meaningful for !valid entries.

Thanks,
Mark.
  
Liu Shixin Nov. 17, 2022, 3:15 a.m. UTC | #5
On 2022/11/16 23:52, Mark Rutland wrote:
> On Wed, Nov 16, 2022 at 04:38:11PM +0800, Liu Shixin wrote:
>> The page table check trigger BUG_ON() unexpectedly when split hugepage:
>>
>>  ------------[ cut here ]------------
>>  kernel BUG at mm/page_table_check.c:119!
>>  Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>>  Dumping ftrace buffer:
>>     (ftrace buffer empty)
>>  Modules linked in:
>>  CPU: 7 PID: 210 Comm: transhuge-stres Not tainted 6.1.0-rc3+ #748
>>  Hardware name: linux,dummy-virt (DT)
>>  pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>  pc : page_table_check_set.isra.0+0x398/0x468
>>  lr : page_table_check_set.isra.0+0x1c0/0x468
>> [...]
>>  Call trace:
>>   page_table_check_set.isra.0+0x398/0x468
>>   __page_table_check_pte_set+0x160/0x1c0
>>   __split_huge_pmd_locked+0x900/0x1648
>>   __split_huge_pmd+0x28c/0x3b8
>>   unmap_page_range+0x428/0x858
>>   unmap_single_vma+0xf4/0x1c8
>>   zap_page_range+0x2b0/0x410
>>   madvise_vma_behavior+0xc44/0xe78
>>   do_madvise+0x280/0x698
>>   __arm64_sys_madvise+0x90/0xe8
>>   invoke_syscall.constprop.0+0xdc/0x1d8
>>   do_el0_svc+0xf4/0x3f8
>>   el0_svc+0x58/0x120
>>   el0t_64_sync_handler+0xb8/0xc0
>>   el0t_64_sync+0x19c/0x1a0
>> [...]
>>
>> On arm64, pmd_present() will return true even if the pmd is invalid. So
>> in pmdp_invalidate() the file_map_count will not only decrease once but
>> also increase once. Then in set_pte_at(), the file_map_count increase
>> again, and so trigger BUG_ON() unexpectedly.
> It's not clear to me how pmd_present() relates to p?d_user_accessible_page()
> below. How are they related? (or is this a copy-paste error)?
Yes, should be pmd_leaf(). In the previous patch, pmd_present() has already replaced with pmd_leaf().
Thanks for your careful discovery. Will fix in next version.
>> Fix this problem by adding pmd_valid() in pmd_user_accessible_page().
>> Moreover, add pud_valid() for pud_user_accessible_page() too.
>>
>> Fixes: 42b2547137f5 ("arm64/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK")
>> Reported-by: Denys Vlasenko <dvlasenk@redhat.com>
>> Signed-off-by: Liu Shixin <liushixin2@huawei.com>
>> ---
>>  arch/arm64/include/asm/pgtable.h | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index edf6625ce965..56e178de75e7 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -863,12 +863,12 @@ static inline bool pte_user_accessible_page(pte_t pte)
>>  
>>  static inline bool pmd_user_accessible_page(pmd_t pmd)
>>  {
>> -	return pmd_leaf(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
>> +	return pmd_leaf(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd)) && pmd_valid(pmd);
>>  }
>>  
>>  static inline bool pud_user_accessible_page(pud_t pud)
>>  {
>> -	return pud_leaf(pud) && pud_user(pud);
>> +	return pud_leaf(pud) && pud_user(pud) && pud_valid(pud);
> I think these p?d_valid() checks should be first for clarity, since the other
> bits aren't necessarily meaningful for !valid entries.
Thanks for your advice.
>
> Thanks,
> Mark.
>
> .
>
  
Anshuman Khandual Nov. 17, 2022, 4:24 a.m. UTC | #6
On 11/16/22 21:16, Mark Rutland wrote:
> On Wed, Nov 16, 2022 at 10:08:27AM +0100, David Hildenbrand wrote:
>> On 16.11.22 09:38, Liu Shixin wrote:
>>> The page table check trigger BUG_ON() unexpectedly when split hugepage:
>>>
>>>   ------------[ cut here ]------------
>>>   kernel BUG at mm/page_table_check.c:119!
>>>   Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>>>   Dumping ftrace buffer:
>>>      (ftrace buffer empty)
>>>   Modules linked in:
>>>   CPU: 7 PID: 210 Comm: transhuge-stres Not tainted 6.1.0-rc3+ #748
>>>   Hardware name: linux,dummy-virt (DT)
>>>   pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>   pc : page_table_check_set.isra.0+0x398/0x468
>>>   lr : page_table_check_set.isra.0+0x1c0/0x468
>>> [...]
>>>   Call trace:
>>>    page_table_check_set.isra.0+0x398/0x468
>>>    __page_table_check_pte_set+0x160/0x1c0
>>>    __split_huge_pmd_locked+0x900/0x1648
>>>    __split_huge_pmd+0x28c/0x3b8
>>>    unmap_page_range+0x428/0x858
>>>    unmap_single_vma+0xf4/0x1c8
>>>    zap_page_range+0x2b0/0x410
>>>    madvise_vma_behavior+0xc44/0xe78
>>>    do_madvise+0x280/0x698
>>>    __arm64_sys_madvise+0x90/0xe8
>>>    invoke_syscall.constprop.0+0xdc/0x1d8
>>>    do_el0_svc+0xf4/0x3f8
>>>    el0_svc+0x58/0x120
>>>    el0t_64_sync_handler+0xb8/0xc0
>>>    el0t_64_sync+0x19c/0x1a0
>>> [...]
>>>
>>> On arm64, pmd_present() will return true even if the pmd is invalid.
>>
>> I assume that's because of the pmd_present_invalid() check.
>>
>> ... I wonder why that behavior was chosen. Sounds error-prone to me.
> 
> That seems to be down to commit:
> 
>   b65399f6111b03df ("arm64/mm: Change THP helpers to comply with generic MM semantics")
> 
> ... apparently because Andrea Arcangelli said this was necessary in:
> 
>   https://lore.kernel.org/lkml/20181017020930.GN30832@redhat.com/
> 
> ... but that does see to contradict what's said in:
> 
>   Documentation/mm/arch_pgtable_helpers.rst
> 
> ... which just says:
> 
>   pmd_present  Tests a valid mapped PMD

It should be as follows instead, will update. Not sure about PUD level though,
where anon THP is not supported (AFAIK).

+---------------------------+--------------------------------------------------+
| pmd_present               | Tests if pmd_page() points to valid memory page  |
+---------------------------+--------------------------------------------------+

> 
> ... and it's not clear to me why this *only* applies to the PMD level.
> 
> Anshuman?

Because THP is supported at PMD level. As Andrea had explained earlier, pmd_present()
should return positive if pmd_page() on the entry points to valid memory irrespective
of whether the entry is valid/mapped or not. That is the semantics expected in generic
THP during PMD split, collapse, migration etc and other memory code walking past such
PMD entries. That was my understanding.
  

Patch

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index edf6625ce965..56e178de75e7 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -863,12 +863,12 @@  static inline bool pte_user_accessible_page(pte_t pte)
 
 static inline bool pmd_user_accessible_page(pmd_t pmd)
 {
-	return pmd_leaf(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
+	return pmd_leaf(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd)) && pmd_valid(pmd);
 }
 
 static inline bool pud_user_accessible_page(pud_t pud)
 {
-	return pud_leaf(pud) && pud_user(pud);
+	return pud_leaf(pud) && pud_user(pud) && pud_valid(pud);
 }
 #endif