arm64/mm: Intercept pfn changes in set_pte_at()
Commit Message
Changing pfn on a user page table mapped entry, without first going through
break-before-make (BBM) procedure is unsafe. This just updates set_pte_at()
to intercept such changes, via an updated pgattr_change_is_safe(). This new
check happens via __check_racy_pte_update(), which has now been renamed as
__check_safe_pte_update().
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
This applies on v6.1-rc4
arch/arm64/include/asm/pgtable.h | 8 ++++++--
arch/arm64/mm/mmu.c | 8 +++++++-
2 files changed, 13 insertions(+), 3 deletions(-)
Comments
On Wed, Nov 16, 2022 at 08:40:01AM +0530, Anshuman Khandual wrote:
> Changing pfn on a user page table mapped entry, without first going through
> break-before-make (BBM) procedure is unsafe. This just updates set_pte_at()
> to intercept such changes, via an updated pgattr_change_is_safe(). This new
> check happens via __check_racy_pte_update(), which has now been renamed as
> __check_safe_pte_update().
>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
> This applies on v6.1-rc4
>
> arch/arm64/include/asm/pgtable.h | 8 ++++++--
> arch/arm64/mm/mmu.c | 8 +++++++-
> 2 files changed, 13 insertions(+), 3 deletions(-)
I remember Mark saying that BBM is sometimes violated by the core code in
cases where the pte isn't actually part of a live pgtable (e.g. if it's on
the stack or part of a newly allocated table). Won't that cause false
positives here?
Will
On 11/18/22 19:43, Will Deacon wrote:
> On Wed, Nov 16, 2022 at 08:40:01AM +0530, Anshuman Khandual wrote:
>> Changing pfn on a user page table mapped entry, without first going through
>> break-before-make (BBM) procedure is unsafe. This just updates set_pte_at()
>> to intercept such changes, via an updated pgattr_change_is_safe(). This new
>> check happens via __check_racy_pte_update(), which has now been renamed as
>> __check_safe_pte_update().
>>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>> This applies on v6.1-rc4
>>
>> arch/arm64/include/asm/pgtable.h | 8 ++++++--
>> arch/arm64/mm/mmu.c | 8 +++++++-
>> 2 files changed, 13 insertions(+), 3 deletions(-)
>
> I remember Mark saying that BBM is sometimes violated by the core code in
> cases where the pte isn't actually part of a live pgtable (e.g. if it's on
> the stack or part of a newly allocated table). Won't that cause false
> positives here?
Could you please elaborate ? If the pte is not on a live page table, then
pte_valid() will return negative on such entries. So any update there will
be safe. I am wondering, how this change will cause false positives which
would not have been possible earlier.
On Tue, Nov 22, 2022 at 01:43:17PM +0530, Anshuman Khandual wrote:
>
>
> On 11/18/22 19:43, Will Deacon wrote:
> > On Wed, Nov 16, 2022 at 08:40:01AM +0530, Anshuman Khandual wrote:
> >> Changing pfn on a user page table mapped entry, without first going through
> >> break-before-make (BBM) procedure is unsafe. This just updates set_pte_at()
> >> to intercept such changes, via an updated pgattr_change_is_safe(). This new
> >> check happens via __check_racy_pte_update(), which has now been renamed as
> >> __check_safe_pte_update().
> >>
> >> Cc: Catalin Marinas <catalin.marinas@arm.com>
> >> Cc: Will Deacon <will@kernel.org>
> >> Cc: Mark Rutland <mark.rutland@arm.com>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> Cc: linux-arm-kernel@lists.infradead.org
> >> Cc: linux-kernel@vger.kernel.org
> >> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> >> ---
> >> This applies on v6.1-rc4
> >>
> >> arch/arm64/include/asm/pgtable.h | 8 ++++++--
> >> arch/arm64/mm/mmu.c | 8 +++++++-
> >> 2 files changed, 13 insertions(+), 3 deletions(-)
> >
> > I remember Mark saying that BBM is sometimes violated by the core code in
> > cases where the pte isn't actually part of a live pgtable (e.g. if it's on
> > the stack or part of a newly allocated table). Won't that cause false
> > positives here?
>
> Could you please elaborate ? If the pte is not on a live page table, then
> pte_valid() will return negative on such entries. So any update there will
> be safe. I am wondering, how this change will cause false positives which
> would not have been possible earlier.
I don't think pte_valid() will always return false for these entries.
Consider, for example, ptes which are valid but which live in a table that
is not reachable by the MMU. I think this is what Mark had in mind, but it
would be helpful if he could chime in with the specific example he ran into.
Will
On Tue, Nov 22, 2022 at 09:57:49AM +0000, Will Deacon wrote:
> On Tue, Nov 22, 2022 at 01:43:17PM +0530, Anshuman Khandual wrote:
> >
> >
> > On 11/18/22 19:43, Will Deacon wrote:
> > > On Wed, Nov 16, 2022 at 08:40:01AM +0530, Anshuman Khandual wrote:
> > >> Changing pfn on a user page table mapped entry, without first going through
> > >> break-before-make (BBM) procedure is unsafe. This just updates set_pte_at()
> > >> to intercept such changes, via an updated pgattr_change_is_safe(). This new
> > >> check happens via __check_racy_pte_update(), which has now been renamed as
> > >> __check_safe_pte_update().
> > >>
> > >> Cc: Catalin Marinas <catalin.marinas@arm.com>
> > >> Cc: Will Deacon <will@kernel.org>
> > >> Cc: Mark Rutland <mark.rutland@arm.com>
> > >> Cc: Andrew Morton <akpm@linux-foundation.org>
> > >> Cc: linux-arm-kernel@lists.infradead.org
> > >> Cc: linux-kernel@vger.kernel.org
> > >> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> > >> ---
> > >> This applies on v6.1-rc4
> > >>
> > >> arch/arm64/include/asm/pgtable.h | 8 ++++++--
> > >> arch/arm64/mm/mmu.c | 8 +++++++-
> > >> 2 files changed, 13 insertions(+), 3 deletions(-)
> > >
> > > I remember Mark saying that BBM is sometimes violated by the core code in
> > > cases where the pte isn't actually part of a live pgtable (e.g. if it's on
> > > the stack or part of a newly allocated table). Won't that cause false
> > > positives here?
> >
> > Could you please elaborate ? If the pte is not on a live page table, then
> > pte_valid() will return negative on such entries. So any update there will
> > be safe. I am wondering, how this change will cause false positives which
> > would not have been possible earlier.
>
> I don't think pte_valid() will always return false for these entries.
> Consider, for example, ptes which are valid but which live in a table that
> is not reachable by the MMU. I think this is what Mark had in mind, but it
> would be helpful if he could chime in with the specific example he ran into.
Yup -- that was the case I had in mind. IIRC I hit that in the past when trying
to do something similar, but I can't recall exactly where that was. I suspect
that was probably to do with page migration or huge page splitting/merging.
Looking around, at least __split_huge_zero_page_pmd() and
__split_huge_pmd_locked() do something like that, creating a temporary pmd
entry on the stack, populating a table of non-live but valid ptes, then
plumbing it into the real pmd.
We'd need to check that there aren't other cases like that.
Thanks,
Mark.
On 11/22/22 16:41, Mark Rutland wrote:
> On Tue, Nov 22, 2022 at 09:57:49AM +0000, Will Deacon wrote:
>> On Tue, Nov 22, 2022 at 01:43:17PM +0530, Anshuman Khandual wrote:
>>>
>>>
>>> On 11/18/22 19:43, Will Deacon wrote:
>>>> On Wed, Nov 16, 2022 at 08:40:01AM +0530, Anshuman Khandual wrote:
>>>>> Changing pfn on a user page table mapped entry, without first going through
>>>>> break-before-make (BBM) procedure is unsafe. This just updates set_pte_at()
>>>>> to intercept such changes, via an updated pgattr_change_is_safe(). This new
>>>>> check happens via __check_racy_pte_update(), which has now been renamed as
>>>>> __check_safe_pte_update().
>>>>>
>>>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>>>> Cc: Will Deacon <will@kernel.org>
>>>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>> Cc: linux-arm-kernel@lists.infradead.org
>>>>> Cc: linux-kernel@vger.kernel.org
>>>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>>>>> ---
>>>>> This applies on v6.1-rc4
>>>>>
>>>>> arch/arm64/include/asm/pgtable.h | 8 ++++++--
>>>>> arch/arm64/mm/mmu.c | 8 +++++++-
>>>>> 2 files changed, 13 insertions(+), 3 deletions(-)
>>>>
>>>> I remember Mark saying that BBM is sometimes violated by the core code in
>>>> cases where the pte isn't actually part of a live pgtable (e.g. if it's on
>>>> the stack or part of a newly allocated table). Won't that cause false
>>>> positives here?
>>>
>>> Could you please elaborate ? If the pte is not on a live page table, then
>>> pte_valid() will return negative on such entries. So any update there will
>>> be safe. I am wondering, how this change will cause false positives which
>>> would not have been possible earlier.
>>
>> I don't think pte_valid() will always return false for these entries.
>> Consider, for example, ptes which are valid but which live in a table that
>> is not reachable by the MMU. I think this is what Mark had in mind, but it
>> would be helpful if he could chime in with the specific example he ran into.
>
> Yup -- that was the case I had in mind. IIRC I hit that in the past when trying
> to do something similar, but I can't recall exactly where that was. I suspect
> that was probably to do with page migration or huge page splitting/merging.
>
> Looking around, at least __split_huge_zero_page_pmd() and
> __split_huge_pmd_locked() do something like that, creating a temporary pmd
> entry on the stack, populating a table of non-live but valid ptes, then
> plumbing it into the real pmd.
In both cases i.e __split_huge_zero_page_pmd() and __split_huge_pmd_locked(), the
entry is first asserted to be empty via pte_none(), before writing a new value in
there. set_pte_at() would still consider such updates safe because pfn_valid(old)
will return negative on such entries.
VM_BUG_ON(!pte_none(*pte));
set_pte_at(mm, haddr, pte, entry);
But if these entries still get updated yet again (while still being inactive) with
new pte values, then set_pte_at() would complain for the pfn update on the entry,
while being "valid". But is this a viable scenario ?
>
> We'd need to check that there aren't other cases like that.
>
Sure, might be some what tricky but anything in particular to be looked into ? I
guess if this change gets into a CI system which runs all memory stress tests for
long enough with CONFIG_DEBUG_VM enabled, we might get some more clue if there
are other similar scenarios possible.
@@ -275,6 +275,7 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
}
extern void __sync_icache_dcache(pte_t pteval);
+bool pgattr_change_is_safe(u64 old, u64 new);
/*
* PTE bits configuration in the presence of hardware Dirty Bit Management
@@ -292,7 +293,7 @@ extern void __sync_icache_dcache(pte_t pteval);
* PTE_DIRTY || (PTE_WRITE && !PTE_RDONLY)
*/
-static inline void __check_racy_pte_update(struct mm_struct *mm, pte_t *ptep,
+static inline void __check_safe_pte_update(struct mm_struct *mm, pte_t *ptep,
pte_t pte)
{
pte_t old_pte;
@@ -318,6 +319,9 @@ static inline void __check_racy_pte_update(struct mm_struct *mm, pte_t *ptep,
VM_WARN_ONCE(pte_write(old_pte) && !pte_dirty(pte),
"%s: racy dirty state clearing: 0x%016llx -> 0x%016llx",
__func__, pte_val(old_pte), pte_val(pte));
+ VM_WARN_ONCE(!pgattr_change_is_safe(pte_val(old_pte), pte_val(pte)),
+ "%s: unsafe attribute change: 0x%016llx -> 0x%016llx",
+ __func__, pte_val(old_pte), pte_val(pte));
}
static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
@@ -346,7 +350,7 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
mte_sync_tags(old_pte, pte);
}
- __check_racy_pte_update(mm, ptep, pte);
+ __check_safe_pte_update(mm, ptep, pte);
set_pte(ptep, pte);
}
@@ -133,7 +133,7 @@ static phys_addr_t __init early_pgtable_alloc(int shift)
return phys;
}
-static bool pgattr_change_is_safe(u64 old, u64 new)
+bool pgattr_change_is_safe(u64 old, u64 new)
{
/*
* The following mapping attributes may be updated in live
@@ -145,6 +145,12 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
if (old == 0 || new == 0)
return true;
+ /* If old and new ptes are valid, pfn should not change */
+ if (pte_valid(__pte(old)) && pte_valid(__pte(new))) {
+ if (pte_pfn(__pte(old)) != pte_pfn(__pte(new)))
+ return false;
+ }
+
/* live contiguous mappings may not be manipulated at all */
if ((old | new) & PTE_CONT)
return false;