[v6,11/41] mm: Introduce pte_mkwrite_kernel()
Commit Message
The x86 Control-flow Enforcement Technology (CET) feature includes a new
type of memory called shadow stack. This shadow stack memory has some
unusual properties, which requires some core mm changes to function
properly.
One of these changes is to allow for pte_mkwrite() to create different
types of writable memory (the existing conventionally writable type and
also the new shadow stack type). Future patches will convert pte_mkwrite()
to take a VMA in order to facilitate this, however there are places in the
kernel where pte_mkwrite() is called outside of the context of a VMA.
These are for kernel memory. So create a new variant called
pte_mkwrite_kernel() and switch the kernel users over to it. Have
pte_mkwrite() and pte_mkwrite_kernel() be the same for now. Future patches
will introduce changes to make pte_mkwrite() take a VMA.
Only do this for architectures that need it because they call pte_mkwrite()
in arch code without an associated VMA. Since it will only currently be
used in arch code, so do not include it in arch_pgtable_helpers.rst.
Cc: linux-doc@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-s390@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
Hi Non-x86 Arch’s,
x86 has a feature that allows for the creation of a special type of
writable memory (shadow stack) that is only writable in limited specific
ways. Previously, changes were proposed to core MM code to teach it to
decide when to create normally writable memory or the special shadow stack
writable memory, but David Hildenbrand suggested[0] to change
pXX_mkwrite() to take a VMA, so awareness of shadow stack memory can be
moved into x86 code.
Since pXX_mkwrite() is defined in every arch, it requires some tree-wide
changes. So that is why you are seeing some patches out of a big x86
series pop up in your arch mailing list. There is no functional change.
After this refactor, the shadow stack series goes on to use the arch
helpers to push shadow stack memory details inside arch/x86.
Testing was just 0-day build testing.
Hopefully that is enough context. Thanks!
[0] https://lore.kernel.org/lkml/0e29a2d0-08d8-bcd6-ff26-4bea0e4037b0@redhat.com/#t
v6:
- New patch
---
arch/arm64/include/asm/pgtable.h | 7 ++++++-
arch/arm64/mm/trans_pgd.c | 4 ++--
arch/s390/include/asm/pgtable.h | 7 ++++++-
arch/s390/mm/pageattr.c | 2 +-
arch/x86/include/asm/pgtable.h | 7 ++++++-
arch/x86/xen/mmu_pv.c | 2 +-
6 files changed, 22 insertions(+), 7 deletions(-)
Comments
On Sat, Feb 18, 2023 at 01:14:03PM -0800, Rick Edgecombe wrote:
> The x86 Control-flow Enforcement Technology (CET) feature includes a new
> type of memory called shadow stack. This shadow stack memory has some
> unusual properties, which requires some core mm changes to function
> properly.
>
> One of these changes is to allow for pte_mkwrite() to create different
> types of writable memory (the existing conventionally writable type and
> also the new shadow stack type). Future patches will convert pte_mkwrite()
> to take a VMA in order to facilitate this, however there are places in the
> kernel where pte_mkwrite() is called outside of the context of a VMA.
> These are for kernel memory. So create a new variant called
> pte_mkwrite_kernel() and switch the kernel users over to it. Have
> pte_mkwrite() and pte_mkwrite_kernel() be the same for now. Future patches
> will introduce changes to make pte_mkwrite() take a VMA.
>
> Only do this for architectures that need it because they call pte_mkwrite()
> in arch code without an associated VMA. Since it will only currently be
> used in arch code, so do not include it in arch_pgtable_helpers.rst.
>
> Cc: linux-doc@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-s390@vger.kernel.org
> Cc: xen-devel@lists.xenproject.org
> Cc: linux-arch@vger.kernel.org
> Cc: linux-mm@kvack.org
> Tested-by: Pengfei Xu <pengfei.xu@intel.com>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
I think it's a little weird that it's the only PTE helper taking a vma,
but it does seem like the right approach.
Reviewed-by: Kees Cook <keescook@chromium.org>
On 19.02.23 21:38, Kees Cook wrote:
> On Sat, Feb 18, 2023 at 01:14:03PM -0800, Rick Edgecombe wrote:
>> The x86 Control-flow Enforcement Technology (CET) feature includes a new
>> type of memory called shadow stack. This shadow stack memory has some
>> unusual properties, which requires some core mm changes to function
>> properly.
>>
>> One of these changes is to allow for pte_mkwrite() to create different
>> types of writable memory (the existing conventionally writable type and
>> also the new shadow stack type). Future patches will convert pte_mkwrite()
>> to take a VMA in order to facilitate this, however there are places in the
>> kernel where pte_mkwrite() is called outside of the context of a VMA.
>> These are for kernel memory. So create a new variant called
>> pte_mkwrite_kernel() and switch the kernel users over to it. Have
>> pte_mkwrite() and pte_mkwrite_kernel() be the same for now. Future patches
>> will introduce changes to make pte_mkwrite() take a VMA.
>>
>> Only do this for architectures that need it because they call pte_mkwrite()
>> in arch code without an associated VMA. Since it will only currently be
>> used in arch code, so do not include it in arch_pgtable_helpers.rst.
>>
>> Cc: linux-doc@vger.kernel.org
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linux-s390@vger.kernel.org
>> Cc: xen-devel@lists.xenproject.org
>> Cc: linux-arch@vger.kernel.org
>> Cc: linux-mm@kvack.org
>> Tested-by: Pengfei Xu <pengfei.xu@intel.com>
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
>
> I think it's a little weird that it's the only PTE helper taking a vma,
> but it does seem like the right approach.
Right. We could pass the vm flags instead, but not sure if that really
improves the situation. So unless someone has a better idea, this LGTM.
On 18.02.23 22:14, Rick Edgecombe wrote:
> The x86 Control-flow Enforcement Technology (CET) feature includes a new
> type of memory called shadow stack. This shadow stack memory has some
> unusual properties, which requires some core mm changes to function
> properly.
>
> One of these changes is to allow for pte_mkwrite() to create different
> types of writable memory (the existing conventionally writable type and
> also the new shadow stack type). Future patches will convert pte_mkwrite()
> to take a VMA in order to facilitate this, however there are places in the
> kernel where pte_mkwrite() is called outside of the context of a VMA.
> These are for kernel memory. So create a new variant called
> pte_mkwrite_kernel() and switch the kernel users over to it. Have
> pte_mkwrite() and pte_mkwrite_kernel() be the same for now. Future patches
> will introduce changes to make pte_mkwrite() take a VMA.
>
> Only do this for architectures that need it because they call pte_mkwrite()
> in arch code without an associated VMA. Since it will only currently be
> used in arch code, so do not include it in arch_pgtable_helpers.rst.
>
> Cc: linux-doc@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-s390@vger.kernel.org
> Cc: xen-devel@lists.xenproject.org
> Cc: linux-arch@vger.kernel.org
> Cc: linux-mm@kvack.org
> Tested-by: Pengfei Xu <pengfei.xu@intel.com>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
>
Acked-by: David Hildenbrand <david@redhat.com>
Do we also have to care about pmd_mkwrite() ?
On Sat, Feb 18, 2023 at 01:14:03PM -0800, Rick Edgecombe wrote:
>The x86 Control-flow Enforcement Technology (CET) feature includes a new
>type of memory called shadow stack. This shadow stack memory has some
>unusual properties, which requires some core mm changes to function
>properly.
>
>One of these changes is to allow for pte_mkwrite() to create different
>types of writable memory (the existing conventionally writable type and
>also the new shadow stack type). Future patches will convert pte_mkwrite()
>to take a VMA in order to facilitate this, however there are places in the
>kernel where pte_mkwrite() is called outside of the context of a VMA.
>These are for kernel memory. So create a new variant called
>pte_mkwrite_kernel() and switch the kernel users over to it. Have
>pte_mkwrite() and pte_mkwrite_kernel() be the same for now. Future patches
>will introduce changes to make pte_mkwrite() take a VMA.
>
>Only do this for architectures that need it because they call pte_mkwrite()
>in arch code without an associated VMA. Since it will only currently be
>used in arch code, so do not include it in arch_pgtable_helpers.rst.
>
>Cc: linux-doc@vger.kernel.org
>Cc: linux-arm-kernel@lists.infradead.org
>Cc: linux-s390@vger.kernel.org
>Cc: xen-devel@lists.xenproject.org
>Cc: linux-arch@vger.kernel.org
>Cc: linux-mm@kvack.org
>Tested-by: Pengfei Xu <pengfei.xu@intel.com>
>Suggested-by: David Hildenbrand <david@redhat.com>
>Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
>
Acked-by: Deepak Gupta <debug@rivosinc.com>
@@ -180,13 +180,18 @@ static inline pmd_t set_pmd_bit(pmd_t pmd, pgprot_t prot)
return pmd;
}
-static inline pte_t pte_mkwrite(pte_t pte)
+static inline pte_t pte_mkwrite_kernel(pte_t pte)
{
pte = set_pte_bit(pte, __pgprot(PTE_WRITE));
pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));
return pte;
}
+static inline pte_t pte_mkwrite(pte_t pte)
+{
+ return pte_mkwrite_kernel(pte);
+}
+
static inline pte_t pte_mkclean(pte_t pte)
{
pte = clear_pte_bit(pte, __pgprot(PTE_DIRTY));
@@ -40,7 +40,7 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
* read only (code, rodata). Clear the RDONLY bit from
* the temporary mappings we use during restore.
*/
- set_pte(dst_ptep, pte_mkwrite(pte));
+ set_pte(dst_ptep, pte_mkwrite_kernel(pte));
} else if (debug_pagealloc_enabled() && !pte_none(pte)) {
/*
* debug_pagealloc will removed the PTE_VALID bit if
@@ -53,7 +53,7 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
*/
BUG_ON(!pfn_valid(pte_pfn(pte)));
- set_pte(dst_ptep, pte_mkpresent(pte_mkwrite(pte)));
+ set_pte(dst_ptep, pte_mkpresent(pte_mkwrite_kernel(pte)));
}
}
@@ -991,7 +991,7 @@ static inline pte_t pte_wrprotect(pte_t pte)
return set_pte_bit(pte, __pgprot(_PAGE_PROTECT));
}
-static inline pte_t pte_mkwrite(pte_t pte)
+static inline pte_t pte_mkwrite_kernel(pte_t pte)
{
pte = set_pte_bit(pte, __pgprot(_PAGE_WRITE));
if (pte_val(pte) & _PAGE_DIRTY)
@@ -999,6 +999,11 @@ static inline pte_t pte_mkwrite(pte_t pte)
return pte;
}
+static inline pte_t pte_mkwrite(pte_t pte)
+{
+ return pte_mkwrite_kernel(pte);
+}
+
static inline pte_t pte_mkclean(pte_t pte)
{
pte = clear_pte_bit(pte, __pgprot(_PAGE_DIRTY));
@@ -96,7 +96,7 @@ static int walk_pte_level(pmd_t *pmdp, unsigned long addr, unsigned long end,
if (flags & SET_MEMORY_RO)
new = pte_wrprotect(new);
else if (flags & SET_MEMORY_RW)
- new = pte_mkwrite(pte_mkdirty(new));
+ new = pte_mkwrite_kernel(pte_mkdirty(new));
if (flags & SET_MEMORY_NX)
new = set_pte_bit(new, __pgprot(_PAGE_NOEXEC));
else if (flags & SET_MEMORY_X)
@@ -364,11 +364,16 @@ static inline pte_t pte_mkyoung(pte_t pte)
return pte_set_flags(pte, _PAGE_ACCESSED);
}
-static inline pte_t pte_mkwrite(pte_t pte)
+static inline pte_t pte_mkwrite_kernel(pte_t pte)
{
return pte_set_flags(pte, _PAGE_RW);
}
+static inline pte_t pte_mkwrite(pte_t pte)
+{
+ return pte_mkwrite_kernel(pte);
+}
+
static inline pte_t pte_mkhuge(pte_t pte)
{
return pte_set_flags(pte, _PAGE_PSE);
@@ -150,7 +150,7 @@ void make_lowmem_page_readwrite(void *vaddr)
if (pte == NULL)
return; /* vaddr missing */
- ptev = pte_mkwrite(*pte);
+ ptev = pte_mkwrite_kernel(*pte);
if (HYPERVISOR_update_va_mapping(address, ptev, 0))
BUG();