[v6,18/41] mm: Introduce VM_SHADOW_STACK for shadow stack memory

Message ID 20230218211433.26859-19-rick.p.edgecombe@intel.com
State New
Series Shadow stacks for userspace

Commit Message

Edgecombe, Rick P Feb. 18, 2023, 9:14 p.m. UTC
  From: Yu-cheng Yu <yu-cheng.yu@intel.com>

The x86 Control-flow Enforcement Technology (CET) feature includes a new
type of memory called shadow stack. This shadow stack memory has some
unusual properties, which requires some core mm changes to function
properly.

A shadow stack PTE must be read-only and have _PAGE_DIRTY set. However,
read-only and Dirty PTEs also exist for copy-on-write (COW) pages. These
two cases are handled differently for page faults. Introduce
VM_SHADOW_STACK to track shadow stack VMAs.
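
To illustrate the ambiguity described above, here is a minimal user-space
sketch, not kernel code. It assumes the x86 PTE layout (Write is bit 1, Dirty
is bit 6) and uses hypothetical `sk_`-prefixed names and a simplified VMA
struct invented for this example. It shows that the PTE bits alone cannot
separate a shadow stack page from a COW page, while a per-VMA flag can.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for this sketch; not the kernel's definitions. */
#define SK_PAGE_RW          (UINT64_C(1) << 1)   /* x86 PTE Write bit */
#define SK_PAGE_DIRTY       (UINT64_C(1) << 6)   /* x86 PTE Dirty bit */
#define SK_VM_SHADOW_STACK  (UINT64_C(1) << 37)  /* models VM_HIGH_ARCH_5 */

/* Simplified VMA: only the flags word matters here. */
struct sk_vma {
	uint64_t vm_flags;
};

/* Write=0, Dirty=1: the encoding shared by shadow stack and COW PTEs. */
static bool sk_pte_readonly_dirty(uint64_t pte)
{
	return !(pte & SK_PAGE_RW) && (pte & SK_PAGE_DIRTY);
}

/* The VMA flag is what tells the two cases apart. */
static bool sk_is_shadow_stack_mapping(const struct sk_vma *vma, uint64_t pte)
{
	return (vma->vm_flags & SK_VM_SHADOW_STACK) &&
	       sk_pte_readonly_dirty(pte);
}
```

With the same PTE value, `sk_is_shadow_stack_mapping()` returns true for a VMA
carrying the flag and false for an ordinary VMA, which is the distinction a
page fault handler would need.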

Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: John Allen <john.allen@amd.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Kees Cook <keescook@chromium.org>

---
v6:
 - Add comment about VM_SHADOW_STACK not being allowed with VM_SHARED
   (David Hildenbrand)

v3:
 - Drop arch specific change in arch_vma_name(). The memory can show as
   anonymous (Kirill)
 - Change CONFIG_ARCH_HAS_SHADOW_STACK to CONFIG_X86_USER_SHADOW_STACK
   in show_smap_vma_flags() (Boris)
---
 Documentation/filesystems/proc.rst | 1 +
 fs/proc/task_mmu.c                 | 3 +++
 include/linux/mm.h                 | 8 ++++++++
 3 files changed, 12 insertions(+)
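
For readers unfamiliar with the pattern this patch extends, the following is
a rough user-space sketch of a config-gated flag that falls back to "no bits"
and of a mnemonic table indexed by bit position, as used for the smaps
VmFlags line. All `SK_`/`sk_` names are hypothetical and the flag set is cut
down to three entries; only the "rd", "wr" and "ss" mnemonics and bit 37 come
from the kernel sources and this patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Comment this out to model a build without shadow stack support. */
#define SK_CONFIG_USER_SHADOW_STACK 1

#define SK_BIT_READ          0
#define SK_BIT_WRITE         1
#define SK_BIT_SHADOW_STACK  37

#define SK_VM_NONE   UINT64_C(0)
#define SK_VM_READ   (UINT64_C(1) << SK_BIT_READ)
#define SK_VM_WRITE  (UINT64_C(1) << SK_BIT_WRITE)

#ifdef SK_CONFIG_USER_SHADOW_STACK
# define SK_VM_SHADOW_STACK  (UINT64_C(1) << SK_BIT_SHADOW_STACK)
#else
# define SK_VM_SHADOW_STACK  SK_VM_NONE  /* flag tests become constant false */
#endif

/* Table indexed by bit number; unnamed bits stay NULL and are skipped. */
static const char *const sk_mnemonics[64] = {
	[SK_BIT_READ]  = "rd",
	[SK_BIT_WRITE] = "wr",
#ifdef SK_CONFIG_USER_SHADOW_STACK
	[SK_BIT_SHADOW_STACK] = "ss",
#endif
};

/* Write "xx " for each set, named flag into buf; return the bytes written. */
static size_t sk_format_flags(uint64_t flags, char *buf, size_t len)
{
	size_t used = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';

	for (unsigned int i = 0; i < 64; i++) {
		if (!(flags & (UINT64_C(1) << i)) || !sk_mnemonics[i])
			continue;
		int n = snprintf(buf + used, len - used, "%s ", sk_mnemonics[i]);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* stop on error or when the buffer is full */
		used += (size_t)n;
	}
	return used;
}
```

Defining the disabled case as an empty mask means callers can test the flag
unconditionally, without wrapping every use in an `#ifdef`.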
  

Comments

David Hildenbrand Feb. 20, 2023, 12:56 p.m. UTC | #1
On 18.02.23 22:14, Rick Edgecombe wrote:
> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> 
> The x86 Control-flow Enforcement Technology (CET) feature includes a new
> type of memory called shadow stack. This shadow stack memory has some
> unusual properties, which requires some core mm changes to function
> properly.
> 
> A shadow stack PTE must be read-only and have _PAGE_DIRTY set. However,
> read-only and Dirty PTEs also exist for copy-on-write (COW) pages. These
> two cases are handled differently for page faults. Introduce
> VM_SHADOW_STACK to track shadow stack VMAs.

I suggest simplifying and abstracting that description.

"New hardware extensions implement support for shadow stack memory, such 
as x86 Control-flow Enforcement Technology (CET). Let's add a new VM 
flag to identify these areas, for example, to be used to properly 
indicate shadow stack PTEs to the hardware."

> 
> Reviewed-by: Kees Cook <keescook@chromium.org>
> Tested-by: Pengfei Xu <pengfei.xu@intel.com>
> Tested-by: John Allen <john.allen@amd.com>
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> 
> ---
> v6:
>   - Add comment about VM_SHADOW_STACK not being allowed with VM_SHARED
>     (David Hildenbrand)

Might want to add some more meat to the patch description why that is 
the case.

> 
> v3:
>   - Drop arch specific change in arch_vma_name(). The memory can show as
>     anonymous (Kirill)
>   - Change CONFIG_ARCH_HAS_SHADOW_STACK to CONFIG_X86_USER_SHADOW_STACK
>     in show_smap_vma_flags() (Boris)
> ---
>   Documentation/filesystems/proc.rst | 1 +
>   fs/proc/task_mmu.c                 | 3 +++
>   include/linux/mm.h                 | 8 ++++++++
>   3 files changed, 12 insertions(+)
> 
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index e224b6d5b642..115843e8cce3 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -564,6 +564,7 @@ encoded manner. The codes are the following:
>       mt    arm64 MTE allocation tags are enabled
>       um    userfaultfd missing tracking
>       uw    userfaultfd wr-protect tracking
> +    ss    shadow stack page
>       ==    =======================================
>   
>   Note that there is no guarantee that every flag and associated mnemonic will
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index af1c49ae11b1..9e2cefe47749 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -711,6 +711,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
>   #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
>   		[ilog2(VM_UFFD_MINOR)]	= "ui",
>   #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
> +#ifdef CONFIG_X86_USER_SHADOW_STACK
> +		[ilog2(VM_SHADOW_STACK)] = "ss",
> +#endif
>   	};
>   	size_t i;
>   
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index e6f1789c8e69..76e0a09aeffe 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -315,11 +315,13 @@ extern unsigned int kobjsize(const void *objp);
>   #define VM_HIGH_ARCH_BIT_2	34	/* bit only usable on 64-bit architectures */
>   #define VM_HIGH_ARCH_BIT_3	35	/* bit only usable on 64-bit architectures */
>   #define VM_HIGH_ARCH_BIT_4	36	/* bit only usable on 64-bit architectures */
> +#define VM_HIGH_ARCH_BIT_5	37	/* bit only usable on 64-bit architectures */
>   #define VM_HIGH_ARCH_0	BIT(VM_HIGH_ARCH_BIT_0)
>   #define VM_HIGH_ARCH_1	BIT(VM_HIGH_ARCH_BIT_1)
>   #define VM_HIGH_ARCH_2	BIT(VM_HIGH_ARCH_BIT_2)
>   #define VM_HIGH_ARCH_3	BIT(VM_HIGH_ARCH_BIT_3)
>   #define VM_HIGH_ARCH_4	BIT(VM_HIGH_ARCH_BIT_4)
> +#define VM_HIGH_ARCH_5	BIT(VM_HIGH_ARCH_BIT_5)
>   #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
>   
>   #ifdef CONFIG_ARCH_HAS_PKEYS
> @@ -335,6 +337,12 @@ extern unsigned int kobjsize(const void *objp);
>   #endif
>   #endif /* CONFIG_ARCH_HAS_PKEYS */
>   
> +#ifdef CONFIG_X86_USER_SHADOW_STACK


Should we abstract this to CONFIG_ARCH_USER_SHADOW_STACK, seeing that 
other architectures might similarly need it?
  
Edgecombe, Rick P Feb. 20, 2023, 10:08 p.m. UTC | #2
On Mon, 2023-02-20 at 13:56 +0100, David Hildenbrand wrote:
> On 18.02.23 22:14, Rick Edgecombe wrote:
> > From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> > 
> > The x86 Control-flow Enforcement Technology (CET) feature includes
> > a new
> > type of memory called shadow stack. This shadow stack memory has
> > some
> > unusual properties, which requires some core mm changes to function
> > properly.
> > 
> > A shadow stack PTE must be read-only and have _PAGE_DIRTY set.
> > However,
> > read-only and Dirty PTEs also exist for copy-on-write (COW) pages.
> > These
> > two cases are handled differently for page faults. Introduce
> > VM_SHADOW_STACK to track shadow stack VMAs.
> 
> I suggest simplifying and abstracting that description.
> 
> "New hardware extensions implement support for shadow stack memory,
> such 
> as x86 Control-flow Enforcement Technology (CET). Let's add a new VM 
> flag to identify these areas, for example, to be used to properly 
> indicate shadow stack PTEs to the hardware."

Ah yea, that top blurb was added to all the non-x86 arch patches after
some feedback from Andrew Morton. He had said basically (in some more
colorful language) that the changelogs (at the time) were written
assuming the reader knows what a shadow stack is.

So it might be worth keeping a little more info in the log?

> 
> > 
> > Reviewed-by: Kees Cook <keescook@chromium.org>
> > Tested-by: Pengfei Xu <pengfei.xu@intel.com>
> > Tested-by: John Allen <john.allen@amd.com>
> > Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> > Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > 
> > ---
> > v6:
> >   - Add comment about VM_SHADOW_STACK not being allowed with
> > VM_SHARED
> >     (David Hildenbrand)
> 
> Might want to add some more meat to the patch description why that
> is 
> the case.

Sure.

> 
> > 
> > v3:
> >   - Drop arch specific change in arch_vma_name(). The memory can
> > show as
> >     anonymous (Kirill)
> >   - Change CONFIG_ARCH_HAS_SHADOW_STACK to
> > CONFIG_X86_USER_SHADOW_STACK
> >     in show_smap_vma_flags() (Boris)
> > ---
> >   Documentation/filesystems/proc.rst | 1 +
> >   fs/proc/task_mmu.c                 | 3 +++
> >   include/linux/mm.h                 | 8 ++++++++
> >   3 files changed, 12 insertions(+)
> > 
> > diff --git a/Documentation/filesystems/proc.rst
> > b/Documentation/filesystems/proc.rst
> > index e224b6d5b642..115843e8cce3 100644
> > --- a/Documentation/filesystems/proc.rst
> > +++ b/Documentation/filesystems/proc.rst
> > @@ -564,6 +564,7 @@ encoded manner. The codes are the following:
> >       mt    arm64 MTE allocation tags are enabled
> >       um    userfaultfd missing tracking
> >       uw    userfaultfd wr-protect tracking
> > +    ss    shadow stack page
> >       ==    =======================================
> >   
> >   Note that there is no guarantee that every flag and associated
> > mnemonic will
> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > index af1c49ae11b1..9e2cefe47749 100644
> > --- a/fs/proc/task_mmu.c
> > +++ b/fs/proc/task_mmu.c
> > @@ -711,6 +711,9 @@ static void show_smap_vma_flags(struct seq_file
> > *m, struct vm_area_struct *vma)
> >   #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
> >   		[ilog2(VM_UFFD_MINOR)]	= "ui",
> >   #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
> > +#ifdef CONFIG_X86_USER_SHADOW_STACK
> > +		[ilog2(VM_SHADOW_STACK)] = "ss",
> > +#endif
> >   	};
> >   	size_t i;
> >   
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index e6f1789c8e69..76e0a09aeffe 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -315,11 +315,13 @@ extern unsigned int kobjsize(const void
> > *objp);
> >   #define VM_HIGH_ARCH_BIT_2	34	/* bit only usable on 64-
> > bit architectures */
> >   #define VM_HIGH_ARCH_BIT_3	35	/* bit only usable on 64-
> > bit architectures */
> >   #define VM_HIGH_ARCH_BIT_4	36	/* bit only usable on 64-
> > bit architectures */
> > +#define VM_HIGH_ARCH_BIT_5	37	/* bit only usable on 64-bit
> > architectures */
> >   #define VM_HIGH_ARCH_0	BIT(VM_HIGH_ARCH_BIT_0)
> >   #define VM_HIGH_ARCH_1	BIT(VM_HIGH_ARCH_BIT_1)
> >   #define VM_HIGH_ARCH_2	BIT(VM_HIGH_ARCH_BIT_2)
> >   #define VM_HIGH_ARCH_3	BIT(VM_HIGH_ARCH_BIT_3)
> >   #define VM_HIGH_ARCH_4	BIT(VM_HIGH_ARCH_BIT_4)
> > +#define VM_HIGH_ARCH_5	BIT(VM_HIGH_ARCH_BIT_5)
> >   #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
> >   
> >   #ifdef CONFIG_ARCH_HAS_PKEYS
> > @@ -335,6 +337,12 @@ extern unsigned int kobjsize(const void
> > *objp);
> >   #endif
> >   #endif /* CONFIG_ARCH_HAS_PKEYS */
> >   
> > +#ifdef CONFIG_X86_USER_SHADOW_STACK
> 
> 
> Should we abstract this to CONFIG_ARCH_USER_SHADOW_STACK, seeing
> that 
> other architectures might similarly need it?

There was an ARCH_HAS_SHADOW_STACK but it got removed following this
discussion:

https://lore.kernel.org/lkml/d09e952d8ae696f687f0787dfeb7be7699c02913.camel@intel.com/

Now we have this new RFC for riscv as potentially a second
implementation. But it is still very early, and I'm not sure anyone
knows exactly what the similarities will be in a mature version. So I
think it would be better to refactor in an ARCH_HAS_SHADOW_STACK later
(and similar abstractions) once that series is more mature and we have
an idea of what pieces will be shared. I don't have a problem in
principle with an ARCH config, just don't think we should do it yet.

>
  
David Hildenbrand Feb. 21, 2023, 8:34 a.m. UTC | #3
On 20.02.23 23:08, Edgecombe, Rick P wrote:
> On Mon, 2023-02-20 at 13:56 +0100, David Hildenbrand wrote:
>> On 18.02.23 22:14, Rick Edgecombe wrote:
>>> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
>>>
>>> The x86 Control-flow Enforcement Technology (CET) feature includes
>>> a new
>>> type of memory called shadow stack. This shadow stack memory has
>>> some
>>> unusual properties, which requires some core mm changes to function
>>> properly.
>>>
>>> A shadow stack PTE must be read-only and have _PAGE_DIRTY set.
>>> However,
>>> read-only and Dirty PTEs also exist for copy-on-write (COW) pages.
>>> These
>>> two cases are handled differently for page faults. Introduce
>>> VM_SHADOW_STACK to track shadow stack VMAs.
>>
>> I suggest simplifying and abstracting that description.
>>
>> "New hardware extensions implement support for shadow stack memory,
>> such
>> as x86 Control-flow Enforcement Technology (CET). Let's add a new VM
>> flag to identify these areas, for example, to be used to properly
>> indicate shadow stack PTEs to the hardware."
> 
> Ah yea, that top blurb was added to all the non-x86 arch patches after
> some feedback from Andrew Morton. He had said basically (in some more
> colorful language) that the changelogs (at the time) were written
> assuming the reader knows what a shadow stack is.

Okay. It's a bit repetitive, though.

Ideally, we'd just explain it in the cover letter in detail and 
Andrew's script would include the cover letter in the first commit. 
IIRC, that's what usually happens.

> 
> So it might be worth keeping a little more info in the log?

Copying the same paragraph into each commit is IMHO a bit repetitive. 
But these are just my 2 cents.

[...]

>> Should we abstract this to CONFIG_ARCH_USER_SHADOW_STACK, seeing
>> that
>> other architectures might similarly need it?
> 
> There was an ARCH_HAS_SHADOW_STACK but it got removed following this
> discussion:
> 
> https://lore.kernel.org/lkml/d09e952d8ae696f687f0787dfeb7be7699c02913.camel@intel.com/
> 
> Now we have this new RFC for riscv as potentially a second
> implementation. But it is still very early, and I'm not sure anyone
> knows exactly what the similarities will be in a mature version. So I
> think it would be better to refactor in an ARCH_HAS_SHADOW_STACK later
> (and similar abstractions) once that series is more mature and we have
> an idea of what pieces will be shared. I don't have a problem in
> principle with an ARCH config, just don't think we should do it yet.

Okay, easy to factor out later.

Acked-by: David Hildenbrand <david@redhat.com>
  
Deepak Gupta Feb. 22, 2023, 10:13 p.m. UTC | #4
On Tue, Feb 21, 2023 at 09:34:35AM +0100, David Hildenbrand wrote:
>On 20.02.23 23:08, Edgecombe, Rick P wrote:
>>On Mon, 2023-02-20 at 13:56 +0100, David Hildenbrand wrote:
>>>On 18.02.23 22:14, Rick Edgecombe wrote:
>>>>From: Yu-cheng Yu <yu-cheng.yu@intel.com>
>>>>
>>>>The x86 Control-flow Enforcement Technology (CET) feature includes
>>>>a new
>>>>type of memory called shadow stack. This shadow stack memory has
>>>>some
>>>>unusual properties, which requires some core mm changes to function
>>>>properly.
>>>>
>>>>A shadow stack PTE must be read-only and have _PAGE_DIRTY set.
>>>>However,
>>>>read-only and Dirty PTEs also exist for copy-on-write (COW) pages.
>>>>These
>>>>two cases are handled differently for page faults. Introduce
>>>>VM_SHADOW_STACK to track shadow stack VMAs.
>>>
>>>I suggest simplifying and abstracting that description.
>>>
>>>"New hardware extensions implement support for shadow stack memory,
>>>such
>>>as x86 Control-flow Enforcement Technology (CET). Let's add a new VM
>>>flag to identify these areas, for example, to be used to properly
>>>indicate shadow stack PTEs to the hardware."
>>
>>Ah yea, that top blurb was added to all the non-x86 arch patches after
>>some feedback from Andrew Morton. He had said basically (in some more
>>colorful language) that the changelogs (at the time) were written
>>assuming the reader knows what a shadow stack is.
>
>Okay. It's a bit repetitive, though.
>
>Ideally, we'd just explain it in the cover letter in detail and 
>Andrew's script would include the cover letter in the first commit. 
>IIRC, that's what usually happens.
>
>>
>>So it might be worth keeping a little more info in the log?
>
>Copying the same paragraph into each commit is IMHO a bit repetitive. 
>But these are just my 2 cents.
>
>[...]
>
>>>Should we abstract this to CONFIG_ARCH_USER_SHADOW_STACK, seeing
>>>that
>>>other architectures might similarly need it?
>>
>>There was an ARCH_HAS_SHADOW_STACK but it got removed following this
>>discussion:
>>
>>https://lore.kernel.org/lkml/d09e952d8ae696f687f0787dfeb7be7699c02913.camel@intel.com/
>>
>>Now we have this new RFC for riscv as potentially a second
>>implementation. But it is still very early, and I'm not sure anyone
>>knows exactly what the similarities will be in a mature version. So I
>>think it would be better to refactor in an ARCH_HAS_SHADOW_STACK later
>>(and similar abstractions) once that series is more mature and we have
>>an idea of what pieces will be shared. I don't have a problem in
>>principle with an ARCH config, just don't think we should do it yet.
>
>Okay, easy to factor out later.

I would be more than happy if this config name had been abstracted out so that
arches could choose to implement it. It's a bit sad that it was generic earlier
and was later changed due to a lack of support from other architectures. Now
there are three architectures that either already support shadow stack (x86),
have announced support (aarch64), or are planning to support it (riscv).

However, given the patch reduction I will get from the `pte_mkwrite` refactor,
I am in favor of a future refactor of the config.
>
>Acked-by: David Hildenbrand <david@redhat.com>
>
>-- 
>Thanks,
>
>David / dhildenb
>
  

Patch

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index e224b6d5b642..115843e8cce3 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -564,6 +564,7 @@  encoded manner. The codes are the following:
     mt    arm64 MTE allocation tags are enabled
     um    userfaultfd missing tracking
     uw    userfaultfd wr-protect tracking
+    ss    shadow stack page
     ==    =======================================
 
 Note that there is no guarantee that every flag and associated mnemonic will
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index af1c49ae11b1..9e2cefe47749 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -711,6 +711,9 @@  static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
 		[ilog2(VM_UFFD_MINOR)]	= "ui",
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
+#ifdef CONFIG_X86_USER_SHADOW_STACK
+		[ilog2(VM_SHADOW_STACK)] = "ss",
+#endif
 	};
 	size_t i;
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e6f1789c8e69..76e0a09aeffe 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -315,11 +315,13 @@  extern unsigned int kobjsize(const void *objp);
 #define VM_HIGH_ARCH_BIT_2	34	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_3	35	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_4	36	/* bit only usable on 64-bit architectures */
+#define VM_HIGH_ARCH_BIT_5	37	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_0	BIT(VM_HIGH_ARCH_BIT_0)
 #define VM_HIGH_ARCH_1	BIT(VM_HIGH_ARCH_BIT_1)
 #define VM_HIGH_ARCH_2	BIT(VM_HIGH_ARCH_BIT_2)
 #define VM_HIGH_ARCH_3	BIT(VM_HIGH_ARCH_BIT_3)
 #define VM_HIGH_ARCH_4	BIT(VM_HIGH_ARCH_BIT_4)
+#define VM_HIGH_ARCH_5	BIT(VM_HIGH_ARCH_BIT_5)
 #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
 
 #ifdef CONFIG_ARCH_HAS_PKEYS
@@ -335,6 +337,12 @@  extern unsigned int kobjsize(const void *objp);
 #endif
 #endif /* CONFIG_ARCH_HAS_PKEYS */
 
+#ifdef CONFIG_X86_USER_SHADOW_STACK
+# define VM_SHADOW_STACK	VM_HIGH_ARCH_5 /* Should not be set with VM_SHARED */
+#else
+# define VM_SHADOW_STACK	VM_NONE
+#endif
+
 #if defined(CONFIG_X86)
 # define VM_PAT		VM_ARCH_1	/* PAT reserves whole VMA at once (x86) */
 #elif defined(CONFIG_PPC)