[v5,11/39] x86/mm: Update pte_modify for _PAGE_COW

Message ID 20230119212317.8324-12-rick.p.edgecombe@intel.com
State New
Headers
Series Shadow stacks for userspace |

Commit Message

Edgecombe, Rick P Jan. 19, 2023, 9:22 p.m. UTC
  From: Yu-cheng Yu <yu-cheng.yu@intel.com>

The Write=0,Dirty=1 PTE has been used to indicate copy-on-write pages.
However, newer x86 processors also regard a Write=0,Dirty=1 PTE as a
shadow stack page. In order to separate the two, the software-defined
_PAGE_DIRTY is changed to _PAGE_COW for the copy-on-write case, and
pte_*() are updated to do this.

pte_modify() takes a "raw" pgprot_t which was not necessarily created
with any of the existing PTE bit helpers. That means that it can return a
pte_t with Write=0,Dirty=1, a shadow stack PTE, when it did not intend to
create one.

However pte_modify() changes a PTE to 'newprot', but it doesn't use the
pte_*(). Modify it to also move _PAGE_DIRTY to _PAGE_COW. Do this by
using the pte_mkdirty() helper. Since pte_mkdirty() also sets the soft
dirty bit, extract a helper that optionally doesn't set
_PAGE_SOFT_DIRTY. This helper will allow future logic for deciding when to
move _PAGE_DIRTY to _PAGE_COW can live in one place.

Apply the same changes to pmd_modify().

Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: John Allen <john.allen@amd.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---

v5:
 - Fix pte_modify() again, to not lose _PAGE_DIRTY, but still not set
   _PAGE_SOFT_DIRTY as was fixed in v4.

v4:
 - Fix an issue in soft-dirty test, where pte_modify() would detect
   _PAGE_COW in pte_dirty() and set the soft dirty bit in pte_mkdirty().

v2:
 - Update commit log with text and suggestions from (Dave Hansen)
 - Drop fixup_dirty_pte() in favor of clearing the HW dirty bit along
   with the _PAGE_CHG_MASK masking, then calling pte_mkdirty() (Dave
   Hansen)

 arch/x86/include/asm/pgtable.h | 64 +++++++++++++++++++++++++++++-----
 1 file changed, 56 insertions(+), 8 deletions(-)
  

Comments

Kees Cook Jan. 20, 2023, 12:57 a.m. UTC | #1
On Thu, Jan 19, 2023 at 01:22:49PM -0800, Rick Edgecombe wrote:
> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> 
> The Write=0,Dirty=1 PTE has been used to indicate copy-on-write pages.
> However, newer x86 processors also regard a Write=0,Dirty=1 PTE as a
> shadow stack page. In order to separate the two, the software-defined
> _PAGE_DIRTY is changed to _PAGE_COW for the copy-on-write case, and
> pte_*() are updated to do this.
> 
> pte_modify() takes a "raw" pgprot_t which was not necessarily created
> with any of the existing PTE bit helpers. That means that it can return a
> pte_t with Write=0,Dirty=1, a shadow stack PTE, when it did not intend to
> create one.
> 
> However pte_modify() changes a PTE to 'newprot', but it doesn't use the
> pte_*(). Modify it to also move _PAGE_DIRTY to _PAGE_COW. Do this by
> using the pte_mkdirty() helper. Since pte_mkdirty() also sets the soft
> dirty bit, extract a helper that optionally doesn't set
> _PAGE_SOFT_DIRTY. This helper will allow future logic for deciding when to
> move _PAGE_DIRTY to _PAGE_COW can live in one place.
> 
> Apply the same changes to pmd_modify().
> 
> Tested-by: Pengfei Xu <pengfei.xu@intel.com>
> Tested-by: John Allen <john.allen@amd.com>
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>

Reviewed-by: Kees Cook <keescook@chromium.org>
  
Borislav Petkov Feb. 9, 2023, 2:08 p.m. UTC | #2
On Thu, Jan 19, 2023 at 01:22:49PM -0800, Rick Edgecombe wrote:
> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> 
> The Write=0,Dirty=1 PTE has been used to indicate copy-on-write pages.
> However, newer x86 processors also regard a Write=0,Dirty=1 PTE as a
> shadow stack page. In order to separate the two, the software-defined
> _PAGE_DIRTY is changed to _PAGE_COW for the copy-on-write case, and
> pte_*() are updated to do this.

"In order to separate the two, change the software-defined ..."

From section "2) Describe your changes" in
Documentation/process/submitting-patches.rst:

"Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
to do frotz", as if you are giving orders to the codebase to change
its behaviour."

> +static inline pte_t __pte_mkdirty(pte_t pte, bool soft)
> +{
> +	pteval_t dirty = _PAGE_DIRTY;
> +
> +	if (soft)
> +		dirty |= _PAGE_SOFT_DIRTY;
> +
> +	return pte_set_flags(pte, dirty);
> +}

Dunno, do you even need that __pte_mkdirty() helper?

AFAIU, pte_mkdirty() will always set _PAGE_SOFT_DIRTY too so whatever
the __pte_mkdirty() thing needs to do, you can simply do it by foot in
the two callsites.

And this way you won't have the confusion: should I use pte_mkdirty() or
__pte_mkdirty()?

Ditto for the pmd variants.

Otherwise, this is starting to make more sense now.

Thx.
  
Edgecombe, Rick P Feb. 9, 2023, 5:09 p.m. UTC | #3
On Thu, 2023-02-09 at 15:08 +0100, Borislav Petkov wrote:
> On Thu, Jan 19, 2023 at 01:22:49PM -0800, Rick Edgecombe wrote:
> > From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> > 
> > The Write=0,Dirty=1 PTE has been used to indicate copy-on-write
> > pages.
> > However, newer x86 processors also regard a Write=0,Dirty=1 PTE as
> > a
> > shadow stack page. In order to separate the two, the software-
> > defined
> > _PAGE_DIRTY is changed to _PAGE_COW for the copy-on-write case, and
> > pte_*() are updated to do this.
> 
> "In order to separate the two, change the software-defined ..."
> 
> From section "2) Describe your changes" in
> Documentation/process/submitting-patches.rst:
> 
> "Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
> instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
> to do frotz", as if you are giving orders to the codebase to change
> its behaviour."

Yea, this is ambiguous. It's actually trying to say that "the software-
defined..." *were* changed in previous patches. I'll change it to make
that clear.

> 
> > +static inline pte_t __pte_mkdirty(pte_t pte, bool soft)
> > +{
> > +     pteval_t dirty = _PAGE_DIRTY;
> > +
> > +     if (soft)
> > +             dirty |= _PAGE_SOFT_DIRTY;
> > +
> > +     return pte_set_flags(pte, dirty);
> > +}
> 
> Dunno, do you even need that __pte_mkdirty() helper?
> 
> AFAIU, pte_mkdirty() will always set _PAGE_SOFT_DIRTY too so whatever
> the __pte_mkdirty() thing needs to do, you can simply do it by foot
> in
> the two callsites.
> 
> And this way you won't have the confusion: should I use pte_mkdirty()
> or
> __pte_mkdirty()?
> 
> Ditto for the pmd variants.
> 
> Otherwise, this is starting to make more sense now.

The thing is it would need to duplicate the pte_write() and shadow
stack enablement check and know when to set the Cow(soon to be
SavedDirty) bit.

I see that having a similar helper is not ideal, but isn't it nice that
this special critical logic for setting the Cow bit is all in one
place? I actually tried it the other way, but thought that it was nicer
to have a helper that might drive future people to not miss the Cow bit
part.

What do you think, can we leave it or give it a new name? Maybe
pte_set_dirty() to be more like the x86-only pte_set_flags() family of
functions? Then we have:
static inline pte_t pte_mkdirty(pte_t pte)
{
	pte = pte_set_flags(pte, _PAGE_SOFT_DIRTY);

	return pte_set_dirty(pte);
}

And...
static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
...
	/*
	 * Dirty bit is not preserved above so it can be done
	 * in a special way for the shadow stack case, where it
	 * may need to set _PAGE_SAVED_DIRTY. __pte_mkdirty() will do
	 * this in the case of shadow stack.
	 */
	if (oldval & _PAGE_DIRTY)
		pte_result = pte_set_dirty(pte_result);

	return pte_result;
}
  
Borislav Petkov Feb. 10, 2023, 1:57 p.m. UTC | #4
On Thu, Feb 09, 2023 at 05:09:15PM +0000, Edgecombe, Rick P wrote:
> What do you think, can we leave it or give it a new name? Maybe
> pte_set_dirty() to be more like the x86-only pte_set_flags() family of
> functions?

I'd do this (ontop of yours, not built, not tested, etc). It is short
and sweet:

pte_mkdirty() set both dirty flags
pte_modifl() sets only _PAGE_DIRTY

No special helpers to lookup what they do, no nothing. Plain and simple.

---
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 7942eff2af50..8ba37380966c 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -392,19 +392,10 @@ static inline pte_t pte_mkexec(pte_t pte)
 	return pte_clear_flags(pte, _PAGE_NX);
 }
 
-static inline pte_t __pte_mkdirty(pte_t pte, bool soft)
-{
-	pteval_t dirty = _PAGE_DIRTY;
-
-	if (soft)
-		dirty |= _PAGE_SOFT_DIRTY;
-
-	return pte_set_flags(pte, dirty);
-}
-
+/* Set _PAGE_SOFT_DIRTY for shadow stack pages. */
 static inline pte_t pte_mkdirty(pte_t pte)
 {
-	return __pte_mkdirty(pte, true);
+	return pte_set_flags(pte, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
 }
 
 static inline pte_t pte_mkyoung(pte_t pte)
@@ -749,14 +740,8 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 
 	pte_result = __pte(val);
 
-	/*
-	 * Dirty bit is not preserved above so it can be done
-	 * in a special way for the shadow stack case, where it
-	 * may need to set _PAGE_COW. __pte_mkdirty() will do this in
-	 * the case of shadow stack.
-	 */
 	if (pte_dirty(pte))
-		pte_result = __pte_mkdirty(pte_result, false);
+		pte_result = pte_set_flags(pte_result, _PAGE_DIRTY);
 
 	return pte_result;
 }
  
Edgecombe, Rick P Feb. 10, 2023, 5 p.m. UTC | #5
On Fri, 2023-02-10 at 14:57 +0100, Borislav Petkov wrote:
> @@ -749,14 +740,8 @@ static inline pte_t pte_modify(pte_t pte,
> pgprot_t newprot)
>  
>         pte_result = __pte(val);
>  
> -       /*
> -        * Dirty bit is not preserved above so it can be done
> -        * in a special way for the shadow stack case, where it
> -        * may need to set _PAGE_COW. __pte_mkdirty() will do this in
> -        * the case of shadow stack.
> -        */
>         if (pte_dirty(pte))
> -               pte_result = __pte_mkdirty(pte_result, false);
> +               pte_result = pte_set_flags(pte_result, _PAGE_DIRTY);
>  
>         return pte_result;
>  }
> 

Oh, I see what you are seeing now. Did you notice that  the
__pte_mkdirty() logic got expanded in "x86/mm: Start actually marking
_PAGE_COW"? So if we don't put that logic in a usable helper, it ends
up open coded with pte_modify() looking something like this:
static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
{
	pteval_t val = pte_val(pte), oldval = val;
	pte_t pte_result;

	/*
	 * Chop off the NX bit (if present), and add the NX portion of
	 * the newprot (if present):
	 */
	val &= (_PAGE_CHG_MASK & ~_PAGE_DIRTY);
	val |= check_pgprot(newprot) & ~_PAGE_CHG_MASK;
	val = flip_protnone_guard(oldval, val, PTE_PFN_MASK);

	pte_result = __pte(val);

	/*
	 * Dirty bit is not preserved above so it can be done
	 * in a special way for the shadow stack case, where it
	 * may need to set _PAGE_SAVED_DIRTY. __pte_mkdirty() will do
	 * this in the case of shadow stack.
	 */
	if (oldval & _PAGE_DIRTY)
		if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK) &&
		    !pte_write(pte_result))
			pte_set_flags(pte_result, _PAGE_SAVED_DIRTY);
		else
			pte_set_flags(pte_result, _PAGE_DIRTY);
	}

	return pte_result;
}

So the later logic of doing the _PAGE_SAVED_DIRTY (_PAGE_COW) part is
not centralized. It's ok?
  
Borislav Petkov Feb. 17, 2023, 4:11 p.m. UTC | #6
On Fri, Feb 10, 2023 at 05:00:05PM +0000, Edgecombe, Rick P wrote:
> 	/*
> 	 * Dirty bit is not preserved above so it can be done
> 	 * in a special way for the shadow stack case, where it
> 	 * may need to set _PAGE_SAVED_DIRTY. __pte_mkdirty() will do
> 	 * this in the case of shadow stack.
> 	 */
> 	if (oldval & _PAGE_DIRTY)
> 		if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK) &&
> 		    !pte_write(pte_result))
> 			pte_set_flags(pte_result, _PAGE_SAVED_DIRTY);
> 		else
> 			pte_set_flags(pte_result, _PAGE_DIRTY);
> 	}
> 
> 	return pte_result;
> }
> 
> So the later logic of doing the _PAGE_SAVED_DIRTY (_PAGE_COW) part is
> not centralized. It's ok?

I think so.

1. If you have a single pte_mkdirty() and not also a __ helper, then
   there's less confusion for callers as to which interface they should be
   using

2. The not centralized part is a single conditional so it's not like
   you're saving on gazillion code lines

So I'd prefer that.

If we end up needing this in more places then we can carve it out into
a proper helper which is not in a header file such that anyone can use
it but move the whole functionality into cet.c or so where we can
control its visibility to the rest of the kernel.

I'd say.

Thx.
  
Edgecombe, Rick P Feb. 17, 2023, 4:53 p.m. UTC | #7
On Fri, 2023-02-17 at 17:11 +0100, Borislav Petkov wrote:
> On Fri, Feb 10, 2023 at 05:00:05PM +0000, Edgecombe, Rick P wrote:
> >        /*
> >         * Dirty bit is not preserved above so it can be done
> >         * in a special way for the shadow stack case, where it
> >         * may need to set _PAGE_SAVED_DIRTY. __pte_mkdirty() will
> > do
> >         * this in the case of shadow stack.
> >         */
> >        if (oldval & _PAGE_DIRTY)
> >                if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK) &&
> >                    !pte_write(pte_result))
> >                        pte_set_flags(pte_result,
> > _PAGE_SAVED_DIRTY);
> >                else
> >                        pte_set_flags(pte_result, _PAGE_DIRTY);
> >        }
> > 
> >        return pte_result;
> > }
> > 
> > So the later logic of doing the _PAGE_SAVED_DIRTY (_PAGE_COW) part
> > is
> > not centralized. It's ok?
> 
> I think so.
> 
> 1. If you have a single pte_mkdirty() and not also a __ helper, then
>    there's less confusion for callers as to which interface they
> should be
>    using
> 
> 2. The not centralized part is a single conditional so it's not like
>    you're saving on gazillion code lines
> 
> So I'd prefer that.
> 
> 
Fair enough, I'll adjust it. Thanks!
  

Patch

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 6d2f612c04b5..7942eff2af50 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -392,9 +392,19 @@  static inline pte_t pte_mkexec(pte_t pte)
 	return pte_clear_flags(pte, _PAGE_NX);
 }
 
+static inline pte_t __pte_mkdirty(pte_t pte, bool soft)
+{
+	pteval_t dirty = _PAGE_DIRTY;
+
+	if (soft)
+		dirty |= _PAGE_SOFT_DIRTY;
+
+	return pte_set_flags(pte, dirty);
+}
+
 static inline pte_t pte_mkdirty(pte_t pte)
 {
-	return pte_set_flags(pte, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
+	return __pte_mkdirty(pte, true);
 }
 
 static inline pte_t pte_mkyoung(pte_t pte)
@@ -503,9 +513,19 @@  static inline pmd_t pmd_wrprotect(pmd_t pmd)
 	return pmd_clear_flags(pmd, _PAGE_RW);
 }
 
+static inline pmd_t __pmd_mkdirty(pmd_t pmd, bool soft)
+{
+	pmdval_t dirty = _PAGE_DIRTY;
+
+	if (soft)
+		dirty |= _PAGE_SOFT_DIRTY;
+
+	return pmd_set_flags(pmd, dirty);
+}
+
 static inline pmd_t pmd_mkdirty(pmd_t pmd)
 {
-	return pmd_set_flags(pmd, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
+	return __pmd_mkdirty(pmd, true);
 }
 
 static inline pmd_t pmd_mkdevmap(pmd_t pmd)
@@ -715,26 +735,54 @@  static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask);
 
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
+	pteval_t _page_chg_mask_no_dirty = _PAGE_CHG_MASK & ~_PAGE_DIRTY;
 	pteval_t val = pte_val(pte), oldval = val;
+	pte_t pte_result;
 
 	/*
 	 * Chop off the NX bit (if present), and add the NX portion of
 	 * the newprot (if present):
 	 */
-	val &= _PAGE_CHG_MASK;
-	val |= check_pgprot(newprot) & ~_PAGE_CHG_MASK;
+	val &= _page_chg_mask_no_dirty;
+	val |= check_pgprot(newprot) & ~_page_chg_mask_no_dirty;
 	val = flip_protnone_guard(oldval, val, PTE_PFN_MASK);
-	return __pte(val);
+
+	pte_result = __pte(val);
+
+	/*
+	 * Dirty bit is not preserved above so it can be done
+	 * in a special way for the shadow stack case, where it
+	 * may need to set _PAGE_COW. __pte_mkdirty() will do this in
+	 * the case of shadow stack.
+	 */
+	if (pte_dirty(pte))
+		pte_result = __pte_mkdirty(pte_result, false);
+
+	return pte_result;
 }
 
 static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
 {
+	pteval_t _hpage_chg_mask_no_dirty = _HPAGE_CHG_MASK & ~_PAGE_DIRTY;
 	pmdval_t val = pmd_val(pmd), oldval = val;
+	pmd_t pmd_result;
 
-	val &= _HPAGE_CHG_MASK;
-	val |= check_pgprot(newprot) & ~_HPAGE_CHG_MASK;
+	val &= _hpage_chg_mask_no_dirty;
+	val |= check_pgprot(newprot) & ~_hpage_chg_mask_no_dirty;
 	val = flip_protnone_guard(oldval, val, PHYSICAL_PMD_PAGE_MASK);
-	return __pmd(val);
+
+	pmd_result = __pmd(val);
+
+	/*
+	 * Dirty bit is not preserved above so it can be done
+	 * in a special way for the shadow stack case, where it
+	 * may need to set _PAGE_COW. __pmd_mkdirty() will do this in
+	 * the case of shadow stack.
+	 */
+	if (pmd_dirty(pmd))
+		pmd_result = __pmd_mkdirty(pmd_result, false);
+
+	return pmd_result;
 }
 
 /*