[v4,11/39] x86/mm: Update pte_modify for _PAGE_COW

Message ID 20221203003606.6838-12-rick.p.edgecombe@intel.com
State New
Headers
Series Shadow stacks for userspace |

Commit Message

Edgecombe, Rick P Dec. 3, 2022, 12:35 a.m. UTC
  From: Yu-cheng Yu <yu-cheng.yu@intel.com>

The Write=0,Dirty=1 PTE has been used to indicate copy-on-write pages.
However, newer x86 processors also regard a Write=0,Dirty=1 PTE as a
shadow stack page. In order to separate the two, the software-defined
_PAGE_DIRTY is changed to _PAGE_COW for the copy-on-write case, and
pte_*() are updated to do this.

pte_modify() takes a "raw" pgprot_t which was not necessarily created
with any of the existing PTE bit helpers. That means that it can return a
pte_t with Write=0,Dirty=1, a shadow stack PTE, when it did not intend to
create one.

However pte_modify() changes a PTE to 'newprot', but it doesn't use the
pte_*(). Modify it to also move _PAGE_DIRTY to _PAGE_COW. Apply the same
changes to pmd_modify().

Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: John Allen <john.allen@amd.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---

v4:
 - Fix an issue in soft-dirty test, where pte_modify() would detect
   _PAGE_COW in pte_dirty() and set the soft dirty bit in pte_mkdirty().

v2:
 - Update commit log with text and suggestions from (Dave Hansen)
 - Drop fixup_dirty_pte() in favor of clearing the HW dirty bit along
   with the _PAGE_CHG_MASK masking, then calling pte_mkdirty() (Dave
   Hansen)

 arch/x86/include/asm/pgtable.h | 40 +++++++++++++++++++++++++++++-----
 1 file changed, 34 insertions(+), 6 deletions(-)
  

Comments

Kees Cook Dec. 3, 2022, 2:31 a.m. UTC | #1
On Fri, Dec 02, 2022 at 04:35:38PM -0800, Rick Edgecombe wrote:
> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> 
> The Write=0,Dirty=1 PTE has been used to indicate copy-on-write pages.
> However, newer x86 processors also regard a Write=0,Dirty=1 PTE as a
> shadow stack page. In order to separate the two, the software-defined
> _PAGE_DIRTY is changed to _PAGE_COW for the copy-on-write case, and
> pte_*() are updated to do this.
> 
> pte_modify() takes a "raw" pgprot_t which was not necessarily created
> with any of the existing PTE bit helpers. That means that it can return a
> pte_t with Write=0,Dirty=1, a shadow stack PTE, when it did not intend to
> create one.
> 
> However pte_modify() changes a PTE to 'newprot', but it doesn't use the
> pte_*(). Modify it to also move _PAGE_DIRTY to _PAGE_COW. Apply the same
> changes to pmd_modify().
> 
> Tested-by: Pengfei Xu <pengfei.xu@intel.com>
> Tested-by: John Allen <john.allen@amd.com>
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>

Reviewed-by: Kees Cook <keescook@chromium.org>
  
Borislav Petkov Dec. 27, 2022, 11:42 a.m. UTC | #2
On Fri, Dec 02, 2022 at 04:35:38PM -0800, Rick Edgecombe wrote:
>  static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
>  {
> +	pteval_t _page_chg_mask_no_dirty = _PAGE_CHG_MASK & ~_PAGE_DIRTY;
>  	pteval_t val = pte_val(pte), oldval = val;
> +	pte_t pte_result;
>  
>  	/*
>  	 * Chop off the NX bit (if present), and add the NX portion of
>  	 * the newprot (if present):
>  	 */
> -	val &= _PAGE_CHG_MASK;
> -	val |= check_pgprot(newprot) & ~_PAGE_CHG_MASK;
> +	val &= _page_chg_mask_no_dirty;
> +	val |= check_pgprot(newprot) & ~_page_chg_mask_no_dirty;
>  	val = flip_protnone_guard(oldval, val, PTE_PFN_MASK);
> -	return __pte(val);
> +
> +	pte_result = __pte(val);
> +
> +	/*
> +	 * Dirty bit is not preserved above so it can be done

Just for my own understanding: are you saying here that
flip_protnone_guard() might end up setting _PAGE_DIRTY in val...

> +	 * in a special way for the shadow stack case, where it
> +	 * needs to set _PAGE_COW. pte_mkcow() will do this in
> +	 * the case of shadow stack.
> +	 */
> +	if (pte_dirty(pte_result))
> +		pte_result = pte_mkcow(pte_result);

... and in that case we need to turn it into a _PAGE_COW setting?

Thx.
  
Edgecombe, Rick P Dec. 27, 2022, 11:31 p.m. UTC | #3
On Tue, 2022-12-27 at 12:42 +0100, Borislav Petkov wrote:
> On Fri, Dec 02, 2022 at 04:35:38PM -0800, Rick Edgecombe wrote:
> >  static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
> >  {
> > +       pteval_t _page_chg_mask_no_dirty = _PAGE_CHG_MASK &
> > ~_PAGE_DIRTY;
> >         pteval_t val = pte_val(pte), oldval = val;
> > +       pte_t pte_result;
> >  
> >         /*
> >          * Chop off the NX bit (if present), and add the NX portion
> > of
> >          * the newprot (if present):
> >          */
> > -       val &= _PAGE_CHG_MASK;
> > -       val |= check_pgprot(newprot) & ~_PAGE_CHG_MASK;
> > +       val &= _page_chg_mask_no_dirty;
> > +       val |= check_pgprot(newprot) & ~_page_chg_mask_no_dirty;
> >         val = flip_protnone_guard(oldval, val, PTE_PFN_MASK);
> > -       return __pte(val);
> > +
> > +       pte_result = __pte(val);
> > +
> > +       /*
> > +        * Dirty bit is not preserved above so it can be done
> 
> Just for my own understanding: are you saying here that
> flip_protnone_guard() might end up setting _PAGE_DIRTY in val...
> 
> > +        * in a special way for the shadow stack case, where it
> > +        * needs to set _PAGE_COW. pte_mkcow() will do this in
> > +        * the case of shadow stack.
> > +        */
> > +       if (pte_dirty(pte_result))
> > +               pte_result = pte_mkcow(pte_result);
> 
> ... and in that case we need to turn it into a _PAGE_COW setting?
> 
The comment is referring to the dirty bits possibly coming from
newprot, but looking at it now I think the code was broken trying to
fix the recent soft dirty test breakage. Now it might lose pre-existing
dirty bits in the pte unessarily... I think. I'm temporarily without
access to my test equipment so will have to get back to you on this.
Thanks for flagging that something looks off.
  
Borislav Petkov Jan. 4, 2023, 1:25 p.m. UTC | #4
On Tue, Dec 27, 2022 at 11:31:37PM +0000, Edgecombe, Rick P wrote:
> The comment is referring to the dirty bits possibly coming from
> newprot,

Ah right, ofc.

> but looking at it now I think the code was broken trying to
> fix the recent soft dirty test breakage. Now it might lose pre-existing
> dirty bits in the pte unessarily... I think.

Right, does this code need to be simplified?

I.e., match the shadow stack PTE (Write=0,Dirty=1) and handle that in a separate
helper?

So that the flows are separate. I'm not a mm guy but this function makes my head
hurt - dunno about other folks. :)

Thx.
  
Edgecombe, Rick P Jan. 5, 2023, 1:06 a.m. UTC | #5
On Wed, 2023-01-04 at 14:25 +0100, Borislav Petkov wrote:
> On Tue, Dec 27, 2022 at 11:31:37PM +0000, Edgecombe, Rick P wrote:
> > The comment is referring to the dirty bits possibly coming from
> > newprot,
> 
> Ah right, ofc.
> 
> > but looking at it now I think the code was broken trying to
> > fix the recent soft dirty test breakage. Now it might lose pre-
> > existing
> > dirty bits in the pte unessarily... I think.
> 
> Right, does this code need to be simplified?
> 
> I.e., match the shadow stack PTE (Write=0,Dirty=1) and handle that in
> a separate
> helper?
> 
> So that the flows are separate. I'm not a mm guy but this function
> makes my head
> hurt - dunno about other folks. :)

Yea, the whole Write=0,Dirty=1 thing has been a bit of a challenge to
make clear in the MM code. Dave had suggested a sketch here for
pte_modify():

https://lore.kernel.org/lkml/95299e90-245b-61c5-8ef0-5e6da3c37c5e@intel.com/

The problem was that pte_mkdirty() also sets the soft dirty bit. So it
did more than preserve the dirty bit - it also added on the soft dirty
bit. I extracted a helper __pte_mkdirty() that can optionally not set
the soft dirty bit. So then it looks pretty close to how Dave
suggested:

static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
{
	pteval_t _page_chg_mask_no_dirty = _PAGE_CHG_MASK &
~_PAGE_DIRTY;
	pteval_t val = pte_val(pte), oldval = val;
	pte_t pte_result;

	/*
	 * Chop off the NX bit (if present), and add the NX portion of
	 * the newprot (if present):
	 */
	val &= _page_chg_mask_no_dirty;
	val |= check_pgprot(newprot) & ~_page_chg_mask_no_dirty;
	val = flip_protnone_guard(oldval, val, PTE_PFN_MASK);

	pte_result = __pte(val);

	/*
	 * Dirty bit is not preserved above so it can be done
	 * in a special way for the shadow stack case, where it
	 * may need to set _PAGE_COW. __pte_mkdirty() will do this in
	 * the case of shadow stack.
	 */
	if (pte_dirty(pte))
		pte_result = __pte_mkdirty(pte_result, false);

	return pte_result;
}
  

Patch

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index ee5fbdc2615f..67bd2627c293 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -698,26 +698,54 @@  static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask);
 
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
+	pteval_t _page_chg_mask_no_dirty = _PAGE_CHG_MASK & ~_PAGE_DIRTY;
 	pteval_t val = pte_val(pte), oldval = val;
+	pte_t pte_result;
 
 	/*
 	 * Chop off the NX bit (if present), and add the NX portion of
 	 * the newprot (if present):
 	 */
-	val &= _PAGE_CHG_MASK;
-	val |= check_pgprot(newprot) & ~_PAGE_CHG_MASK;
+	val &= _page_chg_mask_no_dirty;
+	val |= check_pgprot(newprot) & ~_page_chg_mask_no_dirty;
 	val = flip_protnone_guard(oldval, val, PTE_PFN_MASK);
-	return __pte(val);
+
+	pte_result = __pte(val);
+
+	/*
+	 * Dirty bit is not preserved above so it can be done
+	 * in a special way for the shadow stack case, where it
+	 * needs to set _PAGE_COW. pte_mkcow() will do this in
+	 * the case of shadow stack.
+	 */
+	if (pte_dirty(pte_result))
+		pte_result = pte_mkcow(pte_result);
+
+	return pte_result;
 }
 
 static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
 {
+	pteval_t _hpage_chg_mask_no_dirty = _HPAGE_CHG_MASK & ~_PAGE_DIRTY;
 	pmdval_t val = pmd_val(pmd), oldval = val;
+	pmd_t pmd_result;
 
-	val &= _HPAGE_CHG_MASK;
-	val |= check_pgprot(newprot) & ~_HPAGE_CHG_MASK;
+	val &= _hpage_chg_mask_no_dirty;
+	val |= check_pgprot(newprot) & ~_hpage_chg_mask_no_dirty;
 	val = flip_protnone_guard(oldval, val, PHYSICAL_PMD_PAGE_MASK);
-	return __pmd(val);
+
+	pmd_result = __pmd(val);
+
+	/*
+	 * Dirty bit is not preserved above so it can be done
+	 * in a special way for the shadow stack case, where it
+	 * needs to set _PAGE_COW. pmd_mkcow() will do this in
+	 * the case of shadow stack.
+	 */
+	if (pmd_dirty(pmd_result))
+		pmd_result = pmd_mkcow(pmd_result);
+
+	return pmd_result;
 }
 
 /*