[v3] mm/hugetlb: Fix uffd wr-protection for CoW optimization path

Message ID 20230324142620.2344140-1-peterx@redhat.com
State New
Series [v3] mm/hugetlb: Fix uffd wr-protection for CoW optimization path

Commit Message

Peter Xu March 24, 2023, 2:26 p.m. UTC
This patch fixes an issue where a hugetlb uffd-wr-protected mapping can
become writable even with the uffd-wp bit set.  It only happens with hugetlb
private mappings, when someone first wr-protects a missing pte (which
installs a pte marker) and then writes to the same page without any prior
access to it.
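
As an illustration, a rough userspace reproducer of that sequence could look
like the below.  It is only a sketch, not the selftest: it assumes 2M
hugepages are preallocated, a kernel with UFFD_FEATURE_WP_HUGETLBFS_SHMEM,
enough privilege to create the userfaultfd, and it skips all error handling.
On a fixed kernel the monitor thread traps the write before it completes; on
a broken kernel the write silently succeeds.

/*
 * Sketch: wr-protect a missing private hugetlb pte, then make the very
 * first access a write.  Compile with -pthread.
 */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

#define HPAGE_SIZE	(2UL << 20)		/* assumption: 2M hugepages */

static void *monitor(void *arg)
{
	int uffd = (int)(long)arg;
	struct uffd_msg msg;

	while (read(uffd, &msg, sizeof(msg)) == sizeof(msg)) {
		if (msg.event == UFFD_EVENT_PAGEFAULT &&
		    (msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP)) {
			struct uffdio_writeprotect unprot = {
				.range.start = msg.arg.pagefault.address & ~(HPAGE_SIZE - 1),
				.range.len = HPAGE_SIZE,
				.mode = 0,	/* drop protection, resolve the fault */
			};

			printf("uffd-wp write fault trapped (good)\n");
			ioctl(uffd, UFFDIO_WRITEPROTECT, &unprot);
		}
	}
	return NULL;
}

int main(void)
{
	struct uffdio_api api = {
		.api = UFFD_API,
		.features = UFFD_FEATURE_WP_HUGETLBFS_SHMEM,
	};
	struct uffdio_register reg;
	struct uffdio_writeprotect wp;
	pthread_t thread;
	char *area;
	int uffd;

	uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
	ioctl(uffd, UFFDIO_API, &api);

	/* private anonymous hugetlb mapping, nothing populated yet */
	area = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	reg.range.start = (unsigned long)area;
	reg.range.len = HPAGE_SIZE;
	reg.mode = UFFDIO_REGISTER_MODE_WP;
	ioctl(uffd, UFFDIO_REGISTER, &reg);

	/* wr-protect the still-missing pte: installs a pte marker */
	wp.range.start = (unsigned long)area;
	wp.range.len = HPAGE_SIZE;
	wp.mode = UFFDIO_WRITEPROTECT_MODE_WP;
	ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);

	pthread_create(&thread, NULL, monitor, (void *)(long)uffd);

	/* first-ever access is a write: it must be trapped, not made writable */
	area[0] = 1;
	printf("write completed\n");
	return 0;
}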

The userfaultfd-wp trap for hugetlb is implemented in hugetlb_fault(), before
reaching hugetlb_wp(), to avoid taking extra locks that userfault doesn't
need.  However, there is one CoW optimization path that can trigger
hugetlb_wp() from inside hugetlb_no_page(), bypassing the trap.
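
To make the two paths concrete, below is a toy model of the decision logic
only; it is plain userspace C with no kernel dependency, and the function
names merely label the kernel paths they mimic.

#include <stdbool.h>
#include <stdio.h>

/* hugetlb_fault() path for a *present* pte: uffd-wp is checked before CoW */
static const char *write_fault_present_pte(bool uffd_wp)
{
	if (uffd_wp)
		return "trap to userspace (handle_userfault)";
	return "CoW in hugetlb_wp(), page becomes writable";
}

/*
 * hugetlb_no_page() path for a *missing* pte carrying a uffd-wp marker: a
 * private write fault falls through to hugetlb_wp() directly (the "CoW
 * optimization"), so the uffd-wp check has to happen there as well.
 */
static const char *write_fault_missing_pte(bool uffd_wp, bool fixed)
{
	if (uffd_wp && fixed)
		return "bail out; the retried write then traps via hugetlb_fault()";
	if (uffd_wp)
		return "BUG: page becomes writable despite uffd-wp";
	return "CoW in hugetlb_wp(), page becomes writable";
}

int main(void)
{
	printf("present pte, wp set:            %s\n", write_fault_present_pte(true));
	printf("missing pte, wp marker, before: %s\n", write_fault_missing_pte(true, false));
	printf("missing pte, wp marker, after:  %s\n", write_fault_missing_pte(true, true));
	return 0;
}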

This patch skips the CoW in hugetlb_wp() when the uffd-wp bit is detected, so
the write is retried and trapped by hugetlb_fault() instead.  The new check
only triggers in the CoW optimization path, because the generic
hugetlb_fault() path (e.g. when a present pte was wr-protected) resolves the
uffd-wp bit first.  Also make sure anonymous UNSHARE is not affected and can
still be resolved; in other words, only CoW is skipped, not CoR.

This patch is needed for v5.19+, hence stable is copied.

Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: linux-stable <stable@vger.kernel.org>
Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection")
Signed-off-by: Peter Xu <peterx@redhat.com>
---

Notes:

v2 was not posted to the list but only attached in a reply; this v3 is mostly
to make sure it is not the same as that previously attached patch.  Sorry
Andrew, we need to drop the queued one, as I rewrote the commit message.

Muhammad, I didn't carry over your T-b because of the slight functional
change.  Please feel free to re-add it if it still works for you (which I
believe it should).

thanks,
---
 mm/hugetlb.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)
  

Comments

Mike Kravetz March 24, 2023, 10:27 p.m. UTC | #1
On 03/24/23 10:26, Peter Xu wrote:
> This patch fixes an issue that a hugetlb uffd-wr-protected mapping can be
> writable even with uffd-wp bit set.  It only happens with hugetlb private
> mappings, when someone firstly wr-protects a missing pte (which will
> install a pte marker), then a write to the same page without any prior
> access to the page.
> 
> Userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault() before
> reaching hugetlb_wp() to avoid taking more locks that userfault won't need.
> However there's one CoW optimization path that can trigger hugetlb_wp()
> inside hugetlb_no_page(), which will bypass the trap.
> 
> This patch skips hugetlb_wp() for CoW and retries the fault if uffd-wp bit
> is detected.  The new path will only trigger in the CoW optimization path
> because generic hugetlb_fault() (e.g. when a present pte was wr-protected)
> will resolve the uffd-wp bit already.  Also make sure anonymous UNSHARE
> won't be affected and can still be resolved, IOW only skip CoW not CoR.
> 
> This patch will be needed for v5.19+ hence copy stable.
> 
> Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
> Cc: linux-stable <stable@vger.kernel.org>
> Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection")
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> 
> Notes:
> 
> v2 is not on the list but in an attachment in the reply; this v3 is mostly
> to make sure it's not the same as the patch used to be attached.  Sorry
> Andrew, we need to drop the queued one as I rewrote the commit message.

My apologies!  I saw the code path missed in v2 and assumed you did not
think it applied.  So, I said nothing.  My bad!

> Muhammad, I didn't attach your T-b because of the slight functional change.
> Please feel free to re-attach if it still works for you (which I believe
> should).
> 
> thanks,
> ---
>  mm/hugetlb.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 8bfd07f4c143..a58b3739ed4b 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5478,7 +5478,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
>  		       struct folio *pagecache_folio, spinlock_t *ptl)
>  {
>  	const bool unshare = flags & FAULT_FLAG_UNSHARE;
> -	pte_t pte;
> +	pte_t pte = huge_ptep_get(ptep);
>  	struct hstate *h = hstate_vma(vma);
>  	struct page *old_page;
>  	struct folio *new_folio;
> @@ -5487,6 +5487,17 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
>  	unsigned long haddr = address & huge_page_mask(h);
>  	struct mmu_notifier_range range;
>  
> +	/*
> +	 * Never handle CoW for uffd-wp protected pages.  It should be only
> +	 * handled when the uffd-wp protection is removed.
> +	 *
> +	 * Note that only the CoW optimization path (in hugetlb_no_page())
> +	 * can trigger this, because hugetlb_fault() will always resolve
> +	 * uffd-wp bit first.
> +	 */
> +	if (!unshare && huge_pte_uffd_wp(pte))
> +		return 0;

This looks correct.  However, since the previous version also looked correct,
I must ask: can we have unshare set and huge_pte_uffd_wp() true?  If so, then
it seems we would need to propagate that uffd_wp bit to the new pte, as in v2.

> +
>  	/*
>  	 * hugetlb does not support FOLL_FORCE-style write faults that keep the
>  	 * PTE mapped R/O such as maybe_mkwrite() would do.
> @@ -5500,7 +5511,6 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
>  		return 0;
>  	}
>  
> -	pte = huge_ptep_get(ptep);
>  	old_page = pte_page(pte);
>  
>  	delayacct_wpcopy_start();
> -- 
> 2.39.1
>
  
David Hildenbrand March 24, 2023, 10:36 p.m. UTC | #2
On 24.03.23 23:27, Mike Kravetz wrote:
> On 03/24/23 10:26, Peter Xu wrote:
>> This patch fixes an issue that a hugetlb uffd-wr-protected mapping can be
>> writable even with uffd-wp bit set.  It only happens with hugetlb private
>> mappings, when someone firstly wr-protects a missing pte (which will
>> install a pte marker), then a write to the same page without any prior
>> access to the page.
>>
>> Userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault() before
>> reaching hugetlb_wp() to avoid taking more locks that userfault won't need.
>> However there's one CoW optimization path that can trigger hugetlb_wp()
>> inside hugetlb_no_page(), which will bypass the trap.
>>
>> This patch skips hugetlb_wp() for CoW and retries the fault if uffd-wp bit
>> is detected.  The new path will only trigger in the CoW optimization path
>> because generic hugetlb_fault() (e.g. when a present pte was wr-protected)
>> will resolve the uffd-wp bit already.  Also make sure anonymous UNSHARE
>> won't be affected and can still be resolved, IOW only skip CoW not CoR.
>>
>> This patch will be needed for v5.19+ hence copy stable.
>>
>> Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
>> Cc: linux-stable <stable@vger.kernel.org>
>> Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection")
>> Signed-off-by: Peter Xu <peterx@redhat.com>
>> ---
>>
>> Notes:
>>
>> v2 is not on the list but in an attachment in the reply; this v3 is mostly
>> to make sure it's not the same as the patch used to be attached.  Sorry
>> Andrew, we need to drop the queued one as I rewrote the commit message.
> 
> My appologies!  I saw the code path missed in v2 and assumed you did not
> think it applied.  So, I said nothing.  My bad!
> 
>> Muhammad, I didn't attach your T-b because of the slight functional change.
>> Please feel free to re-attach if it still works for you (which I believe
>> should).
>>
>> thanks,
>> ---
>>   mm/hugetlb.c | 14 ++++++++++++--
>>   1 file changed, 12 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 8bfd07f4c143..a58b3739ed4b 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -5478,7 +5478,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
>>   		       struct folio *pagecache_folio, spinlock_t *ptl)
>>   {
>>   	const bool unshare = flags & FAULT_FLAG_UNSHARE;
>> -	pte_t pte;
>> +	pte_t pte = huge_ptep_get(ptep);
>>   	struct hstate *h = hstate_vma(vma);
>>   	struct page *old_page;
>>   	struct folio *new_folio;
>> @@ -5487,6 +5487,17 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
>>   	unsigned long haddr = address & huge_page_mask(h);
>>   	struct mmu_notifier_range range;
>>   
>> +	/*
>> +	 * Never handle CoW for uffd-wp protected pages.  It should be only
>> +	 * handled when the uffd-wp protection is removed.
>> +	 *
>> +	 * Note that only the CoW optimization path (in hugetlb_no_page())
>> +	 * can trigger this, because hugetlb_fault() will always resolve
>> +	 * uffd-wp bit first.
>> +	 */
>> +	if (!unshare && huge_pte_uffd_wp(pte))
>> +		return 0;
> 
> This looks correct.  However, since the previous version looked correct I must
> ask.  Can we have unshare set and huge_pte_uffd_wp true?  If so, then it seems
> we would need to possibly propogate that uffd_wp to the new pte as in v2

We can. A reproducer would share an anon hugetlb page between parent and 
child. In the parent, we would uffd-wp that page. We could then trigger 
unsharing by R/O-pinning that page.
  
Peter Xu March 26, 2023, 2:46 p.m. UTC | #3
On Fri, Mar 24, 2023 at 11:36:53PM +0100, David Hildenbrand wrote:
> > > @@ -5487,6 +5487,17 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
> > >   	unsigned long haddr = address & huge_page_mask(h);
> > >   	struct mmu_notifier_range range;
> > > +	/*
> > > +	 * Never handle CoW for uffd-wp protected pages.  It should be only
> > > +	 * handled when the uffd-wp protection is removed.
> > > +	 *
> > > +	 * Note that only the CoW optimization path (in hugetlb_no_page())
> > > +	 * can trigger this, because hugetlb_fault() will always resolve
> > > +	 * uffd-wp bit first.
> > > +	 */
> > > +	if (!unshare && huge_pte_uffd_wp(pte))
> > > +		return 0;
> > 
> > This looks correct.  However, since the previous version looked correct I must
> > ask.  Can we have unshare set and huge_pte_uffd_wp true?  If so, then it seems
> > we would need to possibly propogate that uffd_wp to the new pte as in v2

Good point, thanks for spotting!

> 
> We can. A reproducer would share an anon hugetlb page because parent and
> child. In the parent, we would uffd-wp that page. We could trigger unsharing
> by R/O-pinning that page.

Right.  This seems to be a separate bug.  It would be triggered in a totally
different context and is much harder to hit, due to the rare use of R/O pins
combined with userfault-wp.

If both of you agree, I can prepare a separate patch for this bug, and I'd
better prepare a reproducer/selftest to go with it.
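
For orientation, the scenario could be sketched roughly as below.  It is not
a working reproducer: take_ro_pin() is a hypothetical placeholder, since a
read-only FOLL_PIN has to be driven from the kernel side (e.g. via the
gup_test debugfs interface); hugepage size and error handling are likewise
simplified.

#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

#define HPAGE_SIZE	(2UL << 20)		/* assumption: 2M hugepages */

/* hypothetical placeholder: take a read-only FOLL_PIN on the range */
static void take_ro_pin(void *addr, size_t len)
{
	printf("R/O pin of %p+%zu would go here (e.g. gup_test)\n", addr, len);
}

int main(void)
{
	struct uffdio_api api = {
		.api = UFFD_API,
		.features = UFFD_FEATURE_WP_HUGETLBFS_SHMEM,
	};
	struct uffdio_register reg;
	struct uffdio_writeprotect wp;
	char *area;
	pid_t pid;
	int uffd;

	uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
	ioctl(uffd, UFFDIO_API, &api);

	area = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	area[0] = 1;			/* populate the anon hugetlb page */

	pid = fork();			/* child shares it: no longer exclusive */
	if (pid == 0) {
		pause();
		_exit(0);
	}

	reg.range.start = (unsigned long)area;
	reg.range.len = HPAGE_SIZE;
	reg.mode = UFFDIO_REGISTER_MODE_WP;
	ioctl(uffd, UFFDIO_REGISTER, &reg);

	wp.range.start = (unsigned long)area;
	wp.range.len = HPAGE_SIZE;
	wp.mode = UFFDIO_WRITEPROTECT_MODE_WP;
	ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);	/* parent: uffd-wp the shared page */

	/*
	 * An R/O pin of a non-exclusive anon page triggers an UNSHARE fault
	 * into hugetlb_wp(); the open question is whether the uffd-wp bit is
	 * kept on the unshared copy.
	 */
	take_ro_pin(area, HPAGE_SIZE);

	kill(pid, SIGKILL);
	waitpid(pid, NULL, 0);
	return 0;
}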
  
Mike Kravetz March 27, 2023, 6:34 p.m. UTC | #4
On 03/26/23 10:46, Peter Xu wrote:
> On Fri, Mar 24, 2023 at 11:36:53PM +0100, David Hildenbrand wrote:
> > > > @@ -5487,6 +5487,17 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
> > > >   	unsigned long haddr = address & huge_page_mask(h);
> > > >   	struct mmu_notifier_range range;
> > > > +	/*
> > > > +	 * Never handle CoW for uffd-wp protected pages.  It should be only
> > > > +	 * handled when the uffd-wp protection is removed.
> > > > +	 *
> > > > +	 * Note that only the CoW optimization path (in hugetlb_no_page())
> > > > +	 * can trigger this, because hugetlb_fault() will always resolve
> > > > +	 * uffd-wp bit first.
> > > > +	 */
> > > > +	if (!unshare && huge_pte_uffd_wp(pte))
> > > > +		return 0;
> > > 
> > > This looks correct.  However, since the previous version looked correct I must
> > > ask.  Can we have unshare set and huge_pte_uffd_wp true?  If so, then it seems
> > > we would need to possibly propogate that uffd_wp to the new pte as in v2
> 
> Good point, thanks for spotting!
> 
> > 
> > We can. A reproducer would share an anon hugetlb page because parent and
> > child. In the parent, we would uffd-wp that page. We could trigger unsharing
> > by R/O-pinning that page.
> 
> Right.  This seems to be a separate bug..  It should be triggered in
> totally different context and much harder due to rare use of RO pins,
> meanwhile used with userfault-wp.
> 
> If both of you agree, I can prepare a separate patch for this bug, and I'll
> better prepare a reproducer/selftest with it.
> 

I am OK with separate patches, and agree that the R/O pinning case is less
likely to happen.

Since this patch addresses the issue found by Muhammad,

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
  
David Hildenbrand March 27, 2023, 8:57 p.m. UTC | #5
On 27.03.23 20:34, Mike Kravetz wrote:
> On 03/26/23 10:46, Peter Xu wrote:
>> On Fri, Mar 24, 2023 at 11:36:53PM +0100, David Hildenbrand wrote:
>>>>> @@ -5487,6 +5487,17 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
>>>>>    	unsigned long haddr = address & huge_page_mask(h);
>>>>>    	struct mmu_notifier_range range;
>>>>> +	/*
>>>>> +	 * Never handle CoW for uffd-wp protected pages.  It should be only
>>>>> +	 * handled when the uffd-wp protection is removed.
>>>>> +	 *
>>>>> +	 * Note that only the CoW optimization path (in hugetlb_no_page())
>>>>> +	 * can trigger this, because hugetlb_fault() will always resolve
>>>>> +	 * uffd-wp bit first.
>>>>> +	 */
>>>>> +	if (!unshare && huge_pte_uffd_wp(pte))
>>>>> +		return 0;
>>>>
>>>> This looks correct.  However, since the previous version looked correct I must
>>>> ask.  Can we have unshare set and huge_pte_uffd_wp true?  If so, then it seems
>>>> we would need to possibly propogate that uffd_wp to the new pte as in v2
>>
>> Good point, thanks for spotting!
>>
>>>
>>> We can. A reproducer would share an anon hugetlb page because parent and
>>> child. In the parent, we would uffd-wp that page. We could trigger unsharing
>>> by R/O-pinning that page.
>>
>> Right.  This seems to be a separate bug..  It should be triggered in
>> totally different context and much harder due to rare use of RO pins,
>> meanwhile used with userfault-wp.
>>
>> If both of you agree, I can prepare a separate patch for this bug, and I'll
>> better prepare a reproducer/selftest with it.
>>
> 
> I am OK with separate patches, and agree that the R/O pinning case is less
> likely to happen.

Yes, the combination should be rather rare and we can fix that separately.
Ideally, we'd mimic in the hugetlb CoW/unshare handling the same uffd code
flow that we use in memory.c.

> 
> Since this patch addresses the issue found by Muhammad,
> 
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

Hopefully we didn't forget about yet another case :D

Acked-by: David Hildenbrand <david@redhat.com>
  

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8bfd07f4c143..a58b3739ed4b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5478,7 +5478,7 @@  static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 		       struct folio *pagecache_folio, spinlock_t *ptl)
 {
 	const bool unshare = flags & FAULT_FLAG_UNSHARE;
-	pte_t pte;
+	pte_t pte = huge_ptep_get(ptep);
 	struct hstate *h = hstate_vma(vma);
 	struct page *old_page;
 	struct folio *new_folio;
@@ -5487,6 +5487,17 @@  static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 	unsigned long haddr = address & huge_page_mask(h);
 	struct mmu_notifier_range range;
 
+	/*
+	 * Never handle CoW for uffd-wp protected pages.  It should be only
+	 * handled when the uffd-wp protection is removed.
+	 *
+	 * Note that only the CoW optimization path (in hugetlb_no_page())
+	 * can trigger this, because hugetlb_fault() will always resolve
+	 * uffd-wp bit first.
+	 */
+	if (!unshare && huge_pte_uffd_wp(pte))
+		return 0;
+
 	/*
 	 * hugetlb does not support FOLL_FORCE-style write faults that keep the
 	 * PTE mapped R/O such as maybe_mkwrite() would do.
@@ -5500,7 +5511,6 @@  static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 		return 0;
 	}
 
-	pte = huge_ptep_get(ptep);
 	old_page = pte_page(pte);
 
 	delayacct_wpcopy_start();