[v2,2/8] mm/hugetlb: Prepare hugetlb_follow_page_mask() for FOLL_PIN

Message ID 20230619231044.112894-3-peterx@redhat.com
State New
Headers
Series mm/gup: Unify hugetlb, speed up thp |

Commit Message

Peter Xu June 19, 2023, 11:10 p.m. UTC
  follow_page() doesn't use FOLL_PIN, meanwhile hugetlb seems to not be the
target of FOLL_WRITE either.  However add the checks.

Namely, either the need to CoW due to missing write bit, or proper CoR on
!AnonExclusive pages over R/O pins to reject the follow page.  That brings
this function closer to follow_hugetlb_page().

So we don't care before, and also for now.  But we'll care if we switch
over slow-gup to use hugetlb_follow_page_mask().  We'll also care when to
return -EMLINK properly, as that's the gup internal api to mean "we should
do CoR".  Not really needed for follow page path, though.

When at it, switching the try_grab_page() to use WARN_ON_ONCE(), to be
clear that it just should never fail.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/hugetlb.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)
  

Comments

David Hildenbrand June 20, 2023, 3:22 p.m. UTC | #1
On 20.06.23 01:10, Peter Xu wrote:
> follow_page() doesn't use FOLL_PIN, meanwhile hugetlb seems to not be the
> target of FOLL_WRITE either.  However add the checks.
> 
> Namely, either the need to CoW due to missing write bit, or proper CoR on

s/CoR/unsharing/

> !AnonExclusive pages over R/O pins to reject the follow page.  That brings
> this function closer to follow_hugetlb_page().
> 
> So we don't care before, and also for now.  But we'll care if we switch
> over slow-gup to use hugetlb_follow_page_mask().  We'll also care when to
> return -EMLINK properly, as that's the gup internal api to mean "we should
> do CoR".  Not really needed for follow page path, though.

"we should unshare".

> 
> When at it, switching the try_grab_page() to use WARN_ON_ONCE(), to be
> clear that it just should never fail.
> 
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>   mm/hugetlb.c | 24 +++++++++++++++---------
>   1 file changed, 15 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f75f5e78ff0b..9a6918c4250a 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6463,13 +6463,6 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
>   	spinlock_t *ptl;
>   	pte_t *pte, entry;
>   
> -	/*
> -	 * FOLL_PIN is not supported for follow_page(). Ordinary GUP goes via
> -	 * follow_hugetlb_page().
> -	 */
> -	if (WARN_ON_ONCE(flags & FOLL_PIN))
> -		return NULL;
> -
>   	hugetlb_vma_lock_read(vma);
>   	pte = hugetlb_walk(vma, haddr, huge_page_size(h));
>   	if (!pte)
> @@ -6478,8 +6471,21 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
>   	ptl = huge_pte_lock(h, mm, pte);
>   	entry = huge_ptep_get(pte);
>   	if (pte_present(entry)) {
> -		page = pte_page(entry) +
> -				((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
> +		page = pte_page(entry);
> +
> +		if (gup_must_unshare(vma, flags, page)) {

All other callers (like follow_page_pte(), including 
__follow_hugetlb_must_fault())

(a) check for write permissions first.

(b) check for gup_must_unshare() only if !pte_write(entry)

I'd vote to keep these checks as similar as possible to the other GUP code.

> +			/* Tell the caller to do Copy-On-Read */

"Tell the caller to unshare".

> +			page = ERR_PTR(-EMLINK);
> +			goto out;
> +		}
> +
> +		if ((flags & FOLL_WRITE) && !pte_write(entry)) {
> +			page = NULL;
> +			goto out;
> +		}


I'm confused about pte_write() vs. huge_pte_write(), and I don't know 
what's right or wrong here.
  
David Hildenbrand June 20, 2023, 3:28 p.m. UTC | #2
On 20.06.23 01:10, Peter Xu wrote:
> follow_page() doesn't use FOLL_PIN, meanwhile hugetlb seems to not be the
> target of FOLL_WRITE either.  However add the checks.
> 
> Namely, either the need to CoW due to missing write bit, or proper CoR on
> !AnonExclusive pages over R/O pins to reject the follow page.  That brings
> this function closer to follow_hugetlb_page().
> 
> So we don't care before, and also for now.  But we'll care if we switch
> over slow-gup to use hugetlb_follow_page_mask().  We'll also care when to
> return -EMLINK properly, as that's the gup internal api to mean "we should
> do CoR".  Not really needed for follow page path, though.
> 
> When at it, switching the try_grab_page() to use WARN_ON_ONCE(), to be
> clear that it just should never fail.

Oh, and does this comment really belong into this patch or am I confused?
  
Peter Xu June 20, 2023, 4:03 p.m. UTC | #3
On Tue, Jun 20, 2023 at 05:22:02PM +0200, David Hildenbrand wrote:
> On 20.06.23 01:10, Peter Xu wrote:
> > follow_page() doesn't use FOLL_PIN, meanwhile hugetlb seems to not be the
> > target of FOLL_WRITE either.  However add the checks.
> > 
> > Namely, either the need to CoW due to missing write bit, or proper CoR on
> 
> s/CoR/unsharing/
> 
> > !AnonExclusive pages over R/O pins to reject the follow page.  That brings
> > this function closer to follow_hugetlb_page().
> > 
> > So we don't care before, and also for now.  But we'll care if we switch
> > over slow-gup to use hugetlb_follow_page_mask().  We'll also care when to
> > return -EMLINK properly, as that's the gup internal api to mean "we should
> > do CoR".  Not really needed for follow page path, though.
> 
> "we should unshare".
> 
> > 
> > When at it, switching the try_grab_page() to use WARN_ON_ONCE(), to be
> > clear that it just should never fail.
> > 
> > Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >   mm/hugetlb.c | 24 +++++++++++++++---------
> >   1 file changed, 15 insertions(+), 9 deletions(-)
> > 
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index f75f5e78ff0b..9a6918c4250a 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -6463,13 +6463,6 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
> >   	spinlock_t *ptl;
> >   	pte_t *pte, entry;
> > -	/*
> > -	 * FOLL_PIN is not supported for follow_page(). Ordinary GUP goes via
> > -	 * follow_hugetlb_page().
> > -	 */
> > -	if (WARN_ON_ONCE(flags & FOLL_PIN))
> > -		return NULL;
> > -
> >   	hugetlb_vma_lock_read(vma);
> >   	pte = hugetlb_walk(vma, haddr, huge_page_size(h));
> >   	if (!pte)
> > @@ -6478,8 +6471,21 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
> >   	ptl = huge_pte_lock(h, mm, pte);
> >   	entry = huge_ptep_get(pte);
> >   	if (pte_present(entry)) {
> > -		page = pte_page(entry) +
> > -				((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
> > +		page = pte_page(entry);
> > +
> > +		if (gup_must_unshare(vma, flags, page)) {
> 
> All other callers (like follow_page_pte(), including
> __follow_hugetlb_must_fault())
> 
> (a) check for write permissions first.
> 
> (b) check for gup_must_unshare() only if !pte_write(entry)
> 
> I'd vote to keep these checks as similar as possible to the other GUP code.

I'm pretty sure the order doesn't matter here since one for read and one
for write.. but sure I can switch the order.

> 
> > +			/* Tell the caller to do Copy-On-Read */
> 
> "Tell the caller to unshare".
> 
> > +			page = ERR_PTR(-EMLINK);
> > +			goto out;
> > +		}
> > +
> > +		if ((flags & FOLL_WRITE) && !pte_write(entry)) {
> > +			page = NULL;
> > +			goto out;
> > +		}
> 
> 
> I'm confused about pte_write() vs. huge_pte_write(), and I don't know what's
> right or wrong here.

AFAICT, they should always be identical in code. But yeah.. I should just
use the huge_ version.

Thanks,
  
Peter Xu June 20, 2023, 4:06 p.m. UTC | #4
On Tue, Jun 20, 2023 at 05:28:28PM +0200, David Hildenbrand wrote:
> On 20.06.23 01:10, Peter Xu wrote:
> > follow_page() doesn't use FOLL_PIN, meanwhile hugetlb seems to not be the
> > target of FOLL_WRITE either.  However add the checks.
> > 
> > Namely, either the need to CoW due to missing write bit, or proper CoR on
> > !AnonExclusive pages over R/O pins to reject the follow page.  That brings
> > this function closer to follow_hugetlb_page().
> > 
> > So we don't care before, and also for now.  But we'll care if we switch
> > over slow-gup to use hugetlb_follow_page_mask().  We'll also care when to
> > return -EMLINK properly, as that's the gup internal api to mean "we should
> > do CoR".  Not really needed for follow page path, though.
> > 
> > When at it, switching the try_grab_page() to use WARN_ON_ONCE(), to be
> > clear that it just should never fail.
> 
> Oh, and does this comment really belong into this patch or am I confused?

Ah yeh, thanks for spotting.  I used to have it in v1 but I kept the old
failure path here to partly address Lorenzo's worry; but then I forgot to
add the WARN_ON_ONCE back to guard.  I'll remember to add that in v3.
  

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f75f5e78ff0b..9a6918c4250a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6463,13 +6463,6 @@  struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 	spinlock_t *ptl;
 	pte_t *pte, entry;
 
-	/*
-	 * FOLL_PIN is not supported for follow_page(). Ordinary GUP goes via
-	 * follow_hugetlb_page().
-	 */
-	if (WARN_ON_ONCE(flags & FOLL_PIN))
-		return NULL;
-
 	hugetlb_vma_lock_read(vma);
 	pte = hugetlb_walk(vma, haddr, huge_page_size(h));
 	if (!pte)
@@ -6478,8 +6471,21 @@  struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 	ptl = huge_pte_lock(h, mm, pte);
 	entry = huge_ptep_get(pte);
 	if (pte_present(entry)) {
-		page = pte_page(entry) +
-				((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
+		page = pte_page(entry);
+
+		if (gup_must_unshare(vma, flags, page)) {
+			/* Tell the caller to do Copy-On-Read */
+			page = ERR_PTR(-EMLINK);
+			goto out;
+		}
+
+		if ((flags & FOLL_WRITE) && !pte_write(entry)) {
+			page = NULL;
+			goto out;
+		}
+
+		page += ((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
+
 		/*
 		 * Note that page may be a sub-page, and with vmemmap
 		 * optimizations the page struct may be read only.