[RFC,07/12] mm/gup: Refactor record_subpages() to find 1st small page

Message ID 20231116012908.392077-8-peterx@redhat.com
State New
Series mm/gup: Unify hugetlb, part 2

Commit Message

Peter Xu Nov. 16, 2023, 1:29 a.m. UTC
All the fast-gup functions take a tail page to operate on, and always need
to do page mask calculations before feeding that into record_subpages().

Merge that logic into record_subpages(), so that we always take a head
page, and leave the rest of the calculation to record_subpages().

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/gup.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)
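
As a minimal sketch of the index math the patch centralizes, assuming 4K
pages and x86-64's 2M PMDs (a userspace stand-in, not kernel code):

#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PMD_SIZE	(1UL << 21)	/* 2M on x86-64 */

/* Index of the first small page: offset of addr within the huge mapping. */
static unsigned long first_subpage_idx(unsigned long addr, unsigned long sz)
{
	return (addr & (sz - 1)) >> PAGE_SHIFT;
}

int main(void)
{
	/* An address 5 small pages into a 2M mapping at 0x200000 */
	unsigned long addr = 0x200000UL + 5 * PAGE_SIZE;

	printf("%lu\n", first_subpage_idx(addr, PMD_SIZE));	/* prints 5 */
	return 0;
}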
  

Comments

Matthew Wilcox Nov. 16, 2023, 2:51 p.m. UTC | #1
On Wed, Nov 15, 2023 at 08:29:03PM -0500, Peter Xu wrote:
> All the fast-gup functions take a tail page to operate on, and always need
> to do page mask calculations before feeding that into record_subpages().
> 
> Merge that logic into record_subpages(), so that we always take a head
> page, and leave the rest of the calculation to record_subpages().

This is a bit fragile.  You're assuming that pmd_page() always returns
a head page, and that's only true today because I looked at the work
required vs the reward and decided to cap the large folio size at PMD
size.  If we allowed 2*PMD_SIZE (eg 4MB on x86), pmd_page() would not
return a head page.  There is a small amount of demand for > PMD size
large folio support, so I suspect we will want to do this eventually.
I'm not particularly trying to do these conversions, but it would be
good to not add more assumptions that pmd_page() returns a head page.

> +static int record_subpages(struct page *head, unsigned long sz,
> +			   unsigned long addr, unsigned long end,
> +			   struct page **pages)

> @@ -2870,8 +2873,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
>  					     pages, nr);
>  	}
>  
> -	page = nth_page(pmd_page(orig), (addr & ~PMD_MASK) >> PAGE_SHIFT);
> -	refs = record_subpages(page, addr, end, pages + *nr);
> +	page = pmd_page(orig);
> +	refs = record_subpages(page, PMD_SIZE, addr, end, pages + *nr);
>  
>  	folio = try_grab_folio(page, refs, flags);
>  	if (!folio)
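
To make the concern concrete with hypothetical numbers: a 4MB folio on
x86-64 (4K pages) would span two 2M PMDs, so pmd_page() on the second PMD
would resolve 512 pages past the folio's head, i.e. to a tail page:

#include <stdio.h>

#define PAGE_SHIFT	12
#define PMD_SHIFT	21	/* 2M PMDs on x86-64 */

int main(void)
{
	/* Small pages mapped by one PMD: 1 << (21 - 12) == 512. */
	unsigned long pages_per_pmd = 1UL << (PMD_SHIFT - PAGE_SHIFT);

	/*
	 * For a hypothetical 4MB folio spanning two PMDs, pmd_page()
	 * on the second PMD resolves to folio page index 512 -- a
	 * tail page, not the head.
	 */
	printf("second PMD starts at folio page %lu\n", pages_per_pmd);
	return 0;
}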
  
Peter Xu Nov. 16, 2023, 7:40 p.m. UTC | #2
On Thu, Nov 16, 2023 at 02:51:52PM +0000, Matthew Wilcox wrote:
> On Wed, Nov 15, 2023 at 08:29:03PM -0500, Peter Xu wrote:
> > All the fast-gup functions take a tail page to operate on, and always need
> > to do page mask calculations before feeding that into record_subpages().
> > 
> > Merge that logic into record_subpages(), so that we always take a head
> > page, and leave the rest of the calculation to record_subpages().
> 
> This is a bit fragile.  You're assuming that pmd_page() always returns
> a head page, and that's only true today because I looked at the work
> required vs the reward and decided to cap the large folio size at PMD
> size.  If we allowed 2*PMD_SIZE (eg 4MB on x86), pmd_page() would not
> return a head page.  There is a small amount of demand for > PMD size
> large folio support, so I suspect we will want to do this eventually.
> I'm not particularly trying to do these conversions, but it would be
> good to not add more assumptions that pmd_page() returns a head page.

Makes sense.  Actually, IIUC arm64's CONT_PMD pages can already make that
not a head page.

The code should still be correct, though.  AFAIU what I need to do then is
rename the first parameter of record_subpages() (s/head/base/) in the next
version, or just keep the old name ("page"), then update the commit
message.
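
A minimal sketch of why, modelling pages as pfns so that nth_page()
becomes plain addition: the index is taken relative to whatever base page
maps the entry, and addr is masked against the same sz boundary, so the
result is the right small page even when that base is a tail page (the
pfn values below are hypothetical):

#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PMD_SIZE	(1UL << 21)

/* Model of nth_page(base, (addr & (sz - 1)) >> PAGE_SHIFT) on pfns. */
static unsigned long first_subpage_pfn(unsigned long base_pfn,
				       unsigned long sz, unsigned long addr)
{
	return base_pfn + ((addr & (sz - 1)) >> PAGE_SHIFT);
}

int main(void)
{
	/*
	 * Hypothetical 4MB folio with head pfn 0x1000; its second 2M
	 * PMD maps from pfn 0x1200 (a tail page) at VA 0x600000.
	 */
	unsigned long base_pfn = 0x1000 + 512;
	unsigned long addr = 0x600000UL + 3 * PAGE_SIZE;

	/* Prints 0x1203: the correct small page despite the tail base. */
	printf("0x%lx\n", first_subpage_pfn(base_pfn, PMD_SIZE, addr));
	return 0;
}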

Thanks,
  
Matthew Wilcox Nov. 16, 2023, 7:41 p.m. UTC | #3
On Thu, Nov 16, 2023 at 02:40:21PM -0500, Peter Xu wrote:
> On Thu, Nov 16, 2023 at 02:51:52PM +0000, Matthew Wilcox wrote:
> > On Wed, Nov 15, 2023 at 08:29:03PM -0500, Peter Xu wrote:
> > > All the fast-gup functions take a tail page to operate on, and always need
> > > to do page mask calculations before feeding that into record_subpages().
> > > 
> > > Merge that logic into record_subpages(), so that we always take a head
> > > page, and leave the rest of the calculation to record_subpages().
> > 
> > This is a bit fragile.  You're assuming that pmd_page() always returns
> > a head page, and that's only true today because I looked at the work
> > required vs the reward and decided to cap the large folio size at PMD
> > size.  If we allowed 2*PMD_SIZE (eg 4MB on x86), pmd_page() would not
> > return a head page.  There is a small amount of demand for > PMD size
> > large folio support, so I suspect we will want to do this eventually.
> > I'm not particularly trying to do these conversions, but it would be
> > good to not add more assumptions that pmd_page() returns a head page.
> 
> Makes sense.  Actually, IIUC arm64's CONT_PMD pages can already make that
> not a head page.
> 
> The code should still be correct, though.  AFAIU what I need to do then is
> rename the first parameter of record_subpages() (s/head/base/) in the next
> version, or just keep the old name ("page"), then update the commit
> message.

Yeah, I think just leaving it as 'page' would be best.  Thanks.
  

Patch

diff --git a/mm/gup.c b/mm/gup.c
index 424d45e1afb3..69dae51f3eb1 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2763,11 +2763,14 @@ static int __gup_device_huge_pud(pud_t pud, pud_t *pudp, unsigned long addr,
 }
 #endif
 
-static int record_subpages(struct page *page, unsigned long addr,
-			   unsigned long end, struct page **pages)
+static int record_subpages(struct page *head, unsigned long sz,
+			   unsigned long addr, unsigned long end,
+			   struct page **pages)
 {
+	struct page *page;
 	int nr;
 
+	page = nth_page(head, (addr & (sz - 1)) >> PAGE_SHIFT);
 	for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
 		pages[nr] = nth_page(page, nr);
 
@@ -2804,8 +2807,8 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
 	/* hugepages are never "special" */
 	VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 
-	page = nth_page(pte_page(pte), (addr & (sz - 1)) >> PAGE_SHIFT);
-	refs = record_subpages(page, addr, end, pages + *nr);
+	page = pte_page(pte);
+	refs = record_subpages(page, sz, addr, end, pages + *nr);
 
 	folio = try_grab_folio(page, refs, flags);
 	if (!folio)
@@ -2870,8 +2873,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
 					     pages, nr);
 	}
 
-	page = nth_page(pmd_page(orig), (addr & ~PMD_MASK) >> PAGE_SHIFT);
-	refs = record_subpages(page, addr, end, pages + *nr);
+	page = pmd_page(orig);
+	refs = record_subpages(page, PMD_SIZE, addr, end, pages + *nr);
 
 	folio = try_grab_folio(page, refs, flags);
 	if (!folio)
@@ -2914,8 +2917,8 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
 					     pages, nr);
 	}
 
-	page = nth_page(pud_page(orig), (addr & ~PUD_MASK) >> PAGE_SHIFT);
-	refs = record_subpages(page, addr, end, pages + *nr);
+	page = pud_page(orig);
+	refs = record_subpages(page, PUD_SIZE, addr, end, pages + *nr);
 
 	folio = try_grab_folio(page, refs, flags);
 	if (!folio)
@@ -2954,8 +2957,8 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
 
 	BUILD_BUG_ON(pgd_devmap(orig));
 
-	page = nth_page(pgd_page(orig), (addr & ~PGDIR_MASK) >> PAGE_SHIFT);
-	refs = record_subpages(page, addr, end, pages + *nr);
+	page = pgd_page(orig);
+	refs = record_subpages(page, PGDIR_SIZE, addr, end, pages + *nr);
 
 	folio = try_grab_folio(page, refs, flags);
 	if (!folio)
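
For a concrete feel of the refactored helper, a self-contained userspace
model (pages modelled as pfns so that nth_page() is plain addition; the
values are hypothetical):

#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PMD_SIZE	(1UL << 21)

/* Userspace model of the refactored record_subpages(). */
static int record_subpages(unsigned long head, unsigned long sz,
			   unsigned long addr, unsigned long end,
			   unsigned long *pages)
{
	unsigned long page = head + ((addr & (sz - 1)) >> PAGE_SHIFT);
	int nr;

	for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
		pages[nr] = page + nr;

	return nr;
}

int main(void)
{
	unsigned long pages[4];
	/* GUP of 4 small pages, starting 8K into a 2M huge page at pfn 0x800. */
	int refs = record_subpages(0x800, PMD_SIZE,
				   0x200000UL + 2 * PAGE_SIZE,
				   0x200000UL + 6 * PAGE_SIZE, pages);

	for (int i = 0; i < refs; i++)
		printf("pfn 0x%lx\n", pages[i]);	/* 0x802 .. 0x805 */
	return 0;
}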