[RFC,07/12] mm/gup: Refactor record_subpages() to find 1st small page
Commit Message
All the fast-gup functions compute a tail page themselves, so each of them
has to do page mask calculations before feeding the result into
record_subpages().

Merge that logic into record_subpages(), so that callers always pass a head
page and leave the rest of the calculation to record_subpages().
Signed-off-by: Peter Xu <peterx@redhat.com>
---
mm/gup.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
Comments
On Wed, Nov 15, 2023 at 08:29:03PM -0500, Peter Xu wrote:
> All the fast-gup functions compute a tail page themselves, so each of them
> has to do page mask calculations before feeding the result into
> record_subpages().
>
> Merge that logic into record_subpages(), so that callers always pass a head
> page and leave the rest of the calculation to record_subpages().
This is a bit fragile. You're assuming that pmd_page() always returns
a head page, and that's only true today because I looked at the work
required vs the reward and decided to cap the large folio size at PMD
size. If we allowed 2*PMD_SIZE (eg 4MB on x86), pmd_page() would not
return a head page. There is a small amount of demand for > PMD size
large folio support, so I suspect we will want to do this eventually.
I'm not particularly trying to do these conversions, but it would be
good to not add more assumptions that pmd_page() returns a head page.
> +static int record_subpages(struct page *head, unsigned long sz,
> + unsigned long addr, unsigned long end,
> + struct page **pages)
> @@ -2870,8 +2873,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
> pages, nr);
> }
>
> - page = nth_page(pmd_page(orig), (addr & ~PMD_MASK) >> PAGE_SHIFT);
> - refs = record_subpages(page, addr, end, pages + *nr);
> + page = pmd_page(orig);
> + refs = record_subpages(page, PMD_SIZE, addr, end, pages + *nr);
>
> folio = try_grab_folio(page, refs, flags);
> if (!folio)
On Thu, Nov 16, 2023 at 02:51:52PM +0000, Matthew Wilcox wrote:
> On Wed, Nov 15, 2023 at 08:29:03PM -0500, Peter Xu wrote:
> > All the fast-gup functions compute a tail page themselves, so each of them
> > has to do page mask calculations before feeding the result into
> > record_subpages().
> >
> > Merge that logic into record_subpages(), so that callers always pass a head
> > page and leave the rest of the calculation to record_subpages().
>
> This is a bit fragile. You're assuming that pmd_page() always returns
> a head page, and that's only true today because I looked at the work
> required vs the reward and decided to cap the large folio size at PMD
> size. If we allowed 2*PMD_SIZE (eg 4MB on x86), pmd_page() would not
> return a head page. There is a small amount of demand for > PMD size
> large folio support, so I suspect we will want to do this eventually.
> I'm not particularly trying to do these conversions, but it would be
> good to not add more assumptions that pmd_page() returns a head page.
Makes sense. Actually, IIUC arm64's CONT_PMD pages can already make that
not a head page.
The code should still be correct, though. AFAIU what I need to do then is
rename the first parameter of record_subpages() (s/head/base/) in the next
version, or just keep the old name ("page"), and then update the commit
message.
Thanks,
On Thu, Nov 16, 2023 at 02:40:21PM -0500, Peter Xu wrote:
> On Thu, Nov 16, 2023 at 02:51:52PM +0000, Matthew Wilcox wrote:
> > On Wed, Nov 15, 2023 at 08:29:03PM -0500, Peter Xu wrote:
> > > All the fast-gup functions compute a tail page themselves, so each of them
> > > has to do page mask calculations before feeding the result into
> > > record_subpages().
> > >
> > > Merge that logic into record_subpages(), so that callers always pass a head
> > > page and leave the rest of the calculation to record_subpages().
> >
> > This is a bit fragile. You're assuming that pmd_page() always returns
> > a head page, and that's only true today because I looked at the work
> > required vs the reward and decided to cap the large folio size at PMD
> > size. If we allowed 2*PMD_SIZE (eg 4MB on x86), pmd_page() would not
> > return a head page. There is a small amount of demand for > PMD size
> > large folio support, so I suspect we will want to do this eventually.
> > I'm not particularly trying to do these conversions, but it would be
> > good to not add more assumptions that pmd_page() returns a head page.
>
> Makes sense. Actually, IIUC arm64's CONT_PMD pages can already make that
> not a head page.
>
> The code should still be correct, though. AFAIU what I need to do then is
> rename the first parameter of record_subpages() (s/head/base/) in the next
> version, or just keep the old name ("page"), and then update the commit
> message.
Yeah, I think just leaving it as 'page' would be best. Thanks.
@@ -2763,11 +2763,14 @@ static int __gup_device_huge_pud(pud_t pud, pud_t *pudp, unsigned long addr,
}
#endif
-static int record_subpages(struct page *page, unsigned long addr,
- unsigned long end, struct page **pages)
+static int record_subpages(struct page *head, unsigned long sz,
+ unsigned long addr, unsigned long end,
+ struct page **pages)
{
+ struct page *page;
int nr;
+ page = nth_page(head, (addr & (sz - 1)) >> PAGE_SHIFT);
for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
pages[nr] = nth_page(page, nr);
@@ -2804,8 +2807,8 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
/* hugepages are never "special" */
VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
- page = nth_page(pte_page(pte), (addr & (sz - 1)) >> PAGE_SHIFT);
- refs = record_subpages(page, addr, end, pages + *nr);
+ page = pte_page(pte);
+ refs = record_subpages(page, sz, addr, end, pages + *nr);
folio = try_grab_folio(page, refs, flags);
if (!folio)
@@ -2870,8 +2873,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
pages, nr);
}
- page = nth_page(pmd_page(orig), (addr & ~PMD_MASK) >> PAGE_SHIFT);
- refs = record_subpages(page, addr, end, pages + *nr);
+ page = pmd_page(orig);
+ refs = record_subpages(page, PMD_SIZE, addr, end, pages + *nr);
folio = try_grab_folio(page, refs, flags);
if (!folio)
@@ -2914,8 +2917,8 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
pages, nr);
}
- page = nth_page(pud_page(orig), (addr & ~PUD_MASK) >> PAGE_SHIFT);
- refs = record_subpages(page, addr, end, pages + *nr);
+ page = pud_page(orig);
+ refs = record_subpages(page, PUD_SIZE, addr, end, pages + *nr);
folio = try_grab_folio(page, refs, flags);
if (!folio)
@@ -2954,8 +2957,8 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
BUILD_BUG_ON(pgd_devmap(orig));
- page = nth_page(pgd_page(orig), (addr & ~PGDIR_MASK) >> PAGE_SHIFT);
- refs = record_subpages(page, addr, end, pages + *nr);
+ page = pgd_page(orig);
+ refs = record_subpages(page, PGDIR_SIZE, addr, end, pages + *nr);
folio = try_grab_folio(page, refs, flags);
if (!folio)