mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"

Message ID 20230812210053.2325091-1-zokeefe@google.com
State New
Series mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"

Commit Message

Zach O'Keefe Aug. 12, 2023, 9 p.m. UTC
  The 6.0 commits:

commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()")
commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")

merged "can we have THPs in this VMA?" logic that was previously done
separately by fault-path, khugepaged, and smaps "THPeligible".

During the process, the check on VM_NO_KHUGEPAGED from the khugepaged
path was accidentally added to fault and smaps paths.  Certainly the
previous behavior for fault should be restored, and since smaps should
report the union of THP eligibility for fault and khugepaged, also opt
smaps out of this constraint.
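
For reference, the three paths distinguish themselves through the smaps/in_pf
arguments of hugepage_vma_check(). Roughly (call sites and exact argument
values paraphrased from the 6.0 sources, shown only to illustrate why the
added "!in_pf && !smaps" guard confines VM_NO_KHUGEPAGED to khugepaged):

        /* smaps "THPeligible" (fs/proc/task_mmu.c): union of the two paths */
        hugepage_vma_check(vma, vma->vm_flags, /* smaps */ true,  /* in_pf */ false, true);

        /* fault path (__handle_mm_fault()): the only caller with in_pf set */
        hugepage_vma_check(vma, vm_flags, /* smaps */ false, /* in_pf */ true, true);

        /* khugepaged: the only path that should observe VM_NO_KHUGEPAGED */
        hugepage_vma_check(vma, vm_flags, /* smaps */ false, /* in_pf */ false, true);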

Fixes: 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
Reported-by: Saurabh Singh Sengar <ssengar@microsoft.com>
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Cc: Yang Shi <shy828301@gmail.com>
---
 mm/huge_memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
  

Comments

Zach O'Keefe Aug. 12, 2023, 9:24 p.m. UTC | #1
On Sat, Aug 12, 2023 at 2:01 PM Zach O'Keefe <zokeefe@google.com> wrote:
>
> The 6.0 commits:
>
> commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()")
> commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
>
> merged "can we have THPs in this VMA?" logic that was previously done
> separately by fault-path, khugepaged, and smaps "THPeligible".
>
> During the process, the check on VM_NO_KHUGEPAGED from the khugepaged
> path was accidentally added to fault and smaps paths.  Certainly the
> previous behavior for fault should be restored, and since smaps should
> report the union of THP eligibility for fault and khugepaged, also opt
> smaps out of this constraint.
>
> Fixes: 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
> Reported-by: Saurabh Singh Sengar <ssengar@microsoft.com>
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> Cc: Yang Shi <shy828301@gmail.com>
> ---
>  mm/huge_memory.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index eb3678360b97..e098c26d5e2e 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -96,11 +96,11 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
>                 return in_pf;
>
>         /*
> -        * Special VMA and hugetlb VMA.
> +        * khugepaged check for special VMA and hugetlb VMA.
>          * Must be checked after dax since some dax mappings may have
>          * VM_MIXEDMAP set.
>          */
> -       if (vm_flags & VM_NO_KHUGEPAGED)
> +       if (!in_pf && !smaps && (vm_flags & VM_NO_KHUGEPAGED))
>                 return false;
>
>         /*
> --
> 2.41.0.694.ge786442a9b-goog
>

I should note that this was discussed before[1], and VM_MIXEDMAP was
called out then, but we didn't have any use cases.

What was reported broken by Saurabh was an out-of-tree driver that
relies on being able to fault in THPs over VM_HUGEPAGE|VM_MIXEDMAP
VMAs. We mentioned back then we could always opt fault-path out of
this check in the future, and it seems like we should.
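
For illustration only, a minimal sketch of the kind of mapping such a driver
sets up (the names are hypothetical, not taken from Saurabh's report).
VM_MIXEDMAP is part of VM_SPECIAL and therefore of VM_NO_KHUGEPAGED, which is
why the unpatched check refuses PMD faults on such a VMA:

        /* Hypothetical driver mmap, shown only for the flag combination. */
        static int example_drv_mmap(struct file *file, struct vm_area_struct *vma)
        {
                /*
                 * Pages are inserted with vmf_insert_*, so the VMA is marked
                 * mixed, and THPs are explicitly requested for faults.
                 */
                vma->vm_flags |= VM_MIXEDMAP | VM_HUGEPAGE;
                vma->vm_ops = &example_drv_vm_ops;      /* supplies ->huge_fault */
                return 0;
        }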

To that end, should this be added to stable?

Apologies, I should have added this context to the commit log.

Best,
Zach

[1] https://lore.kernel.org/linux-mm/YqdPmitColnzlXJ0@google.com/
  
Saurabh Singh Sengar Aug. 13, 2023, 6:19 a.m. UTC | #2
> -----Original Message-----
> From: Zach O'Keefe <zokeefe@google.com>
> Sent: Sunday, August 13, 2023 2:31 AM
> To: linux-mm@kvack.org; Yang Shi <shy828301@gmail.com>
> Cc: linux-kernel@vger.kernel.org; Zach O'Keefe <zokeefe@google.com>;
> Saurabh Singh Sengar <ssengar@microsoft.com>
> Subject: [EXTERNAL] [PATCH] mm/thp: fix "mm: thp: kill
> __transhuge_page_enabled()"
> 
> The 6.0 commits:
> 
> commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()")
> commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
> 
> merged "can we have THPs in this VMA?" logic that was previously done
> separately by fault-path, khugepaged, and smaps "THPeligible".
> 
> During the process, the check on VM_NO_KHUGEPAGED from the
> khugepaged path was accidentally added to fault and smaps paths.  Certainly
> the previous behavior for fault should be restored, and since smaps should
> report the union of THP eligibility for fault and khugepaged, also opt smaps
> out of this constraint.
> 
> Fixes: 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
> Reported-by: Saurabh Singh Sengar <ssengar@microsoft.com>
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> Cc: Yang Shi <shy828301@gmail.com>
> ---
>  mm/huge_memory.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index eb3678360b97..e098c26d5e2e 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -96,11 +96,11 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
>                 return in_pf;
> 
>         /*
> -        * Special VMA and hugetlb VMA.
> +        * khugepaged check for special VMA and hugetlb VMA.
>          * Must be checked after dax since some dax mappings may have
>          * VM_MIXEDMAP set.
>          */
> -       if (vm_flags & VM_NO_KHUGEPAGED)
> +       if (!in_pf && !smaps && (vm_flags & VM_NO_KHUGEPAGED))
>                 return false;
> 
>         /*
> --
> 2.41.0.694.ge786442a9b-goog

Thanks for the patch. I realized that commit 9fec51689ff6 also introduced a
!vma_is_anonymous() restriction. To make the fault path behave the same as
before, that check needs to be relaxed as well. Can we add the below to this
patch too:

-       if (!vma_is_anonymous(vma))
+       if (!in_pf && !vma_is_anonymous(vma))
                return false;

- Saurabh
  
Matthew Wilcox Aug. 15, 2023, 2:24 a.m. UTC | #3
On Mon, Aug 14, 2023 at 05:04:47PM -0700, Zach O'Keefe wrote:
> > From a large folios perspective, filesystems do not implement a special
> > handler.  They call filemap_fault() (directly or indirectly) from their
> > ->fault handler.  If there is already a folio in the page cache which
> > satisfies this fault, we insert it into the page tables (no matter what
> > size it is).  If there is no folio, we call readahead to populate that
> > index in the page cache, and probably some other indices around it.
> > That's do_sync_mmap_readahead().
> >
> > If you look at that, you'll see that we check the VM_HUGEPAGE flag, and
> > if set we align to a PMD boundary and read two PMD-size pages (so that we
> > can do async readahead for the second page, if we're doing a linear scan).
> > If the VM_HUGEPAGE flag isn't set, we'll use the readahead algorithm to
> > decide how large the folio should be that we're reading into; if it's a
> > random read workload, we'll stick to order-0 pages, but if we're getting
> > good hit rate from the linear scan, we'll increase the size (although
> > we won't go past PMD size)
> >
> > There's also the ->map_pages() optimisation which handles page faults
> > locklessly, and will fall back to ->fault() if there's even a light
> > breeze.  I don't think that's of any particular use in answering your
> > question, so I'm not going into details about it.
> >
> > I'm not sure I understand the code that's being modified well enough to
> > be able to give you a straight answer to your question, but hopefully
> > this is helpful to you.
> 
> Thank you, this was great info. I had thought, incorrectly, that large
> folio work would eventually tie into that ->huge_fault() handler
> (should be dax_huge_fault() ?)
> 
> If that's the case, then faulting file-backed, non-DAX memory as
> (pmd-mapped-)THPs isn't supported at all, and no fault lies with the
> aforementioned patches.

Ah, wait, hang on.  You absolutely can get a PMD mapping by calling into
->fault.  Look at how finish_fault() works:

        if (pmd_none(*vmf->pmd)) {
                if (PageTransCompound(page)) {
                        ret = do_set_pmd(vmf, page);
                        if (ret != VM_FAULT_FALLBACK)
                                return ret;
                }

                if (vmf->prealloc_pte)
                        pmd_install(vma->vm_mm, vmf->pmd, &vmf->prealloc_pte);

So if we find a large folio that is PMD mappable, and there's nothing
at vmf->pmd, we install a PMD-sized mapping at that spot.  If that
fails, we install the preallocated PTE table at vmf->pmd and continue to
trying set one or more PTEs to satisfy this page fault.

So why, you may be asking, do we have ->huge_fault.  Well, you should
ask the clown who did commit b96375f74a6d ... in fairness to me,
finish_fault() did not exist at the time, and the ability to return
a PMD-sized page was added later.
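
For what it's worth, a minimal sketch of what that implies (generic names,
modeled on generic_file_vm_ops in mm/filemap.c): a filesystem that only wires
->fault to filemap_fault() can still end up with a PMD mapping via
finish_fault()/do_set_pmd() when the page-cache folio is PMD-sized, without
any ->huge_fault handler at all:

        static const struct vm_operations_struct example_file_vm_ops = {
                .fault          = filemap_fault,        /* may find a PMD-sized folio */
                .map_pages      = filemap_map_pages,
                .page_mkwrite   = filemap_page_mkwrite,
                /* no .huge_fault: finish_fault() installs the PMD mapping */
        };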
  

Patch

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index eb3678360b97..e098c26d5e2e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -96,11 +96,11 @@  bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
 		return in_pf;
 
 	/*
-	 * Special VMA and hugetlb VMA.
+	 * khugepaged check for special VMA and hugetlb VMA.
 	 * Must be checked after dax since some dax mappings may have
 	 * VM_MIXEDMAP set.
 	 */
-	if (vm_flags & VM_NO_KHUGEPAGED)
+	if (!in_pf && !smaps && (vm_flags & VM_NO_KHUGEPAGED))
 		return false;
 
 	/*