[v2,5/5] hugetlb: Allow faults to be handled under the VMA lock

Message ID 20240221234732.187629-6-vishal.moola@gmail.com
State New
Headers
Series Handle hugetlb faults under the VMA lock |

Commit Message

Vishal Moola Feb. 21, 2024, 11:47 p.m. UTC
  Hugetlb can now safely handle faults under the VMA lock, so allow it to
do so.

This patch may cause ltp hugemmap10 to "fail". Hugemmap10 tests hugetlb
counters, and expects the counters to remain unchanged on failure to
handle a fault.

In hugetlb_no_page(), vmf_anon_prepare() may bailout with no anon_vma
under the VMA lock after allocating a folio for the hugepage. In
free_huge_folio(), this folio is completely freed on bailout iff there
is a surplus of hugetlb pages. This will remove a folio off the freelist
and decrement the number of hugepages while ltp expects these counters
to remain unchanged on failure.

Originally this could only happen due to OOM failures, but now it may
also occur after we allocate a hugetlb folio without a suitable anon_vma
under the VMA lock. This should only happen for the first freshly
allocated hugepage in this vma.

Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
---
 mm/hugetlb.c | 6 ------
 1 file changed, 6 deletions(-)
  

Comments

Matthew Wilcox Feb. 22, 2024, 3:55 a.m. UTC | #1
On Wed, Feb 21, 2024 at 03:47:32PM -0800, Vishal Moola (Oracle) wrote:
> Hugetlb can now safely handle faults under the VMA lock, so allow it to
> do so.
> 
> This patch may cause ltp hugemmap10 to "fail". Hugemmap10 tests hugetlb
> counters, and expects the counters to remain unchanged on failure to
> handle a fault.
> 
> In hugetlb_no_page(), vmf_anon_prepare() may bailout with no anon_vma
> under the VMA lock after allocating a folio for the hugepage. In
> free_huge_folio(), this folio is completely freed on bailout iff there
> is a surplus of hugetlb pages. This will remove a folio off the freelist
> and decrement the number of hugepages while ltp expects these counters
> to remain unchanged on failure.
> 
> Originally this could only happen due to OOM failures, but now it may
> also occur after we allocate a hugetlb folio without a suitable anon_vma
> under the VMA lock. This should only happen for the first freshly
> allocated hugepage in this vma.

Hmm, so it's a bug in LTP.  Have you talked to the LTP people about it
(cc's added)?

Also, did you try moving the anon_vma check befor the folio allocation
so that you can bail out without disturbing the counters?

> Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
  
Vishal Moola Feb. 22, 2024, 5:27 p.m. UTC | #2
On Wed, Feb 21, 2024 at 7:55 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Feb 21, 2024 at 03:47:32PM -0800, Vishal Moola (Oracle) wrote:
> > Hugetlb can now safely handle faults under the VMA lock, so allow it to
> > do so.
> >
> > This patch may cause ltp hugemmap10 to "fail". Hugemmap10 tests hugetlb
> > counters, and expects the counters to remain unchanged on failure to
> > handle a fault.
> >
> > In hugetlb_no_page(), vmf_anon_prepare() may bailout with no anon_vma
> > under the VMA lock after allocating a folio for the hugepage. In
> > free_huge_folio(), this folio is completely freed on bailout iff there
> > is a surplus of hugetlb pages. This will remove a folio off the freelist
> > and decrement the number of hugepages while ltp expects these counters
> > to remain unchanged on failure.
> >
> > Originally this could only happen due to OOM failures, but now it may
> > also occur after we allocate a hugetlb folio without a suitable anon_vma
> > under the VMA lock. This should only happen for the first freshly
> > allocated hugepage in this vma.
>
> Hmm, so it's a bug in LTP.  Have you talked to the LTP people about it
> (cc's added)?
>
> Also, did you try moving the anon_vma check befor the folio allocation
> so that you can bail out without disturbing the counters?

Moving the check before folio allocation occurs keeps the folios on the
freelist so the counters remain static as expected.

If we are looking at a shareable vma, hugetlb_no_page() does not need
this check at all, so I left the check where it is. We could definitely place
the anon_vma check above allocation, it would just make the code
slightly more complicated - needing another variable (or reassigning
ret in various places) as well as adding another potentially unnecessary
vma flag check.

> > Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
>
> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
  

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ae8c8b3da981..688017ca0cc2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6354,12 +6354,6 @@  vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		 */
 	};
 
-	/* TODO: Handle faults under the VMA lock */
-	if (flags & FAULT_FLAG_VMA_LOCK) {
-		vma_end_read(vma);
-		return VM_FAULT_RETRY;
-	}
-
 	/*
 	 * Serialize hugepage allocation and instantiation, so that we don't
 	 * get spurious allocation failures if two CPUs race to instantiate