[2/3] hugetlb: Use vmf_anon_prepare() instead of anon_vma_prepare()

Message ID 20240220231424.126600-3-vishal.moola@gmail.com
State New
Headers
Series Handle hugetlb faults under the VMA lock |

Commit Message

Vishal Moola Feb. 20, 2024, 11:14 p.m. UTC
  hugetlb_no_page() and hugetlb_wp() call anon_vma_prepare(). In
preparation for hugetlb to safely handle faults under the VMA lock,
use vmf_anon_prepare() here instead.

Additionally, define a struct vm_fault at the top of each function.
These can later be used to convert hugetlb to use struct vm_fault -
similar to mm/memory.

Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
---
 mm/hugetlb.c | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)
  

Comments

Matthew Wilcox Feb. 21, 2024, 3:46 a.m. UTC | #1
On Tue, Feb 20, 2024 at 03:14:23PM -0800, Vishal Moola (Oracle) wrote:
> +++ b/mm/hugetlb.c
> @@ -5834,9 +5834,15 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
>  	struct folio *old_folio;
>  	struct folio *new_folio;
>  	int outside_reserve = 0;
> -	vm_fault_t ret = 0;
> +	vm_fault_t ret = 0, anon_ret = 0;

Do we need a second variable here?  Seems to me like we could
unconditionally assign to ret:

> -	if (unlikely(anon_vma_prepare(vma))) {
> -		ret = VM_FAULT_OOM;
> +	anon_ret = vmf_anon_prepare(&vmf);
> +	if (unlikely(anon_ret)) {
> +		ret = anon_ret;



>  	unsigned long haddr = address & huge_page_mask(h);
>  	struct mmu_notifier_range range;
> +	struct vm_fault vmf = {
> +				.vma = vma,
> +				.address = haddr,
> +				.real_address = address,
> +				.flags = flags,
> +	};

We don't usually indent quite so far.  One extra tab would be enough.

Also, I thought we talked about creating the vmf in hugetlb_fault(),
then passing it to hugetlb_wp() hugetlb_no_page() and handle_userfault()?
Was there a reason to abandon that idea?
  
Vishal Moola Feb. 21, 2024, 5:15 p.m. UTC | #2
On Tue, Feb 20, 2024 at 7:46 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Feb 20, 2024 at 03:14:23PM -0800, Vishal Moola (Oracle) wrote:
> > +++ b/mm/hugetlb.c
> > @@ -5834,9 +5834,15 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
> >       struct folio *old_folio;
> >       struct folio *new_folio;
> >       int outside_reserve = 0;
> > -     vm_fault_t ret = 0;
> > +     vm_fault_t ret = 0, anon_ret = 0;
>
> Do we need a second variable here?  Seems to me like we could
> unconditionally assign to ret:

Hmm, looks like we can directly assign to ret without any problems
in both functions, I'll change that for v2.

> > -     if (unlikely(anon_vma_prepare(vma))) {
> > -             ret = VM_FAULT_OOM;
> > +     anon_ret = vmf_anon_prepare(&vmf);
> > +     if (unlikely(anon_ret)) {
> > +             ret = anon_ret;
>
>
>
> >       unsigned long haddr = address & huge_page_mask(h);
> >       struct mmu_notifier_range range;
> > +     struct vm_fault vmf = {
> > +                             .vma = vma,
> > +                             .address = haddr,
> > +                             .real_address = address,
> > +                             .flags = flags,
> > +     };
>
> We don't usually indent quite so far.  One extra tab would be enough.
>
> Also, I thought we talked about creating the vmf in hugetlb_fault(),
> then passing it to hugetlb_wp() hugetlb_no_page() and handle_userfault()?
> Was there a reason to abandon that idea?

No I haven't abandoned that idea, I intend to have a separate patchset to go
on top of this one - just keeping them separate since they are conceptually
different. I'm converting each function to use struct vm_fault first, then
shifting it to be passed throughout as an arguement while cleaning up the
excess variables laying around. In a sense working bottom-up instead
of top-down.
  
Matthew Wilcox Feb. 21, 2024, 5:55 p.m. UTC | #3
On Wed, Feb 21, 2024 at 09:15:51AM -0800, Vishal Moola wrote:
> > >       unsigned long haddr = address & huge_page_mask(h);
> > >       struct mmu_notifier_range range;
> > > +     struct vm_fault vmf = {
> > > +                             .vma = vma,
> > > +                             .address = haddr,
> > > +                             .real_address = address,
> > > +                             .flags = flags,
> > > +     };
> >
> > We don't usually indent quite so far.  One extra tab would be enough.
> >
> > Also, I thought we talked about creating the vmf in hugetlb_fault(),
> > then passing it to hugetlb_wp() hugetlb_no_page() and handle_userfault()?
> > Was there a reason to abandon that idea?
> 
> No I haven't abandoned that idea, I intend to have a separate patchset to go
> on top of this one - just keeping them separate since they are conceptually
> different. I'm converting each function to use struct vm_fault first, then
> shifting it to be passed throughout as an arguement while cleaning up the
> excess variables laying around. In a sense working bottom-up instead
> of top-down.

I think you'll find it less work to create it in hugetlb_fault()
first.  ie patch 2 could be to hoist its creation from half-way down
hugetlb_fault to the top of hugetlb_fault.  Patch 3 could pass it
through hugetlb_no_page() to hugetlb_handle_userfault() and remove its
creation there.  Now you've alreedy got it, and can make use of it in
this patch which would be the new patch 4.

If you want to do a cleanup patch afterwards, you could hoist the vmf
creation all the way to handle_mm_fault() ;-)
  
Vishal Moola Feb. 21, 2024, 6:02 p.m. UTC | #4
On Wed, Feb 21, 2024 at 9:55 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Feb 21, 2024 at 09:15:51AM -0800, Vishal Moola wrote:
> > > >       unsigned long haddr = address & huge_page_mask(h);
> > > >       struct mmu_notifier_range range;
> > > > +     struct vm_fault vmf = {
> > > > +                             .vma = vma,
> > > > +                             .address = haddr,
> > > > +                             .real_address = address,
> > > > +                             .flags = flags,
> > > > +     };
> > >
> > > We don't usually indent quite so far.  One extra tab would be enough.
> > >
> > > Also, I thought we talked about creating the vmf in hugetlb_fault(),
> > > then passing it to hugetlb_wp() hugetlb_no_page() and handle_userfault()?
> > > Was there a reason to abandon that idea?
> >
> > No I haven't abandoned that idea, I intend to have a separate patchset to go
> > on top of this one - just keeping them separate since they are conceptually
> > different. I'm converting each function to use struct vm_fault first, then
> > shifting it to be passed throughout as an arguement while cleaning up the
> > excess variables laying around. In a sense working bottom-up instead
> > of top-down.
>
> I think you'll find it less work to create it in hugetlb_fault()
> first.  ie patch 2 could be to hoist its creation from half-way down
> hugetlb_fault to the top of hugetlb_fault.  Patch 3 could pass it
> through hugetlb_no_page() to hugetlb_handle_userfault() and remove its
> creation there.  Now you've alreedy got it, and can make use of it in
> this patch which would be the new patch 4.

Ah I see, that way would definitely be a lot less work. I'll make that
change for this patchset in v2 then.

> If you want to do a cleanup patch afterwards, you could hoist the vmf
> creation all the way to handle_mm_fault() ;-)

Yeah, I was already looking at doing that in the future patchset :)
  

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ed1581b670d4..10f57306e1f0 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5834,9 +5834,15 @@  static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 	struct folio *old_folio;
 	struct folio *new_folio;
 	int outside_reserve = 0;
-	vm_fault_t ret = 0;
+	vm_fault_t ret = 0, anon_ret = 0;
 	unsigned long haddr = address & huge_page_mask(h);
 	struct mmu_notifier_range range;
+	struct vm_fault vmf = {
+				.vma = vma,
+				.address = haddr,
+				.real_address = address,
+				.flags = flags,
+	};
 
 	/*
 	 * Never handle CoW for uffd-wp protected pages.  It should be only
@@ -5960,8 +5966,9 @@  static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * When the original hugepage is shared one, it does not have
 	 * anon_vma prepared.
 	 */
-	if (unlikely(anon_vma_prepare(vma))) {
-		ret = VM_FAULT_OOM;
+	anon_ret = vmf_anon_prepare(&vmf);
+	if (unlikely(anon_ret)) {
+		ret = anon_ret;
 		goto out_release_all;
 	}
 
@@ -6119,7 +6126,7 @@  static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			pte_t old_pte, unsigned int flags)
 {
 	struct hstate *h = hstate_vma(vma);
-	vm_fault_t ret = VM_FAULT_SIGBUS;
+	vm_fault_t ret = VM_FAULT_SIGBUS, anon_ret = 0;
 	int anon_rmap = 0;
 	unsigned long size;
 	struct folio *folio;
@@ -6128,6 +6135,12 @@  static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	unsigned long haddr = address & huge_page_mask(h);
 	bool new_folio, new_pagecache_folio = false;
 	u32 hash = hugetlb_fault_mutex_hash(mapping, idx);
+	struct vm_fault vmf = {
+				.vma = vma,
+				.address = haddr,
+				.real_address = address,
+				.flags = flags,
+	};
 
 	/*
 	 * Currently, we are forced to kill the process in the event the
@@ -6221,8 +6234,10 @@  static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			new_pagecache_folio = true;
 		} else {
 			folio_lock(folio);
-			if (unlikely(anon_vma_prepare(vma))) {
-				ret = VM_FAULT_OOM;
+
+			anon_ret = vmf_anon_prepare(&vmf);
+			if (unlikely(anon_ret)) {
+				ret = anon_ret;
 				goto backout_unlocked;
 			}
 			anon_rmap = 1;