[v2,4/6] mm: drop VMA lock before waiting for migration

Message ID 20230609005158.2421285-5-surenb@google.com
State New
Series Per-vma lock support for swap and userfaults

Commit Message

Suren Baghdasaryan June 9, 2023, 12:51 a.m. UTC
  migration_entry_wait does not need VMA lock, therefore it can be dropped
before waiting. Introduce VM_FAULT_VMA_UNLOCKED to indicate that VMA
lock was dropped while in handle_mm_fault().
Note that once VMA lock is dropped, the VMA reference can't be used as
there are no guarantees it was not freed.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 arch/arm64/mm/fault.c    |  3 ++-
 arch/powerpc/mm/fault.c  |  3 ++-
 arch/s390/mm/fault.c     |  3 ++-
 arch/x86/mm/fault.c      |  3 ++-
 include/linux/mm_types.h |  6 +++++-
 mm/memory.c              | 12 ++++++++++--
 6 files changed, 23 insertions(+), 7 deletions(-)
  

Comments

Peter Xu June 9, 2023, 8:42 p.m. UTC | #1
On Thu, Jun 08, 2023 at 05:51:56PM -0700, Suren Baghdasaryan wrote:
> migration_entry_wait does not need VMA lock, therefore it can be dropped
> before waiting. Introduce VM_FAULT_VMA_UNLOCKED to indicate that VMA
> lock was dropped while in handle_mm_fault().
> Note that once VMA lock is dropped, the VMA reference can't be used as
> there are no guarantees it was not freed.

Then the vma lock behaves differently from the mmap read lock, am I right?
Can we still make them match in behavior, or is there a reason not to do so?

One reason is that if they match, they can reuse existing flags and there'll
be less confusion, e.g. this:

  (fault->flags & FAULT_FLAG_VMA_LOCK) &&
    (vm_fault_ret & (VM_FAULT_RETRY | VM_FAULT_COMPLETED))

can replace the new flag, iiuc.

Thanks,
  
Suren Baghdasaryan June 9, 2023, 10:30 p.m. UTC | #2
On Fri, Jun 9, 2023 at 1:42 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Jun 08, 2023 at 05:51:56PM -0700, Suren Baghdasaryan wrote:
> > migration_entry_wait does not need VMA lock, therefore it can be dropped
> > before waiting. Introduce VM_FAULT_VMA_UNLOCKED to indicate that VMA
> > lock was dropped while in handle_mm_fault().
> > Note that once VMA lock is dropped, the VMA reference can't be used as
> > there are no guarantees it was not freed.
>
> Then vma lock behaves differently from mmap read lock, am I right?  Can we
> still make them match on behaviors, or there's reason not to do so?

I think we could match their behavior by also dropping mmap_lock here
when fault is handled under mmap_lock (!(fault->flags &
FAULT_FLAG_VMA_LOCK)).
I missed the fact that VM_FAULT_COMPLETED can be used to skip dropping
mmap_lock in do_page_fault(), so indeed, I might be able to use
VM_FAULT_COMPLETED to skip vma_end_read(vma) for per-vma locks as well
instead of introducing FAULT_FLAG_VMA_LOCK. I think that was your idea
of reusing existing flags?

>
> One reason is if they match they can reuse existing flags and there'll be
> less confusing, e.g. this:
>
>   (fault->flags & FAULT_FLAG_VMA_LOCK) &&
>     (vm_fault_ret && (VM_FAULT_RETRY || VM_FAULT_COMPLETE))
>
> can replace the new flag, iiuc.
>
> Thanks,
>
> --
> Peter Xu
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>
  
Suren Baghdasaryan June 10, 2023, 1:29 a.m. UTC | #3
On Fri, Jun 9, 2023 at 3:30 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Fri, Jun 9, 2023 at 1:42 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Thu, Jun 08, 2023 at 05:51:56PM -0700, Suren Baghdasaryan wrote:
> > > migration_entry_wait does not need VMA lock, therefore it can be dropped
> > > before waiting. Introduce VM_FAULT_VMA_UNLOCKED to indicate that VMA
> > > lock was dropped while in handle_mm_fault().
> > > Note that once VMA lock is dropped, the VMA reference can't be used as
> > > there are no guarantees it was not freed.
> >
> > Then vma lock behaves differently from mmap read lock, am I right?  Can we
> > still make them match on behaviors, or there's reason not to do so?
>
> I think we could match their behavior by also dropping mmap_lock here
> when fault is handled under mmap_lock (!(fault->flags &
> FAULT_FLAG_VMA_LOCK)).
> I missed the fact that VM_FAULT_COMPLETED can be used to skip dropping
> mmap_lock in do_page_fault(), so indeed, I might be able to use
> VM_FAULT_COMPLETED to skip vma_end_read(vma) for per-vma locks as well
> instead of introducing FAULT_FLAG_VMA_LOCK. I think that was your idea
> of reusing existing flags?
Sorry, I meant VM_FAULT_VMA_UNLOCKED, not FAULT_FLAG_VMA_LOCK in the
above reply.

I took a closer look into using VM_FAULT_COMPLETED instead of
VM_FAULT_VMA_UNLOCKED, but when we fall back from the per-vma lock to
mmap_lock we need to retry with an indication that the per-vma lock
was dropped. Returning (VM_FAULT_RETRY | VM_FAULT_COMPLETED) to
indicate such a state seems strange to me ("retry" and "complete" are
contradicting concepts to combine in a single result). I could use
VM_FAULT_COMPLETED when releasing mmap_lock, since we don't use it in
combination with VM_FAULT_RETRY, and (VM_FAULT_RETRY |
VM_FAULT_VMA_UNLOCKED) when dropping the per-vma lock and falling back to
mmap_lock. It still requires the new VM_FAULT_VMA_UNLOCKED flag, but I
think logically that makes more sense. WDYT?

>
> >
> > One reason is if they match they can reuse existing flags and there'll be
> > less confusing, e.g. this:
> >
> >   (fault->flags & FAULT_FLAG_VMA_LOCK) &&
> >     (vm_fault_ret && (VM_FAULT_RETRY || VM_FAULT_COMPLETE))
> >
> > can replace the new flag, iiuc.
> >
> > Thanks,
> >
> > --
> > Peter Xu
> >
  
Peter Xu June 12, 2023, 1:28 p.m. UTC | #4
On Fri, Jun 09, 2023 at 03:30:10PM -0700, Suren Baghdasaryan wrote:
> On Fri, Jun 9, 2023 at 1:42 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Thu, Jun 08, 2023 at 05:51:56PM -0700, Suren Baghdasaryan wrote:
> > > migration_entry_wait does not need VMA lock, therefore it can be dropped
> > > before waiting. Introduce VM_FAULT_VMA_UNLOCKED to indicate that VMA
> > > lock was dropped while in handle_mm_fault().
> > > Note that once VMA lock is dropped, the VMA reference can't be used as
> > > there are no guarantees it was not freed.
> >
> > Then vma lock behaves differently from mmap read lock, am I right?  Can we
> > still make them match on behaviors, or there's reason not to do so?
> 
> I think we could match their behavior by also dropping mmap_lock here
> when fault is handled under mmap_lock (!(fault->flags &
> FAULT_FLAG_VMA_LOCK)).
> I missed the fact that VM_FAULT_COMPLETED can be used to skip dropping
> mmap_lock in do_page_fault(), so indeed, I might be able to use
> VM_FAULT_COMPLETED to skip vma_end_read(vma) for per-vma locks as well
> instead of introducing FAULT_FLAG_VMA_LOCK. I think that was your idea
> of reusing existing flags?

Yes.

I'd suggest we move this patch out of the series, as it's not really part of
enabling swap + uffd.  It can be a separate patch, and hopefully it'll
always change both the vma and mmap lock cases, with proper reasoning.

Thanks,
  
Peter Xu June 12, 2023, 1:36 p.m. UTC | #5
On Fri, Jun 09, 2023 at 06:29:43PM -0700, Suren Baghdasaryan wrote:
> On Fri, Jun 9, 2023 at 3:30 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Fri, Jun 9, 2023 at 1:42 PM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > On Thu, Jun 08, 2023 at 05:51:56PM -0700, Suren Baghdasaryan wrote:
> > > > migration_entry_wait does not need VMA lock, therefore it can be dropped
> > > > before waiting. Introduce VM_FAULT_VMA_UNLOCKED to indicate that VMA
> > > > lock was dropped while in handle_mm_fault().
> > > > Note that once VMA lock is dropped, the VMA reference can't be used as
> > > > there are no guarantees it was not freed.
> > >
> > > Then vma lock behaves differently from mmap read lock, am I right?  Can we
> > > still make them match on behaviors, or there's reason not to do so?
> >
> > I think we could match their behavior by also dropping mmap_lock here
> > when fault is handled under mmap_lock (!(fault->flags &
> > FAULT_FLAG_VMA_LOCK)).
> > I missed the fact that VM_FAULT_COMPLETED can be used to skip dropping
> > mmap_lock in do_page_fault(), so indeed, I might be able to use
> > VM_FAULT_COMPLETED to skip vma_end_read(vma) for per-vma locks as well
> > instead of introducing FAULT_FLAG_VMA_LOCK. I think that was your idea
> > of reusing existing flags?
> Sorry, I meant VM_FAULT_VMA_UNLOCKED, not FAULT_FLAG_VMA_LOCK in the
> above reply.
> 
> I took a closer look into using VM_FAULT_COMPLETED instead of
> VM_FAULT_VMA_UNLOCKED but when we fall back from per-vma lock to
> mmap_lock we need to retry with an indication that the per-vma lock
> was dropped. Returning (VM_FAULT_RETRY | VM_FAULT_COMPLETE) to
> indicate such state seems strange to me ("retry" and "complete" seem

Not relevant to this migration patch, but for the whole idea I was wondering
whether it should just work if we simply do:

        fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
-       vma_end_read(vma);
+       if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
+               vma_end_read(vma);

?

GUP may need more caution on NOWAIT, but vma lock is only in fault paths so
IIUC it's fine?
  
Suren Baghdasaryan June 12, 2023, 3:41 p.m. UTC | #6
On Mon, Jun 12, 2023 at 6:28 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, Jun 09, 2023 at 03:30:10PM -0700, Suren Baghdasaryan wrote:
> > On Fri, Jun 9, 2023 at 1:42 PM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > On Thu, Jun 08, 2023 at 05:51:56PM -0700, Suren Baghdasaryan wrote:
> > > > migration_entry_wait does not need VMA lock, therefore it can be dropped
> > > > before waiting. Introduce VM_FAULT_VMA_UNLOCKED to indicate that VMA
> > > > lock was dropped while in handle_mm_fault().
> > > > Note that once VMA lock is dropped, the VMA reference can't be used as
> > > > there are no guarantees it was not freed.
> > >
> > > Then vma lock behaves differently from mmap read lock, am I right?  Can we
> > > still make them match on behaviors, or there's reason not to do so?
> >
> > I think we could match their behavior by also dropping mmap_lock here
> > when fault is handled under mmap_lock (!(fault->flags &
> > FAULT_FLAG_VMA_LOCK)).
> > I missed the fact that VM_FAULT_COMPLETED can be used to skip dropping
> > mmap_lock in do_page_fault(), so indeed, I might be able to use
> > VM_FAULT_COMPLETED to skip vma_end_read(vma) for per-vma locks as well
> > instead of introducing FAULT_FLAG_VMA_LOCK. I think that was your idea
> > of reusing existing flags?
>
> Yes.
>
> I'd suggest we move this patch out of the series as it's not really part of
> it on enabling swap + uffd.  It can be a separate patch and hopefully it'll
> always change both vma+mmap lock cases, and with proper reasonings.

Ok, I can move it out with mmap_lock support only and then add per-vma
lock support in my patchset (because this path is still part of
do_swap_page and my patchset enables swap support for per-vma locks).

>
> Thanks,
>
> --
> Peter Xu
>
  
Suren Baghdasaryan June 12, 2023, 4:07 p.m. UTC | #7
On Mon, Jun 12, 2023 at 6:36 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, Jun 09, 2023 at 06:29:43PM -0700, Suren Baghdasaryan wrote:
> > On Fri, Jun 9, 2023 at 3:30 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > On Fri, Jun 9, 2023 at 1:42 PM Peter Xu <peterx@redhat.com> wrote:
> > > >
> > > > On Thu, Jun 08, 2023 at 05:51:56PM -0700, Suren Baghdasaryan wrote:
> > > > > migration_entry_wait does not need VMA lock, therefore it can be dropped
> > > > > before waiting. Introduce VM_FAULT_VMA_UNLOCKED to indicate that VMA
> > > > > lock was dropped while in handle_mm_fault().
> > > > > Note that once VMA lock is dropped, the VMA reference can't be used as
> > > > > there are no guarantees it was not freed.
> > > >
> > > > Then vma lock behaves differently from mmap read lock, am I right?  Can we
> > > > still make them match on behaviors, or there's reason not to do so?
> > >
> > > I think we could match their behavior by also dropping mmap_lock here
> > > when fault is handled under mmap_lock (!(fault->flags &
> > > FAULT_FLAG_VMA_LOCK)).
> > > I missed the fact that VM_FAULT_COMPLETED can be used to skip dropping
> > > mmap_lock in do_page_fault(), so indeed, I might be able to use
> > > VM_FAULT_COMPLETED to skip vma_end_read(vma) for per-vma locks as well
> > > instead of introducing FAULT_FLAG_VMA_LOCK. I think that was your idea
> > > of reusing existing flags?
> > Sorry, I meant VM_FAULT_VMA_UNLOCKED, not FAULT_FLAG_VMA_LOCK in the
> > above reply.
> >
> > I took a closer look into using VM_FAULT_COMPLETED instead of
> > VM_FAULT_VMA_UNLOCKED but when we fall back from per-vma lock to
> > mmap_lock we need to retry with an indication that the per-vma lock
> > was dropped. Returning (VM_FAULT_RETRY | VM_FAULT_COMPLETE) to
> > indicate such state seems strange to me ("retry" and "complete" seem
>
> Not relevant to this migration patch, but for the whole idea I was thinking
> whether it should just work if we simply:
>
>         fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
> -       vma_end_read(vma);
> +       if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
> +               vma_end_read(vma);
>
> ?

Today, when we can't handle a page fault under per-vma locks we return
VM_FAULT_RETRY, in which case the per-vma lock is dropped and the fault is
retried under mmap_lock. The condition you suggest above would not
drop the per-vma lock in the VM_FAULT_RETRY case and would break the
current fallback mechanism.
However, your suggestion gave me an idea. I could indicate that the per-vma
lock got dropped via the vmf structure (as Matthew suggested before),
and once handle_pte_fault(vmf) returns I could check whether it returned
VM_FAULT_RETRY while the per-vma lock is still held. If so, I can
call vma_end_read() before returning from __handle_mm_fault(). That
way, any time handle_mm_fault() returns VM_FAULT_RETRY the per-vma lock
will already be released, so your condition in do_page_fault() will
work correctly. That would eliminate the need for a new
VM_FAULT_VMA_UNLOCKED flag. WDYT?

>
> GUP may need more caution on NOWAIT, but vma lock is only in fault paths so
> IIUC it's fine?
>
> --
> Peter Xu
>
  
Peter Xu June 12, 2023, 6:34 p.m. UTC | #8
On Mon, Jun 12, 2023 at 09:07:38AM -0700, Suren Baghdasaryan wrote:
> On Mon, Jun 12, 2023 at 6:36 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Fri, Jun 09, 2023 at 06:29:43PM -0700, Suren Baghdasaryan wrote:
> > > On Fri, Jun 9, 2023 at 3:30 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > > >
> > > > On Fri, Jun 9, 2023 at 1:42 PM Peter Xu <peterx@redhat.com> wrote:
> > > > >
> > > > > On Thu, Jun 08, 2023 at 05:51:56PM -0700, Suren Baghdasaryan wrote:
> > > > > > migration_entry_wait does not need VMA lock, therefore it can be dropped
> > > > > > before waiting. Introduce VM_FAULT_VMA_UNLOCKED to indicate that VMA
> > > > > > lock was dropped while in handle_mm_fault().
> > > > > > Note that once VMA lock is dropped, the VMA reference can't be used as
> > > > > > there are no guarantees it was not freed.
> > > > >
> > > > > Then vma lock behaves differently from mmap read lock, am I right?  Can we
> > > > > still make them match on behaviors, or there's reason not to do so?
> > > >
> > > > I think we could match their behavior by also dropping mmap_lock here
> > > > when fault is handled under mmap_lock (!(fault->flags &
> > > > FAULT_FLAG_VMA_LOCK)).
> > > > I missed the fact that VM_FAULT_COMPLETED can be used to skip dropping
> > > > mmap_lock in do_page_fault(), so indeed, I might be able to use
> > > > VM_FAULT_COMPLETED to skip vma_end_read(vma) for per-vma locks as well
> > > > instead of introducing FAULT_FLAG_VMA_LOCK. I think that was your idea
> > > > of reusing existing flags?
> > > Sorry, I meant VM_FAULT_VMA_UNLOCKED, not FAULT_FLAG_VMA_LOCK in the
> > > above reply.
> > >
> > > I took a closer look into using VM_FAULT_COMPLETED instead of
> > > VM_FAULT_VMA_UNLOCKED but when we fall back from per-vma lock to
> > > mmap_lock we need to retry with an indication that the per-vma lock
> > > was dropped. Returning (VM_FAULT_RETRY | VM_FAULT_COMPLETE) to
> > > indicate such state seems strange to me ("retry" and "complete" seem
> >
> > Not relevant to this migration patch, but for the whole idea I was thinking
> > whether it should just work if we simply:
> >
> >         fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
> > -       vma_end_read(vma);
> > +       if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
> > +               vma_end_read(vma);
> >
> > ?
> 
> Today when we can't handle a page fault under per-vma locks we return
> VM_FAULT_RETRY, in which case per-vma lock is dropped and the fault is

Oh I see what I missed.  I think it may not be a good idea to reuse
VM_FAULT_RETRY just for that, because it makes VM_FAULT_RETRY even more
complicated: it adds one more case where the lock is not released, namely
the PER_VMA case, even without NOWAIT.

> retried under mmap_lock. The condition you suggest above would not
> drop per-vma lock for VM_FAULT_RETRY case and would break the current
> fallback mechanism.
> However your suggestion gave me an idea. I could indicate that per-vma
> lock got dropped using vmf structure (like Matthew suggested before)
> and once handle_pte_fault(vmf) returns I could check if it returned
> VM_FAULT_RETRY but per-vma lock is still held.
> If that happens I can
> call vma_end_read() before returning from __handle_mm_fault(). That
> way any time handle_mm_fault() returns VM_FAULT_RETRY per-vma lock
> will be already released, so your condition in do_page_fault() will
> work correctly. That would eliminate the need for a new
> VM_FAULT_VMA_UNLOCKED flag. WDYT?

Sounds good.

So probably that's the major pain point for now with the legacy fallback
(I'd have commented if I had noticed it during the initial vma lock
support..).  I assume that'll go away as long as we restore the
VM_FAULT_RETRY semantics to what they were before.
  
Suren Baghdasaryan June 12, 2023, 6:44 p.m. UTC | #9
On Mon, Jun 12, 2023 at 11:34 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Mon, Jun 12, 2023 at 09:07:38AM -0700, Suren Baghdasaryan wrote:
> > On Mon, Jun 12, 2023 at 6:36 AM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > On Fri, Jun 09, 2023 at 06:29:43PM -0700, Suren Baghdasaryan wrote:
> > > > On Fri, Jun 9, 2023 at 3:30 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > > > >
> > > > > On Fri, Jun 9, 2023 at 1:42 PM Peter Xu <peterx@redhat.com> wrote:
> > > > > >
> > > > > > On Thu, Jun 08, 2023 at 05:51:56PM -0700, Suren Baghdasaryan wrote:
> > > > > > > migration_entry_wait does not need VMA lock, therefore it can be dropped
> > > > > > > before waiting. Introduce VM_FAULT_VMA_UNLOCKED to indicate that VMA
> > > > > > > lock was dropped while in handle_mm_fault().
> > > > > > > Note that once VMA lock is dropped, the VMA reference can't be used as
> > > > > > > there are no guarantees it was not freed.
> > > > > >
> > > > > > Then vma lock behaves differently from mmap read lock, am I right?  Can we
> > > > > > still make them match on behaviors, or there's reason not to do so?
> > > > >
> > > > > I think we could match their behavior by also dropping mmap_lock here
> > > > > when fault is handled under mmap_lock (!(fault->flags &
> > > > > FAULT_FLAG_VMA_LOCK)).
> > > > > I missed the fact that VM_FAULT_COMPLETED can be used to skip dropping
> > > > > mmap_lock in do_page_fault(), so indeed, I might be able to use
> > > > > VM_FAULT_COMPLETED to skip vma_end_read(vma) for per-vma locks as well
> > > > > instead of introducing FAULT_FLAG_VMA_LOCK. I think that was your idea
> > > > > of reusing existing flags?
> > > > Sorry, I meant VM_FAULT_VMA_UNLOCKED, not FAULT_FLAG_VMA_LOCK in the
> > > > above reply.
> > > >
> > > > I took a closer look into using VM_FAULT_COMPLETED instead of
> > > > VM_FAULT_VMA_UNLOCKED but when we fall back from per-vma lock to
> > > > mmap_lock we need to retry with an indication that the per-vma lock
> > > > was dropped. Returning (VM_FAULT_RETRY | VM_FAULT_COMPLETE) to
> > > > indicate such state seems strange to me ("retry" and "complete" seem
> > >
> > > Not relevant to this migration patch, but for the whole idea I was thinking
> > > whether it should just work if we simply:
> > >
> > >         fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
> > > -       vma_end_read(vma);
> > > +       if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
> > > +               vma_end_read(vma);
> > >
> > > ?
> >
> > Today when we can't handle a page fault under per-vma locks we return
> > VM_FAULT_RETRY, in which case per-vma lock is dropped and the fault is
>
> Oh I see what I missed.  I think it may not be a good idea to reuse
> VM_FAULT_RETRY just for that, because it makes VM_FAULT_RETRY even more
> complicated - now it adds one more case where the lock is not released,
> that's when PER_VMA even if !NOWAIT.
>
> > retried under mmap_lock. The condition you suggest above would not
> > drop per-vma lock for VM_FAULT_RETRY case and would break the current
> > fallback mechanism.
> > However your suggestion gave me an idea. I could indicate that per-vma
> > lock got dropped using vmf structure (like Matthew suggested before)
> > and once handle_pte_fault(vmf) returns I could check if it returned
> > VM_FAULT_RETRY but per-vma lock is still held.
> > If that happens I can
> > call vma_end_read() before returning from __handle_mm_fault(). That
> > way any time handle_mm_fault() returns VM_FAULT_RETRY per-vma lock
> > will be already released, so your condition in do_page_fault() will
> > work correctly. That would eliminate the need for a new
> > VM_FAULT_VMA_UNLOCKED flag. WDYT?
>
> Sounds good.
>
> So probably that's the major pain for now with the legacy fallback (I'll
> have commented if I noticed it with the initial vma lock support..).  I
> assume that'll go away as long as we recover the VM_FAULT_RETRY semantics
> to before.

I think so. With that change getting VM_FAULT_RETRY in do_page_fault()
will guarantee that per-vma lock was dropped. Is that what you mean?

>
> --
> Peter Xu
>
  
Peter Xu June 12, 2023, 6:57 p.m. UTC | #10
On Mon, Jun 12, 2023 at 11:44:33AM -0700, Suren Baghdasaryan wrote:
> I think so. With that change getting VM_FAULT_RETRY in do_page_fault()
> will guarantee that per-vma lock was dropped. Is that what you mean?

Yes, with the newly added "return VM_FAULT_RETRY" in do_swap_page() for
the per-vma lock removed.  AFAICT all the remaining paths guarantee that,
as long as we're in the fault paths.
  

Patch

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 6045a5117ac1..8f59badbffb5 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -601,7 +601,8 @@  static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 		goto lock_mmap;
 	}
 	fault = handle_mm_fault(vma, addr, mm_flags | FAULT_FLAG_VMA_LOCK, regs);
-	vma_end_read(vma);
+	if (!(fault & VM_FAULT_VMA_UNLOCKED))
+		vma_end_read(vma);
 
 	if (!(fault & VM_FAULT_RETRY)) {
 		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 531177a4ee08..b27730f07141 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -494,7 +494,8 @@  static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
 	}
 
 	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
-	vma_end_read(vma);
+	if (!(fault & VM_FAULT_VMA_UNLOCKED))
+		vma_end_read(vma);
 
 	if (!(fault & VM_FAULT_RETRY)) {
 		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index b65144c392b0..cc923dbb0821 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -418,7 +418,8 @@  static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
 		goto lock_mmap;
 	}
 	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
-	vma_end_read(vma);
+	if (!(fault & VM_FAULT_VMA_UNLOCKED))
+		vma_end_read(vma);
 	if (!(fault & VM_FAULT_RETRY)) {
 		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
 		goto out;
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index e4399983c50c..ef62ab2fd211 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1347,7 +1347,8 @@  void do_user_addr_fault(struct pt_regs *regs,
 		goto lock_mmap;
 	}
 	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
-	vma_end_read(vma);
+	if (!(fault & VM_FAULT_VMA_UNLOCKED))
+		vma_end_read(vma);
 
 	if (!(fault & VM_FAULT_RETRY)) {
 		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 79765e3dd8f3..bd6b95c82f7a 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1030,6 +1030,8 @@  typedef __bitwise unsigned int vm_fault_t;
  *				fsync() to complete (for synchronous page faults
  *				in DAX)
  * @VM_FAULT_COMPLETED:		->fault completed, meanwhile mmap lock released
+ * @VM_FAULT_VMA_UNLOCKED:	VMA lock was released, vmf->vma should no longer
+ *				be accessed
  * @VM_FAULT_HINDEX_MASK:	mask HINDEX value
  *
  */
@@ -1047,6 +1049,7 @@  enum vm_fault_reason {
 	VM_FAULT_DONE_COW       = (__force vm_fault_t)0x001000,
 	VM_FAULT_NEEDDSYNC      = (__force vm_fault_t)0x002000,
 	VM_FAULT_COMPLETED      = (__force vm_fault_t)0x004000,
+	VM_FAULT_VMA_UNLOCKED   = (__force vm_fault_t)0x008000,
 	VM_FAULT_HINDEX_MASK    = (__force vm_fault_t)0x0f0000,
 };
 
@@ -1071,7 +1074,8 @@  enum vm_fault_reason {
 	{ VM_FAULT_FALLBACK,            "FALLBACK" },	\
 	{ VM_FAULT_DONE_COW,            "DONE_COW" },	\
 	{ VM_FAULT_NEEDDSYNC,           "NEEDDSYNC" },	\
-	{ VM_FAULT_COMPLETED,           "COMPLETED" }
+	{ VM_FAULT_COMPLETED,           "COMPLETED" },	\
+	{ VM_FAULT_VMA_UNLOCKED,        "VMA_UNLOCKED" }
 
 struct vm_special_mapping {
 	const char *name;	/* The name, e.g. "[vdso]". */
diff --git a/mm/memory.c b/mm/memory.c
index 41f45819a923..c234f8085f1e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3714,8 +3714,16 @@  vm_fault_t do_swap_page(struct vm_fault *vmf)
 	entry = pte_to_swp_entry(vmf->orig_pte);
 	if (unlikely(non_swap_entry(entry))) {
 		if (is_migration_entry(entry)) {
-			migration_entry_wait(vma->vm_mm, vmf->pmd,
-					     vmf->address);
+			/* Save mm in case VMA lock is dropped */
+			struct mm_struct *mm = vma->vm_mm;
+
+			if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
+				/* No need to hold VMA lock for migration */
+				vma_end_read(vma);
+				/* WARNING: VMA can't be used after this */
+				ret |= VM_FAULT_VMA_UNLOCKED;
+			}
+			migration_entry_wait(mm, vmf->pmd, vmf->address);
 		} else if (is_device_exclusive_entry(entry)) {
 			vmf->page = pfn_swap_entry_to_page(entry);
 			ret = remove_device_exclusive_entry(vmf);