Message ID: 20230609005158.2421285-5-surenb@google.com
State: New
From: Suren Baghdasaryan <surenb@google.com>
Date: Thu, 8 Jun 2023 17:51:56 -0700
Subject: [PATCH v2 4/6] mm: drop VMA lock before waiting for migration
To: akpm@linux-foundation.org
Cc: willy@infradead.org, hannes@cmpxchg.org, mhocko@suse.com, josef@toxicpanda.com, jack@suse.cz, ldufour@linux.ibm.com, laurent.dufour@fr.ibm.com, michel@lespinasse.org, liam.howlett@oracle.com, jglisse@google.com, vbabka@suse.cz, minchan@google.com, dave@stgolabs.net, punit.agrawal@bytedance.com, lstoakes@gmail.com, hdanton@sina.com, apopple@nvidia.com, peterx@redhat.com, ying.huang@intel.com, david@redhat.com, yuzhao@google.com, dhowells@redhat.com, hughd@google.com, viro@zeniv.linux.org.uk, brauner@kernel.org, pasha.tatashin@soleen.com, surenb@google.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com
Series: Per-vma lock support for swap and userfaults
Commit Message
Suren Baghdasaryan
June 9, 2023, 12:51 a.m. UTC
migration_entry_wait does not need VMA lock, therefore it can be dropped
before waiting. Introduce VM_FAULT_VMA_UNLOCKED to indicate that VMA
lock was dropped while in handle_mm_fault().
Note that once VMA lock is dropped, the VMA reference can't be used as
there are no guarantees it was not freed.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
arch/arm64/mm/fault.c | 3 ++-
arch/powerpc/mm/fault.c | 3 ++-
arch/s390/mm/fault.c | 3 ++-
arch/x86/mm/fault.c | 3 ++-
include/linux/mm_types.h | 6 +++++-
mm/memory.c | 12 ++++++++++--
6 files changed, 23 insertions(+), 7 deletions(-)
Comments
On Thu, Jun 08, 2023 at 05:51:56PM -0700, Suren Baghdasaryan wrote:
> migration_entry_wait does not need VMA lock, therefore it can be dropped
> before waiting. Introduce VM_FAULT_VMA_UNLOCKED to indicate that VMA
> lock was dropped while in handle_mm_fault().
> Note that once VMA lock is dropped, the VMA reference can't be used as
> there are no guarantees it was not freed.

Then vma lock behaves differently from mmap read lock, am I right? Can we
still make them match on behaviors, or there's reason not to do so?

One reason is if they match they can reuse existing flags and there'll be
less confusing, e.g. this:

  (fault->flags & FAULT_FLAG_VMA_LOCK) &&
  (vm_fault_ret && (VM_FAULT_RETRY || VM_FAULT_COMPLETE))

can replace the new flag, iiuc.

Thanks,
On Fri, Jun 9, 2023 at 1:42 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Jun 08, 2023 at 05:51:56PM -0700, Suren Baghdasaryan wrote:
> > migration_entry_wait does not need VMA lock, therefore it can be dropped
> > before waiting. Introduce VM_FAULT_VMA_UNLOCKED to indicate that VMA
> > lock was dropped while in handle_mm_fault().
> > Note that once VMA lock is dropped, the VMA reference can't be used as
> > there are no guarantees it was not freed.
>
> Then vma lock behaves differently from mmap read lock, am I right? Can we
> still make them match on behaviors, or there's reason not to do so?

I think we could match their behavior by also dropping mmap_lock here
when fault is handled under mmap_lock (!(fault->flags &
FAULT_FLAG_VMA_LOCK)).
I missed the fact that VM_FAULT_COMPLETED can be used to skip dropping
mmap_lock in do_page_fault(), so indeed, I might be able to use
VM_FAULT_COMPLETED to skip vma_end_read(vma) for per-vma locks as well
instead of introducing FAULT_FLAG_VMA_LOCK. I think that was your idea
of reusing existing flags?

> One reason is if they match they can reuse existing flags and there'll be
> less confusing, e.g. this:
>
>   (fault->flags & FAULT_FLAG_VMA_LOCK) &&
>   (vm_fault_ret && (VM_FAULT_RETRY || VM_FAULT_COMPLETE))
>
> can replace the new flag, iiuc.
>
> Thanks,
>
> --
> Peter Xu
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
On Fri, Jun 9, 2023 at 3:30 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Fri, Jun 9, 2023 at 1:42 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > [...]
> >
> > Then vma lock behaves differently from mmap read lock, am I right? Can we
> > still make them match on behaviors, or there's reason not to do so?
>
> I think we could match their behavior by also dropping mmap_lock here
> when fault is handled under mmap_lock (!(fault->flags &
> FAULT_FLAG_VMA_LOCK)).
> I missed the fact that VM_FAULT_COMPLETED can be used to skip dropping
> mmap_lock in do_page_fault(), so indeed, I might be able to use
> VM_FAULT_COMPLETED to skip vma_end_read(vma) for per-vma locks as well
> instead of introducing FAULT_FLAG_VMA_LOCK. I think that was your idea
> of reusing existing flags?

Sorry, I meant VM_FAULT_VMA_UNLOCKED, not FAULT_FLAG_VMA_LOCK in the
above reply.

I took a closer look into using VM_FAULT_COMPLETED instead of
VM_FAULT_VMA_UNLOCKED but when we fall back from per-vma lock to
mmap_lock we need to retry with an indication that the per-vma lock
was dropped. Returning (VM_FAULT_RETRY | VM_FAULT_COMPLETE) to
indicate such state seems strange to me ("retry" and "complete" seem
like contradicting concepts to be used in a single result).
I could use VM_FAULT_COMPLETE when releasing mmap_lock since we don't
use it in combination with VM_FAULT_RETRY and (VM_FAULT_RETRY |
VM_FAULT_VMA_UNLOCKED) when dropping per-vma lock and falling back to
mmap_lock. It still requires the new VM_FAULT_VMA_UNLOCKED flag but I
think logically that makes more sense. WDYT?

> > One reason is if they match they can reuse existing flags and there'll be
> > less confusing, e.g. this:
> >
> >   (fault->flags & FAULT_FLAG_VMA_LOCK) &&
> >   (vm_fault_ret && (VM_FAULT_RETRY || VM_FAULT_COMPLETE))
> >
> > can replace the new flag, iiuc.
> >
> > Thanks,
> >
> > --
> > Peter Xu
On Fri, Jun 09, 2023 at 03:30:10PM -0700, Suren Baghdasaryan wrote:
> On Fri, Jun 9, 2023 at 1:42 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > [...]
> >
> > Then vma lock behaves differently from mmap read lock, am I right? Can we
> > still make them match on behaviors, or there's reason not to do so?
>
> I think we could match their behavior by also dropping mmap_lock here
> when fault is handled under mmap_lock (!(fault->flags &
> FAULT_FLAG_VMA_LOCK)).
> I missed the fact that VM_FAULT_COMPLETED can be used to skip dropping
> mmap_lock in do_page_fault(), so indeed, I might be able to use
> VM_FAULT_COMPLETED to skip vma_end_read(vma) for per-vma locks as well
> instead of introducing FAULT_FLAG_VMA_LOCK. I think that was your idea
> of reusing existing flags?

Yes.

I'd suggest we move this patch out of the series as it's not really part of
it on enabling swap + uffd. It can be a separate patch and hopefully it'll
always change both vma+mmap lock cases, and with proper reasonings.

Thanks,
On Fri, Jun 09, 2023 at 06:29:43PM -0700, Suren Baghdasaryan wrote:
> On Fri, Jun 9, 2023 at 3:30 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > [...]
>
> Sorry, I meant VM_FAULT_VMA_UNLOCKED, not FAULT_FLAG_VMA_LOCK in the
> above reply.
>
> I took a closer look into using VM_FAULT_COMPLETED instead of
> VM_FAULT_VMA_UNLOCKED but when we fall back from per-vma lock to
> mmap_lock we need to retry with an indication that the per-vma lock
> was dropped. Returning (VM_FAULT_RETRY | VM_FAULT_COMPLETE) to
> indicate such state seems strange to me ("retry" and "complete" seem

Not relevant to this migration patch, but for the whole idea I was thinking
whether it should just work if we simply:

	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
-	vma_end_read(vma);
+	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
+		vma_end_read(vma);

?

GUP may need more caution on NOWAIT, but vma lock is only in fault paths so
IIUC it's fine?
On Mon, Jun 12, 2023 at 6:28 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, Jun 09, 2023 at 03:30:10PM -0700, Suren Baghdasaryan wrote:
> >
> > [...]
> >
> > I missed the fact that VM_FAULT_COMPLETED can be used to skip dropping
> > mmap_lock in do_page_fault(), so indeed, I might be able to use
> > VM_FAULT_COMPLETED to skip vma_end_read(vma) for per-vma locks as well
> > instead of introducing FAULT_FLAG_VMA_LOCK. I think that was your idea
> > of reusing existing flags?
>
> Yes.
>
> I'd suggest we move this patch out of the series as it's not really part of
> it on enabling swap + uffd. It can be a separate patch and hopefully it'll
> always change both vma+mmap lock cases, and with proper reasonings.

Ok, I can move it out with mmap_lock support only and then add per-vma
lock support in my patchset (because this path is still part of
do_swap_page and my patchset enables swap support for per-vma locks).

> Thanks,
>
> --
> Peter Xu
On Mon, Jun 12, 2023 at 6:36 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, Jun 09, 2023 at 06:29:43PM -0700, Suren Baghdasaryan wrote:
> >
> > [...]
>
> Not relevant to this migration patch, but for the whole idea I was thinking
> whether it should just work if we simply:
>
> 	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
> -	vma_end_read(vma);
> +	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
> +		vma_end_read(vma);
>
> ?

Today when we can't handle a page fault under per-vma locks we return
VM_FAULT_RETRY, in which case per-vma lock is dropped and the fault is
retried under mmap_lock. The condition you suggest above would not
drop per-vma lock for VM_FAULT_RETRY case and would break the current
fallback mechanism.

However your suggestion gave me an idea. I could indicate that per-vma
lock got dropped using vmf structure (like Matthew suggested before)
and once handle_pte_fault(vmf) returns I could check if it returned
VM_FAULT_RETRY but per-vma lock is still held. If that happens I can
call vma_end_read() before returning from __handle_mm_fault(). That
way any time handle_mm_fault() returns VM_FAULT_RETRY per-vma lock
will be already released, so your condition in do_page_fault() will
work correctly. That would eliminate the need for a new
VM_FAULT_VMA_UNLOCKED flag. WDYT?

> GUP may need more caution on NOWAIT, but vma lock is only in fault paths so
> IIUC it's fine?
>
> --
> Peter Xu
On Mon, Jun 12, 2023 at 09:07:38AM -0700, Suren Baghdasaryan wrote:
> On Mon, Jun 12, 2023 at 6:36 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > [...]
> >
> > Not relevant to this migration patch, but for the whole idea I was thinking
> > whether it should just work if we simply:
> >
> > 	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
> > -	vma_end_read(vma);
> > +	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
> > +		vma_end_read(vma);
> >
> > ?
>
> Today when we can't handle a page fault under per-vma locks we return
> VM_FAULT_RETRY, in which case per-vma lock is dropped and the fault is

Oh I see what I missed. I think it may not be a good idea to reuse
VM_FAULT_RETRY just for that, because it makes VM_FAULT_RETRY even more
complicated - now it adds one more case where the lock is not released,
that's when PER_VMA even if !NOWAIT.

> retried under mmap_lock. The condition you suggest above would not
> drop per-vma lock for VM_FAULT_RETRY case and would break the current
> fallback mechanism.
>
> However your suggestion gave me an idea. I could indicate that per-vma
> lock got dropped using vmf structure (like Matthew suggested before)
> and once handle_pte_fault(vmf) returns I could check if it returned
> VM_FAULT_RETRY but per-vma lock is still held. If that happens I can
> call vma_end_read() before returning from __handle_mm_fault(). That
> way any time handle_mm_fault() returns VM_FAULT_RETRY per-vma lock
> will be already released, so your condition in do_page_fault() will
> work correctly. That would eliminate the need for a new
> VM_FAULT_VMA_UNLOCKED flag. WDYT?

Sounds good.

So probably that's the major pain for now with the legacy fallback (I'll
have commented if I noticed it with the initial vma lock support..). I
assume that'll go away as long as we recover the VM_FAULT_RETRY semantics
to before.
On Mon, Jun 12, 2023 at 11:34 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Mon, Jun 12, 2023 at 09:07:38AM -0700, Suren Baghdasaryan wrote:
> >
> > [...]
>
> Oh I see what I missed. I think it may not be a good idea to reuse
> VM_FAULT_RETRY just for that, because it makes VM_FAULT_RETRY even more
> complicated - now it adds one more case where the lock is not released,
> that's when PER_VMA even if !NOWAIT.
>
> > [...]
> > way any time handle_mm_fault() returns VM_FAULT_RETRY per-vma lock
> > will be already released, so your condition in do_page_fault() will
> > work correctly. That would eliminate the need for a new
> > VM_FAULT_VMA_UNLOCKED flag. WDYT?
>
> Sounds good.
>
> So probably that's the major pain for now with the legacy fallback (I'll
> have commented if I noticed it with the initial vma lock support..). I
> assume that'll go away as long as we recover the VM_FAULT_RETRY semantics
> to before.

I think so. With that change getting VM_FAULT_RETRY in do_page_fault()
will guarantee that per-vma lock was dropped. Is that what you mean?

> --
> Peter Xu
On Mon, Jun 12, 2023 at 11:44:33AM -0700, Suren Baghdasaryan wrote:
> I think so. With that change getting VM_FAULT_RETRY in do_page_fault()
> will guarantee that per-vma lock was dropped. Is that what you mean?

Yes, with the newly added "return VM_FAULT_RETRY" in do_swap_page() for
per-vma lock removed. AFAICT all the rest paths guaranteed that as long as
in the fault paths.
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 6045a5117ac1..8f59badbffb5 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -601,7 +601,8 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 		goto lock_mmap;
 	}
 	fault = handle_mm_fault(vma, addr, mm_flags | FAULT_FLAG_VMA_LOCK, regs);
-	vma_end_read(vma);
+	if (!(fault & VM_FAULT_VMA_UNLOCKED))
+		vma_end_read(vma);
 
 	if (!(fault & VM_FAULT_RETRY)) {
 		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 531177a4ee08..b27730f07141 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -494,7 +494,8 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
 	}
 
 	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
-	vma_end_read(vma);
+	if (!(fault & VM_FAULT_VMA_UNLOCKED))
+		vma_end_read(vma);
 
 	if (!(fault & VM_FAULT_RETRY)) {
 		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index b65144c392b0..cc923dbb0821 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -418,7 +418,8 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
 		goto lock_mmap;
 	}
 	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
-	vma_end_read(vma);
+	if (!(fault & VM_FAULT_VMA_UNLOCKED))
+		vma_end_read(vma);
 	if (!(fault & VM_FAULT_RETRY)) {
 		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
 		goto out;
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index e4399983c50c..ef62ab2fd211 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1347,7 +1347,8 @@ void do_user_addr_fault(struct pt_regs *regs,
 		goto lock_mmap;
 	}
 	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
-	vma_end_read(vma);
+	if (!(fault & VM_FAULT_VMA_UNLOCKED))
+		vma_end_read(vma);
 
 	if (!(fault & VM_FAULT_RETRY)) {
 		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 79765e3dd8f3..bd6b95c82f7a 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1030,6 +1030,8 @@ typedef __bitwise unsigned int vm_fault_t;
  *				fsync() to complete (for synchronous page faults
  *				in DAX)
  * @VM_FAULT_COMPLETED:		->fault completed, meanwhile mmap lock released
+ * @VM_FAULT_VMA_UNLOCKED:	VMA lock was released, vmf->vma should no longer
+ *				be accessed
  * @VM_FAULT_HINDEX_MASK:	mask HINDEX value
  *
  */
@@ -1047,6 +1049,7 @@ enum vm_fault_reason {
 	VM_FAULT_DONE_COW       = (__force vm_fault_t)0x001000,
 	VM_FAULT_NEEDDSYNC      = (__force vm_fault_t)0x002000,
 	VM_FAULT_COMPLETED      = (__force vm_fault_t)0x004000,
+	VM_FAULT_VMA_UNLOCKED   = (__force vm_fault_t)0x008000,
 	VM_FAULT_HINDEX_MASK    = (__force vm_fault_t)0x0f0000,
 };
 
@@ -1071,7 +1074,8 @@ enum vm_fault_reason {
 	{ VM_FAULT_FALLBACK,            "FALLBACK" },	\
 	{ VM_FAULT_DONE_COW,            "DONE_COW" },	\
 	{ VM_FAULT_NEEDDSYNC,           "NEEDDSYNC" },	\
-	{ VM_FAULT_COMPLETED,           "COMPLETED" }
+	{ VM_FAULT_COMPLETED,           "COMPLETED" },	\
+	{ VM_FAULT_VMA_UNLOCKED,        "VMA_UNLOCKED" }
 
 struct vm_special_mapping {
 	const char *name;	/* The name, e.g. "[vdso]". */
diff --git a/mm/memory.c b/mm/memory.c
index 41f45819a923..c234f8085f1e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3714,8 +3714,16 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	entry = pte_to_swp_entry(vmf->orig_pte);
 	if (unlikely(non_swap_entry(entry))) {
 		if (is_migration_entry(entry)) {
-			migration_entry_wait(vma->vm_mm, vmf->pmd,
-					     vmf->address);
+			/* Save mm in case VMA lock is dropped */
+			struct mm_struct *mm = vma->vm_mm;
+
+			if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
+				/* No need to hold VMA lock for migration */
+				vma_end_read(vma);
+				/* WARNING: VMA can't be used after this */
+				ret |= VM_FAULT_VMA_UNLOCKED;
+			}
+			migration_entry_wait(mm, vmf->pmd, vmf->address);
 		} else if (is_device_exclusive_entry(entry)) {
 			vmf->page = pfn_swap_entry_to_page(entry);
 			ret = remove_device_exclusive_entry(vmf);