Message ID | 20230627042321.1763765-7-surenb@google.com |
---|---|
State | New |
Headers |
Subject: [PATCH v3 6/8] mm: handle swap page faults under per-VMA lock
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: willy@infradead.org, hannes@cmpxchg.org, mhocko@suse.com, josef@toxicpanda.com, jack@suse.cz, ldufour@linux.ibm.com, laurent.dufour@fr.ibm.com, michel@lespinasse.org, liam.howlett@oracle.com, jglisse@google.com, vbabka@suse.cz, minchan@google.com, dave@stgolabs.net, punit.agrawal@bytedance.com, lstoakes@gmail.com, hdanton@sina.com, apopple@nvidia.com, peterx@redhat.com, ying.huang@intel.com, david@redhat.com, yuzhao@google.com, dhowells@redhat.com, hughd@google.com, viro@zeniv.linux.org.uk, brauner@kernel.org, pasha.tatashin@soleen.com, surenb@google.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com
Date: Mon, 26 Jun 2023 21:23:19 -0700
Message-ID: <20230627042321.1763765-7-surenb@google.com>
In-Reply-To: <20230627042321.1763765-1-surenb@google.com>
References: <20230627042321.1763765-1-surenb@google.com> |
Series | Per-VMA lock support for swap and userfaults |
Commit Message
Suren Baghdasaryan
June 27, 2023, 4:23 a.m. UTC
When a page fault is handled under per-VMA lock protection, all swap page
faults are retried with mmap_lock because folio_lock_fault (formerly
known as folio_lock_or_retry) had to drop and reacquire mmap_lock
if the folio could not be immediately locked.
Follow the same pattern as mmap_lock: drop the per-VMA lock when waiting
for the folio in folio_lock_fault and retry once the folio is available.
With this obstacle removed, enable do_swap_page to operate under
per-VMA lock protection. Drivers implementing ops->migrate_to_ram might
still rely on mmap_lock, therefore we have to fall back to mmap_lock in
that particular case.
Note that the only time do_swap_page calls synchronous swap_readpage
is when SWP_SYNCHRONOUS_IO is set, which is only set for
QUEUE_FLAG_SYNCHRONOUS devices: brd, zram and nvdimms (both btt and
pmem). Therefore we don't sleep in this path, and there's no need to
drop the mmap or per-VMA lock.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
mm/filemap.c | 24 ++++++++++++++++--------
mm/memory.c | 21 ++++++++++++++-------
2 files changed, 30 insertions(+), 15 deletions(-)
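
[Editor's note] The heart of the mm/filemap.c change (full diff below) is a single release-then-wait pattern: before __folio_lock_fault sleeps on the folio, it drops whichever lock the fault path holds, chosen by FAULT_FLAG_VMA_LOCK, and returns VM_FAULT_RETRY. A minimal C sketch of that pattern follows; the helper name fault_release_lock is hypothetical, not part of this series, and only condenses the two branches the patch open-codes at both call sites:

/*
 * Sketch only: drop whichever lock the fault handler took, then record
 * that fact so callers do not unlock a second time.
 */
static void fault_release_lock(struct vm_fault *vmf)
{
        if (vmf->flags & FAULT_FLAG_VMA_LOCK)
                vma_end_read(vmf->vma);                 /* per-VMA read lock */
        else
                mmap_read_unlock(vmf->vma->vm_mm);      /* classic mmap_lock */
        vmf->flags |= FAULT_FLAG_LOCK_DROPPED;          /* flag added earlier in this series */
}

With this in place, waiting for the folio is safe: the faulting task sleeps without holding any address-space lock, and the fault is simply retried once the folio becomes available.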
Comments
On Mon, Jun 26, 2023 at 09:23:19PM -0700, Suren Baghdasaryan wrote:
> When page fault is handled under per-VMA lock protection, all swap page
> faults are retried with mmap_lock because folio_lock_fault (formerly
> known as folio_lock_or_retry) had to drop and reacquire mmap_lock
> if folio could not be immediately locked.

[... rest of commit message and diffstat quoted in full, trimmed here; the full diff follows the comments ...]

> @@ -1703,12 +1703,14 @@ static int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
>   * Return values:
>   * 0 - folio is locked.
>   * VM_FAULT_RETRY - folio is not locked.
> - *     mmap_lock has been released (mmap_read_unlock(), unless flags had both
> - *     FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT set, in
> - *     which case mmap_lock is still held.
> + *     FAULT_FLAG_LOCK_DROPPED bit in vmf flags will be set if mmap_lock or

This "FAULT_FLAG_LOCK_DROPPED" should belong to that patch when introduced.
But again I still think this flag as a whole with that patch is not needed
and should be dropped, unless I miss something important..

[... remaining mm/filemap.c hunks quoted without further comment ...]

> @@ -3712,11 +3712,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>         if (!pte_unmap_same(vmf))
>                 goto out;
>
> -       if (vmf->flags & FAULT_FLAG_VMA_LOCK) {

So if with my imagination, here we'll already have the vma_read_end() and
this patch will remove it, which makes sense. Then...

> -               ret = VM_FAULT_RETRY;
> -               goto out;
> -       }
> -

[...]

>                 } else if (is_device_private_entry(entry)) {
> +                       if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> +                               /*
> +                                * migrate_to_ram is not yet ready to operate
> +                                * under VMA lock.
> +                                */

... here we probably just do vma_read_end(), then...

> +                               ret |= VM_FAULT_RETRY;
> +                               goto out;
> +                       }

[...]

> @@ -5089,9 +5093,12 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
>                 /*
>                  * In case of VM_FAULT_RETRY or VM_FAULT_COMPLETED we might
>                  * be still holding per-VMA lock to keep the vma stable as long
> -                * as possible. Drop it before returning.
> +                * as possible. In this situation vmf.flags has
> +                * FAULT_FLAG_VMA_LOCK set and FAULT_FLAG_LOCK_DROPPED unset.
> +                * Drop the lock before returning when this happens.
>                  */
> -               if (vmf.flags & FAULT_FLAG_VMA_LOCK)
> +               if ((vmf.flags & (FAULT_FLAG_VMA_LOCK | FAULT_FLAG_LOCK_DROPPED)) ==
> +                   FAULT_FLAG_VMA_LOCK)
>                         vma_end_read(vma);

This whole chunk should have been dropped altogether with my comment in
previous patch, iiuc, and it should be no-op anyway for swap case. For the
real "waiting for page lock during swapin" phase we should always 100%
release the vma lock in folio_lock_or_retry() - just like mmap lock.

Thanks,

--
Peter Xu
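
[Editor's note] Peter's suggestion amounts to releasing the per-VMA lock at the point of failure instead of recording FAULT_FLAG_LOCK_DROPPED for the caller. A hedged sketch of what the do_swap_page device-private bail-out might then look like; the helper name and shape are assumptions for illustration, not code from this series:

/*
 * Sketch of the reviewer's proposal: drop the per-VMA lock right where
 * the fault path gives up, so the caller never has to ask who unlocked.
 */
static vm_fault_t sketch_migrate_to_ram_fallback(struct vm_fault *vmf)
{
        if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
                /* migrate_to_ram handlers may still depend on mmap_lock */
                vma_end_read(vmf->vma); /* released here, not by the caller */
                return VM_FAULT_RETRY; /* retried under mmap_lock */
        }
        return 0;                      /* mmap_lock held: continue normally */
}

Note the caveat this creates: nothing after vma_end_read() may touch vmf->vma, because the VMA is no longer stable.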
On Tue, Jun 27, 2023 at 8:41 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Mon, Jun 26, 2023 at 09:23:19PM -0700, Suren Baghdasaryan wrote:

[... quoted patch and earlier review comments trimmed; see the message above ...]

> > @@ -5089,9 +5093,12 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> > -               if (vmf.flags & FAULT_FLAG_VMA_LOCK)
> > +               if ((vmf.flags & (FAULT_FLAG_VMA_LOCK | FAULT_FLAG_LOCK_DROPPED)) ==
> > +                   FAULT_FLAG_VMA_LOCK)
> >                         vma_end_read(vma);
>
> This whole chunk should have been dropped altogether with my comment in
> previous patch, iiuc, and it should be no-op anyway for swap case. For the
> real "waiting for page lock during swapin" phase we should always 100%
> release the vma lock in folio_lock_or_retry() - just like mmap lock.

Yep, we drop FAULT_FLAG_LOCK_DROPPED, release vma lock when we return
RETRY and that makes all this unnecessary. I just need to make sure we
do not access VMA after we drop its lock since we will be releasing it
now earlier than before.

> Thanks,

[... remaining quote trimmed ...]
On Tue, Jun 27, 2023 at 09:05:32AM -0700, Suren Baghdasaryan wrote:
> On Tue, Jun 27, 2023 at 8:41 AM Peter Xu <peterx@redhat.com> wrote:

[... quoted patch and earlier exchange trimmed; see the messages above ...]

> > This whole chunk should have been dropped altogether with my comment in
> > previous patch, iiuc, and it should be no-op anyway for swap case. For the
> > real "waiting for page lock during swapin" phase we should always 100%
> > release the vma lock in folio_lock_or_retry() - just like mmap lock.
>
> Yep, we drop FAULT_FLAG_LOCK_DROPPED, release vma lock when we return
> RETRY and that makes all this unnecessary. I just need to make sure we
> do not access VMA after we drop its lock since we will be releasing it
> now earlier than before.

Yes. Hopefully that's not a problem, because mmap lock has it the same
way. And that's also the good thing if we can always keep both locks
behaving the same if possible, because the problem is really shared, imho.
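
[Editor's note] Suren's closing point is the one sharp edge of the converged scheme: once a retry path has dropped the per-VMA lock, the VMA may be freed or modified, so nothing after that point may dereference it. A small illustrative sketch of the rule, assumed code rather than an excerpt from the series, ahead of the v3 diff below:

        ret = __folio_lock_fault(folio, vmf);
        if (ret & VM_FAULT_RETRY) {
                /*
                 * The mmap_lock or per-VMA lock was already released inside
                 * __folio_lock_fault(); vmf->vma is no longer stable.
                 * Dereferencing vmf->vma->vm_flags at this point would be a
                 * use-after-unlock bug.
                 */
                return ret;     /* bail out without touching the VMA */
        }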
diff --git a/mm/filemap.c b/mm/filemap.c
index 8ad06d69895b..683f11f244cd 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1703,12 +1703,14 @@ static int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
  * Return values:
  * 0 - folio is locked.
  * VM_FAULT_RETRY - folio is not locked.
- *     mmap_lock has been released (mmap_read_unlock(), unless flags had both
- *     FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT set, in
- *     which case mmap_lock is still held.
+ *     FAULT_FLAG_LOCK_DROPPED bit in vmf flags will be set if mmap_lock or
+ *     per-VMA lock got dropped. mmap_lock/per-VMA lock is dropped when
+ *     function fails to lock the folio, unless flags had both
+ *     FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT set, in which case
+ *     the lock is still held.
  *
  * If neither ALLOW_RETRY nor KILLABLE are set, will always return 0
- * with the folio locked and the mmap_lock unperturbed.
+ * with the folio locked and the mmap_lock/per-VMA lock unperturbed.
  */
 vm_fault_t __folio_lock_fault(struct folio *folio, struct vm_fault *vmf)
 {
@@ -1716,13 +1718,16 @@ vm_fault_t __folio_lock_fault(struct folio *folio, struct vm_fault *vmf)
 
        if (fault_flag_allow_retry_first(vmf->flags)) {
                /*
-                * CAUTION! In this case, mmap_lock is not released
-                * even though return VM_FAULT_RETRY.
+                * CAUTION! In this case, mmap_lock/per-VMA lock is not
+                * released even though returning VM_FAULT_RETRY.
                 */
                if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
                        return VM_FAULT_RETRY;
 
-               mmap_read_unlock(mm);
+               if (vmf->flags & FAULT_FLAG_VMA_LOCK)
+                       vma_end_read(vmf->vma);
+               else
+                       mmap_read_unlock(mm);
                vmf->flags |= FAULT_FLAG_LOCK_DROPPED;
                if (vmf->flags & FAULT_FLAG_KILLABLE)
                        folio_wait_locked_killable(folio);
@@ -1735,7 +1740,10 @@ vm_fault_t __folio_lock_fault(struct folio *folio, struct vm_fault *vmf)
 
        ret = __folio_lock_killable(folio);
        if (ret) {
-               mmap_read_unlock(mm);
+               if (vmf->flags & FAULT_FLAG_VMA_LOCK)
+                       vma_end_read(vmf->vma);
+               else
+                       mmap_read_unlock(mm);
                vmf->flags |= FAULT_FLAG_LOCK_DROPPED;
                return VM_FAULT_RETRY;
        }
diff --git a/mm/memory.c b/mm/memory.c
index 3c2acafcd7b6..5caaa4c66ea2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3712,11 +3712,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
        if (!pte_unmap_same(vmf))
                goto out;
 
-       if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
-               ret = VM_FAULT_RETRY;
-               goto out;
-       }
-
        entry = pte_to_swp_entry(vmf->orig_pte);
        if (unlikely(non_swap_entry(entry))) {
                if (is_migration_entry(entry)) {
@@ -3726,6 +3721,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
                        vmf->page = pfn_swap_entry_to_page(entry);
                        ret = remove_device_exclusive_entry(vmf);
                } else if (is_device_private_entry(entry)) {
+                       if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
+                               /*
+                                * migrate_to_ram is not yet ready to operate
+                                * under VMA lock.
+                                */
+                               ret |= VM_FAULT_RETRY;
+                               goto out;
+                       }
+
                        vmf->page = pfn_swap_entry_to_page(entry);
                        vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
                                        vmf->address, &vmf->ptl);
@@ -5089,9 +5093,12 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
                /*
                 * In case of VM_FAULT_RETRY or VM_FAULT_COMPLETED we might
                 * be still holding per-VMA lock to keep the vma stable as long
-                * as possible. Drop it before returning.
+                * as possible. In this situation vmf.flags has
+                * FAULT_FLAG_VMA_LOCK set and FAULT_FLAG_LOCK_DROPPED unset.
+                * Drop the lock before returning when this happens.
                 */
-               if (vmf.flags & FAULT_FLAG_VMA_LOCK)
+               if ((vmf.flags & (FAULT_FLAG_VMA_LOCK | FAULT_FLAG_LOCK_DROPPED)) ==
+                   FAULT_FLAG_VMA_LOCK)
                        vma_end_read(vma);
        }
        return ret;
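
[Editor's note] For context, this is roughly how the whole contract looks from an architecture fault handler once retry paths release the per-VMA lock themselves, as the thread above converges on. A condensed sketch, modeled loosely on arch fault paths of this kernel generation with access checks omitted; an assumption for illustration, not part of this patch:

#include <linux/mm.h>

static vm_fault_t fault_under_vma_lock(struct mm_struct *mm,
                                       unsigned long addr,
                                       unsigned int flags,
                                       struct pt_regs *regs)
{
        struct vm_area_struct *vma;
        vm_fault_t fault;

        /* Find and read-lock the VMA without taking mmap_lock. */
        vma = lock_vma_under_rcu(mm, addr);
        if (!vma)
                return VM_FAULT_RETRY; /* caller retries under mmap_lock */

        fault = handle_mm_fault(vma, addr, flags | FAULT_FLAG_VMA_LOCK, regs);

        /*
         * Retry paths (folio-lock waits, the migrate_to_ram fallback)
         * release the per-VMA lock themselves; only a fault that ran to
         * completion still holds it here.
         */
        if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
                vma_end_read(vma);

        return fault;   /* VM_FAULT_RETRY => fall back to the mmap_lock path */
}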