Message ID | 20230216051750.3125598-22-surenb@google.com |
---|---|
State | New |
Series | Per-VMA locks |
Commit Message
Suren Baghdasaryan
Feb. 16, 2023, 5:17 a.m. UTC
While unmapping VMAs, adjacent VMAs might be able to grow into the area
being unmapped. In such cases write-lock adjacent VMAs to prevent this
growth.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
mm/mmap.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
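For orientation, this is how the affected block of do_vmi_align_munmap() reads with the patch applied (reconstructed from the diff at the bottom of this page; the comments are added here for explanation and are not part of the patch):

	if (downgrade) {
		if (next && (next->vm_flags & VM_GROWSDOWN)) {
			/* next could grow down into the unmapped gap:
			 * write-lock it and keep the mmap write lock
			 * instead of downgrading.
			 */
			vma_start_write(next);
			downgrade = false;
		} else if (prev && (prev->vm_flags & VM_GROWSUP)) {
			/* likewise for an upward-growing prev */
			vma_start_write(prev);
			downgrade = false;
		} else
			mmap_write_downgrade(mm);
	}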
Comments
First, sorry I didn't see this before v3..

* Suren Baghdasaryan <surenb@google.com> [230216 00:18]:
> While unmapping VMAs, adjacent VMAs might be able to grow into the area
> being unmapped. In such cases write-lock adjacent VMAs to prevent this
> growth.
>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
>  mm/mmap.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 118b2246bba9..00f8c5798936 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2399,11 +2399,13 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
>  	 * down_read(mmap_lock) and collide with the VMA we are about to unmap.
>  	 */
>  	if (downgrade) {
> -		if (next && (next->vm_flags & VM_GROWSDOWN))
> +		if (next && (next->vm_flags & VM_GROWSDOWN)) {
> +			vma_start_write(next);
>  			downgrade = false;

If the mmap write lock is insufficient to protect us from next/prev
modifications then we need to move *most* of this block above the maple
tree write operation, otherwise we have a race here. When I say most, I
mean everything besides the call to mmap_write_downgrade() needs to be
moved.

If the mmap write lock is sufficient to protect us from next/prev
modifications then we don't need to write lock the vmas themselves.

I believe this is for expand_stack() protection, so I believe it's okay
to not vma write lock these vmas.. I don't think there are other areas
where we can modify the vmas without holding the mmap lock, but others
on the CC list please chime in if I've forgotten something.

So, if I am correct, then you shouldn't lock next/prev and allow the
vma locking fault method on these vmas. This will work because
lock_vma_under_rcu() uses mas_walk() on the faulting address. That is,
your lock_vma_under_rcu() will fail to find anything that needs to be
grown and go back to mmap lock protection. As it is written today, the
vma locking fault handler will fail and we will wait for the mmap lock
to be released even when the vma isn't going to expand.

> -		else if (prev && (prev->vm_flags & VM_GROWSUP))
> +		} else if (prev && (prev->vm_flags & VM_GROWSUP)) {
> +			vma_start_write(prev);
>  			downgrade = false;
> -		else
> +		} else
>  			mmap_write_downgrade(mm);
>  	}
>
> --
> 2.39.1
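The behavior Liam describes hinges on lock_vma_under_rcu() only returning a VMA that already contains the faulting address. A condensed sketch of that function, as introduced elsewhere in this series (simplified, not the exact code; only the parts relevant to this point are shown):

struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
					  unsigned long address)
{
	MA_STATE(mas, &mm->mm_mt, address, address);
	struct vm_area_struct *vma;

	rcu_read_lock();
	vma = mas_walk(&mas);		/* NULL when address falls in a gap */
	if (!vma || !vma_start_read(vma))
		goto inval;
	/* vm_start/vm_end may have changed before the read lock was taken */
	if (address < vma->vm_start || address >= vma->vm_end) {
		vma_end_read(vma);
		goto inval;
	}
	rcu_read_unlock();
	return vma;
inval:
	rcu_read_unlock();
	return NULL;	/* caller falls back to mmap_read_lock() */
}

Because a fault in the unmapped gap beside a stack VMA matches no maple tree entry, the per-VMA path bails out and the fault is retried under the mmap lock.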
On Thu, Feb 16, 2023 at 7:34 AM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
>
> First, sorry I didn't see this before v3..

Feedback at any time is highly appreciated!

> * Suren Baghdasaryan <surenb@google.com> [230216 00:18]:
> > While unmapping VMAs, adjacent VMAs might be able to grow into the area
> > being unmapped. In such cases write-lock adjacent VMAs to prevent this
> > growth.
> >
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > ---
> >  mm/mmap.c | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 118b2246bba9..00f8c5798936 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -2399,11 +2399,13 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> >  	 * down_read(mmap_lock) and collide with the VMA we are about to unmap.
> >  	 */
> >  	if (downgrade) {
> > -		if (next && (next->vm_flags & VM_GROWSDOWN))
> > +		if (next && (next->vm_flags & VM_GROWSDOWN)) {
> > +			vma_start_write(next);
> >  			downgrade = false;
>
> If the mmap write lock is insufficient to protect us from next/prev
> modifications then we need to move *most* of this block above the maple
> tree write operation, otherwise we have a race here. When I say most, I
> mean everything besides the call to mmap_write_downgrade() needs to be
> moved.

Which prior maple tree write operation are you referring to? I see
__split_vma() and munmap_sidetree() which both already lock the VMAs
they operate on, so page faults can't happen in those VMAs.

> If the mmap write lock is sufficient to protect us from next/prev
> modifications then we don't need to write lock the vmas themselves.

mmap write lock is not sufficient because with per-VMA locks we do not
take mmap lock at all.

> I believe this is for expand_stack() protection, so I believe it's okay
> to not vma write lock these vmas.. I don't think there are other areas
> where we can modify the vmas without holding the mmap lock, but others
> on the CC list please chime in if I've forgotten something.
>
> So, if I am correct, then you shouldn't lock next/prev and allow the
> vma locking fault method on these vmas. This will work because
> lock_vma_under_rcu() uses mas_walk() on the faulting address. That is,
> your lock_vma_under_rcu() will fail to find anything that needs to be
> grown and go back to mmap lock protection. As it is written today, the
> vma locking fault handler will fail and we will wait for the mmap lock
> to be released even when the vma isn't going to expand.

So, let's consider a case when the next VMA is not being removed (so
it was neither removed nor locked by munmap_sidetree()) and it is
found by lock_vma_under_rcu() in the page fault handling path. Page
fault handler can now expand it and push into the area we are
unmapping in unmap_region(). That is the race I'm trying to prevent
here by locking the next/prev VMAs which can be expanded before
unmap_region() unmaps them. Am I missing something?

> > -		else if (prev && (prev->vm_flags & VM_GROWSUP))
> > +		} else if (prev && (prev->vm_flags & VM_GROWSUP)) {
> > +			vma_start_write(prev);
> >  			downgrade = false;
> > -		else
> > +		} else
> >  			mmap_write_downgrade(mm);
> >  	}
> >
> > --
> > 2.39.1
* Suren Baghdasaryan <surenb@google.com> [230216 14:36]:
> On Thu, Feb 16, 2023 at 7:34 AM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
> >
> > First, sorry I didn't see this before v3..
>
> Feedback at any time is highly appreciated!
>
> > * Suren Baghdasaryan <surenb@google.com> [230216 00:18]:
[..]
> > If the mmap write lock is insufficient to protect us from next/prev
> > modifications then we need to move *most* of this block above the maple
> > tree write operation, otherwise we have a race here. When I say most, I
> > mean everything besides the call to mmap_write_downgrade() needs to be
> > moved.
>
> Which prior maple tree write operation are you referring to? I see
> __split_vma() and munmap_sidetree() which both already lock the VMAs
> they operate on, so page faults can't happen in those VMAs.

The write that removes the VMAs from the maple tree a few lines above..
/* Point of no return */

If the mmap lock is not sufficient, then we need to move the
vma_start_write() of prev/next to above the call to
vma_iter_clear_gfp() in do_vmi_align_munmap().

But I still think it IS enough.

> > If the mmap write lock is sufficient to protect us from next/prev
> > modifications then we don't need to write lock the vmas themselves.
>
> mmap write lock is not sufficient because with per-VMA locks we do not
> take mmap lock at all.

Understood, but it also does not expand VMAs.

> > I believe this is for expand_stack() protection, so I believe it's okay
> > to not vma write lock these vmas.. I don't think there are other areas
> > where we can modify the vmas without holding the mmap lock, but others
> > on the CC list please chime in if I've forgotten something.
> >
> > So, if I am correct, then you shouldn't lock next/prev and allow the
> > vma locking fault method on these vmas. This will work because
> > lock_vma_under_rcu() uses mas_walk() on the faulting address. That is,
> > your lock_vma_under_rcu() will fail to find anything that needs to be
> > grown and go back to mmap lock protection. As it is written today, the
> > vma locking fault handler will fail and we will wait for the mmap lock
> > to be released even when the vma isn't going to expand.
>
> So, let's consider a case when the next VMA is not being removed (so
> it was neither removed nor locked by munmap_sidetree()) and it is
> found by lock_vma_under_rcu() in the page fault handling path.

By this point next VMA is either NULL or outside the munmap area, so
what you said here is always true.

> Page
> fault handler can now expand it and push into the area we are
> unmapping in unmap_region(). That is the race I'm trying to prevent
> here by locking the next/prev VMAs which can be expanded before
> unmap_region() unmaps them. Am I missing something?

Yes, I think the part you are missing (or I am missing..) is that
expand_stack() will never be called without the mmap lock. We don't use
the vma locking to expand the stack.

...
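For context, stack expansion in kernels of this era happened only on the mmap-lock fault path, which had roughly the following shape (a condensed illustration in the style of x86's do_user_addr_fault(), not verbatim kernel code):

	mmap_read_lock(mm);
	vma = find_vma(mm, address);
	if (unlikely(!vma))
		goto bad_area;		/* no VMA at or above address */
	if (likely(vma->vm_start <= address))
		goto good_area;		/* fault inside an existing VMA */
	if (unlikely(!(vma->vm_flags & VM_GROWSDOWN)))
		goto bad_area;		/* gap, and next VMA is not a stack */
	if (unlikely(expand_stack(vma, address)))
		goto bad_area;		/* grow the stack VMA, under mmap_lock */
good_area:
	fault = handle_mm_fault(vma, address, flags, regs);

Since do_vmi_align_munmap() keeps the mmap lock held for write whenever next/prev can grow (it sets downgrade = false in exactly those cases), this path blocks until the unmap completes, which is why the extra vma_start_write() turns out to be unnecessary.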
On Fri, Feb 17, 2023 at 6:51 AM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
>
> * Suren Baghdasaryan <surenb@google.com> [230216 14:36]:
> > On Thu, Feb 16, 2023 at 7:34 AM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
[..]
> > So, let's consider a case when the next VMA is not being removed (so
> > it was neither removed nor locked by munmap_sidetree()) and it is
> > found by lock_vma_under_rcu() in the page fault handling path.
>
> By this point next VMA is either NULL or outside the munmap area, so
> what you said here is always true.
>
> > Page
> > fault handler can now expand it and push into the area we are
> > unmapping in unmap_region(). That is the race I'm trying to prevent
> > here by locking the next/prev VMAs which can be expanded before
> > unmap_region() unmaps them. Am I missing something?
>
> Yes, I think the part you are missing (or I am missing..) is that
> expand_stack() will never be called without the mmap lock. We don't use
> the vma locking to expand the stack.

Ah, yes, you are absolutely right. I missed that when the VMA expands
as a result of a page fault, lock_vma_under_rcu() can't find the
faulting VMA (the fault is outside of the area and hence the need to
expand) and will fall back to mmap read locking. Since
do_vmi_align_munmap() holds the mmap write lock and does not downgrade
it, the race will be avoided and expansion will wait until we drop the
mmap write lock. Good catch Liam! We can drop this patch completely
from the series.
Thanks,
Suren.

> ...
diff --git a/mm/mmap.c b/mm/mmap.c
index 118b2246bba9..00f8c5798936 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2399,11 +2399,13 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	 * down_read(mmap_lock) and collide with the VMA we are about to unmap.
 	 */
 	if (downgrade) {
-		if (next && (next->vm_flags & VM_GROWSDOWN))
+		if (next && (next->vm_flags & VM_GROWSDOWN)) {
+			vma_start_write(next);
 			downgrade = false;
-		else if (prev && (prev->vm_flags & VM_GROWSUP))
+		} else if (prev && (prev->vm_flags & VM_GROWSUP)) {
+			vma_start_write(prev);
 			downgrade = false;
-		else
+		} else
 			mmap_write_downgrade(mm);
 	}
 