Message ID: 20230227173632.3292573-19-surenb@google.com
State: New
Headers:
Subject: [PATCH v4 18/33] mm: write-lock VMAs before removing them from VMA tree
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Date: Mon, 27 Feb 2023 09:36:17 -0800
Message-ID: <20230227173632.3292573-19-surenb@google.com>
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
Series: Per-VMA locks
Commit Message
Suren Baghdasaryan
Feb. 27, 2023, 5:36 p.m. UTC
Write-locking VMAs before isolating them ensures that page fault
handlers don't operate on isolated VMAs.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
mm/mmap.c | 1 +
mm/nommu.c | 5 +++++
2 files changed, 6 insertions(+)
Comments
On Mon, Feb 27, 2023 at 09:36:17AM -0800, Suren Baghdasaryan wrote:
> Write-locking VMAs before isolating them ensures that page fault
> handlers don't operate on isolated VMAs.
>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
> @@ -2255,6 +2255,7 @@ int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
>  static inline int munmap_sidetree(struct vm_area_struct *vma,
>  		struct ma_state *mas_detach)
>  {
> +	vma_start_write(vma);
>  	mas_set_range(mas_detach, vma->vm_start, vma->vm_end - 1);

I may be missing something, but I have a few questions:

1) Why does a writer need to both write-lock a VMA and mark the VMA detached
when unmapping it? Isn't it enough to only write-lock the VMA?

2) VMAs that are going to be removed are already locked in vma_prepare(),
so I think this hunk could be dropped?

> @@ -588,6 +588,7 @@ static int delete_vma_from_mm(struct vm_area_struct *vma)
>  			current->pid);
>  		return -ENOMEM;
>  	}
> +	vma_start_write(vma);
>  	cleanup_vma_from_mm(vma);

3) I think this hunk could be dropped, as the per-VMA lock depends on MMU anyway.

Thanks,
Hyeonggon
On Wed, Mar 01, 2023 at 07:43:33AM +0000, Hyeonggon Yoo wrote:
> 2) VMAs that are going to be removed are already locked in vma_prepare(),
> so I think this hunk could be dropped?

After sending this I realized that I did not consider the simple munmap case :)
But I still think 1) and 3) are valid questions.

Thanks,
Hyeonggon
On Tue, Feb 28, 2023 at 11:57 PM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
> > 1) Why does a writer need to both write-lock a VMA and mark the VMA detached
> > when unmapping it? Isn't it enough to only write-lock the VMA?

We need to mark the VMA detached to avoid handling a page fault in a
detached VMA. The possible scenario is:

	lock_vma_under_rcu
	  vma = mas_walk(&mas)
				munmap_sidetree
				  vma_start_write(vma)
				  mas_store_gfp() // remove VMA from the tree
				  vma_end_write_all()
	  vma_start_read(vma)
	  // we locked the VMA but it is not part of the tree anymore.

So, marking the VMA locked before vma_end_write_all() and checking
vma->detached after vma_start_read() helps us avoid handling faults in
the detached VMA.

> > 3) I think this hunk could be dropped, as the per-VMA lock depends on MMU anyway.

Ah, yes, you are right. We can safely remove the changes in nommu.c.
Andrew, should I post a fixup or can you make the removal directly in
mm-unstable?
Thanks,
Suren.
On Wed, Mar 1, 2023 at 10:34 AM Suren Baghdasaryan <surenb@google.com> wrote:
> So, marking the VMA locked before vma_end_write_all() and checking
> vma->detached after vma_start_read() helps us avoid handling faults in
> the detached VMA.

Sorry, I should have said "marking the VMA *detached* before
vma_end_write_all() and checking vma->detached after vma_start_read()
helps us avoid handling faults in the detached VMA."
On Wed, Mar 1, 2023 at 10:34 AM Suren Baghdasaryan <surenb@google.com> wrote:
> Ah, yes, you are right. We can safely remove the changes in nommu.c.
> Andrew, should I post a fixup or can you make the removal directly in
> mm-unstable?

I went ahead and posted the fixup for this at:
https://lore.kernel.org/all/20230301190457.1498985-1-surenb@google.com/

Thanks,
Suren.
On Wed, Mar 01, 2023 at 10:42:48AM -0800, Suren Baghdasaryan wrote:
> Sorry, I should have said "marking the VMA *detached* before
> vma_end_write_all() and checking vma->detached after vma_start_read()
> helps us avoid handling faults in the detached VMA."

Thank you for the explanation, that makes sense!

By the way, if there are no 32-bit users of the per-VMA lock (are there?),
the "detached" bool could be a VMA flag (i.e. making it depend on 64BIT
and selecting ARCH_USES_HIGH_VMA_FLAGS).

Thanks,
Hyeonggon
On Wed, Mar 1, 2023 at 4:54 PM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
> By the way, if there are no 32-bit users of the per-VMA lock (are there?),
> the "detached" bool could be a VMA flag (i.e. making it depend on 64BIT
> and selecting ARCH_USES_HIGH_VMA_FLAGS).

Yeah, I thought about it but didn't want to make assumptions about
potential users just yet. Besides, I heard there are attempts to make
vm_flags always 64-bit (I think Matthew mentioned that to me once). If
that happens, we won't need any dependencies here. Either way, this
conversion into a flag can be done as an additional optimization later
on. I prefer to keep the main patchset as simple as possible for now.
Thanks,
Suren.
diff --git a/mm/mmap.c b/mm/mmap.c
index 1f42b9a52b9b..f7ed357056c4 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2255,6 +2255,7 @@ int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
 static inline int munmap_sidetree(struct vm_area_struct *vma,
 		struct ma_state *mas_detach)
 {
+	vma_start_write(vma);
 	mas_set_range(mas_detach, vma->vm_start, vma->vm_end - 1);
 	if (mas_store_gfp(mas_detach, vma, GFP_KERNEL))
 		return -ENOMEM;
diff --git a/mm/nommu.c b/mm/nommu.c
index 57ba243c6a37..2ab162d773e2 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -588,6 +588,7 @@ static int delete_vma_from_mm(struct vm_area_struct *vma)
 			current->pid);
 		return -ENOMEM;
 	}
+	vma_start_write(vma);
 	cleanup_vma_from_mm(vma);
 
 	/* remove from the MM's tree and list */
@@ -1519,6 +1520,10 @@ void exit_mmap(struct mm_struct *mm)
 	 */
 	mmap_write_lock(mm);
 	for_each_vma(vmi, vma) {
+		/*
+		 * No need to lock VMA because this is the only mm user and no
+		 * page fault handler can race with it.
+		 */
 		cleanup_vma_from_mm(vma);
 		delete_vma(mm, vma);
 		cond_resched();