Message ID | 20230813123333.1705833-1-mjguzik@gmail.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b824:0:b0:3f2:4152:657d with SMTP id z4csp2232780vqi; Sun, 13 Aug 2023 07:37:22 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGwLXKFf0vT3ve3yhEn6uRpt7HPqeZr0xKoIS90x1CQqJx5o9WNDXOln1luPHgKG9T5+M+K X-Received: by 2002:a05:651c:217:b0:2b7:7ab:3c60 with SMTP id y23-20020a05651c021700b002b707ab3c60mr5214161ljn.32.1691937442025; Sun, 13 Aug 2023 07:37:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1691937441; cv=none; d=google.com; s=arc-20160816; b=mwjNzH2OyEoEKUtR/89WkwmxZ/nHUx1p8Ehgo2DOupEtrlP1WiErt7/fmyG701yHso 08L6+ZLtkM1HQUMEYbIDsrG20lhU9JK6TLWYUJxcmTTVqUmnSjvK6/b9lKoj1rWWQm+e tCAtRGVoIRpTHc5xnd8VJQO6S8zMzzygjckPRnwZ0lfVUOGwb2upsflUKY8ugr7C/2Zv c55iPdDuXVvl6cCcpowyRkDbpx2BqBV7VCAxqNxN71tzwryK0HAqWFmqjOEIO4oUlaIr PBt3TRGtBEFyZa1zg0gnR8wZ48uur99TJaRGBbSU1sjC3mwoVa2hTtZmKEAT/ohBwmWt EdUA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=tSjhW1uG1o6nQbq3sjFtkQd0RVg/lRnd6Jfl5GatpGQ=; fh=Ai2WA5AMBT6LvM84P+aobZfrkvFKDi6pfkYNni/hf6M=; b=cGmU4ppWyHno6FNUvHUCk/DMNPpWoiN0veQa9GunLL7o4dNT3iykaUhjlKnzxWCMy4 2ybQ2TBvUqmt96m4TcTj9CQ1iwA8gk1+ZtvH5aut6wYgjPwm83pVowhm3y3uBil+0Cpt IV8XV94klU5KJZ0eVK8hR0m41t5JlYlwCntdCO0oQ7q2dAt5Sh/bX6apORHed0d+b8gK o5+9KCkmUQ7ZQHFbDwRO2MuAGM1Qde+RC7W3mzu3ko5Uht+j6h1rkQtL45Z1shOv5n1e LbQPl4OFgDoJjRTfOFTPotZb6etX8Ml29iZv24djQj4xVdBEpieGfHyO5BheNDD5kSLV tPVw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20221208 header.b=Wd3Fnl8Z; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id cb13-20020a170906a44d00b0099bc91c69fasi7214684ejb.392.2023.08.13.07.36.57; Sun, 13 Aug 2023 07:37:21 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20221208 header.b=Wd3Fnl8Z; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230245AbjHMMdp (ORCPT <rfc822;274620705z@gmail.com> + 99 others); Sun, 13 Aug 2023 08:33:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60290 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229478AbjHMMdo (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Sun, 13 Aug 2023 08:33:44 -0400 Received: from mail-lf1-x12a.google.com (mail-lf1-x12a.google.com [IPv6:2a00:1450:4864:20::12a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 14AC5170A for <linux-kernel@vger.kernel.org>; Sun, 13 Aug 2023 05:33:45 -0700 (PDT) Received: by mail-lf1-x12a.google.com with SMTP id 2adb3069b0e04-4fe934c4decso4378309e87.1 for <linux-kernel@vger.kernel.org>; Sun, 13 Aug 2023 05:33:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1691930023; x=1692534823; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=tSjhW1uG1o6nQbq3sjFtkQd0RVg/lRnd6Jfl5GatpGQ=; b=Wd3Fnl8ZMJI7vU8qFAWKyJs3Roeecc/2RWh8MqNKinKmhL82lginkhzba9Qa6YUH0I /Eo/J91R4cc14OKedBRGU/CcCb1c2BRSc3UL3I4Cvg6wem5hi+INWpXQlHWhG0/jJUJ5 n47zE1TYWiBihbm7xk93gcKernCHwG5dLaO2yRE5lDsvkAmXKC0w1xr4KBhh3Dxi0LL5 xA9TPowVG/eluGLSukW/xW6quFk4WZF5CtiQONwnN5z6QA7/akecF6GHrYtg8NxHAOjt wl6wtW2BNEqGGQc38Mm7ZPiYOjeqDTLQ3XQrSmVycp5KLGxPVjFUZzC5Kg7k8BdlKSMI CX+w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691930023; x=1692534823; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=tSjhW1uG1o6nQbq3sjFtkQd0RVg/lRnd6Jfl5GatpGQ=; b=G8/H8eO+6FOltcmxVHae0sGUN7EtlwYze6LQNchl4y4I58hjnJ4BESWqUpf2u7b1TR e8UhRDLSDFeMEMXnYU0uQBotimlsoh1VBwUsYVVnDc2UkFn2gN2lPwc+AFbNlr6RXXl5 kKyqfqT1qhhlcCs1O8cuO7DpRVtvM7GezULh0+ac9rBCAAqrj/hBlYDb8dKdVrrYLBQ8 YbD5MhSJ525nB388NRGQUJznQAtuU0dp2iJTtj+RjWTacSsUHPDNaulkeJ7jUEKGkvEI HLGgXamMUOFrYS+r+N2/8FdgCkxNdj1Bow8JGA8lRHc8Cdrl38rS2O/6wCQKXFJ7vwWl CaKA== X-Gm-Message-State: AOJu0YxIXemrIexyi9MRR+LcD53QnQSQNYnOYAEMUW2+sAmbwka9ijV/ 5Uy1MNk7aWYroAqpj6xceO6pWX6/z4bRuQ== X-Received: by 2002:a05:6512:10d4:b0:4f8:7513:8cb0 with SMTP id k20-20020a05651210d400b004f875138cb0mr4928430lfg.2.1691930022946; Sun, 13 Aug 2023 05:33:42 -0700 (PDT) Received: from f.. (cst-prg-75-195.cust.vodafone.cz. [46.135.75.195]) by smtp.gmail.com with ESMTPSA id z24-20020aa7cf98000000b0052341df84d0sm4366470edx.33.2023.08.13.05.33.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 13 Aug 2023 05:33:42 -0700 (PDT) From: Mateusz Guzik <mjguzik@gmail.com> To: linux-kernel@vger.kernel.org Cc: torvalds@linux-foundation.org, brauner@kernel.org, ebiederm@xmission.com, david@redhat.com, akpm@linux-foundation.org, linux-mm@kvack.org, koct9i@gmail.com, oleg@redhat.com, dave@stgolabs.net, Mateusz Guzik <mjguzik@gmail.com> Subject: [PATCH] kernel/fork: stop playing lockless games for exe_file replacement Date: Sun, 13 Aug 2023 14:33:33 +0200 Message-Id: <20230813123333.1705833-1-mjguzik@gmail.com> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1774124995268494836 X-GMAIL-MSGID: 1774124995268494836 |
Series |
kernel/fork: stop playing lockless games for exe_file replacement
|
|
Commit Message
Mateusz Guzik
Aug. 13, 2023, 12:33 p.m. UTC
xchg originated in 6e399cd144d8 ("prctl: avoid using mmap_sem for
exe_file serialization"). While the commit message does not explain
*why* the change, clearly the intent was to use mmap_sem less in this
codepath. I found the original submission [1] which ultimately claims it
cleans things up.
However, replacement itself takes it for reading before doing any work
and other places associated with it also take it.
fe69d560b5bd ("kernel/fork: always deny write access to current MM
exe_file") added another lock trip to synchronize the state of exe_file
against fork, further defeating the point of xchg.
As such I think the atomic here only adds complexity for no benefit.
Just write-lock around the replacement.
I also note that replacement races against the mapping check loop as
nothing synchronizes actual assignment with with said checks but I am
not addressing it in this patch. (Is the loop of any use to begin with?)
Link: https://lore.kernel.org/linux-mm/1424979417.10344.14.camel@stgolabs.net/ [1]
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
kernel/fork.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
Comments
On 13.08.23 14:33, Mateusz Guzik wrote: > xchg originated in 6e399cd144d8 ("prctl: avoid using mmap_sem for > exe_file serialization"). While the commit message does not explain > *why* the change, clearly the intent was to use mmap_sem less in this > codepath. I found the original submission [1] which ultimately claims it > cleans things up. More details are apparently in v1 of that patch: https://lore.kernel.org/lkml/1424979417.10344.14.camel@stgolabs.net/ Regarding your patch: adding more mmap_write_lock() where avoidable, I'm not so sure. Your patch doesn't look (to me) like it is removing a lot of complexity.
On 8/14/23, David Hildenbrand <david@redhat.com> wrote: > On 13.08.23 14:33, Mateusz Guzik wrote: >> xchg originated in 6e399cd144d8 ("prctl: avoid using mmap_sem for >> exe_file serialization"). While the commit message does not explain >> *why* the change, clearly the intent was to use mmap_sem less in this >> codepath. I found the original submission [1] which ultimately claims it >> cleans things up. > > More details are apparently in v1 of that patch: > > https://lore.kernel.org/lkml/1424979417.10344.14.camel@stgolabs.net/ > > Regarding your patch: adding more mmap_write_lock() where avoidable, I'm > not so sure. > But exe_file access is already synchronized with the semaphore and your own commit added a mmap_read_lock/mmap_read_unlock cycle after the xchg in question to accomodate this requirement. Then mmap_write_lock around the replacement is the obvious thing to do. > Your patch doesn't look (to me) like it is removing a lot of complexity. > The code in the current form makes the reader ask what prompts xchg + read-lock instead of mere write-locking. This is not a hot path either and afaics it can only cause contention if userspace is trying to abuse the interface to break the kernel, messing with a processs hard at work (which would be an extra argument to not play games on kernel side). That said, no, it does not remove "a lot of complexity". It does remove some though at no real downside that I can see. But then again, it is on people who insist on xchg to justify it.
On 14.08.23 10:21, Mateusz Guzik wrote: > On 8/14/23, David Hildenbrand <david@redhat.com> wrote: >> On 13.08.23 14:33, Mateusz Guzik wrote: >>> xchg originated in 6e399cd144d8 ("prctl: avoid using mmap_sem for >>> exe_file serialization"). While the commit message does not explain >>> *why* the change, clearly the intent was to use mmap_sem less in this >>> codepath. I found the original submission [1] which ultimately claims it >>> cleans things up. >> >> More details are apparently in v1 of that patch: >> >> https://lore.kernel.org/lkml/1424979417.10344.14.camel@stgolabs.net/ >> >> Regarding your patch: adding more mmap_write_lock() where avoidable, I'm >> not so sure. >> > > But exe_file access is already synchronized with the semaphore and > your own commit added a mmap_read_lock/mmap_read_unlock cycle after > the xchg in question to accomodate this requirement. Yes, we want to handle concurrent fork() ("Don't race with dup_mmap()"), thus mmap_read_lock. > Then mmap_write_lock around the replacement is the obvious thing to do. Apparently to you. :) mmap_write_lock will block more than fork. For example, concurrent page faults (without new VMA locking), for no apparent reason to me. > >> Your patch doesn't look (to me) like it is removing a lot of complexity. >> > > The code in the current form makes the reader ask what prompts xchg + > read-lock instead of mere write-locking. > > This is not a hot path either and afaics it can only cause contention > if userspace is trying to abuse the interface to break the kernel, > messing with a processs hard at work (which would be an extra argument > to not play games on kernel side). > > That said, no, it does not remove "a lot of complexity". It does > remove some though at no real downside that I can see. > > But then again, it is on people who insist on xchg to justify it. Changing it now needs good justification, why we would want to block any concurrent MM activity. Maybe there is good justification. In any case, this commit would have to update the documentation of replace_mm_exe_file, that spells out existing locking behavior.
On 8/14/23, David Hildenbrand <david@redhat.com> wrote: > On 14.08.23 10:21, Mateusz Guzik wrote: >> On 8/14/23, David Hildenbrand <david@redhat.com> wrote: >>> On 13.08.23 14:33, Mateusz Guzik wrote: >>>> xchg originated in 6e399cd144d8 ("prctl: avoid using mmap_sem for >>>> exe_file serialization"). While the commit message does not explain >>>> *why* the change, clearly the intent was to use mmap_sem less in this >>>> codepath. I found the original submission [1] which ultimately claims >>>> it >>>> cleans things up. >>> >>> More details are apparently in v1 of that patch: >>> >>> https://lore.kernel.org/lkml/1424979417.10344.14.camel@stgolabs.net/ >>> >>> Regarding your patch: adding more mmap_write_lock() where avoidable, I'm >>> not so sure. >>> >> >> But exe_file access is already synchronized with the semaphore and >> your own commit added a mmap_read_lock/mmap_read_unlock cycle after >> the xchg in question to accomodate this requirement. > > Yes, we want to handle concurrent fork() ("Don't race with dup_mmap()"), > thus mmap_read_lock. > >> Then mmap_write_lock around the replacement is the obvious thing to do. > > Apparently to you. :) > > mmap_write_lock will block more than fork. For example, concurrent page > faults (without new VMA locking), for no apparent reason to me. > >> >>> Your patch doesn't look (to me) like it is removing a lot of complexity. >>> >> >> The code in the current form makes the reader ask what prompts xchg + >> read-lock instead of mere write-locking. >> >> This is not a hot path either and afaics it can only cause contention >> if userspace is trying to abuse the interface to break the kernel, >> messing with a processs hard at work (which would be an extra argument >> to not play games on kernel side). >> >> That said, no, it does not remove "a lot of complexity". It does >> remove some though at no real downside that I can see. >> >> But then again, it is on people who insist on xchg to justify it. > > Changing it now needs good justification, why we would want to block any > concurrent MM activity. Maybe there is good justification. > > In any case, this commit would have to update the documentation of > replace_mm_exe_file, that spells out existing locking behavior. > Perhaps it will help if I add that the prctl thingy always had a troubled relationship with locking. Last time I looked at it was in 2016, where I found that it was doing down_read to update arg_start/arg_end and others while a consumer in procfs would read them and assert on their sanity. As only a read-lock was held, 2 threads could be used to transiently produce a bogus state as they race with their updates and trigger the BUG. See this commit (but also excuse weirdly bad english ;)) https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ddf1d398e517e660207e2c807f76a90df543a217 Moreover check out the following in prctl_set_auxv: task_lock(current); memcpy(mm->saved_auxv, user_auxv, len); task_unlock(current); any thread in the process can reach that codepath while sharing the same mm, thus this does not lock any updates. Not only that, but a duplicated memcpy onto the area in prctl_set_mm_map does not even take that lock and the code to read this does not take any locks. [Code duplication and synchronization aside, additional points deducted for saved_auxv storing always-NULL pointers instead of adding them on reads.] The above exhausts my willingness to argue about this change, I'm just a passerby. If it is NAKed, I'm dropping the subject. I am willing to do the comment tidy ups if this can go in though, but not before there is consensus.
On 08/13, Mateusz Guzik wrote: > > fe69d560b5bd ("kernel/fork: always deny write access to current MM > exe_file") added another lock trip to synchronize the state of exe_file > against fork, further defeating the point of xchg. > > As such I think the atomic here only adds complexity for no benefit. > > Just write-lock around the replacement. Well, I tend to agree but can't really comment because I forgot everything about these code paths. But I have to admit that I don't understand the code in replace_mm_exe_file() without this patch... old_exe_file = xchg(&mm->exe_file, new_exe_file); if (old_exe_file) { /* * Don't race with dup_mmap() getting the file and disallowing * write access while someone might open the file writable. */ mmap_read_lock(mm); allow_write_access(old_exe_file); fput(old_exe_file); mmap_read_unlock(mm); } Can someone please explain me which exactly race this mmap_read_lock() tries to avoid and how ? Oleg.
On 08/14, Oleg Nesterov wrote: > > On 08/13, Mateusz Guzik wrote: > > > > fe69d560b5bd ("kernel/fork: always deny write access to current MM > > exe_file") added another lock trip to synchronize the state of exe_file > > against fork, further defeating the point of xchg. > > > > As such I think the atomic here only adds complexity for no benefit. > > > > Just write-lock around the replacement. > > Well, I tend to agree but can't really comment because I forgot everything > about these code paths. > > But I have to admit that I don't understand the code in replace_mm_exe_file() > without this patch... > > old_exe_file = xchg(&mm->exe_file, new_exe_file); > if (old_exe_file) { > /* > * Don't race with dup_mmap() getting the file and disallowing > * write access while someone might open the file writable. > */ > mmap_read_lock(mm); > allow_write_access(old_exe_file); > fput(old_exe_file); > mmap_read_unlock(mm); > } > > Can someone please explain me which exactly race this mmap_read_lock() tries > to avoid and how ? OK, I seem to understand... without mmap_read_lock() it is possible that - dup_mm_exe_file() sees mm->exe_file = old_exe_file - replace_mm_exe_file() does allow_write_access(old_exe_file) - another process does get_write_access(old_exe_file) - dup_mm_exe_file()->deny_write_access() fails Right? Or something else? Well to me Mateusz's patch does make this logic more clear ;) Oleg.
On 14.08.23 10:54, Mateusz Guzik wrote: > On 8/14/23, David Hildenbrand <david@redhat.com> wrote: >> On 14.08.23 10:21, Mateusz Guzik wrote: >>> On 8/14/23, David Hildenbrand <david@redhat.com> wrote: >>>> On 13.08.23 14:33, Mateusz Guzik wrote: >>>>> xchg originated in 6e399cd144d8 ("prctl: avoid using mmap_sem for >>>>> exe_file serialization"). While the commit message does not explain >>>>> *why* the change, clearly the intent was to use mmap_sem less in this >>>>> codepath. I found the original submission [1] which ultimately claims >>>>> it >>>>> cleans things up. >>>> >>>> More details are apparently in v1 of that patch: >>>> >>>> https://lore.kernel.org/lkml/1424979417.10344.14.camel@stgolabs.net/ >>>> >>>> Regarding your patch: adding more mmap_write_lock() where avoidable, I'm >>>> not so sure. >>>> >>> >>> But exe_file access is already synchronized with the semaphore and >>> your own commit added a mmap_read_lock/mmap_read_unlock cycle after >>> the xchg in question to accomodate this requirement. >> >> Yes, we want to handle concurrent fork() ("Don't race with dup_mmap()"), >> thus mmap_read_lock. >> >>> Then mmap_write_lock around the replacement is the obvious thing to do. >> >> Apparently to you. :) >> >> mmap_write_lock will block more than fork. For example, concurrent page >> faults (without new VMA locking), for no apparent reason to me. >> >>> >>>> Your patch doesn't look (to me) like it is removing a lot of complexity. >>>> >>> >>> The code in the current form makes the reader ask what prompts xchg + >>> read-lock instead of mere write-locking. >>> >>> This is not a hot path either and afaics it can only cause contention >>> if userspace is trying to abuse the interface to break the kernel, >>> messing with a processs hard at work (which would be an extra argument >>> to not play games on kernel side). >>> >>> That said, no, it does not remove "a lot of complexity". It does >>> remove some though at no real downside that I can see. >>> >>> But then again, it is on people who insist on xchg to justify it. >> >> Changing it now needs good justification, why we would want to block any >> concurrent MM activity. Maybe there is good justification. >> >> In any case, this commit would have to update the documentation of >> replace_mm_exe_file, that spells out existing locking behavior. >> > > Perhaps it will help if I add that the prctl thingy always had a > troubled relationship with locking. Yes, it's not the first time that I looked at kernel/sys.c and wodnered why some of the toggles don't perform any locking. > > Last time I looked at it was in 2016, where I found that it was doing > down_read to update arg_start/arg_end and others while a consumer in > procfs would read them and assert on their sanity. As only a read-lock > was held, 2 threads could be used to transiently produce a bogus state > as they race with their updates and trigger the BUG. See this commit > (but also excuse weirdly bad english ;)) > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ddf1d398e517e660207e2c807f76a90df543a217 > > Moreover check out the following in prctl_set_auxv: > > task_lock(current); > memcpy(mm->saved_auxv, user_auxv, len); > task_unlock(current); > > any thread in the process can reach that codepath while sharing the > same mm, thus this does not lock any updates. Not only that, but a > duplicated memcpy onto the area in prctl_set_mm_map does not even take > that lock and the code to read this does not take any locks. > > [Code duplication and synchronization aside, additional points > deducted for saved_auxv storing always-NULL pointers instead of adding > them on reads.] > > The above exhausts my willingness to argue about this change, I'm just > a passerby. If it is NAKed, I'm dropping the subject. As long as nobody cares about concurrent MM activity being restricted with your change (I suspect we don't care), we're good.
On 08/14, David Hildenbrand wrote: > > >OK, I seem to understand... without mmap_read_lock() it is possible that > > > > - dup_mm_exe_file() sees mm->exe_file = old_exe_file > > > > - replace_mm_exe_file() does allow_write_access(old_exe_file) > > > > - another process does get_write_access(old_exe_file) > > > > - dup_mm_exe_file()->deny_write_access() fails > > > >Right? > > From what I recall, yes. Thanks! but then... David, this all is subjective, feel free to ignore, but the current code doesn't look good to me, I mean the purpose of mmap_read_lock() is very unclear. To me something like if (old_exe_file) { /* * Ensure that if we race with dup_mm_exe_file() and it sees * mm->exe_file == old_exe_file deny_write_access(old_exe_file) * can't fail after we do allow_write_access() and another task * does get_write_access(old_exe_file). */ mmap_read_lock(mm); mmap_read_unlock(mm); allow_write_access(old_exe_file); fput(old_exe_file); } looks more understandable... But this patch from Mateusz looks even better to me. So, FWIW, Acked-by: Oleg Nesterov <oleg@redhat.com>
On 14.08.23 17:58, Oleg Nesterov wrote: > On 08/14, David Hildenbrand wrote: >> >>> OK, I seem to understand... without mmap_read_lock() it is possible that >>> >>> - dup_mm_exe_file() sees mm->exe_file = old_exe_file >>> >>> - replace_mm_exe_file() does allow_write_access(old_exe_file) >>> >>> - another process does get_write_access(old_exe_file) >>> >>> - dup_mm_exe_file()->deny_write_access() fails >>> >>> Right? >> >> From what I recall, yes. > > Thanks! but then... David, this all is subjective, feel free to ignore, but > the current code doesn't look good to me, I mean the purpose of mmap_read_lock() > is very unclear. To me something like > > if (old_exe_file) { > /* > * Ensure that if we race with dup_mm_exe_file() and it sees > * mm->exe_file == old_exe_file deny_write_access(old_exe_file) > * can't fail after we do allow_write_access() and another task > * does get_write_access(old_exe_file). > */ > mmap_read_lock(mm); > mmap_read_unlock(mm); > > allow_write_access(old_exe_file); > fput(old_exe_file); > } > > looks more understandable... I don't particularly care about that code, and if there are ways to make it clearer, great. As long as we can clarify in the patch description why we decided to go again the other direction (write lock) and not do what we did in 2015, that would be great.
diff --git a/kernel/fork.c b/kernel/fork.c index d2e12b6d2b18..f576ce341e43 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1464,22 +1464,20 @@ int replace_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file) return ret; } - /* set the new file, lockless */ ret = deny_write_access(new_exe_file); if (ret) return -EACCES; get_file(new_exe_file); - old_exe_file = xchg(&mm->exe_file, new_exe_file); + /* set the new file */ + mmap_write_lock(mm); + old_exe_file = rcu_dereference_raw(mm->exe_file); + rcu_assign_pointer(mm->exe_file, new_exe_file); + mmap_write_unlock(mm); + if (old_exe_file) { - /* - * Don't race with dup_mmap() getting the file and disallowing - * write access while someone might open the file writable. - */ - mmap_read_lock(mm); allow_write_access(old_exe_file); fput(old_exe_file); - mmap_read_unlock(mm); } return 0; }