Message ID | 20230807085203.819772-1-david@readahead.eu |
---|---|
State | New |
Headers |
From: David Rheinsberg <david@readahead.eu> To: linux-kernel@vger.kernel.org Cc: Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>, Kees Cook <keescook@chromium.org>, Alexander Mikhalitsyn <alexander@mihalicyn.com>, Luca Boccassi <bluca@debian.org>, David Rheinsberg <david@readahead.eu> Subject: [PATCH] pid: allow pidfds for reaped tasks Date: Mon, 7 Aug 2023 10:52:03 +0200 Message-ID: <20230807085203.819772-1-david@readahead.eu> X-Mailer: git-send-email 2.41.0 List-ID: <linux-kernel.vger.kernel.org> |
Series |
pid: allow pidfds for reaped tasks
|
|
Commit Message
David Rheinsberg
Aug. 7, 2023, 8:52 a.m. UTC
A pidfd can currently only be created for tasks that are thread-group
leaders and not reaped. This patch changes the pidfd-core to allow for
pidfds on reaped thread-group leaders as well.
A pidfd can outlive the task it refers to, and thus user-space must
already be prepared that the task underlying a pidfd is gone at the time
they get their hands on the pidfd. For instance, resolving the pidfd to
a PID via the fdinfo must be prepared to read `-1`.
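
As an illustration of the fdinfo case above, here is a minimal user-space sketch that extracts the `Pid:` field from fdinfo-style text and treats `-1` as "the task is gone". The helper name `fdinfo_pid()` and the `-2` "field missing" return value are hypothetical choices for this example, not an existing API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the value of the "Pid:" field found in fdinfo-style text.
 * -1 is what the kernel reports once the task behind the pidfd is gone.
 * -2 is a hypothetical marker used here for "no Pid: field found".
 */
static int fdinfo_pid(const char *text)
{
	const char *line = text;
	int pid;

	assert(text != NULL);

	while (line && *line) {
		/* Only a line that starts with "Pid:" matches this format. */
		if (sscanf(line, "Pid: %d", &pid) == 1)
			return pid;

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}

	return -2;
}
```

In practice the text would be read from `/proc/self/fdinfo/<pidfd>`; the caller then has to handle `-1` exactly like any other "process already gone" condition.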
Despite user-space knowing that a pidfd might be stale, several kernel
APIs currently add another layer that checks for this. In particular,
SO_PEERPIDFD returns `EINVAL` if the peer-task was already reaped,
but returns a stale pidfd if the task is reaped immediately after the
respective alive-check.
This has the unfortunate effect that user-space now has two ways to
check for the exact same scenario: A syscall might return
EINVAL/ESRCH/... *or* the pidfd might be stale, even though there is no
particular reason to distinguish both cases. This also propagates
through user-space APIs, which pass on pidfds. They must be prepared to
pass on `-1` *or* the pidfd, because there is no guaranteed way to get a
stale pidfd from the kernel.
This patch changes the core pidfd helpers to allow creation of pidfds
even if the PID is no longer linked to any task. This only affects one
of the three pidfd users that currently exist:
1) fanotify already tests for a linked TGID-task manually before
creating the PIDFD, thus it is not directly affected by this change.
However, note that the current fanotify code fails with an error if
the target process is reaped exactly between the TGID-check in
fanotify and the test in pidfd_prepare(). With this patch, this
will no longer be the case.
2) pidfd_open(2) calls find_get_pid() before creating the pidfd, thus
it is also not directly affected by this change.
Again, similar to fanotify, there is a race between the
find_get_pid() call and pidfd_prepare(), which currently causes
pidfd_open(2) to return EINVAL rather than ESRCH if the process is
reaped just between those two checks. With this patch, this will no
longer be the case.
3) SO_PEERPIDFD will be affected by this change and from now on return
stale pidfds rather than EINVAL if the respective peer task is
reaped already.
Given that users of SO_PEERPIDFD must already deal with stale pidfds,
this change hopefully simplifies the API of SO_PEERPIDFD, and all
dependent user-space APIs (e.g., GetConnectionCredentials() on D-Bus
driver APIs). Also note that SO_PEERPIDFD is still pending to be
released with linux-6.5.
Signed-off-by: David Rheinsberg <david@readahead.eu>
---
kernel/fork.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
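
The "two ways to check for the same scenario" described in the commit message can be sketched from the user-space side as follows. This is only a sketch under stated assumptions: the fallback value 77 for `SO_PEERPIDFD` is the asm-generic one (some architectures differ), the helper `peer_state()` and its enum are hypothetical, and every `EINVAL` from `getsockopt()` is treated as "peer already reaped", which ignores other possible causes of `EINVAL`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SO_PEERPIDFD
#define SO_PEERPIDFD 77 /* assumption: asm-generic value */
#endif
#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424
#endif

enum peer_state { PEER_ALIVE, PEER_GONE, PEER_ERROR };

/*
 * Query the peer of an AF_UNIX socket and fold both "peer is gone"
 * paths into one result: the EINVAL error path and the stale-pidfd path.
 * On success *pidfd_out holds a pidfd the caller must close().
 */
static enum peer_state peer_state(int sock, int *pidfd_out)
{
	int pidfd = -1;
	socklen_t len = sizeof(pidfd);

	assert(pidfd_out != NULL);
	*pidfd_out = -1;

	/* Path 1: the kernel refuses to create a pidfd at all. */
	if (getsockopt(sock, SOL_SOCKET, SO_PEERPIDFD, &pidfd, &len) < 0)
		return errno == EINVAL ? PEER_GONE : PEER_ERROR;

	*pidfd_out = pidfd;

	/* Path 2: a pidfd was returned but may already be stale. */
	if (syscall(SYS_pidfd_send_signal, pidfd, 0, NULL, 0) < 0 &&
	    errno == ESRCH)
		return PEER_GONE;

	return PEER_ALIVE;
}
```

With the change proposed here, only the second path would remain for a reaped peer, so a caller could drop the special handling of `EINVAL`.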
Comments
On Mon, Aug 7, 2023 at 10:52 AM David Rheinsberg <david@readahead.eu> wrote:
> [...]
> @@ -2180,7 +2180,14 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
>   */
>  int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
>  {
> -	if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
> +	if (!pid)
> +		return -EINVAL;
> +
> +	/*
> +	 * Non thread-group leaders cannot have pidfds, but we allow them for
> +	 * reaped thread-group leaders.
> +	 */
> +	if (pid_has_task(pid, PIDTYPE_PID) && !pid_has_task(pid, PIDTYPE_TGID))
>  		return -EINVAL;

Hi David!

As far as I understand, __unhash_process is always called with a
tasklist_lock held for writing.
Don't we need to take tasklist_lock for reading here to guarantee
consistency between
pid_has_task(pid, PIDTYPE_PID) and pid_has_task(pid, PIDTYPE_TGID)
return values?

Kind regards,
Alex

>
>  	return __pidfd_prepare(pid, flags, ret);
> --
> 2.41.0
>

Hi

On Mon, Aug 7, 2023, at 11:01 AM, Alexander Mikhalitsyn wrote:
> On Mon, Aug 7, 2023 at 10:52 AM David Rheinsberg <david@readahead.eu> wrote:
[...]
>>  int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
>>  {
>> -	if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
>> +	if (!pid)
>> +		return -EINVAL;
>> +
>> +	/*
>> +	 * Non thread-group leaders cannot have pidfds, but we allow them for
>> +	 * reaped thread-group leaders.
>> +	 */
>> +	if (pid_has_task(pid, PIDTYPE_PID) && !pid_has_task(pid, PIDTYPE_TGID))
>>  		return -EINVAL;
>
> Hi David!
>
> As far as I understand, __unhash_process is always called with a
> tasklist_lock held for writing.
> Don't we need to take tasklist_lock for reading here to guarantee
> consistency between
> pid_has_task(pid, PIDTYPE_PID) and pid_has_task(pid, PIDTYPE_TGID)
> return values?

You mean PIDTYPE_TGID being cleared before PIDTYPE_PID (at least from the perspective of the unlocked reader)? I don't think it is a compatibility issue, because the same issue existed before the patch. But it might indeed be required to avoid spurious EINVAL _while_ the target process is reaped.

It would be unfortunate if we need that. Because it is really not required for AF_UNIX or fanotify (they guarantee that they always deal with TGIDs). So maybe the correct call is to just drop pidfd_prepare() and always use __pidfd_prepare()? So far the safety-measures of pidfd_prepare() introduced two races I already mentioned in the commit-message. So maybe it is just better to document that the caller of __pidfd_prepare() needs to ensure the source is/was a TGID?

Thanks
David

On Mon, Aug 7, 2023 at 11:12 AM David Rheinsberg <david@readahead.eu> wrote:
>
> Hi
>
> On Mon, Aug 7, 2023, at 11:01 AM, Alexander Mikhalitsyn wrote:
> > On Mon, Aug 7, 2023 at 10:52 AM David Rheinsberg <david@readahead.eu> wrote:
> [...]
> >>  int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
> >>  {
> >> -	if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
> >> +	if (!pid)
> >> +		return -EINVAL;
> >> +
> >> +	/*
> >> +	 * Non thread-group leaders cannot have pidfds, but we allow them for
> >> +	 * reaped thread-group leaders.
> >> +	 */
> >> +	if (pid_has_task(pid, PIDTYPE_PID) && !pid_has_task(pid, PIDTYPE_TGID))
> >>  		return -EINVAL;
> >
> > Hi David!
> >
> > As far as I understand, __unhash_process is always called with a
> > tasklist_lock held for writing.
> > Don't we need to take tasklist_lock for reading here to guarantee
> > consistency between
> > pid_has_task(pid, PIDTYPE_PID) and pid_has_task(pid, PIDTYPE_TGID)
> > return values?
>
> You mean PIDTYPE_TGID being cleared before PIDTYPE_PID (at least from the perspective of the unlocked reader)? I don't think it is a compatibility issue, because the same issue existed before the patch. But it might indeed be required to avoid spurious EINVAL _while_ the target process is reaped.

Yes, that was my thought. At the same time we can see that the
__unhash_process() function first detaches PIDTYPE_PID and then PIDTYPE_TGID.
But without any kind of memory barrier (and locks are also
implicit memory barriers)
we can't be sure that an inconsistency won't happen here.

>
> It would be unfortunate if we need that. Because it is really not required for AF_UNIX or fanotify (they guarantee that they always deal with TGIDs). So maybe the correct call is to just drop pidfd_prepare() and always use __pidfd_prepare()? So far the safety-measures of pidfd_prepare() introduced two races I already mentioned in the commit-message. So maybe it is just better to document that the caller of __pidfd_prepare() needs to ensure the source is/was a TGID?

Do you think that taking read_lock(&tasklist_lock) can cause any
issues with contention on it?
IMHO, read_lock should be safe as we are taking it for a short period of time.

But anyway, I'm not insisting on that. I just wanted to point this
out to discuss with you and folks.

Kind regards,
Alex

>
> Thanks
> David

Hey Oleg,

A question for you below.

On Mon, Aug 07, 2023 at 10:52:03AM +0200, David Rheinsberg wrote:
> [...]
> @@ -2180,7 +2180,14 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
>   */
>  int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
>  {
> -	if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
> +	if (!pid)
> +		return -EINVAL;
> +
> +	/*
> +	 * Non thread-group leaders cannot have pidfds, but we allow them for
> +	 * reaped thread-group leaders.
> +	 */
> +	if (pid_has_task(pid, PIDTYPE_PID) && !pid_has_task(pid, PIDTYPE_TGID))
>  		return -EINVAL;

TL;DR userspace wants to be able to get a pidfd to an already reaped
thread-group leader. I don't see any issues with this.

But I'm not entirely clear how to make it safe so that we can
distinguish between @pid not being used as a thread-group leader and
PIDTYPE_TGID having already been detached from @pid. IOW, we need a
snapshot of PIDTYPE_PID and PIDTYPE_TGID so that we can compare the
returned tasks (or another way to achieve a similar result).

Any thoughts?

Hi Christian,

Sorry for the delay, I've just returned from vacation and I am slowly
crawling my email backlog.

On 08/07, Christian Brauner wrote:
>
> >  int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
> >  {
> > -	if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
> > +	if (!pid)
> > +		return -EINVAL;
> > +
> > +	/*
> > +	 * Non thread-group leaders cannot have pidfds, but we allow them for
> > +	 * reaped thread-group leaders.
> > +	 */
> > +	if (pid_has_task(pid, PIDTYPE_PID) && !pid_has_task(pid, PIDTYPE_TGID))
> >  		return -EINVAL;
>
> TL;DR userspace wants to be able to get a pidfd to an already reaped
> thread-group leader. I don't see any issues with this.

I guess I need to read the whole thread carefully, but right now
I don't understand this patch and the problem...

OK, suppose we have a group leader L with pid 100 and its sub-thread
T with pid 101.

With this patch pidfd_open(101) can succeed if T exits right after
find_get_pid(101) because pid_has_task(pid, PIDTYPE_PID) above will
fail, right?

This looks wrong, 101 was never a leader pid...

Oleg.

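
Oleg's scenario can be checked against a small user-space model of the two conditions. This is only a sketch: `struct pid_model` and the helper names are hypothetical stand-ins for the two `pid_has_task()` answers, not kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for what pid_has_task() reports for a struct pid. */
struct pid_model {
	bool has_pid_task;	/* pid_has_task(pid, PIDTYPE_PID)  */
	bool has_tgid_task;	/* pid_has_task(pid, PIDTYPE_TGID) */
};

/* Check before the patch: a thread-group leader must still be attached. */
static bool old_check_allows(struct pid_model p)
{
	return p.has_tgid_task;
}

/* Check in the patch: reject only an attached task that is not a leader. */
static bool new_check_allows(struct pid_model p)
{
	return !(p.has_pid_task && !p.has_tgid_task);
}

/* The four states discussed in the thread. */
static const struct pid_model live_leader   = { true,  true  };
static const struct pid_model live_thread   = { true,  false };
static const struct pid_model reaped_leader = { false, false };
static const struct pid_model exited_thread = { false, false };
```

In this model `reaped_leader` and `exited_thread` are the same value, so any check built only on these two answers that accepts the former also accepts the latter. That is the objection raised above.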
On Fri, Aug 11, 2023 at 01:29:11PM +0200, Oleg Nesterov wrote:
> Hi Christian,
>
> Sorry for the delay, I've just returned from vacation and I am slowly

Absolutely no problem! Thanks for getting back to us.

> crawling my email backlog.
>
> On 08/07, Christian Brauner wrote:
> >
> > >  int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
> > >  {
> > > -	if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
> > > +	if (!pid)
> > > +		return -EINVAL;
> > > +
> > > +	/*
> > > +	 * Non thread-group leaders cannot have pidfds, but we allow them for
> > > +	 * reaped thread-group leaders.
> > > +	 */
> > > +	if (pid_has_task(pid, PIDTYPE_PID) && !pid_has_task(pid, PIDTYPE_TGID))
> > >  		return -EINVAL;
> >
> > TL;DR userspace wants to be able to get a pidfd to an already reaped
> > thread-group leader. I don't see any issues with this.
>
> I guess I need to read the whole thread carefully, but right now
> I don't understand this patch and the problem...
>
> OK, suppose we have a group leader L with pid 100 and its sub-thread
> T with pid 101.
>
> With this patch pidfd_open(101) can succeed if T exits right after
> find_get_pid(101) because pid_has_task(pid, PIDTYPE_PID) above will
> fail, right?
>
> This looks wrong, 101 was never a leader pid...

Well, let me simplify the question:

What code do we need to allow userspace to open a pidfd to a leader pid
even if it has already been exited and reaped (without also accidentally
allowing to open non-lead pid pidfds)?

I hope that clarifies?

As I said, I am not sure I understand the problem. And I know nothing
about net/ but...

On 08/07, Christian Brauner wrote:
>
> > SO_PEERPIDFD returns `EINVAL` if the peer-task was already reaped,
> > but returns a stale pidfd if the task is reaped immediately after the
> > respective alive-check.

After a quick grep it seems that SO_PEERPIDFD can simply use
__pidfd_prepare()? We know that sk->sk_peer_pid was initialized with
task_tgid(current) and thus we know it is (was) a valid TGID pid.

The same is probably true for scm->pid and scm_pidfd_recv()...

But again, I am not familiar with this code, I can be wrong.

Oleg.

On 08/11, Christian Brauner wrote:
>
> > > >  int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
> > > >  {
> > > > -	if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
> > > > +	if (!pid)
> > > > +		return -EINVAL;
> > > > +
> > > > +	/*
> > > > +	 * Non thread-group leaders cannot have pidfds, but we allow them for
> > > > +	 * reaped thread-group leaders.
> > > > +	 */
> > > > +	if (pid_has_task(pid, PIDTYPE_PID) && !pid_has_task(pid, PIDTYPE_TGID))
> > > >  		return -EINVAL;
> > >
> > > TL;DR userspace wants to be able to get a pidfd to an already reaped
> > > thread-group leader. I don't see any issues with this.
> >
> > I guess I need to read the whole thread carefully, but right now
> > I don't understand this patch and the problem...
> >
> > OK, suppose we have a group leader L with pid 100 and its sub-thread
> > T with pid 101.
> >
> > With this patch pidfd_open(101) can succeed if T exits right after
> > find_get_pid(101) because pid_has_task(pid, PIDTYPE_PID) above will
> > fail, right?
> >
> > This looks wrong, 101 was never a leader pid...
>
> Well, let me simplify the question:

Thanks,

> What code do we need to allow userspace to open a pidfd to a leader pid
> even if it has already been exited and reaped (without also accidentally
> allowing to open non-lead pid pidfds)?

I'll try to think more, but can you also explain why we need this?

See my other email. Can't we simply shift the pid_has_task(PIDTYPE_TGID)
check from pidfd_prepare() to pidfd_create()? (And then we can kill
pidfd_prepare and rename __pidfd_prepare to pidfd_prepare.)

Oleg.

Hi Oleg,

On Fri, Aug 11, 2023, at 1:57 PM, Oleg Nesterov wrote:
>> What code do we need to allow userspace to open a pidfd to a leader pid
>> even if it has already been exited and reaped (without also accidentally
>> allowing to open non-lead pid pidfds)?
>
> I'll try to think more, but can you also explain why we need this?
>
> See my other email. Can't we simply shift the pid_has_task(PIDTYPE_TGID)
> check from pidfd_prepare() to pidfd_create()? (And then we can kill
> pidfd_prepare and rename __pidfd_prepare to pidfd_prepare.)

Yes, the easiest solution would be to use `__pidfd_prepare()` and ensure
that the caller only ever calls this on tg-leaders. This would work just
fine, imo. And this was my initial approach.

I think Christian preferred an explicit assertion that ensures we do not
accidentally hand out pidfds for non-tg-leaders. The question is thus whether
there is an easy way to assert this even for reaped tasks?
Or whether there is a simple way to flag a pid that was used as tg-leader?

Or, ultimately, whether this has limited use and we should just use
`__pidfd_prepare()`?

Thanks a lot!
David

On 08/14, David Rheinsberg wrote:
>
> Hi Oleg,
>
> On Fri, Aug 11, 2023, at 1:57 PM, Oleg Nesterov wrote:
> >> What code do we need to allow userspace to open a pidfd to a leader pid
> >> even if it has already been exited and reaped (without also accidentally
> >> allowing to open non-lead pid pidfds)?
> >
> > I'll try to think more, but can you also explain why we need this?
> >
> > See my other email. Can't we simply shift the pid_has_task(PIDTYPE_TGID)
> > check from pidfd_prepare() to pidfd_create()? (And then we can kill
> > pidfd_prepare and rename __pidfd_prepare to pidfd_prepare.)
>
> Yes, the easiest solution would be to use `__pidfd_prepare()` and ensure
> that the caller only ever calls this on tg-leaders. This would work just
> fine, imo. And this was my initial approach.

Great,

> I think Christian preferred an explicit assertion that ensures we do not
> accidentally hand out pidfds for non-tg-leaders. The question is thus whether
> there is an easy way to assert this even for reaped tasks?
> Or whether there is a simple way to flag a pid that was used as tg-leader?

I do not see how we can check if a detached pid was a leader pid, and I don't
think it makes sense to add a new member into struct pid...

> Or, ultimately, whether this has limited use and we should just use
> `__pidfd_prepare()`?

Well, if you confirm that sk->sk_peer_pid and scm->pid are always initialized with
task_tgid(current), I'd certainly prefer this approach unless Christian objects.

Oleg.

On Mon, Aug 14, 2023 at 3:21 PM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 08/14, David Rheinsberg wrote:
> >
> > Hi Oleg,
> >
> > On Fri, Aug 11, 2023, at 1:57 PM, Oleg Nesterov wrote:
> > >> What code do we need to allow userspace to open a pidfd to a leader pid
> > >> even if it has already been exited and reaped (without also accidentally
> > >> allowing to open non-lead pid pidfds)?
> > >
> > > I'll try to think more, but can you also explain why we need this?
> > >
> > > See my other email. Can't we simply shift the pid_has_task(PIDTYPE_TGID)
> > > check from pidfd_prepare() to pidfd_create()? (And then we can kill
> > > pidfd_prepare and rename __pidfd_prepare to pidfd_prepare.)
> >
> > Yes, the easiest solution would be to use `__pidfd_prepare()` and ensure
> > that the caller only ever calls this on tg-leaders. This would work just
> > fine, imo. And this was my initial approach.
>
> Great,
>
> > I think Christian preferred an explicit assertion that ensures we do not
> > accidentally hand out pidfds for non-tg-leaders. The question is thus whether
> > there is an easy way to assert this even for reaped tasks?
> > Or whether there is a simple way to flag a pid that was used as tg-leader?
>
> I do not see how we can check if a detached pid was a leader pid, and I don't
> think it makes sense to add a new member into struct pid...
>
> > Or, ultimately, whether this has limited use and we should just use
> > `__pidfd_prepare()`?
>
> Well, if you confirm that sk->sk_peer_pid and scm->pid are always initialized with
> task_tgid(current), I'd certainly prefer this approach unless Christian objects.

Dear colleagues,

I can confirm that sk->sk_peer_pid and scm->pid are always
thread-group leaders.

Kind regards,
Alex

>
> Oleg.
>

On Mon, Aug 14, 2023 at 03:20:39PM +0200, Oleg Nesterov wrote:
> On 08/14, David Rheinsberg wrote:
> >
> > Hi Oleg,
> >
> > On Fri, Aug 11, 2023, at 1:57 PM, Oleg Nesterov wrote:
> > >> What code do we need to allow userspace to open a pidfd to a leader pid
> > >> even if it has already been exited and reaped (without also accidentally
> > >> allowing to open non-lead pid pidfds)?
> > >
> > > I'll try to think more, but can you also explain why we need this?
> > >
> > > See my other email. Can't we simply shift the pid_has_task(PIDTYPE_TGID)
> > > check from pidfd_prepare() to pidfd_create()? (And then we can kill
> > > pidfd_prepare and rename __pidfd_prepare to pidfd_prepare.)
> >
> > Yes, the easiest solution would be to use `__pidfd_prepare()` and ensure
> > that the caller only ever calls this on tg-leaders. This would work just
> > fine, imo. And this was my initial approach.
>
> Great,
>
> > I think Christian preferred an explicit assertion that ensures we do not
> > accidentally hand out pidfds for non-tg-leaders. The question is thus whether
> > there is an easy way to assert this even for reaped tasks?
> > Or whether there is a simple way to flag a pid that was used as tg-leader?
>
> I do not see how we can check if a detached pid was a leader pid, and I don't
> think it makes sense to add a new member into struct pid...
>
> > Or, ultimately, whether this has limited use and we should just use
> > `__pidfd_prepare()`?
>
> Well, if you confirm that sk->sk_peer_pid and scm->pid are always initialized with
> task_tgid(current), I'd certainly prefer this approach unless Christian objects.

No no, I'm absolutely not objecting. I specifically want you to take the
opinionated lead here. :)

Thanks for chiming in!

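
The direction the thread converges on (keep the leader check only where the pid comes from an arbitrary user-supplied number, and let callers that already hold a known TGID pid use the unchecked helper) can be modelled in user space roughly as below. This is a hedged sketch, not the kernel implementation: `struct pid_model`, `prepare_unchecked()` and `create_checked()` are hypothetical names that only mirror the roles of `__pidfd_prepare()` and `pidfd_create()`, and the model returns 0 where the kernel would return a file descriptor.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a struct pid and its attached leader task. */
struct pid_model {
	bool has_tgid_task;	/* pid_has_task(pid, PIDTYPE_TGID) */
};

/*
 * Plays the role of __pidfd_prepare(): no leader check. The caller
 * vouches that the pid is, or once was, a thread-group leader.
 */
static int prepare_unchecked(const struct pid_model *pid)
{
	return pid ? 0 : -EINVAL;
}

/*
 * Plays the role of pidfd_create() behind pidfd_open(2): the pid was
 * looked up from a user-supplied number, so it is checked here.
 */
static int create_checked(const struct pid_model *pid)
{
	if (!pid || !pid->has_tgid_task)
		return -EINVAL;

	return prepare_unchecked(pid);
}
```

Under this split, a caller such as SO_PEERPIDFD, whose pid was recorded from `task_tgid(current)`, would get a pidfd even for a reaped peer, while pidfd_open(2) keeps rejecting pids that have no attached leader.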
diff --git a/kernel/fork.c b/kernel/fork.c
index d2e12b6d2b18..4dde19a8c264 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2161,7 +2161,7 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
  * Allocate a new file that stashes @pid and reserve a new pidfd number in the
  * caller's file descriptor table. The pidfd is reserved but not installed yet.
  *
- * The helper verifies that @pid is used as a thread group leader.
+ * The helper verifies that @pid is/was used as a thread group leader.
  *
  * If this function returns successfully the caller is responsible to either
  * call fd_install() passing the returned pidfd and pidfd file as arguments in
@@ -2180,7 +2180,14 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
  */
 int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
 {
-	if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
+	if (!pid)
+		return -EINVAL;
+
+	/*
+	 * Non thread-group leaders cannot have pidfds, but we allow them for
+	 * reaped thread-group leaders.
+	 */
+	if (pid_has_task(pid, PIDTYPE_PID) && !pid_has_task(pid, PIDTYPE_TGID))
 		return -EINVAL;
 
 	return __pidfd_prepare(pid, flags, ret);