pidfd: getfd should always report ESRCH if a task is exiting

Message ID 20240206164308.62620-1-tycho@tycho.pizza
State New

Commit Message

Tycho Andersen Feb. 6, 2024, 4:43 p.m. UTC
  From: Tycho Andersen <tandersen@netflix.com>

We can get EBADF from __pidfd_fget() if a task is currently exiting, which
might be confusing. Let's check PF_EXITING, and just report ESRCH if so.

I chose PF_EXITING because it is set in exit_signals(), which is called
before exit_files(). Since ->exit_state is mostly set after exit_files(),
in exit_notify(), checking that instead would still leave a window open
for the race.

Signed-off-by: Tycho Andersen <tandersen@netflix.com>
---
 kernel/pid.c                                  |  2 +-
 .../selftests/pidfd/pidfd_getfd_test.c        | 31 ++++++++++++++++++-
 2 files changed, 31 insertions(+), 2 deletions(-)


base-commit: 082d11c164aef02e51bcd9c7cbf1554a8e42d9b5
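
The ordering argument in the commit message can be sketched as a small user-space model. This is only an illustration, not kernel code: the `model_*` names, the phase enum, and the boolean fields are hypothetical stand-ins for the order exit_signals() -> exit_files() -> exit_notify() described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical phases of an exiting task, in the order the commit
 * message describes: exit_signals(), then exit_files(), then
 * exit_notify(). */
enum model_phase {
	PHASE_RUNNING,
	PHASE_AFTER_EXIT_SIGNALS,
	PHASE_AFTER_EXIT_FILES,
	PHASE_AFTER_EXIT_NOTIFY,
};

/* Illustrative stand-in for the few bits of task state that matter. */
struct model_task {
	bool pf_exiting;     /* models PF_EXITING, set in exit_signals() */
	bool has_files;      /* false once exit_files() has run */
	bool exit_state_set; /* models ->exit_state, set in exit_notify() */
};

/* Build the task state as it would look once 'phase' has completed. */
static struct model_task model_task_at(enum model_phase phase)
{
	struct model_task t = {
		.pf_exiting = false,
		.has_files = true,
		.exit_state_set = false,
	};

	assert(phase <= PHASE_AFTER_EXIT_NOTIFY);
	if (phase >= PHASE_AFTER_EXIT_SIGNALS)
		t.pf_exiting = true;
	if (phase >= PHASE_AFTER_EXIT_FILES)
		t.has_files = false;
	if (phase >= PHASE_AFTER_EXIT_NOTIFY)
		t.exit_state_set = true;
	return t;
}

/* True if a file lookup would fail (no files) while the chosen
 * predicate does not yet say the task is exiting, i.e. the caller
 * would still see the confusing EBADF. */
static bool model_unexplained_ebadf(struct model_task t, bool use_exit_state)
{
	bool looks_exiting = use_exit_state ? t.exit_state_set : t.pf_exiting;

	return !t.has_files && !looks_exiting;
}
```

In this model, only the exit_state predicate leaves a phase (after exit_files() and before exit_notify()) where the files are gone but the task does not yet look like it is exiting.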
  

Comments

Oleg Nesterov Feb. 6, 2024, 5:37 p.m. UTC | #1
On 02/06, Tycho Andersen wrote:
>
> From: Tycho Andersen <tandersen@netflix.com>
>
> We can get EBADF from __pidfd_fget() if a task is currently exiting, which
> might be confusing.

agreed, because EBADF looks as if the "fd" argument was wrong,

> Let's check PF_EXITING, and just report ESRCH if so.

agreed, we can pretend that the task has already exited,

But:

> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -688,7 +688,7 @@ static int pidfd_getfd(struct pid *pid, int fd)
>  	int ret;
>  
>  	task = get_pid_task(pid, PIDTYPE_PID);
> -	if (!task)
> +	if (!task || task->flags & PF_EXITING)
>  		return -ESRCH;

This looks racy. Suppose that pidfd_getfd() races with the exiting task.

It is possible that this task sets PF_EXITING and does exit_files()
after the "task->flags & PF_EXITING" check above and before pidfd_getfd()
does __pidfd_fget(), in this case pidfd_getfd() still returns the same
EBADF we want to avoid.

Perhaps we can change pidfd_getfd() to do

	if (IS_ERR(file))
		return (task->flags & PF_EXITING) ? -ESRCH : PTR_ERR(file);

instead?

This needs a comment to explain the PF_EXITING check. And perhaps another
comment to explain that we can't miss PF_EXITING if the target task has
already passed exit_files, both exit_files() and fget_task() take the same
task_lock(task).

What do you think?

Oleg.
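
The race described in this reply, and the suggested check-on-error fix, can be sketched in a single-threaded user-space model. Everything named `model_*` is hypothetical, and the "race" is simulated by a callback that runs the target's exit at the point where the window would be. The real code relies on exit_files() and fget_task() taking the same task_lock(task), which this sketch does not model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_PF_EXITING 0x4u /* illustrative flag bit for this model */

/* Illustrative stand-in for a task with a tiny file table. */
struct model_task {
	unsigned int flags;
	const int *files; /* NULL once the modelled exit_files() has run */
	size_t nfiles;
};

typedef void (*model_race_fn)(struct model_task *);

static const int model_open_files[] = { 100, 101, 102 };

/* Models the target exiting: exit_signals() then exit_files(). */
static void model_do_exit(struct model_task *t)
{
	assert(t != NULL);
	t->flags |= MODEL_PF_EXITING;
	t->files = NULL;
	t->nfiles = 0;
}

/* Models the file lookup: -EBADF if the table is gone or fd is bad. */
static int model_fget(const struct model_task *t, size_t fd)
{
	if (t->files == NULL || fd >= t->nfiles)
		return -EBADF;
	return t->files[fd];
}

/* v1 shape: test the flag first, then look up. The target may exit
 * in between, so the lookup can still return -EBADF. */
static int model_getfd_check_first(struct model_task *t, size_t fd,
				   model_race_fn race)
{
	if (t->flags & MODEL_PF_EXITING)
		return -ESRCH;
	if (race)
		race(t); /* target exits inside the window */
	return model_fget(t, fd);
}

/* Suggested shape: look up first, consult the flag only on failure. */
static int model_getfd_check_on_error(struct model_task *t, size_t fd,
				      model_race_fn race)
{
	int ret;

	if (race)
		race(t); /* same timing as above */
	ret = model_fget(t, fd);
	if (ret < 0 && (t->flags & MODEL_PF_EXITING))
		return -ESRCH;
	return ret;
}

enum model_variant { MODEL_CHECK_FIRST, MODEL_CHECK_ON_ERROR };

/* Run one variant against a fresh live task, optionally with the
 * target exiting at the racy point. */
static int model_run(enum model_variant v, size_t fd, int target_exits)
{
	struct model_task t = { 0, model_open_files, 3 };
	model_race_fn race = target_exits ? model_do_exit : NULL;

	return v == MODEL_CHECK_FIRST ?
		model_getfd_check_first(&t, fd, race) :
		model_getfd_check_on_error(&t, fd, race);
}
```

With the target exiting at the racy point, the check-first variant still reports -EBADF in this model, while the check-on-error variant reports -ESRCH. A genuinely bad fd on a live task stays -EBADF in both.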
  
Tycho Andersen Feb. 6, 2024, 5:55 p.m. UTC | #2
On Tue, Feb 06, 2024 at 06:37:22PM +0100, Oleg Nesterov wrote:
> > --- a/kernel/pid.c
> > +++ b/kernel/pid.c
> > @@ -688,7 +688,7 @@ static int pidfd_getfd(struct pid *pid, int fd)
> >  	int ret;
> >  
> >  	task = get_pid_task(pid, PIDTYPE_PID);
> > -	if (!task)
> > +	if (!task || task->flags & PF_EXITING)
> >  		return -ESRCH;
> 
> This looks racy. Suppose that pidfd_getfd() races with the exiting task.
> 
> It is possible that this task sets PF_EXITING and does exit_files()
> after the "task->flags & PF_EXITING" check above and before pidfd_getfd()
> does __pidfd_fget(), in this case pidfd_getfd() still returns the same
> EBADF we want to avoid.
> 
> Perhaps we can change pidfd_getfd() to do
> 
> 	if (IS_ERR(file))
> 		return (task->flags & PF_EXITING) ? -ESRCH : PTR_ERR(file);
> 
> instead?
> 
> This needs a comment to explain the PF_EXITING check. And perhaps another
> comment to explain that we can't miss PF_EXITING if the target task has
> already passed exit_files, both exit_files() and fget_task() take the same
> task_lock(task).
> 
> What do you think?

Yes, you're absolutely right. Let me resend.

Tycho
  
Oleg Nesterov Feb. 6, 2024, 6:06 p.m. UTC | #3
Sorry for noise, forgot to mention...

On 02/06, Oleg Nesterov wrote:
>
> On 02/06, Tycho Andersen wrote:
> >
> > From: Tycho Andersen <tandersen@netflix.com>
> >
> > We can get EBADF from __pidfd_fget() if a task is currently exiting, which
> > might be confusing.
> 
> agreed, because EBADF looks as if the "fd" argument was wrong,
> 
> > Let's check PF_EXITING, and just report ESRCH if so.
> 
> agreed, we can pretend that the task has already exited,
> 
> But:
> 
> > --- a/kernel/pid.c
> > +++ b/kernel/pid.c
> > @@ -688,7 +688,7 @@ static int pidfd_getfd(struct pid *pid, int fd)
> >  	int ret;
> >  
> >  	task = get_pid_task(pid, PIDTYPE_PID);
> > -	if (!task)
> > +	if (!task || task->flags & PF_EXITING)
> >  		return -ESRCH;
> 
> This looks racy. Suppose that pidfd_getfd() races with the exiting task.
> 
> It is possible that this task sets PF_EXITING and does exit_files()
> after the "task->flags & PF_EXITING" check above and before pidfd_getfd()
> does __pidfd_fget(), in this case pidfd_getfd() still returns the same
> EBADF we want to avoid.
> 
> Perhaps we can change pidfd_getfd() to do
> 
> 	if (IS_ERR(file))
> 		return (task->flags & PF_EXITING) ? -ESRCH : PTR_ERR(file);

Or we can check task->files != NULL rather than PF_EXITING.

To me this looks even better, but looks more confusing without a comment.
OTOH, imo this needs a comment anyway ;)

> 
> instead?
> 
> This needs a comment to explain the PF_EXITING check. And perhaps another
> comment to explain that we can't miss PF_EXITING if the target task has
> already passed exit_files, both exit_files() and fget_task() take the same
> task_lock(task).
> 
> What do you think?
> 
> Oleg.
  
Tycho Andersen Feb. 6, 2024, 6:09 p.m. UTC | #4
On Tue, Feb 06, 2024 at 07:06:07PM +0100, Oleg Nesterov wrote:
> Or we can check task->files != NULL rather than PF_EXITING.
> 
> To me this looks even better, but looks more confusing without a comment.
> OTOH, imo this needs a comment anyway ;)

I thought about this, but I didn't really understand the null check in
exit_files(); if it can really be called more than once, are there
other cases where task->files == NULL that we really should report
EBADF?

Tycho
  
Oleg Nesterov Feb. 6, 2024, 7:25 p.m. UTC | #5
On 02/06, Tycho Andersen wrote:

> On Tue, Feb 06, 2024 at 07:06:07PM +0100, Oleg Nesterov wrote:
> > Or we can check task->files != NULL rather than PF_EXITING.
> >
> > To me this looks even better, but looks more confusing without a comment.
> > OTOH, imo this needs a comment anyway ;)
>
> I thought about this, but I didn't really understand the null check in
> exit_files();

I guess task->files can be NULL at least if it was cloned with
kernel_clone_args->no_files == T

> if it can really be called more than once,

I don't think this is possible. Well, unless the exiting task hits
a BUG() after exit_files() and calls do_exit() recursively.

> are there
> other cases where task->files == NULL that we really should report
> EBADF?

I don't think so...

If nothing else, sys_close() dereferences current->files without any
checks, so I think task->files == NULL is simply impossible if this
task is a userspace process/thread until it exits.

But Tycho, I won't insist. If you prefer to check PF_EXITING, I am fine.

Oleg.
  
Tycho Andersen Feb. 6, 2024, 7:35 p.m. UTC | #6
On Tue, Feb 06, 2024 at 08:25:54PM +0100, Oleg Nesterov wrote:
> But Tycho, I won't insist. If you prefer to check PF_EXITING, I am fine.

Looks like we raced; I sent a v2 with PF_EXITING, mostly because I
didn't want to run into weird things I didn't understand. I'm happy to
fix it up to check ->files if that's what you prefer, Christian?

Tycho
  
Christian Brauner Feb. 7, 2024, 9:11 a.m. UTC | #7
On Tue, Feb 06, 2024 at 08:25:54PM +0100, Oleg Nesterov wrote:
> On 02/06, Tycho Andersen wrote:
> 
> > On Tue, Feb 06, 2024 at 07:06:07PM +0100, Oleg Nesterov wrote:
> > > Or we can check task->files != NULL rather than PF_EXITING.
> > >
> > > To me this looks even better, but looks more confusing without a comment.
> > > OTOH, imo this needs a comment anyway ;)
> >
> > I thought about this, but I didn't really understand the null check in
> > exit_files();
> 
> I guess task->files can be NULL at least if it was cloned with
> kernel_clone_args->no_files == T

Won't this give false positives for vhost workers which do set
->no_files but are user workers? IOW, return -ESRCH even though -EBADF
would be correct in this scenario?
  
Oleg Nesterov Feb. 7, 2024, 10:28 a.m. UTC | #8
On 02/07, Christian Brauner wrote:
>
> On Tue, Feb 06, 2024 at 08:25:54PM +0100, Oleg Nesterov wrote:
> > On 02/06, Tycho Andersen wrote:
> >
> > > On Tue, Feb 06, 2024 at 07:06:07PM +0100, Oleg Nesterov wrote:
> > > > Or we can check task->files != NULL rather than PF_EXITING.
> > > >
> > > > To me this looks even better, but looks more confusing without a comment.
> > > > OTOH, imo this needs a comment anyway ;)
> > >
> > > I thought about this, but I didn't really understand the null check in
> > > exit_files();
> >
> > I guess task->files can be NULL at least if it was cloned with
> > kernel_clone_args->no_files == T
>
> Won't this give false positives for vhost workers which do set
> ->no_files but are user workers? IOW, return -ESRCH even though -EBADF
> would be correct in this scenario?

OK, agreed. Let's check PF_EXITING or exit_state.

Oleg.
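
The false-positive concern that settles this sub-thread can be sketched as a comparison of the two candidate predicates. This is a user-space illustration only: the `model_*` names and the two-field task are hypothetical, and the "worker" case assumes, as discussed above, a live task that was created with no_files and therefore never had a files struct.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative stand-in for the state the two checks look at. */
struct model_task {
	bool pf_exiting; /* models PF_EXITING */
	bool has_files;  /* models task->files != NULL */
};

enum model_check { MODEL_CHECK_FILES_NULL, MODEL_CHECK_PF_EXITING };

/* A live worker created without a files struct (the no_files case). */
static struct model_task model_no_files_worker(void)
{
	return (struct model_task){ .pf_exiting = false, .has_files = false };
}

/* An ordinary process that has already gone through exit_files(). */
static struct model_task model_exited_process(void)
{
	return (struct model_task){ .pf_exiting = true, .has_files = false };
}

/* Pick the errno for a lookup that failed because there are no files. */
static int model_lookup_error(struct model_task t, enum model_check check)
{
	bool treat_as_gone;

	assert(!t.has_files); /* only meaningful for a failed lookup */
	treat_as_gone = (check == MODEL_CHECK_FILES_NULL) ?
			!t.has_files : t.pf_exiting;
	return treat_as_gone ? -ESRCH : -EBADF;
}
```

In this model the files-NULL check reports -ESRCH for the live worker, which is the false positive raised above, while the PF_EXITING check keeps -EBADF for it and still reports -ESRCH for the exited process.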
  

Patch

diff --git a/kernel/pid.c b/kernel/pid.c
index de0bf2f8d18b..db8731f0ee45 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -688,7 +688,7 @@  static int pidfd_getfd(struct pid *pid, int fd)
 	int ret;
 
 	task = get_pid_task(pid, PIDTYPE_PID);
-	if (!task)
+	if (!task || task->flags & PF_EXITING)
 		return -ESRCH;
 
 	file = __pidfd_fget(task, fd);
diff --git a/tools/testing/selftests/pidfd/pidfd_getfd_test.c b/tools/testing/selftests/pidfd/pidfd_getfd_test.c
index 0930e2411dfb..cd51d547b751 100644
--- a/tools/testing/selftests/pidfd/pidfd_getfd_test.c
+++ b/tools/testing/selftests/pidfd/pidfd_getfd_test.c
@@ -5,6 +5,7 @@ 
 #include <fcntl.h>
 #include <limits.h>
 #include <linux/types.h>
+#include <poll.h>
 #include <sched.h>
 #include <signal.h>
 #include <stdio.h>
@@ -129,6 +130,7 @@  FIXTURE(child)
 	 * When it is closed, the child will exit.
 	 */
 	int sk;
+	bool ignore_child_result;
 };
 
 FIXTURE_SETUP(child)
@@ -165,10 +167,14 @@  FIXTURE_SETUP(child)
 
 FIXTURE_TEARDOWN(child)
 {
+	int ret;
+
 	EXPECT_EQ(0, close(self->pidfd));
 	EXPECT_EQ(0, close(self->sk));
 
-	EXPECT_EQ(0, wait_for_pid(self->pid));
+	ret = wait_for_pid(self->pid);
+	if (!self->ignore_child_result)
+		EXPECT_EQ(0, ret);
 }
 
 TEST_F(child, disable_ptrace)
@@ -235,6 +241,29 @@  TEST(flags_set)
 	EXPECT_EQ(errno, EINVAL);
 }
 
+TEST_F(child, no_strange_EBADF)
+{
+	struct pollfd fds;
+
+	self->ignore_child_result = true;
+
+	fds.fd = self->pidfd;
+	fds.events = POLLIN;
+
+	ASSERT_EQ(kill(self->pid, SIGKILL), 0);
+	ASSERT_EQ(poll(&fds, 1, 5000), 1);
+
+	/*
+	 * It used to be that pidfd_getfd() could race with the exiting thread
+	 * between exit_files() and release_task(), and get a non-null task
+	 * with a NULL files struct, and you'd get EBADF, which was slightly
+	 * confusing.
+	 */
+	errno = 0;
+	EXPECT_EQ(sys_pidfd_getfd(self->pidfd, self->remote_fd, 0), -1);
+	EXPECT_EQ(errno, ESRCH);
+}
+
 #if __NR_pidfd_getfd == -1
 int main(void)
 {