[v3] fuse: In fuse_flush only wait if someone wants the return code

Message ID 20221114160209.1229849-1-tycho@tycho.pizza
State New
Series [v3] fuse: In fuse_flush only wait if someone wants the return code

Commit Message

Tycho Andersen Nov. 14, 2022, 4:02 p.m. UTC
If a fuse filesystem is mounted inside a container, there is a problem
during pid namespace destruction. The scenario is:

1. task (a thread in the fuse server, with a fuse file open) starts
   exiting, does exit_signals(), goes into fuse_flush() -> wait
2. fuse daemon gets killed, tries to wake everyone up
3. task from 1 is stuck because complete_signal() doesn't wake it up, since
   it has PF_EXITING.

The result is that the thread will never be woken up, and pid namespace
destruction will block indefinitely.

To add insult to injury, nobody is waiting for these return codes, since
the pid namespace is being destroyed.

To fix this, let's not block on flush operations when the current task has
PF_EXITING.
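
A minimal userspace sketch of that decision, assuming a made-up
SKETCH_EXITING flag and a small fixed-size queue in place of the kernel's
PF_EXITING check and workqueue. Every sketch_* name is hypothetical and none
of this is the fuse code itself:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not kernel definitions. */
#define SKETCH_EXITING		0x4u
#define SKETCH_MAX_PENDING	8

struct sketch_task {
	unsigned int flags;
};

/* A request returns 0 on success or a negative error code. */
typedef int (*sketch_request_fn)(void);

static sketch_request_fn sketch_pending[SKETCH_MAX_PENDING];
static size_t sketch_npending;

/*
 * Live task: send the request and hand its return code to the caller.
 * Exiting task: nobody consumes the return code, so queue the request
 * and return immediately instead of waiting for a reply.
 */
int sketch_flush(const struct sketch_task *task, sketch_request_fn send)
{
	assert(task != NULL && send != NULL);

	if (task->flags & SKETCH_EXITING) {
		if (sketch_npending == SKETCH_MAX_PENDING)
			return -12;	/* queue full, reported like -ENOMEM */
		sketch_pending[sketch_npending++] = send;
		return 0;
	}
	return send();
}

/* Stand-in for the worker: run queued requests, discarding their results. */
size_t sketch_run_pending(void)
{
	size_t ran = 0;

	while (sketch_npending > 0) {
		(void)sketch_pending[--sketch_npending]();
		ran++;
	}
	return ran;
}

/* A request that always fails, used to show who observes the error. */
int sketch_failing_request(void)
{
	return -5;
}
```

With sketch_failing_request(), a live task gets -5 back from sketch_flush(),
while an exiting task gets 0 and the request only runs once
sketch_run_pending() drains the queue.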

This does change the semantics slightly: the wait here is for posix locks
to be unlocked, so the task will exit before things are unlocked. To quote
Miklos: https://lore.kernel.org/all/CAJfpegsTmiO-sKaBLgoVT4WxDXBkRES=HF1YmQN1ES7gfJEJ+w@mail.gmail.com/

> "remote" posix locks are almost never used due to problems like this,
> so I think it's safe to do this.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Tycho Andersen <tycho@tycho.pizza>
Link: https://lore.kernel.org/all/YrShFXRLtRt6T%2Fj+@risky/
---
v2: drop the fuse_flush_async() function and just re-use the already
    prepared args; add a description of the problem and a note about
    posix locks
v3: use schedule_work() to avoid other sleeps in write_inode_now() and
    fuse_sync_writes(). Fix a UAF of the stack-based inarg.
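
The stack-based inarg problem can be sketched in userspace as follows. This is
an illustration under assumptions: the sketch_* types are simplified,
hypothetical stand-ins for the fuse argument structures, and calloc() stands in
for kzalloc(). The point is that a deferred request must own a copy of its
payload and must re-point its argument pointer at that copy, because the
caller's stack frame is gone by the time the work runs:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified payload; not the real struct fuse_flush_in. */
struct sketch_flush_in {
	uint64_t fh;
	uint64_t lock_owner;
};

/* Hypothetical, simplified argument block holding a pointer to the payload. */
struct sketch_args {
	const void *in_value;
	size_t in_size;
};

/* Heap object that outlives the caller: args and payload live together. */
struct sketch_deferred {
	struct sketch_args args;
	struct sketch_flush_in inarg;
};

struct sketch_deferred *sketch_defer(const struct sketch_args *args,
				     const struct sketch_flush_in *inarg)
{
	struct sketch_deferred *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;

	memcpy(&d->args, args, sizeof(*args));
	memcpy(&d->inarg, inarg, sizeof(*inarg));
	/*
	 * After the memcpy, d->args.in_value still points at the caller's
	 * stack variable. Re-point it at the heap copy before the caller
	 * returns, otherwise the deferred work reads freed stack memory.
	 */
	d->args.in_value = &d->inarg;
	return d;
}

/* Builds the request on the stack, then hands a heap copy to the caller. */
struct sketch_deferred *sketch_prepare(uint64_t fh)
{
	struct sketch_flush_in inarg = { .fh = fh, .lock_owner = 0 };
	struct sketch_args args = {
		.in_value = &inarg,
		.in_size = sizeof(inarg),
	};

	return sketch_defer(&args, &inarg);
}
```

The caller owns the returned object and releases it with free() once the
deferred work has finished with it.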
---
 fs/fuse/file.c | 106 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 84 insertions(+), 22 deletions(-)


base-commit: f0c4d9fc9cc9462659728d168387191387e903cc
  

Comments

Tycho Andersen Nov. 28, 2022, 3 p.m. UTC | #1
Hi Miklos,

On Mon, Nov 14, 2022 at 09:02:09AM -0700, Tycho Andersen wrote:
> v3: use schedule_work() to avoid other sleeps in write_inode_now() and
>     fuse_sync_writes(). Fix a UAF of the stack-based inarg.

Thoughts on this version?

Thanks,

Tycho
  
Miklos Szeredi Dec. 8, 2022, 2:26 p.m. UTC | #2
On Mon, 28 Nov 2022 at 16:01, Tycho Andersen <tycho@tycho.pizza> wrote:
>
> Hi Miklos,
>
> On Mon, Nov 14, 2022 at 09:02:09AM -0700, Tycho Andersen wrote:
> > v3: use schedule_work() to avoid other sleeps in write_inode_now() and
> >     fuse_sync_writes(). Fix a UAF of the stack-based inarg.
>
> Thoughts on this version?

Skipping attr invalidation on success is wrong.  And there's still too
much duplication, IMO.

How about the attached (untested) patch?

Thanks,
Miklos
  
Tycho Andersen Dec. 8, 2022, 5:49 p.m. UTC | #3
On Thu, Dec 08, 2022 at 03:26:19PM +0100, Miklos Szeredi wrote:
> Skipping attr invalidation on success is wrong.

Agreed, that looks like my mistake.

> How about the attached (untested) patch?

It passes my reproducer with no warnings or anything. Feel free to
add:

Tested-by: Tycho Andersen <tycho@tycho.pizza>

if you want to commit it.

Tycho
  
Tycho Andersen Dec. 19, 2022, 7:16 p.m. UTC | #4
On Thu, Dec 08, 2022 at 10:49:30AM -0700, Tycho Andersen wrote:
> It passes my reproducer with no warnings or anything. Feel free to
> add:
> 
> Tested-by: Tycho Andersen <tycho@tycho.pizza>
> 
> if you want to commit it.

Ping, thoughts on landing this?

Thanks,

Tycho
  
Tycho Andersen Jan. 3, 2023, 2:51 p.m. UTC | #5
On Mon, Dec 19, 2022 at 12:16:50PM -0700, Tycho Andersen wrote:
> On Thu, Dec 08, 2022 at 10:49:30AM -0700, Tycho Andersen wrote:
> > On Thu, Dec 08, 2022 at 03:26:19PM +0100, Miklos Szeredi wrote:
> > > On Mon, 28 Nov 2022 at 16:01, Tycho Andersen <tycho@tycho.pizza> wrote:
> > > >
> > > > Hi Milkos,
> > > >
> > > > On Mon, Nov 14, 2022 at 09:02:09AM -0700, Tycho Andersen wrote:
> > > > > v3: use schedule_work() to avoid other sleeps in inode_write_now() and
> > > > >     fuse_sync_writes(). Fix a UAF of the stack-based inarg.
> > > >
> > > > Thoughts on this version?
> > > 
> > > Skipping attr invalidation on success is wrong.
> > 
> > Agreed, that looks like my mistake.
> > 
> > > How about the attached (untested) patch?
> > 
> > It passes my reproducer with no warnings or anything. Feel free to
> > add:
> > 
> > Tested-by: Tycho Andersen <tycho@tycho.pizza>
> > 
> > if you want to commit it.
> 
> Ping, thoughts on landing this?

Happy new year all. Any update here?

Thanks,

Tycho
  
Serge Hallyn Jan. 5, 2023, 3:15 p.m. UTC | #6
On Tue, Jan 03, 2023 at 07:51:22AM -0700, Tycho Andersen wrote:
> Happy new year all. Any update here?
> 
> Thanks,
> 
> Tycho

Thanks for pushing on this, Tycho.  I'd suggest sending a clean new version
incorporating Miklos' fix.

-serge
  
Miklos Szeredi Jan. 26, 2023, 2:12 p.m. UTC | #7
On Tue, 3 Jan 2023 at 15:51, Tycho Andersen <tycho@tycho.pizza> wrote:

> Happy new year all. Any update here?

Applied, thanks.

Miklos
  

Patch

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 71bfb663aac5..10173b0e74b7 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -18,6 +18,7 @@ 
 #include <linux/falloc.h>
 #include <linux/uio.h>
 #include <linux/fs.h>
+#include <linux/file.h>
 
 static int fuse_send_open(struct fuse_mount *fm, u64 nodeid,
 			  unsigned int open_flags, int opcode,
@@ -477,20 +478,20 @@  static void fuse_sync_writes(struct inode *inode)
 	fuse_release_nowrite(inode);
 }
 
-static int fuse_flush(struct file *file, fl_owner_t id)
+static void fuse_invalidate_attrs(struct fuse_mount *fm, int err, struct inode *inode)
 {
-	struct inode *inode = file_inode(file);
-	struct fuse_mount *fm = get_fuse_mount(inode);
-	struct fuse_file *ff = file->private_data;
-	struct fuse_flush_in inarg;
-	FUSE_ARGS(args);
-	int err;
-
-	if (fuse_is_bad(inode))
-		return -EIO;
+	/*
+	 * In memory i_blocks is not maintained by fuse, if writeback cache is
+	 * enabled, i_blocks from cached attr may not be accurate.
+	 */
+	if (!err && fm->fc->writeback_cache)
+		fuse_invalidate_attr_mask(inode, STATX_BLOCKS);
+}
 
-	if (ff->open_flags & FOPEN_NOFLUSH && !fm->fc->writeback_cache)
-		return 0;
+static int do_fuse_flush(struct fuse_mount *fm, struct inode *inode,
+			 struct file *file, struct fuse_args *args)
+{
+	int err;
 
 	err = write_inode_now(inode, 1);
 	if (err)
@@ -504,6 +505,53 @@  static int fuse_flush(struct file *file, fl_owner_t id)
 	if (err)
 		return err;
 
+	err = fuse_simple_request(fm, args);
+	if (err == -ENOSYS) {
+		fm->fc->no_flush = 1;
+		err = 0;
+	}
+
+	return err;
+}
+
+struct fuse_flush_args {
+	struct fuse_args args;
+	struct fuse_flush_in inarg;
+	struct inode *inode;
+	struct fuse_file *ff;
+	struct work_struct work;
+	struct file *file;
+};
+
+static void fuse_flush_async(struct work_struct *work)
+{
+	struct fuse_flush_args *fa = container_of(work, typeof(*fa), work);
+	struct fuse_mount *fm = get_fuse_mount(fa->inode);
+	int err;
+
+	err = do_fuse_flush(fm, fa->inode, fa->file, &fa->args);
+	if (err < 0)
+		fuse_invalidate_attrs(fm, err, fa->inode);
+	fuse_file_put(fa->ff, false, false);
+	fput(fa->file);
+	kfree(fa);
+}
+
+static int fuse_flush(struct file *file, fl_owner_t id)
+{
+	struct inode *inode = file_inode(file);
+	struct fuse_mount *fm = get_fuse_mount(inode);
+	struct fuse_file *ff = file->private_data;
+	struct fuse_flush_in inarg;
+	FUSE_ARGS(args);
+	int err;
+
+	if (fuse_is_bad(inode))
+		return -EIO;
+
+	if (ff->open_flags & FOPEN_NOFLUSH && !fm->fc->writeback_cache)
+		return 0;
+
 	err = 0;
 	if (fm->fc->no_flush)
 		goto inval_attr_out;
@@ -518,19 +566,33 @@  static int fuse_flush(struct file *file, fl_owner_t id)
 	args.in_args[0].value = &inarg;
 	args.force = true;
 
-	err = fuse_simple_request(fm, &args);
-	if (err == -ENOSYS) {
-		fm->fc->no_flush = 1;
-		err = 0;
+	if (current->flags & PF_EXITING) {
+		struct fuse_flush_args *fa;
+
+		err = -ENOMEM;
+		fa = kzalloc(sizeof(*fa), GFP_KERNEL);
+		if (!fa)
+			goto inval_attr_out;
+
+		memcpy(&fa->args, &args, sizeof(args));
+		memcpy(&fa->inarg, &inarg, sizeof(inarg));
+		fa->args.in_args[0].value = &fa->inarg;
+		fa->args.nocreds = true;
+		fa->ff = fuse_file_get(ff);
+		fa->inode = inode;
+		fa->file = get_file(file);
+
+		INIT_WORK(&fa->work, fuse_flush_async);
+		schedule_work(&fa->work);
+		return 0;
 	}
 
+	err = do_fuse_flush(fm, inode, file, &args);
+	if (!err)
+		return 0;
+
 inval_attr_out:
-	/*
-	 * In memory i_blocks is not maintained by fuse, if writeback cache is
-	 * enabled, i_blocks from cached attr may not be accurate.
-	 */
-	if (!err && fm->fc->writeback_cache)
-		fuse_invalidate_attr_mask(inode, STATX_BLOCKS);
+	fuse_invalidate_attrs(fm, err, inode);
 	return err;
 }
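
The review comment in #2 about attr invalidation can be reproduced with a
small userspace model of the control flow in the diff above. This is a sketch
under assumptions: the sketch_* names are hypothetical, the counter stands in
for fuse_invalidate_attr_mask(), and sketch_flush_unconditional() is only one
possible shape of a fix, not the patch Miklos attached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the connection and inode state. */
struct sketch_state {
	bool writeback_cache;
	int invalidations;	/* counts calls that actually invalidated */
};

/* Same guard as the helper in the diff: only invalidate on success. */
void sketch_invalidate_attrs(struct sketch_state *s, int err)
{
	assert(s != NULL);

	if (!err && s->writeback_cache)
		s->invalidations++;
}

/* Tail of the synchronous path as posted in v3. */
int sketch_flush_as_posted(struct sketch_state *s, int err)
{
	if (!err)
		return 0;	/* success never reaches the helper */

	sketch_invalidate_attrs(s, err);	/* err != 0, so a no-op */
	return err;
}

/* Assumed alternative: always call the helper and let its guard decide. */
int sketch_flush_unconditional(struct sketch_state *s, int err)
{
	sketch_invalidate_attrs(s, err);
	return err;
}
```

In this model the posted flow never invalidates, on success or on failure,
while the unconditional call invalidates exactly when the flush succeeded and
writeback cache is enabled. The same guard makes the `if (err < 0)` call in
fuse_flush_async() a no-op.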