[for-linus,1/3] eventfs: Have the inodes all for files and directories all be the same

Message ID 20240117143810.531966508@goodmis.org
State New
Headers
Series eventfs: A few more fixes for 6.8 |

Commit Message

Steven Rostedt Jan. 17, 2024, 2:35 p.m. UTC
  From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

The dentries and inodes are created in the readdir for the sole purpose of
getting a consistent inode number. Linus stated that is unnecessary, and
that all inodes can have the same inode number. For a virtual file system
they are pretty meaningless.

Instead use a single unique inode number for all files and one for all
directories.

Link: https://lore.kernel.org/all/20240116133753.2808d45e@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20240116211353.412180363@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al  Viro <viro@ZenIV.linux.org.uk>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 fs/tracefs/event_inode.c | 10 ++++++++++
 1 file changed, 10 insertions(+)
  

Comments

Linus Torvalds Jan. 22, 2024, 5:37 p.m. UTC | #1
On Mon, 22 Jan 2024 at 08:46, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> I can add this patch to make sure directory inodes are unique, as it causes
> a regression in find, but keep the file inodes the same.

Yeah, limiting it to directories will at least somewhat help the
address leaking.

However, I also note that you never did the "set i_nlink to one"
trick, which is the traditional thing to do to tell 'find' that it
cannot do its directory optimization thing.

I'm not sure that the nlink trick disables this part of the find
sanity checks, but the *first* thing to check would be something like
this

  --- a/fs/tracefs/inode.c
  +++ b/fs/tracefs/inode.c
  @@ -182,6 +182,7 @@ static int tracefs_getattr(struct mnt_idmap *idmap,

        set_tracefs_inode_owner(inode);
        generic_fillattr(idmap, request_mask, inode, stat);
  +     stat->nlink = 1;
        return 0;
   }

because it might just fix the issue.

Having nlink == 1 is how non-unix filesystems (like FAT etc) indicate
that you can't try to count directory entries to optimize traversal.

And it is possible that that is where the whole find thing comes from,
but who knows, it could be a generic loop detector that runs
independently of the usual link detection.

               Linus
  
Mathieu Desnoyers Jan. 22, 2024, 6:35 p.m. UTC | #2
On 2024-01-22 12:50, Steven Rostedt wrote:
> On Mon, 22 Jan 2024 12:14:36 -0500
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
[...]
>> On my 6.1.0 kernel:
>>
>> find /sys/kernel/tracing | wc -l
>> 15598
>>
>> (mainly due to TRACE_EVENT ABI files)
>>
>> Hashing risks:
>>
>> - Exposing kernel addresses if the hashing algorithm is broken,
> 
> Well this was my biggest concern, but if I truncate at least a nibble, with
> the unique salt to the algorithm for each file, how easily does that expose
> kernel addresses.
> 
> The ei itself, is created from kmalloc() so you would at best get a heap
> address. But with the missing nibble (if I mask it with ((1 << 28) - 1),
> and much more taken away for 64 bit systems), and the added unique salt, is
> it possible for this to expose anything that could be used in an attack?

I don't know, which is why I am concerned about it.

But I don't think we should spend time trying to understand all
possible attack scenarios associated with hashing of kernel addresses
when there are much simpler options available.

> 
>> - Collisions if users are unlucky (which could trigger those
>>     'find' errors).
>>
>> Those 15598 inode values fit within a single page (bitmap of
>> 1922 bytes).
>>
>> So I would recommend simply adding a bitmap per tracefs filesystem
>> instance to keep track of inode number allocation.
> 
> And how do I recover this bit after the inode is freed, but then referenced
> again?

You would keep the allocated inode number value within your data
structure associated with the inode.

If you never free inodes, then you can just use a static increment
as Linus suggested. But AFAIU there are cases where you free inodes,
hence my suggestion of bitmap.

When the inode is freed, you know which inode number is associated from
the field in your data structure, so you can clear this bit in the bitmap.

On the next inode allocation, you find-first-zero-bit in the bitmap, and
set it to one to reserve it.

> 
>>
>> Creation/removal of files/directories in tracefs should not be
>> a fast-path anyway, so who cares about the speed of a find first
>> bit within a single page ?
>>
> 
> When an inode is no longer referenced, it is freed. When it is referenced
> again, I want it to be recreated with the same inode number it had
> previously. How would having a bitmask help with that? I need a way to map
> an ei structure with a unique number without adding another 4 bytes to the
> structure itself.

As discussed in a separate exchange with Linus, why do you care so much about
not adding a 4 bytes field to the structure ?

Thanks,

Mathieu
  
Kees Cook Jan. 22, 2024, 6:50 p.m. UTC | #3
On January 22, 2024 10:19:12 AM PST, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>On Mon, 22 Jan 2024 at 09:39, Linus Torvalds
><torvalds@linux-foundation.org> wrote:
>>
>> Actually, why not juist add an inode number to your data structures,
>> at least for directories? And just do a static increment on it as they
>> get registered?

Yeah, this is what I'd suggest too. It avoids all the hash complexity. Is wrap-around realistic for it?

-Kees
  
Christian Brauner Jan. 25, 2024, 5:40 p.m. UTC | #4
On Mon, Jan 22, 2024 at 02:44:43PM -0500, Steven Rostedt wrote:
> On Mon, 22 Jan 2024 10:19:12 -0800
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > On Mon, 22 Jan 2024 at 09:39, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > Actually, why not juist add an inode number to your data structures,
> > > at least for directories? And just do a static increment on it as they
> > > get registered?
> > >
> > > That avoids the whole issue with possibly leaking kernel address data.  
> > 
> > The 'nlink = 1' thing doesn't seem to make 'find' any happier for this
> > case, sadly.
> > 
> > But the inode number in the 'struct eventfs_inode' looks trivial. And
> > doesn't even grow that structure on 64-bit architectures at least,
> > because the struct is already 64-bit aligned, and had only one 32-bit
> > entry at the end.
> > 
> > On 32-bit architectures the structure size grows, but I'm not sure the
> > allocation size grows. Our kmalloc() is quantized at odd numbers.
> > 
> > IOW, this trivial patch seems to be much safer than worrying about
> > some pointer exposure.
> 
> I originally wanted to avoid the addition of the 4 bytes, but your comment
> about it not making a difference on 64bit due to alignment makes sense.

Non-unique inode numbers aren't exactly great for userspace and there
are a surprisingly small yet large enough number of tools that trip over
this in various ways. So if that can be avoided then great.

But luckily no one is probably going to tar up tracefs. ;)
  

Patch

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index fdff53d5a1f8..5edf0b96758b 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -32,6 +32,10 @@ 
  */
 static DEFINE_MUTEX(eventfs_mutex);
 
+/* Choose something "unique" ;-) */
+#define EVENTFS_FILE_INODE_INO		0x12c4e37
+#define EVENTFS_DIR_INODE_INO		0x134b2f5
+
 /*
  * The eventfs_inode (ei) itself is protected by SRCU. It is released from
  * its parent's list and will have is_freed set (under eventfs_mutex).
@@ -352,6 +356,9 @@  static struct dentry *create_file(const char *name, umode_t mode,
 	inode->i_fop = fop;
 	inode->i_private = data;
 
+	/* All files will have the same inode number */
+	inode->i_ino = EVENTFS_FILE_INODE_INO;
+
 	ti = get_tracefs(inode);
 	ti->flags |= TRACEFS_EVENT_INODE;
 	d_instantiate(dentry, inode);
@@ -388,6 +395,9 @@  static struct dentry *create_dir(struct eventfs_inode *ei, struct dentry *parent
 	inode->i_op = &eventfs_root_dir_inode_operations;
 	inode->i_fop = &eventfs_file_operations;
 
+	/* All directories will have the same inode number */
+	inode->i_ino = EVENTFS_DIR_INODE_INO;
+
 	ti = get_tracefs(inode);
 	ti->flags |= TRACEFS_EVENT_INODE;