[GIT,PULL] tracing: Add eventfs file to help with debugging any more issues

Message ID 20230913192905.0a92bcab@gandalf.local.home
State New
Headers
Series [GIT,PULL] tracing: Add eventfs file to help with debugging any more issues |

Commit Message

Steven Rostedt Sept. 13, 2023, 11:29 p.m. UTC
  Linus,

tracing: Add eventfs file to help with debugging any more issues

While debugging the eventfs dynamic file creation issues, creating a file
in tracefs that exposed what dentries that were created along with their ref
counts proved invaluable.


Please pull the latest trace-v6.6-rc1-2 tree, which can be found at:


  git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
trace-v6.6-rc1-2

Tag SHA1: 48380652486eacb149420e083516fb159b475d13
Head SHA1: d6f358d6fff619cebbc547f2bdd3ec66ade04954


Steven Rostedt (Google) (1):
      tracefs: Add show_events_dentries

----
 fs/tracefs/Makefile         |   1 +
 fs/tracefs/event_inode.c    |  42 +------------
 fs/tracefs/event_show.c     | 147 ++++++++++++++++++++++++++++++++++++++++++++
 fs/tracefs/internal.h       |  44 +++++++++++++
 include/linux/tracefs.h     |   2 +
 kernel/trace/trace_events.c |   3 +
 6 files changed, 198 insertions(+), 41 deletions(-)
 create mode 100644 fs/tracefs/event_show.c
---------------------------
commit d6f358d6fff619cebbc547f2bdd3ec66ade04954
Author: Steven Rostedt (Google) <rostedt@goodmis.org>
Date:   Wed Sep 13 11:36:28 2023 -0400

    tracefs: Add show_events_dentries
    
    Add a file in tracefs that shows the "events" allocated entries and the
    dentries that are attached to them. This is used to see what dentries have
    been dynamically allocated as well as their current ref counts.
    
     ~# cat /sys/kernel/tracing/events/sched/sched_switch/enable
     0
     ~# grep -A4 sched_switch /sys/kernel/tracing/show_events_dentries
     sched_switch/ dentry: (1)
     enable dentry: (0)
     id
     filter
     trigger
    
    The first value is the name of the file or directory. If a dentry is
    allocated, then a "dentry: (<ref-count>)" is displayed showing the address
    of the dentry as well as its ref count.
    
    Link: https://lore.kernel.org/linux-trace-kernel/20230913113628.172907b7@gandalf.local.home
    
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Ajay Kaher <akaher@vmware.com>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
  

Comments

Linus Torvalds Sept. 15, 2023, 7:28 p.m. UTC | #1
On Wed, 13 Sept 2023 at 16:28, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> tracing: Add eventfs file to help with debugging any more issues
>
> While debugging the eventfs dynamic file creation issues, creating a file
> in tracefs that exposed what dentries that were created along with their ref
> counts proved invaluable.

Honestly, this is neither a bug-fix, nor does it seem to make any
sense at all in the main tree.

This really feels like a "temporary debug patch for tracing developers".

              Linus
  
Steven Rostedt Sept. 15, 2023, 8:36 p.m. UTC | #2
On Fri, 15 Sep 2023 12:28:57 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, 13 Sept 2023 at 16:28, Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > tracing: Add eventfs file to help with debugging any more issues
> >
> > While debugging the eventfs dynamic file creation issues, creating a file
> > in tracefs that exposed what dentries that were created along with their ref
> > counts proved invaluable.  
> 
> Honestly, this is neither a bug-fix, nor does it seem to make any
> sense at all in the main tree.
> 
> This really feels like a "temporary debug patch for tracing developers".

I wouldn't say its temporary, especially since it can be used to see what's
happening internally.

I'm OK with it not going in now, but instead I'll wrap an ifdef around it
and move it to my queue for the next merge window. I still would like to
keep these "what's going on internally" available, as I'll ask people to
enable them when they report an issue.

There's a few other files that do similar things that have been very useful
for such cases.

-- Steve
  
Linus Torvalds Sept. 15, 2023, 8:50 p.m. UTC | #3
On Fri, 15 Sept 2023 at 13:36, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> I'm OK with it not going in now, but instead I'll wrap an ifdef around it
> and move it to my queue for the next merge window. I still would like to
> keep these "what's going on internally" available, as I'll ask people to
> enable them when they report an issue.

Honestly, you copied the pattern from the /proc filesystem.

The /proc filesyustem is widely used and has never had this kind of
random debugging code in mainline.

Seriously, that eventfs_file thing is not worthy of this kind of
special debug code.

That debug code seems to be approaching the same order of size as all
the code evetfs_file code itself is.

There's a point where this kind of stuff just becomes ridiculous. At
least wait until there's a *reason* to debug a simple linked list of
objects.

If you have a hard time figuring out what the eventfs entries are,
maybe you should just have made "iterate_shared" show them, and then
you could use fancy tools like "ls" to see what the heck is up in that
directory?

                 Linus
  
Steven Rostedt Sept. 15, 2023, 9:13 p.m. UTC | #4
On Fri, 15 Sep 2023 13:50:17 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, 15 Sept 2023 at 13:36, Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > I'm OK with it not going in now, but instead I'll wrap an ifdef around it
> > and move it to my queue for the next merge window. I still would like to
> > keep these "what's going on internally" available, as I'll ask people to
> > enable them when they report an issue.  
> 
> Honestly, you copied the pattern from the /proc filesystem.

I didn't actually copy it, even though the /proc filesystem does something
similar (I didn't even know that it did until I presented this idea for
eventfs in LSFMM, and someone told me that /proc did so too).

I tried to look at how /proc does things and I couldn't really use it as
easily, because proc uses its own set of "proc_ops", and I had some
different requirements.

> 
> The /proc filesyustem is widely used and has never had this kind of
> random debugging code in mainline.

Again, it is implemented differently.

> 
> Seriously, that eventfs_file thing is not worthy of this kind of
> special debug code.
> 
> That debug code seems to be approaching the same order of size as all
> the code evetfs_file code itself is.

You mean the event_show.c code?

> 
> There's a point where this kind of stuff just becomes ridiculous. At
> least wait until there's a *reason* to debug a simple linked list of
> objects.
> 
> If you have a hard time figuring out what the eventfs entries are,
> maybe you should just have made "iterate_shared" show them, and then
> you could use fancy tools like "ls" to see what the heck is up in that
> directory?
> 

I was more interested in what did not exist than what existed. I wanted to
make sure that things were cleaned up properly. One of my tests that I used
was to do a: find /sys/kernel/tracing/events, and then run my ring_buffer
memory size stress test (that keeps increasing the size of the ring buffer
to make sure it fails safely when it runs out of memory). Then I check to make
sure all the unused dentries and inodes were reclaimed nicely, as they hang
around until a reclaim is made.

It did prove useful for the initial debugging, but it also helped a lot for
the new code I have saved for the next merge window. That code changes the
internal interface quite drastically.

The current code has meta data for every file in the eventfs (defined by the
eventfs_file structure). The new code has meta data only for the events
themselves (which map to the directories) and I remove the eventfs_file
entirely. It uses a callback from the eventfs code to create the dentries
and inodes of the files on the fly (there's no meta data representing the
individual files).

  https://lore.kernel.org/linux-trace-kernel/20230914163535.269645249@goodmis.org/

Now I use this debug file to know what files are added, and more
importantly, not added.

Are you entirely against this file, or is it fine if it's just wrapped
around an CONFIG_EVENTFS_DEBUG?

-- Steve
  
Linus Torvalds Sept. 15, 2023, 9:30 p.m. UTC | #5
On Fri, 15 Sept 2023 at 14:13, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Are you entirely against this file, or is it fine if it's just wrapped
> around an CONFIG_EVENTFS_DEBUG?

Honestly, I think its' extra code that we'd carry around - probably
for much too long - with absolutely _zero_ indication that it's
actually worth it.

Not worth asking people about, but also not worth carrying around.

You worry about bugs in it now, because it's new code. That's normal.
That doesn't make your debug interface worth any kind of future.

Keep it around as a private patch. Send it out to people if there are
actual issues that might indicate this debug support would helkp. And
if it has shown itself to be useful several times, at that point you
have an argument for the code.

As it is, right now I look at that code and I see extra BS that we'll
carry around forever that helps *zero* users, and I find it very
questionable whether it would help you.

And if you really think that people need to know what the events exist
in eventfs, then dammit, make 'readdir()' see them. Not some stupid
specialty debug interface. That's what filesystems *have* readdir for.

              Linus
  
Linus Torvalds Sept. 15, 2023, 9:35 p.m. UTC | #6
On Fri, 15 Sept 2023 at 14:30, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> And if you really think that people need to know what the events exist
> in eventfs, then dammit, make 'readdir()' see them. Not some stupid
> specialty debug interface. That's what filesystems *have* readdir for.

.. alternatively, if you have noticed that it's just a pain to not be
able to see the data, instead of introducing this completely separate
and illogical debug interface, just say "ok, it was a mistake, let's
go back to just keeping things in dentries since we can _see_ those".

Put another way: this is all self-inflicted damage, and you seem to
argue for this debug interface purely on "I can't see what's going on
any more, the old model was really nice because you could *see* the
events".

To me, if that's really a major issue, that just says "ok, this
eventfs abstraction was mis-designed, and hid data that the main
developer actually wants".

We don't add new debug interfaces just because you screwed up the
design. Fix it.

              Linus
  
Steven Rostedt Sept. 15, 2023, 10:01 p.m. UTC | #7
On Fri, 15 Sep 2023 14:35:40 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, 15 Sept 2023 at 14:30, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > And if you really think that people need to know what the events exist
> > in eventfs, then dammit, make 'readdir()' see them. Not some stupid
> > specialty debug interface. That's what filesystems *have* readdir for.  
> 
> .. alternatively, if you have noticed that it's just a pain to not be
> able to see the data, instead of introducing this completely separate
> and illogical debug interface, just say "ok, it was a mistake, let's
> go back to just keeping things in dentries since we can _see_ those".
> 
> Put another way: this is all self-inflicted damage, and you seem to
> argue for this debug interface purely on "I can't see what's going on
> any more, the old model was really nice because you could *see* the
> events".

The entire tracing infrastructure was created because of the "I can't see
what's going on" ;-)  Not everyone is as skilled with printk as you.

The old eventfs model was too costly because of the memory footprint, which
was the entire objective of this code patch. The BPF folks told me they
looked into use a tracing instance but said it added too much memory
overhead to do so. That's when I noticed that the copy of the entire events
directory that an instance has was the culprit, and needed to be fixed.

> 
> To me, if that's really a major issue, that just says "ok, this
> eventfs abstraction was mis-designed, and hid data that the main
> developer actually wants".
> 
> We don't add new debug interfaces just because you screwed up the
> design. Fix it.

This interface is used to make sure that things are being freed, and the
overhead is low. The bugs it found was where ref counts were screwed up,
and things were permanently there or just leaked in general.

Not sure what design you want fixed.

But I get your point. I will agree that this interface will likely be
mostly useful for the first year or two after the new code is added. But
after a few years, we could delete it too. If it's a debug option I highly
doubt that any distro will keep it on and anything will actually depend on
it.

I'm not going to fight for it. I can just keep this code in patchwork that
could be easily available, or even in a git branch on my own tree.

Consider this pull request dropped.

-- Steve
  
Steven Rostedt Sept. 15, 2023, 10:11 p.m. UTC | #8
On Fri, 15 Sep 2023 18:01:18 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> > To me, if that's really a major issue, that just says "ok, this
> > eventfs abstraction was mis-designed, and hid data that the main
> > developer actually wants".

Just to clarify the objective of the show_event_dentries file was the
heisenberg effect.

Just doing a 'ls' in the eventfs will create the dentries.

I'm interested in knowing that the dentries do not exist before the 'ls',
so I look at that file to make sure they are not there.

Then I do an 'ls' where I see all the files.

I then look at the file again to make sure the ref counts are correct.

I then run a memory pressure test, and look at the file to make sure that
the dentries are all cleaned up.

For kicks I'll do another 'ls' and see all the files again.

You may be correct that once I did the above, the code could be considered
working. My fear is that something might change in vfs that causes it to
break, and this file could be useful in catching that.

But if it never breaks, then the file becomes useless, which I guess is
what you are saying.

I'll keep the code around locally, and if vfs ever changes and breaks this
code where this file helps in solving it, I'll then do another pull request
to put this file upstream ;-)

-- Steve
  

Patch

diff --git a/fs/tracefs/Makefile b/fs/tracefs/Makefile
index 73c56da8e284..8f48f4fc6698 100644
--- a/fs/tracefs/Makefile
+++ b/fs/tracefs/Makefile
@@ -1,6 +1,7 @@ 
 # SPDX-License-Identifier: GPL-2.0-only
 tracefs-objs	:= inode.o
 tracefs-objs	+= event_inode.o
+tracefs-objs	+= event_show.o
 
 obj-$(CONFIG_TRACING)	+= tracefs.o
 
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 9f64e7332796..b23bb0957bb4 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -23,47 +23,7 @@ 
 #include <linux/delay.h>
 #include "internal.h"
 
-struct eventfs_inode {
-	struct list_head	e_top_files;
-};
-
-/*
- * struct eventfs_file - hold the properties of the eventfs files and
- *                       directories.
- * @name:	the name of the file or directory to create
- * @d_parent:   holds parent's dentry
- * @dentry:     once accessed holds dentry
- * @list:	file or directory to be added to parent directory
- * @ei:		list of files and directories within directory
- * @fop:	file_operations for file or directory
- * @iop:	inode_operations for file or directory
- * @data:	something that the caller will want to get to later on
- * @mode:	the permission that the file or directory should have
- */
-struct eventfs_file {
-	const char			*name;
-	struct dentry			*d_parent;
-	struct dentry			*dentry;
-	struct list_head		list;
-	struct eventfs_inode		*ei;
-	const struct file_operations	*fop;
-	const struct inode_operations	*iop;
-	/*
-	 * Union - used for deletion
-	 * @del_list:	list of eventfs_file to delete
-	 * @rcu:	eventfs_file to delete in RCU
-	 * @is_freed:	node is freed if one of the above is set
-	 */
-	union {
-		struct list_head	del_list;
-		struct rcu_head		rcu;
-		unsigned long		is_freed;
-	};
-	void				*data;
-	umode_t				mode;
-};
-
-static DEFINE_MUTEX(eventfs_mutex);
+DEFINE_MUTEX(eventfs_mutex);
 DEFINE_STATIC_SRCU(eventfs_srcu);
 
 static struct dentry *eventfs_root_lookup(struct inode *dir,
diff --git a/fs/tracefs/event_show.c b/fs/tracefs/event_show.c
new file mode 100644
index 000000000000..66dece7cc810
--- /dev/null
+++ b/fs/tracefs/event_show.c
@@ -0,0 +1,147 @@ 
+#include <linux/seq_file.h>
+#include <linux/tracefs.h>
+#include "internal.h"
+
+/*
+ * This will iterate three lists that correspond to the directory level
+ * of the eventfs directory.
+ *
+ * level 0 : /sys/kernel/tracing/events
+ * level 1 : /sys/kernel/tracing/events/<system>
+ * level 2 : /sys/kernel/tracing/events/<system>/event
+ *
+ * The iterator needs to see all levels as they all contain dynamically
+ * allocated dentries and inodes.
+ */
+struct event_list {
+	int			level;
+	struct list_head	*head[3];
+	struct list_head	*next[3];
+};
+
+static void *e_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	struct event_list *elist = m->private;
+	int level = elist->level;
+	struct list_head *head = elist->head[level];
+	struct list_head *next = elist->next[level];
+	struct eventfs_file *ef;
+
+	(*pos)++;
+
+	/* If next is equal to head, then the list is complete */
+	while (next == head) {
+		if (!level)
+			return NULL;
+
+		/* sublevel below top level, go up one */
+		elist->level = --level;
+		head = elist->head[level];
+		/* Going down does not update next, so do it here */
+		next = elist->next[level]->next;
+		elist->next[level] = next;
+	}
+
+	ef = list_entry(next, struct eventfs_file, list);
+
+	/* For each entry (not at the bottom) do a breadth first search */
+	if (ef->ei && !list_empty(&ef->ei->e_top_files) && level < 2) {
+		elist->level = ++level;
+		head = &ef->ei->e_top_files;
+		elist->head[level] = head;
+		next = head;
+		/*
+		 * Note, next is now pointing to the next sub level.
+		 * Need to update the next for the previous level on the way up.
+		 */
+	}
+
+	elist->next[level] = next->next;
+	return ef;
+}
+
+static void *e_start(struct seq_file *m, loff_t *pos)
+{
+	struct event_list *elist = m->private;
+	struct eventfs_file *ef = NULL;
+	loff_t l;
+
+	mutex_lock(&eventfs_mutex);
+
+	elist->level = 0;
+	elist->next[0] = elist->head[0]->next;
+
+	for (l = 0; l <= *pos; ) {
+		ef = e_next(m, ef, &l);
+		if (!ef)
+			break;
+	}
+	return ef;
+}
+
+static int e_show(struct seq_file *m, void *v)
+{
+	struct eventfs_file *ef = v;
+
+	seq_printf(m, "%s", ef->name);
+	if (ef->ei)
+		seq_putc(m, '/');
+
+	if (ef->dentry)
+		seq_printf(m, " dentry: (%d)", d_count(ef->dentry));
+	seq_putc(m, '\n');
+
+	return 0;
+}
+
+static void e_stop(struct seq_file *m, void *p)
+{
+	mutex_unlock(&eventfs_mutex);
+}
+
+static const struct seq_operations eventfs_show_dentry_seq_ops = {
+	.start = e_start,
+	.next = e_next,
+	.show = e_show,
+	.stop = e_stop,
+};
+
+static int
+eventfs_show_dentry_open(struct inode *inode, struct file *file)
+{
+	const struct seq_operations *seq_ops = &eventfs_show_dentry_seq_ops;
+	struct event_list *elist;
+	struct tracefs_inode *ti;
+	struct eventfs_inode *ei;
+	struct dentry *dentry;
+
+	/* The inode private should have the dentry of the "events" directory */
+	dentry = inode->i_private;
+	if (!dentry)
+		return -EINVAL;
+
+	elist = __seq_open_private(file, seq_ops, sizeof(*elist));
+	if (!elist)
+		return -ENOMEM;
+
+	ti = get_tracefs(dentry->d_inode);
+	ei = ti->private;
+
+	/*
+	 * Start off at level 0 (/sys/kernel/tracing/events)
+	 * Initialize head to the events files and next to the
+	 * first file.
+	 */
+	elist->level = 0;
+	elist->head[0] = &ei->e_top_files;
+	elist->next[0] = ei->e_top_files.next;
+
+	return 0;
+}
+
+const struct file_operations eventfs_show_dentry_fops = {
+	.open = eventfs_show_dentry_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = seq_release,
+};
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 4f2e49e2197b..461920f0133f 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -2,11 +2,55 @@ 
 #ifndef _TRACEFS_INTERNAL_H
 #define _TRACEFS_INTERNAL_H
 
+#include <linux/mutex.h>
+
 enum {
 	TRACEFS_EVENT_INODE		= BIT(1),
 	TRACEFS_EVENT_TOP_INODE		= BIT(2),
 };
 
+struct eventfs_inode {
+	struct list_head	e_top_files;
+};
+
+/*
+ * struct eventfs_file - hold the properties of the eventfs files and
+ *                       directories.
+ * @name:	the name of the file or directory to create
+ * @d_parent:   holds parent's dentry
+ * @dentry:     once accessed holds dentry
+ * @list:	file or directory to be added to parent directory
+ * @ei:		list of files and directories within directory
+ * @fop:	file_operations for file or directory
+ * @iop:	inode_operations for file or directory
+ * @data:	something that the caller will want to get to later on
+ * @mode:	the permission that the file or directory should have
+ */
+struct eventfs_file {
+	const char			*name;
+	struct dentry			*d_parent;
+	struct dentry			*dentry;
+	struct list_head		list;
+	struct eventfs_inode		*ei;
+	const struct file_operations	*fop;
+	const struct inode_operations	*iop;
+	/*
+	 * Union - used for deletion
+	 * @del_list:	list of eventfs_file to delete
+	 * @rcu:	eventfs_file to delete in RCU
+	 * @is_freed:	node is freed if one of the above is set
+	 */
+	union {
+		struct list_head	del_list;
+		struct rcu_head		rcu;
+		unsigned long		is_freed;
+	};
+	void				*data;
+	umode_t				mode;
+};
+
+extern struct mutex eventfs_mutex;
+
 struct tracefs_inode {
 	unsigned long           flags;
 	void                    *private;
diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
index 009072792fa3..f76c7d74b23d 100644
--- a/include/linux/tracefs.h
+++ b/include/linux/tracefs.h
@@ -58,6 +58,8 @@  struct dentry *tracefs_create_instance_dir(const char *name, struct dentry *pare
 
 bool tracefs_initialized(void);
 
+extern const struct file_operations eventfs_show_dentry_fops;
+
 #endif /* CONFIG_TRACING */
 
 #endif
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 91951d038ba4..5b0cc40910b2 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -3639,6 +3639,9 @@  create_event_toplevel_files(struct dentry *parent, struct trace_array *tr)
 		return -ENOMEM;
 	}
 
+	trace_create_file("show_events_dentries", TRACE_MODE_READ, parent, d_events,
+			  &eventfs_show_dentry_fops);
+
 	error = eventfs_add_events_file("enable", TRACE_MODE_WRITE, d_events,
 				  tr, &ftrace_tr_enable_fops);
 	if (error)