[6/6] eventfs: clean up dentry ops and add revalidate function
Commit Message
In order for the dentries to stay up-to-date with the eventfs changes,
just add a 'd_revalidate' function that checks the 'is_freed' bit.
Also, clean up the dentry release to actually use d_release() rather
than the slightly odd d_iput() function. We don't care about the inode,
all we want to do is to get rid of the refcount to the eventfs data
added by dentry->d_fsdata.
It would probably be cleaner to make eventfs its own filesystem, or at
least set its own dentry ops when looking up eventfs files. But as it
is, only eventfs dentries use d_fsdata, so we don't really need to split
these things up by use.
Another thing that might be worth doing is to make all eventfs lookups
mark their dentries as not worth caching. We could do that with
d_delete(), but the DCACHE_DONTCACHE flag would likely be even better.
As it is, the dentries are all freeable, but they only tend to get freed
at memory pressure rather than more proactively. But that's a separate
issue.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
fs/tracefs/event_inode.c | 21 +++++----------------
fs/tracefs/inode.c | 27 ++++++++++++++++++---------
fs/tracefs/internal.h | 3 ++-
3 files changed, 25 insertions(+), 26 deletions(-)
Comments
On Tue, 30 Jan 2024 at 13:25, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> I actually find the dentry's sticking around when their ref counts go to
> zero a feature. Tracing applications will open and close the eventfs files
> many times. If the dentry were to be deleted on every close, it would need
> to be create over again in short spurts.
Sure. I think this is literally a tuning thing, and we'll see if
anybody ever says "the dentry cache grows too much with tracefs".
Linus
On Tue, 30 Jan 2024 at 18:46, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> What's to stop ->d_revalidate() from being called in parallel with
> __dentry_kill()?
Oh, you're right.
For some reason I thought we did the d_release() _after_ the RCU grace
period, but we don't.
Why don't we, btw? It would be so much better if we did the
d_release() from __d_free(). We have all that smarts in fs/dcache.c to
decide if we need to RCU-delay it or not, and then we don't let
filesystems use it.
I assume the reason is that some 'd_delete' cases might want to sleep
etc. Still, for things like this that just want to release memory, it
would be *much* better to have d_release called when the dentry is
actually released.
Hmm. Not very many d_delete cases, but I did see a couple that
definitely want process context (dma_buf_release goes to things that
do vfree() etc).
So I guess the "make low-level filesystems do their own kfree_rcu() is
what we're doing.
In this case it's as simple as doing that
- kfree(ei);
+ kfree_rcu(ei, rcu);
and we'd just make the rcu entry a union with something that isn't
that 'is_freed' field so that it doesn't take more space.
Linus
On Wed, 31 Jan 2024 13:00:39 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> On Tue, 30 Jan 2024 11:03:55 -0800
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> > It would probably be cleaner to make eventfs its own filesystem, or at
> > least set its own dentry ops when looking up eventfs files. But as it
> > is, only eventfs dentries use d_fsdata, so we don't really need to split
> > these things up by use.
>
> BTW, I have been thinking about making eventfs a completely separate file
> system that could be mounted outside of tracefs. One that is readonly that
> only contains the "id" and "format" files and nothing more.
>
> Why? Because perf and powertop both use those files to know how to parse
> the raw event formats. I don't think there's anything in there that
> requires root privileges to read. They should not be exposing any internal
> kernel information besides the event format layouts, and it would be nice
> to have a /sys/kernel/events directory that only had that.
That's a good idea! So maybe we can allow perf to read it without root user.
>
> Making eventfs a separate file system where, when added to tracefs, has the
> control files for the specific trace_array, but for the /sys/kernel
> directory, only cares about the trace format files.
>
> Then tracefs could be nicely converted over to kernfs, and eventfs would be
> its own entity.
If so, maybe we can just make symlinks to the 'id' and 'format' from events
under tracefs? :)
Thank you,
>
> -- Steve
@@ -414,23 +414,14 @@ static inline struct eventfs_inode *alloc_ei(const char *name)
/**
- * eventfs_set_ei_status_free - remove the dentry reference from an eventfs_inode
- * @ti: the tracefs_inode of the dentry
+ * eventfs_d_release - dentry is going away
* @dentry: dentry which has the reference to remove.
*
* Remove the association between a dentry from an eventfs_inode.
*/
-void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry)
+void eventfs_d_release(struct dentry *dentry)
{
- struct eventfs_inode *ei;
-
- mutex_lock(&eventfs_mutex);
- ei = dentry->d_fsdata;
- if (ei) {
- dentry->d_fsdata = NULL;
- put_ei(ei);
- }
- mutex_unlock(&eventfs_mutex);
+ put_ei(dentry->d_fsdata);
}
/**
@@ -517,9 +508,8 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
}
enoent:
- /* Nothing found? */
- d_add(dentry, NULL);
- result = NULL;
+ /* Don't cache negative lookups, just return an error */
+ result = ERR_PTR(-ENOENT);
out:
mutex_unlock(&eventfs_mutex);
@@ -857,6 +847,5 @@ void eventfs_remove_events_dir(struct eventfs_inode *ei)
* sticks around while the other ei->dentry are created
* and destroyed dynamically.
*/
- simple_recursive_removal(dentry, NULL);
dput(dentry);
}
@@ -379,21 +379,30 @@ static const struct super_operations tracefs_super_operations = {
.show_options = tracefs_show_options,
};
-static void tracefs_dentry_iput(struct dentry *dentry, struct inode *inode)
+/*
+ * It would be cleaner if eventfs had its own dentry ops.
+ *
+ * Note that d_revalidate is called potentially under RCU,
+ * so it can't take the eventfs mutex etc. It's fine - if
+ * we open a file just as it's marked dead, things will
+ * still work just fine, and just see the old stale case.
+ */
+static void tracefs_d_release(struct dentry *dentry)
{
- struct tracefs_inode *ti;
+ if (dentry->d_fsdata)
+ eventfs_d_release(dentry);
+}
- if (!dentry || !inode)
- return;
+static int tracefs_d_revalidate(struct dentry *dentry, unsigned int flags)
+{
+ struct eventfs_inode *ei = dentry->d_fsdata;
- ti = get_tracefs(inode);
- if (ti && ti->flags & TRACEFS_EVENT_INODE)
- eventfs_set_ei_status_free(ti, dentry);
- iput(inode);
+ return !(ei && ei->is_freed);
}
static const struct dentry_operations tracefs_dentry_operations = {
- .d_iput = tracefs_dentry_iput,
+ .d_revalidate = tracefs_d_revalidate,
+ .d_release = tracefs_d_release,
};
static int trace_fill_super(struct super_block *sb, void *data, int silent)
@@ -79,6 +79,7 @@ struct inode *tracefs_get_inode(struct super_block *sb);
struct dentry *eventfs_start_creating(const char *name, struct dentry *parent);
struct dentry *eventfs_failed_creating(struct dentry *dentry);
struct dentry *eventfs_end_creating(struct dentry *dentry);
-void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry);
+
+void eventfs_d_release(struct dentry *dentry);
#endif /* _TRACEFS_INTERNAL_H */