[v4,17/22] lib/stackdepot: allow users to evict stack traces

Message ID 1d1ad5692ee43d4fc2b3fd9d221331d30b36123f.1700502145.git.andreyknvl@google.com
State New
Headers
Series stackdepot: allow evicting stack traces |

Commit Message

andrey.konovalov@linux.dev Nov. 20, 2023, 5:47 p.m. UTC
  From: Andrey Konovalov <andreyknvl@google.com>

Add stack_depot_put, a function that decrements the reference counter
on a stack record and removes it from the stack depot once the counter
reaches 0.

Internally, when removing a stack record, the function unlinks it from
the hash table bucket and returns to the freelist.

With this change, the users of stack depot can call stack_depot_put
when keeping a stack trace in the stack depot is not needed anymore.
This allows avoiding polluting the stack depot with irrelevant stack
traces and thus have more space to store the relevant ones before the
stack depot reaches its capacity.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>

---

Changes v1->v2:
- Comments fixes as suggested by Marco.
- Add lockdep_assert annotation.
- Adapt to using list_head's.
- Rename stack_depot_evict to stack_depot_put.
---
 include/linux/stackdepot.h | 14 ++++++++++++++
 lib/stackdepot.c           | 37 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 50 insertions(+), 1 deletion(-)
  

Comments

Oscar Salvador Jan. 4, 2024, 8:52 a.m. UTC | #1
On Mon, Nov 20, 2023 at 06:47:15PM +0100, andrey.konovalov@linux.dev wrote:
> From: Andrey Konovalov <andreyknvl@google.com>
> 
> Add stack_depot_put, a function that decrements the reference counter
> on a stack record and removes it from the stack depot once the counter
> reaches 0.
> 
> Internally, when removing a stack record, the function unlinks it from
> the hash table bucket and returns to the freelist.
> 
> With this change, the users of stack depot can call stack_depot_put
> when keeping a stack trace in the stack depot is not needed anymore.
> This allows avoiding polluting the stack depot with irrelevant stack
> traces and thus have more space to store the relevant ones before the
> stack depot reaches its capacity.
> 
> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>

I yet have to review the final bits of this series, but I'd like to
comment on something below

  
> +void stack_depot_put(depot_stack_handle_t handle)
> +{
> +	struct stack_record *stack;
> +	unsigned long flags;
> +
> +	if (!handle || stack_depot_disabled)
> +		return;
> +
> +	write_lock_irqsave(&pool_rwlock, flags);
> +
> +	stack = depot_fetch_stack(handle);
> +	if (WARN_ON(!stack))
> +		goto out;
> +
> +	if (refcount_dec_and_test(&stack->count)) {
> +		/* Unlink stack from the hash table. */
> +		list_del(&stack->list);
> +
> +		/* Free stack. */
> +		depot_free_stack(stack);

It would be great if stack_depot_put would also accept a boolean,
which would determine whether we want to erase the stack or not.

For the feature I'm working on page_ower [1], I need to keep track
of how many times we allocated/freed from a certain path, which may
expose a potential leak, and I was using the refcount to do that,
but I don't want the record to be erased, because this new
functionality won't be exclusive with the existing one.

e.g:  you can check /sys/kernel/debug/page_owner AND
/sys/kernel/debug/page_owner_stacks

So, while the new functionaliy won't care if a record has been erased,
the old one will, so information will be lost.

[1] https://patchwork.kernel.org/project/linux-mm/cover/20231120084300.4368-1-osalvador@suse.de/
  
Marco Elver Jan. 4, 2024, 9:25 a.m. UTC | #2
On Thu, 4 Jan 2024 at 09:52, Oscar Salvador <osalvador@suse.de> wrote:
>
> On Mon, Nov 20, 2023 at 06:47:15PM +0100, andrey.konovalov@linux.dev wrote:
> > From: Andrey Konovalov <andreyknvl@google.com>
> >
> > Add stack_depot_put, a function that decrements the reference counter
> > on a stack record and removes it from the stack depot once the counter
> > reaches 0.
> >
> > Internally, when removing a stack record, the function unlinks it from
> > the hash table bucket and returns to the freelist.
> >
> > With this change, the users of stack depot can call stack_depot_put
> > when keeping a stack trace in the stack depot is not needed anymore.
> > This allows avoiding polluting the stack depot with irrelevant stack
> > traces and thus have more space to store the relevant ones before the
> > stack depot reaches its capacity.
> >
> > Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
>
> I yet have to review the final bits of this series, but I'd like to
> comment on something below
>
>
> > +void stack_depot_put(depot_stack_handle_t handle)
> > +{
> > +     struct stack_record *stack;
> > +     unsigned long flags;
> > +
> > +     if (!handle || stack_depot_disabled)
> > +             return;
> > +
> > +     write_lock_irqsave(&pool_rwlock, flags);
> > +
> > +     stack = depot_fetch_stack(handle);
> > +     if (WARN_ON(!stack))
> > +             goto out;
> > +
> > +     if (refcount_dec_and_test(&stack->count)) {
> > +             /* Unlink stack from the hash table. */
> > +             list_del(&stack->list);
> > +
> > +             /* Free stack. */
> > +             depot_free_stack(stack);
>
> It would be great if stack_depot_put would also accept a boolean,
> which would determine whether we want to erase the stack or not.

I think a boolean makes the interface more confusing for everyone
else. At that point stack_depot_put merely decrements the refcount and
becomes a wrapper around refcount_dec, right?

I think you want to expose the stack_record struct anyway for your
series, so why not simply avoid calling stack_depot_put and decrement
the refcount with your own helper (there needs to be a new stackdepot
function to return a stack_record under the pool_rwlock held as
reader).

Also, you need to ensure noone else calls stack_depot_put on the stack
traces you want to keep. If there is a risk someone else may call
stack_depot_put on them, it obviously won't work (I think the only
option then is to introduce a way to pin stacks).


> For the feature I'm working on page_ower [1], I need to keep track
> of how many times we allocated/freed from a certain path, which may
> expose a potential leak, and I was using the refcount to do that,
> but I don't want the record to be erased, because this new
> functionality won't be exclusive with the existing one.
>
> e.g:  you can check /sys/kernel/debug/page_owner AND
> /sys/kernel/debug/page_owner_stacks
>
> So, while the new functionaliy won't care if a record has been erased,
> the old one will, so information will be lost.
>
> [1] https://patchwork.kernel.org/project/linux-mm/cover/20231120084300.4368-1-osalvador@suse.de/
>
>
>
> --
> Oscar Salvador
> SUSE Labs
  
Oscar Salvador Jan. 4, 2024, 10:19 a.m. UTC | #3
On Thu, Jan 04, 2024 at 10:25:40AM +0100, Marco Elver wrote:
> I think a boolean makes the interface more confusing for everyone
> else. At that point stack_depot_put merely decrements the refcount and
> becomes a wrapper around refcount_dec, right?

Thanks Marco for the feedback.

Fair enough.

> I think you want to expose the stack_record struct anyway for your
> series, so why not simply avoid calling stack_depot_put and decrement
> the refcount with your own helper (there needs to be a new stackdepot
> function to return a stack_record under the pool_rwlock held as
> reader).

Yeah, that was something I was experimenting with my last version.
See [0], I moved the "stack_record" struct into the header so page_owner
can make sense of it. I guess that's fine right?
If so, I'd do as you mentioned, just decrementing it with my own helper
so no calls to stack_depot_put will be needed.

Regarding the locking, I yet have to check the patch that implements
the read/write lock, but given that page_owner won't be evicting
anything, do I still have to fiddle with the locks?

> Also, you need to ensure noone else calls stack_depot_put on the stack
> traces you want to keep. If there is a risk someone else may call
> stack_depot_put on them, it obviously won't work (I think the only
> option then is to introduce a way to pin stacks).

Well, since page_owner won't call stack_depot_put, I don't see
how someone else would be able to interfere there, so I think
I am safe there.

[0] https://patchwork.kernel.org/project/linux-mm/patch/20231120084300.4368-3-osalvador@suse.de/
  
Marco Elver Jan. 4, 2024, 10:42 a.m. UTC | #4
On Thu, 4 Jan 2024 at 11:18, Oscar Salvador <osalvador@suse.de> wrote:
>
> On Thu, Jan 04, 2024 at 10:25:40AM +0100, Marco Elver wrote:
> > I think a boolean makes the interface more confusing for everyone
> > else. At that point stack_depot_put merely decrements the refcount and
> > becomes a wrapper around refcount_dec, right?
>
> Thanks Marco for the feedback.
>
> Fair enough.
>
> > I think you want to expose the stack_record struct anyway for your
> > series, so why not simply avoid calling stack_depot_put and decrement
> > the refcount with your own helper (there needs to be a new stackdepot
> > function to return a stack_record under the pool_rwlock held as
> > reader).
>
> Yeah, that was something I was experimenting with my last version.
> See [0], I moved the "stack_record" struct into the header so page_owner
> can make sense of it. I guess that's fine right?

Not exposing the internals would be better, but at this point I think
your usecase looks like it's doing something that is somewhat out of
the bounds of the stackdepot API. I also don't see that it makes sense
to add more helpers to the stackdepot API to deal with such special
cases.

As such, I'd reason that it's ok to expose the struct for this special usecase.

> If so, I'd do as you mentioned, just decrementing it with my own helper
> so no calls to stack_depot_put will be needed.
>
> Regarding the locking, I yet have to check the patch that implements
> the read/write lock, but given that page_owner won't be evicting
> anything, do I still have to fiddle with the locks?

You need to grab the lock as a reader to fetch the stack_record and
return it. All that should be hidden behind whatever function you'll
introduce to return the stack_record (stack_depot_find()?).

> > Also, you need to ensure noone else calls stack_depot_put on the stack
> > traces you want to keep. If there is a risk someone else may call
> > stack_depot_put on them, it obviously won't work (I think the only
> > option then is to introduce a way to pin stacks).
>
> Well, since page_owner won't call stack_depot_put, I don't see
> how someone else would be able to interfere there, so I think
> I am safe there.
>
> [0] https://patchwork.kernel.org/project/linux-mm/patch/20231120084300.4368-3-osalvador@suse.de/
>
> --
> Oscar Salvador
> SUSE Labs
  

Patch

diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index 611716702d73..a6796f178913 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -97,6 +97,8 @@  static inline int stack_depot_early_init(void)	{ return 0; }
  *
  * If STACK_DEPOT_FLAG_GET is set in @depot_flags, stack depot will increment
  * the refcount on the saved stack trace if it already exists in stack depot.
+ * Users of this flag must also call stack_depot_put() when keeping the stack
+ * trace is no longer required to avoid overflowing the refcount.
  *
  * If the provided stack trace comes from the interrupt context, only the part
  * up to the interrupt entry is saved.
@@ -162,6 +164,18 @@  void stack_depot_print(depot_stack_handle_t stack);
 int stack_depot_snprint(depot_stack_handle_t handle, char *buf, size_t size,
 		       int spaces);
 
+/**
+ * stack_depot_put - Drop a reference to a stack trace from stack depot
+ *
+ * @handle:	Stack depot handle returned from stack_depot_save()
+ *
+ * The stack trace is evicted from stack depot once all references to it have
+ * been dropped (once the number of stack_depot_evict() calls matches the
+ * number of stack_depot_save_flags() calls with STACK_DEPOT_FLAG_GET set for
+ * this stack trace).
+ */
+void stack_depot_put(depot_stack_handle_t handle);
+
 /**
  * stack_depot_set_extra_bits - Set extra bits in a stack depot handle
  *
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 911dee11bf39..c1b31160f4b4 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -394,7 +394,7 @@  static struct stack_record *depot_fetch_stack(depot_stack_handle_t handle)
 	size_t offset = parts.offset << DEPOT_STACK_ALIGN;
 	struct stack_record *stack;
 
-	lockdep_assert_held_read(&pool_rwlock);
+	lockdep_assert_held(&pool_rwlock);
 
 	if (parts.pool_index > pools_num) {
 		WARN(1, "pool index %d out of bounds (%d) for stack id %08x\n",
@@ -410,6 +410,14 @@  static struct stack_record *depot_fetch_stack(depot_stack_handle_t handle)
 	return stack;
 }
 
+/* Links stack into the freelist. */
+static void depot_free_stack(struct stack_record *stack)
+{
+	lockdep_assert_held_write(&pool_rwlock);
+
+	list_add(&stack->list, &free_stacks);
+}
+
 /* Calculates the hash for a stack. */
 static inline u32 hash_stack(unsigned long *entries, unsigned int size)
 {
@@ -592,6 +600,33 @@  unsigned int stack_depot_fetch(depot_stack_handle_t handle,
 }
 EXPORT_SYMBOL_GPL(stack_depot_fetch);
 
+void stack_depot_put(depot_stack_handle_t handle)
+{
+	struct stack_record *stack;
+	unsigned long flags;
+
+	if (!handle || stack_depot_disabled)
+		return;
+
+	write_lock_irqsave(&pool_rwlock, flags);
+
+	stack = depot_fetch_stack(handle);
+	if (WARN_ON(!stack))
+		goto out;
+
+	if (refcount_dec_and_test(&stack->count)) {
+		/* Unlink stack from the hash table. */
+		list_del(&stack->list);
+
+		/* Free stack. */
+		depot_free_stack(stack);
+	}
+
+out:
+	write_unlock_irqrestore(&pool_rwlock, flags);
+}
+EXPORT_SYMBOL_GPL(stack_depot_put);
+
 void stack_depot_print(depot_stack_handle_t stack)
 {
 	unsigned long *entries;