[v2,12/19] lib/stackdepot: use list_head for stack record links
Commit Message
From: Andrey Konovalov <andreyknvl@google.com>
Switch stack_record to use list_head for links in the hash table
and in the freelist.
This will allow removing entries from the hash table buckets.
This is preparatory patch for implementing the eviction of stack records
from the stack depot.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
Changes v1->v2:
- Use list_head instead of open-coding backward links.
---
lib/stackdepot.c | 77 ++++++++++++++++++++++++++----------------------
1 file changed, 42 insertions(+), 35 deletions(-)
Comments
On 2023-09-13 19:14, andrey.konovalov@linux.dev wrote:
> From: Andrey Konovalov <andreyknvl@google.com>
>
> Switch stack_record to use list_head for links in the hash table
> and in the freelist.
>
> This will allow removing entries from the hash table buckets.
>
> This is preparatory patch for implementing the eviction of stack records
> from the stack depot.
>
> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
>
Building on an arm64 kernel from linux-next tag next-20230915, and boot
that in QEMU. I see the following kernel panic.
[ 67.398850][ T1] Unable to handle kernel read from unreadable memory at virtual address 0000000000000010
[ 67.407996][ T1] Mem abort info:
[ 67.411023][ T1] ESR = 0x0000000096000004
[ 67.414757][ T1] EC = 0x25: DABT (current EL), IL = 32 bits
[ 67.419945][ T1] SET = 0, FnV = 0
[ 67.423172][ T1] EA = 0, S1PTW = 0
[ 67.426669][ T1] FSC = 0x04: level 0 translation fault
[ 67.431357][ T1] Data abort info:
[ 67.434593][ T1] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 67.439801][ T1] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 67.444948][ T1] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 67.449910][ T1] [0000000000000010] user address but active_mm is swapper
[ 67.456236][ T1] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[ 67.462181][ T1] Modules linked in:
[ 67.465435][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G T 6.6.0-rc1-next-20230915 #2 e95cf19845fbc1e6a5f0694214d59e527e463469
[ 67.477126][ T1] Hardware name: linux,dummy-virt (DT)
[ 67.481994][ T1] pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 67.488454][ T1] pc : stack_depot_save_flags+0x2a8/0x780
[ 67.493348][ T1] lr : stack_depot_save_flags+0x2a8/0x780
[ 67.498339][ T1] sp : ffff80008000b870
[ 67.501670][ T1] x29: ffff80008000b870 x28: 00000000650dddc5 x27: 0000000000000000
[ 67.508658][ T1] x26: ffff80008470a000 x25: ffff80008000b9e8 x24: 0000000000000001
[ 67.515564][ T1] x23: 000000000000000e x22: ffff80008000b988 x21: 0000000000000001
[ 67.522430][ T1] x20: ffff00007b40d070 x19: 000000006ee80007 x18: ffff80008000d080
[ 67.529365][ T1] x17: 0000000000000000 x16: 0000000000000000 x15: 2030303178302f30
[ 67.536101][ T1] x14: 0000000000000000 x13: 205d315420202020 x12: 0000000000000000
[ 67.542985][ T1] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
[ 67.549863][ T1] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
[ 67.556764][ T1] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[ 67.563687][ T1] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
[ 67.570500][ T1] Call trace:
[ 67.573275][ T1] stack_depot_save_flags+0x2a8/0x780
[ 67.577794][ T1] stack_depot_save+0x4c/0xc0
[ 67.582062][ T1] ref_tracker_alloc+0x354/0x480
[ 67.586273][ T1] sk_alloc+0x280/0x5f8
[ 67.590064][ T1] __netlink_create+0x84/0x200
[ 67.594009][ T1] __netlink_kernel_create+0x11c/0x500
[ 67.598816][ T1] rtnetlink_net_init+0xc4/0x180
[ 67.603052][ T1] ops_init+0x100/0x2c0
[ 67.606827][ T1] register_pernet_operations+0x228/0x480
[ 67.611568][ T1] register_pernet_subsys+0x5c/0xc0
[ 67.616282][ T1] rtnetlink_init+0x60/0xb00
[ 67.620086][ T1] netlink_proto_init+0x374/0x400
[ 67.624465][ T1] do_one_initcall+0x2c8/0x840
[ 67.628518][ T1] do_initcalls+0x21c/0x340
[ 67.632527][ T1] kernel_init_freeable+0x3b0/0x480
[ 67.636905][ T1] kernel_init+0x58/0x380
[ 67.640768][ T1] ret_from_fork+0x10/0x40
[ 67.644606][ T1] Code: eb1b029f 540008c0 91004360 97caa437 (b9401360)
[ 67.650293][ T1] ---[ end trace 0000000000000000 ]---
[ 67.654948][ T1] Kernel panic - not syncing: Oops: Fatal exception
[ 67.660229][ T1] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
The full log can be found [1] and the .config file [2]. I bisected down
to this commit, see the bisect log [3].
When reverted these two commits I managed to build and the kernel
booted.
47590ecf1166 ("lib/stackdepot: use list_head for stack record links")
8729f3c26fc2 ("lib/stackdepot: allow users to evict stack traces")
Cheers,
Anders
[1] http://ix.io/4GyE
[2] https://people.linaro.org/~anders.roxell/next-20230915.config
[3] http://ix.io/4GyG
On Sat, 16 Sep 2023 19:43:35 +0200 Anders Roxell <anders.roxell@linaro.org> wrote:
> On 2023-09-13 19:14, andrey.konovalov@linux.dev wrote:
> > From: Andrey Konovalov <andreyknvl@google.com>
> >
> > Switch stack_record to use list_head for links in the hash table
> > and in the freelist.
> >
> > This will allow removing entries from the hash table buckets.
> >
> > This is preparatory patch for implementing the eviction of stack records
> > from the stack depot.
> >
> > Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
> >
>
> Building on an arm64 kernel from linux-next tag next-20230915, and boot
> that in QEMU. I see the following kernel panic.
>
> ...
>
> The full log can be found [1] and the .config file [2]. I bisected down
> to this commit, see the bisect log [3].
>
> When reverted these two commits I managed to build and the kernel
> booted.
>
> 47590ecf1166 ("lib/stackdepot: use list_head for stack record links")
> 8729f3c26fc2 ("lib/stackdepot: allow users to evict stack traces")
Thanks, I have dropped this v2 series.
On Sat, Sep 16, 2023 at 10:04 PM Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> On Sat, 16 Sep 2023 19:43:35 +0200 Anders Roxell <anders.roxell@linaro.org> wrote:
>
> > On 2023-09-13 19:14, andrey.konovalov@linux.dev wrote:
> > > From: Andrey Konovalov <andreyknvl@google.com>
> > >
> > > Switch stack_record to use list_head for links in the hash table
> > > and in the freelist.
> > >
> > > This will allow removing entries from the hash table buckets.
> > >
> > > This is preparatory patch for implementing the eviction of stack records
> > > from the stack depot.
> > >
> > > Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
> > >
> >
> > Building on an arm64 kernel from linux-next tag next-20230915, and boot
> > that in QEMU. I see the following kernel panic.
> >
> > ...
> >
> > The full log can be found [1] and the .config file [2]. I bisected down
> > to this commit, see the bisect log [3].
I am also seeing similar crashes on an x86 KMSAN build.
They are happening when in the following code:
list_for_each(pos, bucket) {
found = list_entry(pos, struct stack_record, list);
if (found->hash == hash &&
found->size == size &&
!stackdepot_memcmp(entries, found->entries, size))
return found;
}
`found` is NULL
@Andrey, could you please take a look?
On Mon, Oct 9, 2023 at 2:16 PM Alexander Potapenko <glider@google.com> wrote:
>
> On Sat, Sep 16, 2023 at 10:04 PM Andrew Morton
> <akpm@linux-foundation.org> wrote:
> >
> > On Sat, 16 Sep 2023 19:43:35 +0200 Anders Roxell <anders.roxell@linaro.org> wrote:
> >
> > > On 2023-09-13 19:14, andrey.konovalov@linux.dev wrote:
> > > > From: Andrey Konovalov <andreyknvl@google.com>
> > > >
> > > > Switch stack_record to use list_head for links in the hash table
> > > > and in the freelist.
> > > >
> > > > This will allow removing entries from the hash table buckets.
> > > >
> > > > This is preparatory patch for implementing the eviction of stack records
> > > > from the stack depot.
> > > >
> > > > Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
> > > >
> > >
> > > Building on an arm64 kernel from linux-next tag next-20230915, and boot
> > > that in QEMU. I see the following kernel panic.
> > >
> > > ...
> > >
> > > The full log can be found [1] and the .config file [2]. I bisected down
> > > to this commit, see the bisect log [3].
>
> I am also seeing similar crashes on an x86 KMSAN build.
>
> They are happening when in the following code:
>
> list_for_each(pos, bucket) {
> found = list_entry(pos, struct stack_record, list);
> if (found->hash == hash &&
> found->size == size &&
> !stackdepot_memcmp(entries, found->entries, size))
> return found;
> }
>
> `found` is NULL
>
> @Andrey, could you please take a look?
Found a bug, will fix in v3. Thank you!
@@ -18,6 +18,7 @@
#include <linux/jhash.h>
#include <linux/kernel.h>
#include <linux/kmsan.h>
+#include <linux/list.h>
#include <linux/mm.h>
#include <linux/mutex.h>
#include <linux/percpu.h>
@@ -55,7 +56,7 @@ union handle_parts {
};
struct stack_record {
- struct stack_record *next; /* Link in hash table or freelist */
+ struct list_head list; /* Links in hash table or freelist */
u32 hash; /* Hash in hash table */
u32 size; /* Number of stored frames */
union handle_parts handle;
@@ -77,21 +78,21 @@ static bool __stack_depot_early_init_passed __initdata;
/* Initial seed for jhash2. */
#define STACK_HASH_SEED 0x9747b28c
-/* Hash table of pointers to stored stack traces. */
-static struct stack_record **stack_table;
+/* Hash table of stored stack records. */
+static struct list_head *stack_table;
/* Fixed order of the number of table buckets. Used when KASAN is enabled. */
static unsigned int stack_bucket_number_order;
/* Hash mask for indexing the table. */
static unsigned int stack_hash_mask;
-/* Array of memory regions that store stack traces. */
+/* Array of memory regions that store stack records. */
static void *stack_pools[DEPOT_MAX_POOLS];
/* Newly allocated pool that is not yet added to stack_pools. */
static void *new_pool;
/* Number of pools in stack_pools. */
static int pools_num;
-/* Next stack in the freelist of stack records within stack_pools. */
-static struct stack_record *next_stack;
+/* Freelist of stack records within stack_pools. */
+static LIST_HEAD(free_stacks);
/*
* Stack depot tries to keep an extra pool allocated even before it runs out
* of space in the currently used pool. This flag marks whether this extra pool
@@ -123,6 +124,15 @@ void __init stack_depot_request_early_init(void)
__stack_depot_early_init_requested = true;
}
+/* Initialize list_head's within the hash table. */
+static void init_stack_table(unsigned long entries)
+{
+ unsigned long i;
+
+ for (i = 0; i < entries; i++)
+ INIT_LIST_HEAD(&stack_table[i]);
+}
+
/* Allocates a hash table via memblock. Can only be used during early boot. */
int __init stack_depot_early_init(void)
{
@@ -152,10 +162,10 @@ int __init stack_depot_early_init(void)
entries = 1UL << stack_bucket_number_order;
pr_info("allocating hash table via alloc_large_system_hash\n");
stack_table = alloc_large_system_hash("stackdepot",
- sizeof(struct stack_record *),
+ sizeof(struct list_head),
entries,
STACK_HASH_TABLE_SCALE,
- HASH_EARLY | HASH_ZERO,
+ HASH_EARLY,
NULL,
&stack_hash_mask,
1UL << STACK_BUCKET_NUMBER_ORDER_MIN,
@@ -165,6 +175,7 @@ int __init stack_depot_early_init(void)
stack_depot_disabled = true;
return -ENOMEM;
}
+ init_stack_table(entries);
return 0;
}
@@ -205,7 +216,7 @@ int stack_depot_init(void)
entries = 1UL << STACK_BUCKET_NUMBER_ORDER_MAX;
pr_info("allocating hash table of %lu entries via kvcalloc\n", entries);
- stack_table = kvcalloc(entries, sizeof(struct stack_record *), GFP_KERNEL);
+ stack_table = kvcalloc(entries, sizeof(struct list_head), GFP_KERNEL);
if (!stack_table) {
pr_err("hash table allocation failed, disabling\n");
stack_depot_disabled = true;
@@ -213,6 +224,7 @@ int stack_depot_init(void)
goto out_unlock;
}
stack_hash_mask = entries - 1;
+ init_stack_table(entries);
out_unlock:
mutex_unlock(&stack_depot_init_mutex);
@@ -224,30 +236,24 @@ EXPORT_SYMBOL_GPL(stack_depot_init);
/* Initializes a stack depol pool. */
static void depot_init_pool(void *pool)
{
- const int records_in_pool = DEPOT_POOL_SIZE / DEPOT_STACK_RECORD_SIZE;
- int i, offset;
+ int offset;
lockdep_assert_held_write(&pool_rwlock);
- /* Initialize handles and link stack records to each other. */
- for (i = 0, offset = 0;
- offset <= DEPOT_POOL_SIZE - DEPOT_STACK_RECORD_SIZE;
- i++, offset += DEPOT_STACK_RECORD_SIZE) {
+ WARN_ON(!list_empty(&free_stacks));
+
+ /* Initialize handles and link stack records into the freelist. */
+ for (offset = 0; offset <= DEPOT_POOL_SIZE - DEPOT_STACK_RECORD_SIZE;
+ offset += DEPOT_STACK_RECORD_SIZE) {
struct stack_record *stack = pool + offset;
stack->handle.pool_index = pools_num;
stack->handle.offset = offset >> DEPOT_STACK_ALIGN;
stack->handle.extra = 0;
- if (i < records_in_pool - 1)
- stack->next = (void *)stack + DEPOT_STACK_RECORD_SIZE;
- else
- stack->next = NULL;
+ list_add(&stack->list, &free_stacks);
}
- /* Link stack records into the freelist. */
- WARN_ON(next_stack);
- next_stack = pool;
/* Save reference to the pool to be used by depot_fetch_stack. */
stack_pools[pools_num] = pool;
@@ -289,7 +295,7 @@ static bool depot_update_pools(void **prealloc)
lockdep_assert_held_write(&pool_rwlock);
/* Check if we still have objects in the freelist. */
- if (next_stack)
+ if (!list_empty(&free_stacks))
goto out_keep_prealloc;
/* Check if we have a new pool saved and use it. */
@@ -340,19 +346,18 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
return NULL;
/* Check if we have a stack record to save the stack trace. */
- stack = next_stack;
- if (!stack)
+ if (list_empty(&free_stacks))
return NULL;
- /* Advance the freelist. */
- next_stack = stack->next;
+ /* Get and unlink the first entry from the freelist. */
+ stack = list_first_entry(&free_stacks, struct stack_record, list);
+ list_del(&stack->list);
/* Limit number of saved frames to CONFIG_STACKDEPOT_MAX_FRAMES. */
if (size > CONFIG_STACKDEPOT_MAX_FRAMES)
size = CONFIG_STACKDEPOT_MAX_FRAMES;
/* Save the stack trace. */
- stack->next = NULL;
stack->hash = hash;
stack->size = size;
/* stack->handle is already filled in by depot_init_pool. */
@@ -414,15 +419,17 @@ int stackdepot_memcmp(const unsigned long *u1, const unsigned long *u2,
}
/* Finds a stack in a bucket of the hash table. */
-static inline struct stack_record *find_stack(struct stack_record *bucket,
+static inline struct stack_record *find_stack(struct list_head *bucket,
unsigned long *entries, int size,
u32 hash)
{
+ struct list_head *pos;
struct stack_record *found;
lockdep_assert_held(&pool_rwlock);
- for (found = bucket; found; found = found->next) {
+ list_for_each(pos, bucket) {
+ found = list_entry(pos, struct stack_record, list);
if (found->hash == hash &&
found->size == size &&
!stackdepot_memcmp(entries, found->entries, size))
@@ -435,7 +442,8 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
unsigned int nr_entries,
gfp_t alloc_flags, bool can_alloc)
{
- struct stack_record *found = NULL, **bucket;
+ struct list_head *bucket;
+ struct stack_record *found = NULL;
depot_stack_handle_t handle = 0;
struct page *page = NULL;
void *prealloc = NULL;
@@ -462,7 +470,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
read_lock_irqsave(&pool_rwlock, flags);
/* Fast path: look the stack trace up without full locking. */
- found = find_stack(*bucket, entries, nr_entries, hash);
+ found = find_stack(bucket, entries, nr_entries, hash);
if (found) {
read_unlock_irqrestore(&pool_rwlock, flags);
goto exit;
@@ -494,14 +502,13 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
write_lock_irqsave(&pool_rwlock, flags);
- found = find_stack(*bucket, entries, nr_entries, hash);
+ found = find_stack(bucket, entries, nr_entries, hash);
if (!found) {
struct stack_record *new =
depot_alloc_stack(entries, nr_entries, hash, &prealloc);
if (new) {
- new->next = *bucket;
- *bucket = new;
+ list_add(&new->list, bucket);
found = new;
}
} else if (prealloc) {