[mm,2/4] kasan: handle concurrent kasan_record_aux_stack calls

Message ID 432a89fafce11244287c8af757e73a2eb22a5354.1702339432.git.andreyknvl@google.com
State New
Headers
Series lib/stackdepot, kasan: fixes for stack eviction series |

Commit Message

andrey.konovalov@linux.dev Dec. 12, 2023, 12:14 a.m. UTC
  From: Andrey Konovalov <andreyknvl@google.com>

kasan_record_aux_stack can be called concurrently on the same object.
This might lead to a race condition when rotating the saved aux stack
trace handles.

Fix by introducing a spinlock to protect the aux stack trace handles
in kasan_record_aux_stack.

Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Reported-by: syzbot+186b55175d8360728234@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000784b1c060b0074a2@google.com/
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>

---

This can be squashed into "kasan: use stack_depot_put for Generic mode"
or left standalone.
---
 mm/kasan/generic.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)
  

Comments

Marco Elver Dec. 12, 2023, 7:28 p.m. UTC | #1
On Tue, 12 Dec 2023 at 01:14, <andrey.konovalov@linux.dev> wrote:
>
> From: Andrey Konovalov <andreyknvl@google.com>
>
> kasan_record_aux_stack can be called concurrently on the same object.
> This might lead to a race condition when rotating the saved aux stack
> trace handles.
>
> Fix by introducing a spinlock to protect the aux stack trace handles
> in kasan_record_aux_stack.
>
> Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
> Reported-by: syzbot+186b55175d8360728234@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/000000000000784b1c060b0074a2@google.com/
> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
>
> ---
>
> This can be squashed into "kasan: use stack_depot_put for Generic mode"
> or left standalone.
> ---
>  mm/kasan/generic.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
> index 54e20b2bc3e1..ca5c75a1866c 100644
> --- a/mm/kasan/generic.c
> +++ b/mm/kasan/generic.c
> @@ -25,6 +25,7 @@
>  #include <linux/sched.h>
>  #include <linux/sched/task_stack.h>
>  #include <linux/slab.h>
> +#include <linux/spinlock.h>
>  #include <linux/stackdepot.h>
>  #include <linux/stacktrace.h>
>  #include <linux/string.h>
> @@ -35,6 +36,8 @@
>  #include "kasan.h"
>  #include "../slab.h"
>
> +DEFINE_SPINLOCK(aux_lock);

No, please don't.

>  /*
>   * All functions below always inlined so compiler could
>   * perform better optimizations in each of __asan_loadX/__assn_storeX
> @@ -502,6 +505,8 @@ static void __kasan_record_aux_stack(void *addr, depot_flags_t depot_flags)
>         struct kmem_cache *cache;
>         struct kasan_alloc_meta *alloc_meta;
>         void *object;
> +       depot_stack_handle_t new_handle, old_handle;
> +       unsigned long flags;
>
>         if (is_kfence_address(addr) || !slab)
>                 return;
> @@ -512,9 +517,15 @@ static void __kasan_record_aux_stack(void *addr, depot_flags_t depot_flags)
>         if (!alloc_meta)
>                 return;
>
> -       stack_depot_put(alloc_meta->aux_stack[1]);
> +       new_handle = kasan_save_stack(0, depot_flags);
> +
> +       spin_lock_irqsave(&aux_lock, flags);

This is a unnecessary global lock. What's the problem here? As far as
I can understand a race is possible where we may end up with
duplicated or lost stack handles.

Since storing this information is best effort anyway, and bugs are
rare, a global lock protecting this is overkill.

I'd just accept the racyness and use READ_ONCE() / WRITE_ONCE() just
to make sure we don't tear any reads/writes and the depot handles are
valid. There are other more complex schemes [1], but I think they are
overkill as well.

[1]: Since a depot stack handle is just an u32, we can have a

 union {
   depot_stack_handle_t handles[2];
   atomic64_t atomic_handle;
  } aux_stack;
(BUILD_BUG_ON somewhere if sizeof handles and atomic_handle mismatch.)

Then in the code here create the same union and load atomic_handle.
Swap handle[1] into handle[0] and write the new one in handles[1].
Then do a cmpxchg loop to store the new atomic_handle.

> +       old_handle = alloc_meta->aux_stack[1];
>         alloc_meta->aux_stack[1] = alloc_meta->aux_stack[0];
> -       alloc_meta->aux_stack[0] = kasan_save_stack(0, depot_flags);
> +       alloc_meta->aux_stack[0] = new_handle;
> +       spin_unlock_irqrestore(&aux_lock, flags);
> +
> +       stack_depot_put(old_handle);
>  }
>
>  void kasan_record_aux_stack(void *addr)
> --
> 2.25.1
>
  
Andrey Konovalov Dec. 13, 2023, 2:40 p.m. UTC | #2
On Tue, Dec 12, 2023 at 8:29 PM Marco Elver <elver@google.com> wrote:
>
> > -       stack_depot_put(alloc_meta->aux_stack[1]);
> > +       new_handle = kasan_save_stack(0, depot_flags);
> > +
> > +       spin_lock_irqsave(&aux_lock, flags);
>
> This is a unnecessary global lock. What's the problem here? As far as
> I can understand a race is possible where we may end up with
> duplicated or lost stack handles.

Yes, this is the problem. And this leads to refcount underflows in the
stack depot code, as we fail to keep precise track of the stack
traces.

> Since storing this information is best effort anyway, and bugs are
> rare, a global lock protecting this is overkill.
>
> I'd just accept the racyness and use READ_ONCE() / WRITE_ONCE() just
> to make sure we don't tear any reads/writes and the depot handles are
> valid.

This will help with the potential tears but will not help with the
refcount issues.

> There are other more complex schemes [1], but I think they are
> overkill as well.
>
> [1]: Since a depot stack handle is just an u32, we can have a
>
>  union {
>    depot_stack_handle_t handles[2];
>    atomic64_t atomic_handle;
>   } aux_stack;
> (BUILD_BUG_ON somewhere if sizeof handles and atomic_handle mismatch.)
>
> Then in the code here create the same union and load atomic_handle.
> Swap handle[1] into handle[0] and write the new one in handles[1].
> Then do a cmpxchg loop to store the new atomic_handle.

This approach should work. If you prefer, I can do this instead of a spinlock.

But we do need some kind of atomicity while rotating the aux handles
to make sure nothing gets lost.

Thanks!
  
Marco Elver Dec. 13, 2023, 4:50 p.m. UTC | #3
On Wed, 13 Dec 2023 at 15:40, Andrey Konovalov <andreyknvl@gmail.com> wrote:
>
> On Tue, Dec 12, 2023 at 8:29 PM Marco Elver <elver@google.com> wrote:
> >
> > > -       stack_depot_put(alloc_meta->aux_stack[1]);
> > > +       new_handle = kasan_save_stack(0, depot_flags);
> > > +
> > > +       spin_lock_irqsave(&aux_lock, flags);
> >
> > This is a unnecessary global lock. What's the problem here? As far as
> > I can understand a race is possible where we may end up with
> > duplicated or lost stack handles.
>
> Yes, this is the problem. And this leads to refcount underflows in the
> stack depot code, as we fail to keep precise track of the stack
> traces.
>
> > Since storing this information is best effort anyway, and bugs are
> > rare, a global lock protecting this is overkill.
> >
> > I'd just accept the racyness and use READ_ONCE() / WRITE_ONCE() just
> > to make sure we don't tear any reads/writes and the depot handles are
> > valid.
>
> This will help with the potential tears but will not help with the
> refcount issues.
>
> > There are other more complex schemes [1], but I think they are
> > overkill as well.
> >
> > [1]: Since a depot stack handle is just an u32, we can have a
> >
> >  union {
> >    depot_stack_handle_t handles[2];
> >    atomic64_t atomic_handle;
> >   } aux_stack;
> > (BUILD_BUG_ON somewhere if sizeof handles and atomic_handle mismatch.)
> >
> > Then in the code here create the same union and load atomic_handle.
> > Swap handle[1] into handle[0] and write the new one in handles[1].
> > Then do a cmpxchg loop to store the new atomic_handle.
>
> This approach should work. If you prefer, I can do this instead of a spinlock.
>
> But we do need some kind of atomicity while rotating the aux handles
> to make sure nothing gets lost.

Yes, I think that'd be preferable. Although note that not all 32-bit
architectures have 64-bit atomics, so that may be an issue. Another
alternative is to have a spinlock next to the aux_stack (it needs to
be initialized properly). It'll use up a little more space, but that's
for KASAN configs only, so I think it's ok. Certainly better than a
global lock.
  
Andrey Konovalov Dec. 14, 2023, 12:41 a.m. UTC | #4
On Wed, Dec 13, 2023 at 5:51 PM Marco Elver <elver@google.com> wrote:
>
> > > [1]: Since a depot stack handle is just an u32, we can have a
> > >
> > >  union {
> > >    depot_stack_handle_t handles[2];
> > >    atomic64_t atomic_handle;
> > >   } aux_stack;
> > > (BUILD_BUG_ON somewhere if sizeof handles and atomic_handle mismatch.)
> > >
> > > Then in the code here create the same union and load atomic_handle.
> > > Swap handle[1] into handle[0] and write the new one in handles[1].
> > > Then do a cmpxchg loop to store the new atomic_handle.
> >
> > This approach should work. If you prefer, I can do this instead of a spinlock.
> >
> > But we do need some kind of atomicity while rotating the aux handles
> > to make sure nothing gets lost.
>
> Yes, I think that'd be preferable. Although note that not all 32-bit
> architectures have 64-bit atomics, so that may be an issue. Another
> alternative is to have a spinlock next to the aux_stack (it needs to
> be initialized properly). It'll use up a little more space, but that's
> for KASAN configs only, so I think it's ok. Certainly better than a
> global lock.

Ah, hm, actually this is what I indented to do with this change. But
somehow my brain glitched out and decided to use a global lock :)

I'll change this into a local spinlock in v2.

Thank you!
  

Patch

diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index 54e20b2bc3e1..ca5c75a1866c 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -25,6 +25,7 @@ 
 #include <linux/sched.h>
 #include <linux/sched/task_stack.h>
 #include <linux/slab.h>
+#include <linux/spinlock.h>
 #include <linux/stackdepot.h>
 #include <linux/stacktrace.h>
 #include <linux/string.h>
@@ -35,6 +36,8 @@ 
 #include "kasan.h"
 #include "../slab.h"
 
+DEFINE_SPINLOCK(aux_lock);
+
 /*
  * All functions below always inlined so compiler could
  * perform better optimizations in each of __asan_loadX/__assn_storeX
@@ -502,6 +505,8 @@  static void __kasan_record_aux_stack(void *addr, depot_flags_t depot_flags)
 	struct kmem_cache *cache;
 	struct kasan_alloc_meta *alloc_meta;
 	void *object;
+	depot_stack_handle_t new_handle, old_handle;
+	unsigned long flags;
 
 	if (is_kfence_address(addr) || !slab)
 		return;
@@ -512,9 +517,15 @@  static void __kasan_record_aux_stack(void *addr, depot_flags_t depot_flags)
 	if (!alloc_meta)
 		return;
 
-	stack_depot_put(alloc_meta->aux_stack[1]);
+	new_handle = kasan_save_stack(0, depot_flags);
+
+	spin_lock_irqsave(&aux_lock, flags);
+	old_handle = alloc_meta->aux_stack[1];
 	alloc_meta->aux_stack[1] = alloc_meta->aux_stack[0];
-	alloc_meta->aux_stack[0] = kasan_save_stack(0, depot_flags);
+	alloc_meta->aux_stack[0] = new_handle;
+	spin_unlock_irqrestore(&aux_lock, flags);
+
+	stack_depot_put(old_handle);
 }
 
 void kasan_record_aux_stack(void *addr)