LoongArch: Check unwind_error() in arch_stack_walk()

Message ID 1679380154-20308-1-git-send-email-yangtiezhu@loongson.cn
State New
Series LoongArch: Check unwind_error() in arch_stack_walk()

Commit Message

Tiezhu Yang March 21, 2023, 6:29 a.m. UTC
  We can see the following messages with CONFIG_PROVE_LOCKING=y on
LoongArch:

  BUG: MAX_STACK_TRACE_ENTRIES too low!
  turning off the locking correctness validator.

This is because stack_trace_save() returns a large value after calling
arch_stack_walk(). Here is the call trace:

  save_trace()
    stack_trace_save()
      arch_stack_walk()
        stack_trace_consume_entry()

arch_stack_walk() should return immediately if unwind_next_frame()
fails. There is no need to keep looping and increasing the value of
c->len in stack_trace_consume_entry(). Returning early fixes the
problem above.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/all/8a44ad71-68d2-4926-892f-72bfc7a67e2a@roeck-us.net/
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
---
 arch/loongarch/kernel/stacktrace.c      | 3 ++-
 arch/loongarch/kernel/unwind.c          | 1 +
 arch/loongarch/kernel/unwind_prologue.c | 4 +++-
 3 files changed, 6 insertions(+), 2 deletions(-)
  

Comments

Xi Ruoyao March 21, 2023, 12:35 p.m. UTC | #1
On Tue, 2023-03-21 at 14:29 +0800, Tiezhu Yang wrote:
> We can see the following messages with CONFIG_PROVE_LOCKING=y on
> LoongArch:
> 
>   BUG: MAX_STACK_TRACE_ENTRIES too low!
>   turning off the locking correctness validator.
> 
> This is because stack_trace_save() returns a large value after calling
> arch_stack_walk(). Here is the call trace:
> 
>   save_trace()
>     stack_trace_save()
>       arch_stack_walk()
>         stack_trace_consume_entry()
> 
> arch_stack_walk() should return immediately if unwind_next_frame()
> fails. There is no need to keep looping and increasing the value of
> c->len in stack_trace_consume_entry(). Returning early fixes the
> problem above.
> 
> Reported-by: Guenter Roeck <linux@roeck-us.net>
> Link: https://lore.kernel.org/all/8a44ad71-68d2-4926-892f-72bfc7a67e2a@roeck-us.net/
> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>

The fix makes sense, but I'm asking the same question again (sorry if
it's noisy): should we Cc stable@vger.kernel.org and/or make a PR for
6.3?

To me, a bug fix should be backported into all stable branches affected
by the bug, unless there is some serious difficulty.  As the 6.3 release
will work out of the box on already-launched 3A5000 boards, people may
want to stop staying on the leading edge and use an LTS/stable release
series. We can't just say (or behave like) "we don't backport, please
use latest mainline" IMO :).

  
Guenter Roeck March 21, 2023, 2:25 p.m. UTC | #2
On Tue, Mar 21, 2023 at 08:35:34PM +0800, Xi Ruoyao wrote:
> The fix makes sense, but I'm asking the same question again (sorry if
> it's noisy): should we Cc stable@vger.kernel.org and/or make a PR for
> 6.3?
> 
> To me, a bug fix should be backported into all stable branches affected
> by the bug, unless there is some serious difficulty.  As the 6.3 release
> will work out of the box on already-launched 3A5000 boards, people may
> want to stop staying on the leading edge and use an LTS/stable release
> series. We can't just say (or behave like) "we don't backport, please
> use latest mainline" IMO :).

It is a bug fix, isn't it? It should be backported to v6.1+. Otherwise,
if your policy is to not backport bug fixes, I might as well stop testing
loongarch on all but the most recent kernel branch. Let me know if this is
what you want. If so, I think you should let all other regression testers
know that they should only test loongarch on mainline and possibly on
linux-next.

Thanks,
Guenter
  
Huacai Chen March 22, 2023, 12:50 a.m. UTC | #3
On Tue, Mar 21, 2023 at 10:25 PM Guenter Roeck <linux@roeck-us.net> wrote:
> It is a bug fix, isn't it? It should be backported to v6.1+. Otherwise,
> if your policy is to not backport bug fixes, I might as well stop testing
> loongarch on all but the most recent kernel branch. Let me know if this is
> what you want. If so, I think you should let all other regression testers
> know that they should only test loongarch on mainline and possibly on
> linux-next.
This is of course a bug fix, but should Tiezhu resend this patch? Or
is just replying to this message with Cc: stable@vger.kernel.org
enough?

Huacai
>
> Thanks,
> Guenter
  
Guenter Roeck March 22, 2023, 2:20 a.m. UTC | #4
On Wed, Mar 22, 2023 at 08:50:07AM +0800, Huacai Chen wrote:
> This is of course a bug fix, but should Tiezhu resend this patch? Or
> is just replying to this message with Cc: stable@vger.kernel.org
> enough?
> 

Normally the maintainer, before sending a pull request to Linus, would add
"Cc: stable@vger.kernel.org" to the patch. Actually sending the patch to
the stable@ mailing list is only necessary if it was applied to the
upstream kernel without Cc: stable@ in the commit message.

Guenter
  
Huacai Chen March 23, 2023, 1:30 a.m. UTC | #5
OK, thanks.

Huacai

  

Patch

diff --git a/arch/loongarch/kernel/stacktrace.c b/arch/loongarch/kernel/stacktrace.c
index 3a690f9..7c15ba5 100644
--- a/arch/loongarch/kernel/stacktrace.c
+++ b/arch/loongarch/kernel/stacktrace.c
@@ -30,7 +30,8 @@  void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
 
 	regs->regs[1] = 0;
 	for (unwind_start(&state, task, regs);
-	      !unwind_done(&state); unwind_next_frame(&state)) {
+	     !unwind_done(&state) && !unwind_error(&state);
+	     unwind_next_frame(&state)) {
 		addr = unwind_get_return_address(&state);
 		if (!addr || !consume_entry(cookie, addr))
 			break;
diff --git a/arch/loongarch/kernel/unwind.c b/arch/loongarch/kernel/unwind.c
index a463d69..ba324ba 100644
--- a/arch/loongarch/kernel/unwind.c
+++ b/arch/loongarch/kernel/unwind.c
@@ -28,5 +28,6 @@  bool default_next_frame(struct unwind_state *state)
 
 	} while (!get_stack_info(state->sp, state->task, info));
 
+	state->error = true;
 	return false;
 }
diff --git a/arch/loongarch/kernel/unwind_prologue.c b/arch/loongarch/kernel/unwind_prologue.c
index 9095fde..55afc27 100644
--- a/arch/loongarch/kernel/unwind_prologue.c
+++ b/arch/loongarch/kernel/unwind_prologue.c
@@ -211,7 +211,7 @@  static bool next_frame(struct unwind_state *state)
 			pc = regs->csr_era;
 
 			if (user_mode(regs) || !__kernel_text_address(pc))
-				return false;
+				goto out;
 
 			state->first = true;
 			state->pc = pc;
@@ -226,6 +226,8 @@  static bool next_frame(struct unwind_state *state)
 
 	} while (!get_stack_info(state->sp, state->task, info));
 
+out:
+	state->error = true;
 	return false;
 }