[tip:,objtool/urgent] x86/unwind/orc: Fix unreliable stack dump with gcov

Message ID 166638340876.401.7064883651633264359.tip-bot2@tip-bot2
State New
Headers
Series [tip:,objtool/urgent] x86/unwind/orc: Fix unreliable stack dump with gcov |

Commit Message

tip-bot2 for Thomas Gleixner Oct. 21, 2022, 8:16 p.m. UTC
  The following commit has been merged into the objtool/urgent branch of tip:

Commit-ID:     230db82413c091bc16acee72650f48d419cebe49
Gitweb:        https://git.kernel.org/tip/230db82413c091bc16acee72650f48d419cebe49
Author:        Chen Zhongjin <chenzhongjin@huawei.com>
AuthorDate:    Wed, 27 Jul 2022 11:15:06 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 21 Oct 2022 14:56:42 +02:00

x86/unwind/orc: Fix unreliable stack dump with gcov

When a console stack dump is initiated with CONFIG_GCOV_PROFILE_ALL
enabled, show_trace_log_lvl() gets out of sync with the ORC unwinder,
causing the stack trace to show all text addresses as unreliable:

  # echo l > /proc/sysrq-trigger
  [  477.521031] sysrq: Show backtrace of all active CPUs
  [  477.523813] NMI backtrace for cpu 0
  [  477.524492] CPU: 0 PID: 1021 Comm: bash Not tainted 6.0.0 #65
  [  477.525295] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-1.fc36 04/01/2014
  [  477.526439] Call Trace:
  [  477.526854]  <TASK>
  [  477.527216]  ? dump_stack_lvl+0xc7/0x114
  [  477.527801]  ? dump_stack+0x13/0x1f
  [  477.528331]  ? nmi_cpu_backtrace.cold+0xb5/0x10d
  [  477.528998]  ? lapic_can_unplug_cpu+0xa0/0xa0
  [  477.529641]  ? nmi_trigger_cpumask_backtrace+0x16a/0x1f0
  [  477.530393]  ? arch_trigger_cpumask_backtrace+0x1d/0x30
  [  477.531136]  ? sysrq_handle_showallcpus+0x1b/0x30
  [  477.531818]  ? __handle_sysrq.cold+0x4e/0x1ae
  [  477.532451]  ? write_sysrq_trigger+0x63/0x80
  [  477.533080]  ? proc_reg_write+0x92/0x110
  [  477.533663]  ? vfs_write+0x174/0x530
  [  477.534265]  ? handle_mm_fault+0x16f/0x500
  [  477.534940]  ? ksys_write+0x7b/0x170
  [  477.535543]  ? __x64_sys_write+0x1d/0x30
  [  477.536191]  ? do_syscall_64+0x6b/0x100
  [  477.536809]  ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
  [  477.537609]  </TASK>

This happens when the compiled code for show_stack() has a single word
on the stack, and doesn't use a tail call to show_stack_log_lvl().
(CONFIG_GCOV_PROFILE_ALL=y is the only known case of this.)  Then the
__unwind_start() skip logic hits an off-by-one bug and fails to unwind
all the way to the intended starting frame.

Fix it by reverting the following commit:

  f1d9a2abff66 ("x86/unwind/orc: Don't skip the first frame for inactive tasks")

The original justification for that commit no longer exists.  That
original issue was later fixed in a different way, with the following
commit:

  f2ac57a4c49d ("x86/unwind/orc: Fix inactive tasks with stack pointer in %sp on GCC 10 compiled kernels")

Fixes: f1d9a2abff66 ("x86/unwind/orc: Don't skip the first frame for inactive tasks")
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
[jpoimboe: rewrite commit log]
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/kernel/unwind_orc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Patch

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 0ea57da..c059820 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -713,7 +713,7 @@  void __unwind_start(struct unwind_state *state, struct task_struct *task,
 	/* Otherwise, skip ahead to the user-specified starting frame: */
 	while (!unwind_done(state) &&
 	       (!on_stack(&state->stack_info, first_frame, sizeof(long)) ||
-			state->sp < (unsigned long)first_frame))
+			state->sp <= (unsigned long)first_frame))
 		unwind_next_frame(state);
 
 	return;