sched/debug: Dump end of stack when detected corrupted

Message ID 20231219032254.96685-1-feng.tang@intel.com
State New
Series sched/debug: Dump end of stack when detected corrupted

Commit Message

Feng Tang Dec. 19, 2023, 3:22 a.m. UTC
When debugging a kernel hang during suspend/resume, random memory
corruptions showed up in different places; one of them was detected by
the scheduler with the error message:

  "Kernel panic - not syncing: corrupted stack end detected inside scheduler"

Dumping the corrupted memory around the stack end gives more direct
hints about how the memory was corrupted:

 "
 Corrupted Stack: ff11000122770000: ff ff ff ff ff ff 14 91 82 3b 78 e8 08 00 45 00  .........;x...E.
 Corrupted Stack: ff11000122770010: 00 1d 2a ff 40 00 40 11 98 c8 0a ef 30 2c 0a ef  ..*.@.@.....0,..
 Corrupted Stack: ff11000122770020: 30 ff a2 00 22 3d 00 09 9a 95 2a 00 00 00 00 00  0..."=....*.....
 ...
 Kernel panic - not syncing: corrupted stack end detected inside scheduler
 "

With this dump, the culprit was quickly identified as an ethernet
driver and its DMA operations.

Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 kernel/sched/core.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)
  

Patch

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a795e030678c..1280f7012bc5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5949,8 +5949,18 @@  static noinline void __schedule_bug(struct task_struct *prev)
 static inline void schedule_debug(struct task_struct *prev, bool preempt)
 {
 #ifdef CONFIG_SCHED_STACK_END_CHECK
-	if (task_stack_end_corrupted(prev))
+	if (task_stack_end_corrupted(prev)) {
+		unsigned long *ptr = end_of_stack(prev);
+
+		/* Dump 16 ulong words around the corruption point */
+#ifdef CONFIG_STACK_GROWSUP
+		ptr -= 15;
+#endif
+		print_hex_dump(KERN_ERR, "Corrupted Stack: ",
+			DUMP_PREFIX_ADDRESS, 16, 1, ptr, 16 * sizeof(*ptr), 1);
+
 		panic("corrupted stack end detected inside scheduler\n");
+	}
 
 	if (task_scs_end_corrupted(prev))
 		panic("corrupted shadow stack detected inside scheduler\n");