panic: add option to dump blocked tasks in panic_print

Message ID 20240202132042.3609657-1-feng.tang@intel.com
State New
Series panic: add option to dump blocked tasks in panic_print

Commit Message

Feng Tang Feb. 2, 2024, 1:20 p.m. UTC
For debugging kernel panics and other bugs, panic_print already has an
option to dump the call stacks of all tasks. On today's large servers
running many containers, there can be thousands of tasks or more, so
this option prints a huge number of call stacks and takes a long time
(especially on a serial console, which is the main use case for
panic_print).

In many cases, only the few blocked tasks matter for diagnosing the
panic, so add an option to dump only the call stacks of blocked tasks.

Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 1 +
 Documentation/admin-guide/sysctl/kernel.rst     | 1 +
 kernel/panic.c                                  | 4 ++++
 3 files changed, 6 insertions(+)
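
As a rough illustration of how the new bit combines with the existing ones, here is a minimal user-space sketch. The PANIC_PRINT_* values are copied from the kernel/panic.c hunk in this patch; the helpers panic_print_bit() and format_panic_print() are hypothetical names made up for this example and do not exist in the kernel.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Values mirror the PANIC_PRINT_* defines shown in the kernel/panic.c hunk. */
#define PANIC_PRINT_FTRACE_INFO     0x00000010UL
#define PANIC_PRINT_ALL_PRINTK_MSG  0x00000020UL
#define PANIC_PRINT_ALL_CPU_BT      0x00000040UL
#define PANIC_PRINT_BLOCKED_TASKS   0x00000080UL /* bit 7, added by this patch */

/* Hypothetical helper (not kernel code): map a documented bit number to its mask. */
static unsigned long panic_print_bit(unsigned int bit)
{
	assert(bit < 32);
	return 1UL << bit;
}

/* Hypothetical helper: render a mask as a panic_print= boot parameter string. */
static int format_panic_print(char *buf, size_t len, unsigned long mask)
{
	return snprintf(buf, len, "panic_print=0x%lx", mask);
}
```

Under these assumptions, panic_print_bit(7) yields 0x80, so booting with panic_print=0x80 (or writing that value to the panic_print sysctl) would request only the blocked-task dump, and OR-ing in PANIC_PRINT_ALL_CPU_BT gives 0xc0.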
  

Comments

Guilherme G. Piccoli Feb. 3, 2024, 12:05 p.m. UTC | #1
On 02/02/2024 10:20, Feng Tang wrote:
> For debugging kernel panic and other bugs, there is already option of
> panic_print to dump all tasks' call stacks. On today's large servers
> running many containers, there could be thousands of tasks or more,
> and it will print out huge amount of call stacks, and take a lot of
> time (for serial console which is main target user case of panic_print).
> 
> And in many cases, only those several tasks being blocked is key for
> the panic, so add an option to only dump blocked tasks' call stack.
> 
> Signed-off-by: Feng Tang <feng.tang@intel.com>
> [...]

Thank you Feng Tang, this is an interesting and useful idea!
I've just tested the patch and it works fine; I also see no code issues
on my side. So, feel free to add:


Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>


Cheers!

> ---
>  Documentation/admin-guide/kernel-parameters.txt | 1 +
>  Documentation/admin-guide/sysctl/kernel.rst     | 1 +
>  kernel/panic.c                                  | 4 ++++
>  3 files changed, 6 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 31b3a25680d0..0f2369e87175 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4182,6 +4182,7 @@
>  			bit 4: print ftrace buffer
>  			bit 5: print all printk messages in buffer
>  			bit 6: print all CPUs backtrace (if available in the arch)
> +			bit 7: print tasks in uninterruptible (blocked) state
>  			*Be aware* that this option may print a _lot_ of lines,
>  			so there are risks of losing older messages in the log.
>  			Use this option carefully, maybe worth to setup a
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index 6584a1f9bfe3..e066a16b35d5 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -850,6 +850,7 @@ bit 3  print locks info if ``CONFIG_LOCKDEP`` is on
>  bit 4  print ftrace buffer
>  bit 5  print all printk messages in buffer
>  bit 6  print all CPUs backtrace (if available in the arch)
> +bit 7  print tasks in uninterruptible (blocked) state
>  =====  ============================================
>  
>  So for example to print tasks and memory info on panic, user can::
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 2807639aab51..aa17ae0897c0 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -73,6 +73,7 @@ EXPORT_SYMBOL_GPL(panic_timeout);
>  #define PANIC_PRINT_FTRACE_INFO		0x00000010
>  #define PANIC_PRINT_ALL_PRINTK_MSG	0x00000020
>  #define PANIC_PRINT_ALL_CPU_BT		0x00000040
> +#define PANIC_PRINT_BLOCKED_TASKS	0x00000080
>  unsigned long panic_print;
>  
>  ATOMIC_NOTIFIER_HEAD(panic_notifier_list);
> @@ -227,6 +228,9 @@ static void panic_print_sys_info(bool console_flush)
>  
>  	if (panic_print & PANIC_PRINT_FTRACE_INFO)
>  		ftrace_dump(DUMP_ALL);
> +
> +	if (panic_print & PANIC_PRINT_BLOCKED_TASKS)
> +		show_state_filter(TASK_UNINTERRUPTIBLE);
>  }
>  
>  void check_panic_on_warn(const char *origin)
  

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 31b3a25680d0..0f2369e87175 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4182,6 +4182,7 @@ 
 			bit 4: print ftrace buffer
 			bit 5: print all printk messages in buffer
 			bit 6: print all CPUs backtrace (if available in the arch)
+			bit 7: print tasks in uninterruptible (blocked) state
 			*Be aware* that this option may print a _lot_ of lines,
 			so there are risks of losing older messages in the log.
 			Use this option carefully, maybe worth to setup a
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 6584a1f9bfe3..e066a16b35d5 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -850,6 +850,7 @@  bit 3  print locks info if ``CONFIG_LOCKDEP`` is on
 bit 4  print ftrace buffer
 bit 5  print all printk messages in buffer
 bit 6  print all CPUs backtrace (if available in the arch)
+bit 7  print tasks in uninterruptible (blocked) state
 =====  ============================================
 
 So for example to print tasks and memory info on panic, user can::
diff --git a/kernel/panic.c b/kernel/panic.c
index 2807639aab51..aa17ae0897c0 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -73,6 +73,7 @@  EXPORT_SYMBOL_GPL(panic_timeout);
 #define PANIC_PRINT_FTRACE_INFO		0x00000010
 #define PANIC_PRINT_ALL_PRINTK_MSG	0x00000020
 #define PANIC_PRINT_ALL_CPU_BT		0x00000040
+#define PANIC_PRINT_BLOCKED_TASKS	0x00000080
 unsigned long panic_print;
 
 ATOMIC_NOTIFIER_HEAD(panic_notifier_list);
@@ -227,6 +228,9 @@  static void panic_print_sys_info(bool console_flush)
 
 	if (panic_print & PANIC_PRINT_FTRACE_INFO)
 		ftrace_dump(DUMP_ALL);
+
+	if (panic_print & PANIC_PRINT_BLOCKED_TASKS)
+		show_state_filter(TASK_UNINTERRUPTIBLE);
 }
 
 void check_panic_on_warn(const char *origin)
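
To make the effect of the new branch easier to picture, the following is a simplified user-space model, not kernel code. It assumes a flat array of tasks with a single state word each; struct model_task and select_tasks_to_dump() are hypothetical names invented for this sketch. The real show_state_filter() walks the kernel's task list, applies additional checks, and prints each matching task's stack, none of which is modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* Task state bits, matching the values in include/linux/sched.h. */
#define TASK_RUNNING          0x0000u
#define TASK_INTERRUPTIBLE    0x0001u
#define TASK_UNINTERRUPTIBLE  0x0002u

/* Flag value taken from the kernel/panic.c hunk in this patch. */
#define PANIC_PRINT_BLOCKED_TASKS 0x00000080UL

/* Hypothetical stand-in for a task: just a name and a state word. */
struct model_task {
	const char *comm;
	unsigned int state;
};

/*
 * Hypothetical model of the new branch: if the blocked-tasks bit is set,
 * collect every task whose state includes TASK_UNINTERRUPTIBLE.
 * Returns the number of tasks stored in out[], which must hold n entries.
 */
static size_t select_tasks_to_dump(unsigned long panic_print,
				   const struct model_task *tasks, size_t n,
				   const struct model_task **out)
{
	size_t count = 0;

	assert(tasks != NULL && out != NULL);

	if (!(panic_print & PANIC_PRINT_BLOCKED_TASKS))
		return 0;

	for (size_t i = 0; i < n; i++) {
		if (tasks[i].state & TASK_UNINTERRUPTIBLE)
			out[count++] = &tasks[i];
	}
	return count;
}
```

In this model, a system with thousands of runnable or sleeping tasks and only a handful in uninterruptible sleep would produce only that handful in out[], which is the reduction in console output the commit message is after.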