[printk,v2,08/11] panic: Add atomic write enforcement to warn/panic

Message ID 20230919230856.661435-9-john.ogness@linutronix.de
State New
Headers
Series wire up nbcon atomic printing |

Commit Message

John Ogness Sept. 19, 2023, 11:08 p.m. UTC
  From: Thomas Gleixner <tglx@linutronix.de>

Invoke the atomic write enforcement functions for warn/panic to
ensure that the information gets out to the consoles.

For the panic case, add explicit intermediate atomic flush
calls to ensure immediate flushing at important points.
Otherwise the atomic flushing only occurs when dropping out of
the elevated priority, which for panic may never happen.

It is important to note that if there are any legacy consoles
registered, they will be attempting to directly print from the
printk-caller context, which may jeopardize the reliability of
the atomic consoles. Optimally there should be no legacy
consoles registered.

Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
---
 kernel/panic.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
  

Comments

Petr Mladek Sept. 27, 2023, 12:02 p.m. UTC | #1
On Wed 2023-09-20 01:14:53, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Invoke the atomic write enforcement functions for warn/panic to
> ensure that the information gets out to the consoles.
> 
> For the panic case, add explicit intermediate atomic flush
> calls to ensure immediate flushing at important points.
> Otherwise the atomic flushing only occurs when dropping out of
> the elevated priority, which for panic may never happen.

It would be great to avoid the need for the explicit flushes
except for the final unsafe one. Otherwise, we would play
another Whack-a-mole game. People would report that
panic() failed and they needed to add another explicit flush
to find the culprit...

> It is important to note that if there are any legacy consoles
> registered, they will be attempting to directly print from the
> printk-caller context, which may jeopardize the reliability of
> the atomic consoles. Optimally there should be no legacy
> consoles registered.

> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -275,6 +275,7 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
>   */
>  void panic(const char *fmt, ...)
>  {
> +	enum nbcon_prio prev_prio;
>  	static char buf[1024];
>  	va_list args;
>  	long i, i_next = 0, len;
> @@ -322,6 +323,8 @@ void panic(const char *fmt, ...)
>  	if (old_cpu != PANIC_CPU_INVALID && old_cpu != this_cpu)
>  		panic_smp_self_stop();
>  
> +	prev_prio = nbcon_atomic_enter(NBCON_PRIO_PANIC);
> +

It would make sense to flush nbcon consoles the safe way
at this point before we allow dangerous games for legacy
consoles via bust_spinlock().

>  	console_verbose();
>  	bust_spinlocks(1);
>  	va_start(args, fmt);
> @@ -382,6 +385,8 @@ void panic(const char *fmt, ...)
>  	if (_crash_kexec_post_notifiers)
>  		__crash_kexec(NULL);
>  
> +	nbcon_atomic_flush_all();

There should be one more safe flush after dump_stack() just
in case the kexec fails. The legacy consoles also tried
to flush the consoles in each called printk().

...

I do not want to review or comment each added or needed
nbcon_atomic_flush_all(). I hope that they won't be needed
in the end.

Best Regards,
Petr
  

Patch

diff --git a/kernel/panic.c b/kernel/panic.c
index 07239d4ad81e..86ed71ba8c4d 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -275,6 +275,7 @@  static void panic_other_cpus_shutdown(bool crash_kexec)
  */
 void panic(const char *fmt, ...)
 {
+	enum nbcon_prio prev_prio;
 	static char buf[1024];
 	va_list args;
 	long i, i_next = 0, len;
@@ -322,6 +323,8 @@  void panic(const char *fmt, ...)
 	if (old_cpu != PANIC_CPU_INVALID && old_cpu != this_cpu)
 		panic_smp_self_stop();
 
+	prev_prio = nbcon_atomic_enter(NBCON_PRIO_PANIC);
+
 	console_verbose();
 	bust_spinlocks(1);
 	va_start(args, fmt);
@@ -382,6 +385,8 @@  void panic(const char *fmt, ...)
 	if (_crash_kexec_post_notifiers)
 		__crash_kexec(NULL);
 
+	nbcon_atomic_flush_all();
+
 	console_unblank();
 
 	/*
@@ -406,6 +411,7 @@  void panic(const char *fmt, ...)
 		 * We can't use the "normal" timers since we just panicked.
 		 */
 		pr_emerg("Rebooting in %d seconds..\n", panic_timeout);
+		nbcon_atomic_flush_all();
 
 		for (i = 0; i < panic_timeout * 1000; i += PANIC_TIMER_STEP) {
 			touch_nmi_watchdog();
@@ -424,6 +430,7 @@  void panic(const char *fmt, ...)
 		 */
 		if (panic_reboot_mode != REBOOT_UNDEFINED)
 			reboot_mode = panic_reboot_mode;
+		nbcon_atomic_flush_all();
 		emergency_restart();
 	}
 #ifdef __sparc__
@@ -436,12 +443,16 @@  void panic(const char *fmt, ...)
 	}
 #endif
 #if defined(CONFIG_S390)
+	nbcon_atomic_flush_all();
 	disabled_wait();
 #endif
 	pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
 
 	/* Do not scroll important messages printed above */
 	suppress_printk = 1;
+
+	nbcon_atomic_exit(NBCON_PRIO_PANIC, prev_prio);
+
 	local_irq_enable();
 	for (i = 0; ; i += PANIC_TIMER_STEP) {
 		touch_softlockup_watchdog();
@@ -652,6 +663,10 @@  struct warn_args {
 void __warn(const char *file, int line, void *caller, unsigned taint,
 	    struct pt_regs *regs, struct warn_args *args)
 {
+	enum nbcon_prio prev_prio;
+
+	prev_prio = nbcon_atomic_enter(NBCON_PRIO_EMERGENCY);
+
 	disable_trace_on_warning();
 
 	if (file)
@@ -682,6 +697,8 @@  void __warn(const char *file, int line, void *caller, unsigned taint,
 
 	/* Just a warning, don't kill lockdep. */
 	add_taint(taint, LOCKDEP_STILL_OK);
+
+	nbcon_atomic_exit(NBCON_PRIO_EMERGENCY, prev_prio);
 }
 
 #ifdef CONFIG_BUG