[printk,v1,17/18] rcu: Add atomic write enforcement for rcu stalls

Message ID 20230302195618.156940-18-john.ogness@linutronix.de
State New
Series threaded/atomic console support

Commit Message

John Ogness March 2, 2023, 7:56 p.m. UTC
  Invoke the atomic write enforcement functions for rcu stalls to
ensure that the information gets out to the consoles.

It is important to note that if there are any legacy consoles
registered, they will be attempting to directly print from the
printk-caller context, which may jeopardize the reliability of the
atomic consoles. Optimally there should be no legacy consoles
registered.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/rcu/tree_stall.h | 6 ++++++
 1 file changed, 6 insertions(+)
  

Comments

Petr Mladek April 13, 2023, 12:10 p.m. UTC | #1
On Thu 2023-03-02 21:02:17, John Ogness wrote:
> Invoke the atomic write enforcement functions for rcu stalls to
> ensure that the information gets out to the consoles.

"ensure" is too strong. It is still just the best effort. It might
fail when the current console user does not pass the lock.

I would say that it will increase the chance of seeing the messages
on NOBKL consoles by printing the messages directly instead
of waiting for the printk thread.

> It is important to note that if there are any legacy consoles
> registered, they will be attempting to directly print from the
> printk-caller context, which may jeopardize the reliability of the
> atomic consoles. Optimally there should be no legacy consoles
> registered.

The above paragraph is a bit vague. It is not clear how exactly the
legacy consoles affect the reliability.

Does it mean that they might cause a deadlock because they are not
atomic? But there is nothing specific about rcu stalls and priority
of NOBKL consoles. This is a generic problem with legacy consoles.


> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -566,6 +568,8 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
>  	if (rcu_stall_is_suppressed())
>  		return;
>  
> +	prev_prio = cons_atomic_enter(CONS_PRIO_EMERGENCY);

Thinking out loud: This would set the EMERGENCY priority on this
CPU. But the following call chain:

  + rcu_dump_cpu_stacks()
    + dump_cpu_task()
      + trigger_single_cpu_backtrace()

might send an IPI, and the backtrace will then be printed from another
CPU. As a result, those backtraces won't be printed with EMERGENCY
priority.

One solution would be to also have a global EMERGENCY priority.

Another possibility would be to also use EMERGENCY priority
in nmi_cpu_backtrace(), which is the callback called by the IPI.

I would probably go for the global flag. printk() called with EMERGENCY
priority also has to flush messages added by other CPUs, so
messages added by other CPUs are printed "directly" anyway.

Also, setting the EMERGENCY priority in nmi_cpu_backtrace() is an
ad-hoc solution. The backtrace is usually triggered as part of another
global emergency report.
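
For illustration only, the per-CPU versus global priority question can
be modelled in a few lines of userspace C. This is a sketch under
stated assumptions, not code from the series: model_enter(),
model_exit(), effective_prio(), global_emergency and model_demo() are
hypothetical names, and the real cons_atomic_enter() may behave
differently in detail. CPU 0 plays the stall detector and CPU 1 plays
the CPU that prints its backtrace from the IPI handler.

```c
/* Userspace model only; none of this is kernel code. */
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

enum prio { PRIO_NORMAL, PRIO_EMERGENCY };

/* Hypothetical state: a per-CPU priority plus a global emergency counter. */
static enum prio cpu_prio[NR_CPUS];
static int global_emergency;	/* > 0 while an emergency report is active */

/* Model of an "enter" call: set the priority on one CPU, return the old one. */
static enum prio model_enter(int cpu, enum prio prio)
{
	enum prio prev = cpu_prio[cpu];

	cpu_prio[cpu] = prio;
	return prev;
}

/* Model of the matching "exit" call: restore the saved priority. */
static void model_exit(int cpu, enum prio prev)
{
	cpu_prio[cpu] = prev;
}

/* Priority a print on @cpu would use, with or without the global flag. */
static enum prio effective_prio(int cpu, bool honor_global)
{
	if (honor_global && global_emergency > 0)
		return PRIO_EMERGENCY;
	return cpu_prio[cpu];
}

static void model_demo(void)
{
	enum prio prev = model_enter(0, PRIO_EMERGENCY);

	/* Per-CPU only: CPU 1 (the backtrace printer) stays at NORMAL. */
	assert(effective_prio(0, false) == PRIO_EMERGENCY);
	assert(effective_prio(1, false) == PRIO_NORMAL);

	/* With a global flag, CPU 1 prints with EMERGENCY as well. */
	global_emergency++;
	assert(effective_prio(1, true) == PRIO_EMERGENCY);
	global_emergency--;

	model_exit(0, prev);
	assert(effective_prio(0, true) == PRIO_NORMAL);
}
```

Calling model_demo() walks through both variants. In the per-CPU-only
case the remote CPU never sees the raised priority, which is the gap
described above.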

> +
>  	/*
>  	 * OK, time to rat on our buddy...
>  	 * See Documentation/RCU/stallwarn.rst for info on how to debug

Best Regards,
Petr
  

Patch

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 5653560573e2..25207a213e7a 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -8,6 +8,7 @@ 
  */
 
 #include <linux/kvm_para.h>
+#include <linux/console.h>
 
 //////////////////////////////////////////////////////////////////////////////
 //
@@ -551,6 +552,7 @@  static void rcu_check_gp_kthread_expired_fqs_timer(void)
 
 static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 {
+	enum cons_prio prev_prio;
 	int cpu;
 	unsigned long flags;
 	unsigned long gpa;
@@ -566,6 +568,8 @@  static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 	if (rcu_stall_is_suppressed())
 		return;
 
+	prev_prio = cons_atomic_enter(CONS_PRIO_EMERGENCY);
+
 	/*
 	 * OK, time to rat on our buddy...
 	 * See Documentation/RCU/stallwarn.rst for info on how to debug
@@ -620,6 +624,8 @@  static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 	panic_on_rcu_stall();
 
 	rcu_force_quiescent_state();  /* Kick them all. */
+
+	cons_atomic_exit(CONS_PRIO_EMERGENCY, prev_prio);
 }
 
 static void print_cpu_stall(unsigned long gps)