[printk,v2,09/11] panic: Add atomic write enforcement to oops

Message ID 20230919230856.661435-10-john.ogness@linutronix.de
State New
Series: wire up nbcon atomic printing

Commit Message

John Ogness Sept. 19, 2023, 11:08 p.m. UTC
Invoke the atomic write enforcement functions for oops to
ensure that the information gets out to the consoles.

Since there is no single general function that calls both
oops_enter() and oops_exit(), the nesting feature of atomic
write sections is taken advantage of in order to guarantee
full coverage between the first oops_enter() and the last
oops_exit().

It is important to note that if there are any legacy consoles
registered, they will be attempting to directly print from the
printk-caller context, which may jeopardize the reliability of
the atomic consoles. Optimally there should be no legacy
consoles registered.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/panic.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)
  

Comments

Andy Shevchenko Sept. 20, 2023, 1:28 p.m. UTC | #1
On Wed, Sep 20, 2023 at 01:14:54AM +0206, John Ogness wrote:
> Invoke the atomic write enforcement functions for oops to
> ensure that the information gets out to the consoles.
> 
> Since there is no single general function that calls both
> oops_enter() and oops_exit(), the nesting feature of atomic
> write sections is taken advantage of in order to guarantee
> full coverage between the first oops_enter() and the last
> oops_exit().
> 
> It is important to note that if there are any legacy consoles
> registered, they will be attempting to directly print from the
> printk-caller context, which may jeopardize the reliability of
> the atomic consoles. Optimally there should be no legacy
> consoles registered.

...

> +	if (atomic_read(&oops_cpu) == smp_processor_id()) {
> +		oops_nesting--;
> +		if (oops_nesting == 0) {
> +			atomic_set(&oops_cpu, -1);

Between read and set the variable can change, can't it?
If not, why this variable is atomic then? Or, why it's not a problem?
If the latter is the case, perhaps a comment to explain this?

> +			/* Exit outermost atomic section. */
> +			nbcon_atomic_exit(NBCON_PRIO_EMERGENCY, oops_prev_prio);
> +		}
> +	}
> +	put_cpu();
  
John Ogness Sept. 20, 2023, 2:20 p.m. UTC | #2
On 2023-09-20, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> On Wed, Sep 20, 2023 at 01:14:54AM +0206, John Ogness wrote:
>> Invoke the atomic write enforcement functions for oops to
>> ensure that the information gets out to the consoles.
>> 
>> Since there is no single general function that calls both
>> oops_enter() and oops_exit(), the nesting feature of atomic
>> write sections is taken advantage of in order to guarantee
>> full coverage between the first oops_enter() and the last
>> oops_exit().
>> 
>> It is important to note that if there are any legacy consoles
>> registered, they will be attempting to directly print from the
>> printk-caller context, which may jeopardize the reliability of
>> the atomic consoles. Optimally there should be no legacy
>> consoles registered.
>
> ...
>
>> +	if (atomic_read(&oops_cpu) == smp_processor_id()) {
>> +		oops_nesting--;
>> +		if (oops_nesting == 0) {
>> +			atomic_set(&oops_cpu, -1);
>
> Between read and set the variable can change, can't it?

CPU migration is disabled. @oops_cpu contains the CPU ID of the only CPU
that is printing the oops. (Perhaps the variable should be called
"oops_printing_cpu"?)

If this matches smp_processor_id(), then the current CPU is the only one
that is allowed to change it back to -1. So no, if the first condition
is true, it cannot change before atomic_set(). And if the second
condition is true, this is the only CPU+context that is allowed to
change it back to -1.

> If not, why this variable is atomic then? Or, why it's not a problem?
> If the latter is the case, perhaps a comment to explain this?

If not atomic, it will be a data race since one CPU might be changing
@oops_cpu and another is reading it. For type "int" such a data race
would be fine because it doesn't matter which side of the race the
reader was on, both values will not match the current CPU ID.

The reason that I didn't implement it using cmpxchg(),
data_race(READ_ONCE()), and WRITE_ONCE() is because I once learned that
you should never mix cmpxchg() with READ_ONCE()/WRITE_ONCE() because
there are architectures that do not support cmpxchg() as an atomic
instruction. The answer was always: "use atomic_t instead... that is
what it is for".

But AFAICT for this case it would be fine because obviously cmpxchg()
will not race with itself. And successfully reading a matching CPU ID
means there cannot be any cmpxchg() in progress. And writing only occurs
after seeing a matching CPU ID.

So I can change it from atomic_t to int. Although I do feel like that
might require explanation about why the data race is safe.

Or perhaps it is enough just to have something like this:

/**
 * oops_printing_cpu - The ID of the CPU responsible for printing the
 *                     OOPS message(s) to the consoles.
 *
 * This is atomic_t because multiple CPUs can read this variable
 * simultaneously when exiting OOPS while another CPU can be
 * modifying this variable to begin or end its printing duties.
 */
static atomic_t oops_printing_cpu = ATOMIC_INIT(-1);

John Ogness
  
Andy Shevchenko Sept. 20, 2023, 2:45 p.m. UTC | #3
On Wed, Sep 20, 2023 at 04:26:12PM +0206, John Ogness wrote:
> On 2023-09-20, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> > On Wed, Sep 20, 2023 at 01:14:54AM +0206, John Ogness wrote:

...

> >> +	if (atomic_read(&oops_cpu) == smp_processor_id()) {
> >> +		oops_nesting--;
> >> +		if (oops_nesting == 0) {
> >> +			atomic_set(&oops_cpu, -1);
> >
> > Between read and set the variable can change, can't it?
> 
> CPU migration is disabled. @oops_cpu contains the CPU ID of the only CPU
> that is printing the oops. (Perhaps the variable should be called
> "oops_printing_cpu"?)
> 
> If this matches smp_processor_id(), then the current CPU is the only one
> that is allowed to change it back to -1. So no, if the first condition
> is true, it cannot change before atomic_set(). And if the second
> condition is true, this is the only CPU+context that is allowed to
> change it back to -1;
> 
> > If not, why this variable is atomic then? Or, why it's not a problem?
> > If the latter is the case, perhaps a comment to explain this?
> 
> If not atomic, it will be a data race since one CPU might be changing
> @oops_cpu and another is reading it. For type "int" such a data race
> would be fine because it doesn't matter which side of the race the
> reader was on, both values will not match the current CPU ID.
> 
> The reason that I didn't implement it using cmpxchg(),
> data_race(READ_ONCE()), and WRITE_ONCE() is because I once learned that
> you should never mix cmpxchg() with READ_ONCE()/WRITE_ONCE() because
> there are architectures that do not support cmpxchg() as an atomic
> instruction. The answer was always: "use atomic_t instead... that is
> what it is for".
> 
> But AFAICT for this case it would be fine because obviously cmpxchg()
> will not race with itself. And successfully reading a matching CPU ID
> means there cannot be any cmpxchg() in progress. And writing only occurs
> after seeing a matching CPU ID.
> 
> So I can change it from atomic_t to int. Although I do feel like that
> might require explanation about why the data race is safe.

Either way a comment is needed, but I think the usage of atomic above
is a bit confusing, as you can see from how I immediately raised the
concern.

> Or perhaps it is enough just to have something like this:
> 
> /**
>  * oops_printing_cpu - The ID of the CPU responsible for printing the
>  *                     OOPS message(s) to the consoles.
>  *
>  * This is atomic_t because multiple CPUs can read this variable
>  * simultaneously when exiting OOPS while another CPU can be
>  * modifying this variable to begin or end its printing duties.
>  */
> static atomic_t oops_printing_cpu = ATOMIC_INIT(-1);
  
Petr Mladek Sept. 27, 2023, 1:15 p.m. UTC | #4
On Wed 2023-09-20 01:14:54, John Ogness wrote:
> Invoke the atomic write enforcement functions for oops to
> ensure that the information gets out to the consoles.
> 
> Since there is no single general function that calls both
> oops_enter() and oops_exit(), the nesting feature of atomic
> write sections is taken advantage of in order to guarantee
> full coverage between the first oops_enter() and the last
> oops_exit().
> 
> It is important to note that if there are any legacy consoles
> registered, they will be attempting to directly print from the
> printk-caller context, which may jeopardize the reliability of
> the atomic consoles. Optimally there should be no legacy
> consoles registered.
> 
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -630,6 +634,36 @@ bool oops_may_print(void)
>   */
>  void oops_enter(void)
>  {
> +	enum nbcon_prio prev_prio;
> +	int cpu = -1;
> +
> +	/*
> +	 * If this turns out to be the first CPU in oops, this is the
> +	 * beginning of the outermost atomic section. Otherwise it is
> +	 * the beginning of an inner atomic section.
> +	 */

This sounds strange. What is the advantage of having the inner
atomic context, please? It covers only messages printed inside
oops_enter() and not the whole oops_enter()/exit(). Also see below.

> +	prev_prio = nbcon_atomic_enter(NBCON_PRIO_EMERGENCY);
> +
> +	if (atomic_try_cmpxchg_relaxed(&oops_cpu, &cpu, smp_processor_id())) {
> +		/*
> +		 * This is the first CPU in oops. Save the outermost
> +		 * @prev_prio in order to restore it on the outermost
> +		 * matching oops_exit(), when @oops_nesting == 0.
> +		 */
> +		oops_prev_prio = prev_prio;
> +
> +		/*
> +		 * Enter an inner atomic section that ends at the end of this
> +		 * function. In this case, the nbcon_atomic_enter() above
> +		 * began the outermost atomic section.
> +		 */
> +		prev_prio = nbcon_atomic_enter(NBCON_PRIO_EMERGENCY);
> +	}
> +
> +	/* Track nesting when this CPU is the owner. */
> +	if (cpu == -1 || cpu == smp_processor_id())
> +		oops_nesting++;
> +
>  	tracing_off();
>  	/* can't trust the integrity of the kernel anymore: */
>  	debug_locks_off();
> @@ -637,6 +671,9 @@ void oops_enter(void)
>  
>  	if (sysctl_oops_all_cpu_backtrace)
>  		trigger_all_cpu_backtrace();
> +
> +	/* Exit inner atomic section. */
> +	nbcon_atomic_exit(NBCON_PRIO_EMERGENCY, prev_prio);

This will not flush the messages when:

   + This CPU owns oops_cpu. The flush will have to wait for exiting
     the outer loop.

     In this case, the inner atomic context is not needed.


   + oops_cpu is owned by another CPU, and that CPU is
     just flushing the messages while holding the per-console
     lock.

     The good thing is that the messages printed by this oops_enter()
     would likely get flushed by the other CPU.

     The bad thing is that oops_exit() on this CPU won't call
     nbcon_atomic_exit() so that the following OOPS messages
     from this CPU might need to wait for the printk kthread.
     IMHO, this is not what we want.


One solution would be to store prev_prio in per-CPU array
so that each CPU could call its own nbcon_atomic_exit().

But I start liking more and more the idea with storing
and counting nested emergency contexts in struct task_struct.
It is the alternative implementation in reply to the 7th patch,
https://lore.kernel.org/r/ZRLBxsXPCym2NC5Q@alley

Then it will be enough to simply call:

   + nbcon_emergency_enter() in oops_enter()
   + nbcon_emergency_exit() in oops_exit()

Best Regards,
Petr

PS: I just hope that you didn't add all this complexity just because
    we preferred this behavior at LPC 2022. Especially I hope
    that it was not me who proposed and preferred this.
  

Patch

diff --git a/kernel/panic.c b/kernel/panic.c
index 86ed71ba8c4d..e2879098645d 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -614,6 +614,10 @@  bool oops_may_print(void)
 	return pause_on_oops_flag == 0;
 }
 
+static atomic_t oops_cpu = ATOMIC_INIT(-1);
+static int oops_nesting;
+static enum nbcon_prio oops_prev_prio;
+
 /*
  * Called when the architecture enters its oops handler, before it prints
  * anything.  If this is the first CPU to oops, and it's oopsing the first
@@ -630,6 +634,36 @@  bool oops_may_print(void)
  */
 void oops_enter(void)
 {
+	enum nbcon_prio prev_prio;
+	int cpu = -1;
+
+	/*
+	 * If this turns out to be the first CPU in oops, this is the
+	 * beginning of the outermost atomic section. Otherwise it is
+	 * the beginning of an inner atomic section.
+	 */
+	prev_prio = nbcon_atomic_enter(NBCON_PRIO_EMERGENCY);
+
+	if (atomic_try_cmpxchg_relaxed(&oops_cpu, &cpu, smp_processor_id())) {
+		/*
+		 * This is the first CPU in oops. Save the outermost
+		 * @prev_prio in order to restore it on the outermost
+		 * matching oops_exit(), when @oops_nesting == 0.
+		 */
+		oops_prev_prio = prev_prio;
+
+		/*
+		 * Enter an inner atomic section that ends at the end of this
+		 * function. In this case, the nbcon_atomic_enter() above
+		 * began the outermost atomic section.
+		 */
+		prev_prio = nbcon_atomic_enter(NBCON_PRIO_EMERGENCY);
+	}
+
+	/* Track nesting when this CPU is the owner. */
+	if (cpu == -1 || cpu == smp_processor_id())
+		oops_nesting++;
+
 	tracing_off();
 	/* can't trust the integrity of the kernel anymore: */
 	debug_locks_off();
@@ -637,6 +671,9 @@  void oops_enter(void)
 
 	if (sysctl_oops_all_cpu_backtrace)
 		trigger_all_cpu_backtrace();
+
+	/* Exit inner atomic section. */
+	nbcon_atomic_exit(NBCON_PRIO_EMERGENCY, prev_prio);
 }
 
 static void print_oops_end_marker(void)
@@ -652,6 +689,18 @@  void oops_exit(void)
 {
 	do_oops_enter_exit();
 	print_oops_end_marker();
+
+	if (atomic_read(&oops_cpu) == smp_processor_id()) {
+		oops_nesting--;
+		if (oops_nesting == 0) {
+			atomic_set(&oops_cpu, -1);
+
+			/* Exit outermost atomic section. */
+			nbcon_atomic_exit(NBCON_PRIO_EMERGENCY, oops_prev_prio);
+		}
+	}
+	put_cpu();
+
 	kmsg_dump(KMSG_DUMP_OOPS);
 }