[printk,v3,11/14] printk: ringbuffer: Consider committed as finalized in panic

Message ID 20231214214201.499426-12-john.ogness@linutronix.de
State New
Headers
Series fix console flushing |

Commit Message

John Ogness Dec. 14, 2023, 9:41 p.m. UTC
  A descriptor in the committed state means the record does not yet
exist for the reader. However, for the panic CPU, committed
records should be handled as finalized records since they contain
message data in a consistent state and may contain additional
hints as to the cause of the panic.

Add an exception for records in the commit state to not be
considered non-existing when reading from the panic CPU.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/printk_ringbuffer.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)
  

Comments

Petr Mladek Feb. 1, 2024, 6 p.m. UTC | #1
On Thu 2023-12-14 22:47:58, John Ogness wrote:
> A descriptor in the committed state means the record does not yet
> exist for the reader. However, for the panic CPU, committed
> records should be handled as finalized records since they contain
> message data in a consistent state and may contain additional
> hints as to the cause of the panic.
> 
> Add an exception for records in the commit state to not be
> considered non-existing when reading from the panic CPU.

IMHO, it is important to describe effects of this change in more
details. And I think that it actually does not work as expected,
see below.

> --- a/kernel/printk/printk_ringbuffer.c
> +++ b/kernel/printk/printk_ringbuffer.c
> @@ -1875,16 +1877,25 @@ static int desc_read_finalized_seq(struct prb_desc_ring *desc_ring,
>  
>  	/*
>  	 * An unexpected @id (desc_miss) or @seq mismatch means the record
> -	 * does not exist. A descriptor in the reserved or committed state
> -	 * means the record does not yet exist for the reader.
> +	 * does not exist. A descriptor in the reserved state means the
> +	 * record does not yet exist for the reader.
>  	 */
>  	if (d_state == desc_miss ||
>  	    d_state == desc_reserved ||
> -	    d_state == desc_committed ||
>  	    s != seq) {
>  		return -EINVAL;
>  	}
>  
> +	/*
> +	 * A descriptor in the committed state means the record does not yet
> +	 * exist for the reader. However, for the panic CPU, committed
> +	 * records are also handled as finalized records since they contain
> +	 * message data in a consistent state and may contain additional
> +	 * hints as to the cause of the panic.
> +	 */
> +	if (d_state == desc_committed && !this_cpu_in_panic())
> +		return -EINVAL;

If I get it correctly, this causes that panic CPU would see a
non-finalized continuous line as finalized. And it would flush
the existing piece to consoles.

The problem is that pr_cont() would append the message into
the same record. But the consoles would already wait
for the next record. They would miss the appended pieces.

It might be fixed by finalizing the record before we read
the non-finalized piece. It is doable but it would add
another lock-less scenario. I am not sure if it would work with
the existing barriers or how complicated it would be to
prove the correctness.

Now, in practice, it would allow to flush pieces of continuous lines
printed on panic CPU immediately. It would not affect messages printed
by other CPUs because of a mix of reasons:

   1. The current code tries hard to move the console owner to panic
      CPU. It allows the panic CPU to flush the messages a safe way
      even when other CPUs are stopped or somehow locked.

      It means that the consoles are ideally flushed when the panic CPU
      adds a message.

   2. Only the last record might be in a committed state. Any older
      record is automatically finalized when it reaches the committed
      state.

   3. The previous patch causes that messages from non-panic CPUs
      are skipped when they are not committed or finalized, see
      https://lore.kernel.org/all/20231214214201.499426-11-john.ogness@linutronix.de


Now, this patch would really help only when the panic CPU dies in
the middle of a continuous message or when the final message does
not have a newline character.

Honestly, I think that it is not worth the effort. It would add another
complexity to the memory barriers. The real effect is not easy
to understand. And the benefit is minimal from my POV.


Alternative solutions:

1. Flush the non-finalized continuous line only before reaching
   the infinite loop in panic(). printk() gets disabled
   at this point anyway.

   It would solve only this one scenario, though.


2. Always finalize messages in vprintk_store() when in_panic().

   It would have the desired effect in all panic() situations.
   And it would not add another complexity to the lock-less
   ringbuffer code.


Best Regards,
Petr
  
John Ogness Feb. 5, 2024, 2:08 p.m. UTC | #2
On 2024-02-01, Petr Mladek <pmladek@suse.com> wrote:
> On Thu 2023-12-14 22:47:58, John Ogness wrote:
>> A descriptor in the committed state means the record does not yet
>> exist for the reader. However, for the panic CPU, committed
>> records should be handled as finalized records since they contain
>> message data in a consistent state and may contain additional
>> hints as to the cause of the panic.
>> 
>> Add an exception for records in the commit state to not be
>> considered non-existing when reading from the panic CPU.
>
> IMHO, it is important to describe effects of this change in more
> details. And I think that it actually does not work as expected,
> see below.

I reviewed my notes from our meeting in Richmond. We had agreed that
this feature should not apply to the latest message. That would change
the commit message to be as follows:

    printk: ringbuffer: Consider committed as finalized in panic
    
    A descriptor in the committed state means the record does not yet
    exist for the reader. However, for the panic CPU, committed
    records should be handled as finalized records since they contain
    message data in a consistent state and may contain additional
    hints as to the cause of the panic.
    
    The only exception is the last record. The panic CPU may be
    usig LOG_CONT and the individual pieces should not be printed
    separately.
    
    Add a special-case check for records in the commit state to not
    be considered non-existing when reading from the panic CPU and
    it is not the last record.

>> --- a/kernel/printk/printk_ringbuffer.c
>> +++ b/kernel/printk/printk_ringbuffer.c
>> @@ -1875,16 +1877,25 @@ static int desc_read_finalized_seq(struct prb_desc_ring *desc_ring,
>>  
>>  	/*
>>  	 * An unexpected @id (desc_miss) or @seq mismatch means the record
>> -	 * does not exist. A descriptor in the reserved or committed state
>> -	 * means the record does not yet exist for the reader.
>> +	 * does not exist. A descriptor in the reserved state means the
>> +	 * record does not yet exist for the reader.
>>  	 */
>>  	if (d_state == desc_miss ||
>>  	    d_state == desc_reserved ||
>> -	    d_state == desc_committed ||
>>  	    s != seq) {
>>  		return -EINVAL;
>>  	}
>>  
>> +	/*
>> +	 * A descriptor in the committed state means the record does not yet
>> +	 * exist for the reader. However, for the panic CPU, committed
>> +	 * records are also handled as finalized records since they contain
>> +	 * message data in a consistent state and may contain additional
>> +	 * hints as to the cause of the panic.
>> +	 */
>> +	if (d_state == desc_committed && !this_cpu_in_panic())
>> +		return -EINVAL;

And this code would change to:

+	/*
+	 * A descriptor in the committed state means the record does not yet
+	 * exist for the reader. However, for the panic CPU, committed
+	 * records are also handled as finalized records since they contain
+	 * message data in a consistent state and may contain additional
+	 * hints as to the cause of the panic. The only exception is the
+	 * last record, which may still be appended by the panic CPU and so
+	 * is not available to the panic CPU for reading.
+	 */
+	if (d_state == desc_committed &&
+	    (!this_cpu_in_panic() || id == atomic_long_read(&desc_ring->head_id))) {
+		return -EINVAL;
+	}

> If I get it correctly, this causes that panic CPU would see a
> non-finalized continuous line as finalized. And it would flush
> the existing piece to consoles.
>
> The problem is that pr_cont() would append the message into
> the same record. But the consoles would already wait
> for the next record. They would miss the appended pieces.

Exactly. That is why we said that the last message would not be
available. Maybe this new version is acceptable.

> Honestly, I think that it is not worth the effort. It would add
> another complexity to the memory barriers. The real effect is not easy
> to understand. And the benefit is minimal from my POV.

I am OK with dropping this patch from the series. It is questionable how
valuable a LOG_CONT piece from a non-panic CPU is anyway. And if the
non-panic CPU managed to reopen the record, it would be skipped anyway.

I will drop this patch unless you want to keep the new version.

John
  
Petr Mladek Feb. 7, 2024, 10:11 a.m. UTC | #3
On Mon 2024-02-05 15:14:14, John Ogness wrote:
> On 2024-02-01, Petr Mladek <pmladek@suse.com> wrote:
> > On Thu 2023-12-14 22:47:58, John Ogness wrote:
> >> A descriptor in the committed state means the record does not yet
> >> exist for the reader. However, for the panic CPU, committed
> >> records should be handled as finalized records since they contain
> >> message data in a consistent state and may contain additional
> >> hints as to the cause of the panic.
> >> 
> >> Add an exception for records in the commit state to not be
> >> considered non-existing when reading from the panic CPU.
> >
> > IMHO, it is important to describe effects of this change in more
> > details. And I think that it actually does not work as expected,
> > see below.
> 
> I reviewed my notes from our meeting in Richmond. We had agreed that
> this feature should not apply to the latest message. That would change
> the commit message to be as follows:
> 
>     printk: ringbuffer: Consider committed as finalized in panic
>     
>     A descriptor in the committed state means the record does not yet
>     exist for the reader. However, for the panic CPU, committed
>     records should be handled as finalized records since they contain
>     message data in a consistent state and may contain additional
>     hints as to the cause of the panic.
>     
>     The only exception is the last record. The panic CPU may be
>     usig LOG_CONT and the individual pieces should not be printed
>     separately.

This was my first understanding. But then I realized that appended
pieces would not be printed at all when a committed part
was printed.

And it might be even worse. I have realized yesterday that an
attempt to print the last record in the committed state might
cause missing the trailing '\0' in the copied data. Just imagine
the following race:

CPU0				CPU1

prb_read()
  // read last record in committed state
  copy_data(.., info->text_len)

				prb_reserve_in_last()
				printk_sprint(&r.text_buf[0], reserve_size, ...
				prb_commit()

     // copy data_size before appending
     // from already appended buffer
     // The trailing '\0' is not in the copied part
     memcpy(&buf[0], data, data_size);

BANG: CPU0 would try to use a string without the trailing '\0'.

We should probably add a sanity check and fixup for this potential
problem.


>     Add a special-case check for records in the commit state to not
>     be considered non-existing when reading from the panic CPU and
>     it is not the last record.
>
> >> --- a/kernel/printk/printk_ringbuffer.c
> >> +++ b/kernel/printk/printk_ringbuffer.c
> >> @@ -1875,16 +1877,25 @@ static int desc_read_finalized_seq(struct prb_desc_ring *desc_ring,
> >>  
> >>  	/*
> >>  	 * An unexpected @id (desc_miss) or @seq mismatch means the record
> >> -	 * does not exist. A descriptor in the reserved or committed state
> >> -	 * means the record does not yet exist for the reader.
> >> +	 * does not exist. A descriptor in the reserved state means the
> >> +	 * record does not yet exist for the reader.
> >>  	 */
> >>  	if (d_state == desc_miss ||
> >>  	    d_state == desc_reserved ||
> >> -	    d_state == desc_committed ||
> >>  	    s != seq) {
> >>  		return -EINVAL;
> >>  	}
> >>  
> >> +	/*
> >> +	 * A descriptor in the committed state means the record does not yet
> >> +	 * exist for the reader. However, for the panic CPU, committed
> >> +	 * records are also handled as finalized records since they contain
> >> +	 * message data in a consistent state and may contain additional
> >> +	 * hints as to the cause of the panic.
> >> +	 */
> >> +	if (d_state == desc_committed && !this_cpu_in_panic())
> >> +		return -EINVAL;
> 
> And this code would change to:
> 
> +	/*
> +	 * A descriptor in the committed state means the record does not yet
> +	 * exist for the reader. However, for the panic CPU, committed
> +	 * records are also handled as finalized records since they contain
> +	 * message data in a consistent state and may contain additional
> +	 * hints as to the cause of the panic. The only exception is the
> +	 * last record, which may still be appended by the panic CPU and so
> +	 * is not available to the panic CPU for reading.
> +	 */
> +	if (d_state == desc_committed &&
> +	    (!this_cpu_in_panic() || id == atomic_long_read(&desc_ring->head_id))) {
> +		return -EINVAL;
> +	}

This is clever.

Well, it would help only with a very small race window. Older
records are automatically finalized when committed. It is because
they could not be reopened. See prb_commit().

> > If I get it correctly, this causes that panic CPU would see a
> > non-finalized continuous line as finalized. And it would flush
> > the existing piece to consoles.
> >
> > The problem is that pr_cont() would append the message into
> > the same record. But the consoles would already wait
> > for the next record. They would miss the appended pieces.
> 
> Exactly. That is why we said that the last message would not be
> available. Maybe this new version is acceptable.
> 
> > Honestly, I think that it is not worth the effort. It would add
> > another complexity to the memory barriers. The real effect is not easy
> > to understand. And the benefit is minimal from my POV.
> 
> I am OK with dropping this patch from the series. It is questionable how
> valuable a LOG_CONT piece from a non-panic CPU is anyway. And if the
> non-panic CPU managed to reopen the record, it would be skipped anyway.
> 
> I will drop this patch unless you want to keep the new version.

Honestly, I would drop it. It is kind of tricky code. And it would
help only with a very small race window with messages from
non-panic() CPUs. Especially with the 14th patch which
blocks messages from other CPUs during panic() anyway.

Best Regards,
Petr
  

Patch

diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index d6ed33683b8b..e7b808b829a0 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -1857,6 +1857,8 @@  static bool copy_data(struct prb_data_ring *data_ring,
  * descriptor. However, it also verifies that the record is finalized and has
  * the sequence number @seq. On success, 0 is returned.
  *
+ * For the panic CPU, committed descriptors are also considered finalized.
+ *
  * Error return values:
  * -EINVAL: A finalized record with sequence number @seq does not exist.
  * -ENOENT: A finalized record with sequence number @seq exists, but its data
@@ -1875,16 +1877,25 @@  static int desc_read_finalized_seq(struct prb_desc_ring *desc_ring,
 
 	/*
 	 * An unexpected @id (desc_miss) or @seq mismatch means the record
-	 * does not exist. A descriptor in the reserved or committed state
-	 * means the record does not yet exist for the reader.
+	 * does not exist. A descriptor in the reserved state means the
+	 * record does not yet exist for the reader.
 	 */
 	if (d_state == desc_miss ||
 	    d_state == desc_reserved ||
-	    d_state == desc_committed ||
 	    s != seq) {
 		return -EINVAL;
 	}
 
+	/*
+	 * A descriptor in the committed state means the record does not yet
+	 * exist for the reader. However, for the panic CPU, committed
+	 * records are also handled as finalized records since they contain
+	 * message data in a consistent state and may contain additional
+	 * hints as to the cause of the panic.
+	 */
+	if (d_state == desc_committed && !this_cpu_in_panic())
+		return -EINVAL;
+
 	/*
 	 * A descriptor in the reusable state may no longer have its data
 	 * available; report it as existing but with lost data. Or the record