[printk,v2,3/4] printk: Skip unfinalized records in panic

Message ID 20231013204340.1112036-4-john.ogness@linutronix.de
State New
Headers
Series fix console flushing on panic |

Commit Message

John Ogness Oct. 13, 2023, 8:43 p.m. UTC
  Normally a reader will stop once reaching an unfinalized record.
However, when a panic happens, writers from other CPUs (or an
interrupted context on the panic CPU) may have been writing a
record and were unable to finalize it. The panic CPU will
reserve/commit/finalize its panic records, but these will be
located after the unfinalized records. This results in panic()
not flushing the panic messages.

Add a special case to printk_get_next_message() to allow
skipping over unfinalized records if on the panic CPU.

Also refine the documentation of the ringbuffer reading
functions to clarify that failure may also be due to an
unfinalized record.

Fixes: 896fbe20b4e2 ("printk: use the lockless ringbuffer")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/printk.c            | 15 +++++++++++++--
 kernel/printk/printk_ringbuffer.c |  9 +++++----
 2 files changed, 18 insertions(+), 6 deletions(-)
  

Comments

Petr Mladek Oct. 17, 2023, 11:27 a.m. UTC | #1
On Fri 2023-10-13 22:49:39, John Ogness wrote:
> Normally a reader will stop once reaching an unfinalized record.
> However, when a panic happens, writers from other CPUs (or an
> interrupted context on the panic CPU) may have been writing a
> record and were unable to finalize it. The panic CPU will
> reserve/commit/finalize its panic records, but these will be
> located after the unfinalized records. This results in panic()
> not flushing the panic messages.
> 
> Add a special case to printk_get_next_message() to allow
> skipping over unfinalized records if on the panic CPU.
> 
> Also refine the documentation of the ringbuffer reading
> functions to clarify that failure may also be due to an
> unfinalized record.

Great catch!

It took me some time to think about various possible scenarios.
As a result, I suggest to update some comments, see below.

> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2813,8 +2813,19 @@ static bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
>  	else
>  		prb_rec_init_rd(&r, &info, outbuf, outbuf_sz);
>  
> -	if (!prb_read_valid(prb, seq, &r))
> -		return false;
> +	while (!prb_read_valid(prb, seq, &r)) {
> +		if (this_cpu_in_panic() && seq < prb_next_seq(prb)) {
> +			/*
> +			 * The record @seq is not finalized and there may be

"may be" is a bit misleading. If I count it correctly then there
"are" more records when seq < prb_next_seq().

> +			 * more records in the ringbuffer. Since this is the
> +			 * panic CPU, skip over the unfinalized record and
> +			 * try to read a finalized record that may follow.
> +			 */

It took me quite some time to understand whether this is right or not.
It is partly because the comment describes what the code does but it
does not describe the motivation and why it is safe. I wonder
if the following is better:

			/*
			 * Skip non-finalized records when emitting messages
			 * from panic CPU. They might never get finalized.
			 * Note that new messages printed on panic CPU
			 * are finalized when we are here.
			 */

But wait. Are the messages printed in panic context always finalized?
What about messages without the trailing newline printed?

Aha, they actually are finalized because prb_next_seq() would return sequence
number of the record in "desc_committed" state when there is
a message without newline and we skip only seq < prb_next_seq().
So I would update the comment, something like:

			/*
			 * Skip non-finalized records when emitting messages
			 * from panic CPU. They might never get	finalized.
			 *
			 * Note that new messages printed on panic CPU are
			 * finalized when we are here. The only exception
			 * might be the last message without trailing newline.
			 * But it would have the sequence number returned
			 * by prb_next_seq().
			 */

Sigh, it is long. But this is a quite tricky situation.

> +			seq++;
> +		} else {
> +			return false;
> +		}
> +	}
>  
>  	pmsg->seq = r.info->seq;
>  	pmsg->dropped = r.info->seq - seq;
> diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
> index 2dc4d5a1f1ff..1bbc008109ef 100644
> --- a/kernel/printk/printk_ringbuffer.c
> +++ b/kernel/printk/printk_ringbuffer.c
> @@ -1876,8 +1876,9 @@ static u64 prb_first_seq(struct printk_ringbuffer *rb)
>  }
>  
>  /*
> - * Non-blocking read of a record. Updates @seq to the last finalized record
> - * (which may have no data available).
> + * Non-blocking read of a record. Updates @seq to the record that was read
> + * (which may have no data available) or was attempted to be read (in case
> + * it was unfinalized or non-existent).

I am confused. Well, even the original message was confusing.
I think about describing it the following way.

  * On input, @seq defines the record which should be read. It might
  * be updated to a higher value when the requested record has already
  * been overwritten or when the record had empty data.
  *
  * On return, @seq value depends on the situation. It is:
  *
  *	- sequence number of the read record on success.
  *     - sequence number of the first found to-be-finalized record
  *	  when the input seq number was lower or equal to prb_next_seq().
  *	- the original value when @seq was invalid, bigger then prb_next_seq().

Sigh, the comment is hairy. Maybe you would find a more elegant way
to describe the variants.


BTW: The support for data-less records were added by the commit
     ("printk: ringbuffer: support dataless records"). It was
     needed to handle empty lines: printk("\n"). It is strange
     that we skip them instead of printing the empty line.

     It looks like a small bug. I might look a it if you agree
     that it looks like a bug.

>   *
>   * See the description of prb_read_valid() and prb_read_valid_info()
>   * for details.


Also I would update the above prb_next_seq():

--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -2012,8 +2012,8 @@ u64 prb_first_valid_seq(struct printk_ringbuffer *rb)
  * available records should be skipped.
  *
  * Context: Any context.
- * Return: The sequence number of the next newest (not yet available) record
- *         for readers.
+ * Return: The sequence number of the next newest (not yet finalized or
+ *	   non-existing) record for readers.
  */
 u64 prb_next_seq(struct printk_ringbuffer *rb)
 {
  
John Ogness Oct. 17, 2023, 9:25 p.m. UTC | #2
On 2023-10-17, Petr Mladek <pmladek@suse.com> wrote:
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -2813,8 +2813,19 @@ static bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
>>  	else
>>  		prb_rec_init_rd(&r, &info, outbuf, outbuf_sz);
>>  
>> -	if (!prb_read_valid(prb, seq, &r))
>> -		return false;
>> +	while (!prb_read_valid(prb, seq, &r)) {
>> +		if (this_cpu_in_panic() && seq < prb_next_seq(prb)) {
>> +			/*
>> +			 * The record @seq is not finalized and there may be
>
> "may be" is a bit misleading. If I count it correctly then there
> "are" more records when seq < prb_next_seq().

Ack.

> But wait. Are the messages printed in panic context always finalized?
> What about messages without the trailing newline printed?
>
> Aha, they actually are finalized because prb_next_seq() would return sequence
> number of the record in "desc_committed" state when there is
> a message without newline and we skip only seq < prb_next_seq().
> So I would update the comment, something like:
>
> 			/*
> 			 * Skip non-finalized records when emitting messages
> 			 * from panic CPU. They might never get	finalized.
> 			 *
> 			 * Note that new messages printed on panic CPU are
> 			 * finalized when we are here. The only exception
> 			 * might be the last message without trailing newline.
> 			 * But it would have the sequence number returned
> 			 * by prb_next_seq().
> 			 */
>
> Sigh, it is long. But this is a quite tricky situation.

OK.

>> diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
>> index 2dc4d5a1f1ff..1bbc008109ef 100644
>> --- a/kernel/printk/printk_ringbuffer.c
>> +++ b/kernel/printk/printk_ringbuffer.c
>> @@ -1876,8 +1876,9 @@ static u64 prb_first_seq(struct printk_ringbuffer *rb)
>>  }
>>  
>>  /*
>> - * Non-blocking read of a record. Updates @seq to the last finalized record
>> - * (which may have no data available).
>> + * Non-blocking read of a record. Updates @seq to the record that was read
>> + * (which may have no data available) or was attempted to be read (in case
>> + * it was unfinalized or non-existent).
>
> I am confused. Well, even the original message was confusing.
> I think about describing it the following way.
>
>   * On input, @seq defines the record which should be read. It might
>   * be updated to a higher value when the requested record has already
>   * been overwritten or when the record had empty data.
>   *
>   * On return, @seq value depends on the situation. It is:
>   *
>   *	- sequence number of the read record on success.
>   *     - sequence number of the first found to-be-finalized record
>   *	  when the input seq number was lower or equal to prb_next_seq().
>   *	- the original value when @seq was invalid, bigger then prb_next_seq().
>
> Sigh, the comment is hairy. Maybe you would find a more elegant way
> to describe the variants.

Be careful. prb_next_seq() is only loosely related to
_prb_read_valid(). I would not mention prb_next_seq() when describing
_prb_read_valid(). There are situations where _prb_read_valid() could
successfully read a record with a higher sequence number than
prb_next_seq() would return. This is because prb_next_seq() is only best
effort.

For panic it is sufficient because panic() will commit finalized records
after having stopped all other CPUs, so it will definitely update
@prb->desc_ring.last_finalized_id and allow prb_next_seq() to point to
the end of the panic messages. But for non-panic that is not the case.

I do not have a problem understanding my version of the comments. Note
that it is just a brief internal comment. It also says:

 * See the description of prb_read_valid() and prb_read_valid_info()
 * for details.

And if you look at the kerneldoc of either of those functions you see:

 * On success, the reader must check r->info.seq to see which record was
 * actually read. This allows the reader to detect dropped records.
 *
 * Failure means @seq refers to a not yet finalized or non-existing record.

Also note that @seq is never passed by reference from the external
caller. It is only passed by reference to the helper function
_prb_read_valid().

> BTW: The support for data-less records were added by the commit
>      ("printk: ringbuffer: support dataless records"). It was
>      needed to handle empty lines: printk("\n"). It is strange
>      that we skip them instead of printing the empty line.

We do not skip them. That was the whole point of adding support for
data-less records. ;-)

get_data() returns "" and @data_size=0

copy_data() returns true (but does not copy any data)

prb_read() returns true (we are assuming it is finalized)

_prb_read_valid() returns true

prb_read_valid() return true

record_print_text() creates a string with prefix and adds "\n"

printk_get_next_message() returns something to print

> Also I would update the above prb_next_seq():
>
> --- a/kernel/printk/printk_ringbuffer.c
> +++ b/kernel/printk/printk_ringbuffer.c
> @@ -2012,8 +2012,8 @@ u64 prb_first_valid_seq(struct printk_ringbuffer *rb)
>   * available records should be skipped.
>   *
>   * Context: Any context.
> - * Return: The sequence number of the next newest (not yet available) record
> - *         for readers.
> + * Return: The sequence number of the next newest (not yet finalized or
> + *	   non-existing) record for readers.

Ack.

John Ogness
  
John Ogness Oct. 18, 2023, 8:24 a.m. UTC | #3
On 2023-10-17, John Ogness <john.ogness@linutronix.de> wrote:
>>> -	if (!prb_read_valid(prb, seq, &r))
>>> -		return false;
>>> +	while (!prb_read_valid(prb, seq, &r)) {
>>> +		if (this_cpu_in_panic() && seq < prb_next_seq(prb)) {
>>> +			/*
>>> +			 * The record @seq is not finalized and there may be

I am wondering if we should put the this_cpu_in_panic() check into
_prb_read_valid() instead. This has the advantage that it could more
intelligently skip non-finalized records and not need to rely on
prb_next_seq(). Also, it would allow any other readers (kmsg_dump) to
skip non-finalized records during panic.

Looking over the code, it would be easy to extend
desc_read_finalized_seq() to return a separate error code for exists but
is non-finalized. This would make it trivial for _prb_read_valid() to
skip over it when this_cpu_in_panic().

John
  
Petr Mladek Oct. 18, 2023, 12:54 p.m. UTC | #4
On Tue 2023-10-17 23:31:25, John Ogness wrote:
> On 2023-10-17, Petr Mladek <pmladek@suse.com> wrote:
> >> --- a/kernel/printk/printk_ringbuffer.c
> >> +++ b/kernel/printk/printk_ringbuffer.c
> >> @@ -1876,8 +1876,9 @@ static u64 prb_first_seq(struct printk_ringbuffer *rb)
> >>  }
> >>  
> >>  /*
> >> - * Non-blocking read of a record. Updates @seq to the last finalized record
> >> - * (which may have no data available).
> >> + * Non-blocking read of a record. Updates @seq to the record that was read
> >> + * (which may have no data available) or was attempted to be read (in case
> >> + * it was unfinalized or non-existent).
> >
> > I am confused. Well, even the original message was confusing.
> > I think about describing it the following way.
> >
> >   * On input, @seq defines the record which should be read. It might
> >   * be updated to a higher value when the requested record has already
> >   * been overwritten or when the record had empty data.
> >   *
> >   * On return, @seq value depends on the situation. It is:
> >   *
> >   *	- sequence number of the read record on success.
> >   *     - sequence number of the first found to-be-finalized record
> >   *	  when the input seq number was lower or equal to prb_next_seq().
> >   *	- the original value when @seq was invalid, bigger then prb_next_seq().
> >
> > Sigh, the comment is hairy. Maybe you would find a more elegant way
> > to describe the variants.
> 
> Be careful. prb_next_seq() is only loosely related to
> _prb_read_valid(). I would not mention prb_next_seq() when describing
> _prb_read_valid(). There are situations where _prb_read_valid() could
> successfully read a record with a higher sequence number than
> prb_next_seq() would return. This is because prb_next_seq() is only best
> effort.

I see. I misunderstood the meaning of prb_next_seq().

Well, I would not say that "prb_next_seq() is only the best effort".
It has well defined meaning. It returns the sequence number
if the first not yet finalized record.

You are right, there might be many reserved records which are newer.
Some of these reserved records might get finalized before older ones.
This is why _prb_read_valid() might actually return the higher number
when anyone attempts to read records above prb_next_seq().

Wait! This means that your patch actually does not work. Here is the diff:

--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2815,8 +2815,19 @@ static bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
 	else
 		prb_rec_init_rd(&r, &info, outbuf, outbuf_sz);
 
-	if (!prb_read_valid(prb, seq, &r))
-		return false;
+	while (!prb_read_valid(prb, seq, &r)) {
+		if (this_cpu_in_panic() && seq < prb_next_seq(prb)) {
+			/*
+			 * The record @seq is not finalized and there may be
+			 * more records in the ringbuffer. Since this is the
+			 * panic CPU, skip over the unfinalized record and
+			 * try to read a finalized record that may follow.
+			 */
+			seq++;
+		} else {
+			return false;
+		}
+	}
 
 	pmsg->seq = r.info->seq;
 	pmsg->dropped = r.info->seq - seq;

It skips the invalid reads only when seq < prb_next_seq(). But
prb_next_seq(prb) points to the 1st non-finalized record. And
all records with seq < prb_next_seq() must be finalized!


> For panic it is sufficient because panic() will commit finalized records
> after having stopped all other CPUs, so it will definitely update
> @prb->desc_ring.last_finalized_id and allow prb_next_seq() to point to
> the end of the panic messages. But for non-panic that is not the case.
> 
> I do not have a problem understanding my version of the comments.

Maybe, "hairy" is not the right word. Well, from my POV, the comment
is not clear enough. It says that it updates "seq" but it does not say
when and how.

And it is even wrong. I still believe that dataless records are
quietly skipped. Here is the code:

static bool _prb_read_valid(struct printk_ringbuffer *rb, u64 *seq,
			    struct printk_record *r, unsigned int *line_count)
{
	while ((err = prb_read(rb, *seq, r, line_count))) {
[...]
		} else if (err == -ENOENT) {
			/* Record exists, but no data available. Skip. */
			(*seq)++;

		} else {
	}
[...]
}

Note that "seq" is incremented and the cycle continues. It means
that it tries to read the next record. It does _not_ return
"true" when seeing dataless record, definitely.

  => it might modify "seq" and still return "false" when
     the original "seq" pointed to finalized dataless record.


> that it is just a brief internal comment. It also says:
> 
>  * See the description of prb_read_valid() and prb_read_valid_info()
>  * for details.

This does not help because these two functions do not describe
how the seq is modified. And there are clearly three possibilities
when it returns false:

    1. The record for the given seq number has not been finalized
       yet. In this case, it keeps "seq" as it was.

    2. The record for the given seq has been dataless. In this case
      it would try to advance seq until it finds a record with
      the data or not-yet-finalized one.

    3. The given seq number has been invalid, above the head.
       In this case, it keeps "seq" intact. So same as 1st case.

Note that "seq" was not updated in the 1st and 3rd case. But the
current comment suggests that it is updated.

Well, it might be updated in all situations when then given
seq pointed to an already overwritten record (below tail).


> And if you look at the kerneldoc of either of those functions you see:
> 
>  * On success, the reader must check r->info.seq to see which record was
>  * actually read. This allows the reader to detect dropped records.
>  *
>  * Failure means @seq refers to a not yet finalized or non-existing record.
> 
> Also note that @seq is never passed by reference from the external
> caller. It is only passed by reference to the helper function
> _prb_read_valid().
> 
> > BTW: The support for data-less records were added by the commit
> >      ("printk: ringbuffer: support dataless records"). It was
> >      needed to handle empty lines: printk("\n"). It is strange
> >      that we skip them instead of printing the empty line.
> 
> We do not skip them. That was the whole point of adding support for
> data-less records. ;-)
> 
> get_data() returns "" and @data_size=0
> 
> copy_data() returns true (but does not copy any data)
> 
> prb_read() returns true (we are assuming it is finalized)
> 
> _prb_read_valid() returns true

This is not true if I read the code correctly.

> prb_read_valid() return true
> 
> record_print_text() creates a string with prefix and adds "\n"
> 
> printk_get_next_message() returns something to print

Best Regards,
Petr

PS: Sigh, I know that my comments looked like nitpicking. I had big inner
    fight whether it was worth it or I was just burning my and others
    time. But it seems that it helped for find many subtle problems
    in the end. Or am I still missing something?
  
Petr Mladek Oct. 18, 2023, 1:15 p.m. UTC | #5
On Wed 2023-10-18 10:30:46, John Ogness wrote:
> On 2023-10-17, John Ogness <john.ogness@linutronix.de> wrote:
> >>> -	if (!prb_read_valid(prb, seq, &r))
> >>> -		return false;
> >>> +	while (!prb_read_valid(prb, seq, &r)) {
> >>> +		if (this_cpu_in_panic() && seq < prb_next_seq(prb)) {
> >>> +			/*
> >>> +			 * The record @seq is not finalized and there may be
> 
> I am wondering if we should put the this_cpu_in_panic() check into
> _prb_read_valid() instead. This has the advantage that it could more
> intelligently skip non-finalized records and not need to rely on
> prb_next_seq(). Also, it would allow any other readers (kmsg_dump) to
> skip non-finalized records during panic.

Yeah, this look like a better solution. Anyway, we need to try reading
the messages behind prb_next_seq() up to the head of the ring buffer.
And @head_id is not available outside prb API.

> Looking over the code, it would be easy to extend
> desc_read_finalized_seq() to return a separate error code for exists but
> is non-finalized. This would make it trivial for _prb_read_valid() to
> skip over it when this_cpu_in_panic().

Sounds reasonable.

Best Regards,
Petr
  
John Ogness Oct. 18, 2023, 1:45 p.m. UTC | #6
On 2023-10-18, Petr Mladek <pmladek@suse.com> wrote:
> Wait! This means that your patch actually does not work. Here is the diff:
>
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2815,8 +2815,19 @@ static bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
>  	else
>  		prb_rec_init_rd(&r, &info, outbuf, outbuf_sz);
>  
> -	if (!prb_read_valid(prb, seq, &r))
> -		return false;
> +	while (!prb_read_valid(prb, seq, &r)) {
> +		if (this_cpu_in_panic() && seq < prb_next_seq(prb)) {
> +			/*
> +			 * The record @seq is not finalized and there may be
> +			 * more records in the ringbuffer. Since this is the
> +			 * panic CPU, skip over the unfinalized record and
> +			 * try to read a finalized record that may follow.
> +			 */
> +			seq++;
> +		} else {
> +			return false;
> +		}
> +	}
>  
>  	pmsg->seq = r.info->seq;
>  	pmsg->dropped = r.info->seq - seq;
>
> It skips the invalid reads only when seq < prb_next_seq(). But
> prb_next_seq(prb) points to the 1st non-finalized record. And
> all records with seq < prb_next_seq() must be finalized!

Please take a look at prb_next_seq(). It _starts_ its search from:

    id = atomic_long_read(&desc_ring->last_finalized_id);

For console_flush_on_panic(), @last_finalized_id will _always_ be set to
the last finalized message of the panic messages, being higher than any
non-finalized records that may exist. There are no other CPUs running
except the panic CPU.

> And it is even wrong. I still believe that dataless records are
> quietly skipped. Here is the code:
>
> static bool _prb_read_valid(struct printk_ringbuffer *rb, u64 *seq,
> 			    struct printk_record *r, unsigned int *line_count)
> {
> 	while ((err = prb_read(rb, *seq, r, line_count))) {
> [...]
> 		} else if (err == -ENOENT) {
> 			/* Record exists, but no data available. Skip. */
> 			(*seq)++;
>
> 		} else {
> 	}
> [...]
> }

Yes, this is the code. But prb_read() returns _true_ for data-less
records. You are confusing data-less records (which are valid records
containing the empty string) with "no data available" (which are records
that have lost their data).

Your confusion shows that we should improve the language. I am working
on a patch that does this.

> Note that "seq" is incremented and the cycle continues. It means
> that it tries to read the next record. It does _not_ return
> "true" when seeing dataless record, definitely.

I showed you where to look: get_data() has an extra check for data-less
records. It returns a valid string "", not NULL.

>> that it is just a brief internal comment. It also says:
>> 
>>  * See the description of prb_read_valid() and prb_read_valid_info()
>>  * for details.
>
> This does not help because these two functions do not describe
> how the seq is modified. And there are clearly three possibilities
> when it returns false:
>
>     1. The record for the given seq number has not been finalized
>        yet. In this case, it keeps "seq" as it was.
>
>     2. The record for the given seq has been dataless. In this case
>       it would try to advance seq until it finds a record with
>       the data or not-yet-finalized one.

"dataless" is the wrong word. It should be "lost data". It is skipped
because there is no data available (because it was lost) for the reader
to read. This is _not_ the case where it is a finalized fully intakt
record containing an empty string.

>
>     3. The given seq number has been invalid, above the head.
>        In this case, it keeps "seq" intact. So same as 1st case.
>
> Note that "seq" was not updated in the 1st and 3rd case. But the
> current comment suggests that it is updated.
>
> Well, it might be updated in all situations when then given
> seq pointed to an already overwritten record (below tail).

You are making this all sound much more complicated than it
is. Really. I suppose this is testimony for the horrible documentation.

>> _prb_read_valid() returns true
>
> This is not true if I read the code correctly.

Unfortunately you are not reading the code correctly. (Or rather, you
are being misled by comments because you incorrectly associate "not
available to reader" to mean "valid record with an empty string".

> PS: Sigh, I know that my comments looked like nitpicking. I had big inner
>     fight whether it was worth it or I was just burning my and others
>     time. But it seems that it helped for find many subtle problems
>     in the end.

Indeed. The subtle problem is: unclear comments/documentation.

John
  
Petr Mladek Oct. 18, 2023, 2:20 p.m. UTC | #7
On Wed 2023-10-18 14:54:29, Petr Mladek wrote:
> On Tue 2023-10-17 23:31:25, John Ogness wrote:
> > On 2023-10-17, Petr Mladek <pmladek@suse.com> wrote:
> > > BTW: The support for data-less records were added by the commit
> > >      ("printk: ringbuffer: support dataless records"). It was
> > >      needed to handle empty lines: printk("\n"). It is strange
> > >      that we skip them instead of printing the empty line.
> > 
> > We do not skip them. That was the whole point of adding support for
> > data-less records. ;-)
> > 
> > get_data() returns "" and @data_size=0

Ah, I see it now. The code really handles printk("\n") correctly.

There are two types of dataless records:

      + #define NO_LPOS 0x3. It is used for storing zero size data.

      + #define FAILED_LPOS 0x1. It is used when the code failed
	  to allocate enough space on the ring buffer or when
	  they were overwritten.

BLK_DATALESS() returns true for both of them.

Sigh, we should really distinguish in the comments when functions
handle the two situations different way. Especially we should not
use the ambiguous "dataless record" word.

I think about "empty line record" and "records with missing data".
And I would rename NO_LPOS to EMPTY_LINE_LPOS to make the meaning
more obvious.

Also it would make sense to use 0x2 for EMPTY_LINE_LPOS and

#define FAILED_LPOS		0x1
#define EMPTY_LINE_LPOS		0x2
#define DATALESS_LPOS_MASK	(FAILED_LPOS | EMPTY_LINE_LPOS)

#define LPOS_DATALESS(lpos)	((lpos) & DATALESS_LPOS_MASK)

> > copy_data() returns true (but does not copy any data)
> > 
> > prb_read() returns true (we are assuming it is finalized)
> > 
> > _prb_read_valid() returns true
> 
> This is not true if I read the code correctly.
> 
> > prb_read_valid() return true
> > 
> > record_print_text() creates a string with prefix and adds "\n"
> > 
> > printk_get_next_message() returns something to print

If we used the right terms in the comments then they might stay short
and still be clear.

Best Regards,
Petr
  
Petr Mladek Oct. 18, 2023, 3:27 p.m. UTC | #8
On Wed 2023-10-18 15:51:57, John Ogness wrote:
> On 2023-10-18, Petr Mladek <pmladek@suse.com> wrote:
> > Wait! This means that your patch actually does not work. Here is the diff:
> >
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -2815,8 +2815,19 @@ static bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
> >  	else
> >  		prb_rec_init_rd(&r, &info, outbuf, outbuf_sz);
> >  
> > -	if (!prb_read_valid(prb, seq, &r))
> > -		return false;
> > +	while (!prb_read_valid(prb, seq, &r)) {
> > +		if (this_cpu_in_panic() && seq < prb_next_seq(prb)) {
> > +			/*
> > +			 * The record @seq is not finalized and there may be
> > +			 * more records in the ringbuffer. Since this is the
> > +			 * panic CPU, skip over the unfinalized record and
> > +			 * try to read a finalized record that may follow.
> > +			 */
> > +			seq++;
> > +		} else {
> > +			return false;
> > +		}
> > +	}
> >  
> >  	pmsg->seq = r.info->seq;
> >  	pmsg->dropped = r.info->seq - seq;
> >
> > It skips the invalid reads only when seq < prb_next_seq(). But
> > prb_next_seq(prb) points to the 1st non-finalized record. And
> > all records with seq < prb_next_seq() must be finalized!
> 
> Please take a look at prb_next_seq(). It _starts_ its search from:
> 
>     id = atomic_long_read(&desc_ring->last_finalized_id);

I see. And this this set in

static void desc_make_final(struct prb_desc_ring *desc_ring, unsigned long id)
{
[...]

	/* Best effort to remember the last finalized @id. */
	atomic_long_set(&desc_ring->last_finalized_id, id);
}

So it is the _last_ finalized id from the timing POV. If there are
more CPUs storing and finalizing the messages in parallel then
it might change forth and back. There might be earlier non-finalized
records and newer finalized ones.

It means that prb_next_seq() really is the best effort and
the description is not valid:

/**
 * prb_next_seq() - Get the sequence number after the last available record.
 *
 * @rb:  The ringbuffer to get the sequence number from.
 *
 * This is the public function available to readers to see what the next
 * newest sequence number available to readers will be.
 *
 * This provides readers a sequence number to jump to if all currently
 * available records should be skipped.

It is not guaranteed that it will be the last available record
because there might be newer already finalized records with
some non-finalized records in between.

Also it is not guaranteed that it will be the next record available
to readers because readers should stop on the 1st non-yet-finalized
record and prb_next_seq() might be behind.

It would be great to document these subtle details especially when
we are going to depend on them.

> For console_flush_on_panic(), @last_finalized_id will _always_ be set to
> the last finalized message of the panic messages, being higher than any
> non-finalized records that may exist. There are no other CPUs running
> except the panic CPU.

It is not guaranteed. It might be lower when some still running CPU
manages to finalize an earlier record and there are later
non-finalized records.

But you are right, there is a very high chance that it will point
behind the last message from panic() context sooner or later,
especially after CPUs get stopped.

Well, note that only NMI guarantees that CPUs get stopped. There
are still architectures which stop CPUs using normal interrupts.

> Unfortunately you are not reading the code correctly. (Or rather, you
> are being misled by comments because you incorrectly associate "not
> available to reader" to mean "valid record with an empty string".

You are right. Well, it is really hard to put all the pieces together
just by reading the code. And the unclear comments make it even worse.

Best Regards,
Petr
  
John Ogness Oct. 18, 2023, 3:50 p.m. UTC | #9
On 2023-10-18, Petr Mladek <pmladek@suse.com> wrote:
> So it is the _last_ finalized id from the timing POV. If there are
> more CPUs storing and finalizing the messages in parallel then
> it might change forth and back. There might be earlier non-finalized
> records and newer finalized ones.
>
> It means that prb_next_seq() really is the best effort and
> the description is not valid:

Well, the description was valid until prb_next_seq() was optimized and
converted to best-effort with:

commit f244b4dc53e5 ("printk: ringbuffer: Improve prb_next_seq() performance")

> It would be great to document these subtle details especially when
> we are going to depend on them.

Going through the various call sites of prb_next_seq(), I would argue
that the above optimization introduced some bugs. I will investigate if
prb_next_seq() can be fixed to match its description because the current
users already depend on that.

WRT to this series, I have put together an alternative implementation
that does not use prb_next_seq().

John
  
Petr Mladek Oct. 19, 2023, 10:29 a.m. UTC | #10
On Wed 2023-10-18 17:56:52, John Ogness wrote:
> On 2023-10-18, Petr Mladek <pmladek@suse.com> wrote:
> > So it is the _last_ finalized id from the timing POV. If there are
> > more CPUs storing and finalizing the messages in parallel then
> > it might change forth and back. There might be earlier non-finalized
> > records and newer finalized ones.
> >
> > It means that prb_next_seq() really is the best effort and
> > the description is not valid:
> 
> Well, the description was valid until prb_next_seq() was optimized and
> converted to best-effort with:
> 
> commit f244b4dc53e5 ("printk: ringbuffer: Improve prb_next_seq() performance")
> 
> > It would be great to document these subtle details especially when
> > we are going to depend on them.
> 
> Going through the various call sites of prb_next_seq(), I would argue
> that the above optimization introduced some bugs. I will investigate if
> prb_next_seq() can be fixed to match its description because the current
> users already depend on that.

I thought about caching the last seq returned by
prb_next_seq() instead of caching the last finalized record.

Also I thought about the highest sequence number accessed
by _prb_read_valid(). But it can be done lockless way only
on 64-bit systems. Well, it might be good enough. I doubt that
there are big 32-bit systems.

> WRT to this series, I have put together an alternative implementation
> that does not use prb_next_seq().

Great.

Best Regards,
Petr
  
John Ogness Oct. 23, 2023, 9:53 a.m. UTC | #11
On 2023-10-18, Petr Mladek <pmladek@suse.com> wrote:
> I think about "empty line record" and "records with missing data".
> And I would rename NO_LPOS to EMPTY_LINE_LPOS to make the meaning
> more obvious.

OK.

> Also it would make sense to use 0x2 for EMPTY_LINE_LPOS and
>
> #define FAILED_LPOS		0x1
> #define EMPTY_LINE_LPOS		0x2
> #define DATALESS_LPOS_MASK	(FAILED_LPOS | EMPTY_LINE_LPOS)
>
> #define LPOS_DATALESS(lpos)	((lpos) & DATALESS_LPOS_MASK)

The existing debugging tools use bit0 to identify if there is data. Bit1
is really the _reason_ for the missing data. This can be seen in the
definition of LPOS_DATALESS(), but it needs to be documented
better. (Ideally where FAILED_LPOS and EMPTY_LINE_LPOS are defined.)

John
  

Patch

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 19b752880879..56d9b4acbbf2 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2813,8 +2813,19 @@  static bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
 	else
 		prb_rec_init_rd(&r, &info, outbuf, outbuf_sz);
 
-	if (!prb_read_valid(prb, seq, &r))
-		return false;
+	while (!prb_read_valid(prb, seq, &r)) {
+		if (this_cpu_in_panic() && seq < prb_next_seq(prb)) {
+			/*
+			 * The record @seq is not finalized and there may be
+			 * more records in the ringbuffer. Since this is the
+			 * panic CPU, skip over the unfinalized record and
+			 * try to read a finalized record that may follow.
+			 */
+			seq++;
+		} else {
+			return false;
+		}
+	}
 
 	pmsg->seq = r.info->seq;
 	pmsg->dropped = r.info->seq - seq;
diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 2dc4d5a1f1ff..1bbc008109ef 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -1876,8 +1876,9 @@  static u64 prb_first_seq(struct printk_ringbuffer *rb)
 }
 
 /*
- * Non-blocking read of a record. Updates @seq to the last finalized record
- * (which may have no data available).
+ * Non-blocking read of a record. Updates @seq to the record that was read
+ * (which may have no data available) or was attempted to be read (in case
+ * it was unfinalized or non-existent).
  *
  * See the description of prb_read_valid() and prb_read_valid_info()
  * for details.
@@ -1932,7 +1933,7 @@  static bool _prb_read_valid(struct printk_ringbuffer *rb, u64 *seq,
  * On success, the reader must check r->info.seq to see which record was
  * actually read. This allows the reader to detect dropped records.
  *
- * Failure means @seq refers to a not yet written record.
+ * Failure means @seq refers to a not yet finalized or non-existing record.
  */
 bool prb_read_valid(struct printk_ringbuffer *rb, u64 seq,
 		    struct printk_record *r)
@@ -1962,7 +1963,7 @@  bool prb_read_valid(struct printk_ringbuffer *rb, u64 seq,
  * On success, the reader must check info->seq to see which record meta data
  * was actually read. This allows the reader to detect dropped records.
  *
- * Failure means @seq refers to a not yet written record.
+ * Failure means @seq refers to a not yet finalized or non-existing record.
  */
 bool prb_read_valid_info(struct printk_ringbuffer *rb, u64 seq,
 			 struct printk_info *info, unsigned int *line_count)