[printk,v3,02/14] printk: Adjust mapping for 32bit seq macros

Message ID 20231214214201.499426-3-john.ogness@linutronix.de
State New
Headers
Series fix console flushing |

Commit Message

John Ogness Dec. 14, 2023, 9:41 p.m. UTC
  Note: This change only applies to 32bit architectures. On 64bit
      architectures the macros are NOPs.

__ulseq_to_u64seq() computes the upper 32 bits of the passed
argument value (@ulseq). The upper bits are derived from a base
value (@rb_next_seq) in a way that assumes @ulseq represents a
64bit number that is less than or equal to @rb_next_seq.

Until now this mapping has been correct for all call sites. However,
in a follow-up commit, values of @ulseq will be passed in that are
higher than the base value. This requires a change to how the 32bit
value is mapped to a 64bit sequence number.

Rather than mapping @ulseq such that the base value is the end of a
32bit block, map @ulseq such that the base value is in the middle of
a 32bit block. This allows supporting 31 bits before and after the
base value, which is deemed acceptable for the console sequence
number during runtime.

Here is an example to illustrate the previous and new mappings.

For a base value (@rb_next_seq) of 2 2000 0000...

Before this change the range of possible return values was:

1 2000 0001 to 2 2000 0000

__ulseq_to_u64seq(1fff ffff) => 2 1fff ffff
__ulseq_to_u64seq(2000 0000) => 2 2000 0000
__ulseq_to_u64seq(2000 0001) => 1 2000 0001
__ulseq_to_u64seq(9fff ffff) => 1 9fff ffff
__ulseq_to_u64seq(a000 0000) => 1 a000 0000
__ulseq_to_u64seq(a000 0001) => 1 a000 0001

After this change the range of possible return values are:
1 a000 0001 to 2 a000 0000

__ulseq_to_u64seq(1fff ffff) => 2 1fff ffff
__ulseq_to_u64seq(2000 0000) => 2 2000 0000
__ulseq_to_u64seq(2000 0001) => 2 2000 0001
__ulseq_to_u64seq(9fff ffff) => 2 9fff ffff
__ulseq_to_u64seq(a000 0000) => 2 a000 0000
__ulseq_to_u64seq(a000 0001) => 1 a000 0001

[ john.ogness: Rewrite commit message. ]

Reported-by: Francesco Dolcini <francesco@dolcini.it>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202311171611.78d41dbe-oliver.sang@intel.com
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202311161555.3ee16fc9-oliver.sang@intel.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/printk_ringbuffer.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Sebastian Andrzej Siewior Dec. 15, 2023, 9:55 a.m. UTC | #1
On 2023-12-14 22:47:49 [+0106], John Ogness wrote:
> Note: This change only applies to 32bit architectures. On 64bit
>       architectures the macros are NOPs.

You lost my authorship.
But serious matter: Is this bug already present in the current printk
tree or is this code just not used and enabled later? Patch #1 from this
series moves the sequence around so is this problem introduced in the
previous patch and fixed here?

Sebastian
  
John Ogness Dec. 15, 2023, 10:10 a.m. UTC | #2
On 2023-12-15, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> On 2023-12-14 22:47:49 [+0106], John Ogness wrote:
>> Note: This change only applies to 32bit architectures. On 64bit
>>       architectures the macros are NOPs.
>
> You lost my authorship.

OK. Thanks for speaking up. (And thanks for doing the heavy lifting
researching the needed change.)

> But serious matter: Is this bug already present in the current printk
> tree or is this code just not used and enabled later? Patch #1 from
> this series moves the sequence around so is this problem introduced in
> the previous patch and fixed here?

No, it is not a bug at this point. As the commit message writes:

"Until now this mapping has been correct for all call sites."

But if the mapping is not changed here, it will become a bug after
applying patch 3 of this series.

John
  
Sebastian Andrzej Siewior Dec. 15, 2023, 10:58 a.m. UTC | #3
On 2023-12-15 11:16:44 [+0106], John Ogness wrote:
> No, it is not a bug at this point. As the commit message writes:
> 
> "Until now this mapping has been correct for all call sites."
> 
> But if the mapping is not changed here, it will become a bug after
> applying patch 3 of this series.

Okay. Thank you clarifying this.

> John

Sebastian
  
Petr Mladek Jan. 12, 2024, 10:28 a.m. UTC | #4
On Thu 2023-12-14 22:47:49, John Ogness wrote:
> Note: This change only applies to 32bit architectures. On 64bit
>       architectures the macros are NOPs.
> 
> __ulseq_to_u64seq() computes the upper 32 bits of the passed
> argument value (@ulseq). The upper bits are derived from a base
> value (@rb_next_seq) in a way that assumes @ulseq represents a
> 64bit number that is less than or equal to @rb_next_seq.
> 
> Until now this mapping has been correct for all call sites. However,
> in a follow-up commit, values of @ulseq will be passed in that are
> higher than the base value. This requires a change to how the 32bit
> value is mapped to a 64bit sequence number.
> 
> Rather than mapping @ulseq such that the base value is the end of a
> 32bit block, map @ulseq such that the base value is in the middle of
> a 32bit block. This allows supporting 31 bits before and after the
> base value, which is deemed acceptable for the console sequence
> number during runtime.
> 
> Here is an example to illustrate the previous and new mappings.
> 
> For a base value (@rb_next_seq) of 2 2000 0000...
> 
> Before this change the range of possible return values was:
> 
> 1 2000 0001 to 2 2000 0000
> 
> __ulseq_to_u64seq(1fff ffff) => 2 1fff ffff
> __ulseq_to_u64seq(2000 0000) => 2 2000 0000
> __ulseq_to_u64seq(2000 0001) => 1 2000 0001
> __ulseq_to_u64seq(9fff ffff) => 1 9fff ffff
> __ulseq_to_u64seq(a000 0000) => 1 a000 0000
> __ulseq_to_u64seq(a000 0001) => 1 a000 0001
> 
> After this change the range of possible return values are:
> 1 a000 0001 to 2 a000 0000
> 
> __ulseq_to_u64seq(1fff ffff) => 2 1fff ffff
> __ulseq_to_u64seq(2000 0000) => 2 2000 0000
> __ulseq_to_u64seq(2000 0001) => 2 2000 0001
> __ulseq_to_u64seq(9fff ffff) => 2 9fff ffff
> __ulseq_to_u64seq(a000 0000) => 2 a000 0000
> __ulseq_to_u64seq(a000 0001) => 1 a000 0001
> 
> [ john.ogness: Rewrite commit message. ]
> 
> Reported-by: Francesco Dolcini <francesco@dolcini.it>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202311171611.78d41dbe-oliver.sang@intel.com
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202311161555.3ee16fc9-oliver.sang@intel.com
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

Great catch! It must have been complicated to debug this.

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr
  
Petr Mladek Jan. 12, 2024, 6:14 p.m. UTC | #5
On Fri 2024-01-12 11:28:35, Petr Mladek wrote:
> On Thu 2023-12-14 22:47:49, John Ogness wrote:
> > Note: This change only applies to 32bit architectures. On 64bit
> >       architectures the macros are NOPs.
> > 
> > __ulseq_to_u64seq() computes the upper 32 bits of the passed
> > argument value (@ulseq). The upper bits are derived from a base
> > value (@rb_next_seq) in a way that assumes @ulseq represents a
> > 64bit number that is less than or equal to @rb_next_seq.
> > 
> > Until now this mapping has been correct for all call sites. However,
> > in a follow-up commit, values of @ulseq will be passed in that are
> > higher than the base value. This requires a change to how the 32bit
> > value is mapped to a 64bit sequence number.
> > 
> > Rather than mapping @ulseq such that the base value is the end of a
> > 32bit block, map @ulseq such that the base value is in the middle of
> > a 32bit block. This allows supporting 31 bits before and after the
> > base value, which is deemed acceptable for the console sequence
> > number during runtime.
> > 
> > Here is an example to illustrate the previous and new mappings.
> > 
> > For a base value (@rb_next_seq) of 2 2000 0000...
> > 
> > Before this change the range of possible return values was:
> > 
> > 1 2000 0001 to 2 2000 0000
> > 
> > __ulseq_to_u64seq(1fff ffff) => 2 1fff ffff
> > __ulseq_to_u64seq(2000 0000) => 2 2000 0000
> > __ulseq_to_u64seq(2000 0001) => 1 2000 0001
> > __ulseq_to_u64seq(9fff ffff) => 1 9fff ffff
> > __ulseq_to_u64seq(a000 0000) => 1 a000 0000
> > __ulseq_to_u64seq(a000 0001) => 1 a000 0001
> > 
> > After this change the range of possible return values are:
> > 1 a000 0001 to 2 a000 0000
> > 
> > __ulseq_to_u64seq(1fff ffff) => 2 1fff ffff
> > __ulseq_to_u64seq(2000 0000) => 2 2000 0000
> > __ulseq_to_u64seq(2000 0001) => 2 2000 0001
> > __ulseq_to_u64seq(9fff ffff) => 2 9fff ffff
> > __ulseq_to_u64seq(a000 0000) => 2 a000 0000
> > __ulseq_to_u64seq(a000 0001) => 1 a000 0001
> > 
> > [ john.ogness: Rewrite commit message. ]
> > 
> > Reported-by: Francesco Dolcini <francesco@dolcini.it>
> > Reported-by: kernel test robot <oliver.sang@intel.com>
> > Closes: https://lore.kernel.org/oe-lkp/202311171611.78d41dbe-oliver.sang@intel.com
> > Reported-by: kernel test robot <oliver.sang@intel.com>
> > Closes: https://lore.kernel.org/oe-lkp/202311161555.3ee16fc9-oliver.sang@intel.com
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > Signed-off-by: John Ogness <john.ogness@linutronix.de>
> 
> Great catch! It must have been complicated to debug this.
> 
> Reviewed-by: Petr Mladek <pmladek@suse.com>

That said, I am a bit nervous that a bug like this might cause
workqueue stall and panic() the kernel.

At least, this is how I read
https://lore.kernel.org/oe-lkp/202311171611.78d41dbe-oliver.sang@intel.com/

[  125.917650][    C0] workqueue events: flags=0x0
[  125.917979][    C0]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  125.917984][    C0]     pending: deferred_probe_timeout_work_func
[  125.917992][    C0] workqueue events_power_efficient: flags=0x82
[  125.919409][    C0]   pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/256 refcnt=2
[  125.919412][    C0]     pending: do_cache_clean
[  125.919419][    C0] workqueue mm_percpu_wq: flags=0x8
[  125.920619][    C0]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  125.920622][    C0]     pending: vmstat_update
[  125.920632][    C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[  129.568547][    C1] rcu: INFO: rcu_sched self-detected stall on CPU
[  129.568553][    C1] rcu: 	1-....: (26250 ticks this GP) idle=3c24/1/0x40000000 softirq=5858/5934 fqs=13122
[  129.568558][    C1] rcu: 	(t=26251 jiffies g=301 q=598 ncpus=2)
[  129.568562][    C1] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G        W        N 6.6.0-00134-g768c33033e41 #1
[ 129.568565][ C1] EIP: desc_read (kernel/printk/printk_ringbuffer.c:439) 
[ 129.568573][ C1] Code: fb 83 c4 08 5b 5e 5f 5d c3 8d 74 26 00 0f 0b eb d2 3e 8d 74 26 00 55 89 e5 57 89 cf 56 53 89 c3 83 ec 08 8b 08 b8 ff ff ff ff <89> 55 f0 d3 e0 f7 d0 21 d0 6b f0 58 83 f9 1f 0f 87 a7 e6 83 00 03

[  129.568575][    C1] EAX: ffffffff EBX: c1d5cd60 ECX: 0000000f EDX: 3fff8001
[  129.568576][    C1] ESI: 00000001 EDI: c320fd78 EBP: c320fd28 ESP: c320fd14
[  129.568578][    C1] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00000286
[  129.568582][    C1] CR0: 80050033 CR2: b7d77b93 CR3: 022b4000 CR4: 00040690
[  129.568583][    C1] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  129.568584][    C1] DR6: fffe0ff0 DR7: 00000400
[  129.568585][    C1] Call Trace:
[ 129.568589][ C1] ? show_regs (arch/x86/kernel/dumpstack.c:479) 
[ 129.568596][ C1] ? dump_cpu_task (kernel/sched/core.c:11572) 
[ 129.568602][ C1] ? rcu_dump_cpu_stacks (include/linux/cpumask.h:143 include/linux/cpumask.h:150 include/linux/cpumask.h:212 kernel/rcu/tree_stall.h:367) 
[ 129.568606][ C1] ? vprintk (kernel/printk/printk_safe.c:46) 
[ 129.568611][ C1] ? print_cpu_stall (kernel/rcu/tree_stall.h:692) 
[ 129.568616][ C1] ? check_cpu_stall (kernel/rcu/tree_stall.h:775) 
[ 129.568620][ C1] ? rcu_sched_clock_irq (kernel/rcu/tree.c:3869 kernel/rcu/tree.c:2240) 
[ 129.568623][ C1] ? update_process_times (arch/x86/include/asm/preempt.h:27 kernel/time/timer.c:2080) 
[ 129.568628][ C1] ? tick_sched_timer (kernel/time/tick-sched.c:1497) 
[ 129.568632][ C1] ? __hrtimer_run_queues (kernel/time/hrtimer.c:1688 kernel/time/hrtimer.c:1752) 
[ 129.568636][ C1] ? cpumask_weight (arch/x86/kernel/alternative.c:1766) 
[ 129.568640][ C1] ? hrtimer_interrupt (kernel/time/hrtimer.c:1817) 
[ 129.568645][ C1] ? sysvec_call_function_single (arch/x86/kernel/apic/apic.c:1074) 
[ 129.568650][ C1] ? __sysvec_apic_timer_interrupt (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:444 include/linux/jump_label.h:260 include/linux/jump_label.h:270 arch/x86/include/asm/trace/irq_vectors.h:41 arch/x86/kernel/apic/apic.c:1081) 
[ 129.568653][ C1] ? sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1074) 
[ 129.568656][ C1] ? handle_exception (arch/x86/entry/entry_32.S:1049) 
[ 129.568662][ C1] ? sysvec_call_function_single (arch/x86/kernel/apic/apic.c:1074) 
[ 129.568664][ C1] ? desc_read (kernel/printk/printk_ringbuffer.c:439) 
[ 129.568668][ C1] ? sysvec_call_function_single (arch/x86/kernel/apic/apic.c:1074) 
[ 129.568670][ C1] ? desc_read (kernel/printk/printk_ringbuffer.c:439) 
[ 129.568675][ C1] desc_read_finalized_seq (kernel/printk/printk_ringbuffer.c:1907) 
[ 129.568680][ C1] prb_next_reserve_seq (kernel/printk/printk_ringbuffer.c:2084) 
[ 129.568682][ C1] ? nbcon_get_cpu_emergency_nesting (kernel/printk/nbcon.c:984) 
[ 129.568689][ C1] nbcon_atomic_flush_all (kernel/printk/nbcon.c:1133) 
[ 129.568692][ C1] ? nbcon_atomic_flush_all (kernel/printk/nbcon.c:1133) 
[ 129.568695][ C1] printk_trigger_flush (arch/x86/include/asm/preempt.h:85 kernel/printk/printk.c:3993) 
[ 129.568698][ C1] nbcon_cpu_emergency_exit (kernel/printk/nbcon.c:1193) 
[ 129.568701][ C1] ? refcount_warn_saturate (lib/refcount.c:28 (discriminator 3)) 
[ 129.568706][ C1] __warn (kernel/panic.c:692) 
[ 129.568711][ C1] ? refcount_warn_saturate (lib/refcount.c:28 (discriminator 3)) 
[ 129.568714][ C1] report_bug (lib/bug.c:201 lib/bug.c:219) 
[ 129.568720][ C1] ? exc_overflow (arch/x86/kernel/traps.c:250) 
[ 129.568722][ C1] handle_bug (arch/x86/kernel/traps.c:216) 
[ 129.568724][ C1] exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1)) 
[ 129.568727][ C1] handle_exception (arch/x86/entry/entry_32.S:1049) 
[ 129.568729][ C1] EIP: refcount_warn_saturate (lib/refcount.c:28 (discriminator 3)) 
[ 129.568731][ C1] Code: 79 18 c2 01 68 a8 b9 c3 c1 e8 03 7b cf ff 0f 0b 58 c9 c3 8d b6 00 00 00 00 c6 05 86 79 18 c2 01 68 50 b9 c3 c1 e8 e7 7a cf ff <0f> 0b 59 c9 c3 66 90 89 c2 8b 00 3d 00 00 00 c0 74 12 83 f8 01 74

It looks like it caused some loop and refcout overlow or so.
But I might be wrong.

I would like to better understand this and check if we could prevent
it somehow.

Best Regards,
Petr
  
Sebastian Andrzej Siewior Jan. 15, 2024, 8:51 a.m. UTC | #6
On 2024-01-12 19:14:44 [+0100], Petr Mladek wrote:
> 
> That said, I am a bit nervous that a bug like this might cause
> workqueue stall and panic() the kernel.


> At least, this is how I read
> https://lore.kernel.org/oe-lkp/202311171611.78d41dbe-oliver.sang@intel.com/

well, workqueue stalls and RCU as well because the CPU spins. That is a
natural consequence because the CPU makes no progress (at boot). The
panic _might_ be due to panic_on_error or so.
There is no scheduler, nothing so one CPU is blocked and the world ends…

> It looks like it caused some loop and refcout overlow or so.
> But I might be wrong.
> 
> I would like to better understand this and check if we could prevent
> it somehow.

Based on memory: the problem is that the sign extension bug (the fixed
bug) returned the wrong or too low sequence number. So the printk code
tried again to obtain a new sequence number. And got the wrong
again. And this is what looped during boot. 

I'm not sure if this sort of lockup can happen now after the bug is
fixed. I can issue a NMI backtrace on all CPUs (32) without the sync (so
they all can printk immediately and not one after the other) and it
prints and continues…

> Best Regards,
> Petr

Sebastian
  
John Ogness Jan. 15, 2024, 10:52 a.m. UTC | #7
On 2024-01-12, Petr Mladek <pmladek@suse.com> wrote:
>> > Reported-by: Francesco Dolcini <francesco@dolcini.it>
>> > Reported-by: kernel test robot <oliver.sang@intel.com>
>> > Closes: https://lore.kernel.org/oe-lkp/202311171611.78d41dbe-oliver.sang@intel.com
>> > Reported-by: kernel test robot <oliver.sang@intel.com>
>> > Closes: https://lore.kernel.org/oe-lkp/202311161555.3ee16fc9-oliver.sang@intel.com
>> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>> > Signed-off-by: John Ogness <john.ogness@linutronix.de>
>> 
>> Great catch! It must have been complicated to debug this.
>> 
>> Reviewed-by: Petr Mladek <pmladek@suse.com>
>
> That said, I am a bit nervous that a bug like this might cause
> workqueue stall and panic() the kernel.
>
> At least, this is how I read
> https://lore.kernel.org/oe-lkp/202311171611.78d41dbe-oliver.sang@intel.com/

Note that the bug is reported for code that mainline will never
have. This patch "fixes" the problem before it is a problem. From the
perspective of mainline, the problem never existed. Perhaps it is
inappropriate to list these with the Closes tag.

> It looks like it caused some loop and refcout overlow or so.
> But I might be wrong.
>
> I would like to better understand this and check if we could prevent
> it somehow.

The code in question can be found here:

https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/kernel/printk/printk_ringbuffer.c?id=768c33033e41ad195a9304ebb183afb730d5ae98#n2031

The URL above points to prb_next_reserve_seq(), which is in an infinite
loop.

desc_last_finalized_seq() is returning a huge garbage value due to the
32bit sequence value 1 being incorrectly mapped to 0xffffffff00000001
(which is 18446744069414584321). When desc_read_finalized_seq() is
called (line 2082), it obviously returns -EINVAL because there is no
record with this huge sequence number.

The code interrupts -EINVAL to mean that the record has been
overwritten, and so it tries again with an updated last finalized
sequence number => infinite loop.

With this patch applied, the 32bit value 1 is correctly mapped to
sequence number 1 and all is OK.

The problem was that desc_last_finalized_seq() returned a sequence
number that was higher than the highest record. That was the bug. As
long as desc_last_finalized_seq() always returns a sequence number that
exists or used to exist, the retry loop is fine. And since the tail
record is always in the finalized state, there will always be a
finalized record available.

John
  
Petr Mladek Jan. 15, 2024, 4:17 p.m. UTC | #8
On Mon 2024-01-15 11:58:39, John Ogness wrote:
> On 2024-01-12, Petr Mladek <pmladek@suse.com> wrote:
> >> > Reported-by: Francesco Dolcini <francesco@dolcini.it>
> >> > Reported-by: kernel test robot <oliver.sang@intel.com>
> >> > Closes: https://lore.kernel.org/oe-lkp/202311171611.78d41dbe-oliver.sang@intel.com
> >> > Reported-by: kernel test robot <oliver.sang@intel.com>
> >> > Closes: https://lore.kernel.org/oe-lkp/202311161555.3ee16fc9-oliver.sang@intel.com
> >> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> >> > Signed-off-by: John Ogness <john.ogness@linutronix.de>
> >> 
> >> Great catch! It must have been complicated to debug this.
> >> 
> >> Reviewed-by: Petr Mladek <pmladek@suse.com>
> >
> > That said, I am a bit nervous that a bug like this might cause
> > workqueue stall and panic() the kernel.
> >
> > At least, this is how I read
> > https://lore.kernel.org/oe-lkp/202311171611.78d41dbe-oliver.sang@intel.com/
> 
> Note that the bug is reported for code that mainline will never
> have. This patch "fixes" the problem before it is a problem. From the
> perspective of mainline, the problem never existed. Perhaps it is
> inappropriate to list these with the Closes tag.

I think that it is better to keep the reference. It might help
archaeologists to understand the problem ;-)

> > It looks like it caused some loop and refcout overlow or so.
> > But I might be wrong.
> >
> > I would like to better understand this and check if we could prevent
> > it somehow.
> 
> The code in question can be found here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/kernel/printk/printk_ringbuffer.c?id=768c33033e41ad195a9304ebb183afb730d5ae98#n2031
> 
> The URL above points to prb_next_reserve_seq(), which is in an infinite
> loop.
> 
> desc_last_finalized_seq() is returning a huge garbage value due to the
> 32bit sequence value 1 being incorrectly mapped to 0xffffffff00000001
> (which is 18446744069414584321). When desc_read_finalized_seq() is
> called (line 2082), it obviously returns -EINVAL because there is no
> record with this huge sequence number.
> 
> The code interrupts -EINVAL to mean that the record has been
> overwritten, and so it tries again with an updated last finalized
> sequence number => infinite loop.
> 
> With this patch applied, the 32bit value 1 is correctly mapped to
> sequence number 1 and all is OK.
> 
> The problem was that desc_last_finalized_seq() returned a sequence
> number that was higher than the highest record. That was the bug. As
> long as desc_last_finalized_seq() always returns a sequence number that
> exists or used to exist, the retry loop is fine. And since the tail
> record is always in the finalized state, there will always be a
> finalized record available.

Thank you both for explanation. You are right that this was kind
of special.

You know, the code around reading the messages is getting more
and more complex. Also the u32 -> u64 transition is not 100% safe.
So some sanity checks might be useful.

That said, I do not see any obvious trivial one. Just the following
came to my mind. prb_first_seq() is reliable and we could do:

void next_seq_sanity_check(struct printk_ringbuffer *rb, u64 next_seq)
{
	struct prb_desc_ring *desc_ring = &rb->desc_ring;
	u64 first_seq = prb_first_seq(rb);
	u64 desc_count = DESC_COUNT(&rb->desc_ring)).

	WARN_ONCE(seq > first_seq + DESC_COUNT(desc_ring));
}

Well, I am not sure if it is worth it. Also the WARN() won't
be printed on consoles when the reading is broken.

Best Regards,
Petr
  
John Ogness Jan. 15, 2024, 5:08 p.m. UTC | #9
On 2024-01-15, Petr Mladek <pmladek@suse.com> wrote:
> You know, the code around reading the messages is getting more
> and more complex. Also the u32 -> u64 transition is not 100% safe.
> So some sanity checks might be useful.

This complexity only exists for 32bit systems, but yeah, it is still
important.

> That said, I do not see any obvious trivial one. Just the following
> came to my mind. prb_first_seq() is reliable and we could do:
>
> void next_seq_sanity_check(struct printk_ringbuffer *rb, u64 next_seq)
> {
> 	struct prb_desc_ring *desc_ring = &rb->desc_ring;
> 	u64 first_seq = prb_first_seq(rb);
> 	u64 desc_count = DESC_COUNT(&rb->desc_ring)).
>
> 	WARN_ONCE(seq > first_seq + DESC_COUNT(desc_ring));
> }

@seq is allowed to be 2^31 before or after @first_seq. The check would
look more like this:

    WARN_ONCE((rb_first_seq < 0xffffffff80000000 &&
               seq > rb_first_seq + 0x80000000) ||
              (rb_first_seq > 0x80000000 &&
               seq < rb_first_seq - 0x80000000));

> Well, I am not sure if it is worth it. Also the WARN() won't be
> printed on consoles when the reading is broken.

Broken printing is irrelevant. There are plenty of debug methods to get
at the ringbuffer. I am OK with adding this sanity check (again, only
for 32bit, within __ulseq_to_u64seq()).

John
  

Patch

diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h
index b82a96dc2ea2..12f60c782e46 100644
--- a/kernel/printk/printk_ringbuffer.h
+++ b/kernel/printk/printk_ringbuffer.h
@@ -407,7 +407,7 @@  static inline u64 __ulseq_to_u64seq(struct printk_ringbuffer *rb, u32 ulseq)
 	 * Also the access to the ring buffer is always safe.
 	 */
 	rb_next_seq = prb_next_seq(rb);
-	seq = rb_next_seq - ((u32)rb_next_seq - ulseq);
+	seq = rb_next_seq - (s32)((u32)rb_next_seq - ulseq);
 
 	return seq;
 }