printk: save loglevel before console_verbose()

Message ID 20240216005128.1045812-1-stephen.s.brennan@oracle.com
State New
Headers
Series printk: save loglevel before console_verbose() |

Commit Message

Stephen Brennan Feb. 16, 2024, 12:51 a.m. UTC
  In panic() cases, console_verbose() is called, clobbering the
console_loglevel value. If the bug which led to the panic related to
printk, it can be useful to know the log level prior to the
console_verbose() call. Save it to allow debugging these cases.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 kernel/printk/printk.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
  

Comments

John Ogness Feb. 16, 2024, 9:22 a.m. UTC | #1
On 2024-02-15, Stephen Brennan <stephen.s.brennan@oracle.com> wrote:
> In panic() cases, console_verbose() is called, clobbering the
> console_loglevel value. If the bug which led to the panic related to
> printk, it can be useful to know the log level prior to the
> console_verbose() call.

I've done a *lot* of printk debugging over the past 6 years and have
never had a case where this would be useful (or even interesting).

I assume there is some rare and particular scenario you are trying to
debug. And once you've debugged it, it is no longer useful for you
either.

IMHO this does not warrant adding an extra global variable for all Linux
users.

When Petr gets back from vacation, maybe he will have a different
opinion.

John Ogness
  
Stephen Brennan Feb. 16, 2024, 4:02 p.m. UTC | #2
On 2/16/24 01:22, John Ogness wrote:
> On 2024-02-15, Stephen Brennan <stephen.s.brennan@oracle.com> wrote:
>> In panic() cases, console_verbose() is called, clobbering the
>> console_loglevel value. If the bug which led to the panic related to
>> printk, it can be useful to know the log level prior to the
>> console_verbose() call.
> 
> I've done a *lot* of printk debugging over the past 6 years and have
> never had a case where this would be useful (or even interesting).

Hi John,

That's fair! The point of sending it upstream is seeing if anybody else
uses this information or if I'm the only one :)

> I assume there is some rare and particular scenario you are trying to
> debug. And once you've debugged it, it is no longer useful for you
> either.
>> IMHO this does not warrant adding an extra global variable for all Linux
> users.

I've been seeing bugs (to be fair, on older kernels without the latest
printk/nbcon work, which resolves much of this) caused by excessive
printk logging & slow serial consoles. In some of these cases, the
loglevel was set low at boot but modified by an application, so it has
been nice to know what the _actual_ loglevel was at the time of the
crash, which I can use with the console baud rates and the log buffer to
get an idea for how backlogged the console was at a point in time.

But of course, the console sequence numbers can tell us how backlogged a
console is at the time of crash, and you can infer the log level to some
extent from that. So I can see this not being as valuable generally as
it is for my use case.

Thanks,
Stephen

> When Petr gets back from vacation, maybe he will have a different
> opinion.
> 
> John Ogness
  
Petr Mladek Feb. 27, 2024, 4:39 p.m. UTC | #3
On Fri 2024-02-16 08:02:20, Stephen Brennan wrote:
> On 2/16/24 01:22, John Ogness wrote:
> > On 2024-02-15, Stephen Brennan <stephen.s.brennan@oracle.com> wrote:
> >> In panic() cases, console_verbose() is called, clobbering the
> >> console_loglevel value. If the bug which led to the panic related to
> >> printk, it can be useful to know the log level prior to the
> >> console_verbose() call.
> > 
> > I've done a *lot* of printk debugging over the past 6 years and have
> > never had a case where this would be useful (or even interesting).
> 
> Hi John,
> 
> That's fair! The point of sending it upstream is seeing if anybody else
> uses this information or if I'm the only one :)
> 
> > I assume there is some rare and particular scenario you are trying to
> > debug. And once you've debugged it, it is no longer useful for you
> > either.
> >> IMHO this does not warrant adding an extra global variable for all Linux
> > users.
> 
> I've been seeing bugs (to be fair, on older kernels without the latest
> printk/nbcon work, which resolves much of this) caused by excessive
> printk logging & slow serial consoles.

Yeah, printk() might block a non-preemptive context for a long time.
The current owner is responsible for flushing all messages which
might be added by other CPUs in parallel. There is some logic to
handover the console_lock() but it is not reliable.

And yes, these problems should be solved by the printk kthreads.
Even though, they would not help in panic() when the messages
must be flushed directly.

Well, there is a patch in linux-next which would block printk()
on non-panic CPUs. It would help with some problems as well.

> In some of these cases, the
> loglevel was set low at boot but modified by an application, so it has
> been nice to know what the _actual_ loglevel was at the time of the
> crash, which I can use with the console baud rates and the log buffer to
> get an idea for how backlogged the console was at a point in time.

I think that systemd or an log daemon might be configured to
modify the console_loglevel set by the kernel during boot.
They do so during boot when the userspace services are
being started.

The value might be read at runtime either by:

  $> cat /proc/sys/kernel/printk
  4       4       1       7

or by:

  $> sysctl kernel.printk
  kernel.printk = 4       4       1       7

It is a quite confusing interface. The console_loglevel
is the first number.

I doubt that any userspace application changes the loglevel
later when the system is up and running.

> But of course, the console sequence numbers can tell us how backlogged a
> console is at the time of crash, and you can infer the log level to some
> extent from that. So I can see this not being as valuable generally as
> it is for my use case.

Setting the right console_loglevel is problematic. It helps to remove
problems with too many "normal" messages, for example, during boot.
But people usually want to see warnings, errors, for example,
from rcu or softlockup watchdogs. The sad thing is when printk()
and slow consoles make the stall situation even worse. Let's
see how the printk kthreads would help.

Finally, I agree with John that this patch is not worth adding.
It does not add much value for most users.

Best Regards,
Petr
  

Patch

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index f2444b581e16..97bd9427f118 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2521,11 +2521,19 @@  MODULE_PARM_DESC(console_suspend, "suspend console during suspend"
 	" and hibernate operations");
 
 static bool printk_console_no_auto_verbose;
+int saved_console_loglevel;
 
 void console_verbose(void)
 {
-	if (console_loglevel && !printk_console_no_auto_verbose)
+	if (console_loglevel && !printk_console_no_auto_verbose) {
+		/*
+		 * Save the prior log level for later debugging. Avoid
+		 * clobbering in case of multiple console_verbose() calls.
+		 */
+		if (console_loglevel != CONSOLE_LOGLEVEL_MOTORMOUTH)
+			saved_console_loglevel = console_loglevel;
 		console_loglevel = CONSOLE_LOGLEVEL_MOTORMOUTH;
+	}
 }
 EXPORT_SYMBOL_GPL(console_verbose);