[2/4] watchdog/softlockup: Use printk_cpu_sync_get_irqsave() to serialize reporting

Message ID 20231220131534.2.Ia5906525d440d8e8383cde31b7c61c2aadc8f907@changeid
State New
Series watchdog: Better handling of concurrent lockups

Commit Message

Doug Anderson Dec. 20, 2023, 9:15 p.m. UTC
Instead of introducing a spinlock, use printk_cpu_sync_get_irqsave()
and printk_cpu_sync_put_irqrestore() to serialize softlockup
reporting. On its own this has no real advantage over the
spinlock, but it will allow us to use the same function in a future
change to also serialize hardlockup crawls.

NOTE: for the most part this serialization is important because we
often end up in the show_regs() path, which has no built-in
serialization if there are multiple callers at once. However, even
when we end up in the dump_stack() path, this still has an
advantage: the stack is guaranteed to appear in the logs together
with the lockup message, with no interleaving.
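To make the interleaving point concrete, here is a userspace sketch, not kernel code. Two pthreads stand in for two CPUs that each emit a multi-line lockup report (a header line plus "stack" lines). `log_slot_lock` models the assumption that each individual printk line is emitted whole, and `report_lock` stands in for printk_cpu_sync_get_irqsave()/printk_cpu_sync_put_irqrestore(). All names here are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define LINES_PER_REPORT 4   /* header line plus three "stack" lines */
#define NUM_REPORTERS 2      /* hypothetical CPUs reporting at once */
#define LOG_CAP (LINES_PER_REPORT * NUM_REPORTERS)

/* Each log slot records which "CPU" emitted that line. */
static int log_cpu[LOG_CAP];
static int log_len;

/* Models printk emitting each individual line whole. */
static pthread_mutex_t log_slot_lock = PTHREAD_MUTEX_INITIALIZER;
/* Stand-in for printk_cpu_sync_get_irqsave()/_put_irqrestore(). */
static pthread_mutex_t report_lock = PTHREAD_MUTEX_INITIALIZER;

static void emit_line(int cpu)
{
	pthread_mutex_lock(&log_slot_lock);
	log_cpu[log_len++] = cpu;
	pthread_mutex_unlock(&log_slot_lock);
}

static void *report_lockup(void *arg)
{
	int cpu = *(const int *)arg;

	/* Hold the report lock across the whole multi-line report. */
	pthread_mutex_lock(&report_lock);
	for (int i = 0; i < LINES_PER_REPORT; i++)
		emit_line(cpu);
	pthread_mutex_unlock(&report_lock);
	return NULL;
}

static void run_reporters(void)
{
	pthread_t tid[NUM_REPORTERS];
	int cpu[NUM_REPORTERS];

	for (int i = 0; i < NUM_REPORTERS; i++) {
		cpu[i] = i;
		int rc = pthread_create(&tid[i], NULL, report_lockup, &cpu[i]);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < NUM_REPORTERS; i++)
		pthread_join(tid[i], NULL);
}

/* True if every report occupies one contiguous block of the log. */
static bool reports_contiguous(void)
{
	if (log_len % LINES_PER_REPORT != 0)
		return false;
	for (int start = 0; start < log_len; start += LINES_PER_REPORT)
		for (int i = 1; i < LINES_PER_REPORT; i++)
			if (log_cpu[start + i] != log_cpu[start])
				return false;
	return true;
}
```

After run_reporters(), reports_contiguous() holds because each report is emitted entirely under `report_lock`. With only the per-line lock, the lines themselves would still be whole, but lines from the two reports could interleave.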

NOTE: the fact that printk_cpu_sync_get_irqsave() may be called
multiple times on the same CPU is important here. Specifically, we
hold the "lock" while calling dump_stack(), which also takes the same
"lock". This is explicitly documented to be OK and means we don't need
to introduce a variant of dump_stack() that doesn't grab the lock.
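The nesting rule can be modeled with an owner field plus a nesting counter, which is, as far as we understand it, the general approach printk takes internally. The userspace sketch below illustrates the idea under stated assumptions and is not the kernel implementation: CPU IDs are passed in explicitly instead of coming from smp_processor_id(), no interrupts are disabled, and `sync_try_get()`, `sync_put()`, `model_dump_stack()` and `model_report_softlockup()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* -1 means unowned; otherwise the ID of the owning CPU. */
static atomic_int sync_owner = -1;
/* Acquisitions by the owner beyond the first one. */
static atomic_int sync_nested;

static bool sync_try_get(int cpu)
{
	int old = -1;

	if (atomic_compare_exchange_strong(&sync_owner, &old, cpu))
		return true;                       /* was free: now ours */
	if (old == cpu) {
		atomic_fetch_add(&sync_nested, 1); /* same CPU again: nest */
		return true;
	}
	return false;                              /* another CPU owns it */
}

static void sync_put(int cpu)
{
	assert(atomic_load(&sync_owner) == cpu);   /* catches an unbalanced put */
	(void)cpu;
	if (atomic_load(&sync_nested) > 0) {
		atomic_fetch_sub(&sync_nested, 1); /* inner put: still owned */
		return;
	}
	atomic_store(&sync_owner, -1);             /* outermost put: release */
}

/* Hypothetical stand-in for dump_stack(), which takes the lock itself. */
static void model_dump_stack(int cpu)
{
	while (!sync_try_get(cpu))
		;                                  /* spin until available */
	/* ... print the stack trace here ... */
	sync_put(cpu);
}

/* Hypothetical stand-in for the softlockup report in watchdog_timer_fn(). */
static void model_report_softlockup(int cpu)
{
	while (!sync_try_get(cpu))
		;
	/* ... print the "BUG: soft lockup" line here ... */
	model_dump_stack(cpu);                     /* nests instead of deadlocking */
	sync_put(cpu);
}
```

A second acquisition by the owning CPU only bumps the counter, so the inner call from the dump_stack() stand-in succeeds immediately, while any other CPU still sees the lock as taken until the outermost put.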

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 kernel/watchdog.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
  

Comments

lizhe.67@bytedance.com Dec. 22, 2023, 7:13 a.m. UTC | #1
On Wed, 20 Dec 2023 13:15:35 -0800, dianders@chromium.org wrote: 

>diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>index b4fd2f12137f..526041a1100a 100644
>--- a/kernel/watchdog.c
>+++ b/kernel/watchdog.c
>@@ -454,7 +454,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> 	struct pt_regs *regs = get_irq_regs();
> 	int duration;
> 	int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;
>-	static DEFINE_SPINLOCK(watchdog_output_lock);
>+	unsigned long flags;
> 
> 	if (!watchdog_enabled)
> 		return HRTIMER_NORESTART;
>@@ -521,7 +521,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> 		/* Start period for the next softlockup warning. */
> 		update_report_ts();
> 
>-		spin_lock(&watchdog_output_lock);
>+		printk_cpu_sync_get_irqsave(flags);
> 		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
> 			smp_processor_id(), duration,
> 			current->comm, task_pid_nr(current));
>@@ -531,7 +531,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> 			show_regs(regs);
> 		else
> 			dump_stack();
>-		spin_unlock(&watchdog_output_lock);
>+		printk_cpu_sync_put_irqrestore(flags);
> 
> 		if (softlockup_all_cpu_backtrace) {
> 			trigger_allbutcpu_cpu_backtrace(smp_processor_id());
>-- 

Reviewed-by: Li Zhe <lizhe.67@bytedance.com>
  
John Ogness Dec. 22, 2023, 9:30 a.m. UTC | #2
On 2023-12-20, Douglas Anderson <dianders@chromium.org> wrote:
> Instead of introducing a spinlock, use printk_cpu_sync_get_irqsave()
> and printk_cpu_sync_put_irqrestore() to serialize softlockup
> reporting. Alone this doesn't have any real advantage over the
> spinlock, but this will allow us to use the same function in a future
> change to also serialize hardlockup crawls.

Thanks for this change. For me, this is the preferred workaround to
best-effort serialize a particular type of output. Hopefully one day we
will get to implementing printk contexts [0] [1] so that message blocks
can be inserted atomically.

> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Reviewed-by: John Ogness <john.ogness@linutronix.de>

[0] https://lore.kernel.org/lkml/1299043680.4208.97.camel@Joe-Laptop
[1] https://lore.kernel.org/lkml/b17fc8afc8984fedb852921366190104@AcuMS.aculab.com
  
Petr Mladek Feb. 6, 2024, 10:21 a.m. UTC | #3
On Fri 2023-12-22 10:36:37, John Ogness wrote:
> On 2023-12-20, Douglas Anderson <dianders@chromium.org> wrote:
> > Instead of introducing a spinlock, use printk_cpu_sync_get_irqsave()
> > and printk_cpu_sync_put_irqrestore() to serialize softlockup
> > reporting. Alone this doesn't have any real advantage over the
> > spinlock, but this will allow us to use the same function in a future
> > change to also serialize hardlockup crawls.
> 
> Thanks for this change. For me, this is the preferred workaround to
> best-effort serialize a particular type of output.

I agree.

The good thing is that dump_stack_lvl() and nmi_cpu_backtrace()
use this lock on their own. Also, nmi_trigger_cpumask_backtrace()
prevents parallel calls. This means that the individual backtraces
should be serialized for most callers.
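The "prevents parallel calls" behavior amounts to a single-flight guard: a request that arrives while another is in flight is dropped rather than queued. As far as we know, nmi_trigger_cpumask_backtrace() does this with test_and_set_bit() on a static flag. The userspace sketch below models the same idea with a C11 atomic_flag; `trigger_backtrace_once()` and `demo_dump_all()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Set while a backtrace request is in flight. */
static atomic_flag backtrace_in_progress = ATOMIC_FLAG_INIT;

/* Runs dump_all() unless another request is already in flight, in
 * which case it returns false right away instead of queueing. */
static bool trigger_backtrace_once(void (*dump_all)(void))
{
	assert(dump_all != NULL);
	if (atomic_flag_test_and_set(&backtrace_in_progress))
		return false;                 /* someone else is dumping */
	dump_all();
	atomic_flag_clear(&backtrace_in_progress);
	return true;
}

static int dumps_done;
static bool overlapping_call_result = true;

/* Hypothetical dump callback: simulates a second caller arriving while
 * the first request is still in flight. */
static void demo_dump_all(void)
{
	dumps_done++;
	overlapping_call_result = trigger_backtrace_once(demo_dump_all);
}
```

The overlapping request is rejected, so only one set of backtraces is produced per in-flight request; once the flag is cleared, the next request goes through normally.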

> Hopefully one day we
> will get to implementing printk contexts [0] [1] so that message blocks
> can be inserted atomically.

I didn't think about this possibility. You are right. It might be even
better than printk_cpu_sync_get_irqsave()/printk_cpu_sync_put_irqrestore()
because it allows passing the lock to a higher-priority context and
supports a timeout.


Best Regards,
Petr
  

Patch

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index b4fd2f12137f..526041a1100a 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -454,7 +454,7 @@  static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 	struct pt_regs *regs = get_irq_regs();
 	int duration;
 	int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;
-	static DEFINE_SPINLOCK(watchdog_output_lock);
+	unsigned long flags;
 
 	if (!watchdog_enabled)
 		return HRTIMER_NORESTART;
@@ -521,7 +521,7 @@  static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 		/* Start period for the next softlockup warning. */
 		update_report_ts();
 
-		spin_lock(&watchdog_output_lock);
+		printk_cpu_sync_get_irqsave(flags);
 		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
 			smp_processor_id(), duration,
 			current->comm, task_pid_nr(current));
@@ -531,7 +531,7 @@  static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 			show_regs(regs);
 		else
 			dump_stack();
-		spin_unlock(&watchdog_output_lock);
+		printk_cpu_sync_put_irqrestore(flags);
 
 		if (softlockup_all_cpu_backtrace) {
 			trigger_allbutcpu_cpu_backtrace(smp_processor_id());
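In the hunks above, the static spinlock becomes a plain `unsigned long flags` local because printk_cpu_sync_get_irqsave() and printk_cpu_sync_put_irqrestore() are, to our understanding, macros that take `flags` by name in the style of local_irq_save(). They save the interrupt state, keep interrupts disabled while the lock is held, and restore the saved state afterwards. The userspace sketch below models only that shape. The `model_*` names and `model_irqs_enabled` are hypothetical, nothing here masks real interrupts, and the lock is a simple non-nesting flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the local CPU's interrupt-enable state. */
static bool model_irqs_enabled = true;

/* Hypothetical stand-ins for local_irq_save()/local_irq_restore().
 * Like the kernel macros, they take 'flags' by name, not by pointer. */
#define model_irq_save(flags) \
	do { (flags) = model_irqs_enabled; model_irqs_enabled = false; } while (0)
#define model_irq_restore(flags) \
	do { model_irqs_enabled = (flags) != 0; } while (0)

/* Simplified lock: 0 = free, 1 = held. No nesting support here. */
static atomic_int model_sync_held;

static bool model_sync_try_get(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&model_sync_held, &expected, 1);
}

static void model_sync_put(void)
{
	assert(atomic_load(&model_sync_held) == 1); /* put without get */
	atomic_store(&model_sync_held, 0);
}

/* Interrupts stay disabled only while the lock is held; if the lock is
 * busy, the saved state is restored before retrying. */
#define model_sync_get_irqsave(flags)                     \
	for (;;) {                                        \
		model_irq_save(flags);                    \
		if (model_sync_try_get())                 \
			break;                            \
		model_irq_restore(flags);                 \
		/* the kernel waits for the owner here */ \
	}

#define model_sync_put_irqrestore(flags)  \
	do {                              \
		model_sync_put();         \
		model_irq_restore(flags); \
	} while (0)
```

Restoring the saved value, rather than unconditionally re-enabling interrupts, means a caller that already had interrupts disabled (as the hrtimer callback does) gets them back disabled after the put.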