[printk,v2,2/5] printk: Add NMI safety to console_flush_on_panic() and console_unblank()
Commit Message
The printk path is NMI safe because it only adds content to the
buffer and then triggers the delayed output via irq_work. If the
console is flushed or unblanked on panic (from NMI context) then it
can deadlock in down_trylock_console_sem() because the semaphore is
not NMI safe.
Avoid taking the console lock when flushing in panic. To prevent
other CPUs from taking the console lock while flushing, have
console_lock() block and console_trylock() fail for non-panic CPUs
during panic.
Skip unblanking in panic if the current context is NMI.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
kernel/printk/printk.c | 77 +++++++++++++++++++++++++++---------------
1 file changed, 49 insertions(+), 28 deletions(-)
Comments
On Mon 2023-07-10 15:51:21, John Ogness wrote:
> The printk path is NMI safe because it only adds content to the
> buffer and then triggers the delayed output via irq_work. If the
> console is flushed or unblanked on panic (from NMI context) then it
> can deadlock in down_trylock_console_sem() because the semaphore is
> not NMI safe.
<thinking loudly>
Just to be sure. The semaphore is not NMI safe because even the
trylock takes an internal spin lock. Am I right, please?
Alternative solution would be to make down_trylock() NMI safe
by using raw_spin_trylock_irqsave() for the internal lock.
But this actually would not solve the whole problem. If the NMI safe
down_trylock() succeeded then up() would need to be called
in NMI as well. And up() really needs to take the spin lock
which might get blocked in the meantime.
Another question is whether we want to call c->unblank()
in NMI even when down_trylock() was NMI safe. It seems that it
is implemented only for struct console vt_console_driver.
I am pretty sure that it takes more internal locks which
are not NMI safe either.
On the other hand, if we would risk calling c->write() then
we might risk calling c->unblank() either.
Finally, it is not only about NMI. Any locks might cause a deadlock
in panic() in any context. It is because other CPUs are stopped
and might block some locks.
</thinking loudly>
In my opinion, we should handle c->unblank() in panic() the same way
as c->write() in panic().
I suggest to create
void __console_unblank(void)
{
struct console *c;
int cookie;
cookie = console_srcu_read_lock();
for_each_console_srcu(c) {
if ((console_srcu_read_flags(c) & CON_ENABLED) && c->unblank)
c->unblank();
}
console_srcu_read_unlock(cookie);
}
and call this in console_flush_on_panic() without the console_lock().
We still need to take the lock during Oops when the system tries
to continue. In this case, the NMI check makes perfect sense.
NMI might cause a deadlock. Other contexts should be safe
because the CPUs are not stopped.
> Avoid taking the console lock when flushing in panic. To prevent
> other CPUs from taking the console lock while flushing, have
> console_lock() block and console_trylock() fail for non-panic CPUs
> during panic.
I really like the trick that console_lock() and console_trylock()
would start failing on non-panic CPUs. It should prevent races
when the other CPUs were not stopped for some reasons.
I am still slightly afraid to do this even before stopping other CPUs.
But I do not have any real scenario where it might be a problem.
And it is only about console_lock() which should preferably be
available for the panic-CPU. Also we should _not_ rely on the other
CPUs during panic() anyway. So, it should be fine after all.
Well, would you mind to split this into two patches?
1st patch would split __console_unblank(), call it from
console_flush_on_panic() after the trylock().
Also it would add the NMI check into the original
console_unblank() which would still be called in
bust_spinlocks().
The commit message should explain the motivation
(primary the internal spinlock in the semaphore
implementation). Also it should explain why only NMI
is a problem when called in the Oops path.
And why the locks are problem in any context
when called in panic() after CPUs were stopped.
2nd patch would prevent taking console_lock on non-panic CPUs.
And it would remove console_trylock()/console_unlock() from
console_flush_on_panic().
The commit message should explain the motivation
(the internal spinlock in the semaphore code).
Also it would solve a problem with a potential
double unlock. And it should mention that it should
not be worse then before when the trylock() result
was ignored.
IMHO, both patches has a potential to cause regressions.
And it is better to do it in smaller steps.
Best Regards,
Petr
On (23/07/11 17:43), Petr Mladek wrote:
> On Mon 2023-07-10 15:51:21, John Ogness wrote:
> > The printk path is NMI safe because it only adds content to the
> > buffer and then triggers the delayed output via irq_work. If the
> > console is flushed or unblanked on panic (from NMI context) then it
> > can deadlock in down_trylock_console_sem() because the semaphore is
> > not NMI safe.
>
> <thinking loudly>
>
> Just to be sure. The semaphore is not NMI safe because even the
> trylock takes an internal spin lock. Am I right, please?
>
> Alternative solution would be to make down_trylock() NMI safe
> by using raw_spin_trylock_irqsave() for the internal lock.
>
> But this actually would not solve the whole problem. If the NMI safe
> down_trylock() succeeded then up() would need to be called
> in NMI as well. And up() really needs to take the spin lock
> which might get blocked in the meantime.
I guess another problem with up() is that it also may call
try_to_wake_up(), that may attempt to acquire a bunch of other
spin_lock-s.
On 2023-07-11, Petr Mladek <pmladek@suse.com> wrote:
> Just to be sure. The semaphore is not NMI safe because even the
> trylock takes an internal spin lock. Am I right, please?
Yes, that is one of the reasons. Sergey mentioned another (waking a task
on up()).
> Alternative solution would be to make down_trylock() NMI safe
> by using raw_spin_trylock_irqsave() for the internal lock.
NMI contexts are only allowed to take raw spinlocks if those spinlocks
are only used from NMI context. Otherwise you could have deadlock:
raw_spin_lock()
--- NMI ---
raw_spin_lock()
Using a trylock does not avoid the deadlock danger.
> Another question is whether we want to call c->unblank()
> in NMI even when down_trylock() was NMI safe. It seems that it
> is implemented only for struct console vt_console_driver.
> I am pretty sure that it takes more internal locks which
> are not NMI safe either.
Yes, it does. As an example, it calls mod_timer(), which is also not NMI
safe. Clearly the unblank() callback must not be called in NMI context.
> Finally, it is not only about NMI. Any locks might cause a deadlock
> in panic() in any context. It is because other CPUs are stopped
> and might block some locks.
With the atomic/threaded model this is not true. The port ownership can
be safely taken over from stopped CPUs.
> In my opinion, we should handle c->unblank() in panic() the same way
> as c->write() in panic().
I do not agree. Clearly unblank() is not NMI safe. Also, in current
mainline code, console_unblank() will already give up if the trylock
failed (rather than ignoring the lock, like write() does). So
console_unblank() might as well also give up if in NMI context.
John
On Wed 2023-07-12 23:17:49, John Ogness wrote:
> On 2023-07-11, Petr Mladek <pmladek@suse.com> wrote:
> > Just to be sure. The semaphore is not NMI safe because even the
> > trylock takes an internal spin lock. Am I right, please?
>
> Yes, that is one of the reasons. Sergey mentioned another (waking a task
> on up()).
I see.
> > Alternative solution would be to make down_trylock() NMI safe
> > by using raw_spin_trylock_irqsave() for the internal lock.
>
> NMI contexts are only allowed to take raw spinlocks if those spinlocks
> are only used from NMI context. Otherwise you could have deadlock:
>
> raw_spin_lock()
> --- NMI ---
> raw_spin_lock()
>
> Using a trylock does not avoid the deadlock danger.
>
> > Another question is whether we want to call c->unblank()
> > in NMI even when down_trylock() was NMI safe. It seems that it
> > is implemented only for struct console vt_console_driver.
> > I am pretty sure that it takes more internal locks which
> > are not NMI safe either.
>
> Yes, it does. As an example, it calls mod_timer(), which is also not NMI
> safe. Clearly the unblank() callback must not be called in NMI context.
>
> > Finally, it is not only about NMI. Any locks might cause a deadlock
> > in panic() in any context. It is because other CPUs are stopped
> > and might block some locks.
>
> With the atomic/threaded model this is not true. The port ownership can
> be safely taken over from stopped CPUs.
Right. But it would mean using the special lock also in c->unblank()
code. And it is only tty console which is one of the most complicated
consoles.
> > In my opinion, we should handle c->unblank() in panic() the same way
> > as c->write() in panic().
>
> I do not agree. Clearly unblank() is not NMI safe. Also, in current
> mainline code, console_unblank() will already give up if the trylock
> failed (rather than ignoring the lock, like write() does). So
> console_unblank() might as well also give up if in NMI context.
You are right.
OK, could we at least improve the commit message, please?
Something like:
<proposal>
console_sem() is not NMI safe even when using down_trylock(). Both
down_trylock() and up() are using an interal spinlock. up() might even
call wake_up_process() with another locks.
It is even worse in the panic() code path where the locks might be blocked
by stopped CPUs.
The sepaphore is used in two code paths, in console_unblank() and when
flushing console messages. On the low level, it is needed when calling
c->unblank() and c->write() callbacks.
Both code paths are not safe in panic() but they are handled differently.
c->unblank() is called only when console_trylock() succeeded. c->write()
is called in console_flush_on_panic() even when console_trylock().
The risk of a deadlock in c->write() callbacks is reduced by using trylock()
for the internal locks when oops_in_progess variable is set.
Reduce the risk of deadlocks caused the console semapthore by:
+ bailing out from console_unblank() in NMI
+ not taking the console_sem() in console_flush_on_panic() at all
Simple removal of console_trylock() in console_flush_on_panic() would
cause that other CPUs might still be able to take it and race.
The problem is avoided by checking panic_in_progress() in console_lock()
and console_trylock(). They will never succeed on non-panic CPUs.
The change is a preparation step for introducing printk kthreads and
atomic console write callbacks. It would make the panic() code path
completely safe for consoles without c->unblank() callback.
</proposal>
Wait, the last paragraph is not true. console_trylock() is still
called in console_unblank() in non-NMI context. But the lock
might be blocked by a stopped CPU.
It can be solved by checking whether there is any registered console
with c->unblank() callback first. As a result, console_trylock()
would be called only when a tty console is registered. The panic() path
really might be completely safe where only safe consoles are
registered.
It would make sense to do separate patch for console_unblank()
and console_flush_on_panic().
Of course, we might also improve console_unblank() later. But I
would still like to improve the commit message. You know, the original
commit title and message is talking about NMI. But the patch
has effects even in non-NMI context.
Best Regards,
Petr
On (23/07/13 16:43), Petr Mladek wrote:
>
> Simple removal of console_trylock() in console_flush_on_panic() would
> cause that other CPUs might still be able to take it and race.
> The problem is avoided by checking panic_in_progress() in console_lock()
> and console_trylock(). They will never succeed on non-panic CPUs.
>
In theory, we also can have non-panic CPU in console_flush_all(),
which should let panic CPU to take over the next time it checks
abandon_console_lock_in_panic() (other_cpu_in_panic() after 5/5),
but it may not happen immediately. I wonder if we somehow can/want
to "wait" in console_flush_on_panic() for non-panic CPU handover?
On Fri 2023-07-14 13:00:49, Sergey Senozhatsky wrote:
> On (23/07/13 16:43), Petr Mladek wrote:
> >
> > Simple removal of console_trylock() in console_flush_on_panic() would
> > cause that other CPUs might still be able to take it and race.
> > The problem is avoided by checking panic_in_progress() in console_lock()
> > and console_trylock(). They will never succeed on non-panic CPUs.
> >
>
> In theory, we also can have non-panic CPU in console_flush_all(),
> which should let panic CPU to take over the next time it checks
> abandon_console_lock_in_panic() (other_cpu_in_panic() after 5/5),
> but it may not happen immediately. I wonder if we somehow can/want
> to "wait" in console_flush_on_panic() for non-panic CPU handover?
Good point. It might actually be any console_lock() owner,
including printk() on other CPU.
I think that we might need to add some wait() as we did in the last
attempt, see the commit b87f02307d3cfbda76852 ("printk: Wait for
the global console lock when the system is going down").
Anyway, it will be more important after introducing the kthreads.
There is a non-trivial chance that they would block the lock.
They might be busy when handling a message printed right before
the panic() was called. At least, this is what I saw in the last
attempt to introduce the kthreads.
But maybe, it will be somehow hidden in the new atomic lock.
It might be passed to a printk context with a higher priority
and it uses some wait internally, see the waiting in the patch
https://lore.kernel.org/all/20230302195618.156940-7-john.ogness@linutronix.de/
Anyway, this patch does not make it worse. It just ignores the
potential console_lock owner in console_flush_on_panic() another
way.
Best Regards,
Petr
@@ -2583,6 +2583,25 @@ static int console_cpu_notify(unsigned int cpu)
return 0;
}
+/*
+ * Return true when this CPU should unlock console_sem without pushing all
+ * messages to the console. This reduces the chance that the console is
+ * locked when the panic CPU tries to use it.
+ */
+static bool abandon_console_lock_in_panic(void)
+{
+ if (!panic_in_progress())
+ return false;
+
+ /*
+ * We can use raw_smp_processor_id() here because it is impossible for
+ * the task to be migrated to the panic_cpu, or away from it. If
+ * panic_cpu has already been set, and we're not currently executing on
+ * that CPU, then we never will be.
+ */
+ return atomic_read(&panic_cpu) != raw_smp_processor_id();
+}
+
/**
* console_lock - block the console subsystem from printing
*
@@ -2595,6 +2614,10 @@ void console_lock(void)
{
might_sleep();
+ /* On panic, the console_lock must be left to the panic cpu. */
+ while (abandon_console_lock_in_panic())
+ msleep(1000);
+
down_console_sem();
if (console_suspended)
return;
@@ -2613,6 +2636,9 @@ EXPORT_SYMBOL(console_lock);
*/
int console_trylock(void)
{
+ /* On panic, the console_lock must be left to the panic cpu. */
+ if (abandon_console_lock_in_panic())
+ return 0;
if (down_trylock_console_sem())
return 0;
if (console_suspended) {
@@ -2631,25 +2657,6 @@ int is_console_locked(void)
}
EXPORT_SYMBOL(is_console_locked);
-/*
- * Return true when this CPU should unlock console_sem without pushing all
- * messages to the console. This reduces the chance that the console is
- * locked when the panic CPU tries to use it.
- */
-static bool abandon_console_lock_in_panic(void)
-{
- if (!panic_in_progress())
- return false;
-
- /*
- * We can use raw_smp_processor_id() here because it is impossible for
- * the task to be migrated to the panic_cpu, or away from it. If
- * panic_cpu has already been set, and we're not currently executing on
- * that CPU, then we never will be.
- */
- return atomic_read(&panic_cpu) != raw_smp_processor_id();
-}
-
/*
* Check if the given console is currently capable and allowed to print
* records.
@@ -3054,6 +3061,10 @@ void console_unblank(void)
* In that case, attempt a trylock as best-effort.
*/
if (oops_in_progress) {
+ /* Semaphores are not NMI-safe. */
+ if (in_nmi())
+ return;
+
if (down_trylock_console_sem() != 0)
return;
} else
@@ -3083,14 +3094,24 @@ void console_unblank(void)
*/
void console_flush_on_panic(enum con_flush_mode mode)
{
+ bool handover;
+ u64 next_seq;
+
/*
- * If someone else is holding the console lock, trylock will fail
- * and may_schedule may be set. Ignore and proceed to unlock so
- * that messages are flushed out. As this can be called from any
- * context and we don't want to get preempted while flushing,
- * ensure may_schedule is cleared.
+ * Ignore the console lock and flush out the messages. Attempting a
+ * trylock would not be useful because:
+ *
+ * - if it is contended, it must be ignored anyway
+ * - console_lock() and console_trylock() block and fail
+ * respectively in panic for non-panic CPUs
+ * - semaphores are not NMI-safe
+ */
+
+ /*
+ * If another context is holding the console lock,
+ * @console_may_schedule might be set. Clear it so that
+ * this context does not call cond_resched() while flushing.
*/
- console_trylock();
console_may_schedule = 0;
if (mode == CONSOLE_REPLAY_ALL) {
@@ -3103,15 +3124,15 @@ void console_flush_on_panic(enum con_flush_mode mode)
cookie = console_srcu_read_lock();
for_each_console_srcu(c) {
/*
- * If the above console_trylock() failed, this is an
- * unsynchronized assignment. But in that case, the
+ * This is an unsynchronized assignment, but the
* kernel is in "hope and pray" mode anyway.
*/
c->seq = seq;
}
console_srcu_read_unlock(cookie);
}
- console_unlock();
+
+ console_flush_all(false, &next_seq, &handover);
}
/*