Message ID | 20230302195618.156940-3-john.ogness@linutronix.de |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp41975wrd; Thu, 2 Mar 2023 11:59:41 -0800 (PST) X-Google-Smtp-Source: AK7set9rqKHN+fAI1Jl7zW0MSGTqNxQIIB0aXGop7uAsrMUtkB2gsMHYmCGdfa7HxOpuseMA4Rbl X-Received: by 2002:a05:6a20:1aaa:b0:cc:fbdc:886 with SMTP id ci42-20020a056a201aaa00b000ccfbdc0886mr12621518pzb.18.1677787181049; Thu, 02 Mar 2023 11:59:41 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1677787181; cv=none; d=google.com; s=arc-20160816; b=PGYH7Y8QMsG/8WY1tUJvZ22QLLd3JupH3l6O5FPRKlbCCzagMziwvoRo62PGkKOV+k bhn9d+zqBws1XlBxIw8JDw35BpXNfeK+VleS5/EkHpNqANf+zC8Mrg5MiTZWiSlrMw8q 3f6gNtSoK4n+vqJwd0yO86msyxmrV7GtD6LTQdZw8R63cyOvHn2TXNqUwoyg786LiBS0 wzEegVzk96lzVztI+bK+TnEz9sRtznY9Wp3WZ3GEPm+IGJjCsFX7+wHVecjeZksnt8mu jWyfnbaZ+jiy3uOSs7WD5nbJwT9Q5IaTOmIUW/6yZj8JeD8X+EEleC+LziM980X0oiZD qM0Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:dkim-signature :dkim-signature:from; bh=+RiY4OM6UJRNz/WSBX/5KTZe1S/q7JPH46lONkiG6+g=; b=W38OsAXHsn7VVy5WwAyicOR81D6XhCcs+hd+lVwT8sYsvpiHc7SlGPIm6/g2Pqg0iT 8IM8cT/S4EBV3fZAB4UcdMXYTob/R+CH5w3IfK+UXWuOx5j63KKeidysVxPC7o/C+r0/ Ty8x+EourZ9nYPzRtMYCj0IKQUskY64xBzM/92mVyw+LCTD60bqdEXkYcCVp2uIeqG3k 16lQx/KJfh/sYl9hMmYjjMS8UYJvcaaMOJKuiel1e80GxlYkWDOk8zy1K+OOWZ+lyCls sYSfX4TtjP2Zw2zvhgvQ647H6/5usSdUaEcXv02rm1WWTDCLnxhZuXEAJdBGO/HMvovx LusA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b="Y//5m69s"; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e header.b=5mkIXmg8; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id y9-20020aa79e09000000b005a91615aea7si200363pfq.215.2023.03.02.11.59.28; Thu, 02 Mar 2023 11:59:41 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b="Y//5m69s"; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e header.b=5mkIXmg8; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229955AbjCBT57 (ORCPT <rfc822;davidbtadokoro@gmail.com> + 99 others); Thu, 2 Mar 2023 14:57:59 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54792 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229779AbjCBT5p (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 2 Mar 2023 14:57:45 -0500 Received: from galois.linutronix.de (Galois.linutronix.de [IPv6:2a0a:51c0:0:12e:550::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 61787474D7 for <linux-kernel@vger.kernel.org>; Thu, 2 Mar 2023 11:57:44 -0800 (PST) From: John Ogness <john.ogness@linutronix.de> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1677787062; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=+RiY4OM6UJRNz/WSBX/5KTZe1S/q7JPH46lONkiG6+g=; b=Y//5m69smqt/3rvjKkJp+AlCiSiSrJ2hIBzpHRCpeJXg/jOSGUYgrqzG6OPRuopLBUgI93 drFPLWbNUW/fRZlcONLMavFXa5Ibf1mkU0drw9z2KBEmFfJsksGvn1r5BQExYV/wcjj/D9 Lhx/bq9/xUikhZhko9k7557OS/3qwp2w3O2W9RgfQBrbQwwcZUi7nOQVxdn6PbbK0ZVTNR p9N6Y1bTFLq74dYQ9u4Iv62cwBG27Gy0I7yiG8WBav5uqHe7as38pYd49SanoYvV+CQl1N f4tx5rN4Cndz1RU2q5bwq6uoZMmD4xhgtDWyLhUarNHhzL1TuU+ycaxQ+E++iQ== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1677787062; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=+RiY4OM6UJRNz/WSBX/5KTZe1S/q7JPH46lONkiG6+g=; b=5mkIXmg8E1wb4LxyOtqUWqr3pPJWzahz+3aLJVXlzhMk3YTSTUrSC36xyMS9fCgNetkQGg J14n8LFF4jHNUMBg== To: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org>, Steven Rostedt <rostedt@goodmis.org>, Thomas Gleixner <tglx@linutronix.de>, linux-kernel@vger.kernel.org Subject: [PATCH printk v1 02/18] printk: Add NMI check to down_trylock_console_sem() Date: Thu, 2 Mar 2023 21:02:02 +0106 Message-Id: <20230302195618.156940-3-john.ogness@linutronix.de> In-Reply-To: <20230302195618.156940-1-john.ogness@linutronix.de> References: <20230302195618.156940-1-john.ogness@linutronix.de> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-3.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,INVALID_DATE_TZ_ABSURD, RCVD_IN_DNSWL_MED,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1759287371104733090?= X-GMAIL-MSGID: =?utf-8?q?1759287371104733090?= |
Series |
threaded/atomic console support
|
|
Commit Message
John Ogness
March 2, 2023, 7:56 p.m. UTC
The printk path is NMI safe because it only adds content to the
buffer and then triggers the delayed output via irq_work. If the
console is flushed or unblanked (on panic) from NMI then it can
deadlock in down_trylock_console_sem() because the semaphore is not
NMI safe.
Avoid try-locking the console from NMI and assume it failed.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
kernel/printk/printk.c | 4 ++++
1 file changed, 4 insertions(+)
Comments
On Thu 2023-03-02 21:02:02, John Ogness wrote: > The printk path is NMI safe because it only adds content to the > buffer and then triggers the delayed output via irq_work. If the > console is flushed or unblanked (on panic) from NMI then it can > deadlock in down_trylock_console_sem() because the semaphore is not > NMI safe. Do you have any particular code path in mind, please? This does not work in console_flush_on_panic(), see below. > Avoid try-locking the console from NMI and assume it failed. > > Signed-off-by: John Ogness <john.ogness@linutronix.de> > --- > kernel/printk/printk.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index 40c5f4170ac7..84af038292d9 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -318,6 +318,10 @@ static int __down_trylock_console_sem(unsigned long ip) > int lock_failed; > unsigned long flags; > > + /* Semaphores are not NMI-safe. */ > + if (in_nmi()) > + return 1; console_flush_on_panic() ignores the console_trylock() return value: void console_flush_on_panic(enum con_flush_mode mode) { [...] /* * If someone else is holding the console lock, trylock will fail * and may_schedule may be set. Ignore and proceed to unlock so * that messages are flushed out. As this can be called from any * context and we don't want to get preempted while flushing, * ensure may_schedule is cleared. */ console_trylock(); console_may_schedule = 0; console_unlock(); } So that this change would cause a non-paired console_unlock(). And console_unlock might still deadlock on the console_sem->lock. OK, your change makes sense. But we still should try flushing the messages in console_flush_on_panic() even in NMI. One solution would be to call console_flush_all() directly in console_flush_on_panic() without taking console_lock(). It should not be worse than the current code which ignores the console_trylock() return value. Note that it mostly works because console_flush_on_panic() is called when other CPUs are supposed to be stopped. We only would need to prevent other CPUs from flushing messages as well if they were still running by chance. But we actually already do this, see abandon_console_lock_in_panic(). Well, we should make sure that the abandon_console_lock_in_panic() check is done before flushing the first message. All these changes together would prevent deadlock on console_sem->lock. But the synchronization "guarantees" should stay the same. > + > /* > * Here and in __up_console_sem() we need to be in safe mode, > * because spindump/WARN/etc from under console ->lock will Alternative solution would be to make the generic down_trylock() safe in NMI or in panic(). It might do spin_trylock() when oops_in_progress is set. I mean to do the same trick and console drivers do with port->lock. But I am not sure if other down_trylock() users would be happy with this change. Yes, it might get solved by introducing down_trylock_panic() that might be used only in console_flush_on_panic(). But it might be more hairy than the solution proposed above. Best Regards, Petr
On 2023-03-07, Petr Mladek <pmladek@suse.com> wrote: > So that this change would cause a non-paired console_unlock(). > And console_unlock might still deadlock on the console_sem->lock. Yes, but at least it would have flushed beforehand. > One solution would be to call console_flush_all() directly in > console_flush_on_panic() without taking console_lock(). > > It should not be worse than the current code which ignores > the console_trylock() return value. I think your suggestion is acceptable. > Note that it mostly works because console_flush_on_panic() is called > when other CPUs are supposed to be stopped. > > We only would need to prevent other CPUs from flushing messages > as well if they were still running by chance. But we actually already > do this, see abandon_console_lock_in_panic(). Well, we should > make sure that the abandon_console_lock_in_panic() check is > done before flushing the first message. > > All these changes together would prevent deadlock on > console_sem->lock. But the synchronization "guarantees" should stay > the same. We could also update console_trylock() and console_lock() to fail and infinitely sleep, respectively, when abandon_console_lock_in_panic() is true. That would prevent CPUs from newly acquiring the console lock and interfering with the panic CPU. John
On Fri 2023-03-17 12:43:56, John Ogness wrote: > On 2023-03-07, Petr Mladek <pmladek@suse.com> wrote: > > So that this change would cause a non-paired console_unlock(). > > And console_unlock might still deadlock on the console_sem->lock. > > Yes, but at least it would have flushed beforehand. > > > One solution would be to call console_flush_all() directly in > > console_flush_on_panic() without taking console_lock(). > > > > It should not be worse than the current code which ignores > > the console_trylock() return value. > > I think your suggestion is acceptable. > > > Note that it mostly works because console_flush_on_panic() is called > > when other CPUs are supposed to be stopped. > > > > We only would need to prevent other CPUs from flushing messages > > as well if they were still running by chance. But we actually already > > do this, see abandon_console_lock_in_panic(). Well, we should > > make sure that the abandon_console_lock_in_panic() check is > > done before flushing the first message. > > > > All these changes together would prevent deadlock on > > console_sem->lock. But the synchronization "guarantees" should stay > > the same. > > We could also update console_trylock() and console_lock() to fail and > infinitely sleep, respectively, when abandon_console_lock_in_panic() is > true. That would prevent CPUs from newly acquiring the console lock and > interfering with the panic CPU. Interesting idea. It should be safe after panic() tries to stop the CPUs. But I am slightly worried to do this earlier. I wonder if it might block, for example, trigger_all_cpu_backtrace() that is called when (panic_print & PANIC_PRINT_ALL_CPU_BT) bit is set. Best Regards. Petr
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 40c5f4170ac7..84af038292d9 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -318,6 +318,10 @@ static int __down_trylock_console_sem(unsigned long ip) int lock_failed; unsigned long flags; + /* Semaphores are not NMI-safe. */ + if (in_nmi()) + return 1; + /* * Here and in __up_console_sem() we need to be in safe mode, * because spindump/WARN/etc from under console ->lock will