Message ID | 20231013204340.1112036-5-john.ogness@linutronix.de |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:612c:2908:b0:403:3b70:6f57 with SMTP id ib8csp2144302vqb; Fri, 13 Oct 2023 13:44:24 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHNVOo/MgJejHUHwXlqolSDchHmxkS1qNa3uq/r3lRzxzkGfP6ulkFhOA3U5eYRupCz2L2P X-Received: by 2002:a05:6359:b9a:b0:147:eb87:3665 with SMTP id gf26-20020a0563590b9a00b00147eb873665mr23604318rwb.3.1697229863784; Fri, 13 Oct 2023 13:44:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1697229863; cv=none; d=google.com; s=arc-20160816; b=CG9boHFMgBB6Dw+vPgYyKIqPEJJ9lHmIOxEr+giluDp4nmR6dowIlhvuyWXiYR7qUF RstsXg5edGkOX+/giLd5Fvk9cHyKzWkLnPL8qxP9/BaW4hifIIR4JA/ktXhLJGhOi4pX 2qQgYMPRbOPQf4ygLtTNyEvZop9hdWKvkNGS3fmqZrcgWltAeIahUVyQFT6S40HwPo+Q cFYNSxrVehzw5AybQYbck2n8ZWLpibr3S1Upiwe9KfslQNivln0OjxpSWsYoeirnnDMU W0MjNuvcanWeXkYcPvPqQtPuAIH6zuiD2iHWCBH4jmxTh33TKrLVXkBTfZr//PVbzYmY 7V4Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:dkim-signature :dkim-signature:from; bh=57p/+gSPfOoKdSF21ACUMUWvxO7+D7FS68Md7CNw7GE=; fh=pKnPKUsir0uEGSsor+4Zc2vgbu+g+ayvUgsdzkuXaoA=; b=lyZN5o3zkIvxFhl9cBgzzwofWXJMx2pl1SLm/zOAbnEJULxTK4QCcw5YNS0URljIwR 1VEW0Pyrd7s4qKYSrIytVP+/m1dn5LUyQy80nFkun41mghT3hWVxr/fb4zIU9vhmTRfF bQczVkkHWtUCCXB8VM/fA001J6KREs5O/kX0AoYNkdPekhdYfm6KO4WeRxTM63vwURrL KihwFJ8rQGuXEMIbJdse16goX5JWJhSqPBIKHuEhuWo8QLYgjcgXV5m7fN2Bbu0RZhY8 d5A9olr660DUf9Myk65B8/gPL2LCd2Y5LuJYDVQMuUNpH08rf/bsKM7782H3d5rkua3J 9z6A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=0xRJBwAV; dkim=neutral (no key) header.i=@linutronix.de; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:6 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: from pete.vger.email (pete.vger.email. [2620:137:e000::3:6]) by mx.google.com with ESMTPS id fi38-20020a056a0039a600b0068a54cfcd74si7820001pfb.192.2023.10.13.13.44.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 13 Oct 2023 13:44:23 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:6 as permitted sender) client-ip=2620:137:e000::3:6; Authentication-Results: mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=0xRJBwAV; dkim=neutral (no key) header.i=@linutronix.de; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:6 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by pete.vger.email (Postfix) with ESMTP id D3EFB83BD980; Fri, 13 Oct 2023 13:44:21 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at pete.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229679AbjJMUn5 (ORCPT <rfc822;rua109.linux@gmail.com> + 19 others); Fri, 13 Oct 2023 16:43:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44386 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231680AbjJMUnt (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 13 Oct 2023 16:43:49 -0400 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9A929BF for <linux-kernel@vger.kernel.org>; Fri, 13 Oct 2023 13:43:48 -0700 (PDT) From: John Ogness <john.ogness@linutronix.de> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1697229827; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=57p/+gSPfOoKdSF21ACUMUWvxO7+D7FS68Md7CNw7GE=; b=0xRJBwAVH/UTiYtp6zcbdVc+c0Ujn60Jf4rk/D5a8G9mOgHOZAlf9xbncVwPRgT3whA8UG usAZZVImCxA+5LUkUX0pJbTfFkF1Yyr+0nmjnyGfkoUytdC8YI9aWYIgqGLDhCKKBFyxX9 CcRrjmMV8iv5MYOaJNPKVsKKxKn+VCqJlQbjgaTwgonRP8+hD5HZmDhxl2DgY/BOIVl/om S7tGO1eHr5WwQ7uUHQfXpjE+XzYi8PZvYqZRPWmQtNC+sh6+w2D6p2OAEIA37PuOIukljA zm/FT01ylYVPgmyNl8aWYntBnyO0qKWK7xYqKINUcfPQG7M4Uje1SdMcTe6How== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1697229827; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=57p/+gSPfOoKdSF21ACUMUWvxO7+D7FS68Md7CNw7GE=; b=sWpIK1b2zhiGcoMSIFK4I5BAQHavPrQ7DURfIvSsasFZVR8eX4g/mmO0MgjRzw/pR7aUyp JRrs1TGPK14eyhAg== To: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org>, Steven Rostedt <rostedt@goodmis.org>, Thomas Gleixner <tglx@linutronix.de>, linux-kernel@vger.kernel.org Subject: [PATCH printk v2 4/4] printk: Ignore waiter on panic Date: Fri, 13 Oct 2023 22:49:40 +0206 Message-Id: <20231013204340.1112036-5-john.ogness@linutronix.de> In-Reply-To: <20231013204340.1112036-1-john.ogness@linutronix.de> References: <20231013204340.1112036-1-john.ogness@linutronix.de> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INVALID_DATE_TZ_ABSURD, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on pete.vger.email Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (pete.vger.email [0.0.0.0]); Fri, 13 Oct 2023 13:44:21 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1779674501487842978 X-GMAIL-MSGID: 1779674501487842978 |
Series | fix console flushing on panic | |
Commit Message
John Ogness
Oct. 13, 2023, 8:43 p.m. UTC
Commit d51507098ff91 ("printk: disable optimistic spin during
panic") added checks to avoid becoming a console waiter if a
panic is in progress. However, the transition to panic can occur
while there is already a waiter. If the panic occurred in a
context that does not support printing from the printk() caller
context, the waiter CPU may become stopped while still stored as
@console_waiter. Then when console_flush_on_panic() is called,
the panic CPU will see @console_waiter and handover to the
stopped CPU.
Here a simple example:
CPU0 CPU1
---- ----
console_lock_spinning_enable()
console_trylock_spinning()
[set as console waiter]
NMI: panic()
panic_other_cpus_shutdown()
[stopped as console waiter]
console_flush_on_panic()
console_lock_spinning_enable()
[print 1 record]
console_lock_spinning_disable_and_check()
[handover to stopped CPU1]
This results in panic() not flushing the panic messages.
Fix this by ignoring any console waiter if the panic CPU is
printing.
Fixes: dbdda842fe96 ("printk: Add console owner and waiter logic to load balance console writes")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
kernel/printk/printk.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
Comments
On Fri 2023-10-13 22:49:40, John Ogness wrote: > Commit d51507098ff91 ("printk: disable optimistic spin during > panic") added checks to avoid becoming a console waiter if a > panic is in progress. However, the transition to panic can occur > while there is already a waiter. If the panic occurred in a > context that does not support printing from the printk() caller > context, the waiter CPU may become stopped while still stored as > @console_waiter. I guess that "context that does not support printing" is NMI or printk_safe when the console handling is deferred(). Another scenario is when the current owner gets stopped or blocked so that it actually can't pass the lock to the waiter. > the panic CPU will see @console_waiter and handover to the > stopped CPU. > > Here a simple example: > > CPU0 CPU1 > ---- ---- > console_lock_spinning_enable() > console_trylock_spinning() > [set as console waiter] > NMI: panic() > panic_other_cpus_shutdown() > [stopped as console waiter] > console_flush_on_panic() > console_lock_spinning_enable() > [print 1 record] > console_lock_spinning_disable_and_check() > [handover to stopped CPU1] > > This results in panic() not flushing the panic messages. Great catch! > Fix this by ignoring any console waiter if the panic CPU is > printing. > > Fixes: dbdda842fe96 ("printk: Add console owner and waiter logic to load balance console writes") > Signed-off-by: John Ogness <john.ogness@linutronix.de> > --- > kernel/printk/printk.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index 56d9b4acbbf2..cd6493f12970 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -1904,7 +1904,8 @@ static int console_lock_spinning_disable_and_check(int cookie) > console_owner = NULL; > raw_spin_unlock(&console_owner_lock); > > - if (!waiter) { > + /* Waiters are ignored by the panic CPU. */ > + if (!waiter || this_cpu_in_panic()) { > spin_release(&console_owner_dep_map, _THIS_IP_); > return 0; > } This seems to work. Well, I have spent some time thinking about possible scenarios and I would go even further. I would block also console_lock_spinning_enable() Also I would do the checks before taking console_owner_lock. It would make the behavior symmetric. And more importantly, it would prevent possible deadlocks caused by console_owner_lock. See the proposed patch below. I created it to see the changes in the code. Also I added info about possible scenarios and effects. Here we go: From 2b6e2b58df737e3a3e0f86134a53f560539be813 Mon Sep 17 00:00:00 2001 From: Petr Mladek <pmladek@suse.com> Date: Wed, 18 Oct 2023 10:04:17 +0200 Subject: [PATCH] printk: Disable passing console lock owner completely during panic() The commit d51507098ff91 ("printk: disable optimistic spin during panic") added checks to avoid becoming a console waiter if a panic is in progress. However, the transition to panic can occur while there is already waiter. The current owner should not pass the lock to the waiter because it might get stopped or blocked anytime. Also the panic context might pass the console lock owner to an already stopped waiter by mistake. It might happen when console_flush_on_panic() ignores the current lock owner, for example: CPU0 CPU1 ---- ---- console_lock_spinning_enable() console_trylock_spinning() [set as console waiter] NMI: panic() panic_other_cpus_shutdown() [stopped as console waiter] console_flush_on_panic() console_lock_spinning_enable() [print 1 record] console_lock_spinning_disable_and_check() [handover to stopped CPU1] This results in panic() not flushing the panic messages. Fix these problems by disabling all spinning operations completely during panic(). Another advantage is that it prevents possible deadlocks caused by "console_owner_lock". The panic() context does not need to take it any longer. The lockless checks are safe because the functions become NOPs when they see the panic in progress. All operations manipulating the state are still synchronized by the lock even when non-panic CPUs would notice the panic synchronously. The current owner might stay spinning. But non-panic() CPUs would get stopped anyway and the panic context will never start spinning. Fixes: dbdda842fe96 ("printk: Add console owner and waiter logic to load balance console writes") Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Petr Mladek <pmladek@suse.com> --- kernel/printk/printk.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 26a76b167ea6..fbe0dc1ca416 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -1847,10 +1847,23 @@ static bool console_waiter; */ static void console_lock_spinning_enable(void) { + /* + * Do not use spinning in panic(). The panic CPU wants to keep the lock. + * Non-panic CPUs abandon the flush anyway. + * + * Just keep the lockdep annotation. The panic-CPU should avoid + * taking console_owner_lock because it might cause a deadlock. + * This looks like the easiest way how to prevent false lockdep + * reports without handling races a lockless way. + */ + if (panic_in_progress()) + goto lockdep; + raw_spin_lock(&console_owner_lock); console_owner = current; raw_spin_unlock(&console_owner_lock); +lockdep: /* The waiter may spin on us after setting console_owner */ spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_); } @@ -1875,6 +1888,22 @@ static int console_lock_spinning_disable_and_check(int cookie) { int waiter; + /* + * Ignore spinning waiters during panic() because they might get stopped + * or blocked at any time, + * + * It is safe because nobody is allowed to start spinning during panic + * in the first place. If there has been a waiter then non panic CPUs + * might stay spinning. They would get stopped anyway. The panic context + * will never start spinning and an interrupted spin on panic CPU will + * never continue. + */ + if (panic_in_progress()) { + /* Keep lockdep happy. */ + spin_release(&console_owner_dep_map, _THIS_IP_); + return 0; + } + raw_spin_lock(&console_owner_lock); waiter = READ_ONCE(console_waiter); console_owner = NULL;
On 2023-10-18, Petr Mladek <pmladek@suse.com> wrote: > Well, I have spent some time thinking about possible scenarios and I would > go even further. I would block also console_lock_spinning_enable() > Also I would do the checks before taking console_owner_lock. > > It would make the behavior symmetric. And more importantly, it would prevent > possible deadlocks caused by console_owner_lock. > > See the proposed patch below. I created it to see the changes in the > code. Also I added info about possible scenarios and effects. Reviewed-by: John Ogness <john.ogness@linutronix.de> I will use your patch for the next version of this series. John
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 56d9b4acbbf2..cd6493f12970 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -1904,7 +1904,8 @@ static int console_lock_spinning_disable_and_check(int cookie) console_owner = NULL; raw_spin_unlock(&console_owner_lock); - if (!waiter) { + /* Waiters are ignored by the panic CPU. */ + if (!waiter || this_cpu_in_panic()) { spin_release(&console_owner_dep_map, _THIS_IP_); return 0; }