Message ID | 20230919230856.661435-10-john.ogness@linutronix.de |
---|---|
State | New |
Headers |
From: John Ogness <john.ogness@linutronix.de> To: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org>, Steven Rostedt <rostedt@goodmis.org>, Thomas Gleixner <tglx@linutronix.de>, linux-kernel@vger.kernel.org, Kees Cook <keescook@chromium.org>, Luis Chamberlain <mcgrof@kernel.org>, Andrew Morton <akpm@linux-foundation.org>, Peter Zijlstra <peterz@infradead.org>, Josh Poimboeuf <jpoimboe@kernel.org>, Arnd Bergmann <arnd@arndb.de>, "Guilherme G. Piccoli" <gpiccoli@igalia.com>, Andy Shevchenko <andriy.shevchenko@linux.intel.com> Subject: [PATCH printk v2 09/11] panic: Add atomic write enforcement to oops Date: Wed, 20 Sep 2023 01:14:54 +0206 Message-Id: <20230919230856.661435-10-john.ogness@linutronix.de> In-Reply-To: <20230919230856.661435-1-john.ogness@linutronix.de> References: <20230919230856.661435-1-john.ogness@linutronix.de> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit |
Series | wire up nbcon atomic printing |
Commit Message
John Ogness
Sept. 19, 2023, 11:08 p.m. UTC
Invoke the atomic write enforcement functions for oops to
ensure that the information gets out to the consoles.

Since there is no single general function that calls both
oops_enter() and oops_exit(), the nesting feature of atomic
write sections is taken advantage of in order to guarantee
full coverage between the first oops_enter() and the last
oops_exit().

It is important to note that if there are any legacy consoles
registered, they will be attempting to directly print from the
printk-caller context, which may jeopardize the reliability of
the atomic consoles. Optimally there should be no legacy
consoles registered.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
kernel/panic.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 49 insertions(+)
Comments
On Wed, Sep 20, 2023 at 01:14:54AM +0206, John Ogness wrote:
> Invoke the atomic write enforcement functions for oops to
> ensure that the information gets out to the consoles.
>
> Since there is no single general function that calls both
> oops_enter() and oops_exit(), the nesting feature of atomic
> write sections is taken advantage of in order to guarantee
> full coverage between the first oops_enter() and the last
> oops_exit().
>
> It is important to note that if there are any legacy consoles
> registered, they will be attempting to directly print from the
> printk-caller context, which may jeopardize the reliability of
> the atomic consoles. Optimally there should be no legacy
> consoles registered.

...

> +	if (atomic_read(&oops_cpu) == smp_processor_id()) {
> +		oops_nesting--;
> +		if (oops_nesting == 0) {
> +			atomic_set(&oops_cpu, -1);

Between read and set the variable can change, can't it?
If not, why this variable is atomic then? Or, why it's not a problem?
If the latter is the case, perhaps a comment to explain this?

> +			/* Exit outmost atomic section. */
> +			nbcon_atomic_exit(NBCON_PRIO_EMERGENCY, oops_prev_prio);
> +		}
> +	}
> +	put_cpu();
On 2023-09-20, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> On Wed, Sep 20, 2023 at 01:14:54AM +0206, John Ogness wrote:
>> Invoke the atomic write enforcement functions for oops to
>> ensure that the information gets out to the consoles.
>>
>> Since there is no single general function that calls both
>> oops_enter() and oops_exit(), the nesting feature of atomic
>> write sections is taken advantage of in order to guarantee
>> full coverage between the first oops_enter() and the last
>> oops_exit().
>>
>> It is important to note that if there are any legacy consoles
>> registered, they will be attempting to directly print from the
>> printk-caller context, which may jeopardize the reliability of
>> the atomic consoles. Optimally there should be no legacy
>> consoles registered.
>
> ...
>
>> +	if (atomic_read(&oops_cpu) == smp_processor_id()) {
>> +		oops_nesting--;
>> +		if (oops_nesting == 0) {
>> +			atomic_set(&oops_cpu, -1);
>
> Between read and set the variable can change, can't it?

CPU migration is disabled. @oops_cpu contains the CPU ID of the only CPU
that is printing the oops. (Perhaps the variable should be called
"oops_printing_cpu"?)

If this matches smp_processor_id(), then the current CPU is the only one
that is allowed to change it back to -1. So no, if the first condition
is true, it cannot change before atomic_set(). And if the second
condition is true, this is the only CPU+context that is allowed to
change it back to -1;

> If not, why this variable is atomic then? Or, why it's not a problem?
> If the latter is the case, perhaps a comment to explain this?

If not atomic, it will be a data race since one CPU might be changing
@oops_cpu and another is reading it. For type "int" such a data race
would be fine because it doesn't matter which side of the race the
reader was on, both values will not match the current CPU ID.

The reason that I didn't implement it using cmpxchg(),
data_race(READ_ONCE()), and WRITE_ONCE() is because I once learned that
you should never mix cmpxchg() with READ_ONCE()/WRITE_ONCE() because
there are architectures that do not support cmpxchg() as an atomic
instruction. The answer was always: "use atomic_t instead... that is
what it is for".

But AFAICT for this case it would be fine because obviously cmpxchg()
will not race with itself. And successfully reading a matching CPU ID
means there cannot be any cmpxchg() in progress. And writing only occurs
after seeing a matching CPU ID.

So I can change it from atomic_t to int. Although I do feel like that
might require explanation about why the data race is safe.

Or perhaps it is enough just to have something like this:

/**
 * oops_printing_cpu - The ID of the CPU responsible for printing the
 *                     OOPS message(s) to the consoles.
 *
 * This is atomic_t because multiple CPUs can read this variable
 * simultaneously when exiting OOPS while another CPU can be
 * modifying this variable to begin or end its printing duties.
 */
static atomic_t oops_printing_cpu = ATOMIC_INIT(-1);

John Ogness
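Editorial sketch of the ownership argument above, as a user-space model rather than kernel code. It assumes C11 <stdatomic.h> in place of the kernel's atomic_t, and every name in it (owner_cpu, nesting, model_enter(), model_exit(), model_demo()) is hypothetical. The CPU ID is passed in explicitly because there is no smp_processor_id() in user space. The point it illustrates: a compare-exchange from -1 is the only way to become owner, and only the owner ever writes -1 back, so the value cannot change between the owner's read and its store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; not the kernel's definitions. */
static atomic_int owner_cpu = -1;	/* models @oops_cpu */
static int nesting;			/* models @oops_nesting, touched by the owner only */

/* Returns true if @cpu is, or has just become, the owner. */
static bool model_enter(int cpu)
{
	int expected = -1;

	/* On failure, @expected is updated to the current owner's ID. */
	if (atomic_compare_exchange_strong_explicit(&owner_cpu, &expected, cpu,
						    memory_order_relaxed,
						    memory_order_relaxed) ||
	    expected == cpu) {
		nesting++;
		return true;
	}
	return false;
}

/* Returns true if this call released ownership. */
static bool model_exit(int cpu)
{
	/* A non-owner sees either -1 or another CPU's ID; neither matches. */
	if (atomic_load_explicit(&owner_cpu, memory_order_relaxed) != cpu)
		return false;

	if (--nesting > 0)
		return false;

	/* Only the owner reaches this store, so nothing can race with it. */
	atomic_store_explicit(&owner_cpu, -1, memory_order_relaxed);
	return true;
}

/* Single-threaded walk-through of the protocol. */
void model_demo(void)
{
	assert(model_enter(0));		/* CPU 0 claims ownership */
	assert(!model_enter(1));	/* CPU 1 is refused */
	assert(model_enter(0));		/* nested enter on the owner */
	assert(!model_exit(1));		/* non-owner exit changes nothing */
	assert(!model_exit(0));		/* inner exit, still owned */
	assert(model_exit(0));		/* outermost exit releases */
}
```

The model is single-threaded, so it only demonstrates the state transitions; it does not by itself show that a plain int would be safe in the kernel.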
On Wed, Sep 20, 2023 at 04:26:12PM +0206, John Ogness wrote:
> On 2023-09-20, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> > On Wed, Sep 20, 2023 at 01:14:54AM +0206, John Ogness wrote:

...

> >> +	if (atomic_read(&oops_cpu) == smp_processor_id()) {
> >> +		oops_nesting--;
> >> +		if (oops_nesting == 0) {
> >> +			atomic_set(&oops_cpu, -1);
> >
> > Between read and set the variable can change, can't it?
>
> CPU migration is disabled. @oops_cpu contains the CPU ID of the only CPU
> that is printing the oops. (Perhaps the variable should be called
> "oops_printing_cpu"?)
>
> If this matches smp_processor_id(), then the current CPU is the only one
> that is allowed to change it back to -1. So no, if the first condition
> is true, it cannot change before atomic_set(). And if the second
> condition is true, this is the only CPU+context that is allowed to
> change it back to -1;
>
> > If not, why this variable is atomic then? Or, why it's not a problem?
> > If the latter is the case, perhaps a comment to explain this?
>
> If not atomic, it will be a data race since one CPU might be changing
> @oops_cpu and another is reading it. For type "int" such a data race
> would be fine because it doesn't matter which side of the race the
> reader was on, both values will not match the current CPU ID.
>
> The reason that I didn't implement it using cmpxchg(),
> data_race(READ_ONCE()), and WRITE_ONCE() is because I once learned that
> you should never mix cmpxchg() with READ_ONCE()/WRITE_ONCE() because
> there are architectures that do not support cmpxchg() as an atomic
> instruction. The answer was always: "use atomic_t instead... that is
> what it is for".
>
> But AFAICT for this case it would be fine because obviously cmpxchg()
> will not race with itself. And successfully reading a matching CPU ID
> means there cannot be any cmpxchg() in progress. And writing only occurs
> after seeing a matching CPU ID.
>
> So I can change it from atomic_t to int. Although I do feel like that
> might require explanation about why the data race is safe.

Either way a comment is needed, but I think the usage of atomic above
is a bit confusing; as you see, I immediately raised the concern.

> Or perhaps it is enough just to have something like this:
>
> /**
>  * oops_printing_cpu - The ID of the CPU responsible for printing the
>  *                     OOPS message(s) to the consoles.
>  *
>  * This is atomic_t because multiple CPUs can read this variable
>  * simultaneously when exiting OOPS while another CPU can be
>  * modifying this variable to begin or end its printing duties.
>  */
> static atomic_t oops_printing_cpu = ATOMIC_INIT(-1);
On Wed 2023-09-20 01:14:54, John Ogness wrote:
> Invoke the atomic write enforcement functions for oops to
> ensure that the information gets out to the consoles.
>
> Since there is no single general function that calls both
> oops_enter() and oops_exit(), the nesting feature of atomic
> write sections is taken advantage of in order to guarantee
> full coverage between the first oops_enter() and the last
> oops_exit().
>
> It is important to note that if there are any legacy consoles
> registered, they will be attempting to directly print from the
> printk-caller context, which may jeopardize the reliability of
> the atomic consoles. Optimally there should be no legacy
> consoles registered.
>
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -630,6 +634,36 @@ bool oops_may_print(void)
>   */
>  void oops_enter(void)
>  {
> +	enum nbcon_prio prev_prio;
> +	int cpu = -1;
> +
> +	/*
> +	 * If this turns out to be the first CPU in oops, this is the
> +	 * beginning of the outermost atomic section. Otherwise it is
> +	 * the beginning of an inner atomic section.
> +	 */

This sounds strange. What is the advantage of having the inner
atomic context, please? It covers only messages printed inside
oops_enter() and not the whole oops_enter()/exit(). Also see below.

> +	prev_prio = nbcon_atomic_enter(NBCON_PRIO_EMERGENCY);
> +
> +	if (atomic_try_cmpxchg_relaxed(&oops_cpu, &cpu, smp_processor_id())) {
> +		/*
> +		 * This is the first CPU in oops. Save the outermost
> +		 * @prev_prio in order to restore it on the outermost
> +		 * matching oops_exit(), when @oops_nesting == 0.
> +		 */
> +		oops_prev_prio = prev_prio;
> +
> +		/*
> +		 * Enter an inner atomic section that ends at the end of this
> +		 * function. In this case, the nbcon_atomic_enter() above
> +		 * began the outermost atomic section.
> +		 */
> +		prev_prio = nbcon_atomic_enter(NBCON_PRIO_EMERGENCY);
> +	}
> +
> +	/* Track nesting when this CPU is the owner. */
> +	if (cpu == -1 || cpu == smp_processor_id())
> +		oops_nesting++;
> +
>  	tracing_off();
>  	/* can't trust the integrity of the kernel anymore: */
>  	debug_locks_off();
> @@ -637,6 +671,9 @@ void oops_enter(void)
>
>  	if (sysctl_oops_all_cpu_backtrace)
>  		trigger_all_cpu_backtrace();
> +
> +	/* Exit inner atomic section. */
> +	nbcon_atomic_exit(NBCON_PRIO_EMERGENCY, prev_prio);

This will not flush the messages when:

  + This CPU owns oops_cpu. The flush will have to wait for exiting
    the outer loop.

    In this case, the inner atomic context is not needed.

  + oops_cpu is owned by another CPU, the other CPU is
    just flushing the messages and blocks the per-console lock.

    The good thing is that the messages printed by this oops_enter()
    would likely get flushed by the other CPU.

    The bad thing is that oops_exit() on this CPU won't call
    nbcon_atomic_exit() so that the following OOPS messages
    from this CPU might need to wait for the printk kthread.
    IMHO, this is not what we want.

One solution would be to store prev_prio in per-CPU array
so that each CPU could call its own nbcon_atomic_exit().

But I start liking more and more the idea with storing
and counting nested emergency contexts in struct task_struct.
It is the alternative implementation in reply to the 7th patch,
https://lore.kernel.org/r/ZRLBxsXPCym2NC5Q@alley

Then it will be enough to simply call:

   + nbcon_emergency_enter() in oops_enter()
   + nbcon_emergency_exit() in oops_exit()

Best Regards,
Petr

PS: I just hope that you didn't add all this complexity just because
    we preferred this behavior at LPC 2022. Especially I hope
    that it was not me who proposed and preferred this.
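Editorial sketch of the per-task counting idea suggested above, as a user-space model only. The real proposal lives in the linked reply to the 7th patch and is not reproduced here; struct model_task, model_emergency_enter(), model_emergency_exit(), flush_count, and model_emergency_demo() are all hypothetical names. The assumption that the flush happens on the outermost exit is mine, made for illustration. What the model shows is why a counter per context removes the need for a global owner CPU and a saved @prev_prio: each context pairs its own enter and exit, regardless of what other contexts are doing.

```c
#include <assert.h>

/* Hypothetical stand-in for per-task state; not a kernel structure. */
struct model_task {
	unsigned int emergency_nesting;
};

/* Counts simulated flushes so the behaviour can be observed. */
static unsigned int flush_count;

static void model_emergency_enter(struct model_task *t)
{
	t->emergency_nesting++;
}

static void model_emergency_exit(struct model_task *t)
{
	assert(t->emergency_nesting > 0);

	/* Assumed for this sketch: only the outermost exit flushes. */
	if (--t->emergency_nesting == 0)
		flush_count++;
}

/* Two contexts nest independently of each other. */
void model_emergency_demo(void)
{
	struct model_task a = { 0 }, b = { 0 };

	model_emergency_enter(&a);	/* e.g. oops_enter() in context a */
	model_emergency_enter(&a);	/* nested oops in the same context */
	model_emergency_enter(&b);	/* another context oopses meanwhile */

	model_emergency_exit(&b);	/* b's outermost exit */
	assert(flush_count == 1);

	model_emergency_exit(&a);	/* inner exit of a, no flush */
	assert(flush_count == 1);

	model_emergency_exit(&a);	/* a's outermost exit */
	assert(flush_count == 2);
}
```

In this model the second context is never left without an exit of its own, which is the problem described for the global @oops_cpu scheme.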
diff --git a/kernel/panic.c b/kernel/panic.c
index 86ed71ba8c4d..e2879098645d 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -614,6 +614,10 @@ bool oops_may_print(void)
 	return pause_on_oops_flag == 0;
 }
 
+static atomic_t oops_cpu = ATOMIC_INIT(-1);
+static int oops_nesting;
+static enum nbcon_prio oops_prev_prio;
+
 /*
  * Called when the architecture enters its oops handler, before it prints
  * anything. If this is the first CPU to oops, and it's oopsing the first
@@ -630,6 +634,36 @@ bool oops_may_print(void)
  */
 void oops_enter(void)
 {
+	enum nbcon_prio prev_prio;
+	int cpu = -1;
+
+	/*
+	 * If this turns out to be the first CPU in oops, this is the
+	 * beginning of the outermost atomic section. Otherwise it is
+	 * the beginning of an inner atomic section.
+	 */
+	prev_prio = nbcon_atomic_enter(NBCON_PRIO_EMERGENCY);
+
+	if (atomic_try_cmpxchg_relaxed(&oops_cpu, &cpu, smp_processor_id())) {
+		/*
+		 * This is the first CPU in oops. Save the outermost
+		 * @prev_prio in order to restore it on the outermost
+		 * matching oops_exit(), when @oops_nesting == 0.
+		 */
+		oops_prev_prio = prev_prio;
+
+		/*
+		 * Enter an inner atomic section that ends at the end of this
+		 * function. In this case, the nbcon_atomic_enter() above
+		 * began the outermost atomic section.
+		 */
+		prev_prio = nbcon_atomic_enter(NBCON_PRIO_EMERGENCY);
+	}
+
+	/* Track nesting when this CPU is the owner. */
+	if (cpu == -1 || cpu == smp_processor_id())
+		oops_nesting++;
+
 	tracing_off();
 	/* can't trust the integrity of the kernel anymore: */
 	debug_locks_off();
@@ -637,6 +671,9 @@ void oops_enter(void)
 
 	if (sysctl_oops_all_cpu_backtrace)
 		trigger_all_cpu_backtrace();
+
+	/* Exit inner atomic section. */
+	nbcon_atomic_exit(NBCON_PRIO_EMERGENCY, prev_prio);
 }
 
 static void print_oops_end_marker(void)
@@ -652,6 +689,18 @@ void oops_exit(void)
 {
 	do_oops_enter_exit();
 	print_oops_end_marker();
+
+	if (atomic_read(&oops_cpu) == smp_processor_id()) {
+		oops_nesting--;
+		if (oops_nesting == 0) {
+			atomic_set(&oops_cpu, -1);
+
+			/* Exit outmost atomic section. */
+			nbcon_atomic_exit(NBCON_PRIO_EMERGENCY, oops_prev_prio);
+		}
+	}
+	put_cpu();
+
 	kmsg_dump(KMSG_DUMP_OOPS);
 }