Message ID: 20230919230856.661435-5-john.ogness@linutronix.de
State: New
Headers:
From: John Ogness <john.ogness@linutronix.de>
To: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>, Steven Rostedt <rostedt@goodmis.org>, Thomas Gleixner <tglx@linutronix.de>, linux-kernel@vger.kernel.org, Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: [PATCH printk v2 04/11] printk: nbcon: Provide functions to mark atomic write sections
Date: Wed, 20 Sep 2023 01:14:49 +0206
Message-Id: <20230919230856.661435-5-john.ogness@linutronix.de>
In-Reply-To: <20230919230856.661435-1-john.ogness@linutronix.de>
References: <20230919230856.661435-1-john.ogness@linutronix.de>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Series: wire up nbcon atomic printing
Commit Message
John Ogness
Sept. 19, 2023, 11:08 p.m. UTC
From: Thomas Gleixner <tglx@linutronix.de>

WARN/OOPS/PANIC require printing out immediately since the regular printing method (and in the future, the printing threads) might not be able to run.

Add per-CPU state to denote the priority/urgency of the output and provide functions to mark the beginning and end of sections where the urgent messages are generated.

Note that when a CPU is in a priority elevated state, flushing only occurs when dropping back to a lower priority. This allows the full set of printk records (WARN/OOPS/PANIC output) to be stored in the ringbuffer before beginning to flush the backlog.

Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
---
 include/linux/console.h |  4 ++
 kernel/printk/nbcon.c   | 89 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 93 insertions(+)
Comments
On Wed 2023-09-20 01:14:49, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
>
> WARN/OOPS/PANIC require printing out immediately since the
> regular printing method (and in the future, the printing
> threads) might not be able to run.
>
> Add per-CPU state to denote the priority/urgency of the output
> and provide functions to mark the beginning and end of sections
> where the urgent messages are generated.
>
> Note that when a CPU is in a priority elevated state, flushing
> only occurs when dropping back to a lower priority. This allows
> the full set of printk records (WARN/OOPS/PANIC output) to be
> stored in the ringbuffer before beginning to flush the backlog.

The above paragraph is a bit confusing. The code added by this patch does not do any flushing.

I guess that this last paragraph is supposed to explain why the "nesting" array is needed. I would write something like:

"The state also counts nesting of printing contexts per-priority. It will later be used to prevent flushing in nested contexts."

That said, I am not sure if the special handling of nested contexts is needed. But let's discuss it in the patch introducing the flush functions.

> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -961,6 +961,95 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
>  	return nbcon_context_exit_unsafe(ctxt);
>  }
>
> +/**
> + * struct nbcon_cpu_state - Per CPU printk context state
> + * @prio:	The current context priority level
> + * @nesting:	Per priority nest counter
> + */
> +struct nbcon_cpu_state {
> +	enum nbcon_prio		prio;
> +	int			nesting[NBCON_PRIO_MAX];
> +};
> +
> +static DEFINE_PER_CPU(struct nbcon_cpu_state, nbcon_pcpu_state);
> +static struct nbcon_cpu_state early_nbcon_pcpu_state __initdata;
> +
> +/**
> + * nbcon_get_cpu_state - Get the per CPU console state pointer
> + *
> + * Returns either a pointer to the per CPU state of the current CPU or to
> + * the init data state during early boot.
> + */
> +static __ref struct nbcon_cpu_state *nbcon_get_cpu_state(void)
> +{
> +	if (!printk_percpu_data_ready())
> +		return &early_nbcon_pcpu_state;

My first thought was that this was racy. I was afraid that printk_percpu_data_ready() could change value inside the atomic_enter()/exit() area. But it actually could not happen.

Anyway, it might be worth a comment. Something like:

	/*
	 * The value of __printk_percpu_data_ready is modified in normal
	 * context. As a result it could never change inside a nbcon
	 * atomic context.
	 */
	if (!printk_percpu_data_ready())
		return &early_nbcon_pcpu_state;

> +
> +	return this_cpu_ptr(&nbcon_pcpu_state);
> +}
> +
> +/**
> + * nbcon_atomic_exit - Exit a context that enforces atomic printing
> + * @prio:	Priority of the context to leave
> + * @prev_prio:	Priority of the previous context for restore
> + *
> + * Context:	Any context. Enables migration.
> + *
> + * @prev_prio is the priority returned by the corresponding
> + * nbcon_atomic_enter().
> + */
> +void nbcon_atomic_exit(enum nbcon_prio prio, enum nbcon_prio prev_prio)
> +{
> +	struct nbcon_cpu_state *cpu_state;
> +
> +	cpu_state = nbcon_get_cpu_state();

I would add a consistency check:

	WARN_ON_ONCE(cpu_state->nesting[cpu_state->prio] <= 0);

> +	/*
> +	 * Undo the nesting of nbcon_atomic_enter() at the CPU state
> +	 * priority.
> +	 */
> +	cpu_state->nesting[cpu_state->prio]--;
> +
> +	/*
> +	 * Restore the previous priority, which was returned by
> +	 * nbcon_atomic_enter().
> +	 */
> +	cpu_state->prio = prev_prio;
> +
> +	migrate_enable();
> +}

Best Regards,
Petr
On 2023-09-22, Petr Mladek <pmladek@suse.com> wrote:
>> Note that when a CPU is in a priority elevated state, flushing
>> only occurs when dropping back to a lower priority. This allows
>> the full set of printk records (WARN/OOPS/PANIC output) to be
>> stored in the ringbuffer before beginning to flush the backlog.
>
> The above paragraph is a bit confusing. The code added by this patch
> does not do any flushing.

You are right. I should put this patch after patch 5 "printk: nbcon: Provide function for atomic flushing" to simplify the introduction.

> I guess that this last paragraph is supposed to explain why the
> "nesting" array is needed.

No, it is explaining how this feature works in general. The term "priority elevated state" means the CPU is in an atomic write section.

The "nesting" array is needed in order to support a feature that is not explained in the commit message: If nested OOPS/WARN/PANIC occur, only the outermost OOPS/WARN/PANIC will do the flushing. I will add this information to the commit message.

>> +static __ref struct nbcon_cpu_state *nbcon_get_cpu_state(void)
>> +{
>> +	if (!printk_percpu_data_ready())
>> +		return &early_nbcon_pcpu_state;
>
> it might be worth a comment. Something like:
>
>	/*
>	 * The value of __printk_percpu_data_ready is modified in normal
>	 * context. As a result it could never change inside a nbcon
>	 * atomic context.
>	 */
>	if (!printk_percpu_data_ready())
>		return &early_nbcon_pcpu_state;

OK.

>> +void nbcon_atomic_exit(enum nbcon_prio prio, enum nbcon_prio prev_prio)
>> +{
>> +	struct nbcon_cpu_state *cpu_state;
>> +
>> +	cpu_state = nbcon_get_cpu_state();
>
> I would add a consistency check:
>
>	WARN_ON_ONCE(cpu_state->nesting[cpu_state->prio] <= 0);

OK.

John
On Mon 2023-09-25 11:31:54, John Ogness wrote:
> On 2023-09-22, Petr Mladek <pmladek@suse.com> wrote:
> >> Note that when a CPU is in a priority elevated state, flushing
> >> only occurs when dropping back to a lower priority. This allows
> >> the full set of printk records (WARN/OOPS/PANIC output) to be
> >> stored in the ringbuffer before beginning to flush the backlog.
> >
> > The above paragraph is a bit confusing. The code added by this patch
> > does not do any flushing.
>
> You are right. I should put this patch after patch 5 "printk: nbcon:
> Provide function for atomic flushing" to simplify the introduction.
>
> > I guess that this last paragraph is supposed to explain why the
> > "nesting" array is needed.
>
> No, it is explaining how this feature works in general. The term
> "priority elevated state" means the CPU is in an atomic write section.

This did not help me much at first. But I got it later after connecting more dots ;-)

IMHO, the problem was that the commit message introduced the terms in the following order:

  + WARN/OOPS/PANIC require printing out immediately
  + per-CPU state to denote the priority/urgency
  + functions to mark the beginning/end where the urgent messages are generated
  + when a CPU is in a priority elevated state, flushing only occurs when dropping back to a lower priority

From my POV:

  + It did not mention/explain "atomic write" at all.
  + It said that the urgent messages required immediate printing. And later, it said that they would get flushed later. Which is contradicting each other.
  + The handling of priorities is not only about CPU nesting. The same rules should apply also when another CPU is printing messages in a higher priority context.

My proposal:

  + Use the term "higher priority context" for all WARN/OOPS/PANIC messages.
  + Use "emergency priority context" or "nbcon emergency context" when talking about NBCON_PRIO_EMERGENCY context to avoid confusion with the printk log levels.
  + Use "panic priority context" or "nbcon panic context" for the panic one if we really want to add enter/exit for the panic context.
  + We must somewhere explain the "atomic context" and the "atomic_write" callback. One important question is why it is atomic. Is it because it:
      + _can_ be called in atomic context?
      + _must_ be called in atomic context?
    It is called also from console_unlock() for boot messages so it need not be in atomic context. What about renaming it to "nbcon_write" to avoid this confusion?

> The "nesting" array is needed in order to support a feature that is not
> explained in the commit message: If nested OOPS/WARN/PANIC occur, only
> the outermost OOPS/WARN/PANIC will do the flushing. I will add this
> information to the commit message.

What is the motivation for the feature, please?

1. Either we want to flush the messages in the higher priority context ASAP.

   The per-CPU lock has been designed exactly for this purpose. It would not need any extra nesting counter. And it would work consistently also when the lock is acquired on another CPU.

   It is simple, the context will either get the per-console lock or not. The (nested) context might get the lock only when it has higher priority. The flush would be called directly from vprintk_emit().

   I always thought that this was the planned behavior. IMHO, we want it. A nested panic() should be able to take over the console and flush messages ASAP. It will never return.

2. Or we want to wait until all messages in the higher priority context are stored into the ring buffer and flush them by the caller.

   Who would actually flush the higher messages? WARN() caller? The timer callback which detected softlockup? Or a completely different context? Who would flush panic() messages when panic() interrupted a locked CPU?

My proposal:

There are only two higher priority contexts:

  + NBCON_PRIO_PANIC should be used when panic_cpu == raw_smp_processor_id()
  + NBCON_PRIO_EMERGENCY context would require some enter/exit wrappers and tracking. But it does not necessarily need to be a per-CPU variable.

I think about adding "int printk_state" into struct task_struct. It might be useful also for other things, e.g. for storing the last log level of a non-finished message, or entering a section with an enforced minimal loglevel. Then we could have:

#define PRINTK_STATE_EMERGENCY_MASK	0x000000ff
#define PRINTK_STATE_EMERGENCY_OFFSET	0x00000001

void nbcon_emergency_enter(int *printk_state)
{
	*printk_state = current->printk_state;
	WARN_ON_ONCE((*printk_state & PRINTK_STATE_EMERGENCY_MASK) ==
		     PRINTK_STATE_EMERGENCY_MASK);
	current->printk_state = *printk_state + PRINTK_STATE_EMERGENCY_OFFSET;
}

void nbcon_emergency_exit(int printk_state)
{
	WARN_ON_ONCE(!(current->printk_state & PRINTK_STATE_EMERGENCY_MASK));
	current->printk_state = printk_state;
}

enum nbcon_prio nbcon_get_default_prio(void)
{
	if (panic_cpu == raw_smp_processor_id())
		return NBCON_PRIO_PANIC;

	/* Does current always exist? What about fork? */
	if (current && (current->printk_state & PRINTK_STATE_EMERGENCY_MASK))
		return NBCON_PRIO_EMERGENCY;

	return NBCON_PRIO_NORMAL;
}

IMHO, it should be race free. And we could use it to initialize struct nbcon_context. The final decision will be left for nbcon_ctxt_try_acquire().

Best Regards,
Petr
On 2023-09-25, Petr Mladek <pmladek@suse.com> wrote:
> From my POV:
>
> + It did not mention/explain "atomic write" at all

Agreed.

> + It said that the urgent messages required immediate printing.
>   And later, it said that they would get flushed later. Which
>   is contradicting each other.

I agree that the wording needs to be dramatically improved. The term "urgent message" was not meant to represent a single printk() call.

> + The handling of priorities is not only about CPU nesting.
>   The same rules should apply also when another CPU is printing
>   messages in a higher priority context.

Sorry, I do not understand what you mean here. Each CPU is responsible for its own state. If another CPU is in a higher priority state, that CPU will be responsible for ensuring its own WARN/OOPS is stored and flushed. (From v2 I see that the CPU does not try hard enough. I would fix that for v3.)

> My proposal:
>
> + Use the term "higher priority context" for all WARN/OOPS/PANIC
>   messages.
>
> + Use "emergency priority context" or "nbcon emergency context"
>   when talking about NBCON_PRIO_EMERGENCY context to avoid
>   confusion with the printk log levels.
>
> + Use "panic priority context" or "nbcon panic context" for the panic
>   one if we really want to add enter/exit for the panic context.

There are 3 different types of priority:

1. The printk record priority: KERN_ERR, KERN_WARNING, etc.

2. The priority of a console owner: used for controlled takeovers.

3. The priority state of a CPU: only elevated for urgent messages, used to store all urgent messages and then afterwards print directly by taking ownership of consoles.

I need to choose terminology carefully to make it easy to distinguish between these 3 types. v2 failed to name, describe, and document this correctly.

> + We must somewhere explain the "atomic context" and the "atomic_write"
>   callback. One important question is why it is atomic. Is it because it:
>
>   + _can_ be called in atomic context?
>   + _must_ be called in atomic context?

Its main reason for existing is because it can be called from atomic (including NMI) contexts. But like atomic_t, it can be used from any context.

> It is called also from console_unlock() for boot messages
> so it need not be in atomic context.
>
> What about renaming it to "nbcon_write" to avoid this confusion?

When we introduce the threads, there will be a 2nd callback (currently planned write_thread()). This callback is guaranteed to be called from a printing kthread, which for console drivers like fbcon will prove to be helpful in cleaning up its code.

I will reserve the word "atomic" _only_ when talking about which printing callback is used. That should help to avoid associating the callback with a certain context or priority. But I think the name "write_atomic" is appropriate.

>> The "nesting" array is needed in order to support a feature that is not
>> explained in the commit message: If nested OOPS/WARN/PANIC occur, only
>> the outermost OOPS/WARN/PANIC will do the flushing. I will add this
>> information to the commit message.
>
> What is the motivation for the feature, please?

During the demo at LPC2022 we had the situation that there was a large backlog when a WARN was hit. With current mainline the first line of the WARN is put into the ringbuffer and then the entire backlog is flushed before storing the rest of the WARN into the ringbuffer. At the time it was obvious that we should finish storing the WARN message and then start flushing the backlog.

> 1. Either we want to flush the messages in the higher priority context
>    ASAP.
>
>    The per-CPU lock has been designed exactly for this purpose. It
>    would not need any extra nesting counter. And it would work
>    consistently also when the lock is acquired on another CPU.
>
>    It is simple, the context will either get the per-console lock or
>    not. The (nested) context might get the lock only when it has higher
>    priority. The flush would be called directly from vprintk_emit().
>
>    I always thought that this was the planned behavior.

It was. But then it was suggested by Thomas and acked by Linus that we were taking the wrong approach. Rather than a global state for all the consoles, each console should have separate owners with states, allowing handovers/takeovers. And CPUs should have urgency states to control how the ringbuffer is stored and when direct printing should take place for that CPU. This idea is similar to the cpu_sync approach, but with timeouts, handovers/takeovers, and is per-console.

This approach gives us a lot of flexibility to enforce logical and safe policies for storing and printing in different situations. And if named/documented correctly, I think it will be easy to understand.

> 2. Or we want to wait until all messages in the higher priority context
>    are stored into the ring buffer and flush them by the caller.

Yes.

> Who would actually flush the higher messages?
> WARN() caller?

Yes.

> The timer callback which detected softlockup?

Yes.

> Or a completely different context?

Another CPU could do some helpful backlog printing if it sees new messages and is able to become an owner. But it is not the CPU that is responsible for making certain that the messages have been printed.

> Who would flush panic() messages when panic() interrupted
> a locked CPU?

The panic CPU can take over (as a last resort, unsafely if necessary). The panic CPU is responsible for getting those messages out.

> My proposal:
>
> There are only two higher priority contexts:
>
> + NBCON_PRIO_PANIC should be used when panic_cpu == raw_smp_processor_id()

This is the case with v2.

> + NBCON_PRIO_EMERGENCY context would require some enter/exit wrappers
>   and tracking. But it does not necessarily need to be a per-CPU
>   variable.
>
> I think about adding "int printk_state" into struct task_struct.
> It might be useful also for other things, e.g. for storing the last
> log level of a non-finished message, or entering a section with an
> enforced minimal loglevel.

printk() calls are quite rare. And warning states are even more rare. IMHO adding such a field to every task is a huge price to pay. Also, printk operates in non-task contexts (hardirq, nmi). Although @current probably points to something existing, there could be some tricky corner cases. IMHO per-cpu urgency state variables and per-console owner state variables allow a much simpler picture in all contexts.

My proposal:

You have provided excellent feedback regarding naming and documentation. Allow me to fix these things to clarify the various functions and their roles.

John
Adding Linus into Cc.

I would like to be sure about the flushing of atomic consoles in panic context.

> During the demo at LPC2022 we had the situation that there was a large
> backlog when a WARN was hit. With current mainline the first line of the
> WARN is put into the ringbuffer and then the entire backlog is flushed
> before storing the rest of the WARN into the ringbuffer. At the time it
> was obvious that we should finish storing the WARN message and then
> start flushing the backlog.

This talks about the "emergency" context (WARN/OOPS/watchdog). The system might be in big trouble but it would still try to continue.

Do we really want to defer the flush also for the panic() context? I ask because I was not at LPC 2022 in person and I do not remember all the details.

Anyway, the deferred flush works relatively well for the "emergency" context:

  + flushed from nbcon_atomic_exit()
  + the printk kthread might emit the messages while they are being added

But it is tricky in panic(), see the 8th patch at https://lore.kernel.org/r/20230919230856.661435-9-john.ogness@linutronix.de

  + nbcon_atomic_exit() is called only in one code path.
  + nbcon_atomic_flush_all() is used in other paths. It looks like a "Whack a mole" game to me.
  + messages are never emitted by the printk kthread either because CPUs are stopped or the kthread is not allowed to get the lock[*]

I see only one positive of the explicit flush. The consoles would not delay crash_kexec() and the crash dump might be closer to the point where panic() was called.

Otherwise I see only negatives => IMHO, we want to flush atomic consoles synchronously from printk() in panic().

Does anyone really want explicit flushes in panic()?

[*] Emitting messages is explicitly blocked on non-panic CPUs. It increases the chance that the panic CPU would be able to take the console lock the safe way.

Best Regards,
Petr
On Thu 2023-10-05 14:57:47, John Ogness wrote: > On 2023-09-25, Petr Mladek <pmladek@suse.com> wrote: > > From my POV: > > > > + It did not mention/explained "atomic write" at all > > Agreed. > > > + It said that the urgent messages required immediate printing. > > And Later, it said that they would get flushed later. Which > > is contradicting each other. > > I agree that the wording needs to be dramatically improved. The term > "urgent message" was not meant to represent a single printk() call. > > > + The handling of priorities is not only about CPU nesting. > > The same rules should apply also when other CPU is printing > > messages in a higher priority context. > > Sorry, I do not understand what you mean here. Each CPU is responsible > for its own state. If another CPU is in a higher priority state, that > CPU will be responsible for ensuring its own WARN/OOPS is stored and > flushed. You are right that my comment was confusing. I had in mind that flushing one emergency context would flush all pending messages from other CPUs and even from other emergency context. Your explanation is better. > (From v2 I see that the CPU does not try hard enough. I would > fix that for v3.) Yes, this should be fixed. > There are 3 different types of priority: > > 1. The printk record priority: KERN_ERR, KERN_WARNING, etc. > > 2. The priority of a console owner: used for controlled takeovers. > > 3. The priority state of a CPU: only elevated for urgent messages, used > to store all urgent messages and then afterwards print directly by > taking ownership of consoles. I know that the you want to distinguish 2. and 3. But I think that they should be the same. I mean that nbcon_context_try_acquire() should use the PRIO according to what context it is in. Is there any situation where it should be different, please? IMHO, it might simplify some logic when they are the same. > I need to choose terminology carefully to make it easy to distinguish > between these 3 types. 
v2 failed to name, describe, and document this > correctly. > > + We must somewhere explain the "atomic context" and "atomic_write". > > callback. One important question is why it is atomic. Is it because it > > > > + _can_ be called in atomic context? > > + _must_ be called in atomic context? > > Its main reason for existing is because it can be called from atomic > (including NMI) contexts. But like atomic_t, it can be used from any > context. > > > It is called also from console_unlock() for boot messages > > so it need not be in atomic context. > > > > What about renaming it to "nbcon_write" to avoid this confusion? > > When we introduce the threads, there will be a 2nd callback (currently > planned write_thread()). This callback is guaranteed to be called from a > printing kthread, which for console drivers like fbcon will prove to be > helpful in cleaning up its code. I see. > I will reserve the word "atomic" _only_ when talking about which > printing callback is used. That should help to avoid associating the > callback with a certain context or priority. But I think the name > "write_atomic" is appropriate. Sounds good. > >> The "nesting" array is needed in order to support a feature that is not > >> explained in the commit message: If nested OOPS/WARN/PANIC occur, only > >> the outermost OOPS/WARN/PANIC will do the flushing. I will add this > >> information to the commit message. > > > > What is the motivation for the feature, please? > > During the demo at LPC2022 we had the situation that there was a large > backlog when a WARN was hit. With current mainline the first line of the > WARN is put into the ringbuffer and then the entire backlog is flushed > before storing the rest of the WARN into the ringbuffer. At the time it > was obvious that we should finish storing the WARN message and then > start flushing the backlog. This talks about contenxt using NBCON_PRIO_EMERGENCY. 
I am pretty sure that we want to flush the messages synchronously from
printk() in panic(). Let's discuss this in the other thread with Linus
in Cc [1].

> > My proposal:
> >
> > There are only two higher priority contexts:
> >
> > + NBCON_PRIO_PANIC should be used when panic_cpu == raw_smp_processor_id()
>
> This is the case with v2.

Ah, I see, nbcon_atomic_enter(NBCON_PRIO_PANIC) is called right after
setting panic_cpu. But this is actually another reason to get rid of
the nbcon_atomic_enter() call. Why?

+ The information about entering a context with NBCON_PRIO_PANIC is
  already provided by the panic_cpu variable.

+ Nesting is not possible because only one context is allowed to
  acquire panic_cpu.

+ nbcon_atomic_exit() is almost useless because it is called only in
  one code path. Explicit nbcon_atomic_flush_all() calls are needed in
  other code paths.

IMHO, getting rid of nbcon_atomic_enter(NBCON_PRIO_PANIC) would have
several advantages:

+ The enter()/exit() API would be needed only for NBCON_PRIO_EMERGENCY.
  We could call it nbcon_emergency_enter()/exit() and avoid the
  confusing atomic_enter name.

+ Nesting is important only for the NBCON_PRIO_EMERGENCY context
  => the per-CPU variable might be a simple reference counter
  => easier code and no need to remember the previous value and pass
  it to _exit() as a parameter.

> > + NBCON_PRIO_EMERGENCY context would require some enter/exit wrappers
> >   and tracking. But it does not necessarily need to be a per-CPU
> >   variable.
> >
> > I think about adding "int printk_state" into struct task_struct.
> > It might be useful also for other things, e.g. for storing the last
> > log level of a non-finished message. Entering a section with an
> > enforced minimal loglevel or so.
>
> printk() calls are quite rare. And warning states are even more
> rare. IMHO adding such a field to every task is a huge price to
> pay. Also, printk operates in non-task contexts (hardirq, nmi).
Although > @current probably points to something existing, there could be some > tricky corner cases. Fair enough. Per-CPU variables are fine after all. > My proposal: > > You have provided excellent feedback regarding naming and > documentation. Allow me to fix these things to clarify the various > functions and their roles. We should first agree on the approach in panic() in the other thread [1]. IMHO, the context using NBCON_PRIO_EMERGENCY should be the only one where we need enter/exit and the deferred flush. In this case, the per-CPU variable would be a simple nesting counter. And maybe the explicit flush would be needed only in emergency context. => easier code, logic, naming. [1] https://lore.kernel.org/r/ZSADUKp8oJ2Ws2vC@alley Best Regards, Petr
Hi Petr,

On 2023-10-06, Petr Mladek <pmladek@suse.com> wrote:
>> During the demo at LPC2022 we had the situation that there was a large
>> backlog when a WARN was hit. With current mainline the first line of the
>> WARN is put into the ringbuffer and then the entire backlog is flushed
>> before storing the rest of the WARN into the ringbuffer. At the time it
>> was obvious that we should finish storing the WARN message and then
>> start flushing the backlog.
>
> This talks about the "emergency" context (WARN/OOPS/watchdog).
> The system might be in big troubles but it would still try to continue.
>
> Do we really want to defer the flush also for panic() context?

We can start flushing right after the backtrace is in the
ringbuffer. But flushing the backlog _before_ putting the backtrace into
the ringbuffer was not desired because if there is a large backlog, the
machine may not survive to get the backtrace out. And in that case it
won't even be in the ringbuffer to be used by other debugging
tools.

> I ask because I was not on LPC 2022 in person and I do not remember
> all details.

The LPC2022 demo/talk was recorded:

    https://www.youtube.com/watch?v=TVhNcKQvzxI

At 55:55 is where the situation occurred and triggered the conversation,
ultimately leading to this new feature.

You may also want to reread my summary:

    https://lore.kernel.org/lkml/875yheqh6v.fsf@jogness.linutronix.de

as well as Linus' follow-up message:

    https://lore.kernel.org/lkml/CAHk-=wieXPMGEm7E=Sz2utzZdW1d=9hJBwGYAaAipxnMXr0Hvg@mail.gmail.com

> But it is tricky in panic(), see 8th patch at
> https://lore.kernel.org/r/20230919230856.661435-9-john.ogness@linutronix.de
>
> + nbcon_atomic_exit() is called only in one code path.

Correct. When panic() is complete and the machine goes into its infinite
loop. This is also the point where it will attempt an unsafe flush, if
it could not get the messages out yet.

> + nbcon_atomic_flush_all() is used in other paths. It looks like
>   a "Whack a mole" game to me.
Several different outputs occur during panic(). The flush is everywhere
where something significant has been put into the ringbuffer and now it
would be OK to flush it.

> + messages are never emitted by printk kthread either because
>   CPUs are stopped or the kthread is not allowed to get the lock

Correct.

> I see only one positive of the explicit flush. The consoles would
> not delay crash_kexec() and the crash dump might be closer to
> the point where panic() was called.

It's only about getting the critical messages into the ringbuffer before
flushing. And since various things can go wrong during the many actions
within panic(), it makes sense to flush in between those actions.

> Otherwise I see only negatives => IMHO, we want to flush atomic
> consoles synchronously from printk() in panic().
>
> Does anyone really want explicit flushes in panic()?

So far you are the only one speaking against it. I expect as time goes
on it will get even more complex as it becomes tunable (also something
we talked about during the demo).

John
On Sun 2023-10-08 12:19:21, John Ogness wrote:
> Hi Petr,
>
> On 2023-10-06, Petr Mladek <pmladek@suse.com> wrote:
> >> During the demo at LPC2022 we had the situation that there was a large
> >> backlog when a WARN was hit. With current mainline the first line of the
> >> WARN is put into the ringbuffer and then the entire backlog is flushed
> >> before storing the rest of the WARN into the ringbuffer. At the time it
> >> was obvious that we should finish storing the WARN message and then
> >> start flushing the backlog.
> >
> > This talks about the "emergency" context (WARN/OOPS/watchdog).
> > The system might be in big troubles but it would still try to continue.
> >
> > Do we really want to defer the flush also for panic() context?
>
> We can start flushing right after the backtrace is in the
> ringbuffer. But flushing the backlog _before_ putting the backtrace into
> the ringbuffer was not desired because if there is a large backlog, the
> machine may not survive to get the backtrace out. And in that case it
> won't even be in the ringbuffer to be used by other debugging
> tools.

We really have to distinguish emergency and panic context!

> > I ask because I was not on LPC 2022 in person and I do not remember
> > all details.
>
> The LPC2022 demo/talk was recorded:
>
>     https://www.youtube.com/watch?v=TVhNcKQvzxI
>
> At 55:55 is where the situation occurred and triggered the conversation,
> ultimately leading to this new feature.

Thanks for the link. My understanding is that the following scenario
happened:

1. System was booting and messages were being flushed using the kthread.

2. WARN() happened. It printed the 1st line, took over the per-console
   console lock and started flushing the backlog. There were still
   many pending messages from the boot.

3. NMI came and caused panic().
   The panic context printed its first line, took over the console from
   the WARN context, flushed the rest of the backlog and continued
   printing/flushing more messages from the panic() context.

Problem:

People complained that they saw only the first line from WARN().
The related detailed info, including the backtrace, was missing.

It was because panic() interrupted WARN() before it was able
to flush the backlog and print/flush all WARN() messages.

Proposed solution:

WARN()/emergency context should first store the messages and
flush them at the end.

It would increase the chance that all WARN() messages will
be stored in the ring buffer before NMI/panic() is called.

panic() would then flush all pending messages, including
the stored WARN() messages.

Important:

The problem is that panic() interrupted WARN(). There is _no_
problem with interrupting panic(). Let me repeat, a nested panic()
is not possible!

> You may also want to reread my summary:
>
>     https://lore.kernel.org/lkml/875yheqh6v.fsf@jogness.linutronix.de

Again, thanks for the pointer. Let me paste 2 paragraphs here:

<paste>
- Printing the backlog is important! If some emergency situation occurs,
  make sure the backlog gets printed.

- When an emergency occurs, put the full backtrace into the ringbuffer
  before flushing any backlog. This ensures that the backtrace makes it
  into the ringbuffer in case a panic occurs while flushing the backlog.
</paste>

My understanding is:

The 1st paragraph is the reason why:

+ we have three priorities: normal, emergency, panic

+ messages in normal context might be offloaded to the kthread

+ emergency and panic context should try to flush the messages
  from this context.

The 2nd paragraph says that the emergency context should first store
the messages and flush them later. And the important part:

    "in case a panic occurs while flushing the backlog."

=> panic might interrupt emergency

It clearly distinguishes emergency and panic context.
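The store-then-flush ordering in the proposed solution can be modeled with a toy ring buffer. All names and structures here are invented for illustration; the point is only the ordering guarantee, not the kernel's actual data structures.

```c
#include <assert.h>

/*
 * Toy model: in emergency context, first store the complete report
 * into the ring buffer, then flush the backlog. If a panic interrupts
 * the flush, the full backtrace is nevertheless already in the buffer
 * for kmsg dumpers and other debugging tools to find.
 */
#define MAX_MSGS 16
static const char *ringbuf[MAX_MSGS];
static int stored;	/* messages placed in the ring buffer */
static int flushed;	/* messages pushed out to the console */

static void store(const char *msg)
{
	ringbuf[stored++] = msg;
}

static void flush_backlog(void)
{
	flushed = stored;
}

static void emergency_report(const char *lines[], int n)
{
	/* Store every line of the report first... */
	for (int i = 0; i < n; i++)
		store(lines[i]);
	/* ...and only then start flushing the whole backlog. */
	flush_backlog();
}
```

The old behavior would have called `flush_backlog()` after the first `store()`, which is exactly the window in which the LPC2022 demo lost the WARN backtrace.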
> as well as Linus' follow-up message:
>
>     https://lore.kernel.org/lkml/CAHk-=wieXPMGEm7E=Sz2utzZdW1d=9hJBwGYAaAipxnMXr0Hvg@mail.gmail.com

IMHO, the important part is:

<paste>
Yeah, I really liked the notion of doing the oops with just filling
the back buffer but still getting it printed out if something goes
wrong in the middle.
</paste>

He was talking about oops => emergency context.

Also he wanted to get it out when something goes wrong in the middle
=> panic in the middle?

And another paragraph:

<paste>
I doubt it ends up being an issue in practice, but basically I wanted
to just pipe up and say that the exact details of how much of the back
buffer needs to be flushed first _could_ be tweaked if it ever does
come up as an issue.
</paste>

Linus had doubts that there might be problems with a too-long backlog
in practice. And I have those doubts as well.

And this is my view. The deferred flush is trying to solve a corner
case and we are forgetting what blocked printk kthreads for >10 years.

> > But it is tricky in panic(), see 8th patch at
> > https://lore.kernel.org/r/20230919230856.661435-9-john.ogness@linutronix.de
> >
> > + nbcon_atomic_exit() is called only in one code path.
>
> Correct. When panic() is complete and the machine goes into its infinite
> loop. This is also the point where it will attempt an unsafe flush, if
> it could not get the messages out yet.
>
> > + nbcon_atomic_flush_all() is used in other paths. It looks like
> >   a "Whack a mole" game to me.
>
> Several different outputs occur during panic(). The flush is everywhere
> where something significant has been put into the ringbuffer and now it
> would be OK to flush it.

No, we could _never_ say that the flush is everywhere where something
important happens. panic() might fail anywhere. The last printed message
might be an important clue. And it can be any message.
> > + messages are never emitted by printk kthread either because
> >   CPUs are stopped or the kthread is not allowed to get the lock
>
> Correct.
>
> > I see only one positive of the explicit flush. The consoles would
> > not delay crash_kexec() and the crash dump might be closer to
> > the point where panic() was called.
>
> It's only about getting the critical messages into the ringbuffer before
> flushing. And since various things can go wrong during the many actions
> within panic(), it makes sense to flush in between those actions.

I am glad that we agree that "various things can go wrong during
panic". Are we sure that the patchset added the explicit flush right
after each possible problematic place? What about crash_kexec(), the
various kmsg dumpers, and notifiers?

> > Otherwise I see only negatives => IMHO, we want to flush atomic
> > consoles synchronously from printk() in panic().
> >
> > Does anyone really want explicit flushes in panic()?
>
> So far you are the only one speaking against it. I expect as time goes
> on it will get even more complex as it becomes tunable (also something
> we talked about during the demo).

From my POV:

1. We are just two people discussing it at the moment => word against
   word.

2. Please, try to read my reply again. I agreed with delaying the flush
   in emergency context. But I am strongly against explicit flushes
   during panic, at least in the initial implementation.

3. IMHO, there might be a lot of misunderstanding caused by the word
   "emergency" context. Some people might see it as OOPS/WARN/panic and
   others might want to handle panic in a special way.

   I see it:

   + emergency - huge danger, medical check needed, patient might survive

   + panic - patient is about to die, try to get some secrets from him
     before he dies. Any sentence of the secret might be important. It
     would be a pity to ignore his last A4 page just because it was not
     complete.
Sigh, I did my best to explain why the nesting problem is only in the
emergency context and why panic is different. Feel free to ask if
anything is still unclear.

Anyway, I am _not_ going to accept the explicit flushes in panic()
unless you show me a problem with interrupted panic() or Linus
explicitly says that the explicit flushes make sense.

Best Regards,
Petr
On 2023-10-09, Petr Mladek <pmladek@suse.com> wrote:
> We really have to distinguish emergency and panic context!

OK.

>> The LPC2022 demo/talk was recorded:
>>
>>     https://www.youtube.com/watch?v=TVhNcKQvzxI
>>
>> At 55:55 is where the situation occurred and triggered the conversation,
>> ultimately leading to this new feature.
>
> Thanks for the link. My understanding is that the following scenario
> has happened:
>
> 1. System was booting and messages were being flushed using the kthread.
>
> 2. WARN() happened. It printed the 1st line, took over the per-console
>    console lock and started flushing the backlog. There were still
>    many pending messages from the boot.
>
> 3. NMI came and caused panic(). The panic context printed its first line,
>    took over the console from the WARN context, flushed the rest
>    of the backlog and continued printing/flushing more messages from
>    the panic() context.
>
> Problem:
>
> People complained that they saw only the first line from WARN().
> The related detailed info, including backtrace, was missing.
>
> It was because panic() interrupted WARN() before it was able
> to flush the backlog and print/flush all WARN() messages.

Thanks for taking the time to review it in detail.

> Proposed solution:
>
> WARN()/emergency context should first store the messages and
> flush them at the end.
>
> It would increase the chance that all WARN() messages will
> be stored in the ring buffer before NMI/panic() is called.
>
> panic() would then flush all pending messages including
> the stored WARN() messages.

OK.

> Important:
>
> The problem is that panic() interrupted WARN().

Ack.

>> You may also want to reread my summary:
>>
>>     https://lore.kernel.org/lkml/875yheqh6v.fsf@jogness.linutronix.de
>
> Again, thanks for the pointer. Let me paste 2 paragraphs here:
>
> <paste>
> - Printing the backlog is important! If some emergency situation occurs,
>   make sure the backlog gets printed.
> - When an emergency occurs, put the full backtrace into the ringbuffer
>   before flushing any backlog. This ensures that the backtrace makes it
>   into the ringbuffer in case a panic occurs while flushing the backlog.
> </paste>
>
> My understanding is:
>
> 1st paragraph is the reason why:
>
> + we have three priorities: normal, emergency, panic
>
> + messages in normal context might be offloaded to kthread
>
> + emergency and panic context should try to flush the messages
>   from this context.
>
> 2nd paragraph talks about that emergency context should first store
> the messages and flush them later. And the important part:
>
>     "in case a panic occurs while flushing the backlog."
>
> => panic might interrupt emergency
>
> It clearly distinguishes emergency and panic context.
>
>> as well as Linus' follow-up message:
>>
>>     https://lore.kernel.org/lkml/CAHk-=wieXPMGEm7E=Sz2utzZdW1d=9hJBwGYAaAipxnMXr0Hvg@mail.gmail.com
>
> IMHO, the important part is:
>
> <paste>
> Yeah, I really liked the notion of doing the oops with just filling
> the back buffer but still getting it printed out if something goes
> wrong in the middle.
> </paste>
>
> He was talking about oops => emergency context
>
> Also he wanted to get it out when something goes wrong in the middle
> => panic in the middle ?
>
> And another paragraph:
>
> <paste>
> I doubt it ends up being an issue in practice, but basically I wanted
> to just pipe up and say that the exact details of how much of the back
> buffer needs to be flushed first _could_ be tweaked if it ever does
> come up as an issue.
> </paste>
>
> Linus had doubts that there might be problems with too long backlog
> in practice. And I have the doubts as well.
>
> And this is my view. The deferred flush is trying to solve a corner
> case and we are forgetting what blocked printk kthreads >10 years.

OK. Thank you for the detailed analysis.
For v3 I will do something similar to what you proposed [0], except
that I will use a per-cpu variable (to track printk emergency nesting)
instead of adding a new field to the task struct.

John Ogness

[0] https://lore.kernel.org/lkml/ZRLBxsXPCym2NC5Q@alley/
[Added more people in cc]

On 10/08/23 at 12:19pm, John Ogness wrote:
> Hi Petr,
>
> On 2023-10-06, Petr Mladek <pmladek@suse.com> wrote:
> >> During the demo at LPC2022 we had the situation that there was a large
> >> backlog when a WARN was hit. With current mainline the first line of the
> >> WARN is put into the ringbuffer and then the entire backlog is flushed
> >> before storing the rest of the WARN into the ringbuffer. At the time it
> >> was obvious that we should finish storing the WARN message and then
> >> start flushing the backlog.
> >
> > This talks about the "emergency" context (WARN/OOPS/watchdog).
> > The system might be in big troubles but it would still try to continue.
> >
> > Do we really want to defer the flush also for panic() context?
>
> We can start flushing right after the backtrace is in the
> ringbuffer. But flushing the backlog _before_ putting the backtrace into
> the ringbuffer was not desired because if there is a large backlog, the
> machine may not survive to get the backtrace out. And in that case it
> won't even be in the ringbuffer to be used by other debugging
> tools.
>
> > I ask because I was not on LPC 2022 in person and I do not remember
> > all details.
>
> The LPC2022 demo/talk was recorded:
>
>     https://www.youtube.com/watch?v=TVhNcKQvzxI
>
> At 55:55 is where the situation occurred and triggered the conversation,
> ultimately leading to this new feature.
>
> You may also want to reread my summary:
>
>     https://lore.kernel.org/lkml/875yheqh6v.fsf@jogness.linutronix.de
>
> as well as Linus' follow-up message:
>
>     https://lore.kernel.org/lkml/CAHk-=wieXPMGEm7E=Sz2utzZdW1d=9hJBwGYAaAipxnMXr0Hvg@mail.gmail.com
>
> > But it is tricky in panic(), see 8th patch at
> > https://lore.kernel.org/r/20230919230856.661435-9-john.ogness@linutronix.de
> >
> > + nbcon_atomic_exit() is called only in one code path.
>
> Correct. When panic() is complete and the machine goes into its infinite
> loop.
> This is also the point where it will attempt an unsafe flush, if
> it could not get the messages out yet.
>
> > + nbcon_atomic_flush_all() is used in other paths. It looks like
> >   a "Whack a mole" game to me.
>
> Several different outputs occur during panic(). The flush is everywhere
> where something significant has been put into the ringbuffer and now it
> would be OK to flush it.
>
> > + messages are never emitted by printk kthread either because
> >   CPUs are stopped or the kthread is not allowed to get the lock
>
> Correct.
>
> > I see only one positive of the explicit flush. The consoles would
> > not delay crash_kexec() and the crash dump might be closer to
> > the point where panic() was called.
>
> It's only about getting the critical messages into the ringbuffer before
> flushing. And since various things can go wrong during the many actions
> within panic(), it makes sense to flush in between those actions.
>
> > Otherwise I see only negatives => IMHO, we want to flush atomic
> > consoles synchronously from printk() in panic().
> >
> > Does anyone really want explicit flushes in panic()?
>
> So far you are the only one speaking against it. I expect as time goes
> on it will get even more complex as it becomes tunable (also something
> we talked about during the demo).

Flushing consoles in the panic kexec case does not sound good, but I
have no deep understanding of the atomic printk series, so I have added
the kexec list and reviewers in Cc.

> John

Thanks
Dave
Hi Dave,

On 2023-10-16, Dave Young <dyoung@redhat.com> wrote:
>> > Does anyone really want explicit flushes in panic()?
>>
>> So far you are the only one speaking against it. I expect as time
>> goes on it will get even more complex as it becomes tunable (also
>> something we talked about during the demo).
>
> Flushing consoles in the panic kexec case does not sound good, but I
> have no deep understanding of the atomic printk series, so I have
> added the kexec list and reviewers in Cc.

Currently every printk() message tries to flush immediately. This series
introduced a new method of first allowing a set of printk() messages to
be stored to the ringbuffer and then flushing the full set. That is what
this discussion was about.

The issue with allowing a set of printk() messages to be stored is that
you need to explicitly mark in code where the actual flushing should
occur. Petr's argument is that we do not want to insert "flush points"
into the panic() function and instead we should do as we do now: flush
each printk() message immediately.

In the end (for my upcoming v3 series) I agreed with Petr. We will
continue to keep things as they are now: flush each printk() message
immediately.

Currently consoles try to flush unsafely before kexec. With the atomic
printk series our goal is to only perform _safe_ flushing until all
other panic operations are complete. Only at the very end of panic()
would unsafe flushing be attempted.

John
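The safe-until-the-very-end policy John describes can be sketched with invented helpers; `flush_safe()` and `flush_final_unsafe()` are hypothetical stand-ins for illustration, not functions from the series.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the panic flush policy: during the panic sequence only
 * safe flushing is attempted, so console state cannot be corrupted
 * while kdump and friends still need to run. Only at the very end of
 * panic(), when nothing else is left to lose, is an unsafe hostile
 * takeover tried as a last resort.
 */
static bool console_safe;	/* can the console be acquired safely? */
static int safe_flushes, unsafe_flushes;

static void flush_safe(void)
{
	if (console_safe)
		safe_flushes++;
	/* Otherwise give up for now; messages stay in the ring buffer. */
}

static void flush_final_unsafe(void)
{
	/* End of panic(): try safely first, then force the takeover. */
	if (console_safe)
		safe_flushes++;
	else
		unsafe_flushes++;
}
```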
diff --git a/include/linux/console.h b/include/linux/console.h
index e4fc6f7c1496..25a3ddd39083 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -452,10 +452,14 @@ static inline bool console_is_registered(const struct console *con)
 	hlist_for_each_entry(con, &console_list, node)
 
 #ifdef CONFIG_PRINTK
+extern enum nbcon_prio nbcon_atomic_enter(enum nbcon_prio prio);
+extern void nbcon_atomic_exit(enum nbcon_prio prio, enum nbcon_prio prev_prio);
 extern bool nbcon_can_proceed(struct nbcon_write_context *wctxt);
 extern bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt);
 extern bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt);
 #else
+static inline enum nbcon_prio nbcon_atomic_enter(enum nbcon_prio prio) { return NBCON_PRIO_NONE; }
+static inline void nbcon_atomic_exit(enum nbcon_prio prio, enum nbcon_prio prev_prio) { }
 static inline bool nbcon_can_proceed(struct nbcon_write_context *wctxt) { return false; }
 static inline bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt) { return false; }
 static inline bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt) { return false; }
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index b96077152f49..9359906b575b 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -961,6 +961,95 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
 	return nbcon_context_exit_unsafe(ctxt);
 }
 
+/**
+ * struct nbcon_cpu_state - Per CPU printk context state
+ * @prio:	The current context priority level
+ * @nesting:	Per priority nest counter
+ */
+struct nbcon_cpu_state {
+	enum nbcon_prio		prio;
+	int			nesting[NBCON_PRIO_MAX];
+};
+
+static DEFINE_PER_CPU(struct nbcon_cpu_state, nbcon_pcpu_state);
+static struct nbcon_cpu_state early_nbcon_pcpu_state __initdata;
+
+/**
+ * nbcon_get_cpu_state - Get the per CPU console state pointer
+ *
+ * Returns either a pointer to the per CPU state of the current CPU or to
+ * the init data state during early boot.
+ */
+static __ref struct nbcon_cpu_state *nbcon_get_cpu_state(void)
+{
+	if (!printk_percpu_data_ready())
+		return &early_nbcon_pcpu_state;
+
+	return this_cpu_ptr(&nbcon_pcpu_state);
+}
+
+/**
+ * nbcon_atomic_enter - Enter a context that enforces atomic printing
+ * @prio:	Priority of the context
+ *
+ * Return:	The previous priority that needs to be fed into
+ *		the corresponding nbcon_atomic_exit()
+ * Context:	Any context. Disables migration.
+ */
+enum nbcon_prio nbcon_atomic_enter(enum nbcon_prio prio)
+{
+	struct nbcon_cpu_state *cpu_state;
+	enum nbcon_prio prev_prio;
+
+	migrate_disable();
+
+	cpu_state = nbcon_get_cpu_state();
+
+	prev_prio = cpu_state->prio;
+	if (prio > prev_prio)
+		cpu_state->prio = prio;
+
+	/*
+	 * Increment the nesting on @cpu_state->prio (instead of
+	 * @prio) so that a WARN() nested within a panic printout
+	 * does not attempt to scribble state.
+	 */
+	cpu_state->nesting[cpu_state->prio]++;
+
+	return prev_prio;
+}
+
+/**
+ * nbcon_atomic_exit - Exit a context that enforces atomic printing
+ * @prio:	Priority of the context to leave
+ * @prev_prio:	Priority of the previous context for restore
+ *
+ * Context:	Any context. Enables migration.
+ *
+ * @prev_prio is the priority returned by the corresponding
+ * nbcon_atomic_enter().
+ */
+void nbcon_atomic_exit(enum nbcon_prio prio, enum nbcon_prio prev_prio)
+{
+	struct nbcon_cpu_state *cpu_state;
+
+	cpu_state = nbcon_get_cpu_state();
+
+	/*
+	 * Undo the nesting of nbcon_atomic_enter() at the CPU state
+	 * priority.
+	 */
+	cpu_state->nesting[cpu_state->prio]--;
+
+	/*
+	 * Restore the previous priority, which was returned by
+	 * nbcon_atomic_enter().
+	 */
+	cpu_state->prio = prev_prio;
+
+	migrate_enable();
+}
+
 /**
  * nbcon_alloc - Allocate buffers needed by the nbcon console
  * @con:	Console to allocate buffers for