Message ID | 20230919230856.661435-7-john.ogness@linutronix.de |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:612c:172:b0:3f2:4152:657d with SMTP id h50csp3941514vqi; Wed, 20 Sep 2023 00:26:29 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFwvpeGaE7PLTvUx5IpohD8dMk37ToMbCN2BOX7oQURfUgdhUalQNRkFefOHZnCjXv76KTn X-Received: by 2002:a17:90b:4f88:b0:268:557e:1848 with SMTP id qe8-20020a17090b4f8800b00268557e1848mr1845475pjb.2.1695194789360; Wed, 20 Sep 2023 00:26:29 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1695194789; cv=none; d=google.com; s=arc-20160816; b=LJIFHQ1ZY9pcWvLYGg+fVIYOYqkACSmTuDWPWhXR9BBnmFacy/TpzFelsOpjK0afv1 Upk4J8ReseBR5L2vU9kdBtORTRfkaX4OAbQZHBpWS9cUPMt1k6LU/tKpk6ga6A1U+4sK ZE25GQjI8832SUBNWGaZR8eHTG1agIdEL2SxRuOUWccS3+jezZVvILqDffUhg+UwBiDI uC8LzhDkgU5fVdWkMDfSCkYL32hOlryYTUqdZUbRZEonCR1nMkcO8I4HL4LhXPpj8dsY vSf0NEROYI+vefPQqe1fStaKG3TVWNC+LyiMIKs8d4VESmIl0+btv3u/qlaPe3p3cda1 h2Rg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:dkim-signature :dkim-signature:from; bh=OzbPeuWkTorGv3d2yuxEAEu0gnIlvdjiqaqtf+AucME=; fh=pKnPKUsir0uEGSsor+4Zc2vgbu+g+ayvUgsdzkuXaoA=; b=r7NJLIPhyXGfPiD+OA06xYYBzMeyPcUhkIhVnrunQtll0IBASfcyiieNiW3y7PS+pz qKQhSprfdcAJOQzOfuuoCE8b4M2D2gEJH5drpOzE/c3lHurLvgZ21M9XkA2YxWAa/Oku lUz7DkhjHv6e8frbukryoPDjAHTnaRRdMz02KWtSTYtQNOX3l6R+cG1JcleWAxxcZ8uJ e7nZe56kT+sqEd46i8y1dWvnlYlwJjYrQn47s3jCGGBqaYU0aqswjIeGX9+KuVrPcZhJ H9LsTQ3sk+KuhnSsLXWaIjcLxVcQ0cAN1N/Wkc8smKxhXeXTXL3E6qeuJ6/OdT/Kh6Aa GiZA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=AcOxDy26; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: from snail.vger.email (snail.vger.email. [23.128.96.37]) by mx.google.com with ESMTPS id mv16-20020a17090b199000b00262ebe643a2si940349pjb.186.2023.09.20.00.26.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 20 Sep 2023 00:26:29 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) client-ip=23.128.96.37; Authentication-Results: mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=AcOxDy26; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id 3C7D981D7A78; Tue, 19 Sep 2023 16:09:58 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233660AbjISXJh (ORCPT <rfc822;toshivichauhan@gmail.com> + 26 others); Tue, 19 Sep 2023 19:09:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42296 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233410AbjISXJV (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Tue, 19 Sep 2023 19:09:21 -0400 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 15213C6 for <linux-kernel@vger.kernel.org>; Tue, 19 Sep 2023 16:09:15 -0700 (PDT) From: John Ogness <john.ogness@linutronix.de> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1695164952; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=OzbPeuWkTorGv3d2yuxEAEu0gnIlvdjiqaqtf+AucME=; b=AcOxDy26/wZHBrY1PDymcspuNPgrI1Epe/gZj0sMxSW60NjlAnd6XQ6j36zzzT9zkdYknI SyHklaTdzHgcn9LBQa9GGeaoNOXlRJLOqVyOGnYrx+zkzbIjOj+rGfYaUoKZUHiO1i/fJH oRx2nLs96R3O1IprrRjd2SBLf/ZzZYM+KNEtDNGbqNxLXnoWFeL0oc2AHgyQV9MqCQR7Qi 5z3jbF9q+31Ek8GNWMxr5rL6OX2Euegaz4BAcEFUl5UJQycdekHwYzhpuVHgZpAp3Ui5WR RMvkiMCLnfrW7ohY4ApfANrEpdmBI0//+TmvQRYsEISQxTsj/D2MYWTr6R03sw== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1695164952; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=OzbPeuWkTorGv3d2yuxEAEu0gnIlvdjiqaqtf+AucME=; b=gKmdzdskT7Ri+LOcvfojaRdY0IBLQEasZSxD8GXhIC5B7lEqTcovfFCrvzcWfX0+Tniup9 d/UYvefBi8WwQrAw== To: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org>, Steven Rostedt <rostedt@goodmis.org>, Thomas Gleixner <tglx@linutronix.de>, linux-kernel@vger.kernel.org Subject: [PATCH printk v2 06/11] printk: nbcon: Wire up nbcon console atomic flushing Date: Wed, 20 Sep 2023 01:14:51 +0206 Message-Id: <20230919230856.661435-7-john.ogness@linutronix.de> In-Reply-To: <20230919230856.661435-1-john.ogness@linutronix.de> References: <20230919230856.661435-1-john.ogness@linutronix.de> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-3.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,INVALID_DATE_TZ_ABSURD, RCVD_IN_DNSWL_MED,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Tue, 19 Sep 2023 16:09:58 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1777540571101023045 X-GMAIL-MSGID: 1777540571101023045 |
Series |
wire up nbcon atomic printing
|
|
Commit Message
John Ogness
Sept. 19, 2023, 11:08 p.m. UTC
From: Thomas Gleixner <tglx@linutronix.de> Perform nbcon console atomic flushing at key call sites: nbcon_atomic_exit() - When exiting from the outermost atomic printing section. If the priority was NBCON_PRIO_PANIC, attempt a second flush allowing unsafe hostile takeovers. console_flush_on_panic() - Called from several call sites to trigger ringbuffer dumping in an urgent situation. console_flush_on_panic() - Typically the panic() function will take care of atomic flushing the nbcon consoles on panic. However, there are several users of console_flush_on_panic() outside of panic(). printk_trigger_flush() - Used in urgent situations to trigger a dump in an irq_work context. However, the atomic flushing part happens in the calling context since an alternative context is not required. Co-developed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de> --- kernel/printk/nbcon.c | 10 ++++++++++ kernel/printk/printk.c | 6 ++++++ 2 files changed, 16 insertions(+)
Comments
On Wed 2023-09-20 01:14:51, John Ogness wrote: > From: Thomas Gleixner <tglx@linutronix.de> > > Perform nbcon console atomic flushing at key call sites: > > nbcon_atomic_exit() - When exiting from the outermost atomic > printing section. IMHO, it would not help because it all depends whether nbcon_context_try_acquire() succeeds or now, see below. > If the priority was NBCON_PRIO_PANIC, > attempt a second flush allowing unsafe hostile > takeovers. I was first scared that this would be called by any printk() in panic(). But it seems that nbcon_atomic_exit() is called only one after disabling printk(). It might deserve a comment where it is supposed to be used. See a proposal below. > console_flush_on_panic() - Called from several call sites to > trigger ringbuffer dumping in an urgent situation. > > console_flush_on_panic() - Typically the panic() function will This is a second description of console_flush_of_panic() which looks like a mistake. > take care of atomic flushing the nbcon consoles on > panic. However, there are several users of > console_flush_on_panic() outside of panic(). The generic panic() seems to use console_flush_on_panic() correctly at the very end. Hmm, I see that console_flush_on_panic() is called also in arch/loongarch/kernel/reset.c and arch/powerpc/kernel/traps.c. The loongarch code uses it in halt(). IMHO, it would be better to use the normal flush. But maybe they accept the risk. I know nothing about this architecture and who uses it. The function defined on powerpc is then used in: + arch/powerpc/platforms/powernv/opal.c: IMHO, it should be replaced there by normal flush. It seems to call the generic panic() later. + arch/powerpc/platforms/ps3/setup.c: This seems to be used as a panic notifier ppc_panic_platform_handler(). I am not completely sure but it does not look like the final flush. + arch/powerpc/platforms/pseries/setup.c: Same as ps3/setup.c. Also "just" a panic notifier. Anyway, we should make clear that console_flush_on_panic() might break the system and should be called as the last attempt to flush consoles. The above arch-specific users are worth review. > --- a/kernel/printk/nbcon.c > +++ b/kernel/printk/nbcon.c > @@ -1092,6 +1092,11 @@ void nbcon_atomic_flush_all(void) > * Return: The previous priority that needs to be fed into > * the corresponding nbcon_atomic_exit() > * Context: Any context. Disables migration. > + * > + * When within an atomic printing section, no atomic printing occurs. This > + * is to allow all emergency messages to be dumped into the ringbuffer before > + * flushing the ringbuffer. The comment sounds like it is an advantage. But I think that it would be a disadvantage. Fortunately, this is not true. Even the atomic context tries to flush the messages immediately when it is able to get the per-cpu lock. It happes via console_flush_all() called from console_unlock(). > + * The actual atomic printing occurs when exiting > + * the outermost atomic printing section. > */ > enum nbcon_prio nbcon_atomic_enter(enum nbcon_prio prio) > { > @@ -1130,8 +1135,13 @@ void nbcon_atomic_exit(enum nbcon_prio prio, enum nbcon_prio prev_prio) > { > struct nbcon_cpu_state *cpu_state; > > + __nbcon_atomic_flush_all(false); IMHO, this would not help. Either this context was able to acquire the lock and flush each message directly. Or it would fail to get the lock even here. > + > cpu_state = nbcon_get_cpu_state(); > > + if (cpu_state->prio == NBCON_PRIO_PANIC) > + __nbcon_atomic_flush_all(true); It would deserve a comment that nbcon_atomic_exit() is used in panic() to calm down the consoles completely, similar effect as setting the variable "suppress_printk". > + > /* > * Undo the nesting of nbcon_atomic_enter() at the CPU state > * priority. > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index 6ef33cefa4d0..419c0fabc481 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -3159,6 +3159,8 @@ void console_flush_on_panic(enum con_flush_mode mode) > console_srcu_read_unlock(cookie); > } > > + nbcon_atomic_flush_all(); > + > console_flush_all(false, &next_seq, &handover); It seems that console_flush_all() tries to flush nbcon conosoles as well. And nbcon_atomic_flush_all() is explicitely called close above this flush_on_panic(). This is from panic() after this patchset is applied. void panic(const char *fmt, ...) { [...] nbcon_atomic_flush_all(); console_unblank(); /* * We may have ended up stopping the CPU holding the lock (in * smp_send_stop()) while still having some valuable data in the console * buffer. Try to acquire the lock then release it regardless of the * result. The release will also print the buffers out. Locks debug * should be disabled to avoid reporting bad unlock balance when * panic() is not being callled from OOPS. */ debug_locks_off(); console_flush_on_panic(CONSOLE_FLUSH_PENDING); By other words, we try to flush nbcon consoles 3 times almost immediately after each other. I believe that there is a motivation to do so. Anyway, I want to make sure that it was on purpose. It would deserve some comment what is the purpose. Otherwise, people would tend to remove the "redundant" calls. > } > > @@ -3903,6 +3905,10 @@ void defer_console_output(void) > > void printk_trigger_flush(void) > { > + migrate_disable(); > + nbcon_atomic_flush_all(); > + migrate_enable(); Can this be actually called in NMI? > + > defer_console_output(); > } This function would deserve some description because it looks strange. There are three names which are contradicting each other: + trigger_flush: it sounds like it tells someone to do the flush + nbcon_atomic_flush_all: does the flush immediately + defer_console_output: sounds like it prevents the current context to flush the consoles immediately We should either choose better names and/or add comments: /** * console_flush_or_trigger() - Make sure that the consoles will get flushed. * * Try to flush consoles when possible or queue flushing consoles like * in the deferred printk. * * Context: Can be used in any context (including NMI?) */ void printk_flush_or_trigger(void) { /* * Try to flush consoles which do not depend on console_lock() * and support safe atomic_write() */ migrate_disable(); nbcon_atomic_flush_all(); migrate_enable(); /* Try flushing legacy consoles or queue the deferred handling. */ if (!in_nmi() && console_trylock()) console_unlock(); else defer_console_output(); } All in all. I feel a bit overwhelmed by the many *flush*() functions at the moment. Some flush only nbcon consoles. Some flush all. Some are using only the safe takeover/handover and some allow also harsh takeover. Some ignore port->lock because of bust_spinlock(). Some are ignoring console_lock. They are called on different locations. The nbcon variants are called explicitly and also inside the generic *flush*() functions. It is partly because it is Friday evening. Anyway, I need to think more about it. Best Regards, Petr
On 2023-09-22, Petr Mladek <pmladek@suse.com> wrote: >> Perform nbcon console atomic flushing at key call sites: >> >> nbcon_atomic_exit() - When exiting from the outermost atomic >> printing section. > > IMHO, it would not help because it all depends > whether nbcon_context_try_acquire() succeeds or now, see below. Sure, but it first attempts a _safe_ flush on all nbcon consoles that alow it, _then_ it falls back to an _unsafe_ flush (which cannot fail, but might explode). >> If the priority was NBCON_PRIO_PANIC, >> attempt a second flush allowing unsafe hostile >> takeovers. > > I was first scared that this would be called by any printk() > in panic(). But it seems that nbcon_atomic_exit() is called > only one after disabling printk(). It might deserve a comment > where it is supposed to be used. See a proposal below. OK. >> console_flush_on_panic() - Called from several call sites to >> trigger ringbuffer dumping in an urgent situation. >> >> console_flush_on_panic() - Typically the panic() function will > > This is a second description of console_flush_of_panic() which > looks like a mistake. Oops. The first one should not be there. >> take care of atomic flushing the nbcon consoles on >> panic. However, there are several users of >> console_flush_on_panic() outside of panic(). > > The generic panic() seems to use console_flush_on_panic() correctly > at the very end. > > Hmm, I see that console_flush_on_panic() is called also in [...] > Anyway, we should make clear that console_flush_on_panic() might break > the system and should be called as the last attempt to flush consoles. > The above arch-specific users are worth review. In an upcoming series you will see that console_flush_on_panic() only takes the console lock if there are legacy consoles. Ideally, eventually there will only be nbcon consoles, so your concern would disappear. And if those users continue to use legacy consoles, then the risks will be the same as now. >> * Return: The previous priority that needs to be fed into >> * the corresponding nbcon_atomic_exit() >> * Context: Any context. Disables migration. >> + * >> + * When within an atomic printing section, no atomic printing occurs. This >> + * is to allow all emergency messages to be dumped into the ringbuffer before >> + * flushing the ringbuffer. > > The comment sounds like it is an advantage. But I think that it would be > a disadvantage. Please explain. At LPC2022 we agreed it was an advantage and it is implemented on purpose using the atomic printing sections. > Fortunately, this is not true. Even the atomic context tries > to flush the messages immediately when it is able to get > the per-cpu lock. It happes via console_flush_all() called > from console_unlock(). You are talking about legacy consoles. The goal of the nbcon consoles is to introduce a _new_ console type to support controlled decisions for emergency printing. Legacy consoles will continue to work (or not work) as before. >> @@ -1130,8 +1135,13 @@ void nbcon_atomic_exit(enum nbcon_prio prio, enum nbcon_prio prev_prio) >> { >> struct nbcon_cpu_state *cpu_state; >> >> + __nbcon_atomic_flush_all(false); > > IMHO, this would not help. Either this context was able to acquire the > lock and flush each message directly. What do you mean by "help"? We are exiting an atomic printing section. This is where the accumulated emergency messages are to be printed. If this is a nested atomic printing section, it does nothing because the outermost atomic printing section will flush. > Or it would fail to get the lock even here. If it fails to get the lock, we are talking about the worst case scenario. That scenario demands unsafe printing, which is only allowed in panic. If we are not in panic, it is assumed the current owner will take care of it. >> + >> cpu_state = nbcon_get_cpu_state(); >> >> + if (cpu_state->prio == NBCON_PRIO_PANIC) >> + __nbcon_atomic_flush_all(true); > > It would deserve a comment that nbcon_atomic_exit() is used > in panic() to calm down the consoles completely, similar effect > as setting the variable "suppress_printk". Actually, it is nbcon_atomic_enter() that does the calming down, but only for the emergency CPU. Other CPUs are allowed to print as wildly as they want. And all CPUs are allowed to continue to fill the ringbuffer. @suppress_printk is quite different as it globally blocks any new messages from being stored into the ringbuffer. This is not calming down the consoles, but rather putting a finite limit on what they produce. >> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c >> index 6ef33cefa4d0..419c0fabc481 100644 >> --- a/kernel/printk/printk.c >> +++ b/kernel/printk/printk.c >> @@ -3159,6 +3159,8 @@ void console_flush_on_panic(enum con_flush_mode mode) >> console_srcu_read_unlock(cookie); >> } >> >> + nbcon_atomic_flush_all(); >> + >> console_flush_all(false, &next_seq, &handover); > > It seems that console_flush_all() tries to flush nbcon conosoles > as well. And nbcon_atomic_flush_all() is explicitely called > close above this flush_on_panic(). This is from panic() > after this patchset is applied. > > > void panic(const char *fmt, ...) > { > [...] > nbcon_atomic_flush_all(); > > console_unblank(); > > /* > * We may have ended up stopping the CPU holding the lock (in > * smp_send_stop()) while still having some valuable data in the console > * buffer. Try to acquire the lock then release it regardless of the > * result. The release will also print the buffers out. Locks debug > * should be disabled to avoid reporting bad unlock balance when > * panic() is not being callled from OOPS. > */ > debug_locks_off(); > console_flush_on_panic(CONSOLE_FLUSH_PENDING); > > > By other words, we try to flush nbcon consoles 3 times almost > immediately after each other. "Almost immediately" != "immediately". Keep in mind that this is in atomic printing context. Generally speaking, no messages will be seen on the consoles yet. Anytime time there is a signficant piece (such as console_unblank()) it would probably be wise to flush. We will probably want even more flush points. But I wanted to at least start with the risky points (such as console_unblank() when legacy consoles still exist). > It would deserve some comment what is the purpose. Otherwise, people > would tend to remove the "redundant" calls. OK. >> @@ -3903,6 +3905,10 @@ void defer_console_output(void) >> >> void printk_trigger_flush(void) >> { >> + migrate_disable(); >> + nbcon_atomic_flush_all(); >> + migrate_enable(); > > Can this be actually called in NMI? No. I need to add in_nmi() checks. >> defer_console_output(); >> } > > This function would deserve some description because it looks strange. > There are three names which are contradicting each other: > > + trigger_flush: it sounds like it tells someone to do the flush > > + nbcon_atomic_flush_all: does the flush immediately > > + defer_console_output: sounds like it prevents the current context > to flush the consoles immediately Agreed. > We should either choose better names and/or add comments: > > /** > * console_flush_or_trigger() - Make sure that the consoles will get flushed. > * > * Try to flush consoles when possible or queue flushing consoles like > * in the deferred printk. > * > * Context: Can be used in any context (including NMI?) > */ > void printk_flush_or_trigger(void) > { > /* > * Try to flush consoles which do not depend on console_lock() > * and support safe atomic_write() > */ if (!in_nmi()) migrate_disable(); > nbcon_atomic_flush_all(); if (!in_nmi()) migrate_enable(); > > /* Try flushing legacy consoles or queue the deferred handling. */ > if (!in_nmi() && console_trylock()) > console_unlock(); > else > defer_console_output(); > } I would be OK with this name and (fixed up for NMI) implementation. > All in all. I feel a bit overwhelmed by the many *flush*() functions at > the moment. Some flush only nbcon consoles. Some flush all. Some > are using only the safe takeover/handover and some allow also > harsh takeover. Some ignore port->lock because of bust_spinlock(). > Some are ignoring console_lock. They are called on different > locations. The nbcon variants are called explicitly and also > inside the generic *flush*() functions. Agreed. This madness exists in part because we are continuing to support legacy consoles. I tried to keep the two separate as much as possible. But when it comes to flushing, there will be some overlap. When working on the code, I tend to either look at the legacy consoles _or_ the nbcon consoles. If you try to visualize the whole picture on a system with legacy and nbcon consoles, it is a bit overwhelming. I recommend focussing on them separately. I.e. when talking about nbcon consoles, there is no reason to be concerned about the state of the console lock or the port lock. When talking about legacy consoles, there is no reason to be concerned about CPU atomic printing states. John
On Mon 2023-09-25 15:43:03, John Ogness wrote: > On 2023-09-22, Petr Mladek <pmladek@suse.com> wrote: > >> console_flush_on_panic() - Called from several call sites to > >> trigger ringbuffer dumping in an urgent situation. > >> > >> console_flush_on_panic() - Typically the panic() function will > > > > This is a second description of console_flush_of_panic() which > > looks like a mistake. > > Oops. The first one should not be there. > > >> take care of atomic flushing the nbcon consoles on > >> panic. However, there are several users of > >> console_flush_on_panic() outside of panic(). > > > > The generic panic() seems to use console_flush_on_panic() correctly > > at the very end. > > > > Hmm, I see that console_flush_on_panic() is called also in > > [...] > > > Anyway, we should make clear that console_flush_on_panic() might break > > the system and should be called as the last attempt to flush consoles. > > The above arch-specific users are worth review. > > In an upcoming series you will see that console_flush_on_panic() only > takes the console lock if there are legacy consoles. Ideally, eventually > there will only be nbcon consoles, so your concern would disappear. The legacy consoles have two risk levels: 1. post->lock is ignored after bust_spinlocks() 2. even console_lock is ignored in console_flush_on_panic() The nbcon consoles have only one risk level: 1. unsafe takeover is allowed First, I thought that we wanted to allow the unsafe takeover in console_flush_on_panic(). In that case, this function would be dangerous even for nbcon consoles. Now, I remember that we wanted to allow it only before entering the infinite loop (blinking diodes). In this case, console_flush_on_panic() would be really safe for nbcon consoles. > And if those users continue to use legacy consoles, then the risks will > be the same as now. > > >> * Return: The previous priority that needs to be fed into > >> * the corresponding nbcon_atomic_exit() > >> * Context: Any context. Disables migration. > >> + * > >> + * When within an atomic printing section, no atomic printing occurs. This > >> + * is to allow all emergency messages to be dumped into the ringbuffer before > >> + * flushing the ringbuffer. > > > > The comment sounds like it is an advantage. But I think that it would be > > a disadvantage. > > Please explain. At LPC2022 we agreed it was an advantage and it is > implemented on purpose using the atomic printing sections. I am sorry but I do not remember the details. Do you remember the motivation, please? In each case, we can't just say that this works by design because someone somewhere agreed on it. We must explain why this is better and I do not see it at the moment. I am terribly sorry if I agreed with this and I disagree now. I have never been good in life discussion because there is no enough time to think about all consequences. Anyway, the proposed behavior (agreed on LPC2022) seems to contradict the original plan from LPC 2019, see https://lore.kernel.org/all/87k1acz5rx.fsf@linutronix.de/ Namely: --- cut --- 3. Rather than defining emergency _messages_, we define an emergency _state_ where the kernel wants to flush the messages immediately before dying. Unlike oops_in_progress, this state will not be visible to anything outside of the printk infrastructure. 4. When in emergency state, the kernel will use a new console callback write_atomic() to flush the messages in whatever context the CPU is in at that moment. Only consoles that implement the NMI-safe write_atomic() will be able to flush in this state. --- cut --- We wanted to flush it ASAP. I wonder if we discussed some limitations where the messages could not be flushed immediately. Maybe, we discussed a scenario when there are many pending messages which would delay flushing the emergency ones. But we need to flush them anyway. Now, I do not see any real advantage to first store all messages and flush them later in the same context. OK, flushign them immediately might cause delay when flushing the first emergency one. But storing all might cause overwrinting the first emergency messages. I hope that my proposed change would actually make things easier and will not affect that much the upcoming patchsets. Best Regards, Petr
On 2023-09-26, Petr Mladek <pmladek@suse.com> wrote: > Anyway, the proposed behavior (agreed on LPC2022) seems > to contradict the original plan from LPC 2019, see > https://lore.kernel.org/all/87k1acz5rx.fsf@linutronix.de/ > Namely: > > --- cut --- > 3. Rather than defining emergency _messages_, we define an emergency > _state_ where the kernel wants to flush the messages immediately before > dying. Unlike oops_in_progress, this state will not be visible to > anything outside of the printk infrastructure. > > 4. When in emergency state, the kernel will use a new console callback > write_atomic() to flush the messages in whatever context the CPU is in > at that moment. Only consoles that implement the NMI-safe write_atomic() > will be able to flush in this state. > --- cut --- > > We wanted to flush it ASAP. In 2019 we didn't have all the code yet. Yes, we assumed that flushing each individual printk() call would be how to do it. But in 2022, during a live demo with real code, we saw that it was a bad idea. I disagree that the 2022 decisions are contradicting the 2019 decisions. We have added more details because now we have a real implementation. v2 establishes an emergency state before dying. v2 flushes the messages safely before dying (and unsafely _immediately_ before dying). v2 flushes the messages in whatever context the CPU is in. v2 uses write_atomic() to perform the flush. John
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c index 2a9ff18fc78c..82e6a1678363 100644 --- a/kernel/printk/nbcon.c +++ b/kernel/printk/nbcon.c @@ -1092,6 +1092,11 @@ void nbcon_atomic_flush_all(void) * Return: The previous priority that needs to be fed into * the corresponding nbcon_atomic_exit() * Context: Any context. Disables migration. + * + * When within an atomic printing section, no atomic printing occurs. This + * is to allow all emergency messages to be dumped into the ringbuffer before + * flushing the ringbuffer. The actual atomic printing occurs when exiting + * the outermost atomic printing section. */ enum nbcon_prio nbcon_atomic_enter(enum nbcon_prio prio) { @@ -1130,8 +1135,13 @@ void nbcon_atomic_exit(enum nbcon_prio prio, enum nbcon_prio prev_prio) { struct nbcon_cpu_state *cpu_state; + __nbcon_atomic_flush_all(false); + cpu_state = nbcon_get_cpu_state(); + if (cpu_state->prio == NBCON_PRIO_PANIC) + __nbcon_atomic_flush_all(true); + /* * Undo the nesting of nbcon_atomic_enter() at the CPU state * priority. diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 6ef33cefa4d0..419c0fabc481 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -3159,6 +3159,8 @@ void console_flush_on_panic(enum con_flush_mode mode) console_srcu_read_unlock(cookie); } + nbcon_atomic_flush_all(); + console_flush_all(false, &next_seq, &handover); } @@ -3903,6 +3905,10 @@ void defer_console_output(void) void printk_trigger_flush(void) { + migrate_disable(); + nbcon_atomic_flush_all(); + migrate_enable(); + defer_console_output(); }