Message ID | 20240202085408.23251-2-daniel.van.vugt@canonical.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel+bounces-49513-ouuuleilei=gmail.com@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7301:9bc1:b0:106:209c:c626 with SMTP id op1csp298097dyc; Fri, 2 Feb 2024 01:02:59 -0800 (PST) X-Google-Smtp-Source: AGHT+IFbdUo5usmhKLe7X65sSm+89xau/zQyCANx7bRVJJV8ZlWpvkHDO0NTHeA2WIiaJlo2Dx6s X-Received: by 2002:a05:6358:904b:b0:178:70ac:5a4f with SMTP id f11-20020a056358904b00b0017870ac5a4fmr8017462rwf.27.1706864578703; Fri, 02 Feb 2024 01:02:58 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1706864578; cv=pass; d=google.com; s=arc-20160816; b=ZIZceZbBnDwJu+ak8UgxmpVQNrz2oJzGonk8wvBk5m/7rRLSqOjgBHzsdKe/XR4b7L EjkY235EtOMMjm8wxlFbIZkfufQw/ugGNbfPobeUlkIXj5ccYza5jwSt9eXS9fOcBuCK 7sy1G3sblRAfos5G1X9RFPZy909smQsJMAJ9eqlOXM2ElELAZaorFCovDTMqdk9HVE1w lSrajB1UIUaVKIanhgOj00uzBgzX1FaH5WWKhAfRpVwI0jN/koKJG396cey2tNlHFQOK UQ1F6pc/7582frLRApd/3LOYWx36Mbfz2PtIuZ3nXdZgPZYoxgDZzTxzG0LXC+RmMtir t6Aw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=9y1paqiSjtE07Weh2QloWiCQOyB0q+cBO1seRMkGQ5U=; fh=OCysyjNjDae4U3xYPCiX3kh1WBjEGSrbn+6p2qLmFE8=; b=PQYjaFygCWRq31XbXDgBcbwS8afkvIXFu6jKAWp2u5XFVrsV9lqgt/SX0Go2kgioPx 1vvk633sC+0fDKNSFMgDYMEqTL2WTZLEuSexmwAYmT7iG0I9KBZAArRK9IcbjxWXr0Bw vHH7OVKYR6c42URBkjVY8HXu0PqjLWocrQ1RZm7F4MzyFjK4It5yweJU6Jn39V28Jujy pcLSv1p9k/u/44gpgQJPN8ueC5t1sIlv/qO8oIPfSKm85mh23N7pCRF2aO1a6AnoSvLI Bf15qmJ2eFClgtMdpGSJGoFcwCR/r5EhzbSzq9WjgxOFQXozpuQxVeEUpGSXov3hluiB nkpA==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@canonical.com header.s=20210705 header.b=jSwxgw1o; arc=pass (i=1 spf=pass spfdomain=canonical.com dkim=pass dkdomain=canonical.com dmarc=pass fromdomain=canonical.com); spf=pass (google.com: domain of linux-kernel+bounces-49513-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-49513-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=canonical.com X-Forwarded-Encrypted: i=1; AJvYcCX+56GjGw6ocR0zwZDJjcmuv77pU8oz0L0hyOqjYEymJRmvUxRjeSqa+CcO8fHiEFn+98pOKpmf/U8rLbV2Ek2Tj7Sfxg== Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id h4-20020a63f904000000b005d747b7058fsi1200721pgi.778.2024.02.02.01.02.58 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 02 Feb 2024 01:02:58 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-49513-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; dkim=pass header.i=@canonical.com header.s=20210705 header.b=jSwxgw1o; arc=pass (i=1 spf=pass spfdomain=canonical.com dkim=pass dkdomain=canonical.com dmarc=pass fromdomain=canonical.com); spf=pass (google.com: domain of linux-kernel+bounces-49513-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-49513-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=canonical.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 3231C291B60 for <ouuuleilei@gmail.com>; Fri, 2 Feb 2024 08:56:01 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 318C160DC6; Fri, 2 Feb 2024 08:55:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=canonical.com header.i=@canonical.com header.b="jSwxgw1o" Received: from smtp-relay-canonical-1.canonical.com (smtp-relay-canonical-1.canonical.com [185.125.188.121]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8FDBB60257; Fri, 2 Feb 2024 08:55:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=185.125.188.121 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706864146; cv=none; b=rVIHaSTdsQ9QiXnDiSCKX6BOPSEnSE0kwgZPyb0+M+2pBYfpQVPVe+9Rq+Kw7k0htWDmcsnjQD3h78VEwmYO+IKT70XS1oEKJ/I5AevCJiWAm5QEBTNvcRR+LB6Vmdbbf782a3SZ7nzB//2d9PS7YW1cW9xRObhDMPCl4y5c27Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706864146; c=relaxed/simple; bh=QJN4byeE5cmqPJcwaf5F7ajBalsBJwhZ6yaGrvXhP0c=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=aEjVoj/2imqTYFeQ+SpR4pEqAuETzcPBmKJbz2XY8kZKQqJiRuNqpVWld6KiPKk5/43T1kroTAJIQS5xbgiwqqoEqDC7ndfUKiCWlBHj1wBm0kZx0JlHSMTY5HCKeOVg+WIyhxVtpK3gh12QHeygyu9I9E5LJbwnrNJ5+5FNogQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=canonical.com; spf=pass smtp.mailfrom=canonical.com; dkim=pass (2048-bit key) header.d=canonical.com header.i=@canonical.com header.b=jSwxgw1o; arc=none smtp.client-ip=185.125.188.121 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=canonical.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=canonical.com Received: from dragon.fritz.box (unknown [58.7.187.102]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-canonical-1.canonical.com (Postfix) with ESMTPSA id 6EE233FE92; Fri, 2 Feb 2024 08:55:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=canonical.com; s=20210705; t=1706864133; bh=9y1paqiSjtE07Weh2QloWiCQOyB0q+cBO1seRMkGQ5U=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=jSwxgw1oinexsw8k0fTHX+F+36OtyHCsA6707oBJrYH9KWdzlxTFxDutKv68xnW/7 UfEwG4imbeeLaBu6PF50zny0Ko3uNsUGcnXCuDGo1+za4PyofPz1YTmmAisLYRxKFB RQNEtNMBu9f4mZqqtdFvj/420c0CRGIdO2lNVRM1c+Tt7A+UEYdnw3BlWbxh7qyskM H8Sux6yJcmNlyAU3TI/EhMQzFPNECg7SVkuhX6S8KzUlkgRfEo4V1XBNWXyBzCx/xw rqlCB9HpUFzKqVIZgLcwTjE9Y9ZM4j1yCGlS2NHDhJXaruP5ECg+/CzSaLLAqKgwA2 J40BhzF+wbxFA== From: Daniel van Vugt <daniel.van.vugt@canonical.com> To: Cc: Daniel van Vugt <daniel.van.vugt@canonical.com>, Mario Limonciello <mario.limonciello@amd.com>, Daniel Vetter <daniel@ffwll.ch>, Helge Deller <deller@gmx.de>, Jani Nikula <jani.nikula@intel.com>, Danilo Krummrich <dakr@redhat.com>, linux-fbdev@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org Subject: [PATCH 2/2] fbcon: Defer console takeover for splash screens to first switch Date: Fri, 2 Feb 2024 16:53:55 +0800 Message-ID: <20240202085408.23251-2-daniel.van.vugt@canonical.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240202085408.23251-1-daniel.van.vugt@canonical.com> References: <20240202085408.23251-1-daniel.van.vugt@canonical.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: <linux-kernel.vger.kernel.org> List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1789777232791267970 X-GMAIL-MSGID: 1789777232791267970 |
Series |
[1/2] dummycon: Add dummycon_(un)register_switch_notifier
|
|
Commit Message
Daniel van Vugt
Feb. 2, 2024, 8:53 a.m. UTC
Until now, deferred console takeover only meant defer until there is
output. But that risks stepping on the toes of userspace splash screens,
as console messages may appear before the splash screen. So check for the
"splash" parameter (as used by Plymouth) and if present then extend the
deferral until the first switch.
Closes: https://bugs.launchpad.net/bugs/1970069
Cc: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Daniel van Vugt <daniel.van.vugt@canonical.com>
---
drivers/video/fbdev/core/fbcon.c | 32 +++++++++++++++++++++++++++++---
1 file changed, 29 insertions(+), 3 deletions(-)
Comments
On 2/2/2024 02:53, Daniel van Vugt wrote: > Until now, deferred console takeover only meant defer until there is > output. But that risks stepping on the toes of userspace splash screens, > as console messages may appear before the splash screen. So check for the > "splash" parameter (as used by Plymouth) and if present then extend the > deferral until the first switch. > > Closes: https://bugs.launchpad.net/bugs/1970069 > Cc: Mario Limonciello <mario.limonciello@amd.com> > Signed-off-by: Daniel van Vugt <daniel.van.vugt@canonical.com> > --- > drivers/video/fbdev/core/fbcon.c | 32 +++++++++++++++++++++++++++++--- > 1 file changed, 29 insertions(+), 3 deletions(-) > > diff --git a/drivers/video/fbdev/core/fbcon.c b/drivers/video/fbdev/core/fbcon.c > index 63af6ab034..5b9f7635f7 100644 > --- a/drivers/video/fbdev/core/fbcon.c > +++ b/drivers/video/fbdev/core/fbcon.c > @@ -76,6 +76,7 @@ > #include <linux/crc32.h> /* For counting font checksums */ > #include <linux/uaccess.h> > #include <asm/irq.h> > +#include <asm/cmdline.h> > > #include "fbcon.h" > #include "fb_internal.h" > @@ -146,6 +147,7 @@ static inline void fbcon_map_override(void) > > #ifdef CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER > static bool deferred_takeover = true; > +static int initial_console = -1; > #else > #define deferred_takeover false > #endif > @@ -3341,7 +3343,7 @@ static void fbcon_register_existing_fbs(struct work_struct *work) > console_unlock(); > } > > -static struct notifier_block fbcon_output_nb; > +static struct notifier_block fbcon_output_nb, fbcon_switch_nb; > static DECLARE_WORK(fbcon_deferred_takeover_work, fbcon_register_existing_fbs); > > static int fbcon_output_notifier(struct notifier_block *nb, > @@ -3358,6 +3360,21 @@ static int fbcon_output_notifier(struct notifier_block *nb, > > return NOTIFY_OK; > } > + > +static int fbcon_switch_notifier(struct notifier_block *nb, > + unsigned long action, void *data) > +{ > + struct vc_data *vc = data; > + > + WARN_CONSOLE_UNLOCKED(); > + > + if (vc->vc_num != initial_console) { > + dummycon_unregister_switch_notifier(&fbcon_switch_nb); > + dummycon_register_output_notifier(&fbcon_output_nb); > + } > + > + return NOTIFY_OK; > +} > #endif > > static void fbcon_start(void) > @@ -3370,7 +3387,14 @@ static void fbcon_start(void) > > if (deferred_takeover) { > fbcon_output_nb.notifier_call = fbcon_output_notifier; > - dummycon_register_output_notifier(&fbcon_output_nb); > + fbcon_switch_nb.notifier_call = fbcon_switch_notifier; > + initial_console = fg_console; > + > + if (cmdline_find_option_bool(boot_command_line, "splash")) > + dummycon_register_switch_notifier(&fbcon_switch_nb); So there is a problem here that this would only apply if the distro happened to use "splash" which some distros use something different. I looked at the matching plymouth code [1] and they have a bunch of variations of what they accept and what it does. [1] https://gitlab.freedesktop.org/plymouth/plymouth/-/blob/main/src/main.c?ref_type=heads#L888 If from the kernel side we're going to have code that caters to the userspace behavior of plymouth we probably need to match all those cases they do too. Another alternative could be to make it a kernel configuration option for which string to look for to activate this behavior. > + else > + dummycon_register_output_notifier(&fbcon_output_nb); > + > return; > } > #endif > @@ -3417,8 +3441,10 @@ void __exit fb_console_exit(void) > { > #ifdef CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER > console_lock(); > - if (deferred_takeover) > + if (deferred_takeover) { > dummycon_unregister_output_notifier(&fbcon_output_nb); > + dummycon_unregister_switch_notifier(&fbcon_switch_nb); > + } > console_unlock(); > > cancel_work_sync(&fbcon_deferred_takeover_work);
Hi All, On 2/2/24 09:53, Daniel van Vugt wrote: > Until now, deferred console takeover only meant defer until there is > output. But that risks stepping on the toes of userspace splash screens, > as console messages may appear before the splash screen. So check for the > "splash" parameter (as used by Plymouth) and if present then extend the > deferral until the first switch. Daniel, thank you for your patch but I do not believe that this is the right solution. Deferring fbcon takeover further then after the first text is output means that any errors about e.g. a corrupt initrd or the kernel erroring out / crashing will not be visible. When the kernel e.g. oopses or panics because of not finding its rootfs (I tested the latter option when writing the original deferred fbcon takeover code) then fbcon must takeover and print the messages from the dying kernel so that the user has some notion of what is going wrong. And since your patch seems to delay switching till the first vc-switch this means that e.g. even after say gdm refusing to start because of issues there still will be no text output. This makes debugging various issues much harder. Moreover Fedora has been doing flickerfree boot for many years without needing this. The kernel itself will be quiet as long as you set CONFIG_CONSOLE_LOGLEVEL_QUIET=3 Ubuntu atm has set this to 4 which means any kernel pr_err() or dev_err() messages will get through and since there are quite a few false positives of those Ubuntu really needs to set CONFIG_CONSOLE_LOGLEVEL_QUIET=3 to fix part of: https://bugs.launchpad.net/bugs/1970069 After that it is "just" a matter of not making userspace output anything unless it has errors to report. systemd already is quiet by default (only logging errors) when quiet is on the kernel commandline. So any remaining issues are Ubuntu specific boot process bits and Ubuntu really should be able to make those by silent unless they have important info (errors or other unexpected things) to report. Given that this will make debugging boot issues much harder and that there are other IMHO better alternatives I'm nacking this patch: NACK. FWIW I believe that I'm actually saving Ubuntu from shooting themselves in the foot here, hiding all sort of boot errors (like the initrd not finding /) until the user does a magic alt+f2 followed by alt+f1 incantation really is not doing yourself any favors wrt debugging any sort of boot failures. Regards, Hans > Closes: https://bugs.launchpad.net/bugs/1970069 > Cc: Mario Limonciello <mario.limonciello@amd.com> > Signed-off-by: Daniel van Vugt <daniel.van.vugt@canonical.com> > --- > drivers/video/fbdev/core/fbcon.c | 32 +++++++++++++++++++++++++++++--- > 1 file changed, 29 insertions(+), 3 deletions(-) > > diff --git a/drivers/video/fbdev/core/fbcon.c b/drivers/video/fbdev/core/fbcon.c > index 63af6ab034..5b9f7635f7 100644 > --- a/drivers/video/fbdev/core/fbcon.c > +++ b/drivers/video/fbdev/core/fbcon.c > @@ -76,6 +76,7 @@ > #include <linux/crc32.h> /* For counting font checksums */ > #include <linux/uaccess.h> > #include <asm/irq.h> > +#include <asm/cmdline.h> > > #include "fbcon.h" > #include "fb_internal.h" > @@ -146,6 +147,7 @@ static inline void fbcon_map_override(void) > > #ifdef CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER > static bool deferred_takeover = true; > +static int initial_console = -1; > #else > #define deferred_takeover false > #endif > @@ -3341,7 +3343,7 @@ static void fbcon_register_existing_fbs(struct work_struct *work) > console_unlock(); > } > > -static struct notifier_block fbcon_output_nb; > +static struct notifier_block fbcon_output_nb, fbcon_switch_nb; > static DECLARE_WORK(fbcon_deferred_takeover_work, fbcon_register_existing_fbs); > > static int fbcon_output_notifier(struct notifier_block *nb, > @@ -3358,6 +3360,21 @@ static int fbcon_output_notifier(struct notifier_block *nb, > > return NOTIFY_OK; > } > + > +static int fbcon_switch_notifier(struct notifier_block *nb, > + unsigned long action, void *data) > +{ > + struct vc_data *vc = data; > + > + WARN_CONSOLE_UNLOCKED(); > + > + if (vc->vc_num != initial_console) { > + dummycon_unregister_switch_notifier(&fbcon_switch_nb); > + dummycon_register_output_notifier(&fbcon_output_nb); > + } > + > + return NOTIFY_OK; > +} > #endif > > static void fbcon_start(void) > @@ -3370,7 +3387,14 @@ static void fbcon_start(void) > > if (deferred_takeover) { > fbcon_output_nb.notifier_call = fbcon_output_notifier; > - dummycon_register_output_notifier(&fbcon_output_nb); > + fbcon_switch_nb.notifier_call = fbcon_switch_notifier; > + initial_console = fg_console; > + > + if (cmdline_find_option_bool(boot_command_line, "splash")) > + dummycon_register_switch_notifier(&fbcon_switch_nb); > + else > + dummycon_register_output_notifier(&fbcon_output_nb); > + > return; > } > #endif > @@ -3417,8 +3441,10 @@ void __exit fb_console_exit(void) > { > #ifdef CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER > console_lock(); > - if (deferred_takeover) > + if (deferred_takeover) { > dummycon_unregister_output_notifier(&fbcon_output_nb); > + dummycon_unregister_switch_notifier(&fbcon_switch_nb); > + } > console_unlock(); > > cancel_work_sync(&fbcon_deferred_takeover_work);
On 27/2/24 02:23, Hans de Goede wrote: > Hi All, > > On 2/2/24 09:53, Daniel van Vugt wrote: >> Until now, deferred console takeover only meant defer until there is >> output. But that risks stepping on the toes of userspace splash screens, >> as console messages may appear before the splash screen. So check for the >> "splash" parameter (as used by Plymouth) and if present then extend the >> deferral until the first switch. > > Daniel, thank you for your patch but I do not believe that this > is the right solution. Deferring fbcon takeover further then > after the first text is output means that any errors about e.g. > a corrupt initrd or the kernel erroring out / crashing will not > be visible. That's not really correct. If a boot failure has occurred after the splash then pressing escape shows the log. If a boot failure has occurred before the splash then it can be debugged visually by rebooting without the "splash" parameter. > > When the kernel e.g. oopses or panics because of not finding > its rootfs (I tested the latter option when writing the original > deferred fbcon takeover code) then fbcon must takeover and > print the messages from the dying kernel so that the user has > some notion of what is going wrong. Indeed, just reboot without the "splash" parameter to do that. > > And since your patch seems to delay switching till the first > vc-switch this means that e.g. even after say gdm refusing > to start because of issues there still will be no text > output. This makes debugging various issues much harder. I've debugged many gdm failures and it is never useful to use the console for those. Reboot and get the system journal instead. > > Moreover Fedora has been doing flickerfree boot for many > years without needing this. I believe Fedora has a mostly working solution, but not totally reliable, as mentioned in the commit message: "even systems whose splash exists in initrd may not be not immune because they still rely on racing against all possible kernel messages that might trigger the fbcon takeover" > > The kernel itself will be quiet as long as you set > CONFIG_CONSOLE_LOGLEVEL_QUIET=3 Ubuntu atm has set this > to 4 which means any kernel pr_err() or dev_err() > messages will get through and since there are quite > a few false positives of those Ubuntu really needs > to set CONFIG_CONSOLE_LOGLEVEL_QUIET=3 to fix part of: > https://bugs.launchpad.net/bugs/1970069 Incorrect. In my testing some laptops needed log level as low as 2 to go quiet. And the Ubuntu kernel team is never going to fix all those for non-sponsored devices. > > After that it is "just" a matter of not making userspace > output anything unless it has errors to report. > > systemd already is quiet by default (only logging > errors) when quiet is on the kernel commandline. Unfortunately not true for Ubuntu. We carry a noisy systemd patch which I'm told we can't remove in the short term: https://bugs.launchpad.net/ubuntu/+source/plymouth/+bug/1970069/comments/39 > > So any remaining issues are Ubuntu specific boot > process bits and Ubuntu really should be able to > make those by silent unless they have important > info (errors or other unexpected things) to report. > > Given that this will make debugging boot issues > much harder and that there are other IMHO better > alternatives I'm nacking this patch: NACK. > > FWIW I believe that I'm actually saving Ubuntu > from shooting themselves in the foot here, > hiding all sort of boot errors (like the initrd > not finding /) until the user does a magic > alt+f2 followed by alt+f1 incantation really is > not doing yourself any favors wrt debugging any > sort of boot failures. > > Regards, > > Hans Thanks for your input, but I respectfully disagree and did consider these points already. - Daniel > > > > > >> Closes: https://bugs.launchpad.net/bugs/1970069 >> Cc: Mario Limonciello <mario.limonciello@amd.com> >> Signed-off-by: Daniel van Vugt <daniel.van.vugt@canonical.com> >> --- >> drivers/video/fbdev/core/fbcon.c | 32 +++++++++++++++++++++++++++++--- >> 1 file changed, 29 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/video/fbdev/core/fbcon.c b/drivers/video/fbdev/core/fbcon.c >> index 63af6ab034..5b9f7635f7 100644 >> --- a/drivers/video/fbdev/core/fbcon.c >> +++ b/drivers/video/fbdev/core/fbcon.c >> @@ -76,6 +76,7 @@ >> #include <linux/crc32.h> /* For counting font checksums */ >> #include <linux/uaccess.h> >> #include <asm/irq.h> >> +#include <asm/cmdline.h> >> >> #include "fbcon.h" >> #include "fb_internal.h" >> @@ -146,6 +147,7 @@ static inline void fbcon_map_override(void) >> >> #ifdef CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER >> static bool deferred_takeover = true; >> +static int initial_console = -1; >> #else >> #define deferred_takeover false >> #endif >> @@ -3341,7 +3343,7 @@ static void fbcon_register_existing_fbs(struct work_struct *work) >> console_unlock(); >> } >> >> -static struct notifier_block fbcon_output_nb; >> +static struct notifier_block fbcon_output_nb, fbcon_switch_nb; >> static DECLARE_WORK(fbcon_deferred_takeover_work, fbcon_register_existing_fbs); >> >> static int fbcon_output_notifier(struct notifier_block *nb, >> @@ -3358,6 +3360,21 @@ static int fbcon_output_notifier(struct notifier_block *nb, >> >> return NOTIFY_OK; >> } >> + >> +static int fbcon_switch_notifier(struct notifier_block *nb, >> + unsigned long action, void *data) >> +{ >> + struct vc_data *vc = data; >> + >> + WARN_CONSOLE_UNLOCKED(); >> + >> + if (vc->vc_num != initial_console) { >> + dummycon_unregister_switch_notifier(&fbcon_switch_nb); >> + dummycon_register_output_notifier(&fbcon_output_nb); >> + } >> + >> + return NOTIFY_OK; >> +} >> #endif >> >> static void fbcon_start(void) >> @@ -3370,7 +3387,14 @@ static void fbcon_start(void) >> >> if (deferred_takeover) { >> fbcon_output_nb.notifier_call = fbcon_output_notifier; >> - dummycon_register_output_notifier(&fbcon_output_nb); >> + fbcon_switch_nb.notifier_call = fbcon_switch_notifier; >> + initial_console = fg_console; >> + >> + if (cmdline_find_option_bool(boot_command_line, "splash")) >> + dummycon_register_switch_notifier(&fbcon_switch_nb); >> + else >> + dummycon_register_output_notifier(&fbcon_output_nb); >> + >> return; >> } >> #endif >> @@ -3417,8 +3441,10 @@ void __exit fb_console_exit(void) >> { >> #ifdef CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER >> console_lock(); >> - if (deferred_takeover) >> + if (deferred_takeover) { >> dummycon_unregister_output_notifier(&fbcon_output_nb); >> + dummycon_unregister_switch_notifier(&fbcon_switch_nb); >> + } >> console_unlock(); >> >> cancel_work_sync(&fbcon_deferred_takeover_work); >
Hi, On 2/27/24 02:06, Daniel van Vugt wrote: > On 27/2/24 02:23, Hans de Goede wrote: >> Hi All, >> >> On 2/2/24 09:53, Daniel van Vugt wrote: >>> Until now, deferred console takeover only meant defer until there is >>> output. But that risks stepping on the toes of userspace splash screens, >>> as console messages may appear before the splash screen. So check for the >>> "splash" parameter (as used by Plymouth) and if present then extend the >>> deferral until the first switch. >> >> Daniel, thank you for your patch but I do not believe that this >> is the right solution. Deferring fbcon takeover further then >> after the first text is output means that any errors about e.g. >> a corrupt initrd or the kernel erroring out / crashing will not >> be visible. > > That's not really correct. If a boot failure has occurred after the splash then > pressing escape shows the log. Hmm, I guess this is with the latest plymouth which has a builtin terminal emulator for kernels without VT support ? Pressing ESC does not to a VC switch and AFAICT that is what you are triggering on to allow fbcon takeover after this patches. > If a boot failure has occurred before the splash > then it can be debugged visually by rebooting without the "splash" parameter. Which requires the user to know this and requires the user to know how to edit kernel cmdline parameters in e.g. grub. This is not a good user experience. We want inexperienced users to just be able to point a phone camera at the screen and take a picture of the errors. >> When the kernel e.g. oopses or panics because of not finding >> its rootfs (I tested the latter option when writing the original >> deferred fbcon takeover code) then fbcon must takeover and >> print the messages from the dying kernel so that the user has >> some notion of what is going wrong. > > Indeed, just reboot without the "splash" parameter to do that. Again not something beginning Linux users will be able to do, what happened to "Ubuntu: Linux for Human Beings" ? >> And since your patch seems to delay switching till the first >> vc-switch this means that e.g. even after say gdm refusing >> to start because of issues there still will be no text >> output. This makes debugging various issues much harder. > > I've debugged many gdm failures and it is never useful to use the console for > those. Reboot and get the system journal instead. But users will not see any errors now, meaning they don't even know where to begin with troubleshooting ... >> Moreover Fedora has been doing flickerfree boot for many >> years without needing this. > > I believe Fedora has a mostly working solution, but not totally reliable, as > mentioned in the commit message: > > "even systems whose splash exists in initrd may not be not immune because they > still rely on racing against all possible kernel messages that might > trigger the fbcon takeover" Only very serious kernel errors like oopses or panics will trigger the takeover and that is *exactly* what we want. There is a race where plymouth may hide such vary serious messages, if plymouth does manage to start before the errors, but that is actually an existing issue which we don't want to make bigger by *always* hiding such errors. >> The kernel itself will be quiet as long as you set >> CONFIG_CONSOLE_LOGLEVEL_QUIET=3 Ubuntu atm has set this >> to 4 which means any kernel pr_err() or dev_err() >> messages will get through and since there are quite >> a few false positives of those Ubuntu really needs >> to set CONFIG_CONSOLE_LOGLEVEL_QUIET=3 to fix part of: >> https://bugs.launchpad.net/bugs/1970069 > > Incorrect. In my testing some laptops needed log level as low as 2 to go quiet. > And the Ubuntu kernel team is never going to fix all those for non-sponsored > devices. Notice that atm Ubuntu's kernel is using the too high CONFIG_CONSOLE_LOGLEVEL_QUIET=4 with CONFIG_CONSOLE_LOGLEVEL_QUIET=3 getting any errors logged to the console should be very very rare. The only thing I can think of is if the kernel oopses / WARN()s early on but the cause is innocent enough that the boot happily continues. In that case actually showing the oops/WARN() is a good thing. For all the years Fedora has had flickerfree boot I have seen zero bug reports about this. If you have examples of this actually being a problem please file bugs for them (launchpad or bugzilla.kernel.org is fine) and then lets take a look at those bugs and fix them. These should be so rare that I'm not worried about this becoming a never ending list of bugs (unlike pr_err() / dev_err() messages of which there are simply too many). >> After that it is "just" a matter of not making userspace >> output anything unless it has errors to report. >> >> systemd already is quiet by default (only logging >> errors) when quiet is on the kernel commandline. > > Unfortunately not true for Ubuntu. We carry a noisy systemd patch which I'm > told we can't remove in the short term: > > https://bugs.launchpad.net/ubuntu/+source/plymouth/+bug/1970069/comments/39 Well then make the patch less noisy? Suppressing non error message unless in debug mode should be easy even with a downstream patch. > Thanks for your input, but I respectfully disagree and did consider these > points already. Sorry, but your real problem here seems to be your noisy downstream systemd patch. I'm not going to ack a kernel patch which I consider a bad idea because Ubuntu has a non standard systemd patch which is to trigger happy with spamming the console. So this is still a NACK from me. Regards, Hans >>> Closes: https://bugs.launchpad.net/bugs/1970069 >>> Cc: Mario Limonciello <mario.limonciello@amd.com> >>> Signed-off-by: Daniel van Vugt <daniel.van.vugt@canonical.com> >>> --- >>> drivers/video/fbdev/core/fbcon.c | 32 +++++++++++++++++++++++++++++--- >>> 1 file changed, 29 insertions(+), 3 deletions(-) >>> >>> diff --git a/drivers/video/fbdev/core/fbcon.c b/drivers/video/fbdev/core/fbcon.c >>> index 63af6ab034..5b9f7635f7 100644 >>> --- a/drivers/video/fbdev/core/fbcon.c >>> +++ b/drivers/video/fbdev/core/fbcon.c >>> @@ -76,6 +76,7 @@ >>> #include <linux/crc32.h> /* For counting font checksums */ >>> #include <linux/uaccess.h> >>> #include <asm/irq.h> >>> +#include <asm/cmdline.h> >>> >>> #include "fbcon.h" >>> #include "fb_internal.h" >>> @@ -146,6 +147,7 @@ static inline void fbcon_map_override(void) >>> >>> #ifdef CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER >>> static bool deferred_takeover = true; >>> +static int initial_console = -1; >>> #else >>> #define deferred_takeover false >>> #endif >>> @@ -3341,7 +3343,7 @@ static void fbcon_register_existing_fbs(struct work_struct *work) >>> console_unlock(); >>> } >>> >>> -static struct notifier_block fbcon_output_nb; >>> +static struct notifier_block fbcon_output_nb, fbcon_switch_nb; >>> static DECLARE_WORK(fbcon_deferred_takeover_work, fbcon_register_existing_fbs); >>> >>> static int fbcon_output_notifier(struct notifier_block *nb, >>> @@ -3358,6 +3360,21 @@ static int fbcon_output_notifier(struct notifier_block *nb, >>> >>> return NOTIFY_OK; >>> } >>> + >>> +static int fbcon_switch_notifier(struct notifier_block *nb, >>> + unsigned long action, void *data) >>> +{ >>> + struct vc_data *vc = data; >>> + >>> + WARN_CONSOLE_UNLOCKED(); >>> + >>> + if (vc->vc_num != initial_console) { >>> + dummycon_unregister_switch_notifier(&fbcon_switch_nb); >>> + dummycon_register_output_notifier(&fbcon_output_nb); >>> + } >>> + >>> + return NOTIFY_OK; >>> +} >>> #endif >>> >>> static void fbcon_start(void) >>> @@ -3370,7 +3387,14 @@ static void fbcon_start(void) >>> >>> if (deferred_takeover) { >>> fbcon_output_nb.notifier_call = fbcon_output_notifier; >>> - dummycon_register_output_notifier(&fbcon_output_nb); >>> + fbcon_switch_nb.notifier_call = fbcon_switch_notifier; >>> + initial_console = fg_console; >>> + >>> + if (cmdline_find_option_bool(boot_command_line, "splash")) >>> + dummycon_register_switch_notifier(&fbcon_switch_nb); >>> + else >>> + dummycon_register_output_notifier(&fbcon_output_nb); >>> + >>> return; >>> } >>> #endif >>> @@ -3417,8 +3441,10 @@ void __exit fb_console_exit(void) >>> { >>> #ifdef CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER >>> console_lock(); >>> - if (deferred_takeover) >>> + if (deferred_takeover) { >>> dummycon_unregister_output_notifier(&fbcon_output_nb); >>> + dummycon_unregister_switch_notifier(&fbcon_switch_nb); >>> + } >>> console_unlock(); >>> >>> cancel_work_sync(&fbcon_deferred_takeover_work); >> >
On 27/2/24 21:47, Hans de Goede wrote: > Hi, > > On 2/27/24 02:06, Daniel van Vugt wrote: >> On 27/2/24 02:23, Hans de Goede wrote: >>> Hi All, >>> >>> On 2/2/24 09:53, Daniel van Vugt wrote: >>>> Until now, deferred console takeover only meant defer until there is >>>> output. But that risks stepping on the toes of userspace splash screens, >>>> as console messages may appear before the splash screen. So check for the >>>> "splash" parameter (as used by Plymouth) and if present then extend the >>>> deferral until the first switch. >>> >>> Daniel, thank you for your patch but I do not believe that this >>> is the right solution. Deferring fbcon takeover further then >>> after the first text is output means that any errors about e.g. >>> a corrupt initrd or the kernel erroring out / crashing will not >>> be visible. >> >> That's not really correct. If a boot failure has occurred after the splash then >> pressing escape shows the log. > > Hmm, I guess this is with the latest plymouth which has a builtin terminal > emulator for kernels without VT support ? Pressing ESC does not to a VC > switch and AFAICT that is what you are triggering on to allow fbcon takeover > after this patches. > >> If a boot failure has occurred before the splash >> then it can be debugged visually by rebooting without the "splash" parameter. > > Which requires the user to know this and requires the user to know how to > edit kernel cmdline parameters in e.g. grub. This is not a good user > experience. We want inexperienced users to just be able to point > a phone camera at the screen and take a picture of the errors. As the person who contributes most to Ubuntu bug triage I have a pretty good idea of what users experience. And when they do experience boot failures it's either with a blank screen already (because userspace, not the kernel's fault), or they report an error message to us that's not relevant to the real failure. In both cases our users understand (or learn quickly) the ease with which they can reboot either to recovery mode, or a previous kernel. We then direct them to collect the full log of the failed boot. Because even if they were booting with a full text console, most of those bugs don't reveal themselves on the console. If they did then they'd be visible in the system journal along with everything else. What is not a "good user experience" is the boot messages people are shown on every boot. > > >>> When the kernel e.g. oopses or panics because of not finding >>> its rootfs (I tested the latter option when writing the original >>> deferred fbcon takeover code) then fbcon must takeover and >>> print the messages from the dying kernel so that the user has >>> some notion of what is going wrong. >> >> Indeed, just reboot without the "splash" parameter to do that. > > Again not something beginning Linux users will be able to do, > what happened to "Ubuntu: Linux for Human Beings" ? It is more user-friendly than it sounds. Just reboot, trigger the grub menu and select recovery mode or an older kernel (which is always available). I think some boot failures also take you to the grub menu automatically next time? > >>> And since your patch seems to delay switching till the first >>> vc-switch this means that e.g. even after say gdm refusing >>> to start because of issues there still will be no text >>> output. This makes debugging various issues much harder. >> >> I've debugged many gdm failures and it is never useful to use the console for >> those. Reboot and get the system journal instead. > > But users will not see any errors now, meaning they don't > even know where to begin with troubleshooting ... Indeed. I deal with those users every day and they log their bugs against the wrong components, understandably. We then work with them to triage and reassign the issue to the right place. > >>> Moreover Fedora has been doing flickerfree boot for many >>> years without needing this. >> >> I believe Fedora has a mostly working solution, but not totally reliable, as >> mentioned in the commit message: >> >> "even systems whose splash exists in initrd may not be not immune because they >> still rely on racing against all possible kernel messages that might >> trigger the fbcon takeover" > > Only very serious kernel errors like oopses or panics will > trigger the takeover and that is *exactly* what we want. > > There is a race where plymouth may hide such vary serious > messages, if plymouth does manage to start before the errors, > but that is actually an existing issue which we don't want > to make bigger by *always* hiding such errors. > >>> The kernel itself will be quiet as long as you set >>> CONFIG_CONSOLE_LOGLEVEL_QUIET=3 Ubuntu atm has set this >>> to 4 which means any kernel pr_err() or dev_err() >>> messages will get through and since there are quite >>> a few false positives of those Ubuntu really needs >>> to set CONFIG_CONSOLE_LOGLEVEL_QUIET=3 to fix part of: >>> https://bugs.launchpad.net/bugs/1970069 >> >> Incorrect. In my testing some laptops needed log level as low as 2 to go quiet. >> And the Ubuntu kernel team is never going to fix all those for non-sponsored >> devices. > > Notice that atm Ubuntu's kernel is using the too high > CONFIG_CONSOLE_LOGLEVEL_QUIET=4 with > CONFIG_CONSOLE_LOGLEVEL_QUIET=3 getting any errors logged > to the console should be very very rare. > > The only thing I can think of is if the kernel oopses > / WARN()s early on but the cause is innocent enough > that the boot happily continues. > > In that case actually showing the oops/WARN() is a good > thing. > > For all the years Fedora has had flickerfree boot I have > seen zero bug reports about this. If you have examples > of this actually being a problem please file bugs for > them (launchpad or bugzilla.kernel.org is fine) and > then lets take a look at those bugs and fix them. > > These should be so rare that I'm not worried about this > becoming a never ending list of bugs (unlike pr_err() / > dev_err() messages of which there are simply too many). I personally own many laptops with so many different boot messages that it's overwhelming for me already to report bugs for each of them. Hence this patch. Also I don't own all the laptops in the world, so fixing all the errors just for my collection wouldn't solve all cases. Whereas this patch does. > >>> After that it is "just" a matter of not making userspace >>> output anything unless it has errors to report. >>> >>> systemd already is quiet by default (only logging >>> errors) when quiet is on the kernel commandline. >> >> Unfortunately not true for Ubuntu. We carry a noisy systemd patch which I'm >> told we can't remove in the short term: >> >> https://bugs.launchpad.net/ubuntu/+source/plymouth/+bug/1970069/comments/39 > > Well then make the patch less noisy? Suppressing non > error message unless in debug mode should be easy > even with a downstream patch. > >> Thanks for your input, but I respectfully disagree and did consider these >> points already. > > Sorry, but your real problem here seems to be your > noisy downstream systemd patch. I'm not going to ack > a kernel patch which I consider a bad idea because > Ubuntu has a non standard systemd patch which is > to trigger happy with spamming the console. Indeed the systemd patch is a big problem. We seem to have had it for 9 years or so. I only just discovered it recently and would love to drop it, but was told we can't. Its main problem is that it uses the console as a communication pipe to plymouth. So simply making it less noisy isn't possible without disabling its functionality. It was seemingly intended to run behind the splash, but since it does fsck it tends to run before the splash (because DRM startup takes a few more seconds). > > So this is still a NACK from me. > > Regards, > > Hans > > > > > >>>> Closes: https://bugs.launchpad.net/bugs/1970069 >>>> Cc: Mario Limonciello <mario.limonciello@amd.com> >>>> Signed-off-by: Daniel van Vugt <daniel.van.vugt@canonical.com> >>>> --- >>>> drivers/video/fbdev/core/fbcon.c | 32 +++++++++++++++++++++++++++++--- >>>> 1 file changed, 29 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/drivers/video/fbdev/core/fbcon.c b/drivers/video/fbdev/core/fbcon.c >>>> index 63af6ab034..5b9f7635f7 100644 >>>> --- a/drivers/video/fbdev/core/fbcon.c >>>> +++ b/drivers/video/fbdev/core/fbcon.c >>>> @@ -76,6 +76,7 @@ >>>> #include <linux/crc32.h> /* For counting font checksums */ >>>> #include <linux/uaccess.h> >>>> #include <asm/irq.h> >>>> +#include <asm/cmdline.h> >>>> >>>> #include "fbcon.h" >>>> #include "fb_internal.h" >>>> @@ -146,6 +147,7 @@ static inline void fbcon_map_override(void) >>>> >>>> #ifdef CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER >>>> static bool deferred_takeover = true; >>>> +static int initial_console = -1; >>>> #else >>>> #define deferred_takeover false >>>> #endif >>>> @@ -3341,7 +3343,7 @@ static void fbcon_register_existing_fbs(struct work_struct *work) >>>> console_unlock(); >>>> } >>>> >>>> -static struct notifier_block fbcon_output_nb; >>>> +static struct notifier_block fbcon_output_nb, fbcon_switch_nb; >>>> static DECLARE_WORK(fbcon_deferred_takeover_work, fbcon_register_existing_fbs); >>>> >>>> static int fbcon_output_notifier(struct notifier_block *nb, >>>> @@ -3358,6 +3360,21 @@ static int fbcon_output_notifier(struct notifier_block *nb, >>>> >>>> return NOTIFY_OK; >>>> } >>>> + >>>> +static int fbcon_switch_notifier(struct notifier_block *nb, >>>> + unsigned long action, void *data) >>>> +{ >>>> + struct vc_data *vc = data; >>>> + >>>> + WARN_CONSOLE_UNLOCKED(); >>>> + >>>> + if (vc->vc_num != initial_console) { >>>> + dummycon_unregister_switch_notifier(&fbcon_switch_nb); >>>> + dummycon_register_output_notifier(&fbcon_output_nb); >>>> + } >>>> + >>>> + return NOTIFY_OK; >>>> +} >>>> #endif >>>> >>>> static void fbcon_start(void) >>>> @@ -3370,7 +3387,14 @@ static void fbcon_start(void) >>>> >>>> if (deferred_takeover) { >>>> fbcon_output_nb.notifier_call = fbcon_output_notifier; >>>> - dummycon_register_output_notifier(&fbcon_output_nb); >>>> + fbcon_switch_nb.notifier_call = fbcon_switch_notifier; >>>> + initial_console = fg_console; >>>> + >>>> + if (cmdline_find_option_bool(boot_command_line, "splash")) >>>> + dummycon_register_switch_notifier(&fbcon_switch_nb); >>>> + else >>>> + dummycon_register_output_notifier(&fbcon_output_nb); >>>> + >>>> return; >>>> } >>>> #endif >>>> @@ -3417,8 +3441,10 @@ void __exit fb_console_exit(void) >>>> { >>>> #ifdef CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER >>>> console_lock(); >>>> - if (deferred_takeover) >>>> + if (deferred_takeover) { >>>> dummycon_unregister_output_notifier(&fbcon_output_nb); >>>> + dummycon_unregister_switch_notifier(&fbcon_switch_nb); >>>> + } >>>> console_unlock(); >>>> >>>> cancel_work_sync(&fbcon_deferred_takeover_work); >>> >> >
Hi Daniel, On 2/28/24 03:00, Daniel van Vugt wrote: > On 27/2/24 21:47, Hans de Goede wrote: <snip> > I think some boot failures also take you to the grub menu automatically next time? In Fedora all boot failures will unhide the grub menu on the next boot. This unfortunately relies on downstream changes so I don't know what Ubuntu does here. <snip> >>>> The kernel itself will be quiet as long as you set >>>> CONFIG_CONSOLE_LOGLEVEL_QUIET=3 Ubuntu atm has set this >>>> to 4 which means any kernel pr_err() or dev_err() >>>> messages will get through and since there are quite >>>> a few false positives of those Ubuntu really needs >>>> to set CONFIG_CONSOLE_LOGLEVEL_QUIET=3 to fix part of: >>>> https://bugs.launchpad.net/bugs/1970069 >>> >>> Incorrect. In my testing some laptops needed log level as low as 2 to go quiet. >>> And the Ubuntu kernel team is never going to fix all those for non-sponsored >>> devices. >> >> Notice that atm Ubuntu's kernel is using the too high >> CONFIG_CONSOLE_LOGLEVEL_QUIET=4 with >> CONFIG_CONSOLE_LOGLEVEL_QUIET=3 getting any errors logged >> to the console should be very very rare. >> >> The only thing I can think of is if the kernel oopses >> / WARN()s early on but the cause is innocent enough >> that the boot happily continues. >> >> In that case actually showing the oops/WARN() is a good >> thing. >> >> For all the years Fedora has had flickerfree boot I have >> seen zero bug reports about this. If you have examples >> of this actually being a problem please file bugs for >> them (launchpad or bugzilla.kernel.org is fine) and >> then lets take a look at those bugs and fix them. >> >> These should be so rare that I'm not worried about this >> becoming a never ending list of bugs (unlike pr_err() / >> dev_err() messages of which there are simply too many). > > I personally own many laptops with so many different boot messages that it's > overwhelming for me already to report bugs for each of them. Hence this patch. > > Also I don't own all the laptops in the world, so fixing all the errors just > for my collection wouldn't solve all cases. Whereas this patch does. Almost all of those boot messages are because Ubuntu has set CONFIG_CONSOLE_LOGLEVEL_QUIET too high. Once that is fixed there should be very little of not no messages left. I too own many laptops and I'm not seeing this issue on any of them. You claim you are still seeing errors with CONFIG_CONSOLE_LOGLEVEL_QUIET=3 yet you have not provided a single example! >> Sorry, but your real problem here seems to be your >> noisy downstream systemd patch. I'm not going to ack >> a kernel patch which I consider a bad idea because >> Ubuntu has a non standard systemd patch which is >> to trigger happy with spamming the console. > > Indeed the systemd patch is a big problem. We seem to have had it for 9 years > or so. I only just discovered it recently and would love to drop it, but was > told we can't. Its main problem is that it uses the console as a communication > pipe to plymouth. So simply making it less noisy isn't possible without > disabling its functionality. It was seemingly intended to run behind the > splash, but since it does fsck it tends to run before the splash (because DRM > startup takes a few more seconds). This does indeed sound like it is a non trivial problem to fix, but that is still not a good reason to add this (IMHO) hack to the kernel. The issue deferred fbcon takeover was designed to fix is that the fbcon would mess up the framebuffer contents even if nothing was ever logged to the console. The whole idea being that to still have the fbcon come up as soon as there are any messages. Actively hiding messages was never part of the design, so this is still a NACK from me. Also note that this matches how things work in grub and shim when I first implemented flickerfree boot I also had to patch shim and grub to not make EFI text output protocol calls (including init()) until they actually had some text to show. So the whole design here for shim, grub and the kernel has always been to not mess with the framebuffer until there is some text (any text) to output and then show that text immediately. I do not think that deviating from this design is a good idea. Regards, Hans >>>>> Closes: https://bugs.launchpad.net/bugs/1970069 >>>>> Cc: Mario Limonciello <mario.limonciello@amd.com> >>>>> Signed-off-by: Daniel van Vugt <daniel.van.vugt@canonical.com> >>>>> --- >>>>> drivers/video/fbdev/core/fbcon.c | 32 +++++++++++++++++++++++++++++--- >>>>> 1 file changed, 29 insertions(+), 3 deletions(-) >>>>> >>>>> diff --git a/drivers/video/fbdev/core/fbcon.c b/drivers/video/fbdev/core/fbcon.c >>>>> index 63af6ab034..5b9f7635f7 100644 >>>>> --- a/drivers/video/fbdev/core/fbcon.c >>>>> +++ b/drivers/video/fbdev/core/fbcon.c >>>>> @@ -76,6 +76,7 @@ >>>>> #include <linux/crc32.h> /* For counting font checksums */ >>>>> #include <linux/uaccess.h> >>>>> #include <asm/irq.h> >>>>> +#include <asm/cmdline.h> >>>>> >>>>> #include "fbcon.h" >>>>> #include "fb_internal.h" >>>>> @@ -146,6 +147,7 @@ static inline void fbcon_map_override(void) >>>>> >>>>> #ifdef CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER >>>>> static bool deferred_takeover = true; >>>>> +static int initial_console = -1; >>>>> #else >>>>> #define deferred_takeover false >>>>> #endif >>>>> @@ -3341,7 +3343,7 @@ static void fbcon_register_existing_fbs(struct work_struct *work) >>>>> console_unlock(); >>>>> } >>>>> >>>>> -static struct notifier_block fbcon_output_nb; >>>>> +static struct notifier_block fbcon_output_nb, fbcon_switch_nb; >>>>> static DECLARE_WORK(fbcon_deferred_takeover_work, fbcon_register_existing_fbs); >>>>> >>>>> static int fbcon_output_notifier(struct notifier_block *nb, >>>>> @@ -3358,6 +3360,21 @@ static int fbcon_output_notifier(struct notifier_block *nb, >>>>> >>>>> return NOTIFY_OK; >>>>> } >>>>> + >>>>> +static int fbcon_switch_notifier(struct notifier_block *nb, >>>>> + unsigned long action, void *data) >>>>> +{ >>>>> + struct vc_data *vc = data; >>>>> + >>>>> + WARN_CONSOLE_UNLOCKED(); >>>>> + >>>>> + if (vc->vc_num != initial_console) { >>>>> + dummycon_unregister_switch_notifier(&fbcon_switch_nb); >>>>> + dummycon_register_output_notifier(&fbcon_output_nb); >>>>> + } >>>>> + >>>>> + return NOTIFY_OK; >>>>> +} >>>>> #endif >>>>> >>>>> static void fbcon_start(void) >>>>> @@ -3370,7 +3387,14 @@ static void fbcon_start(void) >>>>> >>>>> if (deferred_takeover) { >>>>> fbcon_output_nb.notifier_call = fbcon_output_notifier; >>>>> - dummycon_register_output_notifier(&fbcon_output_nb); >>>>> + fbcon_switch_nb.notifier_call = fbcon_switch_notifier; >>>>> + initial_console = fg_console; >>>>> + >>>>> + if (cmdline_find_option_bool(boot_command_line, "splash")) >>>>> + dummycon_register_switch_notifier(&fbcon_switch_nb); >>>>> + else >>>>> + dummycon_register_output_notifier(&fbcon_output_nb); >>>>> + >>>>> return; >>>>> } >>>>> #endif >>>>> @@ -3417,8 +3441,10 @@ void __exit fb_console_exit(void) >>>>> { >>>>> #ifdef CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER >>>>> console_lock(); >>>>> - if (deferred_takeover) >>>>> + if (deferred_takeover) { >>>>> dummycon_unregister_output_notifier(&fbcon_output_nb); >>>>> + dummycon_unregister_switch_notifier(&fbcon_switch_nb); >>>>> + } >>>>> console_unlock(); >>>>> >>>>> cancel_work_sync(&fbcon_deferred_takeover_work); >>>> >>> >> >
On 2/28/2024 05:54, Hans de Goede wrote: > Hi Daniel, > > On 2/28/24 03:00, Daniel van Vugt wrote: >> On 27/2/24 21:47, Hans de Goede wrote: > > <snip> > >> I think some boot failures also take you to the grub menu automatically next time? > > In Fedora all boot failures will unhide the grub menu on > the next boot. This unfortunately relies on downstream changes > so I don't know what Ubuntu does here. > > <snip> > >>>>> The kernel itself will be quiet as long as you set >>>>> CONFIG_CONSOLE_LOGLEVEL_QUIET=3 Ubuntu atm has set this >>>>> to 4 which means any kernel pr_err() or dev_err() >>>>> messages will get through and since there are quite >>>>> a few false positives of those Ubuntu really needs >>>>> to set CONFIG_CONSOLE_LOGLEVEL_QUIET=3 to fix part of: >>>>> https://bugs.launchpad.net/bugs/1970069 >>>> >>>> Incorrect. In my testing some laptops needed log level as low as 2 to go quiet. >>>> And the Ubuntu kernel team is never going to fix all those for non-sponsored >>>> devices. >>> >>> Notice that atm Ubuntu's kernel is using the too high >>> CONFIG_CONSOLE_LOGLEVEL_QUIET=4 with >>> CONFIG_CONSOLE_LOGLEVEL_QUIET=3 getting any errors logged >>> to the console should be very very rare. >>> >>> The only thing I can think of is if the kernel oopses >>> / WARN()s early on but the cause is innocent enough >>> that the boot happily continues. >>> >>> In that case actually showing the oops/WARN() is a good >>> thing. >>> >>> For all the years Fedora has had flickerfree boot I have >>> seen zero bug reports about this. If you have examples >>> of this actually being a problem please file bugs for >>> them (launchpad or bugzilla.kernel.org is fine) and >>> then lets take a look at those bugs and fix them. >>> >>> These should be so rare that I'm not worried about this >>> becoming a never ending list of bugs (unlike pr_err() / >>> dev_err() messages of which there are simply too many). >> >> I personally own many laptops with so many different boot messages that it's >> overwhelming for me already to report bugs for each of them. Hence this patch. >> >> Also I don't own all the laptops in the world, so fixing all the errors just >> for my collection wouldn't solve all cases. Whereas this patch does. > > Almost all of those boot messages are because Ubuntu has > set CONFIG_CONSOLE_LOGLEVEL_QUIET too high. Once that is fixed > there should be very little of not no messages left. > > I too own many laptops and I'm not seeing this issue on > any of them. > > You claim you are still seeing errors with > CONFIG_CONSOLE_LOGLEVEL_QUIET=3 yet you have not provided > a single example! > >>> Sorry, but your real problem here seems to be your >>> noisy downstream systemd patch. I'm not going to ack >>> a kernel patch which I consider a bad idea because >>> Ubuntu has a non standard systemd patch which is >>> to trigger happy with spamming the console. >> >> Indeed the systemd patch is a big problem. We seem to have had it for 9 years >> or so. I only just discovered it recently and would love to drop it, but was >> told we can't. Its main problem is that it uses the console as a communication >> pipe to plymouth. So simply making it less noisy isn't possible without >> disabling its functionality. It was seemingly intended to run behind the >> splash, but since it does fsck it tends to run before the splash (because DRM >> startup takes a few more seconds). This comes back to what I was saying before - Ubuntu (and anyone else that wants a flicker free boot for that matter) should adopt simpledrm. When simpledrm is compiled into the kernel DRM will be up long before the splash screen comes up. When drivers do fastboot (Intel) or seamless (AMD) handoff you /should/ be able to get the splash screen without a modeset. I think between doing that and changing the default log level not to show console err messages will go a long way. If there is a concern that people need to see those; how about changing the kernel command line for the recovery kernel so that they only come up in the recovery kernel? > > This does indeed sound like it is a non trivial problem to fix, > but that is still not a good reason to add this (IMHO) hack > to the kernel. > > The issue deferred fbcon takeover was designed to fix is that > the fbcon would mess up the framebuffer contents even if > nothing was ever logged to the console. > > The whole idea being that to still have the fbcon come up > as soon as there are any messages. > > Actively hiding messages was never part of the design, so > this is still a NACK from me. > > Also note that this matches how things work in grub > and shim when I first implemented flickerfree boot > I also had to patch shim and grub to not make EFI > text output protocol calls (including init()) until > they actually had some text to show. > > So the whole design here for shim, grub and the kernel > has always been to not mess with the framebuffer until > there is some text (any text) to output and then show > that text immediately. > > I do not think that deviating from this design is a good > idea. > > Regards, > > Hans > > > >>>>>> Closes: https://bugs.launchpad.net/bugs/1970069 >>>>>> Cc: Mario Limonciello <mario.limonciello@amd.com> >>>>>> Signed-off-by: Daniel van Vugt <daniel.van.vugt@canonical.com> >>>>>> --- >>>>>> drivers/video/fbdev/core/fbcon.c | 32 +++++++++++++++++++++++++++++--- >>>>>> 1 file changed, 29 insertions(+), 3 deletions(-) >>>>>> >>>>>> diff --git a/drivers/video/fbdev/core/fbcon.c b/drivers/video/fbdev/core/fbcon.c >>>>>> index 63af6ab034..5b9f7635f7 100644 >>>>>> --- a/drivers/video/fbdev/core/fbcon.c >>>>>> +++ b/drivers/video/fbdev/core/fbcon.c >>>>>> @@ -76,6 +76,7 @@ >>>>>> #include <linux/crc32.h> /* For counting font checksums */ >>>>>> #include <linux/uaccess.h> >>>>>> #include <asm/irq.h> >>>>>> +#include <asm/cmdline.h> >>>>>> >>>>>> #include "fbcon.h" >>>>>> #include "fb_internal.h" >>>>>> @@ -146,6 +147,7 @@ static inline void fbcon_map_override(void) >>>>>> >>>>>> #ifdef CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER >>>>>> static bool deferred_takeover = true; >>>>>> +static int initial_console = -1; >>>>>> #else >>>>>> #define deferred_takeover false >>>>>> #endif >>>>>> @@ -3341,7 +3343,7 @@ static void fbcon_register_existing_fbs(struct work_struct *work) >>>>>> console_unlock(); >>>>>> } >>>>>> >>>>>> -static struct notifier_block fbcon_output_nb; >>>>>> +static struct notifier_block fbcon_output_nb, fbcon_switch_nb; >>>>>> static DECLARE_WORK(fbcon_deferred_takeover_work, fbcon_register_existing_fbs); >>>>>> >>>>>> static int fbcon_output_notifier(struct notifier_block *nb, >>>>>> @@ -3358,6 +3360,21 @@ static int fbcon_output_notifier(struct notifier_block *nb, >>>>>> >>>>>> return NOTIFY_OK; >>>>>> } >>>>>> + >>>>>> +static int fbcon_switch_notifier(struct notifier_block *nb, >>>>>> + unsigned long action, void *data) >>>>>> +{ >>>>>> + struct vc_data *vc = data; >>>>>> + >>>>>> + WARN_CONSOLE_UNLOCKED(); >>>>>> + >>>>>> + if (vc->vc_num != initial_console) { >>>>>> + dummycon_unregister_switch_notifier(&fbcon_switch_nb); >>>>>> + dummycon_register_output_notifier(&fbcon_output_nb); >>>>>> + } >>>>>> + >>>>>> + return NOTIFY_OK; >>>>>> +} >>>>>> #endif >>>>>> >>>>>> static void fbcon_start(void) >>>>>> @@ -3370,7 +3387,14 @@ static void fbcon_start(void) >>>>>> >>>>>> if (deferred_takeover) { >>>>>> fbcon_output_nb.notifier_call = fbcon_output_notifier; >>>>>> - dummycon_register_output_notifier(&fbcon_output_nb); >>>>>> + fbcon_switch_nb.notifier_call = fbcon_switch_notifier; >>>>>> + initial_console = fg_console; >>>>>> + >>>>>> + if (cmdline_find_option_bool(boot_command_line, "splash")) >>>>>> + dummycon_register_switch_notifier(&fbcon_switch_nb); >>>>>> + else >>>>>> + dummycon_register_output_notifier(&fbcon_output_nb); >>>>>> + >>>>>> return; >>>>>> } >>>>>> #endif >>>>>> @@ -3417,8 +3441,10 @@ void __exit fb_console_exit(void) >>>>>> { >>>>>> #ifdef CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER >>>>>> console_lock(); >>>>>> - if (deferred_takeover) >>>>>> + if (deferred_takeover) { >>>>>> dummycon_unregister_output_notifier(&fbcon_output_nb); >>>>>> + dummycon_unregister_switch_notifier(&fbcon_switch_nb); >>>>>> + } >>>>>> console_unlock(); >>>>>> >>>>>> cancel_work_sync(&fbcon_deferred_takeover_work); >>>>> >>>> >>> >> >
diff --git a/drivers/video/fbdev/core/fbcon.c b/drivers/video/fbdev/core/fbcon.c index 63af6ab034..5b9f7635f7 100644 --- a/drivers/video/fbdev/core/fbcon.c +++ b/drivers/video/fbdev/core/fbcon.c @@ -76,6 +76,7 @@ #include <linux/crc32.h> /* For counting font checksums */ #include <linux/uaccess.h> #include <asm/irq.h> +#include <asm/cmdline.h> #include "fbcon.h" #include "fb_internal.h" @@ -146,6 +147,7 @@ static inline void fbcon_map_override(void) #ifdef CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER static bool deferred_takeover = true; +static int initial_console = -1; #else #define deferred_takeover false #endif @@ -3341,7 +3343,7 @@ static void fbcon_register_existing_fbs(struct work_struct *work) console_unlock(); } -static struct notifier_block fbcon_output_nb; +static struct notifier_block fbcon_output_nb, fbcon_switch_nb; static DECLARE_WORK(fbcon_deferred_takeover_work, fbcon_register_existing_fbs); static int fbcon_output_notifier(struct notifier_block *nb, @@ -3358,6 +3360,21 @@ static int fbcon_output_notifier(struct notifier_block *nb, return NOTIFY_OK; } + +static int fbcon_switch_notifier(struct notifier_block *nb, + unsigned long action, void *data) +{ + struct vc_data *vc = data; + + WARN_CONSOLE_UNLOCKED(); + + if (vc->vc_num != initial_console) { + dummycon_unregister_switch_notifier(&fbcon_switch_nb); + dummycon_register_output_notifier(&fbcon_output_nb); + } + + return NOTIFY_OK; +} #endif static void fbcon_start(void) @@ -3370,7 +3387,14 @@ static void fbcon_start(void) if (deferred_takeover) { fbcon_output_nb.notifier_call = fbcon_output_notifier; - dummycon_register_output_notifier(&fbcon_output_nb); + fbcon_switch_nb.notifier_call = fbcon_switch_notifier; + initial_console = fg_console; + + if (cmdline_find_option_bool(boot_command_line, "splash")) + dummycon_register_switch_notifier(&fbcon_switch_nb); + else + dummycon_register_output_notifier(&fbcon_output_nb); + return; } #endif @@ -3417,8 +3441,10 @@ void __exit fb_console_exit(void) { #ifdef CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER console_lock(); - if (deferred_takeover) + if (deferred_takeover) { dummycon_unregister_output_notifier(&fbcon_output_nb); + dummycon_unregister_switch_notifier(&fbcon_switch_nb); + } console_unlock(); cancel_work_sync(&fbcon_deferred_takeover_work);