[printk,v1,05/18] printk: Add non-BKL console basic infrastructure

Message ID 20230302195618.156940-6-john.ogness@linutronix.de
State New
Headers
Series threaded/atomic console support |

Commit Message

John Ogness March 2, 2023, 7:56 p.m. UTC
  From: Thomas Gleixner <tglx@linutronix.de>

The current console/printk subsystem is protected by a Big Kernel Lock,
(aka console_lock) which has ill defined semantics and is more or less
stateless. This puts severe limitations on the console subsystem and
makes forced takeover and output in emergency and panic situations a
fragile endavour which is based on try and pray.

The goal of non-BKL consoles is to break out of the console lock jail
and to provide a new infrastructure that avoids the pitfalls and
allows console drivers to be gradually converted over.

The proposed infrastructure aims for the following properties:

  - Per console locking instead of global locking
  - Per console state which allows to make informed decisions
  - Stateful handover and takeover

As a first step state is added to struct console. The per console state
is an atomic_long_t with a 32bit bit field and on 64bit also a 32bit
sequence for tracking the last printed ringbuffer sequence number. On
32bit the sequence is separate from state for obvious reasons which
requires handling a few extra race conditions.

Reserve state bits, which will be populated later in the series. Wire
it up into the console register/unregister functionality and exclude
such consoles from being handled in the console BKL mechanisms. Since
the non-BKL consoles will not depend on the console lock/unlock dance
for printing, only perform said dance if a BKL console is registered.

The decision to use a bitfield was made as using a plain u32 with
mask/shift operations turned out to result in uncomprehensible code.

Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
---
 fs/proc/consoles.c           |   1 +
 include/linux/console.h      |  33 +++++++++
 kernel/printk/Makefile       |   2 +-
 kernel/printk/internal.h     |  10 +++
 kernel/printk/printk.c       |  50 +++++++++++--
 kernel/printk/printk_nobkl.c | 137 +++++++++++++++++++++++++++++++++++
 6 files changed, 226 insertions(+), 7 deletions(-)
 create mode 100644 kernel/printk/printk_nobkl.c
  

Comments

Petr Mladek March 9, 2023, 2:08 p.m. UTC | #1
On Thu 2023-03-02 21:02:05, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The current console/printk subsystem is protected by a Big Kernel Lock,
> (aka console_lock) which has ill defined semantics and is more or less
> stateless. This puts severe limitations on the console subsystem and
> makes forced takeover and output in emergency and panic situations a
> fragile endavour which is based on try and pray.
> 
> The goal of non-BKL consoles is to break out of the console lock jail
> and to provide a new infrastructure that avoids the pitfalls and
> allows console drivers to be gradually converted over.
> 
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3472,6 +3492,14 @@ void register_console(struct console *newcon)
>  	newcon->dropped = 0;
>  	console_init_seq(newcon, bootcon_registered);
>  
> +	if (!(newcon->flags & CON_NO_BKL))
> +		have_bkl_console = true;

We never clear this value even when the console gets unregistered.

> +	else
> +		cons_nobkl_init(newcon);
> +
> +	if (newcon->flags & CON_BOOT)
> +		have_boot_console = true;
> +
>  	/*
>  	 * Put this console in the list - keep the
>  	 * preferred driver at the head of the list.
> @@ -3515,6 +3543,9 @@ void register_console(struct console *newcon)
>  			if (con->flags & CON_BOOT)
>  				unregister_console_locked(con);
>  		}
> +
> +		/* All boot consoles have been unregistered. */
> +		have_boot_console = false;

The boot consoles can be removed also by printk_late_init().

I would prefer to make this more error-proof and update both
have_bkl_console and have_boot_console in unregister_console().

A solution would be to use a reference counter instead of the boolean.
I am not sure if it is worth it. But it seems that refcount_read()
is just simple atomic read, aka READ_ONCE().


>  	}
>  unlock:
>  	console_list_unlock();

Best Regards,
Petr
  
Petr Mladek March 9, 2023, 3:32 p.m. UTC | #2
On Thu 2023-03-02 21:02:05, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The current console/printk subsystem is protected by a Big Kernel Lock,
> (aka console_lock) which has ill defined semantics and is more or less
> stateless. This puts severe limitations on the console subsystem and
> makes forced takeover and output in emergency and panic situations a
> fragile endavour which is based on try and pray.
> 
> The goal of non-BKL consoles is to break out of the console lock jail
> and to provide a new infrastructure that avoids the pitfalls and
> allows console drivers to be gradually converted over.
> 
> The proposed infrastructure aims for the following properties:
> 
>   - Per console locking instead of global locking
>   - Per console state which allows to make informed decisions
>   - Stateful handover and takeover
> 

So, this patch adds:

	CON_NO_BKL		= BIT(8),

	struct cons_state {

	atomic_long_t		__private atomic_state[2];

	include/linux/console.h
	kernel/printk/printk_nobkl.c

	enum state_selector {
		CON_STATE_CUR,

	cons_state_set()
	cons_state_try_cmpxchg()

	cons_nobkl_init()
	cons_nobkl_cleanup()


later patches add:

	console_can_proceed(struct cons_write_context *wctxt);
	console_enter_unsafe(struct cons_write_context *wctxt);

	cons_atomic_enter()
	cons_atomic_flush();

	static bool cons_emit_record(struct cons_write_context *wctxt)


All the above names seem to be used only by the NOBLK consoles.
And they use "cons", "NO_BKL", "nobkl", "cons_atomic", "atomic", "console".

I wonder if there is a system or if the names just evolved during several
reworks.

Please, let me know if I am over the edge, like too picky and that it
is not worth it. But you know me. I think that it should help to be
more consistent. And it actually might be a good idea to separate
API specific to the NOBKL consoles.

Here is my opinion:

1. I am not even sure if "nobkl", aka "no_big_kernel_lock" is the
   major property of these consoles.

   It might get confused by the real famous big kernel lock.
   Also I tend to confuse this with "noblk", aka "non-blocking".

   I always liked the "atomic consoles" description.


2. More importantly, an easy to distinguish shortcat would be nice
   as a common prefix. The following comes to my mind:

   + nbcon - aka nobkl/noblk consoles API
   + acon  - atomic console API


It would look like:

a) variant with nbcom:


	CON_NB		= BIT(8),

	struct nbcon_state {
	atomic_long_t		__private atomic_nbcon_state[2];

	include/linux/console.h
	kernel/printk/nbcon.c

	enum nbcon_state_selector {
		NBCON_STATE_CUR,

	nbcon_state_set()
	nbcon_state_try_cmpxchg()

	nbcon_init()
	nbcon_cleanup()

	nbcon_can_proceed(struct cons_write_context *wctxt);
	nbcon_enter_unsafe(struct cons_write_context *wctxt);

	nbcon_enter()
	nbcon_flush_all();

	nbcon_emit_next_record()


a) varianta with atomic:


	CON_ATOMIC		= BIT(8),

	struct acon_state {
	atomic_long_t		__private acon_state[2];

	include/linux/console.h
	kernel/printk/acon.c  or atomic_console.c

	enum acon_state_selector {
		ACON_STATE_CUR,

	acon_state_set()
	acon_state_try_cmpxchg()

	acon_init()
	acon_cleanup()

	acon_can_proceed(struct cons_write_context *wctxt);
	acon_enter_unsafe(struct cons_write_context *wctxt);

	acon_enter()
	acon_flush_all();

	acon_emit_next_record()


I would prefer the variant with "nbcon" because

	$> git grep nbcon | wc -l
	0

vs.

	$> git grep acon | wc -l
	11544


Again, feel free to tell me that I ask for too much. I am not
sure how complicated would be to do this mass change and if it
is worth it. I can review this patchset even with the current names.

My main concern is about the long term maintainability. It is
always easier to see patches than a monolitic source code.
I would like to reduce the risk of people hating us for what "a mess"
we made ;-)

Well, the current names might be fine when the legacy code gets
removed one day. The question is how realistic it is. Also we
probably should make them slightly more consistent anyway.

Best Regards,
Petr
  
John Ogness March 17, 2023, 1:29 p.m. UTC | #3
On 2023-03-09, Petr Mladek <pmladek@suse.com> wrote:
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -3472,6 +3492,14 @@ void register_console(struct console *newcon)
>>  	newcon->dropped = 0;
>>  	console_init_seq(newcon, bootcon_registered);
>>  
>> +	if (!(newcon->flags & CON_NO_BKL))
>> +		have_bkl_console = true;
>
> We never clear this value even when the console gets unregistered.

OK. I'll allow it to be cleared on unregister by checking the registered
list.

>> @@ -3515,6 +3543,9 @@ void register_console(struct console *newcon)
>>  			if (con->flags & CON_BOOT)
>>  				unregister_console_locked(con);
>>  		}
>> +
>> +		/* All boot consoles have been unregistered. */
>> +		have_boot_console = false;
>
> The boot consoles can be removed also by printk_late_init().
>
> I would prefer to make this more error-proof and update both
> have_bkl_console and have_boot_console in unregister_console().

OK.

> A solution would be to use a reference counter instead of the boolean.
> I am not sure if it is worth it. But it seems that refcount_read()
> is just simple atomic read, aka READ_ONCE().

Well, we are holding the console_list_lock, so we can just iterate over
the list. Iteration happens later in the series anyway, in order to
create/run the NOBKL threads.

John
  
John Ogness March 17, 2023, 1:39 p.m. UTC | #4
On 2023-03-09, Petr Mladek <pmladek@suse.com> wrote:
> So, this patch adds:
>
> [...]
>
> All the above names seem to be used only by the NOBLK consoles.
> And they use "cons", "NO_BKL", "nobkl", "cons_atomic", "atomic", "console".
>
> I wonder if there is a system or if the names just evolved during several
> reworks.

Yes. And because we didn't really know how to name these different yet
related pieces.

Note that the "atomic" usage is really only referring to the things
related to the atomic part of the console. The console also has a
threaded component.

>    ... an easy to distinguish shortcat would be nice
>    as a common prefix. The following comes to my mind:
>
>    + nbcon - aka nobkl/noblk consoles API
>    + acon  - atomic console API

"acon" is not really appropriate because they are threaded+atomic
consoles, not just atomic consoles. But it probably isn't necessary to
have separate atomic and threaded API forms. The atomic can still be
used as (for example) nbcon_atomic_enter().

> It would look like:
>
> a) variant with nbcom:
>
>
> 	CON_NB		= BIT(8),
>
> 	struct nbcon_state {
> 	atomic_long_t		__private atomic_nbcon_state[2];
>
> 	include/linux/console.h
> 	kernel/printk/nbcon.c
>
> 	enum nbcon_state_selector {
> 		NBCON_STATE_CUR,
>
> 	nbcon_state_set()
> 	nbcon_state_try_cmpxchg()
>
> 	nbcon_init()
> 	nbcon_cleanup()
>
> 	nbcon_can_proceed(struct cons_write_context *wctxt);
> 	nbcon_enter_unsafe(struct cons_write_context *wctxt);
>
> 	nbcon_enter()
> 	nbcon_flush_all();
>
> 	nbcon_emit_next_record()
>
> I would prefer the variant with "nbcon" because
>
> 	$> git grep nbcon | wc -l
> 	0

I also prefer "nbcon". Thanks for finding a name and unique string for
this new code. I will also rename the file to printk_nbcon.c.

John
  
Petr Mladek March 21, 2023, 4:04 p.m. UTC | #5
On Thu 2023-03-02 21:02:05, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The current console/printk subsystem is protected by a Big Kernel Lock,
> (aka console_lock) which has ill defined semantics and is more or less
> stateless. This puts severe limitations on the console subsystem and
> makes forced takeover and output in emergency and panic situations a
> fragile endavour which is based on try and pray.
> 
> The goal of non-BKL consoles is to break out of the console lock jail
> and to provide a new infrastructure that avoids the pitfalls and
> allows console drivers to be gradually converted over.
> 
> The proposed infrastructure aims for the following properties:
> 
>   - Per console locking instead of global locking
>   - Per console state which allows to make informed decisions
>   - Stateful handover and takeover
> 
> As a first step state is added to struct console. The per console state
> is an atomic_long_t with a 32bit bit field and on 64bit also a 32bit
> sequence for tracking the last printed ringbuffer sequence number. On
> 32bit the sequence is separate from state for obvious reasons which
> requires handling a few extra race conditions.
>

It is not completely clear that that this struct is stored
as atomic_long_t atomic_state[2] in struct console.

What about adding?

		atomic_long_t atomic;

> +		unsigned long	atom;
> +		struct {
> +#ifdef CONFIG_64BIT
> +			u32	seq;
> +#endif
> +			union {
> +				u32	bits;
> +				struct {
> +				};
> +			};
> +		};
> +	};
>  };
>  
>  /**
> @@ -186,6 +214,8 @@ enum cons_flags {
>   * @dropped:		Number of unreported dropped ringbuffer records
>   * @data:		Driver private data
>   * @node:		hlist node for the console list
> + *
> + * @atomic_state:	State array for NOBKL consoles; real and handover
>   */
>  struct console {
>  	char			name[16];
> @@ -205,6 +235,9 @@ struct console {
>  	unsigned long		dropped;
>  	void			*data;
>  	struct hlist_node	node;
> +
> +	/* NOBKL console specific members */
> +	atomic_long_t		__private atomic_state[2];

and using here

	struct cons_state	__private cons_state[2];

Then we could use cons_state[which].atomic to access it as
the atomic type.

Or was this on purpose? It is true that only the variable
in struct console has to be accessed the atomic way.

Anyway, we should at least add a comment into struct console
about that atomic_state[2] is used to store and access
struct cons_state an atomic way. Also add a compilation
check that the size is the same.

Best Regards,
Petr
  
John Ogness March 27, 2023, 4:28 p.m. UTC | #6
On 2023-03-21, Petr Mladek <pmladek@suse.com> wrote:
> It is not completely clear that that this struct is stored
> as atomic_long_t atomic_state[2] in struct console.
>
> What about adding?
>
> 		atomic_long_t atomic;

The struct is used to simplify interpretting and creating values to be
stored in the atomic state variable. I do not think it makes sense that
the atomic variable type itself is part of it.

> Anyway, we should at least add a comment into struct console
> about that atomic_state[2] is used to store and access
> struct cons_state an atomic way. Also add a compilation
> check that the size is the same.

A compilation check would be nice. Is that possible?

I am renaming the struct to nbcon_state. Also the variable will be
called nbcon_state. With the description updated, I think it makes it
clearer that "struct nbcon_state" is used to interpret/create values of
console->nbcon_state.

John Ogness
  
Petr Mladek March 28, 2023, 8:20 a.m. UTC | #7
On Mon 2023-03-27 18:34:22, John Ogness wrote:
> On 2023-03-21, Petr Mladek <pmladek@suse.com> wrote:
> > It is not completely clear that that this struct is stored
> > as atomic_long_t atomic_state[2] in struct console.
> >
> > What about adding?
> >
> > 		atomic_long_t atomic;
> 
> The struct is used to simplify interpretting and creating values to be
> stored in the atomic state variable. I do not think it makes sense that
> the atomic variable type itself is part of it.

It was just an idea. Feel free to keep it as is (not to add the atomic
into the union).

> > Anyway, we should at least add a comment into struct console
> > about that atomic_state[2] is used to store and access
> > struct cons_state an atomic way. Also add a compilation
> > check that the size is the same.
> 
> A compilation check would be nice. Is that possible?

I think the following might do the trick:

static_assert(sizeof(struct cons_state) == sizeof(atomic_long_t));


> I am renaming the struct to nbcon_state. Also the variable will be
> called nbcon_state. With the description updated, I think it makes it
> clearer that "struct nbcon_state" is used to interpret/create values of
> console->nbcon_state.

Sounds good.

Best Regards,
Petr
  
John Ogness March 28, 2023, 9:42 a.m. UTC | #8
On 2023-03-28, Petr Mladek <pmladek@suse.com> wrote:
>> A compilation check would be nice. Is that possible?
>
> I think the following might do the trick:
>
> static_assert(sizeof(struct cons_state) == sizeof(atomic_long_t));

I never realized the kernel code was allowed to have that. But it is
everywhere! :-) Thanks. I've added and tested the following:

/*
 * The nbcon_state struct is used to easily create and interpret values that
 * are stored in the console.nbcon_state variable. Make sure this struct stays
 * within the size boundaries of that atomic variable's underlying type in
 * order to avoid any accidental truncation.
 */
static_assert(sizeof(struct nbcon_state) <= sizeof(long));

Note that I am checking against sizeof(long), the underlying variable
type. We probably shouldn't assume sizeof(atomic_long_t) is always
sizeof(long).

John
  
Petr Mladek March 28, 2023, 12:52 p.m. UTC | #9
On Tue 2023-03-28 11:48:06, John Ogness wrote:
> On 2023-03-28, Petr Mladek <pmladek@suse.com> wrote:
> >> A compilation check would be nice. Is that possible?
> >
> > I think the following might do the trick:
> >
> > static_assert(sizeof(struct cons_state) == sizeof(atomic_long_t));
> 
> I never realized the kernel code was allowed to have that. But it is
> everywhere! :-) Thanks. I've added and tested the following:
> 
> /*
>  * The nbcon_state struct is used to easily create and interpret values that
>  * are stored in the console.nbcon_state variable. Make sure this struct stays
>  * within the size boundaries of that atomic variable's underlying type in
>  * order to avoid any accidental truncation.
>  */
> static_assert(sizeof(struct nbcon_state) <= sizeof(long));
> 
> Note that I am checking against sizeof(long), the underlying variable
> type. We probably shouldn't assume sizeof(atomic_long_t) is always
> sizeof(long).

Makes sense and looks good to me.

Best Regards,
Petr
  
Steven Rostedt March 28, 2023, 1:47 p.m. UTC | #10
On Tue, 28 Mar 2023 11:48:06 +0206
John Ogness <john.ogness@linutronix.de> wrote:

> > static_assert(sizeof(struct cons_state) == sizeof(atomic_long_t));  
> 
> I never realized the kernel code was allowed to have that. But it is
> everywhere! :-) Thanks. I've added and tested the following:

I didn't know about static_assert(), as I always used BUILD_BUG_ON().

The difference being that BUILD_BUG_ON() has to be used within a function,
where as static_assert() can be done outside of functions. Hmm, maybe I can
convert some of my BUILD_BUG_ON()s to static_assert()s.

Learn something new every day.

-- Steve
  

Patch

diff --git a/fs/proc/consoles.c b/fs/proc/consoles.c
index e0758fe7936d..9ce506866e60 100644
--- a/fs/proc/consoles.c
+++ b/fs/proc/consoles.c
@@ -21,6 +21,7 @@  static int show_console_dev(struct seq_file *m, void *v)
 		{ CON_ENABLED,		'E' },
 		{ CON_CONSDEV,		'C' },
 		{ CON_BOOT,		'B' },
+		{ CON_NO_BKL,		'N' },
 		{ CON_PRINTBUFFER,	'p' },
 		{ CON_BRL,		'b' },
 		{ CON_ANYTIME,		'a' },
diff --git a/include/linux/console.h b/include/linux/console.h
index f7967fb238e0..b9d2ad580128 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -155,6 +155,8 @@  static inline int con_debug_leave(void)
  *			/dev/kmesg which requires a larger output buffer.
  * @CON_SUSPENDED:	Indicates if a console is suspended. If true, the
  *			printing callbacks must not be called.
+ * @CON_NO_BKL:		Console can operate outside of the BKL style console_lock
+ *			constraints.
  */
 enum cons_flags {
 	CON_PRINTBUFFER		= BIT(0),
@@ -165,6 +167,32 @@  enum cons_flags {
 	CON_BRL			= BIT(5),
 	CON_EXTENDED		= BIT(6),
 	CON_SUSPENDED		= BIT(7),
+	CON_NO_BKL		= BIT(8),
+};
+
+/**
+ * struct cons_state - console state for NOBKL consoles
+ * @atom:	Compound of the state fields for atomic operations
+ * @seq:	Sequence for record tracking (64bit only)
+ * @bits:	Compound of the state bits below
+ *
+ * To be used for state read and preparation of atomic_long_cmpxchg()
+ * operations.
+ */
+struct cons_state {
+	union {
+		unsigned long	atom;
+		struct {
+#ifdef CONFIG_64BIT
+			u32	seq;
+#endif
+			union {
+				u32	bits;
+				struct {
+				};
+			};
+		};
+	};
 };
 
 /**
@@ -186,6 +214,8 @@  enum cons_flags {
  * @dropped:		Number of unreported dropped ringbuffer records
  * @data:		Driver private data
  * @node:		hlist node for the console list
+ *
+ * @atomic_state:	State array for NOBKL consoles; real and handover
  */
 struct console {
 	char			name[16];
@@ -205,6 +235,9 @@  struct console {
 	unsigned long		dropped;
 	void			*data;
 	struct hlist_node	node;
+
+	/* NOBKL console specific members */
+	atomic_long_t		__private atomic_state[2];
 };
 
 #ifdef CONFIG_LOCKDEP
diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile
index f5b388e810b9..b36683bd2f82 100644
--- a/kernel/printk/Makefile
+++ b/kernel/printk/Makefile
@@ -1,6 +1,6 @@ 
 # SPDX-License-Identifier: GPL-2.0-only
 obj-y	= printk.o
-obj-$(CONFIG_PRINTK)	+= printk_safe.o
+obj-$(CONFIG_PRINTK)	+= printk_safe.o printk_nobkl.o
 obj-$(CONFIG_A11Y_BRAILLE_CONSOLE)	+= braille.o
 obj-$(CONFIG_PRINTK_INDEX)	+= index.o
 
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 2a17704136f1..da380579263b 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -3,6 +3,7 @@ 
  * internal.h - printk internal definitions
  */
 #include <linux/percpu.h>
+#include <linux/console.h>
 
 #if defined(CONFIG_PRINTK) && defined(CONFIG_SYSCTL)
 void __init printk_sysctl_init(void);
@@ -61,6 +62,10 @@  void defer_console_output(void);
 
 u16 printk_parse_prefix(const char *text, int *level,
 			enum printk_info_flags *flags);
+
+void cons_nobkl_cleanup(struct console *con);
+void cons_nobkl_init(struct console *con);
+
 #else
 
 #define PRINTK_PREFIX_MAX	0
@@ -76,8 +81,13 @@  u16 printk_parse_prefix(const char *text, int *level,
 #define printk_safe_exit_irqrestore(flags) local_irq_restore(flags)
 
 static inline bool printk_percpu_data_ready(void) { return false; }
+static inline void cons_nobkl_init(struct console *con) { }
+static inline void cons_nobkl_cleanup(struct console *con) { }
+
 #endif /* CONFIG_PRINTK */
 
+extern bool have_boot_console;
+
 /**
  * struct printk_buffers - Buffers to read/format/output printk messages.
  * @outbuf:	After formatting, contains text to output.
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 626d467c7e9b..b2c7c92c3d79 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -446,6 +446,19 @@  static int console_msg_format = MSG_FORMAT_DEFAULT;
 /* syslog_lock protects syslog_* variables and write access to clear_seq. */
 static DEFINE_MUTEX(syslog_lock);
 
+/*
+ * Specifies if a BKL console was ever registered. Used to determine if the
+ * console lock/unlock dance is needed for console printing.
+ */
+static bool have_bkl_console;
+
+/*
+ * Specifies if a boot console is registered. Used to determine if NOBKL
+ * consoles may be used since NOBKL consoles cannot synchronize with boot
+ * consoles.
+ */
+bool have_boot_console;
+
 #ifdef CONFIG_PRINTK
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
 /* All 3 protected by @syslog_lock. */
@@ -2301,7 +2314,7 @@  asmlinkage int vprintk_emit(int facility, int level,
 	printed_len = vprintk_store(facility, level, dev_info, fmt, args);
 
 	/* If called from the scheduler, we can not call up(). */
-	if (!in_sched) {
+	if (!in_sched && have_bkl_console) {
 		/*
 		 * The caller may be holding system-critical or
 		 * timing-sensitive locks. Disable preemption during
@@ -2624,7 +2637,7 @@  void resume_console(void)
  */
 static int console_cpu_notify(unsigned int cpu)
 {
-	if (!cpuhp_tasks_frozen) {
+	if (!cpuhp_tasks_frozen && have_bkl_console) {
 		/* If trylock fails, someone else is doing the printing */
 		if (console_trylock())
 			console_unlock();
@@ -3098,6 +3111,9 @@  void console_unblank(void)
 	struct console *c;
 	int cookie;
 
+	if (!have_bkl_console)
+		return;
+
 	/*
 	 * Stop console printing because the unblank() callback may
 	 * assume the console is not within its write() callback.
@@ -3135,6 +3151,9 @@  void console_unblank(void)
  */
 void console_flush_on_panic(enum con_flush_mode mode)
 {
+	if (!have_bkl_console)
+		return;
+
 	/*
 	 * If someone else is holding the console lock, trylock will fail
 	 * and may_schedule may be set.  Ignore and proceed to unlock so
@@ -3310,9 +3329,10 @@  static void try_enable_default_console(struct console *newcon)
 		newcon->flags |= CON_CONSDEV;
 }
 
-#define con_printk(lvl, con, fmt, ...)			\
-	printk(lvl pr_fmt("%sconsole [%s%d] " fmt),	\
-	       (con->flags & CON_BOOT) ? "boot" : "",	\
+#define con_printk(lvl, con, fmt, ...)				\
+	printk(lvl pr_fmt("%s%sconsole [%s%d] " fmt),		\
+	       (con->flags & CON_NO_BKL) ? "" : "legacy ",	\
+	       (con->flags & CON_BOOT) ? "boot" : "",		\
 	       con->name, con->index, ##__VA_ARGS__)
 
 static void console_init_seq(struct console *newcon, bool bootcon_registered)
@@ -3472,6 +3492,14 @@  void register_console(struct console *newcon)
 	newcon->dropped = 0;
 	console_init_seq(newcon, bootcon_registered);
 
+	if (!(newcon->flags & CON_NO_BKL))
+		have_bkl_console = true;
+	else
+		cons_nobkl_init(newcon);
+
+	if (newcon->flags & CON_BOOT)
+		have_boot_console = true;
+
 	/*
 	 * Put this console in the list - keep the
 	 * preferred driver at the head of the list.
@@ -3515,6 +3543,9 @@  void register_console(struct console *newcon)
 			if (con->flags & CON_BOOT)
 				unregister_console_locked(con);
 		}
+
+		/* All boot consoles have been unregistered. */
+		have_boot_console = false;
 	}
 unlock:
 	console_list_unlock();
@@ -3563,6 +3594,9 @@  static int unregister_console_locked(struct console *console)
 	 */
 	synchronize_srcu(&console_srcu);
 
+	if (console->flags & CON_NO_BKL)
+		cons_nobkl_cleanup(console);
+
 	console_sysfs_notify();
 
 	if (console->exit)
@@ -3866,11 +3900,15 @@  void wake_up_klogd(void)
  */
 void defer_console_output(void)
 {
+	int val = PRINTK_PENDING_WAKEUP;
+
 	/*
 	 * New messages may have been added directly to the ringbuffer
 	 * using vprintk_store(), so wake any waiters as well.
 	 */
-	__wake_up_klogd(PRINTK_PENDING_WAKEUP | PRINTK_PENDING_OUTPUT);
+	if (have_bkl_console)
+		val |= PRINTK_PENDING_OUTPUT;
+	__wake_up_klogd(val);
 }
 
 void printk_trigger_flush(void)
diff --git a/kernel/printk/printk_nobkl.c b/kernel/printk/printk_nobkl.c
new file mode 100644
index 000000000000..8df3626808dd
--- /dev/null
+++ b/kernel/printk/printk_nobkl.c
@@ -0,0 +1,137 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (C) 2022 Linutronix GmbH, John Ogness
+// Copyright (C) 2022 Intel, Thomas Gleixner
+
+#include <linux/kernel.h>
+#include <linux/console.h>
+#include "internal.h"
+/*
+ * Printk implementation for consoles that do not depend on the BKL style
+ * console_lock mechanism.
+ *
+ * Console is locked on a CPU when state::locked is set and state:cpu ==
+ * current CPU. This is valid for the current execution context.
+ *
+ * Nesting execution contexts on the same CPU can carefully take over
+ * if the driver allows reentrancy via state::unsafe = false. When the
+ * interrupted context resumes it checks the state before entering
+ * an unsafe region and aborts the operation if it detects a takeover.
+ *
+ * In case of panic or emergency the nesting context can take over the
+ * console forcefully. The write callback is then invoked with the unsafe
+ * flag set in the write context data, which allows the driver side to avoid
+ * locks and to evaluate the driver state so it can use an emergency path
+ * or repair the state instead of blindly assuming that it works.
+ *
+ * If the interrupted context touches the assigned record buffer after
+ * takeover, it does not cause harm because at the same execution level
+ * there is no concurrency on the same CPU. A threaded printer always has
+ * its own record buffer so it can never interfere with any of the per CPU
+ * record buffers.
+ *
+ * A concurrent writer on a different CPU can request to take over the
+ * console by:
+ *
+ *	1) Carefully writing the desired state into state[REQ]
+ *	   if there is no same or higher priority request pending.
+ *	   This locks state[REQ] except for higher priority
+ *	   waiters.
+ *
+ *	2) Setting state[CUR].req_prio unless a same or higher
+ *	   priority waiter won the race.
+ *
+ *	3) Carefully spin on state[CUR] until that is locked with the
+ *	   expected state. When the state is not the expected one then it
+ *	   has to verify that state[REQ] is still the same and that
+ *	   state[CUR] has not been taken over or unlocked.
+ *
+ *      The unlocker hands over to state[REQ], but only if state[CUR]
+ *	matches.
+ *
+ * In case that the owner does not react on the request and does not make
+ * observable progress, the waiter will timeout and can then decide to do
+ * a hostile takeover.
+ */
+
+#define copy_full_state(_dst, _src)	do { _dst = _src; } while (0)
+#define copy_bit_state(_dst, _src)	do { _dst.bits = _src.bits; } while (0)
+
+#ifdef CONFIG_64BIT
+#define copy_seq_state64(_dst, _src)	do { _dst.seq = _src.seq; } while (0)
+#else
+#define copy_seq_state64(_dst, _src)	do { } while (0)
+#endif
+
+enum state_selector {
+	CON_STATE_CUR,
+	CON_STATE_REQ,
+};
+
+/**
+ * cons_state_set - Helper function to set the console state
+ * @con:	Console to update
+ * @which:	Selects real state or handover state
+ * @new:	The new state to write
+ *
+ * Only to be used when the console is not yet or no longer visible in the
+ * system. Otherwise use cons_state_try_cmpxchg().
+ */
+static inline void cons_state_set(struct console *con, enum state_selector which,
+				  struct cons_state *new)
+{
+	atomic_long_set(&ACCESS_PRIVATE(con, atomic_state[which]), new->atom);
+}
+
+/**
+ * cons_state_read - Helper function to read the console state
+ * @con:	Console to update
+ * @which:	Selects real state or handover state
+ * @state:	The state to store the result
+ */
+static inline void cons_state_read(struct console *con, enum state_selector which,
+				   struct cons_state *state)
+{
+	state->atom = atomic_long_read(&ACCESS_PRIVATE(con, atomic_state[which]));
+}
+
+/**
+ * cons_state_try_cmpxchg() - Helper function for atomic_long_try_cmpxchg() on console state
+ * @con:	Console to update
+ * @which:	Selects real state or handover state
+ * @old:	Old/expected state
+ * @new:	New state
+ *
+ * Returns: True on success, false on fail
+ */
+static inline bool cons_state_try_cmpxchg(struct console *con,
+					  enum state_selector which,
+					  struct cons_state *old,
+					  struct cons_state *new)
+{
+	return atomic_long_try_cmpxchg(&ACCESS_PRIVATE(con, atomic_state[which]),
+				       &old->atom, new->atom);
+}
+
+/**
+ * cons_nobkl_init - Initialize the NOBKL console specific data
+ * @con:	Console to initialize
+ */
+void cons_nobkl_init(struct console *con)
+{
+	struct cons_state state = { };
+
+	cons_state_set(con, CON_STATE_CUR, &state);
+	cons_state_set(con, CON_STATE_REQ, &state);
+}
+
+/**
+ * cons_nobkl_cleanup - Cleanup the NOBKL console specific data
+ * @con:	Console to cleanup
+ */
+void cons_nobkl_cleanup(struct console *con)
+{
+	struct cons_state state = { };
+
+	cons_state_set(con, CON_STATE_CUR, &state);
+	cons_state_set(con, CON_STATE_REQ, &state);
+}