[printk,v5,03/40] printk: Prepare for SRCU console list protection

Message ID 20221116162152.193147-4-john.ogness@linutronix.de
State New
Headers
Series reduce console_lock scope |

Commit Message

John Ogness Nov. 16, 2022, 4:21 p.m. UTC
  Provide an NMI-safe SRCU protected variant to walk the console list.

Note that all console fields are now set before adding the console
to the list to avoid the console becoming visible by SCRU readers
before being fully initialized.

This is a preparatory change for a new console infrastructure which
operates independent of the console BKL.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
 .clang-format           |  1 +
 include/linux/console.h | 28 ++++++++++++-
 kernel/printk/printk.c  | 87 ++++++++++++++++++++++++++++++++++-------
 3 files changed, 100 insertions(+), 16 deletions(-)
  

Comments

Nathan Chancellor Dec. 1, 2022, 6:22 p.m. UTC | #1
Hi John,

On Wed, Nov 16, 2022 at 05:27:15PM +0106, John Ogness wrote:
> Provide an NMI-safe SRCU protected variant to walk the console list.
> 
> Note that all console fields are now set before adding the console
> to the list to avoid the console becoming visible by SCRU readers
> before being fully initialized.
> 
> This is a preparatory change for a new console infrastructure which
> operates independent of the console BKL.
> 
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: John Ogness <john.ogness@linutronix.de>
> Acked-by: Miguel Ojeda <ojeda@kernel.org>
> Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
> Reviewed-by: Petr Mladek <pmladek@suse.com>

I just bisected a boot hang to this change in -next as commit
621a912810a5 ("printk: Prepare for SRCU console list protection"); I
have included the bisect log at the end of this message.

The failure occurs while booting ARCH=powerpc pmac32_defconfig in QEMU,
during which I see the following message spammed constantly. I have
included reproduction steps below, assuming you can use kernel.org's
toolchains [1] (I used GCC 11.3.0):

$ make -skj$(nproc) ARCH=powerpc CROSS_COMPILE=powerpc-linux- mrproper pmac32_defconfig

# For proper serial console output
$ scripts/config -e SERIAL_PMACZILOG -e SERIAL_PMACZILOG_CONSOLE

$ make -skj$(nproc) ARCH=powerpc CROSS_COMPILE=powerpc-linux- olddefconfig vmlinux

$ qemu-system-ppc \
-append console=ttyS0 \
-display none \
-initrd rootfs.cpio \
-kernel vmlinux \
-m 128m \
-machine mac99 \
-nodefaults \
-no-reboot \
-serial mon:stdio
...
bad: scheduling from the idle thread!
CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc1+ #1
Hardware name: PowerMac3,1 7400 0xc0209 PowerMac
Call Trace:
[c0bc1db0] [c07f07e0] dump_stack_lvl+0x34/0x50 (unreliable)
[c0bc1dd0] [c008429c] dequeue_task_idle+0x34/0x5c
[c0bc1df0] [c0820924] __schedule+0x56c/0x5c4
[c0bc1e40] [c08209d0] schedule+0x54/0xfc
[c0bc1e60] [c0826034] schedule_timeout+0x13c/0x194
[c0bc1ea0] [c082134c] __wait_for_common+0xcc/0x1f4
[c0bc1ee0] [c00ac8ac] synchronize_srcu+0xc8/0x12c
[c0bc1f20] [c00a0230] unregister_console+0xc8/0x10c
[c0bc1f40] [c009e314] register_console+0x2f4/0x390
[c0bc1f60] [c0a17510] pmz_console_init+0x34/0x48
[c0bc1f70] [c0a0491c] console_init+0x9c/0xf0
[c0bc1fa0] [c09f5584] start_kernel+0x588/0x6ac
[c0bc1ff0] [00003540] 0x3540
...

The rootfs can be downloaded from [2] at 'images/ppc32/rootfs.cpio.zst'
(just decompress it with 'zstd' first) or you can use 'boot-qemu.py' to
run QEMU automatically (pass '-h' to see the options available).

If there is any further information I can provide or patches I can test,
I am more than happy to do so.

Cheers,
Nathan

[1]: https://mirrors.edge.kernel.org/pub/tools/crosstool/
[2]: https://github.com/ClangBuiltLinux/boot-utils

# bad: [2934ceb4e967b9233d0f97732e47175574a11406] Add linux-next specific files for 20221201
# good: [ef4d3ea40565a781c25847e9cb96c1bd9f462bc6] afs: Fix server->active leak in afs_put_server
git bisect start '2934ceb4e967b9233d0f97732e47175574a11406' 'ef4d3ea40565a781c25847e9cb96c1bd9f462bc6'
# bad: [e21f15c77886e52e4cdb448933ffa9efff48695f] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
git bisect bad e21f15c77886e52e4cdb448933ffa9efff48695f
# good: [dae6dd6a3a5774175b752856b1a19201317f591d] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git
git bisect good dae6dd6a3a5774175b752856b1a19201317f591d
# good: [59b843691d726b332492d70038131d2fb584fc7c] selftests/bpf: Add bench test to arm64 and s390x denylist
git bisect good 59b843691d726b332492d70038131d2fb584fc7c
# bad: [0e4c3a93d8ac956c1697e376ba8f772d342cda6a] Merge branch 'master' of git://linuxtv.org/media_tree.git
git bisect bad 0e4c3a93d8ac956c1697e376ba8f772d342cda6a
# bad: [89e8355104a3b82372fe8ceae026a96097b823b9] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid.git
git bisect bad 89e8355104a3b82372fe8ceae026a96097b823b9
# bad: [9d9e2018111d42c49c34bc4c59e65163185cb1df] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git
git bisect bad 9d9e2018111d42c49c34bc4c59e65163185cb1df
# bad: [a04b46d8846f8c92bb7697554609d3ae5bc7efb9] Merge branch 'rework/console-list-lock' into for-next
git bisect bad a04b46d8846f8c92bb7697554609d3ae5bc7efb9
# bad: [1a30db5d04673703450f944b3141fd89c70afa70] tty: tty_io: use console_list_lock for list synchronization
git bisect bad 1a30db5d04673703450f944b3141fd89c70afa70
# bad: [feec6b0406f1b71798d31c48d5721b6461e4a636] proc: consoles: document console_lock usage
git bisect bad feec6b0406f1b71798d31c48d5721b6461e4a636
# good: [e29a4915db1480f96e0bc2e928699d086a71f43c] srcu: Debug NMI safety even on archs that don't require it
git bisect good e29a4915db1480f96e0bc2e928699d086a71f43c
# bad: [fd9e5fe0e2a8a8ffc159430eb7ee676978159432] printk: register_console: use "registered" for variable names
git bisect bad fd9e5fe0e2a8a8ffc159430eb7ee676978159432
# good: [c1313d1ab38deb6621045f7d70f0d34f19ec7cfc] serial: kgdboc: Lock console list in probe function
git bisect good c1313d1ab38deb6621045f7d70f0d34f19ec7cfc
# bad: [621a912810a582195eacd96a3536fd057376be21] printk: Prepare for SRCU console list protection
git bisect bad 621a912810a582195eacd96a3536fd057376be21
# good: [7f2dc96716485b01c65f564f2805c558eb0b1a10] printk: Convert console_drivers list to hlist
git bisect good 7f2dc96716485b01c65f564f2805c558eb0b1a1
# first bad commit: [621a912810a582195eacd96a3536fd057376be21] printk: Prepare for SRCU console list protection
  
John Ogness Dec. 1, 2022, 9:36 p.m. UTC | #2
Hi Nathan,

Thanks for reporting this. Patch below.

@paulmck: Please also take a look below.

On 2022-12-01, Nathan Chancellor <nathan@kernel.org> wrote:
> bad: scheduling from the idle thread!
> CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc1+ #1
> Hardware name: PowerMac3,1 7400 0xc0209 PowerMac
> Call Trace:
> [c0bc1db0] [c07f07e0] dump_stack_lvl+0x34/0x50 (unreliable)
> [c0bc1dd0] [c008429c] dequeue_task_idle+0x34/0x5c
> [c0bc1df0] [c0820924] __schedule+0x56c/0x5c4
> [c0bc1e40] [c08209d0] schedule+0x54/0xfc
> [c0bc1e60] [c0826034] schedule_timeout+0x13c/0x194
> [c0bc1ea0] [c082134c] __wait_for_common+0xcc/0x1f4
> [c0bc1ee0] [c00ac8ac] synchronize_srcu+0xc8/0x12c
> [c0bc1f20] [c00a0230] unregister_console+0xc8/0x10c
> [c0bc1f40] [c009e314] register_console+0x2f4/0x390
> [c0bc1f60] [c0a17510] pmz_console_init+0x34/0x48
> [c0bc1f70] [c0a0491c] console_init+0x9c/0xf0
> [c0bc1fa0] [c09f5584] start_kernel+0x588/0x6ac
> [c0bc1ff0] [00003540] 0x3540

This config is using TINY_RCU. Its srcu_synchronize() implementation
does not check if it called before scheduling is ready. The following
patch will fix it.

@paulmck: Should it check (system_state < SYSTEM_SCHEDULING) instead
since TINY_RCU does not use @rcu_scheduler_active?

John Ogness

diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
index 33adafdad261..35338e6e37e7 100644
--- a/kernel/rcu/srcutiny.c
+++ b/kernel/rcu/srcutiny.c
@@ -197,6 +197,9 @@ void synchronize_srcu(struct srcu_struct *ssp)
 {
 	struct rcu_synchronize rs;
 
+	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
+		return;
+
 	init_rcu_head_on_stack(&rs.head);
 	init_completion(&rs.completion);
 	call_srcu(ssp, &rs.head, wakeme_after_rcu);
  
Paul E. McKenney Dec. 1, 2022, 9:56 p.m. UTC | #3
On Thu, Dec 01, 2022 at 10:42:25PM +0106, John Ogness wrote:
> Hi Nathan,
> 
> Thanks for reporting this. Patch below.
> 
> @paulmck: Please also take a look below.
> 
> On 2022-12-01, Nathan Chancellor <nathan@kernel.org> wrote:
> > bad: scheduling from the idle thread!
> > CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc1+ #1
> > Hardware name: PowerMac3,1 7400 0xc0209 PowerMac
> > Call Trace:
> > [c0bc1db0] [c07f07e0] dump_stack_lvl+0x34/0x50 (unreliable)
> > [c0bc1dd0] [c008429c] dequeue_task_idle+0x34/0x5c
> > [c0bc1df0] [c0820924] __schedule+0x56c/0x5c4
> > [c0bc1e40] [c08209d0] schedule+0x54/0xfc
> > [c0bc1e60] [c0826034] schedule_timeout+0x13c/0x194
> > [c0bc1ea0] [c082134c] __wait_for_common+0xcc/0x1f4
> > [c0bc1ee0] [c00ac8ac] synchronize_srcu+0xc8/0x12c
> > [c0bc1f20] [c00a0230] unregister_console+0xc8/0x10c
> > [c0bc1f40] [c009e314] register_console+0x2f4/0x390
> > [c0bc1f60] [c0a17510] pmz_console_init+0x34/0x48
> > [c0bc1f70] [c0a0491c] console_init+0x9c/0xf0
> > [c0bc1fa0] [c09f5584] start_kernel+0x588/0x6ac
> > [c0bc1ff0] [00003540] 0x3540
> 
> This config is using TINY_RCU. Its srcu_synchronize() implementation
> does not check if it called before scheduling is ready. The following
> patch will fix it.
> 
> @paulmck: Should it check (system_state < SYSTEM_SCHEDULING) instead
> since TINY_RCU does not use @rcu_scheduler_active?

Thank you for chasing this down, John!

You are exactly right, and I therefore need to pull this into the
pile for the upcoming merge window:

dbc6ca150842 ("srcu: Make Tiny synchronize_srcu() check for readers")

And kudos to Zqiang for a proactive fix!  ;-)

I will add your (John's) Tested-by, but please let me know if this is
inappropriate.

							Thanx, Paul

> John Ogness
> 
> diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
> index 33adafdad261..35338e6e37e7 100644
> --- a/kernel/rcu/srcutiny.c
> +++ b/kernel/rcu/srcutiny.c
> @@ -197,6 +197,9 @@ void synchronize_srcu(struct srcu_struct *ssp)
>  {
>  	struct rcu_synchronize rs;
>  
> +	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
> +		return;
> +
>  	init_rcu_head_on_stack(&rs.head);
>  	init_completion(&rs.completion);
>  	call_srcu(ssp, &rs.head, wakeme_after_rcu);
  
John Ogness Dec. 1, 2022, 10:12 p.m. UTC | #4
On 2022-12-01, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> I therefore need to pull this into the pile for the upcoming merge
> window:
>
> dbc6ca150842 ("srcu: Make Tiny synchronize_srcu() check for readers")
>
> And kudos to Zqiang for a proactive fix!  ;-)

Yes! Great job!

> I will add your (John's) Tested-by, but please let me know if this is
> inappropriate.

Sure.

Tested-by: John Ogness <john.ogness@linutronix.de>

John
  
Paul E. McKenney Dec. 2, 2022, 12:21 a.m. UTC | #5
On Thu, Dec 01, 2022 at 11:18:44PM +0106, John Ogness wrote:
> On 2022-12-01, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > I therefore need to pull this into the pile for the upcoming merge
> > window:
> >
> > dbc6ca150842 ("srcu: Make Tiny synchronize_srcu() check for readers")
> >
> > And kudos to Zqiang for a proactive fix!  ;-)
> 
> Yes! Great job!
> 
> > I will add your (John's) Tested-by, but please let me know if this is
> > inappropriate.
> 
> Sure.
> 
> Tested-by: John Ogness <john.ogness@linutronix.de>

Applied, thank you!

The new -rcu branch is srcunmisafe.2022.12.01a.

Adding Stephen Rothwell on CC because Mr. Murphy says that this change
will once again mess up -next.  Apologies in advance!!!

							Thanx, Paul
  
Petr Mladek Dec. 2, 2022, 10:53 a.m. UTC | #6
On Thu 2022-12-01 16:21:52, Paul E. McKenney wrote:
> On Thu, Dec 01, 2022 at 11:18:44PM +0106, John Ogness wrote:
> > On 2022-12-01, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > > I therefore need to pull this into the pile for the upcoming merge
> > > window:
> > >
> > > dbc6ca150842 ("srcu: Make Tiny synchronize_srcu() check for readers")
> > >
> > > And kudos to Zqiang for a proactive fix!  ;-)
> > 
> > Yes! Great job!
> > 
> > > I will add your (John's) Tested-by, but please let me know if this is
> > > inappropriate.
> > 
> > Sure.
> > 
> > Tested-by: John Ogness <john.ogness@linutronix.de>
> 
> Applied, thank you!
> 
> The new -rcu branch is srcunmisafe.2022.12.01a.
> 
> Adding Stephen Rothwell on CC because Mr. Murphy says that this change
> will once again mess up -next.  Apologies in advance!!!

I have rebased printk/linux.git, branch rework/console-list-lock on
top of srcunmisafe.2022.12.01a, commit 51f5f78a4f804aeb73cf ("srcu:
Make Tiny synchronize_srcu() check for readers").

I hope that it will be OK.

Thanks a lot for nailing this down. It looked like a mystery to me.

Best Regards,
Petr
  

Patch

diff --git a/.clang-format b/.clang-format
index 1247d54f9e49..04a675b56b57 100644
--- a/.clang-format
+++ b/.clang-format
@@ -222,6 +222,7 @@  ForEachMacros:
   - 'for_each_component_dais'
   - 'for_each_component_dais_safe'
   - 'for_each_console'
+  - 'for_each_console_srcu'
   - 'for_each_cpu'
   - 'for_each_cpu_and'
   - 'for_each_cpu_not'
diff --git a/include/linux/console.h b/include/linux/console.h
index 7b5f21f9e469..f4f0c9523835 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -15,7 +15,7 @@ 
 #define _LINUX_CONSOLE_H_ 1
 
 #include <linux/atomic.h>
-#include <linux/list.h>
+#include <linux/rculist.h>
 #include <linux/types.h>
 
 struct vc_data;
@@ -158,8 +158,34 @@  struct console {
 	struct hlist_node node;
 };
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+extern bool console_srcu_read_lock_is_held(void);
+#else
+static inline bool console_srcu_read_lock_is_held(void)
+{
+	return 1;
+}
+#endif
+
+extern int console_srcu_read_lock(void);
+extern void console_srcu_read_unlock(int cookie);
+
 extern struct hlist_head console_list;
 
+/**
+ * for_each_console_srcu() - Iterator over registered consoles
+ * @con:	struct console pointer used as loop cursor
+ *
+ * Although SRCU guarantees the console list will be consistent, the
+ * struct console fields may be updated by other CPUs while iterating.
+ *
+ * Requires console_srcu_read_lock to be held. Can be invoked from
+ * any context.
+ */
+#define for_each_console_srcu(con)					\
+	hlist_for_each_entry_srcu(con, &console_list, node,		\
+				  console_srcu_read_lock_is_held())
+
 /*
  * for_each_console() allows you to iterate on each console
  */
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e6f0832e71f0..173f46a29252 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -85,6 +85,7 @@  EXPORT_SYMBOL(oops_in_progress);
 static DEFINE_SEMAPHORE(console_sem);
 HLIST_HEAD(console_list);
 EXPORT_SYMBOL_GPL(console_list);
+DEFINE_STATIC_SRCU(console_srcu);
 
 /*
  * System may need to suppress printk message under certain
@@ -104,6 +105,13 @@  static struct lockdep_map console_lock_dep_map = {
 };
 #endif
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+bool console_srcu_read_lock_is_held(void)
+{
+	return srcu_read_lock_held(&console_srcu);
+}
+#endif
+
 enum devkmsg_log_bits {
 	__DEVKMSG_LOG_BIT_ON = 0,
 	__DEVKMSG_LOG_BIT_OFF,
@@ -219,6 +227,32 @@  int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
 }
 #endif /* CONFIG_PRINTK && CONFIG_SYSCTL */
 
+/**
+ * console_srcu_read_lock - Register a new reader for the
+ *	SRCU-protected console list
+ *
+ * Use for_each_console_srcu() to iterate the console list
+ *
+ * Context: Any context.
+ */
+int console_srcu_read_lock(void)
+{
+	return srcu_read_lock_nmisafe(&console_srcu);
+}
+EXPORT_SYMBOL(console_srcu_read_lock);
+
+/**
+ * console_srcu_read_unlock - Unregister an old reader from
+ *	the SRCU-protected console list
+ *
+ * Counterpart to console_srcu_read_lock()
+ */
+void console_srcu_read_unlock(int cookie)
+{
+	srcu_read_unlock_nmisafe(&console_srcu, cookie);
+}
+EXPORT_SYMBOL(console_srcu_read_unlock);
+
 /*
  * Helper macros to handle lockdep when locking/unlocking console_sem. We use
  * macros instead of functions so that _RET_IP_ contains useful information.
@@ -2989,6 +3023,14 @@  void console_stop(struct console *console)
 	console_lock();
 	console->flags &= ~CON_ENABLED;
 	console_unlock();
+
+	/*
+	 * Ensure that all SRCU list walks have completed. All contexts must
+	 * be able to see that this console is disabled so that (for example)
+	 * the caller can suspend the port without risk of another context
+	 * using the port.
+	 */
+	synchronize_srcu(&console_srcu);
 }
 EXPORT_SYMBOL(console_stop);
 
@@ -3179,6 +3221,17 @@  void register_console(struct console *newcon)
 		newcon->flags &= ~CON_PRINTBUFFER;
 	}
 
+	newcon->dropped = 0;
+	if (newcon->flags & CON_PRINTBUFFER) {
+		/* Get a consistent copy of @syslog_seq. */
+		mutex_lock(&syslog_lock);
+		newcon->seq = syslog_seq;
+		mutex_unlock(&syslog_lock);
+	} else {
+		/* Begin with next message. */
+		newcon->seq = prb_next_seq(prb);
+	}
+
 	/*
 	 * Put this console in the list - keep the
 	 * preferred driver at the head of the list.
@@ -3187,28 +3240,24 @@  void register_console(struct console *newcon)
 	if (hlist_empty(&console_list)) {
 		/* Ensure CON_CONSDEV is always set for the head. */
 		newcon->flags |= CON_CONSDEV;
-		hlist_add_head(&newcon->node, &console_list);
+		hlist_add_head_rcu(&newcon->node, &console_list);
 
 	} else if (newcon->flags & CON_CONSDEV) {
 		/* Only the new head can have CON_CONSDEV set. */
 		console_first()->flags &= ~CON_CONSDEV;
-		hlist_add_head(&newcon->node, &console_list);
+		hlist_add_head_rcu(&newcon->node, &console_list);
 
 	} else {
-		hlist_add_behind(&newcon->node, console_list.first);
-	}
-
-	newcon->dropped = 0;
-	if (newcon->flags & CON_PRINTBUFFER) {
-		/* Get a consistent copy of @syslog_seq. */
-		mutex_lock(&syslog_lock);
-		newcon->seq = syslog_seq;
-		mutex_unlock(&syslog_lock);
-	} else {
-		/* Begin with next message. */
-		newcon->seq = prb_next_seq(prb);
+		hlist_add_behind_rcu(&newcon->node, console_list.first);
 	}
 	console_unlock();
+
+	/*
+	 * No need to synchronize SRCU here! The caller does not rely
+	 * on all contexts being able to see the new console before
+	 * register_console() completes.
+	 */
+
 	console_sysfs_notify();
 
 	/*
@@ -3254,7 +3303,7 @@  int unregister_console(struct console *console)
 		return -ENODEV;
 	}
 
-	hlist_del_init(&console->node);
+	hlist_del_init_rcu(&console->node);
 
 	/*
 	 * <HISTORICAL>
@@ -3269,6 +3318,14 @@  int unregister_console(struct console *console)
 		console_first()->flags |= CON_CONSDEV;
 
 	console_unlock();
+
+	/*
+	 * Ensure that all SRCU list walks have completed. All contexts
+	 * must not be able to see this console in the list so that any
+	 * exit/cleanup routines can be performed safely.
+	 */
+	synchronize_srcu(&console_srcu);
+
 	console_sysfs_notify();
 
 	if (console->exit)