tracing: Do not synchronize freeing of trigger filter on boot up

Message ID 20221213172429.7774f4ba@gandalf.local.home
State New
Series tracing: Do not synchronize freeing of trigger filter on boot up

Commit Message

Steven Rostedt Dec. 13, 2022, 10:24 p.m. UTC
  From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

If a trigger filter on the kernel command line fails to apply (due to a
syntax error), it will be freed. The freeing will call
tracepoint_synchronize_unregister(), but this is not needed during early
boot up, and will even trigger a lockdep splat.

Avoid calling the synchronization function when system_state is
SYSTEM_BOOTING.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/trace_events_trigger.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
  

Comments

Masami Hiramatsu (Google) Dec. 13, 2022, 11:49 p.m. UTC | #1
On Tue, 13 Dec 2022 17:24:29 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> 
> If a trigger filter on the kernel command line fails to apply (due to
> syntax error), it will be freed. The freeing will call
> tracepoint_synchronize_unregister(), but this is not needed during early
> boot up, and will even trigger a lockdep splat.
> 
> Avoid calling the synchronization function when system_state is
> SYSTEM_BOOTING.

Shouldn't this be done inside tracepoint_synchronize_unregister()?
Then, it will prevent similar warnings if we expand the boot-time features.
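
A rough sketch of that alternative (illustrative only; the exact body of
tracepoint_synchronize_unregister() in mainline may differ):

/*
 * Illustrative sketch, not a proposed patch: move the early-boot check
 * into the helper itself so that every caller is covered.
 */
void tracepoint_synchronize_unregister(void)
{
	/* Early boot: one CPU, IRQs off, nothing to synchronize with. */
	if (system_state == SYSTEM_BOOTING)
		return;
	synchronize_srcu(&tracepoint_srcu);
	synchronize_rcu();
}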

Thank you,

> 
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
>  kernel/trace/trace_events_trigger.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
> index 19ce9d22bfd7..e535959939d3 100644
> --- a/kernel/trace/trace_events_trigger.c
> +++ b/kernel/trace/trace_events_trigger.c
> @@ -1085,8 +1085,14 @@ int set_trigger_filter(char *filter_str,
>  	rcu_assign_pointer(data->filter, filter);
>  
>  	if (tmp) {
> -		/* Make sure the call is done with the filter */
> -		tracepoint_synchronize_unregister();
> +		/*
> +		 * Make sure the call is done with the filter.
> +		 * It is possible that a filter could fail at boot up,
> +		 * and then this path will be called. Avoid the synchronization
> +		 * in that case.
> +		 */
> +		if (system_state != SYSTEM_BOOTING)
> +			tracepoint_synchronize_unregister();
>  		free_event_filter(tmp);
>  	}
>  
> -- 
> 2.35.1
>
  
Steven Rostedt Dec. 14, 2022, 12:03 a.m. UTC | #2
On Wed, 14 Dec 2022 08:49:54 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> > Avoid calling the synchronization function when system_state is
> > SYSTEM_BOOTING.  
> 
> Shouldn't this be done inside tracepoint_synchronize_unregister()?
> Then, it will prevent similar warnings if we expand boot time feature.

I thought about that, but because this is a special case, I was thinking
that we don't just gloss over it and have something call this when it
shouldn't be in early boot up. I was surprised when it was called, and then
had to debug it a bit to figure out why it was. When I discovered the path
it took to get there, I realized it was legitimate.

Right now, I only see this one case as an exception. If more pop up that
are correct, then we can switch it over. But right now, I still want
calling tracepoint_synchronize_unregister() to warn if it is called in
early boot up, because that could be a sign that something went wrong.

-- Steve
  
Masami Hiramatsu (Google) Dec. 14, 2022, 7:31 a.m. UTC | #3
On Tue, 13 Dec 2022 19:03:21 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Wed, 14 Dec 2022 08:49:54 +0900
> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> 
> > > Avoid calling the synchronization function when system_state is
> > > SYSTEM_BOOTING.  
> > 
> > Shouldn't this be done inside tracepoint_synchronize_unregister()?
> > Then, it will prevent similar warnings if we expand boot time feature.
> 
> I thought about that, but because this is a special case, I was thinking
> that we don't just gloss over it and have something call this when it
> shouldn't be in early boot up. I was surprised when it was called, and then
> had to debug it a bit to figure out why it was. When I discovered the path
> it took to get there, I realized it was legitimate.
> 
> Right now, I only see this one case as an exception. If more pop up that
> are correct, then we can switch it over. But right now, I still want
> calling tracepoint_synchronize_unregister() to warn if it is called in
> early boot up, because that could be a sign that something went wrong.

Thanks, I got it. :) So when we find more synchronization is involved,
we'll revisit it.

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Thank you,

> 
> -- Steve
  
Paul E. McKenney Dec. 14, 2022, 8:03 p.m. UTC | #4
On Wed, Dec 14, 2022 at 08:49:54AM +0900, Masami Hiramatsu wrote:
> On Tue, 13 Dec 2022 17:24:29 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> > 
> > If a trigger filter on the kernel command line fails to apply (due to
> > syntax error), it will be freed. The freeing will call
> > tracepoint_synchronize_unregister(), but this is not needed during early
> > boot up, and will even trigger a lockdep splat.
> > 
> > Avoid calling the synchronization function when system_state is
> > SYSTEM_BOOTING.
> 
> Shouldn't this be done inside tracepoint_synchronize_unregister()?
> Then, it will prevent similar warnings if we expand boot time feature.

How about the following wide-spectrum fix within RCU_LOCKDEP_WARN()?
Just in case there are ever additional issues of this sort?

							Thanx, Paul

------------------------------------------------------------------------

commit d493ffca2df6c1963bd1d7b8f8c652a172f095ae
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed Dec 14 11:41:44 2022 -0800

    rcu: Make RCU_LOCKDEP_WARN() avoid early lockdep checks
    
    Currently, RCU_LOCKDEP_WARN() checks the condition before checking
    to see if lockdep is still enabled.  This is necessary to avoid the
    false-positive splats fixed by commit 3066820034b5dd ("rcu: Reject
    RCU_LOCKDEP_WARN() false positives").  However, the current state can
    result in false-positive splats during early boot before lockdep is fully
    initialized.  This commit therefore checks debug_lockdep_rcu_enabled()
    both before and after checking the condition, thus avoiding both sets
    of false-positive error reports.
    
    Reported-by: Steven Rostedt <rostedt@goodmis.org>
    Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index aa1b1c3546d7a..1aec1d53b0c91 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -364,11 +364,18 @@ static inline int debug_lockdep_rcu_enabled(void)
  * RCU_LOCKDEP_WARN - emit lockdep splat if specified condition is met
  * @c: condition to check
  * @s: informative message
+ *
+ * This checks debug_lockdep_rcu_enabled() before checking (c) to
+ * prevent early boot splats due to lockdep not yet being initialized,
+ * and rechecks it after checking (c) to prevent false-positive splats
+ * due to races with lockdep being disabled.  See commit 3066820034b5dd
+ * ("rcu: Reject RCU_LOCKDEP_WARN() false positives") for more detail.
  */
 #define RCU_LOCKDEP_WARN(c, s)						\
 	do {								\
 		static bool __section(".data.unlikely") __warned;	\
-		if ((c) && debug_lockdep_rcu_enabled() && !__warned) {	\
+		if (debug_lockdep_rcu_enabled() && (c) &&		\
+		    debug_lockdep_rcu_enabled() && !__warned) {		\
 			__warned = true;				\
 			lockdep_rcu_suspicious(__FILE__, __LINE__, s);	\
 		}							\
  
Mathieu Desnoyers Dec. 14, 2022, 8:37 p.m. UTC | #5
On 2022-12-14 15:03, Paul E. McKenney wrote:
> On Wed, Dec 14, 2022 at 08:49:54AM +0900, Masami Hiramatsu wrote:
>> On Tue, 13 Dec 2022 17:24:29 -0500
>> Steven Rostedt <rostedt@goodmis.org> wrote:
>>
>>> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
>>>
>>> If a trigger filter on the kernel command line fails to apply (due to
>>> syntax error), it will be freed. The freeing will call
>>> tracepoint_synchronize_unregister(), but this is not needed during early
>>> boot up, and will even trigger a lockdep splat.
>>>
>>> Avoid calling the synchronization function when system_state is
>>> SYSTEM_BOOTING.
>>
>> Shouldn't this be done inside tracepoint_synchronize_unregister()?
>> Then, it will prevent similar warnings if we expand boot time feature.
> 
> How about the following wide-spectrum fix within RCU_LOCKDEP_WARN()?
> Just in case there are ever additional issues of this sort?

Hi Paul,

Your approach makes sense. Thanks for looking into this.

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit d493ffca2df6c1963bd1d7b8f8c652a172f095ae
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date:   Wed Dec 14 11:41:44 2022 -0800
> 
>      rcu: Make RCU_LOCKDEP_WARN() avoid early lockdep checks
>      
>      Currently, RCU_LOCKDEP_WARN() checks the condition before checking
>      to see if lockdep is still enabled.  This is necessary to avoid the
>      false-positive splats fixed by commit 3066820034b5dd ("rcu: Reject
>      RCU_LOCKDEP_WARN() false positives").  However, the current state can
>      result in false-positive splats during early boot before lockdep is fully
>      initialized.  This commit therefore checks debug_lockdep_rcu_enabled()
>      both before and after checking the condition, thus avoiding both sets
>      of false-positive error reports.
>      
>      Reported-by: Steven Rostedt <rostedt@goodmis.org>
>      Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
>      Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>      Cc: Boqun Feng <boqun.feng@gmail.com>
>      Cc: Matthew Wilcox <willy@infradead.org>
>      Cc: Thomas Gleixner <tglx@linutronix.de>
> 
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index aa1b1c3546d7a..1aec1d53b0c91 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -364,11 +364,18 @@ static inline int debug_lockdep_rcu_enabled(void)
>    * RCU_LOCKDEP_WARN - emit lockdep splat if specified condition is met
>    * @c: condition to check
>    * @s: informative message
> + *
> + * This checks debug_lockdep_rcu_enabled() before checking (c) to
> + * prevent early boot splats due to lockdep not yet being initialized,
> + * and rechecks it after checking (c) to prevent false-positive splats
> + * due to races with lockdep being disabled.  See commit 3066820034b5dd
> + * ("rcu: Reject RCU_LOCKDEP_WARN() false positives") for more detail.
>    */
>   #define RCU_LOCKDEP_WARN(c, s)						\
>   	do {								\
>   		static bool __section(".data.unlikely") __warned;	\
> -		if ((c) && debug_lockdep_rcu_enabled() && !__warned) {	\
> +		if (debug_lockdep_rcu_enabled() && (c) &&		\
> +		    debug_lockdep_rcu_enabled() && !__warned) {		\
>   			__warned = true;				\
>   			lockdep_rcu_suspicious(__FILE__, __LINE__, s);	\
>   		}							\
  
Paul E. McKenney Dec. 14, 2022, 9:26 p.m. UTC | #6
On Wed, Dec 14, 2022 at 03:37:13PM -0500, Mathieu Desnoyers wrote:
> On 2022-12-14 15:03, Paul E. McKenney wrote:
> > On Wed, Dec 14, 2022 at 08:49:54AM +0900, Masami Hiramatsu wrote:
> > > On Tue, 13 Dec 2022 17:24:29 -0500
> > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > 
> > > > From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> > > > 
> > > > If a trigger filter on the kernel command line fails to apply (due to
> > > > syntax error), it will be freed. The freeing will call
> > > > tracepoint_synchronize_unregister(), but this is not needed during early
> > > > boot up, and will even trigger a lockdep splat.
> > > > 
> > > > Avoid calling the synchronization function when system_state is
> > > > SYSTEM_BOOTING.
> > > 
> > > Shouldn't this be done inside tracepoint_synchronize_unregister()?
> > > Then, it will prevent similar warnings if we expand boot time feature.
> > 
> > How about the following wide-spectrum fix within RCU_LOCKDEP_WARN()?
> > Just in case there are ever additional issues of this sort?
> 
> Hi Paul,
> 
> Your approach makes sense. Thanks for looking into this.
> 
> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

Thank you, I will apply this on my next rebase.

							Thanx, Paul

> > ------------------------------------------------------------------------
> > 
> > commit d493ffca2df6c1963bd1d7b8f8c652a172f095ae
> > Author: Paul E. McKenney <paulmck@kernel.org>
> > Date:   Wed Dec 14 11:41:44 2022 -0800
> > 
> >      rcu: Make RCU_LOCKDEP_WARN() avoid early lockdep checks
> >      Currently, RCU_LOCKDEP_WARN() checks the condition before checking
> >      to see if lockdep is still enabled.  This is necessary to avoid the
> >      false-positive splats fixed by commit 3066820034b5dd ("rcu: Reject
> >      RCU_LOCKDEP_WARN() false positives").  However, the current state can
> >      result in false-positive splats during early boot before lockdep is fully
> >      initialized.  This commit therefore checks debug_lockdep_rcu_enabled()
> >      both before and after checking the condition, thus avoiding both sets
> >      of false-positive error reports.
> >      Reported-by: Steven Rostedt <rostedt@goodmis.org>
> >      Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> >      Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> >      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> >      Cc: Boqun Feng <boqun.feng@gmail.com>
> >      Cc: Matthew Wilcox <willy@infradead.org>
> >      Cc: Thomas Gleixner <tglx@linutronix.de>
> > 
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index aa1b1c3546d7a..1aec1d53b0c91 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -364,11 +364,18 @@ static inline int debug_lockdep_rcu_enabled(void)
> >    * RCU_LOCKDEP_WARN - emit lockdep splat if specified condition is met
> >    * @c: condition to check
> >    * @s: informative message
> > + *
> > + * This checks debug_lockdep_rcu_enabled() before checking (c) to
> > + * prevent early boot splats due to lockdep not yet being initialized,
> > + * and rechecks it after checking (c) to prevent false-positive splats
> > + * due to races with lockdep being disabled.  See commit 3066820034b5dd
> > + * ("rcu: Reject RCU_LOCKDEP_WARN() false positives") for more detail.
> >    */
> >   #define RCU_LOCKDEP_WARN(c, s)						\
> >   	do {								\
> >   		static bool __section(".data.unlikely") __warned;	\
> > -		if ((c) && debug_lockdep_rcu_enabled() && !__warned) {	\
> > +		if (debug_lockdep_rcu_enabled() && (c) &&		\
> > +		    debug_lockdep_rcu_enabled() && !__warned) {		\
> >   			__warned = true;				\
> >   			lockdep_rcu_suspicious(__FILE__, __LINE__, s);	\
> >   		}							\
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
>
  
Steven Rostedt Dec. 15, 2022, 3:02 p.m. UTC | #7
On Wed, 14 Dec 2022 12:03:33 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> > > Avoid calling the synchronization function when system_state is
> > > SYSTEM_BOOTING.  
> > 
> > Shouldn't this be done inside tracepoint_synchronize_unregister()?
> > Then, it will prevent similar warnings if we expand boot time feature.  
> 
> How about the following wide-spectrum fix within RCU_LOCKDEP_WARN()?
> Just in case there are ever additional issues of this sort?

Adding more tracing command line parameters is triggering this more. I now
hit:

[    0.213442] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.214653] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
[    0.232023] ------------[ cut here ]------------
[    0.232704] WARNING: CPU: 0 PID: 0 at kernel/rcu/tree.c:1410 rcu_poll_gp_seq_start_unlocked+0x8a/0xa0
[    0.234241] Modules linked in:
[    0.234715] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rc6-test-00046-ga0ca17bfe724-dirty #63
[    0.236069] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[    0.237549] RIP: 0010:rcu_poll_gp_seq_start_unlocked+0x8a/0xa0
[    0.238495] Code: 02 0f 0b 48 89 ee 5b 48 c7 c7 00 ca b1 b3 5d e9 cc 0f c1 00 65 8b 05 85 f3 dd 4d 85 c0 75 a1 65 8b 05 d6 f0 dd 4d 85 c0 75 96 <0f> 0b eb 92 5b 5d e9 eb 18 fc 00 66 66 2e 0f 1f 84 00 00 00 00 00
[    0.241433] RSP: 0000:ffffffffb3a03d28 EFLAGS: 00010046
[    0.242289] RAX: 0000000000000000 RBX: ffffffffb3bcfc48 RCX: 0000000000000001
[    0.243355] RDX: 0000000080000001 RSI: 0000000000000001 RDI: ffffffffb3bcfc48
[    0.244421] RBP: 0000000000000000 R08: ffff97a3c012b8f8 R09: fffffffffffffffe
[    0.245569] R10: ffffffffb3a03e78 R11: 00000000ffffffff R12: 0000000000000040
[    0.246752] R13: ffffffffb3a30500 R14: ffffffffb3be6180 R15: ffff97a3c0355d10
[    0.247826] FS:  0000000000000000(0000) GS:ffff97a537c00000(0000) knlGS:0000000000000000
[    0.249084] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.250012] CR2: ffff97a51e201000 CR3: 000000025c826001 CR4: 00000000000606b0
[    0.251122] Call Trace:
[    0.251491]  <TASK>
[    0.251810]  synchronize_rcu+0xb6/0x1e0
[    0.252393]  ? smp_call_function_many_cond+0x36d/0x3b0
[    0.253243]  ? enable_trace_buffered_event+0x10/0x10
[    0.254021]  trace_buffered_event_disable+0x7c/0x140
[    0.254786]  __ftrace_event_enable_disable+0x11c/0x250
[    0.255559]  __ftrace_set_clr_event_nolock+0xe3/0x130
[    0.256321]  ftrace_set_clr_event+0x9a/0xe0
[    0.256976]  early_enable_events+0x64/0xa8
[    0.257647]  trace_event_init+0xd6/0x2dd
[    0.258305]  start_kernel+0x760/0x9e0
[    0.258862]  secondary_startup_64_no_verify+0xe5/0xeb
[    0.259634]  </TASK>
[    0.259966] irq event stamp: 0
[    0.260427] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[    0.261448] hardirqs last disabled at (0): [<0000000000000000>] 0x0
[    0.262424] softirqs last  enabled at (0): [<0000000000000000>] 0x0
[    0.263402] softirqs last disabled at (0): [<0000000000000000>] 0x0
[    0.264345] ---[ end trace 0000000000000000 ]---
[    0.265088] ------------[ cut here ]------------

Where kernel/rcu/tree.c:1410 is at:

// Make the polled API aware of the beginning of a grace period, but
// where caller does not hold the root rcu_node structure's lock.
static void rcu_poll_gp_seq_start_unlocked(unsigned long *snap)
{
	unsigned long flags;
	struct rcu_node *rnp = rcu_get_root();

	if (rcu_init_invoked()) {
		lockdep_assert_irqs_enabled(); <<-------------- 1410
		raw_spin_lock_irqsave_rcu_node(rnp, flags);
	}
	rcu_poll_gp_seq_start(snap);
	if (rcu_init_invoked())
		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
}


-- Steve
  
Paul E. McKenney Dec. 15, 2022, 5:02 p.m. UTC | #8
On Thu, Dec 15, 2022 at 10:02:41AM -0500, Steven Rostedt wrote:
> On Wed, 14 Dec 2022 12:03:33 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > > > Avoid calling the synchronization function when system_state is
> > > > SYSTEM_BOOTING.  
> > > 
> > > Shouldn't this be done inside tracepoint_synchronize_unregister()?
> > > Then, it will prevent similar warnings if we expand boot time feature.  
> > 
> > How about the following wide-spectrum fix within RCU_LOCKDEP_WARN()?
> > Just in case there are ever additional issues of this sort?
> 
> Adding more tracing command line parameters is triggering this more. I now
> hit:

Fair point, and apologies for the hassle.

Any chance of getting an official "it is now late enough in boot to
safely invoke lockdep" API?  ;-)

In the meantime, does the (untested and quite crude) patch at the end
of this message help?

							Thanx, Paul

> [    0.213442] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
> [    0.214653] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
> [    0.232023] ------------[ cut here ]------------
> [    0.232704] WARNING: CPU: 0 PID: 0 at kernel/rcu/tree.c:1410 rcu_poll_gp_seq_start_unlocked+0x8a/0xa0
> [    0.234241] Modules linked in:
> [    0.234715] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rc6-test-00046-ga0ca17bfe724-dirty #63
> [    0.236069] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
> [    0.237549] RIP: 0010:rcu_poll_gp_seq_start_unlocked+0x8a/0xa0
> [    0.238495] Code: 02 0f 0b 48 89 ee 5b 48 c7 c7 00 ca b1 b3 5d e9 cc 0f c1 00 65 8b 05 85 f3 dd 4d 85 c0 75 a1 65 8b 05 d6 f0 dd 4d 85 c0 75 96 <0f> 0b eb 92 5b 5d e9 eb 18 fc 00 66 66 2e 0f 1f 84 00 00 00 00 00
> [    0.241433] RSP: 0000:ffffffffb3a03d28 EFLAGS: 00010046
> [    0.242289] RAX: 0000000000000000 RBX: ffffffffb3bcfc48 RCX: 0000000000000001
> [    0.243355] RDX: 0000000080000001 RSI: 0000000000000001 RDI: ffffffffb3bcfc48
> [    0.244421] RBP: 0000000000000000 R08: ffff97a3c012b8f8 R09: fffffffffffffffe
> [    0.245569] R10: ffffffffb3a03e78 R11: 00000000ffffffff R12: 0000000000000040
> [    0.246752] R13: ffffffffb3a30500 R14: ffffffffb3be6180 R15: ffff97a3c0355d10
> [    0.247826] FS:  0000000000000000(0000) GS:ffff97a537c00000(0000) knlGS:0000000000000000
> [    0.249084] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.250012] CR2: ffff97a51e201000 CR3: 000000025c826001 CR4: 00000000000606b0
> [    0.251122] Call Trace:
> [    0.251491]  <TASK>
> [    0.251810]  synchronize_rcu+0xb6/0x1e0
> [    0.252393]  ? smp_call_function_many_cond+0x36d/0x3b0
> [    0.253243]  ? enable_trace_buffered_event+0x10/0x10
> [    0.254021]  trace_buffered_event_disable+0x7c/0x140
> [    0.254786]  __ftrace_event_enable_disable+0x11c/0x250
> [    0.255559]  __ftrace_set_clr_event_nolock+0xe3/0x130
> [    0.256321]  ftrace_set_clr_event+0x9a/0xe0
> [    0.256976]  early_enable_events+0x64/0xa8
> [    0.257647]  trace_event_init+0xd6/0x2dd
> [    0.258305]  start_kernel+0x760/0x9e0
> [    0.258862]  secondary_startup_64_no_verify+0xe5/0xeb
> [    0.259634]  </TASK>
> [    0.259966] irq event stamp: 0
> [    0.260427] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> [    0.261448] hardirqs last disabled at (0): [<0000000000000000>] 0x0
> [    0.262424] softirqs last  enabled at (0): [<0000000000000000>] 0x0
> [    0.263402] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [    0.264345] ---[ end trace 0000000000000000 ]---
> [    0.265088] ------------[ cut here ]------------
> 
> Where kernel/rcu/tree.c:1410 is at:
> 
> // Make the polled API aware of the beginning of a grace period, but
> // where caller does not hold the root rcu_node structure's lock.
> static void rcu_poll_gp_seq_start_unlocked(unsigned long *snap)
> {
> 	unsigned long flags;
> 	struct rcu_node *rnp = rcu_get_root();
> 
> 	if (rcu_init_invoked()) {
> 		lockdep_assert_irqs_enabled(); <<-------------- 1410
> 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> 	}
> 	rcu_poll_gp_seq_start(snap);
> 	if (rcu_init_invoked())
> 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> }

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index ee8a6a711719a..af8c99920e1a4 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -151,7 +151,6 @@ static void sync_sched_exp_online_cleanup(int cpu);
 static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp);
 static bool rcu_rdp_is_offloaded(struct rcu_data *rdp);
 static bool rcu_rdp_cpu_online(struct rcu_data *rdp);
-static bool rcu_init_invoked(void);
 static void rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf);
 static void rcu_init_new_rnp(struct rcu_node *rnp_leaf);
 
@@ -1314,7 +1313,7 @@ static void rcu_poll_gp_seq_start(unsigned long *snap)
 {
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_lockdep_assert_held_rcu_node(rnp);
 
 	// If RCU was idle, note beginning of GP.
@@ -1330,7 +1329,7 @@ static void rcu_poll_gp_seq_end(unsigned long *snap)
 {
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_lockdep_assert_held_rcu_node(rnp);
 
 	// If the previously noted GP is still in effect, record the
@@ -1352,12 +1351,12 @@ static void rcu_poll_gp_seq_start_unlocked(unsigned long *snap)
 	unsigned long flags;
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked()) {
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE) {
 		lockdep_assert_irqs_enabled();
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_start(snap);
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 }
 
@@ -1368,12 +1367,12 @@ static void rcu_poll_gp_seq_end_unlocked(unsigned long *snap)
 	unsigned long flags;
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked()) {
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE) {
 		lockdep_assert_irqs_enabled();
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_end(snap);
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 }
 
@@ -4042,13 +4041,6 @@ EXPORT_SYMBOL_GPL(rcu_lockdep_current_cpu_online);
 
 #endif /* #if defined(CONFIG_PROVE_RCU) && defined(CONFIG_HOTPLUG_CPU) */
 
-// Has rcu_init() been invoked?  This is used (for example) to determine
-// whether spinlocks may be acquired safely.
-static bool rcu_init_invoked(void)
-{
-	return !!rcu_state.n_online_cpus;
-}
-
 /*
  * Near the end of the offline process.  Trace the fact that this CPU
  * is going offline.
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index ed6c3cce28f23..26146530d48f9 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -1019,14 +1019,14 @@ unsigned long start_poll_synchronize_rcu_expedited(void)
 	s = get_state_synchronize_rcu();
 	rdp = per_cpu_ptr(&rcu_data, raw_smp_processor_id());
 	rnp = rdp->mynode;
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_spin_lock_irqsave(&rnp->exp_poll_lock, flags);
 	if (!poll_state_synchronize_rcu(s)) {
 		rnp->exp_seq_poll_rq = s;
-		if (rcu_init_invoked())
+		if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 			queue_work(rcu_gp_wq, &rnp->exp_poll_wq);
 	}
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_spin_unlock_irqrestore(&rnp->exp_poll_lock, flags);
 
 	return s;
  
Paul E. McKenney Dec. 15, 2022, 5:25 p.m. UTC | #9
On Thu, Dec 15, 2022 at 09:02:56AM -0800, Paul E. McKenney wrote:
> On Thu, Dec 15, 2022 at 10:02:41AM -0500, Steven Rostedt wrote:
> > On Wed, 14 Dec 2022 12:03:33 -0800
> > "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > 
> > > > > Avoid calling the synchronization function when system_state is
> > > > > SYSTEM_BOOTING.  
> > > > 
> > > > Shouldn't this be done inside tracepoint_synchronize_unregister()?
> > > > Then, it will prevent similar warnings if we expand boot time feature.  
> > > 
> > > How about the following wide-spectrum fix within RCU_LOCKDEP_WARN()?
> > > Just in case there are ever additional issues of this sort?
> > 
> > Adding more tracing command line parameters is triggering this more. I now
> > hit:
> 
> Fair point, and apologies for the hassle.
> 
> Any chance of getting an official "it is now late enough in boot to
> safely invoke lockdep" API?  ;-)
> 
> In the meantime, does the (untested and quite crude) patch at the end
> of this message help?

OK, I was clearly not yet awake.  :-/

The more targeted (but still untested) patch below might be more
appropriate...

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index ee8a6a711719a..f627888715dca 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1314,7 +1314,7 @@ static void rcu_poll_gp_seq_start(unsigned long *snap)
 {
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_lockdep_assert_held_rcu_node(rnp);
 
 	// If RCU was idle, note beginning of GP.
@@ -1330,7 +1330,7 @@ static void rcu_poll_gp_seq_end(unsigned long *snap)
 {
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_lockdep_assert_held_rcu_node(rnp);
 
 	// If the previously noted GP is still in effect, record the
@@ -1353,7 +1353,8 @@ static void rcu_poll_gp_seq_start_unlocked(unsigned long *snap)
 	struct rcu_node *rnp = rcu_get_root();
 
 	if (rcu_init_invoked()) {
-		lockdep_assert_irqs_enabled();
+		if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
+			lockdep_assert_irqs_enabled();
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_start(snap);
@@ -1369,7 +1370,8 @@ static void rcu_poll_gp_seq_end_unlocked(unsigned long *snap)
 	struct rcu_node *rnp = rcu_get_root();
 
 	if (rcu_init_invoked()) {
-		lockdep_assert_irqs_enabled();
+		if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
+			lockdep_assert_irqs_enabled();
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_end(snap);
  
Steven Rostedt Dec. 15, 2022, 6:51 p.m. UTC | #10
On Thu, 15 Dec 2022 09:02:56 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> On Thu, Dec 15, 2022 at 10:02:41AM -0500, Steven Rostedt wrote:
> > On Wed, 14 Dec 2022 12:03:33 -0800
> > "Paul E. McKenney" <paulmck@kernel.org> wrote:
> >   
> > > > > Avoid calling the synchronization function when system_state is
> > > > > SYSTEM_BOOTING.    
> > > > 
> > > > Shouldn't this be done inside tracepoint_synchronize_unregister()?
> > > > Then, it will prevent similar warnings if we expand boot time feature.    
> > > 
> > > How about the following wide-spectrum fix within RCU_LOCKDEP_WARN()?
> > > Just in case there are ever additional issues of this sort?  
> > 
> > Adding more tracing command line parameters is triggering this more. I now
> > hit:  
> 
> Fair point, and apologies for the hassle.
> 
> Any chance of getting an official "it is now late enough in boot to
> safely invoke lockdep" API?  ;-)

lockdep API isn't the problem, it's that we are still in the earlyboot stage
where interrupts are disabled, and you can't enable them. Lockdep works
just fine there, and is reporting interrupts being disabled correctly. The
backtrace reported *does* have interrupts disabled.

The thing is, because we are still running on a single CPU with interrupts
disabled, there is no need for synchronization. Even grabbing a mutex is
safe because there's guaranteed to be no contention (unless it's not
released). This is why a lot of warnings are suppressed if system_state is
SYSTEM_BOOTING.
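
That is exactly the shape the trigger-filter patch takes; schematically,
with a hypothetical helper name used purely for illustration:

/* Hypothetical helper, for illustration only -- not an existing API. */
static inline bool boot_time_single_threaded(void)
{
	/* One CPU, IRQs off: mutexes are uncontended, RCU waits are no-ops. */
	return system_state == SYSTEM_BOOTING;
}

	if (!boot_time_single_threaded())
		tracepoint_synchronize_unregister();
	free_event_filter(tmp);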

> 
> In the meantime, does the (untested and quite crude) patch at the end
> of this message help?

I'll go and test it, but I'm guessing it will work fine. You could also test
against system_state != SYSTEM_BOOTING, as that gets cleared just before
kernel_init() can continue (it waits for the complete() that is called
after system_state is set to SYSTEM_SCHEDULING). Which happens shortly
after rcu_scheduler_starting().

I wonder if you could even replace RCU_SCHEDULER_RUNNING with
system_state != SYSTEM_BOOTING, and remove the rcu_scheduler_starting()
call.

-- Steve
  
Paul E. McKenney Dec. 15, 2022, 7:01 p.m. UTC | #11
On Thu, Dec 15, 2022 at 01:51:02PM -0500, Steven Rostedt wrote:
> On Thu, 15 Dec 2022 09:02:56 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > On Thu, Dec 15, 2022 at 10:02:41AM -0500, Steven Rostedt wrote:
> > > On Wed, 14 Dec 2022 12:03:33 -0800
> > > "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > >   
> > > > > > Avoid calling the synchronization function when system_state is
> > > > > > SYSTEM_BOOTING.    
> > > > > 
> > > > > Shouldn't this be done inside tracepoint_synchronize_unregister()?
> > > > > Then, it will prevent similar warnings if we expand boot time feature.    
> > > > 
> > > > How about the following wide-spectrum fix within RCU_LOCKDEP_WARN()?
> > > > Just in case there are ever additional issues of this sort?  
> > > 
> > > Adding more tracing command line parameters is triggering this more. I now
> > > hit:  
> > 
> > Fair point, and apologies for the hassle.
> > 
> > Any chance of getting an official "it is now late enough in boot to
> > safely invoke lockdep" API?  ;-)
> 
> lockdep API isn't the problem, it's that we are still in the earlyboot stage
> where interrupts are disabled, and you can't enable them. Lockdep works
> just fine there, and is reporting interrupts being disabled correctly. The
> backtrace reported *does* have interrupts disabled.
> 
> The thing is, because we are still running on a single CPU with interrupts
> disabled, there is no need for synchronization. Even grabbing a mutex is
> safe because there's guaranteed to be no contention (unless it's not
> released). This is why a lot of warnings are suppressed if system_state is
> SYSTEM_BOOTING.

Agreed, and my second attempt is a bit more surgical.  (Please see below
for a more official version of it.)

> > In the meantime, does the (untested and quite crude) patch at the end
> > of this message help?
> 
> I'll go and test it, but I'm guessing it will work fine. You could also test
> against system_state != SYSTEM_BOOTING, as that gets cleared just before
> kernel_init() can continue (it waits for the complete() that is called
> after system_state is set to SYSTEM_SCHEDULING). Which happens shortly
> after rcu_scheduler_starting().
> 
> I wonder if you could even replace RCU_SCHEDULER_RUNNING with
> system_state != SYSTEM_BOOTING, and remove the rcu_scheduler_starting()
> call.

In this particular case, agreed, I could use system_state.  But there are
other cases that must change behavior as soon as preemption can happen,
which is upon return from that call to user_mode_thread().  The update to
system_state doesn't happen until much later.  So I don't get to remove
that rcu_scheduler_starting() call.

What case?

Here is one:

o	The newly spawned init process does something that uses RCU,
	but is preempted while holding rcu_read_lock().

o	The boot thread, which did the preempting, waits for a grace
	period.  If we use rcu_scheduler_active, all is well because
	synchronize_rcu() will do a real run-time grace period, thus
	waiting for that reader.

	But system_state has not yet been updated, so if synchronize_rcu()
	were instead to pay attention to that one, there might be a
	tragically too-short RCU grace period.
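
Schematically, the difference that scenario turns on looks like this
(a hedged sketch, not the real synchronize_rcu() implementation):

/*
 * Hedged sketch only -- the real code differs.  What matters is which
 * condition gates the boot-time "implicit grace period" shortcut.
 */
void synchronize_rcu_sketch(void)
{
	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
		return;	/* pre-scheduler boot: no preemption, implicit GP */

	/*
	 * After rcu_scheduler_starting(), a reader may already have been
	 * preempted inside rcu_read_lock(), so a real grace period is
	 * needed -- even though system_state is still SYSTEM_BOOTING.
	 */
	wait_for_real_grace_period();	/* placeholder, not a real API */
}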

Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

commit 876c5ac113fa66a64fa241e69d9a2251b8daa5ee
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu Dec 15 09:26:09 2022 -0800

    rcu: Don't assert interrupts enabled too early in boot
    
    The rcu_poll_gp_seq_end() and rcu_poll_gp_seq_end_unlocked() both check
    that interrupts are enabled, as they normally should be when waiting for
    an RCU grace period.  Except that it is legal to wait for grace periods
    during early boot, before interrupts have been enabled for the first time,
    and polling for grace periods is required to work during this time.
    This can result in false-positive lockdep splats in the presence of
    boot-time-initiated tracing.
    
    This commit therefore conditions those interrupts-enabled checks on
    rcu_scheduler_active having advanced past RCU_SCHEDULER_INACTIVE, by
    which time interrupts have been enabled.
    
    Reported-by: Steven Rostedt <rostedt@goodmis.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index ee8a6a711719a..f627888715dca 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1314,7 +1314,7 @@ static void rcu_poll_gp_seq_start(unsigned long *snap)
 {
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_lockdep_assert_held_rcu_node(rnp);
 
 	// If RCU was idle, note beginning of GP.
@@ -1330,7 +1330,7 @@ static void rcu_poll_gp_seq_end(unsigned long *snap)
 {
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_lockdep_assert_held_rcu_node(rnp);
 
 	// If the previously noted GP is still in effect, record the
@@ -1353,7 +1353,8 @@ static void rcu_poll_gp_seq_start_unlocked(unsigned long *snap)
 	struct rcu_node *rnp = rcu_get_root();
 
 	if (rcu_init_invoked()) {
-		lockdep_assert_irqs_enabled();
+		if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
+			lockdep_assert_irqs_enabled();
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_start(snap);
@@ -1369,7 +1370,8 @@ static void rcu_poll_gp_seq_end_unlocked(unsigned long *snap)
 	struct rcu_node *rnp = rcu_get_root();
 
 	if (rcu_init_invoked()) {
-		lockdep_assert_irqs_enabled();
+		if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
+			lockdep_assert_irqs_enabled();
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_end(snap);
  
Steven Rostedt Dec. 15, 2022, 10:39 p.m. UTC | #12
On Thu, 15 Dec 2022 11:01:58 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> What case?
> 
> Here is one:
> 
> o	The newly spawned init process does something that uses RCU,
> 	but is preempted while holding rcu_read_lock().
> 
> o	The boot thread, which did the preempting, waits for a grace
> 	period.  If we use rcu_scheduler_active, all is well because
> 	synchronize_rcu() will do a real run-time grace period, thus
> 	waiting for that reader.
> 
> 	But system_state has not yet been updated, so if synchronize_rcu()
> 	were instead to pay attention to that one, there might be a
> 	tragically too-short RCU grace period.

The thing is, preemption is disabled the entire time here.

That is, from:

void kthread_show_list(void);
noinline void __ref rest_init(void)
{
	struct task_struct *tsk;
	int pid;

	rcu_scheduler_starting();

through:

	system_state = SYSTEM_SCHEDULING;

	complete(&kthreadd_done);


Preemption is disabled and other CPUs have not even been started yet.

Although the might_sleep() call might schedule the kernel_init() task but
that will only block on the completion.

In other words, I don't think anything can cause any issues this early in
the boot up.

-- Steve
  
Paul E. McKenney Dec. 15, 2022, 11:10 p.m. UTC | #13
On Thu, Dec 15, 2022 at 05:39:13PM -0500, Steven Rostedt wrote:
> On Thu, 15 Dec 2022 11:01:58 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > What case?
> > 
> > Here is one:
> > 
> > o	The newly spawned init process does something that uses RCU,
> > 	but is preempted while holding rcu_read_lock().
> > 
> > o	The boot thread, which did the preempting, waits for a grace
> > 	period.  If we use rcu_scheduler_active, all is well because
> > 	synchronize_rcu() will do a real run-time grace period, thus
> > 	waiting for that reader.
> > 
> > 	But system_state has not yet been updated, so if synchronize_rcu()
> > 	were instead to pay attention to that one, there might be a
> > 	tragically too-short RCU grace period.
> 
> The thing is, preemption is disabled the entire time here.
> 
> That is, from:
> 
> void kthread_show_list(void);
> noinline void __ref rest_init(void)
> {
> 	struct task_struct *tsk;
> 	int pid;
> 
> 	rcu_scheduler_starting();
> 
> through:
> 
> 	system_state = SYSTEM_SCHEDULING;
> 
> 	complete(&kthreadd_done);
> 
> 
> Preemption is disabled and other CPUs have not even been started yet.
> 
> Although the might_sleep() call might schedule the kernel_init() task but
> that will only block on the completion.
> 
> In other words, I don't think anything can cause any issues this early in
> the boot up.

The nice thing about the current placement of rcu_scheduler_starting()
is that there is not yet any other task to switch to.  ;-)

							Thanx, Paul
  
Steven Rostedt Dec. 15, 2022, 11:42 p.m. UTC | #14
On Thu, 15 Dec 2022 15:10:27 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> The nice thing about the current placement of rcu_scheduler_starting()
> is that there is not yet any other task to switch to.  ;-)

Fair enough. Anyway the last patch appears to do the job.

Reported-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>

-- Steve
  
Paul E. McKenney Dec. 16, 2022, 1:01 a.m. UTC | #15
On Thu, Dec 15, 2022 at 06:42:02PM -0500, Steven Rostedt wrote:
> On Thu, 15 Dec 2022 15:10:27 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > The nice thing about the current placement of rcu_scheduler_starting()
> > is that there is not yet any other task to switch to.  ;-)
> 
> Fair enough. Anyway the last patch appears to do the job.
> 
> Reported-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>

Applied, thank you!

							Thanx, Paul
  
Paul E. McKenney Dec. 16, 2022, 4:40 a.m. UTC | #16
On Thu, Dec 15, 2022 at 05:01:18PM -0800, Paul E. McKenney wrote:
> On Thu, Dec 15, 2022 at 06:42:02PM -0500, Steven Rostedt wrote:
> > On Thu, 15 Dec 2022 15:10:27 -0800
> > "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > 
> > > The nice thing about the current placement of rcu_scheduler_starting()
> > > is that there is not yet any other task to switch to.  ;-)
> > 
> > Fair enough. Anyway the last patch appears to do the job.
> > 
> > Reported-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> > Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> 
> Applied, thank you!

And I should have checked up on urgency.  Normal process would hold this
one until the v6.3 merge window, but please let me know if you would
like it in v6.2.  It is after all a bug fix.

							Thanx, Paul
  
Steven Rostedt Dec. 16, 2022, 11:08 a.m. UTC | #17
On Thu, 15 Dec 2022 20:40:22 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> On Thu, Dec 15, 2022 at 05:01:18PM -0800, Paul E. McKenney wrote:
> > On Thu, Dec 15, 2022 at 06:42:02PM -0500, Steven Rostedt wrote:  
> > > On Thu, 15 Dec 2022 15:10:27 -0800
> > > "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > >   
> > > > The nice thing about the current placement of rcu_scheduler_starting()
> > > > is that there is not yet any other task to switch to.  ;-)  
> > > 
> > > Fair enough. Anyway the last patch appears to do the job.
> > > 
> > > Reported-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> > > Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>  
> > 
> > Applied, thank you!  
> 
> And I should have checked up on urgency.  Normal process would hold this
> one until the v6.3 merge window, but please let me know if you would
> like it in v6.2.  It is after all a bug fix.
> 

Yes, any fix that fixes a lockdep splat is considered urgent and should be
pushed ASAP, as it makes finding other lockdep splats more difficult (the
first splat shuts down lockdep).

Please send this during the merge window, as I have to carry it internally
until it makes it into mainline.

Thanks!

-- Steve
  
Paul E. McKenney Dec. 18, 2022, 12:29 a.m. UTC | #18
On Fri, Dec 16, 2022 at 06:08:10AM -0500, Steven Rostedt wrote:
> On Thu, 15 Dec 2022 20:40:22 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > On Thu, Dec 15, 2022 at 05:01:18PM -0800, Paul E. McKenney wrote:
> > > On Thu, Dec 15, 2022 at 06:42:02PM -0500, Steven Rostedt wrote:  
> > > > On Thu, 15 Dec 2022 15:10:27 -0800
> > > > "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > > >   
> > > > > The nice thing about the current placement of rcu_scheduler_starting()
> > > > > is that there is not yet any other task to switch to.  ;-)  
> > > > 
> > > > Fair enough. Anyway the last patch appears to do the job.
> > > > 
> > > > Reported-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> > > > Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>  
> > > 
> > > Applied, thank you!  
> > 
> > And I should have checked up on urgency.  Normal process would hold this
> > one until the v6.3 merge window, but please let me know if you would
> > like it in v6.2.  It is after all a bug fix.
> > 
> 
> Yes, any fix that fixes a lockdep splat is considered urgent and should be
> pushed ASAP. As it makes finding other lockdep splats more difficult (as
> the first splat shuts down lockdep).
> 
> Please send this during the merge window, as I have to carry it internally
> until it makes it into mainline.

This is now rebased and marked with tag rcu-urgent.2022.12.17a.
If testing goes well, I will send it by the middle of next week.

							Thanx, Paul
  

Patch

diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index 19ce9d22bfd7..e535959939d3 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -1085,8 +1085,14 @@ int set_trigger_filter(char *filter_str,
 	rcu_assign_pointer(data->filter, filter);
 
 	if (tmp) {
-		/* Make sure the call is done with the filter */
-		tracepoint_synchronize_unregister();
+		/*
+		 * Make sure the call is done with the filter.
+		 * It is possible that a filter could fail at boot up,
+		 * and then this path will be called. Avoid the synchronization
+		 * in that case.
+		 */
+		if (system_state != SYSTEM_BOOTING)
+			tracepoint_synchronize_unregister();
 		free_event_filter(tmp);
 	}