RCU stall on 6.1-rc4 (and some previous releases) related to ftrace

Message ID 1ef5fe19-a82f-835e-fda5-455e9c2b94b4@igalia.com
State New
Headers
Series RCU stall on 6.1-rc4 (and some previous releases) related to ftrace |

Commit Message

Guilherme G. Piccoli Nov. 10, 2022, 9:25 p.m. UTC
  Hi folks, I've noticed some RCU stalls when enabling ftrace filtering in
6.1-rc4 (also 6.1-rc3 and I guess I've seen this before, but only in
6.0+). Here is the full dmesg: https://termbin.com/4xo6 , see below for
a small snippet of the stall [0].

I've briefly talked to Steve on IRC, and he mentioned that my kernel
seemed to have PREEMPT_VOLUNTARY=y (and indeed, this is the case - see
the full config here: https://termbin.com/t48d), saying that maybe
adding a cond_resched() in the loop would help.

So, I've cooked a small hack (see the patch attached) and it seems to
work. Steve: lemme know if you want to send it (since it's your idea and
maybe my hack is not covering all cases), or if you prefer, I can work a
commit message and send myself.

Any other advice / tests required, please lemme me know and I'll be glad
in contributing.

Cheers,


Guilherme


[0]
rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 1-....
} 6 jiffies s: 1289 root: 0x2/.
rcu: blocking rcu_node structures (internal RCU debug):
Sending NMI from CPU 4 to CPUs 1:
NMI backtrace for cpu 1
[...]
RIP: 0010:find_kallsyms_symbol+0x85/0x1a0
[...]
Call Trace:
 <TASK>
 ? get_refcyc_per_delivery.constprop.0+0x200/0x200 [amdgpu]
 module_address_lookup+0x63/0xc0
 ? get_refcyc_per_delivery.constprop.0+0x200/0x200 [amdgpu]
 kallsyms_lookup_buildid+0xb6/0x130
 ftrace_match_record+0x43/0xf0
 ? match_records+0x210/0x3b0
 ? __kmem_cache_alloc_node+0x165/0x260
 match_records+0x13a/0x3b0
 ftrace_process_regex.isra.0+0x101/0x120
 ftrace_filter_write+0x57/0x90
[...]
rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 1-....
} 25 jiffies s: 1289 root: 0x2/.
rcu: blocking rcu_node structures (internal RCU debug):
Sending NMI from CPU 4 to CPUs 1:
[...]
RIP: 0010:find_kallsyms_symbol+0x8f/0x1a0
[...]
Call Trace:
 <TASK>
 ? dcn21_dmcu_create+0xd0/0xd0 [amdgpu]
 module_address_lookup+0x63/0xc0
 ? dcn21_dmcu_create+0xd0/0xd0 [amdgpu]
 kallsyms_lookup_buildid+0xb6/0x130
 ftrace_match_record+0x43/0xf0
 ? match_records+0x210/0x3b0
 ? __kmem_cache_alloc_node+0x165/0x260
 match_records+0x13a/0x3b0
 ftrace_process_regex.isra.0+0x101/0x120
 ftrace_filter_write+0x57/0x90
[...]
  

Comments

Paul E. McKenney Nov. 10, 2022, 11:08 p.m. UTC | #1
On Thu, Nov 10, 2022 at 06:25:41PM -0300, Guilherme G. Piccoli wrote:
> Hi folks, I've noticed some RCU stalls when enabling ftrace filtering in
> 6.1-rc4 (also 6.1-rc3 and I guess I've seen this before, but only in
> 6.0+). Here is the full dmesg: https://termbin.com/4xo6 , see below for
> a small snippet of the stall [0].
> 
> I've briefly talked to Steve on IRC, and he mentioned that my kernel
> seemed to have PREEMPT_VOLUNTARY=y (and indeed, this is the case - see
> the full config here: https://termbin.com/t48d), saying that maybe
> adding a cond_resched() in the loop would help.
> 
> So, I've cooked a small hack (see the patch attached) and it seems to
> work. Steve: lemme know if you want to send it (since it's your idea and
> maybe my hack is not covering all cases), or if you prefer, I can work a
> commit message and send myself.
> 
> Any other advice / tests required, please lemme me know and I'll be glad
> in contributing.

I will let others give feedback on the commit log, signoffs, and so
on.  From an RCU CPU stall warning perspective:

Acked-by: Paul E. McKenney <paulmck@kernel.org>

> Cheers,
> 
> 
> Guilherme
> 
> 
> [0]
> rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 1-....
> } 6 jiffies s: 1289 root: 0x2/.
> rcu: blocking rcu_node structures (internal RCU debug):
> Sending NMI from CPU 4 to CPUs 1:
> NMI backtrace for cpu 1
> [...]
> RIP: 0010:find_kallsyms_symbol+0x85/0x1a0
> [...]
> Call Trace:
>  <TASK>
>  ? get_refcyc_per_delivery.constprop.0+0x200/0x200 [amdgpu]
>  module_address_lookup+0x63/0xc0
>  ? get_refcyc_per_delivery.constprop.0+0x200/0x200 [amdgpu]
>  kallsyms_lookup_buildid+0xb6/0x130
>  ftrace_match_record+0x43/0xf0
>  ? match_records+0x210/0x3b0
>  ? __kmem_cache_alloc_node+0x165/0x260
>  match_records+0x13a/0x3b0
>  ftrace_process_regex.isra.0+0x101/0x120
>  ftrace_filter_write+0x57/0x90
> [...]
> rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 1-....
> } 25 jiffies s: 1289 root: 0x2/.
> rcu: blocking rcu_node structures (internal RCU debug):
> Sending NMI from CPU 4 to CPUs 1:
> [...]
> RIP: 0010:find_kallsyms_symbol+0x8f/0x1a0
> [...]
> Call Trace:
>  <TASK>
>  ? dcn21_dmcu_create+0xd0/0xd0 [amdgpu]
>  module_address_lookup+0x63/0xc0
>  ? dcn21_dmcu_create+0xd0/0xd0 [amdgpu]
>  kallsyms_lookup_buildid+0xb6/0x130
>  ftrace_match_record+0x43/0xf0
>  ? match_records+0x210/0x3b0
>  ? __kmem_cache_alloc_node+0x165/0x260
>  match_records+0x13a/0x3b0
>  ftrace_process_regex.isra.0+0x101/0x120
>  ftrace_filter_write+0x57/0x90
> [...]

> From 45fda369a49d0e59689e4a6b9babc12598265825 Mon Sep 17 00:00:00 2001
> From: "Guilherme G. Piccoli" <gpiccoli@igalia.com>
> Date: Thu, 10 Nov 2022 16:42:17 -0300
> Subject: [PATCH] ftrace/hack: Add cond_resched() to prevent RCU stall
> 
> Suggestion from Steve on IRC
> ---
>  kernel/trace/ftrace.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 7dc023641bf1..8c2c04b2eb52 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -4184,6 +4184,7 @@ match_records(struct ftrace_hash *hash, char *func, int len, char *mod)
>  		if (rec->flags & FTRACE_FL_DISABLED)
>  			continue;
>  
> +		cond_resched();
>  		if (ftrace_match_record(rec, &func_g, mod_match, exclude_mod)) {
>  			ret = enter_record(hash, rec, clear_filter);
>  			if (ret < 0) {
> -- 
> 2.38.0
>
  
Steven Rostedt Nov. 11, 2022, 12:16 a.m. UTC | #2
On Thu, 10 Nov 2022 18:25:41 -0300
"Guilherme G. Piccoli" <gpiccoli@igalia.com> wrote:

> @@ -4184,6 +4184,7 @@ match_records(struct ftrace_hash *hash, char *func, int len, char *mod)
>  		if (rec->flags & FTRACE_FL_DISABLED)
>  			continue;
>  
> +		cond_resched();
>  		if (ftrace_match_record(rec, &func_g, mod_match, exclude_mod)) {
>  			ret = enter_record(hash, rec, clear_filter);
>  			if (ret < 0) {

This isn't where I would put it. I would add it after the if statement.
That is, at the end of the loop.

-- Steve
  
Paul E. McKenney Nov. 11, 2022, 3:37 a.m. UTC | #3
On Thu, Nov 10, 2022 at 07:16:51PM -0500, Steven Rostedt wrote:
> On Thu, 10 Nov 2022 18:25:41 -0300
> "Guilherme G. Piccoli" <gpiccoli@igalia.com> wrote:
> 
> > @@ -4184,6 +4184,7 @@ match_records(struct ftrace_hash *hash, char *func, int len, char *mod)
> >  		if (rec->flags & FTRACE_FL_DISABLED)
> >  			continue;
> >  
> > +		cond_resched();
> >  		if (ftrace_match_record(rec, &func_g, mod_match, exclude_mod)) {
> >  			ret = enter_record(hash, rec, clear_filter);
> >  			if (ret < 0) {
> 
> This isn't where I would put it. I would add it after the if statement.
> That is, at the end of the loop.

I am good either way.  Though one could argue for putting it at the
beginning of the loop in case every element takes that "continue" above...

							Thanx, Paul
  
Guilherme G. Piccoli Nov. 15, 2022, 8:53 p.m. UTC | #4
On 11/11/2022 00:37, Paul E. McKenney wrote:
> [...]
>> This isn't where I would put it. I would add it after the if statement.
>> That is, at the end of the loop.
> 
> I am good either way.  Though one could argue for putting it at the
> beginning of the loop in case every element takes that "continue" above...
> 
> 							Thanx, Paul

Thanks Paul and Steve - submitted an official version here:
https://lore.kernel.org/lkml/20221115204847.593616-1-gpiccoli@igalia.com
  

Patch

From 45fda369a49d0e59689e4a6b9babc12598265825 Mon Sep 17 00:00:00 2001
From: "Guilherme G. Piccoli" <gpiccoli@igalia.com>
Date: Thu, 10 Nov 2022 16:42:17 -0300
Subject: [PATCH] ftrace/hack: Add cond_resched() to prevent RCU stall

Suggestion from Steve on IRC
---
 kernel/trace/ftrace.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 7dc023641bf1..8c2c04b2eb52 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -4184,6 +4184,7 @@  match_records(struct ftrace_hash *hash, char *func, int len, char *mod)
 		if (rec->flags & FTRACE_FL_DISABLED)
 			continue;
 
+		cond_resched();
 		if (ftrace_match_record(rec, &func_g, mod_match, exclude_mod)) {
 			ret = enter_record(hash, rec, clear_filter);
 			if (ret < 0) {
-- 
2.38.0