[RFC] fprobe call of rethook_try_get faults

Message ID ZICzdpvp46Xk1rIv@krava
State New
Headers
Series [RFC] fprobe call of rethook_try_get faults |

Commit Message

Jiri Olsa June 7, 2023, 4:42 p.m. UTC
  hi,
I occasionally get following fault:

  general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI
  CPU: 3 PID: 28438 Comm: test_progs Tainted: G           OE      6.4.0-rc3+ #448 dad92bc91c459c664b308990ada0799837010e31
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
  RIP: 0010:rethook_try_get+0x34/0xf0
  Code: 48 8b 47 08 85 d2 74 0b 65 8b 15 af 26 eb 7e 85 d2 74 57 48 85 c0 74 73 e8 39 8e f0 ff 84 c0 74 6a 48 8b 53 10 48 85 d2 74 >
  RSP: 0018:ffffc90003ccfcf0 EFLAGS: 00010202
  RAX: 0000000000000001 RBX: ffff88810920db40 RCX: 0000000000000003
  RDX: 6b6b6b6b6b6b6b6b RSI: ffffffff82c0a371 RDI: ffffffff82bcbddb
  RBP: ffffffff81f5a5f0 R08: 0000000000000001 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000014000 R12: ffffffffa02ec3f2
  R13: fffffffffffffff7 R14: ffffc90003ccfd38 R15: 0000000000000000
  FS:  00007f2f8195eb80(0000) GS:ffff88846da00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f2f819c0140 CR3: 0000000189cb8006 CR4: 0000000000770ee0
  PKRU: 55555554
  Call Trace:
   <TASK>
   fprobe_handler+0xc1/0x270
   ? __pfx_bpf_testmod_init+0x10/0x10 [bpf_testmod b0bc3019aa6d6bdb2afc30cf6381f510d7e5abbe]
   ? __pfx_bpf_testmod_init+0x10/0x10 [bpf_testmod b0bc3019aa6d6bdb2afc30cf6381f510d7e5abbe]
   ? bpf_fentry_test1+0x5/0x10
   ? bpf_fentry_test1+0x5/0x10
   ? bpf_testmod_init+0x22/0x80 [bpf_testmod b0bc3019aa6d6bdb2afc30cf6381f510d7e5abbe]
   ? do_one_initcall+0x63/0x2e0
   ? rcu_is_watching+0xd/0x40
   ? kmalloc_trace+0xaf/0xc0
   ? do_init_module+0x60/0x250
   ? __do_sys_finit_module+0xac/0x120
   ? do_syscall_64+0x37/0x90
   ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
   </TASK>
  Modules linked in: bpf_testmod(OE+) loop bpf_preload intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul crc32c_intel >


I can't really reliable reproduce this, but while checking the code, I wonder
we should call rethook_free only after we call unregister_ftrace_function like
in the patch below

jirka


---
  

Comments

Steven Rostedt June 13, 2023, 9:48 p.m. UTC | #1
On Wed, 7 Jun 2023 09:42:30 -0700
Jiri Olsa <olsajiri@gmail.com> wrote:

> I can't really reliable reproduce this, but while checking the code, I wonder
> we should call rethook_free only after we call unregister_ftrace_function like
> in the patch below

Yeah, I think you're right!

> 
> jirka
> 
> 
> ---
> diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
> index 18d36842faf5..0121e8c0d54e 100644
> --- a/kernel/trace/fprobe.c
> +++ b/kernel/trace/fprobe.c
> @@ -364,19 +364,13 @@ int unregister_fprobe(struct fprobe *fp)
>  		    fp->ops.saved_func != fprobe_kprobe_handler))
>  		return -EINVAL;
>  
> -	/*
> -	 * rethook_free() starts disabling the rethook, but the rethook handlers
> -	 * may be running on other processors at this point. To make sure that all
> -	 * current running handlers are finished, call unregister_ftrace_function()
> -	 * after this.
> -	 */
> -	if (fp->rethook)
> -		rethook_free(fp->rethook);

The above only waits for RCU to finish and then starts to free rethook.

This also means that something could be on the trampoline already and was
preempted. It could be that this code path gets preempted. Anyway, I don't
see how freeing rethook is safe before disabling all users.

Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>

-- Steve


> -
>  	ret = unregister_ftrace_function(&fp->ops);
>  	if (ret < 0)
>  		return ret;
>  
> +	if (fp->rethook)
> +		rethook_free(fp->rethook);
> +
>  	ftrace_free_filter(&fp->ops);
>  
>  	return ret;
  
Jiri Olsa June 14, 2023, 6:42 a.m. UTC | #2
On Tue, Jun 13, 2023 at 05:48:44PM -0400, Steven Rostedt wrote:
> On Wed, 7 Jun 2023 09:42:30 -0700
> Jiri Olsa <olsajiri@gmail.com> wrote:
> 
> > I can't really reliable reproduce this, but while checking the code, I wonder
> > we should call rethook_free only after we call unregister_ftrace_function like
> > in the patch below
> 
> Yeah, I think you're right!
> 
> > 
> > jirka
> > 
> > 
> > ---
> > diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
> > index 18d36842faf5..0121e8c0d54e 100644
> > --- a/kernel/trace/fprobe.c
> > +++ b/kernel/trace/fprobe.c
> > @@ -364,19 +364,13 @@ int unregister_fprobe(struct fprobe *fp)
> >  		    fp->ops.saved_func != fprobe_kprobe_handler))
> >  		return -EINVAL;
> >  
> > -	/*
> > -	 * rethook_free() starts disabling the rethook, but the rethook handlers
> > -	 * may be running on other processors at this point. To make sure that all
> > -	 * current running handlers are finished, call unregister_ftrace_function()
> > -	 * after this.
> > -	 */
> > -	if (fp->rethook)
> > -		rethook_free(fp->rethook);
> 
> The above only waits for RCU to finish and then starts to free rethook.
> 
> This also means that something could be on the trampoline already and was
> preempted. It could be that this code path gets preempted. Anyway, I don't
> see how freeing rethook is safe before disabling all users.
> 
> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>

thanks, I'll send formal patch

jirka

> 
> -- Steve
> 
> 
> > -
> >  	ret = unregister_ftrace_function(&fp->ops);
> >  	if (ret < 0)
> >  		return ret;
> >  
> > +	if (fp->rethook)
> > +		rethook_free(fp->rethook);
> > +
> >  	ftrace_free_filter(&fp->ops);
> >  
> >  	return ret;
>
  

Patch

diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
index 18d36842faf5..0121e8c0d54e 100644
--- a/kernel/trace/fprobe.c
+++ b/kernel/trace/fprobe.c
@@ -364,19 +364,13 @@  int unregister_fprobe(struct fprobe *fp)
 		    fp->ops.saved_func != fprobe_kprobe_handler))
 		return -EINVAL;
 
-	/*
-	 * rethook_free() starts disabling the rethook, but the rethook handlers
-	 * may be running on other processors at this point. To make sure that all
-	 * current running handlers are finished, call unregister_ftrace_function()
-	 * after this.
-	 */
-	if (fp->rethook)
-		rethook_free(fp->rethook);
-
 	ret = unregister_ftrace_function(&fp->ops);
 	if (ret < 0)
 		return ret;
 
+	if (fp->rethook)
+		rethook_free(fp->rethook);
+
 	ftrace_free_filter(&fp->ops);
 
 	return ret;