[v2,06/10] ftrace: selftest: remove broken trace_direct_tramp
Commit Message
From: Mark Rutland <mark.rutland@arm.com>
The ftrace selftest code has a trace_direct_tramp() function which it
uses as a direct call trampoline. This happens to work on x86, since the
direct call's return address is in the usual place, and can be returned
to via a RET, but in general the calling convention for direct calls is
different from regular function calls, and requires a trampoline written
in assembly.
On s390, regular function calls place the return address in %r14, and an
ftrace patch-site in an instrumented function places the trampoline's
return address (which is within the instrumented function) in %r0,
preserving the original %r14 value in-place. As a regular C function
will return to the address in %r14, using a C function as the trampoline
results in the trampoline returning to the caller of the instrumented
function, skipping the body of the instrumented function.
Note that the s390 issue is not detcted by the ftrace selftest code, as
the instrumented function is trivial, and returning back into the caller
happens to be equivalent.
On arm64, regular function calls place the return address in x30, and
an ftrace patch-site in an instrumented function saves this into r9
and places the trampoline's return address (within the instrumented
function) in x30. A regular C function will return to the address in
x30, but will not restore x9 into x30. Consequently, using a C function
as the trampoline results in returning to the trampoline's return
address having corrupted x30, such that when the instrumented function
returns, it will return back into itself.
To avoid future issues in this area, remove the trace_direct_tramp()
function, and require that each architecture with direct calls provides
a stub trampoline, named ftrace_stub_direct_tramp. This can be written
to handle the architecture's trampoline calling convention, and in
future could be used elsewhere (e.g. in the ftrace ops sample, to
measure the overhead of direct calls), so we may as well always build it
in.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Xu Kuohai <xukuohai@huawei.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Florent Revest <revest@chromium.org>
Signed-off-by: Florent Revest <revest@chromium.org>
---
arch/s390/kernel/mcount.S | 5 +++++
arch/x86/kernel/ftrace_32.S | 5 +++++
arch/x86/kernel/ftrace_64.S | 4 ++++
include/linux/ftrace.h | 2 ++
kernel/trace/trace_selftest.c | 15 ++-------------
5 files changed, 18 insertions(+), 13 deletions(-)
Comments
On Tue, 7 Feb 2023 19:21:31 +0100
Florent Revest <revest@chromium.org> wrote:
> From: Mark Rutland <mark.rutland@arm.com>
>
> The ftrace selftest code has a trace_direct_tramp() function which it
> uses as a direct call trampoline. This happens to work on x86, since the
> direct call's return address is in the usual place, and can be returned
> to via a RET, but in general the calling convention for direct calls is
> different from regular function calls, and requires a trampoline written
> in assembly.
>
> On s390, regular function calls place the return address in %r14, and an
> ftrace patch-site in an instrumented function places the trampoline's
> return address (which is within the instrumented function) in %r0,
> preserving the original %r14 value in-place. As a regular C function
> will return to the address in %r14, using a C function as the trampoline
> results in the trampoline returning to the caller of the instrumented
> function, skipping the body of the instrumented function.
>
> Note that the s390 issue is not detcted by the ftrace selftest code, as
> the instrumented function is trivial, and returning back into the caller
> happens to be equivalent.
>
> On arm64, regular function calls place the return address in x30, and
> an ftrace patch-site in an instrumented function saves this into r9
> and places the trampoline's return address (within the instrumented
> function) in x30. A regular C function will return to the address in
> x30, but will not restore x9 into x30. Consequently, using a C function
> as the trampoline results in returning to the trampoline's return
> address having corrupted x30, such that when the instrumented function
> returns, it will return back into itself.
>
> To avoid future issues in this area, remove the trace_direct_tramp()
> function, and require that each architecture with direct calls provides
> a stub trampoline, named ftrace_stub_direct_tramp. This can be written
> to handle the architecture's trampoline calling convention, and in
> future could be used elsewhere (e.g. in the ftrace ops sample, to
> measure the overhead of direct calls), so we may as well always build it
> in.
>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Li Huafei <lihuafei1@huawei.com>
> Cc: Xu Kuohai <xukuohai@huawei.com>
> Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
> Cc: Florent Revest <revest@chromium.org>
> Signed-off-by: Florent Revest <revest@chromium.org>
Care to respin with my update requests? I can take up to this patch and
base it directly on v6.3-rc3 when it comes out. I'm expecting that to have
the fixes in other code that is breaking my tests.
Then I'll push it out after it passes all my tests, and you can take it
and add the arm64 specific bits on top. I'm currently running these patches
as is on my tests to see if they fail (with a patched kernel for the other
code that's breaking my tests).
Does that sound OK?
-- Steve
On Thu, Mar 16, 2023 at 12:51 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 7 Feb 2023 19:21:31 +0100
> Florent Revest <revest@chromium.org> wrote:
>
> > From: Mark Rutland <mark.rutland@arm.com>
> >
> > The ftrace selftest code has a trace_direct_tramp() function which it
> > uses as a direct call trampoline. This happens to work on x86, since the
> > direct call's return address is in the usual place, and can be returned
> > to via a RET, but in general the calling convention for direct calls is
> > different from regular function calls, and requires a trampoline written
> > in assembly.
> >
> > On s390, regular function calls place the return address in %r14, and an
> > ftrace patch-site in an instrumented function places the trampoline's
> > return address (which is within the instrumented function) in %r0,
> > preserving the original %r14 value in-place. As a regular C function
> > will return to the address in %r14, using a C function as the trampoline
> > results in the trampoline returning to the caller of the instrumented
> > function, skipping the body of the instrumented function.
> >
> > Note that the s390 issue is not detcted by the ftrace selftest code, as
> > the instrumented function is trivial, and returning back into the caller
> > happens to be equivalent.
> >
> > On arm64, regular function calls place the return address in x30, and
> > an ftrace patch-site in an instrumented function saves this into r9
> > and places the trampoline's return address (within the instrumented
> > function) in x30. A regular C function will return to the address in
> > x30, but will not restore x9 into x30. Consequently, using a C function
> > as the trampoline results in returning to the trampoline's return
> > address having corrupted x30, such that when the instrumented function
> > returns, it will return back into itself.
> >
> > To avoid future issues in this area, remove the trace_direct_tramp()
> > function, and require that each architecture with direct calls provides
> > a stub trampoline, named ftrace_stub_direct_tramp. This can be written
> > to handle the architecture's trampoline calling convention, and in
> > future could be used elsewhere (e.g. in the ftrace ops sample, to
> > measure the overhead of direct calls), so we may as well always build it
> > in.
> >
> > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > Cc: Li Huafei <lihuafei1@huawei.com>
> > Cc: Xu Kuohai <xukuohai@huawei.com>
> > Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
> > Cc: Florent Revest <revest@chromium.org>
> > Signed-off-by: Florent Revest <revest@chromium.org>
>
>
> Care to respin with my update requests? I can take up to this patch and
> base it directly on v6.3-rc3 when it comes out. I'm expecting that to have
> the fixes in other code that is breaking my tests.
Okay! :) I'll send you a subset of this series (first 6 patches with
your review addressed and rebased on v6.3-rc2 for now).
> Then I'll push it out after it passes all my tests, and you can take it
> and add the arm64 specific bits on top. I'm currently running these patches
> as is on my tests to see if they fail (with a patched kernel for the other
> code that's breaking my tests).
>
> Does that sound OK?
Sounds good to me, yes!
@@ -32,6 +32,11 @@ ENTRY(ftrace_stub)
BR_EX %r14
ENDPROC(ftrace_stub)
+SYM_CODE_START(ftrace_stub_direct_tramp)
+ lgr %r1, %r0
+ BR_EX %r1
+SYM_CODE_END(ftrace_stub_direct_tramp)
+
.macro ftrace_regs_entry, allregs=0
stg %r14,(__SF_GPRS+8*8)(%r15) # save traced function caller
@@ -163,6 +163,11 @@ SYM_INNER_LABEL(ftrace_regs_call, SYM_L_GLOBAL)
jmp .Lftrace_ret
SYM_CODE_END(ftrace_regs_caller)
+SYM_FUNC_START(ftrace_stub_direct_tramp)
+ CALL_DEPTH_ACCOUNT
+ RET
+SYM_FUNC_END(ftrace_stub_direct_tramp)
+
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
SYM_CODE_START(ftrace_graph_caller)
pushl %eax
@@ -307,6 +307,10 @@ SYM_INNER_LABEL(ftrace_regs_caller_end, SYM_L_GLOBAL)
SYM_FUNC_END(ftrace_regs_caller)
STACK_FRAME_NON_STANDARD_FP(ftrace_regs_caller)
+SYM_FUNC_START(ftrace_stub_direct_tramp)
+ CALL_DEPTH_ACCOUNT
+ RET
+SYM_FUNC_END(ftrace_stub_direct_tramp)
#else /* ! CONFIG_DYNAMIC_FTRACE */
@@ -412,6 +412,8 @@ int unregister_ftrace_direct(struct ftrace_ops *ops, unsigned long addr);
int modify_ftrace_direct(struct ftrace_ops *ops, unsigned long addr);
int modify_ftrace_direct_nolock(struct ftrace_ops *ops, unsigned long addr);
+void ftrace_stub_direct_tramp(void);
+
#else
struct ftrace_ops;
# define ftrace_direct_func_count 0
@@ -784,17 +784,6 @@ static struct fgraph_ops fgraph_ops __initdata = {
.retfunc = &trace_graph_return,
};
-#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
-#ifndef CALL_DEPTH_ACCOUNT
-#define CALL_DEPTH_ACCOUNT ""
-#endif
-
-noinline __noclone static void trace_direct_tramp(void)
-{
- asm(CALL_DEPTH_ACCOUNT);
-}
-#endif
-
/*
* Pretty much the same than for the function tracer from which the selftest
* has been borrowed.
@@ -875,7 +864,7 @@ trace_selftest_startup_function_graph(struct tracer *trace,
*/
ftrace_set_filter_ip(&direct, (unsigned long)DYN_FTRACE_TEST_NAME, 0, 0);
ret = register_ftrace_direct(&direct,
- (unsigned long)trace_direct_tramp);
+ (unsigned long)ftrace_stub_direct_tramp);
if (ret)
goto out;
@@ -896,7 +885,7 @@ trace_selftest_startup_function_graph(struct tracer *trace,
unregister_ftrace_graph(&fgraph_ops);
ret = unregister_ftrace_direct(&direct,
- (unsigned long)trace_direct_tramp);
+ (unsigned long)ftrace_stub_direct_tramp);
if (ret)
goto out;