[RFC,24/32] x86/ftrace: Enable HAVE_FUNCTION_GRAPH_FREGS

Message ID 169920068069.482486.6540417903833579700.stgit@devnote2
State New
Headers
Series tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph |

Commit Message

Masami Hiramatsu (Google) Nov. 5, 2023, 4:11 p.m. UTC
  From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Support HAVE_FUNCTION_GRAPH_FREGS on x86-64, which saves ftrace_regs
on the stack in ftrace_graph return trampoline so that the callbacks
can access registers via ftrace_regs APIs.

Note that this only recovers 'rax' and 'rdx' registers because other
registers are not used anymore and recovered by caller. 'rax' and
'rdx' will be used for passing the return value.

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 arch/x86/Kconfig            |    3 ++-
 arch/x86/kernel/ftrace_64.S |   30 ++++++++++++++++++++++--------
 2 files changed, 24 insertions(+), 9 deletions(-)
  

Comments

Peter Zijlstra Nov. 5, 2023, 5:25 p.m. UTC | #1
On Mon, Nov 06, 2023 at 01:11:21AM +0900, Masami Hiramatsu (Google) wrote:
> From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> 
> Support HAVE_FUNCTION_GRAPH_FREGS on x86-64, which saves ftrace_regs
> on the stack in ftrace_graph return trampoline so that the callbacks
> can access registers via ftrace_regs APIs.

What is ftrace_regs ? If I look at arch/x86/include/asm/ftrace.h it's a
pointless wrapper around pt_regs.

Can we please remove the pointless wrappery and call it what it is?
  
Steven Rostedt Nov. 5, 2023, 7:11 p.m. UTC | #2
On Sun, 5 Nov 2023 18:25:36 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Nov 06, 2023 at 01:11:21AM +0900, Masami Hiramatsu (Google) wrote:
> > From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > 
> > Support HAVE_FUNCTION_GRAPH_FREGS on x86-64, which saves ftrace_regs
> > on the stack in ftrace_graph return trampoline so that the callbacks
> > can access registers via ftrace_regs APIs.  
> 
> What is ftrace_regs ? If I look at arch/x86/include/asm/ftrace.h it's a
> pointless wrapper around pt_regs.
> 
> Can we please remove the pointless wrappery and call it what it is?

A while back ago when I introduced FTRACE_WITH_ARGS, it would have all
ftrace callbacks get a pt_regs, but it would be partially filled for
those that did not specify the "REGS" flag when registering the
callback. You and Thomas complained that it would be a bug to return
pt_regs that was not full because something might read the non filled
registers and think they were valid.

To solve this, I came up with ftrace_regs to only hold the registers
that were required for function parameters (including the stack
pointer). You could then call arch_ftrace_get_regs(ftrace_regs) and if
this "wrapper" had all valid pt_regs registers, then it would return
the pt_regs, otherwise it would return NULL, and you would need to use
the ftrace_regs accessor calls to get the function registers. You and
Thomas agreed with this.

You even Acked the patch:

commit 02a474ca266a47ea8f4d5a11f4ffa120f83730ad
Author: Steven Rostedt (VMware) <rostedt@goodmis.org>
Date:   Tue Oct 27 10:55:55 2020 -0400

    ftrace/x86: Allow for arguments to be passed in to ftrace_regs by default
    
    Currently, the only way to get access to the registers of a function via a
    ftrace callback is to set the "FL_SAVE_REGS" bit in the ftrace_ops. But as this
    saves all regs as if a breakpoint were to trigger (for use with kprobes), it
    is expensive.
    
    The regs are already saved on the stack for the default ftrace callbacks, as
    that is required otherwise a function being traced will get the wrong
    arguments and possibly crash. And on x86, the arguments are already stored
    where they would be on a pt_regs structure to use that code for both the
    regs version of a callback, it makes sense to pass that information always
    to all functions.
    
    If an architecture does this (as x86_64 now does), it is to set
    HAVE_DYNAMIC_FTRACE_WITH_ARGS, and this will let the generic code that it
    could have access to arguments without having to set the flags.
    
    This also includes having the stack pointer being saved, which could be used
    for accessing arguments on the stack, as well as having the function graph
    tracer not require its own trampoline!
    
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


-- Steve
  
Steven Rostedt Nov. 5, 2023, 9:28 p.m. UTC | #3
On Sun, 5 Nov 2023 14:11:30 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> You even Acked the patch:

More specifically, you acked the series and stressed the ftrace_regs
wrapper part when doing so:

  https://lore.kernel.org/all/20201113080733.GZ2628@hirez.programming.kicks-ass.net/

> On Thu, Nov 12, 2020 at 09:01:42PM -0500, Steven Rostedt wrote:
> > Steven Rostedt (VMware) (3):
> >       ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regs
> >       ftrace/x86: Allow for arguments to be passed in to ftrace_regs by default
> >       livepatch: Use the default ftrace_ops instead of REGS when ARGS is available
> 
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

-- Steve
  
Peter Zijlstra Nov. 5, 2023, 11:17 p.m. UTC | #4
On Sun, Nov 05, 2023 at 02:11:30PM -0500, Steven Rostedt wrote:
> On Sun, 5 Nov 2023 18:25:36 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Mon, Nov 06, 2023 at 01:11:21AM +0900, Masami Hiramatsu (Google) wrote:
> > > From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > > 
> > > Support HAVE_FUNCTION_GRAPH_FREGS on x86-64, which saves ftrace_regs
> > > on the stack in ftrace_graph return trampoline so that the callbacks
> > > can access registers via ftrace_regs APIs.  
> > 
> > What is ftrace_regs ? If I look at arch/x86/include/asm/ftrace.h it's a
> > pointless wrapper around pt_regs.
> > 
> > Can we please remove the pointless wrappery and call it what it is?
> 
> A while back ago when I introduced FTRACE_WITH_ARGS, it would have all
> ftrace callbacks get a pt_regs, but it would be partially filled for
> those that did not specify the "REGS" flag when registering the
> callback. You and Thomas complained that it would be a bug to return
> pt_regs that was not full because something might read the non filled
> registers and think they were valid.
> 
> To solve this, I came up with ftrace_regs to only hold the registers
> that were required for function parameters (including the stack
> pointer). You could then call arch_ftrace_get_regs(ftrace_regs) and if
> this "wrapper" had all valid pt_regs registers, then it would return
> the pt_regs, otherwise it would return NULL, and you would need to use
> the ftrace_regs accessor calls to get the function registers. You and
> Thomas agreed with this.

Changelog nor code made it clear this was partial anything. So this is
still the partial thing?

Can we then pretty clear clarify all that, and make it clear which regs
are in there? Because when I do 'vim -t ftrace_regs' it just gets me a
seemingly pointless wrapper struct, no elucidating comments nothingses.

> You even Acked the patch:
>
> commit 02a474ca266a47ea8f4d5a11f4ffa120f83730ad
> Author: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Date:   Tue Oct 27 10:55:55 2020 -0400

You expect me to remember things from 3 years ago?
  
Steven Rostedt Nov. 5, 2023, 11:33 p.m. UTC | #5
On Mon, 6 Nov 2023 00:17:34 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> Changelog nor code made it clear this was partial anything. So this is
> still the partial thing?
> 
> Can we then pretty clear clarify all that, and make it clear which regs
> are in there? Because when I do 'vim -t ftrace_regs' it just gets me a
> seemingly pointless wrapper struct, no elucidating comments nothingses.

I agree it should be better documented (like everything else). The
ftrace_regs must have all the registers needed to produce a function's
arguments. For x86_64, that would be:

  rdi, rsi, rdx, r8, r9, rsp

Basically anything that is needed to call mcount/fentry.

But yes, it's still partial registers but for archs that support
FTRACE_WITH_REGS, it can also hold all pt_regs which can be retrieved
by the arch_ftrace_get_regs(), which is why there's a pt_regs struct in
the x86 version. But that's not the case for arm64, as
arch_ftrace_get_regs() will always return NULL.

> 
> > You even Acked the patch:
> >
> > commit 02a474ca266a47ea8f4d5a11f4ffa120f83730ad
> > Author: Steven Rostedt (VMware) <rostedt@goodmis.org>
> > Date:   Tue Oct 27 10:55:55 2020 -0400  
> 
> You expect me to remember things from 3 years ago?

Heh, of course not. I just thought it amusing that I created
ftrace_regs because of you and then 3 years later you ask to get rid of
it. But the real issue is that it's not documented clearly why it
exists, and that should be rectified.

Thanks,

-- Steve
  
Steven Rostedt Nov. 5, 2023, 11:34 p.m. UTC | #6
On Sun, 5 Nov 2023 18:33:01 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> For x86_64, that would be:
> 
>   rdi, rsi, rdx, r8, r9, rsp

I missed rcx.

-- Steve
  
Masami Hiramatsu (Google) Nov. 6, 2023, 12:38 a.m. UTC | #7
On Sun, 5 Nov 2023 18:34:09 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Sun, 5 Nov 2023 18:33:01 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > For x86_64, that would be:
> > 
> >   rdi, rsi, rdx, r8, r9, rsp
> 
> I missed rcx.

I would like to add rax to the list so that it can handle the return value too. :)

Thanks,

> 
> -- Steve
  
Masami Hiramatsu (Google) Nov. 6, 2023, 1:05 a.m. UTC | #8
On Sun, 5 Nov 2023 18:33:01 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Mon, 6 Nov 2023 00:17:34 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > Changelog nor code made it clear this was partial anything. So this is
> > still the partial thing?
> > 
> > Can we then pretty clear clarify all that, and make it clear which regs
> > are in there? Because when I do 'vim -t ftrace_regs' it just gets me a
> > seemingly pointless wrapper struct, no elucidating comments nothingses.
> 
> I agree it should be better documented (like everything else). The
> ftrace_regs must have all the registers needed to produce a function's
> arguments. For x86_64, that would be:
> 
>   rdi, rsi, rdx, r8, r9, rsp
> 
> Basically anything that is needed to call mcount/fentry.

Oops, I found I missed to save rsp. let me update it.

Anyway, this will be defined clearly. ftrace_regs needs to be a partial
set of registers related to the (kernel) function call.

 - registers which is used for passing the function parameters in
   integer registers and stack pointer (for parameters on memory).

 - registers which is used for passing the return values.

 - call-frame-pointer register if exists.

So for x86-64,

 - rdi, rsi, rcx, rdx, r8, r9, and rsp
 - rax and rdx
 - rbp

(BTW, why orig_rax is cleared?)

> But yes, it's still partial registers but for archs that support
> FTRACE_WITH_REGS, it can also hold all pt_regs which can be retrieved
> by the arch_ftrace_get_regs(), which is why there's a pt_regs struct in
> the x86 version. But that's not the case for arm64, as
> arch_ftrace_get_regs() will always return NULL.

The major reason of the DYNAMIC_FTRACE_WITH_REGS is livepatch and
kprobe on ftrace (if kprobe puts probe on the ftrace address, it uses
ftrace instead of breakpoint).

Thank you,
  
Peter Zijlstra Nov. 6, 2023, 10:19 a.m. UTC | #9
On Mon, Nov 06, 2023 at 09:38:50AM +0900, Masami Hiramatsu wrote:
> On Sun, 5 Nov 2023 18:34:09 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Sun, 5 Nov 2023 18:33:01 -0500
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> > 
> > > For x86_64, that would be:
> > > 
> > >   rdi, rsi, rdx, r8, r9, rsp
> > 
> > I missed rcx.
> 
> I would like to add rax to the list so that it can handle the return value too. :)

So something like so?


diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index 897cf02c20b1..71bfe27594a5 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -36,6 +36,10 @@ static inline unsigned long ftrace_call_adjust(unsigned long addr)
 
 #ifdef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
 struct ftrace_regs {
+	/*
+	 * Partial, filled with:
+	 *  rax, rcx, rdx, rdi, rsi, r8, r9, rsp
+	 */
 	struct pt_regs		regs;
 };
  
Masami Hiramatsu (Google) Nov. 6, 2023, 12:47 p.m. UTC | #10
On Mon, 6 Nov 2023 11:19:32 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Nov 06, 2023 at 09:38:50AM +0900, Masami Hiramatsu wrote:
> > On Sun, 5 Nov 2023 18:34:09 -0500
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> > 
> > > On Sun, 5 Nov 2023 18:33:01 -0500
> > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > 
> > > > For x86_64, that would be:
> > > > 
> > > >   rdi, rsi, rdx, r8, r9, rsp
> > > 
> > > I missed rcx.
> > 
> > I would like to add rax to the list so that it can handle the return value too. :)
> 
> So something like so?
> 
> 
> diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
> index 897cf02c20b1..71bfe27594a5 100644
> --- a/arch/x86/include/asm/ftrace.h
> +++ b/arch/x86/include/asm/ftrace.h
> @@ -36,6 +36,10 @@ static inline unsigned long ftrace_call_adjust(unsigned long addr)
>  
>  #ifdef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
>  struct ftrace_regs {
> +	/*
> +	 * Partial, filled with:
> +	 *  rax, rcx, rdx, rdi, rsi, r8, r9, rsp

Don't we need rbp too? (for frame pointer)


> +	 */
>  	struct pt_regs		regs;
>  };
  
Peter Zijlstra Nov. 6, 2023, 12:52 p.m. UTC | #11
On Mon, Nov 06, 2023 at 09:47:08PM +0900, Masami Hiramatsu wrote:
> On Mon, 6 Nov 2023 11:19:32 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Mon, Nov 06, 2023 at 09:38:50AM +0900, Masami Hiramatsu wrote:
> > > On Sun, 5 Nov 2023 18:34:09 -0500
> > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > 
> > > > On Sun, 5 Nov 2023 18:33:01 -0500
> > > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > > 
> > > > > For x86_64, that would be:
> > > > > 
> > > > >   rdi, rsi, rdx, r8, r9, rsp
> > > > 
> > > > I missed rcx.
> > > 
> > > I would like to add rax to the list so that it can handle the return value too. :)
> > 
> > So something like so?
> > 
> > 
> > diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
> > index 897cf02c20b1..71bfe27594a5 100644
> > --- a/arch/x86/include/asm/ftrace.h
> > +++ b/arch/x86/include/asm/ftrace.h
> > @@ -36,6 +36,10 @@ static inline unsigned long ftrace_call_adjust(unsigned long addr)
> >  
> >  #ifdef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
> >  struct ftrace_regs {
> > +	/*
> > +	 * Partial, filled with:
> > +	 *  rax, rcx, rdx, rdi, rsi, r8, r9, rsp
> 
> Don't we need rbp too? (for frame pointer)

/me goes stare at ftrace_64.S, and yes it appears it fills out rbp too.
  
Steven Rostedt Nov. 6, 2023, 4:37 p.m. UTC | #12
On Mon, 6 Nov 2023 10:05:49 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> So for x86-64,
> 
>  - rdi, rsi, rcx, rdx, r8, r9, and rsp
>  - rax and rdx
>  - rbp
> 
> (BTW, why orig_rax is cleared?)

You mean from ftrace_caller?

That's a "hack" to determine if we need to call the direct trampoline or
not. When you have both a direct trampoline and ftrace functions on the
same function, it will call ftrace_ops_list_func() to iterate all the
registered ftrace callbacks. The direct callback helper will set "orig_rax"
to let the return of the ftrace trampoline call the direct callback.

Remember if a direct callback is by itself, the fentry will call that
direct trampoline without going through the ftrace trampoline. This is used
to tell the ftrace trampoline that it's attached to a direct caller and
needs to call that and not return back to the function it is tracing.

See later down in that file we have:

	/*
	 * If ORIG_RAX is anything but zero, make this a call to that.
	 * See arch_ftrace_set_direct_caller().
	 */
	testq	%rax, %rax

-- Steve
  
Masami Hiramatsu (Google) Nov. 7, 2023, 12:42 a.m. UTC | #13
On Mon, 6 Nov 2023 11:37:10 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Mon, 6 Nov 2023 10:05:49 +0900
> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> 
> > So for x86-64,
> > 
> >  - rdi, rsi, rcx, rdx, r8, r9, and rsp
> >  - rax and rdx
> >  - rbp
> > 
> > (BTW, why orig_rax is cleared?)
> 
> You mean from ftrace_caller?
> 
> That's a "hack" to determine if we need to call the direct trampoline or
> not. When you have both a direct trampoline and ftrace functions on the
> same function, it will call ftrace_ops_list_func() to iterate all the
> registered ftrace callbacks. The direct callback helper will set "orig_rax"
> to let the return of the ftrace trampoline call the direct callback.

Got it. So does ftrace_regs need a placeholder for direct trampoline?
(Or, can we use a register to pass it?)
I think we don't need to clear it for return_to_handler() but if
`ftrace_regs` spec requires it, it is better to do so.

Thank you,

> 
> Remember if a direct callback is by itself, the fentry will call that
> direct trampoline without going through the ftrace trampoline. This is used
> to tell the ftrace trampoline that it's attached to a direct caller and
> needs to call that and not return back to the function it is tracing.
> 
> See later down in that file we have:
> 
> 	/*
> 	 * If ORIG_RAX is anything but zero, make this a call to that.
> 	 * See arch_ftrace_set_direct_caller().
> 	 */
> 	testq	%rax, %rax
> 
> -- Steve
  
Steven Rostedt Nov. 7, 2023, 3:06 a.m. UTC | #14
On Tue, 7 Nov 2023 09:42:58 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> Got it. So does ftrace_regs need a placeholder for direct trampoline?
> (Or, can we use a register to pass it?)
> I think we don't need to clear it for return_to_handler() but if
> `ftrace_regs` spec requires it, it is better to do so.

It's per arch defined. I think I wrote somewhere that it just needs to pass
back something that can tell if the handler is to return to a direct
trampoline or not. It could be a unused register, or something else.

It's only needed if an architecture supports direct trampolines.

-- Steve
  
Masami Hiramatsu (Google) Nov. 7, 2023, 5:43 a.m. UTC | #15
On Mon, 6 Nov 2023 22:06:17 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Tue, 7 Nov 2023 09:42:58 +0900
> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> 
> > Got it. So does ftrace_regs need a placeholder for direct trampoline?
> > (Or, can we use a register to pass it?)
> > I think we don't need to clear it for return_to_handler() but if
> > `ftrace_regs` spec requires it, it is better to do so.
> 
> It's per arch defined. I think I wrote somewhere that it just needs to pass
> back something that can tell if the handler is to return to a direct
> trampoline or not. It could be a unused register, or something else.

Oh, I meant the flag (address) for "return" trampoline. If we have
direct "return" trampoline we may use it, but currently not.

> 
> It's only needed if an architecture supports direct trampolines.

I see, and x86_64 needs it.
OK, maybe better to keep it clear on x86-64 even on the
return handler.

Thank you,

> 
> -- Steve
  
Steven Rostedt Nov. 7, 2023, 1:48 p.m. UTC | #16
On Tue, 7 Nov 2023 14:43:28 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> > 
> > It's only needed if an architecture supports direct trampolines.  
> 
> I see, and x86_64 needs it.
> OK, maybe better to keep it clear on x86-64 even on the
> return handler.

As it is arch specific, I'm not sure it matters for the return handler, as
the return should never call a direct trampoline.

-- Steve
  
Steven Rostedt Nov. 7, 2023, 2:09 p.m. UTC | #17
On Tue, 7 Nov 2023 08:48:44 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Tue, 7 Nov 2023 14:43:28 +0900
> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> 
> > > 
> > > It's only needed if an architecture supports direct trampolines.    
> > 
> > I see, and x86_64 needs it.
> > OK, maybe better to keep it clear on x86-64 even on the
> > return handler.  
> 
> As it is arch specific, I'm not sure it matters for the return handler, as
> the return should never call a direct trampoline.

Just to clarify, the return trampoline should not bother touching that
register. The register was cleared in the fentry trampoline before calling
all the callbacks because the arch_ftrace_set_direct_caller() would set it.
Then on return of calling the function callbacks, it would test if
something set it or not.

If the return trampoline is not testing it after the return from the
callbacks, there's no reason to clear it. The fentry trampoline used it to
communicate to itself:

	orig_rax = 0;

	call ftrace_ops_list_func()

	/* Did something set orig_rax? */
	if (orig_rax != 0)
		return orig_rax;

It's not setting it to communicate with the callbacks. That is, the
callback does not expect it to be set.

-- Steve
  

Patch

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 66bfabae8814..4b4c2f9d67da 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -219,7 +219,8 @@  config X86
 	select HAVE_FAST_GUP
 	select HAVE_FENTRY			if X86_64 || DYNAMIC_FTRACE
 	select HAVE_FTRACE_MCOUNT_RECORD
-	select HAVE_FUNCTION_GRAPH_RETVAL	if HAVE_FUNCTION_GRAPH_TRACER
+	select HAVE_FUNCTION_GRAPH_FREGS	if HAVE_DYNAMIC_FTRACE_WITH_ARGS
+	select HAVE_FUNCTION_GRAPH_RETVAL	if !HAVE_DYNAMIC_FTRACE_WITH_ARGS
 	select HAVE_FUNCTION_GRAPH_TRACER	if X86_32 || (X86_64 && DYNAMIC_FTRACE)
 	select HAVE_FUNCTION_TRACER
 	select HAVE_GCC_PLUGINS
diff --git a/arch/x86/kernel/ftrace_64.S b/arch/x86/kernel/ftrace_64.S
index 945cfa5f7239..41351a621753 100644
--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -348,21 +348,35 @@  STACK_FRAME_NON_STANDARD_FP(__fentry__)
 SYM_CODE_START(return_to_handler)
 	UNWIND_HINT_UNDEFINED
 	ANNOTATE_NOENDBR
-	subq  $24, %rsp
+	/*
+	 * We add enough stack to save all regs, but saves only registers
+	 * for function params.
+	 */
+	subq $(FRAME_SIZE), %rsp
+	movq %rax, RAX(%rsp)
+	movq %rcx, RCX(%rsp)
+	movq %rdx, RDX(%rsp)
+	movq %rsi, RSI(%rsp)
+	movq %rdi, RDI(%rsp)
+	movq %r8, R8(%rsp)
+	movq %r9, R9(%rsp)
+	movq $0, ORIG_RAX(%rsp)
+	movq %rbp, RBP(%rsp)
 
-	/* Save the return values */
-	movq %rax, (%rsp)
-	movq %rdx, 8(%rsp)
-	movq %rbp, 16(%rsp)
 	movq %rsp, %rdi
 
 	call ftrace_return_to_handler
 
 	movq %rax, %rdi
-	movq 8(%rsp), %rdx
-	movq (%rsp), %rax
 
-	addq $24, %rsp
+	/*
+	 * Restore only rax and rdx because other registers are not used
+	 * for return value nor callee saved. Caller will reuse/recover it.
+	 */
+	movq RDX(%rsp), %rdx
+	movq RAX(%rsp), %rax
+
+	addq $(FRAME_SIZE), %rsp
 	/*
 	 * Jump back to the old return address. This cannot be JMP_NOSPEC rdi
 	 * since IBT would demand that contain ENDBR, which simply isn't so for