[4/5] x86/ftrace: Remove SYSTEM_BOOTING exceptions

Message ID: 20221025201057.945960823@infradead.org
State: New
Series: x86/ftrace: Cure boot time W+X mapping

Commit Message

Peter Zijlstra Oct. 25, 2022, 8:07 p.m. UTC
Now that text_poke is available before ftrace, remove the
SYSTEM_BOOTING exceptions.

Specifically, this cures a W+X case during boot.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/alternative.c |   10 ----------
 arch/x86/kernel/ftrace.c      |    3 +--
 2 files changed, 1 insertion(+), 12 deletions(-)
  

Comments

Steven Rostedt Oct. 25, 2022, 8:59 p.m. UTC | #1
On Tue, 25 Oct 2022 22:07:00 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> Now that text_poke is available before ftrace, remove the
> SYSTEM_BOOTING exceptions.
> 
> Specifically, this cures a W+X case during boot.

We have W+X all over the place (the entire kernel text). And I don't think
we really want this.

This will slow down boots in general, as it will cause all static_branches
to use this memory page logic. And I don't think we really want to do
that at boot up when we don't need to.

I would change this to:

	if (unlikely(system_state == SYSTEM_BOOTING) &&
	    core_kernel_text((unsigned long)addr)) {

This way we still do memcpy() on all core kernel text, which is still
writable. It was the ftrace-allocated trampoline that caused issues, not
the locations that were being updated.

-- Steve



> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/kernel/alternative.c |   10 ----------
>  arch/x86/kernel/ftrace.c      |    3 +--
>  2 files changed, 1 insertion(+), 12 deletions(-)
> 
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -1681,11 +1681,6 @@ void __ref text_poke_queue(void *addr, c
>  {
>  	struct text_poke_loc *tp;
>  
> -	if (unlikely(system_state == SYSTEM_BOOTING)) {
> -		text_poke_early(addr, opcode, len);
> -		return;
> -	}
> -
>  	text_poke_flush(addr);
>  
>  	tp = &tp_vec[tp_vec_nr++];
> @@ -1707,11 +1702,6 @@ void __ref text_poke_bp(void *addr, cons
>  {
>  	struct text_poke_loc tp;
>  
> -	if (unlikely(system_state == SYSTEM_BOOTING)) {
> -		text_poke_early(addr, opcode, len);
> -		return;
> -	}
> -
>  	text_poke_loc_init(&tp, addr, opcode, len, emulate);
>  	text_poke_bp_batch(&tp, 1);
>  }
> --- a/arch/x86/kernel/ftrace.c
> +++ b/arch/x86/kernel/ftrace.c
> @@ -415,8 +415,7 @@ create_trampoline(struct ftrace_ops *ops
>  
>  	set_vm_flush_reset_perms(trampoline);
>  
> -	if (likely(system_state != SYSTEM_BOOTING))
> -		set_memory_ro((unsigned long)trampoline, npages);
> +	set_memory_ro((unsigned long)trampoline, npages);
>  	set_memory_x((unsigned long)trampoline, npages);
>  	return (unsigned long)trampoline;
>  fail:
>
  
Peter Zijlstra Oct. 26, 2022, 7:02 a.m. UTC | #2
On Tue, Oct 25, 2022 at 04:59:56PM -0400, Steven Rostedt wrote:
> On Tue, 25 Oct 2022 22:07:00 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > Now that text_poke is available before ftrace, remove the
> > SYSTEM_BOOTING exceptions.
> > 
> > Specifically, this cures a W+X case during boot.
> 
> We have W+X all over the place (the entire kernel text). And I don't think
> we really want this.
> 
> This will slow down boots in general, as it will cause all static_branches
> to use this memory page logic. And I don't think we really want to do
> that at boot up when we don't need to.

Both static_call and jump_label explicitly call text_poke_early() when
appropriate.

> I would change this to:
> 
> 	if (unlikely(system_state == SYSTEM_BOOTING) &&
> 	    core_kernel_text((unsigned long)addr)) {
> 
> This way we still do memcpy() on all core kernel text which is still
> writable. It was the ftrace allocated trampoline that caused issues, not
> the locations that were being updated.

I would suggest changing ftrace to call text_poke_early() when
appropriate if it matters (it already does a little of that); a boot
test with and without patch 4 of this series shows no noticeable
overhead over being horribly slow either way.
  

Patch

--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1681,11 +1681,6 @@ void __ref text_poke_queue(void *addr, c
 {
 	struct text_poke_loc *tp;
 
-	if (unlikely(system_state == SYSTEM_BOOTING)) {
-		text_poke_early(addr, opcode, len);
-		return;
-	}
-
 	text_poke_flush(addr);
 
 	tp = &tp_vec[tp_vec_nr++];
@@ -1707,11 +1702,6 @@ void __ref text_poke_bp(void *addr, cons
 {
 	struct text_poke_loc tp;
 
-	if (unlikely(system_state == SYSTEM_BOOTING)) {
-		text_poke_early(addr, opcode, len);
-		return;
-	}
-
 	text_poke_loc_init(&tp, addr, opcode, len, emulate);
 	text_poke_bp_batch(&tp, 1);
 }
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -415,8 +415,7 @@ create_trampoline(struct ftrace_ops *ops
 
 	set_vm_flush_reset_perms(trampoline);
 
-	if (likely(system_state != SYSTEM_BOOTING))
-		set_memory_ro((unsigned long)trampoline, npages);
+	set_memory_ro((unsigned long)trampoline, npages);
 	set_memory_x((unsigned long)trampoline, npages);
 	return (unsigned long)trampoline;
 fail: