[for-next,13/25] x86/mm/kmmio: Use rcu_read_lock_sched_notrace()

Message ID: 20221210135825.241167123@goodmis.org
State: New
Series: tracing: Updates for 6.2

Commit Message

Steven Rostedt Dec. 10, 2022, 1:58 p.m. UTC
  From: Steven Rostedt <rostedt@goodmis.org>

The mmiotrace tracer is "special". Its purpose is to help reverse engineer
binary drivers: it removes the mapping of memory allocated by the driver so
that when the driver goes to access it, a fault occurs. The mmiotracer then
records what the driver was doing and does the work on its behalf by single
stepping through the access.

But to achieve this ability, it must do some special things. One is to
take rcu_read_lock() when the fault occurs, and then release it in the
debug (single-step) exception handler. This makes lockdep unhappy, as it
changes the state of RCU from within an exception without that change being
contained in the exception, and we get a nasty splat from lockdep.

Instead, switch to rcu_read_lock_sched_notrace() as the RCU sched variant
has the same grace period as normal RCU. This is basically the same as
rcu_read_lock() but does not make lockdep complain about it.

Note, the preempt_disable() is still needed as it uses preempt_enable_no_resched().

Link: https://lore.kernel.org/linux-trace-kernel/20221209134144.04f33626@gandalf.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Karol Herbst <karolherbst@gmail.com>
Cc: Pekka Paalanen <ppaalanen@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 arch/x86/mm/kmmio.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
  

Comments

Paul E. McKenney Dec. 10, 2022, 5:47 p.m. UTC | #1
On Sat, Dec 10, 2022 at 08:58:03AM -0500, Steven Rostedt wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
> 
> The mmiotrace tracer is "special". The purpose is to help reverse engineer
> binary drivers by removing the memory allocated by the driver and when the
> driver goes to access it, a fault occurs, the mmiotracer will record what
> the driver was doing and then do the work on its behalf by single stepping
> through the process.
> 
> But to achieve this ability, it must do some special things. One is to
> take the rcu_read_lock() when the fault occurs, and then release it in the
> breakpoint that is single stepping. This makes lockdep unhappy, as it
> changes the state of RCU from within an exception that is not contained in
> that exception, and we get a nasty splat from lockdep.
> 
> Instead, switch to rcu_read_lock_sched_notrace() as the RCU sched variant
> has the same grace period as normal RCU. This is basically the same as
> rcu_read_lock() but does not make lockdep complain about it.
> 
> Note, the preempt_disable() is still needed as it uses preempt_enable_no_resched().
> 
> Link: https://lore.kernel.org/linux-trace-kernel/20221209134144.04f33626@gandalf.local.home
> 
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Karol Herbst <karolherbst@gmail.com>
> Cc: Pekka Paalanen <ppaalanen@gmail.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

Executable code can be the best form of comment.  ;-)

This does mess with preempt_count() redundantly, but the overhead from
that should be way down in the noise.

Acked-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  arch/x86/mm/kmmio.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/mm/kmmio.c b/arch/x86/mm/kmmio.c
> index edb486450158..853c49877c16 100644
> --- a/arch/x86/mm/kmmio.c
> +++ b/arch/x86/mm/kmmio.c
> @@ -254,7 +254,7 @@ int kmmio_handler(struct pt_regs *regs, unsigned long addr)
>  	 * again.
>  	 */
>  	preempt_disable();
> -	rcu_read_lock();
> +	rcu_read_lock_sched_notrace();
>  
>  	faultpage = get_kmmio_fault_page(page_base);
>  	if (!faultpage) {
> @@ -323,7 +323,7 @@ int kmmio_handler(struct pt_regs *regs, unsigned long addr)
>  	return 1; /* fault handled */
>  
>  no_kmmio:
> -	rcu_read_unlock();
> +	rcu_read_unlock_sched_notrace();
>  	preempt_enable_no_resched();
>  	return ret;
>  }
> @@ -363,7 +363,7 @@ static int post_kmmio_handler(unsigned long condition, struct pt_regs *regs)
>  	/* These were acquired in kmmio_handler(). */
>  	ctx->active--;
>  	BUG_ON(ctx->active);
> -	rcu_read_unlock();
> +	rcu_read_unlock_sched_notrace();
>  	preempt_enable_no_resched();
>  
>  	/*
> -- 
> 2.35.1
> 
>
  
Steven Rostedt Dec. 10, 2022, 6:34 p.m. UTC | #2
On Sat, 10 Dec 2022 09:47:53 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> > Note, the preempt_disable() is still needed as it uses preempt_enable_no_resched().
> > 

 ...

> Executable code can be the best form of comment.  ;-)
> 
> This does mess with preempt_count() redundantly, but the overhead from
> that should be way down in the noise.

I was going to remove it, but then I realized that it would be a functional
change: as the comment above notes, the code uses preempt_enable_no_resched(),
for which there is no rcu_read_unlock_sched() variant.

> 
> Acked-by: Paul E. McKenney <paulmck@kernel.org>

Thanks! I'll add this to the commit.

-- Steve
  
Paul E. McKenney Dec. 10, 2022, 9:34 p.m. UTC | #3
On Sat, Dec 10, 2022 at 01:34:25PM -0500, Steven Rostedt wrote:
> On Sat, 10 Dec 2022 09:47:53 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > > Note, the preempt_disable() is still needed as it uses preempt_enable_no_resched().
> > > 
> 
>  ...
> 
> > Executable code can be the best form of comment.  ;-)
> > 
> > This does mess with preempt_count() redundantly, but the overhead from
> > that should be way down in the noise.
> 
> I was going to remove it, but then I realized that it would be a functional
> change, as from the comment above, it uses "preempt_enable_no_resched(),
> which there is not a rcu_read_unlock_sched() variant.

If this happens often enough, it might be worth adding something like
rcu_read_unlock_sched_no_resched(), but we clearly are not there yet.
Especially not with a name like that!  ;-)

							Thanx, Paul

> > Acked-by: Paul E. McKenney <paulmck@kernel.org>
> 
> Thanks! I'll add this to the commit.
> 
> -- Steve
  
Steven Rostedt Dec. 10, 2022, 10:32 p.m. UTC | #4
On Sat, 10 Dec 2022 13:34:12 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> > I was going to remove it, but then I realized that it would be a functional
> > change, as from the comment above, it uses "preempt_enable_no_resched(),
> > which there is not a rcu_read_unlock_sched() variant.  
> 
> If this happens often enough, it might be worth adding something like
> rcu_read_unlock_sched_no_resched(), but we clearly are not there yet.
> Especially not with a name like that!  ;-)

Please don't ;-)

This is only to handle the bizarre case that mmio tracing does. Remember,
this tracer is only for those that want to reverse engineer a binary
driver. It's not even SMP safe! When you enable it, it shuts down all but
one CPU. This is actually the reason I worked so hard to keep it working
with lockdep. The shutting down of CPUs has caught so many bugs in other
parts of the kernel! ;-)

Thus, anything the mmio tracer does is considered niche, and not
something to care much about.

-- Steve
  
Thomas Gleixner Dec. 10, 2022, 11:30 p.m. UTC | #5
On Sat, Dec 10 2022 at 13:34, Steven Rostedt wrote:
> On Sat, 10 Dec 2022 09:47:53 -0800 "Paul E. McKenney" <paulmck@kernel.org> wrote:
>> This does mess with preempt_count() redundantly, but the overhead from
>> that should be way down in the noise.
>
> I was going to remove it, but then I realized that it would be a functional
> change, as from the comment above, it uses "preempt_enable_no_resched(),
> which there is not a rcu_read_unlock_sched() variant.

preempt_enable_no_resched() in this context is simply garbage.

preempt_enable_no_resched() tries to avoid the overhead of checking
whether rescheduling is due after decrementing preempt_count(), because
the code which uses it claims to know that it is _not_ the outermost one
which brings the preempt count back to preemptible state.

I concede that there are hot paths which actually can benefit, but this
code has exactly _ZERO_ benefit from that. Taking that tracing exception
and handling it is orders of magnitude more expensive than a regular
preempt_enable().

So just get rid of it and don't proliferate cargo cult programming.

Thanks,

        tglx
  
Steven Rostedt Dec. 10, 2022, 11:55 p.m. UTC | #6
On Sun, 11 Dec 2022 00:30:36 +0100
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Sat, Dec 10 2022 at 13:34, Steven Rostedt wrote:
> > On Sat, 10 Dec 2022 09:47:53 -0800 "Paul E. McKenney" <paulmck@kernel.org> wrote:  
> >> This does mess with preempt_count() redundantly, but the overhead from
> >> that should be way down in the noise.  
> >
> > I was going to remove it, but then I realized that it would be a functional
> > change, as from the comment above, it uses "preempt_enable_no_resched(),
> > which there is not a rcu_read_unlock_sched() variant.  
> 
> preempt_enable_no_resched() in this context is simply garbage.
> 
> preempt_enable_no_resched() tries to avoid the overhead of checking
> whether rescheduling is due after decrementing preempt_count(), because
> the code which uses it claims to know that it is _not_ the outermost one
> which brings the preempt count back to preemptible state.
> 
> I concede that there are hot paths which actually can benefit, but this
> code has exactly _ZERO_ benefit from that. Taking that tracing exception
> and handling it is orders of magnitude more expensive than a regular
> preempt_enable().
> 
> So just get rid of it and don't proliferate cargo cult programming.
> 

The point of the patch is to just fix the lockdep issue. I'm happy to
remove that "no_resched" (I was planning to), but that would be a separate
change, with a different purpose, and thus a separate patch.

-- Steve
  
Paul E. McKenney Dec. 11, 2022, 5:52 a.m. UTC | #7
On Sat, Dec 10, 2022 at 05:32:27PM -0500, Steven Rostedt wrote:
> On Sat, 10 Dec 2022 13:34:12 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > > I was going to remove it, but then I realized that it would be a functional
> > > change, as from the comment above, it uses "preempt_enable_no_resched(),
> > > which there is not a rcu_read_unlock_sched() variant.  
> > 
> > If this happens often enough, it might be worth adding something like
> > rcu_read_unlock_sched_no_resched(), but we clearly are not there yet.
> > Especially not with a name like that!  ;-)
> 
> Please don't ;-)
> 
> This is only to handle the bizarre case that mmio tracing does. Remember,
> this tracer is only for those that want to reverse engineer a binary
> driver. It's not even SMP safe! When you enable it, it shuts down all but
> one CPU. This is actually the reason I worked so hard to keep it working
> with lockdep. The shutting down of CPUs has caught so many bugs in other
> parts of the kernel! ;-)
> 
> Thus, anything that mmio tracer does, is considered niche, and not
> something to much care about.

Agreed, as I said, we are clearly not there yet.

							Thanx, Paul
  
Thomas Gleixner Dec. 12, 2022, 10:51 a.m. UTC | #8
On Sat, Dec 10 2022 at 18:55, Steven Rostedt wrote:
> On Sun, 11 Dec 2022 00:30:36 +0100
> Thomas Gleixner <tglx@linutronix.de> wrote:
>> I concede that there are hot paths which actually can benefit, but this
>> code has exactly _ZERO_ benefit from that. Taking that tracing exception
>> and handling it is orders of magnitude more expensive than a regular
>> preempt_enable().
>> 
>> So just get rid of it and don't proliferate cargo cult programming.
>> 
> The point of the patch is to just fix the lockdep issue. I'm happy to
> remove that "no_resched" (I was planning to), but that would be a separate
> change, with a different purpose, and thus a separate patch.

Right, but please make that part of the series.

Thanks,

        tglx
  
Steven Rostedt Dec. 12, 2022, 3:42 p.m. UTC | #9
On Mon, 12 Dec 2022 11:51:51 +0100
Thomas Gleixner <tglx@linutronix.de> wrote:

> Right, but please make that part of the series.

I just pushed out a patch to do this.

  https://lore.kernel.org/all/20221212103703.7129cc5d@gandalf.local.home/

Feel free to ack it.

Thanks,

-- Steve
  

Patch

diff --git a/arch/x86/mm/kmmio.c b/arch/x86/mm/kmmio.c
index edb486450158..853c49877c16 100644
--- a/arch/x86/mm/kmmio.c
+++ b/arch/x86/mm/kmmio.c
@@ -254,7 +254,7 @@  int kmmio_handler(struct pt_regs *regs, unsigned long addr)
 	 * again.
 	 */
 	preempt_disable();
-	rcu_read_lock();
+	rcu_read_lock_sched_notrace();
 
 	faultpage = get_kmmio_fault_page(page_base);
 	if (!faultpage) {
@@ -323,7 +323,7 @@  int kmmio_handler(struct pt_regs *regs, unsigned long addr)
 	return 1; /* fault handled */
 
 no_kmmio:
-	rcu_read_unlock();
+	rcu_read_unlock_sched_notrace();
 	preempt_enable_no_resched();
 	return ret;
 }
@@ -363,7 +363,7 @@  static int post_kmmio_handler(unsigned long condition, struct pt_regs *regs)
 	/* These were acquired in kmmio_handler(). */
 	ctx->active--;
 	BUG_ON(ctx->active);
-	rcu_read_unlock();
+	rcu_read_unlock_sched_notrace();
 	preempt_enable_no_resched();
 
 	/*