ring-buffer: Handle resize in early boot up
Commit Message
From: Steven Rostedt <rostedt@goodmis.org>
With the new command line option that allows trace event triggers to be
added at boot, the "snapshot" trigger will allocate the snapshot buffer
very early, when interrupts cannot be enabled. Allocating the ring buffer
is not the problem, but resizing it is, because the resize code
does synchronization that cannot be performed at early boot.
To handle this, first change the raw_spin_lock_irq() in rb_insert_pages()
to raw_spin_lock_irqsave(), such that the unlocking of that spin lock will
not enable interrupts.
Next, where it calls schedule_work_on(), disable migration and check if
the CPU to update is the current CPU, and if so, perform the work
directly; otherwise, re-enable migration and call schedule_work_on() for
the CPU that is being updated. rb_update_pages() just needs to run
on the CPU that it is updating, and needs neither preemption nor
interrupts disabled when it is called.
Link: https://lore.kernel.org/lkml/Y5J%2FCajlNh1gexvo@google.com/
Fixes: a01fdc897fa5 ("tracing: Add trace_trigger kernel command line option")
Reported-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/trace/ring_buffer.c | 32 +++++++++++++++++++++++++-------
1 file changed, 25 insertions(+), 7 deletions(-)
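Note: the first hunk below relies on the difference between the _irq and
_irqsave spin lock variants. Here is a minimal illustration of that
difference (example_lock and the two wrapper functions are made up for
this sketch, they are not part of the patch):

	static DEFINE_RAW_SPINLOCK(example_lock);

	static void irq_variant(void)
	{
		raw_spin_lock_irq(&example_lock);
		/* ... critical section ... */
		/* Unconditionally ends in local_irq_enable(), which must
		 * not happen at early boot. */
		raw_spin_unlock_irq(&example_lock);
	}

	static void irqsave_variant(void)
	{
		unsigned long flags;

		raw_spin_lock_irqsave(&example_lock, flags);
		/* ... critical section ... */
		/* Restores the saved state: interrupts stay disabled if
		 * they were disabled when the lock was taken. */
		raw_spin_unlock_irqrestore(&example_lock, flags);
	}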
Comments
On Fri, Dec 09, 2022 at 10:11:51AM -0500, Steven Rostedt wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
>
> With the new command line option that allows trace event triggers to be
> added at boot, the "snapshot" trigger will allocate the snapshot buffer
> very early, when interrupts cannot be enabled. Allocating the ring buffer
> is not the problem, but resizing it is, because the resize code
> does synchronization that cannot be performed at early boot.
>
> To handle this, first change the raw_spin_lock_irq() in rb_insert_pages()
> to raw_spin_lock_irqsave(), such that the unlocking of that spin lock will
> not enable interrupts.
>
> Next, where it calls schedule_work_on(), disable migration and check if
> the CPU to update is the current CPU, and if so, perform the work
> directly; otherwise, re-enable migration and call schedule_work_on() for
> the CPU that is being updated. rb_update_pages() just needs to run
> on the CPU that it is updating, and needs neither preemption nor
> interrupts disabled when it is called.
>
> Link: https://lore.kernel.org/lkml/Y5J%2FCajlNh1gexvo@google.com/
>
> Fixes: a01fdc897fa5 ("tracing: Add trace_trigger kernel command line option")
> Reported-by: Ross Zwisler <zwisler@google.com>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
[...]
> @@ -2298,9 +2308,17 @@ int ring_buffer_resize(struct trace_buffer *buffer, unsigned long size,
> if (!cpu_online(cpu_id))
> rb_update_pages(cpu_buffer);
> else {
> - schedule_work_on(cpu_id,
> - &cpu_buffer->update_pages_work);
> - wait_for_completion(&cpu_buffer->update_done);
> + /* Run directly if possible. */
> + migrate_disable();
> + if (cpu_id == smp_processor_id()) {
> + rb_update_pages(cpu_buffer);
> + migrate_enable();
> + } else {
> + migrate_enable();
> + schedule_work_on(cpu_id,
> + &cpu_buffer->update_pages_work);
> + wait_for_completion(&cpu_buffer->update_done);
I ran with this patch on my test VM and hit the same Oops from the original
report.
I think the problem is that we're still trying to enable interrupts via
wait_for_completion():
wait_for_completion()
  wait_for_common()
    __wait_for_common()
      raw_spin_unlock_irq()
        _raw_spin_unlock_irq()
          __raw_spin_unlock_irq()
            local_irq_enable()
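For reference, __wait_for_common() in kernel/sched/completion.c looks
roughly like this (a simplified sketch trimmed down to the locking;
details vary by kernel version):

	static long __sched
	__wait_for_common(struct completion *x, long (*action)(long),
			  long timeout, int state)
	{
		might_sleep();

		/* Takes the wait-queue lock with the _irq variant... */
		raw_spin_lock_irq(&x->wait.lock);
		timeout = do_wait_for_common(x, action, timeout, state);
		/* ...so the unlock unconditionally re-enables interrupts. */
		raw_spin_unlock_irq(&x->wait.lock);

		return timeout;
	}

So any resize path that still waits on the completion will enable
interrupts, regardless of the reader_lock change.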
I'm testing on a QEMU VM with 4 virtual CPUs, if that helps WRT where work is
being scheduled (cpu_id == smp_processor_id).
On Fri, 9 Dec 2022 09:57:48 -0700
Ross Zwisler <zwisler@google.com> wrote:
> > + if (cpu_id == smp_processor_id()) {
> > + rb_update_pages(cpu_buffer);
> > + migrate_enable();
> > + } else {
> > + migrate_enable();
> > + schedule_work_on(cpu_id,
> > + &cpu_buffer->update_pages_work);
> > + wait_for_completion(&cpu_buffer->update_done);
>
> I ran with this patch on my test VM and hit the same Oops from the original
> report.
>
> I think the problem is that we're still trying to enable interrupts via
> wait_for_completion():
>
> wait_for_completion()
>   wait_for_common()
>     __wait_for_common()
>       raw_spin_unlock_irq()
>         _raw_spin_unlock_irq()
>           __raw_spin_unlock_irq()
>             local_irq_enable()
>
> I'm testing on a QEMU VM with 4 virtual CPUs, if that helps WRT where work is
> being scheduled (cpu_id == smp_processor_id).
Can you show the backtrace for that? Because when I triggered it, the
other CPUs were not up and running. I'm testing this on a VM with 8 CPUs.
-- Steve
On Fri, Dec 09, 2022 at 01:13:57PM -0500, Steven Rostedt wrote:
> On Fri, 9 Dec 2022 09:57:48 -0700
> Ross Zwisler <zwisler@google.com> wrote:
>
> > > + if (cpu_id == smp_processor_id()) {
> > > + rb_update_pages(cpu_buffer);
> > > + migrate_enable();
> > > + } else {
> > > + migrate_enable();
> > > + schedule_work_on(cpu_id,
> > > + &cpu_buffer->update_pages_work);
> > > + wait_for_completion(&cpu_buffer->update_done);
> >
> > I ran with this patch on my test VM and hit the same Oops from the original
> > report.
> >
> > I think the problem is that we're still trying to enable interrupts via
> > wait_for_completion():
> >
> > wait_for_completion()
> >   wait_for_common()
> >     __wait_for_common()
> >       raw_spin_unlock_irq()
> >         _raw_spin_unlock_irq()
> >           __raw_spin_unlock_irq()
> >             local_irq_enable()
> >
> > I'm testing on a QEMU VM with 4 virtual CPUs, if that helps WRT where work is
> > being scheduled (cpu_id == smp_processor_id).
>
> Can you show the backtrace for that? Because when I triggered it, the
> other CPUs were not up and running. I'm testing this on a VM with 8 CPUs.
Ah, apologies, I think I got confused in my testing. It is indeed working for
me now.
Thank you for the fix!
Tested-by: Ross Zwisler <zwisler@google.com>
@@ -2062,8 +2062,10 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer)
{
struct list_head *pages = &cpu_buffer->new_pages;
int retries, success;
+ unsigned long flags;
- raw_spin_lock_irq(&cpu_buffer->reader_lock);
+ /* Can be called at early boot up, where interrupts must not be enabled */
+ raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
/*
* We are holding the reader lock, so the reader page won't be swapped
* in the ring buffer. Now we are racing with the writer trying to
@@ -2120,7 +2122,7 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer)
* tracing
*/
RB_WARN_ON(cpu_buffer, !success);
- raw_spin_unlock_irq(&cpu_buffer->reader_lock);
+ raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
/* free pages if they weren't inserted */
if (!success) {
@@ -2248,8 +2250,16 @@ int ring_buffer_resize(struct trace_buffer *buffer, unsigned long size,
rb_update_pages(cpu_buffer);
cpu_buffer->nr_pages_to_update = 0;
} else {
- schedule_work_on(cpu,
- &cpu_buffer->update_pages_work);
+ /* Run directly if possible. */
+ migrate_disable();
+ if (cpu != smp_processor_id()) {
+ migrate_enable();
+ schedule_work_on(cpu,
+ &cpu_buffer->update_pages_work);
+ } else {
+ update_pages_handler(&cpu_buffer->update_pages_work);
+ migrate_enable();
+ }
}
}
@@ -2298,9 +2308,17 @@ int ring_buffer_resize(struct trace_buffer *buffer, unsigned long size,
if (!cpu_online(cpu_id))
rb_update_pages(cpu_buffer);
else {
- schedule_work_on(cpu_id,
- &cpu_buffer->update_pages_work);
- wait_for_completion(&cpu_buffer->update_done);
+ /* Run directly if possible. */
+ migrate_disable();
+ if (cpu_id == smp_processor_id()) {
+ rb_update_pages(cpu_buffer);
+ migrate_enable();
+ } else {
+ migrate_enable();
+ schedule_work_on(cpu_id,
+ &cpu_buffer->update_pages_work);
+ wait_for_completion(&cpu_buffer->update_done);
+ }
}
cpu_buffer->nr_pages_to_update = 0;
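For context, the early-boot resize path fixed here can be exercised with
a boot-time trigger along these lines (the event name is just an
example; the format follows the trace_trigger option added by the Fixes
commit):

	trace_trigger="sched_switch.snapshot"

With that on the kernel command line, the "snapshot" trigger allocates
and resizes the snapshot buffer before interrupts are first enabled,
hitting the code paths changed above.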