[00/14] ring-buffer/tracing: Allow ring buffer to have bigger sub buffers

Message ID 20231210035404.053677508@goodmis.org
Series ring-buffer/tracing: Allow ring buffer to have bigger sub buffers

Message

Steven Rostedt Dec. 10, 2023, 3:54 a.m. UTC
  Note, this has been on my todo list since the ring buffer was created back
in 2008.

Tzvetomir last worked on this in 2021 and I need to finally get it in.

His last series was:

  https://lore.kernel.org/linux-trace-devel/20211213094825.61876-1-tz.stoyanov@gmail.com/

With the description of:

   Currently the size of one sub buffer page is global for all buffers and
   it is hard coded to one system page. The patch set introduces configurable
   ring buffer sub page size, per ring buffer. A new user space interface is
   introduced, which allows to change the sub page size of the ftrace buffer,
   per ftrace instance.

I'm pulling in his patches mostly untouched, except that I had to tweak
a few things to forward port them.

The issues I found I added as the last 7 patches to the series, and then
I added documentation and a selftest.

Basically, events to the tracing subsystem are limited to just under a
PAGE_SIZE, as the ring buffer is split into "sub buffers" of one page
size, and an event can not be bigger than a sub buffer. This allows users
to change the size of a sub buffer by the order:

  echo 3 > /sys/kernel/tracing/buffer_subbuf_order

Will make each sub buffer a size of 8 pages, allowing events to be almost
as big as 8 pages in size (sub buffers do have meta data on them as
well, keeping an event from reaching the same size as a sub buffer).
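
As a rough illustration of the arithmetic above (not code from this series),
the order maps to a sub buffer size of PAGE_SIZE << order. The sketch below
assumes a 4096-byte page, which varies by architecture and configuration, and
the helper name is made up for illustration. It does not account for the sub
buffer meta data, whose size is not stated here.

```python
# Hypothetical helper, not part of the patch series: shows how a sub buffer
# order maps to a size in bytes. The 4096-byte page size is an assumption.
ASSUMED_PAGE_SIZE = 4096


def subbuf_size_bytes(order: int, page_size: int = ASSUMED_PAGE_SIZE) -> int:
    """Return the sub buffer size in bytes for a given order (2**order pages)."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return page_size << order


if __name__ == "__main__":
    for order in range(4):
        pages = 1 << order
        print(f"order {order}: {pages} page(s), {subbuf_size_bytes(order)} bytes")
```

With the assumed page size, an order of 3 gives 8 pages per sub buffer, so the
largest event would be somewhat less than that once the meta data is
subtracted.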



Steven Rostedt (Google) (9):
      ring-buffer: Clear pages on error in ring_buffer_subbuf_order_set() failure
      ring-buffer: Do no swap cpu buffers if order is different
      ring-buffer: Make sure the spare sub buffer used for reads has same size
      tracing: Update snapshot order along with main buffer order
      tracing: Stop the tracing while changing the ring buffer subbuf size
      ring-buffer: Keep the same size when updating the order
      ring-buffer: Just update the subbuffers when changing their allocation order
      ring-buffer: Add documentation on the buffer_subbuf_order file
      ringbuffer/selftest: Add basic selftest to test chaning subbuf order

Tzvetomir Stoyanov (VMware) (5):
      ring-buffer: Refactor ring buffer implementation
      ring-buffer: Page size per ring buffer
      ring-buffer: Add interface for configuring trace sub buffer size
      ring-buffer: Set new size of the ring buffer sub page
      ring-buffer: Read and write to ring buffers with custom sub buffer size

----
 Documentation/trace/ftrace.rst                     |  27 ++
 include/linux/ring_buffer.h                        |  17 +-
 kernel/trace/ring_buffer.c                         | 406 ++++++++++++++++-----
 kernel/trace/ring_buffer_benchmark.c               |  10 +-
 kernel/trace/trace.c                               | 143 +++++++-
 kernel/trace/trace.h                               |   1 +
 kernel/trace/trace_events.c                        |  59 ++-
 .../ftrace/test.d/00basic/ringbuffer_order.tc      |  46 +++
 8 files changed, 588 insertions(+), 121 deletions(-)
 create mode 100644 tools/testing/selftests/ftrace/test.d/00basic/ringbuffer_order.tc
  

Comments

Mathieu Desnoyers Dec. 10, 2023, 2:17 p.m. UTC | #1
On 2023-12-09 22:54, Steven Rostedt wrote:
[...]
> 
> Basically, events to the tracing subsystem are limited to just under a
> PAGE_SIZE, as the ring buffer is split into "sub buffers" of one page
> size, and an event can not be bigger than a sub buffer. This allows users
> to change the size of a sub buffer by the order:
> 
>    echo 3 > /sys/kernel/tracing/buffer_subbuf_order
> 
> Will make each sub buffer a size of 8 pages, allowing events to be almost
> as big as 8 pages in size (sub buffers do have meta data on them as
> well, keeping an event from reaching the same size as a sub buffer).

Specifying the "order" of subbuffer size as a power of two of
number of pages is a poor UX choice for a user-facing ABI.

I would recommend allowing the user to specify the size in bytes, and
internally bump the size to the next power of 2, with a minimum of
PAGE_SIZE.

Thanks,

Mathieu
  
Steven Rostedt Dec. 10, 2023, 3:38 p.m. UTC | #2
On Sun, 10 Dec 2023 09:17:44 -0500
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> On 2023-12-09 22:54, Steven Rostedt wrote:
> [...]
> > 
> > Basically, events to the tracing subsystem are limited to just under a
> > PAGE_SIZE, as the ring buffer is split into "sub buffers" of one page
> > size, and an event can not be bigger than a sub buffer. This allows users
> > to change the size of a sub buffer by the order:
> > 
> >    echo 3 > /sys/kernel/tracing/buffer_subbuf_order
> > 
> > Will make each sub buffer a size of 8 pages, allowing events to be almost
> > as big as 8 pages in size (sub buffers do have meta data on them as
> > well, keeping an event from reaching the same size as a sub buffer).  
> 
> Specifying the "order" of subbuffer size as a power of two of
> number of pages is a poor UX choice for a user-facing ABI.
> 
> I would recommend allowing the user to specify the size in bytes, and
> internally bump the size to the next power of 2, with a minimum of
> PAGE_SIZE.

Thanks. I actually agree with you and thought about doing just that, but
decided to not make those changes and send out these patches with the
given API first. I wanted to see if you would comment on this ;-) You did
not disappoint!

I was thinking of keeping the same kind of interface as we have with the
buffer size "buffer_size_kb", and have it be "buffer_subbuf_size_kb", where
you specify the minimum size in kilobytes and it creates it, and the subbuf
may end up being bigger than specified (as that's more of an implementation
detail).
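
A minimal sketch of that rounding, assuming a 4096-byte page and hypothetical
helper names (this is only an illustration of the idea, not the eventual
patch): take the requested minimum in kilobytes and pick the smallest order
whose sub buffer is at least that big. No upper limit on the order is applied
here, though a real implementation would presumably have one.

```python
# Hypothetical illustration of rounding a requested minimum size (in KB) up
# to a power-of-two number of pages. The 4096-byte page size is an assumption.
ASSUMED_PAGE_SIZE = 4096


def order_for_min_kb(min_kb: int, page_size: int = ASSUMED_PAGE_SIZE) -> int:
    """Return the smallest order whose sub buffer holds at least min_kb KB."""
    if min_kb < 0:
        raise ValueError("min_kb must be non-negative")
    wanted_bytes = min_kb * 1024
    order = 0
    # Double the sub buffer size until it covers the request. Order 0
    # enforces the minimum of one page.
    while (page_size << order) < wanted_bytes:
        order += 1
    return order


def resulting_size_kb(min_kb: int, page_size: int = ASSUMED_PAGE_SIZE) -> int:
    """Return the sub buffer size in KB that a request of min_kb ends up with."""
    return (page_size << order_for_min_kb(min_kb, page_size)) // 1024
```

Under these assumptions, a request for 12 KB ends up with a 16 KB sub buffer
(order 2), and anything up to 4 KB stays at a single page.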

Now that you called it out, I will add a patch to convert that as such. But
will keep the current patches in for historical reasons.

-- Steve