[v5,03/15] ring-buffer: Add interface for configuring trace sub buffer size

Message ID 20231219185628.298324722@goodmis.org
State New
Headers
Series ring-buffer/tracing: Allow ring buffer to have bigger sub buffers |

Commit Message

Steven Rostedt Dec. 19, 2023, 6:54 p.m. UTC
  From: "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com>

The trace ring buffer sub page size can be configured, per trace
instance. A new ftrace file "buffer_subbuf_order" is added to get and
set the size of the ring buffer sub page for current trace instance.
The size must be an order of system page size, that's why the new
interface works with system page order, instead of absolute page size:
0 means the ring buffer sub page is equal to 1 system page and so
forth:
0 - 1 system page
1 - 2 system pages
2 - 4 system pages
...
The ring buffer sub page size is limited between 1 and 128 system
pages. The default value is 1 system page.
New ring buffer APIs are introduced:
 ring_buffer_subbuf_order_set()
 ring_buffer_subbuf_order_get()
 ring_buffer_subbuf_size_get()

Link: https://lore.kernel.org/linux-trace-devel/20211213094825.61876-4-tz.stoyanov@gmail.com

Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 include/linux/ring_buffer.h |  4 ++
 kernel/trace/ring_buffer.c  | 73 +++++++++++++++++++++++++++++++++++++
 kernel/trace/trace.c        | 48 ++++++++++++++++++++++++
 3 files changed, 125 insertions(+)
  

Comments

Masami Hiramatsu (Google) Dec. 20, 2023, 2:26 p.m. UTC | #1
On Tue, 19 Dec 2023 13:54:17 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> +/**
> + * ring_buffer_subbuf_order_set - set the size of ring buffer sub page.
> + * @buffer: The ring_buffer to set the new page size.
> + * @order: Order of the system pages in one sub buffer page
> + *
> + * By default, one ring buffer pages equals to one system page. This API can be
> + * used to set new size of the ring buffer page. The size must be order of
> + * system page size, that's why the input parameter @order is the order of
> + * system pages that are allocated for one ring buffer page:
> + *  0 - 1 system page
> + *  1 - 2 system pages
> + *  3 - 4 system pages
> + *  ...

Don't we have the max order of the pages?

> + *
> + * Returns 0 on success or < 0 in case of an error.
> + */
> +int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
> +{
> +	int psize;
> +
> +	if (!buffer || order < 0)
> +		return -EINVAL;
> +
> +	if (buffer->subbuf_order == order)
> +		return 0;
> +
> +	psize = (1 << order) * PAGE_SIZE;
> +	if (psize <= BUF_PAGE_HDR_SIZE)
> +		return -EINVAL;
> +
> +	buffer->subbuf_order = order;
> +	buffer->subbuf_size = psize - BUF_PAGE_HDR_SIZE;
> +
> +	/* Todo: reset the buffer with the new page size */
> +

I just wonder why there is no reallocate the sub buffers here.
Is it OK to change the sub buffer page size and order while
using the ring buffer?

Thank you,
  
Steven Rostedt Dec. 20, 2023, 2:40 p.m. UTC | #2
On Wed, 20 Dec 2023 23:26:19 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> On Tue, 19 Dec 2023 13:54:17 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > +/**
> > + * ring_buffer_subbuf_order_set - set the size of ring buffer sub page.
> > + * @buffer: The ring_buffer to set the new page size.
> > + * @order: Order of the system pages in one sub buffer page
> > + *
> > + * By default, one ring buffer pages equals to one system page. This API can be
> > + * used to set new size of the ring buffer page. The size must be order of
> > + * system page size, that's why the input parameter @order is the order of
> > + * system pages that are allocated for one ring buffer page:
> > + *  0 - 1 system page
> > + *  1 - 2 system pages
> > + *  3 - 4 system pages
> > + *  ...  
> 
> Don't we have the max order of the pages?

Actually there is. I think it's 7?

Honestly, anything over 5 is probably too much. But hey.

> 
> > + *
> > + * Returns 0 on success or < 0 in case of an error.
> > + */
> > +int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
> > +{
> > +	int psize;
> > +
> > +	if (!buffer || order < 0)
> > +		return -EINVAL;
> > +
> > +	if (buffer->subbuf_order == order)
> > +		return 0;
> > +
> > +	psize = (1 << order) * PAGE_SIZE;
> > +	if (psize <= BUF_PAGE_HDR_SIZE)
> > +		return -EINVAL;
> > +
> > +	buffer->subbuf_order = order;
> > +	buffer->subbuf_size = psize - BUF_PAGE_HDR_SIZE;
> > +
> > +	/* Todo: reset the buffer with the new page size */
> > +  
> 
> I just wonder why there is no reallocate the sub buffers here.
> Is it OK to change the sub buffer page size and order while
> using the ring buffer?

Currently we disable the ring buffer to do the update.

-- Steve
  
Masami Hiramatsu (Google) Dec. 21, 2023, 12:10 a.m. UTC | #3
On Wed, 20 Dec 2023 09:40:30 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Wed, 20 Dec 2023 23:26:19 +0900
> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> 
> > On Tue, 19 Dec 2023 13:54:17 -0500
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> > 
> > > +/**
> > > + * ring_buffer_subbuf_order_set - set the size of ring buffer sub page.
> > > + * @buffer: The ring_buffer to set the new page size.
> > > + * @order: Order of the system pages in one sub buffer page
> > > + *
> > > + * By default, one ring buffer pages equals to one system page. This API can be
> > > + * used to set new size of the ring buffer page. The size must be order of
> > > + * system page size, that's why the input parameter @order is the order of
> > > + * system pages that are allocated for one ring buffer page:
> > > + *  0 - 1 system page
> > > + *  1 - 2 system pages
> > > + *  3 - 4 system pages
> > > + *  ...  
> > 
> > Don't we have the max order of the pages?
> 
> Actually there is. I think it's 7?
> 
> Honestly, anything over 5 is probably too much. But hey.

Ah, I see. It is checked in subbuf_order_write() method (and it is embedded
directly). I think that 7 should be replaced with a macro, something like
RB_SUBBUF_ORDER_MAX and check it in this exposed function instead of write
method.

Thank you,
  

Patch

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index ce46218ce46d..12573306b889 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -202,6 +202,10 @@  struct trace_seq;
 int ring_buffer_print_entry_header(struct trace_seq *s);
 int ring_buffer_print_page_header(struct trace_buffer *buffer, struct trace_seq *s);
 
+int ring_buffer_subbuf_order_get(struct trace_buffer *buffer);
+int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order);
+int ring_buffer_subbuf_size_get(struct trace_buffer *buffer);
+
 enum ring_buffer_flags {
 	RB_FL_OVERWRITE		= 1 << 0,
 };
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index d9f656502400..20fc0121735d 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -507,6 +507,7 @@  struct trace_buffer {
 	bool				time_stamp_abs;
 
 	unsigned int			subbuf_size;
+	unsigned int			subbuf_order;
 	unsigned int			max_data_size;
 };
 
@@ -5761,6 +5762,78 @@  int ring_buffer_read_page(struct trace_buffer *buffer,
 }
 EXPORT_SYMBOL_GPL(ring_buffer_read_page);
 
+/**
+ * ring_buffer_subbuf_size_get - get size of the sub buffer.
+ * @buffer: the buffer to get the sub buffer size from
+ *
+ * Returns size of the sub buffer, in bytes.
+ */
+int ring_buffer_subbuf_size_get(struct trace_buffer *buffer)
+{
+	return buffer->subbuf_size + BUF_PAGE_HDR_SIZE;
+}
+EXPORT_SYMBOL_GPL(ring_buffer_subbuf_size_get);
+
+/**
+ * ring_buffer_subbuf_order_get - get order of system sub pages in one buffer page.
+ * @buffer: The ring_buffer to get the system sub page order from
+ *
+ * By default, one ring buffer sub page equals to one system page. This parameter
+ * is configurable, per ring buffer. The size of the ring buffer sub page can be
+ * extended, but must be an order of system page size.
+ *
+ * Returns the order of buffer sub page size, in system pages:
+ * 0 means the sub buffer size is 1 system page and so forth.
+ * In case of an error < 0 is returned.
+ */
+int ring_buffer_subbuf_order_get(struct trace_buffer *buffer)
+{
+	if (!buffer)
+		return -EINVAL;
+
+	return buffer->subbuf_order;
+}
+EXPORT_SYMBOL_GPL(ring_buffer_subbuf_order_get);
+
+/**
+ * ring_buffer_subbuf_order_set - set the size of ring buffer sub page.
+ * @buffer: The ring_buffer to set the new page size.
+ * @order: Order of the system pages in one sub buffer page
+ *
+ * By default, one ring buffer pages equals to one system page. This API can be
+ * used to set new size of the ring buffer page. The size must be order of
+ * system page size, that's why the input parameter @order is the order of
+ * system pages that are allocated for one ring buffer page:
+ *  0 - 1 system page
+ *  1 - 2 system pages
+ *  3 - 4 system pages
+ *  ...
+ *
+ * Returns 0 on success or < 0 in case of an error.
+ */
+int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
+{
+	int psize;
+
+	if (!buffer || order < 0)
+		return -EINVAL;
+
+	if (buffer->subbuf_order == order)
+		return 0;
+
+	psize = (1 << order) * PAGE_SIZE;
+	if (psize <= BUF_PAGE_HDR_SIZE)
+		return -EINVAL;
+
+	buffer->subbuf_order = order;
+	buffer->subbuf_size = psize - BUF_PAGE_HDR_SIZE;
+
+	/* Todo: reset the buffer with the new page size */
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ring_buffer_subbuf_order_set);
+
 /*
  * We only allocate new buffers, never free them if the CPU goes down.
  * If we were to free the buffer, then the user would lose any trace that was in
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 76dd0a4c8cb5..a010aba4c4a4 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -9358,6 +9358,51 @@  static const struct file_operations buffer_percent_fops = {
 	.llseek		= default_llseek,
 };
 
+static ssize_t
+buffer_order_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
+{
+	struct trace_array *tr = filp->private_data;
+	char buf[64];
+	int r;
+
+	r = sprintf(buf, "%d\n", ring_buffer_subbuf_order_get(tr->array_buffer.buffer));
+
+	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+}
+
+static ssize_t
+buffer_order_write(struct file *filp, const char __user *ubuf,
+		   size_t cnt, loff_t *ppos)
+{
+	struct trace_array *tr = filp->private_data;
+	unsigned long val;
+	int ret;
+
+	ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
+	if (ret)
+		return ret;
+
+	/* limit between 1 and 128 system pages */
+	if (val < 0 || val > 7)
+		return -EINVAL;
+
+	ret = ring_buffer_subbuf_order_set(tr->array_buffer.buffer, val);
+	if (ret)
+		return ret;
+
+	(*ppos)++;
+
+	return cnt;
+}
+
+static const struct file_operations buffer_order_fops = {
+	.open		= tracing_open_generic_tr,
+	.read		= buffer_order_read,
+	.write		= buffer_order_write,
+	.release	= tracing_release_generic_tr,
+	.llseek		= default_llseek,
+};
+
 static struct dentry *trace_instance_dir;
 
 static void
@@ -9824,6 +9869,9 @@  init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer)
 	trace_create_file("buffer_percent", TRACE_MODE_WRITE, d_tracer,
 			tr, &buffer_percent_fops);
 
+	trace_create_file("buffer_subbuf_order", TRACE_MODE_WRITE, d_tracer,
+			  tr, &buffer_order_fops);
+
 	create_trace_options_dir(tr);
 
 #ifdef CONFIG_TRACER_MAX_TRACE