[v5,15/15] tracing: Update subbuffer with kilobytes not page order

Message ID 20231219185631.809766769@goodmis.org
State New
Series ring-buffer/tracing: Allow ring buffer to have bigger sub buffers

Commit Message

Steven Rostedt Dec. 19, 2023, 6:54 p.m. UTC
  From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

Using the page order to decide the size of the ring buffer's sub-buffers
exposes a bit too much of the implementation. Although the sub-buffers are
only allocated in orders of pages, allow the user to specify the minimum
size of each sub-buffer in kilobytes, as they can with the buffer size
itself.

If the user specifies 3 via:

  echo 3 > buffer_subbuf_size_kb

then the sub-buffer size will be rounded up to 4KB (on a system with a 4KB
page size).

If they specify:

  echo 6 > buffer_subbuf_size_kb

the sub-buffer size will become 8KB, and so on.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 Documentation/trace/ftrace.rst                | 46 ++++++++-----------
 kernel/trace/trace.c                          | 38 +++++++++------
 ...fer_order.tc => ringbuffer_subbuf_size.tc} | 18 ++++----
 3 files changed, 54 insertions(+), 48 deletions(-)
 rename tools/testing/selftests/ftrace/test.d/00basic/{ringbuffer_order.tc => ringbuffer_subbuf_size.tc} (85%)
  

Comments

Masami Hiramatsu (Google) Dec. 21, 2023, 12:26 a.m. UTC | #1
On Tue, 19 Dec 2023 13:54:29 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> 
> Using page order for deciding what the size of the ring buffer sub buffers
> are is exposing a bit too much of the implementation. Although the sub
> buffers are only allocated in orders of pages, allow the user to specify
> the minimum size of each sub-buffer via kilobytes like they can with the
> buffer size itself.
> 
> If the user specifies 3 via:
> 
>   echo 3 > buffer_subbuf_size_kb
> 
> Then the sub-buffer size will round up to 4kb (on a 4kb page size system).
> 
> If they specify:
> 
>   echo 6 > buffer_subbuf_size_kb
> 
> The sub-buffer size will become 8kb.

I think this is a better interface. Can we apply this earlier in the series
to avoid rewriting the documentation and test code?

Thank you,

  
Steven Rostedt Dec. 21, 2023, 1:57 a.m. UTC | #2
On Thu, 21 Dec 2023 09:26:21 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> > If the user specifies 3 via:
> > 
> >   echo 3 > buffer_subbuf_size_kb
> > 
> > Then the sub-buffer size will round up to 4kb (on a 4kb page size system).
> > 
> > If they specify:
> > 
> >   echo 6 > buffer_subbuf_size_kb
> > 
> > The sub-buffer size will become 8kb.  
> 
> I think this is better interface. Can we apply this earlier in the series
> to avoid rewriting the document and test code?

I kept it separate for testing purposes.

Throughout all this, it was a good way to make sure the two approaches
were compatible. I'd still like to keep them separate, as that's the way it
was developed. It's good to keep that history.

-- Steve
  

Patch

diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst
index 231d26ceedb0..933e7efb9f1b 100644
--- a/Documentation/trace/ftrace.rst
+++ b/Documentation/trace/ftrace.rst
@@ -203,32 +203,26 @@  of ftrace. Here is a list of some of the key files:
 
 	This displays the total combined size of all the trace buffers.
 
-  buffer_subbuf_order:
-
-	This sets or displays the sub buffer page size order. The ring buffer
-	is broken up into several same size "sub buffers". An event can not be
-	bigger than the size of the sub buffer. Normally, the sub buffer is
-	the size of the architecture's page (4K on x86). The sub buffer also
-	contains meta data at the start which also limits the size of an event.
-	That means when the sub buffer is a page size, no event can be larger
-	than the page size minus the sub buffer meta data.
-
-	The buffer_subbuf_order allows the user to change the size of the sub
-	buffer. As the sub buffer is a set of pages by the power of 2, thus
-	the sub buffer total size is defined by the order:
-
-	order		size
-	----		----
-	0		PAGE_SIZE
-	1		PAGE_SIZE * 2
-	2		PAGE_SIZE * 4
-	3		PAGE_SIZE * 8
-
-	Changing the order will change the sub buffer size allowing for events
-	to be larger than the page size.
-
-	Note: When changing the order, tracing is stopped and any data in the
-	ring buffer and the snapshot buffer will be discarded.
+  buffer_subbuf_size_kb:
+
+	This sets or displays the sub buffer size. The ring buffer is broken up
+	into several same size "sub buffers". An event can not be bigger than
+	the size of the sub buffer. Normally, the sub buffer is the size of the
+	architecture's page (4K on x86). The sub buffer also contains meta data
+	at the start which also limits the size of an event.  That means when
+	the sub buffer is a page size, no event can be larger than the page
+	size minus the sub buffer meta data.
+
+	Note, the buffer_subbuf_size_kb is a way for the user to specify the
+	minimum size of the subbuffer. The kernel may make it bigger due to the
+	implementation details, or simply fail the operation if the kernel can
+	not handle the request.
+
+	Changing the sub buffer size allows for events to be larger than the
+	page size.
+
+	Note: When changing the sub-buffer size, tracing is stopped and any
+	data in the ring buffer and the snapshot buffer will be discarded.
 
   free_buffer:
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 82303bd2bba1..46dbe22121e9 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -9384,42 +9384,54 @@  static const struct file_operations buffer_percent_fops = {
 };
 
 static ssize_t
-buffer_order_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
+buffer_subbuf_size_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
 {
 	struct trace_array *tr = filp->private_data;
+	size_t size;
 	char buf[64];
+	int order;
 	int r;
 
-	r = sprintf(buf, "%d\n", ring_buffer_subbuf_order_get(tr->array_buffer.buffer));
+	order = ring_buffer_subbuf_order_get(tr->array_buffer.buffer);
+	size = (PAGE_SIZE << order) / 1024;
+
+	r = sprintf(buf, "%zd\n", size);
 
 	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
 }
 
 static ssize_t
-buffer_order_write(struct file *filp, const char __user *ubuf,
-		   size_t cnt, loff_t *ppos)
+buffer_subbuf_size_write(struct file *filp, const char __user *ubuf,
+			 size_t cnt, loff_t *ppos)
 {
 	struct trace_array *tr = filp->private_data;
 	unsigned long val;
 	int old_order;
+	int order;
+	int pages;
 	int ret;
 
 	ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
 	if (ret)
 		return ret;
 
+	val *= 1024; /* value passed in is in KB */
+
+	pages = DIV_ROUND_UP(val, PAGE_SIZE);
+	order = fls(pages - 1);
+
 	/* limit between 1 and 128 system pages */
-	if (val < 0 || val > 7)
+	if (order < 0 || order > 7)
 		return -EINVAL;
 
 	/* Do not allow tracing while changing the order of the ring buffer */
 	tracing_stop_tr(tr);
 
 	old_order = ring_buffer_subbuf_order_get(tr->array_buffer.buffer);
-	if (old_order == val)
+	if (old_order == order)
 		goto out;
 
-	ret = ring_buffer_subbuf_order_set(tr->array_buffer.buffer, val);
+	ret = ring_buffer_subbuf_order_set(tr->array_buffer.buffer, order);
 	if (ret)
 		goto out;
 
@@ -9428,7 +9440,7 @@  buffer_order_write(struct file *filp, const char __user *ubuf,
 	if (!tr->allocated_snapshot)
 		goto out_max;
 
-	ret = ring_buffer_subbuf_order_set(tr->max_buffer.buffer, val);
+	ret = ring_buffer_subbuf_order_set(tr->max_buffer.buffer, order);
 	if (ret) {
 		/* Put back the old order */
 		cnt = ring_buffer_subbuf_order_set(tr->array_buffer.buffer, old_order);
@@ -9460,10 +9472,10 @@  buffer_order_write(struct file *filp, const char __user *ubuf,
 	return cnt;
 }
 
-static const struct file_operations buffer_order_fops = {
+static const struct file_operations buffer_subbuf_size_fops = {
 	.open		= tracing_open_generic_tr,
-	.read		= buffer_order_read,
-	.write		= buffer_order_write,
+	.read		= buffer_subbuf_size_read,
+	.write		= buffer_subbuf_size_write,
 	.release	= tracing_release_generic_tr,
 	.llseek		= default_llseek,
 };
@@ -9934,8 +9946,8 @@  init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer)
 	trace_create_file("buffer_percent", TRACE_MODE_WRITE, d_tracer,
 			tr, &buffer_percent_fops);
 
-	trace_create_file("buffer_subbuf_order", TRACE_MODE_WRITE, d_tracer,
-			  tr, &buffer_order_fops);
+	trace_create_file("buffer_subbuf_size_kb", TRACE_MODE_WRITE, d_tracer,
+			  tr, &buffer_subbuf_size_fops);
 
 	create_trace_options_dir(tr);
 
diff --git a/tools/testing/selftests/ftrace/test.d/00basic/ringbuffer_order.tc b/tools/testing/selftests/ftrace/test.d/00basic/ringbuffer_subbuf_size.tc
similarity index 85%
rename from tools/testing/selftests/ftrace/test.d/00basic/ringbuffer_order.tc
rename to tools/testing/selftests/ftrace/test.d/00basic/ringbuffer_subbuf_size.tc
index ecbcc810e6c1..d44d09a33a74 100644
--- a/tools/testing/selftests/ftrace/test.d/00basic/ringbuffer_order.tc
+++ b/tools/testing/selftests/ftrace/test.d/00basic/ringbuffer_subbuf_size.tc
@@ -1,7 +1,7 @@ 
 #!/bin/sh
 # SPDX-License-Identifier: GPL-2.0
-# description: Change the ringbuffer sub-buffer order
-# requires: buffer_subbuf_order
+# description: Change the ringbuffer sub-buffer size
+# requires: buffer_subbuf_size_kb
 # flags: instance
 
 get_buffer_data_size() {
@@ -52,8 +52,8 @@  write_buffer() {
 }
 
 test_buffer() {
-	orde=$1
-	page_size=$((4096<<order))
+	size_kb=$1
+	page_size=$((size_kb*1024))
 
 	size=`get_buffer_data_size`
 
@@ -82,14 +82,14 @@  test_buffer() {
 	fi
 }
 
-ORIG=`cat buffer_subbuf_order`
+ORIG=`cat buffer_subbuf_size_kb`
 
-# Could test bigger orders than 3, but then creating the string
+# Could test bigger sizes than 32K, but then creating the string
 # to write into the ring buffer takes too long
-for a in 0 1 2 3 ; do
-	echo $a > buffer_subbuf_order
+for a in 4 8 16 32 ; do
+	echo $a > buffer_subbuf_size_kb
 	test_buffer $a
 done
 
-echo $ORIG > buffer_subbuf_order
+echo $ORIG > buffer_subbuf_size_kb