tracing: Have saved_cmdlines arrays all in one allocation

Message ID 20240212180941.379c419b@gandalf.local.home
State New
Series tracing: Have saved_cmdlines arrays all in one allocation

Commit Message

Steven Rostedt Feb. 12, 2024, 11:09 p.m. UTC
  From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

The saved_cmdlines have three arrays for mapping PIDs to COMMs:

 - map_pid_to_cmdline[]
 - map_cmdline_to_pid[]
 - saved_cmdlines

The map_pid_to_cmdline[] is PID_MAX_DEFAULT in size and holds the index
into the other arrays. The map_cmdline_to_pid[] is a mapping back to the
full pid as it can be larger than PID_MAX_DEFAULT. And the
saved_cmdlines[] just holds the COMMs associated to the pids.

Currently the map_pid_to_cmdline[] and saved_cmdlines[] are allocated
together (in reality the saved_cmdlines is just in the memory of the
rounding of the allocation of the structure as it is always allocated in
powers of two). The map_cmdline_to_pid[] array is allocated separately.

Since the rounding to a power of two is rather large (it allows for 8000
elements in saved_cmdlines), also include the map_cmdline_to_pid[] array.
(This drops it to 6000 by default, which is still plenty for most use
cases). This saves even more memory as the map_cmdline_to_pid[] array
doesn't need to be allocated.

Link: https://lore.kernel.org/linux-trace-kernel/20240212174011.068211d9@gandalf.local.home/

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/trace_sched_switch.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)
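
To make the 8000 and 6000 figures above concrete, the following userspace
sketch redoes the rounding arithmetic. It is only an illustration: the 4 KiB
page size, PID_MAX_DEFAULT of 0x8000, a default request of 128 entries, the
structure layout, and the helper names order_for() and capacity() are all
assumptions and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT      12              /* assumption: 4 KiB pages */
#define PAGE_SIZE       (1UL << PAGE_SHIFT)
#define PID_MAX_DEFAULT 0x8000          /* assumption: non-BASE_SMALL value */
#define TASK_COMM_LEN   16

/* Userspace stand-in for the kernel structure; the field layout is assumed. */
struct saved_cmdlines_buffer {
	unsigned map_pid_to_cmdline[PID_MAX_DEFAULT + 1];
	unsigned *map_cmdline_to_pid;
	unsigned cmdline_num;
	int cmdline_idx;
	char saved_cmdlines[];
};

/* Smallest order with (PAGE_SIZE << order) >= size, mimicking get_order(). */
static int order_for(size_t size)
{
	int order = 0;

	while ((PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Entries that fit once the allocation is rounded up to a power of two. */
static size_t capacity(size_t requested, size_t elem_size)
{
	size_t hdr = sizeof(struct saved_cmdlines_buffer);
	size_t size;

	assert(elem_size > 0);
	size = PAGE_SIZE << order_for(hdr + requested * elem_size);
	return (size - hdr) / elem_size;
}
```

With these assumptions, capacity(128, TASK_COMM_LEN) comes out a little above
8000 and capacity(128, TASK_COMM_LEN + sizeof(unsigned)) a little above 6500,
in line with the numbers quoted in the commit message.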
  

Comments

Masami Hiramatsu (Google) Feb. 12, 2024, 11:36 p.m. UTC | #1
On Mon, 12 Feb 2024 18:09:41 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> 
> The saved_cmdlines have three arrays for mapping PIDs to COMMs:
> 
>  - map_pid_to_cmdline[]
>  - map_cmdline_to_pid[]
>  - saved_cmdlines
> 
> The map_pid_to_cmdline[] is PID_MAX_DEFAULT in size and holds the index
> into the other arrays. The map_cmdline_to_pid[] is a mapping back to the
> full pid as it can be larger than PID_MAX_DEFAULT. And the
> saved_cmdlines[] just holds the COMMs associated to the pids.
> 
> Currently the map_pid_to_cmdline[] and saved_cmdlines[] are allocated
> together (in reality the saved_cmdlines is just in the memory of the
> rounding of the allocation of the structure as it is always allocated in
> powers of two). The map_cmdline_to_pid[] array is allocated separately.
> 
> Since the rounding to a power of two is rather large (it allows for 8000
> elements in saved_cmdlines), also include the map_cmdline_to_pid[] array.
> (This drops it to 6000 by default, which is still plenty for most use
> cases). This saves even more memory as the map_cmdline_to_pid[] array
> doesn't need to be allocated.
> 
> Link: https://lore.kernel.org/linux-trace-kernel/20240212174011.068211d9@gandalf.local.home/
> 

Looks good to me.

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Thank you,

> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
>  kernel/trace/trace_sched_switch.c | 13 ++++---------
>  1 file changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/trace/trace_sched_switch.c b/kernel/trace/trace_sched_switch.c
> index e4fbcc3bede5..210c74dcd016 100644
> --- a/kernel/trace/trace_sched_switch.c
> +++ b/kernel/trace/trace_sched_switch.c
> @@ -201,7 +201,7 @@ static struct saved_cmdlines_buffer *allocate_cmdlines_buffer(unsigned int val)
>  	int order;
>  
>  	/* Figure out how much is needed to hold the given number of cmdlines */
> -	orig_size = sizeof(*s) + val * TASK_COMM_LEN;
> +	orig_size = sizeof(*s) + val * (TASK_COMM_LEN + sizeof(int));
>  	order = get_order(orig_size);
>  	size = 1 << (order + PAGE_SHIFT);
>  	page = alloc_pages(GFP_KERNEL, order);
> @@ -212,16 +212,11 @@ static struct saved_cmdlines_buffer *allocate_cmdlines_buffer(unsigned int val)
>  	memset(s, 0, sizeof(*s));
>  
>  	/* Round up to actual allocation */
> -	val = (size - sizeof(*s)) / TASK_COMM_LEN;
> +	val = (size - sizeof(*s)) / (TASK_COMM_LEN + sizeof(int));
>  	s->cmdline_num = val;
>  
> -	s->map_cmdline_to_pid = kmalloc_array(val,
> -					      sizeof(*s->map_cmdline_to_pid),
> -					      GFP_KERNEL);
> -	if (!s->map_cmdline_to_pid) {
> -		free_saved_cmdlines_buffer(s);
> -		return NULL;
> -	}
> +	/* Place map_cmdline_to_pid array right after saved_cmdlines */
> +	s->map_cmdline_to_pid = (unsigned *)&s->saved_cmdlines[val * TASK_COMM_LEN];
>  
>  	s->cmdline_idx = 0;
>  	memset(&s->map_pid_to_cmdline, NO_CMDLINE_MAP,
> -- 
> 2.43.0
>
  
Tim Chen Feb. 12, 2024, 11:39 p.m. UTC | #2
On Mon, 2024-02-12 at 18:09 -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> 
> The saved_cmdlines have three arrays for mapping PIDs to COMMs:
> 
>  - map_pid_to_cmdline[]
>  - map_cmdline_to_pid[]
>  - saved_cmdlines
> 
> The map_pid_to_cmdline[] is PID_MAX_DEFAULT in size and holds the index
> into the other arrays. The map_cmdline_to_pid[] is a mapping back to the
> full pid as it can be larger than PID_MAX_DEFAULT. And the
> saved_cmdlines[] just holds the COMMs associated to the pids.
> 
> Currently the map_pid_to_cmdline[] and saved_cmdlines[] are allocated
> together (in reality the saved_cmdlines is just in the memory of the
> rounding of the allocation of the structure as it is always allocated in
> powers of two). The map_cmdline_to_pid[] array is allocated separately.
> 
> Since the rounding to a power of two is rather large (it allows for 8000
> elements in saved_cmdlines), also include the map_cmdline_to_pid[] array.
> (This drops it to 6000 by default, which is still plenty for most use
> cases). This saves even more memory as the map_cmdline_to_pid[] array
> doesn't need to be allocated.


This patch does make better use of the extra space and makes the
previous change better.

Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
 
> 
> Link: https://lore.kernel.org/linux-trace-kernel/20240212174011.068211d9@gandalf.local.home/
> 
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
>  kernel/trace/trace_sched_switch.c | 13 ++++---------
>  1 file changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/trace/trace_sched_switch.c b/kernel/trace/trace_sched_switch.c
> index e4fbcc3bede5..210c74dcd016 100644
> --- a/kernel/trace/trace_sched_switch.c
> +++ b/kernel/trace/trace_sched_switch.c
> @@ -201,7 +201,7 @@ static struct saved_cmdlines_buffer *allocate_cmdlines_buffer(unsigned int val)
>  	int order;
>  
>  	/* Figure out how much is needed to hold the given number of cmdlines */
> -	orig_size = sizeof(*s) + val * TASK_COMM_LEN;
> +	orig_size = sizeof(*s) + val * (TASK_COMM_LEN + sizeof(int));

Strictly speaking, *map_cmdline_to_pid is unsigned int, so it would be more
consistent to use sizeof(unsigned) in the line above.  But I'm nitpicking and
I'm fine with leaving it as is.

>  	order = get_order(orig_size);
>  	size = 1 << (order + PAGE_SHIFT);
>  	page = alloc_pages(GFP_KERNEL, order);
> @@ -212,16 +212,11 @@ static struct saved_cmdlines_buffer *allocate_cmdlines_buffer(unsigned int val)
>  	memset(s, 0, sizeof(*s));
>  
>  	/* Round up to actual allocation */
> -	val = (size - sizeof(*s)) / TASK_COMM_LEN;
> +	val = (size - sizeof(*s)) / (TASK_COMM_LEN + sizeof(int));
>  	s->cmdline_num = val;
>  
> -	s->map_cmdline_to_pid = kmalloc_array(val,
> -					      sizeof(*s->map_cmdline_to_pid),
> -					      GFP_KERNEL);
> -	if (!s->map_cmdline_to_pid) {
> -		free_saved_cmdlines_buffer(s);
> -		return NULL;
> -	}
> +	/* Place map_cmdline_to_pid array right after saved_cmdlines */
> +	s->map_cmdline_to_pid = (unsigned *)&s->saved_cmdlines[val * TASK_COMM_LEN];
>  
>  	s->cmdline_idx = 0;
>  	memset(&s->map_pid_to_cmdline, NO_CMDLINE_MAP,
  
Steven Rostedt Feb. 13, 2024, 12:13 a.m. UTC | #3
On Mon, 12 Feb 2024 15:39:03 -0800
Tim Chen <tim.c.chen@linux.intel.com> wrote:

> > diff --git a/kernel/trace/trace_sched_switch.c b/kernel/trace/trace_sched_switch.c
> > index e4fbcc3bede5..210c74dcd016 100644
> > --- a/kernel/trace/trace_sched_switch.c
> > +++ b/kernel/trace/trace_sched_switch.c
> > @@ -201,7 +201,7 @@ static struct saved_cmdlines_buffer *allocate_cmdlines_buffer(unsigned int val)
> >  	int order;
> >  
> >  	/* Figure out how much is needed to hold the given number of cmdlines */
> > -	orig_size = sizeof(*s) + val * TASK_COMM_LEN;
> > +	orig_size = sizeof(*s) + val * (TASK_COMM_LEN + sizeof(int));  
> 
> Strictly speaking, *map_cmdline_to_pid is unsigned int so it is more consistent
> to use sizeof(unsigned) in line above.  But I'm nitpicking and I'm fine to
> leave it as is.

I was thinking about making that into a macro as it is used in two places.

/* Holds the size of a cmdline and pid element */
#define SAVED_CMDLINE_MAP_ELEMENT_SIZE(s)		\
	(TASK_COMM_LEN + sizeof((s)->map_cmdline_to_pid[0]))

	orig_size = sizeof(*s) + val * SAVED_CMDLINE_MAP_ELEMENT_SIZE(s);

> 
> >  	order = get_order(orig_size);
> >  	size = 1 << (order + PAGE_SHIFT);
> >  	page = alloc_pages(GFP_KERNEL, order);
> > @@ -212,16 +212,11 @@ static struct saved_cmdlines_buffer *allocate_cmdlines_buffer(unsigned int val)
> >  	memset(s, 0, sizeof(*s));
> >  
> >  	/* Round up to actual allocation */
> > -	val = (size - sizeof(*s)) / TASK_COMM_LEN;
> > +	val = (size - sizeof(*s)) / (TASK_COMM_LEN + sizeof(int));

	val = (size - sizeof(*s)) / SAVED_CMDLINE_MAP_ELEMENT_SIZE(s);

-- Steve

> >  	s->cmdline_num = val;
> >  
> > -	s->map_cmdline_to_pid = kmalloc_array(val,
> > -					      sizeof(*s->map_cmdline_to_pid),
> > -					      GFP_KERNEL);
> > -	if (!s->map_cmdline_to_pid) {
> > -		free_saved_cmdlines_buffer(s);
> > -		return NULL;
> > -	}
> > +	/* Place map_cmdline_to_pid array right after saved_cmdlines */
> > +	s->map_cmdline_to_pid = (unsigned *)&s->saved_cmdlines[val * TASK_COMM_LEN];
> >  
> >  	s->cmdline_idx = 0;
> >  	memset(&s->map_pid_to_cmdline, NO_CMDLINE_MAP,
  
Tim Chen Feb. 13, 2024, 4:35 p.m. UTC | #4
On Mon, 2024-02-12 at 19:13 -0500, Steven Rostedt wrote:
> On Mon, 12 Feb 2024 15:39:03 -0800
> Tim Chen <tim.c.chen@linux.intel.com> wrote:
> 
> > > diff --git a/kernel/trace/trace_sched_switch.c b/kernel/trace/trace_sched_switch.c
> > > index e4fbcc3bede5..210c74dcd016 100644
> > > --- a/kernel/trace/trace_sched_switch.c
> > > +++ b/kernel/trace/trace_sched_switch.c
> > > @@ -201,7 +201,7 @@ static struct saved_cmdlines_buffer *allocate_cmdlines_buffer(unsigned int val)
> > >  	int order;
> > >  
> > >  	/* Figure out how much is needed to hold the given number of cmdlines */
> > > -	orig_size = sizeof(*s) + val * TASK_COMM_LEN;
> > > +	orig_size = sizeof(*s) + val * (TASK_COMM_LEN + sizeof(int));  
> > 
> > Strictly speaking, *map_cmdline_to_pid is unsigned int so it is more consistent
> > to use sizeof(unsigned) in line above.  But I'm nitpicking and I'm fine to
> > leave it as is.
> 
> I was thinking about making that into a macro as it is used in two places.
> 
> /* Holds the size of a cmdline and pid element */
> #define SAVED_CMDLINE_MAP_ELEMENT_SIZE(s)		\
> 	(TASK_COMM_LEN + sizeof((s)->map_cmdline_to_pid[0]))
> 
> 	orig_size = sizeof(*s) + val * SAVED_CMDLINE_MAP_ELEMENT_SIZE(s);
> 
> 

Looks good. This makes the code more readable.

Tim
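
The macro discussed above can be tried out in userspace. This is a minimal
sketch, assuming a reduced stand-in for the structure; the helper names
cmdlines_bytes_needed() and cmdlines_that_fit() are hypothetical and exist
only to show the two places where the element size is used. Because sizeof
does not evaluate its operand, the macro can take a pointer that has not been
assigned a real buffer yet.

```c
#include <assert.h>
#include <stddef.h>

#define TASK_COMM_LEN 16

/* Reduced stand-in for the kernel structure (assumed layout). */
struct saved_cmdlines_buffer {
	unsigned *map_cmdline_to_pid;
	unsigned cmdline_num;
	char saved_cmdlines[];
};

/* Holds the size of a cmdline and pid element */
#define SAVED_CMDLINE_MAP_ELEMENT_SIZE(s)		\
	(TASK_COMM_LEN + sizeof((s)->map_cmdline_to_pid[0]))

/* Bytes needed for the header plus val (COMM, pid) elements. */
static size_t cmdlines_bytes_needed(unsigned int val)
{
	/* s is only a type carrier: sizeof does not evaluate its operand. */
	struct saved_cmdlines_buffer *s = NULL;

	return sizeof(*s) + val * SAVED_CMDLINE_MAP_ELEMENT_SIZE(s);
}

/* Number of (COMM, pid) elements that fit in an allocation of size bytes. */
static unsigned int cmdlines_that_fit(size_t size)
{
	struct saved_cmdlines_buffer *s = NULL;

	assert(size >= sizeof(*s));
	return (size - sizeof(*s)) / SAVED_CMDLINE_MAP_ELEMENT_SIZE(s);
}
```

Deriving the size from the member itself keeps both calculations correct if
the type of map_cmdline_to_pid ever changes, which also sidesteps the
sizeof(int) versus sizeof(unsigned) question.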
  

Patch

diff --git a/kernel/trace/trace_sched_switch.c b/kernel/trace/trace_sched_switch.c
index e4fbcc3bede5..210c74dcd016 100644
--- a/kernel/trace/trace_sched_switch.c
+++ b/kernel/trace/trace_sched_switch.c
@@ -201,7 +201,7 @@  static struct saved_cmdlines_buffer *allocate_cmdlines_buffer(unsigned int val)
 	int order;
 
 	/* Figure out how much is needed to hold the given number of cmdlines */
-	orig_size = sizeof(*s) + val * TASK_COMM_LEN;
+	orig_size = sizeof(*s) + val * (TASK_COMM_LEN + sizeof(int));
 	order = get_order(orig_size);
 	size = 1 << (order + PAGE_SHIFT);
 	page = alloc_pages(GFP_KERNEL, order);
@@ -212,16 +212,11 @@  static struct saved_cmdlines_buffer *allocate_cmdlines_buffer(unsigned int val)
 	memset(s, 0, sizeof(*s));
 
 	/* Round up to actual allocation */
-	val = (size - sizeof(*s)) / TASK_COMM_LEN;
+	val = (size - sizeof(*s)) / (TASK_COMM_LEN + sizeof(int));
 	s->cmdline_num = val;
 
-	s->map_cmdline_to_pid = kmalloc_array(val,
-					      sizeof(*s->map_cmdline_to_pid),
-					      GFP_KERNEL);
-	if (!s->map_cmdline_to_pid) {
-		free_saved_cmdlines_buffer(s);
-		return NULL;
-	}
+	/* Place map_cmdline_to_pid array right after saved_cmdlines */
+	s->map_cmdline_to_pid = (unsigned *)&s->saved_cmdlines[val * TASK_COMM_LEN];
 
 	s->cmdline_idx = 0;
 	memset(&s->map_pid_to_cmdline, NO_CMDLINE_MAP,
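
The layout the patch sets up, with the pid array placed directly after the
COMM storage in the same allocation, can be sketched in userspace as follows.
This is an illustration under stated assumptions, not the kernel code: calloc()
stands in for alloc_pages(), the structure is reduced, and the names
cmdlines_buf, cmdlines_buf_alloc() and cmdlines_buf_set() are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TASK_COMM_LEN 16

/* Reduced, hypothetical stand-in for struct saved_cmdlines_buffer. */
struct cmdlines_buf {
	unsigned *map_cmdline_to_pid;
	unsigned cmdline_num;
	char saved_cmdlines[];
};

/* One zeroed allocation holds the header, val COMMs, then val pids. */
static struct cmdlines_buf *cmdlines_buf_alloc(unsigned val)
{
	size_t elem = TASK_COMM_LEN + sizeof(unsigned);
	struct cmdlines_buf *s = calloc(1, sizeof(*s) + (size_t)val * elem);

	if (!s)
		return NULL;
	s->cmdline_num = val;
	/* Place map_cmdline_to_pid array right after saved_cmdlines */
	s->map_cmdline_to_pid =
		(unsigned *)&s->saved_cmdlines[(size_t)val * TASK_COMM_LEN];
	return s;
}

/* Record a (pid, comm) pair in slot idx. */
static void cmdlines_buf_set(struct cmdlines_buf *s, unsigned idx,
			     unsigned pid, const char *comm)
{
	assert(idx < s->cmdline_num);
	s->map_cmdline_to_pid[idx] = pid;
	/* Copy at most 15 bytes; the last byte stays NUL from calloc(). */
	strncpy(&s->saved_cmdlines[(size_t)idx * TASK_COMM_LEN], comm,
		TASK_COMM_LEN - 1);
}
```

Since everything lives in one block, a single free() releases both arrays in
this sketch. The pid array stays suitably aligned here because the COMM area
is a multiple of TASK_COMM_LEN bytes long and starts on a 4-byte boundary.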