[RFC,v6,1/5] perf sched: sync state char array with the kernel

Message ID: 20230803083352.1585-2-zegao@tencent.com
State: New
Series: fix task state report from sched tracepoint

Commit Message

Ze Gao Aug. 3, 2023, 8:33 a.m. UTC
  Update state char array and then remove unused and stale
macros, which are kernel internal representations and not
encouraged to use anymore.

Signed-off-by: Ze Gao <zegao@tencent.com>
---
 tools/perf/builtin-sched.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)
  

Comments

Steven Rostedt Aug. 3, 2023, 9:09 a.m. UTC | #1
On Thu,  3 Aug 2023 04:33:48 -0400
Ze Gao <zegao2021@gmail.com> wrote:

Hi Ze,

> Update state char array and then remove unused and stale
> macros, which are kernel internal representations and not
> encouraged to use anymore.
> 

A couple of things.

First, the change logs of every commit need to specify the "why". The
subject can say "what", but the change log really needs to explain why this
patch is important. For example, this patch is really two changes (and thus
should actually be two patches). (I'll also comment on the other patches)

1. The update of the state char array. You should explain why it's being
updated. If it was wrong, it needs to state the commit that changed to make
that happen.

2. For removing the stale macros, the change log can simply state that
the macros are unused in the code and are being removed.

Finally, I know you're eager to get this patch set in, but please hold off
sending a new version immediately after a comment or two. Some maintainers
prefer submitters to wait a week or so, otherwise you will tend to "spam"
their inboxes. There's more than one maintainer Cc'd on this series, and you
need to be courteous not to send too many emails in a short period of time.

-- Steve


> Signed-off-by: Ze Gao <zegao@tencent.com>
> ---
>  tools/perf/builtin-sched.c | 13 +------------
>  1 file changed, 1 insertion(+), 12 deletions(-)
> 
> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> index 9ab300b6f131..8dc8f071721c 100644
> --- a/tools/perf/builtin-sched.c
> +++ b/tools/perf/builtin-sched.c
> @@ -92,23 +92,12 @@ struct sched_atom {
>  	struct task_desc	*wakee;
>  };
>  
> -#define TASK_STATE_TO_CHAR_STR "RSDTtZXxKWP"
> +#define TASK_STATE_TO_CHAR_STR "RSDTtXZPI"
>  
>  /* task state bitmask, copied from include/linux/sched.h */
>  #define TASK_RUNNING		0
>  #define TASK_INTERRUPTIBLE	1
>  #define TASK_UNINTERRUPTIBLE	2
> -#define __TASK_STOPPED		4
> -#define __TASK_TRACED		8
> -/* in tsk->exit_state */
> -#define EXIT_DEAD		16
> -#define EXIT_ZOMBIE		32
> -#define EXIT_TRACE		(EXIT_ZOMBIE | EXIT_DEAD)
> -/* in tsk->state again */
> -#define TASK_DEAD		64
> -#define TASK_WAKEKILL		128
> -#define TASK_WAKING		256
> -#define TASK_PARKED		512
>  
>  enum thread_state {
>  	THREAD_SLEEPING = 0,
  
Ze Gao Aug. 3, 2023, 10:29 a.m. UTC | #2
On Thu, Aug 3, 2023 at 5:09 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Thu,  3 Aug 2023 04:33:48 -0400
> Ze Gao <zegao2021@gmail.com> wrote:
>
> Hi Ze,
>
> > Update state char array and then remove unused and stale
> > macros, which are kernel internal representations and not
> > encouraged to use anymore.
> >
>
> A couple of things.
>
> First, the change logs of every commit need to specify the "why". The
> subject can say "what", but the change log really needs to explain why this
> patch is important. For example, this patch is really two changes (and thus
> should actually be two patches). (I'll also comment on the other patches)

Thanks for the feedback! I will elaborate on the changes in each changelog.

> 1. The update of the state char array. You should explain why it's being
> updated. If it was wrong, it needs to state the commit that changed to make
> that happen.
>
> 2. For removing the stale macros, the change log can simply state that
> the macros are unused in the code and are being removed.
>
> Finally, I know you're eager to get this patch set in, but please hold off
> sending a new version immediately after a comment or two. Some maintainers
> prefer submitters to wait a week or so, otherwise you will tend to "spam"
> their inboxes. There's more than one maintainer Cc'd on this series, and you
> need to be courteous not to send too many emails in a short period of time.

Noted! Actually, I'm in no rush; I just wanted to make sure people see the
latest patches so they do not have to waste time on the old series.

I will hold off until all the comments in this thread are resolved.

And thanks for pointing this out.

Regards,
Ze
  
Ze Gao Aug. 3, 2023, 12:25 p.m. UTC | #3
Hi,

This is the new changelog for this patch:

    perf sched: sync state char array with the kernel

    Since commit e936e8e459e14 ("perf tools: Adapt the
    TASK_STATE_TO_CHAR_STR to new value in kernel space."),
    the state char array used to interpret the states of
    tasks being switched out has not been synced with the
    kernel definitions. Meanwhile, the kernel's task state
    reporting logic has evolved and the definition of this
    array has changed multiple times, which leads to
    inconsistency.

    As of this writing, perf sched timehist --state still
    reports the wrong states because TASK_STATE_TO_CHAR_STR
    is outdated.

    So sync TASK_STATE_TO_CHAR_STR with the latest kernel
    definitions to fix it.

    Signed-off-by: Ze Gao <zegao@tencent.com>

Regards,
Ze
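
To illustrate why a stale table reports the wrong states, here is a minimal
sketch, assuming the state character is picked by indexing the table with the
position of the lowest set bit of the raw prev_state value. The helper names
are hypothetical and are not taken from the patch; only the two table strings
come from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The two tables from the diff: the stale one and the proposed one. */
#define OLD_STATE_STR "RSDTtZXxKWP"
#define NEW_STATE_STR "RSDTtXZPI"

/*
 * Hypothetical helper: 1-based index of the lowest set bit,
 * or 0 when no bit is set (the same convention as ffs()).
 */
static size_t lowest_bit_index(unsigned int value)
{
	size_t idx = 1;

	if (!value)
		return 0;

	while (!(value & 1)) {
		value >>= 1;
		idx++;
	}

	return idx;
}

/*
 * Hypothetical helper: map a raw prev_state bitmask to a character by
 * indexing a table. Index 0 is the "no bit set" slot; values beyond the
 * end of the table yield '?'.
 */
static char state_char_from_table(const char *table, unsigned int prev_state)
{
	size_t bit;

	assert(table != NULL);
	bit = lowest_bit_index(prev_state);

	return bit < strlen(table) ? table[bit] : '?';
}
```

With these definitions, a raw value of 0x20 maps to 'X' through the old table
and to 'Z' through the new one, so the same record is reported differently
depending on which table the tool was built with.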

On Thu, Aug 3, 2023 at 6:29 PM Ze Gao <zegao2021@gmail.com> wrote:
>
> On Thu, Aug 3, 2023 at 5:09 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Thu,  3 Aug 2023 04:33:48 -0400
> > Ze Gao <zegao2021@gmail.com> wrote:
> >
> > Hi Ze,
> >
> > > Update state char array and then remove unused and stale
> > > macros, which are kernel internal representations and not
> > > encouraged to use anymore.
> > >
> >
> > A couple of things.
> >
> > First, the change logs of every commit need to specify the "why". The
> > subject can say "what", but the change log really needs to explain why this
> > patch is important. For example, this patch is really two changes (and thus
> > should actually be two patches). (I'll also comment on the other patches)
>
> Thanks for the feedback! I will elaborate on the changes in each changelog.
>
> > 1. The update of the state char array. You should explain why it's being
> > updated. If it was wrong, it needs to state the commit that changed to make
> > that happen.
> >
> > 2. For removing the stale macros, the change log can simply state that
> > the macros are unused in the code and are being removed.
> >
> > Finally, I know you're eager to get this patch set in, but please hold off
> > sending a new version immediately after a comment or two. Some maintainers
> > prefer submitters to wait a week or so, otherwise you will tend to "spam"
> > their inboxes. There's more than one maintainer Cc'd on this series, and you
> > need to be courteous not to send too many emails in a short period of time.
>
> Noted! Actually, I'm in no rush; I just wanted to make sure people see the
> latest patches so they do not have to waste time on the old series.
>
> I will hold off until all the comments in this thread are resolved.
>
> And thanks for pointing this out.
>
> Regards,
> Ze
  
Ze Gao Aug. 3, 2023, 12:39 p.m. UTC | #4
> 2. For removing the stale macros, the change log can simply state that
> the macros are unused in the code and are being removed.

I've split this part into a separate patch, and here is the changelog:

    perf sched: cleanup to remove unused macros

    The macros copied from kernel headers are stale and no
    longer used in the code, so remove them to avoid confusion.

    Signed-off-by: Ze Gao <zegao@tencent.com>

Regards,
Ze
  
Steven Rostedt Aug. 3, 2023, 3:10 p.m. UTC | #5
On Thu,  3 Aug 2023 04:33:48 -0400
Ze Gao <zegao2021@gmail.com> wrote:

> Update state char array and then remove unused and stale
> macros, which are kernel internal representations and not
> encouraged to use anymore.
> 
> Signed-off-by: Ze Gao <zegao@tencent.com>
> ---
>  tools/perf/builtin-sched.c | 13 +------------
>  1 file changed, 1 insertion(+), 12 deletions(-)
> 
> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> index 9ab300b6f131..8dc8f071721c 100644
> --- a/tools/perf/builtin-sched.c
> +++ b/tools/perf/builtin-sched.c
> @@ -92,23 +92,12 @@ struct sched_atom {
>  	struct task_desc	*wakee;
>  };
>  
> -#define TASK_STATE_TO_CHAR_STR "RSDTtZXxKWP"
> +#define TASK_STATE_TO_CHAR_STR "RSDTtXZPI"

Thinking about this more, this will always be wrong. Changing it just works
for the kernel you made the change for, but if it is run on another kernel,
it's broken again.

I actually wrote code once that basically just did a:

	struct trace_seq s;

	trace_seq_init(&s);
	tep_print_event(tep, &s, record, "%s", TEP_PRINT_INFO);

then searched s.buffer for "prev_state=%s ", to find the state character.

That's because the kernel should always be up to date (and why I said I
needed that string in the print_fmt).

As perf has a tep handle, this could be a helper function to extract the
state if needed, and get rid of relying on the above character array.
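
A helper along those lines could be sketched as follows. This is a minimal
sketch that assumes the event has already been formatted into a string
(s.buffer above); the function name and the sample line in the note below are
hypothetical, and the string search itself does not depend on any tep API.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: find "prev_state=" in an already formatted event
 * string and return the first state character that follows it.
 * Returns '\0' when the key is absent or has no value.
 */
static char prev_state_char_from_info(const char *info)
{
	static const char key[] = "prev_state=";
	const char *p;

	assert(info != NULL);

	p = strstr(info, key);
	if (!p)
		return '\0';

	/* Skip past the key to the first character of the value. */
	p += sizeof(key) - 1;

	return (*p == ' ' || *p == '\0') ? '\0' : *p;
}
```

For a formatted line containing "prev_prio=120 prev_state=S ==> next_comm=",
this returns 'S'. It only looks at the first character after "prev_state=",
so any further characters in the same field are ignored, and it takes the
first match, which would be wrong if a task name happened to contain the key.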

-- Steve


>  
>  /* task state bitmask, copied from include/linux/sched.h */
>  #define TASK_RUNNING		0
>  #define TASK_INTERRUPTIBLE	1
>  #define TASK_UNINTERRUPTIBLE	2
> -#define __TASK_STOPPED		4
> -#define __TASK_TRACED		8
> -/* in tsk->exit_state */
> -#define EXIT_DEAD		16
> -#define EXIT_ZOMBIE		32
> -#define EXIT_TRACE		(EXIT_ZOMBIE | EXIT_DEAD)
> -/* in tsk->state again */
> -#define TASK_DEAD		64
> -#define TASK_WAKEKILL		128
> -#define TASK_WAKING		256
> -#define TASK_PARKED		512
>  
>  enum thread_state {
>  	THREAD_SLEEPING = 0,
  
Ze Gao Aug. 4, 2023, 2:21 a.m. UTC | #6
On Thu, Aug 3, 2023 at 11:10 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Thu,  3 Aug 2023 04:33:48 -0400
> Ze Gao <zegao2021@gmail.com> wrote:
>
> > Update state char array and then remove unused and stale
> > macros, which are kernel internal representations and not
> > encouraged to use anymore.
> >
> > Signed-off-by: Ze Gao <zegao@tencent.com>
> > ---
> >  tools/perf/builtin-sched.c | 13 +------------
> >  1 file changed, 1 insertion(+), 12 deletions(-)
> >
> > diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> > index 9ab300b6f131..8dc8f071721c 100644
> > --- a/tools/perf/builtin-sched.c
> > +++ b/tools/perf/builtin-sched.c
> > @@ -92,23 +92,12 @@ struct sched_atom {
> >       struct task_desc        *wakee;
> >  };
> >
> > -#define TASK_STATE_TO_CHAR_STR "RSDTtZXxKWP"
> > +#define TASK_STATE_TO_CHAR_STR "RSDTtXZPI"
>
> Thinking about this more, this will always be wrong. Changing it just works
> for the kernel you made the change for, but if it is run on another kernel,
> it's broken again.

Indeed. There is no easy way to maintain backward compatibility unless
we stop using this bizarre 'prev_state' field. Basically all its users suffer
from this. That's why I believe this needs a fix that alerts people not to
use 'prev_state' anymore.

> I actually wrote code once that basically just did a:
>
>         struct trace_seq s;
>
>         trace_seq_init(&s);
>         tep_print_event(tep, &s, record, "%s", TEP_PRINT_INFO);
>
> then searched s.buffer for "prev_state=%s ", to find the state character.
>
> That's because the kernel should always be up to date (and why I said I
> needed that string in the print_fmt).

Turning to building the state char array from the print fmt string dynamically
is a great idea. :)
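
Building the table dynamically could look roughly like the sketch below. It
assumes the print fmt lists the states as { 0xNNNN, "C" } pairs; the function
name, MAX_STATES and that layout assumption are illustrative and are not taken
from perf or libtraceevent.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_STATES 32	/* hypothetical upper bound on table slots */

/*
 * Hypothetical helper: scan a print fmt string for { 0xNNNN, "C" } pairs
 * and store character C at the 1-based index of the lowest set bit of
 * 0xNNNN. Slot 0 is pre-filled with 'R' for "no bit set"; slots with no
 * matching pair stay '?'. Returns the length of the resulting table.
 */
static size_t build_state_table(const char *fmt, char table[MAX_STATES + 1])
{
	size_t len = 1;
	const char *p;

	assert(fmt != NULL && table != NULL);

	memset(table, '?', MAX_STATES);
	table[0] = 'R';

	for (p = strchr(fmt, '{'); p; p = strchr(p + 1, '{')) {
		unsigned int mask;
		size_t idx = 1;
		char c;

		/* Accept only well-formed pairs with a non-zero mask. */
		if (sscanf(p, "{ %x , \"%c\"", &mask, &c) != 2 || !mask)
			continue;

		while (!(mask & 1)) {
			mask >>= 1;
			idx++;
		}
		if (idx >= MAX_STATES)
			continue;

		table[idx] = c;
		if (idx + 1 > len)
			len = idx + 1;
	}

	table[len] = '\0';
	return len;
}
```

A table built this way is only valid for the kernel whose format text was
parsed, so the format needs to come from the same source as the recorded
events.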

> As perf has a tep handle, this could be a helper function to extract the
> state if needed, and get rid of relying on the above character array.

I'll figure out how to make it happen.

BTW, my last concern is whether there is a better way to notify userspace to
avoid interpreting the task state from 'prev_state', because otherwise the
same awkward thing will happen again.

Thanks,
Ze

> -- Steve
>
>
> >
> >  /* task state bitmask, copied from include/linux/sched.h */
> >  #define TASK_RUNNING         0
> >  #define TASK_INTERRUPTIBLE   1
> >  #define TASK_UNINTERRUPTIBLE 2
> > -#define __TASK_STOPPED               4
> > -#define __TASK_TRACED                8
> > -/* in tsk->exit_state */
> > -#define EXIT_DEAD            16
> > -#define EXIT_ZOMBIE          32
> > -#define EXIT_TRACE           (EXIT_ZOMBIE | EXIT_DEAD)
> > -/* in tsk->state again */
> > -#define TASK_DEAD            64
> > -#define TASK_WAKEKILL                128
> > -#define TASK_WAKING          256
> > -#define TASK_PARKED          512
> >
> >  enum thread_state {
> >       THREAD_SLEEPING = 0,
>
  
Ze Gao Aug. 4, 2023, 2:38 a.m. UTC | #7
On Fri, Aug 4, 2023 at 10:21 AM Ze Gao <zegao2021@gmail.com> wrote:
>
> On Thu, Aug 3, 2023 at 11:10 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Thu,  3 Aug 2023 04:33:48 -0400
> > Ze Gao <zegao2021@gmail.com> wrote:
> >
> > > Update state char array and then remove unused and stale
> > > macros, which are kernel internal representations and not
> > > encouraged to use anymore.
> > >
> > > Signed-off-by: Ze Gao <zegao@tencent.com>
> > > ---
> > >  tools/perf/builtin-sched.c | 13 +------------
> > >  1 file changed, 1 insertion(+), 12 deletions(-)
> > >
> > > diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> > > index 9ab300b6f131..8dc8f071721c 100644
> > > --- a/tools/perf/builtin-sched.c
> > > +++ b/tools/perf/builtin-sched.c
> > > @@ -92,23 +92,12 @@ struct sched_atom {
> > >       struct task_desc        *wakee;
> > >  };
> > >
> > > -#define TASK_STATE_TO_CHAR_STR "RSDTtZXxKWP"
> > > +#define TASK_STATE_TO_CHAR_STR "RSDTtXZPI"
> >
> > Thinking about this more, this will always be wrong. Changing it just works
> > for the kernel you made the change for, but if it is run on another kernel,
> > it's broken again.
>
> Indeed. There is no easy way to maintain backward compatibility unless
> we stop using this bizarre 'prev_state' field. Basically all its users suffer
> from this. That's why I believe this needs a fix that alerts people not to
> use 'prev_state' anymore.
>
> > I actually wrote code once that basically just did a:
> >
> >         struct trace_seq s;
> >
> >         trace_seq_init(&s);
> >         tep_print_event(tep, &s, record, "%s", TEP_PRINT_INFO);
> >
> > then searched s.buffer for "prev_state=%s ", to find the state character.
> >
> > That's because the kernel should always be up to date (and why I said I
> > needed that string in the print_fmt).
>
> Turning to building the state char array from the print fmt string dynamically
> is a great idea. :)
>
> > As perf has a tep handle, this could be a helper function to extract the
> > state if needed, and get rid of relying on the above character array.
>
> I'll figure out how to make it happen.
>
> BTW, my last concern is whether there is a better way to notify userspace to
> avoid interpreting the task state from 'prev_state', because otherwise the
> same awkward thing will happen again.

By userspace, I mean all tools that consume 'prev_state' but don't have the
print fmt available, for example bpf tracepoint programs.

Regards,
Ze

> Thanks,
> Ze
>
> > -- Steve
> >
> >
> > >
> > >  /* task state bitmask, copied from include/linux/sched.h */
> > >  #define TASK_RUNNING         0
> > >  #define TASK_INTERRUPTIBLE   1
> > >  #define TASK_UNINTERRUPTIBLE 2
> > > -#define __TASK_STOPPED               4
> > > -#define __TASK_TRACED                8
> > > -/* in tsk->exit_state */
> > > -#define EXIT_DEAD            16
> > > -#define EXIT_ZOMBIE          32
> > > -#define EXIT_TRACE           (EXIT_ZOMBIE | EXIT_DEAD)
> > > -/* in tsk->state again */
> > > -#define TASK_DEAD            64
> > > -#define TASK_WAKEKILL                128
> > > -#define TASK_WAKING          256
> > > -#define TASK_PARKED          512
> > >
> > >  enum thread_state {
> > >       THREAD_SLEEPING = 0,
> >
  
Ze Gao Aug. 4, 2023, 3:19 a.m. UTC | #8
On Fri, Aug 4, 2023 at 10:38 AM Ze Gao <zegao2021@gmail.com> wrote:
>
> On Fri, Aug 4, 2023 at 10:21 AM Ze Gao <zegao2021@gmail.com> wrote:
> >
> > On Thu, Aug 3, 2023 at 11:10 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > On Thu,  3 Aug 2023 04:33:48 -0400
> > > Ze Gao <zegao2021@gmail.com> wrote:
> > >
> > > > Update state char array and then remove unused and stale
> > > > macros, which are kernel internal representations and not
> > > > encouraged to use anymore.
> > > >
> > > > Signed-off-by: Ze Gao <zegao@tencent.com>
> > > > ---
> > > >  tools/perf/builtin-sched.c | 13 +------------
> > > >  1 file changed, 1 insertion(+), 12 deletions(-)
> > > >
> > > > diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> > > > index 9ab300b6f131..8dc8f071721c 100644
> > > > --- a/tools/perf/builtin-sched.c
> > > > +++ b/tools/perf/builtin-sched.c
> > > > @@ -92,23 +92,12 @@ struct sched_atom {
> > > >       struct task_desc        *wakee;
> > > >  };
> > > >
> > > > -#define TASK_STATE_TO_CHAR_STR "RSDTtZXxKWP"
> > > > +#define TASK_STATE_TO_CHAR_STR "RSDTtXZPI"
> > >
> > > Thinking about this more, this will always be wrong. Changing it just works
> > > for the kernel you made the change for, but if it is run on another kernel,
> > > it's broken again.
> >
> > Indeed. There is no easy way to maintain backward compatibility unless
> > we stop using this bizarre 'prev_state' field. Basically all its users suffer
> > from this. That's why I believe this needs a fix that alerts people not to
> > use 'prev_state' anymore.
> >
> > > I actually wrote code once that basically just did a:
> > >
> > >         struct trace_seq s;
> > >
> > >         trace_seq_init(&s);
> > >         tep_print_event(tep, &s, record, "%s", TEP_PRINT_INFO);
> > >
> > > then searched s.buffer for "prev_state=%s ", to find the state character.
> > >
> > > That's because the kernel should always be up to date (and why I said I
> > > needed that string in the print_fmt).
> >
> > Turning to building the state char array from the print fmt string dynamically
> > is a great idea. :)

On second thought, I realize this is not perfect either, since it does not
take offline use of perf into consideration. People might run perf on a
different machine from the one where perf.data was recorded, in which case
what we get from /sys/kernel/debug/tracing/events/sched/sched_switch/format
is likely to differ from what is in perf.data.

So let's parse it from the TEP_PRINT_INFO of each record instead of building
the state char array and relying on 'prev_state' again. At least this fixes
all tools that have TEP_PRINT_INFO available.

Thanks,
Ze



> > > As perf has a tep handle, this could be a helper function to extract the
> > > state if needed, and get rid of relying on the above character array.
> >
> > I'll figure out how to make it happen.
> >
> > BTW, my last concern is whether there is a better way to notify userspace to
> > avoid interpreting the task state from 'prev_state', because otherwise the
> > same awkward thing will happen again.
>
> By userspace, I mean all tools that consume 'prev_state' but don't have the
> print fmt available, for example bpf tracepoint programs.
>
> Regards,
> Ze
>
> > Thanks,
> > Ze
> >
> > > -- Steve
> > >
> > >
> > > >
> > > >  /* task state bitmask, copied from include/linux/sched.h */
> > > >  #define TASK_RUNNING         0
> > > >  #define TASK_INTERRUPTIBLE   1
> > > >  #define TASK_UNINTERRUPTIBLE 2
> > > > -#define __TASK_STOPPED               4
> > > > -#define __TASK_TRACED                8
> > > > -/* in tsk->exit_state */
> > > > -#define EXIT_DEAD            16
> > > > -#define EXIT_ZOMBIE          32
> > > > -#define EXIT_TRACE           (EXIT_ZOMBIE | EXIT_DEAD)
> > > > -/* in tsk->state again */
> > > > -#define TASK_DEAD            64
> > > > -#define TASK_WAKEKILL                128
> > > > -#define TASK_WAKING          256
> > > > -#define TASK_PARKED          512
> > > >
> > > >  enum thread_state {
> > > >       THREAD_SLEEPING = 0,
> > >
  
Steven Rostedt Aug. 4, 2023, 3:41 a.m. UTC | #9
On Fri, 4 Aug 2023 11:19:06 +0800
Ze Gao <zegao2021@gmail.com> wrote:

> On second thought, I realize this is not perfect either, since it does not
> take offline use of perf into consideration. People might run perf on a
> different machine from the one where perf.data was recorded, in which case
> what we get from /sys/kernel/debug/tracing/events/sched/sched_switch/format
> is likely to differ from what is in perf.data.

If perf data files do what trace.dat files do, perf should save the
event formats in the data files. It should not be reading them from the
kernel when reading the data file.

With trace-cmd, you can do: trace-cmd dump --events

And it will show you all the formats of the events that it saved in the
file.

-- Steve
  

Patch

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 9ab300b6f131..8dc8f071721c 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -92,23 +92,12 @@  struct sched_atom {
 	struct task_desc	*wakee;
 };
 
-#define TASK_STATE_TO_CHAR_STR "RSDTtZXxKWP"
+#define TASK_STATE_TO_CHAR_STR "RSDTtXZPI"
 
 /* task state bitmask, copied from include/linux/sched.h */
 #define TASK_RUNNING		0
 #define TASK_INTERRUPTIBLE	1
 #define TASK_UNINTERRUPTIBLE	2
-#define __TASK_STOPPED		4
-#define __TASK_TRACED		8
-/* in tsk->exit_state */
-#define EXIT_DEAD		16
-#define EXIT_ZOMBIE		32
-#define EXIT_TRACE		(EXIT_ZOMBIE | EXIT_DEAD)
-/* in tsk->state again */
-#define TASK_DEAD		64
-#define TASK_WAKEKILL		128
-#define TASK_WAKING		256
-#define TASK_PARKED		512
 
 enum thread_state {
 	THREAD_SLEEPING = 0,