[3/5] perf record: Tracking side-band events for all CPUs when tracing selected CPUs

Message ID 20230704074217.240939-4-yangjihong1@huawei.com
State New
Headers
Series perf record: Tracking side-band events for all CPUs when tracing selected CPUs |

Commit Message

Yang Jihong July 4, 2023, 7:42 a.m. UTC
  User space tasks can migrate between CPUs, we need to track side-band
events for all CPUs.

The specific scenarios are as follows:

         CPU0                                 CPU1
  perf record -C 0 start
                              taskA starts to be created and executed
                                -> PERF_RECORD_COMM and PERF_RECORD_MMAP
                                   events only deliver to CPU1
                              ......
                                |
                          migrate to CPU0
                                |
  Running on CPU0    <----------/
  ...

  perf record -C 0 stop

Now perf samples the PC of taskA. However, perf does not record the
PERF_RECORD_COMM and PERF_RECORD_COMM events of taskA.
Therefore, the comm and symbols of taskA cannot be parsed.

The sys_perf_event_open invoked is as follows:

  # perf --debug verbose=3 record -e cpu-clock -C 1 true
  <SNIP>
  Opening: cpu-clock
  ------------------------------------------------------------
  perf_event_attr:
    type                             1
    size                             136
    { sample_period, sample_freq }   4000
    sample_type                      IP|TID|TIME|ID|CPU|PERIOD
    read_format                      ID|LOST
    disabled                         1
    inherit                          1
    freq                             1
    sample_id_all                    1
    exclude_guest                    1
  ------------------------------------------------------------
  sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 5
  Opening: dummy:HG
  ------------------------------------------------------------
  perf_event_attr:
    type                             1
    size                             136
    config                           0x9
    { sample_period, sample_freq }   4000
    sample_type                      IP|TID|TIME|ID|CPU|PERIOD
    read_format                      ID|LOST
    inherit                          1
    mmap                             1
    comm                             1
    freq                             1
    task                             1
    sample_id_all                    1
    mmap2                            1
    comm_exec                        1
    ksymbol                          1
    bpf_event                        1
  ------------------------------------------------------------
  sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 6
  sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 7
  sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 9
  sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 10
  sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 11
  sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 12
  sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 13
  sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 14
  <SNIP>

Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
---
 tools/perf/builtin-record.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)
  

Comments

Namhyung Kim July 5, 2023, 9:09 p.m. UTC | #1
On Tue, Jul 4, 2023 at 12:44 AM Yang Jihong <yangjihong1@huawei.com> wrote:
>
> User space tasks can migrate between CPUs, we need to track side-band
> events for all CPUs.
>
> The specific scenarios are as follows:
>
>          CPU0                                 CPU1
>   perf record -C 0 start
>                               taskA starts to be created and executed
>                                 -> PERF_RECORD_COMM and PERF_RECORD_MMAP
>                                    events only deliver to CPU1
>                               ......
>                                 |
>                           migrate to CPU0
>                                 |
>   Running on CPU0    <----------/
>   ...
>
>   perf record -C 0 stop

But I'm curious why you don't limit the task to run on the
specified CPUs only (using taskset).

Also, as you may know, you don't need to specify -C if you
want to profile specific tasks only.  It'll open per-cpu, per-task
events and they will have all necessary info.

>
> Now perf samples the PC of taskA. However, perf does not record the
> PERF_RECORD_COMM and PERF_RECORD_COMM events of taskA.

_COMM and _MMAP right?

Thanks,
Namhyung


> Therefore, the comm and symbols of taskA cannot be parsed.
>
> The sys_perf_event_open invoked is as follows:
>
>   # perf --debug verbose=3 record -e cpu-clock -C 1 true
>   <SNIP>
>   Opening: cpu-clock
>   ------------------------------------------------------------
>   perf_event_attr:
>     type                             1
>     size                             136
>     { sample_period, sample_freq }   4000
>     sample_type                      IP|TID|TIME|ID|CPU|PERIOD
>     read_format                      ID|LOST
>     disabled                         1
>     inherit                          1
>     freq                             1
>     sample_id_all                    1
>     exclude_guest                    1
>   ------------------------------------------------------------
>   sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 5
>   Opening: dummy:HG
>   ------------------------------------------------------------
>   perf_event_attr:
>     type                             1
>     size                             136
>     config                           0x9
>     { sample_period, sample_freq }   4000
>     sample_type                      IP|TID|TIME|ID|CPU|PERIOD
>     read_format                      ID|LOST
>     inherit                          1
>     mmap                             1
>     comm                             1
>     freq                             1
>     task                             1
>     sample_id_all                    1
>     mmap2                            1
>     comm_exec                        1
>     ksymbol                          1
>     bpf_event                        1
>   ------------------------------------------------------------
>   sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 6
>   sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 7
>   sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 9
>   sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 10
>   sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 11
>   sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 12
>   sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 13
>   sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 14
>   <SNIP>
>
> Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
> ---
>  tools/perf/builtin-record.c | 31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
>
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 8872cd037f2c..69e0d8c75aab 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -908,6 +908,31 @@ static int record__config_off_cpu(struct record *rec)
>         return off_cpu_prepare(rec->evlist, &rec->opts.target, &rec->opts);
>  }
>
> +static int record__config_tracking_events(struct record *rec)
> +{
> +       struct evsel *evsel;
> +       struct evlist *evlist = rec->evlist;
> +       struct record_opts *opts = &rec->opts;
> +
> +       /*
> +        * User space tasks can migrate between CPUs, so when tracing
> +        * selected CPUs, sideband for all CPUs is still needed.
> +        */
> +       if (opts->target.cpu_list) {
> +               evsel = evlist__findnew_tracking_event(evlist);
> +               if (!evsel)
> +                       return -ENOMEM;
> +
> +               if (!evsel->core.system_wide) {
> +                       evsel->core.system_wide = true;
> +                       evsel__set_sample_bit(evsel, TIME);
> +                       perf_evlist__propagate_maps(&evlist->core, &evsel->core);
> +               }
> +       }
> +
> +       return 0;
> +}
> +
>  static bool record__kcore_readable(struct machine *machine)
>  {
>         char kcore[PATH_MAX];
> @@ -4235,6 +4260,12 @@ int cmd_record(int argc, const char **argv)
>                 goto out;
>         }
>
> +       err = record__config_tracking_events(rec);
> +       if (err) {
> +               pr_err("record__config_tracking_events failed, error %d\n", err);
> +               goto out;
> +       }
> +
>         err = record__init_thread_masks(rec);
>         if (err) {
>                 pr_err("Failed to initialize parallel data streaming masks\n");
> --
> 2.30.GIT
>
  
Yang Jihong July 6, 2023, 1:39 a.m. UTC | #2
Hello,

On 2023/7/6 5:09, Namhyung Kim wrote:
> On Tue, Jul 4, 2023 at 12:44 AM Yang Jihong <yangjihong1@huawei.com> wrote:
>>
>> User space tasks can migrate between CPUs, we need to track side-band
>> events for all CPUs.
>>
>> The specific scenarios are as follows:
>>
>>           CPU0                                 CPU1
>>    perf record -C 0 start
>>                                taskA starts to be created and executed
>>                                  -> PERF_RECORD_COMM and PERF_RECORD_MMAP
>>                                     events only deliver to CPU1
>>                                ......
>>                                  |
>>                            migrate to CPU0
>>                                  |
>>    Running on CPU0    <----------/
>>    ...
>>
>>    perf record -C 0 stop
> 
> But I'm curious why you don't limit the task to run on the
> specified CPUs only (using taskset).
> 
> Also, as you may know, you don't need to specify -C if you
> want to profile specific tasks only.  It'll open per-cpu, per-task
> events and they will have all necessary info.
> 
The actual application scenario is to perform perf records only for 
specified cores. However, during sampling, the system may create new 
processes and then migrate the processes between cores due to 
scheduling. If the processes run on the selected core, In this case, the 
perf report cannot parse symbols for these processes.
>>
>> Now perf samples the PC of taskA. However, perf does not record the
>> PERF_RECORD_COMM and PERF_RECORD_COMM events of taskA.
> 
> _COMM and _MMAP right?
> 
Yes, PERF_RECORD_COMM and PERF_RECORD_MMAP. There's a clerical error here...

Thanks,
Yang
  
Adrian Hunter July 11, 2023, 1:13 p.m. UTC | #3
On 4/07/23 10:42, Yang Jihong wrote:
> User space tasks can migrate between CPUs, we need to track side-band
> events for all CPUs.
> 
> The specific scenarios are as follows:
> 
>          CPU0                                 CPU1
>   perf record -C 0 start
>                               taskA starts to be created and executed
>                                 -> PERF_RECORD_COMM and PERF_RECORD_MMAP
>                                    events only deliver to CPU1
>                               ......
>                                 |
>                           migrate to CPU0
>                                 |
>   Running on CPU0    <----------/
>   ...
> 
>   perf record -C 0 stop
> 
> Now perf samples the PC of taskA. However, perf does not record the
> PERF_RECORD_COMM and PERF_RECORD_COMM events of taskA.
> Therefore, the comm and symbols of taskA cannot be parsed.
> 
> The sys_perf_event_open invoked is as follows:
> 
>   # perf --debug verbose=3 record -e cpu-clock -C 1 true
>   <SNIP>
>   Opening: cpu-clock
>   ------------------------------------------------------------
>   perf_event_attr:
>     type                             1
>     size                             136
>     { sample_period, sample_freq }   4000
>     sample_type                      IP|TID|TIME|ID|CPU|PERIOD
>     read_format                      ID|LOST
>     disabled                         1
>     inherit                          1
>     freq                             1
>     sample_id_all                    1
>     exclude_guest                    1
>   ------------------------------------------------------------
>   sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 5
>   Opening: dummy:HG
>   ------------------------------------------------------------
>   perf_event_attr:
>     type                             1
>     size                             136
>     config                           0x9
>     { sample_period, sample_freq }   4000
>     sample_type                      IP|TID|TIME|ID|CPU|PERIOD
>     read_format                      ID|LOST
>     inherit                          1
>     mmap                             1
>     comm                             1
>     freq                             1
>     task                             1
>     sample_id_all                    1
>     mmap2                            1
>     comm_exec                        1
>     ksymbol                          1
>     bpf_event                        1
>   ------------------------------------------------------------
>   sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 6
>   sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 7
>   sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 9
>   sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 10
>   sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 11
>   sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 12
>   sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 13
>   sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 14
>   <SNIP>
> 
> Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
> ---
>  tools/perf/builtin-record.c | 31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 8872cd037f2c..69e0d8c75aab 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -908,6 +908,31 @@ static int record__config_off_cpu(struct record *rec)
>  	return off_cpu_prepare(rec->evlist, &rec->opts.target, &rec->opts);
>  }
>  
> +static int record__config_tracking_events(struct record *rec)
> +{
> +	struct evsel *evsel;
> +	struct evlist *evlist = rec->evlist;
> +	struct record_opts *opts = &rec->opts;
> +
> +	/*
> +	 * User space tasks can migrate between CPUs, so when tracing
> +	 * selected CPUs, sideband for all CPUs is still needed.
> +	 */
> +	if (opts->target.cpu_list) {

I am not sure if anyone minds doing this by default, but perhaps
we should say something about it on the perf record man page.

> +		evsel = evlist__findnew_tracking_event(evlist);
> +		if (!evsel)
> +			return -ENOMEM;
> +
> +		if (!evsel->core.system_wide) {
> +			evsel->core.system_wide = true;
> +			evsel__set_sample_bit(evsel, TIME);
> +			perf_evlist__propagate_maps(&evlist->core, &evsel->core);
> +		}

Perhaps better to export via internel/evsel.h

void perf_evsel__go_system_wide(struct perf_evlist *evlist, struct perf_evsel *evsel)
{
	if (!evsel->system_wide) {
		evsel->system_wide = true;
		if (evlist->needs_map_propagation)
			__perf_evlist__propagate_maps(evlist, evsel);
	}
}

As suggested in response to patch 2, perhaps deal with system_wide
inside evlist__findnew_tracking_event()

> +	}
> +
> +	return 0;
> +}
> +
>  static bool record__kcore_readable(struct machine *machine)
>  {
>  	char kcore[PATH_MAX];
> @@ -4235,6 +4260,12 @@ int cmd_record(int argc, const char **argv)
>  		goto out;
>  	}
>  
> +	err = record__config_tracking_events(rec);
> +	if (err) {
> +		pr_err("record__config_tracking_events failed, error %d\n", err);
> +		goto out;
> +	}
> +
>  	err = record__init_thread_masks(rec);
>  	if (err) {
>  		pr_err("Failed to initialize parallel data streaming masks\n");
  
Yang Jihong July 12, 2023, 2:44 p.m. UTC | #4
Hello,

On 2023/7/11 21:13, Adrian Hunter wrote:
> On 4/07/23 10:42, Yang Jihong wrote:
>> User space tasks can migrate between CPUs, we need to track side-band
>> events for all CPUs.
>>
>> The specific scenarios are as follows:
>>
>>           CPU0                                 CPU1
>>    perf record -C 0 start
>>                                taskA starts to be created and executed
>>                                  -> PERF_RECORD_COMM and PERF_RECORD_MMAP
>>                                     events only deliver to CPU1
>>                                ......
>>                                  |
>>                            migrate to CPU0
>>                                  |
>>    Running on CPU0    <----------/
>>    ...
>>
>>    perf record -C 0 stop
>>
>> Now perf samples the PC of taskA. However, perf does not record the
>> PERF_RECORD_COMM and PERF_RECORD_COMM events of taskA.
>> Therefore, the comm and symbols of taskA cannot be parsed.
>>
>> The sys_perf_event_open invoked is as follows:
>>
>>    # perf --debug verbose=3 record -e cpu-clock -C 1 true
>>    <SNIP>
>>    Opening: cpu-clock
>>    ------------------------------------------------------------
>>    perf_event_attr:
>>      type                             1
>>      size                             136
>>      { sample_period, sample_freq }   4000
>>      sample_type                      IP|TID|TIME|ID|CPU|PERIOD
>>      read_format                      ID|LOST
>>      disabled                         1
>>      inherit                          1
>>      freq                             1
>>      sample_id_all                    1
>>      exclude_guest                    1
>>    ------------------------------------------------------------
>>    sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 5
>>    Opening: dummy:HG
>>    ------------------------------------------------------------
>>    perf_event_attr:
>>      type                             1
>>      size                             136
>>      config                           0x9
>>      { sample_period, sample_freq }   4000
>>      sample_type                      IP|TID|TIME|ID|CPU|PERIOD
>>      read_format                      ID|LOST
>>      inherit                          1
>>      mmap                             1
>>      comm                             1
>>      freq                             1
>>      task                             1
>>      sample_id_all                    1
>>      mmap2                            1
>>      comm_exec                        1
>>      ksymbol                          1
>>      bpf_event                        1
>>    ------------------------------------------------------------
>>    sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 6
>>    sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 7
>>    sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 9
>>    sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 10
>>    sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 11
>>    sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 12
>>    sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 13
>>    sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 14
>>    <SNIP>
>>
>> Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
>> ---
>>   tools/perf/builtin-record.c | 31 +++++++++++++++++++++++++++++++
>>   1 file changed, 31 insertions(+)
>>
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index 8872cd037f2c..69e0d8c75aab 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -908,6 +908,31 @@ static int record__config_off_cpu(struct record *rec)
>>   	return off_cpu_prepare(rec->evlist, &rec->opts.target, &rec->opts);
>>   }
>>   
>> +static int record__config_tracking_events(struct record *rec)
>> +{
>> +	struct evsel *evsel;
>> +	struct evlist *evlist = rec->evlist;
>> +	struct record_opts *opts = &rec->opts;
>> +
>> +	/*
>> +	 * User space tasks can migrate between CPUs, so when tracing
>> +	 * selected CPUs, sideband for all CPUs is still needed.
>> +	 */
>> +	if (opts->target.cpu_list) {
> 
> I am not sure if anyone minds doing this by default, but perhaps
> we should say something about it on the perf record man page.
> 
Okay, will add comments to the man page.

>> +		evsel = evlist__findnew_tracking_event(evlist);
>> +		if (!evsel)
>> +			return -ENOMEM;
>> +
>> +		if (!evsel->core.system_wide) {
>> +			evsel->core.system_wide = true;
>> +			evsel__set_sample_bit(evsel, TIME);
>> +			perf_evlist__propagate_maps(&evlist->core, &evsel->core);
>> +		}
> 
> Perhaps better to export via internel/evsel.h
> 
> void perf_evsel__go_system_wide(struct perf_evlist *evlist, struct perf_evsel *evsel)
> {
> 	if (!evsel->system_wide) {
> 		evsel->system_wide = true;
> 		if (evlist->needs_map_propagation)
> 			__perf_evlist__propagate_maps(evlist, evsel);
> 	}
> }
> 
> As suggested in response to patch 2, perhaps deal with system_wide
> inside evlist__findnew_tracking_event()
> 
Okay, I'll modify it as above, so maybe we need to export 
perf_evlist__propagate_maps().

As mentioned in the patch 1, __perf_evlist__propagate_maps is low-level 
and avoid to export it.
Or can we export perf_evsel__go_system_wide() via through internel/evlist.h?
In this way, we do not need to export perf_evlist__propagate_maps().
If so, would it be more appropriate to call perf_evlist__go_system_wide()?

Thanks,
Yang
  
Adrian Hunter July 12, 2023, 3:03 p.m. UTC | #5
On 12/07/23 17:44, Yang Jihong wrote:
> Hello,
> 
> On 2023/7/11 21:13, Adrian Hunter wrote:
>> On 4/07/23 10:42, Yang Jihong wrote:
>>> User space tasks can migrate between CPUs, we need to track side-band
>>> events for all CPUs.
>>>
>>> The specific scenarios are as follows:
>>>
>>>           CPU0                                 CPU1
>>>    perf record -C 0 start
>>>                                taskA starts to be created and executed
>>>                                  -> PERF_RECORD_COMM and PERF_RECORD_MMAP
>>>                                     events only deliver to CPU1
>>>                                ......
>>>                                  |
>>>                            migrate to CPU0
>>>                                  |
>>>    Running on CPU0    <----------/
>>>    ...
>>>
>>>    perf record -C 0 stop
>>>
>>> Now perf samples the PC of taskA. However, perf does not record the
>>> PERF_RECORD_COMM and PERF_RECORD_COMM events of taskA.
>>> Therefore, the comm and symbols of taskA cannot be parsed.
>>>
>>> The sys_perf_event_open invoked is as follows:
>>>
>>>    # perf --debug verbose=3 record -e cpu-clock -C 1 true
>>>    <SNIP>
>>>    Opening: cpu-clock
>>>    ------------------------------------------------------------
>>>    perf_event_attr:
>>>      type                             1
>>>      size                             136
>>>      { sample_period, sample_freq }   4000
>>>      sample_type                      IP|TID|TIME|ID|CPU|PERIOD
>>>      read_format                      ID|LOST
>>>      disabled                         1
>>>      inherit                          1
>>>      freq                             1
>>>      sample_id_all                    1
>>>      exclude_guest                    1
>>>    ------------------------------------------------------------
>>>    sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 5
>>>    Opening: dummy:HG
>>>    ------------------------------------------------------------
>>>    perf_event_attr:
>>>      type                             1
>>>      size                             136
>>>      config                           0x9
>>>      { sample_period, sample_freq }   4000
>>>      sample_type                      IP|TID|TIME|ID|CPU|PERIOD
>>>      read_format                      ID|LOST
>>>      inherit                          1
>>>      mmap                             1
>>>      comm                             1
>>>      freq                             1
>>>      task                             1
>>>      sample_id_all                    1
>>>      mmap2                            1
>>>      comm_exec                        1
>>>      ksymbol                          1
>>>      bpf_event                        1
>>>    ------------------------------------------------------------
>>>    sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 6
>>>    sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 7
>>>    sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 9
>>>    sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 10
>>>    sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 11
>>>    sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 12
>>>    sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 13
>>>    sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 14
>>>    <SNIP>
>>>
>>> Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
>>> ---
>>>   tools/perf/builtin-record.c | 31 +++++++++++++++++++++++++++++++
>>>   1 file changed, 31 insertions(+)
>>>
>>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>>> index 8872cd037f2c..69e0d8c75aab 100644
>>> --- a/tools/perf/builtin-record.c
>>> +++ b/tools/perf/builtin-record.c
>>> @@ -908,6 +908,31 @@ static int record__config_off_cpu(struct record *rec)
>>>       return off_cpu_prepare(rec->evlist, &rec->opts.target, &rec->opts);
>>>   }
>>>   +static int record__config_tracking_events(struct record *rec)
>>> +{
>>> +    struct evsel *evsel;
>>> +    struct evlist *evlist = rec->evlist;
>>> +    struct record_opts *opts = &rec->opts;
>>> +
>>> +    /*
>>> +     * User space tasks can migrate between CPUs, so when tracing
>>> +     * selected CPUs, sideband for all CPUs is still needed.
>>> +     */
>>> +    if (opts->target.cpu_list) {
>>
>> I am not sure if anyone minds doing this by default, but perhaps
>> we should say something about it on the perf record man page.
>>
> Okay, will add comments to the man page.
> 
>>> +        evsel = evlist__findnew_tracking_event(evlist);
>>> +        if (!evsel)
>>> +            return -ENOMEM;
>>> +
>>> +        if (!evsel->core.system_wide) {
>>> +            evsel->core.system_wide = true;
>>> +            evsel__set_sample_bit(evsel, TIME);
>>> +            perf_evlist__propagate_maps(&evlist->core, &evsel->core);
>>> +        }
>>
>> Perhaps better to export via internel/evsel.h
>>
>> void perf_evsel__go_system_wide(struct perf_evlist *evlist, struct perf_evsel *evsel)
>> {
>>     if (!evsel->system_wide) {
>>         evsel->system_wide = true;
>>         if (evlist->needs_map_propagation)
>>             __perf_evlist__propagate_maps(evlist, evsel);
>>     }
>> }
>>
>> As suggested in response to patch 2, perhaps deal with system_wide
>> inside evlist__findnew_tracking_event()
>>
> Okay, I'll modify it as above, so maybe we need to export perf_evlist__propagate_maps().
> 
> As mentioned in the patch 1, __perf_evlist__propagate_maps is low-level and avoid to export it.
> Or can we export perf_evsel__go_system_wide() via through internel/evlist.h?

Yes

> In this way, we do not need to export perf_evlist__propagate_maps().
> If so, would it be more appropriate to call perf_evlist__go_system_wide()?

Sure

> 
> Thanks,
> Yang
  

Patch

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 8872cd037f2c..69e0d8c75aab 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -908,6 +908,31 @@  static int record__config_off_cpu(struct record *rec)
 	return off_cpu_prepare(rec->evlist, &rec->opts.target, &rec->opts);
 }
 
+static int record__config_tracking_events(struct record *rec)
+{
+	struct evsel *evsel;
+	struct evlist *evlist = rec->evlist;
+	struct record_opts *opts = &rec->opts;
+
+	/*
+	 * User space tasks can migrate between CPUs, so when tracing
+	 * selected CPUs, sideband for all CPUs is still needed.
+	 */
+	if (opts->target.cpu_list) {
+		evsel = evlist__findnew_tracking_event(evlist);
+		if (!evsel)
+			return -ENOMEM;
+
+		if (!evsel->core.system_wide) {
+			evsel->core.system_wide = true;
+			evsel__set_sample_bit(evsel, TIME);
+			perf_evlist__propagate_maps(&evlist->core, &evsel->core);
+		}
+	}
+
+	return 0;
+}
+
 static bool record__kcore_readable(struct machine *machine)
 {
 	char kcore[PATH_MAX];
@@ -4235,6 +4260,12 @@  int cmd_record(int argc, const char **argv)
 		goto out;
 	}
 
+	err = record__config_tracking_events(rec);
+	if (err) {
+		pr_err("record__config_tracking_events failed, error %d\n", err);
+		goto out;
+	}
+
 	err = record__init_thread_masks(rec);
 	if (err) {
 		pr_err("Failed to initialize parallel data streaming masks\n");