perf test: Retry without grouping for all metrics test

Message ID 20230614090710.680330-1-sandipan.das@amd.com
State New
Series perf test: Retry without grouping for all metrics test

Commit Message

Sandipan Das June 14, 2023, 9:07 a.m. UTC
  There are cases where a metric uses more events than the number of
counters. E.g. AMD Zen, Zen 2 and Zen 3 processors have four data fabric
counters but the "nps1_die_to_dram" metric has eight events. By default,
the constituent events are placed in a group. Since the events cannot be
scheduled at the same time, the metric is not computed. The all metrics
test also fails because of this.

Before announcing failure, the test can try multiple options for each
available metric. After system-wide mode fails, retry once again with
the "--metric-no-group" option.

E.g.

  $ sudo perf test -v 100

Before:

  100: perf all metrics test                                           :
  --- start ---
  test child forked, pid 672731
  Testing branch_misprediction_ratio
  Testing all_remote_links_outbound
  Testing nps1_die_to_dram
  Metric 'nps1_die_to_dram' not printed in:
  Error:
  Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'.
  Testing macro_ops_dispatched
  Testing all_l2_cache_accesses
  Testing all_l2_cache_hits
  Testing all_l2_cache_misses
  Testing ic_fetch_miss_ratio
  Testing l2_cache_accesses_from_l2_hwpf
  Testing l2_cache_misses_from_l2_hwpf
  Testing op_cache_fetch_miss_ratio
  Testing l3_read_miss_latency
  Testing l1_itlb_misses
  test child finished with -1
  ---- end ----
  perf all metrics test: FAILED!

After:

  100: perf all metrics test                                           :
  --- start ---
  test child forked, pid 672887
  Testing branch_misprediction_ratio
  Testing all_remote_links_outbound
  Testing nps1_die_to_dram
  Testing macro_ops_dispatched
  Testing all_l2_cache_accesses
  Testing all_l2_cache_hits
  Testing all_l2_cache_misses
  Testing ic_fetch_miss_ratio
  Testing l2_cache_accesses_from_l2_hwpf
  Testing l2_cache_misses_from_l2_hwpf
  Testing op_cache_fetch_miss_ratio
  Testing l3_read_miss_latency
  Testing l1_itlb_misses
  test child finished with 0
  ---- end ----
  perf all metrics test: Ok

Reported-by: Ayush Jain <ayush.jain3@amd.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
---
 tools/perf/tests/shell/stat_all_metrics.sh | 7 +++++++
 1 file changed, 7 insertions(+)
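
The retry order described in the commit message can be sketched as a small shell function. This is only an illustration, not the test script itself: the first two attempts (`true` and plain `-a sleep 0.01`) are assumptions modelled on the commands visible in the patch, the helper names `metric_printed` and `try_metric` are hypothetical, and the match is a plain substring test rather than the `=~` regex used in the script.

```shell
# Succeeds when the first 50 characters of the metric name ($2) appear in the
# captured "perf stat" output ($1). The test script uses a bash regex match;
# a literal substring match is used here to stay portable.
metric_printed() {
  prefix=$(printf '%.50s' "$2")
  case "$1" in
    *"$prefix"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Walks the retry ladder for one metric and prints the label of the first
# attempt that produced the metric, or "failed" if none did.
try_metric() (
  m=$1

  # Assumed first attempt: default (per-thread) mode with a trivial workload.
  result=$(perf stat -M "$m" true 2>&1)
  if metric_printed "$result" "$m"; then echo "per-thread"; return 0; fi

  # Assumed second attempt: system-wide mode.
  result=$(perf stat -M "$m" -a sleep 0.01 2>&1)
  if metric_printed "$result" "$m"; then echo "system-wide"; return 0; fi

  # Attempt added by this patch: system-wide mode without event grouping.
  result=$(perf stat -M "$m" --metric-no-group -a sleep 0.01 2>&1)
  if metric_printed "$result" "$m"; then echo "system-wide-no-group"; return 0; fi

  # Last attempt from the existing script: a longer running workload.
  result=$(perf stat -M "$m" perf bench internals synthesize 2>&1)
  if metric_printed "$result" "$m"; then echo "longer-workload"; return 0; fi

  echo "failed"
  return 1
)
```

Calling `try_metric nps1_die_to_dram` on an affected system would be expected to stop at the no-group attempt, since that is the case the patch targets.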
  

Comments

Ayush Jain June 14, 2023, 11:38 a.m. UTC | #1
Hello Sandipan,

Thank you for this patch.

On 6/14/2023 2:37 PM, Sandipan Das wrote:
> There are cases where a metric uses more events than the number of
> counters. E.g. AMD Zen, Zen 2 and Zen 3 processors have four data fabric
> counters but the "nps1_die_to_dram" metric has eight events. By default,
> the constituent events are placed in a group. Since the events cannot be
> scheduled at the same time, the metric is not computed. The all metrics
> test also fails because of this.
> 
> Before announcing failure, the test can try multiple options for each
> available metric. After system-wide mode fails, retry once again with
> the "--metric-no-group" option.
> 
> E.g.
> 
>    $ sudo perf test -v 100
> 
> Before:
> 
>    100: perf all metrics test                                           :
>    --- start ---
>    test child forked, pid 672731
>    Testing branch_misprediction_ratio
>    Testing all_remote_links_outbound
>    Testing nps1_die_to_dram
>    Metric 'nps1_die_to_dram' not printed in:
>    Error:
>    Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'.
>    Testing macro_ops_dispatched
>    Testing all_l2_cache_accesses
>    Testing all_l2_cache_hits
>    Testing all_l2_cache_misses
>    Testing ic_fetch_miss_ratio
>    Testing l2_cache_accesses_from_l2_hwpf
>    Testing l2_cache_misses_from_l2_hwpf
>    Testing op_cache_fetch_miss_ratio
>    Testing l3_read_miss_latency
>    Testing l1_itlb_misses
>    test child finished with -1
>    ---- end ----
>    perf all metrics test: FAILED!
> 
> After:
> 
>    100: perf all metrics test                                           :
>    --- start ---
>    test child forked, pid 672887
>    Testing branch_misprediction_ratio
>    Testing all_remote_links_outbound
>    Testing nps1_die_to_dram
>    Testing macro_ops_dispatched
>    Testing all_l2_cache_accesses
>    Testing all_l2_cache_hits
>    Testing all_l2_cache_misses
>    Testing ic_fetch_miss_ratio
>    Testing l2_cache_accesses_from_l2_hwpf
>    Testing l2_cache_misses_from_l2_hwpf
>    Testing op_cache_fetch_miss_ratio
>    Testing l3_read_miss_latency
>    Testing l1_itlb_misses
>    test child finished with 0
>    ---- end ----
>    perf all metrics test: Ok
> 

The issue is resolved after applying this patch:

   $ ./perf test 102 -vvv
   102: perf all metrics test                                           :
   --- start ---
   test child forked, pid 244991
   Testing branch_misprediction_ratio
   Testing all_remote_links_outbound
   Testing nps1_die_to_dram
   Testing all_l2_cache_accesses
   Testing all_l2_cache_hits
   Testing all_l2_cache_misses
   Testing ic_fetch_miss_ratio
   Testing l2_cache_accesses_from_l2_hwpf
   Testing l2_cache_misses_from_l2_hwpf
   Testing l3_read_miss_latency
   Testing l1_itlb_misses
   test child finished with 0
   ---- end ----
   perf all metrics test: Ok

> Reported-by: Ayush Jain <ayush.jain3@amd.com>
> Signed-off-by: Sandipan Das <sandipan.das@amd.com>

Tested-by: Ayush Jain <ayush.jain3@amd.com>

> ---
>   tools/perf/tests/shell/stat_all_metrics.sh | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/tools/perf/tests/shell/stat_all_metrics.sh b/tools/perf/tests/shell/stat_all_metrics.sh
> index 54774525e18a..1e88ea8c5677 100755
> --- a/tools/perf/tests/shell/stat_all_metrics.sh
> +++ b/tools/perf/tests/shell/stat_all_metrics.sh
> @@ -16,6 +16,13 @@ for m in $(perf list --raw-dump metrics); do
>     then
>       continue
>     fi
> +  # Failed again, possibly there are not enough counters so retry system wide
> +  # mode but without event grouping.
> +  result=$(perf stat -M "$m" --metric-no-group -a sleep 0.01 2>&1)
> +  if [[ "$result" =~ ${m:0:50} ]]
> +  then
> +    continue
> +  fi
>     # Failed again, possibly the workload was too small so retry with something
>     # longer.
>     result=$(perf stat -M "$m" perf bench internals synthesize 2>&1)

Thanks & Regards,
Ayush Jain
  
Ian Rogers June 14, 2023, 4:40 p.m. UTC | #2
On Wed, Jun 14, 2023 at 2:07 AM Sandipan Das <sandipan.das@amd.com> wrote:
>
> There are cases where a metric uses more events than the number of
> counters. E.g. AMD Zen, Zen 2 and Zen 3 processors have four data fabric
> counters but the "nps1_die_to_dram" metric has eight events. By default,
> the constituent events are placed in a group. Since the events cannot be
> scheduled at the same time, the metric is not computed. The all metrics
> test also fails because of this.

Thanks Sandipan. So this is exposing a bug in the AMD data fabric PMU
driver. When the events are added, the driver should create a fake PMU,
check that adding the group is valid and, if not, fail. The failure is
picked up by the tool, which will then remove the group.

I appreciate the need for a time machine to make such a fix work. To
work around the issue with the metrics, add:
"MetricConstraint": "NO_GROUP_EVENTS",
to each metric in the json.
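
As a rough aid for applying that workaround, the sketch below lists the metrics in a JSON file that do not yet carry the constraint. It is only an illustration under stated assumptions: the function name `metrics_without_no_group` is hypothetical, and the awk script assumes one key per line with each metric object closed by a line that starts with `}`, so it is not a general JSON parser.

```shell
# Prints the MetricName of every object in the file ($1) that lacks
# "MetricConstraint": "NO_GROUP_EVENTS".
# Assumption: one key per line, each object closed by a line starting with "}".
metrics_without_no_group() {
  awk '
    # Remember the name of the metric currently being read.
    /"MetricName"/ { split($0, parts, "\""); name = parts[4] }
    # Note that this metric already has the constraint.
    /"MetricConstraint"[ \t]*:[ \t]*"NO_GROUP_EVENTS"/ { ok = 1 }
    # End of object: report it if the constraint was not seen, then reset.
    /^[ \t]*[}]/ { if (name != "" && !ok) print name; name = ""; ok = 0 }
  ' "$1"
}
```

It would be run as `metrics_without_no_group <file>.json` on whichever metric file is being edited; an empty result means every metric in that file already has the constraint.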

> Before announcing failure, the test can try multiple options for each
> available metric. After system-wide mode fails, retry once again with
> the "--metric-no-group" option.
>
> E.g.
>
>   $ sudo perf test -v 100
>
> Before:
>
>   100: perf all metrics test                                           :
>   --- start ---
>   test child forked, pid 672731
>   Testing branch_misprediction_ratio
>   Testing all_remote_links_outbound
>   Testing nps1_die_to_dram
>   Metric 'nps1_die_to_dram' not printed in:
>   Error:
>   Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'.

This error doesn't relate to grouping, so I'm confused about having it
in the commit message, aside from the test failure.

Thanks,
Ian

>   Testing macro_ops_dispatched
>   Testing all_l2_cache_accesses
>   Testing all_l2_cache_hits
>   Testing all_l2_cache_misses
>   Testing ic_fetch_miss_ratio
>   Testing l2_cache_accesses_from_l2_hwpf
>   Testing l2_cache_misses_from_l2_hwpf
>   Testing op_cache_fetch_miss_ratio
>   Testing l3_read_miss_latency
>   Testing l1_itlb_misses
>   test child finished with -1
>   ---- end ----
>   perf all metrics test: FAILED!
>
> After:
>
>   100: perf all metrics test                                           :
>   --- start ---
>   test child forked, pid 672887
>   Testing branch_misprediction_ratio
>   Testing all_remote_links_outbound
>   Testing nps1_die_to_dram
>   Testing macro_ops_dispatched
>   Testing all_l2_cache_accesses
>   Testing all_l2_cache_hits
>   Testing all_l2_cache_misses
>   Testing ic_fetch_miss_ratio
>   Testing l2_cache_accesses_from_l2_hwpf
>   Testing l2_cache_misses_from_l2_hwpf
>   Testing op_cache_fetch_miss_ratio
>   Testing l3_read_miss_latency
>   Testing l1_itlb_misses
>   test child finished with 0
>   ---- end ----
>   perf all metrics test: Ok
>
> Reported-by: Ayush Jain <ayush.jain3@amd.com>
> Signed-off-by: Sandipan Das <sandipan.das@amd.com>
> ---
>  tools/perf/tests/shell/stat_all_metrics.sh | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/tools/perf/tests/shell/stat_all_metrics.sh b/tools/perf/tests/shell/stat_all_metrics.sh
> index 54774525e18a..1e88ea8c5677 100755
> --- a/tools/perf/tests/shell/stat_all_metrics.sh
> +++ b/tools/perf/tests/shell/stat_all_metrics.sh
> @@ -16,6 +16,13 @@ for m in $(perf list --raw-dump metrics); do
>    then
>      continue
>    fi
> +  # Failed again, possibly there are not enough counters so retry system wide
> +  # mode but without event grouping.
> +  result=$(perf stat -M "$m" --metric-no-group -a sleep 0.01 2>&1)
> +  if [[ "$result" =~ ${m:0:50} ]]
> +  then
> +    continue
> +  fi
>    # Failed again, possibly the workload was too small so retry with something
>    # longer.
>    result=$(perf stat -M "$m" perf bench internals synthesize 2>&1)
> --
> 2.34.1
>
  
Sandipan Das June 19, 2023, 11:46 a.m. UTC | #3
Hi Ian,

On 6/14/2023 10:10 PM, Ian Rogers wrote:
> On Wed, Jun 14, 2023 at 2:07 AM Sandipan Das <sandipan.das@amd.com> wrote:
>>
>> There are cases where a metric uses more events than the number of
>> counters. E.g. AMD Zen, Zen 2 and Zen 3 processors have four data fabric
>> counters but the "nps1_die_to_dram" metric has eight events. By default,
>> the constituent events are placed in a group. Since the events cannot be
>> scheduled at the same time, the metric is not computed. The all metrics
>> test also fails because of this.
> 
> Thanks Sandipan. So this is exposing a bug in the AMD data fabric PMU
> driver. When the events are added the driver should create a fake PMU,
> check that adding the group is valid and if not fail. The failure is
> picked up by the tool and it will remove the group.
> 
> I appreciate the need for a time machine to make such a fix work. To
> work around the issue with the metrics, add:
> "MetricConstraint": "NO_GROUP_EVENTS",
> to each metric in the json.
> 

Thanks for the suggestions. The amd_uncore driver is indeed missing group
validation checks during event init. Will send out a fix with the
"NO_GROUP_EVENTS" workaround.

>> Before announcing failure, the test can try multiple options for each
>> available metric. After system-wide mode fails, retry once again with
>> the "--metric-no-group" option.
>>
>> E.g.
>>
>>   $ sudo perf test -v 100
>>
>> Before:
>>
>>   100: perf all metrics test                                           :
>>   --- start ---
>>   test child forked, pid 672731
>>   Testing branch_misprediction_ratio
>>   Testing all_remote_links_outbound
>>   Testing nps1_die_to_dram
>>   Metric 'nps1_die_to_dram' not printed in:
>>   Error:
>>   Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'.
> 
> This error doesn't relate to grouping, so I'm confused about having it
> in the commit message, aside from the test failure.
> 

Agreed. That's the error message from the last attempt where the test
tries to use a longer running workload (perf bench).

- Sandipan
  
Arnaldo Carvalho de Melo Dec. 6, 2023, 1:08 p.m. UTC | #4
On Wed, Jun 14, 2023 at 05:08:21PM +0530, Ayush Jain wrote:
> On 6/14/2023 2:37 PM, Sandipan Das wrote:
> > There are cases where a metric uses more events than the number of
> > counters. E.g. AMD Zen, Zen 2 and Zen 3 processors have four data fabric
> > counters but the "nps1_die_to_dram" metric has eight events. By default,
> > the constituent events are placed in a group. Since the events cannot be
> > scheduled at the same time, the metric is not computed. The all metrics
> > test also fails because of this.

Hmm, I am not able to reproduce the problem here, before applying
this patch:

[root@five ~]# grep -m1 "model name" /proc/cpuinfo
model name	: AMD Ryzen 9 5950X 16-Core Processor
[root@five ~]# perf test -vvv "perf all metrics test"
104: perf all metrics test                                           :
--- start ---
test child forked, pid 1379713
Testing branch_misprediction_ratio
Testing all_remote_links_outbound
Testing nps1_die_to_dram
Testing macro_ops_dispatched
Testing all_l2_cache_accesses
Testing all_l2_cache_hits
Testing all_l2_cache_misses
Testing ic_fetch_miss_ratio
Testing l2_cache_accesses_from_l2_hwpf
Testing l2_cache_misses_from_l2_hwpf
Testing op_cache_fetch_miss_ratio
Testing l3_read_miss_latency
Testing l1_itlb_misses
test child finished with 0
---- end ----
perf all metrics test: Ok
[root@five ~]#

[root@five ~]# perf stat -M nps1_die_to_dram -a sleep 2

 Performance counter stats for 'system wide':

                 0      dram_channel_data_controller_4   #  10885.3 MiB  nps1_die_to_dram       (49.96%)
        31,334,338      dram_channel_data_controller_1                                          (50.01%)
                 0      dram_channel_data_controller_6                                          (50.04%)
        54,679,601      dram_channel_data_controller_3                                          (50.04%)
        38,420,402      dram_channel_data_controller_0                                          (50.04%)
                 0      dram_channel_data_controller_5                                          (49.99%)
        54,012,661      dram_channel_data_controller_2                                          (49.96%)
                 0      dram_channel_data_controller_7                                          (49.96%)

       2.001465439 seconds time elapsed

[root@five ~]#

[root@five ~]# perf stat -v -M nps1_die_to_dram -a sleep 2
Using CPUID AuthenticAMD-25-21-0
metric expr dram_channel_data_controller_0 + dram_channel_data_controller_1 + dram_channel_data_controller_2 + dram_channel_data_controller_3 + dram_channel_data_controller_4 + dram_channel_data_controller_5 + dram_channel_data_controller_6 + dram_channel_data_controller_7 for nps1_die_to_dram
found event dram_channel_data_controller_4
found event dram_channel_data_controller_1
found event dram_channel_data_controller_6
found event dram_channel_data_controller_3
found event dram_channel_data_controller_0
found event dram_channel_data_controller_5
found event dram_channel_data_controller_2
found event dram_channel_data_controller_7
Parsing metric events 'dram_channel_data_controller_4/metric-id=dram_channel_data_controller_4/,dram_channel_data_controller_1/metric-id=dram_channel_data_controller_1/,dram_channel_data_controller_6/metric-id=dram_channel_data_controller_6/,dram_channel_data_controller_3/metric-id=dram_channel_data_controller_3/,dram_channel_data_controller_0/metric-id=dram_channel_data_controller_0/,dram_channel_data_controller_5/metric-id=dram_channel_data_controller_5/,dram_channel_data_controller_2/metric-id=dram_channel_data_controller_2/,dram_channel_data_controller_7/metric-id=dram_channel_data_controller_7/'
dram_channel_data_controller_4 -> amd_df/metric-id=dram_channel_data_controller_4,dram_channel_data_controller_4/
dram_channel_data_controller_1 -> amd_df/metric-id=dram_channel_data_controller_1,dram_channel_data_controller_1/
Multiple errors dropping message: Cannot find PMU `dram_channel_data_controller_1'. Missing kernel support? (<no help>)
dram_channel_data_controller_6 -> amd_df/metric-id=dram_channel_data_controller_6,dram_channel_data_controller_6/
Multiple errors dropping message: Cannot find PMU `dram_channel_data_controller_6'. Missing kernel support? (<no help>)
dram_channel_data_controller_3 -> amd_df/metric-id=dram_channel_data_controller_3,dram_channel_data_controller_3/
Multiple errors dropping message: Cannot find PMU `dram_channel_data_controller_3'. Missing kernel support? (<no help>)
dram_channel_data_controller_0 -> amd_df/metric-id=dram_channel_data_controller_0,dram_channel_data_controller_0/
Multiple errors dropping message: Cannot find PMU `dram_channel_data_controller_0'. Missing kernel support? (<no help>)
dram_channel_data_controller_5 -> amd_df/metric-id=dram_channel_data_controller_5,dram_channel_data_controller_5/
Multiple errors dropping message: Cannot find PMU `dram_channel_data_controller_5'. Missing kernel support? (<no help>)
dram_channel_data_controller_2 -> amd_df/metric-id=dram_channel_data_controller_2,dram_channel_data_controller_2/
Multiple errors dropping message: Cannot find PMU `dram_channel_data_controller_2'. Missing kernel support? (<no help>)
dram_channel_data_controller_7 -> amd_df/metric-id=dram_channel_data_controller_7,dram_channel_data_controller_7/
Matched metric-id dram_channel_data_controller_4 to dram_channel_data_controller_4
Matched metric-id dram_channel_data_controller_1 to dram_channel_data_controller_1
Matched metric-id dram_channel_data_controller_6 to dram_channel_data_controller_6
Matched metric-id dram_channel_data_controller_3 to dram_channel_data_controller_3
Matched metric-id dram_channel_data_controller_0 to dram_channel_data_controller_0
Matched metric-id dram_channel_data_controller_5 to dram_channel_data_controller_5
Matched metric-id dram_channel_data_controller_2 to dram_channel_data_controller_2
Matched metric-id dram_channel_data_controller_7 to dram_channel_data_controller_7
Control descriptor is not initialized
dram_channel_data_controller_4: 0 2001175127 999996394
dram_channel_data_controller_1: 32346663 2001169897 1000709803
dram_channel_data_controller_6: 0 2001168377 1001193443
dram_channel_data_controller_3: 47551247 2001166947 1001198122
dram_channel_data_controller_0: 38975242 2001165217 1001182923
dram_channel_data_controller_5: 0 2001163067 1000464054
dram_channel_data_controller_2: 49934162 2001160907 999974934
dram_channel_data_controller_7: 0 2001150317 999968825

 Performance counter stats for 'system wide':

                 0      dram_channel_data_controller_4   #  10297.2 MiB  nps1_die_to_dram       (49.97%)
        32,346,663      dram_channel_data_controller_1                                          (50.01%)
                 0      dram_channel_data_controller_6                                          (50.03%)
        47,551,247      dram_channel_data_controller_3                                          (50.03%)
        38,975,242      dram_channel_data_controller_0                                          (50.03%)
                 0      dram_channel_data_controller_5                                          (49.99%)
        49,934,162      dram_channel_data_controller_2                                          (49.97%)
                 0      dram_channel_data_controller_7                                          (49.97%)

       2.001196512 seconds time elapsed

[root@five ~]#
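
The percentages in the last column can be cross-checked against the verbose lines above, assuming the three numbers after each event name are the count, the time enabled and the time running (both in nanoseconds). The helper name `running_pct` is hypothetical; this is a sketch of the arithmetic, not of how perf computes it internally.

```shell
# running_pct ENABLED_NS RUNNING_NS
# Prints the share of the enabled time during which the event was counting.
running_pct() {
  awk -v ena="$1" -v run="$2" 'BEGIN { printf "%.2f\n", 100 * run / ena }'
}

# Values taken from the dram_channel_data_controller_1 line above.
running_pct 2001169897 1000709803
```

This prints 50.01, which agrees with the (50.01%) shown for that event. Each event counting for roughly half of the run is consistent with eight events sharing four counters without being placed in a single group.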

What am I missing?

Ian, I also stumbled on this:

[root@five ~]# perf stat -M dram_channel_data_controller_4
Cannot find metric or group `dram_channel_data_controller_4'
^C
 Performance counter stats for 'system wide':

        284,908.91 msec cpu-clock                        #   32.002 CPUs utilized
         6,485,456      context-switches                 #   22.763 K/sec
               719      cpu-migrations                   #    2.524 /sec
            32,800      page-faults                      #  115.125 /sec
   189,779,273,552      cycles                           #    0.666 GHz                         (83.33%)
     2,893,165,259      stalled-cycles-frontend          #    1.52% frontend cycles idle        (83.33%)
    24,807,157,349      stalled-cycles-backend           #   13.07% backend cycles idle         (83.33%)
    99,286,488,807      instructions                     #    0.52  insn per cycle
                                                  #    0.25  stalled cycles per insn     (83.33%)
    24,120,737,678      branches                         #   84.661 M/sec                       (83.33%)
     1,907,540,278      branch-misses                    #    7.91% of all branches             (83.34%)

       8.902784776 seconds time elapsed


[root@five ~]#
[root@five ~]# perf stat -e dram_channel_data_controller_4
^C
 Performance counter stats for 'system wide':

                 0      dram_channel_data_controller_4

       1.189638741 seconds time elapsed


[root@five ~]#

I.e. -M should bail out at that point (Cannot find metric or group `dram_channel_data_controller_4'), no?
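
The behaviour being asked for can be approximated from the outside with a wrapper that checks the name before running anything. This is only a sketch of the idea, not a proposed change to perf: the function names are hypothetical, and the check uses `perf list --raw-dump metrics` (the same command the test script loops over), so it ignores metric groups, which -M also accepts.

```shell
# Succeeds if the name ($1) is one of the metrics known to perf.
# Assumption: metric groups are not considered.
metric_exists() {
  for candidate in $(perf list --raw-dump metrics); do
    [ "$candidate" = "$1" ] && return 0
  done
  return 1
}

# Runs "perf stat -M NAME ..." only when NAME is a known metric; otherwise
# prints the error and returns non-zero instead of starting a measurement.
stat_metric_or_bail() {
  name=$1
  shift
  if ! metric_exists "$name"; then
    echo "Cannot find metric or group \`$name'" >&2
    return 1
  fi
  perf stat -M "$name" "$@"
}
```

With this wrapper, `stat_metric_or_bail dram_channel_data_controller_4` would return right after the error message rather than falling back to the default system-wide counters.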

- Arnaldo

> > Before announcing failure, the test can try multiple options for each
> > available metric. After system-wide mode fails, retry once again with
> > the "--metric-no-group" option.
> > 
> > E.g.
> > 
> >    $ sudo perf test -v 100
> > 
> > Before:
> > 
> >    100: perf all metrics test                                           :
> >    --- start ---
> >    test child forked, pid 672731
> >    Testing branch_misprediction_ratio
> >    Testing all_remote_links_outbound
> >    Testing nps1_die_to_dram
> >    Metric 'nps1_die_to_dram' not printed in:
> >    Error:
> >    Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'.
> >    Testing macro_ops_dispatched
> >    Testing all_l2_cache_accesses
> >    Testing all_l2_cache_hits
> >    Testing all_l2_cache_misses
> >    Testing ic_fetch_miss_ratio
> >    Testing l2_cache_accesses_from_l2_hwpf
> >    Testing l2_cache_misses_from_l2_hwpf
> >    Testing op_cache_fetch_miss_ratio
> >    Testing l3_read_miss_latency
> >    Testing l1_itlb_misses
> >    test child finished with -1
> >    ---- end ----
> >    perf all metrics test: FAILED!
> > 
> > After:
> > 
> >    100: perf all metrics test                                           :
> >    --- start ---
> >    test child forked, pid 672887
> >    Testing branch_misprediction_ratio
> >    Testing all_remote_links_outbound
> >    Testing nps1_die_to_dram
> >    Testing macro_ops_dispatched
> >    Testing all_l2_cache_accesses
> >    Testing all_l2_cache_hits
> >    Testing all_l2_cache_misses
> >    Testing ic_fetch_miss_ratio
> >    Testing l2_cache_accesses_from_l2_hwpf
> >    Testing l2_cache_misses_from_l2_hwpf
> >    Testing op_cache_fetch_miss_ratio
> >    Testing l3_read_miss_latency
> >    Testing l1_itlb_misses
> >    test child finished with 0
> >    ---- end ----
> >    perf all metrics test: Ok
> > 
> 
> Issue gets resolved after applying this patch
> 
>   $ ./perf test 102 -vvv
>   $102: perf all metrics test                                           :
>   $--- start ---
>   $test child forked, pid 244991
>   $Testing branch_misprediction_ratio
>   $Testing all_remote_links_outbound
>   $Testing nps1_die_to_dram
>   $Testing all_l2_cache_accesses
>   $Testing all_l2_cache_hits
>   $Testing all_l2_cache_misses
>   $Testing ic_fetch_miss_ratio
>   $Testing l2_cache_accesses_from_l2_hwpf
>   $Testing l2_cache_misses_from_l2_hwpf
>   $Testing l3_read_miss_latency
>   $Testing l1_itlb_misses
>   $test child finished with 0
>   $---- end ----
>   $perf all metrics test: Ok
> 
> > Reported-by: Ayush Jain <ayush.jain3@amd.com>
> > Signed-off-by: Sandipan Das <sandipan.das@amd.com>
> 
> Tested-by: Ayush Jain <ayush.jain3@amd.com>
> 
> > ---
> >   tools/perf/tests/shell/stat_all_metrics.sh | 7 +++++++
> >   1 file changed, 7 insertions(+)
> > 
> > diff --git a/tools/perf/tests/shell/stat_all_metrics.sh b/tools/perf/tests/shell/stat_all_metrics.sh
> > index 54774525e18a..1e88ea8c5677 100755
> > --- a/tools/perf/tests/shell/stat_all_metrics.sh
> > +++ b/tools/perf/tests/shell/stat_all_metrics.sh
> > @@ -16,6 +16,13 @@ for m in $(perf list --raw-dump metrics); do
> >     then
> >       continue
> >     fi
> > +  # Failed again, possibly there are not enough counters so retry system wide
> > +  # mode but without event grouping.
> > +  result=$(perf stat -M "$m" --metric-no-group -a sleep 0.01 2>&1)
> > +  if [[ "$result" =~ ${m:0:50} ]]
> > +  then
> > +    continue
> > +  fi
> >     # Failed again, possibly the workload was too small so retry with something
> >     # longer.
> >     result=$(perf stat -M "$m" perf bench internals synthesize 2>&1)
> 
> Thanks & Regards,
> Ayush Jain
  
Ian Rogers Dec. 6, 2023, 4:35 p.m. UTC | #5
On Wed, Dec 6, 2023 at 5:08 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
>
> On Wed, Jun 14, 2023 at 05:08:21PM +0530, Ayush Jain wrote:
> > On 6/14/2023 2:37 PM, Sandipan Das wrote:
> > > There are cases where a metric uses more events than the number of
> > > counters. E.g. AMD Zen, Zen 2 and Zen 3 processors have four data fabric
> > > counters but the "nps1_die_to_dram" metric has eight events. By default,
> > > the constituent events are placed in a group. Since the events cannot be
> > > scheduled at the same time, the metric is not computed. The all metrics
> > > test also fails because of this.
>
> Hmm, I am not able to reproduce the problem here, before applying
> this patch:
>
> [root@five ~]# grep -m1 "model name" /proc/cpuinfo
> model name      : AMD Ryzen 9 5950X 16-Core Processor
> [root@five ~]# perf test -vvv "perf all metrics test"
> 104: perf all metrics test                                           :
> --- start ---
> test child forked, pid 1379713
> Testing branch_misprediction_ratio
> Testing all_remote_links_outbound
> Testing nps1_die_to_dram
> Testing macro_ops_dispatched
> Testing all_l2_cache_accesses
> Testing all_l2_cache_hits
> Testing all_l2_cache_misses
> Testing ic_fetch_miss_ratio
> Testing l2_cache_accesses_from_l2_hwpf
> Testing l2_cache_misses_from_l2_hwpf
> Testing op_cache_fetch_miss_ratio
> Testing l3_read_miss_latency
> Testing l1_itlb_misses
> test child finished with 0
> ---- end ----
> perf all metrics test: Ok
> [root@five ~]#

Please don't apply the patch. The patch masks a bug in metrics/PMUs
and the proper fix was:
8d40f74ebf21 perf vendor events amd: Fix large metrics
https://lore.kernel.org/r/20230706063440.54189-1-sandipan.das@amd.com

> [root@five ~]# perf stat -M nps1_die_to_dram -a sleep 2
>
>  Performance counter stats for 'system wide':
>
>                  0      dram_channel_data_controller_4   #  10885.3 MiB  nps1_die_to_dram       (49.96%)
>         31,334,338      dram_channel_data_controller_1                                          (50.01%)
>                  0      dram_channel_data_controller_6                                          (50.04%)
>         54,679,601      dram_channel_data_controller_3                                          (50.04%)
>         38,420,402      dram_channel_data_controller_0                                          (50.04%)
>                  0      dram_channel_data_controller_5                                          (49.99%)
>         54,012,661      dram_channel_data_controller_2                                          (49.96%)
>                  0      dram_channel_data_controller_7                                          (49.96%)
>
>        2.001465439 seconds time elapsed
>
> [root@five ~]#
>
> [root@five ~]# perf stat -v -M nps1_die_to_dram -a sleep 2
> Using CPUID AuthenticAMD-25-21-0
> metric expr dram_channel_data_controller_0 + dram_channel_data_controller_1 + dram_channel_data_controller_2 + dram_channel_data_controller_3 + dram_channel_data_controller_4 + dram_channel_data_controller_5 + dram_channel_data_controller_6 + dram_channel_data_controller_7 for nps1_die_to_dram
> found event dram_channel_data_controller_4
> found event dram_channel_data_controller_1
> found event dram_channel_data_controller_6
> found event dram_channel_data_controller_3
> found event dram_channel_data_controller_0
> found event dram_channel_data_controller_5
> found event dram_channel_data_controller_2
> found event dram_channel_data_controller_7
> Parsing metric events 'dram_channel_data_controller_4/metric-id=dram_channel_data_controller_4/,dram_channel_data_controller_1/metric-id=dram_channel_data_controller_1/,dram_channel_data_controller_6/metric-id=dram_channel_data_controller_6/,dram_channel_data_controller_3/metric-id=dram_channel_data_controller_3/,dram_channel_data_controller_0/metric-id=dram_channel_data_controller_0/,dram_channel_data_controller_5/metric-id=dram_channel_data_controller_5/,dram_channel_data_controller_2/metric-id=dram_channel_data_controller_2/,dram_channel_data_controller_7/metric-id=dram_channel_data_controller_7/'
> dram_channel_data_controller_4 -> amd_df/metric-id=dram_channel_data_controller_4,dram_channel_data_controller_4/
> dram_channel_data_controller_1 -> amd_df/metric-id=dram_channel_data_controller_1,dram_channel_data_controller_1/
> Multiple errors dropping message: Cannot find PMU `dram_channel_data_controller_1'. Missing kernel support? (<no help>)
> dram_channel_data_controller_6 -> amd_df/metric-id=dram_channel_data_controller_6,dram_channel_data_controller_6/
> Multiple errors dropping message: Cannot find PMU `dram_channel_data_controller_6'. Missing kernel support? (<no help>)
> dram_channel_data_controller_3 -> amd_df/metric-id=dram_channel_data_controller_3,dram_channel_data_controller_3/
> Multiple errors dropping message: Cannot find PMU `dram_channel_data_controller_3'. Missing kernel support? (<no help>)
> dram_channel_data_controller_0 -> amd_df/metric-id=dram_channel_data_controller_0,dram_channel_data_controller_0/
> Multiple errors dropping message: Cannot find PMU `dram_channel_data_controller_0'. Missing kernel support? (<no help>)
> dram_channel_data_controller_5 -> amd_df/metric-id=dram_channel_data_controller_5,dram_channel_data_controller_5/
> Multiple errors dropping message: Cannot find PMU `dram_channel_data_controller_5'. Missing kernel support? (<no help>)
> dram_channel_data_controller_2 -> amd_df/metric-id=dram_channel_data_controller_2,dram_channel_data_controller_2/
> Multiple errors dropping message: Cannot find PMU `dram_channel_data_controller_2'. Missing kernel support? (<no help>)
> dram_channel_data_controller_7 -> amd_df/metric-id=dram_channel_data_controller_7,dram_channel_data_controller_7/
> Matched metric-id dram_channel_data_controller_4 to dram_channel_data_controller_4
> Matched metric-id dram_channel_data_controller_1 to dram_channel_data_controller_1
> Matched metric-id dram_channel_data_controller_6 to dram_channel_data_controller_6
> Matched metric-id dram_channel_data_controller_3 to dram_channel_data_controller_3
> Matched metric-id dram_channel_data_controller_0 to dram_channel_data_controller_0
> Matched metric-id dram_channel_data_controller_5 to dram_channel_data_controller_5
> Matched metric-id dram_channel_data_controller_2 to dram_channel_data_controller_2
> Matched metric-id dram_channel_data_controller_7 to dram_channel_data_controller_7
> Control descriptor is not initialized
> dram_channel_data_controller_4: 0 2001175127 999996394
> dram_channel_data_controller_1: 32346663 2001169897 1000709803
> dram_channel_data_controller_6: 0 2001168377 1001193443
> dram_channel_data_controller_3: 47551247 2001166947 1001198122
> dram_channel_data_controller_0: 38975242 2001165217 1001182923
> dram_channel_data_controller_5: 0 2001163067 1000464054
> dram_channel_data_controller_2: 49934162 2001160907 999974934
> dram_channel_data_controller_7: 0 2001150317 999968825
>
>  Performance counter stats for 'system wide':
>
>                  0      dram_channel_data_controller_4   #  10297.2 MiB  nps1_die_to_dram       (49.97%)
>         32,346,663      dram_channel_data_controller_1                                          (50.01%)
>                  0      dram_channel_data_controller_6                                          (50.03%)
>         47,551,247      dram_channel_data_controller_3                                          (50.03%)
>         38,975,242      dram_channel_data_controller_0                                          (50.03%)
>                  0      dram_channel_data_controller_5                                          (49.99%)
>         49,934,162      dram_channel_data_controller_2                                          (49.97%)
>                  0      dram_channel_data_controller_7                                          (49.97%)
>
>        2.001196512 seconds time elapsed
>
> [root@five ~]#
>
> What am I missing?
>
> Ian, I also stumbled on this:
>
> [root@five ~]# perf stat -M dram_channel_data_controller_4
> Cannot find metric or group `dram_channel_data_controller_4'
> ^C
>  Performance counter stats for 'system wide':
>
>         284,908.91 msec cpu-clock                        #   32.002 CPUs utilized
>          6,485,456      context-switches                 #   22.763 K/sec
>                719      cpu-migrations                   #    2.524 /sec
>             32,800      page-faults                      #  115.125 /sec
>    189,779,273,552      cycles                           #    0.666 GHz                         (83.33%)
>      2,893,165,259      stalled-cycles-frontend          #    1.52% frontend cycles idle        (83.33%)
>     24,807,157,349      stalled-cycles-backend           #   13.07% backend cycles idle         (83.33%)
>     99,286,488,807      instructions                     #    0.52  insn per cycle
>                                                   #    0.25  stalled cycles per insn     (83.33%)
>     24,120,737,678      branches                         #   84.661 M/sec                       (83.33%)
>      1,907,540,278      branch-misses                    #    7.91% of all branches             (83.34%)
>
>        8.902784776 seconds time elapsed
>
>
> [root@five ~]#
> [root@five ~]# perf stat -e dram_channel_data_controller_4
> ^C
>  Performance counter stats for 'system wide':
>
>                  0      dram_channel_data_controller_4
>
>        1.189638741 seconds time elapsed
>
>
> [root@five ~]#
>
> I.e. -M should bail out at that point (Cannot find metric or group `dram_channel_data_controller_4'), no?

We could. I suspect the code has always just not bailed out. I'll put
together a patch adding the bail out.

Thanks,
Ian

> - Arnaldo
>
> > > Before announcing failure, the test can try multiple options for each
> > > available metric. After system-wide mode fails, retry once again with
> > > the "--metric-no-group" option.
> > >
> > > E.g.
> > >
> > >    $ sudo perf test -v 100
> > >
> > > Before:
> > >
> > >    100: perf all metrics test                                           :
> > >    --- start ---
> > >    test child forked, pid 672731
> > >    Testing branch_misprediction_ratio
> > >    Testing all_remote_links_outbound
> > >    Testing nps1_die_to_dram
> > >    Metric 'nps1_die_to_dram' not printed in:
> > >    Error:
> > >    Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'.
> > >    Testing macro_ops_dispatched
> > >    Testing all_l2_cache_accesses
> > >    Testing all_l2_cache_hits
> > >    Testing all_l2_cache_misses
> > >    Testing ic_fetch_miss_ratio
> > >    Testing l2_cache_accesses_from_l2_hwpf
> > >    Testing l2_cache_misses_from_l2_hwpf
> > >    Testing op_cache_fetch_miss_ratio
> > >    Testing l3_read_miss_latency
> > >    Testing l1_itlb_misses
> > >    test child finished with -1
> > >    ---- end ----
> > >    perf all metrics test: FAILED!
> > >
> > > After:
> > >
> > >    100: perf all metrics test                                           :
> > >    --- start ---
> > >    test child forked, pid 672887
> > >    Testing branch_misprediction_ratio
> > >    Testing all_remote_links_outbound
> > >    Testing nps1_die_to_dram
> > >    Testing macro_ops_dispatched
> > >    Testing all_l2_cache_accesses
> > >    Testing all_l2_cache_hits
> > >    Testing all_l2_cache_misses
> > >    Testing ic_fetch_miss_ratio
> > >    Testing l2_cache_accesses_from_l2_hwpf
> > >    Testing l2_cache_misses_from_l2_hwpf
> > >    Testing op_cache_fetch_miss_ratio
> > >    Testing l3_read_miss_latency
> > >    Testing l1_itlb_misses
> > >    test child finished with 0
> > >    ---- end ----
> > >    perf all metrics test: Ok
> > >
> >
> > Issue gets resolved after applying this patch
> >
> >   $ ./perf test 102 -vvv
> >   102: perf all metrics test                                           :
> >   --- start ---
> >   test child forked, pid 244991
> >   Testing branch_misprediction_ratio
> >   Testing all_remote_links_outbound
> >   Testing nps1_die_to_dram
> >   Testing all_l2_cache_accesses
> >   Testing all_l2_cache_hits
> >   Testing all_l2_cache_misses
> >   Testing ic_fetch_miss_ratio
> >   Testing l2_cache_accesses_from_l2_hwpf
> >   Testing l2_cache_misses_from_l2_hwpf
> >   Testing l3_read_miss_latency
> >   Testing l1_itlb_misses
> >   test child finished with 0
> >   ---- end ----
> >   perf all metrics test: Ok
> >
> > > Reported-by: Ayush Jain <ayush.jain3@amd.com>
> > > Signed-off-by: Sandipan Das <sandipan.das@amd.com>
> >
> > Tested-by: Ayush Jain <ayush.jain3@amd.com>
> >
> > > ---
> > >   tools/perf/tests/shell/stat_all_metrics.sh | 7 +++++++
> > >   1 file changed, 7 insertions(+)
> > >
> > > diff --git a/tools/perf/tests/shell/stat_all_metrics.sh b/tools/perf/tests/shell/stat_all_metrics.sh
> > > index 54774525e18a..1e88ea8c5677 100755
> > > --- a/tools/perf/tests/shell/stat_all_metrics.sh
> > > +++ b/tools/perf/tests/shell/stat_all_metrics.sh
> > > @@ -16,6 +16,13 @@ for m in $(perf list --raw-dump metrics); do
> > >     then
> > >       continue
> > >     fi
> > > +  # Failed again, possibly there are not enough counters so retry system wide
> > > +  # mode but without event grouping.
> > > +  result=$(perf stat -M "$m" --metric-no-group -a sleep 0.01 2>&1)
> > > +  if [[ "$result" =~ ${m:0:50} ]]
> > > +  then
> > > +    continue
> > > +  fi
> > >     # Failed again, possibly the workload was too small so retry with something
> > >     # longer.
> > >     result=$(perf stat -M "$m" perf bench internals synthesize 2>&1)
> >
> > Thanks & Regards,
> > Ayush Jain
>
> --
>
> - Arnaldo
  
Arnaldo Carvalho de Melo Dec. 6, 2023, 5:54 p.m. UTC | #6
On Wed, Dec 06, 2023 at 08:35:23AM -0800, Ian Rogers wrote:
> On Wed, Dec 6, 2023 at 5:08 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > Hmm, I'm not able to reproduce the problem here before applying
> > this patch:
 
> Please don't apply the patch. The patch masks a bug in metrics/PMUs

I didn't

> and the proper fix was:
> 8d40f74ebf21 perf vendor events amd: Fix large metrics
> https://lore.kernel.org/r/20230706063440.54189-1-sandipan.das@amd.com

that is upstream:

⬢[acme@toolbox perf-tools-next]$ git log tools/perf/pmu-events/arch/x86/amdzen1/recommended.json
commit 8d40f74ebf217d3b9e9b7481721e6236b857cc55
Author: Sandipan Das <sandipan.das@amd.com>
Date:   Thu Jul 6 12:04:40 2023 +0530

    perf vendor events amd: Fix large metrics

    There are cases where a metric requires more events than the number of
    available counters. E.g. AMD Zen, Zen 2 and Zen 3 processors have four
    data fabric counters but the "nps1_die_to_dram" metric has eight events.

    By default, the constituent events are placed in a group and since the
    events cannot be scheduled at the same time, the metric is not computed.
    The "all metrics" test also fails because of this.

    Use the NO_GROUP_EVENTS constraint for such metrics which anyway expect
    the user to run perf with "--metric-no-group".

    E.g.

      $ sudo perf test -v 101

    Before:

      101: perf all metrics test                                           :
      --- start ---
      test child forked, pid 37131
      Testing branch_misprediction_ratio
      Testing all_remote_links_outbound
      Testing nps1_die_to_dram
      Metric 'nps1_die_to_dram' not printed in:
      Error:
      Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'.
      Testing macro_ops_dispatched
      Testing all_l2_cache_accesses
      Testing all_l2_cache_hits
      Testing all_l2_cache_misses
      Testing ic_fetch_miss_ratio
      Testing l2_cache_accesses_from_l2_hwpf
      Testing l2_cache_misses_from_l2_hwpf
      Testing op_cache_fetch_miss_ratio
      Testing l3_read_miss_latency
      Testing l1_itlb_misses
      test child finished with -1
      ---- end ----
      perf all metrics test: FAILED!

    After:

      101: perf all metrics test                                           :
      --- start ---
      test child forked, pid 43766
      Testing branch_misprediction_ratio
      Testing all_remote_links_outbound
      Testing nps1_die_to_dram
      Testing macro_ops_dispatched
      Testing all_l2_cache_accesses
      Testing all_l2_cache_hits
      Testing all_l2_cache_misses
      Testing ic_fetch_miss_ratio
      Testing l2_cache_accesses_from_l2_hwpf
      Testing l2_cache_misses_from_l2_hwpf
      Testing op_cache_fetch_miss_ratio
      Testing l3_read_miss_latency
      Testing l1_itlb_misses
      test child finished with 0
      ---- end ----
      perf all metrics test: Ok

    Reported-by: Ayush Jain <ayush.jain3@amd.com>
    Suggested-by: Ian Rogers <irogers@google.com>
    Signed-off-by: Sandipan Das <sandipan.das@amd.com>
    Acked-by: Ian Rogers <irogers@google.com>
    Cc: Adrian Hunter <adrian.hunter@intel.com>
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Cc: Ananth Narayan <ananth.narayan@amd.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Ravi Bangoria <ravi.bangoria@amd.com>
    Cc: Santosh Shukla <santosh.shukla@amd.com>
    Link: https://lore.kernel.org/r/20230706063440.54189-1-sandipan.das@amd.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 
> > Ian, I also stumbled on this:

> > [root@five ~]# perf stat -M dram_channel_data_controller_4
> > Cannot find metric or group `dram_channel_data_controller_4'
> > ^C
> >  Performance counter stats for 'system wide':

> >         284,908.91 msec cpu-clock                        #   32.002 CPUs utilized
> >          6,485,456      context-switches                 #   22.763 K/sec
> >                719      cpu-migrations                   #    2.524 /sec
> >             32,800      page-faults                      #  115.125 /sec

<SNIP>

> > I.e. -M should bail out at that point (Cannot find metric or group `dram_channel_data_controller_4'), no?

> We could. I suspect the code has always just not bailed out. I'll put
> together a patch adding the bail out.

Great, thanks,

- Arnaldo
  
Ian Rogers Dec. 6, 2023, 6:50 p.m. UTC | #7
On Wed, Dec 6, 2023 at 9:54 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
>
> On Wed, Dec 06, 2023 at 08:35:23AM -0800, Ian Rogers wrote:
> > On Wed, Dec 6, 2023 at 5:08 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > > Hmm, I'm not able to reproduce the problem here before applying
> > > this patch:
>
> > Please don't apply the patch. The patch masks a bug in metrics/PMUs
>
> I didn't
>
> > and the proper fix was:
> > 8d40f74ebf21 perf vendor events amd: Fix large metrics
> > https://lore.kernel.org/r/20230706063440.54189-1-sandipan.das@amd.com
>
> that is upstream:
>
> ⬢[acme@toolbox perf-tools-next]$ git log tools/perf/pmu-events/arch/x86/amdzen1/recommended.json
> commit 8d40f74ebf217d3b9e9b7481721e6236b857cc55
> Author: Sandipan Das <sandipan.das@amd.com>
> Date:   Thu Jul 6 12:04:40 2023 +0530
>
>     perf vendor events amd: Fix large metrics
>
>     There are cases where a metric requires more events than the number of
>     available counters. E.g. AMD Zen, Zen 2 and Zen 3 processors have four
>     data fabric counters but the "nps1_die_to_dram" metric has eight events.
>
>     By default, the constituent events are placed in a group and since the
>     events cannot be scheduled at the same time, the metric is not computed.
>     The "all metrics" test also fails because of this.
>
>     Use the NO_GROUP_EVENTS constraint for such metrics which anyway expect
>     the user to run perf with "--metric-no-group".
>
>     E.g.
>
>       $ sudo perf test -v 101
>
>     Before:
>
>       101: perf all metrics test                                           :
>       --- start ---
>       test child forked, pid 37131
>       Testing branch_misprediction_ratio
>       Testing all_remote_links_outbound
>       Testing nps1_die_to_dram
>       Metric 'nps1_die_to_dram' not printed in:
>       Error:
>       Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'.
>       Testing macro_ops_dispatched
>       Testing all_l2_cache_accesses
>       Testing all_l2_cache_hits
>       Testing all_l2_cache_misses
>       Testing ic_fetch_miss_ratio
>       Testing l2_cache_accesses_from_l2_hwpf
>       Testing l2_cache_misses_from_l2_hwpf
>       Testing op_cache_fetch_miss_ratio
>       Testing l3_read_miss_latency
>       Testing l1_itlb_misses
>       test child finished with -1
>       ---- end ----
>       perf all metrics test: FAILED!
>
>     After:
>
>       101: perf all metrics test                                           :
>       --- start ---
>       test child forked, pid 43766
>       Testing branch_misprediction_ratio
>       Testing all_remote_links_outbound
>       Testing nps1_die_to_dram
>       Testing macro_ops_dispatched
>       Testing all_l2_cache_accesses
>       Testing all_l2_cache_hits
>       Testing all_l2_cache_misses
>       Testing ic_fetch_miss_ratio
>       Testing l2_cache_accesses_from_l2_hwpf
>       Testing l2_cache_misses_from_l2_hwpf
>       Testing op_cache_fetch_miss_ratio
>       Testing l3_read_miss_latency
>       Testing l1_itlb_misses
>       test child finished with 0
>       ---- end ----
>       perf all metrics test: Ok
>
>     Reported-by: Ayush Jain <ayush.jain3@amd.com>
>     Suggested-by: Ian Rogers <irogers@google.com>
>     Signed-off-by: Sandipan Das <sandipan.das@amd.com>
>     Acked-by: Ian Rogers <irogers@google.com>
>     Cc: Adrian Hunter <adrian.hunter@intel.com>
>     Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
>     Cc: Ananth Narayan <ananth.narayan@amd.com>
>     Cc: Ingo Molnar <mingo@redhat.com>
>     Cc: Jiri Olsa <jolsa@kernel.org>
>     Cc: Mark Rutland <mark.rutland@arm.com>
>     Cc: Namhyung Kim <namhyung@kernel.org>
>     Cc: Peter Zijlstra <peterz@infradead.org>
>     Cc: Ravi Bangoria <ravi.bangoria@amd.com>
>     Cc: Santosh Shukla <santosh.shukla@amd.com>
>     Link: https://lore.kernel.org/r/20230706063440.54189-1-sandipan.das@amd.com
>     Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
>
> > > Ian, I also stumbled on this:
>
> > > [root@five ~]# perf stat -M dram_channel_data_controller_4
> > > Cannot find metric or group `dram_channel_data_controller_4'
> > > ^C
> > >  Performance counter stats for 'system wide':
>
> > >         284,908.91 msec cpu-clock                        #   32.002 CPUs utilized
> > >          6,485,456      context-switches                 #   22.763 K/sec
> > >                719      cpu-migrations                   #    2.524 /sec
> > >             32,800      page-faults                      #  115.125 /sec
>
> <SNIP>
>
> > > I.e. -M should bail out at that point (Cannot find metric or group `dram_channel_data_controller_4'), no?
>
> > We could. I suspect the code has always just not bailed out. I'll put
> > together a patch adding the bail out.
>
> Great, thanks,

Sent:
https://lore.kernel.org/lkml/20231206183533.972028-1-irogers@google.com/

Thanks,
Ian

> - Arnaldo
  

Patch

diff --git a/tools/perf/tests/shell/stat_all_metrics.sh b/tools/perf/tests/shell/stat_all_metrics.sh
index 54774525e18a..1e88ea8c5677 100755
--- a/tools/perf/tests/shell/stat_all_metrics.sh
+++ b/tools/perf/tests/shell/stat_all_metrics.sh
@@ -16,6 +16,13 @@  for m in $(perf list --raw-dump metrics); do
   then
     continue
   fi
+  # Failed again, possibly there are not enough counters so retry system wide
+  # mode but without event grouping.
+  result=$(perf stat -M "$m" --metric-no-group -a sleep 0.01 2>&1)
+  if [[ "$result" =~ ${m:0:50} ]]
+  then
+    continue
+  fi
   # Failed again, possibly the workload was too small so retry with something
   # longer.
   result=$(perf stat -M "$m" perf bench internals synthesize 2>&1)
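
For readers unfamiliar with the test script, the pass condition used by each
retry in the hunk above can be sketched in isolation. This is only an
illustration, not part of the patch: the helper name metric_printed and the
two canned output strings are assumptions made for the example (the strings
are lifted from output shown in this thread), and perf itself is not invoked.
The only piece taken from the script is the [[ "$result" =~ ${m:0:50} ]]
check, which treats an attempt as successful when the first 50 characters of
the metric name appear anywhere in the captured perf stat output.

```shell
#!/bin/bash
# Hypothetical helper (not in stat_all_metrics.sh): succeeds when the captured
# output mentions the metric, using the same check as the test script.
metric_printed() {
  local m="$1" result="$2"
  # Only the first 50 characters of the metric name are matched.
  [[ "$result" =~ ${m:0:50} ]]
}

m="nps1_die_to_dram"

# Canned outputs copied from the thread, standing in for real perf stat runs.
failing_output="Error:
Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'."
passing_output="0      dram_channel_data_controller_4   #  10297.2 MiB  nps1_die_to_dram       (49.97%)"

# Walk the attempts in order and stop at the first one that prints the metric,
# mirroring the "continue on success, otherwise retry" flow of the test.
first_passing=""
for attempt in failing passing; do
  var="${attempt}_output"
  if metric_printed "$m" "${!var}"; then
    first_passing="$attempt"
    break
  fi
done

echo "first passing attempt: ${first_passing:-none}"
```

Note that the check only looks for the metric name in the output, so an error
message that names the events but not the metric, as in the first canned
string, counts as a failed attempt.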