[RFC,0/6] Add metrics for neoverse-n2

Message ID 1667214694-89839-1-git-send-email-renyu.zj@linux.alibaba.com
Series: Add metrics for neoverse-n2

Message

Jing Zhang Oct. 31, 2022, 11:11 a.m. UTC
This series adds six metric groups for neoverse-n2. Among them, the
formula of topdown L1 comes from the document:
https://documentation-service.arm.com/static/60250c7395978b529036da86?token=

Since neoverse-n2 does not yet support topdown L2, metric groups such
as Cache, TLB, Branch, InstructionMix, and PEutilization are added to
help with further analysis of performance bottlenecks.

with this series on neoverse-n2:

$./perf list metricgroup

List of pre-defined events (to be used in -e):


Metric Groups:

Branch
Cache
InstructionMix
PEutilization
TLB
TopDownL1


$./perf list

...
Metric Groups:

Branch:
  branch_miss_pred_rate
       [The rate of branches mis-predicted to the overall branches]
  branch_mpki
       [The rate of branches mis-predicted per kilo instructions]
  branch_pki
       [The rate of branches retired per kilo instructions]
Cache:
  l1d_cache_miss_rate
       [The rate of L1 D-Cache misses to the overall L1 D-Cache]
  l1d_cache_mpki
       [The rate of L1 D-Cache misses per kilo instructions]
...


$sudo ./perf stat -a -M TLB sleep 1

 Performance counter stats for 'system wide':

        35,861,936      L1I_TLB                          #     0.00 itlb_walk_rate           (74.91%)
             5,661      ITLB_WALK                                                            (74.91%)
        97,279,240      INST_RETIRED                     #     0.07 itlb_mpki                (74.91%)
             6,851      ITLB_WALK                                                            (74.91%)
            26,391      DTLB_WALK                        #     0.00 dtlb_walk_rate           (75.07%)
        35,585,545      L1D_TLB                                                              (75.07%)
        85,923,244      INST_RETIRED                     #     0.35 dtlb_mpki                (75.11%)
            29,992      DTLB_WALK                                                            (75.11%)

       1.003450755 seconds time elapsed
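
Each metric above corresponds to one entry in the new metrics.json. As a
rough illustration (the exact expressions and descriptions live in the
individual patches), the dtlb_walk_rate metric looks something like:

    {
        "MetricExpr": "DTLB_WALK / L1D_TLB",
        "PublicDescription": "The rate of D-side TLB walks to the overall L1 D-TLB accesses",
        "BriefDescription": "The rate of D-side TLB walks to the overall L1 D-TLB accesses",
        "MetricGroup": "TLB",
        "MetricName": "dtlb_walk_rate"
    },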
       

Jing Zhang (6):
  perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
  perf vendor events arm64: Add TLB metrics for neoverse-n2
  perf vendor events arm64: Add cache metrics for neoverse-n2
  perf vendor events arm64: Add branch metrics for neoverse-n2
  perf vendor events arm64: Add PE utilization metrics for neoverse-n2
  perf vendor events arm64: Add instruction mix metrics for neoverse-n2

 .../arch/arm64/arm/neoverse-n2/metrics.json        | 247 +++++++++++++++++++++
 1 file changed, 247 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
  

Comments

James Clark Nov. 16, 2022, 11:19 a.m. UTC | #1
On 31/10/2022 11:11, Jing Zhang wrote:
> This series add six metricgroups for neoverse-n2, among which, the
> formula of topdown L1 is from the document:
> https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
> 
> Since neoverse-n2 does not yet support topdown L2, metricgroups such
> as Cache, TLB, Branch, InstructionsMix, and PEutilization are added to
> help further analysis of performance bottlenecks.
> 

Hi Jing,

Thanks for working on this, these metrics look ok to me in general,
although we're currently working on publishing standardised metrics
across all new cores as part of a new project in Arm. This will include
N2, and our ones are very similar (or almost identical) to yours,
barring slightly different group names, metric names, and differences in
things like outputting topdown metrics as percentages.

We plan to publish our standard metrics some time in the next 2 months.
Would you consider holding off on merging this change so that we have
consistent group names and units going forward? Otherwise N2 would be
the odd one out. I will send you the metrics when they are ready, and we
will have a script to generate perf jsons from them, so you can review.

We also have a slightly different formula for one of the top down
metrics which I think would be slightly more accurate. We don't have
anything for your "PE utilization" metrics, which I can raise
internally. It could always be added to perf on top of the standardised
ones if we don't add it to our standard ones.

Thanks
James

> with this series on neoverse-n2:
> 
> $./perf list metricgroup
> 
> List of pre-defined events (to be used in -e):
> 
> 
> Metric Groups:
> 
> Branch
> Cache
> InstructionMix
> PEutilization
> TLB
> TopDownL1
> 
> 
> $./perf list
> 
> ...
> Metric Groups:
> 
> Branch:
>   branch_miss_pred_rate
>        [The rate of branches mis-predited to the overall branches]
>   branch_mpki
>        [The rate of branches mis-predicted per kilo instructions]
>   branch_pki
>        [The rate of branches retired per kilo instructions]
> Cache:
>   l1d_cache_miss_rate
>        [The rate of L1 D-Cache misses to the overall L1 D-Cache]
>   l1d_cache_mpki
>        [The rate of L1 D-Cache misses per kilo instructions]
> ...
> 
> 
> $sudo ./perf stat -a -M TLB sleep 1
> 
>  Performance counter stats for 'system wide':
> 
>         35,861,936      L1I_TLB                          #     0.00 itlb_walk_rate           (74.91%)
>              5,661      ITLB_WALK                                                            (74.91%)
>         97,279,240      INST_RETIRED                     #     0.07 itlb_mpki                (74.91%)
>              6,851      ITLB_WALK                                                            (74.91%)
>             26,391      DTLB_WALK                        #     0.00 dtlb_walk_rate           (75.07%)
>         35,585,545      L1D_TLB                                                              (75.07%)
>         85,923,244      INST_RETIRED                     #     0.35 dtlb_mpki                (75.11%)
>             29,992      DTLB_WALK                                                            (75.11%)
> 
>        1.003450755 seconds time elapsed
>        
> 
> Jing Zhang (6):
>   perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
>   perf vendor events arm64: Add TLB metrics for neoverse-n2
>   perf vendor events arm64: Add cache metrics for neoverse-n2
>   perf vendor events arm64: Add branch metrics for neoverse-n2
>   perf vendor events arm64: Add PE utilization metrics for neoverse-n2
>   perf vendor events arm64: Add instruction mix metrics for neoverse-n2
> 
>  .../arch/arm64/arm/neoverse-n2/metrics.json        | 247 +++++++++++++++++++++
>  1 file changed, 247 insertions(+)
>  create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>
  
Jing Zhang Nov. 16, 2022, 3:26 p.m. UTC | #2
On 2022/11/16 7:19 PM, James Clark wrote:
> 
> 
> On 31/10/2022 11:11, Jing Zhang wrote:
>> This series add six metricgroups for neoverse-n2, among which, the
>> formula of topdown L1 is from the document:
>> https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
>>
>> Since neoverse-n2 does not yet support topdown L2, metricgroups such
>> as Cache, TLB, Branch, InstructionsMix, and PEutilization are added to
>> help further analysis of performance bottlenecks.
>>
> 
> Hi Jing,
> 
> Thanks for working on this, these metrics look ok to me in general,
> although we're currently working on publishing standardised metrics
> across all new cores as part of a new project in Arm. This will include
> N2, and our ones are very similar (or almost identical) to yours,
> barring slightly different group names, metric names, and differences in
> things like outputting topdown metrics as percentages.
> 
> We plan to publish our standard metrics some time in the next 2 months.
> Would you consider holding off on merging this change so that we have
> consistant group names and units going forward? Otherwise N2 would be
> the odd one out. I will send you the metrics when they are ready, and we
> will have a script to generate perf jsons from them, so you can review.
> 

Do you mean that after you release the new standard metrics, I should
rework my patches to follow them, with consistent group names and units?


> We also have a slightly different forumula for one of the top down
> metrics which I think would be slightly more accurate. We don't have


The v2 version of the patchset updated the formula of topdown L1.
Link: https://lore.kernel.org/all/1668411720-3581-1-git-send-email-renyu.zj@linux.alibaba.com/

The v2 formula is more accurate than the v1 one, and it has been verified
in our test environment. Could you share your formula first so we can
discuss it together? :)

Thanks,
Jing


> anything for your "PE utilization" metrics, which I can raise
> internally. It could always be added to perf on top of the standardised
> ones if we don't add it to our standard ones.
> 
> Thanks
> James
> 
>> with this series on neoverse-n2:
>>
>> $./perf list metricgroup
>>
>> List of pre-defined events (to be used in -e):
>>
>>
>> Metric Groups:
>>
>> Branch
>> Cache
>> InstructionMix
>> PEutilization
>> TLB
>> TopDownL1
>>
>>
>> $./perf list
>>
>> ...
>> Metric Groups:
>>
>> Branch:
>>   branch_miss_pred_rate
>>        [The rate of branches mis-predited to the overall branches]
>>   branch_mpki
>>        [The rate of branches mis-predicted per kilo instructions]
>>   branch_pki
>>        [The rate of branches retired per kilo instructions]
>> Cache:
>>   l1d_cache_miss_rate
>>        [The rate of L1 D-Cache misses to the overall L1 D-Cache]
>>   l1d_cache_mpki
>>        [The rate of L1 D-Cache misses per kilo instructions]
>> ...
>>
>>
>> $sudo ./perf stat -a -M TLB sleep 1
>>
>>  Performance counter stats for 'system wide':
>>
>>         35,861,936      L1I_TLB                          #     0.00 itlb_walk_rate           (74.91%)
>>              5,661      ITLB_WALK                                                            (74.91%)
>>         97,279,240      INST_RETIRED                     #     0.07 itlb_mpki                (74.91%)
>>              6,851      ITLB_WALK                                                            (74.91%)
>>             26,391      DTLB_WALK                        #     0.00 dtlb_walk_rate           (75.07%)
>>         35,585,545      L1D_TLB                                                              (75.07%)
>>         85,923,244      INST_RETIRED                     #     0.35 dtlb_mpki                (75.11%)
>>             29,992      DTLB_WALK                                                            (75.11%)
>>
>>        1.003450755 seconds time elapsed
>>        
>>
>> Jing Zhang (6):
>>   perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
>>   perf vendor events arm64: Add TLB metrics for neoverse-n2
>>   perf vendor events arm64: Add cache metrics for neoverse-n2
>>   perf vendor events arm64: Add branch metrics for neoverse-n2
>>   perf vendor events arm64: Add PE utilization metrics for neoverse-n2
>>   perf vendor events arm64: Add instruction mix metrics for neoverse-n2
>>
>>  .../arch/arm64/arm/neoverse-n2/metrics.json        | 247 +++++++++++++++++++++
>>  1 file changed, 247 insertions(+)
>>  create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>
  
Jing Zhang Nov. 19, 2022, 3:30 a.m. UTC | #3
On 2022/11/16 7:19 PM, James Clark wrote:
> 
> 
> On 31/10/2022 11:11, Jing Zhang wrote:
>> This series add six metricgroups for neoverse-n2, among which, the
>> formula of topdown L1 is from the document:
>> https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
>>
>> Since neoverse-n2 does not yet support topdown L2, metricgroups such
>> as Cache, TLB, Branch, InstructionsMix, and PEutilization are added to
>> help further analysis of performance bottlenecks.
>>
> 
> Hi Jing,
> 
> Thanks for working on this, these metrics look ok to me in general,
> although we're currently working on publishing standardised metrics
> across all new cores as part of a new project in Arm. This will include
> N2, and our ones are very similar (or almost identical) to yours,
> barring slightly different group names, metric names, and differences in
> things like outputting topdown metrics as percentages.
> 
> We plan to publish our standard metrics some time in the next 2 months.
> Would you consider holding off on merging this change so that we have
> consistant group names and units going forward? Otherwise N2 would be
> the odd one out. I will send you the metrics when they are ready, and we
> will have a script to generate perf jsons from them, so you can review.
> 
> We also have a slightly different forumula for one of the top down
> metrics which I think would be slightly more accurate. We don't have
> anything for your "PE utilization" metrics, which I can raise
> internally. It could always be added to perf on top of the standardised
> ones if we don't add it to our standard ones.
> 
> Thanks
> James
> 

Hi James,

Regarding the Arm N2 standard metrics we discussed last time, is my
understanding correct, and does it match what you meant? If so, may I ask
when you will send me the standard metrics you are formulating, so that I
can align my patchset with them in time? Please keep me updated so that we
can understand each other's schedules.

Thanks,
Jing


>> with this series on neoverse-n2:
>>
>> $./perf list metricgroup
>>
>> List of pre-defined events (to be used in -e):
>>
>>
>> Metric Groups:
>>
>> Branch
>> Cache
>> InstructionMix
>> PEutilization
>> TLB
>> TopDownL1
>>
>>
>> $./perf list
>>
>> ...
>> Metric Groups:
>>
>> Branch:
>>   branch_miss_pred_rate
>>        [The rate of branches mis-predited to the overall branches]
>>   branch_mpki
>>        [The rate of branches mis-predicted per kilo instructions]
>>   branch_pki
>>        [The rate of branches retired per kilo instructions]
>> Cache:
>>   l1d_cache_miss_rate
>>        [The rate of L1 D-Cache misses to the overall L1 D-Cache]
>>   l1d_cache_mpki
>>        [The rate of L1 D-Cache misses per kilo instructions]
>> ...
>>
>>
>> $sudo ./perf stat -a -M TLB sleep 1
>>
>>  Performance counter stats for 'system wide':
>>
>>         35,861,936      L1I_TLB                          #     0.00 itlb_walk_rate           (74.91%)
>>              5,661      ITLB_WALK                                                            (74.91%)
>>         97,279,240      INST_RETIRED                     #     0.07 itlb_mpki                (74.91%)
>>              6,851      ITLB_WALK                                                            (74.91%)
>>             26,391      DTLB_WALK                        #     0.00 dtlb_walk_rate           (75.07%)
>>         35,585,545      L1D_TLB                                                              (75.07%)
>>         85,923,244      INST_RETIRED                     #     0.35 dtlb_mpki                (75.11%)
>>             29,992      DTLB_WALK                                                            (75.11%)
>>
>>        1.003450755 seconds time elapsed
>>        
>>
>> Jing Zhang (6):
>>   perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
>>   perf vendor events arm64: Add TLB metrics for neoverse-n2
>>   perf vendor events arm64: Add cache metrics for neoverse-n2
>>   perf vendor events arm64: Add branch metrics for neoverse-n2
>>   perf vendor events arm64: Add PE utilization metrics for neoverse-n2
>>   perf vendor events arm64: Add instruction mix metrics for neoverse-n2
>>
>>  .../arch/arm64/arm/neoverse-n2/metrics.json        | 247 +++++++++++++++++++++
>>  1 file changed, 247 insertions(+)
>>  create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>
  
Jing Zhang Nov. 20, 2022, 3:49 a.m. UTC | #4
On 2022/11/20 5:46 AM, Ian Rogers wrote:
> On Fri, Nov 18, 2022 at 7:30 PM Jing Zhang <renyu.zj@linux.alibaba.com> wrote:
>>
>>
>> On 2022/11/16 7:19 PM, James Clark wrote:
>> >
>> >
>> > On 31/10/2022 11:11, Jing Zhang wrote:
>> >> This series add six metricgroups for neoverse-n2, among which, the
>> >> formula of topdown L1 is from the document:
>> >> https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
>> >>
>> >> Since neoverse-n2 does not yet support topdown L2, metricgroups such
>> >> as Cache, TLB, Branch, InstructionsMix, and PEutilization are added to
>> >> help further analysis of performance bottlenecks.
>> >>
>> >
>> > Hi Jing,
>> >
>> > Thanks for working on this, these metrics look ok to me in general,
>> > although we're currently working on publishing standardised metrics
>> > across all new cores as part of a new project in Arm. This will include
>> > N2, and our ones are very similar (or almost identical) to yours,
>> > barring slightly different group names, metric names, and differences in
>> > things like outputting topdown metrics as percentages.
>> >
>> > We plan to publish our standard metrics some time in the next 2 months.
>> > Would you consider holding off on merging this change so that we have
>> > consistant group names and units going forward? Otherwise N2 would be
>> > the odd one out. I will send you the metrics when they are ready, and we
>> > will have a script to generate perf jsons from them, so you can review.
>> >
>> > We also have a slightly different forumula for one of the top down
>> > metrics which I think would be slightly more accurate. We don't have
>> > anything for your "PE utilization" metrics, which I can raise
>> > internally. It could always be added to perf on top of the standardised
>> > ones if we don't add it to our standard ones.
>> >
>> > Thanks
>> > James
>> >
>>
>> Hi James,
>>
>> Regarding the arm n2 standard metrics last time, is my understanding correct,
>> and does it meet your meaning? If so, may I ask when you will send me the
>> standards you formulate so that I can align with you in time over my patchset.
>> Please communicate this matter so that we can understand each other's schedule.
>>
>> Thanks,
>> Jing
> 
> Hi,
> 
> In past versions of the perf tool the metrics have been pretty broken.
> If we have something that is good we shouldn't be holding it to a bar of
> being perfect, we can merge what we have and improve over time. In this
> case what Jing has prepared may arrive in time for Linux 6.2 whilst the
> standard metrics may arrive in time for 6.3. I'd suggest merging Jing's
> work and then improving on it with the standard metrics.
> 
> In terms of the metrics themselves, could we add ScaleUnit? For example:
> 
> +    {
> +        "MetricExpr": "LD_SPEC / INST_SPEC",
> +        "PublicDescription": "The rate of load instructions speculatively executed to overall instructions speclatively executed",
> +        "BriefDescription": "The rate of load instructions speculatively executed to overall instructions speclatively executed",
> +        "MetricGroup": "InstructionMix",
> +        "MetricName": "load_spec_rate"
> +    },
> 
> A ScaleUnit of "100%" would likely make things more readable.
> 

OK, I'll modify it per your suggestion to make it more readable, and move on with it.
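
For example, the entry above might become something like this (a sketch,
assuming a ScaleUnit of "100%" is what you have in mind):

    {
        "MetricExpr": "LD_SPEC / INST_SPEC",
        "PublicDescription": "The rate of load instructions speculatively executed to overall instructions speculatively executed",
        "BriefDescription": "The rate of load instructions speculatively executed to overall instructions speculatively executed",
        "MetricGroup": "InstructionMix",
        "MetricName": "load_spec_rate",
        "ScaleUnit": "100%"
    },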

Thanks,
Jing

> Thanks,
> Ian
> 
>> >> with this series on neoverse-n2:
>> >>
>> >> $./perf list metricgroup
>> >>
>> >> List of pre-defined events (to be used in -e):
>> >>
>> >>
>> >> Metric Groups:
>> >>
>> >> Branch
>> >> Cache
>> >> InstructionMix
>> >> PEutilization
>> >> TLB
>> >> TopDownL1
>> >>
>> >>
>> >> $./perf list
>> >>
>> >> ...
>> >> Metric Groups:
>> >>
>> >> Branch:
>> >>   branch_miss_pred_rate
>> >>        [The rate of branches mis-predited to the overall branches]
>> >>   branch_mpki
>> >>        [The rate of branches mis-predicted per kilo instructions]
>> >>   branch_pki
>> >>        [The rate of branches retired per kilo instructions]
>> >> Cache:
>> >>   l1d_cache_miss_rate
>> >>        [The rate of L1 D-Cache misses to the overall L1 D-Cache]
>> >>   l1d_cache_mpki
>> >>        [The rate of L1 D-Cache misses per kilo instructions]
>> >> ...
>> >>
>> >>
>> >> $sudo ./perf stat -a -M TLB sleep 1
>> >>
>> >>  Performance counter stats for 'system wide':
>> >>
>> >>         35,861,936      L1I_TLB                          #     0.00 itlb_walk_rate           (74.91%)
>> >>              5,661      ITLB_WALK                                                            (74.91%)
>> >>         97,279,240      INST_RETIRED                     #     0.07 itlb_mpki                (74.91%)
>> >>              6,851      ITLB_WALK                                                            (74.91%)
>> >>             26,391      DTLB_WALK                        #     0.00 dtlb_walk_rate           (75.07%)
>> >>         35,585,545      L1D_TLB                                                              (75.07%)
>> >>         85,923,244      INST_RETIRED                     #     0.35 dtlb_mpki                (75.11%)
>> >>             29,992      DTLB_WALK                                                            (75.11%)
>> >>
>> >>        1.003450755 seconds time elapsed
>> >>     
>> >>
>> >> Jing Zhang (6):
>> >>   perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
>> >>   perf vendor events arm64: Add TLB metrics for neoverse-n2
>> >>   perf vendor events arm64: Add cache metrics for neoverse-n2
>> >>   perf vendor events arm64: Add branch metrics for neoverse-n2
>> >>   perf vendor events arm64: Add PE utilization metrics for neoverse-n2
>> >>   perf vendor events arm64: Add instruction mix metrics for neoverse-n2
>> >>
>> >>  .../arch/arm64/arm/neoverse-n2/metrics.json        | 247 +++++++++++++++++++++
>> >>  1 file changed, 247 insertions(+)
>> >>  create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>> >>
  
James Clark Nov. 21, 2022, 11:51 a.m. UTC | #5
On 16/11/2022 15:26, Jing Zhang wrote:
> 
> 
> On 2022/11/16 7:19 PM, James Clark wrote:
>>
>>
>> On 31/10/2022 11:11, Jing Zhang wrote:
>>> This series add six metricgroups for neoverse-n2, among which, the
>>> formula of topdown L1 is from the document:
>>> https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
>>>
>>> Since neoverse-n2 does not yet support topdown L2, metricgroups such
>>> as Cache, TLB, Branch, InstructionsMix, and PEutilization are added to
>>> help further analysis of performance bottlenecks.
>>>
>>
>> Hi Jing,
>>
>> Thanks for working on this, these metrics look ok to me in general,
>> although we're currently working on publishing standardised metrics
>> across all new cores as part of a new project in Arm. This will include
>> N2, and our ones are very similar (or almost identical) to yours,
>> barring slightly different group names, metric names, and differences in
>> things like outputting topdown metrics as percentages.
>>
>> We plan to publish our standard metrics some time in the next 2 months.
>> Would you consider holding off on merging this change so that we have
>> consistant group names and units going forward? Otherwise N2 would be
>> the odd one out. I will send you the metrics when they are ready, and we
>> will have a script to generate perf jsons from them, so you can review.
>>
> 
> Do you mean that after you release the new standard metrics, I remake my
> patch referring to them, such as consistent group names and unit?

Hi Jing,

I was planning to submit the patch myself, but there will be a script to
generate perf json files, so no manual work would be needed. Although
this is complicated by the fact that we won't be publishing the fixed
TopdownL1 metrics that you have for the existing N2 silicon, so there
would be a one-time copy and paste to fix that part.

> 
> 
>> We also have a slightly different forumula for one of the top down
>> metrics which I think would be slightly more accurate. We don't have
> 
> 
> The v2 version of the patchset updated the formula of topdown L1.
> Link: https://lore.kernel.org/all/1668411720-3581-1-git-send-email-renyu.zj@linux.alibaba.com/
> 
> The formula of the v2 version is more accurate than v1, and it has been
> verified in our test environment. Can you share your formula first and we
> can discuss it together? :)

I was looking at v2 but replied to the root of the thread by mistake. I
also had it the wrong way round. So your version corrects for the errata
on the current version of N2 (as you mentioned in the commit message).
Our version would apply if there is a future silicon revision with that
fixed, but it does have an extra improvement: subtracting the branch
mispredicts.

Perf doesn't currently match the jsons based on silicon revision, so
we'd have to add something in for that if a fixed silicon version is
released. But this is another problem for another time.

This is the frontend bound metric we have for future revisions:

	"100 * ( (STALL_SLOT_FRONTEND/(CPU_CYCLES * 5)) - ((BR_MIS_PRED *
4)/CPU_CYCLES) )"

Other changes are, for example: for your 'wasted' metric we have
'bad_speculation', and without the cycles subtraction:

	100 * ( ((1 - (OP_RETIRED/OP_SPEC)) * (1 - (STALL_SLOT/(CPU_CYCLES * 5)))) + ((BR_MIS_PRED * 4)/CPU_CYCLES) )

And some more details filled in around the units, for example:

    {
        "MetricName": "bad_speculation",
        "MetricExpr": "100 * ( ((1 - (OP_RETIRED/OP_SPEC)) * (1 - (STALL_SLOT/(CPU_CYCLES * 5)))) + ((BR_MIS_PRED * 4)/CPU_CYCLES) )",
        "BriefDescription": "Bad Speculation",
        "PublicDescription": "This metric is the percentage of total slots that executed operations and didn't retire due to a pipeline flush.\nThis indicates cycles that were utilized but inefficiently.",
        "MetricGroup": "TopdownL1",
        "ScaleUnit": "1percent of slots"
    },

So ignoring the errata issue, the main reason to hold off is consistency
and avoiding churn, because these metrics in this format will be released
for all cores going forwards.

Thanks
James
  
James Clark Nov. 21, 2022, 11:55 a.m. UTC | #6
On 19/11/2022 21:46, Ian Rogers wrote:
> On Fri, Nov 18, 2022 at 7:30 PM Jing Zhang <renyu.zj@linux.alibaba.com>
> wrote:
>>
>>
>> On 2022/11/16 7:19 PM, James Clark wrote:
>>>
>>>
>>> On 31/10/2022 11:11, Jing Zhang wrote:
>>>> This series add six metricgroups for neoverse-n2, among which, the
>>>> formula of topdown L1 is from the document:
>>>>
> https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
>>>>
>>>> Since neoverse-n2 does not yet support topdown L2, metricgroups such
>>>> as Cache, TLB, Branch, InstructionsMix, and PEutilization are added to
>>>> help further analysis of performance bottlenecks.
>>>>
>>>
>>> Hi Jing,
>>>
>>> Thanks for working on this, these metrics look ok to me in general,
>>> although we're currently working on publishing standardised metrics
>>> across all new cores as part of a new project in Arm. This will include
>>> N2, and our ones are very similar (or almost identical) to yours,
>>> barring slightly different group names, metric names, and differences in
>>> things like outputting topdown metrics as percentages.
>>>
>>> We plan to publish our standard metrics some time in the next 2 months.
>>> Would you consider holding off on merging this change so that we have
>>> consistant group names and units going forward? Otherwise N2 would be
>>> the odd one out. I will send you the metrics when they are ready, and we
>>> will have a script to generate perf jsons from them, so you can review.
>>>
>>> We also have a slightly different forumula for one of the top down
>>> metrics which I think would be slightly more accurate. We don't have
>>> anything for your "PE utilization" metrics, which I can raise
>>> internally. It could always be added to perf on top of the standardised
>>> ones if we don't add it to our standard ones.
>>>
>>> Thanks
>>> James
>>>
>>
>> Hi James,
>>
>> Regarding the arm n2 standard metrics last time, is my understanding
> correct,
>> and does it meet your meaning? If so, may I ask when you will send me the
>> standards you formulate so that I can align with you in time over my
> patchset.
>> Please communicate this matter so that we can understand each other's
> schedule.
>>
>> Thanks,
>> Jing
> 
> Hi,
> 
> In past versions of the perf tool the metrics have been pretty broken. If
> we have something that is good we shouldn't be holding it to a bar of being
> perfect, we can merge what we have and improve over time. In this case what
> Jing has prepared may arrive in time for Linux 6.2 whilst the standard
> metrics may arrive in time for 6.3. I'd suggest merging Jing's work and
> then improving on it with the standard metrics.
>

I'm not completely opposed to this; I was just worried about the churn,
because ours will be generated from a script, and it would end up looking
like a mass replacement of metrics that had only recently been added.

But maybe that's fine like you say.

> In terms of the metrics themselves, could we add ScaleUnit? For example:
> 
> +    {
> +        "MetricExpr": "LD_SPEC / INST_SPEC",
> +        "PublicDescription": "The rate of load instructions speculatively
> executed to overall instructions speclatively executed",
> +        "BriefDescription": "The rate of load instructions speculatively
> executed to overall instructions speclatively executed",
> +        "MetricGroup": "InstructionMix",
> +        "MetricName": "load_spec_rate"
> +    },
> 
> A ScaleUnit of "100%" would likely make things more readable.
> 
> Thanks,
> Ian
>
  
Jing Zhang Nov. 22, 2022, 7:11 a.m. UTC | #7
On 2022/11/21 7:51 PM, James Clark wrote:
> 
> 
> On 16/11/2022 15:26, Jing Zhang wrote:
>>
>>
>> On 2022/11/16 7:19 PM, James Clark wrote:
>>>
>>>
>>> On 31/10/2022 11:11, Jing Zhang wrote:
>>>> This series add six metricgroups for neoverse-n2, among which, the
>>>> formula of topdown L1 is from the document:
>>>> https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
>>>>
>>>> Since neoverse-n2 does not yet support topdown L2, metricgroups such
>>>> as Cache, TLB, Branch, InstructionsMix, and PEutilization are added to
>>>> help further analysis of performance bottlenecks.
>>>>
>>>
>>> Hi Jing,
>>>
>>> Thanks for working on this, these metrics look ok to me in general,
>>> although we're currently working on publishing standardised metrics
>>> across all new cores as part of a new project in Arm. This will include
>>> N2, and our ones are very similar (or almost identical) to yours,
>>> barring slightly different group names, metric names, and differences in
>>> things like outputting topdown metrics as percentages.
>>>
>>> We plan to publish our standard metrics some time in the next 2 months.
>>> Would you consider holding off on merging this change so that we have
>>> consistant group names and units going forward? Otherwise N2 would be
>>> the odd one out. I will send you the metrics when they are ready, and we
>>> will have a script to generate perf jsons from them, so you can review.
>>>
>>
>> Do you mean that after you release the new standard metrics, I remake my
>> patch referring to them, such as consistent group names and unit?
> 
> Hi Jing,
> 
> I was planning to submit the patch myself, but there will be a script to
> generate perf json files, so no manual work would be needed. Although
> this is complicated by the fact that we won't be publishing the fixed
> TopdownL1 metrics that you have for the existing N2 silicon so there
> would be a one time copy paste to fix that part.
> 
>>
>>
>>> We also have a slightly different forumula for one of the top down
>>> metrics which I think would be slightly more accurate. We don't have
>>
>>
>> The v2 version of the patchset updated the formula of topdown L1.
>> Link: https://lore.kernel.org/all/1668411720-3581-1-git-send-email-renyu.zj@linux.alibaba.com/
>>
>> The formula of the v2 version is more accurate than v1, and it has been
>> verified in our test environment. Can you share your formula first and we
>> can discuss it together? :)
> 
> I was looking at v2 but replied to the root of the thread by mistake. I
> also had it the wrong way round. So your version corrects for the errata
> on the current version of N2 (as you mentioned in the commit message).
> Our version would be if there is a future new silicon revision with that
> fixed, but it does have an extra improvement by subtracting the branch
> mispredicts.
> 
> Perf doesn't currently match the jsons based on silicon revision, so
> we'd have to add something in for that if a fixed silicon version is
> released. But this is another problem for another time.
> 

Hi James,

Let's do what Ian said, and you can improve it later with the standard metrics,
after the fixed silicon version is released.


> This is the frontend bound metric we have for future revisions:
> 
> 	"100 * ( (STALL_SLOT_FRONTEND/(CPU_CYCLES * 5)) - ((BR_MIS_PRED *
> 4)/CPU_CYCLES) )"
> 
> Other changes are, for example, your 'wasted' metric, we have
> 'bad_speculation', and without the
> cycles subtraction:
> 
> 	100 * ( ((1 - (OP_RETIRED/OP_SPEC)) * (1 - (STALL_SLOT/(CPU_CYCLES *
> 5)))) + ((BR_MIS_PRED * 4)/CPU_CYCLES) )
> 

Thanks for sharing your metric version, but I still wonder: is BR_MIS_PRED not classified
as frontend bound? How do you judge the extra improvement from subtracting branch mispredicts?

> And some more details filled in around the units, for example:
> 
>     {
>         "MetricName": "bad_speculation",
>         "MetricExpr": "100 * ( ((1 - (OP_RETIRED/OP_SPEC)) * (1 -
> (STALL_SLOT/(CPU_CYCLES * 5)))) + ((BR_MIS_PRED * 4)/CPU_CYCLES) )",
>         "BriefDescription": "Bad Speculation",
>         "PublicDescription": "This metric is the percentage of total
> slots that executed operations and didn't retire due to a pipeline
> flush.\nThis indicates cycles that were utilized but inefficiently.",
>         "MetricGroup": "TopdownL1",
>         "ScaleUnit": "1percent of slots"
>     },
> 

My "wasted" metric was changed according to the arm documentation description, it was originally
"bad_speculation".  I will change "wasted" back to "bad_speculation", if you wish.


Thanks,
Jing


> So ignoring the errata issue, the main reason to hold off is for
> consistency and churn because these metrics in this format will be
> released for all cores going forwards.
> 
> Thanks
> James
>
  
James Clark Nov. 22, 2022, 11:53 a.m. UTC | #8
On 22/11/2022 07:11, Jing Zhang wrote:
> 
> 
> On 2022/11/21 7:51 PM, James Clark wrote:
>>
>>
>> On 16/11/2022 15:26, Jing Zhang wrote:
>>>
>>>
>>> On 2022/11/16 7:19 PM, James Clark wrote:
>>>>
>>>>
>>>> On 31/10/2022 11:11, Jing Zhang wrote:
>>>>> This series add six metricgroups for neoverse-n2, among which, the
>>>>> formula of topdown L1 is from the document:
>>>>> https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
>>>>>
>>>>> Since neoverse-n2 does not yet support topdown L2, metricgroups such
>>>>> as Cache, TLB, Branch, InstructionsMix, and PEutilization are added to
>>>>> help further analysis of performance bottlenecks.
>>>>>
>>>>
>>>> Hi Jing,
>>>>
>>>> Thanks for working on this, these metrics look ok to me in general,
>>>> although we're currently working on publishing standardised metrics
>>>> across all new cores as part of a new project in Arm. This will include
>>>> N2, and our ones are very similar (or almost identical) to yours,
>>>> barring slightly different group names, metric names, and differences in
>>>> things like outputting topdown metrics as percentages.
>>>>
>>>> We plan to publish our standard metrics some time in the next 2 months.
>>>> Would you consider holding off on merging this change so that we have
>>>> consistant group names and units going forward? Otherwise N2 would be
>>>> the odd one out. I will send you the metrics when they are ready, and we
>>>> will have a script to generate perf jsons from them, so you can review.
>>>>
>>>
>>> Do you mean that after you release the new standard metrics, I remake my
>>> patch referring to them, such as consistent group names and unit?
>>
>> Hi Jing,
>>
>> I was planning to submit the patch myself, but there will be a script to
>> generate perf json files, so no manual work would be needed. Although
>> this is complicated by the fact that we won't be publishing the fixed
>> TopdownL1 metrics that you have for the existing N2 silicon so there
>> would be a one time copy paste to fix that part.
>>
>>>
>>>
>>>> We also have a slightly different forumula for one of the top down
>>>> metrics which I think would be slightly more accurate. We don't have
>>>
>>>
>>> The v2 version of the patchset updated the formula of topdown L1.
>>> Link: https://lore.kernel.org/all/1668411720-3581-1-git-send-email-renyu.zj@linux.alibaba.com/
>>>
>>> The formula of the v2 version is more accurate than v1, and it has been
>>> verified in our test environment. Can you share your formula first and we
>>> can discuss it together? :)
>>
>> I was looking at v2 but replied to the root of the thread by mistake. I
>> also had it the wrong way round. So your version corrects for the errata
>> on the current version of N2 (as you mentioned in the commit message).
>> Our version would be if there is a future new silicon revision with that
>> fixed, but it does have an extra improvement by subtracting the branch
>> mispredicts.
>>
>> Perf doesn't currently match the jsons based on silicon revision, so
>> we'd have to add something in for that if a fixed silicon version is
>> released. But this is another problem for another time.
>>
> 
> Hi James,
> 
> Let's do what Ian said, and you can improve it later with the standard metrics,
> after the fixed silicon version is released.
> 

OK, that's fine by me. I do have one update about our publishing progress
to share. This is the (currently empty) repo where we will be holding our
metrics: https://gitlab.arm.com/telemetry-solution/telemetry-solution

We'll also have the conversion script in there. So there has at least
been some progress and we're getting close. I will keep you updated when
it is populated.

> 
>> This is the frontend bound metric we have for future revisions:
>>
>> 	"100 * ( (STALL_SLOT_FRONTEND/(CPU_CYCLES * 5)) - ((BR_MIS_PRED *
>> 4)/CPU_CYCLES) )"
>>
>> Other changes are, for example, your 'wasted' metric, we have
>> 'bad_speculation', and without the
>> cycles subtraction:
>>
>> 	100 * ( ((1 - (OP_RETIRED/OP_SPEC)) * (1 - (STALL_SLOT/(CPU_CYCLES *
>> 5)))) + ((BR_MIS_PRED * 4)/CPU_CYCLES) )
>>
> 
> Thanks for sharing your metric version, But I still wonder, is BR_MIS_PRED not classified
> as frontend bound? 

We're counting branch mispredicts as an extra cost, so we subtract them
from frontend_bound; branch-related stalls are covered by bad_speculation,
where we add BR_MIS_PRED instead of subtracting it.
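
Schematically, taking the two expressions quoted above (with CPU_CYCLES * 5
being the total slots):

	frontend_bound  = 100 * ( (STALL_SLOT_FRONTEND/(CPU_CYCLES * 5))
	                          - ((BR_MIS_PRED * 4)/CPU_CYCLES) )
	bad_speculation = 100 * ( ((1 - (OP_RETIRED/OP_SPEC)) * (1 - (STALL_SLOT/(CPU_CYCLES * 5))))
	                          + ((BR_MIS_PRED * 4)/CPU_CYCLES) )

so the same (BR_MIS_PRED * 4)/CPU_CYCLES term that is subtracted from
frontend_bound is added to bad_speculation.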

Unfortunately I'm just the middle man here, I didn't actually work
directly on producing these metrics so I hope nothing gets lost in my
explanation.

> How do you judge the extra improvement by subtracting branch mispredicts?

As far as I know the repo that I mentioned above will have some
benchmarks and tooling that were used to validate our version. So it
should be apparent by running those.

> 
>> And some more details filled in around the units, for example:
>>
>>     {
>>         "MetricName": "bad_speculation",
>>         "MetricExpr": "100 * ( ((1 - (OP_RETIRED/OP_SPEC)) * (1 -
>> (STALL_SLOT/(CPU_CYCLES * 5)))) + ((BR_MIS_PRED * 4)/CPU_CYCLES) )",
>>         "BriefDescription": "Bad Speculation",
>>         "PublicDescription": "This metric is the percentage of total
>> slots that executed operations and didn't retire due to a pipeline
>> flush.\nThis indicates cycles that were utilized but inefficiently.",
>>         "MetricGroup": "TopdownL1",
>>         "ScaleUnit": "1percent of slots"
>>     },
>>
> 
> My "wasted" metric was changed according to the arm documentation description, it was originally
> "bad_speculation".  I will change "wasted" back to "bad_speculation", if you wish.

Yeah, that would be good. I think since that document we've tried to
align names more with what was already out there, and bad_speculation was
probably judged to be a better description. For example, it's already
used in tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json

> 
> 
> Thanks,
> Jing
> 
> 
>> So ignoring the errata issue, the main reason to hold off is for
>> consistency and churn because these metrics in this format will be
>> released for all cores going forwards.
>>
>> Thanks
>> James
>>