[v3,2/7] perf metric: Event "Compat" value supports matching multiple identifiers

Message ID 1685438374-33287-3-git-send-email-renyu.zj@linux.alibaba.com
State New
Headers
Series Add JSON metrics for arm CMN and Yitian710 DDR |

Commit Message

Jing Zhang May 30, 2023, 9:19 a.m. UTC
  The jevent "Compat" is used for uncore PMU alias or metric definitions.

The same PMU driver has different PMU identifiers due to different hardware
versions and types, but they may have some common PMU event/metric. Since a
Compat value can only match one identifier, when adding the same event
alias and metric to PMUs with different identifiers, each identifier needs
to be defined once, which is not streamlined enough.

So let "Compat" value supports matching multiple identifiers. For example,
the Compat value "arm_cmn600;arm_cmn700;arm_ci700" can match the PMU
identifier "arm_cmn600X" or "arm_cmn700X" or "arm_ci700X", where "X" is a
wildcard. Tokens in Unit field are delimited by ';'.

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
---
 tools/perf/util/metricgroup.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)
  

Comments

John Garry May 31, 2023, 1:18 p.m. UTC | #1
On 30/05/2023 10:19, Jing Zhang wrote:
> The jevent "Compat" is used for uncore PMU alias or metric definitions.
> 
> The same PMU driver has different PMU identifiers due to different hardware
> versions and types, but they may have some common PMU event/metric. Since a
> Compat value can only match one identifier, when adding the same event
> alias and metric to PMUs with different identifiers, each identifier needs
> to be defined once, which is not streamlined enough.
> 
> So let "Compat" value supports matching multiple identifiers. For example,
> the Compat value "arm_cmn600;arm_cmn700;arm_ci700" can match the PMU
> identifier "arm_cmn600X" or "arm_cmn700X" or "arm_ci700X", where "X" is a
> wildcard.

 From checking the driver, it seems that we have model names 
"arm_cmn600" and "arm_cmn650". Are you saying that "arm_cmn600X" would 
match for those? I am most curious about how "arm_cmn600X" matches 
"arm_cmn650".

> Tokens in Unit field are delimited by ';'.

Thanks for taking a stab at solving this problem.

I have to admit that I am not the biggest fan of having multiple values 
to match in the "Compat" value possibly for every event. It doesn't 
really scale.

I would hope that there are at least some events which we are guaranteed 
to always be present. From what Robin said on the v2 series, for the 
implementations which we care about, events are generally added per 
subsequent version. So we should have some base set of fixed events.

If we are confident that we have a fixed set of base set of events, can 
we ensure that those events would not require this compat string which 
needs each version explicitly stated?

Robin, please let us know what you think of this.

Thanks,
John

> 
> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
> ---
>   tools/perf/util/metricgroup.c | 24 +++++++++++++++++++++++-
>   1 file changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
> index f3559be..c12ccd9 100644
> --- a/tools/perf/util/metricgroup.c
> +++ b/tools/perf/util/metricgroup.c
> @@ -456,6 +456,28 @@ struct metricgroup_iter_data {
>   	void *data;
>   };
>   
> +static bool match_pmu_identifier(const char *id, const char *compat)
> +{
> +	char *tmp = NULL, *tok, *str;
> +	bool res;
> +
> +	str = strdup(compat);
> +	if (!str)
> +		return false;
> +
> +	tok = strtok_r(str, ";", &tmp);
> +	for (; tok; tok = strtok_r(NULL, ";", &tmp)) {
> +		if (!strncmp(id, tok, strlen(tok))) {
> +			res = true;
> +			goto out;
> +		}
> +	}
> +	res = false;
> +out:
> +	free(str);
> +	return res;
> +}
> +
>   static int metricgroup__sys_event_iter(const struct pmu_metric *pm,
>   				       const struct pmu_metrics_table *table,
>   				       void *data)
> @@ -468,7 +490,7 @@ static int metricgroup__sys_event_iter(const struct pmu_metric *pm,
>   
>   	while ((pmu = perf_pmu__scan(pmu))) {
>   
> -		if (!pmu->id || strcmp(pmu->id, pm->compat))
> +		if (!pmu->id || !match_pmu_identifier(pmu->id, pm->compat))
>   			continue;
>   
>   		return d->fn(pm, table, d->data);
  
Jing Zhang June 1, 2023, 8:58 a.m. UTC | #2
在 2023/5/31 下午9:18, John Garry 写道:
> On 30/05/2023 10:19, Jing Zhang wrote:
>> The jevent "Compat" is used for uncore PMU alias or metric definitions.
>>
>> The same PMU driver has different PMU identifiers due to different hardware
>> versions and types, but they may have some common PMU event/metric. Since a
>> Compat value can only match one identifier, when adding the same event
>> alias and metric to PMUs with different identifiers, each identifier needs
>> to be defined once, which is not streamlined enough.
>>
>> So let "Compat" value supports matching multiple identifiers. For example,
>> the Compat value "arm_cmn600;arm_cmn700;arm_ci700" can match the PMU
>> identifier "arm_cmn600X" or "arm_cmn700X" or "arm_ci700X", where "X" is a
>> wildcard.
> 
> From checking the driver, it seems that we have model names "arm_cmn600" and "arm_cmn650". Are you saying that "arm_cmn600X" would match for those? I am most curious about how "arm_cmn600X" matches "arm_cmn650".
> 

Hi John,

From patch #1 we have identifiers "arm_cmn600_0" and "arm_cmn650_0" etc. The identifier consists of model_name and revision.
The compatible value "arm_cmn600;arm_cmn650" can match the identifier "arm_cmn600_0" or "arm_cmn650_0". Maybe the message log
is not clear enough.

For example in patch #3 the metric "slc_miss_rate" is a generic metric for cmn-any. So we can define:

+	{
+		"MetricName": "slc_miss_rate",
+		"BriefDescription": "The system level cache miss rate include.",
+		"MetricGroup": "arm_cmn",
+		"MetricExpr": "hnf_cache_miss / hnf_slc_sf_cache_access",
+		"ScaleUnit": "100%",
+		"Unit": "arm_cmn",
+		"Compat": "arm_cmn600;arm_cmn650;arm_cmn700;arm_ci700"
+	},


It can match identifiers "arm_cmn600_{0,1,2..X}" or "arm_cmn650_{0,1,2..X}" or "arm_cmn700_{0,1,2..X}" or "arm_ci700_{ 0,1,2..X}".
In other words, it can match all identifiers prefixed with “arm_cmn600” or “arm_cmn650” or “arm_cmn700” or “arm_ci700”.

If a new model arm_cmn driver with identifier "arm_cmn750_0", it will not be matched, but if a new revision arm_cmn driver with identifier
"arm_cmn700_4", it can be matched.


>> Tokens in Unit field are delimited by ';'.
> 
> Thanks for taking a stab at solving this problem.
> 
> I have to admit that I am not the biggest fan of having multiple values to match in the "Compat" value possibly for every event. It doesn't really scale.
> 
> I would hope that there are at least some events which we are guaranteed to always be present. From what Robin said on the v2 series, for the implementations which we care about, events are generally added per subsequent version. So we should have some base set of fixed events.
> 
> If we are confident that we have a fixed set of base set of events, can we ensure that those events would not require this compat string which needs each version explicitly stated?
> 

If we are sure that some events will always exist in subsequent versions, we can set the Compat field to "arm_cmn;arm_ci". After that,
whether it is a different model or a different revision of the cmn PMU, it will be compatible. That is, it matches all whose identifier
is prefixed with “arm_cmn” or “arm_ci”.

Maybe it's not a perfect solution yet, looking forward to your suggestions.


Thanks,
Jing


> Robin, please let us know what you think of this.
> 
> Thanks,
> John
> 
>>
>> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
>> ---
>>   tools/perf/util/metricgroup.c | 24 +++++++++++++++++++++++-
>>   1 file changed, 23 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
>> index f3559be..c12ccd9 100644
>> --- a/tools/perf/util/metricgroup.c
>> +++ b/tools/perf/util/metricgroup.c
>> @@ -456,6 +456,28 @@ struct metricgroup_iter_data {
>>       void *data;
>>   };
>>   +static bool match_pmu_identifier(const char *id, const char *compat)
>> +{
>> +    char *tmp = NULL, *tok, *str;
>> +    bool res;
>> +
>> +    str = strdup(compat);
>> +    if (!str)
>> +        return false;
>> +
>> +    tok = strtok_r(str, ";", &tmp);
>> +    for (; tok; tok = strtok_r(NULL, ";", &tmp)) {
>> +        if (!strncmp(id, tok, strlen(tok))) {
>> +            res = true;
>> +            goto out;
>> +        }
>> +    }
>> +    res = false;
>> +out:
>> +    free(str);
>> +    return res;
>> +}
>> +
>>   static int metricgroup__sys_event_iter(const struct pmu_metric *pm,
>>                          const struct pmu_metrics_table *table,
>>                          void *data)
>> @@ -468,7 +490,7 @@ static int metricgroup__sys_event_iter(const struct pmu_metric *pm,
>>         while ((pmu = perf_pmu__scan(pmu))) {
>>   -        if (!pmu->id || strcmp(pmu->id, pm->compat))
>> +        if (!pmu->id || !match_pmu_identifier(pmu->id, pm->compat))
>>               continue;
>>             return d->fn(pm, table, d->data);
  
John Garry June 2, 2023, 4:20 p.m. UTC | #3
On 01/06/2023 09:58, Jing Zhang wrote:
>>  From checking the driver, it seems that we have model names "arm_cmn600" and "arm_cmn650". Are you saying that "arm_cmn600X" would match for those? I am most curious about how "arm_cmn600X" matches "arm_cmn650".
>>
> Hi John,
> 
>  From patch #1 we have identifiers "arm_cmn600_0" and "arm_cmn650_0" etc. 

ok, I see. Your idea for the cmn driver HW identifier format is odd to 
me. Your HW identifier is a mix of the HW IP model name (from 
arm_cmn_device_data.model_name) with some the kernel revision identifier 
(from cmn_revision). The kernel revision identifier is an enum, and I 
don't think that it is a good idea to expose enum values through sysfs 
files.

I assume that there is some ordering requirement for cmn_revision, 
considering that we have equality operations on the revision in the driver.

> The identifier consists of model_name and revision.
> The compatible value "arm_cmn600;arm_cmn650" can match the identifier "arm_cmn600_0" or "arm_cmn650_0". Maybe the message log
> is not clear enough.
> 
> For example in patch #3 the metric "slc_miss_rate" is a generic metric for cmn-any. So we can define:
> 
> +	{
> +		"MetricName": "slc_miss_rate",
> +		"BriefDescription": "The system level cache miss rate include.",
> +		"MetricGroup": "arm_cmn",
> +		"MetricExpr": "hnf_cache_miss / hnf_slc_sf_cache_access",
> +		"ScaleUnit": "100%",
> +		"Unit": "arm_cmn",
> +		"Compat": "arm_cmn600;arm_cmn650;arm_cmn700;arm_ci700"
> +	},
> 
> 
> It can match identifiers "arm_cmn600_{0,1,2..X}" or "arm_cmn650_{0,1,2..X}" or "arm_cmn700_{0,1,2..X}" or "arm_ci700_{ 0,1,2..X}".
> In other words, it can match all identifiers prefixed with “arm_cmn600” or “arm_cmn650” or “arm_cmn700” or “arm_ci700”.
> 
> If a new model arm_cmn driver with identifier "arm_cmn750_0", it will not be matched, but if a new revision arm_cmn driver with identifier
> "arm_cmn700_4", it can be matched.

OK, I see what you mean. My confusion came about though your commit 
message on this same patch, which did not mention cmn650. I assumed that 
the example event which you were describing was supported for arm_cmn650 
and you intentionally omitted it.

> 
> 
>>> Tokens in Unit field are delimited by ';'.
>> Thanks for taking a stab at solving this problem.
>>
>> I have to admit that I am not the biggest fan of having multiple values to match in the "Compat" value possibly for every event. It doesn't really scale.
>>
>> I would hope that there are at least some events which we are guaranteed to always be present. From what Robin said on the v2 series, for the implementations which we care about, events are generally added per subsequent version. So we should have some base set of fixed events.
>>
>> If we are confident that we have a fixed set of base set of events, can we ensure that those events would not require this compat string which needs each version explicitly stated?
>>
> If we are sure that some events will always exist in subsequent versions, we can set the Compat field to "arm_cmn;arm_ci". After that,
> whether it is a different model or a different revision of the cmn PMU, it will be compatible. That is, it matches all whose identifier
> is prefixed with “arm_cmn” or “arm_ci”.

Sure, we could do something like that. Or if we are super-confident that 
every model and rev will support some event, then we can change perf 
tool to just not check Compat for that event.

> 
> Maybe it's not a perfect solution yet, looking forward to your suggestions.

Well first we need to define kernel HW identifier format. I don't know 
why we don't encode model and revision name, like "cmn650_r1p1". Robin?

Thanks,
John
  
Jing Zhang June 5, 2023, 2:46 a.m. UTC | #4
在 2023/6/3 上午12:20, John Garry 写道:
> On 01/06/2023 09:58, Jing Zhang wrote:
>>>  From checking the driver, it seems that we have model names "arm_cmn600" and "arm_cmn650". Are you saying that "arm_cmn600X" would match for those? I am most curious about how "arm_cmn600X" matches "arm_cmn650".
>>>
>> Hi John,
>>
>>  From patch #1 we have identifiers "arm_cmn600_0" and "arm_cmn650_0" etc. 
> 
> ok, I see. Your idea for the cmn driver HW identifier format is odd to me. Your HW identifier is a mix of the HW IP model name (from arm_cmn_device_data.model_name) with some the kernel revision identifier (from cmn_revision). The kernel revision identifier is an enum, and I don't think that it is a good idea to expose enum values through sysfs files.
> 
> I assume that there is some ordering requirement for cmn_revision, considering that we have equality operations on the revision in the driver.
> 

Actually revision is a register value rather than an enumeration value. If I modify the revision to r0p0, etc., it is also possible, but I need
to convert the enumeration to a string.


>> The identifier consists of model_name and revision.
>> The compatible value "arm_cmn600;arm_cmn650" can match the identifier "arm_cmn600_0" or "arm_cmn650_0". Maybe the message log
>> is not clear enough.
>>
>> For example in patch #3 the metric "slc_miss_rate" is a generic metric for cmn-any. So we can define:
>>
>> +    {
>> +        "MetricName": "slc_miss_rate",
>> +        "BriefDescription": "The system level cache miss rate include.",
>> +        "MetricGroup": "arm_cmn",
>> +        "MetricExpr": "hnf_cache_miss / hnf_slc_sf_cache_access",
>> +        "ScaleUnit": "100%",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "arm_cmn600;arm_cmn650;arm_cmn700;arm_ci700"
>> +    },
>>
>>
>> It can match identifiers "arm_cmn600_{0,1,2..X}" or "arm_cmn650_{0,1,2..X}" or "arm_cmn700_{0,1,2..X}" or "arm_ci700_{ 0,1,2..X}".
>> In other words, it can match all identifiers prefixed with “arm_cmn600” or “arm_cmn650” or “arm_cmn700” or “arm_ci700”.
>>
>> If a new model arm_cmn driver with identifier "arm_cmn750_0", it will not be matched, but if a new revision arm_cmn driver with identifier
>> "arm_cmn700_4", it can be matched.
> 
> OK, I see what you mean. My confusion came about though your commit message on this same patch, which did not mention cmn650. I assumed that the example event which you were describing was supported for arm_cmn650 and you intentionally omitted it.
> 

Yes, I didn't write cmn650 in the commit message, just because I omitted it. I will add it in a later version.

>>
>>
>>>> Tokens in Unit field are delimited by ';'.
>>> Thanks for taking a stab at solving this problem.
>>>
>>> I have to admit that I am not the biggest fan of having multiple values to match in the "Compat" value possibly for every event. It doesn't really scale.
>>>
>>> I would hope that there are at least some events which we are guaranteed to always be present. From what Robin said on the v2 series, for the implementations which we care about, events are generally added per subsequent version. So we should have some base set of fixed events.
>>>
>>> If we are confident that we have a fixed set of base set of events, can we ensure that those events would not require this compat string which needs each version explicitly stated?
>>>
>> If we are sure that some events will always exist in subsequent versions, we can set the Compat field to "arm_cmn;arm_ci". After that,
>> whether it is a different model or a different revision of the cmn PMU, it will be compatible. That is, it matches all whose identifier
>> is prefixed with “arm_cmn” or “arm_ci”.
> 
> Sure, we could do something like that. Or if we are super-confident that every model and rev will support some event, then we can change perf tool to just not check Compat for that event.
> 

Yes, I have also thought about this solution. If it is an event supported by all versions, as long as it meets the Unit match, it does not need
to check Compat. However, the current perf tools tool seems to ignore the "Unit" inspection for the metric event.

>>
>> Maybe it's not a perfect solution yet, looking forward to your suggestions.
> 
> Well first we need to define kernel HW identifier format. I don't know why we don't encode model and revision name, like "cmn650_r1p1". Robin?
> 

As mentioned earlier, revision is a register value rather than an enumeration value. If change revision to revision name, I need a more redundant
operation of converting enumeration value to string. If you think changing the naming method to "cmn650_r1p1" is clearer, of course
there is no problem.

Thanks,
Jing
  
Arnaldo Carvalho de Melo June 5, 2023, 7:39 p.m. UTC | #5
Em Mon, Jun 05, 2023 at 10:46:36AM +0800, Jing Zhang escreveu:
> 在 2023/6/3 上午12:20, John Garry 写道:
> > On 01/06/2023 09:58, Jing Zhang wrote:
> >> It can match identifiers "arm_cmn600_{0,1,2..X}" or "arm_cmn650_{0,1,2..X}" or "arm_cmn700_{0,1,2..X}" or "arm_ci700_{ 0,1,2..X}".
> >> In other words, it can match all identifiers prefixed with “arm_cmn600” or “arm_cmn650” or “arm_cmn700” or “arm_ci700”.
> >>
> >> If a new model arm_cmn driver with identifier "arm_cmn750_0", it will not be matched, but if a new revision arm_cmn driver with identifier
> >> "arm_cmn700_4", it can be matched.
> > 
> > OK, I see what you mean. My confusion came about though your commit message on this same patch, which did not mention cmn650. I assumed that the example event which you were describing was supported for arm_cmn650 and you intentionally omitted it.
> > 
> 
> Yes, I didn't write cmn650 in the commit message, just because I omitted it. I will add it in a later version.

Ok, will wait for v4 then.

- Arnaldo
  
John Garry June 6, 2023, 2:11 p.m. UTC | #6
On 05/06/2023 03:46, Jing Zhang wrote:
>> I assume that there is some ordering requirement for cmn_revision, considering that we have equality operations on the revision in the driver.
>>
> Actually revision is a register value rather than an enumeration value.

ok, got it.

> If I modify the revision to r0p0, etc., it is also possible, but I need
> to convert the enumeration to a string.

understood

> 
> 
>>> The identifier consists of model_name and revision.
>>> The compatible value "arm_cmn600;arm_cmn650" can match the identifier "arm_cmn600_0" or "arm_cmn650_0". Maybe the message log
>>> is not clear enough.
>>>
>>> For example in patch #3 the metric "slc_miss_rate" is a generic metric for cmn-any. So we can define:
>>>
>>> +    {
>>> +        "MetricName": "slc_miss_rate",
>>> +        "BriefDescription": "The system level cache miss rate include.",
>>> +        "MetricGroup": "arm_cmn",
>>> +        "MetricExpr": "hnf_cache_miss / hnf_slc_sf_cache_access",
>>> +        "ScaleUnit": "100%",
>>> +        "Unit": "arm_cmn",
>>> +        "Compat": "arm_cmn600;arm_cmn650;arm_cmn700;arm_ci700"
>>> +    },
>>>
>>>
>>> It can match identifiers "arm_cmn600_{0,1,2..X}" or "arm_cmn650_{0,1,2..X}" or "arm_cmn700_{0,1,2..X}" or "arm_ci700_{ 0,1,2..X}".
>>> In other words, it can match all identifiers prefixed with “arm_cmn600” or “arm_cmn650” or “arm_cmn700” or “arm_ci700”.
>>>
>>> If a new model arm_cmn driver with identifier "arm_cmn750_0", it will not be matched, but if a new revision arm_cmn driver with identifier
>>> "arm_cmn700_4", it can be matched.
>> OK, I see what you mean. My confusion came about though your commit message on this same patch, which did not mention cmn650. I assumed that the example event which you were describing was supported for arm_cmn650 and you intentionally omitted it.
>>
> Yes, I didn't write cmn650 in the commit message, just because I omitted it. I will add it in a later version.
> 
>>>
>>>>> Tokens in Unit field are delimited by ';'.
>>>> Thanks for taking a stab at solving this problem.
>>>>
>>>> I have to admit that I am not the biggest fan of having multiple values to match in the "Compat" value possibly for every event. It doesn't really scale.
>>>>
>>>> I would hope that there are at least some events which we are guaranteed to always be present. From what Robin said on the v2 series, for the implementations which we care about, events are generally added per subsequent version. So we should have some base set of fixed events.
>>>>
>>>> If we are confident that we have a fixed set of base set of events, can we ensure that those events would not require this compat string which needs each version explicitly stated?
>>>>
>>> If we are sure that some events will always exist in subsequent versions, we can set the Compat field to "arm_cmn;arm_ci". After that,
>>> whether it is a different model or a different revision of the cmn PMU, it will be compatible. That is, it matches all whose identifier
>>> is prefixed with “arm_cmn” or “arm_ci”.
>> Sure, we could do something like that. Or if we are super-confident that every model and rev will support some event, then we can change perf tool to just not check Compat for that event.
>>
> Yes, I have also thought about this solution. If it is an event supported by all versions, as long as it meets the Unit match, it does not need
> to check Compat.


>However, the current perf tools tool seems to ignore the "Unit" inspection for the metric event.

Unit is the format of the event_source device name. We should match 
based on that as well as compat. I need to check the code again to 
understand how that is done... it has changed a good bit in 3 years.

> 
>>> Maybe it's not a perfect solution yet, looking forward to your suggestions.
>> Well first we need to define kernel HW identifier format. I don't know why we don't encode model and revision name, like "cmn650_r1p1". Robin?
>>
> As mentioned earlier, revision is a register value rather than an enumeration value. If change revision to revision name, I need a more redundant
> operation of converting enumeration value to string. If you think changing the naming method to "cmn650_r1p1" is clearer, of course
> there is no problem.

It's just odd to mix a string and revision number in this way.

Robin knows more about this HW than me, so I'll let him decide on the 
the preferred format.

Thanks,
John
  
Robin Murphy June 6, 2023, 4:27 p.m. UTC | #7
On 02/06/2023 5:20 pm, John Garry wrote:
> On 01/06/2023 09:58, Jing Zhang wrote:
>>>  From checking the driver, it seems that we have model names 
>>> "arm_cmn600" and "arm_cmn650". Are you saying that "arm_cmn600X" 
>>> would match for those? I am most curious about how "arm_cmn600X" 
>>> matches "arm_cmn650".
>>>
>> Hi John,
>>
>>  From patch #1 we have identifiers "arm_cmn600_0" and "arm_cmn650_0" etc. 
> 
> ok, I see. Your idea for the cmn driver HW identifier format is odd to 
> me. Your HW identifier is a mix of the HW IP model name (from 
> arm_cmn_device_data.model_name) with some the kernel revision identifier 
> (from cmn_revision). The kernel revision identifier is an enum, and I 
> don't think that it is a good idea to expose enum values through sysfs 
> files.
> 
> I assume that there is some ordering requirement for cmn_revision, 
> considering that we have equality operations on the revision in the driver.

That enum does actually follow the revision identifiers as provided by 
the hardware (see CMN_CFGM_PID2_REVISION), so I don't see any major 
issue with putting it into user ABI. And TBH I think I would prefer to 
just use a numeric value rather than have to maintain yet more tables of 
strings which given the usage model here would effectively only mangle a 
matchable value into a different matchable value anyway.

I am inclined to agree that the mix between part 
driver-generated-string, part hardware-value looks a little funky. I 
still need to check with the hardware team exactly how the part number 
field from PERIPH_ID_0/1 is "configuration-dependent", and whether there 
might actually be a chance of using that as well.

One nagging doubt that remains for metrics are any baked-in assumptions 
which may not always simply depend on the product version - for instance 
it happens to be the case currently that everything has a fixed flit 
size of 256 bits, hence the magic "32" in the bandwidth calculations, 
but if that ever became configurable in some future product, we may 
potentially have a problem guaranteeing a meaningful calculation.

>> The identifier consists of model_name and revision.
>> The compatible value "arm_cmn600;arm_cmn650" can match the identifier 
>> "arm_cmn600_0" or "arm_cmn650_0". Maybe the message log
>> is not clear enough.
>>
>> For example in patch #3 the metric "slc_miss_rate" is a generic metric 
>> for cmn-any. So we can define:
>>
>> +    {
>> +        "MetricName": "slc_miss_rate",
>> +        "BriefDescription": "The system level cache miss rate include.",
>> +        "MetricGroup": "arm_cmn",
>> +        "MetricExpr": "hnf_cache_miss / hnf_slc_sf_cache_access",
>> +        "ScaleUnit": "100%",
>> +        "Unit": "arm_cmn",
>> +        "Compat": "arm_cmn600;arm_cmn650;arm_cmn700;arm_ci700"
>> +    },
>>
>>
>> It can match identifiers "arm_cmn600_{0,1,2..X}" or 
>> "arm_cmn650_{0,1,2..X}" or "arm_cmn700_{0,1,2..X}" or "arm_ci700_{ 
>> 0,1,2..X}".
>> In other words, it can match all identifiers prefixed with 
>> “arm_cmn600” or “arm_cmn650” or “arm_cmn700” or “arm_ci700”.
>>
>> If a new model arm_cmn driver with identifier "arm_cmn750_0", it will 
>> not be matched, but if a new revision arm_cmn driver with identifier
>> "arm_cmn700_4", it can be matched.
> 
> OK, I see what you mean. My confusion came about though your commit 
> message on this same patch, which did not mention cmn650. I assumed that 
> the example event which you were describing was supported for arm_cmn650 
> and you intentionally omitted it.
> 
>>
>>
>>>> Tokens in Unit field are delimited by ';'.
>>> Thanks for taking a stab at solving this problem.
>>>
>>> I have to admit that I am not the biggest fan of having multiple 
>>> values to match in the "Compat" value possibly for every event. It 
>>> doesn't really scale.
>>>
>>> I would hope that there are at least some events which we are 
>>> guaranteed to always be present. From what Robin said on the v2 
>>> series, for the implementations which we care about, events are 
>>> generally added per subsequent version. So we should have some base 
>>> set of fixed events.

Note that there's a slight difference between "present" and "valid", 
e.g. in the current driver-internal aliases, all MTSX events are marked 
CMN_ANY, meaning they're considered valid on any CMN configuration with 
an MTSX node, regardless of model. The events don't exist on CMN-600 or 
CMN-650, but that's because the MTSX itself wasn't a thing yet, so for 
simplicity we don't have to bother considering the events invalid when 
we know they will always be non-present and thus filtered anyway.

>>> If we are confident that we have a fixed set of base set of events, 
>>> can we ensure that those events would not require this compat string 
>>> which needs each version explicitly stated?
>>>
>> If we are sure that some events will always exist in subsequent 
>> versions, we can set the Compat field to "arm_cmn;arm_ci". After that,
>> whether it is a different model or a different revision of the cmn 
>> PMU, it will be compatible. That is, it matches all whose identifier
>> is prefixed with “arm_cmn” or “arm_ci”.
> 
> Sure, we could do something like that. Or if we are super-confident that 
> every model and rev will support some event, then we can change perf 
> tool to just not check Compat for that event.

The majority of events have stayed unchanged since the introduction of 
their respective node type, so assuming we already have a basic match on 
the PMU name to know which JSON to be looking at in the first place, I'd 
imagine the Compat field could be optional, and only needed for events 
which first appear in a subsequent revision or model, or the fiddly 
cases like where DVM node events got entirely rewritten in CMN-650.

Thanks,
Robin.
  
Jing Zhang June 8, 2023, 9:44 a.m. UTC | #8
在 2023/6/6 下午10:11, John Garry 写道:
> On 05/06/2023 03:46, Jing Zhang wrote:
>>> I assume that there is some ordering requirement for cmn_revision, considering that we have equality operations on the revision in the driver.
>>>
>> Actually revision is a register value rather than an enumeration value.
> 
> ok, got it.
> 
>> If I modify the revision to r0p0, etc., it is also possible, but I need
>> to convert the enumeration to a string.
> 
> understood
> 
>>
>>
>>>> The identifier consists of model_name and revision.
>>>> The compatible value "arm_cmn600;arm_cmn650" can match the identifier "arm_cmn600_0" or "arm_cmn650_0". Maybe the message log
>>>> is not clear enough.
>>>>
>>>> For example in patch #3 the metric "slc_miss_rate" is a generic metric for cmn-any. So we can define:
>>>>
>>>> +    {
>>>> +        "MetricName": "slc_miss_rate",
>>>> +        "BriefDescription": "The system level cache miss rate include.",
>>>> +        "MetricGroup": "arm_cmn",
>>>> +        "MetricExpr": "hnf_cache_miss / hnf_slc_sf_cache_access",
>>>> +        "ScaleUnit": "100%",
>>>> +        "Unit": "arm_cmn",
>>>> +        "Compat": "arm_cmn600;arm_cmn650;arm_cmn700;arm_ci700"
>>>> +    },
>>>>
>>>>
>>>> It can match identifiers "arm_cmn600_{0,1,2..X}" or "arm_cmn650_{0,1,2..X}" or "arm_cmn700_{0,1,2..X}" or "arm_ci700_{ 0,1,2..X}".
>>>> In other words, it can match all identifiers prefixed with “arm_cmn600” or “arm_cmn650” or “arm_cmn700” or “arm_ci700”.
>>>>
>>>> If a new model arm_cmn driver with identifier "arm_cmn750_0", it will not be matched, but if a new revision arm_cmn driver with identifier
>>>> "arm_cmn700_4", it can be matched.
>>> OK, I see what you mean. My confusion came about though your commit message on this same patch, which did not mention cmn650. I assumed that the example event which you were describing was supported for arm_cmn650 and you intentionally omitted it.
>>>
>> Yes, I didn't write cmn650 in the commit message, just because I omitted it. I will add it in a later version.
>>
>>>>
>>>>>> Tokens in Unit field are delimited by ';'.
>>>>> Thanks for taking a stab at solving this problem.
>>>>>
>>>>> I have to admit that I am not the biggest fan of having multiple values to match in the "Compat" value possibly for every event. It doesn't really scale.
>>>>>
>>>>> I would hope that there are at least some events which we are guaranteed to always be present. From what Robin said on the v2 series, for the implementations which we care about, events are generally added per subsequent version. So we should have some base set of fixed events.
>>>>>
>>>>> If we are confident that we have a fixed set of base set of events, can we ensure that those events would not require this compat string which needs each version explicitly stated?
>>>>>
>>>> If we are sure that some events will always exist in subsequent versions, we can set the Compat field to "arm_cmn;arm_ci". After that,
>>>> whether it is a different model or a different revision of the cmn PMU, it will be compatible. That is, it matches all whose identifier
>>>> is prefixed with “arm_cmn” or “arm_ci”.
>>> Sure, we could do something like that. Or if we are super-confident that every model and rev will support some event, then we can change perf tool to just not check Compat for that event.
>>>
>> Yes, I have also thought about this solution. If it is an event supported by all versions, as long as it meets the Unit match, it does not need
>> to check Compat.
> 
> 
>> However, the current perf tools tool seems to ignore the "Unit" inspection for the metric event.
> 
> Unit is the format of the event_source device name. We should match based on that as well as compat. I need to check the code again to understand how that is done... it has changed a good bit in 3 years.
> 

This situation only happens on uncore metric. I happened to write wrong Unit, but the metric still matches.

>>
>>>> Maybe it's not a perfect solution yet, looking forward to your suggestions.
>>> Well first we need to define kernel HW identifier format. I don't know why we don't encode model and revision name, like "cmn650_r1p1". Robin?
>>>
>> As mentioned earlier, revision is a register value rather than an enumeration value. If change revision to revision name, I need a more redundant
>> operation of converting enumeration value to string. If you think changing the naming method to "cmn650_r1p1" is clearer, of course
>> there is no problem.
> 
> It's just odd to mix a string and revision number in this way.
> 
> Robin knows more about this HW than me, so I'll let him decide on the the preferred format.
> 

Ok, thank you John.
  
Jing Zhang June 8, 2023, 10:11 a.m. UTC | #9
在 2023/6/7 上午12:27, Robin Murphy 写道:
> On 02/06/2023 5:20 pm, John Garry wrote:
>> On 01/06/2023 09:58, Jing Zhang wrote:
>>>>  From checking the driver, it seems that we have model names "arm_cmn600" and "arm_cmn650". Are you saying that "arm_cmn600X" would match for those? I am most curious about how "arm_cmn600X" matches "arm_cmn650".
>>>>
>>> Hi John,
>>>
>>>  From patch #1 we have identifiers "arm_cmn600_0" and "arm_cmn650_0" etc. 
>>
>> ok, I see. Your idea for the cmn driver HW identifier format is odd to me. Your HW identifier is a mix of the HW IP model name (from arm_cmn_device_data.model_name) with some the kernel revision identifier (from cmn_revision). The kernel revision identifier is an enum, and I don't think that it is a good idea to expose enum values through sysfs files.
>>
>> I assume that there is some ordering requirement for cmn_revision, considering that we have equality operations on the revision in the driver.
> 
> That enum does actually follow the revision identifiers as provided by the hardware (see CMN_CFGM_PID2_REVISION), so I don't see any major issue with putting it into user ABI. And TBH I think I would prefer to just use a numeric value rather than have to maintain yet more tables of strings which given the usage model here would effectively only mangle a matchable value into a different matchable value anyway.
> 
> I am inclined to agree that the mix between part driver-generated-string, part hardware-value looks a little funky. I still need to check with the hardware team exactly how the part number field from PERIPH_ID_0/1 is "configuration-dependent", and whether there might actually be a chance of using that as well.
> 

Thanks Robin. So should we wait to confirm the configuration of the PERIPH_ID_0/1 field before pushing this patch? Or is it
acceptable to use "cmn600_r0p0" as an identifier? Looking forward to your suggestion.

> One nagging doubt that remains for metrics are any baked-in assumptions which may not always simply depend on the product version - for instance it happens to be the case currently that everything has a fixed flit size of 256 bits, hence the magic "32" in the bandwidth calculations, but if that ever became configurable in some future product, we may potentially have a problem guaranteeing a meaningful calculation.
> 

In this case, we may use "literal" to solve it later. It can use variables in bandwidth calculations. For example,
"#slots" can get the value from the file of the current architecture and use it in the metric.

>>> The identifier consists of model_name and revision.
>>> The compatible value "arm_cmn600;arm_cmn650" can match the identifier "arm_cmn600_0" or "arm_cmn650_0". Maybe the message log
>>> is not clear enough.
>>>
>>> For example in patch #3 the metric "slc_miss_rate" is a generic metric for cmn-any. So we can define:
>>>
>>> +    {
>>> +        "MetricName": "slc_miss_rate",
>>> +        "BriefDescription": "The system level cache miss rate include.",
>>> +        "MetricGroup": "arm_cmn",
>>> +        "MetricExpr": "hnf_cache_miss / hnf_slc_sf_cache_access",
>>> +        "ScaleUnit": "100%",
>>> +        "Unit": "arm_cmn",
>>> +        "Compat": "arm_cmn600;arm_cmn650;arm_cmn700;arm_ci700"
>>> +    },
>>>
>>>
>>> It can match identifiers "arm_cmn600_{0,1,2..X}" or "arm_cmn650_{0,1,2..X}" or "arm_cmn700_{0,1,2..X}" or "arm_ci700_{ 0,1,2..X}".
>>> In other words, it can match all identifiers prefixed with “arm_cmn600” or “arm_cmn650” or “arm_cmn700” or “arm_ci700”.
>>>
>>> If a new model arm_cmn driver with identifier "arm_cmn750_0", it will not be matched, but if a new revision arm_cmn driver with identifier
>>> "arm_cmn700_4", it can be matched.
>>
>> OK, I see what you mean. My confusion came about though your commit message on this same patch, which did not mention cmn650. I assumed that the example event which you were describing was supported for arm_cmn650 and you intentionally omitted it.
>>
>>>
>>>
>>>>> Tokens in Unit field are delimited by ';'.
>>>> Thanks for taking a stab at solving this problem.
>>>>
>>>> I have to admit that I am not the biggest fan of having multiple values to match in the "Compat" value possibly for every event. It doesn't really scale.
>>>>
>>>> I would hope that there are at least some events which we are guaranteed to always be present. From what Robin said on the v2 series, for the implementations which we care about, events are generally added per subsequent version. So we should have some base set of fixed events.
> 
> Note that there's a slight difference between "present" and "valid", e.g. in the current driver-internal aliases, all MTSX events are marked CMN_ANY, meaning they're considered valid on any CMN configuration with an MTSX node, regardless of model. The events don't exist on CMN-600 or CMN-650, but that's because the MTSX itself wasn't a thing yet, so for simplicity we don't have to bother considering the events invalid when we know they will always be non-present and thus filtered anyway.
> 
>>>> If we are confident that we have a fixed set of base set of events, can we ensure that those events would not require this compat string which needs each version explicitly stated?
>>>>
>>> If we are sure that some events will always exist in subsequent versions, we can set the Compat field to "arm_cmn;arm_ci". After that,
>>> whether it is a different model or a different revision of the cmn PMU, it will be compatible. That is, it matches all whose identifier
>>> is prefixed with “arm_cmn” or “arm_ci”.
>>
>> Sure, we could do something like that. Or if we are super-confident that every model and rev will support some event, then we can change perf tool to just not check Compat for that event.
> 
> The majority of events have stayed unchanged since the introduction of their respective node type, so assuming we already have a basic match on the PMU name to know which JSON to be looking at in the first place, I'd imagine the Compat field could be optional, and only needed for events which first appear in a subsequent revision or model, or the fiddly cases like where DVM node events got entirely rewritten in CMN-650.
> 

OK, thanks. Maybe we first need to confirm how to set the identifier format in the driver. Then it will be clearer how to implement Compat matching.


Thanks,
Jing
  
John Garry June 12, 2023, 6:15 p.m. UTC | #10
On 08/06/2023 10:44, Jing Zhang wrote:
>> Unit is the format of the event_source device name. We should match based on that as well as compat. I need to check the code again to understand how that is done... it has changed a good bit in 3 years.
>>
> This situation only happens on uncore metric. I happened to write wrong Unit, but the metric still matches.
> 

I'm just double checking this now. I think any possible fix should be 
easy enough for current code but may be tricky for backport with lots of 
metric code changes.

Thanks,
John
  
John Garry June 13, 2023, 4:36 p.m. UTC | #11
On 12/06/2023 19:15, John Garry wrote:
> On 08/06/2023 10:44, Jing Zhang wrote:
>>> Unit is the format of the event_source device name. We should match 
>>> based on that as well as compat. I need to check the code again to 
>>> understand how that is done... it has changed a good bit in 3 years.
>>>
>> This situation only happens on uncore metric. I happened to write 
>> wrong Unit, but the metric still matches.
>>
> 
> I'm just double checking this now. I think any possible fix should be 
> easy enough for current code but may be tricky for backport with lots of 
> metric code changes.

I also have code to re-work sys event metric support such that we don't 
require "compat" or "Unit" values for a metric when the metric is 
described in terms of event aliases. That code is 2 years old, so may 
take a bit of time to rebase. I'll look to do that now.

Thanks,
John
  
Jing Zhang June 15, 2023, 2:18 a.m. UTC | #12
在 2023/6/14 上午12:36, John Garry 写道:
> On 12/06/2023 19:15, John Garry wrote:
>> On 08/06/2023 10:44, Jing Zhang wrote:
>>>> Unit is the format of the event_source device name. We should match based on that as well as compat. I need to check the code again to understand how that is done... it has changed a good bit in 3 years.
>>>>
>>> This situation only happens on uncore metric. I happened to write wrong Unit, but the metric still matches.
>>>
>>
>> I'm just double checking this now. I think any possible fix should be easy enough for current code but may be tricky for backport with lots of metric code changes.
> 
> I also have code to re-work sys event metric support such that we don't require "compat" or "Unit" values for a metric when the metric is described in terms of event aliases. That code is 2 years old, so may take a bit of time to rebase. I'll look to do that now.
> 

Sounds good!

Thanks,
Jing


> Thanks,
> John
>
  
Jing Zhang June 15, 2023, 3:41 a.m. UTC | #13
在 2023/6/6 上午3:39, Arnaldo Carvalho de Melo 写道:
> Em Mon, Jun 05, 2023 at 10:46:36AM +0800, Jing Zhang escreveu:
>> 在 2023/6/3 上午12:20, John Garry 写道:
>>> On 01/06/2023 09:58, Jing Zhang wrote:
>>>> It can match identifiers "arm_cmn600_{0,1,2..X}" or "arm_cmn650_{0,1,2..X}" or "arm_cmn700_{0,1,2..X}" or "arm_ci700_{ 0,1,2..X}".
>>>> In other words, it can match all identifiers prefixed with “arm_cmn600” or “arm_cmn650” or “arm_cmn700” or “arm_ci700”.
>>>>
>>>> If a new model arm_cmn driver with identifier "arm_cmn750_0", it will not be matched, but if a new revision arm_cmn driver with identifier
>>>> "arm_cmn700_4", it can be matched.
>>>
>>> OK, I see what you mean. My confusion came about though your commit message on this same patch, which did not mention cmn650. I assumed that the example event which you were describing was supported for arm_cmn650 and you intentionally omitted it.
>>>
>>
>> Yes, I didn't write cmn650 in the commit message, just because I omitted it. I will add it in a later version.
> 
> Ok, will wait for v4 then.
> 
> - Arnaldo

Hi Arnaldo,

Thank you for following up this patchset, maybe we can merge patch4 to patch7 into your branch first?
They have all been reviewed/acked and there is no dispute. And patch4-7 does not depend on patch1-3,
because patch4-7 is about the metric of ali_drw_pmu, and patch1-3 is about the metric of arm_cmn.
It may take some time for patch1-3 to reach a consensus.

Thanks,
Jing
  
John Garry June 16, 2023, 11:41 a.m. UTC | #14
On 15/06/2023 03:18, Jing Zhang wrote:
>>>>> Unit is the format of the event_source device name. We should match based on that as well as compat. I need to check the code again to understand how that is done... it has changed a good bit in 3 years.
>>>>>
>>>> This situation only happens on uncore metric. I happened to write wrong Unit, but the metric still matches.
>>>>
>>> I'm just double checking this now. I think any possible fix should be easy enough for current code but may be tricky for backport with lots of metric code changes.
>> I also have code to re-work sys event metric support such that we don't require "compat" or "Unit" values for a metric when the metric is described in terms of event aliases. That code is 2 years old, so may take a bit of time to rebase. I'll look to do that now.
>>
> Sounds good!

BTW, I am just looking at your cmn JSONs in this series, and we have 
something like this:

index 0000000..e70ac1a
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metrics.json
@@ -0,0 +1,74 @@
+[
+	{
+		"MetricName": "slc_miss_rate",
+		"BriefDescription": "The system level cache miss rate include.",
+		"MetricGroup": "arm_cmn",
+		"MetricExpr": "hnf_cache_miss / hnf_slc_sf_cache_access",

So this expression uses event aliases hnf_cache_miss and 
hnf_slc_sf_cache_access - where are they defined in a JSON?

I could not see them. If they are not needed, then I may be missing 
something...

Thanks,
John
  
Jing Zhang June 19, 2023, 2:58 a.m. UTC | #15
在 2023/6/16 下午7:41, John Garry 写道:
> On 15/06/2023 03:18, Jing Zhang wrote:
>>>>>> Unit is the format of the event_source device name. We should match based on that as well as compat. I need to check the code again to understand how that is done... it has changed a good bit in 3 years.
>>>>>>
>>>>> This situation only happens on uncore metric. I happened to write wrong Unit, but the metric still matches.
>>>>>
>>>> I'm just double checking this now. I think any possible fix should be easy enough for current code but may be tricky for backport with lots of metric code changes.
>>> I also have code to re-work sys event metric support such that we don't require "compat" or "Unit" values for a metric when the metric is described in terms of event aliases. That code is 2 years old, so may take a bit of time to rebase. I'll look to do that now.
>>>
>> Sounds good!
> 
> BTW, I am just looking at your cmn JSONs in this series, and we have something like this:
> 
> index 0000000..e70ac1a
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metrics.json
> @@ -0,0 +1,74 @@
> +[
> +    {
> +        "MetricName": "slc_miss_rate",
> +        "BriefDescription": "The system level cache miss rate include.",
> +        "MetricGroup": "arm_cmn",
> +        "MetricExpr": "hnf_cache_miss / hnf_slc_sf_cache_access",
> 
> So this expression uses event aliases hnf_cache_miss and hnf_slc_sf_cache_access - where are they defined in a JSON?
> 

Hi John,

I defined the aliases for these events in the JSON file during the RFC version. However, I later removed the alias
definitions for these events in subsequent versions due to the possibility of non-uniqueness and difficulty in defining
their EventCode. But this does not affect their usage in metrics. In other words, metrics can use the aliases without
defining event aliases in the JSON file.

In the past, I always thought that the function of the alias was to explain the meaning of these events in the perf list.
Or maybe I'm missing something?

Thanks,
Jing

> I could not see them. If they are not needed, then I may be missing something...
> 
> Thanks,
> John
> 
>
  
John Garry June 19, 2023, 7:07 a.m. UTC | #16
On 19/06/2023 03:58, Jing Zhang wrote:
>> +++ b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metrics.json
>> @@ -0,0 +1,74 @@
>> +[
>> +    {
>> +        "MetricName": "slc_miss_rate",
>> +        "BriefDescription": "The system level cache miss rate include.",
>> +        "MetricGroup": "arm_cmn",
>> +        "MetricExpr": "hnf_cache_miss / hnf_slc_sf_cache_access",
>>
>> So this expression uses event aliases hnf_cache_miss and hnf_slc_sf_cache_access - where are they defined in a JSON?
>>
> Hi John,
> 
> I defined the aliases for these events in the JSON file during the RFC version. However, I later removed the alias
> definitions for these events in subsequent versions due to the possibility of non-uniqueness and difficulty in defining
> their EventCode. But this does not affect their usage in metrics. In other words, metrics can use the aliases without
> defining event aliases in the JSON file.

Really? So how can we resolve the event aliases when we try to run the 
metric?

Please verify running these metrics with 'perf stat', like 'perf stat -v 
-M slc_miss_rate'

> 
> In the past, I always thought that the function of the alias was to explain the meaning of these events in the perf list.
> Or maybe I'm missing something?

Event aliases do give the ability to describe the event in perf list. 
But we can also run them for 'perf stat', like:

./perf list uncore
List of pre-defined events (to be used in -e or -M):

   uncore_cbox_0/clockticks/                          [Kernel PMU event]
   uncore_cbox_1/clockticks/                          [Kernel PMU event]
   uncore_imc/data_reads/                             [Kernel PMU event]
   uncore_imc/data_writes/                            [Kernel PMU event]
   uncore_imc/gt_requests/                            [Kernel PMU event]
   uncore_imc/ia_requests/                            [Kernel PMU event]
   uncore_imc/io_requests/                            [Kernel PMU event]

uncore cache:
   unc_cbo_cache_lookup.any_es
        [L3 Lookup any request that access cache and found line in E or 
S-state. Unit: uncore_cbox]
...

sudo ./perf stat -v -e unc_cbo_cache_lookup.any_es
Using CPUID GenuineIntel-6-3D-4
unc_cbo_cache_lookup.any_es -> uncore_cbox_0/event=0x34,umask=0x86/
unc_cbo_cache_lookup.any_es -> uncore_cbox_1/event=0x34,umask=0x86/
Control descriptor is not initialized
^Cunc_cbo_cache_lookup.any_es: 14361103 1853372468 1853372468
unc_cbo_cache_lookup.any_es: 14322188 1853360415 1853360415

  Performance counter stats for 'system wide':

         14,361,103      unc_cbo_cache_lookup.any_es 

         14,322,188      unc_cbo_cache_lookup.any_es 


        1.853388227 seconds time elapsed


Thanks,
John
  
Jing Zhang June 19, 2023, 8:59 a.m. UTC | #17
在 2023/6/19 下午3:07, John Garry 写道:
> On 19/06/2023 03:58, Jing Zhang wrote:
>>> +++ b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metrics.json
>>> @@ -0,0 +1,74 @@
>>> +[
>>> +    {
>>> +        "MetricName": "slc_miss_rate",
>>> +        "BriefDescription": "The system level cache miss rate include.",
>>> +        "MetricGroup": "arm_cmn",
>>> +        "MetricExpr": "hnf_cache_miss / hnf_slc_sf_cache_access",
>>>
>>> So this expression uses event aliases hnf_cache_miss and hnf_slc_sf_cache_access - where are they defined in a JSON?
>>>
>> Hi John,
>>
>> I defined the aliases for these events in the JSON file during the RFC version. However, I later removed the alias
>> definitions for these events in subsequent versions due to the possibility of non-uniqueness and difficulty in defining
>> their EventCode. But this does not affect their usage in metrics. In other words, metrics can use the aliases without
>> defining event aliases in the JSON file.
> 
> Really? So how can we resolve the event aliases when we try to run the metric?
> 
> Please verify running these metrics with 'perf stat', like 'perf stat -v -M slc_miss_rate'
> 

Ok, it shows:
#./perf stat -v -M slc_miss_rate sleep 1

metric expr hnf_cache_miss / hnf_slc_sf_cache_access for slc_miss_rate
found event duration_time
found event hnf_slc_sf_cache_access
found event hnf_cache_miss
Parsing metric events '{hnf_slc_sf_cache_access/metric-id=hnf_slc_sf_cache_access/,hnf_cache_miss/metric-id=hnf_cache_miss/}:W,duration_time'
hnf_slc_sf_cache_access -> arm_cmn_0/type=0x5,eventid=0x2/
hnf_slc_sf_cache_access -> arm_cmn_1/type=0x5,eventid=0x2/
hnf_cache_miss -> arm_cmn_0/type=0x5,eventid=0x1/
hnf_cache_miss -> arm_cmn_1/type=0x5,eventid=0x1/
Control descriptor is not initialized
hnf_slc_sf_cache_access: 127615 1001344900 1001344900
hnf_cache_miss: 36829 1001344900 1001344900
hnf_slc_sf_cache_access: 131526 1001343540 1001343540
hnf_cache_miss: 40587 1001343540 1001343540
duration_time: 1001381687 1001381687 1001381687

 Performance counter stats for 'system wide':

           259,141      hnf_slc_sf_cache_access   #     29.9 %  slc_miss_rate
            77,416      hnf_cache_miss
     1,001,381,687 ns   duration_time

       1.001381687 seconds time elapsed



#./perf list
...
 arm_cmn_0/hnf_cache_miss/                          [Kernel PMU event]
 arm_cmn_0/hnf_slc_sf_cache_access/                 [Kernel PMU event]
...
 arm_cmn_1/hnf_cache_miss/                          [Kernel PMU event]
 arm_cmn_1/hnf_slc_sf_cache_access/                 [Kernel PMU event]
...

>>
>> In the past, I always thought that the function of the alias was to explain the meaning of these events in the perf list.
>> Or maybe I'm missing something?
> 
> Event aliases do give the ability to describe the event in perf list. But we can also run them for 'perf stat', like:
> 
> ./perf list uncore
> List of pre-defined events (to be used in -e or -M):
> 
>   uncore_cbox_0/clockticks/                          [Kernel PMU event]
>   uncore_cbox_1/clockticks/                          [Kernel PMU event]
>   uncore_imc/data_reads/                             [Kernel PMU event]
>   uncore_imc/data_writes/                            [Kernel PMU event]
>   uncore_imc/gt_requests/                            [Kernel PMU event]
>   uncore_imc/ia_requests/                            [Kernel PMU event]
>   uncore_imc/io_requests/                            [Kernel PMU event]
> 
> uncore cache:
>   unc_cbo_cache_lookup.any_es
>        [L3 Lookup any request that access cache and found line in E or S-state. Unit: uncore_cbox]
> ...
> 
> sudo ./perf stat -v -e unc_cbo_cache_lookup.any_es
> Using CPUID GenuineIntel-6-3D-4
> unc_cbo_cache_lookup.any_es -> uncore_cbox_0/event=0x34,umask=0x86/
> unc_cbo_cache_lookup.any_es -> uncore_cbox_1/event=0x34,umask=0x86/
> Control descriptor is not initialized
> ^Cunc_cbo_cache_lookup.any_es: 14361103 1853372468 1853372468
> unc_cbo_cache_lookup.any_es: 14322188 1853360415 1853360415
> 
>  Performance counter stats for 'system wide':
> 
>         14,361,103      unc_cbo_cache_lookup.any_es
>         14,322,188      unc_cbo_cache_lookup.any_es
> 
>        1.853388227 seconds time elapsed
> 

Ok, thanks. If I use events without a prefix, such as perf stat -e clockticks sleep 1, will this also work?

Thanks,
Jing


> 
> Thanks,
> John
  
John Garry June 19, 2023, 9:31 a.m. UTC | #18
On 19/06/2023 09:59, Jing Zhang wrote:
>> Please verify running these metrics with 'perf stat', like 'perf stat -v -M slc_miss_rate'
>>
> Ok, it shows:
> #./perf stat -v -M slc_miss_rate sleep 1
> 
> metric expr hnf_cache_miss / hnf_slc_sf_cache_access for slc_miss_rate
> found event duration_time
> found event hnf_slc_sf_cache_access

In the earlier RFC series you had 
tools/perf/pmu-events/arch/arm64/arm/cmn700/sys/cmn.json, which 
describes event hnf_slc_sf_cache_access

But that JSON is not in this series. Why is it not included?

The cmn kernel driver exposes event hnf_slc_sf_cache_access, but I did 
not think that perf tool metric code matches those events described in 
/bus/event_sourcs/devices/<PMU>/events

> found event hnf_cache_miss
> Parsing metric events '{hnf_slc_sf_cache_access/metric-id=hnf_slc_sf_cache_access/,hnf_cache_miss/metric-id=hnf_cache_miss/}:W,duration_time'
> hnf_slc_sf_cache_access -> arm_cmn_0/type=0x5,eventid=0x2/
> hnf_slc_sf_cache_access -> arm_cmn_1/type=0x5,eventid=0x2/
> hnf_cache_miss -> arm_cmn_0/type=0x5,eventid=0x1/
> hnf_cache_miss -> arm_cmn_1/type=0x5,eventid=0x1/
> Control descriptor is not initialized
> hnf_slc_sf_cache_access: 127615 1001344900 1001344900
> hnf_cache_miss: 36829 1001344900 1001344900
> hnf_slc_sf_cache_access: 131526 1001343540 1001343540
> hnf_cache_miss: 40587 1001343540 1001343540
> duration_time: 1001381687 1001381687 1001381687
> 
>   Performance counter stats for 'system wide':
> 
>             259,141      hnf_slc_sf_cache_access   #     29.9 %  slc_miss_rate
>              77,416      hnf_cache_miss
>       1,001,381,687 ns   duration_time
> 
>         1.001381687 seconds time elapsed
> 
> 
> 
> #./perf list
> ...
>   arm_cmn_0/hnf_cache_miss/                          [Kernel PMU event]
>   arm_cmn_0/hnf_slc_sf_cache_access/                 [Kernel PMU event]
> ...
>   arm_cmn_1/hnf_cache_miss/                          [Kernel PMU event]
>   arm_cmn_1/hnf_slc_sf_cache_access/                 [Kernel PMU event]
> ...
> 
>>> In the past, I always thought that the function of the alias was to explain the meaning of these events in the perf list.
>>> Or maybe I'm missing something?
>> Event aliases do give the ability to describe the event in perf list. But we can also run them for 'perf stat', like:
>>
>> ./perf list uncore
>> List of pre-defined events (to be used in -e or -M):
>>
>>    uncore_cbox_0/clockticks/                          [Kernel PMU event]
>>    uncore_cbox_1/clockticks/                          [Kernel PMU event]
>>    uncore_imc/data_reads/                             [Kernel PMU event]
>>    uncore_imc/data_writes/                            [Kernel PMU event]
>>    uncore_imc/gt_requests/                            [Kernel PMU event]
>>    uncore_imc/ia_requests/                            [Kernel PMU event]
>>    uncore_imc/io_requests/                            [Kernel PMU event]
>>
>> uncore cache:
>>    unc_cbo_cache_lookup.any_es
>>         [L3 Lookup any request that access cache and found line in E or S-state. Unit: uncore_cbox]
>> ...
>>
>> sudo ./perf stat -v -e unc_cbo_cache_lookup.any_es
>> Using CPUID GenuineIntel-6-3D-4
>> unc_cbo_cache_lookup.any_es -> uncore_cbox_0/event=0x34,umask=0x86/
>> unc_cbo_cache_lookup.any_es -> uncore_cbox_1/event=0x34,umask=0x86/
>> Control descriptor is not initialized
>> ^Cunc_cbo_cache_lookup.any_es: 14361103 1853372468 1853372468
>> unc_cbo_cache_lookup.any_es: 14322188 1853360415 1853360415
>>
>>   Performance counter stats for 'system wide':
>>
>>          14,361,103      unc_cbo_cache_lookup.any_es
>>          14,322,188      unc_cbo_cache_lookup.any_es
>>
>>         1.853388227 seconds time elapsed
>>
> Ok, thanks. If I use events without a prefix, such as perf stat -e clockticks sleep 1, will this also work?

In this case, yes - it would work for uncore_cbox_0/clockticks/ and 
uncore_cbox_1/clockticks/

But you need to be careful to here - if another PMU has same event name, 
then it might also match.

Thanks,
John
  
Jing Zhang June 20, 2023, 2:12 a.m. UTC | #19
在 2023/6/19 下午5:31, John Garry 写道:
> On 19/06/2023 09:59, Jing Zhang wrote:
>>> Please verify running these metrics with 'perf stat', like 'perf stat -v -M slc_miss_rate'
>>>
>> Ok, it shows:
>> #./perf stat -v -M slc_miss_rate sleep 1
>>
>> metric expr hnf_cache_miss / hnf_slc_sf_cache_access for slc_miss_rate
>> found event duration_time
>> found event hnf_slc_sf_cache_access
> 
> In the earlier RFC series you had tools/perf/pmu-events/arch/arm64/arm/cmn700/sys/cmn.json, which describes event hnf_slc_sf_cache_access
> 
> But that JSON is not in this series. Why is it not included?
> 

Because the RFC version of the cmn.json file does not define the EventCode for each event, this will not take effect, so I temporarily removed it.
The EventID of CMN events is different from other events. For example, hnf_slc_sf_cache_access corresponds to arm_cmn_0/type=0x5,eventid=0x2/.
The current JSON format parsing does not support this EventID, and jevent.py needs to be extended.

> The cmn kernel driver exposes event hnf_slc_sf_cache_access, but I did not think that perf tool metric code matches those events described in /bus/event_sourcs/devices/<PMU>/events
> 

If there is no alias defined, other events with the same name may be matched, it is indeed necessary to define an alias for each event first,
and I will add it in the next version. But first I need to extend jevent.py to support cmn EventID.

>> found event hnf_cache_miss
>> Parsing metric events '{hnf_slc_sf_cache_access/metric-id=hnf_slc_sf_cache_access/,hnf_cache_miss/metric-id=hnf_cache_miss/}:W,duration_time'
>> hnf_slc_sf_cache_access -> arm_cmn_0/type=0x5,eventid=0x2/
>> hnf_slc_sf_cache_access -> arm_cmn_1/type=0x5,eventid=0x2/
>> hnf_cache_miss -> arm_cmn_0/type=0x5,eventid=0x1/
>> hnf_cache_miss -> arm_cmn_1/type=0x5,eventid=0x1/
>> Control descriptor is not initialized
>> hnf_slc_sf_cache_access: 127615 1001344900 1001344900
>> hnf_cache_miss: 36829 1001344900 1001344900
>> hnf_slc_sf_cache_access: 131526 1001343540 1001343540
>> hnf_cache_miss: 40587 1001343540 1001343540
>> duration_time: 1001381687 1001381687 1001381687
>>
>>   Performance counter stats for 'system wide':
>>
>>             259,141      hnf_slc_sf_cache_access   #     29.9 %  slc_miss_rate
>>              77,416      hnf_cache_miss
>>       1,001,381,687 ns   duration_time
>>
>>         1.001381687 seconds time elapsed
>>
>>
>>
>> #./perf list
>> ...
>>   arm_cmn_0/hnf_cache_miss/                          [Kernel PMU event]
>>   arm_cmn_0/hnf_slc_sf_cache_access/                 [Kernel PMU event]
>> ...
>>   arm_cmn_1/hnf_cache_miss/                          [Kernel PMU event]
>>   arm_cmn_1/hnf_slc_sf_cache_access/                 [Kernel PMU event]
>> ...
>>
>>>> In the past, I always thought that the function of the alias was to explain the meaning of these events in the perf list.
>>>> Or maybe I'm missing something?
>>> Event aliases do give the ability to describe the event in perf list. But we can also run them for 'perf stat', like:
>>>
>>> ./perf list uncore
>>> List of pre-defined events (to be used in -e or -M):
>>>
>>>    uncore_cbox_0/clockticks/                          [Kernel PMU event]
>>>    uncore_cbox_1/clockticks/                          [Kernel PMU event]
>>>    uncore_imc/data_reads/                             [Kernel PMU event]
>>>    uncore_imc/data_writes/                            [Kernel PMU event]
>>>    uncore_imc/gt_requests/                            [Kernel PMU event]
>>>    uncore_imc/ia_requests/                            [Kernel PMU event]
>>>    uncore_imc/io_requests/                            [Kernel PMU event]
>>>
>>> uncore cache:
>>>    unc_cbo_cache_lookup.any_es
>>>         [L3 Lookup any request that access cache and found line in E or S-state. Unit: uncore_cbox]
>>> ...
>>>
>>> sudo ./perf stat -v -e unc_cbo_cache_lookup.any_es
>>> Using CPUID GenuineIntel-6-3D-4
>>> unc_cbo_cache_lookup.any_es -> uncore_cbox_0/event=0x34,umask=0x86/
>>> unc_cbo_cache_lookup.any_es -> uncore_cbox_1/event=0x34,umask=0x86/
>>> Control descriptor is not initialized
>>> ^Cunc_cbo_cache_lookup.any_es: 14361103 1853372468 1853372468
>>> unc_cbo_cache_lookup.any_es: 14322188 1853360415 1853360415
>>>
>>>   Performance counter stats for 'system wide':
>>>
>>>          14,361,103      unc_cbo_cache_lookup.any_es
>>>          14,322,188      unc_cbo_cache_lookup.any_es
>>>
>>>         1.853388227 seconds time elapsed
>>>
>> Ok, thanks. If I use events without a prefix, such as perf stat -e clockticks sleep 1, will this also work?
> 
> In this case, yes - it would work for uncore_cbox_0/clockticks/ and uncore_cbox_1/clockticks/
> 
> But you need to be careful to here - if another PMU has same event name, then it might also match.
> 

Ok, I got it.

Thanks,
Jing

> Thanks,
> John
  
John Garry June 20, 2023, 7:01 a.m. UTC | #20
On 20/06/2023 03:12, Jing Zhang wrote:
>> But that JSON is not in this series. Why is it not included?
>>
> Because the RFC version of the cmn.json file does not define the EventCode for each event, this will not take effect, so I temporarily removed it.
> The EventID of CMN events is different from other events. For example, hnf_slc_sf_cache_access corresponds to arm_cmn_0/type=0x5,eventid=0x2/.
> The current JSON format parsing does not support this EventID, and jevent.py needs to be extended.

So please do that then. I would suggest just to first support event 
aliasing, and then support metrics. JFYI, I am still reworking current 
perf tool metric code for sys events, which should take a few more days.

> 
>> The cmn kernel driver exposes event hnf_slc_sf_cache_access, but I did not think that perf tool metric code matches those events described in/bus/event_sourcs/devices/<PMU>/events
>>
> If there is no alias defined, other events with the same name may be matched, it is indeed necessary to define an alias for each event first,
> and I will add it in the next version. But first I need to extend jevent.py to support cmn EventID.

ok, please do this. I assume that you are talking about wildcard 
matching for cmn HW identifier. We would also need perf tool self-test 
expanded to cover this wildcard matching - see tests/pmu-events.c

Thanks,
John
  
Jing Zhang June 21, 2023, 3:15 a.m. UTC | #21
在 2023/6/20 下午3:01, John Garry 写道:
> On 20/06/2023 03:12, Jing Zhang wrote:
>>> But that JSON is not in this series. Why is it not included?
>>>
>> Because the RFC version of the cmn.json file does not define the EventCode for each event, this will not take effect, so I temporarily removed it.
>> The EventID of CMN events is different from other events. For example, hnf_slc_sf_cache_access corresponds to arm_cmn_0/type=0x5,eventid=0x2/.
>> The current JSON format parsing does not support this EventID, and jevent.py needs to be extended.
> 
> So please do that then. I would suggest just to first support event aliasing, and then support metrics. JFYI, I am still reworking current perf tool metric code for sys events, which should take a few more days.
> 

Ok, thank you.

>>
>>> The cmn kernel driver exposes event hnf_slc_sf_cache_access, but I did not think that perf tool metric code matches those events described in/bus/event_sourcs/devices/<PMU>/events
>>>
>> If there is no alias defined, other events with the same name may be matched, it is indeed necessary to define an alias for each event first,
>> and I will add it in the next version. But first I need to extend jevent.py to support cmn EventID.
> 
> ok, please do this. I assume that you are talking about wildcard matching for cmn HW identifier. We would also need perf tool self-test expanded to cover this wildcard matching - see tests/pmu-events.c
> 

Ok, let me try.

Thanks,
Jing
  
Namhyung Kim June 23, 2023, 11:52 p.m. UTC | #22
Hello,

On Wed, Jun 14, 2023 at 8:42 PM Jing Zhang <renyu.zj@linux.alibaba.com> wrote:
>
>
>
> 在 2023/6/6 上午3:39, Arnaldo Carvalho de Melo 写道:
> > Em Mon, Jun 05, 2023 at 10:46:36AM +0800, Jing Zhang escreveu:
> >> 在 2023/6/3 上午12:20, John Garry 写道:
> >>> On 01/06/2023 09:58, Jing Zhang wrote:
> >>>> It can match identifiers "arm_cmn600_{0,1,2..X}" or "arm_cmn650_{0,1,2..X}" or "arm_cmn700_{0,1,2..X}" or "arm_ci700_{ 0,1,2..X}".
> >>>> In other words, it can match all identifiers prefixed with “arm_cmn600” or “arm_cmn650” or “arm_cmn700” or “arm_ci700”.
> >>>>
> >>>> If a new model arm_cmn driver with identifier "arm_cmn750_0", it will not be matched, but if a new revision arm_cmn driver with identifier
> >>>> "arm_cmn700_4", it can be matched.
> >>>
> >>> OK, I see what you mean. My confusion came about though your commit message on this same patch, which did not mention cmn650. I assumed that the example event which you were describing was supported for arm_cmn650 and you intentionally omitted it.
> >>>
> >>
> >> Yes, I didn't write cmn650 in the commit message, just because I omitted it. I will add it in a later version.
> >
> > Ok, will wait for v4 then.
> >
> > - Arnaldo
>
> Hi Arnaldo,
>
> Thank you for following up this patchset, maybe we can merge patch4 to patch7 into your branch first?
> They have all been reviewed/acked and there is no dispute. And patch4-7 does not depend on patch1-3,
> because patch4-7 is about the metric of ali_drw_pmu, and patch1-3 is about the metric of arm_cmn.
> It may take some time for patch1-3 to reach a consensus.

IIUC patch 4 is for the kernel and patch 5-7 semantically depend on it.
So the patch 4 should be picked up by the kernel maintainers first.

Thanks,
Namhyung
  
Jing Zhang June 25, 2023, 8:55 a.m. UTC | #23
在 2023/6/24 上午7:52, Namhyung Kim 写道:
> Hello,
> 
> On Wed, Jun 14, 2023 at 8:42 PM Jing Zhang <renyu.zj@linux.alibaba.com> wrote:
>>
>>
>>
>> 在 2023/6/6 上午3:39, Arnaldo Carvalho de Melo 写道:
>>> Em Mon, Jun 05, 2023 at 10:46:36AM +0800, Jing Zhang escreveu:
>>>> 在 2023/6/3 上午12:20, John Garry 写道:
>>>>> On 01/06/2023 09:58, Jing Zhang wrote:
>>>>>> It can match identifiers "arm_cmn600_{0,1,2..X}" or "arm_cmn650_{0,1,2..X}" or "arm_cmn700_{0,1,2..X}" or "arm_ci700_{ 0,1,2..X}".
>>>>>> In other words, it can match all identifiers prefixed with “arm_cmn600” or “arm_cmn650” or “arm_cmn700” or “arm_ci700”.
>>>>>>
>>>>>> If a new model arm_cmn driver with identifier "arm_cmn750_0", it will not be matched, but if a new revision arm_cmn driver with identifier
>>>>>> "arm_cmn700_4", it can be matched.
>>>>>
>>>>> OK, I see what you mean. My confusion came about though your commit message on this same patch, which did not mention cmn650. I assumed that the example event which you were describing was supported for arm_cmn650 and you intentionally omitted it.
>>>>>
>>>>
>>>> Yes, I didn't write cmn650 in the commit message, just because I omitted it. I will add it in a later version.
>>>
>>> Ok, will wait for v4 then.
>>>
>>> - Arnaldo
>>
>> Hi Arnaldo,
>>
>> Thank you for following up this patchset, maybe we can merge patch4 to patch7 into your branch first?
>> They have all been reviewed/acked and there is no dispute. And patch4-7 does not depend on patch1-3,
>> because patch4-7 is about the metric of ali_drw_pmu, and patch1-3 is about the metric of arm_cmn.
>> It may take some time for patch1-3 to reach a consensus.
> 
> IIUC patch 4 is for the kernel and patch 5-7 semantically depend on it.
> So the patch 4 should be picked up by the kernel maintainers first.
> 

Yes, that's right, thanks.

> Thanks,
> Namhyung
  

Patch

diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index f3559be..c12ccd9 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -456,6 +456,28 @@  struct metricgroup_iter_data {
 	void *data;
 };
 
+static bool match_pmu_identifier(const char *id, const char *compat)
+{
+	char *tmp = NULL, *tok, *str;
+	bool res;
+
+	str = strdup(compat);
+	if (!str)
+		return false;
+
+	tok = strtok_r(str, ";", &tmp);
+	for (; tok; tok = strtok_r(NULL, ";", &tmp)) {
+		if (!strncmp(id, tok, strlen(tok))) {
+			res = true;
+			goto out;
+		}
+	}
+	res = false;
+out:
+	free(str);
+	return res;
+}
+
 static int metricgroup__sys_event_iter(const struct pmu_metric *pm,
 				       const struct pmu_metrics_table *table,
 				       void *data)
@@ -468,7 +490,7 @@  static int metricgroup__sys_event_iter(const struct pmu_metric *pm,
 
 	while ((pmu = perf_pmu__scan(pmu))) {
 
-		if (!pmu->id || strcmp(pmu->id, pm->compat))
+		if (!pmu->id || !match_pmu_identifier(pmu->id, pm->compat))
 			continue;
 
 		return d->fn(pm, table, d->data);