[kvm-unit-tests,2/5] x86: pmu: Change the minimum value of llc_misses event to 0

Message ID 20231024075748.1675382-3-dapeng1.mi@linux.intel.com
State New
Headers
Series Fix PMU test failures on Sapphire Rapids |

Commit Message

Mi, Dapeng Oct. 24, 2023, 7:57 a.m. UTC
  Along with the CPU HW's upgrade and optimization, the count of LLC
misses event for running loop() helper could be 0 just like seen on
Sapphire Rapids.

So modify the lower limit of possible count range for LLC misses
events to 0 to avoid LLC misses event test failure on Sapphire Rapids.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 x86/pmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Jim Mattson Oct. 24, 2023, 1:03 p.m. UTC | #1
On Tue, Oct 24, 2023 at 12:51 AM Dapeng Mi <dapeng1.mi@linux.intel.com> wrote:
>
> Along with the CPU HW's upgrade and optimization, the count of LLC
> misses event for running loop() helper could be 0 just like seen on
> Sapphire Rapids.
>
> So modify the lower limit of possible count range for LLC misses
> events to 0 to avoid LLC misses event test failure on Sapphire Rapids.

I'm not convinced that these tests are really indicative of whether or
not the PMU is working properly. If 0 is allowed for llc misses, for
instance, doesn't this sub-test pass even when the PMU is disabled?

Surely, we can do better.

> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> ---
>  x86/pmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/x86/pmu.c b/x86/pmu.c
> index 0def28695c70..7443fdab5c8a 100644
> --- a/x86/pmu.c
> +++ b/x86/pmu.c
> @@ -35,7 +35,7 @@ struct pmu_event {
>         {"instructions", 0x00c0, 10*N, 10.2*N},
>         {"ref cycles", 0x013c, 1*N, 30*N},
>         {"llc references", 0x4f2e, 1, 2*N},
> -       {"llc misses", 0x412e, 1, 1*N},
> +       {"llc misses", 0x412e, 0, 1*N},
>         {"branches", 0x00c4, 1*N, 1.1*N},
>         {"branch misses", 0x00c5, 0, 0.1*N},
>  }, amd_gp_events[] = {
> --
> 2.34.1
>
  
Mi, Dapeng Oct. 25, 2023, 11:22 a.m. UTC | #2
On 10/24/2023 9:03 PM, Jim Mattson wrote:
> On Tue, Oct 24, 2023 at 12:51 AM Dapeng Mi <dapeng1.mi@linux.intel.com> wrote:
>> Along with the CPU HW's upgrade and optimization, the count of LLC
>> misses event for running loop() helper could be 0 just like seen on
>> Sapphire Rapids.
>>
>> So modify the lower limit of possible count range for LLC misses
>> events to 0 to avoid LLC misses event test failure on Sapphire Rapids.
> I'm not convinced that these tests are really indicative of whether or
> not the PMU is working properly. If 0 is allowed for llc misses, for
> instance, doesn't this sub-test pass even when the PMU is disabled?
>
> Surely, we can do better.


Considering the testing workload is just a simple adding loop, it's 
reasonable and possible that it gets a 0 result for LLC misses and 
branch misses events. Yeah, I agree the 0 count makes the results not so 
credible. If we want to avoid these 0 count values, we may have to 
complicate the workload, such as adding flush cache instructions, or 
something like that (I'm not sure if there are instructions which can 
force branch misses). How's your idea about this?


>
>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>> ---
>>   x86/pmu.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/x86/pmu.c b/x86/pmu.c
>> index 0def28695c70..7443fdab5c8a 100644
>> --- a/x86/pmu.c
>> +++ b/x86/pmu.c
>> @@ -35,7 +35,7 @@ struct pmu_event {
>>          {"instructions", 0x00c0, 10*N, 10.2*N},
>>          {"ref cycles", 0x013c, 1*N, 30*N},
>>          {"llc references", 0x4f2e, 1, 2*N},
>> -       {"llc misses", 0x412e, 1, 1*N},
>> +       {"llc misses", 0x412e, 0, 1*N},
>>          {"branches", 0x00c4, 1*N, 1.1*N},
>>          {"branch misses", 0x00c5, 0, 0.1*N},
>>   }, amd_gp_events[] = {
>> --
>> 2.34.1
>>
  
Jim Mattson Oct. 25, 2023, 12:35 p.m. UTC | #3
On Wed, Oct 25, 2023 at 4:23 AM Mi, Dapeng <dapeng1.mi@linux.intel.com> wrote:
>
>
> On 10/24/2023 9:03 PM, Jim Mattson wrote:
> > On Tue, Oct 24, 2023 at 12:51 AM Dapeng Mi <dapeng1.mi@linux.intel.com> wrote:
> >> Along with the CPU HW's upgrade and optimization, the count of LLC
> >> misses event for running loop() helper could be 0 just like seen on
> >> Sapphire Rapids.
> >>
> >> So modify the lower limit of possible count range for LLC misses
> >> events to 0 to avoid LLC misses event test failure on Sapphire Rapids.
> > I'm not convinced that these tests are really indicative of whether or
> > not the PMU is working properly. If 0 is allowed for llc misses, for
> > instance, doesn't this sub-test pass even when the PMU is disabled?
> >
> > Surely, we can do better.
>
>
> Considering the testing workload is just a simple adding loop, it's
> reasonable and possible that it gets a 0 result for LLC misses and
> branch misses events. Yeah, I agree the 0 count makes the results not so
> credible. If we want to avoid these 0 count values, we may have to
> complicate the workload, such as adding flush cache instructions, or
> something like that (I'm not sure if there are instructions which can
> force branch misses). How's your idea about this?

CLFLUSH is probably a good way to ensure cache misses. IBPB may be a
good way to ensure branch mispredictions, or IBRS on parts without
eIBRS.

>
> >
> >> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> >> ---
> >>   x86/pmu.c | 2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/x86/pmu.c b/x86/pmu.c
> >> index 0def28695c70..7443fdab5c8a 100644
> >> --- a/x86/pmu.c
> >> +++ b/x86/pmu.c
> >> @@ -35,7 +35,7 @@ struct pmu_event {
> >>          {"instructions", 0x00c0, 10*N, 10.2*N},
> >>          {"ref cycles", 0x013c, 1*N, 30*N},
> >>          {"llc references", 0x4f2e, 1, 2*N},
> >> -       {"llc misses", 0x412e, 1, 1*N},
> >> +       {"llc misses", 0x412e, 0, 1*N},
> >>          {"branches", 0x00c4, 1*N, 1.1*N},
> >>          {"branch misses", 0x00c5, 0, 0.1*N},
> >>   }, amd_gp_events[] = {
> >> --
> >> 2.34.1
> >>
  
Mi, Dapeng Oct. 26, 2023, 2:14 a.m. UTC | #4
On 10/25/2023 8:35 PM, Jim Mattson wrote:
> On Wed, Oct 25, 2023 at 4:23 AM Mi, Dapeng <dapeng1.mi@linux.intel.com> wrote:
>>
>> On 10/24/2023 9:03 PM, Jim Mattson wrote:
>>> On Tue, Oct 24, 2023 at 12:51 AM Dapeng Mi <dapeng1.mi@linux.intel.com> wrote:
>>>> Along with the CPU HW's upgrade and optimization, the count of LLC
>>>> misses event for running loop() helper could be 0 just like seen on
>>>> Sapphire Rapids.
>>>>
>>>> So modify the lower limit of possible count range for LLC misses
>>>> events to 0 to avoid LLC misses event test failure on Sapphire Rapids.
>>> I'm not convinced that these tests are really indicative of whether or
>>> not the PMU is working properly. If 0 is allowed for llc misses, for
>>> instance, doesn't this sub-test pass even when the PMU is disabled?
>>>
>>> Surely, we can do better.
>>
>> Considering the testing workload is just a simple adding loop, it's
>> reasonable and possible that it gets a 0 result for LLC misses and
>> branch misses events. Yeah, I agree the 0 count makes the results not so
>> credible. If we want to avoid these 0 count values, we may have to
>> complicate the workload, such as adding flush cache instructions, or
>> something like that (I'm not sure if there are instructions which can
>> force branch misses). How's your idea about this?
> CLFLUSH is probably a good way to ensure cache misses. IBPB may be a
> good way to ensure branch mispredictions, or IBRS on parts without
> eIBRS.


Thanks Jim for the information. I'm not familiar with IBPB/IBRS 
instructions, but just a glance, it looks there two instructions are 
some kind of advanced instructions,  Not all Intel CPUs support these 
instructions and not sure if AMD has similar instructions. It would be 
better if there are more generic instruction to trigger branch miss. 
Anyway I would look at the details and come back again.


>>>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>>>> ---
>>>>    x86/pmu.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/x86/pmu.c b/x86/pmu.c
>>>> index 0def28695c70..7443fdab5c8a 100644
>>>> --- a/x86/pmu.c
>>>> +++ b/x86/pmu.c
>>>> @@ -35,7 +35,7 @@ struct pmu_event {
>>>>           {"instructions", 0x00c0, 10*N, 10.2*N},
>>>>           {"ref cycles", 0x013c, 1*N, 30*N},
>>>>           {"llc references", 0x4f2e, 1, 2*N},
>>>> -       {"llc misses", 0x412e, 1, 1*N},
>>>> +       {"llc misses", 0x412e, 0, 1*N},
>>>>           {"branches", 0x00c4, 1*N, 1.1*N},
>>>>           {"branch misses", 0x00c5, 0, 0.1*N},
>>>>    }, amd_gp_events[] = {
>>>> --
>>>> 2.34.1
>>>>
  
Jim Mattson Oct. 26, 2023, 12:19 p.m. UTC | #5
On Wed, Oct 25, 2023 at 7:14 PM Mi, Dapeng <dapeng1.mi@linux.intel.com> wrote:
>
>
> On 10/25/2023 8:35 PM, Jim Mattson wrote:
> > On Wed, Oct 25, 2023 at 4:23 AM Mi, Dapeng <dapeng1.mi@linux.intel.com> wrote:
> >>
> >> On 10/24/2023 9:03 PM, Jim Mattson wrote:
> >>> On Tue, Oct 24, 2023 at 12:51 AM Dapeng Mi <dapeng1.mi@linux.intel.com> wrote:
> >>>> Along with the CPU HW's upgrade and optimization, the count of LLC
> >>>> misses event for running loop() helper could be 0 just like seen on
> >>>> Sapphire Rapids.
> >>>>
> >>>> So modify the lower limit of possible count range for LLC misses
> >>>> events to 0 to avoid LLC misses event test failure on Sapphire Rapids.
> >>> I'm not convinced that these tests are really indicative of whether or
> >>> not the PMU is working properly. If 0 is allowed for llc misses, for
> >>> instance, doesn't this sub-test pass even when the PMU is disabled?
> >>>
> >>> Surely, we can do better.
> >>
> >> Considering the testing workload is just a simple adding loop, it's
> >> reasonable and possible that it gets a 0 result for LLC misses and
> >> branch misses events. Yeah, I agree the 0 count makes the results not so
> >> credible. If we want to avoid these 0 count values, we may have to
> >> complicate the workload, such as adding flush cache instructions, or
> >> something like that (I'm not sure if there are instructions which can
> >> force branch misses). How's your idea about this?
> > CLFLUSH is probably a good way to ensure cache misses. IBPB may be a
> > good way to ensure branch mispredictions, or IBRS on parts without
> > eIBRS.
>
>
> Thanks Jim for the information. I'm not familiar with IBPB/IBRS
> instructions, but just a glance, it looks there two instructions are
> some kind of advanced instructions,  Not all Intel CPUs support these
> instructions and not sure if AMD has similar instructions. It would be
> better if there are more generic instruction to trigger branch miss.
> Anyway I would look at the details and come back again.

IBPB and IBRS are not instructions. IBPB (indirect branch predictor
barrier) is triggered by setting bit 0 of the IA32_PRED_CMD MSR. IBRS
(indirect branch restricted speculation) is triggered by setting bit 0
of the IA32_SPEC_CTRL MSR. It is true that the desired behavior of
IBRS (causing branch mispredictions) is only exhibited by certain
older parts. However, IBPB is now universally available, as it is
necessary to mitigate many speculative execution attacks. For Intel
documentation, see
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/cpuid-enumeration-and-architectural-msrs.html.

If you don't want to use these, you could train a branch to go one way
prior to measurement, and then arrange for the branch under test go
the other way.

> >>>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> >>>> ---
> >>>>    x86/pmu.c | 2 +-
> >>>>    1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/x86/pmu.c b/x86/pmu.c
> >>>> index 0def28695c70..7443fdab5c8a 100644
> >>>> --- a/x86/pmu.c
> >>>> +++ b/x86/pmu.c
> >>>> @@ -35,7 +35,7 @@ struct pmu_event {
> >>>>           {"instructions", 0x00c0, 10*N, 10.2*N},
> >>>>           {"ref cycles", 0x013c, 1*N, 30*N},
> >>>>           {"llc references", 0x4f2e, 1, 2*N},
> >>>> -       {"llc misses", 0x412e, 1, 1*N},
> >>>> +       {"llc misses", 0x412e, 0, 1*N},
> >>>>           {"branches", 0x00c4, 1*N, 1.1*N},
> >>>>           {"branch misses", 0x00c5, 0, 0.1*N},
> >>>>    }, amd_gp_events[] = {
> >>>> --
> >>>> 2.34.1
> >>>>
  
Mi, Dapeng Oct. 27, 2023, 10:17 a.m. UTC | #6
On 10/26/2023 8:19 PM, Jim Mattson wrote:
> On Wed, Oct 25, 2023 at 7:14 PM Mi, Dapeng <dapeng1.mi@linux.intel.com> wrote:
>>
>> On 10/25/2023 8:35 PM, Jim Mattson wrote:
>>> On Wed, Oct 25, 2023 at 4:23 AM Mi, Dapeng <dapeng1.mi@linux.intel.com> wrote:
>>>> On 10/24/2023 9:03 PM, Jim Mattson wrote:
>>>>> On Tue, Oct 24, 2023 at 12:51 AM Dapeng Mi <dapeng1.mi@linux.intel.com> wrote:
>>>>>> Along with the CPU HW's upgrade and optimization, the count of LLC
>>>>>> misses event for running loop() helper could be 0 just like seen on
>>>>>> Sapphire Rapids.
>>>>>>
>>>>>> So modify the lower limit of possible count range for LLC misses
>>>>>> events to 0 to avoid LLC misses event test failure on Sapphire Rapids.
>>>>> I'm not convinced that these tests are really indicative of whether or
>>>>> not the PMU is working properly. If 0 is allowed for llc misses, for
>>>>> instance, doesn't this sub-test pass even when the PMU is disabled?
>>>>>
>>>>> Surely, we can do better.
>>>> Considering the testing workload is just a simple adding loop, it's
>>>> reasonable and possible that it gets a 0 result for LLC misses and
>>>> branch misses events. Yeah, I agree the 0 count makes the results not so
>>>> credible. If we want to avoid these 0 count values, we may have to
>>>> complicate the workload, such as adding flush cache instructions, or
>>>> something like that (I'm not sure if there are instructions which can
>>>> force branch misses). How's your idea about this?
>>> CLFLUSH is probably a good way to ensure cache misses. IBPB may be a
>>> good way to ensure branch mispredictions, or IBRS on parts without
>>> eIBRS.
>>
>> Thanks Jim for the information. I'm not familiar with IBPB/IBRS
>> instructions, but just a glance, it looks there two instructions are
>> some kind of advanced instructions,  Not all Intel CPUs support these
>> instructions and not sure if AMD has similar instructions. It would be
>> better if there are more generic instruction to trigger branch miss.
>> Anyway I would look at the details and come back again.
> IBPB and IBRS are not instructions. IBPB (indirect branch predictor
> barrier) is triggered by setting bit 0 of the IA32_PRED_CMD MSR. IBRS
> (indirect branch restricted speculation) is triggered by setting bit 0
> of the IA32_SPEC_CTRL MSR. It is true that the desired behavior of
> IBRS (causing branch mispredictions) is only exhibited by certain
> older parts. However, IBPB is now universally available, as it is
> necessary to mitigate many speculative execution attacks. For Intel
> documentation, see
> https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/cpuid-enumeration-and-architectural-msrs.html.
>
> If you don't want to use these, you could train a branch to go one way
> prior to measurement, and then arrange for the branch under test go
> the other way.


Thanks Jim. From my point of view, IBPB is still some kind of extended 
feature which may be not supported on some older platforms. Considering 
kvm-unit-tests could still be run on these old platforms, IBPB seems not 
the best choice. I'm thinking an alternative way is to use the 'rdrand' 
instruction to get a random value, and then call jmp instruction base on 
the random value results. That would definitely cause branch misses. 
This looks more generic.


>
>>>>>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>>>>>> ---
>>>>>>     x86/pmu.c | 2 +-
>>>>>>     1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/x86/pmu.c b/x86/pmu.c
>>>>>> index 0def28695c70..7443fdab5c8a 100644
>>>>>> --- a/x86/pmu.c
>>>>>> +++ b/x86/pmu.c
>>>>>> @@ -35,7 +35,7 @@ struct pmu_event {
>>>>>>            {"instructions", 0x00c0, 10*N, 10.2*N},
>>>>>>            {"ref cycles", 0x013c, 1*N, 30*N},
>>>>>>            {"llc references", 0x4f2e, 1, 2*N},
>>>>>> -       {"llc misses", 0x412e, 1, 1*N},
>>>>>> +       {"llc misses", 0x412e, 0, 1*N},
>>>>>>            {"branches", 0x00c4, 1*N, 1.1*N},
>>>>>>            {"branch misses", 0x00c5, 0, 0.1*N},
>>>>>>     }, amd_gp_events[] = {
>>>>>> --
>>>>>> 2.34.1
>>>>>>
  

Patch

diff --git a/x86/pmu.c b/x86/pmu.c
index 0def28695c70..7443fdab5c8a 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -35,7 +35,7 @@  struct pmu_event {
 	{"instructions", 0x00c0, 10*N, 10.2*N},
 	{"ref cycles", 0x013c, 1*N, 30*N},
 	{"llc references", 0x4f2e, 1, 2*N},
-	{"llc misses", 0x412e, 1, 1*N},
+	{"llc misses", 0x412e, 0, 1*N},
 	{"branches", 0x00c4, 1*N, 1.1*N},
 	{"branch misses", 0x00c5, 0, 0.1*N},
 }, amd_gp_events[] = {