KVM: x86/svm/pmu: Set PerfMonV2 global control bits correctly

Message ID 20240301075007.644152-1-sandipan.das@amd.com
State New

Commit Message

Sandipan Das March 1, 2024, 7:50 a.m. UTC
  With PerfMonV2, a performance monitoring counter will start operating
only when both the PERF_CTLx enable bit as well as the corresponding
PerfCntrGlobalCtl enable bit are set.

When the PerfMonV2 CPUID feature bit (leaf 0x80000022 EAX bit 0) is set
for a guest but the guest kernel does not support PerfMonV2 (such as
kernels older than v5.19), the guest counters do not count since the
PerfCntrGlobalCtl MSR is initialized to zero and the guest kernel never
writes to it.

This is not observed on bare-metal as the default value of the
PerfCntrGlobalCtl MSR after a reset is 0x3f (assuming there are six
counters) and the counters can still be operated by using the enable
bit in the PERF_CTLx MSRs. Replicate the same behaviour in guests for
compatibility with older kernels.
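
The two-level enable described above can be sketched in plain C. This is a
minimal userspace illustration, not KVM code: the helper names
global_ctl_reset_value() and counter_is_counting() are hypothetical, and the
only hardware details assumed are that the PERF_CTLx enable bit is bit 22 and
that PerfCntrGlobalCtl carries one enable bit per counter starting at bit 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Enable (EN) bit of an AMD PERF_CTLx MSR. */
#define PERF_CTL_ENABLE (1ULL << 22)

/*
 * Hypothetical helper: a PerfCntrGlobalCtl value with one enable bit set
 * per counter, e.g. 0x3f for six counters.
 */
static uint64_t global_ctl_reset_value(unsigned int nr_counters)
{
	assert(nr_counters > 0 && nr_counters < 64);
	return (1ULL << nr_counters) - 1;
}

/*
 * Hypothetical helper: with PerfMonV2, counter idx counts only when both
 * its PERF_CTLx enable bit and its PerfCntrGlobalCtl enable bit are set.
 */
static bool counter_is_counting(uint64_t perf_ctl, uint64_t global_ctl,
				unsigned int idx)
{
	return (perf_ctl & PERF_CTL_ENABLE) && (global_ctl & (1ULL << idx));
}
```

With global_ctl left at zero, counter_is_counting() is false even when the
PERF_CTLx enable bit is set, which mirrors the "Before" case below. With
global_ctl set to global_ctl_reset_value(6), the PERF_CTLx enable bit alone
decides whether the counter runs.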

Before:

  $ perf stat -e cycles:u true

   Performance counter stats for 'true':

                   0      cycles:u

         0.001074773 seconds time elapsed

         0.001169000 seconds user
         0.000000000 seconds sys

After:

  $ perf stat -e cycles:u true

   Performance counter stats for 'true':

             227,850      cycles:u

         0.037770758 seconds time elapsed

         0.000000000 seconds user
         0.037886000 seconds sys

Reported-by: Babu Moger <babu.moger@amd.com>
Fixes: 4a2771895ca6 ("KVM: x86/svm/pmu: Add AMD PerfMonV2 support")
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
---
 arch/x86/kvm/svm/pmu.c | 1 +
 1 file changed, 1 insertion(+)
  

Comments

Like Xu March 1, 2024, 8:37 a.m. UTC | #1
On 1/3/2024 3:50 pm, Sandipan Das wrote:
> With PerfMonV2, a performance monitoring counter will start operating
> only when both the PERF_CTLx enable bit as well as the corresponding
> PerfCntrGlobalCtl enable bit are set.
> 
> When the PerfMonV2 CPUID feature bit (leaf 0x80000022 EAX bit 0) is set
> for a guest but the guest kernel does not support PerfMonV2 (such as
> kernels older than v5.19), the guest counters do not count since the
> PerfCntrGlobalCtl MSR is initialized to zero and the guest kernel never
> writes to it.

If the vCPU has the PerfMonV2 feature, it should not work the way the legacy
PMU does. Users need to use the new driver to operate the new hardware,
don't they? One practical approach is for the hypervisor not to set
the PerfMonV2 bit for this unpatched 'v5.19' guest.

> 
> This is not observed on bare-metal as the default value of the
> PerfCntrGlobalCtl MSR after a reset is 0x3f (assuming there are six
> counters) and the counters can still be operated by using the enable
> bit in the PERF_CTLx MSRs. Replicate the same behaviour in guests for
> compatibility with older kernels.
> 
> Before:
> 
>    $ perf stat -e cycles:u true
> 
>     Performance counter stats for 'true':
> 
>                     0      cycles:u
> 
>           0.001074773 seconds time elapsed
> 
>           0.001169000 seconds user
>           0.000000000 seconds sys
> 
> After:
> 
>    $ perf stat -e cycles:u true
> 
>     Performance counter stats for 'true':
> 
>               227,850      cycles:u
> 
>           0.037770758 seconds time elapsed
> 
>           0.000000000 seconds user
>           0.037886000 seconds sys
> 
> Reported-by: Babu Moger <babu.moger@amd.com>
> Fixes: 4a2771895ca6 ("KVM: x86/svm/pmu: Add AMD PerfMonV2 support")
> Signed-off-by: Sandipan Das <sandipan.das@amd.com>
> ---
>   arch/x86/kvm/svm/pmu.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> index b6a7ad4d6914..14709c564d6a 100644
> --- a/arch/x86/kvm/svm/pmu.c
> +++ b/arch/x86/kvm/svm/pmu.c
> @@ -205,6 +205,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
>   	if (pmu->version > 1) {
>   		pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
>   		pmu->global_status_mask = pmu->global_ctrl_mask;
> +		pmu->global_ctrl = ~pmu->global_ctrl_mask;
>   	}
>   
>   	pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1;
  
Sandipan Das March 1, 2024, 9 a.m. UTC | #2
On 3/1/2024 2:07 PM, Like Xu wrote:
> On 1/3/2024 3:50 pm, Sandipan Das wrote:
>> With PerfMonV2, a performance monitoring counter will start operating
>> only when both the PERF_CTLx enable bit as well as the corresponding
>> PerfCntrGlobalCtl enable bit are set.
>>
>> When the PerfMonV2 CPUID feature bit (leaf 0x80000022 EAX bit 0) is set
>> for a guest but the guest kernel does not support PerfMonV2 (such as
>> kernels older than v5.19), the guest counters do not count since the
>> PerfCntrGlobalCtl MSR is initialized to zero and the guest kernel never
>> writes to it.
> 
> If the vcpu has the PerfMonV2 feature, it should not work the way legacy
> PMU does. Users need to use the new driver to operate the new hardware,
> don't they ? One practical approach is that the hypervisor should not set
> the PerfMonV2 bit for this unpatched 'v5.19' guest.
> 

My understanding is that the legacy method of managing the counters should
still work because the enable bits in PerfCntrGlobalCtl are expected to be
set. The AMD PPR does mention that the PerfCntrEn bitfield of PerfCntrGlobalCtl
is set to 0x3f after a system reset. That way, the guest kernel can use either
the new or legacy method.

>>
>> This is not observed on bare-metal as the default value of the
>> PerfCntrGlobalCtl MSR after a reset is 0x3f (assuming there are six
>> counters) and the counters can still be operated by using the enable
>> bit in the PERF_CTLx MSRs. Replicate the same behaviour in guests for
>> compatibility with older kernels.
>>
>> Before:
>>
>>    $ perf stat -e cycles:u true
>>
>>     Performance counter stats for 'true':
>>
>>                     0      cycles:u
>>
>>           0.001074773 seconds time elapsed
>>
>>           0.001169000 seconds user
>>           0.000000000 seconds sys
>>
>> After:
>>
>>    $ perf stat -e cycles:u true
>>
>>     Performance counter stats for 'true':
>>
>>               227,850      cycles:u
>>
>>           0.037770758 seconds time elapsed
>>
>>           0.000000000 seconds user
>>           0.037886000 seconds sys
>>
>> Reported-by: Babu Moger <babu.moger@amd.com>
>> Fixes: 4a2771895ca6 ("KVM: x86/svm/pmu: Add AMD PerfMonV2 support")
>> Signed-off-by: Sandipan Das <sandipan.das@amd.com>
>> ---
>>   arch/x86/kvm/svm/pmu.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
>> index b6a7ad4d6914..14709c564d6a 100644
>> --- a/arch/x86/kvm/svm/pmu.c
>> +++ b/arch/x86/kvm/svm/pmu.c
>> @@ -205,6 +205,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
>>       if (pmu->version > 1) {
>>           pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
>>           pmu->global_status_mask = pmu->global_ctrl_mask;
>> +        pmu->global_ctrl = ~pmu->global_ctrl_mask;
>>       }
>>         pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1;
  
Mi, Dapeng March 4, 2024, 7:59 a.m. UTC | #3
On 3/1/2024 5:00 PM, Sandipan Das wrote:
> On 3/1/2024 2:07 PM, Like Xu wrote:
>> On 1/3/2024 3:50 pm, Sandipan Das wrote:
>>> With PerfMonV2, a performance monitoring counter will start operating
>>> only when both the PERF_CTLx enable bit as well as the corresponding
>>> PerfCntrGlobalCtl enable bit are set.
>>>
>>> When the PerfMonV2 CPUID feature bit (leaf 0x80000022 EAX bit 0) is set
>>> for a guest but the guest kernel does not support PerfMonV2 (such as
>>> kernels older than v5.19), the guest counters do not count since the
>>> PerfCntrGlobalCtl MSR is initialized to zero and the guest kernel never
>>> writes to it.
>> If the vcpu has the PerfMonV2 feature, it should not work the way legacy
>> PMU does. Users need to use the new driver to operate the new hardware,
>> don't they ? One practical approach is that the hypervisor should not set
>> the PerfMonV2 bit for this unpatched 'v5.19' guest.
>>
> My understanding is that the legacy method of managing the counters should
> still work because the enable bits in PerfCntrGlobalCtl are expected to be
> set. The AMD PPR does mention that the PerfCntrEn bitfield of PerfCntrGlobalCtl
> is set to 0x3f after a system reset. That way, the guest kernel can use either


If so, please add the PPR description here as comments.


> the new or legacy method.
>
>>> This is not observed on bare-metal as the default value of the
>>> PerfCntrGlobalCtl MSR after a reset is 0x3f (assuming there are six
>>> counters) and the counters can still be operated by using the enable
>>> bit in the PERF_CTLx MSRs. Replicate the same behaviour in guests for
>>> compatibility with older kernels.
>>>
>>> Before:
>>>
>>>     $ perf stat -e cycles:u true
>>>
>>>      Performance counter stats for 'true':
>>>
>>>                      0      cycles:u
>>>
>>>            0.001074773 seconds time elapsed
>>>
>>>            0.001169000 seconds user
>>>            0.000000000 seconds sys
>>>
>>> After:
>>>
>>>     $ perf stat -e cycles:u true
>>>
>>>      Performance counter stats for 'true':
>>>
>>>                227,850      cycles:u
>>>
>>>            0.037770758 seconds time elapsed
>>>
>>>            0.000000000 seconds user
>>>            0.037886000 seconds sys
>>>
>>> Reported-by: Babu Moger <babu.moger@amd.com>
>>> Fixes: 4a2771895ca6 ("KVM: x86/svm/pmu: Add AMD PerfMonV2 support")
>>> Signed-off-by: Sandipan Das <sandipan.das@amd.com>
>>> ---
>>>    arch/x86/kvm/svm/pmu.c | 1 +
>>>    1 file changed, 1 insertion(+)
>>>
>>> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
>>> index b6a7ad4d6914..14709c564d6a 100644
>>> --- a/arch/x86/kvm/svm/pmu.c
>>> +++ b/arch/x86/kvm/svm/pmu.c
>>> @@ -205,6 +205,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
>>>        if (pmu->version > 1) {
>>>            pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
>>>            pmu->global_status_mask = pmu->global_ctrl_mask;
>>> +        pmu->global_ctrl = ~pmu->global_ctrl_mask;


It seems easier to understand if global_ctrl is calculated first and
global_ctrl_mask is then derived from it (negative logic).

diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index e886300f0f97..7ac9b080aba6 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -199,7 +199,8 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
                                          kvm_pmu_cap.num_counters_gp);

         if (pmu->version > 1) {
-               pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
+               pmu->global_ctrl = (1ull << pmu->nr_arch_gp_counters) - 1;
+               pmu->global_ctrl_mask = ~pmu->global_ctrl;
                 pmu->global_status_mask = pmu->global_ctrl_mask;
         }
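
As a sanity check on the reordering suggested above, the following standalone
sketch computes the three fields both ways so the results can be compared.
It is only an illustration under assumptions: struct pmu_masks and the two
function names are hypothetical stand-ins, not the actual struct kvm_pmu or
amd_pmu_refresh().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the three struct kvm_pmu fields involved. */
struct pmu_masks {
	uint64_t global_ctrl;
	uint64_t global_ctrl_mask;
	uint64_t global_status_mask;
};

/* Ordering from the posted patch: compute the mask, then invert it. */
static struct pmu_masks masks_mask_first(unsigned int nr_counters)
{
	struct pmu_masks m;

	assert(nr_counters > 0 && nr_counters < 64);
	m.global_ctrl_mask = ~((1ULL << nr_counters) - 1);
	m.global_status_mask = m.global_ctrl_mask;
	m.global_ctrl = ~m.global_ctrl_mask;
	return m;
}

/* Ordering from the review comment: compute the enable bits first. */
static struct pmu_masks masks_ctrl_first(unsigned int nr_counters)
{
	struct pmu_masks m;

	assert(nr_counters > 0 && nr_counters < 64);
	m.global_ctrl = (1ULL << nr_counters) - 1;
	m.global_ctrl_mask = ~m.global_ctrl;
	m.global_status_mask = m.global_ctrl_mask;
	return m;
}
```

Both orderings give the same values for any counter count in range, so the
choice between them is about readability rather than behaviour.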

>>>        }
>>>          pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1;
>
  

Patch

diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index b6a7ad4d6914..14709c564d6a 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -205,6 +205,7 @@  static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
 	if (pmu->version > 1) {
 		pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
 		pmu->global_status_mask = pmu->global_ctrl_mask;
+		pmu->global_ctrl = ~pmu->global_ctrl_mask;
 	}
 
 	pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1;