[V5,05/10] KVM: x86/pmu: Disable vPMU if the minimum num of counters isn't met

Message ID 20230410105056.60973-6-likexu@tencent.com
State New
Headers
Series KVM: x86: Add AMD Guest PerfMonV2 PMU support |

Commit Message

Like Xu April 10, 2023, 10:50 a.m. UTC
  From: Like Xu <likexu@tencent.com>

Disable PMU support when running on AMD and perf reports fewer than four
general purpose counters. All AMD PMUs must define at least four counters
due to AMD's legacy architecture hardcoding the number of counters
without providing a way to enumerate the number of counters to software,
e.g. from AMD's APM:

 The legacy architecture defines four performance counters (PerfCtrn)
 and corresponding event-select registers (PerfEvtSeln).

Virtualizing fewer than four counters can lead to guest instability as
software expects four counters to be available.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/pmu.h | 3 +++
 1 file changed, 3 insertions(+)
  

Comments

Jim Mattson April 11, 2023, 5:36 a.m. UTC | #1
On Mon, Apr 10, 2023 at 3:51 AM Like Xu <like.xu.linux@gmail.com> wrote:
>
> From: Like Xu <likexu@tencent.com>
>
> Disable PMU support when running on AMD and perf reports fewer than four
> general purpose counters. All AMD PMUs must define at least four counters
> due to AMD's legacy architecture hardcoding the number of counters
> without providing a way to enumerate the number of counters to software,
> e.g. from AMD's APM:
>
>  The legacy architecture defines four performance counters (PerfCtrn)
>  and corresponding event-select registers (PerfEvtSeln).
>
> Virtualizing fewer than four counters can lead to guest instability as
> software expects four counters to be available.

I'm confused. Isn't zero less than four?

> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Like Xu <likexu@tencent.com>
> ---
>  arch/x86/kvm/pmu.h | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> index dd7c7d4ffe3b..002b527360f4 100644
> --- a/arch/x86/kvm/pmu.h
> +++ b/arch/x86/kvm/pmu.h
> @@ -182,6 +182,9 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
>                         enable_pmu = false;
>         }
>
> +       if (!is_intel && kvm_pmu_cap.num_counters_gp < AMD64_NUM_COUNTERS)
> +               enable_pmu = false;
> +
>         if (!enable_pmu) {
>                 memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
>                 return;
> --
> 2.40.0
>
  
Like Xu April 11, 2023, 6:17 a.m. UTC | #2
On 11/4/2023 1:36 pm, Jim Mattson wrote:
> On Mon, Apr 10, 2023 at 3:51 AM Like Xu <like.xu.linux@gmail.com> wrote:
>>
>> From: Like Xu <likexu@tencent.com>
>>
>> Disable PMU support when running on AMD and perf reports fewer than four
>> general purpose counters. All AMD PMUs must define at least four counters
>> due to AMD's legacy architecture hardcoding the number of counters
>> without providing a way to enumerate the number of counters to software,
>> e.g. from AMD's APM:
>>
>>   The legacy architecture defines four performance counters (PerfCtrn)
>>   and corresponding event-select registers (PerfEvtSeln).
>>
>> Virtualizing fewer than four counters can lead to guest instability as
>> software expects four counters to be available.
> 
> I'm confused. Isn't zero less than four?

As I understand it, you are saying that virtualization of zero counter is also 
reasonable.
If so, the above statement could be refined as:

	Virtualizing fewer than four counters when vPMU is enabled may lead to guest 
instability
	as software expects at least four counters to be available, thus the vPMU is 
disabled if the
	minimum number of KVM supported counters is not reached during initialization.

Jim, does this help you or could you explain more about your confusion ?

> 
>> Suggested-by: Sean Christopherson <seanjc@google.com>
>> Signed-off-by: Like Xu <likexu@tencent.com>
>> ---
>>   arch/x86/kvm/pmu.h | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
>> index dd7c7d4ffe3b..002b527360f4 100644
>> --- a/arch/x86/kvm/pmu.h
>> +++ b/arch/x86/kvm/pmu.h
>> @@ -182,6 +182,9 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
>>                          enable_pmu = false;
>>          }
>>
>> +       if (!is_intel && kvm_pmu_cap.num_counters_gp < AMD64_NUM_COUNTERS)
>> +               enable_pmu = false;
>> +
>>          if (!enable_pmu) {
>>                  memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
>>                  return;
>> --
>> 2.40.0
>>
  
Jim Mattson April 11, 2023, 12:58 p.m. UTC | #3
On Mon, Apr 10, 2023 at 11:17 PM Like Xu <like.xu.linux@gmail.com> wrote:
>
> On 11/4/2023 1:36 pm, Jim Mattson wrote:
> > On Mon, Apr 10, 2023 at 3:51 AM Like Xu <like.xu.linux@gmail.com> wrote:
> >>
> >> From: Like Xu <likexu@tencent.com>
> >>
> >> Disable PMU support when running on AMD and perf reports fewer than four
> >> general purpose counters. All AMD PMUs must define at least four counters
> >> due to AMD's legacy architecture hardcoding the number of counters
> >> without providing a way to enumerate the number of counters to software,
> >> e.g. from AMD's APM:
> >>
> >>   The legacy architecture defines four performance counters (PerfCtrn)
> >>   and corresponding event-select registers (PerfEvtSeln).
> >>
> >> Virtualizing fewer than four counters can lead to guest instability as
> >> software expects four counters to be available.
> >
> > I'm confused. Isn't zero less than four?
>
> As I understand it, you are saying that virtualization of zero counter is also
> reasonable.
> If so, the above statement could be refined as:
>
>         Virtualizing fewer than four counters when vPMU is enabled may lead to guest
> instability
>         as software expects at least four counters to be available, thus the vPMU is
> disabled if the
>         minimum number of KVM supported counters is not reached during initialization.
>
> Jim, does this help you or could you explain more about your confusion ?

You say that "fewer than four counters can lead to guest instability
as software expects four counters to be available." Your solution is
to disable the PMU, which leaves zero counters available. Zero is less
than four. Hence, by your claim, disabling the PMU can lead to guest
instability. I don't see how this is an improvement over one, two, or
three counters.

> >
> >> Suggested-by: Sean Christopherson <seanjc@google.com>
> >> Signed-off-by: Like Xu <likexu@tencent.com>
> >> ---
> >>   arch/x86/kvm/pmu.h | 3 +++
> >>   1 file changed, 3 insertions(+)
> >>
> >> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> >> index dd7c7d4ffe3b..002b527360f4 100644
> >> --- a/arch/x86/kvm/pmu.h
> >> +++ b/arch/x86/kvm/pmu.h
> >> @@ -182,6 +182,9 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
> >>                          enable_pmu = false;
> >>          }
> >>
> >> +       if (!is_intel && kvm_pmu_cap.num_counters_gp < AMD64_NUM_COUNTERS)
> >> +               enable_pmu = false;
> >> +
> >>          if (!enable_pmu) {
> >>                  memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
> >>                  return;
> >> --
> >> 2.40.0
> >>
  
Like Xu April 11, 2023, 1:17 p.m. UTC | #4
On 11/4/2023 8:58 pm, Jim Mattson wrote:
> On Mon, Apr 10, 2023 at 11:17 PM Like Xu <like.xu.linux@gmail.com> wrote:
>>
>> On 11/4/2023 1:36 pm, Jim Mattson wrote:
>>> On Mon, Apr 10, 2023 at 3:51 AM Like Xu <like.xu.linux@gmail.com> wrote:
>>>>
>>>> From: Like Xu <likexu@tencent.com>
>>>>
>>>> Disable PMU support when running on AMD and perf reports fewer than four
>>>> general purpose counters. All AMD PMUs must define at least four counters
>>>> due to AMD's legacy architecture hardcoding the number of counters
>>>> without providing a way to enumerate the number of counters to software,
>>>> e.g. from AMD's APM:
>>>>
>>>>    The legacy architecture defines four performance counters (PerfCtrn)
>>>>    and corresponding event-select registers (PerfEvtSeln).
>>>>
>>>> Virtualizing fewer than four counters can lead to guest instability as
>>>> software expects four counters to be available.
>>>
>>> I'm confused. Isn't zero less than four?
>>
>> As I understand it, you are saying that virtualization of zero counter is also
>> reasonable.
>> If so, the above statement could be refined as:
>>
>>          Virtualizing fewer than four counters when vPMU is enabled may lead to guest
>> instability
>>          as software expects at least four counters to be available, thus the vPMU is
>> disabled if the
>>          minimum number of KVM supported counters is not reached during initialization.
>>
>> Jim, does this help you or could you explain more about your confusion ?
> 
> You say that "fewer than four counters can lead to guest instability
> as software expects four counters to be available." Your solution is
> to disable the PMU, which leaves zero counters available. Zero is less
> than four. Hence, by your claim, disabling the PMU can lead to guest
> instability. I don't see how this is an improvement over one, two, or
> three counters.

As you know, AMD pmu lacks an architected method (such as CPUID) to
indicate that the VM does not have any pmu counters available for the
current platform. Guests like Linux tend to check if their first counters
exist and work properly to infer that other pmu counters exist.

If KVM chooses to emulate greater than 1 less than 4 counters, then the
AMD guest PMU agent may assume that there are legacy 4 counters all
present (it's what the APM specifies), which requires the legacy code
to add #GP error handling for counters that should exist but actually not.

So at Sean's suggestion, we took a conservative approach. If KVM detects
less than 4 counters, we think KVM (under the current configuration and
platform) is not capable of emulating the most basic AMD pmu capability.
A large number of legacy instances are ready for 0 or 4+ ctrs, not 2 or 3.

Does this help you ? I wouldn't mind a better move.

> 
>>>
>>>> Suggested-by: Sean Christopherson <seanjc@google.com>
>>>> Signed-off-by: Like Xu <likexu@tencent.com>
>>>> ---
>>>>    arch/x86/kvm/pmu.h | 3 +++
>>>>    1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
>>>> index dd7c7d4ffe3b..002b527360f4 100644
>>>> --- a/arch/x86/kvm/pmu.h
>>>> +++ b/arch/x86/kvm/pmu.h
>>>> @@ -182,6 +182,9 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
>>>>                           enable_pmu = false;
>>>>           }
>>>>
>>>> +       if (!is_intel && kvm_pmu_cap.num_counters_gp < AMD64_NUM_COUNTERS)
>>>> +               enable_pmu = false;
>>>> +
>>>>           if (!enable_pmu) {
>>>>                   memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
>>>>                   return;
>>>> --
>>>> 2.40.0
>>>>
  
Jim Mattson April 11, 2023, 2:58 p.m. UTC | #5
On Tue, Apr 11, 2023 at 6:18 AM Like Xu <like.xu.linux@gmail.com> wrote:
>
> On 11/4/2023 8:58 pm, Jim Mattson wrote:
> > On Mon, Apr 10, 2023 at 11:17 PM Like Xu <like.xu.linux@gmail.com> wrote:
> >>
> >> On 11/4/2023 1:36 pm, Jim Mattson wrote:
> >>> On Mon, Apr 10, 2023 at 3:51 AM Like Xu <like.xu.linux@gmail.com> wrote:
> >>>>
> >>>> From: Like Xu <likexu@tencent.com>
> >>>>
> >>>> Disable PMU support when running on AMD and perf reports fewer than four
> >>>> general purpose counters. All AMD PMUs must define at least four counters
> >>>> due to AMD's legacy architecture hardcoding the number of counters
> >>>> without providing a way to enumerate the number of counters to software,
> >>>> e.g. from AMD's APM:
> >>>>
> >>>>    The legacy architecture defines four performance counters (PerfCtrn)
> >>>>    and corresponding event-select registers (PerfEvtSeln).
> >>>>
> >>>> Virtualizing fewer than four counters can lead to guest instability as
> >>>> software expects four counters to be available.
> >>>
> >>> I'm confused. Isn't zero less than four?
> >>
> >> As I understand it, you are saying that virtualization of zero counter is also
> >> reasonable.
> >> If so, the above statement could be refined as:
> >>
> >>          Virtualizing fewer than four counters when vPMU is enabled may lead to guest
> >> instability
> >>          as software expects at least four counters to be available, thus the vPMU is
> >> disabled if the
> >>          minimum number of KVM supported counters is not reached during initialization.
> >>
> >> Jim, does this help you or could you explain more about your confusion ?
> >
> > You say that "fewer than four counters can lead to guest instability
> > as software expects four counters to be available." Your solution is
> > to disable the PMU, which leaves zero counters available. Zero is less
> > than four. Hence, by your claim, disabling the PMU can lead to guest
> > instability. I don't see how this is an improvement over one, two, or
> > three counters.
>
> As you know, AMD pmu lacks an architected method (such as CPUID) to
> indicate that the VM does not have any pmu counters available for the
> current platform. Guests like Linux tend to check if their first counters
> exist and work properly to infer that other pmu counters exist.

"Guests like Linux," or just Linux? What do you mean by "tend"? When
do they perform this check, and when do they not?

> If KVM chooses to emulate greater than 1 less than 4 counters, then the
> AMD guest PMU agent may assume that there are legacy 4 counters all
> present (it's what the APM specifies), which requires the legacy code
> to add #GP error handling for counters that should exist but actually not.

I would argue that regardless of the number of counters emulated, a
guest PMU agent may assume that the 4 legacy counters are present,
since that's what the APM specifies.

> So at Sean's suggestion, we took a conservative approach. If KVM detects
> less than 4 counters, we think KVM (under the current configuration and
> platform) is not capable of emulating the most basic AMD pmu capability.
> A large number of legacy instances are ready for 0 or 4+ ctrs, not 2 or 3

Which specific guest operating systems is this change intended for?

> Does this help you ? I wouldn't mind a better move.

Which AMD platforms have less than 4 counters available?

>
> >
> >>>
> >>>> Suggested-by: Sean Christopherson <seanjc@google.com>
> >>>> Signed-off-by: Like Xu <likexu@tencent.com>
> >>>> ---
> >>>>    arch/x86/kvm/pmu.h | 3 +++
> >>>>    1 file changed, 3 insertions(+)
> >>>>
> >>>> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> >>>> index dd7c7d4ffe3b..002b527360f4 100644
> >>>> --- a/arch/x86/kvm/pmu.h
> >>>> +++ b/arch/x86/kvm/pmu.h
> >>>> @@ -182,6 +182,9 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
> >>>>                           enable_pmu = false;
> >>>>           }
> >>>>
> >>>> +       if (!is_intel && kvm_pmu_cap.num_counters_gp < AMD64_NUM_COUNTERS)

Does this actually guarantee that the requisite number of counters are
available and will always be available while the guest is running?
What happens if some other client of the host perf subsystem requests
a CPU-pinned counter after this checck?

> >>>> +               enable_pmu = false;
> >>>> +
> >>>>           if (!enable_pmu) {
> >>>>                   memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
> >>>>                   return;
> >>>> --
> >>>> 2.40.0
> >>>>
  
Like Xu April 19, 2023, 9:34 a.m. UTC | #6
Jim, sorry for the late reply.

On 11/4/2023 10:58 pm, Jim Mattson wrote:
> On Tue, Apr 11, 2023 at 6:18 AM Like Xu <like.xu.linux@gmail.com> wrote:
>>
>> On 11/4/2023 8:58 pm, Jim Mattson wrote:
>>> On Mon, Apr 10, 2023 at 11:17 PM Like Xu <like.xu.linux@gmail.com> wrote:
>>>>
>>>> On 11/4/2023 1:36 pm, Jim Mattson wrote:
>>>>> On Mon, Apr 10, 2023 at 3:51 AM Like Xu <like.xu.linux@gmail.com> wrote:
>>>>>>
>>>>>> From: Like Xu <likexu@tencent.com>
>>>>>>
>>>>>> Disable PMU support when running on AMD and perf reports fewer than four
>>>>>> general purpose counters. All AMD PMUs must define at least four counters
>>>>>> due to AMD's legacy architecture hardcoding the number of counters
>>>>>> without providing a way to enumerate the number of counters to software,
>>>>>> e.g. from AMD's APM:
>>>>>>
>>>>>>     The legacy architecture defines four performance counters (PerfCtrn)
>>>>>>     and corresponding event-select registers (PerfEvtSeln).
>>>>>>
>>>>>> Virtualizing fewer than four counters can lead to guest instability as
>>>>>> software expects four counters to be available.
>>>>>
>>>>> I'm confused. Isn't zero less than four?
>>>>
>>>> As I understand it, you are saying that virtualization of zero counter is also
>>>> reasonable.
>>>> If so, the above statement could be refined as:
>>>>
>>>>           Virtualizing fewer than four counters when vPMU is enabled may lead to guest
>>>> instability
>>>>           as software expects at least four counters to be available, thus the vPMU is
>>>> disabled if the
>>>>           minimum number of KVM supported counters is not reached during initialization.
>>>>
>>>> Jim, does this help you or could you explain more about your confusion ?
>>>
>>> You say that "fewer than four counters can lead to guest instability
>>> as software expects four counters to be available." Your solution is
>>> to disable the PMU, which leaves zero counters available. Zero is less
>>> than four. Hence, by your claim, disabling the PMU can lead to guest
>>> instability. I don't see how this is an improvement over one, two, or
>>> three counters.
>>
>> As you know, AMD pmu lacks an architected method (such as CPUID) to
>> indicate that the VM does not have any pmu counters available for the
>> current platform. Guests like Linux tend to check if their first counters
>> exist and work properly to infer that other pmu counters exist.
> 
> "Guests like Linux," or just Linux? What do you mean by "tend"? When
> do they perform this check, and when do they not?

We do not know how guests that do not disclose their source code
will detect the presence of pmu counters.

For upstream Linux guests, such a check is implemented in the check_hw_exists(),
which checks the counters one by one, often with an error on the first counter,
and then disables pmu from the kernel perspective.

The key point is that the KVM implementation cannot rely on assumptions about
the guest kernel version, and considering that the above check was added very early,
existing Linux guest instances will most likely (tend to) check the first 
counter and
error out (a VM could also check all of the possible counters and use a bitmap with
holes to track any functional counters).

> 
>> If KVM chooses to emulate greater than 1 less than 4 counters, then the
>> AMD guest PMU agent may assume that there are legacy 4 counters all
>> present (it's what the APM specifies), which requires the legacy code
>> to add #GP error handling for counters that should exist but actually not.
> 
> I would argue that regardless of the number of counters emulated, a
> guest PMU agent may assume that the 4 legacy counters are present,
> since that's what the APM specifies.

I certainly agree that, for example, a particular cpu model is stated in the spec
to have certain features (e.g. uncore pmu), but the KVM does not or chooses
not ro emulate them, for security reasons (e.g. side channel attacks), which
does violate the defined behavior of the hardware spec, such as here where
enable_pmu is false, which is not possible on almost all real hardware today.

> 
>> So at Sean's suggestion, we took a conservative approach. If KVM detects
>> less than 4 counters, we think KVM (under the current configuration and
>> platform) is not capable of emulating the most basic AMD pmu capability.
>> A large number of legacy instances are ready for 0 or 4+ ctrs, not 2 or 3
> 
> Which specific guest operating systems is this change intended for?
> 
>> Does this help you ? I wouldn't mind a better move.
> 
> Which AMD platforms have less than 4 counters available?

All this is for L2 Linux guest, as pmu on L1 Linux guest will be disabled by L0.

> 
>>
>>>
>>>>>
>>>>>> Suggested-by: Sean Christopherson <seanjc@google.com>
>>>>>> Signed-off-by: Like Xu <likexu@tencent.com>
>>>>>> ---
>>>>>>     arch/x86/kvm/pmu.h | 3 +++
>>>>>>     1 file changed, 3 insertions(+)
>>>>>>
>>>>>> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
>>>>>> index dd7c7d4ffe3b..002b527360f4 100644
>>>>>> --- a/arch/x86/kvm/pmu.h
>>>>>> +++ b/arch/x86/kvm/pmu.h
>>>>>> @@ -182,6 +182,9 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
>>>>>>                            enable_pmu = false;
>>>>>>            }
>>>>>>
>>>>>> +       if (!is_intel && kvm_pmu_cap.num_counters_gp < AMD64_NUM_COUNTERS)
> 
> Does this actually guarantee that the requisite number of counters are
> available and will always be available while the guest is running?

Not 100%, the scheduling of physical counters depends on the host perf scheduler.

I noticed that many cloud vendors want to make sure that hardware resources
are given exclusively to VMs, but for upstream, the availability of resources
should depend entirely on the host administrators, and a VMM should take away
access to resources at any time, such as vcpu time slice.

Any attempts in the direction of exclusive use will be thwarted.

> What happens if some other client of the host perf subsystem requests
> a CPU-pinned counter after this checck?

Normal perf use does not grab the counters allocated for kvm, NMI-watchdog
maybe one, but it will be moved to other timer hardware like HPET.

Of interest is that some ebpf programs that access the pmu hardware directly
use the interface that perf sub-system presents to KVM in the kernel.

> 
>>>>>> +               enable_pmu = false;
>>>>>> +
>>>>>>            if (!enable_pmu) {
>>>>>>                    memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
>>>>>>                    return;
>>>>>> --
>>>>>> 2.40.0
>>>>>>
>
  
Sean Christopherson May 24, 2023, 11:37 p.m. UTC | #7
On Wed, Apr 19, 2023, Like Xu wrote:
> Jim, sorry for the late reply.
> 
> On 11/4/2023 10:58 pm, Jim Mattson wrote:
> > > > > Jim, does this help you or could you explain more about your confusion ?
> > > > 
> > > > You say that "fewer than four counters can lead to guest instability
> > > > as software expects four counters to be available." Your solution is
> > > > to disable the PMU, which leaves zero counters available. Zero is less
> > > > than four. Hence, by your claim, disabling the PMU can lead to guest
> > > > instability. I don't see how this is an improvement over one, two, or
> > > > three counters.

KVM can't do the right thing regardless.  I would rather have KVM explicitly tell
userspace via that it can't support a vPMU than to carry on with a bogus and
unexpected setup.

> > Does this actually guarantee that the requisite number of counters are
> > available and will always be available while the guest is running?
> 
> Not 100%, the scheduling of physical counters depends on the host perf scheduler.

Or put differently, the same thing that happens on Intel.  kvm_pmu_cap.num_counters_gp
is the number of counters reported by perf when KVM loads, i.e. barring oddities,
it's the number of counters present in the host.  Most importantly, if perf doesn't
find the expected number of counters, perf will bail and use software only events,
and then clear all of x86_pmu.

In other words, KVM's new sanity *should* be a nop with respect to current
behavior.  If we're concerned about "unnecessarily" hiding the PMU when there are
1-3 counters, I'd be ok with a WARN_ON_ONCE().

Actually, looking more closely, there's unaddressed feedback from v4[*].  Folding
that in, we can enable the sanity check for both Intel and AMD, though that's a
bit of a lie since Intel will be '1'.  But the code looks pretty!

	if (enable_pmu) {
		perf_get_x86_pmu_capability(&kvm_pmu_cap);

		/*
		 * WARN if perf did NOT disable hardware PMU if the number of
		 * architecturally required GP counters aren't present, i.e. if
		 * there are a non-zero number of counters, but fewer than what
		 * is architecturally required.
		 */
		if (!kvm_pmu_cap.num_counters_gp ||
		    WARN_ON_ONCE(kvm_pmu_cap.num_counters_gp < pmu_ops->MIN_NR_GP_COUNTERS))
			enable_pmu = false;
		else if (is_intel && !kvm_pmu_cap.version)
			enable_pmu = false;
	}

	if (!enable_pmu) {
		memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
		return;
	}

[*] https://lore.kernel.org/all/ZC9ijgZBaz6p1ecw@google.com
  
Like Xu May 29, 2023, 2:25 p.m. UTC | #8
On 25/5/2023 7:37 am, Sean Christopherson wrote:
> On Wed, Apr 19, 2023, Like Xu wrote:
>> Jim, sorry for the late reply.
>>
>> On 11/4/2023 10:58 pm, Jim Mattson wrote:
>>>>>> Jim, does this help you or could you explain more about your confusion ?
>>>>>
>>>>> You say that "fewer than four counters can lead to guest instability
>>>>> as software expects four counters to be available." Your solution is
>>>>> to disable the PMU, which leaves zero counters available. Zero is less
>>>>> than four. Hence, by your claim, disabling the PMU can lead to guest
>>>>> instability. I don't see how this is an improvement over one, two, or
>>>>> three counters.
> 
> KVM can't do the right thing regardless.  I would rather have KVM explicitly tell
> userspace via that it can't support a vPMU than to carry on with a bogus and
> unexpected setup.
> 
>>> Does this actually guarantee that the requisite number of counters are
>>> available and will always be available while the guest is running?
>>
>> Not 100%, the scheduling of physical counters depends on the host perf scheduler.
> 
> Or put differently, the same thing that happens on Intel.  kvm_pmu_cap.num_counters_gp
> is the number of counters reported by perf when KVM loads, i.e. barring oddities,
> it's the number of counters present in the host.  Most importantly, if perf doesn't
> find the expected number of counters, perf will bail and use software only events,
> and then clear all of x86_pmu.
> 
> In other words, KVM's new sanity *should* be a nop with respect to current
> behavior.  If we're concerned about "unnecessarily" hiding the PMU when there are
> 1-3 counters, I'd be ok with a WARN_ON_ONCE().
> 
> Actually, looking more closely, there's unaddressed feedback from v4[*].  Folding
> that in, we can enable the sanity check for both Intel and AMD, though that's a
> bit of a lie since Intel will be '1'.  But the code looks pretty!

The below diff looks good to me. Please confirm that the other patches are in 
good shape
so that we can iterate on the next version. Thanks.

> 
> 	if (enable_pmu) {
> 		perf_get_x86_pmu_capability(&kvm_pmu_cap);
> 
> 		/*
> 		 * WARN if perf did NOT disable hardware PMU if the number of
> 		 * architecturally required GP counters aren't present, i.e. if
> 		 * there are a non-zero number of counters, but fewer than what
> 		 * is architecturally required.
> 		 */
> 		if (!kvm_pmu_cap.num_counters_gp ||
> 		    WARN_ON_ONCE(kvm_pmu_cap.num_counters_gp < pmu_ops->MIN_NR_GP_COUNTERS))
> 			enable_pmu = false;
> 		else if (is_intel && !kvm_pmu_cap.version)
> 			enable_pmu = false;
> 	}
> 
> 	if (!enable_pmu) {
> 		memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
> 		return;
> 	}
> 
> [*] https://lore.kernel.org/all/ZC9ijgZBaz6p1ecw@google.com
  

Patch

diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index dd7c7d4ffe3b..002b527360f4 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -182,6 +182,9 @@  static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
 			enable_pmu = false;
 	}
 
+	if (!is_intel && kvm_pmu_cap.num_counters_gp < AMD64_NUM_COUNTERS)
+		enable_pmu = false;
+
 	if (!enable_pmu) {
 		memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
 		return;