Message ID | 20230410105056.60973-6-likexu@tencent.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp1814620vqo; Mon, 10 Apr 2023 04:04:13 -0700 (PDT) X-Google-Smtp-Source: AKy350YLtKiswTSIC5CDevbM11qTft/pqFcEWA5O6ccVgSryYCyKbrSrhVvgTbQvM26nY7oTv0Sf X-Received: by 2002:a17:907:8d89:b0:94a:959f:6d58 with SMTP id tf9-20020a1709078d8900b0094a959f6d58mr2332692ejc.18.1681124652908; Mon, 10 Apr 2023 04:04:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1681124652; cv=none; d=google.com; s=arc-20160816; b=ijtqeXNaNBT0dUpnpUXI5bKhPdQGETrIzla6k7E5f4QauSFjhjnwywYtwtVdO5PzQY uzjS7Y2pNJoVqriGAVnE8FGg9KRe2CSwz1vN6Ey5hsKp0Bqot2PypMQ5sFf4DaH9aew2 oFHaCrhbliPnn6lkUXj8MVs876lRNgRNdDFwBn9iyUG5myBFPZ9eKqYzuU4UtHQq6n8D 1Qvry9BzfLYYD1Eg+ORKEXvAUqqFn9MJyKspFdQEZ6PNz5/HsNG9z2sM2N4SbhP9fSd2 H8HqzGKskqQMMpdnl3R4OLB/5Pf8iheg/rJtHZecj+kQoIy3KRskDFPfuEh7C+4sjBgl ZggQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=D3qItoPbhd1cS9wC/WtD5ihRlrUlFhuVlKz/oT0HlAk=; b=NS56ik2gK3wN3YF9POnelUsQa5pAXWxkS16LZ+SDrckmJdgwJnEttFROpcto3oUWmm Fj/gf8uPLBq9SeI6LZG6fPb+rBnkAG2i05wWDnbd+AOlc951ee2HIhu13aMWC/HA/0Kx qMmhcd1IyKks2kOS8ELgN0LH1mO16xOQtVln+Oi34HvyGI+3s3S/hAlOWU+vZvHO4UbF 2/aTm+Uy2nonle93wPEZJbiWGsLvriYXPuE4LKucjXW1XVK0pUFSSMUDMV6cPZGldnlE ympaBCucOO/M6tjBCrYYQkiHAQ76JEuQzkxz9rEPHdiSlgA0ARGiWZU/E33haCMJ8c4F zQ7w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=Hle8piow; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id xo20-20020a170907bb9400b0094a781b3d0fsi2759033ejc.134.2023.04.10.04.03.49; Mon, 10 Apr 2023 04:04:12 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=Hle8piow; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229755AbjDJKvg (ORCPT <rfc822;aposhian.dev@gmail.com> + 99 others); Mon, 10 Apr 2023 06:51:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40962 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229719AbjDJKv3 (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 10 Apr 2023 06:51:29 -0400 Received: from mail-pj1-x102b.google.com (mail-pj1-x102b.google.com [IPv6:2607:f8b0:4864:20::102b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 24BF65591; Mon, 10 Apr 2023 03:51:27 -0700 (PDT) Received: by mail-pj1-x102b.google.com with SMTP id j8so2214182pjy.4; Mon, 10 Apr 2023 03:51:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1681123886; x=1683715886; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=D3qItoPbhd1cS9wC/WtD5ihRlrUlFhuVlKz/oT0HlAk=; b=Hle8piowxX0J3XV9psl9ZEffGVGw/830cfh4j8jEWWSDjf+8wbg0iu/kbODfn9I3mf vZOgoFcbyGhG1YtybxsbVDbjCK+esSuZYvsBP/MAyqFmm/BSN3REl6qODlbXPFcAWFRF HK9bowYAX1p0s2u/EhhWzYT4ZduBa0OhsO0h4Dj7cZEkyWA+NGF+d4Ulc5dNbQcs95et kwyOsjf5R+YvjVv9L3lJZ4Xehz10C9jdqrpAc2FPPT1Y0FyA9xp3TW1uudGmDKpfkv5K KEi1f9ZY6XpE7FS+zD3PN4jnwX6qExOI/Hid7/SmdHKr0HtjEM7CpOdsmrbkkRPqXwrT pAfQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1681123886; x=1683715886; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=D3qItoPbhd1cS9wC/WtD5ihRlrUlFhuVlKz/oT0HlAk=; b=T9o1IkPVKOdKVsFFOm5KSehqEEqkEqN+Q4xC7U4CeZyyfiP1mzcoLbCnFGAVbaf9+8 CLYll3BbhFv3pp/tCSNht7ukBmhad1/vQfT4PUmTReT1JFFxX4fAgUwKqLfzhPYHelw5 r+W+N9yezortM/mZGpm/b99t9D/LSxfpOXZQdeUB5fEo4IjXXxN09o46VcmqsXwZjIpn lYQ0O8pbJEKyryyTmJ16lX/CZDX6fC9/eTx8gwRHb3A8o7iYTPtHtkFRmR0b1yRabw41 uZ7oD4nGkteznTm4Ys0UiuuazF0/58EN/2TdmGkPtOI1GlO7jf1v0BvSBi9lkeIw1C2b aCfA== X-Gm-Message-State: AAQBX9dz2nqaQeIu1O/WlgJ2qeJiBKpAi5RdPH7C6fIZVVF+FL6cqyeP Q2b45Z3MgGC2oZSY5kTN1yvhbodTC+/YOA== X-Received: by 2002:a05:6a20:1221:b0:e7:c39a:8823 with SMTP id v33-20020a056a20122100b000e7c39a8823mr10787565pzf.12.1681123886452; Mon, 10 Apr 2023 03:51:26 -0700 (PDT) Received: from localhost.localdomain ([103.7.29.32]) by smtp.gmail.com with ESMTPSA id h4-20020a056a00170400b0062e032b61a6sm7783252pfc.91.2023.04.10.03.51.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 10 Apr 2023 03:51:26 -0700 (PDT) From: Like Xu <like.xu.linux@gmail.com> X-Google-Original-From: Like Xu <likexu@tencent.com> To: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com>, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH V5 05/10] KVM: x86/pmu: Disable vPMU if the minimum num of counters isn't met Date: Mon, 10 Apr 2023 18:50:51 +0800 Message-Id: <20230410105056.60973-6-likexu@tencent.com> X-Mailer: git-send-email 2.40.0 In-Reply-To: <20230410105056.60973-1-likexu@tencent.com> References: <20230410105056.60973-1-likexu@tencent.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1762786964020546873?= X-GMAIL-MSGID: =?utf-8?q?1762786964020546873?= |
Series |
KVM: x86: Add AMD Guest PerfMonV2 PMU support
|
|
Commit Message
Like Xu
April 10, 2023, 10:50 a.m. UTC
From: Like Xu <likexu@tencent.com> Disable PMU support when running on AMD and perf reports fewer than four general purpose counters. All AMD PMUs must define at least four counters due to AMD's legacy architecture hardcoding the number of counters without providing a way to enumerate the number of counters to software, e.g. from AMD's APM: The legacy architecture defines four performance counters (PerfCtrn) and corresponding event-select registers (PerfEvtSeln). Virtualizing fewer than four counters can lead to guest instability as software expects four counters to be available. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Like Xu <likexu@tencent.com> --- arch/x86/kvm/pmu.h | 3 +++ 1 file changed, 3 insertions(+)
Comments
On Mon, Apr 10, 2023 at 3:51 AM Like Xu <like.xu.linux@gmail.com> wrote: > > From: Like Xu <likexu@tencent.com> > > Disable PMU support when running on AMD and perf reports fewer than four > general purpose counters. All AMD PMUs must define at least four counters > due to AMD's legacy architecture hardcoding the number of counters > without providing a way to enumerate the number of counters to software, > e.g. from AMD's APM: > > The legacy architecture defines four performance counters (PerfCtrn) > and corresponding event-select registers (PerfEvtSeln). > > Virtualizing fewer than four counters can lead to guest instability as > software expects four counters to be available. I'm confused. Isn't zero less than four? > Suggested-by: Sean Christopherson <seanjc@google.com> > Signed-off-by: Like Xu <likexu@tencent.com> > --- > arch/x86/kvm/pmu.h | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h > index dd7c7d4ffe3b..002b527360f4 100644 > --- a/arch/x86/kvm/pmu.h > +++ b/arch/x86/kvm/pmu.h > @@ -182,6 +182,9 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops) > enable_pmu = false; > } > > + if (!is_intel && kvm_pmu_cap.num_counters_gp < AMD64_NUM_COUNTERS) > + enable_pmu = false; > + > if (!enable_pmu) { > memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap)); > return; > -- > 2.40.0 >
On 11/4/2023 1:36 pm, Jim Mattson wrote: > On Mon, Apr 10, 2023 at 3:51 AM Like Xu <like.xu.linux@gmail.com> wrote: >> >> From: Like Xu <likexu@tencent.com> >> >> Disable PMU support when running on AMD and perf reports fewer than four >> general purpose counters. All AMD PMUs must define at least four counters >> due to AMD's legacy architecture hardcoding the number of counters >> without providing a way to enumerate the number of counters to software, >> e.g. from AMD's APM: >> >> The legacy architecture defines four performance counters (PerfCtrn) >> and corresponding event-select registers (PerfEvtSeln). >> >> Virtualizing fewer than four counters can lead to guest instability as >> software expects four counters to be available. > > I'm confused. Isn't zero less than four? As I understand it, you are saying that virtualization of zero counter is also reasonable. If so, the above statement could be refined as: Virtualizing fewer than four counters when vPMU is enabled may lead to guest instability as software expects at least four counters to be available, thus the vPMU is disabled if the minimum number of KVM supported counters is not reached during initialization. Jim, does this help you or could you explain more about your confusion ? > >> Suggested-by: Sean Christopherson <seanjc@google.com> >> Signed-off-by: Like Xu <likexu@tencent.com> >> --- >> arch/x86/kvm/pmu.h | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h >> index dd7c7d4ffe3b..002b527360f4 100644 >> --- a/arch/x86/kvm/pmu.h >> +++ b/arch/x86/kvm/pmu.h >> @@ -182,6 +182,9 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops) >> enable_pmu = false; >> } >> >> + if (!is_intel && kvm_pmu_cap.num_counters_gp < AMD64_NUM_COUNTERS) >> + enable_pmu = false; >> + >> if (!enable_pmu) { >> memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap)); >> return; >> -- >> 2.40.0 >>
On Mon, Apr 10, 2023 at 11:17 PM Like Xu <like.xu.linux@gmail.com> wrote: > > On 11/4/2023 1:36 pm, Jim Mattson wrote: > > On Mon, Apr 10, 2023 at 3:51 AM Like Xu <like.xu.linux@gmail.com> wrote: > >> > >> From: Like Xu <likexu@tencent.com> > >> > >> Disable PMU support when running on AMD and perf reports fewer than four > >> general purpose counters. All AMD PMUs must define at least four counters > >> due to AMD's legacy architecture hardcoding the number of counters > >> without providing a way to enumerate the number of counters to software, > >> e.g. from AMD's APM: > >> > >> The legacy architecture defines four performance counters (PerfCtrn) > >> and corresponding event-select registers (PerfEvtSeln). > >> > >> Virtualizing fewer than four counters can lead to guest instability as > >> software expects four counters to be available. > > > > I'm confused. Isn't zero less than four? > > As I understand it, you are saying that virtualization of zero counter is also > reasonable. > If so, the above statement could be refined as: > > Virtualizing fewer than four counters when vPMU is enabled may lead to guest > instability > as software expects at least four counters to be available, thus the vPMU is > disabled if the > minimum number of KVM supported counters is not reached during initialization. > > Jim, does this help you or could you explain more about your confusion ? You say that "fewer than four counters can lead to guest instability as software expects four counters to be available." Your solution is to disable the PMU, which leaves zero counters available. Zero is less than four. Hence, by your claim, disabling the PMU can lead to guest instability. I don't see how this is an improvement over one, two, or three counters. > > > >> Suggested-by: Sean Christopherson <seanjc@google.com> > >> Signed-off-by: Like Xu <likexu@tencent.com> > >> --- > >> arch/x86/kvm/pmu.h | 3 +++ > >> 1 file changed, 3 insertions(+) > >> > >> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h > >> index dd7c7d4ffe3b..002b527360f4 100644 > >> --- a/arch/x86/kvm/pmu.h > >> +++ b/arch/x86/kvm/pmu.h > >> @@ -182,6 +182,9 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops) > >> enable_pmu = false; > >> } > >> > >> + if (!is_intel && kvm_pmu_cap.num_counters_gp < AMD64_NUM_COUNTERS) > >> + enable_pmu = false; > >> + > >> if (!enable_pmu) { > >> memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap)); > >> return; > >> -- > >> 2.40.0 > >>
On 11/4/2023 8:58 pm, Jim Mattson wrote: > On Mon, Apr 10, 2023 at 11:17 PM Like Xu <like.xu.linux@gmail.com> wrote: >> >> On 11/4/2023 1:36 pm, Jim Mattson wrote: >>> On Mon, Apr 10, 2023 at 3:51 AM Like Xu <like.xu.linux@gmail.com> wrote: >>>> >>>> From: Like Xu <likexu@tencent.com> >>>> >>>> Disable PMU support when running on AMD and perf reports fewer than four >>>> general purpose counters. All AMD PMUs must define at least four counters >>>> due to AMD's legacy architecture hardcoding the number of counters >>>> without providing a way to enumerate the number of counters to software, >>>> e.g. from AMD's APM: >>>> >>>> The legacy architecture defines four performance counters (PerfCtrn) >>>> and corresponding event-select registers (PerfEvtSeln). >>>> >>>> Virtualizing fewer than four counters can lead to guest instability as >>>> software expects four counters to be available. >>> >>> I'm confused. Isn't zero less than four? >> >> As I understand it, you are saying that virtualization of zero counter is also >> reasonable. >> If so, the above statement could be refined as: >> >> Virtualizing fewer than four counters when vPMU is enabled may lead to guest >> instability >> as software expects at least four counters to be available, thus the vPMU is >> disabled if the >> minimum number of KVM supported counters is not reached during initialization. >> >> Jim, does this help you or could you explain more about your confusion ? > > You say that "fewer than four counters can lead to guest instability > as software expects four counters to be available." Your solution is > to disable the PMU, which leaves zero counters available. Zero is less > than four. Hence, by your claim, disabling the PMU can lead to guest > instability. I don't see how this is an improvement over one, two, or > three counters. As you know, AMD pmu lacks an architected method (such as CPUID) to indicate that the VM does not have any pmu counters available for the current platform. Guests like Linux tend to check if their first counters exist and work properly to infer that other pmu counters exist. If KVM chooses to emulate greater than 1 less than 4 counters, then the AMD guest PMU agent may assume that there are legacy 4 counters all present (it's what the APM specifies), which requires the legacy code to add #GP error handling for counters that should exist but actually not. So at Sean's suggestion, we took a conservative approach. If KVM detects less than 4 counters, we think KVM (under the current configuration and platform) is not capable of emulating the most basic AMD pmu capability. A large number of legacy instances are ready for 0 or 4+ ctrs, not 2 or 3. Does this help you ? I wouldn't mind a better move. > >>> >>>> Suggested-by: Sean Christopherson <seanjc@google.com> >>>> Signed-off-by: Like Xu <likexu@tencent.com> >>>> --- >>>> arch/x86/kvm/pmu.h | 3 +++ >>>> 1 file changed, 3 insertions(+) >>>> >>>> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h >>>> index dd7c7d4ffe3b..002b527360f4 100644 >>>> --- a/arch/x86/kvm/pmu.h >>>> +++ b/arch/x86/kvm/pmu.h >>>> @@ -182,6 +182,9 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops) >>>> enable_pmu = false; >>>> } >>>> >>>> + if (!is_intel && kvm_pmu_cap.num_counters_gp < AMD64_NUM_COUNTERS) >>>> + enable_pmu = false; >>>> + >>>> if (!enable_pmu) { >>>> memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap)); >>>> return; >>>> -- >>>> 2.40.0 >>>>
On Tue, Apr 11, 2023 at 6:18 AM Like Xu <like.xu.linux@gmail.com> wrote: > > On 11/4/2023 8:58 pm, Jim Mattson wrote: > > On Mon, Apr 10, 2023 at 11:17 PM Like Xu <like.xu.linux@gmail.com> wrote: > >> > >> On 11/4/2023 1:36 pm, Jim Mattson wrote: > >>> On Mon, Apr 10, 2023 at 3:51 AM Like Xu <like.xu.linux@gmail.com> wrote: > >>>> > >>>> From: Like Xu <likexu@tencent.com> > >>>> > >>>> Disable PMU support when running on AMD and perf reports fewer than four > >>>> general purpose counters. All AMD PMUs must define at least four counters > >>>> due to AMD's legacy architecture hardcoding the number of counters > >>>> without providing a way to enumerate the number of counters to software, > >>>> e.g. from AMD's APM: > >>>> > >>>> The legacy architecture defines four performance counters (PerfCtrn) > >>>> and corresponding event-select registers (PerfEvtSeln). > >>>> > >>>> Virtualizing fewer than four counters can lead to guest instability as > >>>> software expects four counters to be available. > >>> > >>> I'm confused. Isn't zero less than four? > >> > >> As I understand it, you are saying that virtualization of zero counter is also > >> reasonable. > >> If so, the above statement could be refined as: > >> > >> Virtualizing fewer than four counters when vPMU is enabled may lead to guest > >> instability > >> as software expects at least four counters to be available, thus the vPMU is > >> disabled if the > >> minimum number of KVM supported counters is not reached during initialization. > >> > >> Jim, does this help you or could you explain more about your confusion ? > > > > You say that "fewer than four counters can lead to guest instability > > as software expects four counters to be available." Your solution is > > to disable the PMU, which leaves zero counters available. Zero is less > > than four. Hence, by your claim, disabling the PMU can lead to guest > > instability. I don't see how this is an improvement over one, two, or > > three counters. > > As you know, AMD pmu lacks an architected method (such as CPUID) to > indicate that the VM does not have any pmu counters available for the > current platform. Guests like Linux tend to check if their first counters > exist and work properly to infer that other pmu counters exist. "Guests like Linux," or just Linux? What do you mean by "tend"? When do they perform this check, and when do they not? > If KVM chooses to emulate greater than 1 less than 4 counters, then the > AMD guest PMU agent may assume that there are legacy 4 counters all > present (it's what the APM specifies), which requires the legacy code > to add #GP error handling for counters that should exist but actually not. I would argue that regardless of the number of counters emulated, a guest PMU agent may assume that the 4 legacy counters are present, since that's what the APM specifies. > So at Sean's suggestion, we took a conservative approach. If KVM detects > less than 4 counters, we think KVM (under the current configuration and > platform) is not capable of emulating the most basic AMD pmu capability. > A large number of legacy instances are ready for 0 or 4+ ctrs, not 2 or 3 Which specific guest operating systems is this change intended for? > Does this help you ? I wouldn't mind a better move. Which AMD platforms have less than 4 counters available? > > > > >>> > >>>> Suggested-by: Sean Christopherson <seanjc@google.com> > >>>> Signed-off-by: Like Xu <likexu@tencent.com> > >>>> --- > >>>> arch/x86/kvm/pmu.h | 3 +++ > >>>> 1 file changed, 3 insertions(+) > >>>> > >>>> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h > >>>> index dd7c7d4ffe3b..002b527360f4 100644 > >>>> --- a/arch/x86/kvm/pmu.h > >>>> +++ b/arch/x86/kvm/pmu.h > >>>> @@ -182,6 +182,9 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops) > >>>> enable_pmu = false; > >>>> } > >>>> > >>>> + if (!is_intel && kvm_pmu_cap.num_counters_gp < AMD64_NUM_COUNTERS) Does this actually guarantee that the requisite number of counters are available and will always be available while the guest is running? What happens if some other client of the host perf subsystem requests a CPU-pinned counter after this checck? > >>>> + enable_pmu = false; > >>>> + > >>>> if (!enable_pmu) { > >>>> memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap)); > >>>> return; > >>>> -- > >>>> 2.40.0 > >>>>
Jim, sorry for the late reply. On 11/4/2023 10:58 pm, Jim Mattson wrote: > On Tue, Apr 11, 2023 at 6:18 AM Like Xu <like.xu.linux@gmail.com> wrote: >> >> On 11/4/2023 8:58 pm, Jim Mattson wrote: >>> On Mon, Apr 10, 2023 at 11:17 PM Like Xu <like.xu.linux@gmail.com> wrote: >>>> >>>> On 11/4/2023 1:36 pm, Jim Mattson wrote: >>>>> On Mon, Apr 10, 2023 at 3:51 AM Like Xu <like.xu.linux@gmail.com> wrote: >>>>>> >>>>>> From: Like Xu <likexu@tencent.com> >>>>>> >>>>>> Disable PMU support when running on AMD and perf reports fewer than four >>>>>> general purpose counters. All AMD PMUs must define at least four counters >>>>>> due to AMD's legacy architecture hardcoding the number of counters >>>>>> without providing a way to enumerate the number of counters to software, >>>>>> e.g. from AMD's APM: >>>>>> >>>>>> The legacy architecture defines four performance counters (PerfCtrn) >>>>>> and corresponding event-select registers (PerfEvtSeln). >>>>>> >>>>>> Virtualizing fewer than four counters can lead to guest instability as >>>>>> software expects four counters to be available. >>>>> >>>>> I'm confused. Isn't zero less than four? >>>> >>>> As I understand it, you are saying that virtualization of zero counter is also >>>> reasonable. >>>> If so, the above statement could be refined as: >>>> >>>> Virtualizing fewer than four counters when vPMU is enabled may lead to guest >>>> instability >>>> as software expects at least four counters to be available, thus the vPMU is >>>> disabled if the >>>> minimum number of KVM supported counters is not reached during initialization. >>>> >>>> Jim, does this help you or could you explain more about your confusion ? >>> >>> You say that "fewer than four counters can lead to guest instability >>> as software expects four counters to be available." Your solution is >>> to disable the PMU, which leaves zero counters available. Zero is less >>> than four. Hence, by your claim, disabling the PMU can lead to guest >>> instability. I don't see how this is an improvement over one, two, or >>> three counters. >> >> As you know, AMD pmu lacks an architected method (such as CPUID) to >> indicate that the VM does not have any pmu counters available for the >> current platform. Guests like Linux tend to check if their first counters >> exist and work properly to infer that other pmu counters exist. > > "Guests like Linux," or just Linux? What do you mean by "tend"? When > do they perform this check, and when do they not? We do not know how guests that do not disclose their source code will detect the presence of pmu counters. For upstream Linux guests, such a check is implemented in the check_hw_exists(), which checks the counters one by one, often with an error on the first counter, and then disables pmu from the kernel perspective. The key point is that the KVM implementation cannot rely on assumptions about the guest kernel version, and considering that the above check was added very early, existing Linux guest instances will most likely (tend to) check the first counter and error out (a VM could also check all of the possible counters and use a bitmap with holes to track any functional counters). > >> If KVM chooses to emulate greater than 1 less than 4 counters, then the >> AMD guest PMU agent may assume that there are legacy 4 counters all >> present (it's what the APM specifies), which requires the legacy code >> to add #GP error handling for counters that should exist but actually not. > > I would argue that regardless of the number of counters emulated, a > guest PMU agent may assume that the 4 legacy counters are present, > since that's what the APM specifies. I certainly agree that, for example, a particular cpu model is stated in the spec to have certain features (e.g. uncore pmu), but the KVM does not or chooses not ro emulate them, for security reasons (e.g. side channel attacks), which does violate the defined behavior of the hardware spec, such as here where enable_pmu is false, which is not possible on almost all real hardware today. > >> So at Sean's suggestion, we took a conservative approach. If KVM detects >> less than 4 counters, we think KVM (under the current configuration and >> platform) is not capable of emulating the most basic AMD pmu capability. >> A large number of legacy instances are ready for 0 or 4+ ctrs, not 2 or 3 > > Which specific guest operating systems is this change intended for? > >> Does this help you ? I wouldn't mind a better move. > > Which AMD platforms have less than 4 counters available? All this is for L2 Linux guest, as pmu on L1 Linux guest will be disabled by L0. > >> >>> >>>>> >>>>>> Suggested-by: Sean Christopherson <seanjc@google.com> >>>>>> Signed-off-by: Like Xu <likexu@tencent.com> >>>>>> --- >>>>>> arch/x86/kvm/pmu.h | 3 +++ >>>>>> 1 file changed, 3 insertions(+) >>>>>> >>>>>> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h >>>>>> index dd7c7d4ffe3b..002b527360f4 100644 >>>>>> --- a/arch/x86/kvm/pmu.h >>>>>> +++ b/arch/x86/kvm/pmu.h >>>>>> @@ -182,6 +182,9 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops) >>>>>> enable_pmu = false; >>>>>> } >>>>>> >>>>>> + if (!is_intel && kvm_pmu_cap.num_counters_gp < AMD64_NUM_COUNTERS) > > Does this actually guarantee that the requisite number of counters are > available and will always be available while the guest is running? Not 100%, the scheduling of physical counters depends on the host perf scheduler. I noticed that many cloud vendors want to make sure that hardware resources are given exclusively to VMs, but for upstream, the availability of resources should depend entirely on the host administrators, and a VMM should take away access to resources at any time, such as vcpu time slice. Any attempts in the direction of exclusive use will be thwarted. > What happens if some other client of the host perf subsystem requests > a CPU-pinned counter after this checck? Normal perf use does not grab the counters allocated for kvm, NMI-watchdog maybe one, but it will be moved to other timer hardware like HPET. Of interest is that some ebpf programs that access the pmu hardware directly use the interface that perf sub-system presents to KVM in the kernel. > >>>>>> + enable_pmu = false; >>>>>> + >>>>>> if (!enable_pmu) { >>>>>> memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap)); >>>>>> return; >>>>>> -- >>>>>> 2.40.0 >>>>>> >
On Wed, Apr 19, 2023, Like Xu wrote: > Jim, sorry for the late reply. > > On 11/4/2023 10:58 pm, Jim Mattson wrote: > > > > > Jim, does this help you or could you explain more about your confusion ? > > > > > > > > You say that "fewer than four counters can lead to guest instability > > > > as software expects four counters to be available." Your solution is > > > > to disable the PMU, which leaves zero counters available. Zero is less > > > > than four. Hence, by your claim, disabling the PMU can lead to guest > > > > instability. I don't see how this is an improvement over one, two, or > > > > three counters. KVM can't do the right thing regardless. I would rather have KVM explicitly tell userspace via that it can't support a vPMU than to carry on with a bogus and unexpected setup. > > Does this actually guarantee that the requisite number of counters are > > available and will always be available while the guest is running? > > Not 100%, the scheduling of physical counters depends on the host perf scheduler. Or put differently, the same thing that happens on Intel. kvm_pmu_cap.num_counters_gp is the number of counters reported by perf when KVM loads, i.e. barring oddities, it's the number of counters present in the host. Most importantly, if perf doesn't find the expected number of counters, perf will bail and use software only events, and then clear all of x86_pmu. In other words, KVM's new sanity *should* be a nop with respect to current behavior. If we're concerned about "unnecessarily" hiding the PMU when there are 1-3 counters, I'd be ok with a WARN_ON_ONCE(). Actually, looking more closely, there's unaddressed feedback from v4[*]. Folding that in, we can enable the sanity check for both Intel and AMD, though that's a bit of a lie since Intel will be '1'. But the code looks pretty! if (enable_pmu) { perf_get_x86_pmu_capability(&kvm_pmu_cap); /* * WARN if perf did NOT disable hardware PMU if the number of * architecturally required GP counters aren't present, i.e. if * there are a non-zero number of counters, but fewer than what * is architecturally required. */ if (!kvm_pmu_cap.num_counters_gp || WARN_ON_ONCE(kvm_pmu_cap.num_counters_gp < pmu_ops->MIN_NR_GP_COUNTERS)) enable_pmu = false; else if (is_intel && !kvm_pmu_cap.version) enable_pmu = false; } if (!enable_pmu) { memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap)); return; } [*] https://lore.kernel.org/all/ZC9ijgZBaz6p1ecw@google.com
On 25/5/2023 7:37 am, Sean Christopherson wrote: > On Wed, Apr 19, 2023, Like Xu wrote: >> Jim, sorry for the late reply. >> >> On 11/4/2023 10:58 pm, Jim Mattson wrote: >>>>>> Jim, does this help you or could you explain more about your confusion ? >>>>> >>>>> You say that "fewer than four counters can lead to guest instability >>>>> as software expects four counters to be available." Your solution is >>>>> to disable the PMU, which leaves zero counters available. Zero is less >>>>> than four. Hence, by your claim, disabling the PMU can lead to guest >>>>> instability. I don't see how this is an improvement over one, two, or >>>>> three counters. > > KVM can't do the right thing regardless. I would rather have KVM explicitly tell > userspace via that it can't support a vPMU than to carry on with a bogus and > unexpected setup. > >>> Does this actually guarantee that the requisite number of counters are >>> available and will always be available while the guest is running? >> >> Not 100%, the scheduling of physical counters depends on the host perf scheduler. > > Or put differently, the same thing that happens on Intel. kvm_pmu_cap.num_counters_gp > is the number of counters reported by perf when KVM loads, i.e. barring oddities, > it's the number of counters present in the host. Most importantly, if perf doesn't > find the expected number of counters, perf will bail and use software only events, > and then clear all of x86_pmu. > > In other words, KVM's new sanity *should* be a nop with respect to current > behavior. If we're concerned about "unnecessarily" hiding the PMU when there are > 1-3 counters, I'd be ok with a WARN_ON_ONCE(). > > Actually, looking more closely, there's unaddressed feedback from v4[*]. Folding > that in, we can enable the sanity check for both Intel and AMD, though that's a > bit of a lie since Intel will be '1'. But the code looks pretty! The below diff looks good to me. Please confirm that the other patches are in good shape so that we can iterate on the next version. Thanks. > > if (enable_pmu) { > perf_get_x86_pmu_capability(&kvm_pmu_cap); > > /* > * WARN if perf did NOT disable hardware PMU if the number of > * architecturally required GP counters aren't present, i.e. if > * there are a non-zero number of counters, but fewer than what > * is architecturally required. > */ > if (!kvm_pmu_cap.num_counters_gp || > WARN_ON_ONCE(kvm_pmu_cap.num_counters_gp < pmu_ops->MIN_NR_GP_COUNTERS)) > enable_pmu = false; > else if (is_intel && !kvm_pmu_cap.version) > enable_pmu = false; > } > > if (!enable_pmu) { > memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap)); > return; > } > > [*] https://lore.kernel.org/all/ZC9ijgZBaz6p1ecw@google.com
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h index dd7c7d4ffe3b..002b527360f4 100644 --- a/arch/x86/kvm/pmu.h +++ b/arch/x86/kvm/pmu.h @@ -182,6 +182,9 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops) enable_pmu = false; } + if (!is_intel && kvm_pmu_cap.num_counters_gp < AMD64_NUM_COUNTERS) + enable_pmu = false; + if (!enable_pmu) { memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap)); return;