Message ID | 20230407085646.24809-1-likexu@tencent.com |
---|---|
State | New |
Headers |
From: Like Xu <like.xu.linux@gmail.com>
X-Google-Original-From: Like Xu <likexu@tencent.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>, kvm@vger.kernel.org,
    linux-kernel@vger.kernel.org, Ravi Bangoria <ravi.bangoria@amd.com>
Subject: [PATCH V2] KVM: x86/pmu: Disable vPMU if EVENTSEL_GUESTONLY bit doesn't exist
Date: Fri, 7 Apr 2023 16:56:46 +0800
Message-Id: <20230407085646.24809-1-likexu@tencent.com>
X-Mailer: git-send-email 2.40.0 |
Series | [V2] KVM: x86/pmu: Disable vPMU if EVENTSEL_GUESTONLY bit doesn't exist |
Commit Message
Like Xu
April 7, 2023, 8:56 a.m. UTC
From: Like Xu <likexu@tencent.com>

Unlike Intel's MSR atomic_switch mechanism, AMD supports the guest PMU's
basic counter feature by setting the GUESTONLY bit on the host, so the
presence or absence of this bit determines whether the vPMU is emulatable
(e.g. in nested virtualization). Since writing reserved bits of an
EVENTSEL register does not cause a #GP on AMD, KVM needs to update the
global enable_pmu value by checking whether this GUESTONLY bit persists.

Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Like Xu <likexu@tencent.com>
---
V1: https://lore.kernel.org/kvm/20230307113819.34089-1-likexu@tencent.com
V1 -> V2 Changelog:
- Preemption needs to be disabled to ensure a stable CPU; (Sean)
- KVM should be restoring the original value too; (Sean)
- Disable vPMU once guest_only mode is not supported; (Sean)
- Appreciate any better way to probe for GUESTONLY support;

 arch/x86/kvm/svm/svm.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

base-commit: 649bccd7fac98225525c79cf4b1cecc4bafdfc54
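For context on the Intel mechanism the changelog contrasts against, a simplified paraphrase of VMX's atomic_switch_perf_msrs() follows (the exact perf_guest_get_msrs() signature has varied across kernel versions, so treat this as a sketch rather than current code): perf reports which PMU MSRs need distinct guest/host values, and VMX swaps them atomically at VM-entry/VM-exit via the MSR autoload lists, so Intel needs no GUESTONLY-style eventsel bit.

	/* Simplified paraphrase of VMX's atomic PMU MSR switching. */
	static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
	{
		int i, nr_msrs;
		struct perf_guest_switch_msr *msrs;

		/* Ask perf which PMU MSRs need different guest/host values. */
		msrs = perf_guest_get_msrs(&nr_msrs);
		if (!msrs)
			return;

		/* Queue each MSR on the VM-entry/VM-exit autoload lists. */
		for (i = 0; i < nr_msrs; i++)
			if (msrs[i].host == msrs[i].guest)
				clear_atomic_switch_msr(vmx, msrs[i].msr);
			else
				add_atomic_switch_msr(vmx, msrs[i].msr,
						      msrs[i].guest,
						      msrs[i].host, false);
	}

SVM has no equivalent autoload mechanism, which is why AMD instead relies on the per-counter GUESTONLY/HOSTONLY eventsel bits.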
Comments
On Fri, Apr 07, 2023, Like Xu wrote:
> From: Like Xu <likexu@tencent.com>
>
> Unlike Intel's MSR atomic_switch mechanism, AMD supports the guest PMU's
> basic counter feature by setting the GUESTONLY bit on the host, so the
> presence or absence of this bit determines whether the vPMU is emulatable
> (e.g. in nested virtualization). Since writing reserved bits of an
> EVENTSEL register does not cause a #GP on AMD, KVM needs to update the
> global enable_pmu value by checking whether this GUESTONLY bit persists.

This is looking more and more like a bug fix, i.e. needs a Fixes:, no?

> Cc: Ravi Bangoria <ravi.bangoria@amd.com>
> Signed-off-by: Like Xu <likexu@tencent.com>
> ---
> V1: https://lore.kernel.org/kvm/20230307113819.34089-1-likexu@tencent.com
> V1 -> V2 Changelog:
> - Preemption needs to be disabled to ensure a stable CPU; (Sean)
> - KVM should be restoring the original value too; (Sean)
> - Disable vPMU once guest_only mode is not supported; (Sean)

Please respond to my questions, don't just send a new version. When I asked

 : Why does lack of AMD64_EVENTSEL_GUESTONLY disable the PMU, but if and only if
 : X86_FEATURE_PERFCTR_CORE? E.g. why does the behavior not also apply to legacy
 : perfmon support?

I wanted an actual answer because I genuinely do not know what the correct
behavior is.

> - Appreciate any better way to probe for GUESTONLY support;

Again, wait for discussion in previous versions to resolve before posting a new
version. If your answer is "not as far as I know", that's totally fine, but
sending a new version without responding makes it unnecessarily difficult to
track down your "answer". E.g. instead of seeing a very direct "I don't know",
I had to discover that answer by finding a hint buried in the ignored section of
a new patch.

> arch/x86/kvm/svm/svm.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 7584eb85410b..1ab885596510 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4884,6 +4884,20 @@ static __init void svm_adjust_mmio_mask(void)
>  	kvm_mmu_set_mmio_spte_mask(mask, mask, PT_WRITABLE_MASK | PT_USER_MASK);
>  }
>
> +static __init bool pmu_has_guestonly_mode(void)
> +{
> +	u64 original, value;
> +
> +	preempt_disable();
> +	rdmsrl(MSR_F15H_PERF_CTL0, original);

What guarantees this MSR actually exists? In v1, it was guarded by
enable_pmu=%true, but that's no longer the case. And KVM does

	case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTR5:
		if (!guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE))
			return NULL;

which very strongly suggests this MSR doesn't exist if the CPU supports only the
"legacy" PMU.

> +	wrmsrl(MSR_F15H_PERF_CTL0, AMD64_EVENTSEL_GUESTONLY);
> +	rdmsrl(MSR_F15H_PERF_CTL0, value);
> +	wrmsrl(MSR_F15H_PERF_CTL0, original);
> +	preempt_enable();
> +
> +	return value == AMD64_EVENTSEL_GUESTONLY;
> +}
> +
>  static __init void svm_set_cpu_caps(void)
>  {
>  	kvm_set_cpu_caps();
> @@ -4928,6 +4942,9 @@ static __init void svm_set_cpu_caps(void)
>  	    boot_cpu_has(X86_FEATURE_AMD_SSBD))
>  		kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD);
>
> +	/* Probe for AMD64_EVENTSEL_GUESTONLY support */

I've said this several times recently: use comments to explain _why_ and to call
out subtleties. The code quite obviously is probing for guest-only support, what's
not obvious is why guest-only support is mandatory for vPMU support. It may be
obvious to you, but please try to view all of this code from the perspective of
someone who has only passing knowledge of the various components, i.e. doesn't
know the gory details of exactly what KVM supports.

Poking around, I see that pmc_reprogram_counter() unconditionally does

	.exclude_host = 1,

and amd_core_hw_config()

	if (event->attr.exclude_host && event->attr.exclude_guest)
		/*
		 * When HO == GO == 1 the hardware treats that as GO == HO == 0
		 * and will count in both modes. We don't want to count in that
		 * case so we emulate no-counting by setting US = OS = 0.
		 */
		event->hw.config &= ~(ARCH_PERFMON_EVENTSEL_USR |
				      ARCH_PERFMON_EVENTSEL_OS);
	else if (event->attr.exclude_host)
		event->hw.config |= AMD64_EVENTSEL_GUESTONLY;
	else if (event->attr.exclude_guest)
		event->hw.config |= AMD64_EVENTSEL_HOSTONLY;

and so something like this seems appropriate

	/*
	 * KVM requires guest-only event support in order to isolate guest PMCs
	 * from host PMCs. SVM doesn't provide a way to atomically load MSRs
	 * on VMRUN, and manually adjusting counts before/after VMRUN is not
	 * accurate enough to properly virtualize a PMU.
	 */

But now I'm really confused, because if I'm reading the code correctly, perf
invokes amd_core_hw_config() for legacy PMUs, i.e. even if PERFCTR_CORE isn't
supported. And the APM documents the host/guest bits only for "Core Performance
Event-Select Registers".

So either (a) GUESTONLY isn't supported on legacy CPUs and perf is relying on AMD
CPUs ignoring reserved bits or (b) GUESTONLY _is_ supported on legacy PMUs and
pmu_has_guestonly_mode() is checking the wrong MSR when running on older CPUs.

And if (a) is true, then how on earth does KVM support vPMU when running on a
legacy PMU? Is vPMU on AMD just wildly broken? Am I missing something?
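To make the dependency described above concrete, here is a paraphrased sketch of how KVM's request reaches the GUESTONLY bit (the helper name is invented for illustration; the attr setup follows the pmc_reprogram_counter() snippet quoted above):

	/*
	 * Sketch, not verbatim KVM code: KVM backs each guest PMC with a host
	 * perf event that must count only while the guest runs. On AMD,
	 * amd_core_hw_config() (quoted above) translates exclude_host=1 into
	 * AMD64_EVENTSEL_GUESTONLY -- the bit this patch probes for.
	 */
	static struct perf_event *kvm_request_guest_only_event(u64 config,
							       void *pmc)
	{
		struct perf_event_attr attr = {
			.type		= PERF_TYPE_RAW,
			.size		= sizeof(attr),
			.config		= config,	/* guest-programmed eventsel */
			.exclude_host	= 1,		/* count in guest mode only */
		};

		return perf_event_create_kernel_counter(&attr, -1, current,
							kvm_perf_overflow, pmc);
	}

If the hardware lacks GUESTONLY, perf cannot honor exclude_host precisely, which is exactly the isolation problem the probe is meant to detect.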
On 7/4/2023 11:37 pm, Sean Christopherson wrote:
> On Fri, Apr 07, 2023, Like Xu wrote:
>> From: Like Xu <likexu@tencent.com>
>>
>> Unlike Intel's MSR atomic_switch mechanism, AMD supports the guest PMU's
>> basic counter feature by setting the GUESTONLY bit on the host, so the
>> presence or absence of this bit determines whether the vPMU is emulatable
>> (e.g. in nested virtualization). Since writing reserved bits of an
>> EVENTSEL register does not cause a #GP on AMD, KVM needs to update the
>> global enable_pmu value by checking whether this GUESTONLY bit persists.
>
> This is looking more and more like a bug fix, i.e. needs a Fixes:, no?

Fixes: ca724305a2b0 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM") ?
Fixes: 4b6e4dca7011 ("KVM: SVM: enable nested svm by default") ?

As far as I can tell, the current KVM code makes the PMU counters of AMD
nested VMs report ridiculous values. One conservative fix is to disable the
nested PMU on AMD.

>> Cc: Ravi Bangoria <ravi.bangoria@amd.com>
>> Signed-off-by: Like Xu <likexu@tencent.com>
>> ---
>> V1: https://lore.kernel.org/kvm/20230307113819.34089-1-likexu@tencent.com
>> V1 -> V2 Changelog:
>> - Preemption needs to be disabled to ensure a stable CPU; (Sean)
>> - KVM should be restoring the original value too; (Sean)
>> - Disable vPMU once guest_only mode is not supported; (Sean)
>
> Please respond to my questions, don't just send a new version. When I asked
>
> : Why does lack of AMD64_EVENTSEL_GUESTONLY disable the PMU, but if and only if
> : X86_FEATURE_PERFCTR_CORE? E.g. why does the behavior not also apply to legacy
> : perfmon support?
>
> I wanted an actual answer because I genuinely do not know what the correct
> behavior is.

As far as I know, only AMD CPUs that support PERFCTR_CORE+ have the GUESTONLY
bit, provided the BIOS doesn't disable it. The "GO/HO bit" was first introduced
for perf-kvm usage (2011), before support for the AMD guest vPMU was added
(2015). As the guest_only name implies, SVM can use this bit to emulate core
counters for AMD guests. So if the host does not have this bit (e.g. on an L1
VM), the L2 VM's vPMU will work abnormally.

>> - Appreciate any better way to probe for GUESTONLY support;
>
> Again, wait for discussion in previous versions to resolve before posting a new
> version. If your answer is "not as far as I know", that's totally fine, but
> sending a new version without responding makes it unnecessarily difficult to
> track down your "answer". E.g. instead of seeing a very direct "I don't know",
> I had to discover that answer by finding a hint buried in the ignored section of
> a new patch.

Okay, then. Nice rule to follow.

>> arch/x86/kvm/svm/svm.c | 17 +++++++++++++++++
>> 1 file changed, 17 insertions(+)
>>
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index 7584eb85410b..1ab885596510 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -4884,6 +4884,20 @@ static __init void svm_adjust_mmio_mask(void)
>>  	kvm_mmu_set_mmio_spte_mask(mask, mask, PT_WRITABLE_MASK | PT_USER_MASK);
>>  }
>>
>> +static __init bool pmu_has_guestonly_mode(void)
>> +{
>> +	u64 original, value;
>> +
>> +	preempt_disable();
>> +	rdmsrl(MSR_F15H_PERF_CTL0, original);
>
> What guarantees this MSR actually exists? In v1, it was guarded by
> enable_pmu=%true, but that's no longer the case. And KVM does
>
> 	case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTR5:
> 		if (!guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE))
> 			return NULL;
>
> which very strongly suggests this MSR doesn't exist if the CPU supports only the
> "legacy" PMU.

Yes, the check for boot_cpu_has(X86_FEATURE_PERFCTR_CORE) is missing here.

>> +	wrmsrl(MSR_F15H_PERF_CTL0, AMD64_EVENTSEL_GUESTONLY);
>> +	rdmsrl(MSR_F15H_PERF_CTL0, value);
>> +	wrmsrl(MSR_F15H_PERF_CTL0, original);
>> +	preempt_enable();
>> +
>> +	return value == AMD64_EVENTSEL_GUESTONLY;
>> +}
>> +
>>  static __init void svm_set_cpu_caps(void)
>>  {
>>  	kvm_set_cpu_caps();
>> @@ -4928,6 +4942,9 @@ static __init void svm_set_cpu_caps(void)
>>  	    boot_cpu_has(X86_FEATURE_AMD_SSBD))
>>  		kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD);
>>
>> +	/* Probe for AMD64_EVENTSEL_GUESTONLY support */
>
> I've said this several times recently: use comments to explain _why_ and to call
> out subtleties. The code quite obviously is probing for guest-only support, what's
> not obvious is why guest-only support is mandatory for vPMU support. It may be
> obvious to you, but please try to view all of this code from the perspective of
> someone who has only passing knowledge of the various components, i.e. doesn't
> know the gory details of exactly what KVM supports.

How about:

	/*
	 * The guest vPMU counter emulation depends on the EVENTSEL_GUESTONLY
	 * bit. If this bit is present on the host, the host needs to support
	 * at least PERFCTR_CORE.
	 */

> Poking around, I see that pmc_reprogram_counter() unconditionally does
>
> 	.exclude_host = 1,
>
> and amd_core_hw_config()
>
> 	if (event->attr.exclude_host && event->attr.exclude_guest)
> 		/*
> 		 * When HO == GO == 1 the hardware treats that as GO == HO == 0
> 		 * and will count in both modes. We don't want to count in that
> 		 * case so we emulate no-counting by setting US = OS = 0.
> 		 */
> 		event->hw.config &= ~(ARCH_PERFMON_EVENTSEL_USR |
> 				      ARCH_PERFMON_EVENTSEL_OS);
> 	else if (event->attr.exclude_host)
> 		event->hw.config |= AMD64_EVENTSEL_GUESTONLY;
> 	else if (event->attr.exclude_guest)
> 		event->hw.config |= AMD64_EVENTSEL_HOSTONLY;
>
> and so something like this seems appropriate
>
> 	/*
> 	 * KVM requires guest-only event support in order to isolate guest PMCs
> 	 * from host PMCs. SVM doesn't provide a way to atomically load MSRs
> 	 * on VMRUN, and manually adjusting counts before/after VMRUN is not
> 	 * accurate enough to properly virtualize a PMU.
> 	 */
>
> But now I'm really confused, because if I'm reading the code correctly, perf
> invokes amd_core_hw_config() for legacy PMUs, i.e. even if PERFCTR_CORE isn't
> supported. And the APM documents the host/guest bits only for "Core Performance
> Event-Select Registers".
>
> So either (a) GUESTONLY isn't supported on legacy CPUs and perf is relying on AMD
> CPUs ignoring reserved bits or (b) GUESTONLY _is_ supported on legacy PMUs and
> pmu_has_guestonly_mode() is checking the wrong MSR when running on older CPUs.
>
> And if (a) is true, then how on earth does KVM support vPMU when running on a
> legacy PMU? Is vPMU on AMD just wildly broken? Am I missing something?

(a) It's true, and the AMD guest vPMU has only been implemented accurately with
the help of this GUESTONLY bit.

There are two other scenarios worth discussing here: one is supporting the L2
vPMU on a PERFCTR_CORE+ host, which this proposal disables; the other is
supporting the AMD legacy vPMU on a PERFCTR_CORE+ host.

Please let me know if you have more concerns.
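Putting the acknowledgement above together with the posted probe, a corrected sketch would simply guard the MSR access (the guard is inferred from this discussion; no v3 with it appears in this thread):

	static __init bool pmu_has_guestonly_mode(void)
	{
		u64 original, value;

		/*
		 * MSR_F15H_PERF_CTL0 is documented only for PERFCTR_CORE-capable
		 * CPUs, so bail out on legacy PMUs instead of touching the MSR.
		 */
		if (!boot_cpu_has(X86_FEATURE_PERFCTR_CORE))
			return false;

		/* Probe and restore the MSR on one stable CPU. */
		preempt_disable();
		rdmsrl(MSR_F15H_PERF_CTL0, original);
		wrmsrl(MSR_F15H_PERF_CTL0, AMD64_EVENTSEL_GUESTONLY);
		rdmsrl(MSR_F15H_PERF_CTL0, value);
		wrmsrl(MSR_F15H_PERF_CTL0, original);
		preempt_enable();

		/* The write sticks only if hardware implements guest-only mode. */
		return value == AMD64_EVENTSEL_GUESTONLY;
	}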
On Thu, Sep 14, 2023, Like Xu wrote:
> On 7/4/2023 11:37 pm, Sean Christopherson wrote:
> > On Fri, Apr 07, 2023, Like Xu wrote:
> /*
>  * The guest vPMU counter emulation depends on the EVENTSEL_GUESTONLY
>  * bit. If this bit is present on the host, the host needs to support
>  * at least PERFCTR_CORE.
>  */

...

> > /*
> >  * KVM requires guest-only event support in order to isolate guest PMCs
> >  * from host PMCs. SVM doesn't provide a way to atomically load MSRs
> >  * on VMRUN, and manually adjusting counts before/after VMRUN is not
> >  * accurate enough to properly virtualize a PMU.
> >  */
> >
> > But now I'm really confused, because if I'm reading the code correctly, perf
> > invokes amd_core_hw_config() for legacy PMUs, i.e. even if PERFCTR_CORE isn't
> > supported. And the APM documents the host/guest bits only for "Core Performance
> > Event-Select Registers".
> >
> > So either (a) GUESTONLY isn't supported on legacy CPUs and perf is relying on AMD
> > CPUs ignoring reserved bits or (b) GUESTONLY _is_ supported on legacy PMUs and
> > pmu_has_guestonly_mode() is checking the wrong MSR when running on older CPUs.
> >
> > And if (a) is true, then how on earth does KVM support vPMU when running on a
> > legacy PMU? Is vPMU on AMD just wildly broken? Am I missing something?
>
> (a) It's true, and the AMD guest vPMU has only been implemented accurately with
> the help of this GUESTONLY bit.
>
> There are two other scenarios worth discussing here: one is supporting the L2
> vPMU on a PERFCTR_CORE+ host, which this proposal disables; the other is
> supporting the AMD legacy vPMU on a PERFCTR_CORE+ host.

Oooh, so the really problematic case is when PERFCTR_CORE+ is supported but
GUESTONLY is not, in which case KVM+perf *think* they can use GUESTONLY (and
HOSTONLY).

That's a straight up KVM (as L0) bug, no? I don't see anything in the APM that
suggests those bits are optional, i.e. KVM is blatantly violating AMD's
architecture by ignoring those bits.

I would rather fix KVM (as L0). It doesn't seem _that_ hard to support, e.g.
modify reprogram_counter() to disable the counter if it's supposed to be silent
for the current mode, and reprogram all counters if EFER.SVME is toggled, and on
all nested transitions.
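For illustration only, a hypothetical sketch of the mode check such a fix might add (pmc_counts_in_current_mode() is an invented name, not existing KVM code; per the suggestion above, the real change would also need to reprogram counters when EFER.SVME is toggled and on every nested transition):

	/*
	 * Hypothetical helper: decide whether an L1-owned counter should be
	 * counting given the vCPU's current mode, honoring the GUESTONLY and
	 * HOSTONLY bits that L1 programmed into the eventsel.
	 */
	static bool pmc_counts_in_current_mode(struct kvm_pmc *pmc)
	{
		u64 eventsel = pmc->eventsel;
		bool in_l2 = is_guest_mode(pmc->vcpu);

		/* HW treats GO == HO == 1 as GO == HO == 0: count in both modes. */
		if ((eventsel & AMD64_EVENTSEL_GUESTONLY) &&
		    (eventsel & AMD64_EVENTSEL_HOSTONLY))
			return true;

		if (eventsel & AMD64_EVENTSEL_GUESTONLY)
			return in_l2;
		if (eventsel & AMD64_EVENTSEL_HOSTONLY)
			return !in_l2;

		return true;
	}

reprogram_counter() could then consult such a helper and leave the backing perf event disabled whenever it returns false.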
On 26/9/2023 7:31 am, Sean Christopherson wrote:
> On Thu, Sep 14, 2023, Like Xu wrote:
>> On 7/4/2023 11:37 pm, Sean Christopherson wrote:
>>> On Fri, Apr 07, 2023, Like Xu wrote:
>> /*
>>  * The guest vPMU counter emulation depends on the EVENTSEL_GUESTONLY
>>  * bit. If this bit is present on the host, the host needs to support
>>  * at least PERFCTR_CORE.
>>  */
>
> ...
>
>>> /*
>>>  * KVM requires guest-only event support in order to isolate guest PMCs
>>>  * from host PMCs. SVM doesn't provide a way to atomically load MSRs
>>>  * on VMRUN, and manually adjusting counts before/after VMRUN is not
>>>  * accurate enough to properly virtualize a PMU.
>>>  */
>>>
>>> But now I'm really confused, because if I'm reading the code correctly, perf
>>> invokes amd_core_hw_config() for legacy PMUs, i.e. even if PERFCTR_CORE isn't
>>> supported. And the APM documents the host/guest bits only for "Core Performance
>>> Event-Select Registers".
>>>
>>> So either (a) GUESTONLY isn't supported on legacy CPUs and perf is relying on AMD
>>> CPUs ignoring reserved bits or (b) GUESTONLY _is_ supported on legacy PMUs and
>>> pmu_has_guestonly_mode() is checking the wrong MSR when running on older CPUs.
>>>
>>> And if (a) is true, then how on earth does KVM support vPMU when running on a
>>> legacy PMU? Is vPMU on AMD just wildly broken? Am I missing something?
>>
>> (a) It's true, and the AMD guest vPMU has only been implemented accurately with
>> the help of this GUESTONLY bit.
>>
>> There are two other scenarios worth discussing here: one is supporting the L2
>> vPMU on a PERFCTR_CORE+ host, which this proposal disables; the other is
>> supporting the AMD legacy vPMU on a PERFCTR_CORE+ host.
>
> Oooh, so the really problematic case is when PERFCTR_CORE+ is supported but
> GUESTONLY is not, in which case KVM+perf *think* they can use GUESTONLY (and
> HOSTONLY).
>
> That's a straight up KVM (as L0) bug, no? I don't see anything in the APM that
> suggests those bits are optional, i.e. KVM is blatantly violating AMD's
> architecture by ignoring those bits.

An L2 guest often doesn't see all the CPU features corresponding to its CPU
model, because KVM and the VMM filter some of the capabilities. We can't say
that the absence of those features violates the spec, can we? I treat it as a
KVM flaw, or a lack of emulation capability.

> I would rather fix KVM (as L0). It doesn't seem _that_ hard to support, e.g.
> modify reprogram_counter() to disable the counter if it's supposed to be silent
> for the current mode, and reprogram all counters if EFER.SVME is toggled, and on
> all nested transitions.

I thought about that too, but wiring up EFER.SVME and VMRUN is still a little
way off, and more micro-testing is needed to get the emulated behavior right,
considering KVM also has to support emulated instructions.

It's safe to say that there are no real user scenarios using the vPMU in a
nested guest, so I'm more inclined to disable it provisionally (for the sake of
stable tree users); enabling this feature is honestly at the end of my to-do
list.
On Tue, Sep 26, 2023, Like Xu wrote:
> On 26/9/2023 7:31 am, Sean Christopherson wrote:
> > On Thu, Sep 14, 2023, Like Xu wrote:
> > > On 7/4/2023 11:37 pm, Sean Christopherson wrote:
> > > > On Fri, Apr 07, 2023, Like Xu wrote:
> > > /*
> > >  * The guest vPMU counter emulation depends on the EVENTSEL_GUESTONLY
> > >  * bit. If this bit is present on the host, the host needs to support
> > >  * at least PERFCTR_CORE.
> > >  */
> >
> > ...
> >
> > > > /*
> > > >  * KVM requires guest-only event support in order to isolate guest PMCs
> > > >  * from host PMCs. SVM doesn't provide a way to atomically load MSRs
> > > >  * on VMRUN, and manually adjusting counts before/after VMRUN is not
> > > >  * accurate enough to properly virtualize a PMU.
> > > >  */
> > > >
> > > > But now I'm really confused, because if I'm reading the code correctly, perf
> > > > invokes amd_core_hw_config() for legacy PMUs, i.e. even if PERFCTR_CORE isn't
> > > > supported. And the APM documents the host/guest bits only for "Core Performance
> > > > Event-Select Registers".
> > > >
> > > > So either (a) GUESTONLY isn't supported on legacy CPUs and perf is relying on AMD
> > > > CPUs ignoring reserved bits or (b) GUESTONLY _is_ supported on legacy PMUs and
> > > > pmu_has_guestonly_mode() is checking the wrong MSR when running on older CPUs.
> > > >
> > > > And if (a) is true, then how on earth does KVM support vPMU when running on a
> > > > legacy PMU? Is vPMU on AMD just wildly broken? Am I missing something?
> > >
> > > (a) It's true, and the AMD guest vPMU has only been implemented accurately with
> > > the help of this GUESTONLY bit.
> > >
> > > There are two other scenarios worth discussing here: one is supporting the L2
> > > vPMU on a PERFCTR_CORE+ host, which this proposal disables; the other is
> > > supporting the AMD legacy vPMU on a PERFCTR_CORE+ host.
> >
> > Oooh, so the really problematic case is when PERFCTR_CORE+ is supported but
> > GUESTONLY is not, in which case KVM+perf *think* they can use GUESTONLY (and
> > HOSTONLY).
> >
> > That's a straight up KVM (as L0) bug, no? I don't see anything in the APM that
> > suggests those bits are optional, i.e. KVM is blatantly violating AMD's
> > architecture by ignoring those bits.
>
> An L2 guest often doesn't see all the CPU features corresponding to its CPU
> model, because KVM and the VMM filter some of the capabilities. We can't say
> that the absence of those features violates the spec, can we?

Yes, KVM hides features via architectural means. This is enumerating a feature,
PERFCTR_CORE, and not providing all of its functionality. The two things are
distinctly different.

> I treat it as a KVM flaw, or a lack of emulation capability.

A.k.a. a bug.

> > I would rather fix KVM (as L0). It doesn't seem _that_ hard to support, e.g.
> > modify reprogram_counter() to disable the counter if it's supposed to be silent
> > for the current mode, and reprogram all counters if EFER.SVME is toggled, and on
> > all nested transitions.
>
> I thought about that too, but wiring up EFER.SVME and VMRUN is still a little
> way off, and more micro-testing is needed to get the emulated behavior right,
> considering KVM also has to support emulated instructions.
>
> It's safe to say that there are no real user scenarios using the vPMU in a
> nested guest, so I'm more inclined to disable it provisionally (for the sake of
> stable tree users); enabling this feature is honestly at the end of my to-do
> list.

I don't see a way to do that without violating AMD's architecture while still
exposing the PMU to L1.
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7584eb85410b..1ab885596510 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4884,6 +4884,20 @@ static __init void svm_adjust_mmio_mask(void)
 	kvm_mmu_set_mmio_spte_mask(mask, mask, PT_WRITABLE_MASK | PT_USER_MASK);
 }
 
+static __init bool pmu_has_guestonly_mode(void)
+{
+	u64 original, value;
+
+	preempt_disable();
+	rdmsrl(MSR_F15H_PERF_CTL0, original);
+	wrmsrl(MSR_F15H_PERF_CTL0, AMD64_EVENTSEL_GUESTONLY);
+	rdmsrl(MSR_F15H_PERF_CTL0, value);
+	wrmsrl(MSR_F15H_PERF_CTL0, original);
+	preempt_enable();
+
+	return value == AMD64_EVENTSEL_GUESTONLY;
+}
+
 static __init void svm_set_cpu_caps(void)
 {
 	kvm_set_cpu_caps();
@@ -4928,6 +4942,9 @@ static __init void svm_set_cpu_caps(void)
 	    boot_cpu_has(X86_FEATURE_AMD_SSBD))
 		kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD);
 
+	/* Probe for AMD64_EVENTSEL_GUESTONLY support */
+	enable_pmu &= pmu_has_guestonly_mode();
+
 	/* AMD PMU PERFCTR_CORE CPUID */
 	if (enable_pmu && boot_cpu_has(X86_FEATURE_PERFCTR_CORE))
 		kvm_cpu_cap_set(X86_FEATURE_PERFCTR_CORE);