Message ID | 20230322093117.48335-1-likexu@tencent.com |
---|---|
State | New |
Headers |
From: Like Xu <like.xu.linux@gmail.com>
X-Google-Original-From: Like Xu <likexu@tencent.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2] KVM: x86/pmu: Fix emulation on Intel counters' bit width
Date: Wed, 22 Mar 2023 17:31:17 +0800
Message-Id: <20230322093117.48335-1-likexu@tencent.com>
X-Mailer: git-send-email 2.40.0
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org |
Series |
[v2] KVM: x86/pmu: Fix emulation on Intel counters' bit width
|
|
Commit Message
Like Xu
March 22, 2023, 9:31 a.m. UTC
From: Like Xu <likexu@tencent.com>

Per Intel SDM, the bit width of a PMU counter is specified via CPUID
only if the vCPU has FW_WRITE[bit 13] on IA32_PERF_CAPABILITIES.
When the FW_WRITE bit is not set, only EAX is valid and out-of-bounds
bits accesses do not generate #GP. Conversely when this bit is set, #GP
for out-of-bounds bits accesses will also appear on the fixed counters.
vPMU currently does not support emulation of bit widths lower than 32
bits or higher than its host capability.

Signed-off-by: Like Xu <likexu@tencent.com>
---
Previous:
https://lore.kernel.org/kvm/20230316113312.54714-1-likexu@tencent.com/

V1 -> V2 Changelog:
- Apply #GP rule to fixed counters when guest has FW_WRITE;
- Apply sign-extension rule to fixed counters when guest doesn't have FW_WRITE;
- Counters' bit width set by cpuid cannot be less than 32 bits;

 arch/x86/kvm/vmx/pmu_intel.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

base-commit: d8708b80fa0e6e21bc0c9e7276ad0bccef73b6e7
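The write rule described in the commit message can be sketched in plain C as follows. This is a minimal illustration, not KVM code: the function name `fixed_counter_write`, its parameters, and the choice to return the value masked to the counter width are all assumptions made for this example. The counter width is passed in directly here, whereas the patch additionally clamps it.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of a guest WRMSR to a fixed counter.
 * Precondition (assumed): 1 <= width <= 64.
 * Returns true if the write would raise #GP; otherwise stores the value
 * the counter would hold (masked to its width) in *out.
 */
bool fixed_counter_write(bool fw_write, unsigned int width,
			 uint64_t data, uint64_t *out)
{
	uint64_t mask = (width >= 64) ? ~0ULL : ((1ULL << width) - 1);

	if (fw_write) {
		/* Full-width writes: any bit above the counter width faults. */
		if (data & ~mask)
			return true;
	} else {
		/*
		 * Legacy writes: only the low 32 bits are used and they are
		 * sign-extended. The narrowing cast wraps on common compilers.
		 */
		data = (uint64_t)(int64_t)(int32_t)data;
	}

	*out = data & mask;
	return false;
}
```

With a 48-bit counter, a full-width write with bit 48 set is rejected, while a legacy write of 0x80000000 is sign-extended and then truncated to the counter width.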
Comments
On Wed, Mar 22, 2023 at 10:31 AM Like Xu <like.xu.linux@gmail.com> wrote:
>
> From: Like Xu <likexu@tencent.com>
>
> Per Intel SDM, the bit width of a PMU counter is specified via CPUID
> only if the vCPU has FW_WRITE[bit 13] on IA32_PERF_CAPABILITIES.
> When the FW_WRITE bit is not set, only EAX is valid and out-of-bounds
> bits accesses do not generate #GP. Conversely when this bit is set, #GP
> for out-of-bounds bits accesses will also appear on the fixed counters.
> vPMU currently does not support emulation of bit widths lower than 32
> bits or higher than its host capability.

Can you please point out the date and paragraph of the SDM?

Paolo
On 27/3/2023 10:30 pm, Paolo Bonzini wrote:
> On Wed, Mar 22, 2023 at 10:31 AM Like Xu <like.xu.linux@gmail.com> wrote:
>>
>> From: Like Xu <likexu@tencent.com>
>>
>> Per Intel SDM, the bit width of a PMU counter is specified via CPUID
>> only if the vCPU has FW_WRITE[bit 13] on IA32_PERF_CAPABILITIES.
>> When the FW_WRITE bit is not set, only EAX is valid and out-of-bounds
>> bits accesses do not generate #GP. Conversely when this bit is set, #GP
>> for out-of-bounds bits accesses will also appear on the fixed counters.
>> vPMU currently does not support emulation of bit widths lower than 32
>> bits or higher than its host capability.
>
> Can you please point out the date and paragraph of the SDM?
>
> Paolo
>

25462-078US, December 2022

20.2.6 Full-Width Writes to Performance Counter Registers

The general-purpose performance counter registers IA32_PMCx are writable via
WRMSR instruction. However, the value written into IA32_PMCx by WRMSR is the
signed extended 64-bit value of the EAX[31:0] input of WRMSR.

A processor that supports full-width writes to the general-purpose performance
counters enumerated by CPUID.0AH:EAX[15:8] will set IA32_PERF_CAPABILITIES[13]
to enumerate its full-width-write capability. See Figure 20-65.

If IA32_PERF_CAPABILITIES.FW_WRITE[bit 13] =1, each IA32_PMCi is accompanied by a
corresponding alias address starting at 4C1H for IA32_A_PMC0.

The bit width of the performance monitoring counters is specified in
CPUID.0AH:EAX[23:16].
If IA32_A_PMCi is present, the 64-bit input value (EDX:EAX) of WRMSR to
IA32_A_PMCi will cause IA32_PMCi to be updated by:

     COUNTERWIDTH =
         CPUID.0AH:EAX[23:16] bit width of the performance monitoring counter
     IA32_PMCi[COUNTERWIDTH-1:32] := EDX[COUNTERWIDTH-33:0]);
     IA32_PMCi[31:0] := EAX[31:0];
     EDX[63:COUNTERWIDTH] are reserved

---

Some might argue that this is all talking about GP counters, not
fixed counters. In fact, the full-width write hw behaviour is
presumed to do the same thing for all counters.

Commercial hardware will not use less than 32 bits or a bit width like 46 bits.

A KVM user space (such as selftests) may set a strange bit width, for example
33 bits, and with the current code, writing the reserved bits of the fixed
counters doesn't cause #GP.

Also, when the guest does not have the Full-Width feature, the fixed counters
can be more than 32 bits wide via CPUID, while the GP counters are only 32 bits
wide, which is also monstrous.

The current KVM is also not capable of emulating counter overflow when KVM user
space sets a bit width of less than 32 bits w/ FW_WRITE.

The above SDM-undefined behaviour led to this fix, which may lift some of the fog.
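The SDM pseudo-code quoted above for writes through the alias MSR can be modelled roughly as below. This is a sketch under stated assumptions: `full_width_write` and its parameters are hypothetical names, the counter width is assumed to lie between 32 and 64 bits, and a `true` return value stands in for #GP.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of WRMSR to IA32_A_PMCi.
 * counter_width plays the role of CPUID.0AH:EAX[23:16];
 * precondition (assumed): 32 <= counter_width <= 64.
 * Returns true for #GP, otherwise updates *pmc from EDX:EAX.
 */
bool full_width_write(unsigned int counter_width, uint32_t edx, uint32_t eax,
		      uint64_t *pmc)
{
	uint64_t value = ((uint64_t)edx << 32) | eax;
	uint64_t mask = (counter_width >= 64) ?
			~0ULL : ((1ULL << counter_width) - 1);

	/* Bits at or above COUNTERWIDTH are reserved. */
	if (value & ~mask)
		return true;

	/* PMC[COUNTERWIDTH-1:32] := EDX[COUNTERWIDTH-33:0]; PMC[31:0] := EAX */
	*pmc = value;
	return false;
}
```

For a 48-bit counter, EDX:EAX = 0x0000FFFF:0xFFFFFFFF is accepted, while an all-ones value is rejected because bits 63:48 are reserved.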
On 3/28/23 11:16, Like Xu wrote:
>
>
> If IA32_PERF_CAPABILITIES.FW_WRITE[bit 13] =1, each IA32_PMCi is
> accompanied by a
> corresponding alias address starting at 4C1H for IA32_A_PMC0.
>
> The bit width of the performance monitoring counters is specified in
> CPUID.0AH:EAX[23:16].
> If IA32_A_PMCi is present, the 64-bit input value (EDX:EAX) of WRMSR to
> IA32_A_PMCi will cause
> IA32_PMCi to be updated by:
>
>      COUNTERWIDTH =
>          CPUID.0AH:EAX[23:16] bit width of the performance monitoring
> counter
>      IA32_PMCi[COUNTERWIDTH-1:32] := EDX[COUNTERWIDTH-33:0]);
>      IA32_PMCi[31:0] := EAX[31:0];
>      EDX[63:COUNTERWIDTH] are reserved
>
> ---
>
> Some might argue that this is all talking about GP counters, not
> fixed counters. In fact, the full-width write hw behaviour is
> presumed to do the same thing for all counters.

But the above behavior, and the #GP, is only true for IA32_A_PMCi (the
full-width MSR). Did I understand correctly that the behavior for fixed
counters is changed without introducing an alias MSR?

Paolo
On 28/3/2023 5:20 pm, Paolo Bonzini wrote:
> On 3/28/23 11:16, Like Xu wrote:
>>
>>
>> If IA32_PERF_CAPABILITIES.FW_WRITE[bit 13] =1, each IA32_PMCi is accompanied by a
>> corresponding alias address starting at 4C1H for IA32_A_PMC0.
>>
>> The bit width of the performance monitoring counters is specified in
>> CPUID.0AH:EAX[23:16].
>> If IA32_A_PMCi is present, the 64-bit input value (EDX:EAX) of WRMSR to
>> IA32_A_PMCi will cause
>> IA32_PMCi to be updated by:
>>
>>      COUNTERWIDTH =
>>          CPUID.0AH:EAX[23:16] bit width of the performance monitoring counter
>>      IA32_PMCi[COUNTERWIDTH-1:32] := EDX[COUNTERWIDTH-33:0]);
>>      IA32_PMCi[31:0] := EAX[31:0];
>>      EDX[63:COUNTERWIDTH] are reserved
>>
>> ---
>>
>> Some might argue that this is all talking about GP counters, not
>> fixed counters. In fact, the full-width write hw behaviour is
>> presumed to do the same thing for all counters.
> But the above behavior, and the #GP, is only true for IA32_A_PMCi (the
> full-width MSR). Did I understand correctly that the behavior for fixed
> counters is changed without introducing an alias MSR?
>
> Paolo
>

If true, why introduce those alias MSRs?

My archaeological findings are:

a platform w/o full-width writes like Westmere (which already has 3 fixed
counters) is declared to have a counter width of (R:48, W:32), and its
successor Sandy Bridge has (R:48, W:32/48).

Thus I think the behaviour of the fixed counters has changed from there, and
the alias GP MSRs were introduced to keep the support for 32-bit writes on GP
counters (via the original address).

[*] Intel® 64 and IA-32 Architectures Software Developer’s Manual Documentation
Changes (252046-030, January 2011) Table 30-18 Core PMU Comparison.
On Tue, Mar 28, 2023, Like Xu wrote:
> On 28/3/2023 5:20 pm, Paolo Bonzini wrote:
> > On 3/28/23 11:16, Like Xu wrote:
> > >
> > >
> > > If IA32_PERF_CAPABILITIES.FW_WRITE[bit 13] =1, each IA32_PMCi is accompanied by a
> > > corresponding alias address starting at 4C1H for IA32_A_PMC0.
> > >
> > > The bit width of the performance monitoring counters is specified in
> > > CPUID.0AH:EAX[23:16].
> > > If IA32_A_PMCi is present, the 64-bit input value (EDX:EAX) of WRMSR
> > > to IA32_A_PMCi will cause
> > > IA32_PMCi to be updated by:
> > >
> > >      COUNTERWIDTH =
> > >          CPUID.0AH:EAX[23:16] bit width of the performance monitoring counter
> > >      IA32_PMCi[COUNTERWIDTH-1:32] := EDX[COUNTERWIDTH-33:0]);
> > >      IA32_PMCi[31:0] := EAX[31:0];
> > >      EDX[63:COUNTERWIDTH] are reserved
> > >
> > > ---
> > >
> > > Some might argue that this is all talking about GP counters, not
> > > fixed counters. In fact, the full-width write hw behaviour is
> > > presumed to do the same thing for all counters.
> > But the above behavior, and the #GP, is only true for IA32_A_PMCi (the
> > full-width MSR). Did I understand correctly that the behavior for fixed
> > counters is changed without introducing an alias MSR?
> >
> > Paolo
> >
>
> If true, why introducing those alias MSRs ?

My guess is there is/was software in the field that wrote -1 to the GP counters,
i.e. would have been broken by the new #GP behavior.

> My archaeological findings are:
>
> a platform w/o full-width like Westmere (has 3-fixed counters already) is
> declared to have a counter width (R:48, W:32) and its successor Sandy Bridge
> has (R:48 , W: 32/48).
>
> Thus I think the behaviour of the fixed counter has changed from there, and
> the alias GP MSRs were introduced to keep the support on 32-bit writes on #GP
> counters (via original address).

FWIW, I see the #GP behavior for fixed counters on Haswell, so this does seem to
be the case.

That said, I would like to get confirmation from Intel that this is architectural
and/or working as intended.

Like, can you follow up with Intel to get clarification/confirmation? And ideally
an SDM update...
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index e8a3be0b9df9..d38b820d6b9e 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -470,6 +470,12 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			pmc_update_sample_period(pmc);
 			return 0;
 		} else if ((pmc = get_fixed_pmc(pmu, msr))) {
+			if (fw_writes_is_enabled(vcpu)) {
+				if (data & ~pmu->counter_bitmask[KVM_PMC_FIXED])
+					return 1;
+			} else if (!msr_info->host_initiated) {
+				data = (s64)(s32)data;
+			}
 			pmc->counter += data - pmc_read_counter(pmc);
 			pmc_update_sample_period(pmc);
 			return 0;
@@ -516,6 +522,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 	union cpuid10_edx edx;
 	u64 perf_capabilities;
 	u64 counter_mask;
+	bool fw_wr = fw_writes_is_enabled(vcpu);
 	int i;
 
 	pmu->nr_arch_gp_counters = 0;
@@ -543,6 +550,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 
 	pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
 					 kvm_pmu_cap.num_counters_gp);
+	eax.split.bit_width = fw_wr ? max_t(int, 32, eax.split.bit_width) : 32;
 	eax.split.bit_width = min_t(int, eax.split.bit_width,
 				    kvm_pmu_cap.bit_width_gp);
 	pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << eax.split.bit_width) - 1;
@@ -558,6 +566,8 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 		min3(ARRAY_SIZE(fixed_pmc_events),
 		     (size_t) edx.split.num_counters_fixed,
 		     (size_t)kvm_pmu_cap.num_counters_fixed);
+	edx.split.bit_width_fixed = fw_wr ?
+		max_t(int, 32, edx.split.bit_width_fixed) : 32;
 	edx.split.bit_width_fixed = min_t(int, edx.split.bit_width_fixed,
 					  kvm_pmu_cap.bit_width_fixed);
 	pmu->counter_bitmask[KVM_PMC_FIXED] =
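
The bit-width clamping added to intel_pmu_refresh() in the hunks above may be easier to follow as a stand-alone function. The sketch below is illustrative only: `effective_bit_width` and `counter_bitmask` are hypothetical helpers, not kernel functions, and the mask helper assumes a width below 64.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-alone version of the clamping in the patch:
 * without FW_WRITE the width is forced to 32; with FW_WRITE it is
 * raised to at least 32; either way it is capped at the host width.
 */
unsigned int effective_bit_width(bool fw_write, unsigned int guest_width,
				 unsigned int host_width)
{
	unsigned int width = 32;

	if (fw_write && guest_width > 32)
		width = guest_width;

	return width < host_width ? width : host_width;
}

/* Mask of valid counter bits; assumes width < 64. */
uint64_t counter_bitmask(unsigned int width)
{
	return ((uint64_t)1 << width) - 1;
}
```

Under these assumptions, a guest CPUID width of 16 with FW_WRITE is raised to 32, a width of 64 on a 48-bit host is capped at 48, and any width without FW_WRITE collapses to 32.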