Message ID | 20231024002633.2540714-9-seanjc@google.com |
---|---|
State | New |
Headers |
Date: Mon, 23 Oct 2023 17:26:28 -0700
From: Sean Christopherson <seanjc@google.com>
To: Sean Christopherson <seanjc@google.com>, Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jinrong Liang <cloudliang@tencent.com>, Like Xu <likexu@tencent.com>
Subject: [PATCH v5 08/13] KVM: selftests: Test Intel PMU architectural events on gp counters
Message-ID: <20231024002633.2540714-9-seanjc@google.com>
In-Reply-To: <20231024002633.2540714-1-seanjc@google.com>
References: <20231024002633.2540714-1-seanjc@google.com> |
Series | KVM: x86/pmu: selftests: Fixes and new tests |
Commit Message
Sean Christopherson
Oct. 24, 2023, 12:26 a.m. UTC
From: Jinrong Liang <cloudliang@tencent.com>

Add test cases to check whether different architectural events are available
after they are marked as unavailable via CPUID. This covers the vPMU event
filtering logic based on Intel CPUID, complementing pmu_event_filter.

According to the Intel SDM, the number of architectural events is reported
through CPUID.0AH:EAX[31:24], and architectural event x is supported if
EBX[x]=0 && EAX[31:24]>x.

Co-developed-by: Like Xu <likexu@tencent.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/x86_64/pmu_counters_test.c  | 189 ++++++++++++++++++
 2 files changed, 190 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
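For illustration, the SDM rule above can be checked directly via CPUID. The
following is a minimal standalone sketch; the helper name is hypothetical and
not part of the patch.

#include <stdbool.h>
#include <stdio.h>
#include <cpuid.h>

/*
 * Sketch of the CPUID.0AH availability rule: architectural event x is
 * supported if EBX bit x is clear AND the bit-vector length reported in
 * EAX[31:24] is greater than x.  Hypothetical helper, not from the patch.
 */
static bool arch_event_supported(unsigned int x)
{
        unsigned int eax, ebx, ecx, edx;

        if (!__get_cpuid(0xa, &eax, &ebx, &ecx, &edx))
                return false;

        return !(ebx & (1u << x)) && ((eax >> 24) > x);
}

int main(void)
{
        /* Event 0 is core cycles in the SDM's architectural event ordering. */
        printf("core cycles supported: %d\n", arch_event_supported(0));
        return 0;
}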
Comments
On Mon, Oct 23, 2023, Sean Christopherson wrote:
> +static void guest_measure_pmu_v1(struct kvm_x86_pmu_feature event,
> +                                uint32_t counter_msr, uint32_t nr_gp_counters)
> +{
> +       uint8_t idx = event.f.bit;
> +       unsigned int i;
> +
> +       for (i = 0; i < nr_gp_counters; i++) {
> +               wrmsr(counter_msr + i, 0);
> +               wrmsr(MSR_P6_EVNTSEL0 + i, ARCH_PERFMON_EVENTSEL_OS |
> +                     ARCH_PERFMON_EVENTSEL_ENABLE | intel_pmu_arch_events[idx]);
> +               __asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
> +
> +               if (pmu_is_intel_event_stable(idx))
> +                       GUEST_ASSERT_EQ(this_pmu_has(event), !!_rdpmc(i));
> +
> +               wrmsr(MSR_P6_EVNTSEL0 + i, ARCH_PERFMON_EVENTSEL_OS |
> +                     !ARCH_PERFMON_EVENTSEL_ENABLE |
> +                     intel_pmu_arch_events[idx]);
> +               wrmsr(counter_msr + i, 0);
> +               __asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
> +
> +               if (pmu_is_intel_event_stable(idx))
> +                       GUEST_ASSERT(!_rdpmc(i));
> +       }
> +
> +       GUEST_DONE();
> +}
> +
> +static void guest_measure_loop(uint8_t idx)
> +{
> +       const struct {
> +               struct kvm_x86_pmu_feature gp_event;
> +       } intel_event_to_feature[] = {
> +               [INTEL_ARCH_CPU_CYCLES]            = { X86_PMU_FEATURE_CPU_CYCLES },
> +               [INTEL_ARCH_INSTRUCTIONS_RETIRED]  = { X86_PMU_FEATURE_INSNS_RETIRED },
> +               [INTEL_ARCH_REFERENCE_CYCLES]      = { X86_PMU_FEATURE_REFERENCE_CYCLES },
> +               [INTEL_ARCH_LLC_REFERENCES]        = { X86_PMU_FEATURE_LLC_REFERENCES },
> +               [INTEL_ARCH_LLC_MISSES]            = { X86_PMU_FEATURE_LLC_MISSES },
> +               [INTEL_ARCH_BRANCHES_RETIRED]      = { X86_PMU_FEATURE_BRANCH_INSNS_RETIRED },
> +               [INTEL_ARCH_BRANCHES_MISPREDICTED] = { X86_PMU_FEATURE_BRANCHES_MISPREDICTED },
> +       };
> +
> +       uint32_t nr_gp_counters = this_cpu_property(X86_PROPERTY_PMU_NR_GP_COUNTERS);
> +       uint32_t pmu_version = this_cpu_property(X86_PROPERTY_PMU_VERSION);
> +       struct kvm_x86_pmu_feature gp_event;
> +       uint32_t counter_msr;
> +       unsigned int i;
> +
> +       if (rdmsr(MSR_IA32_PERF_CAPABILITIES) & PMU_CAP_FW_WRITES)
> +               counter_msr = MSR_IA32_PMC0;
> +       else
> +               counter_msr = MSR_IA32_PERFCTR0;
> +
> +       gp_event = intel_event_to_feature[idx].gp_event;
> +       TEST_ASSERT_EQ(idx, gp_event.f.bit);
> +
> +       if (pmu_version < 2) {
> +               guest_measure_pmu_v1(gp_event, counter_msr, nr_gp_counters);

Looking at this again, testing guest PMU version 1 is practically impossible
because this testcase doesn't force the guest PMU version.  I.e. unless I'm
missing something, this requires old hardware or running in a VM with its PMU
forced to '1'.

And if all subtests use similar inputs, the common configuration can be shoved
into pmu_vm_create_with_one_vcpu().

It's easy enough to fold test_intel_arch_events() into test_intel_counters(),
which will also provide coverage for running with full-width writes enabled.
The only downside is that the total runtime will be longer.
> +static void test_arch_events_cpuid(uint8_t i, uint8_t j, uint8_t idx)
> +{
> +       uint8_t arch_events_unavailable_mask = BIT_ULL(j);
> +       uint8_t arch_events_bitmap_size = BIT_ULL(i);
> +       struct kvm_vcpu *vcpu;
> +       struct kvm_vm *vm;
> +
> +       vm = pmu_vm_create_with_one_vcpu(&vcpu, guest_measure_loop);
> +
> +       vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH,
> +                               arch_events_bitmap_size);
> +       vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_EVENTS_MASK,
> +                               arch_events_unavailable_mask);
> +
> +       vcpu_args_set(vcpu, 1, idx);
> +
> +       run_vcpu(vcpu);
> +
> +       kvm_vm_free(vm);
> +}
> +
> +static void test_intel_arch_events(void)
> +{
> +       uint8_t idx, i, j;
> +
> +       for (idx = 0; idx < NR_INTEL_ARCH_EVENTS; idx++) {

There's no need to iterate over each event in the host, we can simply add a wrapper
for guest_measure_loop() in the guest.  That'll be slightly faster since it won't
require creating and destroying a VM for every event.

> +               /*
> +                * A brute force iteration of all combinations of values is
> +                * likely to exhaust the limit of the single-threaded thread
> +                * fd nums, so it's tested by iterating through all valid
> +                * single-bit values.
> +                */
> +               for (i = 0; i < NR_INTEL_ARCH_EVENTS; i++) {

This is flawed/odd.  'i' becomes arch_events_bitmap_size, i.e. it's a length,
but the length is computed by BIT(i).  That's nonsensical and will eventually
result in undefined behavior.  Oof, that'll actually happen sooner than later
because arch_events_bitmap_size is only a single byte, i.e. when the number of
events hits 9, this will try to shove 256 into an 8-bit variable.

The more correct approach would be to pass in 0..NR_INTEL_ARCH_EVENTS inclusive
as the size.  But I think we should actually test 0..length+1, where "length" is
the max of the native length and NR_INTEL_ARCH_EVENTS, i.e. we should verify
KVM handles a size larger than the native length.

> +                       for (j = 0; j < NR_INTEL_ARCH_EVENTS; j++)
> +                               test_arch_events_cpuid(i, j, idx);

And here, I think it makes sense to brute force all possible values for at least
one configuration.  There aren't actually _that_ many values, e.g. currently it's
64 (I think).  E.g. test the native PMU version with the "full" length, and then
test single bits with varying lengths.

I'll send a v6 later this week.
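To make the suggested iteration concrete, here is a minimal sketch. The
two-argument test_arch_events_cpuid() variant and the exact loop bounds are
assumptions; v6 may differ.

/*
 * Sketch of the suggested fix: pass the CPUID.0AH EBX bit-vector length
 * as a plain size, 0..NR_INTEL_ARCH_EVENTS+1 inclusive, instead of
 * BIT(i), so sizes beyond the native length are also covered.  Assumed
 * signature: test_arch_events_cpuid(length, unavailable_mask).
 */
static void test_intel_arch_events(void)
{
        unsigned int length, mask, i;

        /* Single-bit unavailable masks, across every valid (and one
         * invalid) bit-vector length. */
        for (length = 0; length <= NR_INTEL_ARCH_EVENTS + 1; length++) {
                for (i = 0; i < NR_INTEL_ARCH_EVENTS; i++)
                        test_arch_events_cpuid(length, BIT(i));
        }

        /* Brute force every unavailable-events mask at the full length. */
        for (mask = 0; mask < BIT(NR_INTEL_ARCH_EVENTS); mask++)
                test_arch_events_cpuid(NR_INTEL_ARCH_EVENTS, mask);
}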
On Tue, Oct 24, 2023, Sean Christopherson wrote:
> On Mon, Oct 23, 2023, Sean Christopherson wrote:
> > +static void test_intel_arch_events(void)
> > +{
> > +       uint8_t idx, i, j;
> > +
> > +       for (idx = 0; idx < NR_INTEL_ARCH_EVENTS; idx++) {

*sigh*

Yet another KVM bug that this test _should_ catch, but doesn't because too many
things are hardcoded.  KVM _still_ advertises Top Down Slots to userspace because
KVM doesn't sanitize the bit vector or its length that comes out of perf.
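On the earlier point about the guest PMU version 1 path never being exercised,
the version could be pinned from the host. A minimal sketch using this series'
helpers; the wrapper name and wiring are assumptions, not v6 code.

/*
 * Sketch (assumption, not v6): pin the guest PMU version to 1 so that
 * guest_measure_pmu_v1() runs even on modern hardware.
 * X86_PROPERTY_PMU_VERSION is CPUID.0AH:EAX[7:0]; the helpers are the
 * ones used elsewhere in this series.
 */
static void test_arch_events_pmu_v1(uint8_t idx)
{
        struct kvm_vcpu *vcpu;
        struct kvm_vm *vm;

        vm = pmu_vm_create_with_one_vcpu(&vcpu, guest_measure_loop);
        vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_VERSION, 1);
        vcpu_args_set(vcpu, 1, idx);
        run_vcpu(vcpu);
        kvm_vm_free(vm);
}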
On 2023/10/25 03:49, Sean Christopherson wrote:
> On Mon, Oct 23, 2023, Sean Christopherson wrote:
>> +static void guest_measure_pmu_v1(struct kvm_x86_pmu_feature event,
>> +                                uint32_t counter_msr, uint32_t nr_gp_counters)
>> +{
>> +       uint8_t idx = event.f.bit;
>> +       unsigned int i;
>> +
>> +       for (i = 0; i < nr_gp_counters; i++) {
>> +               wrmsr(counter_msr + i, 0);
>> +               wrmsr(MSR_P6_EVNTSEL0 + i, ARCH_PERFMON_EVENTSEL_OS |
>> +                     ARCH_PERFMON_EVENTSEL_ENABLE | intel_pmu_arch_events[idx]);
>> +               __asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
>> +
>> +               if (pmu_is_intel_event_stable(idx))
>> +                       GUEST_ASSERT_EQ(this_pmu_has(event), !!_rdpmc(i));
>> +
>> +               wrmsr(MSR_P6_EVNTSEL0 + i, ARCH_PERFMON_EVENTSEL_OS |
>> +                     !ARCH_PERFMON_EVENTSEL_ENABLE |
>> +                     intel_pmu_arch_events[idx]);
>> +               wrmsr(counter_msr + i, 0);
>> +               __asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
>> +
>> +               if (pmu_is_intel_event_stable(idx))
>> +                       GUEST_ASSERT(!_rdpmc(i));
>> +       }
>> +
>> +       GUEST_DONE();
>> +}
>> +
>> +static void guest_measure_loop(uint8_t idx)
>> +{
>> +       const struct {
>> +               struct kvm_x86_pmu_feature gp_event;
>> +       } intel_event_to_feature[] = {
>> +               [INTEL_ARCH_CPU_CYCLES]            = { X86_PMU_FEATURE_CPU_CYCLES },
>> +               [INTEL_ARCH_INSTRUCTIONS_RETIRED]  = { X86_PMU_FEATURE_INSNS_RETIRED },
>> +               [INTEL_ARCH_REFERENCE_CYCLES]      = { X86_PMU_FEATURE_REFERENCE_CYCLES },
>> +               [INTEL_ARCH_LLC_REFERENCES]        = { X86_PMU_FEATURE_LLC_REFERENCES },
>> +               [INTEL_ARCH_LLC_MISSES]            = { X86_PMU_FEATURE_LLC_MISSES },
>> +               [INTEL_ARCH_BRANCHES_RETIRED]      = { X86_PMU_FEATURE_BRANCH_INSNS_RETIRED },
>> +               [INTEL_ARCH_BRANCHES_MISPREDICTED] = { X86_PMU_FEATURE_BRANCHES_MISPREDICTED },
>> +       };
>> +
>> +       uint32_t nr_gp_counters = this_cpu_property(X86_PROPERTY_PMU_NR_GP_COUNTERS);
>> +       uint32_t pmu_version = this_cpu_property(X86_PROPERTY_PMU_VERSION);
>> +       struct kvm_x86_pmu_feature gp_event;
>> +       uint32_t counter_msr;
>> +       unsigned int i;
>> +
>> +       if (rdmsr(MSR_IA32_PERF_CAPABILITIES) & PMU_CAP_FW_WRITES)
>> +               counter_msr = MSR_IA32_PMC0;
>> +       else
>> +               counter_msr = MSR_IA32_PERFCTR0;
>> +
>> +       gp_event = intel_event_to_feature[idx].gp_event;
>> +       TEST_ASSERT_EQ(idx, gp_event.f.bit);
>> +
>> +       if (pmu_version < 2) {
>> +               guest_measure_pmu_v1(gp_event, counter_msr, nr_gp_counters);
>
> Looking at this again, testing guest PMU version 1 is practically impossible
> because this testcase doesn't force the guest PMU version.  I.e. unless I'm
> missing something, this requires old hardware or running in a VM with its PMU
> forced to '1'.
>
> And if all subtests use similar inputs, the common configuration can be shoved
> into pmu_vm_create_with_one_vcpu().
>
> It's easy enough to fold test_intel_arch_events() into test_intel_counters(),
> which will also provide coverage for running with full-width writes enabled.
> The only downside is that the total runtime will be longer.
>
>> +static void test_arch_events_cpuid(uint8_t i, uint8_t j, uint8_t idx)
>> +{
>> +       uint8_t arch_events_unavailable_mask = BIT_ULL(j);
>> +       uint8_t arch_events_bitmap_size = BIT_ULL(i);
>> +       struct kvm_vcpu *vcpu;
>> +       struct kvm_vm *vm;
>> +
>> +       vm = pmu_vm_create_with_one_vcpu(&vcpu, guest_measure_loop);
>> +
>> +       vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH,
>> +                               arch_events_bitmap_size);
>> +       vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_EVENTS_MASK,
>> +                               arch_events_unavailable_mask);
>> +
>> +       vcpu_args_set(vcpu, 1, idx);
>> +
>> +       run_vcpu(vcpu);
>> +
>> +       kvm_vm_free(vm);
>> +}
>> +
>> +static void test_intel_arch_events(void)
>> +{
>> +       uint8_t idx, i, j;
>> +
>> +       for (idx = 0; idx < NR_INTEL_ARCH_EVENTS; idx++) {
>
> There's no need to iterate over each event in the host, we can simply add a wrapper
> for guest_measure_loop() in the guest.  That'll be slightly faster since it won't
> require creating and destroying a VM for every event.
>
>> +               /*
>> +                * A brute force iteration of all combinations of values is
>> +                * likely to exhaust the limit of the single-threaded thread
>> +                * fd nums, so it's tested by iterating through all valid
>> +                * single-bit values.
>> +                */
>> +               for (i = 0; i < NR_INTEL_ARCH_EVENTS; i++) {
>
> This is flawed/odd.  'i' becomes arch_events_bitmap_size, i.e. it's a length,
> but the length is computed by BIT(i).  That's nonsensical and will eventually
> result in undefined behavior.  Oof, that'll actually happen sooner than later
> because arch_events_bitmap_size is only a single byte, i.e. when the number of
> events hits 9, this will try to shove 256 into an 8-bit variable.
>
> The more correct approach would be to pass in 0..NR_INTEL_ARCH_EVENTS inclusive
> as the size.  But I think we should actually test 0..length+1, where "length" is
> the max of the native length and NR_INTEL_ARCH_EVENTS, i.e. we should verify
> KVM handles a size larger than the native length.
>
>> +                       for (j = 0; j < NR_INTEL_ARCH_EVENTS; j++)
>> +                               test_arch_events_cpuid(i, j, idx);
>
> And here, I think it makes sense to brute force all possible values for at least
> one configuration.  There aren't actually _that_ many values, e.g. currently it's
> 64 (I think).  E.g. test the native PMU version with the "full" length, and then
> test single bits with varying lengths.
>
> I'll send a v6 later this week.

Got it, thanks. Please feel free to let me know if there's anything you'd like
me to do.
On Mon, Oct 23, 2023, Sean Christopherson wrote:
> From: Jinrong Liang <cloudliang@tencent.com>
>
> Add test cases to check whether different architectural events are available
> after they are marked as unavailable via CPUID. This covers the vPMU event
> filtering logic based on Intel CPUID, complementing pmu_event_filter.
>
> According to the Intel SDM, the number of architectural events is reported
> through CPUID.0AH:EAX[31:24], and architectural event x is supported if
> EBX[x]=0 && EAX[31:24]>x.
>
> Co-developed-by: Like Xu <likexu@tencent.com>
> Signed-off-by: Like Xu <likexu@tencent.com>
> Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  tools/testing/selftests/kvm/Makefile          |   1 +
>  .../selftests/kvm/x86_64/pmu_counters_test.c  | 189 ++++++++++++++++++
>  2 files changed, 190 insertions(+)
>  create mode 100644 tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
>
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index ed1c17cabc07..4c024fb845b4 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -82,6 +82,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/mmio_warning_test
>  TEST_GEN_PROGS_x86_64 += x86_64/monitor_mwait_test
>  TEST_GEN_PROGS_x86_64 += x86_64/nested_exceptions_test
>  TEST_GEN_PROGS_x86_64 += x86_64/platform_info_test
> +TEST_GEN_PROGS_x86_64 += x86_64/pmu_counters_test
>  TEST_GEN_PROGS_x86_64 += x86_64/pmu_event_filter_test
>  TEST_GEN_PROGS_x86_64 += x86_64/set_boot_cpu_id
>  TEST_GEN_PROGS_x86_64 += x86_64/set_sregs_test
> diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> new file mode 100644
> index 000000000000..2a6336b994d5
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> @@ -0,0 +1,189 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2023, Tencent, Inc.
> + */
> +
> +#define _GNU_SOURCE /* for program_invocation_short_name */
> +#include <x86intrin.h>
> +
> +#include "pmu.h"
> +#include "processor.h"
> +
> +/* Guest payload for any performance counter counting */
> +#define NUM_BRANCHES 10
> +
> +static struct kvm_vm *pmu_vm_create_with_one_vcpu(struct kvm_vcpu **vcpu,
> +                                                 void *guest_code)
> +{
> +       struct kvm_vm *vm;
> +
> +       vm = vm_create_with_one_vcpu(vcpu, guest_code);
> +       vm_init_descriptor_tables(vm);
> +       vcpu_init_descriptor_tables(*vcpu);
> +
> +       return vm;
> +}
> +
> +static void run_vcpu(struct kvm_vcpu *vcpu)
> +{
> +       struct ucall uc;
> +
> +       do {
> +               vcpu_run(vcpu);
> +               switch (get_ucall(vcpu, &uc)) {
> +               case UCALL_SYNC:
> +                       break;
> +               case UCALL_ABORT:
> +                       REPORT_GUEST_ASSERT(uc);
> +                       break;
> +               case UCALL_DONE:
> +                       break;
> +               default:
> +                       TEST_FAIL("Unexpected ucall: %lu", uc.cmd);
> +               }
> +       } while (uc.cmd != UCALL_DONE);
> +}
> +
> +static bool pmu_is_intel_event_stable(uint8_t idx)
> +{
> +       switch (idx) {
> +       case INTEL_ARCH_CPU_CYCLES:
> +       case INTEL_ARCH_INSTRUCTIONS_RETIRED:
> +       case INTEL_ARCH_REFERENCE_CYCLES:
> +       case INTEL_ARCH_BRANCHES_RETIRED:
> +               return true;
> +       default:
> +               return false;
> +       }
> +}

A brief explanation on why the other events are not stable, please. Since
there are only a few architectural events, maybe listing all of them with an
explanation in comments would work better. Let out-of-bounds values return
false in the default case.

> +
> +static void guest_measure_pmu_v1(struct kvm_x86_pmu_feature event,
> +                                uint32_t counter_msr, uint32_t nr_gp_counters)
> +{
> +       uint8_t idx = event.f.bit;
> +       unsigned int i;
> +
> +       for (i = 0; i < nr_gp_counters; i++) {
> +               wrmsr(counter_msr + i, 0);
> +               wrmsr(MSR_P6_EVNTSEL0 + i, ARCH_PERFMON_EVENTSEL_OS |
> +                     ARCH_PERFMON_EVENTSEL_ENABLE | intel_pmu_arch_events[idx]);
> +               __asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));

Some comment might be needed for readability. Abruptly inserting inline
assembly code in C destroys readability.

I wonder, do we need to add a 'clobber' here for the above line, since it
takes away ECX?

Also, I wonder if we need to disable IRQs here? This code might be
intercepted and resumed. If so, then the test will get a different number?

> +
> +               if (pmu_is_intel_event_stable(idx))
> +                       GUEST_ASSERT_EQ(this_pmu_has(event), !!_rdpmc(i));

Okay, so the counter value being non-zero means we pass the test?!

Hmm, I wonder, other than the IRQ stuff, what else may affect the result?
The NMI watchdog or what?

> +
> +               wrmsr(MSR_P6_EVNTSEL0 + i, ARCH_PERFMON_EVENTSEL_OS |
> +                     !ARCH_PERFMON_EVENTSEL_ENABLE |
> +                     intel_pmu_arch_events[idx]);
> +               wrmsr(counter_msr + i, 0);
> +               __asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));

Ditto for readability. Please consider using a macro to avoid repeated
explanations.

> +
> +               if (pmu_is_intel_event_stable(idx))
> +                       GUEST_ASSERT(!_rdpmc(i));
> +       }
> +
> +       GUEST_DONE();
> +}
> +
> +static void guest_measure_loop(uint8_t idx)
> +{
> +       const struct {
> +               struct kvm_x86_pmu_feature gp_event;
> +       } intel_event_to_feature[] = {
> +               [INTEL_ARCH_CPU_CYCLES]            = { X86_PMU_FEATURE_CPU_CYCLES },
> +               [INTEL_ARCH_INSTRUCTIONS_RETIRED]  = { X86_PMU_FEATURE_INSNS_RETIRED },
> +               [INTEL_ARCH_REFERENCE_CYCLES]      = { X86_PMU_FEATURE_REFERENCE_CYCLES },
> +               [INTEL_ARCH_LLC_REFERENCES]        = { X86_PMU_FEATURE_LLC_REFERENCES },
> +               [INTEL_ARCH_LLC_MISSES]            = { X86_PMU_FEATURE_LLC_MISSES },
> +               [INTEL_ARCH_BRANCHES_RETIRED]      = { X86_PMU_FEATURE_BRANCH_INSNS_RETIRED },
> +               [INTEL_ARCH_BRANCHES_MISPREDICTED] = { X86_PMU_FEATURE_BRANCHES_MISPREDICTED },
> +       };
> +
> +       uint32_t nr_gp_counters = this_cpu_property(X86_PROPERTY_PMU_NR_GP_COUNTERS);
> +       uint32_t pmu_version = this_cpu_property(X86_PROPERTY_PMU_VERSION);
> +       struct kvm_x86_pmu_feature gp_event;
> +       uint32_t counter_msr;
> +       unsigned int i;
> +
> +       if (rdmsr(MSR_IA32_PERF_CAPABILITIES) & PMU_CAP_FW_WRITES)
> +               counter_msr = MSR_IA32_PMC0;
> +       else
> +               counter_msr = MSR_IA32_PERFCTR0;
> +
> +       gp_event = intel_event_to_feature[idx].gp_event;
> +       TEST_ASSERT_EQ(idx, gp_event.f.bit);
> +
> +       if (pmu_version < 2) {
> +               guest_measure_pmu_v1(gp_event, counter_msr, nr_gp_counters);
> +               return;
> +       }
> +
> +       for (i = 0; i < nr_gp_counters; i++) {
> +               wrmsr(counter_msr + i, 0);
> +               wrmsr(MSR_P6_EVNTSEL0 + i, ARCH_PERFMON_EVENTSEL_OS |
> +                     ARCH_PERFMON_EVENTSEL_ENABLE |
> +                     intel_pmu_arch_events[idx]);
> +
> +               wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, BIT_ULL(i));
> +               __asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
> +               wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
> +
> +               if (pmu_is_intel_event_stable(idx))
> +                       GUEST_ASSERT_EQ(this_pmu_has(gp_event), !!_rdpmc(i));
> +       }
> +
> +       GUEST_DONE();
> +}
> +
> +static void test_arch_events_cpuid(uint8_t i, uint8_t j, uint8_t idx)
> +{
> +       uint8_t arch_events_unavailable_mask = BIT_ULL(j);
> +       uint8_t arch_events_bitmap_size = BIT_ULL(i);
> +       struct kvm_vcpu *vcpu;
> +       struct kvm_vm *vm;
> +
> +       vm = pmu_vm_create_with_one_vcpu(&vcpu, guest_measure_loop);
> +
> +       vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH,
> +                               arch_events_bitmap_size);
> +       vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_EVENTS_MASK,
> +                               arch_events_unavailable_mask);
> +
> +       vcpu_args_set(vcpu, 1, idx);
> +
> +       run_vcpu(vcpu);
> +
> +       kvm_vm_free(vm);
> +}
> +
> +static void test_intel_arch_events(void)
> +{
> +       uint8_t idx, i, j;
> +
> +       for (idx = 0; idx < NR_INTEL_ARCH_EVENTS; idx++) {
> +               /*
> +                * A brute force iteration of all combinations of values is
> +                * likely to exhaust the limit of the single-threaded thread
> +                * fd nums, so it's tested by iterating through all valid
> +                * single-bit values.
> +                */
> +               for (i = 0; i < NR_INTEL_ARCH_EVENTS; i++) {
> +                       for (j = 0; j < NR_INTEL_ARCH_EVENTS; j++)
> +                               test_arch_events_cpuid(i, j, idx);
> +               }
> +       }
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +       TEST_REQUIRE(get_kvm_param_bool("enable_pmu"));
> +
> +       TEST_REQUIRE(host_cpu_is_intel);
> +       TEST_REQUIRE(kvm_cpu_has_p(X86_PROPERTY_PMU_VERSION));
> +       TEST_REQUIRE(kvm_cpu_property(X86_PROPERTY_PMU_VERSION) > 0);
> +       TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_PDCM));

Hmm, this means we cannot run this in a nested VM if X86_FEATURE_PDCM is
missing. It only affects the full-width counters, right?

> +
> +       test_intel_arch_events();
> +
> +       return 0;
> +}
> --
> 2.42.0.758.gaed0368e0e-goog
On Thu, Oct 26, 2023, Mingwei Zhang wrote:
> > +static bool pmu_is_intel_event_stable(uint8_t idx)
> > +{
> > +       switch (idx) {
> > +       case INTEL_ARCH_CPU_CYCLES:
> > +       case INTEL_ARCH_INSTRUCTIONS_RETIRED:
> > +       case INTEL_ARCH_REFERENCE_CYCLES:
> > +       case INTEL_ARCH_BRANCHES_RETIRED:
> > +               return true;
> > +       default:
> > +               return false;
> > +       }
> > +}
>
> A brief explanation on why the other events are not stable, please. Since
> there are only a few architectural events, maybe listing all of them with an
> explanation in comments would work better.

Heh, I've already rewritten this logic to make

> > +
> > +static void guest_measure_pmu_v1(struct kvm_x86_pmu_feature event,
> > +                                uint32_t counter_msr, uint32_t nr_gp_counters)
> > +{
> > +       uint8_t idx = event.f.bit;
> > +       unsigned int i;
> > +
> > +       for (i = 0; i < nr_gp_counters; i++) {
> > +               wrmsr(counter_msr + i, 0);
> > +               wrmsr(MSR_P6_EVNTSEL0 + i, ARCH_PERFMON_EVENTSEL_OS |
> > +                     ARCH_PERFMON_EVENTSEL_ENABLE | intel_pmu_arch_events[idx]);
> > +               __asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
>
> Some comment might be needed for readability. Abruptly inserting inline
> assembly code in C destroys readability.
>
> I wonder, do we need to add a 'clobber' here for the above line, since it
> takes away ECX?

It's already there.  You can't directly clobber a register that is used as an
input constraint.  The workaround is to make the register both an input and an
output, hence the "+c" in the outputs section instead of just "c" in the inputs
section.  The extra bit of cleverness is to use an intermediate anonymous
variable so that NUM_BRANCHES can effectively be passed in (#defines won't work
as output constraints).

> Also, I wonder if we need to disable IRQs here? This code might be
> intercepted and resumed. If so, then the test will get a different number?

This is guest code, disabling IRQs is pointless.  There are no guest virtual
IRQs, guarding against host IRQs is impossible, unnecessary, and actually
undesirable, i.e. the guest vPMU shouldn't be counting host instructions and
whatnot.

> > +
> > +               if (pmu_is_intel_event_stable(idx))
> > +                       GUEST_ASSERT_EQ(this_pmu_has(event), !!_rdpmc(i));
>
> Okay, so the counter value being non-zero means we pass the test?!

FWIW, I've updated

> Hmm, I wonder, other than the IRQ stuff, what else may affect the result?
> The NMI watchdog or what?

This is the beauty of selftests.  They're _so_ simple that there are very few
surprises.  E.g. there are no events of any kind unless the test explicitly
generates them.  The downside is that doing anything complex in selftests
requires writing a fair bit of code.

> > +
> > +               wrmsr(MSR_P6_EVNTSEL0 + i, ARCH_PERFMON_EVENTSEL_OS |
> > +                     !ARCH_PERFMON_EVENTSEL_ENABLE |
> > +                     intel_pmu_arch_events[idx]);
> > +               wrmsr(counter_msr + i, 0);
> > +               __asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
>
> Ditto for readability. Please consider using a macro to avoid repeated
> explanations.

Heh, already did this too.  Though I'm not entirely sure it's more readable.
It's definitely more precise and featured :-)

#define GUEST_MEASURE_EVENT(_msr, _value, clflush, FEP)				\
do {										\
	__asm__ __volatile__("wrmsr\n\t"					\
			     clflush "\n\t"					\
			     "mfence\n\t"					\
			     "1: mov $" __stringify(NUM_BRANCHES) ", %%ecx\n\t"	\
			     FEP "loop .\n\t"					\
			     FEP "mov %%edi, %%ecx\n\t"				\
			     FEP "xor %%eax, %%eax\n\t"				\
			     FEP "xor %%edx, %%edx\n\t"				\
			     "wrmsr\n\t"					\
			     : "+c"((int){_msr})				\
			     : "a"((uint32_t)_value), "d"(_value >> 32),	\
			       "D"(_msr)					\
			     );							\
} while (0)

> > +int main(int argc, char *argv[])
> > +{
> > +       TEST_REQUIRE(get_kvm_param_bool("enable_pmu"));
> > +
> > +       TEST_REQUIRE(host_cpu_is_intel);
> > +       TEST_REQUIRE(kvm_cpu_has_p(X86_PROPERTY_PMU_VERSION));
> > +       TEST_REQUIRE(kvm_cpu_property(X86_PROPERTY_PMU_VERSION) > 0);
> > +       TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_PDCM));
>
> Hmm, this means we cannot run this in a nested VM if X86_FEATURE_PDCM is
> missing. It only affects the full-width counters, right?

Ah, yeah, good call.  It won't be too much trouble to have the test play nice
with !PDCM.
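For reference, the "+c" constraint trick described above can be seen in
isolation in this minimal, self-contained sketch (not code from the patch):

#include <stdio.h>

#define NUM_BRANCHES 10

int main(void)
{
	/*
	 * "+c" makes ECX both an input and an output, which is how a
	 * register consumed by the asm gets "clobbered" without naming
	 * it in the clobber list (input registers cannot be clobbered
	 * directly).  The anonymous (int){NUM_BRANCHES} compound literal
	 * provides the lvalue that an output constraint requires, so a
	 * #define'd constant can effectively be passed in.  "loop ."
	 * branches to itself, decrementing ECX until it hits zero.
	 */
	__asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));

	printf("executed %d LOOP branches\n", NUM_BRANCHES);
	return 0;
}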
On Thu, Oct 26, 2023, Sean Christopherson wrote:
> On Thu, Oct 26, 2023, Mingwei Zhang wrote:
> > > +static bool pmu_is_intel_event_stable(uint8_t idx)
> > > +{
> > > +       switch (idx) {
> > > +       case INTEL_ARCH_CPU_CYCLES:
> > > +       case INTEL_ARCH_INSTRUCTIONS_RETIRED:
> > > +       case INTEL_ARCH_REFERENCE_CYCLES:
> > > +       case INTEL_ARCH_BRANCHES_RETIRED:
> > > +               return true;
> > > +       default:
> > > +               return false;
> > > +       }
> > > +}
> >
> > A brief explanation on why the other events are not stable, please. Since
> > there are only a few architectural events, maybe listing all of them with
> > an explanation in comments would work better.
>
> Heh, I've already rewritten this logic to make
>
> > > +
> > > +static void guest_measure_pmu_v1(struct kvm_x86_pmu_feature event,
> > > +                                uint32_t counter_msr, uint32_t nr_gp_counters)
> > > +{
> > > +       uint8_t idx = event.f.bit;
> > > +       unsigned int i;
> > > +
> > > +       for (i = 0; i < nr_gp_counters; i++) {
> > > +               wrmsr(counter_msr + i, 0);
> > > +               wrmsr(MSR_P6_EVNTSEL0 + i, ARCH_PERFMON_EVENTSEL_OS |
> > > +                     ARCH_PERFMON_EVENTSEL_ENABLE | intel_pmu_arch_events[idx]);
> > > +               __asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
> >
> > Some comment might be needed for readability. Abruptly inserting inline
> > assembly code in C destroys readability.
> >
> > I wonder, do we need to add a 'clobber' here for the above line, since it
> > takes away ECX?
>
> It's already there.  You can't directly clobber a register that is used as an
> input constraint.  The workaround is to make the register both an input and an
> output, hence the "+c" in the outputs section instead of just "c" in the inputs
> section.  The extra bit of cleverness is to use an intermediate anonymous
> variable so that NUM_BRANCHES can effectively be passed in (#defines won't
> work as output constraints).
>
> > Also, I wonder if we need to disable IRQs here? This code might be
> > intercepted and resumed. If so, then the test will get a different number?
>
> This is guest code, disabling IRQs is pointless.  There are no guest virtual
> IRQs, guarding against host IRQs is impossible, unnecessary, and actually
> undesirable, i.e. the guest vPMU shouldn't be counting host instructions and
> whatnot.
>
> > > +
> > > +               if (pmu_is_intel_event_stable(idx))
> > > +                       GUEST_ASSERT_EQ(this_pmu_has(event), !!_rdpmc(i));
> >
> > Okay, so the counter value being non-zero means we pass the test?!
>
> FWIW, I've updated
>
> > Hmm, I wonder, other than the IRQ stuff, what else may affect the result?
> > The NMI watchdog or what?
>
> This is the beauty of selftests.  They're _so_ simple that there are very few
> surprises.  E.g. there are no events of any kind unless the test explicitly
> generates them.  The downside is that doing anything complex in selftests
> requires writing a fair bit of code.

Understood, so we could support precise matching.

> > > +
> > > +               wrmsr(MSR_P6_EVNTSEL0 + i, ARCH_PERFMON_EVENTSEL_OS |
> > > +                     !ARCH_PERFMON_EVENTSEL_ENABLE |
> > > +                     intel_pmu_arch_events[idx]);
> > > +               wrmsr(counter_msr + i, 0);
> > > +               __asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
> >
> > Ditto for readability. Please consider using a macro to avoid repeated
> > explanations.
>
> Heh, already did this too.  Though I'm not entirely sure it's more readable.
> It's definitely more precise and featured :-)

Oh dear, this is challenging to my rusty inline assembly skills :)

> #define GUEST_MEASURE_EVENT(_msr, _value, clflush, FEP)			\
> do {										\
> 	__asm__ __volatile__("wrmsr\n\t"					\
> 			     clflush "\n\t"					\
> 			     "mfence\n\t"					\
> 			     "1: mov $" __stringify(NUM_BRANCHES) ", %%ecx\n\t"	\
> 			     FEP "loop .\n\t"					\
> 			     FEP "mov %%edi, %%ecx\n\t"				\
> 			     FEP "xor %%eax, %%eax\n\t"				\
> 			     FEP "xor %%edx, %%edx\n\t"				\
> 			     "wrmsr\n\t"					\
> 			     : "+c"((int){_msr})				\

isn't it NUM_BRANCHES?

> 			     : "a"((uint32_t)_value), "d"(_value >> 32),	\
> 			       "D"(_msr)					\
> 			     );							\
> } while (0)

Do we need this label '1:' in the above code? It does not seem to be used
anywhere within the code.

Why is clflush needed here?

> > > +int main(int argc, char *argv[])
> > > +{
> > > +       TEST_REQUIRE(get_kvm_param_bool("enable_pmu"));
> > > +
> > > +       TEST_REQUIRE(host_cpu_is_intel);
> > > +       TEST_REQUIRE(kvm_cpu_has_p(X86_PROPERTY_PMU_VERSION));
> > > +       TEST_REQUIRE(kvm_cpu_property(X86_PROPERTY_PMU_VERSION) > 0);
> > > +       TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_PDCM));
> >
> > Hmm, this means we cannot run this in a nested VM if X86_FEATURE_PDCM is
> > missing. It only affects the full-width counters, right?
>
> Ah, yeah, good call.  It won't be too much trouble to have the test play nice
> with !PDCM.
On Thu, Oct 26, 2023, Mingwei Zhang wrote:
> On Thu, Oct 26, 2023, Sean Christopherson wrote:
> > Heh, already did this too.  Though I'm not entirely sure it's more readable.
> > It's definitely more precise and featured :-)
>
> Oh dear, this is challenging to my rusty inline assembly skills :)
>
> > #define GUEST_MEASURE_EVENT(_msr, _value, clflush, FEP)			\
> > do {										\
> > 	__asm__ __volatile__("wrmsr\n\t"					\
> > 			     clflush "\n\t"					\
> > 			     "mfence\n\t"					\
> > 			     "1: mov $" __stringify(NUM_BRANCHES) ", %%ecx\n\t"	\
> > 			     FEP "loop .\n\t"					\
> > 			     FEP "mov %%edi, %%ecx\n\t"				\
> > 			     FEP "xor %%eax, %%eax\n\t"				\
> > 			     FEP "xor %%edx, %%edx\n\t"				\
> > 			     "wrmsr\n\t"					\
> > 			     : "+c"((int){_msr})				\
>
> isn't it NUM_BRANCHES?

Nope.  It's hard to see because this doesn't provide the usage, but @_msr is an
MSR index that is consumed by the first "wrmsr", i.e. this blob relies on the
compiler to preload ECX, EAX, and EDX for WRMSR.  NUM_BRANCHES is manually
loaded into ECX after WRMSR (WRMSR and LOOP both hardcode consuming ECX).

Ha!  I can actually drop the "+c" clobbering trick since ECX is restored to its
input value before the asm blob finishes.

EDI is also loaded with @_msr so that it can be quickly reloaded into ECX for
the WRMSR to disable the event.

The purpose of doing both WRMSRs in assembly is to ensure the compiler doesn't
insert _any_ code into the measured sequence, e.g. so that a random Jcc doesn't
throw off instructions retired.

> > 			     : "a"((uint32_t)_value), "d"(_value >> 32),	\
> > 			       "D"(_msr)					\
> > 			     );							\
> > } while (0)
>
> Do we need this label '1:' in the above code? It does not seem to be used
> anywhere within the code.

It's used by the caller as the target for CLFLUSH{,OPT}.

	if (this_cpu_has(X86_FEATURE_CLFLUSHOPT))				\
		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflushopt 1f", FEP);	\
	else if (this_cpu_has(X86_FEATURE_CLFLUSH))				\
		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflush 1f", FEP);	\
	else									\
		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "nop", FEP);		\

> Why is clflush needed here?

As suggested by Jim, it allows verifying LLC references and misses by forcing
the CPU to evict the cache line that holds the MOV at 1: (and likely holds most
of the asm blob).
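As a standalone illustration of the CLFLUSH idea (a sketch, not code from the
series): evicting the cache line that holds soon-to-execute code guarantees at
least one LLC reference, and likely a miss, when the line is refetched.

/*
 * Sketch: evict the cache line containing @addr, then fence so the
 * eviction is ordered before subsequent execution.  In the selftest,
 * the same effect comes from pointing clflush{,opt} at the "1:" label
 * inside the measured asm blob.
 */
static inline void evict_line(const void *addr)
{
	__asm__ __volatile__("clflush (%0)\n\t"
			     "mfence"
			     : : "r"(addr) : "memory");
}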
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index ed1c17cabc07..4c024fb845b4 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -82,6 +82,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/mmio_warning_test
 TEST_GEN_PROGS_x86_64 += x86_64/monitor_mwait_test
 TEST_GEN_PROGS_x86_64 += x86_64/nested_exceptions_test
 TEST_GEN_PROGS_x86_64 += x86_64/platform_info_test
+TEST_GEN_PROGS_x86_64 += x86_64/pmu_counters_test
 TEST_GEN_PROGS_x86_64 += x86_64/pmu_event_filter_test
 TEST_GEN_PROGS_x86_64 += x86_64/set_boot_cpu_id
 TEST_GEN_PROGS_x86_64 += x86_64/set_sregs_test
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
new file mode 100644
index 000000000000..2a6336b994d5
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -0,0 +1,189 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023, Tencent, Inc.
+ */
+
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include <x86intrin.h>
+
+#include "pmu.h"
+#include "processor.h"
+
+/* Guest payload for any performance counter counting */
+#define NUM_BRANCHES 10
+
+static struct kvm_vm *pmu_vm_create_with_one_vcpu(struct kvm_vcpu **vcpu,
+						  void *guest_code)
+{
+	struct kvm_vm *vm;
+
+	vm = vm_create_with_one_vcpu(vcpu, guest_code);
+	vm_init_descriptor_tables(vm);
+	vcpu_init_descriptor_tables(*vcpu);
+
+	return vm;
+}
+
+static void run_vcpu(struct kvm_vcpu *vcpu)
+{
+	struct ucall uc;
+
+	do {
+		vcpu_run(vcpu);
+		switch (get_ucall(vcpu, &uc)) {
+		case UCALL_SYNC:
+			break;
+		case UCALL_ABORT:
+			REPORT_GUEST_ASSERT(uc);
+			break;
+		case UCALL_DONE:
+			break;
+		default:
+			TEST_FAIL("Unexpected ucall: %lu", uc.cmd);
+		}
+	} while (uc.cmd != UCALL_DONE);
+}
+
+static bool pmu_is_intel_event_stable(uint8_t idx)
+{
+	switch (idx) {
+	case INTEL_ARCH_CPU_CYCLES:
+	case INTEL_ARCH_INSTRUCTIONS_RETIRED:
+	case INTEL_ARCH_REFERENCE_CYCLES:
+	case INTEL_ARCH_BRANCHES_RETIRED:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static void guest_measure_pmu_v1(struct kvm_x86_pmu_feature event,
+				 uint32_t counter_msr, uint32_t nr_gp_counters)
+{
+	uint8_t idx = event.f.bit;
+	unsigned int i;
+
+	for (i = 0; i < nr_gp_counters; i++) {
+		wrmsr(counter_msr + i, 0);
+		wrmsr(MSR_P6_EVNTSEL0 + i, ARCH_PERFMON_EVENTSEL_OS |
+		      ARCH_PERFMON_EVENTSEL_ENABLE | intel_pmu_arch_events[idx]);
+		__asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
+
+		if (pmu_is_intel_event_stable(idx))
+			GUEST_ASSERT_EQ(this_pmu_has(event), !!_rdpmc(i));
+
+		wrmsr(MSR_P6_EVNTSEL0 + i, ARCH_PERFMON_EVENTSEL_OS |
+		      !ARCH_PERFMON_EVENTSEL_ENABLE |
+		      intel_pmu_arch_events[idx]);
+		wrmsr(counter_msr + i, 0);
+		__asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
+
+		if (pmu_is_intel_event_stable(idx))
+			GUEST_ASSERT(!_rdpmc(i));
+	}
+
+	GUEST_DONE();
+}
+
+static void guest_measure_loop(uint8_t idx)
+{
+	const struct {
+		struct kvm_x86_pmu_feature gp_event;
+	} intel_event_to_feature[] = {
+		[INTEL_ARCH_CPU_CYCLES]            = { X86_PMU_FEATURE_CPU_CYCLES },
+		[INTEL_ARCH_INSTRUCTIONS_RETIRED]  = { X86_PMU_FEATURE_INSNS_RETIRED },
+		[INTEL_ARCH_REFERENCE_CYCLES]      = { X86_PMU_FEATURE_REFERENCE_CYCLES },
+		[INTEL_ARCH_LLC_REFERENCES]        = { X86_PMU_FEATURE_LLC_REFERENCES },
+		[INTEL_ARCH_LLC_MISSES]            = { X86_PMU_FEATURE_LLC_MISSES },
+		[INTEL_ARCH_BRANCHES_RETIRED]      = { X86_PMU_FEATURE_BRANCH_INSNS_RETIRED },
+		[INTEL_ARCH_BRANCHES_MISPREDICTED] = { X86_PMU_FEATURE_BRANCHES_MISPREDICTED },
+	};
+
+	uint32_t nr_gp_counters = this_cpu_property(X86_PROPERTY_PMU_NR_GP_COUNTERS);
+	uint32_t pmu_version = this_cpu_property(X86_PROPERTY_PMU_VERSION);
+	struct kvm_x86_pmu_feature gp_event;
+	uint32_t counter_msr;
+	unsigned int i;
+
+	if (rdmsr(MSR_IA32_PERF_CAPABILITIES) & PMU_CAP_FW_WRITES)
+		counter_msr = MSR_IA32_PMC0;
+	else
+		counter_msr = MSR_IA32_PERFCTR0;
+
+	gp_event = intel_event_to_feature[idx].gp_event;
+	TEST_ASSERT_EQ(idx, gp_event.f.bit);
+
+	if (pmu_version < 2) {
+		guest_measure_pmu_v1(gp_event, counter_msr, nr_gp_counters);
+		return;
+	}
+
+	for (i = 0; i < nr_gp_counters; i++) {
+		wrmsr(counter_msr + i, 0);
+		wrmsr(MSR_P6_EVNTSEL0 + i, ARCH_PERFMON_EVENTSEL_OS |
+		      ARCH_PERFMON_EVENTSEL_ENABLE |
+		      intel_pmu_arch_events[idx]);
+
+		wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, BIT_ULL(i));
+		__asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
+		wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+
+		if (pmu_is_intel_event_stable(idx))
+			GUEST_ASSERT_EQ(this_pmu_has(gp_event), !!_rdpmc(i));
+	}
+
+	GUEST_DONE();
+}
+
+static void test_arch_events_cpuid(uint8_t i, uint8_t j, uint8_t idx)
+{
+	uint8_t arch_events_unavailable_mask = BIT_ULL(j);
+	uint8_t arch_events_bitmap_size = BIT_ULL(i);
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+
+	vm = pmu_vm_create_with_one_vcpu(&vcpu, guest_measure_loop);
+
+	vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH,
+				arch_events_bitmap_size);
+	vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_EVENTS_MASK,
+				arch_events_unavailable_mask);
+
+	vcpu_args_set(vcpu, 1, idx);
+
+	run_vcpu(vcpu);
+
+	kvm_vm_free(vm);
+}
+
+static void test_intel_arch_events(void)
+{
+	uint8_t idx, i, j;
+
+	for (idx = 0; idx < NR_INTEL_ARCH_EVENTS; idx++) {
+		/*
+		 * A brute force iteration of all combinations of values is
+		 * likely to exhaust the limit of the single-threaded thread
+		 * fd nums, so it's tested by iterating through all valid
+		 * single-bit values.
+		 */
+		for (i = 0; i < NR_INTEL_ARCH_EVENTS; i++) {
+			for (j = 0; j < NR_INTEL_ARCH_EVENTS; j++)
+				test_arch_events_cpuid(i, j, idx);
+		}
+	}
+}
+
+int main(int argc, char *argv[])
+{
+	TEST_REQUIRE(get_kvm_param_bool("enable_pmu"));
+
+	TEST_REQUIRE(host_cpu_is_intel);
+	TEST_REQUIRE(kvm_cpu_has_p(X86_PROPERTY_PMU_VERSION));
+	TEST_REQUIRE(kvm_cpu_property(X86_PROPERTY_PMU_VERSION) > 0);
+	TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_PDCM));

+	test_intel_arch_events();
+
+	return 0;
+}