From patchwork Tue Aug 22 05:11:38 2023
X-Patchwork-Submitter: "Mi, Dapeng"
X-Patchwork-Id: 136584
From: Dapeng Mi
To: Sean Christopherson, Paolo Bonzini, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Kan Liang, Like Xu, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers,
	Adrian Hunter
Cc: kvm@vger.kernel.org, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org, Zhenyu Wang, Zhang Xiong,
	Lv Zhiyuan, Yang Weijiang, Dapeng Mi, Dapeng Mi
Subject: [PATCH RFC v3 11/13] KVM: x86/pmu: Support topdown perf metrics feature
Date: Tue, 22 Aug 2023 13:11:38 +0800
Message-Id: <20230822051140.512879-12-dapeng1.mi@linux.intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230822051140.512879-1-dapeng1.mi@linux.intel.com>
References: <20230822051140.512879-1-dapeng1.mi@linux.intel.com>
This patch adds topdown perf metrics support for KVM. Topdown perf metrics
is a feature on Intel CPUs which supports the Top-down Microarchitecture
Analysis (TMA) method, a structured analysis methodology for identifying
critical performance bottlenecks in out-of-order processors. The details
about topdown metrics support on Intel processors can be found in the
"Performance Metrics" section of Intel's SDM.

Intel chips use fixed counter 3 and the PERF_METRICS MSR together to
support the topdown metrics feature: fixed counter 3 counts the elapsed
CPU slots, while PERF_METRICS reports the topdown metrics percentages.

Generally speaking, if KVM only observes fixed counter 3, it has no way to
know whether the guest is counting/sampling a standalone slots event or
profiling topdown perf metrics. Fortunately, topdown metrics profiling
always manipulates fixed counter 3 and the PERF_METRICS MSR together, in a
fixed sequence: the FIXED_CTR3 MSR is written first and PERF_METRICS
follows. So KVM can assume topdown metrics profiling is running in the
guest once it observes a PERF_METRICS write.

In the current perf logic, an event group is required to handle topdown
metrics profiling; the group couples a slots event, which acts as the
group leader, with multiple metric events. To coordinate with perf's
topdown metrics handling logic and reduce the code changes in KVM, follow
the current, mature vPMU PMC emulation framework. The only difference is
that an event group is created for fixed counter 3, and the FIXED_CTR3 and
PERF_METRICS MSRs are manipulated together, instead of a single event
manipulating FIXED_CTR3 alone.

When the guest writes the PERF_METRICS MSR for the first time, KVM creates
an event group which couples a slots event and a virtual metrics event. In
this group the slots event claims the fixed counter 3 HW resource and acts
as the group leader, as required by the perf subsystem, while the virtual
metrics event claims the PERF_METRICS MSR. The group mirrors the perf
metrics event group on the host and is scheduled by the host perf
subsystem.

In this proposal, the count of the slots event is calculated and emulated
on the host and returned to the guest just like other normal counters, but
the metrics event is handled differently. KVM doesn't calculate the real
count of the topdown metrics; it simply stores the raw PERF_METRICS data
and returns that raw data to the guest directly. Thus the guest gets the
real HW PERF_METRICS data, which guarantees the calculation accuracy of
the topdown metrics.

The whole procedure can be summarized as below (an illustrative decoding
sketch for the guest side follows the list):

1. KVM intercepts PERF_METRICS MSR writes and marks fixed counter 3 as
   having entered topdown profiling mode (sets max_nr_events of fixed
   counter 3 to 2) if it hasn't already.
2. If the topdown metrics event group doesn't exist, create it first, then
   update the saved slots count and metrics data of the group events with
   the guest values. Finally, enable the events so that the guest values
   are loaded into the HW FIXED_CTR3 and PERF_METRICS MSRs.
3. Modify kvm_pmu_rdpmc() to return the PERF_METRICS MSR raw data to the
   guest directly.
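For context, below is a minimal, hedged user-space sketch (not part of the
patch) of how guest software could decode the raw PERF_METRICS value that
kvm_pmu_rdpmc() hands back. It assumes the level-1 field layout described
in the SDM's "Performance Metrics" section (retiring, bad speculation,
frontend bound and backend bound as consecutive 8-bit fractions of 0xFF);
the function name td_metric_pct and the sample value are illustrative only.

#include <stdint.h>
#include <stdio.h>

/* Each level-1 topdown metric is an 8-bit fraction of 0xFF (per the SDM). */
static double td_metric_pct(uint64_t perf_metrics, unsigned int field)
{
	uint8_t frac = (perf_metrics >> (field * 8)) & 0xff;

	return (double)frac / 0xff * 100.0;
}

int main(void)
{
	/* Made-up raw value, as KVM would return it through RDPMC emulation. */
	uint64_t perf_metrics = 0x1f204080;

	printf("retiring        %5.1f%%\n", td_metric_pct(perf_metrics, 0));
	printf("bad speculation %5.1f%%\n", td_metric_pct(perf_metrics, 1));
	printf("frontend bound  %5.1f%%\n", td_metric_pct(perf_metrics, 2));
	printf("backend bound   %5.1f%%\n", td_metric_pct(perf_metrics, 3));
	return 0;
}

In a real guest the raw value would come from an RDPMC whose index has the
INTEL_PMC_FIXED_RDPMC_METRICS bit set (the path added to kvm_pmu_rdpmc()
below), not from a hard-coded constant.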
Signed-off-by: Dapeng Mi
---
 arch/x86/include/asm/kvm_host.h |  6 ++++
 arch/x86/kvm/pmu.c              | 62 +++++++++++++++++++++++++++++++--
 arch/x86/kvm/pmu.h              | 28 +++++++++++++++
 arch/x86/kvm/vmx/capabilities.h |  1 +
 arch/x86/kvm/vmx/pmu_intel.c    | 48 +++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.h          |  5 +++
 arch/x86/kvm/x86.c              |  1 +
 7 files changed, 149 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 235e24fe66a4..d037259c6887 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -487,6 +487,12 @@ enum pmc_type {
 	KVM_PMC_FIXED,
 };
 
+enum topdown_events {
+	KVM_TD_SLOTS = 0,
+	KVM_TD_METRICS = 1,
+	KVM_TD_EVENTS_MAX = 2,
+};
+
 struct kvm_pmc {
 	enum pmc_type type;
 	u8 idx;
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index b02a56c77647..fad7b2c10bb8 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -182,6 +182,30 @@ static u64 pmc_get_pebs_precise_level(struct kvm_pmc *pmc)
 	return 1;
 }
 
+static void pmc_setup_td_metrics_events_attr(struct kvm_pmc *pmc,
+					     struct perf_event_attr *attr,
+					     unsigned int event_idx)
+{
+	if (!pmc_is_topdown_metrics_used(pmc))
+		return;
+
+	/*
+	 * setup slots event attribute, when slots event is
+	 * created for guest topdown metrics profiling, the
+	 * sample period must be 0.
+	 */
+	if (event_idx == KVM_TD_SLOTS)
+		attr->sample_period = 0;
+
+	/* setup vmetrics event attribute */
+	if (event_idx == KVM_TD_METRICS) {
+		attr->config = INTEL_FIXED_VMETRICS_EVENT;
+		attr->sample_period = 0;
+		/* Only group leader event can be pinned. */
+		attr->pinned = false;
+	}
+}
+
 static int pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config,
 				 bool exclude_user, bool exclude_kernel,
 				 bool intr)
@@ -233,6 +257,8 @@ static int pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config,
 	for (i = 0; i < pmc->max_nr_events; i++) {
 		group_leader = i ? pmc->perf_event : NULL;
 
+		pmc_setup_td_metrics_events_attr(pmc, &attr, i);
+
 		event = perf_event_create_kernel_counter(&attr, -1, current,
 							 group_leader,
 							 kvm_perf_overflow, pmc);
@@ -256,6 +282,12 @@ static int pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config,
 	pmc->is_paused = false;
 	pmc->intr = intr || pebs;
 
+	if (pmc_is_topdown_metrics_active(pmc)) {
+		pmc_update_topdown_metrics(pmc);
+		/* KVM need to inject PMI for PERF_METRICS overflow. */
+		pmc->intr = true;
+	}
+
 	if (!attr.disabled)
 		return 0;
 
@@ -269,6 +301,7 @@ static void pmc_pause_counter(struct kvm_pmc *pmc)
 {
 	u64 counter = pmc->counter;
 	unsigned int i;
+	u64 data;
 
 	if (!pmc->perf_event || pmc->is_paused)
 		return;
@@ -279,8 +312,15 @@ static void pmc_pause_counter(struct kvm_pmc *pmc)
 	 * then disable non-group leader events.
 	 */
 	counter += perf_event_pause(pmc->perf_event, true);
-	for (i = 1; pmc->perf_events[i] && i < pmc->max_nr_events; i++)
-		perf_event_pause(pmc->perf_events[i], true);
+	for (i = 1; pmc->perf_events[i] && i < pmc->max_nr_events; i++) {
+		data = perf_event_pause(pmc->perf_events[i], true);
+		/*
+		 * The count of vmetrics event actually stores raw data of
+		 * PERF_METRICS, save it to extra_config.
+		 */
+		if (pmc->idx == INTEL_PMC_IDX_FIXED_SLOTS && i == KVM_TD_METRICS)
+			pmc->extra_config = data;
+	}
 	pmc->counter = counter & pmc_bitmask(pmc);
 	pmc->is_paused = true;
 }
@@ -557,6 +597,21 @@ static int kvm_pmu_rdpmc_vmware(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)
 	return 0;
 }
 
+static inline int kvm_pmu_read_perf_metrics(struct kvm_vcpu *vcpu,
+					    unsigned int idx, u64 *data)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	struct kvm_pmc *pmc = get_fixed_pmc(pmu, MSR_CORE_PERF_FIXED_CTR3);
+
+	if (!pmc) {
+		*data = 0;
+		return 1;
+	}
+
+	*data = pmc->extra_config;
+	return 0;
+}
+
 int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)
 {
 	bool fast_mode = idx & (1u << 31);
@@ -570,6 +625,9 @@ int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)
 	if (is_vmware_backdoor_pmc(idx))
 		return kvm_pmu_rdpmc_vmware(vcpu, idx, data);
 
+	if (idx & INTEL_PMC_FIXED_RDPMC_METRICS)
+		return kvm_pmu_read_perf_metrics(vcpu, idx, data);
+
 	pmc = static_call(kvm_x86_pmu_rdpmc_ecx_to_pmc)(vcpu, idx, &mask);
 	if (!pmc)
 		return 1;
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 3dc0deb83096..43abe793c11c 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -257,6 +257,34 @@ static inline bool pmc_is_globally_enabled(struct kvm_pmc *pmc)
 	return test_bit(pmc->idx, (unsigned long *)&pmu->global_ctrl);
 }
 
+static inline int pmc_is_topdown_metrics_used(struct kvm_pmc *pmc)
+{
+	return (pmc->idx == INTEL_PMC_IDX_FIXED_SLOTS) &&
+	       (pmc->max_nr_events == KVM_TD_EVENTS_MAX);
+}
+
+static inline int pmc_is_topdown_metrics_active(struct kvm_pmc *pmc)
+{
+	return pmc_is_topdown_metrics_used(pmc) &&
+	       pmc->perf_events[KVM_TD_METRICS];
+}
+
+static inline void pmc_update_topdown_metrics(struct kvm_pmc *pmc)
+{
+	struct perf_event *event;
+	int i;
+
+	struct td_metrics td_metrics = {
+		.slots = pmc->counter,
+		.metric = pmc->extra_config,
+	};
+
+	for (i = 0; i < pmc->max_nr_events; i++) {
+		event = pmc->perf_events[i];
+		perf_event_topdown_metrics(event, &td_metrics);
+	}
+}
+
 void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
 void kvm_pmu_handle_event(struct kvm_vcpu *vcpu);
 int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 41a4533f9989..d8317552b634 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -22,6 +22,7 @@ extern int __read_mostly pt_mode;
 #define PT_MODE_HOST_GUEST	1
 
 #define PMU_CAP_FW_WRITES	(1ULL << 13)
+#define PMU_CAP_PERF_METRICS	BIT_ULL(15)
 #define PMU_CAP_LBR_FMT		0x3f
 
 struct nested_vmx_msrs {
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index b45396e0a46c..04ccb8c6f7e4 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -229,6 +229,9 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
 		ret = (perf_capabilities & PERF_CAP_PEBS_BASELINE) &&
 			((perf_capabilities & PERF_CAP_PEBS_FORMAT) > 3);
 		break;
+	case MSR_PERF_METRICS:
+		ret = intel_pmu_metrics_is_enabled(vcpu) && (pmu->version > 1);
+		break;
 	default:
 		ret = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0) ||
 			get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0) ||
@@ -357,6 +360,43 @@ static bool intel_pmu_handle_lbr_msrs_access(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+static int intel_pmu_handle_perf_metrics_access(struct kvm_vcpu *vcpu,
+						struct msr_data *msr_info, bool read)
+{
+	u32 index = msr_info->index;
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	struct kvm_pmc *pmc = get_fixed_pmc(pmu, MSR_CORE_PERF_FIXED_CTR3);
+
+	if (!pmc || index != MSR_PERF_METRICS)
+		return 1;
+
+	if (read) {
+		msr_info->data = pmc->extra_config;
+	} else {
+		/*
+		 * Save guest PERF_METRICS data in to extra_config,
+		 * the extra_config would be read to write to PERF_METRICS
+		 * MSR in later events group creating process.
+		 */
+		pmc->extra_config = msr_info->data;
+		if (pmc_is_topdown_metrics_active(pmc)) {
+			pmc_update_topdown_metrics(pmc);
+		} else {
+			/*
+			 * If the slots/vmetrics events group is not
+			 * created yet, set max_nr_events to 2
+			 * (slots event + vmetrics event), so KVM knows
+			 * topdown metrics profiling is running in guest
+			 * and slots/vmetrics events group would be created
+			 * later.
+			 */
+			pmc->max_nr_events = KVM_TD_EVENTS_MAX;
+		}
+	}
+
+	return 0;
+}
+
 static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -376,6 +416,10 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_PEBS_DATA_CFG:
 		msr_info->data = pmu->pebs_data_cfg;
 		break;
+	case MSR_PERF_METRICS:
+		if (intel_pmu_handle_perf_metrics_access(vcpu, msr_info, true))
+			return 1;
+		break;
 	default:
 		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
 		    (pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
@@ -438,6 +482,10 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		pmu->pebs_data_cfg = data;
 		break;
 
+	case MSR_PERF_METRICS:
+		if (intel_pmu_handle_perf_metrics_access(vcpu, msr_info, false))
+			return 1;
+		break;
 	default:
 		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
 		    (pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index c2130d2c8e24..63b6dcc360c2 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -670,6 +670,11 @@ static inline bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu)
 	return !!vcpu_to_lbr_records(vcpu)->nr;
 }
 
+static inline bool intel_pmu_metrics_is_enabled(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.perf_capabilities & PMU_CAP_PERF_METRICS;
+}
+
 void intel_pmu_cross_mapped_check(struct kvm_pmu *pmu);
 int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu);
 void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 95b1ac3bc0b6..5d9fde90370a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1463,6 +1463,7 @@ static const u32 msrs_to_save_pmu[] = {
 	MSR_CORE_PERF_FIXED_CTR_CTRL, MSR_CORE_PERF_GLOBAL_STATUS,
 	MSR_CORE_PERF_GLOBAL_CTRL, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
 	MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA, MSR_PEBS_DATA_CFG,
+	MSR_PERF_METRICS,
 	/* This part of MSRs should match KVM_INTEL_PMC_MAX_GENERIC. */
 	MSR_ARCH_PERFMON_PERFCTR0, MSR_ARCH_PERFMON_PERFCTR1,

From patchwork Tue Aug 22 05:11:40 2023
X-Patchwork-Submitter: "Mi, Dapeng"
X-Patchwork-Id: 136586
From: Dapeng Mi
To: Sean Christopherson, Paolo Bonzini, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Kan Liang, Like Xu, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers,
	Adrian Hunter
Cc: kvm@vger.kernel.org, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org, Zhenyu Wang, Zhang Xiong,
	Lv Zhiyuan, Yang Weijiang, Dapeng Mi, Dapeng Mi
Subject: [PATCH RFC v3 13/13] KVM: x86/pmu: Expose Topdown in MSR_IA32_PERF_CAPABILITIES
Date: Tue, 22 Aug 2023 13:11:40 +0800
Message-Id: <20230822051140.512879-14-dapeng1.mi@linux.intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230822051140.512879-1-dapeng1.mi@linux.intel.com>
References: <20230822051140.512879-1-dapeng1.mi@linux.intel.com>
Topdown support is enumerated via IA32_PERF_CAPABILITIES[bit 15]. Enable
this bit for the guest when the feature is available on the host.

Co-developed-by: Yang Weijiang
Signed-off-by: Yang Weijiang
Signed-off-by: Dapeng Mi
---
 arch/x86/kvm/vmx/pmu_intel.c | 3 +++
 arch/x86/kvm/vmx/vmx.c       | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 04ccb8c6f7e4..5783cde00054 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -614,6 +614,9 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 		(((1ull << pmu->nr_arch_fixed_counters) - 1) << INTEL_PMC_IDX_FIXED));
 	pmu->global_ctrl_mask = counter_mask;
 
+	if (intel_pmu_metrics_is_enabled(vcpu))
+		pmu->global_ctrl_mask &= ~(1ULL << GLOBAL_CTRL_EN_PERF_METRICS);
+
 	/*
 	 * GLOBAL_STATUS and GLOBAL_OVF_CONTROL (a.k.a. GLOBAL_STATUS_RESET)
 	 * share reserved bit definitions.  The kernel just happens to use
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e6849f780dba..69a425be55bf 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7827,6 +7827,8 @@ static u64 vmx_get_perf_capabilities(void)
 			perf_cap &= ~PERF_CAP_PEBS_BASELINE;
 	}
 
+	perf_cap |= host_perf_cap & PMU_CAP_PERF_METRICS;
+
 	return perf_cap;
 }
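As an aside, here is a minimal, hedged sketch (not part of the patch) of
how guest code running at CPL0 could probe the capability bit this patch
exposes. The IA32_PERF_CAPABILITIES MSR index (0x345) is architectural;
PERF_CAP_PERF_METRICS mirrors the PMU_CAP_PERF_METRICS (bit 15) definition
added in patch 11/13, and rdmsr64() is a hypothetical stand-in for the
kernel's own rdmsrl() helper.

#include <stdbool.h>
#include <stdint.h>

#define MSR_IA32_PERF_CAPABILITIES	0x00000345
#define PERF_CAP_PERF_METRICS		(1ULL << 15)	/* same bit as PMU_CAP_PERF_METRICS */

/* Hypothetical raw rdmsr helper; must run in ring 0. */
static inline uint64_t rdmsr64(uint32_t msr)
{
	uint32_t lo, hi;

	__asm__ volatile("rdmsr" : "=a" (lo), "=d" (hi) : "c" (msr));
	return ((uint64_t)hi << 32) | lo;
}

/* True when the (virtual) PMU advertises the topdown perf metrics feature. */
static bool guest_has_perf_metrics(void)
{
	return rdmsr64(MSR_IA32_PERF_CAPABILITIES) & PERF_CAP_PERF_METRICS;
}

A Linux guest would not normally open-code this; its perf core reads
IA32_PERF_CAPABILITIES during PMU init and only then programs FIXED_CTR3
and PERF_METRICS, which is roughly the access pattern patch 11/13 keys on.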