[RFC,v2,1/2] perf/x86/rapl: Fix energy-cores event

Message ID 20230301181449.14647-2-wyes.karny@amd.com
State New
Headers
Series Enable Core RAPL for AMD |

Commit Message

Wyes Karny March 1, 2023, 6:14 p.m. UTC
  For quite some time, energy-cores event is broken, because RAPL PMU
assumes all the events on this PMU are uncore and sets rapl_cpu_mask
with the first available CPU on the die. Therefore, for energy-cores
event if we read MSR form pmu->cpu, it's wrong. But the following two
changes helped to hide this issue.

- commit 704e2f5b700d ("perf stat: Use affinity for enabling/disabling
  events")
- commit e64cd6f73ff5 ("perf/x86: Use PMUEF_READ_CPU_PKG in uncore
  events")

These two changes together acted as a workaround for energy-cores event.
First change affined perf events to respective CPUs whereas the second
change helped to pick the local CPU to read the MSR. In this way, MSRs
were read from the correct CPU. This works unless it's the first
reading.  For the first reading the second patch doesn't apply and we
get wrong readings. Stephane reported this issue when a patch to enable
AMD energy-cores RAPL event was posted [1].

The right way to fix the issue is to get rid of RAPL being considered an
uncore event. That is a larger change. To enable current RAPL usage,
work around the issue by conditionally remove the
`PERF_EV_CAP_READ_ACTIVE_PKG` flag for energy-cores event. Also, use the
event's CPU instead for PMU's CPU to read the MSR.

[1]: https://lore.kernel.org/lkml/CABPqkBQ_bSTC-OEe_LrgUrpj2VsseX1ThvO-yLcEtF8vb4+AAw@mail.gmail.com/#t

Fixes: e64cd6f73ff5 ("perf/x86: Use PMUEF_READ_CPU_PKG in uncore events")
Signed-off-by: Wyes Karny <wyes.karny@amd.com>
---
 arch/x86/events/rapl.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)
  

Comments

Liang, Kan March 1, 2023, 7:29 p.m. UTC | #1
On 2023-03-01 1:14 p.m., Wyes Karny wrote:
> For quite some time, energy-cores event is broken, because RAPL PMU
> assumes all the events on this PMU are uncore and sets rapl_cpu_mask
> with the first available CPU on the die. Therefore, for energy-cores
> event if we read MSR form pmu->cpu, it's wrong. But the following two
> changes helped to hide this issue.
> 
> - commit 704e2f5b700d ("perf stat: Use affinity for enabling/disabling
>   events")
> - commit e64cd6f73ff5 ("perf/x86: Use PMUEF_READ_CPU_PKG in uncore
>   events")
> 
> These two changes together acted as a workaround for energy-cores event.
> First change affined perf events to respective CPUs whereas the second
> change helped to pick the local CPU to read the MSR. In this way, MSRs
> were read from the correct CPU. This works unless it's the first
> reading.  For the first reading the second patch doesn't apply and we
> get wrong readings. Stephane reported this issue when a patch to enable
> AMD energy-cores RAPL event was posted [1].
> 
> The right way to fix the issue is to get rid of RAPL being considered an
> uncore event. That is a larger change. To enable current RAPL usage,
> work around the issue by conditionally remove the
> `PERF_EV_CAP_READ_ACTIVE_PKG` flag for energy-cores event. Also, use the
> event's CPU instead for PMU's CPU to read the MSR.

The current RAPL PMU aka 'power' should be die/socket scope.
The energy-cores event is also defined as a die/socket scope.

 *  pp0 counter: consumption of all physical cores (power plane 0)
 * 	  event: rapl_energy_cores
 *    perf code: 0x1

I don't think we want to change the scope of the energy-cores event.
Otherwise Intel's energy-cores event probably be broken.

It looks like you are looking for a new per-core RAPL event. I think
it's better to create a new core scope RAPL PMU.

Thanks,
Kan

> 
> [1]: https://lore.kernel.org/lkml/CABPqkBQ_bSTC-OEe_LrgUrpj2VsseX1ThvO-yLcEtF8vb4+AAw@mail.gmail.com/#t
> 
> Fixes: e64cd6f73ff5 ("perf/x86: Use PMUEF_READ_CPU_PKG in uncore events")
> Signed-off-by: Wyes Karny <wyes.karny@amd.com>
> ---
>  arch/x86/events/rapl.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> index 52e6e7ed4f78..e6a0c077daf5 100644
> --- a/arch/x86/events/rapl.c
> +++ b/arch/x86/events/rapl.c
> @@ -343,14 +343,15 @@ static int rapl_pmu_event_init(struct perf_event *event)
>  	if (event->cpu < 0)
>  		return -EINVAL;
>  
> -	event->event_caps |= PERF_EV_CAP_READ_ACTIVE_PKG;
> -
>  	if (!cfg || cfg >= NR_RAPL_DOMAINS + 1)
>  		return -EINVAL;
>  
>  	cfg = array_index_nospec((long)cfg, NR_RAPL_DOMAINS + 1);
>  	bit = cfg - 1;
>  
> +	if (bit != PERF_RAPL_PP0)
> +		event->event_caps |= PERF_EV_CAP_READ_ACTIVE_PKG;
> +
>  	/* check event supported */
>  	if (!(rapl_cntr_mask & (1 << bit)))
>  		return -EINVAL;
> @@ -363,7 +364,15 @@ static int rapl_pmu_event_init(struct perf_event *event)
>  	pmu = cpu_to_rapl_pmu(event->cpu);
>  	if (!pmu)
>  		return -EINVAL;
> -	event->cpu = pmu->cpu;
> +
> +	/*
> +	 * FIXME: RAPL PMU considers events are uncore and MSRs can be read from
> +	 * the first available CPU of the die. But this is not true for energy-cores
> +	 * event. Therefore as a workaround don't consider pmu->cpu here for PERF_RAPL_PP0.
> +	 */
> +	if (event->event_caps & PERF_EV_CAP_READ_ACTIVE_PKG)
> +		event->cpu = pmu->cpu;
> +
>  	event->pmu_private = pmu;
>  	event->hw.event_base = rapl_msrs[bit].msr;
>  	event->hw.config = cfg;
  

Patch

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 52e6e7ed4f78..e6a0c077daf5 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -343,14 +343,15 @@  static int rapl_pmu_event_init(struct perf_event *event)
 	if (event->cpu < 0)
 		return -EINVAL;
 
-	event->event_caps |= PERF_EV_CAP_READ_ACTIVE_PKG;
-
 	if (!cfg || cfg >= NR_RAPL_DOMAINS + 1)
 		return -EINVAL;
 
 	cfg = array_index_nospec((long)cfg, NR_RAPL_DOMAINS + 1);
 	bit = cfg - 1;
 
+	if (bit != PERF_RAPL_PP0)
+		event->event_caps |= PERF_EV_CAP_READ_ACTIVE_PKG;
+
 	/* check event supported */
 	if (!(rapl_cntr_mask & (1 << bit)))
 		return -EINVAL;
@@ -363,7 +364,15 @@  static int rapl_pmu_event_init(struct perf_event *event)
 	pmu = cpu_to_rapl_pmu(event->cpu);
 	if (!pmu)
 		return -EINVAL;
-	event->cpu = pmu->cpu;
+
+	/*
+	 * FIXME: RAPL PMU considers events are uncore and MSRs can be read from
+	 * the first available CPU of the die. But this is not true for energy-cores
+	 * event. Therefore as a workaround don't consider pmu->cpu here for PERF_RAPL_PP0.
+	 */
+	if (event->event_caps & PERF_EV_CAP_READ_ACTIVE_PKG)
+		event->cpu = pmu->cpu;
+
 	event->pmu_private = pmu;
 	event->hw.event_base = rapl_msrs[bit].msr;
 	event->hw.config = cfg;