[v2,2/3] perf/ibs: Fix interface via core pmu events

Message ID 20230309101111.444-3-ravi.bangoria@amd.com
State New
Series perf/ibs: Fix interface via core pmu events

Commit Message

Ravi Bangoria March 9, 2023, 10:11 a.m. UTC
Although the IBS pmu can be invoked via its own interface, indirect
IBS invocation via core pmu events is also supported for a fixed set
of events: cpu-cycles:p, r076:p (same as cpu-cycles:p) and r0C1:p
(micro-ops), for user convenience.

This indirect IBS invocation has been broken since commit 66d258c5b048
("perf/core: Optimize perf_init_event()"), which added the RAW pmu to
the pmu_idr list. Since then, if event_init() fails on the RAW pmu,
perf_init_event() returns the error instead of trying other pmus.

Fix it by trying to open the event on all pmus if event_init() on the
user-requested pmu returns -ESRCH.

Without patch:
  $ sudo ./perf record -C 0 -e r076:p -- sleep 1
  Error:
  The r076:p event is not supported.

With patch:
  $ sudo ./perf record -C 0 -e r076:p -- sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.341 MB perf.data (37 samples) ]

Note that there is no notion of forward pmu mapping, i.e. the kernel
doesn't know which specific pmu (or set of pmus) the event should be
forwarded to. As of now, only the AMD core pmu forwards a set of events
to the IBS pmu when the precise_ip attribute is set, and thus trying all
pmus works. But if more pmus start returning -ESRCH, some sort of
forward pmu mapping needs to be introduced through which the event can
be forwarded directly to only the mapped pmus. Otherwise, trying all
pmus can inadvertently open the event on the wrong pmu.

Fixes: 66d258c5b048 ("perf/core: Optimize perf_init_event()")
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/core.c | 11 ++++++++---
 kernel/events/core.c       | 10 +++++++++-
 2 files changed, 17 insertions(+), 4 deletions(-)
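
The dispatch change described in the commit message can be modelled in user space. The following is only a toy sketch, not kernel code: every `toy_*` name, the two-entry pmu table and the type-indexed lookup array (standing in for pmu_idr) are assumptions made for illustration. It shows the requested pmu returning -ESRCH, the fallback to the try-all loop, and why the loop must also skip -ESRCH (the core pmu is visited again there).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins for kernel types; all names here are hypothetical. */
enum toy_type { TOY_RAW, TOY_IBS_OP, TOY_NR_TYPES };

struct toy_event {
	enum toy_type type;        /* pmu type requested by the user */
	unsigned long long config; /* raw event code, e.g. 0x76 */
	int precise_ip;
};

struct toy_pmu {
	const char *name;
	int (*event_init)(struct toy_event *ev);
};

/* Core pmu: forwards a fixed set of precise events by returning -ESRCH. */
static int toy_core_init(struct toy_event *ev)
{
	if (ev->type != TOY_RAW)
		return -ENOENT;
	if (ev->precise_ip) {
		if (ev->config == 0x76 || ev->config == 0xC1)
			return -ESRCH;
		return -ENOENT;
	}
	return 0;
}

/* IBS pmu: accepts its own type, or a forwarded precise core event. */
static int toy_ibs_init(struct toy_event *ev)
{
	if (ev->type == TOY_IBS_OP)
		return 0;
	if (ev->type == TOY_RAW && ev->precise_ip &&
	    (ev->config == 0x76 || ev->config == 0xC1))
		return 0;
	return -ENOENT;
}

static const struct toy_pmu toy_pmus[] = {
	{ "core",   toy_core_init },
	{ "ibs_op", toy_ibs_init  },
};

/* Type-indexed lookup, standing in for the idr. */
static const struct toy_pmu *const toy_by_type[TOY_NR_TYPES] = {
	[TOY_RAW]    = &toy_pmus[0],
	[TOY_IBS_OP] = &toy_pmus[1],
};

/* Returns the pmu that accepted the event, or NULL with *err set. */
static const struct toy_pmu *toy_init_event(struct toy_event *ev, int *err)
{
	const struct toy_pmu *pmu;
	size_t i;
	int ret;

	assert(ev != NULL && err != NULL);

	/* Fast path: look up the pmu by the type the user asked for. */
	pmu = toy_by_type[ev->type];
	ret = pmu->event_init(ev);
	if (!ret)
		return pmu;
	if (ret != -ESRCH) {
		*err = ret;
		return NULL;
	}

	/* -ESRCH: the requested pmu wants the event forwarded; try all. */
	for (i = 0; i < sizeof(toy_pmus) / sizeof(toy_pmus[0]); i++) {
		ret = toy_pmus[i].event_init(ev);
		if (!ret)
			return &toy_pmus[i];
		if (ret != -ENOENT && ret != -ESRCH) {
			*err = ret;
			return NULL;
		}
	}
	*err = -ENOENT;
	return NULL;
}
```

In this model a precise RAW event with config 0x76 ends up on the toy IBS pmu, while the same event without precise_ip stays on the toy core pmu.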
  

Comments

Namhyung Kim March 11, 2023, 12:32 a.m. UTC | #1
On Thu, Mar 9, 2023 at 2:12 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>
> Although the IBS pmu can be invoked via its own interface, indirect
> IBS invocation via core pmu events is also supported for a fixed set
> of events: cpu-cycles:p, r076:p (same as cpu-cycles:p) and r0C1:p
> (micro-ops), for user convenience.
>
> This indirect IBS invocation has been broken since commit 66d258c5b048
> ("perf/core: Optimize perf_init_event()"), which added the RAW pmu to
> the pmu_idr list. Since then, if event_init() fails on the RAW pmu,
> perf_init_event() returns the error instead of trying other pmus.
>
> Fix it by trying to open the event on all pmus if event_init() on the
> user-requested pmu returns -ESRCH.
>
> Without patch:
>   $ sudo ./perf record -C 0 -e r076:p -- sleep 1
>   Error:
>   The r076:p event is not supported.
>
> With patch:
>   $ sudo ./perf record -C 0 -e r076:p -- sleep 1
>   [ perf record: Woken up 1 times to write data ]
>   [ perf record: Captured and wrote 0.341 MB perf.data (37 samples) ]
>
> Note that there is no notion of forward pmu mapping, i.e. the kernel
> doesn't know which specific pmu (or set of pmus) the event should be
> forwarded to. As of now, only the AMD core pmu forwards a set of events
> to the IBS pmu when the precise_ip attribute is set, and thus trying all
> pmus works. But if more pmus start returning -ESRCH, some sort of
> forward pmu mapping needs to be introduced through which the event can
> be forwarded directly to only the mapped pmus. Otherwise, trying all
> pmus can inadvertently open the event on the wrong pmu.
>
> Fixes: 66d258c5b048 ("perf/core: Optimize perf_init_event()")
> Reported-by: Stephane Eranian <eranian@google.com>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> ---
>  arch/x86/events/amd/core.c | 11 ++++++++---
>  kernel/events/core.c       | 10 +++++++++-
>  2 files changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
> index 8c45b198b62f..81d67b899371 100644
> --- a/arch/x86/events/amd/core.c
> +++ b/arch/x86/events/amd/core.c
> @@ -371,10 +371,15 @@ static inline int amd_has_nb(struct cpu_hw_events *cpuc)
>  static int amd_pmu_hw_config(struct perf_event *event)
>  {
>         int ret;
> +       u64 dummy;
>
> -       /* pass precise event sampling to ibs: */
> -       if (event->attr.precise_ip && get_ibs_caps())
> -               return -ENOENT;
> +       if (event->attr.precise_ip) {
> +               /* pass precise event sampling to ibs by returning -ESRCH */
> +               if (get_ibs_caps() && !ibs_core_pmu_event(event, &dummy))
> +                       return -ESRCH;
> +               else
> +                       return -ENOENT;
> +       }
>
>         if (has_branch_stack(event) && !x86_pmu.lbr_nr)
>                 return -EOPNOTSUPP;
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index f79fd8b87f75..e990c71ba34a 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -11639,18 +11639,26 @@ static struct pmu *perf_init_event(struct perf_event *event)
>                         goto again;
>                 }
>
> +               /*
> +                * pmu->event_init() should return -ESRCH only when it
> +                * wants to forward the event to other pmu.
> +                */

Can we add this to the comment in the struct pmu?  There is a
description already for other error codes.

Otherwise looks good.

Thanks,
Namhyung


> +               if (ret == -ESRCH)
> +                       goto try_all;
> +
>                 if (ret)
>                         pmu = ERR_PTR(ret);
>
>                 goto unlock;
>         }
>
> +try_all:
>         list_for_each_entry_rcu(pmu, &pmus, entry, lockdep_is_held(&pmus_srcu)) {
>                 ret = perf_try_init_event(pmu, event);
>                 if (!ret)
>                         goto unlock;
>
> -               if (ret != -ENOENT) {
> +               if (ret != -ENOENT && ret != -ESRCH) {
>                         pmu = ERR_PTR(ret);
>                         goto unlock;
>                 }
> --
> 2.39.2
>
  
Peter Zijlstra March 12, 2023, 2:54 p.m. UTC | #2
On Thu, Mar 09, 2023 at 03:41:10PM +0530, Ravi Bangoria wrote:
> diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
> index 8c45b198b62f..81d67b899371 100644
> --- a/arch/x86/events/amd/core.c
> +++ b/arch/x86/events/amd/core.c
> @@ -371,10 +371,15 @@ static inline int amd_has_nb(struct cpu_hw_events *cpuc)
>  static int amd_pmu_hw_config(struct perf_event *event)
>  {
>  	int ret;
> +	u64 dummy;
>  
> -	/* pass precise event sampling to ibs: */
> -	if (event->attr.precise_ip && get_ibs_caps())
> -		return -ENOENT;
> +	if (event->attr.precise_ip) {
> +		/* pass precise event sampling to ibs by returning -ESRCH */
> +		if (get_ibs_caps() && !ibs_core_pmu_event(event, &dummy))
> +			return -ESRCH;
> +		else
> +			return -ENOENT;
> +	}
>  
>  	if (has_branch_stack(event) && !x86_pmu.lbr_nr)
>  		return -EOPNOTSUPP;
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index f79fd8b87f75..e990c71ba34a 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -11639,18 +11639,26 @@ static struct pmu *perf_init_event(struct perf_event *event)
>  			goto again;
>  		}
>  
> +		/*
> +		 * pmu->event_init() should return -ESRCH only when it
> +		 * wants to forward the event to other pmu.
> +		 */
> +		if (ret == -ESRCH)
> +			goto try_all;
> +
>  		if (ret)
>  			pmu = ERR_PTR(ret);
>  
>  		goto unlock;
>  	}
>  
> +try_all:
>  	list_for_each_entry_rcu(pmu, &pmus, entry, lockdep_is_held(&pmus_srcu)) {
>  		ret = perf_try_init_event(pmu, event);
>  		if (!ret)
>  			goto unlock;
>  
> -		if (ret != -ENOENT) {
> +		if (ret != -ENOENT && ret != -ESRCH) {
>  			pmu = ERR_PTR(ret);
>  			goto unlock;
>  		}

Urgh.. So amd_pmu_hw_config() knows what PMU it should be but has no
real way to communicate this, so you make it try all of them again.

Now, we already have a gruesome hack in there, and I'm thinking you
should use that instead of adding yet another one. Note:

		if (ret == -ENOENT && event->attr.type != type && !extended_type) {
			type = event->attr.type;
			goto again;

So if you have amd_pmu_hw_config() do:

	event->attr.type = ibs_pmu.type;
	return -ENOENT;

it should all just work no?

And now thinking about this, I'm thinking we can clean up the whole
swevent mess too, a little something like the below perhaps... Then it
might just be possible to remove that list_for_each_entry_rcu()
entirely.

Hmm?


---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f79fd8b87f75..26130d1ca40b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9951,6 +9951,9 @@ static void sw_perf_event_destroy(struct perf_event *event)
 	swevent_hlist_put();
 }
 
+static struct pmu perf_cpu_clock; /* fwd declaration */
+static struct pmu perf_task_clock;
+
 static int perf_swevent_init(struct perf_event *event)
 {
 	u64 event_id = event->attr.config;
@@ -9966,7 +9969,11 @@ static int perf_swevent_init(struct perf_event *event)
 
 	switch (event_id) {
 	case PERF_COUNT_SW_CPU_CLOCK:
+		event->attr.type = perf_cpu_clock.type;
+		return -ENOENT;
+
 	case PERF_COUNT_SW_TASK_CLOCK:
+		event->attr.type = perf_task_clock.type;
 		return -ENOENT;
 
 	default:
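
The redirect idea in the diff above (rewrite event->attr.type, return -ENOENT, and let the existing "goto again" path retry with the new type) can be sketched as a small user-space model. This is a hypothetical illustration only: the `toy_*` names, the type ids and the assumption that each clock pmu has its own type id and matches on it are not taken from the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical type ids; real ones are assigned at pmu registration. */
enum toy_type { TOY_SOFTWARE, TOY_CPU_CLOCK, TOY_TASK_CLOCK, TOY_NR_TYPES };

/* Hypothetical config values standing in for PERF_COUNT_SW_*. */
enum toy_sw_id { TOY_SW_CPU_CLOCK, TOY_SW_TASK_CLOCK, TOY_SW_PAGE_FAULTS };

struct toy_event {
	enum toy_type type;
	enum toy_sw_id config;
};

typedef int (*toy_init_fn)(struct toy_event *ev);

/* Generic software pmu: hands the two clock events to dedicated pmus. */
static int toy_swevent_init(struct toy_event *ev)
{
	if (ev->type != TOY_SOFTWARE)
		return -ENOENT;

	switch (ev->config) {
	case TOY_SW_CPU_CLOCK:
		ev->type = TOY_CPU_CLOCK;  /* name the pmu that should take it */
		return -ENOENT;
	case TOY_SW_TASK_CLOCK:
		ev->type = TOY_TASK_CLOCK;
		return -ENOENT;
	default:
		return 0;
	}
}

/* Assumption: each clock pmu matches on its own type id. */
static int toy_cpu_clock_init(struct toy_event *ev)
{
	return ev->type == TOY_CPU_CLOCK ? 0 : -ENOENT;
}

static int toy_task_clock_init(struct toy_event *ev)
{
	return ev->type == TOY_TASK_CLOCK ? 0 : -ENOENT;
}

static const toy_init_fn toy_by_type[TOY_NR_TYPES] = {
	[TOY_SOFTWARE]   = toy_swevent_init,
	[TOY_CPU_CLOCK]  = toy_cpu_clock_init,
	[TOY_TASK_CLOCK] = toy_task_clock_init,
};

/* Returns the type id of the accepting pmu, or a negative errno. */
static int toy_init_event(struct toy_event *ev)
{
	enum toy_type type;
	int ret;

	assert(ev != NULL && ev->type < TOY_NR_TYPES);
	type = ev->type;
again:
	ret = toy_by_type[type](ev);
	/* The pmu declined but rewrote the type: retry with that type. */
	if (ret == -ENOENT && ev->type != type) {
		type = ev->type;
		goto again;
	}
	return ret ? ret : (int)type;
}
```

Because the declining pmu names its successor directly, no walk over the full pmu list is needed in this model, which is the property the message above is after.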
  
Ravi Bangoria March 13, 2023, 12:29 p.m. UTC | #3
Hi Peter,

On 12-Mar-23 8:24 PM, Peter Zijlstra wrote:
> On Thu, Mar 09, 2023 at 03:41:10PM +0530, Ravi Bangoria wrote:
>> diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
>> index 8c45b198b62f..81d67b899371 100644
>> --- a/arch/x86/events/amd/core.c
>> +++ b/arch/x86/events/amd/core.c
>> @@ -371,10 +371,15 @@ static inline int amd_has_nb(struct cpu_hw_events *cpuc)
>>  static int amd_pmu_hw_config(struct perf_event *event)
>>  {
>>  	int ret;
>> +	u64 dummy;
>>  
>> -	/* pass precise event sampling to ibs: */
>> -	if (event->attr.precise_ip && get_ibs_caps())
>> -		return -ENOENT;
>> +	if (event->attr.precise_ip) {
>> +		/* pass precise event sampling to ibs by returning -ESRCH */
>> +		if (get_ibs_caps() && !ibs_core_pmu_event(event, &dummy))
>> +			return -ESRCH;
>> +		else
>> +			return -ENOENT;
>> +	}
>>  
>>  	if (has_branch_stack(event) && !x86_pmu.lbr_nr)
>>  		return -EOPNOTSUPP;
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index f79fd8b87f75..e990c71ba34a 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -11639,18 +11639,26 @@ static struct pmu *perf_init_event(struct perf_event *event)
>>  			goto again;
>>  		}
>>  
>> +		/*
>> +		 * pmu->event_init() should return -ESRCH only when it
>> +		 * wants to forward the event to other pmu.
>> +		 */
>> +		if (ret == -ESRCH)
>> +			goto try_all;
>> +
>>  		if (ret)
>>  			pmu = ERR_PTR(ret);
>>  
>>  		goto unlock;
>>  	}
>>  
>> +try_all:
>>  	list_for_each_entry_rcu(pmu, &pmus, entry, lockdep_is_held(&pmus_srcu)) {
>>  		ret = perf_try_init_event(pmu, event);
>>  		if (!ret)
>>  			goto unlock;
>>  
>> -		if (ret != -ENOENT) {
>> +		if (ret != -ENOENT && ret != -ESRCH) {
>>  			pmu = ERR_PTR(ret);
>>  			goto unlock;
>>  		}
> 
> Urgh.. So amd_pmu_hw_config() knows what PMU it should be but has no
> real way to communicate this, so you make it try all of them again.
> 
> Now, we already have a gruesome hack in there, and I'm thinking you
> should use that instead of adding yet another one. Note:
> 
> 		if (ret == -ENOENT && event->attr.type != type && !extended_type) {
> 			type = event->attr.type;
> 			goto again;
> 
> So if you have amd_pmu_hw_config() do:
> 
> 	event->attr.type = ibs_pmu.type;
> 	return -ENOENT;
> 
> it should all just work no?

The IBS driver needs to convert the RAW pmu config to an IBS config,
which it does based on the original event->attr.type; see
perf_ibs_precise_event(). That logic will fail if event->attr.type is
overwritten.

> 
> And now thinking about this, I'm thinking we can clean up the whole
> swevent mess too, a little something like the below perhaps... Then it
> might just be possible to remove that list_for_each_entry_rcu()
> entirely.
> 
> Hmm?

I'll check this and get back to you.

> 
> 
> ---
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index f79fd8b87f75..26130d1ca40b 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -9951,6 +9951,9 @@ static void sw_perf_event_destroy(struct perf_event *event)
>  	swevent_hlist_put();
>  }
>  
> +static struct pmu perf_cpu_clock; /* fwd declaration */
> +static struct pmu perf_task_clock;
> +
>  static int perf_swevent_init(struct perf_event *event)
>  {
>  	u64 event_id = event->attr.config;
> @@ -9966,7 +9969,11 @@ static int perf_swevent_init(struct perf_event *event)
>  
>  	switch (event_id) {
>  	case PERF_COUNT_SW_CPU_CLOCK:
> +		event->attr.type = perf_cpu_clock.type;
> +		return -ENOENT;
> +
>  	case PERF_COUNT_SW_TASK_CLOCK:
> +		event->attr.type = perf_task_clock.type;
>  		return -ENOENT;
>  
>  	default:

Thanks,
Ravi
  
Ravi Bangoria March 13, 2023, 12:35 p.m. UTC | #4
>> +               /*
>> +                * pmu->event_init() should return -ESRCH only when it
>> +                * wants to forward the event to other pmu.
>> +                */
> 
> Can we add this to the comment in the struct pmu?  There is a
> description already for other error codes.

Sure. I'll update there if we continue with this approach.

> 
> Otherwise looks good.

Thanks,
Ravi
  
Peter Zijlstra March 13, 2023, 2:21 p.m. UTC | #5
On Mon, Mar 13, 2023 at 05:59:46PM +0530, Ravi Bangoria wrote:

> > Now, we already have a gruesome hack in there, and I'm thinking you
> > should use that instead of adding yet another one. Note:
> > 
> > 		if (ret == -ENOENT && event->attr.type != type && !extended_type) {
> > 			type = event->attr.type;
> > 			goto again;
> > 
> > So if you have amd_pmu_hw_config() do:
> > 
> > 	event->attr.type = ibs_pmu.type;
> > 	return -ENOENT;
> > 
> > it should all just work no?
> 
> The IBS driver needs to convert the RAW pmu config to an IBS config,
> which it does based on the original event->attr.type; see
> perf_ibs_precise_event(). That logic will fail if event->attr.type is
> overwritten.

amd_pmu_hw_config() could also rewrite event->attr.config I suppose.

I don't think we actually use/expose these event->attr fields again
after all this, do we?

The closest to that is perf_event_modify_attr(), but that is limited to
TYPE_BREAKPOINT for the time being (also, could this be used to cure
your in-kernel IBS usage woes?).
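
The objection and the proposed answer can be illustrated with a toy model. This is a sketch under stated assumptions, not the kernel implementation: the `toy_*` names are hypothetical, `toy_ibs_precise_event()` only approximates what the thread says perf_ibs_precise_event() does (translation keyed on the original attr.type), and the translated config values, including the bit used for micro-ops, are illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum toy_type { TOY_HARDWARE, TOY_RAW, TOY_IBS_OP };

#define TOY_HW_CPU_CYCLES  0ULL
/* Assumed count-control bit for micro-op counting; illustrative value. */
#define TOY_IBS_OP_CNT_CTL (1ULL << 19)

struct toy_attr {
	enum toy_type type;
	uint64_t config;
	int precise_ip;
};

/* IBS side: translate a core event to an IBS config, keyed on attr.type. */
static int toy_ibs_precise_event(const struct toy_attr *attr, uint64_t *config)
{
	assert(attr != NULL && config != NULL);

	if (!attr->precise_ip)
		return -ENOENT;

	switch (attr->type) {
	case TOY_HARDWARE:
		if (attr->config == TOY_HW_CPU_CYCLES) {
			*config = 0;
			return 0;
		}
		break;
	case TOY_RAW:
		if (attr->config == 0x76) {
			*config = 0;
			return 0;
		}
		if (attr->config == 0xC1) {
			*config = TOY_IBS_OP_CNT_CTL;
			return 0;
		}
		break;
	default:
		return -ENOENT;
	}
	return -EOPNOTSUPP;
}

/* Core side, type-only redirect: the translation key is lost. */
static int toy_redirect_type_only(struct toy_attr *attr)
{
	attr->type = TOY_IBS_OP;
	return -ENOENT;
}

/* Core side, redirect that rewrites config before overwriting type. */
static int toy_redirect_with_config(struct toy_attr *attr)
{
	uint64_t ibs_config;
	int ret = toy_ibs_precise_event(attr, &ibs_config);

	if (ret)
		return ret;
	attr->config = ibs_config;  /* already in IBS terms */
	attr->type = TOY_IBS_OP;    /* tells the core which pmu to retry */
	return -ENOENT;
}
```

With the type-only redirect, a later translation attempt on the IBS side no longer sees a RAW event and fails in this model. With the second variant the event arrives as a native IBS event, so the IBS side would not need to translate at all.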
  
Ravi Bangoria March 13, 2023, 3:40 p.m. UTC | #6
On 13-Mar-23 7:51 PM, Peter Zijlstra wrote:
> On Mon, Mar 13, 2023 at 05:59:46PM +0530, Ravi Bangoria wrote:
> 
>>> Now, we already have a gruesome hack in there, and I'm thinking you
>>> should use that instead of adding yet another one. Note:
>>>
>>> 		if (ret == -ENOENT && event->attr.type != type && !extended_type) {
>>> 			type = event->attr.type;
>>> 			goto again;
>>>
>>> So if you have amd_pmu_hw_config() do:
>>>
>>> 	event->attr.type = ibs_pmu.type;
>>> 	return -ENOENT;
>>>
>>> it should all just work no?
>>
>> The IBS driver needs to convert the RAW pmu config to an IBS config,
>> which it does based on the original event->attr.type; see
>> perf_ibs_precise_event(). That logic will fail if event->attr.type is
>> overwritten.
> 
> amd_pmu_hw_config() could also rewrite event->attr.config I suppose.

This might work. Let me try.

> 
> I don't think we actually use/expose these event->attr fields again
> after all this, do we?

I'll confirm this as well.

> 
> The closest to that is perf_event_modify_attr(), but that is limited to
> TYPE_BREAKPOINT for the time being (also, could this be used to cure
> your in-kernel IBS usage woes?).

I think doing it transparently at the arch layer would be better.

Thanks,
Ravi
  

Patch

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index 8c45b198b62f..81d67b899371 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -371,10 +371,15 @@  static inline int amd_has_nb(struct cpu_hw_events *cpuc)
 static int amd_pmu_hw_config(struct perf_event *event)
 {
 	int ret;
+	u64 dummy;
 
-	/* pass precise event sampling to ibs: */
-	if (event->attr.precise_ip && get_ibs_caps())
-		return -ENOENT;
+	if (event->attr.precise_ip) {
+		/* pass precise event sampling to ibs by returning -ESRCH */
+		if (get_ibs_caps() && !ibs_core_pmu_event(event, &dummy))
+			return -ESRCH;
+		else
+			return -ENOENT;
+	}
 
 	if (has_branch_stack(event) && !x86_pmu.lbr_nr)
 		return -EOPNOTSUPP;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f79fd8b87f75..e990c71ba34a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11639,18 +11639,26 @@  static struct pmu *perf_init_event(struct perf_event *event)
 			goto again;
 		}
 
+		/*
+		 * pmu->event_init() should return -ESRCH only when it
+		 * wants to forward the event to other pmu.
+		 */
+		if (ret == -ESRCH)
+			goto try_all;
+
 		if (ret)
 			pmu = ERR_PTR(ret);
 
 		goto unlock;
 	}
 
+try_all:
 	list_for_each_entry_rcu(pmu, &pmus, entry, lockdep_is_held(&pmus_srcu)) {
 		ret = perf_try_init_event(pmu, event);
 		if (!ret)
 			goto unlock;
 
-		if (ret != -ENOENT) {
+		if (ret != -ENOENT && ret != -ESRCH) {
 			pmu = ERR_PTR(ret);
 			goto unlock;
 		}