[0/1] Support PERF_SAMPLE_READ with inherit_stat

Message ID 20240119163924.2801678-1-ben.gainey@arm.com
Series Support PERF_SAMPLE_READ with inherit_stat

Message

Ben Gainey Jan. 19, 2024, 4:39 p.m. UTC
This change allows events to use PERF_SAMPLE_READ with inherit so long
as both inherit_stat and PERF_SAMPLE_TID are set.

Currently it is not possible to use PERF_SAMPLE_READ with inherit. This 
restriction assumes the user is interested in collecting aggregate 
statistics as per `perf stat`. It prevents a user from collecting 
per-thread samples using counter groups from a multi-threaded or 
multi-process application, as with `perf record -e '{....}:S'`. Instead 
users must use system-wide mode, or forgo the ability to sample counter 
groups. System-wide mode is often problematic as it requires specific
permissions (CAP_PERFMON or root access), or may lead to capture of
significant amounts of extra data from other processes running on the 
system. 

Perf already supports the ability to collect per-thread counts with
`inherit` via the `inherit_stat` flag. This patch relaxes the check in
`perf_event_alloc` that rejects `inherit` combined with
`PERF_SAMPLE_READ`, so that the combination is allowed so long as both
`inherit_stat` and `PERF_SAMPLE_TID` are enabled.
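
As a rough illustration of the rule described above, the following is a
user-space model only, not the kernel code touched by the patch. The
function names `sample_read_inherit_allowed` and `init_leader_attr` are
hypothetical, and the event type, sample period and `read_format` values
are assumptions chosen to resemble a `perf record -e '{....}:S'` group
leader.

```c
#include <linux/perf_event.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical model of the relaxed rule: inherit + PERF_SAMPLE_READ is
 * accepted only when inherit_stat and PERF_SAMPLE_TID are also set.
 * This is a sketch, not the check in kernel/events/core.c.
 */
static bool sample_read_inherit_allowed(const struct perf_event_attr *attr)
{
	bool wants_read = (attr->sample_type & PERF_SAMPLE_READ) != 0;
	bool has_tid = (attr->sample_type & PERF_SAMPLE_TID) != 0;

	/* The rule only concerns inherit combined with PERF_SAMPLE_READ. */
	if (!attr->inherit || !wants_read)
		return true;

	return attr->inherit_stat && has_tid;
}

/* Fill in a group-leader attr for the per-thread sampling case. */
static void init_leader_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;          /* assumed event type */
	attr->config = PERF_COUNT_HW_CPU_CYCLES;  /* assumed event */
	attr->sample_period = 1000000;            /* assumed period */
	attr->sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_READ;
	attr->read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID;
	attr->inherit = 1;
	attr->inherit_stat = 1;
	attr->disabled = 1;
}
```

An attr prepared by `init_leader_attr` satisfies the modelled rule;
clearing either `inherit_stat` or `PERF_SAMPLE_TID` while leaving
`inherit` set makes the model reject it.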

In this configuration stream ids (such as may appear in the read_format
field of a PERF_RECORD_SAMPLE) are no longer globally unique; rather,
the pair (stream id, tid) uniquely identifies each event. Tools that
rely on stream ids being unique, for example to calculate a delta
between samples, would need updating to take this into account.
Previously valid event configurations (system-wide, no-inherit and so
on), where the stream id alone is the identifier, are unaffected.
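
For tool authors, the change in keying might look something like the
sketch below. It is only an illustration: `struct delta_table`,
`delta_update` and `MAX_SLOTS` are hypothetical names, the fixed-size
table with linear search is chosen for brevity, and the first sample
for a given (stream id, tid) pair is assumed to be measured from zero.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SLOTS 64  /* hypothetical capacity, for illustration only */

struct counter_slot {
	uint64_t stream_id;
	uint32_t tid;
	uint64_t last_value;
};

struct delta_table {
	struct counter_slot slots[MAX_SLOTS];
	size_t used;
};

/*
 * Record a sampled counter value and report the delta since the previous
 * sample with the same (stream_id, tid) pair. Returns 0 on success, or -1
 * if the table is full.
 */
static int delta_update(struct delta_table *t, uint64_t stream_id,
			uint32_t tid, uint64_t value, uint64_t *delta)
{
	for (size_t i = 0; i < t->used; i++) {
		struct counter_slot *s = &t->slots[i];

		/* Match on both fields: the stream id alone is ambiguous. */
		if (s->stream_id == stream_id && s->tid == tid) {
			*delta = value - s->last_value;
			s->last_value = value;
			return 0;
		}
	}

	if (t->used == MAX_SLOTS)
		return -1;

	/* First sample for this pair: assume the count started at zero. */
	t->slots[t->used].stream_id = stream_id;
	t->slots[t->used].tid = tid;
	t->slots[t->used].last_value = value;
	t->used++;
	*delta = value;
	return 0;
}
```

Keying on the stream id alone would mix the counts of two threads that
share an inherited event, so the deltas would be meaningless.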

This patch has been tested on aarch64 both by manual inspection of the 
output of `perf script -D` and through a modified version of Arm's 
commercial profiling tools and the numbers appear to line up as one 
would expect, but some further validation across other architectures 
and/or edge cases would be welcome.

This patch was developed and tested on top of v6.7.


Ben Gainey (1):
  perf: Support PERF_SAMPLE_READ with inherit_stat

 kernel/events/core.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
  

Comments

Namhyung Kim Jan. 20, 2024, 12:49 a.m. UTC | #1
Hello,

On Fri, Jan 19, 2024 at 8:39 AM Ben Gainey <ben.gainey@arm.com> wrote:
>
> This change allows events to use PERF_SAMPLE_READ with inherit so long
> as both inherit_stat and PERF_SAMPLE_TID are set.
>
> Currently it is not possible to use PERF_SAMPLE_READ with inherit. This
> restriction assumes the user is interested in collecting aggregate
> statistics as per `perf stat`. It prevents a user from collecting
> per-thread samples using counter groups from a multi-threaded or
> multi-process application, as with `perf record -e '{....}:S'`. Instead
> users must use system-wide mode, or forgo the ability to sample counter
> groups. System-wide mode is often problematic as it requires specific
> permissions (CAP_PERFMON or root access), or may lead to capture of
> significant amounts of extra data from other processes running on the
> system.
>
> Perf already supports the ability to collect per-thread counts with
> `inherit` via the `inherit_stat` flag. This patch changes
> `perf_event_alloc` relaxing the restriction to combine `inherit` with
> `PERF_SAMPLE_READ` so that the combination will be allowed so long as
> `inherit_stat` and `PERF_SAMPLE_TID` are enabled.

I'm not sure this is correct.  Maybe I misunderstand inherit_stat but
AFAIK it just reuses prev_task's events when next_task has a
compatible event context.  So the event values seen in samples
would depend on timing or scheduler behavior.

Also, the event counts and time values PERF_SAMPLE_READ sees
include the child events' values, so the values of the parent event
can be updated even if it's inactive.  And the values will vary for
the next_task depending on whether prev_task is the parent or not.
I think it would return consistent values only if it iterated over
all child events and summed up the values like it does for read(2).
But it cannot do that in the NMI handler.

Frankly I don't understand how inherit_stat supports per-thread
counts properly.  Also it doesn't seem to be used by default in
the perf tools.  IIUC per-thread count is supported when you
don't set the inherit bit and open separate events for each
thread but I guess that's not what you want.

Anyway, I'm ok with the idea of using PERF_SAMPLE_READ to
improve per-thread profiling especially with event groups.
But I think it should not use inherit_stat and it needs a way to
not include child stats in the samples.

What do you think?

Thanks,
Namhyung

>
> In this configuration stream ids (such as may appear in the read_format
> field of a PERF_RECORD_SAMPLE) are no longer globally unique, rather
> the pair of (stream id, tid) uniquely identify each event. Tools that
> rely on this, for example to calculate a delta between samples, would
> need updating to take this into account. Previously valid event
> configurations (system-wide, no-inherit and so on) where each stream id
> is the identifier are unaffected.
>
> This patch has been tested on aarch64 both by manual inspection of the
> output of `perf script -D` and through a modified version of Arm's
> commercial profiling tools and the numbers appear to line up as one
> would expect, but some further validation across other architectures
> and/or edge cases would be welcome.
>
> This patch was developed and tested on top of v6.7.
>
>
> Ben Gainey (1):
>   perf: Support PERF_SAMPLE_READ with inherit_stat
>
>  kernel/events/core.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> --
> 2.43.0
>