perf: fix perf_event_context->time

Message ID 20230228192145.2120675-1-song@kernel.org
State New
Headers
Series perf: fix perf_event_context->time |

Commit Message

Song Liu Feb. 28, 2023, 7:21 p.m. UTC
  Time readers rely on perf_event_context->[time|timestamp|timeoffset] to get
accurate time_enabled and time_running for an event. The difference between
ctx->timestamp and ctx->time is the among of time when the context is not
enabled. For cpuctx.ctx, time and timestamp should stay the same, however,
it is not the case at the moment. To show this with drgn [1]:

    drgn 0.0.22 (using Python 3.8.6, elfutils 0.185, with libkdumpfile)
    For help, type help(drgn).
    ...
    >>> ctx = per_cpu_ptr(prog['pmu'].pmu_cpu_context, 0).ctx
    >>> ctx.timestamp * 1.0 / ctx.time
    (double)1.0385869134111765
    >>>

ctx->timestamp is about 4% larger than ctx.time. This issue causes time
read by perf_event_read_local() goes back in some cases.

Fix this by updating the condition for __update_context_time(ctx, false).
Specifically, it should only be called when we enable EVENT_TIME for the
ctx.

[1] drgn: https://github.com/osandov/drgn
Fixes: 09f5e7dc7ad7 ("perf: Fix perf_event_read_local() time")
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
---
 kernel/events/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Namhyung Kim March 1, 2023, 10:29 p.m. UTC | #1
Hi Song,

On Tue, Feb 28, 2023 at 11:22 AM Song Liu <song@kernel.org> wrote:
>
> Time readers rely on perf_event_context->[time|timestamp|timeoffset] to get
> accurate time_enabled and time_running for an event. The difference between
> ctx->timestamp and ctx->time is the among of time when the context is not
> enabled. For cpuctx.ctx, time and timestamp should stay the same, however,

I'm not sure if it's correct.  The timestamp can go when the context is disabled
for example, in ctx_resched() even if the NMI watchdog is enabled, right?

> it is not the case at the moment. To show this with drgn [1]:
>
>     drgn 0.0.22 (using Python 3.8.6, elfutils 0.185, with libkdumpfile)
>     For help, type help(drgn).
>     ...
>     >>> ctx = per_cpu_ptr(prog['pmu'].pmu_cpu_context, 0).ctx
>     >>> ctx.timestamp * 1.0 / ctx.time
>     (double)1.0385869134111765
>     >>>
>
> ctx->timestamp is about 4% larger than ctx.time. This issue causes time
> read by perf_event_read_local() goes back in some cases.

I don't think the difference between the ctx.time and the timestamp
itself is a problem.  I think the problem is it CAN update the timestamp
(and the timeoffset) while the context is enabled.  Then depending on
the timing, event times can return smaller values than before.

>
> Fix this by updating the condition for __update_context_time(ctx, false).
> Specifically, it should only be called when we enable EVENT_TIME for the
> ctx.

So this change fixes the condition to update the timestamp only if
it moves from disabled to enabled state.

>
> [1] drgn: https://github.com/osandov/drgn
> Fixes: 09f5e7dc7ad7 ("perf: Fix perf_event_read_local() time")
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Signed-off-by: Song Liu <song@kernel.org>

Actually I got some similar reports lately but couldn't find time to
dig in deeply.  Thanks for the fix!

Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks,
Namhyung


> ---
>  kernel/events/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 380476a934e8..67478f43e26e 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3872,7 +3872,7 @@ ctx_sched_in(struct perf_event_context *ctx, enum event_type_t event_type)
>         if (likely(!ctx->nr_events))
>                 return;
>
> -       if (is_active ^ EVENT_TIME) {
> +       if (!(is_active & EVENT_TIME)) {
>                 /* start ctx time */
>                 __update_context_time(ctx, false);
>                 perf_cgroup_set_timestamp(cpuctx);
> --
> 2.30.2
>
  
Song Liu March 1, 2023, 11:16 p.m. UTC | #2
> On Mar 1, 2023, at 2:29 PM, Namhyung Kim <namhyung@kernel.org> wrote:
> 
> Hi Song,
> 
> On Tue, Feb 28, 2023 at 11:22 AM Song Liu <song@kernel.org> wrote:
>> 
>> Time readers rely on perf_event_context->[time|timestamp|timeoffset] to get
>> accurate time_enabled and time_running for an event. The difference between
>> ctx->timestamp and ctx->time is the among of time when the context is not
>> enabled. For cpuctx.ctx, time and timestamp should stay the same, however,
> 
> I'm not sure if it's correct.  The timestamp can go when the context is disabled
> for example, in ctx_resched() even if the NMI watchdog is enabled, right?

I think we do not disable EVENT_TIME for per cpu ctx? 

> 
>> it is not the case at the moment. To show this with drgn [1]:
>> 
>>    drgn 0.0.22 (using Python 3.8.6, elfutils 0.185, with libkdumpfile)
>>    For help, type help(drgn).
>>    ...
>>>>> ctx = per_cpu_ptr(prog['pmu'].pmu_cpu_context, 0).ctx
>>>>> ctx.timestamp * 1.0 / ctx.time
>>    (double)1.0385869134111765
>>>>> 
>> 
>> ctx->timestamp is about 4% larger than ctx.time. This issue causes time
>> read by perf_event_read_local() goes back in some cases.
> 
> I don't think the difference between the ctx.time and the timestamp
> itself is a problem.  I think the problem is it CAN update the timestamp
> (and the timeoffset) while the context is enabled.  Then depending on
> the timing, event times can return smaller values than before.

For per cpu ctx, I think timeoffset should stay the same (may not be zero 
though). 

Thanks,
Song
  
Namhyung Kim March 2, 2023, 9:15 p.m. UTC | #3
On Wed, Mar 1, 2023 at 3:16 PM Song Liu <songliubraving@meta.com> wrote:
>
>
>
> > On Mar 1, 2023, at 2:29 PM, Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Hi Song,
> >
> > On Tue, Feb 28, 2023 at 11:22 AM Song Liu <song@kernel.org> wrote:
> >>
> >> Time readers rely on perf_event_context->[time|timestamp|timeoffset] to get
> >> accurate time_enabled and time_running for an event. The difference between
> >> ctx->timestamp and ctx->time is the among of time when the context is not
> >> enabled. For cpuctx.ctx, time and timestamp should stay the same, however,
> >
> > I'm not sure if it's correct.  The timestamp can go when the context is disabled
> > for example, in ctx_resched() even if the NMI watchdog is enabled, right?
>
> I think we do not disable EVENT_TIME for per cpu ctx?

I can see ctx_sched_out(ctx, EVENT_TIME) in some places.
Also it'd reset EVENT_TIME if both _PINNED and _FLEXIBLE is
cleared.

Thanks,
Namhyung

>
> >
> >> it is not the case at the moment. To show this with drgn [1]:
> >>
> >>    drgn 0.0.22 (using Python 3.8.6, elfutils 0.185, with libkdumpfile)
> >>    For help, type help(drgn).
> >>    ...
> >>>>> ctx = per_cpu_ptr(prog['pmu'].pmu_cpu_context, 0).ctx
> >>>>> ctx.timestamp * 1.0 / ctx.time
> >>    (double)1.0385869134111765
> >>>>>
> >>
> >> ctx->timestamp is about 4% larger than ctx.time. This issue causes time
> >> read by perf_event_read_local() goes back in some cases.
> >
> > I don't think the difference between the ctx.time and the timestamp
> > itself is a problem.  I think the problem is it CAN update the timestamp
> > (and the timeoffset) while the context is enabled.  Then depending on
> > the timing, event times can return smaller values than before.
>
> For per cpu ctx, I think timeoffset should stay the same (may not be zero
> though).
>
> Thanks,
> Song
>
>
  
Song Liu March 3, 2023, 1:21 a.m. UTC | #4
> On Mar 2, 2023, at 1:15 PM, Namhyung Kim <namhyung@kernel.org> wrote:
> 
> On Wed, Mar 1, 2023 at 3:16 PM Song Liu <songliubraving@meta.com> wrote:
>> 
>> 
>> 
>>> On Mar 1, 2023, at 2:29 PM, Namhyung Kim <namhyung@kernel.org> wrote:
>>> 
>>> Hi Song,
>>> 
>>> On Tue, Feb 28, 2023 at 11:22 AM Song Liu <song@kernel.org> wrote:
>>>> 
>>>> Time readers rely on perf_event_context->[time|timestamp|timeoffset] to get
>>>> accurate time_enabled and time_running for an event. The difference between
>>>> ctx->timestamp and ctx->time is the among of time when the context is not
>>>> enabled. For cpuctx.ctx, time and timestamp should stay the same, however,
>>> 
>>> I'm not sure if it's correct.  The timestamp can go when the context is disabled
>>> for example, in ctx_resched() even if the NMI watchdog is enabled, right?
>> 
>> I think we do not disable EVENT_TIME for per cpu ctx?
> 
> I can see ctx_sched_out(ctx, EVENT_TIME) in some places.
> Also it'd reset EVENT_TIME if both _PINNED and _FLEXIBLE is
> cleared.

Yeah, you are right. I missed this part. 

Hi Peter, 

Does this fix look good do you?

Thanks,
Song
  

Patch

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 380476a934e8..67478f43e26e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3872,7 +3872,7 @@  ctx_sched_in(struct perf_event_context *ctx, enum event_type_t event_type)
 	if (likely(!ctx->nr_events))
 		return;
 
-	if (is_active ^ EVENT_TIME) {
+	if (!(is_active & EVENT_TIME)) {
 		/* start ctx time */
 		__update_context_time(ctx, false);
 		perf_cgroup_set_timestamp(cpuctx);