Message ID | 20230228192145.2120675-1-song@kernel.org |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp3210573wrd; Tue, 28 Feb 2023 11:31:44 -0800 (PST) X-Google-Smtp-Source: AK7set+RwfoipvLJy0ybVAaUFz2j0rpiSDhDf1TmBs2dcRYEqP/VloW3FVFD+pCKTJ7R5Xs2nowp X-Received: by 2002:aa7:c414:0:b0:4b8:9a32:ec07 with SMTP id j20-20020aa7c414000000b004b89a32ec07mr4653933edq.39.1677612704273; Tue, 28 Feb 2023 11:31:44 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1677612704; cv=none; d=google.com; s=arc-20160816; b=QldZBHO/+iorHobDvQGbAMybOrScLuivh7aE3WDSiyX5XrfBYRN4aITbg5rB/ZDoj3 FwKNF1eykFdP0kVSbXfgVTZrq1XY1ThV/j68LTOYUHH+yGp/b7HZzEDjCw9s3Ma8g5W7 kMpikHAs2IL9w80IuSlGOCoG+s/V3jtoY9F6hi1RWboA3lxU7TBeCRmS+yomwfh9VekA rCyxB8rTMhI1li1OlgbynGWCZG+VCB/IWYmSiTyUb3kMxMwJfH9NFIcoVV2YK4xouPS1 yPqQAZzLoAEqxvdJnjTIGmOruuCnYdgubBEFw25g9xNUKUe1jMQVm5kE0JQvQ/r7K6OB SIkg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:content-transfer-encoding :message-id:date:subject:cc:to:from; bh=NWPadobJEqUfwEWoHXHpEMD3KVIxKTjVDDzbWqh+9I0=; b=ZuGFJ+7ZNu0qj8Nv+YMT9y7akMu02xy4+q5g7Sl9Wxo9mNwYyKM1IIONtfG3IJatVw hDSnbu54845r3POmCAhSNAVd3yMZ9FLhg56vXvyCIuOC1Oz+9QLoWtNQbegpvTzAyhFH fjCe9voN7+JYhMx7rzJoCqjCsm7gH4h8wnw2/ucWflQ6dQZki2LHWB/Yu/ll8Ha/YZsp ikPOJFVaMgJaV6LI4Fz+5eyyZQOR18sC6UvjWSlBKZwntN7GoQoLAxSFnUs1PY4pnilp gRQOTSNAFDdUEKIu7vNifSFLnjl3JIhyHUlSYaQPljtHyi064zu2v7WABaxaw3Ooq5NI DUrQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id u13-20020aa7db8d000000b004ad7204b7e4si11048109edt.500.2023.02.28.11.31.21; Tue, 28 Feb 2023 11:31:44 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229903AbjB1TWA convert rfc822-to-8bit (ORCPT <rfc822;brysonjbanks@gmail.com> + 99 others); Tue, 28 Feb 2023 14:22:00 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53878 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229470AbjB1TV6 (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Tue, 28 Feb 2023 14:21:58 -0500 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A33831A6 for <linux-kernel@vger.kernel.org>; Tue, 28 Feb 2023 11:21:56 -0800 (PST) Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 31SIOUKo005371 for <linux-kernel@vger.kernel.org>; Tue, 28 Feb 2023 11:21:56 -0800 Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3p1f6hksvt-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for <linux-kernel@vger.kernel.org>; Tue, 28 Feb 2023 11:21:56 -0800 Received: from twshared18553.27.frc3.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.17; Tue, 28 Feb 2023 11:21:54 -0800 Received: by devbig932.frc1.facebook.com (Postfix, from userid 4523) id 93814172324C9; Tue, 28 Feb 2023 11:21:47 -0800 (PST) From: Song Liu <song@kernel.org> To: <linux-kernel@vger.kernel.org> CC: <kernel-team@meta.com>, Song Liu <song@kernel.org>, Peter Zijlstra <peterz@infradead.org>, Namhyung Kim <namhyung@kernel.org> Subject: [PATCH] perf: fix perf_event_context->time Date: Tue, 28 Feb 2023 11:21:45 -0800 Message-ID: <20230228192145.2120675-1-song@kernel.org> X-Mailer: git-send-email 2.30.2 X-FB-Internal: Safe Content-Type: text/plain X-Proofpoint-ORIG-GUID: X1EUv-MQOayHSKzHgXQ_39_8hTzy6Z68 X-Proofpoint-GUID: X1EUv-MQOayHSKzHgXQ_39_8hTzy6Z68 Content-Transfer-Encoding: 8BIT X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.219,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-02-28_17,2023-02-28_03,2023-02-09_01 X-Spam-Status: No, score=-2.3 required=5.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_LOW,RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1759104419208909843?= X-GMAIL-MSGID: =?utf-8?q?1759104419208909843?= |
Series |
perf: fix perf_event_context->time
|
|
Commit Message
Song Liu
Feb. 28, 2023, 7:21 p.m. UTC
Time readers rely on perf_event_context->[time|timestamp|timeoffset] to get accurate time_enabled and time_running for an event. The difference between ctx->timestamp and ctx->time is the among of time when the context is not enabled. For cpuctx.ctx, time and timestamp should stay the same, however, it is not the case at the moment. To show this with drgn [1]: drgn 0.0.22 (using Python 3.8.6, elfutils 0.185, with libkdumpfile) For help, type help(drgn). ... >>> ctx = per_cpu_ptr(prog['pmu'].pmu_cpu_context, 0).ctx >>> ctx.timestamp * 1.0 / ctx.time (double)1.0385869134111765 >>> ctx->timestamp is about 4% larger than ctx.time. This issue causes time read by perf_event_read_local() goes back in some cases. Fix this by updating the condition for __update_context_time(ctx, false). Specifically, it should only be called when we enable EVENT_TIME for the ctx. [1] drgn: https://github.com/osandov/drgn Fixes: 09f5e7dc7ad7 ("perf: Fix perf_event_read_local() time") Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Song Liu <song@kernel.org> --- kernel/events/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
Comments
Hi Song, On Tue, Feb 28, 2023 at 11:22 AM Song Liu <song@kernel.org> wrote: > > Time readers rely on perf_event_context->[time|timestamp|timeoffset] to get > accurate time_enabled and time_running for an event. The difference between > ctx->timestamp and ctx->time is the among of time when the context is not > enabled. For cpuctx.ctx, time and timestamp should stay the same, however, I'm not sure if it's correct. The timestamp can go when the context is disabled for example, in ctx_resched() even if the NMI watchdog is enabled, right? > it is not the case at the moment. To show this with drgn [1]: > > drgn 0.0.22 (using Python 3.8.6, elfutils 0.185, with libkdumpfile) > For help, type help(drgn). > ... > >>> ctx = per_cpu_ptr(prog['pmu'].pmu_cpu_context, 0).ctx > >>> ctx.timestamp * 1.0 / ctx.time > (double)1.0385869134111765 > >>> > > ctx->timestamp is about 4% larger than ctx.time. This issue causes time > read by perf_event_read_local() goes back in some cases. I don't think the difference between the ctx.time and the timestamp itself is a problem. I think the problem is it CAN update the timestamp (and the timeoffset) while the context is enabled. Then depending on the timing, event times can return smaller values than before. > > Fix this by updating the condition for __update_context_time(ctx, false). > Specifically, it should only be called when we enable EVENT_TIME for the > ctx. So this change fixes the condition to update the timestamp only if it moves from disabled to enabled state. > > [1] drgn: https://github.com/osandov/drgn > Fixes: 09f5e7dc7ad7 ("perf: Fix perf_event_read_local() time") > Cc: Peter Zijlstra (Intel) <peterz@infradead.org> > Cc: Namhyung Kim <namhyung@kernel.org> > Signed-off-by: Song Liu <song@kernel.org> Actually I got some similar reports lately but couldn't find time to dig in deeply. Thanks for the fix! Acked-by: Namhyung Kim <namhyung@kernel.org> Thanks, Namhyung > --- > kernel/events/core.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/events/core.c b/kernel/events/core.c > index 380476a934e8..67478f43e26e 100644 > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -3872,7 +3872,7 @@ ctx_sched_in(struct perf_event_context *ctx, enum event_type_t event_type) > if (likely(!ctx->nr_events)) > return; > > - if (is_active ^ EVENT_TIME) { > + if (!(is_active & EVENT_TIME)) { > /* start ctx time */ > __update_context_time(ctx, false); > perf_cgroup_set_timestamp(cpuctx); > -- > 2.30.2 >
> On Mar 1, 2023, at 2:29 PM, Namhyung Kim <namhyung@kernel.org> wrote: > > Hi Song, > > On Tue, Feb 28, 2023 at 11:22 AM Song Liu <song@kernel.org> wrote: >> >> Time readers rely on perf_event_context->[time|timestamp|timeoffset] to get >> accurate time_enabled and time_running for an event. The difference between >> ctx->timestamp and ctx->time is the among of time when the context is not >> enabled. For cpuctx.ctx, time and timestamp should stay the same, however, > > I'm not sure if it's correct. The timestamp can go when the context is disabled > for example, in ctx_resched() even if the NMI watchdog is enabled, right? I think we do not disable EVENT_TIME for per cpu ctx? > >> it is not the case at the moment. To show this with drgn [1]: >> >> drgn 0.0.22 (using Python 3.8.6, elfutils 0.185, with libkdumpfile) >> For help, type help(drgn). >> ... >>>>> ctx = per_cpu_ptr(prog['pmu'].pmu_cpu_context, 0).ctx >>>>> ctx.timestamp * 1.0 / ctx.time >> (double)1.0385869134111765 >>>>> >> >> ctx->timestamp is about 4% larger than ctx.time. This issue causes time >> read by perf_event_read_local() goes back in some cases. > > I don't think the difference between the ctx.time and the timestamp > itself is a problem. I think the problem is it CAN update the timestamp > (and the timeoffset) while the context is enabled. Then depending on > the timing, event times can return smaller values than before. For per cpu ctx, I think timeoffset should stay the same (may not be zero though). Thanks, Song
On Wed, Mar 1, 2023 at 3:16 PM Song Liu <songliubraving@meta.com> wrote: > > > > > On Mar 1, 2023, at 2:29 PM, Namhyung Kim <namhyung@kernel.org> wrote: > > > > Hi Song, > > > > On Tue, Feb 28, 2023 at 11:22 AM Song Liu <song@kernel.org> wrote: > >> > >> Time readers rely on perf_event_context->[time|timestamp|timeoffset] to get > >> accurate time_enabled and time_running for an event. The difference between > >> ctx->timestamp and ctx->time is the among of time when the context is not > >> enabled. For cpuctx.ctx, time and timestamp should stay the same, however, > > > > I'm not sure if it's correct. The timestamp can go when the context is disabled > > for example, in ctx_resched() even if the NMI watchdog is enabled, right? > > I think we do not disable EVENT_TIME for per cpu ctx? I can see ctx_sched_out(ctx, EVENT_TIME) in some places. Also it'd reset EVENT_TIME if both _PINNED and _FLEXIBLE is cleared. Thanks, Namhyung > > > > >> it is not the case at the moment. To show this with drgn [1]: > >> > >> drgn 0.0.22 (using Python 3.8.6, elfutils 0.185, with libkdumpfile) > >> For help, type help(drgn). > >> ... > >>>>> ctx = per_cpu_ptr(prog['pmu'].pmu_cpu_context, 0).ctx > >>>>> ctx.timestamp * 1.0 / ctx.time > >> (double)1.0385869134111765 > >>>>> > >> > >> ctx->timestamp is about 4% larger than ctx.time. This issue causes time > >> read by perf_event_read_local() goes back in some cases. > > > > I don't think the difference between the ctx.time and the timestamp > > itself is a problem. I think the problem is it CAN update the timestamp > > (and the timeoffset) while the context is enabled. Then depending on > > the timing, event times can return smaller values than before. > > For per cpu ctx, I think timeoffset should stay the same (may not be zero > though). > > Thanks, > Song > >
> On Mar 2, 2023, at 1:15 PM, Namhyung Kim <namhyung@kernel.org> wrote: > > On Wed, Mar 1, 2023 at 3:16 PM Song Liu <songliubraving@meta.com> wrote: >> >> >> >>> On Mar 1, 2023, at 2:29 PM, Namhyung Kim <namhyung@kernel.org> wrote: >>> >>> Hi Song, >>> >>> On Tue, Feb 28, 2023 at 11:22 AM Song Liu <song@kernel.org> wrote: >>>> >>>> Time readers rely on perf_event_context->[time|timestamp|timeoffset] to get >>>> accurate time_enabled and time_running for an event. The difference between >>>> ctx->timestamp and ctx->time is the among of time when the context is not >>>> enabled. For cpuctx.ctx, time and timestamp should stay the same, however, >>> >>> I'm not sure if it's correct. The timestamp can go when the context is disabled >>> for example, in ctx_resched() even if the NMI watchdog is enabled, right? >> >> I think we do not disable EVENT_TIME for per cpu ctx? > > I can see ctx_sched_out(ctx, EVENT_TIME) in some places. > Also it'd reset EVENT_TIME if both _PINNED and _FLEXIBLE is > cleared. Yeah, you are right. I missed this part. Hi Peter, Does this fix look good do you? Thanks, Song
diff --git a/kernel/events/core.c b/kernel/events/core.c index 380476a934e8..67478f43e26e 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -3872,7 +3872,7 @@ ctx_sched_in(struct perf_event_context *ctx, enum event_type_t event_type) if (likely(!ctx->nr_events)) return; - if (is_active ^ EVENT_TIME) { + if (!(is_active & EVENT_TIME)) { /* start ctx time */ __update_context_time(ctx, false); perf_cgroup_set_timestamp(cpuctx);