Message ID | 20230123182728.825519-4-kan.liang@linux.intel.com |
---|---|
State | New |
Headers | From: kan.liang@linux.intel.com
To: peterz@infradead.org, mingo@redhat.com, tglx@linutronix.de, jstultz@google.com, sboyd@kernel.org, linux-kernel@vger.kernel.org
Cc: eranian@google.com, namhyung@kernel.org, ak@linux.intel.com, Kan Liang <kan.liang@linux.intel.com>
Subject: [PATCH 3/3] perf/x86/intel/ds: Support monotonic clock for PEBS
Date: Mon, 23 Jan 2023 10:27:28 -0800
Message-Id: <20230123182728.825519-4-kan.liang@linux.intel.com>
In-Reply-To: <20230123182728.825519-1-kan.liang@linux.intel.com>
References: <20230123182728.825519-1-kan.liang@linux.intel.com> |
Series | Convert TSC to monotonic clock for PEBS |
Commit Message
Liang, Kan
Jan. 23, 2023, 6:27 p.m. UTC
From: Kan Liang <kan.liang@linux.intel.com>

Users try to reconcile user samples with PEBS samples and require a common clock source. However, the current PEBS code only converts to sched_clock, which is not available from user space.

Only support converting to clock monotonic. Having one common clock source is good enough to fulfill the requirement.

Enable large PEBS for the monotonic clock to reduce the PEBS overhead.

There are a few rare cases that may make the conversion fail, for example a TSC overflow, or cycle_last changing between samples. The time then falls back to the inaccurate SW times, but these cases are extremely unlikely to happen.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
The patch has to be on top of the below patch:
https://lore.kernel.org/all/20230123172027.125385-1-kan.liang@linux.intel.com/

 arch/x86/events/intel/core.c |  2 +-
 arch/x86/events/intel/ds.c   | 30 ++++++++++++++++++++++++++----
 2 files changed, 27 insertions(+), 5 deletions(-)
Comments
On Mon, Jan 23, 2023 at 10:27 AM <kan.liang@linux.intel.com> wrote:
>
> From: Kan Liang <kan.liang@linux.intel.com>
>
> Users try to reconcile user samples with PEBS samples and require a
> common clock source. However, the current PEBS codes only convert to
> sched_clock, which is not available from the user space.
>
> Only support converting to clock monotonic. Having one common clock
> source is good enough to fulfill the requirement.
>
> Enable the large PEBS for the monotonic clock to reduce the PEBS
> overhead.
>
> There are a few rare cases that may make the conversion fails. For
> example, TSC overflows. The cycle_last may be changed between samples.
> The time will fallback to the inaccurate SW times. But the cases are
> extremely unlikely to happen.
>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> ---

Thanks for sending this out!
A few minor style issues below and a warning.

> The patch has to be on top of the below patch
> https://lore.kernel.org/all/20230123172027.125385-1-kan.liang@linux.intel.com/
>
>  arch/x86/events/intel/core.c |  2 +-
>  arch/x86/events/intel/ds.c   | 30 ++++++++++++++++++++++++++----
>  2 files changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 14f0a746257d..ea194556cc73 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -3777,7 +3777,7 @@ static unsigned long intel_pmu_large_pebs_flags(struct perf_event *event)
>  {
>  	unsigned long flags = x86_pmu.large_pebs_flags;
>
> -	if (event->attr.use_clockid)
> +	if (event->attr.use_clockid && (event->attr.clockid != CLOCK_MONOTONIC))
>  		flags &= ~PERF_SAMPLE_TIME;
>  	if (!event->attr.exclude_kernel)
>  		flags &= ~PERF_SAMPLE_REGS_USER;
> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> index 7980e92dec64..d7f0eaf4405c 100644
> --- a/arch/x86/events/intel/ds.c
> +++ b/arch/x86/events/intel/ds.c
> @@ -1570,13 +1570,33 @@ static u64 get_data_src(struct perf_event *event, u64 aux)
>  	return val;
>  }
>
> +static int pebs_get_synctime(struct system_counterval_t *system,
> +			     void *ctx)

Just because the abstract function type taken by
get_mono_fast_from_given_time is vague, doesn't mean the
implementation needs to be.
ctx is really a tsc value, right? So let's call it that to make this a
bit more readable.

> +{
> +	*system = set_tsc_system_counterval(*(u64 *)ctx);
> +	return 0;
> +}
> +
> +static inline int pebs_clockid_time(clockid_t clk_id, u64 tsc, u64 *clk_id_time)

clk_id_time is maybe a bit too fuzzy. It is really a mono_ns value,
right? Let's keep that explicit here.

> +{
> +	/* Only support converting to clock monotonic */
> +	if (clk_id != CLOCK_MONOTONIC)
> +		return -EINVAL;
> +
> +	return get_mono_fast_from_given_time(pebs_get_synctime, &tsc, clk_id_time);
> +}
> +
>  static void setup_pebs_time(struct perf_event *event,
>  			    struct perf_sample_data *data,
>  			    u64 tsc)
>  {
> -	/* Converting to a user-defined clock is not supported yet. */
> -	if (event->attr.use_clockid != 0)
> -		return;
> +	u64 time;

Again, "time" is too generic a term without any context here.
mono_nsec or something would be more clear.

> +
> +	if (event->attr.use_clockid != 0) {
> +		if (pebs_clockid_time(event->attr.clockid, tsc, &time))
> +			return;
> +		goto done;
> +	}

Apologies for this warning/rant:

So, I do get the NMI safety of the "fast" time accessors (along with
the "high performance" sounding name!) is attractive, but as its use
expands I worry the downsides of this interface isn't made clear
enough.

The fast accessors *can* see time discontinuities! Because the logic
is done without holding the tk_core.seq lock, if you are reading in
the middle of an ntp adjustment, you may find the current value to be
larger than the next time you read the time. These discontinuities
are likely to be very small, but a negative delta will look very large
as a u64. So part of using these "fast *and unsafe*" interfaces is
you get to keep both pieces when it breaks. Make sure the code here
that is using these interfaces guards against this (zeroing out
negative deltas).

thanks
-john
On 2023-01-24 1:56 a.m., John Stultz wrote:
> On Mon, Jan 23, 2023 at 10:27 AM <kan.liang@linux.intel.com> wrote:
>> [...]
>
> Thanks for sending this out!
> A few minor style issues below and a warning.

Thanks.

>> [...]
>> +static int pebs_get_synctime(struct system_counterval_t *system,
>> +			     void *ctx)
>
> Just because the abstract function type taken by
> get_mono_fast_from_given_time is vague, doesn't mean the
> implementation needs to be.
> ctx is really a tsc value, right? So let's call it that to make this a
> bit more readable.

Sure, I will use the tsc to replace ctx.

>> [...]
>> +static inline int pebs_clockid_time(clockid_t clk_id, u64 tsc, u64 *clk_id_time)
>
> clk_id_time is maybe a bit too fuzzy. It is really a mono_ns value,
> right? Let's keep that explicit here.

Yes. Will make it explicit.

>> [...]
>> -	/* Converting to a user-defined clock is not supported yet. */
>> -	if (event->attr.use_clockid != 0)
>> -		return;
>> +	u64 time;
>
> Again, "time" is too generic a term without any context here.
> mono_nsec or something would be more clear.

Sure.

>> +
>> +	if (event->attr.use_clockid != 0) {
>> +		if (pebs_clockid_time(event->attr.clockid, tsc, &time))
>> +			return;
>> +		goto done;
>> +	}
>
> Apologies for this warning/rant:
>
> So, I do get the NMI safety of the "fast" time accessors (along with
> the "high performance" sounding name!) is attractive, but as its use
> expands I worry the downsides of this interface isn't made clear
> enough.
>
> The fast accessors *can* see time discontinuities! Because the logic
> is done without holding the tk_core.seq lock, if you are reading in
> the middle of an ntp adjustment, you may find the current value to be
> larger than the next time you read the time. These discontinuities
> are likely to be very small, but a negative delta will look very large
> as a u64. So part of using these "fast *and unsafe*" interfaces is
> you get to keep both pieces when it breaks. Make sure the code here
> that is using these interfaces guards against this (zeroing out
> negative deltas).

Thanks for the warning. I will add more comments and specially handle it here.

Thanks,
Kan
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 14f0a746257d..ea194556cc73 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3777,7 +3777,7 @@ static unsigned long intel_pmu_large_pebs_flags(struct perf_event *event)
 {
 	unsigned long flags = x86_pmu.large_pebs_flags;
 
-	if (event->attr.use_clockid)
+	if (event->attr.use_clockid && (event->attr.clockid != CLOCK_MONOTONIC))
 		flags &= ~PERF_SAMPLE_TIME;
 	if (!event->attr.exclude_kernel)
 		flags &= ~PERF_SAMPLE_REGS_USER;
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 7980e92dec64..d7f0eaf4405c 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1570,13 +1570,33 @@ static u64 get_data_src(struct perf_event *event, u64 aux)
 	return val;
 }
 
+static int pebs_get_synctime(struct system_counterval_t *system,
+			     void *ctx)
+{
+	*system = set_tsc_system_counterval(*(u64 *)ctx);
+	return 0;
+}
+
+static inline int pebs_clockid_time(clockid_t clk_id, u64 tsc, u64 *clk_id_time)
+{
+	/* Only support converting to clock monotonic */
+	if (clk_id != CLOCK_MONOTONIC)
+		return -EINVAL;
+
+	return get_mono_fast_from_given_time(pebs_get_synctime, &tsc, clk_id_time);
+}
+
 static void setup_pebs_time(struct perf_event *event,
 			    struct perf_sample_data *data,
 			    u64 tsc)
 {
-	/* Converting to a user-defined clock is not supported yet. */
-	if (event->attr.use_clockid != 0)
-		return;
+	u64 time;
+
+	if (event->attr.use_clockid != 0) {
+		if (pebs_clockid_time(event->attr.clockid, tsc, &time))
+			return;
+		goto done;
+	}
 
 	/*
 	 * Converting the TSC to perf time is only supported,
@@ -1587,8 +1607,10 @@ static void setup_pebs_time(struct perf_event *event,
 	 */
 	if (!using_native_sched_clock() || !sched_clock_stable())
 		return;
+	time = native_sched_clock_from_tsc(tsc) + __sched_clock_offset;
 
-	data->time = native_sched_clock_from_tsc(tsc) + __sched_clock_offset;
+done:
+	data->time = time;
 	data->sample_flags |= PERF_SAMPLE_TIME;
 }