Message ID | 20231123121851.10826-3-adrian.hunter@intel.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:ce62:0:b0:403:3b70:6f57 with SMTP id o2csp401880vqx; Thu, 23 Nov 2023 04:19:36 -0800 (PST) X-Google-Smtp-Source: AGHT+IEdsXnXC0TNWZzken8w+zvT7kWSPlelWElBJVOvzGvGYPoYBohRFe0wHUkL3qZ/CdEKatBm X-Received: by 2002:a05:6a00:1248:b0:6cb:63cb:83c0 with SMTP id u8-20020a056a00124800b006cb63cb83c0mr5769252pfi.29.1700741975734; Thu, 23 Nov 2023 04:19:35 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1700741975; cv=none; d=google.com; s=arc-20160816; b=KZvW03m7kPqde2qpZ9E4ZM1IrmvzK96lZrQMRKUzk7ZAukvNO/LGVyQsWDWEUBBwAI ezf5TNnuQRFGsY3RVsem6IHSoXCBmKXJT2CP2Ix7HyHsGWMDo4f+ZljUSEQIf+8uoTDb oINNUbL5cyT5GeHysTsSNne3el3Ik8cbVOvpjZ726PY5lvhOv5joGYH1jiQ05A7YPQ+9 Cc8Ilr2gXqFLsU3NeMEb8Dr4DGDUqWN6RcrI1IkcIEpPoEDOAXh3eZkqi+xOqWfYxA9C U4t23aBRzrNlpW1rO+/omsaIUHMR/OMB0gOMi0UWevUTgLO3DnD84dt22i1tbpkN249O vo7Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:organization :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=RNzh3upIfUQ/SiOWExsLoNPak1m35NKI7BJelv+ATX4=; fh=cd6f3jNh3j7Qvd5mXmd/oCoIQnwAYeKd0j29JiSirLE=; b=erTzQ26taj5/x+6XfNp2O1lsH+459r8MBVnGtKJ9Jj7nnmwP4w9dcGvwWmoMmB/phU NTvRxslCYNvhb3wp7Q4xGhKHyaWCbFxG35boHlnEeFrULf4f2EvR9J+G4J5Ns0ekQ0ME 4ijGEYGJs3HjIooJz0UoNRN3JY729Rhg+8n6tnzoVLpuWHdeqsqMUuwjd6l8aehc5ioz q+896igdZkyr0ocYV0wKQdsOhPKFKp3Uf9fIosALVQsLubDKr38sEB0tXQZGhl1ya+Ea RQygx7/I75h8lV81OBKn5nrYjJ6aIhulQTtJ3uCyipHPvOtW2eftSleM3P0b3IXQVkYS Qpow== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=gWMCoVhC; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.34 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from howler.vger.email (howler.vger.email. [23.128.96.34]) by mx.google.com with ESMTPS id q5-20020a056a00150500b006cb89d5bd1esi1186245pfu.264.2023.11.23.04.19.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 23 Nov 2023 04:19:35 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.34 as permitted sender) client-ip=23.128.96.34; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=gWMCoVhC; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.34 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by howler.vger.email (Postfix) with ESMTP id EE567809188C; Thu, 23 Nov 2023 04:19:29 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at howler.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345276AbjKWMTU (ORCPT <rfc822;ouuuleilei@gmail.com> + 99 others); Thu, 23 Nov 2023 07:19:20 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41408 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345280AbjKWMTR (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 23 Nov 2023 07:19:17 -0500 Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F20F1D42; Thu, 23 Nov 2023 04:19:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1700741964; x=1732277964; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=8aCNwpSFHAyXWSCRr6tTvoejhaLHrQ0RcXndxzv80fA=; b=gWMCoVhCK77yAS/boOmjr8CBHR3C6EJ8rUVhE0PkIexL+VjjCH380kZ1 hZ2fpRnkQqOTrgdfnLdYUTiRf9Q7qqdJFOjdDUSr/rvCPjXBcYg2uMsaj ekb2K6iS6i8bAxCnERhOvArnT1/2Mp3d2Vp9prxB9Yh0PZQVSo4cbrAYe BGko5U4gRsPkYesaQs6YYgNBScjHi7oxddftAuRFkXJXAPDwz5UyxAttY yFkrzbIKT631k/F/IsW1iN/5KACb4X9CVO2BLCU+e+zd1J0tNxcKUrl3i SnS1TKPH9P1EMbdO+vShL4fyaNGGPNRWlwkyruUq8zHUWLF0zGUDAn+Xa A==; X-IronPort-AV: E=McAfee;i="6600,9927,10902"; a="478456361" X-IronPort-AV: E=Sophos;i="6.04,221,1695711600"; d="scan'208";a="478456361" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Nov 2023 04:19:23 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10902"; a="890774792" X-IronPort-AV: E=Sophos;i="6.04,221,1695711600"; d="scan'208";a="890774792" Received: from ahunter6-mobl1.ger.corp.intel.com (HELO ahunter-VirtualBox.home\044ger.corp.intel.com) ([10.252.41.107]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Nov 2023 04:19:18 -0800 From: Adrian Hunter <adrian.hunter@intel.com> To: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com>, Mark Rutland <mark.rutland@arm.com>, Alexander Shishkin <alexander.shishkin@linux.intel.com>, Heiko Carstens <hca@linux.ibm.com>, Thomas Richter <tmricht@linux.ibm.com>, Hendrik Brueckner <brueckner@linux.ibm.com>, Suzuki K Poulose <suzuki.poulose@arm.com>, Mike Leach <mike.leach@linaro.org>, James Clark <james.clark@arm.com>, coresight@lists.linaro.org, linux-arm-kernel@lists.infradead.org, Yicong Yang <yangyicong@hisilicon.com>, Jonathan Cameron <jonathan.cameron@huawei.com>, Will Deacon <will@kernel.org>, Arnaldo Carvalho de Melo <acme@kernel.org>, Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>, Ian Rogers <irogers@google.com>, linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org Subject: [PATCH RFC 2/3] perf/x86/intel/pt: Add support for pause_resume() Date: Thu, 23 Nov 2023 14:18:50 +0200 Message-Id: <20231123121851.10826-3-adrian.hunter@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231123121851.10826-1-adrian.hunter@intel.com> References: <20231123121851.10826-1-adrian.hunter@intel.com> MIME-Version: 1.0 Organization: Intel Finland Oy, Registered Address: PL 281, 00181 Helsinki, Business Identity Code: 0357606 - 4, Domiciled in Helsinki Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-0.9 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on howler.vger.email Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (howler.vger.email [0.0.0.0]); Thu, 23 Nov 2023 04:19:30 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1783357217829040422 X-GMAIL-MSGID: 1783357217829040422 |
Series |
perf/core: Add ability for an event to "pause" or "resume" AUX area tracing
|
|
Commit Message
Adrian Hunter
Nov. 23, 2023, 12:18 p.m. UTC
Prevent tracing to start if aux_paused.
Implement pause_resume() callback. When aux_paused, stop tracing. When
not aux_paused, only start tracing if it isn't currently meant to be
stopped.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
arch/x86/events/intel/pt.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
Comments
On 23/11/2023 12:18, Adrian Hunter wrote: > Prevent tracing to start if aux_paused. > > Implement pause_resume() callback. When aux_paused, stop tracing. When > not aux_paused, only start tracing if it isn't currently meant to be > stopped. > > Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> > --- > arch/x86/events/intel/pt.c | 12 ++++++++++++ > 1 file changed, 12 insertions(+) > > diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c > index 42a55794004a..aa883b64814a 100644 > --- a/arch/x86/events/intel/pt.c > +++ b/arch/x86/events/intel/pt.c > @@ -418,6 +418,9 @@ static void pt_config_start(struct perf_event *event) > struct pt *pt = this_cpu_ptr(&pt_ctx); > u64 ctl = event->hw.config; > > + if (event->aux_paused) > + return; > + > ctl |= RTIT_CTL_TRACEEN; > if (READ_ONCE(pt->vmx_on)) > perf_aux_output_flag(&pt->handle, PERF_AUX_FLAG_PARTIAL); > @@ -1563,6 +1566,14 @@ EXPORT_SYMBOL_GPL(intel_pt_handle_vmx); > * PMU callbacks > */ > > +static void pt_event_pause_resume(struct perf_event *event) > +{ > + if (event->aux_paused) > + pt_config_stop(event); > + else if (!event->hw.state) > + pt_config_start(event); > +} It seems like having a single pause/resume callback rather than separate pause and resume ones pushes some of the event state management into the individual drivers and would be prone to code duplication and divergent behavior. Would it be possible to move the conditions from here into the core code and call separate functions instead? > + > static void pt_event_start(struct perf_event *event, int mode) > { > struct hw_perf_event *hwc = &event->hw; > @@ -1798,6 +1809,7 @@ static __init int pt_init(void) > pt_pmu.pmu.del = pt_event_del; > pt_pmu.pmu.start = pt_event_start; > pt_pmu.pmu.stop = pt_event_stop; > + pt_pmu.pmu.pause_resume = pt_event_pause_resume; The general idea seems ok to me. Is there a reason to not use the existing start() stop() callbacks, rather than adding a new one? I assume it's intended to be something like an optimisation where you can turn it on and off without having to do the full setup, teardown and emit an AUX record because you know the process being traced never gets switched out? Could you make it so that it works out of the box, with the option of later optimisation if you do something like this (not here but something like this in events/core.c): /* Use specialised pause/resume if it exists, otherwise use more * expensive start/stop. */ if (pmu->pause_resume) pmu->pause_resume(...) else pmu->stop(...) > pt_pmu.pmu.snapshot_aux = pt_event_snapshot_aux; > pt_pmu.pmu.read = pt_event_read; > pt_pmu.pmu.setup_aux = pt_buffer_setup_aux;
On Wed, Nov 29, 2023 at 09:53:39AM +0000, James Clark wrote: > On 23/11/2023 12:18, Adrian Hunter wrote: > > +static void pt_event_pause_resume(struct perf_event *event) > > +{ > > + if (event->aux_paused) > > + pt_config_stop(event); > > + else if (!event->hw.state) > > + pt_config_start(event); > > +} > > It seems like having a single pause/resume callback rather than separate > pause and resume ones pushes some of the event state management into the > individual drivers and would be prone to code duplication and divergent > behavior. > > Would it be possible to move the conditions from here into the core code > and call separate functions instead? > > > + > > static void pt_event_start(struct perf_event *event, int mode) > > { > > struct hw_perf_event *hwc = &event->hw; > > @@ -1798,6 +1809,7 @@ static __init int pt_init(void) > > pt_pmu.pmu.del = pt_event_del; > > pt_pmu.pmu.start = pt_event_start; > > pt_pmu.pmu.stop = pt_event_stop; > > + pt_pmu.pmu.pause_resume = pt_event_pause_resume; > > The general idea seems ok to me. Is there a reason to not use the > existing start() stop() callbacks, rather than adding a new one? > > I assume it's intended to be something like an optimisation where you > can turn it on and off without having to do the full setup, teardown and > emit an AUX record because you know the process being traced never gets > switched out? So the actual scheduling uses ->add() / ->del(), the ->start() / ->stop() methods are something that can be used after ->add() and before ->del() to 'temporarily' pause things. Pretty much exactly what is required here I think. We currently use this for PMI throttling and adaptive frequency stuff, but there is no reason it could not also be used for this. As is, we don't track the paused state across ->del() / ->add(), but perhaps that can be fixed. We can easily add more PERF_EF_ / PERF_HES_ bits to manage things.
On 29/11/23 12:58, Peter Zijlstra wrote: > On Wed, Nov 29, 2023 at 09:53:39AM +0000, James Clark wrote: >> On 23/11/2023 12:18, Adrian Hunter wrote: > >>> +static void pt_event_pause_resume(struct perf_event *event) >>> +{ >>> + if (event->aux_paused) >>> + pt_config_stop(event); >>> + else if (!event->hw.state) >>> + pt_config_start(event); >>> +} >> >> It seems like having a single pause/resume callback rather than separate >> pause and resume ones pushes some of the event state management into the >> individual drivers and would be prone to code duplication and divergent >> behavior. >> >> Would it be possible to move the conditions from here into the core code >> and call separate functions instead? >> >>> + >>> static void pt_event_start(struct perf_event *event, int mode) >>> { >>> struct hw_perf_event *hwc = &event->hw; >>> @@ -1798,6 +1809,7 @@ static __init int pt_init(void) >>> pt_pmu.pmu.del = pt_event_del; >>> pt_pmu.pmu.start = pt_event_start; >>> pt_pmu.pmu.stop = pt_event_stop; >>> + pt_pmu.pmu.pause_resume = pt_event_pause_resume; >> >> The general idea seems ok to me. Is there a reason to not use the >> existing start() stop() callbacks, rather than adding a new one? >> >> I assume it's intended to be something like an optimisation where you >> can turn it on and off without having to do the full setup, teardown and >> emit an AUX record because you know the process being traced never gets >> switched out? > > So the actual scheduling uses ->add() / ->del(), the ->start() / > ->stop() methods are something that can be used after ->add() and before > ->del() to 'temporarily' pause things. > > Pretty much exactly what is required here I think. We currently use this > for PMI throttling and adaptive frequency stuff, but there is no reason > it could not also be used for this. > > As is, we don't track the paused state across ->del() / ->add(), but > perhaps that can be fixed. We can easily add more PERF_EF_ / PERF_HES_ > bits to manage things. > > I am not sure stop / start play nice with NMI's from other events e.g. PMC NMI wants to pause or resume AUX but what if AUX event is currently being processed in ->stop() or ->start()? Or maybe that can't happen?
On Wed, Nov 29, 2023 at 01:15:43PM +0200, Adrian Hunter wrote: > On 29/11/23 12:58, Peter Zijlstra wrote: > > On Wed, Nov 29, 2023 at 09:53:39AM +0000, James Clark wrote: > >> On 23/11/2023 12:18, Adrian Hunter wrote: > > > >>> +static void pt_event_pause_resume(struct perf_event *event) > >>> +{ > >>> + if (event->aux_paused) > >>> + pt_config_stop(event); > >>> + else if (!event->hw.state) > >>> + pt_config_start(event); > >>> +} > >> > >> It seems like having a single pause/resume callback rather than separate > >> pause and resume ones pushes some of the event state management into the > >> individual drivers and would be prone to code duplication and divergent > >> behavior. > >> > >> Would it be possible to move the conditions from here into the core code > >> and call separate functions instead? > >> > >>> + > >>> static void pt_event_start(struct perf_event *event, int mode) > >>> { > >>> struct hw_perf_event *hwc = &event->hw; > >>> @@ -1798,6 +1809,7 @@ static __init int pt_init(void) > >>> pt_pmu.pmu.del = pt_event_del; > >>> pt_pmu.pmu.start = pt_event_start; > >>> pt_pmu.pmu.stop = pt_event_stop; > >>> + pt_pmu.pmu.pause_resume = pt_event_pause_resume; > >> > >> The general idea seems ok to me. Is there a reason to not use the > >> existing start() stop() callbacks, rather than adding a new one? > >> > >> I assume it's intended to be something like an optimisation where you > >> can turn it on and off without having to do the full setup, teardown and > >> emit an AUX record because you know the process being traced never gets > >> switched out? > > > > So the actual scheduling uses ->add() / ->del(), the ->start() / > > ->stop() methods are something that can be used after ->add() and before > > ->del() to 'temporarily' pause things. > > > > Pretty much exactly what is required here I think. We currently use this > > for PMI throttling and adaptive frequency stuff, but there is no reason > > it could not also be used for this. > > > > As is, we don't track the paused state across ->del() / ->add(), but > > perhaps that can be fixed. We can easily add more PERF_EF_ / PERF_HES_ > > bits to manage things. > > > > > > I am not sure stop / start play nice with NMI's from other events e.g. > > PMC NMI wants to pause or resume AUX but what if AUX event is currently > being processed in ->stop() or ->start()? Or maybe that can't happen? I think that can happen, and pt_event_stop() can actually handle some of that, while your pause_resume() thing, which uses pt_config_stop() does not. But yes, I think that if you add pt_event_{stop,start}() calls from *other* events their PMI, then you get to deal with more 'fun'. Something like: perf_addr_filters_adjust() __perf_addr_filters_adjust() perf_event_stop() __perf_event_stop() event->pmu->stop() <NMI> ... perf_event_overflow() pt_event->pmu->stop() </NMI> event->pmu->start() // whoopsie! Should now be possible. I think what you want to do is rename pt->handle_nmi into pt->stop_count and make it a counter, then ->stop() increments it, and ->start() decrements it and everybody ensures the thing doesn't get restart while !0 etc.. I suspect you need to guard the generic part of this feature with a new PERF_PMU_CAP_ flag and then have the coresight/etc. people opt-in once they've audited things. James, does that work for you?
On 29/11/2023 12:23, Peter Zijlstra wrote: > On Wed, Nov 29, 2023 at 01:15:43PM +0200, Adrian Hunter wrote: >> On 29/11/23 12:58, Peter Zijlstra wrote: >>> On Wed, Nov 29, 2023 at 09:53:39AM +0000, James Clark wrote: >>>> On 23/11/2023 12:18, Adrian Hunter wrote: >>> >>>>> +static void pt_event_pause_resume(struct perf_event *event) >>>>> +{ >>>>> + if (event->aux_paused) >>>>> + pt_config_stop(event); >>>>> + else if (!event->hw.state) >>>>> + pt_config_start(event); >>>>> +} >>>> >>>> It seems like having a single pause/resume callback rather than separate >>>> pause and resume ones pushes some of the event state management into the >>>> individual drivers and would be prone to code duplication and divergent >>>> behavior. >>>> >>>> Would it be possible to move the conditions from here into the core code >>>> and call separate functions instead? >>>> >>>>> + >>>>> static void pt_event_start(struct perf_event *event, int mode) >>>>> { >>>>> struct hw_perf_event *hwc = &event->hw; >>>>> @@ -1798,6 +1809,7 @@ static __init int pt_init(void) >>>>> pt_pmu.pmu.del = pt_event_del; >>>>> pt_pmu.pmu.start = pt_event_start; >>>>> pt_pmu.pmu.stop = pt_event_stop; >>>>> + pt_pmu.pmu.pause_resume = pt_event_pause_resume; >>>> >>>> The general idea seems ok to me. Is there a reason to not use the >>>> existing start() stop() callbacks, rather than adding a new one? >>>> >>>> I assume it's intended to be something like an optimisation where you >>>> can turn it on and off without having to do the full setup, teardown and >>>> emit an AUX record because you know the process being traced never gets >>>> switched out? >>> >>> So the actual scheduling uses ->add() / ->del(), the ->start() / >>> ->stop() methods are something that can be used after ->add() and before >>> ->del() to 'temporarily' pause things. >>> >>> Pretty much exactly what is required here I think. We currently use this >>> for PMI throttling and adaptive frequency stuff, but there is no reason >>> it could not also be used for this. >>> >>> As is, we don't track the paused state across ->del() / ->add(), but >>> perhaps that can be fixed. We can easily add more PERF_EF_ / PERF_HES_ >>> bits to manage things. >>> >>> >> >> I am not sure stop / start play nice with NMI's from other events e.g. >> >> PMC NMI wants to pause or resume AUX but what if AUX event is currently >> being processed in ->stop() or ->start()? Or maybe that can't happen? > > I think that can happen, and pt_event_stop() can actually handle some of > that, while your pause_resume() thing, which uses pt_config_stop() does > not. > > But yes, I think that if you add pt_event_{stop,start}() calls from > *other* events their PMI, then you get to deal with more 'fun'. > > Something like: > > perf_addr_filters_adjust() > __perf_addr_filters_adjust() > perf_event_stop() > __perf_event_stop() > event->pmu->stop() > <NMI> > ... > perf_event_overflow() > pt_event->pmu->stop() > </NMI> > event->pmu->start() // whoopsie! > > Should now be possible. > > I think what you want to do is rename pt->handle_nmi into pt->stop_count > and make it a counter, then ->stop() increments it, and ->start() > decrements it and everybody ensures the thing doesn't get restart while > !0 etc.. > > I suspect you need to guard the generic part of this feature with a new > PERF_PMU_CAP_ flag and then have the coresight/etc. people opt-in once > they've audited things. > > James, does that work for you? > Yep I think that would work. If I understand it with the stop_count counter, to be able to support the new CAP, every device would basically have to implement a similar counter? Would it be possible to do that reference counting on the outside of start() and stop() in the core code? So that a device only ever sees one call to start and one call to stop and doesn't need to do any extra tracking?
On 30/11/23 12:07, James Clark wrote: > > > On 29/11/2023 12:23, Peter Zijlstra wrote: >> On Wed, Nov 29, 2023 at 01:15:43PM +0200, Adrian Hunter wrote: >>> On 29/11/23 12:58, Peter Zijlstra wrote: >>>> On Wed, Nov 29, 2023 at 09:53:39AM +0000, James Clark wrote: >>>>> On 23/11/2023 12:18, Adrian Hunter wrote: >>>> >>>>>> +static void pt_event_pause_resume(struct perf_event *event) >>>>>> +{ >>>>>> + if (event->aux_paused) >>>>>> + pt_config_stop(event); >>>>>> + else if (!event->hw.state) >>>>>> + pt_config_start(event); >>>>>> +} >>>>> >>>>> It seems like having a single pause/resume callback rather than separate >>>>> pause and resume ones pushes some of the event state management into the >>>>> individual drivers and would be prone to code duplication and divergent >>>>> behavior. >>>>> >>>>> Would it be possible to move the conditions from here into the core code >>>>> and call separate functions instead? >>>>> >>>>>> + >>>>>> static void pt_event_start(struct perf_event *event, int mode) >>>>>> { >>>>>> struct hw_perf_event *hwc = &event->hw; >>>>>> @@ -1798,6 +1809,7 @@ static __init int pt_init(void) >>>>>> pt_pmu.pmu.del = pt_event_del; >>>>>> pt_pmu.pmu.start = pt_event_start; >>>>>> pt_pmu.pmu.stop = pt_event_stop; >>>>>> + pt_pmu.pmu.pause_resume = pt_event_pause_resume; >>>>> >>>>> The general idea seems ok to me. Is there a reason to not use the >>>>> existing start() stop() callbacks, rather than adding a new one? >>>>> >>>>> I assume it's intended to be something like an optimisation where you >>>>> can turn it on and off without having to do the full setup, teardown and >>>>> emit an AUX record because you know the process being traced never gets >>>>> switched out? >>>> >>>> So the actual scheduling uses ->add() / ->del(), the ->start() / >>>> ->stop() methods are something that can be used after ->add() and before >>>> ->del() to 'temporarily' pause things. >>>> >>>> Pretty much exactly what is required here I think. We currently use this >>>> for PMI throttling and adaptive frequency stuff, but there is no reason >>>> it could not also be used for this. >>>> >>>> As is, we don't track the paused state across ->del() / ->add(), but >>>> perhaps that can be fixed. We can easily add more PERF_EF_ / PERF_HES_ >>>> bits to manage things. >>>> >>>> >>> >>> I am not sure stop / start play nice with NMI's from other events e.g. >>> >>> PMC NMI wants to pause or resume AUX but what if AUX event is currently >>> being processed in ->stop() or ->start()? Or maybe that can't happen? >> >> I think that can happen, and pt_event_stop() can actually handle some of >> that, while your pause_resume() thing, which uses pt_config_stop() does >> not. >> >> But yes, I think that if you add pt_event_{stop,start}() calls from >> *other* events their PMI, then you get to deal with more 'fun'. >> >> Something like: >> >> perf_addr_filters_adjust() >> __perf_addr_filters_adjust() >> perf_event_stop() >> __perf_event_stop() >> event->pmu->stop() >> <NMI> >> ... >> perf_event_overflow() >> pt_event->pmu->stop() >> </NMI> >> event->pmu->start() // whoopsie! >> >> Should now be possible. >> >> I think what you want to do is rename pt->handle_nmi into pt->stop_count >> and make it a counter, then ->stop() increments it, and ->start() >> decrements it and everybody ensures the thing doesn't get restart while >> !0 etc.. >> >> I suspect you need to guard the generic part of this feature with a new >> PERF_PMU_CAP_ flag and then have the coresight/etc. people opt-in once >> they've audited things. >> >> James, does that work for you? >> > > Yep I think that would work. > > If I understand it with the stop_count counter, to be able to support > the new CAP, every device would basically have to implement a similar > counter? > > Would it be possible to do that reference counting on the outside of > start() and stop() in the core code? So that a device only ever sees one > call to start and one call to stop and doesn't need to do any extra > tracking? stop_counter does not seem right for pauses and resumes because they should not accumulate. We want: pause stop pause resume start resume not: pause stop pause resume resume start Also stop_counter has issues like: stop_counter 0 -> 1 <NMI> stop_counter 1 -> 0 start() </NMI> stop() <- stop_counter is now out of sync or: stop_counter 1 -> 0 <NMI> stop_counter 0 -> 1 stop() </NMI> start() <- stop_counter is now out of sync Also Intel PT implementation uses low-level start / stop for pause / resume, which can be made not to conflict with regular start / stop because they only toggle TRACEEN bit.
diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c index 42a55794004a..aa883b64814a 100644 --- a/arch/x86/events/intel/pt.c +++ b/arch/x86/events/intel/pt.c @@ -418,6 +418,9 @@ static void pt_config_start(struct perf_event *event) struct pt *pt = this_cpu_ptr(&pt_ctx); u64 ctl = event->hw.config; + if (event->aux_paused) + return; + ctl |= RTIT_CTL_TRACEEN; if (READ_ONCE(pt->vmx_on)) perf_aux_output_flag(&pt->handle, PERF_AUX_FLAG_PARTIAL); @@ -1563,6 +1566,14 @@ EXPORT_SYMBOL_GPL(intel_pt_handle_vmx); * PMU callbacks */ +static void pt_event_pause_resume(struct perf_event *event) +{ + if (event->aux_paused) + pt_config_stop(event); + else if (!event->hw.state) + pt_config_start(event); +} + static void pt_event_start(struct perf_event *event, int mode) { struct hw_perf_event *hwc = &event->hw; @@ -1798,6 +1809,7 @@ static __init int pt_init(void) pt_pmu.pmu.del = pt_event_del; pt_pmu.pmu.start = pt_event_start; pt_pmu.pmu.stop = pt_event_stop; + pt_pmu.pmu.pause_resume = pt_event_pause_resume; pt_pmu.pmu.snapshot_aux = pt_event_snapshot_aux; pt_pmu.pmu.read = pt_event_read; pt_pmu.pmu.setup_aux = pt_buffer_setup_aux;