[RFC,2/3] perf/x86/intel/pt: Add support for pause_resume()

Message ID 20231123121851.10826-3-adrian.hunter@intel.com
State New
Headers
Series perf/core: Add ability for an event to "pause" or "resume" AUX area tracing |

Commit Message

Adrian Hunter Nov. 23, 2023, 12:18 p.m. UTC
  Prevent tracing to start if aux_paused.

Implement pause_resume() callback. When aux_paused, stop tracing. When
not aux_paused, only start tracing if it isn't currently meant to be
stopped.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 arch/x86/events/intel/pt.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
  

Comments

James Clark Nov. 29, 2023, 9:53 a.m. UTC | #1
On 23/11/2023 12:18, Adrian Hunter wrote:
> Prevent tracing to start if aux_paused.
> 
> Implement pause_resume() callback. When aux_paused, stop tracing. When
> not aux_paused, only start tracing if it isn't currently meant to be
> stopped.
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  arch/x86/events/intel/pt.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
> index 42a55794004a..aa883b64814a 100644
> --- a/arch/x86/events/intel/pt.c
> +++ b/arch/x86/events/intel/pt.c
> @@ -418,6 +418,9 @@ static void pt_config_start(struct perf_event *event)
>  	struct pt *pt = this_cpu_ptr(&pt_ctx);
>  	u64 ctl = event->hw.config;
>  
> +	if (event->aux_paused)
> +		return;
> +
>  	ctl |= RTIT_CTL_TRACEEN;
>  	if (READ_ONCE(pt->vmx_on))
>  		perf_aux_output_flag(&pt->handle, PERF_AUX_FLAG_PARTIAL);
> @@ -1563,6 +1566,14 @@ EXPORT_SYMBOL_GPL(intel_pt_handle_vmx);
>   * PMU callbacks
>   */
>  
> +static void pt_event_pause_resume(struct perf_event *event)
> +{
> +	if (event->aux_paused)
> +		pt_config_stop(event);
> +	else if (!event->hw.state)
> +		pt_config_start(event);
> +}

It seems like having a single pause/resume callback rather than separate
pause and resume ones pushes some of the event state management into the
individual drivers and would be prone to code duplication and divergent
behavior.

Would it be possible to move the conditions from here into the core code
and call separate functions instead?

> +
>  static void pt_event_start(struct perf_event *event, int mode)
>  {
>  	struct hw_perf_event *hwc = &event->hw;
> @@ -1798,6 +1809,7 @@ static __init int pt_init(void)
>  	pt_pmu.pmu.del			 = pt_event_del;
>  	pt_pmu.pmu.start		 = pt_event_start;
>  	pt_pmu.pmu.stop			 = pt_event_stop;
> +	pt_pmu.pmu.pause_resume		 = pt_event_pause_resume;

The general idea seems ok to me. Is there a reason to not use the
existing start() stop() callbacks, rather than adding a new one?

I assume it's intended to be something like an optimisation where you
can turn it on and off without having to do the full setup, teardown and
emit an AUX record because you know the process being traced never gets
switched out?

Could you make it so that it works out of the box, with the option of
later optimisation if you do something like this (not here but something
like this in events/core.c):

  /* Use specialised pause/resume if it exists, otherwise use more
   * expensive start/stop.
   */
  if (pmu->pause_resume)
	pmu->pause_resume(...)
  else
        pmu->stop(...)


>  	pt_pmu.pmu.snapshot_aux		 = pt_event_snapshot_aux;
>  	pt_pmu.pmu.read			 = pt_event_read;
>  	pt_pmu.pmu.setup_aux		 = pt_buffer_setup_aux;
  
Peter Zijlstra Nov. 29, 2023, 10:58 a.m. UTC | #2
On Wed, Nov 29, 2023 at 09:53:39AM +0000, James Clark wrote:
> On 23/11/2023 12:18, Adrian Hunter wrote:

> > +static void pt_event_pause_resume(struct perf_event *event)
> > +{
> > +	if (event->aux_paused)
> > +		pt_config_stop(event);
> > +	else if (!event->hw.state)
> > +		pt_config_start(event);
> > +}
> 
> It seems like having a single pause/resume callback rather than separate
> pause and resume ones pushes some of the event state management into the
> individual drivers and would be prone to code duplication and divergent
> behavior.
> 
> Would it be possible to move the conditions from here into the core code
> and call separate functions instead?
> 
> > +
> >  static void pt_event_start(struct perf_event *event, int mode)
> >  {
> >  	struct hw_perf_event *hwc = &event->hw;
> > @@ -1798,6 +1809,7 @@ static __init int pt_init(void)
> >  	pt_pmu.pmu.del			 = pt_event_del;
> >  	pt_pmu.pmu.start		 = pt_event_start;
> >  	pt_pmu.pmu.stop			 = pt_event_stop;
> > +	pt_pmu.pmu.pause_resume		 = pt_event_pause_resume;
> 
> The general idea seems ok to me. Is there a reason to not use the
> existing start() stop() callbacks, rather than adding a new one?
> 
> I assume it's intended to be something like an optimisation where you
> can turn it on and off without having to do the full setup, teardown and
> emit an AUX record because you know the process being traced never gets
> switched out?

So the actual scheduling uses ->add() / ->del(), the ->start() /
->stop() methods are something that can be used after ->add() and before
->del() to 'temporarily' pause things.

Pretty much exactly what is required here I think. We currently use this
for PMI throttling and adaptive frequency stuff, but there is no reason
it could not also be used for this.

As is, we don't track the paused state across ->del() / ->add(), but
perhaps that can be fixed. We can easily add more PERF_EF_ / PERF_HES_
bits to manage things.
  
Adrian Hunter Nov. 29, 2023, 11:15 a.m. UTC | #3
On 29/11/23 12:58, Peter Zijlstra wrote:
> On Wed, Nov 29, 2023 at 09:53:39AM +0000, James Clark wrote:
>> On 23/11/2023 12:18, Adrian Hunter wrote:
> 
>>> +static void pt_event_pause_resume(struct perf_event *event)
>>> +{
>>> +	if (event->aux_paused)
>>> +		pt_config_stop(event);
>>> +	else if (!event->hw.state)
>>> +		pt_config_start(event);
>>> +}
>>
>> It seems like having a single pause/resume callback rather than separate
>> pause and resume ones pushes some of the event state management into the
>> individual drivers and would be prone to code duplication and divergent
>> behavior.
>>
>> Would it be possible to move the conditions from here into the core code
>> and call separate functions instead?
>>
>>> +
>>>  static void pt_event_start(struct perf_event *event, int mode)
>>>  {
>>>  	struct hw_perf_event *hwc = &event->hw;
>>> @@ -1798,6 +1809,7 @@ static __init int pt_init(void)
>>>  	pt_pmu.pmu.del			 = pt_event_del;
>>>  	pt_pmu.pmu.start		 = pt_event_start;
>>>  	pt_pmu.pmu.stop			 = pt_event_stop;
>>> +	pt_pmu.pmu.pause_resume		 = pt_event_pause_resume;
>>
>> The general idea seems ok to me. Is there a reason to not use the
>> existing start() stop() callbacks, rather than adding a new one?
>>
>> I assume it's intended to be something like an optimisation where you
>> can turn it on and off without having to do the full setup, teardown and
>> emit an AUX record because you know the process being traced never gets
>> switched out?
> 
> So the actual scheduling uses ->add() / ->del(), the ->start() /
> ->stop() methods are something that can be used after ->add() and before
> ->del() to 'temporarily' pause things.
> 
> Pretty much exactly what is required here I think. We currently use this
> for PMI throttling and adaptive frequency stuff, but there is no reason
> it could not also be used for this.
> 
> As is, we don't track the paused state across ->del() / ->add(), but
> perhaps that can be fixed. We can easily add more PERF_EF_ / PERF_HES_
> bits to manage things.
> 
> 

I am not sure stop / start play nice with NMI's from other events e.g.

PMC NMI wants to pause or resume AUX but what if AUX event is currently
being processed in ->stop() or ->start()?  Or maybe that can't happen?
  
Peter Zijlstra Nov. 29, 2023, 12:23 p.m. UTC | #4
On Wed, Nov 29, 2023 at 01:15:43PM +0200, Adrian Hunter wrote:
> On 29/11/23 12:58, Peter Zijlstra wrote:
> > On Wed, Nov 29, 2023 at 09:53:39AM +0000, James Clark wrote:
> >> On 23/11/2023 12:18, Adrian Hunter wrote:
> > 
> >>> +static void pt_event_pause_resume(struct perf_event *event)
> >>> +{
> >>> +	if (event->aux_paused)
> >>> +		pt_config_stop(event);
> >>> +	else if (!event->hw.state)
> >>> +		pt_config_start(event);
> >>> +}
> >>
> >> It seems like having a single pause/resume callback rather than separate
> >> pause and resume ones pushes some of the event state management into the
> >> individual drivers and would be prone to code duplication and divergent
> >> behavior.
> >>
> >> Would it be possible to move the conditions from here into the core code
> >> and call separate functions instead?
> >>
> >>> +
> >>>  static void pt_event_start(struct perf_event *event, int mode)
> >>>  {
> >>>  	struct hw_perf_event *hwc = &event->hw;
> >>> @@ -1798,6 +1809,7 @@ static __init int pt_init(void)
> >>>  	pt_pmu.pmu.del			 = pt_event_del;
> >>>  	pt_pmu.pmu.start		 = pt_event_start;
> >>>  	pt_pmu.pmu.stop			 = pt_event_stop;
> >>> +	pt_pmu.pmu.pause_resume		 = pt_event_pause_resume;
> >>
> >> The general idea seems ok to me. Is there a reason to not use the
> >> existing start() stop() callbacks, rather than adding a new one?
> >>
> >> I assume it's intended to be something like an optimisation where you
> >> can turn it on and off without having to do the full setup, teardown and
> >> emit an AUX record because you know the process being traced never gets
> >> switched out?
> > 
> > So the actual scheduling uses ->add() / ->del(), the ->start() /
> > ->stop() methods are something that can be used after ->add() and before
> > ->del() to 'temporarily' pause things.
> > 
> > Pretty much exactly what is required here I think. We currently use this
> > for PMI throttling and adaptive frequency stuff, but there is no reason
> > it could not also be used for this.
> > 
> > As is, we don't track the paused state across ->del() / ->add(), but
> > perhaps that can be fixed. We can easily add more PERF_EF_ / PERF_HES_
> > bits to manage things.
> > 
> > 
> 
> I am not sure stop / start play nice with NMI's from other events e.g.
> 
> PMC NMI wants to pause or resume AUX but what if AUX event is currently
> being processed in ->stop() or ->start()?  Or maybe that can't happen?

I think that can happen, and pt_event_stop() can actually handle some of
that, while your pause_resume() thing, which uses pt_config_stop() does
not.

But yes, I think that if you add pt_event_{stop,start}() calls from
*other* events their PMI, then you get to deal with more 'fun'.

Something like:

  perf_addr_filters_adjust()
    __perf_addr_filters_adjust()
      perf_event_stop()
        __perf_event_stop()
	  event->pmu->stop()
	  <NMI>
	    ...
	    perf_event_overflow()
	      pt_event->pmu->stop()
	  </NMI>
	  event->pmu->start() // whoopsie!

Should now be possible.

I think what you want to do is rename pt->handle_nmi into pt->stop_count
and make it a counter, then ->stop() increments it, and ->start()
decrements it and everybody ensures the thing doesn't get restart while
!0 etc..

I suspect you need to guard the generic part of this feature with a new
PERF_PMU_CAP_ flag and then have the coresight/etc. people opt-in once
they've audited things.

James, does that work for you?
  
James Clark Nov. 30, 2023, 10:07 a.m. UTC | #5
On 29/11/2023 12:23, Peter Zijlstra wrote:
> On Wed, Nov 29, 2023 at 01:15:43PM +0200, Adrian Hunter wrote:
>> On 29/11/23 12:58, Peter Zijlstra wrote:
>>> On Wed, Nov 29, 2023 at 09:53:39AM +0000, James Clark wrote:
>>>> On 23/11/2023 12:18, Adrian Hunter wrote:
>>>
>>>>> +static void pt_event_pause_resume(struct perf_event *event)
>>>>> +{
>>>>> +	if (event->aux_paused)
>>>>> +		pt_config_stop(event);
>>>>> +	else if (!event->hw.state)
>>>>> +		pt_config_start(event);
>>>>> +}
>>>>
>>>> It seems like having a single pause/resume callback rather than separate
>>>> pause and resume ones pushes some of the event state management into the
>>>> individual drivers and would be prone to code duplication and divergent
>>>> behavior.
>>>>
>>>> Would it be possible to move the conditions from here into the core code
>>>> and call separate functions instead?
>>>>
>>>>> +
>>>>>  static void pt_event_start(struct perf_event *event, int mode)
>>>>>  {
>>>>>  	struct hw_perf_event *hwc = &event->hw;
>>>>> @@ -1798,6 +1809,7 @@ static __init int pt_init(void)
>>>>>  	pt_pmu.pmu.del			 = pt_event_del;
>>>>>  	pt_pmu.pmu.start		 = pt_event_start;
>>>>>  	pt_pmu.pmu.stop			 = pt_event_stop;
>>>>> +	pt_pmu.pmu.pause_resume		 = pt_event_pause_resume;
>>>>
>>>> The general idea seems ok to me. Is there a reason to not use the
>>>> existing start() stop() callbacks, rather than adding a new one?
>>>>
>>>> I assume it's intended to be something like an optimisation where you
>>>> can turn it on and off without having to do the full setup, teardown and
>>>> emit an AUX record because you know the process being traced never gets
>>>> switched out?
>>>
>>> So the actual scheduling uses ->add() / ->del(), the ->start() /
>>> ->stop() methods are something that can be used after ->add() and before
>>> ->del() to 'temporarily' pause things.
>>>
>>> Pretty much exactly what is required here I think. We currently use this
>>> for PMI throttling and adaptive frequency stuff, but there is no reason
>>> it could not also be used for this.
>>>
>>> As is, we don't track the paused state across ->del() / ->add(), but
>>> perhaps that can be fixed. We can easily add more PERF_EF_ / PERF_HES_
>>> bits to manage things.
>>>
>>>
>>
>> I am not sure stop / start play nice with NMI's from other events e.g.
>>
>> PMC NMI wants to pause or resume AUX but what if AUX event is currently
>> being processed in ->stop() or ->start()?  Or maybe that can't happen?
> 
> I think that can happen, and pt_event_stop() can actually handle some of
> that, while your pause_resume() thing, which uses pt_config_stop() does
> not.
> 
> But yes, I think that if you add pt_event_{stop,start}() calls from
> *other* events their PMI, then you get to deal with more 'fun'.
> 
> Something like:
> 
>   perf_addr_filters_adjust()
>     __perf_addr_filters_adjust()
>       perf_event_stop()
>         __perf_event_stop()
> 	  event->pmu->stop()
> 	  <NMI>
> 	    ...
> 	    perf_event_overflow()
> 	      pt_event->pmu->stop()
> 	  </NMI>
> 	  event->pmu->start() // whoopsie!
> 
> Should now be possible.
> 
> I think what you want to do is rename pt->handle_nmi into pt->stop_count
> and make it a counter, then ->stop() increments it, and ->start()
> decrements it and everybody ensures the thing doesn't get restart while
> !0 etc..
> 
> I suspect you need to guard the generic part of this feature with a new
> PERF_PMU_CAP_ flag and then have the coresight/etc. people opt-in once
> they've audited things.
> 
> James, does that work for you?
> 

Yep I think that would work.

If I understand it with the stop_count counter, to be able to support
the new CAP, every device would basically have to implement a similar
counter?

Would it be possible to do that reference counting on the outside of
start() and stop() in the core code? So that a device only ever sees one
call to start and one call to stop and doesn't need to do any extra
tracking?
  
Adrian Hunter Dec. 5, 2023, 5:36 a.m. UTC | #6
On 30/11/23 12:07, James Clark wrote:
> 
> 
> On 29/11/2023 12:23, Peter Zijlstra wrote:
>> On Wed, Nov 29, 2023 at 01:15:43PM +0200, Adrian Hunter wrote:
>>> On 29/11/23 12:58, Peter Zijlstra wrote:
>>>> On Wed, Nov 29, 2023 at 09:53:39AM +0000, James Clark wrote:
>>>>> On 23/11/2023 12:18, Adrian Hunter wrote:
>>>>
>>>>>> +static void pt_event_pause_resume(struct perf_event *event)
>>>>>> +{
>>>>>> +	if (event->aux_paused)
>>>>>> +		pt_config_stop(event);
>>>>>> +	else if (!event->hw.state)
>>>>>> +		pt_config_start(event);
>>>>>> +}
>>>>>
>>>>> It seems like having a single pause/resume callback rather than separate
>>>>> pause and resume ones pushes some of the event state management into the
>>>>> individual drivers and would be prone to code duplication and divergent
>>>>> behavior.
>>>>>
>>>>> Would it be possible to move the conditions from here into the core code
>>>>> and call separate functions instead?
>>>>>
>>>>>> +
>>>>>>  static void pt_event_start(struct perf_event *event, int mode)
>>>>>>  {
>>>>>>  	struct hw_perf_event *hwc = &event->hw;
>>>>>> @@ -1798,6 +1809,7 @@ static __init int pt_init(void)
>>>>>>  	pt_pmu.pmu.del			 = pt_event_del;
>>>>>>  	pt_pmu.pmu.start		 = pt_event_start;
>>>>>>  	pt_pmu.pmu.stop			 = pt_event_stop;
>>>>>> +	pt_pmu.pmu.pause_resume		 = pt_event_pause_resume;
>>>>>
>>>>> The general idea seems ok to me. Is there a reason to not use the
>>>>> existing start() stop() callbacks, rather than adding a new one?
>>>>>
>>>>> I assume it's intended to be something like an optimisation where you
>>>>> can turn it on and off without having to do the full setup, teardown and
>>>>> emit an AUX record because you know the process being traced never gets
>>>>> switched out?
>>>>
>>>> So the actual scheduling uses ->add() / ->del(), the ->start() /
>>>> ->stop() methods are something that can be used after ->add() and before
>>>> ->del() to 'temporarily' pause things.
>>>>
>>>> Pretty much exactly what is required here I think. We currently use this
>>>> for PMI throttling and adaptive frequency stuff, but there is no reason
>>>> it could not also be used for this.
>>>>
>>>> As is, we don't track the paused state across ->del() / ->add(), but
>>>> perhaps that can be fixed. We can easily add more PERF_EF_ / PERF_HES_
>>>> bits to manage things.
>>>>
>>>>
>>>
>>> I am not sure stop / start play nice with NMI's from other events e.g.
>>>
>>> PMC NMI wants to pause or resume AUX but what if AUX event is currently
>>> being processed in ->stop() or ->start()?  Or maybe that can't happen?
>>
>> I think that can happen, and pt_event_stop() can actually handle some of
>> that, while your pause_resume() thing, which uses pt_config_stop() does
>> not.
>>
>> But yes, I think that if you add pt_event_{stop,start}() calls from
>> *other* events their PMI, then you get to deal with more 'fun'.
>>
>> Something like:
>>
>>   perf_addr_filters_adjust()
>>     __perf_addr_filters_adjust()
>>       perf_event_stop()
>>         __perf_event_stop()
>> 	  event->pmu->stop()
>> 	  <NMI>
>> 	    ...
>> 	    perf_event_overflow()
>> 	      pt_event->pmu->stop()
>> 	  </NMI>
>> 	  event->pmu->start() // whoopsie!
>>
>> Should now be possible.
>>
>> I think what you want to do is rename pt->handle_nmi into pt->stop_count
>> and make it a counter, then ->stop() increments it, and ->start()
>> decrements it and everybody ensures the thing doesn't get restart while
>> !0 etc..
>>
>> I suspect you need to guard the generic part of this feature with a new
>> PERF_PMU_CAP_ flag and then have the coresight/etc. people opt-in once
>> they've audited things.
>>
>> James, does that work for you?
>>
> 
> Yep I think that would work.
> 
> If I understand it with the stop_count counter, to be able to support
> the new CAP, every device would basically have to implement a similar
> counter?
> 
> Would it be possible to do that reference counting on the outside of
> start() and stop() in the core code? So that a device only ever sees one
> call to start and one call to stop and doesn't need to do any extra
> tracking?

stop_counter does not seem right for pauses and resumes because they
should not accumulate.  We want:

	pause	stop
	pause
	resume	start
	resume

not:

	pause	stop
	pause
	resume
	resume	start

Also stop_counter has issues like:

	stop_counter 0 -> 1
	<NMI>
		stop_counter 1 -> 0
		start()
	</NMI>
	stop()	<- stop_counter is now out of sync

or:

	stop_counter 1 -> 0
	<NMI>
		stop_counter 0 -> 1
		stop()
	</NMI>
	start()	<- stop_counter is now out of sync

Also Intel PT implementation uses low-level start / stop
for pause / resume, which can be made not to conflict with
regular start / stop because they only toggle TRACEEN bit.
  

Patch

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 42a55794004a..aa883b64814a 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -418,6 +418,9 @@  static void pt_config_start(struct perf_event *event)
 	struct pt *pt = this_cpu_ptr(&pt_ctx);
 	u64 ctl = event->hw.config;
 
+	if (event->aux_paused)
+		return;
+
 	ctl |= RTIT_CTL_TRACEEN;
 	if (READ_ONCE(pt->vmx_on))
 		perf_aux_output_flag(&pt->handle, PERF_AUX_FLAG_PARTIAL);
@@ -1563,6 +1566,14 @@  EXPORT_SYMBOL_GPL(intel_pt_handle_vmx);
  * PMU callbacks
  */
 
+static void pt_event_pause_resume(struct perf_event *event)
+{
+	if (event->aux_paused)
+		pt_config_stop(event);
+	else if (!event->hw.state)
+		pt_config_start(event);
+}
+
 static void pt_event_start(struct perf_event *event, int mode)
 {
 	struct hw_perf_event *hwc = &event->hw;
@@ -1798,6 +1809,7 @@  static __init int pt_init(void)
 	pt_pmu.pmu.del			 = pt_event_del;
 	pt_pmu.pmu.start		 = pt_event_start;
 	pt_pmu.pmu.stop			 = pt_event_stop;
+	pt_pmu.pmu.pause_resume		 = pt_event_pause_resume;
 	pt_pmu.pmu.snapshot_aux		 = pt_event_snapshot_aux;
 	pt_pmu.pmu.read			 = pt_event_read;
 	pt_pmu.pmu.setup_aux		 = pt_buffer_setup_aux;