perf/x86/intel/pt: Fix sampling using single range output

Message ID 20221112151508.13768-1-adrian.hunter@intel.com
State New
Series perf/x86/intel/pt: Fix sampling using single range output

Commit Message

Adrian Hunter Nov. 12, 2022, 3:15 p.m. UTC
Deal with errata TGL052, ADL037 and RPL017 "Trace May Contain Incorrect
Data When Configured With Single Range Output Larger Than 4KB" by
disabling single range output whenever larger than 4KB.

Fixes: 670638477aed ("perf/x86/intel/pt: Opportunistically use single range output mode")
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 arch/x86/events/intel/pt.c | 9 +++++++++
 1 file changed, 9 insertions(+)
  

Comments

Peter Zijlstra Nov. 14, 2022, 10:51 a.m. UTC | #1
On Sat, Nov 12, 2022 at 05:15:08PM +0200, Adrian Hunter wrote:
> Deal with errata TGL052, ADL037 and RPL017 "Trace May Contain Incorrect
> Data When Configured With Single Range Output Larger Than 4KB" by
> disabling single range output whenever larger than 4KB.
> 
> Fixes: 670638477aed ("perf/x86/intel/pt: Opportunistically use single range output mode")
> Cc: stable@vger.kernel.org
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  arch/x86/events/intel/pt.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
> index 82ef87e9a897..42a55794004a 100644
> --- a/arch/x86/events/intel/pt.c
> +++ b/arch/x86/events/intel/pt.c
> @@ -1263,6 +1263,15 @@ static int pt_buffer_try_single(struct pt_buffer *buf, int nr_pages)
>  	if (1 << order != nr_pages)
>  		goto out;
>  
> +	/*
> +	 * Some processors cannot always support single range for more than
> +	 * 4KB - refer errata TGL052, ADL037 and RPL017. Future processors might
> +	 * also be affected, so for now rather than trying to keep track of
> +	 * which ones, just disable it for all.
> +	 */
> +	if (nr_pages > 1)
> +		goto out;

This effectively declares single-output-mode dead? Because I don't think
anybody uses PT with a single 4K buffer.
  
Adrian Hunter Nov. 14, 2022, 11:10 a.m. UTC | #2
On 14/11/22 12:51, Peter Zijlstra wrote:
> On Sat, Nov 12, 2022 at 05:15:08PM +0200, Adrian Hunter wrote:
>> Deal with errata TGL052, ADL037 and RPL017 "Trace May Contain Incorrect
>> Data When Configured With Single Range Output Larger Than 4KB" by
>> disabling single range output whenever larger than 4KB.
>>
>> Fixes: 670638477aed ("perf/x86/intel/pt: Opportunistically use single range output mode")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>>  arch/x86/events/intel/pt.c | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
>> index 82ef87e9a897..42a55794004a 100644
>> --- a/arch/x86/events/intel/pt.c
>> +++ b/arch/x86/events/intel/pt.c
>> @@ -1263,6 +1263,15 @@ static int pt_buffer_try_single(struct pt_buffer *buf, int nr_pages)
>>  	if (1 << order != nr_pages)
>>  		goto out;
>>  
>> +	/*
>> +	 * Some processors cannot always support single range for more than
>> +	 * 4KB - refer errata TGL052, ADL037 and RPL017. Future processors might
>> +	 * also be affected, so for now rather than trying to keep track of
>> +	 * which ones, just disable it for all.
>> +	 */
>> +	if (nr_pages > 1)
>> +		goto out;
> 
> This effectively declares single-output-mode dead? Because I don't think
> anybody uses PT with a single 4K buffer.

4K is the default size for "sample mode", i.e. stuffing 4KB of Intel PT trace
data into a PERF_RECORD_SAMPLE record whose sample_type has the PERF_SAMPLE_AUX bit set,

e.g.

$ perf record -vv --aux-sample -e '{intel_pt//u,cycles:u}' uname 2>err.txt
Linux
$ grep aux_sample_size err.txt
  aux_sample_size                  4096
$
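
The arithmetic behind this point can be sketched in plain userspace C. This is only an illustration, not kernel or perf tool code: the helper names are hypothetical, and it assumes a 4 KiB page size and an AUX buffer sized to exactly the sample size. Under those assumptions a 4096-byte sample occupies one page, so the patch's "more than one page" condition does not exclude it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: round a byte count up to whole pages. */
size_t aux_bytes_to_pages(size_t bytes, size_t page_size)
{
	assert(page_size != 0);
	return (bytes + page_size - 1) / page_size;
}

/*
 * Hypothetical helper mirroring the patch's new condition:
 * a buffer of more than one page no longer uses single range output.
 */
bool aux_fits_single_range(size_t bytes, size_t page_size)
{
	return aux_bytes_to_pages(bytes, page_size) <= 1;
}
```

With these definitions, `aux_fits_single_range(4096, 4096)` is true, while any size above 4096 bytes rounds up to at least two pages and is rejected.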
  
Peter Zijlstra Nov. 14, 2022, 12:59 p.m. UTC | #3
On Mon, Nov 14, 2022 at 01:10:38PM +0200, Adrian Hunter wrote:
> On 14/11/22 12:51, Peter Zijlstra wrote:
> > On Sat, Nov 12, 2022 at 05:15:08PM +0200, Adrian Hunter wrote:
> >> Deal with errata TGL052, ADL037 and RPL017 "Trace May Contain Incorrect
> >> Data When Configured With Single Range Output Larger Than 4KB" by
> >> disabling single range output whenever larger than 4KB.
> >>
> >> Fixes: 670638477aed ("perf/x86/intel/pt: Opportunistically use single range output mode")
> >> Cc: stable@vger.kernel.org
> >> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> >> ---
> >>  arch/x86/events/intel/pt.c | 9 +++++++++
> >>  1 file changed, 9 insertions(+)
> >>
> >> diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
> >> index 82ef87e9a897..42a55794004a 100644
> >> --- a/arch/x86/events/intel/pt.c
> >> +++ b/arch/x86/events/intel/pt.c
> >> @@ -1263,6 +1263,15 @@ static int pt_buffer_try_single(struct pt_buffer *buf, int nr_pages)
> >>  	if (1 << order != nr_pages)
> >>  		goto out;
> >>  
> >> +	/*
> >> +	 * Some processors cannot always support single range for more than
> >> +	 * 4KB - refer errata TGL052, ADL037 and RPL017. Future processors might
> >> +	 * also be affected, so for now rather than trying to keep track of
> >> +	 * which ones, just disable it for all.
> >> +	 */
> >> +	if (nr_pages > 1)
> >> +		goto out;
> > 
> > This effectively declares single-output-mode dead? Because I don't think
> > anybody uses PT with a single 4K buffer.
> 
> 4K is the default size for "sample mode" i.e. stuffing 4KB of Intel PT trace
> data into a PERF_RECORD_SAMPLE record that has sample_type bit PERF_SAMPLE_AUX
> 
> e.g.
> 
> $ perf record -vv --aux-sample -e '{intel_pt//u,cycles:u}' uname 2>err.txt
> Linux
> $ grep aux_sample_size err.txt
>   aux_sample_size                  4096

Ah, ok. Not as bad then. Anyway, I'll go queue it for perf/urgent I
suppose.
  
Andi Kleen Nov. 15, 2022, 7:46 p.m. UTC | #4
Peter Zijlstra <peterz@infradead.org> writes:

> On Mon, Nov 14, 2022 at 01:10:38PM +0200, Adrian Hunter wrote:
>> On 14/11/22 12:51, Peter Zijlstra wrote:
>> > On Sat, Nov 12, 2022 at 05:15:08PM +0200, Adrian Hunter wrote:
>> >> Deal with errata TGL052, ADL037 and RPL017 "Trace May Contain Incorrect
>> >> Data When Configured With Single Range Output Larger Than 4KB" by
>> >> disabling single range output whenever larger than 4KB.
>> >>
>> >> Fixes: 670638477aed ("perf/x86/intel/pt: Opportunistically use single range output mode")
>> >> Cc: stable@vger.kernel.org
>> >> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> >> ---
>> >>  arch/x86/events/intel/pt.c | 9 +++++++++
>> >>  1 file changed, 9 insertions(+)
>> >>
>> >> diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
>> >> index 82ef87e9a897..42a55794004a 100644
>> >> --- a/arch/x86/events/intel/pt.c
>> >> +++ b/arch/x86/events/intel/pt.c
>> >> @@ -1263,6 +1263,15 @@ static int pt_buffer_try_single(struct pt_buffer *buf, int nr_pages)
>> >>  	if (1 << order != nr_pages)
>> >>  		goto out;
>> >>  
>> >> +	/*
>> >> +	 * Some processors cannot always support single range for more than
>> >> +	 * 4KB - refer errata TGL052, ADL037 and RPL017. Future processors might
>> >> +	 * also be affected, so for now rather than trying to keep track of
>> >> +	 * which ones, just disable it for all.
>> >> +	 */
>> >> +	if (nr_pages > 1)
>> >> +		goto out;
>> > 
>> > This effectively declares single-output-mode dead? Because I don't think
>> > anybody uses PT with a single 4K buffer.
>> 
>> 4K is the default size for "sample mode" i.e. stuffing 4KB of Intel PT trace
>> data into a PERF_RECORD_SAMPLE record that has sample_type bit PERF_SAMPLE_AUX
>> 
>> e.g.
>> 
>> $ perf record -vv --aux-sample -e '{intel_pt//u,cycles:u}' uname 2>err.txt
>> Linux
>> $ grep aux_sample_size err.txt
>>   aux_sample_size                  4096
>
> Ah, ok. Not as bad then. Anyway, I'll go queue it for perf/urgent I
> suppose.

It would be better to only limit on the CPUs with the bug because
switching buffers causes some extra latencies. So this patch may regress
PT overhead or tail latencies.

-Andi
  
Adrian Hunter Nov. 16, 2022, 6:26 a.m. UTC | #5
On 15/11/22 21:46, Andi Kleen wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> 
>> On Mon, Nov 14, 2022 at 01:10:38PM +0200, Adrian Hunter wrote:
>>> On 14/11/22 12:51, Peter Zijlstra wrote:
>>>> On Sat, Nov 12, 2022 at 05:15:08PM +0200, Adrian Hunter wrote:
>>>>> Deal with errata TGL052, ADL037 and RPL017 "Trace May Contain Incorrect
>>>>> Data When Configured With Single Range Output Larger Than 4KB" by
>>>>> disabling single range output whenever larger than 4KB.
>>>>>
>>>>> Fixes: 670638477aed ("perf/x86/intel/pt: Opportunistically use single range output mode")
>>>>> Cc: stable@vger.kernel.org
>>>>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>>>>> ---
>>>>>  arch/x86/events/intel/pt.c | 9 +++++++++
>>>>>  1 file changed, 9 insertions(+)
>>>>>
>>>>> diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
>>>>> index 82ef87e9a897..42a55794004a 100644
>>>>> --- a/arch/x86/events/intel/pt.c
>>>>> +++ b/arch/x86/events/intel/pt.c
>>>>> @@ -1263,6 +1263,15 @@ static int pt_buffer_try_single(struct pt_buffer *buf, int nr_pages)
>>>>>  	if (1 << order != nr_pages)
>>>>>  		goto out;
>>>>>  
>>>>> +	/*
>>>>> +	 * Some processors cannot always support single range for more than
>>>>> +	 * 4KB - refer errata TGL052, ADL037 and RPL017. Future processors might
>>>>> +	 * also be affected, so for now rather than trying to keep track of
>>>>> +	 * which ones, just disable it for all.
>>>>> +	 */
>>>>> +	if (nr_pages > 1)
>>>>> +		goto out;
>>>>
>>>> This effectively declares single-output-mode dead? Because I don't think
>>>> anybody uses PT with a single 4K buffer.
>>>
>>> 4K is the default size for "sample mode" i.e. stuffing 4KB of Intel PT trace
>>> data into a PERF_RECORD_SAMPLE record that has sample_type bit PERF_SAMPLE_AUX
>>>
>>> e.g.
>>>
>>> $ perf record -vv --aux-sample -e '{intel_pt//u,cycles:u}' uname 2>err.txt
>>> Linux
>>> $ grep aux_sample_size err.txt
>>>   aux_sample_size                  4096
>>
>> Ah, ok. Not as bad then. Anyway, I'll go queue it for perf/urgent I
>> suppose.
> 
> It would be better to only limit on the CPUs with the bug because
> switching buffers causes some extra latencies. So this patch may regress
> PT overhead or tail latencies.

I could whitelist CPUs that do not have the issue, because a blacklist
would keep expanding, which would be a bit of a pain to maintain.
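
A rough userspace sketch of what such an allow-list check might look like follows. It is not the kernel implementation: the enum values are hypothetical placeholders rather than real CPU model numbers, and the function names are invented for illustration. A real kernel version would more likely build on the existing x86 CPU matching tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical identifiers: placeholders, NOT real CPU model numbers. */
enum pt_cpu_model {
	PT_MODEL_KNOWN_GOOD_A,
	PT_MODEL_KNOWN_GOOD_B,
	PT_MODEL_UNVERIFIED,
};

/* Models assumed (for illustration only) to be free of the erratum. */
static const enum pt_cpu_model pt_single_range_allowlist[] = {
	PT_MODEL_KNOWN_GOOD_A,
	PT_MODEL_KNOWN_GOOD_B,
};

/* Return true if the model appears on the allow-list. */
bool pt_model_on_allowlist(enum pt_cpu_model model)
{
	size_t n = sizeof(pt_single_range_allowlist) /
		   sizeof(pt_single_range_allowlist[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		if (pt_single_range_allowlist[i] == model)
			return true;
	}
	return false;
}

/*
 * One page is always acceptable; larger buffers are acceptable only
 * on allow-listed models. Unknown models take the conservative path.
 */
bool pt_single_range_ok(enum pt_cpu_model model, int nr_pages)
{
	assert(nr_pages > 0);
	if (nr_pages == 1)
		return true;
	return pt_model_on_allowlist(model);
}
```

The allow-list defaults to the safe behaviour for any model that has not been checked, which is why it does not need updating each time a new affected processor appears.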
  

Patch

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 82ef87e9a897..42a55794004a 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -1263,6 +1263,15 @@ static int pt_buffer_try_single(struct pt_buffer *buf, int nr_pages)
 	if (1 << order != nr_pages)
 		goto out;
 
+	/*
+	 * Some processors cannot always support single range for more than
+	 * 4KB - refer errata TGL052, ADL037 and RPL017. Future processors might
+	 * also be affected, so for now rather than trying to keep track of
+	 * which ones, just disable it for all.
+	 */
+	if (nr_pages > 1)
+		goto out;
+
 	buf->single = true;
 	buf->nr_pages = nr_pages;
 	ret = 0;
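
The combined effect of the two checks visible in this hunk can be modelled in userspace as below. This is a simplified sketch, not the kernel function: `pt_single_eligible_model` is a hypothetical name, `order` is passed in directly rather than read from the page, and any earlier checks in `pt_buffer_try_single()` that are not shown in the hunk are left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the two conditions shown in the hunk:
 * 1. the buffer must be a single allocation of exactly 2^order pages;
 * 2. after the patch, it must not be larger than one page.
 */
bool pt_single_eligible_model(int order, int nr_pages)
{
	assert(order >= 0 && order < 31);

	if ((1 << order) != nr_pages)
		return false;

	if (nr_pages > 1)
		return false;

	return true;
}
```

Taken together, only `order == 0` with `nr_pages == 1` passes both checks, which matches the 4KB limit described in the commit message.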