[1/1] Revert "perf report: Append inlines to non-DWARF callchains"

Message ID ZMl8VyhdwhClTM5g@kernel.org
State New
Headers
Series [1/1] Revert "perf report: Append inlines to non-DWARF callchains" |

Commit Message

Arnaldo Carvalho de Melo Aug. 1, 2023, 9:42 p.m. UTC
  Hi Artem,

	Can you please double check this? I reproduced with:

git checkout 46d21ec067490ab9cdcc89b9de5aae28786a8b8e
build it
perf record -a -g sleep 5s
perf report

	Do you get the same slowness and then reverting it, i.e. just
going to HEAD~ and rebuilding getting a fast 'perf report' startup, i.e.
without the inlines in the callchains?

- Arnaldo

----

This reverts commit 46d21ec067490ab9cdcc89b9de5aae28786a8b8e.

The tests were made with a specific workload, further tests on a
recently updated fedora 38 system with a system wide perf.data file
shows 'perf report' taking excessive time, so lets revert this until a
full investigation and improvement on the addr2line support code is
made.

Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Artem Savkov <asavkov@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/machine.c | 5 -----
 1 file changed, 5 deletions(-)
  

Comments

Artem Savkov Aug. 2, 2023, 7:43 a.m. UTC | #1
Hi Arnaldo,

On Tue, Aug 01, 2023 at 06:42:47PM -0300, Arnaldo Carvalho de Melo wrote:
> Hi Artem,
> 
> 	Can you please double check this? I reproduced with:
> 
> git checkout 46d21ec067490ab9cdcc89b9de5aae28786a8b8e
> build it
> perf record -a -g sleep 5s
> perf report
> 
> 	Do you get the same slowness and then reverting it, i.e. just
> going to HEAD~ and rebuilding getting a fast 'perf report' startup, i.e.
> without the inlines in the callchains?

With a simple test like this I definitely get a slowdown, but not sure
if it can be called excessive.

Below are the times I got by running 'time perf report' and hitting 'q'
during load so that it quits as soon as it is loads up. Tested on a
freshly updated fedora 38.

For 'perf record -a -g sleep 60' (Event count (approx.): 774055090):

with inlines:
$ time ./perf report

real    0m1.477s
user    0m1.324s
sys     0m0.147s

without inlines:
$ time ./perf report

real    0m1.349s
user    0m1.232s
sys     0m0.111s

For 'perf record -a -g sleep 5' (Event count (approx.): 90179399):

with inlines:
$ time ./perf report

real    0m0.657s
user    0m0.555s
sys     0m0.099s

without inlines:
$ time ./perf report

real    0m0.559s
user    0m0.498s
sys     0m0.060s


> - Arnaldo
> 
> ----
> 
> This reverts commit 46d21ec067490ab9cdcc89b9de5aae28786a8b8e.
> 
> The tests were made with a specific workload, further tests on a
> recently updated fedora 38 system with a system wide perf.data file
> shows 'perf report' taking excessive time, so lets revert this until a
> full investigation and improvement on the addr2line support code is
> made.
> 
> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
> Cc: Artem Savkov <asavkov@redhat.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Ian Rogers <irogers@google.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Milian Wolff <milian.wolff@kdab.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> ---
>  tools/perf/util/machine.c | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 4e62843d51b7dbf9..f4cb41ee23cdbcfc 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -45,7 +45,6 @@
>  
>  static void __machine__remove_thread(struct machine *machine, struct thread_rb_node *nd,
>  				     struct thread *th, bool lock);
> -static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip);
>  
>  static struct dso *machine__kernel_dso(struct machine *machine)
>  {
> @@ -2385,10 +2384,6 @@ static int add_callchain_ip(struct thread *thread,
>  	ms.maps = maps__get(al.maps);
>  	ms.map = map__get(al.map);
>  	ms.sym = al.sym;
> -
> -	if (!branch && append_inlines(cursor, &ms, ip) == 0)
> -		goto out;
> -
>  	srcline = callchain_srcline(&ms, al.addr);
>  	err = callchain_cursor_append(cursor, ip, &ms,
>  				      branch, flags, nr_loop_iter,
> -- 
> 2.41.0
>
  
Artem Savkov Aug. 7, 2023, 11 a.m. UTC | #2
On Wed, Aug 02, 2023 at 09:43:40AM +0200, Artem Savkov wrote:
> Hi Arnaldo,
> 
> On Tue, Aug 01, 2023 at 06:42:47PM -0300, Arnaldo Carvalho de Melo wrote:
> > Hi Artem,
> > 
> > 	Can you please double check this? I reproduced with:
> > 
> > git checkout 46d21ec067490ab9cdcc89b9de5aae28786a8b8e
> > build it
> > perf record -a -g sleep 5s
> > perf report
> > 
> > 	Do you get the same slowness and then reverting it, i.e. just
> > going to HEAD~ and rebuilding getting a fast 'perf report' startup, i.e.
> > without the inlines in the callchains?
> 
> With a simple test like this I definitely get a slowdown, but not sure
> if it can be called excessive.
> 
> Below are the times I got by running 'time perf report' and hitting 'q'
> during load so that it quits as soon as it is loads up. Tested on a
> freshly updated fedora 38.

My bad, I had wrong debuginfo installed for the kernel I tested. I can
reproduce it with the correct one. Looks like vmlinux is just too much
for addr2line. Maybe we can skip it but leave other inlines in, like so:

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 11de3ca8d4fa7..fef309cd401f7 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2388,7 +2388,9 @@ static int add_callchain_ip(struct thread *thread,
 	ms.map = map__get(al.map);
 	ms.sym = al.sym;
 
-	if (!branch && append_inlines(cursor, &ms, ip) == 0)
+	if (!branch && ms.map && ms.map->dso &&
+	    strcmp(ms.map->dso->short_name, "[kernel.vmlinux]") &&
+	    append_inlines(cursor, &ms, ip) == 0)
 		goto out;
 
 	srcline = callchain_srcline(&ms, al.addr);

> > - Arnaldo
> > 
> > ----
> > 
> > This reverts commit 46d21ec067490ab9cdcc89b9de5aae28786a8b8e.
> > 
> > The tests were made with a specific workload, further tests on a
> > recently updated fedora 38 system with a system wide perf.data file
> > shows 'perf report' taking excessive time, so lets revert this until a
> > full investigation and improvement on the addr2line support code is
> > made.
> > 
> > Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
> > Cc: Artem Savkov <asavkov@redhat.com>
> > Cc: Namhyung Kim <namhyung@kernel.org>
> > Cc: Adrian Hunter <adrian.hunter@intel.com>
> > Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> > Cc: Ian Rogers <irogers@google.com>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Jiri Olsa <jolsa@kernel.org>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: Masami Hiramatsu <mhiramat@kernel.org>
> > Cc: Milian Wolff <milian.wolff@kdab.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> > ---
> >  tools/perf/util/machine.c | 5 -----
> >  1 file changed, 5 deletions(-)
> > 
> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > index 4e62843d51b7dbf9..f4cb41ee23cdbcfc 100644
> > --- a/tools/perf/util/machine.c
> > +++ b/tools/perf/util/machine.c
> > @@ -45,7 +45,6 @@
> >  
> >  static void __machine__remove_thread(struct machine *machine, struct thread_rb_node *nd,
> >  				     struct thread *th, bool lock);
> > -static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip);
> >  
> >  static struct dso *machine__kernel_dso(struct machine *machine)
> >  {
> > @@ -2385,10 +2384,6 @@ static int add_callchain_ip(struct thread *thread,
> >  	ms.maps = maps__get(al.maps);
> >  	ms.map = map__get(al.map);
> >  	ms.sym = al.sym;
> > -
> > -	if (!branch && append_inlines(cursor, &ms, ip) == 0)
> > -		goto out;
> > -
> >  	srcline = callchain_srcline(&ms, al.addr);
> >  	err = callchain_cursor_append(cursor, ip, &ms,
> >  				      branch, flags, nr_loop_iter,
> > -- 
> > 2.41.0
> > 
> 
> -- 
>  Artem
  
Arnaldo Carvalho de Melo Aug. 7, 2023, 1:34 p.m. UTC | #3
Em Mon, Aug 07, 2023 at 01:00:08PM +0200, Artem Savkov escreveu:
> On Wed, Aug 02, 2023 at 09:43:40AM +0200, Artem Savkov wrote:
> > Hi Arnaldo,
> > 
> > On Tue, Aug 01, 2023 at 06:42:47PM -0300, Arnaldo Carvalho de Melo wrote:
> > > Hi Artem,
> > > 
> > > 	Can you please double check this? I reproduced with:
> > > 
> > > git checkout 46d21ec067490ab9cdcc89b9de5aae28786a8b8e
> > > build it
> > > perf record -a -g sleep 5s
> > > perf report
> > > 
> > > 	Do you get the same slowness and then reverting it, i.e. just
> > > going to HEAD~ and rebuilding getting a fast 'perf report' startup, i.e.
> > > without the inlines in the callchains?
> > 
> > With a simple test like this I definitely get a slowdown, but not sure
> > if it can be called excessive.
> > 
> > Below are the times I got by running 'time perf report' and hitting 'q'
> > during load so that it quits as soon as it is loads up. Tested on a
> > freshly updated fedora 38.
> 
> My bad, I had wrong debuginfo installed for the kernel I tested. I can
> reproduce it with the correct one. Looks like vmlinux is just too much
> for addr2line. Maybe we can skip it but leave other inlines in, like so:

That is a possibilit, and probably we could make it cheaper by looking
at the cpumode, avoiding calling addr2line when we didn't makage to
resolve the symbol, etc.

We also may want to have this as an option that has to be explicitely
enabled, like --resolve-inlines, as this will add overhead no matter if
we stop calling addr2line and do it more efficiently, etc.

Fact is, we're late in the 6.5 schedule, so the best thing now is to
just revert the patch and then try again later, ok?

- Arnaldo
 
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 11de3ca8d4fa7..fef309cd401f7 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -2388,7 +2388,9 @@ static int add_callchain_ip(struct thread *thread,
>  	ms.map = map__get(al.map);
>  	ms.sym = al.sym;
>  
> -	if (!branch && append_inlines(cursor, &ms, ip) == 0)
> +	if (!branch && ms.map && ms.map->dso &&
> +	    strcmp(ms.map->dso->short_name, "[kernel.vmlinux]") &&
> +	    append_inlines(cursor, &ms, ip) == 0)
>  		goto out;
>  
>  	srcline = callchain_srcline(&ms, al.addr);
> 
> > > - Arnaldo
> > > 
> > > ----
> > > 
> > > This reverts commit 46d21ec067490ab9cdcc89b9de5aae28786a8b8e.
> > > 
> > > The tests were made with a specific workload, further tests on a
> > > recently updated fedora 38 system with a system wide perf.data file
> > > shows 'perf report' taking excessive time, so lets revert this until a
> > > full investigation and improvement on the addr2line support code is
> > > made.
> > > 
> > > Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
> > > Cc: Artem Savkov <asavkov@redhat.com>
> > > Cc: Namhyung Kim <namhyung@kernel.org>
> > > Cc: Adrian Hunter <adrian.hunter@intel.com>
> > > Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> > > Cc: Ian Rogers <irogers@google.com>
> > > Cc: Ingo Molnar <mingo@redhat.com>
> > > Cc: Jiri Olsa <jolsa@kernel.org>
> > > Cc: Mark Rutland <mark.rutland@arm.com>
> > > Cc: Masami Hiramatsu <mhiramat@kernel.org>
> > > Cc: Milian Wolff <milian.wolff@kdab.com>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> > > ---
> > >  tools/perf/util/machine.c | 5 -----
> > >  1 file changed, 5 deletions(-)
> > > 
> > > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > > index 4e62843d51b7dbf9..f4cb41ee23cdbcfc 100644
> > > --- a/tools/perf/util/machine.c
> > > +++ b/tools/perf/util/machine.c
> > > @@ -45,7 +45,6 @@
> > >  
> > >  static void __machine__remove_thread(struct machine *machine, struct thread_rb_node *nd,
> > >  				     struct thread *th, bool lock);
> > > -static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip);
> > >  
> > >  static struct dso *machine__kernel_dso(struct machine *machine)
> > >  {
> > > @@ -2385,10 +2384,6 @@ static int add_callchain_ip(struct thread *thread,
> > >  	ms.maps = maps__get(al.maps);
> > >  	ms.map = map__get(al.map);
> > >  	ms.sym = al.sym;
> > > -
> > > -	if (!branch && append_inlines(cursor, &ms, ip) == 0)
> > > -		goto out;
> > > -
> > >  	srcline = callchain_srcline(&ms, al.addr);
> > >  	err = callchain_cursor_append(cursor, ip, &ms,
> > >  				      branch, flags, nr_loop_iter,
> > > -- 
> > > 2.41.0
> > > 
> > 
> > -- 
> >  Artem
> 
> -- 
>  Artem
>
  
Artem Savkov Aug. 7, 2023, 2:03 p.m. UTC | #4
On Mon, Aug 07, 2023 at 10:34:44AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Aug 07, 2023 at 01:00:08PM +0200, Artem Savkov escreveu:
> > On Wed, Aug 02, 2023 at 09:43:40AM +0200, Artem Savkov wrote:
> > > Hi Arnaldo,
> > > 
> > > On Tue, Aug 01, 2023 at 06:42:47PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > Hi Artem,
> > > > 
> > > > 	Can you please double check this? I reproduced with:
> > > > 
> > > > git checkout 46d21ec067490ab9cdcc89b9de5aae28786a8b8e
> > > > build it
> > > > perf record -a -g sleep 5s
> > > > perf report
> > > > 
> > > > 	Do you get the same slowness and then reverting it, i.e. just
> > > > going to HEAD~ and rebuilding getting a fast 'perf report' startup, i.e.
> > > > without the inlines in the callchains?
> > > 
> > > With a simple test like this I definitely get a slowdown, but not sure
> > > if it can be called excessive.
> > > 
> > > Below are the times I got by running 'time perf report' and hitting 'q'
> > > during load so that it quits as soon as it is loads up. Tested on a
> > > freshly updated fedora 38.
> > 
> > My bad, I had wrong debuginfo installed for the kernel I tested. I can
> > reproduce it with the correct one. Looks like vmlinux is just too much
> > for addr2line. Maybe we can skip it but leave other inlines in, like so:
> 
> That is a possibilit, and probably we could make it cheaper by looking
> at the cpumode, avoiding calling addr2line when we didn't makage to
> resolve the symbol, etc.
> 
> We also may want to have this as an option that has to be explicitely
> enabled, like --resolve-inlines, as this will add overhead no matter if
> we stop calling addr2line and do it more efficiently, etc.

Sounds good, I'll look into it.

> Fact is, we're late in the 6.5 schedule, so the best thing now is to
> just revert the patch and then try again later, ok?

Yes, sure.

> - Arnaldo
>  
> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > index 11de3ca8d4fa7..fef309cd401f7 100644
> > --- a/tools/perf/util/machine.c
> > +++ b/tools/perf/util/machine.c
> > @@ -2388,7 +2388,9 @@ static int add_callchain_ip(struct thread *thread,
> >  	ms.map = map__get(al.map);
> >  	ms.sym = al.sym;
> >  
> > -	if (!branch && append_inlines(cursor, &ms, ip) == 0)
> > +	if (!branch && ms.map && ms.map->dso &&
> > +	    strcmp(ms.map->dso->short_name, "[kernel.vmlinux]") &&
> > +	    append_inlines(cursor, &ms, ip) == 0)
> >  		goto out;
> >  
> >  	srcline = callchain_srcline(&ms, al.addr);
> > 
> > > > - Arnaldo
> > > > 
> > > > ----
> > > > 
> > > > This reverts commit 46d21ec067490ab9cdcc89b9de5aae28786a8b8e.
> > > > 
> > > > The tests were made with a specific workload, further tests on a
> > > > recently updated fedora 38 system with a system wide perf.data file
> > > > shows 'perf report' taking excessive time, so lets revert this until a
> > > > full investigation and improvement on the addr2line support code is
> > > > made.
> > > > 
> > > > Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
> > > > Cc: Artem Savkov <asavkov@redhat.com>
> > > > Cc: Namhyung Kim <namhyung@kernel.org>
> > > > Cc: Adrian Hunter <adrian.hunter@intel.com>
> > > > Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> > > > Cc: Ian Rogers <irogers@google.com>
> > > > Cc: Ingo Molnar <mingo@redhat.com>
> > > > Cc: Jiri Olsa <jolsa@kernel.org>
> > > > Cc: Mark Rutland <mark.rutland@arm.com>
> > > > Cc: Masami Hiramatsu <mhiramat@kernel.org>
> > > > Cc: Milian Wolff <milian.wolff@kdab.com>
> > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> > > > ---
> > > >  tools/perf/util/machine.c | 5 -----
> > > >  1 file changed, 5 deletions(-)
> > > > 
> > > > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > > > index 4e62843d51b7dbf9..f4cb41ee23cdbcfc 100644
> > > > --- a/tools/perf/util/machine.c
> > > > +++ b/tools/perf/util/machine.c
> > > > @@ -45,7 +45,6 @@
> > > >  
> > > >  static void __machine__remove_thread(struct machine *machine, struct thread_rb_node *nd,
> > > >  				     struct thread *th, bool lock);
> > > > -static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip);
> > > >  
> > > >  static struct dso *machine__kernel_dso(struct machine *machine)
> > > >  {
> > > > @@ -2385,10 +2384,6 @@ static int add_callchain_ip(struct thread *thread,
> > > >  	ms.maps = maps__get(al.maps);
> > > >  	ms.map = map__get(al.map);
> > > >  	ms.sym = al.sym;
> > > > -
> > > > -	if (!branch && append_inlines(cursor, &ms, ip) == 0)
> > > > -		goto out;
> > > > -
> > > >  	srcline = callchain_srcline(&ms, al.addr);
> > > >  	err = callchain_cursor_append(cursor, ip, &ms,
> > > >  				      branch, flags, nr_loop_iter,
> > > > -- 
> > > > 2.41.0
> > > > 
> > > 
> > > -- 
> > >  Artem
> > 
> > -- 
> >  Artem
> > 
> 
> -- 
> 
> - Arnaldo
>
  
Jesper Dangaard Brouer Aug. 8, 2023, 1:38 p.m. UTC | #5
On 07/08/2023 16.03, Artem Savkov wrote:
> On Mon, Aug 07, 2023 at 10:34:44AM -0300, Arnaldo Carvalho de Melo wrote:
>> Em Mon, Aug 07, 2023 at 01:00:08PM +0200, Artem Savkov escreveu:
>>> On Wed, Aug 02, 2023 at 09:43:40AM +0200, Artem Savkov wrote:
>>>> Hi Arnaldo,
>>>>
>>>> On Tue, Aug 01, 2023 at 06:42:47PM -0300, Arnaldo Carvalho de Melo wrote:
>>>>> Hi Artem,
>>>>>
>>>>> 	Can you please double check this? I reproduced with:
>>>>>
>>>>> git checkout 46d21ec067490ab9cdcc89b9de5aae28786a8b8e
>>>>> build it
>>>>> perf record -a -g sleep 5s
>>>>> perf report
>>>>>
>>>>> 	Do you get the same slowness and then reverting it, i.e. just
>>>>> going to HEAD~ and rebuilding getting a fast 'perf report' startup, i.e.
>>>>> without the inlines in the callchains?
>>>>
>>>> With a simple test like this I definitely get a slowdown, but not sure
>>>> if it can be called excessive.
>>>>
>>>> Below are the times I got by running 'time perf report' and hitting 'q'
>>>> during load so that it quits as soon as it is loads up. Tested on a
>>>> freshly updated fedora 38.
>>>

I reported this problem to ACME. It is also possible to reproduce
without hitting 'q' via using this cmdline with --stdio like this:

  time perf report -v --stdio > /dev/null 2> debug01.stderr

The file 'debug01.stderr' contained a lot of addr2line output, that
might help debug the issue further.



>>> My bad, I had wrong debuginfo installed for the kernel I tested. I can
>>> reproduce it with the correct one. Looks like vmlinux is just too much
>>> for addr2line. Maybe we can skip it but leave other inlines in, like so:
>>
>> That is a possibilit, and probably we could make it cheaper by looking
>> at the cpumode, avoiding calling addr2line when we didn't makage to
>> resolve the symbol, etc.
>>
>> We also may want to have this as an option that has to be explicitely
>> enabled, like --resolve-inlines, as this will add overhead no matter if
>> we stop calling addr2line and do it more efficiently, etc.
> 
> Sounds good, I'll look into it.
> 
>> Fact is, we're late in the 6.5 schedule, so the best thing now is to
>> just revert the patch and then try again later, ok?
> 
> Yes, sure.
> 
>> - Arnaldo
>>   
>>> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
>>> index 11de3ca8d4fa7..fef309cd401f7 100644
>>> --- a/tools/perf/util/machine.c
>>> +++ b/tools/perf/util/machine.c
>>> @@ -2388,7 +2388,9 @@ static int add_callchain_ip(struct thread *thread,
>>>   	ms.map = map__get(al.map);
>>>   	ms.sym = al.sym;
>>>   
>>> -	if (!branch && append_inlines(cursor, &ms, ip) == 0)
>>> +	if (!branch && ms.map && ms.map->dso &&
>>> +	    strcmp(ms.map->dso->short_name, "[kernel.vmlinux]") &&
>>> +	    append_inlines(cursor, &ms, ip) == 0)
>>>   		goto out;
>>>   
>>>   	srcline = callchain_srcline(&ms, al.addr);
>>>
>>>>> - Arnaldo
>>>>>
>>>>> ----
>>>>>
>>>>> This reverts commit 46d21ec067490ab9cdcc89b9de5aae28786a8b8e.
>>>>>
>>>>> The tests were made with a specific workload, further tests on a
>>>>> recently updated fedora 38 system with a system wide perf.data file
>>>>> shows 'perf report' taking excessive time, so lets revert this until a
>>>>> full investigation and improvement on the addr2line support code is
>>>>> made.
>>>>>

Reported-by: Jesper Dangaard Brouer <hawk@kernel.org>

>>>>> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
>>>>> Cc: Artem Savkov <asavkov@redhat.com>
>>>>> Cc: Namhyung Kim <namhyung@kernel.org>
>>>>> Cc: Adrian Hunter <adrian.hunter@intel.com>
>>>>> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
>>>>> Cc: Ian Rogers <irogers@google.com>
>>>>> Cc: Ingo Molnar <mingo@redhat.com>
>>>>> Cc: Jiri Olsa <jolsa@kernel.org>
>>>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>>>> Cc: Masami Hiramatsu <mhiramat@kernel.org>
>>>>> Cc: Milian Wolff <milian.wolff@kdab.com>
>>>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>>>> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
>>>>> ---
>>>>>   tools/perf/util/machine.c | 5 -----
>>>>>   1 file changed, 5 deletions(-)
>>>>>
>>>>> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
>>>>> index 4e62843d51b7dbf9..f4cb41ee23cdbcfc 100644
>>>>> --- a/tools/perf/util/machine.c
>>>>> +++ b/tools/perf/util/machine.c
>>>>> @@ -45,7 +45,6 @@
>>>>>   
>>>>>   static void __machine__remove_thread(struct machine *machine, struct thread_rb_node *nd,
>>>>>   				     struct thread *th, bool lock);
>>>>> -static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip);
>>>>>   
>>>>>   static struct dso *machine__kernel_dso(struct machine *machine)
>>>>>   {
>>>>> @@ -2385,10 +2384,6 @@ static int add_callchain_ip(struct thread *thread,
>>>>>   	ms.maps = maps__get(al.maps);
>>>>>   	ms.map = map__get(al.map);
>>>>>   	ms.sym = al.sym;
>>>>> -
>>>>> -	if (!branch && append_inlines(cursor, &ms, ip) == 0)
>>>>> -		goto out;
>>>>> -
>>>>>   	srcline = callchain_srcline(&ms, al.addr);
>>>>>   	err = callchain_cursor_append(cursor, ip, &ms,
>>>>>   				      branch, flags, nr_loop_iter,

Tested-by: Jesper Dangaard Brouer <hawk@kernel.org>

--Jesper
  
Arnaldo Carvalho de Melo Aug. 8, 2023, 1:49 p.m. UTC | #6
Em Tue, Aug 08, 2023 at 03:38:37PM +0200, Jesper Dangaard Brouer escreveu:
> 
> 
> On 07/08/2023 16.03, Artem Savkov wrote:
> > On Mon, Aug 07, 2023 at 10:34:44AM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Mon, Aug 07, 2023 at 01:00:08PM +0200, Artem Savkov escreveu:
> > > > On Wed, Aug 02, 2023 at 09:43:40AM +0200, Artem Savkov wrote:
> > > > > Hi Arnaldo,
> > > > > 
> > > > > On Tue, Aug 01, 2023 at 06:42:47PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > > Hi Artem,
> > > > > > 
> > > > > > 	Can you please double check this? I reproduced with:
> > > > > > 
> > > > > > git checkout 46d21ec067490ab9cdcc89b9de5aae28786a8b8e
> > > > > > build it
> > > > > > perf record -a -g sleep 5s
> > > > > > perf report
> > > > > > 
> > > > > > 	Do you get the same slowness and then reverting it, i.e. just
> > > > > > going to HEAD~ and rebuilding getting a fast 'perf report' startup, i.e.
> > > > > > without the inlines in the callchains?
> > > > > 
> > > > > With a simple test like this I definitely get a slowdown, but not sure
> > > > > if it can be called excessive.
> > > > > 
> > > > > Below are the times I got by running 'time perf report' and hitting 'q'
> > > > > during load so that it quits as soon as it is loads up. Tested on a
> > > > > freshly updated fedora 38.
> > > > 
> 
> I reported this problem to ACME. It is also possible to reproduce
> without hitting 'q' via using this cmdline with --stdio like this:
> 
>  time perf report -v --stdio > /dev/null 2> debug01.stderr
> 
> The file 'debug01.stderr' contained a lot of addr2line output, that
> might help debug the issue further.
> 
> 
> 
> > > > My bad, I had wrong debuginfo installed for the kernel I tested. I can
> > > > reproduce it with the correct one. Looks like vmlinux is just too much
> > > > for addr2line. Maybe we can skip it but leave other inlines in, like so:
> > > 
> > > That is a possibilit, and probably we could make it cheaper by looking
> > > at the cpumode, avoiding calling addr2line when we didn't makage to
> > > resolve the symbol, etc.
> > > 
> > > We also may want to have this as an option that has to be explicitely
> > > enabled, like --resolve-inlines, as this will add overhead no matter if
> > > we stop calling addr2line and do it more efficiently, etc.
> > 
> > Sounds good, I'll look into it.
> > 
> > > Fact is, we're late in the 6.5 schedule, so the best thing now is to
> > > just revert the patch and then try again later, ok?
> > 
> > Yes, sure.
> > 
> > > - Arnaldo
> > > > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > > > index 11de3ca8d4fa7..fef309cd401f7 100644
> > > > --- a/tools/perf/util/machine.c
> > > > +++ b/tools/perf/util/machine.c
> > > > @@ -2388,7 +2388,9 @@ static int add_callchain_ip(struct thread *thread,
> > > >   	ms.map = map__get(al.map);
> > > >   	ms.sym = al.sym;
> > > > -	if (!branch && append_inlines(cursor, &ms, ip) == 0)
> > > > +	if (!branch && ms.map && ms.map->dso &&
> > > > +	    strcmp(ms.map->dso->short_name, "[kernel.vmlinux]") &&
> > > > +	    append_inlines(cursor, &ms, ip) == 0)
> > > >   		goto out;
> > > >   	srcline = callchain_srcline(&ms, al.addr);
> > > > 
> > > > > > - Arnaldo
> > > > > > 
> > > > > > ----
> > > > > > 
> > > > > > This reverts commit 46d21ec067490ab9cdcc89b9de5aae28786a8b8e.
> > > > > > 
> > > > > > The tests were made with a specific workload, further tests on a
> > > > > > recently updated fedora 38 system with a system wide perf.data file
> > > > > > shows 'perf report' taking excessive time, so lets revert this until a
> > > > > > full investigation and improvement on the addr2line support code is
> > > > > > made.
> > > > > > 
> 
> Reported-by: Jesper Dangaard Brouer <hawk@kernel.org>

Thanks, I did add that locally, will keep it and add your Tested-by,
thanks a lot!

- Arnaldo
 
> > > > > > Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
> > > > > > Cc: Artem Savkov <asavkov@redhat.com>
> > > > > > Cc: Namhyung Kim <namhyung@kernel.org>
> > > > > > Cc: Adrian Hunter <adrian.hunter@intel.com>
> > > > > > Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> > > > > > Cc: Ian Rogers <irogers@google.com>
> > > > > > Cc: Ingo Molnar <mingo@redhat.com>
> > > > > > Cc: Jiri Olsa <jolsa@kernel.org>
> > > > > > Cc: Mark Rutland <mark.rutland@arm.com>
> > > > > > Cc: Masami Hiramatsu <mhiramat@kernel.org>
> > > > > > Cc: Milian Wolff <milian.wolff@kdab.com>
> > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> > > > > > ---
> > > > > >   tools/perf/util/machine.c | 5 -----
> > > > > >   1 file changed, 5 deletions(-)
> > > > > > 
> > > > > > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > > > > > index 4e62843d51b7dbf9..f4cb41ee23cdbcfc 100644
> > > > > > --- a/tools/perf/util/machine.c
> > > > > > +++ b/tools/perf/util/machine.c
> > > > > > @@ -45,7 +45,6 @@
> > > > > >   static void __machine__remove_thread(struct machine *machine, struct thread_rb_node *nd,
> > > > > >   				     struct thread *th, bool lock);
> > > > > > -static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip);
> > > > > >   static struct dso *machine__kernel_dso(struct machine *machine)
> > > > > >   {
> > > > > > @@ -2385,10 +2384,6 @@ static int add_callchain_ip(struct thread *thread,
> > > > > >   	ms.maps = maps__get(al.maps);
> > > > > >   	ms.map = map__get(al.map);
> > > > > >   	ms.sym = al.sym;
> > > > > > -
> > > > > > -	if (!branch && append_inlines(cursor, &ms, ip) == 0)
> > > > > > -		goto out;
> > > > > > -
> > > > > >   	srcline = callchain_srcline(&ms, al.addr);
> > > > > >   	err = callchain_cursor_append(cursor, ip, &ms,
> > > > > >   				      branch, flags, nr_loop_iter,
> 
> Tested-by: Jesper Dangaard Brouer <hawk@kernel.org>
> 
> --Jesper
  

Patch

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 4e62843d51b7dbf9..f4cb41ee23cdbcfc 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -45,7 +45,6 @@ 
 
 static void __machine__remove_thread(struct machine *machine, struct thread_rb_node *nd,
 				     struct thread *th, bool lock);
-static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip);
 
 static struct dso *machine__kernel_dso(struct machine *machine)
 {
@@ -2385,10 +2384,6 @@  static int add_callchain_ip(struct thread *thread,
 	ms.maps = maps__get(al.maps);
 	ms.map = map__get(al.map);
 	ms.sym = al.sym;
-
-	if (!branch && append_inlines(cursor, &ms, ip) == 0)
-		goto out;
-
 	srcline = callchain_srcline(&ms, al.addr);
 	err = callchain_cursor_append(cursor, ip, &ms,
 				      branch, flags, nr_loop_iter,