Message ID | 20231102162225.50028-1-nick.forrington@arm.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:8f47:0:b0:403:3b70:6f57 with SMTP id j7csp474613vqu; Thu, 2 Nov 2023 09:23:05 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHQlHuasxEh+6Z0ID2VNxHeU+xae0QaaR97lkWAcw0v1+aMOVrmiJRwwejZ//5iJw2RwFov X-Received: by 2002:a17:903:26c3:b0:1c9:fdbf:296a with SMTP id jg3-20020a17090326c300b001c9fdbf296amr16010500plb.8.1698942185493; Thu, 02 Nov 2023 09:23:05 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1698942185; cv=none; d=google.com; s=arc-20160816; b=mfqsDJpHRQlORabmRWBIs1oy9kg+zj+bnoVo05YoroFNa/ZjQFpx8Gb2KIzQQEcqjU u7FKvWb88/0qF2xy4xvH/5bdpEOtM0mfURG/01EqKIUMAvlfqgXdljpDPuie2/MHUrcu hm5xKBWfVPvcHhQfSGHYTV1fT9zNWuI/WbimnAgcveI386FudVZ3awxp5kMwiHYASLMu tPQaJoIgnqBfHvXgEodF0LN1Ij4CWebmsj0Xm7gYKTcN2OGz97T2ZO/6TxDSMaeTqD9h 6isRB6sVNMvQ0UAzQ55EyplvrR6+WkflATt5ryYFnhZ7isTzxWcm+8KB2hYiVX1FjB3J aYFg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from; bh=6jO3i5ZHHx4eTXnBm2SQv30Sqe08v70jQSZj8afHIgE=; fh=KZwJI2IhX2NkYw3UI/ow2q4exUfD2a8wOZqtpxoZpb8=; b=kiWPCOYWAVnDTCXOUa2HUeKBiYs5sRUBnV+7QN4Uie67wQEtd5+hJUAj1iG7dGPZnd wGbfskGodj9LW5rz2OqqF4i+xjh5n6qIctuB5fRq8KLkTL/NB0NSWgsvOda/fbhft1cg 6hqyfpv9yaZgHtyH+ZU2uBv9VzekrzCyL3z25RzBHgY4ashv1UVfAhMAxcWQ4eS+9UXi 1YuV54wZOufv68lH96lZ9nP4u4E01163U0UMKaklNG31LyIFeSHiRkkGXXwLTZZ9Doto 9r9iqbk4GiyI9nVNa2EcMTX65Ci6EZf/weL4b5IUVMUtPpUO1jVAxzsKj/tE/W7LyZGu wTAQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from lipwig.vger.email (lipwig.vger.email. [23.128.96.33]) by mx.google.com with ESMTPS id j3-20020a170903028300b001c9c3212c88si105176plr.253.2023.11.02.09.23.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 02 Nov 2023 09:23:05 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) client-ip=23.128.96.33; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by lipwig.vger.email (Postfix) with ESMTP id E9E33817C8BD; Thu, 2 Nov 2023 09:23:01 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at lipwig.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234161AbjKBQWx (ORCPT <rfc822;heyuhang3455@gmail.com> + 35 others); Thu, 2 Nov 2023 12:22:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52848 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234506AbjKBQWv (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 2 Nov 2023 12:22:51 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 8484312E; Thu, 2 Nov 2023 09:22:48 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 9205E2F4; Thu, 2 Nov 2023 09:23:30 -0700 (PDT) Received: from dsg-hive-n1sdp-01.. (dsg-hive-n1sdp-01.cambridge.arm.com [10.2.2.18]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id D379A3F738; Thu, 2 Nov 2023 09:22:46 -0700 (PDT) From: Nick Forrington <nick.forrington@arm.com> To: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org Cc: Nick Forrington <nick.forrington@arm.com>, Mark Rutland <mark.rutland@arm.com>, Alexander Shishkin <alexander.shishkin@linux.intel.com>, Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>, Ian Rogers <irogers@google.com>, Adrian Hunter <adrian.hunter@intel.com>, Arnaldo Carvalho de Melo <acme@redhat.com> Subject: [PATCH] perf test: Remove atomics from test_loop to avoid test failures Date: Thu, 2 Nov 2023 16:22:24 +0000 Message-ID: <20231102162225.50028-1-nick.forrington@arm.com> X-Mailer: git-send-email 2.42.0 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-0.8 required=5.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lipwig.vger.email Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (lipwig.vger.email [0.0.0.0]); Thu, 02 Nov 2023 09:23:02 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1781470001575927924 X-GMAIL-MSGID: 1781470001575927924 |
Series |
perf test: Remove atomics from test_loop to avoid test failures
|
|
Commit Message
Nick Forrington
Nov. 2, 2023, 4:22 p.m. UTC
The current use of atomics can lead to test failures, as tests (such as
tests/shell/record.sh) search for samples with "test_loop" as the
top-most stack frame, but find frames related to the atomic operation
(e.g. __aarch64_ldadd4_relax).
This change simply removes the "count" variable, as it is not necessary.
Fixes: 1962ab6f6e0b ("perf test workload thloop: Make count increments atomic")
Signed-off-by: Nick Forrington <nick.forrington@arm.com>
---
tools/perf/tests/workloads/thloop.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
Comments
On 02/11/2023 16:22, Nick Forrington wrote: > The current use of atomics can lead to test failures, as tests (such as > tests/shell/record.sh) search for samples with "test_loop" as the > top-most stack frame, but find frames related to the atomic operation > (e.g. __aarch64_ldadd4_relax). > > This change simply removes the "count" variable, as it is not necessary. > > Fixes: 1962ab6f6e0b ("perf test workload thloop: Make count increments atomic") > Signed-off-by: Nick Forrington <nick.forrington@arm.com> > --- > tools/perf/tests/workloads/thloop.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/tools/perf/tests/workloads/thloop.c b/tools/perf/tests/workloads/thloop.c > index af05269c2eb8..457b29f91c3e 100644 > --- a/tools/perf/tests/workloads/thloop.c > +++ b/tools/perf/tests/workloads/thloop.c > @@ -7,7 +7,6 @@ > #include "../tests.h" > > static volatile sig_atomic_t done; > -static volatile unsigned count; > > /* We want to check this symbol in perf report */ > noinline void test_loop(void); > @@ -19,8 +18,7 @@ static void sighandler(int sig __maybe_unused) > > noinline void test_loop(void) > { > - while (!done) > - __atomic_fetch_add(&count, 1, __ATOMIC_RELAXED); > + while (!done); > } > > static void *thfunc(void *arg) Reviewed-by: James Clark <james.clark@arm.com>
Em Fri, Nov 03, 2023 at 09:14:45AM +0000, James Clark escreveu: > On 02/11/2023 16:22, Nick Forrington wrote: > > The current use of atomics can lead to test failures, as tests (such as > > tests/shell/record.sh) search for samples with "test_loop" as the > > top-most stack frame, but find frames related to the atomic operation > > (e.g. __aarch64_ldadd4_relax). > > This change simply removes the "count" variable, as it is not necessary. > > Fixes: 1962ab6f6e0b ("perf test workload thloop: Make count increments atomic") > > Signed-off-by: Nick Forrington <nick.forrington@arm.com> > > +++ b/tools/perf/tests/workloads/thloop.c > > @@ -7,7 +7,6 @@ > > #include "../tests.h" > > static volatile sig_atomic_t done; > > -static volatile unsigned count; > > /* We want to check this symbol in perf report */ > > noinline void test_loop(void); > > @@ -19,8 +18,7 @@ static void sighandler(int sig __maybe_unused) > > noinline void test_loop(void) > > { > > - while (!done) > > - __atomic_fetch_add(&count, 1, __ATOMIC_RELAXED); > > + while (!done); > > } > > static void *thfunc(void *arg) > Reviewed-by: James Clark <james.clark@arm.com> Thanks, applied to perf-tools-next. - Arnaldo
On Thu, 2 Nov 2023, Nick Forrington wrote: > The current use of atomics can lead to test failures, as tests (such as > tests/shell/record.sh) search for samples with "test_loop" as the > top-most stack frame, but find frames related to the atomic operation > (e.g. __aarch64_ldadd4_relax). > > This change simply removes the "count" variable, as it is not necessary. Hello. I believe that it was there to prevent the compiler to optimize the loop out or some reason like that. Hopefully, it will work even without that on all architectures with all compilers that are used for building perf... I also think that the tests should be designed in a more robust way, so that they aren't directly dependent on exact stack frames, e.g. let's look at the "inet_pton" testcase... The inet_pton test result check algorithm is designed to rely on exact stacktrace, without a possibility to specify something like "we want this and that in the stack trace, but otherwise, it does not matter which wrappers are aroung". Then there must be the following code to adjust the expected output exactly per architecture: echo "ping[][0-9 \.:]+$event_name: \([[:xdigit:]]+\)" > $expected echo ".*inet_pton\+0x[[:xdigit:]]+[[:space:]]\($libc|inlined\)$" >> $expected case "$(uname -m)" in s390x) eventattr='call-graph=dwarf,max-stack=4' echo "(__GI_)?getaddrinfo\+0x[[:xdigit:]]+[[:space:]]\($libc|inlined\)$" >> $expected echo "main\+0x[[:xdigit:]]+[[:space:]]\(.*/bin/ping.*\)$" >> $expected ;; ppc64|ppc64le) eventattr='max-stack=4' echo "gaih_inet.*\+0x[[:xdigit:]]+[[:space:]]\($libc\)$" >> $expected echo "getaddrinfo\+0x[[:xdigit:]]+[[:space:]]\($libc\)$" >> $expected echo ".*(\+0x[[:xdigit:]]+|\[unknown\])[[:space:]]\(.*/bin/ping.*\)$" >> $expected ;; *) eventattr='max-stack=3' echo ".*(\+0x[[:xdigit:]]+|\[unknown\])[[:space:]]\(.*/bin/ping.*\)$" >> $expected ;; esac Of course, since it relies on libc version, then we need patches, such as 1f85d016768f ("perf test record+probe_libc_inet_pton: Fix call chain match on x86_64") 311693ce81c9 ("perf test record+probe_libc_inet_pton: Fix call chain match on s390") fb710ddee75f ("perf test record_probe_libc_inet_pton: Fix test on s/390 where 'text_to_binary_address' now appears on the backtrace") ... which break the test when used with older libc... I think that a better design of such test is either probing some program of ourselves that has predictable and stable function call sequence or be more robust in checking the stack trace. Thoughts? Michael > > Fixes: 1962ab6f6e0b ("perf test workload thloop: Make count increments atomic") > Signed-off-by: Nick Forrington <nick.forrington@arm.com> > --- > tools/perf/tests/workloads/thloop.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/tools/perf/tests/workloads/thloop.c b/tools/perf/tests/workloads/thloop.c > index af05269c2eb8..457b29f91c3e 100644 > --- a/tools/perf/tests/workloads/thloop.c > +++ b/tools/perf/tests/workloads/thloop.c > @@ -7,7 +7,6 @@ > #include "../tests.h" > > static volatile sig_atomic_t done; > -static volatile unsigned count; > > /* We want to check this symbol in perf report */ > noinline void test_loop(void); > @@ -19,8 +18,7 @@ static void sighandler(int sig __maybe_unused) > > noinline void test_loop(void) > { > - while (!done) > - __atomic_fetch_add(&count, 1, __ATOMIC_RELAXED); > + while (!done); > } > > static void *thfunc(void *arg) > -- > 2.42.0 > > >
Hi all, On Fri, Nov 24, 2023 at 08:57:52PM +0100, Michael Petlan wrote: > On Thu, 2 Nov 2023, Nick Forrington wrote: > > The current use of atomics can lead to test failures, as tests (such as > > tests/shell/record.sh) search for samples with "test_loop" as the > > top-most stack frame, but find frames related to the atomic operation > > (e.g. __aarch64_ldadd4_relax). I am confused by above description. As I went through the script record.sh, which is the only test invoking the program 'test_loop', but I don't find any test is related with stack frame. Do I miss anything? I went through record.sh but no clue why the failure is caused by stack frame. All the testings use command: if ! perf report -i "${perfdata}" -q | grep -q "${testsym}" ... fi @Nick, could you narrow down which specific test case causing the failure. [...] > I believe that it was there to prevent the compiler to optimize the loop > out or some reason like that. Hopefully, it will work even without that > on all architectures with all compilers that are used for building perf... Agreed. As said above, I'd like to step back a bit for making clear what's the exactly failure caused by the program. Thanks, Leo
On 25/11/2023 03:05, Leo Yan wrote: > Hi all, > > On Fri, Nov 24, 2023 at 08:57:52PM +0100, Michael Petlan wrote: >> On Thu, 2 Nov 2023, Nick Forrington wrote: >>> The current use of atomics can lead to test failures, as tests (such as >>> tests/shell/record.sh) search for samples with "test_loop" as the >>> top-most stack frame, but find frames related to the atomic operation >>> (e.g. __aarch64_ldadd4_relax). > I am confused by above description. As I went through the script > record.sh, which is the only test invoking the program 'test_loop', > but I don't find any test is related with stack frame. > > Do I miss anything? I went through record.sh but no clue why the > failure is caused by stack frame. All the testings use command: > > if ! perf report -i "${perfdata}" -q | grep -q "${testsym}" > ... > fi > > @Nick, could you narrow down which specific test case causing the > failure. > > [...] All checks for ${testsym} in record.sh (including the example you provide) can fail, as the expected symbol (test_loop) is not the top-most function on the stack (and therefore not the symbol associated with the sample). Example perf report output: # Overhead Command Shared Object Symbol # ........ ....... ..................... ............................. # 99.53% perf perf [.] __aarch64_ldadd4_relax ... You can see the issue when recording/reporting with call stacks: # Children Self Command Shared Object Symbol # ........ ........ ....... ..................... .......................................................... # 99.52% 99.52% perf perf [.] __aarch64_ldadd4_relax | |--49.77%--0xffffb905a5dc | 0xffffb8ff0aec | thfunc | test_loop | __aarch64_ldadd4_relax ... > >> I believe that it was there to prevent the compiler to optimize the loop >> out or some reason like that. Hopefully, it will work even without that >> on all architectures with all compilers that are used for building perf... > Agreed. > > As said above, I'd like to step back a bit for making clear what's the > exactly failure caused by the program. I don't think this loop could be sensibly optimised away, as it depends on "done", which is defined at file scope (and assigned by a signal handler). Cheers, Nick > > Thanks, > Leo >
On Sat, Nov 25, 2023 at 07:10:25PM +0000, Nick Forrington wrote: [...] > > @Nick, could you narrow down which specific test case causing the > > failure. > > > > [...] > > All checks for ${testsym} in record.sh (including the example you provide) > can fail, as the expected symbol (test_loop) is not the top-most function on > the stack (and therefore not the symbol associated with the sample). > > > Example perf report output: > > # Overhead Command Shared Object Symbol > # ........ ....... ..................... ............................. > # > 99.53% perf perf [.] __aarch64_ldadd4_relax Thanks for confirmation. I cannot reproduce the issue on my Juno board with Debian (buster). It's likely because I don't use GCC toolchain with enabling '-moutline-atomics' AArch64 flag [1]. > ... > > > You can see the issue when recording/reporting with call stacks: > > # Children Self Command Shared Object Symbol > # ........ ........ ....... ..................... > .......................................................... > # > 99.52% 99.52% perf perf [.] > __aarch64_ldadd4_relax > | > |--49.77%--0xffffb905a5dc > | 0xffffb8ff0aec > | thfunc > | test_loop > | __aarch64_ldadd4_relax > ... > > > > > > I believe that it was there to prevent the compiler to optimize the loop > > > out or some reason like that. Hopefully, it will work even without that > > > on all architectures with all compilers that are used for building perf... > > Agreed. > > > > As said above, I'd like to step back a bit for making clear what's the > > exactly failure caused by the program. > > I don't think this loop could be sensibly optimised away, as it depends on > "done", which is defined at file scope (and assigned by a signal handler). I verified your patch with 'perf annotate'. The disassembly on Arm64 is: noinline void test_loop(void) { stp x29, x30, [sp, #-32]! mov x29, sp adrp x0, options+0x4c0 ldr x0, [x0, #3664] ldr x1, [x0] str x1, [sp, #24] mov x1, #0x0 // #0 while (!done); nop 20: adrp x0, spacing.22251 add x0, x0, #0xa04 ldr w0, [x0] cmp w0, #0x0 ↑ b.eq 20 } The disassembly on x86_64 is: noinline void test_loop(void) { endbr64 push %rbp mov %rsp,%rbp sub $0x10,%rsp mov %fs:0x28,%rax mov %rax,-0x8(%rbp) xor %eax,%eax while (!done); nop 1c: mov done,%eax test %eax,%eax ↑ je 1c } Maybe the commit log caused a bit confusion, the problem is after enabling "-moutline-atomics" on aarch64, the overhead is altered into the linked __aarch64_ldadd4_relax() function, test_loop() cannot be sampled anymore, but it's not about stack tracing. Anyway, the patch is fine for me. Thanks, Leo [1] https://stackoverflow.com/questions/65239845/how-to-enable-mno-outline-atomics-aarch64-flag
On 24/11/2023 19:57, Michael Petlan wrote: > On Thu, 2 Nov 2023, Nick Forrington wrote: >> The current use of atomics can lead to test failures, as tests (such as >> tests/shell/record.sh) search for samples with "test_loop" as the >> top-most stack frame, but find frames related to the atomic operation >> (e.g. __aarch64_ldadd4_relax). >> >> This change simply removes the "count" variable, as it is not necessary. > > Hello. > > I believe that it was there to prevent the compiler to optimize the loop > out or some reason like that. Hopefully, it will work even without that > on all architectures with all compilers that are used for building perf... > If that optimisation ever happens and there is a concrete case, I think it should be treated like any other test regression and fixed at that time when it's known what the specifics of that issue are. As Nick says in a later comment, the loop can't be optimised out because it depends on the done variable and the function is noinline. > I also think that the tests should be designed in a more robust way, so > that they aren't directly dependent on exact stack frames, e.g. let's look > at the "inet_pton" testcase... This one doesn't look like it's dependent on exact stack frames, but just one frame at the end, which unless something is actually broken in Perf causing a failure, I don't see how could be made any more robust. > > The inet_pton test result check algorithm is designed to rely on exact > stacktrace, without a possibility to specify something like "we want this > and that in the stack trace, but otherwise, it does not matter which > wrappers are aroung". Then there must be the following code to adjust > the expected output exactly per architecture: > > echo "ping[][0-9 \.:]+$event_name: \([[:xdigit:]]+\)" > $expected > echo ".*inet_pton\+0x[[:xdigit:]]+[[:space:]]\($libc|inlined\)$" >> $expected > case "$(uname -m)" in > s390x) > eventattr='call-graph=dwarf,max-stack=4' > echo "(__GI_)?getaddrinfo\+0x[[:xdigit:]]+[[:space:]]\($libc|inlined\)$" >> $expected > echo "main\+0x[[:xdigit:]]+[[:space:]]\(.*/bin/ping.*\)$" >> $expected > ;; > ppc64|ppc64le) > eventattr='max-stack=4' > echo "gaih_inet.*\+0x[[:xdigit:]]+[[:space:]]\($libc\)$" >> $expected > echo "getaddrinfo\+0x[[:xdigit:]]+[[:space:]]\($libc\)$" >> $expected > echo ".*(\+0x[[:xdigit:]]+|\[unknown\])[[:space:]]\(.*/bin/ping.*\)$" >> $expected > ;; > *) > eventattr='max-stack=3' > echo ".*(\+0x[[:xdigit:]]+|\[unknown\])[[:space:]]\(.*/bin/ping.*\)$" >> $expected > ;; > esac > > Of course, since it relies on libc version, then we need patches, such as > 1f85d016768f ("perf test record+probe_libc_inet_pton: Fix call chain match on x86_64") > 311693ce81c9 ("perf test record+probe_libc_inet_pton: Fix call chain match on s390") > fb710ddee75f ("perf test record_probe_libc_inet_pton: Fix test on s/390 where 'text_to_binary_address' now appears on the backtrace") > ... > which break the test when used with older libc... > I definitely think that relying on external libraries for stacks in tests can be annoying to keep up to date, but that's exactly what we just removed from test_loop. For the inet_pton test it seems to be specifically testing probing libc, so maybe making the test program ourselves would make the test much less valuable. And then as long as all the tests are not like that and there is only this one to keep up to date, maybe it's not that bad. > I think that a better design of such test is either probing some program > of ourselves that has predictable and stable function call sequence or > be more robust in checking the stack trace. > > Thoughts? > > Michael > >> >> Fixes: 1962ab6f6e0b ("perf test workload thloop: Make count increments atomic") >> Signed-off-by: Nick Forrington <nick.forrington@arm.com> >> --- >> tools/perf/tests/workloads/thloop.c | 4 +--- >> 1 file changed, 1 insertion(+), 3 deletions(-) >> >> diff --git a/tools/perf/tests/workloads/thloop.c b/tools/perf/tests/workloads/thloop.c >> index af05269c2eb8..457b29f91c3e 100644 >> --- a/tools/perf/tests/workloads/thloop.c >> +++ b/tools/perf/tests/workloads/thloop.c >> @@ -7,7 +7,6 @@ >> #include "../tests.h" >> >> static volatile sig_atomic_t done; >> -static volatile unsigned count; >> >> /* We want to check this symbol in perf report */ >> noinline void test_loop(void); >> @@ -19,8 +18,7 @@ static void sighandler(int sig __maybe_unused) >> >> noinline void test_loop(void) >> { >> - while (!done) >> - __atomic_fetch_add(&count, 1, __ATOMIC_RELAXED); >> + while (!done); >> } >> >> static void *thfunc(void *arg) >> -- >> 2.42.0 >> >> >> > >
Em Sun, Nov 26, 2023 at 03:41:14PM +0800, Leo Yan escreveu: > Maybe the commit log caused a bit confusion, the problem is after We'll have the Link pointing to this discussion. > enabling "-moutline-atomics" on aarch64, the overhead is altered into > the linked __aarch64_ldadd4_relax() function, test_loop() cannot be > sampled anymore, but it's not about stack tracing. > > Anyway, the patch is fine for me. I'm taking this as an Acked-by: Leo But probably this could be even a Tested-by, ok? - Arnaldo
On Mon, Nov 27, 2023 at 10:20:47AM -0300, Arnaldo Carvalho de Melo wrote: > Em Sun, Nov 26, 2023 at 03:41:14PM +0800, Leo Yan escreveu: > > Maybe the commit log caused a bit confusion, the problem is after > > We'll have the Link pointing to this discussion. Okay, it's a good tracking. > > enabling "-moutline-atomics" on aarch64, the overhead is altered into > > the linked __aarch64_ldadd4_relax() function, test_loop() cannot be > > sampled anymore, but it's not about stack tracing. > > > > Anyway, the patch is fine for me. > > I'm taking this as an Acked-by: Leo > > But probably this could be even a Tested-by, ok? Yes, here is my test tag: Tested-by: Leo Yan <leo.yan@linaro.org>
diff --git a/tools/perf/tests/workloads/thloop.c b/tools/perf/tests/workloads/thloop.c index af05269c2eb8..457b29f91c3e 100644 --- a/tools/perf/tests/workloads/thloop.c +++ b/tools/perf/tests/workloads/thloop.c @@ -7,7 +7,6 @@ #include "../tests.h" static volatile sig_atomic_t done; -static volatile unsigned count; /* We want to check this symbol in perf report */ noinline void test_loop(void); @@ -19,8 +18,7 @@ static void sighandler(int sig __maybe_unused) noinline void test_loop(void) { - while (!done) - __atomic_fetch_add(&count, 1, __ATOMIC_RELAXED); + while (!done); } static void *thfunc(void *arg)