From patchwork Thu Oct 20 11:38:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Schwinge X-Patchwork-Id: 6162 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp54756wrr; Thu, 20 Oct 2022 04:39:29 -0700 (PDT) X-Google-Smtp-Source: AMsMyM7JeOwtW8Q2iG+gScjuIdPukvWJCrtr11sNR3yAEp/UZtOWcHWDFqbKHyIQMT0y6HwAjV+I X-Received: by 2002:a17:907:3da5:b0:78e:793:4084 with SMTP id he37-20020a1709073da500b0078e07934084mr10751815ejc.285.1666265969315; Thu, 20 Oct 2022 04:39:29 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666265969; cv=none; d=google.com; s=arc-20160816; b=EhSA37oOSeW0JNvGamwaxZUoKKC3Vh3gmoIgbUUj+Hrllwhfade1/k+c/2l3H/vjjo TugyXKCT0WYoyuab2t8eaPVdCsiaBswTIxfvjk7OkIxYngizuyq9Kldh3unhVZTeyQKF VylPNt2JDE0AI9LUvcVMXlnePKyglvEp/ctj1O23fI2WP1vDYIoz5EG71wySRX0hrAfe Bbhc+4l6PnHRlT5C90oxGKG60I37DMlGdZcRoBmSnfplc0cARu+hhXlwYVEUGJwb54QR lcrG9SSjmwZ6ocox/DhGv6MeDUoEnpkbc1QtacWw8cCcx6ADi1r1g6P/tIaqOjhUKctv FPjw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:mime-version:message-id:date :user-agent:references:in-reply-to:subject:to:from:ironport-sdr :dmarc-filter:delivered-to; bh=7SaZYzc1+szYxt9fD0uym1OrbYRWwqQeJtYuWlPjJxw=; b=Ey8BB4xjxO++EqbByzGCQqmgx2PqNQTQejUUxCb8lQraTbQi/yKk7dSJ9u6qZKr9nH qCaDB7g9u3g4VcP1B9yU+M5Fnhr1CRBtObU4l1+/i6/iHpgkyttdmINH2EdhUF86FmHd 6KyzkKLOfvEXsWDpM2DgvA+KJWwpI6Ub3sW9MuERxmQ3/721RLvIUiHnkxJZ5i5x7cyw Qv1EU6BJi1G0Puz0WRAQ0lAj76c6OUtcw8/AEsLrxnh635x3RncsSfIBDRBAvLbwrbZc emqnYNXJlvVxYnls8XgXgmCrfC/jxlNVyYc62rRdF0ncZt8iYNNjJiz+NAG0BG8oImfr wi+A== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from sourceware.org (server2.sourceware.org. [2620:52:3:1:0:246e:9693:128c]) by mx.google.com with ESMTPS id sb10-20020a1709076d8a00b0078e1e216396si16746461ejc.66.2022.10.20.04.39.29 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 20 Oct 2022 04:39:29 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) client-ip=2620:52:3:1:0:246e:9693:128c; Authentication-Results: mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id A3A5F3851A83 for ; Thu, 20 Oct 2022 11:39:19 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id BDFDC38561B1 for ; Thu, 20 Oct 2022 11:38:51 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org BDFDC38561B1 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.95,198,1661846400"; d="scan'208,223";a="85206457" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa2.mentor.iphmx.com with ESMTP; 20 Oct 2022 03:38:50 -0800 IronPort-SDR: DjwrRO/4gDyY8nqdTJ7/p63ct+KgV5s+j8JyNWisfFrZju/sgGZ9tDZmb0848kCejT+LGQBPma JCVt/AjVFRTkmWE0ipZgwnHxwR7iSbCsYF1S5UFpiHt3SwR+vZ/74fuECl/0+ZpWyREpCNu321 kNf0NR4BovFxyhxpMyd+OZnChXqhxnMZz8lC+yfoiFy27L+lhr52GJyuQ+iqxhw0JamWgOAGYA 8+2T6SmTZTaL2cqPE0B/+5qYIhzMGiR/jxPe+iLJLA6A22Sq0KPsMrFSBw/kX0v5sZEYxEyMxz MYM= From: Thomas Schwinge To: Aldy Hernandez , Subject: Add 'gcc.dg/tree-ssa/pr107195-3.c' [PR107195] (was: Add 'c-c++-common/torture/pr107195-1.c' [PR107195] (was: [COMMITTED] [PR107195] Set range to zero when nonzero mask is 0.)) In-Reply-To: References: <20221011083137.336470-1-aldyh@redhat.com> <878rlej3o6.fsf@euler.schwinge.homeip.net> <87o7uafqyf.fsf@dem-tschwing-1.ger.mentorg.com> User-Agent: Notmuch/0.29.1+93~g67ed7df (https://notmuchmail.org) Emacs/26.3 (x86_64-pc-linux-gnu) Date: Thu, 20 Oct 2022 13:38:43 +0200 Message-ID: <87y1taencs.fsf@dem-tschwing-1.ger.mentorg.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: SVR-IES-MBX-07.mgc.mentorg.com (139.181.222.7) To svr-ies-mbx-10.mgc.mentorg.com (139.181.222.10) X-Spam-Status: No, score=-11.8 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, KAM_LOTSOFHASH, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747206505060811177?= X-GMAIL-MSGID: =?utf-8?q?1747206505060811177?= Hi! On 2022-10-18T07:41:29+0200, Aldy Hernandez wrote: > On Mon, Oct 17, 2022 at 4:47 PM Thomas Schwinge wrote: >> On 2022-10-17T15:58:47+0200, Aldy Hernandez wrote: >> > On Mon, Oct 17, 2022 at 9:44 AM Thomas Schwinge wrote: >> >> On 2022-10-11T10:31:37+0200, Aldy Hernandez via Gcc-patches wrote: >> >> > When solving 0 = _15 & 1, we calculate _15 as: >> >> > >> >> > [irange] int [-INF, -2][0, +INF] NONZERO 0xfffffffe >> >> > >> >> > The known value of _15 is [0, 1] NONZERO 0x1 which is intersected with >> >> > the above, yielding: >> >> > >> >> > [0, 1] NONZERO 0x0 >> >> > >> >> > This eventually gets copied to a _Bool [0, 1] NONZERO 0x0. >> >> > >> >> > This is problematic because here we have a bool which is zero, but >> >> > returns false for irange::zero_p, since the latter does not look at >> >> > nonzero bits. This causes logical_combine to assume the range is >> >> > not-zero, and all hell breaks loose. >> >> > >> >> > I think we should just normalize a nonzero mask of 0 to [0, 0] at >> >> > creation, thus avoiding all this. >> >> >> >> 1. This commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04 >> >> "[PR107195] Set range to zero when nonzero mask is 0" broke a GCC/nvptx >> >> offloading test case: >> >> >> >> UNSUPPORTED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 >> >> PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 (test for excess errors) >> >> PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test >> >> [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 scan-nvptx-none-offload-rtl-dump mach "SESE regions:.* [0-9]+{[0-9]+->[0-9]+(\\.[0-9]+)+}" >> >> >> >> Same for C++. >> >> >> >> I'll later send a patch (for the test case!) to fix that up. >> >> >> >> 2. Looking into this, I found that this >> >> commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04 >> >> "[PR107195] Set range to zero when nonzero mask is 0" actually enables a >> >> code transformation/optimization that GCC apparently has not been doing >> >> before! I've tried to capture that in the attached >> >> "Add 'c-c++-common/torture/pr107195-1.c' [PR107195]". >> > >> > Nice. >> > >> >> Will you please verify that one? In its current '#if 1' configuration, >> >> it's all-PASS after commit >> >> r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04 >> >> "[PR107195] Set range to zero when nonzero mask is 0", whereas before, we >> >> get two calls to 'foo', because GCC apparently didnn't understand the >> >> relation (optimization opportunity) between 'r *= 2;' and the subsequent >> >> 'if (r & 1)'. >> > >> > Yeah, that looks correct. We keep better track of nonzero masks. >> >> OK, next observation: this also works for split-up expressions >> 'if ((r & 2) && (r & 1))' (same rationale as for 'if (r & 1)' alone). >> I've added such a variant in my test case. > > Unless I'm missing something, your testcase doesn't have a body for > foo[123], so GCC has no way to know what any of those functions did or > what bits are set/unset. Ah, there seems to be some confusion what's happening here. :-) First, these functions, 'foo[...]', are '__attribute__((const))', and their argument, 'r' doesn't change if the first 'foo[...]' call returns zero. Thus, GCC can infer that the second 'foo[...]' call also must return zero, and thus may elide that second function call. Second, should the first 'foo[...]' call return non-zero, 'r *= 2;' is executed, and thus GCC can infer that 'if (r & 1)' can never hold, and thus the 'if' branch is not executed, and thus it may elide the second function call for that scenario, too. Thus, the second function is completely elided. The attached "Add 'gcc.dg/tree-ssa/pr107195-3.c' [PR107195]" demonstrates that this does work for 'if (r & 1)' in 'f1', 'foo1', and also does work for 'if ((r & 2) && (r & 1))' in 'f2', 'foo2', but: >> But: it doesn't work for logically equal 'if (r & 3)'. ... in 'f3', 'foo3'. I understand 'r & 3' to be logically equivalent to '(r & 2) && (r & 1)', right? >> I've added such >> an XFAILed variant in my test case. Do you have guidance what needs to >> be done to make such cases work, too? Thus my question, where/how GCC would learn this? Otherwise: >> >> I've left in the other '#if' variants in case you'd like to experiment >> >> with these, but would otherwise clean that up before pushing. >> >> >> >> Where does one put such a test case? >> >> >> >> Should the file be named 'pr107195' or something else? >> > >> > The aforementioned patch already has: >> > >> > * gcc.dg/tree-ssa/pr107195-1.c: New test. >> > * gcc.dg/tree-ssa/pr107195-2.c: New test. >> > >> > So I would just add a pr107195-3.c test. >> >> But note that unlike yours in 'gcc.dg/tree-ssa/', I had put mine into >> 'c-c++-common/torture/'. That's so that we get C and C++ testing, and >> all torture testing flag variants. (... where we do see the optimization >> happen starting at '-O1'.) Do you think that is excessive, and a single >> 'gcc.dg/tree-ssa/' test case, C only, '-O1' only is sufficient for this? >> (I don't have much experience with test cases in such regions of GCC, >> hence these questions.) > > My personal preference is tree-ssa since they are middle end tests. > Also, since we're testing ranger, it primarily runs in DOM, VRP, evrp, > and the backward threader, so no need to run it at multiple > optimization levels. > > I suggested DOM, because I know ranger runs within DOM, so if the > transformation is seen at -O1, it's likely to be done there. Also, > evrp/VRP don't run at -O1, so that's another hint it happened in DOM. > This is a guess though, it could've been CCP setting a nonzero mask, > which then ranger/DOM picked up. > > All in all, I'm in favor of putting tests as early as possible, > otherwise any number of passes could perform a transformation that > could lead to the same end result. We are testing ranger, so the most > likely place to put this test is in DOM at -O1, or in evrp/VRP[12] for > -O2. > > Of course, this is my personal preference, and these are just general > guidelines. Perhaps others can opine. Thanks, that's exactly what I needed to hear, and makes perfect sense to me. Updated "Add 'gcc.dg/tree-ssa/pr107195-3.c' [PR107195]" is attached. OK to push? Grüße Thomas >> >> Do we scan 'optimized', or an earlier dump? >> >> >> >> At '-O1', the actual code transformation is visible already in the 'dom2' >> >> dump: >> >> >> >> [local count: 536870913]: >> >> gimple_assign >> >> + gimple_assign >> >> + goto ; [100.00%] >> >> >> >> - [local count: 1073741824]: >> >> - # gimple_phi >> >> + [local count: 536870912]: >> >> + # gimple_phi >> >> gimple_assign >> >> gimple_cond >> >> - goto ; [50.00%] >> >> + goto ; [100.00%] >> >> else >> >> - goto ; [50.00%] >> >> + goto ; [0.00%] >> >> >> >> [local count: 536870913]: >> >> gimple_call >> >> gimple_assign >> >> >> >> [local count: 1073741824]: >> >> - # gimple_phi >> >> + # gimple_phi >> >> gimple_return >> >> >> >> And, the actual "avoid second call 'foo'" optimization is visiable >> >> starting 'dom3': >> >> >> >> [local count: 536870913]: >> >> gimple_assign >> >> + goto ; [100.00%] >> >> >> >> - [local count: 1073741824]: >> >> - # gimple_phi >> >> - gimple_assign >> >> + [local count: 536870912]: >> >> + gimple_assign >> >> gimple_cond >> >> - goto ; [50.00%] >> >> + goto ; [100.00%] >> >> else >> >> - goto ; [50.00%] >> >> + goto ; [0.00%] >> >> >> >> [local count: 536870913]: >> >> - gimple_call >> >> - gimple_assign >> >> + gimple_assign >> >> + gimple_assign >> >> >> >> [local count: 1073741824]: >> >> - # gimple_phi >> >> + # gimple_phi >> >> gimple_return >> >> >> >> ..., but I don't know if either of those would be stable/appropriate to >> >> scan instead of 'optimized'? >> > >> > IMO, either dom3 or optimized is fine. >> >> OK, I left it at 'optimized', as I don't have any rationale why exactly >> it should happen in 'dom3' already. ;-) >> >> >> Grüße >> Thomas ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 From 1e7943646059c13713b0b3f1e667be9de2c03d0f Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Mon, 17 Oct 2022 09:10:03 +0200 Subject: [PATCH] Add 'gcc.dg/tree-ssa/pr107195-3.c' [PR107195] ... to display optimization performed as of recent commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04 "[PR107195] Set range to zero when nonzero mask is 0". PR tree-optimization/107195 gcc/testsuite/ * gcc.dg/tree-ssa/pr107195-3.c: New. --- gcc/testsuite/gcc.dg/tree-ssa/pr107195-3.c | 73 ++++++++++++++++++++++ 1 file changed, 73 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr107195-3.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr107195-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr107195-3.c new file mode 100644 index 00000000000..d3c5a31a904 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr107195-3.c @@ -0,0 +1,73 @@ +/* Inspired by 'libgomp.oacc-c-c++-common/nvptx-sese-1.c'. */ + +/* { dg-additional-options -O1 } */ +/* { dg-additional-options -fdump-tree-dom3-raw } */ + + +extern int +__attribute__((const)) +foo1 (int); + +int f1 (int r) +{ + if (foo1 (r)) /* If this first 'if' holds... */ + r *= 2; /* ..., 'r' now has a zero-value lower-most bit... */ + + if (r & 1) /* ..., so this second 'if' can never hold... */ + { /* ..., so this is unreachable. */ + /* In constrast, if the first 'if' does not hold ('foo1 (r) == 0'), the + second 'if' may hold, but we know ('foo1' being 'const') that + 'foo1 (r) == 0', so don't have to re-evaluate it here: */ + r += foo1 (r); + /* Thus, if optimizing, we only ever expect one call of 'foo1'. + { dg-final { scan-tree-dump-times {gimple_call