From patchwork Tue Jul 18 14:52:53 2023
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 122080
Date: Tue, 18 Jul 2023 16:52:53 +0200 (CEST)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] tree-optimization/88540 - FP x > y ? x : y if-conversion without -ffast-math
Message-Id: <20230718145253.CC67E134B0@imap2.suse-dmz.suse.de>

The following makes sure that FP x > y ? x : y style max/min
operations are if-converted at the GIMPLE level.  While we can
neither match it to MAX_EXPR nor .FMAX, as both have different
semantics with IEEE than the ternary ?: operation, we can make
sure to maintain this form as a COND_EXPR so backends have the
chance to match this to instructions their ISA offers.

The patch does this in phiopt where we recognize min/max and,
instead of giving up when we have to honor NaNs, we alter the
generated code to a COND_EXPR.

This resolves PR88540 and we can then SLP vectorize the min
operation for its testcase.  It also resolves part of the
regressions observed with the change matching bit-inserts of
bit-field-refs to vec_perm.

Expansion from a COND_EXPR rather than from compare-and-branch
regresses gcc.target/i386/pr54855-9.c by producing extra moves;
while the corresponding min/max operations are now already
synthesized by RTL expansion, register selection isn't optimal.
This can also be provoked without this change by altering the
operand order in the source.
I have XFAILed that part of the test.

Bootstrapped and tested on x86_64-unknown-linux-gnu on top of
the patch fixing if-converted RTL expansion when constants are
involved.

Comments welcome but I plan to push this once that dependency
is acked.

Thanks,
Richard.

	PR tree-optimization/88540
	* tree-ssa-phiopt.cc (minmax_replacement): Do not give up
	with NaNs but handle the simple case by if-converting to a
	COND_EXPR.

	* gcc.target/i386/pr88540.c: New testcase.
	* gcc.target/i386/pr54855-9.c: XFAIL check for redundant moves.
	* gcc.target/i386/pr54855-12.c: Adjust.
	* gcc.target/i386/pr54855-13.c: Likewise.
	* gcc.dg/tree-ssa/split-path-12.c: Likewise.
---
 gcc/testsuite/gcc.dg/tree-ssa/split-path-12.c |  4 +++-
 gcc/testsuite/gcc.target/i386/pr54855-12.c    |  2 +-
 gcc/testsuite/gcc.target/i386/pr54855-13.c    |  2 +-
 gcc/testsuite/gcc.target/i386/pr54855-9.c     |  4 ++--
 gcc/testsuite/gcc.target/i386/pr88540.c       | 10 +++++++++
 gcc/tree-ssa-phiopt.cc                        | 21 ++++++++++++++-----
 6 files changed, 33 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr88540.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/split-path-12.c b/gcc/testsuite/gcc.dg/tree-ssa/split-path-12.c
index 19a130d9bf1..da00f795ef0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/split-path-12.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/split-path-12.c
@@ -16,4 +16,6 @@ foo(double *d1, double *d2, double *d3, int num, double *ip)
   return dmax[0] + dmax[1] + dmax[2];
 }
 
-/* { dg-final { scan-tree-dump "appears to be optimized to a join point for if-convertable half-diamond" "split-paths" } } */
+/* Split-paths shouldn't do anything here, if there's a diamond it would
+   be if-convertible.  */
+/* { dg-final { scan-tree-dump-not "Duplicating join block" "split-paths" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr54855-12.c b/gcc/testsuite/gcc.target/i386/pr54855-12.c
index 2f8af392c83..09e8ab8ae39 100644
--- a/gcc/testsuite/gcc.target/i386/pr54855-12.c
+++ b/gcc/testsuite/gcc.target/i386/pr54855-12.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mavx512fp16" } */
-/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
+/* { dg-final { scan-assembler-times "vm\[ai\]\[nx\]sh\[ \\t\]" 1 } } */
 /* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
 /* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
 
diff --git a/gcc/testsuite/gcc.target/i386/pr54855-13.c b/gcc/testsuite/gcc.target/i386/pr54855-13.c
index 87b4f459a5a..a4f25066f81 100644
--- a/gcc/testsuite/gcc.target/i386/pr54855-13.c
+++ b/gcc/testsuite/gcc.target/i386/pr54855-13.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mavx512fp16" } */
-/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
+/* { dg-final { scan-assembler-times "vm\[ai\]\[nx\]sh\[ \\t\]" 1 } } */
 /* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
 /* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
 
diff --git a/gcc/testsuite/gcc.target/i386/pr54855-9.c b/gcc/testsuite/gcc.target/i386/pr54855-9.c
index 40add5f6763..fe9302e5077 100644
--- a/gcc/testsuite/gcc.target/i386/pr54855-9.c
+++ b/gcc/testsuite/gcc.target/i386/pr54855-9.c
@@ -1,8 +1,8 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -msse2 -mfpmath=sse" } */
 /* { dg-final { scan-assembler-times "minss" 1 } } */
-/* { dg-final { scan-assembler-not "movaps" } } */
-/* { dg-final { scan-assembler-not "movss" } } */
+/* { dg-final { scan-assembler-not "movaps" { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not "movss" { xfail *-*-* } } } */
 
 typedef float vec __attribute__((vector_size(16)));
 
diff --git a/gcc/testsuite/gcc.target/i386/pr88540.c b/gcc/testsuite/gcc.target/i386/pr88540.c
new file mode 100644
index 00000000000..b927d0c57d5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr88540.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2" } */
+
+void test(double* __restrict d1, double* __restrict d2, double* __restrict d3)
+{
+  for (int n = 0; n < 2; ++n)
+    d3[n] = d1[n] < d2[n] ? d1[n] : d2[n];
+}
+
+/* { dg-final { scan-assembler "minpd" } } */
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 467c9fd108a..13ee486831d 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -1580,10 +1580,6 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
 
   tree type = TREE_TYPE (PHI_RESULT (phi));
 
-  /* The optimization may be unsafe due to NaNs.  */
-  if (HONOR_NANS (type) || HONOR_SIGNED_ZEROS (type))
-    return false;
-
   gcond *cond = as_a <gcond *> (*gsi_last_bb (cond_bb));
   enum tree_code cmp = gimple_cond_code (cond);
   tree rhs = gimple_cond_rhs (cond);
@@ -1770,6 +1766,9 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
       else
         return false;
     }
+  else if (HONOR_NANS (type) || HONOR_SIGNED_ZEROS (type))
+    /* The optimization may be unsafe due to NaNs.  */
+    return false;
   else if (middle_bb != alt_middle_bb && threeway_p)
     {
       /* Recognize the following case:
@@ -2103,7 +2102,19 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
   /* Emit the statement to compute min/max.  */
   gimple_seq stmts = NULL;
   tree phi_result = PHI_RESULT (phi);
-  result = gimple_build (&stmts, minmax, TREE_TYPE (phi_result), arg0, arg1);
+
+  /* When we can't use a MIN/MAX_EXPR still make sure the expression
+     stays in a form to be recognized by ISA that map to IEEE x > y ? x : y
+     semantics (that's not IEEE max semantics).  */
+  if (HONOR_NANS (type) || HONOR_SIGNED_ZEROS (type))
+    {
+      result = gimple_build (&stmts, cmp, boolean_type_node,
+                             gimple_cond_lhs (cond), rhs);
+      result = gimple_build (&stmts, COND_EXPR, TREE_TYPE (phi_result),
+                             result, arg_true, arg_false);
+    }
+  else
+    result = gimple_build (&stmts, minmax, TREE_TYPE (phi_result), arg0, arg1);
 
   gsi = gsi_last_bb (cond_bb);
   gsi_insert_seq_before (&gsi, stmts, GSI_NEW_STMT);
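
For illustration only (not part of the patch): a small C sketch of why
the ternary form cannot simply become a max operation when NaNs and
signed zeros are honored.  The helper names cond_max, nan_ignoring_max
and check_semantics are made up for this example; nan_ignoring_max
merely mimics how C fmax treats a single NaN operand and is not meant
as GCC's definition of MAX_EXPR or .FMAX.  The checks assume IEEE
semantics, i.e. compilation without -ffast-math.

```c
#include <assert.h>
#include <math.h>

/* The source-level form the patch keeps as a COND_EXPR.  */
static double cond_max (double x, double y)
{
  return x > y ? x : y;
}

/* Hypothetical helper: like C fmax, prefer the non-NaN operand.  */
static double nan_ignoring_max (double x, double y)
{
  if (isnan (x))
    return y;
  if (isnan (y))
    return x;
  return x > y ? x : y;
}

/* Spot checks; assumes IEEE semantics (no -ffast-math).  */
static void check_semantics (void)
{
  /* An unordered compare is false, so the second operand is selected.  */
  assert (cond_max (NAN, 1.0) == 1.0);
  assert (isnan (cond_max (1.0, NAN)));
  /* The fmax-like helper drops the NaN instead.  */
  assert (nan_ignoring_max (1.0, NAN) == 1.0);
  /* +0.0 > -0.0 is false as well, so operand order decides the sign.  */
  assert (signbit (cond_max (0.0, -0.0)));
  assert (!signbit (cond_max (-0.0, 0.0)));
}
```

The ternary yields its second operand whenever the compare is false,
which covers the unordered and the equal-zero cases.  As far as I
understand, that operand-order-dependent behavior is what x86
instructions such as minss/maxss implement, which is why keeping the
COND_EXPR lets the backend match them.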