From patchwork Fri Jan 27 11:06:20 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 49187
To: gcc-patches@gcc.gnu.org
Cc: rguenther@suse.de
Subject: [PATCH 1/2] Add support for conditional xorsign [PR96373]
Date: Fri, 27 Jan 2023 11:06:20 +0000
From: Richard Sandiford

This patch is an optimisation, but it's also a prerequisite for
fixing PR96373 without regressing vect-xorsign_exec.c.

Currently the vectoriser vectorises:

  for (i = 0; i < N; i++)
    r[i] = a[i] * __builtin_copysignf (1.0f, b[i]);

as two unconditional operations (copysign and mult).
tree-ssa-math-opts.cc later combines them into an "xorsign" function.
This works for both Advanced SIMD and SVE.

However, with the fix for PR96373, the vectoriser will instead
generate a conditional multiplication (IFN_COND_MUL).  Something then
needs to fold copysign & IFN_COND_MUL to the equivalent of a conditional
xorsign.  Three obvious options were:

(1) Extend tree-ssa-math-opts.cc.
(2) Do the fold in match.pd.
(3) Leave it to rtl combine.

I'm against (3), because this isn't a target-specific optimisation.
(1) would be possible, but would involve open-coding a lot of what
match.pd does for us.  And, in contrast to doing the current
tree-ssa-math-opts.cc optimisation in match.pd, there should be
no danger of (2) happening too early.  If we have an IFN_COND_MUL
then we're already past the stage of simplifying the original
source code.

There was also a choice between adding a conditional xorsign ifn
and simply open-coding the xorsign.  The latter seems simpler,
and means less boiler-plate for target-specific code.

The signed_or_unsigned_type_for change is needed to make sure
that we stay in "SVE space" when doing the optimisation on 128-bit
fixed-length SVE.

Tested on aarch64-linux-gnu.  OK to install?

Richard


gcc/
	PR tree-optimization/96373
	* tree.h (sign_mask_for): Declare.
	* tree.cc (sign_mask_for): New function.
	(signed_or_unsigned_type_for): For vector types, try to use the
	related_int_vector_mode.
	* genmatch.cc (commutative_op): Handle conditional internal functions.
	* match.pd: Fold an IFN_COND_MUL+copysign into an IFN_COND_XOR+and.

gcc/testsuite/
	PR tree-optimization/96373
	* gcc.target/aarch64/sve/cond_xorsign_1.c: New test.
	* gcc.target/aarch64/sve/cond_xorsign_2.c: Likewise.
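
For illustration only (not part of the patch): a minimal scalar sketch
of the equivalence behind the fold, i.e. that c ? x * copysign (1, y) : z
matches c ? x ^ signs (y) : z on the integer view of the values.  It
assumes IEEE binary32 floats and non-NaN inputs.  The helper names
(float_bits, bits_float, cond_mul_copysign, cond_xor_signs, self_check)
are made up for this example and do not exist in GCC.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern (a scalar "view_convert").  */
static uint32_t
float_bits (float x)
{
  uint32_t u;
  memcpy (&u, &x, sizeof u);
  return u;
}

/* Reinterpret a 32-bit pattern as a float.  */
static float
bits_float (uint32_t u)
{
  float x;
  memcpy (&x, &u, sizeof x);
  return x;
}

/* Original form: c ? x * copysign (1, y) : z.  */
float
cond_mul_copysign (int c, float x, float y, float z)
{
  return c ? x * __builtin_copysignf (1.0f, y) : z;
}

/* Folded form: c ? x ^ signs (y) : z, computed on the integer view.
   The mask has only the sign bit set, as an assumed binary32 layout.  */
float
cond_xor_signs (int c, float x, float y, float z)
{
  const uint32_t sign_mask = UINT32_C (1) << 31;
  uint32_t res = (c
		  ? (float_bits (x) ^ (float_bits (y) & sign_mask))
		  : float_bits (z));
  return bits_float (res);
}

/* Check that both forms agree bit-for-bit on a few non-NaN inputs,
   including signed zeros.  */
void
self_check (void)
{
  const float xs[] = { 3.5f, -3.5f, 0.0f, -0.0f };
  const float ys[] = { 2.0f, -2.0f, 0.0f, -0.0f };
  for (int c = 0; c < 2; ++c)
    for (int i = 0; i < 4; ++i)
      for (int j = 0; j < 4; ++j)
	assert (float_bits (cond_mul_copysign (c, xs[i], ys[j], 9.0f))
		== float_bits (cond_xor_signs (c, xs[i], ys[j], 9.0f)));
}
```

The multiplication by +/-1.0 is exact, which is why the two forms can
agree bit-for-bit here; NaNs are left out because the sign of a NaN
result of a multiplication is not something this sketch relies on.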
---
 gcc/genmatch.cc                               | 15 ++++++++
 gcc/match.pd                                  | 14 ++++++++
 .../gcc.target/aarch64/sve/cond_xorsign_1.c   | 34 +++++++++++++++++++
 .../gcc.target/aarch64/sve/cond_xorsign_2.c   | 17 ++++++++++
 gcc/tree.cc                                   | 33 ++++++++++++++++++
 gcc/tree.h                                    |  1 +
 6 files changed, 114 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_2.c

diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index d4cb439a851..e147ab9db7a 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -489,6 +489,21 @@ commutative_op (id_base *id)
       case CFN_FNMS:
         return 0;
 
+      case CFN_COND_ADD:
+      case CFN_COND_MUL:
+      case CFN_COND_MIN:
+      case CFN_COND_MAX:
+      case CFN_COND_FMIN:
+      case CFN_COND_FMAX:
+      case CFN_COND_AND:
+      case CFN_COND_IOR:
+      case CFN_COND_XOR:
+      case CFN_COND_FMA:
+      case CFN_COND_FMS:
+      case CFN_COND_FNMA:
+      case CFN_COND_FNMS:
+        return 1;
+
       default:
         return -1;
       }
diff --git a/gcc/match.pd b/gcc/match.pd
index 56ac743aa6d..f605b798c44 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -339,6 +339,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (REAL_VALUE_NEGATIVE (TREE_REAL_CST (@0)))
    (COPYSIGN_ALL (negate @0) @1)))
 
+/* Transform c ? x * copysign (1, y) : z to c ? x ^ signs(y) : z.
+   tree-ssa-math-opts.cc does the corresponding optimization for
+   unconditional multiplications (via xorsign).  */
+(simplify
+ (IFN_COND_MUL:c @0 @1 (IFN_COPYSIGN real_onep @2) @3)
+ (with { tree signs = sign_mask_for (type); }
+  (if (signs)
+   (with { tree inttype = TREE_TYPE (signs); }
+    (view_convert:type
+     (IFN_COND_XOR:inttype @0
+      (view_convert:inttype @1)
+      (bit_and (view_convert:inttype @2) { signs; })
+      (view_convert:inttype @3)))))))
+
 /* (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x.  */
 (simplify
   (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_1.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_1.c
new file mode 100644
index 00000000000..338ca605923
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_1.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#define xorsign(A, B, SUFFIX) ((A) * __builtin_copysign##SUFFIX (1.0, B))
+
+#define DEF_LOOP(TYPE, SUFFIX)                                  \
+  void __attribute__ ((noinline, noclone))                      \
+  test_##TYPE (TYPE *__restrict r, TYPE *__restrict a,          \
+               TYPE *__restrict b, TYPE *__restrict c,          \
+               int n)                                           \
+  {                                                             \
+    for (int i = 0; i < n; ++i)                                 \
+      r[i] = a[i] < 20 ? xorsign(b[i], c[i], SUFFIX) : b[i];    \
+  }
+
+#define TEST_ALL(T) \
+  T (_Float16, f16) \
+  T (float, f) \
+  T (double, )
+
+TEST_ALL (DEF_LOOP)
+
+/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.h, z[0-9]+\.h,} 1 } } */
+/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.s, z[0-9]+\.s,} 1 } } */
+/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.d, z[0-9]+\.d,} 1 } } */
+
+/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.h, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
+
+/* { dg-final { scan-assembler-not {\tfmul} } } */
+/* { dg-final { scan-assembler-not {\tmov\tz[^,]*z} } } */
+/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */
+/* { dg-final { scan-assembler-not {\tsel\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_2.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_2.c
new file mode 100644
index 00000000000..274dd0ede59
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msve-vector-bits=128 --param aarch64-autovec-preference=2" } */
+
+#include "cond_xorsign_1.c"
+
+/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.h, z[0-9]+\.h,} 1 } } */
+/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.s, z[0-9]+\.s,} 1 } } */
+/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.d, z[0-9]+\.d,} 1 } } */
+
+/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.h, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
+
+/* { dg-final { scan-assembler-not {\tfmul} } } */
+/* { dg-final { scan-assembler-not {\tmov\tz[^,]*z} } } */
+/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */
+/* { dg-final { scan-assembler-not {\tsel\t} } } */
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 7473912a065..94d20eaba17 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -2669,6 +2669,35 @@ build_zero_cst (tree type)
     }
 }
 
+/* If floating-point type TYPE has an IEEE-style sign bit, return an
+   unsigned constant in which only the sign bit is set.  Return null
+   otherwise.  */
+
+tree
+sign_mask_for (tree type)
+{
+  /* Avoid having to choose between a real-only sign and a pair of signs.
+     This could be relaxed if the choice becomes obvious later.  */
+  if (TREE_CODE (type) == COMPLEX_TYPE)
+    return NULL_TREE;
+
+  auto eltmode = as_a <scalar_float_mode> (element_mode (type));
+  auto bits = REAL_MODE_FORMAT (eltmode)->ieee_bits;
+  if (!bits || !pow2p_hwi (bits))
+    return NULL_TREE;
+
+  tree inttype = unsigned_type_for (type);
+  if (!inttype)
+    return NULL_TREE;
+
+  auto mask = wi::set_bit_in_zero (bits - 1, bits);
+  if (TREE_CODE (inttype) == VECTOR_TYPE)
+    {
+      tree elt = wide_int_to_tree (TREE_TYPE (inttype), mask);
+      return build_vector_from_val (inttype, elt);
+    }
+  return wide_int_to_tree (inttype, mask);
+}
 
 /* Build a BINFO with LEN language slots.  */
 
@@ -10961,6 +10990,10 @@ signed_or_unsigned_type_for (int unsignedp, tree type)
         return NULL_TREE;
       if (inner == inner2)
         return type;
+      machine_mode new_mode;
+      if (VECTOR_MODE_P (TYPE_MODE (type))
+          && related_int_vector_mode (TYPE_MODE (type)).exists (&new_mode))
+        return build_vector_type_for_mode (inner2, new_mode);
       return build_vector_type (inner2, TYPE_VECTOR_SUBPARTS (type));
     }
 
diff --git a/gcc/tree.h b/gcc/tree.h
index e730a2a3e56..c656cd5b7bf 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -4675,6 +4675,7 @@ extern tree build_one_cst (tree);
 extern tree build_minus_one_cst (tree);
 extern tree build_all_ones_cst (tree);
 extern tree build_zero_cst (tree);
+extern tree sign_mask_for (tree);
 extern tree build_string (unsigned, const char * = NULL);
 extern tree build_poly_int_cst (tree, const poly_wide_int_ref &);
 extern tree build_tree_list (tree, tree CXX_MEM_STAT_INFO);

From patchwork Fri Jan 27 11:08:38 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 49191
To: gcc-patches@gcc.gnu.org
Cc: rguenther@suse.de, linkw@gcc.gnu.org
Subject: [PATCH 2/2] vect: Make partial trapping ops use predication [PR96373]
Date: Fri, 27 Jan 2023 11:08:38 +0000
From: Richard Sandiford

PR96373 points out that a predicated SVE loop currently converts
trapping unconditional ops into unpredicated vector ops.  Doing
the operation on inactive lanes can then raise an exception.

As discussed in the PR trail, we aren't 100% consistent about
whether we preserve traps or not.  But the direction of travel
is clearly to improve that rather than live with it.  This patch
tries to do that for the SVE case.

Doing this regresses gcc.target/aarch64/sve/fabd_1.c.  I've added
-fno-trapping-math for now and filed PR108571 to track it.
A similar problem applies to fsubr_1.c.
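
For illustration only (not part of the patch): a rough scalar model of
why an unpredicated operation in a predicated loop can raise a spurious
exception.  The "exception" is modelled by a plain variable rather than
the real floating-point environment, and every name here (LANES,
trap_flag, trapping_div, vec_div, vec_cond_div, final_iteration_traps)
is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define LANES 4

/* Stand-in for a sticky floating-point exception flag.  */
static bool trap_flag;

/* A "trapping" division: records a divide-by-zero in TRAP_FLAG instead
   of really raising an exception.  */
static float
trapping_div (float a, float b)
{
  if (b == 0.0f)
    {
      trap_flag = true;
      return 0.0f;
    }
  return a / b;
}

/* Unpredicated vector op: every lane is computed, active or not.  */
void
vec_div (float *r, const float *a, const float *b)
{
  for (int i = 0; i < LANES; ++i)
    r[i] = trapping_div (a[i], b[i]);
}

/* Predicated vector op: inactive lanes take ELSE_VALUE and do no
   arithmetic at all.  */
void
vec_cond_div (const bool *mask, float *r, const float *a, const float *b,
	      float else_value)
{
  for (int i = 0; i < LANES; ++i)
    r[i] = mask[i] ? trapping_div (a[i], b[i]) : else_value;
}

/* Model the final loop iteration with three active lanes.  Lane 3 is
   inactive padding whose divisor happens to be zero.  Return whether
   the modelled exception flag was set.  */
bool
final_iteration_traps (bool predicated)
{
  const bool mask[LANES] = { true, true, true, false };
  const float a[LANES] = { 1.0f, 2.0f, 3.0f, 0.0f };
  const float b[LANES] = { 1.0f, 2.0f, 4.0f, 0.0f };
  float r[LANES];

  trap_flag = false;
  if (predicated)
    vec_cond_div (mask, r, a, b, 0.0f);
  else
    vec_div (r, a, b);
  return trap_flag;
}
```

In this model only the unpredicated form sets the flag, because it is
the only one that touches the inactive lane.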

I think this is likely to regress Power 10, since conditional
operations are only available for masked loops.  I think we'll
need to add -fno-trapping-math to any affected testcases,
but I don't have a Power 10 system to test on.  Kewen, would you
mind giving this a spin and seeing how bad the fallout is?

Tested on aarch64-linux-gnu.  OK to install assuming no blockers
on the Power 10 side?

Richard


gcc/
	PR tree-optimization/96373
	* tree-vect-stmts.cc (vectorizable_operation): Predicate trapping
	operations on the loop mask.  Reject partial vectors if this isn't
	possible.

gcc/testsuite/
	PR tree-optimization/96373
	PR tree-optimization/108571
	* gcc.target/aarch64/sve/fabd_1.c: Add -fno-trapping-math.
	* gcc.target/aarch64/sve/fsubr_1.c: Likewise.
	* gcc.target/aarch64/sve/fmul_1.c: Expect predicate ops.
	* gcc.target/aarch64/sve/fp_arith_1.c: Likewise.
---
 gcc/testsuite/gcc.target/aarch64/sve/fabd_1.c |  2 +-
 gcc/testsuite/gcc.target/aarch64/sve/fmul_1.c | 12 +++----
 .../gcc.target/aarch64/sve/fp_arith_1.c       | 12 +++----
 .../gcc.target/aarch64/sve/fsubr_1.c          |  2 +-
 gcc/tree-vect-stmts.cc                        | 32 ++++++++++++++-----
 5 files changed, 38 insertions(+), 22 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fabd_1.c b/gcc/testsuite/gcc.target/aarch64/sve/fabd_1.c
index 13ad83be24c..30bde6f0df7 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/fabd_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/fabd_1.c
@@ -1,5 +1,5 @@
 /* { dg-do assemble { target aarch64_asm_sve_ok } } */
-/* { dg-options "-O3 --save-temps" } */
+/* { dg-options "-O3 --save-temps -fno-trapping-math" } */
 
 #define N 16
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fmul_1.c b/gcc/testsuite/gcc.target/aarch64/sve/fmul_1.c
index 4a3e7c06745..0245a8c1422 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/fmul_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/fmul_1.c
@@ -27,20 +27,20 @@ DO_ARITH_OPS (_Float16, *, mul)
 DO_ARITH_OPS (float, *, mul)
 DO_ARITH_OPS (double, *, mul)
 
-/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, z[0-9]+\.h\n} 4 } } */
 /* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #0.5\n} 1 } } */
-/* { dg-final { scan-assembler-not {\tfmul\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #2} } } */
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #2.0\n} 1 } } */
 /* { dg-final { scan-assembler-not {\tfmul\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #5} } } */
 /* { dg-final { scan-assembler-not {\tfmul\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #-} } } */
 
-/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 4 } } */
 /* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #0.5\n} 1 } } */
-/* { dg-final { scan-assembler-not {\tfmul\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #2} } } */
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #2.0\n} 1 } } */
 /* { dg-final { scan-assembler-not {\tfmul\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #5} } } */
 /* { dg-final { scan-assembler-not {\tfmul\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #-} } } */
 
-/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 4 } } */
 /* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #0.5\n} 1 } } */
-/* { dg-final { scan-assembler-not {\tfmul\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #2} } } */
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #2.0\n} 1 } } */
 /* { dg-final { scan-assembler-not {\tfmul\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #5} } } */
 /* { dg-final { scan-assembler-not {\tfmul\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #-} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fp_arith_1.c b/gcc/testsuite/gcc.target/aarch64/sve/fp_arith_1.c
index 5aed0dcb490..419d6e1b5ec 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/fp_arith_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/fp_arith_1.c
@@ -34,37 +34,37 @@ DO_ARITH_OPS (double, -, minus)
 
 /* No specific count because it's valid to use fadd or fsub for the
    out-of-range constants.  */
-/* { dg-final { scan-assembler {\tfadd\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} } } */
+/* { dg-final { scan-assembler {\tfadd\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, z[0-9]+\.h\n} } } */
 /* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #1.0\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #0.5\n} 2 } } */
 /* { dg-final { scan-assembler-not {\tfadd\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #2} } } */
 /* { dg-final { scan-assembler-not {\tfadd\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #-} } } */
 
-/* { dg-final { scan-assembler {\tfsub\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} } } */
+/* { dg-final { scan-assembler {\tfsub\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, z[0-9]+\.h\n} } } */
 /* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #1.0\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #0.5\n} 2 } } */
 /* { dg-final { scan-assembler-not {\tfsub\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #2} } } */
 /* { dg-final { scan-assembler-not {\tfsub\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #-} } } */
 
-/* { dg-final { scan-assembler {\tfadd\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} } } */
+/* { dg-final { scan-assembler {\tfadd\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} } } */
 /* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #1.0\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #0.5\n} 2 } } */
 /* { dg-final { scan-assembler-not {\tfadd\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #2} } } */
 /* { dg-final { scan-assembler-not {\tfadd\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #-} } } */
 
-/* { dg-final { scan-assembler {\tfsub\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} } } */
+/* { dg-final { scan-assembler {\tfsub\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} } } */
 /* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #1.0\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #0.5\n} 2 } } */
 /* { dg-final { scan-assembler-not {\tfsub\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #2} } } */
 /* { dg-final { scan-assembler-not {\tfsub\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #-} } } */
 
-/* { dg-final { scan-assembler {\tfadd\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} } } */
+/* { dg-final { scan-assembler {\tfadd\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} } } */
 /* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #1.0\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #0.5\n} 2 } } */
 /* { dg-final { scan-assembler-not {\tfadd\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #2} } } */
 /* { dg-final { scan-assembler-not {\tfadd\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #-} } } */
 
-/* { dg-final { scan-assembler {\tfsub\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} } } */
+/* { dg-final { scan-assembler {\tfsub\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} } } */
 /* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #1.0\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #0.5\n} 2 } } */
 /* { dg-final { scan-assembler-not {\tfsub\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #2} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fsubr_1.c b/gcc/testsuite/gcc.target/aarch64/sve/fsubr_1.c
index f47a360dee9..012cf6e9e5d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/fsubr_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/fsubr_1.c
@@ -1,5 +1,5 @@
 /* { dg-do assemble { target aarch64_asm_sve_ok } } */
-/* { dg-options "-O3 --save-temps" } */
+/* { dg-options "-O3 --save-temps -fno-trapping-math" } */
 
 #define DO_IMMEDIATE_OPS(VALUE, TYPE, NAME) \
 void vsubrarithimm_##NAME##_##TYPE (TYPE *dst, int count) \
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index eb4ca1f184e..56e3c30658e 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -6301,6 +6301,7 @@ vectorizable_operation (vec_info *vinfo,
   int reduc_idx = STMT_VINFO_REDUC_IDX (stmt_info);
   vec_loop_masks *masks = (loop_vinfo ? &LOOP_VINFO_MASKS (loop_vinfo) : NULL);
   internal_fn cond_fn = get_conditional_internal_fn (code);
+  bool could_trap = gimple_could_trap_p (stmt);
 
   if (!vec_stmt) /* transformation not required.  */
     {
@@ -6309,7 +6310,7 @@ vectorizable_operation (vec_info *vinfo,
          keeping the inactive lanes as-is.  */
       if (loop_vinfo
           && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
-          && reduc_idx >= 0)
+          && (could_trap || reduc_idx >= 0))
         {
           if (cond_fn == IFN_LAST
               || !direct_internal_fn_supported_p (cond_fn, vectype,
@@ -6452,16 +6453,31 @@ vectorizable_operation (vec_info *vinfo,
       vop1 = ((op_type == binary_op || op_type == ternary_op)
               ? vec_oprnds1[i] : NULL_TREE);
       vop2 = ((op_type == ternary_op) ? vec_oprnds2[i] : NULL_TREE);
-      if (masked_loop_p && reduc_idx >= 0)
+      if (masked_loop_p && (reduc_idx >= 0 || could_trap))
         {
-          /* Perform the operation on active elements only and take
-             inactive elements from the reduction chain input.  */
-          gcc_assert (!vop2);
-          vop2 = reduc_idx == 1 ? vop1 : vop0;
           tree mask = vect_get_loop_mask (gsi, masks, vec_num * ncopies,
                                           vectype, i);
-          gcall *call = gimple_build_call_internal (cond_fn, 4, mask,
-                                                    vop0, vop1, vop2);
+          auto_vec<tree> vops (5);
+          vops.quick_push (mask);
+          vops.quick_push (vop0);
+          if (vop1)
+            vops.quick_push (vop1);
+          if (vop2)
+            vops.quick_push (vop2);
+          if (reduc_idx >= 0)
+            {
+              /* Perform the operation on active elements only and take
+                 inactive elements from the reduction chain input.  */
+              gcc_assert (!vop2);
+              vops.quick_push (reduc_idx == 1 ? vop1 : vop0);
+            }
+          else
+            {
+              auto else_value = targetm.preferred_else_value
+                (cond_fn, vectype, vops.length () - 1, &vops[1]);
+              vops.quick_push (else_value);
+            }
+          gcall *call = gimple_build_call_internal_vec (cond_fn, vops);
           new_temp = make_ssa_name (vec_dest, call);
           gimple_call_set_lhs (call, new_temp);
           gimple_call_set_nothrow (call, true);