From patchwork Fri Apr 7 06:34:01 2023
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 80668
From: Michael Meissner
Date: Fri, 7 Apr 2023 02:34:01 -0400
To: gcc-patches@gcc.gnu.org, Michael Meissner, Segher Boessenkool,
 "Kewen.Lin", David Edelsohn, Peter Bergner, Will Schmidt,
 chip.kerchner@ibm.com
Subject: [PATCH, V2] PR target/70243: Do not generate vmaddfp and vnmsubfp
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit

This is version 2 of the patch. The first version was posted on April 6th.
In this version, I eliminated the changes to altivec.md that added checks
to altivec_fmav4sf4 and altivec_vnmsubfp. After writing the code, I
remembered that VECTOR_UNIT_ALTIVEC_P, which those insns use, is not true
when the VSX instruction set is enabled, so no additional test is needed.

As we discussed in a private chat room, I modified the code to generate
vmaddfp and vnmsubfp if -Ofast (-ffast-math) is used. This allows the
compiler to eliminate the extra move if the user does not care about strict
floating point code generation, but it generates only the VSX instructions
in the normal case.

I reworked the examples and split them into two tests, one for the normal
case when -Ofast is not used and one for when it is used. I also fixed the
instructions mentioned in the comments to be the actual instructions
(vmaddfp and vnmsubfp) instead of fmaddfp and fnmsubdp. Sorry about that.

The AltiVec (VMX) instructions vmaddfp and vnmsubfp have different rounding
behaviors than the VSX xvmadd{a,m}sp and xvnmsub{a,m}sp instructions. In
particular, generating these instructions seems to break Eigen.

The bug is that GCC has generated the VMX vmaddfp and vnmsubfp instructions
on VSX systems as an alternative to the xvmadd{a,m}sp and xvnmsub{a,m}sp
instructions. The advantage of the VMX instructions is that they are 4
operand instructions (i.e. the target register does not have to overlap
with one of the input registers). This means the compiler can eliminate an
extra move instruction. The disadvantage of generating these instructions
is that they do not round the same way as the VSX instructions.

This patch generates the VMX vmaddfp and vnmsubfp instructions as
alternatives in the VSX insn patterns only if -Ofast (-ffast-math) is used.

I also added 2 tests to the regression suite.

I have done bootstrap builds on power9 little endian (with both IEEE long
double and IBM long double).
I have also done the builds and tests on a power8 big endian system
(testing both 32-bit and 64-bit code generation). Chip has verified that it
fixes the problem that Eigen encountered.

Can I check this into the master GCC branch? After a burn-in period, can I
check this patch into the active GCC branches? Thanks in advance.

2023-04-07 Michael Meissner

gcc/

	PR target/70243
	* config/rs6000/rs6000.md (isa attribute): Add fastmath.
	(enabled attribute): Add support for fastmath.
	* config/rs6000/vsx.md (vsx_fmav4sf4): Set the isa attribute to
	fastmath to disable Altivec instruction generation normally.
	(vsx_nfmsv4sf4): Likewise.

gcc/testsuite/

	PR target/70243
	* gcc.target/powerpc/pr70243.c: New test.
	* gcc.target/powerpc/pr70243-2.c: New test.
---
 gcc/config/rs6000/rs6000.md                  |  6 ++-
 gcc/config/rs6000/vsx.md                     | 17 ++++----
 gcc/testsuite/gcc.target/powerpc/pr70243-2.c | 41 ++++++++++++++++++++
 gcc/testsuite/gcc.target/powerpc/pr70243.c   | 41 ++++++++++++++++++++
 4 files changed, 97 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr70243-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr70243.c

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 44f7dd509cb..7fea6a40e0c 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -354,7 +354,7 @@ (define_attr "cpu"
         (const (symbol_ref "(enum attr_cpu) rs6000_tune")))
 
 ;; The ISA we implement.
-(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10"
+(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10,fastmath"
   (const_string "any"))
 
 ;; Is this alternative enabled for the current CPU/ISA/etc.?
@@ -402,6 +402,10 @@ (define_attr "enabled" ""
      (and (eq_attr "isa" "p10")
           (match_test "TARGET_POWER10"))
      (const_int 1)
+
+     (and (eq_attr "isa" "fastmath")
+          (match_test "flag_unsafe_math_optimizations"))
+     (const_int 1)
     ] (const_int 0)))
 
 ;; If this instruction is microcoded on the CELL processor
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0865608f94a..7f64a2dd356 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2009,11 +2009,12 @@ (define_insn "*vsx_tsqrt2_internal"
   "xtsqrtp %0,%x1"
   [(set_attr "type" "")])
 
-;; Fused vector multiply/add instructions. Support the classical Altivec
-;; versions of fma, which allows the target to be a separate register from the
-;; 3 inputs. Under VSX, the target must be either the addend or the first
-;; multiply.
-
+;; Fused vector multiply/add instructions. Under VSX, the target must be either
+;; the addend or the first multiply. If the user used -Ofast, also support the
+;; classical VMX versions of fma (vmaddfp and vnmsubfp), which allows the
+;; target to be a separate register from the 3 inputs. This restriction is due
+;; to the fact that vmaddfp and vnmsubfp have different rounding behaviors
+;; compared to xvmadd{a,m}sp or xvnmsub{a,m}sp.
 (define_insn "*vsx_fmav4sf4"
   [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
         (fma:V4SF
@@ -2025,7 +2026,8 @@ (define_insn "*vsx_fmav4sf4"
    xvmaddasp %x0,%x1,%x2
    xvmaddmsp %x0,%x1,%x3
    vmaddfp %0,%1,%2,%3"
-  [(set_attr "type" "vecfloat")])
+  [(set_attr "type" "vecfloat")
+   (set_attr "isa" "*,*,fastmath")])
 
 (define_insn "*vsx_fmav2df4"
   [(set (match_operand:V2DF 0 "vsx_register_operand" "=wa,wa")
@@ -2078,7 +2080,8 @@ (define_insn "*vsx_nfmsv4sf4"
    xvnmsubasp %x0,%x1,%x2
    xvnmsubmsp %x0,%x1,%x3
    vnmsubfp %0,%1,%2,%3"
-  [(set_attr "type" "vecfloat")])
+  [(set_attr "type" "vecfloat")
+   (set_attr "isa" "*,*,fastmath")])
 
 (define_insn "*vsx_nfmsv2df4"
   [(set (match_operand:V2DF 0 "vsx_register_operand" "=wa,wa")
diff --git a/gcc/testsuite/gcc.target/powerpc/pr70243-2.c b/gcc/testsuite/gcc.target/powerpc/pr70243-2.c
new file mode 100644
index 00000000000..ef475f39b12
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr70243-2.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-Ofast -mvsx" } */
+
+/* PR 70243. Make sure we don't generate vmaddfp or vnmsubfp unless -Ofast is
+   used. These instructions do not round the same way the normal VSX
+   instructions do. These tests are written where the 3 inputs and target are
+   all separate registers where the register allocator would prefer to issue
+   the 4 argument FMA instruction over the 3 argument instruction plus an extra
+   move. */
+
+#include <altivec.h>
+
+vector float
+do_add1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return (a * b) + c;
+}
+
+vector float
+do_nsub1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return -((a * b) - c);
+}
+
+vector float
+do_add2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_madd (a, b, c);
+}
+
+vector float
+do_nsub2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_nmsub (a, b, c);
+}
+
+/* { dg-final { scan-assembler-not {\mxvmadd[am]sp\M} } } */
+/* { dg-final { scan-assembler-not {\mxvnmsub[am]sp\M} } } */
+/* { dg-final { scan-assembler {\mvmaddfp\M} } } */
+/* { dg-final { scan-assembler {\mvnmsubfp\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr70243.c b/gcc/testsuite/gcc.target/powerpc/pr70243.c
new file mode 100644
index 00000000000..c1a5c676fc3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr70243.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+/* PR 70243. Make sure we don't generate vmaddfp or vnmsubfp unless -Ofast is
+   used. These instructions do not round the same way the normal VSX
+   instructions do. These tests are written where the 3 inputs and target are
+   all separate registers where the register allocator would prefer to issue
+   the 4 argument FMA instruction over the 3 argument instruction plus an extra
+   move. */
+
+#include <altivec.h>
+
+vector float
+do_add1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return (a * b) + c;
+}
+
+vector float
+do_nsub1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return -((a * b) - c);
+}
+
+vector float
+do_add2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_madd (a, b, c);
+}
+
+vector float
+do_nsub2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_nmsub (a, b, c);
+}
+
+/* { dg-final { scan-assembler {\mxvmadd[am]sp\M} } } */
+/* { dg-final { scan-assembler {\mxvnmsub[am]sp\M} } } */
+/* { dg-final { scan-assembler-not {\mvmaddfp\M} } } */
+/* { dg-final { scan-assembler-not {\mvnmsubfp\M} } } */
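
For readers unfamiliar with the "enabled" attribute, the way the new fastmath isa value gates the third alternative of the V4SF fma patterns can be modelled roughly in plain C. This is only a sketch under stated assumptions: the enum, the tables and the function names (alternative_enabled, count_enabled, last_enabled) are made up for illustration and are not GCC internals, and only three isa values are modelled. The real decision is made by the cond in the "enabled" attribute in rs6000.md.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a few values of the "isa" attribute.  */
enum isa_model { ISA_ANY, ISA_P10, ISA_FASTMATH };

/* Simplified model of the "enabled" attribute: an alternative is usable
   only when the requirement named by its isa value holds.  unsafe_math
   plays the role of flag_unsafe_math_optimizations.  */
bool
alternative_enabled (enum isa_model isa, bool target_power10, bool unsafe_math)
{
  switch (isa)
    {
    case ISA_ANY:
      return true;
    case ISA_P10:
      return target_power10;
    case ISA_FASTMATH:
      return unsafe_math;
    }
  return false;
}

/* The three alternatives of *vsx_fmav4sf4 after the patch, "*,*,fastmath".
   "*" leaves the attribute at its default value, "any".  */
static const enum isa_model fma_isa[3] = { ISA_ANY, ISA_ANY, ISA_FASTMATH };
static const char *const fma_insn[3] = { "xvmaddasp", "xvmaddmsp", "vmaddfp" };

/* Number of alternatives the register allocator may choose from.  */
int
count_enabled (bool unsafe_math)
{
  int n = 0;
  for (int i = 0; i < 3; i++)
    if (alternative_enabled (fma_isa[i], false, unsafe_math))
      n++;
  return n;
}

/* Mnemonic of the last alternative that is still enabled.  */
const char *
last_enabled (bool unsafe_math)
{
  const char *name = "";
  for (int i = 0; i < 3; i++)
    if (alternative_enabled (fma_isa[i], false, unsafe_math))
      name = fma_insn[i];
  return name;
}
```

In this model, count_enabled (false) yields 2, which corresponds to the -O2 test expecting only xvmadd{a,m}sp, while count_enabled (true) yields 3 and last_enabled (true) is "vmaddfp", which corresponds to the -Ofast test where the 4 operand form becomes available again.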