From patchwork Thu Apr  6 15:12:11 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Meissner <meissner@linux.ibm.com>
X-Patchwork-Id: 80308
Return-Path: <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp1101619vqo;
        Thu, 6 Apr 2023 08:13:08 -0700 (PDT)
X-Google-Smtp-Source: 
 AKy350b1rMXXgJBJ6Kp5ZCYEgB6YvUq17OqOY/Fmo2EmgPlEadbRHjftvQXex+fB5qVvs+QCEbmd
X-Received: by 2002:a05:6402:7d6:b0:502:7d3f:ced1 with SMTP id
 u22-20020a05640207d600b005027d3fced1mr4905971edy.25.1680793988005;
        Thu, 06 Apr 2023 08:13:08 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1680793987; cv=none;
        d=google.com; s=arc-20160816;
        b=CfEeByH5Ca4ZEnW4DedOyLGptsKPSNMBhzH8yebCeqZEp++UYXjFjGz6Kz0eHdSq2A
         c0yf3h/Z9LlK7r6469V931IbCx6/dbEUXVjrK9qcH3r1Ga2+UaQD4hqICOcf1GBOPcEH
         t+6MVSjlhk5hm2ctBtczPJ8Rxf90F6Ev6HtppzjGITtJ/hri/LWTmFyx8QxoOR3dqE6z
         OCKBArmMlPic+6VlOOp76xZy28jBHF0e9aAp/0F8bC8rC9JqnNmEIUrauXciPZNQGL4V
         5R/H0mscZggWzTYFE794LYGc5c+GXPWeaBsvfd8kcOhUTC//fIAIFuUAVimfwzodASRx
         ygfg==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=sender:errors-to:reply-to:from:list-subscribe:list-help:list-post
         :list-archive:list-unsubscribe:list-id:precedence
         :content-disposition:mime-version:mail-followup-to:message-id
         :subject:to:date:dmarc-filter:delivered-to:dkim-signature
         :dkim-filter;
        bh=qRU4ulEmj5Bp06CAIJTMsn5Wh1ySs8DQhDVx2D45hJY=;
        b=Qf3vRzVg9+m6XS5/1LifxTOgNg6SkiiYxOgtp1L2RHKvUU3O53FWHyFN6GGZbwI+7j
         Bkk8okxpzpDBiDVwUdyN1rVE5Y1ekdLeDUyAzAzcGdvqY4aZQseBSwIcCa0tBMmcgWbE
         kFJbFGvbnfSzhvn2iO0DcbqIiMOw/NRHPHsyNt6ilABCiAOcjZwoZrZoFXKfiMz8pC/u
         k4ccWcfAzwi0Oc8kl+HtZAa1y6zC1xVu4eKuaCO/3KQXmjDUB8IGHW//IE9jb490fspD
         EL7c0/s6On0W687Y7ihfCNQkPl30JAqTAbo2g2ZThz6iRaQxpCd9VnunWLcbisvNv5aE
         /ZiQ==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@gcc.gnu.org header.s=default header.b=M220gUi2;
       spf=pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates
 2620:52:3:1:0:246e:9693:128c as permitted sender)
 smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org";
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org
Received: from sourceware.org (server2.sourceware.org.
 [2620:52:3:1:0:246e:9693:128c])
        by mx.google.com with ESMTPS id
 a23-20020aa7d757000000b0050202f2d6a7si1452437eds.319.2023.04.06.08.13.07
        for <ouuuleilei@gmail.com>
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Thu, 06 Apr 2023 08:13:07 -0700 (PDT)
Received-SPF: pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates
 2620:52:3:1:0:246e:9693:128c as permitted sender)
 client-ip=2620:52:3:1:0:246e:9693:128c;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@gcc.gnu.org header.s=default header.b=M220gUi2;
       spf=pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates
 2620:52:3:1:0:246e:9693:128c as permitted sender)
 smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org";
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id B41A43858431
	for <ouuuleilei@gmail.com>; Thu,  6 Apr 2023 15:13:05 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org B41A43858431
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1680793985;
	bh=qRU4ulEmj5Bp06CAIJTMsn5Wh1ySs8DQhDVx2D45hJY=;
	h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post:
	 List-Help:List-Subscribe:From:Reply-To:From;
	b=M220gUi2RKZDKW1cydVxBIQRSVw4a5SLT3zPnD/AcryuVfZTo1QTQYAWCl7yTpGNi
	 p1eJoO2A4qct/HF/70spV2HE4LaBN5SrCgd3lvRNLozIf9M307sQvMRbWzVf08iPUD
	 Lh/Ma3fD6M7o7D4nw6AkXZNAvgNrFVyL1xLb4HE0=
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com
 [148.163.156.1])
 by sourceware.org (Postfix) with ESMTPS id A92953858D28
 for <gcc-patches@gcc.gnu.org>; Thu,  6 Apr 2023 15:12:19 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A92953858D28
Received: from pps.filterd (m0098396.ppops.net [127.0.0.1])
 by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id
 336EwRQq010179; Thu, 6 Apr 2023 15:12:18 GMT
Received: from pps.reinject (localhost [127.0.0.1])
 by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3psyst9evu-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Thu, 06 Apr 2023 15:12:18 +0000
Received: from m0098396.ppops.net (m0098396.ppops.net [127.0.0.1])
 by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 336EwtLi013472;
 Thu, 6 Apr 2023 15:12:18 GMT
Received: from ppma01wdc.us.ibm.com (fd.55.37a9.ip4.static.sl-reverse.com
 [169.55.85.253])
 by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3psyst9ev8-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Thu, 06 Apr 2023 15:12:17 +0000
Received: from pps.filterd (ppma01wdc.us.ibm.com [127.0.0.1])
 by ppma01wdc.us.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 336F4OOp024935;
 Thu, 6 Apr 2023 15:12:16 GMT
Received: from smtprelay03.wdc07v.mail.ibm.com ([9.208.129.113])
 by ppma01wdc.us.ibm.com (PPS) with ESMTPS id 3ppc8838k1-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Thu, 06 Apr 2023 15:12:16 +0000
Received: from smtpav02.dal12v.mail.ibm.com (smtpav02.dal12v.mail.ibm.com
 [10.241.53.101])
 by smtprelay03.wdc07v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id
 336FCDRw31195784
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
 Thu, 6 Apr 2023 15:12:14 GMT
Received: from smtpav02.dal12v.mail.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id B6E2658065;
 Thu,  6 Apr 2023 15:12:13 +0000 (GMT)
Received: from smtpav02.dal12v.mail.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id 207555805C;
 Thu,  6 Apr 2023 15:12:13 +0000 (GMT)
Received: from toto.the-meissners.org (unknown [9.160.59.115])
 by smtpav02.dal12v.mail.ibm.com (Postfix) with ESMTPS;
 Thu,  6 Apr 2023 15:12:13 +0000 (GMT)
Date: Thu, 6 Apr 2023 11:12:11 -0400
To: gcc-patches@gcc.gnu.org, Michael Meissner <meissner@linux.ibm.com>,
 Segher Boessenkool <segher@kernel.crashing.org>,
 "Kewen.Lin" <linkw@linux.ibm.com>, David Edelsohn <dje.gcc@gmail.com>,
 Peter Bergner <bergner@linux.ibm.com>,
 Will Schmidt <will_schmidt@vnet.ibm.com>, chip.kerchner@ibm.com
Subject: PR target/70243: Do not generate vmaddfp and vnmsubfp
Message-ID: <ZC7hS75ohXMo7Qcw@toto.the-meissners.org>
Mail-Followup-To: Michael Meissner <meissner@linux.ibm.com>,
 gcc-patches@gcc.gnu.org,
 Segher Boessenkool <segher@kernel.crashing.org>,
 "Kewen.Lin" <linkw@linux.ibm.com>,
 David Edelsohn <dje.gcc@gmail.com>,
 Peter Bergner <bergner@linux.ibm.com>,
 Will Schmidt <will_schmidt@vnet.ibm.com>, chip.kerchner@ibm.com
MIME-Version: 1.0
Content-Disposition: inline
X-TM-AS-GCONF: 00
X-Proofpoint-ORIG-GUID: 3GX1BaHgp3H1jT16betoYJ4XIzJdgoYN
X-Proofpoint-GUID: S-QQPEj9CxvkdH4jq6xtWAAswbwvMZFC
X-Proofpoint-Virus-Version: vendor=baseguard
 engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22
 definitions=2023-04-06_08,2023-04-06_03,2023-02-09_01
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
 mlxscore=0 bulkscore=0
 adultscore=0 impostorscore=0 clxscore=1011 mlxlogscore=999 suspectscore=0
 phishscore=0 malwarescore=0 spamscore=0 priorityscore=1501
 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1
 engine=8.12.0-2303200000 definitions=main-2304060134
X-Spam-Status: No, score=-10.3 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_MANYTO, KAM_SHORT, SPF_HELO_NONE,
 SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-Patchwork-Original-From: Michael Meissner via Gcc-patches
 <gcc-patches@gcc.gnu.org>
From: Michael Meissner <meissner@linux.ibm.com>
Reply-To: Michael Meissner <meissner@linux.ibm.com>
Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org
Sender: "Gcc-patches" <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org>
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1762440236796562761?=
X-GMAIL-MSGID: =?utf-8?q?1762440236796562761?=

The Altivec instructions vmaddfp and vnmsubfp have different rounding behaviors
than the VSX xvmaddsp and xvnmsubsp instructions.  In particular, generating
these instructions seems to break Eigen.

GCC has generated the Altivec vmaddfp and vnmsubfp instructions on VSX systems
as an alternative to the xvmadd{a,m}sp and xvnmsub{a,m}sp instructions.  The
advantage of the Altivec instructions is that they are 4 operand instructions
(i.e. the target register does not have to overlap with one of the input
registers), which can eliminate an extra move instruction.  The disadvantage
is that they do not round the same way as the VSX instructions.
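
As a rough illustration of the source shapes involved, here is a portable
sketch using the GCC/Clang generic vector extension.  It is not the test from
this patch; the type and function names are hypothetical, and whether a fused
instruction (and which form) is emitted depends on the target and options.
The unused first parameter mirrors the idea of keeping the return register
separate from the three inputs.

```c
/* Portable sketch; assumes a compiler with GCC-style generic vectors.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Multiply-add where the result is distinct from all three inputs.
   A 3-operand fused form needs the target to overlap an input, so a
   compiler may need an extra register move here.  */
v4sf
madd_separate (v4sf dummy, v4sf a, v4sf b, v4sf c)
{
  (void) dummy;
  return (a * b) + c;
}

/* Multiply-add where the addend is also the destination.  This shape
   fits an accumulating 3-operand form without an extra move.  */
void
madd_accumulate (v4sf *acc, v4sf a, v4sf b)
{
  *acc = (a * b) + *acc;
}

/* Negated multiply-subtract, the shape behind the nmsub patterns.  */
v4sf
nmsub_separate (v4sf dummy, v4sf a, v4sf b, v4sf c)
{
  (void) dummy;
  return -((a * b) - c);
}
```

With inputs that are exactly representable, all three functions give the same
values whether or not the multiply and add are fused; the rounding difference
only shows up when the intermediate product is not exact.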

This patch eliminates the generation of the Altivec vmaddfp and vnmsubfp
instructions as alternatives in the VSX insn patterns, and in the Altivec
insns it adds a test to prevent the insn from being used if VSX is available.
I also added a test to the regression test suite.

I have done bootstrap builds on power9 little endian (with both IEEE long
double and IBM long double).  I have also done the builds and tests on a power8
big endian system (testing both 32-bit and 64-bit code generation).  Chip has
verified that it fixes the problem that Eigen encountered.  Can I check this
into the master GCC branch?  After a burn-in period, can I check this patch
into the active GCC branches?

Thanks in advance.

2023-04-06   Michael Meissner  <meissner@linux.ibm.com>

gcc/

	PR target/70243
	* config/rs6000/altivec.md (altivec_fmav4sf4): Add a test to prevent
	vmaddfp and vnmsubfp from being generated on VSX systems.
	(altivec_vnmsubfp): Likewise.
	* config/rs6000/vsx.md (vsx_fmav4sf4): Do not generate vmaddfp or
	vnmsubfp.
	(vsx_nfmsv4sf4): Likewise.

gcc/testsuite/

	PR target/70243
	* gcc.target/powerpc/pr70243.c: New test.
---
 gcc/config/rs6000/altivec.md               |  9 +++--
 gcc/config/rs6000/vsx.md                   | 29 +++++++--------
 gcc/testsuite/gcc.target/powerpc/pr70243.c | 41 ++++++++++++++++++++++
 3 files changed, 61 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr70243.c

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 49b0c964f4d..63eab228d0d 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -750,12 +750,15 @@ (define_insn "altivec_vsel<mode>4"
 
 ;; Fused multiply add.
 
+;; If we are using VSX instructions, do not generate the vmaddfp instruction
+;; since it has different rounding behavior than the xvmaddsp instruction.
+
 (define_insn "*altivec_fmav4sf4"
   [(set (match_operand:V4SF 0 "register_operand" "=v")
 	(fma:V4SF (match_operand:V4SF 1 "register_operand" "v")
 		  (match_operand:V4SF 2 "register_operand" "v")
 		  (match_operand:V4SF 3 "register_operand" "v")))]
-  "VECTOR_UNIT_ALTIVEC_P (V4SFmode)"
+  "VECTOR_UNIT_ALTIVEC_P (V4SFmode) && !TARGET_VSX"
   "vmaddfp %0,%1,%2,%3"
   [(set_attr "type" "vecfloat")])
 
@@ -984,6 +987,8 @@ (define_insn "vstril_p_direct_<mode>"
   [(set_attr "type" "vecsimple")])
 
 ;; Fused multiply subtract 
+;; If we are using VSX instructions, do not generate the vnmsubfp instruction
+;; since it has different rounding behavior than the xvnmsubsp instruction.
 (define_insn "*altivec_vnmsubfp"
   [(set (match_operand:V4SF 0 "register_operand" "=v")
 	(neg:V4SF
@@ -991,7 +996,7 @@ (define_insn "*altivec_vnmsubfp"
 		   (match_operand:V4SF 2 "register_operand" "v")
 		   (neg:V4SF
 		    (match_operand:V4SF 3 "register_operand" "v")))))]
-  "VECTOR_UNIT_ALTIVEC_P (V4SFmode)"
+  "VECTOR_UNIT_ALTIVEC_P (V4SFmode) && !TARGET_VSX"
   "vnmsubfp %0,%1,%2,%3"
   [(set_attr "type" "vecfloat")])
 
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0865608f94a..03c1d787b6c 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2009,22 +2009,20 @@ (define_insn "*vsx_tsqrt<mode>2_internal"
   "x<VSv>tsqrt<sd>p %0,%x1"
   [(set_attr "type" "<VStype_simple>")])
 
-;; Fused vector multiply/add instructions. Support the classical Altivec
-;; versions of fma, which allows the target to be a separate register from the
-;; 3 inputs.  Under VSX, the target must be either the addend or the first
-;; multiply.
+;; Fused vector multiply/add instructions. Do not use the classical Altivec
+;; versions of fma.  Those instructions allow the target to be a separate
+;; register from the 3 inputs, but they have different rounding behaviors.
 
 (define_insn "*vsx_fmav4sf4"
-  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
 	(fma:V4SF
-	  (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
-	  (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
-	  (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))]
+	  (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
+	  (match_operand:V4SF 2 "vsx_register_operand" "wa,0")
+	  (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))]
   "VECTOR_UNIT_VSX_P (V4SFmode)"
   "@
    xvmaddasp %x0,%x1,%x2
-   xvmaddmsp %x0,%x1,%x3
-   vmaddfp %0,%1,%2,%3"
+   xvmaddmsp %x0,%x1,%x3"
   [(set_attr "type" "vecfloat")])
 
 (define_insn "*vsx_fmav2df4"
@@ -2066,18 +2064,17 @@ (define_insn "*vsx_nfma<mode>4"
   [(set_attr "type" "<VStype_mul>")])
 
 (define_insn "*vsx_nfmsv4sf4"
-  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
 	(neg:V4SF
 	 (fma:V4SF
-	   (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
-	   (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
+	   (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
+	   (match_operand:V4SF 2 "vsx_register_operand" "wa,0")
 	   (neg:V4SF
-	     (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))))]
+	     (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))))]
   "VECTOR_UNIT_VSX_P (V4SFmode)"
   "@
    xvnmsubasp %x0,%x1,%x2
-   xvnmsubmsp %x0,%x1,%x3
-   vnmsubfp %0,%1,%2,%3"
+   xvnmsubmsp %x0,%x1,%x3"
   [(set_attr "type" "vecfloat")])
 
 (define_insn "*vsx_nfmsv2df4"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr70243.c b/gcc/testsuite/gcc.target/powerpc/pr70243.c
new file mode 100644
index 00000000000..1dfc13a8864
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr70243.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+/* PR 70243, Make sure we don't generate vmaddfp or vnmsubfp.  These
+   instructions have different rounding behavior than the VSX instructions
+   xvmaddsp and xvnmsubsp.  These tests are written where the 3 inputs and
+   target are all separate registers.  Because vmaddfp and vnmsubfp are no
+   longer generated the compiler will have to generate an xvmadd{a,m}sp or
+   xvnmsub{a,m}sp instruction followed by a move operation.  */
+
+#include <altivec.h>
+
+vector float
+do_add1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return (a * b) + c;
+}
+
+vector float
+do_nsub1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return -((a * b) - c);
+}
+
+vector float
+do_add2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_madd (a, b, c);
+}
+
+vector float
+do_nsub2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_nmsub (a, b, c);
+}
+
+/* { dg-final { scan-assembler     {\mxvmadd[am]sp\M}  } } */
+/* { dg-final { scan-assembler     {\mxvnmsub[am]sp\M} } } */
+/* { dg-final { scan-assembler-not {\mvmaddfp\M}       } } */
+/* { dg-final { scan-assembler-not {\mvnmsubfp\M}      } } */