From patchwork Thu Aug 24 09:19:45 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford <richard.sandiford@arm.com>
X-Patchwork-Id: 136780
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com
Subject: [PATCH] aarch64: Account for different Advanced SIMD fusing options
Date: Thu, 24 Aug 2023 10:19:45 +0100
Message-ID: <mptcyzcajbi.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
X-Patchwork-Original-From: Richard Sandiford via Gcc-patches
 <gcc-patches@gcc.gnu.org>
From: Richard Sandiford <richard.sandiford@arm.com>
Reply-To: Richard Sandiford <richard.sandiford@arm.com>

The scalar FNMADD/FNMSUB and SVE FNMLA/FNMLS instructions mean
that either side of a subtraction can start an accumulator chain.
However, Advanced SIMD doesn't have an equivalent instruction.
This means that, for Advanced SIMD, a subtraction can only be
fused if the second operand is a multiplication.
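
As a rough illustration of the operand forms involved, the C sketch
below models what each instruction family computes.  The function names
are hypothetical, and separate multiply and add/subtract operations
stand in for the fused ones, so the single rounding of the real
instructions is not modelled:

```c
/* Illustrative models only; the names are hypothetical and are not
   GCC or ACLE interfaces.  Each function shows the operand roles of
   an instruction family, not its rounding behaviour.  */

/* acc + a * b: scalar FMADD, SVE FMLA, Advanced SIMD FMLA.  */
double
model_fmla (double acc, double a, double b)
{
  return acc + a * b;
}

/* acc - a * b: scalar FMSUB, SVE FMLS, Advanced SIMD FMLS.
   The multiplication is the second operand of the subtraction.  */
double
model_fmls (double acc, double a, double b)
{
  return acc - a * b;
}

/* a * b - acc: scalar FNMSUB, SVE FNMLS.  The multiplication is the
   first operand of the subtraction; Advanced SIMD has no single
   instruction for this form.  */
double
model_fnmls (double acc, double a, double b)
{
  return a * b - acc;
}
```

Only the model_fmls form has an Advanced SIMD counterpart among the
subtractions, which is why the multiplication has to be the second
operand there.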

Also, if both sides of a subtraction are multiplications,
and if the second operand is used multiple times, such as:

     c * d - a * b
     e * f - a * b

then the first rather than second multiplication operand will tend
to be fused.  On Advanced SIMD, this leads to:

     tmp1 = a * b
     tmp2 = -tmp1
      ... = tmp2 + c * d   // FMLA
      ... = tmp2 + e * f   // FMLA

where one of the FMLAs also requires a MOV, since FMLA overwrites its
accumulator and tmp2 is needed by both.

This patch tries to account for this in the vector cost model.
It improves roms performance by 2-3% on Neoverse V1.  It's also
needed to avoid a regression in fotonik for Neoverse N2 and
Neoverse V2 with the patch for PR110625.
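
The rule that the cost model now applies can be sketched as a standalone
C predicate.  This is a simplified model under stated assumptions, not
the code in aarch64_multiply_add_p: the struct and its fields are
hypothetical stand-ins for the gimple and vec_info queries, and the
extra Advanced SIMD checks made on the multiplication itself are
omitted:

```c
#include <stdbool.h>

/* Hypothetical summary of one addition or subtraction statement.  */
struct addsub_info
{
  bool is_minus;        /* Subtraction rather than addition.  */
  bool is_advsimd;      /* Vectorized with Advanced SIMD.  */
  bool op1_is_mul;      /* First operand is the result of a multiply.  */
  bool op2_is_mul;      /* Second operand is the result of a multiply.  */
  bool op2_single_use;  /* Second operand has no other uses.  */
};

/* Return true if the statement is expected to be fused with a
   multiplication, following the rule described above.  */
bool
expect_fusion (const struct addsub_info *s)
{
  if (s->is_minus && s->is_advsimd)
    /* Only an FMLS form is available, so the second operand must be
       the multiplication.  If both operands are multiplications and
       the second one is shared, assume the first is fused instead.  */
    return s->op2_is_mul && (s->op2_single_use || !s->op1_is_mul);

  /* Additions, and subtractions outside Advanced SIMD, can fuse
     with a multiplication on either side.  */
  return s->op1_is_mul || s->op2_is_mul;
}
```

For the c * d - a * b example with a * b shared between two statements,
this returns false for Advanced SIMD and true otherwise.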

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/
	* config/aarch64/aarch64.cc: Include ssa.h.
	(aarch64_multiply_add_p): Require the second operand of an
	Advanced SIMD subtraction to be a multiplication.  Assume that
	such an operation won't be fused if the second operand is used
	multiple times and if the first operand is also a multiplication.

gcc/testsuite/
	* gcc.target/aarch64/neoverse_v1_2.c: New test.
	* gcc.target/aarch64/neoverse_v1_3.c: Likewise.
---
 gcc/config/aarch64/aarch64.cc                 | 24 ++++++++++++++-----
 .../gcc.target/aarch64/neoverse_v1_2.c        | 15 ++++++++++++
 .../gcc.target/aarch64/neoverse_v1_3.c        | 14 +++++++++++
 3 files changed, 47 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/neoverse_v1_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/neoverse_v1_3.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 034628148ef..37d414021ca 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -84,6 +84,7 @@
 #include "aarch64-feature-deps.h"
 #include "config/arm/aarch-common.h"
 #include "config/arm/aarch-common-protos.h"
+#include "ssa.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -16411,20 +16412,20 @@ aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info,
   if (code != PLUS_EXPR && code != MINUS_EXPR)
     return false;
 
-  for (int i = 1; i < 3; ++i)
+  auto is_mul_result = [&](int i)
     {
       tree rhs = gimple_op (assign, i);
       /* ??? Should we try to check for a single use as well?  */
       if (TREE_CODE (rhs) != SSA_NAME)
-	continue;
+	return false;
 
       stmt_vec_info def_stmt_info = vinfo->lookup_def (rhs);
       if (!def_stmt_info
 	  || STMT_VINFO_DEF_TYPE (def_stmt_info) != vect_internal_def)
-	continue;
+	return false;
       gassign *rhs_assign = dyn_cast<gassign *> (def_stmt_info->stmt);
       if (!rhs_assign || gimple_assign_rhs_code (rhs_assign) != MULT_EXPR)
-	continue;
+	return false;
 
       if (vec_flags & VEC_ADVSIMD)
 	{
@@ -16444,8 +16445,19 @@ aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info,
 	}
 
       return true;
-    }
-  return false;
+    };
+
+  if (code == MINUS_EXPR && (vec_flags & VEC_ADVSIMD))
+    /* Advanced SIMD doesn't have FNMADD/FNMSUB/FNMLA/FNMLS, so the
+       multiplication must be on the second operand (to form an FMLS).
+       But if both operands are multiplications and the second operand
+       is used more than once, we'll instead negate the second operand
+       and use it as an accumulator for the first operand.  */
+    return (is_mul_result (2)
+	    && (has_single_use (gimple_assign_rhs2 (assign))
+		|| !is_mul_result (1)));
+
+  return is_mul_result (1) || is_mul_result (2);
 }
 
 /* Return true if STMT_INFO is the second part of a two-statement boolean AND
diff --git a/gcc/testsuite/gcc.target/aarch64/neoverse_v1_2.c b/gcc/testsuite/gcc.target/aarch64/neoverse_v1_2.c
new file mode 100644
index 00000000000..45d7e81c78e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/neoverse_v1_2.c
@@ -0,0 +1,15 @@
+/* { dg-options "-O2 -mcpu=neoverse-v1 --param aarch64-autovec-preference=1 -fdump-tree-vect-details" } */
+
+void
+f (float x[restrict][100], float y[restrict][100])
+{
+  for (int i = 0; i < 100; ++i)
+    {
+      x[0][i] = y[0][i] * y[1][i] - y[3][i] * y[4][i];
+      x[1][i] = y[1][i] * y[2][i] - y[3][i] * y[4][i];
+    }
+}
+
+/* { dg-final { scan-tree-dump {_[0-9]+ - _[0-9]+ 1 times vector_stmt costs 2 } "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector_stmt costs 0 } "vect" } } */
+/* { dg-final { scan-tree-dump {_[0-9]+ - _[0-9]+ 1 times scalar_stmt costs 0 } "vect" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/neoverse_v1_3.c b/gcc/testsuite/gcc.target/aarch64/neoverse_v1_3.c
new file mode 100644
index 00000000000..de31fc13b28
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/neoverse_v1_3.c
@@ -0,0 +1,14 @@
+/* { dg-options "-O2 -mcpu=neoverse-v1 --param aarch64-autovec-preference=2 -fdump-tree-vect-details" } */
+
+void
+f (float x[restrict][100], float y[restrict][100])
+{
+  for (int i = 0; i < 100; ++i)
+    {
+      x[0][i] = y[0][i] * y[1][i] - y[3][i] * y[4][i];
+      x[1][i] = y[1][i] * y[2][i] - y[3][i] * y[4][i];
+    }
+}
+
+/* { dg-final { scan-tree-dump {_[0-9]+ - _[0-9]+ 1 times vector_stmt costs 0 } "vect" } } */
+/* { dg-final { scan-tree-dump {_[0-9]+ - _[0-9]+ 1 times scalar_stmt costs 0 } "vect" } } */