From patchwork Fri Sep  2 10:02:25 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford <richard.sandiford@arm.com>
X-Patchwork-Id: 926
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, rguenther@suse.de,
 richard.sandiford@arm.com
Subject: [PATCH] vect: Use better fallback costs in layout subpass
Date: Fri, 02 Sep 2022 11:02:25 +0100
Message-ID: <mpt8rn2ccwe.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-Patchwork-Original-From: Richard Sandiford via Gcc-patches
 <gcc-patches@gcc.gnu.org>
From: Richard Sandiford <richard.sandiford@arm.com>
Reply-To: Richard Sandiford <richard.sandiford@arm.com>
Cc: rguenther@suse.de
Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org
Sender: "Gcc-patches" <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org>
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1742851792079683285?=
X-GMAIL-MSGID: =?utf-8?q?1742851792079683285?=

vect_optimize_slp_pass always treats the starting layout as valid,
to avoid having to "optimise" when every possible choice is invalid.
However, it gives the starting layout a high cost when the target
seems likely to reject it, in the hope of steering the pass towards
other (valid) layouts.

The testcase for PR106787 showed that this heuristic was flawed:
it triggered even when the number of input lanes differs from the
number of output lanes, in which case the node cannot realistically
support any other layout and will probably be rejected later and
rebuilt from scalars.  Picking such a high cost could also make the
costs of loop-invariant nodes overwhelm the costs of inner-loop
nodes.
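
As a rough illustration of the second problem, the sketch below assumes
(this is a simplification, not GCC's actual cost model) that a node's
contribution is its layout cost multiplied by the number of times its
block executes.  The helper weighted_cost and the three constants are
hypothetical names; the 100 iterations match the loop in the new test.

```cpp
// Hypothetical model, not GCC's real cost function: a node's total
// contribution is its per-execution cost times its execution count.
constexpr long weighted_cost (long executions, int cost)
{
  return executions * cost;
}

// A loop-invariant node runs once; an inner-loop node runs once per
// iteration (100 iterations assumed here).
constexpr long invariant_old = weighted_cost (1, 100);  // old fallback of 100
constexpr long invariant_new = weighted_cost (1, 1);    // new fallback of 1
constexpr long inner_permute = weighted_cost (100, 1);  // one permute per iteration

// Under these assumptions the old fallback on a single invariant node
// already cancels out a permute in every loop iteration.
static_assert (invariant_old >= inner_permute, "old fallback rivals the loop");
static_assert (invariant_new < inner_permute, "new fallback stays in proportion");
```

Under this model, one unsupported loop-invariant node at cost 100 weighs
as much as an extra permute in every iteration, so a few such nodes
could outvote the inner loop.  With a fallback of 1 they cannot.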

This patch makes the costing less aggressive by (a) restricting the
fallback cost to N-to-N permutations and (b) reducing that cost from
an arbitrary 100 to 1, the maximum cost of a permute.
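
The decision for an unsupported VEC_PERM_EXPR node can be sketched as
follows.  This is a standalone simplification rather than the code in
the patch: unsupported_permute_cost and its plain integer parameters
are hypothetical stand-ins for the SLP node queries, and it only covers
the case where the target has already rejected the permutation.

```cpp
// Simplified, hypothetical stand-in for the "target rejected the
// permutation" branch of internal_node_cost.  Returns -1 for an
// invalid layout combination, otherwise a nonnegative cost.
constexpr int fallback_cost = 1;

constexpr int
unsupported_permute_cost (int in_layout_i, unsigned int out_layout_i,
			  unsigned int node_lanes, unsigned int child_lanes)
{
  // Only the starting layout (layout 0 on both sides) is always valid.
  return (in_layout_i == 0 && out_layout_i == 0)
	 // N-to-N: another layout might work, so penalise layout 0.
	 // Otherwise assume the node is later rebuilt from scalars.
	 ? (node_lanes == child_lanes ? fallback_cost : 0)
	 : -1;
}
```

For example, a 4-lane node with a 4-lane child gets cost 1 for layout 0,
a 4-lane node with an 8-lane child gets cost 0, and any nonzero layout
remains invalid.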

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
	* tree-vect-slp.cc (vect_optimize_slp_pass::internal_node_cost):
	Reduce the fallback cost to 1.  Only use it if the number of
	input lanes is equal to the number of output lanes.

gcc/testsuite/
	* gcc.dg/vect/bb-slp-layout-20.c: New test.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-layout-20.c | 33 ++++++++++++++++
 gcc/tree-vect-slp.cc                         | 40 +++++++++++++++-----
 2 files changed, 63 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-layout-20.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-layout-20.c b/gcc/testsuite/gcc.dg/vect/bb-slp-layout-20.c
new file mode 100644
index 00000000000..ed7816b3f7b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-layout-20.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fno-tree-loop-vectorize" } */
+
+extern int a[][4], b[][4], c[][4], d[4], e[4];
+void f()
+{
+  int t0 = a[0][3];
+  int t1 = a[1][3];
+  int t2 = a[2][3];
+  int t3 = a[3][3];
+  int a0 = 0, a1 = 0, a2 = 0, a3 = 0, b0 = 0, b1 = 0, b2 = 0, b3 = 0;
+  for (int i = 0; i < 400; i += 4)
+    {
+      a0 += b[i][3] * t0;
+      a1 += b[i][2] * t1;
+      a2 += b[i][1] * t2;
+      a3 += b[i][0] * t3;
+      b0 += c[i][3] * t0;
+      b1 += c[i][2] * t1;
+      b2 += c[i][1] * t2;
+      b3 += c[i][0] * t3;
+    }
+  d[0] = a0;
+  d[1] = a1;
+  d[2] = a2;
+  d[3] = a3;
+  e[0] = b0;
+  e[1] = b1;
+  e[2] = b2;
+  e[3] = b3;
+}
+
+/* { dg-final { scan-tree-dump-times "add new stmt: \[^\\n\\r\]* = VEC_PERM_EXPR" 3 "slp1" { target { vect_int_mult && vect_perm } } } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 59ec66a6f96..b10f69da133 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -4436,18 +4436,19 @@ change_vec_perm_layout (slp_tree node, lane_permutation_t &perm,
 
    IN_LAYOUT_I has no meaning for other types of node.
 
-   Keeping the node as-is is always valid.  If the target doesn't appear to
-   support the node as-is then layout 0 has a high and arbitrary cost instead
-   of being invalid.  On the one hand, this ensures that every node has at
-   least one valid layout, avoiding what would otherwise be an awkward
-   special case.  On the other, it still encourages the pass to change
-   an invalid pre-existing layout choice into a valid one.  */
+   Keeping the node as-is is always valid.  If the target doesn't appear
+   to support the node as-is, but might realistically support other layouts,
+   then layout 0 instead has the cost of a worst-case permutation.  On the
+   one hand, this ensures that every node has at least one valid layout,
+   avoiding what would otherwise be an awkward special case.  On the other,
+   it still encourages the pass to change an invalid pre-existing layout
+   choice into a valid one.  */
 
 int
 vect_optimize_slp_pass::internal_node_cost (slp_tree node, int in_layout_i,
 					    unsigned int out_layout_i)
 {
-  const int fallback_cost = 100;
+  const int fallback_cost = 1;
 
   if (SLP_TREE_CODE (node) == VEC_PERM_EXPR)
     {
@@ -4457,8 +4458,9 @@ vect_optimize_slp_pass::internal_node_cost (slp_tree node, int in_layout_i,
       /* Check that the child nodes support the chosen layout.  Checking
 	 the first child is enough, since any second child would have the
 	 same shape.  */
+      auto first_child = SLP_TREE_CHILDREN (node)[0];
       if (in_layout_i > 0
-	  && !is_compatible_layout (SLP_TREE_CHILDREN (node)[0], in_layout_i))
+	  && !is_compatible_layout (first_child, in_layout_i))
 	return -1;
 
       change_vec_perm_layout (node, tmp_perm, in_layout_i, out_layout_i);
@@ -4469,7 +4471,15 @@ vect_optimize_slp_pass::internal_node_cost (slp_tree node, int in_layout_i,
       if (count < 0)
 	{
 	  if (in_layout_i == 0 && out_layout_i == 0)
-	    return fallback_cost;
+	    {
+	      /* Use the fallback cost if the node could in principle support
+		 some nonzero layout for both the inputs and the outputs.
+		 Otherwise assume that the node will be rejected later
+		 and rebuilt from scalars.  */
+	      if (SLP_TREE_LANES (node) == SLP_TREE_LANES (first_child))
+		return fallback_cost;
+	      return 0;
+	    }
 	  return -1;
 	}
 
@@ -4498,8 +4508,18 @@ vect_optimize_slp_pass::internal_node_cost (slp_tree node, int in_layout_i,
       if (!vect_transform_slp_perm_load_1 (m_vinfo, node, tmp_perm, vNULL,
 					   nullptr, vf, true, false, &n_perms))
 	{
+	  auto rep = SLP_TREE_REPRESENTATIVE (node);
 	  if (out_layout_i == 0)
-	    return fallback_cost;
+	    {
+	      /* Use the fallback cost if the load is an N-to-N permutation.
+		 Otherwise assume that the node will be rejected later
+		 and rebuilt from scalars.  */
+	      if (STMT_VINFO_GROUPED_ACCESS (rep)
+		  && (DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (rep))
+		      == SLP_TREE_LANES (node)))
+		return fallback_cost;
+	      return 0;
+	    }
 	  return -1;
 	}