From patchwork Tue Jun 13 02:03:29 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Kewen.Lin" <linkw@linux.ibm.com>
X-Patchwork-Id: 107021
To: gcc-patches@gcc.gnu.org
Cc: richard.guenther@gmail.com, richard.sandiford@arm.com,
 segher@kernel.crashing.org, bergner@linux.ibm.com, ubizjak@gmail.com,
 hongtao.liu@intel.com
Subject: [PATCH 8/9] vect: Adjust vectorizable_load costing on
 VMAT_CONTIGUOUS_PERMUTE
Date: Mon, 12 Jun 2023 21:03:29 -0500
Message-Id: 
 <216bf6e61d4fe2caa6b87ae1e5c8e15b6d31c409.1686573640.git.linkw@linux.ibm.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <cover.1686573640.git.linkw@linux.ibm.com>
References: <cover.1686573640.git.linkw@linux.ibm.com>
MIME-Version: 1.0
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-Patchwork-Original-From: Kewen Lin via Gcc-patches <gcc-patches@gcc.gnu.org>
From: "Kewen.Lin" <linkw@linux.ibm.com>
Reply-To: Kewen Lin <linkw@linux.ibm.com>
Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org
Sender: "Gcc-patches" <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org>
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1768551413403506804?=
X-GMAIL-MSGID: =?utf-8?q?1768551413403506804?=

This patch adjusts the cost handling for
VMAT_CONTIGUOUS_PERMUTE in function vectorizable_load.  We
no longer call function vect_model_load_cost for it.

As the affected test case gcc.target/i386/pr70021.c shows,
the previous costing can under-cost the total generated
vector loads: for VMAT_CONTIGUOUS_PERMUTE, function
vect_model_load_cost doesn't consider the group size, which
is used as vec_num during the transformation.

This patch makes the count of vector loads in costing
consistent with what we generate during the transformation.
To be more specific, for the given test case, the memory
access b[i_20] was costed as 2 vector loads before; with
this patch it is costed as 8 instead, which matches the
final count of generated vector loads based on b.  This
costing change makes the cost model analysis conclude that
it's not profitable to vectorize the first loop, so this
patch adjusts the test case to use -fno-vect-cost-model.
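
As a rough illustration of the 2 -> 8 change, the sketch below
contrasts the two ways of counting.  The helper names are
hypothetical and are not part of GCC, and ncopies = 2 with
vec_num = 4 is only one factorization consistent with the
numbers quoted above; the actual values for pr70021.c are not
stated in this message.

```cpp
#include <cassert>

// Hypothetical helper: the old costing counted one vector load per
// copy and ignored the group size.
unsigned old_costed_loads (unsigned ncopies)
{
  return ncopies;
}

// Hypothetical helper: the new costing counts one vector load for
// each of the vec_num iterations within every copy, as the
// transformation does.
unsigned new_costed_loads (unsigned ncopies, unsigned vec_num)
{
  return ncopies * vec_num;
}

// Assumed values only: ncopies = 2 and vec_num = 4 reproduce the
// 2 and 8 mentioned for b[i_20].
void check_assumed_example ()
{
  assert (old_costed_loads (2) == 2);
  assert (new_costed_loads (2, 4) == 8);
}
```

Under these assumed values the old count misses a factor equal
to the group size, which is the under-costing described above.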

Note that this test case also exposes something we can
improve further: although the number of vector permutations
we cost is consistent with the number we generate, DCE can
later optimize some unused permutations out.  It would be
good if we could predict that and generate only the
necessary permutations.
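
For reference, the permutation count used in costing follows
the formula ceil_log2 (group_size) * group_size that appears in
the diff.  A minimal sketch of that arithmetic is below; the
function names are hypothetical stand-ins rather than GCC's own
helpers, and multiplying by ncopies for a total assumes the
per-copy cost is recorded once for each copy.

```cpp
#include <cassert>

// Stand-in for a ceil_log2 helper: smallest k with (1u << k) >= x.
// Only meant for small positive x such as a group size.
unsigned ceil_log2_sketch (unsigned x)
{
  unsigned k = 0;
  while ((1u << k) < x)
    ++k;
  return k;
}

// Permutes costed for one copy of a grouped load, following the
// formula ceil_log2 (group_size) * group_size.
unsigned permutes_per_copy (unsigned group_size)
{
  return ceil_log2_sketch (group_size) * group_size;
}

// Assumption: the per-copy count is recorded once for each copy.
unsigned permutes_total (unsigned ncopies, unsigned group_size)
{
  return ncopies * permutes_per_copy (group_size);
}

// Example with an assumed group size of 4.
void check_permute_example ()
{
  assert (permutes_per_copy (4) == 8);
}
```

Any permutations that DCE later removes are still included in
this count, which is the gap the paragraph above points out.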

gcc/ChangeLog:

	* tree-vect-stmts.cc (vect_model_load_cost): Assert this function only
	handles memory_access_type VMAT_CONTIGUOUS, remove some
	VMAT_CONTIGUOUS_PERMUTE related handling.
	(vectorizable_load): Adjust the cost handling on VMAT_CONTIGUOUS_PERMUTE
	without calling vect_model_load_cost.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr70021.c: Adjust with -fno-vect-cost-model.
---
 gcc/testsuite/gcc.target/i386/pr70021.c |  2 +-
 gcc/tree-vect-stmts.cc                  | 88 ++++++++++++++-----------
 2 files changed, 51 insertions(+), 39 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr70021.c b/gcc/testsuite/gcc.target/i386/pr70021.c
index 6562c0f2bd0..d509583601e 100644
--- a/gcc/testsuite/gcc.target/i386/pr70021.c
+++ b/gcc/testsuite/gcc.target/i386/pr70021.c
@@ -1,7 +1,7 @@
 /* PR target/70021 */
 /* { dg-do run } */
 /* { dg-require-effective-target avx2 } */
-/* { dg-options "-O2 -ftree-vectorize -mavx2 -fdump-tree-vect-details -mtune=skylake" } */
+/* { dg-options "-O2 -ftree-vectorize -mavx2 -fdump-tree-vect-details -mtune=skylake -fno-vect-cost-model" } */
 
 #include "avx2-check.h"
 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 7f8d9db5363..e7a97dbe05d 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1134,8 +1134,7 @@ vect_model_load_cost (vec_info *vinfo,
 		      slp_tree slp_node,
 		      stmt_vector_for_cost *cost_vec)
 {
-  gcc_assert (memory_access_type == VMAT_CONTIGUOUS
-	      || memory_access_type == VMAT_CONTIGUOUS_PERMUTE);
+  gcc_assert (memory_access_type == VMAT_CONTIGUOUS);
 
   unsigned int inside_cost = 0, prologue_cost = 0;
   bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
@@ -1174,26 +1173,6 @@ vect_model_load_cost (vec_info *vinfo,
      once per group anyhow.  */
   bool first_stmt_p = (first_stmt_info == stmt_info);
 
-  /* We assume that the cost of a single load-lanes instruction is
-     equivalent to the cost of DR_GROUP_SIZE separate loads.  If a grouped
-     access is instead being provided by a load-and-permute operation,
-     include the cost of the permutes.  */
-  if (first_stmt_p
-      && memory_access_type == VMAT_CONTIGUOUS_PERMUTE)
-    {
-      /* Uses an even and odd extract operations or shuffle operations
-	 for each needed permute.  */
-      int group_size = DR_GROUP_SIZE (first_stmt_info);
-      int nstmts = ncopies * ceil_log2 (group_size) * group_size;
-      inside_cost += record_stmt_cost (cost_vec, nstmts, vec_perm,
-				       stmt_info, 0, vect_body);
-
-      if (dump_enabled_p ())
-        dump_printf_loc (MSG_NOTE, vect_location,
-                         "vect_model_load_cost: strided group_size = %d .\n",
-                         group_size);
-    }
-
   vect_get_load_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
 		      misalignment, first_stmt_p, &inside_cost, &prologue_cost,
 		      cost_vec, cost_vec, true);
@@ -10652,11 +10631,22 @@ vectorizable_load (vec_info *vinfo,
 		 alignment support schemes.  */
 	      if (costing_p)
 		{
-		  if (memory_access_type == VMAT_CONTIGUOUS_REVERSE)
+		  /* For VMAT_CONTIGUOUS_PERMUTE if it's grouped load, we
+		     only need to take care of the first stmt, whose
+		     stmt_info is first_stmt_info, vec_num iterating on it
+		     will cover the cost for the remaining, it's consistent
+		     with transforming.  For the prologue cost for realign,
+		     we only need to count it once for the whole group.  */
+		  bool first_stmt_info_p = first_stmt_info == stmt_info;
+		  bool add_realign_cost = first_stmt_info_p && i == 0;
+		  if (memory_access_type == VMAT_CONTIGUOUS_REVERSE
+		      || (memory_access_type == VMAT_CONTIGUOUS_PERMUTE
+			  && (!grouped_load || first_stmt_info_p)))
 		    vect_get_load_cost (vinfo, stmt_info, 1,
 					alignment_support_scheme, misalignment,
-					false, &inside_cost, &prologue_cost,
-					cost_vec, cost_vec, true);
+					add_realign_cost, &inside_cost,
+					&prologue_cost, cost_vec, cost_vec,
+					true);
 		}
 	      else
 		{
@@ -10774,8 +10764,7 @@ vectorizable_load (vec_info *vinfo,
 	     ???  This is a hack to prevent compile-time issues as seen
 	     in PR101120 and friends.  */
 	  if (costing_p
-	      && memory_access_type != VMAT_CONTIGUOUS
-	      && memory_access_type != VMAT_CONTIGUOUS_PERMUTE)
+	      && memory_access_type != VMAT_CONTIGUOUS)
 	    {
 	      vect_transform_slp_perm_load (vinfo, slp_node, vNULL, nullptr, vf,
 					    true, &n_perms, nullptr);
@@ -10790,20 +10779,44 @@ vectorizable_load (vec_info *vinfo,
 	      gcc_assert (ok);
 	    }
 	}
-      else if (!costing_p)
+      else
         {
           if (grouped_load)
   	    {
 	      if (memory_access_type != VMAT_LOAD_STORE_LANES)
-		vect_transform_grouped_load (vinfo, stmt_info, dr_chain,
-					     group_size, gsi);
-	      *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
-	    }
-          else
-	    {
-	      STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
+		{
+		  gcc_assert (memory_access_type == VMAT_CONTIGUOUS_PERMUTE);
+		  /* We assume that the cost of a single load-lanes instruction
+		     is equivalent to the cost of DR_GROUP_SIZE separate loads.
+		     If a grouped access is instead being provided by a
+		     load-and-permute operation, include the cost of the
+		     permutes.  */
+		  if (costing_p && first_stmt_info == stmt_info)
+		    {
+		      /* Uses an even and odd extract operations or shuffle
+			 operations for each needed permute.  */
+		      int group_size = DR_GROUP_SIZE (first_stmt_info);
+		      int nstmts = ceil_log2 (group_size) * group_size;
+		      inside_cost
+			+= record_stmt_cost (cost_vec, nstmts, vec_perm,
+					     stmt_info, 0, vect_body);
+
+		      if (dump_enabled_p ())
+			dump_printf_loc (
+			  MSG_NOTE, vect_location,
+			  "vect_model_load_cost: strided group_size = %d .\n",
+			  group_size);
+		    }
+		  else if (!costing_p)
+		    vect_transform_grouped_load (vinfo, stmt_info, dr_chain,
+						 group_size, gsi);
+		}
+	      if (!costing_p)
+		*vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
 	    }
-        }
+	  else if (!costing_p)
+	    STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
+	}
       dr_chain.release ();
     }
   if (!slp && !costing_p)
@@ -10814,8 +10827,7 @@ vectorizable_load (vec_info *vinfo,
       gcc_assert (memory_access_type != VMAT_INVARIANT
 		  && memory_access_type != VMAT_ELEMENTWISE
 		  && memory_access_type != VMAT_STRIDED_SLP);
-      if (memory_access_type != VMAT_CONTIGUOUS
-	  && memory_access_type != VMAT_CONTIGUOUS_PERMUTE)
+      if (memory_access_type != VMAT_CONTIGUOUS)
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_NOTE, vect_location,