From patchwork Thu Sep 14 03:11:56 2023
X-Patchwork-Submitter: "Kewen.Lin"
X-Patchwork-Id: 139277
From: "Kewen.Lin"
Reply-To: Kewen Lin
To: gcc-patches@gcc.gnu.org
Cc: richard.guenther@gmail.com, richard.sandiford@arm.com
Subject: [PATCH 07/10] vect: Adjust vectorizable_store costing on VMAT_CONTIGUOUS_PERMUTE
Date: Wed, 13 Sep 2023 22:11:56 -0500
Message-Id: <03074b183ea6c016691e6174a331de1443bdf326.1694657494.git.linkw@linux.ibm.com>

This patch adjusts the cost handling on VMAT_CONTIGUOUS_PERMUTE in
function vectorizable_store.
We no longer call vect_model_store_cost for it.  This is the
interleaving-store case, so costing skips every statement except
first_stmt_info and accounts for the whole group when costing
first_stmt_info.  This patch is not intended to make any functional
change.

gcc/ChangeLog:

	* tree-vect-stmts.cc (vect_model_store_cost): Assert it will never
	get VMAT_CONTIGUOUS_PERMUTE and remove VMAT_CONTIGUOUS_PERMUTE
	related handling.
	(vectorizable_store): Adjust the cost handling on
	VMAT_CONTIGUOUS_PERMUTE without calling vect_model_store_cost.
---
 gcc/tree-vect-stmts.cc | 128 ++++++++++++++++++++++++-----------------
 1 file changed, 74 insertions(+), 54 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index fbd16b8a487..e3ba8077091 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -967,10 +967,10 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies,
   gcc_assert (memory_access_type != VMAT_GATHER_SCATTER
               && memory_access_type != VMAT_ELEMENTWISE
               && memory_access_type != VMAT_STRIDED_SLP
-              && memory_access_type != VMAT_LOAD_STORE_LANES);
+              && memory_access_type != VMAT_LOAD_STORE_LANES
+              && memory_access_type != VMAT_CONTIGUOUS_PERMUTE);
+
   unsigned int inside_cost = 0, prologue_cost = 0;
-  stmt_vec_info first_stmt_info = stmt_info;
-  bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
 
   /* ??? Somehow we need to fix this at the callers.  */
   if (slp_node)
@@ -983,35 +983,6 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies,
                                            stmt_info, 0, vect_prologue);
     }
 
-  /* Grouped stores update all elements in the group at once,
-     so we want the DR for the first statement.  */
-  if (!slp_node && grouped_access_p)
-    first_stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
-
-  /* True if we should include any once-per-group costs as well as
-     the cost of the statement itself.  For SLP we only get called
-     once per group anyhow.  */
-  bool first_stmt_p = (first_stmt_info == stmt_info);
-
-  /* We assume that the cost of a single store-lanes instruction is
-     equivalent to the cost of DR_GROUP_SIZE separate stores.  If a grouped
-     access is instead being provided by a permute-and-store operation,
-     include the cost of the permutes.  */
-  if (first_stmt_p
-      && memory_access_type == VMAT_CONTIGUOUS_PERMUTE)
-    {
-      /* Uses a high and low interleave or shuffle operations for each
-         needed permute.  */
-      int group_size = DR_GROUP_SIZE (first_stmt_info);
-      int nstmts = ncopies * ceil_log2 (group_size) * group_size;
-      inside_cost = record_stmt_cost (cost_vec, nstmts, vec_perm,
-                                      stmt_info, 0, vect_body);
-
-      if (dump_enabled_p ())
-        dump_printf_loc (MSG_NOTE, vect_location,
-                         "vect_model_store_cost: strided group_size = %d .\n",
-                         group_size);
-    }
 
   /* Costs of the stores.  */
   vect_get_store_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
@@ -8408,9 +8379,7 @@ vectorizable_store (vec_info *vinfo,
          costing, use the first one instead.  */
       if (grouped_store
           && !slp
-          && first_stmt_info != stmt_info
-          && (memory_access_type == VMAT_ELEMENTWISE
-              || memory_access_type == VMAT_LOAD_STORE_LANES))
+          && first_stmt_info != stmt_info)
         return true;
     }
   gcc_assert (memory_access_type == STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info));
@@ -9254,14 +9223,15 @@ vectorizable_store (vec_info *vinfo,
       return true;
     }
 
+  unsigned inside_cost = 0, prologue_cost = 0;
   auto_vec result_chain (group_size);
   auto_vec vec_oprnds;
   for (j = 0; j < ncopies; j++)
     {
       gimple *new_stmt;
-      if (j == 0 && !costing_p)
+      if (j == 0)
         {
-          if (slp)
+          if (slp && !costing_p)
             {
               /* Get vectorized arguments for SLP_NODE.  */
               vect_get_vec_defs (vinfo, stmt_info, slp_node, 1, op,
@@ -9287,13 +9257,20 @@ vectorizable_store (vec_info *vinfo,
                      that there is no interleaving, DR_GROUP_SIZE is 1,
                      and only one iteration of the loop will be executed.  */
                   op = vect_get_store_rhs (next_stmt_info);
-                  vect_get_vec_defs_for_operand (vinfo, next_stmt_info, ncopies,
-                                                 op, gvec_oprnds[i]);
-                  vec_oprnd = (*gvec_oprnds[i])[0];
-                  dr_chain.quick_push (vec_oprnd);
+                  if (costing_p
+                      && memory_access_type == VMAT_CONTIGUOUS_PERMUTE)
+                    update_prologue_cost (&prologue_cost, op);
+                  else if (!costing_p)
+                    {
+                      vect_get_vec_defs_for_operand (vinfo, next_stmt_info,
+                                                     ncopies, op,
+                                                     gvec_oprnds[i]);
+                      vec_oprnd = (*gvec_oprnds[i])[0];
+                      dr_chain.quick_push (vec_oprnd);
+                    }
                   next_stmt_info = DR_GROUP_NEXT_ELEMENT (next_stmt_info);
                 }
-              if (mask)
+              if (mask && !costing_p)
                 {
                   vect_get_vec_defs_for_operand (vinfo, stmt_info, ncopies,
                                                  mask, &vec_masks,
@@ -9303,11 +9280,13 @@ vectorizable_store (vec_info *vinfo,
             }
 
           /* We should have catched mismatched types earlier.  */
-          gcc_assert (useless_type_conversion_p (vectype,
-                                                 TREE_TYPE (vec_oprnd)));
+          gcc_assert (costing_p
+                      || useless_type_conversion_p (vectype,
+                                                    TREE_TYPE (vec_oprnd)));
           bool simd_lane_access_p
             = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) != 0;
-          if (simd_lane_access_p
+          if (!costing_p
+              && simd_lane_access_p
               && !loop_masks
               && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
               && VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr), 0))
@@ -9319,7 +9298,7 @@ vectorizable_store (vec_info *vinfo,
               dataref_ptr = unshare_expr (DR_BASE_ADDRESS (first_dr_info->dr));
               dataref_offset = build_int_cst (ref_type, 0);
             }
-          else
+          else if (!costing_p)
             dataref_ptr
               = vect_create_data_ref_ptr (vinfo, first_stmt_info, aggr_type,
                                           simd_lane_access_p ? loop : NULL,
@@ -9347,16 +9326,46 @@ vectorizable_store (vec_info *vinfo,
         }
 
       new_stmt = NULL;
-      if (!costing_p && grouped_store)
-        /* Permute.  */
-        vect_permute_store_chain (vinfo, dr_chain, group_size, stmt_info, gsi,
-                                  &result_chain);
+      if (grouped_store)
+        {
+          /* Permute.  */
+          gcc_assert (memory_access_type == VMAT_CONTIGUOUS_PERMUTE);
+          if (costing_p)
+            {
+              int group_size = DR_GROUP_SIZE (first_stmt_info);
+              int nstmts = ceil_log2 (group_size) * group_size;
+              inside_cost += record_stmt_cost (cost_vec, nstmts, vec_perm,
+                                               stmt_info, 0, vect_body);
+              if (dump_enabled_p ())
+                dump_printf_loc (MSG_NOTE, vect_location,
+                                 "vect_model_store_cost: "
+                                 "strided group_size = %d .\n",
+                                 group_size);
+            }
+          else
+            vect_permute_store_chain (vinfo, dr_chain, group_size, stmt_info,
+                                      gsi, &result_chain);
+        }
 
       stmt_vec_info next_stmt_info = first_stmt_info;
       for (i = 0; i < vec_num; i++)
         {
           if (costing_p)
-            continue;
+            {
+              if (memory_access_type == VMAT_CONTIGUOUS_PERMUTE)
+                vect_get_store_cost (vinfo, stmt_info, 1,
+                                     alignment_support_scheme, misalignment,
+                                     &inside_cost, cost_vec);
+
+              if (!slp)
+                {
+                  next_stmt_info = DR_GROUP_NEXT_ELEMENT (next_stmt_info);
+                  if (!next_stmt_info)
+                    break;
+                }
+
+              continue;
+            }
           unsigned misalign;
           unsigned HOST_WIDE_INT align;
 
@@ -9540,9 +9549,20 @@ vectorizable_store (vec_info *vinfo,
     }
 
   if (costing_p)
-    vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type,
-                           alignment_support_scheme, misalignment, vls_type,
-                           slp_node, cost_vec);
+    {
+      if (memory_access_type == VMAT_CONTIGUOUS_PERMUTE)
+        {
+          if (dump_enabled_p ())
+            dump_printf_loc (MSG_NOTE, vect_location,
+                             "vect_model_store_cost: inside_cost = %d, "
+                             "prologue_cost = %d .\n",
+                             inside_cost, prologue_cost);
+        }
+      else
+        vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type,
+                               alignment_support_scheme, misalignment, vls_type,
+                               slp_node, cost_vec);
+    }
 
   return true;
 }
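
For readers checking the "no functional change" claim, here is a minimal standalone sketch of the permute and store counting.  It is not GCC code: ceil_log2_sketch, permutes_old, permutes_new and stores_new are hypothetical names, and ceil_log2_sketch only stands in for GCC's ceil_log2.  It assumes the removed code recorded ncopies * ceil_log2 (group_size) * group_size permutes once for the first statement, that the new code records ceil_log2 (group_size) * group_size permutes on each of the ncopies iterations, and that in the non-SLP case the inner loop visits each group member once per copy.

```cpp
#include <cassert>

// Stand-in for GCC's ceil_log2 (an assumption for this sketch):
// the smallest k such that (1 << k) >= x.
unsigned ceil_log2_sketch (unsigned x)
{
  assert (x >= 1 && x <= (1u << 31));
  unsigned k = 0;
  while ((1u << k) < x)
    ++k;
  return k;
}

// Removed scheme: one record for the whole group, scaled by ncopies.
unsigned permutes_old (unsigned ncopies, unsigned group_size)
{
  return ncopies * ceil_log2_sketch (group_size) * group_size;
}

// New scheme: one record per copy, accumulated across the j loop.
unsigned permutes_new (unsigned ncopies, unsigned group_size)
{
  unsigned total = 0;
  for (unsigned j = 0; j < ncopies; ++j)
    total += ceil_log2_sketch (group_size) * group_size;
  return total;
}

// New scheme for the stores: one store counted per group member per copy
// (assumes the non-SLP inner loop runs group_size times).
unsigned stores_new (unsigned ncopies, unsigned group_size)
{
  unsigned total = 0;
  for (unsigned j = 0; j < ncopies; ++j)
    for (unsigned i = 0; i < group_size; ++i)
      total += 1;
  return total;
}
```

Under these assumptions, a group of 4 stores with ncopies = 2 gives 2 * 2 * 4 = 16 permutes in both schemes and 8 vector stores, so only the place where the cost is recorded changes.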