From: "Kewen.Lin"
Date: Mon, 3 Jul 2023 11:01:04 +0800
Subject: [PATCH 5/9 v2] vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER
To: gcc-patches@gcc.gnu.org
Cc: richard.guenther@gmail.com, richard.sandiford@arm.com, segher@kernel.crashing.org, bergner@linux.ibm.com
X-Patchwork-Submitter: "Kewen.Lin"
X-Patchwork-Id: 115174

This is v2, rebased onto the latest trunk.

=====

This patch adjusts the cost handling for VMAT_GATHER_SCATTER in
vectorizable_load, so that vect_model_load_cost is no longer called
for it.  It mainly covers gather loads with IFN and emulated gather
loads, and it follows the existing handling in vect_model_load_cost.
This patch shouldn't cause any functional changes.

gcc/ChangeLog:

	* tree-vect-stmts.cc (vectorizable_load): Adjust the cost handling
	on VMAT_GATHER_SCATTER without calling vect_model_load_cost.
	(vect_model_load_cost): Adjust the assertion on VMAT_GATHER_SCATTER,
	remove VMAT_GATHER_SCATTER related handlings and the related parameter
	gs_info.
---
 gcc/tree-vect-stmts.cc | 124 +++++++++++++++++++++++++----------------
 1 file changed, 76 insertions(+), 48 deletions(-)

--
2.31.1

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 7d8e72bda67..1ae917db627 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1132,11 +1132,10 @@ vect_model_load_cost (vec_info *vinfo,
                       vect_memory_access_type memory_access_type,
                       dr_alignment_support alignment_support_scheme,
                       int misalignment,
-                      gather_scatter_info *gs_info,
                       slp_tree slp_node,
                       stmt_vector_for_cost *cost_vec)
 {
-  gcc_assert ((memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl)
+  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER
               && memory_access_type != VMAT_INVARIANT
               && memory_access_type != VMAT_ELEMENTWISE
               && memory_access_type != VMAT_STRIDED_SLP);
@@ -1225,35 +1224,9 @@ vect_model_load_cost (vec_info *vinfo,
                          group_size);
     }

-  /* The loads themselves.  */
-  if (memory_access_type == VMAT_GATHER_SCATTER)
-    {
-      tree vectype = STMT_VINFO_VECTYPE (stmt_info);
-      unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
-      if (memory_access_type == VMAT_GATHER_SCATTER
-          && gs_info->ifn == IFN_LAST && !gs_info->decl)
-        /* For emulated gathers N offset vector element extracts
-           (we assume the scalar scaling and ptr + offset add is consumed by
-           the load).  */
-        inside_cost += record_stmt_cost (cost_vec, ncopies * assumed_nunits,
-                                         vec_to_scalar, stmt_info, 0,
-                                         vect_body);
-      /* N scalar loads plus gathering them into a vector.  */
-      inside_cost += record_stmt_cost (cost_vec,
-                                       ncopies * assumed_nunits,
-                                       scalar_load, stmt_info, 0, vect_body);
-    }
-  else
-    vect_get_load_cost (vinfo, stmt_info, ncopies,
-                        alignment_support_scheme, misalignment, first_stmt_p,
-                        &inside_cost, &prologue_cost,
-                        cost_vec, cost_vec, true);
-
-  if (memory_access_type == VMAT_GATHER_SCATTER
-      && gs_info->ifn == IFN_LAST
-      && !gs_info->decl)
-    inside_cost += record_stmt_cost (cost_vec, ncopies, vec_construct,
-                                     stmt_info, 0, vect_body);
+  vect_get_load_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
+                      misalignment, first_stmt_p, &inside_cost, &prologue_cost,
+                      cost_vec, cost_vec, true);

   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location,
@@ -10283,6 +10256,7 @@ vectorizable_load (vec_info *vinfo,
     }
   tree vec_mask = NULL_TREE;
   poly_uint64 group_elt = 0;
+  unsigned int inside_cost = 0;
   for (j = 0; j < ncopies; j++)
     {
       /* 1. Create the vector or array pointer update chain.  */
@@ -10414,23 +10388,26 @@ vectorizable_load (vec_info *vinfo,
           /* Record that VEC_ARRAY is now dead.  */
           vect_clobber_variable (vinfo, stmt_info, gsi, vec_array);
         }
-      else if (!costing_p)
+      else
         {
           for (i = 0; i < vec_num; i++)
             {
               tree final_mask = NULL_TREE;
-              if (loop_masks
-                  && memory_access_type != VMAT_INVARIANT)
-                final_mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
-                                                 vec_num * ncopies,
-                                                 vectype, vec_num * j + i);
-              if (vec_mask)
-                final_mask = prepare_vec_mask (loop_vinfo, mask_vectype,
-                                               final_mask, vec_mask, gsi);
-
-              if (i > 0 && !STMT_VINFO_GATHER_SCATTER_P (stmt_info))
-                dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr,
-                                               gsi, stmt_info, bump);
+              if (!costing_p)
+                {
+                  if (loop_masks && memory_access_type != VMAT_INVARIANT)
+                    final_mask
+                      = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
+                                            vec_num * ncopies, vectype,
+                                            vec_num * j + i);
+                  if (vec_mask)
+                    final_mask = prepare_vec_mask (loop_vinfo, mask_vectype,
+                                                   final_mask, vec_mask, gsi);
+
+                  if (i > 0 && !STMT_VINFO_GATHER_SCATTER_P (stmt_info))
+                    dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr,
+                                                   gsi, stmt_info, bump);
+                }

               /* 2. Create the vector-load in the loop.  */
               switch (alignment_support_scheme)
@@ -10444,6 +10421,16 @@ vectorizable_load (vec_info *vinfo,
                     if (memory_access_type == VMAT_GATHER_SCATTER
                         && gs_info.ifn != IFN_LAST)
                       {
+                        if (costing_p)
+                          {
+                            unsigned int cnunits
+                              = vect_nunits_for_cost (vectype);
+                            inside_cost
+                              = record_stmt_cost (cost_vec, cnunits,
+                                                  scalar_load, stmt_info, 0,
+                                                  vect_body);
+                            goto vec_num_loop_costing_end;
+                          }
                         if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
                           vec_offset = vec_offsets[vec_num * j + i];
                         tree zero = build_zero_cst (vectype);
@@ -10468,6 +10455,25 @@ vectorizable_load (vec_info *vinfo,
                         gcc_assert (!final_mask);
                         unsigned HOST_WIDE_INT const_nunits
                           = nunits.to_constant ();
+                        if (costing_p)
+                          {
+                            /* For emulated gathers N offset vector element
+                               offset add is consumed by the load).  */
+                            inside_cost
+                              = record_stmt_cost (cost_vec, const_nunits,
+                                                  vec_to_scalar, stmt_info, 0,
+                                                  vect_body);
+                            /* N scalar loads plus gathering them into a
+                               vector.  */
+                            inside_cost
+                              = record_stmt_cost (cost_vec, const_nunits,
+                                                  scalar_load, stmt_info, 0,
+                                                  vect_body);
+                            inside_cost
+                              = record_stmt_cost (cost_vec, 1, vec_construct,
+                                                  stmt_info, 0, vect_body);
+                            goto vec_num_loop_costing_end;
+                          }
                         unsigned HOST_WIDE_INT const_offset_nunits
                           = TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype)
                               .to_constant ();
@@ -10520,6 +10526,9 @@ vectorizable_load (vec_info *vinfo,
                         break;
                       }

+                    if (costing_p)
+                      goto vec_num_loop_costing_end;
+
                     align
                       = known_alignment (DR_TARGET_ALIGNMENT (first_dr_info));
                     if (alignment_support_scheme == dr_aligned)
@@ -10734,6 +10743,8 @@ vectorizable_load (vec_info *vinfo,
                   }
                 case dr_explicit_realign:
                   {
+                    if (costing_p)
+                      goto vec_num_loop_costing_end;
                     tree ptr, bump;

                     tree vs = size_int (TYPE_VECTOR_SUBPARTS (vectype));
@@ -10796,6 +10807,8 @@ vectorizable_load (vec_info *vinfo,
                   }
                 case dr_explicit_realign_optimized:
                   {
+                    if (costing_p)
+                      goto vec_num_loop_costing_end;
                     if (TREE_CODE (dataref_ptr) == SSA_NAME)
                       new_temp = copy_ssa_name (dataref_ptr);
                     else
@@ -10892,10 +10905,14 @@ vectorizable_load (vec_info *vinfo,
                                                  gsi, stmt_info, bump);
                   group_elt = 0;
                 }
+vec_num_loop_costing_end:
+              ;
             }
           /* Bump the vector pointer to account for a gap or for excess
              elements loaded for a permuted SLP load.  */
-          if (maybe_ne (group_gap_adj, 0U) && slp_perm)
+          if (!costing_p
+              && maybe_ne (group_gap_adj, 0U)
+              && slp_perm)
             {
               poly_wide_int bump_val
                 = (wi::to_wide (TYPE_SIZE_UNIT (elem_type))
@@ -10944,9 +10961,20 @@ vectorizable_load (vec_info *vinfo,
     *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];

   if (costing_p)
-    vect_model_load_cost (vinfo, stmt_info, ncopies, vf, memory_access_type,
-                          alignment_support_scheme, misalignment, &gs_info,
-                          slp_node, cost_vec);
+    {
+      if (memory_access_type == VMAT_GATHER_SCATTER)
+        {
+          if (dump_enabled_p ())
+            dump_printf_loc (MSG_NOTE, vect_location,
+                             "vect_model_load_cost: inside_cost = %u, "
+                             "prologue_cost = 0 .\n",
+                             inside_cost);
+        }
+      else
+        vect_model_load_cost (vinfo, stmt_info, ncopies, vf, memory_access_type,
+                              alignment_support_scheme, misalignment, slp_node,
+                              cost_vec);
+    }

   return true;
 }
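
For readers following the costing, here is a minimal sketch of the arithmetic that the VMAT_GATHER_SCATTER paths in this patch record for each iteration of the ncopies and vec_num loops. It is not GCC code: CostKind, unit_cost, record_cost and the three cost helpers are hypothetical names, and the unit weights are placeholders rather than any target's real costs. The sketch sums the recorded entries, which is an assumption about what accumulates in cost_vec; it does not model the inside_cost variable, which in this diff is only printed in the dump.

```cpp
#include <cassert>

// Hypothetical stand-ins for the cost kinds named in the patch
// (scalar_load, vec_to_scalar, vec_construct).
enum class CostKind { ScalarLoad, VecToScalar, VecConstruct };

// Placeholder per-statement weights; real values come from the target
// cost hooks and are NOT reproduced here.
constexpr unsigned unit_cost (CostKind kind)
{
  switch (kind)
    {
    case CostKind::ScalarLoad:
      return 1;
    case CostKind::VecToScalar:
      return 1;
    case CostKind::VecConstruct:
      return 1;
    }
  return 0;
}

// Simplified analogue of recording COUNT statements of KIND.
constexpr unsigned record_cost (unsigned count, CostKind kind)
{
  return count * unit_cost (kind);
}

// Gather through an internal function: costed as NUNITS scalar loads.
constexpr unsigned ifn_gather_cost (unsigned nunits)
{
  return record_cost (nunits, CostKind::ScalarLoad);
}

// Emulated gather: NUNITS offset element extracts, NUNITS scalar loads,
// plus one vector construction to assemble the result.
constexpr unsigned emulated_gather_cost (unsigned nunits)
{
  return record_cost (nunits, CostKind::VecToScalar)
         + record_cost (nunits, CostKind::ScalarLoad)
         + record_cost (1, CostKind::VecConstruct);
}

// Total over the two loops: one per-vector cost for each of the
// NCOPIES * VEC_NUM iterations.
constexpr unsigned gather_inside_cost (bool emulated, unsigned nunits,
                                       unsigned ncopies, unsigned vec_num)
{
  return ncopies * vec_num
         * (emulated ? emulated_gather_cost (nunits)
                     : ifn_gather_cost (nunits));
}

// Compile-time checks with the placeholder weights above.
static_assert (ifn_gather_cost (4) == 4, "4 scalar loads");
static_assert (emulated_gather_cost (4) == 9, "4 extracts + 4 loads + 1");
static_assert (gather_inside_cost (true, 4, 2, 1) == 18, "two copies");
```

With ncopies = 2 and vec_num = 1 the emulated case totals the same ncopies * nunits extracts and loads plus ncopies vector constructs that the removed vect_model_load_cost code recorded, which is consistent with the "no functional changes" claim under these assumptions.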