From patchwork Tue Jun 13 02:03:28 2023
X-Patchwork-Submitter: "Kewen.Lin"
X-Patchwork-Id: 107024
To: gcc-patches@gcc.gnu.org
Cc: richard.guenther@gmail.com, richard.sandiford@arm.com,
    segher@kernel.crashing.org, bergner@linux.ibm.com
Subject: [PATCH 7/9] vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_REVERSE
Date: Mon, 12 Jun 2023 21:03:28 -0500
From: "Kewen.Lin"

This patch adjusts the cost handling for VMAT_CONTIGUOUS_REVERSE in
function vectorizable_load; we no longer call function
vect_model_load_cost for it.  As the associated test case shows, this
change stops us from miscounting the vector permutation that this
access type requires.
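
As background, a minimal sketch (hypothetical code, not part of the
patch) of the kind of loop that is classified as
VMAT_CONTIGUOUS_REVERSE: the loads from b are contiguous but walked
with a negative step, so the vectorizer emits an ordinary contiguous
vector load followed by an element-reversing permute, and that permute
is exactly the vec_perm the cost model has to account for.

  /* Hypothetical illustration: b is read contiguously but in reverse
     order, so each vector load of b needs a reversing permute before
     the store into a.  */
  void
  reverse_copy (int *restrict a, int *restrict b, int n)
  {
    for (int i = 0; i < n; ++i)
      a[i] = b[n - 1 - i];
  }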

gcc/ChangeLog:

	* tree-vect-stmts.cc (vect_model_load_cost): Assert it won't get
	VMAT_CONTIGUOUS_REVERSE any more.
	(vectorizable_load): Adjust the costing handling on
	VMAT_CONTIGUOUS_REVERSE without calling vect_model_load_cost.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/costmodel/ppc/costmodel-vect-reversed.c: New test.
---
 .../costmodel/ppc/costmodel-vect-reversed.c   |  22 ++++
 gcc/tree-vect-stmts.cc                        | 109 ++++++++++++------
 2 files changed, 93 insertions(+), 38 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-reversed.c

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-reversed.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-reversed.c
new file mode 100644
index 00000000000..651274be038
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-reversed.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-additional-options "-mvsx" } */
+
+/* Verify we do cost the required vec_perm.  */
+
+int x[1024], y[1024];
+
+void
+foo ()
+{
+  for (int i = 0; i < 512; ++i)
+    {
+      x[2 * i] = y[1023 - (2 * i)];
+      x[2 * i + 1] = y[1023 - (2 * i + 1)];
+    }
+}
+/* The reason why it doesn't check the exact count is that
+   retrying for the epilogue with partial vector capability
+   like Power10 can result in more than 1 vec_perm.  */
+/* { dg-final { scan-tree-dump {\mvec_perm\M} "vect" } } */
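
To make the connection to the scanned vec_perm concrete, here is a
rough hand-written sketch (GNU C vector extensions, not compiler
output; foo_step, v4si, and the 4-lane shape are illustrative
assumptions) of one vectorized iteration of foo above:

  /* Conceptual vector step of foo: x[4*k .. 4*k+3] receives the
     reverse of the contiguous block y[1020-4*k .. 1023-4*k].  */
  typedef int v4si __attribute__ ((vector_size (16)));

  static void
  foo_step (int *x, const int *y, int k)
  {
    v4si v = *(const v4si *) &y[1020 - 4 * k];	/* contiguous load  */
    v4si r = __builtin_shuffle (v, (v4si) { 3, 2, 1, 0 });  /* the vec_perm  */
    *(v4si *) &x[4 * k] = r;
  }

The reversing shuffle is the statement whose cost the patch below now
records directly with record_stmt_cost (..., vec_perm, ...) instead of
going through vect_model_load_cost.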

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 4c5ce2ab278..7f8d9db5363 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1134,11 +1134,8 @@ vect_model_load_cost (vec_info *vinfo,
 		      slp_tree slp_node,
 		      stmt_vector_for_cost *cost_vec)
 {
-  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER
-	      && memory_access_type != VMAT_INVARIANT
-	      && memory_access_type != VMAT_ELEMENTWISE
-	      && memory_access_type != VMAT_STRIDED_SLP
-	      && memory_access_type != VMAT_LOAD_STORE_LANES);
+  gcc_assert (memory_access_type == VMAT_CONTIGUOUS
+	      || memory_access_type == VMAT_CONTIGUOUS_PERMUTE);
 
   unsigned int inside_cost = 0, prologue_cost = 0;
   bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
@@ -10292,7 +10289,7 @@ vectorizable_load (vec_info *vinfo,
 		    = record_stmt_cost (cost_vec, cnunits, scalar_load,
 					stmt_info, 0, vect_body);
-		  goto vec_num_loop_costing_end;
+		  break;
 		}
 	      if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
 		vec_offset = vec_offsets[vec_num * j + i];
@@ -10335,7 +10332,7 @@ vectorizable_load (vec_info *vinfo,
 		  inside_cost
 		    = record_stmt_cost (cost_vec, 1, vec_construct,
 					stmt_info, 0, vect_body);
-		  goto vec_num_loop_costing_end;
+		  break;
 		}
 	      unsigned HOST_WIDE_INT const_offset_nunits
 		= TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype)
@@ -10390,7 +10387,7 @@ vectorizable_load (vec_info *vinfo,
 	    }
 
 	  if (costing_p)
-	    goto vec_num_loop_costing_end;
+	    break;
 
 	  align
 	    = known_alignment (DR_TARGET_ALIGNMENT (first_dr_info));
@@ -10563,7 +10560,7 @@ vectorizable_load (vec_info *vinfo,
 	    case dr_explicit_realign:
 	      {
 		if (costing_p)
-		  goto vec_num_loop_costing_end;
+		  break;
 		tree ptr, bump;
 		tree vs = size_int (TYPE_VECTOR_SUBPARTS (vectype));
@@ -10627,7 +10624,7 @@ vectorizable_load (vec_info *vinfo,
 	    case dr_explicit_realign_optimized:
 	      {
 		if (costing_p)
-		  goto vec_num_loop_costing_end;
+		  break;
 		if (TREE_CODE (dataref_ptr) == SSA_NAME)
 		  new_temp = copy_ssa_name (dataref_ptr);
 		else
@@ -10650,22 +10647,37 @@ vectorizable_load (vec_info *vinfo,
 	    default:
 	      gcc_unreachable ();
 	    }
-	  vec_dest = vect_create_destination_var (scalar_dest, vectype);
-	  /* DATA_REF is null if we've already built the statement.  */
-	  if (data_ref)
+
+	  /* One common place to cost the above vect load for different
+	     alignment support schemes.  */
+	  if (costing_p)
 	    {
-	      vect_copy_ref_info (data_ref, DR_REF (first_dr_info->dr));
-	      new_stmt = gimple_build_assign (vec_dest, data_ref);
+	      if (memory_access_type == VMAT_CONTIGUOUS_REVERSE)
+		vect_get_load_cost (vinfo, stmt_info, 1,
+				    alignment_support_scheme, misalignment,
+				    false, &inside_cost, &prologue_cost,
+				    cost_vec, cost_vec, true);
+	    }
+	  else
+	    {
+	      vec_dest = vect_create_destination_var (scalar_dest, vectype);
+	      /* DATA_REF is null if we've already built the statement.  */
+	      if (data_ref)
+		{
+		  vect_copy_ref_info (data_ref, DR_REF (first_dr_info->dr));
+		  new_stmt = gimple_build_assign (vec_dest, data_ref);
+		}
+	      new_temp = make_ssa_name (vec_dest, new_stmt);
+	      gimple_set_lhs (new_stmt, new_temp);
+	      vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
 	    }
-	  new_temp = make_ssa_name (vec_dest, new_stmt);
-	  gimple_set_lhs (new_stmt, new_temp);
-	  vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
 
 	  /* 3. Handle explicit realignment if necessary/supported.
 	       Create in loop:
 		 vec_dest = realign_load (msq, lsq, realignment_token)  */
-	  if (alignment_support_scheme == dr_explicit_realign_optimized
-	      || alignment_support_scheme == dr_explicit_realign)
+	  if (!costing_p
+	      && (alignment_support_scheme == dr_explicit_realign_optimized
+		  || alignment_support_scheme == dr_explicit_realign))
 	    {
 	      lsq = gimple_assign_lhs (new_stmt);
 	      if (!realignment_token)
@@ -10690,26 +10702,34 @@ vectorizable_load (vec_info *vinfo,
 
 	  if (memory_access_type == VMAT_CONTIGUOUS_REVERSE)
 	    {
-	      tree perm_mask = perm_mask_for_reverse (vectype);
-	      new_temp = permute_vec_elements (vinfo, new_temp, new_temp,
-					       perm_mask, stmt_info, gsi);
-	      new_stmt = SSA_NAME_DEF_STMT (new_temp);
+	      if (costing_p)
+		inside_cost = record_stmt_cost (cost_vec, 1, vec_perm,
+						stmt_info, 0, vect_body);
+	      else
+		{
+		  tree perm_mask = perm_mask_for_reverse (vectype);
+		  new_temp
+		    = permute_vec_elements (vinfo, new_temp, new_temp,
+					    perm_mask, stmt_info, gsi);
+		  new_stmt = SSA_NAME_DEF_STMT (new_temp);
+		}
 	    }
 
 	  /* Collect vector loads and later create their permutation in
 	     vect_transform_grouped_load ().  */
-	  if (grouped_load || slp_perm)
+	  if (!costing_p && (grouped_load || slp_perm))
 	    dr_chain.quick_push (new_temp);
 
 	  /* Store vector loads in the corresponding SLP_NODE.  */
-	  if (slp && !slp_perm)
+	  if (!costing_p && slp && !slp_perm)
 	    SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt);
 
 	  /* With SLP permutation we load the gaps as well, without
 	     we need to skip the gaps after we manage to fully load
 	     all elements.  group_gap_adj is DR_GROUP_SIZE here.  */
 	  group_elt += nunits;
-	  if (maybe_ne (group_gap_adj, 0U)
+	  if (!costing_p
+	      && maybe_ne (group_gap_adj, 0U)
 	      && !slp_perm
 	      && known_eq (group_elt, group_size - group_gap_adj))
 	    {
@@ -10724,8 +10744,6 @@ vectorizable_load (vec_info *vinfo,
 					     gsi, stmt_info, bump);
 	      group_elt = 0;
 	    }
-vec_num_loop_costing_end:
-	  ;
 	}
 
       /* Bump the vector pointer to account for a gap or for excess
	 elements loaded for a permuted SLP load.  */
@@ -10748,18 +10766,30 @@ vec_num_loop_costing_end:
       if (slp && !slp_perm)
 	continue;
 
-      if (slp_perm && !costing_p)
-	{
+      if (slp_perm)
+	{
 	  unsigned n_perms;
 	  /* For SLP we know we've seen all possible uses of dr_chain so
 	     direct vect_transform_slp_perm_load to DCE the unused parts.
 	     ???  This is a hack to prevent compile-time issues as seen
 	     in PR101120 and friends.  */
-	  bool ok = vect_transform_slp_perm_load (vinfo, slp_node, dr_chain,
-						  gsi, vf, false, &n_perms,
-						  nullptr, true);
-	  gcc_assert (ok);
-	}
+	  if (costing_p
+	      && memory_access_type != VMAT_CONTIGUOUS
+	      && memory_access_type != VMAT_CONTIGUOUS_PERMUTE)
+	    {
+	      vect_transform_slp_perm_load (vinfo, slp_node, vNULL, nullptr, vf,
+					    true, &n_perms, nullptr);
+	      inside_cost = record_stmt_cost (cost_vec, n_perms, vec_perm,
+					      stmt_info, 0, vect_body);
+	    }
+	  else if (!costing_p)
+	    {
+	      bool ok = vect_transform_slp_perm_load (vinfo, slp_node, dr_chain,
+						      gsi, vf, false, &n_perms,
+						      nullptr, true);
+	      gcc_assert (ok);
+	    }
+	}
       else if (!costing_p)
 	{
 	  if (grouped_load)
@@ -10781,8 +10811,11 @@ vec_num_loop_costing_end:
 
   if (costing_p)
     {
-      if (memory_access_type == VMAT_GATHER_SCATTER
-	  || memory_access_type == VMAT_LOAD_STORE_LANES)
+      gcc_assert (memory_access_type != VMAT_INVARIANT
+		  && memory_access_type != VMAT_ELEMENTWISE
+		  && memory_access_type != VMAT_STRIDED_SLP);
+      if (memory_access_type != VMAT_CONTIGUOUS
+	  && memory_access_type != VMAT_CONTIGUOUS_PERMUTE)
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_NOTE, vect_location,
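
For reference, the dg-final scan in the new test can also be
reproduced by hand; an illustrative invocation (assuming a
PowerPC-targeted gcc on the path; the exact vect dump file suffix
varies with the pass number):

  gcc -O2 -ftree-vectorize -mvsx -fdump-tree-vect-details -S \
      costmodel-vect-reversed.c
  grep vec_perm costmodel-vect-reversed.c.*.vect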