From patchwork Thu Sep 14 03:11:54 2023
X-Patchwork-Submitter: "Kewen.Lin"
X-Patchwork-Id: 139272
From: "Kewen.Lin"
To: gcc-patches@gcc.gnu.org
Cc: richard.guenther@gmail.com, richard.sandiford@arm.com
Subject: [PATCH 05/10] vect: Adjust vectorizable_store costing on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP
Date: Wed, 13 Sep 2023 22:11:54 -0500
Message-Id: <2adef8b10433859b6642282b03a11df33c732d11.1694657494.git.linkw@linux.ibm.com>
X-Mailer: git-send-email 2.31.1

This patch adjusts the cost handling for VMAT_ELEMENTWISE and
VMAT_STRIDED_SLP in function vectorizable_store; we no longer call
function vect_model_store_cost for them.  Like the improvement made
for PR82255 on the load side, this change gets rid of unnecessary
vec_to_scalar costing in some cases with VMAT_STRIDED_SLP; one
typical test case, gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c,
is added for it.  The change also fixes some inconsistent costing in
other cases.

Besides, this patch special-cases interleaving stores for these two
affected memory access types: for interleaving stores the whole chain
is vectorized when the last store in the chain is reached and the
other stores in the group are skipped.  To stay consistent with the
transform code, which iterates over the whole group, we only cost the
first store in the group.  Ideally we would cost only the last one,
but getting the last is not trivial, and costing the first one is
equivalent.

gcc/ChangeLog:

	* tree-vect-stmts.cc (vect_model_store_cost): Assert it won't get
	VMAT_ELEMENTWISE and VMAT_STRIDED_SLP any more, and remove their
	related handlings.
	(vectorizable_store): Adjust the cost handling on VMAT_ELEMENTWISE
	and VMAT_STRIDED_SLP without calling vect_model_store_cost.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c: New test.
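For illustration only (this snippet is not part of the patch or its
testsuite, and the function name is made up): out[2*i] and out[2*i+1]
below form one interleaving store group; the whole group is transformed
when its last store is reached, so with this patch only the first store
of the group gets costed for VMAT_ELEMENTWISE.  Whether such a loop is
actually classified as VMAT_ELEMENTWISE depends on the target's permute
support and cost model.

/* Hypothetical example of an interleaving store group.  */
void
interleave (int *restrict out, int *restrict x, int *restrict y, int n)
{
  for (int i = 0; i < n; i++)
    {
      out[2 * i] = x[i];      /* first store in the group */
      out[2 * i + 1] = y[i];  /* last store: the chain is transformed here */
    }
}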
---
 .../costmodel/ppc/costmodel-vect-store-1.c    |  23 +++
 gcc/tree-vect-stmts.cc                        | 160 +++++++++++-------
 2 files changed, 120 insertions(+), 63 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c
new file mode 100644
index 00000000000..ab5f3301492
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" } */
+
+/* This test case is partially extracted from case
+   gcc.dg/vect/vect-avg-16.c, it's to verify we don't
+   cost a store with vec_to_scalar when we shouldn't.  */
+
+void
+test (signed char *restrict a, signed char *restrict b, signed char *restrict c,
+      int n)
+{
+  for (int j = 0; j < n; ++j)
+    {
+      for (int i = 0; i < 16; ++i)
+        a[i] = (b[i] + c[i]) >> 1;
+      a += 20;
+      b += 20;
+      c += 20;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "vec_to_scalar" 0 "vect" } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 048c14d291c..3d01168080a 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -964,7 +964,9 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies,
                        vec_load_store_type vls_type, slp_tree slp_node,
                        stmt_vector_for_cost *cost_vec)
 {
-  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER);
+  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER
+              && memory_access_type != VMAT_ELEMENTWISE
+              && memory_access_type != VMAT_STRIDED_SLP);
   unsigned int inside_cost = 0, prologue_cost = 0;
   stmt_vec_info first_stmt_info = stmt_info;
   bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
@@ -1010,29 +1012,9 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies,
                          group_size);
     }
 
-  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   /* Costs of the stores.  */
-  if (memory_access_type == VMAT_ELEMENTWISE)
-    {
-      unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
-      /* N scalar stores plus extracting the elements.  */
-      inside_cost += record_stmt_cost (cost_vec,
-                                       ncopies * assumed_nunits,
-                                       scalar_store, stmt_info, 0, vect_body);
-    }
-  else
-    vect_get_store_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
-                         misalignment, &inside_cost, cost_vec);
-
-  if (memory_access_type == VMAT_ELEMENTWISE
-      || memory_access_type == VMAT_STRIDED_SLP)
-    {
-      /* N scalar stores plus extracting the elements.  */
-      unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
-      inside_cost += record_stmt_cost (cost_vec,
-                                       ncopies * assumed_nunits,
-                                       vec_to_scalar, stmt_info, 0, vect_body);
-    }
+  vect_get_store_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
+                       misalignment, &inside_cost, cost_vec);
 
   /* When vectorizing a store into the function result assign
      a penalty if the function returns in a multi-register location.
@@ -8416,6 +8398,18 @@ vectorizable_store (vec_info *vinfo,
                          "Vectorizing an unaligned access.\n");
 
       STMT_VINFO_TYPE (stmt_info) = store_vec_info_type;
+
+      /* As function vect_transform_stmt shows, for interleaving stores
+         the whole chain is vectorized when the last store in the chain
+         is reached, the other stores in the group are skipped.  So we
+         want to only cost the last one here, but it's not trivial to
+         get the last, as it's equivalent to use the first one for
+         costing, use the first one instead.  */
+      if (grouped_store
+          && !slp
+          && first_stmt_info != stmt_info
+          && memory_access_type == VMAT_ELEMENTWISE)
+        return true;
     }
   gcc_assert (memory_access_type == STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info));
 
@@ -8488,14 +8482,7 @@ vectorizable_store (vec_info *vinfo,
   if (memory_access_type == VMAT_ELEMENTWISE
       || memory_access_type == VMAT_STRIDED_SLP)
     {
-      if (costing_p)
-        {
-          vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type,
-                                 alignment_support_scheme, misalignment,
-                                 vls_type, slp_node, cost_vec);
-          return true;
-        }
-
+      unsigned inside_cost = 0, prologue_cost = 0;
       gimple_stmt_iterator incr_gsi;
       bool insert_after;
       gimple *incr;
@@ -8503,7 +8490,7 @@ vectorizable_store (vec_info *vinfo,
       tree ivstep;
       tree running_off;
      tree stride_base, stride_step, alias_off;
-      tree vec_oprnd;
+      tree vec_oprnd = NULL_TREE;
       tree dr_offset;
       unsigned int g;
       /* Checked by get_load_store_type.  */
@@ -8609,26 +8596,30 @@ vectorizable_store (vec_info *vinfo,
               lnel = const_nunits;
               ltype = vectype;
               lvectype = vectype;
+              alignment_support_scheme = dr_align;
+              misalignment = mis_align;
             }
         }
       ltype = build_aligned_type (ltype, TYPE_ALIGN (elem_type));
       ncopies = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
     }
 
-      ivstep = stride_step;
-      ivstep = fold_build2 (MULT_EXPR, TREE_TYPE (ivstep), ivstep,
-                            build_int_cst (TREE_TYPE (ivstep), vf));
+      if (!costing_p)
+        {
+          ivstep = stride_step;
+          ivstep = fold_build2 (MULT_EXPR, TREE_TYPE (ivstep), ivstep,
+                                build_int_cst (TREE_TYPE (ivstep), vf));
 
-      standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+          standard_iv_increment_position (loop, &incr_gsi, &insert_after);
 
-      stride_base = cse_and_gimplify_to_preheader (loop_vinfo, stride_base);
-      ivstep = cse_and_gimplify_to_preheader (loop_vinfo, ivstep);
-      create_iv (stride_base, PLUS_EXPR, ivstep, NULL,
-                 loop, &incr_gsi, insert_after,
-                 &offvar, NULL);
-      incr = gsi_stmt (incr_gsi);
+          stride_base = cse_and_gimplify_to_preheader (loop_vinfo, stride_base);
+          ivstep = cse_and_gimplify_to_preheader (loop_vinfo, ivstep);
+          create_iv (stride_base, PLUS_EXPR, ivstep, NULL, loop, &incr_gsi,
+                     insert_after, &offvar, NULL);
+          incr = gsi_stmt (incr_gsi);
 
-      stride_step = cse_and_gimplify_to_preheader (loop_vinfo, stride_step);
+          stride_step = cse_and_gimplify_to_preheader (loop_vinfo, stride_step);
+        }
 
       alias_off = build_int_cst (ref_type, 0);
       stmt_vec_info next_stmt_info = first_stmt_info;
@@ -8636,39 +8627,76 @@ vectorizable_store (vec_info *vinfo,
       for (g = 0; g < group_size; g++)
         {
           running_off = offvar;
-          if (g)
+          if (!costing_p)
             {
-              tree size = TYPE_SIZE_UNIT (ltype);
-              tree pos = fold_build2 (MULT_EXPR, sizetype, size_int (g),
-                                      size);
-              tree newoff = copy_ssa_name (running_off, NULL);
-              incr = gimple_build_assign (newoff, POINTER_PLUS_EXPR,
-                                          running_off, pos);
-              vect_finish_stmt_generation (vinfo, stmt_info, incr, gsi);
-              running_off = newoff;
+              if (g)
+                {
+                  tree size = TYPE_SIZE_UNIT (ltype);
+                  tree pos
+                    = fold_build2 (MULT_EXPR, sizetype, size_int (g), size);
+                  tree newoff = copy_ssa_name (running_off, NULL);
+                  incr = gimple_build_assign (newoff, POINTER_PLUS_EXPR,
+                                              running_off, pos);
+                  vect_finish_stmt_generation (vinfo, stmt_info, incr, gsi);
+                  running_off = newoff;
+                }
             }
           if (!slp)
             op = vect_get_store_rhs (next_stmt_info);
-          vect_get_vec_defs (vinfo, next_stmt_info, slp_node, ncopies,
-                             op, &vec_oprnds);
+          if (!costing_p)
+            vect_get_vec_defs (vinfo, next_stmt_info, slp_node, ncopies, op,
+                               &vec_oprnds);
+          else if (!slp)
+            {
+              enum vect_def_type cdt;
+              gcc_assert (vect_is_simple_use (op, vinfo, &cdt));
+              if (cdt == vect_constant_def || cdt == vect_external_def)
+                prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec,
+                                                   stmt_info, 0, vect_prologue);
+            }
           unsigned int group_el = 0;
           unsigned HOST_WIDE_INT
             elsz = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
           for (j = 0; j < ncopies; j++)
             {
-              vec_oprnd = vec_oprnds[j];
-              /* Pun the vector to extract from if necessary.  */
-              if (lvectype != vectype)
+              if (!costing_p)
                 {
-                  tree tem = make_ssa_name (lvectype);
-                  gimple *pun
-                    = gimple_build_assign (tem, build1 (VIEW_CONVERT_EXPR,
-                                                        lvectype, vec_oprnd));
-                  vect_finish_stmt_generation (vinfo, stmt_info, pun, gsi);
-                  vec_oprnd = tem;
+                  vec_oprnd = vec_oprnds[j];
+                  /* Pun the vector to extract from if necessary.  */
+                  if (lvectype != vectype)
+                    {
+                      tree tem = make_ssa_name (lvectype);
+                      tree cvt
+                        = build1 (VIEW_CONVERT_EXPR, lvectype, vec_oprnd);
+                      gimple *pun = gimple_build_assign (tem, cvt);
+                      vect_finish_stmt_generation (vinfo, stmt_info, pun, gsi);
+                      vec_oprnd = tem;
+                    }
                 }
               for (i = 0; i < nstores; i++)
                 {
+                  if (costing_p)
+                    {
+                      /* Only need vector extracting when there are more
+                         than one stores.  */
+                      if (nstores > 1)
+                        inside_cost
+                          += record_stmt_cost (cost_vec, 1, vec_to_scalar,
+                                               stmt_info, 0, vect_body);
+                      /* Take a single lane vector type store as scalar
+                         store to avoid ICE like 110776.  */
+                      if (VECTOR_TYPE_P (ltype)
+                          && known_ne (TYPE_VECTOR_SUBPARTS (ltype), 1U))
+                        vect_get_store_cost (vinfo, stmt_info, 1,
                                             alignment_support_scheme,
+                                             misalignment, &inside_cost,
+                                             cost_vec);
+                      else
+                        inside_cost
+                          += record_stmt_cost (cost_vec, 1, scalar_store,
+                                               stmt_info, 0, vect_body);
+                      continue;
+                    }
                   tree newref, newoff;
                   gimple *incr, *assign;
                   tree size = TYPE_SIZE (ltype);
@@ -8719,6 +8747,12 @@ vectorizable_store (vec_info *vinfo,
               break;
             }
         }
 
+      if (costing_p && dump_enabled_p ())
+        dump_printf_loc (MSG_NOTE, vect_location,
+                         "vect_model_store_cost: inside_cost = %d, "
+                         "prologue_cost = %d .\n",
+                         inside_cost, prologue_cost);
+
       return true;
     }
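
Note for readers (not part of the patch): the accumulation done by the
new costing_p path for VMAT_ELEMENTWISE / VMAT_STRIDED_SLP can be
summarized by the rough sketch below.  The function is hypothetical and
the unit costs stand in for whatever record_stmt_cost and
vect_get_store_cost would return on a given target; it only mirrors the
loop structure of the diff above.

/* Illustrative only: walks group_size x ncopies x nstores pieces the way
   the new costing code does.  */
struct sketch_cost { unsigned inside; unsigned prologue; };

static struct sketch_cost
strided_store_cost_sketch (unsigned group_size, unsigned ncopies,
                           unsigned nstores, int rhs_is_invariant)
{
  struct sketch_cost c = { 0, 0 };
  for (unsigned g = 0; g < group_size; g++)
    {
      /* Non-SLP constant/external RHS: one scalar_to_vec in the prologue.  */
      if (rhs_is_invariant)
        c.prologue += 1;
      for (unsigned j = 0; j < ncopies; j++)
        for (unsigned i = 0; i < nstores; i++)
          {
            /* Extract a lane only when the vector is split into more
               than one piece.  */
            if (nstores > 1)
              c.inside += 1;    /* vec_to_scalar */
            /* One store per piece: a vector store for multi-lane pieces,
               otherwise a scalar_store (see the PR110776 note above);
               both counted as one unit here.  */
            c.inside += 1;
          }
    }
  return c;
}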