From patchwork Fri Apr 28 12:41:43 2023
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 88636
Date: Fri, 28 Apr 2023 12:41:43 +0000 (UTC)
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] Add emulated scatter capability to the vectorizer
From: Richard Biener
Message-Id: <20230428124232.CFF443889E20@sourceware.org>

This adds a scatter vectorization capability to the vectorizer
without target support by decomposing the offset and data vectors
and then performing scalar stores in the order of vector lanes.
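To make the capability concrete, a loop of the following shape (an
illustrative sketch modelled on the TSVC "vas" kernel; the function and
array names are made up here) can now be vectorized even when the target
has no native scatter-store support:

  void
  vas (float *__restrict a, float *__restrict b,
       int *__restrict ip, int n)
  {
    for (int i = 0; i < n; ++i)
      /* The store address depends on the loaded index ip[i], i.e. it is
	 a scatter store; everything else in the loop vectorizes fine.  */
      a[ip[i]] = b[i];
  }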
This is aimed at cases where vectorizing the rest of the loop offsets
the cost of vectorizing the scatter.  The offset load is still
vectorized and costed as such, but like with emulated gather those
will be turned back to scalar loads by forwprop.

Slightly fixed compared to the version posted in autumn,
re-bootstrapped & tested on x86_64-unknown-linux-gnu and pushed.

Richard.

	* tree-vect-data-refs.cc (vect_analyze_data_refs): Always
	consider scatters.
	* tree-vect-stmts.cc (vect_model_store_cost): Pass in the
	gather-scatter info and cost emulated scatters accordingly.
	(get_load_store_type): Support emulated scatters.
	(vectorizable_store): Likewise.  Emulate them by extracting
	scalar offsets and data, doing scalar stores.

	* gcc.dg/vect/pr25413a.c: Un-XFAIL everywhere.
	* gcc.dg/vect/vect-71.c: Likewise.
	* gcc.dg/vect/tsvc/vect-tsvc-s4113.c: Likewise.
	* gcc.dg/vect/tsvc/vect-tsvc-s491.c: Likewise.
	* gcc.dg/vect/tsvc/vect-tsvc-vas.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/pr25413a.c    |   3 +-
 .../gcc.dg/vect/tsvc/vect-tsvc-s4113.c  |   2 +-
 .../gcc.dg/vect/tsvc/vect-tsvc-s491.c   |   2 +-
 .../gcc.dg/vect/tsvc/vect-tsvc-vas.c    |   2 +-
 gcc/testsuite/gcc.dg/vect/vect-71.c     |   2 +-
 gcc/tree-vect-data-refs.cc              |   4 +-
 gcc/tree-vect-stmts.cc                  | 117 ++++++++++++++----
 7 files changed, 97 insertions(+), 35 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr25413a.c b/gcc/testsuite/gcc.dg/vect/pr25413a.c
index e444b2c3e8e..ffb517c9ce0 100644
--- a/gcc/testsuite/gcc.dg/vect/pr25413a.c
+++ b/gcc/testsuite/gcc.dg/vect/pr25413a.c
@@ -123,7 +123,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! vect_scatter_store } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target vect_scatter_store } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
 /* { dg-final { scan-tree-dump-times "vector alignment may not be reachable" 1 "vect" { target { ! vector_alignment_reachable } } } } */
 /* { dg-final { scan-tree-dump-times "Alignment of access forced using versioning" 1 "vect" { target { ! vector_alignment_reachable } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s4113.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s4113.c
index b64682a65df..ddb7e9dc0e8 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s4113.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s4113.c
@@ -39,4 +39,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! aarch64_sve } } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s491.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s491.c
index 8465e137070..29e90ff0aff 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s491.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s491.c
@@ -39,4 +39,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! aarch64_sve } } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-vas.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-vas.c
index 5ff38851f43..b72ee21a9a3 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-vas.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-vas.c
@@ -39,4 +39,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! aarch64_sve } } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-71.c b/gcc/testsuite/gcc.dg/vect/vect-71.c
index f15521176df..581473fa4a1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-71.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-71.c
@@ -36,4 +36,4 @@ int main (void)
   return main1 ();
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_scatter_store } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index c03ffb3aaf1..6721ab6efc4 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4464,9 +4464,7 @@ vect_analyze_data_refs (vec_info *vinfo, poly_uint64 *min_vf, bool *fatal)
 	  && !TREE_THIS_VOLATILE (DR_REF (dr));
       bool maybe_scatter
 	= DR_IS_WRITE (dr)
-	  && !TREE_THIS_VOLATILE (DR_REF (dr))
-	  && (targetm.vectorize.builtin_scatter != NULL
-	      || supports_vec_scatter_store_p ());
+	  && !TREE_THIS_VOLATILE (DR_REF (dr));
 
       /* If target supports vector gather loads or scatter stores,
	 see if they can't be used.  */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index dc2dc2cfa7e..c71e28737ee 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -942,6 +942,7 @@ cfun_returns (tree decl)
 static void
 vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies,
		       vect_memory_access_type memory_access_type,
+		       gather_scatter_info *gs_info,
		       dr_alignment_support alignment_support_scheme,
		       int misalignment,
		       vec_load_store_type vls_type, slp_tree slp_node,
@@ -997,8 +998,16 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies,
   if (memory_access_type == VMAT_ELEMENTWISE
       || memory_access_type == VMAT_GATHER_SCATTER)
     {
-      /* N scalar stores plus extracting the elements.  */
       unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
+      if (memory_access_type == VMAT_GATHER_SCATTER
+	  && gs_info->ifn == IFN_LAST && !gs_info->decl)
+	/* For emulated scatter N offset vector element extracts
+	   (we assume the scalar scaling and ptr + offset add is consumed by
+	   the load).  */
+	inside_cost += record_stmt_cost (cost_vec, ncopies * assumed_nunits,
+					 vec_to_scalar, stmt_info, 0,
+					 vect_body);
+      /* N scalar stores plus extracting the elements.  */
       inside_cost += record_stmt_cost (cost_vec, ncopies * assumed_nunits,
				       scalar_store, stmt_info, 0,
				       vect_body);
@@ -1008,7 +1017,9 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies,
			 misalignment, &inside_cost, cost_vec);
 
   if (memory_access_type == VMAT_ELEMENTWISE
-      || memory_access_type == VMAT_STRIDED_SLP)
+      || memory_access_type == VMAT_STRIDED_SLP
+      || (memory_access_type == VMAT_GATHER_SCATTER
+	  && gs_info->ifn == IFN_LAST && !gs_info->decl))
     {
       /* N scalar stores plus extracting the elements.  */
       unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
@@ -2503,19 +2514,11 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
	}
       else if (gs_info->ifn == IFN_LAST && !gs_info->decl)
	{
-	  if (vls_type != VLS_LOAD)
-	    {
-	      if (dump_enabled_p ())
-		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-				 "unsupported emulated scatter.\n");
-	      return false;
-	    }
-	  else if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant ()
-		   || !TYPE_VECTOR_SUBPARTS
-			 (gs_info->offset_vectype).is_constant ()
-		   || !constant_multiple_p (TYPE_VECTOR_SUBPARTS
-					      (gs_info->offset_vectype),
-					    TYPE_VECTOR_SUBPARTS (vectype)))
+	  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant ()
+	      || !TYPE_VECTOR_SUBPARTS (gs_info->offset_vectype).is_constant ()
+	      || !constant_multiple_p (TYPE_VECTOR_SUBPARTS
+					 (gs_info->offset_vectype),
+				       TYPE_VECTOR_SUBPARTS (vectype)))
	    {
	      if (dump_enabled_p ())
		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -7824,6 +7827,15 @@ vectorizable_store (vec_info *vinfo,
			     "unsupported access type for masked store.\n");
	  return false;
	}
+      else if (memory_access_type == VMAT_GATHER_SCATTER
+	       && gs_info.ifn == IFN_LAST
+	       && !gs_info.decl)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "unsupported masked emulated scatter.\n");
+	  return false;
+	}
     }
   else
     {
@@ -7887,7 +7899,8 @@ vectorizable_store (vec_info *vinfo,
 
       STMT_VINFO_TYPE (stmt_info) = store_vec_info_type;
       vect_model_store_cost (vinfo, stmt_info, ncopies,
-			     memory_access_type, alignment_support_scheme,
+			     memory_access_type, &gs_info,
+			     alignment_support_scheme,
			     misalignment, vls_type, slp_node, cost_vec);
       return true;
     }
@@ -8527,12 +8540,9 @@ vectorizable_store (vec_info *vinfo,
	      dataref_offset = build_int_cst (ref_type, 0);
	    }
	  else if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
-	    {
-	      vect_get_gather_scatter_ops (loop_vinfo, loop, stmt_info,
-					   slp_node, &gs_info, &dataref_ptr,
-					   &vec_offsets);
-	      vec_offset = vec_offsets[0];
-	    }
+	    vect_get_gather_scatter_ops (loop_vinfo, loop, stmt_info,
+					 slp_node, &gs_info, &dataref_ptr,
+					 &vec_offsets);
	  else
	    dataref_ptr
	      = vect_create_data_ref_ptr (vinfo, first_stmt_info, aggr_type,
@@ -8558,9 +8568,7 @@ vectorizable_store (vec_info *vinfo,
	  if (dataref_offset)
	    dataref_offset
	      = int_const_binop (PLUS_EXPR, dataref_offset, bump);
-	  else if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
-	    vec_offset = vec_offsets[j];
-	  else
+	  else if (!STMT_VINFO_GATHER_SCATTER_P (stmt_info))
	    dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr, gsi,
					   stmt_info, bump);
	}
@@ -8648,8 +8656,11 @@ vectorizable_store (vec_info *vinfo,
		final_mask = prepare_vec_mask (loop_vinfo, mask_vectype,
					       final_mask, vec_mask, gsi);
 
-	      if (memory_access_type == VMAT_GATHER_SCATTER)
+	      if (memory_access_type == VMAT_GATHER_SCATTER
+		  && gs_info.ifn != IFN_LAST)
		{
+		  if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
+		    vec_offset = vec_offsets[vec_num * j + i];
		  tree scale = size_int (gs_info.scale);
		  gcall *call;
		  if (final_mask)
@@ -8665,6 +8676,60 @@ vectorizable_store (vec_info *vinfo,
		  new_stmt = call;
		  break;
		}
+	      else if (memory_access_type == VMAT_GATHER_SCATTER)
+		{
+		  /* Emulated scatter.  */
+		  gcc_assert (!final_mask);
+		  unsigned HOST_WIDE_INT const_nunits = nunits.to_constant ();
+		  unsigned HOST_WIDE_INT const_offset_nunits
+		    = TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype)
+			.to_constant ();
+		  vec<constructor_elt, va_gc> *ctor_elts;
+		  vec_alloc (ctor_elts, const_nunits);
+		  gimple_seq stmts = NULL;
+		  tree elt_type = TREE_TYPE (vectype);
+		  unsigned HOST_WIDE_INT elt_size
+		    = tree_to_uhwi (TYPE_SIZE (elt_type));
+		  /* We support offset vectors with more elements
+		     than the data vector for now.  */
+		  unsigned HOST_WIDE_INT factor
+		    = const_offset_nunits / const_nunits;
+		  vec_offset = vec_offsets[j / factor];
+		  unsigned elt_offset = (j % factor) * const_nunits;
+		  tree idx_type = TREE_TYPE (TREE_TYPE (vec_offset));
+		  tree scale = size_int (gs_info.scale);
+		  align = get_object_alignment (DR_REF (first_dr_info->dr));
+		  tree ltype = build_aligned_type (TREE_TYPE (vectype), align);
+		  for (unsigned k = 0; k < const_nunits; ++k)
+		    {
+		      /* Compute the offsetted pointer.  */
+		      tree boff = size_binop (MULT_EXPR, TYPE_SIZE (idx_type),
+					      bitsize_int (k + elt_offset));
+		      tree idx = gimple_build (&stmts, BIT_FIELD_REF,
+					       idx_type, vec_offset,
+					       TYPE_SIZE (idx_type), boff);
+		      idx = gimple_convert (&stmts, sizetype, idx);
+		      idx = gimple_build (&stmts, MULT_EXPR,
+					  sizetype, idx, scale);
+		      tree ptr = gimple_build (&stmts, PLUS_EXPR,
+					       TREE_TYPE (dataref_ptr),
+					       dataref_ptr, idx);
+		      ptr = gimple_convert (&stmts, ptr_type_node, ptr);
+		      /* Extract the element to be stored.  */
+		      tree elt = gimple_build (&stmts, BIT_FIELD_REF,
+					       TREE_TYPE (vectype), vec_oprnd,
+					       TYPE_SIZE (elt_type),
+					       bitsize_int (k * elt_size));
+		      gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+		      stmts = NULL;
+		      tree ref = build2 (MEM_REF, ltype, ptr,
+					 build_int_cst (ref_type, 0));
+		      new_stmt = gimple_build_assign (ref, elt);
+		      vect_finish_stmt_generation (vinfo, stmt_info,
+						   new_stmt, gsi);
+		    }
+		  break;
+		}
 
	      if (i > 0)
		/* Bump the vector pointer.  */
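For illustration, a rough C-level sketch of what the emulated scatter
above amounts to for one vector iteration follows; the vector factor of
four, the int element and offset types, the byte scale and all names are
assumptions made for this sketch, not code from the patch:

  typedef int v4si __attribute__ ((vector_size (16)));

  /* One vector iteration of an emulated scatter: extract each offset
     lane, scale it and add it to the base pointer, extract the matching
     data lane and do a scalar store, in lane order.  */
  static void
  emulated_scatter_step (char *base, v4si vec_offset, v4si vec_oprnd,
			 long scale)
  {
    for (int k = 0; k < 4; ++k)
      {
	long idx = vec_offset[k];	/* offset lane k */
	char *ptr = base + idx * scale;	/* scaled pointer */
	int elt = vec_oprnd[k];		/* data lane k */
	*(int *) ptr = elt;		/* scalar store */
      }
  }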