From patchwork Tue Jul 4 07:53:20 2023
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai"
X-Patchwork-Id: 115599
From: juzhe.zhong@rivai.ai
To: gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com, rguenther@suse.de, Ju-Zhe Zhong
Subject: [PATCH V2] VECT: Apply LEN_MASK_GATHER_LOAD/SCATTER_STORE into vectorizer
Date: Tue, 4 Jul 2023 15:53:20 +0800
Message-Id: <20230704075320.195407-1-juzhe.zhong@rivai.ai>

From: Ju-Zhe Zhong

Hi, Richard and Richi.

The len_mask_gather_load/len_mask_scatter_store patterns have been added.
This patch now applies them in the vectorizer.
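For readers less familiar with the new internal functions, here is a rough scalar model of what a .LEN_MASK_SCATTER_STORE call is expected to do, as I understand the pattern documentation: only the first (len + bias) lanes are active, and within those only the lanes whose mask element is set are stored. This is a sketch and not GCC code. The function name, the fixed 32-bit element type, and the byte-per-lane mask are assumptions made purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of .LEN_MASK_SCATTER_STORE for 32-bit
   elements (illustration only, not part of GCC).  Arguments follow the
   order used in the internal call: base, offsets, scale, stored value,
   len, bias, mask.  */
static void
len_mask_scatter_store_model (void *base, const int64_t *offset,
                              int64_t scale, const int32_t *value,
                              size_t len, signed char bias,
                              const unsigned char *mask)
{
  /* Only the first LEN + BIAS lanes take part in the store.  */
  size_t active = (size_t) ((ptrdiff_t) len + bias);

  for (size_t i = 0; i < active; i++)
    if (mask[i])
      /* Lane I is written to BASE + OFFSET[I] * SCALE (in bytes).  */
      memcpy ((char *) base + offset[i] * scale, &value[i],
              sizeof value[i]);
}
```

With byte offsets { 0, 16, 32, ... } and scale 1, this corresponds to the a[i * 4] = b[i] store on int elements in the example that follows, with the mask standing for cond[i] != 0.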
Here is the example:

void
f (int *restrict a,
   int *restrict b, int n, int base, int step,
   int *restrict cond)
{
  for (int i = 0; i < n; ++i)
    {
      if (cond[i])
        a[i * 4] = b[i];
    }
}

Gimple IR:

  [local count: 105119324]:
  _58 = (unsigned long) n_13(D);

  [local count: 630715945]:
  # vectp_cond.7_45 = PHI
  # vectp_b.11_51 = PHI
  # vectp_a.14_55 = PHI
  # ivtmp_59 = PHI
  _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [2, 2]);
  ivtmp_44 = _61 * 4;
  vect__4.9_47 = .LEN_MASK_LOAD (vectp_cond.7_45, 32B, _61, 0, { -1, ... });
  mask__24.10_49 = vect__4.9_47 != { 0, ... };
  vect__8.13_53 = .LEN_MASK_LOAD (vectp_b.11_51, 32B, _61, 0, mask__24.10_49);
  ivtmp_54 = _61 * 16;
  .LEN_MASK_SCATTER_STORE (vectp_a.14_55, { 0, 16, 32, ... }, 1, vect__8.13_53, _61, 0, mask__24.10_49);
  vectp_cond.7_46 = vectp_cond.7_45 + ivtmp_44;
  vectp_b.11_52 = vectp_b.11_51 + ivtmp_44;
  vectp_a.14_56 = vectp_a.14_55 + ivtmp_54;
  ivtmp_60 = ivtmp_59 - _61;
  if (ivtmp_60 != 0)
    goto ; [83.33%]
  else
    goto ; [16.67%]

gcc/ChangeLog:

        * optabs-query.cc (supports_vec_gather_load_p): Apply
        LEN_MASK_GATHER_LOAD/SCATTER_STORE into vectorizer.
        (supports_vec_scatter_store_p): Ditto.
        * tree-vect-data-refs.cc (vect_gather_scatter_fn_p): Ditto.
        * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
        (vect_get_strided_load_store_ops): Ditto.
        (vectorizable_store): Ditto.
        (vectorizable_load): Ditto.

---
 gcc/optabs-query.cc        |   2 +
 gcc/tree-vect-data-refs.cc |  15 +++-
 gcc/tree-vect-stmts.cc     | 136 ++++++++++++++++++++++++++++++++-----
 3 files changed, 134 insertions(+), 19 deletions(-)

diff --git a/gcc/optabs-query.cc b/gcc/optabs-query.cc
index 2fdd0d34354..bf1f484e874 100644
--- a/gcc/optabs-query.cc
+++ b/gcc/optabs-query.cc
@@ -676,6 +676,7 @@ supports_vec_gather_load_p (machine_mode mode)
     this_fn_optabs->supports_vec_gather_load[mode]
       = (supports_vec_convert_optab_p (gather_load_optab, mode)
          || supports_vec_convert_optab_p (mask_gather_load_optab, mode)
+         || supports_vec_convert_optab_p (len_mask_gather_load_optab, mode)
          ? 1 : -1);
 
   return this_fn_optabs->supports_vec_gather_load[mode] > 0;
@@ -692,6 +693,7 @@ supports_vec_scatter_store_p (machine_mode mode)
     this_fn_optabs->supports_vec_scatter_store[mode]
       = (supports_vec_convert_optab_p (scatter_store_optab, mode)
          || supports_vec_convert_optab_p (mask_scatter_store_optab, mode)
+         || supports_vec_convert_optab_p (len_mask_scatter_store_optab, mode)
          ? 1 : -1);
 
   return this_fn_optabs->supports_vec_scatter_store[mode] > 0;
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index ebe93832b1e..8d32eb3c83b 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -3873,16 +3873,24 @@ vect_gather_scatter_fn_p (vec_info *vinfo, bool read_p, bool masked_p,
     return false;
 
   /* Work out which function we need.  */
-  internal_fn ifn, alt_ifn;
+  internal_fn ifn, alt_ifn, len_mask_ifn;
   if (read_p)
     {
       ifn = masked_p ? IFN_MASK_GATHER_LOAD : IFN_GATHER_LOAD;
       alt_ifn = IFN_MASK_GATHER_LOAD;
+      /* When target supports LEN_MASK_GATHER_LOAD, we always
+         use LEN_MASK_GATHER_LOAD regardless whether len and
+         mask are valid or not.  */
+      len_mask_ifn = IFN_LEN_MASK_GATHER_LOAD;
     }
   else
     {
       ifn = masked_p ? IFN_MASK_SCATTER_STORE : IFN_SCATTER_STORE;
       alt_ifn = IFN_MASK_SCATTER_STORE;
+      /* When target supports LEN_MASK_SCATTER_STORE, we always
+         use LEN_MASK_SCATTER_STORE regardless whether len and
+         mask are valid or not.  */
+      len_mask_ifn = IFN_LEN_MASK_SCATTER_STORE;
     }
 
   for (;;)
@@ -3893,7 +3901,10 @@ vect_gather_scatter_fn_p (vec_info *vinfo, bool read_p, bool masked_p,
 
       /* Test whether the target supports this combination.  */
       if (internal_gather_scatter_fn_supported_p (ifn, vectype, memory_type,
-                                                  offset_vectype, scale))
+                                                  offset_vectype, scale)
+          || internal_gather_scatter_fn_supported_p (len_mask_ifn, vectype,
+                                                     memory_type,
+                                                     offset_vectype, scale))
         {
           *ifn_out = ifn;
           *offset_vectype_out = offset_vectype;
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a0c39268bf0..1f607b7102b 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1771,6 +1771,18 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype,
                                                    gs_info->offset_vectype,
                                                    gs_info->scale))
         {
+          ifn = (is_load
+                 ? IFN_LEN_MASK_GATHER_LOAD
+                 : IFN_LEN_MASK_SCATTER_STORE);
+          if (internal_gather_scatter_fn_supported_p (ifn, vectype,
+                                                      gs_info->memory_type,
+                                                      gs_info->offset_vectype,
+                                                      gs_info->scale))
+            {
+              vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
+              vect_record_loop_len (loop_vinfo, lens, nvectors, vectype, 1);
+              return;
+            }
           if (dump_enabled_p ())
             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                              "can't operate on partial vectors because"
@@ -3129,16 +3141,39 @@ vect_get_gather_scatter_ops (loop_vec_info loop_vinfo,
 static void
 vect_get_strided_load_store_ops (stmt_vec_info stmt_info,
                                  loop_vec_info loop_vinfo,
+                                 gimple_stmt_iterator *gsi,
                                  gather_scatter_info *gs_info,
-                                 tree *dataref_bump, tree *vec_offset)
+                                 tree *dataref_bump, tree *vec_offset,
+                                 vec_loop_lens *loop_lens)
 {
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
 
-  tree bump = size_binop (MULT_EXPR,
-                          fold_convert (sizetype, unshare_expr (DR_STEP (dr))),
-                          size_int (TYPE_VECTOR_SUBPARTS (vectype)));
-  *dataref_bump = cse_and_gimplify_to_preheader (loop_vinfo, bump);
+  if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
+    {
+      /* _31 = .SELECT_VL (ivtmp_29, POLY_INT_CST [4, 4]);
+         ivtmp_8 = _31 * 16 (step in bytes);
+         .LEN_MASK_SCATTER_STORE (vectp_a.9_7, ... );
+         vectp_a.9_26 = vectp_a.9_7 + ivtmp_8;  */
+      tree loop_len
+        = vect_get_loop_len (loop_vinfo, gsi, loop_lens, 1, vectype, 0, 0);
+      tree tmp
+        = fold_build2 (MULT_EXPR, sizetype,
+                       fold_convert (sizetype, unshare_expr (DR_STEP (dr))),
+                       loop_len);
+      tree bump = make_temp_ssa_name (sizetype, NULL, "ivtmp");
+      gassign *assign = gimple_build_assign (bump, tmp);
+      gsi_insert_before (gsi, assign, GSI_SAME_STMT);
+      *dataref_bump = bump;
+    }
+  else
+    {
+      tree bump
+        = size_binop (MULT_EXPR,
+                      fold_convert (sizetype, unshare_expr (DR_STEP (dr))),
+                      size_int (TYPE_VECTOR_SUBPARTS (vectype)));
+      *dataref_bump = cse_and_gimplify_to_preheader (loop_vinfo, bump);
+    }
 
   /* The offset given in GS_INFO can have pointer type, so use the element
      type of the vector instead.  */
@@ -8685,8 +8720,8 @@ vectorizable_store (vec_info *vinfo,
   else if (memory_access_type == VMAT_GATHER_SCATTER)
     {
       aggr_type = elem_type;
-      vect_get_strided_load_store_ops (stmt_info, loop_vinfo, &gs_info,
-                                       &bump, &vec_offset);
+      vect_get_strided_load_store_ops (stmt_info, loop_vinfo, gsi, &gs_info,
+                                       &bump, &vec_offset, loop_lens);
     }
   else
     {
@@ -8915,6 +8950,8 @@ vectorizable_store (vec_info *vinfo,
               unsigned HOST_WIDE_INT align;
 
               tree final_mask = NULL_TREE;
+              tree final_len = NULL_TREE;
+              tree bias = NULL_TREE;
               if (loop_masks)
                 final_mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
                                                  vec_num * ncopies,
@@ -8929,8 +8966,43 @@ vectorizable_store (vec_info *vinfo,
                   if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
                     vec_offset = vec_offsets[vec_num * j + i];
                   tree scale = size_int (gs_info.scale);
+
+                  if (internal_gather_scatter_fn_supported_p (
+                        IFN_LEN_MASK_SCATTER_STORE, vectype, gs_info.memory_type,
+                        TREE_TYPE (vec_offset), gs_info.scale))
+                    {
+                      if (loop_lens)
+                        {
+                          final_len
+                            = vect_get_loop_len (loop_vinfo, gsi, loop_lens,
+                                                 vec_num * ncopies, vectype,
+                                                 vec_num * j + i, 1);
+                        }
+                      else
+                        {
+                          tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
+                          final_len
+                            = build_int_cst (iv_type,
+                                             TYPE_VECTOR_SUBPARTS (vectype));
+                        }
+                      signed char biasval
+                        = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+                      bias = build_int_cst (intQI_type_node, biasval);
+                      if (!final_mask)
+                        {
+                          mask_vectype = truth_type_for (vectype);
+                          final_mask = build_minus_one_cst (mask_vectype);
+                        }
+                    }
+
                   gcall *call;
-                  if (final_mask)
+                  if (final_len && final_mask)
+                    call
+                      = gimple_build_call_internal (IFN_LEN_MASK_SCATTER_STORE,
+                                                    7, dataref_ptr, vec_offset,
+                                                    scale, vec_oprnd, final_len,
+                                                    bias, final_mask);
+                  else if (final_mask)
                     call = gimple_build_call_internal
                       (IFN_MASK_SCATTER_STORE, 5, dataref_ptr, vec_offset,
                        scale, vec_oprnd, final_mask);
@@ -9047,9 +9119,6 @@ vectorizable_store (vec_info *vinfo,
                   machine_mode vmode = TYPE_MODE (vectype);
                   machine_mode new_vmode = vmode;
                   internal_fn partial_ifn = IFN_LAST;
-                  /* Produce 'len' and 'bias' argument.  */
-                  tree final_len = NULL_TREE;
-                  tree bias = NULL_TREE;
                   if (loop_lens)
                     {
                       opt_machine_mode new_ovmode
@@ -10177,8 +10246,8 @@ vectorizable_load (vec_info *vinfo,
   else if (memory_access_type == VMAT_GATHER_SCATTER)
     {
       aggr_type = elem_type;
-      vect_get_strided_load_store_ops (stmt_info, loop_vinfo, &gs_info,
-                                       &bump, &vec_offset);
+      vect_get_strided_load_store_ops (stmt_info, loop_vinfo, gsi, &gs_info,
+                                       &bump, &vec_offset, loop_lens);
     }
   else
     {
@@ -10339,6 +10408,8 @@ vectorizable_load (vec_info *vinfo,
           for (i = 0; i < vec_num; i++)
             {
               tree final_mask = NULL_TREE;
+              tree final_len = NULL_TREE;
+              tree bias = NULL_TREE;
               if (loop_masks
                   && memory_access_type != VMAT_INVARIANT)
                 final_mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
@@ -10368,8 +10439,42 @@ vectorizable_load (vec_info *vinfo,
                           vec_offset = vec_offsets[vec_num * j + i];
                         tree zero = build_zero_cst (vectype);
                         tree scale = size_int (gs_info.scale);
+
+                        if (internal_gather_scatter_fn_supported_p (
+                              IFN_LEN_MASK_GATHER_LOAD, vectype,
+                              gs_info.memory_type, TREE_TYPE (vec_offset),
+                              gs_info.scale))
+                          {
+                            if (loop_lens)
+                              {
+                                final_len = vect_get_loop_len (
+                                  loop_vinfo, gsi, loop_lens, vec_num * ncopies,
+                                  vectype, vec_num * j + i, 1);
+                              }
+                            else
+                              {
+                                tree iv_type
+                                  = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
+                                final_len = build_int_cst (
+                                  iv_type, TYPE_VECTOR_SUBPARTS (vectype));
+                              }
+                            signed char biasval
+                              = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+                            bias = build_int_cst (intQI_type_node, biasval);
+                            if (!final_mask)
+                              {
+                                mask_vectype = truth_type_for (vectype);
+                                final_mask = build_minus_one_cst (mask_vectype);
+                              }
+                          }
+
                         gcall *call;
-                        if (final_mask)
+                        if (final_len && final_mask)
+                          call = gimple_build_call_internal (
+                            IFN_LEN_MASK_GATHER_LOAD, 7, dataref_ptr,
+                            vec_offset, scale, zero, final_len, bias,
+                            final_mask);
+                        else if (final_mask)
                           call = gimple_build_call_internal
                             (IFN_MASK_GATHER_LOAD, 5, dataref_ptr,
                              vec_offset, scale, zero, final_mask);
@@ -10462,9 +10567,6 @@ vectorizable_load (vec_info *vinfo,
                   machine_mode vmode = TYPE_MODE (vectype);
                   machine_mode new_vmode = vmode;
                   internal_fn partial_ifn = IFN_LAST;
-                  /* Produce 'len' and 'bias' argument.  */
-                  tree final_len = NULL_TREE;
-                  tree bias = NULL_TREE;
                   if (loop_lens)
                     {
                       opt_machine_mode new_ovmode