From patchwork Wed May 24 14:48:01 2023
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai"
X-Patchwork-Id: 98528
From: juzhe.zhong@rivai.ai
To: gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com, rguenther@suse.de, Ju-Zhe Zhong
Subject: [PATCH V14] VECT: Add decrement IV iteration loop control by variable amount support
Date: Wed, 24 May 2023 22:48:01 +0800
Message-Id: <20230524144801.73537-1-juzhe.zhong@rivai.ai>

From: Ju-Zhe Zhong

This patch supports a decrementing IV by following the flow designed by
Richard:

(1) In vect_set_loop_condition_partial_vectors, for the first iteration of:
    call vect_set_loop_controls_directly.

(2) vect_set_loop_controls_directly calculates "step" as in your patch.
    If rgc has 1 control, this step is the SSA name created for that control.
    Otherwise the step is a fresh SSA name, as in your patch.

(3) vect_set_loop_controls_directly stores this step somewhere for later
    use, probably in LOOP_VINFO.  Let's use "S" to refer to this stored step.

(4) After the vect_set_loop_controls_directly call above, and outside
    the "if" statement that now contains vect_set_loop_controls_directly,
    check whether rgc->controls.length () > 1.  If so, use
    vect_adjust_loop_lens_control to set the controls based on S.

Then the only caller of vect_adjust_loop_lens_control is
vect_set_loop_condition_partial_vectors.  And the starting
step for vect_adjust_loop_lens_control is always S.

This patch has been well tested for single-rgroup and multiple-rgroup (SLP)
loops and passes all test cases in the RISC-V port.

It also passes the tests for multiple-rgroup (non-SLP) loops, tested on
vec_pack_trunc.

---
 gcc/tree-vect-loop-manip.cc | 178 +++++++++++++++++++++++++++++++++---
 gcc/tree-vect-loop.cc       |  13 +++
 gcc/tree-vectorizer.h       |  12 +++
 3 files changed, 192 insertions(+), 11 deletions(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index ff6159e08d5..578ac5b783e 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -468,6 +468,38 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   gimple_stmt_iterator incr_gsi;
   bool insert_after;
   standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
+    {
+      /* single rgroup:
+         ...
+         _10 = (unsigned long) count_12(D);
+         ...
+         # ivtmp_9 = PHI
+         _36 = MIN_EXPR ;
+         ...
+         vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
+         ...
+         ivtmp_35 = ivtmp_9 - _36;
+         ...
+         if (ivtmp_35 != 0)
+           goto ; [83.33%]
+         else
+           goto ; [16.67%]
+      */
+      nitems_total = gimple_convert (preheader_seq, iv_type, nitems_total);
+      tree step = rgc->controls.length () == 1 ? rgc->controls[0]
+                                               : make_ssa_name (iv_type);
+      /* Create decrement IV.  */
+      create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
+                 insert_after, &index_before_incr, &index_after_incr);
+      gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
+                                                             index_before_incr,
+                                                             nitems_step));
+      LOOP_VINFO_DECREMENTING_IV_STEP (loop_vinfo) = step;
+      return index_after_incr;
+    }
+
+  /* Create increment IV.  */
   create_iv (build_int_cst (iv_type, 0), PLUS_EXPR, nitems_step, NULL_TREE,
              loop, &incr_gsi, insert_after, &index_before_incr,
              &index_after_incr);
@@ -683,6 +715,63 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   return next_ctrl;
 }

+/* Adjust the loop lens for multiple rgroups.
+
+     _36 = MIN_EXPR ;
+
+     First length (MIN (X, VF/N)):
+       loop_len_15 = MIN_EXPR <_36, VF/N>;
+
+     Second length:
+       tmp = _36 - loop_len_15;
+       loop_len_16 = MIN (tmp, VF/N);
+
+     Third length:
+       tmp2 = tmp - loop_len_16;
+       loop_len_17 = MIN (tmp2, VF/N);
+
+     Last length:
+       loop_len_18 = tmp2 - loop_len_17;
+*/
+
+static void
+vect_adjust_loop_lens_control (tree iv_type, gimple_seq *seq,
+                               rgroup_controls *dest_rgm, tree step)
+{
+  tree ctrl_type = dest_rgm->type;
+  poly_uint64 nitems_per_ctrl
+    = TYPE_VECTOR_SUBPARTS (ctrl_type) * dest_rgm->factor;
+  tree length_limit = build_int_cst (iv_type, nitems_per_ctrl);
+
+  for (unsigned int i = 0; i < dest_rgm->controls.length (); ++i)
+    {
+      tree ctrl = dest_rgm->controls[i];
+      if (i == 0)
+        {
+          /* First iteration: MIN (X, VF/N) capped to the range [0, VF/N].  */
+          gassign *assign
+            = gimple_build_assign (ctrl, MIN_EXPR, step, length_limit);
+          gimple_seq_add_stmt (seq, assign);
+        }
+      else if (i == dest_rgm->controls.length () - 1)
+        {
+          /* Last iteration: the remainder, capped to the range [0, VF/N].  */
+          gassign *assign = gimple_build_assign (ctrl, MINUS_EXPR, step,
+                                                 dest_rgm->controls[i - 1]);
+          gimple_seq_add_stmt (seq, assign);
+        }
+      else
+        {
+          /* (MIN (remain, VF*I/N)) capped to the range [0, VF/N].  */
+          step = gimple_build (seq, MINUS_EXPR, iv_type, step,
+                               dest_rgm->controls[i - 1]);
+          gassign *assign
+            = gimple_build_assign (ctrl, MIN_EXPR, step, length_limit);
+          gimple_seq_add_stmt (seq, assign);
+        }
+    }
+}
+
 /* Set up the iteration condition and rgroup controls for LOOP, given
    that LOOP_VINFO_USING_PARTIAL_VECTORS_P is true for the vectorized
    loop.  LOOP_VINFO describes the vectorization of LOOP.  NITERS is
@@ -753,17 +842,84 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
           continue;
         }

-      /* See whether zero-based IV would ever generate all-false masks
-         or zero length before wrapping around.  */
-      bool might_wrap_p = vect_rgroup_iv_might_wrap_p (loop_vinfo, rgc);
-
-      /* Set up all controls for this group.  */
-      test_ctrl = vect_set_loop_controls_directly (loop, loop_vinfo,
-                                                   &preheader_seq,
-                                                   &header_seq,
-                                                   loop_cond_gsi, rgc,
-                                                   niters, niters_skip,
-                                                   might_wrap_p);
+      if (!LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
+          || !LOOP_VINFO_DECREMENTING_IV_STEP (loop_vinfo))
+        {
+          /* See whether zero-based IV would ever generate all-false masks
+             or zero length before wrapping around.  */
+          bool might_wrap_p = vect_rgroup_iv_might_wrap_p (loop_vinfo, rgc);
+
+          /* Set up all controls for this group.  */
+          test_ctrl
+            = vect_set_loop_controls_directly (loop, loop_vinfo,
+                                               &preheader_seq, &header_seq,
+                                               loop_cond_gsi, rgc, niters,
+                                               niters_skip, might_wrap_p);
+        }
+
+      /* With a decrementing IV, run vect_set_loop_controls_directly only once.  */
+      if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
+          && rgc->controls.length () > 1)
+        {
+          /*
+             - Multiple rgroup (SLP):
+             ...
+             _38 = (unsigned long) bnd.7_29;
+             _39 = _38 * 2;
+             ...
+             # ivtmp_41 = PHI
+             ...
+             _43 = MIN_EXPR ;
+             loop_len_26 = MIN_EXPR <_43, 16>;
+             loop_len_25 = _43 - loop_len_26;
+             ...
+             .LEN_STORE (_6, 8B, loop_len_26, ...);
+             ...
+             .LEN_STORE (_25, 8B, loop_len_25, ...);
+             _33 = loop_len_26 / 2;
+             ...
+             .LEN_STORE (_8, 16B, _33, ...);
+             _36 = loop_len_25 / 2;
+             ...
+             .LEN_STORE (_15, 16B, _36, ...);
+             ivtmp_42 = ivtmp_41 - _43;
+             ...
+
+             - Multiple rgroup (non-SLP):
+             ...
+             _38 = (unsigned long) n_12(D);
+             ...
+             # ivtmp_38 = PHI
+             ...
+             _40 = MIN_EXPR ;
+             loop_len_21 = MIN_EXPR <_40, POLY_INT_CST [2, 2]>;
+             _41 = _40 - loop_len_21;
+             loop_len_20 = MIN_EXPR <_41, POLY_INT_CST [2, 2]>;
+             _42 = _40 - loop_len_20;
+             loop_len_19 = MIN_EXPR <_42, POLY_INT_CST [2, 2]>;
+             _43 = _40 - loop_len_19;
+             loop_len_16 = MIN_EXPR <_43, POLY_INT_CST [2, 2]>;
+             ...
+             vect__4.8_15 = .LEN_LOAD (_6, 64B, loop_len_21, 0);
+             ...
+             vect__4.9_8 = .LEN_LOAD (_13, 64B, loop_len_20, 0);
+             ...
+             vect__4.10_28 = .LEN_LOAD (_46, 64B, loop_len_19, 0);
+             ...
+             vect__4.11_30 = .LEN_LOAD (_49, 64B, loop_len_16, 0);
+             vect__7.13_31 = VEC_PACK_TRUNC_EXPR <...>;
+             vect__7.13_32 = VEC_PACK_TRUNC_EXPR <...>;
+             vect__7.12_33 = VEC_PACK_TRUNC_EXPR <...>;
+             ...
+             .LEN_STORE (_14, 16B, _40, vect__7.12_33, 0);
+             ivtmp_39 = ivtmp_38 - _40;
+             ...
+          */
+          tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
+          tree step = LOOP_VINFO_DECREMENTING_IV_STEP (loop_vinfo);
+          gcc_assert (step);
+          vect_adjust_loop_lens_control (iv_type, &header_seq, rgc, step);
+        }
     }

   /* Emit all accumulated statements.  */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index cf10132b0bf..456f50fa7cc 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -973,6 +973,8 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
     vectorizable (false),
     can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
     using_partial_vectors_p (false),
+    using_decrementing_iv_p (false),
+    decrementing_iv_step (NULL_TREE),
     epil_using_partial_vectors_p (false),
     partial_load_store_bias (0),
     peeling_for_gaps (false),
@@ -2725,6 +2727,17 @@ start_over:
       && !vect_verify_loop_lens (loop_vinfo))
     LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;

+  /* If we're vectorizing a loop that uses length "controls" and
+     can iterate more than once, we apply the decrementing IV approach
+     in loop control.  */
+  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
+      && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
+      && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
+      && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+           && known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
+                        LOOP_VINFO_VECT_FACTOR (loop_vinfo))))
+    LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
+
   /* If we're vectorizing an epilogue loop, the vectorized loop either needs
      to be able to handle fewer than VF scalars, or needs to have a lower VF
      than the main loop.  */
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 02d2ad6fba1..7ed079f543a 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -818,6 +818,16 @@ public:
      the vector loop can handle fewer than VF scalars.  */
   bool using_partial_vectors_p;

+  /* True if we've decided to use a decrementing loop control IV that counts
+     scalars.  This can be done for any loop that:
+
+        (a) uses length "controls"; and
+        (b) can iterate more than once.  */
+  bool using_decrementing_iv_p;
+
+  /* The variable amount step for decrement IV.  */
+  tree decrementing_iv_step;
+
   /* True if we've decided to use partially-populated vectors for the
      epilogue of loop.  */
   bool epil_using_partial_vectors_p;
@@ -890,6 +900,8 @@ public:
 #define LOOP_VINFO_VECTORIZABLE_P(L)       (L)->vectorizable
 #define LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P(L) (L)->can_use_partial_vectors_p
 #define LOOP_VINFO_USING_PARTIAL_VECTORS_P(L) (L)->using_partial_vectors_p
+#define LOOP_VINFO_USING_DECREMENTING_IV_P(L) (L)->using_decrementing_iv_p
+#define LOOP_VINFO_DECREMENTING_IV_STEP(L) (L)->decrementing_iv_step
 #define LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P(L)                             \
   (L)->epil_using_partial_vectors_p
 #define LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS(L) (L)->partial_load_store_bias
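
Illustration (not part of the patch): the length computation described in
the commit message can be modelled on plain integers.  This is only a sketch
under stated assumptions, not GCC code.  The names split_lengths and run_loop
are hypothetical, VF is treated as a fixed integer that is divisible by the
number of controls, and the total number of scalars is assumed to be nonzero.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of vect_adjust_loop_lens_control: split STEP
// (the number of scalars processed in this iteration, "S" above) across
// NCTRLS length controls, each holding at most LIMIT items.
std::vector<uint64_t>
split_lengths (uint64_t step, uint64_t limit, unsigned nctrls)
{
  std::vector<uint64_t> lens (nctrls);
  for (unsigned i = 0; i < nctrls; ++i)
    {
      if (i == 0)
        // First control: MIN (S, VF/N).
        lens[i] = std::min (step, limit);
      else if (i == nctrls - 1)
        // Last control: whatever remains after the previous control.
        lens[i] = step - lens[i - 1];
      else
        {
          // Middle controls: subtract the previous length, then cap.
          step -= lens[i - 1];
          lens[i] = std::min (step, limit);
        }
    }
  return lens;
}

// Hypothetical model of the decrementing IV: the IV starts at TOTAL, each
// iteration handles step = MIN (iv, VF) scalars, and the loop exits once
// the IV reaches zero.  Returns the per-iteration control lengths.
// Assumes TOTAL > 0 and VF % NCTRLS == 0.
std::vector<std::vector<uint64_t>>
run_loop (uint64_t total, uint64_t vf, unsigned nctrls)
{
  std::vector<std::vector<uint64_t>> trace;
  uint64_t iv = total;
  do
    {
      uint64_t step = std::min (iv, vf);
      trace.push_back (split_lengths (step, vf / nctrls, nctrls));
      iv -= step;
    }
  while (iv != 0);
  return trace;
}
```

With total = 10, VF = 8 and two controls, the first iteration yields the
lengths {4, 4} and the second yields {2, 0}; the IV goes 10, 2, 0.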