From patchwork Fri Aug 11 06:38:17 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai"
X-Patchwork-Id: 134334
From: juzhe.zhong@rivai.ai
To: gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com, rguenther@suse.de, Ju-Zhe Zhong
Subject: [PATCH V3] VECT: Support loop len control on EXTRACT_LAST vectorization
Date: Fri, 11 Aug 2023 14:38:17 +0800
Message-Id: <20230811063817.491547-1-juzhe.zhong@rivai.ai>

From: Ju-Zhe Zhong

Hi, Richard and Richi.

This patch adds support for live vectorization using VEC_EXTRACT with LEN
loop control.

Consider the following case:

#include <stdint.h>

#define EXTRACT_LAST(TYPE)                      \
  TYPE __attribute__ ((noinline, noclone))      \
  test_##TYPE (TYPE *x, int n, TYPE value)      \
  {                                             \
    TYPE last;                                  \
    for (int j = 0; j < n; ++j)                 \
      {                                         \
        last = x[j];                            \
        x[j] = last * value;                    \
      }                                         \
    return last;                                \
  }

#define TEST_ALL(T)                             \
  T (uint8_t)

TEST_ALL (EXTRACT_LAST)

ARM SVE IR:

Preheader:
  max_mask_34 = .WHILE_ULT (0, bnd.5_6, { 0, ... });

Loop:
  ...
  # loop_mask_22 = PHI ...
  vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_mask_22);
  vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
  .MASK_STORE (_7, 8B, loop_mask_22, vect__4.9_27);
  ...
  next_mask_35 = .WHILE_ULT (_1, bnd.5_6, { 0, ... });
  ...

Epilogue:
  _25 = .EXTRACT_LAST (loop_mask_22, vect_last_12.8_23);

RVV prefers a length (LEN) for loop control, so after this patch the RVV IR
is:

Loop:
  ...
  loop_len_22 = SELECT_VL;
  vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_len_22);
  vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
  .MASK_STORE (_7, 8B, loop_len_22, vect__4.9_27);
  ...

Epilogue:
  _25 = .VEC_EXTRACT (vect_last_12.8_23, loop_len_22 + bias - 1);

Details of this approach:

1. Step 1 - Add 'vect_can_vectorize_extract_last_with_len_p' to enable live
   vectorization for LEN loop control.

   This function checks whether the target supports:
   - Using LEN as the loop control.
   - The VEC_EXTRACT optab.

2. Step 2 - Record LEN for loop control if
   'vect_can_vectorize_extract_last_with_len_p' is true.

3. Step 3 - Generate VEC_EXTRACT (v, LEN + BIAS - 1).

The only difference between mask and len is that the len variant uses the
length generated by SELECT_VL and the VEC_EXTRACT pattern.  The rest of the
live vectorization is the same as for ARM SVE.

Bootstrap and regression testing on X86 passed.  Tested on ARM QEMU.
Ok for trunk?

gcc/ChangeLog:

        * tree-vect-loop.cc (vect_can_vectorize_extract_last_with_len_p): New function.
        (vectorizable_live_operation): Add loop len control.

---
 gcc/tree-vect-loop.cc | 76 +++++++++++++++++++++++++++++++++++++++----
 1 file changed, 70 insertions(+), 6 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index bf8d677b584..809b73b966c 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8963,6 +8963,27 @@ vect_can_vectorize_without_simd_p (code_helper code)
           && vect_can_vectorize_without_simd_p (tree_code (code)));
 }
 
+/* Return true if target supports extract last vectorization with LEN.  */
+
+static bool
+vect_can_vectorize_extract_last_with_len_p (tree vectype)
+{
+  /* Return false if target doesn't support LEN in loop control.  */
+  machine_mode vmode;
+  machine_mode vec_mode = TYPE_MODE (vectype);
+  if (!VECTOR_MODE_P (vec_mode))
+    return false;
+  if (!get_len_load_store_mode (vec_mode, true).exists (&vmode)
+      || !get_len_load_store_mode (vec_mode, false).exists (&vmode))
+    return false;
+
+  /* Target need to support VEC_EXTRACT to extract the last active element.  */
+  return convert_optab_handler (vec_extract_optab,
+                                vec_mode,
+                                TYPE_MODE (TREE_TYPE (vectype)))
+         != CODE_FOR_nothing;
+}
+
 /* Create vector init for vectorized iv.  */
 static tree
 vect_create_nonlinear_iv_init (gimple_seq* stmts, tree init_expr,
@@ -10279,7 +10300,8 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
       if (loop_vinfo && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
         {
           if (!direct_internal_fn_supported_p (IFN_EXTRACT_LAST, vectype,
-                                               OPTIMIZE_FOR_SPEED))
+                                               OPTIMIZE_FOR_SPEED)
+              && !vect_can_vectorize_extract_last_with_len_p (vectype))
             {
               if (dump_enabled_p ())
                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -10308,9 +10330,14 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
           else
             {
               gcc_assert (ncopies == 1 && !slp_node);
-              vect_record_loop_mask (loop_vinfo,
-                                     &LOOP_VINFO_MASKS (loop_vinfo),
-                                     1, vectype, NULL);
+              if (vect_can_vectorize_extract_last_with_len_p (vectype))
+                vect_record_loop_len (loop_vinfo,
+                                      &LOOP_VINFO_LENS (loop_vinfo),
+                                      1, vectype, 1);
+              else
+                vect_record_loop_mask (loop_vinfo,
+                                       &LOOP_VINFO_MASKS (loop_vinfo),
+                                       1, vectype, NULL);
             }
         }
       /* ??? Enable for loop costing as well.  */
@@ -10336,7 +10363,9 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
   gimple *vec_stmt;
   if (slp_node)
     {
-      gcc_assert (!loop_vinfo || !LOOP_VINFO_FULLY_MASKED_P (loop_vinfo));
+      gcc_assert (!loop_vinfo
+                  || (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
+                      && !LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo)));
 
       /* Get the correct slp vectorized stmt.  */
       vec_lhs = SLP_TREE_VEC_DEFS (slp_node)[vec_entry];
@@ -10380,7 +10409,42 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
 
       gimple_seq stmts = NULL;
       tree new_tree;
-      if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+      if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+        {
+          /* Emit:
+
+               SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>
+
+             where VEC_LHS is the vectorized live-out result and MASK is
+             the loop mask for the final iteration.  */
+          gcc_assert (ncopies == 1 && !slp_node);
+          gimple_seq tem = NULL;
+          gimple_stmt_iterator gsi = gsi_last (tem);
+          tree len
+            = vect_get_loop_len (loop_vinfo, &gsi,
+                                 &LOOP_VINFO_LENS (loop_vinfo),
+                                 1, vectype, 0, 0);
+
+          /* BIAS - 1.  */
+          signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+          tree bias_minus_one
+            = int_const_binop (MINUS_EXPR,
+                               build_int_cst (TREE_TYPE (len), biasval),
+                               build_one_cst (TREE_TYPE (len)));
+
+          /* LAST_INDEX = LEN + (BIAS - 1).  */
+          tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
+                                          len, bias_minus_one);
+
+          /* SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>.  */
+          tree scalar_res
+            = gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
+                            vec_lhs_phi, last_index);
+
+          /* Convert the extracted vector element to the scalar type.  */
+          new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
+        }
+      else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
         {
           /* Emit: