From patchwork Thu Aug 10 07:49:09 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai"
X-Patchwork-Id: 133736
From: juzhe.zhong@rivai.ai
To: gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com, rguenther@suse.de, Ju-Zhe Zhong
Subject: [PATCH V2] VECT: Support loop len control on EXTRACT_LAST vectorization
Date: Thu, 10 Aug 2023 15:49:09 +0800
Message-Id: <20230810074909.492039-1-juzhe.zhong@rivai.ai>
X-Mailer: git-send-email 2.36.3

From: Ju-Zhe Zhong

Hi, Richard and Richi.

This patch adds support for live vectorization by VEC_EXTRACT for LEN loop control.

Consider the following case:

#include <stdint.h>

#define EXTRACT_LAST(TYPE)                      \
  TYPE __attribute__ ((noinline, noclone))      \
  test_##TYPE (TYPE *x, int n, TYPE value)      \
  {                                             \
    TYPE last;                                  \
    for (int j = 0; j < n; ++j)                 \
      {                                         \
        last = x[j];                            \
        x[j] = last * value;                    \
      }                                         \
    return last;                                \
  }

#define TEST_ALL(T)                             \
  T (uint8_t)

TEST_ALL (EXTRACT_LAST)

ARM SVE IR:

Preheader:
  max_mask_34 = .WHILE_ULT (0, bnd.5_6, { 0, ... });

Loop:
  ...
  # loop_mask_22 = PHI ...
  vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_mask_22);
  vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
  .MASK_STORE (_7, 8B, loop_mask_22, vect__4.9_27);
  ...
  next_mask_35 = .WHILE_ULT (_1, bnd.5_6, { 0, ... });
  ...

Epilogue:
  _25 = .EXTRACT_LAST (loop_mask_22, vect_last_12.8_23);

For RVV, since we prefer LEN in loop control, this patch generates:

Loop:
  ...
  loop_len_22 = SELECT_VL;
  vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_len_22);
  vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
  .MASK_STORE (_7, 8B, loop_len_22, vect__4.9_27);
  ...

Epilogue:
  _25 = .VEC_EXTRACT (vect_last_12.8_23, loop_len_22 - 1 - bias);

Details of this approach:

1. Step 1 - Add 'vect_can_vectorize_extract_last_with_len_p' to enable live
   vectorization for LEN loop control.

   This function checks whether the target supports:
   - Using LEN as the loop control.
   - The VEC_EXTRACT optab.

2. Step 2 - Record LEN for loop control if
   'vect_can_vectorize_extract_last_with_len_p' is true.

3. Step 3 - Generate VEC_EXTRACT (v, LEN - 1 - BIAS).

NOTE: This patch sets 'vinfo->any_known_not_updated_vssa = true;' since the
original STMT is a simple assignment, whereas VEC_EXTRACT is neither a pure nor
a const function according to internal-fn.def:

  DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT, 0, vec_extract, vec_extract)

If we don't set 'vinfo->any_known_not_updated_vssa' to true, it will cause an
ICE in:

  if (need_ssa_update_p (cfun))
    {
      gcc_assert (loop_vinfo->any_known_not_updated_vssa);  ----> Report assertion fail here.
      fun->gimple_df->ssa_renaming_needed = false;
      todo |= TODO_update_ssa_only_virtuals;
    }

I saw there are 2 places that set 'vinfo->any_known_not_updated_vssa' to true:

- The first is in 'vectorizable_simd_clone_call':

  /* When the original call is pure or const but the SIMD ABI dictates
     an aggregate return we will have to use a virtual definition and
     in a loop eventually even need to add a virtual PHI.  That's
     not straight-forward so allow to fix this up via renaming.  */
  if (gimple_call_lhs (stmt)
      && !gimple_vdef (stmt)
      && TREE_CODE (TREE_TYPE (TREE_TYPE (bestn->decl))) == ARRAY_TYPE)
    vinfo->any_known_not_updated_vssa = true;

- The other is in 'vectorizable_load':

  if (memory_access_type == VMAT_LOAD_STORE_LANES)
    vinfo->any_known_not_updated_vssa = true;

It seems that they set it for the same reason as I do in
'vectorizable_live_operation'. Feel free to correct me if I am wrong.

Bootstrap and regression testing on X86 passed.

gcc/ChangeLog:

        * tree-vect-loop.cc (vect_can_vectorize_extract_last_with_len_p): New function.
        (vectorizable_live_operation): Add loop LEN control.

---
 gcc/tree-vect-loop.cc | 74 +++++++++++++++++++++++++++++++++++++++----
 1 file changed, 68 insertions(+), 6 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 00058c3c13e..208918f53fb 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8964,6 +8964,24 @@ vect_can_vectorize_without_simd_p (code_helper code)
           && vect_can_vectorize_without_simd_p (tree_code (code)));
 }
 
+/* Return true if target supports extract last vectorization with LEN.  */
+
+static bool
+vect_can_vectorize_extract_last_with_len_p (tree vectype)
+{
+  /* Return false if target doesn't support LEN in loop control.  */
+  machine_mode vmode;
+  if (!get_len_load_store_mode (TYPE_MODE (vectype), true).exists (&vmode)
+      || !get_len_load_store_mode (TYPE_MODE (vectype), false).exists (&vmode))
+    return false;
+
+  /* Target need to support VEC_EXTRACT to extract the last active element.  */
+  return convert_optab_handler (vec_extract_optab,
+                                TYPE_MODE (vectype),
+                                TYPE_MODE (TREE_TYPE (vectype)))
+         != CODE_FOR_nothing;
+}
+
 /* Create vector init for vectorized iv.  */
 static tree
 vect_create_nonlinear_iv_init (gimple_seq* stmts, tree init_expr,
@@ -10282,7 +10300,8 @@ vectorizable_live_operation (vec_info *vinfo,
       if (loop_vinfo && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
         {
           if (!direct_internal_fn_supported_p (IFN_EXTRACT_LAST, vectype,
-                                               OPTIMIZE_FOR_SPEED))
+                                               OPTIMIZE_FOR_SPEED)
+              && !vect_can_vectorize_extract_last_with_len_p (vectype))
             {
               if (dump_enabled_p ())
                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -10311,9 +10330,14 @@ vectorizable_live_operation (vec_info *vinfo,
           else
             {
               gcc_assert (ncopies == 1 && !slp_node);
-              vect_record_loop_mask (loop_vinfo,
-                                     &LOOP_VINFO_MASKS (loop_vinfo),
-                                     1, vectype, NULL);
+              if (vect_can_vectorize_extract_last_with_len_p (vectype))
+                vect_record_loop_len (loop_vinfo,
+                                      &LOOP_VINFO_LENS (loop_vinfo),
+                                      1, vectype, 1);
+              else
+                vect_record_loop_mask (loop_vinfo,
+                                       &LOOP_VINFO_MASKS (loop_vinfo),
+                                       1, vectype, NULL);
             }
         }
       /* ??? Enable for loop costing as well.  */
@@ -10339,7 +10363,9 @@ vectorizable_live_operation (vec_info *vinfo,
   gimple *vec_stmt;
   if (slp_node)
     {
-      gcc_assert (!loop_vinfo || !LOOP_VINFO_FULLY_MASKED_P (loop_vinfo));
+      gcc_assert (!loop_vinfo
+                  || (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
+                      && !LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo)));
 
       /* Get the correct slp vectorized stmt.  */
       vec_lhs = SLP_TREE_VEC_DEFS (slp_node)[vec_entry];
@@ -10383,7 +10409,43 @@ vectorizable_live_operation (vec_info *vinfo,
 
       gimple_seq stmts = NULL;
       tree new_tree;
-      if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+      if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+        {
+          /* Emit:
+
+               SCALAR_RES = VEC_EXTRACT
+
+             where VEC_LHS is the vectorized live-out result and MASK is
+             the loop mask for the final iteration.  */
+          gcc_assert (ncopies == 1 && !slp_node);
+          tree scalar_type = TREE_TYPE (STMT_VINFO_VECTYPE (stmt_info));
+          tree len
+            = vect_get_loop_len (loop_vinfo, gsi, &LOOP_VINFO_LENS (loop_vinfo),
+                                 1, vectype, 0, 0);
+
+          /* BIAS + 1.  */
+          signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+          tree bias_one
+            = size_binop (PLUS_EXPR, build_int_cst (TREE_TYPE (len), biasval),
+                          build_one_cst (TREE_TYPE (len)));
+
+          /* LAST_INDEX = LEN - (BIAS + 1).  */
+          tree last_index
+            = gimple_build (&stmts, MINUS_EXPR, TREE_TYPE (len), len, bias_one);
+
+          tree scalar_res = gimple_build (&stmts, CFN_VEC_EXTRACT, scalar_type,
+                                          vec_lhs_phi, last_index);
+
+          /* Convert the extracted vector element to the scalar type.  */
+          new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
+          /* When the original stmt is an assignment but VEC_EXTRACT is not pure
+             or const since it may return a memory result.  We will have to use
+             a virtual definition and in a loop eventually even need to add a
+             virtual PHI.  That's not straight-forward so allow to fix this up
+             via renaming.  */
+          vinfo->any_known_not_updated_vssa = true;
+        }
+      else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
         {
           /* Emit: