From patchwork Fri Jul 28 07:10:39 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai"
X-Patchwork-Id: 127362
From: Juzhe-Zhong
To: gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com, rguenther@suse.de, Juzhe-Zhong
Subject: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
Date: Fri, 28 Jul 2023 15:10:39 +0800
Message-Id: <20230728071039.107552-1-juzhe.zhong@rivai.ai>
X-Mailer: git-send-email 2.36.1

Hi, Richard and Richi.

Based on the suggestions from Richard:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625396.html

This patch takes approach (1) that Richard proposed, meaning:

RVV implements cond_* optabs as expanders.  RVV therefore supports
both IFN_COND_ADD and IFN_COND_LEN_ADD.  No dummy length arguments
are needed at the gimple level.

This approach makes the code much cleaner and more reasonable.

Consider the following case:

void foo (float * __restrict a, float * __restrict b, int * __restrict cond, int n)
{
  for (int i = 0; i < n; i++)
    if (cond[i])
      a[i] = b[i] + a[i];
}

Output of RISC-V (32-bits) gcc (trunk):
:5:21: missed: couldn't vectorize loop
:5:21: missed: not vectorized: control flow in loop.

ARM SVE:

...
mask__27.10_51 = vect__4.9_49 != { 0, ... };
...
vec_mask_and_55 = loop_mask_49 & mask__27.10_51;
...
vect__9.17_62 = .COND_ADD (vec_mask_and_55, vect__6.13_56, vect__8.16_60, vect__6.13_56);

For RVV, we want IR as follows:

...
_68 = .SELECT_VL (ivtmp_66, POLY_INT_CST [4, 4]);
...
mask__27.10_51 = vect__4.9_49 != { 0, ... };
...
vect__9.17_60 = .COND_LEN_ADD (mask__27.10_51, vect__6.13_55, vect__8.16_59, vect__6.13_55, _68, 0);
...

Both the len and the mask of COND_LEN_ADD are real, not dummy.

This patch has been fully tested on the RISC-V port with support for both
COND_* and COND_LEN_*.

Bootstrap and regression testing on x86 also passed.

OK for trunk?

gcc/ChangeLog:

        * internal-fn.cc (FOR_EACH_LEN_FN_PAIR): New macro.
        (get_len_internal_fn): New function.
        (CASE): Ditto.
        * internal-fn.h (get_len_internal_fn): Ditto.
        * tree-vect-stmts.cc (vectorizable_call): Support CALL vectorization
        with COND_LEN_*.

---
 gcc/internal-fn.cc     | 46 ++++++++++++++++++++++
 gcc/internal-fn.h      |  1 +
 gcc/tree-vect-stmts.cc | 87 +++++++++++++++++++++++++++++++++++++-----
 3 files changed, 125 insertions(+), 9 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 8e294286388..379220bebc7 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4443,6 +4443,52 @@ get_conditional_internal_fn (internal_fn fn)
     }
 }
 
+/* Invoke T(IFN) for each internal function IFN that also has an
+   IFN_COND_LEN_* or IFN_MASK_LEN_* form.  */
+#define FOR_EACH_LEN_FN_PAIR(T) \
+  T (MASK_LOAD, MASK_LEN_LOAD) \
+  T (MASK_STORE, MASK_LEN_STORE) \
+  T (MASK_GATHER_LOAD, MASK_LEN_GATHER_LOAD) \
+  T (MASK_SCATTER_STORE, MASK_LEN_SCATTER_STORE) \
+  T (COND_ADD, COND_LEN_ADD) \
+  T (COND_SUB, COND_LEN_SUB) \
+  T (COND_MUL, COND_LEN_MUL) \
+  T (COND_DIV, COND_LEN_DIV) \
+  T (COND_MOD, COND_LEN_MOD) \
+  T (COND_RDIV, COND_LEN_RDIV) \
+  T (COND_FMIN, COND_LEN_FMIN) \
+  T (COND_FMAX, COND_LEN_FMAX) \
+  T (COND_MIN, COND_LEN_MIN) \
+  T (COND_MAX, COND_LEN_MAX) \
+  T (COND_AND, COND_LEN_AND) \
+  T (COND_IOR, COND_LEN_IOR) \
+  T (COND_XOR, COND_LEN_XOR) \
+  T (COND_SHL, COND_LEN_SHL) \
+  T (COND_SHR, COND_LEN_SHR) \
+  T (COND_NEG, COND_LEN_NEG) \
+  T (COND_FMA, COND_LEN_FMA) \
+  T (COND_FMS, COND_LEN_FMS) \
+  T (COND_FNMA, COND_LEN_FNMA) \
+  T (COND_FNMS, COND_LEN_FNMS)
+
+/* If there exists an internal function like IFN that operates on vectors,
+   but with additional length and bias parameters, return the internal_fn
+   for that function, otherwise return IFN_LAST.  */
+internal_fn
+get_len_internal_fn (internal_fn fn)
+{
+  switch (fn)
+    {
+#define CASE(NAME, LEN_NAME) \
+  case IFN_##NAME: \
+    return IFN_##LEN_NAME;
+      FOR_EACH_LEN_FN_PAIR (CASE)
+#undef CASE
+    default:
+      return IFN_LAST;
+    }
+}
+
 /* If IFN implements the conditional form of an unconditional internal
    function, return that unconditional function, otherwise return
    IFN_LAST.  */
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index a5c3f4765ff..410c1b623d6 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -224,6 +224,7 @@ extern bool set_edom_supported_p (void);
 
 extern internal_fn get_conditional_internal_fn (tree_code);
 extern internal_fn get_conditional_internal_fn (internal_fn);
+extern internal_fn get_len_internal_fn (internal_fn);
 extern internal_fn get_conditional_len_internal_fn (tree_code);
 extern tree_code conditional_internal_fn_code (internal_fn);
 extern internal_fn get_unconditional_internal_fn (internal_fn);
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 6a4e8fce126..ae5b0b09c08 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3540,7 +3540,10 @@ vectorizable_call (vec_info *vinfo,
 
   int reduc_idx = STMT_VINFO_REDUC_IDX (stmt_info);
   internal_fn cond_fn = get_conditional_internal_fn (ifn);
+  internal_fn cond_len_fn = get_len_internal_fn (ifn);
+  int len_opno = internal_fn_len_index (cond_len_fn);
   vec_loop_masks *masks = (loop_vinfo ? &LOOP_VINFO_MASKS (loop_vinfo) : NULL);
+  vec_loop_lens *lens = (loop_vinfo ? &LOOP_VINFO_LENS (loop_vinfo) : NULL);
   if (!vec_stmt) /* transformation not required.  */
     {
       if (slp_node)
@@ -3586,8 +3589,14 @@ vectorizable_call (vec_info *vinfo,
               tree scalar_mask = NULL_TREE;
               if (mask_opno >= 0)
                 scalar_mask = gimple_call_arg (stmt_info->stmt, mask_opno);
-              vect_record_loop_mask (loop_vinfo, masks, nvectors,
-                                     vectype_out, scalar_mask);
+              if (cond_len_fn != IFN_LAST
+                  && direct_internal_fn_supported_p (cond_len_fn, vectype_out,
+                                                     OPTIMIZE_FOR_SPEED))
+                vect_record_loop_len (loop_vinfo, lens, nvectors, vectype_out,
+                                      1);
+              else
+                vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype_out,
+                                       scalar_mask);
             }
         }
       return true;
@@ -3603,8 +3612,24 @@ vectorizable_call (vec_info *vinfo,
   vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
 
   bool masked_loop_p = loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+  bool len_loop_p = loop_vinfo && LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo);
   unsigned int vect_nargs = nargs;
-  if (masked_loop_p && reduc_idx >= 0)
+  if (len_loop_p)
+    {
+      if (len_opno >= 0)
+        {
+          ifn = cond_len_fn;
+          /* COND_* -> COND_LEN_* takes 2 extra arguments: LEN, BIAS.  */
+          vect_nargs += 2;
+        }
+      else if (reduc_idx >= 0)
+        {
+          /* FMA -> COND_LEN_FMA takes 4 extra arguments: MASK, ELSE, LEN, BIAS.  */
+          ifn = get_len_internal_fn (cond_fn);
+          vect_nargs += 4;
+        }
+    }
+  else if (masked_loop_p && reduc_idx >= 0)
     {
       ifn = cond_fn;
       vect_nargs += 2;
@@ -3629,7 +3654,18 @@ vectorizable_call (vec_info *vinfo,
               FOR_EACH_VEC_ELT (vec_oprnds0, i, vec_oprnd0)
                 {
                   int varg = 0;
-                  if (masked_loop_p && reduc_idx >= 0)
+                  if (len_loop_p && reduc_idx >= 0)
+                    {
+                      /* Always true for SLP.  */
+                      gcc_assert (ncopies == 1);
+                      /* For COND_LEN_* operations used by reduction of
+                         CALL vectorization, the LEN argument is the real
+                         loop len produced by SELECT_VL or MIN whereas the
+                         MASK argument here is the dummy mask.  */
+                      vargs[varg++]
+                        = build_minus_one_cst (truth_type_for (vectype_out));
+                    }
+                  else if (masked_loop_p && reduc_idx >= 0)
                     {
                       unsigned int vec_num = vec_oprnds0.length ();
                       /* Always true for SLP.  */
@@ -3644,7 +3680,7 @@ vectorizable_call (vec_info *vinfo,
                       vec<tree> vec_oprndsk = vec_defs[k];
                       vargs[varg++] = vec_oprndsk[i];
                     }
-                  if (masked_loop_p && reduc_idx >= 0)
+                  if ((masked_loop_p || len_loop_p) && reduc_idx >= 0)
                     vargs[varg++] = vargs[reduc_idx + 1];
                   gimple *new_stmt;
                   if (modifier == NARROW)
@@ -3671,7 +3707,21 @@ vectorizable_call (vec_info *vinfo,
                     }
                   else
                     {
-                      if (mask_opno >= 0 && masked_loop_p)
+                      if (len_opno >= 0 && len_loop_p)
+                        {
+                          unsigned int vec_num = vec_oprnds0.length ();
+                          /* Always true for SLP.  */
+                          gcc_assert (ncopies == 1);
+                          tree len
+                            = vect_get_loop_len (loop_vinfo, gsi, lens, vec_num,
+                                                 vectype_out, i, 1);
+                          signed char biasval
+                            = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+                          tree bias = build_int_cst (intQI_type_node, biasval);
+                          vargs[len_opno] = len;
+                          vargs[len_opno + 1] = bias;
+                        }
+                      else if (mask_opno >= 0 && masked_loop_p)
                         {
                           unsigned int vec_num = vec_oprnds0.length ();
                           /* Always true for SLP.  */
@@ -3701,7 +3751,16 @@ vectorizable_call (vec_info *vinfo,
             }
 
           int varg = 0;
-          if (masked_loop_p && reduc_idx >= 0)
+          if (len_loop_p && reduc_idx >= 0)
+            {
+              /* For COND_LEN_* operations used by reduction of
+                 CALL vectorization, the LEN argument is the real
+                 loop len produced by SELECT_VL or MIN whereas the
+                 MASK argument here is the dummy mask.  */
+              vargs[varg++]
+                = build_minus_one_cst (truth_type_for (vectype_out));
+            }
+          else if (masked_loop_p && reduc_idx >= 0)
             vargs[varg++] = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies,
                                                 vectype_out, j);
           for (i = 0; i < nargs; i++)
@@ -3716,10 +3775,20 @@ vectorizable_call (vec_info *vinfo,
                 }
               vargs[varg++] = vec_defs[i][j];
             }
-          if (masked_loop_p && reduc_idx >= 0)
+          if ((masked_loop_p || len_loop_p) && reduc_idx >= 0)
             vargs[varg++] = vargs[reduc_idx + 1];
 
-          if (mask_opno >= 0 && masked_loop_p)
+          if (len_opno >= 0 && len_loop_p)
+            {
+              tree len = vect_get_loop_len (loop_vinfo, gsi, lens, ncopies,
+                                            vectype_out, j, 1);
+              signed char biasval
+                = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+              tree bias = build_int_cst (intQI_type_node, biasval);
+              vargs[len_opno] = len;
+              vargs[len_opno + 1] = bias;
+            }
+          else if (mask_opno >= 0 && masked_loop_p)
             {
               tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies,
                                               vectype_out, j);