From patchwork Tue Aug 1 06:37:43 2023
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai"
X-Patchwork-Id: 129013
From: juzhe.zhong@rivai.ai
To: gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com, rguenther@suse.de, Ju-Zhe Zhong
Subject: [PATCH V3] VECT: Support CALL vectorization for COND_LEN_*
Date: Tue, 1 Aug 2023 14:37:43 +0800
Message-Id: <20230801063743.155666-1-juzhe.zhong@rivai.ai>

From: Ju-Zhe Zhong

Hi Richard and Richi,

Based on the suggestions from Richard:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625396.html

This patch takes approach (1) from Richard's suggestions, meaning:

RVV implements cond_* optabs as expanders.  RVV therefore supports
both IFN_COND_ADD and IFN_COND_LEN_ADD.  No dummy length arguments
are needed at the gimple level.

This approach makes the code much cleaner and more reasonable.
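For readers less familiar with the two internal functions, the difference can be modelled in scalar C. This is only a sketch, assuming the operand order seen in the .COND_ADD and .COND_LEN_ADD calls quoted in this message (mask, two inputs, else value, then len and bias for the LEN form) and assuming that lanes at or beyond len + bias take the else value. The names cond_add and cond_len_add and the nunits parameter are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scalar model of .COND_ADD (mask, a, b, els).
   There is no length operand, so any loop mask must already have been
   ANDed into MASK by the caller.  NUNITS stands in for the lane count.  */
void
cond_add (float *dst, const bool *mask, const float *a, const float *b,
          const float *els, size_t nunits)
{
  for (size_t i = 0; i < nunits; i++)
    dst[i] = mask[i] ? a[i] + b[i] : els[i];
}

/* Hypothetical scalar model of .COND_LEN_ADD (mask, a, b, els, len, bias).
   A lane is computed only if it is inside the active length
   (i < len + bias) and enabled by MASK; all other lanes take ELS.  */
void
cond_len_add (float *dst, const bool *mask, const float *a, const float *b,
              const float *els, size_t nunits, size_t len, int bias)
{
  long active = (long) len + bias;
  /* Assumption of this sketch: the active length fits in the vector.  */
  assert (active >= 0 && (size_t) active <= nunits);
  for (size_t i = 0; i < nunits; i++)
    dst[i] = ((long) i < active && mask[i]) ? a[i] + b[i] : els[i];
}
```

With a = {1, 2, 3, 4}, b = {10, 20, 30, 40}, mask = {1, 0, 1, 1}, els = a, len = 3 and bias = 0, cond_len_add produces {11, 2, 33, 4}: lane 1 is masked off and lane 3 lies beyond the length.  cond_add gives the same result only if the caller first clears lane 3 in the mask, which is the role vec_mask_and plays in the SVE IR.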
Consider the following case:

void foo (float * __restrict a, float * __restrict b, int * __restrict cond, int n)
{
  for (int i = 0; i < n; i++)
    if (cond[i])
      a[i] = b[i] + a[i];
}

Output of RISC-V (32-bits) gcc (trunk) (Compiler #3)
:5:21: missed: couldn't vectorize loop
:5:21: missed: not vectorized: control flow in loop.

ARM SVE:

...
mask__27.10_51 = vect__4.9_49 != { 0, ... };
...
vec_mask_and_55 = loop_mask_49 & mask__27.10_51;
...
vect__9.17_62 = .COND_ADD (vec_mask_and_55, vect__6.13_56, vect__8.16_60, vect__6.13_56);

For RVV, we want IR as follows:

...
_68 = .SELECT_VL (ivtmp_66, POLY_INT_CST [4, 4]);
...
mask__27.10_51 = vect__4.9_49 != { 0, ... };
...
vect__9.17_60 = .COND_LEN_ADD (mask__27.10_51, vect__6.13_55, vect__8.16_59, vect__6.13_55, _68, 0);
...

Both the len and the mask of COND_LEN_ADD are real, not dummy.

This patch has been fully tested on the RISC-V port with support for
both COND_* and COND_LEN_*.

Bootstrap and regression testing on x86 also passed.

OK for trunk?

gcc/ChangeLog:

        * internal-fn.cc (get_len_internal_fn): New function.
        (DEF_INTERNAL_COND_FN): Ditto.
        (DEF_INTERNAL_SIGNED_COND_FN): Ditto.
        * internal-fn.h (get_len_internal_fn): Ditto.
        * tree-vect-stmts.cc (vectorizable_call): Add CALL auto-vectorization.

---
 gcc/internal-fn.cc     | 24 +++++++++++
 gcc/internal-fn.h      |  1 +
 gcc/tree-vect-stmts.cc | 90 +++++++++++++++++++++++++++++++++++++-----
 3 files changed, 106 insertions(+), 9 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 8e294286388..7f5ede00c02 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4443,6 +4443,30 @@ get_conditional_internal_fn (internal_fn fn)
     }
 }
 
+/* If there exists an internal function like IFN that operates on vectors,
+   but with additional length and bias parameters, return the internal_fn
+   for that function, otherwise return IFN_LAST.  */
+internal_fn
+get_len_internal_fn (internal_fn fn)
+{
+  switch (fn)
+    {
+#undef DEF_INTERNAL_COND_FN
+#undef DEF_INTERNAL_SIGNED_COND_FN
+#define DEF_INTERNAL_COND_FN(NAME, ...)                                        \
+  case IFN_COND_##NAME:                                                        \
+    return IFN_COND_LEN_##NAME;
+#define DEF_INTERNAL_SIGNED_COND_FN(NAME, ...)                                 \
+  case IFN_COND_##NAME:                                                        \
+    return IFN_COND_LEN_##NAME;
+#include "internal-fn.def"
+#undef DEF_INTERNAL_COND_FN
+#undef DEF_INTERNAL_SIGNED_COND_FN
+    default:
+      return IFN_LAST;
+    }
+}
+
 /* If IFN implements the conditional form of an unconditional internal
    function, return that unconditional function, otherwise return IFN_LAST.  */
 
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index a5c3f4765ff..410c1b623d6 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -224,6 +224,7 @@ extern bool set_edom_supported_p (void);
 
 extern internal_fn get_conditional_internal_fn (tree_code);
 extern internal_fn get_conditional_internal_fn (internal_fn);
+extern internal_fn get_len_internal_fn (internal_fn);
 extern internal_fn get_conditional_len_internal_fn (tree_code);
 extern tree_code conditional_internal_fn_code (internal_fn);
 extern internal_fn get_unconditional_internal_fn (internal_fn);
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 6a4e8fce126..97106b8c475 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3540,7 +3540,10 @@ vectorizable_call (vec_info *vinfo,
 
   int reduc_idx = STMT_VINFO_REDUC_IDX (stmt_info);
   internal_fn cond_fn = get_conditional_internal_fn (ifn);
+  internal_fn cond_len_fn = get_len_internal_fn (ifn);
+  int len_opno = internal_fn_len_index (cond_len_fn);
   vec_loop_masks *masks = (loop_vinfo ? &LOOP_VINFO_MASKS (loop_vinfo) : NULL);
+  vec_loop_lens *lens = (loop_vinfo ? &LOOP_VINFO_LENS (loop_vinfo) : NULL);
   if (!vec_stmt) /* transformation not required.  */
     {
       if (slp_node)
@@ -3569,6 +3572,9 @@ vectorizable_call (vec_info *vinfo,
           if (reduc_idx >= 0
               && (cond_fn == IFN_LAST
                   || !direct_internal_fn_supported_p (cond_fn, vectype_out,
+                                                      OPTIMIZE_FOR_SPEED))
+              && (cond_len_fn == IFN_LAST
+                  || !direct_internal_fn_supported_p (cond_len_fn, vectype_out,
                                                       OPTIMIZE_FOR_SPEED)))
             {
               if (dump_enabled_p ())
@@ -3586,8 +3592,14 @@ vectorizable_call (vec_info *vinfo,
               tree scalar_mask = NULL_TREE;
               if (mask_opno >= 0)
                 scalar_mask = gimple_call_arg (stmt_info->stmt, mask_opno);
-              vect_record_loop_mask (loop_vinfo, masks, nvectors,
-                                     vectype_out, scalar_mask);
+              if (cond_len_fn != IFN_LAST
+                  && direct_internal_fn_supported_p (cond_len_fn, vectype_out,
+                                                     OPTIMIZE_FOR_SPEED))
+                vect_record_loop_len (loop_vinfo, lens, nvectors, vectype_out,
+                                      1);
+              else
+                vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype_out,
+                                       scalar_mask);
             }
         }
       return true;
@@ -3603,8 +3615,24 @@ vectorizable_call (vec_info *vinfo,
   vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
 
   bool masked_loop_p = loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+  bool len_loop_p = loop_vinfo && LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo);
   unsigned int vect_nargs = nargs;
-  if (masked_loop_p && reduc_idx >= 0)
+  if (len_loop_p)
+    {
+      if (len_opno >= 0)
+        {
+          ifn = cond_len_fn;
+          /* COND_* -> COND_LEN_* takes 2 extra arguments:LEN,BIAS.  */
+          vect_nargs += 2;
+        }
+      else if (reduc_idx >= 0)
+        {
+          /* FMA -> COND_LEN_FMA takes 4 extra arguments:MASK,ELSE,LEN,BIAS.  */
+          ifn = get_len_internal_fn (cond_fn);
+          vect_nargs += 4;
+        }
+    }
+  else if (masked_loop_p && reduc_idx >= 0)
     {
       ifn = cond_fn;
       vect_nargs += 2;
@@ -3629,7 +3657,18 @@ vectorizable_call (vec_info *vinfo,
               FOR_EACH_VEC_ELT (vec_oprnds0, i, vec_oprnd0)
                 {
                   int varg = 0;
-                  if (masked_loop_p && reduc_idx >= 0)
+                  if (len_loop_p && reduc_idx >= 0)
+                    {
+                      /* Always true for SLP.  */
+                      gcc_assert (ncopies == 1);
+                      /* For COND_LEN_* operations used by reduction of
+                         CALL vectorization, the LEN argument is the real
+                         loop len produced by SELECT_VL or MIN whereas the
+                         MASK argument here is the dummy mask.  */
+                      vargs[varg++]
+                        = build_minus_one_cst (truth_type_for (vectype_out));
+                    }
+                  else if (masked_loop_p && reduc_idx >= 0)
                     {
                       unsigned int vec_num = vec_oprnds0.length ();
                       /* Always true for SLP.  */
@@ -3644,7 +3683,7 @@ vectorizable_call (vec_info *vinfo,
                       vec<tree> vec_oprndsk = vec_defs[k];
                       vargs[varg++] = vec_oprndsk[i];
                     }
-                  if (masked_loop_p && reduc_idx >= 0)
+                  if ((masked_loop_p || len_loop_p) && reduc_idx >= 0)
                     vargs[varg++] = vargs[reduc_idx + 1];
                   gimple *new_stmt;
                   if (modifier == NARROW)
@@ -3671,7 +3710,21 @@ vectorizable_call (vec_info *vinfo,
                     }
                   else
                     {
-                      if (mask_opno >= 0 && masked_loop_p)
+                      if (len_opno >= 0 && len_loop_p)
+                        {
+                          unsigned int vec_num = vec_oprnds0.length ();
+                          /* Always true for SLP.  */
+                          gcc_assert (ncopies == 1);
+                          tree len
+                            = vect_get_loop_len (loop_vinfo, gsi, lens, vec_num,
+                                                 vectype_out, i, 1);
+                          signed char biasval
+                            = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+                          tree bias = build_int_cst (intQI_type_node, biasval);
+                          vargs[len_opno] = len;
+                          vargs[len_opno + 1] = bias;
+                        }
+                      else if (mask_opno >= 0 && masked_loop_p)
                         {
                           unsigned int vec_num = vec_oprnds0.length ();
                           /* Always true for SLP.  */
@@ -3701,7 +3754,16 @@ vectorizable_call (vec_info *vinfo,
             }
 
           int varg = 0;
-          if (masked_loop_p && reduc_idx >= 0)
+          if (len_loop_p && reduc_idx >= 0)
+            {
+              /* For COND_LEN_* operations used by reduction of
+                 CALL vectorization, the LEN argument is the real
+                 loop len produced by SELECT_VL or MIN whereas the
+                 MASK argument here is the dummy mask.  */
+              vargs[varg++]
+                = build_minus_one_cst (truth_type_for (vectype_out));
+            }
+          else if (masked_loop_p && reduc_idx >= 0)
             vargs[varg++] = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies,
                                                 vectype_out, j);
           for (i = 0; i < nargs; i++)
@@ -3716,10 +3778,20 @@ vectorizable_call (vec_info *vinfo,
                 }
               vargs[varg++] = vec_defs[i][j];
             }
-          if (masked_loop_p && reduc_idx >= 0)
+          if ((masked_loop_p || len_loop_p) && reduc_idx >= 0)
             vargs[varg++] = vargs[reduc_idx + 1];
 
-          if (mask_opno >= 0 && masked_loop_p)
+          if (len_opno >= 0 && len_loop_p)
+            {
+              tree len = vect_get_loop_len (loop_vinfo, gsi, lens, ncopies,
+                                            vectype_out, j, 1);
+              signed char biasval
+                = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+              tree bias = build_int_cst (intQI_type_node, biasval);
+              vargs[len_opno] = len;
+              vargs[len_opno + 1] = bias;
+            }
+          else if (mask_opno >= 0 && masked_loop_p)
             {
               tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies,
                                               vectype_out, j);