From patchwork Mon Aug 7 09:38:12 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai"
X-Patchwork-Id: 131768
From: juzhe.zhong@rivai.ai
To: gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com, rguenther@suse.de, Ju-Zhe Zhong
Subject: [PATCH V4] VECT: Support CALL vectorization for COND_LEN_*
Date: Mon, 7 Aug 2023 17:38:12 +0800
Message-Id: <20230807093812.1716553-1-juzhe.zhong@rivai.ai>
X-Mailer: git-send-email 2.36.3
List-Id: Gcc-patches mailing list

From: Ju-Zhe Zhong

Hi, Richard and Richi.

Based on the suggestions from Richard:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625396.html

This patch takes approach (1) that Richard proposed, meaning:

RVV implements cond_* optabs as expanders.  RVV therefore supports
both IFN_COND_ADD and IFN_COND_LEN_ADD.  No dummy length arguments
are needed at the gimple level.

This approach makes the code much cleaner and more reasonable.
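To make the difference between the two internal functions concrete, here is a
rough scalar sketch of their per-lane behaviour.  It is an illustration only,
not code from this patch: the helper names cond_add_model and
cond_len_add_model and their array-based signatures are hypothetical, and the
length test "i < len + bias" reflects my reading of the cond_len_* optab
semantics.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical scalar model of .COND_ADD (mask, a, b, else_vals):
   lane i yields a[i] + b[i] when mask[i] is set, else else_vals[i].  */
static void
cond_add_model (int nunits, const bool *mask, const float *a,
                const float *b, const float *else_vals, float *out)
{
  for (int i = 0; i < nunits; i++)
    out[i] = mask[i] ? a[i] + b[i] : else_vals[i];
}

/* Hypothetical scalar model of
   .COND_LEN_ADD (mask, a, b, else_vals, len, bias):
   only the first len + bias lanes are active.  An active lane behaves
   like .COND_ADD; an inactive lane always yields else_vals[i].  */
static void
cond_len_add_model (int nunits, const bool *mask, const float *a,
                    const float *b, const float *else_vals,
                    int len, int bias, float *out)
{
  /* Assumption of this sketch: the active length fits in the vector.  */
  assert (len + bias >= 0 && len + bias <= nunits);
  for (int i = 0; i < nunits; i++)
    out[i] = (i < len + bias && mask[i]) ? a[i] + b[i] : else_vals[i];
}
```

In this model, a lane beyond len + bias keeps its else value even when its mask
bit is set.  That appears to be why the length operand can stand in for the
loop mask that SVE has to AND into the condition.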

Consider the following case:

void
foo (float * __restrict a, float * __restrict b, int * __restrict cond, int n)
{
  for (int i = 0; i < n; i++)
    if (cond[i])
      a[i] = b[i] + a[i];
}

Output of RISC-V (32-bits) gcc (trunk) (Compiler #3):
:5:21: missed: couldn't vectorize loop
:5:21: missed: not vectorized: control flow in loop.

ARM SVE:

...
mask__27.10_51 = vect__4.9_49 != { 0, ... };
...
vec_mask_and_55 = loop_mask_49 & mask__27.10_51;
...
vect__9.17_62 = .COND_ADD (vec_mask_and_55, vect__6.13_56, vect__8.16_60, vect__6.13_56);

For RVV, we want IR as follows:

...
_68 = .SELECT_VL (ivtmp_66, POLY_INT_CST [4, 4]);
...
mask__27.10_51 = vect__4.9_49 != { 0, ... };
...
vect__9.17_60 = .COND_LEN_ADD (mask__27.10_51, vect__6.13_55, vect__8.16_59, vect__6.13_55, _68, 0);
...

Both the len and the mask operands of COND_LEN_ADD are real values, not dummies.

This patch has been fully tested on the RISC-V port with both COND_* and
COND_LEN_* supported.  Bootstrap and regression testing on x86 also passed.

OK for trunk?

gcc/ChangeLog:

        * internal-fn.cc (get_len_internal_fn): New function.
        (DEF_INTERNAL_COND_FN): Ditto.
        (DEF_INTERNAL_SIGNED_COND_FN): Ditto.
        * internal-fn.h (get_len_internal_fn): Ditto.
        * tree-vect-stmts.cc (vectorizable_call): Add CALL auto-vectorization.

---
 gcc/internal-fn.cc     | 24 +++++++++++++++++
 gcc/internal-fn.h      |  1 +
 gcc/tree-vect-stmts.cc | 58 ++++++++++++++++++++++++++++++++++++++----
 3 files changed, 78 insertions(+), 5 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 8e294286388..7f5ede00c02 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4443,6 +4443,30 @@ get_conditional_internal_fn (internal_fn fn)
     }
 }
 
+/* If there exists an internal function like IFN that operates on vectors,
+   but with additional length and bias parameters, return the internal_fn
+   for that function, otherwise return IFN_LAST.  */
+internal_fn
+get_len_internal_fn (internal_fn fn)
+{
+  switch (fn)
+    {
+#undef DEF_INTERNAL_COND_FN
+#undef DEF_INTERNAL_SIGNED_COND_FN
+#define DEF_INTERNAL_COND_FN(NAME, ...)                                        \
+  case IFN_COND_##NAME:                                                        \
+    return IFN_COND_LEN_##NAME;
+#define DEF_INTERNAL_SIGNED_COND_FN(NAME, ...)                                 \
+  case IFN_COND_##NAME:                                                        \
+    return IFN_COND_LEN_##NAME;
+#include "internal-fn.def"
+#undef DEF_INTERNAL_COND_FN
+#undef DEF_INTERNAL_SIGNED_COND_FN
+    default:
+      return IFN_LAST;
+    }
+}
+
 /* If IFN implements the conditional form of an unconditional internal
    function, return that unconditional function, otherwise return IFN_LAST.  */
 
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index a5c3f4765ff..410c1b623d6 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -224,6 +224,7 @@ extern bool set_edom_supported_p (void);
 
 extern internal_fn get_conditional_internal_fn (tree_code);
 extern internal_fn get_conditional_internal_fn (internal_fn);
+extern internal_fn get_len_internal_fn (internal_fn);
 extern internal_fn get_conditional_len_internal_fn (tree_code);
 extern tree_code conditional_internal_fn_code (internal_fn);
 extern internal_fn get_unconditional_internal_fn (internal_fn);
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 6a4e8fce126..76b1c83f41e 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3540,7 +3540,10 @@ vectorizable_call (vec_info *vinfo,
 
   int reduc_idx = STMT_VINFO_REDUC_IDX (stmt_info);
   internal_fn cond_fn = get_conditional_internal_fn (ifn);
+  internal_fn cond_len_fn = get_len_internal_fn (ifn);
+  int len_opno = internal_fn_len_index (cond_len_fn);
   vec_loop_masks *masks = (loop_vinfo ? &LOOP_VINFO_MASKS (loop_vinfo) : NULL);
+  vec_loop_lens *lens = (loop_vinfo ? &LOOP_VINFO_LENS (loop_vinfo) : NULL);
   if (!vec_stmt) /* transformation not required.  */
     {
       if (slp_node)
@@ -3569,6 +3572,9 @@ vectorizable_call (vec_info *vinfo,
           if (reduc_idx >= 0
               && (cond_fn == IFN_LAST
                   || !direct_internal_fn_supported_p (cond_fn, vectype_out,
+                                                      OPTIMIZE_FOR_SPEED))
+              && (cond_len_fn == IFN_LAST
+                  || !direct_internal_fn_supported_p (cond_len_fn, vectype_out,
                                                       OPTIMIZE_FOR_SPEED)))
             {
               if (dump_enabled_p ())
@@ -3586,8 +3592,14 @@ vectorizable_call (vec_info *vinfo,
               tree scalar_mask = NULL_TREE;
               if (mask_opno >= 0)
                 scalar_mask = gimple_call_arg (stmt_info->stmt, mask_opno);
-              vect_record_loop_mask (loop_vinfo, masks, nvectors,
-                                     vectype_out, scalar_mask);
+              if (cond_len_fn != IFN_LAST
+                  && direct_internal_fn_supported_p (cond_len_fn, vectype_out,
+                                                     OPTIMIZE_FOR_SPEED))
+                vect_record_loop_len (loop_vinfo, lens, nvectors, vectype_out,
+                                      1);
+              else
+                vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype_out,
+                                       scalar_mask);
             }
         }
       return true;
@@ -3603,8 +3615,20 @@ vectorizable_call (vec_info *vinfo,
   vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
 
   bool masked_loop_p = loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+  bool len_loop_p = loop_vinfo && LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo);
   unsigned int vect_nargs = nargs;
-  if (masked_loop_p && reduc_idx >= 0)
+  if (len_loop_p)
+    {
+      if (len_opno >= 0)
+        {
+          ifn = cond_len_fn;
+          /* COND_* -> COND_LEN_* takes 2 extra arguments:LEN,BIAS.  */
+          vect_nargs += 2;
+        }
+      else if (reduc_idx >= 0)
+        gcc_unreachable ();
+    }
+  else if (masked_loop_p && reduc_idx >= 0)
     {
       ifn = cond_fn;
       vect_nargs += 2;
@@ -3671,7 +3695,21 @@ vectorizable_call (vec_info *vinfo,
                 }
               else
                 {
-                  if (mask_opno >= 0 && masked_loop_p)
+                  if (len_opno >= 0 && len_loop_p)
+                    {
+                      unsigned int vec_num = vec_oprnds0.length ();
+                      /* Always true for SLP.  */
+                      gcc_assert (ncopies == 1);
+                      tree len
+                        = vect_get_loop_len (loop_vinfo, gsi, lens, vec_num,
+                                             vectype_out, i, 1);
+                      signed char biasval
+                        = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+                      tree bias = build_int_cst (intQI_type_node, biasval);
+                      vargs[len_opno] = len;
+                      vargs[len_opno + 1] = bias;
+                    }
+                  else if (mask_opno >= 0 && masked_loop_p)
                     {
                       unsigned int vec_num = vec_oprnds0.length ();
                       /* Always true for SLP.  */
@@ -3719,7 +3757,17 @@ vectorizable_call (vec_info *vinfo,
           if (masked_loop_p && reduc_idx >= 0)
             vargs[varg++] = vargs[reduc_idx + 1];
 
-          if (mask_opno >= 0 && masked_loop_p)
+          if (len_opno >= 0 && len_loop_p)
+            {
+              tree len = vect_get_loop_len (loop_vinfo, gsi, lens, ncopies,
+                                            vectype_out, j, 1);
+              signed char biasval
+                = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+              tree bias = build_int_cst (intQI_type_node, biasval);
+              vargs[len_opno] = len;
+              vargs[len_opno + 1] = bias;
+            }
+          else if (mask_opno >= 0 && masked_loop_p)
             {
               tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies,
                                               vectype_out, j);