From patchwork Fri Jul 21 11:06:03 2023
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai"
X-Patchwork-Id: 123788
From: juzhe.zhong@rivai.ai
To: gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com, rguenther@suse.de, Ju-Zhe Zhong
Subject: [PATCH V4] VECT: Support floating-point in-order reduction for length loop control
Date: Fri, 21 Jul 2023 19:06:03 +0800
Message-Id: <20230721110603.1470072-1-juzhe.zhong@rivai.ai>

From: Ju-Zhe Zhong

Hi, Richard and Richi.

This patch supports floating-point in-order reduction for loops that use length-based loop control.

Consider the following case:

float foo (float *__restrict a, int n)
{
  float result = 1.0;
  for (int i = 0; i < n; i++)
    result += a[i];
  return result;
}

When this is compiled **without** -ffast-math, the floating-point additions must not be reassociated, so the vectorizer has to keep them in order (a fold-left reduction). On ARM SVE we end up with:

loop_mask = WHILE_ULT
result = MASK_FOLD_LEFT_PLUS (...loop_mask...)

RVV uses length-based loop control instead of a mask, so with this patch we expect to see:

loop_len = SELECT_VL
result = MASK_LEN_FOLD_LEFT_PLUS (...loop_len...)

gcc/ChangeLog:

	* tree-vect-loop.cc (get_masked_reduction_fn): Add mask_len_fold_left_plus.
	(vectorize_fold_left_reduction): Ditto.
	(vectorizable_reduction): Ditto.
	(vect_transform_reduction): Ditto.

---
 gcc/tree-vect-loop.cc | 41 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index b44fb9c7712..3b296d41157 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6800,11 +6800,13 @@ static internal_fn
 get_masked_reduction_fn (internal_fn reduc_fn, tree vectype_in)
 {
   internal_fn mask_reduc_fn;
+  internal_fn mask_len_reduc_fn;
 
   switch (reduc_fn)
     {
     case IFN_FOLD_LEFT_PLUS:
       mask_reduc_fn = IFN_MASK_FOLD_LEFT_PLUS;
+      mask_len_reduc_fn = IFN_MASK_LEN_FOLD_LEFT_PLUS;
       break;
 
     default:
@@ -6814,6 +6816,9 @@ get_masked_reduction_fn (internal_fn reduc_fn, tree vectype_in)
   if (direct_internal_fn_supported_p (mask_reduc_fn, vectype_in,
                                       OPTIMIZE_FOR_SPEED))
     return mask_reduc_fn;
+  if (direct_internal_fn_supported_p (mask_len_reduc_fn, vectype_in,
+                                      OPTIMIZE_FOR_SPEED))
+    return mask_len_reduc_fn;
   return IFN_LAST;
 }
 
@@ -6834,7 +6839,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
                                gimple *reduc_def_stmt,
                                tree_code code, internal_fn reduc_fn,
                                tree ops[3], tree vectype_in,
-                               int reduc_index, vec_loop_masks *masks)
+                               int reduc_index, vec_loop_masks *masks,
+                               vec_loop_lens *lens)
 {
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
@@ -6896,8 +6902,18 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
     {
       gimple *new_stmt;
       tree mask = NULL_TREE;
+      tree len = NULL_TREE;
+      tree bias = NULL_TREE;
       if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
         mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, i);
+      if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+        {
+          len = vect_get_loop_len (loop_vinfo, gsi, lens, vec_num, vectype_in,
+                                   i, 1);
+          signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+          bias = build_int_cst (intQI_type_node, biasval);
+          mask = build_minus_one_cst (truth_type_for (vectype_in));
+        }
 
       /* Handle MINUS by adding the negative.  */
       if (reduc_fn != IFN_LAST && code == MINUS_EXPR)
@@ -6917,7 +6933,10 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
          the preceding operation.  */
       if (reduc_fn != IFN_LAST || (mask && mask_reduc_fn != IFN_LAST))
         {
-          if (mask && mask_reduc_fn != IFN_LAST)
+          if (mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
+            new_stmt = gimple_build_call_internal (mask_reduc_fn, 5, reduc_var,
+                                                   def0, mask, len, bias);
+          else if (mask_reduc_fn == IFN_MASK_FOLD_LEFT_PLUS)
             new_stmt = gimple_build_call_internal (mask_reduc_fn, 3, reduc_var,
                                                    def0, mask);
           else
@@ -7979,6 +7998,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   else if (loop_vinfo && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
     {
       vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+      vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
       internal_fn cond_fn = get_conditional_internal_fn (op.code, op.type);
 
       if (reduction_type != FOLD_LEFT_REDUCTION
@@ -8006,8 +8026,17 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
           LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
         }
       else
-        vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
-                               vectype_in, NULL);
+        {
+          internal_fn mask_reduc_fn
+            = get_masked_reduction_fn (reduc_fn, vectype_in);
+
+          if (mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
+            vect_record_loop_len (loop_vinfo, lens, ncopies * vec_num,
+                                  vectype_in, 1);
+          else
+            vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
+                                   vectype_in, NULL);
+        }
     }
   return true;
 }
@@ -8137,6 +8166,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   code_helper code = canonicalize_code (op.code, op.type);
   internal_fn cond_fn = get_conditional_internal_fn (code, op.type);
   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
   bool mask_by_cond_expr = use_mask_by_cond_expr_p (code, cond_fn, vectype_in);
 
   /* Transform.  */
@@ -8162,7 +8192,8 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
       gcc_assert (code.is_tree_code ());
       return vectorize_fold_left_reduction
           (loop_vinfo, stmt_info, gsi, vec_stmt, slp_node, reduc_def_phi,
-           tree_code (code), reduc_fn, op.ops, vectype_in, reduc_index, masks);
+           tree_code (code), reduc_fn, op.ops, vectype_in, reduc_index, masks,
+           lens);
     }
 
   bool single_defuse_cycle = STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info);
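The semantics the patch relies on can be modelled in a few lines of C++. The sketch below is a scalar simulation written for illustration. It is not GCC code, and every function and type name in it (mask_len_fold_left_plus, pick_reduction_fn, foo_length_controlled, reduc_ifn) is hypothetical. It follows the argument order the patch passes to gimple_build_call_internal (accumulator, vector, mask, len, bias). It also follows what we understand to be the internals-manual rule for mask_len_fold_left_plus: only lanes below len + bias whose mask bit is set are added, strictly in order. Like the patch, it uses an all-ones mask in the length-controlled case, and it mirrors the order of the support checks in get_masked_reduction_fn. It assumes a bias of 0 and replaces SELECT_VL with a simple min (VF, remaining).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model of the in-order reduction IFNs (not GCC code).
// Each function adds lanes to ACC strictly from lane 0 upwards, so the
// floating-point additions are never reassociated.

// MASK_FOLD_LEFT_PLUS (acc, v, mask): add v[i] for each lane whose mask bit is set.
float mask_fold_left_plus (float acc, const std::vector<float> &v,
                           const std::vector<bool> &mask)
{
  assert (mask.size () == v.size ());
  for (std::size_t i = 0; i < v.size (); i++)
    if (mask[i])
      acc += v[i];
  return acc;
}

// MASK_LEN_FOLD_LEFT_PLUS (acc, v, mask, len, bias): add v[i] for each lane
// i < len + bias whose mask bit is set.
float mask_len_fold_left_plus (float acc, const std::vector<float> &v,
                               const std::vector<bool> &mask,
                               long len, int bias)
{
  const long active = len + bias;
  assert (active >= 0);
  assert (static_cast<std::size_t> (active) <= v.size ());
  assert (mask.size () == v.size ());
  for (long i = 0; i < active; i++)
    if (mask[i])
      acc += v[i];
  return acc;
}

// Order of checks in get_masked_reduction_fn after the patch: the
// mask-only form first, then the mask+length form, else nothing.
enum class reduc_ifn { mask_fold_left_plus, mask_len_fold_left_plus, last };

reduc_ifn pick_reduction_fn (bool target_has_mask_form,
                             bool target_has_mask_len_form)
{
  if (target_has_mask_form)
    return reduc_ifn::mask_fold_left_plus;
  if (target_has_mask_len_form)
    return reduc_ifn::mask_len_fold_left_plus;
  return reduc_ifn::last;
}

// Scalar reference: the source loop of foo ().
float foo_scalar (const std::vector<float> &a)
{
  float result = 1.0f;
  for (float x : a)
    result += x;
  return result;
}

// Length-controlled vector loop for foo ().  LEN = min (vf, remaining) is a
// simplified stand-in for SELECT_VL, and the bias is assumed to be 0.  As in
// the patch, the mask operand is all ones, so only LEN selects the lanes.
float foo_length_controlled (const std::vector<float> &a, long vf)
{
  assert (vf > 0);
  const int bias = 0;
  const long n = static_cast<long> (a.size ());
  const std::vector<bool> all_ones (static_cast<std::size_t> (vf), true);
  float result = 1.0f;
  for (long i = 0; i < n; i += vf)
    {
      const long len = std::min (vf, n - i);
      // Lanes at or beyond LEN are inactive; their contents do not matter.
      std::vector<float> vec (static_cast<std::size_t> (vf), 0.0f);
      for (long j = 0; j < len; j++)
        vec[j] = a[i + j];
      result = mask_len_fold_left_plus (result, vec, all_ones, len, bias);
    }
  return result;
}
```

Take a six-element input with vf = 4. foo_length_controlled makes one pass with len = 4 and a second with len = 2. It returns the same value as foo_scalar, because every addition happens in the same order as in the scalar loop. That ordering guarantee is what makes a fold-left reduction valid without -ffast-math.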