From patchwork Wed Jul 12 11:16:08 2023
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai"
X-Patchwork-Id: 119086
From: juzhe.zhong@rivai.ai
To: gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com, rguenther@suse.de, Ju-Zhe Zhong
Subject: [PATCH V3] VECT: Apply COND_LEN_* into vectorizable_operation
Date: Wed, 12 Jul 2023 19:16:08 +0800
Message-Id: <20230712111608.71951-1-juzhe.zhong@rivai.ai>
X-Mailer: git-send-email 2.36.1

From: Ju-Zhe Zhong

Hi, Richard and Richi.

As we discussed before, COND_LEN_* patterns were added to cover multiple
situations.  This patch applies COND_LEN_* to the following situation in
"vectorizable_operation":

  /* If operating on inactive elements could generate spurious traps, we need
     to restrict the operation to active lanes.  Note that this specifically
     doesn't apply to unhoisted invariants, since they operate on the same
     value for every lane.

     Similarly, if this operation is part of a reduction, a fully-masked loop
     should only change the active lanes of the reduction chain, keeping the
     inactive lanes as-is.  */
  bool mask_out_inactive = ((!is_invariant && gimple_could_trap_p (stmt))
			    || reduc_idx >= 0);

With length-based loop control, mask_out_inactive is also true for such
operations, so we can handle the following two cases:

1. Integer remainder:

  #define TEST_TYPE(TYPE) \
  __attribute__((noipa)) \
  void vrem_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
  { \
    for (int i = 0; i < n; i++) \
      dst[i] = a[i] % b[i]; \
  }
  #define TEST_ALL() \
  TEST_TYPE(int8_t)
  TEST_ALL()

With this patch:

  _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
  ivtmp_45 = _61 * 4;
  vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
  vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
  vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, vect__4.8_48, _61, 0);
  .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);

2. Floating-point arithmetic **WITHOUT** -ffast-math:

  #define TEST_TYPE(TYPE) \
  __attribute__((noipa)) \
  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
  { \
    for (int i = 0; i < n; i++) \
      dst[i] = a[i] + b[i]; \
  }
  #define TEST_ALL() \
  TEST_TYPE(float)
  TEST_ALL()

With this patch:

  _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
  ivtmp_45 = _61 * 4;
  vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
  vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
  vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, vect__4.8_48, _61, 0);
  .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);

With this patch, we can make sure such operations do not trap on the
inactive elements that "mask_out_inactive" covers.

gcc/ChangeLog:

	* internal-fn.cc (FOR_EACH_CODE_MAPPING): Adapt for COND_LEN_*
	support.
	(CASE): Ditto.
	(get_conditional_len_internal_fn): New function.
	* internal-fn.h (get_conditional_len_internal_fn): Ditto.
	* tree-vect-stmts.cc (vectorizable_operation): Adapt for COND_LEN_*
	support.
---
 gcc/internal-fn.cc     | 73 +++++++++++++++++++++++++++++++-----------
 gcc/internal-fn.h      |  1 +
 gcc/tree-vect-stmts.cc | 48 ++++++++++++++++++++-------
 3 files changed, 93 insertions(+), 29 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index f9aaf66cf2a..b288ac6fe6b 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4276,23 +4276,24 @@ static void (*const internal_fn_expanders[]) (internal_fn, gcall *) = {
   0
 };
 
-/* Invoke T(CODE, IFN) for each conditional function IFN that maps to a
-   tree code CODE.  */
+/* Invoke T(CODE, SUFFIX) for each conditional function IFN_COND_##SUFFIX
+   that maps to a tree code CODE.  There is also an IFN_COND_LEN_##SUFFIX
+   for each such IFN_COND_##SUFFIX.  */
 #define FOR_EACH_CODE_MAPPING(T) \
-  T (PLUS_EXPR, IFN_COND_ADD) \
-  T (MINUS_EXPR, IFN_COND_SUB) \
-  T (MULT_EXPR, IFN_COND_MUL) \
-  T (TRUNC_DIV_EXPR, IFN_COND_DIV) \
-  T (TRUNC_MOD_EXPR, IFN_COND_MOD) \
-  T (RDIV_EXPR, IFN_COND_RDIV) \
-  T (MIN_EXPR, IFN_COND_MIN) \
-  T (MAX_EXPR, IFN_COND_MAX) \
-  T (BIT_AND_EXPR, IFN_COND_AND) \
-  T (BIT_IOR_EXPR, IFN_COND_IOR) \
-  T (BIT_XOR_EXPR, IFN_COND_XOR) \
-  T (LSHIFT_EXPR, IFN_COND_SHL) \
-  T (RSHIFT_EXPR, IFN_COND_SHR) \
-  T (NEGATE_EXPR, IFN_COND_NEG)
+  T (PLUS_EXPR, ADD) \
+  T (MINUS_EXPR, SUB) \
+  T (MULT_EXPR, MUL) \
+  T (TRUNC_DIV_EXPR, DIV) \
+  T (TRUNC_MOD_EXPR, MOD) \
+  T (RDIV_EXPR, RDIV) \
+  T (MIN_EXPR, MIN) \
+  T (MAX_EXPR, MAX) \
+  T (BIT_AND_EXPR, AND) \
+  T (BIT_IOR_EXPR, IOR) \
+  T (BIT_XOR_EXPR, XOR) \
+  T (LSHIFT_EXPR, SHL) \
+  T (RSHIFT_EXPR, SHR) \
+  T (NEGATE_EXPR, NEG)
 
 /* Return a function that only performs CODE when a certain condition is met
    and that uses a given fallback value otherwise.  For example, if CODE is
@@ -4313,7 +4314,7 @@ get_conditional_internal_fn (tree_code code)
 {
   switch (code)
     {
-#define CASE(CODE, IFN) case CODE: return IFN;
+#define CASE(CODE, IFN) case CODE: return IFN_COND_##IFN;
       FOR_EACH_CODE_MAPPING(CASE)
 #undef CASE
     default:
@@ -4329,7 +4330,7 @@ conditional_internal_fn_code (internal_fn ifn)
 {
   switch (ifn)
     {
-#define CASE(CODE, IFN) case IFN: return CODE;
+#define CASE(CODE, IFN) case IFN_COND_##IFN: return CODE;
       FOR_EACH_CODE_MAPPING(CASE)
 #undef CASE
     default:
@@ -4337,6 +4338,42 @@ conditional_internal_fn_code (internal_fn ifn)
     }
 }
 
+/* Like get_conditional_internal_fn, but return a function that
+   additionally restricts the operation to the leading elements
+   of a vector.  The number of elements to process is given by
+   a length and bias pair.  The function only performs the CODE
+   when a certain condition is met as well as the element is located
+   within LEN + BIAS (i < LEN + BIAS) and that uses a given fallback
+   value otherwise.
+
+   For example, if CODE is [PLUS, MINUS, ... etc]:
+
+     LHS = FN (COND, A, B, ELSE, LEN, BIAS)
+
+   is equivalent to the C expression:
+
+     for (int i = 0; i < NUNITS; i++)
+      {
+	if (COND[i] && i < (LEN + BIAS))
+	  LHS[i] = A[i] CODE B[i];
+	else
+	  LHS[i] = ELSE[i];
+      }
+*/
+
+internal_fn
+get_conditional_len_internal_fn (tree_code code)
+{
+  switch (code)
+    {
+#define CASE(CODE, IFN) case CODE: return IFN_COND_LEN_##IFN;
+      FOR_EACH_CODE_MAPPING(CASE)
+#undef CASE
+    default:
+      return IFN_LAST;
+    }
+}
+
 /* Invoke T(IFN) for each internal function IFN that also has an
    IFN_COND_* form.  */
 #define FOR_EACH_COND_FN_PAIR(T) \
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 4234bbfed87..dd1bab0bddf 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -224,6 +224,7 @@ extern bool set_edom_supported_p (void);
 
 extern internal_fn get_conditional_internal_fn (tree_code);
 extern internal_fn get_conditional_internal_fn (internal_fn);
+extern internal_fn get_conditional_len_internal_fn (tree_code);
 extern tree_code conditional_internal_fn_code (internal_fn);
 extern internal_fn get_unconditional_internal_fn (internal_fn);
 extern bool can_interpret_as_conditional_op_p (gimple *, tree *,
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 10e71178ce7..dd24f017235 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -6711,7 +6711,9 @@ vectorizable_operation (vec_info *vinfo,
   int reduc_idx = STMT_VINFO_REDUC_IDX (stmt_info);
   vec_loop_masks *masks = (loop_vinfo ? &LOOP_VINFO_MASKS (loop_vinfo) : NULL);
+  vec_loop_lens *lens = (loop_vinfo ? &LOOP_VINFO_LENS (loop_vinfo) : NULL);
   internal_fn cond_fn = get_conditional_internal_fn (code);
+  internal_fn cond_len_fn = get_conditional_len_internal_fn (code);
 
   /* If operating on inactive elements could generate spurious traps, we need
      to restrict the operation to active lanes.  Note that this
@@ -6730,9 +6732,17 @@ vectorizable_operation (vec_info *vinfo,
       && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
       && mask_out_inactive)
     {
-      if (cond_fn == IFN_LAST
-	  || !direct_internal_fn_supported_p (cond_fn, vectype,
-					      OPTIMIZE_FOR_SPEED))
+      if (cond_fn != IFN_LAST
+	  && direct_internal_fn_supported_p (cond_fn, vectype,
+					     OPTIMIZE_FOR_SPEED))
+	vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
+			       vectype, NULL);
+      else if (cond_len_fn != IFN_LAST
+	       && direct_internal_fn_supported_p (cond_len_fn, vectype,
+						  OPTIMIZE_FOR_SPEED))
+	vect_record_loop_len (loop_vinfo, lens, ncopies * vec_num, vectype,
+			      1);
+      else
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -6740,9 +6750,6 @@ vectorizable_operation (vec_info *vinfo,
 			     " conditional operation is available.\n");
 	  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
 	}
-      else
-	vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
-			       vectype, NULL);
     }
 
   /* Put types on constant and invariant SLP children.  */
@@ -6805,6 +6812,7 @@ vectorizable_operation (vec_info *vinfo,
 		     "transform binary/unary operation.\n");
 
   bool masked_loop_p = loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+  bool len_loop_p = loop_vinfo && LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo);
 
   /* POINTER_DIFF_EXPR has pointer arguments which are vectorized as
      vectors with unsigned elements, but the result is signed.  So, we
@@ -6971,11 +6979,16 @@ vectorizable_operation (vec_info *vinfo,
 	  gimple_assign_set_lhs (new_stmt, new_temp);
 	  vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
 	}
-      else if (masked_loop_p && mask_out_inactive)
+      else if ((masked_loop_p || len_loop_p) && mask_out_inactive)
 	{
-	  tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks,
-					  vec_num * ncopies, vectype, i);
-	  auto_vec<tree> vops (5);
+	  tree mask;
+	  if (masked_loop_p)
+	    mask = vect_get_loop_mask (loop_vinfo, gsi, masks,
+				       vec_num * ncopies, vectype, i);
+	  else
+	    /* Dummy mask.  */
+	    mask = build_minus_one_cst (truth_type_for (vectype));
+	  auto_vec<tree> vops (6);
 	  vops.quick_push (mask);
 	  vops.quick_push (vop0);
 	  if (vop1)
@@ -6995,7 +7008,20 @@ vectorizable_operation (vec_info *vinfo,
 		(cond_fn, vectype, vops.length () - 1, &vops[1]);
 	      vops.quick_push (else_value);
 	    }
-	  gcall *call = gimple_build_call_internal_vec (cond_fn, vops);
+	  if (len_loop_p)
+	    {
+	      tree len = vect_get_loop_len (loop_vinfo, gsi, lens,
+					    vec_num * ncopies, vectype, i, 1);
+	      signed char biasval
+		= LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+	      tree bias = build_int_cst (intQI_type_node, biasval);
+	      vops.quick_push (len);
+	      vops.quick_push (bias);
+	    }
+	  gcall *call
+	    = gimple_build_call_internal_vec (masked_loop_p ? cond_fn
+							    : cond_len_fn,
+					      vops);
 	  new_temp = make_ssa_name (vec_dest, call);
 	  gimple_call_set_lhs (call, new_temp);
 	  gimple_call_set_nothrow (call, true);