From: juzhe.zhong@rivai.ai
To: gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com, rguenther@suse.de, Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
Subject: [PATCH V2] VECT: Apply COND_LEN_* into vectorizable_operation
Date: Wed, 12 Jul 2023 18:36:21 +0800
Message-Id: <20230712103621.47696-1-juzhe.zhong@rivai.ai>

From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>

Hi, Richard and Richi.

As we discussed before, COND_LEN_* patterns were added for multiple situations.
This patch applies COND_LEN_* to the following situation in
"vectorizable_operation":

  /* If operating on inactive elements could generate spurious traps,
     we need to restrict the operation to active lanes.  Note that this
     specifically doesn't apply to unhoisted invariants, since they
     operate on the same value for every lane.
     Similarly, if this operation is part of a reduction, a fully-masked
     loop should only change the active lanes of the reduction chain,
     keeping the inactive lanes as-is.  */
  bool mask_out_inactive = ((!is_invariant && gimple_could_trap_p (stmt))
                            || reduc_idx >= 0);

That is, the case where mask_out_inactive is true under length-based loop
control.

So we can handle the following 2 cases:

1. Integer division:

   #include <stdint.h>

   #define TEST_TYPE(TYPE)                                  \
     __attribute__((noipa))                                 \
     void vrem_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)  \
     {                                                      \
       for (int i = 0; i < n; i++)                          \
         dst[i] = a[i] % b[i];                              \
     }
   #define TEST_ALL() \
     TEST_TYPE(int8_t)

   TEST_ALL()

With this patch:

  _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
  ivtmp_45 = _61 * 4;
  vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
  vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
  vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, vect__4.8_48, _61, 0);
  .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);

2. Floating-point arithmetic **WITHOUT** -ffast-math

   #define TEST_TYPE(TYPE)                                  \
     __attribute__((noipa))                                 \
     void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)  \
     {                                                      \
       for (int i = 0; i < n; i++)                          \
         dst[i] = a[i] + b[i];                              \
     }
   #define TEST_ALL() \
     TEST_TYPE(float)

   TEST_ALL()

With this patch:

  _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
  ivtmp_45 = _61 * 4;
  vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
  vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
  vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, vect__4.8_48, _61, 0);
  .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);

With this patch, we can make sure that operations won't trap on the inactive
elements when mask_out_inactive is true.

gcc/ChangeLog:

        * internal-fn.cc (FOR_EACH_CODE_MAPPING): Adapt for COND_LEN_* support.
        (CASE): Ditto.
        (get_conditional_len_internal_fn): New function.
        * internal-fn.h (get_conditional_len_internal_fn): Ditto.
        * tree-vect-stmts.cc (vectorizable_operation): Adapt for COND_LEN_* support.

---
 gcc/internal-fn.cc     | 65 ++++++++++++++++++++++++++++++------------
 gcc/internal-fn.h      |  1 +
 gcc/tree-vect-stmts.cc | 48 ++++++++++++++++++++++++-------
 3 files changed, 85 insertions(+), 29 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index f9aaf66cf2a..7e3a8cc8412 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4276,23 +4276,24 @@ static void (*const internal_fn_expanders[]) (internal_fn, gcall *) = {
   0
 };
 
-/* Invoke T(CODE, IFN) for each conditional function IFN that maps to a
-   tree code CODE.  */
+/* Invoke T(CODE, SUFFIX) for each conditional function IFN_COND_##SUFFIX
+   that maps to a tree code CODE.  There is also an IFN_COND_LEN_##SUFFIX
+   for each such IFN_COND_##SUFFIX.  */
 #define FOR_EACH_CODE_MAPPING(T) \
-  T (PLUS_EXPR, IFN_COND_ADD) \
-  T (MINUS_EXPR, IFN_COND_SUB) \
-  T (MULT_EXPR, IFN_COND_MUL) \
-  T (TRUNC_DIV_EXPR, IFN_COND_DIV) \
-  T (TRUNC_MOD_EXPR, IFN_COND_MOD) \
-  T (RDIV_EXPR, IFN_COND_RDIV) \
-  T (MIN_EXPR, IFN_COND_MIN) \
-  T (MAX_EXPR, IFN_COND_MAX) \
-  T (BIT_AND_EXPR, IFN_COND_AND) \
-  T (BIT_IOR_EXPR, IFN_COND_IOR) \
-  T (BIT_XOR_EXPR, IFN_COND_XOR) \
-  T (LSHIFT_EXPR, IFN_COND_SHL) \
-  T (RSHIFT_EXPR, IFN_COND_SHR) \
-  T (NEGATE_EXPR, IFN_COND_NEG)
+  T (PLUS_EXPR, ADD) \
+  T (MINUS_EXPR, SUB) \
+  T (MULT_EXPR, MUL) \
+  T (TRUNC_DIV_EXPR, DIV) \
+  T (TRUNC_MOD_EXPR, MOD) \
+  T (RDIV_EXPR, RDIV) \
+  T (MIN_EXPR, MIN) \
+  T (MAX_EXPR, MAX) \
+  T (BIT_AND_EXPR, AND) \
+  T (BIT_IOR_EXPR, IOR) \
+  T (BIT_XOR_EXPR, XOR) \
+  T (LSHIFT_EXPR, SHL) \
+  T (RSHIFT_EXPR, SHR) \
+  T (NEGATE_EXPR, NEG)
 
 /* Return a function that only performs CODE when a certain condition is met
    and that uses a given fallback value otherwise.  For example, if CODE is
@@ -4313,7 +4314,7 @@ get_conditional_internal_fn (tree_code code)
 {
   switch (code)
     {
-#define CASE(CODE, IFN) case CODE: return IFN;
+#define CASE(CODE, IFN) case CODE: return IFN_COND_##IFN;
       FOR_EACH_CODE_MAPPING(CASE)
 #undef CASE
     default:
@@ -4329,7 +4330,7 @@ conditional_internal_fn_code (internal_fn ifn)
 {
   switch (ifn)
     {
-#define CASE(CODE, IFN) case IFN: return CODE;
+#define CASE(CODE, IFN) case IFN_COND_##IFN: return CODE;
       FOR_EACH_CODE_MAPPING(CASE)
 #undef CASE
     default:
@@ -4337,6 +4338,34 @@ conditional_internal_fn_code (internal_fn ifn)
     }
 }
 
+/* Return a function that only performs CODE when a certain condition is met
+   and that uses a given fallback value otherwise.  For example, if CODE is
+   a binary operation associated with conditional function FN:
+
+     LHS = FN (COND, A, B, ELSE, LEN, BIAS)
+
+   is equivalent to the C expression:
+
+     for (int i = 0; i < LEN + BIAS; i++)
+       LHS[i] = COND[i] ? A[i] CODE B[i] : ELSE[i];
+
+   operating elementwise if the operands are vectors.
+
+   Return IFN_LAST if no such function exists.  */
+
+internal_fn
+get_conditional_len_internal_fn (tree_code code)
+{
+  switch (code)
+    {
+#define CASE(CODE, IFN) case CODE: return IFN_COND_LEN_##IFN;
+      FOR_EACH_CODE_MAPPING(CASE)
+#undef CASE
+    default:
+      return IFN_LAST;
+    }
+}
+
 /* Invoke T(IFN) for each internal function IFN that also has an
    IFN_COND_* form.  */
 #define FOR_EACH_COND_FN_PAIR(T) \
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 4234bbfed87..dd1bab0bddf 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -224,6 +224,7 @@ extern bool set_edom_supported_p (void);
 
 extern internal_fn get_conditional_internal_fn (tree_code);
 extern internal_fn get_conditional_internal_fn (internal_fn);
+extern internal_fn get_conditional_len_internal_fn (tree_code);
 extern tree_code conditional_internal_fn_code (internal_fn);
 extern internal_fn get_unconditional_internal_fn (internal_fn);
 extern bool can_interpret_as_conditional_op_p (gimple *, tree *,
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 10e71178ce7..dd24f017235 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -6711,7 +6711,9 @@ vectorizable_operation (vec_info *vinfo,
 
   int reduc_idx = STMT_VINFO_REDUC_IDX (stmt_info);
   vec_loop_masks *masks = (loop_vinfo ? &LOOP_VINFO_MASKS (loop_vinfo) : NULL);
+  vec_loop_lens *lens = (loop_vinfo ? &LOOP_VINFO_LENS (loop_vinfo) : NULL);
   internal_fn cond_fn = get_conditional_internal_fn (code);
+  internal_fn cond_len_fn = get_conditional_len_internal_fn (code);
 
   /* If operating on inactive elements could generate spurious traps,
      we need to restrict the operation to active lanes.  Note that this
@@ -6730,9 +6732,17 @@ vectorizable_operation (vec_info *vinfo,
           && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
           && mask_out_inactive)
         {
-          if (cond_fn == IFN_LAST
-              || !direct_internal_fn_supported_p (cond_fn, vectype,
-                                                  OPTIMIZE_FOR_SPEED))
+          if (cond_fn != IFN_LAST
+              && direct_internal_fn_supported_p (cond_fn, vectype,
+                                                 OPTIMIZE_FOR_SPEED))
+            vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
+                                   vectype, NULL);
+          else if (cond_len_fn != IFN_LAST
+                   && direct_internal_fn_supported_p (cond_len_fn, vectype,
+                                                      OPTIMIZE_FOR_SPEED))
+            vect_record_loop_len (loop_vinfo, lens, ncopies * vec_num, vectype,
+                                  1);
+          else
             {
               if (dump_enabled_p ())
                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -6740,9 +6750,6 @@ vectorizable_operation (vec_info *vinfo,
                                  " conditional operation is available.\n");
               LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
             }
-          else
-            vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
-                                   vectype, NULL);
         }
 
       /* Put types on constant and invariant SLP children.  */
@@ -6805,6 +6812,7 @@ vectorizable_operation (vec_info *vinfo,
                      "transform binary/unary operation.\n");
 
   bool masked_loop_p = loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+  bool len_loop_p = loop_vinfo && LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo);
 
   /* POINTER_DIFF_EXPR has pointer arguments which are vectorized as
      vectors with unsigned elements, but the result is signed.  So, we
@@ -6971,11 +6979,16 @@ vectorizable_operation (vec_info *vinfo,
               gimple_assign_set_lhs (new_stmt, new_temp);
               vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
             }
-          else if (masked_loop_p && mask_out_inactive)
+          else if ((masked_loop_p || len_loop_p) && mask_out_inactive)
             {
-              tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks,
-                                              vec_num * ncopies, vectype, i);
-              auto_vec<tree> vops (5);
+              tree mask;
+              if (masked_loop_p)
+                mask = vect_get_loop_mask (loop_vinfo, gsi, masks,
+                                           vec_num * ncopies, vectype, i);
+              else
+                /* Dummy mask.  */
+                mask = build_minus_one_cst (truth_type_for (vectype));
+              auto_vec<tree> vops (6);
               vops.quick_push (mask);
               vops.quick_push (vop0);
               if (vop1)
@@ -6995,7 +7008,20 @@ vectorizable_operation (vec_info *vinfo,
                     (cond_fn, vectype, vops.length () - 1, &vops[1]);
                   vops.quick_push (else_value);
                 }
-              gcall *call = gimple_build_call_internal_vec (cond_fn, vops);
+              if (len_loop_p)
+                {
+                  tree len = vect_get_loop_len (loop_vinfo, gsi, lens,
+                                                vec_num * ncopies, vectype, i, 1);
+                  signed char biasval
+                    = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+                  tree bias = build_int_cst (intQI_type_node, biasval);
+                  vops.quick_push (len);
+                  vops.quick_push (bias);
+                }
+              gcall *call
+                = gimple_build_call_internal_vec (masked_loop_p ? cond_fn
+                                                                : cond_len_fn,
+                                                  vops);
               new_temp = make_ssa_name (vec_dest, call);
               gimple_call_set_lhs (call, new_temp);
               gimple_call_set_nothrow (call, true);
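
For reference, the semantics spelled out in the comment on
get_conditional_len_internal_fn can be modelled in scalar C.  This is only a
minimal sketch, not GCC code: cond_len_mod_model and its parameter names are
hypothetical, TRUNC_MOD_EXPR is picked as the example CODE, and leaving the
lanes at or beyond LEN + BIAS untouched is an assumption of the sketch (the
comment only describes the first LEN + BIAS lanes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of
     LHS = .COND_LEN_MOD (COND, A, B, ELSE, LEN, BIAS)
   following the comment in the patch: only lanes below LEN + BIAS are
   computed.  Lanes at or beyond LEN + BIAS are left untouched, which is
   an assumption of this sketch.  */
void
cond_len_mod_model (int8_t *lhs, const uint8_t *cond, const int8_t *a,
                    const int8_t *b, const int8_t *else_val,
                    size_t nlanes, size_t len, signed char bias)
{
  /* Use a signed type so that a negative BIAS is handled correctly.  */
  long active = (long) len + bias;
  assert (active >= 0 && (size_t) active <= nlanes);

  for (long i = 0; i < active; i++)
    /* The division is evaluated only for active lanes whose COND is set,
       so a zero divisor in an inactive lane cannot trap.  */
    lhs[i] = cond[i] ? (int8_t) (a[i] % b[i]) : else_val[i];
}
```

With b = {2, 4, 0, 0}, an all-true COND and LEN = 2, BIAS = 0, the zero
divisors in lanes 2 and 3 are never evaluated, which is the property the
patch relies on for operations that could trap.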