From patchwork Thu Jul 13 08:54:34 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai"
X-Patchwork-Id: 119692
From: juzhe.zhong@rivai.ai
To: gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com, rguenther@suse.de, Ju-Zhe Zhong
Subject: [PATCH V2] SSA MATH: Support COND_LEN_FMA for floating-point math optimization
Date: Thu, 13 Jul 2023 16:54:34 +0800
Message-Id: <20230713085434.3381643-1-juzhe.zhong@rivai.ai>
X-Mailer: git-send-email 2.36.3
List-Id: Gcc-patches mailing list

From: Ju-Zhe Zhong

Hi, Richard and Richi.

The previous patch added support for COND_LEN_* binary operations, but
not for COND_LEN_* ternary operations.  This patch adds support for
COND_LEN_* ternary operations.

Consider the following case:

#define TEST_TYPE(TYPE) \
  __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst, \
                                              TYPE *__restrict a, \
                                              TYPE *__restrict b, \
                                              TYPE *__restrict c, int n) \
  { \
    for (int i = 0; i < n; i++) \
      dst[i] += a[i] * b[i]; \
  }

#define TEST_ALL() TEST_TYPE (double)

TEST_ALL ()

Before this patch:
...
COND_LEN_MUL
COND_LEN_ADD

After this patch:
...
COND_LEN_FMA

gcc/ChangeLog:

	* genmatch.cc (commutative_op): Add COND_LEN_*.
	* internal-fn.cc (first_commutative_argument): Ditto.
	(CASE): Ditto.
	(get_unconditional_internal_fn): Ditto.
	(can_interpret_as_conditional_op_p): Ditto.
	(internal_fn_len_index): Ditto.
	* internal-fn.h (can_interpret_as_conditional_op_p): Ditto.
	* tree-ssa-math-opts.cc (convert_mult_to_fma_1): Ditto.
	(convert_mult_to_fma): Ditto.
	(math_opts_dom_walker::after_dom_children): Ditto.

---
 gcc/genmatch.cc           | 13 ++++++
 gcc/internal-fn.cc        | 87 ++++++++++++++++++++++++++++++++++-----
 gcc/internal-fn.h         |  2 +-
 gcc/tree-ssa-math-opts.cc | 80 +++++++++++++++++++++++++++++------
 4 files changed, 159 insertions(+), 23 deletions(-)

diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index 5fceeec9780..2302f2a7ff0 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -559,6 +559,19 @@ commutative_op (id_base *id)
       case CFN_COND_FMS:
       case CFN_COND_FNMA:
       case CFN_COND_FNMS:
+      case CFN_COND_LEN_ADD:
+      case CFN_COND_LEN_MUL:
+      case CFN_COND_LEN_MIN:
+      case CFN_COND_LEN_MAX:
+      case CFN_COND_LEN_FMIN:
+      case CFN_COND_LEN_FMAX:
+      case CFN_COND_LEN_AND:
+      case CFN_COND_LEN_IOR:
+      case CFN_COND_LEN_XOR:
+      case CFN_COND_LEN_FMA:
+      case CFN_COND_LEN_FMS:
+      case CFN_COND_LEN_FNMA:
+      case CFN_COND_LEN_FNMS:
         return 1;
 
       default:
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index c11123a1173..e698f0bffc7 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4191,6 +4191,19 @@ first_commutative_argument (internal_fn fn)
     case IFN_COND_FMS:
     case IFN_COND_FNMA:
     case IFN_COND_FNMS:
+    case IFN_COND_LEN_ADD:
+    case IFN_COND_LEN_MUL:
+    case IFN_COND_LEN_MIN:
+    case IFN_COND_LEN_MAX:
+    case IFN_COND_LEN_FMIN:
+    case IFN_COND_LEN_FMAX:
+    case IFN_COND_LEN_AND:
+    case IFN_COND_LEN_IOR:
+    case IFN_COND_LEN_XOR:
+    case IFN_COND_LEN_FMA:
+    case IFN_COND_LEN_FMS:
+    case IFN_COND_LEN_FNMA:
+    case IFN_COND_LEN_FNMS:
       return 1;
 
     default:
@@ -4330,11 +4343,14 @@ conditional_internal_fn_code (internal_fn ifn)
 {
   switch (ifn)
     {
-#define CASE(CODE, IFN) case IFN_COND_##IFN: return CODE;
-      FOR_EACH_CODE_MAPPING(CASE)
+#define CASE(CODE, IFN) \
+  case IFN_COND_##IFN: \
+  case IFN_COND_LEN_##IFN: \
+    return CODE;
+      FOR_EACH_CODE_MAPPING (CASE)
 #undef CASE
-      default:
-        return ERROR_MARK;
+    default:
+      return ERROR_MARK;
     }
 }
 
@@ -4433,6 +4449,18 @@ get_unconditional_internal_fn (internal_fn ifn)
    operating elementwise if the operands are vectors.  This includes
    the case of an all-true COND, so that the operation always happens.
 
+   There is an alternative approach to interpret the STMT when the operands
+   are vectors which is the operation predicated by both conditional mask
+   and loop control length, the equivalent C code:
+
+     for (int i = 0; i < NUNITS; i++)
+      {
+        if (i < LEN + BIAS && COND[i])
+          LHS[i] = A[i] CODE B[i];
+        else
+          LHS[i] = ELSE[i];
+      }
+
    When returning true, set:
 
    - *COND_OUT to the condition COND, or to NULL_TREE if the condition
@@ -4440,13 +4468,18 @@ get_unconditional_internal_fn (internal_fn ifn)
    - *CODE_OUT to the tree code
    - OPS[I] to operand I of *CODE_OUT
    - *ELSE_OUT to the fallback value ELSE, or to NULL_TREE if the
-     condition is known to be all true.  */
+     condition is known to be all true.
+   - *LEN to the len argument if it COND_LEN_* operations or to NULL_TREE.
+   - *BIAS to the bias argument if it COND_LEN_* operations or to NULL_TREE.  */
 
 bool
 can_interpret_as_conditional_op_p (gimple *stmt, tree *cond_out,
                                    tree_code *code_out,
-                                   tree (&ops)[3], tree *else_out)
+                                   tree (&ops)[3], tree *else_out,
+                                   tree *len, tree *bias)
 {
+  *len = NULL_TREE;
+  *bias = NULL_TREE;
   if (gassign *assign = dyn_cast <gassign *> (stmt))
     {
       *cond_out = NULL_TREE;
@@ -4462,18 +4495,28 @@ can_interpret_as_conditional_op_p (gimple *stmt, tree *cond_out,
     {
       internal_fn ifn = gimple_call_internal_fn (call);
       tree_code code = conditional_internal_fn_code (ifn);
+      int len_index = internal_fn_len_index (ifn);
+      int cond_nargs = len_index >= 0 ? 4 : 2;
       if (code != ERROR_MARK)
         {
           *cond_out = gimple_call_arg (call, 0);
           *code_out = code;
-          unsigned int nops = gimple_call_num_args (call) - 2;
+          unsigned int nops = gimple_call_num_args (call) - cond_nargs;
           for (unsigned int i = 0; i < 3; ++i)
             ops[i] = i < nops ? gimple_call_arg (call, i + 1) : NULL_TREE;
           *else_out = gimple_call_arg (call, nops + 1);
-          if (integer_truep (*cond_out))
+          if (len_index < 0)
+            {
+              if (integer_truep (*cond_out))
+                {
+                  *cond_out = NULL_TREE;
+                  *else_out = NULL_TREE;
+                }
+            }
+          else
             {
-              *cond_out = NULL_TREE;
-              *else_out = NULL_TREE;
+              *len = gimple_call_arg (call, len_index);
+              *bias = gimple_call_arg (call, len_index + 1);
             }
           return true;
         }
@@ -4561,8 +4604,32 @@ internal_fn_len_index (internal_fn fn)
 
     case IFN_LEN_MASK_GATHER_LOAD:
     case IFN_LEN_MASK_SCATTER_STORE:
+    case IFN_COND_LEN_FMA:
+    case IFN_COND_LEN_FMS:
+    case IFN_COND_LEN_FNMA:
+    case IFN_COND_LEN_FNMS:
       return 5;
 
+    case IFN_COND_LEN_ADD:
+    case IFN_COND_LEN_SUB:
+    case IFN_COND_LEN_MUL:
+    case IFN_COND_LEN_DIV:
+    case IFN_COND_LEN_MOD:
+    case IFN_COND_LEN_RDIV:
+    case IFN_COND_LEN_MIN:
+    case IFN_COND_LEN_MAX:
+    case IFN_COND_LEN_FMIN:
+    case IFN_COND_LEN_FMAX:
+    case IFN_COND_LEN_AND:
+    case IFN_COND_LEN_IOR:
+    case IFN_COND_LEN_XOR:
+    case IFN_COND_LEN_SHL:
+    case IFN_COND_LEN_SHR:
+      return 4;
+
+    case IFN_COND_LEN_NEG:
+      return 3;
+
     default:
       return -1;
     }
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index dd1bab0bddf..a5c3f4765ff 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -229,7 +229,7 @@ extern tree_code conditional_internal_fn_code (internal_fn);
 extern internal_fn get_unconditional_internal_fn (internal_fn);
 extern bool can_interpret_as_conditional_op_p (gimple *, tree *,
                                                tree_code *, tree (&)[3],
-                                               tree *);
+                                               tree *, tree *, tree *);
 
 extern bool internal_load_fn_p (internal_fn);
 extern bool internal_store_fn_p (internal_fn);
diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index 68fc518b1ab..712097ac5be 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -3099,10 +3099,11 @@ convert_mult_to_fma_1 (tree mul_result, tree op1, tree op2)
           negate_p = true;
         }
 
-      tree cond, else_value, ops[3];
+      tree cond, else_value, ops[3], len, bias;
       tree_code code;
       if (!can_interpret_as_conditional_op_p (use_stmt, &cond, &code,
-                                              ops, &else_value))
+                                              ops, &else_value,
+                                              &len, &bias))
         gcc_unreachable ();
       addop = ops[0] == result ? ops[1] : ops[0];
 
@@ -3122,7 +3123,11 @@ convert_mult_to_fma_1 (tree mul_result, tree op1, tree op2)
       if (seq)
         gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
 
-      if (cond)
+      if (len)
+        fma_stmt
+          = gimple_build_call_internal (IFN_COND_LEN_FMA, 7, cond, mulop1, op2,
+                                        addop, else_value, len, bias);
+      else if (cond)
         fma_stmt = gimple_build_call_internal (IFN_COND_FMA, 5, cond, mulop1,
                                                op2, addop, else_value);
       else
@@ -3307,7 +3312,8 @@ last_fma_candidate_feeds_initial_phi (fma_deferring_state *state,
 
 static bool
 convert_mult_to_fma (gimple *mul_stmt, tree op1, tree op2,
-                     fma_deferring_state *state, tree mul_cond = NULL_TREE)
+                     fma_deferring_state *state, tree mul_cond = NULL_TREE,
+                     tree mul_len = NULL_TREE, tree mul_bias = NULL_TREE)
 {
   tree mul_result = gimple_get_lhs (mul_stmt);
   /* If there isn't a LHS then this can't be an FMA.  There can be no LHS
@@ -3420,10 +3426,10 @@ convert_mult_to_fma (gimple *mul_stmt, tree op1, tree op2,
           negate_p = seen_negate_p = true;
         }
 
-      tree cond, else_value, ops[3];
+      tree cond, else_value, ops[3], len, bias;
       tree_code code;
       if (!can_interpret_as_conditional_op_p (use_stmt, &cond, &code, ops,
-                                              &else_value))
+                                              &else_value, &len, &bias))
         return false;
 
       switch (code)
@@ -3439,15 +3445,49 @@ convert_mult_to_fma (gimple *mul_stmt, tree op1, tree op2,
           return false;
         }
 
-      if (mul_cond && cond != mul_cond)
-        return false;
-
-      if (cond)
+      if (len)
         {
-          if (cond == result || else_value == result)
+          /* For COND_LEN_* operations, we may have dummy mask which is
+             the all true mask.  Such TREE type may be mul_cond != cond
+             but we still consider they are equal.  */
+          if (mul_cond && cond != mul_cond
+              && !(integer_truep (mul_cond) && integer_truep (cond)))
             return false;
-          if (!direct_internal_fn_supported_p (IFN_COND_FMA, type, opt_type))
+
+          if (else_value == result)
+            return false;
+
+          if (!direct_internal_fn_supported_p (IFN_COND_LEN_FMA, type,
+                                               opt_type))
             return false;
+
+          if (mul_len)
+            {
+              poly_int64 mul_value, value;
+              if (poly_int_tree_p (mul_len, &mul_value)
+                  && poly_int_tree_p (len, &value)
+                  && maybe_ne (mul_value, value))
+                return false;
+              else if (mul_len != len)
+                return false;
+
+              if (wi::to_widest (mul_bias) != wi::to_widest (bias))
+                return false;
+            }
+        }
+      else
+        {
+          if (mul_cond && cond != mul_cond)
+            return false;
+
+          if (cond)
+            {
+              if (cond == result || else_value == result)
+                return false;
+              if (!direct_internal_fn_supported_p (IFN_COND_FMA, type,
+                                                   opt_type))
+                return false;
+            }
         }
 
       /* If the subtrahend (OPS[1]) is computed by a MULT_EXPR that
@@ -5632,6 +5672,22 @@ math_opts_dom_walker::after_dom_children (basic_block bb)
                 }
               break;
 
+            case CFN_COND_LEN_MUL:
+              if (convert_mult_to_fma (stmt,
+                                       gimple_call_arg (stmt, 1),
+                                       gimple_call_arg (stmt, 2),
+                                       &fma_state,
+                                       gimple_call_arg (stmt, 0),
+                                       gimple_call_arg (stmt, 4),
+                                       gimple_call_arg (stmt, 5)))
+
+                {
+                  gsi_remove (&gsi, true);
+                  release_defs (stmt);
+                  continue;
+                }
+              break;
+
             case CFN_LAST:
               cancel_fma_deferring (&fma_state);
               break;