From patchwork Thu Feb 22 17:38:00 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Andre Vieira (lists)" X-Patchwork-Id: 204906 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:a81b:b0:108:e6aa:91d0 with SMTP id bq27csp98528dyb; Thu, 22 Feb 2024 09:39:28 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCU3c3IUEaqfRoGRZuAbfbHAWqp57LT3HlbnZB0ob4H3YiEftPxpMNn8E3yHmzDt+uxNHvLAkReLL8tmkH0sx5Ju6Ql7zg== X-Google-Smtp-Source: AGHT+IG9zrtipiPl7rmE03aTvftztOTdl7RPalY0IFQhlgjbfKCOTU6FQ4KT31w1wC3+nZbAbhGo X-Received: by 2002:ac8:5f53:0:b0:42d:c88d:ed4b with SMTP id y19-20020ac85f53000000b0042dc88ded4bmr26317618qta.34.1708623568299; Thu, 22 Feb 2024 09:39:28 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1708623568; cv=pass; d=google.com; s=arc-20160816; b=YbJmzfY+qoObRyBBPyg4xUE5hZO6tS40LMSkBNnsRkhPa5qUvyMJ8TVCllWzeA931w QlXOIUssxUA6Bk9YlZiZ43xSvB7DbzZ8W7VM4O/HUSSda70T4sPp0EBC+x5KA17Hr6dI OgXbI1Zw8szPP5P3bCwQ/iKp2GpDN/UBWiuHTDXmTrqP2C6RLkhTL3awbbIMfwQ3G6pS NdDNyNmqYB9r78rafPtgOggibkk5vFuMvDEDSceihkBlGC4pg6rv2IX+DUt0UndHgTRu PT4Wi4YD2ullL3HE5Odpo17oWm6tsD3Ooj13QINC/C6nsd2jjqJJRq96Q6h+9PZ/+fX5 7EUA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:mime-version:references :in-reply-to:message-id:date:subject:cc:to:from:arc-filter :dmarc-filter:delivered-to; bh=m2HqvIhWQy2jqFbWXn3lktzGN6PygkkqBPtJ4pnyxIg=; fh=lh7ptWl8zOQobrGh4MOY/ctpcMDiGJivh5VBXqVS558=; b=uhl7LxrWC2D5VftoYRd1aIklxRjDLcwXhrkr9kEERvwTnXuMcx3JNznvr8gjegQ+Yw LHtOEWsX1xvJPPN0FDGMI5XJMFYyfYgGSBcpgWtW1pzQa6+jHrrKRojO0XMf9k0pUIcM cNSmi/OENPEs9kpC9Idy11t3EosBUlqlnSIU4e/6UacoJbmkmmV6Vz9FNiTKZRMF/Sak dkWETu2eQgLFyqRx4cWIDBeguHp62Mt+aR9wOFfHWG8bPToK5QXpJLyNpx76jXLcc9jl MeE28wtj4WOfU2ZF6xYKPkFTuXGlbkeFqOeFFAmcxEgRNbh+Q4Wk2lE5hKN+F8a7PyoY sAcQ==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from server2.sourceware.org (server2.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id s10-20020a05622a178a00b0042c145dbf0asi13994427qtk.540.2024.02.22.09.39.28 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 22 Feb 2024 09:39:28 -0800 (PST) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id B3AF4385841B for ; Thu, 22 Feb 2024 17:39:27 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id E6C103858417 for ; Thu, 22 Feb 2024 17:38:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org E6C103858417 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org E6C103858417 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1708623501; cv=none; b=DiX2NUkGS5Of7U9sU1QBj5zB9P6/tW1WcrdBsyJdCr6xDEPKsZSf9xYbKtbEUO5mtD24l1VKILXhAZ5bhQ+Axd5lAHrsv3gaUjVR6Jjduu+1E/sEgdtJtYAuVJq8hMKmT2IzuZdFPHlOPRW0XJ5oNpxXn6P+6RGuYhz2XzF7fxY= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1708623501; c=relaxed/simple; bh=E3GR03Fn5utF60wQ3Q6a8gIJr7cp5+hTITQ6sTCufV0=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=p/W9GTOwFiKOZ1F1BdvVZH1cks/xxU1TxhNDlN2REUaTGEKdsTtltzo0Z24ZnEorTz3JAghxj992bUKBYmbtGJb+JV8+RIySgaBvUo+VPqnfTVupm05/1PQDGt5v2h7cCG6P0xhIGEF9T/Nzo3bZdorOOvDsYAZeJdlSKmuaUEs= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 2F582DA7; Thu, 22 Feb 2024 09:38:57 -0800 (PST) Received: from e107157-lin.cambridge.arm.com (e107157-lin.cambridge.arm.com [10.2.78.70]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 08E673F73F; Thu, 22 Feb 2024 09:38:17 -0800 (PST) From: Andre Vieira To: gcc-patches@gcc.gnu.org Cc: stam.markianos-wright@arm.com, richard.earnshaw@arm.com, Andre Vieira Subject: [PATCH v5 2/5] doloop: Add support for predicated vectorized loops Date: Thu, 22 Feb 2024 17:38:00 +0000 Message-Id: <20240222173803.20989-3-andre.simoesdiasvieira@arm.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20240222173803.20989-1-andre.simoesdiasvieira@arm.com> References: <20240222173803.20989-1-andre.simoesdiasvieira@arm.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1791621666752941396 X-GMAIL-MSGID: 1791621666752941396 This patch adds support in the target agnostic doloop pass for the detection of predicated vectorized hardware loops. Arm is currently the only target that will make use of this feature. gcc/ChangeLog: * df-core.cc (df_bb_regno_only_def_find): New helper function. * df.h (df_bb_regno_only_def_find): Declare new function. * loop-doloop.cc (doloop_condition_get): Add support for detecting predicated vectorized hardware loops. (doloop_modify): Add support for GTU condition checks. (doloop_optimize): Update costing computation to support alterations to desc->niter_expr by the backend. Co-authored-by: Stam Markianos-Wright --- gcc/df-core.cc | 15 +++++ gcc/df.h | 1 + gcc/loop-doloop.cc | 164 +++++++++++++++++++++++++++------------------ 3 files changed, 113 insertions(+), 67 deletions(-) diff --git a/gcc/df-core.cc b/gcc/df-core.cc index f0eb4c93957..b0e8a88d433 100644 --- a/gcc/df-core.cc +++ b/gcc/df-core.cc @@ -1964,6 +1964,21 @@ df_bb_regno_last_def_find (basic_block bb, unsigned int regno) return NULL; } +/* Return the one and only def of REGNO within BB. If there is no def or + there are multiple defs, return NULL. */ + +df_ref +df_bb_regno_only_def_find (basic_block bb, unsigned int regno) +{ + df_ref temp = df_bb_regno_first_def_find (bb, regno); + if (!temp) + return NULL; + else if (temp == df_bb_regno_last_def_find (bb, regno)) + return temp; + else + return NULL; +} + /* Finds the reference corresponding to the definition of REG in INSN. DF is the dataflow object. */ diff --git a/gcc/df.h b/gcc/df.h index 84e5aa8b524..c4e690b40cf 100644 --- a/gcc/df.h +++ b/gcc/df.h @@ -987,6 +987,7 @@ extern void df_check_cfg_clean (void); #endif extern df_ref df_bb_regno_first_def_find (basic_block, unsigned int); extern df_ref df_bb_regno_last_def_find (basic_block, unsigned int); +extern df_ref df_bb_regno_only_def_find (basic_block, unsigned int); extern df_ref df_find_def (rtx_insn *, rtx); extern bool df_reg_defined (rtx_insn *, rtx); extern df_ref df_find_use (rtx_insn *, rtx); diff --git a/gcc/loop-doloop.cc b/gcc/loop-doloop.cc index 529e810e530..8953e1de960 100644 --- a/gcc/loop-doloop.cc +++ b/gcc/loop-doloop.cc @@ -85,10 +85,10 @@ doloop_condition_get (rtx_insn *doloop_pat) forms: 1) (parallel [(set (pc) (if_then_else (condition) - (label_ref (label)) - (pc))) - (set (reg) (plus (reg) (const_int -1))) - (additional clobbers and uses)]) + (label_ref (label)) + (pc))) + (set (reg) (plus (reg) (const_int -1))) + (additional clobbers and uses)]) The branch must be the first entry of the parallel (also required by jump.cc), and the second entry of the parallel must be a set of @@ -96,19 +96,33 @@ doloop_condition_get (rtx_insn *doloop_pat) the loop counter in an if_then_else too. 2) (set (reg) (plus (reg) (const_int -1)) - (set (pc) (if_then_else (reg != 0) - (label_ref (label)) - (pc))). + (set (pc) (if_then_else (reg != 0) + (label_ref (label)) + (pc))). - Some targets (ARM) do the comparison before the branch, as in the + 3) Some targets (Arm) do the comparison before the branch, as in the following form: - 3) (parallel [(set (cc) (compare ((plus (reg) (const_int -1), 0))) - (set (reg) (plus (reg) (const_int -1)))]) - (set (pc) (if_then_else (cc == NE) - (label_ref (label)) - (pc))) */ - + (parallel [(set (cc) (compare (plus (reg) (const_int -1)) 0)) + (set (reg) (plus (reg) (const_int -1)))]) + (set (pc) (if_then_else (cc == NE) + (label_ref (label)) + (pc))) + + 4) This form supports a construct that is used to represent a vectorized + do loop with predication, however we do not need to care about the + details of the predication here. + Arm uses this construct to support MVE tail predication. + + (parallel + [(set (pc) + (if_then_else (gtu (plus (reg) (const_int -n)) + (const_int n-1)) + (label_ref) + (pc))) + (set (reg) (plus (reg) (const_int -n))) + (additional clobbers and uses)]) + */ pattern = PATTERN (doloop_pat); if (GET_CODE (pattern) != PARALLEL) @@ -173,15 +187,17 @@ doloop_condition_get (rtx_insn *doloop_pat) if (! REG_P (reg)) return 0; - /* Check if something = (plus (reg) (const_int -1)). + /* Check if something = (plus (reg) (const_int -n)). On IA-64, this decrement is wrapped in an if_then_else. */ inc_src = SET_SRC (inc); if (GET_CODE (inc_src) == IF_THEN_ELSE) inc_src = XEXP (inc_src, 1); if (GET_CODE (inc_src) != PLUS - || XEXP (inc_src, 0) != reg - || XEXP (inc_src, 1) != constm1_rtx) + || !rtx_equal_p (XEXP (inc_src, 0), reg) + || !CONST_INT_P (XEXP (inc_src, 1)) + || INTVAL (XEXP (inc_src, 1)) >= 0) return 0; + int dec_num = -INTVAL (XEXP (inc_src, 1)); /* Check for (set (pc) (if_then_else (condition) (label_ref (label)) @@ -196,60 +212,63 @@ doloop_condition_get (rtx_insn *doloop_pat) /* Extract loop termination condition. */ condition = XEXP (SET_SRC (cmp), 0); - /* We expect a GE or NE comparison with 0 or 1. */ - if ((GET_CODE (condition) != GE - && GET_CODE (condition) != NE) - || (XEXP (condition, 1) != const0_rtx - && XEXP (condition, 1) != const1_rtx)) + /* We expect a GE or NE comparison with 0 or 1, or a GTU comparison with + dec_num - 1. */ + if (!((GET_CODE (condition) == GE + || GET_CODE (condition) == NE) + && (XEXP (condition, 1) == const0_rtx + || XEXP (condition, 1) == const1_rtx )) + &&!(GET_CODE (condition) == GTU + && ((INTVAL (XEXP (condition, 1))) == (dec_num - 1)))) return 0; - if ((XEXP (condition, 0) == reg) + if (rtx_equal_p (XEXP (condition, 0), reg) /* For the third case: */ || ((cc_reg != NULL_RTX) && (XEXP (condition, 0) == cc_reg) - && (reg_orig == reg)) + && (rtx_equal_p (reg_orig, reg))) || (GET_CODE (XEXP (condition, 0)) == PLUS - && XEXP (XEXP (condition, 0), 0) == reg)) - { - if (GET_CODE (pattern) != PARALLEL) - /* For the second form we expect: + && rtx_equal_p (XEXP (XEXP (condition, 0), 0), reg, NULL))) + { + if (GET_CODE (pattern) != PARALLEL) + /* For the second form we expect: - (set (reg) (plus (reg) (const_int -1)) - (set (pc) (if_then_else (reg != 0) - (label_ref (label)) - (pc))). + (set (reg) (plus (reg) (const_int -1)) + (set (pc) (if_then_else (reg != 0) + (label_ref (label)) + (pc))). - is equivalent to the following: + is equivalent to the following: - (parallel [(set (pc) (if_then_else (reg != 1) - (label_ref (label)) - (pc))) - (set (reg) (plus (reg) (const_int -1))) - (additional clobbers and uses)]) + (parallel [(set (pc) (if_then_else (reg != 1) + (label_ref (label)) + (pc))) + (set (reg) (plus (reg) (const_int -1))) + (additional clobbers and uses)]) - For the third form we expect: + For the third form we expect: - (parallel [(set (cc) (compare ((plus (reg) (const_int -1)), 0)) - (set (reg) (plus (reg) (const_int -1)))]) - (set (pc) (if_then_else (cc == NE) - (label_ref (label)) - (pc))) + (parallel [(set (cc) (compare ((plus (reg) (const_int -1)), 0)) + (set (reg) (plus (reg) (const_int -1)))]) + (set (pc) (if_then_else (cc == NE) + (label_ref (label)) + (pc))) - which is equivalent to the following: + which is equivalent to the following: - (parallel [(set (cc) (compare (reg, 1)) - (set (reg) (plus (reg) (const_int -1))) - (set (pc) (if_then_else (NE == cc) - (label_ref (label)) - (pc))))]) + (parallel [(set (cc) (compare (reg, 1)) + (set (reg) (plus (reg) (const_int -1))) + (set (pc) (if_then_else (NE == cc) + (label_ref (label)) + (pc))))]) - So we return the second form instead for the two cases. + So we return the second form instead for the two cases. */ - condition = gen_rtx_fmt_ee (NE, VOIDmode, inc_src, const1_rtx); + condition = gen_rtx_fmt_ee (NE, VOIDmode, inc_src, const1_rtx); return condition; - } + } /* ??? If a machine uses a funny comparison, we could return a canonicalized form here. */ @@ -507,6 +526,11 @@ doloop_modify (class loop *loop, class niter_desc *desc, nonneg = 1; break; + case GTU: + /* The iteration count does not need incrementing for a GTU test. */ + increment_count = false; + break; + /* Abort if an invalid doloop pattern has been generated. */ default: gcc_unreachable (); @@ -529,6 +553,10 @@ doloop_modify (class loop *loop, class niter_desc *desc, if (desc->noloop_assumptions) { + /* The GTU case has only been implemented for Arm, where + noloop_assumptions gets explicitly set to NULL for that case, so + assert here for safety. */ + gcc_assert (GET_CODE (condition) != GTU); rtx ass = copy_rtx (desc->noloop_assumptions); basic_block preheader = loop_preheader_edge (loop)->src; basic_block set_zero = split_edge (loop_preheader_edge (loop)); @@ -642,7 +670,7 @@ doloop_optimize (class loop *loop) { scalar_int_mode mode; rtx doloop_reg; - rtx count; + rtx count = NULL_RTX; widest_int iterations, iterations_max; rtx_code_label *start_label; rtx condition; @@ -685,17 +713,6 @@ doloop_optimize (class loop *loop) return false; } - max_cost - = COSTS_N_INSNS (param_max_iterations_computation_cost); - if (set_src_cost (desc->niter_expr, mode, optimize_loop_for_speed_p (loop)) - > max_cost) - { - if (dump_file) - fprintf (dump_file, - "Doloop: number of iterations too costly to compute.\n"); - return false; - } - if (desc->const_iter) iterations = widest_int::from (rtx_mode_t (desc->niter_expr, mode), UNSIGNED); @@ -716,12 +733,25 @@ doloop_optimize (class loop *loop) /* Generate looping insn. If the pattern FAILs then give up trying to modify the loop since there is some aspect the back-end does - not like. */ - count = copy_rtx (desc->niter_expr); + not like. If this succeeds, there is a chance that the loop + desc->niter_expr has been altered by the backend, so only extract + that data after the gen_doloop_end. */ start_label = block_label (desc->in_edge->dest); doloop_reg = gen_reg_rtx (mode); rtx_insn *doloop_seq = targetm.gen_doloop_end (doloop_reg, start_label); + max_cost + = COSTS_N_INSNS (param_max_iterations_computation_cost); + if (set_src_cost (desc->niter_expr, mode, optimize_loop_for_speed_p (loop)) + > max_cost) + { + if (dump_file) + fprintf (dump_file, + "Doloop: number of iterations too costly to compute.\n"); + return false; + } + + count = copy_rtx (desc->niter_expr); word_mode_size = GET_MODE_PRECISION (word_mode); word_mode_max = (HOST_WIDE_INT_1U << (word_mode_size - 1) << 1) - 1; if (! doloop_seq