From patchwork Tue Aug 16 14:04:57 2022
From: Richard Biener
Date: Tue, 16 Aug 2022 14:04:57 +0000 (UTC)
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] Refactor back_threader_profitability
Message-Id: <20220816140546.3823F3858436@sourceware.org>

The following refactors profitable_path_p in the backward threader,
splitting out parts that can be computed once the exit block is known,
parts that contiguously update and that can be checked allowing for
the path to be later identified as FSM with larger limits,
possibly_profitable_path_p, and final
checks done when the whole path is known, profitable_path_p.  I've
removed the back_threader_profitability instance from the back_threader
class and instead instantiate it once per path discovery.  I've kept
the size compute non-incremental to simplify the patch and to not
worry about unwinding.

There are key changes to previous behavior - namely we apply
param_max_jump_thread_duplication_stmts early only when we know the
path cannot become an FSM one (multiway + thread through latch), but
make sure to elide the path query when we didn't yet discover that
but are over this limit.  Similarly the speed limit is now used even
when we did not yet discover a hot BB on the path.  Basically the
idea is to only stop path discovery when we know the path will never
become profitable, but to avoid the expensive path range query when
we know it currently isn't.

I've done a few cleanups, merging functions, on the way.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Statistics show an overall slight increase in threading, but looking
at different files there's noise up and down.  That's somewhat
expected since we are now applying the "more correct" limits in the
end.  Unless I made big mistakes, of course.  The next thing
cost-wise would be to raise the backwards threading limit to the
limit of DOM so we don't get artificially high counts for that.

OK?

Thanks,
Richard.

	* tree-ssa-threadbackward.cc (back_threader_profitability):
	Split profitable_path_p into possibly_profitable_path_p and
	itself, keep state as new members.
	(back_threader::m_profit): Remove.
	(back_threader::find_paths): Likewise.
	(back_threader::maybe_register_path): Take profitability
	instance as parameter.
	(back_threader::find_paths_to_names): Likewise.  Use
	possibly_profitable_path_p and avoid the path range query
	when the path is currently too large.
	(back_threader::find_paths): Fold into ...
	(back_threader::maybe_thread_block): ... this.
	(get_gimple_control_stmt): Remove.
	(back_threader_profitability::possibly_profitable_path_p):
	Split out from profitable_path_p, do early profitability
	checks.
	(back_threader_profitability::profitable_path_p): Do final
	profitability checks after the taken edge has been determined.
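For readers less familiar with the threader, here is a minimal,
self-contained sketch of the two-phase check this patch introduces.
It is an illustration only: the class, the block representation and
the two limits below are invented stand-ins (roughly playing the
roles of back_threader_profitability, param_max_fsm_thread_path_insns
and param_max_jump_thread_duplication_stmts), not the real GCC API.

// Toy model of the two-phase profitability check; all names, limits
// and data structures here are made up for illustration.
#include <cstdio>
#include <vector>

struct block { int n_insns; bool is_latch; };

static const int max_fsm_path_insns = 100; // large limit, FSM threads
static const int max_dup_insns = 15;       // small limit, plain threads

class profitability
{
public:
  // Cheap checks, run on every partial path during discovery.
  // Returns false only if the path can never become profitable.
  // Sets *large_non_fsm when the path is over the small limit but
  // a backedge discovered later could still make it an FSM thread.
  bool possibly_profitable_path_p (const std::vector<block> &path,
				   bool *large_non_fsm)
  {
    m_n_insns = 0;
    m_through_latch = false;
    for (const block &b : path)
      {
	m_n_insns += b.n_insns;
	m_through_latch |= b.is_latch;
      }
    if (m_n_insns >= max_fsm_path_insns)
      return false;
    *large_non_fsm = !m_through_latch && m_n_insns >= max_dup_insns;
    return true;
  }

  // Final check, run once the full path and its taken edge are known.
  bool profitable_path_p (bool taken_edge_hot) const
  {
    if (!taken_edge_hot)
      return false;
    int limit = m_through_latch ? max_fsm_path_insns : max_dup_insns;
    return m_n_insns < limit;
  }

private:
  int m_n_insns = 0;
  bool m_through_latch = false;
};

int main ()
{
  profitability profit;
  std::vector<block> path { { 4, false }, { 20, false } };
  bool large_non_fsm;
  // During discovery: stop only when the path can never pay off,
  // and skip the expensive final check while large_non_fsm is set.
  if (profit.possibly_profitable_path_p (path, &large_non_fsm)
      && !large_non_fsm
      && profit.profitable_path_p (true))
    std::printf ("register thread\n");
  else
    std::printf ("keep searching or give up\n");
  return 0;
}

The point of the split is visible in main: the cheap check runs on
every partial path, and only its large_non_fsm flag gates the final
verdict, mirroring how the patch elides the path range query for
paths that are currently too large but might still become FSM threads.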
---
 gcc/tree-ssa-threadbackward.cc | 350 ++++++++++++++++++---------------
 1 file changed, 192 insertions(+), 158 deletions(-)

diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
index 1c362839fab..077a767e73d 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -61,15 +61,33 @@ public:
 class back_threader_profitability
 {
 public:
-  back_threader_profitability (bool speed_p)
-    : m_speed_p (speed_p)
-  { }
-  bool profitable_path_p (const vec<basic_block> &, tree name, edge taken,
-			  bool *irreducible_loop = NULL);
+  back_threader_profitability (bool speed_p, gimple *stmt);
+  bool possibly_profitable_path_p (const vec<basic_block> &, tree, bool *);
+  bool profitable_path_p (const vec<basic_block> &,
+			  edge taken, bool *irreducible_loop);
 private:
   const bool m_speed_p;
+  int m_exit_jump_benefit;
+  bool m_threaded_multiway_branch;
+  // The following are computed by possibly_profitable_path_p
+  bool m_threaded_through_latch;
+  bool m_multiway_branch_in_path;
+  bool m_contains_hot_bb;
+  int m_n_insns;
 };
 
+back_threader_profitability::back_threader_profitability (bool speed_p,
+							   gimple *last)
+  : m_speed_p (speed_p)
+{
+  m_threaded_multiway_branch = (gimple_code (last) == GIMPLE_SWITCH
+				|| gimple_code (last) == GIMPLE_GOTO);
+  // The forward threader has estimate_threading_killed_stmts, in
+  // particular it estimates further DCE from eliminating the exit
+  // control stmt.
+  m_exit_jump_benefit = estimate_num_insns (last, &eni_size_weights);
+}
+
 // Back threader flags.
 #define BT_NONE 0
 // Generate fast code at the expense of code size.
@@ -86,11 +104,11 @@ public:
   unsigned thread_blocks ();
 private:
   void maybe_thread_block (basic_block bb);
-  void find_paths (basic_block bb, tree name);
   bool debug_counter ();
-  edge maybe_register_path ();
+  edge maybe_register_path (back_threader_profitability &);
   void maybe_register_path_dump (edge taken_edge);
-  void find_paths_to_names (basic_block bb, bitmap imports, unsigned);
+  void find_paths_to_names (basic_block bb, bitmap imports, unsigned,
+			    back_threader_profitability &);
   edge find_taken_edge (const vec<basic_block> &path);
   edge find_taken_edge_cond (const vec<basic_block> &path, gcond *);
   edge find_taken_edge_switch (const vec<basic_block> &path, gswitch *);
@@ -98,7 +116,6 @@ private:
   virtual void dump (FILE *out);
 
   back_threader_registry m_registry;
-  back_threader_profitability m_profit;
   path_range_query *m_solver;
 
   // Current path being analyzed.
@@ -131,8 +148,7 @@ private:
 const edge back_threader::UNREACHABLE_EDGE = (edge) -1;
 
 back_threader::back_threader (function *fun, unsigned flags, bool first)
-  : m_profit (flags & BT_SPEED),
-    m_first (first)
+  : m_first (first)
 {
   if (flags & BT_SPEED)
     loop_optimizer_init (LOOPS_HAVE_PREHEADERS | LOOPS_HAVE_SIMPLE_LATCHES);
@@ -226,7 +242,7 @@ back_threader::maybe_register_path_dump (edge taken)
 // unreachable.
 
 edge
-back_threader::maybe_register_path ()
+back_threader::maybe_register_path (back_threader_profitability &profit)
 {
   edge taken_edge = find_taken_edge (m_path);
 
@@ -241,8 +257,7 @@ back_threader::maybe_register_path ()
       else
 	{
 	  bool irreducible = false;
-	  if (m_profit.profitable_path_p (m_path, m_name, taken_edge,
-					  &irreducible)
+	  if (profit.profitable_path_p (m_path, taken_edge, &irreducible)
 	      && debug_counter ()
 	      && m_registry.register_path (m_path, taken_edge))
 	    {
@@ -342,17 +357,21 @@ back_threader::find_taken_edge_cond (const vec<basic_block> &path,
 
 void
 back_threader::find_paths_to_names (basic_block bb, bitmap interesting,
-				    unsigned overall_paths)
+				    unsigned overall_paths,
+				    back_threader_profitability &profit)
 {
   if (m_visited_bbs.add (bb))
     return;
 
   m_path.safe_push (bb);
 
-  // Try to resolve the path without looking back.
+  // Try to resolve the path without looking back.  Avoid resolving paths
+  // we know are large but not (yet) FSM.
+  bool large_non_fsm;
   if (m_path.length () > 1
-      && (!m_profit.profitable_path_p (m_path, m_name, NULL)
-	  || maybe_register_path ()))
+      && (!profit.possibly_profitable_path_p (m_path, m_name, &large_non_fsm)
+	  || (!large_non_fsm
+	      && maybe_register_path (profit))))
     ;
 
   // The backwards thread copier cannot copy blocks that do not belong
@@ -474,7 +493,8 @@ back_threader::find_paths_to_names (basic_block bb, bitmap interesting,
 		    }
 		}
 	    }
-	  find_paths_to_names (e->src, new_interesting, overall_paths);
+	  find_paths_to_names (e->src, new_interesting, overall_paths,
+			       profit);
 	  // Restore new_interesting.
 	  for (int def : unwind)
 	    bitmap_clear_bit (new_interesting, def);
@@ -499,66 +519,6 @@ back_threader::find_paths_to_names (basic_block bb, bitmap interesting,
   m_visited_bbs.remove (bb);
 }
 
-// Search backwards from BB looking for paths where the final
-// conditional out of BB can be determined.  NAME is the LHS of the
-// final conditional.  Register such paths for jump threading.
-
-void
-back_threader::find_paths (basic_block bb, tree name)
-{
-  gimple *stmt = last_stmt (bb);
-  if (!stmt
-      || (gimple_code (stmt) != GIMPLE_COND
-	  && gimple_code (stmt) != GIMPLE_SWITCH))
-    return;
-
-  if (EDGE_COUNT (bb->succs) > 1)
-    {
-      m_last_stmt = stmt;
-      m_visited_bbs.empty ();
-      m_path.truncate (0);
-      m_name = name;
-
-      // We compute imports of the path during discovery starting
-      // just with names used in the conditional.
-      bitmap_clear (m_imports);
-      ssa_op_iter iter;
-      FOR_EACH_SSA_TREE_OPERAND (name, stmt, iter, SSA_OP_USE)
-	{
-	  if (!gimple_range_ssa_p (name))
-	    return;
-	  bitmap_set_bit (m_imports, SSA_NAME_VERSION (name));
-	}
-
-      // Interesting is the set of imports we still not have see
-      // the definition of.  So while imports only grow, the
-      // set of interesting defs dwindles and once empty we can
-      // stop searching.
-      auto_bitmap interesting;
-      bitmap_copy (interesting, m_imports);
-      find_paths_to_names (bb, interesting, 1);
-    }
-}
-
-// Simple helper to get the last statement from BB, which is assumed
-// to be a control statement.  Return NULL if the last statement is
-// not a control statement.
-
-static gimple *
-get_gimple_control_stmt (basic_block bb)
-{
-  gimple_stmt_iterator gsi = gsi_last_nondebug_bb (bb);
-
-  if (gsi_end_p (gsi))
-    return NULL;
-
-  gimple *stmt = gsi_stmt (gsi);
-  enum gimple_code code = gimple_code (stmt);
-  if (code == GIMPLE_COND || code == GIMPLE_SWITCH || code == GIMPLE_GOTO)
-    return stmt;
-  return NULL;
-}
-
 // Search backwards from BB looking for paths where the final
 // conditional maybe threaded to a successor block.  Record such paths
 // for jump threading.
@@ -566,7 +526,10 @@ get_gimple_control_stmt (basic_block bb)
 void
 back_threader::maybe_thread_block (basic_block bb)
 {
-  gimple *stmt = get_gimple_control_stmt (bb);
+  if (EDGE_COUNT (bb->succs) <= 1)
+    return;
+
+  gimple *stmt = last_stmt (bb);
   if (!stmt)
     return;
 
@@ -576,12 +539,33 @@ back_threader::maybe_thread_block (basic_block bb)
     name = gimple_switch_index (as_a <gswitch *> (stmt));
   else if (code == GIMPLE_COND)
     name = gimple_cond_lhs (stmt);
-  else if (code == GIMPLE_GOTO)
-    name = gimple_goto_dest (stmt);
   else
-    name = NULL;
+    return;
 
-  find_paths (bb, name);
+  m_last_stmt = stmt;
+  m_visited_bbs.empty ();
+  m_path.truncate (0);
+  m_name = name;
+
+  // We compute imports of the path during discovery starting
+  // just with names used in the conditional.
+  bitmap_clear (m_imports);
+  ssa_op_iter iter;
+  FOR_EACH_SSA_TREE_OPERAND (name, stmt, iter, SSA_OP_USE)
+    {
+      if (!gimple_range_ssa_p (name))
+	return;
+      bitmap_set_bit (m_imports, SSA_NAME_VERSION (name));
+    }
+
+  // Interesting is the set of imports we have not yet seen
+  // the definition of.  So while imports only grow, the
+  // set of interesting defs dwindles and once empty we can
+  // stop searching.
+  auto_bitmap interesting;
+  bitmap_copy (interesting, m_imports);
+  back_threader_profitability profit (m_flags & BT_SPEED, stmt);
+  find_paths_to_names (bb, interesting, 1, profit);
 }
 
 DEBUG_FUNCTION void
@@ -615,26 +599,21 @@ back_threader::debug ()
   dump (stderr);
 }
 
-/* Examine jump threading path PATH and return TRUE if it is profitable to
-   thread it, otherwise return FALSE.
+/* Examine jump threading path PATH and return TRUE if it is possibly
+   profitable to thread it, otherwise return FALSE.  Indicate in
+   *LARGE_NON_FSM whether the thread is too large for a non-fsm thread
+   but would be OK if we discover a backedge on our way.
 
    NAME is the SSA_NAME of the variable we found to have a constant
    value on PATH.  If unknown, SSA_NAME is NULL.
 
-   If the taken edge out of the path is known ahead of time it is passed in
-   TAKEN_EDGE, otherwise it is NULL.
-
-   CREATES_IRREDUCIBLE_LOOP, if non-null is set to TRUE if threading this path
-   would create an irreducible loop.
-
    ?? It seems we should be able to loosen some of the restrictions in
    this function after loop optimizations have run.  */
 
 bool
-back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
-						tree name,
-						edge taken_edge,
-						bool *creates_irreducible_loop)
+back_threader_profitability::possibly_profitable_path_p
+				  (const vec<basic_block> &m_path, tree name,
+				   bool *large_non_fsm)
 {
   gcc_checking_assert (!m_path.is_empty ());
 
@@ -650,13 +629,15 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
   if (m_path.length () <= 1)
     return false;
 
-  int n_insns = 0;
   gimple_stmt_iterator gsi;
   loop_p loop = m_path[0]->loop_father;
-  bool threaded_through_latch = false;
-  bool multiway_branch_in_path = false;
-  bool threaded_multiway_branch = false;
-  bool contains_hot_bb = false;
+
+  // We recompute the following; when we rewrite possibly_profitable_path_p
+  // to work incrementally on added BBs we have to unwind them on backtracking.
+  m_n_insns = 0;
+  m_threaded_through_latch = false;
+  m_multiway_branch_in_path = false;
+  m_contains_hot_bb = false;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "Checking profitability of path (backwards): ");
@@ -679,7 +660,7 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
 	 it ends in a multiway branch.  */
       if (j < m_path.length () - 1)
 	{
-	  int orig_n_insns = n_insns;
+	  int orig_n_insns = m_n_insns;
 	  /* PHIs in the path will create degenerate PHIS in the
 	     copied path which will then get propagated away, so
 	     looking at just the duplicate path the PHIs would
@@ -714,12 +695,12 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
 		      && (SSA_NAME_VAR (dst) != SSA_NAME_VAR (name)
 			  || !SSA_NAME_VAR (dst))
 		      && !virtual_operand_p (dst))
-		    ++n_insns;
+		    ++m_n_insns;
 		}
 	    }
 
-	  if (!contains_hot_bb && m_speed_p)
-	    contains_hot_bb |= optimize_bb_for_speed_p (bb);
+	  if (!m_contains_hot_bb && m_speed_p)
+	    m_contains_hot_bb |= optimize_bb_for_speed_p (bb);
 	  for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi);
 	       gsi_next_nondebug (&gsi))
 	    {
@@ -735,10 +716,10 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
 	      /* Do not count empty statements and labels.  */
 	      if (gimple_code (stmt) != GIMPLE_NOP
 		  && !is_gimple_debug (stmt))
-		n_insns += estimate_num_insns (stmt, &eni_size_weights);
+		m_n_insns += estimate_num_insns (stmt, &eni_size_weights);
 	    }
 	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, " (%i insns)", n_insns-orig_n_insns);
+	    fprintf (dump_file, " (%i insns)", m_n_insns-orig_n_insns);
 
 	  /* We do not look at the block with the threaded branch
 	     in this loop.  So if any block with a last statement that
@@ -747,14 +728,13 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
 	     The block in PATH[0] is special, it's the block were we're
 	     going to be able to eliminate its branch.  */
-	  gimple *last = last_stmt (bb);
-	  if (last && (gimple_code (last) == GIMPLE_SWITCH
-		       || gimple_code (last) == GIMPLE_GOTO))
+	  if (j > 0)
 	    {
-	      if (j == 0)
-		threaded_multiway_branch = true;
-	      else
-		multiway_branch_in_path = true;
+	      gimple *last = last_stmt (bb);
+	      if (last
+		  && (gimple_code (last) == GIMPLE_SWITCH
+		      || gimple_code (last) == GIMPLE_GOTO))
+		m_multiway_branch_in_path = true;
 	    }
 	}
 
@@ -763,50 +743,29 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
 	 through the loop latch.  */
       if (loop->latch == bb)
 	{
-	  threaded_through_latch = true;
+	  m_threaded_through_latch = true;
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 	    fprintf (dump_file, " (latch)");
 	}
     }
 
-  gimple *stmt = get_gimple_control_stmt (m_path[0]);
-  gcc_assert (stmt);
-
   /* We are going to remove the control statement at the end of the
      last block in the threading path.  So don't count it against our
      statement count.  */
-
-  int stmt_insns = estimate_num_insns (stmt, &eni_size_weights);
-  n_insns-= stmt_insns;
+  m_n_insns -= m_exit_jump_benefit;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "\n  Control statement insns: %i\n"
 	     "  Overall: %i insns\n",
-	     stmt_insns, n_insns);
-
-  if (creates_irreducible_loop)
-    {
-      /* If this path threaded through the loop latch back into the
-	 same loop and the destination does not dominate the loop
-	 latch, then this thread would create an irreducible loop.  */
-      *creates_irreducible_loop = false;
-      if (taken_edge
-	  && threaded_through_latch
-	  && loop == taken_edge->dest->loop_father
-	  && (determine_bb_domination_status (loop, taken_edge->dest)
-	      == DOMST_NONDOMINATING))
-	*creates_irreducible_loop = true;
-    }
+	     m_exit_jump_benefit, m_n_insns);
 
   /* Threading is profitable if the path duplicated is hot but also
      in a case we separate cold path from hot path and permit optimization
     of the hot path later.  Be on the agressive side here. In some testcases,
     as in PR 78407 this leads to noticeable improvements.  */
-  if (m_speed_p
-      && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
-	  || contains_hot_bb))
+  if (m_speed_p)
     {
-      if (n_insns >= param_max_fsm_thread_path_insns)
+      if (m_n_insns >= param_max_fsm_thread_path_insns)
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 	    fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
@@ -814,29 +773,104 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
 		     "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
 	  return false;
 	}
-      if (taken_edge && probably_never_executed_edge_p (cfun, taken_edge))
+      edge entry = find_edge (m_path[m_path.length () - 1],
+			      m_path[m_path.length () - 2]);
+      if (probably_never_executed_edge_p (cfun, entry))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 	    fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
-		     "path leads to probably never executed edge.\n");
+		     "path entry is probably never executed.\n");
 	  return false;
 	}
-      edge entry = find_edge (m_path[m_path.length () - 1],
-			      m_path[m_path.length () - 2]);
-      if (probably_never_executed_edge_p (cfun, entry))
+    }
+  else if (m_n_insns > 1)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
+		 "duplication of %i insns is needed and optimizing for size.\n",
+		 m_n_insns);
+      return false;
+    }
+
+  /* The generic copier used by the backthreader does not re-use an
+     existing threading path to reduce code duplication.  So for that
+     case, drastically reduce the number of statements we are allowed
+     to copy.  We don't know yet whether we will thread through the latch
+     so we have to be permissive and continue threading, but indicate
+     to the caller the thread, if final, wouldn't be profitable.  */
+  if ((!m_threaded_multiway_branch
+       || !loop->latch
+       || loop->latch->index == EXIT_BLOCK)
+      && (m_n_insns * param_fsm_scale_path_stmts
+	  >= param_max_jump_thread_duplication_stmts))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
+		 "  FAIL: Did not thread around loop and would copy too "
+		 "many statements.\n");
+      return false;
+    }
+  *large_non_fsm = (!(m_threaded_through_latch && m_threaded_multiway_branch)
+		    && (m_n_insns * param_fsm_scale_path_stmts
+			>= param_max_jump_thread_duplication_stmts));
+
+  return true;
+}
+
+/* Examine jump threading path PATH and return TRUE if it is profitable to
+   thread it, otherwise return FALSE.
+
+   The taken edge out of the path is TAKEN_EDGE.
+
+   CREATES_IRREDUCIBLE_LOOP is set to TRUE if threading this path
+   would create an irreducible loop.
+
+   ?? It seems we should be able to loosen some of the restrictions in
+   this function after loop optimizations have run.  */
+
+bool
+back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
+						edge taken_edge,
+						bool *creates_irreducible_loop)
+{
+  // We can assume that possibly_profitable_path_p holds here
+
+  loop_p loop = m_path[0]->loop_father;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "Checking profitability of path (backwards): ");
+
+  /* If this path threaded through the loop latch back into the
+     same loop and the destination does not dominate the loop
+     latch, then this thread would create an irreducible loop.  */
+  *creates_irreducible_loop = false;
+  if (m_threaded_through_latch
+      && loop == taken_edge->dest->loop_father
+      && (determine_bb_domination_status (loop, taken_edge->dest)
+	  == DOMST_NONDOMINATING))
+    *creates_irreducible_loop = true;
+
+  /* Threading is profitable if the path duplicated is hot but also
+     in a case we separate cold path from hot path and permit optimization
+     of the hot path later.  Be on the agressive side here. In some testcases,
+     as in PR 78407 this leads to noticeable improvements.  */
+  if (m_speed_p
+      && (optimize_edge_for_speed_p (taken_edge) || m_contains_hot_bb))
+    {
+      if (probably_never_executed_edge_p (cfun, taken_edge))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 	    fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
-		     "path entry is probably never executed.\n");
+		     "path leads to probably never executed edge.\n");
 	  return false;
 	}
     }
-  else if (n_insns > 1)
+  else if (m_n_insns > 1)
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
 	fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
 		 "duplication of %i insns is needed and optimizing for size.\n",
-		 n_insns);
+		 m_n_insns);
       return false;
     }
 
@@ -849,10 +883,9 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
      the path -- in that case there's little the traditional loop
      optimizer would have done anyway, so an irreducible loop is not
      so bad.  */
-  if (!threaded_multiway_branch
-      && creates_irreducible_loop
+  if (!m_threaded_multiway_branch
       && *creates_irreducible_loop
-      && (n_insns * (unsigned) param_fsm_scale_path_stmts
+      && (m_n_insns * (unsigned) param_fsm_scale_path_stmts
 	  > (m_path.length () *
 	     (unsigned) param_fsm_scale_path_blocks)))
 
@@ -861,6 +894,7 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
 	fprintf (dump_file,
 		 "  FAIL: Would create irreducible loop without threading "
 		 "multiway branch.\n");
+      /* We compute creates_irreducible_loop only late.  */
       return false;
     }
 
@@ -868,8 +902,8 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
      existing threading path to reduce code duplication.  So for that
      case, drastically reduce the number of statements we are allowed
     to copy.  */
-  if (!(threaded_through_latch && threaded_multiway_branch)
-      && (n_insns * param_fsm_scale_path_stmts
+  if (!(m_threaded_through_latch && m_threaded_multiway_branch)
+      && (m_n_insns * param_fsm_scale_path_stmts
 	  >= param_max_jump_thread_duplication_stmts))
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
@@ -883,7 +917,7 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
      explode the CFG due to duplicating the edges for that multi-way
      branch.  So like above, only allow a multi-way branch on the path
      if we actually thread a multi-way branch.  */
-  if (!threaded_multiway_branch && multiway_branch_in_path)
+  if (!m_threaded_multiway_branch && m_multiway_branch_in_path)
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
 	fprintf (dump_file,
@@ -896,20 +930,20 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
      the latch.  This could alter the loop form sufficiently to cause
      loop optimizations to fail.  Disable these threads until after
     loop optimizations have run.  */
-  if ((threaded_through_latch
-       || (taken_edge && taken_edge->dest == loop->latch))
+  if ((m_threaded_through_latch || taken_edge->dest == loop->latch)
       && !(cfun->curr_properties & PROP_loop_opts_done)
      && empty_block_p (loop->latch))
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
 	fprintf (dump_file,
-		 "  FAIL: Thread through latch before loop opts would create non-empty latch\n");
+		 "  FAIL: Thread through latch before loop opts would create "
+		 "non-empty latch\n");
       return false;
-
     }
   return true;
 }
+
 /* The current path PATH is a vector of blocks forming a jump
    threading path in reverse order.  TAKEN_EDGE is the edge taken
    from path[0].