Message ID: 20220812120145.91C6513AAE@imap2.suse-dmz.suse.de
State: New, archived
Series: Support threading of just the exit edge
Commit Message
Richard Biener
Aug. 12, 2022, 12:01 p.m. UTC
This started with noticing we add ENTRY_BLOCK to our threads just for
the sake of simplifying the conditional at the end of the first block
in a function.  That's not really threading anything, but it ends up
duplicating the entry block and re-writing the result instead of
statically folding the jump.

The following tries to handle those by recording simplifications of
the exit conditional as a thread of length one.  That requires
special-casing them in the backward copier, since if we do not have
any block to copy but instead modify the jump in place and remove the
not-taken edges, this confuses the hell out of the remaining threads.

So back_jt_path_registry::update_cfg now first marks all edges we know
are never taken and then prunes the threading candidates when they
include such an edge.  Then it makes sure to first perform unreachable
edge removal (so we avoid copying them when other thread paths contain
the prevailing edge) before continuing to apply the remaining threads.

In the statistics you can see this avoids quite a bunch of useless
threads (I've investigated 3 random files from cc1files with dropped
stats in any of the thread passes).

Still thinking about it, it would be nice to avoid the work of
discovering those candidates we have to throw away later, which could
eventually be done by having the backward threader perform an RPO walk
over the CFG, skipping edges that can be statically determined as not
being executed.  Below I'm abusing the path range query to statically
analyze the exit branch, but I assume there's a simpler way of folding
this stmt which could then better integrate with such a walk.

In any case it seems worth more consciously handling the case of exit
branches that simplify without path-sensitive information.

Then the patch also restricts path discovery when we'd produce threads
we'll reject later during copying - the backward threader copying
cannot handle paths where the blocks to duplicate are not from exactly
the same loop.  I'm probably going to split this part out.

Any thoughts?

	* gimple-range-path.cc (path_range_query::set_path): Adjust
	assert to allow paths of size one.
	* tree-ssa-threadbackward.cc (back_threader::maybe_register_path):
	Paths of size one are always profitable.
	(back_threader::find_paths_to_names): Likewise.
	Do not walk further if we are leaving the current loop.
	(back_threader::find_taken_edge): Remove assert.  Do not
	walk to ENTRY_BLOCK.
	* tree-ssa-threadupdate.cc (back_jt_path_registry::update_cfg):
	Handle jump threads of just the exit edge by modifying the
	control statement in-place.
---
 gcc/gimple-range-path.cc       |  2 +-
 gcc/tree-ssa-threadbackward.cc | 21 ++++++++-----
 gcc/tree-ssa-threadupdate.cc   | 54 ++++++++++++++++++++++++++++++++++
 3 files changed, 69 insertions(+), 8 deletions(-)
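[Editorial illustration, not a testcase from the patch; the function and
variable names are made up.  It shows the kind of situation the cover
letter describes: the conditional ending the function's first basic
block folds from ordinary, non-path-sensitive range information, so
duplicating the entry block just to "thread" it is pointless.  In a
real compile an earlier VRP pass may already fold this.]

/* Hypothetical example only.  */
int
first_block_exit (unsigned n)
{
  unsigned m = n % 16;   /* m has range [0, 15].  */
  if (m > 32)            /* Exit conditional of the first block: always false.  */
    return -1;
  return (int) m;
}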
Comments
On Fri, Aug 12, 2022 at 2:01 PM Richard Biener <rguenther@suse.de> wrote: > > This started with noticing we add ENTRY_BLOCK to our threads > just for the sake of simplifying the conditional at the end of > the first block in a function. That's not really threading > anything but it ends up duplicating the entry block, and > re-writing the result instead of statically fold the jump. Hmmm, but threading 2 blocks is not really threading at all?? Unless I'm misunderstanding something, this was even documented in the backwards threader: [snip] That's not really a jump threading opportunity, but instead is simple cprop & simplification. We could handle it here if we wanted by wiring up all the incoming edges. If we run this early in IPA, that might be worth doing. For now we just reject that case. */ if (m_path.length () <= 1) return false; Which you undoubtedly ran into because you're specifically eliding the check: > - if (m_profit.profitable_path_p (m_path, m_name, taken_edge, > - &irreducible) > + if ((m_path.length () == 1 > + || m_profit.profitable_path_p (m_path, m_name, taken_edge, > + &irreducible)) > > The following tries to handle those by recording simplifications > of the exit conditional as a thread of length one. That requires > special-casing them in the backward copier since if we do not > have any block to copy but modify the jump in place and remove > not taken edges this confuses the hell out of remaining threads. > > So back_jt_path_registry::update_cfg now first marks all > edges we know are never taken and then prunes the threading > candidates when they include such edge. Then it makes sure > to first perform unreachable edge removal (so we avoid > copying them when other thread paths contain the prevailing > edge) before continuing to apply the remaining threads. This is all beyond my pay grade. I'm not very well versed in the threader per se. So if y'all think it's a good idea, by all means. Perhaps Jeff can chime in, or remembers the above comment? > > In statistics you can see this avoids quite a bunch of useless > threads (I've investiated 3 random files from cc1files with > dropped stats in any of the thread passes). > > Still thinking about it it would be nice to avoid the work of > discovering those candidates we have to throw away later > which could eventually be done by having the backward threader > perform a RPO walk over the CFG, skipping edges that can be > statically determined as not being executed. Below I'm > abusing the path range query to statically analyze the exit > branch but I assume there's a simpler way of folding this stmt > which could then better integrate with such a walk. Unreachable paths can be queried with path_range_query::unreachable_path_p (). Could you leverage this? The idea was that if we ever resolved any SSA name to UNDEFINED, the path itself was unreachable. Aldy > > In any case it seems worth more conciously handling the > case of exit branches that simplify without path sensitive > information. > > Then the patch also restricts path discovery when we'd produce > threads we'll reject later during copying - the backward threader > copying cannot handle paths where the to duplicate blocks are > not from exactly the same loop. I'm probably going to split this > part out. > > Any thoughts? > > * gimple-range-path.cc (path_range_query::set_path): Adjust > assert to allow paths of size one. > * tree-ssa-threadbackward.cc (back_threader::maybe_register_path): > Paths of size one are always profitable. 
> (back_threader::find_paths_to_names): Likewise. > Do not walk further if we are leaving the current loop. > (back_threader::find_taken_edge): Remove assert. Do not > walk to ENTRY_BLOCK. > * tree-ssa-threadupdate.cc (back_jt_path_registry::update_cfg): > Handle jump threads of just the exit edge by modifying the > control statement in-place. > --- > gcc/gimple-range-path.cc | 2 +- > gcc/tree-ssa-threadbackward.cc | 21 ++++++++----- > gcc/tree-ssa-threadupdate.cc | 54 ++++++++++++++++++++++++++++++++++ > 3 files changed, 69 insertions(+), 8 deletions(-) > > diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc > index 78146f5683e..a7d277c31b8 100644 > --- a/gcc/gimple-range-path.cc > +++ b/gcc/gimple-range-path.cc > @@ -220,7 +220,7 @@ path_range_query::unreachable_path_p () > void > path_range_query::set_path (const vec<basic_block> &path) > { > - gcc_checking_assert (path.length () > 1); > + gcc_checking_assert (!path.is_empty ()); > m_path = path.copy (); > m_pos = m_path.length () - 1; > bitmap_clear (m_has_cache_entry); > diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc > index b886027fccf..669098e4ec3 100644 > --- a/gcc/tree-ssa-threadbackward.cc > +++ b/gcc/tree-ssa-threadbackward.cc > @@ -241,8 +241,9 @@ back_threader::maybe_register_path () > else > { > bool irreducible = false; > - if (m_profit.profitable_path_p (m_path, m_name, taken_edge, > - &irreducible) > + if ((m_path.length () == 1 > + || m_profit.profitable_path_p (m_path, m_name, taken_edge, > + &irreducible)) > && debug_counter () > && m_registry.register_path (m_path, taken_edge)) > { > @@ -267,7 +268,6 @@ back_threader::maybe_register_path () > edge > back_threader::find_taken_edge (const vec<basic_block> &path) > { > - gcc_checking_assert (path.length () > 1); > switch (gimple_code (m_last_stmt)) > { > case GIMPLE_COND: > @@ -350,9 +350,15 @@ back_threader::find_paths_to_names (basic_block bb, bitmap interesting, > m_path.safe_push (bb); > > // Try to resolve the path without looking back. > - if (m_path.length () > 1 > - && (!m_profit.profitable_path_p (m_path, m_name, NULL) > - || maybe_register_path ())) > + if ((m_path.length () > 1 > + && !m_profit.profitable_path_p (m_path, m_name, NULL)) > + || maybe_register_path ()) > + ; > + > + // The backwards thread copier cannot copy blocks that do not belong > + // to the same loop, so when the new source of the path entry no > + // longer belongs to it we don't need to search further. > + else if (m_path[0]->loop_father != bb->loop_father) > ; > > // Continue looking for ways to extend the path but limit the > @@ -445,7 +451,8 @@ back_threader::find_paths_to_names (basic_block bb, bitmap interesting, > edge e; > FOR_EACH_EDGE (e, iter, bb->preds) > { > - if (e->flags & EDGE_ABNORMAL > + if ((e->flags & EDGE_ABNORMAL) > + || e->src->index == ENTRY_BLOCK > // This is like path_crosses_loops in profitable_path_p but > // more restrictive to avoid peeling off loop iterations (see > // tree-ssa/pr14341.c for an example). > diff --git a/gcc/tree-ssa-threadupdate.cc b/gcc/tree-ssa-threadupdate.cc > index 59c268a3567..d40fa7c4cff 100644 > --- a/gcc/tree-ssa-threadupdate.cc > +++ b/gcc/tree-ssa-threadupdate.cc > @@ -2613,6 +2613,60 @@ back_jt_path_registry::update_cfg (bool /*peel_loop_headers*/) > bool retval = false; > hash_set<edge> visited_starting_edges; > > + /* Mark never taken edges from paths that are just jump simplifications. 
*/ > + auto_edge_flag never_taken (cfun); > + for (auto path : m_paths) > + if (path->length () == 1) > + { > + edge_iterator ei; > + edge e; > + FOR_EACH_EDGE (e, ei, (*path)[0]->e->src->succs) > + if (e != (*path)[0]->e) > + e->flags |= never_taken; > + } > + > + /* And prune paths that contain such edge before we remove them. */ > + for (unsigned i = 0; i < m_paths.length ();) > + { > + bool remove = false; > + for (auto je : *m_paths[i]) > + { > + if (je->e->flags & never_taken) > + { > + cancel_thread (m_paths[i], > + "Avoiding threading through unreachable edge"); > + remove = true; > + break; > + } > + } > + if (!remove) > + ++i; > + else > + m_paths.unordered_remove (i); > + } > + > + /* Finally perform those threads first, this way we avoid copying the > + dead outgoing edges when other theads contain the prevailing edge. */ > + for (unsigned i = 0; i < m_paths.length ();) > + { > + vec<jump_thread_edge *> *path = m_paths[i]; > + if (path->length () != 1) > + { > + ++i; > + continue; > + } > + edge exit = (*path)[0]->e; > + remove_ctrl_stmt_and_useless_edges (exit->src, exit->dest); > + exit->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE | EDGE_ABNORMAL); > + exit->flags |= EDGE_FALLTHRU; > + /* We do not update dominance info. */ > + free_dominance_info (CDI_DOMINATORS); > + retval = true; > + m_num_threaded_edges++; > + path->release (); > + m_paths.unordered_remove (i); > + } > + > while (m_paths.length ()) > { > vec<jump_thread_edge *> *path = m_paths[0]; > -- > 2.35.3 >
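[Editorial sketch of the unreachable_path_p () suggestion above.  PATH,
IMPORTS and EXIT_COND are assumed to be supplied by the caller, and the
call order simply mirrors how the backward threader uses the path query
rather than a documented contract.]

  /* Sketch only: PATH, IMPORTS and EXIT_COND come from the caller.  */
  path_range_query query (/*resolve=*/true);
  query.compute_ranges (path, imports);

  int_range_max r;
  query.range_of_stmt (r, exit_cond);
  if (query.unreachable_path_p ())
    {
      /* Some SSA name resolved to UNDEFINED while solving PATH, so the
	 path can never be taken - drop the candidate instead of
	 registering and copying it.  */
    }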
On Fri, 12 Aug 2022, Aldy Hernandez wrote: > On Fri, Aug 12, 2022 at 2:01 PM Richard Biener <rguenther@suse.de> wrote: > > > > This started with noticing we add ENTRY_BLOCK to our threads > > just for the sake of simplifying the conditional at the end of > > the first block in a function. That's not really threading > > anything but it ends up duplicating the entry block, and > > re-writing the result instead of statically fold the jump. > > Hmmm, but threading 2 blocks is not really threading at all?? Unless > I'm misunderstanding something, this was even documented in the > backwards threader: > > [snip] > That's not really a jump threading opportunity, but instead is > simple cprop & simplification. We could handle it here if we > wanted by wiring up all the incoming edges. If we run this > early in IPA, that might be worth doing. For now we just > reject that case. */ > if (m_path.length () <= 1) > return false; > > Which you undoubtedly ran into because you're specifically eliding the check: > > > - if (m_profit.profitable_path_p (m_path, m_name, taken_edge, > > - &irreducible) > > + if ((m_path.length () == 1 > > + || m_profit.profitable_path_p (m_path, m_name, taken_edge, > > + &irreducible)) Correct. But currently the threader just "cheats", picks up one more block and then "threads" the case anyway, doing this simple simplification in the most expensive way possible ... > > > > The following tries to handle those by recording simplifications > > of the exit conditional as a thread of length one. That requires > > special-casing them in the backward copier since if we do not > > have any block to copy but modify the jump in place and remove > > not taken edges this confuses the hell out of remaining threads. > > > > So back_jt_path_registry::update_cfg now first marks all > > edges we know are never taken and then prunes the threading > > candidates when they include such edge. Then it makes sure > > to first perform unreachable edge removal (so we avoid > > copying them when other thread paths contain the prevailing > > edge) before continuing to apply the remaining threads. > > This is all beyond my pay grade. I'm not very well versed in the > threader per se. So if y'all think it's a good idea, by all means. > Perhaps Jeff can chime in, or remembers the above comment? > > > > > In statistics you can see this avoids quite a bunch of useless > > threads (I've investiated 3 random files from cc1files with > > dropped stats in any of the thread passes). > > > > Still thinking about it it would be nice to avoid the work of > > discovering those candidates we have to throw away later > > which could eventually be done by having the backward threader > > perform a RPO walk over the CFG, skipping edges that can be > > statically determined as not being executed. Below I'm > > abusing the path range query to statically analyze the exit > > branch but I assume there's a simpler way of folding this stmt > > which could then better integrate with such a walk. > > Unreachable paths can be queried with > path_range_query::unreachable_path_p (). Could you leverage this? > The idea was that if we ever resolved any SSA name to UNDEFINED, the > path itself was unreachable. The situation is that we end up with paths where an intermediate branch on the path can be simplified to false - but of course only if we put all intermediate branch dependences into the list of imports to consider. 
I don't like it very much to use the "threading" code to perform the simplification but I couldn't figure a cheap way to perform the simplification without invoking a full EVRP? That said, the backwards threader simply does basic_block bb; FOR_EACH_BB_FN (bb, m_fun) if (EDGE_COUNT (bb->succs) > 1) maybe_thread_block (bb); bool changed = m_registry.thread_through_all_blocks (true); instead of that we should only consider edges that may be executable by instead walking the CFG along such edges, simplifying BB exit conditionals. Unfortunately EVRP is now a C++ maze so I couldn't find how to actually do such simplification, not knowing how interacting with ranger influences the path query use either. If you or Andrew has any suggestions on how to essentially do a if (edge e = find_taken_edge (bb)) { ... } where find_taken_edge should be at least as powerful as using the path query for a single bb then I'd be all ears. As said, I tried to find the code to cut&paste in EVRP but failed to ... Thanks, Richard. > > Aldy > > > > > In any case it seems worth more conciously handling the > > case of exit branches that simplify without path sensitive > > information. > > > > Then the patch also restricts path discovery when we'd produce > > threads we'll reject later during copying - the backward threader > > copying cannot handle paths where the to duplicate blocks are > > not from exactly the same loop. I'm probably going to split this > > part out. > > > > Any thoughts? > > > > * gimple-range-path.cc (path_range_query::set_path): Adjust > > assert to allow paths of size one. > > * tree-ssa-threadbackward.cc (back_threader::maybe_register_path): > > Paths of size one are always profitable. > > (back_threader::find_paths_to_names): Likewise. > > Do not walk further if we are leaving the current loop. > > (back_threader::find_taken_edge): Remove assert. Do not > > walk to ENTRY_BLOCK. > > * tree-ssa-threadupdate.cc (back_jt_path_registry::update_cfg): > > Handle jump threads of just the exit edge by modifying the > > control statement in-place. 
> > --- > > gcc/gimple-range-path.cc | 2 +- > > gcc/tree-ssa-threadbackward.cc | 21 ++++++++----- > > gcc/tree-ssa-threadupdate.cc | 54 ++++++++++++++++++++++++++++++++++ > > 3 files changed, 69 insertions(+), 8 deletions(-) > > > > diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc > > index 78146f5683e..a7d277c31b8 100644 > > --- a/gcc/gimple-range-path.cc > > +++ b/gcc/gimple-range-path.cc > > @@ -220,7 +220,7 @@ path_range_query::unreachable_path_p () > > void > > path_range_query::set_path (const vec<basic_block> &path) > > { > > - gcc_checking_assert (path.length () > 1); > > + gcc_checking_assert (!path.is_empty ()); > > m_path = path.copy (); > > m_pos = m_path.length () - 1; > > bitmap_clear (m_has_cache_entry); > > diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc > > index b886027fccf..669098e4ec3 100644 > > --- a/gcc/tree-ssa-threadbackward.cc > > +++ b/gcc/tree-ssa-threadbackward.cc > > @@ -241,8 +241,9 @@ back_threader::maybe_register_path () > > else > > { > > bool irreducible = false; > > - if (m_profit.profitable_path_p (m_path, m_name, taken_edge, > > - &irreducible) > > + if ((m_path.length () == 1 > > + || m_profit.profitable_path_p (m_path, m_name, taken_edge, > > + &irreducible)) > > && debug_counter () > > && m_registry.register_path (m_path, taken_edge)) > > { > > @@ -267,7 +268,6 @@ back_threader::maybe_register_path () > > edge > > back_threader::find_taken_edge (const vec<basic_block> &path) > > { > > - gcc_checking_assert (path.length () > 1); > > switch (gimple_code (m_last_stmt)) > > { > > case GIMPLE_COND: > > @@ -350,9 +350,15 @@ back_threader::find_paths_to_names (basic_block bb, bitmap interesting, > > m_path.safe_push (bb); > > > > // Try to resolve the path without looking back. > > - if (m_path.length () > 1 > > - && (!m_profit.profitable_path_p (m_path, m_name, NULL) > > - || maybe_register_path ())) > > + if ((m_path.length () > 1 > > + && !m_profit.profitable_path_p (m_path, m_name, NULL)) > > + || maybe_register_path ()) > > + ; > > + > > + // The backwards thread copier cannot copy blocks that do not belong > > + // to the same loop, so when the new source of the path entry no > > + // longer belongs to it we don't need to search further. > > + else if (m_path[0]->loop_father != bb->loop_father) > > ; > > > > // Continue looking for ways to extend the path but limit the > > @@ -445,7 +451,8 @@ back_threader::find_paths_to_names (basic_block bb, bitmap interesting, > > edge e; > > FOR_EACH_EDGE (e, iter, bb->preds) > > { > > - if (e->flags & EDGE_ABNORMAL > > + if ((e->flags & EDGE_ABNORMAL) > > + || e->src->index == ENTRY_BLOCK > > // This is like path_crosses_loops in profitable_path_p but > > // more restrictive to avoid peeling off loop iterations (see > > // tree-ssa/pr14341.c for an example). > > diff --git a/gcc/tree-ssa-threadupdate.cc b/gcc/tree-ssa-threadupdate.cc > > index 59c268a3567..d40fa7c4cff 100644 > > --- a/gcc/tree-ssa-threadupdate.cc > > +++ b/gcc/tree-ssa-threadupdate.cc > > @@ -2613,6 +2613,60 @@ back_jt_path_registry::update_cfg (bool /*peel_loop_headers*/) > > bool retval = false; > > hash_set<edge> visited_starting_edges; > > > > + /* Mark never taken edges from paths that are just jump simplifications. 
*/ > > + auto_edge_flag never_taken (cfun); > > + for (auto path : m_paths) > > + if (path->length () == 1) > > + { > > + edge_iterator ei; > > + edge e; > > + FOR_EACH_EDGE (e, ei, (*path)[0]->e->src->succs) > > + if (e != (*path)[0]->e) > > + e->flags |= never_taken; > > + } > > + > > + /* And prune paths that contain such edge before we remove them. */ > > + for (unsigned i = 0; i < m_paths.length ();) > > + { > > + bool remove = false; > > + for (auto je : *m_paths[i]) > > + { > > + if (je->e->flags & never_taken) > > + { > > + cancel_thread (m_paths[i], > > + "Avoiding threading through unreachable edge"); > > + remove = true; > > + break; > > + } > > + } > > + if (!remove) > > + ++i; > > + else > > + m_paths.unordered_remove (i); > > + } > > + > > + /* Finally perform those threads first, this way we avoid copying the > > + dead outgoing edges when other theads contain the prevailing edge. */ > > + for (unsigned i = 0; i < m_paths.length ();) > > + { > > + vec<jump_thread_edge *> *path = m_paths[i]; > > + if (path->length () != 1) > > + { > > + ++i; > > + continue; > > + } > > + edge exit = (*path)[0]->e; > > + remove_ctrl_stmt_and_useless_edges (exit->src, exit->dest); > > + exit->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE | EDGE_ABNORMAL); > > + exit->flags |= EDGE_FALLTHRU; > > + /* We do not update dominance info. */ > > + free_dominance_info (CDI_DOMINATORS); > > + retval = true; > > + m_num_threaded_edges++; > > + path->release (); > > + m_paths.unordered_remove (i); > > + } > > + > > while (m_paths.length ()) > > { > > vec<jump_thread_edge *> *path = m_paths[0]; > > -- > > 2.35.3 > > > >
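[Editorial sketch of the CFG walk Richard describes above - not part of
the posted patch.  It is written as if it were an extra back_threader
member so that m_fun and maybe_thread_block () are in scope;
statically_taken_edge () is a made-up placeholder for a single-block
folding helper like the one discussed later in the thread.]

/* Sketch only: visit blocks in RPO, skipping those only reachable
   through edges whose exit conditional already folded away.  */
void
back_threader::thread_reachable_blocks ()
{
  auto_sbitmap reachable (last_basic_block_for_fn (m_fun));
  bitmap_clear (reachable);
  bitmap_set_bit (reachable,
		  single_succ (ENTRY_BLOCK_PTR_FOR_FN (m_fun))->index);

  int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (m_fun));
  int n = pre_and_rev_post_order_compute_fn (m_fun, NULL, rpo, false);
  for (int i = 0; i < n; ++i)
    {
      basic_block bb = BASIC_BLOCK_FOR_FN (m_fun, rpo[i]);
      /* Skip blocks only reachable through edges already known dead.  */
      if (!bitmap_bit_p (reachable, bb->index))
	continue;
      if (edge taken = statically_taken_edge (bb))
	/* The exit conditional folds: only TAKEN's destination stays
	   live (this is also where a length-one thread could be
	   recorded, as the patch does).  */
	bitmap_set_bit (reachable, taken->dest->index);
      else
	{
	  edge e;
	  edge_iterator ei;
	  FOR_EACH_EDGE (e, ei, bb->succs)
	    bitmap_set_bit (reachable, e->dest->index);
	  if (EDGE_COUNT (bb->succs) > 1)
	    maybe_thread_block (bb);
	}
    }
  XDELETEVEC (rpo);
}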
On 8/12/2022 10:03 AM, Aldy Hernandez wrote:
> On Fri, Aug 12, 2022 at 2:01 PM Richard Biener <rguenther@suse.de> wrote:
>> This started with noticing we add ENTRY_BLOCK to our threads
>> just for the sake of simplifying the conditional at the end of
>> the first block in a function. That's not really threading
>> anything but it ends up duplicating the entry block, and
>> re-writing the result instead of statically fold the jump.
> Hmmm, but threading 2 blocks is not really threading at all?? Unless
> I'm misunderstanding something, this was even documented in the
> backwards threader:
>
> [snip]
>      That's not really a jump threading opportunity, but instead is
>      simple cprop & simplification.  We could handle it here if we
>      wanted by wiring up all the incoming edges.  If we run this
>      early in IPA, that might be worth doing.  For now we just
>      reject that case.  */
>   if (m_path.length () <= 1)
>     return false;

My recollection is that code was supposed to filter out the case where
the threading path is just a definition block and a use block where the
definition block dominated the use block.  For that case, threading
isn't really needed as we can just use const/copy propagation to
propagate the value from the def to the use which should in turn allow
the use (the conditional branch) to be simplified away -- all without
the block copying and associated CFG updates.

What doesn't make sense to me today is how do we know there's a
dominator relationship between the two blocks?

Jeff
On Mon, Aug 15, 2022 at 11:39 AM Richard Biener <rguenther@suse.de> wrote: > > On Fri, 12 Aug 2022, Aldy Hernandez wrote: > > > On Fri, Aug 12, 2022 at 2:01 PM Richard Biener <rguenther@suse.de> wrote: > > > > > > This started with noticing we add ENTRY_BLOCK to our threads > > > just for the sake of simplifying the conditional at the end of > > > the first block in a function. That's not really threading > > > anything but it ends up duplicating the entry block, and > > > re-writing the result instead of statically fold the jump. > > > > Hmmm, but threading 2 blocks is not really threading at all?? Unless > > I'm misunderstanding something, this was even documented in the > > backwards threader: > > > > [snip] > > That's not really a jump threading opportunity, but instead is > > simple cprop & simplification. We could handle it here if we > > wanted by wiring up all the incoming edges. If we run this > > early in IPA, that might be worth doing. For now we just > > reject that case. */ > > if (m_path.length () <= 1) > > return false; > > > > Which you undoubtedly ran into because you're specifically eliding the check: > > > > > - if (m_profit.profitable_path_p (m_path, m_name, taken_edge, > > > - &irreducible) > > > + if ((m_path.length () == 1 > > > + || m_profit.profitable_path_p (m_path, m_name, taken_edge, > > > + &irreducible)) > > Correct. But currently the threader just "cheats", picks up one more > block and then "threads" the case anyway, doing this simple simplification > in the most expensive way possible ... Ah. > > > > > > > The following tries to handle those by recording simplifications > > > of the exit conditional as a thread of length one. That requires > > > special-casing them in the backward copier since if we do not > > > have any block to copy but modify the jump in place and remove > > > not taken edges this confuses the hell out of remaining threads. > > > > > > So back_jt_path_registry::update_cfg now first marks all > > > edges we know are never taken and then prunes the threading > > > candidates when they include such edge. Then it makes sure > > > to first perform unreachable edge removal (so we avoid > > > copying them when other thread paths contain the prevailing > > > edge) before continuing to apply the remaining threads. > > > > This is all beyond my pay grade. I'm not very well versed in the > > threader per se. So if y'all think it's a good idea, by all means. > > Perhaps Jeff can chime in, or remembers the above comment? > > > > > > > > In statistics you can see this avoids quite a bunch of useless > > > threads (I've investiated 3 random files from cc1files with > > > dropped stats in any of the thread passes). > > > > > > Still thinking about it it would be nice to avoid the work of > > > discovering those candidates we have to throw away later > > > which could eventually be done by having the backward threader > > > perform a RPO walk over the CFG, skipping edges that can be > > > statically determined as not being executed. Below I'm > > > abusing the path range query to statically analyze the exit > > > branch but I assume there's a simpler way of folding this stmt > > > which could then better integrate with such a walk. > > > > Unreachable paths can be queried with > > path_range_query::unreachable_path_p (). Could you leverage this? > > The idea was that if we ever resolved any SSA name to UNDEFINED, the > > path itself was unreachable. 
> > The situation is that we end up with paths where an intermediate > branch on the path can be simplified to false - but of course only > if we put all intermediate branch dependences into the list of > imports to consider. > > I don't like it very much to use the "threading" code to perform > the simplification but I couldn't figure a cheap way to perform > the simplification without invoking a full EVRP? That said, > the backwards threader simply does > > basic_block bb; > FOR_EACH_BB_FN (bb, m_fun) > if (EDGE_COUNT (bb->succs) > 1) > maybe_thread_block (bb); > > bool changed = m_registry.thread_through_all_blocks (true); > > instead of that we should only consider edges that may be executable > by instead walking the CFG along such edges, simplifying BB exit > conditionals. Unfortunately EVRP is now a C++ maze so I couldn't > find how to actually do such simplification, not knowing how > interacting with ranger influences the path query use either. > If you or Andrew has any suggestions on how to essentially > do a > > if (edge e = find_taken_edge (bb)) > { > ... > } > > where find_taken_edge should be at least as powerful as using > the path query for a single bb then I'd be all ears. As said, > I tried to find the code to cut&paste in EVRP but failed to ... Interesting... If what you need is find_taken_edge(bb), I think we can do this quite cleanly. What you're asking is to fold the conditional at the end of a basic block, and return the edge depending on the folded value. We have all the smarts to do the folding (fold_range), and tree-cfg.c has find_taken_edge(basic_block, tree val), which AFAICT, does everything except the folding of non-trivial statements. If you like, I could tweak find_taken_edge() to use a ranger if available (pass has called enable_ranger()), or otherwise use the global range query in cfun (SSA_NAME_RANGE_INFO but with the range_query API). This should be as fast as poking at SSA_NAME_RANGE_INFO for the common case, or using an active ranger if available. See the attached proof of concept. With this approach we could handle everything find_taken_edge(bb) currently handles, plus anything we can fold in ranger (without the penalty of a full ranger if you don't want to-- we can still fold and use global ranges for SSA names if available). Or ultimately, if you have an active ranger, you can leverage the full ranger functionality. Either way is a lot better than the 1 != 0, etc folding the code currently does ;-). If you want to fold these blocks from the threader, we could expose the ranger in the path_query with enable_ranger(), and then find_taken_edge(basic_block) can pick the ranger up. Up to you if you want to run this within the threader, or elsewhere. Does this make sense? I'm happy to cobble it up if you find it useful. Aldy
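[Editorial sketch of the direction Aldy proposes above - this is not
the attached proof of concept, and the helper name is made up.  It
folds the GIMPLE_COND ending BB with fold_range () and lets tree-cfg's
find_taken_edge (bb, val) map the resulting constant to the statically
taken edge; Q may be NULL, in which case fold_range () falls back to
the default query as Andrew explains in the follow-up below.]

/* Sketch only; name and placement are hypothetical.  */
static edge
find_taken_edge_folded (basic_block bb, range_query *q = NULL)
{
  gcond *cond = safe_dyn_cast <gcond *> (last_stmt (bb));
  if (!cond)
    return NULL;

  int_range<2> r;
  tree val;
  if (fold_range (r, cond, q) && r.singleton_p (&val))
    /* VAL is a boolean constant here; find_taken_edge () maps it to
       the true or false edge of BB.  */
    return find_taken_edge (bb, val);

  return NULL;
}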
heh. or just + int_range<2> r; + if (!fold_range (r, const_cast <gcond *> (cond_stmt)) + || !r.singleton_p (&val)) if you do not provide a range_query to any of the fold_using_range code, it defaults to: fur_source::fur_source (range_query *q) { if (q) m_query = q; else if (cfun) m_query = get_range_query (cfun); else m_query = get_global_range_query (); m_gori = NULL; } so it will default to the one you provide, and if there isn't one, it will try to use the cfun version if cfun is available.. otherwise it defaults to the global range query. So you dont need to provide the cfun version. This applies to the 5 basic routines in gimple-fold-range.h: // Fold stmt S into range R using range query Q. bool fold_range (vrange &r, gimple *s, range_query *q = NULL); // Recalculate stmt S into R using range query Q as if it were on edge ON_EDGE. bool fold_range (vrange &v, gimple *s, edge on_edge, range_query *q = NULL); // These routines the operands to be specified when manually folding. // Any excess queries will be drawn from the current range_query. bool fold_range (vrange &r, gimple *s, vrange &r1); bool fold_range (vrange &r, gimple *s, vrange &r1, vrange &r2); bool fold_range (vrange &r, gimple *s, unsigned num_elements, vrange **vector); Andrew On 8/15/22 15:09, Aldy Hernandez wrote: > On Mon, Aug 15, 2022 at 11:39 AM Richard Biener <rguenther@suse.de> wrote: >> On Fri, 12 Aug 2022, Aldy Hernandez wrote: >> >>> On Fri, Aug 12, 2022 at 2:01 PM Richard Biener <rguenther@suse.de> wrote: >>>> This started with noticing we add ENTRY_BLOCK to our threads >>>> just for the sake of simplifying the conditional at the end of >>>> the first block in a function. That's not really threading >>>> anything but it ends up duplicating the entry block, and >>>> re-writing the result instead of statically fold the jump. >>> Hmmm, but threading 2 blocks is not really threading at all?? Unless >>> I'm misunderstanding something, this was even documented in the >>> backwards threader: >>> >>> [snip] >>> That's not really a jump threading opportunity, but instead is >>> simple cprop & simplification. We could handle it here if we >>> wanted by wiring up all the incoming edges. If we run this >>> early in IPA, that might be worth doing. For now we just >>> reject that case. */ >>> if (m_path.length () <= 1) >>> return false; >>> >>> Which you undoubtedly ran into because you're specifically eliding the check: >>> >>>> - if (m_profit.profitable_path_p (m_path, m_name, taken_edge, >>>> - &irreducible) >>>> + if ((m_path.length () == 1 >>>> + || m_profit.profitable_path_p (m_path, m_name, taken_edge, >>>> + &irreducible)) >> Correct. But currently the threader just "cheats", picks up one more >> block and then "threads" the case anyway, doing this simple simplification >> in the most expensive way possible ... > Ah. > >>>> The following tries to handle those by recording simplifications >>>> of the exit conditional as a thread of length one. That requires >>>> special-casing them in the backward copier since if we do not >>>> have any block to copy but modify the jump in place and remove >>>> not taken edges this confuses the hell out of remaining threads. >>>> >>>> So back_jt_path_registry::update_cfg now first marks all >>>> edges we know are never taken and then prunes the threading >>>> candidates when they include such edge. 
Then it makes sure >>>> to first perform unreachable edge removal (so we avoid >>>> copying them when other thread paths contain the prevailing >>>> edge) before continuing to apply the remaining threads. >>> This is all beyond my pay grade. I'm not very well versed in the >>> threader per se. So if y'all think it's a good idea, by all means. >>> Perhaps Jeff can chime in, or remembers the above comment? >>> >>>> In statistics you can see this avoids quite a bunch of useless >>>> threads (I've investiated 3 random files from cc1files with >>>> dropped stats in any of the thread passes). >>>> >>>> Still thinking about it it would be nice to avoid the work of >>>> discovering those candidates we have to throw away later >>>> which could eventually be done by having the backward threader >>>> perform a RPO walk over the CFG, skipping edges that can be >>>> statically determined as not being executed. Below I'm >>>> abusing the path range query to statically analyze the exit >>>> branch but I assume there's a simpler way of folding this stmt >>>> which could then better integrate with such a walk. >>> Unreachable paths can be queried with >>> path_range_query::unreachable_path_p (). Could you leverage this? >>> The idea was that if we ever resolved any SSA name to UNDEFINED, the >>> path itself was unreachable. >> The situation is that we end up with paths where an intermediate >> branch on the path can be simplified to false - but of course only >> if we put all intermediate branch dependences into the list of >> imports to consider. >> >> I don't like it very much to use the "threading" code to perform >> the simplification but I couldn't figure a cheap way to perform >> the simplification without invoking a full EVRP? That said, >> the backwards threader simply does >> >> basic_block bb; >> FOR_EACH_BB_FN (bb, m_fun) >> if (EDGE_COUNT (bb->succs) > 1) >> maybe_thread_block (bb); >> >> bool changed = m_registry.thread_through_all_blocks (true); >> >> instead of that we should only consider edges that may be executable >> by instead walking the CFG along such edges, simplifying BB exit >> conditionals. Unfortunately EVRP is now a C++ maze so I couldn't >> find how to actually do such simplification, not knowing how >> interacting with ranger influences the path query use either. >> If you or Andrew has any suggestions on how to essentially >> do a >> >> if (edge e = find_taken_edge (bb)) >> { >> ... >> } >> >> where find_taken_edge should be at least as powerful as using >> the path query for a single bb then I'd be all ears. As said, >> I tried to find the code to cut&paste in EVRP but failed to ... > Interesting... If what you need is find_taken_edge(bb), I think we can > do this quite cleanly. > > What you're asking is to fold the conditional at the end of a basic > block, and return the edge depending on the folded value. We have all > the smarts to do the folding (fold_range), and tree-cfg.c has > find_taken_edge(basic_block, tree val), which AFAICT, does everything > except the folding of non-trivial statements. > > If you like, I could tweak find_taken_edge() to use a ranger if > available (pass has called enable_ranger()), or otherwise use the > global range query in cfun (SSA_NAME_RANGE_INFO but with the > range_query API). This should be as fast as poking at > SSA_NAME_RANGE_INFO for the common case, or using an active ranger if > available. > > See the attached proof of concept. 
> > With this approach we could handle everything find_taken_edge(bb) > currently handles, plus anything we can fold in ranger (without the > penalty of a full ranger if you don't want to-- we can still fold and > use global ranges for SSA names if available). Or ultimately, if you > have an active ranger, you can leverage the full ranger functionality. > Either way is a lot better than the 1 != 0, etc folding the code > currently does ;-). > > If you want to fold these blocks from the threader, we could expose > the ranger in the path_query with enable_ranger(), and then > find_taken_edge(basic_block) can pick the ranger up. Up to you if you > want to run this within the threader, or elsewhere. > > Does this make sense? > > I'm happy to cobble it up if you find it useful. > Aldy
On Mon, Aug 15, 2022 at 9:24 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>
> heh. or just
>
> +  int_range<2> r;
> +  if (!fold_range (r, const_cast <gcond *> (cond_stmt))
> +      || !r.singleton_p (&val))
>
> if you do not provide a range_query to any of the fold_using_range code,
> it defaults to:
>
> fur_source::fur_source (range_query *q)
> {
>   if (q)
>     m_query = q;
>   else if (cfun)
>     m_query = get_range_query (cfun);
>   else
>     m_query = get_global_range_query ();
>   m_gori = NULL;
> }

Sweet.  Even better!
Aldy
On Mon, 15 Aug 2022, Aldy Hernandez wrote: > On Mon, Aug 15, 2022 at 9:24 PM Andrew MacLeod <amacleod@redhat.com> wrote: > > > > heh. or just > > > > > > + int_range<2> r; > > + if (!fold_range (r, const_cast <gcond *> (cond_stmt)) > > + || !r.singleton_p (&val)) > > > > > > if you do not provide a range_query to any of the fold_using_range code, > > it defaults to: > > > > fur_source::fur_source (range_query *q) > > { > > if (q) > > m_query = q; > > else if (cfun) > > m_query = get_range_query (cfun); > > else > > m_query = get_global_range_query (); > > m_gori = NULL; > > } > > > > Sweet. Even better! So when I do the following incremental change ontop of the posted patch then I see that the path query is able to simplify more "single BB paths" than the global range folding. diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc index 669098e4ec3..777e778037f 100644 --- a/gcc/tree-ssa-threadbackward.cc +++ b/gcc/tree-ssa-threadbackward.cc @@ -314,6 +314,12 @@ back_threader::find_taken_edge_cond (const vec<basic_block> &path, { int_range_max r; + int_range<2> rf; + if (path.length () == 1) + { + fold_range (rf, cond); + } + m_solver->compute_ranges (path, m_imports); m_solver->range_of_stmt (r, cond); @@ -325,6 +331,8 @@ back_threader::find_taken_edge_cond (const vec<basic_block> &path, if (r == true_range || r == false_range) { + if (path.length () == 1) + gcc_assert (r == rf); edge e_true, e_false; basic_block bb = gimple_bb (cond); extract_true_false_edges_from_block (bb, &e_true, &e_false); Even doing the following (not sure what's the difference and in particular expense over the path range query) results in missed simplifications (checking my set of cc1files). diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc index 669098e4ec3..1d43a179d08 100644 --- a/gcc/tree-ssa-threadbackward.cc +++ b/gcc/tree-ssa-threadbackward.cc @@ -99,6 +99,7 @@ private: back_threader_registry m_registry; back_threader_profitability m_profit; + gimple_ranger *m_ranger; path_range_query *m_solver; // Current path being analyzed. @@ -146,12 +147,14 @@ back_threader::back_threader (function *fun, unsigned flags, bool first) // The path solver needs EDGE_DFS_BACK in resolving mode. if (flags & BT_RESOLVE) mark_dfs_back_edges (); - m_solver = new path_range_query (flags & BT_RESOLVE); + m_ranger = new gimple_ranger; + m_solver = new path_range_query (flags & BT_RESOLVE, m_ranger); } back_threader::~back_threader () { delete m_solver; + delete m_ranger; loop_optimizer_finalize (); } @@ -314,6 +317,12 @@ back_threader::find_taken_edge_cond (const vec<basic_block> &path, { int_range_max r; + int_range<2> rf; + if (path.length () == 1) + { + fold_range (rf, cond, m_ranger); + } + m_solver->compute_ranges (path, m_imports); m_solver->range_of_stmt (r, cond); @@ -325,6 +334,8 @@ back_threader::find_taken_edge_cond (const vec<basic_block> &path, if (r == true_range || r == false_range) { + if (path.length () == 1) + gcc_assert (r == rf); edge e_true, e_false; basic_block bb = gimple_bb (cond); extract_true_false_edges_from_block (bb, &e_true, &e_false); one example is <bb 176> [local count: 14414059]: _128 = node_177(D)->typed.type; pretmp_413 = MEM[(const union tree_node *)_128].base.code; _431 = pretmp_413 + 65519; if (_128 == 0B) goto <bb 199>; [18.09%] else goto <bb 177>; [81.91%] where m_imports for the path is just _128 and the range computed is false while the ranger query returns VARYING. 
But path_range_query::range_defined_in_block does if (bb && POINTER_TYPE_P (TREE_TYPE (name))) m_ranger->m_cache.m_exit.maybe_adjust_range (r, name, bb); which adjusts the range to ~[0, 0], probably because of the dereference in the following stmt. Why does fold_range not do this when folding the exit test? Is there a way to make it do so? It looks like the only routine using this in gimple-range.cc is range_on_edge and there it's used for e->src after calling range_on_exit for e->src (why's it not done in range_on_exit?). A testcase for this is int foo (int **p, int i) { int *q = *p; int res = *q + i; if (q) return res; return -1; } which we "thread" with the path and with the above ICEs because fold_range doesn't get that if (q) is always true. Without the patch ethread doesn't want to duplicate the block (it's too large) but threadfull will if you disable evrp (if you remove the increment by 'i' it again won't since nothing interesting prevails and it won't go to BB 0 and fails to pick up a thread of length > 1): Checking profitability of path (backwards): bb:2 (6 insns) bb:0 Control statement insns: 2 Overall: 4 insns [1] Registering jump thread: (0, 2) incoming edge; (2, 3) nocopy; path: 0->2->3 SUCCESS Removing basic block 2 ;; basic block 2, loop depth 0 ;; pred: _1 = *p_6(D); _2 = (long unsigned int) n_7(D); _3 = _2 * 4; q_8 = _1 + _3; res_9 = *q_8; if (q_8 != 0B) goto <bb 3>; [98.33%] else goto <bb 4>; [1.67%] ;; succ: 3 ;; 4 ... (it copied BB 2) ... int foo (int * * p, int n) { int res; int * q; int * _10; long unsigned int _11; long unsigned int _12; <bb 2> [local count: 1073741824]: _10 = *p_6(D); _11 = (long unsigned int) n_7(D); _12 = _11 * 4; q_13 = _10 + _12; res_14 = *q_13; return res_14;
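[Editorial sketch with approximate signatures and a made-up helper
name: this is essentially what the posted patch relies on for a path
of length one.  Solving ranges over a single-block "path" lets
range_defined_in_block apply the maybe_adjust_range () call quoted
above (the non-null inferred from the dereference), which is what lets
the q_8 != 0B test in foo () fold even though a plain fold_range () of
the conditional returns VARYING.  It assumes the gimple-range-path.cc
hunk from the patch, so a path of length one is accepted.]

/* Sketch only; name, signature and placement are hypothetical.  */
static edge
fold_single_block_exit (basic_block bb, path_range_query &solver,
			bitmap imports)
{
  gcond *cond = safe_dyn_cast <gcond *> (last_stmt (bb));
  if (!cond)
    return NULL;

  auto_vec<basic_block> path;
  path.safe_push (bb);		/* A "thread" consisting of just BB.  */
  solver.compute_ranges (path, imports);

  int_range_max r;
  tree val;
  if (solver.range_of_stmt (r, cond) && r.singleton_p (&val))
    return find_taken_edge (bb, val);
  return NULL;
}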
On Tue, Aug 16, 2022 at 11:18 AM Richard Biener <rguenther@suse.de> wrote: > > On Mon, 15 Aug 2022, Aldy Hernandez wrote: > > > On Mon, Aug 15, 2022 at 9:24 PM Andrew MacLeod <amacleod@redhat.com> wrote: > > > > > > heh. or just > > > > > > > > > + int_range<2> r; > > > + if (!fold_range (r, const_cast <gcond *> (cond_stmt)) > > > + || !r.singleton_p (&val)) > > > > > > > > > if you do not provide a range_query to any of the fold_using_range code, > > > it defaults to: > > > > > > fur_source::fur_source (range_query *q) > > > { > > > if (q) > > > m_query = q; > > > else if (cfun) > > > m_query = get_range_query (cfun); > > > else > > > m_query = get_global_range_query (); > > > m_gori = NULL; > > > } > > > > > > > Sweet. Even better! > > So when I do the following incremental change ontop of the posted > patch then I see that the path query is able to simplify more > "single BB paths" than the global range folding. > > diff --git a/gcc/tree-ssa-threadbackward.cc > b/gcc/tree-ssa-threadbackward.cc > index 669098e4ec3..777e778037f 100644 > --- a/gcc/tree-ssa-threadbackward.cc > +++ b/gcc/tree-ssa-threadbackward.cc > @@ -314,6 +314,12 @@ back_threader::find_taken_edge_cond (const > vec<basic_block> &path, > { > int_range_max r; > > + int_range<2> rf; > + if (path.length () == 1) > + { > + fold_range (rf, cond); > + } > + > m_solver->compute_ranges (path, m_imports); > m_solver->range_of_stmt (r, cond); > > @@ -325,6 +331,8 @@ back_threader::find_taken_edge_cond (const > vec<basic_block> &path, > > if (r == true_range || r == false_range) > { > + if (path.length () == 1) > + gcc_assert (r == rf); > edge e_true, e_false; > basic_block bb = gimple_bb (cond); > extract_true_false_edges_from_block (bb, &e_true, &e_false); > > Even doing the following (not sure what's the difference and in > particular expense over the path range query) results in missed > simplifications (checking my set of cc1files). > > diff --git a/gcc/tree-ssa-threadbackward.cc > b/gcc/tree-ssa-threadbackward.cc > index 669098e4ec3..1d43a179d08 100644 > --- a/gcc/tree-ssa-threadbackward.cc > +++ b/gcc/tree-ssa-threadbackward.cc > @@ -99,6 +99,7 @@ private: > > back_threader_registry m_registry; > back_threader_profitability m_profit; > + gimple_ranger *m_ranger; > path_range_query *m_solver; > > // Current path being analyzed. > @@ -146,12 +147,14 @@ back_threader::back_threader (function *fun, > unsigned flags, bool first) > // The path solver needs EDGE_DFS_BACK in resolving mode. > if (flags & BT_RESOLVE) > mark_dfs_back_edges (); > - m_solver = new path_range_query (flags & BT_RESOLVE); > + m_ranger = new gimple_ranger; > + m_solver = new path_range_query (flags & BT_RESOLVE, m_ranger); > } Passing an allocated ranger here results in less simplifications over letting path_range_query allocate its own? That's not right. Or do you mean that using fold_range() with the m_ranger causes ICEs with your patch (due to the non-null processing described below)? 
> > back_threader::~back_threader () > { > delete m_solver; > + delete m_ranger; > > loop_optimizer_finalize (); > } > @@ -314,6 +317,12 @@ back_threader::find_taken_edge_cond (const > vec<basic_block> &path, > { > int_range_max r; > > + int_range<2> rf; > + if (path.length () == 1) > + { > + fold_range (rf, cond, m_ranger); > + } > + > m_solver->compute_ranges (path, m_imports); > m_solver->range_of_stmt (r, cond); > > @@ -325,6 +334,8 @@ back_threader::find_taken_edge_cond (const > vec<basic_block> &path, > > if (r == true_range || r == false_range) > { > + if (path.length () == 1) > + gcc_assert (r == rf); > edge e_true, e_false; > basic_block bb = gimple_bb (cond); > extract_true_false_edges_from_block (bb, &e_true, &e_false); > > one example is > > <bb 176> [local count: 14414059]: > _128 = node_177(D)->typed.type; > pretmp_413 = MEM[(const union tree_node *)_128].base.code; > _431 = pretmp_413 + 65519; > if (_128 == 0B) > goto <bb 199>; [18.09%] > else > goto <bb 177>; [81.91%] > > where m_imports for the path is just _128 and the range computed is > false while the ranger query returns VARYING. But > path_range_query::range_defined_in_block does > > if (bb && POINTER_TYPE_P (TREE_TYPE (name))) > m_ranger->m_cache.m_exit.maybe_adjust_range (r, name, bb); > > which adjusts the range to ~[0, 0], probably because of the > dereference in the following stmt. > > Why does fold_range not do this when folding the exit test? Is there > a way to make it do so? It looks like the only routine using this > in gimple-range.cc is range_on_edge and there it's used for e->src > after calling range_on_exit for e->src (why's it not done in > range_on_exit?). A testcase for this is Andrew's gonna have to answer this one, because I'm just a user of the infer_range infrastructure. But yes, you're right... fold_range doesn't seem to take into account side-effects such as non-null. Aldy > > int foo (int **p, int i) > { > int *q = *p; > int res = *q + i; > if (q) > return res; > return -1; > } > > which we "thread" with the path and with the above ICEs because > fold_range doesn't get that if (q) is always true. Without the > patch ethread doesn't want to duplicate the block (it's too large) > but threadfull will if you disable evrp (if you remove the increment > by 'i' it again won't since nothing interesting prevails and it > won't go to BB 0 and fails to pick up a thread of length > 1): > > Checking profitability of path (backwards): bb:2 (6 insns) bb:0 > Control statement insns: 2 > Overall: 4 insns > [1] Registering jump thread: (0, 2) incoming edge; (2, 3) nocopy; > path: 0->2->3 SUCCESS > Removing basic block 2 > ;; basic block 2, loop depth 0 > ;; pred: > _1 = *p_6(D); > _2 = (long unsigned int) n_7(D); > _3 = _2 * 4; > q_8 = _1 + _3; > res_9 = *q_8; > if (q_8 != 0B) > goto <bb 3>; [98.33%] > else > goto <bb 4>; [1.67%] > ;; succ: 3 > ;; 4 > > ... (it copied BB 2) ... > > int foo (int * * p, int n) > { > int res; > int * q; > int * _10; > long unsigned int _11; > long unsigned int _12; > > <bb 2> [local count: 1073741824]: > _10 = *p_6(D); > _11 = (long unsigned int) n_7(D); > _12 = _11 * 4; > q_13 = _10 + _12; > res_14 = *q_13; > return res_14; >
On Tue, 16 Aug 2022, Aldy Hernandez wrote: > On Tue, Aug 16, 2022 at 11:18 AM Richard Biener <rguenther@suse.de> wrote: > > > > On Mon, 15 Aug 2022, Aldy Hernandez wrote: > > > > > On Mon, Aug 15, 2022 at 9:24 PM Andrew MacLeod <amacleod@redhat.com> wrote: > > > > > > > > heh. or just > > > > > > > > > > > > + int_range<2> r; > > > > + if (!fold_range (r, const_cast <gcond *> (cond_stmt)) > > > > + || !r.singleton_p (&val)) > > > > > > > > > > > > if you do not provide a range_query to any of the fold_using_range code, > > > > it defaults to: > > > > > > > > fur_source::fur_source (range_query *q) > > > > { > > > > if (q) > > > > m_query = q; > > > > else if (cfun) > > > > m_query = get_range_query (cfun); > > > > else > > > > m_query = get_global_range_query (); > > > > m_gori = NULL; > > > > } > > > > > > > > > > Sweet. Even better! > > > > So when I do the following incremental change ontop of the posted > > patch then I see that the path query is able to simplify more > > "single BB paths" than the global range folding. > > > > diff --git a/gcc/tree-ssa-threadbackward.cc > > b/gcc/tree-ssa-threadbackward.cc > > index 669098e4ec3..777e778037f 100644 > > --- a/gcc/tree-ssa-threadbackward.cc > > +++ b/gcc/tree-ssa-threadbackward.cc > > @@ -314,6 +314,12 @@ back_threader::find_taken_edge_cond (const > > vec<basic_block> &path, > > { > > int_range_max r; > > > > + int_range<2> rf; > > + if (path.length () == 1) > > + { > > + fold_range (rf, cond); > > + } > > + > > m_solver->compute_ranges (path, m_imports); > > m_solver->range_of_stmt (r, cond); > > > > @@ -325,6 +331,8 @@ back_threader::find_taken_edge_cond (const > > vec<basic_block> &path, > > > > if (r == true_range || r == false_range) > > { > > + if (path.length () == 1) > > + gcc_assert (r == rf); > > edge e_true, e_false; > > basic_block bb = gimple_bb (cond); > > extract_true_false_edges_from_block (bb, &e_true, &e_false); > > > > Even doing the following (not sure what's the difference and in > > particular expense over the path range query) results in missed > > simplifications (checking my set of cc1files). > > > > diff --git a/gcc/tree-ssa-threadbackward.cc > > b/gcc/tree-ssa-threadbackward.cc > > index 669098e4ec3..1d43a179d08 100644 > > --- a/gcc/tree-ssa-threadbackward.cc > > +++ b/gcc/tree-ssa-threadbackward.cc > > @@ -99,6 +99,7 @@ private: > > > > back_threader_registry m_registry; > > back_threader_profitability m_profit; > > + gimple_ranger *m_ranger; > > path_range_query *m_solver; > > > > // Current path being analyzed. > > @@ -146,12 +147,14 @@ back_threader::back_threader (function *fun, > > unsigned flags, bool first) > > // The path solver needs EDGE_DFS_BACK in resolving mode. > > if (flags & BT_RESOLVE) > > mark_dfs_back_edges (); > > - m_solver = new path_range_query (flags & BT_RESOLVE); > > + m_ranger = new gimple_ranger; > > + m_solver = new path_range_query (flags & BT_RESOLVE, m_ranger); > > } > > Passing an allocated ranger here results in less simplifications over > letting path_range_query allocate its own? That's not right. Or do > you mean that using fold_range() with the m_ranger causes ICEs with > your patch (due to the non-null processing described below)? Yes, I've needed a ranger to use fold_range (..., m_ranger) which I thought might do more than not passing one. 
> > > > back_threader::~back_threader () > > { > > delete m_solver; > > + delete m_ranger; > > > > loop_optimizer_finalize (); > > } > > @@ -314,6 +317,12 @@ back_threader::find_taken_edge_cond (const > > vec<basic_block> &path, > > { > > int_range_max r; > > > > + int_range<2> rf; > > + if (path.length () == 1) > > + { > > + fold_range (rf, cond, m_ranger); > > + } > > + > > m_solver->compute_ranges (path, m_imports); > > m_solver->range_of_stmt (r, cond); > > > > @@ -325,6 +334,8 @@ back_threader::find_taken_edge_cond (const > > vec<basic_block> &path, > > > > if (r == true_range || r == false_range) > > { > > + if (path.length () == 1) > > + gcc_assert (r == rf); > > edge e_true, e_false; > > basic_block bb = gimple_bb (cond); > > extract_true_false_edges_from_block (bb, &e_true, &e_false); > > > > one example is > > > > <bb 176> [local count: 14414059]: > > _128 = node_177(D)->typed.type; > > pretmp_413 = MEM[(const union tree_node *)_128].base.code; > > _431 = pretmp_413 + 65519; > > if (_128 == 0B) > > goto <bb 199>; [18.09%] > > else > > goto <bb 177>; [81.91%] > > > > where m_imports for the path is just _128 and the range computed is > > false while the ranger query returns VARYING. But > > path_range_query::range_defined_in_block does > > > > if (bb && POINTER_TYPE_P (TREE_TYPE (name))) > > m_ranger->m_cache.m_exit.maybe_adjust_range (r, name, bb); > > > > which adjusts the range to ~[0, 0], probably because of the > > dereference in the following stmt. > > > > Why does fold_range not do this when folding the exit test? Is there > > a way to make it do so? It looks like the only routine using this > > in gimple-range.cc is range_on_edge and there it's used for e->src > > after calling range_on_exit for e->src (why's it not done in > > range_on_exit?). A testcase for this is > > Andrew's gonna have to answer this one, because I'm just a user of the > infer_range infrastructure. But yes, you're right... fold_range > doesn't seem to take into account side-effects such as non-null. OK, let's see. I can probably use the path query as well - I'll see to produce a prototype of doing those simplifications early, avoiding threadings there and through unreachable parts of the CFG. Richard. > Aldy > > > > > int foo (int **p, int i) > > { > > int *q = *p; > > int res = *q + i; > > if (q) > > return res; > > return -1; > > } > > > > which we "thread" with the path and with the above ICEs because > > fold_range doesn't get that if (q) is always true. Without the > > patch ethread doesn't want to duplicate the block (it's too large) > > but threadfull will if you disable evrp (if you remove the increment > > by 'i' it again won't since nothing interesting prevails and it > > won't go to BB 0 and fails to pick up a thread of length > 1): > > > > Checking profitability of path (backwards): bb:2 (6 insns) bb:0 > > Control statement insns: 2 > > Overall: 4 insns > > [1] Registering jump thread: (0, 2) incoming edge; (2, 3) nocopy; > > path: 0->2->3 SUCCESS > > Removing basic block 2 > > ;; basic block 2, loop depth 0 > > ;; pred: > > _1 = *p_6(D); > > _2 = (long unsigned int) n_7(D); > > _3 = _2 * 4; > > q_8 = _1 + _3; > > res_9 = *q_8; > > if (q_8 != 0B) > > goto <bb 3>; [98.33%] > > else > > goto <bb 4>; [1.67%] > > ;; succ: 3 > > ;; 4 > > > > ... (it copied BB 2) ... 
> > > > int foo (int * * p, int n) > > { > > int res; > > int * q; > > int * _10; > > long unsigned int _11; > > long unsigned int _12; > > > > <bb 2> [local count: 1073741824]: > > _10 = *p_6(D); > > _11 = (long unsigned int) n_7(D); > > _12 = _11 * 4; > > q_13 = _10 + _12; > > res_14 = *q_13; > > return res_14; > > > >
On Tue, Aug 16, 2022 at 1:32 PM Richard Biener <rguenther@suse.de> wrote: > > On Tue, 16 Aug 2022, Aldy Hernandez wrote: > > > On Tue, Aug 16, 2022 at 11:18 AM Richard Biener <rguenther@suse.de> wrote: > > > > > > On Mon, 15 Aug 2022, Aldy Hernandez wrote: > > > > > > > On Mon, Aug 15, 2022 at 9:24 PM Andrew MacLeod <amacleod@redhat.com> wrote: > > > > > > > > > > heh. or just > > > > > > > > > > > > > > > + int_range<2> r; > > > > > + if (!fold_range (r, const_cast <gcond *> (cond_stmt)) > > > > > + || !r.singleton_p (&val)) > > > > > > > > > > > > > > > if you do not provide a range_query to any of the fold_using_range code, > > > > > it defaults to: > > > > > > > > > > fur_source::fur_source (range_query *q) > > > > > { > > > > > if (q) > > > > > m_query = q; > > > > > else if (cfun) > > > > > m_query = get_range_query (cfun); > > > > > else > > > > > m_query = get_global_range_query (); > > > > > m_gori = NULL; > > > > > } > > > > > > > > > > > > > Sweet. Even better! > > > > > > So when I do the following incremental change ontop of the posted > > > patch then I see that the path query is able to simplify more > > > "single BB paths" than the global range folding. > > > > > > diff --git a/gcc/tree-ssa-threadbackward.cc > > > b/gcc/tree-ssa-threadbackward.cc > > > index 669098e4ec3..777e778037f 100644 > > > --- a/gcc/tree-ssa-threadbackward.cc > > > +++ b/gcc/tree-ssa-threadbackward.cc > > > @@ -314,6 +314,12 @@ back_threader::find_taken_edge_cond (const > > > vec<basic_block> &path, > > > { > > > int_range_max r; > > > > > > + int_range<2> rf; > > > + if (path.length () == 1) > > > + { > > > + fold_range (rf, cond); > > > + } > > > + > > > m_solver->compute_ranges (path, m_imports); > > > m_solver->range_of_stmt (r, cond); > > > > > > @@ -325,6 +331,8 @@ back_threader::find_taken_edge_cond (const > > > vec<basic_block> &path, > > > > > > if (r == true_range || r == false_range) > > > { > > > + if (path.length () == 1) > > > + gcc_assert (r == rf); > > > edge e_true, e_false; > > > basic_block bb = gimple_bb (cond); > > > extract_true_false_edges_from_block (bb, &e_true, &e_false); > > > > > > Even doing the following (not sure what's the difference and in > > > particular expense over the path range query) results in missed > > > simplifications (checking my set of cc1files). > > > > > > diff --git a/gcc/tree-ssa-threadbackward.cc > > > b/gcc/tree-ssa-threadbackward.cc > > > index 669098e4ec3..1d43a179d08 100644 > > > --- a/gcc/tree-ssa-threadbackward.cc > > > +++ b/gcc/tree-ssa-threadbackward.cc > > > @@ -99,6 +99,7 @@ private: > > > > > > back_threader_registry m_registry; > > > back_threader_profitability m_profit; > > > + gimple_ranger *m_ranger; > > > path_range_query *m_solver; > > > > > > // Current path being analyzed. > > > @@ -146,12 +147,14 @@ back_threader::back_threader (function *fun, > > > unsigned flags, bool first) > > > // The path solver needs EDGE_DFS_BACK in resolving mode. > > > if (flags & BT_RESOLVE) > > > mark_dfs_back_edges (); > > > - m_solver = new path_range_query (flags & BT_RESOLVE); > > > + m_ranger = new gimple_ranger; > > > + m_solver = new path_range_query (flags & BT_RESOLVE, m_ranger); > > > } > > > > Passing an allocated ranger here results in less simplifications over > > letting path_range_query allocate its own? That's not right. Or do > > you mean that using fold_range() with the m_ranger causes ICEs with > > your patch (due to the non-null processing described below)? 
> > Yes, I've needed a ranger to use fold_range (..., m_ranger) which > I thought might do more than not passing one. Yeah. If you don't pass it a ranger, it'll use global ranges (i.e. SSA_NAME_RANGE_INFO). More specifically, it will first try to get the ranger you have enabled in your pass with enable_ranger(). If that's not available, then it will use global ranges. So you could also do: m_ranger = enable_ranger (fun); and then in the destructor: disable_ranger (m_fun); m_ranger = NULL; // for good measure. Then you could use fold_range() without any parameters and it will DTRT. This is what I had in mind when I shared my proof of concept for tree-cfg's version of find_taken_edge(bb). If you have enabled a ranger, it'll use that, otherwise it'll use global SSA_NAME_RANGE_INFO ranges. > > > > > > > back_threader::~back_threader () > > > { > > > delete m_solver; > > > + delete m_ranger; > > > > > > loop_optimizer_finalize (); > > > } > > > @@ -314,6 +317,12 @@ back_threader::find_taken_edge_cond (const > > > vec<basic_block> &path, > > > { > > > int_range_max r; > > > > > > + int_range<2> rf; > > > + if (path.length () == 1) > > > + { > > > + fold_range (rf, cond, m_ranger); > > > + } > > > + > > > m_solver->compute_ranges (path, m_imports); > > > m_solver->range_of_stmt (r, cond); > > > > > > @@ -325,6 +334,8 @@ back_threader::find_taken_edge_cond (const > > > vec<basic_block> &path, > > > > > > if (r == true_range || r == false_range) > > > { > > > + if (path.length () == 1) > > > + gcc_assert (r == rf); > > > edge e_true, e_false; > > > basic_block bb = gimple_bb (cond); > > > extract_true_false_edges_from_block (bb, &e_true, &e_false); > > > > > > one example is > > > > > > <bb 176> [local count: 14414059]: > > > _128 = node_177(D)->typed.type; > > > pretmp_413 = MEM[(const union tree_node *)_128].base.code; > > > _431 = pretmp_413 + 65519; > > > if (_128 == 0B) > > > goto <bb 199>; [18.09%] > > > else > > > goto <bb 177>; [81.91%] > > > > > > where m_imports for the path is just _128 and the range computed is > > > false while the ranger query returns VARYING. But > > > path_range_query::range_defined_in_block does > > > > > > if (bb && POINTER_TYPE_P (TREE_TYPE (name))) > > > m_ranger->m_cache.m_exit.maybe_adjust_range (r, name, bb); > > > > > > which adjusts the range to ~[0, 0], probably because of the > > > dereference in the following stmt. > > > > > > Why does fold_range not do this when folding the exit test? Is there > > > a way to make it do so? It looks like the only routine using this > > > in gimple-range.cc is range_on_edge and there it's used for e->src > > > after calling range_on_exit for e->src (why's it not done in > > > range_on_exit?). A testcase for this is > > > > Andrew's gonna have to answer this one, because I'm just a user of the > > infer_range infrastructure. But yes, you're right... fold_range > > doesn't seem to take into account side-effects such as non-null. > > OK, let's see. I can probably use the path query as well - I'll > see to produce a prototype of doing those simplifications early, > avoiding threadings there and through unreachable parts of the CFG. Let's see if Andrew has any bright ideas on having fold_range work right with non-null, etc. Otherwise, the path_query idea would be fine, although it'd be nice to have a global entry point for this, like what you suggested with find_taken_edge(bb). Aldy
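For reference, a minimal sketch of the pattern Aldy describes here: enable a
pass-wide ranger so that a bare fold_range () call picks it up through
get_range_query (cfun).  Only enable_ranger, disable_ranger, fold_range and
singleton_p are taken from the thread; the surrounding structure is an
assumption, not code from any posted patch.

  /* Hedged sketch.  Enable a ranger for the whole pass ...  */
  gimple_ranger *ranger = enable_ranger (fun);

  /* ... then, when looking at a block's controlling condition, fold_range
     without an explicit query argument will use that ranger via
     get_range_query (cfun).  */
  if (gcond *cond_stmt = safe_dyn_cast <gcond *> (last_stmt (bb)))
    {
      int_range<2> r;
      tree val;
      if (fold_range (r, cond_stmt) && r.singleton_p (&val))
        {
          /* The condition is known to be constant true or false.  */
        }
    }

  /* And in the pass (or class) destructor:  */
  disable_ranger (fun);
  ranger = NULL;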
On Tue, 16 Aug 2022, Richard Biener wrote: > On Tue, 16 Aug 2022, Aldy Hernandez wrote: > > > On Tue, Aug 16, 2022 at 11:18 AM Richard Biener <rguenther@suse.de> wrote: > > > > > > On Mon, 15 Aug 2022, Aldy Hernandez wrote: > > > > > > > On Mon, Aug 15, 2022 at 9:24 PM Andrew MacLeod <amacleod@redhat.com> wrote: > > > > > > > > > > heh. or just > > > > > > > > > > > > > > > + int_range<2> r; > > > > > + if (!fold_range (r, const_cast <gcond *> (cond_stmt)) > > > > > + || !r.singleton_p (&val)) > > > > > > > > > > > > > > > if you do not provide a range_query to any of the fold_using_range code, > > > > > it defaults to: > > > > > > > > > > fur_source::fur_source (range_query *q) > > > > > { > > > > > if (q) > > > > > m_query = q; > > > > > else if (cfun) > > > > > m_query = get_range_query (cfun); > > > > > else > > > > > m_query = get_global_range_query (); > > > > > m_gori = NULL; > > > > > } > > > > > > > > > > > > > Sweet. Even better! > > > > > > So when I do the following incremental change ontop of the posted > > > patch then I see that the path query is able to simplify more > > > "single BB paths" than the global range folding. > > > > > > diff --git a/gcc/tree-ssa-threadbackward.cc > > > b/gcc/tree-ssa-threadbackward.cc > > > index 669098e4ec3..777e778037f 100644 > > > --- a/gcc/tree-ssa-threadbackward.cc > > > +++ b/gcc/tree-ssa-threadbackward.cc > > > @@ -314,6 +314,12 @@ back_threader::find_taken_edge_cond (const > > > vec<basic_block> &path, > > > { > > > int_range_max r; > > > > > > + int_range<2> rf; > > > + if (path.length () == 1) > > > + { > > > + fold_range (rf, cond); > > > + } > > > + > > > m_solver->compute_ranges (path, m_imports); > > > m_solver->range_of_stmt (r, cond); > > > > > > @@ -325,6 +331,8 @@ back_threader::find_taken_edge_cond (const > > > vec<basic_block> &path, > > > > > > if (r == true_range || r == false_range) > > > { > > > + if (path.length () == 1) > > > + gcc_assert (r == rf); > > > edge e_true, e_false; > > > basic_block bb = gimple_bb (cond); > > > extract_true_false_edges_from_block (bb, &e_true, &e_false); > > > > > > Even doing the following (not sure what's the difference and in > > > particular expense over the path range query) results in missed > > > simplifications (checking my set of cc1files). > > > > > > diff --git a/gcc/tree-ssa-threadbackward.cc > > > b/gcc/tree-ssa-threadbackward.cc > > > index 669098e4ec3..1d43a179d08 100644 > > > --- a/gcc/tree-ssa-threadbackward.cc > > > +++ b/gcc/tree-ssa-threadbackward.cc > > > @@ -99,6 +99,7 @@ private: > > > > > > back_threader_registry m_registry; > > > back_threader_profitability m_profit; > > > + gimple_ranger *m_ranger; > > > path_range_query *m_solver; > > > > > > // Current path being analyzed. > > > @@ -146,12 +147,14 @@ back_threader::back_threader (function *fun, > > > unsigned flags, bool first) > > > // The path solver needs EDGE_DFS_BACK in resolving mode. > > > if (flags & BT_RESOLVE) > > > mark_dfs_back_edges (); > > > - m_solver = new path_range_query (flags & BT_RESOLVE); > > > + m_ranger = new gimple_ranger; > > > + m_solver = new path_range_query (flags & BT_RESOLVE, m_ranger); > > > } > > > > Passing an allocated ranger here results in less simplifications over > > letting path_range_query allocate its own? That's not right. Or do > > you mean that using fold_range() with the m_ranger causes ICEs with > > your patch (due to the non-null processing described below)? 
> > Yes, I've needed a ranger to use fold_range (..., m_ranger) which > I thought might do more than not passing one. > > > > > > > back_threader::~back_threader () > > > { > > > delete m_solver; > > > + delete m_ranger; > > > > > > loop_optimizer_finalize (); > > > } > > > @@ -314,6 +317,12 @@ back_threader::find_taken_edge_cond (const > > > vec<basic_block> &path, > > > { > > > int_range_max r; > > > > > > + int_range<2> rf; > > > + if (path.length () == 1) > > > + { > > > + fold_range (rf, cond, m_ranger); > > > + } > > > + > > > m_solver->compute_ranges (path, m_imports); > > > m_solver->range_of_stmt (r, cond); > > > > > > @@ -325,6 +334,8 @@ back_threader::find_taken_edge_cond (const > > > vec<basic_block> &path, > > > > > > if (r == true_range || r == false_range) > > > { > > > + if (path.length () == 1) > > > + gcc_assert (r == rf); > > > edge e_true, e_false; > > > basic_block bb = gimple_bb (cond); > > > extract_true_false_edges_from_block (bb, &e_true, &e_false); > > > > > > one example is > > > > > > <bb 176> [local count: 14414059]: > > > _128 = node_177(D)->typed.type; > > > pretmp_413 = MEM[(const union tree_node *)_128].base.code; > > > _431 = pretmp_413 + 65519; > > > if (_128 == 0B) > > > goto <bb 199>; [18.09%] > > > else > > > goto <bb 177>; [81.91%] > > > > > > where m_imports for the path is just _128 and the range computed is > > > false while the ranger query returns VARYING. But > > > path_range_query::range_defined_in_block does > > > > > > if (bb && POINTER_TYPE_P (TREE_TYPE (name))) > > > m_ranger->m_cache.m_exit.maybe_adjust_range (r, name, bb); > > > > > > which adjusts the range to ~[0, 0], probably because of the > > > dereference in the following stmt. > > > > > > Why does fold_range not do this when folding the exit test? Is there > > > a way to make it do so? It looks like the only routine using this > > > in gimple-range.cc is range_on_edge and there it's used for e->src > > > after calling range_on_exit for e->src (why's it not done in > > > range_on_exit?). A testcase for this is > > > > Andrew's gonna have to answer this one, because I'm just a user of the > > infer_range infrastructure. But yes, you're right... fold_range > > doesn't seem to take into account side-effects such as non-null. > > OK, let's see. I can probably use the path query as well - I'll > see to produce a prototype of doing those simplifications early, > avoiding threadings there and through unreachable parts of the CFG. Something like the following - it processes the CFG first, simplifying BB conditionals and marking edges so we can later avoid threading along unreachable paths. It will no longer count those simplifications as theaded jumps. It's basically like an EVRP pass on steroids (because of the path query doing more than ranger folding of the conditional), but only simplifying conditionals. Ideally just using ranger or even better, fold_stmt (..., m_ranger) plus find_taken_edge () would work here. Especially for ethread doing such simplification seems worthwhile since EVRP only comes later. Likewise the first threading pass after inlining runs before the first VRP pass (and VRP isn't enabled at -O1 but threading is). That said, what triggered this is seeing that the backwards threader needs 0 -> 2 (aka ENTRY_BLOCK -> 2) to simplify the conditional in BB2 and that it will also copy BB2 for this even though there's only a single entry into BB2. 
I suppose we could say we don't want to thread those, thus reject any threading attempt with an entry edge into a block with a single predecessor? Or optimize for this case in the copier, not copying such blocks? I'm somewhat undecided what's the correct thing to do here. Richard. diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc index c99d77dd340..782794d3825 100644 --- a/gcc/gimple-range-path.cc +++ b/gcc/gimple-range-path.cc @@ -220,7 +220,6 @@ path_range_query::unreachable_path_p () void path_range_query::set_path (const vec<basic_block> &path) { - gcc_checking_assert (path.length () > 1); m_path = path.copy (); m_pos = m_path.length () - 1; bitmap_clear (m_has_cache_entry); diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc index 1c362839fab..3ca8c6eb9da 100644 --- a/gcc/tree-ssa-threadbackward.cc +++ b/gcc/tree-ssa-threadbackward.cc @@ -91,9 +91,9 @@ private: edge maybe_register_path (); void maybe_register_path_dump (edge taken_edge); void find_paths_to_names (basic_block bb, bitmap imports, unsigned); - edge find_taken_edge (const vec<basic_block> &path); - edge find_taken_edge_cond (const vec<basic_block> &path, gcond *); - edge find_taken_edge_switch (const vec<basic_block> &path, gswitch *); + edge find_taken_edge (const vec<basic_block> &path, bool = false); + edge find_taken_edge_cond (const vec<basic_block> &path, gcond *, bool); + edge find_taken_edge_switch (const vec<basic_block> &path, gswitch *, bool); virtual void debug (); virtual void dump (FILE *out); @@ -124,6 +124,7 @@ private: // each threadfull[12] pass. This is used to differentiate between // the different threading passes so we can set up debug counters. bool m_first; + auto_edge_flag m_reachable; }; // Used to differentiate unreachable edges, so we may stop the search @@ -132,7 +133,8 @@ const edge back_threader::UNREACHABLE_EDGE = (edge) -1; back_threader::back_threader (function *fun, unsigned flags, bool first) : m_profit (flags & BT_SPEED), - m_first (first) + m_first (first), + m_reachable (fun) { if (flags & BT_SPEED) loop_optimizer_init (LOOPS_HAVE_PREHEADERS | LOOPS_HAVE_SIMPLE_LATCHES); @@ -265,16 +267,16 @@ back_threader::maybe_register_path () // outgoing edge can be calculated, return NULL. 
edge -back_threader::find_taken_edge (const vec<basic_block> &path) +back_threader::find_taken_edge (const vec<basic_block> &path, bool modify) { - gcc_checking_assert (path.length () > 1); switch (gimple_code (m_last_stmt)) { case GIMPLE_COND: - return find_taken_edge_cond (path, as_a<gcond *> (m_last_stmt)); + return find_taken_edge_cond (path, as_a<gcond *> (m_last_stmt), modify); case GIMPLE_SWITCH: - return find_taken_edge_switch (path, as_a<gswitch *> (m_last_stmt)); + return find_taken_edge_switch (path, as_a<gswitch *> (m_last_stmt), + modify); default: return NULL; @@ -285,7 +287,7 @@ back_threader::find_taken_edge (const vec<basic_block> &path) edge back_threader::find_taken_edge_switch (const vec<basic_block> &path, - gswitch *sw) + gswitch *sw, bool modify) { tree name = gimple_switch_index (sw); int_range_max r; @@ -303,6 +305,9 @@ back_threader::find_taken_edge_switch (const vec<basic_block> &path, if (!label) return NULL; + if (modify) + gimple_switch_set_index (sw, CASE_LOW (label)); + return find_edge (gimple_bb (sw), label_to_block (cfun, CASE_LABEL (label))); } @@ -310,7 +315,7 @@ back_threader::find_taken_edge_switch (const vec<basic_block> &path, edge back_threader::find_taken_edge_cond (const vec<basic_block> &path, - gcond *cond) + gcond *cond, bool modify) { int_range_max r; @@ -328,6 +333,13 @@ back_threader::find_taken_edge_cond (const vec<basic_block> &path, edge e_true, e_false; basic_block bb = gimple_bb (cond); extract_true_false_edges_from_block (bb, &e_true, &e_false); + if (modify) + { + if (r == true_range) + gimple_cond_make_true (cond); + else + gimple_cond_make_false (cond); + } return r == true_range ? e_true : e_false; } return NULL; @@ -452,6 +464,7 @@ back_threader::find_paths_to_names (basic_block bb, bitmap interesting, FOR_EACH_EDGE (e, iter, bb->preds) { if (e->flags & EDGE_ABNORMAL + || !(e->flags & m_reachable) // This is like path_crosses_loops in profitable_path_p but // more restrictive to avoid peeling off loop iterations (see // tree-ssa/pr14341.c for an example). @@ -948,16 +961,101 @@ unsigned int back_threader::thread_blocks () { basic_block bb; + bool changed = false; + + auto_vec<basic_block> worklist (n_basic_blocks_for_fn (cfun)); + auto_bb_flag visited (cfun); + worklist.quick_push (single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun))); + worklist[0]->flags |= visited; + while (!worklist.is_empty ()) + { + bb = worklist.pop (); + + // Try to resolve the jump at the end of the block + // ??? 
We'd like to use ranger without a fake path of + // length one here, but that turns out to be less powerful + gimple *stmt = last_stmt (bb); + if (stmt + && (gimple_code (stmt) == GIMPLE_COND + || gimple_code (stmt) == GIMPLE_SWITCH)) + { + bitmap_clear (m_imports); + ssa_op_iter iter; + tree name; + // could use gori exports here + FOR_EACH_SSA_TREE_OPERAND (name, stmt, iter, SSA_OP_USE) + { + if (gimple_range_ssa_p (name)) + bitmap_set_bit (m_imports, SSA_NAME_VERSION (name)); + } + if (!bitmap_empty_p (m_imports)) + { + auto_vec<basic_block, 1> path; + path.quick_push (bb); + m_last_stmt = stmt; + if (edge e = find_taken_edge (path, true)) + { + if (e == UNREACHABLE_EDGE) + continue; + else + { + // The jump was modified to be trivially + // redirectable by CFG cleanup, so make sure + // that runs + changed = true; + if (!(e->dest->flags & visited)) + { + worklist.quick_push (e->dest); + e->dest->flags |= visited; + e->flags |= m_reachable; + continue; + } + } + } + } + } + + edge_iterator ei; + edge e; + FOR_EACH_EDGE (e, ei, bb->succs) + { + e->flags |= m_reachable; + if (!(e->dest->flags & visited)) + { + worklist.quick_push (e->dest); + e->dest->flags |= visited; + e->flags |= m_reachable; + } + } + } + FOR_EACH_BB_FN (bb, m_fun) - if (EDGE_COUNT (bb->succs) > 1) - maybe_thread_block (bb); + // visited doubles for reachable + if (bb->flags & visited) + { + bb->flags &= ~visited; + unsigned cnt = 0; + edge_iterator ei; + edge e; + FOR_EACH_EDGE (e, ei, bb->succs) + if (e->flags & m_reachable) + cnt++; + // check we have a yet unresolved exit jump + if (cnt > 1) + maybe_thread_block (bb); + } - bool changed = m_registry.thread_through_all_blocks (true); + FOR_EACH_BB_FN (bb, m_fun) + { + edge_iterator ei; + edge e; + FOR_EACH_EDGE (e, ei, bb->succs) + e->flags &= ~m_reachable; + } - if (m_flags & BT_SPEED) - return changed ? TODO_cleanup_cfg : 0; + changed |= m_registry.thread_through_all_blocks (true); - return false; + return changed ? TODO_cleanup_cfg : 0; } namespace {
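To tie the prototype's pieces together, a hedged restatement of its per-block
flow; the names used (find_taken_edge with a modify flag, m_last_stmt) exist
only in the patch above, so treat this as a reading aid rather than trunk
code.

  /* For a reachable block BB that ends in a GIMPLE_COND or GIMPLE_SWITCH,
     build a fake path of length one and let the path query resolve it,
     rewriting the condition in place when it folds:  */
  auto_vec<basic_block, 1> path;
  path.quick_push (bb);
  m_last_stmt = stmt;
  edge taken = find_taken_edge (path, /*modify=*/true);

  /* With modify=true, find_taken_edge_cond calls gimple_cond_make_true /
     gimple_cond_make_false (and find_taken_edge_switch rewrites the switch
     index), so the jump becomes trivially redirectable; returning
     TODO_cleanup_cfg afterwards lets CFG cleanup remove the dead edge and
     any blocks that became unreachable.  */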
On 8/16/22 05:18, Richard Biener wrote: > On Mon, 15 Aug 2022, Aldy Hernandez wrote: > >> On Mon, Aug 15, 2022 at 9:24 PM Andrew MacLeod <amacleod@redhat.com> wrote: >>> heh. or just >>> >>> >>> + int_range<2> r; >>> + if (!fold_range (r, const_cast <gcond *> (cond_stmt)) >>> + || !r.singleton_p (&val)) >>> >>> >>> if you do not provide a range_query to any of the fold_using_range code, >>> it defaults to: >>> >>> fur_source::fur_source (range_query *q) >>> { >>> if (q) >>> m_query = q; >>> else if (cfun) >>> m_query = get_range_query (cfun); >>> else >>> m_query = get_global_range_query (); >>> m_gori = NULL; >>> } >>> >> Sweet. Even better! > So when I do the following incremental change ontop of the posted > patch then I see that the path query is able to simplify more > "single BB paths" than the global range folding. > > diff --git a/gcc/tree-ssa-threadbackward.cc > b/gcc/tree-ssa-threadbackward.cc > index 669098e4ec3..777e778037f 100644 > --- a/gcc/tree-ssa-threadbackward.cc > +++ b/gcc/tree-ssa-threadbackward.cc > @@ -314,6 +314,12 @@ back_threader::find_taken_edge_cond (const > vec<basic_block> &path, > { > int_range_max r; > > + int_range<2> rf; > + if (path.length () == 1) > + { > + fold_range (rf, cond); > + } > + > m_solver->compute_ranges (path, m_imports); > m_solver->range_of_stmt (r, cond); > > @@ -325,6 +331,8 @@ back_threader::find_taken_edge_cond (const > vec<basic_block> &path, > > if (r == true_range || r == false_range) > { > + if (path.length () == 1) > + gcc_assert (r == rf); > edge e_true, e_false; > basic_block bb = gimple_bb (cond); > extract_true_false_edges_from_block (bb, &e_true, &e_false); > > Even doing the following (not sure what's the difference and in > particular expense over the path range query) results in missed > simplifications (checking my set of cc1files). > > diff --git a/gcc/tree-ssa-threadbackward.cc > b/gcc/tree-ssa-threadbackward.cc > index 669098e4ec3..1d43a179d08 100644 > --- a/gcc/tree-ssa-threadbackward.cc > +++ b/gcc/tree-ssa-threadbackward.cc > @@ -99,6 +99,7 @@ private: > > back_threader_registry m_registry; > back_threader_profitability m_profit; > + gimple_ranger *m_ranger; > path_range_query *m_solver; > > // Current path being analyzed. > @@ -146,12 +147,14 @@ back_threader::back_threader (function *fun, > unsigned flags, bool first) > // The path solver needs EDGE_DFS_BACK in resolving mode. 
> if (flags & BT_RESOLVE) > mark_dfs_back_edges (); > - m_solver = new path_range_query (flags & BT_RESOLVE); > + m_ranger = new gimple_ranger; > + m_solver = new path_range_query (flags & BT_RESOLVE, m_ranger); > } > > back_threader::~back_threader () > { > delete m_solver; > + delete m_ranger; > > loop_optimizer_finalize (); > } > @@ -314,6 +317,12 @@ back_threader::find_taken_edge_cond (const > vec<basic_block> &path, > { > int_range_max r; > > + int_range<2> rf; > + if (path.length () == 1) > + { > + fold_range (rf, cond, m_ranger); > + } > + > m_solver->compute_ranges (path, m_imports); > m_solver->range_of_stmt (r, cond); > > @@ -325,6 +334,8 @@ back_threader::find_taken_edge_cond (const > vec<basic_block> &path, > > if (r == true_range || r == false_range) > { > + if (path.length () == 1) > + gcc_assert (r == rf); > edge e_true, e_false; > basic_block bb = gimple_bb (cond); > extract_true_false_edges_from_block (bb, &e_true, &e_false); > > one example is > > <bb 176> [local count: 14414059]: > _128 = node_177(D)->typed.type; > pretmp_413 = MEM[(const union tree_node *)_128].base.code; > _431 = pretmp_413 + 65519; > if (_128 == 0B) > goto <bb 199>; [18.09%] > else > goto <bb 177>; [81.91%] > > where m_imports for the path is just _128 and the range computed is > false while the ranger query returns VARYING. But > path_range_query::range_defined_in_block does > > if (bb && POINTER_TYPE_P (TREE_TYPE (name))) > m_ranger->m_cache.m_exit.maybe_adjust_range (r, name, bb); This is the coarse grained "side effect applies somewhere in the block" mechanism. There is no understanding of where in the block it happens. > > which adjusts the range to ~[0, 0], probably because of the > dereference in the following stmt. > > Why does fold_range not do this when folding the exit test? Is there > a way to make it do so? It looks like the only routine using this > in gimple-range.cc is range_on_edge and there it's used for e->src > after calling range_on_exit for e->src (why's it not done in > range_on_exit?). A testcase for this is Fold_range doesnt do this because it is simply another statement. It makes no attempt to understand the context in which you are folding something. you could be folding that stmt from a different location (ie recomputing) If your context is that you are looking for the range after the last statement has been executed, then one needs to check to see if there are any side effects. ranger uses it for range_on_edge (), because it knows all the statements in the block have been executed, and its safe to apply anything seen in the block. It does it right after range_on_exit() is called internally. Once upon a time, it was integrated with range-on-exit, but it turned out there were certain times that it was causing problems. There have been some cleanups since then, it probably safe now to return that call to range_on_exit.. but doesnt buy us a whole lot by itself.. except of course I have now OKd using range_on-entry/exit generally :-) the cache also uses it when walking blocks to pick up inferred values during an on-entry cache fill. > int foo (int **p, int i) > { > int *q = *p; > int res = *q + i; > if (q) > return res; > return -1; > } > > which we "thread" with the path and with the above ICEs because > fold_range doesn't get that if (q) is always true. Without the Its a known limitation that, unless you are doing a walk, on-demand requests will "miss" some inferred ranges, because they are only maintained at the basic block level. 
(i.e., we will know that q is non-null in BB2, but we don't know where, so
we can make no assumptions at the exit condition about q in this case.)
The path_query is invoked in on-demand mode because it won't walk the
entire IL.  So the first time you ask for the range of an ssa-name, it
will quickly zip over the immediate use list and "fill" the on-exit
structure for any blocks in which a non-null reference is seen.  This
allows the threader to pick up non-null from blocks outside the path that
haven't been examined.

VRP does a walk, and during the walk it adjusts ranges on the fly for the
current block via the gimple_ranger::register_inferred_ranges () routine,
which is really just a wrapper around ranger_cache::apply_inferred_ranges ()
(in gimple-range-cache.cc).

This is called after every statement and is where we take care of the
bookkeeping for adjusting values, and adding them to the blocks list.

If the path query is walking those statements, it could also "adjust" the
range of q on the fly... but it has to have walked those statements.  From
that routine, the relevant bits use the gimple-infer class to find any
inferred ranges from the statement, and would look something like:

  gimple_infer_range infer (s);
  for (unsigned x = 0; x < infer.num (); x++)
    {
      tree name = infer.name (x);
      if (!interesting_p (name))
        continue;
      get_current_path_range (r, name);
      if (r.intersect (infer.range (x)))
        set_current_path_range (name, r);
    }

That would adjust the value of q to be non-zero after "int res = *q + i;",
but you need to have walked the statement before you get to the condition.
As long as they are all in your list of interesting statements to look at,
you'd be golden.  I don't know if, when, or in what direction things are
examined.

Andrew
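A hedged sketch of the kind of statement walk Andrew describes for VRP,
where the pass folds with an enabled ranger and calls
register_inferred_ranges after each statement so later statements in the
same block see inferred ranges such as non-null; the iteration is standard
GCC boilerplate, and the exact register_inferred_ranges signature is
assumed from its description above.

  gimple_ranger *ranger = enable_ranger (fun);
  basic_block bb;
  FOR_EACH_BB_FN (bb, fun)
    for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
         gsi_next (&gsi))
      {
        gimple *stmt = gsi_stmt (gsi);

        /* Fold/simplify STMT using RANGER here (e.g. via fold_range).  */

        /* Bookkeeping: record ranges inferred by STMT (non-null from a
           dereference, etc.) so queries at later statements in this
           block pick them up.  */
        ranger->register_inferred_ranges (stmt);
      }
  disable_ranger (fun);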
On Tue, 16 Aug 2022, Andrew MacLeod wrote: > > On 8/16/22 05:18, Richard Biener wrote: > > On Mon, 15 Aug 2022, Aldy Hernandez wrote: > > > >> On Mon, Aug 15, 2022 at 9:24 PM Andrew MacLeod <amacleod@redhat.com> wrote: > >>> heh. or just > >>> > >>> > >>> + int_range<2> r; > >>> + if (!fold_range (r, const_cast <gcond *> (cond_stmt)) > >>> + || !r.singleton_p (&val)) > >>> > >>> > >>> if you do not provide a range_query to any of the fold_using_range code, > >>> it defaults to: > >>> > >>> fur_source::fur_source (range_query *q) > >>> { > >>> if (q) > >>> m_query = q; > >>> else if (cfun) > >>> m_query = get_range_query (cfun); > >>> else > >>> m_query = get_global_range_query (); > >>> m_gori = NULL; > >>> } > >>> > >> Sweet. Even better! > > So when I do the following incremental change ontop of the posted > > patch then I see that the path query is able to simplify more > > "single BB paths" than the global range folding. > > > > diff --git a/gcc/tree-ssa-threadbackward.cc > > b/gcc/tree-ssa-threadbackward.cc > > index 669098e4ec3..777e778037f 100644 > > --- a/gcc/tree-ssa-threadbackward.cc > > +++ b/gcc/tree-ssa-threadbackward.cc > > @@ -314,6 +314,12 @@ back_threader::find_taken_edge_cond (const > > vec<basic_block> &path, > > { > > int_range_max r; > > + int_range<2> rf; > > + if (path.length () == 1) > > + { > > + fold_range (rf, cond); > > + } > > + > > m_solver->compute_ranges (path, m_imports); > > m_solver->range_of_stmt (r, cond); > > @@ -325,6 +331,8 @@ back_threader::find_taken_edge_cond (const > > vec<basic_block> &path, > > > > if (r == true_range || r == false_range) > > { > > + if (path.length () == 1) > > + gcc_assert (r == rf); > > edge e_true, e_false; > > basic_block bb = gimple_bb (cond); > > extract_true_false_edges_from_block (bb, &e_true, &e_false); > > > > Even doing the following (not sure what's the difference and in > > particular expense over the path range query) results in missed > > simplifications (checking my set of cc1files). > > > > diff --git a/gcc/tree-ssa-threadbackward.cc > > b/gcc/tree-ssa-threadbackward.cc > > index 669098e4ec3..1d43a179d08 100644 > > --- a/gcc/tree-ssa-threadbackward.cc > > +++ b/gcc/tree-ssa-threadbackward.cc > > @@ -99,6 +99,7 @@ private: > > > > back_threader_registry m_registry; > > back_threader_profitability m_profit; > > + gimple_ranger *m_ranger; > > path_range_query *m_solver; > > > > // Current path being analyzed. > > @@ -146,12 +147,14 @@ back_threader::back_threader (function *fun, > > unsigned flags, bool first) > > // The path solver needs EDGE_DFS_BACK in resolving mode. 
> > if (flags & BT_RESOLVE) > > mark_dfs_back_edges (); > > - m_solver = new path_range_query (flags & BT_RESOLVE); > > + m_ranger = new gimple_ranger; > > + m_solver = new path_range_query (flags & BT_RESOLVE, m_ranger); > > } > > > > back_threader::~back_threader () > > { > > delete m_solver; > > + delete m_ranger; > > > > loop_optimizer_finalize (); > > } > > @@ -314,6 +317,12 @@ back_threader::find_taken_edge_cond (const > > vec<basic_block> &path, > > { > > int_range_max r; > > + int_range<2> rf; > > + if (path.length () == 1) > > + { > > + fold_range (rf, cond, m_ranger); > > + } > > + > > m_solver->compute_ranges (path, m_imports); > > m_solver->range_of_stmt (r, cond); > > @@ -325,6 +334,8 @@ back_threader::find_taken_edge_cond (const > > vec<basic_block> &path, > > > > if (r == true_range || r == false_range) > > { > > + if (path.length () == 1) > > + gcc_assert (r == rf); > > edge e_true, e_false; > > basic_block bb = gimple_bb (cond); > > extract_true_false_edges_from_block (bb, &e_true, &e_false); > > > > one example is > > > > <bb 176> [local count: 14414059]: > > _128 = node_177(D)->typed.type; > > pretmp_413 = MEM[(const union tree_node *)_128].base.code; > > _431 = pretmp_413 + 65519; > > if (_128 == 0B) > > goto <bb 199>; [18.09%] > > else > > goto <bb 177>; [81.91%] > > > > where m_imports for the path is just _128 and the range computed is > > false while the ranger query returns VARYING. But > > path_range_query::range_defined_in_block does > > > > if (bb && POINTER_TYPE_P (TREE_TYPE (name))) > > m_ranger->m_cache.m_exit.maybe_adjust_range (r, name, bb); > This is the coarse grained "side effect applies somewhere in the block" > mechanism. There is no understanding of where in the block it happens. > > > > which adjusts the range to ~[0, 0], probably because of the > > dereference in the following stmt. > > > > Why does fold_range not do this when folding the exit test? Is there > > a way to make it do so? It looks like the only routine using this > > in gimple-range.cc is range_on_edge and there it's used for e->src > > after calling range_on_exit for e->src (why's it not done in > > range_on_exit?). A testcase for this is > > Fold_range doesnt do this because it is simply another statement. It makes no > attempt to understand the context in which you are folding something. you > could be folding that stmt from a different location (ie recomputing) If > your context is that you are looking for the range after the last statement > has been executed, then one needs to check to see if there are any side > effects. Hmm, but I'm asking it to fold a specific statement - how can that ever be folded "from a different context"? In fact it traces as " at stmt " with the stmt I pass it. But yes, the issue is of course that to compute the range of q != 0 it needs to compute the range of 'q' but it simply does // If no name, simply call the base routine. if (!name) { res = fold_range_internal (r, s, NULL_TREE); if (res && is_a <gcond *> (s)) { // Update any exports in the cache if this is a gimple cond statement. tree exp; basic_block bb = gimple_bb (s); FOR_EACH_GORI_EXPORT_NAME (m_cache.m_gori, bb, exp) m_cache.propagate_updated_value (exp, bb); and fur_depend seems to get the context so it could adjust ranges in its get_operand which seems to just call range_of_expr with the stmt as third arg. Unfortunately gimple_ranger::range_of_expr is undocumented, but again 'stmt' is dumped as " at stmt " thus seems to be the "context" here? 
// If name is defined in this block, try to get an range from S. what is 'S'? def_stmt or stmt? if (def_stmt && gimple_bb (def_stmt) == bb) { // Declared in this block, if it has a global set, check for an // override from a block walk, otherwise calculate it. if (m_cache.get_global_range (r, expr)) m_cache.block_range (r, bb, expr, false); else range_of_stmt (r, def_stmt, expr); so, add if (is_ctrl_stmt (stmt)) m_cache.m_exit.maybe_adjust_range (r, expr, bb); at the end (also covering the range_on_entry case). I suppose range_on_entry does not evaluate all possible paths from definition to BB, using maybe_adjust_range on blocks inbetween and unioning ranges at path merges? That would likely be prohibitly expensive. > ranger uses it for range_on_edge (), because it knows all the statements in > the block have been executed, and its safe to apply anything seen in the > block. It does it right after range_on_exit() is called internally. > > Once upon a time, it was integrated with range-on-exit, but it turned out > there were certain times that it was causing problems. There have been some > cleanups since then, it probably safe now to return that call to > range_on_exit.. but doesnt buy us a whole lot by itself.. except of course I > have now OKd using range_on-entry/exit generally :-) > > the cache also uses it when walking blocks to pick up inferred values during > an on-entry cache fill. > > > > int foo (int **p, int i) > > { > > int *q = *p; > > int res = *q + i; > > if (q) > > return res; > > return -1; > > } > > > > which we "thread" with the path and with the above ICEs because > > fold_range doesn't get that if (q) is always true. Without the > > Its a known limitation that, unless you are doing a walk, on-demand requests > will "miss" some inferred ranges, because they are only maintained at the > basic block level. (ie, we will know that q is non-null in BB2, but we don't > know where, so we can make no assumptions at the exit condition about q in > this case. the path_query is invoked in on-demand mode because it wont walk > the entire IL,. so the first time you ask for the range of an ssa-name, it > will quickly zip over the immediate use list and "fill" the on-exit structure > for any blocks which a non-null reference is seen. This allows the threader > to pick up non-null from blocks outside the path that havent been examined. > > VRP does a walk, and while during the walk, adjusts ranges on the fly for the > current block via the gimple_ranger::register_inferred_ranges () routine. > which is really just a wrapper around ranger_cache::apply_inferred_ranges () > (in gimple-range-cache.cc) > > This is called after every statement and is where we take care of bookkeeping > for adjusting values, and adding them to the blocks list. > > if the path query is walking those statement, it could also "adjust" the range > of q on the fly... but it has to have walked those statements. from that > routine, the relevant bits use the gimple-infer class to find any inferred > ranges from the statement, and would look something like: > > gimple_infer_range infer(s); > for (unsigned x = 0; x < infer.num (); x++) > { > tree name = infer.name (x); > if (!interesting_p (name)) > continue; > get_current_path_range (r, name); > if (r.intersect (infer.range (x))) > set_current_path_range (name, r); > } > > That would adjust the value of q to be non-zero after "int res = *q + i;" > > but you need to have walked the statement before you get to the condition. 
> as long as they are all in your list of interesting statements to look
> at, then you'd be golden.

Ah, so while the above might work, it would go against the spirit of this
being only fully useful if applied during a walk, adjusting the cached
ranges?  It's still a bit unfortunate that one needs to walk all stmts for
this - the use case wants to just walk the CFG and control stmts, and with
the path query it would only cover the single BB of each control stmt
(thus the above hack would probably work out).

Meanwhile I'm leaning towards calling this a phase ordering issue of
threading + VRP, but that also means we shouldn't deliberately try to
preserve "threadings" of this kind - in fact we might want to explicitly
reject them?

Richard.
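A hedged sketch of the "explicitly reject them" direction, following the
option floated earlier of refusing a threading attempt whose entry edge
enters a block with a single predecessor; where such a check would sit in
the backwards threader is an assumption.

  /* Illustrative only: a candidate whose entry block has a single
     predecessor gains nothing from being copied - folding its exit
     condition in place has the same effect - so it could be rejected
     up front.  */
  static bool
  reject_single_pred_entry_p (edge entry_edge)
  {
    return single_pred_p (entry_edge->dest);
  }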
On 8/17/22 03:42, Richard Biener wrote: > On Tue, 16 Aug 2022, Andrew MacLeod wrote: > >> On 8/16/22 05:18, Richard Biener wrote: >>> On Mon, 15 Aug 2022, Aldy Hernandez wrote: >>> >>>> On Mon, Aug 15, 2022 at 9:24 PM Andrew MacLeod <amacleod@redhat.com> wrote: >>>>> heh. or just >>>>> >>>>> >>>>> + int_range<2> r; >>>>> + if (!fold_range (r, const_cast <gcond *> (cond_stmt)) >>>>> + || !r.singleton_p (&val)) >>>>> >>>>> >>>>> if you do not provide a range_query to any of the fold_using_range code, >>>>> it defaults to: >>>>> >>>>> fur_source::fur_source (range_query *q) >>>>> { >>>>> if (q) >>>>> m_query = q; >>>>> else if (cfun) >>>>> m_query = get_range_query (cfun); >>>>> else >>>>> m_query = get_global_range_query (); >>>>> m_gori = NULL; >>>>> } >>>>> >>>> Sweet. Even better! >>> So when I do the following incremental change ontop of the posted >>> patch then I see that the path query is able to simplify more >>> "single BB paths" than the global range folding. >>> >>> diff --git a/gcc/tree-ssa-threadbackward.cc >>> b/gcc/tree-ssa-threadbackward.cc >>> index 669098e4ec3..777e778037f 100644 >>> --- a/gcc/tree-ssa-threadbackward.cc >>> +++ b/gcc/tree-ssa-threadbackward.cc >>> @@ -314,6 +314,12 @@ back_threader::find_taken_edge_cond (const >>> vec<basic_block> &path, >>> { >>> int_range_max r; >>> + int_range<2> rf; >>> + if (path.length () == 1) >>> + { >>> + fold_range (rf, cond); >>> + } >>> + >>> m_solver->compute_ranges (path, m_imports); >>> m_solver->range_of_stmt (r, cond); >>> @@ -325,6 +331,8 @@ back_threader::find_taken_edge_cond (const >>> vec<basic_block> &path, >>> >>> if (r == true_range || r == false_range) >>> { >>> + if (path.length () == 1) >>> + gcc_assert (r == rf); >>> edge e_true, e_false; >>> basic_block bb = gimple_bb (cond); >>> extract_true_false_edges_from_block (bb, &e_true, &e_false); >>> >>> Even doing the following (not sure what's the difference and in >>> particular expense over the path range query) results in missed >>> simplifications (checking my set of cc1files). >>> >>> diff --git a/gcc/tree-ssa-threadbackward.cc >>> b/gcc/tree-ssa-threadbackward.cc >>> index 669098e4ec3..1d43a179d08 100644 >>> --- a/gcc/tree-ssa-threadbackward.cc >>> +++ b/gcc/tree-ssa-threadbackward.cc >>> @@ -99,6 +99,7 @@ private: >>> >>> back_threader_registry m_registry; >>> back_threader_profitability m_profit; >>> + gimple_ranger *m_ranger; >>> path_range_query *m_solver; >>> >>> // Current path being analyzed. >>> @@ -146,12 +147,14 @@ back_threader::back_threader (function *fun, >>> unsigned flags, bool first) >>> // The path solver needs EDGE_DFS_BACK in resolving mode. 
>>> if (flags & BT_RESOLVE) >>> mark_dfs_back_edges (); >>> - m_solver = new path_range_query (flags & BT_RESOLVE); >>> + m_ranger = new gimple_ranger; >>> + m_solver = new path_range_query (flags & BT_RESOLVE, m_ranger); >>> } >>> >>> back_threader::~back_threader () >>> { >>> delete m_solver; >>> + delete m_ranger; >>> >>> loop_optimizer_finalize (); >>> } >>> @@ -314,6 +317,12 @@ back_threader::find_taken_edge_cond (const >>> vec<basic_block> &path, >>> { >>> int_range_max r; >>> + int_range<2> rf; >>> + if (path.length () == 1) >>> + { >>> + fold_range (rf, cond, m_ranger); >>> + } >>> + >>> m_solver->compute_ranges (path, m_imports); >>> m_solver->range_of_stmt (r, cond); >>> @@ -325,6 +334,8 @@ back_threader::find_taken_edge_cond (const >>> vec<basic_block> &path, >>> >>> if (r == true_range || r == false_range) >>> { >>> + if (path.length () == 1) >>> + gcc_assert (r == rf); >>> edge e_true, e_false; >>> basic_block bb = gimple_bb (cond); >>> extract_true_false_edges_from_block (bb, &e_true, &e_false); >>> >>> one example is >>> >>> <bb 176> [local count: 14414059]: >>> _128 = node_177(D)->typed.type; >>> pretmp_413 = MEM[(const union tree_node *)_128].base.code; >>> _431 = pretmp_413 + 65519; >>> if (_128 == 0B) >>> goto <bb 199>; [18.09%] >>> else >>> goto <bb 177>; [81.91%] >>> >>> where m_imports for the path is just _128 and the range computed is >>> false while the ranger query returns VARYING. But >>> path_range_query::range_defined_in_block does >>> >>> if (bb && POINTER_TYPE_P (TREE_TYPE (name))) >>> m_ranger->m_cache.m_exit.maybe_adjust_range (r, name, bb); >> This is the coarse grained "side effect applies somewhere in the block" >> mechanism. There is no understanding of where in the block it happens. >>> which adjusts the range to ~[0, 0], probably because of the >>> dereference in the following stmt. >>> >>> Why does fold_range not do this when folding the exit test? Is there >>> a way to make it do so? It looks like the only routine using this >>> in gimple-range.cc is range_on_edge and there it's used for e->src >>> after calling range_on_exit for e->src (why's it not done in >>> range_on_exit?). A testcase for this is >> Fold_range doesnt do this because it is simply another statement. It makes no >> attempt to understand the context in which you are folding something. you >> could be folding that stmt from a different location (ie recomputing) If >> your context is that you are looking for the range after the last statement >> has been executed, then one needs to check to see if there are any side >> effects. > Hmm, but I'm asking it to fold a specific statement - how can that ever > be folded "from a different context"? In fact it traces as > " at stmt " with the stmt I pass it. But yes, the issue is of course > that to compute the range of q != 0 it needs to compute the range > of 'q' but it simply does > > // If no name, simply call the base routine. > if (!name) > { > res = fold_range_internal (r, s, NULL_TREE); > if (res && is_a <gcond *> (s)) > { > // Update any exports in the cache if this is a gimple cond > statement. > tree exp; > basic_block bb = gimple_bb (s); > FOR_EACH_GORI_EXPORT_NAME (m_cache.m_gori, bb, exp) > m_cache.propagate_updated_value (exp, bb); > > and fur_depend seems to get the context so it could adjust ranges > in its get_operand which seems to just call range_of_expr with > the stmt as third arg. Unfortunately gimple_ranger::range_of_expr > is undocumented, but again 'stmt' is dumped as " at stmt " thus > seems to be the "context" here? 
Oh, so you are looking at gimple_ranger::range_of_stmt(), not fold_range. > // If name is defined in this block, try to get an range from S. > > what is 'S'? def_stmt or stmt? > > if (def_stmt && gimple_bb (def_stmt) == bb) > { > // Declared in this block, if it has a global set, check for an > // override from a block walk, otherwise calculate it. > if (m_cache.get_global_range (r, expr)) > m_cache.block_range (r, bb, expr, false); > else > range_of_stmt (r, def_stmt, expr); > > so, add > > if (is_ctrl_stmt (stmt)) > m_cache.m_exit.maybe_adjust_range (r, expr, bb); > > at the end (also covering the range_on_entry case). I suppose > range_on_entry does not evaluate all possible paths from definition > to BB, using maybe_adjust_range on blocks inbetween and unioning > ranges at path merges? That would likely be prohibitly expensive. in fact it does. The cache takes care of filling in the blanks and requesting ranges that have not been satisfied yet in a reasonably efficient way. Maybe_adjust_range() will deal only with inferred ranges that are marked as having occurred in this block. The problem I ran into with trying to doing it in range_on_exit(), is that if there is an abnornal edge out of the block, it is unsafe to assume that anything is true at block exit.. when processing the abnormal destination, we ask for range_on_exit of anything it uses from the src block. That said, I suppose is should be safe for range_of_stmt to assume the statement successfully executed (or there wouldnt be a defintion produced), and therefore if it is the last stmt in the block, we should in theory be able to do as you suggest there. Im trying to think if its safe to also do it in range_of_expr. if is_ctrl_stmt() is true, can I assume that the stmt cannot throw an exception under any circumstance? in which case, its proabbly safe to put that in range_of_expr. Ive attached a patch which does that if you want to try it. The caveat is that it is only a partial solution... it will only work for names on that stmt. if you have anything more complex, ie if (a == 0 || b == 0) we have a seqeunce feeding the ctrl stmt.. c_1 = a == 0 c_2 = b == 0 c_3 = c_1 && c_2 if (c_3 == 0) only the evaluation of c_3 will have the ctrl stmt as its context.. the others will be evaluted on their own statement, and thus neither a nor b would pick up anything from the block as they are evalauted and cached as they are seen. unless of course we are doing a walk :-P > >> ranger uses it for range_on_edge (), because it knows all the statements in >> the block have been executed, and its safe to apply anything seen in the >> block. It does it right after range_on_exit() is called internally. >> >> Once upon a time, it was integrated with range-on-exit, but it turned out >> there were certain times that it was causing problems. There have been some >> cleanups since then, it probably safe now to return that call to >> range_on_exit.. but doesnt buy us a whole lot by itself.. except of course I >> have now OKd using range_on-entry/exit generally :-) >> >> the cache also uses it when walking blocks to pick up inferred values during >> an on-entry cache fill. >> >> >>> int foo (int **p, int i) >>> { >>> int *q = *p; >>> int res = *q + i; >>> if (q) >>> return res; >>> return -1; >>> } >>> >>> which we "thread" with the path and with the above ICEs because >>> fold_range doesn't get that if (q) is always true. 
Without the >> Its a known limitation that, unless you are doing a walk, on-demand requests >> will "miss" some inferred ranges, because they are only maintained at the >> basic block level. (ie, we will know that q is non-null in BB2, but we don't >> know where, so we can make no assumptions at the exit condition about q in >> this case. the path_query is invoked in on-demand mode because it wont walk >> the entire IL,. so the first time you ask for the range of an ssa-name, it >> will quickly zip over the immediate use list and "fill" the on-exit structure >> for any blocks which a non-null reference is seen. This allows the threader >> to pick up non-null from blocks outside the path that havent been examined. >> >> VRP does a walk, and while during the walk, adjusts ranges on the fly for the >> current block via the gimple_ranger::register_inferred_ranges () routine. >> which is really just a wrapper around ranger_cache::apply_inferred_ranges () >> (in gimple-range-cache.cc) >> >> This is called after every statement and is where we take care of bookkeeping >> for adjusting values, and adding them to the blocks list. >> >> if the path query is walking those statement, it could also "adjust" the range >> of q on the fly... but it has to have walked those statements. from that >> routine, the relevant bits use the gimple-infer class to find any inferred >> ranges from the statement, and would look something like: >> >> gimple_infer_range infer(s); >> for (unsigned x = 0; x < infer.num (); x++) >> { >> tree name = infer.name (x); >> if (!interesting_p (name)) >> continue; >> get_current_path_range (r, name); >> if (r.intersect (infer.range (x))) >> set_current_path_range (name, r); >> } >> >> That would adjust the value of q to be non-zero after "int res = *q + i;" >> >> but you need to have walked the statement before you get to the condition. >> as long as they are all in your list of interesting statement to look at, then >> you'd be golden. > Ah, so while the above might work it would go against the spirit of this > being only fully useful if applied during a walk, adjusting the > cached ranges? It's still a bit unfortunate that one needs to walk all > stmts for this - the usecase wants to just walk the CFG and control stmts, > and with the path query it would only cover the single BB of each > control stmt (thus the above hack would probably work out). I wouldn't say its against the spirit, merely against trying to catch all cases in a practical way :-). If we add that, it should work for this case. for the threader, we will have visited all the uses of q to fill the inferred block cache q will have a non-null refererence in bb2. and when calculating the final value, if we are indeed assuming anything in the block can be applied to the final stmt, then it should be ok. but we'll still mis anything but the simpliest cases. maybe thats OK. better to catch a few cases than none. let me think about how, especially for paths, we might be able to take care of calculating an entire complex condition/expression in the context of the original call. So, a longer term goal was to perhaps have some sort of persistent ranger across compatible passes.. turn it into a pass property, and only dispose of it when a pass makes enough IL changes to invalidate it... In the meantime, it should be possible to take a ranger that just completed a VRP pass, and use that as the root ranger for a threading pass immediately after.. 
I think there may be some lingering issues with abnormal edges if we
"re-visit" blocks which we claim to have walked, due to the way I store
inferred ranges in those blocks (the expectation being we never go back up
into the block, so the on-entry cache works like the "current def" vector
in the original EVRP).  I'd have to think about that too.

>
> Meanwhile I'm leaning towards calling this a phase ordering issue
> of threading + VRP, but that also means we shouldn't deliberately
> try to preserve "threadings" of this kind - in fact we might want
> to explicitly reject them?

We are probably going to want to visit some pass ordering.

> Richard.
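A hedged sketch of the adjustment being discussed in this sub-thread (the
patch Andrew mentions attaching is not part of this archive); it merely
splices Richard's suggested lines into the spot described, with the caveat
Andrew gives, so placement and member names come from the quoted code
rather than from trunk.

  /* Inside gimple_ranger::range_of_expr (r, expr, stmt), after a range
     for EXPR has been computed for STMT's block BB:  */

  /* A control statement cannot throw, so if the context statement is the
     block's control statement, every prior statement in BB has executed
     and any inferred range recorded for BB (e.g. non-null from a
     dereference) may be applied.  Caveat from above: this only helps
     names used directly on the control statement itself.  */
  if (is_ctrl_stmt (stmt))
    m_cache.m_exit.maybe_adjust_range (r, expr, bb);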
On Wed, 17 Aug 2022, Andrew MacLeod wrote: > > On 8/17/22 03:42, Richard Biener wrote: > > On Tue, 16 Aug 2022, Andrew MacLeod wrote: > > > >> On 8/16/22 05:18, Richard Biener wrote: > >>> On Mon, 15 Aug 2022, Aldy Hernandez wrote: > >>> > >>>> On Mon, Aug 15, 2022 at 9:24 PM Andrew MacLeod <amacleod@redhat.com> > >>>> wrote: > >>>>> heh. or just > >>>>> > >>>>> > >>>>> + int_range<2> r; > >>>>> + if (!fold_range (r, const_cast <gcond *> (cond_stmt)) > >>>>> + || !r.singleton_p (&val)) > >>>>> > >>>>> > >>>>> if you do not provide a range_query to any of the fold_using_range code, > >>>>> it defaults to: > >>>>> > >>>>> fur_source::fur_source (range_query *q) > >>>>> { > >>>>> if (q) > >>>>> m_query = q; > >>>>> else if (cfun) > >>>>> m_query = get_range_query (cfun); > >>>>> else > >>>>> m_query = get_global_range_query (); > >>>>> m_gori = NULL; > >>>>> } > >>>>> > >>>> Sweet. Even better! > >>> So when I do the following incremental change ontop of the posted > >>> patch then I see that the path query is able to simplify more > >>> "single BB paths" than the global range folding. > >>> > >>> diff --git a/gcc/tree-ssa-threadbackward.cc > >>> b/gcc/tree-ssa-threadbackward.cc > >>> index 669098e4ec3..777e778037f 100644 > >>> --- a/gcc/tree-ssa-threadbackward.cc > >>> +++ b/gcc/tree-ssa-threadbackward.cc > >>> @@ -314,6 +314,12 @@ back_threader::find_taken_edge_cond (const > >>> vec<basic_block> &path, > >>> { > >>> int_range_max r; > >>> + int_range<2> rf; > >>> + if (path.length () == 1) > >>> + { > >>> + fold_range (rf, cond); > >>> + } > >>> + > >>> m_solver->compute_ranges (path, m_imports); > >>> m_solver->range_of_stmt (r, cond); > >>> @@ -325,6 +331,8 @@ back_threader::find_taken_edge_cond (const > >>> vec<basic_block> &path, > >>> > >>> if (r == true_range || r == false_range) > >>> { > >>> + if (path.length () == 1) > >>> + gcc_assert (r == rf); > >>> edge e_true, e_false; > >>> basic_block bb = gimple_bb (cond); > >>> extract_true_false_edges_from_block (bb, &e_true, &e_false); > >>> > >>> Even doing the following (not sure what's the difference and in > >>> particular expense over the path range query) results in missed > >>> simplifications (checking my set of cc1files). > >>> > >>> diff --git a/gcc/tree-ssa-threadbackward.cc > >>> b/gcc/tree-ssa-threadbackward.cc > >>> index 669098e4ec3..1d43a179d08 100644 > >>> --- a/gcc/tree-ssa-threadbackward.cc > >>> +++ b/gcc/tree-ssa-threadbackward.cc > >>> @@ -99,6 +99,7 @@ private: > >>> > >>> back_threader_registry m_registry; > >>> back_threader_profitability m_profit; > >>> + gimple_ranger *m_ranger; > >>> path_range_query *m_solver; > >>> > >>> // Current path being analyzed. > >>> @@ -146,12 +147,14 @@ back_threader::back_threader (function *fun, > >>> unsigned flags, bool first) > >>> // The path solver needs EDGE_DFS_BACK in resolving mode. 
> >>> if (flags & BT_RESOLVE) > >>> mark_dfs_back_edges (); > >>> - m_solver = new path_range_query (flags & BT_RESOLVE); > >>> + m_ranger = new gimple_ranger; > >>> + m_solver = new path_range_query (flags & BT_RESOLVE, m_ranger); > >>> } > >>> > >>> back_threader::~back_threader () > >>> { > >>> delete m_solver; > >>> + delete m_ranger; > >>> > >>> loop_optimizer_finalize (); > >>> } > >>> @@ -314,6 +317,12 @@ back_threader::find_taken_edge_cond (const > >>> vec<basic_block> &path, > >>> { > >>> int_range_max r; > >>> + int_range<2> rf; > >>> + if (path.length () == 1) > >>> + { > >>> + fold_range (rf, cond, m_ranger); > >>> + } > >>> + > >>> m_solver->compute_ranges (path, m_imports); > >>> m_solver->range_of_stmt (r, cond); > >>> @@ -325,6 +334,8 @@ back_threader::find_taken_edge_cond (const > >>> vec<basic_block> &path, > >>> > >>> if (r == true_range || r == false_range) > >>> { > >>> + if (path.length () == 1) > >>> + gcc_assert (r == rf); > >>> edge e_true, e_false; > >>> basic_block bb = gimple_bb (cond); > >>> extract_true_false_edges_from_block (bb, &e_true, &e_false); > >>> > >>> one example is > >>> > >>> <bb 176> [local count: 14414059]: > >>> _128 = node_177(D)->typed.type; > >>> pretmp_413 = MEM[(const union tree_node *)_128].base.code; > >>> _431 = pretmp_413 + 65519; > >>> if (_128 == 0B) > >>> goto <bb 199>; [18.09%] > >>> else > >>> goto <bb 177>; [81.91%] > >>> > >>> where m_imports for the path is just _128 and the range computed is > >>> false while the ranger query returns VARYING. But > >>> path_range_query::range_defined_in_block does > >>> > >>> if (bb && POINTER_TYPE_P (TREE_TYPE (name))) > >>> m_ranger->m_cache.m_exit.maybe_adjust_range (r, name, bb); > >> This is the coarse grained "side effect applies somewhere in the block" > >> mechanism. There is no understanding of where in the block it happens. > >>> which adjusts the range to ~[0, 0], probably because of the > >>> dereference in the following stmt. > >>> > >>> Why does fold_range not do this when folding the exit test? Is there > >>> a way to make it do so? It looks like the only routine using this > >>> in gimple-range.cc is range_on_edge and there it's used for e->src > >>> after calling range_on_exit for e->src (why's it not done in > >>> range_on_exit?). A testcase for this is > >> Fold_range doesnt do this because it is simply another statement. It makes > >> no > >> attempt to understand the context in which you are folding something. you > >> could be folding that stmt from a different location (ie recomputing) If > >> your context is that you are looking for the range after the last statement > >> has been executed, then one needs to check to see if there are any side > >> effects. > > Hmm, but I'm asking it to fold a specific statement - how can that ever > > be folded "from a different context"? In fact it traces as > > " at stmt " with the stmt I pass it. But yes, the issue is of course > > that to compute the range of q != 0 it needs to compute the range > > of 'q' but it simply does > > > > // If no name, simply call the base routine. > > if (!name) > > { > > res = fold_range_internal (r, s, NULL_TREE); > > if (res && is_a <gcond *> (s)) > > { > > // Update any exports in the cache if this is a gimple cond > > statement. 
> >         tree exp;
> >         basic_block bb = gimple_bb (s);
> >         FOR_EACH_GORI_EXPORT_NAME (m_cache.m_gori, bb, exp)
> >           m_cache.propagate_updated_value (exp, bb);
> >
> > and fur_depend seems to get the context so it could adjust ranges
> > in its get_operand which seems to just call range_of_expr with
> > the stmt as third arg.  Unfortunately gimple_ranger::range_of_expr
> > is undocumented, but again 'stmt' is dumped as " at stmt " thus
> > seems to be the "context" here?
> Oh, so you are looking at gimple_ranger::range_of_stmt(), not fold_range.
> >   // If name is defined in this block, try to get an range from S.
> >
> > what is 'S'?  def_stmt or stmt?
> >
> >   if (def_stmt && gimple_bb (def_stmt) == bb)
> >     {
> >       // Declared in this block, if it has a global set, check for an
> >       // override from a block walk, otherwise calculate it.
> >       if (m_cache.get_global_range (r, expr))
> >         m_cache.block_range (r, bb, expr, false);
> >       else
> >         range_of_stmt (r, def_stmt, expr);
> >
> > so, add
> >
> >   if (is_ctrl_stmt (stmt))
> >     m_cache.m_exit.maybe_adjust_range (r, expr, bb);
> >
> > at the end (also covering the range_on_entry case).  I suppose
> > range_on_entry does not evaluate all possible paths from definition
> > to BB, using maybe_adjust_range on blocks in between and unioning
> > ranges at path merges?  That would likely be prohibitively expensive.
>
> In fact it does.  The cache takes care of filling in the blanks and requesting
> ranges that have not been satisfied yet in a reasonably efficient way.
> maybe_adjust_range () will deal only with inferred ranges that are marked as
> having occurred in this block.
>
> The problem I ran into with trying to do it in range_on_exit () is that if
> there is an abnormal edge out of the block, it is unsafe to assume that
> anything is true at block exit: when processing the abnormal destination,
> we ask for range_on_exit of anything it uses from the src block.
>
> That said, I suppose it should be safe for range_of_stmt to assume the
> statement successfully executed (or there wouldn't be a definition produced),
> and therefore if it is the last stmt in the block, we should in theory be able
> to do as you suggest there.
>
> I'm trying to think whether it's safe to also do it in range_of_expr.  If
> is_ctrl_stmt () is true, can I assume that the stmt cannot throw an exception
> under any circumstance?  In which case, it's probably safe to put that in
> range_of_expr.  I've attached a patch which does that if you want to try it.

Control stmts cannot throw.

> The caveat is that it is only a partial solution... it will only work for
> names on that stmt.  If you have anything more complex, ie
>
> if (a == 0 || b == 0), we have a sequence feeding the ctrl stmt:
>
>   c_1 = a == 0
>   c_2 = b == 0
>   c_3 = c_1 && c_2
>   if (c_3 == 0)
>
> only the evaluation of c_3 will have the ctrl stmt as its context.  The others
> will be evaluated on their own statement, and thus neither a nor b would pick
> up anything from the block, as they are evaluated and cached as they are
> seen.  Unless of course we are doing a walk :-P

Hmm, but as I traced it, when I do range_of_expr the context stmt I provide
will be passed down, and even when processing dependences that context
will stick?  But maybe it doesn't, because it would mean explosion of the
cache?

But yeah, with the above restriction it would be even more useless.
Same issue as with

  *p = 0;
  if (..)
  /    \
 ..     \
     if (p)

here the local adjustment of 'p' in if (p) would not pick up the
p != 0 guarantee from the immediate dominator.
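As a concrete illustration of that caveat, here is a minimal C testcase in the
spirit of the earlier foo example (foo2 and its arguments are hypothetical, not
a testcase from this thread): both pointers are dereferenced in the block, so a
non-null range is inferred for each, but the combined test is carried by an
intermediate boolean, and only that final name is evaluated with the control
statement as its context.

  /* Hypothetical variation of the earlier testcase.  The dereferences
     infer a != 0 and b != 0 somewhere in this block, but the condition
     may be gimplified into something like
       c_1 = a == 0;  c_2 = b == 0;  c_3 = c_1 | c_2;  if (c_3 != 0)
     so only c_3 is folded in the context of the control statement.  */
  int foo2 (int *a, int *b, int i)
  {
    int res = *a + *b + i;
    if (a == 0 || b == 0)
      return -1;
    return res;
  }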
> > > > >> ranger uses it for range_on_edge (), because it knows all the statements > >> in > >> the block have been executed, and its safe to apply anything seen in the > >> block. It does it right after range_on_exit() is called internally. > >> > >> Once upon a time, it was integrated with range-on-exit, but it turned out > >> there were certain times that it was causing problems. There have been some > >> cleanups since then, it probably safe now to return that call to > >> range_on_exit.. but doesnt buy us a whole lot by itself.. except of course > >> I > >> have now OKd using range_on-entry/exit generally :-) > >> > >> the cache also uses it when walking blocks to pick up inferred values > >> during > >> an on-entry cache fill. > >> > >> > >>> int foo (int **p, int i) > >>> { > >>> int *q = *p; > >>> int res = *q + i; > >>> if (q) > >>> return res; > >>> return -1; > >>> } > >>> > >>> which we "thread" with the path and with the above ICEs because > >>> fold_range doesn't get that if (q) is always true. Without the > >> Its a known limitation that, unless you are doing a walk, on-demand > >> requests > >> will "miss" some inferred ranges, because they are only maintained at the > >> basic block level. (ie, we will know that q is non-null in BB2, but we > >> don't > >> know where, so we can make no assumptions at the exit condition about q in > >> this case. the path_query is invoked in on-demand mode because it wont walk > >> the entire IL,. so the first time you ask for the range of an ssa-name, it > >> will quickly zip over the immediate use list and "fill" the on-exit > >> structure > >> for any blocks which a non-null reference is seen. This allows the > >> threader > >> to pick up non-null from blocks outside the path that havent been examined. > >> > >> VRP does a walk, and while during the walk, adjusts ranges on the fly for > >> the > >> current block via the gimple_ranger::register_inferred_ranges () routine. > >> which is really just a wrapper around ranger_cache::apply_inferred_ranges > >> () > >> (in gimple-range-cache.cc) > >> > >> This is called after every statement and is where we take care of > >> bookkeeping > >> for adjusting values, and adding them to the blocks list. > >> > >> if the path query is walking those statement, it could also "adjust" the > >> range > >> of q on the fly... but it has to have walked those statements. from that > >> routine, the relevant bits use the gimple-infer class to find any inferred > >> ranges from the statement, and would look something like: > >> > >> gimple_infer_range infer(s); > >> for (unsigned x = 0; x < infer.num (); x++) > >> { > >> tree name = infer.name (x); > >> if (!interesting_p (name)) > >> continue; > >> get_current_path_range (r, name); > >> if (r.intersect (infer.range (x))) > >> set_current_path_range (name, r); > >> } > >> > >> That would adjust the value of q to be non-zero after "int res = *q + i;" > >> > >> but you need to have walked the statement before you get to the condition. > >> as long as they are all in your list of interesting statement to look at, > >> then > >> you'd be golden. > > Ah, so while the above might work it would go against the spirit of this > > being only fully useful if applied during a walk, adjusting the > > cached ranges? 
> > It's still a bit unfortunate that one needs to walk all
> > stmts for this - the usecase wants to just walk the CFG and control stmts,
> > and with the path query it would only cover the single BB of each
> > control stmt (thus the above hack would probably work out).
>
> I wouldn't say it's against the spirit, merely against trying to catch all
> cases in a practical way :-).  If we add that, it should work for this case.
> For the threader, we will have visited all the uses of q to fill the inferred
> block cache, so q will have a non-null reference in bb2.  And when calculating
> the final value, if we are indeed assuming anything in the block can be
> applied to the final stmt, then it should be OK.  But we'll still miss anything
> but the simplest cases.  Maybe that's OK; better to catch a few cases than
> none.
>
> Let me think about how, especially for paths, we might be able to take care of
> calculating an entire complex condition/expression in the context of the
> original call.
>
> So, a longer term goal was to perhaps have some sort of persistent ranger
> across compatible passes: turn it into a pass property, and only dispose of
> it when a pass makes enough IL changes to invalidate it...

We've tried to do that with the SCEV cache and it proved to be a hassle with
lots of issues ... (we're still there, but I think almost all passes now clear
the SCEV cache), so I wouldn't go there.

> In the meantime,
> it should be possible to take a ranger that just completed a VRP pass and use
> that as the root ranger for a threading pass immediately after.  I think there
> may be some lingering issues with abnormal edges if we "re-visit" blocks which
> we claim to have walked, due to the way I store inferred ranges in those
> blocks (the expectation being we never go back up into the block, so the
> on-entry cache works like the "current def" vector in the original EVRP).  I'd
> have to think about that too.

Of course that would essentially do a VRP pass before each threading, which I
think is a bit expensive.  Also, looking at how ranger works with all its
abstraction, having the "old" EVRP style body walk rather than abusing the
on-demand ranger for such a walk would be a lot more efficient for this
purpose :/

> > Meanwhile I'm leaning towards calling this a phase ordering issue
> > of threading + VRP, but that also means we shouldn't deliberately
> > try to preserve "threadings" of this kind - in fact we might want
> > to explicitly reject them?
> We are probably going to want to revisit some pass ordering.

Sure, though there's never going to be a good enough pass ordering :/

Richard.
On 8/18/22 03:08, Richard Biener wrote:
>
>> The caveat is that it is only a partial solution... it will only work for
>> names on that stmt.  If you have anything more complex, ie
>>
>> if (a == 0 || b == 0), we have a sequence feeding the ctrl stmt:
>>
>>   c_1 = a == 0
>>   c_2 = b == 0
>>   c_3 = c_1 && c_2
>>   if (c_3 == 0)
>>
>> only the evaluation of c_3 will have the ctrl stmt as its context.  The others
>> will be evaluated on their own statement, and thus neither a nor b would pick
>> up anything from the block, as they are evaluated and cached as they are
>> seen.  Unless of course we are doing a walk :-P
> Hmm, but as I traced it, when I do range_of_expr the context stmt I provide
> will be passed down, and even when processing dependences that context
> will stick?  But maybe it doesn't, because it would mean explosion of the
> cache?
>
> But yeah, with the above restriction it would be even more useless.
> Same issue as with
>
>   *p = 0;
>   if (..)
>   /    \
>  ..     \
>      if (p)
>
> here the local adjustment of 'p' in if (p) would not pick up the
> p != 0 guarantee from the immediate dominator.

It certainly should.  The earlier BB will have the ~[0, 0] property in the
on-exit structure, so when the range of 'p' is evaluated on the edge to the
next block, it will be adjusted.  The value for on-entry of p to that block
will therefore be ~[0, 0].  Ranger does this, and the path query code is
*supposed* to; late discussions with Aldy yesterday left me unclear whether it
always does.  It should - that was the entire point of leaving the on-demand
filling of the structure via immediate uses.

>> In the meantime,
>> it should be possible to take a ranger that just completed a VRP pass, and use
>> that as the root ranger for a threading pass immediately after.  I think there
>> may be some lingering issues with abnormal edges if we "re-visit" blocks which
>> we claim to have walked, due to the way I store inferred ranges in those
>> blocks (the expectation being we never go back up into the block, so the
>> on-entry cache works like the "current def" vector in the original EVRP).  I'd
>> have to think about that too.
> Of course that would essentially do a VRP pass before each threading
> which I think is a bit expensive.  Also looking at how ranger works
> with all its abstraction having the "old" EVRP style body walk rather
> than abusing the on-demand ranger for such a walk would be a lot more
> efficient for this purpose :/

No, I just meant that when we do the VRP pass, rather than throw away the fully
primed ranger and its values, one could invoke the threader using it...  But
I'm not sure how much extra we'd get anyway.

>>> Meanwhile I'm leaning towards calling this a phase ordering issue
>>> of threading + VRP, but that also means we shouldn't deliberately
>>> try to preserve "threadings" of this kind - in fact we might want
>>> to explicitly reject them?
>> We are probably going to want to revisit some pass ordering.
> Sure, though there's never going to be a good enough pass ordering :/
>
> Richard.
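For reference, a source-level version of the dominator situation sketched
above could look like the following (a hypothetical sketch; bar and flag are
made-up names).  The store through p infers p != 0 in its block, and the
question is whether the later if (p), which is only dominated by that block,
picks up the inferred range on entry.

  /* Hypothetical sketch of the CFG above.  The store infers p != 0 in
     the first block; the test of flag splits control flow; the final
     if (p) is dominated by the block containing the inference, so its
     on-entry range for p should already be ~[0, 0] once the inferred
     range is propagated along the intervening edges.  */
  int bar (int *p, int flag)
  {
    *p = 0;      /* infers p != 0 in this block */
    if (flag)
      *p = 1;
    if (p)       /* should evaluate to true given the dominating inference */
      return *p;
    return -1;
  }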
diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc index 78146f5683e..a7d277c31b8 100644 --- a/gcc/gimple-range-path.cc +++ b/gcc/gimple-range-path.cc @@ -220,7 +220,7 @@ path_range_query::unreachable_path_p () void path_range_query::set_path (const vec<basic_block> &path) { - gcc_checking_assert (path.length () > 1); + gcc_checking_assert (!path.is_empty ()); m_path = path.copy (); m_pos = m_path.length () - 1; bitmap_clear (m_has_cache_entry); diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc index b886027fccf..669098e4ec3 100644 --- a/gcc/tree-ssa-threadbackward.cc +++ b/gcc/tree-ssa-threadbackward.cc @@ -241,8 +241,9 @@ back_threader::maybe_register_path () else { bool irreducible = false; - if (m_profit.profitable_path_p (m_path, m_name, taken_edge, - &irreducible) + if ((m_path.length () == 1 + || m_profit.profitable_path_p (m_path, m_name, taken_edge, + &irreducible)) && debug_counter () && m_registry.register_path (m_path, taken_edge)) { @@ -267,7 +268,6 @@ back_threader::maybe_register_path () edge back_threader::find_taken_edge (const vec<basic_block> &path) { - gcc_checking_assert (path.length () > 1); switch (gimple_code (m_last_stmt)) { case GIMPLE_COND: @@ -350,9 +350,15 @@ back_threader::find_paths_to_names (basic_block bb, bitmap interesting, m_path.safe_push (bb); // Try to resolve the path without looking back. - if (m_path.length () > 1 - && (!m_profit.profitable_path_p (m_path, m_name, NULL) - || maybe_register_path ())) + if ((m_path.length () > 1 + && !m_profit.profitable_path_p (m_path, m_name, NULL)) + || maybe_register_path ()) + ; + + // The backwards thread copier cannot copy blocks that do not belong + // to the same loop, so when the new source of the path entry no + // longer belongs to it we don't need to search further. + else if (m_path[0]->loop_father != bb->loop_father) ; // Continue looking for ways to extend the path but limit the @@ -445,7 +451,8 @@ back_threader::find_paths_to_names (basic_block bb, bitmap interesting, edge e; FOR_EACH_EDGE (e, iter, bb->preds) { - if (e->flags & EDGE_ABNORMAL + if ((e->flags & EDGE_ABNORMAL) + || e->src->index == ENTRY_BLOCK // This is like path_crosses_loops in profitable_path_p but // more restrictive to avoid peeling off loop iterations (see // tree-ssa/pr14341.c for an example). diff --git a/gcc/tree-ssa-threadupdate.cc b/gcc/tree-ssa-threadupdate.cc index 59c268a3567..d40fa7c4cff 100644 --- a/gcc/tree-ssa-threadupdate.cc +++ b/gcc/tree-ssa-threadupdate.cc @@ -2613,6 +2613,60 @@ back_jt_path_registry::update_cfg (bool /*peel_loop_headers*/) bool retval = false; hash_set<edge> visited_starting_edges; + /* Mark never taken edges from paths that are just jump simplifications. */ + auto_edge_flag never_taken (cfun); + for (auto path : m_paths) + if (path->length () == 1) + { + edge_iterator ei; + edge e; + FOR_EACH_EDGE (e, ei, (*path)[0]->e->src->succs) + if (e != (*path)[0]->e) + e->flags |= never_taken; + } + + /* And prune paths that contain such edge before we remove them. */ + for (unsigned i = 0; i < m_paths.length ();) + { + bool remove = false; + for (auto je : *m_paths[i]) + { + if (je->e->flags & never_taken) + { + cancel_thread (m_paths[i], + "Avoiding threading through unreachable edge"); + remove = true; + break; + } + } + if (!remove) + ++i; + else + m_paths.unordered_remove (i); + } + + /* Finally perform those threads first, this way we avoid copying the + dead outgoing edges when other theads contain the prevailing edge. 
*/ + for (unsigned i = 0; i < m_paths.length ();) + { + vec<jump_thread_edge *> *path = m_paths[i]; + if (path->length () != 1) + { + ++i; + continue; + } + edge exit = (*path)[0]->e; + remove_ctrl_stmt_and_useless_edges (exit->src, exit->dest); + exit->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE | EDGE_ABNORMAL); + exit->flags |= EDGE_FALLTHRU; + /* We do not update dominance info. */ + free_dominance_info (CDI_DOMINATORS); + retval = true; + m_num_threaded_edges++; + path->release (); + m_paths.unordered_remove (i); + } + while (m_paths.length ()) { vec<jump_thread_edge *> *path = m_paths[0];