[RFH] Wire ranger into FRE

Message ID 20220919111916.192EE13A96@imap2.suse-dmz.suse.de
State Accepted, archived
Headers
Series [RFH] Wire ranger into FRE |

Checks

Context Check Description
snail/gcc-patches-check success Github commit url

Commit Message

Richard Biener Sept. 19, 2022, 11:19 a.m. UTC
  The following is an attempt to utilize ranger for simplification
of conditionals during value-numbering.  It fails the added
testcase because it seems the fur source is only involved when
processing the stmt to be folded but not when filling the cache
of ranges on the operand use->def chain.

When not iterating that's not a problem since earlier stmts
have their operands eliminated, but when iterating or when
elimination is disabled the IL remains unchanged.

When iterating it will probably do things wrong due to the lack
of pruning of the cache for the region we unwind, and when doing
region-based VN it likely picks up stale EDGE_EXECUTABLE when
leaving the region over the entry, because there's no way to stop
"local" cache prefilling when leaving the region (and just using
global ranges from there).

Knowing path-range-query it's probably possible to clone all
the block walking and cache filling, making a special variant
for value-numbering but I wonder whether that's really what's
intended?  I would probably need to implement a special
range_of_expr and range_of_stmt for this and keep my own cache?
Basically a region_range_query with wired in valueization?

	* tree-ssa-sccvn.cc (...): ...
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c | 15 ++++
 gcc/tree-ssa-sccvn.cc                       | 77 ++++++++++++++++++---
 gcc/tree-ssa-sccvn.h                        |  3 +-
 3 files changed, 83 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c
  

Comments

Andrew MacLeod Sept. 19, 2022, 3:25 p.m. UTC | #1
Yeah, currently the internal cache isn't really wired into
fold_using_range since it is supposed to be consistent internal state,
so it's not currently expecting to work in a situation where what it
thinks is global state might change.

I figured I had better go back and watch your entire VN presentation; I 
only caught the last bit live.

Let me try to understand exactly what you want/need, and let me use a 
simple example from that talk (yes, I just typed it in :-).

   n_2 = 1
   i_4 = 0
   val_5 = 0
<bb 3>:
   # i_1 = PHI <i_4(2), i_7(3)>
   #val_2 = PHI <val_5(2), val_6(3) >
   val_6 = val_2 + 1;
   i_7 = i_1 + 1
   if (i_7 < n_2)
     goto <bb 3>;
   else
     goto <bb 4>;
<bb 4>
   _8 = val_6
   return _8


In an ideal world, while you are processing bb 3 on the first iteration, 
you want the back edge into bb 3 to be unexecutable as far as ranger is 
concerned.  You walk through the block and get to the end, and in this 
example you'd be done, because any queries you make of ranger would see 
if (2 < 1).

For the case when we do need to iterate (say n_2 = 3), you need to go 
back and re-walk that block, with that edge no longer marked as 
unexecutable?  And part of the problem would be that ranger's cache for 
bb 3 would have all those values from the first pass, and everything 
explodes.

It looks like you created a fur_source to manually adjust PHIs within 
the fold_stmt query to ignore edges that are not marked executable.

So now let me start with my questions.

In theory, if you could tell it "kill the cache for bb3", change the 
executable state, and re-walk it, that would work?  This clearly gets a 
little more complicated as the size of the region that is being iterated 
grows, so you would want to be able to invalidate the cache for a range 
of blocks.

That's not quite as straightforward as it might seem, because the cache 
of values per basic block is indexed by ssa-name, so you can't simply 
say "kill bb3's values"; you need to walk all the ssa-names that have a 
cache object, and if they have an entry for bb3, kill that.  But let's 
leave that alone for a minute, because that is a solvable problem.
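To make the indexing problem concrete, here is a toy model of a cache laid out that way, in plain C++ with illustrative names (a string stands in for an ssa-name, a pair of longs for a range; none of this is ranger's actual data structure). Because the block index is the inner key, invalidating one block means visiting every name's cache object:

```cpp
#include <map>
#include <string>

// Illustrative stand-in for a range.
struct range { long lo, hi; };

// Cache indexed by ssa-name first, then by basic-block index --
// which is why "kill everything for bb3" cannot be a single erase;
// every name that has a cache object must be visited.
struct ssa_cache {
  std::map<std::string, std::map<int, range>> per_name;

  void set (const std::string &name, int bb, range r)
  { per_name[name][bb] = r; }

  // Invalidate all cached values in block BB.
  void kill_block (int bb)
  {
    for (auto &entry : per_name)
      entry.second.erase (bb);
  }
};
```

A per-block outer index would make this cheap, but the lookup pattern (query a name's range in a block) favors the per-name layout.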

Ranger also has a globally available flag that it uses to internally 
flag and track the executable state of edges:

   auto_edge_flag non_executable_edge_flag;

The gori object, which is instantiated with any ranger instance, only 
uses that to determine whether an edge should be processed for a value, 
i.e.

gori_compute::outgoing_edge_range_p (vrange &r, edge e, tree name,
				     range_query &q)
{
     if ((e->flags & m_not_executable_flag))
     {
       r.set_undefined ();
       return true;
So it seems to me that if you set/cleared this flag along with 
EDGE_EXECUTABLE (and it means the opposite, by the way: TRUE means the 
edge is not_executable), then at least all your queries through ranger 
would be picking up the values you are looking for, including those 
PHIs.

That would then just leave you with the stale cache state to deal 
with?   And if we can resolve that, it would all just work, at least in 
theory?

There is a concept in the cache of staleness, which is used when we 
recalculate over back edges, but it's really oriented towards a model 
where we only ever move edges from executable to non-executable.  Your 
VN approach moves them the other way, and I have no confidence it would 
work properly in that mode.

It's not super efficient, but as a starting point we might be able to do 
something like:
   1 - when you enter a region, call a new routine called "mark state", 
which would push an empty bitmap over ssa-names onto a stack
   2 - every time the cache creates an entry, set the bit for the 
ssa-name in the bitmap at the top of the stack
   3 - when you call "reset state", simply kill all the cache entries for 
any ssa-name set in the bitmap, and also kill ranger's internal global 
value for the name.  This would in essence reset everything it knows 
about that ssa-name.
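The three steps above could be sketched like this (a toy model in plain C++; ssa-names and values are plain ints, the bitmap is modeled as a set, and the cache and ranger's internal global value are collapsed into one map for brevity; none of these are ranger's actual types):

```cpp
#include <map>
#include <set>
#include <vector>

struct region_cache {
  // name-id -> cached value (cache entry + global value, simplified).
  std::map<int, int> values;
  // Stack of "bitmaps" of names touched since each mark ().
  std::vector<std::set<int>> marks;

  void mark () { marks.emplace_back (); }   // step 1: enter region

  void set_value (int name, int v)
  {
    if (!marks.empty ())
      marks.back ().insert (name);          // step 2: record the touch
    values[name] = v;
  }

  void reset ()                             // step 3: leave region
  {
    for (int name : marks.back ())
      values.erase (name);                  // kill everything known
    marks.pop_back ();
  }
};
```

Since the stack nests, a region entered inside another region only discards its own names on reset, which matches the push/pop-of-regions idea discussed further down.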

That would have the drawback of losing any information that had been 
calculated earlier, and possibly triggering some additional calculations 
as that value will now be stale elsewhere in the IL.   We could make it 
more efficient if you also provided a bitmap over the basic blocks in 
the region (or some way of determining that).  Then we would just kill 
the entries in those blocks for those ssa-names.

As long as you weren't making any ranger queries from outside the region 
during this time, that might work.

Before I go any further, are we on the same page, or are you looking for 
something different than this?

You had mentioned the path_ranger model also, which effectively uses 
ranger for the values on entry to the path/region, and then uses 
overridden API entry points to choose whether to use the local cache 
vector or go to ranger's on-entry values.

  This approach could also work, and one way would be to implement the 
mark/reset API.  The new class would maybe instantiate an additional 
local cache from which values are set/picked up first for any queries 
within the region blocks, and when you are done, throw away the cache 
for the region.  This would require some tweaking of ranger's primary 
cache lookup/set routine, but perhaps there is some benefit to 
generalizing this model and integrating it, i.e. a region_ranger which 
builds this facility on top and allows you to reset regions.   
Presumably you want to be able to "stack" these marks so you can push 
and pop regions?  We would also have to figure out how to make the 
relational oracle work with it.
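The region_ranger idea, i.e. an extra region-local cache consulted before the underlying ranger's on-entry values, could look roughly like this (illustrative types only, not the real range_query API; a value is an int and an unknown is an empty optional):

```cpp
#include <map>
#include <optional>

// Stand-in for the underlying ranger: values known on entry to the region.
struct global_query {
  std::map<int, int> on_entry;

  std::optional<int> lookup (int name) const
  {
    auto it = on_entry.find (name);
    if (it == on_entry.end ()) return std::nullopt;
    return it->second;
  }
};

// Region overlay: queries hit the local cache first and fall back to
// the global query; discarding the local cache resets the region.
struct region_query {
  global_query &global;
  std::map<int, int> local;

  std::optional<int> lookup (int name) const
  {
    auto it = local.find (name);
    if (it != local.end ()) return it->second;
    return global.lookup (name);
  }

  void discard_region () { local.clear (); }
};
```

The design choice here is that the overlay never writes through to the global state, so throwing the region away is O(1) in correctness terms: nothing outside the region ever saw the local values.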

I can think of a couple of possible approaches for that, some more 
efficient than others.    I suppose the path_ranger could even be 
completely replaced with that model.  Huh, it would just call mark() at 
the start of a path, and reset() when it's done with the path.  It might 
not be as efficient, however, but it would be a good proof-of-correctness 
test, I guess, at least.

Anyway, am I in the ballpark somewhere of what you are looking for?  Or 
am I over-engineering something? :-)

Andrew



On 9/19/22 07:19, Richard Biener wrote:
> The following is an attempt to utilize ranger for simplification
> of conditionals during value-numbering.  It fails the added
> testcase because it seems the fur source is only involved when
> processing the stmt to be folded but not when filling the cache
> of ranges on the operand use->def chain.
>
> When not iterating that's not a problem since earlier stmts
> have their operands eliminated, but when iterating or when
> elimination is disabled the IL remains unchanged.
>
> When iterating it will probably do things wrong due to the lack
> of pruning of the cache for the region we unwind, and when doing
> region-based VN it likely picks up stale EDGE_EXECUTABLE when
> leaving the region over the entry, because there's no way to stop
> "local" cache prefilling when leaving the region (and just using
> global ranges from there).
>
> Knowing path-range-query it's probably possible to clone all
> the block walking and cache filling, making a special variant
> for value-numbering but I wonder whether that's really what's
> intended?  I would probably need to implement a special
> range_of_expr and range_of_stmt for this and keep my own cache?
> Basically a region_range_query with wired in valueization?
>
> 	* tree-ssa-sccvn.cc (...): ...
> ---
>   gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c | 15 ++++
>   gcc/tree-ssa-sccvn.cc                       | 77 ++++++++++++++++++---
>   gcc/tree-ssa-sccvn.h                        |  3 +-
>   3 files changed, 83 insertions(+), 12 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c
> new file mode 100644
> index 00000000000..f479a322b70
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-fre1" } */
> +
> +int foo (int *p, int *q)
> +{
> +  if (*p > 0 && *q > 0)
> +    {
> +      int v = *p + *q;
> +      if (v > 0)
> +        return *q;
> +    }
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "known outgoing edge" 1 "fre1" } } */
> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> index 74b8d8d18ef..2753200c2c2 100644
> --- a/gcc/tree-ssa-sccvn.cc
> +++ b/gcc/tree-ssa-sccvn.cc
> @@ -73,6 +73,7 @@ along with GCC; see the file COPYING3.  If not see
>   #include "fold-const-call.h"
>   #include "ipa-modref-tree.h"
>   #include "ipa-modref.h"
> +#include "gimple-range.h"
>   #include "tree-ssa-sccvn.h"
>   
>   /* This algorithm is based on the SCC algorithm presented by Keith
> @@ -360,10 +361,11 @@ static vn_ssa_aux_t last_pushed_avail;
>      correct.  */
>   static vn_tables_t valid_info;
>   
> +tree (*vn_valueize) (tree);
>   
>   /* Valueization hook for simplify_replace_tree.  Valueize NAME if it is
>      an SSA name, otherwise just return it.  */
> -tree (*vn_valueize) (tree);
> +
>   static tree
>   vn_valueize_for_srt (tree t, void* context ATTRIBUTE_UNUSED)
>   {
> @@ -380,6 +382,36 @@ vn_valueize_for_srt (tree t, void* context ATTRIBUTE_UNUSED)
>     return res;
>   }
>   
> +class vn_fur : public fur_depend
> +{
> +public:
> +  vn_fur (gimple *s, gori_compute *g, range_query *q)
> +    : fur_depend (s, g, q) {}
> +  virtual bool get_operand (vrange &, tree);
> +  virtual bool get_phi_operand (vrange &, tree, edge);
> +};
> +
> +bool
> +vn_fur::get_operand (vrange &r, tree op)
> +{
> +  return fur_depend::get_operand (r, vn_valueize (op));
> +}
> +
> +bool
> +vn_fur::get_phi_operand (vrange &r, tree op, edge e)
> +{
> +  // ???  Check region boundary, avoid walking outside of the region
> +  // somehow?  Possibly when initializing the ranger object?
> +  // or can/should we simply return 'false' when we arrive with
> +  // e being the entry edge?  What about for non-PHIs?  Can we
> +  // somehow force to only use the global range and stop caching
> +  // after that?
> +  if (e->flags & EDGE_EXECUTABLE)
> +    return fur_depend::get_phi_operand (r, vn_valueize (op), e);
> +  r.set_undefined ();
> +  return true;
> +}
> +
>   
>   /* This represents the top of the VN lattice, which is the universal
>      value.  */
> @@ -7292,12 +7324,12 @@ eliminate_with_rpo_vn (bitmap inserted_exprs)
>   
>   static unsigned
>   do_rpo_vn_1 (function *fn, edge entry, bitmap exit_bbs,
> -	     bool iterate, bool eliminate, vn_lookup_kind kind);
> +	     bool iterate, bool eliminate, vn_lookup_kind kind, gimple_ranger *);
>   
>   void
>   run_rpo_vn (vn_lookup_kind kind)
>   {
> -  do_rpo_vn_1 (cfun, NULL, NULL, true, false, kind);
> +  do_rpo_vn_1 (cfun, NULL, NULL, true, false, kind, NULL);
>   
>     /* ???  Prune requirement of these.  */
>     constant_to_value_id = new hash_table<vn_constant_hasher> (23);
> @@ -7580,7 +7612,8 @@ insert_related_predicates_on_edge (enum tree_code code, tree *ops, edge pred_e)
>   static unsigned
>   process_bb (rpo_elim &avail, basic_block bb,
>   	    bool bb_visited, bool iterate_phis, bool iterate, bool eliminate,
> -	    bool do_region, bitmap exit_bbs, bool skip_phis)
> +	    bool do_region, bitmap exit_bbs, bool skip_phis,
> +	    gimple_ranger *ranger)
>   {
>     unsigned todo = 0;
>     edge_iterator ei;
> @@ -7766,6 +7799,20 @@ process_bb (rpo_elim &avail, basic_block bb,
>   		      }
>   		  }
>   	      }
> +	    if (!val && ranger)
> +	      {
> +		int_range_max r;
> +		fold_using_range f;
> +		// ???  We get here with not valueized operands but
> +		// the fur_source only is involved for operands of
> +		// this very stmt, not when ranger prefills its cache
> +		// walking use->def edges.
> +		vn_fur src (last, &ranger->gori (), ranger);
> +		tree tem;
> +		if (f.fold_stmt (r, last, src)
> +		    && r.singleton_p (&tem))
> +		  val = tem;
> +	      }
>   	    if (val)
>   	      e = find_taken_edge (bb, val);
>   	    if (! e)
> @@ -7997,7 +8044,8 @@ do_unwind (unwind_state *to, rpo_elim &avail)
>   
>   static unsigned
>   do_rpo_vn_1 (function *fn, edge entry, bitmap exit_bbs,
> -	     bool iterate, bool eliminate, vn_lookup_kind kind)
> +	     bool iterate, bool eliminate, vn_lookup_kind kind,
> +	     gimple_ranger *ranger)
>   {
>     unsigned todo = 0;
>     default_vn_walk_kind = kind;
> @@ -8213,7 +8261,8 @@ do_rpo_vn_1 (function *fn, edge entry, bitmap exit_bbs,
>   	todo |= process_bb (avail, bb,
>   			    rpo_state[idx].visited != 0,
>   			    rpo_state[idx].iterate,
> -			    iterate, eliminate, do_region, exit_bbs, false);
> +			    iterate, eliminate, do_region, exit_bbs, false,
> +			    ranger);
>   	rpo_state[idx].visited++;
>   
>   	/* Verify if changed values flow over executable outgoing backedges
> @@ -8326,7 +8375,8 @@ do_rpo_vn_1 (function *fn, edge entry, bitmap exit_bbs,
>   	  nblk++;
>   	  todo |= process_bb (avail, bb, false, false, false, eliminate,
>   			      do_region, exit_bbs,
> -			      skip_entry_phis && bb == entry->dest);
> +			      skip_entry_phis && bb == entry->dest,
> +			      ranger);
>   	  rpo_state[idx].visited++;
>   
>   	  FOR_EACH_EDGE (e, ei, bb->succs)
> @@ -8419,14 +8469,17 @@ do_rpo_vn_1 (function *fn, edge entry, bitmap exit_bbs,
>      If ITERATE is true then treat backedges optimistically as not
>      executed and iterate.  If ELIMINATE is true then perform
>      elimination, otherwise leave that to the caller.
> -   KIND specifies the amount of work done for handling memory operations.  */
> +   KIND specifies the amount of work done for handling memory operations.
> +   When RANGER is specified use that to simplify conditionals.  */
>   
>   unsigned
>   do_rpo_vn (function *fn, edge entry, bitmap exit_bbs,
> -	   bool iterate, bool eliminate, vn_lookup_kind kind)
> +	   bool iterate, bool eliminate, vn_lookup_kind kind,
> +	   gimple_ranger *ranger)
>   {
>     auto_timevar tv (TV_TREE_RPO_VN);
> -  unsigned todo = do_rpo_vn_1 (fn, entry, exit_bbs, iterate, eliminate, kind);
> +  unsigned todo = do_rpo_vn_1 (fn, entry, exit_bbs, iterate, eliminate, kind,
> +			       ranger);
>     free_rpo_vn ();
>     return todo;
>   }
> @@ -8482,7 +8535,9 @@ pass_fre::execute (function *fun)
>     if (iterate_p)
>       loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
>   
> -  todo = do_rpo_vn_1 (fun, NULL, NULL, iterate_p, true, VN_WALKREWRITE);
> +  gimple_ranger ranger;
> +  todo = do_rpo_vn_1 (fun, NULL, NULL, iterate_p, true, VN_WALKREWRITE,
> +		      &ranger); //!iterate_p ? &ranger : NULL);
>     free_rpo_vn ();
>   
>     if (iterate_p)
> diff --git a/gcc/tree-ssa-sccvn.h b/gcc/tree-ssa-sccvn.h
> index abcf7e666c2..b9c4ac787b4 100644
> --- a/gcc/tree-ssa-sccvn.h
> +++ b/gcc/tree-ssa-sccvn.h
> @@ -298,7 +298,8 @@ tree vn_nary_simplify (vn_nary_op_t);
>   unsigned do_rpo_vn (function *, edge, bitmap,
>   		    /* iterate */ bool = false,
>   		    /* eliminate */ bool = true,
> -		    vn_lookup_kind = VN_WALKREWRITE);
> +		    vn_lookup_kind = VN_WALKREWRITE,
> +		    class gimple_ranger * = NULL);
>   
>   /* Private interface for PRE.  */
>   void run_rpo_vn (vn_lookup_kind);
  
Richard Biener Sept. 21, 2022, 10:13 a.m. UTC | #2
On Mon, 19 Sep 2022, Andrew MacLeod wrote:

> Yeah, currently the internal cache isn't really wired into fold_using_range
> since it is supposed to be consistent internal state, so it's not currently
> expecting to work in a situation where what it thinks is global state might
> change.
> 
> I figured I better go back and watch your entire VN presentation, I only
> caught the last bit live.
> 
> Let me try to understand exactly what you want/need.. and let me use a simple
> example from that talk (yes I just typed it in :-).
> 
>   n_2 = 1
>   i_4 = 0
>   val_5 = 0
> <bb 3>:
>   # i_1 = PHI <i_4(2), i_7(3)>
>   #val_2 = PHI <val_5(2), val_6(3) >
>   val_6 = val_2 + 1;
>   i_7 = i_1 + 1
>   if (i_7 < n_3)
>     goto <bb 3>;
>   else
>     goto <bb 4>;
> <bb 4>
>   _8 = val_6
>   return _8
> 
> 
> In an ideal world, while you are processing bb 3 on the first iteration, you
> want the back edge into bb 3 to be unexecutable as far as ranger is
> concerned.  You walk through the block and get to the end, and in this
> example you'd be done because any queries you make of ranger would see
> if (2 < 1).
> 
> For the case when we do need to iterate (say n_2 = 3), you need to go back
> and re-walk that block, with that edge no longer marked as unexecutable?
> And part of the problem would be that ranger's cache for bb 3 would have all
> those values from the first pass, and everything explodes.

Yep.

> It looks like you created a fur_source to manually adjust PHIs within the
> fold_stmt query to ignore edges that are not marked executable.

Yes, and use the current values from the VN lattice when looking at
statement operands.

> So now let me start with my questions.
> 
> In theory, if you could tell it "kill the cache for bb3", change the
> executable state, and re-walk it, that would work?  This clearly gets a
> little more complicated as the size of the region that is being iterated
> grows, so you would want to be able to invalidate the cache for a range of
> blocks.

Yep.  I do have such a range of blocks nicely available.  I would also
know the points when I change executability of an edge (always goes from
not executable to executable) in case that would help more.

> That's not quite as straightforward as it might seem, because the cache of
> values per basic block is indexed by ssa-name, so you can't simply say "kill
> bb3's values"; you need to walk all the ssa-names that have a cache object,
> and if they have an entry for bb3, kill that.  But let's leave that alone
> for a minute, because that is a solvable problem.
> 
> Ranger also has a globally available flag that it uses to internally flag
> and track the executable state of edges:
> 
>   auto_edge_flag non_executable_edge_flag;
> 
> The gori object, which is instantiated with any ranger instance, only uses
> that to determine whether an edge should be processed for a value, i.e.
> 
> gori_compute::outgoing_edge_range_p (vrange &r, edge e, tree name,
>				      range_query &q)
> {
>     if ((e->flags & m_not_executable_flag))
>     {
>       r.set_undefined ();
>       return true;
> 
> So it seems to me that if you set/cleared this flag along with
> EDGE_EXECUTABLE (and it means the opposite, by the way: TRUE means the edge
> is not_executable), then at least all your queries through ranger would be
> picking up the values you are looking for, including those PHIs.

Ah, I see.

> That would then just leave you with the stale cache state to deal with?
> And if we can resolve that, it would all just work, at least in theory?

In theory, yes.  Besides that, the use-def walking of the cache is not
wired up with fur_*.

> There is a concept in the cache of staleness, which is used when we
> recalculate over back edges, but it's really oriented towards a model where
> we only ever move edges from executable to non-executable.  Your VN approach
> moves them the other way, and I have no confidence it would work properly in
> that mode.
> 
> It's not super efficient, but as a starting point we might be able to do
> something like:
>   1 - when you enter a region, call a new routine called "mark state", which
> would push an empty bitmap over ssa-names onto a stack
>   2 - every time the cache creates an entry, set the bit for the ssa-name in
> the bitmap at the top of the stack
>  3 - when you call "reset state", simply kill all the cache entries for any
> ssa-name set in the bitmap, and also kill ranger's internal global value for
> the name.  This would in essence reset everything it knows about that
> ssa-name.

Value-numbering essentially does such things to be able to roll back;
it records "change objects" that can be used, for example, to efficiently
remove elements added to hash tables.

> That would have the drawback of losing any information that had been
> calculated earlier, and possibly triggering some additional calculations as
> that value will now be stale elsewhere in the IL.   We could make it more
> efficient if you also provided a bitmap over the basic blocks in the region
> (or some way of determining that).  Then we would just kill the entries in
> those blocks for those ssa-names.
> 
> As long as you weren't making any ranger queries from outside the region
> during this time, that might work.
> 
> Before I go any further, are we on the same page, or are you looking for
> something different than this?
> 
> You had mentioned the path_ranger model also, which effectively uses ranger
> for the values on entry to the path/region, and then uses overridden API
> entry points to choose whether to use the local cache vector or go to
> ranger's on-entry values.
> 
>  This approach could also work, and one way would be to implement the
> mark/reset API.  The new class would maybe instantiate an additional local
> cache from which values are set/picked up first for any queries within the
> region blocks, and when you are done, throw away the cache for the region.
> This would require some tweaking of ranger's primary cache lookup/set
> routine, but perhaps there is some benefit to generalizing this model and
> integrating it, i.e. a region_ranger which builds this facility on top and
> allows you to reset regions.   Presumably you want to be able to "stack"
> these marks so you can push and pop regions?  We would also have to figure
> out how to make the relational oracle work with it.

Yeah, so my thinking was that I'd keep my own range cache for in-region,
like path_ranger does, and track changes similarly to how I do for VN
so I can undo things efficiently.  That way I also get to decide
how far to do the caching use-def walks.

Of course the disadvantage is duplicating quite some ranger functionality
that's already there without too many changes besides my own cache, plus
ensuring it uses the VN's computed values and edge executability.  Maybe
it isn't too much work though.
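The change-object scheme mentioned here can be sketched as follows (a generic C++ toy, not VN's actual data structures): every cache update logs an undo record carrying the previous value, and unwinding to a watermark rolls the records back newest-first.

```cpp
#include <map>
#include <optional>
#include <vector>

struct undo_cache {
  std::map<int, int> table;

  // Each change object remembers the key and its previous value, if any.
  struct change { int key; std::optional<int> prev; };
  std::vector<change> log;

  // A watermark marks a point to unwind back to (e.g. region entry).
  size_t watermark () const { return log.size (); }

  void set (int key, int val)
  {
    std::optional<int> prev;
    auto it = table.find (key);
    if (it != table.end ()) prev = it->second;
    log.push_back ({key, prev});
    table[key] = val;
  }

  // Roll back all changes made since MARK, newest first, so that
  // repeated updates to one key restore the oldest surviving value.
  void unwind (size_t mark)
  {
    while (log.size () > mark)
      {
	change c = log.back ();
	log.pop_back ();
	if (c.prev)
	  table[c.key] = *c.prev;
	else
	  table.erase (c.key);
      }
  }
};
```

Unlike the bitmap scheme above, this preserves pre-region information on unwind instead of wiping the name entirely, at the cost of one log entry per update.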

> I can think of a couple of possible approaches for that, some more efficient
> than others.    I suppose the path_ranger could even be completely replaced
> with that model.  Huh, it would just call mark() at the start of a path, and
> reset() when it's done with the path.  It might not be as efficient,
> however, but it would be a good proof-of-correctness test, I guess, at least.
> 
> Anyway, am I in the ballpark somewhere of what you are looking for?  Or am I
> over-engineering something? :-)

No, you basically identified the problems.

Note that when not iterating the proposed change should already work
optimally, but it's of course bad to be "better" in the weaker setting ...

The second thing would be to look at how to use conditional equivalences:
what's required to make ranger aware of them, and what's required to
query them, since for them the actual simplification would happen in
the folding done by value-numbering, as with

  if (a == b)
    c = a - b;

which EVRP simplifies, for example, but which VN fails at.
There are proposed enhancements to VN that might catch this as well,
but I'm not too happy about the implementation.  Mainly because
equivalences are a bitch.
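For concreteness, the missed simplification in compilable form looks like this (the function name is just for illustration); on the taken branch the equivalence a == b makes a - b a known constant:

```c
/* On the guarded path a == b holds, so a - b is known to be 0 and the
   assignment can fold to c = 0 (EVRP catches this; VN currently does not).  */
int sub_if_equal (int a, int b)
{
  int c = -1;
  if (a == b)
    c = a - b;   /* foldable to 0 via the equivalence a == b */
  return c;
}
```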

Richard.

> Andrew
> 
> 
> 
> On 9/19/22 07:19, Richard Biener wrote:
> > The following is an attempt to utilize ranger for simplification
> > of conditionals during value-numbering.  It fails the added
> > testcase because it seems the fur source is only involved when
> > processing the stmt to be folded but not when filling the cache
> > of ranges on the operand use->def chain.
> >
> > When not iterating that's not a problem since earlier stmts
> > have their operands eliminated, but when iterating or when
> > elimination is disabled the IL remains unchanged.
> >
> > When iterating it will probably do things wrong due to the lack
> > of pruning of the cache for the region we unwind, and when doing
> > region-based VN it likely picks up stale EDGE_EXECUTABLE when
> > leaving the region over the entry, because there's no way to stop
> > "local" cache prefilling when leaving the region (and just using
> > global ranges from there).
> >
> > Knowing path-range-query it's probably possible to clone all
> > the block walking and cache filling, making a special variant
> > for value-numbering but I wonder whether that's really what's
> > intended?  I would probably need to implement a special
> > range_of_expr and range_of_stmt for this and keep my own cache?
> > Basically a region_range_query with wired in valueization?
> >
> > 	* tree-ssa-sccvn.cc (...): ...
> > ---
> >   gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c | 15 ++++
> >   gcc/tree-ssa-sccvn.cc                       | 77 ++++++++++++++++++---
> >   gcc/tree-ssa-sccvn.h                        |  3 +-
> >   3 files changed, 83 insertions(+), 12 deletions(-)
> >   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c
> > b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c
> > new file mode 100644
> > index 00000000000..f479a322b70
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-fre1" } */
> > +
> > +int foo (int *p, int *q)
> > +{
> > +  if (*p > 0 && *q > 0)
> > +    {
> > +      int v = *p + *q;
> > +      if (v > 0)
> > +        return *q;
> > +    }
> > +  return 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "known outgoing edge" 1 "fre1" } } */
> > diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> > index 74b8d8d18ef..2753200c2c2 100644
> > --- a/gcc/tree-ssa-sccvn.cc
> > +++ b/gcc/tree-ssa-sccvn.cc
> > @@ -73,6 +73,7 @@ along with GCC; see the file COPYING3.  If not see
> >   #include "fold-const-call.h"
> >   #include "ipa-modref-tree.h"
> >   #include "ipa-modref.h"
> > +#include "gimple-range.h"
> >   #include "tree-ssa-sccvn.h"
> >   
> >   /* This algorithm is based on the SCC algorithm presented by Keith
> > @@ -360,10 +361,11 @@ static vn_ssa_aux_t last_pushed_avail;
> >      correct.  */
> >   static vn_tables_t valid_info;
> >   
> > +tree (*vn_valueize) (tree);
> >   
> >   /* Valueization hook for simplify_replace_tree.  Valueize NAME if it is
> >      an SSA name, otherwise just return it.  */
> > -tree (*vn_valueize) (tree);
> > +
> >   static tree
> >   vn_valueize_for_srt (tree t, void* context ATTRIBUTE_UNUSED)
> >   {
> > @@ -380,6 +382,36 @@ vn_valueize_for_srt (tree t, void* context
> > ATTRIBUTE_UNUSED)
> >     return res;
> >   }
> >   
> > +class vn_fur : public fur_depend
> > +{
> > +public:
> > +  vn_fur (gimple *s, gori_compute *g, range_query *q)
> > +    : fur_depend (s, g, q) {}
> > +  virtual bool get_operand (vrange &, tree);
> > +  virtual bool get_phi_operand (vrange &, tree, edge);
> > +};
> > +
> > +bool
> > +vn_fur::get_operand (vrange &r, tree op)
> > +{
> > +  return fur_depend::get_operand (r, vn_valueize (op));
> > +}
> > +
> > +bool
> > +vn_fur::get_phi_operand (vrange &r, tree op, edge e)
> > +{
> > +  // ???  Check region boundary, avoid walking outside of the region
> > +  // somehow?  Possibly when initializing the ranger object?
> > +  // or can/should we simply return 'false' when we arrive with
> > +  // e being the entry edge?  What about for non-PHIs?  Can we
> > +  // somehow force to only use the global range and stop caching
> > +  // after that?
> > +  if (e->flags & EDGE_EXECUTABLE)
> > +    return fur_depend::get_phi_operand (r, vn_valueize (op), e);
> > +  r.set_undefined ();
> > +  return true;
> > +}
> > +
> >   
> >   /* This represents the top of the VN lattice, which is the universal
> >      value.  */
> > @@ -7292,12 +7324,12 @@ eliminate_with_rpo_vn (bitmap inserted_exprs)
> >   
> >   static unsigned
> >   do_rpo_vn_1 (function *fn, edge entry, bitmap exit_bbs,
> > -	     bool iterate, bool eliminate, vn_lookup_kind kind);
> > +	     bool iterate, bool eliminate, vn_lookup_kind kind, gimple_ranger
> > *);
> >   
> >   void
> >   run_rpo_vn (vn_lookup_kind kind)
> >   {
> > -  do_rpo_vn_1 (cfun, NULL, NULL, true, false, kind);
> > +  do_rpo_vn_1 (cfun, NULL, NULL, true, false, kind, NULL);
> >   
> >     /* ???  Prune requirement of these.  */
> >     constant_to_value_id = new hash_table<vn_constant_hasher> (23);
> > @@ -7580,7 +7612,8 @@ insert_related_predicates_on_edge (enum tree_code
> > code, tree *ops, edge pred_e)
> >   static unsigned
> >   process_bb (rpo_elim &avail, basic_block bb,
> >   	    bool bb_visited, bool iterate_phis, bool iterate, bool eliminate,
> > -	    bool do_region, bitmap exit_bbs, bool skip_phis)
> > +	    bool do_region, bitmap exit_bbs, bool skip_phis,
> > +	    gimple_ranger *ranger)
> >   {
> >     unsigned todo = 0;
> >     edge_iterator ei;
> > @@ -7766,6 +7799,20 @@ process_bb (rpo_elim &avail, basic_block bb,
> >   		      }
> >   		  }
> >   	      }
> > +	    if (!val && ranger)
> > +	      {
> > +		int_range_max r;
> > +		fold_using_range f;
> > +		// ???  We get here with not valueized operands but
> > +		// the fur_source only is involved for operands of
> > +		// this very stmt, not when ranger prefills its cache
> > +		// walking use->def edges.
> > +		vn_fur src (last, &ranger->gori (), ranger);
> > +		tree tem;
> > +		if (f.fold_stmt (r, last, src)
> > +		    && r.singleton_p (&tem))
> > +		  val = tem;
> > +	      }
> >   	    if (val)
> >   	      e = find_taken_edge (bb, val);
> >   	    if (! e)
> > @@ -7997,7 +8044,8 @@ do_unwind (unwind_state *to, rpo_elim &avail)
> >   
> >   static unsigned
> >   do_rpo_vn_1 (function *fn, edge entry, bitmap exit_bbs,
> > -	     bool iterate, bool eliminate, vn_lookup_kind kind)
> > +	     bool iterate, bool eliminate, vn_lookup_kind kind,
> > +	     gimple_ranger *ranger)
> >   {
> >     unsigned todo = 0;
> >     default_vn_walk_kind = kind;
> > @@ -8213,7 +8261,8 @@ do_rpo_vn_1 (function *fn, edge entry, bitmap
> > exit_bbs,
> >    todo |= process_bb (avail, bb,
> >          rpo_state[idx].visited != 0,
> >          rpo_state[idx].iterate,
> > -			    iterate, eliminate, do_region, exit_bbs, false);
> > +			    iterate, eliminate, do_region, exit_bbs, false,
> > +			    ranger);
> >    rpo_state[idx].visited++;
> >   
> >   	/* Verify if changed values flow over executable outgoing backedges
> > @@ -8326,7 +8375,8 @@ do_rpo_vn_1 (function *fn, edge entry, bitmap
> > exit_bbs,
> >      nblk++;
> >      todo |= process_bb (avail, bb, false, false, false, eliminate,
> >   			      do_region, exit_bbs,
> > -			      skip_entry_phis && bb == entry->dest);
> > +			      skip_entry_phis && bb == entry->dest,
> > +			      ranger);
> >      rpo_state[idx].visited++;
> >   
> >   	  FOR_EACH_EDGE (e, ei, bb->succs)
> > @@ -8419,14 +8469,17 @@ do_rpo_vn_1 (function *fn, edge entry, bitmap
> > exit_bbs,
> >      If ITERATE is true then treat backedges optimistically as not
> >      executed and iterate.  If ELIMINATE is true then perform
> >      elimination, otherwise leave that to the caller.
> > -   KIND specifies the amount of work done for handling memory operations.
> > */
> > +   KIND specifies the amount of work done for handling memory operations.
> > +   When RANGER is specified use that to simplify conditionals.  */
> >   
> >   unsigned
> >   do_rpo_vn (function *fn, edge entry, bitmap exit_bbs,
> > -	   bool iterate, bool eliminate, vn_lookup_kind kind)
> > +	   bool iterate, bool eliminate, vn_lookup_kind kind,
> > +	   gimple_ranger *ranger)
> >   {
> >     auto_timevar tv (TV_TREE_RPO_VN);
> > -  unsigned todo = do_rpo_vn_1 (fn, entry, exit_bbs, iterate, eliminate,
> > kind);
> > +  unsigned todo = do_rpo_vn_1 (fn, entry, exit_bbs, iterate, eliminate,
> > kind,
> > +			       ranger);
> >     free_rpo_vn ();
> >     return todo;
> >   }
> > @@ -8482,7 +8535,9 @@ pass_fre::execute (function *fun)
> >     if (iterate_p)
> >       loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
> >   -  todo = do_rpo_vn_1 (fun, NULL, NULL, iterate_p, true, VN_WALKREWRITE);
> > +  gimple_ranger ranger;
> > +  todo = do_rpo_vn_1 (fun, NULL, NULL, iterate_p, true, VN_WALKREWRITE,
> > +		      &ranger); //!iterate_p ? &ranger : NULL);
> >     free_rpo_vn ();
> >   
> >     if (iterate_p)
> > diff --git a/gcc/tree-ssa-sccvn.h b/gcc/tree-ssa-sccvn.h
> > index abcf7e666c2..b9c4ac787b4 100644
> > --- a/gcc/tree-ssa-sccvn.h
> > +++ b/gcc/tree-ssa-sccvn.h
> > @@ -298,7 +298,8 @@ tree vn_nary_simplify (vn_nary_op_t);
> >   unsigned do_rpo_vn (function *, edge, bitmap,
> >         /* iterate */ bool = false,
> >         /* eliminate */ bool = true,
> > -		    vn_lookup_kind = VN_WALKREWRITE);
> > +		    vn_lookup_kind = VN_WALKREWRITE,
> > +		    class gimple_ranger * = NULL);
> >   
> >   /* Private interface for PRE.  */
> >   void run_rpo_vn (vn_lookup_kind);
> 
> 
>
  
Andrew MacLeod Sept. 21, 2022, 6:52 p.m. UTC | #3
On 9/21/22 06:13, Richard Biener wrote:
> On Mon, 19 Sep 2022, Andrew MacLeod wrote:
>
>
>> It looks like you created a fur_source to manually adjust PHIs within the
>> fold_stmt query to ignore edges that are not marked executable.
> Yes, and use the current values from the VN lattice when looking at
> statement operands.

yes, that is exactly how it's intended to be used.


>
>> That would then just leave you with the stale cache state to deal with?   And
>> if we can resolve that, would all just work?  at least in theory?
> In theory, yes.  Besides that the use-def walking of the cache it not
> wired up with fur_*

Well, yes. hmm, you want to set cache values based on the VN lattice as 
well. yes. OK, let me do a bit of cache explanation since I haven't done 
that yet. It does not need a fur_source of any kind, and I'll explain why.

The cache has 2 primary functions..
   1) maintain the global definition table (used to decide if a name has 
been processed). This is local and not the one the rest of GCC uses.   and
  2) maintain the range-on-entry cache and resolve queries to that 
efficiently.

The cache does not actually create any NEW information.  This is one of 
its key features in preventing any kind of cascading cyclic updates.  
All it does is propagate existing information from the definition table, 
with values extracted from the global value table.  So your example is 
not good for this, as there isn't much in the cache for it.  so lets 
tweak it and add another block. example:

n_2 = 1
   i_4 = 0
   val_5 = 0
<bb 3>:
   # i_1 = PHI <i_4(2), i_7(3)>
   #val_2 = PHI <val_5(2), val_6(3) >
   val_6 = val_2 + 1;
   i_7 = i_1 + 1
   if (i_7 > 22)
      goto <bb 12>
   else
      goto <bb 7>
<bb 7>
   if (i_7 < n_3)
     goto <bb 3>;
   else
     goto <bb 4>;
<bb 4>
   _8 = val_6
   return _8

For the sake of simplicity, let's also assume bb2 and bb3 have been 
looked at and all the ssa-names defined in those blocks have an entry in 
ranger's definition table.

Moving to <bb 7>, if we ask for the range of "if (i_7 < n_3)" to be 
evaluated, it checks that i_7 and n_3 have been evaluated before it 
proceeds.  Both have entries, which means the next task is to get their 
values at this location.  range_of_expr is called on each one, and as 
they are not defined in this block, ranger asks the cache for the value 
of i_7 on entry to bb7. (likewise when it gets an answer back, it will 
do so for n_3 as well)

The cache walks back the dominators until it finds either:
   a) the block with the definition of i_7, or
   b) a block which has an on-entry cache value for i_7 already set.
During its walk, it tags any block which has i_7 in the export list, 
meaning an outgoing edge from that block may change the value of i_7.

There are additional complexities, but the fundamental operation is to 
now take the value it saw from a) or b) as the starting value, and 
supply that to GORI at every intervening outgoing edge i_7 was exported 
from. Whenever the value changes along the way, we write a cache update 
at the end of the edge to facilitate future queries.  At the end, the 
answer has been calculated and is stored as the on-entry value for this 
block.

So returning to the example, assume i_7 was set to VARYING in bb3, GORI 
would apply !(i_7 > 22) to the value, and we would end up in <bb 7> with 
a range-on-entry of [0, 21] and it would be stored in bb7.

In your example, if you have disabled that back edge, you would have a 
value of [1,1] for i_7.  GORI would not have changed that value since 
its already < 22, and we would store [1,1] as the range-on-entry to <bb 7>
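The dominator walk and forward refill described here can be sketched as a toy (all the names below — toy_cache, trange, adjust — are hypothetical; the real cache is considerably more elaborate, handling multiple predecessors, export tagging and temporal marks):

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <functional>
#include <map>
#include <vector>

// Toy integer range [lo, hi].
struct trange { int lo, hi; };

struct toy_cache
{
  std::map<int, int> idom;                    // immediate dominator of each block
  std::map<int, trange> on_entry;             // cached range-on-entry values
  int def_block;                              // block defining the name (case a)
  trange def_range;                           // the name's value at its definition
  std::function<trange (int, trange)> adjust; // GORI stand-in: refine R over the
                                              // outgoing edge of block PRED

  trange range_on_entry (int bb)
  {
    auto hit = on_entry.find (bb);
    if (hit != on_entry.end ())
      return hit->second;                     // case b): already cached
    // Walk back through dominators until the def block or a cached entry.
    std::vector<int> walked;
    int b = bb;
    while (b != def_block && !on_entry.count (b))
      {
        walked.push_back (b);
        b = idom[b];
      }
    trange r = (b == def_block) ? def_range : on_entry[b];
    // Forward pass: apply GORI at each intervening outgoing edge, writing
    // a cache update at the end of the edge for future queries.
    for (auto it = walked.rbegin (); it != walked.rend (); ++it)
      r = on_entry[*it] = adjust (idom[*it], r);
    return r;
  }
};

// The example above: i_7 is VARYING in bb3, and the edge 3->7 is guarded
// by !(i_7 > 22), so the range on entry to bb7 is capped at 22.
static trange
example_entry_range_bb7 ()
{
  toy_cache c;
  c.idom = { { 3, 2 }, { 7, 3 } };
  c.def_block = 3;
  c.def_range = { INT_MIN, INT_MAX };         // VARYING
  c.adjust = [] (int pred, trange r) {
    if (pred == 3)                            // edge 3 -> 7 implies i_7 <= 22
      r.hi = std::min (r.hi, 22);
    return r;
  };
  return c.range_on_entry (7);
}
```

Note the key property the prose describes: nothing new is created, the starting value is only ever narrowed by the intervening edges, and each narrowing is cached on the way forward.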

Likewise, we do something similar for n_3.  The point is, the cache has 
not gone and created any new information.  Its *only* purpose is to 
propagate known values thru the CFG, adjusting them for any outgoing 
edges that are encountered.  It uses a temporal marking in an attempt to 
identify when a global value has been changed, meaning it may need to go 
and repopulate something, but the bottom line is it never contains 
anything beyond "reductions" in the ranges of values in the global 
table.  And it only ever works on one name at a time.

The bottom line: Ranger effectively only ever changes values via the 
global table, and the cache simply propagates those values around, 
adjusting them with GORI as appropriate.

So there are multiple approaches.  We could simply kill the global table 
and cache line for any ssa_name we want to change the value of.  That 
gets a little trickier for on-entry values of secondary effects (ie, 
those used in the calculation of the primary names). It would probably 
work, but something unforeseen could show up.

More advanced would be to "layer" the cache.  ie, we use the cache, and 
at some point, you issue a "push".  The push creates a new cache, and 
all queries look first to the new cache, and if the query can't be 
answered there, look "down" thru to the previous caches.   This resolves 
all queries as if the cache layers are all "one" thing. Sets would 
always go to the latest layer.   When we "pop", we delete the latest 
layer; the layers 
underneath should all reflect the exact state at the time of the push.  
All in theory of course :-) There would be some expense to it, but 
possibly not as much as one thinks since most components defer 
allocations and such until something is actually set.   It seems like it 
might not be a hard experiment that I will try once I get thru the bits 
I'm on now.
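A minimal sketch of that layered idea (a hypothetical class, nothing like the real cache internals — it only illustrates the push/pop fall-through semantics described above):

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// A stack of maps: lookups fall through from the newest layer to the
// older ones, sets always go to the newest layer, and a pop restores
// exactly the state that existed at the matching push.
template <typename K, typename V>
class layered_cache
{
  std::vector<std::map<K, V>> layers;
public:
  layered_cache () : layers (1) {}
  void push () { layers.emplace_back (); }
  void pop () { if (layers.size () > 1) layers.pop_back (); }
  void set (const K &k, const V &v) { layers.back ()[k] = v; }
  std::optional<V> get (const K &k) const
  {
    // Newest layer first, then "down" through the older ones.
    for (auto it = layers.rbegin (); it != layers.rend (); ++it)
      {
        auto f = it->find (k);
        if (f != it->end ())
          return f->second;
      }
    return std::nullopt;
  }
};

// Shadow a value inside a "region", then pop back to the outer state.
static bool
layered_cache_demo ()
{
  layered_cache<std::string, int> c;
  c.set ("i_7", 1);
  c.push ();                   // enter the region
  c.set ("i_7", 2);            // region-local value shadows the outer one
  bool shadowed = c.get ("i_7") && *c.get ("i_7") == 2;
  c.pop ();                    // leave the region; outer state restored
  return shadowed && c.get ("i_7") && *c.get ("i_7") == 1;
}
```

As noted, the real expense would depend on how lazily each layer allocates; this sketch allocates a map per layer unconditionally.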


>
>
>> That would have to drawback of losing any information that had been calculated
>> earlier, and possibly trigger some additional calculations as that value will
>> now be stale elsewhere in the IL.   We could make it more efficient if you
>> also provided a bitmap over the basic blocks in the region (or some way of
>> determining that).  Then we would just kill the entries in those blocks for
>> those ssa-names.
>>
>> As long as you weren't making any ranger queries from outside the region
>> during this time, that might work.
>>
>> Before I go any further, are we on the same page, or are you looking for
>> something different than this?
>>
>> You had mentioned the path_ranger model also, which effectively uses ranger
>> for the values on entry to the path/region, and then uses over-rided API entry
>> points to choose whether to use the local cache vector, or go to rangers
>> on-entry values..
>>
>>   This approach could also work, and one way would be to implement the
>> mark/reset API. The new class would maybe instantiate an additional local
>> cache from which values are set/picked up first for any queries within the
>> region blocks and when you are done, throws away the cache for the region.
>> This would require some tweaking from rangers primary cache lookup/set
>> routine, but perhaps there is some benefit to generalizing this model and
>> integrating it.. ie, a region_ranger which builds this facility on top and
>> allows you to reset regions.   Presumably you want to be able to "stack" these
>> marks so you can push and pop regions? would also have to figure out how to
>> work the relational oracle with it.
> Yeah, so my thinking was that I'd keep my own range cache for in-region,
> like path_ranger does, and track changes similar as to how I do for VN
> so I can undo things efficiently.  That way I also get to decide
> how far to do the caching use-def walks.
>
> Of course the disadvantage is duplicating quite some ranger functionality
> that's already there without too much changes besides my own cache plus
> ensuring using the VNs computed values and edge execuability.  Maybe
> it isn't too much work though.

Let me give the layered cache a shot.. I don't think that will be much 
work (ha. famous last words), and we can see if it works, and how 
expensive it is.   There might be alternatives along this line, 
especially if you are only querying a few things here and there.   The 
cache is designed to be able to plug in alternative mechanisms on a 
name-by-name basis (there are already a couple of variations depending 
on CFG size and name density usage), so it might be that we can do 
something really cheap for these kinds of pushed layers.

I'm still mulling around options.


>> I can think of a couple of possible approaches for that, some more efficient
>> than others.    I suppose the path_ranger could even be completely replaced
>> with that model. huh. it would just call mark() at the start of a path, and
>> reset() when its done with the path.   Might not be as efficient however.   Be
>> a good proof of correctness test I guess at least.
>>
>> Anyway, am I in the ballpark somewhere of what you are looking for?  Or am I
>> over engineering something? :-)
> No, you basically identified the problems.
>
> Note that when not iterating the proposed change should already work
> optimally, but it's of course bad to be "better" in the weaker setting ...
>
> The second thing would be to look at how to use conditional equivalences.
> What's required to make ranger aware of them and what's required to
> query them since for them the actual simplification would happen in
> the folding done by value-numbering like with
>
>    if (a == b)
>      c = a - b;
>
> which EVRP gets simplified for example but VN fails at this.
> There's proposed enhancements to VN that might catch this as well
> but I'm not too happy about the implementation.  Mainly because
> equivalences are a bitch.

The relational oracle currently snags this one by putting a and b in an 
equivalence set on that outgoing edge.  the fold_stmt mechanism you are 
using automatically queries the relation oracle (if fur_source supplies 
one) for a relation between the 2 use operands of a statement, and 
passes that to the range-ops fold routine. so it will ultimately invoke 
operator_minus with

   fold_range (irange, int, range_of_a, range_of_b, VREL_EQ)

Range-ops for operator minus defines the routine 
"op1_op2_relation_effect()" (which is automatically applied to any value 
calculated by fold_range() after the fact).   That routine has:

// == and != produce [0,0] and ~[0,0] regardless of wrapping.
   if (rel == VREL_EQ)
     rel_range = int_range<2> (type, wi::zero (prec), wi::zero (prec));
   else if (rel == VREL_NE)
     rel_range = int_range<2> (type, wi::zero (prec), wi::zero (prec),
                               VR_ANTI_RANGE);

so as long as the relation was registered, fold_stmt will automatically 
get a range of [0,0] for c, all through range-ops for operator_minus.

If the fur_source used when fold_stmt is called supplies the routine 
"register_relation" (which I think only a fur_depend currently does), 
then relations are also automatically registered with the oracle 
supplied by the fur_depend.

There's a lot of this that can happen auto-magically.   I just need to 
make sure we can reset some of the underlying cache entries (and oracle 
as well.. it might need to be layered so that we don't pick up invalid 
relations.  It should also be relatively simple to adjust I think)

Andrew
  
Richard Biener Sept. 22, 2022, 6:33 a.m. UTC | #4
On Wed, 21 Sep 2022, Andrew MacLeod wrote:

> 
> On 9/21/22 06:13, Richard Biener wrote:
> > On Mon, 19 Sep 2022, Andrew MacLeod wrote:
> >
> >
> >> It looks like you created a fur_source to manually adjust PHIs within the
> >> fold_stmt query to ignore edges that are not marked executable.
> > Yes, and use the current values from the VN lattice when looking at
> > statement operands.
> 
> yes, that is exactly how its intended to be used.
> 
> 
> >
> >> That would then just leave you with the stale cache state to deal with?  
> >> And
> >> if we can resolve that, would all just work?  at least in theory?
> > In theory, yes.  Besides that the use-def walking of the cache it not
> > wired up with fur_*
> 
> Well, yes. hmm, you want to set cache values based on the VN lattice as well.
> yes. OK, let me do a bit of cache explanation since I haven't done that yet.
> It does not need a fur_source of any kind, and I'll explain why.
> 
> The cache has 2 primary functions..
>   1) maintain the global definition table (used to decide if a name has been
> processed). This is local and not the one the rest of GCC uses.   and
>   2) maintain the range-on-entry cache andresolve queries to that efficiently.
> 
> The cache does not actually create any NEW information.  This is one of its
> key features in preventing any kind of cascading cyclic updates.  All it does
> is propagate existing information from the definition table, with values
> extracted from the global value table.  So your example is not good for this,
> as there isn't much in the cache for it.  so lets tweak it and add another
> block. example:
> 
> n_2 = 1
>   i_4 = 0
>   val_5 = 0
> <bb 3>:
>   # i_1 = PHI <i_4(2), i_7(3)>
>   #val_2 = PHI <val_5(2), val_6(3) >
>   val_6 = val_2 + 1;
>   i_7 = i_1 + 1
>   if (i_7 > 22)
>      goto <bb 12>
>   else
>      goto <bb 7>
> <bb 7>
>   if (i_7 < n_3)
>     goto <bb 3>;
>   else
>     goto <bb 4>;
> <bb 4>
>   _8 = val_6
>   return _8
> 
> For the sake of simplicity, lets also assume bb2 and bb3 have been looked and
> all the ssa-names defined in those blocks have an entry in rangers defintion
> table.
> 
> Moving to <bb 7> if we ask for the range of "if (i_7< n_3) to be evaluated, it
> checks that i_7 and n_3 have been evaluated before it proceeds.  Both have
> entries, which means the next task is to get their values at this location. 
> range_of_expr is called on each one, and as they are not defined in this
> block, ranger asks the cache for the value of i_7 on entry to bb7. (likewise
> when it gets an answer back, it will do so for n_3 as well)
> 
> The cache walks back the dominators until it finds either:
>   a) the block with the definition of i_7, or
>   b) a block which has an on-entry cache value for i_7 already set.
> During it walk, it tags any block which has i_7 in the export list, meaning an
> outgoing edge from that block may change the value of i_7.
> 
> There are additional complexities, but the fundamental operation is to now
> take the value it saw from a) or b) as the starting value, and supply that to
> GORI at every intervening outgoing edge i_7 was exported from. Whenever the
> value changes along the way, we write a cache update at the end of the edge to
> facilitate future queries.  At the end, the answer has been calculated and is
> stored as the on-entry value for this block.
> 
> So returning to the example, assume i_7 was set to VARYING in bb3, GORI would
> apply !(i_7 > 22) to the value, and we would end up in <bb 7> with a
> range-on-entry of [0, 21] and it would be stored in bb7.
> 
> In your example, if you have disabled that back edge, you would have a value
> of [1,1] for i_7.  GORI would not have changed that value since its already <
> 22, and we would store [1,1] as the range-on-entry to <bb 7>
> 
> Likewise, we do something similar for n_3.  The point is, the cache has not
> gone and created an new information.  its *only* purpose it to propagate known
> values thru the CFG, adjusting them for any outgoing edges that are
> encountered.  It uses a temporal marking in an attempt to identify when a
> global value has been changed, meaning it may need to go and repopulate
> something, but the bottom line It never contains anything beyond "reductions"
> in the ranges of values in the global table.  And it only every works on one
> name at a time.
> 
> THe bottom line, Ranger effectively only every changes values via the global
> table. And the cache propagates simply those values around, adjusting them
> with GORI as appropriate.

Hmm, but when it reaches the definition of _7 then it ends up calling
range_of_stmt on it which then recursively processes operands and the
result is stored into the "global table"?  For the definition of _7
fur_* isn't asked for the operands so valueization would not happen.
path-ranger seems to override the def walking (range_of_expr) and thus
could transparently valueize.

OK, so you say there's two things we need to invalidate - first the
range-on-entry cache and second the "global table" entries?  Note that
value-numbering doesn't need to undo things in its value table when
iterating but it of course updates values when re-visiting definitions.
Since with ranger we'd query definitions upthread on-demand that
scheme likely would not work unless we "range-interpret" each stmt
when value numbering them.  That _might_ be the cheapest option in
the end if all the undoing is too expensive - basically force-push
a new range each time we visit a stmt.  For range-on-entry/exit
(aka range-on-edge) we probably would simply clear the range-on-entry
cache when (re-)entering a block?

And we'd like to just use the SSA range storage for defs outside of
the region.

Richard.

> So there are multiple approaches.  We could simply kill the global table and
> cache line for any ssa_name we want to change the value of.  That gets a
> little tricker for on-entry values of secondary effects (ie, those used in the
> calculation of the primary names). It would probably work, but something
> unforeseen could show up.
> 
> More advanced would be to "layer" the cache.  ie, we use the cache, and at
> some point, you issue a "push".  The push creates a new cache, and all queries
> look first to the new cache, and if it cant be answered looks "down" thru to
> the previous caches.   This resolves all queries as if the cache layers are
> all "one" thing. Sets would always go to the latest layer.   When we "pop", we
> delete the latest layer..  the layers underneath should all reflect the exact
> state at the time of the push.  All in theory of course :-) There would be
> some expense to it, but possibly not as much as one thinks since most
> components defer allocations and such until something is actually set.   It
> seems like it might not be a hard experiment that I will try once I get thru
> the bits I'm on now.
> 
> 
> >
> >
> >> That would have to drawback of losing any information that had been
> >> calculated
> >> earlier, and possibly trigger some additional calculations as that value
> >> will
> >> now be stale elsewhere in the IL.   We could make it more efficient if you
> >> also provided a bitmap over the basic blocks in the region (or some way of
> >> determining that).  Then we would just kill the entries in those blocks for
> >> those ssa-names.
> >>
> >> As long as you weren't making any ranger queries from outside the region
> >> during this time, that might work.
> >>
> >> Before I go any further, are we on the same page, or are you looking for
> >> something different than this?
> >>
> >> You had mentioned the path_ranger model also, which effectively uses ranger
> >> for the values on entry to the path/region, and then uses over-rided API
> >> entry
> >> points to choose whether to use the local cache vector, or go to rangers
> >> on-entry values..
> >>
> >>  This approach could also work, and one way would be to implement the
> >> mark/reset API. The new class would maybe instantiate an additional local
> >> cache from which values are set/picked up first for any queries within the
> >> region blocks and when you are done, throws away the cache for the region.
> >> This would require some tweaking from rangers primary cache lookup/set
> >> routine, but perhaps there is some benefit to generalizing this model and
> >> integrating it.. ie, a region_ranger which builds this facility on top and
> >> allows you to reset regions.   Presumably you want to be able to "stack"
> >> these
> >> marks so you can push and pop regions? would also have to figure out how to
> >> work the relational oracle with it.
> > Yeah, so my thinking was that I'd keep my own range cache for in-region,
> > like path_ranger does, and track changes similar as to how I do for VN
> > so I can undo things efficiently.  That way I also get to decide
> > how far to do the caching use-def walks.
> >
> > Of course the disadvantage is duplicating quite some ranger functionality
> > that's already there without too much changes besides my own cache plus
> > ensuring using the VNs computed values and edge execuability.  Maybe
> > it isn't too much work though.
> 
> Let me give the layered cache a shot.. i don't think that will be much work
> (ha. famous last words), and we can see if it works, and how expensive it
> is.   There might be alternatives along this line, especially if you are only
> querying a few things here and there.   The cache is designed to be able to
> plug in alternative mechanisms on a name-by-name basis (there are already a
> couple of variations depending on CFG size and name density usage) , so is
> might be that we can do something really cheap for these kind of pushed
> layers.
> 
> I'm still mulling around options.
> 
> 
> >> I can think of a couple of possible approaches for that, some more
> >> efficient
> >> than others.    I suppose the path_ranger could even be completely replaced
> >> with that model. huh. it would just call mark() at the start of a path, and
> >> reset() when its done with the path.   Might not be as efficient however.  
> >> Be
> >> a good proof of correctness test I guess at least.
> >>
> >> Anyway, am I in the ballpark somewhere of what you are looking for?  Or am
> >> I
> >> over engineering something? :-)
> > No, you basically identified the problems.
> >
> > Note that when not iterating the proposed change should already work
> > optimally, but it's of course bad to be "better" in the weaker setting ...
> >
> > The second thing would be to look at how to use conditional equivalences.
> > What's required to make ranger aware of them and what's required to
> > query them since for them the actual simplification would happen in
> > the folding done by value-numbering like with
> >
> >    if (a == b)
> >      c = a - b;
> >
> > which EVRP gets simplified for example but VN fails at this.
> > There's proposed enhancements to VN that might catch this as well
> > but I'm not too happy about the implementation.  Mainly because
> > equivalences are a bitch.
> 
> The relational oracle currently snags this one by putting a and b in an
> equivalence set on that outgoing edge.  the fold_stmt mechanism you are using
> automatically queries the relation oracle (if fur_source supplies one) for a
> relation between the 2 use operands of a statement, and passes that to the
> range-ops fold routine. so it will ultimately invoke operator_minus with
> 
>   fold_range (irange, int, range_of_a, range_of_b, VREL_EQ)
> 
> Range-ops for operator minus defines the routine "op1_op2_relation_effect()"
> (which is automatically applied to any value calculated by fold_range() after
> the fact.   That routine has:
> 
> // == and != produce [0,0] and ~[0,0] regardless of wrapping.
>   if (rel == VREL_EQ)
>     rel_range = int_range<2> (type, wi::zero (prec), wi::zero (prec));
>   else if (rel == VREL_NE)
>     rel_range = int_range<2> (type, wi::zero (prec), wi::zero (prec),
>                               VR_ANTI_RANGE);
> 
> so as long as the relation was registered, fold_stmt will automatically get a
> range of [0,0] for c, all through range-ops for operator_minus.
> 
> If the fur_source used when fold_stmt is called supplies the routine
> "register_relation" (which I think only a fur_depend currently does), then
> relations are also automatically registered with the oracle supplied by the
> fur_depend.
> 
> There's a lot of this that can happen auto-magically.   I just need to make
> sure we can reset some of the underlying cache entries (and oracle as well..
> it might need to be layered so that we don't pick up invalid relations.  It
> should also be relatively simply to adjust I think)
> 
> Andrew
> 
>
  

Patch

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c
new file mode 100644
index 00000000000..f479a322b70
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c
@@ -0,0 +1,15 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+int foo (int *p, int *q)
+{
+  if (*p > 0 && *q > 0)
+    {
+      int v = *p + *q;
+      if (v > 0)
+        return *q;
+    }
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "known outgoing edge" 1 "fre1" } } */
diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 74b8d8d18ef..2753200c2c2 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -73,6 +73,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "fold-const-call.h"
 #include "ipa-modref-tree.h"
 #include "ipa-modref.h"
+#include "gimple-range.h"
 #include "tree-ssa-sccvn.h"
 
 /* This algorithm is based on the SCC algorithm presented by Keith
@@ -360,10 +361,11 @@  static vn_ssa_aux_t last_pushed_avail;
    correct.  */
 static vn_tables_t valid_info;
 
+tree (*vn_valueize) (tree);
 
 /* Valueization hook for simplify_replace_tree.  Valueize NAME if it is
    an SSA name, otherwise just return it.  */
-tree (*vn_valueize) (tree);
+
 static tree
 vn_valueize_for_srt (tree t, void* context ATTRIBUTE_UNUSED)
 {
@@ -380,6 +382,36 @@  vn_valueize_for_srt (tree t, void* context ATTRIBUTE_UNUSED)
   return res;
 }
 
+class vn_fur : public fur_depend
+{
+public:
+  vn_fur (gimple *s, gori_compute *g, range_query *q)
+    : fur_depend (s, g, q) {}
+  virtual bool get_operand (vrange &, tree);
+  virtual bool get_phi_operand (vrange &, tree, edge);
+};
+
+bool
+vn_fur::get_operand (vrange &r, tree op)
+{
+  return fur_depend::get_operand (r, vn_valueize (op));
+}
+
+bool
+vn_fur::get_phi_operand (vrange &r, tree op, edge e)
+{
+  // ???  Check region boundary, avoid walking outside of the region
+  // somehow?  Possibly when initializing the ranger object?
+  // or can/should we simply return 'false' when we arrive with
+  // e being the entry edge?  What about for non-PHIs?  Can we
+  // somehow force to only use the global range and stop caching
+  // after that?
+  if (e->flags & EDGE_EXECUTABLE)
+    return fur_depend::get_phi_operand (r, vn_valueize (op), e);
+  r.set_undefined ();
+  return true;
+}
+
 
 /* This represents the top of the VN lattice, which is the universal
    value.  */
@@ -7292,12 +7324,12 @@  eliminate_with_rpo_vn (bitmap inserted_exprs)
 
 static unsigned
 do_rpo_vn_1 (function *fn, edge entry, bitmap exit_bbs,
-	     bool iterate, bool eliminate, vn_lookup_kind kind);
+	     bool iterate, bool eliminate, vn_lookup_kind kind, gimple_ranger *);
 
 void
 run_rpo_vn (vn_lookup_kind kind)
 {
-  do_rpo_vn_1 (cfun, NULL, NULL, true, false, kind);
+  do_rpo_vn_1 (cfun, NULL, NULL, true, false, kind, NULL);
 
   /* ???  Prune requirement of these.  */
   constant_to_value_id = new hash_table<vn_constant_hasher> (23);
@@ -7580,7 +7612,8 @@  insert_related_predicates_on_edge (enum tree_code code, tree *ops, edge pred_e)
 static unsigned
 process_bb (rpo_elim &avail, basic_block bb,
 	    bool bb_visited, bool iterate_phis, bool iterate, bool eliminate,
-	    bool do_region, bitmap exit_bbs, bool skip_phis)
+	    bool do_region, bitmap exit_bbs, bool skip_phis,
+	    gimple_ranger *ranger)
 {
   unsigned todo = 0;
   edge_iterator ei;
@@ -7766,6 +7799,20 @@  process_bb (rpo_elim &avail, basic_block bb,
 		      }
 		  }
 	      }
+	    if (!val && ranger)
+	      {
+		int_range_max r;
+		fold_using_range f;
+		// ???  We get here with not valueized operands but
+		// the fur_source only is involved for operands of
+		// this very stmt, not when ranger prefills its cache
+		// walking use->def edges.
+		vn_fur src (last, &ranger->gori (), ranger);
+		tree tem;
+		if (f.fold_stmt (r, last, src)
+		    && r.singleton_p (&tem))
+		  val = tem;
+	      }
 	    if (val)
 	      e = find_taken_edge (bb, val);
 	    if (! e)
@@ -7997,7 +8044,8 @@  do_unwind (unwind_state *to, rpo_elim &avail)
 
 static unsigned
 do_rpo_vn_1 (function *fn, edge entry, bitmap exit_bbs,
-	     bool iterate, bool eliminate, vn_lookup_kind kind)
+	     bool iterate, bool eliminate, vn_lookup_kind kind,
+	     gimple_ranger *ranger)
 {
   unsigned todo = 0;
   default_vn_walk_kind = kind;
@@ -8213,7 +8261,8 @@  do_rpo_vn_1 (function *fn, edge entry, bitmap exit_bbs,
 	todo |= process_bb (avail, bb,
 			    rpo_state[idx].visited != 0,
 			    rpo_state[idx].iterate,
-			    iterate, eliminate, do_region, exit_bbs, false);
+			    iterate, eliminate, do_region, exit_bbs, false,
+			    ranger);
 	rpo_state[idx].visited++;
 
 	/* Verify if changed values flow over executable outgoing backedges
@@ -8326,7 +8375,8 @@  do_rpo_vn_1 (function *fn, edge entry, bitmap exit_bbs,
 	  nblk++;
 	  todo |= process_bb (avail, bb, false, false, false, eliminate,
 			      do_region, exit_bbs,
-			      skip_entry_phis && bb == entry->dest);
+			      skip_entry_phis && bb == entry->dest,
+			      ranger);
 	  rpo_state[idx].visited++;
 
 	  FOR_EACH_EDGE (e, ei, bb->succs)
@@ -8419,14 +8469,17 @@  do_rpo_vn_1 (function *fn, edge entry, bitmap exit_bbs,
    If ITERATE is true then treat backedges optimistically as not
    executed and iterate.  If ELIMINATE is true then perform
    elimination, otherwise leave that to the caller.
-   KIND specifies the amount of work done for handling memory operations.  */
+   KIND specifies the amount of work done for handling memory operations.
+   When RANGER is specified use that to simplify conditionals.  */
 
 unsigned
 do_rpo_vn (function *fn, edge entry, bitmap exit_bbs,
-	   bool iterate, bool eliminate, vn_lookup_kind kind)
+	   bool iterate, bool eliminate, vn_lookup_kind kind,
+	   gimple_ranger *ranger)
 {
   auto_timevar tv (TV_TREE_RPO_VN);
-  unsigned todo = do_rpo_vn_1 (fn, entry, exit_bbs, iterate, eliminate, kind);
+  unsigned todo = do_rpo_vn_1 (fn, entry, exit_bbs, iterate, eliminate, kind,
+			       ranger);
   free_rpo_vn ();
   return todo;
 }
@@ -8482,7 +8535,9 @@  pass_fre::execute (function *fun)
   if (iterate_p)
     loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
 
-  todo = do_rpo_vn_1 (fun, NULL, NULL, iterate_p, true, VN_WALKREWRITE);
+  gimple_ranger ranger;
+  todo = do_rpo_vn_1 (fun, NULL, NULL, iterate_p, true, VN_WALKREWRITE,
+		      &ranger); //!iterate_p ? &ranger : NULL);
   free_rpo_vn ();
 
   if (iterate_p)
diff --git a/gcc/tree-ssa-sccvn.h b/gcc/tree-ssa-sccvn.h
index abcf7e666c2..b9c4ac787b4 100644
--- a/gcc/tree-ssa-sccvn.h
+++ b/gcc/tree-ssa-sccvn.h
@@ -298,7 +298,8 @@  tree vn_nary_simplify (vn_nary_op_t);
 unsigned do_rpo_vn (function *, edge, bitmap,
 		    /* iterate */ bool = false,
 		    /* eliminate */ bool = true,
-		    vn_lookup_kind = VN_WALKREWRITE);
+		    vn_lookup_kind = VN_WALKREWRITE,
+		    class gimple_ranger * = NULL);
 
 /* Private interface for PRE.  */
 void run_rpo_vn (vn_lookup_kind);