if-conv: Small improvement for expansion of complex PHIs [PR109154]

Message ID ZDpVL1KlzxWJKDzy@tucnak
State Unresolved
Headers
Series if-conv: Small improvement for expansion of complex PHIs [PR109154] |

Checks

Context Check Description
snail/gcc-patch-check warning Git am fail log

Commit Message

Jakub Jelinek April 15, 2023, 7:41 a.m. UTC
  Hi!

The following patch is just a dumb improvement, gets rid of 2 unnecessary
instructions on both the PR's original testcase and on the two reduced ones,
both on -mcpu=neoverse-v1 and -mavx512f.

The thing is, if we have args_len (args_len >= 2) unique PHI arguments,
we need only args_len - 1 COND_EXPRs to expand the PHI, because first
COND_EXPR can merge 2 unique arguments and all the following ones merge
another unique argument with the previously merged arguments,
while the code for mysterious reasons was always emitting args_len
COND_EXPRs, where the first COND_EXPR merged the first and second unique
arguments, the second COND_EXPR merged the second unique argument with
result of merging the first and second unique arguments and the rest was
already expectable, nth COND_EXPR for n > 2 merged the nth unique argument
with result of merging the previous unique arguments.
Now, in my understanding, the bb_predicate for bb's predecessor need to
form a disjunct set which together creates the successor's bb_predicate,
so I don't see why we'd need to check all the bb_predicates, if we check
all but one then when all those other ones are false the last bb_predicate
is necessarily true.  Given that the code attempts to sort argument with
most occurrences (so likely most complex combined predicate) last, I chose
not to test that last argument's predicate.
So e.g. on the testcase from comment 47 in the PR:
void
foo (int *f, int d, int e)
{
  for (int i = 0; i < 1024; i++)
    {
      int a = f[i];
      int t;
      if (a < 0)
	t = 1;
      else if (a < e)
	t = 1 - a * d;
      else
	t = 0;
      f[i] = t;
    }
}
we used to emit:
  _7 = a_10 < 0;
  _21 = a_10 >= 0;
  _22 = a_10 < e_11(D);
  _23 = _21 & _22;
  _26 = a_10 >= e_11(D);
  _27 = _21 & _26;
  _ifc__42 = _7 ? 1 : t_13;
  _ifc__43 = _23 ? t_13 : _ifc__42;
  t_6 = _27 ? 0 : _ifc__43;
while the following patch changes it to:
  _7 = a_10 < 0;
  _21 = a_10 >= 0;
  _22 = a_10 < e_11(D);
  _23 = _21 & _22;
  _ifc__42 = _23 ? t_13 : 0;
  t_6 = _7 ? 1 : _ifc__42;
which I believe should be sufficient for a PHI <1, t_13, 0>.

I've gathered some statistics and on x86_64-linux and i686-linux
bootstraps/regtests, this code triggers:
     92 4 4
    112 2 4
    141 3 4
   4046 3 3
(where 2nd number is args_len and 3rd argument EDGE_COUNT (bb->preds)
and first argument count of those from sort | uniq -c | sort -n).
In all these cases the patch should squeze one extra COND_EXPR and
its associated predicate (the latter only if it wasn't used elsewhere).

Incrementally, I think we should try to perform some analysis on which
predicates depend on inverses of other predicates and if possible try
to sort the arguments better and omit testing unnecessary predicates.
So essentially for the above testcase deconstruct it back to:
  _7 = a_10 < 0;
  _22 = a_10 < e_11(D);
  _ifc__42 = _22 ? t_13 : 0;
  t_6 = _7 ? 1 : _ifc__42;
which is like what this patch produces, but with the & a_10 >= 0 part
removed, because the last predicate is a_10 < 0 and so testing a_10 >= 0
on what appears on the false branch doesn't make sense.
But I'm afraid that will take more work than is doable in stage4 right now.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-04-15  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/109154
	* tree-if-conv.cc (predicate_scalar_phi): For complex PHIs, emit just
	args_len - 1 COND_EXPRs rather than args_len.  Formatting fix.


	Jakub
  

Comments

Richard Biener April 15, 2023, 9:13 a.m. UTC | #1
> Am 15.04.2023 um 10:30 schrieb Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org>:
> 
> Hi!
> 
> The following patch is just a dumb improvement, gets rid of 2 unnecessary
> instructions on both the PR's original testcase and on the two reduced ones,
> both on -mcpu=neoverse-v1 and -mavx512f.
> 
> The thing is, if we have args_len (args_len >= 2) unique PHI arguments,
> we need only args_len - 1 COND_EXPRs to expand the PHI, because first
> COND_EXPR can merge 2 unique arguments and all the following ones merge
> another unique argument with the previously merged arguments,
> while the code for mysterious reasons was always emitting args_len
> COND_EXPRs, where the first COND_EXPR merged the first and second unique
> arguments, the second COND_EXPR merged the second unique argument with
> result of merging the first and second unique arguments and the rest was
> already expectable, nth COND_EXPR for n > 2 merged the nth unique argument
> with result of merging the previous unique arguments.
> Now, in my understanding, the bb_predicate for bb's predecessor need to
> form a disjunct set which together creates the successor's bb_predicate,
> so I don't see why we'd need to check all the bb_predicates, if we check
> all but one then when all those other ones are false the last bb_predicate
> is necessarily true.  Given that the code attempts to sort argument with
> most occurrences (so likely most complex combined predicate) last, I chose
> not to test that last argument's predicate.
> So e.g. on the testcase from comment 47 in the PR:
> void
> foo (int *f, int d, int e)
> {
>  for (int i = 0; i < 1024; i++)
>    {
>      int a = f[i];
>      int t;
>      if (a < 0)
>    t = 1;
>      else if (a < e)
>    t = 1 - a * d;
>      else
>    t = 0;
>      f[i] = t;
>    }
> }
> we used to emit:
>  _7 = a_10 < 0;
>  _21 = a_10 >= 0;
>  _22 = a_10 < e_11(D);
>  _23 = _21 & _22;
>  _26 = a_10 >= e_11(D);
>  _27 = _21 & _26;
>  _ifc__42 = _7 ? 1 : t_13;
>  _ifc__43 = _23 ? t_13 : _ifc__42;
>  t_6 = _27 ? 0 : _ifc__43;
> while the following patch changes it to:
>  _7 = a_10 < 0;
>  _21 = a_10 >= 0;
>  _22 = a_10 < e_11(D);
>  _23 = _21 & _22;
>  _ifc__42 = _23 ? t_13 : 0;
>  t_6 = _7 ? 1 : _ifc__42;
> which I believe should be sufficient for a PHI <1, t_13, 0>.
> 
> I've gathered some statistics and on x86_64-linux and i686-linux
> bootstraps/regtests, this code triggers:
>     92 4 4
>    112 2 4
>    141 3 4
>   4046 3 3
> (where 2nd number is args_len and 3rd argument EDGE_COUNT (bb->preds)
> and first argument count of those from sort | uniq -c | sort -n).
> In all these cases the patch should squeze one extra COND_EXPR and
> its associated predicate (the latter only if it wasn't used elsewhere).
> 
> Incrementally, I think we should try to perform some analysis on which
> predicates depend on inverses of other predicates and if possible try
> to sort the arguments better and omit testing unnecessary predicates.
> So essentially for the above testcase deconstruct it back to:
>  _7 = a_10 < 0;
>  _22 = a_10 < e_11(D);
>  _ifc__42 = _22 ? t_13 : 0;
>  t_6 = _7 ? 1 : _ifc__42;
> which is like what this patch produces, but with the & a_10 >= 0 part
> removed, because the last predicate is a_10 < 0 and so testing a_10 >= 0
> on what appears on the false branch doesn't make sense.
> But I'm afraid that will take more work than is doable in stage4 right now.

Agreed.

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Yes - thanks for spotting this obvious improvement.

Richard 

> 2023-04-15  Jakub Jelinek  <jakub@redhat.com>
> 
>    PR tree-optimization/109154
>    * tree-if-conv.cc (predicate_scalar_phi): For complex PHIs, emit just
>    args_len - 1 COND_EXPRs rather than args_len.  Formatting fix.
> 
> --- gcc/tree-if-conv.cc.jj    2023-04-12 08:53:58.264496474 +0200
> +++ gcc/tree-if-conv.cc    2023-04-14 21:02:42.403826690 +0200
> @@ -2071,7 +2071,7 @@ predicate_scalar_phi (gphi *phi, gimple_
>     }
> 
>   /* Put element with max number of occurences to the end of ARGS.  */
> -  if (max_ind != -1 && max_ind +1 != (int) args_len)
> +  if (max_ind != -1 && max_ind + 1 != (int) args_len)
>     std::swap (args[args_len - 1], args[max_ind]);
> 
>   /* Handle one special case when number of arguments with different values
> @@ -2116,12 +2116,12 @@ predicate_scalar_phi (gphi *phi, gimple_
>       vec<int> *indexes;
>       tree type = TREE_TYPE (gimple_phi_result (phi));
>       tree lhs;
> -      arg1 = args[1];
> -      for (i = 0; i < args_len; i++)
> +      arg1 = args[args_len - 1];
> +      for (i = args_len - 1; i > 0; i--)
>    {
> -      arg0 = args[i];
> -      indexes = phi_arg_map.get (args[i]);
> -      if (i != args_len - 1)
> +      arg0 = args[i - 1];
> +      indexes = phi_arg_map.get (args[i - 1]);
> +      if (i != 1)
>        lhs = make_temp_ssa_name (type, NULL, "_ifc_");
>      else
>        lhs = res;
> 
>    Jakub
>
  

Patch

--- gcc/tree-if-conv.cc.jj	2023-04-12 08:53:58.264496474 +0200
+++ gcc/tree-if-conv.cc	2023-04-14 21:02:42.403826690 +0200
@@ -2071,7 +2071,7 @@  predicate_scalar_phi (gphi *phi, gimple_
     }
 
   /* Put element with max number of occurences to the end of ARGS.  */
-  if (max_ind != -1 && max_ind +1 != (int) args_len)
+  if (max_ind != -1 && max_ind + 1 != (int) args_len)
     std::swap (args[args_len - 1], args[max_ind]);
 
   /* Handle one special case when number of arguments with different values
@@ -2116,12 +2116,12 @@  predicate_scalar_phi (gphi *phi, gimple_
       vec<int> *indexes;
       tree type = TREE_TYPE (gimple_phi_result (phi));
       tree lhs;
-      arg1 = args[1];
-      for (i = 0; i < args_len; i++)
+      arg1 = args[args_len - 1];
+      for (i = args_len - 1; i > 0; i--)
 	{
-	  arg0 = args[i];
-	  indexes = phi_arg_map.get (args[i]);
-	  if (i != args_len - 1)
+	  arg0 = args[i - 1];
+	  indexes = phi_arg_map.get (args[i - 1]);
+	  if (i != 1)
 	    lhs = make_temp_ssa_name (type, NULL, "_ifc_");
 	  else
 	    lhs = res;