[1/2] Improve do_store_flag for single bit comparison against 0

Message ID 20230519021410.1841811-1-apinski@marvell.com
State Accepted
Headers
Series [1/2] Improve do_store_flag for single bit comparison against 0 |

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

Andrew Pinski May 19, 2023, 2:14 a.m. UTC
  While working something else, I noticed we could improve
the following function code generation:
```
unsigned f(unsigned t)
{
  if (t & ~(1<<30)) __builtin_unreachable();
  return t != 0;
}
```
Right know we just emit a comparison against 0 instead
of just a shift right by 30.
There is code in do_store_flag which already optimizes
`(t & 1<<30) != 0` to `(t >> 30) & 1`. This patch
extends it to handle the case where we know t has a
nonzero of just one bit set.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

	* expr.cc (do_store_flag): Extend the one bit checking case
	to handle the case where we don't have an and but rather still
	one bit is known to be non-zero.
---
 gcc/expr.cc | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)
  

Comments

Jeff Law May 19, 2023, 4:38 p.m. UTC | #1
On 5/18/23 20:14, Andrew Pinski via Gcc-patches wrote:
> While working something else, I noticed we could improve
> the following function code generation:
> ```
> unsigned f(unsigned t)
> {
>    if (t & ~(1<<30)) __builtin_unreachable();
>    return t != 0;
> }
> ```
> Right know we just emit a comparison against 0 instead
> of just a shift right by 30.
> There is code in do_store_flag which already optimizes
> `(t & 1<<30) != 0` to `(t >> 30) & 1`. This patch
> extends it to handle the case where we know t has a
> nonzero of just one bit set.
> 
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> 
> gcc/ChangeLog:
> 
> 	* expr.cc (do_store_flag): Extend the one bit checking case
> 	to handle the case where we don't have an and but rather still
> 	one bit is known to be non-zero.
So as we touched on in IRC, the concern is targets where the cost of the 
shift depends on the number of bits shifted.  Can we look at costing 
here to determine the initial RTL generation approach?

Another approach that would work for some targets is a single bit 
extract.  In theory we should be discovering the extract idiom from the 
shift+and form, but I'm always concerned that it's going to be missed 
for one or more oddball reasons.

jeff
  
Andrew Pinski May 19, 2023, 10:04 p.m. UTC | #2
On Fri, May 19, 2023 at 9:40 AM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 5/18/23 20:14, Andrew Pinski via Gcc-patches wrote:
> > While working something else, I noticed we could improve
> > the following function code generation:
> > ```
> > unsigned f(unsigned t)
> > {
> >    if (t & ~(1<<30)) __builtin_unreachable();
> >    return t != 0;
> > }
> > ```
> > Right know we just emit a comparison against 0 instead
> > of just a shift right by 30.
> > There is code in do_store_flag which already optimizes
> > `(t & 1<<30) != 0` to `(t >> 30) & 1`. This patch
> > extends it to handle the case where we know t has a
> > nonzero of just one bit set.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> > gcc/ChangeLog:
> >
> >       * expr.cc (do_store_flag): Extend the one bit checking case
> >       to handle the case where we don't have an and but rather still
> >       one bit is known to be non-zero.
> So as we touched on in IRC, the concern is targets where the cost of the
> shift depends on the number of bits shifted.  Can we look at costing
> here to determine the initial RTL generation approach?
>
> Another approach that would work for some targets is a single bit
> extract.  In theory we should be discovering the extract idiom from the
> shift+and form, but I'm always concerned that it's going to be missed
> for one or more oddball reasons.

I now have a patch set which does the extraction directly rather than having
combine try to combine it later on. This actually fixes an issue with avr target
which expands out the shift by doing a loop. Since we are using
extract_bit_field,
if a target does not have an extract pattern, it will expand using
shift+and form instead.
I will resubmit this and the other patch after this new patch set is completed.

Thanks,
Andrew Pinski

>
> jeff
>
  
Richard Biener May 22, 2023, 11:55 a.m. UTC | #3
On Fri, May 19, 2023 at 4:15 AM Andrew Pinski via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> While working something else, I noticed we could improve
> the following function code generation:
> ```
> unsigned f(unsigned t)
> {
>   if (t & ~(1<<30)) __builtin_unreachable();
>   return t != 0;
> }
> ```
> Right know we just emit a comparison against 0 instead
> of just a shift right by 30.
> There is code in do_store_flag which already optimizes
> `(t & 1<<30) != 0` to `(t >> 30) & 1`. This patch
> extends it to handle the case where we know t has a
> nonzero of just one bit set.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> gcc/ChangeLog:
>
>         * expr.cc (do_store_flag): Extend the one bit checking case
>         to handle the case where we don't have an and but rather still
>         one bit is known to be non-zero.
> ---
>  gcc/expr.cc | 27 +++++++++++++++++++++------
>  1 file changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 5ede094e705..91528e734e7 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -13083,15 +13083,30 @@ do_store_flag (sepops ops, rtx target, machine_mode mode)
>        && integer_zerop (arg1)
>        && (TYPE_PRECISION (ops->type) != 1 || TYPE_UNSIGNED (ops->type)))
>      {
> -      gimple *srcstmt = get_def_for_expr (arg0, BIT_AND_EXPR);
> -      if (srcstmt
> -         && integer_pow2p (gimple_assign_rhs2 (srcstmt)))
> +      wide_int nz = tree_nonzero_bits (arg0);
> +
> +      if (wi::popcount (nz) == 1)
>         {
> +         tree op0;
> +         tree op1;
> +         gimple *srcstmt = get_def_for_expr (arg0, BIT_AND_EXPR);
> +         /* If the defining statement was (x & POW2), then remove the and
> +            as we are going to add it back. */
> +         if (srcstmt
> +             && integer_pow2p (gimple_assign_rhs2 (srcstmt)))
> +           {
> +             op0 = gimple_assign_rhs1 (srcstmt);
> +             op1 = gimple_assign_rhs2 (srcstmt);
> +           }
> +         else
> +           {
> +             op0 = arg0;
> +             op1 = wide_int_to_tree (TREE_TYPE (op0), nz);
> +           }
>           enum tree_code tcode = code == NE ? NE_EXPR : EQ_EXPR;
>           type = lang_hooks.types.type_for_mode (mode, unsignedp);
> -         tree temp = fold_build2_loc (loc, BIT_AND_EXPR, TREE_TYPE (arg1),
> -                                      gimple_assign_rhs1 (srcstmt),
> -                                      gimple_assign_rhs2 (srcstmt));
> +         tree temp = fold_build2_loc (loc, BIT_AND_EXPR, TREE_TYPE (op0),
> +                                      op0, op1);
>           temp = fold_single_bit_test (loc, tcode, temp, arg1, type);
>           if (temp)
>             return expand_expr (temp, target, VOIDmode, EXPAND_NORMAL);

I wonder if, instead of expanding expand with these kind of tricks we
want to instead
add to ISEL and use direct optab IFNs for things we matched?  In
particular I think
we do want to get rid of TER but the above adds another use of get_def_for_expr.

As Jeff says the above doesn't look like it includes costing so that would be an
argument to make it a generic match.pd transform (it appears to be "simpler")?

Richard.

> --
> 2.31.1
>
  
Andrew Pinski May 22, 2023, 3:21 p.m. UTC | #4
On Mon, May 22, 2023 at 4:56 AM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Fri, May 19, 2023 at 4:15 AM Andrew Pinski via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > While working something else, I noticed we could improve
> > the following function code generation:
> > ```
> > unsigned f(unsigned t)
> > {
> >   if (t & ~(1<<30)) __builtin_unreachable();
> >   return t != 0;
> > }
> > ```
> > Right know we just emit a comparison against 0 instead
> > of just a shift right by 30.
> > There is code in do_store_flag which already optimizes
> > `(t & 1<<30) != 0` to `(t >> 30) & 1`. This patch
> > extends it to handle the case where we know t has a
> > nonzero of just one bit set.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> > gcc/ChangeLog:
> >
> >         * expr.cc (do_store_flag): Extend the one bit checking case
> >         to handle the case where we don't have an and but rather still
> >         one bit is known to be non-zero.
> > ---
> >  gcc/expr.cc | 27 +++++++++++++++++++++------
> >  1 file changed, 21 insertions(+), 6 deletions(-)
> >
> > diff --git a/gcc/expr.cc b/gcc/expr.cc
> > index 5ede094e705..91528e734e7 100644
> > --- a/gcc/expr.cc
> > +++ b/gcc/expr.cc
> > @@ -13083,15 +13083,30 @@ do_store_flag (sepops ops, rtx target, machine_mode mode)
> >        && integer_zerop (arg1)
> >        && (TYPE_PRECISION (ops->type) != 1 || TYPE_UNSIGNED (ops->type)))
> >      {
> > -      gimple *srcstmt = get_def_for_expr (arg0, BIT_AND_EXPR);
> > -      if (srcstmt
> > -         && integer_pow2p (gimple_assign_rhs2 (srcstmt)))
> > +      wide_int nz = tree_nonzero_bits (arg0);
> > +
> > +      if (wi::popcount (nz) == 1)
> >         {
> > +         tree op0;
> > +         tree op1;
> > +         gimple *srcstmt = get_def_for_expr (arg0, BIT_AND_EXPR);
> > +         /* If the defining statement was (x & POW2), then remove the and
> > +            as we are going to add it back. */
> > +         if (srcstmt
> > +             && integer_pow2p (gimple_assign_rhs2 (srcstmt)))
> > +           {
> > +             op0 = gimple_assign_rhs1 (srcstmt);
> > +             op1 = gimple_assign_rhs2 (srcstmt);
> > +           }
> > +         else
> > +           {
> > +             op0 = arg0;
> > +             op1 = wide_int_to_tree (TREE_TYPE (op0), nz);
> > +           }
> >           enum tree_code tcode = code == NE ? NE_EXPR : EQ_EXPR;
> >           type = lang_hooks.types.type_for_mode (mode, unsignedp);
> > -         tree temp = fold_build2_loc (loc, BIT_AND_EXPR, TREE_TYPE (arg1),
> > -                                      gimple_assign_rhs1 (srcstmt),
> > -                                      gimple_assign_rhs2 (srcstmt));
> > +         tree temp = fold_build2_loc (loc, BIT_AND_EXPR, TREE_TYPE (op0),
> > +                                      op0, op1);
> >           temp = fold_single_bit_test (loc, tcode, temp, arg1, type);
> >           if (temp)
> >             return expand_expr (temp, target, VOIDmode, EXPAND_NORMAL);
>
> I wonder if, instead of expanding expand with these kind of tricks we
> want to instead
> add to ISEL and use direct optab IFNs for things we matched?  In
> particular I think
> we do want to get rid of TER but the above adds another use of get_def_for_expr.

The above does not add another at all. It was there before, it just
moves it around slightly. Instead we depend on the non-zero bits to be
correct before even trying get_def_for_expr .
The get_def_for_expr is there to remove the & if it can be ter'ed.

>
> As Jeff says the above doesn't look like it includes costing so that would be an
> argument to make it a generic match.pd transform (it appears to be "simpler")?

For the TER case, it would be same number of gimple instructions so
that can happen if we want
t = a & CST
result = t != 0
vs:
t1 = BIT_FIELD_REF <a, 1, N>
result = (bool)t1

For the non-TER case (which is what this patch is trying to solve).
we just have `t != 0` (where t has a non-zero value of CST) so it might increase
the number of gimple instructions by 1.

Is that ok? Or should that still happen in expand only.

The cost issue between a != 0 vs bit_extraction (for the non-ter case)
is something which I will be solving next weekend.

>
> Richard.
>
> > --
> > 2.31.1
> >
  
Richard Biener May 23, 2023, 6:07 a.m. UTC | #5
On Mon, May 22, 2023 at 5:21 PM Andrew Pinski <pinskia@gmail.com> wrote:
>
> On Mon, May 22, 2023 at 4:56 AM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Fri, May 19, 2023 at 4:15 AM Andrew Pinski via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > While working something else, I noticed we could improve
> > > the following function code generation:
> > > ```
> > > unsigned f(unsigned t)
> > > {
> > >   if (t & ~(1<<30)) __builtin_unreachable();
> > >   return t != 0;
> > > }
> > > ```
> > > Right know we just emit a comparison against 0 instead
> > > of just a shift right by 30.
> > > There is code in do_store_flag which already optimizes
> > > `(t & 1<<30) != 0` to `(t >> 30) & 1`. This patch
> > > extends it to handle the case where we know t has a
> > > nonzero of just one bit set.
> > >
> > > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> > >
> > > gcc/ChangeLog:
> > >
> > >         * expr.cc (do_store_flag): Extend the one bit checking case
> > >         to handle the case where we don't have an and but rather still
> > >         one bit is known to be non-zero.
> > > ---
> > >  gcc/expr.cc | 27 +++++++++++++++++++++------
> > >  1 file changed, 21 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/gcc/expr.cc b/gcc/expr.cc
> > > index 5ede094e705..91528e734e7 100644
> > > --- a/gcc/expr.cc
> > > +++ b/gcc/expr.cc
> > > @@ -13083,15 +13083,30 @@ do_store_flag (sepops ops, rtx target, machine_mode mode)
> > >        && integer_zerop (arg1)
> > >        && (TYPE_PRECISION (ops->type) != 1 || TYPE_UNSIGNED (ops->type)))
> > >      {
> > > -      gimple *srcstmt = get_def_for_expr (arg0, BIT_AND_EXPR);
> > > -      if (srcstmt
> > > -         && integer_pow2p (gimple_assign_rhs2 (srcstmt)))
> > > +      wide_int nz = tree_nonzero_bits (arg0);
> > > +
> > > +      if (wi::popcount (nz) == 1)
> > >         {
> > > +         tree op0;
> > > +         tree op1;
> > > +         gimple *srcstmt = get_def_for_expr (arg0, BIT_AND_EXPR);
> > > +         /* If the defining statement was (x & POW2), then remove the and
> > > +            as we are going to add it back. */
> > > +         if (srcstmt
> > > +             && integer_pow2p (gimple_assign_rhs2 (srcstmt)))
> > > +           {
> > > +             op0 = gimple_assign_rhs1 (srcstmt);
> > > +             op1 = gimple_assign_rhs2 (srcstmt);
> > > +           }
> > > +         else
> > > +           {
> > > +             op0 = arg0;
> > > +             op1 = wide_int_to_tree (TREE_TYPE (op0), nz);
> > > +           }
> > >           enum tree_code tcode = code == NE ? NE_EXPR : EQ_EXPR;
> > >           type = lang_hooks.types.type_for_mode (mode, unsignedp);
> > > -         tree temp = fold_build2_loc (loc, BIT_AND_EXPR, TREE_TYPE (arg1),
> > > -                                      gimple_assign_rhs1 (srcstmt),
> > > -                                      gimple_assign_rhs2 (srcstmt));
> > > +         tree temp = fold_build2_loc (loc, BIT_AND_EXPR, TREE_TYPE (op0),
> > > +                                      op0, op1);
> > >           temp = fold_single_bit_test (loc, tcode, temp, arg1, type);
> > >           if (temp)
> > >             return expand_expr (temp, target, VOIDmode, EXPAND_NORMAL);
> >
> > I wonder if, instead of expanding expand with these kind of tricks we
> > want to instead
> > add to ISEL and use direct optab IFNs for things we matched?  In
> > particular I think
> > we do want to get rid of TER but the above adds another use of get_def_for_expr.
>
> The above does not add another at all. It was there before, it just
> moves it around slightly. Instead we depend on the non-zero bits to be
> correct before even trying get_def_for_expr .
> The get_def_for_expr is there to remove the & if it can be ter'ed.
>
> >
> > As Jeff says the above doesn't look like it includes costing so that would be an
> > argument to make it a generic match.pd transform (it appears to be "simpler")?
>
> For the TER case, it would be same number of gimple instructions so
> that can happen if we want
> t = a & CST
> result = t != 0
> vs:
> t1 = BIT_FIELD_REF <a, 1, N>
> result = (bool)t1
>
> For the non-TER case (which is what this patch is trying to solve).
> we just have `t != 0` (where t has a non-zero value of CST) so it might increase
> the number of gimple instructions by 1.
>
> Is that ok? Or should that still happen in expand only.

I was looking at the `(t & 1<<30) != 0` to `(t >> 30) & 1` case mostly
(there's conversions involved at both sides possibly).  I don't think we
want BIT_FIELD_REF for bit extraction as canonical variant on GIMPLE.

> The cost issue between a != 0 vs bit_extraction (for the non-ter case)
> is something which I will be solving next weekend.

Ah, good - if costs are involved expand is the correct place (or pre-expand
ISEL).

Richard.

> >
> > Richard.
> >
> > > --
> > > 2.31.1
> > >
  

Patch

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 5ede094e705..91528e734e7 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -13083,15 +13083,30 @@  do_store_flag (sepops ops, rtx target, machine_mode mode)
       && integer_zerop (arg1)
       && (TYPE_PRECISION (ops->type) != 1 || TYPE_UNSIGNED (ops->type)))
     {
-      gimple *srcstmt = get_def_for_expr (arg0, BIT_AND_EXPR);
-      if (srcstmt
-	  && integer_pow2p (gimple_assign_rhs2 (srcstmt)))
+      wide_int nz = tree_nonzero_bits (arg0);
+
+      if (wi::popcount (nz) == 1)
 	{
+	  tree op0;
+	  tree op1;
+	  gimple *srcstmt = get_def_for_expr (arg0, BIT_AND_EXPR);
+	  /* If the defining statement was (x & POW2), then remove the and
+	     as we are going to add it back. */
+	  if (srcstmt
+	      && integer_pow2p (gimple_assign_rhs2 (srcstmt)))
+	    {
+	      op0 = gimple_assign_rhs1 (srcstmt);
+	      op1 = gimple_assign_rhs2 (srcstmt);
+	    }
+	  else
+	    {
+	      op0 = arg0;
+	      op1 = wide_int_to_tree (TREE_TYPE (op0), nz);
+	    }
 	  enum tree_code tcode = code == NE ? NE_EXPR : EQ_EXPR;
 	  type = lang_hooks.types.type_for_mode (mode, unsignedp);
-	  tree temp = fold_build2_loc (loc, BIT_AND_EXPR, TREE_TYPE (arg1),
-				       gimple_assign_rhs1 (srcstmt),
-				       gimple_assign_rhs2 (srcstmt));
+	  tree temp = fold_build2_loc (loc, BIT_AND_EXPR, TREE_TYPE (op0),
+				       op0, op1);
 	  temp = fold_single_bit_test (loc, tcode, temp, arg1, type);
 	  if (temp)
 	    return expand_expr (temp, target, VOIDmode, EXPAND_NORMAL);