match.pd: Implement missed optimization (x << c) >> c -> -(x & 1) [PR101955]

Message ID 20230721150851.94504-1-drross@redhat.com
State Accepted
Series match.pd: Implement missed optimization (x << c) >> c -> -(x & 1) [PR101955]

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

Drew Ross July 21, 2023, 3:08 p.m. UTC
  Simplifies (x << c) >> c into -(x & 1), where x is of a signed integral
type of width >= int and c = precision(type) - 1. Tested successfully
on x86_64 and x86 targets.
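
For illustration (a sketch of the transform, assuming 32-bit int and GCC's
arithmetic right shift of signed values; t1 is taken from the new test below):

  int t1 (int x) { return (x << 31) >> 31; }  /* before: bit 0 moved to the sign bit and back */
  int t1 (int x) { return -(x & 1); }         /* after: 0 if bit 0 is clear, -1 if set */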

        PR middle-end/101955

gcc/ChangeLog:

        * match.pd (x << c) >> c -> -(x & 1): New simplification.

gcc/testsuite/ChangeLog:

        * gcc.dg/pr101955.c: New test.
---
 gcc/match.pd                    | 10 +++++
 gcc/testsuite/gcc.dg/pr101955.c | 69 +++++++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr101955.c
  

Comments

Andrew Pinski July 21, 2023, 5:27 p.m. UTC | #1
On Fri, Jul 21, 2023 at 8:09 AM Drew Ross via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Simplifies (x << c) >> c where x is a signed integral type of
> width >= int and c = precision(type) - 1 into -(x & 1). Tested successfully
> on x86_64 and x86 targets.

Thinking about this some more, I think this should be handled in
expand rather than on the gimple level.
It is very much related to PR 110717 even. We are basically truncating
to a signed one bit integer and then sign extending that across the
whole code.
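
In C terms, a sketch of that view (narrowing to a signed bit-field is
implementation-defined in ISO C, but GCC reduces modulo 2^width):

  struct b1 { int v : 1; };   /* a signed one-bit integer */
  int sext1 (int x)
  {
    struct b1 t = { 0 };
    t.v = x;                  /* truncate to one signed bit */
    return t.v;               /* sign-extend across the word: 0 or -1 */
  }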

Thanks,
Andrew

>
>         PR middle-end/101955
>
> gcc/ChangeLog:
>
>         * match.pd (x << c) >> c -> -(x & 1): New simplification.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/pr101955.c: New test.
> ---
>  gcc/match.pd                    | 10 +++++
>  gcc/testsuite/gcc.dg/pr101955.c | 69 +++++++++++++++++++++++++++++++++
>  2 files changed, 79 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/pr101955.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 8543f777a28..820fc890e8e 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3766,6 +3766,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>        && (wi::ltu_p (wi::to_wide (@1), element_precision (type))))
>    (bit_and @0 (rshift { build_minus_one_cst (type); } @1))))
>
> +/* Optimize (X << C) >> C where C = precision(type) - 1 and X is signed
> +   into -(X & 1).  */
> +(simplify
> + (rshift (nop_convert? (lshift @0 uniform_integer_cst_p@1)) @@1)
> + (with { tree cst = uniform_integer_cst_p (@1); }
> + (if (ANY_INTEGRAL_TYPE_P (type)
> +      && !TYPE_UNSIGNED (type)
> +      && wi::eq_p (wi::to_wide (cst), element_precision (type) - 1))
> +  (negate (bit_and (convert @0) { build_one_cst (type); })))))
> +
>  /* Optimize x >> x into 0 */
>  (simplify
>   (rshift @0 @0)
> diff --git a/gcc/testsuite/gcc.dg/pr101955.c b/gcc/testsuite/gcc.dg/pr101955.c
> new file mode 100644
> index 00000000000..386154911c5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr101955.c
> @@ -0,0 +1,69 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-dse1 -Wno-psabi" } */
> +
> +typedef int v4si __attribute__((vector_size(4 * sizeof(int))));
> +
> +__attribute__((noipa)) int
> +t1 (int x)
> +{
> +  return (x << 31) >> 31;
> +}
> +
> +__attribute__((noipa)) int
> +t2 (int x)
> +{
> +  int y = x << 31;
> +  int z = y >> 31;
> +  return z;
> +}
> +
> +__attribute__((noipa)) int
> +t3 (int x)
> +{
> +  int w = 31;
> +  int y = x << w;
> +  int z = y >> w;
> +  return z;
> +}
> +
> +__attribute__((noipa)) long long
> +t4 (long long x)
> +{
> +  return (x << 63) >> 63;
> +}
> +
> +__attribute__((noipa)) long long
> +t5 (long long x)
> +{
> +  long long y = x << 63;
> +  long long z = y >> 63;
> +  return z;
> +}
> +
> +__attribute__((noipa)) long long
> +t6 (long long x)
> +{
> +  int w = 63;
> +  long long y = x << w;
> +  long long z = y >> w;
> +  return z;
> +}
> +
> +__attribute__((noipa)) v4si
> +t7 (v4si x)
> +{
> +  return (x << 31) >> 31;
> +}
> +
> +__attribute__((noipa)) v4si
> +t8 (v4si x)
> +{
> +  v4si t = {31,31,31,31};
> +  return (x << t) >> t;
> +}
> +
> +/* { dg-final { scan-tree-dump-not " >> " "dse1" } } */
> +/* { dg-final { scan-tree-dump-not " << " "dse1" } } */
> +/* { dg-final { scan-tree-dump-times " -" 8 "dse1" } } */
> +/* { dg-final { scan-tree-dump-times " & " 8 "dse1" } } */
> +
> --
> 2.39.3
>
  
Jeff Law July 22, 2023, 6:09 a.m. UTC | #2
On 7/21/23 11:27, Andrew Pinski via Gcc-patches wrote:
> On Fri, Jul 21, 2023 at 8:09 AM Drew Ross via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> Simplifies (x << c) >> c where x is a signed integral type of
>> width >= int and c = precision(type) - 1 into -(x & 1). Tested successfully
>> on x86_64 and x86 targets.
> 
> Thinking about this some more, I think this should be handled in
> expand rather than on the gimple level.
> It is very much related to PR 110717 even. We are basically truncating
> to a signed one bit integer and then sign extending that across the
> whole code.
But why defer it to expand?  This idiom is going to generate a -1,0 
result which is definitely interesting from a further simplification 
standpoint.
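
For instance (a hypothetical follow-on; the branchless select is illustrative,
not taken from the thread):

  int sel (int x, int a, int b)
  {
    int m = (x << 31) >> 31;    /* -1 if bit 0 of x is set, else 0 */
    return (a & m) | (b & ~m);  /* select a or b without a branch */
  }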

Jeff
  
Richard Biener July 24, 2023, 7:16 a.m. UTC | #3
On Sat, Jul 22, 2023 at 8:09 AM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 7/21/23 11:27, Andrew Pinski via Gcc-patches wrote:
> > On Fri, Jul 21, 2023 at 8:09 AM Drew Ross via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >>
> >> Simplifies (x << c) >> c where x is a signed integral type of
> >> width >= int and c = precision(type) - 1 into -(x & 1). Tested successfully
> >> on x86_64 and x86 targets.
> >
> > Thinking about this some more, I think this should be handled in
> > expand rather than on the gimple level.
> > It is very much related to PR 110717 even. We are basically truncating
> > to a signed one bit integer and then sign extending that across the
> > whole code.
> But why defer it to expand?  This idiom is going to generate a -1,0
> result which is definitely interesting from a further simplification
> standpoint.

It's not 'simpler' so it would be a canonicalization.  We talked about
providing a SEXT_EXPR at some point (sign-extend from constant bit N).

Another canonicalization to existing ops would be

 (convert (convert:sbool @0))

where sbool is a 1-bit precision signed type.  I think that's a better
canonicalization
than -(x & 1)?  For zero-extensions we canonicalize such a conversion sequence
to x & const-mask.  For sign-extensions there's no single operation
representation.
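
In C terms, a sketch of the two cases (assuming 32-bit int and an 8-bit
intermediate width):

  unsigned zext (unsigned x) { return (unsigned)(unsigned char)x; }  /* folds to x & 0xff */
  int      sext (int x)      { return (int)(signed char)x; }         /* no single-op form */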

Richard.

>
> Jeff
  
Drew Ross July 24, 2023, 7:29 p.m. UTC | #4
So would something like

(simplify
 (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
 (with { tree stype = build_nonstandard_integer_type (1, 0); }
 (if (INTEGRAL_TYPE_P (type)
      && !TYPE_UNSIGNED (type)
      && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
  (convert (convert:stype @0)))))

work?

Drew

On Mon, Jul 24, 2023 at 3:16 AM Richard Biener <richard.guenther@gmail.com>
wrote:

> On Sat, Jul 22, 2023 at 8:09 AM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> >
> >
> > On 7/21/23 11:27, Andrew Pinski via Gcc-patches wrote:
> > > On Fri, Jul 21, 2023 at 8:09 AM Drew Ross via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > >>
> > >> Simplifies (x << c) >> c where x is a signed integral type of
> > >> width >= int and c = precision(type) - 1 into -(x & 1). Tested
> successfully
> > >> on x86_64 and x86 targets.
> > >
> > > Thinking about this some more, I think this should be handled in
> > > expand rather than on the gimple level.
> > > It is very much related to PR 110717 even. We are basically truncating
> > > to a signed one bit integer and then sign extending that across the
> > > whole code.
> > But why defer it to expand?  This idiom is going to generate a -1,0
> > result which is definitely interesting from a further simplification
> > standpoint.
>
> It's not 'simpler' so it would be a canonicalization.  We talked about
> providing a SEXT_EXPR at some point (sign-extend from constant bit N).
>
> Another canonicalization to existing ops would be
>
>  (convert (convert:sbool @0))
>
> where sbool is a 1-bit precision signed type.  I think that's a better
> canonicalization
> than -(x & 1)?  For zero-extensions we canonicalize such a conversion
> sequence
> to x & const-mask.  For sign-extensions there's no single operation
> representation.
>
> Richard.
>
> >
> > Jeff
>
>
  
Jakub Jelinek July 24, 2023, 7:42 p.m. UTC | #5
On Mon, Jul 24, 2023 at 03:29:54PM -0400, Drew Ross via Gcc-patches wrote:
> So would something like
> 
> (simplify
>  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
>  (with { tree stype = build_nonstandard_integer_type (1, 0); }
>  (if (INTEGRAL_TYPE_P (type)
>       && !TYPE_UNSIGNED (type)
>       && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
>   (convert (convert:stype @0)))))
> 
> work?

Certainly swap the if and with, and the (with then should be indented by 1
column to the right of (if and the (convert one further (the reason for the
swapping is not to call build_nonstandard_integer_type when it will not be
needed, which will probably be far more often than an actual match).
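
Applied to the pattern quoted above, that reordering would read (a sketch;
Drew's revised version later in the thread restructures it further):

(simplify
 (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
 (if (INTEGRAL_TYPE_P (type)
      && !TYPE_UNSIGNED (type)
      && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
  (with { tree stype = build_nonstandard_integer_type (1, 0); }
   (convert (convert:stype @0)))))
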
As discussed privately, the above isn't what we want for vectors, and the 2
shifts are probably best on most arches because even when using -(x & 1) the
{ 1, 1, 1, ... } vector would often need to be loaded from memory.

	Jakub
  
Richard Biener July 25, 2023, 6:54 a.m. UTC | #6
On Mon, Jul 24, 2023 at 9:42 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Jul 24, 2023 at 03:29:54PM -0400, Drew Ross via Gcc-patches wrote:
> > So would something like
> >
> > (simplify
> >  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
> >  (with { tree stype = build_nonstandard_integer_type (1, 0); }
> >  (if (INTEGRAL_TYPE_P (type)
> >       && !TYPE_UNSIGNED (type)
> >       && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
> >   (convert (convert:stype @0)))))
> >
> > work?
>
> Certainly swap the if and with, and the (with then should be indented by 1
> column to the right of (if and the (convert one further (the reason for the
> swapping is not to call build_nonstandard_integer_type when it will not be
> needed, which will probably be far more often than an actual match).

With that fixed I think for non-vector integrals the above is the most suitable
canonical form of a sign-extension.  Note it should also work for any other
constant shift amount - just use the appropriate intermediate precision for
the truncating type.  You might also want to verify what RTL expansion
produces before/after - it at least shouldn't be worse.  We _might_ want
to consider to only use the converts when the intermediate type has
mode precision (and as a special case allow one bit as in your above case)
so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).
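
Concretely (a sketch, 32-bit int): (x << 24) >> 24 keeps the 8 low bits and
sign-extends them, and the 8-bit intermediate type has QImode precision, so it
can expand that way:

  int sext8 (int x) { return (x << 24) >> 24; }  /* (sign_extend:SI (subreg:QI reg)) */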

> As discussed privately, the above isn't what we want for vectors, and the 2
> shifts are probably best on most arches because even when using -(x & 1) the
> { 1, 1, 1, ... } vector would often need to be loaded from memory.

I think for vectors a vpcmpgt {0,0,0,..}, %xmm is the cheapest way of
producing the result.  Note that to reflect this on GIMPLE you'd need

  _2 = _1 < { 0,0...};
  res = _2 ? { -1, -1, ...} : { 0, 0,...};

because whether the ISA has a way to produce all-ones masks isn't known.

For scalars using -(T)(_1 < 0) would also be possible.

That said - do you have any testcase where the canonicalization is an enabler
for further transforms or was this requested stand-alone?

Thanks,
Richard.

>         Jakub
>
  
Drew Ross July 25, 2023, 7:25 p.m. UTC | #7
> With that fixed I think for non-vector integrals the above is the most suitable
> canonical form of a sign-extension.  Note it should also work for any other
> constant shift amount - just use the appropriate intermediate precision for
> the truncating type.
> We _might_ want
> to consider to only use the converts when the intermediate type has
> mode precision (and as a special case allow one bit as in your above case)
> so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).

Here is a pattern that only matches truncations that result in mode
precision (or a precision of 1):

(simplify
 (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
 (if (INTEGRAL_TYPE_P (type)
      && !TYPE_UNSIGNED (type)
      && wi::gt_p (element_precision (type), wi::to_wide (@1),
                   TYPE_SIGN (TREE_TYPE (@1))))
  (with {
    int width = element_precision (type) - tree_to_uhwi (@1);
    tree stype = build_nonstandard_integer_type (width, 0);
   }
   (if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
    (convert (convert:stype @0))))))

Look ok?

> You might also want to verify what RTL expansion
> produces before/after - it at least shouldn't be worse.

The RTL is slightly better for the mode precision cases and slightly worse
for the precision 1 case.

> That said - do you have any testcase where the canonicalization is an enabler
> for further transforms or was this requested stand-alone?

No, I don't have any specific test cases. This patch is just in response to
pr101955 <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101955>.

On Tue, Jul 25, 2023 at 2:55 AM Richard Biener <richard.guenther@gmail.com>
wrote:

> On Mon, Jul 24, 2023 at 9:42 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Mon, Jul 24, 2023 at 03:29:54PM -0400, Drew Ross via Gcc-patches
> wrote:
> > > So would something like
> > >
> > > (simplify
> > >  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
> > >  (with { tree stype = build_nonstandard_integer_type (1, 0); }
> > >  (if (INTEGRAL_TYPE_P (type)
> > >       && !TYPE_UNSIGNED (type)
> > >       && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
> > >   (convert (convert:stype @0)))))
> > >
> > > work?
> >
> > Certainly swap the if and with and the (with then should be indented by 1
> > column to the right of (if and (convert one further (the reason for the
> > swapping is not to call build_nonstandard_integer_type when it will not
> be
> > needed, which will be probably far more often then an actual match).
>
> With that fixed I think for non-vector integrals the above is the most
> suitable
> canonical form of a sign-extension.  Note it should also work for any other
> constant shift amount - just use the appropriate intermediate precision for
> the truncating type.  You might also want to verify what RTL expansion
> produces before/after - it at least shouldn't be worse.  We _might_ want
> to consider to only use the converts when the intermediate type has
> mode precision (and as a special case allow one bit as in your above case)
> so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).
>
> > As discussed privately, the above isn't what we want for vectors and the
> 2
> > shifts are probably best on most arches because even when using -(x & 1)
> the
> > { 1, 1, 1, ... } vector would often needed to be loaded from memory.
>
> I think for vectors a vpcmpgt {0,0,0,..}, %xmm is the cheapest way of
> producing the result.  Note that to reflect this on GIMPLE you'd need
>
>   _2 = _1 < { 0,0...};
>   res = _2 ? { -1, -1, ...} : { 0, 0,...};
>
> because whether the ISA has a way to produce all-ones masks isn't known.
>
> For scalars using -(T)(_1 < 0) would also be possible.
>
> That said - do you have any testcase where the canonicalization is an
> enabler
> for further transforms or was this requested stand-alone?
>
> Thanks,
> Richard.
>
> >         Jakub
> >
>
>
  
Jakub Jelinek July 25, 2023, 7:43 p.m. UTC | #8
On Tue, Jul 25, 2023 at 03:25:57PM -0400, Drew Ross wrote:
> > With that fixed I think for non-vector integrals the above is the most suitable
> > canonical form of a sign-extension.  Note it should also work for any other
> > constant shift amount - just use the appropriate intermediate precision for
> > the truncating type.
> > We _might_ want
> > to consider to only use the converts when the intermediate type has
> > mode precision (and as a special case allow one bit as in your above case)
> > so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).
> 
> Here is a pattern that only matches truncations that result in mode
> precision (or a precision of 1):
> 
> (simplify
>  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
>  (if (INTEGRAL_TYPE_P (type)
>       && !TYPE_UNSIGNED (type)
>       && wi::gt_p (element_precision (type), wi::to_wide (@1),
>                    TYPE_SIGN (TREE_TYPE (@1))))

I'd use
     && wi::ltu_p (wi::to_wide (@1), element_precision (type))
If the shift count were negative, you'd otherwise ICE in tree_to_uhwi on
it (sure, that is UB at runtime, but the compiler shouldn't ICE on it).
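
For example, a (deliberately invalid, hypothetical) input of that kind, which
must still compile without an ICE:

  int f (int x) { return (x << -1) >> -1; }  /* negative shift count: UB at runtime */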

>   (with {
>     int width = element_precision (type) - tree_to_uhwi (@1);
>     tree stype = build_nonstandard_integer_type (width, 0);
>    }
>    (if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
>     (convert (convert:stype @0))))))

	Jakub
  
Richard Biener July 26, 2023, 8:39 a.m. UTC | #9
On Tue, Jul 25, 2023 at 9:26 PM Drew Ross <drross@redhat.com> wrote:
>
> > With that fixed I think for non-vector integrals the above is the most suitable
> > canonical form of a sign-extension.  Note it should also work for any other
> > constant shift amount - just use the appropriate intermediate precision for
> > the truncating type.
> > We _might_ want
> > to consider to only use the converts when the intermediate type has
> > mode precision (and as a special case allow one bit as in your above case)
> > so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).
>
> Here is a pattern that only matches truncations that result in mode precision (or a precision of 1):
>
> (simplify
>  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
>  (if (INTEGRAL_TYPE_P (type)
>       && !TYPE_UNSIGNED (type)
>       && wi::gt_p (element_precision (type), wi::to_wide (@1), TYPE_SIGN (TREE_TYPE (@1))))
>   (with {
>     int width = element_precision (type) - tree_to_uhwi (@1);
>     tree stype = build_nonstandard_integer_type (width, 0);
>    }
>    (if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
>     (convert (convert:stype @0))))))
>
> Look ok?

I suppose so.  Can you see to amending the existing

/* Optimize (x << c) >> c into x & ((unsigned)-1 >> c) for unsigned
   types.  */
(simplify
 (rshift (lshift @0 INTEGER_CST@1) @1)
 (if (TYPE_UNSIGNED (type)
      && (wi::ltu_p (wi::to_wide (@1), element_precision (type))))
  (bit_and @0 (rshift { build_minus_one_cst (type); } @1))))

pattern?  You will get a duplicate pattern diagnostic otherwise.  It
also looks like this
one has the (nop_convert? ..) missing.  Btw, I wonder whether we can handle
some cases of widening/truncating converts between the shifts?
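
e.g. (a hypothetical case of that shape, with a widening convert between the
shifts):

  long long f (int x) { return (long long)(x << 31) >> 31; }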

Richard.

> > You might also want to verify what RTL expansion
> > produces before/after - it at least shouldn't be worse.
>
> The RTL is slightly better for the mode precision cases and slightly worse for the precision 1 case.
>
> > That said - do you have any testcase where the canonicalization is an enabler
> > for further transforms or was this requested stand-alone?
>
> No, I don't have any specific test cases. This patch is just in response to pr101955.
>
> On Tue, Jul 25, 2023 at 2:55 AM Richard Biener <richard.guenther@gmail.com> wrote:
>>
>> On Mon, Jul 24, 2023 at 9:42 PM Jakub Jelinek <jakub@redhat.com> wrote:
>> >
>> > On Mon, Jul 24, 2023 at 03:29:54PM -0400, Drew Ross via Gcc-patches wrote:
>> > > So would something like
>> > >
>> > > (simplify
>> > >  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
>> > >  (with { tree stype = build_nonstandard_integer_type (1, 0); }
>> > >  (if (INTEGRAL_TYPE_P (type)
>> > >       && !TYPE_UNSIGNED (type)
>> > >       && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
>> > >   (convert (convert:stype @0)))))
>> > >
>> > > work?
>> >
>> > Certainly swap the if and with and the (with then should be indented by 1
>> > column to the right of (if and (convert one further (the reason for the
>> > swapping is not to call build_nonstandard_integer_type when it will not be
>> > needed, which will be probably far more often then an actual match).
>>
>> With that fixed I think for non-vector integrals the above is the most suitable
>> canonical form of a sign-extension.  Note it should also work for any other
>> constant shift amount - just use the appropriate intermediate precision for
>> the truncating type.  You might also want to verify what RTL expansion
>> produces before/after - it at least shouldn't be worse.  We _might_ want
>> to consider to only use the converts when the intermediate type has
>> mode precision (and as a special case allow one bit as in your above case)
>> so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).
>>
>> > As discussed privately, the above isn't what we want for vectors and the 2
>> > shifts are probably best on most arches because even when using -(x & 1) the
>> > { 1, 1, 1, ... } vector would often needed to be loaded from memory.
>>
>> I think for vectors a vpcmpgt {0,0,0,..}, %xmm is the cheapest way of
>> producing the result.  Note that to reflect this on GIMPLE you'd need
>>
>>   _2 = _1 < { 0,0...};
>>   res = _2 ? { -1, -1, ...} : { 0, 0,...};
>>
>> because whether the ISA has a way to produce all-ones masks isn't known.
>>
>> For scalars using -(T)(_1 < 0) would also be possible.
>>
>> That said - do you have any testcase where the canonicalization is an enabler
>> for further transforms or was this requested stand-alone?
>>
>> Thanks,
>> Richard.
>>
>> >         Jakub
>> >
>>
  
Drew Ross July 26, 2023, 6:18 p.m. UTC | #10
Here is what I came up with for combining the two:

/* For (x << c) >> c, optimize into x & ((unsigned)-1 >> c) for
   unsigned x OR truncate into the precision(type) - c lowest bits
   of signed x (if they have mode precision or a precision of 1)  */
(simplify
 (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
 (if (wi::ltu_p (wi::to_wide (@1), element_precision (type)))
  (if (TYPE_UNSIGNED (type))
   (bit_and @0 (rshift { build_minus_one_cst (type); } @1))
   (if (INTEGRAL_TYPE_P (type))
    (with {
      int width = element_precision (type) - tree_to_uhwi (@1);
      tree stype = build_nonstandard_integer_type (width, 0);
     }
     (if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
      (convert (convert:stype @0))))))))

Let me know what you think.

> Btw, I wonder whether we can handle
> some cases of widening/truncating converts between the shifts?

I will look into this.

Drew

On Wed, Jul 26, 2023 at 4:40 AM Richard Biener <richard.guenther@gmail.com>
wrote:

> On Tue, Jul 25, 2023 at 9:26 PM Drew Ross <drross@redhat.com> wrote:
> >
> > > With that fixed I think for non-vector integrals the above is the most
> suitable
> > > canonical form of a sign-extension.  Note it should also work for any
> other
> > > constant shift amount - just use the appropriate intermediate
> precision for
> > > the truncating type.
> > > We _might_ want
> > > to consider to only use the converts when the intermediate type has
> > > mode precision (and as a special case allow one bit as in your above
> case)
> > > so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).
> >
> > Here is a pattern that that only matches to truncations that result in
> mode precision (or precision of 1):
> >
> > (simplify
> >  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
> >  (if (INTEGRAL_TYPE_P (type)
> >       && !TYPE_UNSIGNED (type)
> >       && wi::gt_p (element_precision (type), wi::to_wide (@1), TYPE_SIGN
> (TREE_TYPE (@1))))
> >   (with {
> >     int width = element_precision (type) - tree_to_uhwi (@1);
> >     tree stype = build_nonstandard_integer_type (width, 0);
> >    }
> >    (if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
> >     (convert (convert:stype @0))))))
> >
> > Look ok?
>
> I suppose so.  Can you see to amend the existing
>
> /* Optimize (x << c) >> c into x & ((unsigned)-1 >> c) for unsigned
>    types.  */
> (simplify
>  (rshift (lshift @0 INTEGER_CST@1) @1)
>  (if (TYPE_UNSIGNED (type)
>       && (wi::ltu_p (wi::to_wide (@1), element_precision (type))))
>   (bit_and @0 (rshift { build_minus_one_cst (type); } @1))))
>
> pattern?  You will get a duplicate pattern diagnostic otherwise.  It
> also looks like this
> one has the (nop_convert? ..) missing.  Btw, I wonder whether we can handle
> some cases of widening/truncating converts between the shifts?
>
> Richard.
>
> > > You might also want to verify what RTL expansion
> > > produces before/after - it at least shouldn't be worse.
> >
> > The RTL is slightly better for the mode precision cases and slightly
> worse for the precision 1 case.
> >
> > > That said - do you have any testcase where the canonicalization is an
> enabler
> > > for further transforms or was this requested stand-alone?
> >
> > No, I don't have any specific test cases. This patch is just in response
> to pr101955.
> >
> > On Tue, Jul 25, 2023 at 2:55 AM Richard Biener <
> richard.guenther@gmail.com> wrote:
> >>
> >> On Mon, Jul 24, 2023 at 9:42 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >> >
> >> > On Mon, Jul 24, 2023 at 03:29:54PM -0400, Drew Ross via Gcc-patches
> wrote:
> >> > > So would something like
> >> > >
> >> > > (simplify
> >> > >  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
> >> > >  (with { tree stype = build_nonstandard_integer_type (1, 0); }
> >> > >  (if (INTEGRAL_TYPE_P (type)
> >> > >       && !TYPE_UNSIGNED (type)
> >> > >       && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
> >> > >   (convert (convert:stype @0)))))
> >> > >
> >> > > work?
> >> >
> >> > Certainly swap the if and with and the (with then should be indented
> by 1
> >> > column to the right of (if and (convert one further (the reason for
> the
> >> > swapping is not to call build_nonstandard_integer_type when it will
> not be
> >> > needed, which will be probably far more often then an actual match).
> >>
> >> With that fixed I think for non-vector integrals the above is the most
> suitable
> >> canonical form of a sign-extension.  Note it should also work for any
> other
> >> constant shift amount - just use the appropriate intermediate precision
> for
> >> the truncating type.  You might also want to verify what RTL expansion
> >> produces before/after - it at least shouldn't be worse.  We _might_ want
> >> to consider to only use the converts when the intermediate type has
> >> mode precision (and as a special case allow one bit as in your above
> case)
> >> so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).
> >>
> >> > As discussed privately, the above isn't what we want for vectors and
> the 2
> >> > shifts are probably best on most arches because even when using -(x &
> 1) the
> >> > { 1, 1, 1, ... } vector would often needed to be loaded from memory.
> >>
> >> I think for vectors a vpcmpgt {0,0,0,..}, %xmm is the cheapest way of
> >> producing the result.  Note that to reflect this on GIMPLE you'd need
> >>
> >>   _2 = _1 < { 0,0...};
> >>   res = _2 ? { -1, -1, ...} : { 0, 0,...};
> >>
> >> because whether the ISA has a way to produce all-ones masks isn't known.
> >>
> >> For scalars using -(T)(_1 < 0) would also be possible.
> >>
> >> That said - do you have any testcase where the canonicalization is an
> enabler
> >> for further transforms or was this requested stand-alone?
> >>
> >> Thanks,
> >> Richard.
> >>
> >> >         Jakub
> >> >
> >>
>
>
  
Richard Biener July 28, 2023, 6:30 a.m. UTC | #11
On Wed, Jul 26, 2023 at 8:19 PM Drew Ross <drross@redhat.com> wrote:
>
> Here is what I came up with for combining the two:
>
> /* For (x << c) >> c, optimize into x & ((unsigned)-1 >> c) for
>    unsigned x OR truncate into the precision(type) - c lowest bits
>    of signed x (if they have mode precision or a precision of 1)  */
> (simplify
>  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
>  (if (wi::ltu_p (wi::to_wide (@1), element_precision (type)))
>   (if (TYPE_UNSIGNED (type))
>    (bit_and @0 (rshift { build_minus_one_cst (type); } @1))
>    (if (INTEGRAL_TYPE_P (type))
>     (with {
>       int width = element_precision (type) - tree_to_uhwi (@1);
>       tree stype = build_nonstandard_integer_type (width, 0);
>      }
>      (if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
>       (convert (convert:stype @0))))))))
>
> Let me know what you think.

Looks good to me.

Thanks,
Richard.

> > Btw, I wonder whether we can handle
> > some cases of widening/truncating converts between the shifts?
>
> I will look into this.
>
> Drew
>
> On Wed, Jul 26, 2023 at 4:40 AM Richard Biener <richard.guenther@gmail.com> wrote:
>>
>> On Tue, Jul 25, 2023 at 9:26 PM Drew Ross <drross@redhat.com> wrote:
>> >
>> > > With that fixed I think for non-vector integrals the above is the most suitable
>> > > canonical form of a sign-extension.  Note it should also work for any other
>> > > constant shift amount - just use the appropriate intermediate precision for
>> > > the truncating type.
>> > > We _might_ want
>> > > to consider to only use the converts when the intermediate type has
>> > > mode precision (and as a special case allow one bit as in your above case)
>> > > so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).
>> >
>> > Here is a pattern that that only matches to truncations that result in mode precision (or precision of 1):
>> >
>> > (simplify
>> >  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
>> >  (if (INTEGRAL_TYPE_P (type)
>> >       && !TYPE_UNSIGNED (type)
>> >       && wi::gt_p (element_precision (type), wi::to_wide (@1), TYPE_SIGN (TREE_TYPE (@1))))
>> >   (with {
>> >     int width = element_precision (type) - tree_to_uhwi (@1);
>> >     tree stype = build_nonstandard_integer_type (width, 0);
>> >    }
>> >    (if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
>> >     (convert (convert:stype @0))))))
>> >
>> > Look ok?
>>
>> I suppose so.  Can you see to amend the existing
>>
>> /* Optimize (x << c) >> c into x & ((unsigned)-1 >> c) for unsigned
>>    types.  */
>> (simplify
>>  (rshift (lshift @0 INTEGER_CST@1) @1)
>>  (if (TYPE_UNSIGNED (type)
>>       && (wi::ltu_p (wi::to_wide (@1), element_precision (type))))
>>   (bit_and @0 (rshift { build_minus_one_cst (type); } @1))))
>>
>> pattern?  You will get a duplicate pattern diagnostic otherwise.  It
>> also looks like this
>> one has the (nop_convert? ..) missing.  Btw, I wonder whether we can handle
>> some cases of widening/truncating converts between the shifts?
>>
>> Richard.
>>
>> > > You might also want to verify what RTL expansion
>> > > produces before/after - it at least shouldn't be worse.
>> >
>> > The RTL is slightly better for the mode precision cases and slightly worse for the precision 1 case.
>> >
>> > > That said - do you have any testcase where the canonicalization is an enabler
>> > > for further transforms or was this requested stand-alone?
>> >
>> > No, I don't have any specific test cases. This patch is just in response to pr101955.
>> >
>> > On Tue, Jul 25, 2023 at 2:55 AM Richard Biener <richard.guenther@gmail.com> wrote:
>> >>
>> >> On Mon, Jul 24, 2023 at 9:42 PM Jakub Jelinek <jakub@redhat.com> wrote:
>> >> >
>> >> > On Mon, Jul 24, 2023 at 03:29:54PM -0400, Drew Ross via Gcc-patches wrote:
>> >> > > So would something like
>> >> > >
>> >> > > (simplify
>> >> > >  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
>> >> > >  (with { tree stype = build_nonstandard_integer_type (1, 0); }
>> >> > >  (if (INTEGRAL_TYPE_P (type)
>> >> > >       && !TYPE_UNSIGNED (type)
>> >> > >       && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
>> >> > >   (convert (convert:stype @0)))))
>> >> > >
>> >> > > work?
>> >> >
>> >> > Certainly swap the if and with and the (with then should be indented by 1
>> >> > column to the right of (if and (convert one further (the reason for the
>> >> > swapping is not to call build_nonstandard_integer_type when it will not be
>> >> > needed, which will be probably far more often then an actual match).
>> >>
>> >> With that fixed I think for non-vector integrals the above is the most suitable
>> >> canonical form of a sign-extension.  Note it should also work for any other
>> >> constant shift amount - just use the appropriate intermediate precision for
>> >> the truncating type.  You might also want to verify what RTL expansion
>> >> produces before/after - it at least shouldn't be worse.  We _might_ want
>> >> to consider to only use the converts when the intermediate type has
>> >> mode precision (and as a special case allow one bit as in your above case)
>> >> so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).
>> >>
>> >> > As discussed privately, the above isn't what we want for vectors and the 2
>> >> > shifts are probably best on most arches because even when using -(x & 1) the
>> >> > { 1, 1, 1, ... } vector would often needed to be loaded from memory.
>> >>
>> >> I think for vectors a vpcmpgt {0,0,0,..}, %xmm is the cheapest way of
>> >> producing the result.  Note that to reflect this on GIMPLE you'd need
>> >>
>> >>   _2 = _1 < { 0,0...};
>> >>   res = _2 ? { -1, -1, ...} : { 0, 0,...};
>> >>
>> >> because whether the ISA has a way to produce all-ones masks isn't known.
>> >>
>> >> For scalars using -(T)(_1 < 0) would also be possible.
>> >>
>> >> That said - do you have any testcase where the canonicalization is an enabler
>> >> for further transforms or was this requested stand-alone?
>> >>
>> >> Thanks,
>> >> Richard.
>> >>
>> >> >         Jakub
>> >> >
>> >>
>>
  

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index 8543f777a28..820fc890e8e 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3766,6 +3766,16 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
       && (wi::ltu_p (wi::to_wide (@1), element_precision (type))))
   (bit_and @0 (rshift { build_minus_one_cst (type); } @1))))
 
+/* Optimize (X << C) >> C where C = precision(type) - 1 and X is signed
+   into -(X & 1).  */
+(simplify
+ (rshift (nop_convert? (lshift @0 uniform_integer_cst_p@1)) @@1)
+ (with { tree cst = uniform_integer_cst_p (@1); }
+ (if (ANY_INTEGRAL_TYPE_P (type)
+      && !TYPE_UNSIGNED (type)
+      && wi::eq_p (wi::to_wide (cst), element_precision (type) - 1))
+  (negate (bit_and (convert @0) { build_one_cst (type); })))))
+
 /* Optimize x >> x into 0 */
 (simplify
  (rshift @0 @0)
diff --git a/gcc/testsuite/gcc.dg/pr101955.c b/gcc/testsuite/gcc.dg/pr101955.c
new file mode 100644
index 00000000000..386154911c5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr101955.c
@@ -0,0 +1,69 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse1 -Wno-psabi" } */
+
+typedef int v4si __attribute__((vector_size(4 * sizeof(int))));
+
+__attribute__((noipa)) int
+t1 (int x)
+{
+  return (x << 31) >> 31;
+}
+
+__attribute__((noipa)) int
+t2 (int x)
+{
+  int y = x << 31;
+  int z = y >> 31;
+  return z;
+}
+
+__attribute__((noipa)) int
+t3 (int x)
+{
+  int w = 31;
+  int y = x << w;
+  int z = y >> w;
+  return z;
+}
+
+__attribute__((noipa)) long long
+t4 (long long x)
+{
+  return (x << 63) >> 63;
+}
+
+__attribute__((noipa)) long long
+t5 (long long x)
+{
+  long long y = x << 63;
+  long long z = y >> 63;
+  return z;
+}
+
+__attribute__((noipa)) long long
+t6 (long long x)
+{
+  int w = 63;
+  long long y = x << w;
+  long long z = y >> w;
+  return z;
+}
+
+__attribute__((noipa)) v4si
+t7 (v4si x)
+{
+  return (x << 31) >> 31;
+}
+
+__attribute__((noipa)) v4si
+t8 (v4si x)
+{
+  v4si t = {31,31,31,31};
+  return (x << t) >> t;
+}
+
+/* { dg-final { scan-tree-dump-not " >> " "dse1" } } */
+/* { dg-final { scan-tree-dump-not " << " "dse1" } } */
+/* { dg-final { scan-tree-dump-times " -" 8 "dse1" } } */
+/* { dg-final { scan-tree-dump-times " & " 8 "dse1" } } */
+