match.pd: Implement missed optimization (x << c) >> c -> -(x & 1) [PR101955]
Commit Message
Simplifies (x << c) >> c where x is a signed integral type of
width >= int and c = precision(type) - 1 into -(x & 1). Tested successfully
on x86_64 and x86 targets.
PR middle-end/101955
gcc/ChangeLog:
* match.pd (x << c) >> c -> -(x & 1): New simplification.
gcc/testsuite/ChangeLog:
* gcc.dg/pr101955.c: New test.
---
gcc/match.pd | 10 +++++
gcc/testsuite/gcc.dg/pr101955.c | 69 +++++++++++++++++++++++++++++++++
2 files changed, 79 insertions(+)
create mode 100644 gcc/testsuite/gcc.dg/pr101955.c
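A minimal sketch of the transformation, assuming 32-bit int and c = 31
(strictly, x << 31 overflows for some inputs in ISO C and >> of negative
values is implementation-defined, but GCC defines both as two's-complement
operations, which is what the simplification relies on):

  int shift_pair (int x) { return (x << 31) >> 31; }   /* before */
  int mask_negate (int x) { return -(x & 1); }         /* after: 0 or -1 */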
Comments
On Fri, Jul 21, 2023 at 8:09 AM Drew Ross via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Simplifies (x << c) >> c where x is a signed integral type of
> width >= int and c = precision(type) - 1 into -(x & 1). Tested successfully
> on x86_64 and x86 targets.
Thinking about this some more, I think this should be handled in
expand rather than on the gimple level.
It is very much related to PR 110717 even. We are basically truncating
to a signed one bit integer and then sign extending that across the
whole code.
Thanks,
Andrew
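To make that concrete, a rough C model of "truncate to a signed one-bit
integer, then sign-extend", using a 1-bit signed bit-field (the out-of-range
assignment is implementation-defined in ISO C; GCC truncates, which matches
the idiom):

  struct sbit { signed int b : 1; };

  int sext_bit0 (int x)
  {
    struct sbit t;
    t.b = x;      /* truncate: keeps only bit 0, giving 0 or -1 */
    return t.b;   /* sign-extend back across the full width */
  }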
>
> PR middle-end/101955
>
> gcc/ChangeLog:
>
> * match.pd (x << c) >> c -> -(x & 1): New simplification.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr101955.c: New test.
> ---
> gcc/match.pd | 10 +++++
> gcc/testsuite/gcc.dg/pr101955.c | 69 +++++++++++++++++++++++++++++++++
> 2 files changed, 79 insertions(+)
> create mode 100644 gcc/testsuite/gcc.dg/pr101955.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 8543f777a28..820fc890e8e 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3766,6 +3766,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> && (wi::ltu_p (wi::to_wide (@1), element_precision (type))))
> (bit_and @0 (rshift { build_minus_one_cst (type); } @1))))
>
> +/* Optimize (X << C) >> C where C = precision(type) - 1 and X is signed
> + into -(X & 1). */
> +(simplify
> + (rshift (nop_convert? (lshift @0 uniform_integer_cst_p@1)) @@1)
> + (with { tree cst = uniform_integer_cst_p (@1); }
> + (if (ANY_INTEGRAL_TYPE_P (type)
> + && !TYPE_UNSIGNED (type)
> + && wi::eq_p (wi::to_wide (cst), element_precision (type) - 1))
> + (negate (bit_and (convert @0) { build_one_cst (type); })))))
> +
> /* Optimize x >> x into 0 */
> (simplify
> (rshift @0 @0)
> diff --git a/gcc/testsuite/gcc.dg/pr101955.c b/gcc/testsuite/gcc.dg/pr101955.c
> new file mode 100644
> index 00000000000..386154911c5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr101955.c
> @@ -0,0 +1,69 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-dse1 -Wno-psabi" } */
> +
> +typedef int v4si __attribute__((vector_size(4 * sizeof(int))));
> +
> +__attribute__((noipa)) int
> +t1 (int x)
> +{
> + return (x << 31) >> 31;
> +}
> +
> +__attribute__((noipa)) int
> +t2 (int x)
> +{
> + int y = x << 31;
> + int z = y >> 31;
> + return z;
> +}
> +
> +__attribute__((noipa)) int
> +t3 (int x)
> +{
> + int w = 31;
> + int y = x << w;
> + int z = y >> w;
> + return z;
> +}
> +
> +__attribute__((noipa)) long long
> +t4 (long long x)
> +{
> + return (x << 63) >> 63;
> +}
> +
> +__attribute__((noipa)) long long
> +t5 (long long x)
> +{
> + long long y = x << 63;
> + long long z = y >> 63;
> + return z;
> +}
> +
> +__attribute__((noipa)) long long
> +t6 (long long x)
> +{
> + int w = 63;
> + long long y = x << w;
> + long long z = y >> w;
> + return z;
> +}
> +
> +__attribute__((noipa)) v4si
> +t7 (v4si x)
> +{
> + return (x << 31) >> 31;
> +}
> +
> +__attribute__((noipa)) v4si
> +t8 (v4si x)
> +{
> + v4si t = {31,31,31,31};
> + return (x << t) >> t;
> +}
> +
> +/* { dg-final { scan-tree-dump-not " >> " "dse1" } } */
> +/* { dg-final { scan-tree-dump-not " << " "dse1" } } */
> +/* { dg-final { scan-tree-dump-times " -" 8 "dse1" } } */
> +/* { dg-final { scan-tree-dump-times " & " 8 "dse1" } } */
> +
> --
> 2.39.3
>
On 7/21/23 11:27, Andrew Pinski via Gcc-patches wrote:
> On Fri, Jul 21, 2023 at 8:09 AM Drew Ross via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> Simplifies (x << c) >> c where x is a signed integral type of
>> width >= int and c = precision(type) - 1 into -(x & 1). Tested successfully
>> on x86_64 and x86 targets.
>
> Thinking about this some more, I think this should be handled in
> expand rather than on the gimple level.
> It is very much related to PR 110717 even. We are basically truncating
> to a signed one bit integer and then sign extending that across the
> whole code.
But why defer it to expand? This idiom is going to generate a -1,0
result which is definitely interesting from a further simplification
standpoint.
Jeff
On Sat, Jul 22, 2023 at 8:09 AM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 7/21/23 11:27, Andrew Pinski via Gcc-patches wrote:
> > On Fri, Jul 21, 2023 at 8:09 AM Drew Ross via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >>
> >> Simplifies (x << c) >> c where x is a signed integral type of
> >> width >= int and c = precision(type) - 1 into -(x & 1). Tested successfully
> >> on x86_64 and x86 targets.
> >
> > Thinking about this some more, I think this should be handled in
> > expand rather than on the gimple level.
> > It is very much related to PR 110717 even. We are basically truncating
> > to a signed one bit integer and then sign extending that across the
> > whole code.
> But why defer it to expand? This idiom is going to generate a -1,0
> result which is definitely interesting from a further simplification
> standpoint.
It's not 'simpler' so it would be a canonicalization. We talked about
providing a SEXT_EXPR at some point (sign-extend from constant bit N).
Another canonicalization to existing ops would be
(convert (convert:sbool @0))
where sbool is a 1-bit precision signed type. I think that's a better
canonicalization than -(x & 1)? For zero-extensions we canonicalize such a
conversion sequence to x & const-mask. For sign-extensions there's no single
operation representation.
Richard.
>
> Jeff
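A small C sketch of the asymmetry described above, using an 8-bit
intermediate for readability: the truncate-plus-zero-extend pair collapses
to a single mask, while the signed pair stays a conversion sequence:

  unsigned trunc_zext (unsigned x)
  {
    return (unsigned)(unsigned char) x;   /* canonicalized to x & 0xff */
  }

  int trunc_sext (int x)
  {
    return (int)(signed char) x;          /* no single-operation form */
  }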
So would something like
(simplify
 (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
 (with { tree stype = build_nonstandard_integer_type (1, 0); }
  (if (INTEGRAL_TYPE_P (type)
       && !TYPE_UNSIGNED (type)
       && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
   (convert (convert:stype @0)))))
work?
Drew
On Mon, Jul 24, 2023 at 3:16 AM Richard Biener <richard.guenther@gmail.com> wrote:
> On Sat, Jul 22, 2023 at 8:09 AM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> >
> >
> > On 7/21/23 11:27, Andrew Pinski via Gcc-patches wrote:
> > > On Fri, Jul 21, 2023 at 8:09 AM Drew Ross via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > >>
> > >> Simplifies (x << c) >> c where x is a signed integral type of
> > >> width >= int and c = precision(type) - 1 into -(x & 1). Tested successfully
> > >> on x86_64 and x86 targets.
> > >
> > > Thinking about this some more, I think this should be handled in
> > > expand rather than on the gimple level.
> > > It is very much related to PR 110717 even. We are basically truncating
> > > to a signed one bit integer and then sign extending that across the
> > > whole code.
> > But why defer it to expand? This idiom is going to generate a -1,0
> > result which is definitely interesting from a further simplification
> > standpoint.
>
> It's not 'simpler' so it would be a canonicalization. We talked about
> providing a SEXT_EXPR at some point (sign-extend from constant bit N).
>
> Another canonicalization to existing ops would be
>
> (convert (convert:sbool @0))
>
> where sbool is a 1-bit precision signed type. I think that's a better
> canonicalization than -(x & 1)? For zero-extensions we canonicalize such a
> conversion sequence to x & const-mask. For sign-extensions there's no single
> operation
> representation.
>
> Richard.
>
> >
> > Jeff
>
>
On Mon, Jul 24, 2023 at 03:29:54PM -0400, Drew Ross via Gcc-patches wrote:
> So would something like
>
> (simplify
> (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
> (with { tree stype = build_nonstandard_integer_type (1, 0); }
> (if (INTEGRAL_TYPE_P (type)
> && !TYPE_UNSIGNED (type)
> && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
> (convert (convert:stype @0)))))
>
> work?
Certainly swap the if and with and the (with then should be indented by 1
column to the right of (if and (convert one further (the reason for the
swapping is not to call build_nonstandard_integer_type when it will not be
needed, which will be probably far more often than an actual match).
As discussed privately, the above isn't what we want for vectors and the 2
shifts are probably best on most arches because even when using -(x & 1) the
{ 1, 1, 1, ... } vector would often need to be loaded from memory.
Jakub
On Mon, Jul 24, 2023 at 9:42 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Jul 24, 2023 at 03:29:54PM -0400, Drew Ross via Gcc-patches wrote:
> > So would something like
> >
> > (simplify
> > (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
> > (with { tree stype = build_nonstandard_integer_type (1, 0); }
> > (if (INTEGRAL_TYPE_P (type)
> > && !TYPE_UNSIGNED (type)
> > && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
> > (convert (convert:stype @0)))))
> >
> > work?
>
> Certainly swap the if and with and the (with then should be indented by 1
> column to the right of (if and (convert one further (the reason for the
> swapping is not to call build_nonstandard_integer_type when it will not be
> needed, which will be probably far more often than an actual match).
With that fixed I think for non-vector integrals the above is the most suitable
canonical form of a sign-extension. Note it should also work for any other
constant shift amount - just use the appropriate intermediate precision for
the truncating type. You might also want to verify what RTL expansion
produces before/after - it at least shouldn't be worse. We _might_ want
to consider to only use the converts when the intermediate type has
mode precision (and as a special case allow one bit as in your above case)
so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).
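For instance, assuming 32-bit int and c = 24, the intermediate type has
precision 8 (QImode), so the conversion pair is an ordinary sign extension
of the low byte:

  int shift_pair (int x) { return (x << 24) >> 24; }
  int conv_pair (int x) { return (int)(signed char) x; }   /* same value */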
> As discussed privately, the above isn't what we want for vectors and the 2
> shifts are probably best on most arches because even when using -(x & 1) the
> { 1, 1, 1, ... } vector would often need to be loaded from memory.
I think for vectors a vpcmpgt {0,0,0,..}, %xmm is the cheapest way of
producing the result. Note that to reflect this on GIMPLE you'd need
_2 = _1 < { 0,0...};
res = _2 ? { -1, -1, ...} : { 0, 0,...};
because whether the ISA has a way to produce all-ones masks isn't known.
For scalars using -(T)(_1 < 0) would also be possible.
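With GCC's C vector extensions the comparison already yields 0 / -1 lanes,
so a sketch of the vector form is:

  typedef int v4si __attribute__((vector_size (4 * sizeof (int))));

  v4si sign_broadcast (v4si x)
  {
    v4si zero = { 0, 0, 0, 0 };
    return (x << 31) < zero;   /* each lane: -1 if bit 0 set, else 0 */
  }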
That said - do you have any testcase where the canonicalization is an enabler
for further transforms or was this requested stand-alone?
Thanks,
Richard.
> Jakub
>
> With that fixed I think for non-vector integrals the above is the most suitable
> canonical form of a sign-extension. Note it should also work for any other
> constant shift amount - just use the appropriate intermediate precision for
> the truncating type.
> We _might_ want
> to consider to only use the converts when the intermediate type has
> mode precision (and as a special case allow one bit as in your above case)
> so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).
Here is a pattern that only matches truncations that result in mode
precision (or precision of 1):
(simplify
 (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
 (if (INTEGRAL_TYPE_P (type)
      && !TYPE_UNSIGNED (type)
      && wi::gt_p (element_precision (type), wi::to_wide (@1),
                   TYPE_SIGN (TREE_TYPE (@1))))
  (with {
    int width = element_precision (type) - tree_to_uhwi (@1);
    tree stype = build_nonstandard_integer_type (width, 0);
   }
   (if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
    (convert (convert:stype @0))))))
Look ok?
> You might also want to verify what RTL expansion
> produces before/after - it at least shouldn't be worse.
The RTL is slightly better for the mode precision cases and slightly worse
for the precision 1 case.
> That said - do you have any testcase where the canonicalization is an enabler
> for further transforms or was this requested stand-alone?
No, I don't have any specific test cases. This patch is just in response to
pr101955 <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101955>.
On Tue, Jul 25, 2023 at 2:55 AM Richard Biener <richard.guenther@gmail.com> wrote:
> On Mon, Jul 24, 2023 at 9:42 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Mon, Jul 24, 2023 at 03:29:54PM -0400, Drew Ross via Gcc-patches wrote:
> > > So would something like
> > >
> > > (simplify
> > > (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
> > > (with { tree stype = build_nonstandard_integer_type (1, 0); }
> > > (if (INTEGRAL_TYPE_P (type)
> > > && !TYPE_UNSIGNED (type)
> > > && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
> > > (convert (convert:stype @0)))))
> > >
> > > work?
> >
> > Certainly swap the if and with and the (with then should be indented by 1
> > column to the right of (if and (convert one further (the reason for the
> > swapping is not to call build_nonstandard_integer_type when it will not be
> > needed, which will be probably far more often than an actual match).
>
> With that fixed I think for non-vector integrals the above is the most suitable
> canonical form of a sign-extension. Note it should also work for any other
> constant shift amount - just use the appropriate intermediate precision for
> the truncating type. You might also want to verify what RTL expansion
> produces before/after - it at least shouldn't be worse. We _might_ want
> to consider to only use the converts when the intermediate type has
> mode precision (and as a special case allow one bit as in your above case)
> so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).
>
> > As discussed privately, the above isn't what we want for vectors and the 2
> > shifts are probably best on most arches because even when using -(x & 1) the
> > { 1, 1, 1, ... } vector would often need to be loaded from memory.
>
> I think for vectors a vpcmpgt {0,0,0,..}, %xmm is the cheapest way of
> producing the result. Note that to reflect this on GIMPLE you'd need
>
> _2 = _1 < { 0,0...};
> res = _2 ? { -1, -1, ...} : { 0, 0,...};
>
> because whether the ISA has a way to produce all-ones masks isn't known.
>
> For scalars using -(T)(_1 < 0) would also be possible.
>
> That said - do you have any testcase where the canonicalization is an enabler
> for further transforms or was this requested stand-alone?
>
> Thanks,
> Richard.
>
> > Jakub
> >
>
>
On Tue, Jul 25, 2023 at 03:25:57PM -0400, Drew Ross wrote:
> > With that fixed I think for non-vector integrals the above is the most suitable
> > canonical form of a sign-extension. Note it should also work for any other
> > constant shift amount - just use the appropriate intermediate precision for
> > the truncating type.
> > We _might_ want
> > to consider to only use the converts when the intermediate type has
> > mode precision (and as a special case allow one bit as in your above case)
> > so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).
>
> Here is a pattern that only matches truncations that result in mode
> precision (or precision of 1):
>
> (simplify
> (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
> (if (INTEGRAL_TYPE_P (type)
> && !TYPE_UNSIGNED (type)
> && wi::gt_p (element_precision (type), wi::to_wide (@1), TYPE_SIGN (TREE_TYPE (@1))))
I'd use
&& wi::ltu_p (wi::to_wide (@1), element_precision (type))
If the shift count were negative, you'd otherwise ICE in tree_to_uhwi on
it (sure, that is UB at runtime, but the compiler shouldn't ICE on it).
> (with {
> int width = element_precision (type) - tree_to_uhwi (@1);
> tree stype = build_nonstandard_integer_type (width, 0);
> }
> (if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
> (convert (convert:stype @0))))))
Jakub
On Tue, Jul 25, 2023 at 9:26 PM Drew Ross <drross@redhat.com> wrote:
>
> > With that fixed I think for non-vector integrals the above is the most suitable
> > canonical form of a sign-extension. Note it should also work for any other
> > constant shift amount - just use the appropriate intermediate precision for
> > the truncating type.
> > We _might_ want
> > to consider to only use the converts when the intermediate type has
> > mode precision (and as a special case allow one bit as in your above case)
> > so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).
>
> Here is a pattern that only matches truncations that result in mode precision (or precision of 1):
>
> (simplify
> (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
> (if (INTEGRAL_TYPE_P (type)
> && !TYPE_UNSIGNED (type)
> && wi::gt_p (element_precision (type), wi::to_wide (@1), TYPE_SIGN (TREE_TYPE (@1))))
> (with {
> int width = element_precision (type) - tree_to_uhwi (@1);
> tree stype = build_nonstandard_integer_type (width, 0);
> }
> (if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
> (convert (convert:stype @0))))))
>
> Look ok?
I suppose so. Can you see to amend the existing
/* Optimize (x << c) >> c into x & ((unsigned)-1 >> c) for unsigned
types. */
(simplify
(rshift (lshift @0 INTEGER_CST@1) @1)
(if (TYPE_UNSIGNED (type)
&& (wi::ltu_p (wi::to_wide (@1), element_precision (type))))
(bit_and @0 (rshift { build_minus_one_cst (type); } @1))))
pattern? You will get a duplicate pattern diagnostic otherwise. It also
looks like this one has the (nop_convert? ..) missing. Btw, I wonder whether
we can handle
some cases of widening/truncating converts between the shifts?
Richard.
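One shape that question could take (a hypothetical example, not from the
thread): the left shift happens in int, the result is widened, and the
right shift happens in long long, yet the same low-byte sign extension
falls out:

  long long widen_between (int x)
  {
    return (long long)(x << 24) >> 24;   /* == (long long)(signed char) x */
  }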
> > You might also want to verify what RTL expansion
> > produces before/after - it at least shouldn't be worse.
>
> The RTL is slightly better for the mode precision cases and slightly worse for the precision 1 case.
>
> > That said - do you have any testcase where the canonicalization is an enabler
> > for further transforms or was this requested stand-alone?
>
> No, I don't have any specific test cases. This patch is just in response to pr101955.
>
> On Tue, Jul 25, 2023 at 2:55 AM Richard Biener <richard.guenther@gmail.com> wrote:
>>
>> On Mon, Jul 24, 2023 at 9:42 PM Jakub Jelinek <jakub@redhat.com> wrote:
>> >
>> > On Mon, Jul 24, 2023 at 03:29:54PM -0400, Drew Ross via Gcc-patches wrote:
>> > > So would something like
>> > >
>> > > (simplify
>> > > (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
>> > > (with { tree stype = build_nonstandard_integer_type (1, 0); }
>> > > (if (INTEGRAL_TYPE_P (type)
>> > > && !TYPE_UNSIGNED (type)
>> > > && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
>> > > (convert (convert:stype @0)))))
>> > >
>> > > work?
>> >
>> > Certainly swap the if and with and the (with then should be indented by 1
>> > column to the right of (if and (convert one further (the reason for the
>> > swapping is not to call build_nonstandard_integer_type when it will not be
>> > needed, which will be probably far more often than an actual match).
>>
>> With that fixed I think for non-vector integrals the above is the most suitable
>> canonical form of a sign-extension. Note it should also work for any other
>> constant shift amount - just use the appropriate intermediate precision for
>> the truncating type. You might also want to verify what RTL expansion
>> produces before/after - it at least shouldn't be worse. We _might_ want
>> to consider to only use the converts when the intermediate type has
>> mode precision (and as a special case allow one bit as in your above case)
>> so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).
>>
>> > As discussed privately, the above isn't what we want for vectors and the 2
>> > shifts are probably best on most arches because even when using -(x & 1) the
>> > { 1, 1, 1, ... } vector would often need to be loaded from memory.
>>
>> I think for vectors a vpcmpgt {0,0,0,..}, %xmm is the cheapest way of
>> producing the result. Note that to reflect this on GIMPLE you'd need
>>
>> _2 = _1 < { 0,0...};
>> res = _2 ? { -1, -1, ...} : { 0, 0,...};
>>
>> because whether the ISA has a way to produce all-ones masks isn't known.
>>
>> For scalars using -(T)(_1 < 0) would also be possible.
>>
>> That said - do you have any testcase where the canonicalization is an enabler
>> for further transforms or was this requested stand-alone?
>>
>> Thanks,
>> Richard.
>>
>> > Jakub
>> >
>>
Here is what I came up with for combining the two:
/* For (x << c) >> c, optimize into x & ((unsigned)-1 >> c) for
   unsigned x OR truncate into the precision(type) - c lowest bits
   of signed x (if they have mode precision or a precision of 1).  */
(simplify
 (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
 (if (wi::ltu_p (wi::to_wide (@1), element_precision (type)))
  (if (TYPE_UNSIGNED (type))
   (bit_and @0 (rshift { build_minus_one_cst (type); } @1))
   (if (INTEGRAL_TYPE_P (type))
    (with {
      int width = element_precision (type) - tree_to_uhwi (@1);
      tree stype = build_nonstandard_integer_type (width, 0);
     }
     (if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
      (convert (convert:stype @0))))))))
Let me know what you think.
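A C-level view of the two arms, assuming 32-bit types and c = 16 so the
signed intermediate has mode precision (HImode):

  unsigned uarm (unsigned x) { return (x << 16) >> 16; }  /* -> x & 0xffff */
  int sarm (int x) { return (x << 16) >> 16; }            /* -> (int)(short) x */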
> Btw, I wonder whether we can handle
> some cases of widening/truncating converts between the shifts?
I will look into this.
Drew
On Wed, Jul 26, 2023 at 4:40 AM Richard Biener <richard.guenther@gmail.com> wrote:
> On Tue, Jul 25, 2023 at 9:26 PM Drew Ross <drross@redhat.com> wrote:
> >
> > > With that fixed I think for non-vector integrals the above is the most suitable
> > > canonical form of a sign-extension. Note it should also work for any other
> > > constant shift amount - just use the appropriate intermediate precision for
> > > the truncating type.
> > > We _might_ want
> > > to consider to only use the converts when the intermediate type has
> > > mode precision (and as a special case allow one bit as in your above case)
> > > so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).
> >
> > Here is a pattern that only matches truncations that result in mode
> > precision (or precision of 1):
> >
> > (simplify
> > (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
> > (if (INTEGRAL_TYPE_P (type)
> > && !TYPE_UNSIGNED (type)
> > && wi::gt_p (element_precision (type), wi::to_wide (@1), TYPE_SIGN (TREE_TYPE (@1))))
> > (with {
> > int width = element_precision (type) - tree_to_uhwi (@1);
> > tree stype = build_nonstandard_integer_type (width, 0);
> > }
> > (if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
> > (convert (convert:stype @0))))))
> >
> > Look ok?
>
> I suppose so. Can you see to amend the existing
>
> /* Optimize (x << c) >> c into x & ((unsigned)-1 >> c) for unsigned
> types. */
> (simplify
> (rshift (lshift @0 INTEGER_CST@1) @1)
> (if (TYPE_UNSIGNED (type)
> && (wi::ltu_p (wi::to_wide (@1), element_precision (type))))
> (bit_and @0 (rshift { build_minus_one_cst (type); } @1))))
>
> pattern? You will get a duplicate pattern diagnostic otherwise. It also
> looks like this one has the (nop_convert? ..) missing. Btw, I wonder whether
> we can handle
> some cases of widening/truncating converts between the shifts?
>
> Richard.
>
> > > You might also want to verify what RTL expansion
> > > produces before/after - it at least shouldn't be worse.
> >
> > The RTL is slightly better for the mode precision cases and slightly worse
> > for the precision 1 case.
> >
> > > That said - do you have any testcase where the canonicalization is an enabler
> > > for further transforms or was this requested stand-alone?
> >
> > No, I don't have any specific test cases. This patch is just in response
> > to pr101955.
> >
> > On Tue, Jul 25, 2023 at 2:55 AM Richard Biener <richard.guenther@gmail.com> wrote:
> >>
> >> On Mon, Jul 24, 2023 at 9:42 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >> >
> >> > On Mon, Jul 24, 2023 at 03:29:54PM -0400, Drew Ross via Gcc-patches wrote:
> >> > > So would something like
> >> > >
> >> > > (simplify
> >> > > (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
> >> > > (with { tree stype = build_nonstandard_integer_type (1, 0); }
> >> > > (if (INTEGRAL_TYPE_P (type)
> >> > > && !TYPE_UNSIGNED (type)
> >> > > && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
> >> > > (convert (convert:stype @0)))))
> >> > >
> >> > > work?
> >> >
> >> > Certainly swap the if and with and the (with then should be indented by 1
> >> > column to the right of (if and (convert one further (the reason for the
> >> > swapping is not to call build_nonstandard_integer_type when it will not be
> >> > needed, which will be probably far more often than an actual match).
> >>
> >> With that fixed I think for non-vector integrals the above is the most suitable
> >> canonical form of a sign-extension. Note it should also work for any other
> >> constant shift amount - just use the appropriate intermediate precision for
> >> the truncating type. You might also want to verify what RTL expansion
> >> produces before/after - it at least shouldn't be worse. We _might_ want
> >> to consider to only use the converts when the intermediate type has
> >> mode precision (and as a special case allow one bit as in your above case)
> >> so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).
> >>
> >> > As discussed privately, the above isn't what we want for vectors and the 2
> >> > shifts are probably best on most arches because even when using -(x & 1) the
> >> > { 1, 1, 1, ... } vector would often need to be loaded from memory.
> >>
> >> I think for vectors a vpcmpgt {0,0,0,..}, %xmm is the cheapest way of
> >> producing the result. Note that to reflect this on GIMPLE you'd need
> >>
> >> _2 = _1 < { 0,0...};
> >> res = _2 ? { -1, -1, ...} : { 0, 0,...};
> >>
> >> because whether the ISA has a way to produce all-ones masks isn't known.
> >>
> >> For scalars using -(T)(_1 < 0) would also be possible.
> >>
> >> That said - do you have any testcase where the canonicalization is an enabler
> >> for further transforms or was this requested stand-alone?
> >>
> >> Thanks,
> >> Richard.
> >>
> >> > Jakub
> >> >
> >>
>
>
On Wed, Jul 26, 2023 at 8:19 PM Drew Ross <drross@redhat.com> wrote:
>
> Here is what I came up with for combining the two:
>
> /* For (x << c) >> c, optimize into x & ((unsigned)-1 >> c) for
> unsigned x OR truncate into the precision(type) - c lowest bits
> of signed x (if they have mode precision or a precision of 1) */
> (simplify
> (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
> (if (wi::ltu_p (wi::to_wide (@1), element_precision (type)))
> (if (TYPE_UNSIGNED (type))
> (bit_and @0 (rshift { build_minus_one_cst (type); } @1))
> (if (INTEGRAL_TYPE_P (type))
> (with {
> int width = element_precision (type) - tree_to_uhwi (@1);
> tree stype = build_nonstandard_integer_type (width, 0);
> }
> (if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
> (convert (convert:stype @0))))))))
>
> Let me know what you think.
Looks good to me.
Thanks,
Richard.
> > Btw, I wonder whether we can handle
> > some cases of widening/truncating converts between the shifts?
>
> I will look into this.
>
> Drew
>
> On Wed, Jul 26, 2023 at 4:40 AM Richard Biener <richard.guenther@gmail.com> wrote:
>>
>> On Tue, Jul 25, 2023 at 9:26 PM Drew Ross <drross@redhat.com> wrote:
>> >
>> > > With that fixed I think for non-vector integrals the above is the most suitable
>> > > canonical form of a sign-extension. Note it should also work for any other
>> > > constant shift amount - just use the appropriate intermediate precision for
>> > > the truncating type.
>> > > We _might_ want
>> > > to consider to only use the converts when the intermediate type has
>> > > mode precision (and as a special case allow one bit as in your above case)
>> > > so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).
>> >
>> > Here is a pattern that only matches truncations that result in mode precision (or precision of 1):
>> >
>> > (simplify
>> > (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
>> > (if (INTEGRAL_TYPE_P (type)
>> > && !TYPE_UNSIGNED (type)
>> > && wi::gt_p (element_precision (type), wi::to_wide (@1), TYPE_SIGN (TREE_TYPE (@1))))
>> > (with {
>> > int width = element_precision (type) - tree_to_uhwi (@1);
>> > tree stype = build_nonstandard_integer_type (width, 0);
>> > }
>> > (if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
>> > (convert (convert:stype @0))))))
>> >
>> > Look ok?
>>
>> I suppose so. Can you see to amend the existing
>>
>> /* Optimize (x << c) >> c into x & ((unsigned)-1 >> c) for unsigned
>> types. */
>> (simplify
>> (rshift (lshift @0 INTEGER_CST@1) @1)
>> (if (TYPE_UNSIGNED (type)
>> && (wi::ltu_p (wi::to_wide (@1), element_precision (type))))
>> (bit_and @0 (rshift { build_minus_one_cst (type); } @1))))
>>
>> pattern? You will get a duplicate pattern diagnostic otherwise. It
>> also looks like this
>> one has the (nop_convert? ..) missing. Btw, I wonder whether we can handle
>> some cases of widening/truncating converts between the shifts?
>>
>> Richard.
>>
>> > > You might also want to verify what RTL expansion
>> > > produces before/after - it at least shouldn't be worse.
>> >
>> > The RTL is slightly better for the mode precision cases and slightly worse for the precision 1 case.
>> >
>> > > That said - do you have any testcase where the canonicalization is an enabler
>> > > for further transforms or was this requested stand-alone?
>> >
>> > No, I don't have any specific test cases. This patch is just in response to pr101955.
>> >
>> > On Tue, Jul 25, 2023 at 2:55 AM Richard Biener <richard.guenther@gmail.com> wrote:
>> >>
>> >> On Mon, Jul 24, 2023 at 9:42 PM Jakub Jelinek <jakub@redhat.com> wrote:
>> >> >
>> >> > On Mon, Jul 24, 2023 at 03:29:54PM -0400, Drew Ross via Gcc-patches wrote:
>> >> > > So would something like
>> >> > >
>> >> > > (simplify
>> >> > > (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
>> >> > > (with { tree stype = build_nonstandard_integer_type (1, 0); }
>> >> > > (if (INTEGRAL_TYPE_P (type)
>> >> > > && !TYPE_UNSIGNED (type)
>> >> > > && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
>> >> > > (convert (convert:stype @0)))))
>> >> > >
>> >> > > work?
>> >> >
>> >> > Certainly swap the if and with and the (with then should be indented by 1
>> >> > column to the right of (if and (convert one further (the reason for the
>> >> > swapping is not to call build_nonstandard_integer_type when it will not be
>> >> > needed, which will be probably far more often than an actual match).
>> >>
>> >> With that fixed I think for non-vector integrals the above is the most suitable
>> >> canonical form of a sign-extension. Note it should also work for any other
>> >> constant shift amount - just use the appropriate intermediate precision for
>> >> the truncating type. You might also want to verify what RTL expansion
>> >> produces before/after - it at least shouldn't be worse. We _might_ want
>> >> to consider to only use the converts when the intermediate type has
>> >> mode precision (and as a special case allow one bit as in your above case)
>> >> so it can expand to (sign_extend:<outer> (subreg:<inner> reg)).
>> >>
>> >> > As discussed privately, the above isn't what we want for vectors and the 2
>> >> > shifts are probably best on most arches because even when using -(x & 1) the
>> >> > { 1, 1, 1, ... } vector would often need to be loaded from memory.
>> >>
>> >> I think for vectors a vpcmpgt {0,0,0,..}, %xmm is the cheapest way of
>> >> producing the result. Note that to reflect this on GIMPLE you'd need
>> >>
>> >> _2 = _1 < { 0,0...};
>> >> res = _2 ? { -1, -1, ...} : { 0, 0,...};
>> >>
>> >> because whether the ISA has a way to produce all-ones masks isn't known.
>> >>
>> >> For scalars using -(T)(_1 < 0) would also be possible.
>> >>
>> >> That said - do you have any testcase where the canonicalization is an enabler
>> >> for further transforms or was this requested stand-alone?
>> >>
>> >> Thanks,
>> >> Richard.
>> >>
>> >> > Jakub
>> >> >
>> >>
>>