match.pd: Optimize sign-extension followed by truncation [PR113024]
Commit Message
Hi!

While looking at a bitint ICE, I've noticed that we don't optimize
the 2 casts in the f1 and f5 functions below into just one at GIMPLE,
even though we do optimize it in convert_to_integer if both casts appear
in the same stmt.  The large match.pd simplification of two conversions
in a row has many complex rules, and as the testcase shows, all the other
narrowest -> widest -> prec_in_between integer conversions are already
handled, either because the inside_unsignedp == inter_unsignedp
rule kicks in, or the
  && ((inter_unsignedp && inter_prec > inside_prec)
      == (final_unsignedp && final_prec > inter_prec))
one does.  But there is no reason why a sign extension from the narrowest
to the widest type followed by a truncation to something in between can't
be done just as a sign extension from the narrowest to the final type.
After all, if the widest type is signed rather than unsigned, we already
handle it that way regardless of the final type's signedness.
And since PR93044 we also handle it if the final precision is not wider
than the inside precision.
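
A quick way to see the equivalence described above is to compare both forms for every signed char input. This is a minimal standalone sketch, not part of the patch; the helper names (two_step_u, one_step_u, two_step_s, one_step_s, count_mismatches) are hypothetical, and it assumes a 32-bit int, as on the ilp32 and lp64 targets the new test is restricted to.

```c
#include <assert.h>
#include <limits.h>

/* Two-step form, as in f1 and f5 of the testcase: sign-extend
   signed char to unsigned long long, then truncate to 32 bits.  */
static unsigned int two_step_u (signed char x) { unsigned long long y = x; return (unsigned int) y; }
static int two_step_s (signed char x) { unsigned long long y = x; return (int) y; }

/* One-step form: a single conversion, i.e. a sign extension
   straight from signed char to the final type.  */
static unsigned int one_step_u (signed char x) { return (unsigned int) x; }
static int one_step_s (signed char x) { return (int) x; }

/* Returns the number of signed char inputs for which the two forms differ.
   Converting an out-of-range unsigned long long to int is
   implementation-defined in ISO C; GCC defines it as reduction
   modulo 2^32, which is what two_step_s relies on.  */
static int count_mismatches (void)
{
  int bad = 0;
  for (int i = SCHAR_MIN; i <= SCHAR_MAX; i++)
    {
      signed char x = (signed char) i;
      if (two_step_u (x) != one_step_u (x))
        bad++;
      if (two_step_s (x) != one_step_s (x))
        bad++;
    }
  return bad;
}
```

Calling count_mismatches () should return 0. This only checks the source-level semantics; whether GCC actually emits a single conversion at GIMPLE is what the forwprop1 dump scan in the new test checks.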

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-12-14  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/113024
	* match.pd (two conversions in a row): Simplify scalar integer
	sign-extension followed by truncation.

	* gcc.dg/tree-ssa/pr113024.c: New test.

Jakub
Comments
On Thu, 14 Dec 2023, Jakub Jelinek wrote:
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK.
Richard.
Jakub Jelinek <jakub@redhat.com> writes:
> --- gcc/match.pd.jj 2023-12-14 11:59:28.000000000 +0100
> +++ gcc/match.pd 2023-12-14 18:25:00.457961975 +0100
> @@ -4754,11 +4754,14 @@ (define_operator_list SYNC_FETCH_AND_AND
> /* If we have a sign-extension of a zero-extended value, we can
> replace that by a single zero-extension. Likewise if the
> final conversion does not change precision we can drop the
> - intermediate conversion. */
> + intermediate conversion. Similarly truncation of a sign-extension
> + can be replaced by a single sign-extension. */
> (if (inside_int && inter_int && final_int
> && ((inside_prec < inter_prec && inter_prec < final_prec
> && inside_unsignedp && !inter_unsignedp)
> - || final_prec == inter_prec))
> + || final_prec == inter_prec
> + || (inside_prec < inter_prec && inter_prec > final_prec
> + && !inside_unsignedp && inter_unsignedp)))
Just curious: is the inter_unsignedp part needed for correctness?
If it's bigger than both the initial and final type then I wouldn't
have expected its signedness to matter.
Thanks,
Richard
> (ocvt @0))
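Patch
--- gcc/match.pd.jj 2023-12-14 11:59:28.000000000 +0100
+++ gcc/match.pd 2023-12-14 18:25:00.457961975 +0100
@@ -4754,11 +4754,14 @@ (define_operator_list SYNC_FETCH_AND_AND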
/* If we have a sign-extension of a zero-extended value, we can
replace that by a single zero-extension. Likewise if the
final conversion does not change precision we can drop the
- intermediate conversion. */
+ intermediate conversion. Similarly truncation of a sign-extension
+ can be replaced by a single sign-extension. */
(if (inside_int && inter_int && final_int
&& ((inside_prec < inter_prec && inter_prec < final_prec
&& inside_unsignedp && !inter_unsignedp)
- || final_prec == inter_prec))
+ || final_prec == inter_prec
+ || (inside_prec < inter_prec && inter_prec > final_prec
+ && !inside_unsignedp && inter_unsignedp)))
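(ocvt @0))

/* Two conversions in a row are not needed unless:
--- gcc/testsuite/gcc.dg/tree-ssa/pr113024.c.jj 2023-12-14 18:35:30.652225327 +0100
+++ gcc/testsuite/gcc.dg/tree-ssa/pr113024.c 2023-12-14 18:37:42.056403418 +0100
@@ -0,0 +1,22 @@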
+/* PR tree-optimization/113024 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-forwprop1" } */
+/* Make sure we have just a single cast per function rather than 2 casts in some cases. */
+/* { dg-final { scan-tree-dump-times " = \\\(\[a-z \]*\\\) \[xy_\]" 16 "forwprop1" { target { ilp32 || lp64 } } } } */
+
+unsigned int f1 (signed char x) { unsigned long long y = x; return y; }
+unsigned int f2 (unsigned char x) { unsigned long long y = x; return y; }
+unsigned int f3 (signed char x) { long long y = x; return y; }
+unsigned int f4 (unsigned char x) { long long y = x; return y; }
+int f5 (signed char x) { unsigned long long y = x; return y; }
+int f6 (unsigned char x) { unsigned long long y = x; return y; }
+int f7 (signed char x) { long long y = x; return y; }
+int f8 (unsigned char x) { long long y = x; return y; }
+unsigned int f9 (signed char x) { return (unsigned long long) x; }
+unsigned int f10 (unsigned char x) { return (unsigned long long) x; }
+unsigned int f11 (signed char x) { return (long long) x; }
+unsigned int f12 (unsigned char x) { return (long long) x; }
+int f13 (signed char x) { return (unsigned long long) x; }
+int f14 (unsigned char x) { return (unsigned long long) x; }
+int f15 (signed char x) { return (long long) x; }
+int f16 (unsigned char x) { return (long long) x; }
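
Regarding the review question about whether inter_unsignedp is needed for correctness: one way to explore it is to model each conversion on raw bit patterns and exhaustively compare (final)(inter)x with (final)x for small precisions. Below is a minimal sketch of that idea, assuming that an integer conversion truncates when narrowing and zero- or sign-extends according to the source signedness when widening. The helpers mask, convert, new_disjunct and count_bad_folds are hypothetical, not GCC functions; new_disjunct only mirrors the condition added by this patch, not the rest of the match.pd pattern, and the final type's signedness is left out because it does not affect the resulting bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit mask with the low PREC bits set (1 <= PREC <= 63 here).  */
static uint64_t mask (int prec) { return (UINT64_C (1) << prec) - 1; }

/* Model of converting bit pattern V of precision FROM_PREC and signedness
   FROM_UNSIGNEDP to precision TO_PREC: narrowing truncates, widening
   zero- or sign-extends depending only on the source signedness.  */
static uint64_t convert (uint64_t v, int from_prec, bool from_unsignedp, int to_prec)
{
  v &= mask (from_prec);
  bool sign_bit = (v >> (from_prec - 1)) & 1;
  if (to_prec > from_prec && !from_unsignedp && sign_bit)
    v |= mask (to_prec) & ~mask (from_prec);   /* sign-extend */
  return v & mask (to_prec);
}

/* The disjunct added by the patch.  */
static bool new_disjunct (int inside_prec, bool inside_unsignedp,
                          int inter_prec, bool inter_unsignedp, int final_prec)
{
  return inside_prec < inter_prec && inter_prec > final_prec
         && !inside_unsignedp && inter_unsignedp;
}

/* Count cases with precisions 1..MAX_PREC (signed inside type, either
   inter signedness) where the condition holds but folding would change
   the result.  If REQUIRE_INTER_UNSIGNED is false, the inter_unsignedp
   test is dropped from the condition.  */
static long count_bad_folds (int max_prec, bool require_inter_unsigned)
{
  long bad = 0;
  for (int ip = 1; ip <= max_prec; ip++)
    for (int mp = 1; mp <= max_prec; mp++)
      for (int fp = 1; fp <= max_prec; fp++)
        for (int mu = 0; mu <= 1; mu++)
          {
            bool inter_unsigned = mu;
            bool cond_inter = require_inter_unsigned ? inter_unsigned : true;
            if (!new_disjunct (ip, false, mp, cond_inter, fp))
              continue;
            for (uint64_t x = 0; x <= mask (ip); x++)
              {
                uint64_t two = convert (convert (x, ip, false, mp),
                                        mp, inter_unsigned, fp);
                uint64_t one = convert (x, ip, false, fp);
                if (two != one)
                  bad++;
              }
          }
  return bad;
}
```

With max_prec = 8, count_bad_folds returns 0 both with and without the inter_unsignedp test, which is consistent with the intuition in the review that the intermediate type's signedness cannot affect bits that are then truncated away. The sketch covers this one disjunct in isolation only; whether the extra test is useful to keep it from overlapping the other rules in the pattern is a separate question it does not answer.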