lower-bitint: Avoid merging non-mergeable stmt with cast and mergeable stmt [PR112902]
Commit Message
Hi!
Before bitint lowering, the IL has:
b.0_1 = b;
_2 = -b.0_1;
_3 = (unsigned _BitInt(512)) _2;
a.1_4 = a;
a.2_5 = (unsigned _BitInt(512)) a.1_4;
_6 = _3 * a.2_5;
on the first function. Now, gimple_lower_bitint has an optimization
(when not compiling at -O0) that avoids assigning underlying VAR_DECLs to
certain SSA_NAMEs where it is possible to lower them in a single loop (or
straight-line code) rather than in multiple loops.
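To illustrate (a hand-written C sketch of the idea, not the pass's actual
output -- the pass emits GIMPLE directly), a mergeable operation such as the
negation above can be lowered as one loop over 64-bit limbs:

static void
negate512 (unsigned long long res[8], const unsigned long long op[8])
{
  /* Limb-wise negation of a 512-bit value: compute 0 - op, propagating
     a borrow from the least significant limb upwards.  */
  unsigned long long borrow = 0;
  for (int i = 0; i < 8; i++)
    {
      res[i] = 0 - op[i] - borrow;
      /* Once any nonzero limb has been seen, all higher limbs borrow.  */
      borrow |= (op[i] != 0);
    }
}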
So, e.g. the multiplication above uses handle_operand_addr, which can deal
with INTEGER_CST arguments and loads but also casts, so it is fine
not to assign an underlying VAR_DECL to the SSA_NAMEs a.1_4 and a.2_5;
the multiplication can handle them directly.
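(For large multiplications nothing is expanded inline; the lowering emits a
libgcc __mulbitint3 call which takes the result and operands by address
together with their precisions, roughly

  __mulbitint3 (&_6, 512, &_3, 512, &a.2_5, 512);

-- the exact arguments are simplified here -- so all handle_operand_addr
needs to produce for an operand is an address.)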
The more problematic case is the other multiplication operand.
It is again the result of a (in this case narrowing) cast, so it would seem
fine not to assign a VAR_DECL to _3. Normally we can merge the load (b.0_1)
with the negation (_2) and even with the following cast (_3). If _3
were used in a mergeable operation like addition, subtraction, negation,
&|^ or equality comparison, all of b.0_1, _2 and _3 could be left without
underlying VAR_DECLs.
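E.g. if the use were instead
  _6 = _3 + a.2_5;
the whole b.0_1, _2, _3 chain together with the addition could be handled
limb by limb in a single loop, with no temporary ever stored to memory.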
The problem is that the current code does that even when the cast is used
by a non-mergeable operation, and handle_operand_addr certainly can't handle
the mergeable operations feeding the rhs1 of the cast: for multiplication
we don't emit any loop in which they could appear, and for other operations
like shifts or non-equality comparisons we do emit loops, but either in the
reverse direction or with unpredictable indexes (for shifts).
So, in order to lower the above correctly, we need to have an underlying
VAR_DECL for either _2 or _3. If we choose _2, then the load and negation
are done in one loop and the extension is handled as part of the
multiplication; if we choose _3, then the load, negation and cast are done
in one loop and the multiplication just uses the underlying VAR_DECL
computed by that loop.
It is far easier to do this for _3, which is what the following patch
implements.
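I.e. the first function is then lowered as if the IL were (illustrative
only; the temporary's actual name is whatever the pass picks):
  bitint.N = (unsigned _BitInt(512)) -b;
  _6 = bitint.N * a.2_5;
where the first statement becomes one limb-wise loop filling the underlying
VAR_DECL and the multiplication just takes its address.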
The pass actually already had code for most of this, it just did so for
widening casts only (optimize unless the cast rhs1 is not an SSA_NAME, is
an SSA_NAME defined in some other bb, has more than one use, etc.).
The patch now falls through into that code also for narrowing or
same-precision casts, unless the cast is used in a mergeable operation.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
2023-12-08 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112902
* gimple-lower-bitint.cc (gimple_lower_bitint): For a narrowing
or same precision cast don't set SSA_NAME_VERSION in m_names only
if use_stmt is mergeable_op or fall through into the check that
use is a store or rhs1 is not mergeable or other reasons prevent
merging.
* gcc.dg/bitint-52.c: New test.
Jakub
Comments
On Fri, 8 Dec 2023, Jakub Jelinek wrote:
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK.
> --- gcc/gimple-lower-bitint.cc.jj 2023-12-06 09:55:18.522993378 +0100
> +++ gcc/gimple-lower-bitint.cc 2023-12-07 18:05:17.183692049 +0100
> @@ -5989,10 +5989,11 @@ gimple_lower_bitint (void)
> {
> if (TREE_CODE (TREE_TYPE (rhs1)) != BITINT_TYPE
> || (bitint_precision_kind (TREE_TYPE (rhs1))
> - < bitint_prec_large)
> - || (TYPE_PRECISION (TREE_TYPE (rhs1))
> - >= TYPE_PRECISION (TREE_TYPE (s)))
> - || mergeable_op (SSA_NAME_DEF_STMT (s)))
> + < bitint_prec_large))
> + continue;
> + if ((TYPE_PRECISION (TREE_TYPE (rhs1))
> + >= TYPE_PRECISION (TREE_TYPE (s)))
> + && mergeable_op (use_stmt))
> continue;
> /* Prevent merging a widening non-mergeable cast
> on result of some narrower mergeable op
> @@ -6011,7 +6012,9 @@ gimple_lower_bitint (void)
> || !mergeable_op (SSA_NAME_DEF_STMT (rhs1))
> || gimple_store_p (use_stmt))
> continue;
> - if (gimple_assign_cast_p (SSA_NAME_DEF_STMT (rhs1)))
> + if ((TYPE_PRECISION (TREE_TYPE (rhs1))
> + < TYPE_PRECISION (TREE_TYPE (s)))
> + && gimple_assign_cast_p (SSA_NAME_DEF_STMT (rhs1)))
> {
> /* Another exception is if the widening cast is
> from mergeable same precision cast from something
> --- gcc/testsuite/gcc.dg/bitint-52.c.jj 2023-12-08 00:35:39.970953164 +0100
> +++ gcc/testsuite/gcc.dg/bitint-52.c 2023-12-08 00:35:21.983205440 +0100
> @@ -0,0 +1,22 @@
> +/* PR tree-optimization/112902 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-std=c23 -O2" } */
> +
> +double c;
> +#if __BITINT_MAXWIDTH__ >= 2048
> +_BitInt (512) a;
> +_BitInt (2048) b;
> +
> +void
> +foo (void)
> +{
> + b = __builtin_mul_overflow_p (40, (_BitInt (512)) (-b * a), 0);
> +}
> +
> +
> +void
> +bar (void)
> +{
> + c -= (unsigned _BitInt (512)) (a | a << b);
> +}
> +#endif