lower-bitint: Attempt not to emit always true conditions in handle_cast [PR113774]
Commit Message
Hi!
The following patch is the optimization part of PR113774: in
handle_cast we emit some conditionals which are always true, expecting
that VRP would figure that out later and clean them up.  Instead,
thread1 is invoked first and threads everything through those
conditions, so we end up with really ugly code which is hard to clean
up later; we then run into the PR113831 VN bug and miscompile stuff.
handle_cast computes low and high as limb indexes.  The idx < low case
doesn't need any special treatment and just uses the operand's limb.
In the idx >= high case all the bits in the limb are an extension (so
for an unsigned widening cast all those bits are 0, for a signed
widening cast all those bits are equal to the sign mask computed in
earlier code; narrowing casts don't trigger this code).  Finally, the
idx == low && idx < high case, if it exists, needs special treatment
(some bits are copied and others extended, or all bits are copied but
the sign mask needs to be computed).
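For illustration only, a minimal C sketch of that per-limb selection
(the helper name, the 64-bit limb type and the explicit masks are
assumptions for the example, not the pass's actual API):

  #include <stdint.h>

  /* Conceptual value of result limb IDX of a widening cast: limbs below
     LOW are plain copies, limbs at or above HIGH are all extension bits,
     and the single LOW <= idx < HIGH limb (if any) mixes copied and
     extended bits.  SIGN_MASK is 0 for zero extensions, all-ones for
     negative values of signed operands.  */
  static uint64_t
  cast_limb (const uint64_t *op, uint64_t sign_mask, uint64_t partial_mask,
	     unsigned idx, unsigned low, unsigned high)
  {
    if (idx < low)
      return op[idx];		/* plain copy */
    if (idx >= high)
      return sign_mask;		/* pure extension */
    /* idx == low && idx < high: low bits copied, upper bits extended.  */
    return (op[idx] & partial_mask) | (sign_mask & ~partial_mask);
  }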
The code already attempted to optimize away some unneeded comparisons.
The first hunk below handles e.g. a 257 -> 321 bit extension, where low
is 4 and high is 5 and we use a loop handling the first 4 limbs
(2 iterations) with m_upwards_2limb 4: no special handling is needed in
the loop, the special handling is done on the first limb after the
loop, and the last limb after the loop gets the extension only.  The
second hunk emits a single comparison instead of 2, e.g. for the
low == high case - that must be a zero extension from a multiple of
limb bits, say 192 -> 328 - or for the case where we know the
idx == low case happens in the other limb processed in the loop, not
the current one.
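With the usual 64-bit limbs the indexes in those examples work out as
follows (a sketch assuming low is the source precision divided by the
limb width and high is the same division rounded up, which matches the
unsigned examples above; this is illustration, not the pass's code):

  /* src_prec is the narrower (source) type's precision in bits.  */
  static void
  cast_limb_indexes (unsigned src_prec, unsigned *low, unsigned *high)
  {
    unsigned limb_prec = 64;			/* assumed limb width */
    *low = src_prec / limb_prec;		/* 257 -> 4, 255 -> 3, 192 -> 3 */
    *high = (src_prec + limb_prec - 1) / limb_prec; /* 257 -> 5, 255 -> 4, 192 -> 3 */
  }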
But the testcase shows further cases where we know some of the
comparisons can always be folded to true/false; in particular there is
a 255 -> 257 bit zero extension, so low 3, high 4, m_upwards_2limb 4.
The loop handles 2 limbs at a time, and for the first limb we were
emitting idx < low ? operand[idx] : 0; but because idx goes from 0
with step 2 for 2 iterations, idx < 3 is always true, so we can just
emit operand[idx].  This is handled in the first hunk.  In addition
to fixing it (that is the " - m_first" part in there) I've rewritten
it using low to make it more readable.
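Roughly (a simplification, not the exact GIMPLE), the change for that
first limb in the loop is:

  /* Before: the comparison is always true, since idx is only ever 0 or 2.  */
  _1 = idx < 3 ? operand[idx] : 0;
  /* After: just the copy.  */
  _1 = operand[idx];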
Similarly, for the other limb we were emitting
idx + 1 <= low ? (idx + 1 == low ? operand[idx] & 0x7ff....ff : operand[idx]) : 0
but idx + 1 <= 3 is always true in the loop, so all we should emit is
idx + 1 == low ? operand[idx] & 0x7ff....ff : operand[idx]
Unfortunately, when single_comparison is true we emit just one
comparison, but the code which fills the branches will fill it with the
operand[idx] and 0 cases (for zero extension; similarly for sign
extension), not the operand[idx] (aka copy) and
operand[idx] & 0x7ff....ff (aka most significant limb of the narrower
precision) cases.  Instead of making the code less readable by using
single_comparison for that case and handling it differently in the
later code, I've chosen to just emit a condition which will always be
true and let cfg cleanup clean it up.
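So for that second limb the emitted shape is roughly (again a
simplified sketch, not the exact IL):

  /* Before: the outer comparison idx + 1 <= 3 is always true,
     since idx + 1 is only ever 1 or 3.  */
  if (idx + 1 <= 3)
    { if (idx + 1 == 3) ... else ... }
  else ...
  /* After: the outer comparison is emitted as the constant 3 <= 3,
     which cfg cleanup folds away at the end of the pass, leaving
     just the idx + 1 == 3 test.  */
  if (3 <= 3)
    { if (idx + 1 == 3) ... else ... }
  else ...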
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
2024-02-09 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113774
* gimple-lower-bitint.cc (bitint_large_huge::handle_cast): Don't
emit any comparison if m_first and low + 1 is equal to
m_upwards_2limb, simplify condition for that. If not
single_comparison, not m_first and we can prove that the idx <= low
comparison will be always true, emit instead of idx <= low
comparison low <= low such that cfg cleanup will optimize it at
the end of the pass.
* gcc.dg/torture/bitint-57.c: New test.
Jakub
Comments
On Fri, 9 Feb 2024, Jakub Jelinek wrote:
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK.
Richard.
--- gcc/gimple-lower-bitint.cc.jj	2024-02-08 12:49:40.435313811 +0100
+++ gcc/gimple-lower-bitint.cc	2024-02-08 14:33:36.033220098 +0100
@@ -1350,9 +1350,7 @@ bitint_large_huge::handle_cast (tree lhs
if (!tree_fits_uhwi_p (idx))
{
if (m_upwards_2limb
- && (m_upwards_2limb * limb_prec
- <= ((unsigned) TYPE_PRECISION (rhs_type)
- - !TYPE_UNSIGNED (rhs_type))))
+ && low >= m_upwards_2limb - m_first)
{
rhs1 = handle_operand (rhs1, idx);
if (m_first)
@@ -1363,8 +1361,21 @@ bitint_large_huge::handle_cast (tree lhs
}
bool single_comparison
= low == high || (m_upwards_2limb && (low & 1) == m_first);
+ tree idxc = idx;
+ if (!single_comparison
+ && m_upwards_2limb
+ && !m_first
+ && low + 1 == m_upwards_2limb)
+ /* In this case we know that idx <= low always,
+ so effectively we just need a single comparison,
+ idx < low or idx == low, but we'd need to emit different
+ code for the 2 branches than single_comparison normally
+ emits. So, instead of special-casing that, emit a
+ low <= low comparison which cfg cleanup will clean up
+ at the end of the pass. */
+ idxc = size_int (low);
g = gimple_build_cond (single_comparison ? LT_EXPR : LE_EXPR,
- idx, size_int (low), NULL_TREE, NULL_TREE);
+ idxc, size_int (low), NULL_TREE, NULL_TREE);
edge edge_true_true, edge_true_false, edge_false;
if_then_if_then_else (g, (single_comparison ? NULL
: gimple_build_cond (EQ_EXPR, idx,
--- gcc/testsuite/gcc.dg/torture/bitint-57.c.jj	2024-02-08 14:45:35.033303393 +0100
+++ gcc/testsuite/gcc.dg/torture/bitint-57.c	2024-02-07 18:50:10.759168537 +0100
@@ -0,0 +1,32 @@
+/* PR tree-optimization/113774 */
+/* { dg-do run { target bitint } } */
+/* { dg-options "-std=c23 -pedantic-errors" } */
+/* { dg-skip-if "" { ! run_expensive_tests } { "*" } { "-O0" "-O2" } } */
+/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
+
+#if __BITINT_MAXWIDTH__ >= 512
+unsigned _BitInt(512) u;
+unsigned _BitInt(512) v;
+
+void
+foo (unsigned _BitInt(255) a, unsigned _BitInt(257) b, unsigned _BitInt(512) *r)
+{
+ b += v;
+ b |= a - b;
+ unsigned _BitInt(512) c = b * 6;
+ unsigned _BitInt(512) h = c >> u;
+ *r = h;
+}
+#endif
+
+int
+main ()
+{
+#if __BITINT_MAXWIDTH__ >= 512
+ unsigned _BitInt(512) x;
+ foo (0x10000000000000000wb, 0x10000000000000001wb, &x);
+ if (x != 0x1fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffawb)
+ __builtin_abort ();
+#endif
+ return 0;
+}
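For reference, the new test lives in gcc.dg/torture and can be
exercised from a GCC build tree with the usual torture harness, e.g.
make check-gcc RUNTESTFLAGS="dg-torture.exp=bitint-57.c" (a generic
invocation shown only as a usage hint, not something specific to this
patch).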