tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization [PR90693]
Checks
Commit Message
Hi!
Per the earlier discussions on this PR, the following patch folds
popcount (x) == 1 (and != 1) into (x ^ (x - 1)) > x - 1 (or <=)
if the corresponding popcount optab isn't implemented (I think any
double-word popcount or call will be necessarily slower than the
above cheap 3 op check and even for -Os larger or same size).
I've noticed e.g. C++ aligned new starts with std::has_single_bit
which does popcount (x) == 1.
As a follow-up, I'm considering changing in this routine the popcount
call to IFN_POPCOUNT with 2 arguments and during expansion test costs.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
2023-11-17 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/90693
* tree-ssa-math-opts.cc (match_single_bit_test): New function.
(math_opts_dom_walker::after_dom_children): Call it for EQ_EXPR
and NE_EXPR assignments and GIMPLE_CONDs.
* gcc.target/i386/pr90693.c: New test.
Jakub
Comments
On Mon, Nov 20, 2023 at 07:54:54AM +0000, Richard Biener wrote:
> On Fri, 17 Nov 2023, Jakub Jelinek wrote:
> > Per the earlier discussions on this PR, the following patch folds
> > popcount (x) == 1 (and != 1) into (x ^ (x - 1)) > x - 1 (or <=)
> > if the corresponding popcount optab isn't implemented (I think any
> > double-word popcount or call will be necessarily slower than the
> > above cheap 3 op check and even for -Os larger or same size).
> >
> > I've noticed e.g. C++ aligned new starts with std::has_single_bit
> > which does popcount (x) == 1.
> >
> > As a follow-up, I'm considering changing in this routine the popcount
> > call to IFN_POPCOUNT with 2 arguments and during expansion test costs.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Classically this would have been an RTL expansion alternative, given
> we want to do less of those the next place would have been the ISEL
> pass. Any particular reason you chose widening-mul for this (guess
> that pass just has a bad name and it's the effective "optimize" ISEL pass
> we have).
I think the ssa-math-opts pass does far more of this staff than the isel
pass which only deals with vector stuff right now and you've even mentioned
that pass for that in the PR90693 thread.
That said, I can move it into the isel pass as well if you prefer that.
Jakub
On Mon, Nov 20, 2023 at 08:39:42AM +0000, Richard Biener wrote:
> I think it's fine as you posted (and Jeff approved), I'm just wondering
> if we should rename that pass somehow ;) Note ISEL is more for
> required pre-expansion stuff and widen-mul is for expansion related but
> optimization parts.
+1 for renaming the pass.
Jakub
@@ -5154,6 +5154,72 @@ match_uaddc_usubc (gimple_stmt_iterator
return true;
}
+/* Replace .POPCOUNT (x) == 1 or .POPCOUNT (x) != 1 with
+ (x & (x - 1)) > x - 1 or (x & (x - 1)) <= x - 1 if .POPCOUNT
+ isn't a direct optab. */
+
+static void
+match_single_bit_test (gimple_stmt_iterator *gsi, gimple *stmt)
+{
+ tree clhs, crhs;
+ enum tree_code code;
+ if (gimple_code (stmt) == GIMPLE_COND)
+ {
+ clhs = gimple_cond_lhs (stmt);
+ crhs = gimple_cond_rhs (stmt);
+ code = gimple_cond_code (stmt);
+ }
+ else
+ {
+ clhs = gimple_assign_rhs1 (stmt);
+ crhs = gimple_assign_rhs2 (stmt);
+ code = gimple_assign_rhs_code (stmt);
+ }
+ if (code != EQ_EXPR && code != NE_EXPR)
+ return;
+ if (TREE_CODE (clhs) != SSA_NAME || !integer_onep (crhs))
+ return;
+ gimple *call = SSA_NAME_DEF_STMT (clhs);
+ combined_fn cfn = gimple_call_combined_fn (call);
+ switch (cfn)
+ {
+ CASE_CFN_POPCOUNT:
+ break;
+ default:
+ return;
+ }
+ if (!has_single_use (clhs))
+ return;
+ tree arg = gimple_call_arg (call, 0);
+ tree type = TREE_TYPE (arg);
+ if (!INTEGRAL_TYPE_P (type))
+ return;
+ if (direct_internal_fn_supported_p (IFN_POPCOUNT, type, OPTIMIZE_FOR_BOTH))
+ return;
+ tree argm1 = make_ssa_name (type);
+ gimple *g = gimple_build_assign (argm1, PLUS_EXPR, arg,
+ build_int_cst (type, -1));
+ gsi_insert_before (gsi, g, GSI_SAME_STMT);
+ g = gimple_build_assign (make_ssa_name (type), BIT_XOR_EXPR, arg, argm1);
+ gsi_insert_before (gsi, g, GSI_SAME_STMT);
+ if (gcond *cond = dyn_cast <gcond *> (stmt))
+ {
+ gimple_cond_set_lhs (cond, gimple_assign_lhs (g));
+ gimple_cond_set_rhs (cond, argm1);
+ gimple_cond_set_code (cond, code == EQ_EXPR ? GT_EXPR : LE_EXPR);
+ }
+ else
+ {
+ gimple_assign_set_rhs1 (stmt, gimple_assign_lhs (g));
+ gimple_assign_set_rhs2 (stmt, argm1);
+ gimple_assign_set_rhs_code (stmt, code == EQ_EXPR ? GT_EXPR : LE_EXPR);
+ }
+ update_stmt (stmt);
+ gimple_stmt_iterator gsi2 = gsi_for_stmt (call);
+ gsi_remove (&gsi2, true);
+ release_defs (call);
+}
+
/* Return true if target has support for divmod. */
static bool
@@ -5807,6 +5873,11 @@ math_opts_dom_walker::after_dom_children
match_uaddc_usubc (&gsi, stmt, code);
break;
+ case EQ_EXPR:
+ case NE_EXPR:
+ match_single_bit_test (&gsi, stmt);
+ break;
+
default:;
}
}
@@ -5872,7 +5943,10 @@ math_opts_dom_walker::after_dom_children
}
}
else if (gimple_code (stmt) == GIMPLE_COND)
- optimize_spaceship (as_a <gcond *> (stmt));
+ {
+ match_single_bit_test (&gsi, stmt);
+ optimize_spaceship (as_a <gcond *> (stmt));
+ }
gsi_next (&gsi);
}
if (fma_state.m_deferring_p
@@ -0,0 +1,29 @@
+/* PR tree-optimization/90693 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-abm -mno-popcnt -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-not "POPCOUNT \\\(" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "__builtin_popcount(ll)? \\\(" "optimized" } } */
+
+int
+foo (unsigned int x)
+{
+ return __builtin_popcount (x) == 1;
+}
+
+int
+bar (unsigned int x)
+{
+ return (x ^ (x - 1)) > x - 1;
+}
+
+int
+baz (unsigned long long x)
+{
+ return __builtin_popcountll (x) == 1;
+}
+
+int
+qux (unsigned long long x)
+{
+ return (x ^ (x - 1)) > x - 1;
+}