tree-ssa-math-opts: Pattern recognize hand written __builtin_mul_overflow_p with same unsigned types even when target just has highpart umul [PR101856]

Message ID ZGcsc76Md1sN0D9i@tucnak
State Unresolved

Commit Message

Jakub Jelinek May 19, 2023, 7:59 a.m. UTC
  Hi!

As the following testcase shows, on i?86/x86_64 we pattern recognize the
hand-written check as return __builtin_mul_overflow_p (x, y, 0UL); and so
avoid the extra division, but we don't do that e.g. on aarch64 or ppc64le,
even though return __builtin_mul_overflow_p (x, y, 0UL); actually produces
better code there.  The presence of the optab handler is tested to make
sure the code generated for the builtin is short, so that we don't
pessimize the code instead of optimizing it.
But there is one case that the internal-fn.cc .MUL_OVERFLOW expansion
handles nicely even without that optab: the arguments and the result have
TYPE_UNSIGNED types of the same mode, only the IMAGPART_EXPR of the result
is used (i.e. __builtin_mul_overflow_p rather than __builtin_mul_overflow)
and umul_highpart_optab supports the particular mode.  In that case we
emit a comparison of the highpart umul result against zero.
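
To illustrate why comparing the highpart against zero is enough, here is a
minimal sketch narrowed to 32-bit operands so that the full product fits in
a standard 64-bit type.  The function names are hypothetical and the code
only models the idea; it is not what internal-fn.cc emits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hand-written check in the style of the testcase, narrowed to 32 bits.
   The caller must pass y != 0; dividing by zero is undefined.  */
static bool
overflow_by_division (uint32_t x, uint32_t y)
{
  uint32_t z = x * y;	/* Wraps modulo 2^32.  */
  return z / y != x;
}

/* Upper 32 bits of the full 64-bit product, i.e. the value a highpart
   unsigned multiply would yield for this width.  */
static uint32_t
umul_highpart32 (uint32_t x, uint32_t y)
{
  return (uint32_t) (((uint64_t) x * y) >> 32);
}

/* The unsigned product overflows exactly when its upper half is nonzero,
   so no division is needed.  */
static bool
overflow_by_highpart (uint32_t x, uint32_t y)
{
  return umul_highpart32 (x, y) != 0;
}
```

For nonzero y both functions give the same answer; the second needs only a
multiply and a compare.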

So, the following patch matches what we do in internal-fn.cc and
also pattern matches __builtin_mul_overflow_p if all of these hold:
1) we only need the flag whether it overflowed (i.e. !use_seen)
2) it is unsigned (i.e. !cast_stmt)
3) umul_highpart is supported for the mode
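
The three conditions can be read as a single predicate.  The sketch below
models only the MULT_EXPR arm of the check, with plain booleans standing in
for the optab queries; the function and parameter names are hypothetical and
do not appear in tree-ssa-math-opts.cc.

```c
#include <stdbool.h>

/* Model of when the MULT_EXPR pattern may still be matched.
   have_mulv4_optab: stands in for optab_handler (...) != CODE_FOR_nothing.
   use_seen:         the product itself is used, not just the overflow flag.
   cast_stmt:        the signed variant (a cast is involved).
   can_highpart:     stands in for can_mult_highpart_p (mode, true).  */
static bool
mult_overflow_match_allowed (bool have_mulv4_optab, bool use_seen,
			     bool cast_stmt, bool can_highpart)
{
  /* With the overflow optab available, nothing changes.  */
  if (have_mulv4_optab)
    return true;
  /* Otherwise require flag-only use, unsigned operands and a highpart
     unsigned multiply for the mode.  */
  return !use_seen && !cast_stmt && can_highpart;
}
```

This is the negation of the bail-out condition in the patch: without the
optab, matching is rejected as soon as use_seen or cast_stmt is set or the
highpart multiply is unavailable.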

Bootstrapped/regtested on x86_64-linux, i686-linux, aarch64-linux and
powerpc64le-linux, ok for trunk?

2023-05-19  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/101856
	* tree-ssa-math-opts.cc (match_arith_overflow): Pattern detect
	unsigned __builtin_mul_overflow_p even when umulv4_optab doesn't
	support it but umul_highpart_optab does.

	* gcc.dg/tree-ssa/pr101856.c: New test.


	Jakub
  

Comments

Richard Biener May 19, 2023, 10:43 a.m. UTC | #1
> On 19.05.2023 at 10:00, Jakub Jelinek <jakub@redhat.com> wrote:
> 
> Bootstrapped/regtested on x86_64-linux, i686-linux, aarch64-linux and
> powerpc64le-linux, ok for trunk?

Ok.

Richard 

  

Patch

--- gcc/tree-ssa-math-opts.cc.jj	2023-05-17 20:57:59.537914382 +0200
+++ gcc/tree-ssa-math-opts.cc	2023-05-18 12:04:09.332336899 +0200
@@ -4074,7 +4074,10 @@  match_arith_overflow (gimple_stmt_iterat
 			    TYPE_MODE (type)) == CODE_FOR_nothing)
       || (code == MULT_EXPR
 	  && optab_handler (cast_stmt ? mulv4_optab : umulv4_optab,
-			    TYPE_MODE (type)) == CODE_FOR_nothing))
+			    TYPE_MODE (type)) == CODE_FOR_nothing
+	  && (use_seen
+	      || cast_stmt
+	      || !can_mult_highpart_p (TYPE_MODE (type), true))))
     {
       if (code != PLUS_EXPR)
 	return false;
--- gcc/testsuite/gcc.dg/tree-ssa/pr101856.c.jj	2023-05-18 11:57:17.681206745 +0200
+++ gcc/testsuite/gcc.dg/tree-ssa/pr101856.c	2023-05-18 11:56:51.662577752 +0200
@@ -0,0 +1,11 @@ 
+/* PR tree-optimization/101856 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump " .MUL_OVERFLOW " "optimized" { target i?86-*-* x86_64-*-* aarch64*-*-* powerpc64le-*-* } } } */
+
+int
+foo (unsigned long x, unsigned long y)
+{
+  unsigned long z = x * y;
+  return z / y != x;
+}