match.pd: Canonicalize (signed x << c) >> c [PR101955]

Message ID 20230801192033.432742-1-drross@redhat.com
State Accepted
Series match.pd: Canonicalize (signed x << c) >> c [PR101955]


Commit Message

Drew Ross Aug. 1, 2023, 7:20 p.m. UTC
  Canonicalizes (signed x << c) >> c into the lowest
precision(type) - c bits of x, if those bits have a mode precision or a
precision of 1. Also combines this rule with the existing (unsigned x << c) >> c
-> x & ((unsigned)-1 >> c) simplification to avoid a duplicate pattern.
Tested successfully on x86_64 and x86 targets.

  PR middle-end/101955

gcc/ChangeLog:

  * match.pd ((signed x << c) >> c): New canonicalization.

gcc/testsuite/ChangeLog:

  * gcc.dg/pr101955.c: New test.
---
 gcc/match.pd                    | 20 +++++++----
 gcc/testsuite/gcc.dg/pr101955.c | 63 +++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr101955.c
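
As an illustration (not part of the patch), a minimal C-level sketch of what
the canonicalization produces, assuming a 32-bit int and GCC's usual semantics
for signed shifts; the function names below are made up for this example:

/* Sketch only: (x << 24) >> 24 keeps the lowest 32 - 24 = 8 bits of x and
   sign-extends them, so the canonical form is a conversion through a
   narrower signed type of that precision (signed char here).  */
int
shift_pair (int x)
{
  return (x << 24) >> 24;        /* before: shift left, then right */
}

int
canonical_form (int x)
{
  return (int) (signed char) x;  /* after: sign-extend the low 8 bits */
}

unsigned int
unsigned_case (unsigned int x)
{
  /* The combined rule keeps the pre-existing unsigned folding:
     x & ((unsigned) -1 >> 24).  */
  return (x << 24) >> 24;
}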
  

Comments

Jakub Jelinek Aug. 1, 2023, 9:36 p.m. UTC | #1
On Tue, Aug 01, 2023 at 03:20:33PM -0400, Drew Ross via Gcc-patches wrote:
> Canonicalizes (signed x << c) >> c into the lowest
> precision(type) - c bits of x IF those bits have a mode precision or a
> precision of 1. Also combines this rule with (unsigned x << c) >> c -> x &
> ((unsigned)-1 >> c) to prevent duplicate pattern. Tested successfully on
> x86_64 and x86 targets.
> 
>   PR middle-end/101955
> 
> gcc/ChangeLog:
> 
>   * match.pd ((signed x << c) >> c): New canonicalization.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/pr101955.c: New test.
> ---
>  gcc/match.pd                    | 20 +++++++----
>  gcc/testsuite/gcc.dg/pr101955.c | 63 +++++++++++++++++++++++++++++++++
>  2 files changed, 77 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr101955.c
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 8543f777a28..62f7c84f565 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3758,13 +3758,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  			- TYPE_PRECISION (TREE_TYPE (@2)))))
>    (bit_and (convert @0) (lshift { build_minus_one_cst (type); } @1))))
>  
> -/* Optimize (x << c) >> c into x & ((unsigned)-1 >> c) for unsigned
> -   types.  */
> +/* For (x << c) >> c, optimize into x & ((unsigned)-1 >> c) for
> +   unsigned x OR truncate into the precision(type) - c lowest bits
> +   of signed x (if they have mode precision or a precision of 1)  */

There should be . between ) and "  */" above.

>  (simplify
> - (rshift (lshift @0 INTEGER_CST@1) @1)
> - (if (TYPE_UNSIGNED (type)
> -      && (wi::ltu_p (wi::to_wide (@1), element_precision (type))))
> -  (bit_and @0 (rshift { build_minus_one_cst (type); } @1))))
> + (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
> + (if (wi::ltu_p (wi::to_wide (@1), element_precision (type)))
> +  (if (TYPE_UNSIGNED (type))
> +   (bit_and @0 (rshift { build_minus_one_cst (type); } @1))

This needs to be (convert @0) instead of @0, because now that there is
the nop_convert? in between, @0 could have different type than type.
I certainly see regressions on
gcc.c-torture/compile/950612-1.c
on i686-linux because of this:
/home/jakub/src/gcc/gcc/testsuite/gcc.c-torture/compile/950612-1.c:17:1: error: type mismatch in binary expression
long long unsigned int

long long int

long long unsigned int

_346 = _3 & 4294967295;
during GIMPLE pass: forwprop
/home/jakub/src/gcc/gcc/testsuite/gcc.c-torture/compile/950612-1.c:17:1: internal compiler error: verify_gimple failed
0x9018a4e verify_gimple_in_cfg(function*, bool, bool)
        ../../gcc/tree-cfg.cc:5646
0x8e81eb5 execute_function_todo
        ../../gcc/passes.cc:2088
0x8e8234c do_per_function
        ../../gcc/passes.cc:1687
0x8e82431 execute_todo
        ../../gcc/passes.cc:2142
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
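
A hypothetical sketch (not the reduced 950612-1.c testcase) of the shape being
described: the left shift happens in the signed type and a no-op conversion to
the same-precision unsigned type sits before the right shift, so @0 has a
different type than `type` and the bit_and needs (convert @0):

/* Hypothetical example only.  The left shift is done in signed long long,
   then a nop conversion to unsigned long long precedes the right shift.
   In the new pattern @0 binds to the signed x while `type` is unsigned
   long long, so emitting a bare @0 in the bit_and mixes the two types --
   the "type mismatch in binary expression" reported above.  */
unsigned long long
f (long long x)
{
  return (unsigned long long) (x << 32) >> 32;
}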

> +   (if (INTEGRAL_TYPE_P (type))
> +    (with {
> +      int width = element_precision (type) - tree_to_uhwi (@1);
> +      tree stype = build_nonstandard_integer_type (width, 0);
> +     }
> +     (if (width  == 1 || type_has_mode_precision_p (stype))
> +      (convert (convert:stype @0))))))))

just one space before == instead of two

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr101955.c
> @@ -0,0 +1,63 @@
> +/* { dg-do compile } */

The above line should be
/* { dg-do compile { target int32 } } */
because the test relies on 32-bit int, some targets have just
16-bit int.
Of course, unless you want to make the testcase more portable, by
using say
#define CHAR_BITS __CHAR_BIT__
#define INT_BITS (__SIZEOF_INT__ * __CHAR_BIT__)
#define LLONG_BITS (__SIZEOF_LONG_LONG__ * __CHAR_BIT__)
and replacing all the 31, 24, 56 etc. constants with (INT_BITS - 1),
(INT_BITS - CHAR_BITS), (LLONG_BITS - CHAR_BITS) etc.
Though, it would still fail on some AVR configurations which have
(invalid for C) just 8-bit int, and the question is what to do with
that 16, because (INT_BITS - 2 * CHAR_BITS) is 0 on 16-bit ints, so
it would need to be (INT_BITS / 2) instead.  C requires that
long long is at least 64-bit, so that is less problematic (no known
target to have > 64-bit long long, though theoretically possible).
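
Along those lines, a sketch (not the committed test) of how a few of the test
functions could be made portable; the INT_BITS / 2 form for the 16 case
follows the note above, and __CHAR_BIT__, __SIZEOF_INT__ and
__SIZEOF_LONG_LONG__ are the GCC predefined macros:

/* Sketch of a more portable variant of some test functions, replacing the
   hard-coded shift counts with widths derived from predefined macros.  */
#define CHAR_BITS  __CHAR_BIT__
#define INT_BITS   (__SIZEOF_INT__ * __CHAR_BIT__)
#define LLONG_BITS (__SIZEOF_LONG_LONG__ * __CHAR_BIT__)

__attribute__((noipa)) int
t4 (int x)
{
  /* Was (x << 24) >> 24 for 32-bit int.  */
  return (x << (INT_BITS - CHAR_BITS)) >> (INT_BITS - CHAR_BITS);
}

__attribute__((noipa)) int
t5 (int x)
{
  /* Was (x << 16) >> 16; INT_BITS / 2 stays nonzero for 16-bit int.  */
  return (x << (INT_BITS / 2)) >> (INT_BITS / 2);
}

__attribute__((noipa)) long long
t7 (long long x)
{
  /* Was (x << 56) >> 56.  */
  return (x << (LLONG_BITS - CHAR_BITS)) >> (LLONG_BITS - CHAR_BITS);
}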

> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +

	Jakub
  

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index 8543f777a28..62f7c84f565 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3758,13 +3758,21 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 			- TYPE_PRECISION (TREE_TYPE (@2)))))
   (bit_and (convert @0) (lshift { build_minus_one_cst (type); } @1))))
 
-/* Optimize (x << c) >> c into x & ((unsigned)-1 >> c) for unsigned
-   types.  */
+/* For (x << c) >> c, optimize into x & ((unsigned)-1 >> c) for
+   unsigned x OR truncate into the precision(type) - c lowest bits
+   of signed x (if they have mode precision or a precision of 1)  */
 (simplify
- (rshift (lshift @0 INTEGER_CST@1) @1)
- (if (TYPE_UNSIGNED (type)
-      && (wi::ltu_p (wi::to_wide (@1), element_precision (type))))
-  (bit_and @0 (rshift { build_minus_one_cst (type); } @1))))
+ (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
+ (if (wi::ltu_p (wi::to_wide (@1), element_precision (type)))
+  (if (TYPE_UNSIGNED (type))
+   (bit_and @0 (rshift { build_minus_one_cst (type); } @1))
+   (if (INTEGRAL_TYPE_P (type))
+    (with {
+      int width = element_precision (type) - tree_to_uhwi (@1);
+      tree stype = build_nonstandard_integer_type (width, 0);
+     }
+     (if (width  == 1 || type_has_mode_precision_p (stype))
+      (convert (convert:stype @0))))))))
 
 /* Optimize x >> x into 0 */
 (simplify
diff --git a/gcc/testsuite/gcc.dg/pr101955.c b/gcc/testsuite/gcc.dg/pr101955.c
new file mode 100644
index 00000000000..8619661b291
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr101955.c
@@ -0,0 +1,63 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+__attribute__((noipa)) int
+t1 (int x)
+{
+  int y = x << 31;
+  int z = y >> 31;
+  return z;
+}
+
+__attribute__((noipa)) int
+t2 (unsigned int x)
+{
+  int y = x << 31;
+  int z = y >> 31;
+  return z;
+}
+
+__attribute__((noipa)) int
+t3 (int x)
+{
+  return (x << 31) >> 31;
+}
+
+__attribute__((noipa)) int
+t4 (int x)
+{
+  return (x << 24) >> 24;
+}
+
+__attribute__((noipa)) int
+t5 (int x)
+{
+  return (x << 16) >> 16;
+}
+
+__attribute__((noipa)) long long
+t6 (long long x)
+{
+  return (x << 63) >> 63;
+}
+
+__attribute__((noipa)) long long
+t7 (long long x)
+{
+  return (x << 56) >> 56;
+}
+
+__attribute__((noipa)) long long
+t8 (long long x)
+{
+  return (x << 48) >> 48;
+}
+
+__attribute__((noipa)) long long
+t9 (long long x)
+{
+  return (x << 32) >> 32;
+}
+
+/* { dg-final { scan-tree-dump-not " >> " "optimized" } } */
+/* { dg-final { scan-tree-dump-not " << " "optimized" } } */