aarch64: [PR110986] Emit csinv again for `a ? ~b : b`

Message ID 20231019040519.2655598-1-pinskia@gmail.com
State Accepted

Commit Message

Andrew Pinski Oct. 19, 2023, 4:05 a.m. UTC
  After r14-3110-g7fb65f10285, the canonical form for
`a ? ~b : b` changed to `-(a) ^ b`.  This means that
for aarch64 we need to add a few new insn patterns
to catch the new form and convert it back to
the canonical form for the aarch64 backend.
A second pattern is needed to support the zero-extended
form too.  This also adds a testcase covering all 3 cases.
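
As a source-level illustration only (not part of the patch, and not the RTL the backend sees), the two forms can be compared in plain C.  This is a minimal sketch that assumes the condition is normalised to 0 or 1 before negation; the names select_form, mask_form and check_forms are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Conditional form, as written in the testcase.  */
static uint32_t select_form(uint32_t a, uint32_t b)
{
  return a ? ~b : b;
}

/* Mask form: -(a != 0) is all ones when a is nonzero and zero otherwise.
   XOR with all ones inverts b; XOR with zero leaves b unchanged.  */
static uint32_t mask_form(uint32_t a, uint32_t b)
{
  return -(uint32_t)(a != 0) ^ b;
}

/* Compare both forms on a few sample values.
   Returns the number of (a, b) pairs that were checked.  */
static int check_forms(void)
{
  static const uint32_t samples[] = {
    0u, 1u, 2u, 0x7fffffffu, 0x80000000u, 0xffffffffu
  };
  const unsigned n = sizeof samples / sizeof samples[0];
  int checked = 0;

  for (unsigned i = 0; i < n; i++)
    for (unsigned j = 0; j < n; j++)
      {
        assert(select_form(samples[i], samples[j])
               == mask_form(samples[i], samples[j]));
        checked++;
      }
  return checked;
}
```

Calling check_forms() from a test harness exercises every pairing of the sample values; it says nothing about which form a given GCC revision actually produces.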

Bootstrapped and tested on aarch64-linux-gnu with no regressions.

	PR target/110986

gcc/ChangeLog:

	* config/aarch64/aarch64.md (*cmov<mode>_insn_insv): New pattern.
	(*cmov_uxtw_insn_insv): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/cond_op-1.c: New test.
---
 gcc/config/aarch64/aarch64.md                | 46 ++++++++++++++++++++
 gcc/testsuite/gcc.target/aarch64/cond_op-1.c | 20 +++++++++
 2 files changed, 66 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cond_op-1.c
  

Comments

Richard Sandiford Oct. 20, 2023, 12:13 p.m. UTC | #1
Andrew Pinski <pinskia@gmail.com> writes:
> After r14-3110-g7fb65f10285, the canonical form for
> `a ? ~b : b` changed to be `-(a) ^ b` that means
> for aarch64 we need to add a few new insn patterns
> to be able to catch this and change it to be
> what is the canonical form for the aarch64 backend.
> A secondary pattern was needed to support a zero_extended
> form too; this adds a testcase for all 3 cases.

From the comment in the patch, it sounds like we don't really have
a target-independent canonical form.  That is, we can't just rewrite
the old pattern to use the new form.

It would be nice if there were a canonical form, but I won't push it.

> Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>
> 	PR target/110986
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64.md (*cmov<mode>_insn_insv): New pattern.
> 	(*cmov_uxtw_insn_insv): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/cond_op-1.c: New test.
> ---
>  gcc/config/aarch64/aarch64.md                | 46 ++++++++++++++++++++
>  gcc/testsuite/gcc.target/aarch64/cond_op-1.c | 20 +++++++++
>  2 files changed, 66 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/cond_op-1.c
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 32c7adc8928..59cd0415937 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -4413,6 +4413,52 @@ (define_insn "*csinv3_uxtw_insn3"
>    [(set_attr "type" "csel")]
>  )
>  
> +;; There are two canonical forms for `cmp ? ~a : a`.
> +;; This is the second form and is here to help combine.
> +;; Support `-(cmp) ^ a` into `cmp ? ~a : a`
> +;; The second pattern is to support the zero extend'ed version.
> +
> +(define_insn_and_split "*cmov<mode>_insn_insv"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +        (xor:GPI
> +	 (neg:GPI
> +	  (match_operator:GPI 1 "aarch64_comparison_operator"
> +	   [(match_operand 2 "cc_register" "") (const_int 0)]))
> +	 (match_operand:GPI 3 "general_operand" "r")))]
> +  "can_create_pseudo_p ()"
> +  "#"
> +  "&& true"

IMO this is an ICE trap, since it hard-codes the assumption that there
will be a split pass after the last pre-LRA call to recog.  I think we
should just provide the asm directly instead.

Looks good otherwise, thanks.

Richard

> +  [(set (match_dup 0)
> +	(if_then_else:GPI (match_dup 1)
> +			  (not:GPI (match_dup 3))
> +			  (match_dup 3)))]
> +  {
> +    operands[3] = force_reg (<MODE>mode, operands[3]);
> +  }
> +  [(set_attr "type" "csel")]
> +)
> +
> +(define_insn_and_split "*cmov_uxtw_insn_insv"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> +        (zero_extend:DI
> +	 (xor:SI
> +	  (neg:SI
> +	   (match_operator:SI 1 "aarch64_comparison_operator"
> +	    [(match_operand 2 "cc_register" "") (const_int 0)]))
> +	  (match_operand:SI 3 "general_operand" "r"))))]
> +  "can_create_pseudo_p ()"
> +  "#"
> +  "&& true"
> +  [(set (match_dup 0)
> +	(if_then_else:DI (match_dup 1)
> +			  (zero_extend:DI (not:SI (match_dup 3)))
> +			  (zero_extend:DI (match_dup 3))))]
> +  {
> +    operands[3] = force_reg (SImode, operands[3]);
> +  }
> +  [(set_attr "type" "csel")]
> +)
> +
>  ;; If X can be loaded by a single CNT[BHWD] instruction,
>  ;;
>  ;;    A = UMAX (B, X)
> diff --git a/gcc/testsuite/gcc.target/aarch64/cond_op-1.c b/gcc/testsuite/gcc.target/aarch64/cond_op-1.c
> new file mode 100644
> index 00000000000..e6c7821127e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/cond_op-1.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* PR target/110986 */
> +
> +
> +long long full(unsigned a, unsigned b)
> +{
> +  return a ? ~b : b;
> +}
> +unsigned fuu(unsigned a, unsigned b)
> +{
> +  return a ? ~b : b;
> +}
> +long long fllll(unsigned long long a, unsigned long long b)
> +{
> +  return a ? ~b : b;
> +}
> +
> +/* { dg-final { scan-assembler-times "csinv\tw\[0-9\]*" 2 } } */
> +/* { dg-final { scan-assembler-times "csinv\tx\[0-9\]*" 1 } } */
  
Richard Earnshaw Oct. 20, 2023, 1:17 p.m. UTC | #2
On 20/10/2023 13:13, Richard Sandiford wrote:
>> +(define_insn_and_split "*cmov<mode>_insn_insv"
>> +  [(set (match_operand:GPI 0 "register_operand" "=r")
>> +        (xor:GPI
>> +	 (neg:GPI
>> +	  (match_operator:GPI 1 "aarch64_comparison_operator"
>> +	   [(match_operand 2 "cc_register" "") (const_int 0)]))
>> +	 (match_operand:GPI 3 "general_operand" "r")))]
>> +  "can_create_pseudo_p ()"
>> +  "#"
>> +  "&& true"
>
> IMO this is an ICE trap, since it hard-codes the assumption that there
> will be a split pass after the last pre-LRA call to recog.  I think we
> should just provide the asm directly instead.

So why not add

(clobber (match_operand:GPI 4 "register_operand" "=&r"))

to the insn?  Then you'll always get the scratch needed and the need to
check can_create_pseudo_p goes away.

R.
  
Richard Sandiford Oct. 20, 2023, 1:42 p.m. UTC | #3
Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes:
> On 20/10/2023 13:13, Richard Sandiford wrote:
>>> +(define_insn_and_split "*cmov<mode>_insn_insv"
>>> +  [(set (match_operand:GPI 0 "register_operand" "=r")
>>> +        (xor:GPI
>>> +	 (neg:GPI
>>> +	  (match_operator:GPI 1 "aarch64_comparison_operator"
>>> +	   [(match_operand 2 "cc_register" "") (const_int 0)]))
>>> +	 (match_operand:GPI 3 "general_operand" "r")))]
>>> +  "can_create_pseudo_p ()"
>>> +  "#"
>>> +  "&& true"
>>
>> IMO this is an ICE trap, since it hard-codes the assumption that there
>> will be a split pass after the last pre-LRA call to recog.  I think we
>> should just provide the asm directly instead.
>
> So why not add
>
> (clobber (match_operand:GPI 4 "register_operand" "=&r"))
>
> to the insn?  Then you'll always get the scratch needed and the need to
> check can_create_pseudo_p goes away.

I think the "general_operand" "r" works in terms of ensuring that the
source is a GPR.  So we shouldn't need a separate clobber.

Our off-list discussion made me realise that my concern above wasn't
very clear.  In principle, it should be possible for any pass to clear
INSN_CODE and then rerecognise the pattern using recog.  So I think it's
wrong (or at least dangerous) for insns to require can_create_pseudo_p.
It means that an insn starts out valid and suddenly becomes invalid
half way through RTL compilation.

But looking at it again, the patch seems correct with just the
can_create_pseudo_p conditions removed.  The constraints seem to
satisfy what csinv requires, and the force_reg should be a no-op
after RA.

So the patch is OK with just the can_create_pseudo_p tests removed.
Sorry for the run-around, and thanks for pushing back.

Richard
  

Patch

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 32c7adc8928..59cd0415937 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4413,6 +4413,52 @@  (define_insn "*csinv3_uxtw_insn3"
   [(set_attr "type" "csel")]
 )
 
+;; There are two canonical forms for `cmp ? ~a : a`.
+;; This is the second form and is here to help combine.
+;; Support `-(cmp) ^ a` into `cmp ? ~a : a`
+;; The second pattern is to support the zero extend'ed version.
+
+(define_insn_and_split "*cmov<mode>_insn_insv"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (xor:GPI
+	 (neg:GPI
+	  (match_operator:GPI 1 "aarch64_comparison_operator"
+	   [(match_operand 2 "cc_register" "") (const_int 0)]))
+	 (match_operand:GPI 3 "general_operand" "r")))]
+  "can_create_pseudo_p ()"
+  "#"
+  "&& true"
+  [(set (match_dup 0)
+	(if_then_else:GPI (match_dup 1)
+			  (not:GPI (match_dup 3))
+			  (match_dup 3)))]
+  {
+    operands[3] = force_reg (<MODE>mode, operands[3]);
+  }
+  [(set_attr "type" "csel")]
+)
+
+(define_insn_and_split "*cmov_uxtw_insn_insv"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (zero_extend:DI
+	 (xor:SI
+	  (neg:SI
+	   (match_operator:SI 1 "aarch64_comparison_operator"
+	    [(match_operand 2 "cc_register" "") (const_int 0)]))
+	  (match_operand:SI 3 "general_operand" "r"))))]
+  "can_create_pseudo_p ()"
+  "#"
+  "&& true"
+  [(set (match_dup 0)
+	(if_then_else:DI (match_dup 1)
+			  (zero_extend:DI (not:SI (match_dup 3)))
+			  (zero_extend:DI (match_dup 3))))]
+  {
+    operands[3] = force_reg (SImode, operands[3]);
+  }
+  [(set_attr "type" "csel")]
+)
+
 ;; If X can be loaded by a single CNT[BHWD] instruction,
 ;;
 ;;    A = UMAX (B, X)
diff --git a/gcc/testsuite/gcc.target/aarch64/cond_op-1.c b/gcc/testsuite/gcc.target/aarch64/cond_op-1.c
new file mode 100644
index 00000000000..e6c7821127e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cond_op-1.c
@@ -0,0 +1,20 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* PR target/110986 */
+
+
+long long full(unsigned a, unsigned b)
+{
+  return a ? ~b : b;
+}
+unsigned fuu(unsigned a, unsigned b)
+{
+  return a ? ~b : b;
+}
+long long fllll(unsigned long long a, unsigned long long b)
+{
+  return a ? ~b : b;
+}
+
+/* { dg-final { scan-assembler-times "csinv\tw\[0-9\]*" 2 } } */
+/* { dg-final { scan-assembler-times "csinv\tx\[0-9\]*" 1 } } */