aarch64: [PR110986] Emit csinv again for `a ? ~b : b`
Commit Message
After r14-3110-g7fb65f10285, the canonical form for
`a ? ~b : b` changed to `-(a) ^ b`, which means that for
aarch64 we need to add a few new insn patterns to catch
this form and rewrite it into the form that is canonical
for the aarch64 backend.
A second pattern was needed to support the zero-extended
form too; this adds a testcase for all three cases.
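As a rough illustration (simplified RTL with invented register numbers, not taken from the patch or a compiler dump): for unsigned `a` in w0 and `b` in w1, combine now forms the xor of a negated comparison, while the aarch64 csinv patterns expect the if_then_else form that the new splitters produce:

;; What combine now produces after r14-3110 (simplified):
;;   (set (reg:SI 100)
;;        (xor:SI (neg:SI (ne:SI (reg:CC cc) (const_int 0)))
;;                (reg:SI 101)))
;; What the new patterns split it back into, so the existing csinv
;; insn can match:
;;   (set (reg:SI 100)
;;        (if_then_else:SI (ne:SI (reg:CC cc) (const_int 0))
;;                         (not:SI (reg:SI 101))
;;                         (reg:SI 101)))
;; which ends up assembling to something like:
;;   cmp     w0, 0
;;   csinv   w0, w1, w1, eq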
Bootstrapped and tested on aarch64-linux-gnu with no regressions.
PR target/110986
gcc/ChangeLog:
* config/aarch64/aarch64.md (*cmov<mode>_insn_insv): New pattern.
(*cmov_uxtw_insn_insv): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/cond_op-1.c: New test.
---
gcc/config/aarch64/aarch64.md | 46 ++++++++++++++++++++
gcc/testsuite/gcc.target/aarch64/cond_op-1.c | 20 +++++++++
2 files changed, 66 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/aarch64/cond_op-1.c
Comments
Andrew Pinski <pinskia@gmail.com> writes:
> After r14-3110-g7fb65f10285, the canonical form for
> `a ? ~b : b` changed to `-(a) ^ b`, which means that for
> aarch64 we need to add a few new insn patterns to catch
> this form and rewrite it into the form that is canonical
> for the aarch64 backend.
> A second pattern was needed to support the zero-extended
> form too; this adds a testcase for all three cases.
From the comment in the patch, it sounds like we don't really have
a target-independent canonical form. That is, we can't just rewrite
the old pattern to use the new form.
It would be nice if there were a canonical form, but I won't push it.
> Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>
> PR target/110986
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.md (*cmov<mode>_insn_insv): New pattern.
> (*cmov_uxtw_insn_insv): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/cond_op-1.c: New test.
> ---
> gcc/config/aarch64/aarch64.md | 46 ++++++++++++++++++++
> gcc/testsuite/gcc.target/aarch64/cond_op-1.c | 20 +++++++++
> 2 files changed, 66 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/cond_op-1.c
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 32c7adc8928..59cd0415937 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -4413,6 +4413,52 @@ (define_insn "*csinv3_uxtw_insn3"
> [(set_attr "type" "csel")]
> )
>
> +;; There are two canonical forms for `cmp ? ~a : a`.
> +;; This is the second form and is here to help combine.
> +;; Support rewriting `-(cmp) ^ a` into `cmp ? ~a : a`.
> +;; The second pattern below handles the zero-extended version.
> +
> +(define_insn_and_split "*cmov<mode>_insn_insv"
> + [(set (match_operand:GPI 0 "register_operand" "=r")
> + (xor:GPI
> + (neg:GPI
> + (match_operator:GPI 1 "aarch64_comparison_operator"
> + [(match_operand 2 "cc_register" "") (const_int 0)]))
> + (match_operand:GPI 3 "general_operand" "r")))]
> + "can_create_pseudo_p ()"
> + "#"
> + "&& true"
IMO this is an ICE trap, since it hard-codes the assumption that there
will be a split pass after the last pre-LRA call to recog. I think we
should just provide the asm directly instead.
Looks good otherwise, thanks.
Richard
> + [(set (match_dup 0)
> + (if_then_else:GPI (match_dup 1)
> + (not:GPI (match_dup 3))
> + (match_dup 3)))]
> + {
> + operands[3] = force_reg (<MODE>mode, operands[3]);
> + }
> + [(set_attr "type" "csel")]
> +)
> +
> +(define_insn_and_split "*cmov_uxtw_insn_insv"
> + [(set (match_operand:DI 0 "register_operand" "=r")
> + (zero_extend:DI
> + (xor:SI
> + (neg:SI
> + (match_operator:SI 1 "aarch64_comparison_operator"
> + [(match_operand 2 "cc_register" "") (const_int 0)]))
> + (match_operand:SI 3 "general_operand" "r"))))]
> + "can_create_pseudo_p ()"
> + "#"
> + "&& true"
> + [(set (match_dup 0)
> + (if_then_else:DI (match_dup 1)
> + (zero_extend:DI (not:SI (match_dup 3)))
> + (zero_extend:DI (match_dup 3))))]
> + {
> + operands[3] = force_reg (SImode, operands[3]);
> + }
> + [(set_attr "type" "csel")]
> +)
> +
> ;; If X can be loaded by a single CNT[BHWD] instruction,
> ;;
> ;; A = UMAX (B, X)
> diff --git a/gcc/testsuite/gcc.target/aarch64/cond_op-1.c b/gcc/testsuite/gcc.target/aarch64/cond_op-1.c
> new file mode 100644
> index 00000000000..e6c7821127e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/cond_op-1.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* PR target/110986 */
> +
> +
> +long long full(unsigned a, unsigned b)
> +{
> + return a ? ~b : b;
> +}
> +unsigned fuu(unsigned a, unsigned b)
> +{
> + return a ? ~b : b;
> +}
> +long long fllll(unsigned long long a, unsigned long long b)
> +{
> + return a ? ~b : b;
> +}
> +
> +/* { dg-final { scan-assembler-times "csinv\tw\[0-9\]*" 2 } } */
> +/* { dg-final { scan-assembler-times "csinv\tx\[0-9\]*" 1 } } */
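The "provide the asm directly" alternative suggested above would presumably look something like the sketch below; this is untested and not what was ultimately committed, and it narrows operand 3 to register_operand since there is no longer a split to force the operand into a register. It relies on the %M operand modifier, which the existing csinv patterns use to print the inverse condition.

;; Sketch only: -(cmp) ^ x is ~x when cmp holds and x otherwise,
;; so it maps onto csinv with the inverted condition.
(define_insn "*cmov<mode>_insn_insv"
  [(set (match_operand:GPI 0 "register_operand" "=r")
	(xor:GPI
	 (neg:GPI
	  (match_operator:GPI 1 "aarch64_comparison_operator"
	   [(match_operand 2 "cc_register" "") (const_int 0)]))
	 (match_operand:GPI 3 "register_operand" "r")))]
  ""
  "csinv\\t%<w>0, %<w>3, %<w>3, %M1"
  [(set_attr "type" "csel")]
)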
On 20/10/2023 13:13, Richard Sandiford wrote:
>> +(define_insn_and_split "*cmov<mode>_insn_insv"
>> + [(set (match_operand:GPI 0 "register_operand" "=r")
>> + (xor:GPI
>> + (neg:GPI
>> + (match_operator:GPI 1 "aarch64_comparison_operator"
>> + [(match_operand 2 "cc_register" "") (const_int 0)]))
>> + (match_operand:GPI 3 "general_operand" "r")))]
>> + "can_create_pseudo_p ()"
>> + "#"
>> + "&& true"
>
> IMO this is an ICE trap, since it hard-codes the assumption that there
> will be a split pass after the last pre-LRA call to recog. I think we
> should just provide the asm directly instead.
So why not add
(clobber (match_operand:GPI 4 "register_operand" "=&r"))
to the insn, then you'll always get the scratch needed and the need to
check can_create_pseudo_p goes away.
R.
Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes:
> On 20/10/2023 13:13, Richard Sandiford wrote:
>>> +(define_insn_and_split "*cmov<mode>_insn_insv"
>>> + [(set (match_operand:GPI 0 "register_operand" "=r")
>>> + (xor:GPI
>>> + (neg:GPI
>>> + (match_operator:GPI 1 "aarch64_comparison_operator"
>>> + [(match_operand 2 "cc_register" "") (const_int 0)]))
>>> + (match_operand:GPI 3 "general_operand" "r")))]
>>> + "can_create_pseudo_p ()"
>>> + "#"
>>> + "&& true"
> >
>> IMO this is an ICE trap, since it hard-codes the assumption that there
>> will be a split pass after the last pre-LRA call to recog. I think we
>> should just provide the asm directly instead.
>
> So why not add
>
> (clobber (match_operand:GPI 4 "register_operand" "=&r"))
>
> to the insn, then you'll always get the scratch needed and the need to
> check can_create_pseudo_p goes away.
I think the "general_operand" "r" works in terms of ensuring that the
source is a GPR. So we shouldn't need a separate clobber.
Our off-list discussion made me realise that my concern above wasn't
very clear. In principle, it should be possible for any pass to clear
INSN_CODE and then rerecognise the pattern using recog. So I think it's
wrong (or at least dangerous) for insns to require can_create_pseudo_p.
It means that an insn starts out valid and suddenly becomes invalid
halfway through RTL compilation.
But looking at it again, the patch seems correct with just the
can_create_pseudo_p conditions removed. The constraints seem to
satisfy what csinv requires, and the force_reg should be a no-op
after RA.
So the patch is OK with just the can_create_pseudo_p tests removed.
Sorry for the run-around, and thanks for pushing back.
Richard
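For context, with the can_create_pseudo_p () tests dropped as requested, the first pattern would read roughly as below (the zero-extend variant changes in the same way); this is a sketch of the approved change, not a copy of the final commit:

(define_insn_and_split "*cmov<mode>_insn_insv"
  [(set (match_operand:GPI 0 "register_operand" "=r")
	(xor:GPI
	 (neg:GPI
	  (match_operator:GPI 1 "aarch64_comparison_operator"
	   [(match_operand 2 "cc_register" "") (const_int 0)]))
	 (match_operand:GPI 3 "general_operand" "r")))]
  ""
  "#"
  "&& true"
  [(set (match_dup 0)
	(if_then_else:GPI (match_dup 1)
			  (not:GPI (match_dup 3))
			  (match_dup 3)))]
  {
    /* After register allocation operand 3 already satisfies "r",
       so this force_reg is a no-op there.  */
    operands[3] = force_reg (<MODE>mode, operands[3]);
  }
  [(set_attr "type" "csel")]
)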