Missed lowering to ld1rq from svld1rq for memory operand

Message ID CAAgBjM=5ELyC+e3McMiaS--hiaR1yqxzKqvT3466XGBQjC_jig@mail.gmail.com
State New, archived
Series Missed lowering to ld1rq from svld1rq for memory operand

Commit Message

Prathamesh Kulkarni Aug. 5, 2022, 11:32 a.m. UTC
  Hi Richard,
Following on from the off-list discussion, in the attached patch I wrote a
pattern similar to vec_duplicate<mode>_reg, which seems to work for the
svld1rq tests. Does it look OK?

Sorry, I didn't fully understand your suggestion on integrating with the
vec_duplicate<mode>_reg pattern. For vec_duplicate<mode>_reg, the operand
to vec_duplicate expects mode <VEL>, while the pattern in the patch expects
the operand of vec_duplicate to have mode <V128>.
How do we write a pattern so that an operand can accept either of the two modes?
Also, it seems <V128> cannot be used with SVE_ALL?

Thanks,
Prathamesh
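For context, the operation being discussed can be sketched in portable C. This is a little-endian model (with illustrative names, not any real API) of what svld1rq/ld1rq does: load one 128-bit quadword from memory and replicate it to fill an SVE vector of `vl` bytes, where `vl` is a multiple of 16.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Model of ld1rq on a little-endian target: replicate the 16-byte
   quadword at MEM across all 16-byte lanes of the VL-byte vector VEC.
   VL must be a multiple of 16.  */
static void
model_ld1rq (uint8_t *vec, size_t vl, const uint8_t *mem)
{
  for (size_t i = 0; i < vl; i += 16)
    memcpy (vec + i, mem, 16);  /* every lane gets the same quadword */
}
```

The point of the patch is that when the 128-bit operand is already in memory, this whole operation is a single ld1rq load rather than an ldr followed by a dup.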
  

Comments

Richard Sandiford Aug. 5, 2022, 12:19 p.m. UTC | #1
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> Hi Richard,
> Following from off-list discussion, in the attached patch, I wrote pattern
> similar to vec_duplicate<mode>_reg, which seems to work for the svld1rq tests.
> Does it look OK ?
>
> Sorry, I didn't fully understand your suggestion on integrating with
> vec_duplicate<mode>_reg
> pattern. For vec_duplicate<mode>_reg, the operand to vec_duplicate expects
> mode to be <VEL>, while the pattern in patch expects operand of
> vec_duplicate to have mode <V128>.
> How do we write a pattern so an operand can accept either of the 2 modes ?

I quoted the wrong one, sorry; it should have been
aarch64_vec_duplicate_vq<mode>_le.

> Also it seems <V128> cannot be used with SVE_ALL ?

Yeah, these would be SVE_FULL only.

Richard
  
Prathamesh Kulkarni Jan. 10, 2023, 5:34 p.m. UTC | #2
On Fri, 5 Aug 2022 at 17:49, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > Hi Richard,
> > Following from off-list discussion, in the attached patch, I wrote pattern
> > similar to vec_duplicate<mode>_reg, which seems to work for the svld1rq tests.
> > Does it look OK ?
> >
> > Sorry, I didn't fully understand your suggestion on integrating with
> > vec_duplicate<mode>_reg
> > pattern. For vec_duplicate<mode>_reg, the operand to vec_duplicate expects
> > mode to be <VEL>, while the pattern in patch expects operand of
> > vec_duplicate to have mode <V128>.
> > How do we write a pattern so an operand can accept either of the 2 modes ?
>
> I quoted the wrong one, sorry, should have been
> aarch64_vec_duplicate_vq<mode>_le.
>
> > Also it seems <V128> cannot be used with SVE_ALL ?
>
> Yeah, these would be SVE_FULL only.
Hi Richard,
Sorry for the very late reply. I have attached a patch that integrates
with vec_duplicate_vq<mode>_le.
Bootstrapped and tested on aarch64-linux-gnu.
OK to commit?

Thanks,
Prathamesh
>
> Richard
>
gcc/
	* config/aarch64/aarch64-sve.md (aarch64_vec_duplicate_vq<mode>_le):
	Change to define_insn_and_split to fold ldr+dup to ld1rq.
	* config/aarch64/predicates.md (aarch64_sve_dup_ld1rq_operand): New.

testsuite/
	* gcc.target/aarch64/sve/acle/general/pr96463-2.c: Adjust.

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index b8cc47ef5fc..4548375b8d6 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -2533,14 +2533,34 @@
 )
 
 ;; Duplicate an Advanced SIMD vector to fill an SVE vector (LE version).
-(define_insn "@aarch64_vec_duplicate_vq<mode>_le"
-  [(set (match_operand:SVE_FULL 0 "register_operand" "=w")
+
+(define_insn_and_split "@aarch64_vec_duplicate_vq<mode>_le"
+  [(set (match_operand:SVE_FULL 0 "register_operand" "=w, w")
 	(vec_duplicate:SVE_FULL
-	  (match_operand:<V128> 1 "register_operand" "w")))]
+	  (match_operand:<V128> 1 "aarch64_sve_dup_ld1rq_operand" "w, UtQ")))
+   (clobber (match_scratch:VNx16BI 2 "=X, Upl"))]
   "TARGET_SVE && !BYTES_BIG_ENDIAN"
   {
-    operands[1] = gen_rtx_REG (<MODE>mode, REGNO (operands[1]));
-    return "dup\t%0.q, %1.q[0]";
+    switch (which_alternative)
+      {
+	case 0:
+	  operands[1] = gen_rtx_REG (<MODE>mode, REGNO (operands[1]));
+	  return "dup\t%0.q, %1.q[0]";
+	case 1:
+	  return "#";
+	default:
+	  gcc_unreachable ();
+      }
+  }
+  "&& MEM_P (operands[1])"
+  [(const_int 0)]
+  {
+    if (GET_CODE (operands[2]) == SCRATCH)
+      operands[2] = gen_reg_rtx (VNx16BImode);
+    emit_move_insn (operands[2], CONSTM1_RTX (VNx16BImode));
+    rtx gp = gen_lowpart (<VPRED>mode, operands[2]);
+    emit_insn (gen_aarch64_sve_ld1rq<mode> (operands[0], operands[1], gp));
+    DONE;
   }
 )
 
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index ff7f73d3f30..6062f37025e 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -676,6 +676,10 @@
   (ior (match_operand 0 "register_operand")
        (match_operand 0 "aarch64_sve_ld1r_operand")))
 
+(define_predicate "aarch64_sve_dup_ld1rq_operand"
+  (ior (match_operand 0 "register_operand")
+       (match_operand 0 "aarch64_sve_ld1rq_operand")))
+
 (define_predicate "aarch64_sve_ptrue_svpattern_immediate"
   (and (match_code "const")
        (match_test "aarch64_sve_ptrue_svpattern_p (op, NULL)")))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr96463-2.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr96463-2.c
index 196de3f5e0a..c38204e6874 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr96463-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr96463-2.c
@@ -26,4 +26,4 @@ TEST(svfloat64_t, float64_t, f64)
 
 TEST(svbfloat16_t, bfloat16_t, bf16)
 
-/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]} 12 { target aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-not {\tdup\t} } } */
  
Richard Sandiford Jan. 12, 2023, 3:32 p.m. UTC | #3
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> On Fri, 5 Aug 2022 at 17:49, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
>> > Hi Richard,
>> > Following from off-list discussion, in the attached patch, I wrote pattern
>> > similar to vec_duplicate<mode>_reg, which seems to work for the svld1rq tests.
>> > Does it look OK ?
>> >
>> > Sorry, I didn't fully understand your suggestion on integrating with
>> > vec_duplicate<mode>_reg
>> > pattern. For vec_duplicate<mode>_reg, the operand to vec_duplicate expects
>> > mode to be <VEL>, while the pattern in patch expects operand of
>> > vec_duplicate to have mode <V128>.
>> > How do we write a pattern so an operand can accept either of the 2 modes ?
>>
>> I quoted the wrong one, sorry, should have been
>> aarch64_vec_duplicate_vq<mode>_le.
>>
>> > Also it seems <V128> cannot be used with SVE_ALL ?
>>
>> Yeah, these would be SVE_FULL only.
> Hi Richard,
> Sorry for the very late reply. I have attached patch, to integrate
> with vec_duplicate_vq<mode>_le.
> Bootstrapped+tested on aarch64-linux-gnu.
> OK to commit ?
>
> Thanks,
> Prathamesh
>>
>> Richard
>>
>
> gcc/
> 	* config/aarch64/aarch64-sve.md (aarch64_vec_duplicate_vq<mode>_le):
> 	Change to define_insn_and_split to fold ldr+dup to ld1rq.
> 	* config/aarch64/predicates.md (aarch64_sve_dup_ld1rq_operand): New.
>
> testsuite/
> 	* gcc.target/aarch64/sve/acle/general/pr96463-2.c: Adjust.
>
> diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
> index b8cc47ef5fc..4548375b8d6 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -2533,14 +2533,34 @@
>  )
>  
>  ;; Duplicate an Advanced SIMD vector to fill an SVE vector (LE version).
> -(define_insn "@aarch64_vec_duplicate_vq<mode>_le"
> -  [(set (match_operand:SVE_FULL 0 "register_operand" "=w")
> +
> +(define_insn_and_split "@aarch64_vec_duplicate_vq<mode>_le"
> +  [(set (match_operand:SVE_FULL 0 "register_operand" "=w, w")
>  	(vec_duplicate:SVE_FULL
> -	  (match_operand:<V128> 1 "register_operand" "w")))]
> +	  (match_operand:<V128> 1 "aarch64_sve_dup_ld1rq_operand" "w, UtQ")))
> +   (clobber (match_scratch:VNx16BI 2 "=X, Upl"))]
>    "TARGET_SVE && !BYTES_BIG_ENDIAN"
>    {
> -    operands[1] = gen_rtx_REG (<MODE>mode, REGNO (operands[1]));
> -    return "dup\t%0.q, %1.q[0]";
> +    switch (which_alternative)
> +      {
> +	case 0:
> +	  operands[1] = gen_rtx_REG (<MODE>mode, REGNO (operands[1]));
> +	  return "dup\t%0.q, %1.q[0]";
> +	case 1:
> +	  return "#";
> +	default:
> +	  gcc_unreachable ();
> +      }
> +  }
> +  "&& MEM_P (operands[1])"
> +  [(const_int 0)]
> +  {
> +    if (GET_CODE (operands[2]) == SCRATCH)
> +      operands[2] = gen_reg_rtx (VNx16BImode);
> +    emit_move_insn (operands[2], CONSTM1_RTX (VNx16BImode));
> +    rtx gp = gen_lowpart (<VPRED>mode, operands[2]);
> +    emit_insn (gen_aarch64_sve_ld1rq<mode> (operands[0], operands[1], gp));
> +    DONE;
>    }
>  )
>  
> diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
> index ff7f73d3f30..6062f37025e 100644
> --- a/gcc/config/aarch64/predicates.md
> +++ b/gcc/config/aarch64/predicates.md
> @@ -676,6 +676,10 @@
>    (ior (match_operand 0 "register_operand")
>         (match_operand 0 "aarch64_sve_ld1r_operand")))
>  
> +(define_predicate "aarch64_sve_dup_ld1rq_operand"
> +  (ior (match_operand 0 "register_operand")
> +       (match_operand 0 "aarch64_sve_ld1rq_operand")))
> +
>  (define_predicate "aarch64_sve_ptrue_svpattern_immediate"
>    (and (match_code "const")
>         (match_test "aarch64_sve_ptrue_svpattern_p (op, NULL)")))
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr96463-2.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr96463-2.c
> index 196de3f5e0a..c38204e6874 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr96463-2.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr96463-2.c
> @@ -26,4 +26,4 @@ TEST(svfloat64_t, float64_t, f64)
>  
>  TEST(svbfloat16_t, bfloat16_t, bf16)
>  
> -/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]} 12 { target aarch64_little_endian } } } */
> +/* { dg-final { scan-assembler-not {\tdup\t} } } */

It would be good to add something like:

/* { dg-final { scan-assembler-times {\tld1rq\t} 12 } } */

(I assume it'll pass for both endiannesses, but please check!),
in addition to the scan-assembler-not.

OK with that change, thanks.

Richard
  
Prathamesh Kulkarni Jan. 14, 2023, 5:59 p.m. UTC | #4
On Thu, 12 Jan 2023 at 21:02, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > On Fri, 5 Aug 2022 at 17:49, Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> >> > Hi Richard,
> >> > Following from off-list discussion, in the attached patch, I wrote pattern
> >> > similar to vec_duplicate<mode>_reg, which seems to work for the svld1rq tests.
> >> > Does it look OK ?
> >> >
> >> > Sorry, I didn't fully understand your suggestion on integrating with
> >> > vec_duplicate<mode>_reg
> >> > pattern. For vec_duplicate<mode>_reg, the operand to vec_duplicate expects
> >> > mode to be <VEL>, while the pattern in patch expects operand of
> >> > vec_duplicate to have mode <V128>.
> >> > How do we write a pattern so an operand can accept either of the 2 modes ?
> >>
> >> I quoted the wrong one, sorry, should have been
> >> aarch64_vec_duplicate_vq<mode>_le.
> >>
> >> > Also it seems <V128> cannot be used with SVE_ALL ?
> >>
> >> Yeah, these would be SVE_FULL only.
> > Hi Richard,
> > Sorry for the very late reply. I have attached patch, to integrate
> > with vec_duplicate_vq<mode>_le.
> > Bootstrapped+tested on aarch64-linux-gnu.
> > OK to commit ?
> >
> > Thanks,
> > Prathamesh
> >>
> >> Richard
> >>
> >
> > gcc/
> >       * config/aarch64/aarch64-sve.md (aarch64_vec_duplicate_vq<mode>_le):
> >       Change to define_insn_and_split to fold ldr+dup to ld1rq.
> >       * config/aarch64/predicates.md (aarch64_sve_dup_ld1rq_operand): New.
> >
> > testsuite/
> >       * gcc.target/aarch64/sve/acle/general/pr96463-2.c: Adjust.
> >
> > diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
> > index b8cc47ef5fc..4548375b8d6 100644
> > --- a/gcc/config/aarch64/aarch64-sve.md
> > +++ b/gcc/config/aarch64/aarch64-sve.md
> > @@ -2533,14 +2533,34 @@
> >  )
> >
> >  ;; Duplicate an Advanced SIMD vector to fill an SVE vector (LE version).
> > -(define_insn "@aarch64_vec_duplicate_vq<mode>_le"
> > -  [(set (match_operand:SVE_FULL 0 "register_operand" "=w")
> > +
> > +(define_insn_and_split "@aarch64_vec_duplicate_vq<mode>_le"
> > +  [(set (match_operand:SVE_FULL 0 "register_operand" "=w, w")
> >       (vec_duplicate:SVE_FULL
> > -       (match_operand:<V128> 1 "register_operand" "w")))]
> > +       (match_operand:<V128> 1 "aarch64_sve_dup_ld1rq_operand" "w, UtQ")))
> > +   (clobber (match_scratch:VNx16BI 2 "=X, Upl"))]
> >    "TARGET_SVE && !BYTES_BIG_ENDIAN"
> >    {
> > -    operands[1] = gen_rtx_REG (<MODE>mode, REGNO (operands[1]));
> > -    return "dup\t%0.q, %1.q[0]";
> > +    switch (which_alternative)
> > +      {
> > +     case 0:
> > +       operands[1] = gen_rtx_REG (<MODE>mode, REGNO (operands[1]));
> > +       return "dup\t%0.q, %1.q[0]";
> > +     case 1:
> > +       return "#";
> > +     default:
> > +       gcc_unreachable ();
> > +      }
> > +  }
> > +  "&& MEM_P (operands[1])"
> > +  [(const_int 0)]
> > +  {
> > +    if (GET_CODE (operands[2]) == SCRATCH)
> > +      operands[2] = gen_reg_rtx (VNx16BImode);
> > +    emit_move_insn (operands[2], CONSTM1_RTX (VNx16BImode));
> > +    rtx gp = gen_lowpart (<VPRED>mode, operands[2]);
> > +    emit_insn (gen_aarch64_sve_ld1rq<mode> (operands[0], operands[1], gp));
> > +    DONE;
> >    }
> >  )
> >
> > diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
> > index ff7f73d3f30..6062f37025e 100644
> > --- a/gcc/config/aarch64/predicates.md
> > +++ b/gcc/config/aarch64/predicates.md
> > @@ -676,6 +676,10 @@
> >    (ior (match_operand 0 "register_operand")
> >         (match_operand 0 "aarch64_sve_ld1r_operand")))
> >
> > +(define_predicate "aarch64_sve_dup_ld1rq_operand"
> > +  (ior (match_operand 0 "register_operand")
> > +       (match_operand 0 "aarch64_sve_ld1rq_operand")))
> > +
> >  (define_predicate "aarch64_sve_ptrue_svpattern_immediate"
> >    (and (match_code "const")
> >         (match_test "aarch64_sve_ptrue_svpattern_p (op, NULL)")))
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr96463-2.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr96463-2.c
> > index 196de3f5e0a..c38204e6874 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr96463-2.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr96463-2.c
> > @@ -26,4 +26,4 @@ TEST(svfloat64_t, float64_t, f64)
> >
> >  TEST(svbfloat16_t, bfloat16_t, bf16)
> >
> > -/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]} 12 { target aarch64_little_endian } } } */
> > +/* { dg-final { scan-assembler-not {\tdup\t} } } */
>
> It would be good to add something like:
>
> /* { dg-final { scan-assembler-times {\tld1rq\t} 12 } } */
>
> (I assume it'll pass for both endiannesses, but please check!),
> in addition to the scan-assembler-not.
>
> OK with that change, thanks.
Thanks, committed the patch as
a3b99b84609af310c72b4d6221621f5b63a3c169 after adjusting the
test case, verifying that we generate ld1rq for big-endian targets,
and bootstrapping and testing on aarch64-linux-gnu.

Thanks,
Prathamesh
>
> Richard
  

Patch

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index bd60e65b0c3..b0dc33870b8 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -2504,6 +2504,27 @@ 
   }
 )
 
+;; Fold ldr+dup -> ld1rq
+
+(define_insn_and_split "*vec_duplicate<mode>_ld1rq"
+  [(set (match_operand:SVE_FULL 0 "register_operand" "=w")
+	(vec_duplicate:SVE_FULL
+	  (match_operand:<V128> 1 "aarch64_sve_ld1rq_operand" "UtQ")))
+   (clobber (match_scratch:VNx16BI 2 "=Upl"))]
+  "TARGET_SVE"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+    if (GET_CODE (operands[2]) == SCRATCH)
+      operands[2] = gen_reg_rtx (VNx16BImode);
+    emit_move_insn (operands[2], CONSTM1_RTX (VNx16BImode));
+    rtx gp = gen_lowpart (<VPRED>mode, operands[2]);
+    emit_insn (gen_aarch64_sve_ld1rq<mode> (operands[0], operands[1], gp));
+    DONE;
+  }
+)
+
 ;; Accept memory operands for the benefit of combine, and also in case
 ;; the scalar input gets spilled to memory during RA.  We want to split
 ;; the load at the first opportunity in order to allow the PTRUE to be
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr96463-2.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr96463-2.c
index 196de3f5e0a..0dfe125507f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr96463-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr96463-2.c
@@ -26,4 +26,8 @@  TEST(svfloat64_t, float64_t, f64)
 
 TEST(svbfloat16_t, bfloat16_t, bf16)
 
-/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]} 12 { target aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-not "dup" { target aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rqb\tz0\.b, p0/z, \[x0\]} 2 { target aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rqh\tz0\.h, p0/z, \[x0\]} 4 { target aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rqw\tz0\.s, p0/z, \[x0\]} 3 { target aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rqd\tz0\.d, p0/z, \[x0\]} 3 { target aarch64_little_endian } } } */
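The split in the committed pattern always builds an all-true governing predicate (CONSTM1_RTX, i.e. a ptrue) for the ld1rq it emits. A minimal portable model of the predicated load, again with hypothetical names, shows why that choice loads the full quadword: the predicate governs the bytes of the 128-bit chunk, inactive bytes are zeroed, and the (possibly partially zeroed) quadword is then replicated.

```c
#include <stdint.h>
#include <stddef.h>

/* Model of a byte-predicated LD1RQ: PRED[j] governs byte j of the
   128-bit quadword; inactive bytes are zeroed before the quadword is
   replicated across the VL-byte vector VEC (VL a multiple of 16).
   With an all-true predicate, as the split always emits, every byte
   of the quadword is loaded.  */
static void
model_ld1rq_pred (uint8_t *vec, size_t vl,
                  const uint8_t *mem, const uint8_t *pred)
{
  uint8_t quad[16];
  for (int j = 0; j < 16; j++)
    quad[j] = pred[j] ? mem[j] : 0;  /* zero inactive bytes */
  for (size_t i = 0; i < vl; i += 16)
    for (int j = 0; j < 16; j++)
      vec[i + j] = quad[j];          /* replicate the quadword */
}
```

This is only a semantic sketch of the instruction the splitter generates, not a description of any GCC-internal interface.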