[PATCH-2,rs6000] Enable vector mode for memory equality compare [PR111449]

Message ID aa42d6bc-151e-515b-81a6-d08925c047ac@linux.ibm.com
State Accepted
Headers
Series [PATCH-2,rs6000] Enable vector mode for memory equality compare [PR111449] |

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

HAO CHEN GUI Oct. 9, 2023, 2:30 a.m. UTC
  Hi,
  This patch enables vector mode for memory equality compare by adding
a new expand cbranchv16qi4 and implementing it. Also the corresponding
CC reg and compare code is set in rs6000_generate_compare. With the
patch, 16-byte equality compare can be implemented by one vector compare
instructions other than 2 8-byte compares with branches.

  The test case is in the second patch which is rs6000 specific.

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions.

Thanks
Gui Haochen

ChangeLog
rs6000: Enable vector compare for memory equality compare

gcc/
	PR target/111449
	* config/rs6000/altivec.md (cbranchv16qi4): New expand pattern.
	* config/rs6000/rs6000.cc (rs6000_generate_compare): Generate insn
	sequence for V16QImode equality compare.
	* config/rs6000/rs6000.h (MOVE_MAX_PIECES): Define.
	(COMPARE_MAX_PIECES): Define.

gcc/testsuite/
	PR target/111449
	* gcc.target/powerpc/pr111449.c: New.

patch.diff
  

Comments

David Edelsohn Oct. 9, 2023, 3:42 p.m. UTC | #1
On Sun, Oct 8, 2023 at 10:30 PM HAO CHEN GUI <guihaoc@linux.ibm.com> wrote:

> Hi,
>   This patch enables vector mode for memory equality compare by adding
> a new expand cbranchv16qi4 and implementing it. Also the corresponding
> CC reg and compare code is set in rs6000_generate_compare. With the
> patch, 16-byte equality compare can be implemented by one vector compare
> instructions other than 2 8-byte compares with branches.
>
>   The test case is in the second patch which is rs6000 specific.
>
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions.
>

Thanks for working on this.



>
> Thanks
> Gui Haochen
>
> ChangeLog
> rs6000: Enable vector compare for memory equality compare
>
> gcc/
>         PR target/111449
>         * config/rs6000/altivec.md (cbranchv16qi4): New expand pattern.
>         * config/rs6000/rs6000.cc (rs6000_generate_compare): Generate insn
>         sequence for V16QImode equality compare.
>         * config/rs6000/rs6000.h (MOVE_MAX_PIECES): Define.
>         (COMPARE_MAX_PIECES): Define.
>
> gcc/testsuite/
>         PR target/111449
>         * gcc.target/powerpc/pr111449.c: New.
>
> patch.diff
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index e8a596fb7e9..c69bf266402 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -2605,6 +2605,39 @@ (define_insn "altivec_vupklpx"
>  }
>    [(set_attr "type" "vecperm")])
>
> +(define_expand "cbranchv16qi4"
> +  [(use (match_operator 0 "equality_operator"
> +       [(match_operand:V16QI 1 "gpc_reg_operand")
> +        (match_operand:V16QI 2 "gpc_reg_operand")]))
> +   (use (match_operand 3))]
> +  "VECTOR_UNIT_ALTIVEC_P (V16QImode)"
> +{
> +  if (!TARGET_P9_VECTOR
> +      && MEM_P (operands[1])
> +      && !altivec_indexed_or_indirect_operand (operands[1], V16QImode)
> +      && MEM_P (operands[2])
> +      && !altivec_indexed_or_indirect_operand (operands[2], V16QImode))
> +    {
> +      /* Use direct move as the byte order doesn't matter for equality
> +        compare.  */
> +      rtx reg_op1 = gen_reg_rtx (V16QImode);
> +      rtx reg_op2 = gen_reg_rtx (V16QImode);
> +      rs6000_emit_le_vsx_permute (reg_op1, operands[1], V16QImode);
> +      rs6000_emit_le_vsx_permute (reg_op2, operands[2], V16QImode);
> +      operands[1] = reg_op1;
> +      operands[2] = reg_op2;
> +    }
> +  else
> +    {
> +      operands[1] = force_reg (V16QImode, operands[1]);
> +      operands[2] = force_reg (V16QImode, operands[2]);
> +    }
> +  rtx_code code = GET_CODE (operands[0]);
> +  operands[0] = gen_rtx_fmt_ee (code, V16QImode, operands[1],
> operands[2]);
> +  rs6000_emit_cbranch (V16QImode, operands);
> +  DONE;
> +})
> +
>  ;; Compare vectors producing a vector result and a predicate, setting CR6
> to
>  ;; indicate a combined status
>  (define_insn "altivec_vcmpequ<VI_char>_p"
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index efe9adce1f8..0087d786840 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -15264,6 +15264,15 @@ rs6000_generate_compare (rtx cmp, machine_mode
> mode)
>           else
>             emit_insn (gen_stack_protect_testsi (compare_result, op0,
> op1b));
>         }
> +      else if (mode == V16QImode)
> +       {
> +         gcc_assert (code == EQ || code == NE);
> +
> +         rtx result_vector = gen_reg_rtx (V16QImode);
> +         compare_result = gen_rtx_REG (CCmode, CR6_REGNO);
> +         emit_insn (gen_altivec_vcmpequb_p (result_vector, op0, op1));
> +         code = (code == NE) ? GE : LT;
> +       }
>        else
>         emit_insn (gen_rtx_SET (compare_result,
>                                 gen_rtx_COMPARE (comp_mode, op0, op1)));
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index 3503614efbd..dc33bca0802 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -1730,6 +1730,8 @@ typedef struct rs6000_args
>     in one reasonably fast instruction.  */
>  #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8)
>  #define MAX_MOVE_MAX 8
> +#define MOVE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
> +#define COMPARE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
>

How are the definitions of MOVE_MAX_PIECES and COMPARE_MAX_PIECES
determined?  The email does not provide any explanation for the
implementation.  The rest of the patch is related to vector support, but
vector support is not dependent on TARGET_POWERPC64.

Thanks, David


>
>  /* Nonzero if access to memory by bytes is no faster than for words.
>     Also nonzero if doing byte operations (specifically shifts) in
> registers
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr111449.c
> b/gcc/testsuite/gcc.target/powerpc/pr111449.c
> new file mode 100644
> index 00000000000..a8c30b92a41
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr111449.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-maltivec -O2" } */
> +/* { dg-require-effective-target has_arch_ppc64 } */
> +
> +/* Ensure vector comparison is used for 16-byte memory equality compare.
> */
> +
> +int compare1 (const char* s1, const char* s2)
> +{
> +  return __builtin_memcmp (s1, s2, 16) == 0;
> +}
> +
> +int compare2 (const char* s1)
> +{
> +  return __builtin_memcmp (s1, "0123456789012345", 16) == 0;
> +}
> +
> +/* { dg-final { scan-assembler-times {\mvcmpequb\.} 2 } } */
> +/* { dg-final { scan-assembler-not {\mcmpd\M} } } */
>
  
HAO CHEN GUI Oct. 10, 2023, 8:18 a.m. UTC | #2
Hi David,

  Thanks for your review comments.

在 2023/10/9 23:42, David Edelsohn 写道:
>      #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8)
>      #define MAX_MOVE_MAX 8
>     +#define MOVE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
>     +#define COMPARE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
> 
> 
> How are the definitions of MOVE_MAX_PIECES and COMPARE_MAX_PIECES determined?  The email does not provide any explanation for the implementation.  The rest of the patch is related to vector support, but vector support is not dependent on TARGET_POWERPC64.

By default, MOVE_MAX_PIECES and COMPARE_MAX_PIECES is set the same value
as MOVE_MAX. The move and compare instructions are required in
compare_by_pieces, those macros are set to 16 byte when supporting
vector mode (V16QImode). The problem is rs6000 hasn't supported TImode
for "-m32". We discussed it in issue 1307. TImode will be used for
move when MOVE_MAX_PIECES is set to 16. But TImode isn't supported
with "-m32" which might cause ICE.

So MOVE_MAX_PIECES and COMPARE_MAX_PIECES is set to 4 for 32 bit
target in this patch. They could be changed to 16 after rs6000
supports TImode with "-m32".

Thanks
Gui Haochen
  
David Edelsohn Oct. 10, 2023, 12:44 p.m. UTC | #3
On Tue, Oct 10, 2023 at 4:23 AM HAO CHEN GUI <guihaoc@linux.ibm.com> wrote:

> Hi David,
>
>   Thanks for your review comments.
>
> 在 2023/10/9 23:42, David Edelsohn 写道:
> >      #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8)
> >      #define MAX_MOVE_MAX 8
> >     +#define MOVE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
> >     +#define COMPARE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
> >
> >
> > How are the definitions of MOVE_MAX_PIECES and COMPARE_MAX_PIECES
> determined?  The email does not provide any explanation for the
> implementation.  The rest of the patch is related to vector support, but
> vector support is not dependent on TARGET_POWERPC64.
>
> By default, MOVE_MAX_PIECES and COMPARE_MAX_PIECES is set the same value
> as MOVE_MAX. The move and compare instructions are required in
> compare_by_pieces, those macros are set to 16 byte when supporting
> vector mode (V16QImode). The problem is rs6000 hasn't supported TImode
> for "-m32". We discussed it in issue 1307. TImode will be used for
> move when MOVE_MAX_PIECES is set to 16. But TImode isn't supported
> with "-m32" which might cause ICE.
>
> So MOVE_MAX_PIECES and COMPARE_MAX_PIECES is set to 4 for 32 bit
> target in this patch. They could be changed to 16 after rs6000
> supports TImode with "-m32".
>

Hi, Hao

Thanks for the explanation.

Are you stating that although PPC32 supports V16QImode in VSX, the
move_by_pieces support also requires TImode, which is not available on
PPC32?

Thanks, David
  
HAO CHEN GUI Oct. 11, 2023, 9:10 a.m. UTC | #4
Hi David,

在 2023/10/10 20:44, David Edelsohn 写道:
> Are you stating that although PPC32 supports V16QImode in VSX, the move_by_pieces support also requires TImode, which is not available on PPC32?
> 

Yes. By setting MOVE_MAX_PIECES to 16, TImode compare
might be generated as it checks vector mode first then
uses scalar mode by default.

Thanks
Gui Haochen
  
Kewen.Lin Oct. 17, 2023, 2:19 a.m. UTC | #5
Hi,

on 2023/10/10 16:18, HAO CHEN GUI wrote:
> Hi David,
> 
>   Thanks for your review comments.
> 
> 在 2023/10/9 23:42, David Edelsohn 写道:
>>      #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8)
>>      #define MAX_MOVE_MAX 8
>>     +#define MOVE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
>>     +#define COMPARE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
>>
>>
>> How are the definitions of MOVE_MAX_PIECES and COMPARE_MAX_PIECES determined?  The email does not provide any explanation for the implementation.  The rest of the patch is related to vector support, but vector support is not dependent on TARGET_POWERPC64.
> 
> By default, MOVE_MAX_PIECES and COMPARE_MAX_PIECES is set the same value
> as MOVE_MAX. The move and compare instructions are required in
> compare_by_pieces, those macros are set to 16 byte when supporting
> vector mode (V16QImode). The problem is rs6000 hasn't supported TImode
> for "-m32". We discussed it in issue 1307. TImode will be used for
> move when MOVE_MAX_PIECES is set to 16. But TImode isn't supported
> with "-m32" which might cause ICE.

I think David raised a good question, it sounds to me that the current
handling simply consider that if MOVE_MAX_PIECES is set to 16, the
required operations for this optimization on TImode are always available,
but unfortunately on rs6000 the assumption doesn't hold, so could we
teach generic code instead?

BR,
Kewen
  
HAO CHEN GUI Oct. 19, 2023, 8:39 a.m. UTC | #6
Kewen & David,
  Thanks for your comments.

在 2023/10/17 10:19, Kewen.Lin 写道:
> I think David raised a good question, it sounds to me that the current
> handling simply consider that if MOVE_MAX_PIECES is set to 16, the
> required operations for this optimization on TImode are always available,
> but unfortunately on rs6000 the assumption doesn't hold, so could we
> teach generic code instead?

Finally I found that it doesn't check if the scalar mode used in by pieces
operations is enabled by the target. The TImode is not enabled on ppc. It
should be checked before taking TImode to do by pieces operations. I made
a patch for the generic code and testing it. With the patch, 16-byte
comparison could be enabled on both ppc64 and ppc.

Thanks
Gui Haochen
  

Patch

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index e8a596fb7e9..c69bf266402 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2605,6 +2605,39 @@  (define_insn "altivec_vupklpx"
 }
   [(set_attr "type" "vecperm")])

+(define_expand "cbranchv16qi4"
+  [(use (match_operator 0 "equality_operator"
+	[(match_operand:V16QI 1 "gpc_reg_operand")
+	 (match_operand:V16QI 2 "gpc_reg_operand")]))
+   (use (match_operand 3))]
+  "VECTOR_UNIT_ALTIVEC_P (V16QImode)"
+{
+  if (!TARGET_P9_VECTOR
+      && MEM_P (operands[1])
+      && !altivec_indexed_or_indirect_operand (operands[1], V16QImode)
+      && MEM_P (operands[2])
+      && !altivec_indexed_or_indirect_operand (operands[2], V16QImode))
+    {
+      /* Use direct move as the byte order doesn't matter for equality
+	 compare.  */
+      rtx reg_op1 = gen_reg_rtx (V16QImode);
+      rtx reg_op2 = gen_reg_rtx (V16QImode);
+      rs6000_emit_le_vsx_permute (reg_op1, operands[1], V16QImode);
+      rs6000_emit_le_vsx_permute (reg_op2, operands[2], V16QImode);
+      operands[1] = reg_op1;
+      operands[2] = reg_op2;
+    }
+  else
+    {
+      operands[1] = force_reg (V16QImode, operands[1]);
+      operands[2] = force_reg (V16QImode, operands[2]);
+    }
+  rtx_code code = GET_CODE (operands[0]);
+  operands[0] = gen_rtx_fmt_ee (code, V16QImode, operands[1], operands[2]);
+  rs6000_emit_cbranch (V16QImode, operands);
+  DONE;
+})
+
 ;; Compare vectors producing a vector result and a predicate, setting CR6 to
 ;; indicate a combined status
 (define_insn "altivec_vcmpequ<VI_char>_p"
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index efe9adce1f8..0087d786840 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15264,6 +15264,15 @@  rs6000_generate_compare (rtx cmp, machine_mode mode)
 	  else
 	    emit_insn (gen_stack_protect_testsi (compare_result, op0, op1b));
 	}
+      else if (mode == V16QImode)
+	{
+	  gcc_assert (code == EQ || code == NE);
+
+	  rtx result_vector = gen_reg_rtx (V16QImode);
+	  compare_result = gen_rtx_REG (CCmode, CR6_REGNO);
+	  emit_insn (gen_altivec_vcmpequb_p (result_vector, op0, op1));
+	  code = (code == NE) ? GE : LT;
+	}
       else
 	emit_insn (gen_rtx_SET (compare_result,
 				gen_rtx_COMPARE (comp_mode, op0, op1)));
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..dc33bca0802 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -1730,6 +1730,8 @@  typedef struct rs6000_args
    in one reasonably fast instruction.  */
 #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8)
 #define MAX_MOVE_MAX 8
+#define MOVE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
+#define COMPARE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)

 /* Nonzero if access to memory by bytes is no faster than for words.
    Also nonzero if doing byte operations (specifically shifts) in registers
diff --git a/gcc/testsuite/gcc.target/powerpc/pr111449.c b/gcc/testsuite/gcc.target/powerpc/pr111449.c
new file mode 100644
index 00000000000..a8c30b92a41
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr111449.c
@@ -0,0 +1,19 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-maltivec -O2" } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+/* Ensure vector comparison is used for 16-byte memory equality compare.  */
+
+int compare1 (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 16) == 0;
+}
+
+int compare2 (const char* s1)
+{
+  return __builtin_memcmp (s1, "0123456789012345", 16) == 0;
+}
+
+/* { dg-final { scan-assembler-times {\mvcmpequb\.} 2 } } */
+/* { dg-final { scan-assembler-not {\mcmpd\M} } } */