[v1] RISC-V: Bugfix for vec_init repeating auto vectorization in RV32

Message ID 20230614005859.960040-1-pan2.li@intel.com
State Unresolved
Headers
Series [v1] RISC-V: Bugfix for vec_init repeating auto vectorization in RV32 |

Checks

Context Check Description
snail/gcc-patch-check warning Git am fail log

Commit Message

Li, Pan2 via Gcc-patches June 14, 2023, 12:58 a.m. UTC
  From: Pan Li <pan2.li@intel.com>

This patch would like to fix one bug exported by RV32 test case
multiple_rgroup_run-2.c. The mask should be restricted by elen in
vector, and the condition between the vmv.s.x and the vmv.v.x should
take inner_bits_size rather than constants.

Passed both the rv32 and rv64 riscv/rvv tests.

Signed-off-by: Pan Li <pan2.li@intel.com>

gcc/ChangeLog:

	* config/riscv/riscv-v.cc (rvv_builder::get_merge_scalar_mask):
	Take elen instead of scalar BITS_PER_WORD.
	(expand_vector_init_merge_repeating_sequence): Use inner_bits_size
	instead of scaler BITS_PER_WORD.
---
 gcc/config/riscv/riscv-v.cc | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)
  

Comments

juzhe.zhong@rivai.ai June 14, 2023, 1:07 a.m. UTC | #1
>> unsigned int elen = TARGET_VECTOR_ELEN_64 ? 64 : 32;
Add comment here to demonstrate why you pick up elen to set the LIMIT.
I understand:
1. -march=zve32* ===> ELEN = 32
    -march=zve64* ===> ELEN = 64
2. both vmv.v.x/vmv.s.x is restrict to the ELEN
    For example, When ELEN=32 (-march=zve32*)
    vsetvli ...e64,m1
    vmv.v.x/vmv.s.x
    We can't support such code sequence.

You should demonstrate it clearly in the comments.

Otherwise, this patch LGTM.


juzhe.zhong@rivai.ai
 
From: pan2.li
Date: 2023-06-14 08:58
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Bugfix for vec_init repeating auto vectorization in RV32
From: Pan Li <pan2.li@intel.com>
 
This patch would like to fix one bug exported by RV32 test case
multiple_rgroup_run-2.c. The mask should be restricted by elen in
vector, and the condition between the vmv.s.x and the vmv.v.x should
take inner_bits_size rather than constants.
 
Passed both the rv32 and rv64 riscv/rvv tests.
 
Signed-off-by: Pan Li <pan2.li@intel.com>
 
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (rvv_builder::get_merge_scalar_mask):
Take elen instead of scalar BITS_PER_WORD.
(expand_vector_init_merge_repeating_sequence): Use inner_bits_size
instead of scaler BITS_PER_WORD.
---
gcc/config/riscv/riscv-v.cc | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index fb970344521..9270e258ca3 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -399,10 +399,11 @@ rvv_builder::get_merge_scalar_mask (unsigned int index_in_pattern) const
{
   unsigned HOST_WIDE_INT mask = 0;
   unsigned HOST_WIDE_INT base_mask = (1ULL << index_in_pattern);
+  unsigned int elen = TARGET_VECTOR_ELEN_64 ? 64 : 32;
-  gcc_assert (BITS_PER_WORD % npatterns () == 0);
+  gcc_assert (elen % npatterns () == 0);
-  int limit = BITS_PER_WORD / npatterns ();
+  int limit = elen / npatterns ();
   for (int i = 0; i < limit; i++)
     mask |= base_mask << (i * npatterns ());
@@ -1923,7 +1924,7 @@ expand_vector_init_merge_repeating_sequence (rtx target,
       rtx mask = gen_reg_rtx (mask_mode);
       rtx dup = gen_reg_rtx (dup_mode);
-      if (full_nelts <= BITS_PER_WORD) /* vmv.s.x.  */
+      if (full_nelts <= builder.inner_bits_size ()) /* vmv.s.x.  */
{
  rtx ops[] = {dup, gen_scalar_move_mask (dup_mask_mode),
    RVV_VUNDEF (dup_mode), merge_mask};
@@ -1933,7 +1934,8 @@ expand_vector_init_merge_repeating_sequence (rtx target,
       else /* vmv.v.x.  */
{
  rtx ops[] = {dup, force_reg (GET_MODE_INNER (dup_mode), merge_mask)};
-   rtx vl = gen_int_mode (CEIL (full_nelts, BITS_PER_WORD), Pmode);
+   rtx vl = gen_int_mode (CEIL (full_nelts, builder.inner_bits_size ()),
+ Pmode);
  emit_nonvlmax_integer_move_insn (code_for_pred_broadcast (dup_mode),
   ops, vl);
}
-- 
2.34.1
  
Li, Pan2 via Gcc-patches June 14, 2023, 7:30 a.m. UTC | #2
Thanks Juzhe for reviewing, update the PATCH v2 as below.

https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621645.html

Pan

From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
Sent: Wednesday, June 14, 2023 9:07 AM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Robin Dapp <rdapp.gcc@gmail.com>; jeffreyalaw <jeffreyalaw@gmail.com>; Li, Pan2 <pan2.li@intel.com>; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng <kito.cheng@gmail.com>
Subject: Re: [PATCH v1] RISC-V: Bugfix for vec_init repeating auto vectorization in RV32


>> unsigned int elen = TARGET_VECTOR_ELEN_64 ? 64 : 32;
Add comment here to demonstrate why you pick up elen to set the LIMIT.
I understand:
1. -march=zve32* ===> ELEN = 32
    -march=zve64* ===> ELEN = 64
2. both vmv.v.x/vmv.s.x is restrict to the ELEN
    For example, When ELEN=32 (-march=zve32*)
    vsetvli ...e64,m1
    vmv.v.x/vmv.s.x
    We can't support such code sequence.

You should demonstrate it clearly in the comments.

Otherwise, this patch LGTM.
  

Patch

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index fb970344521..9270e258ca3 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -399,10 +399,11 @@  rvv_builder::get_merge_scalar_mask (unsigned int index_in_pattern) const
 {
   unsigned HOST_WIDE_INT mask = 0;
   unsigned HOST_WIDE_INT base_mask = (1ULL << index_in_pattern);
+  unsigned int elen = TARGET_VECTOR_ELEN_64 ? 64 : 32;
 
-  gcc_assert (BITS_PER_WORD % npatterns () == 0);
+  gcc_assert (elen % npatterns () == 0);
 
-  int limit = BITS_PER_WORD / npatterns ();
+  int limit = elen / npatterns ();
 
   for (int i = 0; i < limit; i++)
     mask |= base_mask << (i * npatterns ());
@@ -1923,7 +1924,7 @@  expand_vector_init_merge_repeating_sequence (rtx target,
       rtx mask = gen_reg_rtx (mask_mode);
       rtx dup = gen_reg_rtx (dup_mode);
 
-      if (full_nelts <= BITS_PER_WORD) /* vmv.s.x.  */
+      if (full_nelts <= builder.inner_bits_size ()) /* vmv.s.x.  */
 	{
 	  rtx ops[] = {dup, gen_scalar_move_mask (dup_mask_mode),
 	    RVV_VUNDEF (dup_mode), merge_mask};
@@ -1933,7 +1934,8 @@  expand_vector_init_merge_repeating_sequence (rtx target,
       else /* vmv.v.x.  */
 	{
 	  rtx ops[] = {dup, force_reg (GET_MODE_INNER (dup_mode), merge_mask)};
-	  rtx vl = gen_int_mode (CEIL (full_nelts, BITS_PER_WORD), Pmode);
+	  rtx vl = gen_int_mode (CEIL (full_nelts, builder.inner_bits_size ()),
+				 Pmode);
 	  emit_nonvlmax_integer_move_insn (code_for_pred_broadcast (dup_mode),
 					   ops, vl);
 	}