[Committed] RISC-V: Fix mem-to-mem VLS move pattern[PR111566]

Message ID 20230926131532.1935361-1-juzhe.zhong@rivai.ai
State Unresolved
Headers
Series [Committed] RISC-V: Fix mem-to-mem VLS move pattern[PR111566] |

Checks

Context Check Description
snail/gcc-patch-check warning Git am fail log

Commit Message

juzhe.zhong@rivai.ai Sept. 26, 2023, 1:15 p.m. UTC
  The mem-to-mem insn pattern is splitted from reg-to-mem/mem-to-reg/reg-to-reg
causes ICE in RA since RA prefer they stay together.

Now, we split mem-to-mem as a pure pre-RA split pattern and only allow
define_insn match mem-to-mem VLS move in pre-RA stage (Forbid mem-to-mem move after RA).

Tested no difference. Committed.

	PR target/111566

gcc/ChangeLog:

	* config/riscv/vector.md (*mov<mode>_mem_to_mem):

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/fortran/pr111566.f90: New test.

---
 gcc/config/riscv/vector.md                    | 19 +++++++++---
 .../gcc.target/riscv/rvv/fortran/pr111566.f90 | 31 +++++++++++++++++++
 2 files changed, 45 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/fortran/pr111566.f90
  

Comments

Jeff Law Sept. 26, 2023, 1:35 p.m. UTC | #1
On 9/26/23 07:15, Juzhe-Zhong wrote:
> The mem-to-mem insn pattern is splitted from reg-to-mem/mem-to-reg/reg-to-reg
> causes ICE in RA since RA prefer they stay together.
> 
> Now, we split mem-to-mem as a pure pre-RA split pattern and only allow
> define_insn match mem-to-mem VLS move in pre-RA stage (Forbid mem-to-mem move after RA).
> 
> Tested no difference. Committed.
> 
> 	PR target/111566
> 
> gcc/ChangeLog:
> 
> 	* config/riscv/vector.md (*mov<mode>_mem_to_mem):
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/rvv/fortran/pr111566.f90: New test.
ChangeLog for the vector.md is missing.

In general we shouldn't be allowing mem->mem in most patterns since the 
hardware doesn't actually implement such instructions.  I suspect that's 
the real problem here and that ultimately you're just papering over it.



Jeff
  
juzhe.zhong@rivai.ai Sept. 26, 2023, 2:51 p.m. UTC | #2
Thanks Jeff.

Address comments:
[PATCH V2] RISC-V: Fix mem-to-mem VLS move pattern[PR111566] (gnu.org)

Actually, we only allow mem-to-mem move for VLS modes size <= MAX_BITS_PER_WORD.
Since we want to optimize this case:
-	    typedef int8_t v2qi __attribute__ ((vector_size (2)));
-	    v2qi v = *(v2qi*)in;
-	    *(v2qi*)out = v;
using scalar load/store.

Does it look more reasonable ?


juzhe.zhong@rivai.ai
 
From: Jeff Law
Date: 2023-09-26 21:35
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; kito.cheng; rdapp.gcc
Subject: Re: [Committed] RISC-V: Fix mem-to-mem VLS move pattern[PR111566]
 
 
On 9/26/23 07:15, Juzhe-Zhong wrote:
> The mem-to-mem insn pattern is splitted from reg-to-mem/mem-to-reg/reg-to-reg
> causes ICE in RA since RA prefer they stay together.
> 
> Now, we split mem-to-mem as a pure pre-RA split pattern and only allow
> define_insn match mem-to-mem VLS move in pre-RA stage (Forbid mem-to-mem move after RA).
> 
> Tested no difference. Committed.
> 
> PR target/111566
> 
> gcc/ChangeLog:
> 
> * config/riscv/vector.md (*mov<mode>_mem_to_mem):
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/riscv/rvv/fortran/pr111566.f90: New test.
ChangeLog for the vector.md is missing.
 
In general we shouldn't be allowing mem->mem in most patterns since the 
hardware doesn't actually implement such instructions.  I suspect that's 
the real problem here and that ultimately you're just papering over it.
 
 
 
Jeff
  
Jeff Law Sept. 26, 2023, 3:15 p.m. UTC | #3
On 9/26/23 08:51, 钟居哲 wrote:
> Thanks Jeff.
> 
> Address comments:
> [PATCH V2] RISC-V: Fix mem-to-mem VLS move pattern[PR111566] (gnu.org) 
> <https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631395.html>
> 
> Actually, we only allow mem-to-mem move for VLS modes size <= 
> MAX_BITS_PER_WORD.
> Since we want to optimize this case:
> 
> -	    typedef int8_t v2qi __attribute__ ((vector_size (2)));
> -	    v2qi v = *(v2qi*)in;
> -	    *(v2qi*)out = v;
> 
> using scalar load/store.
That should be do-able without resorting to a pattern that allowed 
mem->mem moves.

THe thing you have to be careful about is in the effort to optimize this 
case, you can end up confusing the register allocator into making poor 
choices elsewhere.  ie, once you expose a small vector move as 
implementable in GPRs you run the very real risk of pessimizing other code.

But even with that caveat, the better way to go here is to disallow the 
mem->mem case.

jeff
  
juzhe.zhong@rivai.ai Sept. 26, 2023, 3:27 p.m. UTC | #4
OK。

Remove mem-to-mem pattern:
[PATCH V3] RISC-V: Remove mem-to-mem VLS move pattern[PR111566] (gnu.org)




juzhe.zhong@rivai.ai
 
From: Jeff Law
Date: 2023-09-26 23:15
To: 钟居哲; gcc-patches
CC: kito.cheng; kito.cheng; rdapp.gcc
Subject: Re: [Committed] RISC-V: Fix mem-to-mem VLS move pattern[PR111566]
 
 
On 9/26/23 08:51, 钟居哲 wrote:
> Thanks Jeff.
> 
> Address comments:
> [PATCH V2] RISC-V: Fix mem-to-mem VLS move pattern[PR111566] (gnu.org) 
> <https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631395.html>
> 
> Actually, we only allow mem-to-mem move for VLS modes size <= 
> MAX_BITS_PER_WORD.
> Since we want to optimize this case:
> 
> -     typedef int8_t v2qi __attribute__ ((vector_size (2)));
> -     v2qi v = *(v2qi*)in;
> -     *(v2qi*)out = v;
> 
> using scalar load/store.
That should be do-able without resorting to a pattern that allowed 
mem->mem moves.
 
THe thing you have to be careful about is in the effort to optimize this 
case, you can end up confusing the register allocator into making poor 
choices elsewhere.  ie, once you expose a small vector move as 
implementable in GPRs you run the very real risk of pessimizing other code.
 
But even with that caveat, the better way to go here is to disallow the 
mem->mem case.
 
jeff
  
juzhe.zhong@rivai.ai Sept. 26, 2023, 11:08 p.m. UTC | #5
Hi, Jeff.

I removed mem-to-mem patterns as you suggested that means we don't have scalar move optimization for small size vector modes.
Is it ok for trunk?  
Since it is a bug fix patch, I hope we can land it soon. We may will find another way to optimize small size vector mode mem-to-mem.



juzhe.zhong@rivai.ai
 
From: Jeff Law
Date: 2023-09-26 23:15
To: 钟居哲; gcc-patches
CC: kito.cheng; kito.cheng; rdapp.gcc
Subject: Re: [Committed] RISC-V: Fix mem-to-mem VLS move pattern[PR111566]
 
 
On 9/26/23 08:51, 钟居哲 wrote:
> Thanks Jeff.
> 
> Address comments:
> [PATCH V2] RISC-V: Fix mem-to-mem VLS move pattern[PR111566] (gnu.org) 
> <https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631395.html>
> 
> Actually, we only allow mem-to-mem move for VLS modes size <= 
> MAX_BITS_PER_WORD.
> Since we want to optimize this case:
> 
> -     typedef int8_t v2qi __attribute__ ((vector_size (2)));
> -     v2qi v = *(v2qi*)in;
> -     *(v2qi*)out = v;
> 
> using scalar load/store.
That should be do-able without resorting to a pattern that allowed 
mem->mem moves.
 
THe thing you have to be careful about is in the effort to optimize this 
case, you can end up confusing the register allocator into making poor 
choices elsewhere.  ie, once you expose a small vector move as 
implementable in GPRs you run the very real risk of pessimizing other code.
 
But even with that caveat, the better way to go here is to disallow the 
mem->mem case.
 
jeff
  
Jeff Law Sept. 27, 2023, 10:07 p.m. UTC | #6
On 9/26/23 17:08, 钟居哲 wrote:
> Hi, Jeff.
> 
> I removed mem-to-mem patterns as you suggested that means we don't have 
> scalar move optimization for small size vector modes.
> Is it ok for trunk?
> Since it is a bug fix patch, I hope we can land it soon. We may will 
> find another way to optimize small size vector mode mem-to-mem.
It's OK with me.

jeff
  

Patch

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index d5300a33946..a98242f2fd8 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1222,12 +1222,14 @@ 
     DONE;
 })
 
-(define_insn_and_split "*mov<mode>_mem_to_mem"
+;; Some VLS modes (like V2SImode) have size <= a general purpose
+;; register width, we optimize such mem-to-mem move into mem-to-mem
+;; scalar move.  Otherwise, we always force operands[1] into register
+;; so that we will never get mem-to-mem move after RA.
+(define_split
   [(set (match_operand:VLS_AVL_IMM 0 "memory_operand")
 	(match_operand:VLS_AVL_IMM 1 "memory_operand"))]
   "TARGET_VECTOR && can_create_pseudo_p ()"
-  "#"
-  "&& 1"
   [(const_int 0)]
   {
     if (GET_MODE_BITSIZE (<MODE>mode).to_constant () <= MAX_BITS_PER_WORD)
@@ -1256,14 +1258,21 @@ 
       }
     DONE;
   }
-  [(set_attr "type" "vmov")]
 )
 
+;; We recognize mem-to-mem move in pre-RA stage so that we won't have
+;; ICE (unrecognizable insn: (set (mem) (mem))).  Then, the previous
+;; mem-to-mem split pattern will force operands[1] into a register so
+;; that mem-to-mem move will never happen after RA.
+;;
+;; We don't allow mem-to-mem move in post-RA stage since we
+;; don't have an instruction to split mem-to-mem move after RA.
 (define_insn_and_split "*mov<mode>"
   [(set (match_operand:VLS_AVL_IMM 0 "reg_or_mem_operand" "=vr, m, vr")
 	(match_operand:VLS_AVL_IMM 1 "reg_or_mem_operand" "  m,vr, vr"))]
   "TARGET_VECTOR
-   && (register_operand (operands[0], <MODE>mode)
+   && (can_create_pseudo_p ()
+       || register_operand (operands[0], <MODE>mode)
        || register_operand (operands[1], <MODE>mode))"
   "@
    #
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/fortran/pr111566.f90 b/gcc/testsuite/gcc.target/riscv/rvv/fortran/pr111566.f90
new file mode 100644
index 00000000000..2e30dc9bfaa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/fortran/pr111566.f90
@@ -0,0 +1,31 @@ 
+! { dg-do compile }
+! { dg-options "-march=rv64gcv -mabi=lp64d -Ofast -fallow-argument-mismatch -fmax-stack-var-size=65536 -S  -std=legacy -w" }
+
+module a
+  integer,parameter :: SHR_KIND_R8 = selected_real_kind(12)
+end module a
+module b
+  use a,  c => shr_kind_r8
+contains
+  subroutine d(cg , km, i1, i2)
+    real (c) ch(i2,km)
+    real (c) cg(4,i1:i2,km)
+    real  dc(i2,km)
+    real(c) ci(i2,km)
+    real(c) cj(i2,km)
+    do k=2,ck
+       do i=i1,0
+          cl = ci(i,k) *ci(i,1) /      cj(i,k)+ch(i,1)
+          cm = cg(1,i,k) - min(e,cg(1,i,co))
+          dc(i,k) = sign(cm, cl)
+       enddo
+    enddo
+    if ( cq == 0 ) then
+       do i=i1,i2
+          if( cr <=  cs ) then
+             cg= sign( min(ct,   cg),  cg)
+          endif
+       enddo
+    endif
+  end subroutine d
+end module b