[1/3] RISC-V: movmem for RISCV with V extension

Message ID 20231211094728.1623032-2-slewis@rivosinc.com
State Accepted
Headers
Series RISC-V: vectorised memory operations |

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

Sergei Lewis Dec. 11, 2023, 9:47 a.m. UTC
  gcc/ChangeLog

    * config/riscv/riscv.md (movmem<mode>): Use riscv_vector::expand_block_move,
    if and only if we know the entire operation can be performed using one vector
    load followed by one vector store

gcc/testsuite/ChangeLog

    * gcc.target/riscv/rvv/base/movmem-1.c: New test
---
 gcc/config/riscv/riscv.md                     | 22 +++++++
 .../gcc.target/riscv/rvv/base/movmem-1.c      | 59 +++++++++++++++++++
 2 files changed, 81 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c
  

Comments

Robin Dapp Dec. 11, 2023, 10:08 a.m. UTC | #1
Hi Sergei,

thanks for contributing this!

Small general remarks/nits upfront:

The code looks like it hasn't been run through clang-format or
similar.  Please make sure that it adheres to the GNU coding
conventions.  The same applies to comments.  Some of them start
in lowercase.

As you rely on the vector length, please make sure to test various
combinations (also "exotic" ones) like zve32 and zve64.
Also, please specify which configurations it has been tested on. 

>     * config/riscv/riscv.md (movmem<mode>): Use riscv_vector::expand_block_move,
>     if and only if we know the entire operation can be performed using one vector
>     load followed by one vector store
> 
> gcc/testsuite/ChangeLog
> 
>     * gcc.target/riscv/rvv/base/movmem-1.c: New test

Please add a PR target/112109 here.  I believe after these
patches have landed we can close that bug.

> ---
>  gcc/config/riscv/riscv.md                     | 22 +++++++
>  .../gcc.target/riscv/rvv/base/movmem-1.c      | 59 +++++++++++++++++++
>  2 files changed, 81 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c
> 
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index eed997116b0..88fde290a8a 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -2359,6 +2359,28 @@
>      FAIL;
>  })
>  
> +;; inlining general memmove is a pessimisation: we can't avoid having to decide
> +;; which direction to go at runtime, which is costly in instruction count
> +;; however for situations where the entire move fits in one vector operation
> +;; we can do all reads before doing any writes so we don't have to worry
> +;; so generate the inline vector code in such situations
> +;; nb. prefer scalar path for tiny memmoves
> +(define_expand "movmem<mode>"
> +  [(parallel [(set (match_operand:BLK 0 "general_operand")
> +      (match_operand:BLK 1 "general_operand"))
> +	      (use (match_operand:P 2 ""))
> +	      (use (match_operand:SI 3 "const_int_operand"))])]
> +  "TARGET_VECTOR"
> +{
> +  if ((INTVAL (operands[2]) >= TARGET_MIN_VLEN/8)

If operands[2] is used as an int we need to make sure the constraint
says so.  Shouldn't operand[1] be a memory_operand?

> +	&& (INTVAL (operands[2]) <= TARGET_MIN_VLEN)
> +	&& riscv_vector::expand_block_move (operands[0], operands[1],
> +	     operands[2]))
> +    DONE;
> +  else
> +    FAIL;
> +})
> +

> +#define MIN_VECTOR_BYTES (__riscv_v_min_vlen/8)
> +
> +/* tiny memmoves should not be vectorised
> +** f1:
> +**  li\s+a2,15
> +**  tail\s+memmove
> +*/
> +char * f1 (char *a, char const *b)
> +{
> +  return memmove (a, b, 15);
> +}

The < 16 assumption might not hold on embedded targets.
Same with the other tests.

Regards
 Robin
  
Robin Dapp Dec. 11, 2023, 10:21 a.m. UTC | #2
Ah, please also ensure to include (and follow) the stringop_strategy
checks. (LIBCALL, VECTOR)
The naming is a bit unfortunate still but that need not be fixed
in this patch.  

Regards
 Robin
  

Patch

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index eed997116b0..88fde290a8a 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2359,6 +2359,28 @@ 
     FAIL;
 })
 
+;; inlining general memmove is a pessimisation: we can't avoid having to decide
+;; which direction to go at runtime, which is costly in instruction count
+;; however for situations where the entire move fits in one vector operation
+;; we can do all reads before doing any writes so we don't have to worry
+;; so generate the inline vector code in such situations
+;; nb. prefer scalar path for tiny memmoves
+(define_expand "movmem<mode>"
+  [(parallel [(set (match_operand:BLK 0 "general_operand")
+      (match_operand:BLK 1 "general_operand"))
+	      (use (match_operand:P 2 ""))
+	      (use (match_operand:SI 3 "const_int_operand"))])]
+  "TARGET_VECTOR"
+{
+  if ((INTVAL (operands[2]) >= TARGET_MIN_VLEN/8)
+	&& (INTVAL (operands[2]) <= TARGET_MIN_VLEN)
+	&& riscv_vector::expand_block_move (operands[0], operands[1],
+	     operands[2]))
+    DONE;
+  else
+    FAIL;
+})
+
 ;; Expand in-line code to clear the instruction cache between operand[0] and
 ;; operand[1].
 (define_expand "clear_cache"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c
new file mode 100644
index 00000000000..b930241ae5d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c
@@ -0,0 +1,59 @@ 
+/* { dg-do compile } */
+/* { dg-add-options riscv_v } */
+/* { dg-additional-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <string.h>
+
+#define MIN_VECTOR_BYTES (__riscv_v_min_vlen/8)
+
+/* tiny memmoves should not be vectorised
+** f1:
+**  li\s+a2,15
+**  tail\s+memmove
+*/
+char * f1 (char *a, char const *b)
+{
+  return memmove (a, b, 15);
+}
+
+/* vectorise+inline minimum vector register width with LMUL=1
+** f2:
+**  (
+**  vsetivli\s+zero,16,e8,m1,ta,ma
+**  |
+**  li\s+[ta][0-7],\d+
+**  vsetvli\s+zero,[ta][0-7],e8,m1,ta,ma
+**  )
+**  vle8\.v\s+v\d+,0\(a1\)
+**  vse8\.v\s+v\d+,0\(a0\)
+**  ret
+*/
+char * f2 (char *a, char const *b)
+{
+  return memmove (a, b, MIN_VECTOR_BYTES);
+}
+
+/* vectorise+inline up to LMUL=8
+** f3:
+**  li\s+[ta][0-7],\d+
+**  vsetvli\s+zero,[ta][0-7],e8,m8,ta,ma
+**  vle8\.v\s+v\d+,0\(a1\)
+**  vse8\.v\s+v\d+,0\(a0\)
+**  ret
+*/
+char * f3 (char *a, char const *b)
+{
+  return memmove (a, b, MIN_VECTOR_BYTES*8);
+}
+
+/* don't vectorise if the move is too large for one operation
+** f4:
+**  li\s+a2,\d+
+**  tail\s+memmove
+*/
+char * f4 (char *a, char const *b)
+{
+  return memmove (a, b, MIN_VECTOR_BYTES*8+1);
+}
+