[V2] RISC-V: Add RVV FMA auto-vectorization support
Commit Message
From: Juzhe-Zhong <juzhe.zhong@rivai.ai>
This patch supports the FMA auto-vectorization pattern.
1. Let RA decide between vmacc and vmadd.
2. Fix a bug in vector.md that generates incorrect information for the VSETVL
pass, found when testing ternop-3.c.
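Point 1 works because vmacc and vmadd compute the same multiply-add but overwrite different inputs: vmacc overwrites the addend, vmadd overwrites a multiplicand. The sketch below is a scalar C model written from the RVV 1.0 definitions of vmacc.vv and vmadd.vv. The function names are hypothetical, and uint32_t stands in for any element type (it is unsigned so overflow wraps like the hardware).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of vmacc.vv vd, vs1, vs2:
   vd[i] = vs1[i] * vs2[i] + vd[i]; the addend held in vd is overwritten.  */
void
model_vmacc (uint32_t *vd, const uint32_t *vs1, const uint32_t *vs2, size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = vs1[i] * vs2[i] + vd[i];
}

/* Hypothetical model of vmadd.vv vd, vs1, vs2:
   vd[i] = vs1[i] * vd[i] + vs2[i]; the multiplicand held in vd is overwritten.  */
void
model_vmadd (uint32_t *vd, const uint32_t *vs1, const uint32_t *vs2, size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = vs1[i] * vd[i] + vs2[i];
}

/* Compute a * b + c both ways (vl <= 16) and return 1 if the results agree.  */
int
fma_forms_agree (const uint32_t *a, const uint32_t *b, const uint32_t *c,
                 size_t vl)
{
  uint32_t acc[16], mad[16];
  assert (vl <= 16);
  memcpy (acc, c, vl * sizeof *c); /* vmacc: destination starts as the addend.  */
  model_vmacc (acc, a, b, vl);     /* acc = a * b + c  */
  memcpy (mad, a, vl * sizeof *a); /* vmadd: destination starts as a multiplicand.  */
  model_vmadd (mad, b, c, vl);     /* mad = b * a + c  */
  return memcmp (acc, mad, vl * sizeof *acc) == 0;
}
```

Either instruction can therefore implement fma<mode>4; the only difference is which input register must already be the destination, which is what RA decides.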
gcc/ChangeLog:
* config/riscv/autovec.md (fma<mode>4): New pattern.
(*fma<mode>): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): Add RVV_TERNOP.
(emit_vlmax_ternary_insn): New function.
* config/riscv/riscv-v.cc (emit_vlmax_ternary_insn): Ditto.
* config/riscv/vector.md: Fix vimuladd instruction bug.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp: Add ternary tests.
* gcc.target/riscv/rvv/autovec/ternop/ternop-1.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-2.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-3.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c: New test.
---
gcc/config/riscv/autovec.md | 65 +++++++++++
gcc/config/riscv/riscv-protos.h | 2 +
gcc/config/riscv/riscv-v.cc | 20 ++++
gcc/config/riscv/vector.md | 2 +-
.../riscv/rvv/autovec/ternop/ternop-1.c | 28 +++++
.../riscv/rvv/autovec/ternop/ternop-2.c | 34 ++++++
.../riscv/rvv/autovec/ternop/ternop-3.c | 33 ++++++
.../riscv/rvv/autovec/ternop/ternop_run-1.c | 84 ++++++++++++++
.../riscv/rvv/autovec/ternop/ternop_run-2.c | 104 ++++++++++++++++++
.../riscv/rvv/autovec/ternop/ternop_run-3.c | 104 ++++++++++++++++++
gcc/testsuite/gcc.target/riscv/rvv/rvv.exp | 2 +
11 files changed, 477 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c
Comments
Ping. OK for trunk?
juzhe.zhong@rivai.ai
From: juzhe.zhong
Date: 2023-05-26 19:35
To: gcc-patches
CC: kito.cheng; palmer; rdapp.gcc; jeffreyalaw; kito.cheng; pan2.li; Juzhe-Zhong
Subject: [PATCH V2] RISC-V: Add RVV FMA auto-vectorization support
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 7fe4d94de39..04825df1210 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -373,3 +373,68 @@
DONE;
}
)
+
+;; =========================================================================
+;; == Ternary arithmetic
+;; =========================================================================
+
+;; -------------------------------------------------------------------------
+;; ---- [INT] VMACC and VMADD
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vmacc
+;; - vmadd
+;; -------------------------------------------------------------------------
+
+;; We can't expand FMA for the following reasons:
+;; 1. Before RA, we don't know which multiply-add instruction is the ideal one.
+;; The vmacc is the ideal instruction when operands[3] overlaps operands[0].
+;; The vmadd is the ideal instruction when operands[1|2] overlaps operands[0].
+;; 2. According to vector.md, the multiply-add patterns have a 'merge' operand,
+;; operands[5]. Since operands[5] should overlap operands[0], this operand
+;; should be allocated the same regno as one of operands[1|2|3].
+;; 3. The 'merge' operand is always a real merge operand and we don't allow an
+;; undefined operand.
+;; 4. The FMA pattern needs a VLMAX vsetvli, which needs a VL operand.
+;;
+;; Therefore, we design the codegen of FMA as follows:
+;; 1. Clobber a scratch register in the FMA expand pattern.
+;; 2. Let RA decide which input operand (operands[1|2|3]) overlaps operands[0].
+;; 3. Generate instructions (vmacc or vmadd) according to the register allocation
+;; result after reload_completed.
+(define_expand "fma<mode>4"
+ [(parallel
+ [(set (match_operand:VI 0 "register_operand" "=vr")
+ (plus:VI
+ (mult:VI
+ (match_operand:VI 1 "register_operand" " vr")
+ (match_operand:VI 2 "register_operand" " vr"))
+ (match_operand:VI 3 "register_operand" " vr")))
+ (clobber (match_scratch:SI 4))])]
+ "TARGET_VECTOR"
+ {})
+
+(define_insn_and_split "*fma<mode>"
+ [(set (match_operand:VI 0 "register_operand" "=vr, vr, ?&vr")
+ (plus:VI
+ (mult:VI
+ (match_operand:VI 1 "register_operand" " %0, vr, vr")
+ (match_operand:VI 2 "register_operand" " vr, vr, vr"))
+ (match_operand:VI 3 "register_operand" " vr, 0, vr")))
+ (clobber (match_scratch:SI 4 "=r,r,r"))]
+ "TARGET_VECTOR"
+ "#"
+ "&& reload_completed"
+ [(const_int 0)]
+ {
+ PUT_MODE (operands[4], Pmode);
+ riscv_vector::emit_vlmax_vsetvl (<MODE>mode, operands[4]);
+ if (which_alternative == 2)
+ emit_insn (gen_rtx_SET (operands[0], operands[3]));
+ rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[0]};
+ riscv_vector::emit_vlmax_ternary_insn (code_for_pred_mul_plus (<MODE>mode),
+ riscv_vector::RVV_TERNOP, ops, operands[4]);
+ DONE;
+ }
+ [(set_attr "type" "vimuladd")
+ (set_attr "mode" "<MODE>")])
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 36419c95bbd..157c271bc93 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -140,6 +140,7 @@ enum insn_type
RVV_MERGE_OP = 4,
RVV_CMP_OP = 4,
RVV_CMP_MU_OP = RVV_CMP_OP + 2, /* +2 means mask and maskoff operand. */
+ RVV_TERNOP = 5,
};
enum vlmul_type
{
@@ -176,6 +177,7 @@ bool legitimize_move (rtx, rtx);
void emit_vlmax_vsetvl (machine_mode, rtx);
void emit_hard_vlmax_vsetvl (machine_mode, rtx);
void emit_vlmax_insn (unsigned, int, rtx *, rtx = 0);
+void emit_vlmax_ternary_insn (unsigned, int, rtx *, rtx = 0);
void emit_nonvlmax_insn (unsigned, int, rtx *, rtx);
void emit_vlmax_merge_insn (unsigned, int, rtx *);
void emit_vlmax_cmp_insn (unsigned, rtx *);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index f71ad9e46a1..ac1d0e61e83 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -362,6 +362,26 @@ emit_vlmax_insn (unsigned icode, int op_num, rtx *ops, rtx vl)
e.emit_insn ((enum insn_code) icode, ops);
}
+/* This function emits a {VLMAX, TAIL_ANY, MASK_ANY} vsetvli followed by the
+ * ternary operation which always has a real merge operand. */
+void
+emit_vlmax_ternary_insn (unsigned icode, int op_num, rtx *ops, rtx vl)
+{
+ machine_mode dest_mode = GET_MODE (ops[0]);
+ machine_mode mask_mode = get_mask_mode (dest_mode).require ();
+ /* We have a maximum of 11 operands for RVV instruction patterns according to
+ * vector.md. */
+ insn_expander<11> e (/*OP_NUM*/ op_num, /*HAS_DEST_P*/ true,
+ /*FULLY_UNMASKED_P*/ true,
+ /*USE_REAL_MERGE_P*/ true, /*HAS_AVL_P*/ true,
+ /*VLMAX_P*/ true,
+ /*DEST_MODE*/ dest_mode, /*MASK_MODE*/ mask_mode);
+ e.set_policy (TAIL_ANY);
+ e.set_policy (MASK_ANY);
+ e.set_vl (vl);
+ e.emit_insn ((enum insn_code) icode, ops);
+}
+
/* This function emits a {NONVLMAX, TAIL_ANY, MASK_ANY} vsetvli followed by the
* actual operation. */
void
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 15f66efaa48..cd696da5d89 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -388,7 +388,7 @@
(symbol_ref "INTVAL (operands[7])"))
(eq_attr "type" "vldux,vldox,vialu,vshift,viminmax,vimul,vidiv,vsalu,\
- viwalu,viwmul,vnshift,vimuladd,vaalu,vsmul,vsshift,\
+ viwalu,viwmul,vnshift,vaalu,vsmul,vsshift,\
vnclip,vicmp,vfalu,vfmul,vfminmax,vfdiv,vfwalu,vfwmul,\
vfsgnj,vfcmp,vfmuladd,vslideup,vslidedown,vislide1up,\
vislide1down,vfslide1up,vfslide1down,vgather,viwmuladd,vfwmuladd,\
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-1.c
new file mode 100644
index 00000000000..1996ca65108
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-1.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE) \
+ __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst, \
+ TYPE *__restrict a, \
+ TYPE *__restrict b, int n) \
+ { \
+ for (int i = 0; i < n; i++) \
+ dst[i] += a[i] * b[i]; \
+ }
+
+#define TEST_ALL() \
+ TEST_TYPE (int8_t) \
+ TEST_TYPE (uint8_t) \
+ TEST_TYPE (int16_t) \
+ TEST_TYPE (uint16_t) \
+ TEST_TYPE (int32_t) \
+ TEST_TYPE (uint32_t) \
+ TEST_TYPE (int64_t) \
+ TEST_TYPE (uint64_t)
+
+TEST_ALL ()
+
+/* { dg-final { scan-assembler-times {\tvmadd\.vv} 8 } } */
+/* { dg-final { scan-assembler-not {\tvmv} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-2.c
new file mode 100644
index 00000000000..89eeaf6315f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-2.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE) \
+ __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dest1, \
+ TYPE *__restrict dest2, \
+ TYPE *__restrict dest3, \
+ TYPE *__restrict src1, \
+ TYPE *__restrict src2, int n) \
+ { \
+ for (int i = 0; i < n; ++i) \
+ { \
+ dest1[i] += src1[i] * src2[i]; \
+ dest2[i] += src1[i] * dest1[i]; \
+ dest3[i] += src2[i] * dest2[i]; \
+ } \
+ }
+
+#define TEST_ALL() \
+ TEST_TYPE (int8_t) \
+ TEST_TYPE (uint8_t) \
+ TEST_TYPE (int16_t) \
+ TEST_TYPE (uint16_t) \
+ TEST_TYPE (int32_t) \
+ TEST_TYPE (uint32_t) \
+ TEST_TYPE (int64_t) \
+ TEST_TYPE (uint64_t)
+
+TEST_ALL ()
+
+/* { dg-final { scan-assembler-times {\tvmacc\.vv} 8 } } */
+/* { dg-final { scan-assembler-not {\tvmv} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-3.c
new file mode 100644
index 00000000000..127e701b187
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-3.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE) \
+ __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dest1, \
+ TYPE *__restrict dest2, \
+ TYPE *__restrict dest3, \
+ TYPE *__restrict src1, \
+ TYPE *__restrict src2, int n) \
+ { \
+ for (int i = 0; i < n; ++i) \
+ { \
+ dest1[i] = src1[i] * src2[i] + dest2[i]; \
+ dest2[i] += src1[i] * dest1[i]; \
+ dest3[i] += src2[i] * dest2[i]; \
+ } \
+ }
+
+#define TEST_ALL() \
+ TEST_TYPE (int8_t) \
+ TEST_TYPE (uint8_t) \
+ TEST_TYPE (int16_t) \
+ TEST_TYPE (uint16_t) \
+ TEST_TYPE (int32_t) \
+ TEST_TYPE (uint32_t) \
+ TEST_TYPE (int64_t) \
+ TEST_TYPE (uint64_t)
+
+TEST_ALL ()
+
+/* { dg-final { scan-assembler-times {\tvmv} 8 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c
new file mode 100644
index 00000000000..1f69b694818
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c
@@ -0,0 +1,84 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable" } */
+
+#include "ternop-1.c"
+
+#define TEST_LOOP(TYPE, NUM) \
+ { \
+ TYPE array1_##NUM[NUM] = {}; \
+ TYPE array2_##NUM[NUM] = {}; \
+ TYPE array3_##NUM[NUM] = {}; \
+ TYPE array4_##NUM[NUM] = {}; \
+ for (int i = 0; i < NUM; ++i) \
+ { \
+ array1_##NUM[i] = (i & 1) + 5; \
+ array2_##NUM[i] = i - NUM / 3; \
+ array3_##NUM[i] = NUM - NUM / 3 - i; \
+ array4_##NUM[i] = NUM - NUM / 3 - i; \
+ asm volatile("" ::: "memory"); \
+ } \
+ ternop_##TYPE (array3_##NUM, array1_##NUM, array2_##NUM, NUM); \
+ for (int i = 0; i < NUM; i++) \
+ if (array3_##NUM[i] \
+ != (TYPE) (array1_##NUM[i] * array2_##NUM[i] + array4_##NUM[i])) \
+ __builtin_abort (); \
+ }
+
+int __attribute__ ((optimize (0))) main ()
+{
+ TEST_LOOP (int8_t, 7)
+ TEST_LOOP (uint8_t, 7)
+ TEST_LOOP (int16_t, 7)
+ TEST_LOOP (uint16_t, 7)
+ TEST_LOOP (int32_t, 7)
+ TEST_LOOP (uint32_t, 7)
+ TEST_LOOP (int64_t, 7)
+ TEST_LOOP (uint64_t, 7)
+
+ TEST_LOOP (int8_t, 16)
+ TEST_LOOP (uint8_t, 16)
+ TEST_LOOP (int16_t, 16)
+ TEST_LOOP (uint16_t, 16)
+ TEST_LOOP (int32_t, 16)
+ TEST_LOOP (uint32_t, 16)
+ TEST_LOOP (int64_t, 16)
+ TEST_LOOP (uint64_t, 16)
+
+ TEST_LOOP (int8_t, 77)
+ TEST_LOOP (uint8_t, 77)
+ TEST_LOOP (int16_t, 77)
+ TEST_LOOP (uint16_t, 77)
+ TEST_LOOP (int32_t, 77)
+ TEST_LOOP (uint32_t, 77)
+ TEST_LOOP (int64_t, 77)
+ TEST_LOOP (uint64_t, 77)
+
+ TEST_LOOP (int8_t, 128)
+ TEST_LOOP (uint8_t, 128)
+ TEST_LOOP (int16_t, 128)
+ TEST_LOOP (uint16_t, 128)
+ TEST_LOOP (int32_t, 128)
+ TEST_LOOP (uint32_t, 128)
+ TEST_LOOP (int64_t, 128)
+ TEST_LOOP (uint64_t, 128)
+
+ TEST_LOOP (int8_t, 15641)
+ TEST_LOOP (uint8_t, 15641)
+ TEST_LOOP (int16_t, 15641)
+ TEST_LOOP (uint16_t, 15641)
+ TEST_LOOP (int32_t, 15641)
+ TEST_LOOP (uint32_t, 15641)
+ TEST_LOOP (int64_t, 15641)
+ TEST_LOOP (uint64_t, 15641)
+
+ TEST_LOOP (int8_t, 795)
+ TEST_LOOP (uint8_t, 795)
+ TEST_LOOP (int16_t, 795)
+ TEST_LOOP (uint16_t, 795)
+ TEST_LOOP (int32_t, 795)
+ TEST_LOOP (uint32_t, 795)
+ TEST_LOOP (int64_t, 795)
+ TEST_LOOP (uint64_t, 795)
+
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c
new file mode 100644
index 00000000000..103b98acdf0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c
@@ -0,0 +1,104 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable" } */
+
+#include "ternop-2.c"
+
+#define TEST_LOOP(TYPE, NUM) \
+ { \
+ TYPE array1_##NUM[NUM] = {}; \
+ TYPE array2_##NUM[NUM] = {}; \
+ TYPE array3_##NUM[NUM] = {}; \
+ TYPE array4_##NUM[NUM] = {}; \
+ TYPE array5_##NUM[NUM] = {}; \
+ TYPE array6_##NUM[NUM] = {}; \
+ TYPE array7_##NUM[NUM] = {}; \
+ TYPE array8_##NUM[NUM] = {}; \
+ for (int i = 0; i < NUM; ++i) \
+ { \
+ array1_##NUM[i] = (i & 1) + 5; \
+ array2_##NUM[i] = i - NUM / 3; \
+ array3_##NUM[i] = NUM - NUM / 3 - i; \
+ array6_##NUM[i] = NUM - NUM / 3 - i; \
+ array4_##NUM[i] = NUM - NUM / 2 + i; \
+ array7_##NUM[i] = NUM - NUM / 2 + i; \
+ array5_##NUM[i] = NUM + i * 7; \
+ array8_##NUM[i] = NUM + i * 7; \
+ asm volatile("" ::: "memory"); \
+ } \
+ ternop_##TYPE (array3_##NUM, array4_##NUM, array5_##NUM, array1_##NUM, \
+ array2_##NUM, NUM); \
+ for (int i = 0; i < NUM; i++) \
+ { \
+ array6_##NUM[i] \
+ = (TYPE) (array1_##NUM[i] * array2_##NUM[i] + array6_##NUM[i]); \
+ if (array3_##NUM[i] != array6_##NUM[i]) \
+ __builtin_abort (); \
+ array7_##NUM[i] \
+ = (TYPE) (array1_##NUM[i] * array6_##NUM[i] + array7_##NUM[i]); \
+ if (array4_##NUM[i] != array7_##NUM[i]) \
+ __builtin_abort (); \
+ array8_##NUM[i] \
+ = (TYPE) (array2_##NUM[i] * array7_##NUM[i] + array8_##NUM[i]); \
+ if (array5_##NUM[i] != array8_##NUM[i]) \
+ __builtin_abort (); \
+ } \
+ }
+
+int __attribute__ ((optimize (0))) main ()
+{
+ TEST_LOOP (int8_t, 7)
+ TEST_LOOP (uint8_t, 7)
+ TEST_LOOP (int16_t, 7)
+ TEST_LOOP (uint16_t, 7)
+ TEST_LOOP (int32_t, 7)
+ TEST_LOOP (uint32_t, 7)
+ TEST_LOOP (int64_t, 7)
+ TEST_LOOP (uint64_t, 7)
+
+ TEST_LOOP (int8_t, 16)
+ TEST_LOOP (uint8_t, 16)
+ TEST_LOOP (int16_t, 16)
+ TEST_LOOP (uint16_t, 16)
+ TEST_LOOP (int32_t, 16)
+ TEST_LOOP (uint32_t, 16)
+ TEST_LOOP (int64_t, 16)
+ TEST_LOOP (uint64_t, 16)
+
+ TEST_LOOP (int8_t, 77)
+ TEST_LOOP (uint8_t, 77)
+ TEST_LOOP (int16_t, 77)
+ TEST_LOOP (uint16_t, 77)
+ TEST_LOOP (int32_t, 77)
+ TEST_LOOP (uint32_t, 77)
+ TEST_LOOP (int64_t, 77)
+ TEST_LOOP (uint64_t, 77)
+
+ TEST_LOOP (int8_t, 128)
+ TEST_LOOP (uint8_t, 128)
+ TEST_LOOP (int16_t, 128)
+ TEST_LOOP (uint16_t, 128)
+ TEST_LOOP (int32_t, 128)
+ TEST_LOOP (uint32_t, 128)
+ TEST_LOOP (int64_t, 128)
+ TEST_LOOP (uint64_t, 128)
+
+ TEST_LOOP (int8_t, 15641)
+ TEST_LOOP (uint8_t, 15641)
+ TEST_LOOP (int16_t, 15641)
+ TEST_LOOP (uint16_t, 15641)
+ TEST_LOOP (int32_t, 15641)
+ TEST_LOOP (uint32_t, 15641)
+ TEST_LOOP (int64_t, 15641)
+ TEST_LOOP (uint64_t, 15641)
+
+ TEST_LOOP (int8_t, 795)
+ TEST_LOOP (uint8_t, 795)
+ TEST_LOOP (int16_t, 795)
+ TEST_LOOP (uint16_t, 795)
+ TEST_LOOP (int32_t, 795)
+ TEST_LOOP (uint32_t, 795)
+ TEST_LOOP (int64_t, 795)
+ TEST_LOOP (uint64_t, 795)
+
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c
new file mode 100644
index 00000000000..eac5408ce6f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c
@@ -0,0 +1,104 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable" } */
+
+#include "ternop-3.c"
+
+#define TEST_LOOP(TYPE, NUM) \
+ { \
+ TYPE array1_##NUM[NUM] = {}; \
+ TYPE array2_##NUM[NUM] = {}; \
+ TYPE array3_##NUM[NUM] = {}; \
+ TYPE array4_##NUM[NUM] = {}; \
+ TYPE array5_##NUM[NUM] = {}; \
+ TYPE array6_##NUM[NUM] = {}; \
+ TYPE array7_##NUM[NUM] = {}; \
+ TYPE array8_##NUM[NUM] = {}; \
+ for (int i = 0; i < NUM; ++i) \
+ { \
+ array1_##NUM[i] = (i & 1) + 5; \
+ array2_##NUM[i] = i - NUM / 3; \
+ array3_##NUM[i] = NUM - NUM / 3 - i; \
+ array6_##NUM[i] = NUM - NUM / 3 - i; \
+ array4_##NUM[i] = NUM - NUM / 2 + i; \
+ array7_##NUM[i] = NUM - NUM / 2 + i; \
+ array5_##NUM[i] = NUM + i * 7; \
+ array8_##NUM[i] = NUM + i * 7; \
+ asm volatile("" ::: "memory"); \
+ } \
+ ternop_##TYPE (array3_##NUM, array4_##NUM, array5_##NUM, array1_##NUM, \
+ array2_##NUM, NUM); \
+ for (int i = 0; i < NUM; i++) \
+ { \
+ array6_##NUM[i] \
+ = (TYPE) (array1_##NUM[i] * array2_##NUM[i] + array7_##NUM[i]); \
+ if (array3_##NUM[i] != array6_##NUM[i]) \
+ __builtin_abort (); \
+ array7_##NUM[i] \
+ = (TYPE) (array1_##NUM[i] * array6_##NUM[i] + array7_##NUM[i]); \
+ if (array4_##NUM[i] != array7_##NUM[i]) \
+ __builtin_abort (); \
+ array8_##NUM[i] \
+ = (TYPE) (array2_##NUM[i] * array7_##NUM[i] + array8_##NUM[i]); \
+ if (array5_##NUM[i] != array8_##NUM[i]) \
+ __builtin_abort (); \
+ } \
+ }
+
+int __attribute__ ((optimize (0))) main ()
+{
+ TEST_LOOP (int8_t, 7)
+ TEST_LOOP (uint8_t, 7)
+ TEST_LOOP (int16_t, 7)
+ TEST_LOOP (uint16_t, 7)
+ TEST_LOOP (int32_t, 7)
+ TEST_LOOP (uint32_t, 7)
+ TEST_LOOP (int64_t, 7)
+ TEST_LOOP (uint64_t, 7)
+
+ TEST_LOOP (int8_t, 16)
+ TEST_LOOP (uint8_t, 16)
+ TEST_LOOP (int16_t, 16)
+ TEST_LOOP (uint16_t, 16)
+ TEST_LOOP (int32_t, 16)
+ TEST_LOOP (uint32_t, 16)
+ TEST_LOOP (int64_t, 16)
+ TEST_LOOP (uint64_t, 16)
+
+ TEST_LOOP (int8_t, 77)
+ TEST_LOOP (uint8_t, 77)
+ TEST_LOOP (int16_t, 77)
+ TEST_LOOP (uint16_t, 77)
+ TEST_LOOP (int32_t, 77)
+ TEST_LOOP (uint32_t, 77)
+ TEST_LOOP (int64_t, 77)
+ TEST_LOOP (uint64_t, 77)
+
+ TEST_LOOP (int8_t, 128)
+ TEST_LOOP (uint8_t, 128)
+ TEST_LOOP (int16_t, 128)
+ TEST_LOOP (uint16_t, 128)
+ TEST_LOOP (int32_t, 128)
+ TEST_LOOP (uint32_t, 128)
+ TEST_LOOP (int64_t, 128)
+ TEST_LOOP (uint64_t, 128)
+
+ TEST_LOOP (int8_t, 15641)
+ TEST_LOOP (uint8_t, 15641)
+ TEST_LOOP (int16_t, 15641)
+ TEST_LOOP (uint16_t, 15641)
+ TEST_LOOP (int32_t, 15641)
+ TEST_LOOP (uint32_t, 15641)
+ TEST_LOOP (int64_t, 15641)
+ TEST_LOOP (uint64_t, 15641)
+
+ TEST_LOOP (int8_t, 795)
+ TEST_LOOP (uint8_t, 795)
+ TEST_LOOP (int16_t, 795)
+ TEST_LOOP (uint16_t, 795)
+ TEST_LOOP (int32_t, 795)
+ TEST_LOOP (uint32_t, 795)
+ TEST_LOOP (int64_t, 795)
+ TEST_LOOP (uint64_t, 795)
+
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index 9809a421fc8..7bd803303d0 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -65,6 +65,8 @@ foreach op $AUTOVEC_TEST_OPTS {
"" "$op"
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/cmp/*.\[cS\]]] \
"" "$op"
+ dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/ternop/*.\[cS\]]] \
+ "" "$op"
}
# VLS-VLMAX tests
--
2.36.1
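The design comment in autovec.md maps register-allocation outcomes to instructions. The sketch below models that mapping in C, using register numbers as stand-ins for RA's choices. It mirrors the three constraint alternatives of "*fma<mode>" (operands[1] tied with "%0", operands[3] tied with "0", or nothing tied with "?&vr"); the enum and function names are hypothetical and this is not GCC's actual code. In ternop-3.c, dest1[i] = src1[i] * src2[i] + dest2[i] leaves every input live afterwards (dest2[i], src1[i] and src2[i] are all read again), so no input can be tied to the output. This is presumably why that test expects one vmv per function.

```c
#include <assert.h>

/* Hypothetical: the sequence the post-reload split of "*fma<mode>" is
   expected to yield for dest = op1 * op2 + op3.  */
enum fma_form
{
  FMA_VMACC,    /* vmacc.vv dest, op1, op2 (addend op3 already in dest).  */
  FMA_VMADD,    /* vmadd.vv dest, other multiplicand, op3.  */
  FMA_VMV_VMACC /* Move op3 into dest (a vmv* instruction), then vmacc.vv.  */
};

/* dest, op1, op2 and op3 are the vector registers (v0-v31) RA assigned.  */
enum fma_form
choose_fma_form (int dest, int op1, int op2, int op3)
{
  assert (dest >= 0 && dest < 32);
  if (dest == op3)
    return FMA_VMACC;     /* Alternative 1: operands[3] uses constraint "0".  */
  if (dest == op1 || dest == op2)
    return FMA_VMADD;     /* Alternative 0: "%0" lets op1 and op2 commute.  */
  return FMA_VMV_VMACC;   /* Alternative 2: "?&vr", copy operands[3] first.  */
}
```

For example, choose_fma_form (8, 9, 10, 8) yields FMA_VMACC, while choose_fma_form (8, 9, 10, 11) yields FMA_VMV_VMACC. The "?" on alternative 2 makes it costlier, so RA should only pick it when tying an input is impossible.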
LGTM, but with one question.
On Fri, May 26, 2023 at 7:36 PM <juzhe.zhong@rivai.ai> wrote:
>
> From: Juzhe-Zhong <juzhe.zhong@rivai.ai>
>
> This patch support FMA auto-vectorization pattern.
> 1. Let's RA decide vmacc or vmadd.
> 2. Fix bug of vector.md which generate incorrect information to VSETVL
> PASS when testing ternop-3.c.
Does this bug also appear in GCC 13, or is it a new bug introduced on trunk?
This is an existing bug in GCC 13. I think I should split this into 2 patches.
juzhe.zhong@rivai.ai
From: Kito Cheng
Date: 2023-05-29 11:17
To: juzhe.zhong
CC: gcc-patches; kito.cheng; palmer; rdapp.gcc; jeffreyalaw; pan2.li
Subject: Re: [PATCH V2] RISC-V: Add RVV FMA auto-vectorization support
LGTM, but with one question.
Does this bug also appear in GCC 13? or this is new bug introduced at trunk
Committed with 2 patches, thanks Kito.
Pan
From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
Sent: Monday, May 29, 2023 11:19 AM
To: kito.cheng <kito.cheng@gmail.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>; Kito.cheng <kito.cheng@sifive.com>; palmer <palmer@rivosinc.com>; Robin Dapp <rdapp.gcc@gmail.com>; jeffreyalaw <jeffreyalaw@gmail.com>; Li, Pan2 <pan2.li@intel.com>
Subject: Re: Re: [PATCH V2] RISC-V: Add RVV FMA auto-vectorization support
This is existing bug in GCC 13. I think I should split into 2 patches.
Pushed the bug-fix part to the gcc 13 branch.
On Mon, May 29, 2023 at 12:52 PM Li, Pan2 via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Committed with 2 patches, thanks Kito.
>
> Pan
It looks like this may be unnecessary, because release/gcc-13 has the code as is:
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 15f66efaa48..cd696da5d89 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -388,7 +388,7 @@ (define_attr "avl_type" ""
(symbol_ref "INTVAL (operands[7])"))
(eq_attr "type" "vldux,vldox,vialu,vshift,viminmax,vimul,vidiv,vsalu,\
- viwalu,viwmul,vnshift,vimuladd,vaalu,vsmul,vsshift,\
+ viwalu,viwmul,vnshift,vaalu,vsmul,vsshift,\
vnclip,vicmp,vfalu,vfmul,vfminmax,vfdiv,vfwalu,vfwmul,\
vfsgnj,vfcmp,vfmuladd,vslideup,vslidedown,vislide1up,\
vislide1down,vfslide1up,vfslide1down,vgather,viwmuladd,vfwmuladd,\
Pan
-----Original Message-----
From: Kito Cheng <kito.cheng@gmail.com>
Sent: Monday, May 29, 2023 6:35 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: juzhe.zhong@rivai.ai; gcc-patches <gcc-patches@gcc.gnu.org>; Kito.cheng <kito.cheng@sifive.com>; palmer <palmer@rivosinc.com>; Robin Dapp <rdapp.gcc@gmail.com>; jeffreyalaw <jeffreyalaw@gmail.com>
Subject: Re: Re: [PATCH V2] RISC-V: Add RVV FMA auto-vectorization support
pushed the bug fixed part to gcc 13 branch
On Mon, May 29, 2023 at 12:52 PM Li, Pan2 via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
> Committed with 2 patches, thanks Kito.
>
> Pan
>
> From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
> Sent: Monday, May 29, 2023 11:19 AM
> To: kito.cheng <kito.cheng@gmail.com>
> Cc: gcc-patches <gcc-patches@gcc.gnu.org>; Kito.cheng
> <kito.cheng@sifive.com>; palmer <palmer@rivosinc.com>; Robin Dapp
> <rdapp.gcc@gmail.com>; jeffreyalaw <jeffreyalaw@gmail.com>; Li, Pan2
> <pan2.li@intel.com>
> Subject: Re: Re: [PATCH V2] RISC-V: Add RVV FMA auto-vectorization
> support
>
> This is existing bug in GCC 13. I think I should split into 2 patches.
>
> ________________________________
> juzhe.zhong@rivai.ai<mailto:juzhe.zhong@rivai.ai>
>
> From: Kito Cheng<mailto:kito.cheng@gmail.com>
> Date: 2023-05-29 11:17
> To: juzhe.zhong<mailto:juzhe.zhong@rivai.ai>
> CC: gcc-patches<mailto:gcc-patches@gcc.gnu.org>;
> kito.cheng<mailto:kito.cheng@sifive.com>;
> palmer<mailto:palmer@rivosinc.com>;
> rdapp.gcc<mailto:rdapp.gcc@gmail.com>;
> jeffreyalaw<mailto:jeffreyalaw@gmail.com>;
> pan2.li<mailto:pan2.li@intel.com>
> Subject: Re: [PATCH V2] RISC-V: Add RVV FMA auto-vectorization support
> LGTM, but with one question.
>
> On Fri, May 26, 2023 at 7:36 PM <juzhe.zhong@rivai.ai<mailto:juzhe.zhong@rivai.ai>> wrote:
> >
> > From: Juzhe-Zhong
> > <juzhe.zhong@rivai.ai<mailto:juzhe.zhong@rivai.ai>>
> >
> > This patch support FMA auto-vectorization pattern.
> > 1. Let's RA decide vmacc or vmadd.
> > 2. Fix bug of vector.md which generate incorrect information to VSETVL
> > PASS when testing ternop-3.c.
>
> Does this bug also appear in GCC 13? or this is new bug introduced at
> trunk
>
@@ -373,3 +373,68 @@
DONE;
}
)
+
+;; =========================================================================
+;; == Ternary arithmetic
+;; =========================================================================
+
+;; -------------------------------------------------------------------------
+;; ---- [INT] VMACC and VMADD
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vmacc
+;; - vmadd
+;; -------------------------------------------------------------------------
+
+;; We can't expand FMA for the following reasons:
+;; 1. Before RA, we don't know which multiply-add instruction is the ideal one.
+;; The vmacc is the ideal instruction when operands[3] overlaps operands[0].
+;; The vmadd is the ideal instruction when operands[1|2] overlaps operands[0].
+;; 2. According to vector.md, the multiply-add patterns has 'merge' operand which
+;; is the operands[5]. Since operands[5] should overlap operands[0], this operand
+;; should be allocated the same regno as operands[1|2|3].
+;; 3. The 'merge' operand is always a real merge operand and we don't allow undefined
+;; operand.
+;; 4. The operation of FMA pattern needs VLMAX vsetlvi which needs a VL operand.
+;;
+;; In this situation, we design the codegen of FMA as follows:
+;; 1. clobber a scratch in the expand pattern of FMA.
+;; 2. Let's RA decide which input operand (operands[1|2|3]) overlap operands[0].
+;; 3. Generate instructions (vmacc or vmadd) according to the register allocation
+;; result after reload_completed.
+(define_expand "fma<mode>4"
+ [(parallel
+ [(set (match_operand:VI 0 "register_operand" "=vr")
+ (plus:VI
+ (mult:VI
+ (match_operand:VI 1 "register_operand" " vr")
+ (match_operand:VI 2 "register_operand" " vr"))
+ (match_operand:VI 3 "register_operand" " vr")))
+ (clobber (match_scratch:SI 4))])]
+ "TARGET_VECTOR"
+ {})
+
+(define_insn_and_split "*fma<mode>"
+ [(set (match_operand:VI 0 "register_operand" "=vr, vr, ?&vr")
+ (plus:VI
+ (mult:VI
+ (match_operand:VI 1 "register_operand" " %0, vr, vr")
+ (match_operand:VI 2 "register_operand" " vr, vr, vr"))
+ (match_operand:VI 3 "register_operand" " vr, 0, vr")))
+ (clobber (match_scratch:SI 4 "=r,r,r"))]
+ "TARGET_VECTOR"
+ "#"
+ "&& reload_completed"
+ [(const_int 0)]
+ {
+ PUT_MODE (operands[4], Pmode);
+ riscv_vector::emit_vlmax_vsetvl (<MODE>mode, operands[4]);
+ if (which_alternative == 2)
+ emit_insn (gen_rtx_SET (operands[0], operands[3]));
+ rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[0]};
+ riscv_vector::emit_vlmax_ternary_insn (code_for_pred_mul_plus (<MODE>mode),
+ riscv_vector::RVV_TERNOP, ops, operands[4]);
+ DONE;
+ }
+ [(set_attr "type" "vimuladd")
+ (set_attr "mode" "<MODE>")])
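The split above relies on the different register-tying behavior of vmacc.vv and vmadd.vv. The following C sketch is an illustration only, not part of the patch: it models the element-wise semantics of both instructions (as defined in the RVV 1.0 spec) and shows that each of the three `*fma<mode>` alternatives computes `op1 * op2 + op3`. The `model_*` functions and `NELEMS` are hypothetical names; arrays stand in for vector registers, and copying an input into `dest` models either a register tie (alternatives 0 and 1) or the extra move emitted for alternative 2.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical element count; the real VL is VLMAX for the mode.  */
#define NELEMS 4

/* vmacc.vv vd, vs1, vs2: vd[i] = vs1[i] * vs2[i] + vd[i].
   The addend lives in the destination register.  */
static void
model_vmacc_vv (int32_t *vd, const int32_t *vs1, const int32_t *vs2)
{
  for (int i = 0; i < NELEMS; i++)
    vd[i] = vs1[i] * vs2[i] + vd[i];
}

/* vmadd.vv vd, vs1, vs2: vd[i] = vs1[i] * vd[i] + vs2[i].
   One multiplicand lives in the destination register.  */
static void
model_vmadd_vv (int32_t *vd, const int32_t *vs1, const int32_t *vs2)
{
  for (int i = 0; i < NELEMS; i++)
    vd[i] = vs1[i] * vd[i] + vs2[i];
}

/* Compute dest = op1 * op2 + op3 the way alternative ALT of *fma<mode>
   would after RA.  Returns the number of extra vector moves needed.  */
int
model_fma_alternative (int alt, int32_t *dest, const int32_t *op1,
                       const int32_t *op2, const int32_t *op3)
{
  switch (alt)
    {
    case 0:
      /* op1 (or op2, via the '%' commutative constraint) shares dest's
         register, so dest already holds it: vmadd.vv dest, op2, op3.  */
      memcpy (dest, op1, NELEMS * sizeof *dest);
      model_vmadd_vv (dest, op2, op3);
      return 0;
    case 1:
      /* op3 shares dest's register: vmacc.vv dest, op1, op2.  */
      memcpy (dest, op3, NELEMS * sizeof *dest);
      model_vmacc_vv (dest, op1, op2);
      return 0;
    default:
      /* Nothing is tied: move op3 into dest, then vmacc.vv.  */
      assert (alt == 2);
      memcpy (dest, op3, NELEMS * sizeof *dest); /* The extra vmv.  */
      model_vmacc_vv (dest, op1, op2);
      return 1;
    }
}
```

Only alternative 2 costs an additional move, which is why it is marked with `?` in the constraint string and why ternop-3.c, where no input dies at the first multiply-add, expects `vmv` instructions.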
@@ -140,6 +140,7 @@ enum insn_type
RVV_MERGE_OP = 4,
RVV_CMP_OP = 4,
RVV_CMP_MU_OP = RVV_CMP_OP + 2, /* +2 means mask and maskoff operand. */
+ RVV_TERNOP = 5,
};
enum vlmul_type
{
@@ -176,6 +177,7 @@ bool legitimize_move (rtx, rtx);
void emit_vlmax_vsetvl (machine_mode, rtx);
void emit_hard_vlmax_vsetvl (machine_mode, rtx);
void emit_vlmax_insn (unsigned, int, rtx *, rtx = 0);
+void emit_vlmax_ternary_insn (unsigned, int, rtx *, rtx = 0);
void emit_nonvlmax_insn (unsigned, int, rtx *, rtx);
void emit_vlmax_merge_insn (unsigned, int, rtx *);
void emit_vlmax_cmp_insn (unsigned, rtx *);
@@ -362,6 +362,26 @@ emit_vlmax_insn (unsigned icode, int op_num, rtx *ops, rtx vl)
e.emit_insn ((enum insn_code) icode, ops);
}
+/* This function emits a {VLMAX, TAIL_ANY, MASK_ANY} vsetvli followed by the
+ * ternary operation, which always has a real merge operand. */
+void
+emit_vlmax_ternary_insn (unsigned icode, int op_num, rtx *ops, rtx vl)
+{
+ machine_mode dest_mode = GET_MODE (ops[0]);
+ machine_mode mask_mode = get_mask_mode (dest_mode).require ();
+ /* We have a maximum of 11 operands for RVV instruction patterns according to
+ * vector.md. */
+ insn_expander<11> e (/*OP_NUM*/ op_num, /*HAS_DEST_P*/ true,
+ /*FULLY_UNMASKED_P*/ true,
+ /*USE_REAL_MERGE_P*/ true, /*HAS_AVL_P*/ true,
+ /*VLMAX_P*/ true,
+ /*DEST_MODE*/ dest_mode, /*MASK_MODE*/ mask_mode);
+ e.set_policy (TAIL_ANY);
+ e.set_policy (MASK_ANY);
+ e.set_vl (vl);
+ e.emit_insn ((enum insn_code) icode, ops);
+}
+
/* This function emits a {NONVLMAX, TAIL_ANY, MASK_ANY} vsetvli followed by the
* actual operation. */
void
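The {VLMAX, TAIL_ANY, MASK_ANY} vsetvli emitted by `emit_vlmax_ternary_insn` sets vl to VLMAX, which the RVV 1.0 spec defines as LMUL × VLEN / SEW. A minimal C sketch of that formula follows; `struct lmul` and `vlmax` are hypothetical names, and since VLEN is a hardware parameter, the values in the usage below are illustrative assumptions.

```c
#include <assert.h>

/* LMUL as a fraction: m1 = 1/1, m2 = 2/1, mf2 = 1/2, ...  */
struct lmul
{
  unsigned num, den;
};

/* VLMAX = LMUL * VLEN / SEW, with VLEN and SEW in bits.  */
unsigned
vlmax (unsigned vlen, unsigned sew, struct lmul lmul)
{
  assert (sew != 0 && lmul.den != 0);
  return vlen * lmul.num / (sew * lmul.den);
}
```

For example, with an assumed VLEN of 128, an SImode vector at LMUL = 1 (`e32, m1`) gives `vlmax (128, 32, (struct lmul) {1, 1})`, that is, 4 elements.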
@@ -388,7 +388,7 @@
(symbol_ref "INTVAL (operands[7])"))
(eq_attr "type" "vldux,vldox,vialu,vshift,viminmax,vimul,vidiv,vsalu,\
- viwalu,viwmul,vnshift,vimuladd,vaalu,vsmul,vsshift,\
+ viwalu,viwmul,vnshift,vaalu,vsmul,vsshift,\
vnclip,vicmp,vfalu,vfmul,vfminmax,vfdiv,vfwalu,vfwmul,\
vfsgnj,vfcmp,vfmuladd,vslideup,vslidedown,vislide1up,\
vislide1down,vfslide1up,vfslide1down,vgather,viwmuladd,vfwmuladd,\
new file mode 100644
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE) \
+ __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst, \
+ TYPE *__restrict a, \
+ TYPE *__restrict b, int n) \
+ { \
+ for (int i = 0; i < n; i++) \
+ dst[i] += a[i] * b[i]; \
+ }
+
+#define TEST_ALL() \
+ TEST_TYPE (int8_t) \
+ TEST_TYPE (uint8_t) \
+ TEST_TYPE (int16_t) \
+ TEST_TYPE (uint16_t) \
+ TEST_TYPE (int32_t) \
+ TEST_TYPE (uint32_t) \
+ TEST_TYPE (int64_t) \
+ TEST_TYPE (uint64_t)
+
+TEST_ALL ()
+
+/* { dg-final { scan-assembler-times {\tvmadd\.vv} 8 } } */
+/* { dg-final { scan-assembler-not {\tvmv} } } */
new file mode 100644
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE) \
+ __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dest1, \
+ TYPE *__restrict dest2, \
+ TYPE *__restrict dest3, \
+ TYPE *__restrict src1, \
+ TYPE *__restrict src2, int n) \
+ { \
+ for (int i = 0; i < n; ++i) \
+ { \
+ dest1[i] += src1[i] * src2[i]; \
+ dest2[i] += src1[i] * dest1[i]; \
+ dest3[i] += src2[i] * dest2[i]; \
+ } \
+ }
+
+#define TEST_ALL() \
+ TEST_TYPE (int8_t) \
+ TEST_TYPE (uint8_t) \
+ TEST_TYPE (int16_t) \
+ TEST_TYPE (uint16_t) \
+ TEST_TYPE (int32_t) \
+ TEST_TYPE (uint32_t) \
+ TEST_TYPE (int64_t) \
+ TEST_TYPE (uint64_t)
+
+TEST_ALL ()
+
+/* { dg-final { scan-assembler-times {\tvmacc\.vv} 8 } } */
+/* { dg-final { scan-assembler-not {\tvmv} } } */
new file mode 100644
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE) \
+ __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dest1, \
+ TYPE *__restrict dest2, \
+ TYPE *__restrict dest3, \
+ TYPE *__restrict src1, \
+ TYPE *__restrict src2, int n) \
+ { \
+ for (int i = 0; i < n; ++i) \
+ { \
+ dest1[i] = src1[i] * src2[i] + dest2[i]; \
+ dest2[i] += src1[i] * dest1[i]; \
+ dest3[i] += src2[i] * dest2[i]; \
+ } \
+ }
+
+#define TEST_ALL() \
+ TEST_TYPE (int8_t) \
+ TEST_TYPE (uint8_t) \
+ TEST_TYPE (int16_t) \
+ TEST_TYPE (uint16_t) \
+ TEST_TYPE (int32_t) \
+ TEST_TYPE (uint32_t) \
+ TEST_TYPE (int64_t) \
+ TEST_TYPE (uint64_t)
+
+TEST_ALL ()
+
+/* { dg-final { scan-assembler-times {\tvmv} 8 } } */
new file mode 100644
@@ -0,0 +1,84 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable" } */
+
+#include "ternop-1.c"
+
+#define TEST_LOOP(TYPE, NUM) \
+ { \
+ TYPE array1_##NUM[NUM] = {}; \
+ TYPE array2_##NUM[NUM] = {}; \
+ TYPE array3_##NUM[NUM] = {}; \
+ TYPE array4_##NUM[NUM] = {}; \
+ for (int i = 0; i < NUM; ++i) \
+ { \
+ array1_##NUM[i] = (i & 1) + 5; \
+ array2_##NUM[i] = i - NUM / 3; \
+ array3_##NUM[i] = NUM - NUM / 3 - i; \
+ array4_##NUM[i] = NUM - NUM / 3 - i; \
+ asm volatile("" ::: "memory"); \
+ } \
+ ternop_##TYPE (array3_##NUM, array1_##NUM, array2_##NUM, NUM); \
+ for (int i = 0; i < NUM; i++) \
+ if (array3_##NUM[i] \
+ != (TYPE) (array1_##NUM[i] * array2_##NUM[i] + array4_##NUM[i])) \
+ __builtin_abort (); \
+ }
+
+int __attribute__ ((optimize (0))) main ()
+{
+ TEST_LOOP (int8_t, 7)
+ TEST_LOOP (uint8_t, 7)
+ TEST_LOOP (int16_t, 7)
+ TEST_LOOP (uint16_t, 7)
+ TEST_LOOP (int32_t, 7)
+ TEST_LOOP (uint32_t, 7)
+ TEST_LOOP (int64_t, 7)
+ TEST_LOOP (uint64_t, 7)
+
+ TEST_LOOP (int8_t, 16)
+ TEST_LOOP (uint8_t, 16)
+ TEST_LOOP (int16_t, 16)
+ TEST_LOOP (uint16_t, 16)
+ TEST_LOOP (int32_t, 16)
+ TEST_LOOP (uint32_t, 16)
+ TEST_LOOP (int64_t, 16)
+ TEST_LOOP (uint64_t, 16)
+
+ TEST_LOOP (int8_t, 77)
+ TEST_LOOP (uint8_t, 77)
+ TEST_LOOP (int16_t, 77)
+ TEST_LOOP (uint16_t, 77)
+ TEST_LOOP (int32_t, 77)
+ TEST_LOOP (uint32_t, 77)
+ TEST_LOOP (int64_t, 77)
+ TEST_LOOP (uint64_t, 77)
+
+ TEST_LOOP (int8_t, 128)
+ TEST_LOOP (uint8_t, 128)
+ TEST_LOOP (int16_t, 128)
+ TEST_LOOP (uint16_t, 128)
+ TEST_LOOP (int32_t, 128)
+ TEST_LOOP (uint32_t, 128)
+ TEST_LOOP (int64_t, 128)
+ TEST_LOOP (uint64_t, 128)
+
+ TEST_LOOP (int8_t, 15641)
+ TEST_LOOP (uint8_t, 15641)
+ TEST_LOOP (int16_t, 15641)
+ TEST_LOOP (uint16_t, 15641)
+ TEST_LOOP (int32_t, 15641)
+ TEST_LOOP (uint32_t, 15641)
+ TEST_LOOP (int64_t, 15641)
+ TEST_LOOP (uint64_t, 15641)
+
+ TEST_LOOP (int8_t, 795)
+ TEST_LOOP (uint8_t, 795)
+ TEST_LOOP (int16_t, 795)
+ TEST_LOOP (uint16_t, 795)
+ TEST_LOOP (int32_t, 795)
+ TEST_LOOP (uint32_t, 795)
+ TEST_LOOP (int64_t, 795)
+ TEST_LOOP (uint64_t, 795)
+
+ return 0;
+}
new file mode 100644
@@ -0,0 +1,104 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable" } */
+
+#include "ternop-2.c"
+
+#define TEST_LOOP(TYPE, NUM) \
+ { \
+ TYPE array1_##NUM[NUM] = {}; \
+ TYPE array2_##NUM[NUM] = {}; \
+ TYPE array3_##NUM[NUM] = {}; \
+ TYPE array4_##NUM[NUM] = {}; \
+ TYPE array5_##NUM[NUM] = {}; \
+ TYPE array6_##NUM[NUM] = {}; \
+ TYPE array7_##NUM[NUM] = {}; \
+ TYPE array8_##NUM[NUM] = {}; \
+ for (int i = 0; i < NUM; ++i) \
+ { \
+ array1_##NUM[i] = (i & 1) + 5; \
+ array2_##NUM[i] = i - NUM / 3; \
+ array3_##NUM[i] = NUM - NUM / 3 - i; \
+ array6_##NUM[i] = NUM - NUM / 3 - i; \
+ array4_##NUM[i] = NUM - NUM / 2 + i; \
+ array7_##NUM[i] = NUM - NUM / 2 + i; \
+ array5_##NUM[i] = NUM + i * 7; \
+ array8_##NUM[i] = NUM + i * 7; \
+ asm volatile("" ::: "memory"); \
+ } \
+ ternop_##TYPE (array3_##NUM, array4_##NUM, array5_##NUM, array1_##NUM, \
+ array2_##NUM, NUM); \
+ for (int i = 0; i < NUM; i++) \
+ { \
+ array6_##NUM[i] \
+ = (TYPE) (array1_##NUM[i] * array2_##NUM[i] + array6_##NUM[i]); \
+ if (array3_##NUM[i] != array6_##NUM[i]) \
+ __builtin_abort (); \
+ array7_##NUM[i] \
+ = (TYPE) (array1_##NUM[i] * array6_##NUM[i] + array7_##NUM[i]); \
+ if (array4_##NUM[i] != array7_##NUM[i]) \
+ __builtin_abort (); \
+ array8_##NUM[i] \
+ = (TYPE) (array2_##NUM[i] * array7_##NUM[i] + array8_##NUM[i]); \
+ if (array5_##NUM[i] != array8_##NUM[i]) \
+ __builtin_abort (); \
+ } \
+ }
+
+int __attribute__ ((optimize (0))) main ()
+{
+ TEST_LOOP (int8_t, 7)
+ TEST_LOOP (uint8_t, 7)
+ TEST_LOOP (int16_t, 7)
+ TEST_LOOP (uint16_t, 7)
+ TEST_LOOP (int32_t, 7)
+ TEST_LOOP (uint32_t, 7)
+ TEST_LOOP (int64_t, 7)
+ TEST_LOOP (uint64_t, 7)
+
+ TEST_LOOP (int8_t, 16)
+ TEST_LOOP (uint8_t, 16)
+ TEST_LOOP (int16_t, 16)
+ TEST_LOOP (uint16_t, 16)
+ TEST_LOOP (int32_t, 16)
+ TEST_LOOP (uint32_t, 16)
+ TEST_LOOP (int64_t, 16)
+ TEST_LOOP (uint64_t, 16)
+
+ TEST_LOOP (int8_t, 77)
+ TEST_LOOP (uint8_t, 77)
+ TEST_LOOP (int16_t, 77)
+ TEST_LOOP (uint16_t, 77)
+ TEST_LOOP (int32_t, 77)
+ TEST_LOOP (uint32_t, 77)
+ TEST_LOOP (int64_t, 77)
+ TEST_LOOP (uint64_t, 77)
+
+ TEST_LOOP (int8_t, 128)
+ TEST_LOOP (uint8_t, 128)
+ TEST_LOOP (int16_t, 128)
+ TEST_LOOP (uint16_t, 128)
+ TEST_LOOP (int32_t, 128)
+ TEST_LOOP (uint32_t, 128)
+ TEST_LOOP (int64_t, 128)
+ TEST_LOOP (uint64_t, 128)
+
+ TEST_LOOP (int8_t, 15641)
+ TEST_LOOP (uint8_t, 15641)
+ TEST_LOOP (int16_t, 15641)
+ TEST_LOOP (uint16_t, 15641)
+ TEST_LOOP (int32_t, 15641)
+ TEST_LOOP (uint32_t, 15641)
+ TEST_LOOP (int64_t, 15641)
+ TEST_LOOP (uint64_t, 15641)
+
+ TEST_LOOP (int8_t, 795)
+ TEST_LOOP (uint8_t, 795)
+ TEST_LOOP (int16_t, 795)
+ TEST_LOOP (uint16_t, 795)
+ TEST_LOOP (int32_t, 795)
+ TEST_LOOP (uint32_t, 795)
+ TEST_LOOP (int64_t, 795)
+ TEST_LOOP (uint64_t, 795)
+
+ return 0;
+}
new file mode 100644
@@ -0,0 +1,104 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable" } */
+
+#include "ternop-3.c"
+
+#define TEST_LOOP(TYPE, NUM) \
+ { \
+ TYPE array1_##NUM[NUM] = {}; \
+ TYPE array2_##NUM[NUM] = {}; \
+ TYPE array3_##NUM[NUM] = {}; \
+ TYPE array4_##NUM[NUM] = {}; \
+ TYPE array5_##NUM[NUM] = {}; \
+ TYPE array6_##NUM[NUM] = {}; \
+ TYPE array7_##NUM[NUM] = {}; \
+ TYPE array8_##NUM[NUM] = {}; \
+ for (int i = 0; i < NUM; ++i) \
+ { \
+ array1_##NUM[i] = (i & 1) + 5; \
+ array2_##NUM[i] = i - NUM / 3; \
+ array3_##NUM[i] = NUM - NUM / 3 - i; \
+ array6_##NUM[i] = NUM - NUM / 3 - i; \
+ array4_##NUM[i] = NUM - NUM / 2 + i; \
+ array7_##NUM[i] = NUM - NUM / 2 + i; \
+ array5_##NUM[i] = NUM + i * 7; \
+ array8_##NUM[i] = NUM + i * 7; \
+ asm volatile("" ::: "memory"); \
+ } \
+ ternop_##TYPE (array3_##NUM, array4_##NUM, array5_##NUM, array1_##NUM, \
+ array2_##NUM, NUM); \
+ for (int i = 0; i < NUM; i++) \
+ { \
+ array6_##NUM[i] \
+ = (TYPE) (array1_##NUM[i] * array2_##NUM[i] + array7_##NUM[i]); \
+ if (array3_##NUM[i] != array6_##NUM[i]) \
+ __builtin_abort (); \
+ array7_##NUM[i] \
+ = (TYPE) (array1_##NUM[i] * array6_##NUM[i] + array7_##NUM[i]); \
+ if (array4_##NUM[i] != array7_##NUM[i]) \
+ __builtin_abort (); \
+ array8_##NUM[i] \
+ = (TYPE) (array2_##NUM[i] * array7_##NUM[i] + array8_##NUM[i]); \
+ if (array5_##NUM[i] != array8_##NUM[i]) \
+ __builtin_abort (); \
+ } \
+ }
+
+int __attribute__ ((optimize (0))) main ()
+{
+ TEST_LOOP (int8_t, 7)
+ TEST_LOOP (uint8_t, 7)
+ TEST_LOOP (int16_t, 7)
+ TEST_LOOP (uint16_t, 7)
+ TEST_LOOP (int32_t, 7)
+ TEST_LOOP (uint32_t, 7)
+ TEST_LOOP (int64_t, 7)
+ TEST_LOOP (uint64_t, 7)
+
+ TEST_LOOP (int8_t, 16)
+ TEST_LOOP (uint8_t, 16)
+ TEST_LOOP (int16_t, 16)
+ TEST_LOOP (uint16_t, 16)
+ TEST_LOOP (int32_t, 16)
+ TEST_LOOP (uint32_t, 16)
+ TEST_LOOP (int64_t, 16)
+ TEST_LOOP (uint64_t, 16)
+
+ TEST_LOOP (int8_t, 77)
+ TEST_LOOP (uint8_t, 77)
+ TEST_LOOP (int16_t, 77)
+ TEST_LOOP (uint16_t, 77)
+ TEST_LOOP (int32_t, 77)
+ TEST_LOOP (uint32_t, 77)
+ TEST_LOOP (int64_t, 77)
+ TEST_LOOP (uint64_t, 77)
+
+ TEST_LOOP (int8_t, 128)
+ TEST_LOOP (uint8_t, 128)
+ TEST_LOOP (int16_t, 128)
+ TEST_LOOP (uint16_t, 128)
+ TEST_LOOP (int32_t, 128)
+ TEST_LOOP (uint32_t, 128)
+ TEST_LOOP (int64_t, 128)
+ TEST_LOOP (uint64_t, 128)
+
+ TEST_LOOP (int8_t, 15641)
+ TEST_LOOP (uint8_t, 15641)
+ TEST_LOOP (int16_t, 15641)
+ TEST_LOOP (uint16_t, 15641)
+ TEST_LOOP (int32_t, 15641)
+ TEST_LOOP (uint32_t, 15641)
+ TEST_LOOP (int64_t, 15641)
+ TEST_LOOP (uint64_t, 15641)
+
+ TEST_LOOP (int8_t, 795)
+ TEST_LOOP (uint8_t, 795)
+ TEST_LOOP (int16_t, 795)
+ TEST_LOOP (uint16_t, 795)
+ TEST_LOOP (int32_t, 795)
+ TEST_LOOP (uint32_t, 795)
+ TEST_LOOP (int64_t, 795)
+ TEST_LOOP (uint64_t, 795)
+
+ return 0;
+}
@@ -65,6 +65,8 @@ foreach op $AUTOVEC_TEST_OPTS {
"" "$op"
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/cmp/*.\[cS\]]] \
"" "$op"
+ dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/ternop/*.\[cS\]]] \
+ "" "$op"
}
# VLS-VLMAX tests