[v1] RISC-V: Support ceil and ceilf auto-vectorization
Commit Message
From: Pan Li <pan2.li@intel.com>
This patch adds auto-vectorization support for both ceil and ceilf
from math.h. It depends on the -ffast-math option.
When ceil/ceilf is called, as in v2 = ceil (v1), we
convert it into the insns below (following the LLVM implementation).
* vfcvt.x.f v3, v1, RUP
* vfcvt.f.x v2, v3
The conditional auto-vectorization for ceil/ceilf is also supported
and covered by test cases.
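As a rough scalar illustration of the two-insn sequence above (not part of the patch), the C sketch below models vfcvt.x.f with RUP as a float-to-int conversion that rounds toward +infinity, followed by the int-to-float conversion. The helper name ceil_via_int is hypothetical, and the model only holds for finite inputs that fit in int32_t; NaN, infinities, large values and the sign of zero are ignored, which is presumably why the pattern relies on -ffast-math.

```c
#include <stdint.h>

/* Scalar model of:
     vfcvt.x.f v3, v1   (rounding mode RUP)
     vfcvt.f.x v2, v3
   Hypothetical helper; assumes x is finite and fits in int32_t.  */
float
ceil_via_int (float x)
{
  int32_t t = (int32_t) x;   /* A C cast truncates toward zero.  */
  if ((float) t < x)         /* Truncation lowered the value (positive, non-integer x),  */
    t += 1;                  /* so bump it as RUP would.  */
  return (float) t;          /* Exact, because t came from a float-to-int conversion.  */
}
```

With this model, ceil_via_int (1.5f) gives 2.0f and ceil_via_int (-1.5f) gives -1.0f, while ceil_via_int (-0.5f) gives +0.0f where ceilf would return -0.0f.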
Before this patch:
math-ceil-1.c:21:1: missed: couldn't vectorize loop
...
.L3:
flw fa0,0(s0)
addi s0,s0,4
addi s1,s1,4
call ceilf
fsw fa0,-4(s1)
bne s0,s2,.L3
After this patch:
...
fsrmi 3
.L4:
vsetvli a5,a2,e32,m1,ta,ma
vle32.v v1,0(a1)
vsetvli a3,zero,e32,m1,ta,ma
slli a4,a5,2
vfcvt.x.f.v v1,v1
sub a2,a2,a5
vfcvt.f.x.v v1,v1
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v1,0(a0)
add a1,a1,a4
add a0,a0,a4
bne a2,zero,.L4
.L14:
fsrm a6
ret
Please note that VLS mode is not involved in this patch and will be taken
care of in follow-up patches soon.
gcc/ChangeLog:
* config/riscv/autovec.md (ceil<mode>2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum value.
(enum insn_type): Ditto.
* config/riscv/riscv-v.cc: Handle rounding up.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/math-ceil-1.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-2.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-3.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-4.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/test-math.h: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
---
gcc/config/riscv/autovec.md | 30 +++++++++++++
gcc/config/riscv/riscv-protos.h | 4 ++
gcc/config/riscv/riscv-v.cc | 2 +
.../riscv/rvv/autovec/math-ceil-1.c | 21 +++++++++
.../riscv/rvv/autovec/math-ceil-2.c | 21 +++++++++
.../riscv/rvv/autovec/math-ceil-3.c | 24 ++++++++++
.../riscv/rvv/autovec/math-ceil-4.c | 24 ++++++++++
.../riscv/rvv/autovec/math-ceil-run-1.c | 24 ++++++++++
.../riscv/rvv/autovec/math-ceil-run-2.c | 24 ++++++++++
.../gcc.target/riscv/rvv/autovec/test-math.h | 45 +++++++++++++++++++
10 files changed, 219 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h
Comments
+;; -------------------------------------------------------------------------
+;; ---- [FP] Math.h.
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - ceil/ceilf
+;; -------------------------------------------------------------------------
+(define_expand "ceil<mode>2"
+ [(match_operand:VF 0 "register_operand")
+ (match_operand:VF 1 "register_operand")]
+ "TARGET_VECTOR"
+ {
+ rtx tmp = gen_reg_rtx (<VCONVERT>mode);
+ rtx ops_1[] = {tmp, operands[1]};
+ insn_code icode = code_for_pred_fcvt_x_f (UNSPEC_VFCVT, <MODE>mode);
+
+ /* vfcvt.x.f with rounding up (aka ceil). */
+ riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP_FRM_RUP, ops_1);
+
+ rtx ops_2[] = {operands[0], tmp};
+ icode = code_for_pred (FLOAT, <MODE>mode);
+
+ /* vfcvt.f.x for the final result. To avoid unnecessary frm register
+ access, we use RUP here and it will never do the rounding up because
+ the tmp rtx comes from the float to int conversion. */
+ riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP_FRM_RUP, ops_2);
+
+ DONE;
+ }
+)
It should be "V_VLSF" instead of "VF" so that you could also support VLS CEIL.
Besides, I want to see the following case:
a[i] = cond[i] ? CEIL (b[i]): c[i];
Ideally, we should be able to combine vfcvt + vmerge into vfcvt with mask.
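For reference, the requested loop shape can be written in the style of test-math.h as in the sketch below. The macro name TEST_COND_CEIL_ELSE, the generated function name and the stand-in helper ceil_model are hypothetical and not part of the patch; a real test would pass ceilf or ceil as CALL. The only difference from TEST_COND_CEIL is that the else value comes from a separate array.

```c
/* Stand-in for ceilf so that the sketch needs no libm; hypothetical and
   valid only for finite inputs that fit in int.  */
float
ceil_model (float x)
{
  int t = (int) x;
  return (float) ((float) t < x ? t + 1 : t);
}

/* a[i] = cond[i] ? CEIL (b[i]) : c[i]  */
#define TEST_COND_CEIL_ELSE(TYPE, CALL)                                   \
  void test_else_##TYPE##_##CALL (TYPE *a, int *cond, TYPE *b, TYPE *c,   \
                                  unsigned count)                         \
  {                                                                       \
    for (unsigned i = 0; i < count; i++)                                  \
      a[i] = cond[i] ? CALL (b[i]) : c[i];                                \
  }

/* Expands to test_else_float_ceil_model.  */
TEST_COND_CEIL_ELSE (float, ceil_model)
```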
juzhe.zhong@rivai.ai
From: pan2.li
Date: 2023-09-20 10:30
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support ceil and ceilf auto-vectorization
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 493d5745485..ea508d81047 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2374,3 +2374,33 @@ (define_expand "<u>avg<v_double_trunc>3_ceil"
riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops3);
DONE;
})
+
+;; -------------------------------------------------------------------------
+;; ---- [FP] Math.h.
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - ceil/ceilf
+;; -------------------------------------------------------------------------
+(define_expand "ceil<mode>2"
+ [(match_operand:VF 0 "register_operand")
+ (match_operand:VF 1 "register_operand")]
+ "TARGET_VECTOR"
+ {
+ rtx tmp = gen_reg_rtx (<VCONVERT>mode);
+ rtx ops_1[] = {tmp, operands[1]};
+ insn_code icode = code_for_pred_fcvt_x_f (UNSPEC_VFCVT, <MODE>mode);
+
+ /* vfcvt.x.f with rounding up (aka ceil). */
+ riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP_FRM_RUP, ops_1);
+
+ rtx ops_2[] = {operands[0], tmp};
+ icode = code_for_pred (FLOAT, <MODE>mode);
+
+ /* vfcvt.f.x for the final result. To avoid unnecessary frm register
+ access, we use RUP here and it will never do the rounding up because
+ the tmp rtx comes from the float to int conversion. */
+ riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP_FRM_RUP, ops_2);
+
+ DONE;
+ }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5a2d218d67b..833f1efbaf4 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -250,6 +250,9 @@ enum insn_flags : unsigned int
/* flags for the floating-point rounding mode. */
/* Means INSN has FRM operand and the value is FRM_DYN. */
FRM_DYN_P = 1 << 15,
+
+ /* Means INSN has FRM operand and the value is FRM_RUP. */
+ FRM_RUP_P = 1 << 16,
};
enum insn_type : unsigned int
@@ -290,6 +293,7 @@ enum insn_type : unsigned int
UNARY_OP_TAMA = __MASK_OP_TAMA | UNARY_OP_P,
UNARY_OP_TAMU = __MASK_OP_TAMU | UNARY_OP_P,
UNARY_OP_FRM_DYN = UNARY_OP | FRM_DYN_P,
+ UNARY_OP_FRM_RUP = UNARY_OP | FRM_RUP_P,
/* Binary operator. */
BINARY_OP = __NORMAL_OP | BINARY_OP_P,
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index a9287e5d671..4192f988648 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -323,6 +323,8 @@ public:
/* Add rounding mode operand. */
if (m_insn_flags & FRM_DYN_P)
add_rounding_mode_operand (FRM_DYN);
+ if (m_insn_flags & FRM_RUP_P)
+ add_rounding_mode_operand (FRM_RUP);
gcc_assert (insn_data[(int) icode].n_operands == m_opno);
expand (icode, any_mem_p);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c
new file mode 100644
index 00000000000..8f0f09609eb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_float_ceilf:
+** frrm\s+[atx][0-9]+
+** ...
+** fsrmi\s+3
+** ...
+** vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*m1,\s*ta,\s*ma
+** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+
+** ...
+** vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+
+** ...
+** fsrm\s+[atx][0-9]+
+** ...
+*/
+TEST_CEIL(float, ceilf)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c
new file mode 100644
index 00000000000..73395d30d7a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_double_ceil:
+** frrm\s+[atx][0-9]+
+** ...
+** fsrmi\s+3
+** ...
+** vsetvli\s+[atx][0-9]+,\s*zero,\s*e64,\s*m1,\s*ta,\s*ma
+** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+
+** ...
+** vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+
+** ...
+** fsrm\s+[atx][0-9]+
+** ...
+*/
+TEST_CEIL(double, ceil)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c
new file mode 100644
index 00000000000..eb0f3a3db78
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_float_ceilf:
+** frrm\s+[atx][0-9]+
+** ...
+** fsrmi\s+3
+** ...
+** vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*m1,\s*ta,\s*ma
+** ...
+** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+
+** ...
+** vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+
+** ...
+** vmerge\.vvm\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+,\s*v0
+** ...
+** fsrm\s+[atx][0-9]+
+** ...
+*/
+TEST_COND_CEIL(float, ceilf)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-4.c
new file mode 100644
index 00000000000..b9a3c8ebf84
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-4.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_double_ceil:
+** frrm\s+[atx][0-9]+
+** ...
+** fsrmi\s+3
+** ...
+** vsetvli\s+[atx][0-9]+,\s*zero,\s*e64,\s*m1,\s*ta,\s*ma
+** ...
+** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+
+** ...
+** vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+
+** ...
+** vmerge\.vvm\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+,\s*v0
+** ...
+** fsrm\s+[atx][0-9]+
+** ...
+*/
+TEST_COND_CEIL(double, ceil)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c
new file mode 100644
index 00000000000..014c4c3ac0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c
@@ -0,0 +1,24 @@
+/* { dg-do run { target { riscv_vector && riscv_zvfh_hw } } } */
+/* { dg-additional-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -lm" } */
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+float in[ARRAY_SIZE];
+float out[ARRAY_SIZE];
+float ref[ARRAY_SIZE];
+
+// Test function declaration
+TEST_CEIL(float, ceilf)
+TEST_INIT(float)
+TEST_ASSERT(float)
+
+int
+main ()
+{
+ test_float_init (in, ref, ARRAY_SIZE);
+ test_float_ceilf (out, in, ARRAY_SIZE);
+ test_float_assert (out, ref, ARRAY_SIZE);
+
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c
new file mode 100644
index 00000000000..ae361e11144
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c
@@ -0,0 +1,24 @@
+/* { dg-do run { target { riscv_vector && riscv_zvfh_hw } } } */
+/* { dg-additional-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -lm" } */
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+double in[ARRAY_SIZE];
+double out[ARRAY_SIZE];
+double ref[ARRAY_SIZE];
+
+// Test function declaration
+TEST_CEIL(double, ceil)
+TEST_INIT(double)
+TEST_ASSERT(double)
+
+int
+main ()
+{
+ test_double_init (in, ref, ARRAY_SIZE);
+ test_double_ceil (out, in, ARRAY_SIZE);
+ test_double_assert (out, ref, ARRAY_SIZE);
+
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h
new file mode 100644
index 00000000000..57dd5e0e460
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h
@@ -0,0 +1,45 @@
+#include <math.h>
+
+#define TEST_CEIL(TYPE, CALL) \
+ void test_##TYPE##_##CALL (TYPE *out, TYPE *in, unsigned count) \
+ { \
+ for (unsigned i = 0; i < count; i++) \
+ out[i] = CALL (in[i]); \
+ }
+
+#define TEST_COND_CEIL(TYPE, CALL) \
+ void test_##TYPE##_##CALL (TYPE *out, int *cond, TYPE *in, unsigned count) \
+ { \
+ for (unsigned i = 0; i < count; i++) \
+ out[i] = cond[i] ? CALL (in[i]) : in[i]; \
+ }
+
+#define TEST_INIT(TYPE) \
+ void test_##TYPE##_init (TYPE *in, TYPE *ref, unsigned size) \
+ { \
+ for (unsigned i = 0; i < size; i++) \
+ { \
+ TYPE tmp = (TYPE)i; \
+ \
+ if (i % 2 == 0) \
+ { \
+ in[i] = 1.5f + (TYPE)i; \
+ ref[i] = (TYPE)(i + 2); \
+ } \
+ else \
+ { \
+ in[i] = (TYPE)i; \
+ ref[i] = (TYPE)i; \
+ } \
+ } \
+ }
+
+#define TEST_ASSERT(TYPE) \
+ void test_##TYPE##_assert (TYPE *out, TYPE *ref, unsigned size) \
+ { \
+ for (unsigned i = 0; i < size; i++) \
+ { \
+ if (out[i] != ref[i]) \
+ __builtin_abort (); \
+ } \
+ }
--
2.34.1
> It should be "V_VLSF" instead of "VF" so that you could also support VLS CEIL.
This is in preparation and will be included in V2 rather than sent as a separate patch.
> a[i] = cond[i] ? CEIL (b[i]): c[i];
Sure
Pan
From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
Sent: Wednesday, September 20, 2023 10:35 AM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Li, Pan2 <pan2.li@intel.com>; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng <kito.cheng@gmail.com>
Subject: Re: [PATCH v1] RISC-V: Support ceil and ceilf auto-vectorization
I have checked LLVM:
https://godbolt.org/z/4jWG5vjMT
Their code sequence appears to be as follows:
vfabs.v
vmflt.vf
vfcvt.x.f.v -> static rounding mode
vfcvt.f.x.v -> dynamic rounding mode
vfsgnj.vv
How come just 2 static vfcvt insns are enough in your case?
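One way to read the LLVM sequence is the scalar C model below. It is a sketch based on a plain reading of the five instructions, not code taken from LLVM or from this patch; the helper names and the 2^23 threshold for float (from which point on every float is already an integer) are assumptions of the sketch. It suggests what the three extra insns buy: lanes holding NaN, infinity or large values are masked off and keep the source value, and the sign of zero is restored at the end.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw bits of a float (hypothetical helper).  */
uint32_t
float_bits (float x)
{
  uint32_t u;
  memcpy (&u, &x, sizeof u);
  return u;
}

/* Scalar model of the masked sequence for one float lane.  */
float
ceil_masked_model (float x)
{
  float ax = x < 0.0f ? -x : x;   /* vfabs.v */
  if (!(ax < 8388608.0f))         /* vmflt.vf: mask is false for |x| >= 2^23 and NaN */
    return x;                     /* inactive lane keeps the source value */

  int32_t t = (int32_t) x;        /* vfcvt.x.f.v with static RUP, modelled as */
  if ((float) t < x)              /* truncation plus a bump when it rounded down */
    t += 1;
  float r = (float) t;            /* vfcvt.f.x.v */

  /* vfsgnj.vv: take the sign bit from x, so -0.5 yields -0.0.  */
  uint32_t bits = (float_bits (r) & 0x7fffffffu)
                  | (float_bits (x) & 0x80000000u);
  memcpy (&r, &bits, sizeof r);
  return r;
}
```

In this model, the two-insn form and the five-insn form agree on ordinary values such as 1.5f, and differ only on the masked lanes and on inputs in (-1, 0), where the sign of the zero result matters.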
juzhe.zhong@rivai.ai
I just checked the LLVM implementation.
This is their code for rounding auto-vectorization.
They handle CEIL/FLOOR/FROUND/FROUNDEVEN/FROUNDTOZERO in the same way:
  switch (Op.getOpcode()) {
  default:
    llvm_unreachable("Unexpected opcode");
  case ISD::FCEIL:
  case ISD::VP_FCEIL:
  case ISD::FFLOOR:
  case ISD::VP_FFLOOR:
  case ISD::FROUND:
  case ISD::FROUNDEVEN:
  case ISD::VP_FROUND:
  case ISD::VP_FROUNDEVEN:
  case ISD::VP_FROUNDTOZERO: {
    RISCVFPRndMode::RoundingMode FRM = matchRoundingOp(Op.getOpcode());
    assert(FRM != RISCVFPRndMode::Invalid);
    Truncated = DAG.getNode(RISCVISD::VFCVT_RM_X_F_VL, DL, IntVT, Src, Mask,
                            DAG.getTargetConstant(FRM, DL, XLenVT), VL);
    break;
  }
  case ISD::FTRUNC:
    Truncated = DAG.getNode(RISCVISD::VFCVT_RTZ_X_F_VL, DL, IntVT, Src,
                            Mask, VL);
    break;
  case ISD::VP_FRINT:
    Truncated = DAG.getNode(RISCVISD::VFCVT_X_F_VL, DL, IntVT, Src, Mask, VL);
    break;
  case ISD::VP_FNEARBYINT:
    Truncated = DAG.getNode(RISCVISD::VFROUND_NOEXCEPT_VL, DL, ContainerVT, Src,
                            Mask, VL);
    break;
  }
  // VFROUND_NOEXCEPT_VL includes SINT_TO_FP_VL.
  if (Op.getOpcode() != ISD::VP_FNEARBYINT)
    Truncated = DAG.getNode(RISCVISD::SINT_TO_FP_VL, DL, ContainerVT, Truncated,
                            Mask, VL);
  // Restore the original sign so that -0.0 is preserved.
  Truncated = DAG.getNode(RISCVISD::FCOPYSIGN_VL, DL, ContainerVT, Truncated,
                          Src, Src, Mask, VL);
I think you could just take the LLVM implementation and translate it into GCC code.
It is quite simple: create a function called 'expand_rounding'.
The LLVM code is very easy to read, so I believe you could leverage it quickly.
juzhe.zhong@rivai.ai
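The lowering quoted above can be followed more easily with a scalar model. The sketch below is only an illustration in plain C, not GCC or LLVM code. The names cvt_x_f_rup, copy_sign_bits, has_sign_bit, ceil_model and ceil_model_self_check are hypothetical, and the model assumes a finite input whose ceiling fits in int32_t. It mirrors the three steps: convert to integer rounding up, convert back to float, then restore the original sign so that -0.0 is preserved.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Step 1 (models vfcvt.x.f with RUP): float -> int32, rounding toward
   +infinity.  Assumes x is finite and its ceiling fits in int32_t.  */
int32_t
cvt_x_f_rup (float x)
{
  int32_t t = (int32_t) x;   /* A C cast truncates toward zero.  */
  if ((float) t < x)         /* Truncation rounded down, so step up.  */
    t += 1;
  return t;
}

/* Step 3 (models the copysign step): magnitude of MAG, sign of SGN.  */
float
copy_sign_bits (float mag, float sgn)
{
  uint32_t m, s;
  memcpy (&m, &mag, sizeof m);
  memcpy (&s, &sgn, sizeof s);
  m = (m & UINT32_C (0x7fffffff)) | (s & UINT32_C (0x80000000));
  memcpy (&mag, &m, sizeof mag);
  return mag;
}

/* Returns nonzero when the sign bit of X is set (true for -0.0f).  */
int
has_sign_bit (float x)
{
  uint32_t u;
  memcpy (&u, &x, sizeof u);
  return (u >> 31) != 0;
}

/* ceil as float -> int (round up) -> float, then restore the sign.  */
float
ceil_model (float x)
{
  float r = (float) cvt_x_f_rup (x);   /* Step 2 models vfcvt.f.x.  */
  return copy_sign_bits (r, x);
}

/* Small self-check of the model on a few hand-picked inputs.  */
void
ceil_model_self_check (void)
{
  assert (ceil_model (1.5f) == 2.0f);
  assert (ceil_model (3.0f) == 3.0f);
  assert (ceil_model (-1.5f) == -1.0f);
  /* Without the sign restore, -0.5f would come back as +0.0f.  */
  assert (ceil_model (-0.5f) == 0.0f && has_sign_bit (ceil_model (-0.5f)));
}
```

The last assertion is the case the final copysign step exists for: the integer 0 carries no sign, so the round trip alone cannot return -0.0.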
From: Li, Pan2
Date: 2023-09-20 10:44
To: juzhe.zhong@rivai.ai; gcc-patches
CC: Wang, Yanzhang; kito.cheng
Subject: RE: [PATCH v1] RISC-V: Support ceil and ceilf auto-vectorization
> It should be "V_VLSF" instead of "VF" so that you could also support VLS CEIL.
It is in preparation and will be included in V2 rather than in a separate patch.
> a[i] = cond[i] ? CEIL (b[i]): c[i];
Sure
Pan
From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
Sent: Wednesday, September 20, 2023 10:35 AM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Li, Pan2 <pan2.li@intel.com>; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng <kito.cheng@gmail.com>
Subject: Re: [PATCH v1] RISC-V: Support ceil and ceilf auto-vectorization
+;; -------------------------------------------------------------------------
+;; ---- [FP] Math.h.
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - ceil/ceilf
+;; -------------------------------------------------------------------------
+(define_expand "ceil<mode>2"
+ [(match_operand:VF 0 "register_operand")
+ (match_operand:VF 1 "register_operand")]
+ "TARGET_VECTOR"
+ {
+ rtx tmp = gen_reg_rtx (<VCONVERT>mode);
+ rtx ops_1[] = {tmp, operands[1]};
+ insn_code icode = code_for_pred_fcvt_x_f (UNSPEC_VFCVT, <MODE>mode);
+
+ /* vfcvt.x.f with rounding up (aka ceil). */
+ riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP_FRM_RUP, ops_1);
+
+ rtx ops_2[] = {operands[0], tmp};
+ icode = code_for_pred (FLOAT, <MODE>mode);
+
+ /* vfcvt.f.x for the final result. To avoid unnecessary frm register
+ access, we use RUP here and it will never do the rounding up because
+ the tmp rtx comes from the float to int conversion. */
+ riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP_FRM_RUP, ops_2);
+
+ DONE;
+ }
+)
It should be "V_VLSF" instead of "VF" so that you could also support VLS CEIL.
Besides, I would like to see the following case:
a[i] = cond[i] ? CEIL (b[i]): c[i];
Ideally, we should be able to combine vfcvt + vmerge into vfcvt with mask.
juzhe.zhong@rivai.ai
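To make the requested combination concrete, here is a scalar sketch of the two forms of a[i] = cond[i] ? CEIL (b[i]) : c[i]. It is a plain C model rather than RVV code, and all names (ceil_small, cond_ceil_merge, cond_ceil_masked, cond_ceil_self_check) are hypothetical. ceil_small is a libm-free stand-in that assumes small finite inputs and ignores the sign of zero. The first form computes ceil for every element and selects afterwards, like vfcvt followed by vmerge; the second starts from c and only overwrites the active elements, which is the effect a masked vfcvt would be expected to have.

```c
#include <assert.h>

/* Hypothetical stand-in for ceilf; assumes x is finite and fits in int.  */
float
ceil_small (float x)
{
  float r = (float) (int) x;      /* Truncate toward zero.  */
  return r < x ? r + 1.0f : r;    /* Step up if truncation rounded down.  */
}

/* Form 1: unconditional ceil followed by a select (vfcvt + vmerge).  */
void
cond_ceil_merge (float *a, const int *cond, const float *b,
                 const float *c, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    {
      float t = ceil_small (b[i]);   /* Computed for every element.  */
      a[i] = cond[i] ? t : c[i];     /* Selected afterwards.  */
    }
}

/* Form 2: start from c and overwrite active elements only.  */
void
cond_ceil_masked (float *a, const int *cond, const float *b,
                  const float *c, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    {
      a[i] = c[i];
      if (cond[i])
        a[i] = ceil_small (b[i]);
    }
}

/* Checks that both forms agree on a small sample.  */
void
cond_ceil_self_check (void)
{
  const int cond[4] = { 1, 0, 1, 0 };
  const float b[4] = { 0.5f, 1.5f, -2.5f, 7.0f };
  const float c[4] = { 10.0f, 20.0f, 30.0f, 40.0f };
  float a1[4], a2[4];

  cond_ceil_merge (a1, cond, b, c, 4);
  cond_ceil_masked (a2, cond, b, c, 4);
  for (unsigned i = 0; i < 4; i++)
    assert (a1[i] == a2[i]);
  assert (a1[0] == 1.0f && a1[1] == 20.0f);
  assert (a1[2] == -2.0f && a1[3] == 40.0f);
}
```

Both forms give the same result; the second simply avoids the separate select step.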
From: pan2.li
Date: 2023-09-20 10:30
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support ceil and ceilf auto-vectorization
From: Pan Li <pan2.li@intel.com>
This patch supports auto-vectorization for both ceil and ceilf
from math.h. It depends on the -ffast-math option.
A call to ceil/ceilf such as v2 = ceil (v1) is converted
into the insns below (following the LLVM implementation).
* vfcvt.x.f v3, v1, RUP
* vfcvt.f.x v2, v3
The conditional auto-vectorization for ceil/ceilf is also supported
and covered by test cases.
Before this patch:
math-ceil-1.c:21:1: missed: couldn't vectorize loop
...
.L3:
flw fa0,0(s0)
addi s0,s0,4
addi s1,s1,4
call ceilf
fsw fa0,-4(s1)
bne s0,s2,.L3
After this patch:
...
fsrmi 3
.L4:
vsetvli a5,a2,e32,m1,ta,ma
vle32.v v1,0(a1)
vsetvli a3,zero,e32,m1,ta,ma
slli a4,a5,2
vfcvt.x.f.v v1,v1
sub a2,a2,a5
vfcvt.f.x.v v1,v1
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v1,0(a0)
add a1,a1,a4
add a0,a0,a4
bne a2,zero,.L4
.L14:
fsrm a6
ret
Please note that VLS mode is not covered by this patch and will be
taken care of in follow-up patches soon.
gcc/ChangeLog:
* config/riscv/autovec.md (ceil<mode>2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
(enum insn_type): Ditto.
* config/riscv/riscv-v.cc: Handle rounding up.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/math-ceil-1.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-2.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-3.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-4.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/test-math.h: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
---
gcc/config/riscv/autovec.md | 30 +++++++++++++
gcc/config/riscv/riscv-protos.h | 4 ++
gcc/config/riscv/riscv-v.cc | 2 +
.../riscv/rvv/autovec/math-ceil-1.c | 21 +++++++++
.../riscv/rvv/autovec/math-ceil-2.c | 21 +++++++++
.../riscv/rvv/autovec/math-ceil-3.c | 24 ++++++++++
.../riscv/rvv/autovec/math-ceil-4.c | 24 ++++++++++
.../riscv/rvv/autovec/math-ceil-run-1.c | 24 ++++++++++
.../riscv/rvv/autovec/math-ceil-run-2.c | 24 ++++++++++
.../gcc.target/riscv/rvv/autovec/test-math.h | 45 +++++++++++++++++++
10 files changed, 219 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 493d5745485..ea508d81047 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2374,3 +2374,33 @@ (define_expand "<u>avg<v_double_trunc>3_ceil"
riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops3);
DONE;
})
+
+;; -------------------------------------------------------------------------
+;; ---- [FP] Math.h.
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - ceil/ceilf
+;; -------------------------------------------------------------------------
+(define_expand "ceil<mode>2"
+ [(match_operand:VF 0 "register_operand")
+ (match_operand:VF 1 "register_operand")]
+ "TARGET_VECTOR"
+ {
+ rtx tmp = gen_reg_rtx (<VCONVERT>mode);
+ rtx ops_1[] = {tmp, operands[1]};
+ insn_code icode = code_for_pred_fcvt_x_f (UNSPEC_VFCVT, <MODE>mode);
+
+ /* vfcvt.x.f with rounding up (aka ceil). */
+ riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP_FRM_RUP, ops_1);
+
+ rtx ops_2[] = {operands[0], tmp};
+ icode = code_for_pred (FLOAT, <MODE>mode);
+
+ /* vfcvt.f.x for the final result. To avoid unnecessary frm register
+ access, we use RUP here and it will never do the rounding up because
+ the tmp rtx comes from the float to int conversion. */
+ riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP_FRM_RUP, ops_2);
+
+ DONE;
+ }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5a2d218d67b..833f1efbaf4 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -250,6 +250,9 @@ enum insn_flags : unsigned int
/* flags for the floating-point rounding mode. */
/* Means INSN has FRM operand and the value is FRM_DYN. */
FRM_DYN_P = 1 << 15,
+
+ /* Means INSN has FRM operand and the value is FRM_RUP. */
+ FRM_RUP_P = 1 << 16,
};
enum insn_type : unsigned int
@@ -290,6 +293,7 @@ enum insn_type : unsigned int
UNARY_OP_TAMA = __MASK_OP_TAMA | UNARY_OP_P,
UNARY_OP_TAMU = __MASK_OP_TAMU | UNARY_OP_P,
UNARY_OP_FRM_DYN = UNARY_OP | FRM_DYN_P,
+ UNARY_OP_FRM_RUP = UNARY_OP | FRM_RUP_P,
/* Binary operator. */
BINARY_OP = __NORMAL_OP | BINARY_OP_P,
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index a9287e5d671..4192f988648 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -323,6 +323,8 @@ public:
/* Add rounding mode operand. */
if (m_insn_flags & FRM_DYN_P)
add_rounding_mode_operand (FRM_DYN);
+ if (m_insn_flags & FRM_RUP_P)
+ add_rounding_mode_operand (FRM_RUP);
gcc_assert (insn_data[(int) icode].n_operands == m_opno);
expand (icode, any_mem_p);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c
new file mode 100644
index 00000000000..8f0f09609eb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_float_ceilf:
+** frrm\s+[atx][0-9]+
+** ...
+** fsrmi\s+3
+** ...
+** vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*m1,\s*ta,\s*ma
+** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+
+** ...
+** vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+
+** ...
+** fsrm\s+[atx][0-9]+
+** ...
+*/
+TEST_CEIL(float, ceilf)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c
new file mode 100644
index 00000000000..73395d30d7a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_double_ceil:
+** frrm\s+[atx][0-9]+
+** ...
+** fsrmi\s+3
+** ...
+** vsetvli\s+[atx][0-9]+,\s*zero,\s*e64,\s*m1,\s*ta,\s*ma
+** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+
+** ...
+** vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+
+** ...
+** fsrm\s+[atx][0-9]+
+** ...
+*/
+TEST_CEIL(double, ceil)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c
new file mode 100644
index 00000000000..eb0f3a3db78
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_float_ceilf:
+** frrm\s+[atx][0-9]+
+** ...
+** fsrmi\s+3
+** ...
+** vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*m1,\s*ta,\s*ma
+** ...
+** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+
+** ...
+** vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+
+** ...
+** vmerge\.vvm\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+,\s*v0
+** ...
+** fsrm\s+[atx][0-9]+
+** ...
+*/
+TEST_COND_CEIL(float, ceilf)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-4.c
new file mode 100644
index 00000000000..b9a3c8ebf84
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-4.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_double_ceil:
+** frrm\s+[atx][0-9]+
+** ...
+** fsrmi\s+3
+** ...
+** vsetvli\s+[atx][0-9]+,\s*zero,\s*e64,\s*m1,\s*ta,\s*ma
+** ...
+** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+
+** ...
+** vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+
+** ...
+** vmerge\.vvm\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+,\s*v0
+** ...
+** fsrm\s+[atx][0-9]+
+** ...
+*/
+TEST_COND_CEIL(double, ceil)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c
new file mode 100644
index 00000000000..014c4c3ac0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c
@@ -0,0 +1,24 @@
+/* { dg-do run { target { riscv_vector && riscv_zvfh_hw } } } */
+/* { dg-additional-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -lm" } */
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+float in[ARRAY_SIZE];
+float out[ARRAY_SIZE];
+float ref[ARRAY_SIZE];
+
+// Test function declaration
+TEST_CEIL(float, ceilf)
+TEST_INIT(float)
+TEST_ASSERT(float)
+
+int
+main ()
+{
+ test_float_init (in, ref, ARRAY_SIZE);
+ test_float_ceilf (out, in, ARRAY_SIZE);
+ test_float_assert (out, ref, ARRAY_SIZE);
+
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c
new file mode 100644
index 00000000000..ae361e11144
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c
@@ -0,0 +1,24 @@
+/* { dg-do run { target { riscv_vector && riscv_zvfh_hw } } } */
+/* { dg-additional-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -lm" } */
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+double in[ARRAY_SIZE];
+double out[ARRAY_SIZE];
+double ref[ARRAY_SIZE];
+
+// Test function declaration
+TEST_CEIL(double, ceil)
+TEST_INIT(double)
+TEST_ASSERT(double)
+
+int
+main ()
+{
+ test_double_init (in, ref, ARRAY_SIZE);
+ test_double_ceil (out, in, ARRAY_SIZE);
+ test_double_assert (out, ref, ARRAY_SIZE);
+
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h
new file mode 100644
index 00000000000..57dd5e0e460
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h
@@ -0,0 +1,45 @@
+#include <math.h>
+
+#define TEST_CEIL(TYPE, CALL) \
+ void test_##TYPE##_##CALL (TYPE *out, TYPE *in, unsigned count) \
+ { \
+ for (unsigned i = 0; i < count; i++) \
+ out[i] = CALL (in[i]); \
+ }
+
+#define TEST_COND_CEIL(TYPE, CALL) \
+ void test_##TYPE##_##CALL (TYPE *out, int *cond, TYPE *in, unsigned count) \
+ { \
+ for (unsigned i = 0; i < count; i++) \
+ out[i] = cond[i] ? CALL (in[i]) : in[i]; \
+ }
+
+#define TEST_INIT(TYPE) \
+ void test_##TYPE##_init (TYPE *in, TYPE *ref, unsigned size) \
+ { \
+ for (unsigned i = 0; i < size; i++) \
+ { \
+ TYPE tmp = (TYPE)i; \
+ \
+ if (i % 2 == 0) \
+ { \
+ in[i] = 1.5f + (TYPE)i; \
+ ref[i] = (TYPE)(i + 2); \
+ } \
+ else \
+ { \
+ in[i] = (TYPE)i; \
+ ref[i] = (TYPE)i; \
+ } \
+ } \
+ }
+
+#define TEST_ASSERT(TYPE) \
+ void test_##TYPE##_assert (TYPE *out, TYPE *ref, unsigned size) \
+ { \
+ for (unsigned i = 0; i < size; i++) \
+ { \
+ if (out[i] != ref[i]) \
+ __builtin_abort (); \
+ } \
+ }
--
2.34.1
Thanks Juzhe, let me check and keep you posted.
Pan
From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
Sent: Wednesday, September 20, 2023 11:37 AM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng <kito.cheng@gmail.com>
Subject: Re: RE: [PATCH v1] RISC-V: Support ceil and ceilf auto-vectorization