RISC-V: Bugfix for mode tieable of the rvv bool types
Commit Message
From: Pan Li <incarnation.p.lee@outlook.com>
Fix the mode tieability bug for the rvv bool types. The vbool*_t
modes cannot be tied, because the actual load/store size is
determined by vl. The mode sizes of the rvv bool types are also
adjusted for the underlying optimization passes. The rvv bool types
are vbool*_t, i.e. vbool1_t, vbool2_t, vbool4_t, vbool8_t, vbool16_t,
vbool32_t, and vbool64_t.
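The following small standalone C model shows why the load/store size depends on vl. It is not GCC code, and the function names are hypothetical. It relies on two RVV 1.0 rules. First, vboolN_t has VLEN / N elements at VLMAX, where N is the SEW/LMUL ratio. Second, vlm.v and vsm.v transfer ceil(vl / 8) bytes.

```c
#include <assert.h>

/* Illustrative model only, not GCC code.  For vboolN_t, N is the
   SEW/LMUL ratio, so VLMAX = VLEN / N mask bits (one bit per element).  */
static unsigned
vbool_vlmax (unsigned vlen_bits, unsigned n)
{
  /* N must be one of 1, 2, 4, ..., 64.  */
  assert (n >= 1 && n <= 64 && (n & (n - 1)) == 0);
  return vlen_bits / n;
}

/* vlm.v / vsm.v move ceil (vl / 8) bytes of mask, so the memory
   footprint follows the active vl, not a fixed per-mode byte size.  */
static unsigned
vbool_mem_bytes (unsigned vl)
{
  return (vl + 7) / 8;
}
```

Take an assumed VLEN of 128 as an example. At VLMAX, vbool1_t moves 16 bytes while vbool64_t moves only 1 byte. A vbool1_t value therefore cannot simply be reinterpreted as a vbool64_t one.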
PR 108185
PR 108654
gcc/ChangeLog:
* config/riscv/riscv-modes.def (ADJUST_BYTESIZE):
* config/riscv/riscv.cc (riscv_v_adjust_bytesize):
(riscv_modes_tieable_p):
* config/riscv/riscv.h (riscv_v_adjust_bytesize):
* machmode.h (VECTOR_BOOL_MODE_P):
* tree-ssa-sccvn.cc (visit_reference_op_load):
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr108185-1.c: New test.
* gcc.target/riscv/pr108185-2.c: New test.
* gcc.target/riscv/pr108185-3.c: New test.
* gcc.target/riscv/pr108185-4.c: New test.
* gcc.target/riscv/pr108185-5.c: New test.
* gcc.target/riscv/pr108185-6.c: New test.
* gcc.target/riscv/pr108185-7.c: New test.
* gcc.target/riscv/pr108185-8.c: New test.
Signed-off-by: Pan Li <incarnation.p.lee@outlook.com>
---
gcc/config/riscv/riscv-modes.def | 14 ++--
gcc/config/riscv/riscv.cc | 34 ++++++++-
gcc/config/riscv/riscv.h | 2 +
gcc/machmode.h | 3 +
gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
gcc/tree-ssa-sccvn.cc | 13 +++-
13 files changed, 608 insertions(+), 11 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
index d5305efa8a6..cc21d3c83a2 100644
--- a/gcc/config/riscv/riscv-modes.def
+++ b/gcc/config/riscv/riscv-modes.def
@@ -64,13 +64,13 @@ ADJUST_ALIGNMENT (VNx16BI, 1);
ADJUST_ALIGNMENT (VNx32BI, 1);
ADJUST_ALIGNMENT (VNx64BI, 1);
-ADJUST_BYTESIZE (VNx1BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx2BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx4BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx8BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
+ADJUST_BYTESIZE (VNx1BI, riscv_v_adjust_bytesize (VNx1BImode, 1));
+ADJUST_BYTESIZE (VNx2BI, riscv_v_adjust_bytesize (VNx2BImode, 1));
+ADJUST_BYTESIZE (VNx4BI, riscv_v_adjust_bytesize (VNx4BImode, 1));
+ADJUST_BYTESIZE (VNx8BI, riscv_v_adjust_bytesize (VNx8BImode, 1));
+ADJUST_BYTESIZE (VNx16BI, riscv_v_adjust_bytesize (VNx16BImode, 2));
+ADJUST_BYTESIZE (VNx32BI, riscv_v_adjust_bytesize (VNx32BImode, 4));
+ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_bytesize (VNx64BImode, 8));
/*
| Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 3b7804b7501..138c052e13c 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1003,6 +1003,27 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
return scale;
}
+/* Called from ADJUST_BYTESIZE in riscv-modes.def. Return the correct
+ byte size for the corresponding machine_mode. */
+
+poly_int64
+riscv_v_adjust_bytesize (machine_mode mode, int scale)
+{
+ gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
+
+ if (riscv_v_ext_vector_mode_p (mode))
+ {
+ poly_uint16 mode_size = GET_MODE_SIZE (mode);
+
+ if (known_lt (mode_size, BYTES_PER_RISCV_VECTOR))
+ return mode_size;
+ else
+ return BYTES_PER_RISCV_VECTOR;
+ }
+
+ return scale;
+}
+
/* Return true if X is a valid address for machine mode MODE. If it is,
fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
effect. */
@@ -5807,11 +5828,22 @@ riscv_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
/* Implement TARGET_MODES_TIEABLE_P.
Don't allow floating-point modes to be tied, since type punning of
- single-precision and double-precision is implementation defined. */
+ single-precision and double-precision is implementation defined.
+
+ Don't allow different vbool*_t modes to be tied, since the type
+ size is determined by vl. */
static bool
riscv_modes_tieable_p (machine_mode mode1, machine_mode mode2)
{
+ if (riscv_v_ext_vector_mode_p (mode1) && riscv_v_ext_vector_mode_p (mode2))
+ {
+ if (VECTOR_BOOL_MODE_P (mode1) || VECTOR_BOOL_MODE_P (mode2))
+ return false;
+
+ return known_eq (GET_MODE_SIZE (mode1), GET_MODE_SIZE (mode2));
+ }
+
return (mode1 == mode2
|| !(GET_MODE_CLASS (mode1) == MODE_FLOAT
&& GET_MODE_CLASS (mode2) == MODE_FLOAT));
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index faffd5a77fe..f857223338c 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -1028,6 +1028,8 @@ extern unsigned riscv_stack_boundary;
extern unsigned riscv_bytes_per_vector_chunk;
extern poly_uint16 riscv_vector_chunks;
extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
+extern poly_int64 riscv_v_adjust_bytesize (machine_mode mode, int scale);
+
/* The number of bits and bytes in a RVV vector. */
#define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
#define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
diff --git a/gcc/machmode.h b/gcc/machmode.h
index f1865c1ef42..6720472f2c9 100644
--- a/gcc/machmode.h
+++ b/gcc/machmode.h
@@ -242,6 +242,9 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
|| CLASS == MODE_ACCUM \
|| CLASS == MODE_UACCUM)
+/* Nonzero if MODE is a vector bool mode. */
+#define VECTOR_BOOL_MODE_P(MODE) (GET_MODE_CLASS(MODE) == MODE_VECTOR_BOOL)
+
/* An optional T (i.e. a T or nothing), where T is some form of mode class. */
template<typename T>
class opt_mode
diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
new file mode 100644
index 00000000000..c3d0b10271a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
new file mode 100644
index 00000000000..bd13ba916da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
new file mode 100644
index 00000000000..99928f7b1cc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
new file mode 100644
index 00000000000..e70284fada8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
new file mode 100644
index 00000000000..575a7842cdf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
new file mode 100644
index 00000000000..95a11d37016
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
new file mode 100644
index 00000000000..8f6f0b11f09
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
new file mode 100644
index 00000000000..d96959dd064
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
@@ -0,0 +1,77 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 028bedbc9a0..19fdba8cfa2 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. If not see
#include "gimple-fold.h"
#include "tree-eh.h"
#include "gimplify.h"
+#include "target.h"
#include "flags.h"
#include "dojump.h"
#include "explow.h"
@@ -5657,10 +5658,16 @@ visit_reference_op_load (tree lhs, tree op, gimple *stmt)
if (result
&& !useless_type_conversion_p (TREE_TYPE (result), TREE_TYPE (op)))
{
+ machine_mode result_mode = TYPE_MODE (TREE_TYPE (result));
+ machine_mode op_mode = TYPE_MODE (TREE_TYPE (op));
+ poly_uint16 result_mode_precision = GET_MODE_PRECISION (result_mode);
+ poly_uint16 op_mode_precision = GET_MODE_PRECISION (op_mode);
+
/* Avoid the type punning in case the result mode has padding where
- the op we lookup has not. */
- if (maybe_lt (GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (result))),
- GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op)))))
+ the op we lookup has not.
+ Avoid the type punning in case the target mode cannot be tied. */
+ if (maybe_lt (result_mode_precision, op_mode_precision)
+ || !targetm.modes_tieable_p (result_mode, op_mode))
result = NULL_TREE;
else
{
--
2.34.1
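The scan-assembler counts in the new tests follow from the e8 vsetvli configuration that each mask type needs. With SEW = 8 and SEW/LMUL = N for vboolN_t, the expected LMUL is 8 / N. The sketch below encodes that mapping. It is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the LMUL suffix used by
   "vsetvli ..., zero, e8, <lmul>, ta, ma" for vboolN_t, from
   SEW / LMUL = N with SEW = 8, i.e. LMUL = 8 / N.  */
static const char *
vbool_e8_lmul (unsigned n)
{
  switch (n)
    {
    case 1: return "m8";
    case 2: return "m4";
    case 4: return "m2";
    case 8: return "m1";
    case 16: return "mf2";
    case 32: return "mf4";
    case 64: return "mf8";
    default: return NULL; /* Not a vboolN_t ratio.  */
    }
}

/* Inverse mapping: which vboolN_t a scanned e8 LMUL suffix belongs to.
   Returns 0 if the suffix is not one of the seven mask configurations.  */
static unsigned
vbool_ratio_from_lmul (const char *lmul)
{
  assert (lmul != NULL);
  for (unsigned n = 1; n <= 64; n *= 2)
    if (strcmp (vbool_e8_lmul (n), lmul) == 0)
      return n;
  return 0;
}
```

Consider pr108185-1.c as an example. Every function touches vbool1_t, which gives six e8,m8 vsetvli, while each of the other six mask types appears in exactly one function. With the fix, both loads survive in each function, which gives 12 vlm.v and 12 vsm.v matches. In pr108185-8.c, the two loads in each function read the same type from the same address and can be merged, hence 7 vlm.v and 14 vsm.v.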
Thanks for contributing this.
Hi, Richard. Can you help us with this issue?
In RVV, we have vbool8_t (VNx8BImode), vbool16_t (VNx4BImode), vbool32_t (VNx2BImode), and vbool64_t (VNx1BImode).
Since we use a 1-bit mask, each BOOL element occupies 1 bit.
According to the RVV ISA, we adjust these modes as follows:
VNx8BImode: NUNITS = poly (8,8) (each unit is a 1-bit mask)
VNx4BImode: NUNITS = poly (4,4) (each unit is a 1-bit mask)
VNx2BImode: NUNITS = poly (2,2) (each unit is a 1-bit mask)
VNx1BImode: NUNITS = poly (1,1) (each unit is a 1-bit mask)
GET_MODE_BITSIZE and GET_MODE_NUNITS return different values for these modes.
However, GET_MODE_SIZE returns the same value, poly (1,1), for all of them.
As a result, these modes get tied together, which produces wrong code because their bit sizes differ.
Consider the following case:
#include "riscv_vector.h"
void foo5_3 (int32_t * restrict in, int32_t * restrict out, size_t n, int cond)
{
  vint8m1_t v = *(vint8m1_t*)in;
  *(vint8m1_t*)out = v;
  vbool16_t v4 = *(vbool16_t *)in;
  *(vbool16_t *)(out + 300) = v4;
  vbool8_t v3 = *(vbool8_t*)in;
  *(vbool8_t*)(out + 200) = v3;
}
The second mask load (the vlm.v for vbool8_t) is missing, because GCC generates "v3 = VIEW_CONVERT (vbool8_t) v4" in GIMPLE.
We were unable to fix this in the RISC-V backend. Can you help us with this? Thanks.
juzhe.zhong@rivai.ai
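One way to see why equal mode sizes are misleading is the arithmetic below. It is a sketch under stated assumptions, not GCC code. It assumes the MIN_VLEN = 64 layout described above, where a quantity is `c0 + c1 * x` and `x` counts extra 64-bit chunks, so VLEN = 64 * (1 + x). It also assumes a byte size obtained by rounding each coefficient up to whole bytes, which reproduces the poly (1,1) reported for VNx8BI through VNx1BI. The helper names are hypothetical. The last function uses the RVV rule that vlm.v/vsm.v access ceil(vl / 8) bytes.

```c
#include <assert.h>

/* Hypothetical model for MIN_VLEN = 64: value = c0 + c1 * x, x >= 0.  */
struct poly2 { unsigned c0, c1; };

/* Mask element count of vboolN_t: VLMAX = VLEN / N = (64 / N) * (1 + x).  */
static struct poly2 vbool_nunits (unsigned n)
{
  assert (n >= 1 && n <= 64 && (n & (n - 1)) == 0);
  struct poly2 p = { 64 / n, 64 / n };
  return p;
}

/* Assumed byte-size model: one bit per element, each coefficient rounded
   up to whole bytes.  vbool8_t ... vbool64_t all come out as (1,1).  */
static struct poly2 vbool_rounded_bytesize (unsigned n)
{
  struct poly2 bits = vbool_nunits (n);
  struct poly2 p = { (bits.c0 + 7) / 8, (bits.c1 + 7) / 8 };
  return p;
}

/* Bytes actually accessed by vlm.v / vsm.v for a full vboolN_t at a
   concrete VLEN: the effective length is ceil (vl / 8) bytes.  */
static unsigned vbool_bytes_accessed (unsigned n, unsigned vlen)
{
  unsigned vl = vlen / n;  /* VLMAX for this mask type */
  return (vl + 7) / 8;
}
```

At VLEN = 128, `vbool_bytes_accessed (8, 128)` is 2 while `vbool_bytes_accessed (16, 128)` is 1. Reusing a loaded vbool16_t value as a vbool8_t therefore drops a byte, even though both modes report the same rounded size.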
From: incarnation.p.lee
Date: 2023-02-11 16:46
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rguenther; Pan Li
Subject: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
From: Pan Li <incarnation.p.lee@outlook.com>
Fix the bug for mode tieable of the rvv bool types. The vbool*_t
types cannot be tied, as the actual load/store size is determined by
the vl. The mode size of the rvv bool types is also adjusted for the
underlying optimization passes. The rvv bool types are vbool*_t, aka
vbool1_t, vbool2_t, vbool4_t, vbool8_t, vbool16_t, vbool32_t, and
vbool64_t.
PR 108185
PR 108654
gcc/ChangeLog:
* config/riscv/riscv-modes.def (ADJUST_BYTESIZE):
* config/riscv/riscv.cc (riscv_v_adjust_bytesize):
(riscv_modes_tieable_p):
* config/riscv/riscv.h (riscv_v_adjust_bytesize):
* machmode.h (VECTOR_BOOL_MODE_P):
* tree-ssa-sccvn.cc (visit_reference_op_load):
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr108185-1.c: New test.
* gcc.target/riscv/pr108185-2.c: New test.
* gcc.target/riscv/pr108185-3.c: New test.
* gcc.target/riscv/pr108185-4.c: New test.
* gcc.target/riscv/pr108185-5.c: New test.
* gcc.target/riscv/pr108185-6.c: New test.
* gcc.target/riscv/pr108185-7.c: New test.
* gcc.target/riscv/pr108185-8.c: New test.
Signed-off-by: Pan Li <incarnation.p.lee@outlook.com>
---
gcc/config/riscv/riscv-modes.def | 14 ++--
gcc/config/riscv/riscv.cc | 34 ++++++++-
gcc/config/riscv/riscv.h | 2 +
gcc/machmode.h | 3 +
gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
gcc/tree-ssa-sccvn.cc | 13 +++-
13 files changed, 608 insertions(+), 11 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
index d5305efa8a6..cc21d3c83a2 100644
--- a/gcc/config/riscv/riscv-modes.def
+++ b/gcc/config/riscv/riscv-modes.def
@@ -64,13 +64,13 @@ ADJUST_ALIGNMENT (VNx16BI, 1);
ADJUST_ALIGNMENT (VNx32BI, 1);
ADJUST_ALIGNMENT (VNx64BI, 1);
-ADJUST_BYTESIZE (VNx1BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx2BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx4BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx8BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
+ADJUST_BYTESIZE (VNx1BI, riscv_v_adjust_bytesize (VNx1BImode, 1));
+ADJUST_BYTESIZE (VNx2BI, riscv_v_adjust_bytesize (VNx2BImode, 1));
+ADJUST_BYTESIZE (VNx4BI, riscv_v_adjust_bytesize (VNx4BImode, 1));
+ADJUST_BYTESIZE (VNx8BI, riscv_v_adjust_bytesize (VNx8BImode, 1));
+ADJUST_BYTESIZE (VNx16BI, riscv_v_adjust_bytesize (VNx16BImode, 2));
+ADJUST_BYTESIZE (VNx32BI, riscv_v_adjust_bytesize (VNx32BImode, 4));
+ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_bytesize (VNx64BImode, 8));
/*
| Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 3b7804b7501..138c052e13c 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1003,6 +1003,27 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
return scale;
}
+/* Called from ADJUST_BYTESIZE in riscv-modes.def. Return the correct
+ byte size for the corresponding machine_mode. */
+
+poly_int64
+riscv_v_adjust_bytesize (machine_mode mode, int scale)
+{
+ gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
+
+ if (riscv_v_ext_vector_mode_p (mode))
+ {
+ poly_uint16 mode_size = GET_MODE_SIZE (mode);
+
+ if (known_lt (mode_size, BYTES_PER_RISCV_VECTOR))
+ return mode_size;
+ else
+ return BYTES_PER_RISCV_VECTOR;
+ }
+
+ return scale;
+}
+
/* Return true if X is a valid address for machine mode MODE. If it is,
fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
effect. */
@@ -5807,11 +5828,22 @@ riscv_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
/* Implement TARGET_MODES_TIEABLE_P.
Don't allow floating-point modes to be tied, since type punning of
- single-precision and double-precision is implementation defined. */
+ single-precision and double-precision is implementation defined.
+
+ Don't allow different vbool*_t modes to be tied, since the type
+ size is determined by vl. */
static bool
riscv_modes_tieable_p (machine_mode mode1, machine_mode mode2)
{
+ if (riscv_v_ext_vector_mode_p (mode1) && riscv_v_ext_vector_mode_p (mode2))
+ {
+ if (VECTOR_BOOL_MODE_P (mode1) || VECTOR_BOOL_MODE_P (mode2))
+ return false;
+
+ return known_eq (GET_MODE_SIZE (mode1), GET_MODE_SIZE (mode2));
+ }
+
return (mode1 == mode2
|| !(GET_MODE_CLASS (mode1) == MODE_FLOAT
&& GET_MODE_CLASS (mode2) == MODE_FLOAT));
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index faffd5a77fe..f857223338c 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -1028,6 +1028,8 @@ extern unsigned riscv_stack_boundary;
extern unsigned riscv_bytes_per_vector_chunk;
extern poly_uint16 riscv_vector_chunks;
extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
+extern poly_int64 riscv_v_adjust_bytesize (machine_mode mode, int scale);
+
/* The number of bits and bytes in a RVV vector. */
#define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
#define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
diff --git a/gcc/machmode.h b/gcc/machmode.h
index f1865c1ef42..6720472f2c9 100644
--- a/gcc/machmode.h
+++ b/gcc/machmode.h
@@ -242,6 +242,9 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
|| CLASS == MODE_ACCUM \
|| CLASS == MODE_UACCUM)
+/* Nonzero if MODE is a vector bool mode. */
+#define VECTOR_BOOL_MODE_P(MODE) (GET_MODE_CLASS(MODE) == MODE_VECTOR_BOOL)
+
/* An optional T (i.e. a T or nothing), where T is some form of mode class. */
template<typename T>
class opt_mode
diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
new file mode 100644
index 00000000000..c3d0b10271a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
new file mode 100644
index 00000000000..bd13ba916da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
new file mode 100644
index 00000000000..99928f7b1cc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
new file mode 100644
index 00000000000..e70284fada8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
new file mode 100644
index 00000000000..575a7842cdf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
new file mode 100644
index 00000000000..95a11d37016
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
new file mode 100644
index 00000000000..8f6f0b11f09
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
new file mode 100644
index 00000000000..d96959dd064
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
@@ -0,0 +1,77 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 028bedbc9a0..19fdba8cfa2 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. If not see
#include "gimple-fold.h"
#include "tree-eh.h"
#include "gimplify.h"
+#include "target.h"
#include "flags.h"
#include "dojump.h"
#include "explow.h"
@@ -5657,10 +5658,16 @@ visit_reference_op_load (tree lhs, tree op, gimple *stmt)
if (result
&& !useless_type_conversion_p (TREE_TYPE (result), TREE_TYPE (op)))
{
+ machine_mode result_mode = TYPE_MODE (TREE_TYPE (result));
+ machine_mode op_mode = TYPE_MODE (TREE_TYPE (op));
+ poly_uint16 result_mode_precision = GET_MODE_PRECISION (result_mode);
+ poly_uint16 op_mode_precision = GET_MODE_PRECISION (op_mode);
+
/* Avoid the type punning in case the result mode has padding where
- the op we lookup has not. */
- if (maybe_lt (GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (result))),
- GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op)))))
+ the op we lookup has not.
+ Avoid the type punning in case the target mode cannot be tied. */
+ if (maybe_lt (result_mode_precision, op_mode_precision)
+ || !targetm.modes_tieable_p (result_mode, op_mode))
result = NULL_TREE;
else
{
--
2.34.1
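The scan-assembler patterns in the new tests follow from the SEW/LMUL ratio of each mask type. vboolN_t has SEW/LMUL = N, and the mask loads and stores are emitted under e8, so the expected LMUL is 8/N. The helpers below are a hypothetical illustration of that mapping and of the vlm.v/vsm.v counts. They are not part of the testsuite.

```c
#include <assert.h>
#include <string.h>

/* LMUL string expected in "vsetvli ..., e8, <lmul>, ta, ma" for vboolN_t:
   SEW/LMUL = N with SEW = 8 gives LMUL = 8 / N.  */
static const char *vbool_lmul (unsigned n)
{
  switch (n)
    {
    case 1:  return "m8";
    case 2:  return "m4";
    case 4:  return "m2";
    case 8:  return "m1";
    case 16: return "mf2";
    case 32: return "mf4";
    case 64: return "mf8";
    default: return NULL;   /* not a valid vboolN_t */
    }
}

/* Expected vlm.v / vsm.v counts for a test file with FUNCS functions that
   each load from IN twice and store both values.  Different mask types
   must not share a load (two vlm.v per function); the same type may
   reuse the first load (one vlm.v per function).  */
static void expected_mask_ops (unsigned funcs, int same_type,
                               unsigned *vlm, unsigned *vsm)
{
  assert (vlm != NULL && vsm != NULL);
  *vlm = same_type ? funcs : 2 * funcs;
  *vsm = 2 * funcs;
}
```

For pr108185-1.c (six functions, two different mask types each) this gives 12 vlm.v and 12 vsm.v. For pr108185-8.c (seven functions, same type loaded twice) it gives 7 vlm.v and 14 vsm.v, matching the dg-final lines.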
On Sat, 11 Feb 2023, juzhe.zhong@rivai.ai wrote:
> Thanks for contributing this.
> Hi, Richard. Can you help us with this issue?
> In RVV, we have vbool8_t (VNx8BImode), vbool16_t (VNx4BImode), vbool32_t (VNx2BImode) and vbool64_t (VNx1BImode).
> Since RVV uses 1-bit masks, each BOOL element occupies 1 bit.
> According to the RVV ISA, we adjust these modes as follows:
>
> VNx8BImode: poly (8,8) NUNITS (each unit is a 1-bit mask)
> VNx4BImode: poly (4,4) NUNITS (each unit is a 1-bit mask)
> VNx2BImode: poly (2,2) NUNITS (each unit is a 1-bit mask)
> VNx1BImode: poly (1,1) NUNITS (each unit is a 1-bit mask)
So how's VNx1BImode laid out for N == 2? Is that still a single
byte and two consecutive bits? I suppose so.
But then GET_MODE_PRECISION (GET_MODE_INNER (..)) should always be 1?
I'm not sure what GET_MODE_PRECISION of the vector mode itself
should be here, but then I wonder ...
> GET_MODE_BITSIZE and GET_MODE_NUNITS return different values for these modes.
> However, GET_MODE_SIZE returns the same value for all of them (poly (1,1)).
> As a result, these modes are tied together, which produces wrong code because their bit sizes differ.
> Consider the following case:
> #include "riscv_vector.h"
> void foo5_3 (int32_t * restrict in, int32_t * restrict out, size_t n, int cond)
> {
>   vint8m1_t v = *(vint8m1_t*)in;
>   *(vint8m1_t*)out = v;
>   vbool16_t v4 = *(vbool16_t *)in;
>   *(vbool16_t *)(out + 300) = v4;
>   vbool8_t v3 = *(vbool8_t*)in;
>   *(vbool8_t*)(out + 200) = v3;
> }
> The second mask load (the vlm.v for vbool8_t) is missing, because GCC generates "v3 = VIEW_CONVERT (vbool8_t) v4" in GIMPLE.
> We were unable to fix this in the RISC-V backend. Can you help us with this? Thanks.
... why for the loads the "padding" is not loaded? The above testcase
is probably more complicated than necessary as well?
Thanks,
Richard.
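Richard's reading of the layout agrees with the RVV 1.0 mask layout. Mask element i occupies bit i of the mask register, so when vsm.v stores it to memory it lands in bit i % 8 of byte i / 8, least-significant bit first. For VNx1BImode at N == 2, the two elements are bits 0 and 1 of a single byte. The helpers below are a hypothetical sketch of that packing, not GCC or intrinsic APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bytes needed to hold NELEMS one-bit mask elements.  */
static unsigned mask_bytes (unsigned nelems)
{
  return (nelems + 7) / 8;
}

/* Read mask element I: it lives in bit I % 8 of byte I / 8.  */
static bool mask_get (const uint8_t *mask, unsigned i)
{
  assert (mask);
  return (mask[i / 8] >> (i % 8)) & 1u;
}

/* Set mask element I to V, leaving the other bits untouched.  */
static void mask_set (uint8_t *mask, unsigned i, bool v)
{
  uint8_t bit = (uint8_t) (1u << (i % 8));
  if (v)
    mask[i / 8] |= bit;
  else
    mask[i / 8] &= (uint8_t) ~bit;
}
```

Under this layout, any "padding" is simply the unused high bits of the last byte. Whether those bits are accessed depends on vl, because vlm.v reads only ceil(vl / 8) bytes.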
> index d5305efa8a6..cc21d3c83a2 100644
> --- a/gcc/config/riscv/riscv-modes.def
> +++ b/gcc/config/riscv/riscv-modes.def
> @@ -64,13 +64,13 @@ ADJUST_ALIGNMENT (VNx16BI, 1);
> ADJUST_ALIGNMENT (VNx32BI, 1);
> ADJUST_ALIGNMENT (VNx64BI, 1);
> -ADJUST_BYTESIZE (VNx1BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> -ADJUST_BYTESIZE (VNx2BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> -ADJUST_BYTESIZE (VNx4BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> -ADJUST_BYTESIZE (VNx8BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> -ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> -ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> -ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
> +ADJUST_BYTESIZE (VNx1BI, riscv_v_adjust_bytesize (VNx1BImode, 1));
> +ADJUST_BYTESIZE (VNx2BI, riscv_v_adjust_bytesize (VNx2BImode, 1));
> +ADJUST_BYTESIZE (VNx4BI, riscv_v_adjust_bytesize (VNx4BImode, 1));
> +ADJUST_BYTESIZE (VNx8BI, riscv_v_adjust_bytesize (VNx8BImode, 1));
> +ADJUST_BYTESIZE (VNx16BI, riscv_v_adjust_bytesize (VNx16BImode, 2));
> +ADJUST_BYTESIZE (VNx32BI, riscv_v_adjust_bytesize (VNx32BImode, 4));
> +ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_bytesize (VNx64BImode, 8));
> /*
> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 3b7804b7501..138c052e13c 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -1003,6 +1003,27 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
> return scale;
> }
> +/* Called from ADJUST_BYTESIZE in riscv-modes.def. Return the correct
> + byte size for the corresponding machine_mode. */
> +
> +poly_int64
> +riscv_v_adjust_bytesize (machine_mode mode, int scale)
> +{
> + gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
> +
> + if (riscv_v_ext_vector_mode_p (mode))
> + {
> + poly_uint16 mode_size = GET_MODE_SIZE (mode);
> +
> + if (known_lt (mode_size, BYTES_PER_RISCV_VECTOR))
> + return mode_size;
> + else
> + return BYTES_PER_RISCV_VECTOR;
> + }
> +
> + return scale;
> +}
> +
> /* Return true if X is a valid address for machine mode MODE. If it is,
> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
> effect. */
> @@ -5807,11 +5828,22 @@ riscv_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
> /* Implement TARGET_MODES_TIEABLE_P.
> Don't allow floating-point modes to be tied, since type punning of
> - single-precision and double-precision is implementation defined. */
> + single-precision and double-precision is implementation defined.
> +
> + Don't allow different vbool*_t modes to be tied, since the type
> + size is determined by vl. */
> static bool
> riscv_modes_tieable_p (machine_mode mode1, machine_mode mode2)
> {
> + if (riscv_v_ext_vector_mode_p (mode1) && riscv_v_ext_vector_mode_p (mode2))
> + {
> + if (VECTOR_BOOL_MODE_P (mode1) || VECTOR_BOOL_MODE_P (mode2))
> + return false;
> +
> + return known_eq (GET_MODE_SIZE (mode1), GET_MODE_SIZE (mode2));
> + }
> +
> return (mode1 == mode2
> || !(GET_MODE_CLASS (mode1) == MODE_FLOAT
> && GET_MODE_CLASS (mode2) == MODE_FLOAT));
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index faffd5a77fe..f857223338c 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -1028,6 +1028,8 @@ extern unsigned riscv_stack_boundary;
> extern unsigned riscv_bytes_per_vector_chunk;
> extern poly_uint16 riscv_vector_chunks;
> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
> +extern poly_int64 riscv_v_adjust_bytesize (machine_mode mode, int scale);
> +
> /* The number of bits and bytes in a RVV vector. */
> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
> diff --git a/gcc/machmode.h b/gcc/machmode.h
> index f1865c1ef42..6720472f2c9 100644
> --- a/gcc/machmode.h
> +++ b/gcc/machmode.h
> @@ -242,6 +242,9 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
> || CLASS == MODE_ACCUM \
> || CLASS == MODE_UACCUM)
> +/* Nonzero if MODE is a vector bool mode. */
> +#define VECTOR_BOOL_MODE_P(MODE) (GET_MODE_CLASS(MODE) == MODE_VECTOR_BOOL)
> +
> /* An optional T (i.e. a T or nothing), where T is some form of mode class. */
> template<typename T>
> class opt_mode
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> new file mode 100644
> index 00000000000..c3d0b10271a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> new file mode 100644
> index 00000000000..bd13ba916da
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> new file mode 100644
> index 00000000000..99928f7b1cc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> new file mode 100644
> index 00000000000..e70284fada8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> new file mode 100644
> index 00000000000..575a7842cdf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> new file mode 100644
> index 00000000000..95a11d37016
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> new file mode 100644
> index 00000000000..8f6f0b11f09
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> new file mode 100644
> index 00000000000..d96959dd064
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> @@ -0,0 +1,77 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> index 028bedbc9a0..19fdba8cfa2 100644
> --- a/gcc/tree-ssa-sccvn.cc
> +++ b/gcc/tree-ssa-sccvn.cc
> @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. If not see
> #include "gimple-fold.h"
> #include "tree-eh.h"
> #include "gimplify.h"
> +#include "target.h"
> #include "flags.h"
> #include "dojump.h"
> #include "explow.h"
> @@ -5657,10 +5658,16 @@ visit_reference_op_load (tree lhs, tree op, gimple *stmt)
> if (result
> && !useless_type_conversion_p (TREE_TYPE (result), TREE_TYPE (op)))
> {
> + machine_mode result_mode = TYPE_MODE (TREE_TYPE (result));
> + machine_mode op_mode = TYPE_MODE (TREE_TYPE (op));
> + poly_uint16 result_mode_precision = GET_MODE_PRECISION (result_mode);
> + poly_uint16 op_mode_precision = GET_MODE_PRECISION (op_mode);
> +
> /* Avoid the type punning in case the result mode has padding where
> - the op we lookup has not. */
> - if (maybe_lt (GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (result))),
> - GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op)))))
> + the op we lookup has not.
> + Avoid the type punning in case the target mode cannot be tied. */
> + if (maybe_lt (result_mode_precision, op_mode_precision)
> + || !targetm.modes_tieable_p (result_mode, op_mode))
> result = NULL_TREE;
> else
> {
>
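A simplified model of the new riscv_modes_tieable_p decision, for illustration only: the mode_desc struct and the enum are hypothetical stand-ins for GCC's machine_mode queries (riscv_v_ext_vector_mode_p, VECTOR_BOOL_MODE_P, GET_MODE_SIZE), and sizes are plain integers rather than poly_int values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for GCC's machine-mode queries.  */
enum mode_class { CLASS_INT, CLASS_FLOAT, CLASS_VECTOR_INT, CLASS_VECTOR_BOOL };

struct mode_desc
{
  enum mode_class cls;
  bool rvv;        /* models riscv_v_ext_vector_mode_p */
  unsigned size;   /* byte size; plain integer instead of poly_int */
  int id;          /* distinct per mode, models mode1 == mode2 */
};

static bool modes_tieable (const struct mode_desc *m1, const struct mode_desc *m2)
{
  assert (m1 && m2);
  if (m1->rvv && m2->rvv)
    {
      /* Mask modes: the bytes actually accessed depend on vl, so never tie.  */
      if (m1->cls == CLASS_VECTOR_BOOL || m2->cls == CLASS_VECTOR_BOOL)
        return false;
      /* Other RVV modes: tie only when the sizes are known to be equal.  */
      return m1->size == m2->size;
    }
  /* Pre-existing rule: do not tie two different floating-point modes.  */
  return m1->id == m2->id
         || !(m1->cls == CLASS_FLOAT && m2->cls == CLASS_FLOAT);
}
```

As in the patch, this returns false even when both arguments are the same mask mode, although the comment only mentions different vbool*_t modes.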
>> But then GET_MODE_PRECISION (GET_MODE_INNER (..)) should always be 1?
Yes, I think so.
Let me explain RVV more clearly.
Suppose the RVV CPU has a vector length of 64 bits.
VNx1BI is exactly 1 bit.
VNx2BI is exactly 2 consecutive bits.
VNx4BI is exactly 4 consecutive bits.
VNx8BI is exactly 8 consecutive bits.
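The bit counts above follow from each vboolN_t holding VLEN / N one-bit elements. A minimal sketch, assuming that relation (the helper name is hypothetical):

```c
#include <assert.h>

/* Number of mask bits held by a vboolN_t value when the vector register
   length is VLEN bits: one bit per element, VLEN / N elements.  */
static unsigned vbool_mask_bits (unsigned vlen, unsigned n)
{
  assert (n == 1 || n == 2 || n == 4 || n == 8
          || n == 16 || n == 32 || n == 64);
  assert (vlen % n == 0);
  return vlen / n;
}
```

With vlen = 64 this gives 1, 2, 4 and 8 bits for vbool64_t, vbool32_t, vbool16_t and vbool8_t, matching VNx1BI through VNx8BI in the list.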
For VNx1BI (vbool64_t), we load it with this asm:
vsetvl e8mf8
vlm.v
For VNx2BI (vbool32_t), we load it with this asm:
vsetvl e8mf4
vlm.v
For VNx4BI (vbool16_t), we load it with this asm:
vsetvl e8mf2
vlm.v
For VNx8BI (vbool8_t), we load it with this asm:
vsetvl e8m1
vlm.v
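The vsetvl settings above follow the pattern SEW = 8, LMUL = 8 / N for vboolN_t; the pr108185 tests in the patch scan for the same e8 pairs, including m8, m4 and m2 for vbool1_t, vbool2_t and vbool4_t. A hypothetical helper encoding that table:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* LMUL used with SEW = 8 when loading a vboolN_t with vlm.v.
   The table keeps SEW / LMUL == N, i.e. LMUL = 8 / N.  */
static const char *vbool_e8_lmul (unsigned n)
{
  switch (n)
    {
    case 1: return "m8";
    case 2: return "m4";
    case 4: return "m2";
    case 8: return "m1";
    case 16: return "mf2";
    case 32: return "mf4";
    case 64: return "mf8";
    default:
      assert (!"N must be a power of two between 1 and 64");
      return NULL;
    }
}

/* Fractional LMULs ("mf2", "mf4", "mf8") use less than one register.  */
static bool is_fractional_lmul (const char *lmul)
{
  return strncmp (lmul, "mf", 2) == 0;
}
```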
Consider this code sequence:
vbool16_t v4 = *(vbool16_t *)in;
vbool8_t v3 = *(vbool8_t*)in;
Since VNx4BI (vbool16_t) is smaller than VNx8BI (vbool8_t),
we can't just reuse the data loaded for VNx4BI (vbool16_t) as VNx8BI (vbool8_t).
But we can reuse the data loaded for VNx8BI (vbool8_t) as VNx4BI (vbool16_t).
In this example, GCC assumes the data loaded for vbool8_t v3 can be replaced by the already loaded vbool16_t v4.
That is incorrect for RVV.
Maybe @kito can give us more information about the RVV ISA if I haven't explained it clearly.
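To make the direction of the reuse concrete, the following sketch simulates the two vlm.v loads from the same address with VLEN = 64 (vbool16_t reads 4 bits, vbool8_t reads 8 bits), assuming mask bit i sits in bit i % 8 of byte i / 8. The helper names are hypothetical. The 4-bit value cannot stand in for the 8-bit one, while the low 4 bits of the 8-bit value do match the narrower load.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulate vlm.v of NBITS mask bits (at most 64) from memory:
   mask bit i is taken from bit (i % 8) of byte (i / 8).  */
static uint64_t load_mask_bits (const uint8_t *mem, unsigned nbits)
{
  assert (nbits >= 1 && nbits <= 64);
  uint64_t v = 0;
  for (unsigned i = 0; i < nbits; i++)
    v |= (uint64_t) ((mem[i / 8] >> (i % 8)) & 1u) << i;
  return v;
}

/* Can a value from an earlier load of HAVE_BITS mask bits supply a later
   load of WANT_BITS bits from the same address?  Only if it covers them.  */
static bool can_reuse_load (unsigned have_bits, unsigned want_bits)
{
  return have_bits >= want_bits;
}
```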
juzhe.zhong@rivai.ai
From: Richard Biener
Date: 2023-02-13 16:07
To: juzhe.zhong
CC: Pan Li; gcc-patches; kito.cheng; richard.sandiford; ams
Subject: Re: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
On Sat, 11 Feb 2023, juzhe.zhong@rivai.ai wrote:
> Thanks for contributing this.
> Hi, Richard. Can you help us with this issue?
> In RVV, we have vbool8_t (VNx8BImode), vbool16_t (VNx4BImode), vbool32_t (VNx2BImode) and vbool64_t (VNx1BImode).
> Since we use a 1-bit mask, each BOOL occupies 1 bit.
> According to the RVV ISA, we adjust these modes as follows:
>
> VNx8BImode poly (8,8) NUNITS (each unit is a 1-bit mask)
> VNx4BImode poly (4,4) NUNITS (each unit is a 1-bit mask)
> VNx2BImode poly (2,2) NUNITS (each unit is a 1-bit mask)
> VNx1BImode poly (1,1) NUNITS (each unit is a 1-bit mask)
So how's VNx1BImode laid out for N == 2? Is that still a single
byte and two consecutive bits? I suppose so.
But then GET_MODE_PRECISION (GET_MODE_INNER (..)) should always be 1?
I'm not sure what GET_MODE_PRECISION of the vector mode itself
should be here, but then I wonder ...
> If we use GET_MODE_BITSIZE or GET_MODE_NUNITS, the values for these modes differ.
> However, GET_MODE_SIZE returns the same value for all of them (poly (1,1)).
> This makes them tieable with each other, which produces wrong code since their bit sizes differ.
> Consider this case:
> #include "riscv_vector.h"
> void foo5_3 (int32_t * restrict in, int32_t * restrict out, size_t n, int cond)
> {
> vint8m1_t v = *(vint8m1_t*)in;
> *(vint8m1_t*)out = v; vbool16_t v4 = *(vbool16_t *)in;
> *(vbool16_t *)(out + 300) = v4;
> vbool8_t v3 = *(vbool8_t*)in;
> *(vbool8_t*)(out + 200) = v3;
> }
> The second vbool8_t load (vlm.v) is missing. Since GCC gives "v3 = VIEW_CONVERT (vbool8_t) v4" in gimple.
> We failed to fix it in RISC-V backend. Can you help us with this? Thanks.
... why for the loads the "padding" is not loaded? The above testcase
is probably more complicated than necessary as well?
Thanks,
Richard.
>
> juzhe.zhong@rivai.ai
>
> From: incarnation.p.lee
> Date: 2023-02-11 16:46
> To: gcc-patches
> CC: juzhe.zhong; kito.cheng; rguenther; Pan Li
> Subject: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
> From: Pan Li <incarnation.p.lee@outlook.com>
>
> Fix the mode tieable bug for the rvv bool types. The vbool*_t
> types cannot be tied, as the actual load/store size is determined by
> vl. The mode sizes of the rvv bool types are also adjusted for the
> underlying optimization passes. The rvv bool types are vbool*_t, aka
> vbool1_t, vbool2_t, vbool4_t, vbool8_t, vbool16_t, vbool32_t, and
> vbool64_t.
>
> PR 108185
> PR 108654
>
> gcc/ChangeLog:
>
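> * config/riscv/riscv-modes.def (ADJUST_BYTESIZE): Use riscv_v_adjust_bytesize.
> * config/riscv/riscv.cc (riscv_v_adjust_bytesize): New function.
> (riscv_modes_tieable_p): Don't tie vbool*_t modes.
> * config/riscv/riscv.h (riscv_v_adjust_bytesize): New declaration.
> * machmode.h (VECTOR_BOOL_MODE_P): New macro.
> * tree-ssa-sccvn.cc (visit_reference_op_load): Check
> targetm.modes_tieable_p before type punning.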
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/pr108185-1.c: New test.
> * gcc.target/riscv/pr108185-2.c: New test.
> * gcc.target/riscv/pr108185-3.c: New test.
> * gcc.target/riscv/pr108185-4.c: New test.
> * gcc.target/riscv/pr108185-5.c: New test.
> * gcc.target/riscv/pr108185-6.c: New test.
> * gcc.target/riscv/pr108185-7.c: New test.
> * gcc.target/riscv/pr108185-8.c: New test.
>
> Signed-off-by: Pan Li <incarnation.p.lee@outlook.com>
> ---
> gcc/config/riscv/riscv-modes.def | 14 ++--
> gcc/config/riscv/riscv.cc | 34 ++++++++-
> gcc/config/riscv/riscv.h | 2 +
> gcc/machmode.h | 3 +
> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
> gcc/tree-ssa-sccvn.cc | 13 +++-
> 13 files changed, 608 insertions(+), 11 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>
> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
> index d5305efa8a6..cc21d3c83a2 100644
> --- a/gcc/config/riscv/riscv-modes.def
> +++ b/gcc/config/riscv/riscv-modes.def
> @@ -64,13 +64,13 @@ ADJUST_ALIGNMENT (VNx16BI, 1);
> ADJUST_ALIGNMENT (VNx32BI, 1);
> ADJUST_ALIGNMENT (VNx64BI, 1);
> -ADJUST_BYTESIZE (VNx1BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> -ADJUST_BYTESIZE (VNx2BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> -ADJUST_BYTESIZE (VNx4BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> -ADJUST_BYTESIZE (VNx8BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> -ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> -ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> -ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
> +ADJUST_BYTESIZE (VNx1BI, riscv_v_adjust_bytesize (VNx1BImode, 1));
> +ADJUST_BYTESIZE (VNx2BI, riscv_v_adjust_bytesize (VNx2BImode, 1));
> +ADJUST_BYTESIZE (VNx4BI, riscv_v_adjust_bytesize (VNx4BImode, 1));
> +ADJUST_BYTESIZE (VNx8BI, riscv_v_adjust_bytesize (VNx8BImode, 1));
> +ADJUST_BYTESIZE (VNx16BI, riscv_v_adjust_bytesize (VNx16BImode, 2));
> +ADJUST_BYTESIZE (VNx32BI, riscv_v_adjust_bytesize (VNx32BImode, 4));
> +ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_bytesize (VNx64BImode, 8));
> /*
> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
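To make the new ADJUST_BYTESIZE entries easier to follow, here is a scalar model of the riscv_v_adjust_bytesize helper that this patch adds to riscv.cc. It is only a sketch: poly_int values are collapsed to plain ints (one fixed VLEN), and the parameters are hypothetical stand-ins for riscv_v_ext_vector_mode_p (mode), GET_MODE_SIZE (mode), BYTES_PER_RISCV_VECTOR and the scale argument passed from riscv-modes.def.

```c
#include <assert.h>
#include <stdbool.h>

/* Scalar model of riscv_v_adjust_bytesize (hypothetical parameter names):
     is_rvv_ext_mode  ~ riscv_v_ext_vector_mode_p (mode)
     mode_size        ~ GET_MODE_SIZE (mode)
     bytes_per_vector ~ BYTES_PER_RISCV_VECTOR
     scale            ~ constant passed from riscv-modes.def  */
static int
adjust_bytesize (bool is_rvv_ext_mode, int mode_size,
                 int bytes_per_vector, int scale)
{
  if (is_rvv_ext_mode)
    {
      assert (mode_size > 0 && bytes_per_vector > 0);
      /* Never report more than one vector register's worth of bytes.  */
      return mode_size < bytes_per_vector ? mode_size : bytes_per_vector;
    }
  /* Otherwise fall back to the fixed size given in the .def file.  */
  return scale;
}
```

For example, with BYTES_PER_RISCV_VECTOR = 8, a 2-byte mode stays at 2 while a 16-byte size is clamped to 8; when the mode is not an RVV extension mode, the scale from the .def file is returned unchanged.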
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 3b7804b7501..138c052e13c 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -1003,6 +1003,27 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
> return scale;
> }
> +/* Called from ADJUST_BYTESIZE in riscv-modes.def. Return the correct
> + byte size for the corresponding machine_mode. */
> +
> +poly_int64
> +riscv_v_adjust_bytesize (machine_mode mode, int scale)
> +{
> + gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
> +
> + if (riscv_v_ext_vector_mode_p (mode))
> + {
> + poly_uint16 mode_size = GET_MODE_SIZE (mode);
> +
> + if (known_lt (mode_size, BYTES_PER_RISCV_VECTOR))
> + return mode_size;
> + else
> + return BYTES_PER_RISCV_VECTOR;
> + }
> +
> + return scale;
> +}
> +
> /* Return true if X is a valid address for machine mode MODE. If it is,
> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
> effect. */
> @@ -5807,11 +5828,22 @@ riscv_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
> /* Implement TARGET_MODES_TIEABLE_P.
> Don't allow floating-point modes to be tied, since type punning of
> - single-precision and double-precision is implementation defined. */
> + single-precision and double-precision is implementation defined.
> +
> + Don't allow different vbool*_t modes to be tied, since the type
> + size is determined by vl. */
> static bool
> riscv_modes_tieable_p (machine_mode mode1, machine_mode mode2)
> {
> + if (riscv_v_ext_vector_mode_p (mode1) && riscv_v_ext_vector_mode_p (mode2))
> + {
> + if (VECTOR_BOOL_MODE_P (mode1) || VECTOR_BOOL_MODE_P (mode2))
> + return false;
> +
> + return known_eq (GET_MODE_SIZE (mode1), GET_MODE_SIZE (mode2));
> + }
> +
> return (mode1 == mode2
> || !(GET_MODE_CLASS (mode1) == MODE_FLOAT
> && GET_MODE_CLASS (mode2) == MODE_FLOAT));
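The decision logic that riscv_modes_tieable_p gains in this patch can be modelled in a few lines of C. This is a hypothetical sketch: struct mode_info and its fields stand in for the predicates the real hook queries on a machine_mode, sizes are plain ints rather than poly_int, and mode identity is modelled by an id field.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a machine_mode.  */
struct mode_info
{
  bool rvv_ext;     /* riscv_v_ext_vector_mode_p (mode) */
  bool vector_bool; /* VECTOR_BOOL_MODE_P (mode) */
  bool is_float;    /* GET_MODE_CLASS (mode) == MODE_FLOAT */
  int size;         /* GET_MODE_SIZE (mode), fixed VLEN assumed */
  int id;           /* identity, to model mode1 == mode2 */
};

static bool
modes_tieable (struct mode_info m1, struct mode_info m2)
{
  if (m1.rvv_ext && m2.rvv_ext)
    {
      /* Any vbool*_t mode refuses to tie, even with itself.  */
      if (m1.vector_bool || m2.vector_bool)
        return false;
      /* Other RVV modes tie only when their sizes match.  */
      return m1.size == m2.size;
    }
  /* Pre-existing rule: don't tie two distinct floating-point modes.  */
  return m1.id == m2.id || !(m1.is_float && m2.is_float);
}
```

Under this logic two vbool*_t modes are never tieable, including a mode with itself, while other RVV modes tie only when their sizes are equal; the pre-existing floating-point rule applies only when at least one mode is not an RVV extension mode.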
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index faffd5a77fe..f857223338c 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -1028,6 +1028,8 @@ extern unsigned riscv_stack_boundary;
> extern unsigned riscv_bytes_per_vector_chunk;
> extern poly_uint16 riscv_vector_chunks;
> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
> +extern poly_int64 riscv_v_adjust_bytesize (machine_mode mode, int scale);
> +
> /* The number of bits and bytes in a RVV vector. */
> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
> diff --git a/gcc/machmode.h b/gcc/machmode.h
> index f1865c1ef42..6720472f2c9 100644
> --- a/gcc/machmode.h
> +++ b/gcc/machmode.h
> @@ -242,6 +242,9 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
> || CLASS == MODE_ACCUM \
> || CLASS == MODE_UACCUM)
> +/* Nonzero if MODE is a vector bool mode. */
> +#define VECTOR_BOOL_MODE_P(MODE) (GET_MODE_CLASS(MODE) == MODE_VECTOR_BOOL)
> +
> /* An optional T (i.e. a T or nothing), where T is some form of mode class. */
> template<typename T>
> class opt_mode
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> new file mode 100644
> index 00000000000..c3d0b10271a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> new file mode 100644
> index 00000000000..bd13ba916da
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> new file mode 100644
> index 00000000000..99928f7b1cc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> new file mode 100644
> index 00000000000..e70284fada8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> new file mode 100644
> index 00000000000..575a7842cdf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> new file mode 100644
> index 00000000000..95a11d37016
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> new file mode 100644
> index 00000000000..8f6f0b11f09
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> new file mode 100644
> index 00000000000..d96959dd064
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> @@ -0,0 +1,77 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> index 028bedbc9a0..19fdba8cfa2 100644
> --- a/gcc/tree-ssa-sccvn.cc
> +++ b/gcc/tree-ssa-sccvn.cc
> @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. If not see
> #include "gimple-fold.h"
> #include "tree-eh.h"
> #include "gimplify.h"
> +#include "target.h"
> #include "flags.h"
> #include "dojump.h"
> #include "explow.h"
> @@ -5657,10 +5658,16 @@ visit_reference_op_load (tree lhs, tree op, gimple *stmt)
> if (result
> && !useless_type_conversion_p (TREE_TYPE (result), TREE_TYPE (op)))
> {
> + machine_mode result_mode = TYPE_MODE (TREE_TYPE (result));
> + machine_mode op_mode = TYPE_MODE (TREE_TYPE (op));
> + poly_uint16 result_mode_precision = GET_MODE_PRECISION (result_mode);
> + poly_uint16 op_mode_precision = GET_MODE_PRECISION (op_mode);
> +
> /* Avoid the type punning in case the result mode has padding where
> - the op we lookup has not. */
> - if (maybe_lt (GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (result))),
> - GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op)))))
> + the op we lookup has not.
> + Avoid the type punning in case the target mode cannot be tied. */
> + if (maybe_lt (result_mode_precision, op_mode_precision)
> + || !targetm.modes_tieable_p (result_mode, op_mode))
> result = NULL_TREE;
> else
> {
>
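The guard this patch adds to visit_reference_op_load can likewise be modelled as a small predicate. Everything here is hypothetical: modes are plain ints, precisions are fixed integers instead of poly_int, and the function pointer stands in for targetm.modes_tieable_p. It models only the decision taken in the branch shown in the hunk above, i.e. when the two types are not a useless conversion of each other.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for targetm.modes_tieable_p (hypothetical).  */
typedef bool (*tieable_hook) (int mode1, int mode2);

/* May a value already computed in RESULT_MODE replace a new load in
   OP_MODE?  */
static bool
can_reuse_value (int result_mode, unsigned result_precision,
                 int op_mode, unsigned op_precision, tieable_hook tieable)
{
  /* Existing check: result mode has padding where op has not.  */
  if (result_precision < op_precision)
    return false;
  /* New check: the target refuses to tie the two modes.  */
  if (!tieable (result_mode, op_mode))
    return false;
  return true;
}

/* Example hook: hypothetical mode numbers >= 100 are "vector bool".  */
static bool
example_tieable (int mode1, int mode2)
{
  return !(mode1 >= 100 || mode2 >= 100);
}
```

With equal precisions the precision test alone would allow the reuse; in this model it is the tieable hook that rejects the vector bool case.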
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
On Mon, 13 Feb 2023, juzhe.zhong@rivai.ai wrote:
> >> But then GET_MODE_PRECISION (GET_MODE_INNER (..)) should always be 1?
> Yes, I think so.
>
> Let's explain RVV more clearly.
> Let's suppose we have vector-length = 64bits in RVV CPU.
> VNx1BI is exactly 1 consecutive bits.
> VNx2BI is exactly 2 consecutive bits.
> VNx4BI is exactly 4 consecutive bits.
> VNx8BI is exactly 8 consecutive bits.
>
> For VNx1BI (vbool64_t ), we load it wich this asm:
> vsetvl e8mf8
> vlm.v
>
> For VNx2BI (vbool32_t ), we load it wich this asm:
> vsetvl e8mf4
> vlm.v
>
> For VNx4BI (vbool16_t ), we load it wich this asm:
> vsetvl e8mf2
> vlm.v
>
> For VNx8BI (vbool8_t ), we load it wich this asm:
> vsetvl e8m1
> vlm.v
>
> Consider this code sequence:
> vbool16_t v4 = *(vbool16_t *)in;
> vbool8_t v3 = *(vbool8_t*)in;
>
> Since VNx4BI (vbool16_t) is smaller than VNx8BI (vbool8_t),
> we can't just use the data loaded by VNx4BI (vbool16_t) as VNx8BI (vbool8_t).
> But we can use the data loaded by VNx8BI (vbool8_t) as VNx4BI (vbool16_t).
>
> In this example, GCC thinks the data loaded for vbool8_t v3 can be replaced by vbool16_t v4, which is already loaded.
> That is incorrect for RVV.
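The following C model shows why reuse is safe in only one direction. It assumes the RVV 1.0 behaviour that vlm.v transfers ceil(vl/8) whole bytes. It also assumes vl is VLMAX, that is VLEN/N, for a vboolN_t.

- At VLEN = 128, a vbool16_t load moves one byte and a vbool8_t load moves two. The first value cannot stand in for the second, but the reverse works.
- At VLEN = 64, both loads move the same single byte in this model, which matches Richard's point about whole bytes.

VLEN is not known at compile time, so the compiler has to assume the case where the byte counts differ. All names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_VLEN_BYTES 64 /* room for VLEN up to 512 bits in this sketch */

/* Hypothetical model of "vsetvli zero, zero, e8, <lmul>; vlm.v":
   copy ceil(vl/8) whole bytes from MEM into REG.  Bytes past that are
   zeroed only to keep the model deterministic; real hardware gives no
   such guarantee for the tail. */
static void
model_vlm (uint8_t reg[MAX_VLEN_BYTES], const uint8_t *mem, unsigned vl)
{
  unsigned evl = (vl + 7) / 8;
  assert (evl <= MAX_VLEN_BYTES);
  memset (reg, 0, MAX_VLEN_BYTES);
  memcpy (reg, mem, evl);
}

/* vl of a VLMAX operation on vboolN_t: VLEN / N mask elements. */
static unsigned
vbool_vl (unsigned vlen, unsigned n)
{
  return vlen / n;
}

/* Return nonzero if the first vl(N_WANTED) mask bits left by a
   vboolN_LOADED load equal those of a real vboolN_WANTED load. */
static int
reuse_is_safe (unsigned vlen, unsigned n_loaded, unsigned n_wanted,
               const uint8_t *mem)
{
  uint8_t loaded[MAX_VLEN_BYTES], wanted[MAX_VLEN_BYTES];
  unsigned bits = vbool_vl (vlen, n_wanted);

  model_vlm (loaded, mem, vbool_vl (vlen, n_loaded));
  model_vlm (wanted, mem, bits);
  for (unsigned i = 0; i < bits; i++)
    if (((loaded[i / 8] >> (i % 8)) & 1) != ((wanted[i / 8] >> (i % 8)) & 1))
      return 0;
  return 1;
}
```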
OK, so the 'vlm.v' instruction will zero the padding bits (according to
vsetvl), but I suspect the memory subsystem still loads a whole byte.
Then GET_MODE_PRECISION of VNx4BI has to be smaller than
GET_MODE_PRECISION of VNx8BI, even if their size is the same.
I suppose that ADJUST_NUNITS should be able to do this, but then we
have in aarch64-modes.def
VECTOR_BOOL_MODE (VNx16BI, 16, BI, 2);
VECTOR_BOOL_MODE (VNx8BI, 8, BI, 2);
VECTOR_BOOL_MODE (VNx4BI, 4, BI, 2);
VECTOR_BOOL_MODE (VNx2BI, 2, BI, 2);
ADJUST_NUNITS (VNx16BI, aarch64_sve_vg * 8);
ADJUST_NUNITS (VNx8BI, aarch64_sve_vg * 4);
ADJUST_NUNITS (VNx4BI, aarch64_sve_vg * 2);
ADJUST_NUNITS (VNx2BI, aarch64_sve_vg);
so all VNxMBI modes are 2 bytes in size and their component is always
BImode, but IIRC the elements of VNx2BImode occupy 4 bits each?
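The aarch64 question can be checked with simple arithmetic. This sketch assumes that an SVE predicate register holds one bit per byte of a data vector, i.e. aarch64_sve_vg * 8 bits. It divides that by each mode's ADJUST_NUNITS value from the definitions quoted above. By this arithmetic, VNx2BI elements occupy 8 bits each, VNx4BI elements 4 bits, VNx8BI elements 2 bits and VNx16BI elements 1 bit. The helper names are hypothetical.

```c
#include <assert.h>

/* Predicate bits in an SVE P register, assuming one bit per byte of a
   Z register.  VG is the vector length in 64-bit granules. */
static unsigned
sve_pred_bits (unsigned vg)
{
  return vg * 8;
}

/* Bits occupied by each element of VNx<M>BI, whose ADJUST_NUNITS value
   is vg * M / 2 (VNx16BI: vg * 8, ..., VNx2BI: vg). */
static unsigned
sve_bits_per_element (unsigned vg, unsigned m)
{
  assert (m == 2 || m == 4 || m == 8 || m == 16);
  unsigned nunits = vg * m / 2;
  return sve_pred_bits (vg) / nunits;
}
```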
For riscv we have
VECTOR_BOOL_MODE (VNx1BI, 1, BI, 1);
ADJUST_NUNITS (VNx1BI, riscv_v_adjust_nunits (VNx1BImode, 1));
so here it would be natural to set the mode precision to
a poly-int computed by the component precision times nunits? OTOH
we have to look at the component precision vs. size as well and
/* Single bit mode used for booleans. */
BOOL_MODE (BI, 1, 1);
BOOL_MODE is not documented, but its arguments are the precision and
the size, so BImode has a size of 1. That makes VECTOR_BOOL_MODE very special since
the layout isn't derived from the component mode. Deriving the
layout from the precision would make aarch64 incorrect and
would need BI2 and BI4 modes at least.
Adding a parameter to ADJUST_NUNITS might be the way to go instead,
specifying the number of bits in a component?
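Here is a sketch of that suggestion using plain integers. Suppose ADJUST_NUNITS also took the number of bits each component occupies; that extra parameter is hypothetical and not an existing GCC interface. The mode precision could then be derived as nunits * component_bits, with the byte size rounded up from it. For riscv every component is 1 bit, so at VLEN = 64 VNx4BI and VNx8BI keep the same 1-byte size but get precisions of 4 and 8. The second value is the one the existing maybe_lt precision check can act on. An aarch64-style VNx2BI with 8-bit components and VG = 2 comes out as 16 bits in 2 bytes.

```c
#include <assert.h>

/* Hypothetical layout record produced by an ADJUST_NUNITS variant that
   also knows how many bits each boolean component occupies. */
struct bool_mode_layout
{
  unsigned nunits;
  unsigned precision_bits;
  unsigned size_bytes;
};

static struct bool_mode_layout
layout_bool_mode (unsigned nunits, unsigned component_bits)
{
  struct bool_mode_layout l;

  assert (nunits > 0 && component_bits > 0);
  l.nunits = nunits;
  l.precision_bits = nunits * component_bits;
  l.size_bytes = (l.precision_bits + 7) / 8; /* round up to whole bytes */
  return l;
}
```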
Richard.
> Maybe @kito can give us more information about the RVV ISA if I haven't explained it clearly.
>
>
> juzhe.zhong@rivai.ai
>
> From: Richard Biener
> Date: 2023-02-13 16:07
> To: juzhe.zhong
> CC: Pan Li; gcc-patches; kito.cheng; richard.sandiford; ams
> Subject: Re: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
> On Sat, 11 Feb 2023, juzhe.zhong@rivai.ai wrote:
>
> > Thanks for contributing this.
> > Hi, Richard. Can you help us with this issue?
> > In RVV, we have vbool8_t (VNx8BImode), vbool16_t (VNx4BImode), vbool32_t (VNx2BImode), vbool64_t (VNx1BImode)
> > Since we use 1-bit masks, each BOOL occupies 1 bit.
> > According to the RVV ISA, we adjust these modes as follows:
> >
> > VNx8BImode poly (8,8) NUNITS (each unit is a 1-bit mask)
> > VNx4BImode poly (4,4) NUNITS (each unit is a 1-bit mask)
> > VNx2BImode poly (2,2) NUNITS (each unit is a 1-bit mask)
> > VNx1BImode poly (1,1) NUNITS (each unit is a 1-bit mask)
>
> So how's VNx1BImode laid out for N == 2? Is that still a single
> byte and two consecutive bits? I suppose so.
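This sketch assumes the usual RVV mask layout. Mask element i lives in bit i of the register and therefore in bit i % 8 of byte i / 8 in memory. Under that layout, the two elements of VNx1BI at N == 2 occupy two consecutive bits of a single byte. The helpers below are hypothetical and show that indexing.

```c
#include <stdint.h>

/* Read mask element I from a byte array laid out like a stored mask. */
static int
mask_get (const uint8_t *mask, unsigned i)
{
  return (mask[i / 8] >> (i % 8)) & 1;
}

/* Set (VALUE != 0) or clear (VALUE == 0) mask element I. */
static void
mask_set (uint8_t *mask, unsigned i, int value)
{
  uint8_t bit = (uint8_t) (1u << (i % 8));

  if (value)
    mask[i / 8] |= bit;
  else
    mask[i / 8] &= (uint8_t) ~bit;
}
```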
>
> But then GET_MODE_PRECISION (GET_MODE_INNER (..)) should always be 1?
>
> I'm not sure what GET_MODE_PRECISION of the vector mode itself
> should be here, but then I wonder ...
>
> > If we use GET_MODE_BITSIZE or GET_MODE_NUNITS, the values for these modes differ.
> > However, if we use GET_MODE_SIZE, they are all the same (poly (1,1)).
> > This lets the modes be tied together and produces wrong code, since their bit sizes differ.
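The situation can be modeled with degree-1 poly_int values of the form c0 + c1 * x, where x >= 0 is unknown at compile time.

- The sizes poly (1,1) compare known-equal, so size-based checks treat the modes as interchangeable.
- The element counts poly (4,4) and poly (8,8) are not known-equal.

The struct and helpers below are a simplified stand-in, not GCC's poly_int API. They assume non-negative coefficients and an unbounded x.

```c
#include <stdbool.h>

/* Simplified degree-1 polynomial c0 + c1 * x, with x >= 0 unknown at
   compile time (a stand-in for values such as poly_uint16). */
struct poly1
{
  unsigned c0, c1;
};

/* True if A == B for every x >= 0. */
static bool
poly_known_eq (struct poly1 a, struct poly1 b)
{
  return a.c0 == b.c0 && a.c1 == b.c1;
}

/* True if A < B for at least one x >= 0: either at x = 0, or for
   large x when B grows faster. */
static bool
poly_maybe_lt (struct poly1 a, struct poly1 b)
{
  return a.c0 < b.c0 || a.c1 < b.c1;
}
```

For example, {2, 0} and {1, 1} are each "maybe less than" the other, depending on the runtime value of x.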
> > Consider this case:
> > #include "riscv_vector.h"
> > void foo5_3 (int32_t * restrict in, int32_t * restrict out, size_t n, int cond)
> > {
> > vint8m1_t v = *(vint8m1_t*)in;
> > *(vint8m1_t*)out = v;
> > vbool16_t v4 = *(vbool16_t *)in;
> > *(vbool16_t *)(out + 300) = v4;
> > vbool8_t v3 = *(vbool8_t*)in;
> > *(vbool8_t*)(out + 200) = v3;
> > }
> > The second load (vlm.v, for vbool8_t) is missing, because GCC produces "v3 = VIEW_CONVERT (vbool8_t) v4" in GIMPLE.
> > We have not been able to fix it in the RISC-V backend. Can you help us with this? Thanks.
>
> ... why is the "padding" not loaded for these loads? The above testcase
> is probably also more complicated than necessary?
>
> Thanks,
> Richard.
> >
> > juzhe.zhong@rivai.ai
> >
> > From: incarnation.p.lee
> > Date: 2023-02-11 16:46
> > To: gcc-patches
> > CC: juzhe.zhong; kito.cheng; rguenther; Pan Li
> > Subject: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
> > From: Pan Li <incarnation.p.lee@outlook.com>
> >
> > Fix the mode-tieable bug for the rvv bool types. The vbool*_t
> > types cannot be tied, as the actual load/store size is determined
> > by vl. The mode sizes of the rvv bool types are also adjusted for
> > the underlying optimization passes. The rvv bool types are vbool*_t,
> > i.e. vbool1_t, vbool2_t, vbool4_t, vbool8_t, vbool16_t, vbool32_t,
> > and vbool64_t.
> >
> > PR 108185
> > PR 108654
> >
> > gcc/ChangeLog:
> >
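> > * config/riscv/riscv-modes.def (ADJUST_BYTESIZE): Use
> > riscv_v_adjust_bytesize for the VNx*BI modes.
> > * config/riscv/riscv.cc (riscv_v_adjust_bytesize): New function.
> > (riscv_modes_tieable_p): Do not tie RVV vector bool modes.
> > * config/riscv/riscv.h (riscv_v_adjust_bytesize): New declaration.
> > * machmode.h (VECTOR_BOOL_MODE_P): New macro.
> > * tree-ssa-sccvn.cc: Include target.h.
> > (visit_reference_op_load): Also reject type punning when
> > targetm.modes_tieable_p returns false.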
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/pr108185-1.c: New test.
> > * gcc.target/riscv/pr108185-2.c: New test.
> > * gcc.target/riscv/pr108185-3.c: New test.
> > * gcc.target/riscv/pr108185-4.c: New test.
> > * gcc.target/riscv/pr108185-5.c: New test.
> > * gcc.target/riscv/pr108185-6.c: New test.
> > * gcc.target/riscv/pr108185-7.c: New test.
> > * gcc.target/riscv/pr108185-8.c: New test.
> >
> > Signed-off-by: Pan Li <incarnation.p.lee@outlook.com>
> > ---
> > gcc/config/riscv/riscv-modes.def | 14 ++--
> > gcc/config/riscv/riscv.cc | 34 ++++++++-
> > gcc/config/riscv/riscv.h | 2 +
> > gcc/machmode.h | 3 +
> > gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
> > gcc/tree-ssa-sccvn.cc | 13 +++-
> > 13 files changed, 608 insertions(+), 11 deletions(-)
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >
> > diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
> > index d5305efa8a6..cc21d3c83a2 100644
> > --- a/gcc/config/riscv/riscv-modes.def
> > +++ b/gcc/config/riscv/riscv-modes.def
> > @@ -64,13 +64,13 @@ ADJUST_ALIGNMENT (VNx16BI, 1);
> > ADJUST_ALIGNMENT (VNx32BI, 1);
> > ADJUST_ALIGNMENT (VNx64BI, 1);
> > -ADJUST_BYTESIZE (VNx1BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > -ADJUST_BYTESIZE (VNx2BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > -ADJUST_BYTESIZE (VNx4BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > -ADJUST_BYTESIZE (VNx8BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > -ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > -ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > -ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
> > +ADJUST_BYTESIZE (VNx1BI, riscv_v_adjust_bytesize (VNx1BImode, 1));
> > +ADJUST_BYTESIZE (VNx2BI, riscv_v_adjust_bytesize (VNx2BImode, 1));
> > +ADJUST_BYTESIZE (VNx4BI, riscv_v_adjust_bytesize (VNx4BImode, 1));
> > +ADJUST_BYTESIZE (VNx8BI, riscv_v_adjust_bytesize (VNx8BImode, 1));
> > +ADJUST_BYTESIZE (VNx16BI, riscv_v_adjust_bytesize (VNx16BImode, 2));
> > +ADJUST_BYTESIZE (VNx32BI, riscv_v_adjust_bytesize (VNx32BImode, 4));
> > +ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_bytesize (VNx64BImode, 8));
> > /*
> > | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index 3b7804b7501..138c052e13c 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -1003,6 +1003,27 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
> > return scale;
> > }
> > +/* Call from ADJUST_BYTESIZE in riscv-modes.def. Return the correct
> > + byte size for the corresponding machine_mode. */
> > +
> > +poly_int64
> > +riscv_v_adjust_bytesize (machine_mode mode, int scale)
> > +{
> > + gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
> > +
> > + if (riscv_v_ext_vector_mode_p (mode))
> > + {
> > + poly_uint16 mode_size = GET_MODE_SIZE (mode);
> > +
> > + if (known_lt (mode_size, BYTES_PER_RISCV_VECTOR))
> > + return mode_size;
> > + else
> > + return BYTES_PER_RISCV_VECTOR;
> > + }
> > +
> > + return scale;
> > +}
> > +
> > /* Return true if X is a valid address for machine mode MODE. If it is,
> > fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
> > effect. */
> > @@ -5807,11 +5828,22 @@ riscv_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
> > /* Implement TARGET_MODES_TIEABLE_P.
> > Don't allow floating-point modes to be tied, since type punning of
> > - single-precision and double-precision is implementation defined. */
> > + single-precision and double-precision is implementation defined.
> > +
> > + Don't allow different vbool*_t modes to be tied, since the type
> > + size is determined by vl. */
> > static bool
> > riscv_modes_tieable_p (machine_mode mode1, machine_mode mode2)
> > {
> > + if (riscv_v_ext_vector_mode_p (mode1) && riscv_v_ext_vector_mode_p (mode2))
> > + {
> > + if (VECTOR_BOOL_MODE_P (mode1) || VECTOR_BOOL_MODE_P (mode2))
> > + return false;
> > +
> > + return known_eq (GET_MODE_SIZE (mode1), GET_MODE_SIZE (mode2));
> > + }
> > +
> > return (mode1 == mode2
> > || !(GET_MODE_CLASS (mode1) == MODE_FLOAT
> > && GET_MODE_CLASS (mode2) == MODE_FLOAT));
> > diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> > index faffd5a77fe..f857223338c 100644
> > --- a/gcc/config/riscv/riscv.h
> > +++ b/gcc/config/riscv/riscv.h
> > @@ -1028,6 +1028,8 @@ extern unsigned riscv_stack_boundary;
> > extern unsigned riscv_bytes_per_vector_chunk;
> > extern poly_uint16 riscv_vector_chunks;
> > extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
> > +extern poly_int64 riscv_v_adjust_bytesize (machine_mode mode, int scale);
> > +
> > /* The number of bits and bytes in a RVV vector. */
> > #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
> > #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
> > diff --git a/gcc/machmode.h b/gcc/machmode.h
> > index f1865c1ef42..6720472f2c9 100644
> > --- a/gcc/machmode.h
> > +++ b/gcc/machmode.h
> > @@ -242,6 +242,9 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
> > || CLASS == MODE_ACCUM \
> > || CLASS == MODE_UACCUM)
> > +/* Nonzero if MODE is a vector bool mode. */
> > +#define VECTOR_BOOL_MODE_P(MODE) (GET_MODE_CLASS (MODE) == MODE_VECTOR_BOOL)
> > +
> > /* An optional T (i.e. a T or nothing), where T is some form of mode class. */
> > template<typename T>
> > class opt_mode
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > new file mode 100644
> > index 00000000000..c3d0b10271a
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > new file mode 100644
> > index 00000000000..bd13ba916da
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > new file mode 100644
> > index 00000000000..99928f7b1cc
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> > new file mode 100644
> > index 00000000000..e70284fada8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> > new file mode 100644
> > index 00000000000..575a7842cdf
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> > new file mode 100644
> > index 00000000000..95a11d37016
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> > new file mode 100644
> > index 00000000000..8f6f0b11f09
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> > new file mode 100644
> > index 00000000000..d96959dd064
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> > @@ -0,0 +1,77 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> > diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> > index 028bedbc9a0..19fdba8cfa2 100644
> > --- a/gcc/tree-ssa-sccvn.cc
> > +++ b/gcc/tree-ssa-sccvn.cc
> > @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. If not see
> > #include "gimple-fold.h"
> > #include "tree-eh.h"
> > #include "gimplify.h"
> > +#include "target.h"
> > #include "flags.h"
> > #include "dojump.h"
> > #include "explow.h"
> > @@ -5657,10 +5658,16 @@ visit_reference_op_load (tree lhs, tree op, gimple *stmt)
> > if (result
> > && !useless_type_conversion_p (TREE_TYPE (result), TREE_TYPE (op)))
> > {
> > + machine_mode result_mode = TYPE_MODE (TREE_TYPE (result));
> > + machine_mode op_mode = TYPE_MODE (TREE_TYPE (op));
> > + poly_uint16 result_mode_precision = GET_MODE_PRECISION (result_mode);
> > + poly_uint16 op_mode_precision = GET_MODE_PRECISION (op_mode);
> > +
> > /* Avoid the type punning in case the result mode has padding where
> > - the op we lookup has not. */
> > - if (maybe_lt (GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (result))),
> > - GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op)))))
> > + the op we lookup has not.
> > + Avoid the type punning in case the target mode cannot be tied. */
> > + if (maybe_lt (result_mode_precision, op_mode_precision)
> > + || !targetm.modes_tieable_p (result_mode, op_mode))
> > result = NULL_TREE;
> > else
> > {
> >
>
>
I am not sure that changing the inner-mode precision of BImode is correct for RVV,
since by definition the 1-bit mask elements in the RVV mask layout are consecutive.
Maybe we can wait for Kito to answer this question?
juzhe.zhong@rivai.ai
From: Richard Biener
Date: 2023-02-13 16:46
To: juzhe.zhong@rivai.ai
CC: incarnation.p.lee; gcc-patches; Kito.cheng; richard.sandiford; ams
Subject: Re: Re: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
On Mon, 13 Feb 2023, juzhe.zhong@rivai.ai wrote:
> >> But then GET_MODE_PRECISION (GET_MODE_INNER (..)) should always be 1?
> Yes, I think so.
>
> Let me explain RVV more clearly.
> Suppose the RVV CPU has a vector length of 64 bits.
> VNx1BI is exactly 1 bit.
> VNx2BI is exactly 2 consecutive bits.
> VNx4BI is exactly 4 consecutive bits.
> VNx8BI is exactly 8 consecutive bits.
>
> For VNx1BI (vbool64_t), we load it with this asm:
> vsetvl e8mf8
> vlm.v
>
> For VNx2BI (vbool32_t), we load it with this asm:
> vsetvl e8mf4
> vlm.v
>
> For VNx4BI (vbool16_t), we load it with this asm:
> vsetvl e8mf2
> vlm.v
>
> For VNx8BI (vbool8_t), we load it with this asm:
> vsetvl e8m1
> vlm.v
>
> Consider this code sequence:
> vbool16_t v4 = *(vbool16_t *)in;
> vbool8_t v3 = *(vbool8_t*)in;
>
> Since VNx4BI (vbool16_t) is smaller than VNx8BI (vbool8_t),
> we can't just use the data loaded as VNx4BI (vbool16_t) for VNx8BI (vbool8_t),
> but we can use the data loaded as VNx8BI (vbool8_t) for VNx4BI (vbool16_t).
>
> In this example, GCC thinks the data loaded for vbool8_t v3 can be replaced by vbool16_t v4, which is already loaded.
> That is incorrect for RVV.
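The reuse rule described above can be modelled in a few lines of standalone C (this is not GCC code). The sketch assumes the convention visible in the asm above, namely that vboolN_t is loaded with `vsetvli` at e8 and LMUL = 8/N, so it holds VLEN/N mask bits, and that `vlm.v` touches ceil(vl/8) bytes. The helper names `mask_bits`, `mask_bytes` and `can_reuse_mask_load` are hypothetical.

```c
#include <stdbool.h>

/* Number of mask bits held by vbool<n>_t for a VLEN given in bits.
   Assumption: vbool<n>_t uses e8 with LMUL = 8/n, so VLMAX = VLEN / n
   mask elements, one bit each.  */
static unsigned
mask_bits (unsigned vlen, unsigned n)
{
  return vlen / n;
}

/* Bytes touched by vlm.v when vl = VLMAX: ceil (vl / 8).  */
static unsigned
mask_bytes (unsigned vlen, unsigned n)
{
  return (mask_bits (vlen, n) + 7) / 8;
}

/* Mask element i lives in bit i, so a narrower mask is a prefix of a
   wider mask loaded from the same address.  A value loaded as
   vbool<src_n>_t can stand in for a later vbool<dst_n>_t load only if
   it covers at least as many bits.  */
static bool
can_reuse_mask_load (unsigned vlen, unsigned src_n, unsigned dst_n)
{
  return mask_bits (vlen, src_n) >= mask_bits (vlen, dst_n);
}
```

With VLEN = 64, vbool16_t and vbool8_t both fit in one byte (4 and 8 bits), which is why a byte-size comparison alone cannot tell them apart, while the bit counts do.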
OK, so the 'vlm.v' instruction will zero the padding bits (according to
vsetvl), though I expect the memory subsystem still loads a whole byte.
So GET_MODE_PRECISION of VNx4BI has to be smaller than
GET_MODE_PRECISION of VNx8BI, even if their size is the same.
I suppose that ADJUST_NUNITS should be able to do this, but then we
have this in aarch64-modes.def:
VECTOR_BOOL_MODE (VNx16BI, 16, BI, 2);
VECTOR_BOOL_MODE (VNx8BI, 8, BI, 2);
VECTOR_BOOL_MODE (VNx4BI, 4, BI, 2);
VECTOR_BOOL_MODE (VNx2BI, 2, BI, 2);
ADJUST_NUNITS (VNx16BI, aarch64_sve_vg * 8);
ADJUST_NUNITS (VNx8BI, aarch64_sve_vg * 4);
ADJUST_NUNITS (VNx4BI, aarch64_sve_vg * 2);
ADJUST_NUNITS (VNx2BI, aarch64_sve_vg);
so all VNxMBI modes are 2 bytes in size, and their component is always
BImode, but IIRC the elements of VNx2BImode occupy 4 bits each?
For riscv, we have:
VECTOR_BOOL_MODE (VNx1BI, 1, BI, 1);
ADJUST_NUNITS (VNx1BI, riscv_v_adjust_nunits (VNx1BImode, 1));
so here it would be natural to set the mode precision to
a poly-int computed as the component precision times nunits? OTOH,
we also have to look at the component precision vs. size, and we have:
/* Single bit mode used for booleans. */
BOOL_MODE (BI, 1, 1);
BOOL_MODE is not documented, but its arguments are the precision and
the size, so BImode has a size of 1. That makes VECTOR_BOOL_MODE very
special, since the layout isn't derived from the component mode. Deriving
the layout from the precision would make aarch64 incorrect and
would need BI2 and BI4 modes at least.
Adding a parameter to ADJUST_NUNITS might be the way to go instead,
specifying the number of bits in a component?
Richard.
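The idea of a mode precision equal to "component precision times nunits" can be illustrated with a small standalone C model (not GCC's implementation). It assumes the poly (M,M) NUNITS values quoted later in this thread for VNxMBI, 1 bit per element, and a two-coefficient stand-in for GCC's poly_int where x >= 0 is the runtime part of the vector length. The type `poly2` and all helper names are hypothetical. The `punning_refused` helper mimics the `maybe_lt` precision test in visit_reference_op_load.

```c
#include <stdbool.h>

/* Hypothetical two-coefficient poly value: c0 + c1 * x, with x >= 0.  */
typedef struct
{
  long c0, c1;
} poly2;

/* True if a < b for at least one x >= 0.  For linear functions this
   holds iff it holds at x = 0 (c0) or for large x (c1).  */
static bool
poly2_maybe_lt (poly2 a, poly2 b)
{
  return a.c0 < b.c0 || a.c1 < b.c1;
}

/* Precision in bits of VNx<m>BI if it were nunits * 1 bit, assuming
   nunits = poly (m, m).  */
static poly2
vnx_bi_precision (long m)
{
  poly2 p = { m, m };
  return p;
}

/* Concrete size in bytes for a given x: bits rounded up to whole bytes.  */
static long
vnx_bi_bytes_at (long m, long x)
{
  long bits = m * (1 + x);
  return (bits + 7) / 8;
}

/* SCCVN-style check: refuse to reuse a value of mode VNx<result_m>BI for
   a load of mode VNx<op_m>BI when its precision may be smaller.  */
static bool
punning_refused (long result_m, long op_m)
{
  return poly2_maybe_lt (vnx_bi_precision (result_m),
                         vnx_bi_precision (op_m));
}
```

In this model VNx4BI and VNx8BI have the same byte size at x = 0, yet the precision check still refuses to feed a VNx4BI value into a VNx8BI load, which is the behaviour the PR needs.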
> Maybe @kito can give us more information about the RVV ISA if I haven't explained it clearly.
>
>
> juzhe.zhong@rivai.ai
>
> From: Richard Biener
> Date: 2023-02-13 16:07
> To: juzhe.zhong
> CC: Pan Li; gcc-patches; kito.cheng; richard.sandiford; ams
> Subject: Re: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
> On Sat, 11 Feb 2023, juzhe.zhong@rivai.ai wrote:
>
> > Thanks for contributing this.
> > Hi, Richard. Can you help us with this issue?
> > In RVV, we have vbool8_t (VNx8BImode), vbool16_t (VNx4BImode), vbool32_t (VNx2BImode), and vbool64_t (VNx1BImode).
> > Since we use a 1-bit mask, each BOOL occupies 1 bit.
> > According to the RVV ISA, we adjust these modes as follows:
> >
> > VNx8BImode poly (8,8) NUNITS (each unit is a 1-bit mask)
> > VNx4BImode poly (4,4) NUNITS (each unit is a 1-bit mask)
> > VNx2BImode poly (2,2) NUNITS (each unit is a 1-bit mask)
> > VNx1BImode poly (1,1) NUNITS (each unit is a 1-bit mask)
>
> So how's VNx1BImode laid out for N == 2? Is that still a single
> byte and two consecutive bits? I suppose so.
>
> But then GET_MODE_PRECISION (GET_MODE_INNER (..)) should always be 1?
>
> I'm not sure what GET_MODE_PRECISION of the vector mode itself
> should be here, but then I wonder ...
>
> > If we use GET_MODE_BITSIZE or GET_MODE_NUNITS, the values differ.
> > However, GET_MODE_SIZE is the same for all of these modes (poly (1,1)).
> > This lets them be tied together and produces wrong code, since their bit sizes differ.
> > Consider this case:
> > #include "riscv_vector.h"
> > void foo5_3 (int32_t * restrict in, int32_t * restrict out, size_t n, int cond)
> > {
> > vint8m1_t v = *(vint8m1_t*)in;
> > *(vint8m1_t*)out = v;
> > vbool16_t v4 = *(vbool16_t *)in;
> > *(vbool16_t *)(out + 300) = v4;
> > vbool8_t v3 = *(vbool8_t*)in;
> > *(vbool8_t*)(out + 200) = v3;
> > }
> > The second mask load (the vlm.v for vbool8_t) is missing, because GCC generates "v3 = VIEW_CONVERT (vbool8_t) v4" in GIMPLE.
> > We were unable to fix this in the RISC-V backend. Can you help us with this? Thanks.
>
> ... why the "padding" is not loaded for these loads? The above testcase
> is probably also more complicated than necessary?
>
> Thanks,
> Richard.
> >
> > juzhe.zhong@rivai.ai
> >
> > From: incarnation.p.lee
> > Date: 2023-02-11 16:46
> > To: gcc-patches
> > CC: juzhe.zhong; kito.cheng; rguenther; Pan Li
> > Subject: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
> > From: Pan Li <incarnation.p.lee@outlook.com>
> >
> > Fix the bug for mode tieable of the rvv bool types. The vbool*_t
> > types cannot be tied, as the actual load/store size is determined by
> > the vl. The mode sizes of the rvv bool types are also adjusted for the
> > underlying optimization passes. The rvv bool types are vbool*_t, i.e.
> > vbool1_t, vbool2_t, vbool4_t, vbool8_t, vbool16_t, vbool32_t, and
> > vbool64_t.
> >
> > PR 108185
> > PR 108654
> >
> > gcc/ChangeLog:
> >
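> > * config/riscv/riscv-modes.def (ADJUST_BYTESIZE): Use riscv_v_adjust_bytesize for the VNx*BI modes.
> > * config/riscv/riscv.cc (riscv_v_adjust_bytesize): New function.
> > (riscv_modes_tieable_p): Don't tie RVV vector bool modes; require equal sizes for other RVV vector modes.
> > * config/riscv/riscv.h (riscv_v_adjust_bytesize): New declaration.
> > * machmode.h (VECTOR_BOOL_MODE_P): New macro.
> > * tree-ssa-sccvn.cc (visit_reference_op_load): Avoid type punning when the modes cannot be tied.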
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/pr108185-1.c: New test.
> > * gcc.target/riscv/pr108185-2.c: New test.
> > * gcc.target/riscv/pr108185-3.c: New test.
> > * gcc.target/riscv/pr108185-4.c: New test.
> > * gcc.target/riscv/pr108185-5.c: New test.
> > * gcc.target/riscv/pr108185-6.c: New test.
> > * gcc.target/riscv/pr108185-7.c: New test.
> > * gcc.target/riscv/pr108185-8.c: New test.
> >
> > Signed-off-by: Pan Li <incarnation.p.lee@outlook.com>
> > ---
> > gcc/config/riscv/riscv-modes.def | 14 ++--
> > gcc/config/riscv/riscv.cc | 34 ++++++++-
> > gcc/config/riscv/riscv.h | 2 +
> > gcc/machmode.h | 3 +
> > gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
> > gcc/tree-ssa-sccvn.cc | 13 +++-
> > 13 files changed, 608 insertions(+), 11 deletions(-)
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >
> > diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
> > index d5305efa8a6..cc21d3c83a2 100644
> > --- a/gcc/config/riscv/riscv-modes.def
> > +++ b/gcc/config/riscv/riscv-modes.def
> > @@ -64,13 +64,13 @@ ADJUST_ALIGNMENT (VNx16BI, 1);
> > ADJUST_ALIGNMENT (VNx32BI, 1);
> > ADJUST_ALIGNMENT (VNx64BI, 1);
> > -ADJUST_BYTESIZE (VNx1BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > -ADJUST_BYTESIZE (VNx2BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > -ADJUST_BYTESIZE (VNx4BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > -ADJUST_BYTESIZE (VNx8BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > -ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > -ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > -ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
> > +ADJUST_BYTESIZE (VNx1BI, riscv_v_adjust_bytesize (VNx1BImode, 1));
> > +ADJUST_BYTESIZE (VNx2BI, riscv_v_adjust_bytesize (VNx2BImode, 1));
> > +ADJUST_BYTESIZE (VNx4BI, riscv_v_adjust_bytesize (VNx4BImode, 1));
> > +ADJUST_BYTESIZE (VNx8BI, riscv_v_adjust_bytesize (VNx8BImode, 1));
> > +ADJUST_BYTESIZE (VNx16BI, riscv_v_adjust_bytesize (VNx16BImode, 2));
> > +ADJUST_BYTESIZE (VNx32BI, riscv_v_adjust_bytesize (VNx32BImode, 4));
> > +ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_bytesize (VNx64BImode, 8));
> > /*
> > | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index 3b7804b7501..138c052e13c 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -1003,6 +1003,27 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
> > return scale;
> > }
> > +/* Call from ADJUST_BYTESIZE in riscv-modes.def. Return the correct
> > + byte size for the corresponding machine_mode. */
> > +
> > +poly_int64
> > +riscv_v_adjust_bytesize (machine_mode mode, int scale)
> > +{
> > + gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
> > +
> > + if (riscv_v_ext_vector_mode_p (mode))
> > + {
> > + poly_uint16 mode_size = GET_MODE_SIZE (mode);
> > +
> > + if (known_lt (mode_size, BYTES_PER_RISCV_VECTOR))
> > + return mode_size;
> > + else
> > + return BYTES_PER_RISCV_VECTOR;
> > + }
> > +
> > + return scale;
> > +}
> > +
> > /* Return true if X is a valid address for machine mode MODE. If it is,
> > fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
> > effect. */
> > @@ -5807,11 +5828,22 @@ riscv_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
> > /* Implement TARGET_MODES_TIEABLE_P.
> > Don't allow floating-point modes to be tied, since type punning of
> > - single-precision and double-precision is implementation defined. */
> > + single-precision and double-precision is implementation defined.
> > +
> > + Don't allow different vbool*_t modes to be tied, since the type
> > + size is determined by vl. */
> > static bool
> > riscv_modes_tieable_p (machine_mode mode1, machine_mode mode2)
> > {
> > + if (riscv_v_ext_vector_mode_p (mode1) && riscv_v_ext_vector_mode_p (mode2))
> > + {
> > + if (VECTOR_BOOL_MODE_P (mode1) || VECTOR_BOOL_MODE_P (mode2))
> > + return false;
> > +
> > + return known_eq (GET_MODE_SIZE (mode1), GET_MODE_SIZE (mode2));
> > + }
> > +
> > return (mode1 == mode2
> > || !(GET_MODE_CLASS (mode1) == MODE_FLOAT
> > && GET_MODE_CLASS (mode2) == MODE_FLOAT));
> > diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> > index faffd5a77fe..f857223338c 100644
> > --- a/gcc/config/riscv/riscv.h
> > +++ b/gcc/config/riscv/riscv.h
> > @@ -1028,6 +1028,8 @@ extern unsigned riscv_stack_boundary;
> > extern unsigned riscv_bytes_per_vector_chunk;
> > extern poly_uint16 riscv_vector_chunks;
> > extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
> > +extern poly_int64 riscv_v_adjust_bytesize (machine_mode mode, int scale);
> > +
> > /* The number of bits and bytes in a RVV vector. */
> > #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
> > #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
> > diff --git a/gcc/machmode.h b/gcc/machmode.h
> > index f1865c1ef42..6720472f2c9 100644
> > --- a/gcc/machmode.h
> > +++ b/gcc/machmode.h
> > @@ -242,6 +242,9 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
> > || CLASS == MODE_ACCUM \
> > || CLASS == MODE_UACCUM)
> > +/* Nonzero if MODE is a vector bool mode. */
> > +#define VECTOR_BOOL_MODE_P(MODE) (GET_MODE_CLASS(MODE) == MODE_VECTOR_BOOL)
> > +
> > /* An optional T (i.e. a T or nothing), where T is some form of mode class. */
> > template<typename T>
> > class opt_mode
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > new file mode 100644
> > index 00000000000..c3d0b10271a
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > new file mode 100644
> > index 00000000000..bd13ba916da
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > new file mode 100644
> > index 00000000000..99928f7b1cc
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> > new file mode 100644
> > index 00000000000..e70284fada8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> > new file mode 100644
> > index 00000000000..575a7842cdf
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> > new file mode 100644
> > index 00000000000..95a11d37016
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> > new file mode 100644
> > index 00000000000..8f6f0b11f09
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> > new file mode 100644
> > index 00000000000..d96959dd064
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> > @@ -0,0 +1,77 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> > diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> > index 028bedbc9a0..19fdba8cfa2 100644
> > --- a/gcc/tree-ssa-sccvn.cc
> > +++ b/gcc/tree-ssa-sccvn.cc
> > @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. If not see
> > #include "gimple-fold.h"
> > #include "tree-eh.h"
> > #include "gimplify.h"
> > +#include "target.h"
> > #include "flags.h"
> > #include "dojump.h"
> > #include "explow.h"
> > @@ -5657,10 +5658,16 @@ visit_reference_op_load (tree lhs, tree op, gimple *stmt)
> > if (result
> > && !useless_type_conversion_p (TREE_TYPE (result), TREE_TYPE (op)))
> > {
> > + machine_mode result_mode = TYPE_MODE (TREE_TYPE (result));
> > + machine_mode op_mode = TYPE_MODE (TREE_TYPE (op));
> > + poly_uint16 result_mode_precision = GET_MODE_PRECISION (result_mode);
> > + poly_uint16 op_mode_precision = GET_MODE_PRECISION (op_mode);
> > +
> > /* Avoid the type punning in case the result mode has padding where
> > - the op we lookup has not. */
> > - if (maybe_lt (GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (result))),
> > - GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op)))))
> > + the op we lookup has not.
> > + Avoid the type punning in case the target mode cannot be tied. */
> > + if (maybe_lt (result_mode_precision, op_mode_precision)
> > + || !targetm.modes_tieable_p (result_mode, op_mode))
> > result = NULL_TREE;
> > else
> > {
> >
>
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
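The patched visit_reference_op_load decision (the existing `maybe_lt` precision test plus the new `targetm.modes_tieable_p` test) can be modelled in a few lines of C. This is a minimal sketch under stated assumptions:

- Precisions are treated as plain constants rather than poly_ints.
- Tieability follows a hypothetical rule under which a vector bool mode ties only with itself. The real riscv_modes_tieable_p body is not quoted here.
- `struct mode_desc`, `modes_tieable` and `reuse_load_p` are illustrative names, not GCC APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mode descriptor; not GCC's machine_mode.  */
struct mode_desc
{
  unsigned precision; /* stands in for GET_MODE_PRECISION (a poly_int in GCC) */
  int bool_vec_id;    /* distinct id per vector bool mode, -1 otherwise */
};

/* Assumed rule: a vector bool mode ties only with itself.  */
static bool
modes_tieable (struct mode_desc a, struct mode_desc b)
{
  if (a.bool_vec_id >= 0 || b.bool_vec_id >= 0)
    return a.bool_vec_id == b.bool_vec_id;
  return true;
}

/* Mirrors the patched test: the earlier RESULT may stand in for a load
   of OP only if it is not narrower and the two modes can be tied.  */
static bool
reuse_load_p (struct mode_desc result, struct mode_desc op)
{
  assert (result.precision > 0 && op.precision > 0);
  if (result.precision < op.precision || !modes_tieable (result, op))
    return false; /* corresponds to result = NULL_TREE */
  return true;
}
```

Under this assumed rule, an earlier vbool16_t (VNx4BI) value is never reused for a vbool8_t (VNx8BI) load. The opposite direction is rejected too. That is conservative, because by the reasoning given later in the thread that direction would be safe.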
Richard Biener <rguenther@suse.de> writes:
> On Mon, 13 Feb 2023, juzhe.zhong@rivai.ai wrote:
>
>> >> But then GET_MODE_PRECISION (GET_MODE_INNER (..)) should always be 1?
>> Yes, I think so.
>>
>> Let's explain RVV more clearly.
>> Suppose the RVV CPU has a vector length of 64 bits.
>> VNx1BI is exactly 1 consecutive bit.
>> VNx2BI is exactly 2 consecutive bits.
>> VNx4BI is exactly 4 consecutive bits.
>> VNx8BI is exactly 8 consecutive bits.
>>
>> For VNx1BI (vbool64_t), we load it with this asm:
>> vsetvl e8mf8
>> vlm.v
>>
>> For VNx2BI (vbool32_t), we load it with this asm:
>> vsetvl e8mf4
>> vlm.v
>>
>> For VNx4BI (vbool16_t), we load it with this asm:
>> vsetvl e8mf2
>> vlm.v
>>
>> For VNx8BI (vbool8_t), we load it with this asm:
>> vsetvl e8m1
>> vlm.v
>>
>> Consider this code sequence:
>> vbool16_t v4 = *(vbool16_t *)in;
>> vbool8_t v3 = *(vbool8_t*)in;
>>
>> Since VNx4BI (vbool16_t) is smaller than VNx8BI (vbool8_t),
>> we can't just reuse the data loaded for VNx4BI (vbool16_t) as VNx8BI (vbool8_t).
>> But we can reuse the data loaded for VNx8BI (vbool8_t) as VNx4BI (vbool16_t).
>>
>> In this example, GCC thinks the load for vbool8_t v3 can be replaced by the already-loaded vbool16_t v4.
>> That is incorrect for RVV.
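The argument above can be checked with a small C model. It rests on these assumptions:

- VLEN is 128. At VLEN = 64 both loads happen to touch a single byte, so the difference would not show up in a byte-granular model.
- Each vboolN_t load is `vsetvli zero, e8, LMUL = 8/N` followed by vlm.v, so vl = VLEN/N mask bits.
- vlm.v transfers ceil(vl/8) bytes, as the RVV spec describes.
- Register bytes beyond that are filled with an arbitrary 0xAA pattern that stands in for tail-agnostic contents.

`model_vlm` and `same_mask_bits` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define VLEN 128u /* assumed vector register length in bits */

/* One mask register: VLEN bits, mask element i held in bit i.  */
typedef struct
{
  uint8_t bytes[VLEN / 8];
} mask_reg;

/* Model of "vsetvli zero, e8, <lmul>; vlm.v" for vboolN_t:
   vl = VLMAX = VLEN / N mask bits, and vlm.v transfers ceil (vl / 8)
   bytes.  Bytes past that keep an arbitrary 0xAA pattern.  */
static mask_reg
model_vlm (const uint8_t *mem, unsigned n)
{
  mask_reg r;
  unsigned vl;

  assert (n >= 1 && VLEN % n == 0);
  vl = VLEN / n;
  memset (r.bytes, 0xAA, sizeof r.bytes);
  memcpy (r.bytes, mem, (vl + 7) / 8);
  return r;
}

/* Does REG hold the same first VLEN / N mask bits as a fresh
   vboolN_t load FRESH?  */
static bool
same_mask_bits (const mask_reg *reg, const mask_reg *fresh, unsigned n)
{
  unsigned vl = VLEN / n;
  for (unsigned i = 0; i < vl; i++)
    {
      unsigned a = (reg->bytes[i / 8] >> (i % 8)) & 1u;
      unsigned b = (fresh->bytes[i / 8] >> (i % 8)) & 1u;
      if (a != b)
        return false;
    }
  return true;
}
```

Load `in` = {0x0F, 0xF0, ...} both ways. `same_mask_bits (&v16, &v8, 8)` is false, because the vbool16_t load never read the second byte. `same_mask_bits (&v8, &v16, 16)` is true, because the wider vbool8_t load covers every bit a vbool16_t needs.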
>
> OK, so the 'vlm.v' instruction will zero the padding bits (according to
> vsetvl), but I doubt the memory subsystem loads less than a whole byte.
>
> Then GET_MODE_PRECISION of VNx4BI has to be smaller than
> GET_MODE_PRECISION of VNx8BI, even if their size is the same.
>
> I suppose that ADJUST_NUNITS should be able to do this, but then we
> have in aarch64-modes.def
>
> VECTOR_BOOL_MODE (VNx16BI, 16, BI, 2);
> VECTOR_BOOL_MODE (VNx8BI, 8, BI, 2);
> VECTOR_BOOL_MODE (VNx4BI, 4, BI, 2);
> VECTOR_BOOL_MODE (VNx2BI, 2, BI, 2);
>
> ADJUST_NUNITS (VNx16BI, aarch64_sve_vg * 8);
> ADJUST_NUNITS (VNx8BI, aarch64_sve_vg * 4);
> ADJUST_NUNITS (VNx4BI, aarch64_sve_vg * 2);
> ADJUST_NUNITS (VNx2BI, aarch64_sve_vg);
>
> so all VNxMBI modes are 2 bytes in size but their component is always
> BImode but IIRC the elements of VNx2BImode occupy 4 bits each?
Yeah. Only the low bit is significant, so it's still a 1-bit element.
But the padding is distributed evenly across the elements rather than
being grouped at one end of the predicate.
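Bit positions make the layout difference concrete:

- An SVE predicate has one bit per byte of the data vector. For E-byte data elements, the predicate bit for element i is bit i*E and the E-1 bits above it are padding. For VNx2BI, whose elements correspond to 64-bit data, that is 8 predicate bits per element.
- An RVV mask register holds mask element i in bit i, with any unused bits grouped at the top.

The helper names below are illustrative only.

```c
#include <assert.h>

/* SVE: one predicate bit per data byte; for ELEM_BYTES-byte elements
   only the lowest bit of each ELEM_BYTES-bit group is significant.  */
static unsigned
sve_pred_bit (unsigned i, unsigned elem_bytes)
{
  assert (elem_bytes == 1 || elem_bytes == 2 || elem_bytes == 4
          || elem_bytes == 8);
  return i * elem_bytes;
}

/* Padding bits that follow each significant SVE predicate bit.  */
static unsigned
sve_pad_per_elem (unsigned elem_bytes)
{
  return elem_bytes - 1;
}

/* RVV: mask element i is bit i; any padding is grouped at the end.  */
static unsigned
rvv_mask_bit (unsigned i)
{
  return i;
}
```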
> For riscv we have
>
> VECTOR_BOOL_MODE (VNx1BI, 1, BI, 1);
> ADJUST_NUNITS (VNx1BI, riscv_v_adjust_nunits (VNx1BImode, 1));
>
> so here it would be natural to set the mode precision to
> a poly-int computed by the component precision times nunits? OTOH
> we have to look at the component precision vs. size as well and
>
> /* Single bit mode used for booleans. */
> BOOL_MODE (BI, 1, 1);
>
> BOOL_MODE is not documented, but its arguments are precision and size, so BImode
> has a size of 1. That makes VECTOR_BOOL_MODE very special since
> the layout isn't derived from the component mode. Deriving the
> layout from the precision would make aarch64 incorrect and
> would need BI2 and BI4 modes at least.
I think the elements have to stay BI for AArch64. Using BI2 (with a
precision of 2) would make both bits significant.
I'm not sure the RVV case fits into the existing mode layout scheme.
AFAIK we don't currently support vector modes with padding at one end.
If that's right, the fix is likely to involve more than just tweaking
the mode parameters.
What's the byte size of VNx1BI, expressed as a function of N?
If it's CEIL (N, 8) then we don't have a way of representing that yet.
Thanks,
Richard
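The byte footprint follows from the RVV rule that vlm.v and vsm.v transfer ceil(vl/8) bytes. This sketch assumes whole-register accesses as in the tests above: vsetvli with AVL zero, SEW = 8 and LMUL = 8/N, so vl = VLMAX = VLEN/N for vboolN_t. The function names are hypothetical.

```c
#include <assert.h>

/* Mask bits in a vboolN_t when vl = VLMAX: SEW = 8, LMUL = 8 / N,
   so VLMAX = (VLEN / 8) * (8 / N) = VLEN / N.  */
static unsigned
vbool_bits (unsigned vlen, unsigned n)
{
  assert (n >= 1 && n <= 64 && (n & (n - 1)) == 0);
  assert (vlen >= n && vlen % n == 0);
  return vlen / n;
}

/* Bytes transferred by vlm.v/vsm.v: evl = ceil (vl / 8).  */
static unsigned
vlm_bytes (unsigned vlen, unsigned n)
{
  return (vbool_bits (vlen, n) + 7) / 8;
}
```

A vbool64_t mask has VLEN/64 bits, so its footprint is CEIL (VLEN/64, 8) bytes. That is a single byte for every VLEN up to 512 and two bytes at VLEN = 1024. This is the CEIL (N, 8) shape that, as noted above, GCC cannot yet represent.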
On Mon, 13 Feb 2023, Richard Sandiford wrote:
> Richard Biener <rguenther@suse.de> writes:
> > On Mon, 13 Feb 2023, juzhe.zhong@rivai.ai wrote:
> >
> >> >> But then GET_MODE_PRECISION (GET_MODE_INNER (..)) should always be 1?
> >> Yes, I think so.
> >>
> >> Let's explain RVV more clearly.
> >> Suppose the RVV CPU has a vector length of 64 bits.
> >> VNx1BI is exactly 1 consecutive bit.
> >> VNx2BI is exactly 2 consecutive bits.
> >> VNx4BI is exactly 4 consecutive bits.
> >> VNx8BI is exactly 8 consecutive bits.
> >>
> >> For VNx1BI (vbool64_t), we load it with this asm:
> >> vsetvl e8mf8
> >> vlm.v
> >>
> >> For VNx2BI (vbool32_t), we load it with this asm:
> >> vsetvl e8mf4
> >> vlm.v
> >>
> >> For VNx4BI (vbool16_t), we load it with this asm:
> >> vsetvl e8mf2
> >> vlm.v
> >>
> >> For VNx8BI (vbool8_t), we load it with this asm:
> >> vsetvl e8m1
> >> vlm.v
> >>
> >> Consider this code sequence:
> >> vbool16_t v4 = *(vbool16_t *)in;
> >> vbool8_t v3 = *(vbool8_t*)in;
> >>
> >> Since VNx4BI (vbool16_t) is smaller than VNx8BI (vbool8_t),
> >> we can't just reuse the data loaded for VNx4BI (vbool16_t) as VNx8BI (vbool8_t).
> >> But we can reuse the data loaded for VNx8BI (vbool8_t) as VNx4BI (vbool16_t).
> >>
> >> In this example, GCC thinks the load for vbool8_t v3 can be replaced by the already-loaded vbool16_t v4.
> >> That is incorrect for RVV.
> >
> > OK, so the 'vlm.v' instruction will zero the padding bits (according to
> > vsetvl), but I doubt the memory subsystem loads less than a whole byte.
> >
> > Then GET_MODE_PRECISION of VNx4BI has to be smaller than
> > GET_MODE_PRECISION of VNx8BI, even if their size is the same.
> >
> > I suppose that ADJUST_NUNITS should be able to do this, but then we
> > have in aarch64-modes.def
> >
> > VECTOR_BOOL_MODE (VNx16BI, 16, BI, 2);
> > VECTOR_BOOL_MODE (VNx8BI, 8, BI, 2);
> > VECTOR_BOOL_MODE (VNx4BI, 4, BI, 2);
> > VECTOR_BOOL_MODE (VNx2BI, 2, BI, 2);
> >
> > ADJUST_NUNITS (VNx16BI, aarch64_sve_vg * 8);
> > ADJUST_NUNITS (VNx8BI, aarch64_sve_vg * 4);
> > ADJUST_NUNITS (VNx4BI, aarch64_sve_vg * 2);
> > ADJUST_NUNITS (VNx2BI, aarch64_sve_vg);
> >
> > so all VNxMBI modes are 2 bytes in size but their component is always
> > BImode but IIRC the elements of VNx2BImode occupy 4 bits each?
>
> Yeah. Only the low bit is significant, so it's still a 1-bit element.
> But the padding is distributed evenly across the elements rather than
> being grouped at one end of the predicate.
I wonder what we'd do for a target that makes the high bit significant ;)
> > For riscv we have
> >
> > VECTOR_BOOL_MODE (VNx1BI, 1, BI, 1);
> > ADJUST_NUNITS (VNx1BI, riscv_v_adjust_nunits (VNx1BImode, 1));
> >
> > so here it would be natural to set the mode precision to
> > a poly-int computed by the component precision times nunits? OTOH
> > we have to look at the component precision vs. size as well and
> >
> > /* Single bit mode used for booleans. */
> > BOOL_MODE (BI, 1, 1);
> >
> > BOOL_MODE is not documented, but its arguments are precision and size, so BImode
> > has a size of 1. That makes VECTOR_BOOL_MODE very special since
> > the layout isn't derived from the component mode. Deriving the
> > layout from the precision would make aarch64 incorrect and
> > would need BI2 and BI4 modes at least.
>
> I think the elements have to stay BI for AArch64. Using BI2 (with a
> precision of 2) would make both bits significant.
I think what's "wrong" with a BImode component mode is not the
precision but the size - we don't support bit-precision component
types on the GENERIC side but for bool vector modes we pack the
components to a bit size and aarch64 has varying bit sizes here
(and thus components with padding). I don't think we support
modes with sizes less than a unit but since bool modes are special
we could re-purpose their precision to mean bitsize.
> I'm not sure the RVV case fits into the existing mode layout scheme.
> AFAIK we don't currently support vector modes with padding at one end.
> If that's right, the fix is likely to involve more than just tweaking
> the mode parameters.
>
> What's the byte size of VNx1BI, expressed as a function of N?
> If it's CEIL (N, 8) then we don't have a way of representing that yet.
PARTIAL_VECTOR_MODE? (ick)
Richard.
>> What's the byte size of VNx1BI, expressed as a function of N?
>> If it's CEIL (N, 8) then we don't have a way of representing that yet.
N is a poly value.
RVV, like SVE, supports scalable vectors,
so N is poly (1,1).
VNx1B mode nunits = poly (1,1) units.
VNx1B mode bitsize = poly (1,1) bits.
VNx1B mode bytesize = poly (1,1) units (currently). Ideally, and more accurately, VNx1B mode bytesize should be poly (1/8,1/8).
However, that can't be represented, so GCC considers its bytesize to be poly (1,1) bytes.
VNx2B mode nunits = poly (2,2) units.
VNx2B mode bitsize = poly (2,2) bits.
VNx2B mode bytesize = poly (2,2) units (currently). Ideally, and more accurately, VNx2B mode bytesize should be poly (2/8,2/8).
However, that can't be represented, so GCC considers its bytesize to be poly (1,1) bytes.
VNx4BI and VNx8BI are likewise.
So their bitsizes are different, but their bytesizes are all the same.
juzhe.zhong@rivai.ai
From: Richard Sandiford
Date: 2023-02-13 17:41
To: Richard Biener
CC: juzhe.zhong@rivai.ai; incarnation.p.lee; gcc-patches; Kito.cheng; ams
Subject: Re: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
"juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>>> What's the byte size of VNx1BI, expressed as a function of N?
>>> If it's CEIL (N, 8) then we don't have a way of representing that yet.
> N is a poly value.
> RVV, like SVE, supports scalable vectors,
> so N is poly (1,1).
>
> VNx1B mode nunits = poly (1,1) units.
> VNx1B mode bitsize = poly (1,1) bits.
> VNx1B mode bytesize = poly (1,1) units (currently). Ideally, and more accurately, VNx1B mode bytesize should be poly (1/8,1/8).
But this would be a fractional bytesize, and like Richard says,
the memory subsystem would always access full bytes. So I think
the bytesize would have to be at least CEIL (N, 8).
> However, that can't be represented, so GCC considers its bytesize to be poly (1,1) bytes.
Ah, OK. That (making the size N bytes) does seem like a reasonable
workaround, provided that it matches the C types, etc. So the total
amount of padding is 7N bits (I assume at the msb of the type when
viewed as an integer).
I agree that what (IIUC) was discussed upthread works, i.e.:
bytesize = N
bitsize = N * 8 (fixed function of bytesize)
precision = N
nunits = N
unit_size = 1
unit_precision = 1
But target-independent code won't expect this layout, so supporting
it will involve more than just adjusting the parameters.
Thanks,
Richard
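The proposed workaround layout can be sanity-checked arithmetically for one runtime value of N. The fields simply restate the list above. `struct bool_layout`, `vnx1bi_layout`, `padding_bits` and `min_bytesize` are hypothetical names.

```c
#include <assert.h>

/* The VNx1BI layout listed above, evaluated for one runtime value of N
   (hypothetical struct; not GCC's mode tables).  */
struct bool_layout
{
  unsigned bytesize, bitsize, precision, nunits, unit_size, unit_precision;
};

static struct bool_layout
vnx1bi_layout (unsigned n)
{
  struct bool_layout l;
  assert (n > 0);
  l.bytesize = n;     /* workaround: N bytes */
  l.bitsize = n * 8;  /* fixed function of bytesize */
  l.precision = n;    /* only N significant bits */
  l.nunits = n;
  l.unit_size = 1;
  l.unit_precision = 1;
  return l;
}

/* Padding bits: bitsize - precision = 8N - N = 7N.  */
static unsigned
padding_bits (struct bool_layout l)
{
  return l.bitsize - l.precision;
}

/* Smallest whole-byte size that still holds N bits: CEIL (N, 8).  */
static unsigned
min_bytesize (unsigned n)
{
  return (n + 7) / 8;
}
```

For N = 4 this gives 28 padding bits (7N) and a byte size of 4, where CEIL (4, 8) = 1 byte would suffice. That over-allocation is what the workaround accepts.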
Yeah, I agree with you. Memory accesses should always be at least 1 byte.
So, consider the following code:
vsetvl e8,mf8
vlm.v v8, a0 (v8 is a 1-bit mask; not sure what the behavior is in this case)
vsm.v v8,a1
vsetvl e8,m1
vlm.v v8, a0 (v8 is an 8-bit mask)
vsm.v v8,a2
(Note: both vlm.v instructions load from the same address.)
GCC does not generate this asm. Because the bool modes are tied, it becomes:
vsetvl e8,mf8
vlm.v v8, a0 (v8 is an 8-bit mask)
vsm.v v8,a1
vsm.v v8,a2
I am not sure whether it's correct. Maybe I should ask the RVV ISA community.
juzhe.zhong@rivai.ai
From: Richard Sandiford
Date: 2023-02-13 18:18
To: juzhe.zhong@rivai.ai
CC: rguenther; incarnation.p.lee; gcc-patches; Kito.cheng; ams
>> Yeah, I agree with you. Memory accesses should always be at least 1 byte.
>> So, consider the following code:
>> vsetvl e8,mf8
>> vlm.v v8, a0 (v8 is a 1-bit mask; not sure what the behavior is in this case)
>> vsm.v v8,a1
>> vsetvl e8,m1
>> vlm.v v8, a0 (v8 is an 8-bit mask)
>> vsm.v v8,a2
>> (Note: both vlm.v instructions load from the same address.)
>> GCC does not generate this asm. Because the bool modes are tied, it becomes:
>> vsetvl e8,mf8
>> vlm.v v8, a0 (v8 is an 8-bit mask)
>> vsm.v v8,a1
>> vsm.v v8,a2
>> I am not sure whether it's correct. Maybe I should ask the RVV ISA community.
That case may not be the right one to discuss, since a 1-bit mask is only the minimum value for VNx1BI.
Its size is a poly value (1,1), so it can be 1 bit, 1 byte, 2 bytes, etc.; the size is unknown at compile time and depends on the CPU's vector length.
The case should be written like this:
vsetvl e8,mf8
vlm.v v8, a0 (v8 is an N x 1-bit mask; N is unknown at compile time)
vsm.v v8,a1
vsetvl e8,m1
vlm.v v8, a0 (v8 is an N x 8-bit mask; N is unknown at compile time)
vsm.v v8,a2
(Note: both vlm.v instructions load from the same address.)
GCC does not generate this asm. Because the bool modes are tied, it becomes:
vsetvl e8,mf8
vlm.v v8, a0 (v8 is an N x 1-bit mask; N is unknown at compile time)
vsm.v v8,a1
vsm.v v8,a2
This codegen is incorrect, and it is what we want to fix.
juzhe.zhong@rivai.ai
From: juzhe.zhong@rivai.ai
Date: 2023-02-13 18:28
To: richard.sandiford
CC: rguenther; incarnation.p.lee; gcc-patches; Kito.cheng; ams
Subject: Re: Re: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
juzhe.zhong@rivai.ai
From: Richard Sandiford
Date: 2023-02-13 18:18
To: juzhe.zhong@rivai.ai
CC: rguenther; incarnation.p.lee; gcc-patches; Kito.cheng; ams
I presume I've been CC'd on this conversation because weird vector
architecture problems have happened to me before. :)
However, I'm not sure I can help much because AMD GCN does not use
BImode vectors at all. This is partly because loading boolean values
into a GCN vector would have 31 padding bits for each lane, but mostly
because the result of comparison instructions is written to a DImode
scalar register, not into a vector.
I did experiment, long ago, with having a V64BImode that could be stored
in scalar registers (tieable with DImode), but there wasn't any great
advantage and it broke VECTOR_MODE_P in most other contexts.
It's possible to store truth values in vectors as integers, and there
are some cases where we do so (SIMD clone mask arguments, for example),
but that's mostly to smooth things over in the middle-end.
The problem with padding bits is something I do see: V64QImode has 24
padding bits for each lane, in register. While there are instructions
that will load and store QImode vectors correctly, without the padding,
the backend still has to handle all the sign-extends, zero-extends, and
truncates explicitly, because the middle-end and expand pass give no
assistance with that for vectors (unlike scalars).
Andrew
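The explicit extension work described above can be illustrated with a small sketch. It assumes a QImode element held in the low 8 bits of a 32-bit lane whose 24 upper bits are padding with unspecified contents. The helper names are hypothetical and do not correspond to GCN backend code.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extend: clear the 24 padding bits of the lane.  */
static uint32_t
zext_qi_lane (uint32_t lane)
{
  return lane & 0xFFu;
}

/* Sign-extend the low 8 bits without relying on implementation-defined
   narrowing conversions: flip the sign bit, then subtract its weight.  */
static int32_t
sext_qi_lane (uint32_t lane)
{
  return (int32_t) ((lane & 0xFFu) ^ 0x80u) - 0x80;
}

/* Two lanes hold the same QImode value iff their low 8 bits match,
   whatever the padding bits contain.  */
static int
same_qi_value (uint32_t a, uint32_t b)
{
  return ((a ^ b) & 0xFFu) == 0;
}
```

Each use that needs the lane as a full 32-bit quantity has to apply one of these explicitly, which is the per-lane bookkeeping the middle-end does not do for vectors.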
On 13/02/2023 08:07, Richard Biener via Gcc-patches wrote:
> On Sat, 11 Feb 2023, juzhe.zhong@rivai.ai wrote:
>
>> Thanks for contributing this.
>> Hi, Richard. Can you help us with this issue?
>> In RVV, we have vbool8_t (VNx8BImode), vbool16_t (VNx4BImode), vbool32_t (VNx2BImode), vbool64_t (VNx1BImode)
>> Since we use 1-bit masks, each bool occupies 1 bit.
>> According to the RVV ISA, we adjust these modes as follows:
>>
>> VNx8BImode poly (8,8) NUNITS (each unit is a 1-bit mask)
>> VNx4BImode poly (4,4) NUNITS (each unit is a 1-bit mask)
>> VNx2BImode poly (2,2) NUNITS (each unit is a 1-bit mask)
>> VNx1BImode poly (1,1) NUNITS (each unit is a 1-bit mask)
>
> So how's VNx1BImode laid out for N == 2? Is that still a single
> byte and two consecutive bits? I suppose so.
>
> But then GET_MODE_PRECISION (GET_MODE_INNER (..)) should always be 1?
>
> I'm not sure what GET_MODE_PRECISION of the vector mode itself
> should be here, but then I wonder ...
>
>> If we use GET_MODE_BITSIZE or GET_MODE_NUNITS, the values for these modes differ.
>> However, GET_MODE_SIZE is the same for all of them (poly (1,1)).
>> This causes the modes to be tied together and gives wrong codegen, since their bitsizes differ.
>> Consider this case:
>> #include "riscv_vector.h"
>> void foo5_3 (int32_t * restrict in, int32_t * restrict out, size_t n, int cond)
>> {
>> vint8m1_t v = *(vint8m1_t*)in;
>> *(vint8m1_t*)out = v;
>> vbool16_t v4 = *(vbool16_t *)in;
>> *(vbool16_t *)(out + 300) = v4;
>> vbool8_t v3 = *(vbool8_t*)in;
>> *(vbool8_t*)(out + 200) = v3;
>> }
> >> The second mask load (the vbool8_t vlm.v) is missing, because GCC produces "v3 = VIEW_CONVERT (vbool8_t) v4" in GIMPLE.
> >> We could not fix this in the RISC-V backend. Can you help us with it? Thanks.
>
> ... why for the loads the "padding" is not loaded? The above testcase
> is probably more complicated than necessary as well?
>
> Thanks,
> Richard.
>
>>
>> juzhe.zhong@rivai.ai
>>
>> From: incarnation.p.lee
>> Date: 2023-02-11 16:46
>> To: gcc-patches
>> CC: juzhe.zhong; kito.cheng; rguenther; Pan Li
>> Subject: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
>> From: Pan Li <incarnation.p.lee@outlook.com>
>>
> >> Fix the modes-tieable handling of the rvv bool types. The vbool*_t
> >> modes cannot be tied, as the actual load/store size is determined by
> >> the vl. The mode sizes of the rvv bool types are also adjusted for
> >> the underlying optimization passes. The rvv bool types are vbool*_t,
> >> i.e. vbool1_t, vbool2_t, vbool4_t, vbool8_t, vbool16_t, vbool32_t,
> >> and vbool64_t.
>>
>> PR 108185
>> PR 108654
>>
>> gcc/ChangeLog:
>>
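> >> * config/riscv/riscv-modes.def (ADJUST_BYTESIZE): Use riscv_v_adjust_bytesize for the VNx*BI modes.
> >> * config/riscv/riscv.cc (riscv_v_adjust_bytesize): New function.
> >> (riscv_modes_tieable_p): Never tie vector bool modes; tie other RVV vector modes only if their sizes are equal.
> >> * config/riscv/riscv.h (riscv_v_adjust_bytesize): New declaration.
> >> * machmode.h (VECTOR_BOOL_MODE_P): New macro.
> >> * tree-ssa-sccvn.cc (visit_reference_op_load): Do not reuse a value whose mode cannot be tied to the mode of the load.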
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/pr108185-1.c: New test.
>> * gcc.target/riscv/pr108185-2.c: New test.
>> * gcc.target/riscv/pr108185-3.c: New test.
>> * gcc.target/riscv/pr108185-4.c: New test.
>> * gcc.target/riscv/pr108185-5.c: New test.
>> * gcc.target/riscv/pr108185-6.c: New test.
>> * gcc.target/riscv/pr108185-7.c: New test.
>> * gcc.target/riscv/pr108185-8.c: New test.
>>
>> Signed-off-by: Pan Li <incarnation.p.lee@outlook.com>
>> ---
>> gcc/config/riscv/riscv-modes.def | 14 ++--
>> gcc/config/riscv/riscv.cc | 34 ++++++++-
>> gcc/config/riscv/riscv.h | 2 +
>> gcc/machmode.h | 3 +
>> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
>> gcc/tree-ssa-sccvn.cc | 13 +++-
>> 13 files changed, 608 insertions(+), 11 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>
>> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
>> index d5305efa8a6..cc21d3c83a2 100644
>> --- a/gcc/config/riscv/riscv-modes.def
>> +++ b/gcc/config/riscv/riscv-modes.def
>> @@ -64,13 +64,13 @@ ADJUST_ALIGNMENT (VNx16BI, 1);
>> ADJUST_ALIGNMENT (VNx32BI, 1);
>> ADJUST_ALIGNMENT (VNx64BI, 1);
>> -ADJUST_BYTESIZE (VNx1BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>> -ADJUST_BYTESIZE (VNx2BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>> -ADJUST_BYTESIZE (VNx4BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>> -ADJUST_BYTESIZE (VNx8BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>> -ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>> -ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>> -ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>> +ADJUST_BYTESIZE (VNx1BI, riscv_v_adjust_bytesize (VNx1BImode, 1));
>> +ADJUST_BYTESIZE (VNx2BI, riscv_v_adjust_bytesize (VNx2BImode, 1));
>> +ADJUST_BYTESIZE (VNx4BI, riscv_v_adjust_bytesize (VNx4BImode, 1));
>> +ADJUST_BYTESIZE (VNx8BI, riscv_v_adjust_bytesize (VNx8BImode, 1));
>> +ADJUST_BYTESIZE (VNx16BI, riscv_v_adjust_bytesize (VNx16BImode, 2));
>> +ADJUST_BYTESIZE (VNx32BI, riscv_v_adjust_bytesize (VNx32BImode, 4));
>> +ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_bytesize (VNx64BImode, 8));
>> /*
>> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> index 3b7804b7501..138c052e13c 100644
>> --- a/gcc/config/riscv/riscv.cc
>> +++ b/gcc/config/riscv/riscv.cc
>> @@ -1003,6 +1003,27 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
>> return scale;
>> }
> >> +/* Called from ADJUST_BYTESIZE in riscv-modes.def. Return the correct
> >> + size in bytes for the corresponding machine_mode. */
>> +
>> +poly_int64
>> +riscv_v_adjust_bytesize (machine_mode mode, int scale)
>> +{
>> + gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
>> +
>> + if (riscv_v_ext_vector_mode_p (mode))
>> + {
>> + poly_uint16 mode_size = GET_MODE_SIZE (mode);
>> +
>> + if (known_lt (mode_size, BYTES_PER_RISCV_VECTOR))
>> + return mode_size;
>> + else
>> + return BYTES_PER_RISCV_VECTOR;
>> + }
>> +
>> + return scale;
>> +}
>> +
>> /* Return true if X is a valid address for machine mode MODE. If it is,
>> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
>> effect. */
>> @@ -5807,11 +5828,22 @@ riscv_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
>> /* Implement TARGET_MODES_TIEABLE_P.
>> Don't allow floating-point modes to be tied, since type punning of
>> - single-precision and double-precision is implementation defined. */
>> + single-precision and double-precision is implementation defined.
>> +
> >> + Don't allow vbool*_t modes to be tied, since their actual load/store
> >> + size is determined by vl. */
>> static bool
>> riscv_modes_tieable_p (machine_mode mode1, machine_mode mode2)
>> {
>> + if (riscv_v_ext_vector_mode_p (mode1) && riscv_v_ext_vector_mode_p (mode2))
>> + {
>> + if (VECTOR_BOOL_MODE_P (mode1) || VECTOR_BOOL_MODE_P (mode2))
>> + return false;
>> +
>> + return known_eq (GET_MODE_SIZE (mode1), GET_MODE_SIZE (mode2));
>> + }
>> +
>> return (mode1 == mode2
>> || !(GET_MODE_CLASS (mode1) == MODE_FLOAT
>> && GET_MODE_CLASS (mode2) == MODE_FLOAT));
>> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
>> index faffd5a77fe..f857223338c 100644
>> --- a/gcc/config/riscv/riscv.h
>> +++ b/gcc/config/riscv/riscv.h
>> @@ -1028,6 +1028,8 @@ extern unsigned riscv_stack_boundary;
>> extern unsigned riscv_bytes_per_vector_chunk;
>> extern poly_uint16 riscv_vector_chunks;
>> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
>> +extern poly_int64 riscv_v_adjust_bytesize (machine_mode mode, int scale);
>> +
>> /* The number of bits and bytes in a RVV vector. */
>> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
>> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
>> diff --git a/gcc/machmode.h b/gcc/machmode.h
>> index f1865c1ef42..6720472f2c9 100644
>> --- a/gcc/machmode.h
>> +++ b/gcc/machmode.h
>> @@ -242,6 +242,9 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
>> || CLASS == MODE_ACCUM \
>> || CLASS == MODE_UACCUM)
> >> +/* Nonzero if MODE is a vector bool mode. */
> >> +#define VECTOR_BOOL_MODE_P(MODE) (GET_MODE_CLASS (MODE) == MODE_VECTOR_BOOL)
>> +
>> /* An optional T (i.e. a T or nothing), where T is some form of mode class. */
>> template<typename T>
>> class opt_mode
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>> new file mode 100644
>> index 00000000000..c3d0b10271a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>> new file mode 100644
>> index 00000000000..bd13ba916da
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>> new file mode 100644
>> index 00000000000..99928f7b1cc
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>> new file mode 100644
>> index 00000000000..e70284fada8
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>> new file mode 100644
>> index 00000000000..575a7842cdf
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>> new file mode 100644
>> index 00000000000..95a11d37016
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>> new file mode 100644
>> index 00000000000..8f6f0b11f09
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>> new file mode 100644
>> index 00000000000..d96959dd064
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>> @@ -0,0 +1,77 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
>> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
>> index 028bedbc9a0..19fdba8cfa2 100644
>> --- a/gcc/tree-ssa-sccvn.cc
>> +++ b/gcc/tree-ssa-sccvn.cc
>> @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. If not see
>> #include "gimple-fold.h"
>> #include "tree-eh.h"
>> #include "gimplify.h"
>> +#include "target.h"
>> #include "flags.h"
>> #include "dojump.h"
>> #include "explow.h"
>> @@ -5657,10 +5658,16 @@ visit_reference_op_load (tree lhs, tree op, gimple *stmt)
>> if (result
>> && !useless_type_conversion_p (TREE_TYPE (result), TREE_TYPE (op)))
>> {
>> + machine_mode result_mode = TYPE_MODE (TREE_TYPE (result));
>> + machine_mode op_mode = TYPE_MODE (TREE_TYPE (op));
>> + poly_uint16 result_mode_precision = GET_MODE_PRECISION (result_mode);
>> + poly_uint16 op_mode_precision = GET_MODE_PRECISION (op_mode);
>> +
>> /* Avoid the type punning in case the result mode has padding where
>> - the op we lookup has not. */
>> - if (maybe_lt (GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (result))),
>> - GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op)))))
>> + the op we lookup has not.
> >> + Also avoid it in case the target cannot tie the two modes. */
>> + if (maybe_lt (result_mode_precision, op_mode_precision)
>> + || !targetm.modes_tieable_p (result_mode, op_mode))
>> result = NULL_TREE;
>> else
>> {
>>
>
Thanks all for your help and comments.
Let me share more information about this patch, especially the tree-ssa-sccvn.cc part.
Assume we have the test code below for this issue.
#include "riscv_vector.h"

void
test_1(int8_t * restrict in, int8_t * restrict out) {
  vbool8_t v2 = *(vbool8_t*)in;
  vbool16_t v5 = *(vbool16_t*)in;

  *(vbool8_t*)(out + 100) = v2;
  *(vbool16_t*)(out + 200) = v5;
}
Without the tree-ssa-sccvn.cc change:
------------------------------------------------------------------------------------
void test_1 (int8_t * restrict in, int8_t * restrict out)
{
vbool8_t v2;
__rvv_bool16_t _1;
<bb 2> [local count: 1073741824]:
v2_4 = MEM[(vbool8_t *)in_3(D)];
_1 = VIEW_CONVERT_EXPR<__rvv_bool16_t>(v2_4); // inserted by 039t.fre1
MEM[(vbool8_t *)out_5(D) + 100B] = v2_4;
MEM[(vbool16_t *)out_5(D) + 200B] = _1;
return;
}
With the tree-ssa-sccvn.cc change:
------------------------------------------------------------------------------------
void test_1 (int8_t * restrict in, int8_t * restrict out)
{
vbool16_t v5;
vbool8_t v2;
<bb 2> [local count: 1073741824]:
v2_3 = MEM[(vbool8_t *)in_2(D)];
v5_4 = MEM[(vbool16_t *)in_2(D)];
MEM[(vbool8_t *)out_5(D) + 100B] = v2_3;
MEM[(vbool16_t *)out_5(D) + 200B] = v5_4;
return;
}
Comparing the a-main.c.039t.fre1 dumps, I found that the fre1 pass inserts this VIEW_CONVERT_EXPR.
After some debugging, I traced the difference to expressions_equal_p. Because GET_MODE_SIZE
is the same for VNx8BImode and VNx4BImode, expressions_equal_p ends up comparing the same
tree, i.e. the shared POLY_INT_CST [8, 8], so the two references are considered equal.
The call chain is:
visit_reference_op_load
|- vn_reference_lookup
|- vn_reference_lookup_2
|- find_slot_with_hash
|- vn_reference_hasher::equal
|- expressions_equal_p
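The lookup above can be pictured with a deliberately simplified model. The sketch below is not GCC's vn_reference_s or its hasher; struct load_ref, its fields, and both comparison helpers are hypothetical. It assumes a memory reference is keyed only by its address and access size, and shows that two mask loads from the same address compare equal when only their byte-rounded sizes are compared, but differ once the number of mask bits is taken into account. The sizes (1 byte; 8 and 4 mask bits) are illustrative values for VLEN = 64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified value-numbering key for a memory load.
   Not GCC's vn_reference_s: it keeps only what matters here.  */
struct load_ref
{
  uintptr_t addr;       /* address being loaded from */
  unsigned size_bytes;  /* access size, rounded up to whole bytes */
  unsigned mask_bits;   /* number of meaningful mask bits */
};

/* Equality as seen by a lookup that only compares address and size:
   two references with equal keys are treated as the same value.  */
static bool
ref_equal_by_size (const struct load_ref *a, const struct load_ref *b)
{
  assert (a != NULL && b != NULL);
  return a->addr == b->addr && a->size_bytes == b->size_bytes;
}

/* Stricter equality that also distinguishes how many mask bits each
   load actually reads.  */
static bool
ref_equal_by_bits (const struct load_ref *a, const struct load_ref *b)
{
  return ref_equal_by_size (a, b) && a->mask_bits == b->mask_bits;
}
```

With ref_equal_by_size, the vbool16_t load would be value-numbered to the vbool8_t load, which corresponds to the VIEW_CONVERT_EXPR in the fre1 dump above.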
We also confirmed that giving VNx8BImode and VNx4BImode different MODE_SIZE values
(for example, [8, 1] and [4, 1], for testing only) resolves this issue. However, according
to the ISA semantics, both should be [1, 1].
We then tried adjusting other MODE_XXX values, but none of them helped. For example:
VNx4BImode NUNITS [0x4, 0x4]
VNx8BImode NUNITS [0x8, 0x8]
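Why the byte sizes collide can be checked with a little arithmetic. Per the RVV specification, a vboolN_t holds VLEN/N mask elements at one bit each, so for the larger N the whole mask is a byte or less. The sketch below uses hypothetical helper names and plain integers instead of poly_int; it computes the mask width and its byte-rounded size for a fixed VLEN. For VLEN = 64, vbool8_t through vbool64_t all round up to 1 byte even though they hold 8, 4, 2 and 1 bits.

```c
#include <assert.h>

/* Number of mask bits in a vboolN_t: one bit per element, VLEN / N
   elements.  N is one of 1, 2, 4, 8, 16, 32, 64.  */
static unsigned
vbool_mask_bits (unsigned vlen, unsigned n)
{
  assert (n != 0 && vlen % n == 0);
  return vlen / n;
}

/* The same mask rounded up to whole bytes, as a byte-granular mode
   size would represent it.  */
static unsigned
vbool_mask_bytes (unsigned vlen, unsigned n)
{
  return (vbool_mask_bits (vlen, n) + 7) / 8;
}
```

Any size check performed in whole bytes therefore cannot tell these types apart, while a check on bits (or on NUNITS) can.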
Finally, I found TARGET_MODES_TIEABLE_P and added a check on it in
visit_reference_op_load, which resolves this issue.
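The shape of the fix can be modelled in plain C. In the sketch below, struct mode_desc and its fields are hypothetical stand-ins for machine_mode queries, precision and size are plain integers rather than poly_int (so `<` stands in for maybe_lt), and the values are illustrative for VLEN = 64. modes_tieable_p follows the structure of the patched riscv_modes_tieable_p, and reuse_allowed_after follows the patched condition in visit_reference_op_load; the pre-patch condition is kept alongside for comparison.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mode properties the check looks at.  */
enum mode_class { MC_INT, MC_FLOAT, MC_VECTOR_INT, MC_VECTOR_BOOL };

struct mode_desc
{
  enum mode_class cls;
  unsigned precision;   /* in bits */
  unsigned size;        /* in bytes */
  bool rvv;             /* true for an RVV vector mode */
};

/* Follows the structure of the patched riscv_modes_tieable_p: two RVV
   modes tie only if neither is a mask (vbool*_t) mode and their sizes
   match.  Mode identity is modelled as pointer identity.  */
static bool
modes_tieable_p (const struct mode_desc *m1, const struct mode_desc *m2)
{
  if (m1->rvv && m2->rvv)
    {
      if (m1->cls == MC_VECTOR_BOOL || m2->cls == MC_VECTOR_BOOL)
        return false;
      return m1->size == m2->size;
    }
  return m1 == m2 || !(m1->cls == MC_FLOAT && m2->cls == MC_FLOAT);
}

/* Pre-patch condition: only reject a result with less precision.  */
static bool
reuse_allowed_before (const struct mode_desc *result,
                      const struct mode_desc *op)
{
  return !(result->precision < op->precision);
}

/* Patched condition: additionally require the modes to be tieable.  */
static bool
reuse_allowed_after (const struct mode_desc *result,
                     const struct mode_desc *op)
{
  return reuse_allowed_before (result, op) && modes_tieable_p (result, op);
}
```

In this model, reusing a vbool8_t value for a vbool16_t load passes the old precision check but is rejected by the new one, while two same-size non-mask RVV modes are still allowed. As in the patch, the RVV branch rejects any pair involving a mask mode; in visit_reference_op_load, identical types never reach the check because it only runs when the type conversion is not useless.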
If tree-ssa-sccvn.cc is not the right place for this fix, I will keep looking for other
approaches.
Thanks again; I will keep you posted.
Pan
________________________________
From: Andrew Stubbs <ams@codesourcery.com>
Sent: Monday, February 13, 2023 19:00
To: Richard Biener <rguenther@suse.de>; juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
Cc: Pan Li <incarnation.p.lee@outlook.com>; gcc-patches <gcc-patches@gcc.gnu.org>; kito.cheng <kito.cheng@sifive.com>; richard.sandiford@arm.com <richard.sandiford@arm.com>
Subject: Re: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
I presume I've been CC'd on this conversation because weird vector
architecture problems have happened to me before. :)
However, I'm not sure I can help much because AMD GCN does not use
BImode vectors at all. This is partly because loading boolean values
into a GCN vector would have 31 padding bits for each lane, but mostly
because the result of comparison instructions is written to a DImode
scalar register, not into a vector.
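To make that layout concrete: a 64-lane comparison that produces one bit per lane fits exactly in a 64-bit scalar, which is why it can live in a DImode register instead of a vector. The sketch below is a plain-C illustration of the packing only, not GCN code; compare_lt_mask and the 64-lane width are assumptions chosen to match the V64 modes mentioned here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 64

/* Compare LANES pairs of 32-bit values and pack the results into one
   64-bit mask: bit i is set when a[i] < b[i].  */
static uint64_t
compare_lt_mask (const int32_t *a, const int32_t *b)
{
  assert (a != NULL && b != NULL);
  uint64_t mask = 0;
  for (size_t i = 0; i < LANES; i++)
    if (a[i] < b[i])
      mask |= (uint64_t) 1 << i;
  return mask;
}
```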
I did experiment, long ago, with having a V64BImode that could be stored
in scalar registers (tieable with DImode), but there wasn't any great
advantage and it broke VECTOR_MODE_P in most other contexts.
It's possible to store truth values in vectors as integers, and there
are some cases where we do so (SIMD clone mask arguments, for example),
but that's mostly to smooth things over in the middle-end.
The problem with padding bits is something I do see: V64QImode has 24
padding bits for each lane, in register. While there are instructions
that will load and store QImode vectors correctly, without the padding,
the backend still has to handle all the sign-extends, zero-extends, and
truncates explicitly, because the middle-end and expand pass give no
assistance with that for vectors (unlike scalars).
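The per-lane work described above can be illustrated in isolation. Assuming each QImode element sits in a 32-bit lane whose upper 24 bits are padding with unspecified contents, the backend has to clear or replicate those bits before using the lane as a wider value, and drop them again when storing. The helpers below are hypothetical plain-C models of those three operations, not GCN patterns.

```c
#include <assert.h>
#include <stdint.h>

/* Each lane is a 32-bit slot; only the low 8 bits hold the QImode value,
   and the upper 24 bits are padding with unspecified contents.  */

/* Zero-extend: clear the 24 padding bits.  */
static uint32_t
lane_zero_extend (uint32_t lane)
{
  return lane & 0xffu;
}

/* Sign-extend: copy bit 7 of the value into the 24 padding bits.  */
static uint32_t
lane_sign_extend (uint32_t lane)
{
  uint32_t low = lane & 0xffu;
  return (low ^ 0x80u) - 0x80u;
}

/* Truncate: keep only the 8 meaningful bits, e.g. for a byte store.  */
static uint8_t
lane_truncate (uint32_t lane)
{
  return (uint8_t) lane;
}
```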
Andrew
On 13/02/2023 08:07, Richard Biener via Gcc-patches wrote:
> On Sat, 11 Feb 2023, juzhe.zhong@rivai.ai wrote:
>
>> Thanks for contributing this.
>> Hi, Richard. Can you help us with this issue?
>> In RVV, we have vbool8_t (VNx8BImode), vbool16_t (VNx4BImode), vbool32_t (VNx2BImode), and vbool64_t (VNx1BImode).
>> Since we use a 1-bit mask, each BOOL occupies 1 bit.
>> According to the RVV ISA, we adjust these modes as follows:
>>
>> VNx8BImode poly (8,8) NUNITS (each unit is a 1-bit mask)
>> VNx4BImode poly (4,4) NUNITS (each unit is a 1-bit mask)
>> VNx2BImode poly (2,2) NUNITS (each unit is a 1-bit mask)
>> VNx1BImode poly (1,1) NUNITS (each unit is a 1-bit mask)
>
> So how's VNx1BImode laid out for N == 2? Is that still a single
> byte and two consecutive bits? I suppose so.
>
> But then GET_MODE_PRECISION (GET_MODE_INNER (..)) should always be 1?
>
> I'm not sure what GET_MODE_PRECISION of the vector mode itself
> should be here, but then I wonder ...
>
>> If we use GET_MODE_BITSIZE or GET_MODE_NUNITS, the values for these modes differ.
>> However, GET_MODE_SIZE is the same for all of them (poly (1,1)).
>> As a result, these modes are tied together, which produces wrong code since their bitsizes differ.
>> Consider the following case:
>> #include "riscv_vector.h"
>> void foo5_3 (int32_t * restrict in, int32_t * restrict out, size_t n, int cond)
>> {
>> vint8m1_t v = *(vint8m1_t*)in;
>> *(vint8m1_t*)out = v;
>> vbool16_t v4 = *(vbool16_t *)in;
>> *(vbool16_t *)(out + 300) = v4;
>> vbool8_t v3 = *(vbool8_t*)in;
>> *(vbool8_t*)(out + 200) = v3;
>> }
>> The vbool8_t load (the second vlm.v) is missing, because GCC generates "v3 = VIEW_CONVERT (vbool8_t) v4" in GIMPLE.
>> We were unable to fix it in the RISC-V backend. Can you help us with this? Thanks.
>
> ... why for the loads the "padding" is not loaded? The above testcase
> is probably more complicated than necessary as well?
>
> Thanks,
> Richard.
>
>>
>> juzhe.zhong@rivai.ai
>>
>> From: incarnation.p.lee
>> Date: 2023-02-11 16:46
>> To: gcc-patches
>> CC: juzhe.zhong; kito.cheng; rguenther; Pan Li
>> Subject: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
>> From: Pan Li <incarnation.p.lee@outlook.com>
>>
>> Fix the mode-tieable bug for the rvv bool types. The vbool*_t
>> types cannot be tied, as the actual load/store size is determined
>> by the vl. The mode sizes of the rvv bool types are also adjusted for
>> the underlying optimization pass. The rvv bool types are vbool*_t, aka
>> vbool1_t, vbool2_t, vbool4_t, vbool8_t, vbool16_t, vbool32_t, and
>> vbool64_t.
>>
>> PR 108185
>> PR 108654
>>
>> gcc/ChangeLog:
>>
>> * config/riscv/riscv-modes.def (ADJUST_BYTESIZE):
>> * config/riscv/riscv.cc (riscv_v_adjust_bytesize):
>> (riscv_modes_tieable_p):
>> * config/riscv/riscv.h (riscv_v_adjust_bytesize):
>> * machmode.h (VECTOR_BOOL_MODE_P):
>> * tree-ssa-sccvn.cc (visit_reference_op_load):
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/pr108185-1.c: New test.
>> * gcc.target/riscv/pr108185-2.c: New test.
>> * gcc.target/riscv/pr108185-3.c: New test.
>> * gcc.target/riscv/pr108185-4.c: New test.
>> * gcc.target/riscv/pr108185-5.c: New test.
>> * gcc.target/riscv/pr108185-6.c: New test.
>> * gcc.target/riscv/pr108185-7.c: New test.
>> * gcc.target/riscv/pr108185-8.c: New test.
>>
>> Signed-off-by: Pan Li <incarnation.p.lee@outlook.com>
>> ---
>> gcc/config/riscv/riscv-modes.def | 14 ++--
>> gcc/config/riscv/riscv.cc | 34 ++++++++-
>> gcc/config/riscv/riscv.h | 2 +
>> gcc/machmode.h | 3 +
>> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
>> gcc/tree-ssa-sccvn.cc | 13 +++-
>> 13 files changed, 608 insertions(+), 11 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>
>> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
>> index d5305efa8a6..cc21d3c83a2 100644
>> --- a/gcc/config/riscv/riscv-modes.def
>> +++ b/gcc/config/riscv/riscv-modes.def
>> @@ -64,13 +64,13 @@ ADJUST_ALIGNMENT (VNx16BI, 1);
>> ADJUST_ALIGNMENT (VNx32BI, 1);
>> ADJUST_ALIGNMENT (VNx64BI, 1);
>> -ADJUST_BYTESIZE (VNx1BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>> -ADJUST_BYTESIZE (VNx2BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>> -ADJUST_BYTESIZE (VNx4BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>> -ADJUST_BYTESIZE (VNx8BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>> -ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>> -ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>> -ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>> +ADJUST_BYTESIZE (VNx1BI, riscv_v_adjust_bytesize (VNx1BImode, 1));
>> +ADJUST_BYTESIZE (VNx2BI, riscv_v_adjust_bytesize (VNx2BImode, 1));
>> +ADJUST_BYTESIZE (VNx4BI, riscv_v_adjust_bytesize (VNx4BImode, 1));
>> +ADJUST_BYTESIZE (VNx8BI, riscv_v_adjust_bytesize (VNx8BImode, 1));
>> +ADJUST_BYTESIZE (VNx16BI, riscv_v_adjust_bytesize (VNx16BImode, 2));
>> +ADJUST_BYTESIZE (VNx32BI, riscv_v_adjust_bytesize (VNx32BImode, 4));
>> +ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_bytesize (VNx64BImode, 8));
>> /*
>> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> index 3b7804b7501..138c052e13c 100644
>> --- a/gcc/config/riscv/riscv.cc
>> +++ b/gcc/config/riscv/riscv.cc
>> @@ -1003,6 +1003,27 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
>> return scale;
>> }
>> +/* Call from ADJUST_BYTESIZE in riscv-modes.def. Return the correct
>> + BYTES size for corresponding machine_mode. */
>> +
>> +poly_int64
>> +riscv_v_adjust_bytesize (machine_mode mode, int scale)
>> +{
>> + gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
>> +
>> + if (riscv_v_ext_vector_mode_p (mode))
>> + {
>> + poly_uint16 mode_size = GET_MODE_SIZE (mode);
>> +
>> + if (known_lt (mode_size, BYTES_PER_RISCV_VECTOR))
>> + return mode_size;
>> + else
>> + return BYTES_PER_RISCV_VECTOR;
>> + }
>> +
>> + return scale;
>> +}
>> +
>> /* Return true if X is a valid address for machine mode MODE. If it is,
>> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
>> effect. */
>> @@ -5807,11 +5828,22 @@ riscv_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
>> /* Implement TARGET_MODES_TIEABLE_P.
>> Don't allow floating-point modes to be tied, since type punning of
>> - single-precision and double-precision is implementation defined. */
>> + single-precision and double-precision is implementation defined.
>> +
>> + Don't allow different vbool*_t modes to be tied, since the type
>> + size is determined by vl. */
>> static bool
>> riscv_modes_tieable_p (machine_mode mode1, machine_mode mode2)
>> {
>> + if (riscv_v_ext_vector_mode_p (mode1) && riscv_v_ext_vector_mode_p (mode2))
>> + {
>> + if (VECTOR_BOOL_MODE_P (mode1) || VECTOR_BOOL_MODE_P (mode2))
>> + return false;
>> +
>> + return known_eq (GET_MODE_SIZE (mode1), GET_MODE_SIZE (mode2));
>> + }
>> +
>> return (mode1 == mode2
>> || !(GET_MODE_CLASS (mode1) == MODE_FLOAT
>> && GET_MODE_CLASS (mode2) == MODE_FLOAT));
>> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
>> index faffd5a77fe..f857223338c 100644
>> --- a/gcc/config/riscv/riscv.h
>> +++ b/gcc/config/riscv/riscv.h
>> @@ -1028,6 +1028,8 @@ extern unsigned riscv_stack_boundary;
>> extern unsigned riscv_bytes_per_vector_chunk;
>> extern poly_uint16 riscv_vector_chunks;
>> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
>> +extern poly_int64 riscv_v_adjust_bytesize (machine_mode mode, int scale);
>> +
>> /* The number of bits and bytes in a RVV vector. */
>> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
>> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
>> diff --git a/gcc/machmode.h b/gcc/machmode.h
>> index f1865c1ef42..6720472f2c9 100644
>> --- a/gcc/machmode.h
>> +++ b/gcc/machmode.h
>> @@ -242,6 +242,9 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
>> || CLASS == MODE_ACCUM \
>> || CLASS == MODE_UACCUM)
>> +/* Nonzero if MODE is a vector bool mode. */
>> +#define VECTOR_BOOL_MODE_P(MODE) (GET_MODE_CLASS(MODE) == MODE_VECTOR_BOOL)
>> +
>> /* An optional T (i.e. a T or nothing), where T is some form of mode class. */
>> template<typename T>
>> class opt_mode
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>> new file mode 100644
>> index 00000000000..c3d0b10271a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>> new file mode 100644
>> index 00000000000..bd13ba916da
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>> new file mode 100644
>> index 00000000000..99928f7b1cc
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>> new file mode 100644
>> index 00000000000..e70284fada8
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>> new file mode 100644
>> index 00000000000..575a7842cdf
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>> new file mode 100644
>> index 00000000000..95a11d37016
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>> new file mode 100644
>> index 00000000000..8f6f0b11f09
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>> new file mode 100644
>> index 00000000000..d96959dd064
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>> @@ -0,0 +1,77 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
>> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
>> index 028bedbc9a0..19fdba8cfa2 100644
>> --- a/gcc/tree-ssa-sccvn.cc
>> +++ b/gcc/tree-ssa-sccvn.cc
>> @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. If not see
>> #include "gimple-fold.h"
>> #include "tree-eh.h"
>> #include "gimplify.h"
>> +#include "target.h"
>> #include "flags.h"
>> #include "dojump.h"
>> #include "explow.h"
>> @@ -5657,10 +5658,16 @@ visit_reference_op_load (tree lhs, tree op, gimple *stmt)
>> if (result
>> && !useless_type_conversion_p (TREE_TYPE (result), TREE_TYPE (op)))
>> {
>> + machine_mode result_mode = TYPE_MODE (TREE_TYPE (result));
>> + machine_mode op_mode = TYPE_MODE (TREE_TYPE (op));
>> + poly_uint16 result_mode_precision = GET_MODE_PRECISION (result_mode);
>> + poly_uint16 op_mode_precision = GET_MODE_PRECISION (op_mode);
>> +
>> /* Avoid the type punning in case the result mode has padding where
>> - the op we lookup has not. */
>> - if (maybe_lt (GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (result))),
>> - GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op)))))
>> + the op we lookup has not.
>> + Avoid the type punning in case the target mode cannot be tied. */
>> + if (maybe_lt (result_mode_precision, op_mode_precision)
>> + || !targetm.modes_tieable_p (result_mode, op_mode))
>> result = NULL_TREE;
>> else
>> {
>>
>
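The dg-final expectations in the pr108185-*.c tests above follow a simple pattern. With SEW fixed at 8, a vboolN_t mask is accessed under LMUL = 8/N, so vbool1_t pairs with m8 and vbool64_t with mf8. In pr108185-1.c through pr108185-7.c, all six functions start with the same type and every other type appears in exactly one function, which gives the "6" and "1" counts. A minimal C sketch of that mapping follows. The helper names are hypothetical illustrations and are not part of GCC or its testsuite.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write into BUF the LMUL operand expected in
   "vsetvli rd, zero, e8, <LMUL>, ta, ma" before a vlm.v/vsm.v of
   vbool<N>_t.  With SEW = 8 and mask ratio SEW/LMUL = N, LMUL = 8/N,
   which is integral for N <= 8 and fractional (mf2..mf8) above that.  */
static void
vbool_e8_lmul (unsigned n, char *buf, size_t len)
{
  /* vbool*_t exists only for N = 1, 2, 4, ..., 64.  */
  assert (n >= 1 && n <= 64 && (n & (n - 1)) == 0);
  if (n <= 8)
    snprintf (buf, len, "m%u", 8 / n);
  else
    snprintf (buf, len, "mf%u", n / 8);
}

/* Hypothetical helper: fill COUNTS[i] with the expected
   scan-assembler-times count of the vsetvli for vbool<1 << i>_t in the
   pr108185-k.c file (k = 1..7) whose six functions all start with
   vbool<FIRST>_t.  */
static void
vbool_expected_vsetvli_counts (unsigned first, int counts[7])
{
  assert (first >= 1 && first <= 64 && (first & (first - 1)) == 0);
  for (unsigned i = 0; i < 7; i++)
    counts[i] = (1u << i) == first ? 6 : 1;
}
```

For example, vbool_e8_lmul (16, buf, sizeof buf) yields "mf2", matching the six mf2 vsetvli instructions expected in pr108185-5.c. The sketch does not cover pr108185-8.c, where each function loads the same type twice: there every LMUL appears once, and the 7 vlm.v versus 14 vsm.v counts indicate that the two identical loads in each function are merged into one.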
On Mon, 13 Feb 2023, 盼 李 wrote:
> Thanks all for your help and comments.
>
> Let me share more information about this patch, especially the tree-ssa-sccvn.cc part.
>
> Assume we have the test code below for this issue.
>
> void
> test_1(int8_t * restrict in, int8_t * restrict out) {
> vbool8_t v2 = *(vbool8_t*)in;
> vbool16_t v5 = *(vbool16_t*)in;
>
> *(vbool8_t*)(out + 100) = v2;
> *(vbool16_t*)(out + 200) = v5;
> }
>
> Without the tree-ssa-sccvn.cc change:
> ------------------------------------------------------------------------------------
> void test_1 (int8_t * restrict in, int8_t * restrict out)
> {
> vbool8_t v2;
> __rvv_bool16_t _1;
>
> <bb 2> [local count: 1073741824]:
> v2_4 = MEM[(vbool8_t *)in_3(D)];
> _1 = VIEW_CONVERT_EXPR<__rvv_bool16_t>(v2_4); // inserted during 039t.fre1
> MEM[(vbool8_t *)out_5(D) + 100B] = v2_4;
> MEM[(vbool16_t *)out_5(D) + 200B] = _1;
> return;
> }
>
> With the tree-ssa-sccvn.cc change:
> ------------------------------------------------------------------------------------
> void test_1 (int8_t * restrict in, int8_t * restrict out)
> {
> vbool16_t v5;
> vbool8_t v2;
>
> <bb 2> [local count: 1073741824]:
> v2_3 = MEM[(vbool8_t *)in_2(D)];
> v5_4 = MEM[(vbool16_t *)in_2(D)];
> MEM[(vbool8_t *)out_5(D) + 100B] = v2_3;
> MEM[(vbool16_t *)out_5(D) + 200B] = v5_4;
> return;
> }
>
> Thus, I found that the fre1 pass (dump file a-main.c.039t.fre1) is what inserts this VIEW_CONVERT_EXPR.
> With some debugging, I traced the difference to expressions_equal_p. If GET_MODE_SIZE (mode) is the same
> for VNx8BImode and VNx4BImode, expressions_equal_p compares the very same tree on both sides, i.e.
> POLY_INT_CST [8, 8], so the lookup treats the two loads as equal.
>
> visit_reference_op_load
> |- vn_reference_lookup
> |- vn_reference_lookup_2
> |- find_slot_with_hash
> |- vn_reference_hasher::equal
> |- expressions_equal_p
>
> We also double-checked that setting different MODE_SIZEs for VNx8BImode and
> VNx4BImode (for example, [8, 1] and [4, 1], for testing only) resolves this issue.
> However, according to the ISA semantics, they should both be [1, 1].
>
> We then tried setting other MODE_XXX values, but none of them worked. For example:
>
> VNx4BIMode NUNITS [0x4, 0x4]
> VNx8BIMode NUNITS [0x8, 0x8]
>
> Finally, I found TARGET_MODES_TIEABLE_P and added a call to it in
> visit_reference_op_load, which resolves this issue.
>
> If tree-ssa-sccvn.cc is not the right place for this fix, I will continue
> to look for other approaches.
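The lookup behaviour described above can be illustrated with a deliberately simplified C model. This is not GCC's vn_reference_s or its hasher. It only mirrors the point from the trace: two loads from the same base with the same size compare equal even when their modes differ. The struct, enum and function names are hypothetical, and poly sizes are reduced to plain integers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mode tags standing in for VNx4BImode and VNx8BImode.  */
enum { MODE_VNX4BI = 4, MODE_VNX8BI = 8 };

/* Hypothetical, heavily simplified record of one load.  The key is the
   base address plus the size in bytes; the mode is stored but is NOT
   part of the key.  */
struct vn_load
{
  const void *base;
  unsigned size;   /* Modelled GET_MODE_SIZE, in bytes.  */
  int mode;        /* Stand-in for the machine mode; not compared.  */
  int value_id;
};

#define VN_MAX_LOADS 16

struct vn_table
{
  struct vn_load loads[VN_MAX_LOADS];
  size_t n;
  int next_id;
};

/* Return the value number for a load of SIZE bytes in MODE from BASE.
   A hit on an earlier load with a different MODE is the situation in
   which the real pass would insert a VIEW_CONVERT_EXPR.  */
static int
vn_lookup_or_insert (struct vn_table *t, const void *base,
                     unsigned size, int mode)
{
  for (size_t i = 0; i < t->n; i++)
    if (t->loads[i].base == base && t->loads[i].size == size)
      return t->loads[i].value_id;  /* Mode mismatch goes unnoticed.  */

  assert (t->n < VN_MAX_LOADS);
  t->loads[t->n].base = base;
  t->loads[t->n].size = size;
  t->loads[t->n].mode = mode;
  t->loads[t->n].value_id = t->next_id;
  t->n++;
  return t->next_id++;
}
```

With equal sizes, the vbool16_t load reuses the value number of the earlier vbool8_t load. With the test-only sizes 8 and 4, the two loads get distinct value numbers, which is consistent with the experiment above.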
There are other places, like alias analysis, which will not be happy
if the mode size/precision do not match reality. So no, I don't think
modes_tieable is the correct thing to check here. Instead, the existing
check seems to be to the point, but the modes are not set up correctly
to carry the information that one has padding at the end and the other does not.
Richard.
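To make the suggestion concrete, here is a small C model of the existing precision check in visit_reference_op_load. The struct poly2 type is a hypothetical stand-in for a two-coefficient GCC poly_int (value c0 + c1 * x for a runtime x >= 0), and poly2_maybe_lt reproduces what maybe_lt means for such values. The precisions in the example are assumptions. Equal precisions model the current setup, where all the bool modes share one size. [4, 4] versus [8, 8] bits models a setup in which each mode's precision equals its NUNITS from juzhe's table (one bit per element).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a two-coefficient poly_int:
   value = c0 + c1 * x, with x >= 0 known only at run time.  */
struct poly2
{
  int64_t c0, c1;
};

/* True if A < B for at least one x >= 0 (the meaning of maybe_lt).
   Because both sides are linear in x, this holds iff A < B at x = 0
   or A grows more slowly than B.  */
static bool
poly2_maybe_lt (struct poly2 a, struct poly2 b)
{
  return a.c0 < b.c0 || a.c1 < b.c1;
}

/* Model of the existing check: refuse to reuse RESULT for the load OP
   if RESULT may carry fewer meaningful bits than OP needs.  */
static bool
reject_punning (struct poly2 result_prec, struct poly2 op_prec)
{
  return poly2_maybe_lt (result_prec, op_prec);
}
```

In juzhe's example, the value found is v4 (vbool16_t, VNx4BImode) and the load being looked up is vbool8_t (VNx8BImode). With identical precisions the check does not fire, so the punning goes through. With [4, 4] against [8, 8] it fires, and the second load is kept.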
> Thanks again; I will keep you posted.
>
> Pan
>
>
>
> ________________________________
> From: Andrew Stubbs <ams@codesourcery.com>
> Sent: Monday, February 13, 2023 19:00
> To: Richard Biener <rguenther@suse.de>; juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
> Cc: Pan Li <incarnation.p.lee@outlook.com>; gcc-patches <gcc-patches@gcc.gnu.org>; kito.cheng <kito.cheng@sifive.com>; richard.sandiford@arm.com <richard.sandiford@arm.com>
> Subject: Re: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
>
> I presume I've been CC'd on this conversation because weird vector
> architecture problems have happened to me before. :)
>
> However, I'm not sure I can help much because AMD GCN does not use
> BImode vectors at all. This is partly because loading boolean values
> into a GCN vector would have 31 padding bits for each lane, but mostly
> because the result of comparison instructions is written to a DImode
> scalar register, not into a vector.
>
> I did experiment, long ago, with having a V64BImode that could be stored
> in scalar registers (tieable with DImode), but there wasn't any great
> advantage and it broke VECTOR_MODE_P in most other contexts.
>
> It's possible to store truth values in vectors as integers, and there
> are some cases where we do so (SIMD clone mask arguments, for example),
> but that's mostly to smooth things over in the middle-end.
>
> The problem with padding bits is something I do see: V64QImode has 24
> padding bits for each lane, in register. While there are instructions
> that will load and store QImode vectors correctly, without the padding,
> the backend still has to handle all the sign-extends, zero-extends, and
> truncates explicitly, because the middle-end and expand pass give no
> assistance with that for vectors (unlike scalars).
>
> Andrew
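The per-lane work that Andrew describes for V64QImode can be sketched in plain C by modelling one 32-bit lane that holds a QImode value: only the low 8 bits are meaningful, and the 24 padding bits contain whatever the lane previously held. The lane32 type and the helpers are hypothetical and are not GCN backend code. They only show what the explicit truncate, zero-extend and sign-extend steps compute for a single lane.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 32-bit vector lane holding a QImode value:
   bits 0-7 are meaningful, bits 8-31 are padding.  */
typedef uint32_t lane32;

/* Truncate: keep only the 8 meaningful bits (e.g. before a byte store).  */
static uint8_t
lane_truncate_qi (lane32 lane)
{
  return (uint8_t) (lane & 0xffu);
}

/* Zero-extend the QImode value to a full lane, clearing the padding.  */
static lane32
lane_zero_extend_qi (lane32 lane)
{
  return lane & 0xffu;
}

/* Sign-extend the QImode value to a full lane: copy bit 7 into the
   padding bits.  The xor/subtract form avoids implementation-defined
   conversions.  */
static lane32
lane_sign_extend_qi (lane32 lane)
{
  uint32_t low = lane & 0xffu;
  return (low ^ 0x80u) - 0x80u;
}
```

For a lane containing 0xABCDEF80, truncation gives 0x80, zero extension gives 0x00000080, and sign extension gives 0xFFFFFF80. The padding bits never influence the result, but something has to run these operations explicitly.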
> On 13/02/2023 08:07, Richard Biener via Gcc-patches wrote:
> > On Sat, 11 Feb 2023, juzhe.zhong@rivai.ai wrote:
> >
> >> Thanks for contributing this.
> >> Hi, Richard. Can you help us with this issue?
> >> In RVV, we have vbool8_t (VNx8BImode), vbool16_t (VNx4BImode), vbool32_t (VNx2BImode), and vbool64_t (VNx1BImode).
> >> Since we use a 1-bit mask, each BOOL occupies 1 bit.
> >> According to the RVV ISA, we adjust these modes as follows:
> >>
> >> VNx8BImode poly (8,8) NUNITS (each unit is a 1-bit mask)
> >> VNx4BImode poly (4,4) NUNITS (each unit is a 1-bit mask)
> >> VNx2BImode poly (2,2) NUNITS (each unit is a 1-bit mask)
> >> VNx1BImode poly (1,1) NUNITS (each unit is a 1-bit mask)
> >
> > So how's VNx1BImode laid out for N == 2? Is that still a single
> > byte and two consecutive bits? I suppose so.
> >
> > But then GET_MODE_PRECISION (GET_MODE_INNER (..)) should always be 1?
> >
> > I'm not sure what GET_MODE_PRECISION of the vector mode itself
> > should be here, but then I wonder ...
> >
> >> If we use GET_MODE_BITSIZE or GET_MODE_NUNITS, the values differ between these modes.
> >> However, GET_MODE_SIZE returns the same value for all of them (poly (1,1)).
> >> This lets them be tied together and produces wrong code, since their bitsizes are different.
> >> Consider this case:
> >> #include "riscv_vector.h"
> >> void foo5_3 (int32_t * restrict in, int32_t * restrict out, size_t n, int cond)
> >> {
> >> vint8m1_t v = *(vint8m1_t*)in;
> >> *(vint8m1_t*)out = v;
> >> vbool16_t v4 = *(vbool16_t *)in;
> >> *(vbool16_t *)(out + 300) = v4;
> >> vbool8_t v3 = *(vbool8_t*)in;
> >> *(vbool8_t*)(out + 200) = v3;
> >> }
> >> The vbool8_t load (the second vlm.v) is missing, because GCC produces "v3 = VIEW_CONVERT (vbool8_t) v4" in GIMPLE.
> >> We have not been able to fix it in the RISC-V backend. Can you help us with this? Thanks.
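The size mismatch behind this report can be made concrete with a small C sketch that computes how many mask bits, and therefore bytes, a vlm.v/vsm.v touches for each vboolN_t. It assumes vl == VLMAX = VLEN / N and uses VLEN = 128, the minimum the V extension requires. The helper names are hypothetical. It also shows the packed mask layout (element i in bit i % 8 of byte i / 8), which means two VNx1BI elements share one byte as consecutive bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of mask elements in a vbool<N>_t when
   vl == VLMAX.  The mask ratio N = SEW/LMUL gives VLMAX = VLEN / N.  */
static unsigned
vbool_elements (unsigned vlen, unsigned n)
{
  assert (n >= 1 && n <= 64 && (n & (n - 1)) == 0);
  return vlen / n;
}

/* Hypothetical helper: bytes accessed by vlm.v/vsm.v for that vl.
   The mask load/store covers ceil (vl / 8) bytes.  */
static unsigned
vbool_mask_bytes (unsigned vlen, unsigned n)
{
  return (vbool_elements (vlen, n) + 7) / 8;
}

/* Read mask element I from its memory image: elements are packed one
   bit each, element I in bit I % 8 of byte I / 8.  */
static int
mask_bit (const uint8_t *mem, unsigned i)
{
  return (mem[i / 8] >> (i % 8)) & 1;
}
```

At VLEN = 128, vbool8_t covers 16 mask bits (2 bytes) while vbool16_t covers 8 bits (1 byte), even though GET_MODE_SIZE reports the same value for both modes. Reusing the vbool16_t value v4 for the vbool8_t load v3 therefore leaves the second byte unloaded, which is consistent with the missing vlm.v.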
> >
> > ... why for the loads the "padding" is not loaded? The above testcase
> > is probably more complicated than necessary as well?
> >
> > Thanks,
> > Richard.
> >
> >>
> >> juzhe.zhong@rivai.ai
> >>
> >> From: incarnation.p.lee
> >> Date: 2023-02-11 16:46
> >> To: gcc-patches
> >> CC: juzhe.zhong; kito.cheng; rguenther; Pan Li
> >> Subject: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
> >> From: Pan Li <incarnation.p.lee@outlook.com>
> >>
> >> Fix the bug in mode tieability for the rvv bool types. The vbool*_t
> >> modes cannot be tied, as the actual load/store size is determined by
> >> the vl. The mode sizes of the rvv bool types are also adjusted for the
> >> underlying optimization passes. The rvv bool types are vbool*_t, aka
> >> vbool1_t, vbool2_t, vbool4_t, vbool8_t, vbool16_t, vbool32_t, and
> >> vbool64_t.
> >>
> >> PR 108185
> >> PR 108654
> >>
> >> gcc/ChangeLog:
> >>
> >> * config/riscv/riscv-modes.def (ADJUST_BYTESIZE):
> >> * config/riscv/riscv.cc (riscv_v_adjust_bytesize):
> >> (riscv_modes_tieable_p):
> >> * config/riscv/riscv.h (riscv_v_adjust_bytesize):
> >> * machmode.h (VECTOR_BOOL_MODE_P):
> >> * tree-ssa-sccvn.cc (visit_reference_op_load):
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> * gcc.target/riscv/pr108185-1.c: New test.
> >> * gcc.target/riscv/pr108185-2.c: New test.
> >> * gcc.target/riscv/pr108185-3.c: New test.
> >> * gcc.target/riscv/pr108185-4.c: New test.
> >> * gcc.target/riscv/pr108185-5.c: New test.
> >> * gcc.target/riscv/pr108185-6.c: New test.
> >> * gcc.target/riscv/pr108185-7.c: New test.
> >> * gcc.target/riscv/pr108185-8.c: New test.
> >>
> >> Signed-off-by: Pan Li <incarnation.p.lee@outlook.com>
> >> ---
> >> gcc/config/riscv/riscv-modes.def | 14 ++--
> >> gcc/config/riscv/riscv.cc | 34 ++++++++-
> >> gcc/config/riscv/riscv.h | 2 +
> >> gcc/machmode.h | 3 +
> >> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
> >> gcc/tree-ssa-sccvn.cc | 13 +++-
> >> 13 files changed, 608 insertions(+), 11 deletions(-)
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >>
> >> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
> >> index d5305efa8a6..cc21d3c83a2 100644
> >> --- a/gcc/config/riscv/riscv-modes.def
> >> +++ b/gcc/config/riscv/riscv-modes.def
> >> @@ -64,13 +64,13 @@ ADJUST_ALIGNMENT (VNx16BI, 1);
> >> ADJUST_ALIGNMENT (VNx32BI, 1);
> >> ADJUST_ALIGNMENT (VNx64BI, 1);
> >> -ADJUST_BYTESIZE (VNx1BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >> -ADJUST_BYTESIZE (VNx2BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >> -ADJUST_BYTESIZE (VNx4BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >> -ADJUST_BYTESIZE (VNx8BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >> -ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >> -ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >> -ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
> >> +ADJUST_BYTESIZE (VNx1BI, riscv_v_adjust_bytesize (VNx1BImode, 1));
> >> +ADJUST_BYTESIZE (VNx2BI, riscv_v_adjust_bytesize (VNx2BImode, 1));
> >> +ADJUST_BYTESIZE (VNx4BI, riscv_v_adjust_bytesize (VNx4BImode, 1));
> >> +ADJUST_BYTESIZE (VNx8BI, riscv_v_adjust_bytesize (VNx8BImode, 1));
> >> +ADJUST_BYTESIZE (VNx16BI, riscv_v_adjust_bytesize (VNx16BImode, 2));
> >> +ADJUST_BYTESIZE (VNx32BI, riscv_v_adjust_bytesize (VNx32BImode, 4));
> >> +ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_bytesize (VNx64BImode, 8));
> >> /*
> >> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
> >> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> >> index 3b7804b7501..138c052e13c 100644
> >> --- a/gcc/config/riscv/riscv.cc
> >> +++ b/gcc/config/riscv/riscv.cc
> >> @@ -1003,6 +1003,27 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
> >> return scale;
> >> }
> >> +/* Called from ADJUST_BYTESIZE in riscv-modes.def. Return the correct
> >> + byte size for the corresponding machine_mode. */
> >> +
> >> +poly_int64
> >> +riscv_v_adjust_bytesize (machine_mode mode, int scale)
> >> +{
> >> + gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
> >> +
> >> + if (riscv_v_ext_vector_mode_p (mode))
> >> + {
> >> + poly_uint16 mode_size = GET_MODE_SIZE (mode);
> >> +
> >> + if (known_lt (mode_size, BYTES_PER_RISCV_VECTOR))
> >> + return mode_size;
> >> + else
> >> + return BYTES_PER_RISCV_VECTOR;
> >> + }
> >> +
> >> + return scale;
> >> +}
> >> +
> >> /* Return true if X is a valid address for machine mode MODE. If it is,
> >> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
> >> effect. */
> >> @@ -5807,11 +5828,22 @@ riscv_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
> >> /* Implement TARGET_MODES_TIEABLE_P.
> >> Don't allow floating-point modes to be tied, since type punning of
> >> - single-precision and double-precision is implementation defined. */
> >> + single-precision and double-precision is implementation defined.
> >> +
> >> + Don't allow different vbool*_t modes to be tied, since the type
> >> + size is determined by vl. */
> >> static bool
> >> riscv_modes_tieable_p (machine_mode mode1, machine_mode mode2)
> >> {
> >> + if (riscv_v_ext_vector_mode_p (mode1) && riscv_v_ext_vector_mode_p (mode2))
> >> + {
> >> + if (VECTOR_BOOL_MODE_P (mode1) || VECTOR_BOOL_MODE_P (mode2))
> >> + return false;
> >> +
> >> + return known_eq (GET_MODE_SIZE (mode1), GET_MODE_SIZE (mode2));
> >> + }
> >> +
> >> return (mode1 == mode2
> >> || !(GET_MODE_CLASS (mode1) == MODE_FLOAT
> >> && GET_MODE_CLASS (mode2) == MODE_FLOAT));
> >> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> >> index faffd5a77fe..f857223338c 100644
> >> --- a/gcc/config/riscv/riscv.h
> >> +++ b/gcc/config/riscv/riscv.h
> >> @@ -1028,6 +1028,8 @@ extern unsigned riscv_stack_boundary;
> >> extern unsigned riscv_bytes_per_vector_chunk;
> >> extern poly_uint16 riscv_vector_chunks;
> >> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
> >> +extern poly_int64 riscv_v_adjust_bytesize (machine_mode mode, int scale);
> >> +
> >> /* The number of bits and bytes in a RVV vector. */
> >> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
> >> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
> >> diff --git a/gcc/machmode.h b/gcc/machmode.h
> >> index f1865c1ef42..6720472f2c9 100644
> >> --- a/gcc/machmode.h
> >> +++ b/gcc/machmode.h
> >> @@ -242,6 +242,9 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
> >> || CLASS == MODE_ACCUM \
> >> || CLASS == MODE_UACCUM)
> >> +/* Nonzero if MODE is a vector bool mode. */
> >> +#define VECTOR_BOOL_MODE_P(MODE) (GET_MODE_CLASS(MODE) == MODE_VECTOR_BOOL)
> >> +
> >> /* An optional T (i.e. a T or nothing), where T is some form of mode class. */
> >> template<typename T>
> >> class opt_mode
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> >> new file mode 100644
> >> index 00000000000..c3d0b10271a
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool1_t v1 = *(vbool1_t*)in;
> >> + vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> + *(vbool1_t*)(out + 100) = v1;
> >> + *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool1_t v1 = *(vbool1_t*)in;
> >> + vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> + *(vbool1_t*)(out + 100) = v1;
> >> + *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool1_t v1 = *(vbool1_t*)in;
> >> + vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> + *(vbool1_t*)(out + 100) = v1;
> >> + *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool1_t v1 = *(vbool1_t*)in;
> >> + vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> + *(vbool1_t*)(out + 100) = v1;
> >> + *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool1_t v1 = *(vbool1_t*)in;
> >> + vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> + *(vbool1_t*)(out + 100) = v1;
> >> + *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool1_t v1 = *(vbool1_t*)in;
> >> + vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> + *(vbool1_t*)(out + 100) = v1;
> >> + *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> >> new file mode 100644
> >> index 00000000000..bd13ba916da
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool2_t v1 = *(vbool2_t*)in;
> >> + vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> + *(vbool2_t*)(out + 100) = v1;
> >> + *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool2_t v1 = *(vbool2_t*)in;
> >> + vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> + *(vbool2_t*)(out + 100) = v1;
> >> + *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool2_t v1 = *(vbool2_t*)in;
> >> + vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> + *(vbool2_t*)(out + 100) = v1;
> >> + *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool2_t v1 = *(vbool2_t*)in;
> >> + vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> + *(vbool2_t*)(out + 100) = v1;
> >> + *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool2_t v1 = *(vbool2_t*)in;
> >> + vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> + *(vbool2_t*)(out + 100) = v1;
> >> + *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool2_t v1 = *(vbool2_t*)in;
> >> + vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> + *(vbool2_t*)(out + 100) = v1;
> >> + *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> >> new file mode 100644
> >> index 00000000000..99928f7b1cc
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool4_t v1 = *(vbool4_t*)in;
> >> + vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> + *(vbool4_t*)(out + 100) = v1;
> >> + *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool4_t v1 = *(vbool4_t*)in;
> >> + vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> + *(vbool4_t*)(out + 100) = v1;
> >> + *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool4_t v1 = *(vbool4_t*)in;
> >> + vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> + *(vbool4_t*)(out + 100) = v1;
> >> + *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool4_t v1 = *(vbool4_t*)in;
> >> + vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> + *(vbool4_t*)(out + 100) = v1;
> >> + *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool4_t v1 = *(vbool4_t*)in;
> >> + vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> + *(vbool4_t*)(out + 100) = v1;
> >> + *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool4_t v1 = *(vbool4_t*)in;
> >> + vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> + *(vbool4_t*)(out + 100) = v1;
> >> + *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> >> new file mode 100644
> >> index 00000000000..e70284fada8
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool8_t v1 = *(vbool8_t*)in;
> >> + vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> + *(vbool8_t*)(out + 100) = v1;
> >> + *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool8_t v1 = *(vbool8_t*)in;
> >> + vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> + *(vbool8_t*)(out + 100) = v1;
> >> + *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool8_t v1 = *(vbool8_t*)in;
> >> + vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> + *(vbool8_t*)(out + 100) = v1;
> >> + *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool8_t v1 = *(vbool8_t*)in;
> >> + vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> + *(vbool8_t*)(out + 100) = v1;
> >> + *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool8_t v1 = *(vbool8_t*)in;
> >> + vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> + *(vbool8_t*)(out + 100) = v1;
> >> + *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool8_t v1 = *(vbool8_t*)in;
> >> + vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> + *(vbool8_t*)(out + 100) = v1;
> >> + *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> >> new file mode 100644
> >> index 00000000000..575a7842cdf
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool16_t v1 = *(vbool16_t*)in;
> >> + vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> + *(vbool16_t*)(out + 100) = v1;
> >> + *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool16_t v1 = *(vbool16_t*)in;
> >> + vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> + *(vbool16_t*)(out + 100) = v1;
> >> + *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool16_t v1 = *(vbool16_t*)in;
> >> + vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> + *(vbool16_t*)(out + 100) = v1;
> >> + *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool16_t v1 = *(vbool16_t*)in;
> >> + vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> + *(vbool16_t*)(out + 100) = v1;
> >> + *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool16_t v1 = *(vbool16_t*)in;
> >> + vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> + *(vbool16_t*)(out + 100) = v1;
> >> + *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool16_t v1 = *(vbool16_t*)in;
> >> + vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> + *(vbool16_t*)(out + 100) = v1;
> >> + *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> >> new file mode 100644
> >> index 00000000000..95a11d37016
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool32_t v1 = *(vbool32_t*)in;
> >> + vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> + *(vbool32_t*)(out + 100) = v1;
> >> + *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool32_t v1 = *(vbool32_t*)in;
> >> + vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> + *(vbool32_t*)(out + 100) = v1;
> >> + *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool32_t v1 = *(vbool32_t*)in;
> >> + vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> + *(vbool32_t*)(out + 100) = v1;
> >> + *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool32_t v1 = *(vbool32_t*)in;
> >> + vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> + *(vbool32_t*)(out + 100) = v1;
> >> + *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool32_t v1 = *(vbool32_t*)in;
> >> + vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> + *(vbool32_t*)(out + 100) = v1;
> >> + *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool32_t v1 = *(vbool32_t*)in;
> >> + vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> + *(vbool32_t*)(out + 100) = v1;
> >> + *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> >> new file mode 100644
> >> index 00000000000..8f6f0b11f09
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool64_t v1 = *(vbool64_t*)in;
> >> + vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> + *(vbool64_t*)(out + 100) = v1;
> >> + *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool64_t v1 = *(vbool64_t*)in;
> >> + vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> + *(vbool64_t*)(out + 100) = v1;
> >> + *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool64_t v1 = *(vbool64_t*)in;
> >> + vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> + *(vbool64_t*)(out + 100) = v1;
> >> + *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool64_t v1 = *(vbool64_t*)in;
> >> + vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> + *(vbool64_t*)(out + 100) = v1;
> >> + *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool64_t v1 = *(vbool64_t*)in;
> >> + vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> + *(vbool64_t*)(out + 100) = v1;
> >> + *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool64_t v1 = *(vbool64_t*)in;
> >> + vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> + *(vbool64_t*)(out + 100) = v1;
> >> + *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >> new file mode 100644
> >> index 00000000000..d96959dd064
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >> @@ -0,0 +1,77 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool1_t v1 = *(vbool1_t*)in;
> >> + vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> + *(vbool1_t*)(out + 100) = v1;
> >> + *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool2_t v1 = *(vbool2_t*)in;
> >> + vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> + *(vbool2_t*)(out + 100) = v1;
> >> + *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool4_t v1 = *(vbool4_t*)in;
> >> + vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> + *(vbool4_t*)(out + 100) = v1;
> >> + *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool8_t v1 = *(vbool8_t*)in;
> >> + vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> + *(vbool8_t*)(out + 100) = v1;
> >> + *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool16_t v1 = *(vbool16_t*)in;
> >> + vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> + *(vbool16_t*)(out + 100) = v1;
> >> + *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool32_t v1 = *(vbool32_t*)in;
> >> + vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> + *(vbool32_t*)(out + 100) = v1;
> >> + *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool64_t v1 = *(vbool64_t*)in;
> >> + vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> + *(vbool64_t*)(out + 100) = v1;
> >> + *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> >> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> >> index 028bedbc9a0..19fdba8cfa2 100644
> >> --- a/gcc/tree-ssa-sccvn.cc
> >> +++ b/gcc/tree-ssa-sccvn.cc
> >> @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. If not see
> >> #include "gimple-fold.h"
> >> #include "tree-eh.h"
> >> #include "gimplify.h"
> >> +#include "target.h"
> >> #include "flags.h"
> >> #include "dojump.h"
> >> #include "explow.h"
> >> @@ -5657,10 +5658,16 @@ visit_reference_op_load (tree lhs, tree op, gimple *stmt)
> >> if (result
> >> && !useless_type_conversion_p (TREE_TYPE (result), TREE_TYPE (op)))
> >> {
> >> + machine_mode result_mode = TYPE_MODE (TREE_TYPE (result));
> >> + machine_mode op_mode = TYPE_MODE (TREE_TYPE (op));
> >> + poly_uint16 result_mode_precision = GET_MODE_PRECISION (result_mode);
> >> + poly_uint16 op_mode_precision = GET_MODE_PRECISION (op_mode);
> >> +
> >> /* Avoid the type punning in case the result mode has padding where
> >> - the op we lookup has not. */
> >> - if (maybe_lt (GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (result))),
> >> - GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op)))))
> >> + the op we lookup has not.
> >> + Avoid the type punning in case the target mode cannot be tied. */
> >> + if (maybe_lt (result_mode_precision, op_mode_precision)
> >> + || !targetm.modes_tieable_p (result_mode, op_mode))
> >> result = NULL_TREE;
> >> else
> >> {
> >>
> >
>
>
After some investigation, adjusting the mode precision can tell VNx1BI through VNx64BI apart, in addition to the existing mode size. Thus I would like to prepare a patch for the precision adjustment alone first.
Unfortunately, there is currently one selftest failure when I adjust the precision of VNx*BI, and I am still working on it. I will keep you all posted.
VNx1BI adjust precision => 1
VNx2BI adjust precision => 2
VNx4BI adjust precision => 4
VNx8BI adjust precision => 8
VNx16BI adjust precision => 16
VNx32BI adjust precision => 32
VNx64BI adjust precision => 64
Pan
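To make the point concrete, here is a small, hypothetical C model (not GCC code) of the four mask modes that, per juzhe's message quoted below, all report GET_MODE_SIZE poly (1,1). The `size` field stands for that shared coefficient and `precision` is the adjusted value from the list above. `pun_allowed` only mirrors the shape of the existing `maybe_lt (result precision, op precision)` guard in visit_reference_op_load, using plain unsigned values instead of poly_uint16; the struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a machine mode, keeping only the two
   properties discussed in this thread (abstract units).  */
struct bool_mode
{
  const char *name;
  unsigned size;       /* Shared GET_MODE_SIZE coefficient (1 for all four).  */
  unsigned precision;  /* Adjusted precision from the list above.  */
};

static const struct bool_mode bool_modes[] = {
  { "VNx1BI", 1, 1 },  /* vbool64_t */
  { "VNx2BI", 1, 2 },  /* vbool32_t */
  { "VNx4BI", 1, 4 },  /* vbool16_t */
  { "VNx8BI", 1, 8 },  /* vbool8_t */
};

/* Look up a mode by name; return NULL if it is not in the table.  */
static const struct bool_mode *
find_bool_mode (const char *name)
{
  for (size_t i = 0; i < sizeof bool_modes / sizeof bool_modes[0]; i++)
    if (strcmp (bool_modes[i].name, name) == 0)
      return &bool_modes[i];
  return NULL;
}

/* Size alone: the modes in this table are indistinguishable.  */
static bool
same_size (const struct bool_mode *a, const struct bool_mode *b)
{
  return a->size == b->size;
}

/* Shape of the existing guard: reusing a RESULT-mode value for an
   OP-mode load is rejected when RESULT has the lower precision.  */
static bool
pun_allowed (const struct bool_mode *result, const struct bool_mode *op)
{
  assert (result != NULL && op != NULL);
  return !(result->precision < op->precision);
}
```

Under these assumptions, `same_size` cannot separate VNx4BI (vbool16_t) from VNx8BI (vbool8_t), while the precision-based guard refuses to reuse a VNx4BI value for a VNx8BI load.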
________________________________
From: Richard Biener <rguenther@suse.de>
Sent: Monday, February 13, 2023 23:47
To: 盼 李 <incarnation.p.lee@outlook.com>
Cc: Andrew Stubbs <ams@codesourcery.com>; juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>; gcc-patches <gcc-patches@gcc.gnu.org>; kito.cheng <kito.cheng@sifive.com>; richard.sandiford@arm.com <richard.sandiford@arm.com>
Subject: Re: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
On Mon, 13 Feb 2023, 盼 李 wrote:
> Thanks all for your help and comments.
>
> Let me share more information about this patch, especially the tree-ssa-sccvn.cc part.
>
> Assume we have the test code below for this issue.
>
> void
> test_1(int8_t * restrict in, int8_t * restrict out) {
>   vbool8_t v2 = *(vbool8_t*)in;
>   vbool16_t v5 = *(vbool16_t*)in;
>
>   *(vbool8_t*)(out + 100) = v2;
>   *(vbool16_t*)(out + 200) = v5;
> }
>
> Without the tree-ssa-sccvn.cc change:
> ------------------------------------------------------------------------------------
> void test_1 (int8_t * restrict in, int8_t * restrict out)
> {
> vbool8_t v2;
> __rvv_bool16_t _1;
>
> <bb 2> [local count: 1073741824]:
> v2_4 = MEM[(vbool8_t *)in_3(D)];
> _1 = VIEW_CONVERT_EXPR<__rvv_bool16_t>(v2_4); // inserted during 039t.fre1
> MEM[(vbool8_t *)out_5(D) + 100B] = v2_4;
> MEM[(vbool16_t *)out_5(D) + 200B] = _1;
> return;
> }
>
> With the tree-ssa-sccvn.cc change:
> ------------------------------------------------------------------------------------
> void test_1 (int8_t * restrict in, int8_t * restrict out)
> {
> vbool16_t v5;
> vbool8_t v2;
>
> <bb 2> [local count: 1073741824]:
> v2_3 = MEM[(vbool8_t *)in_2(D)];
> v5_4 = MEM[(vbool16_t *)in_2(D)];
> MEM[(vbool8_t *)out_5(D) + 100B] = v2_3;
> MEM[(vbool16_t *)out_5(D) + 200B] = v5_4;
> return;
> }
>
> Thus, I found that the fre1 pass (a-main.c.039t.fre1) inserts this VIEW_CONVERT_EXPR.
> With some debugging, I located the difference in
> expressions_equal_p. If GET_MODE_SIZE (mode) is the same for VNx8BImode
> and VNx4BImode, expressions_equal_p compares the same tree address, i.e.
> POLY_INT_CST [8, 8].
>
> visit_reference_op_load
> |- vn_reference_lookup
> |- vn_reference_lookup_2
> |- find_slot_with_hash
> |- vn_reference_hasher::equal
> |- expressions_equal_p
>
> Meanwhile, we also double-checked that setting different MODE_SIZE values for
> VNx8BImode and VNx4BImode (for example, [8, 1] and [4, 1], for testing only)
> resolves this issue. But they should be [1, 1] according to the ISA semantics.
>
> We then tried setting other MODE_XXX values, but that did not seem to work at all. For example:
>
> VNx4BIMode NUNITS [0x4, 0x4]
> VNx8BIMode NUNITS [0x8, 0x8]
>
> Finally, I found TARGET_MODES_TIEABLE_P and used it in the function
> visit_reference_op_load to resolve this issue.
>
> I will continue to try approaches outside tree-ssa-sccvn.cc in case this is
> not the right place to fix the issue.
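The failure mode described above can be sketched with a toy value table. It is a hypothetical, heavily simplified stand-in for the vn_reference_lookup path, not GCC's vn_reference_s or vn_reference_hasher. A key holding only base, offset and size makes the vbool16_t load hit the earlier vbool8_t entry when both modes report the same size, which corresponds to FRE reusing the first value through a VIEW_CONVERT_EXPR; keying on a value that differs per mode keeps the two loads apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lookup key: just enough to show the collision.  */
struct ref_key
{
  const void *base;  /* Address the load is based on (e.g. "in").  */
  long offset;       /* Byte offset from BASE.  */
  unsigned size;     /* Size recorded for the access.  */
};

#define MAX_REFS 8

static struct ref_key vn_keys[MAX_REFS];
static int vn_values[MAX_REFS];
static int n_refs;

/* Stand-in for the equality check reached via expressions_equal_p.  */
static bool
ref_key_equal (const struct ref_key *a, const struct ref_key *b)
{
  return a->base == b->base && a->offset == b->offset && a->size == b->size;
}

/* Forget every recorded load.  */
static void
reset_vn_table (void)
{
  n_refs = 0;
}

/* Return the value number of an equal, earlier load if there is one;
   otherwise record KEY with NEW_VALUE and return NEW_VALUE.  */
static int
lookup_or_insert (const struct ref_key *key, int new_value)
{
  assert (key != NULL);
  for (int i = 0; i < n_refs; i++)
    if (ref_key_equal (&vn_keys[i], key))
      return vn_values[i];  /* Hit: the earlier load's value is reused.  */
  if (n_refs < MAX_REFS)
    {
      vn_keys[n_refs] = *key;
      vn_values[n_refs] = new_value;
      n_refs++;
    }
  return new_value;
}
```

With equal sizes, the second lookup returns value 1 (the vbool8_t load); with per-mode values such as the adjusted precisions 8 and 4, it keeps its own value 2.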
There are other places, like alias analysis, which will not be happy
if the mode size/precision do not match reality. So no, I don't think
modes_tieable is the correct thing to check here. Instead, the existing
check seems to be to the point, but the modes are not set up correctly
to carry the info that one has padding at the end and the other does not.
Richard.
> Thanks again; I will keep you posted.
>
> Pan
>
>
>
> ________________________________
> From: Andrew Stubbs <ams@codesourcery.com>
> Sent: Monday, February 13, 2023 19:00
> To: Richard Biener <rguenther@suse.de>; juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
> Cc: Pan Li <incarnation.p.lee@outlook.com>; gcc-patches <gcc-patches@gcc.gnu.org>; kito.cheng <kito.cheng@sifive.com>; richard.sandiford@arm.com <richard.sandiford@arm.com>
> Subject: Re: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
>
> I presume I've been CC'd on this conversation because weird vector
> architecture problems have happened to me before. :)
>
> However, I'm not sure I can help much because AMD GCN does not use
> BImode vectors at all. This is partly because loading boolean values
> into a GCN vector would have 31 padding bits for each lane, but mostly
> because the result of comparison instructions is written to a DImode
> scalar register, not into a vector.
>
> I did experiment, long ago, with having a V64BImode that could be stored
> in scalar registers (tieable with DImode), but there wasn't any great
> advantage and it broke VECTOR_MODE_P in most other contexts.
>
> It's possible to store truth values in vectors as integers, and there
> are some cases where we do so (SIMD clone mask arguments, for example),
> but that's mostly to smooth things over in the middle-end.
>
> The problem with padding bits is something I do see: V64QImode has 24
> padding bits for each lane, in register. While there are instructions
> that will load and store QImode vectors correctly, without the padding,
> the backend still has to handle all the sign-extends, zero-extends, and
> truncates explicitly, because the middle-end and expand pass give no
> assistance with that for vectors (unlike scalars).
>
> Andrew
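As a rough illustration of the padding Andrew describes, the hypothetical C helpers below model a single 32-bit lane holding a QImode value in its low byte, which leaves 32 - 8 = 24 padding bits. After a lane-wide add, the padding no longer matches the 8-bit result, so the value must be re-normalized with an explicit zero- or sign-extend. This is a scalar sketch, not GCN backend code; the sign-extend relies on GCC's two's-complement conversion to int8_t.

```c
#include <assert.h>
#include <stdint.h>

/* Padding above an 8-bit value stored in a 32-bit lane.  */
#define LANE_BITS 32
#define QI_BITS 8
#define QI_PADDING_BITS (LANE_BITS - QI_BITS)

static_assert (QI_PADDING_BITS == 24,
               "a QImode value in a 32-bit lane leaves 24 padding bits");

/* A lane-wide add: carries can spill into the padding bits.  */
static uint32_t
lane_add (uint32_t a, uint32_t b)
{
  return a + b;
}

/* Clear the padding, treating the low byte as unsigned.  */
static uint32_t
lane_zero_extend (uint32_t lane)
{
  return lane & 0xffu;
}

/* Replicate bit 7 into the padding, treating the low byte as signed.  */
static uint32_t
lane_sign_extend (uint32_t lane)
{
  return (uint32_t) (int32_t) (int8_t) (lane & 0xffu);
}
```

For example, adding 0xff and 0x01 leaves 0x100 in the lane; only after `lane_zero_extend` does the lane hold the correct 8-bit result, 0.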
> On 13/02/2023 08:07, Richard Biener via Gcc-patches wrote:
> > On Sat, 11 Feb 2023, juzhe.zhong@rivai.ai wrote:
> >
> >> Thanks for contributing this.
> >> Hi, Richard. Can you help us with this issue?
> >> In RVV, we have vbool8_t (VNx8BImode), vbool16_t (VNx4BImode), vbool32_t (VNx2BImode), and vbool64_t (VNx1BImode).
> >> Since we use a 1-bit mask, each BOOL occupies 1 bit.
> >> According to the RVV ISA, we adjust these modes as follows:
> >>
> >> VNx8BImode poly (8,8) NUNITS (each unit is a 1-bit mask)
> >> VNx4BImode poly (4,4) NUNITS (each unit is a 1-bit mask)
> >> VNx2BImode poly (2,2) NUNITS (each unit is a 1-bit mask)
> >> VNx1BImode poly (1,1) NUNITS (each unit is a 1-bit mask)
> >
> > So how's VNx1BImode laid out for N == 2? Is that still a single
> > byte and two consecutive bits? I suppose so.
> >
> > But then GET_MODE_PRECISION (GET_MODE_INNER (..)) should always be 1?
> >
> > I'm not sure what GET_MODE_PRECISION of the vector mode itself
> > should be here, but then I wonder ...
> >
> >> If we query GET_MODE_BITSIZE or GET_MODE_NUNITS, the values differ.
> >> However, GET_MODE_SIZE is the same for all of these modes (poly (1,1)).
> >> This makes them tieable and produces wrong code, since their bit sizes differ.
> >> Consider this case:
> >> #include "riscv_vector.h"
> >> void foo5_3 (int32_t * restrict in, int32_t * restrict out, size_t n, int cond)
> >> {
> >>   vint8m1_t v = *(vint8m1_t*)in;
> >>   *(vint8m1_t*)out = v;
> >>   vbool16_t v4 = *(vbool16_t *)in;
> >>   *(vbool16_t *)(out + 300) = v4;
> >>   vbool8_t v3 = *(vbool8_t*)in;
> >>   *(vbool8_t*)(out + 200) = v3;
> >> }
> >> The second vbool8_t load (vlm.v) is missing, because GCC emits "v3 = VIEW_CONVERT (vbool8_t) v4" in GIMPLE.
> >> We have not been able to fix it in the RISC-V backend. Can you help us with this? Thanks.
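A back-of-the-envelope check of why the shared size hides the difference: vboolN_t at VLMAX holds VLEN / N mask bits, and vlm.v/vsm.v transfer that many bits rounded up to whole bytes. The sketch below assumes that GET_MODE_SIZE poly (1,1) evaluates to VLEN / 64 bytes (one byte per 64-bit vector chunk); the helper names are hypothetical.

```c
#include <assert.h>

/* Bytes moved by vlm.v/vsm.v for vboolN_t at VLMAX: VLEN / N mask
   bits, rounded up to whole bytes.  */
static unsigned
mask_bytes (unsigned vlen, unsigned n)
{
  assert (n != 0 && vlen % n == 0);
  unsigned bits = vlen / n;
  return (bits + 7) / 8;
}

/* Byte size implied by GET_MODE_SIZE poly (1,1), assuming 64-bit
   chunks: 1 + x bytes with x = VLEN / 64 - 1.  */
static unsigned
poly_1_1_bytes (unsigned vlen)
{
  assert (vlen >= 64 && vlen % 64 == 0);
  return vlen / 64;
}
```

Under this assumption, for VLEN = 128, `mask_bytes (128, 8)` (vbool8_t) is 2 and `mask_bytes (128, 16)` (vbool16_t) is 1, while `poly_1_1_bytes (128)` is 2 for both modes, so the vbool16_t mode size includes a byte that its vlm.v never loads.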
> >
> > ... why is the "padding" not loaded for the loads? The above testcase
> > is probably more complicated than necessary as well?
> >
> > Thanks,
> > Richard.
> >
> >>
> >> juzhe.zhong@rivai.ai
> >>
> >> From: incarnation.p.lee
> >> Date: 2023-02-11 16:46
> >> To: gcc-patches
> >> CC: juzhe.zhong; kito.cheng; rguenther; Pan Li
> >> Subject: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
> >> From: Pan Li <incarnation.p.lee@outlook.com>
> >>
> >> Fix the mode-tieable bug for the RVV bool types. The vbool*_t
> >> types cannot be tied, as the actual load/store size is determined
> >> by vl. The mode sizes of the RVV bool types are also adjusted for the
> >> underlying optimization passes. The RVV bool types are vbool*_t, i.e.
> >> vbool1_t, vbool2_t, vbool4_t, vbool8_t, vbool16_t, vbool32_t, and
> >> vbool64_t.
> >>
> >> PR 108185
> >> PR 108654
> >>
> >> gcc/ChangeLog:
> >>
> >> * config/riscv/riscv-modes.def (ADJUST_BYTESIZE):
> >> * config/riscv/riscv.cc (riscv_v_adjust_bytesize):
> >> (riscv_modes_tieable_p):
> >> * config/riscv/riscv.h (riscv_v_adjust_bytesize):
> >> * machmode.h (VECTOR_BOOL_MODE_P):
> >> * tree-ssa-sccvn.cc (visit_reference_op_load):
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> * gcc.target/riscv/pr108185-1.c: New test.
> >> * gcc.target/riscv/pr108185-2.c: New test.
> >> * gcc.target/riscv/pr108185-3.c: New test.
> >> * gcc.target/riscv/pr108185-4.c: New test.
> >> * gcc.target/riscv/pr108185-5.c: New test.
> >> * gcc.target/riscv/pr108185-6.c: New test.
> >> * gcc.target/riscv/pr108185-7.c: New test.
> >> * gcc.target/riscv/pr108185-8.c: New test.
> >>
> >> Signed-off-by: Pan Li <incarnation.p.lee@outlook.com>
> >> ---
> >> gcc/config/riscv/riscv-modes.def | 14 ++--
> >> gcc/config/riscv/riscv.cc | 34 ++++++++-
> >> gcc/config/riscv/riscv.h | 2 +
> >> gcc/machmode.h | 3 +
> >> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
> >> gcc/tree-ssa-sccvn.cc | 13 +++-
> >> 13 files changed, 608 insertions(+), 11 deletions(-)
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >>
> >> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
> >> index d5305efa8a6..cc21d3c83a2 100644
> >> --- a/gcc/config/riscv/riscv-modes.def
> >> +++ b/gcc/config/riscv/riscv-modes.def
> >> @@ -64,13 +64,13 @@ ADJUST_ALIGNMENT (VNx16BI, 1);
> >> ADJUST_ALIGNMENT (VNx32BI, 1);
> >> ADJUST_ALIGNMENT (VNx64BI, 1);
> >> -ADJUST_BYTESIZE (VNx1BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >> -ADJUST_BYTESIZE (VNx2BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >> -ADJUST_BYTESIZE (VNx4BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >> -ADJUST_BYTESIZE (VNx8BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >> -ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >> -ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >> -ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
> >> +ADJUST_BYTESIZE (VNx1BI, riscv_v_adjust_bytesize (VNx1BImode, 1));
> >> +ADJUST_BYTESIZE (VNx2BI, riscv_v_adjust_bytesize (VNx2BImode, 1));
> >> +ADJUST_BYTESIZE (VNx4BI, riscv_v_adjust_bytesize (VNx4BImode, 1));
> >> +ADJUST_BYTESIZE (VNx8BI, riscv_v_adjust_bytesize (VNx8BImode, 1));
> >> +ADJUST_BYTESIZE (VNx16BI, riscv_v_adjust_bytesize (VNx16BImode, 2));
> >> +ADJUST_BYTESIZE (VNx32BI, riscv_v_adjust_bytesize (VNx32BImode, 4));
> >> +ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_bytesize (VNx64BImode, 8));
> >> /*
> >> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
> >> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> >> index 3b7804b7501..138c052e13c 100644
> >> --- a/gcc/config/riscv/riscv.cc
> >> +++ b/gcc/config/riscv/riscv.cc
> >> @@ -1003,6 +1003,27 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
> >> return scale;
> >> }
> >> +/* Called from ADJUST_BYTESIZE in riscv-modes.def. Return the correct
> >> + byte size for the corresponding machine_mode. */
> >> +
> >> +poly_int64
> >> +riscv_v_adjust_bytesize (machine_mode mode, int scale)
> >> +{
> >> + gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
> >> +
> >> + if (riscv_v_ext_vector_mode_p (mode))
> >> + {
> >> + poly_uint16 mode_size = GET_MODE_SIZE (mode);
> >> +
> >> + if (known_lt (mode_size, BYTES_PER_RISCV_VECTOR))
> >> + return mode_size;
> >> + else
> >> + return BYTES_PER_RISCV_VECTOR;
> >> + }
> >> +
> >> + return scale;
> >> +}
> >> +
> >> /* Return true if X is a valid address for machine mode MODE. If it is,
> >> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
> >> effect. */
> >> @@ -5807,11 +5828,22 @@ riscv_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
> >> /* Implement TARGET_MODES_TIEABLE_P.
> >> Don't allow floating-point modes to be tied, since type punning of
> >> - single-precision and double-precision is implementation defined. */
> >> + single-precision and double-precision is implementation defined.
> >> +
> >> + Don't allow different vbool*_t modes to be tied, since the type
> >> + size is determined by vl. */
> >> static bool
> >> riscv_modes_tieable_p (machine_mode mode1, machine_mode mode2)
> >> {
> >> + if (riscv_v_ext_vector_mode_p (mode1) && riscv_v_ext_vector_mode_p (mode2))
> >> + {
> >> + if (VECTOR_BOOL_MODE_P (mode1) || VECTOR_BOOL_MODE_P (mode2))
> >> + return false;
> >> +
> >> + return known_eq (GET_MODE_SIZE (mode1), GET_MODE_SIZE (mode2));
> >> + }
> >> +
> >> return (mode1 == mode2
> >> || !(GET_MODE_CLASS (mode1) == MODE_FLOAT
> >> && GET_MODE_CLASS (mode2) == MODE_FLOAT));
> >> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> >> index faffd5a77fe..f857223338c 100644
> >> --- a/gcc/config/riscv/riscv.h
> >> +++ b/gcc/config/riscv/riscv.h
> >> @@ -1028,6 +1028,8 @@ extern unsigned riscv_stack_boundary;
> >> extern unsigned riscv_bytes_per_vector_chunk;
> >> extern poly_uint16 riscv_vector_chunks;
> >> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
> >> +extern poly_int64 riscv_v_adjust_bytesize (machine_mode mode, int scale);
> >> +
> >> /* The number of bits and bytes in a RVV vector. */
> >> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
> >> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
> >> diff --git a/gcc/machmode.h b/gcc/machmode.h
> >> index f1865c1ef42..6720472f2c9 100644
> >> --- a/gcc/machmode.h
> >> +++ b/gcc/machmode.h
> >> @@ -242,6 +242,9 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
> >> || CLASS == MODE_ACCUM \
> >> || CLASS == MODE_UACCUM)
> >> +/* Nonzero if MODE is a vector bool mode. */
> >> +#define VECTOR_BOOL_MODE_P(MODE) (GET_MODE_CLASS(MODE) == MODE_VECTOR_BOOL)
> >> +
> >> /* An optional T (i.e. a T or nothing), where T is some form of mode class. */
> >> template<typename T>
> >> class opt_mode
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> >> new file mode 100644
> >> index 00000000000..c3d0b10271a
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool1_t v1 = *(vbool1_t*)in;
> >> + vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> + *(vbool1_t*)(out + 100) = v1;
> >> + *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool1_t v1 = *(vbool1_t*)in;
> >> + vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> + *(vbool1_t*)(out + 100) = v1;
> >> + *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool1_t v1 = *(vbool1_t*)in;
> >> + vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> + *(vbool1_t*)(out + 100) = v1;
> >> + *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool1_t v1 = *(vbool1_t*)in;
> >> + vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> + *(vbool1_t*)(out + 100) = v1;
> >> + *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool1_t v1 = *(vbool1_t*)in;
> >> + vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> + *(vbool1_t*)(out + 100) = v1;
> >> + *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool1_t v1 = *(vbool1_t*)in;
> >> + vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> + *(vbool1_t*)(out + 100) = v1;
> >> + *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> >> new file mode 100644
> >> index 00000000000..bd13ba916da
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool2_t v1 = *(vbool2_t*)in;
> >> + vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> + *(vbool2_t*)(out + 100) = v1;
> >> + *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool2_t v1 = *(vbool2_t*)in;
> >> + vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> + *(vbool2_t*)(out + 100) = v1;
> >> + *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool2_t v1 = *(vbool2_t*)in;
> >> + vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> + *(vbool2_t*)(out + 100) = v1;
> >> + *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool2_t v1 = *(vbool2_t*)in;
> >> + vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> + *(vbool2_t*)(out + 100) = v1;
> >> + *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool2_t v1 = *(vbool2_t*)in;
> >> + vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> + *(vbool2_t*)(out + 100) = v1;
> >> + *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool2_t v1 = *(vbool2_t*)in;
> >> + vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> + *(vbool2_t*)(out + 100) = v1;
> >> + *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> >> new file mode 100644
> >> index 00000000000..99928f7b1cc
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool4_t v1 = *(vbool4_t*)in;
> >> + vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> + *(vbool4_t*)(out + 100) = v1;
> >> + *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool4_t v1 = *(vbool4_t*)in;
> >> + vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> + *(vbool4_t*)(out + 100) = v1;
> >> + *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool4_t v1 = *(vbool4_t*)in;
> >> + vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> + *(vbool4_t*)(out + 100) = v1;
> >> + *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool4_t v1 = *(vbool4_t*)in;
> >> + vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> + *(vbool4_t*)(out + 100) = v1;
> >> + *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool4_t v1 = *(vbool4_t*)in;
> >> + vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> + *(vbool4_t*)(out + 100) = v1;
> >> + *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool4_t v1 = *(vbool4_t*)in;
> >> + vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> + *(vbool4_t*)(out + 100) = v1;
> >> + *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> >> new file mode 100644
> >> index 00000000000..e70284fada8
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool8_t v1 = *(vbool8_t*)in;
> >> + vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> + *(vbool8_t*)(out + 100) = v1;
> >> + *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool8_t v1 = *(vbool8_t*)in;
> >> + vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> + *(vbool8_t*)(out + 100) = v1;
> >> + *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool8_t v1 = *(vbool8_t*)in;
> >> + vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> + *(vbool8_t*)(out + 100) = v1;
> >> + *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool8_t v1 = *(vbool8_t*)in;
> >> + vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> + *(vbool8_t*)(out + 100) = v1;
> >> + *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool8_t v1 = *(vbool8_t*)in;
> >> + vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> + *(vbool8_t*)(out + 100) = v1;
> >> + *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool8_t v1 = *(vbool8_t*)in;
> >> + vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> + *(vbool8_t*)(out + 100) = v1;
> >> + *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> >> new file mode 100644
> >> index 00000000000..575a7842cdf
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool16_t v1 = *(vbool16_t*)in;
> >> + vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> + *(vbool16_t*)(out + 100) = v1;
> >> + *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool16_t v1 = *(vbool16_t*)in;
> >> + vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> + *(vbool16_t*)(out + 100) = v1;
> >> + *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool16_t v1 = *(vbool16_t*)in;
> >> + vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> + *(vbool16_t*)(out + 100) = v1;
> >> + *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool16_t v1 = *(vbool16_t*)in;
> >> + vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> + *(vbool16_t*)(out + 100) = v1;
> >> + *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool16_t v1 = *(vbool16_t*)in;
> >> + vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> + *(vbool16_t*)(out + 100) = v1;
> >> + *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool16_t v1 = *(vbool16_t*)in;
> >> + vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> + *(vbool16_t*)(out + 100) = v1;
> >> + *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> >> new file mode 100644
> >> index 00000000000..95a11d37016
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool32_t v1 = *(vbool32_t*)in;
> >> + vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> + *(vbool32_t*)(out + 100) = v1;
> >> + *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool32_t v1 = *(vbool32_t*)in;
> >> + vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> + *(vbool32_t*)(out + 100) = v1;
> >> + *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool32_t v1 = *(vbool32_t*)in;
> >> + vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> + *(vbool32_t*)(out + 100) = v1;
> >> + *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool32_t v1 = *(vbool32_t*)in;
> >> + vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> + *(vbool32_t*)(out + 100) = v1;
> >> + *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool32_t v1 = *(vbool32_t*)in;
> >> + vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> + *(vbool32_t*)(out + 100) = v1;
> >> + *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool32_t v1 = *(vbool32_t*)in;
> >> + vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> + *(vbool32_t*)(out + 100) = v1;
> >> + *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> >> new file mode 100644
> >> index 00000000000..8f6f0b11f09
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool64_t v1 = *(vbool64_t*)in;
> >> + vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> + *(vbool64_t*)(out + 100) = v1;
> >> + *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool64_t v1 = *(vbool64_t*)in;
> >> + vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> + *(vbool64_t*)(out + 100) = v1;
> >> + *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool64_t v1 = *(vbool64_t*)in;
> >> + vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> + *(vbool64_t*)(out + 100) = v1;
> >> + *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool64_t v1 = *(vbool64_t*)in;
> >> + vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> + *(vbool64_t*)(out + 100) = v1;
> >> + *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool64_t v1 = *(vbool64_t*)in;
> >> + vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> + *(vbool64_t*)(out + 100) = v1;
> >> + *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool64_t v1 = *(vbool64_t*)in;
> >> + vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> + *(vbool64_t*)(out + 100) = v1;
> >> + *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >> new file mode 100644
> >> index 00000000000..d96959dd064
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >> @@ -0,0 +1,77 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool1_t v1 = *(vbool1_t*)in;
> >> + vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> + *(vbool1_t*)(out + 100) = v1;
> >> + *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool2_t v1 = *(vbool2_t*)in;
> >> + vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> + *(vbool2_t*)(out + 100) = v1;
> >> + *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool4_t v1 = *(vbool4_t*)in;
> >> + vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> + *(vbool4_t*)(out + 100) = v1;
> >> + *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool8_t v1 = *(vbool8_t*)in;
> >> + vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> + *(vbool8_t*)(out + 100) = v1;
> >> + *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool16_t v1 = *(vbool16_t*)in;
> >> + vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> + *(vbool16_t*)(out + 100) = v1;
> >> + *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool32_t v1 = *(vbool32_t*)in;
> >> + vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> + *(vbool32_t*)(out + 100) = v1;
> >> + *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool64_t v1 = *(vbool64_t*)in;
> >> + vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> + *(vbool64_t*)(out + 100) = v1;
> >> + *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> >> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> >> index 028bedbc9a0..19fdba8cfa2 100644
> >> --- a/gcc/tree-ssa-sccvn.cc
> >> +++ b/gcc/tree-ssa-sccvn.cc
> >> @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. If not see
> >> #include "gimple-fold.h"
> >> #include "tree-eh.h"
> >> #include "gimplify.h"
> >> +#include "target.h"
> >> #include "flags.h"
> >> #include "dojump.h"
> >> #include "explow.h"
> >> @@ -5657,10 +5658,16 @@ visit_reference_op_load (tree lhs, tree op, gimple *stmt)
> >> if (result
> >> && !useless_type_conversion_p (TREE_TYPE (result), TREE_TYPE (op)))
> >> {
> >> + machine_mode result_mode = TYPE_MODE (TREE_TYPE (result));
> >> + machine_mode op_mode = TYPE_MODE (TREE_TYPE (op));
> >> + poly_uint16 result_mode_precision = GET_MODE_PRECISION (result_mode);
> >> + poly_uint16 op_mode_precision = GET_MODE_PRECISION (op_mode);
> >> +
> >> /* Avoid the type punning in case the result mode has padding where
> >> - the op we lookup has not. */
> >> - if (maybe_lt (GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (result))),
> >> - GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op)))))
> >> + the op we lookup has not.
> >> + Avoid the type punning in case the target mode cannot be tied. */
> >> + if (maybe_lt (result_mode_precision, op_mode_precision)
> >> + || !targetm.modes_tieable_p (result_mode, op_mode))
> >> result = NULL_TREE;
> >> else
> >> {
> >>
> >
>
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
Hi all,
Thank you for your patience. I have just sent another patch, "Bugfix for rvv bool mode precision adjustment", which covers the mode precision adjustment only. Feel free to comment if you have any questions or concerns.
Pan
________________________________
From: 盼 李 <incarnation.p.lee@outlook.com>
Sent: Wednesday, February 15, 2023 23:57
To: Richard Biener <rguenther@suse.de>
Cc: Andrew Stubbs <ams@codesourcery.com>; juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>; gcc-patches <gcc-patches@gcc.gnu.org>; kito.cheng <kito.cheng@sifive.com>; richard.sandiford@arm.com <richard.sandiford@arm.com>
Subject: Re: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
After some investigation, I found that adjusting the mode precision, in addition to the existing mode size, can distinguish VNx1BI through VNx64BI. Thus I would like to prepare a patch for the precision adjustment only first.
Unfortunately, there is currently one selftest failure when I adjust the precision of VNx*BI, and I am still working on it. I will keep you all posted.
VNx1BI adjusted precision => 1
VNx2BI adjusted precision => 2
VNx4BI adjusted precision => 4
VNx8BI adjusted precision => 8
VNx16BI adjusted precision => 16
VNx32BI adjusted precision => 32
VNx64BI adjusted precision => 64
Pan
________________________________
From: Richard Biener <rguenther@suse.de>
Sent: Monday, February 13, 2023 23:47
To: 盼 李 <incarnation.p.lee@outlook.com>
Cc: Andrew Stubbs <ams@codesourcery.com>; juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>; gcc-patches <gcc-patches@gcc.gnu.org>; kito.cheng <kito.cheng@sifive.com>; richard.sandiford@arm.com <richard.sandiford@arm.com>
Subject: Re: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
On Mon, 13 Feb 2023, 盼 李 wrote:
> Thanks all for your help and comments.
>
> Let me share more information about this patch, especially the tree-ssa-sccvn.cc part.
>
> Assume we have the test code below for this issue.
>
> void
> test_1(int8_t * restrict in, int8_t * restrict out) {
> vbool8_t v2 = *(vbool8_t*)in;
> vbool16_t v5 = *(vbool16_t*)in;
>
> *(vbool8_t*)(out + 100) = v2;
> *(vbool16_t*)(out + 200) = v5;
> }
>
> Without the tree-ssa-sccvn.cc change:
> ------------------------------------------------------------------------------------
> void test_1 (int8_t * restrict in, int8_t * restrict out)
> {
> vbool8_t v2;
> __rvv_bool16_t _1;
>
> <bb 2> [local count: 1073741824]:
> v2_4 = MEM[(vbool8_t *)in_3(D)];
> _1 = VIEW_CONVERT_EXPR<__rvv_bool16_t>(v2_4); // insert during 039.fre1
> MEM[(vbool8_t *)out_5(D) + 100B] = v2_4;
> MEM[(vbool16_t *)out_5(D) + 200B] = _1;
> return;
> }
>
> With the tree-ssa-sccvn.cc change:
> ------------------------------------------------------------------------------------
> void test_1 (int8_t * restrict in, int8_t * restrict out)
> {
> vbool16_t v5;
> vbool8_t v2;
>
> <bb 2> [local count: 1073741824]:
> v2_3 = MEM[(vbool8_t *)in_2(D)];
> v5_4 = MEM[(vbool16_t *)in_2(D)];
> MEM[(vbool8_t *)out_5(D) + 100B] = v2_3;
> MEM[(vbool16_t *)out_5(D) + 200B] = v5_4;
> return;
> }
>
> From this, I found that the fre1 pass (a-main.c.039t.fre1) inserts this VIEW_CONVERT_EXPR.
> With some debugging, I traced the difference to expressions_equal_p. If
> GET_MODE_SIZE (mode) is the same for VNx8BImode and VNx4BImode,
> expressions_equal_p compares the same tree address, i.e. the
> POLY_INT_CST [8, 8].
>
> visit_reference_op_load
> |- vn_reference_lookup
> |- vn_reference_lookup_2
> |- find_slot_with_hash
> |- vn_reference_hasher::equal
> |- expressions_equal_p
>
> We also double-checked that setting different MODE_SIZE values for
> VNx8BImode and VNx4BImode (for example, [8, 1] and [4, 1], for testing only)
> resolves this issue. However, according to the ISA semantics both should be [1, 1].
>
> We then tried adjusting other MODE_XXX values, but none of them helped. For example:
>
> VNx4BIMode NUNITS [0x4, 0x4]
> VNx8BIMode NUNITS [0x8, 0x8]
>
> Finally, I found TARGET_MODES_TIEABLE_P and used it in the function
> visit_reference_op_load to resolve this issue.
>
> If tree-ssa-sccvn.cc is not the right place for this fix, I will continue to
> look for other approaches.
There are other places like alias analysis which will not be happy
if the mode size/precision do not match reality. So no, I don't think
modes_tieable is the correct thing to check here. Instead the existing
check seems to be to the point but the modes are not set up correctly
to carry the info of one having padding at the end and the other not.
Richard.
> Thanks again, and I will keep you posted.
>
> Pan
>
>
>
> ________________________________
> From: Andrew Stubbs <ams@codesourcery.com>
> Sent: Monday, February 13, 2023 19:00
> To: Richard Biener <rguenther@suse.de>; juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
> Cc: Pan Li <incarnation.p.lee@outlook.com>; gcc-patches <gcc-patches@gcc.gnu.org>; kito.cheng <kito.cheng@sifive.com>; richard.sandiford@arm.com <richard.sandiford@arm.com>
> Subject: Re: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
>
> I presume I've been CC'd on this conversation because weird vector
> architecture problems have happened to me before. :)
>
> However, I'm not sure I can help much because AMD GCN does not use
> BImode vectors at all. This is partly because loading boolean values
> into a GCN vector would have 31 padding bits for each lane, but mostly
> because the result of comparison instructions is written to a DImode
> scalar register, not into a vector.
>
> I did experiment, long ago, with having a V64BImode that could be stored
> in scalar registers (tieable with DImode), but there wasn't any great
> advantage and it broke VECTOR_MODE_P in most other contexts.
>
> It's possible to store truth values in vectors as integers, and there
> are some cases where we do so (SIMD clone mask arguments, for example),
> but that's mostly to smooth things over in the middle-end.
>
> The problem with padding bits is something I do see: V64QImode has 24
> padding bits for each lane, in register. While there are instructions
> that will load and store QImode vectors correctly, without the padding,
> the backend still has to handle all the sign-extends, zero-extends, and
> truncates explicitly, because the middle-end and expand pass give no
> assistance with that for vectors (unlike scalars).
>
> Andrew
> On 13/02/2023 08:07, Richard Biener via Gcc-patches wrote:
> > On Sat, 11 Feb 2023, juzhe.zhong@rivai.ai wrote:
> >
> >> Thanks for contributing this.
> >> Hi, Richard. Can you help us with this issue?
> >> In RVV, we have vbool8_t (VNx8BImode), vbool16_t (VNx4BImode), vbool32_t (VNx2BImode), vbool64_t (VNx1BImode)
> >> Since we use 1-bit masks, each bool occupies 1 bit.
> >> According to the RVV ISA, we adjust these modes as follows:
> >>
> >> VNx8BImode poly (8,8) NUNITS (each unit is a 1-bit mask)
> >> VNx4BImode poly (4,4) NUNITS (each unit is a 1-bit mask)
> >> VNx2BImode poly (2,2) NUNITS (each unit is a 1-bit mask)
> >> VNx1BImode poly (1,1) NUNITS (each unit is a 1-bit mask)
> >
> > So how's VNx1BImode laid out for N == 2? Is that still a single
> > byte and two consecutive bits? I suppose so.
> >
> > But then GET_MODE_PRECISION (GET_MODE_INNER (..)) should always be 1?
> >
> > I'm not sure what GET_MODE_PRECISION of the vector mode itself
> > should be here, but then I wonder ...
> >
> >> GET_MODE_BITSIZE and GET_MODE_NUNITS return different values for these modes.
> >> However, GET_MODE_SIZE returns the same value for all of them (poly (1,1)).
> >> This lets them be tied together and produces wrong code, since their bit sizes differ.
> >> Consider the case as this:
> >> #include "riscv_vector.h"
> >> void foo5_3 (int32_t * restrict in, int32_t * restrict out, size_t n, int cond)
> >> {
> >> vint8m1_t v = *(vint8m1_t*)in;
> >> *(vint8m1_t*)out = v;
> >> vbool16_t v4 = *(vbool16_t *)in;
> >> *(vbool16_t *)(out + 300) = v4;
> >> vbool8_t v3 = *(vbool8_t*)in;
> >> *(vbool8_t*)(out + 200) = v3;
> >> }
> >> The second vbool8_t load (vlm.v) is missing because GCC generates "v3 = VIEW_CONVERT (vbool8_t) v4" in gimple.
> >> We failed to fix it in RISC-V backend. Can you help us with this? Thanks.
> >
> > ... why, for the loads, is the "padding" not loaded? The above testcase
> > is probably more complicated than necessary as well?
> >
> > Thanks,
> > Richard.
> >
> >>
> >> juzhe.zhong@rivai.ai
> >>
> >> From: incarnation.p.lee
> >> Date: 2023-02-11 16:46
> >> To: gcc-patches
> >> CC: juzhe.zhong; kito.cheng; rguenther; Pan Li
> >> Subject: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types
> >> From: Pan Li <incarnation.p.lee@outlook.com>
> >>
> >> Fix the bug for mode tieable of the rvv bool types. The vbool*_t
> >> cannot be tied as the actual load/store size is determined by
> >> the vl. The mode size of rvv bool types is also adjusted for the
> >> underlying optimization pass. The rvv bool type is vbool*_t, aka
> >> vbool1_t, vbool2_t, vbool4_t, vbool8_t, vbool16_t, vbool32_t, and
> >> vbool64_t.
> >>
> >> PR 108185
> >> PR 108654
> >>
> >> gcc/ChangeLog:
> >>
> >> * config/riscv/riscv-modes.def (ADJUST_BYTESIZE):
> >> * config/riscv/riscv.cc (riscv_v_adjust_bytesize):
> >> (riscv_modes_tieable_p):
> >> * config/riscv/riscv.h (riscv_v_adjust_bytesize):
> >> * machmode.h (VECTOR_BOOL_MODE_P):
> >> * tree-ssa-sccvn.cc (visit_reference_op_load):
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> * gcc.target/riscv/pr108185-1.c: New test.
> >> * gcc.target/riscv/pr108185-2.c: New test.
> >> * gcc.target/riscv/pr108185-3.c: New test.
> >> * gcc.target/riscv/pr108185-4.c: New test.
> >> * gcc.target/riscv/pr108185-5.c: New test.
> >> * gcc.target/riscv/pr108185-6.c: New test.
> >> * gcc.target/riscv/pr108185-7.c: New test.
> >> * gcc.target/riscv/pr108185-8.c: New test.
> >>
> >> Signed-off-by: Pan Li <incarnation.p.lee@outlook.com>
> >> ---
> >> gcc/config/riscv/riscv-modes.def | 14 ++--
> >> gcc/config/riscv/riscv.cc | 34 ++++++++-
> >> gcc/config/riscv/riscv.h | 2 +
> >> gcc/machmode.h | 3 +
> >> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
> >> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
> >> gcc/tree-ssa-sccvn.cc | 13 +++-
> >> 13 files changed, 608 insertions(+), 11 deletions(-)
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
> >> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >>
> >> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
> >> index d5305efa8a6..cc21d3c83a2 100644
> >> --- a/gcc/config/riscv/riscv-modes.def
> >> +++ b/gcc/config/riscv/riscv-modes.def
> >> @@ -64,13 +64,13 @@ ADJUST_ALIGNMENT (VNx16BI, 1);
> >> ADJUST_ALIGNMENT (VNx32BI, 1);
> >> ADJUST_ALIGNMENT (VNx64BI, 1);
> >> -ADJUST_BYTESIZE (VNx1BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >> -ADJUST_BYTESIZE (VNx2BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >> -ADJUST_BYTESIZE (VNx4BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >> -ADJUST_BYTESIZE (VNx8BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >> -ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >> -ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >> -ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
> >> +ADJUST_BYTESIZE (VNx1BI, riscv_v_adjust_bytesize (VNx1BImode, 1));
> >> +ADJUST_BYTESIZE (VNx2BI, riscv_v_adjust_bytesize (VNx2BImode, 1));
> >> +ADJUST_BYTESIZE (VNx4BI, riscv_v_adjust_bytesize (VNx4BImode, 1));
> >> +ADJUST_BYTESIZE (VNx8BI, riscv_v_adjust_bytesize (VNx8BImode, 1));
> >> +ADJUST_BYTESIZE (VNx16BI, riscv_v_adjust_bytesize (VNx16BImode, 2));
> >> +ADJUST_BYTESIZE (VNx32BI, riscv_v_adjust_bytesize (VNx32BImode, 4));
> >> +ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_bytesize (VNx64BImode, 8));
> >> /*
> >> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
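In the ADJUST_BYTESIZE lines above, the scale arguments for VNx1BI through VNx64BI are 1, 1, 1, 1, 2, 4, 8. These match the number of whole bytes needed to hold N one-bit mask elements, ceil(N/8), where N is taken from the mode name VNxNBI. The sketch below only checks that arithmetic. The helper names are hypothetical and are not GCC code.

```c
#include <assert.h>

/* Bytes needed to hold NBITS one-bit mask elements: ceil (NBITS / 8).  */
static int
mask_bytes_for_bits (int nbits)
{
  assert (nbits > 0);
  return (nbits + 7) / 8;
}

/* N from the mode names VNx1BI .. VNx64BI, and the scale argument the
   patch passes to riscv_v_adjust_bytesize for each of those modes.  */
static const int vnx_bi_nbits[] = { 1, 2, 4, 8, 16, 32, 64 };
static const int patch_scale[]  = { 1, 1, 1, 1, 2, 4, 8 };

/* Return 1 if every scale equals ceil (N / 8), else 0.  */
static int
scales_match_mask_bytes (void)
{
  const int count = (int) (sizeof vnx_bi_nbits / sizeof vnx_bi_nbits[0]);
  for (int i = 0; i < count; i++)
    if (mask_bytes_for_bits (vnx_bi_nbits[i]) != patch_scale[i])
      return 0;
  return 1;
}
```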
> >> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> >> index 3b7804b7501..138c052e13c 100644
> >> --- a/gcc/config/riscv/riscv.cc
> >> +++ b/gcc/config/riscv/riscv.cc
> >> @@ -1003,6 +1003,27 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
> >> return scale;
> >> }
> >> +/* Call from ADJUST_BYTESIZE in riscv-modes.def. Return the correct
> >> + BYTES size for corresponding machine_mode. */
> >> +
> >> +poly_int64
> >> +riscv_v_adjust_bytesize (machine_mode mode, int scale)
> >> +{
> >> + gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
> >> +
> >> + if (riscv_v_ext_vector_mode_p (mode))
> >> + {
> >> + poly_uint16 mode_size = GET_MODE_SIZE (mode);
> >> +
> >> + if (known_lt (mode_size, BYTES_PER_RISCV_VECTOR))
> >> + return mode_size;
> >> + else
> >> + return BYTES_PER_RISCV_VECTOR;
> >> + }
> >> +
> >> + return scale;
> >> +}
> >> +
> >> /* Return true if X is a valid address for machine mode MODE. If it is,
> >> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
> >> effect. */
> >> @@ -5807,11 +5828,22 @@ riscv_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
> >> /* Implement TARGET_MODES_TIEABLE_P.
> >> Don't allow floating-point modes to be tied, since type punning of
> >> - single-precision and double-precision is implementation defined. */
> >> + single-precision and double-precision is implementation defined.
> >> +
> >> + Don't allow different vbool*_t modes to be tied, since the type
> >> + size is determined by vl. */
> >> static bool
> >> riscv_modes_tieable_p (machine_mode mode1, machine_mode mode2)
> >> {
> >> + if (riscv_v_ext_vector_mode_p (mode1) && riscv_v_ext_vector_mode_p (mode2))
> >> + {
> >> + if (VECTOR_BOOL_MODE_P (mode1) || VECTOR_BOOL_MODE_P (mode2))
> >> + return false;
> >> +
> >> + return known_eq (GET_MODE_SIZE (mode1), GET_MODE_SIZE (mode2));
> >> + }
> >> +
> >> return (mode1 == mode2
> >> || !(GET_MODE_CLASS (mode1) == MODE_FLOAT
> >> && GET_MODE_CLASS (mode2) == MODE_FLOAT));
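The following plain-C model of the patched riscv_modes_tieable_p may make the hunk above easier to follow. It rests on three assumptions. A hypothetical struct mode_info stands in for machine_mode. A boolean is_rvv stands in for riscv_v_ext_vector_mode_p. Fixed byte sizes replace GCC's poly_int sizes, so known_eq becomes ==.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for GCC's mode information.  */
enum mode_class { MC_INT, MC_FLOAT, MC_VECTOR_INT, MC_VECTOR_BOOL };

struct mode_info
{
  int id;               /* distinct for each machine mode */
  enum mode_class cls;  /* stands in for GET_MODE_CLASS */
  bool is_rvv;          /* stands in for riscv_v_ext_vector_mode_p */
  unsigned size;        /* fixed byte size; GCC uses poly_int sizes */
};

/* Model of the patched riscv_modes_tieable_p.  */
static bool
modes_tieable_model (struct mode_info m1, struct mode_info m2)
{
  if (m1.is_rvv && m2.is_rvv)
    {
      /* Any pair involving an RVV mask (vbool*_t) mode is rejected.  */
      if (m1.cls == MC_VECTOR_BOOL || m2.cls == MC_VECTOR_BOOL)
        return false;
      /* Other RVV modes tie only if their sizes are equal.  */
      return m1.size == m2.size;
    }

  /* Pre-existing rule: don't tie two distinct floating-point modes.  */
  return m1.id == m2.id
         || !(m1.cls == MC_FLOAT && m2.cls == MC_FLOAT);
}
```

As written, the patch also returns false when both arguments are the same vbool mode, although the comment only mentions different vbool*_t modes. The model reproduces the code, not the comment.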
> >> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> >> index faffd5a77fe..f857223338c 100644
> >> --- a/gcc/config/riscv/riscv.h
> >> +++ b/gcc/config/riscv/riscv.h
> >> @@ -1028,6 +1028,8 @@ extern unsigned riscv_stack_boundary;
> >> extern unsigned riscv_bytes_per_vector_chunk;
> >> extern poly_uint16 riscv_vector_chunks;
> >> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
> >> +extern poly_int64 riscv_v_adjust_bytesize (machine_mode mode, int scale);
> >> +
> >> /* The number of bits and bytes in a RVV vector. */
> >> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
> >> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
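For an enabled RVV mask mode, the new riscv_v_adjust_bytesize caps the byte size at BYTES_PER_RISCV_VECTOR, which is riscv_vector_chunks * riscv_bytes_per_vector_chunk as defined above. For any other mode it falls back to scale. The sketch below restates that rule using plain unsigned integers. Because it assumes fixed sizes, the known_lt comparison on poly_int values becomes an ordinary <. The argument values in the example are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* BYTES_PER_RISCV_VECTOR computed from fixed (non-poly) inputs.  */
static unsigned
bytes_per_rvv_vector (unsigned vector_chunks, unsigned bytes_per_chunk)
{
  return vector_chunks * bytes_per_chunk;
}

/* Fixed-size restatement of riscv_v_adjust_bytesize.
   IS_RVV_MODE stands in for riscv_v_ext_vector_mode_p (mode).  */
static unsigned
adjust_bytesize_model (bool is_vector_bool, bool is_rvv_mode,
                       unsigned mode_size, unsigned bytes_per_vector,
                       unsigned scale)
{
  /* The patch asserts that the mode class is MODE_VECTOR_BOOL.  */
  assert (is_vector_bool);

  if (is_rvv_mode)
    /* Never report more than one vector register's worth of bytes.  */
    return mode_size < bytes_per_vector ? mode_size : bytes_per_vector;

  /* Mode not enabled as an RVV mode: use the per-mode constant.  */
  return scale;
}
```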
> >> diff --git a/gcc/machmode.h b/gcc/machmode.h
> >> index f1865c1ef42..6720472f2c9 100644
> >> --- a/gcc/machmode.h
> >> +++ b/gcc/machmode.h
> >> @@ -242,6 +242,9 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
> >> || CLASS == MODE_ACCUM \
> >> || CLASS == MODE_UACCUM)
> >> +/* Nonzero if MODE is a vector bool mode. */
> >> +#define VECTOR_BOOL_MODE_P(MODE) (GET_MODE_CLASS(MODE) == MODE_VECTOR_BOOL)
> >> +
> >> /* An optional T (i.e. a T or nothing), where T is some form of mode class. */
> >> template<typename T>
> >> class opt_mode
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> >> new file mode 100644
> >> index 00000000000..c3d0b10271a
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool1_t v1 = *(vbool1_t*)in;
> >> + vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> + *(vbool1_t*)(out + 100) = v1;
> >> + *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool1_t v1 = *(vbool1_t*)in;
> >> + vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> + *(vbool1_t*)(out + 100) = v1;
> >> + *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool1_t v1 = *(vbool1_t*)in;
> >> + vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> + *(vbool1_t*)(out + 100) = v1;
> >> + *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool1_t v1 = *(vbool1_t*)in;
> >> + vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> + *(vbool1_t*)(out + 100) = v1;
> >> + *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool1_t v1 = *(vbool1_t*)in;
> >> + vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> + *(vbool1_t*)(out + 100) = v1;
> >> + *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool1_t v1 = *(vbool1_t*)in;
> >> + vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> + *(vbool1_t*)(out + 100) = v1;
> >> + *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
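The vsetvli patterns above follow from the SEW/LMUL ratio of the mask types. The tests expect every vboolN_t access to be set up with e8. Because vboolN_t has SEW/LMUL = N, the LMUL operand is 8/N: vbool1_t gives m8 and vbool64_t gives mf8. In pr108185-1.c all six functions load a vbool1_t, which is why the test expects six e8,m8 lines and one line for each of the other types. The small helper below is hypothetical and not part of the testsuite. It computes that mapping.

```c
#include <stdio.h>
#include <string.h>

/* Write the LMUL operand of "vsetvli ..., e8, <lmul>, ta, ma" used for
   vboolN_t into BUF.  vboolN_t has SEW/LMUL = N, so at SEW = 8 the
   LMUL is 8 / N.  Return 0 on success, -1 if N is not 1, 2, 4, ..., 64.  */
static int
e8_lmul_for_vbool (int n, char *buf, size_t len)
{
  if (n < 1 || n > 64 || (n & (n - 1)) != 0)
    return -1;
  if (n <= 8)
    snprintf (buf, len, "m%d", 8 / n);    /* m8, m4, m2, m1 */
  else
    snprintf (buf, len, "mf%d", n / 8);   /* mf2, mf4, mf8 */
  return 0;
}

/* Return 1 if vboolN_t maps to the LMUL string EXPECTED, else 0.  */
static int
vbool_uses_lmul (int n, const char *expected)
{
  char buf[8];
  return e8_lmul_for_vbool (n, buf, sizeof buf) == 0
         && strcmp (buf, expected) == 0;
}
```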
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> >> new file mode 100644
> >> index 00000000000..bd13ba916da
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool2_t v1 = *(vbool2_t*)in;
> >> + vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> + *(vbool2_t*)(out + 100) = v1;
> >> + *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool2_t v1 = *(vbool2_t*)in;
> >> + vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> + *(vbool2_t*)(out + 100) = v1;
> >> + *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool2_t v1 = *(vbool2_t*)in;
> >> + vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> + *(vbool2_t*)(out + 100) = v1;
> >> + *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool2_t v1 = *(vbool2_t*)in;
> >> + vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> + *(vbool2_t*)(out + 100) = v1;
> >> + *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool2_t v1 = *(vbool2_t*)in;
> >> + vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> + *(vbool2_t*)(out + 100) = v1;
> >> + *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool2_t v1 = *(vbool2_t*)in;
> >> + vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> + *(vbool2_t*)(out + 100) = v1;
> >> + *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> >> new file mode 100644
> >> index 00000000000..99928f7b1cc
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool4_t v1 = *(vbool4_t*)in;
> >> + vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> + *(vbool4_t*)(out + 100) = v1;
> >> + *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool4_t v1 = *(vbool4_t*)in;
> >> + vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> + *(vbool4_t*)(out + 100) = v1;
> >> + *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool4_t v1 = *(vbool4_t*)in;
> >> + vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> + *(vbool4_t*)(out + 100) = v1;
> >> + *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool4_t v1 = *(vbool4_t*)in;
> >> + vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> + *(vbool4_t*)(out + 100) = v1;
> >> + *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool4_t v1 = *(vbool4_t*)in;
> >> + vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> + *(vbool4_t*)(out + 100) = v1;
> >> + *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool4_t v1 = *(vbool4_t*)in;
> >> + vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> + *(vbool4_t*)(out + 100) = v1;
> >> + *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> >> new file mode 100644
> >> index 00000000000..e70284fada8
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool8_t v1 = *(vbool8_t*)in;
> >> + vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> + *(vbool8_t*)(out + 100) = v1;
> >> + *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool8_t v1 = *(vbool8_t*)in;
> >> + vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> + *(vbool8_t*)(out + 100) = v1;
> >> + *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool8_t v1 = *(vbool8_t*)in;
> >> + vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> + *(vbool8_t*)(out + 100) = v1;
> >> + *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool8_t v1 = *(vbool8_t*)in;
> >> + vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> + *(vbool8_t*)(out + 100) = v1;
> >> + *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool8_t v1 = *(vbool8_t*)in;
> >> + vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> + *(vbool8_t*)(out + 100) = v1;
> >> + *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool8_t v1 = *(vbool8_t*)in;
> >> + vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> + *(vbool8_t*)(out + 100) = v1;
> >> + *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> >> new file mode 100644
> >> index 00000000000..575a7842cdf
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool16_t v1 = *(vbool16_t*)in;
> >> + vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> + *(vbool16_t*)(out + 100) = v1;
> >> + *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool16_t v1 = *(vbool16_t*)in;
> >> + vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> + *(vbool16_t*)(out + 100) = v1;
> >> + *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool16_t v1 = *(vbool16_t*)in;
> >> + vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> + *(vbool16_t*)(out + 100) = v1;
> >> + *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool16_t v1 = *(vbool16_t*)in;
> >> + vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> + *(vbool16_t*)(out + 100) = v1;
> >> + *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool16_t v1 = *(vbool16_t*)in;
> >> + vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> + *(vbool16_t*)(out + 100) = v1;
> >> + *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool16_t v1 = *(vbool16_t*)in;
> >> + vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> + *(vbool16_t*)(out + 100) = v1;
> >> + *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> >> new file mode 100644
> >> index 00000000000..95a11d37016
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool32_t v1 = *(vbool32_t*)in;
> >> + vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> + *(vbool32_t*)(out + 100) = v1;
> >> + *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool32_t v1 = *(vbool32_t*)in;
> >> + vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> + *(vbool32_t*)(out + 100) = v1;
> >> + *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool32_t v1 = *(vbool32_t*)in;
> >> + vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> + *(vbool32_t*)(out + 100) = v1;
> >> + *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool32_t v1 = *(vbool32_t*)in;
> >> + vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> + *(vbool32_t*)(out + 100) = v1;
> >> + *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool32_t v1 = *(vbool32_t*)in;
> >> + vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> + *(vbool32_t*)(out + 100) = v1;
> >> + *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool32_t v1 = *(vbool32_t*)in;
> >> + vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> + *(vbool32_t*)(out + 100) = v1;
> >> + *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> >> new file mode 100644
> >> index 00000000000..8f6f0b11f09
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> >> @@ -0,0 +1,68 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool64_t v1 = *(vbool64_t*)in;
> >> + vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> + *(vbool64_t*)(out + 100) = v1;
> >> + *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool64_t v1 = *(vbool64_t*)in;
> >> + vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> + *(vbool64_t*)(out + 100) = v1;
> >> + *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool64_t v1 = *(vbool64_t*)in;
> >> + vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> + *(vbool64_t*)(out + 100) = v1;
> >> + *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool64_t v1 = *(vbool64_t*)in;
> >> + vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> + *(vbool64_t*)(out + 100) = v1;
> >> + *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool64_t v1 = *(vbool64_t*)in;
> >> + vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> + *(vbool64_t*)(out + 100) = v1;
> >> + *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool64_t v1 = *(vbool64_t*)in;
> >> + vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> + *(vbool64_t*)(out + 100) = v1;
> >> + *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >> new file mode 100644
> >> index 00000000000..d96959dd064
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >> @@ -0,0 +1,77 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >> +
> >> +#include "riscv_vector.h"
> >> +
> >> +void
> >> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool1_t v1 = *(vbool1_t*)in;
> >> + vbool1_t v2 = *(vbool1_t*)in;
> >> +
> >> + *(vbool1_t*)(out + 100) = v1;
> >> + *(vbool1_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool2_t v1 = *(vbool2_t*)in;
> >> + vbool2_t v2 = *(vbool2_t*)in;
> >> +
> >> + *(vbool2_t*)(out + 100) = v1;
> >> + *(vbool2_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool4_t v1 = *(vbool4_t*)in;
> >> + vbool4_t v2 = *(vbool4_t*)in;
> >> +
> >> + *(vbool4_t*)(out + 100) = v1;
> >> + *(vbool4_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool8_t v1 = *(vbool8_t*)in;
> >> + vbool8_t v2 = *(vbool8_t*)in;
> >> +
> >> + *(vbool8_t*)(out + 100) = v1;
> >> + *(vbool8_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool16_t v1 = *(vbool16_t*)in;
> >> + vbool16_t v2 = *(vbool16_t*)in;
> >> +
> >> + *(vbool16_t*)(out + 100) = v1;
> >> + *(vbool16_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool32_t v1 = *(vbool32_t*)in;
> >> + vbool32_t v2 = *(vbool32_t*)in;
> >> +
> >> + *(vbool32_t*)(out + 100) = v1;
> >> + *(vbool32_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +void
> >> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >> + vbool64_t v1 = *(vbool64_t*)in;
> >> + vbool64_t v2 = *(vbool64_t*)in;
> >> +
> >> + *(vbool64_t*)(out + 100) = v1;
> >> + *(vbool64_t*)(out + 200) = v2;
> >> +}
> >> +
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
> >> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
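The expected load and store totals differ between the test files. In pr108185-1.c through pr108185-7.c, each of the six functions loads two different vbool types from the same address, and the fix is meant to keep both loads. That gives 12 vlm.v and 12 vsm.v. In pr108185-8.c, both loads in each of the seven functions have the same type. The test therefore expects only one vlm.v per function but still two stores, presumably because the second load is reused by value numbering. The hypothetical helper below restates that arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Expected vlm.v / vsm.v counts for a test file with FUNCS functions,
   each performing two mask loads from one address and two mask stores.
   SAME_TYPE is nonzero when both loads use the same vbool type, in which
   case only one load per function is expected to remain.  */
static void
expected_mask_mem_ops (int funcs, int same_type, int *vlm, int *vsm)
{
  assert (funcs >= 0 && vlm != NULL && vsm != NULL);
  *vlm = funcs * (same_type ? 1 : 2);
  *vsm = funcs * 2;
}
```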
> >> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> >> index 028bedbc9a0..19fdba8cfa2 100644
> >> --- a/gcc/tree-ssa-sccvn.cc
> >> +++ b/gcc/tree-ssa-sccvn.cc
> >> @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. If not see
> >> #include "gimple-fold.h"
> >> #include "tree-eh.h"
> >> #include "gimplify.h"
> >> +#include "target.h"
> >> #include "flags.h"
> >> #include "dojump.h"
> >> #include "explow.h"
> >> @@ -5657,10 +5658,16 @@ visit_reference_op_load (tree lhs, tree op, gimple *stmt)
> >> if (result
> >> && !useless_type_conversion_p (TREE_TYPE (result), TREE_TYPE (op)))
> >> {
> >> + machine_mode result_mode = TYPE_MODE (TREE_TYPE (result));
> >> + machine_mode op_mode = TYPE_MODE (TREE_TYPE (op));
> >> + poly_uint16 result_mode_precision = GET_MODE_PRECISION (result_mode);
> >> + poly_uint16 op_mode_precision = GET_MODE_PRECISION (op_mode);
> >> +
> >> /* Avoid the type punning in case the result mode has padding where
> >> - the op we lookup has not. */
> >> - if (maybe_lt (GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (result))),
> >> - GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op)))))
> >> + the op we lookup has not.
> >> + Avoid the type punning in case the target mode cannot be tied. */
> >> + if (maybe_lt (result_mode_precision, op_mode_precision)
> >> + || !targetm.modes_tieable_p (result_mode, op_mode))
> >> result = NULL_TREE;
> >> else
> >> {
> >>
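The patched condition in visit_reference_op_load is reached only when the two types differ (that is, when useless_type_conversion_p is false). It now discards the punned result in two cases. The first is when the result mode's precision may be smaller than the looked-up op's precision. The second is when the target reports that the two modes cannot be tied. The sketch below restates that decision using plain integers. Several pieces are hypothetical: the mode numbers, the precision values, and the tie_only_identical callback, which stands in for targetm.modes_tieable_p. The sketch also reduces GCC's maybe_lt on poly_int precisions to <.

```c
#include <stdbool.h>

/* Hypothetical stand-in for targetm.modes_tieable_p.  */
typedef bool (*tieable_fn) (int mode1, int mode2);

/* Example hook that ties a mode only with itself.  */
static bool
tie_only_identical (int mode1, int mode2)
{
  return mode1 == mode2;
}

/* Return true if a value found in RESULT_MODE may be reused for a load
   in OP_MODE, following the patched check in visit_reference_op_load.  */
static bool
may_reuse_punned_load (int result_mode, unsigned result_prec,
                       int op_mode, unsigned op_prec, tieable_fn tieable)
{
  /* The result mode has padding where the op does not.  */
  if (result_prec < op_prec)
    return false;
  /* The target refuses to tie the two modes.  */
  if (!tieable (result_mode, op_mode))
    return false;
  return true;
}
```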
> >
>
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
@@ -64,13 +64,13 @@ ADJUST_ALIGNMENT (VNx16BI, 1);
ADJUST_ALIGNMENT (VNx32BI, 1);
ADJUST_ALIGNMENT (VNx64BI, 1);
-ADJUST_BYTESIZE (VNx1BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx2BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx4BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx8BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
+ADJUST_BYTESIZE (VNx1BI, riscv_v_adjust_bytesize (VNx1BImode, 1));
+ADJUST_BYTESIZE (VNx2BI, riscv_v_adjust_bytesize (VNx2BImode, 1));
+ADJUST_BYTESIZE (VNx4BI, riscv_v_adjust_bytesize (VNx4BImode, 1));
+ADJUST_BYTESIZE (VNx8BI, riscv_v_adjust_bytesize (VNx8BImode, 1));
+ADJUST_BYTESIZE (VNx16BI, riscv_v_adjust_bytesize (VNx16BImode, 2));
+ADJUST_BYTESIZE (VNx32BI, riscv_v_adjust_bytesize (VNx32BImode, 4));
+ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_bytesize (VNx64BImode, 8));
/*
| Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
@@ -1003,6 +1003,27 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
return scale;
}
+/* Call from ADJUST_BYTESIZE in riscv-modes.def. Return the correct
+ BYTES size for corresponding machine_mode. */
+
+poly_int64
+riscv_v_adjust_bytesize (machine_mode mode, int scale)
+{
+ gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
+
+ if (riscv_v_ext_vector_mode_p (mode))
+ {
+ poly_uint16 mode_size = GET_MODE_SIZE (mode);
+
+ if (known_lt (mode_size, BYTES_PER_RISCV_VECTOR))
+ return mode_size;
+ else
+ return BYTES_PER_RISCV_VECTOR;
+ }
+
+ return scale;
+}
+
/* Return true if X is a valid address for machine mode MODE. If it is,
fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
effect. */
@@ -5807,11 +5828,22 @@ riscv_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
/* Implement TARGET_MODES_TIEABLE_P.
Don't allow floating-point modes to be tied, since type punning of
- single-precision and double-precision is implementation defined. */
+ single-precision and double-precision is implementation defined.
+
+ Never tie RVV vbool*_t modes, since the bytes actually loaded or stored
+ depend on vl; other RVV modes tie only if their sizes are equal.  */
static bool
riscv_modes_tieable_p (machine_mode mode1, machine_mode mode2)
{
+ if (riscv_v_ext_vector_mode_p (mode1) && riscv_v_ext_vector_mode_p (mode2))
+ {
+ if (VECTOR_BOOL_MODE_P (mode1) || VECTOR_BOOL_MODE_P (mode2))
+ return false;
+
+ return known_eq (GET_MODE_SIZE (mode1), GET_MODE_SIZE (mode2));
+ }
+
return (mode1 == mode2
|| !(GET_MODE_CLASS (mode1) == MODE_FLOAT
&& GET_MODE_CLASS (mode2) == MODE_FLOAT));
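The decision made by the updated riscv_modes_tieable_p can be modeled outside GCC as follows. This is a hypothetical sketch: struct model_mode, enum model_class and model_modes_tieable_p are illustrative stand-ins for machine_mode, GET_MODE_CLASS and the real hook, and sizes are fixed byte counts for one assumed VLEN rather than poly_int values.

```c
#include <stdbool.h>

/* Hypothetical stand-in for GCC mode classes.  */
enum model_class
{
  MODEL_FLOAT,
  MODEL_INT,
  MODEL_VECTOR_BOOL,
  MODEL_VECTOR_DATA
};

/* Hypothetical stand-in for a machine_mode.  */
struct model_mode
{
  enum model_class cls;
  bool rvv;   /* true if this is an enabled RVV vector mode */
  long size;  /* byte size, assuming a fixed VLEN */
  int id;     /* identity, so "mode1 == mode2" can be modeled */
};

static bool
model_modes_tieable_p (struct model_mode m1, struct model_mode m2)
{
  if (m1.rvv && m2.rvv)
    {
      /* vbool*_t: bytes accessed depend on vl, so never tie.  */
      if (m1.cls == MODEL_VECTOR_BOOL || m2.cls == MODEL_VECTOR_BOOL)
        return false;

      /* Other RVV modes: tie only when the sizes match.  */
      return m1.size == m2.size;
    }

  /* Pre-existing rule: identical modes tie; two float modes do not.  */
  return m1.id == m2.id
         || !(m1.cls == MODEL_FLOAT && m2.cls == MODEL_FLOAT);
}
```

As in the patch, the vector bool test runs before any size comparison, so a vbool mode is reported as not tieable even with an RVV data mode of the same size.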
@@ -1028,6 +1028,8 @@ extern unsigned riscv_stack_boundary;
extern unsigned riscv_bytes_per_vector_chunk;
extern poly_uint16 riscv_vector_chunks;
extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
+extern poly_int64 riscv_v_adjust_bytesize (machine_mode mode, int scale);
+
/* The number of bits and bytes in a RVV vector. */
#define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
#define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
@@ -242,6 +242,9 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
|| CLASS == MODE_ACCUM \
|| CLASS == MODE_UACCUM)
+/* Nonzero if MODE is a vector bool mode.  */
+#define VECTOR_BOOL_MODE_P(MODE) (GET_MODE_CLASS (MODE) == MODE_VECTOR_BOOL)
+
/* An optional T (i.e. a T or nothing), where T is some form of mode class. */
template<typename T>
class opt_mode
new file mode 100644
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
new file mode 100644
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
new file mode 100644
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
new file mode 100644
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
new file mode 100644
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
new file mode 100644
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
new file mode 100644
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
new file mode 100644
@@ -0,0 +1,77 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
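The scan-assembler-times counts in pr108185-1.c through pr108185-8.c appear consistent with a simple rule, sketched below as a hypothetical C model. Each distinct mask type in a function needs one vsetvli (vboolN_t maps to e8 with LMUL m8, m4, m2, m1, mf2, mf4, mf8 for N = 1 to 64). Two loads of the same type from in are merged into a single vlm.v, while loads of different types are not. Both stores always emit a vsm.v. This rule is inferred from the expected counts, not taken from GCC, and struct pair, struct expected_counts and count_expected are illustrative names.

```c
#include <assert.h>
#include <string.h>

#define NUM_MASK_TYPES 7

/* One test function: loads vbool<first>_t, then vbool<second>_t.  */
struct pair
{
  int first;
  int second;
};

struct expected_counts
{
  /* vsetvli count per mask type: index 0 is vbool1_t (e8, m8),
     index 6 is vbool64_t (e8, mf8).  */
  int vsetvli[NUM_MASK_TYPES];
  int vlm;
  int vsm;
};

/* Map N in vboolN_t (1, 2, 4, ..., 64) to index 0..6, or -1.  */
static int
mask_index (int n)
{
  for (int i = 0; i < NUM_MASK_TYPES; i++)
    if ((1 << i) == n)
      return i;
  return -1;
}

static struct expected_counts
count_expected (const struct pair *pairs, int num_pairs)
{
  struct expected_counts c;
  memset (&c, 0, sizeof c);

  for (int f = 0; f < num_pairs; f++)
    {
      int a = mask_index (pairs[f].first);
      int b = mask_index (pairs[f].second);
      assert (a >= 0 && b >= 0);

      /* One vsetvli per distinct mask type in the function.  */
      c.vsetvli[a]++;
      if (b != a)
        c.vsetvli[b]++;

      /* Same-type loads are merged; different types stay separate.  */
      c.vlm += (a == b) ? 1 : 2;

      /* Both stores are always emitted.  */
      c.vsm += 2;
    }
  return c;
}
```

Applied to the seven same-type pairs of pr108185-8.c, the model gives one vsetvli per LMUL, 7 vlm.v and 14 vsm.v, matching that test's expectations; for pr108185-1.c it gives six e8,m8 vsetvli and 12 of each memory instruction.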
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. If not see
#include "gimple-fold.h"
#include "tree-eh.h"
#include "gimplify.h"
+#include "target.h"
#include "flags.h"
#include "dojump.h"
#include "explow.h"
@@ -5657,10 +5658,16 @@ visit_reference_op_load (tree lhs, tree op, gimple *stmt)
if (result
&& !useless_type_conversion_p (TREE_TYPE (result), TREE_TYPE (op)))
{
+ machine_mode result_mode = TYPE_MODE (TREE_TYPE (result));
+ machine_mode op_mode = TYPE_MODE (TREE_TYPE (op));
+ poly_uint16 result_mode_precision = GET_MODE_PRECISION (result_mode);
+ poly_uint16 op_mode_precision = GET_MODE_PRECISION (op_mode);
+
/* Avoid the type punning in case the result mode has padding where
- the op we lookup has not. */
- if (maybe_lt (GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (result))),
- GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op)))))
+ the op we lookup has not.
+ Likewise avoid it when the target cannot tie the two modes.  */
+ if (maybe_lt (result_mode_precision, op_mode_precision)
+ || !targetm.modes_tieable_p (result_mode, op_mode))
result = NULL_TREE;
else
{