RISC-V: Bugfix for rvv bool mode precision adjustment
Commit Message
From: Pan Li <pan2.li@intel.com>
Fix the RVV bool mode precision bug by adjusting the precision.
The bit size of vbool*_t is adjusted to
[1, 2, 4, 8, 16, 32, 64] according to the RVV 1.0 ISA spec. The
adjusted mode precision of vbool*_t helps the underlying passes
make the right decisions for both correctness and optimization.
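As a rough illustration of the adjustment (a sketch, not the GCC implementation), the precision can be modelled as chunks * scale, the same shape as riscv_v_adjust_precision in this patch. The 64-bit chunk size, the concrete VLEN values, and all function names below are assumptions made for this example only.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption for this sketch: one vector "chunk" is 64 bits, so a
   register of VLEN bits holds VLEN / 64 chunks.  */
#define CHUNK_BITS 64u

/* Per-chunk scale for vboolN_t: one mask bit per N bits of vector
   data, i.e. 64 / N bits per chunk (64, 32, ..., 1 for N = 1..64).  */
unsigned
vbool_scale (unsigned n)
{
  return CHUNK_BITS / n;
}

/* Same shape as riscv_v_adjust_precision: chunks * scale.  */
unsigned
vbool_precision_bits (unsigned vlen, unsigned n)
{
  unsigned chunks = vlen / CHUNK_BITS;
  return chunks * vbool_scale (n);
}

/* Hypothetical driver: print the precision of every vboolN_t.  */
void
print_vbool_precisions (unsigned vlen)
{
  for (unsigned n = 1; n <= 64; n *= 2)
    printf ("vbool%u_t: %u bits\n", n, vbool_precision_bits (vlen, n));
}
```

Under these assumptions, VLEN = 128 gives 16 bits for vbool8_t and 8 bits for vbool16_t, so the two types no longer look identical to passes that compare mode precision.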
Given the sample code below:
void test_1(int8_t * restrict in, int8_t * restrict out)
{
  vbool8_t v2 = *(vbool8_t*)in;
  vbool16_t v5 = *(vbool16_t*)in;
  *(vbool16_t*)(out + 200) = v5;
  *(vbool8_t*)(out + 100) = v2;
}
Before the precision adjustment:
addi a4,a1,100
vsetvli a5,zero,e8,m1,ta,ma
addi a1,a1,200
vlm.v v24,0(a0)
vsm.v v24,0(a4)
// Need one vsetvli and vlm.v for correctness here.
vsm.v v24,0(a1)
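One way to read the comment in the sequence above: the second vsm.v still runs under the e8,m1 configuration, so it stores as many bytes as a vbool8_t rather than a vbool16_t. The sketch below computes this, assuming vl equals VLMAX (as with vsetvli rd,zero) and that mask loads and stores move ceil(vl / 8) bytes. The function names and the VLEN value in the note are illustrative assumptions.

```c
#include <assert.h>

/* VLMAX = VLEN * LMUL / SEW, with LMUL given as the fraction
   lmul_num / lmul_den (m1 = 1/1, mf2 = 1/2).  */
unsigned
vlmax (unsigned vlen, unsigned sew, unsigned lmul_num, unsigned lmul_den)
{
  return vlen * lmul_num / (sew * lmul_den);
}

/* Bytes moved by a mask load/store for a given vl: ceil (vl / 8).  */
unsigned
mask_bytes (unsigned vl)
{
  return (vl + 7) / 8;
}
```

With VLEN = 128, mask_bytes (vlmax (128, 8, 1, 1)) is 2 for e8,m1, while mask_bytes (vlmax (128, 8, 1, 2)) is 1 for e8,mf2. Without a separate vsetvli, the vbool16_t store to out + 200 would therefore write one byte more than the type holds.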
After the precision adjustment:
csrr t0,vlenb
slli t1,t0,1
csrr a3,vlenb
sub sp,sp,t1
slli a4,a3,1
add a4,a4,sp
sub a3,a4,a3
vsetvli a5,zero,e8,m1,ta,ma
addi a2,a1,200
vlm.v v24,0(a0)
vsm.v v24,0(a3)
addi a1,a1,100
vsetvli a4,zero,e8,mf2,ta,ma
csrr t0,vlenb
vlm.v v25,0(a3)
vsm.v v25,0(a2)
slli t1,t0,1
vsetvli a5,zero,e8,m1,ta,ma
vsm.v v24,0(a1)
add sp,sp,t1
jr ra
However, there may be some optimization opportunities after
the mode precision adjustment. They can be taken care of in
the RISC-V backend in separate follow-up PR(s).
PR 108185
PR 108654
gcc/ChangeLog:
* config/riscv/riscv-modes.def (ADJUST_PRECISION):
* config/riscv/riscv.cc (riscv_v_adjust_precision):
* config/riscv/riscv.h (riscv_v_adjust_precision):
* genmodes.cc (ADJUST_PRECISION):
(emit_mode_adjustments):
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr108185-1.c: New test.
* gcc.target/riscv/pr108185-2.c: New test.
* gcc.target/riscv/pr108185-3.c: New test.
* gcc.target/riscv/pr108185-4.c: New test.
* gcc.target/riscv/pr108185-5.c: New test.
* gcc.target/riscv/pr108185-6.c: New test.
* gcc.target/riscv/pr108185-7.c: New test.
* gcc.target/riscv/pr108185-8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
---
gcc/config/riscv/riscv-modes.def | 8 +++
gcc/config/riscv/riscv.cc | 12 ++++
gcc/config/riscv/riscv.h | 1 +
gcc/genmodes.cc | 25 ++++++-
gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
12 files changed, 598 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
Comments
Thank you all.
Hi Richard,
Could you please help review the precision adjustment change when you are free? I am looking forward to your opinion on this issue from an expert's perspective 😉
Pan
From: juzhe.zhong <juzhe.zhong@rivai.ai>
Sent: Thursday, February 16, 2023 11:23 PM
To: incarnation.p.lee@outlook.com
Cc: gcc-patches@gcc.gnu.org; kito.cheng@sifive.com; rguenther@suse.de; Li, Pan2 <pan2.li@intel.com>
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
Thanks for the great work fixing this issue for RVV.
Hi Richard. This patch differentiates mask modes that share the same byte size by adjusting the precision according to the RVV ISA. Would you mind helping us review this patch? It is very important for RVV support in GCC.
Thanks.
---- Replied Message ----
From: incarnation.p.lee@outlook.com
Date: 02/16/2023 23:12
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; pan2.li@intel.com
Subject: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
index d5305efa8a6..110bddce851 100644
--- a/gcc/config/riscv/riscv-modes.def
+++ b/gcc/config/riscv/riscv-modes.def
@@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
+ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
+ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
+ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
+ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
+ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
+ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
+ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
+
/*
| Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
| | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index de3e1f903c7..cbe66c0e35b 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
return scale;
}
+/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
+ PRECISION size for corresponding machine_mode. */
+
+poly_int64
+riscv_v_adjust_precision (machine_mode mode, int scale)
+{
+ if (riscv_v_ext_vector_mode_p (mode))
+ return riscv_vector_chunks * scale;
+
+ return scale;
+}
+
/* Return true if X is a valid address for machine mode MODE. If it is,
fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
effect. */
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 5bc7f2f467d..15b9317a8ce 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
extern unsigned riscv_bytes_per_vector_chunk;
extern poly_uint16 riscv_vector_chunks;
extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
+extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
/* The number of bits and bytes in a RVV vector. */
#define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
#define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
index 2d418f09aab..12f4e6335e6 100644
--- a/gcc/genmodes.cc
+++ b/gcc/genmodes.cc
@@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
static struct mode_adjust *adj_format;
static struct mode_adjust *adj_ibit;
static struct mode_adjust *adj_fbit;
+static struct mode_adjust *adj_precision;
/* Mode class operations. */
static enum mode_class
@@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
#define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
#define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
#define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
+#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
#define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
#define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
#define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
@@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
" (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
m->name, m->name);
printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
- printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
+ /* Normalize the size to 1 if precison is less than BITS_PER_UNIT. */
+ printf (" poly_uint16 size_one = "
+ "mode_precision[E_%smode].is_constant ()\n", m->name);
+ printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
+ printf (" if (known_lt (mode_precision[E_%smode], "
+ "size_one * BITS_PER_UNIT))\n", m->name);
+ printf (" mode_size[E_%smode] = size_one;\n", m->name);
+ printf (" else\n");
+ printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
" BITS_PER_UNIT);\n", m->name, m->name);
printf (" mode_nunits[E_%smode] = ps;\n", m->name);
printf (" adjust_mode_mask (E_%smode);\n", m->name);
@@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
a->file, a->line, a->mode->name, a->adjustment);
+ /* Adjust precision to the actual bits size. */
+ for (a = adj_precision; a; a = a->next)
+ switch (a->mode->cl)
+ {
+ case MODE_VECTOR_BOOL:
+ printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
+ a->adjustment);
+ printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
+ break;
+ default:
+ break;
+ }
+
puts ("}");
}
diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
new file mode 100644
index 00000000000..e70960c5b6d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
new file mode 100644
index 00000000000..dcc7a644a88
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
new file mode 100644
index 00000000000..3af0513e006
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
new file mode 100644
index 00000000000..ea3c360d756
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
new file mode 100644
index 00000000000..9fc659d2402
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
new file mode 100644
index 00000000000..98275e5267d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
new file mode 100644
index 00000000000..8f6f0b11f09
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
new file mode 100644
index 00000000000..d96959dd064
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
@@ -0,0 +1,77 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
--
2.34.1
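
Editorial sketch, not part of the patch: the tests above mix vboolN_t types because different mask types can occupy the same number of bytes while holding a different number of meaningful bits. The C model below assumes that vboolN_t holds VLEN/N mask bits and that vlm.v/vsm.v transfer ceil(vl/8) bytes, as in the RVV 1.0 spec. The helper names and the VLEN value of 128 used in the comments are illustrative assumptions and do not come from GCC.

```c
#include <assert.h>

/* Hypothetical helper: number of mask bits held by vbool<n>_t when the
   vector register length is VLEN bits (n is one of 1, 2, 4, ..., 64).  */
static unsigned
mask_bits (unsigned vlen, unsigned n)
{
  assert (n != 0 && vlen % n == 0);
  return vlen / n;
}

/* Hypothetical helper: bytes moved by vlm.v/vsm.v for that mask type,
   i.e. ceil (vl / 8) with vl equal to the number of mask bits.  */
static unsigned
mask_bytes (unsigned vlen, unsigned n)
{
  return (mask_bits (vlen, n) + 7) / 8;
}

/* With an assumed VLEN of 128, vbool16_t, vbool32_t and vbool64_t all
   round up to one byte even though they hold 8, 4 and 2 bits.  */
static int
same_bytes_different_bits (unsigned vlen, unsigned n1, unsigned n2)
{
  return mask_bytes (vlen, n1) == mask_bytes (vlen, n2)
	 && mask_bits (vlen, n1) != mask_bits (vlen, n2);
}
```

Calling same_bytes_different_bits (128, 16, 32) from a small main is one way to see the case where byte size alone cannot tell two mask types apart.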
On Thu, 16 Feb 2023, juzhe.zhong wrote:
> Thanks for the great work to fix this issue for RVV. Hi, Richard. This is the
> patch to differentiate mask modes of the same byte size. It adjusts the precision
> correctly according to the RVV ISA. Would you mind helping us with this patch?
> It is very important for RVV support in GCC.
If adjusting the precision works fine then I suppose the patch looks
reasonable. I'll defer to Richard S. though, since he's the one who knows
the mode stuff better. I'd have integrated the precision adjustment
with the ADJUST_NUNITS hook, since that is also documented to adjust
the precision, btw.
Richard.
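
Editorial sketch, not part of the thread: a scalar model of the two rules the patch adds. The real riscv_v_adjust_precision returns a poly_int64 built from riscv_vector_chunks, and the real size rule is emitted by genmodes.cc in its nunits adjustment path using known_lt and exact_div. Here plain unsigned values stand in for the poly_int types, and the model_* names are hypothetical.

```c
#include <assert.h>

#define BITS_PER_UNIT 8

/* Scalar stand-in for riscv_v_adjust_precision: an RVV vector mode gets
   chunks * scale bits, any other mode just gets scale.  */
static unsigned
model_adjust_precision (int is_rvv_vector_mode, unsigned chunks,
			unsigned scale)
{
  return is_rvv_vector_mode ? chunks * scale : scale;
}

/* Scalar stand-in for the size rule added to emit_mode_adjustments:
   a precision below BITS_PER_UNIT still occupies one unit, otherwise
   the size is the exact quotient.  */
static unsigned
model_mode_size (unsigned precision)
{
  if (precision < BITS_PER_UNIT)
    return 1;
  /* Mirrors exact_div, which requires an exact division.  */
  assert (precision % BITS_PER_UNIT == 0);
  return precision / BITS_PER_UNIT;
}
```

With an assumed chunk count of 2, model_adjust_precision (1, 2, 16) gives 32 bits, and model_mode_size (4) gives 1, which is the sub-byte case the added normalization handles.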
> Thanks.
> ---- Replied Message ----
> From: incarnation.p.lee@outlook.com
> Date: 02/16/2023 23:12
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai, kito.cheng@sifive.com, rguenther@suse.de, pan2.li@intel.com
> Subject: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> From: Pan Li <pan2.li@intel.com>
>
> Fix the bug of the rvv bool mode precision with the adjustment.
> The bits size of vbool*_t will be adjusted to
> [1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The
> adjusted mode precison of vbool*_t will help underlying pass to
> make the right decision for both the correctness and optimization.
>
> Given below sample code:
> void test_1(int8_t * restrict in, int8_t * restrict out)
> {
> vbool8_t v2 = *(vbool8_t*)in;
> vbool16_t v5 = *(vbool16_t*)in;
> *(vbool16_t*)(out + 200) = v5;
> *(vbool8_t*)(out + 100) = v2;
> }
>
> Before the precision adjustment:
> addi a4,a1,100
> vsetvli a5,zero,e8,m1,ta,ma
> addi a1,a1,200
> vlm.v v24,0(a0)
> vsm.v v24,0(a4)
> // Need one vsetvli and vlm.v for correctness here.
> vsm.v v24,0(a1)
>
> After the precision adjustment:
> csrr t0,vlenb
> slli t1,t0,1
> csrr a3,vlenb
> sub sp,sp,t1
> slli a4,a3,1
> add a4,a4,sp
> sub a3,a4,a3
> vsetvli a5,zero,e8,m1,ta,ma
> addi a2,a1,200
> vlm.v v24,0(a0)
> vsm.v v24,0(a3)
> addi a1,a1,100
> vsetvli a4,zero,e8,mf2,ta,ma
> csrr t0,vlenb
> vlm.v v25,0(a3)
> vsm.v v25,0(a2)
> slli t1,t0,1
> vsetvli a5,zero,e8,m1,ta,ma
> vsm.v v24,0(a1)
> add sp,sp,t1
> jr ra
>
> However, there may be some optimization opportunates after
> the mode precision adjustment. It can be token care of in
> the RISC-V backend in the underlying separted PR(s).
>
> PR 108185
> PR 108654
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-modes.def (ADJUST_PRECISION):
> * config/riscv/riscv.cc (riscv_v_adjust_precision):
> * config/riscv/riscv.h (riscv_v_adjust_precision):
> * genmodes.cc (ADJUST_PRECISION):
> (emit_mode_adjustments):
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/pr108185-1.c: New test.
> * gcc.target/riscv/pr108185-2.c: New test.
> * gcc.target/riscv/pr108185-3.c: New test.
> * gcc.target/riscv/pr108185-4.c: New test.
> * gcc.target/riscv/pr108185-5.c: New test.
> * gcc.target/riscv/pr108185-6.c: New test.
> * gcc.target/riscv/pr108185-7.c: New test.
> * gcc.target/riscv/pr108185-8.c: New test.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
> gcc/config/riscv/riscv-modes.def | 8 +++
> gcc/config/riscv/riscv.cc | 12 ++++
> gcc/config/riscv/riscv.h | 1 +
> gcc/genmodes.cc | 25 ++++++-
> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
> 12 files changed, 598 insertions(+), 1 deletion(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>
> diff --git a/gcc/config/riscv/riscv-modes.def
> b/gcc/config/riscv/riscv-modes.def
> index d5305efa8a6..110bddce851 100644
> --- a/gcc/config/riscv/riscv-modes.def
> +++ b/gcc/config/riscv/riscv-modes.def
> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks *
> riscv_bytes_per_vector_chunk);
> ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks *
> riscv_bytes_per_vector_chunk);
> ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>
> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
> +
> /*
> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
> | | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index de3e1f903c7..cbe66c0e35b 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
> return scale;
> }
>
> +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
> + PRECISION size for corresponding machine_mode. */
> +
> +poly_int64
> +riscv_v_adjust_precision (machine_mode mode, int scale)
> +{
> + if (riscv_v_ext_vector_mode_p (mode))
> + return riscv_vector_chunks * scale;
> +
> + return scale;
> +}
> +
> /* Return true if X is a valid address for machine mode MODE. If it is,
> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
> effect. */
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index 5bc7f2f467d..15b9317a8ce 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
> extern unsigned riscv_bytes_per_vector_chunk;
> extern poly_uint16 riscv_vector_chunks;
> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
> /* The number of bits and bytes in a RVV vector. */
> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks *
> riscv_bytes_per_vector_chunk * 8))
> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks *
> riscv_bytes_per_vector_chunk))
> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
> index 2d418f09aab..12f4e6335e6 100644
> --- a/gcc/genmodes.cc
> +++ b/gcc/genmodes.cc
> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
> static struct mode_adjust *adj_format;
> static struct mode_adjust *adj_ibit;
> static struct mode_adjust *adj_fbit;
> +static struct mode_adjust *adj_precision;
>
> /* Mode class operations. */
> static enum mode_class
> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
> #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
> #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
> #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM,
> RANDOM)
> #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT,
> FLOAT)
> #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
> #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
> " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
> m->name, m->name);
> printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
> - printf (" mode_size[E_%smode] = exact_div
> (mode_precision[E_%smode],"
> + /* Normalize the size to 1 if precison is less than BITS_PER_UNIT.
> */
> + printf (" poly_uint16 size_one = "
> + "mode_precision[E_%smode].is_constant ()\n", m->name);
> + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
> + printf (" if (known_lt (mode_precision[E_%smode], "
> + "size_one * BITS_PER_UNIT))\n", m->name);
> + printf (" mode_size[E_%smode] = size_one;\n", m->name);
> + printf (" else\n");
> + printf (" mode_size[E_%smode] = exact_div
> (mode_precision[E_%smode],"
> " BITS_PER_UNIT);\n", m->name, m->name);
> printf (" mode_nunits[E_%smode] = ps;\n", m->name);
> printf (" adjust_mode_mask (E_%smode);\n", m->name);
> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
> printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
> a->file, a->line, a->mode->name, a->adjustment);
>
> + /* Adjust precision to the actual bits size. */
> + for (a = adj_precision; a; a = a->next)
> + switch (a->mode->cl)
> + {
> + case MODE_VECTOR_BOOL:
> + printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
> + a->adjustment);
> + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
> + break;
> + default:
> + break;
> + }
> +
> puts ("}");
> }
>
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> new file mode 100644
> index 00000000000..e70960c5b6d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)}
> 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)}
> 18 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> new file mode 100644
> index 00000000000..dcc7a644a88
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)}
> 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)}
> 17 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> new file mode 100644
> index 00000000000..3af0513e006
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)}
> 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)}
> 16 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> new file mode 100644
> index 00000000000..ea3c360d756
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)}
> 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)}
> 15 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> new file mode 100644
> index 00000000000..9fc659d2402
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)}
> 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)}
> 14 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> new file mode 100644
> index 00000000000..98275e5267d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)}
> 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)}
> 13 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> new file mode 100644
> index 00000000000..8f6f0b11f09
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)}
> 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)}
> 12 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> new file mode 100644
> index 00000000000..d96959dd064
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> @@ -0,0 +1,77 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> --
> 2.34.1
>
>
>
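The vsetvli patterns in the dg-final lines above follow from the RVV mask ratio: vboolN_t has N = SEW/LMUL, so with e8 the expected LMUL is 8/N. Below is a minimal C sketch of that mapping. The helper names lmul_for_vbool and needs_new_vsetvli are hypothetical and exist only for illustration; they are not GCC functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of GCC): the LMUL operand expected in
   "vsetvli ..., e8, <lmul>, ta, ma" for vbool<n>_t, using n = SEW / LMUL
   with SEW = 8.  Returns NULL when n is not a valid mask ratio.  */
static const char *
lmul_for_vbool (int n)
{
  switch (n)
    {
    case 1:  return "m8";
    case 2:  return "m4";
    case 4:  return "m2";
    case 8:  return "m1";
    case 16: return "mf2";
    case 32: return "mf4";
    case 64: return "mf8";
    default: return NULL;
    }
}

/* Hypothetical helper: nonzero when moving from vbool<from>_t to
   vbool<to>_t needs a different vsetvli configuration.  */
static int
needs_new_vsetvli (int from, int to)
{
  const char *a = lmul_for_vbool (from);
  const char *b = lmul_for_vbool (to);
  assert (a != NULL && b != NULL);
  return strcmp (a, b) != 0;
}
```

For the test_1 example in the commit message, needs_new_vsetvli (8, 16) is nonzero (m1 versus mf2), which matches the extra vsetvli that shows up after the precision adjustment.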
Cool, thank you!
Hi Richard S,
Could you please do me a favor and review this change when you are free? Thank you!
Pan
-----Original Message-----
From: Richard Biener <rguenther@suse.de>
Sent: Friday, February 17, 2023 3:36 PM
To: juzhe.zhong <juzhe.zhong@rivai.ai>
Cc: incarnation.p.lee@outlook.com; gcc-patches@gcc.gnu.org; kito.cheng@sifive.com; Li, Pan2 <pan2.li@intel.com>; richard.sandiford@arm.com
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
On Thu, 16 Feb 2023, juzhe.zhong wrote:
> Thanks for the great work on fixing this issue for RVV. Hi Richard, this
> is the patch to differentiate mask modes of the same byte size. It adjusts
> the precision correctly according to the RVV ISA. Would you mind helping
> us with this patch? It is very important for RVV support in GCC.
If adjusting the precision works fine then I suppose the patch looks reasonable. I'll defer to Richard S. though since he's the one knowing the mode stuff better. I'd have integrated the precision adjustment with the ADJUST_NUNITS hook since that is also documented to adjust the precision btw.
Richard.
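
To illustrate why distinct precisions separate the mask modes, here is a minimal sketch of the arithmetic in riscv_v_adjust_precision from the patch. It assumes a fixed chunk count passed as a plain integer; in GCC riscv_vector_chunks is a poly_uint16, which this model does not reproduce. The names model_adjust_precision and precisions_all_distinct are hypothetical.

```c
#include <assert.h>

/* Simplified model of riscv_v_adjust_precision: RVV vector modes scale
   with the chunk count, other modes keep the plain scale.  `chunks` is an
   assumed constant stand-in for riscv_vector_chunks.  */
static unsigned
model_adjust_precision (int is_rvv_vector_mode, unsigned chunks,
                        unsigned scale)
{
  assert (chunks > 0);
  if (is_rvv_vector_mode)
    return chunks * scale;
  return scale;
}

/* Returns 1 when the seven mask modes (scales 1..64) all end up with
   pairwise-distinct precisions for the given chunk count.  */
static int
precisions_all_distinct (unsigned chunks)
{
  static const unsigned scales[] = { 1, 2, 4, 8, 16, 32, 64 };
  enum { N = sizeof scales / sizeof scales[0] };

  for (int i = 0; i < N; i++)
    for (int j = i + 1; j < N; j++)
      if (model_adjust_precision (1, chunks, scales[i])
          == model_adjust_precision (1, chunks, scales[j]))
        return 0;
  return 1;
}
```

Under this model the modes no longer compare equal by precision even when their byte sizes coincide, which is the property the patch relies on.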
> Thanks.
> ---- Replied Message ----
> From
> incarnation.p.lee@outlook.com<incarnation.p.lee@outlook.com>
> Date
> 02/16/2023 23:12
> To
> gcc-patches@gcc.gnu.org<gcc-patches@gcc.gnu.org>
> Cc
> juzhe.zhong@rivai.ai<juzhe.zhong@rivai.ai>,
> kito.cheng@sifive.com<kito.cheng@sifive.com>,
> rguenther@suse.de<rguenther@suse.de>,
> pan2.li@intel.com<pan2.li@intel.com>
> Subject
> [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> From: Pan Li <pan2.li@intel.com>
>
> Fix the rvv bool mode precision bug by adjusting the precision.
> The bit size of vbool*_t will be adjusted to
> [1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The
> adjusted mode precision of vbool*_t will help the underlying passes
> make the right decisions for both correctness and optimization.
>
> Given below sample code:
> void test_1(int8_t * restrict in, int8_t * restrict out)
> {
> vbool8_t v2 = *(vbool8_t*)in;
> vbool16_t v5 = *(vbool16_t*)in;
> *(vbool16_t*)(out + 200) = v5;
> *(vbool8_t*)(out + 100) = v2;
> }
>
> Before the precision adjustment:
> addi a4,a1,100
> vsetvli a5,zero,e8,m1,ta,ma
> addi a1,a1,200
> vlm.v v24,0(a0)
> vsm.v v24,0(a4)
> // Need one vsetvli and vlm.v for correctness here.
> vsm.v v24,0(a1)
>
> After the precision adjustment:
> csrr t0,vlenb
> slli t1,t0,1
> csrr a3,vlenb
> sub sp,sp,t1
> slli a4,a3,1
> add a4,a4,sp
> sub a3,a4,a3
> vsetvli a5,zero,e8,m1,ta,ma
> addi a2,a1,200
> vlm.v v24,0(a0)
> vsm.v v24,0(a3)
> addi a1,a1,100
> vsetvli a4,zero,e8,mf2,ta,ma
> csrr t0,vlenb
> vlm.v v25,0(a3)
> vsm.v v25,0(a2)
> slli t1,t0,1
> vsetvli a5,zero,e8,m1,ta,ma
> vsm.v v24,0(a1)
> add sp,sp,t1
> jr ra
>
> However, there may be some optimization opportunities after
> the mode precision adjustment. They can be taken care of in
> the RISC-V backend in separate follow-up PR(s).
>
> PR 108185
> PR 108654
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-modes.def (ADJUST_PRECISION):
> * config/riscv/riscv.cc (riscv_v_adjust_precision):
> * config/riscv/riscv.h (riscv_v_adjust_precision):
> * genmodes.cc (ADJUST_PRECISION):
> (emit_mode_adjustments):
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/pr108185-1.c: New test.
> * gcc.target/riscv/pr108185-2.c: New test.
> * gcc.target/riscv/pr108185-3.c: New test.
> * gcc.target/riscv/pr108185-4.c: New test.
> * gcc.target/riscv/pr108185-5.c: New test.
> * gcc.target/riscv/pr108185-6.c: New test.
> * gcc.target/riscv/pr108185-7.c: New test.
> * gcc.target/riscv/pr108185-8.c: New test.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
> gcc/config/riscv/riscv-modes.def | 8 +++
> gcc/config/riscv/riscv.cc | 12 ++++
> gcc/config/riscv/riscv.h | 1 +
> gcc/genmodes.cc | 25 ++++++-
> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
> 12 files changed, 598 insertions(+), 1 deletion(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>
> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
> index d5305efa8a6..110bddce851 100644
> --- a/gcc/config/riscv/riscv-modes.def
> +++ b/gcc/config/riscv/riscv-modes.def
> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>
> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
> +
> /*
> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
> | | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index de3e1f903c7..cbe66c0e35b 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
> return scale;
> }
>
> +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
> + PRECISION size for corresponding machine_mode. */
> +
> +poly_int64
> +riscv_v_adjust_precision (machine_mode mode, int scale)
> +{
> + if (riscv_v_ext_vector_mode_p (mode))
> + return riscv_vector_chunks * scale;
> +
> + return scale;
> +}
> +
> /* Return true if X is a valid address for machine mode MODE. If it is,
> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
> effect. */
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index 5bc7f2f467d..15b9317a8ce 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
> extern unsigned riscv_bytes_per_vector_chunk;
> extern poly_uint16 riscv_vector_chunks;
> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
> /* The number of bits and bytes in a RVV vector. */
> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
> index 2d418f09aab..12f4e6335e6 100644
> --- a/gcc/genmodes.cc
> +++ b/gcc/genmodes.cc
> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
> static struct mode_adjust *adj_format;
> static struct mode_adjust *adj_ibit;
> static struct mode_adjust *adj_fbit;
> +static struct mode_adjust *adj_precision;
>
> /* Mode class operations. */
> static enum mode_class
> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
> #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
> #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
> #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
> #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
> #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
> #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
> " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
> m->name, m->name);
> printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
> - printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
> + /* Normalize the size to 1 if precision is less than BITS_PER_UNIT. */
> + printf (" poly_uint16 size_one = "
> + "mode_precision[E_%smode].is_constant ()\n", m->name);
> + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
> + printf (" if (known_lt (mode_precision[E_%smode], "
> + "size_one * BITS_PER_UNIT))\n", m->name);
> + printf (" mode_size[E_%smode] = size_one;\n", m->name);
> + printf (" else\n");
> + printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
> " BITS_PER_UNIT);\n", m->name, m->name);
> printf (" mode_nunits[E_%smode] = ps;\n", m->name);
> printf (" adjust_mode_mask (E_%smode);\n", m->name);
> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
> printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
> a->file, a->line, a->mode->name, a->adjustment);
>
> + /* Adjust precision to the actual bits size. */
> + for (a = adj_precision; a; a = a->next)
> + switch (a->mode->cl)
> + {
> + case MODE_VECTOR_BOOL:
> + printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
> + a->adjustment);
> + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
> + break;
> + default:
> + break;
> + }
> +
> puts ("}");
> }
>
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> new file mode 100644
> index 00000000000..e70960c5b6d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> new file mode 100644
> index 00000000000..dcc7a644a88
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> new file mode 100644
> index 00000000000..3af0513e006
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> new file mode 100644
> index 00000000000..ea3c360d756
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> new file mode 100644
> index 00000000000..9fc659d2402
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> new file mode 100644
> index 00000000000..98275e5267d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> new file mode 100644
> index 00000000000..8f6f0b11f09
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> new file mode 100644
> index 00000000000..d96959dd064
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> @@ -0,0 +1,77 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> --
> 2.34.1
>
>
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)
Hi,
A kind reminder for this PR.
Pan
-----Original Message-----
From: Li, Pan2
Sent: Friday, February 17, 2023 4:39 PM
To: richard.sandiford@arm.com; juzhe.zhong <juzhe.zhong@rivai.ai>
Cc: incarnation.p.lee@outlook.com; gcc-patches@gcc.gnu.org; kito.cheng@sifive.com; Richard Biener <rguenther@suse.de>
Subject: RE: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
Cool, thank you!
Hi Richard S,
Could you please do me a favor and review this change when you are free? Thank you!
Pan
-----Original Message-----
From: Richard Biener <rguenther@suse.de>
Sent: Friday, February 17, 2023 3:36 PM
To: juzhe.zhong <juzhe.zhong@rivai.ai>
Cc: incarnation.p.lee@outlook.com; gcc-patches@gcc.gnu.org; kito.cheng@sifive.com; Li, Pan2 <pan2.li@intel.com>; richard.sandiford@arm.com
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
On Thu, 16 Feb 2023, juzhe.zhong wrote:
> Thanks for the great work on fixing this issue for rvv. Hi Richard, this
> is the patch to differentiate mask modes of the same byte size. It adjusts
> the precision correctly according to the rvv isa. Would you mind helping us
> with this patch?
> It is very important for rvv support in gcc.
If adjusting the precision works fine then I suppose the patch looks reasonable. I'll defer to Richard S. though since he's the one who knows the mode stuff better. Btw, I'd have integrated the precision adjustment with the ADJUST_NUNITS hook since that is also documented to adjust the precision.
Richard.
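
To make the point about telling mask modes apart concrete, here is a minimal sketch in C of what riscv_v_adjust_precision computes. It is only a model: a plain long stands in for GCC's poly_int64, the riscv_v_ext_vector_mode_p check is reduced to a flag, and the name model_adjust_precision is an assumption for illustration, not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for riscv_v_adjust_precision.  A plain long
   replaces poly_int64, and VECTOR_MODE_P replaces the call to
   riscv_v_ext_vector_mode_p.  */
long
model_adjust_precision (bool vector_mode_p, long chunks, int scale)
{
  /* The patch only passes the scales 1, 2, 4, ..., 64.  */
  assert (scale >= 1 && scale <= 64);
  assert (chunks >= 1);

  if (vector_mode_p)
    return chunks * scale;

  return scale;
}
```

With an assumed chunk count of 2, the scales 1 and 8 give precisions of 2 and 16 bits, so two mask modes are no longer described by one shared precision.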
> Thanks.
> ---- Replied Message ----
> From
> incarnation.p.lee@outlook.com<incarnation.p.lee@outlook.com>
> Date
> 02/16/2023 23:12
> To
> gcc-patches@gcc.gnu.org<gcc-patches@gcc.gnu.org>
> Cc
> juzhe.zhong@rivai.ai<juzhe.zhong@rivai.ai>,
> kito.cheng@sifive.com<kito.cheng@sifive.com>,
> rguenther@suse.de<rguenther@suse.de>,
> pan2.li@intel.com<pan2.li@intel.com>
> Subject
> [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> From: Pan Li <pan2.li@intel.com>
>
> Fix the rvv bool mode precision bug by adjusting the precision.
> The bit size of vbool*_t will be adjusted to
> [1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The
> adjusted mode precision of vbool*_t helps the underlying passes
> make the right decisions for both correctness and optimization.
>
> Given the sample code below:
> void test_1(int8_t * restrict in, int8_t * restrict out)
> {
> vbool8_t v2 = *(vbool8_t*)in;
> vbool16_t v5 = *(vbool16_t*)in;
> *(vbool16_t*)(out + 200) = v5;
> *(vbool8_t*)(out + 100) = v2;
> }
>
> Before the precision adjustment:
> addi a4,a1,100
> vsetvli a5,zero,e8,m1,ta,ma
> addi a1,a1,200
> vlm.v v24,0(a0)
> vsm.v v24,0(a4)
> // Need one vsetvli and vlm.v for correctness here.
> vsm.v v24,0(a1)
>
> After the precision adjustment:
> csrr t0,vlenb
> slli t1,t0,1
> csrr a3,vlenb
> sub sp,sp,t1
> slli a4,a3,1
> add a4,a4,sp
> sub a3,a4,a3
> vsetvli a5,zero,e8,m1,ta,ma
> addi a2,a1,200
> vlm.v v24,0(a0)
> vsm.v v24,0(a3)
> addi a1,a1,100
> vsetvli a4,zero,e8,mf2,ta,ma
> csrr t0,vlenb
> vlm.v v25,0(a3)
> vsm.v v25,0(a2)
> slli t1,t0,1
> vsetvli a5,zero,e8,m1,ta,ma
> vsm.v v24,0(a1)
> add sp,sp,t1
> jr ra
>
> However, there may be some optimization opportunities after
> the mode precision adjustment. They can be taken care of in
> the RISC-V backend in separate PR(s).
>
> PR 108185
> PR 108654
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-modes.def (ADJUST_PRECISION):
> * config/riscv/riscv.cc (riscv_v_adjust_precision):
> * config/riscv/riscv.h (riscv_v_adjust_precision):
> * genmodes.cc (ADJUST_PRECISION):
> (emit_mode_adjustments):
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/pr108185-1.c: New test.
> * gcc.target/riscv/pr108185-2.c: New test.
> * gcc.target/riscv/pr108185-3.c: New test.
> * gcc.target/riscv/pr108185-4.c: New test.
> * gcc.target/riscv/pr108185-5.c: New test.
> * gcc.target/riscv/pr108185-6.c: New test.
> * gcc.target/riscv/pr108185-7.c: New test.
> * gcc.target/riscv/pr108185-8.c: New test.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
> gcc/config/riscv/riscv-modes.def            |  8 +++
> gcc/config/riscv/riscv.cc                   | 12 ++++
> gcc/config/riscv/riscv.h                    |  1 +
> gcc/genmodes.cc                             | 25 ++++++-
> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
> 12 files changed, 598 insertions(+), 1 deletion(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>
> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
> index d5305efa8a6..110bddce851 100644
> --- a/gcc/config/riscv/riscv-modes.def
> +++ b/gcc/config/riscv/riscv-modes.def
> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>
> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
> +
> /*
> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
> |      | LMUL        | SEW/LMUL    | LMUL        | SEW/LMUL    |
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index de3e1f903c7..cbe66c0e35b 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
> return scale;
> }
>
> +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
> + PRECISION size for corresponding machine_mode. */
> +
> +poly_int64
> +riscv_v_adjust_precision (machine_mode mode, int scale)
> +{
> + if (riscv_v_ext_vector_mode_p (mode))
> + return riscv_vector_chunks * scale;
> +
> + return scale;
> +}
> +
> /* Return true if X is a valid address for machine mode MODE. If it is,
> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
> effect. */
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index 5bc7f2f467d..15b9317a8ce 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
> extern unsigned riscv_bytes_per_vector_chunk;
> extern poly_uint16 riscv_vector_chunks;
> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
> /* The number of bits and bytes in a RVV vector. */
> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
> index 2d418f09aab..12f4e6335e6 100644
> --- a/gcc/genmodes.cc
> +++ b/gcc/genmodes.cc
> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
> static struct mode_adjust *adj_format;
> static struct mode_adjust *adj_ibit;
> static struct mode_adjust *adj_fbit;
> +static struct mode_adjust *adj_precision;
>
> /* Mode class operations. */
> static enum mode_class
> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
> #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
> #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
> #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
> #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
> #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
> #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
> " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
> m->name, m->name);
> printf (" mode_precision[E_%smode] = ps * old_factor;\n",
> m->name);
> - printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
> + /* Normalize the size to 1 if precision is less than BITS_PER_UNIT. */
> + printf (" poly_uint16 size_one = "
> + "mode_precision[E_%smode].is_constant ()\n", m->name);
> + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
> + printf (" if (known_lt (mode_precision[E_%smode], "
> + "size_one * BITS_PER_UNIT))\n", m->name);
> + printf (" mode_size[E_%smode] = size_one;\n", m->name);
> + printf (" else\n");
> + printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
> " BITS_PER_UNIT);\n", m->name, m->name);
> printf (" mode_nunits[E_%smode] = ps;\n", m->name);
> printf (" adjust_mode_mask (E_%smode);\n", m->name);
> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
> printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
> a->file, a->line, a->mode->name, a->adjustment);
>
> + /* Adjust precision to the actual bits size. */
> + for (a = adj_precision; a; a = a->next)
> + switch (a->mode->cl)
> + {
> + case MODE_VECTOR_BOOL:
> + printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
> + a->adjustment);
> + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
> + break;
> + default:
> + break;
> + }
> +
> puts ("}");
> }
>
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> new file mode 100644
> index 00000000000..e70960c5b6d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> new file mode 100644
> index 00000000000..dcc7a644a88
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> new file mode 100644
> index 00000000000..3af0513e006
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> new file mode 100644
> index 00000000000..ea3c360d756
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> new file mode 100644
> index 00000000000..9fc659d2402
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> new file mode 100644
> index 00000000000..98275e5267d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> new file mode 100644
> index 00000000000..8f6f0b11f09
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> new file mode 100644
> index 00000000000..d96959dd064
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> @@ -0,0 +1,77 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> +{
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> --
> 2.34.1
>
>
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)
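For readers unfamiliar with the dg-final directives in the quoted tests: scan-assembler-times checks that a regex occurs a given number of times in the generated assembly. The following is only a rough sketch of that counting, not how DejaGnu implements it. It uses POSIX extended regexes as a stand-in for Tcl regexes (so \s is written as [[:space:]]), and count_matches, sample_asm and the pattern names are hypothetical helpers introduced for illustration. The sample text is the vector part of the "After the precision adjustment" listing from the commit message.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

#define WS "[[:space:]]"

/* POSIX ERE versions of the patterns used in the dg-final lines.  */
static const char vsetvli_m1[] =
  "vsetvli" WS "+[a-x][0-9]+," WS "*zero," WS "*e8," WS "*m1," WS "*ta," WS "*ma";
static const char vsetvli_mf2[] =
  "vsetvli" WS "+[a-x][0-9]+," WS "*zero," WS "*e8," WS "*mf2," WS "*ta," WS "*ma";
static const char vlm_v[] = "vlm\\.v" WS "+v[0-9]+," WS "*0\\([a-x][0-9]+\\)";
static const char vsm_v[] = "vsm\\.v" WS "+v[0-9]+," WS "*0\\([a-x][0-9]+\\)";

/* Vector instructions from the "after" listing in the commit message.  */
static const char sample_asm[] =
  "\tvsetvli\ta5,zero,e8,m1,ta,ma\n"
  "\tvlm.v\tv24,0(a0)\n"
  "\tvsm.v\tv24,0(a3)\n"
  "\tvsetvli\ta4,zero,e8,mf2,ta,ma\n"
  "\tvlm.v\tv25,0(a3)\n"
  "\tvsm.v\tv25,0(a2)\n"
  "\tvsetvli\ta5,zero,e8,m1,ta,ma\n"
  "\tvsm.v\tv24,0(a1)\n";

/* Count non-overlapping matches of PATTERN in TEXT.
   Returns -1 if PATTERN does not compile.  */
static int
count_matches (const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t m;
  int count = 0;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;

  while (*text != '\0' && regexec (&re, text, 1, &m, 0) == 0)
    {
      count++;
      /* Continue after this match; step one char on an empty match.  */
      text += (m.rm_eo > 0) ? m.rm_eo : 1;
    }

  regfree (&re);
  return count;
}
```

On the sample listing this counts two e8,m1 vsetvli instructions, one e8,mf2 vsetvli, two vlm.v loads and three vsm.v stores, which is the kind of per-file total the directives above assert.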
Hi Richard Sandiford:
RISC-V part is OK to me, could you review the ADJUST_PRECISION part to
make sure this change is reasonable?
Thanks :)
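
As a reading aid for the ADJUST_PRECISION part, here is a minimal sketch of what the quoted patch appears to compute. It is not the GCC code: plain long values stand in for poly_int64/poly_uint16, the is_rvv_mode flag stands in for riscv_v_ext_vector_mode_p (mode), and the function names adjust_precision and mode_size_from_precision are hypothetical. The first function mirrors riscv_v_adjust_precision, the second mirrors the size normalization added to emit_mode_adjustments in genmodes.cc.

```c
#include <assert.h>

/* Bits per addressable unit, matching the BITS_PER_UNIT name in the patch.  */
#define BITS_PER_UNIT 8

/* Scalar model of riscv_v_adjust_precision: an RVV mode gets
   vector_chunks * scale bits, any other mode keeps scale bits.  */
static long
adjust_precision (int is_rvv_mode, long vector_chunks, long scale)
{
  assert (vector_chunks > 0 && scale > 0);
  return is_rvv_mode ? vector_chunks * scale : scale;
}

/* Scalar model of the genmodes.cc change: a precision below one byte
   is normalized to a size of 1, otherwise the size is precision / 8.  */
static long
mode_size_from_precision (long precision_bits)
{
  assert (precision_bits > 0);
  if (precision_bits < BITS_PER_UNIT)
    return 1;
  /* exact_div in the patch requires a whole number of bytes.  */
  assert (precision_bits % BITS_PER_UNIT == 0);
  return precision_bits / BITS_PER_UNIT;
}
```

Under these assumptions, with two vector chunks a scale of 1 gives a 2-bit precision that still occupies 1 byte, while a scale of 64 gives 128 bits, i.e. 16 bytes. The point is that modes which may share a byte size no longer share a precision.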
On Tue, Feb 21, 2023 at 2:37 PM Li, Pan2 via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
>
> A kind reminder about this PR.
>
> Pan
>
> -----Original Message-----
> From: Li, Pan2
> Sent: Friday, February 17, 2023 4:39 PM
> To: richard.sandiford@arm.com; juzhe.zhong <juzhe.zhong@rivai.ai>
> Cc: incarnation.p.lee@outlook.com; gcc-patches@gcc.gnu.org; kito.cheng@sifive.com; Richard Biener <rguenther@suse.de>
> Subject: RE: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> Cool, thank you!
>
> Hi Richard S,
>
> Could you please do me a favor and review this change when you are free? Thank you!
>
> Pan
>
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Friday, February 17, 2023 3:36 PM
> To: juzhe.zhong <juzhe.zhong@rivai.ai>
> Cc: incarnation.p.lee@outlook.com; gcc-patches@gcc.gnu.org; kito.cheng@sifive.com; Li, Pan2 <pan2.li@intel.com>; richard.sandiford@arm.com
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> On Thu, 16 Feb 2023, juzhe.zhong wrote:
>
> > Thanks for the great work fixing this issue for rvv. Hi, Richard. This
> > is the patch to differentiate mask modes of the same byte size. It adjusts
> > the precision correctly according to the rvv isa. Would you mind helping us
> > with this patch?
> > It's very important for rvv support in gcc.
>
> If adjusting the precision works fine then I suppose the patch looks reasonable. I'll defer to Richard S. though since he's the one knowing the mode stuff better. I'd have integrated the precision adjustment with the ADJUST_NITER hook since that is also documented to adjust the precision btw.
>
> Richard.
>
> > Thanks.
> > ---- Replied Message ----
> > From
> > incarnation.p.lee@outlook.com<incarnation.p.lee@outlook.com>
> > Date
> > 02/16/2023 23:12
> > To
> > gcc-patches@gcc.gnu.org<gcc-patches@gcc.gnu.org>
> > Cc
> > juzhe.zhong@rivai.ai<juzhe.zhong@rivai.ai>,
> > kito.cheng@sifive.com<kito.cheng@sifive.com>,
> > rguenther@suse.de<rguenther@suse.de>,
> > pan2.li@intel.com<pan2.li@intel.com>
> > Subject
> > [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> > From: Pan Li <pan2.li@intel.com>
> >
> > Fix the bug of the rvv bool mode precision with the adjustment.
> > The bit size of vbool*_t will be adjusted to
> > [1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The
> > adjusted mode precision of vbool*_t will help the underlying passes to
> > make the right decision for both correctness and optimization.
> >
> > Given below sample code:
> > void test_1(int8_t * restrict in, int8_t * restrict out)
> > {
> > vbool8_t v2 = *(vbool8_t*)in;
> > vbool16_t v5 = *(vbool16_t*)in;
> > *(vbool16_t*)(out + 200) = v5;
> > *(vbool8_t*)(out + 100) = v2;
> > }
> >
> > Before the precision adjustment:
> > addi a4,a1,100
> > vsetvli a5,zero,e8,m1,ta,ma
> > addi a1,a1,200
> > vlm.v v24,0(a0)
> > vsm.v v24,0(a4)
> > // Need one vsetvli and vlm.v for correctness here.
> > vsm.v v24,0(a1)
> >
> > After the precision adjustment:
> > csrr t0,vlenb
> > slli t1,t0,1
> > csrr a3,vlenb
> > sub sp,sp,t1
> > slli a4,a3,1
> > add a4,a4,sp
> > sub a3,a4,a3
> > vsetvli a5,zero,e8,m1,ta,ma
> > addi a2,a1,200
> > vlm.v v24,0(a0)
> > vsm.v v24,0(a3)
> > addi a1,a1,100
> > vsetvli a4,zero,e8,mf2,ta,ma
> > csrr t0,vlenb
> > vlm.v v25,0(a3)
> > vsm.v v25,0(a2)
> > slli t1,t0,1
> > vsetvli a5,zero,e8,m1,ta,ma
> > vsm.v v24,0(a1)
> > add sp,sp,t1
> > jr ra
> >
> > However, there may be some optimization opportunities after
> > the mode precision adjustment. They can be taken care of in
> > the RISC-V backend in separate follow-up PR(s).
> >
> > PR 108185
> > PR 108654
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/riscv-modes.def (ADJUST_PRECISION):
> > * config/riscv/riscv.cc (riscv_v_adjust_precision):
> > * config/riscv/riscv.h (riscv_v_adjust_precision):
> > * genmodes.cc (ADJUST_PRECISION):
> > (emit_mode_adjustments):
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/pr108185-1.c: New test.
> > * gcc.target/riscv/pr108185-2.c: New test.
> > * gcc.target/riscv/pr108185-3.c: New test.
> > * gcc.target/riscv/pr108185-4.c: New test.
> > * gcc.target/riscv/pr108185-5.c: New test.
> > * gcc.target/riscv/pr108185-6.c: New test.
> > * gcc.target/riscv/pr108185-7.c: New test.
> > * gcc.target/riscv/pr108185-8.c: New test.
> >
> > Signed-off-by: Pan Li <pan2.li@intel.com>
> > ---
> > gcc/config/riscv/riscv-modes.def            |  8 +++
> > gcc/config/riscv/riscv.cc                   | 12 ++++
> > gcc/config/riscv/riscv.h                    |  1 +
> > gcc/genmodes.cc                             | 25 ++++++-
> > gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
> > 12 files changed, 598 insertions(+), 1 deletion(-)
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >
> > diff --git a/gcc/config/riscv/riscv-modes.def
> > b/gcc/config/riscv/riscv-modes.def
> > index d5305efa8a6..110bddce851 100644
> > --- a/gcc/config/riscv/riscv-modes.def
> > +++ b/gcc/config/riscv/riscv-modes.def
> > @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
> >
> > +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
> > +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
> > +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
> > +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
> > +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
> > +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
> > +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
> > +
> > /*
> > | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
> > | | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index de3e1f903c7..cbe66c0e35b 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
> > return scale;
> > }
> >
> > +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
> > + PRECISION size for corresponding machine_mode. */
> > +
> > +poly_int64
> > +riscv_v_adjust_precision (machine_mode mode, int scale)
> > +{
> > + if (riscv_v_ext_vector_mode_p (mode))
> > + return riscv_vector_chunks * scale;
> > +
> > + return scale;
> > +}
> > +
> > /* Return true if X is a valid address for machine mode MODE. If it is,
> > fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
> > effect. */
> > diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> > index 5bc7f2f467d..15b9317a8ce 100644
> > --- a/gcc/config/riscv/riscv.h
> > +++ b/gcc/config/riscv/riscv.h
> > @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
> > extern unsigned riscv_bytes_per_vector_chunk;
> > extern poly_uint16 riscv_vector_chunks;
> > extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
> > +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
> > /* The number of bits and bytes in a RVV vector. */
> > #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
> > #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
> > diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
> > index 2d418f09aab..12f4e6335e6 100644
> > --- a/gcc/genmodes.cc
> > +++ b/gcc/genmodes.cc
> > @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
> > static struct mode_adjust *adj_format;
> > static struct mode_adjust *adj_ibit;
> > static struct mode_adjust *adj_fbit;
> > +static struct mode_adjust *adj_precision;
> >
> > /* Mode class operations. */
> > static enum mode_class
> > @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
> > #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
> > #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
> > #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
> > +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
> > #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
> > #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
> > #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
> > @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
> > " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
> > m->name, m->name);
> > printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
> > - printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
> > + /* Normalize the size to 1 if precision is less than BITS_PER_UNIT. */
> > + printf (" poly_uint16 size_one = "
> > + "mode_precision[E_%smode].is_constant ()\n", m->name);
> > + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
> > + printf (" if (known_lt (mode_precision[E_%smode], "
> > + "size_one * BITS_PER_UNIT))\n", m->name);
> > + printf (" mode_size[E_%smode] = size_one;\n", m->name);
> > + printf (" else\n");
> > + printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
> > " BITS_PER_UNIT);\n", m->name, m->name);
> > printf (" mode_nunits[E_%smode] = ps;\n", m->name);
> > printf (" adjust_mode_mask (E_%smode);\n", m->name);
> > @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
> > printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
> > a->file, a->line, a->mode->name, a->adjustment);
> >
> > + /* Adjust precision to the actual bits size. */
> > + for (a = adj_precision; a; a = a->next)
> > + switch (a->mode->cl)
> > + {
> > + case MODE_VECTOR_BOOL:
> > + printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
> > + a->adjustment);
> > + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
> > + break;
> > + default:
> > + break;
> > + }
> > +
> > puts ("}");
> > }
> >
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > new file mode 100644
> > index 00000000000..e70960c5b6d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > new file mode 100644
> > index 00000000000..dcc7a644a88
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > new file mode 100644
> > index 00000000000..3af0513e006
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> > b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> > new file mode 100644
> > index 00000000000..ea3c360d756
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> > b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> > new file mode 100644
> > index 00000000000..9fc659d2402
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> > b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> > new file mode 100644
> > index 00000000000..98275e5267d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> > b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> > new file mode 100644
> > index 00000000000..8f6f0b11f09
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> > new file mode 100644
> > index 00000000000..d96959dd064
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> > @@ -0,0 +1,77 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> > --
> > 2.34.1
> >
> >
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)
Hi,
It's been a while since this patch was sent.
This patch is very important for us since we are going to release RVV intrinsic support in GCC 13.
It is also the patch that fixes this bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108654
Could anyone review this patch for us?
Thanks.
juzhe.zhong@rivai.ai
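
For readers joining the thread at this point, the core of the fix can be sketched in a few lines of C. This is only a model, not GCC code: `model_adjust_precision` and `model_bytesize` are hypothetical names, and GCC's `poly_int64` values are replaced by plain integers, with the chunk count passed in by the caller. It illustrates why two mask modes that share a byte size (as VNx16BI and VNx32BI do in the quoted riscv-modes.def context lines) become distinguishable once their precision is adjusted.

```c
#include <assert.h>

/* Hypothetical stand-in for riscv_v_adjust_precision from the patch.
   Assumption: the runtime vector length is fixed through CHUNKS, so the
   poly_int64 result collapses to a plain integer.  */
long
model_adjust_precision (int is_vector_mode, long chunks, int scale)
{
  assert (chunks > 0 && scale > 0);
  /* Same shape as the patch: vector modes scale with the chunk count,
     any other mode keeps SCALE unchanged.  */
  return is_vector_mode ? chunks * scale : scale;
}

/* Byte size expression shared by VNx16BI and VNx32BI in the quoted
   ADJUST_BYTESIZE lines: chunks * bytes_per_chunk.  */
long
model_bytesize (long chunks, long bytes_per_chunk)
{
  assert (chunks > 0 && bytes_per_chunk > 0);
  return chunks * bytes_per_chunk;
}
```

In this model, with two chunks, a mode with scale 16 and a mode with scale 32 report the same byte size but precisions of 32 and 64 bits, which is the distinction the patch relies on.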
From: Kito Cheng
Date: 2023-02-21 16:28
To: Li, Pan2
CC: richard.sandiford@arm.com; juzhe.zhong; incarnation.p.lee@outlook.com; gcc-patches@gcc.gnu.org; kito.cheng@sifive.com; Richard Biener
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
Hi Richard Sandiford:
The RISC-V part is OK with me. Could you review the ADJUST_PRECISION part to
make sure this change is reasonable?
Thanks :)
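
The ADJUST_PRECISION part under review includes a change to how genmodes derives the mode size from the adjusted precision. A minimal sketch of that rule follows, assuming a constant precision (the `is_constant ()` case in the patch, where `size_one` is 1). `model_mode_size` is a hypothetical name, and the poly_int handling is deliberately left out.

```c
#include <assert.h>

#define BITS_PER_UNIT 8

/* Hypothetical model of the size rule emitted by emit_mode_adjustments
   in the patch, restricted to a constant precision given in bits.  */
long
model_mode_size (long precision_bits)
{
  assert (precision_bits > 0);

  /* Normalize the size to 1 if the precision is below one unit.  */
  if (precision_bits < BITS_PER_UNIT)
    return 1;

  /* The other branch of the patch uses exact_div, which expects the
     precision to be a multiple of BITS_PER_UNIT.  */
  assert (precision_bits % BITS_PER_UNIT == 0);
  return precision_bits / BITS_PER_UNIT;
}
```

Under these assumptions a 1-, 2- or 4-bit mask mode still occupies one byte, while a 64-bit one occupies eight.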
On Tue, Feb 21, 2023 at 2:37 PM Li, Pan2 via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
>
> A kind reminder about this PR.
>
> Pan
>
> -----Original Message-----
> From: Li, Pan2
> Sent: Friday, February 17, 2023 4:39 PM
> To: richard.sandiford@arm.com; juzhe.zhong <juzhe.zhong@rivai.ai>
> Cc: incarnation.p.lee@outlook.com; gcc-patches@gcc.gnu.org; kito.cheng@sifive.com; Richard Biener <rguenther@suse.de>
> Subject: RE: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> Cool, thank you!
>
> Hi Richard S,
>
> Could you please do me a favor and review this change when you are free? Thank you!
>
> Pan
>
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Friday, February 17, 2023 3:36 PM
> To: juzhe.zhong <juzhe.zhong@rivai.ai>
> Cc: incarnation.p.lee@outlook.com; gcc-patches@gcc.gnu.org; kito.cheng@sifive.com; Li, Pan2 <pan2.li@intel.com>; richard.sandiford@arm.com
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> On Thu, 16 Feb 2023, juzhe.zhong wrote:
>
> > Thanks for the great work fixing this issue for RVV. Hi Richard, this
> > is the patch that differentiates mask modes of the same byte size by
> > adjusting the precision correctly according to the RVV ISA. Would you
> > mind helping us with this patch?
> > It is very important for RVV support in GCC.
>
> If adjusting the precision works fine then I suppose the patch looks reasonable. I'll defer to Richard S. though, since he knows the mode stuff better. I'd have integrated the precision adjustment with the ADJUST_NUNITS hook since that is also documented to adjust the precision btw.
>
> Richard.
>
> > Thanks.
> > ---- Replied Message ----
> > From
> > incarnation.p.lee@outlook.com<incarnation.p.lee@outlook.com>
> > Date
> > 02/16/2023 23:12
> > To
> > gcc-patches@gcc.gnu.org<gcc-patches@gcc.gnu.org>
> > Cc
> > juzhe.zhong@rivai.ai<juzhe.zhong@rivai.ai>,
> > kito.cheng@sifive.com<kito.cheng@sifive.com>,
> > rguenther@suse.de<rguenther@suse.de>,
> > pan2.li@intel.com<pan2.li@intel.com>
> > Subject
> > [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> > From: Pan Li <pan2.li@intel.com>
> >
> > Fix the bug of the rvv bool mode precision with the adjustment.
> > The bit size of vbool*_t will be adjusted to
> > [1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The
> > adjusted mode precision of vbool*_t will help the underlying passes
> > make the right decision for both correctness and optimization.
> >
> > Given below sample code:
> > void test_1(int8_t * restrict in, int8_t * restrict out)
> > {
> > vbool8_t v2 = *(vbool8_t*)in;
> > vbool16_t v5 = *(vbool16_t*)in;
> > *(vbool16_t*)(out + 200) = v5;
> > *(vbool8_t*)(out + 100) = v2;
> > }
> >
> > Before the precision adjustment:
> > addi a4,a1,100
> > vsetvli a5,zero,e8,m1,ta,ma
> > addi a1,a1,200
> > vlm.v v24,0(a0)
> > vsm.v v24,0(a4)
> > // Need one vsetvli and vlm.v for correctness here.
> > vsm.v v24,0(a1)
> >
> > After the precision adjustment:
> > csrr t0,vlenb
> > slli t1,t0,1
> > csrr a3,vlenb
> > sub sp,sp,t1
> > slli a4,a3,1
> > add a4,a4,sp
> > sub a3,a4,a3
> > vsetvli a5,zero,e8,m1,ta,ma
> > addi a2,a1,200
> > vlm.v v24,0(a0)
> > vsm.v v24,0(a3)
> > addi a1,a1,100
> > vsetvli a4,zero,e8,mf2,ta,ma
> > csrr t0,vlenb
> > vlm.v v25,0(a3)
> > vsm.v v25,0(a2)
> > slli t1,t0,1
> > vsetvli a5,zero,e8,m1,ta,ma
> > vsm.v v24,0(a1)
> > add sp,sp,t1
> > jr ra
> >
> > However, there may be some optimization opportunities after
> > the mode precision adjustment. They can be taken care of in
> > the RISC-V backend in separate follow-up PR(s).
> >
> > PR 108185
> > PR 108654
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/riscv-modes.def (ADJUST_PRECISION):
> > * config/riscv/riscv.cc (riscv_v_adjust_precision):
> > * config/riscv/riscv.h (riscv_v_adjust_precision):
> > * genmodes.cc (ADJUST_PRECISION):
> > (emit_mode_adjustments):
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/pr108185-1.c: New test.
> > * gcc.target/riscv/pr108185-2.c: New test.
> > * gcc.target/riscv/pr108185-3.c: New test.
> > * gcc.target/riscv/pr108185-4.c: New test.
> > * gcc.target/riscv/pr108185-5.c: New test.
> > * gcc.target/riscv/pr108185-6.c: New test.
> > * gcc.target/riscv/pr108185-7.c: New test.
> > * gcc.target/riscv/pr108185-8.c: New test.
> >
> > Signed-off-by: Pan Li <pan2.li@intel.com>
> > ---
> > gcc/config/riscv/riscv-modes.def | 8 +++
> > gcc/config/riscv/riscv.cc | 12 ++++
> > gcc/config/riscv/riscv.h | 1 +
> > gcc/genmodes.cc | 25 ++++++-
> > gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
> > 12 files changed, 598 insertions(+), 1 deletion(-)
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >
> > diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
> > index d5305efa8a6..110bddce851 100644
> > --- a/gcc/config/riscv/riscv-modes.def
> > +++ b/gcc/config/riscv/riscv-modes.def
> > @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
> >
> > +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
> > +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
> > +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
> > +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
> > +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
> > +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
> > +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
> > +
> > /*
> > | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
> > | | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index de3e1f903c7..cbe66c0e35b 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
> > return scale;
> > }
> >
> > +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
> > + PRECISION size for corresponding machine_mode. */
> > +
> > +poly_int64
> > +riscv_v_adjust_precision (machine_mode mode, int scale)
> > +{
> > + if (riscv_v_ext_vector_mode_p (mode))
> > + return riscv_vector_chunks * scale;
> > +
> > + return scale;
> > +}
> > +
> > /* Return true if X is a valid address for machine mode MODE. If it is,
> > fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
> > effect. */
> > diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> > index 5bc7f2f467d..15b9317a8ce 100644
> > --- a/gcc/config/riscv/riscv.h
> > +++ b/gcc/config/riscv/riscv.h
> > @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
> > extern unsigned riscv_bytes_per_vector_chunk;
> > extern poly_uint16 riscv_vector_chunks;
> > extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
> > +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
> > /* The number of bits and bytes in a RVV vector. */
> > #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
> > #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
> > diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
> > index 2d418f09aab..12f4e6335e6 100644
> > --- a/gcc/genmodes.cc
> > +++ b/gcc/genmodes.cc
> > @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
> > static struct mode_adjust *adj_format;
> > static struct mode_adjust *adj_ibit;
> > static struct mode_adjust *adj_fbit;
> > +static struct mode_adjust *adj_precision;
> >
> > /* Mode class operations. */
> > static enum mode_class
> > @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
> > #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
> > #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
> > #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
> > +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
> > #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
> > #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
> > #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
> > @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
> > " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
> > m->name, m->name);
> > printf (" mode_precision[E_%smode] = ps * old_factor;\n",
> > m->name);
> > - printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
> > + /* Normalize the size to 1 if precision is less than BITS_PER_UNIT. */
> > + printf (" poly_uint16 size_one = "
> > + "mode_precision[E_%smode].is_constant ()\n", m->name);
> > + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
> > + printf (" if (known_lt (mode_precision[E_%smode], "
> > + "size_one * BITS_PER_UNIT))\n", m->name);
> > + printf (" mode_size[E_%smode] = size_one;\n", m->name);
> > + printf (" else\n");
> > + printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
> > " BITS_PER_UNIT);\n", m->name, m->name);
> > printf (" mode_nunits[E_%smode] = ps;\n", m->name);
> > printf (" adjust_mode_mask (E_%smode);\n", m->name);
> > @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
> > printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
> > a->file, a->line, a->mode->name, a->adjustment);
> >
> > + /* Adjust precision to the actual bits size. */
> > + for (a = adj_precision; a; a = a->next)
> > + switch (a->mode->cl)
> > + {
> > + case MODE_VECTOR_BOOL:
> > + printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
> > + a->adjustment);
> > + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
> > + break;
> > + default:
> > + break;
> > + }
> > +
> > puts ("}");
> > }
> >
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > new file mode 100644
> > index 00000000000..e70960c5b6d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > new file mode 100644
> > index 00000000000..dcc7a644a88
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > new file mode 100644
> > index 00000000000..3af0513e006
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> > new file mode 100644
> > index 00000000000..ea3c360d756
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> > new file mode 100644
> > index 00000000000..9fc659d2402
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> > new file mode 100644
> > index 00000000000..98275e5267d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> > b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> > new file mode 100644
> > index 00000000000..8f6f0b11f09
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> > b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> > new file mode 100644
> > index 00000000000..d96959dd064
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> > @@ -0,0 +1,77 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> > --
> > 2.34.1
> >
> >
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)
Hi Richard Sandiford,
It looks like you are busy with other important work right now. Could you please share an ETA for the review, if possible? That would help us plan the RVV intrinsic support for GCC 13.
Thanks a lot, and feel free to ping us if you have any questions or concerns.
Pan
From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
Sent: Friday, February 24, 2023 1:08 PM
To: kito.cheng <kito.cheng@gmail.com>; Li, Pan2 <pan2.li@intel.com>
Cc: richard.sandiford <richard.sandiford@arm.com>; incarnation.p.lee <incarnation.p.lee@outlook.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>; jeffreyalaw <jeffreyalaw@gmail.com>
Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
Hi,
It's been a while since this patch was sent.
This patch is very important for us since we are going to release RVV intrinsic support in GCC 13.
It is the patch that fixes this bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108654
Could anyone review this patch for us?
Thanks.
________________________________
juzhe.zhong@rivai.ai
From: Kito Cheng <kito.cheng@gmail.com>
Date: 2023-02-21 16:28
To: Li, Pan2 <pan2.li@intel.com>
CC: richard.sandiford@arm.com; juzhe.zhong <juzhe.zhong@rivai.ai>; incarnation.p.lee@outlook.com; gcc-patches@gcc.gnu.org; kito.cheng@sifive.com; Richard Biener <rguenther@suse.de>
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
Hi Richard Sandiford,
The RISC-V part looks OK to me. Could you review the ADJUST_PRECISION part to
make sure this change is reasonable?
Thanks :)
On Tue, Feb 21, 2023 at 2:37 PM Li, Pan2 via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
>
> Just a kind reminder about this PR.
>
> Pan
>
> -----Original Message-----
> From: Li, Pan2
> Sent: Friday, February 17, 2023 4:39 PM
> To: richard.sandiford@arm.com; juzhe.zhong <juzhe.zhong@rivai.ai>
> Cc: incarnation.p.lee@outlook.com; gcc-patches@gcc.gnu.org; kito.cheng@sifive.com; Richard Biener <rguenther@suse.de>
> Subject: RE: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> Cool, thank you!
>
> Hi Richard S,
>
> Could you please do me a favor and review this change when you are free? Thank you!
>
> Pan
>
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Friday, February 17, 2023 3:36 PM
> To: juzhe.zhong <juzhe.zhong@rivai.ai>
> Cc: incarnation.p.lee@outlook.com; gcc-patches@gcc.gnu.org; kito.cheng@sifive.com; Li, Pan2 <pan2.li@intel.com>; richard.sandiford@arm.com
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> On Thu, 16 Feb 2023, juzhe.zhong wrote:
>
> > Thanks for the great work to fix this issue for RVV.
> > Hi Richard, this is the patch to differentiate mask modes of the same
> > byte size. It adjusts the precision correctly according to the RVV ISA.
> > Would you mind helping us with this patch?
> > It's very important for RVV support in GCC.
>
> If adjusting the precision works fine then I suppose the patch looks reasonable. I'll defer to Richard S. though since he's the one who knows the mode stuff better. I'd have integrated the precision adjustment with the ADJUST_NUNITS hook since that is also documented to adjust the precision btw.
>
> Richard.
>
> > Thanks.
> > ---- Replied Message ----
> > From: incarnation.p.lee@outlook.com
> > Date: 02/16/2023 23:12
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zhong@rivai.ai, kito.cheng@sifive.com, rguenther@suse.de, pan2.li@intel.com
> > Subject: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> > From: Pan Li <pan2.li@intel.com>
> >
> > Fix the RVV bool mode precision bug by adjusting the precision.
> > The bit size of vbool*_t will be adjusted to
> > [1, 2, 4, 8, 16, 32, 64] according to the RVV 1.0 ISA spec. The
> > adjusted mode precision of vbool*_t will help the underlying passes
> > make the right decisions for both correctness and optimization.
> >
> > Given the sample code below:
> > void test_1(int8_t * restrict in, int8_t * restrict out)
> > {
> > vbool8_t v2 = *(vbool8_t*)in;
> > vbool16_t v5 = *(vbool16_t*)in;
> > *(vbool16_t*)(out + 200) = v5;
> > *(vbool8_t*)(out + 100) = v2;
> > }
> >
> > Before the precision adjustment:
> > addi a4,a1,100
> > vsetvli a5,zero,e8,m1,ta,ma
> > addi a1,a1,200
> > vlm.v v24,0(a0)
> > vsm.v v24,0(a4)
> > // Need one vsetvli and vlm.v for correctness here.
> > vsm.v v24,0(a1)
> >
> > After the precision adjustment:
> > csrr t0,vlenb
> > slli t1,t0,1
> > csrr a3,vlenb
> > sub sp,sp,t1
> > slli a4,a3,1
> > add a4,a4,sp
> > sub a3,a4,a3
> > vsetvli a5,zero,e8,m1,ta,ma
> > addi a2,a1,200
> > vlm.v v24,0(a0)
> > vsm.v v24,0(a3)
> > addi a1,a1,100
> > vsetvli a4,zero,e8,mf2,ta,ma
> > csrr t0,vlenb
> > vlm.v v25,0(a3)
> > vsm.v v25,0(a2)
> > slli t1,t0,1
> > vsetvli a5,zero,e8,m1,ta,ma
> > vsm.v v24,0(a1)
> > add sp,sp,t1
> > jr ra
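
A rough way to see why the two stores in the sample above need different
vsetvli settings is to count the mask bits each type holds. The sketch below
is only an illustration under stated assumptions: the helper names
vbool_mask_bits and vbool_mask_bytes are hypothetical (they are not part of
GCC or the RVV intrinsics), VLEN = 128 is an assumed hardware vector length,
and the byte count follows the RVV 1.0 rule that vlm.v/vsm.v transfer
ceil(vl/8) bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Number of mask bits held by a vboolN_t value when the hardware vector
   length is VLEN_BITS.  RATIO is the N in vboolN_t, that is the SEW/LMUL
   ratio (1, 2, 4, ..., 64).  Hypothetical helper for illustration only.  */
size_t
vbool_mask_bits (size_t vlen_bits, size_t ratio)
{
  assert (ratio != 0);
  return vlen_bits / ratio;
}

/* Bytes moved by a mask load/store when vl equals the number of mask bits:
   ceil (vl / 8).  Hypothetical helper for illustration only.  */
size_t
vbool_mask_bytes (size_t vlen_bits, size_t ratio)
{
  return (vbool_mask_bits (vlen_bits, ratio) + 7) / 8;
}
```

With the assumed VLEN = 128, vbool8_t holds 16 mask bits (2 bytes) while
vbool16_t holds 8 mask bits (1 byte), which matches the e8,m1 and e8,mf2
settings in the "after" listing: reusing the vbool8_t register image for the
vbool16_t store would write the wrong number of bytes.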
> >
> > However, there may be some optimization opportunities after
> > the mode precision adjustment. They can be taken care of in
> > the RISC-V backend in separate follow-up PR(s).
> >
> > PR 108185
> > PR 108654
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/riscv-modes.def (ADJUST_PRECISION):
> > * config/riscv/riscv.cc (riscv_v_adjust_precision):
> > * config/riscv/riscv.h (riscv_v_adjust_precision):
> > * genmodes.cc (ADJUST_PRECISION):
> > (emit_mode_adjustments):
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/pr108185-1.c: New test.
> > * gcc.target/riscv/pr108185-2.c: New test.
> > * gcc.target/riscv/pr108185-3.c: New test.
> > * gcc.target/riscv/pr108185-4.c: New test.
> > * gcc.target/riscv/pr108185-5.c: New test.
> > * gcc.target/riscv/pr108185-6.c: New test.
> > * gcc.target/riscv/pr108185-7.c: New test.
> > * gcc.target/riscv/pr108185-8.c: New test.
> >
> > Signed-off-by: Pan Li <pan2.li@intel.com>
> > ---
> >  gcc/config/riscv/riscv-modes.def            |  8 +++
> >  gcc/config/riscv/riscv.cc                   | 12 ++++
> >  gcc/config/riscv/riscv.h                    |  1 +
> >  gcc/genmodes.cc                             | 25 ++++++-
> >  gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
> >  gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
> >  gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
> >  gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
> >  gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
> >  gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
> >  gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
> >  gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
> >  12 files changed, 598 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >
> > diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
> > index d5305efa8a6..110bddce851 100644
> > --- a/gcc/config/riscv/riscv-modes.def
> > +++ b/gcc/config/riscv/riscv-modes.def
> > @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
> >
> > +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
> > +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
> > +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
> > +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
> > +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
> > +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
> > +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
> > +
> > /*
> > | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
> > | | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index de3e1f903c7..cbe66c0e35b 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
> > return scale;
> > }
> >
> > +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
> > + PRECISION size for corresponding machine_mode. */
> > +
> > +poly_int64
> > +riscv_v_adjust_precision (machine_mode mode, int scale)
> > +{
> > + if (riscv_v_ext_vector_mode_p (mode))
> > + return riscv_vector_chunks * scale;
> > +
> > + return scale;
> > +}
> > +
> > /* Return true if X is a valid address for machine mode MODE. If it is,
> > fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
> > effect. */
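
As a reading aid for the riscv.cc hunk above, here is a minimal C model of the
new helper. It is only a sketch: the real function returns poly_int64 and
reads the global riscv_vector_chunks, whereas this stand-in takes plain
integers. The names model_adjust_precision, is_rvv_mode and vector_chunks are
hypothetical, and the example value vector_chunks = 2 assumes VLEN = 128 with
8-byte vector chunks.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for riscv_v_adjust_precision.  IS_RVV_MODE plays the
   role of riscv_v_ext_vector_mode_p (mode); VECTOR_CHUNKS is a plain integer
   here, while GCC uses a poly_uint16 whose value depends on the runtime
   VLEN.  SCALE is the number of mask bits per chunk (1, 2, ..., 64).  */
long
model_adjust_precision (bool is_rvv_mode, long vector_chunks, int scale)
{
  assert (scale > 0);

  if (is_rvv_mode)
    return vector_chunks * scale;   /* Precision in bits.  */

  return scale;
}
```

Under those assumptions, model_adjust_precision (true, 2, 8) yields 16 bits
for VNx8BI and model_adjust_precision (true, 2, 4) yields 8 bits for VNx4BI,
so the two mask modes end up with different precisions.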
> > diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> > index 5bc7f2f467d..15b9317a8ce 100644
> > --- a/gcc/config/riscv/riscv.h
> > +++ b/gcc/config/riscv/riscv.h
> > @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
> > extern unsigned riscv_bytes_per_vector_chunk;
> > extern poly_uint16 riscv_vector_chunks;
> > extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
> > +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
> > /* The number of bits and bytes in a RVV vector. */
> > #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
> > #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
> > diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
> > index 2d418f09aab..12f4e6335e6 100644
> > --- a/gcc/genmodes.cc
> > +++ b/gcc/genmodes.cc
> > @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
> > static struct mode_adjust *adj_format;
> > static struct mode_adjust *adj_ibit;
> > static struct mode_adjust *adj_fbit;
> > +static struct mode_adjust *adj_precision;
> >
> > /* Mode class operations. */
> > static enum mode_class
> > @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
> > #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
> > #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
> > #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
> > +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
> > #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
> > #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
> > #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
> > @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
> > " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
> > m->name, m->name);
> > printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
> > - printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
> > + /* Normalize the size to 1 if precision is less than BITS_PER_UNIT. */
> > + printf (" poly_uint16 size_one = "
> > + "mode_precision[E_%smode].is_constant ()\n", m->name);
> > + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
> > + printf (" if (known_lt (mode_precision[E_%smode], "
> > + "size_one * BITS_PER_UNIT))\n", m->name);
> > + printf (" mode_size[E_%smode] = size_one;\n", m->name);
> > + printf (" else\n");
> > + printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
> > " BITS_PER_UNIT);\n", m->name, m->name);
> > printf (" mode_nunits[E_%smode] = ps;\n", m->name);
> > printf (" adjust_mode_mask (E_%smode);\n", m->name);
> > @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
> > printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
> > a->file, a->line, a->mode->name, a->adjustment);
> >
> > + /* Adjust precision to the actual bits size. */
> > + for (a = adj_precision; a; a = a->next)
> > + switch (a->mode->cl)
> > + {
> > + case MODE_VECTOR_BOOL:
> > + printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
> > + a->adjustment);
> > + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
> > + break;
> > + default:
> > + break;
> > + }
> > +
> > puts ("}");
> > }
> >
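
The size normalization that the genmodes.cc hunk above emits can be modeled
in a few lines of C. This is a simplified sketch, not GCC code: struct poly2
is a hypothetical two-coefficient stand-in for poly_uint16 (value
c0 + c1 * x with x >= 0 unknown at compile time), poly2_known_lt mimics what
known_lt means for such values, and model_mode_size is a made-up name for the
emitted if/else.

```c
#include <assert.h>
#include <stdbool.h>

#define BITS_PER_UNIT 8

/* Stand-in for a two-coefficient poly_uint16: c0 + c1 * x, x >= 0.  */
struct poly2 { unsigned c0, c1; };

/* True only if A < B holds for every x >= 0.  */
bool
poly2_known_lt (struct poly2 a, struct poly2 b)
{
  return a.c0 < b.c0 && a.c1 <= b.c1;
}

/* Model of the emitted code: a mode whose precision is known to be below
   one byte (per chunk) still occupies one byte; otherwise the size is the
   precision divided exactly by BITS_PER_UNIT.  */
struct poly2
model_mode_size (struct poly2 precision)
{
  bool is_constant = precision.c1 == 0;
  struct poly2 size_one = { 1, is_constant ? 0u : 1u };
  struct poly2 limit = { size_one.c0 * BITS_PER_UNIT,
                         size_one.c1 * BITS_PER_UNIT };

  if (poly2_known_lt (precision, limit))
    return size_one;

  /* exact_div in GCC requires an exact division; mirror that here.  */
  assert (precision.c0 % BITS_PER_UNIT == 0
          && precision.c1 % BITS_PER_UNIT == 0);
  struct poly2 size = { precision.c0 / BITS_PER_UNIT,
                        precision.c1 / BITS_PER_UNIT };
  return size;
}
```

For example, a precision of 4 + 4x bits is known to be less than 8 + 8x, so
the model returns a size of 1 + 1x bytes, while 16 + 16x bits divides exactly
into 2 + 2x bytes.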
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > new file mode 100644
> > index 00000000000..e70960c5b6d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > new file mode 100644
> > index 00000000000..dcc7a644a88
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > new file mode 100644
> > index 00000000000..3af0513e006
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> > b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> > new file mode 100644
> > index 00000000000..ea3c360d756
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> > b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> > new file mode 100644
> > index 00000000000..9fc659d2402
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> > b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> > new file mode 100644
> > index 00000000000..98275e5267d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> > b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> > new file mode 100644
> > index 00000000000..8f6f0b11f09
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> > b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> > new file mode 100644
> > index 00000000000..d96959dd064
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> > @@ -0,0 +1,77 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> > --
> > 2.34.1
> >
> >
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)
Hi there,
Just FYI: bootstrapping GCC with this patch applied shows no obvious errors.
Pan
From: Li, Pan2
Sent: Friday, February 24, 2023 3:21 PM
To: juzhe.zhong@rivai.ai; kito.cheng <kito.cheng@gmail.com>; richard.sandiford <richard.sandiford@arm.com>
Cc: incarnation.p.lee <incarnation.p.lee@outlook.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>; jeffreyalaw <jeffreyalaw@gmail.com>
Subject: RE: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
Hi Richard Sandiford,
It looks like you are busy with other important work right now. Could you please share an ETA if possible? That would help us plan the RVV intrinsic support for GCC 13.
Thanks a lot, and feel free to ping us if you have any questions or concerns.
Pan
From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
Sent: Friday, February 24, 2023 1:08 PM
To: kito.cheng <kito.cheng@gmail.com>; Li, Pan2 <pan2.li@intel.com>
Cc: richard.sandiford <richard.sandiford@arm.com>; incarnation.p.lee <incarnation.p.lee@outlook.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>; jeffreyalaw <jeffreyalaw@gmail.com>
Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
Hi,
It's been a while since this patch was sent.
This patch is very important for us since we are going to release RVV intrinsic support in GCC 13.
It fixes this bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108654
Could anyone review this patch for us?
Thanks.
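
For readers following along, the effect of the patch on mask modes that share a byte size can be pictured in plain C. This is only a scalar sketch, not code from the patch: `model_bytesize`, `model_precision`, and the constant chunk values used with them are assumptions for illustration, whereas GCC tracks `riscv_vector_chunks` as a poly_int value.

```c
/* Hypothetical scalar model of the ADJUST_BYTESIZE expression that
   riscv-modes.def uses for VNx16BI and VNx32BI: both are
   riscv_vector_chunks * riscv_bytes_per_vector_chunk, so the two
   modes end up with the same byte size.  */
static unsigned
model_bytesize (unsigned chunks, unsigned bytes_per_chunk)
{
  return chunks * bytes_per_chunk;
}

/* Hypothetical scalar model of riscv_v_adjust_precision from the patch:
   RVV vector modes get chunks * scale, any other mode gets scale.  */
static unsigned
model_precision (int is_rvv_vector_mode, unsigned chunks, unsigned scale)
{
  return is_rvv_vector_mode ? chunks * scale : scale;
}
```

With, say, 2 chunks of 8 bytes, `model_bytesize (2, 8)` is 16 for both VNx16BI and VNx32BI, while `model_precision (1, 2, 16)` and `model_precision (1, 2, 32)` give 32 and 64, so the two modes can be told apart by their precision.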
________________________________
juzhe.zhong@rivai.ai
From: Kito Cheng <kito.cheng@gmail.com>
Date: 2023-02-21 16:28
To: Li, Pan2 <pan2.li@intel.com>
CC: richard.sandiford@arm.com; juzhe.zhong <juzhe.zhong@rivai.ai>; incarnation.p.lee@outlook.com; gcc-patches@gcc.gnu.org; kito.cheng@sifive.com; Richard Biener <rguenther@suse.de>
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
Hi Richard Sandiford:
The RISC-V part looks OK to me. Could you review the ADJUST_PRECISION part to
make sure this change is reasonable?
Thanks :)
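
On the genmodes.cc side, the size normalization that the patch emits can be pictured with a small scalar model. This is a sketch that assumes constant (non-poly) precisions and BITS_PER_UNIT equal to 8; `model_mode_size` is a hypothetical name that does not appear in the patch, which works on poly_uint16 values instead.

```c
#include <assert.h>

#define BITS_PER_UNIT 8 /* Assumed value for this model.  */

/* Hypothetical scalar model of the mode_size computation emitted by
   emit_mode_adjustments after the patch: a precision smaller than
   BITS_PER_UNIT is normalized to a size of 1, otherwise the size is the
   exact quotient of precision and BITS_PER_UNIT.  */
static unsigned
model_mode_size (unsigned precision_bits)
{
  if (precision_bits < BITS_PER_UNIT)
    return 1;

  /* Mirrors exact_div, which expects an exact multiple.  */
  assert (precision_bits % BITS_PER_UNIT == 0);
  return precision_bits / BITS_PER_UNIT;
}
```

Under this model, precisions of 1, 2 and 4 bits all map to a size of 1, while 16 bits maps to 2 and 64 bits maps to 8.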
On Tue, Feb 21, 2023 at 2:37 PM Li, Pan2 via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
>
> A kind reminder about this patch.
>
> Pan
>
> -----Original Message-----
> From: Li, Pan2
> Sent: Friday, February 17, 2023 4:39 PM
> To: richard.sandiford@arm.com; juzhe.zhong <juzhe.zhong@rivai.ai>
> Cc: incarnation.p.lee@outlook.com; gcc-patches@gcc.gnu.org; kito.cheng@sifive.com; Richard Biener <rguenther@suse.de>
> Subject: RE: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> Cool, thank you!
>
> Hi Richard S,
>
> Could you please do me a favor and review this change when you are free? Thank you!
>
> Pan
>
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Friday, February 17, 2023 3:36 PM
> To: juzhe.zhong <juzhe.zhong@rivai.ai>
> Cc: incarnation.p.lee@outlook.com; gcc-patches@gcc.gnu.org; kito.cheng@sifive.com; Li, Pan2 <pan2.li@intel.com>; richard.sandiford@arm.com
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> On Thu, 16 Feb 2023, juzhe.zhong wrote:
>
> > Thanks for the great work to fix this issue for RVV. Hi Richard, this
> > is the patch to differentiate mask modes of the same byte size. It adjusts
> > the precision correctly according to the RVV ISA. Would you mind helping
> > us with this patch?
> > It is very important for RVV support in GCC.
>
> If adjusting the precision works fine then I suppose the patch looks reasonable. I'll defer to Richard S. though since he's the one who knows the mode stuff better. I'd have integrated the precision adjustment with the ADJUST_NUNITS hook since that is also documented to adjust the precision btw.
>
> Richard.
>
> > Thanks.
> > ---- Replied Message ----
> > From: incarnation.p.lee@outlook.com
> > Date: 02/16/2023 23:12
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zhong@rivai.ai, kito.cheng@sifive.com, rguenther@suse.de, pan2.li@intel.com
> > Subject: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> > From: Pan Li <pan2.li@intel.com>
> >
> > Fix the RVV bool mode precision bug by adjusting the precision.
> > The bit size of vbool*_t is adjusted to
> > [1, 2, 4, 8, 16, 32, 64] according to the RVV 1.0 ISA spec. The
> > adjusted mode precision of vbool*_t helps the underlying passes
> > make the right decisions for both correctness and optimization.
> >
> > Given the sample code below:
> > void test_1(int8_t * restrict in, int8_t * restrict out)
> > {
> > vbool8_t v2 = *(vbool8_t*)in;
> > vbool16_t v5 = *(vbool16_t*)in;
> > *(vbool16_t*)(out + 200) = v5;
> > *(vbool8_t*)(out + 100) = v2;
> > }
> >
> > Before the precision adjustment:
> > addi a4,a1,100
> > vsetvli a5,zero,e8,m1,ta,ma
> > addi a1,a1,200
> > vlm.v v24,0(a0)
> > vsm.v v24,0(a4)
> > // Need one vsetvli and vlm.v for correctness here.
> > vsm.v v24,0(a1)
> >
> > After the precision adjustment:
> > csrr t0,vlenb
> > slli t1,t0,1
> > csrr a3,vlenb
> > sub sp,sp,t1
> > slli a4,a3,1
> > add a4,a4,sp
> > sub a3,a4,a3
> > vsetvli a5,zero,e8,m1,ta,ma
> > addi a2,a1,200
> > vlm.v v24,0(a0)
> > vsm.v v24,0(a3)
> > addi a1,a1,100
> > vsetvli a4,zero,e8,mf2,ta,ma
> > csrr t0,vlenb
> > vlm.v v25,0(a3)
> > vsm.v v25,0(a2)
> > slli t1,t0,1
> > vsetvli a5,zero,e8,m1,ta,ma
> > vsm.v v24,0(a1)
> > add sp,sp,t1
> > jr ra
> >
> > However, there may be some optimization opportunities after
> > the mode precision adjustment. They can be taken care of in
> > the RISC-V backend in separate follow-up PR(s).
> >
> > PR 108185
> > PR 108654
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/riscv-modes.def (ADJUST_PRECISION):
> > * config/riscv/riscv.cc (riscv_v_adjust_precision):
> > * config/riscv/riscv.h (riscv_v_adjust_precision):
> > * genmodes.cc (ADJUST_PRECISION):
> > (emit_mode_adjustments):
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/pr108185-1.c: New test.
> > * gcc.target/riscv/pr108185-2.c: New test.
> > * gcc.target/riscv/pr108185-3.c: New test.
> > * gcc.target/riscv/pr108185-4.c: New test.
> > * gcc.target/riscv/pr108185-5.c: New test.
> > * gcc.target/riscv/pr108185-6.c: New test.
> > * gcc.target/riscv/pr108185-7.c: New test.
> > * gcc.target/riscv/pr108185-8.c: New test.
> >
> > Signed-off-by: Pan Li <pan2.li@intel.com>
> > ---
> > gcc/config/riscv/riscv-modes.def            |  8 +++
> > gcc/config/riscv/riscv.cc                   | 12 ++++
> > gcc/config/riscv/riscv.h                    |  1 +
> > gcc/genmodes.cc                             | 25 ++++++-
> > gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
> > gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
> > 12 files changed, 598 insertions(+), 1 deletion(-)
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
> > create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >
> > diff --git a/gcc/config/riscv/riscv-modes.def
> > b/gcc/config/riscv/riscv-modes.def
> > index d5305efa8a6..110bddce851 100644
> > --- a/gcc/config/riscv/riscv-modes.def
> > +++ b/gcc/config/riscv/riscv-modes.def
> > @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
> >
> > +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
> > +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
> > +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
> > +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
> > +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
> > +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
> > +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
> > +
> > /*
> > | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
> > | | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index de3e1f903c7..cbe66c0e35b 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
> > return scale;
> > }
> >
> > +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
> > + PRECISION size for corresponding machine_mode. */
> > +
> > +poly_int64
> > +riscv_v_adjust_precision (machine_mode mode, int scale)
> > +{
> > + if (riscv_v_ext_vector_mode_p (mode))
> > + return riscv_vector_chunks * scale;
> > +
> > + return scale;
> > +}
> > +
> > /* Return true if X is a valid address for machine mode MODE. If it is,
> > fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
> > effect. */
> > diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> > index 5bc7f2f467d..15b9317a8ce 100644
> > --- a/gcc/config/riscv/riscv.h
> > +++ b/gcc/config/riscv/riscv.h
> > @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
> > extern unsigned riscv_bytes_per_vector_chunk;
> > extern poly_uint16 riscv_vector_chunks;
> > extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
> > +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
> > /* The number of bits and bytes in a RVV vector. */
> > #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
> > #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
> > diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
> > index 2d418f09aab..12f4e6335e6 100644
> > --- a/gcc/genmodes.cc
> > +++ b/gcc/genmodes.cc
> > @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
> > static struct mode_adjust *adj_format;
> > static struct mode_adjust *adj_ibit;
> > static struct mode_adjust *adj_fbit;
> > +static struct mode_adjust *adj_precision;
> >
> > /* Mode class operations. */
> > static enum mode_class
> > @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
> > #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
> > #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
> > #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
> > +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
> > #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
> > #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
> > #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
> > @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
> > " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
> > m->name, m->name);
> > printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
> > - printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
> > + /* Normalize the size to 1 if precision is less than BITS_PER_UNIT. */
> > + printf (" poly_uint16 size_one = "
> > + "mode_precision[E_%smode].is_constant ()\n", m->name);
> > + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
> > + printf (" if (known_lt (mode_precision[E_%smode], "
> > + "size_one * BITS_PER_UNIT))\n", m->name);
> > + printf (" mode_size[E_%smode] = size_one;\n", m->name);
> > + printf (" else\n");
> > + printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
> > " BITS_PER_UNIT);\n", m->name, m->name);
> > printf (" mode_nunits[E_%smode] = ps;\n", m->name);
> > printf (" adjust_mode_mask (E_%smode);\n", m->name);
> > @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
> > printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
> > a->file, a->line, a->mode->name, a->adjustment);
> >
> > + /* Adjust precision to the actual bits size. */
> > + for (a = adj_precision; a; a = a->next)
> > + switch (a->mode->cl)
> > + {
> > + case MODE_VECTOR_BOOL:
> > + printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
> > + a->adjustment);
> > + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
> > + break;
> > + default:
> > + break;
> > + }
> > +
> > puts ("}");
> > }
> >
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > new file mode 100644
> > index 00000000000..e70960c5b6d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > new file mode 100644
> > index 00000000000..dcc7a644a88
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > new file mode 100644
> > index 00000000000..3af0513e006
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> > new file mode 100644
> > index 00000000000..ea3c360d756
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> > new file mode 100644
> > index 00000000000..9fc659d2402
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> > new file mode 100644
> > index 00000000000..98275e5267d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> > new file mode 100644
> > index 00000000000..8f6f0b11f09
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> > new file mode 100644
> > index 00000000000..d96959dd064
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> > @@ -0,0 +1,77 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > +
> > +#include "riscv_vector.h"
> > +
> > +void
> > +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool1_t v1 = *(vbool1_t*)in;
> > + vbool1_t v2 = *(vbool1_t*)in;
> > +
> > + *(vbool1_t*)(out + 100) = v1;
> > + *(vbool1_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool2_t v1 = *(vbool2_t*)in;
> > + vbool2_t v2 = *(vbool2_t*)in;
> > +
> > + *(vbool2_t*)(out + 100) = v1;
> > + *(vbool2_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool4_t v1 = *(vbool4_t*)in;
> > + vbool4_t v2 = *(vbool4_t*)in;
> > +
> > + *(vbool4_t*)(out + 100) = v1;
> > + *(vbool4_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out)
> > +{
> > + vbool8_t v1 = *(vbool8_t*)in;
> > + vbool8_t v2 = *(vbool8_t*)in;
> > +
> > + *(vbool8_t*)(out + 100) = v1;
> > + *(vbool8_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > + vbool16_t v1 = *(vbool16_t*)in;
> > + vbool16_t v2 = *(vbool16_t*)in;
> > +
> > + *(vbool16_t*)(out + 100) = v1;
> > + *(vbool16_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > + vbool32_t v1 = *(vbool32_t*)in;
> > + vbool32_t v2 = *(vbool32_t*)in;
> > +
> > + *(vbool32_t*)(out + 100) = v1;
> > + *(vbool32_t*)(out + 200) = v2;
> > +}
> > +
> > +void
> > +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > + vbool64_t v1 = *(vbool64_t*)in;
> > + vbool64_t v2 = *(vbool64_t*)in;
> > +
> > + *(vbool64_t*)(out + 100) = v1;
> > + *(vbool64_t*)(out + 200) = v2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
> > +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> > --
> > 2.34.1
> >
> >
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)

Sorry for the slow reply, been away for a couple of weeks.

"incarnation.p.lee--- via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> From: Pan Li <pan2.li@intel.com>
>
> Fix the bug of the rvv bool mode precision with the adjustment.
> The bits size of vbool*_t will be adjusted to
> [1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The
> adjusted mode precision of vbool*_t will help the underlying passes to
> make the right decision for both correctness and optimization.
>
> Given below sample code:
> void test_1(int8_t * restrict in, int8_t * restrict out)
> {
> vbool8_t v2 = *(vbool8_t*)in;
> vbool16_t v5 = *(vbool16_t*)in;
> *(vbool16_t*)(out + 200) = v5;
> *(vbool8_t*)(out + 100) = v2;
> }
>
> Before the precision adjustment:
> addi a4,a1,100
> vsetvli a5,zero,e8,m1,ta,ma
> addi a1,a1,200
> vlm.v v24,0(a0)
> vsm.v v24,0(a4)
> // Need one vsetvli and vlm.v for correctness here.
> vsm.v v24,0(a1)
>
> After the precision adjustment:
> csrr t0,vlenb
> slli t1,t0,1
> csrr a3,vlenb
> sub sp,sp,t1
> slli a4,a3,1
> add a4,a4,sp
> sub a3,a4,a3
> vsetvli a5,zero,e8,m1,ta,ma
> addi a2,a1,200
> vlm.v v24,0(a0)
> vsm.v v24,0(a3)
> addi a1,a1,100
> vsetvli a4,zero,e8,mf2,ta,ma
> csrr t0,vlenb
> vlm.v v25,0(a3)
> vsm.v v25,0(a2)
> slli t1,t0,1
> vsetvli a5,zero,e8,m1,ta,ma
> vsm.v v24,0(a1)
> add sp,sp,t1
> jr ra
>
> However, there may be some optimization opportunities after
> the mode precision adjustment. They can be taken care of in
> the RISC-V backend in separate PR(s).
>
> PR 108185
> PR 108654
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-modes.def (ADJUST_PRECISION):
> * config/riscv/riscv.cc (riscv_v_adjust_precision):
> * config/riscv/riscv.h (riscv_v_adjust_precision):
> * genmodes.cc (ADJUST_PRECISION):
> (emit_mode_adjustments):
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/pr108185-1.c: New test.
> * gcc.target/riscv/pr108185-2.c: New test.
> * gcc.target/riscv/pr108185-3.c: New test.
> * gcc.target/riscv/pr108185-4.c: New test.
> * gcc.target/riscv/pr108185-5.c: New test.
> * gcc.target/riscv/pr108185-6.c: New test.
> * gcc.target/riscv/pr108185-7.c: New test.
> * gcc.target/riscv/pr108185-8.c: New test.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
> gcc/config/riscv/riscv-modes.def | 8 +++
> gcc/config/riscv/riscv.cc | 12 ++++
> gcc/config/riscv/riscv.h | 1 +
> gcc/genmodes.cc | 25 ++++++-
> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
> 12 files changed, 598 insertions(+), 1 deletion(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>
> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
> index d5305efa8a6..110bddce851 100644
> --- a/gcc/config/riscv/riscv-modes.def
> +++ b/gcc/config/riscv/riscv-modes.def
> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>
> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
> +
> /*
> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
> | | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index de3e1f903c7..cbe66c0e35b 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
> return scale;
> }
>
> +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
> + PRECISION size for corresponding machine_mode. */
> +
> +poly_int64
> +riscv_v_adjust_precision (machine_mode mode, int scale)
> +{
> + if (riscv_v_ext_vector_mode_p (mode))
> + return riscv_vector_chunks * scale;
> +
> + return scale;
> +}
> +
> /* Return true if X is a valid address for machine mode MODE. If it is,
> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
> effect. */
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index 5bc7f2f467d..15b9317a8ce 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
> extern unsigned riscv_bytes_per_vector_chunk;
> extern poly_uint16 riscv_vector_chunks;
> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
> /* The number of bits and bytes in a RVV vector. */
> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
> index 2d418f09aab..12f4e6335e6 100644
> --- a/gcc/genmodes.cc
> +++ b/gcc/genmodes.cc
> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
> static struct mode_adjust *adj_format;
> static struct mode_adjust *adj_ibit;
> static struct mode_adjust *adj_fbit;
> +static struct mode_adjust *adj_precision;
>
> /* Mode class operations. */
> static enum mode_class
> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
> #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
> #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
> #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
> #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
> #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
> #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
> " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
> m->name, m->name);
> printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
> - printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
> + /* Normalize the size to 1 if precision is less than BITS_PER_UNIT. */
> + printf (" poly_uint16 size_one = "
> + "mode_precision[E_%smode].is_constant ()\n", m->name);
> + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");

Have you tried this on an x86_64 system? I wouldn't expect it to work
because of the:

  STATIC_ASSERT (N >= 2);

in the poly_uint16 constructor.
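
To make the concern concrete, here is a minimal stand-alone sketch. It is a toy model, not GCC's poly-int.h: toy_poly, its members, the two aliases and demo() are hypothetical names used only for illustration. It assumes the number of coefficients N is 1 on targets without runtime-sized modes and 2 on targets that have them, and shows why a two-coefficient constructor call in generated code can only compile when N >= 2.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a polynomial integer c0 + c1*x + ... with N coefficients.
// This is NOT GCC's poly_int; it only mimics the constructor check.
template <unsigned N>
struct toy_poly {
    std::uint16_t coeffs[N];

    // One-coefficient form: usable for every N; higher coefficients are zero.
    explicit toy_poly(std::uint16_t a) : coeffs{} { coeffs[0] = a; }

    // Two-coefficient form: guarded the same way as the check quoted above.
    toy_poly(std::uint16_t a, std::uint16_t b) : coeffs{} {
        static_assert(N >= 2, "two-coefficient constructor needs N >= 2");
        coeffs[0] = a;
        coeffs[1] = b;
    }

    std::uint16_t coeff(unsigned i) const {
        assert(i < N);
        return coeffs[i];
    }

    // True if every coefficient above the constant term is zero.
    bool is_constant() const {
        for (unsigned i = 1; i < N; ++i)
            if (coeffs[i] != 0) return false;
        return true;
    }
};

using scalable_poly = toy_poly<2>;  // assumed: target with runtime-sized vectors
using fixed_poly = toy_poly<1>;     // assumed: target with only constant sizes

inline bool demo() {
    scalable_poly a(1, 1);  // fine: N == 2, value depends on the runtime term
    scalable_poly b(1, 0);  // fine: N == 2, value is constant
    fixed_poly c(1);        // fine: the one-coefficient form works for N == 1
    // fixed_poly d(1, 0);  // would fail to compile: the static_assert fires
    return !a.is_constant() && b.is_constant() && c.is_constant();
}
```

Because member functions of a class template are only instantiated when used, toy_poly<1> itself is fine; only the commented-out two-argument call would trigger the assertion.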
> + printf (" if (known_lt (mode_precision[E_%smode], "
> + "size_one * BITS_PER_UNIT))\n", m->name);
> + printf (" mode_size[E_%smode] = size_one;\n", m->name);
> + printf (" else\n");
> + printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
Now that the assert implicit in the original exact_div no longer holds,
I think we should instead generalise it to can_div_away_from_zero_p
(which will involve defining a new overload of can_div_away_from_zero_p).
I think that will give the same result as the code above for the cases
that the code above handles. But it should be more general too.
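
As a rough illustration of the difference, the sketch below contrasts an exact division with one that rounds away from zero, using plain integers only. It does not model poly_int: kBitsPerUnit, can_exact_div and div_away_from_zero are hypothetical helpers, not GCC functions, and a polynomial overload would additionally have to decide what to do with the non-constant coefficients.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::int64_t kBitsPerUnit = 8;  // stand-in for BITS_PER_UNIT

// Returns true and stores a / b in *quotient only if b divides a exactly.
inline bool can_exact_div(std::int64_t a, std::int64_t b,
                          std::int64_t *quotient) {
    if (b == 0 || a % b != 0) return false;
    *quotient = a / b;
    return true;
}

// Returns a / b rounded away from zero; b must be nonzero.
inline std::int64_t div_away_from_zero(std::int64_t a, std::int64_t b) {
    assert(b != 0);
    std::int64_t q = a / b;  // C++ integer division truncates toward zero
    if (a % b != 0)
        q += ((a < 0) != (b < 0)) ? -1 : 1;  // step one further from zero
    return q;
}
```

With a 4-bit precision the exact form fails, while div_away_from_zero(4, kBitsPerUnit) yields a size of 1 byte, which is what the size_one normalisation in the quoted hunk produces for a constant precision below BITS_PER_UNIT.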

TBH, I'm still sceptical that this is all that is needed. It seems
unlikely that we've been so good at writing vector support code that
we've made it work for precision < bitsize, despite that being an
unsupported combination until now. But I guess we can fix problems
on a case-by-case basis.

Thanks,
Richard

> " BITS_PER_UNIT);\n", m->name, m->name);
> printf (" mode_nunits[E_%smode] = ps;\n", m->name);
> printf (" adjust_mode_mask (E_%smode);\n", m->name);
> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
> printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
> a->file, a->line, a->mode->name, a->adjustment);
>
> + /* Adjust precision to the actual bits size. */
> + for (a = adj_precision; a; a = a->next)
> + switch (a->mode->cl)
> + {
> + case MODE_VECTOR_BOOL:
> + printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
> + a->adjustment);
> + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
> + break;
> + default:
> + break;
> + }
> +
> puts ("}");
> }
>
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> new file mode 100644
> index 00000000000..e70960c5b6d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> new file mode 100644
> index 00000000000..dcc7a644a88
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> new file mode 100644
> index 00000000000..3af0513e006
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> new file mode 100644
> index 00000000000..ea3c360d756
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> new file mode 100644
> index 00000000000..9fc659d2402
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> new file mode 100644
> index 00000000000..98275e5267d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> new file mode 100644
> index 00000000000..8f6f0b11f09
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> new file mode 100644
> index 00000000000..d96959dd064
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> @@ -0,0 +1,77 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
No worries, I hope you had a good holiday.
Thanks for pointing this out. The if part cannot handle a poly_int with N > 2, so as I understand it, we need to make it general for every N of poly_int.
I would like to double-check with you how to make it general. I suppose a new can_div_away_from_zero_p function will replace the if (known_lt (,)) part in genmodes.cc, and exact_div stays unchanged (given the word "exact", I suppose we should not touch it), right? Then we still need a poly_int with all N coefficients set to 1 as the returned size when can_div_away_from_zero_p is true.
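To make the "general for every N" point concrete, here is a minimal standalone sketch. It is not GCC code: `poly_coeffs`, `all_ones`, `scale` and `known_lt_sketch` are hypothetical names, and the model assumes a value of the form c[0] + c[1]*x1 + ... whose indeterminates are non-negative. It builds the all-ones value with a loop instead of a fixed two-argument constructor, so the same source works for N = 1, 2 or 3.

```cpp
#include <array>
#include <cstddef>

// Hypothetical model (not GCC's poly_int):
// value = c[0] + c[1]*x1 + ... with every x_i >= 0.
template <std::size_t N>
using poly_coeffs = std::array<unsigned, N>;

// A value whose N coefficients are all 1, built without assuming N == 2.
template <std::size_t N>
poly_coeffs<N> all_ones() {
  poly_coeffs<N> r{};
  r.fill(1u);
  return r;
}

// Multiply every coefficient by a scalar.
template <std::size_t N>
poly_coeffs<N> scale(poly_coeffs<N> a, unsigned k) {
  for (auto &c : a) c *= k;
  return a;
}

// True only if a < b for every possible x_i >= 0: the constant term must be
// strictly smaller, and no other coefficient of a may exceed that of b.
template <std::size_t N>
bool known_lt_sketch(const poly_coeffs<N> &a, const poly_coeffs<N> &b) {
  if (!(a[0] < b[0])) return false;
  for (std::size_t i = 1; i < N; ++i)
    if (a[i] > b[i]) return false;
  return true;
}
```

With these helpers, `known_lt_sketch(all_ones<N>(), scale(all_ones<N>(), 8))` holds for any N, which mirrors the "precision smaller than one byte" test without hard-coding the coefficient count.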
Thanks again for your expert suggestion, and have a nice day!
Pan
________________________________
From: Richard Sandiford <richard.sandiford@arm.com>
Sent: Monday, February 27, 2023 22:24
To: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
Cc: incarnation.p.lee@outlook.com <incarnation.p.lee@outlook.com>; juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>; kito.cheng@sifive.com <kito.cheng@sifive.com>; rguenther@suse.de <rguenther@suse.de>; pan2.li@intel.com <pan2.li@intel.com>
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
Sorry for the slow reply, been away for a couple of weeks.
"incarnation.p.lee--- via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> From: Pan Li <pan2.li@intel.com>
>
> Fix the rvv bool mode precision bug by adjusting the precision.
> The bit size of vbool*_t will be adjusted to
> [1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The
> adjusted mode precision of vbool*_t will help the underlying passes
> make the right decisions for both correctness and optimization.
>
> Given below sample code:
> void test_1(int8_t * restrict in, int8_t * restrict out)
> {
> vbool8_t v2 = *(vbool8_t*)in;
> vbool16_t v5 = *(vbool16_t*)in;
> *(vbool16_t*)(out + 200) = v5;
> *(vbool8_t*)(out + 100) = v2;
> }
>
> Before the precision adjustment:
> addi a4,a1,100
> vsetvli a5,zero,e8,m1,ta,ma
> addi a1,a1,200
> vlm.v v24,0(a0)
> vsm.v v24,0(a4)
> // Need one vsetvli and vlm.v for correctness here.
> vsm.v v24,0(a1)
>
> After the precision adjustment:
> csrr t0,vlenb
> slli t1,t0,1
> csrr a3,vlenb
> sub sp,sp,t1
> slli a4,a3,1
> add a4,a4,sp
> sub a3,a4,a3
> vsetvli a5,zero,e8,m1,ta,ma
> addi a2,a1,200
> vlm.v v24,0(a0)
> vsm.v v24,0(a3)
> addi a1,a1,100
> vsetvli a4,zero,e8,mf2,ta,ma
> csrr t0,vlenb
> vlm.v v25,0(a3)
> vsm.v v25,0(a2)
> slli t1,t0,1
> vsetvli a5,zero,e8,m1,ta,ma
> vsm.v v24,0(a1)
> add sp,sp,t1
> jr ra
>
> However, there may be some optimization opportunities after
> the mode precision adjustment. They can be taken care of in
> the RISC-V backend in separate follow-up PR(s).
>
> PR 108185
> PR 108654
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-modes.def (ADJUST_PRECISION):
> * config/riscv/riscv.cc (riscv_v_adjust_precision):
> * config/riscv/riscv.h (riscv_v_adjust_precision):
> * genmodes.cc (ADJUST_PRECISION):
> (emit_mode_adjustments):
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/pr108185-1.c: New test.
> * gcc.target/riscv/pr108185-2.c: New test.
> * gcc.target/riscv/pr108185-3.c: New test.
> * gcc.target/riscv/pr108185-4.c: New test.
> * gcc.target/riscv/pr108185-5.c: New test.
> * gcc.target/riscv/pr108185-6.c: New test.
> * gcc.target/riscv/pr108185-7.c: New test.
> * gcc.target/riscv/pr108185-8.c: New test.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
> gcc/config/riscv/riscv-modes.def | 8 +++
> gcc/config/riscv/riscv.cc | 12 ++++
> gcc/config/riscv/riscv.h | 1 +
> gcc/genmodes.cc | 25 ++++++-
> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
> 12 files changed, 598 insertions(+), 1 deletion(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>
> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
> index d5305efa8a6..110bddce851 100644
> --- a/gcc/config/riscv/riscv-modes.def
> +++ b/gcc/config/riscv/riscv-modes.def
> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>
> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
> +
> /*
> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
> | | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index de3e1f903c7..cbe66c0e35b 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
> return scale;
> }
>
> +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
> + PRECISION size for corresponding machine_mode. */
> +
> +poly_int64
> +riscv_v_adjust_precision (machine_mode mode, int scale)
> +{
> + if (riscv_v_ext_vector_mode_p (mode))
> + return riscv_vector_chunks * scale;
> +
> + return scale;
> +}
> +
> /* Return true if X is a valid address for machine mode MODE. If it is,
> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
> effect. */
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index 5bc7f2f467d..15b9317a8ce 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
> extern unsigned riscv_bytes_per_vector_chunk;
> extern poly_uint16 riscv_vector_chunks;
> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
> /* The number of bits and bytes in a RVV vector. */
> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
> index 2d418f09aab..12f4e6335e6 100644
> --- a/gcc/genmodes.cc
> +++ b/gcc/genmodes.cc
> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
> static struct mode_adjust *adj_format;
> static struct mode_adjust *adj_ibit;
> static struct mode_adjust *adj_fbit;
> +static struct mode_adjust *adj_precision;
>
> /* Mode class operations. */
> static enum mode_class
> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
> #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
> #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
> #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
> #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
> #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
> #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
> " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
> m->name, m->name);
> printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
> - printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
> + /* Normalize the size to 1 if precison is less than BITS_PER_UNIT. */
> + printf (" poly_uint16 size_one = "
> + "mode_precision[E_%smode].is_constant ()\n", m->name);
> + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
Have you tried this on an x86_64 system? I wouldn't expect it to work
because of the:
STATIC_ASSERT (N >= 2);
in the poly_uint16 constructor.
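The failure mode can be reproduced in isolation. The sketch below is a hypothetical stand-in (`toy_poly` is not GCC's `poly_int`) and assumes only what is stated above: the two-coefficient constructor carries a compile-time `N >= 2` check. It also assumes that a host such as x86_64 builds with a single coefficient, which is what makes `poly_uint16 (1, 1)` a problem there.

```cpp
#include <cstdint>

// Hypothetical stand-in for a polynomial integer with N coefficients:
// value = coeffs[0] + coeffs[1] * x1 + ... for runtime indeterminates x_i.
template <unsigned N, typename C>
struct toy_poly {
  C coeffs[N];

  // Constant constructor: valid for every N.
  explicit toy_poly(C a) : coeffs{} { coeffs[0] = a; }

  // Two-coefficient constructor: only meaningful when N >= 2.
  toy_poly(C a, C b) : coeffs{} {
    static_assert(N >= 2, "two-coefficient constructor needs N >= 2");
    coeffs[0] = a;
    coeffs[1] = b;
  }
};

using poly2_u16 = toy_poly<2, std::uint16_t>;  // two coefficients (assumed RISC-V-like)
using poly1_u16 = toy_poly<1, std::uint16_t>;  // one coefficient (assumed x86_64-like)

// Compiles: N == 2 has room for the second coefficient.
inline poly2_u16 size_one_n2() { return poly2_u16(1, 1); }

// Compiles: the constant form works for any N.
inline poly1_u16 size_one_n1() { return poly1_u16(1); }

// Would NOT compile if uncommented, because N == 1 trips the static_assert:
// inline poly1_u16 broken() { return poly1_u16(1, 1); }
```

Because member functions of a class template are only instantiated when used, the one-coefficient type is fine until some code actually calls the two-argument constructor, which is what the generated `size_one` expression does.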
> + printf (" if (known_lt (mode_precision[E_%smode], "
> + "size_one * BITS_PER_UNIT))\n", m->name);
> + printf (" mode_size[E_%smode] = size_one;\n", m->name);
> + printf (" else\n");
> + printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
Now that the assert implicit in the original exact_div no longer holds,
I think we should instead generalise it to can_div_away_from_zero_p
(which will involve defining a new overload of can_div_away_from_zero_p).
I think that will give the same result as the code above for the cases
that the code above handles. But it should be more general too.
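As an illustration only, one way to picture dividing a polynomial precision by a scalar while rounding away from zero is shown below. This is not GCC's `can_div_away_from_zero_p` and makes no claim about how the new overload should be written: `div_away_from_zero_sketch` is a hypothetical helper that assumes non-negative coefficients and a positive divisor, and it rounds each coefficient up separately. Under those assumptions the result is an upper bound on the number of bytes needed, and it matches the `size_one` branch for precisions below `BITS_PER_UNIT` as well as the `exact_div` branch for exact multiples.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper, not a GCC API.
// a holds the coefficients of a0 + a1*x1 + ... (all assumed non-negative).
// On success, *quotient holds ceil(a[i] / b) for every coefficient.
template <std::size_t N>
bool div_away_from_zero_sketch(const std::array<unsigned, N> &a, unsigned b,
                               std::array<unsigned, N> *quotient) {
  if (b == 0) return false;  // no valid quotient for a zero divisor
  for (std::size_t i = 0; i < N; ++i)
    (*quotient)[i] = a[i] / b + (a[i] % b != 0 ? 1u : 0u);
  return true;
}
```

For example, a precision with coefficients {1, 1} divided by 8 yields {1, 1}, and {16, 16} divided by 8 yields {2, 2}; the loop form means no branch depends on the number of coefficients.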
TBH, I'm still sceptical that this is all that is needed. It seems
unlikely that we've been so good at writing vector support code that
we've made it work for precision < bitsize, despite that being an
unsupported combination until now. But I guess we can fix problems
on a case-by-case basis.
Thanks,
Richard
> " BITS_PER_UNIT);\n", m->name, m->name);
> printf (" mode_nunits[E_%smode] = ps;\n", m->name);
> printf (" adjust_mode_mask (E_%smode);\n", m->name);
> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
> printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
> a->file, a->line, a->mode->name, a->adjustment);
>
> + /* Adjust precision to the actual bits size. */
> + for (a = adj_precision; a; a = a->next)
> + switch (a->mode->cl)
> + {
> + case MODE_VECTOR_BOOL:
> + printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
> + a->adjustment);
> + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
> + break;
> + default:
> + break;
> + }
> +
> puts ("}");
> }
>
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> new file mode 100644
> index 00000000000..e70960c5b6d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> new file mode 100644
> index 00000000000..dcc7a644a88
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> new file mode 100644
> index 00000000000..3af0513e006
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> new file mode 100644
> index 00000000000..ea3c360d756
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> new file mode 100644
> index 00000000000..9fc659d2402
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> new file mode 100644
> index 00000000000..98275e5267d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> new file mode 100644
> index 00000000000..8f6f0b11f09
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> new file mode 100644
> index 00000000000..d96959dd064
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> @@ -0,0 +1,77 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
Hi Richard Sandiford,
After some investigation, I am not sure it is possible to make this general without changing exact_div. We could add a helper like the one below to get the unit poly for any N.
template<unsigned int N, typename Ca>
inline POLY_CONST_RESULT (N, Ca, Ca)
normalize_to_unit (const poly_int_pod<N, Ca> &a)
{
  typedef POLY_CONST_COEFF (Ca, Ca) C;
  poly_int<N, C> normalized = a;
  if (normalized.is_constant ())
    normalized.coeffs[0] = 1;
  else
    for (unsigned int i = 0; i < N; i++)
      POLY_SET_COEFF (C, normalized, i, 1);
  return normalized;
}
Then genmodes can be adjusted as below to consume the unit poly.
printf (" poly_uint16 unit_poly = "
"normalize_to_unit (mode_precision[E_%smode]);\n", m->name);
printf (" if (known_lt (mode_precision[E_%smode], "
"unit_poly * BITS_PER_UNIT))\n", m->name);
printf (" mode_size[E_%smode] = unit_poly;\n", m->name);
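To make the intended effect of the genmodes snippet above easier to follow, here is a minimal, self-contained sketch of the same decision using a toy two-coefficient type. Everything in it is an assumption made for illustration: `Poly2`, `unit_of`, `scale` and `mode_size_for` are hypothetical names, not GCC code, and `known_lt` / `exact_div` are simplified stand-ins that only mimic the two-coefficient, non-negative case.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a two-coefficient poly_uint16: the value is
// c0 + c1 * x for some runtime x >= 0.  Hypothetical, not GCC's type.
struct Poly2 {
  uint16_t c0;
  uint16_t c1;
};

// True only if a < b for every x >= 0 (simplified known_lt).
static bool known_lt(Poly2 a, Poly2 b) {
  return a.c0 < b.c0 && a.c1 <= b.c1;
}

// Multiply both coefficients by a constant.
static Poly2 scale(Poly2 a, uint16_t k) {
  return {uint16_t(a.c0 * k), uint16_t(a.c1 * k)};
}

// Coefficient-wise division; the caller promises an exact multiple.
static Poly2 exact_div(Poly2 a, uint16_t k) {
  assert(a.c0 % k == 0 && a.c1 % k == 0);
  return {uint16_t(a.c0 / k), uint16_t(a.c1 / k)};
}

// Mirrors the idea of normalize_to_unit: 1 for constants, 1 + x otherwise.
static Poly2 unit_of(Poly2 a) {
  return a.c1 == 0 ? Poly2{1, 0} : Poly2{1, 1};
}

// Size in bytes for a given precision in bits: fall back to one unit
// when the precision is known to be smaller than a whole byte.
static Poly2 mode_size_for(Poly2 precision) {
  const uint16_t bits_per_unit = 8;  // assumed value of BITS_PER_UNIT
  Poly2 unit = unit_of(precision);
  if (known_lt(precision, scale(unit, bits_per_unit)))
    return unit;
  return exact_div(precision, bits_per_unit);
}
```

Under these assumptions, a precision of 1 + x bits or 4 + 4x bits maps to a size of 1 + x bytes through the fallback, while 8 + 8x and 64 + 64x bits go through the exact division and give 1 + x and 8 + 8x bytes.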
I am not sure it is a good idea to put the normalization code above into exact_div, given that the comment on exact_div says “/* Return A / B, given that A is known to be a multiple of B. */”.
Could you please share your opinion on this from an expert’s perspective? Thank you!
Pan
From: 盼 李 <incarnation.p.lee@outlook.com>
Sent: Monday, February 27, 2023 11:13 PM
To: Richard Sandiford <richard.sandiford@arm.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Li, Pan2 <pan2.li@intel.com>
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
No worries, I hope you had a good holiday.
Thanks for pointing this out; the if part cannot take care of poly_int with N > 2. As I understand it, we need to make it general for every N of poly_int.
So I would like to confirm with you how to make it general. I suppose a new can_div_away_from_zero_p overload will replace the if (known_lt(,)) part in genmodes.cc, and exact_div will stay unchanged (given the word "exact", I suppose we should not touch it), right? Then we would still need a poly_int with all N coefficients set to 1 as the result when can_div_away_from_zero_p is true.
Thanks again for your professional suggestion, and have a nice day 😉!
Pan
________________________________
From: Richard Sandiford <richard.sandiford@arm.com>
Sent: Monday, February 27, 2023 22:24
To: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
Cc: incarnation.p.lee@outlook.com; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; pan2.li@intel.com
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
Sorry for the slow reply, I have been away for a couple of weeks.
"incarnation.p.lee--- via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> From: Pan Li <pan2.li@intel.com>
>
> Fix the bug of the rvv bool mode precision with the adjustment.
> The bits size of vbool*_t will be adjusted to
> [1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The
> adjusted mode precision of vbool*_t will help underlying pass to
> make the right decision for both the correctness and optimization.
>
> Given below sample code:
> void test_1(int8_t * restrict in, int8_t * restrict out)
> {
> vbool8_t v2 = *(vbool8_t*)in;
> vbool16_t v5 = *(vbool16_t*)in;
> *(vbool16_t*)(out + 200) = v5;
> *(vbool8_t*)(out + 100) = v2;
> }
>
> Before the precision adjustment:
> addi a4,a1,100
> vsetvli a5,zero,e8,m1,ta,ma
> addi a1,a1,200
> vlm.v v24,0(a0)
> vsm.v v24,0(a4)
> // Need one vsetvli and vlm.v for correctness here.
> vsm.v v24,0(a1)
>
> After the precision adjustment:
> csrr t0,vlenb
> slli t1,t0,1
> csrr a3,vlenb
> sub sp,sp,t1
> slli a4,a3,1
> add a4,a4,sp
> sub a3,a4,a3
> vsetvli a5,zero,e8,m1,ta,ma
> addi a2,a1,200
> vlm.v v24,0(a0)
> vsm.v v24,0(a3)
> addi a1,a1,100
> vsetvli a4,zero,e8,mf2,ta,ma
> csrr t0,vlenb
> vlm.v v25,0(a3)
> vsm.v v25,0(a2)
> slli t1,t0,1
> vsetvli a5,zero,e8,m1,ta,ma
> vsm.v v24,0(a1)
> add sp,sp,t1
> jr ra
>
> However, there may be some optimization opportunities after
> the mode precision adjustment. They can be taken care of in
> the RISC-V backend in the underlying separate PR(s).
>
> PR 108185
> PR 108654
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-modes.def (ADJUST_PRECISION):
> * config/riscv/riscv.cc (riscv_v_adjust_precision):
> * config/riscv/riscv.h (riscv_v_adjust_precision):
> * genmodes.cc (ADJUST_PRECISION):
> (emit_mode_adjustments):
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/pr108185-1.c: New test.
> * gcc.target/riscv/pr108185-2.c: New test.
> * gcc.target/riscv/pr108185-3.c: New test.
> * gcc.target/riscv/pr108185-4.c: New test.
> * gcc.target/riscv/pr108185-5.c: New test.
> * gcc.target/riscv/pr108185-6.c: New test.
> * gcc.target/riscv/pr108185-7.c: New test.
> * gcc.target/riscv/pr108185-8.c: New test.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
> gcc/config/riscv/riscv-modes.def | 8 +++
> gcc/config/riscv/riscv.cc | 12 ++++
> gcc/config/riscv/riscv.h | 1 +
> gcc/genmodes.cc | 25 ++++++-
> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
> 12 files changed, 598 insertions(+), 1 deletion(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>
> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
> index d5305efa8a6..110bddce851 100644
> --- a/gcc/config/riscv/riscv-modes.def
> +++ b/gcc/config/riscv/riscv-modes.def
> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>
> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
> +
> /*
> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
> | | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index de3e1f903c7..cbe66c0e35b 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
> return scale;
> }
>
> +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
> + PRECISION size for corresponding machine_mode. */
> +
> +poly_int64
> +riscv_v_adjust_precision (machine_mode mode, int scale)
> +{
> + if (riscv_v_ext_vector_mode_p (mode))
> + return riscv_vector_chunks * scale;
> +
> + return scale;
> +}
> +
> /* Return true if X is a valid address for machine mode MODE. If it is,
> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
> effect. */
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index 5bc7f2f467d..15b9317a8ce 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
> extern unsigned riscv_bytes_per_vector_chunk;
> extern poly_uint16 riscv_vector_chunks;
> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
> /* The number of bits and bytes in a RVV vector. */
> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
> index 2d418f09aab..12f4e6335e6 100644
> --- a/gcc/genmodes.cc
> +++ b/gcc/genmodes.cc
> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
> static struct mode_adjust *adj_format;
> static struct mode_adjust *adj_ibit;
> static struct mode_adjust *adj_fbit;
> +static struct mode_adjust *adj_precision;
>
> /* Mode class operations. */
> static enum mode_class
> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
> #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
> #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
> #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
> #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
> #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
> #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
> " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
> m->name, m->name);
> printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
> - printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
> + /* Normalize the size to 1 if precision is less than BITS_PER_UNIT. */
> + printf (" poly_uint16 size_one = "
> + "mode_precision[E_%smode].is_constant ()\n", m->name);
> + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
Have you tried this on an x86_64 system? I wouldn't expect it to work
because of the:
STATIC_ASSERT (N >= 2);
in the poly_uint16 constructor.
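As an illustration of the point about the constructor, the following sketch shows how an assert on the coefficient count inside a two-argument constructor only bites when that constructor is actually used with a one-coefficient type. `MiniPoly` is a hypothetical miniature written for this note, not GCC's poly_int; it assumes only what the comment above says, namely that the two-coefficient constructor asserts N >= 2.

```cpp
#include <cassert>

// Hypothetical miniature of a poly_int-like class template.
// N is the number of coefficients the target needs.
template <unsigned N>
struct MiniPoly {
  unsigned short coeffs[N] = {};

  // One-coefficient constructor: fine for every N >= 1.
  explicit MiniPoly(unsigned short c0) { coeffs[0] = c0; }

  // Two-coefficient constructor: only valid when N >= 2.  The assert
  // is checked when this constructor is instantiated, i.e. when some
  // code really calls it for a given N.
  MiniPoly(unsigned short c0, unsigned short c1) {
    static_assert(N >= 2, "two coefficients need N >= 2");
    coeffs[0] = c0;
    coeffs[1] = c1;
  }
};

// Builds on a two-coefficient configuration.
inline MiniPoly<2> make_unit_two() { return MiniPoly<2>(1, 1); }

// Builds on a one-coefficient configuration.
inline MiniPoly<1> make_unit_one() { return MiniPoly<1>(1); }

// Uncommenting the next line would be rejected at compile time,
// which is the failure mode described for a one-coefficient target:
// inline MiniPoly<1> make_bad() { return MiniPoly<1>(1, 1); }
```

In this toy model, generated code that spells out two coefficients unconditionally compiles only for configurations with at least two coefficients, which is why a form that works for any N is needed.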
> + printf (" if (known_lt (mode_precision[E_%smode], "
> + "size_one * BITS_PER_UNIT))\n", m->name);
> + printf (" mode_size[E_%smode] = size_one;\n", m->name);
> + printf (" else\n");
> + printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
Now that the assert implicit in the original exact_div no longer holds,
I think we should instead generalise it to can_div_away_from_zero_p
(which will involve defining a new overload of can_div_away_from_zero_p).
I think that will give the same result as the code above for the cases
that the code above handles. But it should be more general too.
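For readers unfamiliar with the terminology, the difference between an exact division and a division that rounds away from zero can be shown on plain integers. This is only scalar arithmetic with a hypothetical helper named `div_away_from_zero`; it is not the proposed poly_int overload and makes no claim about how that overload would treat non-constant coefficients.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar helper: divide a by b and round the quotient
// away from zero when the division is not exact.  b must be nonzero.
static int64_t div_away_from_zero(int64_t a, int64_t b) {
  assert(b != 0);
  int64_t q = a / b;  // C++ integer division truncates toward zero
  if (a % b != 0) {
    // Push the quotient one step further from zero, in the direction
    // given by the sign of the true quotient.
    q += ((a < 0) == (b < 0)) ? 1 : -1;
  }
  return q;
}
```

With 8 bits per byte, a constant precision of 1, 2 or 4 bits rounds up to 1 byte, while 8, 16 and 64 bits give 1, 2 and 8 bytes, the same values an exact division would produce where it is defined.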
TBH, I'm still sceptical that this is all that is needed. It seems
unlikely that we've been so good at writing vector support code that
we've made it work for precision < bitsize, despite that being an
unsupported combination until now. But I guess we can fix problems
on a case-by-case basis.
Thanks,
Richard
> " BITS_PER_UNIT);\n", m->name, m->name);
> printf (" mode_nunits[E_%smode] = ps;\n", m->name);
> printf (" adjust_mode_mask (E_%smode);\n", m->name);
> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
> printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
> a->file, a->line, a->mode->name, a->adjustment);
>
> + /* Adjust precision to the actual bits size. */
> + for (a = adj_precision; a; a = a->next)
> + switch (a->mode->cl)
> + {
> + case MODE_VECTOR_BOOL:
> + printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
> + a->adjustment);
> + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
> + break;
> + default:
> + break;
> + }
> +
> puts ("}");
> }
>
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> new file mode 100644
> index 00000000000..e70960c5b6d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> new file mode 100644
> index 00000000000..dcc7a644a88
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> new file mode 100644
> index 00000000000..3af0513e006
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> new file mode 100644
> index 00000000000..ea3c360d756
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> new file mode 100644
> index 00000000000..9fc659d2402
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> new file mode 100644
> index 00000000000..98275e5267d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> new file mode 100644
> index 00000000000..8f6f0b11f09
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> new file mode 100644
> index 00000000000..d96959dd064
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> @@ -0,0 +1,77 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> + vbool1_t v1 = *(vbool1_t*)in;
> + vbool1_t v2 = *(vbool1_t*)in;
> +
> + *(vbool1_t*)(out + 100) = v1;
> + *(vbool1_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> + vbool2_t v1 = *(vbool2_t*)in;
> + vbool2_t v2 = *(vbool2_t*)in;
> +
> + *(vbool2_t*)(out + 100) = v1;
> + *(vbool2_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> + vbool4_t v1 = *(vbool4_t*)in;
> + vbool4_t v2 = *(vbool4_t*)in;
> +
> + *(vbool4_t*)(out + 100) = v1;
> + *(vbool4_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> + vbool8_t v1 = *(vbool8_t*)in;
> + vbool8_t v2 = *(vbool8_t*)in;
> +
> + *(vbool8_t*)(out + 100) = v1;
> + *(vbool8_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> + vbool16_t v1 = *(vbool16_t*)in;
> + vbool16_t v2 = *(vbool16_t*)in;
> +
> + *(vbool16_t*)(out + 100) = v1;
> + *(vbool16_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> + vbool32_t v1 = *(vbool32_t*)in;
> + vbool32_t v2 = *(vbool32_t*)in;
> +
> + *(vbool32_t*)(out + 100) = v1;
> + *(vbool32_t*)(out + 200) = v2;
> +}
> +
> +void
> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> + vbool64_t v1 = *(vbool64_t*)in;
> + vbool64_t v2 = *(vbool64_t*)in;
> +
> + *(vbool64_t*)(out + 100) = v1;
> + *(vbool64_t*)(out + 200) = v2;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
"Li, Pan2" <pan2.li@intel.com> writes:
> Hi Richard Sandiford,
>
> After some investigation, I am not sure if it is possible to make it general without any changes to exact_div. We can add one method like below to get the unit poly for all possible N.
>
> template<unsigned int N, typename Ca>
> inline POLY_CONST_RESULT (N, Ca, Ca)
> normalize_to_unit (const poly_int_pod<N, Ca> &a)
> {
> typedef POLY_CONST_COEFF (Ca, Ca) C;
>
> poly_int<N, C> normalized = a;
>
> if (normalized.is_constant())
> normalized.coeffs[0] = 1;
> else
> for (unsigned int i = 0; i < N; i++)
> POLY_SET_COEFF (C, normalized, i, 1);
>
> return normalized;
> }
>
> And then adjust the genmodes like below to consume the unit poly.
>
> printf (" poly_uint16 unit_poly = "
> "normalize_to_unit (mode_precision[E_%smode]);\n", m->name);
> printf (" if (known_lt (mode_precision[E_%smode], "
> "unit_poly * BITS_PER_UNIT))\n", m->name);
> printf (" mode_size[E_%smode] = unit_poly;\n", m->name);
>
> I am not sure it is a good idea to introduce the above normalization code into exact_div, given that the comment on exact_div says “/* Return A / B, given that A is known to be a multiple of B. */”.
My point was that we have multiple ways of dividing poly_ints:
- exact_div, for when the caller knows that the result is always exact
- can_div_trunc_p, for truncating division (round towards 0)
- can_div_away_from_zero_p, for rounding away from 0
- ...
This is like how we have multiple division *_EXPRs on trees.
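
As a rough scalar analogy for the list above (plain integers only; the `*_scalar` helpers are hypothetical names and not part of GCC's poly_int API), the three flavours differ only in what they do with a non-zero remainder:

```cpp
#include <cassert>

// Scalar analogues of the division flavours listed above. These are
// illustrative helpers on plain ints, not GCC's poly_int API.

// Exact division: the caller promises that b divides a.
inline int exact_div_scalar (int a, int b)
{
  assert (b > 0 && a % b == 0);
  return a / b;
}

// Truncating division: round towards zero.
inline int div_trunc_scalar (int a, int b)
{
  assert (b > 0);
  return a / b;
}

// Division rounding away from zero.
inline int div_away_from_zero_scalar (int a, int b)
{
  assert (b > 0);
  int q = a / b;
  if (a % b != 0)
    q += (a > 0 ? 1 : -1);
  return q;
}
```

With 8 bits per byte, a 4-bit precision gives `div_trunc_scalar (4, 8) == 0` but `div_away_from_zero_scalar (4, 8) == 1`, i.e. one byte of storage with padding, while `exact_div_scalar (4, 8)` would trip its assertion.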
Until now, exact_div was the correct choice for modes because vector
modes didn't have padding. We're now changing that, so my suggestion
in the review was to change the division operation that we use.
Rather than use exact_div, we should now use can_div_away_from_zero_p,
which would have the effect of rounding the quotient up.
Something like:
if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT,
&mode_size[E_%smode]))
gcc_unreachable ();
But this will require a new overload of can_div_away_from_zero_p, since
the existing one is for constant quotients rather than constant divisors.
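
One possible shape for a constant-divisor overload, sketched on a toy type rather than on GCC's poly_int. The names `mini_poly` and `mini_can_div_away_from_zero_p` are hypothetical, and the rule used here is an assumption for illustration, not the overload itself: succeed only when every non-constant coefficient is a multiple of the divisor, so that the remainder is a known constant, then round the constant coefficient up. Coefficients are unsigned in this model, so "away from zero" is the same as rounding up.

```cpp
#include <cstdint>

// Toy polynomial: coeffs[0] + coeffs[1] * x1 + ... with unsigned
// coefficients. NOT GCC's poly_int.
template <unsigned N>
struct mini_poly
{
  std::uint64_t coeffs[N];
};

// Hypothetical constant-divisor variant: returns true and stores the
// rounded-up quotient when the remainder is known to be constant.
template <unsigned N>
inline bool
mini_can_div_away_from_zero_p (const mini_poly<N> &a, std::uint64_t b,
                               mini_poly<N> *quotient)
{
  if (b == 0)
    return false;

  // The remainder must not depend on the runtime indeterminates.
  for (unsigned i = 1; i < N; ++i)
    if (a.coeffs[i] % b != 0)
      return false;

  for (unsigned i = 0; i < N; ++i)
    quotient->coeffs[i] = a.coeffs[i] / b;

  // Round the constant term up if it left a remainder.
  if (a.coeffs[0] % b != 0)
    quotient->coeffs[0] += 1;
  return true;
}
```

Under this conservative rule, 4 + 16x bits divided by 8 gives 1 + 2x bytes, whereas 1 + 1x bits is rejected because the x coefficient is not a multiple of 8; the sketch makes no attempt to handle that case.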
Thanks,
Richard
>
> Could you please share your opinion on this from an expert's perspective? Thank you!
>
> Pan
>
> From: 盼 李 <incarnation.p.lee@outlook.com>
> Sent: Monday, February 27, 2023 11:13 PM
> To: Richard Sandiford <richard.sandiford@arm.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Li, Pan2 <pan2.li@intel.com>
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> No problem, I hope you had a good holiday.
>
> Thanks for pointing this out; the if part cannot handle poly_int with N > 2. As I understand it, we need to make it general for every N of poly_int.
>
> So I would like to confirm with you how to make it general. I suppose a new function can_div_away_from_zero_p will replace the if (known_lt(,)) part in genmodes.cc, and exact_div will stay unchanged (given the word "exact", I suppose we should not touch it), right? Then we still need a poly_int with all N coefficients set to 1 as the result when can_div_away_from_zero_p is true.
>
> Thanks again for your expert suggestion, and have a nice day 😉!
>
> Pan
> ________________________________
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Monday, February 27, 2023 22:24
> To: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
> Cc: incarnation.p.lee@outlook.com; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; pan2.li@intel.com
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> Sorry for the slow reply, been away for a couple of weeks.
>
> "incarnation.p.lee--- via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>> From: Pan Li <pan2.li@intel.com>
>>
>> Fix the bug of the rvv bool mode precision with the adjustment.
>> The bit size of vbool*_t will be adjusted to
>> [1, 2, 4, 8, 16, 32, 64] according to the RVV 1.0 ISA spec. The
>> adjusted mode precision of vbool*_t will help the underlying passes
>> make the right decisions for both correctness and optimization.
>>
>> Given below sample code:
>> void test_1(int8_t * restrict in, int8_t * restrict out)
>> {
>> vbool8_t v2 = *(vbool8_t*)in;
>> vbool16_t v5 = *(vbool16_t*)in;
>> *(vbool16_t*)(out + 200) = v5;
>> *(vbool8_t*)(out + 100) = v2;
>> }
>>
>> Before the precision adjustment:
>> addi a4,a1,100
>> vsetvli a5,zero,e8,m1,ta,ma
>> addi a1,a1,200
>> vlm.v v24,0(a0)
>> vsm.v v24,0(a4)
>> // Need one vsetvli and vlm.v for correctness here.
>> vsm.v v24,0(a1)
>>
>> After the precision adjustment:
>> csrr t0,vlenb
>> slli t1,t0,1
>> csrr a3,vlenb
>> sub sp,sp,t1
>> slli a4,a3,1
>> add a4,a4,sp
>> sub a3,a4,a3
>> vsetvli a5,zero,e8,m1,ta,ma
>> addi a2,a1,200
>> vlm.v v24,0(a0)
>> vsm.v v24,0(a3)
>> addi a1,a1,100
>> vsetvli a4,zero,e8,mf2,ta,ma
>> csrr t0,vlenb
>> vlm.v v25,0(a3)
>> vsm.v v25,0(a2)
>> slli t1,t0,1
>> vsetvli a5,zero,e8,m1,ta,ma
>> vsm.v v24,0(a1)
>> add sp,sp,t1
>> jr ra
>>
>> However, there may be some optimization opportunities after
>> the mode precision adjustment. They can be taken care of in
>> the RISC-V backend in separate follow-up PR(s).
>>
>> PR 108185
>> PR 108654
>>
>> gcc/ChangeLog:
>>
>> * config/riscv/riscv-modes.def (ADJUST_PRECISION):
>> * config/riscv/riscv.cc (riscv_v_adjust_precision):
>> * config/riscv/riscv.h (riscv_v_adjust_precision):
>> * genmodes.cc (ADJUST_PRECISION):
>> (emit_mode_adjustments):
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/pr108185-1.c: New test.
>> * gcc.target/riscv/pr108185-2.c: New test.
>> * gcc.target/riscv/pr108185-3.c: New test.
>> * gcc.target/riscv/pr108185-4.c: New test.
>> * gcc.target/riscv/pr108185-5.c: New test.
>> * gcc.target/riscv/pr108185-6.c: New test.
>> * gcc.target/riscv/pr108185-7.c: New test.
>> * gcc.target/riscv/pr108185-8.c: New test.
>>
>> Signed-off-by: Pan Li <pan2.li@intel.com>
>> ---
>> gcc/config/riscv/riscv-modes.def | 8 +++
>> gcc/config/riscv/riscv.cc | 12 ++++
>> gcc/config/riscv/riscv.h | 1 +
>> gcc/genmodes.cc | 25 ++++++-
>> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
>> 12 files changed, 598 insertions(+), 1 deletion(-)
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>
>> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
>> index d5305efa8a6..110bddce851 100644
>> --- a/gcc/config/riscv/riscv-modes.def
>> +++ b/gcc/config/riscv/riscv-modes.def
>> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>> ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>> ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>>
>> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
>> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
>> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
>> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
>> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
>> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
>> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
>> +
>> /*
>> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
>> | | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> index de3e1f903c7..cbe66c0e35b 100644
>> --- a/gcc/config/riscv/riscv.cc
>> +++ b/gcc/config/riscv/riscv.cc
>> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
>> return scale;
>> }
>>
>> +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
>> + PRECISION size for corresponding machine_mode. */
>> +
>> +poly_int64
>> +riscv_v_adjust_precision (machine_mode mode, int scale)
>> +{
>> + if (riscv_v_ext_vector_mode_p (mode))
>> + return riscv_vector_chunks * scale;
>> +
>> + return scale;
>> +}
>> +
>> /* Return true if X is a valid address for machine mode MODE. If it is,
>> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
>> effect. */
>> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
>> index 5bc7f2f467d..15b9317a8ce 100644
>> --- a/gcc/config/riscv/riscv.h
>> +++ b/gcc/config/riscv/riscv.h
>> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
>> extern unsigned riscv_bytes_per_vector_chunk;
>> extern poly_uint16 riscv_vector_chunks;
>> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
>> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
>> /* The number of bits and bytes in a RVV vector. */
>> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
>> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
>> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
>> index 2d418f09aab..12f4e6335e6 100644
>> --- a/gcc/genmodes.cc
>> +++ b/gcc/genmodes.cc
>> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
>> static struct mode_adjust *adj_format;
>> static struct mode_adjust *adj_ibit;
>> static struct mode_adjust *adj_fbit;
>> +static struct mode_adjust *adj_precision;
>>
>> /* Mode class operations. */
>> static enum mode_class
>> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
>> #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
>> #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
>> #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
>> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
>> #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
>> #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
>> #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
>> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
>> " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
>> m->name, m->name);
>> printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
>> - printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>> + /* Normalize the size to 1 if precision is less than BITS_PER_UNIT. */
>> + printf (" poly_uint16 size_one = "
>> + "mode_precision[E_%smode].is_constant ()\n", m->name);
>> + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
>
> Have you tried this on an x86_64 system? I wouldn't expect it to work
> because of the:
>
> STATIC_ASSERT (N >= 2);
>
> in the poly_uint16 constructor.
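
A minimal sketch of the concern raised above, assuming a simplified stand-in type (`toy_poly_uint16` is hypothetical and is not GCC's real `poly_int`). It models a two-argument constructor guarded by a compile-time check on the number of coefficients, so an expression such as `poly_uint16 (1, 1)` would only compile when the type has at least two coefficients. The review comment implies that x86_64 is a target where this does not hold.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of a polynomial integer with N coefficients.
// It only illustrates the compile-time guard; it is not GCC's poly_int.
template<unsigned int N>
struct toy_poly_uint16
{
  uint16_t coeffs[N];

  // One-argument form: a compile-time constant, valid for every N >= 1.
  explicit toy_poly_uint16 (uint16_t a)
  {
    coeffs[0] = a;
    for (unsigned int i = 1; i < N; i++)
      coeffs[i] = 0;
  }

  // Two-argument form: only meaningful when a second coefficient exists.
  // The static_assert fires only if this constructor is actually used,
  // so toy_poly_uint16<1> (1, 1) is rejected at compile time.
  toy_poly_uint16 (uint16_t a, uint16_t b)
  {
    static_assert (N >= 2, "two-coefficient constructor needs N >= 2");
    coeffs[0] = a;
    coeffs[1] = b;
    for (unsigned int i = 2; i < N; i++)
      coeffs[i] = 0;
  }

  // True when every coefficient after the first is zero.
  bool is_constant () const
  {
    for (unsigned int i = 1; i < N; i++)
      if (coeffs[i] != 0)
        return false;
    return true;
  }
};
```

With this model, `toy_poly_uint16<2> (1, 1)` compiles, while the same expression with `N == 1` is a hard error even if it sits on a branch that would never run.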
>
>> + printf (" if (known_lt (mode_precision[E_%smode], "
>> + "size_one * BITS_PER_UNIT))\n", m->name);
>> + printf (" mode_size[E_%smode] = size_one;\n", m->name);
>> + printf (" else\n");
>> + printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>
> Now that the assert implicit in the original exact_div no longer holds,
> I think we should instead generalise it to can_div_away_from_zero_p
> (which will involve defining a new overload of can_div_away_from_zero_p).
> I think that will give the same result as the code above for the cases
> that the code above handles. But it should be more general too.
>
> TBH, I'm still sceptical that this is all that is needed. It seems
> unlikely that we've been so good at writing vector support code that
> we've made it work for precision < bitsize, despite that being an
> unsupported combination until now. But I guess we can fix problems
> on a case-by-case basis.
>
> Thanks,
> Richard
>
>> " BITS_PER_UNIT);\n", m->name, m->name);
>> printf (" mode_nunits[E_%smode] = ps;\n", m->name);
>> printf (" adjust_mode_mask (E_%smode);\n", m->name);
>> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
>> printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
>> a->file, a->line, a->mode->name, a->adjustment);
>>
>> + /* Adjust precision to the actual bits size. */
>> + for (a = adj_precision; a; a = a->next)
>> + switch (a->mode->cl)
>> + {
>> + case MODE_VECTOR_BOOL:
>> + printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
>> + a->adjustment);
>> + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
>> + break;
>> + default:
>> + break;
>> + }
>> +
>> puts ("}");
>> }
>>
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>> new file mode 100644
>> index 00000000000..e70960c5b6d
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>> new file mode 100644
>> index 00000000000..dcc7a644a88
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>> new file mode 100644
>> index 00000000000..3af0513e006
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>> new file mode 100644
>> index 00000000000..ea3c360d756
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>> new file mode 100644
>> index 00000000000..9fc659d2402
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>> new file mode 100644
>> index 00000000000..98275e5267d
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>> new file mode 100644
>> index 00000000000..8f6f0b11f09
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>> new file mode 100644
>> index 00000000000..d96959dd064
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>> @@ -0,0 +1,77 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
Understood, thanks for the explanations and suggestions. Let me have a try and keep you posted.
Pan
________________________________
From: Richard Sandiford <richard.sandiford@arm.com>
Sent: Tuesday, February 28, 2023 17:50
To: Li, Pan2 <pan2.li@intel.com>
Cc: 盼 李 <incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>; kito.cheng@sifive.com <kito.cheng@sifive.com>; rguenther@suse.de <rguenther@suse.de>
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
"Li, Pan2" <pan2.li@intel.com> writes:
> Hi Richard Sandiford,
>
> After some investigation, I am not sure whether this can be made general without changing exact_div. We could add a helper like the one below to get the unit poly for any possible N.
>
> template<unsigned int N, typename Ca>
> inline POLY_CONST_RESULT (N, Ca, Ca)
> normalize_to_unit (const poly_int_pod<N, Ca> &a)
> {
>   typedef POLY_CONST_COEFF (Ca, Ca) C;
>
>   poly_int<N, C> normalized = a;
>
>   if (normalized.is_constant ())
>     normalized.coeffs[0] = 1;
>   else
>     for (unsigned int i = 0; i < N; i++)
>       POLY_SET_COEFF (C, normalized, i, 1);
>
>   return normalized;
> }
>
> Then genmodes could be adjusted as below to consume the unit poly.
>
>   printf (" poly_uint16 unit_poly = "
>           "normalize_to_unit (mode_precision[E_%smode]);\n", m->name);
>   printf (" if (known_lt (mode_precision[E_%smode], "
>           "unit_poly * BITS_PER_UNIT))\n", m->name);
>   printf (" mode_size[E_%smode] = unit_poly;\n", m->name);
>
> I am not sure it is a good idea to introduce the above normalization code into exact_div, given that the comment on exact_div says “/* Return A / B, given that A is known to be a multiple of B. */”.
My point was that we have multiple ways of dividing poly_ints:

- exact_div, for when the caller knows that the result is always exact
- can_div_trunc_p, for truncating division (round towards 0)
- can_div_away_from_zero_p, for rounding away from 0
- ...

This is like how we have multiple division *_EXPRs on trees.

Until now, exact_div was the correct choice for modes because vector
modes didn't have padding. We're now changing that, so my suggestion
in the review was to change the division operation that we use.
Rather than use exact_div, we should now use can_div_away_from_zero_p,
which would have the effect of rounding the quotient up.

Something like:

  if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT,
                                 &mode_size[E_%smode]))
    gcc_unreachable ();

But this will require a new overload of can_div_away_from_zero_p, since
the existing one is for constant quotients rather than constant divisors.

Thanks,
Richard
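
To make the idea of rounding the quotient up concrete, here is a minimal sketch under stated assumptions. `toy_poly`, `ceil_div`, `toy_can_div_away_from_zero_p` and `toy_eval` are hypothetical names, and the coefficient-wise rounding shown here is an illustrative choice rather than the overload that would actually go into poly-int.h. The toy type models a value `c0 + c1 * x` for a runtime `x >= 0`, restricted to non-negative coefficients. Rounding each coefficient up gives a byte size that is never smaller than the bits it has to hold.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-coefficient model: value = c0 + c1 * x, with x >= 0
// only known at run time.  This is a toy, not GCC's poly_int.
struct toy_poly
{
  int64_t c0;
  int64_t c1;
};

// Divide a non-negative A by a positive B, rounding up.
int64_t
ceil_div (int64_t a, int64_t b)
{
  return a / b + (a % b != 0 ? 1 : 0);
}

// Assumed shape of a constant-divisor overload: divide each coefficient
// by the constant B, rounding away from zero, and store the polynomial
// quotient.  The toy refuses inputs it does not model (non-positive
// divisor, negative coefficients) by returning false.
bool
toy_can_div_away_from_zero_p (const toy_poly &a, int64_t b,
                              toy_poly *quotient)
{
  if (b <= 0 || a.c0 < 0 || a.c1 < 0)
    return false;
  quotient->c0 = ceil_div (a.c0, b);
  quotient->c1 = ceil_div (a.c1, b);
  return true;
}

// Evaluate the polynomial for one concrete runtime value of x.
int64_t
toy_eval (const toy_poly &p, int64_t x)
{
  return p.c0 + p.c1 * x;
}
```

For a precision of `{4, 4}` bits and a divisor of 8 this yields a size of `{1, 1}` bytes, and a constant precision of `{4, 0}` yields `{1, 0}`. Those match the values that the `size_one` special case in the patch produces for the inputs it handles.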
>
> Could you please share your expert opinion on this? Thank you!
>
> Pan
>
> From: 盼 李 <incarnation.p.lee@outlook.com>
> Sent: Monday, February 27, 2023 11:13 PM
> To: Richard Sandiford <richard.sandiford@arm.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Li, Pan2 <pan2.li@intel.com>
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> No problem, I hope you had a good holiday.
>
> Thanks for pointing this out; the if part cannot handle poly_int with N > 2. As I understand it, we need to make it general for every N of poly_int.
>
> So I would like to confirm with you how to make it general. I suppose a new can_div_away_from_zero_p overload will replace the if (known_lt (,)) part in genmodes.cc, leaving exact_div unchanged (given the word "exact", I suppose we should not touch it), right? Then we would still need a poly_int with every one of its N coefficients set to 1 as the result when can_div_away_from_zero_p is true.
>
> Thanks again for your expert suggestions, and have a nice day 😉!
>
> Pan
> ________________________________
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Monday, February 27, 2023 22:24
> To: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
> Cc: incarnation.p.lee@outlook.com; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; pan2.li@intel.com
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> Sorry for the slow reply, been away for a couple of weeks.
>
> "incarnation.p.lee--- via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>> From: Pan Li <pan2.li@intel.com>
>>
>> Fix the bug of the rvv bool mode precision with the adjustment.
>> The bits size of vbool*_t will be adjusted to
>> [1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The
>> adjusted mode precision of vbool*_t will help underlying pass to
>> make the right decision for both the correctness and optimization.
>>
>> Given below sample code:
>> void test_1(int8_t * restrict in, int8_t * restrict out)
>> {
>> vbool8_t v2 = *(vbool8_t*)in;
>> vbool16_t v5 = *(vbool16_t*)in;
>> *(vbool16_t*)(out + 200) = v5;
>> *(vbool8_t*)(out + 100) = v2;
>> }
>>
>> Before the precision adjustment:
>> addi a4,a1,100
>> vsetvli a5,zero,e8,m1,ta,ma
>> addi a1,a1,200
>> vlm.v v24,0(a0)
>> vsm.v v24,0(a4)
>> // Need one vsetvli and vlm.v for correctness here.
>> vsm.v v24,0(a1)
>>
>> After the precision adjustment:
>> csrr t0,vlenb
>> slli t1,t0,1
>> csrr a3,vlenb
>> sub sp,sp,t1
>> slli a4,a3,1
>> add a4,a4,sp
>> sub a3,a4,a3
>> vsetvli a5,zero,e8,m1,ta,ma
>> addi a2,a1,200
>> vlm.v v24,0(a0)
>> vsm.v v24,0(a3)
>> addi a1,a1,100
>> vsetvli a4,zero,e8,mf2,ta,ma
>> csrr t0,vlenb
>> vlm.v v25,0(a3)
>> vsm.v v25,0(a2)
>> slli t1,t0,1
>> vsetvli a5,zero,e8,m1,ta,ma
>> vsm.v v24,0(a1)
>> add sp,sp,t1
>> jr ra
>>
>> However, there may be some optimization opportunities after
>> the mode precision adjustment. They can be taken care of in
>> the RISC-V backend in separate follow-up PR(s).
>>
>> PR 108185
>> PR 108654
>>
>> gcc/ChangeLog:
>>
>> * config/riscv/riscv-modes.def (ADJUST_PRECISION):
>> * config/riscv/riscv.cc (riscv_v_adjust_precision):
>> * config/riscv/riscv.h (riscv_v_adjust_precision):
>> * genmodes.cc (ADJUST_PRECISION):
>> (emit_mode_adjustments):
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/pr108185-1.c: New test.
>> * gcc.target/riscv/pr108185-2.c: New test.
>> * gcc.target/riscv/pr108185-3.c: New test.
>> * gcc.target/riscv/pr108185-4.c: New test.
>> * gcc.target/riscv/pr108185-5.c: New test.
>> * gcc.target/riscv/pr108185-6.c: New test.
>> * gcc.target/riscv/pr108185-7.c: New test.
>> * gcc.target/riscv/pr108185-8.c: New test.
>>
>> Signed-off-by: Pan Li <pan2.li@intel.com>
>> ---
>> gcc/config/riscv/riscv-modes.def | 8 +++
>> gcc/config/riscv/riscv.cc | 12 ++++
>> gcc/config/riscv/riscv.h | 1 +
>> gcc/genmodes.cc | 25 ++++++-
>> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
>> 12 files changed, 598 insertions(+), 1 deletion(-)
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>
>> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
>> index d5305efa8a6..110bddce851 100644
>> --- a/gcc/config/riscv/riscv-modes.def
>> +++ b/gcc/config/riscv/riscv-modes.def
>> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>> ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>> ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>>
>> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
>> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
>> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
>> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
>> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
>> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
>> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
>> +
>> /*
>> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
>> | | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> index de3e1f903c7..cbe66c0e35b 100644
>> --- a/gcc/config/riscv/riscv.cc
>> +++ b/gcc/config/riscv/riscv.cc
>> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
>> return scale;
>> }
>>
>> +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
>> + PRECISION size for corresponding machine_mode. */
>> +
>> +poly_int64
>> +riscv_v_adjust_precision (machine_mode mode, int scale)
>> +{
>> + if (riscv_v_ext_vector_mode_p (mode))
>> + return riscv_vector_chunks * scale;
>> +
>> + return scale;
>> +}
>> +
>> /* Return true if X is a valid address for machine mode MODE. If it is,
>> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
>> effect. */
>> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
>> index 5bc7f2f467d..15b9317a8ce 100644
>> --- a/gcc/config/riscv/riscv.h
>> +++ b/gcc/config/riscv/riscv.h
>> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
>> extern unsigned riscv_bytes_per_vector_chunk;
>> extern poly_uint16 riscv_vector_chunks;
>> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
>> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
>> /* The number of bits and bytes in a RVV vector. */
>> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
>> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
>> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
>> index 2d418f09aab..12f4e6335e6 100644
>> --- a/gcc/genmodes.cc
>> +++ b/gcc/genmodes.cc
>> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
>> static struct mode_adjust *adj_format;
>> static struct mode_adjust *adj_ibit;
>> static struct mode_adjust *adj_fbit;
>> +static struct mode_adjust *adj_precision;
>>
>> /* Mode class operations. */
>> static enum mode_class
>> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
>> #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
>> #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
>> #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
>> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
>> #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
>> #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
>> #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
>> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
>> " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
>> m->name, m->name);
>> printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
>> - printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>> + /* Normalize the size to 1 if precision is less than BITS_PER_UNIT. */
>> + printf (" poly_uint16 size_one = "
>> + "mode_precision[E_%smode].is_constant ()\n", m->name);
>> + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
>
> Have you tried this on an x86_64 system? I wouldn't expect it to work
> because of the:
>
> STATIC_ASSERT (N >= 2);
>
> in the poly_uint16 constructor.
>
>> + printf (" if (known_lt (mode_precision[E_%smode], "
>> + "size_one * BITS_PER_UNIT))\n", m->name);
>> + printf (" mode_size[E_%smode] = size_one;\n", m->name);
>> + printf (" else\n");
>> + printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>
> Now that the assert implicit in the original exact_div no longer holds,
> I think we should instead generalise it to can_div_away_from_zero_p
> (which will involve defining a new overload of can_div_away_from_zero_p).
> I think that will give the same result as the code above for the cases
> that the code above handles. But it should be more general too.
>
> TBH, I'm still sceptical that this is all that is needed. It seems
> unlikely that we've been so good at writing vector support code that
> we've made it work for precision < bitsize, despite that being an
> unsupported combination until now. But I guess we can fix problems
> on a case-by-case basis.
>
> Thanks,
> Richard
>
>> " BITS_PER_UNIT);\n", m->name, m->name);
>> printf (" mode_nunits[E_%smode] = ps;\n", m->name);
>> printf (" adjust_mode_mask (E_%smode);\n", m->name);
>> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
>> printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
>> a->file, a->line, a->mode->name, a->adjustment);
>>
>> + /* Adjust precision to the actual bits size. */
>> + for (a = adj_precision; a; a = a->next)
>> + switch (a->mode->cl)
>> + {
>> + case MODE_VECTOR_BOOL:
>> + printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
>> + a->adjustment);
>> + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
>> + break;
>> + default:
>> + break;
>> + }
>> +
>> puts ("}");
>> }
>>
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>> new file mode 100644
>> index 00000000000..e70960c5b6d
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>> new file mode 100644
>> index 00000000000..dcc7a644a88
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>> new file mode 100644
>> index 00000000000..3af0513e006
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>> new file mode 100644
>> index 00000000000..ea3c360d756
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>> new file mode 100644
>> index 00000000000..9fc659d2402
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>> new file mode 100644
>> index 00000000000..98275e5267d
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>> new file mode 100644
>> index 00000000000..8f6f0b11f09
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>> new file mode 100644
>> index 00000000000..d96959dd064
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>> @@ -0,0 +1,77 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
Hi Richard Sandiford,
I just tried an overload of can_div_away_from_zero_p for constant divisors, using the printf below in genmodes.cc, and it works as you described 😉.

  printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
          "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);

template<unsigned int N, typename Ca, typename Cb, typename Cq>
inline typename if_nonpoly<Cb, bool>::type
can_div_away_from_zero_p (const poly_int_pod<N, Ca> &a,
                          Cb b,
                          poly_int_pod<N, Cq> *quotient)
{
  if (!can_div_trunc_p (a, b, quotient))
    return false;
  if (maybe_ne (*quotient * b, a))
    for (unsigned int i = 0; i < N; ++i)
      quotient->coeffs[i] += (quotient->coeffs[i] < 0 ? -1 : 1);
  return true;
}

But I have a question about one case, described below.
Assume:
a = [4, 4], b = 8.
can_div_trunc_p first checks whether the remainder is constant, i.e. whether a.coeffs[i] % 8 == 0 for every i >= 1. If the remainder is not constant, can_div_trunc_p leaves the quotient untouched and returns false.
So for a = [4, 4], can_div_away_from_zero_p leaves *quotient unchanged, which means mode_size[E_%smode] is unchanged in this case. However, the later mode_size adjustment sets it to the real byte size anyway, and I am not sure whether that is by design or whether it needs additional handling.
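
To make the case above concrete, here is a minimal standalone sketch of it. It is only a toy model: Poly2 and the two *_model functions are hypothetical stand-ins written for this illustration, not GCC's poly_int API, and the truncating division simply follows the behaviour described in this mail (fail and leave the quotient alone when a non-constant coefficient is not a multiple of b).

```cpp
#include <cassert>

// Toy two-coefficient value c[0] + c[1] * x (hypothetical, not poly_int).
struct Poly2 { long c[2]; };

// Model of the constant-divisor truncating division as described in this
// mail: if any coefficient with index >= 1 is not a multiple of b, return
// false and leave *q untouched.
inline bool can_div_trunc_model (const Poly2 &a, long b, Poly2 *q)
{
  for (int i = 1; i < 2; ++i)
    if (a.c[i] % b != 0)
      return false;
  for (int i = 0; i < 2; ++i)
    q->c[i] = a.c[i] / b;
  return true;
}

// Model of the overload posted above: round away from zero only when the
// truncating division succeeded but was inexact.
inline bool can_div_away_from_zero_model (const Poly2 &a, long b, Poly2 *q)
{
  if (!can_div_trunc_model (a, b, q))
    return false;
  bool inexact = false;
  for (int i = 0; i < 2; ++i)
    if (q->c[i] * b != a.c[i])
      inexact = true;
  if (inexact)
    for (int i = 0; i < 2; ++i)
      q->c[i] += (q->c[i] < 0 ? -1 : 1);
  return true;
}

// Returns true when a = [4, 4], b = 8 fails and the sentinel quotient
// survives, which is the situation asked about.
inline bool quotient_untouched_for_4_4 ()
{
  Poly2 q = {{-7, -7}};  // sentinel value to detect any write
  bool ok = can_div_away_from_zero_model (Poly2{{4, 4}}, 8, &q);
  return !ok && q.c[0] == -7 && q.c[1] == -7;
}
```

In this model, a = [16, 16] with b = 8 divides exactly to [2, 2], while a = [4, 4] returns false without writing the quotient, so whatever value the caller had stored beforehand is what remains.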
Pan
From: 盼 李 <incarnation.p.lee@outlook.com>
Sent: Tuesday, February 28, 2023 5:59 PM
To: Richard Sandiford <richard.sandiford@arm.com>; Li, Pan2 <pan2.li@intel.com>
Cc: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
Understood, thanks for the explanations and suggestions. Let me have a try and keep you posted.
Pan
________________________________
From: Richard Sandiford <richard.sandiford@arm.com>
Sent: Tuesday, February 28, 2023 17:50
To: Li, Pan2 <pan2.li@intel.com>
Cc: 盼 李 <incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
"Li, Pan2" <pan2.li@intel.com> writes:
> Hi Richard Sandiford,
>
> After some investigation, I am not sure if it is possible to make it general without any changes to exact_div. We can add one method like below to get the unit poly for all possible N.
>
> template<unsigned int N, typename Ca>
> inline POLY_CONST_RESULT (N, Ca, Ca)
> normalize_to_unit (const poly_int_pod<N, Ca> &a)
> {
>   typedef POLY_CONST_COEFF (Ca, Ca) C;
>
>   poly_int<N, C> normalized = a;
>
>   if (normalized.is_constant())
>     normalized.coeffs[0] = 1;
>   else
>     for (unsigned int i = 0; i < N; i++)
>       POLY_SET_COEFF (C, normalized, i, 1);
>
>   return normalized;
> }
>
> And then adjust the genmodes like below to consume the unit poly.
>
>   printf (" poly_uint16 unit_poly = "
>           "normalize_to_unit (mode_precision[E_%smode]);\n", m->name);
>   printf (" if (known_lt (mode_precision[E_%smode], "
>           "unit_poly * BITS_PER_UNIT))\n", m->name);
>   printf (" mode_size[E_%smode] = unit_poly;\n", m->name);
>
> I am not sure it is a good idea to introduce the normalization code above into exact_div, given that the comment on exact_div says “/* Return A / B, given that A is known to be a multiple of B. */”.
My point was that we have multiple ways of dividing poly_ints:
- exact_div, for when the caller knows that the result is always exact
- can_div_trunc_p, for truncating division (round towards 0)
- can_div_away_from_zero_p, for rounding away from 0
- ...
This is like how we have multiple division *_EXPRs on trees.
Until now, exact_div was the correct choice for modes because vector
modes didn't have padding. We're now changing that, so my suggestion
in the review was to change the division operation that we use.
Rather than use exact_div, we should now use can_div_away_from_zero_p,
which would have the effect of rounding the quotient up.
Something like:
  if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT,
                                 &mode_size[E_%smode]))
    gcc_unreachable ();
But this will require a new overload of can_div_away_from_zero_p, since
the existing one is for constant quotients rather than constant divisors.
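As a scalar illustration of the division flavours listed above, here is a minimal sketch using plain integers rather than poly_ints. All names (`kBitsPerUnit`, `div_exact`, `div_trunc`, `div_away_from_zero`, `bytes_for_bits`) are hypothetical and are not the poly-int.h functions; the value 8 for the bits-per-unit constant is an assumption for a byte-addressed target.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for BITS_PER_UNIT on a byte-addressed target.
constexpr std::int64_t kBitsPerUnit = 8;

// Exact division: the caller promises that b divides a.
inline std::int64_t div_exact(std::int64_t a, std::int64_t b)
{
  assert(b != 0 && a % b == 0);
  return a / b;
}

// Truncating division: rounds the quotient towards zero.
inline std::int64_t div_trunc(std::int64_t a, std::int64_t b)
{
  assert(b != 0);
  return a / b;
}

// Division rounding away from zero: move the truncated quotient one
// step further from zero whenever the division was inexact.
inline std::int64_t div_away_from_zero(std::int64_t a, std::int64_t b)
{
  assert(b != 0);
  std::int64_t q = a / b;
  if (q * b != a)
    q += ((a < 0) != (b < 0)) ? -1 : 1;
  return q;
}

// Bytes needed to hold a value of the given bit precision, padding included.
inline std::int64_t bytes_for_bits(std::int64_t bits)
{
  return div_away_from_zero(bits, kBitsPerUnit);
}
```

In this model a 4-bit precision gives 0 bytes under truncation but 1 byte when rounding away from zero, while a 16-bit precision gives 2 bytes either way. That is the rounding-up effect wanted for modes whose precision is smaller than their storage size.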
Thanks,
Richard
>
> Could you please share your opinion on this from an expert's perspective? Thank you!
>
> Pan
>
> From: 盼 李 <incarnation.p.lee@outlook.com>
> Sent: Monday, February 27, 2023 11:13 PM
> To: Richard Sandiford <richard.sandiford@arm.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Li, Pan2 <pan2.li@intel.com>
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> No problem, I hope you had a good holiday.
>
> Thanks for pointing this out, the if part cannot take care of poly_int with N > 2. As I understand, we need to make it general for all the N of poly_int.
>
> Thus I would like to confirm with you how to make it general. I suppose there will be a new function can_div_away_from_zero_p to replace the if (known_lt (,)) part in genmodes.cc, leaving exact_div unchanged (given the word "exact", I suppose we should not touch it), right? Then we still need a poly_int with all N coefficients set to 1 as the result when can_div_away_from_zero_p returns true.
>
> Thanks again for your professional suggestion, have a nice day, 😉!
>
> Pan
> ________________________________
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Monday, February 27, 2023 22:24
> To: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
> Cc: incarnation.p.lee@outlook.com; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; pan2.li@intel.com
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> Sorry for the slow reply, been away for a couple of weeks.
>
> "incarnation.p.lee--- via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>> From: Pan Li <pan2.li@intel.com>
>>
>> Fix the bug of the rvv bool mode precision with the adjustment.
>> The bit size of vbool*_t will be adjusted to
>> [1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The
>> adjusted mode precision of vbool*_t will help the following passes
>> make the right decision for both correctness and optimization.
>>
>> Given below sample code:
>> void test_1(int8_t * restrict in, int8_t * restrict out)
>> {
>> vbool8_t v2 = *(vbool8_t*)in;
>> vbool16_t v5 = *(vbool16_t*)in;
>> *(vbool16_t*)(out + 200) = v5;
>> *(vbool8_t*)(out + 100) = v2;
>> }
>>
>> Before the precision adjustment:
>> addi a4,a1,100
>> vsetvli a5,zero,e8,m1,ta,ma
>> addi a1,a1,200
>> vlm.v v24,0(a0)
>> vsm.v v24,0(a4)
>> // Need one vsetvli and vlm.v for correctness here.
>> vsm.v v24,0(a1)
>>
>> After the precision adjustment:
>> csrr t0,vlenb
>> slli t1,t0,1
>> csrr a3,vlenb
>> sub sp,sp,t1
>> slli a4,a3,1
>> add a4,a4,sp
>> sub a3,a4,a3
>> vsetvli a5,zero,e8,m1,ta,ma
>> addi a2,a1,200
>> vlm.v v24,0(a0)
>> vsm.v v24,0(a3)
>> addi a1,a1,100
>> vsetvli a4,zero,e8,mf2,ta,ma
>> csrr t0,vlenb
>> vlm.v v25,0(a3)
>> vsm.v v25,0(a2)
>> slli t1,t0,1
>> vsetvli a5,zero,e8,m1,ta,ma
>> vsm.v v24,0(a1)
>> add sp,sp,t1
>> jr ra
>>
>> However, there may be some optimization opportunities after
>> the mode precision adjustment. They can be taken care of in
>> the RISC-V backend in separate follow-up PR(s).
>>
>> PR 108185
>> PR 108654
>>
>> gcc/ChangeLog:
>>
>> * config/riscv/riscv-modes.def (ADJUST_PRECISION):
>> * config/riscv/riscv.cc (riscv_v_adjust_precision):
>> * config/riscv/riscv.h (riscv_v_adjust_precision):
>> * genmodes.cc (ADJUST_PRECISION):
>> (emit_mode_adjustments):
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/pr108185-1.c: New test.
>> * gcc.target/riscv/pr108185-2.c: New test.
>> * gcc.target/riscv/pr108185-3.c: New test.
>> * gcc.target/riscv/pr108185-4.c: New test.
>> * gcc.target/riscv/pr108185-5.c: New test.
>> * gcc.target/riscv/pr108185-6.c: New test.
>> * gcc.target/riscv/pr108185-7.c: New test.
>> * gcc.target/riscv/pr108185-8.c: New test.
>>
>> Signed-off-by: Pan Li <pan2.li@intel.com>
>> ---
>> gcc/config/riscv/riscv-modes.def | 8 +++
>> gcc/config/riscv/riscv.cc | 12 ++++
>> gcc/config/riscv/riscv.h | 1 +
>> gcc/genmodes.cc | 25 ++++++-
>> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
>> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
>> 12 files changed, 598 insertions(+), 1 deletion(-)
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>
>> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
>> index d5305efa8a6..110bddce851 100644
>> --- a/gcc/config/riscv/riscv-modes.def
>> +++ b/gcc/config/riscv/riscv-modes.def
>> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>> ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>> ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>>
>> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
>> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
>> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
>> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
>> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
>> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
>> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
>> +
>> /*
>> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
>> | | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> index de3e1f903c7..cbe66c0e35b 100644
>> --- a/gcc/config/riscv/riscv.cc
>> +++ b/gcc/config/riscv/riscv.cc
>> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
>> return scale;
>> }
>>
>> +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
>> + PRECISION size for corresponding machine_mode. */
>> +
>> +poly_int64
>> +riscv_v_adjust_precision (machine_mode mode, int scale)
>> +{
>> + if (riscv_v_ext_vector_mode_p (mode))
>> + return riscv_vector_chunks * scale;
>> +
>> + return scale;
>> +}
>> +
>> /* Return true if X is a valid address for machine mode MODE. If it is,
>> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
>> effect. */
>> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
>> index 5bc7f2f467d..15b9317a8ce 100644
>> --- a/gcc/config/riscv/riscv.h
>> +++ b/gcc/config/riscv/riscv.h
>> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
>> extern unsigned riscv_bytes_per_vector_chunk;
>> extern poly_uint16 riscv_vector_chunks;
>> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
>> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
>> /* The number of bits and bytes in a RVV vector. */
>> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
>> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
>> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
>> index 2d418f09aab..12f4e6335e6 100644
>> --- a/gcc/genmodes.cc
>> +++ b/gcc/genmodes.cc
>> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
>> static struct mode_adjust *adj_format;
>> static struct mode_adjust *adj_ibit;
>> static struct mode_adjust *adj_fbit;
>> +static struct mode_adjust *adj_precision;
>>
>> /* Mode class operations. */
>> static enum mode_class
>> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
>> #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
>> #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
>> #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
>> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
>> #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
>> #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
>> #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
>> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
>> " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
>> m->name, m->name);
>> printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
>> - printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>> + /* Normalize the size to 1 if precision is less than BITS_PER_UNIT. */
>> + printf (" poly_uint16 size_one = "
>> + "mode_precision[E_%smode].is_constant ()\n", m->name);
>> + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
>
> Have you tried this on an x86_64 system? I wouldn't expect it to work
> because of the:
>
> STATIC_ASSERT (N >= 2);
>
> in the poly_uint16 constructor.
>
>> + printf (" if (known_lt (mode_precision[E_%smode], "
>> + "size_one * BITS_PER_UNIT))\n", m->name);
>> + printf (" mode_size[E_%smode] = size_one;\n", m->name);
>> + printf (" else\n");
>> + printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>
> Now that the assert implicit in the original exact_div no longer holds,
> I think we should instead generalise it to can_div_away_from_zero_p
> (which will involve defining a new overload of can_div_away_from_zero_p).
> I think that will give the same result as the code above for the cases
> that the code above handles. But it should be more general too.
>
> TBH, I'm still sceptical that this is all that is needed. It seems
> unlikely that we've been so good at writing vector support code that
> we've made it work for precision < bitsize, despite that being an
> unsupported combination until now. But I guess we can fix problems
> on a case-by-case basis.
>
> Thanks,
> Richard
>
>> " BITS_PER_UNIT);\n", m->name, m->name);
>> printf (" mode_nunits[E_%smode] = ps;\n", m->name);
>> printf (" adjust_mode_mask (E_%smode);\n", m->name);
>> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
>> printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
>> a->file, a->line, a->mode->name, a->adjustment);
>>
>> + /* Adjust precision to the actual bits size. */
>> + for (a = adj_precision; a; a = a->next)
>> + switch (a->mode->cl)
>> + {
>> + case MODE_VECTOR_BOOL:
>> + printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
>> + a->adjustment);
>> + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
>> + break;
>> + default:
>> + break;
>> + }
>> +
>> puts ("}");
>> }
>>
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>> new file mode 100644
>> index 00000000000..e70960c5b6d
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>> new file mode 100644
>> index 00000000000..dcc7a644a88
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>> new file mode 100644
>> index 00000000000..3af0513e006
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>> new file mode 100644
>> index 00000000000..ea3c360d756
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>> new file mode 100644
>> index 00000000000..9fc659d2402
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>> new file mode 100644
>> index 00000000000..98275e5267d
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>> new file mode 100644
>> index 00000000000..8f6f0b11f09
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>> @@ -0,0 +1,68 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>> new file mode 100644
>> index 00000000000..d96959dd064
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>> @@ -0,0 +1,77 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>> +
>> +#include "riscv_vector.h"
>> +
>> +void
>> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>> + vbool1_t v1 = *(vbool1_t*)in;
>> + vbool1_t v2 = *(vbool1_t*)in;
>> +
>> + *(vbool1_t*)(out + 100) = v1;
>> + *(vbool1_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>> + vbool2_t v1 = *(vbool2_t*)in;
>> + vbool2_t v2 = *(vbool2_t*)in;
>> +
>> + *(vbool2_t*)(out + 100) = v1;
>> + *(vbool2_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>> + vbool4_t v1 = *(vbool4_t*)in;
>> + vbool4_t v2 = *(vbool4_t*)in;
>> +
>> + *(vbool4_t*)(out + 100) = v1;
>> + *(vbool4_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>> + vbool8_t v1 = *(vbool8_t*)in;
>> + vbool8_t v2 = *(vbool8_t*)in;
>> +
>> + *(vbool8_t*)(out + 100) = v1;
>> + *(vbool8_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>> + vbool16_t v1 = *(vbool16_t*)in;
>> + vbool16_t v2 = *(vbool16_t*)in;
>> +
>> + *(vbool16_t*)(out + 100) = v1;
>> + *(vbool16_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>> + vbool32_t v1 = *(vbool32_t*)in;
>> + vbool32_t v2 = *(vbool32_t*)in;
>> +
>> + *(vbool32_t*)(out + 100) = v1;
>> + *(vbool32_t*)(out + 200) = v2;
>> +}
>> +
>> +void
>> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>> + vbool64_t v1 = *(vbool64_t*)in;
>> + vbool64_t v2 = *(vbool64_t*)in;
>> +
>> + *(vbool64_t*)(out + 100) = v1;
>> + *(vbool64_t*)(out + 200) = v2;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
"Li, Pan2" <pan2.li@intel.com> writes:
> Hi Richard Sandiford,
>
> I just tried the constant-divisor overload with the print statement below, and it works as you described, 😉!
>
> printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
> "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
>
> template<unsigned int N, typename Ca, typename Cb, typename Cq>
> inline typename if_nonpoly<Cb, bool>::type
> can_div_away_from_zero_p (const poly_int_pod<N, Ca> &a,
>                           Cb b,
>                           poly_int_pod<N, Cq> *quotient)
> {
>   if (!can_div_trunc_p (a, b, quotient))
>     return false;
>   if (maybe_ne (*quotient * b, a))
>     for (unsigned int i = 0; i < N; ++i)
>       quotient->coeffs[i] += (quotient->coeffs[i] < 0 ? -1 : 1);
>   return true;
> }
>
> But I have a question about one case, shown below.
>
> Assume:
> a = [4, 4], b = 8.
>
> When can_div_trunc_p is called, it checks whether the remainder is constant, i.e. whether a.coeffs[i] % 8 == 0 for every i >= 1. If the remainder is not constant, can_div_trunc_p leaves the quotient untouched and returns false.
>
> Thus, when a = [4, 4] is passed to can_div_away_from_zero_p, the output *quotient is unchanged, i.e. mode_size[E_%smode] is unchanged for this case. However, the underlying mode_size will be adjusted to the real byte size, and I am not sure whether this is by design or requires additional handling.
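As a rough illustration of the case in the question above, the toy model below treats a two-coefficient value as c0 + c1*X and mirrors the check as the message describes it: truncating division by a constant only succeeds when the non-constant coefficient is a multiple of the divisor. This is not GCC's poly_int; `Poly2`, `remainder_is_constant` and `trunc_quotient` are hypothetical names used only for this sketch.

```cpp
#include <cstdint>

// Toy stand-in for a two-coefficient poly_int: the value is c0 + c1 * X,
// where X is only known at run time.  Hypothetical type, not GCC's poly_int.
struct Poly2 {
  std::int64_t c0;
  std::int64_t c1;
};

// The remainder of a / b is the same for every X only when the
// X-dependent coefficient is an exact multiple of b.
constexpr bool remainder_is_constant(Poly2 a, std::int64_t b) {
  return a.c1 % b == 0;
}

// Coefficient-wise truncating quotient; only meaningful when
// remainder_is_constant(a, b) holds.
constexpr Poly2 trunc_quotient(Poly2 a, std::int64_t b) {
  return Poly2{a.c0 / b, a.c1 / b};
}
```

Under this model, a = [4, 4] with b = 8 fails the check, so no quotient is produced, while a = [8, 8] passes and yields [1, 1].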
Is it right that, for RVV, a load or store of [4,4] will access [8,8]
bits, even when that means accessing fully-unused bytes? E.g. 4+4X
when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
8+8X would be 32 bits/4 bytes. So a store of [8,8] for a precision
of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
Richard
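The arithmetic in the reply above can be checked with a small sketch. It assumes a toy two-coefficient value c0 + c1*X; `Poly2`, `value_at` and `padding_bytes` are hypothetical names for illustration, not GCC APIs.

```cpp
#include <cstdint>

// Toy stand-in for a two-coefficient poly_int: the value is c0 + c1 * X.
// Hypothetical type for illustration only.
struct Poly2 {
  std::int64_t c0;
  std::int64_t c1;
};

// Evaluate the value for one concrete runtime X.
constexpr std::int64_t value_at(Poly2 p, std::int64_t x) {
  return p.c0 + p.c1 * x;
}

// Whole bytes that an access of `bitsize` bits touches beyond the
// `precision` bits of useful data, for one concrete X.
constexpr std::int64_t padding_bytes(Poly2 precision, Poly2 bitsize,
                                     std::int64_t x) {
  return (value_at(bitsize, x) - value_at(precision, x)) / 8;
}
```

For X = 3, a precision of [4, 4] evaluates to 16 bits and a bitsize of [8, 8] to 32 bits, so `padding_bytes` gives the 2 bytes mentioned in the reply.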
> Pan
>
> From: 盼 李 <incarnation.p.lee@outlook.com>
> Sent: Tuesday, February 28, 2023 5:59 PM
> To: Richard Sandiford <richard.sandiford@arm.com>; Li, Pan2 <pan2.li@intel.com>
> Cc: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> Understood, thanks for the explanations and suggestions. Let me have a try and keep you posted.
>
> Pan
> ________________________________
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Tuesday, February 28, 2023 17:50
> To: Li, Pan2 <pan2.li@intel.com>
> Cc: 盼 李 <incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> "Li, Pan2" <pan2.li@intel.com> writes:
>> Hi Richard Sandiford,
>>
>> After some investigation, I am not sure if it is possible to make it general without any changes to exact_div. We can add one method like below to get the unit poly for all possible N.
>>
>> template<unsigned int N, typename Ca>
>> inline POLY_CONST_RESULT (N, Ca, Ca)
>> normalize_to_unit (const poly_int_pod<N, Ca> &a)
>> {
>>   typedef POLY_CONST_COEFF (Ca, Ca) C;
>>
>>   poly_int<N, C> normalized = a;
>>
>>   if (normalized.is_constant())
>>     normalized.coeffs[0] = 1;
>>   else
>>     for (unsigned int i = 0; i < N; i++)
>>       POLY_SET_COEFF (C, normalized, i, 1);
>>
>>   return normalized;
>> }
>>
>> And then adjust the genmodes like below to consume the unit poly.
>>
>> printf (" poly_uint16 unit_poly = "
>> "normalize_to_unit (mode_precision[E_%smode]);\n", m->name);
>> printf (" if (known_lt (mode_precision[E_%smode], "
>> "unit_poly * BITS_PER_UNIT))\n", m->name);
>> printf (" mode_size[E_%smode] = unit_poly;\n", m->name);
>>
>> I am not sure whether it is a good idea to introduce the above normalization code into exact_div, given that the comment on exact_div reads “/* Return A / B, given that A is known to be a multiple of B. */”.
>
> My point was that we have multiple ways of dividing poly_ints:
>
> - exact_div, for when the caller knows that the result is always exact
> - can_div_trunc_p, for truncating division (round towards 0)
> - can_div_away_from_zero_p, for rounding away from 0
> - ...
>
> This is like how we have multiple division *_EXPRs on trees.
>
> Until now, exact_div was the correct choice for modes because vector
> modes didn't have padding. We're now changing that, so my suggestion
> in the review was to change the division operation that we use.
> Rather than use exact_div, we should now use can_div_away_from_zero_p,
> which would have the effect of rounding the quotient up.
>
> Something like:
>
> if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT,
>                                &mode_size[E_%smode]))
>   gcc_unreachable ();
>
> But this will require a new overload of can_div_away_from_zero_p, since
> the existing one is for constant quotients rather than constant divisors.
>
> Thanks,
> Richard
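The difference between the rounding modes listed above can be sketched on plain integers. This is only a scalar illustration; `div_trunc` and `div_away_from_zero` are hypothetical helper names, not the poly_int routines or the new overload discussed in the thread.

```cpp
#include <cstdint>

// Truncating division: rounds the quotient towards zero
// (this is what C++ integer division already does).
constexpr std::int64_t div_trunc(std::int64_t a, std::int64_t b) {
  return a / b;
}

// Division that rounds away from zero: when there is a remainder, move
// the truncated quotient one step further from zero, in the direction
// given by the sign of the exact result.
constexpr std::int64_t div_away_from_zero(std::int64_t a, std::int64_t b) {
  return a / b + (a % b != 0 ? (((a < 0) != (b < 0)) ? -1 : 1) : 0);
}
```

With a divisor of 8 (bits per byte in this sketch), a 4-bit precision truncates to 0 bytes but rounds away from zero to 1 byte, while an exact multiple such as 16 bits gives 2 bytes either way.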
>
>>
>> Could you please share your opinion on this from an expert's perspective? Thank you!
>>
>> Pan
>>
>> From: 盼 李 <incarnation.p.lee@outlook.com>
>> Sent: Monday, February 27, 2023 11:13 PM
>> To: Richard Sandiford <richard.sandiford@arm.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
>> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Li, Pan2 <pan2.li@intel.com>
>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>
>> Never mind, I hope you had a good holiday.
>>
>> Thanks for pointing this out; the if part cannot handle poly_int with N > 2. As I understand it, we need to make it general for every N of poly_int.
>>
>> So I would like to confirm with you how to make it general. I suppose a new function can_div_away_from_zero_p will replace the if (known_lt(,)) part in genmodes.cc, and exact_div will be left unchanged (given the word "exact", I suppose we should not touch it), right? Then we would still need a poly_int with all N coefficients set to 1 as the result when can_div_away_from_zero_p returns true.
>>
>> Thanks again for your professional suggestion, have a nice day, 😉!
>>
>> Pan
>> ________________________________
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Monday, February 27, 2023 22:24
>> To: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
>> Cc: incarnation.p.lee@outlook.com; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; pan2.li@intel.com
>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>
>> Sorry for the slow reply, been away for a couple of weeks.
>>
>> "incarnation.p.lee--- via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>>> From: Pan Li <pan2.li@intel.com>
>>>
>>> Fix the rvv bool mode precision bug by adjusting the precision.
>>> The bit size of vbool*_t will be adjusted to
>>> [1, 2, 4, 8, 16, 32, 64] according to the RVV 1.0 ISA spec. The
>>> adjusted mode precision of vbool*_t will help the underlying passes
>>> make the right decisions for both correctness and optimization.
>>>
>>> Given the sample code below:
>>> void test_1(int8_t * restrict in, int8_t * restrict out)
>>> {
>>> vbool8_t v2 = *(vbool8_t*)in;
>>> vbool16_t v5 = *(vbool16_t*)in;
>>> *(vbool16_t*)(out + 200) = v5;
>>> *(vbool8_t*)(out + 100) = v2;
>>> }
>>>
>>> Before the precision adjustment:
>>> addi a4,a1,100
>>> vsetvli a5,zero,e8,m1,ta,ma
>>> addi a1,a1,200
>>> vlm.v v24,0(a0)
>>> vsm.v v24,0(a4)
>>> // Need one vsetvli and vlm.v for correctness here.
>>> vsm.v v24,0(a1)
>>>
>>> After the precision adjustment:
>>> csrr t0,vlenb
>>> slli t1,t0,1
>>> csrr a3,vlenb
>>> sub sp,sp,t1
>>> slli a4,a3,1
>>> add a4,a4,sp
>>> sub a3,a4,a3
>>> vsetvli a5,zero,e8,m1,ta,ma
>>> addi a2,a1,200
>>> vlm.v v24,0(a0)
>>> vsm.v v24,0(a3)
>>> addi a1,a1,100
>>> vsetvli a4,zero,e8,mf2,ta,ma
>>> csrr t0,vlenb
>>> vlm.v v25,0(a3)
>>> vsm.v v25,0(a2)
>>> slli t1,t0,1
>>> vsetvli a5,zero,e8,m1,ta,ma
>>> vsm.v v24,0(a1)
>>> add sp,sp,t1
>>> jr ra
>>>
>>> However, there may be some optimization opportunities after
>>> the mode precision adjustment. They can be taken care of in
>>> the RISC-V backend in separate follow-up PR(s).
>>>
>>> PR 108185
>>> PR 108654
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/riscv/riscv-modes.def (ADJUST_PRECISION):
>>> * config/riscv/riscv.cc (riscv_v_adjust_precision):
>>> * config/riscv/riscv.h (riscv_v_adjust_precision):
>>> * genmodes.cc (ADJUST_PRECISION):
>>> (emit_mode_adjustments):
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.target/riscv/pr108185-1.c: New test.
>>> * gcc.target/riscv/pr108185-2.c: New test.
>>> * gcc.target/riscv/pr108185-3.c: New test.
>>> * gcc.target/riscv/pr108185-4.c: New test.
>>> * gcc.target/riscv/pr108185-5.c: New test.
>>> * gcc.target/riscv/pr108185-6.c: New test.
>>> * gcc.target/riscv/pr108185-7.c: New test.
>>> * gcc.target/riscv/pr108185-8.c: New test.
>>>
>>> Signed-off-by: Pan Li <pan2.li@intel.com>
>>> ---
>>> gcc/config/riscv/riscv-modes.def | 8 +++
>>> gcc/config/riscv/riscv.cc | 12 ++++
>>> gcc/config/riscv/riscv.h | 1 +
>>> gcc/genmodes.cc | 25 ++++++-
>>> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
>>> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
>>> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
>>> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
>>> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
>>> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
>>> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
>>> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
>>> 12 files changed, 598 insertions(+), 1 deletion(-)
>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>
>>> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
>>> index d5305efa8a6..110bddce851 100644
>>> --- a/gcc/config/riscv/riscv-modes.def
>>> +++ b/gcc/config/riscv/riscv-modes.def
>>> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>> ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>> ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>>>
>>> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
>>> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
>>> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
>>> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
>>> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
>>> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
>>> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
>>> +
>>> /*
>>> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
>>> | | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
>>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>>> index de3e1f903c7..cbe66c0e35b 100644
>>> --- a/gcc/config/riscv/riscv.cc
>>> +++ b/gcc/config/riscv/riscv.cc
>>> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
>>> return scale;
>>> }
>>>
>>> +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
>>> + PRECISION size for corresponding machine_mode. */
>>> +
>>> +poly_int64
>>> +riscv_v_adjust_precision (machine_mode mode, int scale)
>>> +{
>>> + if (riscv_v_ext_vector_mode_p (mode))
>>> + return riscv_vector_chunks * scale;
>>> +
>>> + return scale;
>>> +}
>>> +
>>> /* Return true if X is a valid address for machine mode MODE. If it is,
>>> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
>>> effect. */
>>> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
>>> index 5bc7f2f467d..15b9317a8ce 100644
>>> --- a/gcc/config/riscv/riscv.h
>>> +++ b/gcc/config/riscv/riscv.h
>>> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
>>> extern unsigned riscv_bytes_per_vector_chunk;
>>> extern poly_uint16 riscv_vector_chunks;
>>> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
>>> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
>>> /* The number of bits and bytes in a RVV vector. */
>>> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
>>> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
>>> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
>>> index 2d418f09aab..12f4e6335e6 100644
>>> --- a/gcc/genmodes.cc
>>> +++ b/gcc/genmodes.cc
>>> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
>>> static struct mode_adjust *adj_format;
>>> static struct mode_adjust *adj_ibit;
>>> static struct mode_adjust *adj_fbit;
>>> +static struct mode_adjust *adj_precision;
>>>
>>> /* Mode class operations. */
>>> static enum mode_class
>>> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
>>> #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
>>> #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
>>> #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
>>> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
>>> #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
>>> #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
>>> #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
>>> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
>>> " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
>>> m->name, m->name);
>>> printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
>>> - printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>>> + /* Normalize the size to 1 if precision is less than BITS_PER_UNIT. */
>>> + printf (" poly_uint16 size_one = "
>>> + "mode_precision[E_%smode].is_constant ()\n", m->name);
>>> + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
>>
>> Have you tried this on an x86_64 system? I wouldn't expect it to work
>> because of the:
>>
>> STATIC_ASSERT (N >= 2);
>>
>> in the poly_uint16 constructor.
>>
>>> + printf (" if (known_lt (mode_precision[E_%smode], "
>>> + "size_one * BITS_PER_UNIT))\n", m->name);
>>> + printf (" mode_size[E_%smode] = size_one;\n", m->name);
>>> + printf (" else\n");
>>> + printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>>
>> Now that the assert implicit in the original exact_div no longer holds,
>> I think we should instead generalise it to can_div_away_from_zero_p
>> (which will involve defining a new overload of can_div_away_from_zero_p).
>> I think that will give the same result as the code above for the cases
>> that the code above handles. But it should be more general too.
>>
>> TBH, I'm still sceptical that this is all that is needed. It seems
>> unlikely that we've been so good at writing vector support code that
>> we've made it work for precision < bitsize, despite that being an
>> unsupported combination until now. But I guess we can fix problems
>> on a case-by-case basis.
>>
>> Thanks,
>> Richard
>>
>>> " BITS_PER_UNIT);\n", m->name, m->name);
>>> printf (" mode_nunits[E_%smode] = ps;\n", m->name);
>>> printf (" adjust_mode_mask (E_%smode);\n", m->name);
>>> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
>>> printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
>>> a->file, a->line, a->mode->name, a->adjustment);
>>>
>>> + /* Adjust precision to the actual bits size. */
>>> + for (a = adj_precision; a; a = a->next)
>>> + switch (a->mode->cl)
>>> + {
>>> + case MODE_VECTOR_BOOL:
>>> + printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
>>> + a->adjustment);
>>> + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
>>> + break;
>>> + default:
>>> + break;
>>> + }
>>> +
>>> puts ("}");
>>> }
>>>
>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>> new file mode 100644
>>> index 00000000000..e70960c5b6d
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>> @@ -0,0 +1,68 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>> +
>>> +#include "riscv_vector.h"
>>> +
>>> +void
>>> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool1_t v1 = *(vbool1_t*)in;
>>> + vbool2_t v2 = *(vbool2_t*)in;
>>> +
>>> + *(vbool1_t*)(out + 100) = v1;
>>> + *(vbool2_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool1_t v1 = *(vbool1_t*)in;
>>> + vbool4_t v2 = *(vbool4_t*)in;
>>> +
>>> + *(vbool1_t*)(out + 100) = v1;
>>> + *(vbool4_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool1_t v1 = *(vbool1_t*)in;
>>> + vbool8_t v2 = *(vbool8_t*)in;
>>> +
>>> + *(vbool1_t*)(out + 100) = v1;
>>> + *(vbool8_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool1_t v1 = *(vbool1_t*)in;
>>> + vbool16_t v2 = *(vbool16_t*)in;
>>> +
>>> + *(vbool1_t*)(out + 100) = v1;
>>> + *(vbool16_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool1_t v1 = *(vbool1_t*)in;
>>> + vbool32_t v2 = *(vbool32_t*)in;
>>> +
>>> + *(vbool1_t*)(out + 100) = v1;
>>> + *(vbool32_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool1_t v1 = *(vbool1_t*)in;
>>> + vbool64_t v2 = *(vbool64_t*)in;
>>> +
>>> + *(vbool1_t*)(out + 100) = v1;
>>> + *(vbool64_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>> new file mode 100644
>>> index 00000000000..dcc7a644a88
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>> @@ -0,0 +1,68 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>> +
>>> +#include "riscv_vector.h"
>>> +
>>> +void
>>> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool2_t v1 = *(vbool2_t*)in;
>>> + vbool1_t v2 = *(vbool1_t*)in;
>>> +
>>> + *(vbool2_t*)(out + 100) = v1;
>>> + *(vbool1_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool2_t v1 = *(vbool2_t*)in;
>>> + vbool4_t v2 = *(vbool4_t*)in;
>>> +
>>> + *(vbool2_t*)(out + 100) = v1;
>>> + *(vbool4_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool2_t v1 = *(vbool2_t*)in;
>>> + vbool8_t v2 = *(vbool8_t*)in;
>>> +
>>> + *(vbool2_t*)(out + 100) = v1;
>>> + *(vbool8_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool2_t v1 = *(vbool2_t*)in;
>>> + vbool16_t v2 = *(vbool16_t*)in;
>>> +
>>> + *(vbool2_t*)(out + 100) = v1;
>>> + *(vbool16_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool2_t v1 = *(vbool2_t*)in;
>>> + vbool32_t v2 = *(vbool32_t*)in;
>>> +
>>> + *(vbool2_t*)(out + 100) = v1;
>>> + *(vbool32_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool2_t v1 = *(vbool2_t*)in;
>>> + vbool64_t v2 = *(vbool64_t*)in;
>>> +
>>> + *(vbool2_t*)(out + 100) = v1;
>>> + *(vbool64_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>> new file mode 100644
>>> index 00000000000..3af0513e006
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>> @@ -0,0 +1,68 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>> +
>>> +#include "riscv_vector.h"
>>> +
>>> +void
>>> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool4_t v1 = *(vbool4_t*)in;
>>> + vbool1_t v2 = *(vbool1_t*)in;
>>> +
>>> + *(vbool4_t*)(out + 100) = v1;
>>> + *(vbool1_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool4_t v1 = *(vbool4_t*)in;
>>> + vbool2_t v2 = *(vbool2_t*)in;
>>> +
>>> + *(vbool4_t*)(out + 100) = v1;
>>> + *(vbool2_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool4_t v1 = *(vbool4_t*)in;
>>> + vbool8_t v2 = *(vbool8_t*)in;
>>> +
>>> + *(vbool4_t*)(out + 100) = v1;
>>> + *(vbool8_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool4_t v1 = *(vbool4_t*)in;
>>> + vbool16_t v2 = *(vbool16_t*)in;
>>> +
>>> + *(vbool4_t*)(out + 100) = v1;
>>> + *(vbool16_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool4_t v1 = *(vbool4_t*)in;
>>> + vbool32_t v2 = *(vbool32_t*)in;
>>> +
>>> + *(vbool4_t*)(out + 100) = v1;
>>> + *(vbool32_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool4_t v1 = *(vbool4_t*)in;
>>> + vbool64_t v2 = *(vbool64_t*)in;
>>> +
>>> + *(vbool4_t*)(out + 100) = v1;
>>> + *(vbool64_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>> new file mode 100644
>>> index 00000000000..ea3c360d756
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>> @@ -0,0 +1,68 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>> +
>>> +#include "riscv_vector.h"
>>> +
>>> +void
>>> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool8_t v1 = *(vbool8_t*)in;
>>> + vbool1_t v2 = *(vbool1_t*)in;
>>> +
>>> + *(vbool8_t*)(out + 100) = v1;
>>> + *(vbool1_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool8_t v1 = *(vbool8_t*)in;
>>> + vbool2_t v2 = *(vbool2_t*)in;
>>> +
>>> + *(vbool8_t*)(out + 100) = v1;
>>> + *(vbool2_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool8_t v1 = *(vbool8_t*)in;
>>> + vbool4_t v2 = *(vbool4_t*)in;
>>> +
>>> + *(vbool8_t*)(out + 100) = v1;
>>> + *(vbool4_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool8_t v1 = *(vbool8_t*)in;
>>> + vbool16_t v2 = *(vbool16_t*)in;
>>> +
>>> + *(vbool8_t*)(out + 100) = v1;
>>> + *(vbool16_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool8_t v1 = *(vbool8_t*)in;
>>> + vbool32_t v2 = *(vbool32_t*)in;
>>> +
>>> + *(vbool8_t*)(out + 100) = v1;
>>> + *(vbool32_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool8_t v1 = *(vbool8_t*)in;
>>> + vbool64_t v2 = *(vbool64_t*)in;
>>> +
>>> + *(vbool8_t*)(out + 100) = v1;
>>> + *(vbool64_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>> new file mode 100644
>>> index 00000000000..9fc659d2402
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>> @@ -0,0 +1,68 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>> +
>>> +#include "riscv_vector.h"
>>> +
>>> +void
>>> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool16_t v1 = *(vbool16_t*)in;
>>> + vbool1_t v2 = *(vbool1_t*)in;
>>> +
>>> + *(vbool16_t*)(out + 100) = v1;
>>> + *(vbool1_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool16_t v1 = *(vbool16_t*)in;
>>> + vbool2_t v2 = *(vbool2_t*)in;
>>> +
>>> + *(vbool16_t*)(out + 100) = v1;
>>> + *(vbool2_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool16_t v1 = *(vbool16_t*)in;
>>> + vbool4_t v2 = *(vbool4_t*)in;
>>> +
>>> + *(vbool16_t*)(out + 100) = v1;
>>> + *(vbool4_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool16_t v1 = *(vbool16_t*)in;
>>> + vbool8_t v2 = *(vbool8_t*)in;
>>> +
>>> + *(vbool16_t*)(out + 100) = v1;
>>> + *(vbool8_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool16_t v1 = *(vbool16_t*)in;
>>> + vbool32_t v2 = *(vbool32_t*)in;
>>> +
>>> + *(vbool16_t*)(out + 100) = v1;
>>> + *(vbool32_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool16_t v1 = *(vbool16_t*)in;
>>> + vbool64_t v2 = *(vbool64_t*)in;
>>> +
>>> + *(vbool16_t*)(out + 100) = v1;
>>> + *(vbool64_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>> new file mode 100644
>>> index 00000000000..98275e5267d
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>> @@ -0,0 +1,68 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>> +
>>> +#include "riscv_vector.h"
>>> +
>>> +void
>>> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool32_t v1 = *(vbool32_t*)in;
>>> + vbool1_t v2 = *(vbool1_t*)in;
>>> +
>>> + *(vbool32_t*)(out + 100) = v1;
>>> + *(vbool1_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool32_t v1 = *(vbool32_t*)in;
>>> + vbool2_t v2 = *(vbool2_t*)in;
>>> +
>>> + *(vbool32_t*)(out + 100) = v1;
>>> + *(vbool2_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool32_t v1 = *(vbool32_t*)in;
>>> + vbool4_t v2 = *(vbool4_t*)in;
>>> +
>>> + *(vbool32_t*)(out + 100) = v1;
>>> + *(vbool4_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool32_t v1 = *(vbool32_t*)in;
>>> + vbool8_t v2 = *(vbool8_t*)in;
>>> +
>>> + *(vbool32_t*)(out + 100) = v1;
>>> + *(vbool8_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool32_t v1 = *(vbool32_t*)in;
>>> + vbool16_t v2 = *(vbool16_t*)in;
>>> +
>>> + *(vbool32_t*)(out + 100) = v1;
>>> + *(vbool16_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool32_t v1 = *(vbool32_t*)in;
>>> + vbool64_t v2 = *(vbool64_t*)in;
>>> +
>>> + *(vbool32_t*)(out + 100) = v1;
>>> + *(vbool64_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>> new file mode 100644
>>> index 00000000000..8f6f0b11f09
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>> @@ -0,0 +1,68 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>> +
>>> +#include "riscv_vector.h"
>>> +
>>> +void
>>> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool64_t v1 = *(vbool64_t*)in;
>>> + vbool1_t v2 = *(vbool1_t*)in;
>>> +
>>> + *(vbool64_t*)(out + 100) = v1;
>>> + *(vbool1_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool64_t v1 = *(vbool64_t*)in;
>>> + vbool2_t v2 = *(vbool2_t*)in;
>>> +
>>> + *(vbool64_t*)(out + 100) = v1;
>>> + *(vbool2_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool64_t v1 = *(vbool64_t*)in;
>>> + vbool4_t v2 = *(vbool4_t*)in;
>>> +
>>> + *(vbool64_t*)(out + 100) = v1;
>>> + *(vbool4_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool64_t v1 = *(vbool64_t*)in;
>>> + vbool8_t v2 = *(vbool8_t*)in;
>>> +
>>> + *(vbool64_t*)(out + 100) = v1;
>>> + *(vbool8_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool64_t v1 = *(vbool64_t*)in;
>>> + vbool16_t v2 = *(vbool16_t*)in;
>>> +
>>> + *(vbool64_t*)(out + 100) = v1;
>>> + *(vbool16_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool64_t v1 = *(vbool64_t*)in;
>>> + vbool32_t v2 = *(vbool32_t*)in;
>>> +
>>> + *(vbool64_t*)(out + 100) = v1;
>>> + *(vbool32_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>> new file mode 100644
>>> index 00000000000..d96959dd064
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>> @@ -0,0 +1,77 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>> +
>>> +#include "riscv_vector.h"
>>> +
>>> +void
>>> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool1_t v1 = *(vbool1_t*)in;
>>> + vbool1_t v2 = *(vbool1_t*)in;
>>> +
>>> + *(vbool1_t*)(out + 100) = v1;
>>> + *(vbool1_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool2_t v1 = *(vbool2_t*)in;
>>> + vbool2_t v2 = *(vbool2_t*)in;
>>> +
>>> + *(vbool2_t*)(out + 100) = v1;
>>> + *(vbool2_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool4_t v1 = *(vbool4_t*)in;
>>> + vbool4_t v2 = *(vbool4_t*)in;
>>> +
>>> + *(vbool4_t*)(out + 100) = v1;
>>> + *(vbool4_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool8_t v1 = *(vbool8_t*)in;
>>> + vbool8_t v2 = *(vbool8_t*)in;
>>> +
>>> + *(vbool8_t*)(out + 100) = v1;
>>> + *(vbool8_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool16_t v1 = *(vbool16_t*)in;
>>> + vbool16_t v2 = *(vbool16_t*)in;
>>> +
>>> + *(vbool16_t*)(out + 100) = v1;
>>> + *(vbool16_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool32_t v1 = *(vbool32_t*)in;
>>> + vbool32_t v2 = *(vbool32_t*)in;
>>> +
>>> + *(vbool32_t*)(out + 100) = v1;
>>> + *(vbool32_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool64_t v1 = *(vbool64_t*)in;
>>> + vbool64_t v2 = *(vbool64_t*)in;
>>> +
>>> + *(vbool64_t*)(out + 100) = v1;
>>> + *(vbool64_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
>> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
>> bits, even when that means accessing fully-unused bytes? E.g. 4+4X
>> when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
>> 8+8X would be 32 bits/4 bytes. So a store of [8,8] for a precision
>> of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
Hi, Richard. Thank you for helping us.
My understanding of the RVV ISA:
In RVV, we have mask modes (VNx1BI, VNx2BI, etc.) and data modes (VNx1QI, VNx4QI, VNx2DI, etc.).
For data modes, we access the data fully and there are no unused bytes, so we don't need to adjust the precision.
For mask modes, however, we access the mask bits in a compact layout (the mask bits for consecutive elements are stored consecutively).
For example, in the current configuration, VNx1BI, VNx2BI, VNx4BI and VNx8BI have the same bytesize (1,1) but different bitsizes.
VNx8BI is accessed fully, but only 1/2 of VNx4BI, 1/4 of VNx2BI and 1/8 of VNx1BI is accessed, with byte alignment (I am not sure whether RVV supports bit alignment; I guess it does not).
If VNx8BI occupies only 1 byte (depending on the machine vector length), then VNx2BI, VNx4BI and VNx1BI occupy 2/8, 4/8 and 1/8 of a byte. I think we can't access memory with bit alignment, so they will all be the same in the access.
However, if VNx8BI occupies 8 bytes, then VNx1BI, VNx2BI and VNx4BI occupy 1 byte, 2 bytes and 4 bytes, so they access different sizes.
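A minimal sketch of the sizes described above, written as standalone C++ rather than GCC code. The helper names `mask_bits` and `mask_bytes` and the `chunks` parameter (a stand-in for the runtime vector-length factor) are assumptions for illustration, and the sketch assumes mask accesses are byte-granular, as guessed above:

```cpp
// Illustrative only: these helpers are hypothetical, not part of GCC.

// Number of mask bits held by VNx<K>BI when the runtime vector length
// corresponds to `chunks` units (chunks = 1 is the smallest vector).
constexpr unsigned mask_bits(unsigned k, unsigned chunks) {
  return k * chunks;
}

// Bytes touched by a byte-granular access: the bit count rounded up
// to a whole number of bytes.
constexpr unsigned mask_bytes(unsigned k, unsigned chunks) {
  return (mask_bits(k, chunks) + 7) / 8;
}

// Smallest vector: VNx8BI fills one byte; the narrower modes use only
// part of that byte but still touch all of it.
static_assert(mask_bytes(8, 1) == 1, "VNx8BI: 8 bits, 1 byte");
static_assert(mask_bytes(4, 1) == 1, "VNx4BI: 4 bits, still 1 byte");
static_assert(mask_bytes(2, 1) == 1, "VNx2BI: 2 bits, still 1 byte");
static_assert(mask_bytes(1, 1) == 1, "VNx1BI: 1 bit, still 1 byte");

// Vector eight times larger: the modes now differ in byte size.
static_assert(mask_bytes(8, 8) == 8, "VNx8BI: 64 bits, 8 bytes");
static_assert(mask_bytes(4, 8) == 4, "VNx4BI: 32 bits, 4 bytes");
static_assert(mask_bytes(2, 8) == 2, "VNx2BI: 16 bits, 2 bytes");
static_assert(mask_bytes(1, 8) == 1, "VNx1BI: 8 bits, 1 byte");
```

Under these assumptions the four modes are indistinguishable by byte size at the smallest vector length and only diverge once the vector grows, which is why the bit precision has to carry the difference.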
This is my understanding of the RVV ISA; feel free to correct me.
Thanks.
juzhe.zhong@rivai.ai
From: Richard Sandiford
Date: 2023-03-01 18:11
To: Li, Pan2
CC: 盼 李; incarnation.p.lee--- via Gcc-patches; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
"Li, Pan2" <pan2.li@intel.com> writes:
> Hi Richard Sandiford,
>
> I just tried the constant-divisor overload with the printf below, and it works as you mentioned!
>
> printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
> "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
>
> template<unsigned int N, typename Ca, typename Cb, typename Cq>
> inline typename if_nonpoly<Cb, bool>::type
> can_div_away_from_zero_p (const poly_int_pod<N, Ca> &a,
> Cb b,
> poly_int_pod<N, Cq> *quotient)
> {
> if (!can_div_trunc_p (a, b, quotient))
> return false;
> if (maybe_ne (*quotient * b, a))
> for (unsigned int i = 0; i < N; ++i)
> quotient->coeffs[i] += (quotient->coeffs[i] < 0 ? -1 : 1);
> return true;
> }
>
> But I have a question about the case below.
>
> Assume:
> a = [4, 4], b = 8.
>
> When can_div_trunc_p is called, it checks whether the remainder is constant, i.e. a.coeffs[i] % 8 == 0 (i >= 1). If the remainder is not constant, can_div_trunc_p leaves the quotient untouched and returns false.
>
> Thus, when a = [4, 4], can_div_away_from_zero_p leaves *quotient unchanged, i.e. mode_size[E_%smode] is unchanged in this case. However, the underlying mode_size will adjust it to the real byte size, and I am not sure whether this is by design or requires additional handling.
Is it right that, for RVV, a load or store of [4,4] will access [8,8]
bits, even when that means accessing fully-unused bytes? E.g. 4+4X
when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
8+8X would be 32 bits/4 bytes. So a store of [8,8] for a precision
of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
Richard
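The arithmetic in the question above can be sketched with a toy two-coefficient polynomial. This is standalone C++ and not GCC's poly_int; `Poly2`, `eval`, `trunc_div_ok`, `ceil_div` and `bytes_upper_bound` are hypothetical names, and the per-coefficient round-up is only one assumed way to obtain a byte size that covers the precision:

```cpp
// Toy stand-in for a two-coefficient poly value c0 + c1 * X.
// Hypothetical type for illustration; not GCC's poly_int.
struct Poly2 {
  unsigned c0;
  unsigned c1;
};

// Value of the polynomial for a concrete runtime X.
constexpr unsigned eval(Poly2 p, unsigned x) { return p.c0 + p.c1 * x; }

// Rule described in the thread: truncating division by a constant only
// succeeds when the non-constant coefficient divides exactly.
constexpr bool trunc_div_ok(Poly2 a, unsigned b) { return a.c1 % b == 0; }

// Scalar division rounding up.
constexpr unsigned ceil_div(unsigned a, unsigned b) { return (a + b - 1) / b; }

// Assumed rounding: round each coefficient up separately, which gives
// a byte count that is never smaller than the bits actually need.
constexpr Poly2 bytes_upper_bound(Poly2 bits) {
  return Poly2{ceil_div(bits.c0, 8), ceil_div(bits.c1, 8)};
}

// Precision [4,4]: the remainder of a division by 8 is not constant.
static_assert(!trunc_div_ok(Poly2{4, 4}, 8), "[4,4] / 8 is rejected");
static_assert(trunc_div_ok(Poly2{8, 8}, 8), "[8,8] / 8 is accepted");

// At X = 3 the precision is 4 + 4*3 = 16 bits, i.e. 2 useful bytes.
static_assert(eval(Poly2{4, 4}, 3) == 16, "16 useful bits");
static_assert(ceil_div(eval(Poly2{4, 4}, 3), 8) == 2, "2 useful bytes");

// The rounded size is [1,1] bytes, i.e. 1 + 1*3 = 4 bytes at X = 3.
static_assert(eval(bytes_upper_bound(Poly2{4, 4}), 3) == 4, "4 bytes");
```

Under these assumptions a precision of [4,4] cannot go through a truncating division by 8, while rounding each coefficient up gives [1,1] bytes, which at X=3 covers 4 bytes although only 2 of them hold useful bits.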
> Pan
>
> From: 盼 李 <incarnation.p.lee@outlook.com>
> Sent: Tuesday, February 28, 2023 5:59 PM
> To: Richard Sandiford <richard.sandiford@arm.com>; Li, Pan2 <pan2.li@intel.com>
> Cc: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> Understood, thanks for the explanations and suggestions. Let me have a try and keep you posted.
>
> Pan
> ________________________________
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Tuesday, February 28, 2023 17:50
> To: Li, Pan2 <pan2.li@intel.com>
> Cc: 盼 李 <incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> "Li, Pan2" <pan2.li@intel.com> writes:
>> Hi Richard Sandiford,
>>
>> After some investigation, I am not sure whether it is possible to make it general without any changes to exact_div. We could add a function like the one below to get the unit poly for any N.
>>
>> template<unsigned int N, typename Ca>
>> inline POLY_CONST_RESULT (N, Ca, Ca)
>> normalize_to_unit (const poly_int_pod<N, Ca> &a)
>> {
>> typedef POLY_CONST_COEFF (Ca, Ca) C;
>>
>> poly_int<N, C> normalized = a;
>>
>> if (normalized.is_constant())
>> normalized.coeffs[0] = 1;
>> else
>> for (unsigned int i = 0; i < N; i++)
>> POLY_SET_COEFF (C, normalized, i, 1);
>>
>> return normalized;
>> }
>>
>> And then adjust genmodes as below to consume the unit poly.
>>
>> printf (" poly_uint16 unit_poly = "
>> "normalize_to_unit (mode_precision[E_%smode]);\n", m->name);
>> printf (" if (known_lt (mode_precision[E_%smode], "
>> "unit_poly * BITS_PER_UNIT))\n", m->name);
>> printf (" mode_size[E_%smode] = unit_poly;\n", m->name);
>>
>> I am not sure it is a good idea to introduce the above normalization code into exact_div, given that the comment on exact_div says “/* Return A / B, given that A is known to be a multiple of B. */”.
>
> My point was that we have multiple ways of dividing poly_ints:
>
> - exact_div, for when the caller knows that the result is always exact
> - can_div_trunc_p, for truncating division (round towards 0)
> - can_div_away_from_zero_p, for rounding away from 0
> - ...
>
> This is like how we have multiple division *_EXPRs on trees.
>
> Until now, exact_div was the correct choice for modes because vector
> modes didn't have padding. We're now changing that, so my suggestion
> in the review was to change the division operation that we use.
> Rather than use exact_div, we should now use can_div_away_from_zero_p,
> which would have the effect of rounding the quotient up.
>
> Something like:
>
> if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT,
> &mode_size[E_%smode]))
> gcc_unreachable ();
>
> But this will require a new overload of can_div_away_from_zero_p, since
> the existing one is for constant quotients rather than constant divisors.
>
> Thanks,
> Richard
>
>>
>> Could you please share your opinion on this from an expert's perspective? Thank you!
>>
>> Pan
>>
>> From: 盼 李 <incarnation.p.lee@outlook.com>
>> Sent: Monday, February 27, 2023 11:13 PM
>> To: Richard Sandiford <richard.sandiford@arm.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
>> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Li, Pan2 <pan2.li@intel.com>
>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>
>> No problem, I hope you had a good holiday.
>>
>> Thanks for pointing this out; the if part cannot handle a poly_int with N > 2. As I understand it, we need to make it general for every N of poly_int.
>>
>> So I would like to confirm with you how to make it general. I suppose there will be a new can_div_away_from_zero_p function to replace the if (known_lt (...)) part in genmodes.cc, leaving exact_div unchanged (given the word "exact", I suppose we should not touch it), right? Then we still need a poly_int with all N coefficients set to 1 as the result if can_div_away_from_zero_p returns true.
>>
>> Thanks again for your expert suggestion, and have a nice day!
>>
>> Pan
>> ________________________________
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Monday, February 27, 2023 22:24
>> To: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
>> Cc: incarnation.p.lee@outlook.com; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; pan2.li@intel.com
>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>
>> Sorry for the slow reply, been away for a couple of weeks.
>>
>> "incarnation.p.lee--- via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>>> From: Pan Li <pan2.li@intel.com>
>>>
>>> Fix the rvv bool mode precision bug by adjusting the precision.
>>> The bit size of vbool*_t will be adjusted to
>>> [1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The
>>> adjusted mode precision of vbool*_t will help the underlying passes
>>> make the right decision for both correctness and optimization.
>>>
>>> Given below sample code:
>>> void test_1(int8_t * restrict in, int8_t * restrict out)
>>> {
>>> vbool8_t v2 = *(vbool8_t*)in;
>>> vbool16_t v5 = *(vbool16_t*)in;
>>> *(vbool16_t*)(out + 200) = v5;
>>> *(vbool8_t*)(out + 100) = v2;
>>> }
>>>
>>> Before the precision adjustment:
>>> addi a4,a1,100
>>> vsetvli a5,zero,e8,m1,ta,ma
>>> addi a1,a1,200
>>> vlm.v v24,0(a0)
>>> vsm.v v24,0(a4)
>>> // Need one vsetvli and vlm.v for correctness here.
>>> vsm.v v24,0(a1)
>>>
>>> After the precision adjustment:
>>> csrr t0,vlenb
>>> slli t1,t0,1
>>> csrr a3,vlenb
>>> sub sp,sp,t1
>>> slli a4,a3,1
>>> add a4,a4,sp
>>> sub a3,a4,a3
>>> vsetvli a5,zero,e8,m1,ta,ma
>>> addi a2,a1,200
>>> vlm.v v24,0(a0)
>>> vsm.v v24,0(a3)
>>> addi a1,a1,100
>>> vsetvli a4,zero,e8,mf2,ta,ma
>>> csrr t0,vlenb
>>> vlm.v v25,0(a3)
>>> vsm.v v25,0(a2)
>>> slli t1,t0,1
>>> vsetvli a5,zero,e8,m1,ta,ma
>>> vsm.v v24,0(a1)
>>> add sp,sp,t1
>>> jr ra
>>>
>>> However, there may be some optimization opportunities after
>>> the mode precision adjustment. They can be taken care of in
>>> the RISC-V backend in separate follow-up PR(s).
>>>
>>> PR 108185
>>> PR 108654
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/riscv/riscv-modes.def (ADJUST_PRECISION):
>>> * config/riscv/riscv.cc (riscv_v_adjust_precision):
>>> * config/riscv/riscv.h (riscv_v_adjust_precision):
>>> * genmodes.cc (ADJUST_PRECISION):
>>> (emit_mode_adjustments):
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.target/riscv/pr108185-1.c: New test.
>>> * gcc.target/riscv/pr108185-2.c: New test.
>>> * gcc.target/riscv/pr108185-3.c: New test.
>>> * gcc.target/riscv/pr108185-4.c: New test.
>>> * gcc.target/riscv/pr108185-5.c: New test.
>>> * gcc.target/riscv/pr108185-6.c: New test.
>>> * gcc.target/riscv/pr108185-7.c: New test.
>>> * gcc.target/riscv/pr108185-8.c: New test.
>>>
>>> Signed-off-by: Pan Li <pan2.li@intel.com>
>>> ---
>>> gcc/config/riscv/riscv-modes.def | 8 +++
>>> gcc/config/riscv/riscv.cc | 12 ++++
>>> gcc/config/riscv/riscv.h | 1 +
>>> gcc/genmodes.cc | 25 ++++++-
>>> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
>>> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
>>> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
>>> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
>>> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
>>> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
>>> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
>>> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
>>> 12 files changed, 598 insertions(+), 1 deletion(-)
>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>
>>> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
>>> index d5305efa8a6..110bddce851 100644
>>> --- a/gcc/config/riscv/riscv-modes.def
>>> +++ b/gcc/config/riscv/riscv-modes.def
>>> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>> ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>> ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>>>
>>> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
>>> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
>>> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
>>> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
>>> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
>>> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
>>> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
>>> +
>>> /*
>>> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
>>> | | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
>>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>>> index de3e1f903c7..cbe66c0e35b 100644
>>> --- a/gcc/config/riscv/riscv.cc
>>> +++ b/gcc/config/riscv/riscv.cc
>>> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
>>> return scale;
>>> }
>>>
>>> +/* Called from ADJUST_PRECISION in riscv-modes.def. Return the correct
>>> + PRECISION size for the corresponding machine_mode. */
>>> +
>>> +poly_int64
>>> +riscv_v_adjust_precision (machine_mode mode, int scale)
>>> +{
>>> + if (riscv_v_ext_vector_mode_p (mode))
>>> + return riscv_vector_chunks * scale;
>>> +
>>> + return scale;
>>> +}
>>> +
>>> /* Return true if X is a valid address for machine mode MODE. If it is,
>>> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
>>> effect. */
>>> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
>>> index 5bc7f2f467d..15b9317a8ce 100644
>>> --- a/gcc/config/riscv/riscv.h
>>> +++ b/gcc/config/riscv/riscv.h
>>> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
>>> extern unsigned riscv_bytes_per_vector_chunk;
>>> extern poly_uint16 riscv_vector_chunks;
>>> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
>>> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
>>> /* The number of bits and bytes in a RVV vector. */
>>> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
>>> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
>>> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
>>> index 2d418f09aab..12f4e6335e6 100644
>>> --- a/gcc/genmodes.cc
>>> +++ b/gcc/genmodes.cc
>>> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
>>> static struct mode_adjust *adj_format;
>>> static struct mode_adjust *adj_ibit;
>>> static struct mode_adjust *adj_fbit;
>>> +static struct mode_adjust *adj_precision;
>>>
>>> /* Mode class operations. */
>>> static enum mode_class
>>> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
>>> #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
>>> #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
>>> #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
>>> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
>>> #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
>>> #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
>>> #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
>>> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
>>> " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
>>> m->name, m->name);
>>> printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
>>> - printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>>> + /* Normalize the size to 1 if precision is less than BITS_PER_UNIT. */
>>> + printf (" poly_uint16 size_one = "
>>> + "mode_precision[E_%smode].is_constant ()\n", m->name);
>>> + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
>>
>> Have you tried this on an x86_64 system? I wouldn't expect it to work
>> because of the:
>>
>> STATIC_ASSERT (N >= 2);
>>
>> in the poly_uint16 constructor.
>>
>>> + printf (" if (known_lt (mode_precision[E_%smode], "
>>> + "size_one * BITS_PER_UNIT))\n", m->name);
>>> + printf (" mode_size[E_%smode] = size_one;\n", m->name);
>>> + printf (" else\n");
>>> + printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>>
>> Now that the assert implicit in the original exact_div no longer holds,
>> I think we should instead generalise it to can_div_away_from_zero_p
>> (which will involve defining a new overload of can_div_away_from_zero_p).
>> I think that will give the same result as the code above for the cases
>> that the code above handles. But it should be more general too.
>>
>> TBH, I'm still sceptical that this is all that is needed. It seems
>> unlikely that we've been so good at writing vector support code that
>> we've made it work for precision < bitsize, despite that being an
>> unsupported combination until now. But I guess we can fix problems
>> on a case-by-case basis.
>>
>> Thanks,
>> Richard
>>
>>> " BITS_PER_UNIT);\n", m->name, m->name);
>>> printf (" mode_nunits[E_%smode] = ps;\n", m->name);
>>> printf (" adjust_mode_mask (E_%smode);\n", m->name);
>>> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
>>> printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
>>> a->file, a->line, a->mode->name, a->adjustment);
>>>
>>> + /* Adjust precision to the actual bits size. */
>>> + for (a = adj_precision; a; a = a->next)
>>> + switch (a->mode->cl)
>>> + {
>>> + case MODE_VECTOR_BOOL:
>>> + printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
>>> + a->adjustment);
>>> + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
>>> + break;
>>> + default:
>>> + break;
>>> + }
>>> +
>>> puts ("}");
>>> }
>>>
>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>> new file mode 100644
>>> index 00000000000..e70960c5b6d
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>> @@ -0,0 +1,68 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>> +
>>> +#include "riscv_vector.h"
>>> +
>>> +void
>>> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool1_t v1 = *(vbool1_t*)in;
>>> + vbool2_t v2 = *(vbool2_t*)in;
>>> +
>>> + *(vbool1_t*)(out + 100) = v1;
>>> + *(vbool2_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool1_t v1 = *(vbool1_t*)in;
>>> + vbool4_t v2 = *(vbool4_t*)in;
>>> +
>>> + *(vbool1_t*)(out + 100) = v1;
>>> + *(vbool4_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool1_t v1 = *(vbool1_t*)in;
>>> + vbool8_t v2 = *(vbool8_t*)in;
>>> +
>>> + *(vbool1_t*)(out + 100) = v1;
>>> + *(vbool8_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool1_t v1 = *(vbool1_t*)in;
>>> + vbool16_t v2 = *(vbool16_t*)in;
>>> +
>>> + *(vbool1_t*)(out + 100) = v1;
>>> + *(vbool16_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool1_t v1 = *(vbool1_t*)in;
>>> + vbool32_t v2 = *(vbool32_t*)in;
>>> +
>>> + *(vbool1_t*)(out + 100) = v1;
>>> + *(vbool32_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool1_t v1 = *(vbool1_t*)in;
>>> + vbool64_t v2 = *(vbool64_t*)in;
>>> +
>>> + *(vbool1_t*)(out + 100) = v1;
>>> + *(vbool64_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>> new file mode 100644
>>> index 00000000000..dcc7a644a88
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>> @@ -0,0 +1,68 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>> +
>>> +#include "riscv_vector.h"
>>> +
>>> +void
>>> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool2_t v1 = *(vbool2_t*)in;
>>> + vbool1_t v2 = *(vbool1_t*)in;
>>> +
>>> + *(vbool2_t*)(out + 100) = v1;
>>> + *(vbool1_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool2_t v1 = *(vbool2_t*)in;
>>> + vbool4_t v2 = *(vbool4_t*)in;
>>> +
>>> + *(vbool2_t*)(out + 100) = v1;
>>> + *(vbool4_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool2_t v1 = *(vbool2_t*)in;
>>> + vbool8_t v2 = *(vbool8_t*)in;
>>> +
>>> + *(vbool2_t*)(out + 100) = v1;
>>> + *(vbool8_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool2_t v1 = *(vbool2_t*)in;
>>> + vbool16_t v2 = *(vbool16_t*)in;
>>> +
>>> + *(vbool2_t*)(out + 100) = v1;
>>> + *(vbool16_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool2_t v1 = *(vbool2_t*)in;
>>> + vbool32_t v2 = *(vbool32_t*)in;
>>> +
>>> + *(vbool2_t*)(out + 100) = v1;
>>> + *(vbool32_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool2_t v1 = *(vbool2_t*)in;
>>> + vbool64_t v2 = *(vbool64_t*)in;
>>> +
>>> + *(vbool2_t*)(out + 100) = v1;
>>> + *(vbool64_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>> new file mode 100644
>>> index 00000000000..3af0513e006
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>> @@ -0,0 +1,68 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>> +
>>> +#include "riscv_vector.h"
>>> +
>>> +void
>>> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool4_t v1 = *(vbool4_t*)in;
>>> + vbool1_t v2 = *(vbool1_t*)in;
>>> +
>>> + *(vbool4_t*)(out + 100) = v1;
>>> + *(vbool1_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool4_t v1 = *(vbool4_t*)in;
>>> + vbool2_t v2 = *(vbool2_t*)in;
>>> +
>>> + *(vbool4_t*)(out + 100) = v1;
>>> + *(vbool2_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool4_t v1 = *(vbool4_t*)in;
>>> + vbool8_t v2 = *(vbool8_t*)in;
>>> +
>>> + *(vbool4_t*)(out + 100) = v1;
>>> + *(vbool8_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool4_t v1 = *(vbool4_t*)in;
>>> + vbool16_t v2 = *(vbool16_t*)in;
>>> +
>>> + *(vbool4_t*)(out + 100) = v1;
>>> + *(vbool16_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool4_t v1 = *(vbool4_t*)in;
>>> + vbool32_t v2 = *(vbool32_t*)in;
>>> +
>>> + *(vbool4_t*)(out + 100) = v1;
>>> + *(vbool32_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool4_t v1 = *(vbool4_t*)in;
>>> + vbool64_t v2 = *(vbool64_t*)in;
>>> +
>>> + *(vbool4_t*)(out + 100) = v1;
>>> + *(vbool64_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>> new file mode 100644
>>> index 00000000000..ea3c360d756
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>> @@ -0,0 +1,68 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>> +
>>> +#include "riscv_vector.h"
>>> +
>>> +void
>>> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool8_t v1 = *(vbool8_t*)in;
>>> + vbool1_t v2 = *(vbool1_t*)in;
>>> +
>>> + *(vbool8_t*)(out + 100) = v1;
>>> + *(vbool1_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool8_t v1 = *(vbool8_t*)in;
>>> + vbool2_t v2 = *(vbool2_t*)in;
>>> +
>>> + *(vbool8_t*)(out + 100) = v1;
>>> + *(vbool2_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool8_t v1 = *(vbool8_t*)in;
>>> + vbool4_t v2 = *(vbool4_t*)in;
>>> +
>>> + *(vbool8_t*)(out + 100) = v1;
>>> + *(vbool4_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool8_t v1 = *(vbool8_t*)in;
>>> + vbool16_t v2 = *(vbool16_t*)in;
>>> +
>>> + *(vbool8_t*)(out + 100) = v1;
>>> + *(vbool16_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool8_t v1 = *(vbool8_t*)in;
>>> + vbool32_t v2 = *(vbool32_t*)in;
>>> +
>>> + *(vbool8_t*)(out + 100) = v1;
>>> + *(vbool32_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool8_t v1 = *(vbool8_t*)in;
>>> + vbool64_t v2 = *(vbool64_t*)in;
>>> +
>>> + *(vbool8_t*)(out + 100) = v1;
>>> + *(vbool64_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>> new file mode 100644
>>> index 00000000000..9fc659d2402
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>> @@ -0,0 +1,68 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>> +
>>> +#include "riscv_vector.h"
>>> +
>>> +void
>>> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool16_t v1 = *(vbool16_t*)in;
>>> + vbool1_t v2 = *(vbool1_t*)in;
>>> +
>>> + *(vbool16_t*)(out + 100) = v1;
>>> + *(vbool1_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool16_t v1 = *(vbool16_t*)in;
>>> + vbool2_t v2 = *(vbool2_t*)in;
>>> +
>>> + *(vbool16_t*)(out + 100) = v1;
>>> + *(vbool2_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool16_t v1 = *(vbool16_t*)in;
>>> + vbool4_t v2 = *(vbool4_t*)in;
>>> +
>>> + *(vbool16_t*)(out + 100) = v1;
>>> + *(vbool4_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool16_t v1 = *(vbool16_t*)in;
>>> + vbool8_t v2 = *(vbool8_t*)in;
>>> +
>>> + *(vbool16_t*)(out + 100) = v1;
>>> + *(vbool8_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool16_t v1 = *(vbool16_t*)in;
>>> + vbool32_t v2 = *(vbool32_t*)in;
>>> +
>>> + *(vbool16_t*)(out + 100) = v1;
>>> + *(vbool32_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool16_t v1 = *(vbool16_t*)in;
>>> + vbool64_t v2 = *(vbool64_t*)in;
>>> +
>>> + *(vbool16_t*)(out + 100) = v1;
>>> + *(vbool64_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>> new file mode 100644
>>> index 00000000000..98275e5267d
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>> @@ -0,0 +1,68 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>> +
>>> +#include "riscv_vector.h"
>>> +
>>> +void
>>> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool32_t v1 = *(vbool32_t*)in;
>>> + vbool1_t v2 = *(vbool1_t*)in;
>>> +
>>> + *(vbool32_t*)(out + 100) = v1;
>>> + *(vbool1_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool32_t v1 = *(vbool32_t*)in;
>>> + vbool2_t v2 = *(vbool2_t*)in;
>>> +
>>> + *(vbool32_t*)(out + 100) = v1;
>>> + *(vbool2_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool32_t v1 = *(vbool32_t*)in;
>>> + vbool4_t v2 = *(vbool4_t*)in;
>>> +
>>> + *(vbool32_t*)(out + 100) = v1;
>>> + *(vbool4_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool32_t v1 = *(vbool32_t*)in;
>>> + vbool8_t v2 = *(vbool8_t*)in;
>>> +
>>> + *(vbool32_t*)(out + 100) = v1;
>>> + *(vbool8_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool32_t v1 = *(vbool32_t*)in;
>>> + vbool16_t v2 = *(vbool16_t*)in;
>>> +
>>> + *(vbool32_t*)(out + 100) = v1;
>>> + *(vbool16_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool32_t v1 = *(vbool32_t*)in;
>>> + vbool64_t v2 = *(vbool64_t*)in;
>>> +
>>> + *(vbool32_t*)(out + 100) = v1;
>>> + *(vbool64_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>> new file mode 100644
>>> index 00000000000..8f6f0b11f09
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>> @@ -0,0 +1,68 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>> +
>>> +#include "riscv_vector.h"
>>> +
>>> +void
>>> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool64_t v1 = *(vbool64_t*)in;
>>> + vbool1_t v2 = *(vbool1_t*)in;
>>> +
>>> + *(vbool64_t*)(out + 100) = v1;
>>> + *(vbool1_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool64_t v1 = *(vbool64_t*)in;
>>> + vbool2_t v2 = *(vbool2_t*)in;
>>> +
>>> + *(vbool64_t*)(out + 100) = v1;
>>> + *(vbool2_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool64_t v1 = *(vbool64_t*)in;
>>> + vbool4_t v2 = *(vbool4_t*)in;
>>> +
>>> + *(vbool64_t*)(out + 100) = v1;
>>> + *(vbool4_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool64_t v1 = *(vbool64_t*)in;
>>> + vbool8_t v2 = *(vbool8_t*)in;
>>> +
>>> + *(vbool64_t*)(out + 100) = v1;
>>> + *(vbool8_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool64_t v1 = *(vbool64_t*)in;
>>> + vbool16_t v2 = *(vbool16_t*)in;
>>> +
>>> + *(vbool64_t*)(out + 100) = v1;
>>> + *(vbool16_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool64_t v1 = *(vbool64_t*)in;
>>> + vbool32_t v2 = *(vbool32_t*)in;
>>> +
>>> + *(vbool64_t*)(out + 100) = v1;
>>> + *(vbool32_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>> new file mode 100644
>>> index 00000000000..d96959dd064
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>> @@ -0,0 +1,77 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>> +
>>> +#include "riscv_vector.h"
>>> +
>>> +void
>>> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool1_t v1 = *(vbool1_t*)in;
>>> + vbool1_t v2 = *(vbool1_t*)in;
>>> +
>>> + *(vbool1_t*)(out + 100) = v1;
>>> + *(vbool1_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool2_t v1 = *(vbool2_t*)in;
>>> + vbool2_t v2 = *(vbool2_t*)in;
>>> +
>>> + *(vbool2_t*)(out + 100) = v1;
>>> + *(vbool2_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool4_t v1 = *(vbool4_t*)in;
>>> + vbool4_t v2 = *(vbool4_t*)in;
>>> +
>>> + *(vbool4_t*)(out + 100) = v1;
>>> + *(vbool4_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool8_t v1 = *(vbool8_t*)in;
>>> + vbool8_t v2 = *(vbool8_t*)in;
>>> +
>>> + *(vbool8_t*)(out + 100) = v1;
>>> + *(vbool8_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool16_t v1 = *(vbool16_t*)in;
>>> + vbool16_t v2 = *(vbool16_t*)in;
>>> +
>>> + *(vbool16_t*)(out + 100) = v1;
>>> + *(vbool16_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool32_t v1 = *(vbool32_t*)in;
>>> + vbool32_t v2 = *(vbool32_t*)in;
>>> +
>>> + *(vbool32_t*)(out + 100) = v1;
>>> + *(vbool32_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool64_t v1 = *(vbool64_t*)in;
>>> + vbool64_t v2 = *(vbool64_t*)in;
>>> +
>>> + *(vbool64_t*)(out + 100) = v1;
>>> + *(vbool64_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
Thank you all for your quick response.
As juzhe mentioned, RISC-V memory accesses for the mask types are always aligned to a byte boundary in the compact layout, i.e. ceil(vl / 8) bytes for vbool*.
Actually, the [4, 4] data comes from the self-test; the RISC-V mode precisions are listed below.
VNx64BI precision [0x40, 0x40].
VNx32BI precision [0x20, 0x20].
VNx16BI precision [0x10, 0x10].
VNx8BI precision [0x8, 0x8].
VNx4BI precision [0x8, 0x8].
VNx2BI precision [0x8, 0x8].
VNx1BI precision [0x8, 0x8].
The [4, 4] data affects the genmodes part: we cannot write the code as below, because the gcc_unreachable would be hit.
if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT, &mode_size[E_%smode]))
gcc_unreachable (); // Hit on [4, 4] of the self-test.
Pan
________________________________
From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
Sent: Wednesday, March 1, 2023 18:46
To: richard.sandiford <richard.sandiford@arm.com>; pan2.li <pan2.li@intel.com>
Cc: incarnation.p.lee <incarnation.p.lee@outlook.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>
Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
>> bits, even when that means accessing fully-unused bytes? E.g. 4+4X
>> when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
>> 8+8X would be 32 bits/4 bytes. So a store of [8,8] for a precision
>> of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
Hi, Richard. Thank you for helping us.
My understanding of the RVV ISA:
In RVV, we have mask modes (VNx1BI, VNx2BI, etc.) and data modes (VNx1QI, VNx4QI, VNx2DI, etc.).
For data modes, we access the data fully and there are no unused bytes, so we don't need to adjust the precision.
However, for mask modes we access the mask bits in a compact layout (the mask bits for the corresponding elements are stored consecutively).
For example, in the current configuration, VNx1BI, VNx2BI, VNx4BI and VNx8BI all have the same bytesize (1,1) but different bitsizes.
VNx8BI is accessed fully, but only 1/2 of VNx4BI, 1/4 of VNx2BI and 1/8 of VNx1BI is accessed, all with byte alignment (I am not sure whether RVV supports bit alignment; I guess it does not).
If VNx8BI only occupies 1 byte (depending on the machine vector length), then VNx4BI, VNx2BI and VNx1BI are 4/8, 2/8 and 1/8 of a byte. I think we can't access memory with bit alignment, so they will all be the same in the access.
However, if VNx8BI occupies 8 bytes, then VNx4BI, VNx2BI and VNx1BI are 4 bytes, 2 bytes and 1 byte, so they access different sizes.
This is my understanding of the RVV ISA; feel free to correct me.
Thanks.
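The byte counts in the explanation above can be reproduced with a small sketch. It assumes, as stated elsewhere in this thread, that a mask access touches ceil(vl / 8) bytes. The helpers `useful_mask_bits`, `mask_access_bytes` and `print_mask_sizes` are hypothetical names used only for illustration; they are not part of GCC.

```cpp
#include <cstdio>

// Hypothetical helpers, not part of GCC: they only restate the
// arithmetic from the mail above.

// Useful mask bits of VNx<K>BI when VNx8BI occupies VNX8BI_BYTES bytes.
// VNx8BI is fully used, so VNx<K>BI uses K/8 of those bits.
constexpr unsigned
useful_mask_bits (unsigned vnx8bi_bytes, unsigned k)
{
  return vnx8bi_bytes * 8 * k / 8;
}

// Bytes touched by a mask access, assuming ceil (bits / 8) bytes.
constexpr unsigned
mask_access_bytes (unsigned bits)
{
  return (bits + 7) / 8;
}

// VNx8BI occupies 1 byte: every narrower mask still touches 1 byte.
static_assert (mask_access_bytes (useful_mask_bits (1, 4)) == 1, "VNx4BI");
static_assert (mask_access_bytes (useful_mask_bits (1, 2)) == 1, "VNx2BI");
static_assert (mask_access_bytes (useful_mask_bits (1, 1)) == 1, "VNx1BI");

// VNx8BI occupies 8 bytes: the access sizes now differ.
static_assert (mask_access_bytes (useful_mask_bits (8, 4)) == 4, "VNx4BI");
static_assert (mask_access_bytes (useful_mask_bits (8, 2)) == 2, "VNx2BI");
static_assert (mask_access_bytes (useful_mask_bits (8, 1)) == 1, "VNx1BI");

// Print one line per mask mode for a given VNx8BI size.
void
print_mask_sizes (unsigned vnx8bi_bytes)
{
  for (unsigned k = 8; k >= 1; k /= 2)
    {
      unsigned bits = useful_mask_bits (vnx8bi_bytes, k);
      std::printf ("VNx%uBI: %u useful bits, %u byte(s) accessed\n",
                   k, bits, mask_access_bytes (bits));
    }
}
```

Calling `print_mask_sizes (1)` and `print_mask_sizes (8)` lists the two cases discussed above side by side.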
________________________________
juzhe.zhong@rivai.ai
From: Richard Sandiford<mailto:richard.sandiford@arm.com>
Date: 2023-03-01 18:11
To: Li\, Pan2<mailto:pan2.li@intel.com>
CC: 盼 李<mailto:incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches<mailto:gcc-patches@gcc.gnu.org>; juzhe.zhong\@rivai.ai<mailto:juzhe.zhong@rivai.ai>; kito.cheng\@sifive.com<mailto:kito.cheng@sifive.com>; rguenther\@suse.de<mailto:rguenther@suse.de>
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
"Li, Pan2" <pan2.li@intel.com> writes:
> Hi Richard Sandiford,
>
> I just tried the overload for constant divisors with the printf below, and it works as you mentioned!
>
> printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
> "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
>
> template<unsigned int N, typename Ca, typename Cb, typename Cq>
> inline typename if_nonpoly<Cb, bool>::type
> can_div_away_from_zero_p (const poly_int_pod<N, Ca> &a,
> Cb b,
> poly_int_pod<N, Cq> *quotient)
> {
> if (!can_div_trunc_p (a, b, quotient))
> return false;
> if (maybe_ne (*quotient * b, a))
> for (unsigned int i = 0; i < N; ++i)
> quotient->coeffs[i] += (quotient->coeffs[i] < 0 ? -1 : 1);
> return true;
> }
>
> But I have a question about one case, as below.
>
> Assume:
> a = [4, 4], b = 8.
>
> When can_div_trunc_p is called, it checks whether the remainder is constant, i.e. a.coeffs[i] % 8 == 0 (i >= 1). If the remainder is not constant, can_div_trunc_p leaves the quotient untouched and returns false.
>
> Thus, when a = [4, 4] for can_div_away_from_zero_p, the output *quotient will be unchanged, i.e. mode_size[E_%smode] will be unchanged for this case. However, the underlying mode_size will adjust it to the real byte size, and I am not sure whether this is by design or requires additional handling.
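The [4, 4] case can be illustrated with a toy model. This is only a sketch under assumptions: `toy_poly`, `toy_can_div_trunc` and `toy_can_div_away_from_zero` are hypothetical stand-ins written for this note, not GCC's poly_int code. The truncating step follows the behaviour described above (fail unless the remainder is constant), the divisor is assumed to be positive, and rounding only the constant term is a simplification chosen for the sketch.

```cpp
#include <cstdio>

// Toy two-coefficient value c[0] + c[1] * X.  A stand-in for
// illustration only; this is not GCC's poly_int.
struct toy_poly
{
  long c[2];
};

// Truncating division by a positive constant B.  As described above,
// fail and leave *QUOTIENT untouched unless the remainder is constant,
// i.e. unless the X coefficient is a multiple of B.
inline bool
toy_can_div_trunc (const toy_poly &a, long b, toy_poly *quotient)
{
  if (a.c[1] % b != 0)
    return false;
  quotient->c[0] = a.c[0] / b;
  quotient->c[1] = a.c[1] / b;
  return true;
}

// Round away from zero on top of the truncating division.  Once the
// truncating step succeeds, only the constant term can leave a
// remainder, so only c[0] is adjusted (a simplification of this sketch).
inline bool
toy_can_div_away_from_zero (const toy_poly &a, long b, toy_poly *quotient)
{
  if (!toy_can_div_trunc (a, b, quotient))
    return false;
  if (quotient->c[0] * b != a.c[0])
    quotient->c[0] += (a.c[0] < 0 ? -1 : 1);
  return true;
}

// Print the outcome for one input.  The sentinel makes an untouched
// quotient visible.
void
toy_show (const toy_poly &a, long b)
{
  toy_poly q = { { -1, -1 } };
  if (toy_can_div_away_from_zero (a, b, &q))
    std::printf ("[%ld, %ld] / %ld -> [%ld, %ld]\n",
                 a.c[0], a.c[1], b, q.c[0], q.c[1]);
  else
    std::printf ("[%ld, %ld] / %ld -> no quotient, output untouched\n",
                 a.c[0], a.c[1], b);
}
```

In this model, [8, 8] / 8 succeeds with quotient [1, 1], a constant [4, 0] / 8 rounds up to [1, 0], and [4, 4] / 8 fails without writing the quotient, which is the situation the question above is about.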
Is it right that, for RVV, a load or store of [4,4] will access [8,8]
bits, even when that means accessing fully-unused bytes? E.g. 4+4X
when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
8+8X would be 32 bits/4 bytes. So a store of [8,8] for a precision
of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
Richard
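The arithmetic in the question above can be checked mechanically. The sketch below evaluates a two-coefficient value c0 + c1 * X at X = 3; `poly_value` is a hypothetical helper named for this note and is not a GCC function.

```cpp
// Hypothetical helper: evaluate c0 + c1 * X for a given runtime X.
constexpr unsigned
poly_value (unsigned c0, unsigned c1, unsigned x)
{
  return c0 + c1 * x;
}

// Precision [4,4] and bitsize [8,8] at X == 3, in bits.
constexpr unsigned precision_bits = poly_value (4, 4, 3);
constexpr unsigned bitsize_bits = poly_value (8, 8, 3);

static_assert (precision_bits == 16, "2 bytes of useful data");
static_assert (bitsize_bits == 32, "4 bytes stored for an [8,8] bitsize");
static_assert ((bitsize_bits - precision_bits) / 8 == 2,
               "2 bytes beyond the end of the useful data");
```

The checks compile only if the 16-bit, 32-bit and 2-byte figures quoted above are consistent with each other.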
> Pan
>
> From: 盼 李 <incarnation.p.lee@outlook.com>
> Sent: Tuesday, February 28, 2023 5:59 PM
> To: Richard Sandiford <richard.sandiford@arm.com>; Li, Pan2 <pan2.li@intel.com>
> Cc: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> Understood, thanks for the explanations and suggestions. Let me have a try and keep you posted.
>
> Pan
> ________________________________
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Tuesday, February 28, 2023 17:50
> To: Li, Pan2 <pan2.li@intel.com>
> Cc: 盼 李 <incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> "Li, Pan2" <pan2.li@intel.com> writes:
>> Hi Richard Sandiford,
>>
>> After some investigation, I am not sure if it is possible to make it general without any changes to exact_div. We can add one method like below to get the unit poly for all possible N.
>>
>> template<unsigned int N, typename Ca>
>> inline POLY_CONST_RESULT (N, Ca, Ca)
>> normalize_to_unit (const poly_int_pod<N, Ca> &a)
>> {
>>   typedef POLY_CONST_COEFF (Ca, Ca) C;
>>
>>   poly_int<N, C> normalized = a;
>>
>>   if (normalized.is_constant ())
>>     normalized.coeffs[0] = 1;
>>   else
>>     for (unsigned int i = 0; i < N; i++)
>>       POLY_SET_COEFF (C, normalized, i, 1);
>>
>>   return normalized;
>> }
>>
>> And then adjust the genmodes like below to consume the unit poly.
>>
>> printf (" poly_uint16 unit_poly = "
>> "normalize_to_unit (mode_precision[E_%smode]);\n", m->name);
>> printf (" if (known_lt (mode_precision[E_%smode], "
>> "unit_poly * BITS_PER_UNIT))\n", m->name);
>> printf (" mode_size[E_%smode] = unit_poly;\n", m->name);
>>
>> I am not sure it is a good idea to introduce the above normalization code into exact_div, given that the comment on exact_div says “/* Return A / B, given that A is known to be a multiple of B. */”.
>
> My point was that we have multiple ways of dividing poly_ints:
>
> - exact_div, for when the caller knows that the result is always exact
> - can_div_trunc_p, for truncating division (round towards 0)
> - can_div_away_from_zero_p, for rounding away from 0
> - ...
>
> This is like how we have multiple division *_EXPRs on trees.
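
For illustration only, the three rounding behaviours listed above can be modelled on plain unsigned integers. This is a minimal sketch; the `*_model` function names are hypothetical and are not the GCC poly_int functions of similar name.

```cpp
#include <cassert>

// Exact division: the caller promises that b divides a.
inline unsigned exact_div_model (unsigned a, unsigned b)
{
  assert (a % b == 0);
  return a / b;
}

// Truncating division: round the quotient towards zero.
inline unsigned div_trunc_model (unsigned a, unsigned b)
{
  return a / b;
}

// Division rounding away from zero; for unsigned operands this is
// the same as rounding the quotient up.
inline unsigned div_away_from_zero_model (unsigned a, unsigned b)
{
  return a / b + (a % b != 0 ? 1 : 0);
}
```

With a precision of 4 bits and 8 bits per unit, the truncating model gives 0 bytes and the away-from-zero model gives 1 byte, while the exact model would trip its assertion because 4 is not a multiple of 8.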
>
> Until now, exact_div was the correct choice for modes because vector
> modes didn't have padding. We're now changing that, so my suggestion
> in the review was to change the division operation that we use.
> Rather than use exact_div, we should now use can_div_away_from_zero_p,
> which would have the effect of rounding the quotient up.
>
> Something like:
>
> if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT,
> &mode_size[E_%smode]))
> gcc_unreachable ();
>
> But this will require a new overload of can_div_away_from_zero_p, since
> the existing one is for constant quotients rather than constant divisors.
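
A minimal sketch of what a rounding-up division with a constant divisor might look like, written against a hypothetical two-coefficient `Poly2` type rather than GCC's poly_int. The function name, the unsigned-only arithmetic, and the rule that it fails when the X coefficient is not a multiple of the divisor are all assumptions of this model; a real overload may behave differently.

```cpp
#include <cassert>

// Hypothetical model of c0 + c1 * X with X >= 0; not GCC's poly_int.
struct Poly2 { unsigned c0, c1; };

// Model of dividing A by the constant B, rounding the result up.
// It succeeds only when the X coefficient is an exact multiple of B,
// so that the quotient is itself a polynomial in X.  On failure
// *QUOTIENT is left untouched.
inline bool
can_div_away_from_zero_model (Poly2 a, unsigned b, Poly2 *quotient)
{
  if (a.c1 % b != 0)
    return false;
  // Only the constant term can leave a remainder, so round it up.
  quotient->c0 = a.c0 / b + (a.c0 % b != 0 ? 1 : 0);
  quotient->c1 = a.c1 / b;
  return true;
}
```

In this model a constant precision of 4 bits divided by 8 yields a size of 1, and [12, 16] divided by 8 yields [2, 2], whereas [4, 4] divided by 8 is rejected and the caller's quotient keeps its previous value.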
>
> Thanks,
> Richard
>
>>
>> Could you please share your opinion on this from an expert's perspective? Thank you!
>>
>> Pan
>>
>> From: 盼 李 <incarnation.p.lee@outlook.com>
>> Sent: Monday, February 27, 2023 11:13 PM
>> To: Richard Sandiford <richard.sandiford@arm.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
>> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Li, Pan2 <pan2.li@intel.com>
>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>
>> No problem, I hope you had a good holiday.
>>
>> Thanks for pointing this out; the if part cannot handle a poly_int with N > 2. As I understand it, we need to make it general for every N of poly_int.
>>
>> So I would like to confirm with you how to make it general. I suppose there will be a new function can_div_away_from_zero_p to replace the if (known_lt (,)) part in genmodes.cc, leaving exact_div unchanged (given the word "exact", I suppose we should not touch it), right? Then we still need a poly_int with all coefficients set to 1 as the result when can_div_away_from_zero_p returns true.
>>
>> Thanks again for your professional suggestions, and have a nice day!
>>
>> Pan
>> ________________________________
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Monday, February 27, 2023 22:24
>> To: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
>> Cc: incarnation.p.lee@outlook.com; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; pan2.li@intel.com
>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>
>> Sorry for the slow reply, been away for a couple of weeks.
>>
>> "incarnation.p.lee--- via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>>> From: Pan Li <pan2.li@intel.com>
>>>
>>> Fix the rvv bool mode precision bug by adjusting the precision.
>>> The bit size of vbool*_t is adjusted to
>>> [1, 2, 4, 8, 16, 32, 64] according to the RVV 1.0 ISA spec. The
>>> adjusted mode precision of vbool*_t helps the underlying passes
>>> make the right decisions for both correctness and optimization.
>>>
>>> Given the sample code below:
>>> void test_1(int8_t * restrict in, int8_t * restrict out)
>>> {
>>> vbool8_t v2 = *(vbool8_t*)in;
>>> vbool16_t v5 = *(vbool16_t*)in;
>>> *(vbool16_t*)(out + 200) = v5;
>>> *(vbool8_t*)(out + 100) = v2;
>>> }
>>>
>>> Before the precision adjustment:
>>> addi a4,a1,100
>>> vsetvli a5,zero,e8,m1,ta,ma
>>> addi a1,a1,200
>>> vlm.v v24,0(a0)
>>> vsm.v v24,0(a4)
>>> // Need one vsetvli and vlm.v for correctness here.
>>> vsm.v v24,0(a1)
>>>
>>> After the precision adjustment:
>>> csrr t0,vlenb
>>> slli t1,t0,1
>>> csrr a3,vlenb
>>> sub sp,sp,t1
>>> slli a4,a3,1
>>> add a4,a4,sp
>>> sub a3,a4,a3
>>> vsetvli a5,zero,e8,m1,ta,ma
>>> addi a2,a1,200
>>> vlm.v v24,0(a0)
>>> vsm.v v24,0(a3)
>>> addi a1,a1,100
>>> vsetvli a4,zero,e8,mf2,ta,ma
>>> csrr t0,vlenb
>>> vlm.v v25,0(a3)
>>> vsm.v v25,0(a2)
>>> slli t1,t0,1
>>> vsetvli a5,zero,e8,m1,ta,ma
>>> vsm.v v24,0(a1)
>>> add sp,sp,t1
>>> jr ra
>>>
>>> However, there may be some optimization opportunities after
>>> the mode precision adjustment. They can be taken care of in
>>> the RISC-V backend in separate follow-up PR(s).
>>>
>>> PR 108185
>>> PR 108654
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/riscv/riscv-modes.def (ADJUST_PRECISION):
>>> * config/riscv/riscv.cc (riscv_v_adjust_precision):
>>> * config/riscv/riscv.h (riscv_v_adjust_precision):
>>> * genmodes.cc (ADJUST_PRECISION):
>>> (emit_mode_adjustments):
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.target/riscv/pr108185-1.c: New test.
>>> * gcc.target/riscv/pr108185-2.c: New test.
>>> * gcc.target/riscv/pr108185-3.c: New test.
>>> * gcc.target/riscv/pr108185-4.c: New test.
>>> * gcc.target/riscv/pr108185-5.c: New test.
>>> * gcc.target/riscv/pr108185-6.c: New test.
>>> * gcc.target/riscv/pr108185-7.c: New test.
>>> * gcc.target/riscv/pr108185-8.c: New test.
>>>
>>> Signed-off-by: Pan Li <pan2.li@intel.com>
>>> ---
>>> gcc/config/riscv/riscv-modes.def | 8 +++
>>> gcc/config/riscv/riscv.cc | 12 ++++
>>> gcc/config/riscv/riscv.h | 1 +
>>> gcc/genmodes.cc | 25 ++++++-
>>> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
>>> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
>>> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
>>> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
>>> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
>>> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
>>> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
>>> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
>>> 12 files changed, 598 insertions(+), 1 deletion(-)
>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>
>>> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
>>> index d5305efa8a6..110bddce851 100644
>>> --- a/gcc/config/riscv/riscv-modes.def
>>> +++ b/gcc/config/riscv/riscv-modes.def
>>> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>> ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>> ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>>>
>>> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
>>> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
>>> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
>>> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
>>> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
>>> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
>>> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
>>> +
>>> /*
>>> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
>>> | | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
>>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>>> index de3e1f903c7..cbe66c0e35b 100644
>>> --- a/gcc/config/riscv/riscv.cc
>>> +++ b/gcc/config/riscv/riscv.cc
>>> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
>>> return scale;
>>> }
>>>
>>> +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
>>> + PRECISION size for corresponding machine_mode. */
>>> +
>>> +poly_int64
>>> +riscv_v_adjust_precision (machine_mode mode, int scale)
>>> +{
>>> + if (riscv_v_ext_vector_mode_p (mode))
>>> + return riscv_vector_chunks * scale;
>>> +
>>> + return scale;
>>> +}
>>> +
>>> /* Return true if X is a valid address for machine mode MODE. If it is,
>>> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
>>> effect. */
>>> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
>>> index 5bc7f2f467d..15b9317a8ce 100644
>>> --- a/gcc/config/riscv/riscv.h
>>> +++ b/gcc/config/riscv/riscv.h
>>> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
>>> extern unsigned riscv_bytes_per_vector_chunk;
>>> extern poly_uint16 riscv_vector_chunks;
>>> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
>>> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
>>> /* The number of bits and bytes in a RVV vector. */
>>> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
>>> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
>>> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
>>> index 2d418f09aab..12f4e6335e6 100644
>>> --- a/gcc/genmodes.cc
>>> +++ b/gcc/genmodes.cc
>>> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
>>> static struct mode_adjust *adj_format;
>>> static struct mode_adjust *adj_ibit;
>>> static struct mode_adjust *adj_fbit;
>>> +static struct mode_adjust *adj_precision;
>>>
>>> /* Mode class operations. */
>>> static enum mode_class
>>> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
>>> #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
>>> #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
>>> #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
>>> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
>>> #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
>>> #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
>>> #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
>>> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
>>> " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
>>> m->name, m->name);
>>> printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
>>> - printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>>> + /* Normalize the size to 1 if precision is less than BITS_PER_UNIT. */
>>> + printf (" poly_uint16 size_one = "
>>> + "mode_precision[E_%smode].is_constant ()\n", m->name);
>>> + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
>>
>> Have you tried this on an x86_64 system? I wouldn't expect it to work
>> because of the:
>>
>> STATIC_ASSERT (N >= 2);
>>
>> in the poly_uint16 constructor.
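
To illustrate the concern, here is a small standalone model of a coefficient-count check inside a two-argument constructor. `PolyModel` is a hypothetical stand-in and not the real poly_int class; the sketch assumes a build where the number of coefficients N is 1, which is what the question about x86_64 suggests.

```cpp
#include <cassert>

// Hypothetical stand-in for a poly_int-like class with N coefficients.
template<unsigned N>
struct PolyModel
{
  unsigned coeffs[N];

  // One-coefficient constructor: usable for every N >= 1.
  explicit PolyModel (unsigned a) : coeffs ()
  {
    coeffs[0] = a;
  }

  // Two-coefficient constructor: only meaningful when N >= 2.
  // The check fires when this constructor is instantiated, i.e. as
  // soon as some code tries to build PolyModel<1> from two values.
  PolyModel (unsigned a, unsigned b) : coeffs ()
  {
    static_assert (N >= 2, "two coefficients need N >= 2");
    coeffs[0] = a;
    coeffs[1] = b;
  }
};
```

In this model `PolyModel<2> (1, 1)` and `PolyModel<1> (1)` both compile, but `PolyModel<1> (1, 1)` is rejected at compile time, so generated code that unconditionally emits a two-coefficient constructor call cannot build when N is 1.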
>>
>>> + printf (" if (known_lt (mode_precision[E_%smode], "
>>> + "size_one * BITS_PER_UNIT))\n", m->name);
>>> + printf (" mode_size[E_%smode] = size_one;\n", m->name);
>>> + printf (" else\n");
>>> + printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>>
>> Now that the assert implicit in the original exact_div no longer holds,
>> I think we should instead generalise it to can_div_away_from_zero_p
>> (which will involve defining a new overload of can_div_away_from_zero_p).
>> I think that will give the same result as the code above for the cases
>> that the code above handles. But it should be more general too.
>>
>> TBH, I'm still sceptical that this is all that is needed. It seems
>> unlikely that we've been so good at writing vector support code that
>> we've made it work for precision < bitsize, despite that being an
>> unsupported combination until now. But I guess we can fix problems
>> on a case-by-case basis.
>>
>> Thanks,
>> Richard
>>
>>> " BITS_PER_UNIT);\n", m->name, m->name);
>>> printf (" mode_nunits[E_%smode] = ps;\n", m->name);
>>> printf (" adjust_mode_mask (E_%smode);\n", m->name);
>>> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
>>> printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
>>> a->file, a->line, a->mode->name, a->adjustment);
>>>
>>> + /* Adjust precision to the actual bits size. */
>>> + for (a = adj_precision; a; a = a->next)
>>> + switch (a->mode->cl)
>>> + {
>>> + case MODE_VECTOR_BOOL:
>>> + printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
>>> + a->adjustment);
>>> + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
>>> + break;
>>> + default:
>>> + break;
>>> + }
>>> +
>>> puts ("}");
>>> }
>>>
>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>> new file mode 100644
>>> index 00000000000..e70960c5b6d
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>> @@ -0,0 +1,68 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>> +
>>> +#include "riscv_vector.h"
>>> +
>>> +void
>>> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool1_t v1 = *(vbool1_t*)in;
>>> + vbool2_t v2 = *(vbool2_t*)in;
>>> +
>>> + *(vbool1_t*)(out + 100) = v1;
>>> + *(vbool2_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool1_t v1 = *(vbool1_t*)in;
>>> + vbool4_t v2 = *(vbool4_t*)in;
>>> +
>>> + *(vbool1_t*)(out + 100) = v1;
>>> + *(vbool4_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool1_t v1 = *(vbool1_t*)in;
>>> + vbool8_t v2 = *(vbool8_t*)in;
>>> +
>>> + *(vbool1_t*)(out + 100) = v1;
>>> + *(vbool8_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool1_t v1 = *(vbool1_t*)in;
>>> + vbool16_t v2 = *(vbool16_t*)in;
>>> +
>>> + *(vbool1_t*)(out + 100) = v1;
>>> + *(vbool16_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool1_t v1 = *(vbool1_t*)in;
>>> + vbool32_t v2 = *(vbool32_t*)in;
>>> +
>>> + *(vbool1_t*)(out + 100) = v1;
>>> + *(vbool32_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool1_t v1 = *(vbool1_t*)in;
>>> + vbool64_t v2 = *(vbool64_t*)in;
>>> +
>>> + *(vbool1_t*)(out + 100) = v1;
>>> + *(vbool64_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>> new file mode 100644
>>> index 00000000000..dcc7a644a88
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>> @@ -0,0 +1,68 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>> +
>>> +#include "riscv_vector.h"
>>> +
>>> +void
>>> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool2_t v1 = *(vbool2_t*)in;
>>> + vbool1_t v2 = *(vbool1_t*)in;
>>> +
>>> + *(vbool2_t*)(out + 100) = v1;
>>> + *(vbool1_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool2_t v1 = *(vbool2_t*)in;
>>> + vbool4_t v2 = *(vbool4_t*)in;
>>> +
>>> + *(vbool2_t*)(out + 100) = v1;
>>> + *(vbool4_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool2_t v1 = *(vbool2_t*)in;
>>> + vbool8_t v2 = *(vbool8_t*)in;
>>> +
>>> + *(vbool2_t*)(out + 100) = v1;
>>> + *(vbool8_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool2_t v1 = *(vbool2_t*)in;
>>> + vbool16_t v2 = *(vbool16_t*)in;
>>> +
>>> + *(vbool2_t*)(out + 100) = v1;
>>> + *(vbool16_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool2_t v1 = *(vbool2_t*)in;
>>> + vbool32_t v2 = *(vbool32_t*)in;
>>> +
>>> + *(vbool2_t*)(out + 100) = v1;
>>> + *(vbool32_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool2_t v1 = *(vbool2_t*)in;
>>> + vbool64_t v2 = *(vbool64_t*)in;
>>> +
>>> + *(vbool2_t*)(out + 100) = v1;
>>> + *(vbool64_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>> new file mode 100644
>>> index 00000000000..3af0513e006
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>> @@ -0,0 +1,68 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>> +
>>> +#include "riscv_vector.h"
>>> +
>>> +void
>>> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool4_t v1 = *(vbool4_t*)in;
>>> + vbool1_t v2 = *(vbool1_t*)in;
>>> +
>>> + *(vbool4_t*)(out + 100) = v1;
>>> + *(vbool1_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool4_t v1 = *(vbool4_t*)in;
>>> + vbool2_t v2 = *(vbool2_t*)in;
>>> +
>>> + *(vbool4_t*)(out + 100) = v1;
>>> + *(vbool2_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool4_t v1 = *(vbool4_t*)in;
>>> + vbool8_t v2 = *(vbool8_t*)in;
>>> +
>>> + *(vbool4_t*)(out + 100) = v1;
>>> + *(vbool8_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool4_t v1 = *(vbool4_t*)in;
>>> + vbool16_t v2 = *(vbool16_t*)in;
>>> +
>>> + *(vbool4_t*)(out + 100) = v1;
>>> + *(vbool16_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool4_t v1 = *(vbool4_t*)in;
>>> + vbool32_t v2 = *(vbool32_t*)in;
>>> +
>>> + *(vbool4_t*)(out + 100) = v1;
>>> + *(vbool32_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool4_t v1 = *(vbool4_t*)in;
>>> + vbool64_t v2 = *(vbool64_t*)in;
>>> +
>>> + *(vbool4_t*)(out + 100) = v1;
>>> + *(vbool64_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>> new file mode 100644
>>> index 00000000000..ea3c360d756
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>> @@ -0,0 +1,68 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>> +
>>> +#include "riscv_vector.h"
>>> +
>>> +void
>>> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool8_t v1 = *(vbool8_t*)in;
>>> + vbool1_t v2 = *(vbool1_t*)in;
>>> +
>>> + *(vbool8_t*)(out + 100) = v1;
>>> + *(vbool1_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool8_t v1 = *(vbool8_t*)in;
>>> + vbool2_t v2 = *(vbool2_t*)in;
>>> +
>>> + *(vbool8_t*)(out + 100) = v1;
>>> + *(vbool2_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool8_t v1 = *(vbool8_t*)in;
>>> + vbool4_t v2 = *(vbool4_t*)in;
>>> +
>>> + *(vbool8_t*)(out + 100) = v1;
>>> + *(vbool4_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool8_t v1 = *(vbool8_t*)in;
>>> + vbool16_t v2 = *(vbool16_t*)in;
>>> +
>>> + *(vbool8_t*)(out + 100) = v1;
>>> + *(vbool16_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool8_t v1 = *(vbool8_t*)in;
>>> + vbool32_t v2 = *(vbool32_t*)in;
>>> +
>>> + *(vbool8_t*)(out + 100) = v1;
>>> + *(vbool32_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool8_t v1 = *(vbool8_t*)in;
>>> + vbool64_t v2 = *(vbool64_t*)in;
>>> +
>>> + *(vbool8_t*)(out + 100) = v1;
>>> + *(vbool64_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>> new file mode 100644
>>> index 00000000000..9fc659d2402
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>> @@ -0,0 +1,68 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>> +
>>> +#include "riscv_vector.h"
>>> +
>>> +void
>>> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool16_t v1 = *(vbool16_t*)in;
>>> + vbool1_t v2 = *(vbool1_t*)in;
>>> +
>>> + *(vbool16_t*)(out + 100) = v1;
>>> + *(vbool1_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool16_t v1 = *(vbool16_t*)in;
>>> + vbool2_t v2 = *(vbool2_t*)in;
>>> +
>>> + *(vbool16_t*)(out + 100) = v1;
>>> + *(vbool2_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool16_t v1 = *(vbool16_t*)in;
>>> + vbool4_t v2 = *(vbool4_t*)in;
>>> +
>>> + *(vbool16_t*)(out + 100) = v1;
>>> + *(vbool4_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool16_t v1 = *(vbool16_t*)in;
>>> + vbool8_t v2 = *(vbool8_t*)in;
>>> +
>>> + *(vbool16_t*)(out + 100) = v1;
>>> + *(vbool8_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool16_t v1 = *(vbool16_t*)in;
>>> + vbool32_t v2 = *(vbool32_t*)in;
>>> +
>>> + *(vbool16_t*)(out + 100) = v1;
>>> + *(vbool32_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool16_t v1 = *(vbool16_t*)in;
>>> + vbool64_t v2 = *(vbool64_t*)in;
>>> +
>>> + *(vbool16_t*)(out + 100) = v1;
>>> + *(vbool64_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>> new file mode 100644
>>> index 00000000000..98275e5267d
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>> @@ -0,0 +1,68 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>> +
>>> +#include "riscv_vector.h"
>>> +
>>> +void
>>> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool32_t v1 = *(vbool32_t*)in;
>>> + vbool1_t v2 = *(vbool1_t*)in;
>>> +
>>> + *(vbool32_t*)(out + 100) = v1;
>>> + *(vbool1_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool32_t v1 = *(vbool32_t*)in;
>>> + vbool2_t v2 = *(vbool2_t*)in;
>>> +
>>> + *(vbool32_t*)(out + 100) = v1;
>>> + *(vbool2_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool32_t v1 = *(vbool32_t*)in;
>>> + vbool4_t v2 = *(vbool4_t*)in;
>>> +
>>> + *(vbool32_t*)(out + 100) = v1;
>>> + *(vbool4_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool32_t v1 = *(vbool32_t*)in;
>>> + vbool8_t v2 = *(vbool8_t*)in;
>>> +
>>> + *(vbool32_t*)(out + 100) = v1;
>>> + *(vbool8_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool32_t v1 = *(vbool32_t*)in;
>>> + vbool16_t v2 = *(vbool16_t*)in;
>>> +
>>> + *(vbool32_t*)(out + 100) = v1;
>>> + *(vbool16_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool32_t v1 = *(vbool32_t*)in;
>>> + vbool64_t v2 = *(vbool64_t*)in;
>>> +
>>> + *(vbool32_t*)(out + 100) = v1;
>>> + *(vbool64_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>> new file mode 100644
>>> index 00000000000..8f6f0b11f09
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>> @@ -0,0 +1,68 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>> +
>>> +#include "riscv_vector.h"
>>> +
>>> +void
>>> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool64_t v1 = *(vbool64_t*)in;
>>> + vbool1_t v2 = *(vbool1_t*)in;
>>> +
>>> + *(vbool64_t*)(out + 100) = v1;
>>> + *(vbool1_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool64_t v1 = *(vbool64_t*)in;
>>> + vbool2_t v2 = *(vbool2_t*)in;
>>> +
>>> + *(vbool64_t*)(out + 100) = v1;
>>> + *(vbool2_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool64_t v1 = *(vbool64_t*)in;
>>> + vbool4_t v2 = *(vbool4_t*)in;
>>> +
>>> + *(vbool64_t*)(out + 100) = v1;
>>> + *(vbool4_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool64_t v1 = *(vbool64_t*)in;
>>> + vbool8_t v2 = *(vbool8_t*)in;
>>> +
>>> + *(vbool64_t*)(out + 100) = v1;
>>> + *(vbool8_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool64_t v1 = *(vbool64_t*)in;
>>> + vbool16_t v2 = *(vbool16_t*)in;
>>> +
>>> + *(vbool64_t*)(out + 100) = v1;
>>> + *(vbool16_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool64_t v1 = *(vbool64_t*)in;
>>> + vbool32_t v2 = *(vbool32_t*)in;
>>> +
>>> + *(vbool64_t*)(out + 100) = v1;
>>> + *(vbool32_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>> new file mode 100644
>>> index 00000000000..d96959dd064
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>> @@ -0,0 +1,77 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>> +
>>> +#include "riscv_vector.h"
>>> +
>>> +void
>>> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool1_t v1 = *(vbool1_t*)in;
>>> + vbool1_t v2 = *(vbool1_t*)in;
>>> +
>>> + *(vbool1_t*)(out + 100) = v1;
>>> + *(vbool1_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool2_t v1 = *(vbool2_t*)in;
>>> + vbool2_t v2 = *(vbool2_t*)in;
>>> +
>>> + *(vbool2_t*)(out + 100) = v1;
>>> + *(vbool2_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool4_t v1 = *(vbool4_t*)in;
>>> + vbool4_t v2 = *(vbool4_t*)in;
>>> +
>>> + *(vbool4_t*)(out + 100) = v1;
>>> + *(vbool4_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool8_t v1 = *(vbool8_t*)in;
>>> + vbool8_t v2 = *(vbool8_t*)in;
>>> +
>>> + *(vbool8_t*)(out + 100) = v1;
>>> + *(vbool8_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool16_t v1 = *(vbool16_t*)in;
>>> + vbool16_t v2 = *(vbool16_t*)in;
>>> +
>>> + *(vbool16_t*)(out + 100) = v1;
>>> + *(vbool16_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool32_t v1 = *(vbool32_t*)in;
>>> + vbool32_t v2 = *(vbool32_t*)in;
>>> +
>>> + *(vbool32_t*)(out + 100) = v1;
>>> + *(vbool32_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +void
>>> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>> + vbool64_t v1 = *(vbool64_t*)in;
>>> + vbool64_t v2 = *(vbool64_t*)in;
>>> +
>>> + *(vbool64_t*)(out + 100) = v1;
>>> + *(vbool64_t*)(out + 200) = v2;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Thank you all for your quick response.
>
> As juzhe mentioned, RISC-V memory accesses are always aligned to a byte boundary in the compact mode, i.e. ceil(vl / 8) bytes for vbool*.
OK, thanks to both of you. This is what I'd have expected.
In that case, I think both the can_div_away_from_zero_p approach and the
original patch (using size_one) will give the wrong results.
There isn't a way of representing ceil([4,4]/8) as a poly_int.
The result is (4+4X)/8 when X is odd and (8+4X)/8 when X is even.
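To make the non-linearity concrete, here is a minimal stand-alone sketch (not GCC code; eval_poly, ceil_div, bytes_for_4_4 and has_constant_step are hypothetical helper names) that evaluates ceil((4+4X)/8) for small X. A poly_int result would have to be of the form c0 + c1*X, which grows by the same amount at every step, whereas the byte count here alternates between growing by 0 and by 1.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not GCC code: evaluate the poly_int [c0, c1],
// i.e. c0 + c1 * X, at a concrete runtime value X.
inline int64_t eval_poly(int64_t c0, int64_t c1, int64_t x) {
  return c0 + c1 * x;
}

// Ceiling division for a non-negative dividend and a positive divisor.
inline int64_t ceil_div(int64_t a, int64_t b) {
  assert(a >= 0 && b > 0);
  return (a + b - 1) / b;
}

// Bytes needed to hold a precision of [4,4] bits at a given X.
// For X = 0, 1, 2, 3, 4 this yields 1, 1, 2, 2, 3.
inline int64_t bytes_for_4_4(int64_t x) {
  return ceil_div(eval_poly(4, 4, x), 8);
}

// Return true if bytes_for_4_4 grows by the same amount at every step
// from X = 0 up to X = max_x, as any linear form c0 + c1 * X would.
inline bool has_constant_step(int64_t max_x) {
  const int64_t first_step = bytes_for_4_4(1) - bytes_for_4_4(0);
  for (int64_t x = 1; x < max_x; ++x)
    if (bytes_for_4_4(x + 1) - bytes_for_4_4(x) != first_step)
      return false;
  return true;
}
```

Calling has_constant_step (8) returns false, which is another way of saying that no single pair of coefficients reproduces the byte count for both odd and even X.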
> Actually, the [4,4] value comes from the self-test; the RISC-V mode precisions are as below.
>
> VNx64BI precision [0x40, 0x40].
> VNx32BI precision [0x20, 0x20].
> VNx16BI precision [0x10, 0x10].
> VNx8BI precision [0x8, 0x8].
> VNx4BI precision [0x8, 0x8].
> VNx2BI precision [0x8, 0x8].
> VNx1BI precision [0x8, 0x8].
Ah, OK. Which self-test causes this?
Richard
> The [4, 4] value affects the genmodes part: we cannot write the code below, because the gcc_unreachable would be hit.
>
> if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT, &mode_size[E_%smode]))
>   gcc_unreachable (); // Hit on [4, 4] of the self-test.
>
> Pan
> ________________________________
> From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
> Sent: Wednesday, March 1, 2023 18:46
> To: richard.sandiford <richard.sandiford@arm.com>; pan2.li <pan2.li@intel.com>
> Cc: incarnation.p.lee <incarnation.p.lee@outlook.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>
> Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
>>> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
>>> bits, even when that means accessing fully-unused bytes? E.g. 4+4X
>>> when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
>>> 8+8X would be 32 bits/4 bytes. So a store of [8,8] for a precision
>>> of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
>
> Hi, Richard. Thank you for helping us.
> My understanding of RVV ISA:
>
> In RVV, we have mask modes (VNx1BI, VNx2BI, etc.) and data modes (VNx1QI, VNx4QI, VNx2DI, etc.).
> For data modes, we access the data fully and there are no unused bytes, so we don't need to adjust the precision.
> However, for mask modes we access the mask bits in a compact layout (the mask bits of consecutive elements are packed contiguously).
> For example, in the current configuration, VNx1BI, VNx2BI, VNx4BI and VNx8BI have the same bytesize (1,1) but different bitsizes.
>
> VNx8BI is accessed fully, but only 1/2 of VNx4BI, 1/4 of VNx2BI and 1/8 of VNx1BI is accessed, subject to byte alignment (I am not sure whether RVV supports bit alignment; I guess it cannot).
>
> If VNx8BI occupies only 1 byte (depending on the machine vector length), then VNx2BI, VNx4BI and VNx1BI are 2/8, 4/8 and 1/8 of a byte. I think we can't access memory at bit granularity, so they will all be the same in the access.
> However, if VNx8BI occupies 8 bytes, then VNx2BI, VNx4BI and VNx1BI are 2 bytes, 4 bytes and 1 byte respectively, so they access different sizes.
>
> This is my understanding of the RVV ISA; feel free to correct me.
> Thanks.
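The byte counts discussed above can be sketched as follows. This is an illustration only, assuming that a vboolN_t mask holds VLEN/N bits and that a mask load or store moves ceil(bits / 8) bytes, as mentioned in this thread; mask_bits and mask_bytes are hypothetical names, not GCC or intrinsic APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of mask bits in a vboolN_t for a given
// vector register length in bits. Assumes vlen_bits is a multiple of n.
inline uint32_t mask_bits(uint32_t vlen_bits, uint32_t n) {
  assert(n != 0 && vlen_bits % n == 0);
  return vlen_bits / n;
}

// Bytes touched by a mask load/store of that type: ceil(bits / 8).
inline uint32_t mask_bytes(uint32_t vlen_bits, uint32_t n) {
  return (mask_bits(vlen_bits, n) + 7) / 8;
}
```

With VLEN = 128, vbool8_t needs 2 bytes, while vbool16_t, vbool32_t and vbool64_t hold 8, 4 and 2 bits and all round up to 1 byte. With VLEN = 512 the same three types need 4, 2 and 1 bytes, so the access sizes differ.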
>
> ________________________________
> juzhe.zhong@rivai.ai
>
> From: Richard Sandiford <richard.sandiford@arm.com>
> Date: 2023-03-01 18:11
> To: Li, Pan2 <pan2.li@intel.com>
> CC: 盼 李 <incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> "Li, Pan2" <pan2.li@intel.com> writes:
>> Hi Richard Sandiford,
>>
>> I just tried the overload for constant divisors with the printf below, and it works as you mentioned!
>>
>> printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
>> "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
>>
>> template<unsigned int N, typename Ca, typename Cb, typename Cq>
>> inline typename if_nonpoly<Cb, bool>::type
>> can_div_away_from_zero_p (const poly_int_pod<N, Ca> &a,
>>                           Cb b,
>>                           poly_int_pod<N, Cq> *quotient)
>> {
>>   if (!can_div_trunc_p (a, b, quotient))
>>     return false;
>>   if (maybe_ne (*quotient * b, a))
>>     for (unsigned int i = 0; i < N; ++i)
>>       quotient->coeffs[i] += (quotient->coeffs[i] < 0 ? -1 : 1);
>>   return true;
>> }
>>
>> But I have a question about one case, as below.
>>
>> Assume:
>> a = [4, 4], b = 8.
>>
>> can_div_trunc_p checks whether the remainder is constant, i.e. a.coeffs[i] % 8 == 0 for i >= 1. If the remainder is not constant, can_div_trunc_p leaves the quotient untouched and returns false.
>>
>> Thus, when a = [4, 4], can_div_away_from_zero_p leaves *quotient unchanged, i.e. mode_size[E_%smode] is unchanged in this case. However, the underlying mode_size will adjust it to the real byte size, and I am not sure whether that is by design or requires additional handling.
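The behaviour described here can be reproduced with a small stand-alone model. This is a sketch under simplifying assumptions, not the GCC implementation: Poly2, model_can_div_trunc and model_can_div_away_from_zero are hypothetical names, the polynomial is fixed at two coefficients, and the logic only mirrors the overload quoted above.

```cpp
#include <cassert>
#include <cstdint>

// Simplified two-coefficient stand-in for a poly_int: c[0] + c[1] * X.
// This is an illustrative model, not the GCC poly_int implementation.
struct Poly2 {
  int64_t c[2];
};

// Model of truncating division by a constant. It succeeds only when the
// remainder is constant, i.e. when the non-constant coefficient is a
// multiple of b. On failure the quotient is left untouched.
inline bool model_can_div_trunc(const Poly2 &a, int64_t b, Poly2 *quotient) {
  assert(b != 0);
  if (a.c[1] % b != 0)
    return false;
  quotient->c[0] = a.c[0] / b;
  quotient->c[1] = a.c[1] / b;
  return true;
}

// Model of the overload quoted above: truncate first, then step each
// coefficient away from zero if the division was not exact.
inline bool model_can_div_away_from_zero(const Poly2 &a, int64_t b,
                                         Poly2 *quotient) {
  if (!model_can_div_trunc(a, b, quotient))
    return false;
  const bool inexact =
      quotient->c[0] * b != a.c[0] || quotient->c[1] * b != a.c[1];
  if (inexact)
    for (int i = 0; i < 2; ++i)
      quotient->c[i] += (quotient->c[i] < 0 ? -1 : 1);
  return true;
}
```

For a = [4, 4] and b = 8 the model returns false and leaves the quotient as it was, whereas a = [8, 8] divides to [1, 1].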
>
> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
> bits, even when that means accessing fully-unused bytes? E.g. 4+4X
> when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
> 8+8X would be 32 bits/4 bytes. So a store of [8,8] for a precision
> of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
>
> Richard
>
>> Pan
>>
>> From: 盼 李 <incarnation.p.lee@outlook.com>
>> Sent: Tuesday, February 28, 2023 5:59 PM
>> To: Richard Sandiford <richard.sandiford@arm.com>; Li, Pan2 <pan2.li@intel.com>
>> Cc: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>
>> Understood, thanks for the explanations and suggestions. Let me have a try and keep you posted.
>>
>> Pan
>> ________________________________
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Tuesday, February 28, 2023 17:50
>> To: Li, Pan2 <pan2.li@intel.com>
>> Cc: 盼 李 <incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>
>> "Li, Pan2" <pan2.li@intel.com> writes:
>>> Hi Richard Sandiford,
>>>
>>> After some investigation, I am not sure if it is possible to make it general without any changes to exact_div. We can add one method like below to get the unit poly for all possible N.
>>>
>>> template<unsigned int N, typename Ca>
>>> inline POLY_CONST_RESULT (N, Ca, Ca)
>>> normalize_to_unit (const poly_int_pod<N, Ca> &a)
>>> {
>>>   typedef POLY_CONST_COEFF (Ca, Ca) C;
>>>
>>>   poly_int<N, C> normalized = a;
>>>
>>>   if (normalized.is_constant())
>>>     normalized.coeffs[0] = 1;
>>>   else
>>>     for (unsigned int i = 0; i < N; i++)
>>>       POLY_SET_COEFF (C, normalized, i, 1);
>>>
>>>   return normalized;
>>> }
>>>
>>> And then adjust the genmodes like below to consume the unit poly.
>>>
>>> printf (" poly_uint16 unit_poly = "
>>>         "normalize_to_unit (mode_precision[E_%smode]);\n", m->name);
>>> printf (" if (known_lt (mode_precision[E_%smode], "
>>>         "unit_poly * BITS_PER_UNIT))\n", m->name);
>>> printf (" mode_size[E_%smode] = unit_poly;\n", m->name);
>>>
>>> I am not sure it is a good idea to introduce the above normalization code into exact_div, given that the comment on exact_div says “/* Return A / B, given that A is known to be a multiple of B. */”.
>>
>> My point was that we have multiple ways of dividing poly_ints:
>>
>> - exact_div, for when the caller knows that the result is always exact
>> - can_div_trunc_p, for truncating division (round towards 0)
>> - can_div_away_from_zero_p, for rounding away from 0
>> - ...
>>
>> This is like how we have multiple division *_EXPRs on trees.
>>
>> Until now, exact_div was the correct choice for modes because vector
>> modes didn't have padding. We're now changing that, so my suggestion
>> in the review was to change the division operation that we use.
>> Rather than use exact_div, we should now use can_div_away_from_zero_p,
>> which would have the effect of rounding the quotient up.
>>
>> Something like:
>>
>> if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT,
>>                                &mode_size[E_%smode]))
>>   gcc_unreachable ();
>>
>> But this will require a new overload of can_div_away_from_zero_p, since
>> the existing one is for constant quotients rather than constant divisors.
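The difference between the division flavours listed above can be illustrated on plain unsigned integers. This is only a scalar sketch with hypothetical names (scalar_exact_div, scalar_div_trunc, scalar_div_away_from_zero); the real helpers operate on poly_ints and have different signatures.

```cpp
#include <cassert>
#include <cstdint>

// Scalar analogues of the poly_int division flavours listed above.
// The names are hypothetical and only mirror the GCC helpers in spirit.

// Exact division: the caller promises that a is a multiple of b.
inline uint64_t scalar_exact_div(uint64_t a, uint64_t b) {
  assert(b != 0 && a % b == 0);
  return a / b;
}

// Truncating division: round towards zero.
inline uint64_t scalar_div_trunc(uint64_t a, uint64_t b) {
  assert(b != 0);
  return a / b;
}

// Division rounding away from zero; for unsigned values this rounds up.
inline uint64_t scalar_div_away_from_zero(uint64_t a, uint64_t b) {
  assert(b != 0);
  uint64_t q = a / b;
  if (q * b != a)
    ++q;
  return q;
}
```

For a 4-bit precision and a BITS_PER_UNIT of 8 (the usual value, assumed here), truncation gives a size of 0 bytes while rounding away from zero gives 1 byte. scalar_exact_div would trip its assertion on the same input, which mirrors why exact_div no longer fits once the precision can be smaller than a whole number of bytes.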
>>
>> Thanks,
>> Richard
>>
>>>
>>> Could you please share your opinion on this from an expert’s perspective? Thank you!
>>>
>>> Pan
>>>
>>> From: 盼 李 <incarnation.p.lee@outlook.com>
>>> Sent: Monday, February 27, 2023 11:13 PM
>>> To: Richard Sandiford <richard.sandiford@arm.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
>>> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Li, Pan2 <pan2.li@intel.com>
>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>
>>> Never mind, I hope you had a good holiday.
>>>
>>> Thanks for pointing this out; the if part cannot handle poly_int with N > 2. As I understand it, we need to make it general for every N of poly_int.
>>>
>>> So I would like to double-check with you how to make it general. I suppose there will be a new can_div_away_from_zero_p function to replace the if (known_lt (,)) part in genmodes.cc, leaving exact_div unchanged (given the word "exact", I suppose we should not touch it), right? Then we still need a poly_int with all coefficients set to 1 for N as the result if can_div_away_from_zero_p is true.
>>>
>>> Thanks again for your professional suggestions, and have a nice day!
>>>
>>> Pan
>>> ________________________________
>>> From: Richard Sandiford <richard.sandiford@arm.com>
>>> Sent: Monday, February 27, 2023 22:24
>>> To: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
>>> Cc: incarnation.p.lee@outlook.com; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; pan2.li@intel.com
>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>
>>> Sorry for the slow reply, been away for a couple of weeks.
>>>
>>> "incarnation.p.lee--- via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>>>> From: Pan Li <pan2.li@intel.com>
>>>>
>>>> Fix the rvv bool mode precision bug by adjusting the precision.
>>>> The bit size of vbool*_t will be adjusted to
>>>> [1, 2, 4, 8, 16, 32, 64] according to the RVV 1.0 ISA spec. The
>>>> adjusted mode precision of vbool*_t will help the underlying passes
>>>> make the right decisions for both correctness and optimization.
>>>>
>>>> Given the sample code below:
>>>> void test_1(int8_t * restrict in, int8_t * restrict out)
>>>> {
>>>> vbool8_t v2 = *(vbool8_t*)in;
>>>> vbool16_t v5 = *(vbool16_t*)in;
>>>> *(vbool16_t*)(out + 200) = v5;
>>>> *(vbool8_t*)(out + 100) = v2;
>>>> }
>>>>
>>>> Before the precision adjustment:
>>>> addi a4,a1,100
>>>> vsetvli a5,zero,e8,m1,ta,ma
>>>> addi a1,a1,200
>>>> vlm.v v24,0(a0)
>>>> vsm.v v24,0(a4)
>>>> // Need one vsetvli and vlm.v for correctness here.
>>>> vsm.v v24,0(a1)
>>>>
>>>> After the precision adjustment:
>>>> csrr t0,vlenb
>>>> slli t1,t0,1
>>>> csrr a3,vlenb
>>>> sub sp,sp,t1
>>>> slli a4,a3,1
>>>> add a4,a4,sp
>>>> sub a3,a4,a3
>>>> vsetvli a5,zero,e8,m1,ta,ma
>>>> addi a2,a1,200
>>>> vlm.v v24,0(a0)
>>>> vsm.v v24,0(a3)
>>>> addi a1,a1,100
>>>> vsetvli a4,zero,e8,mf2,ta,ma
>>>> csrr t0,vlenb
>>>> vlm.v v25,0(a3)
>>>> vsm.v v25,0(a2)
>>>> slli t1,t0,1
>>>> vsetvli a5,zero,e8,m1,ta,ma
>>>> vsm.v v24,0(a1)
>>>> add sp,sp,t1
>>>> jr ra
>>>>
>>>> However, there may be some optimization opportunities after
>>>> the mode precision adjustment. They can be taken care of in
>>>> the RISC-V backend in separate follow-up PR(s).
>>>>
>>>> PR 108185
>>>> PR 108654
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> * config/riscv/riscv-modes.def (ADJUST_PRECISION):
>>>> * config/riscv/riscv.cc (riscv_v_adjust_precision):
>>>> * config/riscv/riscv.h (riscv_v_adjust_precision):
>>>> * genmodes.cc (ADJUST_PRECISION):
>>>> (emit_mode_adjustments):
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> * gcc.target/riscv/pr108185-1.c: New test.
>>>> * gcc.target/riscv/pr108185-2.c: New test.
>>>> * gcc.target/riscv/pr108185-3.c: New test.
>>>> * gcc.target/riscv/pr108185-4.c: New test.
>>>> * gcc.target/riscv/pr108185-5.c: New test.
>>>> * gcc.target/riscv/pr108185-6.c: New test.
>>>> * gcc.target/riscv/pr108185-7.c: New test.
>>>> * gcc.target/riscv/pr108185-8.c: New test.
>>>>
>>>> Signed-off-by: Pan Li <pan2.li@intel.com>
>>>> ---
>>>> gcc/config/riscv/riscv-modes.def | 8 +++
>>>> gcc/config/riscv/riscv.cc | 12 ++++
>>>> gcc/config/riscv/riscv.h | 1 +
>>>> gcc/genmodes.cc | 25 ++++++-
>>>> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
>>>> 12 files changed, 598 insertions(+), 1 deletion(-)
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>>
>>>> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
>>>> index d5305efa8a6..110bddce851 100644
>>>> --- a/gcc/config/riscv/riscv-modes.def
>>>> +++ b/gcc/config/riscv/riscv-modes.def
>>>> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>>> ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>>> ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>>>>
>>>> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
>>>> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
>>>> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
>>>> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
>>>> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
>>>> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
>>>> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
>>>> +
>>>> /*
>>>> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
>>>> | | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
>>>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>>>> index de3e1f903c7..cbe66c0e35b 100644
>>>> --- a/gcc/config/riscv/riscv.cc
>>>> +++ b/gcc/config/riscv/riscv.cc
>>>> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
>>>> return scale;
>>>> }
>>>>
>>>> +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
>>>> + PRECISION size for corresponding machine_mode. */
>>>> +
>>>> +poly_int64
>>>> +riscv_v_adjust_precision (machine_mode mode, int scale)
>>>> +{
>>>> + if (riscv_v_ext_vector_mode_p (mode))
>>>> + return riscv_vector_chunks * scale;
>>>> +
>>>> + return scale;
>>>> +}
>>>> +
>>>> /* Return true if X is a valid address for machine mode MODE. If it is,
>>>> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
>>>> effect. */
>>>> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
>>>> index 5bc7f2f467d..15b9317a8ce 100644
>>>> --- a/gcc/config/riscv/riscv.h
>>>> +++ b/gcc/config/riscv/riscv.h
>>>> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
>>>> extern unsigned riscv_bytes_per_vector_chunk;
>>>> extern poly_uint16 riscv_vector_chunks;
>>>> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
>>>> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
>>>> /* The number of bits and bytes in a RVV vector. */
>>>> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
>>>> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
>>>> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
>>>> index 2d418f09aab..12f4e6335e6 100644
>>>> --- a/gcc/genmodes.cc
>>>> +++ b/gcc/genmodes.cc
>>>> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
>>>> static struct mode_adjust *adj_format;
>>>> static struct mode_adjust *adj_ibit;
>>>> static struct mode_adjust *adj_fbit;
>>>> +static struct mode_adjust *adj_precision;
>>>>
>>>> /* Mode class operations. */
>>>> static enum mode_class
>>>> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
>>>> #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
>>>> #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
>>>> #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
>>>> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
>>>> #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
>>>> #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
>>>> #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
>>>> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
>>>> " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
>>>> m->name, m->name);
>>>> printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
>>>> - printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>>>> + /* Normalize the size to 1 if precision is less than BITS_PER_UNIT. */
>>>> + printf (" poly_uint16 size_one = "
>>>> + "mode_precision[E_%smode].is_constant ()\n", m->name);
>>>> + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
>>>
>>> Have you tried this on an x86_64 system? I wouldn't expect it to work
>>> because of the:
>>>
>>> STATIC_ASSERT (N >= 2);
>>>
>>> in the poly_uint16 constructor.
>>>
>>>> + printf (" if (known_lt (mode_precision[E_%smode], "
>>>> + "size_one * BITS_PER_UNIT))\n", m->name);
>>>> + printf (" mode_size[E_%smode] = size_one;\n", m->name);
>>>> + printf (" else\n");
>>>> + printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>>>
>>> Now that the assert implicit in the original exact_div no longer holds,
>>> I think we should instead generalise it to can_div_away_from_zero_p
>>> (which will involve defining a new overload of can_div_away_from_zero_p).
>>> I think that will give the same result as the code above for the cases
>>> that the code above handles. But it should be more general too.
>>>
>>> TBH, I'm still sceptical that this is all that is needed. It seems
>>> unlikely that we've been so good at writing vector support code that
>>> we've made it work for precision < bitsize, despite that being an
>>> unsupported combination until now. But I guess we can fix problems
>>> on a case-by-case basis.
>>>
>>> Thanks,
>>> Richard
>>>
>>>> " BITS_PER_UNIT);\n", m->name, m->name);
>>>> printf (" mode_nunits[E_%smode] = ps;\n", m->name);
>>>> printf (" adjust_mode_mask (E_%smode);\n", m->name);
>>>> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
>>>> printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
>>>> a->file, a->line, a->mode->name, a->adjustment);
>>>>
>>>> + /* Adjust precision to the actual bits size. */
>>>> + for (a = adj_precision; a; a = a->next)
>>>> + switch (a->mode->cl)
>>>> + {
>>>> + case MODE_VECTOR_BOOL:
>>>> + printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
>>>> + a->adjustment);
>>>> + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
>>>> + break;
>>>> + default:
>>>> + break;
>>>> + }
>>>> +
>>>> puts ("}");
>>>> }
>>>>
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>> new file mode 100644
>>>> index 00000000000..e70960c5b6d
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>> new file mode 100644
>>>> index 00000000000..dcc7a644a88
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>>> new file mode 100644
>>>> index 00000000000..3af0513e006
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>> new file mode 100644
>>>> index 00000000000..ea3c360d756
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>> new file mode 100644
>>>> index 00000000000..9fc659d2402
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>> new file mode 100644
>>>> index 00000000000..98275e5267d
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>> new file mode 100644
>>>> index 00000000000..8f6f0b11f09
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>> new file mode 100644
>>>> index 00000000000..d96959dd064
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>> @@ -0,0 +1,77 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
Thank you for the explanation.
For [4,4], I need more time to figure out which self-test triggers it, but I have confirmed from the log that it occurs.
The precision adjustment in this PR tries to align the mode precision with the ISA, as juzhe mentioned. The resulting precisions are:
VNx64BI precision [0x40, 0x40] // unchanged
VNx32BI precision [0x20, 0x20] // unchanged
VNx16BI precision [0x10, 0x10] // unchanged
VNx8BI precision [0x8, 0x8] // unchanged
VNx4BI precision [0x8, 0x8] => [4, 4]
VNx2BI precision [0x8, 0x8] => [2, 2]
VNx1BI precision [0x8, 0x8] => [1, 1]
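To make the table above concrete, here is a minimal sketch (plain C++, not GCC code) that models a precision [c0, c1] as c0 + c1 * X bits, where X is the runtime vector-length factor. The names Poly2, eval, precision_before and precision_after are hypothetical and exist only for this illustration; GCC itself keeps these values as poly_int entries in mode_precision.

```cpp
#include <cstdint>

// Hypothetical stand-in for a two-coefficient poly_int:
// value = c0 + c1 * X, where X >= 0 is only known at run time.
struct Poly2 {
  std::uint64_t c0;
  std::uint64_t c1;
};

// Evaluate the polynomial for one concrete X.
constexpr std::uint64_t eval(Poly2 p, std::uint64_t x) {
  return p.c0 + p.c1 * x;
}

// Precision of VNx<n>BI before the adjustment, per the table:
// modes narrower than VNx8BI were all rounded up to [8, 8].
constexpr Poly2 precision_before(std::uint64_t n) {
  return n < 8 ? Poly2{8, 8} : Poly2{n, n};
}

// Precision of VNx<n>BI after the adjustment, per the table: [n, n].
constexpr Poly2 precision_after(std::uint64_t n) {
  return Poly2{n, n};
}

// Before: VNx4BI and VNx8BI are indistinguishable (32 bits at X = 3).
static_assert(eval(precision_before(4), 3) == eval(precision_before(8), 3),
              "VNx4BI and VNx8BI shared one precision before the patch");
// After: VNx4BI covers 4 + 4 * 3 = 16 bits at X = 3.
static_assert(eval(precision_after(4), 3) == 16,
              "VNx4BI is 16 bits at X = 3 after the patch");
```

The two static_assert lines mirror the X = 3 case discussed in this thread: 16 bits of useful data under [4, 4] versus 32 bits under [8, 8].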
Pan
________________________________
From: Richard Sandiford <richard.sandiford@arm.com>
Sent: Wednesday, March 1, 2023 19:11
To: 盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org>
Cc: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>; pan2.li <pan2.li@intel.com>; 盼 李 <incarnation.p.lee@outlook.com>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Thank you all for your quick response.
>
> As juzhe mentioned, RISC-V memory accesses for vbool* are always aligned to a byte boundary in the compact layout, i.e. ceil(vl / 8) bytes.
OK, thanks to both of you. This is what I'd have expected.
In that case, I think both the can_div_away_from_zero_p and the
original patch (using size_one) will give the wrong results.
There isn't a way of representing ceil([4,4]/8) as a poly_int.
The result is (4+4X)/8 when X is odd and (8+4X)/8 when X is even.
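As a quick check of the arithmetic above, the following sketch (plain C++, not GCC's poly_int API; every function name is made up for illustration) evaluates ceil((4+4X)/8) for small X, compares it with the odd/even closed form, and shows that a linear form a + bX fitted at X = 0 and X = 1 already fails at X = 2. The size_one_bytes helper assumes that a size of [1, 1] means 1 + X bytes.

```cpp
#include <cstdint>

// Bits covered by a [4, 4] precision for a concrete X: 4 + 4 * X.
constexpr std::uint64_t bits_4_4(std::uint64_t x) { return 4 + 4 * x; }

// Bytes really needed for those bits: ceil(bits / 8).
constexpr std::uint64_t ceil_bytes(std::uint64_t x) {
  return (bits_4_4(x) + 7) / 8;
}

// Closed form from the text: (4+4X)/8 for odd X, (8+4X)/8 for even X.
constexpr std::uint64_t piecewise_bytes(std::uint64_t x) {
  return x % 2 == 1 ? (4 + 4 * x) / 8 : (8 + 4 * x) / 8;
}

// The only linear form a + b * X that matches at X = 0 and X = 1.
constexpr std::uint64_t linear_fit(std::uint64_t x) {
  return ceil_bytes(0) + (ceil_bytes(1) - ceil_bytes(0)) * x;
}

// Assumption: a byte size of [1, 1] evaluates to 1 + X bytes.
constexpr std::uint64_t size_one_bytes(std::uint64_t x) { return 1 + x; }

static_assert(ceil_bytes(0) == 1 && ceil_bytes(1) == 1 && ceil_bytes(2) == 2,
              "ceil((4+4X)/8) is 1, 1, 2 for X = 0, 1, 2");
static_assert(piecewise_bytes(3) == ceil_bytes(3) &&
              piecewise_bytes(4) == ceil_bytes(4),
              "the odd/even closed form matches the ceiling");
static_assert(linear_fit(2) != ceil_bytes(2),
              "no single a + b * X reproduces the ceiling");
static_assert(size_one_bytes(3) == 4 && ceil_bytes(3) == 2,
              "[1, 1] overshoots by 2 bytes at X = 3");
```

Because the required byte count alternates between two different linear forms, any single [a, b] either under-counts or over-counts for some X, which is the problem described above.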
> Actually, the data [4,4] comes from the self-test. The RISC-V mode precisions are as below.
>
> VNx64BI precision [0x40, 0x40].
> VNx32BI precision [0x20, 0x20].
> VNx16BI precision [0x10, 0x10].
> VNx8BI precision [0x8, 0x8].
> VNx4BI precision [0x8, 0x8].
> VNx2BI precision [0x8, 0x8].
> VNx1BI precision [0x8, 0x8].
Ah, OK. Which self-test causes this?
Richard
> The data [4, 4] affects the genmodes part: we cannot write the code below, because gcc_unreachable will be hit.
>
> if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT, &mode_size[E_%smode]))
> gcc_unreachable (); // Hit on [4, 4] of the self-test.
>
> Pan
> ________________________________
> From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
> Sent: Wednesday, March 1, 2023 18:46
> To: richard.sandiford <richard.sandiford@arm.com>; pan2.li <pan2.li@intel.com>
> Cc: incarnation.p.lee <incarnation.p.lee@outlook.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>
> Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
>>> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
>>>bits, even when that means accessing fully-unused bytes? E.g. 4+4X
>>>when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
>>>8+8X would be 32 bits/4 bytes. So a store of [8,8] for a precision
>>>of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
>
> Hi, Richard. Thank you for helping us.
> My understanding of RVV ISA:
>
> In RVV, we have mask modes (VNx1BI, VNx2BI, etc.) and data modes (VNx1QI, VNx4QI, VNx2DI, etc.).
> For data modes, we fully access the data and have no unused bytes, so we don't need to adjust the precision.
> However, for mask modes we access the mask bits in a compact layout (the mask bits for consecutive elements are stored consecutively).
> For example, in the current configuration, the 4 modes VNx1BI, VNx2BI, VNx4BI and VNx8BI have the same bytesize (1,1) but different bitsizes.
>
> VNx8BI is accessed fully, but only 1/2 of VNx4BI is accessed, 1/4 of VNx2BI and 1/8 of VNx1BI, always with byte alignment (I am not sure whether RVV supports bit alignment; I guess it cannot).
>
> If VNx8BI occupies only 1 byte (depending on the machine vector length), then VNx2BI, VNx4BI and VNx1BI are 2/8 byte, 4/8 byte and 1/8 byte. I think we can't access memory with bit alignment, so they will all be accessed in the same way.
> However, if VNx8BI occupies 8 bytes, then VNx1BI, VNx2BI and VNx4BI are 1 byte, 2 bytes and 4 bytes. They access different sizes.
>
> This is my comprehension of RVV ISA, feel free to correct me.
> Thanks.
>
> ________________________________
> juzhe.zhong@rivai.ai
>
> From: Richard Sandiford <richard.sandiford@arm.com>
> Date: 2023-03-01 18:11
> To: Li, Pan2 <pan2.li@intel.com>
> CC: 盼 李 <incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> "Li, Pan2" <pan2.li@intel.com> writes:
>> Hi Richard Sandiford,
>>
>> I just tried the overload for constant divisors with the printf below, and it works as you mentioned!
>>
>> printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
>> "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
>>
>> template<unsigned int N, typename Ca, typename Cb, typename Cq>
>> inline typename if_nonpoly<Cb, bool>::type
>> can_div_away_from_zero_p (const poly_int_pod<N, Ca> &a,
>>                           Cb b,
>>                           poly_int_pod<N, Cq> *quotient)
>> {
>>   if (!can_div_trunc_p (a, b, quotient))
>>     return false;
>>   if (maybe_ne (*quotient * b, a))
>>     for (unsigned int i = 0; i < N; ++i)
>>       quotient->coeffs[i] += (quotient->coeffs[i] < 0 ? -1 : 1);
>>   return true;
>> }
>>
>> But I have a question about one case, as below.
>>
>> Assume:
>> a = [4, 4], b = 8.
>>
>> can_div_trunc_p checks whether the remainder is constant, i.e. whether a.coeffs[i] % 8 == 0 for i >= 1. If the remainder is not constant, can_div_trunc_p leaves the quotient untouched and returns false.
>>
>> Thus, when a = [4, 4], can_div_away_from_zero_p leaves *quotient unchanged, i.e. mode_size[E_%smode] is unchanged in this case. However, the subsequent mode_size adjustment will set it to the real byte size, and I am not sure whether this is by design or requires additional handling.
>
> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
> bits, even when that means accessing fully-unused bytes? E.g. 4+4X
> when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
> 8+8X would be 32 bits/4 bytes. So a store of [8,8] for a precision
> of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
>
> Richard
>
>> Pan
>>
>> From: 盼 李 <incarnation.p.lee@outlook.com>
>> Sent: Tuesday, February 28, 2023 5:59 PM
>> To: Richard Sandiford <richard.sandiford@arm.com>; Li, Pan2 <pan2.li@intel.com>
>> Cc: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>
>> Understood, thanks for the explanations and suggestions. Let me have a try and keep you posted.
>>
>> Pan
>> ________________________________
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Tuesday, February 28, 2023 17:50
>> To: Li, Pan2 <pan2.li@intel.com>
>> Cc: 盼 李 <incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>; kito.cheng@sifive.com <kito.cheng@sifive.com>; rguenther@suse.de <rguenther@suse.de>
>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>
>> "Li, Pan2" <pan2.li@intel.com> writes:
>>> Hi Richard Sandiford,
>>>
>>> After some investigation, I am not sure if it is possible to make it general without any changes to exact_div. We can add one method like below to get the unit poly for all possible N.
>>>
>>> template<unsigned int N, typename Ca>
>>> inline POLY_CONST_RESULT (N, Ca, Ca)
>>> normalize_to_unit (const poly_int_pod<N, Ca> &a)
>>> {
>>>   typedef POLY_CONST_COEFF (Ca, Ca) C;
>>>
>>>   poly_int<N, C> normalized = a;
>>>
>>>   if (normalized.is_constant())
>>>     normalized.coeffs[0] = 1;
>>>   else
>>>     for (unsigned int i = 0; i < N; i++)
>>>       POLY_SET_COEFF (C, normalized, i, 1);
>>>
>>>   return normalized;
>>> }
>>>
>>> And then adjust the genmodes like below to consume the unit poly.
>>>
>>> printf (" poly_uint16 unit_poly = "
>>> "normalize_to_unit (mode_precision[E_%smode]);\n", m->name);
>>> printf (" if (known_lt (mode_precision[E_%smode], "
>>> "unit_poly * BITS_PER_UNIT))\n", m->name);
>>> printf (" mode_size[E_%smode] = unit_poly;\n", m->name);
>>>
>>> I am not sure if it is a good idea to introduce the above normalize code into exact_div, given that the comment on exact_div says “/* Return A / B, given that A is known to be a multiple of B. */”.
>>
>> My point was that we have multiple ways of dividing poly_ints:
>>
>> - exact_div, for when the caller knows that the result is always exact
>> - can_div_trunc_p, for truncating division (round towards 0)
>> - can_div_away_from_zero_p, for rounding away from 0
>> - ...
>>
>> This is like how we have multiple division *_EXPRs on trees.
>>
>> Until now, exact_div was the correct choice for modes because vector
>> modes didn't have padding. We're now changing that, so my suggestion
>> in the review was to change the division operation that we use.
>> Rather than use exact_div, we should now use can_div_away_from_zero_p,
>> which would have the effect of rounding the quotient up.
>>
>> Something like:
>>
>> if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT,
>> &mode_size[E_%smode]))
>> gcc_unreachable ();
>>
>> But this will require a new overload of can_div_away_from_zero_p, since
>> the existing one is for constant quotients rather than constant divisors.
>>
>> Thanks,
>> Richard
>>
>>>
>>> Could you please share your opinion on this from an expert's perspective? Thank you!
>>>
>>> Pan
>>>
>>> From: 盼 李 <incarnation.p.lee@outlook.com>
>>> Sent: Monday, February 27, 2023 11:13 PM
>>> To: Richard Sandiford <richard.sandiford@arm.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
>>> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Li, Pan2 <pan2.li@intel.com>
>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>
>>> No problem, I hope you had a good holiday.
>>>
>>> Thanks for pointing this out; the if part cannot handle a poly_int with N > 2. As I understand it, we need to make it general for every N of poly_int.
>>>
>>> So I would like to double-check with you how to make it general. I suppose a new function can_div_away_from_zero_p will replace the if (known_lt(,)) part in genmodes.cc, and exact_div will stay unchanged (considering the word "exact", I suppose we should not touch it), right? Then we still need a poly_int with all N coefficients set to 1 as the result when can_div_away_from_zero_p returns true.
>>>
>>> Thanks again for your professional suggestion, and have a nice day!
>>>
>>> Pan
>>> ________________________________
>>> From: Richard Sandiford <richard.sandiford@arm.com>
>>> Sent: Monday, February 27, 2023 22:24
>>> To: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
>>> Cc: incarnation.p.lee@outlook.com <incarnation.p.lee@outlook.com>; juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>; kito.cheng@sifive.com <kito.cheng@sifive.com>; rguenther@suse.de <rguenther@suse.de>; pan2.li@intel.com <pan2.li@intel.com>
>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>
>>> Sorry for the slow reply, been away for a couple of weeks.
>>>
>>> "incarnation.p.lee--- via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>>>> From: Pan Li <pan2.li@intel.com>
>>>>
>>>> Fix the rvv bool mode precision bug by adjusting the precision.
>>>> The bit size of vbool*_t will be adjusted to
>>>> [1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The
>>>> adjusted mode precision of vbool*_t will help the underlying passes
>>>> make the right decisions for both correctness and optimization.
>>>>
>>>> Given the sample code below:
>>>> void test_1(int8_t * restrict in, int8_t * restrict out)
>>>> {
>>>> vbool8_t v2 = *(vbool8_t*)in;
>>>> vbool16_t v5 = *(vbool16_t*)in;
>>>> *(vbool16_t*)(out + 200) = v5;
>>>> *(vbool8_t*)(out + 100) = v2;
>>>> }
>>>>
>>>> Before the precision adjustment:
>>>> addi a4,a1,100
>>>> vsetvli a5,zero,e8,m1,ta,ma
>>>> addi a1,a1,200
>>>> vlm.v v24,0(a0)
>>>> vsm.v v24,0(a4)
>>>> // Need one vsetvli and vlm.v for correctness here.
>>>> vsm.v v24,0(a1)
>>>>
>>>> After the precision adjustment:
>>>> csrr t0,vlenb
>>>> slli t1,t0,1
>>>> csrr a3,vlenb
>>>> sub sp,sp,t1
>>>> slli a4,a3,1
>>>> add a4,a4,sp
>>>> sub a3,a4,a3
>>>> vsetvli a5,zero,e8,m1,ta,ma
>>>> addi a2,a1,200
>>>> vlm.v v24,0(a0)
>>>> vsm.v v24,0(a3)
>>>> addi a1,a1,100
>>>> vsetvli a4,zero,e8,mf2,ta,ma
>>>> csrr t0,vlenb
>>>> vlm.v v25,0(a3)
>>>> vsm.v v25,0(a2)
>>>> slli t1,t0,1
>>>> vsetvli a5,zero,e8,m1,ta,ma
>>>> vsm.v v24,0(a1)
>>>> add sp,sp,t1
>>>> jr ra
>>>>
>>>> However, there may be some optimization opportunities after
>>>> the mode precision adjustment. They can be taken care of in
>>>> the RISC-V backend in separate follow-up PR(s).
>>>>
>>>> PR 108185
>>>> PR 108654
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> * config/riscv/riscv-modes.def (ADJUST_PRECISION):
>>>> * config/riscv/riscv.cc (riscv_v_adjust_precision):
>>>> * config/riscv/riscv.h (riscv_v_adjust_precision):
>>>> * genmodes.cc (ADJUST_PRECISION):
>>>> (emit_mode_adjustments):
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> * gcc.target/riscv/pr108185-1.c: New test.
>>>> * gcc.target/riscv/pr108185-2.c: New test.
>>>> * gcc.target/riscv/pr108185-3.c: New test.
>>>> * gcc.target/riscv/pr108185-4.c: New test.
>>>> * gcc.target/riscv/pr108185-5.c: New test.
>>>> * gcc.target/riscv/pr108185-6.c: New test.
>>>> * gcc.target/riscv/pr108185-7.c: New test.
>>>> * gcc.target/riscv/pr108185-8.c: New test.
>>>>
>>>> Signed-off-by: Pan Li <pan2.li@intel.com>
>>>> ---
>>>> gcc/config/riscv/riscv-modes.def | 8 +++
>>>> gcc/config/riscv/riscv.cc | 12 ++++
>>>> gcc/config/riscv/riscv.h | 1 +
>>>> gcc/genmodes.cc | 25 ++++++-
>>>> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
>>>> 12 files changed, 598 insertions(+), 1 deletion(-)
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>>
>>>> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
>>>> index d5305efa8a6..110bddce851 100644
>>>> --- a/gcc/config/riscv/riscv-modes.def
>>>> +++ b/gcc/config/riscv/riscv-modes.def
>>>> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>>> ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>>> ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>>>>
>>>> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
>>>> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
>>>> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
>>>> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
>>>> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
>>>> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
>>>> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
>>>> +
>>>> /*
>>>> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
>>>> | | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
>>>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>>>> index de3e1f903c7..cbe66c0e35b 100644
>>>> --- a/gcc/config/riscv/riscv.cc
>>>> +++ b/gcc/config/riscv/riscv.cc
>>>> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
>>>> return scale;
>>>> }
>>>>
>>>> +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
>>>> + PRECISION size for corresponding machine_mode. */
>>>> +
>>>> +poly_int64
>>>> +riscv_v_adjust_precision (machine_mode mode, int scale)
>>>> +{
>>>> + if (riscv_v_ext_vector_mode_p (mode))
>>>> + return riscv_vector_chunks * scale;
>>>> +
>>>> + return scale;
>>>> +}
>>>> +
>>>> /* Return true if X is a valid address for machine mode MODE. If it is,
>>>> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
>>>> effect. */
>>>> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
>>>> index 5bc7f2f467d..15b9317a8ce 100644
>>>> --- a/gcc/config/riscv/riscv.h
>>>> +++ b/gcc/config/riscv/riscv.h
>>>> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
>>>> extern unsigned riscv_bytes_per_vector_chunk;
>>>> extern poly_uint16 riscv_vector_chunks;
>>>> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
>>>> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
>>>> /* The number of bits and bytes in a RVV vector. */
>>>> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
>>>> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
>>>> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
>>>> index 2d418f09aab..12f4e6335e6 100644
>>>> --- a/gcc/genmodes.cc
>>>> +++ b/gcc/genmodes.cc
>>>> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
>>>> static struct mode_adjust *adj_format;
>>>> static struct mode_adjust *adj_ibit;
>>>> static struct mode_adjust *adj_fbit;
>>>> +static struct mode_adjust *adj_precision;
>>>>
>>>> /* Mode class operations. */
>>>> static enum mode_class
>>>> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
>>>> #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
>>>> #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
>>>> #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
>>>> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
>>>> #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
>>>> #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
>>>> #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
>>>> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
>>>> " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
>>>> m->name, m->name);
>>>> printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
>>>> - printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>>>> + /* Normalize the size to 1 if precision is less than BITS_PER_UNIT. */
>>>> + printf (" poly_uint16 size_one = "
>>>> + "mode_precision[E_%smode].is_constant ()\n", m->name);
>>>> + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
>>>
>>> Have you tried this on an x86_64 system? I wouldn't expect it to work
>>> because of the:
>>>
>>> STATIC_ASSERT (N >= 2);
>>>
>>> in the poly_uint16 constructor.
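The constructor restriction mentioned here can be modelled in isolation. This is a hedged sketch, not GCC's poly-int.h: toy_poly_int and the two aliases are hypothetical names, and the model uses std::enable_if instead of a STATIC_ASSERT in the constructor body so that the restriction can be checked with a type trait. It assumes a build with a single coefficient (N == 1), which is what the x86_64 question implies, next to one with N == 2.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Toy stand-in for poly_int<N, C> (assumption, not GCC code); N is the
// number of coefficients.
template <unsigned N, typename C>
struct toy_poly_int
{
  C coeffs[N];

  // One coefficient: available for every N.
  explicit toy_poly_int (C a) : coeffs{} { coeffs[0] = a; }

  // Two coefficients: only available when N >= 2.  The real constructor
  // is said to assert N >= 2; this model removes the overload instead.
  template <unsigned M = N,
            typename = typename std::enable_if<(M >= 2)>::type>
  toy_poly_int (C a, C b) : coeffs{}
  {
    coeffs[0] = a;
    coeffs[1] = b;
  }
};

using toy_u16_n1 = toy_poly_int<1, std::uint16_t>;
using toy_u16_n2 = toy_poly_int<2, std::uint16_t>;

// True when the two-argument form is usable for the given N.
constexpr bool toy_n1_accepts_two
  = std::is_constructible<toy_u16_n1, std::uint16_t, std::uint16_t>::value;
constexpr bool toy_n2_accepts_two
  = std::is_constructible<toy_u16_n2, std::uint16_t, std::uint16_t>::value;
```

In the model, an expression such as poly_uint16 (1, 1) has no valid counterpart once N is 1, so generated code that spells out two coefficients only builds on targets with at least two.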
>>>
>>>> + printf (" if (known_lt (mode_precision[E_%smode], "
>>>> + "size_one * BITS_PER_UNIT))\n", m->name);
>>>> + printf (" mode_size[E_%smode] = size_one;\n", m->name);
>>>> + printf (" else\n");
>>>> + printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>>>
>>> Now that the assert implicit in the original exact_div no longer holds,
>>> I think we should instead generalise it to can_div_away_from_zero_p
>>> (which will involve defining a new overload of can_div_away_from_zero_p).
>>> I think that will give the same result as the code above for the cases
>>> that the code above handles. But it should be more general too.
>>>
>>> TBH, I'm still sceptical that this is all that is needed. It seems
>>> unlikely that we've been so good at writing vector support code that
>>> we've made it work for precision < bitsize, despite that being an
>>> unsupported combination until now. But I guess we can fix problems
>>> on a case-by-case basis.
>>>
>>> Thanks,
>>> Richard
>>>
>>>> " BITS_PER_UNIT);\n", m->name, m->name);
>>>> printf (" mode_nunits[E_%smode] = ps;\n", m->name);
>>>> printf (" adjust_mode_mask (E_%smode);\n", m->name);
>>>> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
>>>> printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
>>>> a->file, a->line, a->mode->name, a->adjustment);
>>>>
>>>> + /* Adjust precision to the actual bits size. */
>>>> + for (a = adj_precision; a; a = a->next)
>>>> + switch (a->mode->cl)
>>>> + {
>>>> + case MODE_VECTOR_BOOL:
>>>> + printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
>>>> + a->adjustment);
>>>> + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
>>>> + break;
>>>> + default:
>>>> + break;
>>>> + }
>>>> +
>>>> puts ("}");
>>>> }
>>>>
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>> new file mode 100644
>>>> index 00000000000..e70960c5b6d
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>> new file mode 100644
>>>> index 00000000000..dcc7a644a88
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>>> new file mode 100644
>>>> index 00000000000..3af0513e006
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>> new file mode 100644
>>>> index 00000000000..ea3c360d756
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>> new file mode 100644
>>>> index 00000000000..9fc659d2402
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>> new file mode 100644
>>>> index 00000000000..98275e5267d
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>> new file mode 100644
>>>> index 00000000000..8f6f0b11f09
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>> new file mode 100644
>>>> index 00000000000..d96959dd064
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>> @@ -0,0 +1,77 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
I just ran a test with the code below; the [0x4, 0x4] case comes from VNx4BI. Note that the mode size is unchanged.
printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
"BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
VNx4BI Before precision [0x4, 0x4], size [0x4, 0]
VNx4BI After precision [0x4, 0x4], size [0x4, 0]
Pan
________________________________
From: Richard Sandiford <richard.sandiford@arm.com>
Sent: Wednesday, March 1, 2023 19:11
To: 盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org>
Cc: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>; pan2.li <pan2.li@intel.com>; 盼 李 <incarnation.p.lee@outlook.com>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Thank you all for your quick response.
>
> As juzhe mentioned, RISC-V memory accesses for vbool* are always aligned to a byte boundary in the compact layout, i.e. ceil(vl / 8) bytes.
OK, thanks to both of you. This is what I'd have expected.
In that case, I think both the can_div_away_from_zero_p and the
original patch (using size_one) will give the wrong results.
There isn't a way of representing ceil([4,4]/8) as a poly_int.
The result is (4+4X)/8 when X is odd and (8+4X)/8 when X is even.
> Actually, the [4,4] value comes from the self-test; the RISC-V mode precisions are as below.
>
> VNx64BI precision [0x40, 0x40].
> VNx32BI precision [0x20, 0x20].
> VNx16BI precision [0x10, 0x10].
> VNx8BI precision [0x8, 0x8].
> VNx4BI precision [0x8, 0x8].
> VNx2BI precision [0x8, 0x8].
> VNx1BI precision [0x8, 0x8].
Ah, OK. Which self-test causes this?
Richard
> The [4, 4] value affects the genmodes part: we cannot write the code below, because gcc_unreachable would be hit.
>
> if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT, &mode_size[E_%smode]))
> gcc_unreachable (); // Hit on [4, 4] of the self-test.
>
> Pan
> ________________________________
> From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
> Sent: Wednesday, March 1, 2023 18:46
> To: richard.sandiford <richard.sandiford@arm.com>; pan2.li <pan2.li@intel.com>
> Cc: incarnation.p.lee <incarnation.p.lee@outlook.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>
> Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
>>> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
>>>bits, even when that means accessing fully-unused bytes? E.g. 4+4X
>>>when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
>>>8+8X would be 32 bits/4 bytes. So a store of [8,8] for a precision
>>>of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
>
> Hi, Richard. Thank you for helping us.
> My understanding of RVV ISA:
>
> In RVV, we have mask modes (VNx1BI, VNx2BI, etc.) and data modes (VNx1QI, VNx4QI, VNx2DI, etc.).
> For data modes, we access the data fully and there are no unused bytes, so we don't need to adjust the precision.
> However, for mask modes we access the mask bits in a compact layout (the mask bits for the corresponding elements are stored consecutively).
> For example, in the current configuration VNx1BI, VNx2BI, VNx4BI and VNx8BI have the same bytesize (1,1) but different bitsizes.
>
> VNx8BI is accessed fully, but only 1/2 of VNx4BI is accessed, 1/4 of VNx2BI and 1/8 of VNx1BI, always with byte alignment (I am not sure whether RVV supports bit alignment; I guess it does not).
>
> If VNx8BI occupies only 1 byte (this depends on the machine vector length), then VNx1BI, VNx2BI and VNx4BI are 1/8, 2/8 and 4/8 of a byte. I think we can't access memory with bit alignment, so they will all be the same in the access.
> However, if VNx8BI occupies 8 bytes, then VNx1BI, VNx2BI and VNx4BI are 1 byte, 2 bytes and 4 bytes. They access different sizes.
>
> This is my understanding of the RVV ISA; feel free to correct me.
> Thanks.
>
> ________________________________
> juzhe.zhong@rivai.ai
>
> From: Richard Sandiford <richard.sandiford@arm.com>
> Date: 2023-03-01 18:11
> To: Li, Pan2 <pan2.li@intel.com>
> CC: 盼 李 <incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> "Li, Pan2" <pan2.li@intel.com> writes:
>> Hi Richard Sandiford,
>>
>> I just tried the constant-divisor overload with the printed division below, and it works as you described.
>>
>> printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
>> "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
>>
>> template<unsigned int N, typename Ca, typename Cb, typename Cq>
>> inline typename if_nonpoly<Cb, bool>::type
>> can_div_away_from_zero_p (const poly_int_pod<N, Ca> &a,
>> Cb b,
>> poly_int_pod<N, Cq> *quotient)
>> {
>> if (!can_div_trunc_p (a, b, quotient))
>> return false;
>> if (maybe_ne (*quotient * b, a))
>> for (unsigned int i = 0; i < N; ++i)
>> quotient->coeffs[i] += (quotient->coeffs[i] < 0 ? -1 : 1);
>> return true;
>> }
>>
>> But I have a question about the case below.
>>
>> Assume:
>> a = [4, 4], b = 8.
>>
>> can_div_trunc_p checks whether the remainder is constant, i.e. a.coeffs[i] % 8 == 0 (i >= 1). If the remainder is not constant, can_div_trunc_p leaves the quotient untouched and returns false.
>>
>> Thus, when a = [4, 4], can_div_away_from_zero_p leaves *quotient unchanged, i.e. mode_size[E_%smode] is unchanged in this case. However, the underlying mode_size will adjust it to the real byte size, and I am not sure whether that is by design or requires additional handling.
>
> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
> bits, even when that means accessing fully-unused bytes? E.g. 4+4X
> when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
> 8+8X would be 32 bits/4 bytes. So a store of [8,8] for a precision
> of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
>
> Richard
>
>> Pan
>>
>> From: 盼 李 <incarnation.p.lee@outlook.com>
>> Sent: Tuesday, February 28, 2023 5:59 PM
>> To: Richard Sandiford <richard.sandiford@arm.com>; Li, Pan2 <pan2.li@intel.com>
>> Cc: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>
>> Understood, thanks for the explanations and suggestions. Let me have a try and keep you posted.
>>
>> Pan
>> ________________________________
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Tuesday, February 28, 2023 17:50
>> To: Li, Pan2 <pan2.li@intel.com>
>> Cc: 盼 李 <incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>
>> "Li, Pan2" <pan2.li@intel.com> writes:
>>> Hi Richard Sandiford,
>>>
>>> After some investigation, I am not sure if it is possible to make it general without any changes to exact_div. We can add one method like below to get the unit poly for all possible N.
>>>
>>> template<unsigned int N, typename Ca>
>>> inline POLY_CONST_RESULT (N, Ca, Ca)
>>> normalize_to_unit (const poly_int_pod<N, Ca> &a)
>>> {
>>> typedef POLY_CONST_COEFF (Ca, Ca) C;
>>>
>>> poly_int<N, C> normalized = a;
>>>
>>> if (normalized.is_constant())
>>> normalized.coeffs[0] = 1;
>>> else
>>> for (unsigned int i = 0; i < N; i++)
>>> POLY_SET_COEFF (C, normalized, i, 1);
>>>
>>> return normalized;
>>> }
>>>
>>> And then adjust the genmodes like below to consume the unit poly.
>>>
>>> printf (" poly_uint16 unit_poly = "
>>> "normalize_to_unit (mode_precision[E_%smode]);\n", m->name);
>>> printf (" if (known_lt (mode_precision[E_%smode], "
>>> "unit_poly * BITS_PER_UNIT))\n", m->name);
>>> printf (" mode_size[E_%smode] = unit_poly;\n", m->name);
>>>
>>> I am not sure it is a good idea to introduce the above normalize code into exact_div, given that the comment on exact_div says “/* Return A / B, given that A is known to be a multiple of B. */”.
>>
>> My point was that we have multiple ways of dividing poly_ints:
>>
>> - exact_div, for when the caller knows that the result is always exact
>> - can_div_trunc_p, for truncating division (round towards 0)
>> - can_div_away_from_zero_p, for rounding away from 0
>> - ...
>>
>> This is like how we have multiple division *_EXPRs on trees.
>>
>> Until now, exact_div was the correct choice for modes because vector
>> modes didn't have padding. We're now changing that, so my suggestion
>> in the review was to change the division operation that we use.
>> Rather than use exact_div, we should now use can_div_away_from_zero_p,
>> which would have the effect of rounding the quotient up.
>>
>> Something like:
>>
>> if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT,
>> &mode_size[E_%smode]))
>> gcc_unreachable ();
>>
>> But this will require a new overload of can_div_away_from_zero_p, since
>> the existing one is for constant quotients rather than constant divisors.
>>
>> Thanks,
>> Richard
>>
>>>
>>> Could you please share your opinion on this? Thank you!
>>>
>>> Pan
>>>
>>> From: 盼 李 <incarnation.p.lee@outlook.com>
>>> Sent: Monday, February 27, 2023 11:13 PM
>>> To: Richard Sandiford <richard.sandiford@arm.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
>>> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Li, Pan2 <pan2.li@intel.com>
>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>
>>> No problem, I hope you had a good holiday.
>>>
>>> Thanks for pointing this out, the if part cannot take care of poly_int with N > 2. As I understand, we need to make it general for all the N of poly_int.
>>>
>>> So I would like to confirm with you how to make it general. I suppose there will be a new function can_div_away_from_zero_p to replace the if (known_lt(,)) part in genmodes.cc, and exact_div stays unchanged (given the word "exact", I suppose we should not touch it), right? Then we would still need a poly_int with all coefficients set to 1 as the result when can_div_away_from_zero_p is true.
>>>
>>> Thanks again for your suggestions, have a nice day!
>>>
>>> Pan
>>> ________________________________
>>> From: Richard Sandiford <richard.sandiford@arm.com>
>>> Sent: Monday, February 27, 2023 22:24
>>> To: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
>>> Cc: incarnation.p.lee@outlook.com; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; pan2.li@intel.com
>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>
>>> Sorry for the slow reply, been away for a couple of weeks.
>>>
>>> "incarnation.p.lee--- via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>>>> From: Pan Li <pan2.li@intel.com>
>>>>
>>>> Fix the bug of the rvv bool mode precision with the adjustment.
>>>> The bits size of vbool*_t will be adjusted to
>>>> [1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The
>>>> adjusted mode precision of vbool*_t will help underlying pass to
>>>> make the right decision for both the correctness and optimization.
>>>>
>>>> Given below sample code:
>>>> void test_1(int8_t * restrict in, int8_t * restrict out)
>>>> {
>>>> vbool8_t v2 = *(vbool8_t*)in;
>>>> vbool16_t v5 = *(vbool16_t*)in;
>>>> *(vbool16_t*)(out + 200) = v5;
>>>> *(vbool8_t*)(out + 100) = v2;
>>>> }
>>>>
>>>> Before the precision adjustment:
>>>> addi a4,a1,100
>>>> vsetvli a5,zero,e8,m1,ta,ma
>>>> addi a1,a1,200
>>>> vlm.v v24,0(a0)
>>>> vsm.v v24,0(a4)
>>>> // Need one vsetvli and vlm.v for correctness here.
>>>> vsm.v v24,0(a1)
>>>>
>>>> After the precision adjustment:
>>>> csrr t0,vlenb
>>>> slli t1,t0,1
>>>> csrr a3,vlenb
>>>> sub sp,sp,t1
>>>> slli a4,a3,1
>>>> add a4,a4,sp
>>>> sub a3,a4,a3
>>>> vsetvli a5,zero,e8,m1,ta,ma
>>>> addi a2,a1,200
>>>> vlm.v v24,0(a0)
>>>> vsm.v v24,0(a3)
>>>> addi a1,a1,100
>>>> vsetvli a4,zero,e8,mf2,ta,ma
>>>> csrr t0,vlenb
>>>> vlm.v v25,0(a3)
>>>> vsm.v v25,0(a2)
>>>> slli t1,t0,1
>>>> vsetvli a5,zero,e8,m1,ta,ma
>>>> vsm.v v24,0(a1)
>>>> add sp,sp,t1
>>>> jr ra
>>>>
>>>> However, there may be some optimization opportunities after
>>>> the mode precision adjustment. They can be taken care of in
>>>> the RISC-V backend in separate follow-up PR(s).
>>>>
>>>> PR 108185
>>>> PR 108654
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> * config/riscv/riscv-modes.def (ADJUST_PRECISION):
>>>> * config/riscv/riscv.cc (riscv_v_adjust_precision):
>>>> * config/riscv/riscv.h (riscv_v_adjust_precision):
>>>> * genmodes.cc (ADJUST_PRECISION):
>>>> (emit_mode_adjustments):
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> * gcc.target/riscv/pr108185-1.c: New test.
>>>> * gcc.target/riscv/pr108185-2.c: New test.
>>>> * gcc.target/riscv/pr108185-3.c: New test.
>>>> * gcc.target/riscv/pr108185-4.c: New test.
>>>> * gcc.target/riscv/pr108185-5.c: New test.
>>>> * gcc.target/riscv/pr108185-6.c: New test.
>>>> * gcc.target/riscv/pr108185-7.c: New test.
>>>> * gcc.target/riscv/pr108185-8.c: New test.
>>>>
>>>> Signed-off-by: Pan Li <pan2.li@intel.com>
>>>> ---
>>>> gcc/config/riscv/riscv-modes.def | 8 +++
>>>> gcc/config/riscv/riscv.cc | 12 ++++
>>>> gcc/config/riscv/riscv.h | 1 +
>>>> gcc/genmodes.cc | 25 ++++++-
>>>> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
>>>> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
>>>> 12 files changed, 598 insertions(+), 1 deletion(-)
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>>
>>>> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
>>>> index d5305efa8a6..110bddce851 100644
>>>> --- a/gcc/config/riscv/riscv-modes.def
>>>> +++ b/gcc/config/riscv/riscv-modes.def
>>>> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>>> ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>>> ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>>>>
>>>> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
>>>> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
>>>> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
>>>> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
>>>> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
>>>> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
>>>> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
>>>> +
>>>> /*
>>>> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
>>>> | | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
>>>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>>>> index de3e1f903c7..cbe66c0e35b 100644
>>>> --- a/gcc/config/riscv/riscv.cc
>>>> +++ b/gcc/config/riscv/riscv.cc
>>>> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
>>>> return scale;
>>>> }
>>>>
>>>> +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
>>>> + PRECISION size for corresponding machine_mode. */
>>>> +
>>>> +poly_int64
>>>> +riscv_v_adjust_precision (machine_mode mode, int scale)
>>>> +{
>>>> + if (riscv_v_ext_vector_mode_p (mode))
>>>> + return riscv_vector_chunks * scale;
>>>> +
>>>> + return scale;
>>>> +}
>>>> +
>>>> /* Return true if X is a valid address for machine mode MODE. If it is,
>>>> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
>>>> effect. */
>>>> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
>>>> index 5bc7f2f467d..15b9317a8ce 100644
>>>> --- a/gcc/config/riscv/riscv.h
>>>> +++ b/gcc/config/riscv/riscv.h
>>>> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
>>>> extern unsigned riscv_bytes_per_vector_chunk;
>>>> extern poly_uint16 riscv_vector_chunks;
>>>> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
>>>> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
>>>> /* The number of bits and bytes in a RVV vector. */
>>>> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
>>>> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
>>>> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
>>>> index 2d418f09aab..12f4e6335e6 100644
>>>> --- a/gcc/genmodes.cc
>>>> +++ b/gcc/genmodes.cc
>>>> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
>>>> static struct mode_adjust *adj_format;
>>>> static struct mode_adjust *adj_ibit;
>>>> static struct mode_adjust *adj_fbit;
>>>> +static struct mode_adjust *adj_precision;
>>>>
>>>> /* Mode class operations. */
>>>> static enum mode_class
>>>> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
>>>> #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
>>>> #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
>>>> #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
>>>> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
>>>> #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
>>>> #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
>>>> #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
>>>> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
>>>> " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
>>>> m->name, m->name);
>>>> printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
>>>> - printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>>>> + /* Normalize the size to 1 if precision is less than BITS_PER_UNIT. */
>>>> + printf (" poly_uint16 size_one = "
>>>> + "mode_precision[E_%smode].is_constant ()\n", m->name);
>>>> + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
>>>
>>> Have you tried this on an x86_64 system? I wouldn't expect it to work
>>> because of the:
>>>
>>> STATIC_ASSERT (N >= 2);
>>>
>>> in the poly_uint16 constructor.
>>>
>>>> + printf (" if (known_lt (mode_precision[E_%smode], "
>>>> + "size_one * BITS_PER_UNIT))\n", m->name);
>>>> + printf (" mode_size[E_%smode] = size_one;\n", m->name);
>>>> + printf (" else\n");
>>>> + printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>>>
>>> Now that the assert implicit in the original exact_div no longer holds,
>>> I think we should instead generalise it to can_div_away_from_zero_p
>>> (which will involve defining a new overload of can_div_away_from_zero_p).
>>> I think that will give the same result as the code above for the cases
>>> that the code above handles. But it should be more general too.
>>>
>>> TBH, I'm still sceptical that this is all that is needed. It seems
>>> unlikely that we've been so good at writing vector support code that
>>> we've made it work for precision < bitsize, despite that being an
>>> unsupported combination until now. But I guess we can fix problems
>>> on a case-by-case basis.
>>>
>>> Thanks,
>>> Richard
>>>
>>>> " BITS_PER_UNIT);\n", m->name, m->name);
>>>> printf (" mode_nunits[E_%smode] = ps;\n", m->name);
>>>> printf (" adjust_mode_mask (E_%smode);\n", m->name);
>>>> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
>>>> printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
>>>> a->file, a->line, a->mode->name, a->adjustment);
>>>>
>>>> + /* Adjust precision to the actual bits size. */
>>>> + for (a = adj_precision; a; a = a->next)
>>>> + switch (a->mode->cl)
>>>> + {
>>>> + case MODE_VECTOR_BOOL:
>>>> + printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
>>>> + a->adjustment);
>>>> + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
>>>> + break;
>>>> + default:
>>>> + break;
>>>> + }
>>>> +
>>>> puts ("}");
>>>> }
>>>>
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>> new file mode 100644
>>>> index 00000000000..e70960c5b6d
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>> new file mode 100644
>>>> index 00000000000..dcc7a644a88
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>>> new file mode 100644
>>>> index 00000000000..3af0513e006
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>> new file mode 100644
>>>> index 00000000000..ea3c360d756
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>> new file mode 100644
>>>> index 00000000000..9fc659d2402
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>> new file mode 100644
>>>> index 00000000000..98275e5267d
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>> new file mode 100644
>>>> index 00000000000..8f6f0b11f09
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>> @@ -0,0 +1,68 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>> new file mode 100644
>>>> index 00000000000..d96959dd064
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>> @@ -0,0 +1,77 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>> +
>>>> +#include "riscv_vector.h"
>>>> +
>>>> +void
>>>> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>> +
>>>> + *(vbool1_t*)(out + 100) = v1;
>>>> + *(vbool1_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>> +
>>>> + *(vbool2_t*)(out + 100) = v1;
>>>> + *(vbool2_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>> +
>>>> + *(vbool4_t*)(out + 100) = v1;
>>>> + *(vbool4_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>> +
>>>> + *(vbool8_t*)(out + 100) = v1;
>>>> + *(vbool8_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>> +
>>>> + *(vbool16_t*)(out + 100) = v1;
>>>> + *(vbool16_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>> +
>>>> + *(vbool32_t*)(out + 100) = v1;
>>>> + *(vbool32_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +void
>>>> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>> +
>>>> + *(vbool64_t*)(out + 100) = v1;
>>>> + *(vbool64_t*)(out + 200) = v2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> I just ran a test with the code below; the [0x4, 0x4] case comes from VNx4BI. Note that the mode size is unchanged.
>
> printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
> "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
>
> VNx4BI Before precision [0x4, 0x4], size [0x4, 0]
> VNx4BI After precision [0x4, 0x4], size [0x4, 0]
Yeah, the result is expected to be unchanged if the division fails.
That's a deliberate part of the interface. The can_* functions
should never be used without testing the boolean return value.
But this precision of [4,4] for VNx4BI is different from what you
listed below. Like I say, if the precision really is [4,4], and if
the size really is ceil([4,4]/8), then I don't think we can represent
that with current infrastructure.
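
To illustrate the interface point, here is a minimal sketch of the can_* convention using a toy two-coefficient type. `toy_poly` and `toy_can_div_away_from_zero_p` are hypothetical names made up for this illustration and are not GCC's poly_int code; the sketch only assumes nonnegative coefficients and shows that the quotient is left untouched when the function returns false.

```cpp
#include <cassert>

// Hypothetical stand-in for a two-coefficient poly_int: value = c0 + c1 * X,
// where X is a nonnegative integer that is only known at run time.
struct toy_poly
{
  unsigned c0;
  unsigned c1;
};

// Simplified model of a "can_*" division by a constant B that rounds away
// from zero.  Returns false and leaves *QUOTIENT untouched when the result
// cannot be written as q0 + q1 * X for every X.
inline bool
toy_can_div_away_from_zero_p (toy_poly a, unsigned b, toy_poly *quotient)
{
  // The X coefficient must divide exactly, otherwise the rounded result
  // is not linear in X.
  if (a.c1 % b != 0)
    return false;
  quotient->c0 = (a.c0 + b - 1) / b;  // round the constant term up
  quotient->c1 = a.c1 / b;
  return true;
}
```

With a precision of [4, 4] and a divisor of 8 the call returns false and the quotient keeps whatever it held before, which matches the unchanged size in the output quoted above. With [8, 8] it returns true and yields [1, 1].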
Thanks,
Richard
>
> Pan
> ________________________________
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Wednesday, March 1, 2023 19:11
> To: 盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org>
> Cc: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>; pan2.li <pan2.li@intel.com>; 盼 李 <incarnation.p.lee@outlook.com>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> 盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> Thank you all for your quick response.
>>
>> As juzhe mentioned, RISC-V memory accesses for the compact mask layout are always aligned to a byte boundary, i.e. ceil(vl / 8) bytes for vbool*.
>
> OK, thanks to both of you. This is what I'd have expected.
>
> In that case, I think both the can_div_away_from_zero_p and the
> original patch (using size_one) will give the wrong results.
> There isn't a way of representing ceil([4,4]/8) as a poly_int.
> The result is (4+4X)/8 when X is odd and (8+4X)/8 when X is even.
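
The parity dependence can be checked with a small worked example. This is a sketch only: `ceil_bytes` and `ceil_bytes_by_parity` are hypothetical helpers, and X stands for the runtime multiplier in a precision of 4+4X bits.

```cpp
#include <cassert>

// Bytes needed to hold a precision of 4 + 4*X bits, rounded up.
constexpr unsigned
ceil_bytes (unsigned x)
{
  return (4 + 4 * x + 7) / 8;
}

// The two closed forms quoted above; the division is exact in both cases.
constexpr unsigned
ceil_bytes_by_parity (unsigned x)
{
  return (x % 2 == 1) ? (4 + 4 * x) / 8 : (8 + 4 * x) / 8;
}

// A value of the form a + b*X has a constant step between consecutive X.
// The steps here alternate between 0 and 1, so no such a and b exist.
static_assert (ceil_bytes (1) - ceil_bytes (0) == 0, "step from X=0 to X=1");
static_assert (ceil_bytes (2) - ceil_bytes (1) == 1, "step from X=1 to X=2");
static_assert (ceil_bytes (3) - ceil_bytes (2) == 0, "step from X=2 to X=3");
```

For X = 0, 1, 2, 3 the byte counts are 1, 1, 2, 2, which is why a single pair of coefficients cannot describe the size.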
>
>> Actually, the [4,4] value comes from the self-test; the RISC-V mode precisions are as below.
>>
>> VNx64BI precision [0x40, 0x40].
>> VNx32BI precision [0x20, 0x20].
>> VNx16BI precision [0x10, 0x10].
>> VNx8BI precision [0x8, 0x8].
>> VNx4BI precision [0x8, 0x8].
>> VNx2BI precision [0x8, 0x8].
>> VNx1BI precision [0x8, 0x8].
>
> Ah, OK. Which self-test causes this?
>
> Richard
>
>> The [4, 4] value affects the genmodes part: we cannot write the code below, because the gcc_unreachable would be hit.
>>
>> if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT, &mode_size[E_%smode]))
>> gcc_unreachable (); // Hit on [4, 4] of the self-test.
>>
>> Pan
>> ________________________________
>> From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
>> Sent: Wednesday, March 1, 2023 18:46
>> To: richard.sandiford <richard.sandiford@arm.com>; pan2.li <pan2.li@intel.com>
>> Cc: incarnation.p.lee <incarnation.p.lee@outlook.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>
>> Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>
>>>> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
>>>> bits, even when that means accessing fully-unused bytes? E.g. 4+4X
>>>> when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
>>>> 8+8X would be 32 bits/4 bytes. So a store of [8,8] for a precision
>>>> of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
>>
>> Hi, Richard. Thank you for helping us.
>> My understanding of RVV ISA:
>>
>> In RVV, we have mask modes (VNx1BI, VNx2BI, etc.) and data modes (VNx1QI, VNx4QI, VNx2DI, etc.).
>> For data modes, we access the data fully and there are no unused bytes, so we don't need to adjust the precision.
>> However, for mask modes we access the mask bits in a compact layout (the mask bits for consecutive elements are stored consecutively).
>> For example, in the current configuration, VNx1BI, VNx2BI, VNx4BI and VNx8BI have the same byte size (1,1) but different bit sizes.
>>
>> VNx8BI is accessed fully, but only 1/2 of VNx4BI is accessed, 1/4 of VNx2BI and 1/8 of VNx1BI, all with byte alignment (I am not sure whether RVV supports bit alignment; I guess it cannot).
>>
>> If VNx8BI occupies only 1 byte (depending on the machine vector length), then VNx2BI, VNx4BI and VNx1BI are 2/8, 4/8 and 1/8 of a byte. I think we can't access memory at bit alignment, so they will be the same in the access.
>> However, if VNx8BI occupies 8 bytes, then VNx1BI, VNx2BI and VNx4BI are 1 byte, 2 bytes and 4 bytes. They access different sizes.
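
The two cases above can be worked through numerically. This is a sketch under stated assumptions: the precision of VNx<N>BI is taken to be N + N*X bits (the [N, N] form discussed in this thread), accesses are assumed to round up to whole bytes, and `mask_bytes` is a hypothetical helper.

```cpp
#include <cassert>

// Bytes touched by a mask mode VNx<n>BI whose precision is n + n*x bits,
// rounded up to a whole byte (no bit-granular memory access is assumed).
constexpr unsigned
mask_bytes (unsigned n, unsigned x)
{
  return (n * (1 + x) + 7) / 8;
}

// x = 0: VNx8BI occupies 1 byte and the smaller modes round up to 1 byte.
static_assert (mask_bytes (8, 0) == 1, "VNx8BI at x = 0");
static_assert (mask_bytes (4, 0) == 1, "VNx4BI at x = 0");
static_assert (mask_bytes (2, 0) == 1, "VNx2BI at x = 0");
static_assert (mask_bytes (1, 0) == 1, "VNx1BI at x = 0");

// x = 7: VNx8BI occupies 8 bytes and the smaller modes now differ.
static_assert (mask_bytes (8, 7) == 8, "VNx8BI at x = 7");
static_assert (mask_bytes (4, 7) == 4, "VNx4BI at x = 7");
static_assert (mask_bytes (2, 7) == 2, "VNx2BI at x = 7");
static_assert (mask_bytes (1, 7) == 1, "VNx1BI at x = 7");
```

The same helper reproduces the X=3 figures from the question being answered: 4+4X bits is 2 bytes, while 8+8X bits is 4 bytes.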
>>
>> This is my understanding of the RVV ISA; feel free to correct me.
>> Thanks.
>>
>> ________________________________
>> juzhe.zhong@rivai.ai
>>
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Date: 2023-03-01 18:11
>> To: Li, Pan2 <pan2.li@intel.com>
>> CC: 盼 李 <incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>> "Li, Pan2" <pan2.li@intel.com> writes:
>>> Hi Richard Sandiford,
>>>
>>> I just tried the overload for constant divisors with the print below, and it works as you mentioned!
>>>
>>> printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
>>> "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
>>>
>>> template<unsigned int N, typename Ca, typename Cb, typename Cq>
>>> inline typename if_nonpoly<Cb, bool>::type
>>> can_div_away_from_zero_p (const poly_int_pod<N, Ca> &a,
>>> Cb b,
>>> poly_int_pod<N, Cq> *quotient)
>>> {
>>> if (!can_div_trunc_p (a, b, quotient))
>>> return false;
>>> if (maybe_ne (*quotient * b, a))
>>> for (unsigned int i = 0; i < N; ++i)
>>> quotient->coeffs[i] += (quotient->coeffs[i] < 0 ? -1 : 1);
>>> return true;
>>> }
>>>
>>> But I have a question about one case, as below.
>>>
>>> Assume:
>>> a = [4, 4], b = 8.
>>>
>>> When can_div_trunc_p is called, it checks whether the remainder is constant, i.e. a.coeffs[i] % 8 == 0 (i >= 1). If the remainder is not constant, can_div_trunc_p leaves the quotient untouched and returns false.
>>>
>>> Thus, when a = [4, 4] for can_div_away_from_zero_p, the output *quotient will be unchanged, i.e. mode_size[E_%smode] will be unchanged in this case. However, the underlying mode_size will adjust it to the real byte size, and I am not sure whether this is by design or requires additional handling.
>>
>> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
>> bits, even when that means accessing fully-unused bytes? E.g. 4+4X
>> when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
>> 8+8X would be 32 bits/4 bytes. So a store of [8,8] for a precision
>> of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
>>
>> Richard
>>
>>> Pan
>>>
>>> From: 盼 李 <incarnation.p.lee@outlook.com>
>>> Sent: Tuesday, February 28, 2023 5:59 PM
>>> To: Richard Sandiford <richard.sandiford@arm.com>; Li, Pan2 <pan2.li@intel.com>
>>> Cc: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>
>>> Understood, thanks for the explanations and suggestions. Let me have a try and keep you posted.
>>>
>>> Pan
>>> ________________________________
>>> From: Richard Sandiford <richard.sandiford@arm.com>
>>> Sent: Tuesday, February 28, 2023 17:50
>>> To: Li, Pan2 <pan2.li@intel.com>
>>> Cc: 盼 李 <incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>
>>> "Li, Pan2" <pan2.li@intel.com> writes:
>>>> Hi Richard Sandiford,
>>>>
>>>> After some investigation, I am not sure if it is possible to make it general without any changes to exact_div. We can add one method like below to get the unit poly for all possible N.
>>>>
>>>> template<unsigned int N, typename Ca>
>>>> inline POLY_CONST_RESULT (N, Ca, Ca)
>>>> normalize_to_unit (const poly_int_pod<N, Ca> &a)
>>>> {
>>>> typedef POLY_CONST_COEFF (Ca, Ca) C;
>>>>
>>>> poly_int<N, C> normalized = a;
>>>>
>>>> if (normalized.is_constant())
>>>> normalized.coeffs[0] = 1;
>>>> else
>>>> for (unsigned int i = 0; i < N; i++)
>>>> POLY_SET_COEFF (C, normalized, i, 1);
>>>>
>>>> return normalized;
>>>> }
>>>>
>>>> And then adjust the genmodes like below to consume the unit poly.
>>>>
>>>> printf (" poly_uint16 unit_poly = "
>>>> "normalize_to_unit (mode_precision[E_%smode]);\n", m->name);
>>>> printf (" if (known_lt (mode_precision[E_%smode], "
>>>> "unit_poly * BITS_PER_UNIT))\n", m->name);
>>>> printf (" mode_size[E_%smode] = unit_poly;\n", m->name);
>>>>
>>>> I am not sure whether it is a good idea to introduce the above normalization code into exact_div, given that the comment on exact_div says “/* Return A / B, given that A is known to be a multiple of B. */”.
>>>
>>> My point was that we have multiple ways of dividing poly_ints:
>>>
>>> - exact_div, for when the caller knows that the result is always exact
>>> - can_div_trunc_p, for truncating division (round towards 0)
>>> - can_div_away_from_zero_p, for rounding away from 0
>>> - ...
>>>
>>> This is like how we have multiple division *_EXPRs on trees.
>>>
>>> Until now, exact_div was the correct choice for modes because vector
>>> modes didn't have padding. We're now changing that, so my suggestion
>>> in the review was to change the division operation that we use.
>>> Rather than use exact_div, we should now use can_div_away_from_zero_p,
>>> which would have the effect of rounding the quotient up.
>>>
>>> Something like:
>>>
>>> if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT,
>>> &mode_size[E_%smode]))
>>> gcc_unreachable ();
>>>
>>> But this will require a new overload of can_div_away_from_zero_p, since
>>> the existing one is for constant quotients rather than constant divisors.
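
For plain nonnegative integers, the rounding behaviours in the list above can be sketched as follows. The helper names are hypothetical and deliberately differ from the poly_int functions; the sketch only illustrates why a 4-bit precision needs rounding up to reach a 1-byte size.

```cpp
#include <cassert>

// Truncating division: rounds toward zero.
constexpr unsigned
toy_div_trunc (unsigned a, unsigned b)
{
  return a / b;
}

// Division rounding away from zero (that is, up, for nonnegative operands).
constexpr unsigned
toy_div_away_from_zero (unsigned a, unsigned b)
{
  return a / b + (a % b != 0 ? 1 : 0);
}

// Exact division: only valid when A is known to be a multiple of B.
inline unsigned
toy_exact_div (unsigned a, unsigned b)
{
  assert (a % b == 0);
  return a / b;
}
```

Dividing a 4-bit precision by 8 bits per byte gives 0 when truncated and 1 when rounded away from zero; exact division does not apply because 4 is not a multiple of 8.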
>>>
>>> Thanks,
>>> Richard
>>>
>>>>
>>>> Could you please share your opinion on this from an expert's perspective? Thank you!
>>>>
>>>> Pan
>>>>
>>>> From: 盼 李 <incarnation.p.lee@outlook.com>
>>>> Sent: Monday, February 27, 2023 11:13 PM
>>>> To: Richard Sandiford <richard.sandiford@arm.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
>>>> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Li, Pan2 <pan2.li@intel.com>
>>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>>
>>>> No problem, I hope you had a good holiday.
>>>>
>>>> Thanks for pointing this out; the if part cannot handle a poly_int with N > 2. As I understand it, we need to make it general for every N of poly_int.
>>>>
>>>> So I would like to confirm with you how to make it general. I suppose there will be a new function can_div_away_from_zero_p to replace the if (known_lt(,)) part in genmodes.cc, leaving exact_div unchanged (given the word "exact", I suppose we should not touch it), right? Then we still need a poly_int with all N coefficients set to 1 as the result if can_div_away_from_zero_p is true.
>>>>
>>>> Thanks again for your expert suggestion, and have a nice day!
>>>>
>>>> Pan
>>>> ________________________________
>>>> From: Richard Sandiford <richard.sandiford@arm.com>
>>>> Sent: Monday, February 27, 2023 22:24
>>>> To: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
>>>> Cc: incarnation.p.lee@outlook.com; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; pan2.li@intel.com
>>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>>
>>>> Sorry for the slow reply, been away for a couple of weeks.
>>>>
>>>> "incarnation.p.lee--- via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>>>>> From: Pan Li <pan2.li@intel.com>
>>>>>
>>>>> Fix the bug of the rvv bool mode precision with the adjustment.
>>>>> The bits size of vbool*_t will be adjusted to
>>>>> [1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The
>>>>> adjusted mode precision of vbool*_t will help underlying pass to
>>>>> make the right decision for both the correctness and optimization.
>>>>>
>>>>> Given below sample code:
>>>>> void test_1(int8_t * restrict in, int8_t * restrict out)
>>>>> {
>>>>> vbool8_t v2 = *(vbool8_t*)in;
>>>>> vbool16_t v5 = *(vbool16_t*)in;
>>>>> *(vbool16_t*)(out + 200) = v5;
>>>>> *(vbool8_t*)(out + 100) = v2;
>>>>> }
>>>>>
>>>>> Before the precision adjustment:
>>>>> addi a4,a1,100
>>>>> vsetvli a5,zero,e8,m1,ta,ma
>>>>> addi a1,a1,200
>>>>> vlm.v v24,0(a0)
>>>>> vsm.v v24,0(a4)
>>>>> // Need one vsetvli and vlm.v for correctness here.
>>>>> vsm.v v24,0(a1)
>>>>>
>>>>> After the precision adjustment:
>>>>> csrr t0,vlenb
>>>>> slli t1,t0,1
>>>>> csrr a3,vlenb
>>>>> sub sp,sp,t1
>>>>> slli a4,a3,1
>>>>> add a4,a4,sp
>>>>> sub a3,a4,a3
>>>>> vsetvli a5,zero,e8,m1,ta,ma
>>>>> addi a2,a1,200
>>>>> vlm.v v24,0(a0)
>>>>> vsm.v v24,0(a3)
>>>>> addi a1,a1,100
>>>>> vsetvli a4,zero,e8,mf2,ta,ma
>>>>> csrr t0,vlenb
>>>>> vlm.v v25,0(a3)
>>>>> vsm.v v25,0(a2)
>>>>> slli t1,t0,1
>>>>> vsetvli a5,zero,e8,m1,ta,ma
>>>>> vsm.v v24,0(a1)
>>>>> add sp,sp,t1
>>>>> jr ra
>>>>>
>>>>> However, there may be some optimization opportunities after
>>>>> the mode precision adjustment. They can be taken care of in
>>>>> the RISC-V backend in separate follow-up PR(s).
>>>>>
>>>>> PR 108185
>>>>> PR 108654
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>> * config/riscv/riscv-modes.def (ADJUST_PRECISION):
>>>>> * config/riscv/riscv.cc (riscv_v_adjust_precision):
>>>>> * config/riscv/riscv.h (riscv_v_adjust_precision):
>>>>> * genmodes.cc (ADJUST_PRECISION):
>>>>> (emit_mode_adjustments):
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>> * gcc.target/riscv/pr108185-1.c: New test.
>>>>> * gcc.target/riscv/pr108185-2.c: New test.
>>>>> * gcc.target/riscv/pr108185-3.c: New test.
>>>>> * gcc.target/riscv/pr108185-4.c: New test.
>>>>> * gcc.target/riscv/pr108185-5.c: New test.
>>>>> * gcc.target/riscv/pr108185-6.c: New test.
>>>>> * gcc.target/riscv/pr108185-7.c: New test.
>>>>> * gcc.target/riscv/pr108185-8.c: New test.
>>>>>
>>>>> Signed-off-by: Pan Li <pan2.li@intel.com>
>>>>> ---
>>>>> gcc/config/riscv/riscv-modes.def | 8 +++
>>>>> gcc/config/riscv/riscv.cc | 12 ++++
>>>>> gcc/config/riscv/riscv.h | 1 +
>>>>> gcc/genmodes.cc | 25 ++++++-
>>>>> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
>>>>> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
>>>>> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
>>>>> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
>>>>> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
>>>>> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
>>>>> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
>>>>> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
>>>>> 12 files changed, 598 insertions(+), 1 deletion(-)
>>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>>>
>>>>> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
>>>>> index d5305efa8a6..110bddce851 100644
>>>>> --- a/gcc/config/riscv/riscv-modes.def
>>>>> +++ b/gcc/config/riscv/riscv-modes.def
>>>>> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>>>> ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>>>> ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>>>>>
>>>>> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
>>>>> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
>>>>> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
>>>>> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
>>>>> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
>>>>> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
>>>>> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
>>>>> +
>>>>> /*
>>>>> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
>>>>> | | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
>>>>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>>>>> index de3e1f903c7..cbe66c0e35b 100644
>>>>> --- a/gcc/config/riscv/riscv.cc
>>>>> +++ b/gcc/config/riscv/riscv.cc
>>>>> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
>>>>> return scale;
>>>>> }
>>>>>
>>>>> +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
>>>>> + PRECISION size for corresponding machine_mode. */
>>>>> +
>>>>> +poly_int64
>>>>> +riscv_v_adjust_precision (machine_mode mode, int scale)
>>>>> +{
>>>>> + if (riscv_v_ext_vector_mode_p (mode))
>>>>> + return riscv_vector_chunks * scale;
>>>>> +
>>>>> + return scale;
>>>>> +}
>>>>> +
>>>>> /* Return true if X is a valid address for machine mode MODE. If it is,
>>>>> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
>>>>> effect. */
>>>>> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
>>>>> index 5bc7f2f467d..15b9317a8ce 100644
>>>>> --- a/gcc/config/riscv/riscv.h
>>>>> +++ b/gcc/config/riscv/riscv.h
>>>>> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
>>>>> extern unsigned riscv_bytes_per_vector_chunk;
>>>>> extern poly_uint16 riscv_vector_chunks;
>>>>> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
>>>>> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
>>>>> /* The number of bits and bytes in a RVV vector. */
>>>>> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
>>>>> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
>>>>> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
>>>>> index 2d418f09aab..12f4e6335e6 100644
>>>>> --- a/gcc/genmodes.cc
>>>>> +++ b/gcc/genmodes.cc
>>>>> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
>>>>> static struct mode_adjust *adj_format;
>>>>> static struct mode_adjust *adj_ibit;
>>>>> static struct mode_adjust *adj_fbit;
>>>>> +static struct mode_adjust *adj_precision;
>>>>>
>>>>> /* Mode class operations. */
>>>>> static enum mode_class
>>>>> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
>>>>> #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
>>>>> #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
>>>>> #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
>>>>> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
>>>>> #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
>>>>> #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
>>>>> #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
>>>>> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
>>>>> " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
>>>>> m->name, m->name);
>>>>> printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
>>>>> - printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>>>>> + /* Normalize the size to 1 if precision is less than BITS_PER_UNIT. */
>>>>> + printf (" poly_uint16 size_one = "
>>>>> + "mode_precision[E_%smode].is_constant ()\n", m->name);
>>>>> + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
>>>>
>>>> Have you tried this on an x86_64 system? I wouldn't expect it to work
>>>> because of the:
>>>>
>>>> STATIC_ASSERT (N >= 2);
>>>>
>>>> in the poly_uint16 constructor.
>>>>
>>>>> + printf (" if (known_lt (mode_precision[E_%smode], "
>>>>> + "size_one * BITS_PER_UNIT))\n", m->name);
>>>>> + printf (" mode_size[E_%smode] = size_one;\n", m->name);
>>>>> + printf (" else\n");
>>>>> + printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>>>>
>>>> Now that the assert implicit in the original exact_div no longer holds,
>>>> I think we should instead generalise it to can_div_away_from_zero_p
>>>> (which will involve defining a new overload of can_div_away_from_zero_p).
>>>> I think that will give the same result as the code above for the cases
>>>> that the code above handles. But it should be more general too.
>>>>
>>>> TBH, I'm still sceptical that this is all that is needed. It seems
>>>> unlikely that we've been so good at writing vector support code that
>>>> we've made it work for precision < bitsize, despite that being an
>>>> unsupported combination until now. But I guess we can fix problems
>>>> on a case-by-case basis.
>>>>
>>>> Thanks,
>>>> Richard
>>>>
>>>>> " BITS_PER_UNIT);\n", m->name, m->name);
>>>>> printf (" mode_nunits[E_%smode] = ps;\n", m->name);
>>>>> printf (" adjust_mode_mask (E_%smode);\n", m->name);
>>>>> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
>>>>> printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
>>>>> a->file, a->line, a->mode->name, a->adjustment);
>>>>>
>>>>> + /* Adjust precision to the actual bits size. */
>>>>> + for (a = adj_precision; a; a = a->next)
>>>>> + switch (a->mode->cl)
>>>>> + {
>>>>> + case MODE_VECTOR_BOOL:
>>>>> + printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
>>>>> + a->adjustment);
>>>>> + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
>>>>> + break;
>>>>> + default:
>>>>> + break;
>>>>> + }
>>>>> +
>>>>> puts ("}");
>>>>> }
>>>>>
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>>> new file mode 100644
>>>>> index 00000000000..e70960c5b6d
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool1_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool1_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool1_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool1_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool1_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool1_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>>> new file mode 100644
>>>>> index 00000000000..dcc7a644a88
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool2_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool2_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool2_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool2_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool2_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool2_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>>>> new file mode 100644
>>>>> index 00000000000..3af0513e006
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool4_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool4_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool4_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool4_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool4_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool4_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>>> new file mode 100644
>>>>> index 00000000000..ea3c360d756
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>>> new file mode 100644
>>>>> index 00000000000..9fc659d2402
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>>> new file mode 100644
>>>>> index 00000000000..98275e5267d
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>>> new file mode 100644
>>>>> index 00000000000..8f6f0b11f09
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>>> new file mode 100644
>>>>> index 00000000000..d96959dd064
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>>> @@ -0,0 +1,77 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool1_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool2_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool4_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
Actually, we just want to differentiate VNx1BI, VNx2BI, VNx4BI and VNx8BI. GCC currently considers them the same, which produces bugs in RVV.
This patch just adjusts the precision to differentiate them; as you say, they may still not be handled accurately according to that precision.
However, at least it helps us differentiate these 4 mask modes and avoid hitting the bugs.
This is the current solution we have for fixing the RVV bug without affecting other targets.
Do you have other ideas for fixing this issue? Or is a patch that adds adjust_precision support OK for GCC?
Thanks.
juzhe.zhong@rivai.ai
From: Richard Sandiford
Date: 2023-03-01 20:03
To: 盼 李 via Gcc-patches
CC: 盼 李; juzhe.zhong@rivai.ai; pan2.li; Kito.cheng; rguenther
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> I just ran a test with the code below; the [0x4, 0x4] case comes from VNx4BI. Note that the mode size is unchanged.
>
> printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
> "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
>
> VNx4BI Before precision [0x4, 0x4], size [0x4, 0]
> VNx4BI After precision [0x4, 0x4], size [0x4, 0]
Yeah, the result is expected to be unchanged if the division fails.
That's a deliberate part of the interface. The can_* functions
should never be used without testing the boolean return value.
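To make the point about the return value concrete, here is a minimal standalone sketch. It does not use GCC's poly-int.h: `Poly2` and `can_div_away_from_zero` are hypothetical stand-ins that model a two-coefficient poly_int and a constant-divisor division, assuming the division fails whenever the non-constant coefficient is not a multiple of the divisor.

```cpp
// Hypothetical stand-in for a two-coefficient poly_int: value = c0 + c1 * X,
// where X is a runtime-invariant unknown >= 0.  This is not GCC's poly_int.
struct Poly2 {
  unsigned c0;
  unsigned c1;
};

// Simplified model of a "can_div" helper with a constant divisor b.
// Returns false, leaving *quotient untouched, when the rounded-up result
// cannot be written as a Poly2 (here: when c1 is not a multiple of b).
static bool can_div_away_from_zero(Poly2 a, unsigned b, Poly2 *quotient) {
  if (a.c1 % b != 0)
    return false;
  // c1 divides exactly, so only the constant term needs rounding up.
  quotient->c0 = (a.c0 + b - 1) / b;
  quotient->c1 = a.c1 / b;
  return true;
}

// Caller pattern: only trust *size when the helper reports success.
static bool set_size_from_precision(Poly2 precision, Poly2 *size) {
  return can_div_away_from_zero(precision, 8, size);
}
```

In this model, calling `set_size_from_precision` with a precision of [4,4] returns false and leaves the old size in place, which matches the unchanged "size [0x4, 0]" in the output above, while a precision of [8,8] returns true and sets the size to [1,1].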
But this precision of [4,4] for VNx4BI is different from what you
listed below. Like I say, if the precision really is [4,4], and if
the size really is ceil([4,4]/8), then I don't think we can represent
that with current infrastructure.
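As a small sanity check of why ceil([4,4]/8) does not fit, the sketch below evaluates the byte count for a few values of X and tests whether it can be written as a + b*X. The helper names `mask_bytes` and `is_linear_in_x` are made up for illustration and are not part of GCC.

```cpp
// Bytes needed to hold (4 + 4 * x) mask bits, i.e. ceil((4 + 4x) / 8).
// x models the runtime-invariant unknown X of a poly_int.
static unsigned mask_bytes(unsigned x) {
  unsigned bits = 4 + 4 * x;
  return (bits + 7) / 8;
}

// Returns true if mask_bytes(x) == a + b * x for every x in [0, limit),
// where a and b are the only candidates, fixed by the values at x = 0
// and x = 1.  A false result means no such a and b exist.
static bool is_linear_in_x(unsigned limit) {
  unsigned a = mask_bytes(0);
  unsigned b = mask_bytes(1) - a;
  for (unsigned x = 0; x < limit; ++x)
    if (mask_bytes(x) != a + b * x)
      return false;
  return true;
}
```

The byte counts for X = 0, 1, 2, 3 are 1, 1, 2, 2, so the step alternates between 0 and 1 and `is_linear_in_x (4)` returns false. No single pair of coefficients reproduces that sequence.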
Thanks,
Richard
>
> Pan
> ________________________________
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Wednesday, March 1, 2023 19:11
> To: 盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org>
> Cc: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>; pan2.li <pan2.li@intel.com>; 盼 李 <incarnation.p.lee@outlook.com>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> 盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> Thank you all for your quick response.
>>
>> As juzhe mentioned, memory accesses in RISC-V are always aligned to a byte boundary in the compact mode, i.e. ceil(vl / 8) bytes for vbool*.
>
> OK, thanks to both of you. This is what I'd have expected.
>
> In that case, I think both the can_div_away_from_zero_p and the
> original patch (using size_one) will give the wrong results.
> There isn't a way of representing ceil([4,4]/8) as a poly_int.
> The result is (4+4X)/8 when X is odd and (8+4X)/8 when X is even.
>
>> Actually, the [4,4] data comes from the self-test; the RISC-V mode precisions are as below.
>>
>> VNx64BI precision [0x40, 0x40].
>> VNx32BI precision [0x20, 0x20].
>> VNx16BI precision [0x10, 0x10].
>> VNx8BI precision [0x8, 0x8].
>> VNx4BI precision [0x8, 0x8].
>> VNx2BI precision [0x8, 0x8].
>> VNx1BI precision [0x8, 0x8].
>
> Ah, OK. Which self-test causes this?
>
> Richard
>
>> The [4, 4] data affects the genmodes part: we cannot write the code below, because the gcc_unreachable will be hit.
>>
>> if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT, &mode_size[E_%smode]))
>> gcc_unreachable (); // Hit on [4, 4] of the self-test.
>>
>> Pan
>> ________________________________
>> From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
>> Sent: Wednesday, March 1, 2023 18:46
>> To: richard.sandiford <richard.sandiford@arm.com>; pan2.li <pan2.li@intel.com>
>> Cc: incarnation.p.lee <incarnation.p.lee@outlook.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>
>> Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>
>>>> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
>>>>bits, even when that means accessing fully-unused bytes? E.g. 4+4X
>>>>when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
>>>>8+8X would be 32 bits/4 bytes. So a store of [8,8] for a precision
>>>>of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
>>
>> Hi, Richard. Thank you for helping us.
>> My understanding of RVV ISA:
>>
>> In RVV, we have mask modes (VNx1BI, VNx2BI, etc.) and data modes (VNx1QI, VNx4QI, VNx2DI, etc.).
>> For data modes, we access the data fully and have no unused bytes, so we don't need to adjust the precision.
>> However, for mask modes we access the mask bits in a compact layout (the mask bits for consecutive elements are stored consecutively).
>> For example, in the current configuration VNx1BI, VNx2BI, VNx4BI and VNx8BI have the same byte size (1,1) but different bit sizes.
>>
>> VNx8BI is accessed fully, but VNx4BI accesses only 1/2 of that, VNx2BI 1/4 and VNx1BI 1/8, subject to byte alignment (I am not sure whether RVV supports bit alignment; I guess it does not).
>>
>> If VNx8BI occupies only 1 byte (depending on the machine vector length), then VNx2BI, VNx4BI and VNx1BI are 2/8 byte, 4/8 byte and 1/8 byte. I think we can't access memory at bit granularity, so they will be the same in the access.
>> However, if VNx8BI occupies 8 bytes, then VNx2BI, VNx4BI and VNx1BI are 2 bytes, 4 bytes and 1 byte. They access different sizes.
>>
>> This is my comprehension of RVV ISA, feel free to correct me.
>> Thanks.
>>
>> ________________________________
>> juzhe.zhong@rivai.ai
>>
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Date: 2023-03-01 18:11
>> To: Li, Pan2 <pan2.li@intel.com>
>> CC: 盼 李 <incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>> "Li, Pan2" <pan2.li@intel.com> writes:
>>> Hi Richard Sandiford,
>>>
>>> I just tried the constant-divisor overload with the print below, and it works as you mentioned!
>>>
>>> printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
>>> "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
>>>
>>> template<unsigned int N, typename Ca, typename Cb, typename Cq>
>>> inline typename if_nonpoly<Cb, bool>::type
>>> can_div_away_from_zero_p (const poly_int_pod<N, Ca> &a,
>>> Cb b,
>>> poly_int_pod<N, Cq> *quotient)
>>> {
>>> if (!can_div_trunc_p (a, b, quotient))
>>> return false;
>>> if (maybe_ne (*quotient * b, a))
>>> for (unsigned int i = 0; i < N; ++i)
>>> quotient->coeffs[i] += (quotient->coeffs[i] < 0 ? -1 : 1);
>>> return true;
>>> }
>>>
>>> But I may have a question about the one case as below.
>>>
>>> Assume:
>>> a = [4, 4], b = 8.
>>>
>>> When it reaches can_div_trunc_p, it checks whether the remainder is constant, i.e. a.coeffs[i] % 8 == 0 (i >= 1). If the remainder is not constant, can_div_trunc_p leaves quotient untouched and returns false.
>>>
>>> Thus, when a = [4, 4] for can_div_away_from_zero_p, the output *quotient will be unchanged, i.e. mode_size[E_%smode] will be unchanged for this case. However, the underlying mode_size will adjust it to the real byte size, and I am not sure if that is by design or requires additional handling.
>>
>> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
>> bits, even when that means accessing fully-unused bytes? E.g. 4+4X
>> when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
>> 8+8X would be 32 bits/4 bytes. So a store of [8,8] for a precision
>> of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
>>
>> Richard
>>
>>> Pan
>>>
>>> From: 盼 李 <incarnation.p.lee@outlook.com>
>>> Sent: Tuesday, February 28, 2023 5:59 PM
>>> To: Richard Sandiford <richard.sandiford@arm.com>; Li, Pan2 <pan2.li@intel.com>
>>> Cc: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>
>>> Understood, thanks for the explanations and suggestions. Let me have a try and keep you posted.
>>>
>>> Pan
>>> ________________________________
>>> From: Richard Sandiford <richard.sandiford@arm.com>
>>> Sent: Tuesday, February 28, 2023 17:50
>>> To: Li, Pan2 <pan2.li@intel.com>
>>> Cc: 盼 李 <incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>
>>> "Li, Pan2" <pan2.li@intel.com> writes:
>>>> Hi Richard Sandiford,
>>>>
>>>> After some investigation, I am not sure if it is possible to make it general without any changes to exact_div. We can add one method like below to get the unit poly for all possible N.
>>>>
>>>> template<unsigned int N, typename Ca>
>>>> inline POLY_CONST_RESULT (N, Ca, Ca)
>>>> normalize_to_unit (const poly_int_pod<N, Ca> &a)
>>>> {
>>>> typedef POLY_CONST_COEFF (Ca, Ca) C;
>>>>
>>>> poly_int<N, C> normalized = a;
>>>>
>>>> if (normalized.is_constant())
>>>> normalized.coeffs[0] = 1;
>>>> else
>>>> for (unsigned int i = 0; i < N; i++)
>>>> POLY_SET_COEFF (C, normalized, i, 1);
>>>>
>>>> return normalized;
>>>> }
>>>>
>>>> And then adjust the genmodes like below to consume the unit poly.
>>>>
>>>> printf (" poly_uint16 unit_poly = "
>>>> "normalize_to_unit (mode_precision[E_%smode]);\n", m->name);
>>>> printf (" if (known_lt (mode_precision[E_%smode], "
>>>> "unit_poly * BITS_PER_UNIT))\n", m->name);
>>>> printf (" mode_size[E_%smode] = unit_poly;\n", m->name);
>>>>
>>>> I am not sure if it is a good idea to introduce the above normalize code into exact_div, given that the comment on exact_div says “/* Return A / B, given that A is known to be a multiple of B. */”.
>>>
>>> My point was that we have multiple ways of dividing poly_ints:
>>>
>>> - exact_div, for when the caller knows that the result is always exact
>>> - can_div_trunc_p, for truncating division (round towards 0)
>>> - can_div_away_from_zero_p, for rounding away from 0
>>> - ...
>>>
>>> This is like how we have multiple division *_EXPRs on trees.
>>>
>>> Until now, exact_div was the correct choice for modes because vector
>>> modes didn't have padding. We're now changing that, so my suggestion
>>> in the review was to change the division operation that we use.
>>> Rather than use exact_div, we should now use can_div_away_from_zero_p,
>>> which would have the effect of rounding the quotient up.
>>>
>>> Something like:
>>>
>>> if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT,
>>> &mode_size[E_%smode]))
>>> gcc_unreachable ();
>>>
>>> But this will require a new overload of can_div_away_from_zero_p, since
>>> the existing one is for constant quotients rather than constant divisors.
>>>
>>> Thanks,
>>> Richard
>>>
>>>>
>>>> Could you please help to share your opinion about this from the expert’s perspective ? Thank you!
>>>>
>>>> Pan
>>>>
>>>> From: 盼 李 <incarnation.p.lee@outlook.com>
>>>> Sent: Monday, February 27, 2023 11:13 PM
>>>> To: Richard Sandiford <richard.sandiford@arm.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
>>>> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Li, Pan2 <pan2.li@intel.com>
>>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>>
>>>> No problem, I hope you had a good holiday.
>>>>
>>>> Thanks for pointing this out; the if part cannot take care of poly_int with N > 2. As I understand it, we need to make it general for every N of poly_int.
>>>>
>>>> Thus I would like to double-check with you how to make it general. I suppose there will be a new function can_div_away_from_zero_p to replace the if (known_lt (,)) part in genmodes.cc, leaving exact_div unchanged (given the word "exact", I suppose we should not touch it), right? Then we still need one poly_int with all coefficients set to 1 for N as the result if can_div_away_from_zero_p is true.
>>>>
>>>> Thanks again for your professional suggestion, and have a nice day!
>>>>
>>>> Pan
>>>> ________________________________
>>>> From: Richard Sandiford <richard.sandiford@arm.com>
>>>> Sent: Monday, February 27, 2023 22:24
>>>> To: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
>>>> Cc: incarnation.p.lee@outlook.com; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; pan2.li@intel.com
>>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>>
>>>> Sorry for the slow reply, been away for a couple of weeks.
>>>>
>>>> "incarnation.p.lee--- via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>>>>> From: Pan Li <pan2.li@intel.com>
>>>>>
>>>>> Fix the bug of the rvv bool mode precision with the adjustment.
>>>>> The bits size of vbool*_t will be adjusted to
>>>>> [1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The
>>>>> adjusted mode precision of vbool*_t will help the underlying passes to
>>>>> make the right decision for both correctness and optimization.
>>>>>
>>>>> Given below sample code:
>>>>> void test_1(int8_t * restrict in, int8_t * restrict out)
>>>>> {
>>>>> vbool8_t v2 = *(vbool8_t*)in;
>>>>> vbool16_t v5 = *(vbool16_t*)in;
>>>>> *(vbool16_t*)(out + 200) = v5;
>>>>> *(vbool8_t*)(out + 100) = v2;
>>>>> }
>>>>>
>>>>> Before the precision adjustment:
>>>>> addi a4,a1,100
>>>>> vsetvli a5,zero,e8,m1,ta,ma
>>>>> addi a1,a1,200
>>>>> vlm.v v24,0(a0)
>>>>> vsm.v v24,0(a4)
>>>>> // Need one vsetvli and vlm.v for correctness here.
>>>>> vsm.v v24,0(a1)
>>>>>
>>>>> After the precision adjustment:
>>>>> csrr t0,vlenb
>>>>> slli t1,t0,1
>>>>> csrr a3,vlenb
>>>>> sub sp,sp,t1
>>>>> slli a4,a3,1
>>>>> add a4,a4,sp
>>>>> sub a3,a4,a3
>>>>> vsetvli a5,zero,e8,m1,ta,ma
>>>>> addi a2,a1,200
>>>>> vlm.v v24,0(a0)
>>>>> vsm.v v24,0(a3)
>>>>> addi a1,a1,100
>>>>> vsetvli a4,zero,e8,mf2,ta,ma
>>>>> csrr t0,vlenb
>>>>> vlm.v v25,0(a3)
>>>>> vsm.v v25,0(a2)
>>>>> slli t1,t0,1
>>>>> vsetvli a5,zero,e8,m1,ta,ma
>>>>> vsm.v v24,0(a1)
>>>>> add sp,sp,t1
>>>>> jr ra
>>>>>
>>>>> However, there may be some optimization opportunities after
>>>>> the mode precision adjustment. They can be taken care of in
>>>>> the RISC-V backend in subsequent separate PR(s).
>>>>>
>>>>> PR 108185
>>>>> PR 108654
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>> * config/riscv/riscv-modes.def (ADJUST_PRECISION):
>>>>> * config/riscv/riscv.cc (riscv_v_adjust_precision):
>>>>> * config/riscv/riscv.h (riscv_v_adjust_precision):
>>>>> * genmodes.cc (ADJUST_PRECISION):
>>>>> (emit_mode_adjustments):
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>> * gcc.target/riscv/pr108185-1.c: New test.
>>>>> * gcc.target/riscv/pr108185-2.c: New test.
>>>>> * gcc.target/riscv/pr108185-3.c: New test.
>>>>> * gcc.target/riscv/pr108185-4.c: New test.
>>>>> * gcc.target/riscv/pr108185-5.c: New test.
>>>>> * gcc.target/riscv/pr108185-6.c: New test.
>>>>> * gcc.target/riscv/pr108185-7.c: New test.
>>>>> * gcc.target/riscv/pr108185-8.c: New test.
>>>>>
>>>>> Signed-off-by: Pan Li <pan2.li@intel.com>
>>>>> ---
>>>>> gcc/config/riscv/riscv-modes.def | 8 +++
>>>>> gcc/config/riscv/riscv.cc | 12 ++++
>>>>> gcc/config/riscv/riscv.h | 1 +
>>>>> gcc/genmodes.cc | 25 ++++++-
>>>>> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
>>>>> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
>>>>> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
>>>>> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
>>>>> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
>>>>> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
>>>>> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
>>>>> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
>>>>> 12 files changed, 598 insertions(+), 1 deletion(-)
>>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>>>
>>>>> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
>>>>> index d5305efa8a6..110bddce851 100644
>>>>> --- a/gcc/config/riscv/riscv-modes.def
>>>>> +++ b/gcc/config/riscv/riscv-modes.def
>>>>> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>>>> ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>>>> ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>>>>>
>>>>> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
>>>>> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
>>>>> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
>>>>> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
>>>>> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
>>>>> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
>>>>> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
>>>>> +
>>>>> /*
>>>>> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
>>>>> | | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
>>>>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>>>>> index de3e1f903c7..cbe66c0e35b 100644
>>>>> --- a/gcc/config/riscv/riscv.cc
>>>>> +++ b/gcc/config/riscv/riscv.cc
>>>>> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
>>>>> return scale;
>>>>> }
>>>>>
>>>>> +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
>>>>> + PRECISION size for corresponding machine_mode. */
>>>>> +
>>>>> +poly_int64
>>>>> +riscv_v_adjust_precision (machine_mode mode, int scale)
>>>>> +{
>>>>> + if (riscv_v_ext_vector_mode_p (mode))
>>>>> + return riscv_vector_chunks * scale;
>>>>> +
>>>>> + return scale;
>>>>> +}
>>>>> +
>>>>> /* Return true if X is a valid address for machine mode MODE. If it is,
>>>>> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
>>>>> effect. */
>>>>> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
>>>>> index 5bc7f2f467d..15b9317a8ce 100644
>>>>> --- a/gcc/config/riscv/riscv.h
>>>>> +++ b/gcc/config/riscv/riscv.h
>>>>> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
>>>>> extern unsigned riscv_bytes_per_vector_chunk;
>>>>> extern poly_uint16 riscv_vector_chunks;
>>>>> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
>>>>> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
>>>>> /* The number of bits and bytes in a RVV vector. */
>>>>> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
>>>>> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
>>>>> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
>>>>> index 2d418f09aab..12f4e6335e6 100644
>>>>> --- a/gcc/genmodes.cc
>>>>> +++ b/gcc/genmodes.cc
>>>>> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
>>>>> static struct mode_adjust *adj_format;
>>>>> static struct mode_adjust *adj_ibit;
>>>>> static struct mode_adjust *adj_fbit;
>>>>> +static struct mode_adjust *adj_precision;
>>>>>
>>>>> /* Mode class operations. */
>>>>> static enum mode_class
>>>>> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
>>>>> #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
>>>>> #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
>>>>> #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
>>>>> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
>>>>> #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
>>>>> #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
>>>>> #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
>>>>> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
>>>>> " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
>>>>> m->name, m->name);
>>>>> printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
>>>>> - printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>>>>> + /* Normalize the size to 1 if precision is less than BITS_PER_UNIT. */
>>>>> + printf (" poly_uint16 size_one = "
>>>>> + "mode_precision[E_%smode].is_constant ()\n", m->name);
>>>>> + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
>>>>
>>>> Have you tried this on an x86_64 system? I wouldn't expect it to work
>>>> because of the:
>>>>
>>>> STATIC_ASSERT (N >= 2);
>>>>
>>>> in the poly_uint16 constructor.
>>>>
>>>>> + printf (" if (known_lt (mode_precision[E_%smode], "
>>>>> + "size_one * BITS_PER_UNIT))\n", m->name);
>>>>> + printf (" mode_size[E_%smode] = size_one;\n", m->name);
>>>>> + printf (" else\n");
>>>>> + printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>>>>
>>>> Now that the assert implicit in the original exact_div no longer holds,
>>>> I think we should instead generalise it to can_div_away_from_zero_p
>>>> (which will involve defining a new overload of can_div_away_from_zero_p).
>>>> I think that will give the same result as the code above for the cases
>>>> that the code above handles. But it should be more general too.
>>>>
>>>> TBH, I'm still sceptical that this is all that is needed. It seems
>>>> unlikely that we've been so good at writing vector support code that
>>>> we've made it work for precision < bitsize, despite that being an
>>>> unsupported combination until now. But I guess we can fix problems
>>>> on a case-by-case basis.
>>>>
>>>> Thanks,
>>>> Richard
>>>>
>>>>> " BITS_PER_UNIT);\n", m->name, m->name);
>>>>> printf (" mode_nunits[E_%smode] = ps;\n", m->name);
>>>>> printf (" adjust_mode_mask (E_%smode);\n", m->name);
>>>>> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
>>>>> printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
>>>>> a->file, a->line, a->mode->name, a->adjustment);
>>>>>
>>>>> + /* Adjust precision to the actual bits size. */
>>>>> + for (a = adj_precision; a; a = a->next)
>>>>> + switch (a->mode->cl)
>>>>> + {
>>>>> + case MODE_VECTOR_BOOL:
>>>>> + printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
>>>>> + a->adjustment);
>>>>> + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
>>>>> + break;
>>>>> + default:
>>>>> + break;
>>>>> + }
>>>>> +
>>>>> puts ("}");
>>>>> }
>>>>>
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>>> new file mode 100644
>>>>> index 00000000000..e70960c5b6d
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool1_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool1_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool1_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool1_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool1_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool1_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>>> new file mode 100644
>>>>> index 00000000000..dcc7a644a88
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool2_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool2_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool2_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool2_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool2_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool2_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>>>> new file mode 100644
>>>>> index 00000000000..3af0513e006
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool4_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool4_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool4_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool4_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool4_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool4_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>>> new file mode 100644
>>>>> index 00000000000..ea3c360d756
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>>> new file mode 100644
>>>>> index 00000000000..9fc659d2402
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>>> new file mode 100644
>>>>> index 00000000000..98275e5267d
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>>> new file mode 100644
>>>>> index 00000000000..8f6f0b11f09
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>>> new file mode 100644
>>>>> index 00000000000..d96959dd064
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>>> @@ -0,0 +1,77 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool1_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool2_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool4_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
Let me explain more about the [4,4] test. As I understand it, the self-test runs more than once for VNx4BI.
The first time, precision[VNx4BI] = [8, 8], and the precision part of this PR adjusts it to [4, 4].
On subsequent runs, precision[VNx4BI] = [4, 4], so can_* returns false and we hit gcc_unreachable().
I agree that the current infrastructure cannot represent this case. As juzhe mentioned, we just want VNx[1-4]BI to be
distinguishable. So we tried adjusting the precision and ran into some self-test failures; that is the whole
story behind the genmodes printf(" xxx") changes.
It would be perfect if you have an elegant way to handle this, covering both the self-test and the precision part.
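To make the failure mode concrete, here is a minimal sketch, assuming a simplified two-coefficient stand-in for poly_int. The names Poly2 and can_div_trunc_model are hypothetical and this is not GCC's implementation; it only models the rule discussed in this thread, namely that division by a constant succeeds only when the coefficient of the runtime term is a multiple of the divisor, and that the quotient is left untouched on failure.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a two-coefficient poly_int:
// value = coeffs[0] + coeffs[1] * X for a runtime X >= 0.
struct Poly2 {
  int64_t coeffs[2];
};

// Simplified model (an assumption, not GCC's code) of truncating division
// by a positive constant b.  It succeeds only if the X coefficient divides
// evenly, so that the remainder does not depend on X.
inline bool can_div_trunc_model(const Poly2 &a, int64_t b, Poly2 *quotient) {
  assert(b > 0);
  if (a.coeffs[1] % b != 0)
    return false;  // *quotient is deliberately left unchanged.
  quotient->coeffs[0] = a.coeffs[0] / b;
  quotient->coeffs[1] = a.coeffs[1] / b;
  return true;
}
```

With this model, dividing [8, 8] by 8 succeeds and yields [1, 1], while dividing [4, 4] by 8 returns false, which is the path that would reach gcc_unreachable() in the generated code.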
Thank you so much
Pan
________________________________
From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
Sent: Wednesday, March 1, 2023 20:13
To: richard.sandiford <richard.sandiford@arm.com>; gcc-patches <gcc-patches@gcc.gnu.org>
Cc: incarnation.p.lee <incarnation.p.lee@outlook.com>; pan2.li <pan2.li@intel.com>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>
Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
Actually, we just want to differentiate VNx1BI, VNx2BI, VNx4BI and VNx8BI; GCC currently considers them the same, which produces bugs in RVV.
This patch just adjusts the precision to differentiate them, although (as you say) they may not be handled accurately according to that precision.
However, at least it helps us differentiate these 4 mask modes and avoid the bugs.
This is our current solution for fixing the RVV bug without affecting other targets.
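As a rough illustration of the point above, the sketch below models the four mask modes as plain records. Everything in it is an assumption made for illustration: ModeInfo, same_layout, kBefore and kAfter are hypothetical names, not GCC data structures. The "before" precisions of [8, 8] are taken from the self-test listing in this thread; the "after" value [4, 4] for VNx4BI is from this thread, while [1, 1] and [2, 2] for VNx1BI and VNx2BI are inferred from the [1, 2, 4, 8, ...] list in the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record for a mask mode.  Both fields are two-coefficient
// values of the form c[0] + c[1] * X.
struct ModeInfo {
  const char *name;
  int64_t size[2];       // size in bytes
  int64_t precision[2];  // precision in bits
};

// Two modes look identical to generic code if size and precision both match.
inline bool same_layout(const ModeInfo &a, const ModeInfo &b) {
  return a.size[0] == b.size[0] && a.size[1] == b.size[1] &&
         a.precision[0] == b.precision[0] &&
         a.precision[1] == b.precision[1];
}

// Before the adjustment: all four modes share size (1,1) and precision [8,8].
const ModeInfo kBefore[] = {
    {"VNx1BI", {1, 1}, {8, 8}},
    {"VNx2BI", {1, 1}, {8, 8}},
    {"VNx4BI", {1, 1}, {8, 8}},
    {"VNx8BI", {1, 1}, {8, 8}},
};

// After the adjustment (assumed values, see the note above).
const ModeInfo kAfter[] = {
    {"VNx1BI", {1, 1}, {1, 1}},
    {"VNx2BI", {1, 1}, {2, 2}},
    {"VNx4BI", {1, 1}, {4, 4}},
    {"VNx8BI", {1, 1}, {8, 8}},
};
```

In this model, same_layout(kBefore[0], kBefore[3]) is true, whereas no two entries of kAfter compare equal, which is the distinction the patch is after.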
Do you have other ideas for fixing this issue? Or would a patch that adds adjust_precision support be OK for GCC?
Thanks.
________________________________
juzhe.zhong@rivai.ai
From: Richard Sandiford <richard.sandiford@arm.com>
Date: 2023-03-01 20:03
To: 盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org>
CC: 盼 李 <incarnation.p.lee@outlook.com>; juzhe.zhong@rivai.ai; pan2.li <pan2.li@intel.com>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> I just ran a test with the code below; the [0x4, 0x4] case comes from VNx4BI. Note that the mode size is unchanged.
>
> printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
> "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
>
> VNx4BI Before precision [0x4, 0x4], size [0x4, 0]
> VNx4BI After precision [0x4, 0x4], size [0x4, 0]
Yeah, the result is expected to be unchanged if the division fails.
That's a deliberate part of the interface. The can_* functions
should never be used without testing the boolean return value.
But this precision of [4,4] for VNx4BI is different from what you
listed below. Like I say, if the precision really is [4,4], and if
the size really is ceil([4,4]/8), then I don't think we can represent
that with current infrastructure.
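A small sketch may help show why ceil([4,4]/8) has no poly_int form. It assumes a hypothetical two-coefficient type Poly2 with value c0 + c1 * X (not GCC's poly_int) and checks by brute force whether the byte count, rounded up, is an affine function of X. The helper names ceil_bytes and ceil_div8_is_affine are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a two-coefficient poly_int:
// value = c0 + c1 * X, where X is the runtime indeterminate (X >= 0).
struct Poly2 {
  int64_t c0;
  int64_t c1;
  int64_t at(int64_t x) const { return c0 + c1 * x; }
};

// Bytes needed to hold `bits` bits, rounding up.
inline int64_t ceil_bytes(int64_t bits) { return (bits + 7) / 8; }

// Returns true if some q0 + q1 * X equals ceil_bytes(p.at(X)) for every
// X in [0, max_x].  An affine function is fixed by its values at X = 0
// and X = 1, so only that one candidate needs to be checked.
inline bool ceil_div8_is_affine(const Poly2 &p, int64_t max_x) {
  assert(max_x >= 1);
  const int64_t q0 = ceil_bytes(p.at(0));
  const int64_t q1 = ceil_bytes(p.at(1)) - q0;
  const Poly2 q{q0, q1};
  for (int64_t x = 0; x <= max_x; ++x)
    if (q.at(x) != ceil_bytes(p.at(x)))
      return false;
  return true;
}
```

For a precision of [8, 8] the check passes, since the size is simply [1, 1]. For [4, 4] it fails: X = 0 and X = 1 both need 1 byte, but X = 2 needs 2 bytes, so no single pair of coefficients fits.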
Thanks,
Richard
>
> Pan
> ________________________________
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Wednesday, March 1, 2023 19:11
> To: 盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org>
> Cc: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>; pan2.li <pan2.li@intel.com>; 盼 李 <incarnation.p.lee@outlook.com>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> 盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> Thank you all for your quick response.
>>
>> As juzhe mentioned, memory accesses on RISC-V are always aligned to a byte boundary in the compact mode, i.e. ceil(vl / 8) bytes for vbool*.
>
> OK, thanks to both of you. This is what I'd have expected.
>
> In that case, I think both the can_div_away_from_zero_p and the
> original patch (using size_one) will give the wrong results.
> There isn't a way of representing ceil([4,4]/8) as a poly_int.
> The result is (4+4X)/8 when X is odd and (8+4X)/8 when X is even.
>
>> Actually, the [4,4] data comes from the self-test; the RISC-V mode precisions are as below.
>>
>> VNx64BI precision [0x40, 0x40].
>> VNx32BI precision [0x20, 0x20].
>> VNx16BI precision [0x10, 0x10].
>> VNx8BI precision [0x8, 0x8].
>> VNx4BI precision [0x8, 0x8].
>> VNx2BI precision [0x8, 0x8].
>> VNx1BI precision [0x8, 0x8].
>
> Ah, OK. Which self-test causes this?
>
> Richard
>
>> The [4, 4] data affects the genmodes part: we cannot write the code below, because gcc_unreachable would be hit.
>>
>> if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT, &mode_size[E_%smode]))
>> gcc_unreachable (); // Hit on [4, 4] of the self-test.
>>
>> Pan
>> ________________________________
>> From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
>> Sent: Wednesday, March 1, 2023 18:46
>> To: richard.sandiford <richard.sandiford@arm.com>; pan2.li <pan2.li@intel.com>
>> Cc: incarnation.p.lee <incarnation.p.lee@outlook.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>
>> Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>
>>>> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
>>>>bits, even when that means accessing fully-unused bytes? E.g. 4+4X
>>>>when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
>>>>8+8X would be 32 bits/4 bytes. So a store of [8,8] for a precision
>>>>of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
>>
>> Hi, Richard. Thank you for helping us.
>> My understanding of RVV ISA:
>>
>> In RVV, we have mask modes (VNx1BI, VNx2BI, etc.) and data modes (VNx1QI, VNx4QI, VNx2DI, etc.).
>> For data modes, we access the data fully and there are no unused bytes, so we don't need to adjust the precision.
>> However, for mask modes we access the mask bits in a compact layout (the mask bits for the corresponding elements are consecutive).
>> For example, in the current configuration, VNx1BI, VNx2BI, VNx4BI and VNx8BI have the same byte size (1,1) but different bit sizes.
>>
>> VNx8BI is accessed fully, whereas only 1/2 of VNx4BI, 1/4 of VNx2BI and 1/8 of VNx1BI is accessed, still at byte alignment (I am not sure whether RVV supports bit alignment; I guess it does not).
>>
>> If VNx8BI occupies only 1 byte (depending on the machine vector length), then VNx4BI, VNx2BI and VNx1BI are 4/8, 2/8 and 1/8 of a byte. I think we can't access memory at bit alignment, so they will be the same in the access.
>> However, if VNx8BI occupies 8 bytes, then VNx4BI, VNx2BI and VNx1BI are 4 bytes, 2 bytes and 1 byte. They access different sizes.
>>
>> This is my understanding of the RVV ISA; feel free to correct me.
>> Thanks.
>>
>> ________________________________
>> juzhe.zhong@rivai.ai
>>
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Date: 2023-03-01 18:11
>> To: Li, Pan2 <pan2.li@intel.com>
>> CC: 盼 李 <incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>> "Li, Pan2" <pan2.li@intel.com> writes:
>>> Hi Richard Sandiford,
>>>
>>> I just tried the constant-divisor overload with the printf below, and it works as you described!
>>>
>>> printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
>>> "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
>>>
>>> template<unsigned int N, typename Ca, typename Cb, typename Cq>
>>> inline typename if_nonpoly<Cb, bool>::type
>>> can_div_away_from_zero_p (const poly_int_pod<N, Ca> &a,
>>> Cb b,
>>> poly_int_pod<N, Cq> *quotient)
>>> {
>>> if (!can_div_trunc_p (a, b, quotient))
>>> return false;
>>> if (maybe_ne (*quotient * b, a))
>>> for (unsigned int i = 0; i < N; ++i)
>>> quotient->coeffs[i] += (quotient->coeffs[i] < 0 ? -1 : 1);
>>> return true;
>>> }
>>>
>>> But I have a question about one case, shown below.
>>>
>>> Assume:
>>> a = [4, 4], b = 8.
>>>
>>> can_div_trunc_p checks whether the remainder is constant, i.e. a.coeffs[i] % 8 == 0 (i >= 1). If the remainder is not constant, can_div_trunc_p leaves the quotient untouched and returns false.
>>>
>>> Thus, when a = [4, 4], can_div_away_from_zero_p leaves *quotient unchanged, i.e. mode_size[E_%smode] is unchanged in this case. However, the underlying mode_size will adjust it to the real byte size, and I am not sure whether this is by design or requires additional handling.
>>
>> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
>> bits, even when that means accessing fully-unused bytes? E.g. 4+4X
>> when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
>> 8+8X would be 32 bits/4 bytes. So a store of [8,8] for a precision
>> of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
>>
>> Richard
>>
>>> Pan
>>>
>>> From: 盼 李 <incarnation.p.lee@outlook.com>
>>> Sent: Tuesday, February 28, 2023 5:59 PM
>>> To: Richard Sandiford <richard.sandiford@arm.com>; Li, Pan2 <pan2.li@intel.com>
>>> Cc: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>
>>> Understood, thanks for the explanations and suggestions. Let me have a try and keep you posted.
>>>
>>> Pan
>>> ________________________________
>>> From: Richard Sandiford <richard.sandiford@arm.com>
>>> Sent: Tuesday, February 28, 2023 17:50
>>> To: Li, Pan2 <pan2.li@intel.com>
>>> Cc: 盼 李 <incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>
>>> "Li, Pan2" <pan2.li@intel.com> writes:
>>>> Hi Richard Sandiford,
>>>>
>>>> After some investigation, I am not sure if it is possible to make it general without any changes to exact_div. We can add one method like below to get the unit poly for all possible N.
>>>>
>>>> template<unsigned int N, typename Ca>
>>>> inline POLY_CONST_RESULT (N, Ca, Ca)
>>>> normalize_to_unit (const poly_int_pod<N, Ca> &a)
>>>> {
>>>> typedef POLY_CONST_COEFF (Ca, Ca) C;
>>>>
>>>> poly_int<N, C> normalized = a;
>>>>
>>>> if (normalized.is_constant())
>>>> normalized.coeffs[0] = 1;
>>>> else
>>>> for (unsigned int i = 0; i < N; i++)
>>>> POLY_SET_COEFF (C, normalized, i, 1);
>>>>
>>>> return normalized;
>>>> }
>>>>
>>>> And then adjust the genmodes like below to consume the unit poly.
>>>>
>>>> printf (" poly_uint16 unit_poly = "
>>>> "normalize_to_unit (mode_precision[E_%smode]);\n", m->name);
>>>> printf (" if (known_lt (mode_precision[E_%smode], "
>>>> "unit_poly * BITS_PER_UNIT))\n", m->name);
>>>> printf (" mode_size[E_%smode] = unit_poly;\n", m->name);
>>>>
>>>> I am not sure whether it is a good idea to introduce the normalize code above into exact_div, given that the comment on exact_div says “/* Return A / B, given that A is known to be a multiple of B. */”.
>>>
>>> My point was that we have multiple ways of dividing poly_ints:
>>>
>>> - exact_div, for when the caller knows that the result is always exact
>>> - can_div_trunc_p, for truncating division (round towards 0)
>>> - can_div_away_from_zero_p, for rounding away from 0
>>> - ...
>>>
>>> This is like how we have multiple division *_EXPRs on trees.
>>>
>>> Until now, exact_div was the correct choice for modes because vector
>>> modes didn't have padding. We're now changing that, so my suggestion
>>> in the review was to change the division operation that we use.
>>> Rather than use exact_div, we should now use can_div_away_from_zero_p,
>>> which would have the effect of rounding the quotient up.
>>>
>>> Something like:
>>>
>>> if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT,
>>> &mode_size[E_%smode]))
>>> gcc_unreachable ();
>>>
>>> But this will require a new overload of can_div_away_from_zero_p, since
>>> the existing one is for constant quotients rather than constant divisors.
>>>
>>> Thanks,
>>> Richard
>>>
>>>>
>>>> Could you please share your opinion on this from an expert’s perspective? Thank you!
>>>>
>>>> Pan
>>>>
>>>> From: 盼 李 <incarnation.p.lee@outlook.com>
>>>> Sent: Monday, February 27, 2023 11:13 PM
>>>> To: Richard Sandiford <richard.sandiford@arm.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
>>>> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Li, Pan2 <pan2.li@intel.com>
>>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>>
>>>> No problem, I hope you had a good holiday.
>>>>
>>>> Thanks for pointing this out; the if part cannot handle a poly_int with N > 2. As I understand it, we need to make it general for every N of poly_int.
>>>>
>>>> So I would like to confirm with you how to make it general. I suppose a new can_div_away_from_zero_p function would replace the if (known_lt(,)) part in genmodes.cc, leaving exact_div unchanged (given the word "exact", I suppose we should not touch it), right? Then we would still need a poly_int with all N coefficients set to 1 as the result when can_div_away_from_zero_p returns true.
>>>>
>>>> Thanks again for your professional suggestion, and have a nice day!
>>>>
>>>> Pan
>>>> ________________________________
>>>> From: Richard Sandiford <richard.sandiford@arm.com>
>>>> Sent: Monday, February 27, 2023 22:24
>>>> To: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
>>>> Cc: incarnation.p.lee@outlook.com; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; pan2.li@intel.com
>>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>>>>
>>>> Sorry for the slow reply, been away for a couple of weeks.
>>>>
>>>> "incarnation.p.lee--- via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>>>>> From: Pan Li <pan2.li@intel.com>
>>>>>
>>>>> Fix the rvv bool mode precision bug by adjusting the precision.
>>>>> The bit size of vbool*_t will be adjusted to
>>>>> [1, 2, 4, 8, 16, 32, 64] according to the RVV 1.0 ISA spec. The
>>>>> adjusted mode precision of vbool*_t will help the underlying passes
>>>>> make the right decisions for both correctness and optimization.
>>>>>
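To make the effect of the adjustment concrete, the following sketch models the two quantities involved. It is a simplified illustration, not GCC code: toy_poly2, model_precision and model_bytesize are hypothetical names, the chunk count is passed in as a plain parameter, and the bytes-per-chunk value is an assumption. The scale factors 16 and 32 and the shared byte-size expression are taken from the VNx16BI and VNx32BI lines of the riscv-modes.def hunk in this patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-coefficient model of a poly value: c0 + c1 * x.
struct toy_poly2
{
  std::int64_t c0, c1;
};

inline bool operator==(const toy_poly2 &a, const toy_poly2 &b)
{
  return a.c0 == b.c0 && a.c1 == b.c1;
}

inline toy_poly2 scale_by(const toy_poly2 &p, std::int64_t k)
{
  return {p.c0 * k, p.c1 * k};
}

// Models what riscv_v_adjust_precision returns for a vector mode:
// chunks * scale.
inline toy_poly2 model_precision(const toy_poly2 &chunks, int scale)
{
  return scale_by(chunks, scale);
}

// Models the ADJUST_BYTESIZE expression shared by VNx16BI and VNx32BI:
// chunks * bytes_per_chunk.
inline toy_poly2 model_bytesize(const toy_poly2 &chunks, int bytes_per_chunk)
{
  return scale_by(chunks, bytes_per_chunk);
}
```

In this model two bool modes with different scales share one byte size but get different precisions, which is the distinction the adjusted precision is meant to expose to later passes.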
>>>>> Given the sample code below:
>>>>> void test_1(int8_t * restrict in, int8_t * restrict out)
>>>>> {
>>>>> vbool8_t v2 = *(vbool8_t*)in;
>>>>> vbool16_t v5 = *(vbool16_t*)in;
>>>>> *(vbool16_t*)(out + 200) = v5;
>>>>> *(vbool8_t*)(out + 100) = v2;
>>>>> }
>>>>>
>>>>> Before the precision adjustment:
>>>>> addi a4,a1,100
>>>>> vsetvli a5,zero,e8,m1,ta,ma
>>>>> addi a1,a1,200
>>>>> vlm.v v24,0(a0)
>>>>> vsm.v v24,0(a4)
>>>>> // Need one vsetvli and vlm.v for correctness here.
>>>>> vsm.v v24,0(a1)
>>>>>
>>>>> After the precision adjustment:
>>>>> csrr t0,vlenb
>>>>> slli t1,t0,1
>>>>> csrr a3,vlenb
>>>>> sub sp,sp,t1
>>>>> slli a4,a3,1
>>>>> add a4,a4,sp
>>>>> sub a3,a4,a3
>>>>> vsetvli a5,zero,e8,m1,ta,ma
>>>>> addi a2,a1,200
>>>>> vlm.v v24,0(a0)
>>>>> vsm.v v24,0(a3)
>>>>> addi a1,a1,100
>>>>> vsetvli a4,zero,e8,mf2,ta,ma
>>>>> csrr t0,vlenb
>>>>> vlm.v v25,0(a3)
>>>>> vsm.v v25,0(a2)
>>>>> slli t1,t0,1
>>>>> vsetvli a5,zero,e8,m1,ta,ma
>>>>> vsm.v v24,0(a1)
>>>>> add sp,sp,t1
>>>>> jr ra
>>>>>
>>>>> However, there may be some optimization opportunities after
>>>>> the mode precision adjustment. They can be taken care of in
>>>>> the RISC-V backend in separate follow-up PR(s).
>>>>>
>>>>> PR 108185
>>>>> PR 108654
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>> * config/riscv/riscv-modes.def (ADJUST_PRECISION):
>>>>> * config/riscv/riscv.cc (riscv_v_adjust_precision):
>>>>> * config/riscv/riscv.h (riscv_v_adjust_precision):
>>>>> * genmodes.cc (ADJUST_PRECISION):
>>>>> (emit_mode_adjustments):
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>> * gcc.target/riscv/pr108185-1.c: New test.
>>>>> * gcc.target/riscv/pr108185-2.c: New test.
>>>>> * gcc.target/riscv/pr108185-3.c: New test.
>>>>> * gcc.target/riscv/pr108185-4.c: New test.
>>>>> * gcc.target/riscv/pr108185-5.c: New test.
>>>>> * gcc.target/riscv/pr108185-6.c: New test.
>>>>> * gcc.target/riscv/pr108185-7.c: New test.
>>>>> * gcc.target/riscv/pr108185-8.c: New test.
>>>>>
>>>>> Signed-off-by: Pan Li <pan2.li@intel.com>
>>>>> ---
>>>>> gcc/config/riscv/riscv-modes.def | 8 +++
>>>>> gcc/config/riscv/riscv.cc | 12 ++++
>>>>> gcc/config/riscv/riscv.h | 1 +
>>>>> gcc/genmodes.cc | 25 ++++++-
>>>>> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
>>>>> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
>>>>> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
>>>>> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
>>>>> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
>>>>> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
>>>>> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
>>>>> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
>>>>> 12 files changed, 598 insertions(+), 1 deletion(-)
>>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>>>
>>>>> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
>>>>> index d5305efa8a6..110bddce851 100644
>>>>> --- a/gcc/config/riscv/riscv-modes.def
>>>>> +++ b/gcc/config/riscv/riscv-modes.def
>>>>> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>>>> ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
>>>>> ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>>>>>
>>>>> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
>>>>> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
>>>>> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
>>>>> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
>>>>> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
>>>>> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
>>>>> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
>>>>> +
>>>>> /*
>>>>> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
>>>>> | | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
>>>>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>>>>> index de3e1f903c7..cbe66c0e35b 100644
>>>>> --- a/gcc/config/riscv/riscv.cc
>>>>> +++ b/gcc/config/riscv/riscv.cc
>>>>> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
>>>>> return scale;
>>>>> }
>>>>>
>>>>> +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
>>>>> + PRECISION size for corresponding machine_mode. */
>>>>> +
>>>>> +poly_int64
>>>>> +riscv_v_adjust_precision (machine_mode mode, int scale)
>>>>> +{
>>>>> + if (riscv_v_ext_vector_mode_p (mode))
>>>>> + return riscv_vector_chunks * scale;
>>>>> +
>>>>> + return scale;
>>>>> +}
>>>>> +
>>>>> /* Return true if X is a valid address for machine mode MODE. If it is,
>>>>> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
>>>>> effect. */
>>>>> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
>>>>> index 5bc7f2f467d..15b9317a8ce 100644
>>>>> --- a/gcc/config/riscv/riscv.h
>>>>> +++ b/gcc/config/riscv/riscv.h
>>>>> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
>>>>> extern unsigned riscv_bytes_per_vector_chunk;
>>>>> extern poly_uint16 riscv_vector_chunks;
>>>>> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
>>>>> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
>>>>> /* The number of bits and bytes in a RVV vector. */
>>>>> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
>>>>> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
>>>>> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
>>>>> index 2d418f09aab..12f4e6335e6 100644
>>>>> --- a/gcc/genmodes.cc
>>>>> +++ b/gcc/genmodes.cc
>>>>> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
>>>>> static struct mode_adjust *adj_format;
>>>>> static struct mode_adjust *adj_ibit;
>>>>> static struct mode_adjust *adj_fbit;
>>>>> +static struct mode_adjust *adj_precision;
>>>>>
>>>>> /* Mode class operations. */
>>>>> static enum mode_class
>>>>> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
>>>>> #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
>>>>> #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
>>>>> #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
>>>>> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
>>>>> #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
>>>>> #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
>>>>> #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
>>>>> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
>>>>> " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
>>>>> m->name, m->name);
>>>>> printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
>>>>> - printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>>>>> + /* Normalize the size to 1 if precision is less than BITS_PER_UNIT. */
>>>>> + printf (" poly_uint16 size_one = "
>>>>> + "mode_precision[E_%smode].is_constant ()\n", m->name);
>>>>> + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
>>>>
>>>> Have you tried this on an x86_64 system? I wouldn't expect it to work
>>>> because of the:
>>>>
>>>> STATIC_ASSERT (N >= 2);
>>>>
>>>> in the poly_uint16 constructor.
>>>>
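The constraint mentioned above can be pictured with a small standalone sketch. mini_poly is a hypothetical class, not the real poly_int; it only assumes that the two-coefficient constructor carries a compile-time check equivalent to STATIC_ASSERT (N >= 2), presumably the reason a build that instantiates N as 1 would reject poly_uint16 (1, 1).

```cpp
#include <cassert>

// Hypothetical miniature of a poly_int-like class with N coefficients.
template <unsigned N>
struct mini_poly
{
  unsigned short coeffs[N];

  // One-coefficient constructor: valid for every N, the remaining
  // coefficients are zero.
  explicit mini_poly(unsigned short c0) : coeffs{}
  {
    coeffs[0] = c0;
  }

  // Two-coefficient constructor: only usable when N >= 2.  The check fires
  // when this constructor is instantiated, i.e. at the point of use.
  template <typename C0, typename C1>
  mini_poly(const C0 &c0, const C1 &c1) : coeffs{}
  {
    static_assert(N >= 2, "two-coefficient constructor needs N >= 2");
    coeffs[0] = c0;
    coeffs[1] = c1;
  }
};

// mini_poly<2> ok(1, 1);   // compiles
// mini_poly<1> bad(1, 1);  // would fail the static_assert
```

Because the check is tied to the constructor rather than the class, code that only ever builds single-coefficient values still compiles for N == 1; generated code that spells out two coefficients does not.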
>>>>> + printf (" if (known_lt (mode_precision[E_%smode], "
>>>>> + "size_one * BITS_PER_UNIT))\n", m->name);
>>>>> + printf (" mode_size[E_%smode] = size_one;\n", m->name);
>>>>> + printf (" else\n");
>>>>> + printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
>>>>
>>>> Now that the assert implicit in the original exact_div no longer holds,
>>>> I think we should instead generalise it to can_div_away_from_zero_p
>>>> (which will involve defining a new overload of can_div_away_from_zero_p).
>>>> I think that will give the same result as the code above for the cases
>>>> that the code above handles. But it should be more general too.
>>>>
>>>> TBH, I'm still sceptical that this is all that is needed. It seems
>>>> unlikely that we've been so good at writing vector support code that
>>>> we've made it work for precision < bitsize, despite that being an
>>>> unsupported combination until now. But I guess we can fix problems
>>>> on a case-by-case basis.
>>>>
>>>> Thanks,
>>>> Richard
>>>>
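The expectation that a rounding division gives the same result as the known_lt / exact_div code for the cases that code handles can be checked on a toy model. Everything below is an assumption made for illustration: poly2, model_known_lt, patch_size, rounded_size and approaches_agree are hypothetical names, known_lt is modelled for two non-negative coefficients only, and the rounding is done per coefficient.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-coefficient poly value: c0 + c1 * x, with x >= 0.
struct poly2
{
  std::uint64_t c0, c1;
};

inline bool same(const poly2 &a, const poly2 &b)
{
  return a.c0 == b.c0 && a.c1 == b.c1;
}

// Model of known_lt: a < b holds for every x >= 0.
inline bool model_known_lt(const poly2 &a, const poly2 &b)
{
  return a.c0 < b.c0 && a.c1 <= b.c1;
}

// Model of the size computation emitted by the patch under review.
inline poly2 patch_size(const poly2 &precision, unsigned bits_per_unit)
{
  const bool is_constant = precision.c1 == 0;
  const poly2 size_one = is_constant ? poly2{1, 0} : poly2{1, 1};
  const poly2 limit = {size_one.c0 * bits_per_unit,
                       size_one.c1 * bits_per_unit};
  if (model_known_lt(precision, limit))
    return size_one;
  // Model of exact_div: both coefficients must divide evenly.
  assert(precision.c0 % bits_per_unit == 0
         && precision.c1 % bits_per_unit == 0);
  return {precision.c0 / bits_per_unit, precision.c1 / bits_per_unit};
}

// Model of a division that rounds each coefficient away from zero.
inline poly2 rounded_size(const poly2 &precision, unsigned bits_per_unit)
{
  auto ceil_div = [bits_per_unit](std::uint64_t v) {
    return (v + bits_per_unit - 1) / bits_per_unit;
  };
  return {ceil_div(precision.c0), ceil_div(precision.c1)};
}

// Compare both computations for the scales used in riscv-modes.def.
inline bool approaches_agree(const poly2 &chunks, unsigned bits_per_unit)
{
  const unsigned scales[] = {1, 2, 4, 8, 16, 32, 64};
  for (unsigned s : scales)
    {
      const poly2 precision = {chunks.c0 * s, chunks.c1 * s};
      if (!same(patch_size(precision, bits_per_unit),
                rounded_size(precision, bits_per_unit)))
        return false;
    }
  return true;
}
```

In this model the two computations agree for every scale from 1 to 64, both for a variable chunk count such as {1, 1} and for a constant one such as {1, 0}. The rounded form additionally needs no special size_one value and no two-coefficient constructor.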
>>>>> " BITS_PER_UNIT);\n", m->name, m->name);
>>>>> printf (" mode_nunits[E_%smode] = ps;\n", m->name);
>>>>> printf (" adjust_mode_mask (E_%smode);\n", m->name);
>>>>> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
>>>>> printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
>>>>> a->file, a->line, a->mode->name, a->adjustment);
>>>>>
>>>>> + /* Adjust precision to the actual bits size. */
>>>>> + for (a = adj_precision; a; a = a->next)
>>>>> + switch (a->mode->cl)
>>>>> + {
>>>>> + case MODE_VECTOR_BOOL:
>>>>> + printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
>>>>> + a->adjustment);
>>>>> + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
>>>>> + break;
>>>>> + default:
>>>>> + break;
>>>>> + }
>>>>> +
>>>>> puts ("}");
>>>>> }
>>>>>
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>>> new file mode 100644
>>>>> index 00000000000..e70960c5b6d
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool1_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool1_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool1_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool1_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool1_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool1_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>>> new file mode 100644
>>>>> index 00000000000..dcc7a644a88
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool2_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool2_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool2_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool2_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool2_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool2_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>>>> new file mode 100644
>>>>> index 00000000000..3af0513e006
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool4_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool4_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool4_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool4_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool4_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool4_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>>> new file mode 100644
>>>>> index 00000000000..ea3c360d756
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>>> new file mode 100644
>>>>> index 00000000000..9fc659d2402
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>>> new file mode 100644
>>>>> index 00000000000..98275e5267d
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>>> new file mode 100644
>>>>> index 00000000000..8f6f0b11f09
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
>>>>> @@ -0,0 +1,68 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
>>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>>> new file mode 100644
>>>>> index 00000000000..d96959dd064
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
>>>>> @@ -0,0 +1,77 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
>>>>> +
>>>>> +#include "riscv_vector.h"
>>>>> +
>>>>> +void
>>>>> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool1_t v1 = *(vbool1_t*)in;
>>>>> + vbool1_t v2 = *(vbool1_t*)in;
>>>>> +
>>>>> + *(vbool1_t*)(out + 100) = v1;
>>>>> + *(vbool1_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool2_t v1 = *(vbool2_t*)in;
>>>>> + vbool2_t v2 = *(vbool2_t*)in;
>>>>> +
>>>>> + *(vbool2_t*)(out + 100) = v1;
>>>>> + *(vbool2_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool4_t v1 = *(vbool4_t*)in;
>>>>> + vbool4_t v2 = *(vbool4_t*)in;
>>>>> +
>>>>> + *(vbool4_t*)(out + 100) = v1;
>>>>> + *(vbool4_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool8_t v1 = *(vbool8_t*)in;
>>>>> + vbool8_t v2 = *(vbool8_t*)in;
>>>>> +
>>>>> + *(vbool8_t*)(out + 100) = v1;
>>>>> + *(vbool8_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool16_t v1 = *(vbool16_t*)in;
>>>>> + vbool16_t v2 = *(vbool16_t*)in;
>>>>> +
>>>>> + *(vbool16_t*)(out + 100) = v1;
>>>>> + *(vbool16_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool32_t v1 = *(vbool32_t*)in;
>>>>> + vbool32_t v2 = *(vbool32_t*)in;
>>>>> +
>>>>> + *(vbool32_t*)(out + 100) = v1;
>>>>> + *(vbool32_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>>>>> + vbool64_t v1 = *(vbool64_t*)in;
>>>>> + vbool64_t v2 = *(vbool64_t*)in;
>>>>> +
>>>>> + *(vbool64_t*)(out + 100) = v1;
>>>>> + *(vbool64_t*)(out + 200) = v2;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
>>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
On Wed, 1 Mar 2023, Richard Sandiford wrote:
> 盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > I just ran a test with the code below; the [0x4, 0x4] case comes from VNx4BI. Note that the mode size is unchanged.
> >
> > printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
> > "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
> >
> > VNx4BI Before precision [0x4, 0x4], size [0x4, 0]
> > VNx4BI After precision [0x4, 0x4], size [0x4, 0]
>
> Yeah, the result is expected to be unchanged if the division fails.
> That's a deliberate part of the interface. The can_* functions
> should never be used without testing the boolean return value.
>
> But this precision of [4,4] for VNx4BI is different from what you
> listed below. Like I say, if the precision really is [4,4], and if
> the size really is ceil([4,4]/8), then I don't think we can represent
> that with current infrastructure.
The size of VNx4BI is (4*N + 7) / 8 bytes. I suppose we could simply
not store the size in bytes but only the size in bits then?
I see the problem, but I also don't see a good solution, since
for VNx4BI with N == 3 we have one and a half bytes of storage.
How do memory access patterns work with poly-int sizes?
>
> Thanks,
> Richard
>
> >
> > Pan
> > ________________________________
> > From: Richard Sandiford <richard.sandiford@arm.com>
> > Sent: Wednesday, March 1, 2023 19:11
> > To: 盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org>
> > Cc: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>; pan2.li <pan2.li@intel.com>; 盼 李 <incarnation.p.lee@outlook.com>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>
> > Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> >
> > 盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> >> Thank you all for your quick response.
> >>
> >> As juzhe mentioned, memory accesses on RISC-V are always aligned to a byte boundary in the compact mode, i.e. ceil(vl / 8) bytes for vbool*.
> >
> > OK, thanks to both of you. This is what I'd have expected.
> >
> > In that case, I think both the can_div_away_from_zero_p and the
> > original patch (using size_one) will give the wrong results.
> > There isn't a way of representing ceil([4,4]/8) as a poly_int.
> > The result is (4+4X)/8 when X is odd and (8+4X)/8 when X is even.
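The odd/even split above can be checked with a few lines of arithmetic. This is only a sketch, not GCC code: `ceil_bytes` and `fits_affine` are hypothetical names, and X is modelled as a plain non-negative integer rather than a poly_int indeterminate.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for a precision of 4 + 4*X bits, rounded up to whole bytes.
inline int64_t
ceil_bytes (int64_t x)
{
  assert (x >= 0);
  return (4 + 4 * x + 7) / 8;
}

// Fit c0 + c1*X through X = 0 and X = 1, then check whether the same
// coefficients still match for every X below LIMIT.  Any affine form
// that matched everywhere would have to be this one.
inline bool
fits_affine (int64_t limit)
{
  const int64_t c0 = ceil_bytes (0);
  const int64_t c1 = ceil_bytes (1) - c0;
  for (int64_t x = 0; x < limit; ++x)
    if (ceil_bytes (x) != c0 + c1 * x)
      return false;
  return true;
}
```

Under these assumptions the byte counts for X = 0, 1, 2, 3 are 1, 1, 2, 2, so no single pair of coefficients c0 + c1*X reproduces them, which is why the value cannot be stored as a poly_int.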
> >
> >> Actually, the [4,4] data comes from the self-test; the RISC-V mode precisions are as below.
> >>
> >> VNx64BI precision [0x40, 0x40].
> >> VNx32BI precision [0x20, 0x20].
> >> VNx16BI precision [0x10, 0x10].
> >> VNx8BI precision [0x8, 0x8].
> >> VNx4BI precision [0x8, 0x8].
> >> VNx2BI precision [0x8, 0x8].
> >> VNx1BI precision [0x8, 0x8].
> >
> > Ah, OK. Which self-test causes this?
> >
> > Richard
> >
> >> The [4, 4] data affects the genmodes part: we cannot write the code below, because the gcc_unreachable will be hit.
> >>
> >> if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT, &mode_size[E_%smode]))
> >> gcc_unreachable (); // Hit on [4, 4] of the self-test.
> >>
> >> Pan
> >> ________________________________
> >> From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
> >> Sent: Wednesday, March 1, 2023 18:46
> >> To: richard.sandiford <richard.sandiford@arm.com>; pan2.li <pan2.li@intel.com>
> >> Cc: incarnation.p.lee <incarnation.p.lee@outlook.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>
> >> Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> >>
> >>>> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
> >>>>bits, even when that means accessing fully-unused bytes? E.g. 4+4X
> >>>>when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
> >>>>8+8X would be 32 bits/4 bytes. So a store of [8,8] for a precision
> >>>>of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
> >>
> >> Hi, Richard. Thank you for helping us.
> >> My understanding of RVV ISA:
> >>
> >> In RVV, we have mask modes (VNx1BI, VNx2BI, etc.) and data modes (VNx1QI, VNx4QI, VNx2DI, etc.).
> >> For data modes, we access the data fully and there are no unused bytes, so we don't need to adjust the precision.
> >> However, for mask modes we access the mask bits in a compact layout (the mask bits for consecutive elements are stored consecutively).
> >> For example, in the current configuration VNx1BI, VNx2BI, VNx4BI and VNx8BI have the same bytesize (1,1) but different bitsizes.
> >>
> >> VNx8BI is accessed fully, but only 1/2 of VNx4BI is accessed, 1/4 of VNx2BI and 1/8 of VNx1BI, always with byte alignment (I am not sure whether RVV supports bit alignment; I guess it does not).
> >>
> >> If VNx8BI occupies only 1 byte (this depends on the machine vector length), then VNx4BI, VNx2BI and VNx1BI are 4/8, 2/8 and 1/8 of a byte. I think we can't access memory with bit alignment, so they will all be the same in the access.
> >> However, if VNx8BI occupies 8 bytes, then VNx4BI, VNx2BI and VNx1BI are 4 bytes, 2 bytes and 1 byte. They access different sizes.
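The two cases described here can be worked through numerically. The sketch below is an illustration only: `mask_bits` and `bytes_accessed` are hypothetical helpers, and it assumes that VNx<k>BI carries k/8 of the mask bits of VNx8BI and that every access is rounded up to whole bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: VNx<k>BI holds k/8 of the mask bits of VNx8BI.
inline uint32_t
mask_bits (uint32_t vnx8bi_bits, uint32_t k)
{
  assert (k >= 1 && k <= 8);
  return vnx8bi_bits * k / 8;
}

// Bytes touched by a byte-aligned access: ceil (bits / 8).
inline uint32_t
bytes_accessed (uint32_t bits)
{
  return (bits + 7) / 8;
}
```

With VNx8BI at 8 bits, the smaller mask modes hold 4, 2 and 1 bits but each still touches 1 byte. With VNx8BI at 64 bits, VNx4BI, VNx2BI and VNx1BI touch 4, 2 and 1 bytes respectively.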
> >>
> >> This is my understanding of the RVV ISA; feel free to correct me.
> >> Thanks.
> >>
> >> ________________________________
> >> juzhe.zhong@rivai.ai
> >>
> >> From: Richard Sandiford <richard.sandiford@arm.com>
> >> Date: 2023-03-01 18:11
> >> To: Li, Pan2 <pan2.li@intel.com>
> >> CC: 盼 李 <incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
> >> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> >> "Li, Pan2" <pan2.li@intel.com> writes:
> >>> Hi Richard Sandiford,
> >>>
> >>> I just tried the overload for constant divisors with the printf below, and it works as you mentioned!
> >>>
> >>> printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
> >>> "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
> >>>
> >>> template<unsigned int N, typename Ca, typename Cb, typename Cq>
> >>> inline typename if_nonpoly<Cb, bool>::type
> >>> can_div_away_from_zero_p (const poly_int_pod<N, Ca> &a,
> >>> Cb b,
> >>> poly_int_pod<N, Cq> *quotient)
> >>> {
> >>> if (!can_div_trunc_p (a, b, quotient))
> >>> return false;
> >>> if (maybe_ne (*quotient * b, a))
> >>> for (unsigned int i = 0; i < N; ++i)
> >>> quotient->coeffs[i] += (quotient->coeffs[i] < 0 ? -1 : 1);
> >>> return true;
> >>> }
> >>>
> >>> But I may have a question about the one case as below.
> >>>
> >>> Assume:
> >>> a = [4, 4], b = 8.
> >>>
> >>> can_div_trunc_p first checks whether the remainder is constant, i.e. a.coeffs[i] % 8 == 0 (i >= 1). If the remainder is not constant, can_div_trunc_p leaves the quotient untouched and returns false.
> >>>
> >>> Thus, when a = [4, 4], can_div_away_from_zero_p leaves *quotient unchanged, i.e. mode_size[E_%smode] is unchanged for this case. However, the underlying mode_size will adjust it to the real byte size, and I am not sure if it is by design or requires additional handling.
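The failure path described above can be modelled in isolation. This is a simplified sketch and not the poly-int.h implementation: `Poly2`, `model_div_trunc` and `model_set_size` are hypothetical names, and only two coefficients (c0 + c1 * X) are modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-coefficient stand-in for a poly_int: c0 + c1 * X.
struct Poly2
{
  int64_t c0;
  int64_t c1;
};

// Simplified model of truncating division by a constant B.  It succeeds
// only when the X coefficient is a multiple of B, so that the remainder
// is the same for every X.  On failure *QUOTIENT is left untouched.
inline bool
model_div_trunc (const Poly2 &a, int64_t b, Poly2 *quotient)
{
  if (a.c1 % b != 0)
    return false;
  quotient->c0 = a.c0 / b;
  quotient->c1 = a.c1 / b;
  return true;
}

// Caller pattern: only trust *SIZE when the division reports success.
inline bool
model_set_size (const Poly2 &precision, Poly2 *size)
{
  assert (size != nullptr);
  return model_div_trunc (precision, 8, size);
}
```

In this model a precision of [4, 4] returns false and leaves the previous size in place, while [8, 8] succeeds and yields [1, 1]. A caller that ignores the return value would silently keep the stale size.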
> >>
> >> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
> >> bits, even when that means accessing fully-unused bytes? E.g. 4+4X
> >> when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
> >> 8+8X would be 32 bits/4 bytes. So a store of [8,8] for a precision
> >> of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
> >>
> >> Richard
> >>
> >>> Pan
> >>>
> >>> From: 盼 李 <incarnation.p.lee@outlook.com>
> >>> Sent: Tuesday, February 28, 2023 5:59 PM
> >>> To: Richard Sandiford <richard.sandiford@arm.com>; Li, Pan2 <pan2.li@intel.com>
> >>> Cc: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
> >>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> >>>
> >>> Understood, thanks for the explanations and suggestions. Let me have a try and keep you posted.
> >>>
> >>> Pan
> >>> ________________________________
> >>> From: Richard Sandiford <richard.sandiford@arm.com>
> >>> Sent: Tuesday, February 28, 2023 17:50
> >>> To: Li, Pan2 <pan2.li@intel.com>
> >>> Cc: 盼 李 <incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
> >>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> >>>
> >>> "Li, Pan2" <pan2.li@intel.com> writes:
> >>>> Hi Richard Sandiford,
> >>>>
> >>>> After some investigation, I am not sure if it is possible to make it general without any changes to exact_div. We can add one method like below to get the unit poly for all possible N.
> >>>>
> >>>> template<unsigned int N, typename Ca>
> >>>> inline POLY_CONST_RESULT (N, Ca, Ca)
> >>>> normalize_to_unit (const poly_int_pod<N, Ca> &a)
> >>>> {
> >>>> typedef POLY_CONST_COEFF (Ca, Ca) C;
> >>>>
> >>>> poly_int<N, C> normalized = a;
> >>>>
> >>>> if (normalized.is_constant())
> >>>> normalized.coeffs[0] = 1;
> >>>> else
> >>>> for (unsigned int i = 0; i < N; i++)
> >>>> POLY_SET_COEFF (C, normalized, i, 1);
> >>>>
> >>>> return normalized;
> >>>> }
> >>>>
> >>>> And then adjust the genmodes like below to consume the unit poly.
> >>>>
> >>>> printf (" poly_uint16 unit_poly = "
> >>>> "normalize_to_unit (mode_precision[E_%smode]);\n", m->name);
> >>>> printf (" if (known_lt (mode_precision[E_%smode], "
> >>>> "unit_poly * BITS_PER_UNIT))\n", m->name);
> >>>> printf (" mode_size[E_%smode] = unit_poly;\n", m->name);
> >>>>
> >>>> I am not sure if it is a good idea to introduce the above normalize code into exact_div, given that the comment on exact_div says “/* Return A / B, given that A is known to be a multiple of B. */”.
> >>>
> >>> My point was that we have multiple ways of dividing poly_ints:
> >>>
> >>> - exact_div, for when the caller knows that the result is always exact
> >>> - can_div_trunc_p, for truncating division (round towards 0)
> >>> - can_div_away_from_zero_p, for rounding away from 0
> >>> - ...
> >>>
> >>> This is like how we have multiple division *_EXPRs on trees.
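For plain integers, the difference between these rounding modes can be sketched as follows. The function names are hypothetical and this is not the poly_int API; it only illustrates why 4 bits divided by BITS_PER_UNIT gives 0 bytes when truncating but 1 byte when rounding away from zero.

```cpp
#include <cassert>
#include <cstdint>

// Exact division: the caller promises that A is a multiple of B.
inline int64_t
div_exact (int64_t a, int64_t b)
{
  assert (a % b == 0);
  return a / b;
}

// Truncating division: rounds towards zero (the C++ built-in behaviour).
inline int64_t
div_trunc (int64_t a, int64_t b)
{
  return a / b;
}

// Division rounding away from zero: when there is a remainder, move the
// truncated quotient one step in the direction of the exact result's sign.
inline int64_t
div_away_from_zero (int64_t a, int64_t b)
{
  int64_t q = a / b;
  if (q * b != a)
    q += ((a < 0) != (b < 0)) ? -1 : 1;
  return q;
}
```

For non-negative sizes, rounding away from zero is the same as rounding up, which matches the ceil(bits / 8) byte size wanted for the mask modes.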
> >>>
> >>> Until now, exact_div was the correct choice for modes because vector
> >>> modes didn't have padding. We're now changing that, so my suggestion
> >>> in the review was to change the division operation that we use.
> >>> Rather than use exact_div, we should now use can_div_away_from_zero_p,
> >>> which would have the effect of rounding the quotient up.
> >>>
> >>> Something like:
> >>>
> >>> if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT,
> >>> &mode_size[E_%smode]))
> >>> gcc_unreachable ();
> >>>
> >>> But this will require a new overload of can_div_away_from_zero_p, since
> >>> the existing one is for constant quotients rather than constant divisors.
> >>>
> >>> Thanks,
> >>> Richard
> >>>
> >>>>
> >>>> Could you please share your opinion on this from an expert's perspective? Thank you!
> >>>>
> >>>> Pan
> >>>>
> >>>> From: 盼 李 <incarnation.p.lee@outlook.com>
> >>>> Sent: Monday, February 27, 2023 11:13 PM
> >>>> To: Richard Sandiford <richard.sandiford@arm.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
> >>>> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Li, Pan2 <pan2.li@intel.com>
> >>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> >>>>
> >>>> No problem, I hope you had a good holiday.
> >>>>
> >>>> Thanks for pointing this out, the if part cannot take care of poly_int with N > 2. As I understand, we need to make it general for all the N of poly_int.
> >>>>
> >>>> Thus I would like to double-check with you how to make it general. I suppose there will be a new function can_div_away_from_zero_p to replace the if (known_lt(,)) part in genmodes.cc, and exact_div stays unchanged (given the word "exact", I suppose we should not touch it), right? Then we still need a poly_int with all coefficients set to 1 as the result if can_div_away_from_zero_p returns true.
> >>>>
> >>>> Thanks again for your professional suggestion, and have a nice day!
> >>>>
> >>>> Pan
> >>>> ________________________________
> >>>> From: Richard Sandiford <richard.sandiford@arm.com>
> >>>> Sent: Monday, February 27, 2023 22:24
> >>>> To: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
> >>>> Cc: incarnation.p.lee@outlook.com; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; pan2.li@intel.com
> >>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> >>>>
> >>>> Sorry for the slow reply, been away for a couple of weeks.
> >>>>
> >>>> "incarnation.p.lee--- via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> >>>>> From: Pan Li <pan2.li@intel.com>
> >>>>>
> >>>>> Fix the bug of the rvv bool mode precision with the adjustment.
> >>>>> The bits size of vbool*_t will be adjusted to
> >>>>> [1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The
> >>>>> adjusted mode precision of vbool*_t will help the underlying passes to
> >>>>> make the right decision for both correctness and optimization.
> >>>>>
> >>>>> Given below sample code:
> >>>>> void test_1(int8_t * restrict in, int8_t * restrict out)
> >>>>> {
> >>>>> vbool8_t v2 = *(vbool8_t*)in;
> >>>>> vbool16_t v5 = *(vbool16_t*)in;
> >>>>> *(vbool16_t*)(out + 200) = v5;
> >>>>> *(vbool8_t*)(out + 100) = v2;
> >>>>> }
> >>>>>
> >>>>> Before the precision adjustment:
> >>>>> addi a4,a1,100
> >>>>> vsetvli a5,zero,e8,m1,ta,ma
> >>>>> addi a1,a1,200
> >>>>> vlm.v v24,0(a0)
> >>>>> vsm.v v24,0(a4)
> >>>>> // Need one vsetvli and vlm.v for correctness here.
> >>>>> vsm.v v24,0(a1)
> >>>>>
> >>>>> After the precision adjustment:
> >>>>> csrr t0,vlenb
> >>>>> slli t1,t0,1
> >>>>> csrr a3,vlenb
> >>>>> sub sp,sp,t1
> >>>>> slli a4,a3,1
> >>>>> add a4,a4,sp
> >>>>> sub a3,a4,a3
> >>>>> vsetvli a5,zero,e8,m1,ta,ma
> >>>>> addi a2,a1,200
> >>>>> vlm.v v24,0(a0)
> >>>>> vsm.v v24,0(a3)
> >>>>> addi a1,a1,100
> >>>>> vsetvli a4,zero,e8,mf2,ta,ma
> >>>>> csrr t0,vlenb
> >>>>> vlm.v v25,0(a3)
> >>>>> vsm.v v25,0(a2)
> >>>>> slli t1,t0,1
> >>>>> vsetvli a5,zero,e8,m1,ta,ma
> >>>>> vsm.v v24,0(a1)
> >>>>> add sp,sp,t1
> >>>>> jr ra
> >>>>>
> >>>>> However, there may be some optimization opportunities after
> >>>>> the mode precision adjustment. They can be taken care of in
> >>>>> the RISC-V backend in separate follow-up PR(s).
> >>>>>
> >>>>> PR 108185
> >>>>> PR 108654
> >>>>>
> >>>>> gcc/ChangeLog:
> >>>>>
> >>>>> * config/riscv/riscv-modes.def (ADJUST_PRECISION):
> >>>>> * config/riscv/riscv.cc (riscv_v_adjust_precision):
> >>>>> * config/riscv/riscv.h (riscv_v_adjust_precision):
> >>>>> * genmodes.cc (ADJUST_PRECISION):
> >>>>> (emit_mode_adjustments):
> >>>>>
> >>>>> gcc/testsuite/ChangeLog:
> >>>>>
> >>>>> * gcc.target/riscv/pr108185-1.c: New test.
> >>>>> * gcc.target/riscv/pr108185-2.c: New test.
> >>>>> * gcc.target/riscv/pr108185-3.c: New test.
> >>>>> * gcc.target/riscv/pr108185-4.c: New test.
> >>>>> * gcc.target/riscv/pr108185-5.c: New test.
> >>>>> * gcc.target/riscv/pr108185-6.c: New test.
> >>>>> * gcc.target/riscv/pr108185-7.c: New test.
> >>>>> * gcc.target/riscv/pr108185-8.c: New test.
> >>>>>
> >>>>> Signed-off-by: Pan Li <pan2.li@intel.com>
> >>>>> ---
> >>>>> gcc/config/riscv/riscv-modes.def | 8 +++
> >>>>> gcc/config/riscv/riscv.cc | 12 ++++
> >>>>> gcc/config/riscv/riscv.h | 1 +
> >>>>> gcc/genmodes.cc | 25 ++++++-
> >>>>> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
> >>>>> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
> >>>>> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
> >>>>> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
> >>>>> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
> >>>>> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
> >>>>> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
> >>>>> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
> >>>>> 12 files changed, 598 insertions(+), 1 deletion(-)
> >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
> >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
> >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
> >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
> >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
> >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
> >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
> >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >>>>>
> >>>>> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
> >>>>> index d5305efa8a6..110bddce851 100644
> >>>>> --- a/gcc/config/riscv/riscv-modes.def
> >>>>> +++ b/gcc/config/riscv/riscv-modes.def
> >>>>> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >>>>> ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >>>>> ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
> >>>>>
> >>>>> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
> >>>>> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
> >>>>> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
> >>>>> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
> >>>>> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
> >>>>> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
> >>>>> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
> >>>>> +
> >>>>> /*
> >>>>> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
> >>>>> | | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
> >>>>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> >>>>> index de3e1f903c7..cbe66c0e35b 100644
> >>>>> --- a/gcc/config/riscv/riscv.cc
> >>>>> +++ b/gcc/config/riscv/riscv.cc
> >>>>> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
> >>>>> return scale;
> >>>>> }
> >>>>>
> >>>>> +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
> >>>>> + PRECISION size for corresponding machine_mode. */
> >>>>> +
> >>>>> +poly_int64
> >>>>> +riscv_v_adjust_precision (machine_mode mode, int scale)
> >>>>> +{
> >>>>> + if (riscv_v_ext_vector_mode_p (mode))
> >>>>> + return riscv_vector_chunks * scale;
> >>>>> +
> >>>>> + return scale;
> >>>>> +}
> >>>>> +
> >>>>> /* Return true if X is a valid address for machine mode MODE. If it is,
> >>>>> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
> >>>>> effect. */
> >>>>> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> >>>>> index 5bc7f2f467d..15b9317a8ce 100644
> >>>>> --- a/gcc/config/riscv/riscv.h
> >>>>> +++ b/gcc/config/riscv/riscv.h
> >>>>> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
> >>>>> extern unsigned riscv_bytes_per_vector_chunk;
> >>>>> extern poly_uint16 riscv_vector_chunks;
> >>>>> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
> >>>>> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
> >>>>> /* The number of bits and bytes in a RVV vector. */
> >>>>> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
> >>>>> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
> >>>>> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
> >>>>> index 2d418f09aab..12f4e6335e6 100644
> >>>>> --- a/gcc/genmodes.cc
> >>>>> +++ b/gcc/genmodes.cc
> >>>>> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
> >>>>> static struct mode_adjust *adj_format;
> >>>>> static struct mode_adjust *adj_ibit;
> >>>>> static struct mode_adjust *adj_fbit;
> >>>>> +static struct mode_adjust *adj_precision;
> >>>>>
> >>>>> /* Mode class operations. */
> >>>>> static enum mode_class
> >>>>> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
> >>>>> #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
> >>>>> #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
> >>>>> #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
> >>>>> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
> >>>>> #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
> >>>>> #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
> >>>>> #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
> >>>>> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
> >>>>> " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
> >>>>> m->name, m->name);
> >>>>> printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
> >>>>> - printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
> >>>>> + /* Normalize the size to 1 if precision is less than BITS_PER_UNIT. */
> >>>>> + printf (" poly_uint16 size_one = "
> >>>>> + "mode_precision[E_%smode].is_constant ()\n", m->name);
> >>>>> + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
> >>>>
> >>>> Have you tried this on an x86_64 system? I wouldn't expect it to work
> >>>> because of the:
> >>>>
> >>>> STATIC_ASSERT (N >= 2);
> >>>>
> >>>> in the poly_uint16 constructor.
> >>>>
> >>>>> + printf (" if (known_lt (mode_precision[E_%smode], "
> >>>>> + "size_one * BITS_PER_UNIT))\n", m->name);
> >>>>> + printf (" mode_size[E_%smode] = size_one;\n", m->name);
> >>>>> + printf (" else\n");
> >>>>> + printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
> >>>>
> >>>> Now that the assert implicit in the original exact_div no longer holds,
> >>>> I think we should instead generalise it to can_div_away_from_zero_p
> >>>> (which will involve defining a new overload of can_div_away_from_zero_p).
> >>>> I think that will give the same result as the code above for the cases
> >>>> that the code above handles. But it should be more general too.
> >>>>
> >>>> TBH, I'm still sceptical that this is all that is needed. It seems
> >>>> unlikely that we've been so good at writing vector support code that
> >>>> we've made it work for precision < bitsize, despite that being an
> >>>> unsupported combination until now. But I guess we can fix problems
> >>>> on a case-by-case basis.
> >>>>
> >>>> Thanks,
> >>>> Richard
> >>>>
> >>>>> " BITS_PER_UNIT);\n", m->name, m->name);
> >>>>> printf (" mode_nunits[E_%smode] = ps;\n", m->name);
> >>>>> printf (" adjust_mode_mask (E_%smode);\n", m->name);
> >>>>> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
> >>>>> printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
> >>>>> a->file, a->line, a->mode->name, a->adjustment);
> >>>>>
> >>>>> + /* Adjust precision to the actual bits size. */
> >>>>> + for (a = adj_precision; a; a = a->next)
> >>>>> + switch (a->mode->cl)
> >>>>> + {
> >>>>> + case MODE_VECTOR_BOOL:
> >>>>> + printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
> >>>>> + a->adjustment);
> >>>>> + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
> >>>>> + break;
> >>>>> + default:
> >>>>> + break;
> >>>>> + }
> >>>>> +
> >>>>> puts ("}");
> >>>>> }
> >>>>>
> >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> >>>>> new file mode 100644
> >>>>> index 00000000000..e70960c5b6d
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> >>>>> @@ -0,0 +1,68 @@
> >>>>> +/* { dg-do compile } */
> >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >>>>> +
> >>>>> +#include "riscv_vector.h"
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool1_t v1 = *(vbool1_t*)in;
> >>>>> + vbool2_t v2 = *(vbool2_t*)in;
> >>>>> +
> >>>>> + *(vbool1_t*)(out + 100) = v1;
> >>>>> + *(vbool2_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool1_t v1 = *(vbool1_t*)in;
> >>>>> + vbool4_t v2 = *(vbool4_t*)in;
> >>>>> +
> >>>>> + *(vbool1_t*)(out + 100) = v1;
> >>>>> + *(vbool4_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool1_t v1 = *(vbool1_t*)in;
> >>>>> + vbool8_t v2 = *(vbool8_t*)in;
> >>>>> +
> >>>>> + *(vbool1_t*)(out + 100) = v1;
> >>>>> + *(vbool8_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool1_t v1 = *(vbool1_t*)in;
> >>>>> + vbool16_t v2 = *(vbool16_t*)in;
> >>>>> +
> >>>>> + *(vbool1_t*)(out + 100) = v1;
> >>>>> + *(vbool16_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool1_t v1 = *(vbool1_t*)in;
> >>>>> + vbool32_t v2 = *(vbool32_t*)in;
> >>>>> +
> >>>>> + *(vbool1_t*)(out + 100) = v1;
> >>>>> + *(vbool32_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool1_t v1 = *(vbool1_t*)in;
> >>>>> + vbool64_t v2 = *(vbool64_t*)in;
> >>>>> +
> >>>>> + *(vbool1_t*)(out + 100) = v1;
> >>>>> + *(vbool64_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
> >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> >>>>> new file mode 100644
> >>>>> index 00000000000..dcc7a644a88
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> >>>>> @@ -0,0 +1,68 @@
> >>>>> +/* { dg-do compile } */
> >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >>>>> +
> >>>>> +#include "riscv_vector.h"
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool2_t v1 = *(vbool2_t*)in;
> >>>>> + vbool1_t v2 = *(vbool1_t*)in;
> >>>>> +
> >>>>> + *(vbool2_t*)(out + 100) = v1;
> >>>>> + *(vbool1_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool2_t v1 = *(vbool2_t*)in;
> >>>>> + vbool4_t v2 = *(vbool4_t*)in;
> >>>>> +
> >>>>> + *(vbool2_t*)(out + 100) = v1;
> >>>>> + *(vbool4_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool2_t v1 = *(vbool2_t*)in;
> >>>>> + vbool8_t v2 = *(vbool8_t*)in;
> >>>>> +
> >>>>> + *(vbool2_t*)(out + 100) = v1;
> >>>>> + *(vbool8_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool2_t v1 = *(vbool2_t*)in;
> >>>>> + vbool16_t v2 = *(vbool16_t*)in;
> >>>>> +
> >>>>> + *(vbool2_t*)(out + 100) = v1;
> >>>>> + *(vbool16_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool2_t v1 = *(vbool2_t*)in;
> >>>>> + vbool32_t v2 = *(vbool32_t*)in;
> >>>>> +
> >>>>> + *(vbool2_t*)(out + 100) = v1;
> >>>>> + *(vbool32_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool2_t v1 = *(vbool2_t*)in;
> >>>>> + vbool64_t v2 = *(vbool64_t*)in;
> >>>>> +
> >>>>> + *(vbool2_t*)(out + 100) = v1;
> >>>>> + *(vbool64_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
> >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> >>>>> new file mode 100644
> >>>>> index 00000000000..3af0513e006
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> >>>>> @@ -0,0 +1,68 @@
> >>>>> +/* { dg-do compile } */
> >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >>>>> +
> >>>>> +#include "riscv_vector.h"
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool4_t v1 = *(vbool4_t*)in;
> >>>>> + vbool1_t v2 = *(vbool1_t*)in;
> >>>>> +
> >>>>> + *(vbool4_t*)(out + 100) = v1;
> >>>>> + *(vbool1_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool4_t v1 = *(vbool4_t*)in;
> >>>>> + vbool2_t v2 = *(vbool2_t*)in;
> >>>>> +
> >>>>> + *(vbool4_t*)(out + 100) = v1;
> >>>>> + *(vbool2_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool4_t v1 = *(vbool4_t*)in;
> >>>>> + vbool8_t v2 = *(vbool8_t*)in;
> >>>>> +
> >>>>> + *(vbool4_t*)(out + 100) = v1;
> >>>>> + *(vbool8_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool4_t v1 = *(vbool4_t*)in;
> >>>>> + vbool16_t v2 = *(vbool16_t*)in;
> >>>>> +
> >>>>> + *(vbool4_t*)(out + 100) = v1;
> >>>>> + *(vbool16_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool4_t v1 = *(vbool4_t*)in;
> >>>>> + vbool32_t v2 = *(vbool32_t*)in;
> >>>>> +
> >>>>> + *(vbool4_t*)(out + 100) = v1;
> >>>>> + *(vbool32_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool4_t v1 = *(vbool4_t*)in;
> >>>>> + vbool64_t v2 = *(vbool64_t*)in;
> >>>>> +
> >>>>> + *(vbool4_t*)(out + 100) = v1;
> >>>>> + *(vbool64_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
> >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> >>>>> new file mode 100644
> >>>>> index 00000000000..ea3c360d756
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> >>>>> @@ -0,0 +1,68 @@
> >>>>> +/* { dg-do compile } */
> >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >>>>> +
> >>>>> +#include "riscv_vector.h"
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool8_t v1 = *(vbool8_t*)in;
> >>>>> + vbool1_t v2 = *(vbool1_t*)in;
> >>>>> +
> >>>>> + *(vbool8_t*)(out + 100) = v1;
> >>>>> + *(vbool1_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool8_t v1 = *(vbool8_t*)in;
> >>>>> + vbool2_t v2 = *(vbool2_t*)in;
> >>>>> +
> >>>>> + *(vbool8_t*)(out + 100) = v1;
> >>>>> + *(vbool2_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool8_t v1 = *(vbool8_t*)in;
> >>>>> + vbool4_t v2 = *(vbool4_t*)in;
> >>>>> +
> >>>>> + *(vbool8_t*)(out + 100) = v1;
> >>>>> + *(vbool4_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool8_t v1 = *(vbool8_t*)in;
> >>>>> + vbool16_t v2 = *(vbool16_t*)in;
> >>>>> +
> >>>>> + *(vbool8_t*)(out + 100) = v1;
> >>>>> + *(vbool16_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool8_t v1 = *(vbool8_t*)in;
> >>>>> + vbool32_t v2 = *(vbool32_t*)in;
> >>>>> +
> >>>>> + *(vbool8_t*)(out + 100) = v1;
> >>>>> + *(vbool32_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool8_t v1 = *(vbool8_t*)in;
> >>>>> + vbool64_t v2 = *(vbool64_t*)in;
> >>>>> +
> >>>>> + *(vbool8_t*)(out + 100) = v1;
> >>>>> + *(vbool64_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
> >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> >>>>> new file mode 100644
> >>>>> index 00000000000..9fc659d2402
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> >>>>> @@ -0,0 +1,68 @@
> >>>>> +/* { dg-do compile } */
> >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >>>>> +
> >>>>> +#include "riscv_vector.h"
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool16_t v1 = *(vbool16_t*)in;
> >>>>> + vbool1_t v2 = *(vbool1_t*)in;
> >>>>> +
> >>>>> + *(vbool16_t*)(out + 100) = v1;
> >>>>> + *(vbool1_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool16_t v1 = *(vbool16_t*)in;
> >>>>> + vbool2_t v2 = *(vbool2_t*)in;
> >>>>> +
> >>>>> + *(vbool16_t*)(out + 100) = v1;
> >>>>> + *(vbool2_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool16_t v1 = *(vbool16_t*)in;
> >>>>> + vbool4_t v2 = *(vbool4_t*)in;
> >>>>> +
> >>>>> + *(vbool16_t*)(out + 100) = v1;
> >>>>> + *(vbool4_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool16_t v1 = *(vbool16_t*)in;
> >>>>> + vbool8_t v2 = *(vbool8_t*)in;
> >>>>> +
> >>>>> + *(vbool16_t*)(out + 100) = v1;
> >>>>> + *(vbool8_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool16_t v1 = *(vbool16_t*)in;
> >>>>> + vbool32_t v2 = *(vbool32_t*)in;
> >>>>> +
> >>>>> + *(vbool16_t*)(out + 100) = v1;
> >>>>> + *(vbool32_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool16_t v1 = *(vbool16_t*)in;
> >>>>> + vbool64_t v2 = *(vbool64_t*)in;
> >>>>> +
> >>>>> + *(vbool16_t*)(out + 100) = v1;
> >>>>> + *(vbool64_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> >>>>> new file mode 100644
> >>>>> index 00000000000..98275e5267d
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> >>>>> @@ -0,0 +1,68 @@
> >>>>> +/* { dg-do compile } */
> >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >>>>> +
> >>>>> +#include "riscv_vector.h"
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool32_t v1 = *(vbool32_t*)in;
> >>>>> + vbool1_t v2 = *(vbool1_t*)in;
> >>>>> +
> >>>>> + *(vbool32_t*)(out + 100) = v1;
> >>>>> + *(vbool1_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool32_t v1 = *(vbool32_t*)in;
> >>>>> + vbool2_t v2 = *(vbool2_t*)in;
> >>>>> +
> >>>>> + *(vbool32_t*)(out + 100) = v1;
> >>>>> + *(vbool2_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool32_t v1 = *(vbool32_t*)in;
> >>>>> + vbool4_t v2 = *(vbool4_t*)in;
> >>>>> +
> >>>>> + *(vbool32_t*)(out + 100) = v1;
> >>>>> + *(vbool4_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool32_t v1 = *(vbool32_t*)in;
> >>>>> + vbool8_t v2 = *(vbool8_t*)in;
> >>>>> +
> >>>>> + *(vbool32_t*)(out + 100) = v1;
> >>>>> + *(vbool8_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool32_t v1 = *(vbool32_t*)in;
> >>>>> + vbool16_t v2 = *(vbool16_t*)in;
> >>>>> +
> >>>>> + *(vbool32_t*)(out + 100) = v1;
> >>>>> + *(vbool16_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool32_t v1 = *(vbool32_t*)in;
> >>>>> + vbool64_t v2 = *(vbool64_t*)in;
> >>>>> +
> >>>>> + *(vbool32_t*)(out + 100) = v1;
> >>>>> + *(vbool64_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
> >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> >>>>> new file mode 100644
> >>>>> index 00000000000..8f6f0b11f09
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> >>>>> @@ -0,0 +1,68 @@
> >>>>> +/* { dg-do compile } */
> >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >>>>> +
> >>>>> +#include "riscv_vector.h"
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool64_t v1 = *(vbool64_t*)in;
> >>>>> + vbool1_t v2 = *(vbool1_t*)in;
> >>>>> +
> >>>>> + *(vbool64_t*)(out + 100) = v1;
> >>>>> + *(vbool1_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool64_t v1 = *(vbool64_t*)in;
> >>>>> + vbool2_t v2 = *(vbool2_t*)in;
> >>>>> +
> >>>>> + *(vbool64_t*)(out + 100) = v1;
> >>>>> + *(vbool2_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool64_t v1 = *(vbool64_t*)in;
> >>>>> + vbool4_t v2 = *(vbool4_t*)in;
> >>>>> +
> >>>>> + *(vbool64_t*)(out + 100) = v1;
> >>>>> + *(vbool4_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool64_t v1 = *(vbool64_t*)in;
> >>>>> + vbool8_t v2 = *(vbool8_t*)in;
> >>>>> +
> >>>>> + *(vbool64_t*)(out + 100) = v1;
> >>>>> + *(vbool8_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool64_t v1 = *(vbool64_t*)in;
> >>>>> + vbool16_t v2 = *(vbool16_t*)in;
> >>>>> +
> >>>>> + *(vbool64_t*)(out + 100) = v1;
> >>>>> + *(vbool16_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool64_t v1 = *(vbool64_t*)in;
> >>>>> + vbool32_t v2 = *(vbool32_t*)in;
> >>>>> +
> >>>>> + *(vbool64_t*)(out + 100) = v1;
> >>>>> + *(vbool32_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >>>>> new file mode 100644
> >>>>> index 00000000000..d96959dd064
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >>>>> @@ -0,0 +1,77 @@
> >>>>> +/* { dg-do compile } */
> >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >>>>> +
> >>>>> +#include "riscv_vector.h"
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool1_t v1 = *(vbool1_t*)in;
> >>>>> + vbool1_t v2 = *(vbool1_t*)in;
> >>>>> +
> >>>>> + *(vbool1_t*)(out + 100) = v1;
> >>>>> + *(vbool1_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool2_t v1 = *(vbool2_t*)in;
> >>>>> + vbool2_t v2 = *(vbool2_t*)in;
> >>>>> +
> >>>>> + *(vbool2_t*)(out + 100) = v1;
> >>>>> + *(vbool2_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool4_t v1 = *(vbool4_t*)in;
> >>>>> + vbool4_t v2 = *(vbool4_t*)in;
> >>>>> +
> >>>>> + *(vbool4_t*)(out + 100) = v1;
> >>>>> + *(vbool4_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool8_t v1 = *(vbool8_t*)in;
> >>>>> + vbool8_t v2 = *(vbool8_t*)in;
> >>>>> +
> >>>>> + *(vbool8_t*)(out + 100) = v1;
> >>>>> + *(vbool8_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool16_t v1 = *(vbool16_t*)in;
> >>>>> + vbool16_t v2 = *(vbool16_t*)in;
> >>>>> +
> >>>>> + *(vbool16_t*)(out + 100) = v1;
> >>>>> + *(vbool16_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool32_t v1 = *(vbool32_t*)in;
> >>>>> + vbool32_t v2 = *(vbool32_t*)in;
> >>>>> +
> >>>>> + *(vbool32_t*)(out + 100) = v1;
> >>>>> + *(vbool32_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool64_t v1 = *(vbool64_t*)in;
> >>>>> + vbool64_t v2 = *(vbool64_t*)in;
> >>>>> +
> >>>>> + *(vbool64_t*)(out + 100) = v1;
> >>>>> + *(vbool64_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
>
I am not very familiar with the memory access patterns; maybe juzhe can provide more information, or correct me if anything below is misleading.
The precision adjustment is meant to fix the bug below: the second vlm.v (which loads a different number of bytes than the first one)
is eliminated because vbool8 and vbool16 have the same precision, namely [8, 8].
vbool8_t v2 = *(vbool8_t*)in;
vbool16_t v5 = *(vbool16_t*)in;
*(vbool16_t*)(out + 200) = v5;
*(vbool8_t*)(out + 100) = v2;
addi a4,a1,100
vsetvli a5,zero,e8,m1,ta,ma
addi a1,a1,200
vlm.v v24,0(a0)
vsm.v v24,0(a4)
// Need one vsetvli and vlm.v for correctness here.
vsm.v v24,0(a1)
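
To make the byte-count difference concrete, here is a minimal sketch (not part of the patch). It assumes vl equals VLMAX, that vboolN_t holds VLEN / N mask bits, and that vlm.v/vsm.v transfer ceil(vl / 8) bytes as mentioned elsewhere in this thread. The helper names mask_bits and mask_bytes are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Number of mask bits held by vbool<ratio>_t for a given VLEN (in bits).
// Assumption: vl == VLMAX, so the type holds VLEN / ratio mask bits.
inline uint32_t mask_bits(uint32_t vlen, uint32_t ratio) {
    assert(ratio != 0);
    return vlen / ratio;
}

// Bytes moved by a mask load/store of that type: ceil(vl / 8).
inline uint32_t mask_bytes(uint32_t vlen, uint32_t ratio) {
    return (mask_bits(vlen, ratio) + 7) / 8;
}
```

With VLEN = 128, mask_bytes(128, 8) is 2 while mask_bytes(128, 16) is 1, so the vbool8_t and vbool16_t accesses in the sample cover different byte ranges and the second load cannot simply reuse the first.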
Pan
________________________________
From: Richard Biener <rguenther@suse.de>
Sent: Wednesday, March 1, 2023 20:33
To: Richard Sandiford <richard.sandiford@arm.com>
Cc: 盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org>; 盼 李 <incarnation.p.lee@outlook.com>; juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>; pan2.li <pan2.li@intel.com>; Kito.cheng <kito.cheng@sifive.com>
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
On Wed, 1 Mar 2023, Richard Sandiford wrote:
> 盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > I just ran a test with the code below; the [0x4, 0x4] precision comes from VNx4BI. Note that the mode size is unchanged.
> >
> > printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
> > "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
> >
> > VNx4BI Before precision [0x4, 0x4], size [0x4, 0]
> > VNx4BI After precision [0x4, 0x4], size [0x4, 0]
>
> Yeah, the result is expected to be unchanged if the division fails.
> That's a deliberate part of the interface. The can_* functions
> should never be used without testing the boolean return value.
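
The "unchanged on failure" behaviour can be sketched with a simplified model. This is not the GCC implementation: Poly2 and can_div_trunc are hypothetical stand-ins for a two-coefficient poly_int and for the truncating step, written from the description in this thread (the division fails unless every non-constant coefficient is a multiple of the divisor). Since can_div_away_from_zero_p returns false as soon as that step fails, modelling the truncating step is enough to show the effect.

```cpp
#include <cassert>

// Hypothetical stand-in for a two-coefficient poly_int:
// value = c[0] + c[1] * X for a runtime-unknown X.
struct Poly2 {
    long c[2];
};

// Model of truncating division by a constant. On failure the quotient is
// deliberately left untouched and false is returned, so callers must test
// the return value before reading *quotient.
inline bool can_div_trunc(const Poly2 &a, long b, Poly2 *quotient) {
    assert(b != 0);
    // The remainder must be constant: the X coefficient has to divide evenly.
    if (a.c[1] % b != 0)
        return false;
    quotient->c[0] = a.c[0] / b;
    quotient->c[1] = a.c[1] / b;
    return true;
}
```

Calling can_div_trunc(Poly2{{4, 4}}, 8, &size) with size preset to [4, 0] returns false and leaves size at [4, 0], which matches the "Before"/"After" output quoted above; a precision of [8, 8] divides cleanly into [1, 1].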
>
> But this precision of [4,4] for VNx4BI is different from what you
> listed below. Like I say, if the precision really is [4,4], and if
> the size really is ceil([4,4]/8), then I don't think we can represent
> that with current infrastructure.
The size of VNx4BI is (4*N + 7) / 8 bytes. I suppose we could simply
not store the size in bytes but only the size in bits then?
I see the problem, but I also don't see a good solution since
for VNx4BI with N == 3 we have one and a half byte of storage.
How do memory access patterns work with poly-int sizes?
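
As a rough illustration of why this size is awkward to store, the sketch below evaluates (4*N + 7) / 8 for a few N and checks whether the values lie on a straight line c0 + c1*N, which is what a two-coefficient poly_int could express. The function names vnx4bi_bytes and is_affine_on are hypothetical, and the check is only a numerical probe over a small range, not a proof.

```cpp
#include <cassert>

// Byte size of VNx4BI for a given N, using the formula from this mail.
inline int vnx4bi_bytes(int n) {
    return (4 * n + 7) / 8;
}

// Returns true if vnx4bi_bytes is affine on [lo, hi], i.e. every second
// difference f(n+2) - 2*f(n+1) + f(n) in that range is zero.
inline bool is_affine_on(int lo, int hi) {
    assert(lo >= 0 && lo <= hi);
    for (int n = lo; n + 2 <= hi; ++n) {
        int second_diff =
            vnx4bi_bytes(n + 2) - 2 * vnx4bi_bytes(n + 1) + vnx4bi_bytes(n);
        if (second_diff != 0)
            return false;
    }
    return true;
}
```

For N = 1, 2, 3, 4 the sizes are 1, 1, 2, 2 bytes, so is_affine_on(1, 4) is false: the step alternates between 0 and 1, and no fixed pair of coefficients reproduces it.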
>
> Thanks,
> Richard
>
> >
> > Pan
> > ________________________________
> > From: Richard Sandiford <richard.sandiford@arm.com>
> > Sent: Wednesday, March 1, 2023 19:11
> > To: 盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org>
> > Cc: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>; pan2.li <pan2.li@intel.com>; 盼 李 <incarnation.p.lee@outlook.com>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>
> > Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> >
> > 盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> >> Thank you all for your quick response.
> >>
> >> As juzhe mentioned, RISC-V mask memory accesses are always byte-aligned in the compact layout, i.e. ceil(vl / 8) bytes for vbool*.
> >
> > OK, thanks to both of you. This is what I'd have expected.
> >
> > In that case, I think both the can_div_away_from_zero_p and the
> > original patch (using size_one) will give the wrong results.
> > There isn't a way of representing ceil([4,4]/8) as a poly_int.
> > The result is (4+4X)/8 when X is odd and (8+4X)/8 when X is even.
> >
> >> Actually, the [4,4] data comes from the self-test; the RISC-V mode precisions are as below.
> >>
> >> VNx64BI precision [0x40, 0x40].
> >> VNx32BI precision [0x20, 0x20].
> >> VNx16BI precision [0x10, 0x10].
> >> VNx8BI precision [0x8, 0x8].
> >> VNx4BI precision [0x8, 0x8].
> >> VNx2BI precision [0x8, 0x8].
> >> VNx1BI precision [0x8, 0x8].
> >
> > Ah, OK. Which self-test causes this?
> >
> > Richard
> >
> >> The [4, 4] data affects the genmodes part: we cannot write the code below, because the gcc_unreachable would be hit.
> >>
> >> if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT, &mode_size[E_%smode]))
> >> gcc_unreachable (); // Hit on [4, 4] of the self-test.
> >>
> >> Pan
> >> ________________________________
> >> From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
> >> Sent: Wednesday, March 1, 2023 18:46
> >> To: richard.sandiford <richard.sandiford@arm.com>; pan2.li <pan2.li@intel.com>
> >> Cc: incarnation.p.lee <incarnation.p.lee@outlook.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>
> >> Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> >>
> >>>> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
> >>>>bits, even when that means accessing fully-unused bytes? E.g. 4+4X
> >>>>when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
> >>>>8+8X would be 32 bits/4 bytes. So a store of [8,8] for a precision
> >>>>of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
> >>
> >> Hi, Richard. Thank you for helping us.
> >> My understanding of RVV ISA:
> >>
> >> In RVV, we have mask modes (VNx1BI, VNx2BI, etc.) and data modes (VNx1QI, VNx4QI, VNx2DI, etc.).
> >> For data modes, we access the data fully and there are no unused bytes, so we don't need to adjust the precision.
> >> However, for mask modes we access the mask bits in a compact layout (the mask bits for consecutive elements are stored consecutively).
> >> For example, in the current configuration VNx1BI, VNx2BI, VNx4BI and VNx8BI have the same byte size (1,1) but different bit sizes.
> >>
> >> VNx8BI is accessed fully, but only 1/2 of VNx4BI, 1/4 of VNx2BI and 1/8 of VNx1BI is accessed, with byte alignment (I am not sure whether RVV supports bit alignment; I guess it cannot).
> >>
> >> If VNx8BI only occupies 1 byte (depending on the machine vector length), then VNx2BI, VNx4BI and VNx1BI are 2/8, 4/8 and 1/8 of a byte. I think we can't access memory at bit alignment, so they will be the same in the access.
> >> However, if VNx8BI occupies 8 bytes, then VNx1BI, VNx2BI and VNx4BI are 1 byte, 2 bytes and 4 bytes. They access different sizes.
> >>
> >> This is my comprehension of RVV ISA, feel free to correct me.
> >> Thanks.
> >>
> >> ________________________________
> >> juzhe.zhong@rivai.ai
> >>
> >> From: Richard Sandiford<mailto:richard.sandiford@arm.com>
> >> Date: 2023-03-01 18:11
> >> To: Li\, Pan2<mailto:pan2.li@intel.com>
> >> CC: 盼 李<mailto:incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches<mailto:gcc-patches@gcc.gnu.org>; juzhe.zhong\@rivai.ai<mailto:juzhe.zhong@rivai.ai>; kito.cheng\@sifive.com<mailto:kito.cheng@sifive.com>; rguenther\@suse.de<mailto:rguenther@suse.de>
> >> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> >> "Li, Pan2" <pan2.li@intel.com> writes:
> >>> Hi Richard Sandiford,
> >>>
> >>> I just tried the constant-divisor overload with the printf below, and it works as you mentioned!
> >>>
> >>> printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
> >>> "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
> >>>
> >>> template<unsigned int N, typename Ca, typename Cb, typename Cq>
> >>> inline typename if_nonpoly<Cb, bool>::type
> >>> can_div_away_from_zero_p (const poly_int_pod<N, Ca> &a,
> >>> Cb b,
> >>> poly_int_pod<N, Cq> *quotient)
> >>> {
> >>> if (!can_div_trunc_p (a, b, quotient))
> >>> return false;
> >>> if (maybe_ne (*quotient * b, a))
> >>> for (unsigned int i = 0; i < N; ++i)
> >>> quotient->coeffs[i] += (quotient->coeffs[i] < 0 ? -1 : 1);
> >>> return true;
> >>> }
> >>>
> >>> But I may have a question about the one case as below.
> >>>
> >>> Assume:
> >>> a = [4, 4], b = 8.
> >>>
> >>> When it reaches can_div_trunc_p, it checks whether the remainder is constant, i.e. a.coeffs[i] % 8 == 0 for i >= 1. If the remainder is not constant, can_div_trunc_p leaves the quotient untouched and returns false.
> >>>
> >>> Thus, when a = [4, 4], can_div_away_from_zero_p leaves *quotient unchanged, i.e. mode_size[E_%smode] is unchanged in this case. However, the underlying mode_size will adjust it to the real byte size, and I am not sure whether this is by design or requires additional handling.
> >>
> >> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
> >> bits, even when that means accessing fully-unused bytes? E.g. 4+4X
> >> when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
> >> 8+8X would be 32 bits/4 bytes. So a store of [8,8] for a precision
> >> of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
> >>
> >> Richard
> >>
> >>> Pan
> >>>
> >>> From: 盼 李 <incarnation.p.lee@outlook.com>
> >>> Sent: Tuesday, February 28, 2023 5:59 PM
> >>> To: Richard Sandiford <richard.sandiford@arm.com>; Li, Pan2 <pan2.li@intel.com>
> >>> Cc: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
> >>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> >>>
> >>> Understood, thanks for the explanations and suggestions. Let me have a try and keep you posted.
> >>>
> >>> Pan
> >>> ________________________________
> >>> From: Richard Sandiford <richard.sandiford@arm.com<mailto:richard.sandiford@arm.com>>
> >>> Sent: Tuesday, February 28, 2023 17:50
> >>> To: Li, Pan2 <pan2.li@intel.com<mailto:pan2.li@intel.com>>
> >>> Cc: 盼 李 <incarnation.p.lee@outlook.com<mailto:incarnation.p.lee@outlook.com>>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org<mailto:gcc-patches@gcc.gnu.org>>; juzhe.zhong@rivai.ai<mailto:juzhe.zhong@rivai.ai> <juzhe.zhong@rivai.ai<mailto:juzhe.zhong@rivai.ai>>; kito.cheng@sifive.com<mailto:kito.cheng@sifive.com> <kito.cheng@sifive.com<mailto:kito.cheng@sifive.com>>; rguenther@suse.de<mailto:rguenther@suse.de> <rguenther@suse.de<mailto:rguenther@suse.de>>
> >>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> >>>
> >>> "Li, Pan2" <pan2.li@intel.com> writes:
> >>>> Hi Richard Sandiford,
> >>>>
> >>>> After some investigation, I am not sure if it is possible to make it general without any changes to exact_div. We can add one method like below to get the unit poly for all possible N.
> >>>>
> >>>> template<unsigned int N, typename Ca>
> >>>> inline POLY_CONST_RESULT (N, Ca, Ca)
> >>>> normalize_to_unit (const poly_int_pod<N, Ca> &a)
> >>>> {
> >>>>   typedef POLY_CONST_COEFF (Ca, Ca) C;
> >>>>
> >>>>   poly_int<N, C> normalized = a;
> >>>>
> >>>>   if (normalized.is_constant ())
> >>>>     normalized.coeffs[0] = 1;
> >>>>   else
> >>>>     for (unsigned int i = 0; i < N; i++)
> >>>>       POLY_SET_COEFF (C, normalized, i, 1);
> >>>>
> >>>>   return normalized;
> >>>> }
> >>>>
> >>>> And then adjust the genmodes like below to consume the unit poly.
> >>>>
> >>>>   printf (" poly_uint16 unit_poly = "
> >>>>           "normalize_to_unit (mode_precision[E_%smode]);\n", m->name);
> >>>>   printf (" if (known_lt (mode_precision[E_%smode], "
> >>>>           "unit_poly * BITS_PER_UNIT))\n", m->name);
> >>>>   printf (" mode_size[E_%smode] = unit_poly;\n", m->name);
> >>>>
> >>>> I am not sure it is a good idea to introduce the above normalization code into exact_div, given that the comment on exact_div says “/* Return A / B, given that A is known to be a multiple of B. */”.
> >>>
> >>> My point was that we have multiple ways of dividing poly_ints:
> >>>
> >>> - exact_div, for when the caller knows that the result is always exact
> >>> - can_div_trunc_p, for truncating division (round towards 0)
> >>> - can_div_away_from_zero_p, for rounding away from 0
> >>> - ...
> >>>
> >>> This is like how we have multiple division *_EXPRs on trees.
> >>>
> >>> Until now, exact_div was the correct choice for modes because vector
> >>> modes didn't have padding. We're now changing that, so my suggestion
> >>> in the review was to change the division operation that we use.
> >>> Rather than use exact_div, we should now use can_div_away_from_zero_p,
> >>> which would have the effect of rounding the quotient up.
> >>>
> >>> Something like:
> >>>
> >>> if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT,
> >>> &mode_size[E_%smode]))
> >>> gcc_unreachable ();
> >>>
> >>> But this will require a new overload of can_div_away_from_zero_p, since
> >>> the existing one is for constant quotients rather than constant divisors.
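
The rounding behaviour being asked for here can be illustrated with a coefficient-wise ceiling division by a constant divisor. This is only a sketch under the assumption that rounding every coefficient up is acceptable: it yields a byte size that is never too small, but it may over-estimate. Poly2, ceil_div and ceil_div_coeffs are hypothetical names, and this is not the proposed can_div_away_from_zero_p overload itself.

```cpp
#include <cstdint>

// Hypothetical two-coefficient value c0 + c1 * X (not GCC's poly_int).
struct Poly2
{
  std::uint64_t c0;
  std::uint64_t c1;
};

// Divide A by B, rounding away from zero (operands are unsigned, B != 0).
constexpr std::uint64_t
ceil_div (std::uint64_t a, std::uint64_t b)
{
  return (a + b - 1) / b;
}

// Round each coefficient up separately.  For X >= 0 the result times B is
// always >= the original value, so it is a safe (possibly too large) size.
constexpr Poly2
ceil_div_coeffs (Poly2 a, std::uint64_t b)
{
  return Poly2{ceil_div (a.c0, b), ceil_div (a.c1, b)};
}

// A precision of [4,4] bits becomes a size of [1,1] bytes.
static_assert (ceil_div_coeffs (Poly2{4, 4}, 8).c0 == 1, "");
static_assert (ceil_div_coeffs (Poly2{4, 4}, 8).c1 == 1, "");

// An exactly divisible precision is unchanged relative to exact division.
static_assert (ceil_div_coeffs (Poly2{16, 16}, 8).c0 == 2, "");
static_assert (ceil_div_coeffs (Poly2{16, 16}, 8).c1 == 2, "");
```

For precisions below one byte per coefficient this gives [1,1], matching what the size_one code in the patch produces, while also covering values that are already multiples of the divisor.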
> >>>
> >>> Thanks,
> >>> Richard
> >>>
> >>>>
> >>>> Could you please share your opinion on this from an expert's perspective? Thank you!
> >>>>
> >>>> Pan
> >>>>
> >>>> From: 盼 李 <incarnation.p.lee@outlook.com>
> >>>> Sent: Monday, February 27, 2023 11:13 PM
> >>>> To: Richard Sandiford <richard.sandiford@arm.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
> >>>> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Li, Pan2 <pan2.li@intel.com>
> >>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> >>>>
> >>>> No problem, I hope you had a good holiday.
> >>>>
> >>>> Thanks for pointing this out, the if part cannot take care of poly_int with N > 2. As I understand, we need to make it general for all the N of poly_int.
> >>>>
> >>>> Thus I would like to double-check with you how to make it general. I suppose there will be a new function can_div_away_from_zero_p to replace the if (known_lt (,)) part in genmodes.cc, leaving exact_div unchanged (given the word "exact", I suppose we should not touch it), right? Then we still need a poly_int whose N coefficients are all 1 as the result if can_div_away_from_zero_p returns true.
> >>>>
> >>>> Thanks again for your professional suggestion, have a nice day!
> >>>>
> >>>> Pan
> >>>> ________________________________
> >>>> From: Richard Sandiford <richard.sandiford@arm.com>
> >>>> Sent: Monday, February 27, 2023 22:24
> >>>> To: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
> >>>> Cc: incarnation.p.lee@outlook.com; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; pan2.li@intel.com
> >>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> >>>>
> >>>> Sorry for the slow reply, been away for a couple of weeks.
> >>>>
> >>>> "incarnation.p.lee--- via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> >>>>> From: Pan Li <pan2.li@intel.com>
> >>>>>
> >>>>> Fix the bug of the rvv bool mode precision with the adjustment.
> >>>>> The bits size of vbool*_t will be adjusted to
> >>>>> [1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The
> >>>>> adjusted mode precision of vbool*_t will help underlying pass to
> >>>>> make the right decision for both the correctness and optimization.
> >>>>>
> >>>>> Given below sample code:
> >>>>> void test_1(int8_t * restrict in, int8_t * restrict out)
> >>>>> {
> >>>>> vbool8_t v2 = *(vbool8_t*)in;
> >>>>> vbool16_t v5 = *(vbool16_t*)in;
> >>>>> *(vbool16_t*)(out + 200) = v5;
> >>>>> *(vbool8_t*)(out + 100) = v2;
> >>>>> }
> >>>>>
> >>>>> Before the precision adjustment:
> >>>>> addi a4,a1,100
> >>>>> vsetvli a5,zero,e8,m1,ta,ma
> >>>>> addi a1,a1,200
> >>>>> vlm.v v24,0(a0)
> >>>>> vsm.v v24,0(a4)
> >>>>> // Need one vsetvli and vlm.v for correctness here.
> >>>>> vsm.v v24,0(a1)
> >>>>>
> >>>>> After the precision adjustment:
> >>>>> csrr t0,vlenb
> >>>>> slli t1,t0,1
> >>>>> csrr a3,vlenb
> >>>>> sub sp,sp,t1
> >>>>> slli a4,a3,1
> >>>>> add a4,a4,sp
> >>>>> sub a3,a4,a3
> >>>>> vsetvli a5,zero,e8,m1,ta,ma
> >>>>> addi a2,a1,200
> >>>>> vlm.v v24,0(a0)
> >>>>> vsm.v v24,0(a3)
> >>>>> addi a1,a1,100
> >>>>> vsetvli a4,zero,e8,mf2,ta,ma
> >>>>> csrr t0,vlenb
> >>>>> vlm.v v25,0(a3)
> >>>>> vsm.v v25,0(a2)
> >>>>> slli t1,t0,1
> >>>>> vsetvli a5,zero,e8,m1,ta,ma
> >>>>> vsm.v v24,0(a1)
> >>>>> add sp,sp,t1
> >>>>> jr ra
> >>>>>
> >>>>> However, there may be some optimization opportunities after
> >>>>> the mode precision adjustment. They can be taken care of in
> >>>>> the RISC-V backend in separate follow-up PR(s).
> >>>>>
> >>>>> PR 108185
> >>>>> PR 108654
> >>>>>
> >>>>> gcc/ChangeLog:
> >>>>>
> >>>>> * config/riscv/riscv-modes.def (ADJUST_PRECISION):
> >>>>> * config/riscv/riscv.cc (riscv_v_adjust_precision):
> >>>>> * config/riscv/riscv.h (riscv_v_adjust_precision):
> >>>>> * genmodes.cc (ADJUST_PRECISION):
> >>>>> (emit_mode_adjustments):
> >>>>>
> >>>>> gcc/testsuite/ChangeLog:
> >>>>>
> >>>>> * gcc.target/riscv/pr108185-1.c: New test.
> >>>>> * gcc.target/riscv/pr108185-2.c: New test.
> >>>>> * gcc.target/riscv/pr108185-3.c: New test.
> >>>>> * gcc.target/riscv/pr108185-4.c: New test.
> >>>>> * gcc.target/riscv/pr108185-5.c: New test.
> >>>>> * gcc.target/riscv/pr108185-6.c: New test.
> >>>>> * gcc.target/riscv/pr108185-7.c: New test.
> >>>>> * gcc.target/riscv/pr108185-8.c: New test.
> >>>>>
> >>>>> Signed-off-by: Pan Li <pan2.li@intel.com>
> >>>>> ---
> >>>>> gcc/config/riscv/riscv-modes.def | 8 +++
> >>>>> gcc/config/riscv/riscv.cc | 12 ++++
> >>>>> gcc/config/riscv/riscv.h | 1 +
> >>>>> gcc/genmodes.cc | 25 ++++++-
> >>>>> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
> >>>>> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
> >>>>> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
> >>>>> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
> >>>>> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
> >>>>> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
> >>>>> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
> >>>>> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
> >>>>> 12 files changed, 598 insertions(+), 1 deletion(-)
> >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
> >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
> >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
> >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
> >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
> >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
> >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
> >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >>>>>
> >>>>> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
> >>>>> index d5305efa8a6..110bddce851 100644
> >>>>> --- a/gcc/config/riscv/riscv-modes.def
> >>>>> +++ b/gcc/config/riscv/riscv-modes.def
> >>>>> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >>>>> ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> >>>>> ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
> >>>>>
> >>>>> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
> >>>>> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
> >>>>> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
> >>>>> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
> >>>>> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
> >>>>> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
> >>>>> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
> >>>>> +
> >>>>> /*
> >>>>> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
> >>>>> | | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
> >>>>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> >>>>> index de3e1f903c7..cbe66c0e35b 100644
> >>>>> --- a/gcc/config/riscv/riscv.cc
> >>>>> +++ b/gcc/config/riscv/riscv.cc
> >>>>> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
> >>>>> return scale;
> >>>>> }
> >>>>>
> >>>>> +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
> >>>>> + PRECISION size for corresponding machine_mode. */
> >>>>> +
> >>>>> +poly_int64
> >>>>> +riscv_v_adjust_precision (machine_mode mode, int scale)
> >>>>> +{
> >>>>> + if (riscv_v_ext_vector_mode_p (mode))
> >>>>> + return riscv_vector_chunks * scale;
> >>>>> +
> >>>>> + return scale;
> >>>>> +}
> >>>>> +
> >>>>> /* Return true if X is a valid address for machine mode MODE. If it is,
> >>>>> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
> >>>>> effect. */
> >>>>> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> >>>>> index 5bc7f2f467d..15b9317a8ce 100644
> >>>>> --- a/gcc/config/riscv/riscv.h
> >>>>> +++ b/gcc/config/riscv/riscv.h
> >>>>> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
> >>>>> extern unsigned riscv_bytes_per_vector_chunk;
> >>>>> extern poly_uint16 riscv_vector_chunks;
> >>>>> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
> >>>>> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
> >>>>> /* The number of bits and bytes in a RVV vector. */
> >>>>> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
> >>>>> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
> >>>>> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
> >>>>> index 2d418f09aab..12f4e6335e6 100644
> >>>>> --- a/gcc/genmodes.cc
> >>>>> +++ b/gcc/genmodes.cc
> >>>>> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
> >>>>> static struct mode_adjust *adj_format;
> >>>>> static struct mode_adjust *adj_ibit;
> >>>>> static struct mode_adjust *adj_fbit;
> >>>>> +static struct mode_adjust *adj_precision;
> >>>>>
> >>>>> /* Mode class operations. */
> >>>>> static enum mode_class
> >>>>> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
> >>>>> #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
> >>>>> #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
> >>>>> #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
> >>>>> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
> >>>>> #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
> >>>>> #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
> >>>>> #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
> >>>>> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
> >>>>> " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
> >>>>> m->name, m->name);
> >>>>> printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
> >>>>> - printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
> >>>>> + /* Normalize the size to 1 if precison is less than BITS_PER_UNIT. */
> >>>>> + printf (" poly_uint16 size_one = "
> >>>>> + "mode_precision[E_%smode].is_constant ()\n", m->name);
> >>>>> + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
> >>>>
> >>>> Have you tried this on an x86_64 system? I wouldn't expect it to work
> >>>> because of the:
> >>>>
> >>>> STATIC_ASSERT (N >= 2);
> >>>>
> >>>> in the poly_uint16 constructor.
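
The failure mode described here can be reproduced outside GCC with a small sketch. mini_poly below is a hypothetical stand-in for a poly_int-like class, assuming (as the review comment implies) that an x86_64 build uses a single coefficient. Its two-argument constructor carries the same kind of N >= 2 assertion, so it only compiles when N is at least 2.

```cpp
// Hypothetical stand-in for a poly_int-like class with N coefficients.
template<unsigned int N>
struct mini_poly
{
  unsigned short coeffs[N];

  // Constant form: valid for every N; any remaining coefficients are zero.
  constexpr mini_poly (unsigned short c0) : coeffs{c0} {}

  // Two-coefficient form: this definition is only instantiated when the
  // constructor is actually used, so the assertion fires only then.
  constexpr mini_poly (unsigned short c0, unsigned short c1)
    : coeffs{c0, c1}
  {
    static_assert (N >= 2, "two coefficients need N >= 2");
  }
};

// Fine in the two-coefficient case.
static_assert (mini_poly<2> (1, 1).coeffs[1] == 1, "");

// Fine with one coefficient as long as only the constant form is used.
static_assert (mini_poly<1> (1).coeffs[0] == 1, "");

// Writing mini_poly<1> (1, 1) would be rejected at compile time, which
// mirrors the concern about emitting "poly_uint16 (1, 1)" unconditionally.
```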
> >>>>
> >>>>> + printf (" if (known_lt (mode_precision[E_%smode], "
> >>>>> + "size_one * BITS_PER_UNIT))\n", m->name);
> >>>>> + printf (" mode_size[E_%smode] = size_one;\n", m->name);
> >>>>> + printf (" else\n");
> >>>>> + printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
> >>>>
> >>>> Now that the assert implicit in the original exact_div no longer holds,
> >>>> I think we should instead generalise it to can_div_away_from_zero_p
> >>>> (which will involve defining a new overload of can_div_away_from_zero_p).
> >>>> I think that will give the same result as the code above for the cases
> >>>> that the code above handles. But it should be more general too.
> >>>>
> >>>> TBH, I'm still sceptical that this is all that is needed. It seems
> >>>> unlikely that we've been so good at writing vector support code that
> >>>> we've made it work for precision < bitsize, despite that being an
> >>>> unsupported combination until now. But I guess we can fix problems
> >>>> on a case-by-case basis.
> >>>>
> >>>> Thanks,
> >>>> Richard
> >>>>
> >>>>> " BITS_PER_UNIT);\n", m->name, m->name);
> >>>>> printf (" mode_nunits[E_%smode] = ps;\n", m->name);
> >>>>> printf (" adjust_mode_mask (E_%smode);\n", m->name);
> >>>>> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
> >>>>> printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
> >>>>> a->file, a->line, a->mode->name, a->adjustment);
> >>>>>
> >>>>> + /* Adjust precision to the actual bits size. */
> >>>>> + for (a = adj_precision; a; a = a->next)
> >>>>> + switch (a->mode->cl)
> >>>>> + {
> >>>>> + case MODE_VECTOR_BOOL:
> >>>>> + printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
> >>>>> + a->adjustment);
> >>>>> + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
> >>>>> + break;
> >>>>> + default:
> >>>>> + break;
> >>>>> + }
> >>>>> +
> >>>>> puts ("}");
> >>>>> }
> >>>>>
> >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> >>>>> new file mode 100644
> >>>>> index 00000000000..e70960c5b6d
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> >>>>> @@ -0,0 +1,68 @@
> >>>>> +/* { dg-do compile } */
> >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >>>>> +
> >>>>> +#include "riscv_vector.h"
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool1_t v1 = *(vbool1_t*)in;
> >>>>> + vbool2_t v2 = *(vbool2_t*)in;
> >>>>> +
> >>>>> + *(vbool1_t*)(out + 100) = v1;
> >>>>> + *(vbool2_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool1_t v1 = *(vbool1_t*)in;
> >>>>> + vbool4_t v2 = *(vbool4_t*)in;
> >>>>> +
> >>>>> + *(vbool1_t*)(out + 100) = v1;
> >>>>> + *(vbool4_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool1_t v1 = *(vbool1_t*)in;
> >>>>> + vbool8_t v2 = *(vbool8_t*)in;
> >>>>> +
> >>>>> + *(vbool1_t*)(out + 100) = v1;
> >>>>> + *(vbool8_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool1_t v1 = *(vbool1_t*)in;
> >>>>> + vbool16_t v2 = *(vbool16_t*)in;
> >>>>> +
> >>>>> + *(vbool1_t*)(out + 100) = v1;
> >>>>> + *(vbool16_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool1_t v1 = *(vbool1_t*)in;
> >>>>> + vbool32_t v2 = *(vbool32_t*)in;
> >>>>> +
> >>>>> + *(vbool1_t*)(out + 100) = v1;
> >>>>> + *(vbool32_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool1_t v1 = *(vbool1_t*)in;
> >>>>> + vbool64_t v2 = *(vbool64_t*)in;
> >>>>> +
> >>>>> + *(vbool1_t*)(out + 100) = v1;
> >>>>> + *(vbool64_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
> >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> >>>>> new file mode 100644
> >>>>> index 00000000000..dcc7a644a88
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> >>>>> @@ -0,0 +1,68 @@
> >>>>> +/* { dg-do compile } */
> >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >>>>> +
> >>>>> +#include "riscv_vector.h"
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool2_t v1 = *(vbool2_t*)in;
> >>>>> + vbool1_t v2 = *(vbool1_t*)in;
> >>>>> +
> >>>>> + *(vbool2_t*)(out + 100) = v1;
> >>>>> + *(vbool1_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool2_t v1 = *(vbool2_t*)in;
> >>>>> + vbool4_t v2 = *(vbool4_t*)in;
> >>>>> +
> >>>>> + *(vbool2_t*)(out + 100) = v1;
> >>>>> + *(vbool4_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool2_t v1 = *(vbool2_t*)in;
> >>>>> + vbool8_t v2 = *(vbool8_t*)in;
> >>>>> +
> >>>>> + *(vbool2_t*)(out + 100) = v1;
> >>>>> + *(vbool8_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool2_t v1 = *(vbool2_t*)in;
> >>>>> + vbool16_t v2 = *(vbool16_t*)in;
> >>>>> +
> >>>>> + *(vbool2_t*)(out + 100) = v1;
> >>>>> + *(vbool16_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool2_t v1 = *(vbool2_t*)in;
> >>>>> + vbool32_t v2 = *(vbool32_t*)in;
> >>>>> +
> >>>>> + *(vbool2_t*)(out + 100) = v1;
> >>>>> + *(vbool32_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool2_t v1 = *(vbool2_t*)in;
> >>>>> + vbool64_t v2 = *(vbool64_t*)in;
> >>>>> +
> >>>>> + *(vbool2_t*)(out + 100) = v1;
> >>>>> + *(vbool64_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
> >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> >>>>> new file mode 100644
> >>>>> index 00000000000..3af0513e006
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> >>>>> @@ -0,0 +1,68 @@
> >>>>> +/* { dg-do compile } */
> >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >>>>> +
> >>>>> +#include "riscv_vector.h"
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool4_t v1 = *(vbool4_t*)in;
> >>>>> + vbool1_t v2 = *(vbool1_t*)in;
> >>>>> +
> >>>>> + *(vbool4_t*)(out + 100) = v1;
> >>>>> + *(vbool1_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool4_t v1 = *(vbool4_t*)in;
> >>>>> + vbool2_t v2 = *(vbool2_t*)in;
> >>>>> +
> >>>>> + *(vbool4_t*)(out + 100) = v1;
> >>>>> + *(vbool2_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool4_t v1 = *(vbool4_t*)in;
> >>>>> + vbool8_t v2 = *(vbool8_t*)in;
> >>>>> +
> >>>>> + *(vbool4_t*)(out + 100) = v1;
> >>>>> + *(vbool8_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool4_t v1 = *(vbool4_t*)in;
> >>>>> + vbool16_t v2 = *(vbool16_t*)in;
> >>>>> +
> >>>>> + *(vbool4_t*)(out + 100) = v1;
> >>>>> + *(vbool16_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool4_t v1 = *(vbool4_t*)in;
> >>>>> + vbool32_t v2 = *(vbool32_t*)in;
> >>>>> +
> >>>>> + *(vbool4_t*)(out + 100) = v1;
> >>>>> + *(vbool32_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool4_t v1 = *(vbool4_t*)in;
> >>>>> + vbool64_t v2 = *(vbool64_t*)in;
> >>>>> +
> >>>>> + *(vbool4_t*)(out + 100) = v1;
> >>>>> + *(vbool64_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
> >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> >>>>> new file mode 100644
> >>>>> index 00000000000..ea3c360d756
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> >>>>> @@ -0,0 +1,68 @@
> >>>>> +/* { dg-do compile } */
> >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >>>>> +
> >>>>> +#include "riscv_vector.h"
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool8_t v1 = *(vbool8_t*)in;
> >>>>> + vbool1_t v2 = *(vbool1_t*)in;
> >>>>> +
> >>>>> + *(vbool8_t*)(out + 100) = v1;
> >>>>> + *(vbool1_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool8_t v1 = *(vbool8_t*)in;
> >>>>> + vbool2_t v2 = *(vbool2_t*)in;
> >>>>> +
> >>>>> + *(vbool8_t*)(out + 100) = v1;
> >>>>> + *(vbool2_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool8_t v1 = *(vbool8_t*)in;
> >>>>> + vbool4_t v2 = *(vbool4_t*)in;
> >>>>> +
> >>>>> + *(vbool8_t*)(out + 100) = v1;
> >>>>> + *(vbool4_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool8_t v1 = *(vbool8_t*)in;
> >>>>> + vbool16_t v2 = *(vbool16_t*)in;
> >>>>> +
> >>>>> + *(vbool8_t*)(out + 100) = v1;
> >>>>> + *(vbool16_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool8_t v1 = *(vbool8_t*)in;
> >>>>> + vbool32_t v2 = *(vbool32_t*)in;
> >>>>> +
> >>>>> + *(vbool8_t*)(out + 100) = v1;
> >>>>> + *(vbool32_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool8_t v1 = *(vbool8_t*)in;
> >>>>> + vbool64_t v2 = *(vbool64_t*)in;
> >>>>> +
> >>>>> + *(vbool8_t*)(out + 100) = v1;
> >>>>> + *(vbool64_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
> >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> >>>>> new file mode 100644
> >>>>> index 00000000000..9fc659d2402
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> >>>>> @@ -0,0 +1,68 @@
> >>>>> +/* { dg-do compile } */
> >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >>>>> +
> >>>>> +#include "riscv_vector.h"
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool16_t v1 = *(vbool16_t*)in;
> >>>>> + vbool1_t v2 = *(vbool1_t*)in;
> >>>>> +
> >>>>> + *(vbool16_t*)(out + 100) = v1;
> >>>>> + *(vbool1_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool16_t v1 = *(vbool16_t*)in;
> >>>>> + vbool2_t v2 = *(vbool2_t*)in;
> >>>>> +
> >>>>> + *(vbool16_t*)(out + 100) = v1;
> >>>>> + *(vbool2_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool16_t v1 = *(vbool16_t*)in;
> >>>>> + vbool4_t v2 = *(vbool4_t*)in;
> >>>>> +
> >>>>> + *(vbool16_t*)(out + 100) = v1;
> >>>>> + *(vbool4_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool16_t v1 = *(vbool16_t*)in;
> >>>>> + vbool8_t v2 = *(vbool8_t*)in;
> >>>>> +
> >>>>> + *(vbool16_t*)(out + 100) = v1;
> >>>>> + *(vbool8_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool16_t v1 = *(vbool16_t*)in;
> >>>>> + vbool32_t v2 = *(vbool32_t*)in;
> >>>>> +
> >>>>> + *(vbool16_t*)(out + 100) = v1;
> >>>>> + *(vbool32_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool16_t v1 = *(vbool16_t*)in;
> >>>>> + vbool64_t v2 = *(vbool64_t*)in;
> >>>>> +
> >>>>> + *(vbool16_t*)(out + 100) = v1;
> >>>>> + *(vbool64_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> >>>>> new file mode 100644
> >>>>> index 00000000000..98275e5267d
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> >>>>> @@ -0,0 +1,68 @@
> >>>>> +/* { dg-do compile } */
> >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >>>>> +
> >>>>> +#include "riscv_vector.h"
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool32_t v1 = *(vbool32_t*)in;
> >>>>> + vbool1_t v2 = *(vbool1_t*)in;
> >>>>> +
> >>>>> + *(vbool32_t*)(out + 100) = v1;
> >>>>> + *(vbool1_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool32_t v1 = *(vbool32_t*)in;
> >>>>> + vbool2_t v2 = *(vbool2_t*)in;
> >>>>> +
> >>>>> + *(vbool32_t*)(out + 100) = v1;
> >>>>> + *(vbool2_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool32_t v1 = *(vbool32_t*)in;
> >>>>> + vbool4_t v2 = *(vbool4_t*)in;
> >>>>> +
> >>>>> + *(vbool32_t*)(out + 100) = v1;
> >>>>> + *(vbool4_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool32_t v1 = *(vbool32_t*)in;
> >>>>> + vbool8_t v2 = *(vbool8_t*)in;
> >>>>> +
> >>>>> + *(vbool32_t*)(out + 100) = v1;
> >>>>> + *(vbool8_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool32_t v1 = *(vbool32_t*)in;
> >>>>> + vbool16_t v2 = *(vbool16_t*)in;
> >>>>> +
> >>>>> + *(vbool32_t*)(out + 100) = v1;
> >>>>> + *(vbool16_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool32_t v1 = *(vbool32_t*)in;
> >>>>> + vbool64_t v2 = *(vbool64_t*)in;
> >>>>> +
> >>>>> + *(vbool32_t*)(out + 100) = v1;
> >>>>> + *(vbool64_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
> >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> >>>>> new file mode 100644
> >>>>> index 00000000000..8f6f0b11f09
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> >>>>> @@ -0,0 +1,68 @@
> >>>>> +/* { dg-do compile } */
> >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >>>>> +
> >>>>> +#include "riscv_vector.h"
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool64_t v1 = *(vbool64_t*)in;
> >>>>> + vbool1_t v2 = *(vbool1_t*)in;
> >>>>> +
> >>>>> + *(vbool64_t*)(out + 100) = v1;
> >>>>> + *(vbool1_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool64_t v1 = *(vbool64_t*)in;
> >>>>> + vbool2_t v2 = *(vbool2_t*)in;
> >>>>> +
> >>>>> + *(vbool64_t*)(out + 100) = v1;
> >>>>> + *(vbool2_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool64_t v1 = *(vbool64_t*)in;
> >>>>> + vbool4_t v2 = *(vbool4_t*)in;
> >>>>> +
> >>>>> + *(vbool64_t*)(out + 100) = v1;
> >>>>> + *(vbool4_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool64_t v1 = *(vbool64_t*)in;
> >>>>> + vbool8_t v2 = *(vbool8_t*)in;
> >>>>> +
> >>>>> + *(vbool64_t*)(out + 100) = v1;
> >>>>> + *(vbool8_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool64_t v1 = *(vbool64_t*)in;
> >>>>> + vbool16_t v2 = *(vbool16_t*)in;
> >>>>> +
> >>>>> + *(vbool64_t*)(out + 100) = v1;
> >>>>> + *(vbool16_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool64_t v1 = *(vbool64_t*)in;
> >>>>> + vbool32_t v2 = *(vbool32_t*)in;
> >>>>> +
> >>>>> + *(vbool64_t*)(out + 100) = v1;
> >>>>> + *(vbool32_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >>>>> new file mode 100644
> >>>>> index 00000000000..d96959dd064
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> >>>>> @@ -0,0 +1,77 @@
> >>>>> +/* { dg-do compile } */
> >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> >>>>> +
> >>>>> +#include "riscv_vector.h"
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool1_t v1 = *(vbool1_t*)in;
> >>>>> + vbool1_t v2 = *(vbool1_t*)in;
> >>>>> +
> >>>>> + *(vbool1_t*)(out + 100) = v1;
> >>>>> + *(vbool1_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool2_t v1 = *(vbool2_t*)in;
> >>>>> + vbool2_t v2 = *(vbool2_t*)in;
> >>>>> +
> >>>>> + *(vbool2_t*)(out + 100) = v1;
> >>>>> + *(vbool2_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool4_t v1 = *(vbool4_t*)in;
> >>>>> + vbool4_t v2 = *(vbool4_t*)in;
> >>>>> +
> >>>>> + *(vbool4_t*)(out + 100) = v1;
> >>>>> + *(vbool4_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool8_t v1 = *(vbool8_t*)in;
> >>>>> + vbool8_t v2 = *(vbool8_t*)in;
> >>>>> +
> >>>>> + *(vbool8_t*)(out + 100) = v1;
> >>>>> + *(vbool8_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool16_t v1 = *(vbool16_t*)in;
> >>>>> + vbool16_t v2 = *(vbool16_t*)in;
> >>>>> +
> >>>>> + *(vbool16_t*)(out + 100) = v1;
> >>>>> + *(vbool16_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool32_t v1 = *(vbool32_t*)in;
> >>>>> + vbool32_t v2 = *(vbool32_t*)in;
> >>>>> +
> >>>>> + *(vbool32_t*)(out + 100) = v1;
> >>>>> + *(vbool32_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> >>>>> + vbool64_t v1 = *(vbool64_t*)in;
> >>>>> + vbool64_t v2 = *(vbool64_t*)in;
> >>>>> +
> >>>>> + *(vbool64_t*)(out + 100) = v1;
> >>>>> + *(vbool64_t*)(out + 200) = v2;
> >>>>> +}
> >>>>> +
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
> >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
On Wed, 1 Mar 2023, Pan Li wrote:
> I am not very familiar with the memory access patterns; maybe juzhe can provide more information or correct me if anything here is misleading.
>
> The distinct precisions are meant to resolve the bug below: the second
> vlm.v (which loads a different number of bytes than the first one) is
> eliminated because vbool8 and vbool16 have the same precision,
> namely [8, 8].
That's because the corresponding data vectors to vbool8 and vbool16
have the same number of lanes, right? (another RVV peculiarity)
> vbool8_t v2 = *(vbool8_t*)in;
> vbool16_t v5 = *(vbool16_t*)in;
> *(vbool16_t*)(out + 200) = v5;
> *(vbool8_t*)(out + 100) = v2;
>
> addi a4,a1,100
> vsetvli a5,zero,e8,m1,ta,ma
> addi a1,a1,200
> vlm.v v24,0(a0)
> vsm.v v24,0(a4)
> // Need one vsetvli and vlm.v for correctness here.
> vsm.v v24,0(a1)
>
> Pan
> ________________________________
> From: Richard Biener <rguenther@suse.de>
> Sent: Wednesday, March 1, 2023 20:33
> To: Richard Sandiford <richard.sandiford@arm.com>
> Cc: 盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org>; 盼 李 <incarnation.p.lee@outlook.com>; juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>; pan2.li <pan2.li@intel.com>; Kito.cheng <kito.cheng@sifive.com>
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
> On Wed, 1 Mar 2023, Richard Sandiford wrote:
>
> > 盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > I just ran a test with the code below; the [0x4, 0x4] case comes from VNx4BI. Note that the mode size is unchanged.
> > >
> > > printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
> > > "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
> > >
> > > VNx4BI Before precision [0x4, 0x4], size [0x4, 0]
> > > VNx4BI After precision [0x4, 0x4], size [0x4, 0]
> >
> > Yeah, the result is expected to be unchanged if the division fails.
> > That's a deliberate part of the interface. The can_* functions
> > should never be used without testing the boolean return value.
> >
> > But this precision of [4,4] for VNx4BI is different from what you
> > listed below. Like I say, if the precision really is [4,4], and if
> > the size really is ceil([4,4]/8), then I don't think we can represent
> > that with current infrastructure.
>
> The size of VNx4BI is (4*N + 7) / 8 bytes. I suppose we could simply
> not store the size in bytes but only the size in bits then?
>
> I see the problem, but I also don't see a good solution since
> for VNx4BI with N == 3 we have one and a half byte of storage.
>
> How do memory access patterns work with poly-int sizes?
>
> >
> > Thanks,
> > Richard
> >
> > >
> > > Pan
> > > ________________________________
> > > From: Richard Sandiford <richard.sandiford@arm.com>
> > > Sent: Wednesday, March 1, 2023 19:11
> > > To: 盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org>
> > > Cc: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>; pan2.li <pan2.li@intel.com>; 盼 李 <incarnation.p.lee@outlook.com>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>
> > > Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> > >
> > > 盼 李 via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > >> Thank you all for your quick response.
> > >>
> > >> As juzhe mentioned, RISC-V mask memory accesses are always aligned to a byte boundary in the compact layout, i.e. ceil(vl / 8) bytes for vbool*.
> > >
> > > OK, thanks to both of you. This is what I'd have expected.
> > >
> > > In that case, I think both the can_div_away_from_zero_p and the
> > > original patch (using size_one) will give the wrong results.
> > > There isn't a way of representing ceil([4,4]/8) as a poly_int.
> > > The result is (4+4X)/8 when X is odd and (8+4X)/8 when X is even.
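The point about ceil([4,4]/8) can be checked with a small sketch. It assumes a two-coefficient polynomial c0 + c1*X as a stand-in for a poly_int; the helper names (eval_poly, ceil_div, mask_bytes, is_linear_up_to) are hypothetical and are not GCC functions. A linear function is fixed by its values at X = 0 and X = 1, so the sketch derives that candidate and tests whether it still matches at larger X.

```cpp
#include <cstdint>

// Value of the two-coefficient polynomial [c0, c1] at runtime parameter x.
constexpr std::int64_t eval_poly(std::int64_t c0, std::int64_t c1, std::int64_t x) {
    return c0 + c1 * x;
}

// Ceiling division for non-negative a and positive b.
constexpr std::int64_t ceil_div(std::int64_t a, std::int64_t b) {
    return (a + b - 1) / b;
}

// Bytes needed for a mask whose precision is [4, 4], i.e. 4 + 4*x bits.
constexpr std::int64_t mask_bytes(std::int64_t x) {
    return ceil_div(eval_poly(4, 4, x), 8);
}

// Fit a + b*x through the points x = 0 and x = 1, then check whether the
// same line reproduces mask_bytes(x) for every x in [0, max_x].
bool is_linear_up_to(std::int64_t max_x) {
    const std::int64_t a = mask_bytes(0);
    const std::int64_t b = mask_bytes(1) - mask_bytes(0);
    for (std::int64_t x = 0; x <= max_x; ++x)
        if (eval_poly(a, b, x) != mask_bytes(x))
            return false;
    return true;
}
```

With these definitions the byte count is 1 for X = 0 and X = 1 but 2 for X = 2, so the only candidate line already fails at X = 2. That matches the odd/even split described above.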
> > >
> > >> Actually, the [4,4] value comes from the self-test; the RISC-V mode precisions are as below.
> > >>
> > >> VNx64BI precision [0x40, 0x40].
> > >> VNx32BI precision [0x20, 0x20].
> > >> VNx16BI precision [0x10, 0x10].
> > >> VNx8BI precision [0x8, 0x8].
> > >> VNx4BI precision [0x8, 0x8].
> > >> VNx2BI precision [0x8, 0x8].
> > >> VNx1BI precision [0x8, 0x8].
> > >
> > > Ah, OK. Which self-test causes this?
> > >
> > > Richard
> > >
> > >> The [4, 4] value affects the genmodes part: we cannot write the code below, because gcc_unreachable would be hit.
> > >>
> > >> if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT, &mode_size[E_%smode]))
> > >> gcc_unreachable (); // Hit on [4, 4] of the self-test.
> > >>
> > >> Pan
> > >> ________________________________
> > >> From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
> > >> Sent: Wednesday, March 1, 2023 18:46
> > >> To: richard.sandiford <richard.sandiford@arm.com>; pan2.li <pan2.li@intel.com>
> > >> Cc: incarnation.p.lee <incarnation.p.lee@outlook.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Kito.cheng <kito.cheng@sifive.com>; rguenther <rguenther@suse.de>
> > >> Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> > >>
> > >>>> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
> > >>>>bits, even when that means accessing fully-unused bytes? E.g. 4+4X
> > >>>>when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
> > >>>>8+8X would be 32 bits/4 bytes. So a store of [8,8] for a precision
> > >>>>of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
> > >>
> > >> Hi, Richard. Thank you for helping us.
> > >> My understanding of RVV ISA:
> > >>
> > >> In RVV, we have mask modes (VNx1BI, VNx2BI, etc.) and data modes (VNx1QI, VNx4QI, VNx2DI, etc.).
> > >> For data modes, we access the data fully and have no unused bytes, so we don't need to adjust the precision.
> > >> However, for mask modes we access the mask bits in a compact layout (the mask bits for consecutive elements are stored consecutively).
> > >> For example, in the current configuration, VNx1BI, VNx2BI, VNx4BI and VNx8BI have the same byte size (1,1) but different bit sizes.
> > >>
> > >> VNx8BI is accessed fully, but only 1/2 of VNx4BI, 1/4 of VNx2BI and 1/8 of VNx1BI is accessed, subject to byte alignment (I am not sure whether RVV supports bit-aligned accesses; I guess it does not).
> > >>
> > >> If VNx8BI occupies only 1 byte (depending on the machine vector length), then VNx2BI, VNx4BI and VNx1BI are 2/8, 4/8 and 1/8 of a byte. I think we can't access memory with bit alignment, so they will all be the same in the access.
> > >> However, if VNx8BI occupies 8 bytes, then VNx2BI, VNx4BI and VNx1BI are 2 bytes, 4 bytes and 1 byte. They access different sizes.
> > >>
> > >> This is my understanding of the RVV ISA; feel free to correct me.
> > >> Thanks.
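The arithmetic in the message above can be sketched as follows. This is only a model of the description given there, assuming that VNx<k>BI holds k/8 as many mask bits as VNx8BI and that every access is rounded up to whole bytes; mask_mode_bytes is a hypothetical helper, not part of GCC or of the RVV intrinsics.

```cpp
#include <cstdint>

// Hypothetical model: given the byte size of VNx8BI, return the number of
// bytes touched by a VNx<k>BI access when mask bits are stored compactly
// and accesses are byte-aligned.
constexpr std::uint64_t mask_mode_bytes(std::uint64_t vnx8bi_bytes, std::uint64_t k) {
    // VNx8BI has vnx8bi_bytes * 8 bits; VNx<k>BI has k/8 of that.
    const std::uint64_t bits = vnx8bi_bytes * k;
    // Round up to whole bytes, since sub-byte accesses are assumed impossible.
    return (bits + 7) / 8;
}
```

When VNx8BI is 1 byte, k = 1, 2 and 4 all round up to 1 byte, so the accesses are indistinguishable. When VNx8BI is 8 bytes, they become 1, 2 and 4 bytes respectively.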
> > >>
> > >> ________________________________
> > >> juzhe.zhong@rivai.ai
> > >>
> > >> From: Richard Sandiford <richard.sandiford@arm.com>
> > >> Date: 2023-03-01 18:11
> > >> To: Li, Pan2 <pan2.li@intel.com>
> > >> CC: 盼 李 <incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
> > >> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> > >> "Li, Pan2" <pan2.li@intel.com> writes:
> > >>> Hi Richard Sandiford,
> > >>>
> > >>> I just tried the overload for constant divisors with the printf below, and it works as you described.
> > >>>
> > >>> printf (" can_div_away_from_zero_p (mode_precision[E_%smode], "
> > >>> "BITS_PER_UNIT, &mode_size[E_%smode]);\n", m->name, m->name);
> > >>>
> > >>> template<unsigned int N, typename Ca, typename Cb, typename Cq>
> > >>> inline typename if_nonpoly<Cb, bool>::type
> > >>> can_div_away_from_zero_p (const poly_int_pod<N, Ca> &a,
> > >>> Cb b,
> > >>> poly_int_pod<N, Cq> *quotient)
> > >>> {
> > >>> if (!can_div_trunc_p (a, b, quotient))
> > >>> return false;
> > >>> if (maybe_ne (*quotient * b, a))
> > >>> for (unsigned int i = 0; i < N; ++i)
> > >>> quotient->coeffs[i] += (quotient->coeffs[i] < 0 ? -1 : 1);
> > >>> return true;
> > >>> }
> > >>>
> > >>> But I have a question about the case below.
> > >>>
> > >>> Assume:
> > >>> a = [4, 4], b = 8.
> > >>>
> > >>> When can_div_trunc_p is called, it checks whether the remainder is constant, i.e. a.coeffs[i] % 8 == 0 for i >= 1. If the remainder is not constant, can_div_trunc_p leaves the quotient untouched and returns false.
> > >>>
> > >>> Thus, when a = [4, 4], can_div_away_from_zero_p leaves *quotient unchanged, i.e. mode_size[E_%smode] is unchanged for this case. However, the underlying mode_size will adjust it to the real byte size, and I am not sure whether that is by design or requires additional handling.
> > >>
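The failure mode described above can be reproduced with a toy model. The sketch below is not GCC's poly-int.h: Poly2 and the toy_* functions are hypothetical stand-ins for a two-coefficient poly_int and the constant-divisor routines, and the divisor is assumed to be positive. Unlike the overload quoted above, the sketch rounds only the constant coefficient, because the other coefficient is already exact once the truncating step succeeds; that is a choice made for this illustration.

```cpp
#include <cstdint>

// Toy stand-in for a two-coefficient poly_int: coeffs[0] + coeffs[1] * X.
struct Poly2 {
    std::int64_t coeffs[2];
};

// Truncating division by a positive constant b. Returns false and leaves
// *quotient untouched unless the non-constant coefficient divides exactly,
// i.e. unless the remainder is a compile-time constant.
bool toy_can_div_trunc_p(const Poly2 &a, std::int64_t b, Poly2 *quotient) {
    if (a.coeffs[1] % b != 0)
        return false;
    quotient->coeffs[0] = a.coeffs[0] / b;
    quotient->coeffs[1] = a.coeffs[1] / b;
    return true;
}

// Division by a positive constant b, rounding the constant term away from
// zero. The caller must test the return value before using *quotient.
bool toy_can_div_away_from_zero_p(const Poly2 &a, std::int64_t b, Poly2 *quotient) {
    if (!toy_can_div_trunc_p(a, b, quotient))
        return false;
    if (a.coeffs[0] % b != 0)
        quotient->coeffs[0] += (a.coeffs[0] < 0 ? -1 : 1);
    return true;
}
```

For a = [4, 4] and b = 8 the call returns false and the output argument keeps whatever value it had before, which is why the generated mode_size stayed unchanged in the experiment. For a = [4, 8] the call succeeds with quotient [1, 1].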
> > >> Is it right that, for RVV, a load or store of [4,4] will access [8,8]
> > >> bits, even when that means accessing fully-unused bytes? E.g. 4+4X
> > >> when X=3 would be 16 bits/2 bytes of useful data, but a bitsize of
> > >> 8+8X would be 32 bits/4 bytes. So a store of [8,8] for a precision
> > >> of [4,4] would store 2 bytes beyond the end of the useful data when X==3?
> > >>
> > >> Richard
> > >>
> > >>> Pan
> > >>>
> > >>> From: 盼 李 <incarnation.p.lee@outlook.com>
> > >>> Sent: Tuesday, February 28, 2023 5:59 PM
> > >>> To: Richard Sandiford <richard.sandiford@arm.com>; Li, Pan2 <pan2.li@intel.com>
> > >>> Cc: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
> > >>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> > >>>
> > >>> Understood, thanks for the explanations and suggestions. Let me have a try and keep you posted.
> > >>>
> > >>> Pan
> > >>> ________________________________
> > >>> From: Richard Sandiford <richard.sandiford@arm.com>
> > >>> Sent: Tuesday, February 28, 2023 17:50
> > >>> To: Li, Pan2 <pan2.li@intel.com>
> > >>> Cc: 盼 李 <incarnation.p.lee@outlook.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de
> > >>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> > >>>
> > >>> "Li, Pan2" <pan2.li@intel.com> writes:
> > >>>> Hi Richard Sandiford,
> > >>>>
> > >>>> After some investigation, I am not sure if it is possible to make it general without any changes to exact_div. We can add one method like below to get the unit poly for all possible N.
> > >>>>
> > >>>> template<unsigned int N, typename Ca>
> > >>>> inline POLY_CONST_RESULT (N, Ca, Ca)
> > >>>> normalize_to_unit (const poly_int_pod<N, Ca> &a)
> > >>>> {
> > >>>> typedef POLY_CONST_COEFF (Ca, Ca) C;
> > >>>>
> > >>>> poly_int<N, C> normalized = a;
> > >>>>
> > >>>> if (normalized.is_constant())
> > >>>> normalized.coeffs[0] = 1;
> > >>>> else
> > >>>> for (unsigned int i = 0; i < N; i++)
> > >>>> POLY_SET_COEFF (C, normalized, i, 1);
> > >>>>
> > >>>> return normalized;
> > >>>> }
> > >>>>
> > >>>> And then adjust the genmodes like below to consume the unit poly.
> > >>>>
> > >>>> printf (" poly_uint16 unit_poly = "
> > >>>> "normalize_to_unit (mode_precision[E_%smode]);\n", m->name);
> > >>>> printf (" if (known_lt (mode_precision[E_%smode], "
> > >>>> "unit_poly * BITS_PER_UNIT))\n", m->name);
> > >>>> printf (" mode_size[E_%smode] = unit_poly;\n", m->name);
> > >>>>
> > >>>> I am not sure whether it is a good idea to introduce the above normalization code into exact_div, given that the comment on exact_div says “/* Return A / B, given that A is known to be a multiple of B. */”.
> > >>>
> > >>> My point was that we have multiple ways of dividing poly_ints:
> > >>>
> > >>> - exact_div, for when the caller knows that the result is always exact
> > >>> - can_div_trunc_p, for truncating division (round towards 0)
> > >>> - can_div_away_from_zero_p, for rounding away from 0
> > >>> - ...
> > >>>
> > >>> This is like how we have multiple division *_EXPRs on trees.
> > >>>
> > >>> Until now, exact_div was the correct choice for modes because vector
> > >>> modes didn't have padding. We're now changing that, so my suggestion
> > >>> in the review was to change the division operation that we use.
> > >>> Rather than use exact_div, we should now use can_div_away_from_zero_p,
> > >>> which would have the effect of rounding the quotient up.
> > >>>
> > >>> Something like:
> > >>>
> > >>> if (!can_div_away_from_zero_p (mode_precision[E_%smode], BITS_PER_UNIT,
> > >>> &mode_size[E_%smode]))
> > >>> gcc_unreachable ();
> > >>>
> > >>> But this will require a new overload of can_div_away_from_zero_p, since
> > >>> the existing one is for constant quotients rather than constant divisors.
> > >>>
> > >>> Thanks,
> > >>> Richard
> > >>>
> > >>>>
> > >>>> Could you please share your opinion on this from an expert's perspective? Thank you!
> > >>>>
> > >>>> Pan
> > >>>>
> > >>>> From: 盼 李 <incarnation.p.lee@outlook.com>
> > >>>> Sent: Monday, February 27, 2023 11:13 PM
> > >>>> To: Richard Sandiford <richard.sandiford@arm.com>; incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
> > >>>> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Li, Pan2 <pan2.li@intel.com>
> > >>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> > >>>>
> > >>>> No problem, I hope you had a good holiday.
> > >>>>
> > >>>> Thanks for pointing this out: the if part cannot handle a poly_int with N > 2. As I understand it, we need to make it general for every N of poly_int.
> > >>>>
> > >>>> So I would like to confirm with you how to make it general. I suppose there will be a new function can_div_away_from_zero_p to replace the if (known_lt (,)) part in genmodes.cc, leaving exact_div unchanged (given the word "exact", I suppose we should not touch it), right? Then we would still need a poly_int with all N coefficients set to 1 as the result when can_div_away_from_zero_p returns true.
> > >>>>
> > >>>> Thanks again for your professional suggestion, and have a nice day!
> > >>>>
> > >>>> Pan
> > >>>> ________________________________
> > >>>> From: Richard Sandiford <richard.sandiford@arm.com>
> > >>>> Sent: Monday, February 27, 2023 22:24
> > >>>> To: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
> > >>>> Cc: incarnation.p.lee@outlook.com; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; pan2.li@intel.com
> > >>>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> > >>>>
> > >>>> Sorry for the slow reply, been away for a couple of weeks.
> > >>>>
> > >>>> "incarnation.p.lee--- via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> > >>>>> From: Pan Li <pan2.li@intel.com>
> > >>>>>
> > >>>>> Fix the rvv bool mode precision bug by adjusting the precision.
> > >>>>> The bit sizes of vbool*_t will be adjusted to
> > >>>>> [1, 2, 4, 8, 16, 32, 64] according to the RVV 1.0 ISA spec. The
> > >>>>> adjusted mode precision of vbool*_t will help the underlying passes
> > >>>>> make the right decisions for both correctness and optimization.
> > >>>>>
> > >>>>> Given below sample code:
> > >>>>> void test_1(int8_t * restrict in, int8_t * restrict out)
> > >>>>> {
> > >>>>> vbool8_t v2 = *(vbool8_t*)in;
> > >>>>> vbool16_t v5 = *(vbool16_t*)in;
> > >>>>> *(vbool16_t*)(out + 200) = v5;
> > >>>>> *(vbool8_t*)(out + 100) = v2;
> > >>>>> }
> > >>>>>
> > >>>>> Before the precision adjustment:
> > >>>>> addi a4,a1,100
> > >>>>> vsetvli a5,zero,e8,m1,ta,ma
> > >>>>> addi a1,a1,200
> > >>>>> vlm.v v24,0(a0)
> > >>>>> vsm.v v24,0(a4)
> > >>>>> // Need one vsetvli and vlm.v for correctness here.
> > >>>>> vsm.v v24,0(a1)
> > >>>>>
> > >>>>> After the precision adjustment:
> > >>>>> csrr t0,vlenb
> > >>>>> slli t1,t0,1
> > >>>>> csrr a3,vlenb
> > >>>>> sub sp,sp,t1
> > >>>>> slli a4,a3,1
> > >>>>> add a4,a4,sp
> > >>>>> sub a3,a4,a3
> > >>>>> vsetvli a5,zero,e8,m1,ta,ma
> > >>>>> addi a2,a1,200
> > >>>>> vlm.v v24,0(a0)
> > >>>>> vsm.v v24,0(a3)
> > >>>>> addi a1,a1,100
> > >>>>> vsetvli a4,zero,e8,mf2,ta,ma
> > >>>>> csrr t0,vlenb
> > >>>>> vlm.v v25,0(a3)
> > >>>>> vsm.v v25,0(a2)
> > >>>>> slli t1,t0,1
> > >>>>> vsetvli a5,zero,e8,m1,ta,ma
> > >>>>> vsm.v v24,0(a1)
> > >>>>> add sp,sp,t1
> > >>>>> jr ra
> > >>>>>
> > >>>>> However, there may be some optimization opportunities after
> > >>>>> the mode precision adjustment. They can be taken care of in
> > >>>>> the RISC-V backend in separate follow-up PR(s).
> > >>>>>
> > >>>>> PR 108185
> > >>>>> PR 108654
> > >>>>>
> > >>>>> gcc/ChangeLog:
> > >>>>>
> > >>>>> * config/riscv/riscv-modes.def (ADJUST_PRECISION):
> > >>>>> * config/riscv/riscv.cc (riscv_v_adjust_precision):
> > >>>>> * config/riscv/riscv.h (riscv_v_adjust_precision):
> > >>>>> * genmodes.cc (ADJUST_PRECISION):
> > >>>>> (emit_mode_adjustments):
> > >>>>>
> > >>>>> gcc/testsuite/ChangeLog:
> > >>>>>
> > >>>>> * gcc.target/riscv/pr108185-1.c: New test.
> > >>>>> * gcc.target/riscv/pr108185-2.c: New test.
> > >>>>> * gcc.target/riscv/pr108185-3.c: New test.
> > >>>>> * gcc.target/riscv/pr108185-4.c: New test.
> > >>>>> * gcc.target/riscv/pr108185-5.c: New test.
> > >>>>> * gcc.target/riscv/pr108185-6.c: New test.
> > >>>>> * gcc.target/riscv/pr108185-7.c: New test.
> > >>>>> * gcc.target/riscv/pr108185-8.c: New test.
> > >>>>>
> > >>>>> Signed-off-by: Pan Li <pan2.li@intel.com>
> > >>>>> ---
> > >>>>> gcc/config/riscv/riscv-modes.def | 8 +++
> > >>>>> gcc/config/riscv/riscv.cc | 12 ++++
> > >>>>> gcc/config/riscv/riscv.h | 1 +
> > >>>>> gcc/genmodes.cc | 25 ++++++-
> > >>>>> gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++++++++++++++++++
> > >>>>> gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++++++++++++++++++
> > >>>>> gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++++++++++++++++++
> > >>>>> gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++++++++++++++++++
> > >>>>> gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++++++++++++++++++
> > >>>>> gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++++++++++++++++++
> > >>>>> gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++++++++++++++++++
> > >>>>> gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +++++++++++++++++++++
> > >>>>> 12 files changed, 598 insertions(+), 1 deletion(-)
> > >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
> > >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
> > >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
> > >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
> > >>>>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
> > >>>>>
> > >>>>> diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
> > >>>>> index d5305efa8a6..110bddce851 100644
> > >>>>> --- a/gcc/config/riscv/riscv-modes.def
> > >>>>> +++ b/gcc/config/riscv/riscv-modes.def
> > >>>>> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > >>>>> ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> > >>>>> ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
> > >>>>>
> > >>>>> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
> > >>>>> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
> > >>>>> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
> > >>>>> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
> > >>>>> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
> > >>>>> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
> > >>>>> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
> > >>>>> +
> > >>>>> /*
> > >>>>> | Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
> > >>>>> | | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
> > >>>>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > >>>>> index de3e1f903c7..cbe66c0e35b 100644
> > >>>>> --- a/gcc/config/riscv/riscv.cc
> > >>>>> +++ b/gcc/config/riscv/riscv.cc
> > >>>>> @@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
> > >>>>> return scale;
> > >>>>> }
> > >>>>>
> > >>>>> +/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
> > >>>>> + PRECISION size for corresponding machine_mode. */
> > >>>>> +
> > >>>>> +poly_int64
> > >>>>> +riscv_v_adjust_precision (machine_mode mode, int scale)
> > >>>>> +{
> > >>>>> + if (riscv_v_ext_vector_mode_p (mode))
> > >>>>> + return riscv_vector_chunks * scale;
> > >>>>> +
> > >>>>> + return scale;
> > >>>>> +}
> > >>>>> +
> > >>>>> /* Return true if X is a valid address for machine mode MODE. If it is,
> > >>>>> fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
> > >>>>> effect. */
> > >>>>> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> > >>>>> index 5bc7f2f467d..15b9317a8ce 100644
> > >>>>> --- a/gcc/config/riscv/riscv.h
> > >>>>> +++ b/gcc/config/riscv/riscv.h
> > >>>>> @@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
> > >>>>> extern unsigned riscv_bytes_per_vector_chunk;
> > >>>>> extern poly_uint16 riscv_vector_chunks;
> > >>>>> extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
> > >>>>> +extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
> > >>>>> /* The number of bits and bytes in a RVV vector. */
> > >>>>> #define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
> > >>>>> #define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
> > >>>>> diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
> > >>>>> index 2d418f09aab..12f4e6335e6 100644
> > >>>>> --- a/gcc/genmodes.cc
> > >>>>> +++ b/gcc/genmodes.cc
> > >>>>> @@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
> > >>>>> static struct mode_adjust *adj_format;
> > >>>>> static struct mode_adjust *adj_ibit;
> > >>>>> static struct mode_adjust *adj_fbit;
> > >>>>> +static struct mode_adjust *adj_precision;
> > >>>>>
> > >>>>> /* Mode class operations. */
> > >>>>> static enum mode_class
> > >>>>> @@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
> > >>>>> #define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
> > >>>>> #define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
> > >>>>> #define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
> > >>>>> +#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
> > >>>>> #define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
> > >>>>> #define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
> > >>>>> #define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
> > >>>>> @@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
> > >>>>> " (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
> > >>>>> m->name, m->name);
> > >>>>> printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
> > >>>>> - printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
> > >>>>> + /* Normalize the size to 1 if precision is less than BITS_PER_UNIT. */
> > >>>>> + printf (" poly_uint16 size_one = "
> > >>>>> + "mode_precision[E_%smode].is_constant ()\n", m->name);
> > >>>>> + printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
> > >>>>
> > >>>> Have you tried this on an x86_64 system? I wouldn't expect it to work
> > >>>> because of the:
> > >>>>
> > >>>> STATIC_ASSERT (N >= 2);
> > >>>>
> > >>>> in the poly_uint16 constructor.
> > >>>>
> > >>>>> + printf (" if (known_lt (mode_precision[E_%smode], "
> > >>>>> + "size_one * BITS_PER_UNIT))\n", m->name);
> > >>>>> + printf (" mode_size[E_%smode] = size_one;\n", m->name);
> > >>>>> + printf (" else\n");
> > >>>>> + printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
> > >>>>
> > >>>> Now that the assert implicit in the original exact_div no longer holds,
> > >>>> I think we should instead generalise it to can_div_away_from_zero_p
> > >>>> (which will involve defining a new overload of can_div_away_from_zero_p).
> > >>>> I think that will give the same result as the code above for the cases
> > >>>> that the code above handles. But it should be more general too.
> > >>>>
> > >>>> TBH, I'm still sceptical that this is all that is needed. It seems
> > >>>> unlikely that we've been so good at writing vector support code that
> > >>>> we've made it work for precision < bitsize, despite that being an
> > >>>> unsupported combination until now. But I guess we can fix problems
> > >>>> on a case-by-case basis.
> > >>>>
> > >>>> Thanks,
> > >>>> Richard
> > >>>>
> > >>>>> " BITS_PER_UNIT);\n", m->name, m->name);
> > >>>>> printf (" mode_nunits[E_%smode] = ps;\n", m->name);
> > >>>>> printf (" adjust_mode_mask (E_%smode);\n", m->name);
> > >>>>> @@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
> > >>>>> printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
> > >>>>> a->file, a->line, a->mode->name, a->adjustment);
> > >>>>>
> > >>>>> + /* Adjust precision to the actual bits size. */
> > >>>>> + for (a = adj_precision; a; a = a->next)
> > >>>>> + switch (a->mode->cl)
> > >>>>> + {
> > >>>>> + case MODE_VECTOR_BOOL:
> > >>>>> + printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
> > >>>>> + a->adjustment);
> > >>>>> + printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
> > >>>>> + break;
> > >>>>> + default:
> > >>>>> + break;
> > >>>>> + }
> > >>>>> +
> > >>>>> puts ("}");
> > >>>>> }
> > >>>>>
> > >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-1.c b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > >>>>> new file mode 100644
> > >>>>> index 00000000000..e70960c5b6d
> > >>>>> --- /dev/null
> > >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-1.c
> > >>>>> @@ -0,0 +1,68 @@
> > >>>>> +/* { dg-do compile } */
> > >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > >>>>> +
> > >>>>> +#include "riscv_vector.h"
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool1_t v1 = *(vbool1_t*)in;
> > >>>>> + vbool2_t v2 = *(vbool2_t*)in;
> > >>>>> +
> > >>>>> + *(vbool1_t*)(out + 100) = v1;
> > >>>>> + *(vbool2_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool1_t v1 = *(vbool1_t*)in;
> > >>>>> + vbool4_t v2 = *(vbool4_t*)in;
> > >>>>> +
> > >>>>> + *(vbool1_t*)(out + 100) = v1;
> > >>>>> + *(vbool4_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool1_t v1 = *(vbool1_t*)in;
> > >>>>> + vbool8_t v2 = *(vbool8_t*)in;
> > >>>>> +
> > >>>>> + *(vbool1_t*)(out + 100) = v1;
> > >>>>> + *(vbool8_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool1_t v1 = *(vbool1_t*)in;
> > >>>>> + vbool16_t v2 = *(vbool16_t*)in;
> > >>>>> +
> > >>>>> + *(vbool1_t*)(out + 100) = v1;
> > >>>>> + *(vbool16_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool1_t v1 = *(vbool1_t*)in;
> > >>>>> + vbool32_t v2 = *(vbool32_t*)in;
> > >>>>> +
> > >>>>> + *(vbool1_t*)(out + 100) = v1;
> > >>>>> + *(vbool32_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool1_t v1 = *(vbool1_t*)in;
> > >>>>> + vbool64_t v2 = *(vbool64_t*)in;
> > >>>>> +
> > >>>>> + *(vbool1_t*)(out + 100) = v1;
> > >>>>> + *(vbool64_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
> > >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-2.c b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > >>>>> new file mode 100644
> > >>>>> index 00000000000..dcc7a644a88
> > >>>>> --- /dev/null
> > >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-2.c
> > >>>>> @@ -0,0 +1,68 @@
> > >>>>> +/* { dg-do compile } */
> > >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > >>>>> +
> > >>>>> +#include "riscv_vector.h"
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool2_t v1 = *(vbool2_t*)in;
> > >>>>> + vbool1_t v2 = *(vbool1_t*)in;
> > >>>>> +
> > >>>>> + *(vbool2_t*)(out + 100) = v1;
> > >>>>> + *(vbool1_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool2_t v1 = *(vbool2_t*)in;
> > >>>>> + vbool4_t v2 = *(vbool4_t*)in;
> > >>>>> +
> > >>>>> + *(vbool2_t*)(out + 100) = v1;
> > >>>>> + *(vbool4_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool2_t v1 = *(vbool2_t*)in;
> > >>>>> + vbool8_t v2 = *(vbool8_t*)in;
> > >>>>> +
> > >>>>> + *(vbool2_t*)(out + 100) = v1;
> > >>>>> + *(vbool8_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool2_t v1 = *(vbool2_t*)in;
> > >>>>> + vbool16_t v2 = *(vbool16_t*)in;
> > >>>>> +
> > >>>>> + *(vbool2_t*)(out + 100) = v1;
> > >>>>> + *(vbool16_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool2_t v1 = *(vbool2_t*)in;
> > >>>>> + vbool32_t v2 = *(vbool32_t*)in;
> > >>>>> +
> > >>>>> + *(vbool2_t*)(out + 100) = v1;
> > >>>>> + *(vbool32_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool2_t v1 = *(vbool2_t*)in;
> > >>>>> + vbool64_t v2 = *(vbool64_t*)in;
> > >>>>> +
> > >>>>> + *(vbool2_t*)(out + 100) = v1;
> > >>>>> + *(vbool64_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
> > >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-3.c b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > >>>>> new file mode 100644
> > >>>>> index 00000000000..3af0513e006
> > >>>>> --- /dev/null
> > >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-3.c
> > >>>>> @@ -0,0 +1,68 @@
> > >>>>> +/* { dg-do compile } */
> > >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > >>>>> +
> > >>>>> +#include "riscv_vector.h"
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool4_t v1 = *(vbool4_t*)in;
> > >>>>> + vbool1_t v2 = *(vbool1_t*)in;
> > >>>>> +
> > >>>>> + *(vbool4_t*)(out + 100) = v1;
> > >>>>> + *(vbool1_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool4_t v1 = *(vbool4_t*)in;
> > >>>>> + vbool2_t v2 = *(vbool2_t*)in;
> > >>>>> +
> > >>>>> + *(vbool4_t*)(out + 100) = v1;
> > >>>>> + *(vbool2_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool4_t v1 = *(vbool4_t*)in;
> > >>>>> + vbool8_t v2 = *(vbool8_t*)in;
> > >>>>> +
> > >>>>> + *(vbool4_t*)(out + 100) = v1;
> > >>>>> + *(vbool8_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool4_t v1 = *(vbool4_t*)in;
> > >>>>> + vbool16_t v2 = *(vbool16_t*)in;
> > >>>>> +
> > >>>>> + *(vbool4_t*)(out + 100) = v1;
> > >>>>> + *(vbool16_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool4_t v1 = *(vbool4_t*)in;
> > >>>>> + vbool32_t v2 = *(vbool32_t*)in;
> > >>>>> +
> > >>>>> + *(vbool4_t*)(out + 100) = v1;
> > >>>>> + *(vbool32_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool4_t v1 = *(vbool4_t*)in;
> > >>>>> + vbool64_t v2 = *(vbool64_t*)in;
> > >>>>> +
> > >>>>> + *(vbool4_t*)(out + 100) = v1;
> > >>>>> + *(vbool64_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
> > >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> > >>>>> new file mode 100644
> > >>>>> index 00000000000..ea3c360d756
> > >>>>> --- /dev/null
> > >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-4.c
> > >>>>> @@ -0,0 +1,68 @@
> > >>>>> +/* { dg-do compile } */
> > >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > >>>>> +
> > >>>>> +#include "riscv_vector.h"
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool8_t v1 = *(vbool8_t*)in;
> > >>>>> + vbool1_t v2 = *(vbool1_t*)in;
> > >>>>> +
> > >>>>> + *(vbool8_t*)(out + 100) = v1;
> > >>>>> + *(vbool1_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool8_t v1 = *(vbool8_t*)in;
> > >>>>> + vbool2_t v2 = *(vbool2_t*)in;
> > >>>>> +
> > >>>>> + *(vbool8_t*)(out + 100) = v1;
> > >>>>> + *(vbool2_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool8_t v1 = *(vbool8_t*)in;
> > >>>>> + vbool4_t v2 = *(vbool4_t*)in;
> > >>>>> +
> > >>>>> + *(vbool8_t*)(out + 100) = v1;
> > >>>>> + *(vbool4_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool8_t v1 = *(vbool8_t*)in;
> > >>>>> + vbool16_t v2 = *(vbool16_t*)in;
> > >>>>> +
> > >>>>> + *(vbool8_t*)(out + 100) = v1;
> > >>>>> + *(vbool16_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool8_t v1 = *(vbool8_t*)in;
> > >>>>> + vbool32_t v2 = *(vbool32_t*)in;
> > >>>>> +
> > >>>>> + *(vbool8_t*)(out + 100) = v1;
> > >>>>> + *(vbool32_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool8_t v1 = *(vbool8_t*)in;
> > >>>>> + vbool64_t v2 = *(vbool64_t*)in;
> > >>>>> +
> > >>>>> + *(vbool8_t*)(out + 100) = v1;
> > >>>>> + *(vbool64_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
> > >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> > >>>>> new file mode 100644
> > >>>>> index 00000000000..9fc659d2402
> > >>>>> --- /dev/null
> > >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-5.c
> > >>>>> @@ -0,0 +1,68 @@
> > >>>>> +/* { dg-do compile } */
> > >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > >>>>> +
> > >>>>> +#include "riscv_vector.h"
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool16_t v1 = *(vbool16_t*)in;
> > >>>>> + vbool1_t v2 = *(vbool1_t*)in;
> > >>>>> +
> > >>>>> + *(vbool16_t*)(out + 100) = v1;
> > >>>>> + *(vbool1_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool16_t v1 = *(vbool16_t*)in;
> > >>>>> + vbool2_t v2 = *(vbool2_t*)in;
> > >>>>> +
> > >>>>> + *(vbool16_t*)(out + 100) = v1;
> > >>>>> + *(vbool2_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool16_t v1 = *(vbool16_t*)in;
> > >>>>> + vbool4_t v2 = *(vbool4_t*)in;
> > >>>>> +
> > >>>>> + *(vbool16_t*)(out + 100) = v1;
> > >>>>> + *(vbool4_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool16_t v1 = *(vbool16_t*)in;
> > >>>>> + vbool8_t v2 = *(vbool8_t*)in;
> > >>>>> +
> > >>>>> + *(vbool16_t*)(out + 100) = v1;
> > >>>>> + *(vbool8_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool16_t v1 = *(vbool16_t*)in;
> > >>>>> + vbool32_t v2 = *(vbool32_t*)in;
> > >>>>> +
> > >>>>> + *(vbool16_t*)(out + 100) = v1;
> > >>>>> + *(vbool32_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool16_t v1 = *(vbool16_t*)in;
> > >>>>> + vbool64_t v2 = *(vbool64_t*)in;
> > >>>>> +
> > >>>>> + *(vbool16_t*)(out + 100) = v1;
> > >>>>> + *(vbool64_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> > >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-6.c b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> > >>>>> new file mode 100644
> > >>>>> index 00000000000..98275e5267d
> > >>>>> --- /dev/null
> > >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-6.c
> > >>>>> @@ -0,0 +1,68 @@
> > >>>>> +/* { dg-do compile } */
> > >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > >>>>> +
> > >>>>> +#include "riscv_vector.h"
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool32_t v1 = *(vbool32_t*)in;
> > >>>>> + vbool1_t v2 = *(vbool1_t*)in;
> > >>>>> +
> > >>>>> + *(vbool32_t*)(out + 100) = v1;
> > >>>>> + *(vbool1_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool32_t v1 = *(vbool32_t*)in;
> > >>>>> + vbool2_t v2 = *(vbool2_t*)in;
> > >>>>> +
> > >>>>> + *(vbool32_t*)(out + 100) = v1;
> > >>>>> + *(vbool2_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool32_t v1 = *(vbool32_t*)in;
> > >>>>> + vbool4_t v2 = *(vbool4_t*)in;
> > >>>>> +
> > >>>>> + *(vbool32_t*)(out + 100) = v1;
> > >>>>> + *(vbool4_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool32_t v1 = *(vbool32_t*)in;
> > >>>>> + vbool8_t v2 = *(vbool8_t*)in;
> > >>>>> +
> > >>>>> + *(vbool32_t*)(out + 100) = v1;
> > >>>>> + *(vbool8_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool32_t v1 = *(vbool32_t*)in;
> > >>>>> + vbool16_t v2 = *(vbool16_t*)in;
> > >>>>> +
> > >>>>> + *(vbool32_t*)(out + 100) = v1;
> > >>>>> + *(vbool16_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool32_t v1 = *(vbool32_t*)in;
> > >>>>> + vbool64_t v2 = *(vbool64_t*)in;
> > >>>>> +
> > >>>>> + *(vbool32_t*)(out + 100) = v1;
> > >>>>> + *(vbool64_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
> > >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-7.c b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> > >>>>> new file mode 100644
> > >>>>> index 00000000000..8f6f0b11f09
> > >>>>> --- /dev/null
> > >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-7.c
> > >>>>> @@ -0,0 +1,68 @@
> > >>>>> +/* { dg-do compile } */
> > >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > >>>>> +
> > >>>>> +#include "riscv_vector.h"
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool64_t v1 = *(vbool64_t*)in;
> > >>>>> + vbool1_t v2 = *(vbool1_t*)in;
> > >>>>> +
> > >>>>> + *(vbool64_t*)(out + 100) = v1;
> > >>>>> + *(vbool1_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool64_t v1 = *(vbool64_t*)in;
> > >>>>> + vbool2_t v2 = *(vbool2_t*)in;
> > >>>>> +
> > >>>>> + *(vbool64_t*)(out + 100) = v1;
> > >>>>> + *(vbool2_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool64_t v1 = *(vbool64_t*)in;
> > >>>>> + vbool4_t v2 = *(vbool4_t*)in;
> > >>>>> +
> > >>>>> + *(vbool64_t*)(out + 100) = v1;
> > >>>>> + *(vbool4_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool64_t v1 = *(vbool64_t*)in;
> > >>>>> + vbool8_t v2 = *(vbool8_t*)in;
> > >>>>> +
> > >>>>> + *(vbool64_t*)(out + 100) = v1;
> > >>>>> + *(vbool8_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool64_t v1 = *(vbool64_t*)in;
> > >>>>> + vbool16_t v2 = *(vbool16_t*)in;
> > >>>>> +
> > >>>>> + *(vbool64_t*)(out + 100) = v1;
> > >>>>> + *(vbool16_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool64_t v1 = *(vbool64_t*)in;
> > >>>>> + vbool32_t v2 = *(vbool32_t*)in;
> > >>>>> +
> > >>>>> + *(vbool64_t*)(out + 100) = v1;
> > >>>>> + *(vbool32_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> > >>>>> diff --git a/gcc/testsuite/gcc.target/riscv/pr108185-8.c b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> > >>>>> new file mode 100644
> > >>>>> index 00000000000..d96959dd064
> > >>>>> --- /dev/null
> > >>>>> +++ b/gcc/testsuite/gcc.target/riscv/pr108185-8.c
> > >>>>> @@ -0,0 +1,77 @@
> > >>>>> +/* { dg-do compile } */
> > >>>>> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> > >>>>> +
> > >>>>> +#include "riscv_vector.h"
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool1_t v1 = *(vbool1_t*)in;
> > >>>>> + vbool1_t v2 = *(vbool1_t*)in;
> > >>>>> +
> > >>>>> + *(vbool1_t*)(out + 100) = v1;
> > >>>>> + *(vbool1_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool2_t v1 = *(vbool2_t*)in;
> > >>>>> + vbool2_t v2 = *(vbool2_t*)in;
> > >>>>> +
> > >>>>> + *(vbool2_t*)(out + 100) = v1;
> > >>>>> + *(vbool2_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool4_t v1 = *(vbool4_t*)in;
> > >>>>> + vbool4_t v2 = *(vbool4_t*)in;
> > >>>>> +
> > >>>>> + *(vbool4_t*)(out + 100) = v1;
> > >>>>> + *(vbool4_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool8_t v1 = *(vbool8_t*)in;
> > >>>>> + vbool8_t v2 = *(vbool8_t*)in;
> > >>>>> +
> > >>>>> + *(vbool8_t*)(out + 100) = v1;
> > >>>>> + *(vbool8_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool16_t v1 = *(vbool16_t*)in;
> > >>>>> + vbool16_t v2 = *(vbool16_t*)in;
> > >>>>> +
> > >>>>> + *(vbool16_t*)(out + 100) = v1;
> > >>>>> + *(vbool16_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool32_t v1 = *(vbool32_t*)in;
> > >>>>> + vbool32_t v2 = *(vbool32_t*)in;
> > >>>>> +
> > >>>>> + *(vbool32_t*)(out + 100) = v1;
> > >>>>> + *(vbool32_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
> > >>>>> + vbool64_t v1 = *(vbool64_t*)in;
> > >>>>> + vbool64_t v2 = *(vbool64_t*)in;
> > >>>>> +
> > >>>>> + *(vbool64_t*)(out + 100) = v1;
> > >>>>> + *(vbool64_t*)(out + 200) = v2;
> > >>>>> +}
> > >>>>> +
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
> > >>>>> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> HRB 36809 (AG Nuernberg)
>
Pan Li via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> I am not very familiar with the memory pattern, maybe juzhe can provide more information or correct me if anything is misleading.
>
> The different precision try to resolve the below bugs, the second vlm(with different size of load bytes compared to first one)
> is eliminated because vbool8 and vbool16 have the same precision size, aka [8, 8].
>
> vbool8_t v2 = *(vbool8_t*)in;
> vbool16_t v5 = *(vbool16_t*)in;
> *(vbool16_t*)(out + 200) = v5;
> *(vbool8_t*)(out + 100) = v2;
>
> addi a4,a1,100
> vsetvli a5,zero,e8,m1,ta,ma
> addi a1,a1,200
> vlm.v v24,0(a0)
> vsm.v v24,0(a4)
> // Need one vsetvli and vlm.v for correctness here.
> vsm.v v24,0(a1)
But I think it's important to think about the patch as more than a way
of fixing the bug above. The aim has to be to describe the modes as they
really are.
I don't think there's a way for GET_MODE_SIZE to be "conservatively wrong".
A GET_MODE_SIZE that is too small would cause problems. So would a
GET_MODE_SIZE that is too big.
Like Richard says, I think the question comes down to the amount of padding.
Is it the case that for 4+4X ([4,4]), the memory representation has 4 bits
of padding for even X and 0 bits of padding for odd X?
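To make the padding question concrete, here is a small sketch in C. It assumes the storage is simply the precision rounded up to whole bytes, which is exactly the point being asked about and not something the thread has confirmed for RVV; the function names are made up for illustration.

```c
#include <stdint.h>

/* Precision in bits of a mode whose precision is the poly value 4 + 4X
   (written [4,4] in this thread), for a runtime multiplier x.  */
static inline uint64_t precision_bits(uint64_t x)
{
    return 4 + 4 * x;
}

/* Bytes needed to hold that many bits, assuming round-up to whole bytes.  */
static inline uint64_t storage_bytes(uint64_t x)
{
    return (precision_bits(x) + 7) / 8;
}

/* Unused bits in the last byte under that assumption.  */
static inline uint64_t padding_bits(uint64_t x)
{
    return storage_bytes(x) * 8 - precision_bits(x);
}
```

Under this assumption padding_bits() alternates between 4 (even x) and 0 (odd x), which is a pattern a single poly_int with one indeterminate cannot express.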
I agree getting rid of GET_MODE_SIZE and representing everything in bits
would avoid the problem at this point, but I think it would just be pushing
the difficulty elsewhere. E.g. stack layout will be "interesting" if we
can't work in byte sizes.
Thanks,
Richard
On Wed, 1 Mar 2023, Richard Sandiford wrote:
> Pan Li via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > I am not very familiar with the memory pattern, maybe juzhe can provide more information or correct me if anything is misleading.
> >
> > The different precision try to resolve the below bugs, the second vlm(with different size of load bytes compared to first one)
> > is eliminated because vbool8 and vbool16 have the same precision size, aka [8, 8].
> >
> > vbool8_t v2 = *(vbool8_t*)in;
> > vbool16_t v5 = *(vbool16_t*)in;
> > *(vbool16_t*)(out + 200) = v5;
> > *(vbool8_t*)(out + 100) = v2;
> >
> > addi a4,a1,100
> > vsetvli a5,zero,e8,m1,ta,ma
> > addi a1,a1,200
> > vlm.v v24,0(a0)
> > vsm.v v24,0(a4)
> > // Need one vsetvli and vlm.v for correctness here.
> > vsm.v v24,0(a1)
>
> But I think it's important to think about the patch as more than a way
> of fixing the bug above. The aim has to be to describe the modes as they
> really are.
>
> I don't think there's a way for GET_MODE_SIZE to be "conservatively wrong".
> A GET_MODE_SIZE that is too small would cause problems. So would a
> GET_MODE_SIZE that is too big.
>
> Like Richard says, I think the question comes down to the amount of padding.
> Is it the case that for 4+4X ([4,4]), the memory representation has 4 bits
> of padding for even X and 0 bits of padding for odd X?
>
> I agree getting rid of GET_MODE_SIZE and representing everything in bits
> would avoid the problem at this point, but I think it would just be pushing
> the difficulty elsewhere. E.g. stack layout will be "interesting" if we
> can't work in byte sizes.
I suppose the backend could ensure when it performs setvl, that we only
ever end up with the even or odd case and also restrict vectorization
that way? So we could implement the working half and leave the not
working half never happening at runtime ...
Richard.
Let me first introduce the basics of RVV load/store and stack allocation.
For scalable vectors, we allocate memory according to the machine vector length.
To get this vector length (a runtime invariant that is unknown at compile time), we have the instruction csrr vlenb.
For example, csrr a5,vlenb stores the size of a single vector register, in bytes, in register a5.
The size of a single register in bytes (GET_MODE_SIZE) is the poly value (8,8). That means that after csrr a5,vlenb, a5 holds the poly value (8,8) bytes.
Now, our problem is that VNx1BI, VNx2BI, VNx4BI and VNx8BI all have the same byte size, poly (1,1), so they all occupy the same amount of storage.
That means that when we allocate memory or a stack slot for register spilling, we first do csrr a5,vlenb and then srli a5,a5,3 (that is, a5 = a5/8).
After that, a5 holds the byte size poly (1,1). VNx1BI, VNx2BI, VNx4BI and VNx8BI all go through the process described above. They all consume
the same amount of storage since we can't model them accurately according to their precision or bit size.
They consume the same storage (I agree it would be better to model their storage size more accurately).
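As a rough sketch of the arithmetic above in C: the value read by csrr vlenb is modelled as 8 + 8X bytes, and the shared slot size for the four bool modes is one eighth of it. The function names are hypothetical and this is not the backend's actual spill code.

```c
#include <stdint.h>

/* Value read by `csrr a5,vlenb` when GET_MODE_SIZE of a full vector
   register is the poly value (8,8): 8 + 8X bytes.  */
static inline uint64_t vlenb_for(uint64_t x)
{
    return 8 + 8 * x;
}

/* Slot size shared by VNx1BI..VNx8BI as described above: vlenb / 8,
   i.e. a logical right shift by 3, giving the poly value (1,1).  */
static inline uint64_t bool_slot_bytes(uint64_t vlenb)
{
    return vlenb >> 3;
}
```

For example, with x = 7 (a 512-bit vector register) vlenb_for() gives 64 and the slot is 8 bytes.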
Still, even though they occupy the same amount of storage, I can make their memory access behavior (load/store) accurate by
emitting the right RVV instructions for each of them according to the RVV ISA.
VNx1BI, VNx2BI, VNx4BI and VNx8BI all occupy storage of size poly (1,1).
The instructions for these modes are as follows:
VNx1BI: vsevl e8mf8 + vlm, loading 1/8 of poly (1,1) storage.
VNx2BI: vsevl e8mf8 + vlm, loading 1/4 of poly (1,1) storage.
VNx4BI: vsevl e8mf8 + vlm, loading 1/2 of poly (1,1) storage.
VNx8BI: vsevl e8mf8 + vlm, loading 1 of poly (1,1) storage.
So based on this, it's fine that we don't model the storage of VNx1BI, VNx2BI, VNx4BI and VNx8BI accurately according to precision or bit size.
This implementation works even though their storage size is not accurate.
However, the problem is that since they have the same byte size, GCC thinks they are the same and performs some incorrect statement elimination:
(Note: all loads use the same memory base)
load v0 VNx1BI from base0
load v1 VNx2BI from base0
load v2 VNx4BI from base0
load v3 VNx8BI from base0
store v0 base1
store v1 base2
store v2 base3
store v3 base4
GCC will eliminate the last 3 load instructions in this sequence.
It then becomes:
load v0 VNx1BI from base0 ===> vsetvl e8mf8 + vlm (only load 1/8 of poly size (1,1) memory data)
store v0 base1
store v0 base2
store v0 base3
store v0 base4
This is what we want to fix. I think that as long as we have a way to differentiate VNx1BI, VNx2BI, VNx4BI and VNx8BI,
GCC will not do the incorrect elimination for RVV.
I think it can work fine even though these 4 modes have an inaccurate storage size,
as long as their memory access (load/store) behavior is accurate.
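The effect can be illustrated with a toy model in C. This is not how GCC's passes are implemented; it only assumes that a later load is treated as redundant when an earlier load has the same base and the same "key", and compares using the byte size versus the precision as that key. The struct, the enum and the coefficient values are all illustrative.

```c
#include <stddef.h>

/* Toy description of a mask mode: leading coefficients of its poly size
   and poly precision (illustrative values only).  */
struct mask_mode {
    const char *name;
    unsigned size_bytes;
    unsigned precision_bits;
};

struct load {
    int base;                       /* which base address is loaded */
    const struct mask_mode *mode;   /* mode of the loaded value */
};

enum key_kind { KEY_SIZE, KEY_PRECISION };

static unsigned key_of(const struct load *l, enum key_kind k)
{
    return k == KEY_SIZE ? l->mode->size_bytes : l->mode->precision_bits;
}

/* Count the loads that survive a naive elimination which drops a load
   when an earlier load has the same base and the same key.  */
static size_t surviving_loads(const struct load *loads, size_t n,
                              enum key_kind k)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        int redundant = 0;
        for (size_t j = 0; j < i; j++) {
            if (loads[j].base == loads[i].base
                && key_of(&loads[j], k) == key_of(&loads[i], k)) {
                redundant = 1;
                break;
            }
        }
        if (!redundant)
            kept++;
    }
    return kept;
}

static const struct mask_mode toy_modes[4] = {
    { "VNx1BI", 1, 1 }, { "VNx2BI", 1, 2 },
    { "VNx4BI", 1, 4 }, { "VNx8BI", 1, 8 },
};

/* The four loads from base0 in the sequence above.  */
static size_t demo_surviving(enum key_kind k)
{
    struct load loads[4];
    for (size_t i = 0; i < 4; i++) {
        loads[i].base = 0;
        loads[i].mode = &toy_modes[i];
    }
    return surviving_loads(loads, 4, k);
}
```

With the size as key only one load survives, matching the wrong sequence above; with distinct precisions all four loads are kept.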
Thanks.
juzhe.zhong@rivai.ai
From: Richard Sandiford
Date: 2023-03-01 21:19
To: Pan Li via Gcc-patches
CC: Richard Biener; Pan Li; juzhe.zhong\@rivai.ai; pan2.li; Kito.cheng
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
Pan Li via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> I am not very familiar with the memory pattern, maybe juzhe can provide more information or correct me if anything is misleading.
>
> The different precision try to resolve the below bugs, the second vlm(with different size of load bytes compared to first one)
> is eliminated because vbool8 and vbool16 have the same precision size, aka [8, 8].
>
> vbool8_t v2 = *(vbool8_t*)in;
> vbool16_t v5 = *(vbool16_t*)in;
> *(vbool16_t*)(out + 200) = v5;
> *(vbool8_t*)(out + 100) = v2;
>
> addi a4,a1,100
> vsetvli a5,zero,e8,m1,ta,ma
> addi a1,a1,200
> vlm.v v24,0(a0)
> vsm.v v24,0(a4)
> // Need one vsetvli and vlm.v for correctness here.
> vsm.v v24,0(a1)
But I think it's important to think about the patch as more than a way
of fixing the bug above. The aim has to be to describe the modes as they
really are.
I don't think there's a way for GET_MODE_SIZE to be "conservatively wrong".
A GET_MODE_SIZE that is too small would cause problems. So would a
GET_MODE_SIZE that is too big.
Like Richard says, I think the question comes down to the amount of padding.
Is it the case that for 4+4X ([4,4]), the memory representation has 4 bits
of padding for even X and 0 bits of padding for odd X?
I agree getting rid of GET_MODE_SIZE and representing everything in bits
would avoid the problem at this point, but I think it would just be pushing
the difficulty elsewhere. E.g. stack layout will be "interesting" if we
can't work in byte sizes.
Thanks,
Richard
Sorry for the misleading typo.
>> VNx1BI: vsevl e8mf8 + vlm, loading 1/8 of poly (1,1) storage.
>> VNx2BI: vsevl e8mf8 + vlm, loading 1/4 of poly (1,1) storage.
>> VNx4BI: vsevl e8mf8 + vlm, loading 1/2 of poly (1,1) storage.
>> VNx8BI: vsevl e8mf8 + vlm, loading 1 of poly (1,1) storage.
It should be:
VNx1BI: vsetvl e8mf8 + vlm, loading 1/8 of poly (1,1) storage.
VNx2BI: vsetvl e8mf4 + vlm, loading 1/4 of poly (1,1) storage.
VNx4BI: vsetvl e8mf2 + vlm, loading 1/2 of poly (1,1) storage.
VNx8BI: vsetvl e8m1 + vlm, loading 1 of poly (1,1) storage.
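A small C sketch of where these fractions come from, assuming vl is set to VLMAX (as with vsetvli rd,zero,...) and using the RVV 1.0 rules that VLMAX = LMUL * VLEN / SEW and that vlm.v/vsm.v transfer ceil(vl/8) bytes. The type and function names are made up for this example.

```c
#include <stdint.h>

/* LMUL as a fraction num/den, e.g. mf8 = 1/8, m1 = 1/1.  */
struct lmul {
    unsigned num;
    unsigned den;
};

static const struct lmul MF8 = { 1, 8 };
static const struct lmul MF4 = { 1, 4 };
static const struct lmul MF2 = { 1, 2 };
static const struct lmul M1  = { 1, 1 };

/* VLMAX for SEW = 8: (VLEN / 8) * LMUL = vlenb * LMUL elements.  */
static inline uint64_t vlmax_e8(uint64_t vlenb, struct lmul l)
{
    return vlenb * l.num / l.den;
}

/* Bytes transferred by vlm.v / vsm.v for a given vl: ceil(vl / 8).  */
static inline uint64_t mask_bytes(uint64_t vl)
{
    return (vl + 7) / 8;
}

/* Bytes touched by a mask load/store when vl == VLMAX.  */
static inline uint64_t vlm_bytes_at_vlmax(uint64_t vlenb, struct lmul l)
{
    return mask_bytes(vlmax_e8(vlenb, l));
}
```

With vlenb = 64 (poly (1,1) storage of 8 bytes) the four settings touch 1, 2, 4 and 8 bytes. With vlenb = 16 the round-up to whole bytes means mf8, mf4 and mf2 all touch a single byte.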
Please be aware of this. Thanks.
juzhe.zhong@rivai.ai
From: juzhe.zhong@rivai.ai
Date: 2023-03-01 21:50
To: richard.sandiford; gcc-patches
CC: rguenther; Pan Li; pan2.li; kito.cheng
Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
Let's me first introduce RVV load/store basics and stack allocation.
For scalable vector memory allocation, we allocate memory according to machine vector-length.
To get this CPU vector-length value (runtime invariant but compile time unknown), we have an instruction call csrr vlenb.
For example, csrr a5,vlenb (store CPU a single register vector-length value (describe as bytesize) in a5 register).
A single register size in bytes (GET_MODE_SIZE) is poly value (8,8) bytes. That means csrr a5,vlenb, a5 has the value of size poly (8,8) bytes.
Now, our problem is that VNx1BI, VNx2BI, VNx4BI, VNx8BI has the same bytesize poly (1,1). So their storage consumes the same size.
Meaning when we want to allocate a memory storge or stack for register spillings, we should first csrr a5, vlenb, then slli a5,a5,3 (means a5 = a5/8)
Then, a5 has the bytesize value of poly (1,1). All VNx1BI, VNx2BI, VNx4BI, VNx8BI are doing the same process as I described above. They all consume
the same memory storage size since we can't model them accurately according to precision or you bitsize.
They consume the same storage (I am agree it's better to model them more accurately in case of memory storage comsuming).
Well, even though they are consuming same size memory storage, I can make their memory accessing behavior (load/store) accurately by
emiting the accurate RVV instruction for them according to RVV ISA.
VNx1BI,VNx2BI, VNx4BI, VNx8BI are consuming same memory storage with size poly (1,1)
The instruction for these modes as follows:
VNx1BI: vsevl e8mf8 + vlm, loading 1/8 of poly (1,1) storage.
VNx2BI: vsevl e8mf8 + vlm, loading 1/4 of poly (1,1) storage.
VNx4BI: vsevl e8mf8 + vlm, loading 1/2 of poly (1,1) storage.
VNx8BI: vsevl e8mf8 + vlm, loading 1 of poly (1,1) storage.
So base on these, It's fine that we don't model VNx1BI,VNx2BI, VNx4BI, VNx8BI accurately according to precision or bitsize.
This implementation is fine even though their memory storage is not accurate.
However, the problem is that since they have the same bytesize, GCC will think they are the same and do some incorrect statement elimination:
(Note: Load same memory base)
load v0 VNx1BI from base0
load v1 VNx2BI from base0
load v2 VNx4BI from base0
load v3 VNx8BI from base0
store v0 base1
store v1 base2
store v2 base3
store v3 base4
This program sequence, in GCC, it will eliminate the last 3 load instructions.
Then it will become:
load v0 VNx1BI from base0 ===> vsetvl e8mf8 + vlm (only load 1/8 of poly size (1,1) memory data)
store v0 base1
store v0 base2
store v0 base3
store v0 base4
This is what we want to fix. I think as long as we can have the way to differentiate VNx1BI,VNx2BI, VNx4BI, VNx8BI
and GCC will not do th incorrect elimination for RVV.
I think it can work fine even though these 4 modes consume inaccurate memory storage size
but accurate data memory access load store behavior.
Thanks.
juzhe.zhong@rivai.ai
On Wed, 1 Mar 2023, juzhe.zhong@rivai.ai wrote:
> Let's me first introduce RVV load/store basics and stack allocation.
> For scalable vector memory allocation, we allocate memory according to machine vector-length.
> To get this CPU vector-length value (runtime invariant but compile time unknown), we have an instruction call csrr vlenb.
> For example, csrr a5,vlenb (store CPU a single register vector-length value (describe as bytesize) in a5 register).
> A single register size in bytes (GET_MODE_SIZE) is poly value (8,8) bytes. That means csrr a5,vlenb, a5 has the value of size poly (8,8) bytes.
>
> Now, our problem is that VNx1BI, VNx2BI, VNx4BI, VNx8BI has the same bytesize poly (1,1). So their storage consumes the same size.
> Meaning when we want to allocate a memory storge or stack for register spillings, we should first csrr a5, vlenb, then slli a5,a5,3 (means a5 = a5/8)
> Then, a5 has the bytesize value of poly (1,1). All VNx1BI, VNx2BI, VNx4BI, VNx8BI are doing the same process as I described above. They all consume
> the same memory storage size since we can't model them accurately according to precision or you bitsize.
>
> They consume the same storage (I am agree it's better to model them more accurately in case of memory storage comsuming).
>
> Well, even though they are consuming same size memory storage, I can make their memory accessing behavior (load/store) accurately by
> emiting the accurate RVV instruction for them according to RVV ISA.
>
> VNx1BI,VNx2BI, VNx4BI, VNx8BI are consuming same memory storage with size poly (1,1)
> The instruction for these modes as follows:
> VNx1BI: vsevl e8mf8 + vlm, loading 1/8 of poly (1,1) storage.
> VNx2BI: vsevl e8mf8 + vlm, loading 1/4 of poly (1,1) storage.
> VNx4BI: vsevl e8mf8 + vlm, loading 1/2 of poly (1,1) storage.
> VNx8BI: vsevl e8mf8 + vlm, loading 1 of poly (1,1) storage.
>
> So base on these, It's fine that we don't model VNx1BI,VNx2BI, VNx4BI, VNx8BI accurately according to precision or bitsize.
> This implementation is fine even though their memory storage is not accurate.
>
> However, the problem is that since they have the same bytesize, GCC will think they are the same and do some incorrect statement elimination:
>
> (Note: Load same memory base)
> load v0 VNx1BI from base0
> load v1 VNx2BI from base0
> load v2 VNx4BI from base0
> load v3 VNx8BI from base0
>
> store v0 base1
> store v1 base2
> store v2 base3
> store v3 base4
>
> This program sequence, in GCC, it will eliminate the last 3 load instructions.
>
> Then it will become:
>
> load v0 VNx1BI from base0 ===> vsetvl e8mf8 + vlm (only load 1/8 of poly size (1,1) memory data)
>
> store v0 base1
> store v0 base2
> store v0 base3
> store v0 base4
>
> This is what we want to fix. I think as long as we can have the way to differentiate VNx1BI,VNx2BI, VNx4BI, VNx8BI
> and GCC will not do th incorrect elimination for RVV.
>
> I think it can work fine even though these 4 modes consume inaccurate memory storage size
> but accurate data memory access load store behavior.
So given the above I think that modeling the size as being the same
but with accurate precision would work. It's then only the size of the
padding in bytes we cannot represent with poly-int which should be fine.
Correct?
Richard.
> Thanks.
>
>
> juzhe.zhong@rivai.ai
>
> From: Richard Sandiford
> Date: 2023-03-01 21:19
> To: Pan Li via Gcc-patches
> CC: Richard Biener; Pan Li; juzhe.zhong\@rivai.ai; pan2.li; Kito.cheng
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> Pan Li via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > I am not very familiar with the memory pattern, maybe juzhe can provide more information or correct me if anything is misleading.
> >
> > The different precision try to resolve the below bugs, the second vlm(with different size of load bytes compared to first one)
> > is eliminated because vbool8 and vbool16 have the same precision size, aka [8, 8].
> >
> > vbool8_t v2 = *(vbool8_t*)in;
> > vbool16_t v5 = *(vbool16_t*)in;
> > *(vbool16_t*)(out + 200) = v5;
> > *(vbool8_t*)(out + 100) = v2;
> >
> > addi a4,a1,100
> > vsetvli a5,zero,e8,m1,ta,ma
> > addi a1,a1,200
> > vlm.v v24,0(a0)
> > vsm.v v24,0(a4)
> > // Need one vsetvli and vlm.v for correctness here.
> > vsm.v v24,0(a1)
>
> But I think it's important to think about the patch as more than a way
> of fixing the bug above. The aim has to be to describe the modes as they
> really are.
>
> I don't think there's a way for GET_MODE_SIZE to be "conservatively wrong".
> A GET_MODE_SIZE that is too small would cause problems. So would a
> GET_MODE_SIZE that is too big.
>
> Like Richard says, I think the question comes down to the amount of padding.
> Is it the case that for 4+4X ([4,4]), the memory representation has 4 bits
> of padding for even X and 0 bits of padding for odd X?
>
> I agree getting rid of GET_MODE_SIZE and representing everything in bits
> would avoid the problem at this point, but I think it would just be pushing
> the difficulty elsewhere. E.g. stack layout will be "interesting" if we
> can't work in byte sizes.
>
> Thanks,
> Richard
>
>
On Wed, 1 Mar 2023, Richard Biener wrote:
> On Wed, 1 Mar 2023, juzhe.zhong@rivai.ai wrote:
>
> > Let's me first introduce RVV load/store basics and stack allocation.
> > For scalable vector memory allocation, we allocate memory according to machine vector-length.
> > To get this CPU vector-length value (runtime invariant but compile time unknown), we have an instruction call csrr vlenb.
> > For example, csrr a5,vlenb (store CPU a single register vector-length value (describe as bytesize) in a5 register).
> > A single register size in bytes (GET_MODE_SIZE) is poly value (8,8) bytes. That means csrr a5,vlenb, a5 has the value of size poly (8,8) bytes.
> >
> > Now, our problem is that VNx1BI, VNx2BI, VNx4BI, VNx8BI has the same bytesize poly (1,1). So their storage consumes the same size.
> > Meaning when we want to allocate a memory storge or stack for register spillings, we should first csrr a5, vlenb, then slli a5,a5,3 (means a5 = a5/8)
> > Then, a5 has the bytesize value of poly (1,1). All VNx1BI, VNx2BI, VNx4BI, VNx8BI are doing the same process as I described above. They all consume
> > the same memory storage size since we can't model them accurately according to precision or you bitsize.
> >
> > They consume the same storage (I am agree it's better to model them more accurately in case of memory storage comsuming).
> >
> > Well, even though they are consuming same size memory storage, I can make their memory accessing behavior (load/store) accurately by
> > emiting the accurate RVV instruction for them according to RVV ISA.
> >
> > VNx1BI,VNx2BI, VNx4BI, VNx8BI are consuming same memory storage with size poly (1,1)
> > The instruction for these modes as follows:
> > VNx1BI: vsevl e8mf8 + vlm, loading 1/8 of poly (1,1) storage.
> > VNx2BI: vsevl e8mf8 + vlm, loading 1/4 of poly (1,1) storage.
> > VNx4BI: vsevl e8mf8 + vlm, loading 1/2 of poly (1,1) storage.
> > VNx8BI: vsevl e8mf8 + vlm, loading 1 of poly (1,1) storage.
> >
> > So base on these, It's fine that we don't model VNx1BI,VNx2BI, VNx4BI, VNx8BI accurately according to precision or bitsize.
> > This implementation is fine even though their memory storage is not accurate.
> >
> > However, the problem is that since they have the same bytesize, GCC will think they are the same and do some incorrect statement elimination:
> >
> > (Note: Load same memory base)
> > load v0 VNx1BI from base0
> > load v1 VNx2BI from base0
> > load v2 VNx4BI from base0
> > load v3 VNx8BI from base0
> >
> > store v0 base1
> > store v1 base2
> > store v2 base3
> > store v3 base4
> >
> > This program sequence, in GCC, it will eliminate the last 3 load instructions.
> >
> > Then it will become:
> >
> > load v0 VNx1BI from base0 ===> vsetvl e8mf8 + vlm (only load 1/8 of poly size (1,1) memory data)
> >
> > store v0 base1
> > store v0 base2
> > store v0 base3
> > store v0 base4
> >
> > This is what we want to fix. I think as long as we can have the way to differentiate VNx1BI,VNx2BI, VNx4BI, VNx8BI
> > and GCC will not do th incorrect elimination for RVV.
> >
> > I think it can work fine even though these 4 modes consume inaccurate memory storage size
> > but accurate data memory access load store behavior.
>
> So given the above I think that modeling the size as being the same
> but with accurate precision would work. It's then only the size of the
> padding in bytes we cannot represent with poly-int which should be fine.
>
> Correct?
Btw, is storing a VNx1BI and then loading a VNx2BI from the same
memory address well-defined? That is, how is the padding handled
by the machine load/store instructions?
Richard.
>> So given the above I think that modeling the size as being the same
>> but with accurate precision would work. It's then only the size of the
>> padding in bytes we cannot represent with poly-int which should be fine.
>> Correct?
Yes.
>> Btw, is storing a VNx1BI and then loading a VNx2BI from the same
>> memory address well-defined? That is, how is the padding handled
>> by the machine load/store instructions?
Storing a VNx1BI writes the data at addresses 0 ~ 1/8 poly (1,1) and leaves the memory at addresses 1/8 poly (1,1) ~ 2/8 poly (1,1) unchanged.
Loading a VNx2BI then reads 0 ~ 2/8 poly (1,1). Note that 0 ~ 1/8 poly (1,1) is the data that we stored above, and 1/8 poly (1,1) ~ 2/8 poly (1,1) is the original memory data.
You can see this case here (LLVM):
https://godbolt.org/z/P9e1adrd3
foo: # @foo
vsetvli a2, zero, e8, mf8, ta, ma
vsm.v v0, (a0)
vsetvli a2, zero, e8, mf4, ta, ma
vlm.v v8, (a0)
vsm.v v8, (a1)
ret
We can also do this in GCC as long as we can differentiate VNx1BI and VNx2BI, so that GCC, going by precision, does not eliminate the statements even though
the modes have the same byte size.
First we emit vsetvl e8mf8 + vsm for VNx1BI.
Then we emit vsetvl e8mf4 + vlm for VNx2BI.
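The store-then-load behavior described above can be mimicked in plain C with byte buffers. This is only a model: it assumes a vector length at which VNx1BI occupies 1 byte and VNx2BI occupies 2 bytes (for example vlenb = 64), and the helper names are invented for this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Model of vsm.v: writes only the first nbytes of the mask register and
   leaves the rest of memory untouched.  */
static void mask_store(uint8_t *mem, const uint8_t *reg, size_t nbytes)
{
    memcpy(mem, reg, nbytes);
}

/* Model of vlm.v: reads the first nbytes from memory into the register.  */
static void mask_load(uint8_t *reg, const uint8_t *mem, size_t nbytes)
{
    memcpy(reg, mem, nbytes);
}

/* Store a 1-byte mask, then load a 2-byte mask from the same address.
   out receives the two loaded bytes.  */
static void narrow_store_then_wide_load(uint8_t out[2])
{
    uint8_t mem[2] = { 0xAA, 0xBB };   /* original memory contents */
    const uint8_t narrow[1] = { 0x0F }; /* VNx1BI value being stored */

    mask_store(mem, narrow, 1);         /* vsetvl e8mf8 + vsm */
    mask_load(out, mem, 2);             /* vsetvl e8mf4 + vlm */
}
```

After the call, out[0] holds the stored byte and out[1] holds the original memory byte, which is why the wider load cannot be replaced by the narrower stored value.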
Thanks.
juzhe.zhong@rivai.ai
From: Richard Biener
Date: 2023-03-01 22:03
To: juzhe.zhong
CC: richard.sandiford; gcc-patches; Pan Li; pan2.li; kito.cheng
Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
On Wed, 1 Mar 2023, Richard Biener wrote:
> On Wed, 1 Mar 2023, juzhe.zhong@rivai.ai wrote:
>
> > Let's me first introduce RVV load/store basics and stack allocation.
> > For scalable vector memory allocation, we allocate memory according to machine vector-length.
> > To get this CPU vector-length value (runtime invariant but compile time unknown), we have an instruction call csrr vlenb.
> > For example, csrr a5,vlenb (store CPU a single register vector-length value (describe as bytesize) in a5 register).
> > A single register size in bytes (GET_MODE_SIZE) is poly value (8,8) bytes. That means csrr a5,vlenb, a5 has the value of size poly (8,8) bytes.
> >
> > Now, our problem is that VNx1BI, VNx2BI, VNx4BI, VNx8BI has the same bytesize poly (1,1). So their storage consumes the same size.
> > Meaning when we want to allocate a memory storge or stack for register spillings, we should first csrr a5, vlenb, then slli a5,a5,3 (means a5 = a5/8)
> > Then, a5 has the bytesize value of poly (1,1). All VNx1BI, VNx2BI, VNx4BI, VNx8BI are doing the same process as I described above. They all consume
> > the same memory storage size since we can't model them accurately according to precision or you bitsize.
> >
> > They consume the same storage (I am agree it's better to model them more accurately in case of memory storage comsuming).
> >
> > Well, even though they are consuming same size memory storage, I can make their memory accessing behavior (load/store) accurately by
> > emiting the accurate RVV instruction for them according to RVV ISA.
> >
> > VNx1BI,VNx2BI, VNx4BI, VNx8BI are consuming same memory storage with size poly (1,1)
> > The instruction for these modes as follows:
> > VNx1BI: vsevl e8mf8 + vlm, loading 1/8 of poly (1,1) storage.
> > VNx2BI: vsevl e8mf8 + vlm, loading 1/4 of poly (1,1) storage.
> > VNx4BI: vsevl e8mf8 + vlm, loading 1/2 of poly (1,1) storage.
> > VNx8BI: vsevl e8mf8 + vlm, loading 1 of poly (1,1) storage.
> >
> > So base on these, It's fine that we don't model VNx1BI,VNx2BI, VNx4BI, VNx8BI accurately according to precision or bitsize.
> > This implementation is fine even though their memory storage is not accurate.
> >
> > However, the problem is that since they have the same bytesize, GCC will think they are the same and do some incorrect statement elimination:
> >
> > (Note: Load same memory base)
> > load v0 VNx1BI from base0
> > load v1 VNx2BI from base0
> > load v2 VNx4BI from base0
> > load v3 VNx8BI from base0
> >
> > store v0 base1
> > store v1 base2
> > store v2 base3
> > store v3 base4
> >
> > This program sequence, in GCC, it will eliminate the last 3 load instructions.
> >
> > Then it will become:
> >
> > load v0 VNx1BI from base0 ===> vsetvl e8mf8 + vlm (only load 1/8 of poly size (1,1) memory data)
> >
> > store v0 base1
> > store v0 base2
> > store v0 base3
> > store v0 base4
> >
> > This is what we want to fix. I think as long as we can have the way to differentiate VNx1BI,VNx2BI, VNx4BI, VNx8BI
> > and GCC will not do th incorrect elimination for RVV.
> >
> > I think it can work fine even though these 4 modes consume inaccurate memory storage size
> > but accurate data memory access load store behavior.
>
> So given the above I think that modeling the size as being the same
> but with accurate precision would work. It's then only the size of the
> padding in bytes we cannot represent with poly-int which should be fine.
>
> Correct?
Btw, is storing a VNx1BI and then loading a VNx2BI from the same
memory address well-defined? That is, how is the padding handled
by the machine load/store instructions?
Richard.
Thanks all for so much valuable and helpful material.
As I understand it (please correct me if I am mistaken), for the VNx*BI modes (aka 1, 2, 4, 8, 16, 32, 64),
the precision and mode size need to be adjusted as below.
Precision size [1, 2, 4, 8, 16, 32, 64]
Mode size [1, 1, 1, 1, 2, 4, 8]
Given that, if we ignore the self-test failure, the adjust_precision part alone is able to fix the bug I mentioned.
The genmode will first get the precision, and then leverage the mode_size = exact_div / 8 to generate.
Meanwhile, it also provides adjust_mode_size, which is applied after the mode_size generation.
The riscv part already has mode_size_adjust, and the value of mode_size will be overridden by that adjustment.
Unfortunately, the early-stage mode_size generation uses exact_div, which cannot handle an adjusted precision size < 8
and fails the exact_div assertion.
Besides the precision adjustment, I am not sure if we can narrow the problem down to the following.
1. Define the real size of both the precision and the mode size to align with the riscv ISA.
2. Make the general mode_size = precision_size / 8 computation handle both the exact_div case and the case where the dividend is less than the divisor (like 1/8 or 2/8).
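A minimal sketch of point 2 in C, using only the constant coefficients listed above. It is not the genmodes.cc code; divides_exactly() merely stands in for the condition under which an exact division by 8 is possible, and both function names are hypothetical.

```c
/* Adjusted precisions listed above for the seven VNx*BI modes.  */
static const unsigned bool_precisions[7] = { 1, 2, 4, 8, 16, 32, 64 };

/* Non-zero when the precision is a whole number of bytes, i.e. when an
   exact division by 8 would succeed.  */
static inline int divides_exactly(unsigned precision_bits)
{
    return precision_bits % 8 == 0;
}

/* Mode size in bytes when the division rounds up instead of requiring
   an exact result.  */
static inline unsigned bytes_rounded_up(unsigned precision_bits)
{
    return (precision_bits + 7) / 8;
}
```

Rounding up maps the precisions [1, 2, 4, 8, 16, 32, 64] to the mode sizes [1, 1, 1, 1, 2, 4, 8], while divides_exactly() is false for the first three entries, which is where an exact division would assert.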
Could you please share your professional suggestions about this? Thank you all again and have a nice day!
Pan
From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
Sent: Wednesday, March 1, 2023 10:19 PM
To: rguenther <rguenther@suse.de>
Cc: richard.sandiford <richard.sandiford@arm.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Pan Li <incarnation.p.lee@outlook.com>; Li, Pan2 <pan2.li@intel.com>; kito.cheng <kito.cheng@sifive.com>
Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>> So given the above I think that modeling the size as being the same
>> but with accurate precision would work. It's then only the size of the
>> padding in bytes we cannot represent with poly-int which should be fine.
>> Correct?
Yes.
>> Btw, is storing a VNx1BI and then loading a VNx2BI from the same
>> memory address well-defined? That is, how is the padding handled
>> by the machine load/store instructions?
storing VNx1BI is storing the data from addr 0 ~ 1/8 poly (1,1) and keep addr 1/8 poly (1,1) ~ 2/8 poly (1,1) memory data unchange.
load VNx2BI will load 0 ~ 2/8 poly (1,1), note that 0 ~ 1/8 poly (1,1) is the date that we store above, 1/8 poly (1,1) ~ 2/8 poly (1,1) is the orignal memory data.
You can see here for this case (LLVM):
https://godbolt.org/z/P9e1adrd3
foo: # @foo
vsetvli a2, zero, e8, mf8, ta, ma
vsm.v v0, (a0)
vsetvli a2, zero, e8, mf4, ta, ma
vlm.v v8, (a0)
vsm.v v8, (a1)
ret
We can also doing like this in GCC as long as we can differentiate VNx1BI and VNx2BI, and GCC do not eliminate statement according precision even though
they have same bytesize.
First we emit vsetvl e8mf8 +vsm for VNx1BI
Then we emit vsetvl e8mf8 + vlm for VNx2BI
Thanks.
Sorry for the typo and inconvenience.
The below
"The genmode will first get the precision, and then leverage the mode_size = exact_div / 8 to generate"
should be:
"The genmode will first get the precision, and then leverage mode_size = mode_precision / 8 to generate"
Pan
________________________________
From: Li, Pan2 <pan2.li@intel.com>
Sent: Wednesday, March 1, 2023 23:42
To: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>; rguenther <rguenther@suse.de>
Cc: richard.sandiford <richard.sandiford@arm.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Pan Li <incarnation.p.lee@outlook.com>; kito.cheng <kito.cheng@sifive.com>
Subject: RE: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
Thanks all for so much valuable and helpful materials.
As I understand (Please help to correct me if any mistake.), for the VNx*BI (aka, 1, 2, 4, 8, 16, 32, 64),
the precision and mode size need to be adjusted as below.
Precision size [1, 2, 4, 8, 16, 32, 64]
Mode size [1, 1, 1, 1, 2, 4, 8]
Given that, if we ignore the self-test failure, only the adjust_precision part is able to fix the bug I mentioned.
The genmode will first get the precision, and then leverage the mode_size = exact_div / 8 to generate.
Meanwhile, it also provides the adjust_mode_size after the mode_size generation.
The riscv part already has the mode_size_adjust, and the value of mode_size will be overridden by the adjustments.
Unfortunately, the early-stage mode_size generation uses exact_div, which cannot handle an adjusted precision smaller than 8
and fails on the exact_div assertion.
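To make the numbers above concrete, here is a small standalone C sketch. It is not genmodes code; `bool_precisions`, `ceil_bytes` and `divides_exactly` are names invented for this illustration. It shows that rounding each precision up to whole bytes reproduces the mode sizes listed above, and that an exact division by 8 is not possible for the precisions smaller than 8.

```c
#include <stdbool.h>
#include <stddef.h>

/* Precisions (in bits) of the VNx*BI modes as listed above.  */
static const unsigned bool_precisions[] = { 1, 2, 4, 8, 16, 32, 64 };
#define N_BOOL_MODES (sizeof bool_precisions / sizeof bool_precisions[0])

/* Round a bit count up to whole bytes.  */
static unsigned ceil_bytes (unsigned bits)
{
  return (bits + 7) / 8;
}

/* True if BITS is a whole number of bytes, i.e. the only case an
   exact division by 8 can handle.  */
static bool divides_exactly (unsigned bits)
{
  return bits % 8 == 0;
}
```

For precisions 1, 2 and 4, `divides_exactly` is false, which corresponds to the cases where the exact division cannot be used.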
Besides the precision adjustment, I am not sure whether we can narrow the problem down to the following:
1. Define the real size of both the precision and the mode size to align with the RISC-V ISA.
2. Make the general mode_size = precision_size / 8 computation handle both the exact_div case and the case where the dividend is smaller than the divisor (like 1/8 or 2/8).
Could you please share your professional suggestions about this? Thank you all again and have a nice day!
Pan
From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
Sent: Wednesday, March 1, 2023 10:19 PM
To: rguenther <rguenther@suse.de>
Cc: richard.sandiford <richard.sandiford@arm.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Pan Li <incarnation.p.lee@outlook.com>; Li, Pan2 <pan2.li@intel.com>; kito.cheng <kito.cheng@sifive.com>
Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>> So given the above I think that modeling the size as being the same
>> but with accurate precision would work. It's then only the size of the
>> padding in bytes we cannot represent with poly-int which should be fine.
>> Correct?
Yes.
>> Btw, is storing a VNx1BI and then loading a VNx2BI from the same
>> memory address well-defined? That is, how is the padding handled
>> by the machine load/store instructions?
Storing VNx1BI stores the data at addr 0 ~ 1/8 poly (1,1) and keeps the memory data at addr 1/8 poly (1,1) ~ 2/8 poly (1,1) unchanged.
Loading VNx2BI loads 0 ~ 2/8 poly (1,1). Note that 0 ~ 1/8 poly (1,1) is the data that we stored above, and 1/8 poly (1,1) ~ 2/8 poly (1,1) is the original memory data.
You can see here for this case (LLVM):
https://godbolt.org/z/P9e1adrd3
foo: # @foo
vsetvli a2, zero, e8, mf8, ta, ma
vsm.v v0, (a0)
vsetvli a2, zero, e8, mf4, ta, ma
vlm.v v8, (a0)
vsm.v v8, (a1)
ret
We can do the same in GCC as long as it can tell VNx1BI and VNx2BI apart by precision, even though
they have the same byte size, and therefore does not eliminate the statements.
First we emit vsetvl e8mf8 + vsm for VNx1BI.
Then we emit vsetvl e8mf4 + vlm for VNx2BI.
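A plain C model of that store-then-load sequence may help. It assumes VLEN = 512, so that vl is 8 under e8,mf8 and 16 under e8,mf4, and it assumes the RVV 1.0 rule that vsm.v/vlm.v transfer ceil(vl/8) bytes. The function names and the fixed VLEN are assumptions for this sketch, not anything taken from GCC or LLVM.

```c
#include <string.h>

/* Bytes moved by vsm.v / vlm.v for a given vl: ceil(vl / 8).  */
static size_t mask_bytes (size_t vl)
{
  return (vl + 7) / 8;
}

/* Model of "store VNx1BI, then load VNx2BI from the same address"
   with VLEN = 512: vl is 8 for e8,mf8 and 16 for e8,mf4.  */
static void store_then_load (unsigned char *mem,
                             const unsigned char *vreg_in,
                             unsigned char *vreg_out)
{
  memcpy (mem, vreg_in, mask_bytes (8));    /* vsm.v under e8,mf8 */
  memcpy (vreg_out, mem, mask_bytes (16));  /* vlm.v under e8,mf4 */
}
```

In this model the first loaded byte is the one just stored and the second byte is whatever memory held before the store.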
Thanks.
________________________________
juzhe.zhong@rivai.ai<mailto:juzhe.zhong@rivai.ai>
From: Richard Biener<mailto:rguenther@suse.de>
Date: 2023-03-01 22:03
To: juzhe.zhong<mailto:juzhe.zhong@rivai.ai>
CC: richard.sandiford<mailto:richard.sandiford@arm.com>; gcc-patches<mailto:gcc-patches@gcc.gnu.org>; Pan Li<mailto:incarnation.p.lee@outlook.com>; pan2.li<mailto:pan2.li@intel.com>; kito.cheng<mailto:kito.cheng@sifive.com>
Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
On Wed, 1 Mar 2023, Richard Biener wrote:
> On Wed, 1 Mar 2023, juzhe.zhong@rivai.ai<mailto:juzhe.zhong@rivai.ai> wrote:
>
> > Let me first introduce the RVV load/store basics and stack allocation.
> > For scalable vector memory allocation, we allocate memory according to the machine vector length.
> > To get this CPU vector-length value (runtime invariant but unknown at compile time), we have the instruction csrr vlenb.
> > For example, csrr a5,vlenb stores the vector length of a single vector register (in bytes) in register a5.
> > The size of a single register in bytes (GET_MODE_SIZE) is the poly value (8,8). That means that after csrr a5,vlenb, a5 holds the size poly (8,8) bytes.
> >
> > Now, our problem is that VNx1BI, VNx2BI, VNx4BI and VNx8BI have the same byte size, poly (1,1), so their storage consumes the same size.
> > That means that when we want to allocate memory or stack space for register spills, we first do csrr a5,vlenb and then srli a5,a5,3 (i.e. a5 = a5/8).
> > Then a5 holds the byte size poly (1,1). VNx1BI, VNx2BI, VNx4BI and VNx8BI all go through the process described above. They all consume
> > the same amount of memory since we can't model them accurately according to their precision or bit size.
> >
> > They consume the same storage (I agree it would be better to model their memory consumption more accurately).
> >
> > Even though they consume the same amount of memory, I can make their memory access behavior (load/store) accurate by
> > emitting the exact RVV instructions for them according to the RVV ISA.
> >
> > VNx1BI, VNx2BI, VNx4BI and VNx8BI consume the same memory storage, of size poly (1,1).
> > The instructions for these modes are as follows:
> > VNx1BI: vsetvl e8mf8 + vlm, loading 1/8 of the poly (1,1) storage.
> > VNx2BI: vsetvl e8mf4 + vlm, loading 1/4 of the poly (1,1) storage.
> > VNx4BI: vsetvl e8mf2 + vlm, loading 1/2 of the poly (1,1) storage.
> > VNx8BI: vsetvl e8m1 + vlm, loading all of the poly (1,1) storage.
> >
> > So based on this, it is fine that we don't model VNx1BI, VNx2BI, VNx4BI and VNx8BI accurately according to precision or bit size.
> > This implementation is fine even though their memory storage is not accurate.
> >
> > However, the problem is that since they have the same bytesize, GCC will think they are the same and do some incorrect statement elimination:
> >
> > (Note: Load same memory base)
> > load v0 VNx1BI from base0
> > load v1 VNx2BI from base0
> > load v2 VNx4BI from base0
> > load v3 VNx8BI from base0
> >
> > store v0 base1
> > store v1 base2
> > store v2 base3
> > store v3 base4
> >
> > For this program sequence, GCC will eliminate the last 3 load instructions.
> >
> > Then it will become:
> >
> > load v0 VNx1BI from base0 ===> vsetvl e8mf8 + vlm (only load 1/8 of poly size (1,1) memory data)
> >
> > store v0 base1
> > store v0 base2
> > store v0 base3
> > store v0 base4
> >
> > This is what we want to fix. I think that as long as we have a way to differentiate VNx1BI, VNx2BI, VNx4BI and VNx8BI,
> > GCC will not do the incorrect elimination for RVV.
> >
> > I think it can work fine even though these 4 modes consume an inaccurate amount of memory,
> > as long as their load/store memory access behavior is accurate.
>
> So given the above I think that modeling the size as being the same
> but with accurate precision would work. It's then only the size of the
> padding in bytes we cannot represent with poly-int which should be fine.
>
> Correct?
Btw, is storing a VNx1BI and then loading a VNx2BI from the same
memory address well-defined? That is, how is the padding handled
by the machine load/store instructions?
Richard.
"Li, Pan2" <pan2.li@intel.com> writes:
> Thanks all for so much valuable and helpful materials.
>
> As I understand (Please help to correct me if any mistake.), for the VNx*BI (aka, 1, 2, 4, 8, 16, 32, 64),
> the precision and mode size need to be adjusted as below.
>
> Precision size [1, 2, 4, 8, 16, 32, 64]
> Mode size [1, 1, 1, 1, 2, 4, 8]
>
> Given that, if we ignore the self-test failure, only the adjust_precision part is able to fix the bug I mentioned.
> The genmode will first get the precision, and then leverage the mode_size = exact_div / 8 to generate.
> Meanwhile, it also provides the adjust_mode_size after the mode_size generation.
>
> The riscv part already has the mode_size_adjust, and the value of mode_size will be overridden by the adjustments.
Ah, OK! In that case, would the following help:
Turn:
mode_size[E_%smode] = exact_div (mode_precision[E_%smode], BITS_PER_UNIT);
into:
if (!multiple_p (mode_precision[E_%smode], BITS_PER_UNIT,
&mode_size[E_%smode]))
mode_size[E_%smode] = -1;
where -1 is an "obviously wrong" value.
Ports that might hit the -1 are then responsible for setting the size
later, via ADJUST_BYTESIZE.
After all the adjustments are complete, genmodes asserts that no size is
known_eq to -1.
That way, target-independent code doesn't need to guess what the
correct behaviour is.
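A simplified sketch of that flow in plain C, using ordinary integers instead of poly_ints. Everything here (`SIZE_UNSET`, `initial_mode_size`, `adjust_bytesize`, `all_sizes_set`) is a made-up stand-in for illustration; it is not the actual genmodes.cc code.

```c
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_UNIT 8
#define SIZE_UNSET (-1)   /* "obviously wrong" placeholder */

/* Step 1: derive the size from the precision when it divides exactly,
   otherwise leave a placeholder for the target to fix up later.  */
static int initial_mode_size (int precision)
{
  if (precision % BITS_PER_UNIT == 0)
    return precision / BITS_PER_UNIT;
  return SIZE_UNSET;
}

/* Step 2: a target adjustment that overrides the size (a stand-in for
   what an ADJUST_BYTESIZE entry would do for the port).  */
static void adjust_bytesize (int *size, int target_size)
{
  *size = target_size;
}

/* Step 3: after all adjustments, no placeholder may remain.  */
static bool all_sizes_set (const int *sizes, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (sizes[i] == SIZE_UNSET)
      return false;
  return true;
}
```

The point of the placeholder is that a port which forgets the adjustment fails the final check instead of silently getting a guessed size.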
Does the eventual value set by ADJUST_BYTESIZE equal the real number of
bytes loaded by vlm.v and stored by vsm.v (after the appropriate vsetvl)?
And does it equal the size of the corresponding LLVM machine type?
Or is the GCC size larger in some cases than the number of bytes
loaded and stored?
(You and Juzhe have probably answered that question before, sorry,
but I'm still not 100% sure of the answer. Personally, I think I would
find the ISA behaviour easier to understand if the explanation doesn't
involve poly_ints. It would be good to understand things "as the
architecture sees them" rather than in terms of GCC concepts.)
Thanks,
Richard
>> Does the eventual value set by ADJUST_BYTESIZE equal the real number of
>> bytes loaded by vlm.v and stored by vsm.v (after the appropriate vsetvl)?
>> Or is the GCC size larger in some cases than the number of bytes
>> loaded and stored?
For VNx1BI, VNx2BI, VNx4BI and VNx8BI, we allocate the larger amount of memory or stack for register spilling
according to ADJUST_BYTESIZE.
After the appropriate vsetvl, VNx1BI loads/stores 1/8 of ADJUST_BYTESIZE (vsetvl e8mf8).
After the appropriate vsetvl, VNx2BI loads/stores 2/8 of ADJUST_BYTESIZE (vsetvl e8mf4).
After the appropriate vsetvl, VNx4BI loads/stores 4/8 of ADJUST_BYTESIZE (vsetvl e8mf2).
After the appropriate vsetvl, VNx8BI loads/stores 8/8 of ADJUST_BYTESIZE (vsetvl e8m1).
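For a concrete VLEN these fractions can be worked out directly. The sketch below assumes the RVV 1.0 rule that vlm.v/vsm.v transfer ceil(vl/8) bytes, and that a vsetvli with rs1 = zero and SEW = 8 sets vl to VLMAX = VLEN/8 * LMUL. The helper names are hypothetical.

```c
#include <stddef.h>

/* VLMAX for SEW = 8 with LMUL given as a fraction num/den
   (mf8 = 1/8, mf4 = 1/4, mf2 = 1/2, m1 = 1/1).  */
static size_t vlmax_e8 (size_t vlen_bits, size_t lmul_num, size_t lmul_den)
{
  return vlen_bits / 8 * lmul_num / lmul_den;
}

/* Bytes transferred by vlm.v / vsm.v for a given vl.  */
static size_t mask_bytes (size_t vl)
{
  return (vl + 7) / 8;
}
```

With VLEN = 512, LMUL settings mf8, mf4, mf2 and m1 give 1, 2, 4 and 8 bytes respectively, where 8 bytes is the poly (1,1) size shared by all four modes. With VLEN = 64 every setting rounds up to a single byte.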
Note: for all RVV machine modes other than these four, ADJUST_BYTESIZE
is equal to the real number of bytes accessed by the load/store instructions that the RVV ISA defines.
As I said, it's fine that we allocate more memory for VNx1BI, VNx2BI and VNx4BI;
we can emit the appropriate vsetvl to guarantee correctness in the RISC-V backend according
to the machine_mode, as long as GCC doesn't do the incorrect elimination in the middle-end.
Besides, poly (1,1) is 1/8 of the machine vector length, which is already a really small number,
and it is the real number of bytes loaded/stored for VNx8BI.
You can say that VNx1BI, VNx2BI and VNx4BI consume more memory than we actually load/store after the appropriate vsetvl,
since they have the same ADJUST_BYTESIZE as VNx8BI. However, I think that's totally fine for now as long as we can
guarantee correctness, and I think optimizing this memory consumption is of minor importance.
>> And does it equal the size of the corresponding LLVM machine type?
Well, for some reason, in the case of register spilling, LLVM consumes much more memory than GCC.
It always does a whole-register load/store (the full vector length of a single vector register) for register spilling.
That's another story (I am not going to say much about it since it's quite an ugly implementation).
It doesn't model the types accurately according to the RVV ISA for register spilling.
In the case of a normal load/store like:
vbool8_t v2 = *(vbool8_t*)in; *(vbool8_t*)(out + 100) = v2;
the load/store instructions LLVM generates are accurate.
Even though the instructions are accurate in their load/store access behavior, I am not sure whether the size
of their machine type is accurate.
For example, in the IR representation, GCC's VNx1BI is represented as vscale x 1 x i1
and GCC's VNx2BI is represented as vscale x 2 x i1
in LLVM IR.
I am not sure about the byte size of vscale x 1 x i1 and vscale x 2 x i1.
I didn't take a deep look at it.
I think this question is not that important. No matter whether VNx1BI and VNx2BI are modeled accurately in terms of ADJUST_BYTESIZE
in GCC, or whether vscale x 1 x i1 and vscale x 2 x i1 are modeled accurately in terms of their byte size,
as long as we can emit the appropriate vsetvl + vlm/vsm, it's totally fine for RVV even though in some cases the memory allocation
is not accurate in the compiler.
juzhe.zhong@rivai.ai
Thanks all for help.
I tried and validated the approach Richard suggested, and it works as expected.
Meanwhile, I updated the PR as below (I used the in-reply-to option of send-email, but it looks like it failed).
Could you please continue to review it?
Additionally, I would like to know whether we can land this patch for the GCC 13 release (RVV release included).
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613149.html
Pan
On Thu, 2 Mar 2023, juzhe.zhong@rivai.ai wrote:
> >> Does the eventual value set by ADJUST_BYTESIZE equal the real number of
> >> bytes loaded by vlm.v and stored by vsm.v (after the appropriate vsetvl)?
> >> Or is the GCC size larger in some cases than the number of bytes
> >> loaded and stored?
> For VNx1BI, VNx2BI, VNx4BI and VNx8BI, we allocate the larger amount of memory or stack for register spilling
> according to ADJUST_BYTESIZE.
> After the appropriate vsetvl, VNx1BI loads/stores 1/8 of ADJUST_BYTESIZE (vsetvl e8mf8).
> After the appropriate vsetvl, VNx2BI loads/stores 2/8 of ADJUST_BYTESIZE (vsetvl e8mf4).
> After the appropriate vsetvl, VNx4BI loads/stores 4/8 of ADJUST_BYTESIZE (vsetvl e8mf2).
> After the appropriate vsetvl, VNx8BI loads/stores 8/8 of ADJUST_BYTESIZE (vsetvl e8m1).
>
> Note: for all RVV machine modes other than these four, ADJUST_BYTESIZE
> is equal to the real number of bytes accessed by the load/store instructions that the RVV ISA defines.
>
> As I said, it's fine that we allocate more memory for VNx1BI, VNx2BI and VNx4BI;
> we can emit the appropriate vsetvl to guarantee correctness in the RISC-V backend according
> to the machine_mode, as long as GCC doesn't do the incorrect elimination in the middle-end.
>
> Besides, poly (1,1) is 1/8 of the machine vector length, which is already a really small number,
> and it is the real number of bytes loaded/stored for VNx8BI.
> You can say that VNx1BI, VNx2BI and VNx4BI consume more memory than we actually load/store after the appropriate vsetvl,
> since they have the same ADJUST_BYTESIZE as VNx8BI. However, I think that's totally fine for now as long as we can
> guarantee correctness, and I think optimizing this memory consumption is of minor importance.
>
> >> And does it equal the size of the corresponding LLVM machine type?
>
> Well, for some reason, LLVM consumes much more memory than GCC for register spilling.
> It always does a whole-register load/store (the full length of a single vector register) when spilling.
> That's another story (I am not going to say much about it, since it's quite an ugly implementation).
> LLVM doesn't model the types accurately according to the RVV ISA for register spilling.
>
> For a normal load/store like:
> vbool8_t v2 = *(vbool8_t*)in; *(vbool8_t*)(out + 100) = v2;
> the load/store instructions that LLVM generates are accurate.
> Even though the instructions access memory accurately, I am not sure whether the size
> of the corresponding machine type is accurate.
>
> For example, in the IR representation, GCC's VNx1BI is represented as vscale x 1 x i1
> and GCC's VNx2BI is represented as vscale x 2 x i1
> in LLVM IR.
> I am not sure about the byte size of vscale x 1 x i1 and vscale x 2 x i1.
> I didn't take a deep look at it.
>
> I don't think this question is that important. Whether or not VNx1BI and VNx2BI are modeled accurately by ADJUST_BYTESIZE
> in GCC, and whether or not vscale x 1 x i1 and vscale x 2 x i1 have an accurate byte size in LLVM,
> as long as we can emit the appropriate vsetvl + vlm/vsm it is totally fine for RVV, even if in some cases the memory
> allocated by the compiler is not exact.
I'm not sure how it works for variable-length types but isn't
sizeof (vbool8_t) part of the ABI and thus its TYPE_SIZE / GET_MODE_SIZE
are relevant there? It might of course be that you can never have
these types as part of aggregates, arrays or objects of them address-taken
in which case the issue is moot?
Richard.
>
> juzhe.zhong@rivai.ai
>
> From: Richard Sandiford
> Date: 2023-03-02 00:14
> To: Li, Pan2
> CC: juzhe.zhong@rivai.ai; rguenther; gcc-patches; Pan Li; kito.cheng
> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> "Li, Pan2" <pan2.li@intel.com> writes:
> > Thanks all for so much valuable and helpful materials.
> >
> > As I understand it (please correct me if I am mistaken), for the VNx*BI modes (i.e. 1, 2, 4, 8, 16, 32, 64),
> > the precision and mode size need to be adjusted as below.
> >
> > Precision size [1, 2, 4, 8, 16, 32, 64]
> > Mode size [1, 1, 1, 1, 2, 4, 8]
> >
> > Given that, if we ignore the self-test failure, the adjust_precision part alone is able to fix the bug I mentioned.
> > genmodes first gets the precision and then generates the size as mode_size = exact_div (precision, 8).
> > Meanwhile, it also applies the mode size adjustment after generating mode_size.
> >
> > The RISC-V port already has a mode size adjustment, so the value of mode_size is overridden by that adjustment.
>
> Ah, OK! In that case, would the following help:
>
> Turn:
>
> mode_size[E_%smode] = exact_div (mode_precision[E_%smode], BITS_PER_UNIT);
>
> into:
>
> if (!multiple_p (mode_precision[E_%smode], BITS_PER_UNIT,
> &mode_size[E_%smode]))
> mode_size[E_%smode] = -1;
>
> where -1 is an "obviously wrong" value.
>
> Ports that might hit the -1 are then responsible for setting the size
> later, via ADJUST_BYTESIZE.
>
> After all the adjustments are complete, genmodes asserts that no size is
> known_eq to -1.
>
> That way, target-independent code doesn't need to guess what the
> correct behaviour is.
>
> Does the eventual value set by ADJUST_BYTESIZE equal the real number of
> bytes loaded by vlm.v and stored by vsm.v (after the appropriate vsetvl)?
> And does it equal the size of the corresponding LLVM machine type?
> Or is the GCC size larger in some cases than the number of bytes
> loaded and stored?
>
> (You and Juzhe have probably answered that question before, sorry,
> but I'm still not 100% sure of the answer. Personally, I think I would
> find the ISA behaviour easier to understand if the explanation doesn't
> involve poly_ints. It would be good to understand things "as the
> architecture sees them" rather than in terms of GCC concepts.)
>
> Thanks,
> Richard
>
> > Unfortunately, the early-stage mode_size generation uses exact_div, which cannot handle an adjusted precision smaller than 8
> > and fails the exact_div assertion.
> >
> > Besides the precision adjustment, I am not sure whether we can narrow the problem down to the following.
> >
> >
> > 1. Define the real size of both the precision and the mode size to align with the RISC-V ISA.
> > 2. Make the general mode_size = precision_size / 8 computation handle both the exact_div case and the case where the dividend is less than the divisor (like 1/8 or 2/8).
> >
> > Could you please share your suggestions about this? Thank you all again and have a nice day!
> >
> > Pan
> >
> > From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
> > Sent: Wednesday, March 1, 2023 10:19 PM
> > To: rguenther <rguenther@suse.de>
> > Cc: richard.sandiford <richard.sandiford@arm.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Pan Li <incarnation.p.lee@outlook.com>; Li, Pan2 <pan2.li@intel.com>; kito.cheng <kito.cheng@sifive.com>
> > Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> >
> >>> So given the above I think that modeling the size as being the same
> >>> but with accurate precision would work. It's then only the size of the
> >>> padding in bytes we cannot represent with poly-int which should be fine.
> >
> >>> Correct?
> > Yes.
> >
> >>> Btw, is storing a VNx1BI and then loading a VNx2BI from the same
> >>> memory address well-defined? That is, how is the padding handled
> >>> by the machine load/store instructions?
> >
> > Storing VNx1BI writes the bytes from address 0 to 1/8 poly (1,1) and leaves the memory from 1/8 poly (1,1) to 2/8 poly (1,1) unchanged.
> > Loading VNx2BI reads 0 to 2/8 poly (1,1); note that 0 to 1/8 poly (1,1) is the data stored above, while 1/8 poly (1,1) to 2/8 poly (1,1) is the original memory data.
> > You can see here for this case (LLVM):
> > https://godbolt.org/z/P9e1adrd3
> > foo: # @foo
> > vsetvli a2, zero, e8, mf8, ta, ma
> > vsm.v v0, (a0)
> > vsetvli a2, zero, e8, mf4, ta, ma
> > vlm.v v8, (a0)
> > vsm.v v8, (a1)
> > ret
> >
> > We can do the same in GCC as long as we can differentiate VNx1BI from VNx2BI by precision, so that GCC does not eliminate statements even though
> > the modes have the same bytesize.
> >
> > First we emit vsetvl e8mf8 + vsm for VNx1BI
> > Then we emit vsetvl e8mf4 + vlm for VNx2BI
> >
> > Thanks.
> > ________________________________
> > juzhe.zhong@rivai.ai
> >
> > From: Richard Biener <rguenther@suse.de>
> > Date: 2023-03-01 22:03
> > To: juzhe.zhong <juzhe.zhong@rivai.ai>
> > CC: richard.sandiford <richard.sandiford@arm.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Pan Li <incarnation.p.lee@outlook.com>; pan2.li <pan2.li@intel.com>; kito.cheng <kito.cheng@sifive.com>
> > Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> > On Wed, 1 Mar 2023, Richard Biener wrote:
> >
> >> On Wed, 1 Mar 2023, juzhe.zhong@rivai.ai wrote:
> >>
> >> > Let me first introduce the basics of RVV load/store and stack allocation.
> >> > For scalable vectors, we allocate memory according to the machine vector length.
> >> > To get the CPU vector length (a runtime invariant that is unknown at compile time), we have the instruction csrr vlenb.
> >> > For example, csrr a5,vlenb stores the length of a single vector register, in bytes, in register a5.
> >> > The size of a single register in bytes (GET_MODE_SIZE) is the poly value (8,8). That means that after csrr a5,vlenb, a5 holds the size poly (8,8) bytes.
> >> >
> >> > Now, our problem is that VNx1BI, VNx2BI, VNx4BI and VNx8BI all have the same bytesize, poly (1,1), so their storage consumes the same size.
> >> > This means that when we want to allocate memory or stack for register spilling, we first do csrr a5,vlenb and then srli a5,a5,3 (i.e. a5 = a5/8).
> >> > Then a5 holds the bytesize poly (1,1). VNx1BI, VNx2BI, VNx4BI and VNx8BI all go through the same process described above. They all consume
> >> > the same amount of memory, since we can't model them accurately according to their precision or bitsize.
> >> >
> >> > They consume the same storage (I agree it would be better to model their memory consumption more accurately).
> >> >
> >> > Well, even though they consume the same amount of storage, I can make their memory access behavior (load/store) accurate by
> >> > emitting the accurate RVV instructions for them according to the RVV ISA.
> >> >
> >> > VNx1BI, VNx2BI, VNx4BI and VNx8BI consume the same memory storage, with size poly (1,1).
> >> > The instructions for these modes are as follows:
> >> > VNx1BI: vsetvl e8mf8 + vlm, loading 1/8 of the poly (1,1) storage.
> >> > VNx2BI: vsetvl e8mf4 + vlm, loading 1/4 of the poly (1,1) storage.
> >> > VNx4BI: vsetvl e8mf2 + vlm, loading 1/2 of the poly (1,1) storage.
> >> > VNx8BI: vsetvl e8m1 + vlm, loading all of the poly (1,1) storage.
> >> >
> >> > So based on this, it's fine that we don't model VNx1BI, VNx2BI, VNx4BI and VNx8BI accurately according to precision or bitsize.
> >> > This implementation is fine even though their memory storage is not accurate.
> >> >
> >> > However, the problem is that since they have the same bytesize, GCC will think they are the same and do some incorrect statement elimination:
> >> >
> >> > (Note: Load same memory base)
> >> > load v0 VNx1BI from base0
> >> > load v1 VNx2BI from base0
> >> > load v2 VNx4BI from base0
> >> > load v3 VNx8BI from base0
> >> >
> >> > store v0 base1
> >> > store v1 base2
> >> > store v2 base3
> >> > store v3 base4
> >> >
> >> > For this program sequence, GCC will eliminate the last 3 load instructions.
> >> >
> >> > Then it will become:
> >> >
> >> > load v0 VNx1BI from base0 ===> vsetvl e8mf8 + vlm (only load 1/8 of poly size (1,1) memory data)
> >> >
> >> > store v0 base1
> >> > store v0 base2
> >> > store v0 base3
> >> > store v0 base4
> >> >
> >> > This is what we want to fix. I think that as long as we have a way to differentiate VNx1BI, VNx2BI, VNx4BI and VNx8BI,
> >> > GCC will not do the incorrect elimination for RVV.
> >> >
> >> > I think it can work fine even though these 4 modes have an inaccurate memory storage size,
> >> > as long as their memory access (load/store) behavior is accurate.
> >>
> >> So given the above I think that modeling the size as being the same
> >> but with accurate precision would work. It's then only the size of the
> >> padding in bytes we cannot represent with poly-int which should be fine.
> >>
> >> Correct?
> >
> > Btw, is storing a VNx1BI and then loading a VNx2BI from the same
> > memory address well-defined? That is, how is the padding handled
> > by the machine load/store instructions?
> >
> > Richard.
>
>
Fortunately, we won't have aggregates or arrays of vbool*_t in the future.
I think it's not an issue.
juzhe.zhong@rivai.ai
From: Richard Biener
Date: 2023-03-02 16:25
To: juzhe.zhong
CC: richard.sandiford; pan2.li; gcc-patches; Pan Li; kito.cheng
Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
Thanks for the explanation about the sizes.
"juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
> Fortunately, we won't have aggregates, arrays of vbool*_t in the future.
> I think it's not an issue.
But isn't it possible to allocate a char/byte array and construct
vbool*_ts at addresses calculated by intrinsics? E.g. I don't see
anything wrong in principle with doing:
#include <arm_sve.h>
void f(char *x, svbool_t p1, svbool_t p2) {
  *(svbool_t *)(x + svcntd()) = p2;
  *(svbool_t *)(x) = p1;
}
If the mode size for svbool_t was too big, I think RTL DSE would be
within its rights to delete the first store. (Precision doesn't matter,
at least not currently.)
There's no problem if the ABI is defined such that vbool8_t has the same
size as the GET_MODE_SIZE recorded in GCC. (But of course, it would need
to be consistently so, even when the vector length is known at compile time.)
In that case, the difference between the size stored by the machine and the
size used by the ABI would be padding, and there is no requirement to
preserve padding. But if the ABI size of vbool8_t matches the machine
behaviour, I think making GCC's size bigger risks wrong code.
I realise it's a corner case. But I don't think making GET_MODE_SIZE
bigger than the real size is conservatively correct.
Thanks,
Richard
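
The padding argument above can be illustrated with a small C model. This is only a sketch under stated assumptions: it is not a model of RTL DSE itself, and the helper names are hypothetical. It assumes a compiler that believes an object occupies `claimed` bytes and that, since padding need not be preserved, is free to overwrite the bytes beyond the `real` data when it stores the object.

```c
#include <string.h>

/* Hypothetical model of a store: the compiler believes the object occupies
   CLAIMED bytes, of which only the first REAL bytes carry data.  Padding
   need not be preserved, so this model overwrites it with a filler byte.  */
static void model_store(unsigned char *dst, const unsigned char *src,
                        size_t real, size_t claimed)
{
    memcpy(dst, src, real);
    if (claimed > real)
        memset(dst + real, 0xAA, claimed - real);
}

/* Mirrors f() above: store p2 at x + real, then store p1 at x.
   Returns 1 if p2's bytes are still intact afterwards, 0 otherwise.
   Callers must keep real <= 8 and real + claimed <= 64.  */
static int p2_survives(size_t real, size_t claimed)
{
    unsigned char x[64] = {0};
    unsigned char p1[8], p2[8];

    memset(p1, 0x11, sizeof p1);
    memset(p2, 0x22, sizeof p2);
    model_store(x + real, p2, real, claimed);
    model_store(x, p1, real, claimed);
    return memcmp(x + real, p2, real) == 0;
}
```

With `claimed == real` the two stores do not interfere and p2 survives. With `claimed > real` the second store's padding lands on p2's data, which is the kind of wrong code that an oversized GET_MODE_SIZE could permit.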
>
>
> juzhe.zhong@rivai.ai
>
> From: Richard Biener
> Date: 2023-03-02 16:25
> To: juzhe.zhong
> CC: richard.sandiford; pan2.li; gcc-patches; Pan Li; kito.cheng
> Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> On Thu, 2 Mar 2023, juzhe.zhong@rivai.ai wrote:
>
>> >> Does the eventual value set by ADJUST_BYTESIZE equal the real number of
>> >> bytes loaded by vlm.v and stored by vstm.v (after the appropriate vsetvl)?
>> >> Or is the GCC size larger in some cases than the number of bytes
>> >> loaded and stored?
>> For VNx1BI,VNx2BI,VNx4BI,VNx8BI, we allocate the larger size of memory or stack for register spillling
>> according to ADJUST_BYTESIZE.
>> After appropriate vsetvl, VNx1BI is loaded/stored 1/8 of ADJUST_BYTESIZE (vsetvl e8mf8).
>> After appropriate vsetvl, VNx2BI is loaded/stored 2/8 of ADJUST_BYTESIZE (vsetvl e8mf2).
>> After appropriate vsetvl, VNx4BI is loaded/stored 4/8 of ADJUST_BYTESIZE (vsetvl e8mf4).
>> After appropriate vsetvl, VNx8BI is loaded/stored 8/8 of ADJUST_BYTESIZE (vsetvl e8m1).
>>
>> Note: except these 4 machine modes, all other machine modes of RVV, ADJUST_BYTESIZE
>> are equal to the real number of bytes of load/store instruction that RVV ISA define.
>>
>> Well, as I said, it's fine that we allocated larger memory for VNx1BI,VNx2BI,VNx4BI,
>> we can emit appropriate vsetvl to gurantee the correctness in RISC-V backward according
>> to the machine_mode as long as long GCC didn't do the incorrect elimination in middle-end.
>>
>> Besides, poly (1,1) is 1/8 of machine vector-length which is already really a small number,
>> which is the real number bytes loaded/stored for VNx8BI.
>> You can say VNx1BI, VNx2BI, VNx4BI are consuming larger memory than we actually load/stored by appropriate vsetvl
>> since they are having same ADJUST_BYTESIZE as VNx8BI. However, I think it's totally fine so far as long as we can
>> gurantee the correctness and I think optimizing such memory storage consuming is trivial.
>>
>> >> And does it equal the size of the corresponding LLVM machine type?
>>
>> Well, for some reason, in case of register spilling, LLVM consume much more memory than GCC.
>> And they always do whole register load/store (a single vector register vector-length) for register spilling.
>> That's another story (I am not going to talk too much about this since it's a quite ugly implementation).
>> They don't model the types accurately according RVV ISA for register spilling.
>>
>> In case of normal load/store like:
>> vbool8_t v2 = *(vbool8_t*)in; *(vbool8_t*)(out + 100) = v2;
>> This kind of load/store, their load/stores instructions of codegen are accurate.
>> Even though their instructions are accurate for load/store accessing behavior, I am not sure whether size
>> of their machine type is accurate.
>>
>> For example, in IR presentation: VNx1BI of GCC is represented as vscale x 1 x i1
>> VNx2BI of GCC is represented as vscale x 2 x i1
>> in LLVM IR.
>> I am not sure the bytesize of vscale x 1 x i1 and vscale x 2 x i1.
>> I didn't take a deep a look at it.
>>
>> I think this question is not that important, no matter whether VNx1BI and VNx2BI are modeled accurately in case of ADUST_BYTESIZE
>> in GCC or vscale x 1 x i1 and vscale x 2 x i1 are modeled accurately in case of their bytesize,
>> I think as long as we can emit appropriate vsetvl + vlm/vsm, it's totally fine for RVV even though in some case, their memory allocation
>> is not accurate in compiler.
>
> I'm not sure how it works for variable-length types but isn't
> sizeof (vbool8_t) part of the ABI and thus its TYPE_SIZE / GET_MODE_SIZE
> are relevant there? It might of course be that you can never have
> these types as part of aggregates, arrays or objects of them address-taken
> in which case the issue is moot?
>
> Richard.
>
>>
>> juzhe.zhong@rivai.ai
>>
>> From: Richard Sandiford
>> Date: 2023-03-02 00:14
>> To: Li\, Pan2
>> CC: juzhe.zhong\@rivai.ai; rguenther; gcc-patches; Pan Li; kito.cheng
>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>> "Li, Pan2" <pan2.li@intel.com> writes:
>> > Thanks all for so much valuable and helpful materials.
>> >
>> > As I understand (Please help to correct me if any mistake.), for the VNx*BI (aka, 1, 2, 4, 8, 16, 32, 64),
>> > the precision and mode size need to be adjusted as below.
>> >
>> > Precision size [1, 2, 4, 8, 16, 32, 64]
>> > Mode size [1, 1, 1, 1, 2, 4, 8]
>> >
>> > Given that, if we ignore the self-test failure, only the adjust_precision part is able to fix the bug I mentioned.
>> > The genmode will first get the precision, and then leverage the mode_size = exact_div / 8 to generate.
>> > Meanwhile, it also provides the adjust_mode_size after the mode_size generation.
>> >
>> > The riscv parts has the mode_size_adjust already and the value of mode_size will be overridden by the adjustments.
>>
>> Ah, OK! In that case, would the following help:
>>
>> Turn:
>>
>> mode_size[E_%smode] = exact_div (mode_precision[E_%smode], BITS_PER_UNIT);
>>
>> into:
>>
>> if (!multiple_p (mode_precision[E_%smode], BITS_PER_UNIT,
>> &mode_size[E_%smode]))
>> mode_size[E_%smode] = -1;
>>
>> where -1 is an "obviously wrong" value.
>>
>> Ports that might hit the -1 are then responsible for setting the size
>> later, via ADJUST_BYTESIZE.
>>
>> After all the adjustments are complete, genmodes asserts that no size is
>> known_eq to -1.
>>
>> That way, target-independent code doesn't need to guess what the
>> correct behaviour is.
>>
>> Does the eventual value set by ADJUST_BYTESIZE equal the real number of
>> bytes loaded by vlm.v and stored by vstm.v (after the appropriate vsetvl)?
>> And does it equal the size of the corresponding LLVM machine type?
>> Or is the GCC size larger in some cases than the number of bytes
>> loaded and stored?
>>
>> (You and Juzhe have probably answered that question before, sorry,
>> but I'm still not 100% sure of the answer. Personally, I think I would
>> find the ISA behaviour easier to understand if the explanation doesn't
>> involve poly_ints. It would be good to understand things "as the
>> architecture sees then" rather than in terms of GCC concepts.)
>>
>> Thanks,
>> Richard
>>
>> > Unfortunately, the early stage mode_size generation leveraged exact_div, which doesn't honor precision size < 8
>> > with the adjustment and fails on exact_div assertions.
>> >
>> > Besides the precision adjustment, I am not sure if we can narrow down the problem to.
>> >
>> >
>> > 1. Defined the real size of both the precision and mode size to align the riscv ISA.
>> > 2. Besides, make the general mode_size = precision_size / 8 is able to take care of both the exact_div and the dividend less than the divisor (like 1/8 or 2/8) cases.
>> >
>> > Could you please share your professional suggestions about this? Thank you all again and have a nice day!
>> >
>> > Pan
>> >
>> > From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
>> > Sent: Wednesday, March 1, 2023 10:19 PM
>> > To: rguenther <rguenther@suse.de>
>> > Cc: richard.sandiford <richard.sandiford@arm.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Pan Li <incarnation.p.lee@outlook.com>; Li, Pan2 <pan2.li@intel.com>; kito.cheng <kito.cheng@sifive.com>
>> > Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>> >
>> >>> So given the above I think that modeling the size as being the same
>> >>> but with accurate precision would work. It's then only the size of the
>> >>> padding in bytes we cannot represent with poly-int which should be fine.
>> >
>> >>> Correct?
>> > Yes.
>> >
>> >>> Btw, is storing a VNx1BI and then loading a VNx2BI from the same
>> >>> memory address well-defined? That is, how is the padding handled
>> >>> by the machine load/store instructions?
>> >
>> > Storing VNx1BI writes the data at addresses 0 ~ 1/8 poly (1,1) and leaves the memory at addresses 1/8 poly (1,1) ~ 2/8 poly (1,1) unchanged.
>> > Loading VNx2BI reads 0 ~ 2/8 poly (1,1). Note that 0 ~ 1/8 poly (1,1) is the data that we stored above, and 1/8 poly (1,1) ~ 2/8 poly (1,1) is the original memory data.
>> > You can see here for this case (LLVM):
>> > https://godbolt.org/z/P9e1adrd3
>> > foo: # @foo
>> > vsetvli a2, zero, e8, mf8, ta, ma
>> > vsm.v v0, (a0)
>> > vsetvli a2, zero, e8, mf4, ta, ma
>> > vlm.v v8, (a0)
>> > vsm.v v8, (a1)
>> > ret
>> >
>> > We can also do this in GCC as long as we can differentiate VNx1BI and VNx2BI, so that GCC does not eliminate statements
>> > that differ in precision even though they have the same bytesize.
>> >
>> > First we emit vsetvl e8mf8 + vsm for VNx1BI
>> > Then we emit vsetvl e8mf4 + vlm for VNx2BI
>> >
>> > Thanks.
>> > ________________________________
>> > juzhe.zhong@rivai.ai<mailto:juzhe.zhong@rivai.ai>
>> >
>> > From: Richard Biener<mailto:rguenther@suse.de>
>> > Date: 2023-03-01 22:03
>> > To: juzhe.zhong<mailto:juzhe.zhong@rivai.ai>
>> > CC: richard.sandiford<mailto:richard.sandiford@arm.com>; gcc-patches<mailto:gcc-patches@gcc.gnu.org>; Pan Li<mailto:incarnation.p.lee@outlook.com>; pan2.li<mailto:pan2.li@intel.com>; kito.cheng<mailto:kito.cheng@sifive.com>
>> > Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>> > On Wed, 1 Mar 2023, Richard Biener wrote:
>> >
>> >> On Wed, 1 Mar 2023, juzhe.zhong@rivai.ai<mailto:juzhe.zhong@rivai.ai> wrote:
>> >>
>> >> > Let me first introduce RVV load/store basics and stack allocation.
>> >> > For scalable vector memory allocation, we allocate memory according to the machine vector length.
>> >> > To get this CPU vector-length value (runtime invariant but unknown at compile time), we have an instruction called csrr vlenb.
>> >> > For example, csrr a5,vlenb stores the length of a single CPU vector register (expressed in bytes) in register a5.
>> >> > The size of a single register in bytes (GET_MODE_SIZE) is the poly value (8,8). That means that after csrr a5,vlenb, a5 holds the size poly (8,8) bytes.
>> >> >
>> >> > Now, our problem is that VNx1BI, VNx2BI, VNx4BI and VNx8BI have the same bytesize poly (1,1), so their storage consumes the same size.
>> >> > This means that when we want to allocate memory or stack space for register spills, we first do csrr a5,vlenb and then srli a5,a5,3 (meaning a5 = a5/8).
>> >> > Then a5 holds the bytesize value poly (1,1). VNx1BI, VNx2BI, VNx4BI and VNx8BI all go through the process described above. They all consume
>> >> > the same amount of memory since we can't model them accurately according to their precision or bitsize.
>> >> >
>> >> > They consume the same storage (I agree it would be better to model their memory consumption more accurately).
>> >> >
>> >> > Well, even though they consume the same amount of memory, I can make their memory access behavior (load/store) accurate by
>> >> > emitting the right RVV instructions for them according to the RVV ISA.
>> >> >
>> >> > VNx1BI, VNx2BI, VNx4BI and VNx8BI consume the same memory storage of size poly (1,1).
>> >> > The instructions for these modes are as follows:
>> >> > VNx1BI: vsetvl e8mf8 + vlm, loading 1/8 of poly (1,1) storage.
>> >> > VNx2BI: vsetvl e8mf4 + vlm, loading 1/4 of poly (1,1) storage.
>> >> > VNx4BI: vsetvl e8mf2 + vlm, loading 1/2 of poly (1,1) storage.
>> >> > VNx8BI: vsetvl e8m1 + vlm, loading 1 of poly (1,1) storage.
>> >> >
>> >> > So based on this, it's fine that we don't model VNx1BI, VNx2BI, VNx4BI and VNx8BI accurately according to precision or bitsize.
>> >> > This implementation is fine even though their memory storage is not accurate.
>> >> >
>> >> > However, the problem is that since they have the same bytesize, GCC will think they are the same and do some incorrect statement elimination:
>> >> >
>> >> > (Note: Load same memory base)
>> >> > load v0 VNx1BI from base0
>> >> > load v1 VNx2BI from base0
>> >> > load v2 VNx4BI from base0
>> >> > load v3 VNx8BI from base0
>> >> >
>> >> > store v0 base1
>> >> > store v1 base2
>> >> > store v2 base3
>> >> > store v3 base4
>> >> >
>> >> > For this program sequence, GCC will eliminate the last 3 load instructions.
>> >> >
>> >> > Then it will become:
>> >> >
>> >> > load v0 VNx1BI from base0 ===> vsetvl e8mf8 + vlm (only load 1/8 of poly size (1,1) memory data)
>> >> >
>> >> > store v0 base1
>> >> > store v0 base2
>> >> > store v0 base3
>> >> > store v0 base4
>> >> >
>> >> > This is what we want to fix. I think that as long as we have a way to differentiate VNx1BI, VNx2BI, VNx4BI and VNx8BI,
>> >> > GCC will not do the incorrect elimination for RVV.
>> >> >
>> >> > I think it can work fine even though these 4 modes have an inaccurate memory storage size,
>> >> > as long as their load/store access behavior is accurate.
>> >>
>> >> So given the above I think that modeling the size as being the same
>> >> but with accurate precision would work. It's then only the size of the
>> >> padding in bytes we cannot represent with poly-int which should be fine.
>> >>
>> >> Correct?
>> >
>> > Btw, is storing a VNx1BI and then loading a VNx2BI from the same
>> > memory address well-defined? That is, how is the padding handled
>> > by the machine load/store instructions?
>> >
>> > Richard.
>>
>>
>> I realise it's a corner case. But I don't think making GET_MODE_SIZE
>> bigger than the real size is conservatively correct.
I don't understand which corner case risks wrong code.
Would you mind giving me some examples?
The ABI size of VNx8BI is the same as the machine size.
The only inconsistency is with VNx1BI, VNx2BI and VNx4BI.
The ABI size of these 3 modes is larger than what the machine actually accesses.
For example, the GET_MODE_SIZE of VNx1BI is poly (1,1), the same as VNx8BI.
I emit vsetvl e8mf8 + vlm for VNx1BI, which makes GCC
load 1/8 of poly (1,1) bytes from memory, even though VNx1BI occupies the whole poly (1,1) size.
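A small sketch of the arithmetic may help here. It assumes MIN_VLEN = 64 (one chunk per 64 bits of VLEN), takes the mask precision to be chunks * N as in the patch, and relies on vlm.v/vsm.v transferring ceil(vl/8) bytes as the RVV spec describes. The function names are hypothetical and any VLEN value plugged in is only an example.

```c
/* Assumption: MIN_VLEN = 64, so the number of 64-bit chunks is VLEN / 64.  */
static unsigned
vector_chunks (unsigned vlen_bits)
{
  return vlen_bits / 64;
}

/* Mask bits held by VNx<n>BI: chunks * n, mirroring the value that
   riscv_v_adjust_precision returns in the patch.  */
static unsigned
mask_bits (unsigned vlen_bits, unsigned n)
{
  return vector_chunks (vlen_bits) * n;
}

/* Bytes moved by vlm.v/vsm.v once vl is set to the number of mask
   bits: ceil (vl / 8).  */
static unsigned
bytes_accessed (unsigned vlen_bits, unsigned n)
{
  return (mask_bits (vlen_bits, n) + 7) / 8;
}

/* Storage size described in this thread for VNx1BI..VNx8BI:
   poly (1,1), i.e. one byte per chunk.  */
static unsigned
storage_bytes_small_bool (unsigned vlen_bits)
{
  return vector_chunks (vlen_bits);
}
```

For VLEN = 512 this gives 8 bytes of storage for each of the four modes, while VNx1BI, VNx2BI, VNx4BI and VNx8BI access 1, 2, 4 and 8 bytes respectively.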
juzhe.zhong@rivai.ai
From: Richard Sandiford
Date: 2023-03-02 17:39
To: juzhe.zhong@rivai.ai
CC: rguenther; pan2.li; gcc-patches; incarnation.p.lee; Kito.cheng
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
Thanks for the explanation about the sizes.
"juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
> Fortunately, we won't have aggregates, arrays of vbool*_t in the future.
> I think it's not an issue.
But isn't it possible to allocate a char/byte array and construct
vbool*_ts at addresses calculated by intrinsics? E.g. I don't see
anything wrong in principle with doing:
#include <arm_sve.h>
void f(char *x, svbool_t p1, svbool_t p2) {
  *(svbool_t *)(x + svcntd()) = p2;
  *(svbool_t *)(x) = p1;
}
If the mode size for svbool_t was too big, I think RTL DSE would be
within its rights to delete the first store. (Precision doesn't matter,
at least not currently.)
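The hazard can be sketched with plain byte ranges. This is only an illustration of the interval reasoning, not a model of GCC's RTL DSE pass; ranges_overlap and the byte counts used with it are hypothetical. It assumes the two stores are exactly one real predicate size apart, so that the first store begins where the real data of the second one ends.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if the byte ranges [a_off, a_off + a_size) and
   [b_off, b_off + b_size) share at least one byte.  */
static bool
ranges_overlap (size_t a_off, size_t a_size, size_t b_off, size_t b_size)
{
  return a_off < b_off + b_size && b_off < a_off + a_size;
}
```

With a real size of 2 bytes, the store at offset 2 and the later store at offset 0 do not overlap. If the compiler instead believes each store covers 4 bytes, the later store at offset 0 appears to overwrite the data written at offset 2.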
There's no problem if the ABI is defined such that vbool8_t has the same
size as the GET_MODE_SIZE recorded in GCC. (But of course, it would need
to be consistently so, even when the vector length is known at compile time.)
In that case, the difference between the size stored by the machine and the
size used by the ABI would be padding, and there is no requirement to
preserve padding. But if the ABI size of vbool8_t matches the machine
behaviour, I think making GCC's size bigger risks wrong code.
I realise it's a corner case. But I don't think making GET_MODE_SIZE
bigger than the real size is conservatively correct.
Thanks,
Richard
>
>
> juzhe.zhong@rivai.ai
>
> From: Richard Biener
> Date: 2023-03-02 16:25
> To: juzhe.zhong
> CC: richard.sandiford; pan2.li; gcc-patches; Pan Li; kito.cheng
> Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> On Thu, 2 Mar 2023, juzhe.zhong@rivai.ai wrote:
>
>> >> Does the eventual value set by ADJUST_BYTESIZE equal the real number of
>> >> bytes loaded by vlm.v and stored by vsm.v (after the appropriate vsetvl)?
>> >> Or is the GCC size larger in some cases than the number of bytes
>> >> loaded and stored?
>> For VNx1BI, VNx2BI, VNx4BI and VNx8BI, we allocate the larger amount of memory or stack for register spilling
>> according to ADJUST_BYTESIZE.
>> After the appropriate vsetvl, VNx1BI loads/stores 1/8 of ADJUST_BYTESIZE (vsetvl e8mf8).
>> After the appropriate vsetvl, VNx2BI loads/stores 2/8 of ADJUST_BYTESIZE (vsetvl e8mf4).
>> After the appropriate vsetvl, VNx4BI loads/stores 4/8 of ADJUST_BYTESIZE (vsetvl e8mf2).
>> After the appropriate vsetvl, VNx8BI loads/stores 8/8 of ADJUST_BYTESIZE (vsetvl e8m1).
>>
>> Note: except for these 4 machine modes, the ADJUST_BYTESIZE of every other RVV machine mode
>> equals the real number of bytes accessed by the load/store instructions that the RVV ISA defines.
>>
>> Well, as I said, it's fine that we allocate more memory for VNx1BI, VNx2BI and VNx4BI;
>> we can emit the appropriate vsetvl to guarantee correctness in the RISC-V backend according
>> to the machine_mode, as long as GCC doesn't do the incorrect elimination in the middle-end.
>>
>> Besides, poly (1,1) is 1/8 of the machine vector length, which is already a really small number,
>> and it is the real number of bytes loaded/stored for VNx8BI.
>> You can say VNx1BI, VNx2BI and VNx4BI consume more memory than we actually load/store with the appropriate vsetvl,
>> since they have the same ADJUST_BYTESIZE as VNx8BI. However, I think it's totally fine so far as long as we can
>> guarantee correctness, and I think optimizing such memory consumption is a minor concern.
>>
>> >> And does it equal the size of the corresponding LLVM machine type?
>>
>> Well, for some reason, in the case of register spilling, LLVM consumes much more memory than GCC.
>> It always does whole-register load/store (the full vector length of a single vector register) for register spilling.
>> That's another story (I am not going to talk too much about this since it's a quite ugly implementation).
>> It doesn't model the types accurately according to the RVV ISA for register spilling.
>>
>> In the case of a normal load/store like:
>> vbool8_t v2 = *(vbool8_t*)in; *(vbool8_t*)(out + 100) = v2;
>> For this kind of load/store, the load/store instructions it generates are accurate.
>> Even though the instructions are accurate in their load/store access behavior, I am not sure whether the size
>> of the machine type is accurate.
>>
>> For example, in the IR representation: VNx1BI of GCC is represented as vscale x 1 x i1
>> VNx2BI of GCC is represented as vscale x 2 x i1
>> in LLVM IR.
>> I am not sure about the bytesize of vscale x 1 x i1 and vscale x 2 x i1.
>> I didn't take a deep look at it.
>>
>> I think this question is not that important. No matter whether VNx1BI and VNx2BI are modeled accurately in terms of ADJUST_BYTESIZE
>> in GCC, or whether vscale x 1 x i1 and vscale x 2 x i1 are modeled accurately in terms of their bytesize,
>> I think that as long as we can emit the appropriate vsetvl + vlm/vsm, it's totally fine for RVV even though in some cases the memory allocation
>> is not accurate in the compiler.
>
> I'm not sure how it works for variable-length types but isn't
> sizeof (vbool8_t) part of the ABI and thus its TYPE_SIZE / GET_MODE_SIZE
> are relevant there? It might of course be that you can never have
> these types as part of aggregates, arrays or objects of them address-taken
> in which case the issue is moot?
>
> Richard.
>
>>
>> juzhe.zhong@rivai.ai
>>
>> From: Richard Sandiford
>> Date: 2023-03-02 00:14
>> To: Li, Pan2
>> CC: juzhe.zhong@rivai.ai; rguenther; gcc-patches; Pan Li; kito.cheng
>> Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>> "Li, Pan2" <pan2.li@intel.com> writes:
>> > Thanks all for so much valuable and helpful material.
>> >
>> > As I understand it (please correct me if I am mistaken), for the VNx*BI modes (i.e. 1, 2, 4, 8, 16, 32, 64),
>> > the precision and mode size need to be adjusted as below.
>> >
>> > Precision size [1, 2, 4, 8, 16, 32, 64]
>> > Mode size [1, 1, 1, 1, 2, 4, 8]
>> >
>> > Given that, if we ignore the self-test failure, the adjust_precision part alone is able to fix the bug I mentioned.
>> > genmodes first gets the precision and then derives mode_size = exact_div (precision, 8).
>> > Meanwhile, it also provides the adjust_mode_size after the mode_size generation.
>> >
>> > The riscv part already has the mode_size_adjust, and the value of mode_size will be overridden by the adjustments.
>>
@@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
+ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
+ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
+ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
+ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
+ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
+ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
+ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
+
/*
| Mode | MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
| | LMUL | SEW/LMUL | LMUL | SEW/LMUL |
@@ -1003,6 +1003,18 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
return scale;
}
+/* Call from ADJUST_PRECISION in riscv-modes.def. Return the correct
+ PRECISION size for corresponding machine_mode. */
+
+poly_int64
+riscv_v_adjust_precision (machine_mode mode, int scale)
+{
+ if (riscv_v_ext_vector_mode_p (mode))
+ return riscv_vector_chunks * scale;
+
+ return scale;
+}
+
/* Return true if X is a valid address for machine mode MODE. If it is,
fill in INFO appropriately. STRICT_P is true if REG_OK_STRICT is in
effect. */
@@ -1025,6 +1025,7 @@ extern unsigned riscv_stack_boundary;
extern unsigned riscv_bytes_per_vector_chunk;
extern poly_uint16 riscv_vector_chunks;
extern poly_int64 riscv_v_adjust_nunits (enum machine_mode, int);
+extern poly_int64 riscv_v_adjust_precision (enum machine_mode, int);
/* The number of bits and bytes in a RVV vector. */
#define BITS_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk * 8))
#define BYTES_PER_RISCV_VECTOR (poly_uint16 (riscv_vector_chunks * riscv_bytes_per_vector_chunk))
@@ -114,6 +114,7 @@ static struct mode_adjust *adj_alignment;
static struct mode_adjust *adj_format;
static struct mode_adjust *adj_ibit;
static struct mode_adjust *adj_fbit;
+static struct mode_adjust *adj_precision;
/* Mode class operations. */
static enum mode_class
@@ -819,6 +820,7 @@ make_vector_mode (enum mode_class bclass,
#define ADJUST_NUNITS(M, X) _ADD_ADJUST (nunits, M, X, RANDOM, RANDOM)
#define ADJUST_BYTESIZE(M, X) _ADD_ADJUST (bytesize, M, X, RANDOM, RANDOM)
#define ADJUST_ALIGNMENT(M, X) _ADD_ADJUST (alignment, M, X, RANDOM, RANDOM)
+#define ADJUST_PRECISION(M, X) _ADD_ADJUST (precision, M, X, RANDOM, RANDOM)
#define ADJUST_FLOAT_FORMAT(M, X) _ADD_ADJUST (format, M, X, FLOAT, FLOAT)
#define ADJUST_IBIT(M, X) _ADD_ADJUST (ibit, M, X, ACCUM, UACCUM)
#define ADJUST_FBIT(M, X) _ADD_ADJUST (fbit, M, X, FRACT, UACCUM)
@@ -1829,7 +1831,15 @@ emit_mode_adjustments (void)
" (mode_precision[E_%smode], mode_nunits[E_%smode]);\n",
m->name, m->name);
printf (" mode_precision[E_%smode] = ps * old_factor;\n", m->name);
- printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
+ /* Normalize the size to 1 if precision is less than BITS_PER_UNIT. */
+ printf (" poly_uint16 size_one = "
+ "mode_precision[E_%smode].is_constant ()\n", m->name);
+ printf (" ? poly_uint16 (1, 0) : poly_uint16 (1, 1);\n");
+ printf (" if (known_lt (mode_precision[E_%smode], "
+ "size_one * BITS_PER_UNIT))\n", m->name);
+ printf (" mode_size[E_%smode] = size_one;\n", m->name);
+ printf (" else\n");
+ printf (" mode_size[E_%smode] = exact_div (mode_precision[E_%smode],"
" BITS_PER_UNIT);\n", m->name, m->name);
printf (" mode_nunits[E_%smode] = ps;\n", m->name);
printf (" adjust_mode_mask (E_%smode);\n", m->name);
@@ -1963,6 +1973,19 @@ emit_mode_adjustments (void)
printf ("\n /* %s:%d */\n REAL_MODE_FORMAT (E_%smode) = %s;\n",
a->file, a->line, a->mode->name, a->adjustment);
+ /* Adjust precision to the actual bits size. */
+ for (a = adj_precision; a; a = a->next)
+ switch (a->mode->cl)
+ {
+ case MODE_VECTOR_BOOL:
+ printf ("\n /* %s:%d. */\n ps = %s;\n", a->file, a->line,
+ a->adjustment);
+ printf (" mode_precision[E_%smode] = ps;\n", a->mode->name);
+ break;
+ default:
+ break;
+ }
+
puts ("}");
}
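For readers following the genmodes.cc hunk above, the size rule it prints can be sketched with constant integers. This is a simplification under stated assumptions: the real code works on poly_uint16 values and uses poly_uint16 (1, 1) as the one-byte size for non-constant precisions, and size_from_precision is a hypothetical name.

```c
#include <assert.h>

#define BITS_PER_UNIT 8

/* Constant-only model of the emitted rule: a precision below one byte
   is normalized to a size of 1; anything else must be a whole number
   of bytes, as exact_div requires.  */
static unsigned
size_from_precision (unsigned precision_bits)
{
  if (precision_bits < BITS_PER_UNIT)
    return 1;
  assert (precision_bits % BITS_PER_UNIT == 0);
  return precision_bits / BITS_PER_UNIT;
}
```

A precision of 1, 2 or 4 bits therefore yields the same size as a precision of 8 bits, while the precision itself stays distinct.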
new file mode 100644
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool1_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool1_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool1_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool1_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool1_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool1_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 18 } } */
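The vsetvli patterns in the dg-final lines follow from the mask type: vboolN_t has SEW/LMUL = N, so with e8 the LMUL is 8/N. The sketch below encodes that mapping; lmul_for_vbool is a hypothetical helper and not part of the testsuite.

```c
#include <stddef.h>

/* Map the N of vboolN_t to the LMUL string expected after "e8," in
   the vsetvli patterns.  Returns NULL for an unsupported N.  */
static const char *
lmul_for_vbool (unsigned n)
{
  switch (n)
    {
    case 1:  return "m8";
    case 2:  return "m4";
    case 4:  return "m2";
    case 8:  return "m1";
    case 16: return "mf2";
    case 32: return "mf4";
    case 64: return "mf8";
    default: return NULL;
    }
}
```

Every function in the file above loads a vbool1_t, which is why the m8 pattern is expected 6 times and each other LMUL once.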
new file mode 100644
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool2_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 17 } } */
new file mode 100644
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool4_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 16 } } */
new file mode 100644
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool8_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
new file mode 100644
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool16_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */
new file mode 100644
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool32_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */
new file mode 100644
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool64_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
new file mode 100644
@@ -0,0 +1,77 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+void
+test_vbool1_then_vbool1(int8_t * restrict in, int8_t * restrict out) {
+ vbool1_t v1 = *(vbool1_t*)in;
+ vbool1_t v2 = *(vbool1_t*)in;
+
+ *(vbool1_t*)(out + 100) = v1;
+ *(vbool1_t*)(out + 200) = v2;
+}
+
+void
+test_vbool2_then_vbool2(int8_t * restrict in, int8_t * restrict out) {
+ vbool2_t v1 = *(vbool2_t*)in;
+ vbool2_t v2 = *(vbool2_t*)in;
+
+ *(vbool2_t*)(out + 100) = v1;
+ *(vbool2_t*)(out + 200) = v2;
+}
+
+void
+test_vbool4_then_vbool4(int8_t * restrict in, int8_t * restrict out) {
+ vbool4_t v1 = *(vbool4_t*)in;
+ vbool4_t v2 = *(vbool4_t*)in;
+
+ *(vbool4_t*)(out + 100) = v1;
+ *(vbool4_t*)(out + 200) = v2;
+}
+
+void
+test_vbool8_then_vbool8(int8_t * restrict in, int8_t * restrict out) {
+ vbool8_t v1 = *(vbool8_t*)in;
+ vbool8_t v2 = *(vbool8_t*)in;
+
+ *(vbool8_t*)(out + 100) = v1;
+ *(vbool8_t*)(out + 200) = v2;
+}
+
+void
+test_vbool16_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
+ vbool16_t v1 = *(vbool16_t*)in;
+ vbool16_t v2 = *(vbool16_t*)in;
+
+ *(vbool16_t*)(out + 100) = v1;
+ *(vbool16_t*)(out + 200) = v2;
+}
+
+void
+test_vbool32_then_vbool32(int8_t * restrict in, int8_t * restrict out) {
+ vbool32_t v1 = *(vbool32_t*)in;
+ vbool32_t v2 = *(vbool32_t*)in;
+
+ *(vbool32_t*)(out + 100) = v1;
+ *(vbool32_t*)(out + 200) = v2;
+}
+
+void
+test_vbool64_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
+ vbool64_t v1 = *(vbool64_t*)in;
+ vbool64_t v2 = *(vbool64_t*)in;
+
+ *(vbool64_t*)(out + 100) = v1;
+ *(vbool64_t*)(out + 200) = v2;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
+/* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 7 } } */
+/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */