RISC-V: Remove redundant vlmul_ext_* patterns to fix PR110109

Message ID 20230604085147.3989859-1-juzhe.zhong@rivai.ai
State Accepted
Series RISC-V: Remove redundant vlmul_ext_* patterns to fix PR110109

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

juzhe.zhong@rivai.ai June 4, 2023, 8:51 a.m. UTC
  From: Juzhe-Zhong <juzhe.zhong@rivai.ai>

        PR target/110109

This patch fixes PR110109.  The issue happens because of this pattern:

(define_insn_and_split "*vlmul_extx2<mode>"
  [(set (match_operand:<VLMULX2> 0 "register_operand"  "=vr, ?&vr")
       (subreg:<VLMULX2>
         (match_operand:VLMULEXT2 1 "register_operand" " 0,   vr") 0))]
  "TARGET_VECTOR"
  "#"
  "&& reload_completed"
  [(const_int 0)]
{
  emit_insn (gen_rtx_SET (gen_lowpart (<MODE>mode, operands[0]), operands[1]));
  DONE;
})

This pattern generates the following code in insn-recog.cc:
static int
pattern57 (rtx x1)
{
  rtx * const operands ATTRIBUTE_UNUSED = &recog_data.operand[0];
  rtx x2;
  int res ATTRIBUTE_UNUSED;
  if (maybe_ne (SUBREG_BYTE (x1).to_constant (), 0))
    return -1;
...

PR110109 ICEs at maybe_ne (SUBREG_BYTE (x1).to_constant (), 0), since for scalable
RVV modes SUBREG_BYTE (x1) cannot be forced to a constant via to_constant ().
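
For reference, a poly_int-safe form of that check would compare the (possibly
scalable) SUBREG_BYTE directly instead of forcing it to a compile-time constant.
A minimal sketch (not part of the patch; the actual fix simply removes the
patterns so genrecog never emits the offending code):

  /* Hypothetical poly_int-safe check: use the maybe_ne overload that accepts
     a poly value, without calling to_constant ().  */
  if (maybe_ne (SUBREG_BYTE (x1), 0))
    return -1;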

I created these patterns to optimize the following test:
vfloat32m2_t test_vlmul_ext_v_f32mf2_f32m2(vfloat32mf2_t op1) {
  return __riscv_vlmul_ext_v_f32mf2_f32m2(op1);
}

codegen:
test_vlmul_ext_v_f32mf2_f32m2:
        vsetvli a5,zero,e32,m2,ta,ma
        vmv.v.i v2,0
        vsetvli a5,zero,e32,mf2,ta,ma
        vle32.v v2,0(a1)
        vs2r.v  v2,0(a0)
        ret

There is a redundant 'vmv.v.i' here.  Since GCC has no undefined IR value (unlike LLVM, which has
undef/poison), GCC initializes the destination register to all zeros for the vlmul_ext_* RVV
intrinsics.  However, I think this will no longer be a big issue once we support subreg liveness
tracking.
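
For context, the replacement expander (shown in full in the
riscv-vector-builtins-bases.cc hunk below) lowers vlmul_ext_* directly to a
lowpart move instead of going through the removed patterns:

  rtx expand (function_expander &e) const override
  {
    /* Take the single source argument of the intrinsic ...  */
    tree arg = CALL_EXPR_ARG (e.exp, 0);
    rtx src = expand_normal (arg);
    /* ... and move it into the low part of the wider destination.  */
    emit_insn (gen_rtx_SET (gen_lowpart (e.vector_mode (), e.target), src));
    return e.target;
  }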

gcc/ChangeLog:

        * config/riscv/riscv-vector-builtins-bases.cc: Change expand approach.
        * config/riscv/vector.md (@vlmul_extx2<mode>): Remove it.
        (@vlmul_extx4<mode>): Ditto.
        (@vlmul_extx8<mode>): Ditto.
        (@vlmul_extx16<mode>): Ditto.
        (@vlmul_extx32<mode>): Ditto.
        (@vlmul_extx64<mode>): Ditto.
        (*vlmul_extx2<mode>): Ditto.
        (*vlmul_extx4<mode>): Ditto.
        (*vlmul_extx8<mode>): Ditto.
        (*vlmul_extx16<mode>): Ditto.
        (*vlmul_extx32<mode>): Ditto.
        (*vlmul_extx64<mode>): Ditto.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/base/pr110109-1.c: New test.
        * gcc.target/riscv/rvv/base/pr110109-2.c: New test.

---
 .../riscv/riscv-vector-builtins-bases.cc      |  28 +-
 gcc/config/riscv/vector.md                    | 120 -----
 .../gcc.target/riscv/rvv/base/pr110109-1.c    |  40 ++
 .../gcc.target/riscv/rvv/base/pr110109-2.c    | 485 ++++++++++++++++++
 4 files changed, 529 insertions(+), 144 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110109-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110109-2.c
  

Comments

Jeff Law June 4, 2023, 1:54 p.m. UTC | #1
On 6/4/23 02:51, juzhe.zhong@rivai.ai wrote:
> From: Juzhe-Zhong <juzhe.zhong@rivai.ai>
> 
>          PR target/110109
> 
> This patch fixes PR110109.  The issue happens because of this pattern:
> 
> (define_insn_and_split "*vlmul_extx2<mode>"
>    [(set (match_operand:<VLMULX2> 0 "register_operand"  "=vr, ?&vr")
>         (subreg:<VLMULX2>
>           (match_operand:VLMULEXT2 1 "register_operand" " 0,   vr") 0))]
>    "TARGET_VECTOR"
>    "#"
>    "&& reload_completed"
>    [(const_int 0)]
> {
>    emit_insn (gen_rtx_SET (gen_lowpart (<MODE>mode, operands[0]), operands[1]));
>    DONE;
> })
So anytime you find yourself with an explicit subreg in a pattern, 
there's a very reasonable chance you've made a mistake somewhere else.

As a result every time I see an explicit subreg in a pattern I ask the 
author to describe in a fair amount of detail why the subreg was needed.

From a first glance, they definitely look like you're papering over a
problem elsewhere.  These are just simple moves.  For scalar modes
this would be clearly wrong, but I'm not sure we have the same 
restrictions on vector moves.

I would also caution against the way you're generating code here.  I'd 
have to sit down with it for a while, but I'm not 100% sure you can just 
change the location of the subreg like you did (it's going to move from 
wrapping operand1 to wrapping operand0).  The semantics may be subtly 
different -- and that's one of the other reasons to avoid explicit 
subregs.  It's easy to get the semantics wrong.
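
For illustration, the two shapes being contrasted are roughly the following
(modes are placeholders; the original pattern puts the subreg on the source
side, while the split re-expresses it on the destination side via gen_lowpart):

  (set (reg:BIG d) (subreg:BIG (reg:SMALL s) 0))     ;; matched form
  (set (subreg:SMALL (reg:BIG d) 0) (reg:SMALL s))   ;; form emitted by the split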


> 
> I created these patterns to optimize the following test:
> vfloat32m2_t test_vlmul_ext_v_f32mf2_f32m2(vfloat32mf2_t op1) {
>    return __riscv_vlmul_ext_v_f32mf2_f32m2(op1);
> }
> 
> codegen:
> test_vlmul_ext_v_f32mf2_f32m2:
>          vsetvli a5,zero,e32,m2,ta,ma
>          vmv.v.i v2,0
>          vsetvli a5,zero,e32,mf2,ta,ma
>          vle32.v v2,0(a1)
>          vs2r.v  v2,0(a0)
>          ret
> 
> There is a redundant 'vmv.v.i' here.  Since GCC has no undefined IR value (unlike LLVM, which has undef/poison),
> GCC initializes the destination register to all zeros for the vlmul_ext_* RVV intrinsics.  However, I think this
> will no longer be a big issue once we support subreg liveness tracking.
As I've suggested elsewhere, let's get the code correct and reasonably 
complete before we worry about this class of problems.  I'm not even 
convinced it's a big issue right now.



> 
> gcc/ChangeLog:
> 
>          * config/riscv/riscv-vector-builtins-bases.cc: Change expand approach.
>          * config/riscv/vector.md (@vlmul_extx2<mode>): Remove it.
>          (@vlmul_extx4<mode>): Ditto.
>          (@vlmul_extx8<mode>): Ditto.
>          (@vlmul_extx16<mode>): Ditto.
>          (@vlmul_extx32<mode>): Ditto.
>          (@vlmul_extx64<mode>): Ditto.
>          (*vlmul_extx2<mode>): Ditto.
>          (*vlmul_extx4<mode>): Ditto.
>          (*vlmul_extx8<mode>): Ditto.
>          (*vlmul_extx16<mode>): Ditto.
>          (*vlmul_extx32<mode>): Ditto.
>          (*vlmul_extx64<mode>): Ditto.
> 
> gcc/testsuite/ChangeLog:
> 
>          * gcc.target/riscv/rvv/base/pr110109-1.c: New test.
>          * gcc.target/riscv/rvv/base/pr110109-2.c: New test.
Approved.  Please commit.

Jeff
  
Li, Pan2 via Gcc-patches June 4, 2023, 2:03 p.m. UTC | #2
Committed, thanks Jeff.

Pan

-----Original Message-----
From: Gcc-patches <gcc-patches-bounces+pan2.li=intel.com@gcc.gnu.org> On Behalf Of Jeff Law via Gcc-patches
Sent: Sunday, June 4, 2023 9:55 PM
To: juzhe.zhong@rivai.ai; gcc-patches@gcc.gnu.org
Cc: kito.cheng@sifive.com; palmer@rivosinc.com; rdapp.gcc@gmail.com
Subject: Re: [PATCH] RISC-V: Remove redundant vlmul_ext_* patterns to fix PR110109

Patch

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 09870c327fa..87a684dd127 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1565,30 +1565,10 @@  public:
 
   rtx expand (function_expander &e) const override
   {
-    e.add_input_operand (0);
-    switch (e.op_info->ret.base_type)
-      {
-      case RVV_BASE_vlmul_ext_x2:
-	return e.generate_insn (
-	  code_for_vlmul_extx2 (e.vector_mode ()));
-      case RVV_BASE_vlmul_ext_x4:
-	return e.generate_insn (
-	  code_for_vlmul_extx4 (e.vector_mode ()));
-      case RVV_BASE_vlmul_ext_x8:
-	return e.generate_insn (
-	  code_for_vlmul_extx8 (e.vector_mode ()));
-      case RVV_BASE_vlmul_ext_x16:
-	return e.generate_insn (
-	  code_for_vlmul_extx16 (e.vector_mode ()));
-      case RVV_BASE_vlmul_ext_x32:
-	return e.generate_insn (
-	  code_for_vlmul_extx32 (e.vector_mode ()));
-      case RVV_BASE_vlmul_ext_x64:
-	return e.generate_insn (
-	  code_for_vlmul_extx64 (e.vector_mode ()));
-      default:
-	gcc_unreachable ();
-      }
+    tree arg = CALL_EXPR_ARG (e.exp, 0);
+    rtx src = expand_normal (arg);
+    emit_insn (gen_rtx_SET (gen_lowpart (e.vector_mode (), e.target), src));
+    return e.target;
   }
 };
 
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 79f1644732a..2496eff7874 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -498,126 +498,6 @@ 
   }
 )
 
-(define_expand "@vlmul_extx2<mode>"
-  [(set (match_operand:<VLMULX2> 0 "register_operand")
-  	(subreg:<VLMULX2>
-  	  (match_operand:VLMULEXT2 1 "register_operand") 0))]
-  "TARGET_VECTOR"
-{})
-
-(define_expand "@vlmul_extx4<mode>"
-  [(set (match_operand:<VLMULX4> 0 "register_operand")
-  	(subreg:<VLMULX4>
-  	  (match_operand:VLMULEXT4 1 "register_operand") 0))]
-  "TARGET_VECTOR"
-{})
-
-(define_expand "@vlmul_extx8<mode>"
-  [(set (match_operand:<VLMULX8> 0 "register_operand")
-  	(subreg:<VLMULX8>
-  	  (match_operand:VLMULEXT8 1 "register_operand") 0))]
-  "TARGET_VECTOR"
-{})
-
-(define_expand "@vlmul_extx16<mode>"
-  [(set (match_operand:<VLMULX16> 0 "register_operand")
-  	(subreg:<VLMULX16>
-  	  (match_operand:VLMULEXT16 1 "register_operand") 0))]
-  "TARGET_VECTOR"
-{})
-
-(define_expand "@vlmul_extx32<mode>"
-  [(set (match_operand:<VLMULX32> 0 "register_operand")
-  	(subreg:<VLMULX32>
-  	  (match_operand:VLMULEXT32 1 "register_operand") 0))]
-  "TARGET_VECTOR"
-{})
-
-(define_expand "@vlmul_extx64<mode>"
-  [(set (match_operand:<VLMULX64> 0 "register_operand")
-  	(subreg:<VLMULX64>
-  	  (match_operand:VLMULEXT64 1 "register_operand") 0))]
-  "TARGET_VECTOR"
-{})
-
-(define_insn_and_split "*vlmul_extx2<mode>"
-  [(set (match_operand:<VLMULX2> 0 "register_operand"  "=vr, ?&vr")
-	(subreg:<VLMULX2>
-	  (match_operand:VLMULEXT2 1 "register_operand" " 0,   vr") 0))]
-  "TARGET_VECTOR"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-{
-  emit_insn (gen_rtx_SET (gen_lowpart (<MODE>mode, operands[0]), operands[1]));
-  DONE;
-})
-
-(define_insn_and_split "*vlmul_extx4<mode>"
-  [(set (match_operand:<VLMULX4> 0 "register_operand"  "=vr, ?&vr")
-	(subreg:<VLMULX4>
-	  (match_operand:VLMULEXT4 1 "register_operand" " 0,   vr") 0))]
-  "TARGET_VECTOR"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-{
-  emit_insn (gen_rtx_SET (gen_lowpart (<MODE>mode, operands[0]), operands[1]));
-  DONE;
-})
-
-(define_insn_and_split "*vlmul_extx8<mode>"
-  [(set (match_operand:<VLMULX8> 0 "register_operand"  "=vr, ?&vr")
-	(subreg:<VLMULX8>
-	  (match_operand:VLMULEXT8 1 "register_operand" " 0,   vr") 0))]
-  "TARGET_VECTOR"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-{
-  emit_insn (gen_rtx_SET (gen_lowpart (<MODE>mode, operands[0]), operands[1]));
-  DONE;
-})
-
-(define_insn_and_split "*vlmul_extx16<mode>"
-  [(set (match_operand:<VLMULX16> 0 "register_operand"  "=vr, ?&vr")
-	(subreg:<VLMULX16>
-	  (match_operand:VLMULEXT16 1 "register_operand" " 0,   vr") 0))]
-  "TARGET_VECTOR"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-{
-  emit_insn (gen_rtx_SET (gen_lowpart (<MODE>mode, operands[0]), operands[1]));
-  DONE;
-})
-
-(define_insn_and_split "*vlmul_extx32<mode>"
-  [(set (match_operand:<VLMULX32> 0 "register_operand"  "=vr, ?&vr")
-	(subreg:<VLMULX32>
-	  (match_operand:VLMULEXT32 1 "register_operand" " 0,   vr") 0))]
-  "TARGET_VECTOR"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-{
-  emit_insn (gen_rtx_SET (gen_lowpart (<MODE>mode, operands[0]), operands[1]));
-  DONE;
-})
-
-(define_insn_and_split "*vlmul_extx64<mode>"
-  [(set (match_operand:<VLMULX64> 0 "register_operand"  "=vr, ?&vr")
-	(subreg:<VLMULX64>
-	  (match_operand:VLMULEXT64 1 "register_operand" " 0,   vr") 0))]
-  "TARGET_VECTOR"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-{
-  emit_insn (gen_rtx_SET (gen_lowpart (<MODE>mode, operands[0]), operands[1]));
-  DONE;
-})
-
 ;; This pattern is used to hold the AVL operand for
 ;; RVV instructions that implicity use VLMAX AVL.
 ;; RVV instruction implicitly use GPR that is ultimately
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr110109-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110109-1.c
new file mode 100644
index 00000000000..e921c431c2b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110109-1.c
@@ -0,0 +1,40 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=rv32gcv -mabi=ilp32d" } */
+
+#include "riscv_vector.h"
+
+void __attribute__ ((noinline, noclone))
+clean_subreg (int32_t *in, int32_t *out, size_t m)
+{
+   vint16m8_t v24, v8, v16;   
+  vint32m8_t result = __riscv_vle32_v_i32m8 (in, 32);
+  vint32m1_t v0 = __riscv_vget_v_i32m8_i32m1 (result, 0);
+  vint32m1_t v1 = __riscv_vget_v_i32m8_i32m1 (result, 1);
+  vint32m1_t v2 = __riscv_vget_v_i32m8_i32m1 (result, 2);
+  vint32m1_t v3 = __riscv_vget_v_i32m8_i32m1 (result, 3);
+  vint32m1_t v4 = __riscv_vget_v_i32m8_i32m1 (result, 4);
+  vint32m1_t v5 = __riscv_vget_v_i32m8_i32m1 (result, 5);
+  vint32m1_t v6 = __riscv_vget_v_i32m8_i32m1 (result, 6);
+  vint32m1_t v7 = __riscv_vget_v_i32m8_i32m1 (result, 7);
+  for (size_t i = 0; i < m; i++)
+    {
+      v0 = __riscv_vadd_vv_i32m1(v0, v0, 4);
+      v1 = __riscv_vadd_vv_i32m1(v1, v1, 4);
+      v2 = __riscv_vadd_vv_i32m1(v2, v2, 4);
+      v3 = __riscv_vadd_vv_i32m1(v3, v3, 4);
+      v4 = __riscv_vadd_vv_i32m1(v4, v4, 4);
+      v5 = __riscv_vadd_vv_i32m1(v5, v5, 4);
+      v6 = __riscv_vadd_vv_i32m1(v6, v6, 4);
+      v7 = __riscv_vadd_vv_i32m1(v7, v7, 4);
+    }
+  vint32m8_t result2 = __riscv_vundefined_i32m8 ();
+  result2 = __riscv_vset_v_i32m1_i32m8 (result2, 0, v0);
+  result2 = __riscv_vset_v_i32m1_i32m8 (result2, 1, v1);
+  result2 = __riscv_vset_v_i32m1_i32m8 (result2, 2, v2);
+  result2 = __riscv_vset_v_i32m1_i32m8 (result2, 3, v3);
+  result2 = __riscv_vset_v_i32m1_i32m8 (result2, 4, v4);
+  result2 = __riscv_vset_v_i32m1_i32m8 (result2, 5, v5);
+  result2 = __riscv_vset_v_i32m1_i32m8 (result2, 6, v6);
+  result2 = __riscv_vset_v_i32m1_i32m8 (result2, 7, v7);
+  __riscv_vse32_v_i32m8((out), result2, 64); 
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr110109-2.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110109-2.c
new file mode 100644
index 00000000000..e8b5bf8c714
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110109-2.c
@@ -0,0 +1,485 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=rv32gcv -mabi=ilp32d" } */
+
+#include "riscv_vector.h"
+
+vfloat32m1_t test_vlmul_ext_v_f32mf2_f32m1(vfloat32mf2_t op1) {
+  return __riscv_vlmul_ext_v_f32mf2_f32m1(op1);
+}
+
+vfloat32m2_t test_vlmul_ext_v_f32mf2_f32m2(vfloat32mf2_t op1) {
+  return __riscv_vlmul_ext_v_f32mf2_f32m2(op1);
+}
+
+vfloat32m4_t test_vlmul_ext_v_f32mf2_f32m4(vfloat32mf2_t op1) {
+  return __riscv_vlmul_ext_v_f32mf2_f32m4(op1);
+}
+
+vfloat32m8_t test_vlmul_ext_v_f32mf2_f32m8(vfloat32mf2_t op1) {
+  return __riscv_vlmul_ext_v_f32mf2_f32m8(op1);
+}
+
+vfloat32m2_t test_vlmul_ext_v_f32m1_f32m2(vfloat32m1_t op1) {
+  return __riscv_vlmul_ext_v_f32m1_f32m2(op1);
+}
+
+vfloat32m4_t test_vlmul_ext_v_f32m1_f32m4(vfloat32m1_t op1) {
+  return __riscv_vlmul_ext_v_f32m1_f32m4(op1);
+}
+
+vfloat32m8_t test_vlmul_ext_v_f32m1_f32m8(vfloat32m1_t op1) {
+  return __riscv_vlmul_ext_v_f32m1_f32m8(op1);
+}
+
+vfloat32m4_t test_vlmul_ext_v_f32m2_f32m4(vfloat32m2_t op1) {
+  return __riscv_vlmul_ext_v_f32m2_f32m4(op1);
+}
+
+vfloat32m8_t test_vlmul_ext_v_f32m2_f32m8(vfloat32m2_t op1) {
+  return __riscv_vlmul_ext_v_f32m2_f32m8(op1);
+}
+
+vfloat32m8_t test_vlmul_ext_v_f32m4_f32m8(vfloat32m4_t op1) {
+  return __riscv_vlmul_ext_v_f32m4_f32m8(op1);
+}
+
+vfloat64m2_t test_vlmul_ext_v_f64m1_f64m2(vfloat64m1_t op1) {
+  return __riscv_vlmul_ext_v_f64m1_f64m2(op1);
+}
+
+vfloat64m4_t test_vlmul_ext_v_f64m1_f64m4(vfloat64m1_t op1) {
+  return __riscv_vlmul_ext_v_f64m1_f64m4(op1);
+}
+
+vfloat64m8_t test_vlmul_ext_v_f64m1_f64m8(vfloat64m1_t op1) {
+  return __riscv_vlmul_ext_v_f64m1_f64m8(op1);
+}
+
+vfloat64m4_t test_vlmul_ext_v_f64m2_f64m4(vfloat64m2_t op1) {
+  return __riscv_vlmul_ext_v_f64m2_f64m4(op1);
+}
+
+vfloat64m8_t test_vlmul_ext_v_f64m2_f64m8(vfloat64m2_t op1) {
+  return __riscv_vlmul_ext_v_f64m2_f64m8(op1);
+}
+
+vfloat64m8_t test_vlmul_ext_v_f64m4_f64m8(vfloat64m4_t op1) {
+  return __riscv_vlmul_ext_v_f64m4_f64m8(op1);
+}
+
+vint8mf4_t test_vlmul_ext_v_i8mf8_i8mf4(vint8mf8_t op1) {
+  return __riscv_vlmul_ext_v_i8mf8_i8mf4(op1);
+}
+
+vint8mf2_t test_vlmul_ext_v_i8mf8_i8mf2(vint8mf8_t op1) {
+  return __riscv_vlmul_ext_v_i8mf8_i8mf2(op1);
+}
+
+vint8m1_t test_vlmul_ext_v_i8mf8_i8m1(vint8mf8_t op1) {
+  return __riscv_vlmul_ext_v_i8mf8_i8m1(op1);
+}
+
+vint8m2_t test_vlmul_ext_v_i8mf8_i8m2(vint8mf8_t op1) {
+  return __riscv_vlmul_ext_v_i8mf8_i8m2(op1);
+}
+
+vint8m4_t test_vlmul_ext_v_i8mf8_i8m4(vint8mf8_t op1) {
+  return __riscv_vlmul_ext_v_i8mf8_i8m4(op1);
+}
+
+vint8m8_t test_vlmul_ext_v_i8mf8_i8m8(vint8mf8_t op1) {
+  return __riscv_vlmul_ext_v_i8mf8_i8m8(op1);
+}
+
+vint8mf2_t test_vlmul_ext_v_i8mf4_i8mf2(vint8mf4_t op1) {
+  return __riscv_vlmul_ext_v_i8mf4_i8mf2(op1);
+}
+
+vint8m1_t test_vlmul_ext_v_i8mf4_i8m1(vint8mf4_t op1) {
+  return __riscv_vlmul_ext_v_i8mf4_i8m1(op1);
+}
+
+vint8m2_t test_vlmul_ext_v_i8mf4_i8m2(vint8mf4_t op1) {
+  return __riscv_vlmul_ext_v_i8mf4_i8m2(op1);
+}
+
+vint8m4_t test_vlmul_ext_v_i8mf4_i8m4(vint8mf4_t op1) {
+  return __riscv_vlmul_ext_v_i8mf4_i8m4(op1);
+}
+
+vint8m8_t test_vlmul_ext_v_i8mf4_i8m8(vint8mf4_t op1) {
+  return __riscv_vlmul_ext_v_i8mf4_i8m8(op1);
+}
+
+vint8m1_t test_vlmul_ext_v_i8mf2_i8m1(vint8mf2_t op1) {
+  return __riscv_vlmul_ext_v_i8mf2_i8m1(op1);
+}
+
+vint8m2_t test_vlmul_ext_v_i8mf2_i8m2(vint8mf2_t op1) {
+  return __riscv_vlmul_ext_v_i8mf2_i8m2(op1);
+}
+
+vint8m4_t test_vlmul_ext_v_i8mf2_i8m4(vint8mf2_t op1) {
+  return __riscv_vlmul_ext_v_i8mf2_i8m4(op1);
+}
+
+vint8m8_t test_vlmul_ext_v_i8mf2_i8m8(vint8mf2_t op1) {
+  return __riscv_vlmul_ext_v_i8mf2_i8m8(op1);
+}
+
+vint8m2_t test_vlmul_ext_v_i8m1_i8m2(vint8m1_t op1) {
+  return __riscv_vlmul_ext_v_i8m1_i8m2(op1);
+}
+
+vint8m4_t test_vlmul_ext_v_i8m1_i8m4(vint8m1_t op1) {
+  return __riscv_vlmul_ext_v_i8m1_i8m4(op1);
+}
+
+vint8m8_t test_vlmul_ext_v_i8m1_i8m8(vint8m1_t op1) {
+  return __riscv_vlmul_ext_v_i8m1_i8m8(op1);
+}
+
+vint8m4_t test_vlmul_ext_v_i8m2_i8m4(vint8m2_t op1) {
+  return __riscv_vlmul_ext_v_i8m2_i8m4(op1);
+}
+
+vint8m8_t test_vlmul_ext_v_i8m2_i8m8(vint8m2_t op1) {
+  return __riscv_vlmul_ext_v_i8m2_i8m8(op1);
+}
+
+vint8m8_t test_vlmul_ext_v_i8m4_i8m8(vint8m4_t op1) {
+  return __riscv_vlmul_ext_v_i8m4_i8m8(op1);
+}
+
+vint16mf2_t test_vlmul_ext_v_i16mf4_i16mf2(vint16mf4_t op1) {
+  return __riscv_vlmul_ext_v_i16mf4_i16mf2(op1);
+}
+
+vint16m1_t test_vlmul_ext_v_i16mf4_i16m1(vint16mf4_t op1) {
+  return __riscv_vlmul_ext_v_i16mf4_i16m1(op1);
+}
+
+vint16m2_t test_vlmul_ext_v_i16mf4_i16m2(vint16mf4_t op1) {
+  return __riscv_vlmul_ext_v_i16mf4_i16m2(op1);
+}
+
+vint16m4_t test_vlmul_ext_v_i16mf4_i16m4(vint16mf4_t op1) {
+  return __riscv_vlmul_ext_v_i16mf4_i16m4(op1);
+}
+
+vint16m8_t test_vlmul_ext_v_i16mf4_i16m8(vint16mf4_t op1) {
+  return __riscv_vlmul_ext_v_i16mf4_i16m8(op1);
+}
+
+vint16m1_t test_vlmul_ext_v_i16mf2_i16m1(vint16mf2_t op1) {
+  return __riscv_vlmul_ext_v_i16mf2_i16m1(op1);
+}
+
+vint16m2_t test_vlmul_ext_v_i16mf2_i16m2(vint16mf2_t op1) {
+  return __riscv_vlmul_ext_v_i16mf2_i16m2(op1);
+}
+
+vint16m4_t test_vlmul_ext_v_i16mf2_i16m4(vint16mf2_t op1) {
+  return __riscv_vlmul_ext_v_i16mf2_i16m4(op1);
+}
+
+vint16m8_t test_vlmul_ext_v_i16mf2_i16m8(vint16mf2_t op1) {
+  return __riscv_vlmul_ext_v_i16mf2_i16m8(op1);
+}
+
+vint16m2_t test_vlmul_ext_v_i16m1_i16m2(vint16m1_t op1) {
+  return __riscv_vlmul_ext_v_i16m1_i16m2(op1);
+}
+
+vint16m4_t test_vlmul_ext_v_i16m1_i16m4(vint16m1_t op1) {
+  return __riscv_vlmul_ext_v_i16m1_i16m4(op1);
+}
+
+vint16m8_t test_vlmul_ext_v_i16m1_i16m8(vint16m1_t op1) {
+  return __riscv_vlmul_ext_v_i16m1_i16m8(op1);
+}
+
+vint16m4_t test_vlmul_ext_v_i16m2_i16m4(vint16m2_t op1) {
+  return __riscv_vlmul_ext_v_i16m2_i16m4(op1);
+}
+
+vint16m8_t test_vlmul_ext_v_i16m2_i16m8(vint16m2_t op1) {
+  return __riscv_vlmul_ext_v_i16m2_i16m8(op1);
+}
+
+vint16m8_t test_vlmul_ext_v_i16m4_i16m8(vint16m4_t op1) {
+  return __riscv_vlmul_ext_v_i16m4_i16m8(op1);
+}
+
+vint32m1_t test_vlmul_ext_v_i32mf2_i32m1(vint32mf2_t op1) {
+  return __riscv_vlmul_ext_v_i32mf2_i32m1(op1);
+}
+
+vint32m2_t test_vlmul_ext_v_i32mf2_i32m2(vint32mf2_t op1) {
+  return __riscv_vlmul_ext_v_i32mf2_i32m2(op1);
+}
+
+vint32m4_t test_vlmul_ext_v_i32mf2_i32m4(vint32mf2_t op1) {
+  return __riscv_vlmul_ext_v_i32mf2_i32m4(op1);
+}
+
+vint32m8_t test_vlmul_ext_v_i32mf2_i32m8(vint32mf2_t op1) {
+  return __riscv_vlmul_ext_v_i32mf2_i32m8(op1);
+}
+
+vint32m2_t test_vlmul_ext_v_i32m1_i32m2(vint32m1_t op1) {
+  return __riscv_vlmul_ext_v_i32m1_i32m2(op1);
+}
+
+vint32m4_t test_vlmul_ext_v_i32m1_i32m4(vint32m1_t op1) {
+  return __riscv_vlmul_ext_v_i32m1_i32m4(op1);
+}
+
+vint32m8_t test_vlmul_ext_v_i32m1_i32m8(vint32m1_t op1) {
+  return __riscv_vlmul_ext_v_i32m1_i32m8(op1);
+}
+
+vint32m4_t test_vlmul_ext_v_i32m2_i32m4(vint32m2_t op1) {
+  return __riscv_vlmul_ext_v_i32m2_i32m4(op1);
+}
+
+vint32m8_t test_vlmul_ext_v_i32m2_i32m8(vint32m2_t op1) {
+  return __riscv_vlmul_ext_v_i32m2_i32m8(op1);
+}
+
+vint32m8_t test_vlmul_ext_v_i32m4_i32m8(vint32m4_t op1) {
+  return __riscv_vlmul_ext_v_i32m4_i32m8(op1);
+}
+
+vint64m2_t test_vlmul_ext_v_i64m1_i64m2(vint64m1_t op1) {
+  return __riscv_vlmul_ext_v_i64m1_i64m2(op1);
+}
+
+vint64m4_t test_vlmul_ext_v_i64m1_i64m4(vint64m1_t op1) {
+  return __riscv_vlmul_ext_v_i64m1_i64m4(op1);
+}
+
+vint64m8_t test_vlmul_ext_v_i64m1_i64m8(vint64m1_t op1) {
+  return __riscv_vlmul_ext_v_i64m1_i64m8(op1);
+}
+
+vint64m4_t test_vlmul_ext_v_i64m2_i64m4(vint64m2_t op1) {
+  return __riscv_vlmul_ext_v_i64m2_i64m4(op1);
+}
+
+vint64m8_t test_vlmul_ext_v_i64m2_i64m8(vint64m2_t op1) {
+  return __riscv_vlmul_ext_v_i64m2_i64m8(op1);
+}
+
+vint64m8_t test_vlmul_ext_v_i64m4_i64m8(vint64m4_t op1) {
+  return __riscv_vlmul_ext_v_i64m4_i64m8(op1);
+}
+
+vuint8mf4_t test_vlmul_ext_v_u8mf8_u8mf4(vuint8mf8_t op1) {
+  return __riscv_vlmul_ext_v_u8mf8_u8mf4(op1);
+}
+
+vuint8mf2_t test_vlmul_ext_v_u8mf8_u8mf2(vuint8mf8_t op1) {
+  return __riscv_vlmul_ext_v_u8mf8_u8mf2(op1);
+}
+
+vuint8m1_t test_vlmul_ext_v_u8mf8_u8m1(vuint8mf8_t op1) {
+  return __riscv_vlmul_ext_v_u8mf8_u8m1(op1);
+}
+
+vuint8m2_t test_vlmul_ext_v_u8mf8_u8m2(vuint8mf8_t op1) {
+  return __riscv_vlmul_ext_v_u8mf8_u8m2(op1);
+}
+
+vuint8m4_t test_vlmul_ext_v_u8mf8_u8m4(vuint8mf8_t op1) {
+  return __riscv_vlmul_ext_v_u8mf8_u8m4(op1);
+}
+
+vuint8m8_t test_vlmul_ext_v_u8mf8_u8m8(vuint8mf8_t op1) {
+  return __riscv_vlmul_ext_v_u8mf8_u8m8(op1);
+}
+
+vuint8mf2_t test_vlmul_ext_v_u8mf4_u8mf2(vuint8mf4_t op1) {
+  return __riscv_vlmul_ext_v_u8mf4_u8mf2(op1);
+}
+
+vuint8m1_t test_vlmul_ext_v_u8mf4_u8m1(vuint8mf4_t op1) {
+  return __riscv_vlmul_ext_v_u8mf4_u8m1(op1);
+}
+
+vuint8m2_t test_vlmul_ext_v_u8mf4_u8m2(vuint8mf4_t op1) {
+  return __riscv_vlmul_ext_v_u8mf4_u8m2(op1);
+}
+
+vuint8m4_t test_vlmul_ext_v_u8mf4_u8m4(vuint8mf4_t op1) {
+  return __riscv_vlmul_ext_v_u8mf4_u8m4(op1);
+}
+
+vuint8m8_t test_vlmul_ext_v_u8mf4_u8m8(vuint8mf4_t op1) {
+  return __riscv_vlmul_ext_v_u8mf4_u8m8(op1);
+}
+
+vuint8m1_t test_vlmul_ext_v_u8mf2_u8m1(vuint8mf2_t op1) {
+  return __riscv_vlmul_ext_v_u8mf2_u8m1(op1);
+}
+
+vuint8m2_t test_vlmul_ext_v_u8mf2_u8m2(vuint8mf2_t op1) {
+  return __riscv_vlmul_ext_v_u8mf2_u8m2(op1);
+}
+
+vuint8m4_t test_vlmul_ext_v_u8mf2_u8m4(vuint8mf2_t op1) {
+  return __riscv_vlmul_ext_v_u8mf2_u8m4(op1);
+}
+
+vuint8m8_t test_vlmul_ext_v_u8mf2_u8m8(vuint8mf2_t op1) {
+  return __riscv_vlmul_ext_v_u8mf2_u8m8(op1);
+}
+
+vuint8m2_t test_vlmul_ext_v_u8m1_u8m2(vuint8m1_t op1) {
+  return __riscv_vlmul_ext_v_u8m1_u8m2(op1);
+}
+
+vuint8m4_t test_vlmul_ext_v_u8m1_u8m4(vuint8m1_t op1) {
+  return __riscv_vlmul_ext_v_u8m1_u8m4(op1);
+}
+
+vuint8m8_t test_vlmul_ext_v_u8m1_u8m8(vuint8m1_t op1) {
+  return __riscv_vlmul_ext_v_u8m1_u8m8(op1);
+}
+
+vuint8m4_t test_vlmul_ext_v_u8m2_u8m4(vuint8m2_t op1) {
+  return __riscv_vlmul_ext_v_u8m2_u8m4(op1);
+}
+
+vuint8m8_t test_vlmul_ext_v_u8m2_u8m8(vuint8m2_t op1) {
+  return __riscv_vlmul_ext_v_u8m2_u8m8(op1);
+}
+
+vuint8m8_t test_vlmul_ext_v_u8m4_u8m8(vuint8m4_t op1) {
+  return __riscv_vlmul_ext_v_u8m4_u8m8(op1);
+}
+
+vuint16mf2_t test_vlmul_ext_v_u16mf4_u16mf2(vuint16mf4_t op1) {
+  return __riscv_vlmul_ext_v_u16mf4_u16mf2(op1);
+}
+
+vuint16m1_t test_vlmul_ext_v_u16mf4_u16m1(vuint16mf4_t op1) {
+  return __riscv_vlmul_ext_v_u16mf4_u16m1(op1);
+}
+
+vuint16m2_t test_vlmul_ext_v_u16mf4_u16m2(vuint16mf4_t op1) {
+  return __riscv_vlmul_ext_v_u16mf4_u16m2(op1);
+}
+
+vuint16m4_t test_vlmul_ext_v_u16mf4_u16m4(vuint16mf4_t op1) {
+  return __riscv_vlmul_ext_v_u16mf4_u16m4(op1);
+}
+
+vuint16m8_t test_vlmul_ext_v_u16mf4_u16m8(vuint16mf4_t op1) {
+  return __riscv_vlmul_ext_v_u16mf4_u16m8(op1);
+}
+
+vuint16m1_t test_vlmul_ext_v_u16mf2_u16m1(vuint16mf2_t op1) {
+  return __riscv_vlmul_ext_v_u16mf2_u16m1(op1);
+}
+
+vuint16m2_t test_vlmul_ext_v_u16mf2_u16m2(vuint16mf2_t op1) {
+  return __riscv_vlmul_ext_v_u16mf2_u16m2(op1);
+}
+
+vuint16m4_t test_vlmul_ext_v_u16mf2_u16m4(vuint16mf2_t op1) {
+  return __riscv_vlmul_ext_v_u16mf2_u16m4(op1);
+}
+
+vuint16m8_t test_vlmul_ext_v_u16mf2_u16m8(vuint16mf2_t op1) {
+  return __riscv_vlmul_ext_v_u16mf2_u16m8(op1);
+}
+
+vuint16m2_t test_vlmul_ext_v_u16m1_u16m2(vuint16m1_t op1) {
+  return __riscv_vlmul_ext_v_u16m1_u16m2(op1);
+}
+
+vuint16m4_t test_vlmul_ext_v_u16m1_u16m4(vuint16m1_t op1) {
+  return __riscv_vlmul_ext_v_u16m1_u16m4(op1);
+}
+
+vuint16m8_t test_vlmul_ext_v_u16m1_u16m8(vuint16m1_t op1) {
+  return __riscv_vlmul_ext_v_u16m1_u16m8(op1);
+}
+
+vuint16m4_t test_vlmul_ext_v_u16m2_u16m4(vuint16m2_t op1) {
+  return __riscv_vlmul_ext_v_u16m2_u16m4(op1);
+}
+
+vuint16m8_t test_vlmul_ext_v_u16m2_u16m8(vuint16m2_t op1) {
+  return __riscv_vlmul_ext_v_u16m2_u16m8(op1);
+}
+
+vuint16m8_t test_vlmul_ext_v_u16m4_u16m8(vuint16m4_t op1) {
+  return __riscv_vlmul_ext_v_u16m4_u16m8(op1);
+}
+
+vuint32m1_t test_vlmul_ext_v_u32mf2_u32m1(vuint32mf2_t op1) {
+  return __riscv_vlmul_ext_v_u32mf2_u32m1(op1);
+}
+
+vuint32m2_t test_vlmul_ext_v_u32mf2_u32m2(vuint32mf2_t op1) {
+  return __riscv_vlmul_ext_v_u32mf2_u32m2(op1);
+}
+
+vuint32m4_t test_vlmul_ext_v_u32mf2_u32m4(vuint32mf2_t op1) {
+  return __riscv_vlmul_ext_v_u32mf2_u32m4(op1);
+}
+
+vuint32m8_t test_vlmul_ext_v_u32mf2_u32m8(vuint32mf2_t op1) {
+  return __riscv_vlmul_ext_v_u32mf2_u32m8(op1);
+}
+
+vuint32m2_t test_vlmul_ext_v_u32m1_u32m2(vuint32m1_t op1) {
+  return __riscv_vlmul_ext_v_u32m1_u32m2(op1);
+}
+
+vuint32m4_t test_vlmul_ext_v_u32m1_u32m4(vuint32m1_t op1) {
+  return __riscv_vlmul_ext_v_u32m1_u32m4(op1);
+}
+
+vuint32m8_t test_vlmul_ext_v_u32m1_u32m8(vuint32m1_t op1) {
+  return __riscv_vlmul_ext_v_u32m1_u32m8(op1);
+}
+
+vuint32m4_t test_vlmul_ext_v_u32m2_u32m4(vuint32m2_t op1) {
+  return __riscv_vlmul_ext_v_u32m2_u32m4(op1);
+}
+
+vuint32m8_t test_vlmul_ext_v_u32m2_u32m8(vuint32m2_t op1) {
+  return __riscv_vlmul_ext_v_u32m2_u32m8(op1);
+}
+
+vuint32m8_t test_vlmul_ext_v_u32m4_u32m8(vuint32m4_t op1) {
+  return __riscv_vlmul_ext_v_u32m4_u32m8(op1);
+}
+
+vuint64m2_t test_vlmul_ext_v_u64m1_u64m2(vuint64m1_t op1) {
+  return __riscv_vlmul_ext_v_u64m1_u64m2(op1);
+}
+
+vuint64m4_t test_vlmul_ext_v_u64m1_u64m4(vuint64m1_t op1) {
+  return __riscv_vlmul_ext_v_u64m1_u64m4(op1);
+}
+
+vuint64m8_t test_vlmul_ext_v_u64m1_u64m8(vuint64m1_t op1) {
+  return __riscv_vlmul_ext_v_u64m1_u64m8(op1);
+}
+
+vuint64m4_t test_vlmul_ext_v_u64m2_u64m4(vuint64m2_t op1) {
+  return __riscv_vlmul_ext_v_u64m2_u64m4(op1);
+}
+
+vuint64m8_t test_vlmul_ext_v_u64m2_u64m8(vuint64m2_t op1) {
+  return __riscv_vlmul_ext_v_u64m2_u64m8(op1);
+}
+
+vuint64m8_t test_vlmul_ext_v_u64m4_u64m8(vuint64m4_t op1) {
+  return __riscv_vlmul_ext_v_u64m4_u64m8(op1);
+}
+