@@ -31,8 +31,8 @@
(match_operand 3 "const_0_operand")]
"TARGET_VECTOR"
{
- riscv_vector::emit_len_op (code_for_pred_mov (<MODE>mode), operands[0],
- operands[1], operands[2], <VM>mode);
+ riscv_vector::emit_nonvlmax_tany_many (code_for_pred_mov (<MODE>mode),
+ RVV_UNOP_NUM, operands);
DONE;
})
@@ -43,8 +43,8 @@
(match_operand 3 "const_0_operand")]
"TARGET_VECTOR"
{
- riscv_vector::emit_len_op (code_for_pred_mov (<MODE>mode), operands[0],
- operands[1], operands[2], <VM>mode);
+ riscv_vector::emit_nonvlmax_tany_many (code_for_pred_mov (<MODE>mode),
+ RVV_UNOP_NUM, operands);
DONE;
})
@@ -118,21 +118,8 @@
(match_operand:VI 2 "<binop_rhs2_predicate>")))]
"TARGET_VECTOR"
{
- if (!register_operand (operands[2], <MODE>mode))
- {
- rtx cst;
- gcc_assert (const_vec_duplicate_p(operands[2], &cst));
- riscv_vector::emit_len_binop (code_for_pred_scalar
- (<CODE>, <MODE>mode),
- operands[0], operands[1], cst,
- NULL, <VM>mode,
- <VEL>mode);
- }
- else
- riscv_vector::emit_len_binop (code_for_pred
- (<CODE>, <MODE>mode),
- operands[0], operands[1], operands[2],
- NULL, <VM>mode);
+ riscv_vector::emit_vlmax_tany_many (code_for_pred (<CODE>, <MODE>mode),
+ RVV_BINOP_NUM, operands);
DONE;
})
@@ -151,12 +138,9 @@
(match_operand:<VEL> 2 "csr_operand")))]
"TARGET_VECTOR"
{
- if (!CONST_SCALAR_INT_P (operands[2]))
- operands[2] = gen_lowpart (Pmode, operands[2]);
- riscv_vector::emit_len_binop (code_for_pred_scalar
- (<CODE>, <MODE>mode),
- operands[0], operands[1], operands[2],
- NULL_RTX, <VM>mode, Pmode);
+ operands[2] = gen_lowpart (Pmode, operands[2]);
+ riscv_vector::emit_vlmax_tany_many (code_for_pred_scalar (<CODE>, <MODE>mode),
+ RVV_BINOP_NUM, operands);
DONE;
})
@@ -174,9 +158,7 @@
(match_operand:VI 2 "vector_shift_operand")))]
"TARGET_VECTOR"
{
- riscv_vector::emit_len_binop (code_for_pred
- (<CODE>, <MODE>mode),
- operands[0], operands[1], operands[2],
- NULL_RTX, <VM>mode);
+ riscv_vector::emit_vlmax_tany_many (code_for_pred (<CODE>, <MODE>mode),
+ RVV_BINOP_NUM, operands);
DONE;
})
@@ -132,6 +132,9 @@ namespace riscv_vector {
#define RVV_VUNDEF(MODE) \
gen_rtx_UNSPEC (MODE, gen_rtvec (1, gen_rtx_REG (SImode, X0_REGNUM)), \
UNSPEC_VUNDEF)
+#define RVV_MISC_OP_NUM 1
+#define RVV_UNOP_NUM 2
+#define RVV_BINOP_NUM 3
enum vlmul_type
{
LMUL_1 = 0,
@@ -163,14 +166,11 @@ rtx expand_builtin (unsigned int, tree, rtx);
bool check_builtin_call (location_t, vec<location_t>, unsigned int,
tree, unsigned int, tree *);
bool const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT, HOST_WIDE_INT);
-bool legitimize_move (rtx, rtx, machine_mode);
+bool legitimize_move (rtx, rtx);
void emit_vlmax_vsetvl (machine_mode, rtx);
void emit_hard_vlmax_vsetvl (machine_mode, rtx);
-void emit_vlmax_op (unsigned, rtx, rtx, machine_mode);
-void emit_vlmax_reg_op (unsigned, rtx, rtx, rtx, machine_mode);
-void emit_len_op (unsigned, rtx, rtx, rtx, machine_mode);
-void emit_len_binop (unsigned, rtx, rtx, rtx, rtx, machine_mode,
- machine_mode = VOIDmode);
+void emit_vlmax_tany_many (unsigned, int, rtx *);
+void emit_nonvlmax_tany_many (unsigned, int, rtx *);
enum vlmul_type get_vlmul (machine_mode);
unsigned int get_ratio (machine_mode);
unsigned int get_nf (machine_mode);
@@ -202,7 +202,7 @@ bool neg_simm5_p (rtx);
#ifdef RTX_CODE
bool has_vi_variant_p (rtx_code, rtx);
#endif
-bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode, machine_mode,
+bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
bool, void (*)(rtx *, rtx));
rtx gen_scalar_move_mask (machine_mode);
@@ -218,7 +218,7 @@ enum vlen_enum
bool slide1_sew64_helper (int, machine_mode, machine_mode,
machine_mode, rtx *);
rtx gen_avl_for_scalar_move (rtx);
-void expand_tuple_move (machine_mode, rtx *);
+void expand_tuple_move (rtx *);
machine_mode preferred_simd_mode (scalar_mode);
opt_machine_mode get_mask_mode (machine_mode);
void expand_vec_series (rtx, rtx, rtx);
@@ -66,7 +66,29 @@ const_vlmax_p (machine_mode mode)
template <int MAX_OPERANDS> class insn_expander
{
public:
- insn_expander () : m_opno (0), m_has_dest_p(false) {}
+ insn_expander ()
+ : m_opno (0), m_op_num (0), m_has_dest_p (false),
+ m_use_all_trues_mask_p (false), m_use_undef_merge_p (false),
+ m_has_avl_p (false), m_vlmax_p (false), m_has_tail_policy_p (false),
+ m_has_mask_policy_p (false), m_tail_policy (TAIL_ANY),
+ m_mask_policy (MASK_ANY), m_dest_mode (VOIDmode), m_mask_mode (VOIDmode)
+ {}
+
+ /* Initializer for various configurations. */
+ insn_expander (int op_num, bool has_dest_p, bool use_all_trues_mask_p,
+ bool use_undef_merge_p, bool has_avl_p, bool vlmax_p,
+ bool has_tail_policy_p, bool has_mask_policy_p,
+ enum tail_policy tail_policy, enum mask_policy mask_policy,
+ machine_mode dest_mode, machine_mode mask_mode)
+ : m_opno (0), m_op_num (op_num), m_has_dest_p (has_dest_p),
+ m_use_all_trues_mask_p (use_all_trues_mask_p),
+ m_use_undef_merge_p (use_undef_merge_p), m_has_avl_p (has_avl_p),
+ m_vlmax_p (vlmax_p), m_has_tail_policy_p (has_tail_policy_p),
+ m_has_mask_policy_p (has_mask_policy_p), m_tail_policy (tail_policy),
+ m_mask_policy (mask_policy), m_dest_mode (dest_mode),
+ m_mask_mode (mask_mode)
+ {}
+
void add_output_operand (rtx x, machine_mode mode)
{
create_output_operand (&m_ops[m_opno++], x, mode);
@@ -77,67 +99,94 @@ public:
create_input_operand (&m_ops[m_opno++], x, mode);
gcc_assert (m_opno <= MAX_OPERANDS);
}
- void add_all_one_mask_operand (machine_mode mode)
+ void add_all_one_mask_operand ()
{
- add_input_operand (CONSTM1_RTX (mode), mode);
+ add_input_operand (CONSTM1_RTX (m_mask_mode), m_mask_mode);
}
- void add_vundef_operand (machine_mode mode)
+ void add_vundef_operand ()
{
- add_input_operand (RVV_VUNDEF (mode), mode);
+ add_input_operand (RVV_VUNDEF (m_dest_mode), m_dest_mode);
}
- void add_policy_operand (enum tail_policy vta, enum mask_policy vma)
+ void add_policy_operand ()
{
- rtx tail_policy_rtx = gen_int_mode (vta, Pmode);
- rtx mask_policy_rtx = gen_int_mode (vma, Pmode);
- add_input_operand (tail_policy_rtx, Pmode);
- add_input_operand (mask_policy_rtx, Pmode);
+ if (m_has_tail_policy_p)
+ {
+ rtx tail_policy_rtx = gen_int_mode (m_tail_policy, Pmode);
+ add_input_operand (tail_policy_rtx, Pmode);
+ }
+ if (m_has_mask_policy_p)
+ {
+ rtx mask_policy_rtx = gen_int_mode (m_mask_policy, Pmode);
+ add_input_operand (mask_policy_rtx, Pmode);
+ }
}
void add_avl_type_operand (avl_type type)
{
add_input_operand (gen_int_mode (type, Pmode), Pmode);
}
- void set_dest_and_mask (rtx mask, rtx dest, machine_mode mask_mode)
+ void emit_insn (enum insn_code icode, rtx *ops)
{
- m_dest_mode = GET_MODE (dest);
- m_has_dest_p = true;
-
- add_output_operand (dest, m_dest_mode);
-
- if (mask)
- add_input_operand (mask, GET_MODE (mask));
- else
- add_all_one_mask_operand (mask_mode);
-
- add_vundef_operand (m_dest_mode);
- }
+ int opno = 0;
+    /* It's true if any operand is a memory operand.  */
+ bool any_mem_p = false;
+    /* It's true if all operands are mask operands.  */
+ bool all_mask_p = true;
+ if (m_has_dest_p)
+ {
+ any_mem_p |= MEM_P (ops[opno]);
+ all_mask_p &= GET_MODE_CLASS (GET_MODE (ops[opno])) == MODE_VECTOR_BOOL;
+ add_output_operand (ops[opno++], m_dest_mode);
+ }
- void set_len_and_policy (rtx len, bool force_vlmax = false)
- {
- bool vlmax_p = force_vlmax || !len;
- gcc_assert (m_has_dest_p);
+ if (m_use_all_trues_mask_p)
+ add_all_one_mask_operand ();
- if (vlmax_p && const_vlmax_p (m_dest_mode))
- {
- /* Optimize VLS-VLMAX code gen, we can use vsetivli instead of the
- vsetvli to obtain the value of vlmax. */
- poly_uint64 nunits = GET_MODE_NUNITS (m_dest_mode);
- len = gen_int_mode (nunits, Pmode);
- vlmax_p = false; /* It has became NONVLMAX now. */
- }
- else if (!len)
- {
- len = gen_reg_rtx (Pmode);
- emit_vlmax_vsetvl (m_dest_mode, len);
- }
+ if (m_use_undef_merge_p)
+ add_vundef_operand ();
- add_input_operand (len, Pmode);
+ for (; opno < m_op_num; opno++)
+ {
+ any_mem_p |= MEM_P (ops[opno]);
+ all_mask_p &= GET_MODE_CLASS (GET_MODE (ops[opno])) == MODE_VECTOR_BOOL;
+ machine_mode mode = insn_data[(int) icode].operand[m_opno].mode;
+	/* create_input_operand doesn't allow VOIDmode.
+ According to vector.md, we may have some patterns that do not have
+ explicit machine mode specifying the operand. Such operands are
+ always Pmode. */
+ if (mode == VOIDmode)
+ mode = Pmode;
+ add_input_operand (ops[opno], mode);
+ }
- if (GET_MODE_CLASS (m_dest_mode) != MODE_VECTOR_BOOL)
- add_policy_operand (get_prefer_tail_policy (), get_prefer_mask_policy ());
+ if (m_has_avl_p)
+ {
+ rtx len = ops[m_op_num];
+ if (m_vlmax_p)
+ {
+ if (const_vlmax_p (m_dest_mode))
+ {
+ /* Optimize VLS-VLMAX code gen, we can use vsetivli instead of
+ the vsetvli to obtain the value of vlmax. */
+ poly_uint64 nunits = GET_MODE_NUNITS (m_dest_mode);
+ len = gen_int_mode (nunits, Pmode);
+		m_vlmax_p = false; /* It has become NONVLMAX now.  */
+ }
+ else if (can_create_pseudo_p ())
+ {
+ len = gen_reg_rtx (Pmode);
+ emit_vlmax_vsetvl (m_dest_mode, len);
+ }
+ }
+ add_input_operand (len, Pmode);
+ }
- add_avl_type_operand (vlmax_p ? avl_type::VLMAX : avl_type::NONVLMAX);
- }
+ if (!all_mask_p)
+ add_policy_operand ();
+ if (m_has_avl_p)
+ add_avl_type_operand (m_vlmax_p ? avl_type::VLMAX : avl_type::NONVLMAX);
+ expand (icode, any_mem_p);
+ }
void expand (enum insn_code icode, bool temporary_volatile_p = false)
{
@@ -152,8 +201,23 @@ public:
private:
int m_opno;
+ int m_op_num;
+  /* It's true when the pattern has a dest operand.  Most of the patterns have
+     a dest operand whereas some patterns like STOREs do not have a dest
+     operand.  */
bool m_has_dest_p;
+  /* It's true if the pattern uses an all-trues mask operand.  */
+ bool m_use_all_trues_mask_p;
+ /* It's true if the pattern uses undefined merge operand. */
+ bool m_use_undef_merge_p;
+ bool m_has_avl_p;
+ bool m_vlmax_p;
+ bool m_has_tail_policy_p;
+ bool m_has_mask_policy_p;
+ enum tail_policy m_tail_policy;
+ enum mask_policy m_mask_policy;
machine_mode m_dest_mode;
+ machine_mode m_mask_mode;
expand_operand m_ops[MAX_OPERANDS];
};
@@ -246,49 +310,6 @@ autovec_use_vlmax_p (void)
|| riscv_autovec_preference == RVV_FIXED_VLMAX);
}
-/* Emit an RVV unmask && vl mov from SRC to DEST. */
-static void
-emit_pred_op (unsigned icode, rtx mask, rtx dest, rtx src, rtx len,
- machine_mode mask_mode, bool force_vlmax = false)
-{
- insn_expander<8> e;
- e.set_dest_and_mask (mask, dest, mask_mode);
-
- e.add_input_operand (src, GET_MODE (src));
-
- e.set_len_and_policy (len, force_vlmax);
-
- e.expand ((enum insn_code) icode, MEM_P (dest) || MEM_P (src));
-}
-
-/* Emit an RVV binop. If one of SRC1 and SRC2 is a scalar operand, its mode is
- specified using SCALAR_MODE. */
-static void
-emit_pred_binop (unsigned icode, rtx mask, rtx dest, rtx src1, rtx src2,
- rtx len, machine_mode mask_mode,
- machine_mode scalar_mode = VOIDmode)
-{
- insn_expander<9> e;
- e.set_dest_and_mask (mask, dest, mask_mode);
-
- gcc_assert (VECTOR_MODE_P (GET_MODE (src1))
- || VECTOR_MODE_P (GET_MODE (src2)));
-
- if (VECTOR_MODE_P (GET_MODE (src1)))
- e.add_input_operand (src1, GET_MODE (src1));
- else
- e.add_input_operand (src1, scalar_mode);
-
- if (VECTOR_MODE_P (GET_MODE (src2)))
- e.add_input_operand (src2, GET_MODE (src2));
- else
- e.add_input_operand (src2, scalar_mode);
-
- e.set_len_and_policy (len);
-
- e.expand ((enum insn_code) icode, MEM_P (dest) || MEM_P (src1) || MEM_P (src2));
-}
-
/* The RISC-V vsetvli pass uses "known vlmax" operations for optimization.
Whether or not an instruction actually is a vlmax operation is not
recognizable from the length operand alone but the avl_type operand
@@ -305,52 +326,42 @@ emit_pred_binop (unsigned icode, rtx mask, rtx dest, rtx src1, rtx src2,
For that case we also allow to set the avl_type to VLMAX.
*/
-/* This function emits a VLMAX vsetvli followed by the actual operation. */
+/* This function emits a {VLMAX, TAIL_ANY, MASK_ANY} vsetvli followed by the
+ * actual operation. */
void
-emit_vlmax_op (unsigned icode, rtx dest, rtx src, machine_mode mask_mode)
+emit_vlmax_tany_many (unsigned icode, int op_num, rtx *ops)
{
- emit_pred_op (icode, NULL_RTX, dest, src, NULL_RTX, mask_mode);
+ machine_mode data_mode = GET_MODE (ops[0]);
+ machine_mode mask_mode = get_mask_mode (data_mode).require ();
+  /* The number = 11 is because we have a maximum of 11 operands for
+ RVV instruction patterns according to vector.md. */
+ insn_expander<11> e (/*OP_NUM*/ op_num, /*HAS_DEST_P*/ true,
+ /*USE_ALL_TRUES_MASK_P*/ true,
+ /*USE_UNDEF_MERGE_P*/ true, /*HAS_AVL_P*/ true,
+ /*VLMAX_P*/ true,
+ /*HAS_TAIL_POLICY_P*/ true, /*HAS_MASK_POLICY_P*/ true,
+ /*TAIL_POLICY*/ TAIL_ANY, /*MASK_POLICY*/ MASK_ANY,
+ /*DEST_MODE*/ data_mode, /*MASK_MODE*/ mask_mode);
+ e.emit_insn ((enum insn_code) icode, ops);
}
-/* This function emits an operation with a given LEN that is determined
- by a previously emitted VLMAX vsetvli. */
+/* This function emits a {NONVLMAX, TAIL_ANY, MASK_ANY} vsetvli followed by the
+ * actual operation. */
void
-emit_len_op (unsigned icode, rtx dest, rtx src, rtx len,
- machine_mode mask_mode)
+emit_nonvlmax_tany_many (unsigned icode, int op_num, rtx *ops)
{
- emit_pred_op (icode, NULL_RTX, dest, src, len, mask_mode);
-}
-
-/* This function emits an operation with a given LEN that is known to be
- a preceding VLMAX. It also sets the VLMAX flag which allows further
- optimization in the vsetvli pass. */
-void
-emit_vlmax_reg_op (unsigned icode, rtx dest, rtx src, rtx len,
- machine_mode mask_mode)
-{
- emit_pred_op (icode, NULL_RTX, dest, src, len, mask_mode,
- /* Force VLMAX */ true);
-}
-
-void
-emit_len_binop (unsigned icode, rtx dest, rtx src1, rtx src2, rtx len,
- machine_mode mask_mode, machine_mode scalar_mode)
-{
- emit_pred_binop (icode, NULL_RTX, dest, src1, src2, len,
- mask_mode, scalar_mode);
-}
-
-/* Emit vid.v instruction. */
-
-static void
-emit_index_op (rtx dest, machine_mode mask_mode)
-{
- insn_expander<7> e;
- e.set_dest_and_mask (NULL, dest, mask_mode);
-
- e.set_len_and_policy (NULL, true);
-
- e.expand (code_for_pred_series (GET_MODE (dest)), false);
+ machine_mode data_mode = GET_MODE (ops[0]);
+ machine_mode mask_mode = get_mask_mode (data_mode).require ();
+  /* The number = 11 is because we have a maximum of 11 operands for
+ RVV instruction patterns according to vector.md. */
+ insn_expander<11> e (/*OP_NUM*/ op_num, /*HAS_DEST_P*/ true,
+ /*USE_ALL_TRUES_MASK_P*/ true,
+ /*USE_UNDEF_MERGE_P*/ true, /*HAS_AVL_P*/ true,
+ /*VLMAX_P*/ false,
+ /*HAS_TAIL_POLICY_P*/ true, /*HAS_MASK_POLICY_P*/ true,
+ /*TAIL_POLICY*/ TAIL_ANY, /*MASK_POLICY*/ MASK_ANY,
+ /*DEST_MODE*/ data_mode, /*MASK_MODE*/ mask_mode);
+ e.emit_insn ((enum insn_code) icode, ops);
}
/* Expand series const vector. */
@@ -359,7 +370,6 @@ void
expand_vec_series (rtx dest, rtx base, rtx step)
{
machine_mode mode = GET_MODE (dest);
- machine_mode inner_mode = GET_MODE_INNER (mode);
machine_mode mask_mode;
gcc_assert (get_mask_mode (mode).exists (&mask_mode));
@@ -367,7 +377,8 @@ expand_vec_series (rtx dest, rtx base, rtx step)
/* Step 1: Generate I = { 0, 1, 2, ... } by vid.v. */
rtx vid = gen_reg_rtx (mode);
- emit_index_op (vid, mask_mode);
+ rtx op[1] = {vid};
+ emit_vlmax_tany_many (code_for_pred_series (mode), RVV_MISC_OP_NUM, op);
/* Step 2: Generate I * STEP.
- STEP is 1, we don't emit any instructions.
@@ -385,14 +396,14 @@ expand_vec_series (rtx dest, rtx base, rtx step)
int shift = exact_log2 (INTVAL (step));
rtx shift_amount = gen_int_mode (shift, Pmode);
insn_code icode = code_for_pred_scalar (ASHIFT, mode);
- emit_len_binop (icode, step_adj, vid, shift_amount,
- NULL, mask_mode, Pmode);
+ rtx ops[3] = {step_adj, vid, shift_amount};
+ emit_vlmax_tany_many (icode, RVV_BINOP_NUM, ops);
}
else
{
insn_code icode = code_for_pred_scalar (MULT, mode);
- emit_len_binop (icode, step_adj, vid, step,
- NULL, mask_mode, inner_mode);
+ rtx ops[3] = {step_adj, vid, step};
+ emit_vlmax_tany_many (icode, RVV_BINOP_NUM, ops);
}
}
@@ -407,14 +418,14 @@ expand_vec_series (rtx dest, rtx base, rtx step)
{
rtx result = gen_reg_rtx (mode);
insn_code icode = code_for_pred_scalar (PLUS, mode);
- emit_len_binop (icode, result, step_adj, base,
- NULL, mask_mode, inner_mode);
+ rtx ops[3] = {result, step_adj, base};
+ emit_vlmax_tany_many (icode, RVV_BINOP_NUM, ops);
emit_move_insn (dest, result);
}
}
static void
-expand_const_vector (rtx target, rtx src, machine_mode mask_mode)
+expand_const_vector (rtx target, rtx src)
{
machine_mode mode = GET_MODE (target);
scalar_mode elt_mode = GET_MODE_INNER (mode);
@@ -424,7 +435,8 @@ expand_const_vector (rtx target, rtx src, machine_mode mask_mode)
gcc_assert (
const_vec_duplicate_p (src, &elt)
&& (rtx_equal_p (elt, const0_rtx) || rtx_equal_p (elt, const1_rtx)));
- emit_vlmax_op (code_for_pred_mov (mode), target, src, mask_mode);
+ rtx ops[2] = {target, src};
+ emit_vlmax_tany_many (code_for_pred_mov (mode), RVV_UNOP_NUM, ops);
return;
}
@@ -435,10 +447,16 @@ expand_const_vector (rtx target, rtx src, machine_mode mask_mode)
/* Element in range -16 ~ 15 integer or 0.0 floating-point,
we use vmv.v.i instruction. */
if (satisfies_constraint_vi (src) || satisfies_constraint_Wc0 (src))
- emit_vlmax_op (code_for_pred_mov (mode), tmp, src, mask_mode);
+ {
+ rtx ops[2] = {tmp, src};
+ emit_vlmax_tany_many (code_for_pred_mov (mode), RVV_UNOP_NUM, ops);
+ }
else
- emit_vlmax_op (code_for_pred_broadcast (mode), tmp,
- force_reg (elt_mode, elt), mask_mode);
+ {
+ elt = force_reg (elt_mode, elt);
+ rtx ops[2] = {tmp, elt};
+ emit_vlmax_tany_many (code_for_pred_broadcast (mode), RVV_UNOP_NUM, ops);
+ }
if (tmp != target)
emit_move_insn (target, tmp);
@@ -463,12 +481,12 @@ expand_const_vector (rtx target, rtx src, machine_mode mask_mode)
/* Expand a pre-RA RVV data move from SRC to DEST.
It expands move for RVV fractional vector modes. */
bool
-legitimize_move (rtx dest, rtx src, machine_mode mask_mode)
+legitimize_move (rtx dest, rtx src)
{
machine_mode mode = GET_MODE (dest);
if (CONST_VECTOR_P (src))
{
- expand_const_vector (dest, src, mask_mode);
+ expand_const_vector (dest, src);
return true;
}
@@ -505,7 +523,10 @@ legitimize_move (rtx dest, rtx src, machine_mode mask_mode)
{
rtx tmp = gen_reg_rtx (mode);
if (MEM_P (src))
- emit_vlmax_op (code_for_pred_mov (mode), tmp, src, mask_mode);
+ {
+ rtx ops[2] = {tmp, src};
+ emit_vlmax_tany_many (code_for_pred_mov (mode), RVV_UNOP_NUM, ops);
+ }
else
emit_move_insn (tmp, src);
src = tmp;
@@ -514,7 +535,8 @@ legitimize_move (rtx dest, rtx src, machine_mode mask_mode)
if (satisfies_constraint_vu (src))
return false;
- emit_vlmax_op (code_for_pred_mov (mode), dest, src, mask_mode);
+ rtx ops[2] = {dest, src};
+ emit_vlmax_tany_many (code_for_pred_mov (mode), RVV_UNOP_NUM, ops);
return true;
}
@@ -748,8 +770,7 @@ has_vi_variant_p (rtx_code code, rtx x)
bool
sew64_scalar_helper (rtx *operands, rtx *scalar_op, rtx vl,
- machine_mode vector_mode, machine_mode mask_mode,
- bool has_vi_variant_p,
+ machine_mode vector_mode, bool has_vi_variant_p,
void (*emit_vector_func) (rtx *, rtx))
{
machine_mode scalar_mode = GET_MODE_INNER (vector_mode);
@@ -779,8 +800,9 @@ sew64_scalar_helper (rtx *operands, rtx *scalar_op, rtx vl,
*scalar_op = force_reg (scalar_mode, *scalar_op);
rtx tmp = gen_reg_rtx (vector_mode);
- riscv_vector::emit_len_op (code_for_pred_broadcast (vector_mode), tmp,
- *scalar_op, vl, mask_mode);
+ rtx ops[3] = {tmp, *scalar_op, vl};
+ riscv_vector::emit_nonvlmax_tany_many (code_for_pred_broadcast (vector_mode),
+ RVV_UNOP_NUM, ops);
emit_vector_func (operands, tmp);
return true;
@@ -990,7 +1012,7 @@ gen_avl_for_scalar_move (rtx avl)
/* Expand tuple modes data movement for. */
void
-expand_tuple_move (machine_mode mask_mode, rtx *ops)
+expand_tuple_move (rtx *ops)
{
unsigned int i;
machine_mode tuple_mode = GET_MODE (ops[0]);
@@ -1086,8 +1108,11 @@ expand_tuple_move (machine_mode mask_mode, rtx *ops)
rtx mem = gen_rtx_MEM (subpart_mode, ops[3]);
if (fractional_p)
- emit_vlmax_reg_op (code_for_pred_mov (subpart_mode), subreg, mem,
- ops[4], mask_mode);
+ {
+ rtx operands[3] = {subreg, mem, ops[4]};
+ emit_vlmax_tany_many (code_for_pred_mov (subpart_mode),
+ RVV_UNOP_NUM, operands);
+ }
else
emit_move_insn (subreg, mem);
}
@@ -1108,8 +1133,11 @@ expand_tuple_move (machine_mode mask_mode, rtx *ops)
rtx mem = gen_rtx_MEM (subpart_mode, ops[3]);
if (fractional_p)
- emit_vlmax_reg_op (code_for_pred_mov (subpart_mode), mem, subreg,
- ops[4], mask_mode);
+ {
+ rtx operands[3] = {mem, subreg, ops[4]};
+ emit_vlmax_tany_many (code_for_pred_mov (subpart_mode),
+ RVV_UNOP_NUM, operands);
+ }
else
emit_move_insn (mem, subreg);
}
@@ -1230,7 +1258,6 @@ expand_vector_init_insert_elems (rtx target, const rvv_builder &builder,
int nelts_reqd)
{
machine_mode mode = GET_MODE (target);
- scalar_mode elem_mode = GET_MODE_INNER (mode);
machine_mode mask_mode;
gcc_assert (get_mask_mode (mode).exists (&mask_mode));
rtx dup = expand_vector_broadcast (mode, builder.elt (0));
@@ -1241,8 +1268,8 @@ expand_vector_init_insert_elems (rtx target, const rvv_builder &builder,
unsigned int unspec
= FLOAT_MODE_P (mode) ? UNSPEC_VFSLIDE1DOWN : UNSPEC_VSLIDE1DOWN;
insn_code icode = code_for_pred_slide (unspec, mode);
- emit_len_binop (icode, target, target, builder.elt (i), NULL, mask_mode,
- elem_mode);
+ rtx ops[3] = {target, target, builder.elt (i)};
+ emit_vlmax_tany_many (icode, RVV_BINOP_NUM, ops);
}
}
@@ -7389,9 +7389,6 @@ vector_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
{
rtx target = regno_reg_rtx[regno];
machine_mode mode = GET_MODE (target);
- poly_uint16 nunits = GET_MODE_NUNITS (mode);
- machine_mode mask_mode
- = riscv_vector::get_vector_mode (BImode, nunits).require ();
if (!emitted_vlmax_vsetvl)
{
@@ -7399,8 +7396,9 @@ vector_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
emitted_vlmax_vsetvl = true;
}
- riscv_vector::emit_vlmax_reg_op (code_for_pred_mov (mode), target,
- CONST0_RTX (mode), vl, mask_mode);
+ rtx ops[3] = {target, CONST0_RTX (mode), vl};
+ riscv_vector::emit_vlmax_tany_many (code_for_pred_mov (mode),
+ RVV_UNOP_NUM, ops);
SET_HARD_REG_BIT (zeroed_hardregs, regno);
}
@@ -662,7 +662,7 @@
before spilling. The clobber scratch is used by spilling fractional
registers in IRA/LRA so it's too early. */
- if (riscv_vector::legitimize_move (operands[0], operands[1], <VM>mode))
+ if (riscv_vector::legitimize_move (operands[0], operands[1]))
DONE;
})
@@ -718,7 +718,7 @@
(match_operand:VB 1 "general_operand"))]
"TARGET_VECTOR"
{
- if (riscv_vector::legitimize_move (operands[0], operands[1], <MODE>mode))
+ if (riscv_vector::legitimize_move (operands[0], operands[1]))
DONE;
})
@@ -760,9 +760,8 @@
else
{
riscv_vector::emit_vlmax_vsetvl (<V_FRACT:MODE>mode, operands[2]);
- riscv_vector::emit_vlmax_reg_op (code_for_pred_mov (<V_FRACT:MODE>mode),
- operands[0], operands[1], operands[2],
- <VM>mode);
+ riscv_vector::emit_vlmax_tany_many (code_for_pred_mov (<V_FRACT:MODE>mode),
+ RVV_UNOP_NUM, operands);
}
DONE;
})
@@ -781,9 +780,8 @@
else
{
riscv_vector::emit_vlmax_vsetvl (<VB:MODE>mode, operands[2]);
- riscv_vector::emit_vlmax_reg_op (code_for_pred_mov (<VB:MODE>mode),
- operands[0], operands[1], operands[2],
- <VB:MODE>mode);
+ riscv_vector::emit_vlmax_tany_many (code_for_pred_mov (<VB:MODE>mode),
+ RVV_UNOP_NUM, operands);
}
DONE;
})
@@ -806,7 +804,7 @@
if (GET_CODE (operands[1]) == CONST_VECTOR)
{
- riscv_vector::expand_tuple_move (<VM>mode, operands);
+ riscv_vector::expand_tuple_move (operands);
DONE;
}
@@ -826,7 +824,7 @@
"&& reload_completed"
[(const_int 0)]
{
- riscv_vector::expand_tuple_move (<VM>mode, operands);
+ riscv_vector::expand_tuple_move (operands);
DONE;
}
[(set_attr "type" "vmov,vlde,vste")
@@ -846,8 +844,8 @@
(match_operand:<VEL> 1 "direct_broadcast_operand")))]
"TARGET_VECTOR"
{
- riscv_vector::emit_vlmax_op (code_for_pred_broadcast (<MODE>mode),
- operands[0], operands[1], <VM>mode);
+ riscv_vector::emit_vlmax_tany_many (code_for_pred_broadcast (<MODE>mode),
+ RVV_UNOP_NUM, operands);
DONE;
}
)
@@ -1272,7 +1270,6 @@
/* scalar op */&operands[3],
/* vl */operands[5],
<MODE>mode,
- <VM>mode,
riscv_vector::simm5_p (operands[3]),
[] (rtx *operands, rtx boardcast_scalar) {
emit_insn (gen_pred_merge<mode> (operands[0], operands[1],
@@ -1983,7 +1980,6 @@
/* scalar op */&operands[4],
/* vl */operands[5],
<MODE>mode,
- <VM>mode,
riscv_vector::has_vi_variant_p (<CODE>, operands[4]),
[] (rtx *operands, rtx boardcast_scalar) {
emit_insn (gen_pred_<optab><mode> (operands[0], operands[1],
@@ -2059,7 +2055,6 @@
/* scalar op */&operands[4],
/* vl */operands[5],
<MODE>mode,
- <VM>mode,
riscv_vector::has_vi_variant_p (<CODE>, operands[4]),
[] (rtx *operands, rtx boardcast_scalar) {
emit_insn (gen_pred_<optab><mode> (operands[0], operands[1],
@@ -2135,7 +2130,6 @@
/* scalar op */&operands[4],
/* vl */operands[5],
<MODE>mode,
- <VM>mode,
riscv_vector::neg_simm5_p (operands[4]),
[] (rtx *operands, rtx boardcast_scalar) {
emit_insn (gen_pred_sub<mode> (operands[0], operands[1],
@@ -2253,7 +2247,6 @@
/* scalar op */&operands[4],
/* vl */operands[5],
<MODE>mode,
- <VM>mode,
false,
[] (rtx *operands, rtx boardcast_scalar) {
emit_insn (gen_pred_mulh<v_su><mode> (operands[0], operands[1],
@@ -2428,7 +2421,6 @@
/* scalar op */&operands[3],
/* vl */operands[5],
<MODE>mode,
- <VM>mode,
riscv_vector::simm5_p (operands[3]),
[] (rtx *operands, rtx boardcast_scalar) {
emit_insn (gen_pred_adc<mode> (operands[0], operands[1],
@@ -2512,7 +2504,6 @@
/* scalar op */&operands[3],
/* vl */operands[5],
<MODE>mode,
- <VM>mode,
false,
[] (rtx *operands, rtx boardcast_scalar) {
emit_insn (gen_pred_sbc<mode> (operands[0], operands[1],
@@ -2671,7 +2662,6 @@
/* scalar op */&operands[2],
/* vl */operands[4],
<MODE>mode,
- <VM>mode,
riscv_vector::simm5_p (operands[2]),
[] (rtx *operands, rtx boardcast_scalar) {
emit_insn (gen_pred_madc<mode> (operands[0], operands[1],
@@ -2741,7 +2731,6 @@
/* scalar op */&operands[2],
/* vl */operands[4],
<MODE>mode,
- <VM>mode,
false,
[] (rtx *operands, rtx boardcast_scalar) {
emit_insn (gen_pred_msbc<mode> (operands[0], operands[1],
@@ -2884,7 +2873,6 @@
/* scalar op */&operands[2],
/* vl */operands[3],
<MODE>mode,
- <VM>mode,
riscv_vector::simm5_p (operands[2]),
[] (rtx *operands, rtx boardcast_scalar) {
emit_insn (gen_pred_madc<mode>_overflow (operands[0], operands[1],
@@ -2951,7 +2939,6 @@
/* scalar op */&operands[2],
/* vl */operands[3],
<MODE>mode,
- <VM>mode,
false,
[] (rtx *operands, rtx boardcast_scalar) {
emit_insn (gen_pred_msbc<mode>_overflow (operands[0], operands[1],
@@ -3449,7 +3436,6 @@
/* scalar op */&operands[4],
/* vl */operands[5],
<MODE>mode,
- <VM>mode,
riscv_vector::has_vi_variant_p (<CODE>, operands[4]),
[] (rtx *operands, rtx boardcast_scalar) {
emit_insn (gen_pred_<optab><mode> (operands[0], operands[1],
@@ -3531,7 +3517,6 @@
/* scalar op */&operands[4],
/* vl */operands[5],
<MODE>mode,
- <VM>mode,
riscv_vector::has_vi_variant_p (<CODE>, operands[4]),
[] (rtx *operands, rtx boardcast_scalar) {
emit_insn (gen_pred_<optab><mode> (operands[0], operands[1],
@@ -3681,7 +3666,6 @@
/* scalar op */&operands[4],
/* vl */operands[5],
<MODE>mode,
- <VM>mode,
false,
[] (rtx *operands, rtx boardcast_scalar) {
emit_insn (gen_pred_<sat_op><mode> (operands[0], operands[1],
@@ -4141,7 +4125,6 @@
/* scalar op */&operands[5],
/* vl */operands[6],
<MODE>mode,
- <VM>mode,
riscv_vector::has_vi_variant_p (code, operands[5]),
code == LT || code == LTU ?
[] (rtx *operands, rtx boardcast_scalar) {
@@ -4181,7 +4164,6 @@
/* scalar op */&operands[5],
/* vl */operands[6],
<MODE>mode,
- <VM>mode,
riscv_vector::has_vi_variant_p (code, operands[5]),
[] (rtx *operands, rtx boardcast_scalar) {
emit_insn (gen_pred_cmp<mode> (operands[0], operands[1],
@@ -4880,7 +4862,6 @@
/* scalar op */&operands[2],
/* vl */operands[6],
<MODE>mode,
- <VM>mode,
false,
[] (rtx *operands, rtx boardcast_scalar) {
emit_insn (gen_pred_mul_plus<mode> (operands[0], operands[1],
@@ -5301,7 +5282,6 @@
/* scalar op */&operands[2],
/* vl */operands[6],
<MODE>mode,
- <VM>mode,
false,
[] (rtx *operands, rtx boardcast_scalar) {
emit_insn (gen_pred_minus_mul<mode> (operands[0], operands[1],