[1/1,V2,RISC-V] support cm.push cm.pop cm.popret in zcmp

Message ID 20230512090443.34123-2-gaofei@eswincomputing.com
State Accepted
Headers
Series RISC-V: support Zcmp extension |

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

Fei Gao May 12, 2023, 9:04 a.m. UTC
  Zcmp can share the same logic as save-restore in stack allocation: pre-allocation
by cm.push, step 1 and step 2.

please be noted cm.push pushes ra, s0-s11 in reverse order than what save-restore does.
So adaption has been done in .cfi directives in my patch.

gcc/ChangeLog:

        * config/riscv/predicates.md (slot_0_offset_operand): predicates for slot 0 offset.
        (slot_1_offset_operand): likewise
        (slot_2_offset_operand): likewise
        (slot_3_offset_operand): likewise
        (slot_4_offset_operand): likewise
        (slot_5_offset_operand): likewise
        (slot_6_offset_operand): likewise
        (slot_7_offset_operand): likewise
        (slot_8_offset_operand): likewise
        (slot_9_offset_operand): likewise
        (slot_10_offset_operand): likewise
        (slot_11_offset_operand): likewise
        (slot_12_offset_operand): likewise
        (stack_push_up_to_ra_operand): predicates for stack adjust of pushing ra
        (stack_push_up_to_s0_operand): predicates for stack adjust of pushing ra, s0
        (stack_push_up_to_s1_operand): likewise
        (stack_push_up_to_s2_operand): likewise
        (stack_push_up_to_s3_operand): likewise
        (stack_push_up_to_s4_operand): likewise
        (stack_push_up_to_s5_operand): likewise
        (stack_push_up_to_s6_operand): likewise
        (stack_push_up_to_s7_operand): likewise
        (stack_push_up_to_s8_operand): likewise
        (stack_push_up_to_s9_operand): likewise
        (stack_push_up_to_s11_operand): likewise
        (stack_pop_up_to_ra_operand): predicates for stack adjust of poping ra
        (stack_pop_up_to_s0_operand): predicates for stack adjust of poping ra, s0
        (stack_pop_up_to_s1_operand): likewise
        (stack_pop_up_to_s2_operand): likewise
        (stack_pop_up_to_s3_operand): likewise
        (stack_pop_up_to_s4_operand): likewise
        (stack_pop_up_to_s5_operand): likewise
        (stack_pop_up_to_s6_operand): likewise
        (stack_pop_up_to_s7_operand): likewise
        (stack_pop_up_to_s8_operand): likewise
        (stack_pop_up_to_s9_operand): likewise
        (stack_pop_up_to_s11_operand): likewise
        * config/riscv/riscv-protos.h (riscv_zcmp_valid_slot_offset_p): declaration
        (riscv_zcmp_valid_stack_adj_bytes_p): declaration
        * config/riscv/riscv.cc (struct riscv_frame_info): comment change
        (riscv_avoid_multi_push): helper function of riscv_use_multi_push
        (riscv_use_multi_push): true if multi push is used
        (riscv_multi_push_sregs_count): num of sregs in multi-push
        (riscv_multi_push_regs_count): num of regs in multi-push
        (riscv_16bytes_align): align to 16 bytes
        (riscv_stack_align): moved to a better place
        (riscv_save_libcall_count): no functional change
        (riscv_compute_frame_info): add zcmp frame info
        (riscv_adjust_multi_push_cfi_prologue): adjust cfi for cm.push
        (get_slot_offset_rtx): get the rtx of slot to push or pop
        (riscv_gen_multi_push_pop_insn): gen function for multi push and pop
        (riscv_expand_prologue): allocate stack by cm.push
        (riscv_adjust_multi_pop_cfi_epilogue): adjust cfi for cm.pop[ret]
        (riscv_expand_epilogue): allocate stack by cm.pop[ret]
        (zcmp_base_adj): calculate stack adjustment base size
        (zcmp_additional_adj): calculate stack adjustment additional size
        (riscv_zcmp_valid_slot_offset_p): check if offset is valid for a slot
        (riscv_zcmp_valid_stack_adj_bytes_p): check if stack adjustment size is valid
        * config/riscv/riscv.h (RETURN_ADDR_MASK): mask of ra
        (S0_MASK): likewise
        (S1_MASK): likewise
        (S2_MASK): likewise
        (S3_MASK): likewise
        (S4_MASK): likewise
        (S5_MASK): likewise
        (S6_MASK): likewise
        (S7_MASK): likewise
        (S8_MASK): likewise
        (S9_MASK): likewise
        (S10_MASK): likewise
        (S11_MASK): likewise
        (MULTI_PUSH_GPR_MASK): GPR_MASK that cm.push can cover at most
        (ZCMP_MAX_SPIMM): max spimm value
        (ZCMP_SP_INC_STEP): zcmp sp increment step
        (ZCMP_INVALID_S0S10_SREGS_COUNTS): num of s0-s10
        (ZCMP_S0S11_SREGS_COUNTS): num of s0-s11
        (ZCMP_MAX_GRP_SLOTS): max slots of pushing and poping in zcmp
        * config/riscv/riscv.md: include zc.md
        * config/riscv/zc.md: New file. machine description for zcmp

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rv32e_zcmp.c: New test.
        * gcc.target/riscv/rv32i_zcmp.c: New test.
        * gcc.target/riscv/zcmp_stack_alignment.c: New test.
---
 gcc/config/riscv/predicates.md                |  148 +++
 gcc/config/riscv/riscv-protos.h               |    2 +
 gcc/config/riscv/riscv.cc                     |  477 +++++++-
 gcc/config/riscv/riscv.h                      |   23 +
 gcc/config/riscv/riscv.md                     |    2 +
 gcc/config/riscv/zc.md                        | 1042 +++++++++++++++++
 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c   |  239 ++++
 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c   |  239 ++++
 .../gcc.target/riscv/zcmp_stack_alignment.c   |   23 +
 9 files changed, 2155 insertions(+), 40 deletions(-)
 create mode 100644 gcc/config/riscv/zc.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_stack_alignment.c
  

Comments

Kito Cheng May 29, 2023, 3:05 a.m. UTC | #1
Thanks for this patch, just few minor comment, I think this is pretty
close to accept :)

Could you reference JiaWei's match_parallel[1] to prevent adding bunch
of *_offset_operand and stack_push_up_to_*_operand?


[1] https://patchwork.sourceware.org/project/gcc/patch/20230406062118.47431-5-jiawei@iscas.ac.cn/


> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 629e5e45cac..a0a2db1f594 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -117,6 +117,14 @@ struct GTY(())  riscv_frame_info {
>    /* How much the GPR save/restore routines adjust sp (or 0 if unused).  */
>    unsigned save_libcall_adjustment;
>
> +  /* the minimum number of bytes, in multiples of 16-byte address increments,
> +     required to cover the registers in a multi push & pop.  */
> +  unsigned multi_push_adj_base;
> +
> +  /* the number of additional 16-byte address increments allocated for the stack frame
> +     in a multi push & pop.  */
> +  unsigned multi_push_adj_addi;
> +
>    /* Offsets of fixed-point and floating-point save areas from frame bottom */
>    poly_int64 gp_sp_offset;
>    poly_int64 fp_sp_offset;
> @@ -413,6 +421,21 @@ static const struct riscv_tune_info riscv_tune_info_table[] = {
>  #include "riscv-cores.def"
>  };
>
> +typedef enum
> +{
> +  SI_IDX = 0,
> +  DI_IDX,
> +  MAX_MODE_IDX = DI_IDX
> +} mode_idx;
> +

Didn't see any use in this version?

> @@ -5574,18 +5924,25 @@ riscv_expand_epilogue (int style)
>        REG_NOTES (insn) = dwarf;
>      }
>
> -  if (use_restore_libcall)
> -    frame->mask = 0; /* Temporarily fib for GPRs.  */
> +  if (use_restore_libcall || use_multi_pop)
> +    frame->mask = 0; /* Temporarily fib that we need not save GPRs.  */
>
>    /* If we need to restore registers, deallocate as much stack as
>       possible in the second step without going out of range.  */
> -  if ((frame->mask | frame->fmask) != 0)
> +  if (use_multi_pop)
> +    {
> +      if (frame->fmask
> +          && known_gt (frame->total_size - multipop_size,
> +                      frame->frame_pointer_offset))
> +        step2 = riscv_first_stack_step (frame, frame->total_size - multipop_size);
> +    }
> +  else if ((frame->mask | frame->fmask) != 0)
>      step2 = riscv_first_stack_step (frame, frame->total_size - libcall_size);
>
> -  if (use_restore_libcall)
> +  if (use_restore_libcall || use_multi_pop)
>      frame->mask = mask; /* Undo the above fib.  */
>
> -  poly_int64 step1 = frame->total_size - step2 - libcall_size;
> +  poly_int64 step1 = frame->total_size - step2 - libcall_size - multipop_size ;
>
>    /* Set TARGET to BASE + STEP1.  */
>    if (known_gt (step1, 0))
> @@ -5620,7 +5977,7 @@ riscv_expand_epilogue (int style)
>                                            adjust));
>           rtx dwarf = NULL_RTX;
>           rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
> -                                            GEN_INT (step2));
> +                                            GEN_INT (step2 + libcall_size + multipop_size));

Why we need `+ libcall_size` here? or...why we don't need that before?

>
>           dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
>           RTX_FRAME_RELATED_P (insn) = 1;
> @@ -5635,15 +5992,15 @@ riscv_expand_epilogue (int style)
>        epilogue_cfa_sp_offset = step2;
>      }
>
> -  if (use_restore_libcall)
> +  if (use_restore_libcall || use_multi_pop)
>      frame->mask = 0; /* Temporarily fib that we need not save GPRs.  */
>
>    /* Restore the registers.  */
> -  riscv_for_each_saved_reg (frame->total_size - step2 - libcall_size,
> +  riscv_for_each_saved_reg (frame->total_size - step2 - libcall_size - multipop_size,
>                             riscv_restore_reg,
>                             true, style == EXCEPTION_RETURN);
>
> -  if (use_restore_libcall)
> +  if (use_restore_libcall || use_multi_pop)
>        frame->mask = mask; /* Undo the above fib.  */
>
>    if (need_barrier_p)
> @@ -5657,14 +6014,30 @@ riscv_expand_epilogue (int style)
>
>        rtx dwarf = NULL_RTX;
>        rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
> -                                        const0_rtx);
> +                                        GEN_INT (libcall_size + multipop_size));

Same question for `libcall_size` part.
  
Fei Gao May 30, 2023, 6:58 a.m. UTC | #2
On 2023-05-29 11:05  Kito Cheng <kito.cheng@gmail.com> wrote:
>
>Thanks for this patch, just few minor comment, I think this is pretty
>close to accept :)
>
>Could you reference JiaWei's match_parallel[1] to prevent adding bunch
>of *_offset_operand and stack_push_up_to_*_operand?
>
>
>[1] https://patchwork.sourceware.org/project/gcc/patch/20230406062118.47431-5-jiawei@iscas.ac.cn/
Thanks for your review. 

The md file looks verbose with bunch of *_offset_operand and stack_push_up_to_*_operand, but it significantly
simplies implementation of recognizing zmcp push and pop insns and outputting assembly.  Also, the md file
clearly shows and checks the slot that each register is placed(different to slot order w/o save-restore before
zcmp is introduced). So I prefer my patch V2 to V1 or the link you attached. But ideas are welcome to make
it better. Appreciated if you suggest more details for the improvement.

>
>
>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> index 629e5e45cac..a0a2db1f594 100644
>> --- a/gcc/config/riscv/riscv.cc
>> +++ b/gcc/config/riscv/riscv.cc
>> @@ -117,6 +117,14 @@ struct GTY(())  riscv_frame_info {
>>    /* How much the GPR save/restore routines adjust sp (or 0 if unused).  */
>>    unsigned save_libcall_adjustment;
>>
>> +  /* the minimum number of bytes, in multiples of 16-byte address increments,
>> +     required to cover the registers in a multi push & pop.  */
>> +  unsigned multi_push_adj_base;
>> +
>> +  /* the number of additional 16-byte address increments allocated for the stack frame
>> +     in a multi push & pop.  */
>> +  unsigned multi_push_adj_addi;
>> +
>>    /* Offsets of fixed-point and floating-point save areas from frame bottom */
>>    poly_int64 gp_sp_offset;
>>    poly_int64 fp_sp_offset;
>> @@ -413,6 +421,21 @@ static const struct riscv_tune_info riscv_tune_info_table[] = {
>>  #include "riscv-cores.def"
>>  };
>>
>> +typedef enum
>> +{
>> +  SI_IDX = 0,
>> +  DI_IDX,
>> +  MAX_MODE_IDX = DI_IDX
>> +} mode_idx;
>> +
>
>Didn't see any use in this version? 
It's used in defining the array below.
const insn_gen_fn gen_push_pop [MAX_OP_IDX + 1][MAX_MODE_IDX + 1][ZCMP_MAX_GRP_SLOTS]

>
>> @@ -5574,18 +5924,25 @@ riscv_expand_epilogue (int style)
>>        REG_NOTES (insn) = dwarf;
>>      }
>>
>> -  if (use_restore_libcall)
>> -    frame->mask = 0; /* Temporarily fib for GPRs.  */
>> +  if (use_restore_libcall || use_multi_pop)
>> +    frame->mask = 0; /* Temporarily fib that we need not save GPRs.  */
>>
>>    /* If we need to restore registers, deallocate as much stack as
>>       possible in the second step without going out of range.  */
>> -  if ((frame->mask | frame->fmask) != 0)
>> +  if (use_multi_pop)
>> +    {
>> +      if (frame->fmask
>> +          && known_gt (frame->total_size - multipop_size,
>> +                      frame->frame_pointer_offset))
>> +        step2 = riscv_first_stack_step (frame, frame->total_size - multipop_size);
>> +    }
>> +  else if ((frame->mask | frame->fmask) != 0)
>>      step2 = riscv_first_stack_step (frame, frame->total_size - libcall_size);
>>
>> -  if (use_restore_libcall)
>> +  if (use_restore_libcall || use_multi_pop)
>>      frame->mask = mask; /* Undo the above fib.  */
>>
>> -  poly_int64 step1 = frame->total_size - step2 - libcall_size;
>> +  poly_int64 step1 = frame->total_size - step2 - libcall_size - multipop_size ;
>>
>>    /* Set TARGET to BASE + STEP1.  */
>>    if (known_gt (step1, 0))
>> @@ -5620,7 +5977,7 @@ riscv_expand_epilogue (int style)
>>                                            adjust));
>>           rtx dwarf = NULL_RTX;
>>           rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
>> -                                            GEN_INT (step2));
>> +                                            GEN_INT (step2 + libcall_size + multipop_size));
>
>Why we need `+ libcall_size` here? or...why we don't need that before? 
It's a good catch:)  
I should have  added `+ libcall_size` in 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=60524be1e3929d83e15fceac6e2aa053c8a6fb20

That's why I corrected the cfi issue in save-restore along with zcmp changes in this patch.

>
>>
>>           dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
>>           RTX_FRAME_RELATED_P (insn) = 1;
>> @@ -5635,15 +5992,15 @@ riscv_expand_epilogue (int style)
>>        epilogue_cfa_sp_offset = step2;
>>      }
>>
>> -  if (use_restore_libcall)
>> +  if (use_restore_libcall || use_multi_pop)
>>      frame->mask = 0; /* Temporarily fib that we need not save GPRs.  */
>>
>>    /* Restore the registers.  */
>> -  riscv_for_each_saved_reg (frame->total_size - step2 - libcall_size,
>> +  riscv_for_each_saved_reg (frame->total_size - step2 - libcall_size - multipop_size,
>>                             riscv_restore_reg,
>>                             true, style == EXCEPTION_RETURN);
>>
>> -  if (use_restore_libcall)
>> +  if (use_restore_libcall || use_multi_pop)
>>        frame->mask = mask; /* Undo the above fib.  */
>>
>>    if (need_barrier_p)
>> @@ -5657,14 +6014,30 @@ riscv_expand_epilogue (int style)
>>
>>        rtx dwarf = NULL_RTX;
>>        rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
>> -                                        const0_rtx);
>> +                                        GEN_INT (libcall_size + multipop_size));
>
>Same question for `libcall_size` part. 

Same reason as described above.

BR, 
Fei
  
Kito Cheng May 31, 2023, 10:13 a.m. UTC | #3
> >[1] https://patchwork.sourceware.org/project/gcc/patch/20230406062118.47431-5-jiawei@iscas.ac.cn/
> Thanks for your review.
>
> The md file looks verbose with bunch of *_offset_operand and stack_push_up_to_*_operand, but it significantly
> simplies implementation of recognizing zmcp push and pop insns and outputting assembly.  Also, the md file
> clearly shows and checks the slot that each register is placed(different to slot order w/o save-restore before
> zcmp is introduced). So I prefer my patch V2 to V1 or the link you attached. But ideas are welcome to make
> it better. Appreciated if you suggest more details for the improvement.

Got your point, and share an idea to simplify that:

struct code_for_push_pop_t {
   insn_code (*push)(machine_mode);
   insn_code (*pop)(machine_mode);
   insn_code (*pop_ret)(machine_mode);
};
const code_for_push_pop_t code_for_push_pop [/*ZCMP_MAX_GRP_SLOTS*/2] = {
    {code_for_gpr_multi_pop_up_to_ra, /*FIXME*/nullptr, /*FIXME*/nullptr},
    {code_for_gpr_multi_pop_up_to_s0, /*FIXME*/nullptr, /*FIXME*/nullptr}
};

static rtx
riscv_gen_multi_push_pop_insn (op_idx op, HOST_WIDE_INT adj_size,
unsigned int regs_num)
{
  rtx stack_adj = GEN_INT (adj_size);

  return GEN_FCN (code_for_push_pop[regs_num].push(Pmode)) (stack_adj);
}

(define_mode_attr slot0_offset [(SI "0") (DI "0")])
(define_mode_attr slot1_offset [(SI "4") (DI "8")])

(define_insn "@gpr_multi_pop_up_to_ra<mode>"
  [(set (reg:X SP_REGNUM)
        (plus:X (reg:X SP_REGNUM)
                 (match_operand 0 "stack_pop_up_to_ra_operand" "I")))
   (set (reg:X RETURN_ADDR_REGNUM)
        (mem:X (plus:X (reg:X SP_REGNUM)
                       (const_int <slot0_offset>))))]
  "TARGET_ZCMP"
  "cm.pop       {ra}, %0"
)

(define_insn "@gpr_multi_pop_up_to_s0<mode>"
  [(set (reg:X SP_REGNUM)
        (plus:X (reg:X SP_REGNUM)
                 (match_operand 0 "stack_pop_up_to_s0_operand" "I")))
   (set (reg:X S0_REGNUM)
        (mem:X (plus:X (reg:X SP_REGNUM)
                       (const_int <slot0_offset>))))
   (set (reg:X RETURN_ADDR_REGNUM)
        (mem:X (plus:X (reg:X SP_REGNUM)
                       (const_int <slot1_offset>))))]
  "TARGET_ZCMP"
  "cm.pop       {ra, s0}, %0"
)



> >> @@ -5620,7 +5977,7 @@ riscv_expand_epilogue (int style)
> >>                                            adjust));
> >>           rtx dwarf = NULL_RTX;
> >>           rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
> >> -                                            GEN_INT (step2));
> >> +                                            GEN_INT (step2 + libcall_size + multipop_size));
> >
> >Why we need `+ libcall_size` here? or...why we don't need that before?
> It's a good catch:)
> I should have  added `+ libcall_size` in
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=60524be1e3929d83e15fceac6e2aa053c8a6fb20
>
> That's why I corrected the cfi issue in save-restore along with zcmp changes in this patch.

I would like to have a separate patch to fix this bug instead of
hidden in this patch.
  
Fei Gao June 1, 2023, 6:27 a.m. UTC | #4
On 2023-05-31 18:13  Kito Cheng <kito.cheng@gmail.com> wrote:
>
>> >[1] https://patchwork.sourceware.org/project/gcc/patch/20230406062118.47431-5-jiawei@iscas.ac.cn/
>> Thanks for your review.
>>
>> The md file looks verbose with bunch of *_offset_operand and stack_push_up_to_*_operand, but it significantly
>> simplies implementation of recognizing zmcp push and pop insns and outputting assembly.  Also, the md file
>> clearly shows and checks the slot that each register is placed(different to slot order w/o save-restore before
>> zcmp is introduced). So I prefer my patch V2 to V1 or the link you attached. But ideas are welcome to make
>> it better. Appreciated if you suggest more details for the improvement.
>
>Got your point, and share an idea to simplify that:
>
>struct code_for_push_pop_t {
>   insn_code (*push)(machine_mode);
>   insn_code (*pop)(machine_mode);
>   insn_code (*pop_ret)(machine_mode);
>};
>const code_for_push_pop_t code_for_push_pop [/*ZCMP_MAX_GRP_SLOTS*/2] = {
>    {code_for_gpr_multi_pop_up_to_ra, /*FIXME*/nullptr, /*FIXME*/nullptr},
>    {code_for_gpr_multi_pop_up_to_s0, /*FIXME*/nullptr, /*FIXME*/nullptr}
>};
>
>static rtx
>riscv_gen_multi_push_pop_insn (op_idx op, HOST_WIDE_INT adj_size,
>unsigned int regs_num)
>{
>  rtx stack_adj = GEN_INT (adj_size);
>
>  return GEN_FCN (code_for_push_pop[regs_num].push(Pmode)) (stack_adj);
>}
>
>(define_mode_attr slot0_offset [(SI "0") (DI "0")])
>(define_mode_attr slot1_offset [(SI "4") (DI "8")])
>
>(define_insn "@gpr_multi_pop_up_to_ra<mode>"
>  [(set (reg:X SP_REGNUM)
>        (plus:X (reg:X SP_REGNUM)
>                 (match_operand 0 "stack_pop_up_to_ra_operand" "I")))
>   (set (reg:X RETURN_ADDR_REGNUM)
>        (mem:X (plus:X (reg:X SP_REGNUM)
>                       (const_int <slot0_offset>))))]
>  "TARGET_ZCMP"
>  "cm.pop       {ra}, %0"
>)
>
>(define_insn "@gpr_multi_pop_up_to_s0<mode>"
>  [(set (reg:X SP_REGNUM)
>        (plus:X (reg:X SP_REGNUM)
>                 (match_operand 0 "stack_pop_up_to_s0_operand" "I")))
>   (set (reg:X S0_REGNUM)
>        (mem:X (plus:X (reg:X SP_REGNUM)
>                       (const_int <slot0_offset>))))
>   (set (reg:X RETURN_ADDR_REGNUM)
>        (mem:X (plus:X (reg:X SP_REGNUM)
>                       (const_int <slot1_offset>))))]
>  "TARGET_ZCMP"
>  "cm.pop       {ra, s0}, %0"
>) 

Perfect. 
Working on it. 

>
>
>
>> >> @@ -5620,7 +5977,7 @@ riscv_expand_epilogue (int style)
>> >>                                            adjust));
>> >>           rtx dwarf = NULL_RTX;
>> >>           rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
>> >> -                                            GEN_INT (step2));
>> >> +                                            GEN_INT (step2 + libcall_size + multipop_size));
>> >
>> >Why we need `+ libcall_size` here? or...why we don't need that before?
>> It's a good catch:)
>> I should have  added `+ libcall_size` in
>> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=60524be1e3929d83e15fceac6e2aa053c8a6fb20
>>
>> That's why I corrected the cfi issue in save-restore along with zcmp changes in this patch.
>
>I would like to have a separate patch to fix this bug instead of
>hidden in this patch. 
sure,  I will make  a separate patch. 

Thanks & BR, 
Fei
  

Patch

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index e5adf06fa25..d3d30dc67f7 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -59,6 +59,154 @@ 
   (ior (match_operand 0 "const_0_operand")
        (match_operand 0 "register_operand")))
 
+(define_predicate "slot_0_offset_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_slot_offset_p (INTVAL (op), 0)")))
+
+(define_predicate "slot_1_offset_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_slot_offset_p (INTVAL (op), 1)")))
+
+(define_predicate "slot_2_offset_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_slot_offset_p (INTVAL (op), 2)")))
+
+(define_predicate "slot_3_offset_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_slot_offset_p (INTVAL (op), 3)")))
+
+(define_predicate "slot_4_offset_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_slot_offset_p (INTVAL (op), 4)")))
+
+(define_predicate "slot_5_offset_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_slot_offset_p (INTVAL (op), 5)")))
+
+(define_predicate "slot_6_offset_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_slot_offset_p (INTVAL (op), 6)")))
+
+(define_predicate "slot_7_offset_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_slot_offset_p (INTVAL (op), 7)")))
+
+(define_predicate "slot_8_offset_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_slot_offset_p (INTVAL (op), 8)")))
+
+(define_predicate "slot_9_offset_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_slot_offset_p (INTVAL (op), 9)")))
+
+(define_predicate "slot_10_offset_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_slot_offset_p (INTVAL (op), 10)")))
+
+(define_predicate "slot_11_offset_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_slot_offset_p (INTVAL (op), 11)")))
+
+(define_predicate "slot_12_offset_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_slot_offset_p (INTVAL (op), 12)")))
+
+(define_predicate "stack_push_up_to_ra_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op) * -1, 1)")))
+
+(define_predicate "stack_push_up_to_s0_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op) * -1, 2)")))
+
+(define_predicate "stack_push_up_to_s1_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op) * -1, 3)")))
+
+(define_predicate "stack_push_up_to_s2_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op) * -1, 4)")))
+
+(define_predicate "stack_push_up_to_s3_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op) * -1, 5)")))
+
+(define_predicate "stack_push_up_to_s4_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op) * -1, 6)")))
+
+(define_predicate "stack_push_up_to_s5_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op) * -1, 7)")))
+
+(define_predicate "stack_push_up_to_s6_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op) * -1, 8)")))
+
+(define_predicate "stack_push_up_to_s7_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op) * -1, 9)")))
+
+(define_predicate "stack_push_up_to_s8_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op) * -1, 10)")))
+
+(define_predicate "stack_push_up_to_s9_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op) * -1, 11)")))
+
+(define_predicate "stack_push_up_to_s11_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op) * -1, 13)")))
+
+(define_predicate "stack_pop_up_to_ra_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op), 1)")))
+
+(define_predicate "stack_pop_up_to_s0_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op), 2)")))
+
+(define_predicate "stack_pop_up_to_s1_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op), 3)")))
+
+(define_predicate "stack_pop_up_to_s2_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op), 4)")))
+
+(define_predicate "stack_pop_up_to_s3_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op), 5)")))
+
+(define_predicate "stack_pop_up_to_s4_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op), 6)")))
+
+(define_predicate "stack_pop_up_to_s5_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op), 7)")))
+
+(define_predicate "stack_pop_up_to_s6_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op), 8)")))
+
+(define_predicate "stack_pop_up_to_s7_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op), 9)")))
+
+(define_predicate "stack_pop_up_to_s8_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op), 10)")))
+
+(define_predicate "stack_pop_up_to_s9_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op), 11)")))
+
+(define_predicate "stack_pop_up_to_s11_operand"
+  (and (match_code "const_int")
+       (match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op), 13)")))
+
 ;; Only use branch-on-bit sequences when the mask is not an ANDI immediate.
 (define_predicate "branch_on_bit_operand"
   (and (match_code "const_int")
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 7760a9cac8d..f0ea14f05be 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -56,6 +56,8 @@  extern bool riscv_split_64bit_move_p (rtx, rtx);
 extern void riscv_split_doubleword_move (rtx, rtx);
 extern const char *riscv_output_move (rtx, rtx);
 extern const char *riscv_output_return ();
+extern bool riscv_zcmp_valid_slot_offset_p (HOST_WIDE_INT, int);
+extern bool riscv_zcmp_valid_stack_adj_bytes_p(HOST_WIDE_INT, int);
 
 #ifdef RTX_CODE
 extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 629e5e45cac..a0a2db1f594 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -117,6 +117,14 @@  struct GTY(())  riscv_frame_info {
   /* How much the GPR save/restore routines adjust sp (or 0 if unused).  */
   unsigned save_libcall_adjustment;
 
+  /* the minimum number of bytes, in multiples of 16-byte address increments,
+     required to cover the registers in a multi push & pop.  */
+  unsigned multi_push_adj_base;
+
+  /* the number of additional 16-byte address increments allocated for the stack frame
+     in a multi push & pop.  */
+  unsigned multi_push_adj_addi;
+
   /* Offsets of fixed-point and floating-point save areas from frame bottom */
   poly_int64 gp_sp_offset;
   poly_int64 fp_sp_offset;
@@ -413,6 +421,21 @@  static const struct riscv_tune_info riscv_tune_info_table[] = {
 #include "riscv-cores.def"
 };
 
+typedef enum
+{
+  SI_IDX = 0,
+  DI_IDX,
+  MAX_MODE_IDX = DI_IDX
+} mode_idx;
+
+typedef enum
+{
+  PUSH_IDX = 0,
+  POP_IDX,
+  POPRET_IDX,
+  MAX_OP_IDX = POPRET_IDX
+} op_idx;
+
 void riscv_frame_info::reset(void)
 {
   total_size = 0;
@@ -4844,6 +4867,37 @@  riscv_save_reg_p (unsigned int regno)
   return false;
 }
 
+/* Return TRUE if Zcmp push and pop insns should be
+   avoided. FALSE otherwise.
+   Only use multi push & pop if all GPRs masked can be covered,
+   and stack access is SP based,
+   and GPRs are at top of the stack frame,
+   and no conflicts in stack allocation with other features  */
+static bool
+riscv_avoid_multi_push(const struct riscv_frame_info *frame)
+{
+  if (!TARGET_ZCMP
+      || crtl->calls_eh_return
+      || frame_pointer_needed
+      || cfun->machine->interrupt_handler_p
+      || cfun->machine->varargs_size != 0
+      || crtl->args.pretend_args_size != 0
+      || (frame->mask & ~ MULTI_PUSH_GPR_MASK))
+    return true;
+
+  return false;
+}
+
+/* Determine whether to use multi push insn.  */
+static bool
+riscv_use_multi_push(const struct riscv_frame_info *frame)
+{
+  if (riscv_avoid_multi_push (frame))
+    return false;
+
+  return (frame->multi_push_adj_base != 0);
+}
+
 /* Return TRUE if a libcall to save/restore GPRs should be
    avoided.  FALSE otherwise.  */
 static bool
@@ -4881,6 +4935,51 @@  riscv_save_libcall_count (unsigned mask)
   abort ();
 }
 
+/* calculate number of s regs in multi push and pop.
+   Note that {s0-s10} is not valid in Zcmp, use {s0-s11} instead.  */
+static unsigned
+riscv_multi_push_sregs_count (unsigned mask)
+{
+  unsigned num = riscv_save_libcall_count (mask);
+  return (num == ZCMP_INVALID_S0S10_SREGS_COUNTS)
+    ? ZCMP_S0S11_SREGS_COUNTS
+    : num;
+}
+
+/* calculate number of regs(ra, s0-sx) in multi push and pop.  */
+static unsigned
+riscv_multi_push_regs_count (unsigned mask)
+{
+  /* 1 is for ra  */
+  return riscv_multi_push_sregs_count (mask) + 1;
+}
+
+/* Handle 16 bytes align for poly_int.  */
+static poly_int64
+riscv_16bytes_align (poly_int64 value)
+{
+  return aligned_upper_bound (value, 16);
+}
+
+static HOST_WIDE_INT
+riscv_16bytes_align (HOST_WIDE_INT value)
+{
+  return ROUND_UP(value, 16);
+}
+
+/* Handle stack align for poly_int.  */
+static poly_int64
+riscv_stack_align (poly_int64 value)
+{
+  return aligned_upper_bound (value, PREFERRED_STACK_BOUNDARY / 8);
+}
+
+static HOST_WIDE_INT
+riscv_stack_align (HOST_WIDE_INT value)
+{
+  return RISCV_STACK_ALIGN (value);
+}
+
 /* Populate the current function's riscv_frame_info structure.
 
    RISC-V stack frames grown downward.  High addresses are at the top.
@@ -4906,7 +5005,7 @@  riscv_save_libcall_count (unsigned mask)
 	|  GPR save area                |       + UNITS_PER_WORD
 	|                               |
 	+-------------------------------+ <-- stack_pointer_rtx + fp_sp_offset
-	|                               |       + UNITS_PER_HWVALUE
+	|                               |       + UNITS_PER_FP_REG
 	|  FPR save area                |
 	|                               |
 	+-------------------------------+ <-- frame_pointer_rtx (virtual)
@@ -4925,19 +5024,6 @@  riscv_save_libcall_count (unsigned mask)
 
 static HOST_WIDE_INT riscv_first_stack_step (struct riscv_frame_info *frame, poly_int64 remaining_size);
 
-/* Handle stack align for poly_int.  */
-static poly_int64
-riscv_stack_align (poly_int64 value)
-{
-  return aligned_upper_bound (value, PREFERRED_STACK_BOUNDARY / 8);
-}
-
-static HOST_WIDE_INT
-riscv_stack_align (HOST_WIDE_INT value)
-{
-  return RISCV_STACK_ALIGN (value);
-}
-
 static void
 riscv_compute_frame_info (void)
 {
@@ -4985,8 +5071,9 @@  riscv_compute_frame_info (void)
   if (frame->mask)
     {
       x_save_size = riscv_stack_align (num_x_saved * UNITS_PER_WORD);
-      unsigned num_save_restore = 1 + riscv_save_libcall_count (frame->mask);
 
+      /* 1 is for ra  */
+      unsigned num_save_restore = 1 + riscv_save_libcall_count (frame->mask);
       /* Only use save/restore routines if they don't alter the stack size.  */
       if (riscv_stack_align (num_save_restore * UNITS_PER_WORD) == x_save_size
           && !riscv_avoid_save_libcall ())
@@ -4998,6 +5085,15 @@  riscv_compute_frame_info (void)
 
 	  frame->save_libcall_adjustment = x_save_size;
 	}
+
+      if (!riscv_avoid_multi_push (frame))
+        {
+          /* num(ra, s0-sx)  */
+          unsigned num_multi_push =
+            riscv_multi_push_regs_count (frame->mask);
+          x_save_size = riscv_stack_align (num_multi_push * UNITS_PER_WORD);
+          frame->multi_push_adj_base = riscv_16bytes_align (x_save_size);
+        }
     }
 
   /* At the bottom of the frame are any outgoing stack arguments. */
@@ -5012,7 +5108,15 @@  riscv_compute_frame_info (void)
   frame->fp_sp_offset = offset - UNITS_PER_FP_REG;
   /* Next are the callee-saved GPRs. */
   if (frame->mask)
-    offset += x_save_size;
+    {
+      offset += x_save_size;
+      /* align to 16 bytes and add paddings to GPR part to honor
+         both stack alignment and zcmp pus/pop size alignment. */
+      if (riscv_use_multi_push (frame)
+          && known_lt(offset,
+                      frame->multi_push_adj_base + ZCMP_SP_INC_STEP * ZCMP_MAX_SPIMM))
+        offset = riscv_16bytes_align (offset);
+    }
   frame->gp_sp_offset = offset - UNITS_PER_WORD;
   /* The hard frame pointer points above the callee-saved GPRs. */
   frame->hard_frame_pointer_offset = offset;
@@ -5356,6 +5460,42 @@  riscv_adjust_libcall_cfi_prologue ()
   return dwarf;
 }
 
+static rtx
+riscv_adjust_multi_push_cfi_prologue (int saved_size)
+{
+  rtx dwarf = NULL_RTX;
+  rtx adjust_sp_rtx, reg, mem, insn;
+  unsigned int mask = cfun->machine->frame.mask;
+  int offset;
+  int saved_cnt = 0;
+
+  if (mask & S10_MASK)
+    mask |= S11_MASK;
+
+  for (int regno = GP_REG_LAST; regno >= GP_REG_FIRST; regno--)
+    if (BITSET_P (mask & MULTI_PUSH_GPR_MASK, regno - GP_REG_FIRST))
+      {
+        /* The save order is s11-s0, ra
+           from high to low addr.  */
+        offset = saved_size - UNITS_PER_WORD * (++saved_cnt);
+
+        reg = gen_rtx_REG (SImode, regno);
+        mem = gen_frame_mem (SImode, plus_constant (Pmode,
+                                                    stack_pointer_rtx,
+                                                    offset));
+
+        insn = gen_rtx_SET (mem, reg);
+        dwarf = alloc_reg_note (REG_CFA_OFFSET, insn, dwarf);
+      }
+
+  /* Debug info for adjust sp.  */
+  adjust_sp_rtx = gen_rtx_SET (stack_pointer_rtx,
+                               plus_constant(Pmode, stack_pointer_rtx, -saved_size));
+  dwarf = alloc_reg_note (REG_CFA_ADJUST_CFA, adjust_sp_rtx,
+                          dwarf);
+  return dwarf;
+}
+
 static void
 riscv_emit_stack_tie (void)
 {
@@ -5365,6 +5505,152 @@  riscv_emit_stack_tie (void)
     emit_insn (gen_stack_tiedi (stack_pointer_rtx, hard_frame_pointer_rtx));
 }
 
+static rtx
+get_slot_offset_rtx (int slot_idx)
+{
+  HOST_WIDE_INT slot_offset = -1 * (slot_idx + 1) * GET_MODE_SIZE (word_mode);
+  return GEN_INT (slot_offset);
+}
+
+/*zcmp multi push and pop function ptr array  */
+const insn_gen_fn gen_push_pop [MAX_OP_IDX + 1][MAX_MODE_IDX + 1][ZCMP_MAX_GRP_SLOTS] =
+{{{(insn_gen_fn::stored_funcptr) gen_gpr_multi_push_up_to_ra_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_push_up_to_s0_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_push_up_to_s1_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_push_up_to_s2_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_push_up_to_s3_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_push_up_to_s4_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_push_up_to_s5_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_push_up_to_s6_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_push_up_to_s7_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_push_up_to_s8_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_push_up_to_s9_si,
+   NULL,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_push_up_to_s11_si},
+  {(insn_gen_fn::stored_funcptr) gen_gpr_multi_push_up_to_ra_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_push_up_to_s0_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_push_up_to_s1_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_push_up_to_s2_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_push_up_to_s3_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_push_up_to_s4_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_push_up_to_s5_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_push_up_to_s6_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_push_up_to_s7_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_push_up_to_s8_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_push_up_to_s9_di,
+   NULL,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_push_up_to_s11_di}},
+ {{(insn_gen_fn::stored_funcptr) gen_gpr_multi_pop_up_to_ra_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_pop_up_to_s0_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_pop_up_to_s1_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_pop_up_to_s2_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_pop_up_to_s3_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_pop_up_to_s4_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_pop_up_to_s5_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_pop_up_to_s6_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_pop_up_to_s7_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_pop_up_to_s8_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_pop_up_to_s9_si,
+   NULL,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_pop_up_to_s11_si},
+  {(insn_gen_fn::stored_funcptr) gen_gpr_multi_pop_up_to_ra_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_pop_up_to_s0_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_pop_up_to_s1_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_pop_up_to_s2_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_pop_up_to_s3_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_pop_up_to_s4_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_pop_up_to_s5_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_pop_up_to_s6_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_pop_up_to_s7_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_pop_up_to_s8_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_pop_up_to_s9_di,
+   NULL,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_pop_up_to_s11_di}},
+ {{(insn_gen_fn::stored_funcptr) gen_gpr_multi_popret_up_to_ra_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_popret_up_to_s0_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_popret_up_to_s1_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_popret_up_to_s2_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_popret_up_to_s3_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_popret_up_to_s4_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_popret_up_to_s5_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_popret_up_to_s6_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_popret_up_to_s7_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_popret_up_to_s8_si,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_popret_up_to_s9_si,
+   NULL,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_popret_up_to_s11_si},
+  {(insn_gen_fn::stored_funcptr) gen_gpr_multi_popret_up_to_ra_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_popret_up_to_s0_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_popret_up_to_s1_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_popret_up_to_s2_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_popret_up_to_s3_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_popret_up_to_s4_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_popret_up_to_s5_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_popret_up_to_s6_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_popret_up_to_s7_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_popret_up_to_s8_di,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_popret_up_to_s9_di,
+   NULL,
+   (insn_gen_fn::stored_funcptr) gen_gpr_multi_popret_up_to_s11_di}}};
+
+static rtx
+riscv_gen_multi_push_pop_insn (op_idx op, HOST_WIDE_INT adj_size, unsigned int regs_num)
+{
+  rtx stack_adj = GEN_INT (adj_size);
+  rtx slots[ZCMP_MAX_GRP_SLOTS];
+
+  for (int slot_idx = 0; slot_idx < ZCMP_MAX_GRP_SLOTS; slot_idx++)
+    slots[slot_idx] = get_slot_offset_rtx (slot_idx);
+
+  switch (regs_num)
+    {
+    case 1:
+      return (gen_push_pop[op][TARGET_64BIT][regs_num - 1])
+        (stack_adj, slots[0]);
+    case 2:
+      return (gen_push_pop[op][TARGET_64BIT][regs_num - 1])
+        (stack_adj, slots[0], slots[1]);
+    case 3:
+      return (gen_push_pop[op][TARGET_64BIT][regs_num - 1])
+        (stack_adj, slots[0], slots[1], slots[2]);
+    case 4:
+      return (gen_push_pop[op][TARGET_64BIT][regs_num - 1])
+        (stack_adj, slots[0], slots[1], slots[2], slots[3]);
+    case 5:
+      return (gen_push_pop[op][TARGET_64BIT][regs_num - 1])
+        (stack_adj, slots[0], slots[1], slots[2], slots[3], slots[4]);
+    case 6:
+      return (gen_push_pop[op][TARGET_64BIT][regs_num - 1])
+        (stack_adj, slots[0], slots[1], slots[2], slots[3], slots[4], slots[5]);
+    case 7:
+      return (gen_push_pop[op][TARGET_64BIT][regs_num - 1])
+        (stack_adj, slots[0], slots[1], slots[2], slots[3], slots[4], slots[5],
+        slots[6]);
+    case 8:
+      return (gen_push_pop[op][TARGET_64BIT][regs_num - 1])
+        (stack_adj, slots[0], slots[1], slots[2], slots[3], slots[4], slots[5],
+        slots[6], slots[7]);
+    case 9:
+      return (gen_push_pop[op][TARGET_64BIT][regs_num - 1])
+        (stack_adj, slots[0], slots[1], slots[2], slots[3], slots[4], slots[5],
+        slots[6], slots[7], slots[8]);
+    case 10:
+      return (gen_push_pop[op][TARGET_64BIT][regs_num - 1])
+        (stack_adj, slots[0], slots[1], slots[2], slots[3], slots[4], slots[5],
+        slots[6], slots[7], slots[8], slots[9]);
+    case 11:
+      return (gen_push_pop[op][TARGET_64BIT][regs_num - 1])
+        (stack_adj, slots[0], slots[1], slots[2], slots[3], slots[4], slots[5],
+        slots[6], slots[7], slots[8], slots[9], slots[10]);
+    case 13:
+      return (gen_push_pop[op][TARGET_64BIT][regs_num - 1])
+        (stack_adj, slots[0], slots[1], slots[2], slots[3], slots[4], slots[5],
+        slots[6], slots[7], slots[8], slots[9], slots[10], slots[11], slots[12]);
+    default:
+      gcc_unreachable ();
+    }
+}
+
 /* Expand the "prologue" pattern.  */
 
 void
@@ -5373,7 +5659,8 @@  riscv_expand_prologue (void)
   struct riscv_frame_info *frame = &cfun->machine->frame;
   poly_int64 remaining_size = frame->total_size;
   unsigned mask = frame->mask;
-  rtx insn;
+  int spimm, multi_push_additional, stack_adj;
+  rtx insn, dwarf = NULL_RTX;
 
   if (flag_stack_usage_info)
     current_function_static_stack_size = constant_lower_bound (remaining_size);
@@ -5381,8 +5668,35 @@  riscv_expand_prologue (void)
   if (cfun->machine->naked_p)
     return;
 
+  /* prefer muti-push to save-restore libcall.  */
+  if (riscv_use_multi_push(frame))
+    {
+      remaining_size -= frame->multi_push_adj_base;
+      if (known_gt(remaining_size, 2 * ZCMP_SP_INC_STEP))
+        spimm = 3;
+      else if (known_gt(remaining_size, ZCMP_SP_INC_STEP))
+        spimm = 2;
+      else if (known_gt(remaining_size, 0))
+        spimm = 1;
+      else
+        spimm = 0;
+      multi_push_additional = spimm * ZCMP_SP_INC_STEP;
+      frame->multi_push_adj_addi = multi_push_additional;
+      remaining_size -= multi_push_additional;
+
+      /* emit multi push insn & dwarf along with it.  */
+      stack_adj = frame->multi_push_adj_base + multi_push_additional;
+      insn = emit_insn (riscv_gen_multi_push_pop_insn(PUSH_IDX,
+        -stack_adj, riscv_multi_push_regs_count(frame->mask)));
+      dwarf = riscv_adjust_multi_push_cfi_prologue (stack_adj);
+      RTX_FRAME_RELATED_P (insn) = 1;
+      REG_NOTES (insn) = dwarf;
+
+      /* Temporarily fib that we need not save GPRs.  */
+      frame->mask = 0; 
+    }
   /* When optimizing for size, call a subroutine to save the registers.  */
-  if (riscv_use_save_libcall (frame))
+  else if (riscv_use_save_libcall (frame))
     {
       rtx dwarf = NULL_RTX;
       dwarf = riscv_adjust_libcall_cfi_prologue ();
@@ -5398,13 +5712,15 @@  riscv_expand_prologue (void)
   /* Save the registers.  */
   if ((frame->mask | frame->fmask) != 0)
     {
-      HOST_WIDE_INT step1 = riscv_first_stack_step (frame, remaining_size);
-
-      insn = gen_add3_insn (stack_pointer_rtx,
-			    stack_pointer_rtx,
-			    GEN_INT (-step1));
-      RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
-      remaining_size -= step1;
+      if (known_gt (remaining_size, frame->frame_pointer_offset))
+        {
+          HOST_WIDE_INT step1 = riscv_first_stack_step (frame, remaining_size);
+          remaining_size -= step1;
+          insn = gen_add3_insn (stack_pointer_rtx,
+                                stack_pointer_rtx,
+                                GEN_INT (-step1));
+          RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
+        }
       riscv_for_each_saved_reg (remaining_size, riscv_save_reg, false, false);
     }
 
@@ -5461,6 +5777,32 @@  riscv_expand_prologue (void)
     }
 }
 
+static rtx
+riscv_adjust_multi_pop_cfi_epilogue (int saved_size)
+{
+  rtx dwarf = NULL_RTX;
+  rtx adjust_sp_rtx, reg;
+  unsigned int mask = cfun->machine->frame.mask;
+
+  if (mask & S10_MASK)
+    mask |= S11_MASK;
+
+  /* Debug info for adjust sp.  */
+  adjust_sp_rtx = gen_rtx_SET (stack_pointer_rtx,
+                               plus_constant(Pmode, stack_pointer_rtx, saved_size));
+  dwarf = alloc_reg_note (REG_CFA_ADJUST_CFA, adjust_sp_rtx,
+                          dwarf);
+
+  for (int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
+    if (BITSET_P (mask, regno - GP_REG_FIRST))
+      {
+        reg = gen_rtx_REG (SImode, regno);
+        dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
+      }
+
+  return dwarf;
+}
+
 static rtx
 riscv_adjust_libcall_cfi_epilogue ()
 {
@@ -5500,10 +5842,18 @@  riscv_expand_epilogue (int style)
   struct riscv_frame_info *frame = &cfun->machine->frame;
   unsigned mask = frame->mask;
   HOST_WIDE_INT step2 = 0;
-  bool use_restore_libcall = ((style == NORMAL_RETURN)
-			      && riscv_use_save_libcall (frame));
-  unsigned libcall_size = (use_restore_libcall
-			   ? frame->save_libcall_adjustment : 0);
+  bool use_multi_pop_normal = ((style == NORMAL_RETURN)
+                              && riscv_use_multi_push (frame));
+  bool use_multi_pop_sibcall = ((style == SIBCALL_RETURN)
+                              && riscv_use_multi_push (frame));
+  bool use_multi_pop = use_multi_pop_normal || use_multi_pop_sibcall;
+
+  bool use_restore_libcall = !use_multi_pop && ((style == NORMAL_RETURN)
+                              && riscv_use_save_libcall (frame));
+  unsigned libcall_size = use_restore_libcall && !use_multi_pop ?
+                            frame->save_libcall_adjustment : 0;
+  unsigned multipop_size = use_multi_pop ?
+                            frame->multi_push_adj_base + frame->multi_push_adj_addi : 0;
   rtx ra = gen_rtx_REG (Pmode, RETURN_ADDR_REGNUM);
   rtx insn;
 
@@ -5574,18 +5924,25 @@  riscv_expand_epilogue (int style)
       REG_NOTES (insn) = dwarf;
     }
 
-  if (use_restore_libcall)
-    frame->mask = 0; /* Temporarily fib for GPRs.  */
+  if (use_restore_libcall || use_multi_pop)
+    frame->mask = 0; /* Temporarily fib that we need not save GPRs.  */
 
   /* If we need to restore registers, deallocate as much stack as
      possible in the second step without going out of range.  */
-  if ((frame->mask | frame->fmask) != 0)
+  if (use_multi_pop)
+    {
+      if (frame->fmask
+          && known_gt (frame->total_size - multipop_size,
+                      frame->frame_pointer_offset))
+        step2 = riscv_first_stack_step (frame, frame->total_size - multipop_size);
+    }
+  else if ((frame->mask | frame->fmask) != 0)
     step2 = riscv_first_stack_step (frame, frame->total_size - libcall_size);
 
-  if (use_restore_libcall)
+  if (use_restore_libcall || use_multi_pop)
     frame->mask = mask; /* Undo the above fib.  */
 
-  poly_int64 step1 = frame->total_size - step2 - libcall_size;
+  poly_int64 step1 = frame->total_size - step2 - libcall_size - multipop_size ;
 
   /* Set TARGET to BASE + STEP1.  */
   if (known_gt (step1, 0))
@@ -5620,7 +5977,7 @@  riscv_expand_epilogue (int style)
 					   adjust));
 	  rtx dwarf = NULL_RTX;
 	  rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
-					     GEN_INT (step2));
+					     GEN_INT (step2 + libcall_size + multipop_size));
 
 	  dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
 	  RTX_FRAME_RELATED_P (insn) = 1;
@@ -5635,15 +5992,15 @@  riscv_expand_epilogue (int style)
       epilogue_cfa_sp_offset = step2;
     }
 
-  if (use_restore_libcall)
+  if (use_restore_libcall || use_multi_pop)
     frame->mask = 0; /* Temporarily fib that we need not save GPRs.  */
 
   /* Restore the registers.  */
-  riscv_for_each_saved_reg (frame->total_size - step2 - libcall_size,
+  riscv_for_each_saved_reg (frame->total_size - step2 - libcall_size - multipop_size,
 			    riscv_restore_reg,
 			    true, style == EXCEPTION_RETURN);
 
-  if (use_restore_libcall)
+  if (use_restore_libcall || use_multi_pop)
       frame->mask = mask; /* Undo the above fib.  */
 
   if (need_barrier_p)
@@ -5657,14 +6014,30 @@  riscv_expand_epilogue (int style)
 
       rtx dwarf = NULL_RTX;
       rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
-					 const0_rtx);
+					 GEN_INT (libcall_size + multipop_size));
       dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
       RTX_FRAME_RELATED_P (insn) = 1;
 
       REG_NOTES (insn) = dwarf;
     }
 
-  if (use_restore_libcall)
+  if (use_multi_pop)
+    {
+      unsigned regs_count = riscv_multi_push_regs_count (frame->mask);
+      if (use_multi_pop_normal)
+        insn = emit_jump_insn (
+          riscv_gen_multi_push_pop_insn (POPRET_IDX, multipop_size, regs_count));
+      else
+        insn= emit_insn (
+          riscv_gen_multi_push_pop_insn(POP_IDX, multipop_size, regs_count));
+
+      rtx dwarf = riscv_adjust_multi_pop_cfi_epilogue (multipop_size);
+      RTX_FRAME_RELATED_P (insn) = 1;
+      REG_NOTES (insn) = dwarf;
+      if (use_multi_pop_normal)
+        return;
+    }
+  else if (use_restore_libcall)
     {
       rtx dwarf = riscv_adjust_libcall_cfi_epilogue ();
       insn = emit_insn (gen_gpr_restore (GEN_INT (riscv_save_libcall_count (mask))));
@@ -6937,6 +7310,30 @@  riscv_gen_gpr_save_insn (struct riscv_frame_info *frame)
   return gen_rtx_PARALLEL (VOIDmode, vec);
 }
 
+static HOST_WIDE_INT zcmp_base_adj(int regs_num)
+{
+  return riscv_16bytes_align ((regs_num) * GET_MODE_SIZE (word_mode));
+}
+
+static HOST_WIDE_INT zcmp_additional_adj(HOST_WIDE_INT total, int regs_num)
+{
+  return total - zcmp_base_adj(regs_num);
+}
+
+bool riscv_zcmp_valid_slot_offset_p (HOST_WIDE_INT offset, int slot_idx)
+{
+  return offset == -1 * (slot_idx + 1) * GET_MODE_SIZE (word_mode);
+}
+
+bool riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT total, int regs_num)
+{
+  HOST_WIDE_INT additioanl_bytes = zcmp_additional_adj(total, regs_num);
+  return additioanl_bytes == 0
+         || additioanl_bytes  == 1 * ZCMP_SP_INC_STEP
+         || additioanl_bytes  == 2 * ZCMP_SP_INC_STEP
+         || additioanl_bytes  == ZCMP_MAX_SPIMM * ZCMP_SP_INC_STEP;
+}
+
 /* Return true if it's valid gpr_save pattern.  */
 
 bool
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 13038a39e5c..ff210083004 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -413,6 +413,29 @@  ASM_MISA_SPEC
 #define RISCV_CALL_ADDRESS_TEMP(MODE) \
   gen_rtx_REG (MODE, RISCV_CALL_ADDRESS_TEMP_REGNUM)
 
+#define RETURN_ADDR_MASK        ( 1 << RETURN_ADDR_REGNUM)
+#define S0_MASK                 ( 1 << S0_REGNUM)
+#define S1_MASK                 ( 1 << S1_REGNUM)
+#define S2_MASK                 ( 1 << S2_REGNUM)
+#define S3_MASK                 ( 1 << S3_REGNUM)
+#define S4_MASK                 ( 1 << S4_REGNUM)
+#define S5_MASK                 ( 1 << S5_REGNUM)
+#define S6_MASK                 ( 1 << S6_REGNUM)
+#define S7_MASK                 ( 1 << S7_REGNUM)
+#define S8_MASK                 ( 1 << S8_REGNUM)
+#define S9_MASK                 ( 1 << S9_REGNUM)
+#define S10_MASK                ( 1 << S10_REGNUM)
+#define S11_MASK                ( 1 << S11_REGNUM)
+
+#define MULTI_PUSH_GPR_MASK ( RETURN_ADDR_MASK | S0_MASK | S1_MASK | S2_MASK  | S3_MASK \
+                                               | S4_MASK | S5_MASK | S6_MASK  | S7_MASK \
+                                               | S8_MASK | S9_MASK | S10_MASK | S11_MASK )
+#define ZCMP_MAX_SPIMM 3
+#define ZCMP_SP_INC_STEP 16
+#define ZCMP_INVALID_S0S10_SREGS_COUNTS 11
+#define ZCMP_S0S11_SREGS_COUNTS 12
+#define ZCMP_MAX_GRP_SLOTS 13
+
 #define MCOUNT_NAME "_mcount"
 
 #define NO_PROFILE_COUNTERS 1
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 7065e68c0b7..73fc8cb69bc 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -113,6 +113,7 @@ 
 
 (define_constants
   [(RETURN_ADDR_REGNUM		1)
+   (SP_REGNUM 			2)
    (GP_REGNUM 			3)
    (TP_REGNUM			4)
    (T0_REGNUM			5)
@@ -3205,3 +3206,4 @@ 
 (include "sifive-7.md")
 (include "thead.md")
 (include "vector.md")
+(include "zc.md")
diff --git a/gcc/config/riscv/zc.md b/gcc/config/riscv/zc.md
new file mode 100644
index 00000000000..6e6c87983fb
--- /dev/null
+++ b/gcc/config/riscv/zc.md
@@ -0,0 +1,1042 @@ 
+;; Machine description for RISC-V Zc extention.
+;; Copyright (C) 2011-2023 Free Software Foundation, Inc.
+;; Contributed by Fei Gao (gaofei@eswincomputing.com).
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_insn "gpr_multi_pop_up_to_ra_<mode>"
+  [(set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_pop_up_to_ra_operand" "I")))
+   (set (reg:X RETURN_ADDR_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I"))))]
+  "TARGET_ZCMP"
+  "cm.pop	{ra}, %0"
+)
+
+(define_insn "gpr_multi_pop_up_to_s0_<mode>"
+  [(set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_pop_up_to_s0_operand" "I")))
+   (set (reg:X S0_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I"))))
+   (set (reg:X RETURN_ADDR_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I"))))]
+  "TARGET_ZCMP"
+  "cm.pop	{ra, s0}, %0"
+)
+
+(define_insn "gpr_multi_pop_up_to_s1_<mode>"
+  [(set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_pop_up_to_s1_operand" "I")))
+   (set (reg:X S1_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I"))))
+   (set (reg:X S0_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I"))))
+   (set (reg:X RETURN_ADDR_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I"))))]
+  "TARGET_ZCMP"
+  "cm.pop	{ra, s0-s1}, %0"
+)
+
+(define_insn "gpr_multi_pop_up_to_s2_<mode>"
+  [(set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_pop_up_to_s2_operand" "I")))
+   (set (reg:X S2_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I"))))
+   (set (reg:X S1_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I"))))
+   (set (reg:X S0_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I"))))
+   (set (reg:X RETURN_ADDR_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 4 "slot_3_offset_operand" "I"))))]
+  "TARGET_ZCMP"
+  "cm.pop	{ra, s0-s2}, %0"
+)
+
+(define_insn "gpr_multi_pop_up_to_s3_<mode>"
+  [(set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_pop_up_to_s3_operand" "I")))
+   (set (reg:X S3_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I"))))
+   (set (reg:X S2_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I"))))
+   (set (reg:X S1_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I"))))
+   (set (reg:X S0_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 4 "slot_3_offset_operand" "I"))))
+   (set (reg:X RETURN_ADDR_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 5 "slot_4_offset_operand" "I"))))]
+  "TARGET_ZCMP"
+  "cm.pop	{ra, s0-s3}, %0"
+)
+
+(define_insn "gpr_multi_pop_up_to_s4_<mode>"
+  [(set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_pop_up_to_s4_operand" "I")))
+   (set (reg:X S4_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I"))))
+   (set (reg:X S3_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I"))))
+   (set (reg:X S2_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I"))))
+   (set (reg:X S1_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 4 "slot_3_offset_operand" "I"))))
+   (set (reg:X S0_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 5 "slot_4_offset_operand" "I"))))
+   (set (reg:X RETURN_ADDR_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 6 "slot_5_offset_operand" "I"))))]
+  "TARGET_ZCMP"
+  "cm.pop	{ra, s0-s4}, %0"
+)
+
+(define_insn "gpr_multi_pop_up_to_s5_<mode>"
+  [(set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_pop_up_to_s5_operand" "I")))
+   (set (reg:X S5_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I"))))
+   (set (reg:X S4_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I"))))
+   (set (reg:X S3_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I"))))
+   (set (reg:X S2_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 4 "slot_3_offset_operand" "I"))))
+   (set (reg:X S1_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 5 "slot_4_offset_operand" "I"))))
+   (set (reg:X S0_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 6 "slot_5_offset_operand" "I"))))
+   (set (reg:X RETURN_ADDR_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 7 "slot_6_offset_operand" "I"))))]
+  "TARGET_ZCMP"
+  "cm.pop	{ra, s0-s5}, %0"
+)
+
+(define_insn "gpr_multi_pop_up_to_s6_<mode>"
+  [(set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_pop_up_to_s6_operand" "I")))
+   (set (reg:X S6_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I"))))
+   (set (reg:X S5_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I"))))
+   (set (reg:X S4_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I"))))
+   (set (reg:X S3_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 4 "slot_3_offset_operand" "I"))))
+   (set (reg:X S2_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 5 "slot_4_offset_operand" "I"))))
+   (set (reg:X S1_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 6 "slot_5_offset_operand" "I"))))
+   (set (reg:X S0_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 7 "slot_6_offset_operand" "I"))))
+   (set (reg:X RETURN_ADDR_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 8 "slot_7_offset_operand" "I"))))]
+  "TARGET_ZCMP"
+  "cm.pop	{ra, s0-s6}, %0"
+)
+
+(define_insn "gpr_multi_pop_up_to_s7_<mode>"
+  [(set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_pop_up_to_s7_operand" "I")))
+   (set (reg:X S7_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I"))))
+   (set (reg:X S6_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I"))))
+   (set (reg:X S5_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I"))))
+   (set (reg:X S4_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 4 "slot_3_offset_operand" "I"))))
+   (set (reg:X S3_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 5 "slot_4_offset_operand" "I"))))
+   (set (reg:X S2_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 6 "slot_5_offset_operand" "I"))))
+   (set (reg:X S1_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 7 "slot_6_offset_operand" "I"))))
+   (set (reg:X S0_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 8 "slot_7_offset_operand" "I"))))
+   (set (reg:X RETURN_ADDR_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 9 "slot_8_offset_operand" "I"))))]
+  "TARGET_ZCMP"
+  "cm.pop	{ra, s0-s7}, %0"
+)
+
+(define_insn "gpr_multi_pop_up_to_s8_<mode>"
+  [(set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_pop_up_to_s8_operand" "I")))
+   (set (reg:X S8_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I"))))
+   (set (reg:X S7_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I"))))
+   (set (reg:X S6_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I"))))
+   (set (reg:X S5_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 4 "slot_3_offset_operand" "I"))))
+   (set (reg:X S4_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 5 "slot_4_offset_operand" "I"))))
+   (set (reg:X S3_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 6 "slot_5_offset_operand" "I"))))
+   (set (reg:X S2_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 7 "slot_6_offset_operand" "I"))))
+   (set (reg:X S1_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 8 "slot_7_offset_operand" "I"))))
+   (set (reg:X S0_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 9 "slot_8_offset_operand" "I"))))
+   (set (reg:X RETURN_ADDR_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 10 "slot_9_offset_operand" "I"))))]
+  "TARGET_ZCMP"
+  "cm.pop	{ra, s0-s8}, %0"
+)
+
+(define_insn "gpr_multi_pop_up_to_s9_<mode>"
+  [(set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_pop_up_to_s9_operand" "I")))
+   (set (reg:X S9_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I"))))
+   (set (reg:X S8_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I"))))
+   (set (reg:X S7_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I"))))
+   (set (reg:X S6_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 4 "slot_3_offset_operand" "I"))))
+   (set (reg:X S5_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 5 "slot_4_offset_operand" "I"))))
+   (set (reg:X S4_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 6 "slot_5_offset_operand" "I"))))
+   (set (reg:X S3_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 7 "slot_6_offset_operand" "I"))))
+   (set (reg:X S2_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 8 "slot_7_offset_operand" "I"))))
+   (set (reg:X S1_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 9 "slot_8_offset_operand" "I"))))
+   (set (reg:X S0_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 10 "slot_9_offset_operand" "I"))))
+   (set (reg:X RETURN_ADDR_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 11 "slot_10_offset_operand" "I"))))]
+  "TARGET_ZCMP"
+  "cm.pop	{ra, s0-s9}, %0"
+)
+
+(define_insn "gpr_multi_pop_up_to_s11_<mode>"
+  [(set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_pop_up_to_s11_operand" "I")))
+   (set (reg:X S11_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I"))))
+   (set (reg:X S10_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I"))))
+   (set (reg:X S9_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I"))))
+   (set (reg:X S8_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 4 "slot_3_offset_operand" "I"))))
+   (set (reg:X S7_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 5 "slot_4_offset_operand" "I"))))
+   (set (reg:X S6_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 6 "slot_5_offset_operand" "I"))))
+   (set (reg:X S5_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 7 "slot_6_offset_operand" "I"))))
+   (set (reg:X S4_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 8 "slot_7_offset_operand" "I"))))
+   (set (reg:X S3_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 9 "slot_8_offset_operand" "I"))))
+   (set (reg:X S2_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 10 "slot_9_offset_operand" "I"))))
+   (set (reg:X S1_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 11 "slot_10_offset_operand" "I"))))
+   (set (reg:X S0_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 12 "slot_11_offset_operand" "I"))))
+   (set (reg:X RETURN_ADDR_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 13 "slot_12_offset_operand" "I"))))]
+  "TARGET_ZCMP"
+  "cm.pop	{ra, s0-s11}, %0"
+)
+
+(define_insn "gpr_multi_popret_up_to_ra_<mode>"
+  [(set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_pop_up_to_ra_operand" "I")))
+   (set (reg:X RETURN_ADDR_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I"))))
+   (return)
+   (use (reg:SI RETURN_ADDR_REGNUM))]
+  "TARGET_ZCMP"
+  "cm.popret	{ra}, %0"
+)
+
+(define_insn "gpr_multi_popret_up_to_s0_<mode>"
+  [(set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_pop_up_to_s0_operand" "I")))
+   (set (reg:X S0_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I"))))
+   (set (reg:X RETURN_ADDR_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I"))))
+   (return)
+   (use (reg:SI RETURN_ADDR_REGNUM))]
+  "TARGET_ZCMP"
+  "cm.popret	{ra, s0}, %0"
+)
+
+(define_insn "gpr_multi_popret_up_to_s1_<mode>"
+  [(set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_pop_up_to_s1_operand" "I")))
+   (set (reg:X S1_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I"))))
+   (set (reg:X S0_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I"))))
+   (set (reg:X RETURN_ADDR_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I"))))
+   (return)
+   (use (reg:SI RETURN_ADDR_REGNUM))]
+  "TARGET_ZCMP"
+  "cm.popret	{ra, s0-s1}, %0"
+)
+
+(define_insn "gpr_multi_popret_up_to_s2_<mode>"
+  [(set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_pop_up_to_s2_operand" "I")))
+   (set (reg:X S2_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I"))))
+   (set (reg:X S1_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I"))))
+   (set (reg:X S0_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I"))))
+   (set (reg:X RETURN_ADDR_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 4 "slot_3_offset_operand" "I"))))
+   (return)
+   (use (reg:SI RETURN_ADDR_REGNUM))]
+  "TARGET_ZCMP"
+  "cm.popret	{ra, s0-s2}, %0"
+)
+
+(define_insn "gpr_multi_popret_up_to_s3_<mode>"
+  [(set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_pop_up_to_s3_operand" "I")))
+   (set (reg:X S3_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I"))))
+   (set (reg:X S2_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I"))))
+   (set (reg:X S1_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I"))))
+   (set (reg:X S0_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 4 "slot_3_offset_operand" "I"))))
+   (set (reg:X RETURN_ADDR_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 5 "slot_4_offset_operand" "I"))))
+   (return)
+   (use (reg:SI RETURN_ADDR_REGNUM))]
+  "TARGET_ZCMP"
+  "cm.popret	{ra, s0-s3}, %0"
+)
+
+(define_insn "gpr_multi_popret_up_to_s4_<mode>"
+  [(set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_pop_up_to_s4_operand" "I")))
+   (set (reg:X S4_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I"))))
+   (set (reg:X S3_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I"))))
+   (set (reg:X S2_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I"))))
+   (set (reg:X S1_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 4 "slot_3_offset_operand" "I"))))
+   (set (reg:X S0_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 5 "slot_4_offset_operand" "I"))))
+   (set (reg:X RETURN_ADDR_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 6 "slot_5_offset_operand" "I"))))
+   (return)
+   (use (reg:SI RETURN_ADDR_REGNUM))]
+  "TARGET_ZCMP"
+  "cm.popret	{ra, s0-s4}, %0"
+)
+
+(define_insn "gpr_multi_popret_up_to_s5_<mode>"
+  [(set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_pop_up_to_s5_operand" "I")))
+   (set (reg:X S5_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I"))))
+   (set (reg:X S4_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I"))))
+   (set (reg:X S3_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I"))))
+   (set (reg:X S2_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 4 "slot_3_offset_operand" "I"))))
+   (set (reg:X S1_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 5 "slot_4_offset_operand" "I"))))
+   (set (reg:X S0_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 6 "slot_5_offset_operand" "I"))))
+   (set (reg:X RETURN_ADDR_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 7 "slot_6_offset_operand" "I"))))
+   (return)
+   (use (reg:SI RETURN_ADDR_REGNUM))]
+  "TARGET_ZCMP"
+  "cm.popret	{ra, s0-s5}, %0"
+)
+
+(define_insn "gpr_multi_popret_up_to_s6_<mode>"
+  [(set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_pop_up_to_s6_operand" "I")))
+   (set (reg:X S6_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I"))))
+   (set (reg:X S5_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I"))))
+   (set (reg:X S4_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I"))))
+   (set (reg:X S3_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 4 "slot_3_offset_operand" "I"))))
+   (set (reg:X S2_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 5 "slot_4_offset_operand" "I"))))
+   (set (reg:X S1_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 6 "slot_5_offset_operand" "I"))))
+   (set (reg:X S0_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 7 "slot_6_offset_operand" "I"))))
+   (set (reg:X RETURN_ADDR_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 8 "slot_7_offset_operand" "I"))))
+   (return)
+   (use (reg:SI RETURN_ADDR_REGNUM))]
+  "TARGET_ZCMP"
+  "cm.popret	{ra, s0-s6}, %0"
+)
+
+(define_insn "gpr_multi_popret_up_to_s7_<mode>"
+  [(set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_pop_up_to_s7_operand" "I")))
+   (set (reg:X S7_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I"))))
+   (set (reg:X S6_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I"))))
+   (set (reg:X S5_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I"))))
+   (set (reg:X S4_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 4 "slot_3_offset_operand" "I"))))
+   (set (reg:X S3_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 5 "slot_4_offset_operand" "I"))))
+   (set (reg:X S2_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 6 "slot_5_offset_operand" "I"))))
+   (set (reg:X S1_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 7 "slot_6_offset_operand" "I"))))
+   (set (reg:X S0_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 8 "slot_7_offset_operand" "I"))))
+   (set (reg:X RETURN_ADDR_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 9 "slot_8_offset_operand" "I"))))
+   (return)
+   (use (reg:SI RETURN_ADDR_REGNUM))]
+  "TARGET_ZCMP"
+  "cm.popret	{ra, s0-s7}, %0"
+)
+
+(define_insn "gpr_multi_popret_up_to_s8_<mode>"
+  [(set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_pop_up_to_s8_operand" "I")))
+   (set (reg:X S8_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I"))))
+   (set (reg:X S7_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I"))))
+   (set (reg:X S6_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I"))))
+   (set (reg:X S5_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 4 "slot_3_offset_operand" "I"))))
+   (set (reg:X S4_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 5 "slot_4_offset_operand" "I"))))
+   (set (reg:X S3_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 6 "slot_5_offset_operand" "I"))))
+   (set (reg:X S2_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 7 "slot_6_offset_operand" "I"))))
+   (set (reg:X S1_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 8 "slot_7_offset_operand" "I"))))
+   (set (reg:X S0_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 9 "slot_8_offset_operand" "I"))))
+   (set (reg:X RETURN_ADDR_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 10 "slot_9_offset_operand" "I"))))
+   (return)
+   (use (reg:SI RETURN_ADDR_REGNUM))]
+  "TARGET_ZCMP"
+  "cm.popret	{ra, s0-s8}, %0"
+)
+
+(define_insn "gpr_multi_popret_up_to_s9_<mode>"
+  [(set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_pop_up_to_s9_operand" "I")))
+   (set (reg:X S9_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I"))))
+   (set (reg:X S8_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I"))))
+   (set (reg:X S7_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I"))))
+   (set (reg:X S6_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 4 "slot_3_offset_operand" "I"))))
+   (set (reg:X S5_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 5 "slot_4_offset_operand" "I"))))
+   (set (reg:X S4_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 6 "slot_5_offset_operand" "I"))))
+   (set (reg:X S3_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 7 "slot_6_offset_operand" "I"))))
+   (set (reg:X S2_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 8 "slot_7_offset_operand" "I"))))
+   (set (reg:X S1_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 9 "slot_8_offset_operand" "I"))))
+   (set (reg:X S0_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 10 "slot_9_offset_operand" "I"))))
+   (set (reg:X RETURN_ADDR_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 11 "slot_10_offset_operand" "I"))))
+   (return)
+   (use (reg:SI RETURN_ADDR_REGNUM))]
+  "TARGET_ZCMP"
+  "cm.popret	{ra, s0-s9}, %0"
+)
+
+(define_insn "gpr_multi_popret_up_to_s11_<mode>"
+  [(set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_pop_up_to_s11_operand" "I")))
+   (set (reg:X S11_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I"))))
+   (set (reg:X S10_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I"))))
+   (set (reg:X S9_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I"))))
+   (set (reg:X S8_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 4 "slot_3_offset_operand" "I"))))
+   (set (reg:X S7_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 5 "slot_4_offset_operand" "I"))))
+   (set (reg:X S6_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 6 "slot_5_offset_operand" "I"))))
+   (set (reg:X S5_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 7 "slot_6_offset_operand" "I"))))
+   (set (reg:X S4_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 8 "slot_7_offset_operand" "I"))))
+   (set (reg:X S3_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 9 "slot_8_offset_operand" "I"))))
+   (set (reg:X S2_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 10 "slot_9_offset_operand" "I"))))
+   (set (reg:X S1_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 11 "slot_10_offset_operand" "I"))))
+   (set (reg:X S0_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 12 "slot_11_offset_operand" "I"))))
+   (set (reg:X RETURN_ADDR_REGNUM)
+        (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 13 "slot_12_offset_operand" "I"))))
+   (return)
+   (use (reg:SI RETURN_ADDR_REGNUM))]
+  "TARGET_ZCMP"
+  "cm.popret	{ra, s0-s11}, %0"
+)
+
+(define_insn "gpr_multi_push_up_to_ra_<mode>"
+  [(set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I")))
+        (reg:X RETURN_ADDR_REGNUM))
+   (set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_push_up_to_ra_operand" "I")))]
+  "TARGET_ZCMP"
+  "cm.push	{ra}, %0"
+)
+
+(define_insn "gpr_multi_push_up_to_s0_<mode>"
+  [(set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I")))
+        (reg:X S0_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I")))
+        (reg:X RETURN_ADDR_REGNUM))
+   (set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_push_up_to_s0_operand" "I")))]
+  "TARGET_ZCMP"
+  "cm.push	{ra, s0}, %0"
+)
+
+(define_insn "gpr_multi_push_up_to_s1_<mode>"
+  [(set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I")))
+        (reg:X S1_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I")))
+        (reg:X S0_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I")))
+        (reg:X RETURN_ADDR_REGNUM))
+   (set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_push_up_to_s1_operand" "I")))]
+  "TARGET_ZCMP"
+  "cm.push	{ra, s0-s1}, %0"
+)
+
+(define_insn "gpr_multi_push_up_to_s2_<mode>"
+  [(set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I")))
+        (reg:X S2_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I")))
+        (reg:X S1_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I")))
+        (reg:X S0_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 4 "slot_3_offset_operand" "I")))
+        (reg:X RETURN_ADDR_REGNUM))
+   (set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_push_up_to_s2_operand" "I")))]
+  "TARGET_ZCMP"
+  "cm.push	{ra, s0-s2}, %0"
+)
+
+(define_insn "gpr_multi_push_up_to_s3_<mode>"
+  [(set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I")))
+        (reg:X S3_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I")))
+        (reg:X S2_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I")))
+        (reg:X S1_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 4 "slot_3_offset_operand" "I")))
+        (reg:X S0_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 5 "slot_4_offset_operand" "I")))
+        (reg:X RETURN_ADDR_REGNUM))
+   (set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_push_up_to_s3_operand" "I")))]
+  "TARGET_ZCMP"
+  "cm.push	{ra, s0-s3}, %0"
+)
+
+(define_insn "gpr_multi_push_up_to_s4_<mode>"
+  [(set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I")))
+        (reg:X S4_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I")))
+        (reg:X S3_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I")))
+        (reg:X S2_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 4 "slot_3_offset_operand" "I")))
+        (reg:X S1_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 5 "slot_4_offset_operand" "I")))
+        (reg:X S0_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 6 "slot_5_offset_operand" "I")))
+        (reg:X RETURN_ADDR_REGNUM))
+   (set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_push_up_to_s4_operand" "I")))]
+  "TARGET_ZCMP"
+  "cm.push	{ra, s0-s4}, %0"
+)
+
+(define_insn "gpr_multi_push_up_to_s5_<mode>"
+  [(set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I")))
+        (reg:X S5_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I")))
+        (reg:X S4_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I")))
+        (reg:X S3_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 4 "slot_3_offset_operand" "I")))
+        (reg:X S2_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 5 "slot_4_offset_operand" "I")))
+        (reg:X S1_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 6 "slot_5_offset_operand" "I")))
+        (reg:X S0_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 7 "slot_6_offset_operand" "I")))
+        (reg:X RETURN_ADDR_REGNUM))
+   (set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_push_up_to_s5_operand" "I")))]
+  "TARGET_ZCMP"
+  "cm.push	{ra, s0-s5}, %0"
+)
+
+(define_insn "gpr_multi_push_up_to_s6_<mode>"
+  [(set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I")))
+        (reg:X S6_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I")))
+        (reg:X S5_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I")))
+        (reg:X S4_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 4 "slot_3_offset_operand" "I")))
+        (reg:X S3_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 5 "slot_4_offset_operand" "I")))
+        (reg:X S2_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 6 "slot_5_offset_operand" "I")))
+        (reg:X S1_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 7 "slot_6_offset_operand" "I")))
+        (reg:X S0_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 8 "slot_7_offset_operand" "I")))
+        (reg:X RETURN_ADDR_REGNUM))
+   (set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_push_up_to_s6_operand" "I")))]
+  "TARGET_ZCMP"
+  "cm.push	{ra, s0-s6}, %0"
+)
+
+(define_insn "gpr_multi_push_up_to_s7_<mode>"
+  [(set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I")))
+        (reg:X S7_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I")))
+        (reg:X S6_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I")))
+        (reg:X S5_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 4 "slot_3_offset_operand" "I")))
+        (reg:X S4_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 5 "slot_4_offset_operand" "I")))
+        (reg:X S3_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 6 "slot_5_offset_operand" "I")))
+        (reg:X S2_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 7 "slot_6_offset_operand" "I")))
+        (reg:X S1_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 8 "slot_7_offset_operand" "I")))
+        (reg:X S0_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 9 "slot_8_offset_operand" "I")))
+        (reg:X RETURN_ADDR_REGNUM))
+   (set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_push_up_to_s7_operand" "I")))]
+  "TARGET_ZCMP"
+  "cm.push	{ra, s0-s7}, %0"
+)
+
+(define_insn "gpr_multi_push_up_to_s8_<mode>"
+  [(set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I")))
+        (reg:X S8_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I")))
+        (reg:X S7_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I")))
+        (reg:X S6_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 4 "slot_3_offset_operand" "I")))
+        (reg:X S5_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 5 "slot_4_offset_operand" "I")))
+        (reg:X S4_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 6 "slot_5_offset_operand" "I")))
+        (reg:X S3_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 7 "slot_6_offset_operand" "I")))
+        (reg:X S2_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 8 "slot_7_offset_operand" "I")))
+        (reg:X S1_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 9 "slot_8_offset_operand" "I")))
+        (reg:X S0_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 10 "slot_9_offset_operand" "I")))
+        (reg:X RETURN_ADDR_REGNUM))
+   (set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_push_up_to_s8_operand" "I")))]
+  "TARGET_ZCMP"
+  "cm.push	{ra, s0-s8}, %0"
+)
+
+(define_insn "gpr_multi_push_up_to_s9_<mode>"
+  [(set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I")))
+        (reg:X S9_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I")))
+        (reg:X S8_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I")))
+        (reg:X S7_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 4 "slot_3_offset_operand" "I")))
+        (reg:X S6_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 5 "slot_4_offset_operand" "I")))
+        (reg:X S5_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 6 "slot_5_offset_operand" "I")))
+        (reg:X S4_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 7 "slot_6_offset_operand" "I")))
+        (reg:X S3_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 8 "slot_7_offset_operand" "I")))
+        (reg:X S2_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 9 "slot_8_offset_operand" "I")))
+        (reg:X S1_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 10 "slot_9_offset_operand" "I")))
+        (reg:X S0_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 11 "slot_10_offset_operand" "I")))
+        (reg:X RETURN_ADDR_REGNUM))
+   (set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_push_up_to_s9_operand" "I")))]
+  "TARGET_ZCMP"
+  "cm.push	{ra, s0-s9}, %0"
+)
+
+(define_insn "gpr_multi_push_up_to_s11_<mode>"
+  [(set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 1 "slot_0_offset_operand" "I")))
+        (reg:X S11_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 2 "slot_1_offset_operand" "I")))
+        (reg:X S10_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 3 "slot_2_offset_operand" "I")))
+        (reg:X S9_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 4 "slot_3_offset_operand" "I")))
+        (reg:X S8_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 5 "slot_4_offset_operand" "I")))
+        (reg:X S7_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 6 "slot_5_offset_operand" "I")))
+        (reg:X S6_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 7 "slot_6_offset_operand" "I")))
+        (reg:X S5_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 8 "slot_7_offset_operand" "I")))
+        (reg:X S4_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 9 "slot_8_offset_operand" "I")))
+        (reg:X S3_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 10 "slot_9_offset_operand" "I")))
+        (reg:X S2_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 11 "slot_10_offset_operand" "I")))
+        (reg:X S1_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 12 "slot_11_offset_operand" "I")))
+        (reg:X S0_REGNUM))
+   (set (mem:X (plus:X (reg:X SP_REGNUM)
+                       (match_operand:X 13 "slot_12_offset_operand" "I")))
+        (reg:X RETURN_ADDR_REGNUM))
+   (set (reg:X SP_REGNUM)
+        (plus:X (reg:X SP_REGNUM)
+                 (match_operand 0 "stack_push_up_to_s11_operand" "I")))]
+  "TARGET_ZCMP"
+  "cm.push	{ra, s0-s11}, %0"
+)
diff --git a/gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c b/gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c
new file mode 100644
index 00000000000..6dbe489da9b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c
@@ -0,0 +1,239 @@ 
+/* { dg-do compile } */
+/* { dg-options " -Os -march=rv32e_zca_zcmp -mabi=ilp32e -mcmodel=medlow" } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-O1" "-O2" "-Og" "-O3" "-Oz" "-flto"} } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+char my_getchar();
+float getf();
+int __attribute__((noinline)) incoming_stack_args
+  (int arg0, int arg1, int arg2, int arg3,
+   int arg4, int arg5, int arg6, int arg7, int arg8);
+int getint();
+void PrintInts (int n, ...); // varargs
+void __attribute__((noinline)) PrintIntsNoVaStart (int n, ...); // varargs
+void PrintInts2 (int arg0, int arg1, int arg2, int arg3, int arg4, int arg5, int n, ...);
+extern void f1(void);
+extern void f2(void);
+
+/*
+**test1:
+**	...
+**	cm.push	{ra, s0-s1}, -64
+**	...
+**	cm.popret	{ra, s0-s1}, 64
+**	...
+*/
+int test1()
+{
+  char volatile array[3120];
+  float volatile farray[3120];
+
+  float sum = 0;
+  for (int i = 0; i < 3120; i++)
+  {
+    array[i] = my_getchar();
+    farray[i] = my_getchar() * 1.2;
+    sum += array[i] + farray[i];
+  }
+  return sum;
+}
+
+/*
+**test2_step1_0_size:
+**	...
+**	cm.push	{ra, s0}, -64
+**	...
+**	cm.popret	{ra, s0}, 64
+**	...
+*/
+int test2_step1_0_size()
+{
+  int volatile iarray[3120 + 1824/4 -8];
+
+  for (int i = 0; i < 3120 + 1824/4 - 8; i++)
+  {
+    iarray[i] = my_getchar() * 2;
+  }
+  return iarray[0] + iarray[1];
+}
+
+/*
+**test3:
+**	...
+**	cm.push	{ra, s0-s1}, -64
+**	...
+**	cm.popret	{ra, s0-s1}, 64
+**	...
+*/
+float test3()
+{
+  char volatile array[3120];
+  float volatile farray[3120];
+
+  float sum = 0, f1 = 0, f2 = 0, f3 = 0, f4 = 0, f5 = 0, f6 = 0, f7 = 0;
+
+  for (int i = 0; i < 3120; i++)
+  {
+    f1 = getf();
+    f2 = getf();
+    f3 = getf();
+    f4 = getf();
+    array[i] = my_getchar();
+    farray[i] = my_getchar() * 1.2;
+    sum += array[i] + farray[i] + f1 + f2 + f3 + f4;
+  }
+  return sum;
+}
+
+/*
+**outgoing_stack_args:
+**	...
+**	cm.push	{ra, s0}, -32
+**	...
+**	cm.popret	{ra, s0}, 32
+**	...
+*/
+int outgoing_stack_args()
+{
+  int  local = getint();
+  return local +incoming_stack_args(0, 1, 2, 3, 4, 5, 6, 7, 8);
+}
+
+/*
+**callPrintInts:
+**	...
+**	cm.push	{ra}, -32
+**	...
+**	cm.popret	{ra}, 32
+**	...
+*/
+float callPrintInts()
+{
+  volatile float f = getf(); // f in local
+  PrintInts(9,1,2,3,4,5,6,7,8,9);
+  return f;
+}
+
+/*
+**callPrint:
+**	...
+**	cm.push	{ra}, -32
+**	...
+**	cm.popret	{ra}, 32
+**	...
+*/
+float callPrint()
+{
+  volatile float f = getf(); // f in local
+  PrintIntsNoVaStart(0,1,2,3,4,5,6,7,8,9);
+  return f;
+}
+
+/*
+**callPrint_S:
+**	...
+**	cm.push	{ra, s0}, -32
+**	...
+**	cm.popret	{ra, s0}, 32
+**	...
+*/
+float callPrint_S()
+{
+  float f = getf();
+  PrintIntsNoVaStart(0,1,2,3,4,5,6,7,8,9);
+  return f;
+}
+
+/*
+**callPrint_2:
+**	...
+**	cm.push	{ra, s0}, -32
+**	...
+**	cm.popret	{ra, s0}, 32
+**	...
+*/
+float callPrint_2()
+{
+  float f = getf();
+  PrintInts2(0,1,2,3,4,5,6,7,8,9);
+  return f;
+}
+
+/*
+**test_step1_0bytes_save_restore:
+**	...
+**	cm.push	{ra}, -16
+**	...
+**	cm.popret	{ra}, 16
+**	...
+*/
+int test_step1_0bytes_save_restore()
+{
+
+  int a  =  9;
+  int b  =  my_getchar();
+  return a +b;
+}
+
+/*
+**test_s0:
+**	...
+**	cm.push	{ra, s0}, -16
+**	...
+**	cm.popret	{ra, s0}, 16
+**	...
+*/
+int test_s0()
+{
+
+  int a  =  my_getchar();
+  int b  =  my_getchar();
+  return a +b;
+}
+
+/*
+**test_s1:
+**	...
+**	cm.push	{ra, s0-s1}, -16
+**	...
+**	cm.popret	{ra, s0-s1}, 16
+**	...
+*/
+int test_s1()
+{
+
+  int s0  =  my_getchar();
+  int s1  =  my_getchar();
+  int b  =  my_getchar();
+  return s1 +s0 +b;
+}
+
+/*
+**test_f0:
+**	...
+**	cm.push	{ra, s0-s1}, -16
+**	...
+**	cm.popret	{ra, s0-s1}, 16
+**	...
+*/
+int test_f0()
+{
+
+  int s0  =  my_getchar();
+  float f0  =  getf(); 
+  int b  =  my_getchar();
+  return f0 +s0 +b;
+}
+
+/*
+**foo:
+**	cm.push	{ra}, -16
+**	call	f1
+**	cm.pop	{ra}, 16
+**	tail	f2
+*/
+void foo(void)
+{
+  f1();
+  f2();
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c b/gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c
new file mode 100644
index 00000000000..924197cb3c4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c
@@ -0,0 +1,239 @@ 
+/* { dg-do compile } */
+/* { dg-options " -Os -march=rv32imaf_zca_zcmp -mabi=ilp32f -mcmodel=medlow" } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-O1" "-O2" "-Og" "-O3" "-Oz" "-flto"} } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+char my_getchar();
+float getf();
+int __attribute__((noinline)) incoming_stack_args
+  (int arg0, int arg1, int arg2, int arg3,
+   int arg4, int arg5, int arg6, int arg7, int arg8);
+int getint();
+void PrintInts (int n, ...); // varargs
+void __attribute__((noinline)) PrintIntsNoVaStart (int n, ...); // varargs
+void PrintInts2 (int arg0, int arg1, int arg2, int arg3, int arg4, int arg5, int n, ...);
+extern void f1(void);
+extern void f2(void);
+
+/*
+**test1:
+**	...
+**	cm.push	{ra, s0-s4}, -80
+**	...
+**	cm.popret	{ra, s0-s4}, 80
+**	...
+*/
+int test1()
+{
+  char volatile array[3120];
+  float volatile farray[3120];
+
+  float sum = 0;
+  for (int i = 0; i < 3120; i++)
+  {
+    array[i] = my_getchar();
+    farray[i] = my_getchar() * 1.2;
+    sum += array[i] + farray[i];
+  }
+  return sum;
+}
+
+/*
+**test2_step1_0_size:
+**	...
+**	cm.push	{ra, s0-s1}, -64
+**	...
+**	cm.popret	{ra, s0-s1}, 64
+**	...
+*/
+int test2_step1_0_size()
+{
+  int volatile iarray[3120 + 1824/4 -8];
+
+  for (int i = 0; i < 3120 + 1824/4 - 8; i++)
+  {
+    iarray[i] = my_getchar() * 2;
+  }
+  return iarray[0] + iarray[1];
+}
+
+/*
+**test3:
+**	...
+**	cm.push	{ra, s0-s4}, -80
+**	...
+**	cm.popret	{ra, s0-s4}, 80
+**	...
+*/
+float test3()
+{
+  char volatile array[3120];
+  float volatile farray[3120];
+
+  float sum = 0, f1 = 0, f2 = 0, f3 = 0, f4 = 0, f5 = 0, f6 = 0, f7 = 0;
+
+  for (int i = 0; i < 3120; i++)
+  {
+    f1 = getf();
+    f2 = getf();
+    f3 = getf();
+    f4 = getf();
+    array[i] = my_getchar();
+    farray[i] = my_getchar() * 1.2;
+    sum += array[i] + farray[i] + f1 + f2 + f3 + f4;
+  }
+  return sum;
+}
+
+/*
+**outgoing_stack_args:
+**	...
+**	cm.push	{ra, s0}, -32
+**	...
+**	cm.popret	{ra, s0}, 32
+**	...
+*/
+int outgoing_stack_args()
+{
+  int  local = getint();
+  return local +incoming_stack_args(0, 1, 2, 3, 4, 5, 6, 7, 8);
+}
+
+/*
+**callPrintInts:
+**	...
+**	cm.push	{ra}, -48
+**	...
+**	cm.popret	{ra}, 48
+**	...
+*/
+float callPrintInts()
+{
+  volatile float f = getf(); // f in local
+  PrintInts(9,1,2,3,4,5,6,7,8,9);
+  return f;
+}
+
+/*
+**callPrint:
+**	...
+**	cm.push	{ra}, -48
+**	...
+**	cm.popret	{ra}, 48
+**	...
+*/
+float callPrint()
+{
+  volatile float f = getf(); // f in local
+  PrintIntsNoVaStart(0,1,2,3,4,5,6,7,8,9);
+  return f;
+}
+
+/*
+**callPrint_S:
+**	...
+**	cm.push	{ra}, -48
+**	...
+**	cm.popret	{ra}, 48
+**	...
+*/
+float callPrint_S()
+{
+  float f = getf();
+  PrintIntsNoVaStart(0,1,2,3,4,5,6,7,8,9);
+  return f;
+}
+
+/*
+**callPrint_2:
+**	...
+**	cm.push	{ra}, -48
+**	...
+**	cm.popret	{ra}, 48
+**	...
+*/
+float callPrint_2()
+{
+  float f = getf();
+  PrintInts2(0,1,2,3,4,5,6,7,8,9);
+  return f;
+}
+
+/*
+**test_step1_0bytes_save_restore:
+**	...
+**	cm.push	{ra}, -16
+**	...
+**	cm.popret	{ra}, 16
+**	...
+*/
+int test_step1_0bytes_save_restore()
+{
+
+  int a  =  9;
+  int b  =  my_getchar();
+  return a +b;
+}
+
+/*
+**test_s0:
+**	...
+**	cm.push	{ra, s0}, -16
+**	...
+**	cm.popret	{ra, s0}, 16
+**	...
+*/
+int test_s0()
+{
+
+  int a  =  my_getchar();
+  int b  =  my_getchar();
+  return a +b;
+}
+
+/*
+**test_s1:
+**	...
+**	cm.push	{ra, s0-s1}, -16
+**	...
+**	cm.popret	{ra, s0-s1}, 16
+**	...
+*/
+int test_s1()
+{
+
+  int s0  =  my_getchar();
+  int s1  =  my_getchar();
+  int b  =  my_getchar();
+  return s1 +s0 +b;
+}
+
+/*
+**test_f0:
+**	...
+**	cm.push	{ra, s0}, -32
+**	...
+**	cm.popret	{ra, s0}, 32
+**	...
+*/
+int test_f0()
+{
+
+  int s0  =  my_getchar();
+  float f0  =  getf(); 
+  int b  =  my_getchar();
+  return f0 +s0 +b;
+}
+
+/*
+**foo:
+**	cm.push	{ra}, -16
+**	call	f1
+**	cm.pop	{ra}, 16
+**	tail	f2
+*/
+void foo(void)
+{
+  f1();
+  f2();
+}
diff --git a/gcc/testsuite/gcc.target/riscv/zcmp_stack_alignment.c b/gcc/testsuite/gcc.target/riscv/zcmp_stack_alignment.c
new file mode 100644
index 00000000000..8e481522f89
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zcmp_stack_alignment.c
@@ -0,0 +1,23 @@ 
+/* { dg-do compile } */
+/* { dg-options " -O0 -march=rv32e_zca_zcb_zcmp -mabi=ilp32e -mcmodel=medlow -fomit-frame-pointer" } */
+/* { dg-skip-if "" { *-*-* } {"-O2" "-O1" "-Os" "-Og" "-O3" "-Oz" "-flto"} } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+void bar();
+
+/*
+**fool_rv32e:
+**	cm.push	{ra}, -32
+**	...
+**	call	bar
+**	...
+**	lw	a5,32\(sp\)
+**	...
+**	cm.popret	{ra}, 32
+*/
+int fool_rv32e ( int a0, int a1, int a2, int a3, int a4, int a5,
+                  int incoming0)
+{
+  bar();
+  return a0 + a1 + a2 + a3 + a4 + a5 + incoming0;
+}
\ No newline at end of file