RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

Message ID 20230428152102.1653600-1-pan2.li@intel.com
State Accepted
Headers
Series RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET |

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

Li, Pan2 via Gcc-patches April 28, 2023, 3:21 p.m. UTC
  From: Pan Li <pan2.li@intel.com>

When some RVV integer compare operators act on the same vector registers
without mask. They can be simplified to VMSET.

This PATCH allows the eq, le, leu, ge, geu to perform such kind of the
simplification by adding one macro in riscv for simplify rtx.

Given we have:
vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl)
{
  return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl);
}

Before this patch:
vsetvli  zero,a2,e8,m8,ta,ma
vl8re8.v v8,0(a1)
vmseq.vv v8,v8,v8
vsetvli  a5,zero,e8,m8,ta,ma
vsm.v    v8,0(a0)
ret

After this patch:
vsetvli zero,a2,e8,m8,ta,ma
vmset.m v1                  <- optimized to vmset.m
vsetvli a5,zero,e8,m8,ta,ma
vsm.v   v1,0(a0)
ret

As above, we may have one instruction eliminated and require less vector
registers.

Signed-off-by: Pan Li <pan2.li@intel.com>

gcc/ChangeLog:

	* config/riscv/riscv.h (VECTOR_STORE_FLAG_VALUE): Add new macro
	  consumed by simplify_rtx.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c:
	  Adjust test check condition.
---
 gcc/config/riscv/riscv.h                                    | 5 +++++
 .../riscv/rvv/base/integer_compare_insn_shortcut.c          | 6 +-----
 2 files changed, 6 insertions(+), 5 deletions(-)
  

Comments

Jeff Law April 28, 2023, 9:47 p.m. UTC | #1
On 4/28/23 09:21, Pan Li via Gcc-patches wrote:
> From: Pan Li <pan2.li@intel.com>
> 
> When some RVV integer compare operators act on the same vector registers
> without mask. They can be simplified to VMSET.
> 
> This PATCH allows the eq, le, leu, ge, geu to perform such kind of the
> simplification by adding one macro in riscv for simplify rtx.
> 
> Given we have:
> vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl)
> {
>    return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl);
> }
> 
> Before this patch:
> vsetvli  zero,a2,e8,m8,ta,ma
> vl8re8.v v8,0(a1)
> vmseq.vv v8,v8,v8
> vsetvli  a5,zero,e8,m8,ta,ma
> vsm.v    v8,0(a0)
> ret
> 
> After this patch:
> vsetvli zero,a2,e8,m8,ta,ma
> vmset.m v1                  <- optimized to vmset.m
> vsetvli a5,zero,e8,m8,ta,ma
> vsm.v   v1,0(a0)
> ret
> 
> As above, we may have one instruction eliminated and require less vector
> registers.
> 
> Signed-off-by: Pan Li <pan2.li@intel.com>
> 
> gcc/ChangeLog:
> 
> 	* config/riscv/riscv.h (VECTOR_STORE_FLAG_VALUE): Add new macro
> 	  consumed by simplify_rtx.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c:
> 	  Adjust test check condition.
I'm not sure this is 100% correct.

What happens to the high bits in the resultant mask register?  My 
understanding is we have one output bit per input element in the 
comparison.  So unless the number of elements matches the bit width of 
the mask register, this isn't going to work.

Am I missing something?

Jeff
  
Li, Pan2 via Gcc-patches April 29, 2023, 2:55 a.m. UTC | #2
Thanks Jeff for comments.

It makes sense to me. For the EQ operator we should have CONSTM1. Does this mean s390 parts has similar issue here? Then for instructions like VMSEQ, we need to adjust the simplify_rtx up to a point.

Please help to correct me if any mistake. Thank you again.

Pan

-----Original Message-----
From: Jeff Law <jeffreyalaw@gmail.com> 
Sent: Saturday, April 29, 2023 5:48 AM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; Wang, Yanzhang <yanzhang.wang@intel.com>
Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET



On 4/28/23 09:21, Pan Li via Gcc-patches wrote:
> From: Pan Li <pan2.li@intel.com>
> 
> When some RVV integer compare operators act on the same vector 
> registers without mask. They can be simplified to VMSET.
> 
> This PATCH allows the eq, le, leu, ge, geu to perform such kind of the 
> simplification by adding one macro in riscv for simplify rtx.
> 
> Given we have:
> vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl) 
> {
>    return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl); }
> 
> Before this patch:
> vsetvli  zero,a2,e8,m8,ta,ma
> vl8re8.v v8,0(a1)
> vmseq.vv v8,v8,v8
> vsetvli  a5,zero,e8,m8,ta,ma
> vsm.v    v8,0(a0)
> ret
> 
> After this patch:
> vsetvli zero,a2,e8,m8,ta,ma
> vmset.m v1                  <- optimized to vmset.m
> vsetvli a5,zero,e8,m8,ta,ma
> vsm.v   v1,0(a0)
> ret
> 
> As above, we may have one instruction eliminated and require less 
> vector registers.
> 
> Signed-off-by: Pan Li <pan2.li@intel.com>
> 
> gcc/ChangeLog:
> 
> 	* config/riscv/riscv.h (VECTOR_STORE_FLAG_VALUE): Add new macro
> 	  consumed by simplify_rtx.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c:
> 	  Adjust test check condition.
I'm not sure this is 100% correct.

What happens to the high bits in the resultant mask register?  My understanding is we have one output bit per input element in the comparison.  So unless the number of elements matches the bit width of the mask register, this isn't going to work.

Am I missing something?

Jeff
  
Li, Pan2 via Gcc-patches April 29, 2023, 1:35 p.m. UTC | #3
Hi Jeff

Just have a try in simplify_rtx for this optimization in PATCH v2. Could you please help to share any idea about this when you free? Thank you!

https://gcc.gnu.org/pipermail/gcc-patches/2023-April/617117.html

Pan

-----Original Message-----
From: Li, Pan2 
Sent: Saturday, April 29, 2023 10:55 AM
To: Jeff Law <jeffreyalaw@gmail.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; Wang, Yanzhang <yanzhang.wang@intel.com>
Subject: RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

Thanks Jeff for comments.

It makes sense to me. For the EQ operator we should have CONSTM1. Does this mean s390 parts has similar issue here? Then for instructions like VMSEQ, we need to adjust the simplify_rtx up to a point.

Please help to correct me if any mistake. Thank you again.

Pan

-----Original Message-----
From: Jeff Law <jeffreyalaw@gmail.com> 
Sent: Saturday, April 29, 2023 5:48 AM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; Wang, Yanzhang <yanzhang.wang@intel.com>
Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET



On 4/28/23 09:21, Pan Li via Gcc-patches wrote:
> From: Pan Li <pan2.li@intel.com>
> 
> When some RVV integer compare operators act on the same vector 
> registers without mask. They can be simplified to VMSET.
> 
> This PATCH allows the eq, le, leu, ge, geu to perform such kind of the 
> simplification by adding one macro in riscv for simplify rtx.
> 
> Given we have:
> vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl) 
> {
>    return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl); }
> 
> Before this patch:
> vsetvli  zero,a2,e8,m8,ta,ma
> vl8re8.v v8,0(a1)
> vmseq.vv v8,v8,v8
> vsetvli  a5,zero,e8,m8,ta,ma
> vsm.v    v8,0(a0)
> ret
> 
> After this patch:
> vsetvli zero,a2,e8,m8,ta,ma
> vmset.m v1                  <- optimized to vmset.m
> vsetvli a5,zero,e8,m8,ta,ma
> vsm.v   v1,0(a0)
> ret
> 
> As above, we may have one instruction eliminated and require less 
> vector registers.
> 
> Signed-off-by: Pan Li <pan2.li@intel.com>
> 
> gcc/ChangeLog:
> 
> 	* config/riscv/riscv.h (VECTOR_STORE_FLAG_VALUE): Add new macro
> 	  consumed by simplify_rtx.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c:
> 	  Adjust test check condition.
I'm not sure this is 100% correct.

What happens to the high bits in the resultant mask register?  My understanding is we have one output bit per input element in the comparison.  So unless the number of elements matches the bit width of the mask register, this isn't going to work.

Am I missing something?

Jeff
  
Jeff Law April 29, 2023, 3:05 p.m. UTC | #4
On 4/28/23 20:55, Li, Pan2 wrote:
> Thanks Jeff for comments.
> 
> It makes sense to me. For the EQ operator we should have CONSTM1. 
That's not the way I interpret the RVV documentation.  Of course it's 
not terribly clear.    I guess one could do some experiments with qemu 
or try to dig into the sail code and figure out the intent from those.



Does this mean s390 parts has similar issue here? Then for instructions 
like VMSEQ, we need to adjust the simplify_rtx up to a point.
You'd have to refer to the s390 instruction set reference to understand 
precisely how the vector compares work.

But as it stands this really isn't a simplify-rtx question, but a 
question of the semantics of risc-v.   What happens with the high bits 
in the destination mask register is critical -- and if risc-v doesn't 
set them to all ones in this case, then that would mean that defining 
that macro is simply wrong for risc-v.

jeff
  
Andrew Waterman April 29, 2023, 5:21 p.m. UTC | #5
On Sat, Apr 29, 2023 at 8:06 AM Jeff Law via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 4/28/23 20:55, Li, Pan2 wrote:
> > Thanks Jeff for comments.
> >
> > It makes sense to me. For the EQ operator we should have CONSTM1.
> That's not the way I interpret the RVV documentation.  Of course it's
> not terribly clear.    I guess one could do some experiments with qemu
> or try to dig into the sail code and figure out the intent from those.
>
>
>
> Does this mean s390 parts has similar issue here? Then for instructions
> like VMSEQ, we need to adjust the simplify_rtx up to a point.
> You'd have to refer to the s390 instruction set reference to understand
> precisely how the vector compares work.
>
> But as it stands this really isn't a simplify-rtx question, but a
> question of the semantics of risc-v.   What happens with the high bits
> in the destination mask register is critical -- and if risc-v doesn't
> set them to all ones in this case, then that would mean that defining
> that macro is simply wrong for risc-v.

The relevant statement in the spec is that "the tail elements are always
updated with a tail-agnostic policy".  The vmset.m instruction will cause
mask register bits [0, vl-1] to be set to 1; elements [vl, VLMAX-1] will
either be undisturbed or set to 1, i.e., effectively unspecified.

>
> jeff
  
Palmer Dabbelt April 29, 2023, 5:28 p.m. UTC | #6
On Sat, 29 Apr 2023 10:21:53 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
> On Sat, Apr 29, 2023 at 8:06 AM Jeff Law via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
>>
>>
>>
>> On 4/28/23 20:55, Li, Pan2 wrote:
>> > Thanks Jeff for comments.
>> >
>> > It makes sense to me. For the EQ operator we should have CONSTM1.
>> That's not the way I interpret the RVV documentation.  Of course it's
>> not terribly clear.    I guess one could do some experiments with qemu
>> or try to dig into the sail code and figure out the intent from those.

QEMU specifically takes advantage of the behavior Andrew is pointing out 
it the spec, and will soon do so more aggressively (assuming the patches 
Daniel just sent out get merged).

>> Does this mean s390 parts has similar issue here? Then for instructions
>> like VMSEQ, we need to adjust the simplify_rtx up to a point.
>> You'd have to refer to the s390 instruction set reference to understand
>> precisely how the vector compares work.
>>
>> But as it stands this really isn't a simplify-rtx question, but a
>> question of the semantics of risc-v.   What happens with the high bits
>> in the destination mask register is critical -- and if risc-v doesn't
>> set them to all ones in this case, then that would mean that defining
>> that macro is simply wrong for risc-v.
>
> The relevant statement in the spec is that "the tail elements are always
> updated with a tail-agnostic policy".  The vmset.m instruction will cause
> mask register bits [0, vl-1] to be set to 1; elements [vl, VLMAX-1] will
> either be undisturbed or set to 1, i.e., effectively unspecified.
>
>>
>> jeff
  
Jeff Law April 29, 2023, 5:46 p.m. UTC | #7
On 4/29/23 11:28, Palmer Dabbelt wrote:
> On Sat, 29 Apr 2023 10:21:53 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
>> On Sat, Apr 29, 2023 at 8:06 AM Jeff Law via Gcc-patches <
>> gcc-patches@gcc.gnu.org> wrote:
>>>
>>>
>>>
>>> On 4/28/23 20:55, Li, Pan2 wrote:
>>> > Thanks Jeff for comments.
>>> >
>>> > It makes sense to me. For the EQ operator we should have CONSTM1.
>>> That's not the way I interpret the RVV documentation.  Of course it's
>>> not terribly clear.    I guess one could do some experiments with qemu
>>> or try to dig into the sail code and figure out the intent from those.
> 
> QEMU specifically takes advantage of the behavior Andrew is pointing out 
> it the spec, and will soon do so more aggressively (assuming the patches 
> Daniel just sent out get merged).
Yea.  And taking advantage of that behavior is definitely a performance 
issue for QEMU.  There's still work to do though.  QEMU on vector code 
is running crazy slow.

jeff
  
Palmer Dabbelt April 29, 2023, 5:48 p.m. UTC | #8
On Sat, 29 Apr 2023 10:46:37 PDT (-0700), jeffreyalaw@gmail.com wrote:
>
>
> On 4/29/23 11:28, Palmer Dabbelt wrote:
>> On Sat, 29 Apr 2023 10:21:53 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
>>> On Sat, Apr 29, 2023 at 8:06 AM Jeff Law via Gcc-patches <
>>> gcc-patches@gcc.gnu.org> wrote:
>>>>
>>>>
>>>>
>>>> On 4/28/23 20:55, Li, Pan2 wrote:
>>>> > Thanks Jeff for comments.
>>>> >
>>>> > It makes sense to me. For the EQ operator we should have CONSTM1.
>>>> That's not the way I interpret the RVV documentation.  Of course it's
>>>> not terribly clear.    I guess one could do some experiments with qemu
>>>> or try to dig into the sail code and figure out the intent from those.
>>
>> QEMU specifically takes advantage of the behavior Andrew is pointing out
>> it the spec, and will soon do so more aggressively (assuming the patches
>> Daniel just sent out get merged).
> Yea.  And taking advantage of that behavior is definitely a performance
> issue for QEMU.  There's still work to do though.  QEMU on vector code
> is running crazy slow.

I guess we're kind of off the rails for a GCC patch, but that's 
definately true.  Across the board RVV is going to just need a lot of 
work, it's very different than SVE or AVX.

Unfortunately QEMU performance isn't really a priority on our end, but 
it's great to see folks digging into it.
  
Jeff Law April 29, 2023, 5:49 p.m. UTC | #9
On 4/29/23 11:21, Andrew Waterman wrote:

> 
> The relevant statement in the spec is that "the tail elements are always 
> updated with a tail-agnostic policy".  The vmset.m instruction will 
> cause mask register bits [0, vl-1] to be set to 1; elements [vl, 
> VLMAX-1] will either be undisturbed or set to 1, i.e., effectively 
> unspecified.
Makes sense.  Just have to stitch together bits from different locations 
in the manual.

The net being that I can't think we can define that macro for RISC-V in 
the way that Pan wants, the semantics just don't line up correctly.

jeff
  
Jeff Law April 29, 2023, 5:52 p.m. UTC | #10
On 4/29/23 11:48, Palmer Dabbelt wrote:

>> Yea.  And taking advantage of that behavior is definitely a performance
>> issue for QEMU.  There's still work to do though.  QEMU on vector code
>> is running crazy slow.
> 
> I guess we're kind of off the rails for a GCC patch, but that's 
> definately true.  Across the board RVV is going to just need a lot of 
> work, it's very different than SVE or AVX.
> 
> Unfortunately QEMU performance isn't really a priority on our end, but 
> it's great to see folks digging into it.
Well, when a user mode SPEC run goes from ~15 minutes to multiple hours 
for a single input workload within specint it becomes a development 
problem.  Daniel is loosely affiliated with my group in Ventana, so I 
can bug him with this kind of stuff.

jeff
  
Palmer Dabbelt April 29, 2023, 6:15 p.m. UTC | #11
On Sat, 29 Apr 2023 10:52:50 PDT (-0700), jeffreyalaw@gmail.com wrote:
>
>
> On 4/29/23 11:48, Palmer Dabbelt wrote:
>
>>> Yea.  And taking advantage of that behavior is definitely a performance
>>> issue for QEMU.  There's still work to do though.  QEMU on vector code
>>> is running crazy slow.
>>
>> I guess we're kind of off the rails for a GCC patch, but that's
>> definately true.  Across the board RVV is going to just need a lot of
>> work, it's very different than SVE or AVX.
>>
>> Unfortunately QEMU performance isn't really a priority on our end, but
>> it's great to see folks digging into it.
> Well, when a user mode SPEC run goes from ~15 minutes to multiple hours
> for a single input workload within specint it becomes a development
> problem.  Daniel is loosely affiliated with my group in Ventana, so I
> can bug him with this kind of stuff.

We've got another team actually doing the mechanics of the SPEC runs, we 
just do the compiler.  So while I guess it is a problem, it's not my 
problem ;)

Maybe not the best way to go about things, but there's only so much that 
can be done...
  
Kito Cheng April 30, 2023, 1:40 a.m. UTC | #12
Hi Jeff:

The RTL pattern already models tail element and vector length well,
so I don't feel the first version of Pan's patch has any problem?

Input RTL pattern:

#(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
#        (if_then_else:VNx2BI (unspec:VNx2BI [
#                    (const_vector:VNx2BI repeat [
#                            (const_int 1 [0x1])
#                        ])  # all-1 mask
#                    (reg:DI 143)  # AVL reg, or vector length
#                    (const_int 2 [0x2]) # mask policy
#                    (const_int 0 [0])   # avl type
#                    (reg:SI 66 vl)
#                    (reg:SI 67 vtype)
#                ] UNSPEC_VPREDICATE)
#            (geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
#                (reg/v:VNx2QI 137 [ v1 ]))
#            (unspec:VNx2BI [
#                    (reg:SI 0 zero)
#                ] UNSPEC_VUNDEF))) # maskoff and tail operand
#     (expr_list:REG_DEAD (reg:DI 143)
#        (expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
#            (nil))))

And the split pattern, only did on tail/maskoff element with undefined value:

(define_split
 [(set (match_operand:VB      0 "register_operand")
       (if_then_else:VB
         (unspec:VB
           [(match_operand:VB 1 "vector_all_trues_mask_operand")
            (match_operand    4 "vector_length_operand")
            (match_operand    5 "const_int_operand")
            (match_operand    6 "const_int_operand")
            (reg:SI VL_REGNUM)
            (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
         (match_operand:VB    3 "vector_move_operand")
         (match_operand:VB    2 "vector_undef_operand")))] # maskoff
and tail operand, only match undef value

Then it turns into vmset, and also discard mask policy operand (since
maskoff is undef means don't care IMO):

(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
       (if_then_else:VNx2BI (unspec:VNx2BI [
                   (const_vector:VNx2BI repeat [
                           (const_int 1 [0x1])
                       ])  # all-1 mask
                   (reg:DI 143) # AVL reg, or vector length
                   (const_int 2 [0x2]) # mask policy
                   (reg:SI 66 vl)
                   (reg:SI 67 vtype)
               ] UNSPEC_VPREDICATE)
           (const_vector:VNx2BI repeat [
                   (const_int 1 [0x1])
               ])    # all-1
           (unspec:VNx2BI [
                   (reg:SI 0 zero)
               ] UNSPEC_VUNDEF))) # still vundef
    (expr_list:REG_DEAD (reg:DI 143)
       (nil)))



On Sat, Apr 29, 2023 at 11:05 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 4/28/23 20:55, Li, Pan2 wrote:
> > Thanks Jeff for comments.
> >
> > It makes sense to me. For the EQ operator we should have CONSTM1.
> That's not the way I interpret the RVV documentation.  Of course it's
> not terribly clear.    I guess one could do some experiments with qemu
> or try to dig into the sail code and figure out the intent from those.
>
>
>
> Does this mean s390 parts has similar issue here? Then for instructions
> like VMSEQ, we need to adjust the simplify_rtx up to a point.
> You'd have to refer to the s390 instruction set reference to understand
> precisely how the vector compares work.
>
> But as it stands this really isn't a simplify-rtx question, but a
> question of the semantics of risc-v.   What happens with the high bits
> in the destination mask register is critical -- and if risc-v doesn't
> set them to all ones in this case, then that would mean that defining
> that macro is simply wrong for risc-v.
>
> jeff
  
Li, Pan2 via Gcc-patches April 30, 2023, 2:21 p.m. UTC | #13
Thanks all for comments. Summary what I have learned from the mail thread as below. Please feel free to correct me if any mistake.

1. The RVV VMSET has tail policy and the high bits of target register can be overridden to 1 or retain the value they held according to the ISA.
2. The semantics of tail policy is different with s390 according the macro comment " /* The truth element value for vector comparisons.  Our instructions always generate -1 in that case.  */ ".
3. We still have a lot of work to do for the RISC-V besides compiler.
4. The RTL pattern of PATCH v1 models tail policy and vector length as well.

Pan

-----Original Message-----
From: Kito Cheng <kito.cheng@sifive.com> 
Sent: Sunday, April 30, 2023 9:40 AM
To: Jeff Law <jeffreyalaw@gmail.com>
Cc: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; Andrew Waterman <andrew@sifive.com>
Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

Hi Jeff:

The RTL pattern already models tail element and vector length well, so I don't feel the first version of Pan's patch has any problem?

Input RTL pattern:

#(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
#        (if_then_else:VNx2BI (unspec:VNx2BI [
#                    (const_vector:VNx2BI repeat [
#                            (const_int 1 [0x1])
#                        ])  # all-1 mask
#                    (reg:DI 143)  # AVL reg, or vector length
#                    (const_int 2 [0x2]) # mask policy
#                    (const_int 0 [0])   # avl type
#                    (reg:SI 66 vl)
#                    (reg:SI 67 vtype)
#                ] UNSPEC_VPREDICATE)
#            (geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
#                (reg/v:VNx2QI 137 [ v1 ]))
#            (unspec:VNx2BI [
#                    (reg:SI 0 zero)
#                ] UNSPEC_VUNDEF))) # maskoff and tail operand
#     (expr_list:REG_DEAD (reg:DI 143)
#        (expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
#            (nil))))

And the split pattern, only did on tail/maskoff element with undefined value:

(define_split
 [(set (match_operand:VB      0 "register_operand")
       (if_then_else:VB
         (unspec:VB
           [(match_operand:VB 1 "vector_all_trues_mask_operand")
            (match_operand    4 "vector_length_operand")
            (match_operand    5 "const_int_operand")
            (match_operand    6 "const_int_operand")
            (reg:SI VL_REGNUM)
            (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
         (match_operand:VB    3 "vector_move_operand")
         (match_operand:VB    2 "vector_undef_operand")))] # maskoff
and tail operand, only match undef value

Then it turns into vmset, and also discard mask policy operand (since maskoff is undef means don't care IMO):

(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
       (if_then_else:VNx2BI (unspec:VNx2BI [
                   (const_vector:VNx2BI repeat [
                           (const_int 1 [0x1])
                       ])  # all-1 mask
                   (reg:DI 143) # AVL reg, or vector length
                   (const_int 2 [0x2]) # mask policy
                   (reg:SI 66 vl)
                   (reg:SI 67 vtype)
               ] UNSPEC_VPREDICATE)
           (const_vector:VNx2BI repeat [
                   (const_int 1 [0x1])
               ])    # all-1
           (unspec:VNx2BI [
                   (reg:SI 0 zero)
               ] UNSPEC_VUNDEF))) # still vundef
    (expr_list:REG_DEAD (reg:DI 143)
       (nil)))



On Sat, Apr 29, 2023 at 11:05 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 4/28/23 20:55, Li, Pan2 wrote:
> > Thanks Jeff for comments.
> >
> > It makes sense to me. For the EQ operator we should have CONSTM1.
> That's not the way I interpret the RVV documentation.  Of course it's
> not terribly clear.    I guess one could do some experiments with qemu
> or try to dig into the sail code and figure out the intent from those.
>
>
>
> Does this mean s390 parts has similar issue here? Then for 
> instructions like VMSEQ, we need to adjust the simplify_rtx up to a point.
> You'd have to refer to the s390 instruction set reference to 
> understand precisely how the vector compares work.
>
> But as it stands this really isn't a simplify-rtx question, but a
> question of the semantics of risc-v.   What happens with the high bits
> in the destination mask register is critical -- and if risc-v doesn't 
> set them to all ones in this case, then that would mean that defining 
> that macro is simply wrong for risc-v.
>
> jeff
  
Jeff Law May 2, 2023, 4:28 p.m. UTC | #14
On 4/29/23 19:40, Kito Cheng wrote:
> Hi Jeff:
> 
> The RTL pattern already models tail element and vector length well,
> so I don't feel the first version of Pan's patch has any problem?
> 
> Input RTL pattern:
> 
> #(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> #        (if_then_else:VNx2BI (unspec:VNx2BI [
> #                    (const_vector:VNx2BI repeat [
> #                            (const_int 1 [0x1])
> #                        ])  # all-1 mask
> #                    (reg:DI 143)  # AVL reg, or vector length
> #                    (const_int 2 [0x2]) # mask policy
> #                    (const_int 0 [0])   # avl type
> #                    (reg:SI 66 vl)
> #                    (reg:SI 67 vtype)
> #                ] UNSPEC_VPREDICATE)
> #            (geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
> #                (reg/v:VNx2QI 137 [ v1 ]))
> #            (unspec:VNx2BI [
> #                    (reg:SI 0 zero)
> #                ] UNSPEC_VUNDEF))) # maskoff and tail operand
> #     (expr_list:REG_DEAD (reg:DI 143)
> #        (expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
> #            (nil))))
> 
> And the split pattern, only did on tail/maskoff element with undefined value:
> 
> (define_split
>   [(set (match_operand:VB      0 "register_operand")
>         (if_then_else:VB
>           (unspec:VB
>             [(match_operand:VB 1 "vector_all_trues_mask_operand")
>              (match_operand    4 "vector_length_operand")
>              (match_operand    5 "const_int_operand")
>              (match_operand    6 "const_int_operand")
>              (reg:SI VL_REGNUM)
>              (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>           (match_operand:VB    3 "vector_move_operand")
>           (match_operand:VB    2 "vector_undef_operand")))] # maskoff
> and tail operand, only match undef value
> 
> Then it turns into vmset, and also discard mask policy operand (since
> maskoff is undef means don't care IMO):
> 
> (insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
>         (if_then_else:VNx2BI (unspec:VNx2BI [
>                     (const_vector:VNx2BI repeat [
>                             (const_int 1 [0x1])
>                         ])  # all-1 mask
>                     (reg:DI 143) # AVL reg, or vector length
>                     (const_int 2 [0x2]) # mask policy
>                     (reg:SI 66 vl)
>                     (reg:SI 67 vtype)
>                 ] UNSPEC_VPREDICATE)
>             (const_vector:VNx2BI repeat [
>                     (const_int 1 [0x1])
>                 ])    # all-1
>             (unspec:VNx2BI [
>                     (reg:SI 0 zero)
>                 ] UNSPEC_VUNDEF))) # still vundef
>      (expr_list:REG_DEAD (reg:DI 143)
>         (nil)))
Right.  My concern is that when we call relational_result it's going to 
return -1 (as a vector of bools) which bubbles up through the call 
chain.   If that doesn't match the actual register state after the 
instruction (irrespective of the tail policy), then we have the 
potential to generate incorrect code.

For example, if there's a subsequent instruction that tried to set a 
vector register to -1, it could just copy from the destination of the 
vmset to the new target.  But if the vmset didn't set all the bits to 1, 
then the code is wrong.

With all the UNSPECs in place, this may not be a problem in practice. 
Unsure.  I'm willing to defer to you on this Kito.

Jeff
  
Li, Pan2 via Gcc-patches May 3, 2023, 11:17 a.m. UTC | #15
Thanks all for comments, will work with kito to make it happen.

Pan

-----Original Message-----
From: Jeff Law <jeffreyalaw@gmail.com> 
Sent: Wednesday, May 3, 2023 12:28 AM
To: Kito Cheng <kito.cheng@sifive.com>
Cc: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; Andrew Waterman <andrew@sifive.com>
Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET



On 4/29/23 19:40, Kito Cheng wrote:
> Hi Jeff:
> 
> The RTL pattern already models tail element and vector length well, so 
> I don't feel the first version of Pan's patch has any problem?
> 
> Input RTL pattern:
> 
> #(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> #        (if_then_else:VNx2BI (unspec:VNx2BI [
> #                    (const_vector:VNx2BI repeat [
> #                            (const_int 1 [0x1])
> #                        ])  # all-1 mask
> #                    (reg:DI 143)  # AVL reg, or vector length
> #                    (const_int 2 [0x2]) # mask policy
> #                    (const_int 0 [0])   # avl type
> #                    (reg:SI 66 vl)
> #                    (reg:SI 67 vtype)
> #                ] UNSPEC_VPREDICATE)
> #            (geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
> #                (reg/v:VNx2QI 137 [ v1 ]))
> #            (unspec:VNx2BI [
> #                    (reg:SI 0 zero)
> #                ] UNSPEC_VUNDEF))) # maskoff and tail operand
> #     (expr_list:REG_DEAD (reg:DI 143)
> #        (expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
> #            (nil))))
> 
> And the split pattern, only did on tail/maskoff element with undefined value:
> 
> (define_split
>   [(set (match_operand:VB      0 "register_operand")
>         (if_then_else:VB
>           (unspec:VB
>             [(match_operand:VB 1 "vector_all_trues_mask_operand")
>              (match_operand    4 "vector_length_operand")
>              (match_operand    5 "const_int_operand")
>              (match_operand    6 "const_int_operand")
>              (reg:SI VL_REGNUM)
>              (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>           (match_operand:VB    3 "vector_move_operand")
>           (match_operand:VB    2 "vector_undef_operand")))] # maskoff
> and tail operand, only match undef value
> 
> Then it turns into vmset, and also discard mask policy operand (since 
> maskoff is undef means don't care IMO):
> 
> (insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
>         (if_then_else:VNx2BI (unspec:VNx2BI [
>                     (const_vector:VNx2BI repeat [
>                             (const_int 1 [0x1])
>                         ])  # all-1 mask
>                     (reg:DI 143) # AVL reg, or vector length
>                     (const_int 2 [0x2]) # mask policy
>                     (reg:SI 66 vl)
>                     (reg:SI 67 vtype)
>                 ] UNSPEC_VPREDICATE)
>             (const_vector:VNx2BI repeat [
>                     (const_int 1 [0x1])
>                 ])    # all-1
>             (unspec:VNx2BI [
>                     (reg:SI 0 zero)
>                 ] UNSPEC_VUNDEF))) # still vundef
>      (expr_list:REG_DEAD (reg:DI 143)
>         (nil)))
Right.  My concern is that when we call relational_result it's going to return -1 (as a vector of bools) which bubbles up through the call 
chain.   If that doesn't match the actual register state after the 
instruction (irrespective of the tail policy), then we have the potential to generate incorrect code.

For example, if there's a subsequent instruction that tried to set a vector register to -1, it could just copy from the destination of the vmset to the new target.  But if the vmset didn't set all the bits to 1, then the code is wrong.

With all the UNSPECs in place, this may not be a problem in practice. 
Unsure.  I'm willing to defer to you on this Kito.

Jeff
  
Li, Pan2 via Gcc-patches May 5, 2023, 12:30 p.m. UTC | #16
Hi kito,

Could you please help to share any suggestion about the PATCH? Comparing the V1 and V2.

Pan


-----Original Message-----
From: Li, Pan2 
Sent: Wednesday, May 3, 2023 7:18 PM
To: Jeff Law <jeffreyalaw@gmail.com>; Kito Cheng <kito.cheng@sifive.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; Andrew Waterman <andrew@sifive.com>
Subject: RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

Thanks all for comments, will work with kito to make it happen.

Pan

-----Original Message-----
From: Jeff Law <jeffreyalaw@gmail.com> 
Sent: Wednesday, May 3, 2023 12:28 AM
To: Kito Cheng <kito.cheng@sifive.com>
Cc: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; Andrew Waterman <andrew@sifive.com>
Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET



On 4/29/23 19:40, Kito Cheng wrote:
> Hi Jeff:
> 
> The RTL pattern already models tail element and vector length well, so 
> I don't feel the first version of Pan's patch has any problem?
> 
> Input RTL pattern:
> 
> #(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> #        (if_then_else:VNx2BI (unspec:VNx2BI [
> #                    (const_vector:VNx2BI repeat [
> #                            (const_int 1 [0x1])
> #                        ])  # all-1 mask
> #                    (reg:DI 143)  # AVL reg, or vector length
> #                    (const_int 2 [0x2]) # mask policy
> #                    (const_int 0 [0])   # avl type
> #                    (reg:SI 66 vl)
> #                    (reg:SI 67 vtype)
> #                ] UNSPEC_VPREDICATE)
> #            (geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
> #                (reg/v:VNx2QI 137 [ v1 ]))
> #            (unspec:VNx2BI [
> #                    (reg:SI 0 zero)
> #                ] UNSPEC_VUNDEF))) # maskoff and tail operand
> #     (expr_list:REG_DEAD (reg:DI 143)
> #        (expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
> #            (nil))))
> 
> And the split pattern, only did on tail/maskoff element with undefined value:
> 
> (define_split
>   [(set (match_operand:VB      0 "register_operand")
>         (if_then_else:VB
>           (unspec:VB
>             [(match_operand:VB 1 "vector_all_trues_mask_operand")
>              (match_operand    4 "vector_length_operand")
>              (match_operand    5 "const_int_operand")
>              (match_operand    6 "const_int_operand")
>              (reg:SI VL_REGNUM)
>              (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>           (match_operand:VB    3 "vector_move_operand")
>           (match_operand:VB    2 "vector_undef_operand")))] # maskoff
> and tail operand, only match undef value
> 
> Then it turns into vmset, and also discard mask policy operand (since 
> maskoff is undef means don't care IMO):
> 
> (insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
>         (if_then_else:VNx2BI (unspec:VNx2BI [
>                     (const_vector:VNx2BI repeat [
>                             (const_int 1 [0x1])
>                         ])  # all-1 mask
>                     (reg:DI 143) # AVL reg, or vector length
>                     (const_int 2 [0x2]) # mask policy
>                     (reg:SI 66 vl)
>                     (reg:SI 67 vtype)
>                 ] UNSPEC_VPREDICATE)
>             (const_vector:VNx2BI repeat [
>                     (const_int 1 [0x1])
>                 ])    # all-1
>             (unspec:VNx2BI [
>                     (reg:SI 0 zero)
>                 ] UNSPEC_VUNDEF))) # still vundef
>      (expr_list:REG_DEAD (reg:DI 143)
>         (nil)))
Right.  My concern is that when we call relational_result it's going to return -1 (as a vector of bools) which bubbles up through the call 
chain.   If that doesn't match the actual register state after the 
instruction (irrespective of the tail policy), then we have the potential to generate incorrect code.

For example, if there's a subsequent instruction that tried to set a vector register to -1, it could just copy from the destination of the vmset to the new target.  But if the vmset didn't set all the bits to 1, then the code is wrong.

With all the UNSPECs in place, this may not be a problem in practice. 
Unsure.  I'm willing to defer to you on this Kito.

Jeff
  
Kito Cheng May 5, 2023, 12:37 p.m. UTC | #17
I will take V1 and commit to trunk after my local test is done :)

On Fri, May 5, 2023 at 8:30 PM Li, Pan2 <pan2.li@intel.com> wrote:
>
> Hi kito,
>
> Could you please help to share any suggestion about the PATCH? Comparing the V1 and V2.
>
> Pan
>
>
> -----Original Message-----
> From: Li, Pan2
> Sent: Wednesday, May 3, 2023 7:18 PM
> To: Jeff Law <jeffreyalaw@gmail.com>; Kito Cheng <kito.cheng@sifive.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; Andrew Waterman <andrew@sifive.com>
> Subject: RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
>
> Thanks all for comments, will work with kito to make it happen.
>
> Pan
>
> -----Original Message-----
> From: Jeff Law <jeffreyalaw@gmail.com>
> Sent: Wednesday, May 3, 2023 12:28 AM
> To: Kito Cheng <kito.cheng@sifive.com>
> Cc: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; Andrew Waterman <andrew@sifive.com>
> Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
>
>
>
> On 4/29/23 19:40, Kito Cheng wrote:
> > Hi Jeff:
> >
> > The RTL pattern already models tail element and vector length well, so
> > I don't feel the first version of Pan's patch has any problem?
> >
> > Input RTL pattern:
> >
> > #(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> > #        (if_then_else:VNx2BI (unspec:VNx2BI [
> > #                    (const_vector:VNx2BI repeat [
> > #                            (const_int 1 [0x1])
> > #                        ])  # all-1 mask
> > #                    (reg:DI 143)  # AVL reg, or vector length
> > #                    (const_int 2 [0x2]) # mask policy
> > #                    (const_int 0 [0])   # avl type
> > #                    (reg:SI 66 vl)
> > #                    (reg:SI 67 vtype)
> > #                ] UNSPEC_VPREDICATE)
> > #            (geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
> > #                (reg/v:VNx2QI 137 [ v1 ]))
> > #            (unspec:VNx2BI [
> > #                    (reg:SI 0 zero)
> > #                ] UNSPEC_VUNDEF))) # maskoff and tail operand
> > #     (expr_list:REG_DEAD (reg:DI 143)
> > #        (expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
> > #            (nil))))
> >
> > And the split pattern, only did on tail/maskoff element with undefined value:
> >
> > (define_split
> >   [(set (match_operand:VB      0 "register_operand")
> >         (if_then_else:VB
> >           (unspec:VB
> >             [(match_operand:VB 1 "vector_all_trues_mask_operand")
> >              (match_operand    4 "vector_length_operand")
> >              (match_operand    5 "const_int_operand")
> >              (match_operand    6 "const_int_operand")
> >              (reg:SI VL_REGNUM)
> >              (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> >           (match_operand:VB    3 "vector_move_operand")
> >           (match_operand:VB    2 "vector_undef_operand")))] # maskoff
> > and tail operand, only match undef value
> >
> > Then it turns into vmset, and also discard mask policy operand (since
> > maskoff is undef means don't care IMO):
> >
> > (insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> >         (if_then_else:VNx2BI (unspec:VNx2BI [
> >                     (const_vector:VNx2BI repeat [
> >                             (const_int 1 [0x1])
> >                         ])  # all-1 mask
> >                     (reg:DI 143) # AVL reg, or vector length
> >                     (const_int 2 [0x2]) # mask policy
> >                     (reg:SI 66 vl)
> >                     (reg:SI 67 vtype)
> >                 ] UNSPEC_VPREDICATE)
> >             (const_vector:VNx2BI repeat [
> >                     (const_int 1 [0x1])
> >                 ])    # all-1
> >             (unspec:VNx2BI [
> >                     (reg:SI 0 zero)
> >                 ] UNSPEC_VUNDEF))) # still vundef
> >      (expr_list:REG_DEAD (reg:DI 143)
> >         (nil)))
> Right.  My concern is that when we call relational_result it's going to return -1 (as a vector of bools) which bubbles up through the call
> chain.   If that doesn't match the actual register state after the
> instruction (irrespective of the tail policy), then we have the potential to generate incorrect code.
>
> For example, if there's a subsequent instruction that tried to set a vector register to -1, it could just copy from the destination of the vmset to the new target.  But if the vmset didn't set all the bits to 1, then the code is wrong.
>
> With all the UNSPECs in place, this may not be a problem in practice.
> Unsure.  I'm willing to defer to you on this Kito.
>
> Jeff
  
Li, Pan2 via Gcc-patches May 5, 2023, 12:45 p.m. UTC | #18
Ok, sounds good. Thank you!

Pan

-----Original Message-----
From: Kito Cheng <kito.cheng@sifive.com> 
Sent: Friday, May 5, 2023 8:37 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>
Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

I will take V1 and commit to trunk after my local test is done :)

On Fri, May 5, 2023 at 8:30 PM Li, Pan2 <pan2.li@intel.com> wrote:
>
> Hi kito,
>
> Could you please help to share any suggestion about the PATCH? Comparing the V1 and V2.
>
> Pan
>
>
> -----Original Message-----
> From: Li, Pan2
> Sent: Wednesday, May 3, 2023 7:18 PM
> To: Jeff Law <jeffreyalaw@gmail.com>; Kito Cheng 
> <kito.cheng@sifive.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang 
> <yanzhang.wang@intel.com>; Andrew Waterman <andrew@sifive.com>
> Subject: RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify 
> to VMSET
>
> Thanks all for comments, will work with kito to make it happen.
>
> Pan
>
> -----Original Message-----
> From: Jeff Law <jeffreyalaw@gmail.com>
> Sent: Wednesday, May 3, 2023 12:28 AM
> To: Kito Cheng <kito.cheng@sifive.com>
> Cc: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org; 
> juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; Andrew 
> Waterman <andrew@sifive.com>
> Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify 
> to VMSET
>
>
>
> On 4/29/23 19:40, Kito Cheng wrote:
> > Hi Jeff:
> >
> > The RTL pattern already models tail element and vector length well, 
> > so I don't feel the first version of Pan's patch has any problem?
> >
> > Input RTL pattern:
> >
> > #(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> > #        (if_then_else:VNx2BI (unspec:VNx2BI [
> > #                    (const_vector:VNx2BI repeat [
> > #                            (const_int 1 [0x1])
> > #                        ])  # all-1 mask
> > #                    (reg:DI 143)  # AVL reg, or vector length
> > #                    (const_int 2 [0x2]) # mask policy
> > #                    (const_int 0 [0])   # avl type
> > #                    (reg:SI 66 vl)
> > #                    (reg:SI 67 vtype)
> > #                ] UNSPEC_VPREDICATE)
> > #            (geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
> > #                (reg/v:VNx2QI 137 [ v1 ]))
> > #            (unspec:VNx2BI [
> > #                    (reg:SI 0 zero)
> > #                ] UNSPEC_VUNDEF))) # maskoff and tail operand
> > #     (expr_list:REG_DEAD (reg:DI 143)
> > #        (expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
> > #            (nil))))
> >
> > And the split pattern, only did on tail/maskoff element with undefined value:
> >
> > (define_split
> >   [(set (match_operand:VB      0 "register_operand")
> >         (if_then_else:VB
> >           (unspec:VB
> >             [(match_operand:VB 1 "vector_all_trues_mask_operand")
> >              (match_operand    4 "vector_length_operand")
> >              (match_operand    5 "const_int_operand")
> >              (match_operand    6 "const_int_operand")
> >              (reg:SI VL_REGNUM)
> >              (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> >           (match_operand:VB    3 "vector_move_operand")
> >           (match_operand:VB    2 "vector_undef_operand")))] # maskoff
> > and tail operand, only match undef value
> >
> > Then it turns into vmset, and also discard mask policy operand 
> > (since maskoff is undef means don't care IMO):
> >
> > (insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> >         (if_then_else:VNx2BI (unspec:VNx2BI [
> >                     (const_vector:VNx2BI repeat [
> >                             (const_int 1 [0x1])
> >                         ])  # all-1 mask
> >                     (reg:DI 143) # AVL reg, or vector length
> >                     (const_int 2 [0x2]) # mask policy
> >                     (reg:SI 66 vl)
> >                     (reg:SI 67 vtype)
> >                 ] UNSPEC_VPREDICATE)
> >             (const_vector:VNx2BI repeat [
> >                     (const_int 1 [0x1])
> >                 ])    # all-1
> >             (unspec:VNx2BI [
> >                     (reg:SI 0 zero)
> >                 ] UNSPEC_VUNDEF))) # still vundef
> >      (expr_list:REG_DEAD (reg:DI 143)
> >         (nil)))
> Right.  My concern is that when we call relational_result it's going to return -1 (as a vector of bools) which bubbles up through the call
> chain.   If that doesn't match the actual register state after the
> instruction (irrespective of the tail policy), then we have the potential to generate incorrect code.
>
> For example, if there's a subsequent instruction that tried to set a vector register to -1, it could just copy from the destination of the vmset to the new target.  But if the vmset didn't set all the bits to 1, then the code is wrong.
>
> With all the UNSPECs in place, this may not be a problem in practice.
> Unsure.  I'm willing to defer to you on this Kito.
>
> Jeff
  
Kito Cheng May 5, 2023, 2:51 p.m. UTC | #19
pushed v1 to trunk

On Fri, May 5, 2023 at 8:46 PM Li, Pan2 via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Ok, sounds good. Thank you!
>
> Pan
>
> -----Original Message-----
> From: Kito Cheng <kito.cheng@sifive.com>
> Sent: Friday, May 5, 2023 8:37 PM
> To: Li, Pan2 <pan2.li@intel.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>
> Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
>
> I will take V1 and commit to trunk after my local test is done :)
>
> On Fri, May 5, 2023 at 8:30 PM Li, Pan2 <pan2.li@intel.com> wrote:
> >
> > Hi kito,
> >
> > Could you please help to share any suggestion about the PATCH? Comparing the V1 and V2.
> >
> > Pan
> >
> >
> > -----Original Message-----
> > From: Li, Pan2
> > Sent: Wednesday, May 3, 2023 7:18 PM
> > To: Jeff Law <jeffreyalaw@gmail.com>; Kito Cheng
> > <kito.cheng@sifive.com>
> > Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang
> > <yanzhang.wang@intel.com>; Andrew Waterman <andrew@sifive.com>
> > Subject: RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify
> > to VMSET
> >
> > Thanks all for comments, will work with kito to make it happen.
> >
> > Pan
> >
> > -----Original Message-----
> > From: Jeff Law <jeffreyalaw@gmail.com>
> > Sent: Wednesday, May 3, 2023 12:28 AM
> > To: Kito Cheng <kito.cheng@sifive.com>
> > Cc: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org;
> > juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; Andrew
> > Waterman <andrew@sifive.com>
> > Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify
> > to VMSET
> >
> >
> >
> > On 4/29/23 19:40, Kito Cheng wrote:
> > > Hi Jeff:
> > >
> > > The RTL pattern already models tail element and vector length well,
> > > so I don't feel the first version of Pan's patch has any problem?
> > >
> > > Input RTL pattern:
> > >
> > > #(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> > > #        (if_then_else:VNx2BI (unspec:VNx2BI [
> > > #                    (const_vector:VNx2BI repeat [
> > > #                            (const_int 1 [0x1])
> > > #                        ])  # all-1 mask
> > > #                    (reg:DI 143)  # AVL reg, or vector length
> > > #                    (const_int 2 [0x2]) # mask policy
> > > #                    (const_int 0 [0])   # avl type
> > > #                    (reg:SI 66 vl)
> > > #                    (reg:SI 67 vtype)
> > > #                ] UNSPEC_VPREDICATE)
> > > #            (geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
> > > #                (reg/v:VNx2QI 137 [ v1 ]))
> > > #            (unspec:VNx2BI [
> > > #                    (reg:SI 0 zero)
> > > #                ] UNSPEC_VUNDEF))) # maskoff and tail operand
> > > #     (expr_list:REG_DEAD (reg:DI 143)
> > > #        (expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
> > > #            (nil))))
> > >
> > > And the split pattern, only did on tail/maskoff element with undefined value:
> > >
> > > (define_split
> > >   [(set (match_operand:VB      0 "register_operand")
> > >         (if_then_else:VB
> > >           (unspec:VB
> > >             [(match_operand:VB 1 "vector_all_trues_mask_operand")
> > >              (match_operand    4 "vector_length_operand")
> > >              (match_operand    5 "const_int_operand")
> > >              (match_operand    6 "const_int_operand")
> > >              (reg:SI VL_REGNUM)
> > >              (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> > >           (match_operand:VB    3 "vector_move_operand")
> > >           (match_operand:VB    2 "vector_undef_operand")))] # maskoff
> > > and tail operand, only match undef value
> > >
> > > Then it turns into vmset, and also discard mask policy operand
> > > (since maskoff is undef means don't care IMO):
> > >
> > > (insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> > >         (if_then_else:VNx2BI (unspec:VNx2BI [
> > >                     (const_vector:VNx2BI repeat [
> > >                             (const_int 1 [0x1])
> > >                         ])  # all-1 mask
> > >                     (reg:DI 143) # AVL reg, or vector length
> > >                     (const_int 2 [0x2]) # mask policy
> > >                     (reg:SI 66 vl)
> > >                     (reg:SI 67 vtype)
> > >                 ] UNSPEC_VPREDICATE)
> > >             (const_vector:VNx2BI repeat [
> > >                     (const_int 1 [0x1])
> > >                 ])    # all-1
> > >             (unspec:VNx2BI [
> > >                     (reg:SI 0 zero)
> > >                 ] UNSPEC_VUNDEF))) # still vundef
> > >      (expr_list:REG_DEAD (reg:DI 143)
> > >         (nil)))
> > Right.  My concern is that when we call relational_result it's going to return -1 (as a vector of bools) which bubbles up through the call
> > chain.   If that doesn't match the actual register state after the
> > instruction (irrespective of the tail policy), then we have the potential to generate incorrect code.
> >
> > For example, if there's a subsequent instruction that tried to set a vector register to -1, it could just copy from the destination of the vmset to the new target.  But if the vmset didn't set all the bits to 1, then the code is wrong.
> >
> > With all the UNSPECs in place, this may not be a problem in practice.
> > Unsure.  I'm willing to defer to you on this Kito.
> >
> > Jeff
  

Patch

diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 13038a39e5c..4473115d3a9 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -1096,4 +1096,9 @@  extern void riscv_remove_unneeded_save_restore_calls (void);
 #define DWARF_REG_TO_UNWIND_COLUMN(REGNO) \
   ((REGNO == RISCV_DWARF_VLENB) ? (FIRST_PSEUDO_REGISTER + 1) : REGNO)
 
+/* Like s390, riscv also defined this macro for the vector comparision.  Then
+   the simplify-rtx relational_result will canonicalize the result to the
+   CONST1_RTX for the simplification.  */
+#define VECTOR_STORE_FLAG_VALUE(MODE) CONSTM1_RTX (GET_MODE_INNER (MODE))
+
 #endif /* ! GCC_RISCV_H */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
index 8954adad09d..1bca8467a16 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
@@ -283,9 +283,5 @@  vbool64_t test_shortcut_for_riscv_vmsgeu_case_6(vuint8mf8_t v1, size_t vl) {
   return __riscv_vmsgeu_vv_u8mf8_b64(v1, v1, vl);
 }
 
-/* { dg-final { scan-assembler-times {vmseq\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
-/* { dg-final { scan-assembler-times {vmsle\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
-/* { dg-final { scan-assembler-times {vmsleu\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
-/* { dg-final { scan-assembler-times {vmsge\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
-/* { dg-final { scan-assembler-times {vmsgeu\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
 /* { dg-final { scan-assembler-times {vmclr\.m\sv[0-9]} 35 } } */
+/* { dg-final { scan-assembler-times {vmset\.m\sv[0-9]} 35 } } */