[V3] RISC-V: Fix bug of pre-calculated const vector mask for VNx1BI, VNx2BI and VNx4BI
Commit Message
This bug blocks the following patches.
GCC doesn't know that RVV uses a compact mask model.
Consider the following case:
#include <assert.h>
#include <stdint.h>

#define N 16

int
main ()
{
  int8_t mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
  int8_t out[N] = {0};
  for (int8_t i = 0; i < N; ++i)
    if (mask[i])
      out[i] = i;
  for (int8_t i = 0; i < N; ++i)
    {
      if (mask[i])
        assert (out[i] == i);
      else
        assert (out[i] == 0);
    }
}
Before this patch, the pre-calculated mask in the constant memory pool is:
.LC1:
  .byte 68 ====> 0b01000100
This is incorrect; the test case fails at execution.
After this patch:
.LC1:
  .byte 10 ====> 0b1010
The test case passes at execution.
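The two bytes quoted above can be reproduced with a small sketch. It assumes, purely as an illustration, that the four mask elements {0, 1, 0, 1} are packed into one byte, either with two bits per element (a padded layout) or with one bit per element (the compact layout RVV expects). The helper name pack_mask is hypothetical and is not a GCC function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of GCC): pack N mask elements into one
   byte, giving each element ELT_BITS bits.  A set element I contributes
   the lowest bit of its slot, i.e. bit position I * ELT_BITS.  */
static uint8_t
pack_mask (const int8_t *elts, size_t n, unsigned elt_bits)
{
  assert (elt_bits >= 1 && n * elt_bits <= 8);
  uint8_t byte = 0;
  for (size_t i = 0; i < n; ++i)
    if (elts[i])
      byte |= (uint8_t) (1u << (i * elt_bits));
  return byte;
}
```

With m = {0, 1, 0, 1}, pack_mask (m, 4, 2) yields 68 (0b01000100) and pack_mask (m, 4, 1) yields 10 (0b1010), the same values as the bytes shown before and after the patch.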
After digging into this issue, I found that this bug only happens with VNx1BI, VNx2BI and VNx4BI.
The reason is as follows:
/* Return true if the BITSIZE and PRECISION are not equal.
This helper function tests BITSIZE and PRECISION on RVV mask modes.
For VNx1BI/VNx2BI/VNx4BI modes, since they are having same BYTESIZE
with VNx8BI and compiler can not differentiate them when they are having
same BYTESIZE which will cause incorrect DCE/DSE for them.
To differentiate VNx1BI/VNx2BI/VNx4BI/VNx8BI, we use ADJUST_PRECISION
in riscv-modes.def to adjust different PRECISION for them.
Such approach works fine that compiler can differentiate them, but it causes
incorrect organization of bitmask memory layout.
E.g mask = { 0, -1 } for VNx2BI, the PRECISION will let compiler adjust
bitmask memory layout: 0b0001 which is incorrect for RVV.
Instead, we want to see the correct bitmask memory layout: 0b01.
In this situation, we let RISC-V backend to re-organize the bitmask
memory layout in "mov<mode>" pattern.
*/
So here we add a helper function "bitsize_precision_unequal_p" to force the RISC-V backend to re-organize
the bitmask memory layout of VNx1BI, VNx2BI and VNx4BI, since their PRECISION != BITSIZE.
I don't use mode == VNx1BI || mode == VNx2BI || mode == VNx4BI since we are going to have VLS modes.
maybe_ne (GET_MODE_BITSIZE (mode), GET_MODE_PRECISION (mode)) covers every case, including VLA and VLS.
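A minimal sketch of the check described above, assuming each mask mode is reduced to a pair of constant sizes. In GCC both quantities are poly_int values and the real comparison is maybe_ne; the struct, the table entries and the constant values below (one byte of storage, precision 1/2/4/8) are simplifying assumptions for illustration, and sizes_unequal_p is a hypothetical stand-in for bitsize_precision_unequal_p.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of a mask mode with constant sizes in bits.
   GCC uses poly_int here, so this is an illustration only.  */
struct mask_mode
{
  const char *name;
  unsigned bitsize;   /* storage size in bits */
  unsigned precision; /* bits that actually carry mask data */
};

/* Assumed minimum values: every mode occupies one byte, and the three
   smallest modes have a reduced precision.  */
static const struct mask_mode vnx1bi = { "VNx1BI", 8, 1 };
static const struct mask_mode vnx2bi = { "VNx2BI", 8, 2 };
static const struct mask_mode vnx4bi = { "VNx4BI", 8, 4 };
static const struct mask_mode vnx8bi = { "VNx8BI", 8, 8 };

/* True when storage size and precision differ, i.e. when the backend
   would have to re-organize the bitmask layout itself.  */
static bool
sizes_unequal_p (const struct mask_mode *mode)
{
  return mode->bitsize != mode->precision;
}
```

Under these assumptions the predicate is true for VNx1BI, VNx2BI and VNx4BI and false for VNx8BI, without naming any mode explicitly, which is why the same test would also extend to new modes.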
gcc/ChangeLog:
* config/riscv/riscv-v.cc (rvv_builder::get_compact_mask): New function.
(expand_const_vector): Fix bug.
* config/riscv/riscv.cc (bitsize_precision_unequal_p): New function.
(riscv_const_insns): Fix bug.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-10.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-11.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-12.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-13.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-14.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-8.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-9.c: New test.
---
gcc/config/riscv/riscv-v.cc | 64 +++++++++++++++++--
gcc/config/riscv/riscv.cc | 36 +++++++++++
.../riscv/rvv/autovec/vls-vlmax/bitmask-1.c | 23 +++++++
.../riscv/rvv/autovec/vls-vlmax/bitmask-10.c | 22 +++++++
.../riscv/rvv/autovec/vls-vlmax/bitmask-11.c | 23 +++++++
.../riscv/rvv/autovec/vls-vlmax/bitmask-12.c | 23 +++++++
.../riscv/rvv/autovec/vls-vlmax/bitmask-13.c | 23 +++++++
.../riscv/rvv/autovec/vls-vlmax/bitmask-14.c | 24 +++++++
.../riscv/rvv/autovec/vls-vlmax/bitmask-2.c | 23 +++++++
.../riscv/rvv/autovec/vls-vlmax/bitmask-3.c | 23 +++++++
.../riscv/rvv/autovec/vls-vlmax/bitmask-4.c | 23 +++++++
.../riscv/rvv/autovec/vls-vlmax/bitmask-5.c | 25 ++++++++
.../riscv/rvv/autovec/vls-vlmax/bitmask-6.c | 27 ++++++++
.../riscv/rvv/autovec/vls-vlmax/bitmask-7.c | 30 +++++++++
.../riscv/rvv/autovec/vls-vlmax/bitmask-8.c | 30 +++++++++
.../riscv/rvv/autovec/vls-vlmax/bitmask-9.c | 30 +++++++++
16 files changed, 444 insertions(+), 5 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-10.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-11.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-12.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-13.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-14.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-6.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-9.c
Comments
On 6/28/23 03:47, Juzhe-Zhong wrote:
> This bug blocks the following patches.
>
> GCC doesn't know RVV is using compact mask model.
> Consider this following case:
>
> #define N 16
>
> int
> main ()
> {
> int8_t mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
> int8_t out[N] = {0};
> for (int8_t i = 0; i < N; ++i)
> if (mask[i])
> out[i] = i;
> for (int8_t i = 0; i < N; ++i)
> {
> if (mask[i])
> assert (out[i] == i);
> else
> assert (out[i] == 0);
> }
> }
>
> Before this patch, the pre-calculated mask in constant memory pool:
> .LC1:
> .byte 68 ====> 0b01000100
>
> This is incorrect, such case failed in execution.
>
> After this patch:
> .LC1:
> .byte 10 ====> 0b1010
So I don't get anything like this in my testing. What are the precise
arguments you're using to build the testcase?
I'm compiling the test using a trunk compiler with
-O3 --param riscv-autovec-preference=fixed-vlmax -march=rv64gcv
I get the attached code both before and after your patch. Clearly I'm
doing something different/wrong. So my request is for the precise
command line you're using and the before/after resulting assembly code.
Jeff
.file "j.c"
.option nopic
.attribute arch, "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_zifencei2p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0"
.attribute unaligned_access, 0
.attribute stack_align, 16
.text
.section .rodata.str1.8,"aMS",@progbits,1
.align 3
.LC1:
.string "j.c"
.align 3
.LC2:
.string "out[i] == i"
.align 3
.LC3:
.string "out[i] == 0"
.section .text.startup,"ax",@progbits
.align 1
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
lui a5,%hi(.LANCHOR0)
addi a5,a5,%lo(.LANCHOR0)
ld a4,0(a5)
ld a5,8(a5)
addi sp,sp,-48
.cfi_def_cfa_offset 48
vsetivli zero,16,e8,m1,ta,ma
sd zero,16(sp)
sd a4,0(sp)
sd a5,8(sp)
sd ra,40(sp)
.cfi_offset 1, -8
addi a5,sp,16
sd zero,24(sp)
vid.v v1
vl1re8.v v0,0(sp)
vmsne.vi v0,v0,0
vsetvli a4,zero,e8,m1,ta,ma
vse8.v v1,0(a5),v0.t
lbu a5,16(sp)
bne a5,zero,.L2
lbu a4,17(sp)
li a5,1
bne a4,a5,.L3
lbu a5,18(sp)
bne a5,zero,.L2
lbu a4,19(sp)
li a5,3
bne a4,a5,.L3
lbu a5,20(sp)
bne a5,zero,.L2
lbu a4,21(sp)
li a5,5
bne a4,a5,.L3
lbu a5,22(sp)
bne a5,zero,.L2
lbu a4,23(sp)
li a5,7
bne a4,a5,.L3
lbu a5,24(sp)
bne a5,zero,.L2
lbu a4,25(sp)
li a5,9
bne a4,a5,.L3
lbu a5,26(sp)
bne a5,zero,.L2
lbu a4,27(sp)
li a5,11
bne a4,a5,.L3
lbu a5,28(sp)
bne a5,zero,.L2
lbu a4,29(sp)
li a5,13
bne a4,a5,.L3
lbu a5,30(sp)
bne a5,zero,.L2
lbu a4,31(sp)
li a5,15
bne a4,a5,.L3
ld ra,40(sp)
.cfi_remember_state
.cfi_restore 1
li a0,0
addi sp,sp,48
.cfi_def_cfa_offset 0
jr ra
.L2:
.cfi_restore_state
lui a3,%hi(__PRETTY_FUNCTION__.0)
lui a1,%hi(.LC1)
lui a0,%hi(.LC3)
addi a3,a3,%lo(__PRETTY_FUNCTION__.0)
li a2,18
addi a1,a1,%lo(.LC1)
addi a0,a0,%lo(.LC3)
call __assert_fail
.L3:
lui a3,%hi(__PRETTY_FUNCTION__.0)
lui a1,%hi(.LC1)
lui a0,%hi(.LC2)
addi a3,a3,%lo(__PRETTY_FUNCTION__.0)
li a2,16
addi a1,a1,%lo(.LC1)
addi a0,a0,%lo(.LC2)
call __assert_fail
.cfi_endproc
.LFE0:
.size main, .-main
.section .rodata
.align 3
.set .LANCHOR0,. + 0
.LC0:
.string ""
.string "\001"
.string "\001"
.string "\001"
.string "\001"
.string "\001"
.string "\001"
.string "\001"
.ascii "\001"
.section .srodata,"a"
.align 3
.type __PRETTY_FUNCTION__.0, @object
.size __PRETTY_FUNCTION__.0, 5
__PRETTY_FUNCTION__.0:
.string "main"
.ident "GCC: (GNU) 14.0.0 20230628 (experimental)"
.section .note.GNU-stack,"",@progbits
Try this:
https://godbolt.org/z/x7bM5Pr84
juzhe.zhong@rivai.ai
(In reply to Jeff Law's message of 2023-06-29 02:11, above.)
Hi Juzhe,
I find the bug description rather confusing. What I can see is that
the constant in the literal pool is indeed wrong but how would DSE or
so play a role there? Particularly only for the smaller modes?
My suspicion would be that the constant in the literal/constant pool
is wrong from start to finish.
I just played around with the following hunk:
diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index 542315f88cd..5223c08924f 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -4061,7 +4061,7 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, unsigned int align)
whole element. Often this is byte_mode and contains more
than one element. */
unsigned int nelts = GET_MODE_NUNITS (mode);
- unsigned int elt_bits = GET_MODE_BITSIZE (mode) / nelts;
+ unsigned int elt_bits = GET_MODE_PRECISION (mode) / nelts;
unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT);
scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
With this all your examples pass for me. We then pack e.g. 16 VNx2BI elements
into an int and not just 8. It would also explain why it works for modes
where PRECISION == BITSIZE. Now it will certainly require a more thorough
analysis but maybe it's a start?
Regards
Robin
Robin Dapp via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hi Juzhe,
>
> I find the bug description rather confusing. What I can see is that
> the constant in the literal pool is indeed wrong but how would DSE or
> so play a role there? Particularly only for the smaller modes?
>
> My suspicion would be that the constant in the literal/constant pool
> is wrong from start to finish.
>
> I just played around with the following hunk:
>
> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
> index 542315f88cd..5223c08924f 100644
> --- a/gcc/varasm.cc
> +++ b/gcc/varasm.cc
> @@ -4061,7 +4061,7 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, unsigned int align)
> whole element. Often this is byte_mode and contains more
> than one element. */
> unsigned int nelts = GET_MODE_NUNITS (mode);
> - unsigned int elt_bits = GET_MODE_BITSIZE (mode) / nelts;
> + unsigned int elt_bits = GET_MODE_PRECISION (mode) / nelts;
> unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT);
> scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
>
> With this all your examples pass for me. We then pack e.g. 16 VNx2BI elements
> into an int and not just 8. It would also explain why it works for modes
> where PRECISION == BITSIZE. Now it will certainly require a more thorough
> analysis but maybe it's a start?
Yeah. Preapproved for trunk & any necessary branches.
Thanks,
Richard
OK. Please go ahead and commit this change with the testcases.
Then it won't block the following patches.
Thanks.
juzhe.zhong@rivai.ai
(In reply to Richard Sandiford's message of 2023-06-29 04:42, above.)
Richard Sandiford <richard.sandiford@arm.com> writes:
> Robin Dapp via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> Hi Juzhe,
>>
>> I find the bug description rather confusing. What I can see is that
>> the constant in the literal pool is indeed wrong but how would DSE or
>> so play a role there? Particularly only for the smaller modes?
>>
>> My suspicion would be that the constant in the literal/constant pool
>> is wrong from start to finish.
>>
>> I just played around with the following hunk:
>>
>> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
>> index 542315f88cd..5223c08924f 100644
>> --- a/gcc/varasm.cc
>> +++ b/gcc/varasm.cc
>> @@ -4061,7 +4061,7 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, unsigned int align)
>> whole element. Often this is byte_mode and contains more
>> than one element. */
>> unsigned int nelts = GET_MODE_NUNITS (mode);
>> - unsigned int elt_bits = GET_MODE_BITSIZE (mode) / nelts;
>> + unsigned int elt_bits = GET_MODE_PRECISION (mode) / nelts;
>> unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT);
>> scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
>>
>> With this all your examples pass for me. We then pack e.g. 16 VNx2BI elements
>> into an int and not just 8. It would also explain why it works for modes
>> where PRECISION == BITSIZE. Now it will certainly require a more thorough
>> analysis but maybe it's a start?
>
> Yeah. Preapproved for trunk & any necessary branches.
Sorry, only realised later, but: if the precision can cover fewer
bytes than the bitsize, I suppose there ought to be some zero-byte
padding at the end as well.
Thanks,
Richard
Yes. There is a workaround in RVV.
Ideally, each mode should have PRECISION == BITSIZE. However, for RVV there was a bug that caused incorrect DSE.
We have VNx1BI (occupies 1 bit), VNx2BI (occupies 2 bits), VNx4BI (occupies 4 bits) and VNx8BI (occupies 8 bits); since they all have the same BYTESIZE,
DSE handled them incorrectly.
So we added a workaround (ADJUST_PRECISION) to fix it:
https://github.com/gcc-mirror/gcc/commit/247cacc9e381d666a492dfa4ed61b7b19e2d008f
which prevents the incorrect DSE.
But the mask-bit layout in memory then comes out wrong because of the inconsistency between PRECISION and BITSIZE.
So I force GCC to handle this in the RISC-V backend for VNx1BI/VNx2BI/VNx4BI.
I think this is a RISC-V backend issue and can be addressed well in the RISC-V port (as in the patch I posted).
There is no need to touch generic code, since other targets should not have the same issue.
Thanks.
juzhe.zhong@rivai.ai
(In reply to Richard Sandiford's message of 2023-06-29 15:53, above.)
I grep'ed a bit and found several more instances of the same pattern
which would probably all have to be adjusted (frontend-related mostly
but also in native_encode_rtx). Most likely they would all have to
be adjusted?
> Sorry, only realised later, but: if the precision can cover fewer
> bytes than the bitsize, I suppose there ought to be some zero-byte
> padding at the end as well.
It looks like this problem, and also the padding, has been discussed
before when the precision of VNx1BI etc. was first adjusted in the
RISC-V backend?
I didn't immediately get the padding, though. So if we e.g. have a
VNx2BI constant {0, 1} what would we pad the resulting value "2" to?
A full byte?
Juzhe, are we absolutely sure this is the only problem we will have
with precision != bitsize and it is confined to the backend? I would
not dare to make that call. How does DSE come in here at all as you
keep mentioning it?
Regards
Robin
>> are we absolutely sure this is the only problem we will have
>> with precision != bitsize and it is confined to the backend?
Yes.
>>I would
>>not dare to make that call. How does DSE come in here at all as you
>>keep mentioning it?
I mentioned DSE because:
We had a DSE issue before, so we used ADJUST_PRECISION to make PRECISION != BITSIZE as a workaround:
https://github.com/gcc-mirror/gcc/commit/247cacc9e381d666a492dfa4ed61b7b19e2d008f
However, that fix for the DSE issue makes PRECISION != BITSIZE, and GCC then generates padding bits for these modes, which we
don't want.
juzhe.zhong@rivai.ai
(In reply to Robin Dapp's message of 2023-06-29 16:14, above.)
>>> are we absolutely sure this is the only problem we will have
>>> with precision != bitsize and it is confined to the backend?
> Yes.
With vinfo.vector_mode == VNx4SI
mask_type = get_mask_type_for_scalar_type (vinfo, int)
mask_type is:
vector(4) <signed-boolean:2>
I.e. the precision is 2. This is definitely fishy and related
to the same problem. I would almost bet that something in the
middle-end relies on the precision for some optimization but
we just haven't hit it yet.
Then we have
vector(2) <signed-boolean:4> (precision 4)
as a mask type for vector(2) long int.
Likewise we would likely have a precision of 8 for a vector(1)?
Those might be less severe but still...
And that's just what I'm seeing spontaneously after like five
minutes.
Regards
Robin
Robin Dapp <rdapp.gcc@gmail.com> writes:
>> Sorry, only realised later, but: if the precision can cover fewer
>> bytes than the bitsize, I suppose there ought to be some zero-byte
>> padding at the end as well.
> It looks like this problem, and also the padding, has been discussed
> before when the precision of VNx1BI etc. was first adjusted in the
> RISC-V backend?
Very probably. Can't remember now.
> I didn't immediately get the padding, though. So if we e.g. have a
> VNx2BI constant {0, 1} what would we pad the resulting value "2" to?
> A full byte?
Yeah, that part is OK, and was the case I was thinking about when
I said OK yesterday. But now that we allow BITSIZE != PRECISION,
it's possible for BITSIZE - PRECISION to be more than a full byte,
in which case the new loop would not initialise every byte of
the mode.
I vaguely remembered that that could happen for RVV_FIXED_VLMAX,
but perhaps I misremember. If it can't happen then an assert
would be OK instead.
Thanks,
Richard
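The padding concern above can be illustrated with a small sketch: if only the bytes covered by PRECISION are written, the remaining bytes of the mode stay uninitialised unless they are zero-filled explicitly. The function name, its signature and the byte-array representation are assumptions for illustration; this is not the varasm.cc code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: write a mask constant of PRECISION bits (one bit
   per element, taken from ELTS) into OUT, which holds BITSIZE / 8 bytes.
   Returns the number of bytes that carry mask data.  */
static size_t
emit_mask_bytes (const int8_t *elts, unsigned precision, unsigned bitsize,
                 uint8_t *out)
{
  assert (bitsize % 8 == 0 && precision <= bitsize);
  size_t total_bytes = bitsize / 8;
  size_t data_bytes = (precision + 7) / 8;

  /* Zero the whole mode first, so that any bytes beyond PRECISION are
     well-defined padding rather than garbage.  */
  memset (out, 0, total_bytes);

  for (unsigned i = 0; i < precision; ++i)
    if (elts[i])
      out[i / 8] |= (uint8_t) (1u << (i % 8));
  return data_bytes;
}
```

For a hypothetical mode with a bitsize of 16 and a precision of 1, only the first byte carries data; without the memset the second byte would never be written.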
Yes, we have no choice since DSE is based on BYTESIZE.
So I work around it in the RISC-V backend by giving VNx1BI, VNx2BI and VNx4BI a precision different from VNx8BI to prevent incorrect DSE.
I think this issue can be addressed properly once everything uses BITSIZE instead of BYTESIZE, but that may change too much.
I would prefer to leave that for GCC-15 (the issue can be worked around in the RISC-V backend), since we have too many things that need to land in GCC-14.
Thanks.
juzhe.zhong@rivai.ai
(In reply to Robin Dapp's message of 2023-06-29 16:53, above.)
> Yeah, that part is OK, and was the case I was thinking about when
> I said OK yesterday. But now that we allow BITSIZE != PRECISION,
> it's possible for BITSIZE - PRECISION to be more than a full byte,
> in which case the new loop would not initialise every byte of
> the mode.
Ah, I see, so when e.g. BITSIZE == 16 and PRECISION == 1. Luckily
this cannot happen with RVV as all we do is adjust the precision
of the modes that have BITSIZE == 8. I'm going to add an assert.
Juzhe would rather work around that in the backend, though.
The other thing I just noticed is
tree
build_truth_vector_type_for_mode (poly_uint64 nunits, machine_mode mask_mode)
{
gcc_assert (mask_mode != BLKmode);
unsigned HOST_WIDE_INT esize;
if (VECTOR_MODE_P (mask_mode))
{
poly_uint64 vsize = GET_MODE_BITSIZE (mask_mode);
esize = vector_element_size (vsize, nunits);
}
else
esize = 1;
tree bool_type = build_nonstandard_boolean_type (esize);
return make_vector_type (bool_type, nunits, mask_mode);
}
which gives us wrong precision as we rely on the BITSIZE here as well.
This results in a precision of 1 for VNx8BI, 2 for VNx4BI and 4 for
VNx2BI.
Maybe this isn't a problem per se but to me it appears
just wrong.
Regards
Robin
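The precisions listed above follow from dividing the mode size by the element count. A minimal sketch, assuming minimum element counts of 8, 4 and 2 for VNx8BI, VNx4BI and VNx2BI, a BITSIZE of 8 for all three, and adjusted precisions of 8, 4 and 2. element_size is a hypothetical stand-in for vector_element_size, which in GCC operates on poly_int values.

```c
#include <assert.h>

/* Hypothetical stand-in for vector_element_size: bits per element when
   VECTOR_BITS are divided evenly among NUNITS elements.  */
static unsigned
element_size (unsigned vector_bits, unsigned nunits)
{
  assert (nunits != 0 && vector_bits % nunits == 0);
  return vector_bits / nunits;
}
```

Using the bitsize, element_size (8, 8), element_size (8, 4) and element_size (8, 2) give 1, 2 and 4. Using the assumed precisions instead, element_size (8, 8), element_size (4, 4) and element_size (2, 2) all give 1, which is the single-bit boolean one would expect for each mask element.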
No, I am not saying I want to fix it in the RISC-V backend.
Actually, if you can quickly land the fix in generic code so that it does not block the following RISC-V patches,
I would be glad to see that. Otherwise, I prefer to fix it in the RISC-V backend for now, if it is not a big issue for performance, and defer the complete fix to GCC-15.
The reason for this plan is that the global reviewers' bandwidth is very limited.
We should give the auto-vectorization middle-end support the highest priority first and then come back to the corner-case issues.
Thanks.
juzhe.zhong@rivai.ai
(In reply to Robin Dapp's message of 2023-06-29 17:09, above.)
On Thu, Jun 29, 2023 at 11:10 AM Robin Dapp via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> > Yeah, that part is OK, and was the case I was thinking about when
> > I said OK yesterday. But now that we allow BITSIZE != PRECISION,
> > it's possible for BITSIZE - PRECISION to be more than a full byte,
> > in which case the new loop would not initialise every byte of
> > the mode.
>
> Ah, I see, so when e.g. BITSIZE == 16 and PRECISION == 1. Luckily
> this cannot happen with RVV as all we do is adjust the precision
> of the modes that have BITSIZE == 8. I'm going to add an assert.
> Juzhe would rather work around that in the backend, though.
>
> The other thing I just noticed is
>
> tree
> build_truth_vector_type_for_mode (poly_uint64 nunits, machine_mode mask_mode)
> {
> gcc_assert (mask_mode != BLKmode);
>
> unsigned HOST_WIDE_INT esize;
> if (VECTOR_MODE_P (mask_mode))
> {
> poly_uint64 vsize = GET_MODE_BITSIZE (mask_mode);
> esize = vector_element_size (vsize, nunits);
> }
> else
> esize = 1;
>
> tree bool_type = build_nonstandard_boolean_type (esize);
>
> return make_vector_type (bool_type, nunits, mask_mode);
> }
>
> which gives us wrong precision as we rely on the BITSIZE here as well.
> This results in a precision of 1 for VNx8BI, 2 for VNx4BI and 4 for
> VNx2BI.
This should probably use GET_MODE_PRECISION as well.
OK if it bootstraps/tests on both aarch64 and riscv.
Richard.
>
> Maybe this isn't a problem per se but to me it appears
> just wrong.
>
> Regards
> Robin
>
> This should probably use GET_MODE_PRECISION as well.
>
> OK if it bootstraps/tests on both aarch64 and riscv.
>
> Richard.
I found several other instances, including some in the frontends that
I'm not exactly sure about. I'm currently testing this, but the aarch64
bootstrap is still going to take a while; various aarch64 compile
farm machines seem to be down?
Regards
Robin
From ef919a27f4a156afeca6b4825e6029d9f44be556 Mon Sep 17 00:00:00 2001
From: Robin Dapp <rdapp@ventanamicro.com>
Date: Wed, 28 Jun 2023 20:59:29 +0200
Subject: [PATCH] mode_bitsize -> precision.
bitsize -> precision.
---
gcc/c-family/c-common.cc | 2 +-
gcc/fortran/trans-types.cc | 2 +-
gcc/go/go-lang.cc | 2 +-
gcc/lto/lto-lang.cc | 2 +-
gcc/rust/backend/rust-tree.cc | 2 +-
gcc/simplify-rtx.cc | 10 +++++-----
gcc/tree.cc | 2 +-
gcc/varasm.cc | 2 +-
8 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 34566a342bd..6ab63dae997 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -2458,7 +2458,7 @@ c_common_type_for_mode (machine_mode mode, int unsignedp)
else if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
&& valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
{
- unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+ unsigned int elem_bits = vector_element_size (GET_MODE_PRECISION (mode),
GET_MODE_NUNITS (mode));
tree bool_type = build_nonstandard_boolean_type (elem_bits);
return build_vector_type_for_mode (bool_type, mode);
diff --git a/gcc/fortran/trans-types.cc b/gcc/fortran/trans-types.cc
index d718f28cc86..987e3d26c46 100644
--- a/gcc/fortran/trans-types.cc
+++ b/gcc/fortran/trans-types.cc
@@ -3403,7 +3403,7 @@ gfc_type_for_mode (machine_mode mode, int unsignedp)
else if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
&& valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
{
- unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+ unsigned int elem_bits = vector_element_size (GET_MODE_PRECISION (mode),
GET_MODE_NUNITS (mode));
tree bool_type = build_nonstandard_boolean_type (elem_bits);
return build_vector_type_for_mode (bool_type, mode);
diff --git a/gcc/go/go-lang.cc b/gcc/go/go-lang.cc
index e85a4bfe949..d5c871a533c 100644
--- a/gcc/go/go-lang.cc
+++ b/gcc/go/go-lang.cc
@@ -414,7 +414,7 @@ go_langhook_type_for_mode (machine_mode mode, int unsignedp)
if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
&& valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
{
- unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+ unsigned int elem_bits = vector_element_size (GET_MODE_PRECISION (mode),
GET_MODE_NUNITS (mode));
tree bool_type = build_nonstandard_boolean_type (elem_bits);
return build_vector_type_for_mode (bool_type, mode);
diff --git a/gcc/lto/lto-lang.cc b/gcc/lto/lto-lang.cc
index 52d7626e92e..14d419c2013 100644
--- a/gcc/lto/lto-lang.cc
+++ b/gcc/lto/lto-lang.cc
@@ -1050,7 +1050,7 @@ lto_type_for_mode (machine_mode mode, int unsigned_p)
else if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
&& valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
{
- unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+ unsigned int elem_bits = vectwhereor_element_size (GET_MODE_PRECISION (mode),
GET_MODE_NUNITS (mode));
tree bool_type = build_nonstandard_boolean_type (elem_bits);
return build_vector_type_for_mode (bool_type, mode);
diff --git a/gcc/rust/backend/rust-tree.cc b/gcc/rust/backend/rust-tree.cc
index 8243d4cf5c6..66e859cd70c 100644
--- a/gcc/rust/backend/rust-tree.cc
+++ b/gcc/rust/backend/rust-tree.cc
@@ -5320,7 +5320,7 @@ c_common_type_for_mode (machine_mode mode, int unsignedp)
&& valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
{
unsigned int elem_bits
- = vector_element_size (GET_MODE_BITSIZE (mode), GET_MODE_NUNITS (mode));
+ = vector_element_size (GET_MODE_PRECISION (mode), GET_MODE_NUNITS (mode));
tree bool_type = build_nonstandard_boolean_type (elem_bits);
return build_vector_type_for_mode (bool_type, mode);
}
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 99cbdd47d93..d7315d82aa3 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -7076,7 +7076,7 @@ native_encode_rtx (machine_mode mode, rtx x, vec<target_unit> &bytes,
/* CONST_VECTOR_ELT follows target memory order, so no shuffling
is necessary. The only complication is that MODE_VECTOR_BOOL
vectors can have several elements per byte. */
- unsigned int elt_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+ unsigned int elt_bits = vector_element_size (GET_MODE_PRECISION (mode),
GET_MODE_NUNITS (mode));
unsigned int elt = first_byte * BITS_PER_UNIT / elt_bits;
if (elt_bits < BITS_PER_UNIT)
@@ -7222,7 +7222,7 @@ native_decode_vector_rtx (machine_mode mode, const vec<target_unit> &bytes,
{
rtx_vector_builder builder (mode, npatterns, nelts_per_pattern);
- unsigned int elt_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+ unsigned int elt_bits = vector_element_size (GET_MODE_PRECISION (mode),
GET_MODE_NUNITS (mode));
if (elt_bits < BITS_PER_UNIT)
{
@@ -7359,7 +7359,7 @@ simplify_const_vector_byte_offset (rtx x, poly_uint64 byte)
{
/* Cope with MODE_VECTOR_BOOL by operating on bits rather than bytes. */
machine_mode mode = GET_MODE (x);
- unsigned int elt_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+ unsigned int elt_bits = vector_element_size (GET_MODE_PRECISION (mode),
GET_MODE_NUNITS (mode));
/* The number of bits needed to encode one element from each pattern. */
unsigned int sequence_bits = CONST_VECTOR_NPATTERNS (x) * elt_bits;
@@ -7414,10 +7414,10 @@ simplify_const_vector_subreg (machine_mode outermode, rtx x,
/* Cope with MODE_VECTOR_BOOL by operating on bits rather than bytes. */
unsigned int x_elt_bits
- = vector_element_size (GET_MODE_BITSIZE (innermode),
+ = vector_element_size (GET_MODE_PRECISION (innermode),
GET_MODE_NUNITS (innermode));
unsigned int out_elt_bits
- = vector_element_size (GET_MODE_BITSIZE (outermode),
+ = vector_element_size (GET_MODE_PRECISION (outermode),
GET_MODE_NUNITS (outermode));
/* The number of bits needed to encode one element from every pattern
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 58288efa2e2..c68761fccee 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -10143,7 +10143,7 @@ build_truth_vector_type_for_mode (poly_uint64 nunits, machine_mode mask_mode)
unsigned HOST_WIDE_INT esize;
if (VECTOR_MODE_P (mask_mode))
{
- poly_uint64 vsize = GET_MODE_BITSIZE (mask_mode);
+ poly_uint64 vsize = GET_MODE_PRECISION (mask_mode);
esize = vector_element_size (vsize, nunits);
}
else
diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index 8ae0a2555cd..f65416cff99 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -4061,7 +4061,7 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, unsigned int align)
whole element. Often this is byte_mode and contains more
than one element. */
unsigned int nelts = GET_MODE_NUNITS (mode);
- unsigned int elt_bits = GET_MODE_BITSIZE (mode) / nelts;
+ unsigned int elt_bits = GET_MODE_PRECISION (mode) / nelts;
unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT);
scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
unsigned int mask = GET_MODE_MASK (GET_MODE_INNER (mode));
Hi Robin:
> diff --git a/gcc/lto/lto-lang.cc b/gcc/lto/lto-lang.cc
> index 52d7626e92e..14d419c2013 100644
> --- a/gcc/lto/lto-lang.cc
> +++ b/gcc/lto/lto-lang.cc
> @@ -1050,7 +1050,7 @@ lto_type_for_mode (machine_mode mode, int unsigned_p)
> else if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
> && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
> {
> - unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode),
> + unsigned int elem_bits = vectwhereor_element_size (GET_MODE_PRECISION (mode),
This seems weird?
> GET_MODE_NUNITS (mode));
> tree bool_type = build_nonstandard_boolean_type (elem_bits);
> return build_vector_type_for_mode (bool_type, mode);
Kito Cheng <kito.cheng@gmail.com> writes:
> Hi Robin:
>
>> diff --git a/gcc/lto/lto-lang.cc b/gcc/lto/lto-lang.cc
>> index 52d7626e92e..14d419c2013 100644
>> --- a/gcc/lto/lto-lang.cc
>> +++ b/gcc/lto/lto-lang.cc
>> @@ -1050,7 +1050,7 @@ lto_type_for_mode (machine_mode mode, int unsigned_p)
>> else if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
>> && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
>> {
>> - unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode),
>> + unsigned int elem_bits = vectwhereor_element_size (GET_MODE_PRECISION (mode),
>
> This seems weird?
FWIW, I bootstrapped & regression-tested the patch with that fixed
on aarch64-linux-gnu (all languages).
So OK with the above fixed from my POV.
Thanks,
Richard
>
>> GET_MODE_NUNITS (mode));
>> tree bool_type = build_nonstandard_boolean_type (elem_bits);
>> return build_vector_type_for_mode (bool_type, mode);
>> Hi Robin:
>>
>>> diff --git a/gcc/lto/lto-lang.cc b/gcc/lto/lto-lang.cc
>>> index 52d7626e92e..14d419c2013 100644
>>> --- a/gcc/lto/lto-lang.cc
>>> +++ b/gcc/lto/lto-lang.cc
>>> @@ -1050,7 +1050,7 @@ lto_type_for_mode (machine_mode mode, int unsigned_p)
>>> else if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
>>> && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
>>> {
>>> - unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode),
>>> + unsigned int elem_bits = vectwhereor_element_size (GET_MODE_PRECISION (mode),
>>
>> This seems weird?
Indeed :D Must be an accidental middle-click in Thunderbird. I just
re-checked and the diff itself is fine.
> FWIW, I bootstrapped & regression-tested the patch with that fixed
> on aarch64-linux-gnu (all languages).
>
> So OK with the above fixed from my POV.
Oh, thanks! Mine is still running, not even with all languages. I picked
the M1 from the compile farm which only has eight cores.
Kito (or somebody else), would you mind doing a RISC-V bootstrap? It would
take forever on my machine. Thank you.
Regards
Robin
> Kito (or somebody else), would you mind doing a RISC-V bootstrap? It would
> take forever on my machine. Thank you.
I did a bootstrap myself now and it finally finished. Going to commit the
attached tomorrow.
Regards
Robin
Subject: [PATCH] Change MODE_BITSIZE to MODE_PRECISION for MODE_VECTOR_BOOL.
RISC-V lowers the mode precision of MODE_VECTOR_BOOL vectors (via
ADJUST_PRECISION) in order to distinguish between VNx1BI, VNx2BI, VNx4BI
and VNx8BI.
This patch adjusts uses of MODE_VECTOR_BOOL to use GET_MODE_PRECISION
instead of GET_MODE_BITSIZE.
The RISC-V tests are provided by Juzhe.
Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>
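To illustrate why the element width matters here, the following is a minimal
sketch, not the GCC implementation. It assumes a four-element mask mode whose
storage size is 8 bits and whose precision is 4 bits (as for VNx4BI at the
minimum vector length). The helper names pack_mask_byte and element_bits are
hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the truth value of each mask element into one byte, giving every
   element ELT_BITS bits, lowest element first.  This only mirrors the
   idea of building a constant-pool byte; it is not the GCC code.  */
static uint8_t
pack_mask_byte (const int *mask, unsigned nelts, unsigned elt_bits)
{
  assert (elt_bits >= 1 && nelts * elt_bits <= 8);
  uint8_t byte = 0;
  for (unsigned i = 0; i < nelts; i++)
    if (mask[i])
      byte |= (uint8_t) (1u << (i * elt_bits));
  return byte;
}

/* Element width derived from a mode size in bits and an element count.  */
static unsigned
element_bits (unsigned mode_bits, unsigned nelts)
{
  return mode_bits / nelts;
}
```

For the mask { 0, 1, 0, 1 }, deriving the element width from the 8-bit
storage size gives 2 bits per element and the byte 68 (0b01000100), whereas
deriving it from the 4-bit precision gives 1 bit per element and the byte
10 (0b1010), which is the compact layout RVV expects.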
gcc/c-family/ChangeLog:
* c-common.cc (c_common_type_for_mode): Use GET_MODE_PRECISION.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_const_vector): Ditto.
* simplify-rtx.cc (native_encode_rtx): Ditto.
(native_decode_vector_rtx): Ditto.
(simplify_const_vector_byte_offset): Ditto.
(simplify_const_vector_subreg): Ditto.
* tree.cc (build_truth_vector_type_for_mode): Ditto.
* varasm.cc (output_constant_pool_2): Ditto.
gcc/fortran/ChangeLog:
* trans-types.cc (gfc_type_for_mode): Ditto.
gcc/go/ChangeLog:
* go-lang.cc (go_langhook_type_for_mode): Ditto.
gcc/lto/ChangeLog:
* lto-lang.cc (lto_type_for_mode): Ditto.
gcc/rust/ChangeLog:
* backend/rust-tree.cc (c_common_type_for_mode): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-10.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-11.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-12.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-13.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-14.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-8.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-9.c: New test.
---
gcc/c-family/c-common.cc | 2 +-
gcc/config/riscv/riscv-v.cc | 12 ++++----
gcc/fortran/trans-types.cc | 2 +-
gcc/go/go-lang.cc | 2 +-
gcc/lto/lto-lang.cc | 2 +-
gcc/rust/backend/rust-tree.cc | 2 +-
gcc/simplify-rtx.cc | 10 +++----
.../riscv/rvv/autovec/vls-vlmax/bitmask-1.c | 23 ++++++++++++++
.../riscv/rvv/autovec/vls-vlmax/bitmask-10.c | 22 ++++++++++++++
.../riscv/rvv/autovec/vls-vlmax/bitmask-11.c | 23 ++++++++++++++
.../riscv/rvv/autovec/vls-vlmax/bitmask-12.c | 23 ++++++++++++++
.../riscv/rvv/autovec/vls-vlmax/bitmask-13.c | 23 ++++++++++++++
.../riscv/rvv/autovec/vls-vlmax/bitmask-14.c | 24 +++++++++++++++
.../riscv/rvv/autovec/vls-vlmax/bitmask-2.c | 23 ++++++++++++++
.../riscv/rvv/autovec/vls-vlmax/bitmask-3.c | 23 ++++++++++++++
.../riscv/rvv/autovec/vls-vlmax/bitmask-4.c | 23 ++++++++++++++
.../riscv/rvv/autovec/vls-vlmax/bitmask-5.c | 25 ++++++++++++++++
.../riscv/rvv/autovec/vls-vlmax/bitmask-6.c | 27 +++++++++++++++++
.../riscv/rvv/autovec/vls-vlmax/bitmask-7.c | 30 +++++++++++++++++++
.../riscv/rvv/autovec/vls-vlmax/bitmask-8.c | 30 +++++++++++++++++++
.../riscv/rvv/autovec/vls-vlmax/bitmask-9.c | 30 +++++++++++++++++++
gcc/tree.cc | 2 +-
gcc/varasm.cc | 8 ++++-
23 files changed, 374 insertions(+), 17 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-10.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-11.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-12.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-13.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-14.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-6.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-9.c
diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 34566a342bd..6ab63dae997 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -2458,7 +2458,7 @@ c_common_type_for_mode (machine_mode mode, int unsignedp)
else if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
&& valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
{
- unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+ unsigned int elem_bits = vector_element_size (GET_MODE_PRECISION (mode),
GET_MODE_NUNITS (mode));
tree bool_type = build_nonstandard_boolean_type (elem_bits);
return build_vector_type_for_mode (bool_type, mode);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 3f9ee044e8e..0595e5726a7 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1141,11 +1141,13 @@ expand_const_vector (rtx target, rtx src)
if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
{
rtx elt;
- gcc_assert (
- const_vec_duplicate_p (src, &elt)
- && (rtx_equal_p (elt, const0_rtx) || rtx_equal_p (elt, const1_rtx)));
- rtx ops[] = {target, src};
- emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
+ if (const_vec_duplicate_p (src, &elt))
+ {
+ rtx ops[] = {target, src};
+ emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
+ }
+ else
+ gcc_unreachable ();
return;
}
diff --git a/gcc/fortran/trans-types.cc b/gcc/fortran/trans-types.cc
index d718f28cc86..987e3d26c46 100644
--- a/gcc/fortran/trans-types.cc
+++ b/gcc/fortran/trans-types.cc
@@ -3403,7 +3403,7 @@ gfc_type_for_mode (machine_mode mode, int unsignedp)
else if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
&& valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
{
- unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+ unsigned int elem_bits = vector_element_size (GET_MODE_PRECISION (mode),
GET_MODE_NUNITS (mode));
tree bool_type = build_nonstandard_boolean_type (elem_bits);
return build_vector_type_for_mode (bool_type, mode);
diff --git a/gcc/go/go-lang.cc b/gcc/go/go-lang.cc
index e85a4bfe949..d5c871a533c 100644
--- a/gcc/go/go-lang.cc
+++ b/gcc/go/go-lang.cc
@@ -414,7 +414,7 @@ go_langhook_type_for_mode (machine_mode mode, int unsignedp)
if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
&& valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
{
- unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+ unsigned int elem_bits = vector_element_size (GET_MODE_PRECISION (mode),
GET_MODE_NUNITS (mode));
tree bool_type = build_nonstandard_boolean_type (elem_bits);
return build_vector_type_for_mode (bool_type, mode);
diff --git a/gcc/lto/lto-lang.cc b/gcc/lto/lto-lang.cc
index 52d7626e92e..14d419c2013 100644
--- a/gcc/lto/lto-lang.cc
+++ b/gcc/lto/lto-lang.cc
@@ -1050,7 +1050,7 @@ lto_type_for_mode (machine_mode mode, int unsigned_p)
else if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
&& valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
{
- unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+ unsigned int elem_bits = vector_element_size (GET_MODE_PRECISION (mode),
GET_MODE_NUNITS (mode));
tree bool_type = build_nonstandard_boolean_type (elem_bits);
return build_vector_type_for_mode (bool_type, mode);
diff --git a/gcc/rust/backend/rust-tree.cc b/gcc/rust/backend/rust-tree.cc
index 8243d4cf5c6..66e859cd70c 100644
--- a/gcc/rust/backend/rust-tree.cc
+++ b/gcc/rust/backend/rust-tree.cc
@@ -5320,7 +5320,7 @@ c_common_type_for_mode (machine_mode mode, int unsignedp)
&& valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
{
unsigned int elem_bits
- = vector_element_size (GET_MODE_BITSIZE (mode), GET_MODE_NUNITS (mode));
+ = vector_element_size (GET_MODE_PRECISION (mode), GET_MODE_NUNITS (mode));
tree bool_type = build_nonstandard_boolean_type (elem_bits);
return build_vector_type_for_mode (bool_type, mode);
}
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 99cbdd47d93..d7315d82aa3 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -7076,7 +7076,7 @@ native_encode_rtx (machine_mode mode, rtx x, vec<target_unit> &bytes,
/* CONST_VECTOR_ELT follows target memory order, so no shuffling
is necessary. The only complication is that MODE_VECTOR_BOOL
vectors can have several elements per byte. */
- unsigned int elt_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+ unsigned int elt_bits = vector_element_size (GET_MODE_PRECISION (mode),
GET_MODE_NUNITS (mode));
unsigned int elt = first_byte * BITS_PER_UNIT / elt_bits;
if (elt_bits < BITS_PER_UNIT)
@@ -7222,7 +7222,7 @@ native_decode_vector_rtx (machine_mode mode, const vec<target_unit> &bytes,
{
rtx_vector_builder builder (mode, npatterns, nelts_per_pattern);
- unsigned int elt_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+ unsigned int elt_bits = vector_element_size (GET_MODE_PRECISION (mode),
GET_MODE_NUNITS (mode));
if (elt_bits < BITS_PER_UNIT)
{
@@ -7359,7 +7359,7 @@ simplify_const_vector_byte_offset (rtx x, poly_uint64 byte)
{
/* Cope with MODE_VECTOR_BOOL by operating on bits rather than bytes. */
machine_mode mode = GET_MODE (x);
- unsigned int elt_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+ unsigned int elt_bits = vector_element_size (GET_MODE_PRECISION (mode),
GET_MODE_NUNITS (mode));
/* The number of bits needed to encode one element from each pattern. */
unsigned int sequence_bits = CONST_VECTOR_NPATTERNS (x) * elt_bits;
@@ -7414,10 +7414,10 @@ simplify_const_vector_subreg (machine_mode outermode, rtx x,
/* Cope with MODE_VECTOR_BOOL by operating on bits rather than bytes. */
unsigned int x_elt_bits
- = vector_element_size (GET_MODE_BITSIZE (innermode),
+ = vector_element_size (GET_MODE_PRECISION (innermode),
GET_MODE_NUNITS (innermode));
unsigned int out_elt_bits
- = vector_element_size (GET_MODE_BITSIZE (outermode),
+ = vector_element_size (GET_MODE_PRECISION (outermode),
GET_MODE_NUNITS (outermode));
/* The number of bits needed to encode one element from every pattern
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-1.c
new file mode 100644
index 00000000000..81229fd62b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-1.c
@@ -0,0 +1,23 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax -O3" } */
+
+#include <stdint-gcc.h>
+#include <assert.h>
+#define N 16
+
+int
+main ()
+{
+ int mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
+ int64_t out[N] = {0};
+ for (int i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (int i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-10.c
new file mode 100644
index 00000000000..d891f3c16e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-10.c
@@ -0,0 +1,22 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax -O3 --param riscv-autovec-lmul=m2" } */
+#include <stdint-gcc.h>
+#include <assert.h>
+#define N 16
+
+int
+main ()
+{
+ int mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
+ int8_t out[N] = {0};
+ for (int i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (int i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-11.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-11.c
new file mode 100644
index 00000000000..535641443ec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-11.c
@@ -0,0 +1,23 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax -O3" } */
+
+#include <stdint-gcc.h>
+#include <assert.h>
+#define N 4
+
+int
+main ()
+{
+ int mask[N] = {0, 1, 0, 1};
+ int out[N] = {0};
+ for (int i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (int i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-12.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-12.c
new file mode 100644
index 00000000000..a7c12c3797b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-12.c
@@ -0,0 +1,23 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax -O3 --param riscv-autovec-lmul=m2" } */
+
+#include <stdint-gcc.h>
+#include <assert.h>
+#define N 8
+
+int
+main ()
+{
+ int mask[N] = {0, 1, 0, 1, 0, 1, 0, 1};
+ int out[N] = {0};
+ for (int i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (int i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-13.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-13.c
new file mode 100644
index 00000000000..726238c1cd8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-13.c
@@ -0,0 +1,23 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax -O3 --param riscv-autovec-lmul=m4" } */
+
+#include <stdint-gcc.h>
+#include <assert.h>
+#define N 16
+
+int
+main ()
+{
+ int mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
+ int out[N] = {0};
+ for (int i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (int i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-14.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-14.c
new file mode 100644
index 00000000000..c369cf0b268
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-14.c
@@ -0,0 +1,24 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax -O3 --param riscv-autovec-lmul=m8" } */
+
+#include <stdint-gcc.h>
+#include <assert.h>
+#define N 32
+
+int
+main ()
+{
+ int mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
+ int out[N] = {0};
+ for (int i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (int i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-2.c
new file mode 100644
index 00000000000..a23e47171bc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-2.c
@@ -0,0 +1,23 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax -O3" } */
+
+#include <stdint-gcc.h>
+#include <assert.h>
+#define N 16
+
+int
+main ()
+{
+ int mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
+ int out[N] = {0};
+ for (int i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (int i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-3.c
new file mode 100644
index 00000000000..6ea8fdd89c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-3.c
@@ -0,0 +1,23 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax -O3" } */
+
+#include <stdint-gcc.h>
+#include <assert.h>
+#define N 16
+
+int
+main ()
+{
+ int16_t mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
+ int16_t out[N] = {0};
+ for (int16_t i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (int16_t i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-4.c
new file mode 100644
index 00000000000..2d97c26abfd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-4.c
@@ -0,0 +1,23 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax -O3" } */
+
+#include <stdint-gcc.h>
+#include <assert.h>
+#define N 16
+
+int
+main ()
+{
+ int8_t mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
+ int8_t out[N] = {0};
+ for (int8_t i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (int8_t i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-5.c
new file mode 100644
index 00000000000..b89b70e99a6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-5.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m2 -O3" } */
+
+#include <stdint-gcc.h>
+#include <assert.h>
+
+#define N 32
+
+int
+main ()
+{
+ int8_t mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
+ int8_t out[N] = {0};
+ for (int8_t i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (int8_t i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-6.c
new file mode 100644
index 00000000000..ac8d91e793b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-6.c
@@ -0,0 +1,27 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m4 -O3" } */
+
+#include <stdint-gcc.h>
+#include <assert.h>
+
+#define N 64
+
+int
+main ()
+{
+ int8_t mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
+ int8_t out[N] = {0};
+ for (int8_t i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (int8_t i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-7.c
new file mode 100644
index 00000000000..f538db23b1d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-7.c
@@ -0,0 +1,30 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -O3" } */
+
+#include <stdint-gcc.h>
+#include <assert.h>
+
+#define N 128
+
+int
+main ()
+{
+ uint8_t mask[N]
+ = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
+ uint8_t out[N] = {0};
+ for (uint8_t i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (uint8_t i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-8.c
new file mode 100644
index 00000000000..5abb34c1686
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-8.c
@@ -0,0 +1,30 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -O3" } */
+
+#include <stdint-gcc.h>
+#include <assert.h>
+
+#define N 128
+
+int
+main ()
+{
+ int mask[N]
+ = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
+ int out[N] = {0};
+ for (int i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (int i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-9.c
new file mode 100644
index 00000000000..6fdaa516534
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-9.c
@@ -0,0 +1,30 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -O3" } */
+
+#include <stdint-gcc.h>
+#include <assert.h>
+
+#define N 128
+
+int
+main ()
+{
+ int64_t mask[N]
+ = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
+ int64_t out[N] = {0};
+ for (int i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (int i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
diff --git a/gcc/tree.cc b/gcc/tree.cc
index bd500ec72a5..420857b110c 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -10143,7 +10143,7 @@ build_truth_vector_type_for_mode (poly_uint64 nunits, machine_mode mask_mode)
unsigned HOST_WIDE_INT esize;
if (VECTOR_MODE_P (mask_mode))
{
- poly_uint64 vsize = GET_MODE_BITSIZE (mask_mode);
+ poly_uint64 vsize = GET_MODE_PRECISION (mask_mode);
esize = vector_element_size (vsize, nunits);
}
else
diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index 8ae0a2555cd..53f0cc61922 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -4061,11 +4061,17 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x, unsigned int align)
whole element. Often this is byte_mode and contains more
than one element. */
unsigned int nelts = GET_MODE_NUNITS (mode);
- unsigned int elt_bits = GET_MODE_BITSIZE (mode) / nelts;
+ unsigned int elt_bits = GET_MODE_PRECISION (mode) / nelts;
unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT);
scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
unsigned int mask = GET_MODE_MASK (GET_MODE_INNER (mode));
+ /* We allow GET_MODE_PRECISION (mode) <= GET_MODE_BITSIZE (mode) but
+ only properly handle cases where the difference is less than a
+ byte. */
+ gcc_assert (GET_MODE_BITSIZE (mode) - GET_MODE_PRECISION (mode) <
+ BITS_PER_UNIT);
+
/* Build the constant up one integer at a time. */
unsigned int elts_per_int = int_bits / elt_bits;
for (unsigned int i = 0; i < nelts; i += elts_per_int)
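One possible reading of the new assertion in output_constant_pool_2, as a
simplified model rather than the actual code: the loop emits whole integers
of at least one byte each, so the emitted bytes only cover the full mode when
the precision falls short of the bitsize by less than a byte. The helpers
bytes_emitted and covers_mode_p are hypothetical, and the model assumes
fixed-size modes whose element width either divides a byte or is a whole
number of bytes.

```c
#include <assert.h>

#define BITS_PER_UNIT 8u

/* Number of bytes written when NELTS elements of a PRECISION-bit mask
   are emitted one integer at a time, each integer being at least one
   byte wide.  A simplified model, not the GCC code.  */
static unsigned
bytes_emitted (unsigned precision, unsigned nelts)
{
  assert (nelts > 0 && precision >= nelts && precision % nelts == 0);
  unsigned elt_bits = precision / nelts;
  assert (BITS_PER_UNIT % elt_bits == 0 || elt_bits % BITS_PER_UNIT == 0);
  unsigned int_bits = elt_bits > BITS_PER_UNIT ? elt_bits : BITS_PER_UNIT;
  unsigned elts_per_int = int_bits / elt_bits;
  unsigned nints = (nelts + elts_per_int - 1) / elts_per_int;
  return nints * (int_bits / BITS_PER_UNIT);
}

/* Nonzero when the emitted bytes cover the whole BITSIZE-bit mode.  */
static int
covers_mode_p (unsigned bitsize, unsigned precision, unsigned nelts)
{
  return bytes_emitted (precision, nelts) * BITS_PER_UNIT == bitsize;
}
```

Under these assumptions, a mode with bitsize 8 and precision 4 (difference 4)
is fully covered by the single emitted byte, while a hypothetical mode with
bitsize 16 and precision 4 (difference 12) would leave one byte of the
constant unwritten.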
@@ -291,6 +291,7 @@ public:
bool single_step_npatterns_p () const;
bool npatterns_all_equal_p () const;
+ rtx get_compact_mask () const;
machine_mode new_mode () const { return m_new_mode; }
scalar_mode inner_mode () const { return m_inner_mode; }
@@ -505,6 +506,47 @@ rvv_builder::npatterns_all_equal_p () const
return true;
}
+/* Generate the compact mask.
+
+ E.g. mask = { 0, -1 }, mode = VNx2BI, bitsize = 128 bits.
+
+ By default, GCC generates the mask = 0b00000001xxxxx.
+
+ However, this is not the mask RVV expects, since RVV
+ uses the compact mask = 0b10xxxxx.
+*/
+rtx
+rvv_builder::get_compact_mask () const
+{
+ /* If TARGET_MIN_VLEN == 32, the minimum LMUL = 1/4.
+ Otherwise, the minimum LMUL = 1/8. */
+ unsigned min_lmul = TARGET_MIN_VLEN == 32 ? 4 : 8;
+ unsigned min_container_size
+ = BYTES_PER_RISCV_VECTOR.to_constant () / min_lmul;
+ unsigned container_size = MAX (CEIL (npatterns (), 8), min_container_size);
+ machine_mode container_mode
+ = get_vector_mode (QImode, container_size).require ();
+
+ unsigned nunits = GET_MODE_NUNITS (container_mode).to_constant ();
+ rtvec v = rtvec_alloc (nunits);
+ for (unsigned i = 0; i < nunits; i++)
+ RTVEC_ELT (v, i) = const0_rtx;
+
+ unsigned char b = 0;
+ for (unsigned i = 0; i < npatterns (); i++)
+ {
+ if (INTVAL (elt (i)))
+ b = b | (1 << (i % 8));
+
+ if ((i > 0 && (i % 8) == 7) || (i == (npatterns () - 1)))
+ {
+ RTVEC_ELT (v, i / 8) = gen_int_mode (b, QImode);
+ b = 0;
+ }
+ }
+ return gen_rtx_CONST_VECTOR (container_mode, v);
+}
+
static unsigned
get_sew (machine_mode mode)
{
@@ -1141,11 +1183,23 @@ expand_const_vector (rtx target, rtx src)
if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
{
rtx elt;
- gcc_assert (
- const_vec_duplicate_p (src, &elt)
- && (rtx_equal_p (elt, const0_rtx) || rtx_equal_p (elt, const1_rtx)));
- rtx ops[] = {target, src};
- emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
+ unsigned int nelts;
+ if (const_vec_duplicate_p (src, &elt))
+ {
+ rtx ops[] = {target, src};
+ emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
+ }
+ else if (GET_MODE_NUNITS (mode).is_constant (&nelts))
+ {
+ rvv_builder builder (mode, nelts, 1);
+ for (unsigned int i = 0; i < nelts; i++)
+ builder.quick_push (CONST_VECTOR_ELT (src, i));
+ rtx mask = builder.get_compact_mask ();
+ rtx mem = validize_mem (force_const_mem (GET_MODE (mask), mask));
+ emit_move_insn (target, gen_rtx_MEM (mode, XEXP (mem, 0)));
+ }
+ else
+ gcc_unreachable ();
return;
}
@@ -1244,6 +1244,36 @@ riscv_address_insns (rtx x, machine_mode mode, bool might_split_p)
return n;
}
+/* Return true if the BITSIZE and PRECISION of MODE are not equal.
+
+ This helper function tests BITSIZE and PRECISION on RVV mask modes.
+
+ VNx1BI, VNx2BI and VNx4BI have the same BYTESIZE as VNx8BI. If the
+ compiler cannot tell these modes apart, it performs incorrect
+ DCE/DSE on them.
+
+ To differentiate VNx1BI/VNx2BI/VNx4BI/VNx8BI, we use ADJUST_PRECISION
+ in riscv-modes.def to give each of them a different PRECISION.
+
+ This lets the compiler differentiate the modes, but it also causes an
+ incorrect bitmask memory layout.
+
+ E.g. for mask = { 0, -1 } in VNx2BI, the PRECISION makes the compiler
+ lay out the bitmask as 0b0001, which is incorrect for RVV.
+
+ Instead, we want to see the correct bitmask memory layout: 0b01.
+ In this situation, we let the RISC-V backend re-organize the bitmask
+ memory layout in the "mov<mode>" pattern.
+*/
+static bool
+bitsize_precision_unequal_p (machine_mode mode)
+{
+ /* We don't need to worry about non-BOOL vector modes for RVV. */
+ if (GET_MODE_CLASS (mode) != MODE_VECTOR_BOOL)
+ return false;
+ return maybe_ne (GET_MODE_BITSIZE (mode), GET_MODE_PRECISION (mode));
+}
+
/* Return the number of instructions needed to load constant X.
Return 0 if X isn't a valid constant. */
@@ -1323,6 +1353,12 @@ riscv_const_insns (rtx x)
return 1 + 4; /*vmv.v.x + memory access. */
}
}
+
+ /* GCC doesn't know that RVV uses the compact mask model, so by
+ default we handle such masks in the mov<mode> pattern. */
+ if (bitsize_precision_unequal_p (GET_MODE (x)))
+ /* TODO: We can adjust it according to the real cost model of vlm.v. */
+ return 1;
}
/* TODO: We may support more const vector in the future. */
new file mode 100644
@@ -0,0 +1,23 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax -O3" } */
+
+#include <stdint-gcc.h>
+#include <assert.h>
+#define N 16
+
+int
+main ()
+{
+ int mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
+ int64_t out[N] = {0};
+ for (int i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (int i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
new file mode 100644
@@ -0,0 +1,22 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax -O3 --param riscv-autovec-lmul=m2" } */
+#include <stdint-gcc.h>
+#include <assert.h>
+#define N 16
+
+int
+main ()
+{
+ int mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
+ int8_t out[N] = {0};
+ for (int i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (int i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
new file mode 100644
@@ -0,0 +1,23 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax -O3" } */
+
+#include <stdint-gcc.h>
+#include <assert.h>
+#define N 4
+
+int
+main ()
+{
+ int mask[N] = {0, 1, 0, 1};
+ int out[N] = {0};
+ for (int i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (int i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
new file mode 100644
@@ -0,0 +1,23 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax -O3 --param riscv-autovec-lmul=m2" } */
+
+#include <stdint-gcc.h>
+#include <assert.h>
+#define N 8
+
+int
+main ()
+{
+ int mask[N] = {0, 1, 0, 1, 0, 1, 0, 1};
+ int out[N] = {0};
+ for (int i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (int i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
new file mode 100644
@@ -0,0 +1,23 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax -O3 --param riscv-autovec-lmul=m4" } */
+
+#include <stdint-gcc.h>
+#include <assert.h>
+#define N 16
+
+int
+main ()
+{
+ int mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
+ int out[N] = {0};
+ for (int i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (int i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
new file mode 100644
@@ -0,0 +1,24 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax -O3 --param riscv-autovec-lmul=m8" } */
+
+#include <stdint-gcc.h>
+#include <assert.h>
+#define N 32
+
+int
+main ()
+{
+ int mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
+ int out[N] = {0};
+ for (int i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (int i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
new file mode 100644
@@ -0,0 +1,23 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax -O3" } */
+
+#include <stdint-gcc.h>
+#include <assert.h>
+#define N 16
+
+int
+main ()
+{
+ int mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
+ int out[N] = {0};
+ for (int i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (int i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
new file mode 100644
@@ -0,0 +1,23 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax -O3" } */
+
+#include <stdint-gcc.h>
+#include <assert.h>
+#define N 16
+
+int
+main ()
+{
+ int16_t mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
+ int16_t out[N] = {0};
+ for (int16_t i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (int16_t i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
new file mode 100644
@@ -0,0 +1,23 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax -O3" } */
+
+#include <stdint-gcc.h>
+#include <assert.h>
+#define N 16
+
+int
+main ()
+{
+ int8_t mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
+ int8_t out[N] = {0};
+ for (int8_t i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (int8_t i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
new file mode 100644
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m2 -O3" } */
+
+#include <stdint-gcc.h>
+#include <assert.h>
+
+#define N 32
+
+int
+main ()
+{
+ int8_t mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
+ int8_t out[N] = {0};
+ for (int8_t i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (int8_t i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
new file mode 100644
@@ -0,0 +1,27 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m4 -O3" } */
+
+#include <stdint-gcc.h>
+#include <assert.h>
+
+#define N 64
+
+int
+main ()
+{
+ int8_t mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
+ int8_t out[N] = {0};
+ for (int8_t i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (int8_t i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
new file mode 100644
@@ -0,0 +1,30 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -O3" } */
+
+#include <stdint-gcc.h>
+#include <assert.h>
+
+#define N 128
+
+int
+main ()
+{
+ uint8_t mask[N]
+ = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
+ uint8_t out[N] = {0};
+ for (uint8_t i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (uint8_t i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
new file mode 100644
@@ -0,0 +1,30 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -O3" } */
+
+#include <stdint-gcc.h>
+#include <assert.h>
+
+#define N 128
+
+int
+main ()
+{
+ int mask[N]
+ = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
+ int out[N] = {0};
+ for (int i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (int i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
new file mode 100644
@@ -0,0 +1,30 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "--param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -O3" } */
+
+#include <stdint-gcc.h>
+#include <assert.h>
+
+#define N 128
+
+int
+main ()
+{
+ int64_t mask[N]
+ = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
+ 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
+ int64_t out[N] = {0};
+ for (int i = 0; i < N; ++i)
+ if (mask[i])
+ out[i] = i;
+ for (int i = 0; i < N; ++i)
+ {
+ if (mask[i])
+ assert (out[i] == i);
+ else
+ assert (out[i] == 0);
+ }
+}
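
For reviewers who want to check the constant-pool bytes quoted in the commit message by hand, here is a minimal standalone sketch of the two layouts. It is not GCC code: pack_mask is a hypothetical helper, and the choice of a 4-element mask with 2-bit slots is only an assumption that happens to reproduce the reported ".byte 68". With 1-bit slots (the compact model that get_compact_mask builds) the same mask gives ".byte 10".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pack N boolean elements into OUT, giving each element a slot of
   ELT_BITS bits; only the lowest bit of a slot carries the value.
   Slots are filled LSB-first within each byte.
   Hypothetical helper for illustration only; not part of GCC.  */
static void
pack_mask (const int *elts, size_t n, unsigned elt_bits,
           unsigned char *out, size_t out_len)
{
  /* Keep the model simple: a slot never straddles a byte boundary.  */
  assert (elt_bits == 1 || elt_bits == 2 || elt_bits == 4 || elt_bits == 8);
  memset (out, 0, out_len);
  for (size_t i = 0; i < n; i++)
    {
      size_t bit = i * elt_bits;   /* Bit position of element I.  */
      assert (bit / 8 < out_len);
      if (elts[i])
        out[bit / 8] |= (unsigned char) (1u << (bit % 8));
    }
}
```

Calling pack_mask on {0, 1, 0, 1} with elt_bits = 2 sets bits 2 and 6, i.e. 0b01000100; with elt_bits = 1 it sets bits 1 and 3, i.e. 0b1010. The per-byte index in the compact case is simply i / 8, which is the same indexing the RTVEC_ELT store in get_compact_mask relies on.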