[v4,07/10] vect: Verify that GET_MODE_NUNITS is a multiple of 2.
Checks
Commit Message
While working on autovectorizing for the RISCV port I encountered an issue
where can_duplicate_and_interleave_p assumes that GET_MODE_NUNITS is a
evenly divisible by two. The RISC-V target has vector modes (e.g. VNx1DImode),
where GET_MODE_NUNITS is equal to one.
Tested on RISCV and x86_64-linux-gnu. Okay?
2023-03-09 Michael Collison <collison@rivosinc.com>
* tree-vect-slp.cc (can_duplicate_and_interleave_p):
Check that GET_MODE_NUNITS is a multiple of 2.
---
gcc/tree-vect-slp.cc | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
Comments
On Mon, Apr 17, 2023 at 8:42 PM Michael Collison <collison@rivosinc.com> wrote:
>
> While working on autovectorizing for the RISCV port I encountered an issue
> where can_duplicate_and_interleave_p assumes that GET_MODE_NUNITS is a
> evenly divisible by two. The RISC-V target has vector modes (e.g. VNx1DImode),
> where GET_MODE_NUNITS is equal to one.
>
> Tested on RISCV and x86_64-linux-gnu. Okay?
OK.
> 2023-03-09 Michael Collison <collison@rivosinc.com>
>
> * tree-vect-slp.cc (can_duplicate_and_interleave_p):
> Check that GET_MODE_NUNITS is a multiple of 2.
> ---
> gcc/tree-vect-slp.cc | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index d73deaecce0..a64fe454e19 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -423,10 +423,13 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
> (GET_MODE_BITSIZE (int_mode), 1);
> tree vector_type
> = get_vectype_for_scalar_type (vinfo, int_type, count);
> + poly_int64 half_nelts;
> if (vector_type
> && VECTOR_MODE_P (TYPE_MODE (vector_type))
> && known_eq (GET_MODE_SIZE (TYPE_MODE (vector_type)),
> - GET_MODE_SIZE (base_vector_mode)))
> + GET_MODE_SIZE (base_vector_mode))
> + && multiple_p (GET_MODE_NUNITS (TYPE_MODE (vector_type)),
> + 2, &half_nelts))
> {
> /* Try fusing consecutive sequences of COUNT / NVECTORS elements
> together into elements of type INT_TYPE and using the result
> @@ -434,7 +437,7 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
> poly_uint64 nelts = GET_MODE_NUNITS (TYPE_MODE (vector_type));
> vec_perm_builder sel1 (nelts, 2, 3);
> vec_perm_builder sel2 (nelts, 2, 3);
> - poly_int64 half_nelts = exact_div (nelts, 2);
> +
> for (unsigned int i = 0; i < 3; ++i)
> {
> sel1.quick_push (i);
> --
> 2.34.1
>
Wait, VNx1DImode can be really evaluate to just one element if
-march=rv64g_zve64x,
I thinks this should be just fixed on backend by this patch:
https://patchwork.ozlabs.org/project/gcc/patch/20230414014518.15458-1-juzhe.zhong@rivai.ai/
On Tue, Apr 18, 2023 at 2:12 PM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Mon, Apr 17, 2023 at 8:42 PM Michael Collison <collison@rivosinc.com> wrote:
> >
> > While working on autovectorizing for the RISCV port I encountered an issue
> > where can_duplicate_and_interleave_p assumes that GET_MODE_NUNITS is a
> > evenly divisible by two. The RISC-V target has vector modes (e.g. VNx1DImode),
> > where GET_MODE_NUNITS is equal to one.
> >
> > Tested on RISCV and x86_64-linux-gnu. Okay?
>
> OK.
>
> > 2023-03-09 Michael Collison <collison@rivosinc.com>
> >
> > * tree-vect-slp.cc (can_duplicate_and_interleave_p):
> > Check that GET_MODE_NUNITS is a multiple of 2.
> > ---
> > gcc/tree-vect-slp.cc | 7 +++++--
> > 1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > index d73deaecce0..a64fe454e19 100644
> > --- a/gcc/tree-vect-slp.cc
> > +++ b/gcc/tree-vect-slp.cc
> > @@ -423,10 +423,13 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
> > (GET_MODE_BITSIZE (int_mode), 1);
> > tree vector_type
> > = get_vectype_for_scalar_type (vinfo, int_type, count);
> > + poly_int64 half_nelts;
> > if (vector_type
> > && VECTOR_MODE_P (TYPE_MODE (vector_type))
> > && known_eq (GET_MODE_SIZE (TYPE_MODE (vector_type)),
> > - GET_MODE_SIZE (base_vector_mode)))
> > + GET_MODE_SIZE (base_vector_mode))
> > + && multiple_p (GET_MODE_NUNITS (TYPE_MODE (vector_type)),
> > + 2, &half_nelts))
> > {
> > /* Try fusing consecutive sequences of COUNT / NVECTORS elements
> > together into elements of type INT_TYPE and using the result
> > @@ -434,7 +437,7 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
> > poly_uint64 nelts = GET_MODE_NUNITS (TYPE_MODE (vector_type));
> > vec_perm_builder sel1 (nelts, 2, 3);
> > vec_perm_builder sel2 (nelts, 2, 3);
> > - poly_int64 half_nelts = exact_div (nelts, 2);
> > +
> > for (unsigned int i = 0; i < 3; ++i)
> > {
> > sel1.quick_push (i);
> > --
> > 2.34.1
> >
Few more background about RVV:
RISC-V has provide different VLEN configuration by different ISA
extension like `zve32x`, `zve64x` and `v`
zve32x just guarantee the minimal VLEN is 32 bits,
zve64x guarantee the minimal VLEN is 64 bits,
and v guarantee the minimal VLEN is 128 bits,
Current status (without that patch):
Zve32x: Mode for one vector register mode is VNx1SImode and VNx1DImode
is invalid mode
- one vector register could hold 1 + 1x SImode where x is 0~n, so it
might hold just one SI
Zve64x: Mode for one vector register mode is VNx1DImode or VNx2SImode
- one vector register could hold 1 + 1x DImode where x is 0~n, so it
might hold just one DI
- one vector register could hold 2 + 2x SImode where x is 0~n, so it
might hold just two SI
So what I want to say here is VNx1DImode is really NOT safe to assume
to have more than two DI in theory.
However `v` extension guarantees the minimal VLEN is 128 bits.
We are trying to introduce another type/mode mapping for this configure:
v: Mode for one vector register mode is VNx2DImode or VNx4SImode
- one vector register could hold 2 + 2x DImode where x is 0~n, so it
will hold at least two DI
- one vector register could hold 4 + 4x SImode where x is 0~n, so it
will hold at least four DI
So GET_MODE_NUNITS for a single vector register with DI mode will
become 2 (VNx2DImode) if it is really possible, which is a more
precise way to model the vector extension for RISC-V .
On Tue, Apr 18, 2023 at 10:28 PM Kito Cheng <kito.cheng@gmail.com> wrote:
>
> Wait, VNx1DImode can be really evaluate to just one element if
> -march=rv64g_zve64x,
>
> I thinks this should be just fixed on backend by this patch:
>
> https://patchwork.ozlabs.org/project/gcc/patch/20230414014518.15458-1-juzhe.zhong@rivai.ai/
>
> On Tue, Apr 18, 2023 at 2:12 PM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Mon, Apr 17, 2023 at 8:42 PM Michael Collison <collison@rivosinc.com> wrote:
> > >
> > > While working on autovectorizing for the RISCV port I encountered an issue
> > > where can_duplicate_and_interleave_p assumes that GET_MODE_NUNITS is a
> > > evenly divisible by two. The RISC-V target has vector modes (e.g. VNx1DImode),
> > > where GET_MODE_NUNITS is equal to one.
> > >
> > > Tested on RISCV and x86_64-linux-gnu. Okay?
> >
> > OK.
> >
> > > 2023-03-09 Michael Collison <collison@rivosinc.com>
> > >
> > > * tree-vect-slp.cc (can_duplicate_and_interleave_p):
> > > Check that GET_MODE_NUNITS is a multiple of 2.
> > > ---
> > > gcc/tree-vect-slp.cc | 7 +++++--
> > > 1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > index d73deaecce0..a64fe454e19 100644
> > > --- a/gcc/tree-vect-slp.cc
> > > +++ b/gcc/tree-vect-slp.cc
> > > @@ -423,10 +423,13 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
> > > (GET_MODE_BITSIZE (int_mode), 1);
> > > tree vector_type
> > > = get_vectype_for_scalar_type (vinfo, int_type, count);
> > > + poly_int64 half_nelts;
> > > if (vector_type
> > > && VECTOR_MODE_P (TYPE_MODE (vector_type))
> > > && known_eq (GET_MODE_SIZE (TYPE_MODE (vector_type)),
> > > - GET_MODE_SIZE (base_vector_mode)))
> > > + GET_MODE_SIZE (base_vector_mode))
> > > + && multiple_p (GET_MODE_NUNITS (TYPE_MODE (vector_type)),
> > > + 2, &half_nelts))
> > > {
> > > /* Try fusing consecutive sequences of COUNT / NVECTORS elements
> > > together into elements of type INT_TYPE and using the result
> > > @@ -434,7 +437,7 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
> > > poly_uint64 nelts = GET_MODE_NUNITS (TYPE_MODE (vector_type));
> > > vec_perm_builder sel1 (nelts, 2, 3);
> > > vec_perm_builder sel2 (nelts, 2, 3);
> > > - poly_int64 half_nelts = exact_div (nelts, 2);
> > > +
> > > for (unsigned int i = 0; i < 3; ++i)
> > > {
> > > sel1.quick_push (i);
> > > --
> > > 2.34.1
> > >
Yes, like kito said.
We won't enable VNx1DImode in auto-vectorization so it's meaningless to fix it here.
We dynamic adjust the minimum vector-length for different '-march' according to RVV ISA specification.
So we strongly suggest that we should drop this fix.
Thanks.
juzhe.zhong@rivai.ai
From: Kito Cheng
Date: 2023-04-19 02:21
To: Richard Biener; Jeff Law; Palmer Dabbelt
CC: Michael Collison; gcc-patches; 钟居哲
Subject: Re: [PATCH v4 07/10] vect: Verify that GET_MODE_NUNITS is a multiple of 2.
Few more background about RVV:
RISC-V has provide different VLEN configuration by different ISA
extension like `zve32x`, `zve64x` and `v`
zve32x just guarantee the minimal VLEN is 32 bits,
zve64x guarantee the minimal VLEN is 64 bits,
and v guarantee the minimal VLEN is 128 bits,
Current status (without that patch):
Zve32x: Mode for one vector register mode is VNx1SImode and VNx1DImode
is invalid mode
- one vector register could hold 1 + 1x SImode where x is 0~n, so it
might hold just one SI
Zve64x: Mode for one vector register mode is VNx1DImode or VNx2SImode
- one vector register could hold 1 + 1x DImode where x is 0~n, so it
might hold just one DI
- one vector register could hold 2 + 2x SImode where x is 0~n, so it
might hold just two SI
So what I want to say here is VNx1DImode is really NOT safe to assume
to have more than two DI in theory.
However `v` extension guarantees the minimal VLEN is 128 bits.
We are trying to introduce another type/mode mapping for this configure:
v: Mode for one vector register mode is VNx2DImode or VNx4SImode
- one vector register could hold 2 + 2x DImode where x is 0~n, so it
will hold at least two DI
- one vector register could hold 4 + 4x SImode where x is 0~n, so it
will hold at least four DI
So GET_MODE_NUNITS for a single vector register with DI mode will
become 2 (VNx2DImode) if it is really possible, which is a more
precise way to model the vector extension for RISC-V .
On Tue, Apr 18, 2023 at 10:28 PM Kito Cheng <kito.cheng@gmail.com> wrote:
>
> Wait, VNx1DImode can be really evaluate to just one element if
> -march=rv64g_zve64x,
>
> I thinks this should be just fixed on backend by this patch:
>
> https://patchwork.ozlabs.org/project/gcc/patch/20230414014518.15458-1-juzhe.zhong@rivai.ai/
>
> On Tue, Apr 18, 2023 at 2:12 PM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Mon, Apr 17, 2023 at 8:42 PM Michael Collison <collison@rivosinc.com> wrote:
> > >
> > > While working on autovectorizing for the RISCV port I encountered an issue
> > > where can_duplicate_and_interleave_p assumes that GET_MODE_NUNITS is a
> > > evenly divisible by two. The RISC-V target has vector modes (e.g. VNx1DImode),
> > > where GET_MODE_NUNITS is equal to one.
> > >
> > > Tested on RISCV and x86_64-linux-gnu. Okay?
> >
> > OK.
> >
> > > 2023-03-09 Michael Collison <collison@rivosinc.com>
> > >
> > > * tree-vect-slp.cc (can_duplicate_and_interleave_p):
> > > Check that GET_MODE_NUNITS is a multiple of 2.
> > > ---
> > > gcc/tree-vect-slp.cc | 7 +++++--
> > > 1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > index d73deaecce0..a64fe454e19 100644
> > > --- a/gcc/tree-vect-slp.cc
> > > +++ b/gcc/tree-vect-slp.cc
> > > @@ -423,10 +423,13 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
> > > (GET_MODE_BITSIZE (int_mode), 1);
> > > tree vector_type
> > > = get_vectype_for_scalar_type (vinfo, int_type, count);
> > > + poly_int64 half_nelts;
> > > if (vector_type
> > > && VECTOR_MODE_P (TYPE_MODE (vector_type))
> > > && known_eq (GET_MODE_SIZE (TYPE_MODE (vector_type)),
> > > - GET_MODE_SIZE (base_vector_mode)))
> > > + GET_MODE_SIZE (base_vector_mode))
> > > + && multiple_p (GET_MODE_NUNITS (TYPE_MODE (vector_type)),
> > > + 2, &half_nelts))
> > > {
> > > /* Try fusing consecutive sequences of COUNT / NVECTORS elements
> > > together into elements of type INT_TYPE and using the result
> > > @@ -434,7 +437,7 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
> > > poly_uint64 nelts = GET_MODE_NUNITS (TYPE_MODE (vector_type));
> > > vec_perm_builder sel1 (nelts, 2, 3);
> > > vec_perm_builder sel2 (nelts, 2, 3);
> > > - poly_int64 half_nelts = exact_div (nelts, 2);
> > > +
> > > for (unsigned int i = 0; i < 3; ++i)
> > > {
> > > sel1.quick_push (i);
> > > --
> > > 2.34.1
> > >
Juzhe and Kito,
Thank you for the clarification.
On 4/18/23 18:48, juzhe.zhong@rivai.ai wrote:
> Yes, like kito said.
> We won't enable VNx1DImode in auto-vectorization so it's meaningless
> to fix it here.
> We dynamic adjust the minimum vector-length for different '-march'
> according to RVV ISA specification.
> So we strongly suggest that we should drop this fix.
>
> Thanks.
> ------------------------------------------------------------------------
> juzhe.zhong@rivai.ai
>
> *From:* Kito Cheng <mailto:kito.cheng@gmail.com>
> *Date:* 2023-04-19 02:21
> *To:* Richard Biener <mailto:richard.guenther@gmail.com>; Jeff Law
> <mailto:jeffreyalaw@gmail.com>; Palmer Dabbelt
> <mailto:palmer@dabbelt.com>
> *CC:* Michael Collison <mailto:collison@rivosinc.com>; gcc-patches
> <mailto:gcc-patches@gcc.gnu.org>; 钟居哲 <mailto:juzhe.zhong@rivai.ai>
> *Subject:* Re: [PATCH v4 07/10] vect: Verify that GET_MODE_NUNITS
> is a multiple of 2.
> Few more background about RVV:
> RISC-V has provide different VLEN configuration by different ISA
> extension like `zve32x`, `zve64x` and `v`
> zve32x just guarantee the minimal VLEN is 32 bits,
> zve64x guarantee the minimal VLEN is 64 bits,
> and v guarantee the minimal VLEN is 128 bits,
> Current status (without that patch):
> Zve32x: Mode for one vector register mode is VNx1SImode and VNx1DImode
> is invalid mode
> - one vector register could hold 1 + 1x SImode where x is 0~n, so it
> might hold just one SI
> Zve64x: Mode for one vector register mode is VNx1DImode or VNx2SImode
> - one vector register could hold 1 + 1x DImode where x is 0~n, so it
> might hold just one DI
> - one vector register could hold 2 + 2x SImode where x is 0~n, so it
> might hold just two SI
> So what I want to say here is VNx1DImode is really NOT safe to assume
> to have more than two DI in theory.
> However `v` extension guarantees the minimal VLEN is 128 bits.
> We are trying to introduce another type/mode mapping for this
> configure:
> v: Mode for one vector register mode is VNx2DImode or VNx4SImode
> - one vector register could hold 2 + 2x DImode where x is 0~n, so it
> will hold at least two DI
> - one vector register could hold 4 + 4x SImode where x is 0~n, so it
> will hold at least four DI
> So GET_MODE_NUNITS for a single vector register with DI mode will
> become 2 (VNx2DImode) if it is really possible, which is a more
> precise way to model the vector extension for RISC-V .
> On Tue, Apr 18, 2023 at 10:28 PM Kito Cheng <kito.cheng@gmail.com>
> wrote:
> >
> > Wait, VNx1DImode can be really evaluate to just one element if
> > -march=rv64g_zve64x,
> >
> > I thinks this should be just fixed on backend by this patch:
> >
> >
> https://patchwork.ozlabs.org/project/gcc/patch/20230414014518.15458-1-juzhe.zhong@rivai.ai/
> >
> > On Tue, Apr 18, 2023 at 2:12 PM Richard Biener via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > On Mon, Apr 17, 2023 at 8:42 PM Michael Collison
> <collison@rivosinc.com> wrote:
> > > >
> > > > While working on autovectorizing for the RISCV port I
> encountered an issue
> > > > where can_duplicate_and_interleave_p assumes that
> GET_MODE_NUNITS is a
> > > > evenly divisible by two. The RISC-V target has vector modes
> (e.g. VNx1DImode),
> > > > where GET_MODE_NUNITS is equal to one.
> > > >
> > > > Tested on RISCV and x86_64-linux-gnu. Okay?
> > >
> > > OK.
> > >
> > > > 2023-03-09 Michael Collison <collison@rivosinc.com>
> > > >
> > > > * tree-vect-slp.cc (can_duplicate_and_interleave_p):
> > > > Check that GET_MODE_NUNITS is a multiple of 2.
> > > > ---
> > > > gcc/tree-vect-slp.cc | 7 +++++--
> > > > 1 file changed, 5 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > > index d73deaecce0..a64fe454e19 100644
> > > > --- a/gcc/tree-vect-slp.cc
> > > > +++ b/gcc/tree-vect-slp.cc
> > > > @@ -423,10 +423,13 @@ can_duplicate_and_interleave_p
> (vec_info *vinfo, unsigned int count,
> > > > (GET_MODE_BITSIZE (int_mode), 1);
> > > > tree vector_type
> > > > = get_vectype_for_scalar_type (vinfo, int_type,
> count);
> > > > + poly_int64 half_nelts;
> > > > if (vector_type
> > > > && VECTOR_MODE_P (TYPE_MODE (vector_type))
> > > > && known_eq (GET_MODE_SIZE (TYPE_MODE
> (vector_type)),
> > > > - GET_MODE_SIZE (base_vector_mode)))
> > > > + GET_MODE_SIZE (base_vector_mode))
> > > > + && multiple_p (GET_MODE_NUNITS (TYPE_MODE
> (vector_type)),
> > > > + 2, &half_nelts))
> > > > {
> > > > /* Try fusing consecutive sequences of COUNT /
> NVECTORS elements
> > > > together into elements of type INT_TYPE and
> using the result
> > > > @@ -434,7 +437,7 @@ can_duplicate_and_interleave_p (vec_info
> *vinfo, unsigned int count,
> > > > poly_uint64 nelts = GET_MODE_NUNITS (TYPE_MODE
> (vector_type));
> > > > vec_perm_builder sel1 (nelts, 2, 3);
> > > > vec_perm_builder sel2 (nelts, 2, 3);
> > > > - poly_int64 half_nelts = exact_div (nelts, 2);
> > > > +
> > > > for (unsigned int i = 0; i < 3; ++i)
> > > > {
> > > > sel1.quick_push (i);
> > > > --
> > > > 2.34.1
> > > >
>
<juzhe.zhong@rivai.ai> writes:
> Yes, like kito said.
> We won't enable VNx1DImode in auto-vectorization so it's meaningless to fix it here.
> We dynamic adjust the minimum vector-length for different '-march' according to RVV ISA specification.
> So we strongly suggest that we should drop this fix.
I think the patch should go in regardless. If we have a port with
a VNx1 mode then the exact_div is at best dubious and at worst wrong.
Thanks,
Richard
@@ -423,10 +423,13 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
(GET_MODE_BITSIZE (int_mode), 1);
tree vector_type
= get_vectype_for_scalar_type (vinfo, int_type, count);
+ poly_int64 half_nelts;
if (vector_type
&& VECTOR_MODE_P (TYPE_MODE (vector_type))
&& known_eq (GET_MODE_SIZE (TYPE_MODE (vector_type)),
- GET_MODE_SIZE (base_vector_mode)))
+ GET_MODE_SIZE (base_vector_mode))
+ && multiple_p (GET_MODE_NUNITS (TYPE_MODE (vector_type)),
+ 2, &half_nelts))
{
/* Try fusing consecutive sequences of COUNT / NVECTORS elements
together into elements of type INT_TYPE and using the result
@@ -434,7 +437,7 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
poly_uint64 nelts = GET_MODE_NUNITS (TYPE_MODE (vector_type));
vec_perm_builder sel1 (nelts, 2, 3);
vec_perm_builder sel2 (nelts, 2, 3);
- poly_int64 half_nelts = exact_div (nelts, 2);
+
for (unsigned int i = 0; i < 3; ++i)
{
sel1.quick_push (i);