[V5] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
Checks
Commit Message
This patch fixes this following FAILs in RISC-V regression:
FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"
The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD.
We have 2 following situations of scalar recognized MASK_LEN_GATHER_LOAD:
1. conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero, condtional mask).
This situation we just need to leverage the current MASK_GATHER_LOAD which can achieve SLP MASK_LEN_GATHER_LOAD.
2. un-conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero, -1)
Current SLP check will failed on dummy mask -1, so we relax the check in tree-vect-slp.cc and allow it to be materialized.
Consider this following case:
void __attribute__((noipa))
f (int *restrict y, int *restrict x, int *restrict indices, int n)
{
for (int i = 0; i < n; ++i)
{
y[i * 2] = x[indices[i * 2]] + 1;
y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
}
}
https://godbolt.org/z/WG3M3n7Mo
GCC unable to SLP using VEC_LOAD_LANES/VEC_STORE_LANES:
f:
ble a3,zero,.L5
.L3:
vsetvli a5,a3,e8,mf4,ta,ma
vsetvli zero,a5,e32,m1,ta,ma
vlseg2e32.v v6,(a2)
vsetvli a4,zero,e64,m2,ta,ma
vsext.vf2 v2,v6
vsll.vi v2,v2,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei64.v v1,(a1),v2
vsetvli a4,zero,e64,m2,ta,ma
vsext.vf2 v2,v7
vsetvli zero,zero,e32,m1,ta,ma
vadd.vi v4,v1,1
vsetvli zero,zero,e64,m2,ta,ma
vsll.vi v2,v2,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei64.v v2,(a1),v2
vsetvli a4,zero,e32,m1,ta,ma
slli a6,a5,3
vadd.vi v5,v2,2
sub a3,a3,a5
vsetvli zero,a5,e32,m1,ta,ma
vsseg2e32.v v4,(a0)
add a2,a2,a6
add a0,a0,a6
bne a3,zero,.L3
.L5:
ret
After this patch:
f:
ble a3,zero,.L5
li a5,1
csrr t1,vlenb
slli a5,a5,33
srli a7,t1,2
addi a5,a5,1
slli a3,a3,1
neg t3,a7
vsetvli a4,zero,e64,m1,ta,ma
vmv.v.x v4,a5
.L3:
minu a5,a3,a7
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v1,0(a2)
vsetvli a4,zero,e64,m2,ta,ma
vsext.vf2 v2,v1
vsll.vi v2,v2,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei64.v v2,(a1),v2
vsetvli a4,zero,e32,m1,ta,ma
mv a6,a3
vadd.vv v2,v2,v4
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v2,0(a0)
add a2,a2,t1
add a0,a0,t1
add a3,a3,t3
bgtu a6,a7,.L3
.L5:
ret
Note that I found we are missing conditional mask gather_load SLP test, Append a test for it in this patch.
Tested on RISC-V and Bootstrap && Regression on X86 passed.
Ok for trunk ?
gcc/ChangeLog:
* tree-vect-slp.cc (vect_get_operand_map): Add MASK_LEN_GATHER_LOAD.
(vect_get_and_check_slp_defs): Ditto.
(vect_build_slp_tree_1): Ditto.
(vect_build_slp_tree_2): Ditto.
* tree-vect-stmts.cc (vectorizable_load): Ditto.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-gather-6.c: New test.
---
gcc/testsuite/gcc.dg/vect/vect-gather-6.c | 15 +++++++++++++++
gcc/tree-vect-slp.cc | 22 ++++++++++++++++++----
gcc/tree-vect-stmts.cc | 12 +++++++++++-
3 files changed, 44 insertions(+), 5 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-6.c
Comments
Hi, this patch fix V4 issue:
Previously as Richard S commented:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633178.html
slp_op and mask_vectype are only initialised when mask_index >= 0.
Shouldn't this code be under mask_index >= 0 too?
Also, when do we encounter mismatched mask_vectypes? Presumably the SLP
node has a known vectype by this point. I think a comment would be useful.
Since I didn't encounter mismatched case in the regression of RISC-V and X86, so
I fix it in V4 patch as follows:
+ if (mask_index >= 0 && slp_node)
+ {
+ bool match_p
+ = vect_maybe_update_slp_op_vectype (slp_op, mask_vectype);
+ gcc_assert (match_p);
+ }
Add assertion here.
However, recently an ICE suddenly appear today in RISC-V regression:
FAIL: gcc.dg/tree-ssa/pr44306.c (internal compiler error: in vectorizable_load, at tree-vect-stmts.cc:9885)
FAIL: gcc.dg/tree-ssa/pr44306.c (test for excess errors)
This is because we are encountering that mask_vectype is boolean type and it is external def.
Then vect_maybe_update_slp_op_vectype will return false.
Then I fix this piece of code in V5 here:
+ if (mask_index >= 0 && slp_node
+ && !vect_maybe_update_slp_op_vectype (slp_op, mask_vectype))
+ {
+ /* We don't vectorize the boolean type external SLP mask. */
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "incompatible vector types for invariants\n");
+ return false;
+ }
Bootstrap and Regression on x86 passed.
Thanks.
juzhe.zhong@rivai.ai
From: Juzhe-Zhong
Date: 2023-10-18 20:36
To: gcc-patches
CC: richard.sandiford; rguenther; Juzhe-Zhong
Subject: [PATCH V5] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
This patch fixes this following FAILs in RISC-V regression:
FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"
The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD.
We have 2 following situations of scalar recognized MASK_LEN_GATHER_LOAD:
1. conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero, condtional mask).
This situation we just need to leverage the current MASK_GATHER_LOAD which can achieve SLP MASK_LEN_GATHER_LOAD.
2. un-conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero, -1)
Current SLP check will failed on dummy mask -1, so we relax the check in tree-vect-slp.cc and allow it to be materialized.
Consider this following case:
void __attribute__((noipa))
f (int *restrict y, int *restrict x, int *restrict indices, int n)
{
for (int i = 0; i < n; ++i)
{
y[i * 2] = x[indices[i * 2]] + 1;
y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
}
}
https://godbolt.org/z/WG3M3n7Mo
GCC unable to SLP using VEC_LOAD_LANES/VEC_STORE_LANES:
f:
ble a3,zero,.L5
.L3:
vsetvli a5,a3,e8,mf4,ta,ma
vsetvli zero,a5,e32,m1,ta,ma
vlseg2e32.v v6,(a2)
vsetvli a4,zero,e64,m2,ta,ma
vsext.vf2 v2,v6
vsll.vi v2,v2,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei64.v v1,(a1),v2
vsetvli a4,zero,e64,m2,ta,ma
vsext.vf2 v2,v7
vsetvli zero,zero,e32,m1,ta,ma
vadd.vi v4,v1,1
vsetvli zero,zero,e64,m2,ta,ma
vsll.vi v2,v2,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei64.v v2,(a1),v2
vsetvli a4,zero,e32,m1,ta,ma
slli a6,a5,3
vadd.vi v5,v2,2
sub a3,a3,a5
vsetvli zero,a5,e32,m1,ta,ma
vsseg2e32.v v4,(a0)
add a2,a2,a6
add a0,a0,a6
bne a3,zero,.L3
.L5:
ret
After this patch:
f:
ble a3,zero,.L5
li a5,1
csrr t1,vlenb
slli a5,a5,33
srli a7,t1,2
addi a5,a5,1
slli a3,a3,1
neg t3,a7
vsetvli a4,zero,e64,m1,ta,ma
vmv.v.x v4,a5
.L3:
minu a5,a3,a7
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v1,0(a2)
vsetvli a4,zero,e64,m2,ta,ma
vsext.vf2 v2,v1
vsll.vi v2,v2,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei64.v v2,(a1),v2
vsetvli a4,zero,e32,m1,ta,ma
mv a6,a3
vadd.vv v2,v2,v4
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v2,0(a0)
add a2,a2,t1
add a0,a0,t1
add a3,a3,t3
bgtu a6,a7,.L3
.L5:
ret
Note that I found we are missing conditional mask gather_load SLP test, Append a test for it in this patch.
Tested on RISC-V and Bootstrap && Regression on X86 passed.
Ok for trunk ?
gcc/ChangeLog:
* tree-vect-slp.cc (vect_get_operand_map): Add MASK_LEN_GATHER_LOAD.
(vect_get_and_check_slp_defs): Ditto.
(vect_build_slp_tree_1): Ditto.
(vect_build_slp_tree_2): Ditto.
* tree-vect-stmts.cc (vectorizable_load): Ditto.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-gather-6.c: New test.
---
gcc/testsuite/gcc.dg/vect/vect-gather-6.c | 15 +++++++++++++++
gcc/tree-vect-slp.cc | 22 ++++++++++++++++++----
gcc/tree-vect-stmts.cc | 12 +++++++++++-
3 files changed, 44 insertions(+), 5 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-6.c
diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-6.c b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
new file mode 100644
index 00000000000..ff55f321854
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+void
+f (int *restrict y, int *restrict x, int *restrict indices, int *restrict cond, int n)
+{
+ for (int i = 0; i < n; ++i)
+ {
+ if (cond[i * 2])
+ y[i * 2] = x[indices[i * 2]] + 1;
+ if (cond[i * 2 + 1])
+ y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
+ }
+}
+
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target vect_gather_load_ifn } } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index d081999a763..146dba658a2 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -552,6 +552,7 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
return arg1_map;
case IFN_MASK_GATHER_LOAD:
+ case IFN_MASK_LEN_GATHER_LOAD:
return arg1_arg4_map;
case IFN_MASK_STORE:
@@ -719,8 +720,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
{
tree type = TREE_TYPE (oprnd);
dt = dts[i];
- if ((dt == vect_constant_def
- || dt == vect_external_def)
+ if (dt == vect_external_def
&& !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
&& (TREE_CODE (type) == BOOLEAN_TYPE
|| !can_duplicate_and_interleave_p (vinfo, stmts.length (),
@@ -732,6 +732,17 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
"for variable-length SLP %T\n", oprnd);
return -1;
}
+ if (dt == vect_constant_def
+ && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
+ && !can_duplicate_and_interleave_p (vinfo, stmts.length (), type))
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "Build SLP failed: invalid type of def "
+ "for variable-length SLP %T\n",
+ oprnd);
+ return -1;
+ }
/* For the swapping logic below force vect_reduction_def
for the reduction op in a SLP reduction group. */
@@ -1096,7 +1107,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
if (cfn == CFN_MASK_LOAD
|| cfn == CFN_GATHER_LOAD
- || cfn == CFN_MASK_GATHER_LOAD)
+ || cfn == CFN_MASK_GATHER_LOAD
+ || cfn == CFN_MASK_LEN_GATHER_LOAD)
ldst_p = true;
else if (cfn == CFN_MASK_STORE)
{
@@ -1357,6 +1369,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
&& rhs_code != CFN_GATHER_LOAD
&& rhs_code != CFN_MASK_GATHER_LOAD
+ && rhs_code != CFN_MASK_LEN_GATHER_LOAD
/* Not grouped loads are handled as externals for BB
vectorization. For loop vectorization we can handle
splats the same we handle single element interleaving. */
@@ -1857,7 +1870,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
|| gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
- || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
+ || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
+ || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD));
else
{
*max_nunits = this_max_nunits;
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index e5ff44c25f1..0bc7beb1342 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -9878,12 +9878,22 @@ vectorizable_load (vec_info *vinfo,
return false;
mask_index = internal_fn_mask_index (ifn);
+ slp_tree slp_op = NULL;
if (mask_index >= 0 && slp_node)
mask_index = vect_slp_child_index_for_operand (call, mask_index);
if (mask_index >= 0
&& !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
- &mask, NULL, &mask_dt, &mask_vectype))
+ &mask, &slp_op, &mask_dt, &mask_vectype))
return false;
+ if (mask_index >= 0 && slp_node
+ && !vect_maybe_update_slp_op_vectype (slp_op, mask_vectype))
+ {
+ /* We don't vectorize the boolean type external SLP mask. */
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "incompatible vector types for invariants\n");
+ return false;
+ }
}
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
--
2.36.3
"juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
> Hi, this patch fix V4 issue:
>
> Previously as Richard S commented:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633178.html
>
> slp_op and mask_vectype are only initialised when mask_index >= 0.
> Shouldn't this code be under mask_index >= 0 too?
> Also, when do we encounter mismatched mask_vectypes? Presumably the SLP
> node has a known vectype by this point. I think a comment would be useful.
>
> Since I didn't encounter mismatched case in the regression of RISC-V and X86, so
> I fix it in V4 patch as follows:
> + if (mask_index >= 0 && slp_node)
> + {
> + bool match_p
> + = vect_maybe_update_slp_op_vectype (slp_op, mask_vectype);
> + gcc_assert (match_p);
> + }
> Add assertion here.
>
> However, recently an ICE suddenly appear today in RISC-V regression:
>
> FAIL: gcc.dg/tree-ssa/pr44306.c (internal compiler error: in vectorizable_load, at tree-vect-stmts.cc:9885)
> FAIL: gcc.dg/tree-ssa/pr44306.c (test for excess errors)
>
> This is because we are encountering that mask_vectype is boolean type and it is external def.
> Then vect_maybe_update_slp_op_vectype will return false.
>
> Then I fix this piece of code in V5 here:
>
> + if (mask_index >= 0 && slp_node
> + && !vect_maybe_update_slp_op_vectype (slp_op, mask_vectype))
> + {
> + /* We don't vectorize the boolean type external SLP mask. */
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "incompatible vector types for invariants\n");
> + return false;
> + }
>
> Bootstrap and Regression on x86 passed.
Why are external defs a problem though? E.g. in pr44306, it looks like
we should be able to create an invariant mask that contains
(!present[0]) || UseDefaultScalingMatrix8x8Flag[0]
(!present[1]) || UseDefaultScalingMatrix8x8Flag[1]
(!present[0]) || UseDefaultScalingMatrix8x8Flag[0]
(!present[1]) || UseDefaultScalingMatrix8x8Flag[1]
...repeating...
The type of the mask can be driven by the code that needs it.
Thanks,
Richard
>
> Thanks.
>
>
> juzhe.zhong@rivai.ai
>
> From: Juzhe-Zhong
> Date: 2023-10-18 20:36
> To: gcc-patches
> CC: richard.sandiford; rguenther; Juzhe-Zhong
> Subject: [PATCH V5] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> This patch fixes this following FAILs in RISC-V regression:
>
> FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"
>
> The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD.
>
> We have 2 following situations of scalar recognized MASK_LEN_GATHER_LOAD:
>
> 1. conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero, condtional mask).
>
> This situation we just need to leverage the current MASK_GATHER_LOAD which can achieve SLP MASK_LEN_GATHER_LOAD.
>
> 2. un-conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero, -1)
>
> Current SLP check will failed on dummy mask -1, so we relax the check in tree-vect-slp.cc and allow it to be materialized.
>
> Consider this following case:
>
> void __attribute__((noipa))
> f (int *restrict y, int *restrict x, int *restrict indices, int n)
> {
> for (int i = 0; i < n; ++i)
> {
> y[i * 2] = x[indices[i * 2]] + 1;
> y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
> }
> }
>
> https://godbolt.org/z/WG3M3n7Mo
>
> GCC unable to SLP using VEC_LOAD_LANES/VEC_STORE_LANES:
>
> f:
> ble a3,zero,.L5
> .L3:
> vsetvli a5,a3,e8,mf4,ta,ma
> vsetvli zero,a5,e32,m1,ta,ma
> vlseg2e32.v v6,(a2)
> vsetvli a4,zero,e64,m2,ta,ma
> vsext.vf2 v2,v6
> vsll.vi v2,v2,2
> vsetvli zero,a5,e32,m1,ta,ma
> vluxei64.v v1,(a1),v2
> vsetvli a4,zero,e64,m2,ta,ma
> vsext.vf2 v2,v7
> vsetvli zero,zero,e32,m1,ta,ma
> vadd.vi v4,v1,1
> vsetvli zero,zero,e64,m2,ta,ma
> vsll.vi v2,v2,2
> vsetvli zero,a5,e32,m1,ta,ma
> vluxei64.v v2,(a1),v2
> vsetvli a4,zero,e32,m1,ta,ma
> slli a6,a5,3
> vadd.vi v5,v2,2
> sub a3,a3,a5
> vsetvli zero,a5,e32,m1,ta,ma
> vsseg2e32.v v4,(a0)
> add a2,a2,a6
> add a0,a0,a6
> bne a3,zero,.L3
> .L5:
> ret
>
> After this patch:
>
> f:
> ble a3,zero,.L5
> li a5,1
> csrr t1,vlenb
> slli a5,a5,33
> srli a7,t1,2
> addi a5,a5,1
> slli a3,a3,1
> neg t3,a7
> vsetvli a4,zero,e64,m1,ta,ma
> vmv.v.x v4,a5
> .L3:
> minu a5,a3,a7
> vsetvli zero,a5,e32,m1,ta,ma
> vle32.v v1,0(a2)
> vsetvli a4,zero,e64,m2,ta,ma
> vsext.vf2 v2,v1
> vsll.vi v2,v2,2
> vsetvli zero,a5,e32,m1,ta,ma
> vluxei64.v v2,(a1),v2
> vsetvli a4,zero,e32,m1,ta,ma
> mv a6,a3
> vadd.vv v2,v2,v4
> vsetvli zero,a5,e32,m1,ta,ma
> vse32.v v2,0(a0)
> add a2,a2,t1
> add a0,a0,t1
> add a3,a3,t3
> bgtu a6,a7,.L3
> .L5:
> ret
>
> Note that I found we are missing conditional mask gather_load SLP test, Append a test for it in this patch.
>
> Tested on RISC-V and Bootstrap && Regression on X86 passed.
>
> Ok for trunk ?
>
> gcc/ChangeLog:
>
> * tree-vect-slp.cc (vect_get_operand_map): Add MASK_LEN_GATHER_LOAD.
> (vect_get_and_check_slp_defs): Ditto.
> (vect_build_slp_tree_1): Ditto.
> (vect_build_slp_tree_2): Ditto.
> * tree-vect-stmts.cc (vectorizable_load): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-gather-6.c: New test.
>
> ---
> gcc/testsuite/gcc.dg/vect/vect-gather-6.c | 15 +++++++++++++++
> gcc/tree-vect-slp.cc | 22 ++++++++++++++++++----
> gcc/tree-vect-stmts.cc | 12 +++++++++++-
> 3 files changed, 44 insertions(+), 5 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-6.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-6.c b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
> new file mode 100644
> index 00000000000..ff55f321854
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +
> +void
> +f (int *restrict y, int *restrict x, int *restrict indices, int *restrict cond, int n)
> +{
> + for (int i = 0; i < n; ++i)
> + {
> + if (cond[i * 2])
> + y[i * 2] = x[indices[i * 2]] + 1;
> + if (cond[i * 2 + 1])
> + y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
> + }
> +}
> +
> +/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target vect_gather_load_ifn } } } */
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index d081999a763..146dba658a2 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -552,6 +552,7 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
> return arg1_map;
> case IFN_MASK_GATHER_LOAD:
> + case IFN_MASK_LEN_GATHER_LOAD:
> return arg1_arg4_map;
> case IFN_MASK_STORE:
> @@ -719,8 +720,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
> {
> tree type = TREE_TYPE (oprnd);
> dt = dts[i];
> - if ((dt == vect_constant_def
> - || dt == vect_external_def)
> + if (dt == vect_external_def
> && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> && (TREE_CODE (type) == BOOLEAN_TYPE
> || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
> @@ -732,6 +732,17 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
> "for variable-length SLP %T\n", oprnd);
> return -1;
> }
> + if (dt == vect_constant_def
> + && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> + && !can_duplicate_and_interleave_p (vinfo, stmts.length (), type))
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "Build SLP failed: invalid type of def "
> + "for variable-length SLP %T\n",
> + oprnd);
> + return -1;
> + }
> /* For the swapping logic below force vect_reduction_def
> for the reduction op in a SLP reduction group. */
> @@ -1096,7 +1107,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> if (cfn == CFN_MASK_LOAD
> || cfn == CFN_GATHER_LOAD
> - || cfn == CFN_MASK_GATHER_LOAD)
> + || cfn == CFN_MASK_GATHER_LOAD
> + || cfn == CFN_MASK_LEN_GATHER_LOAD)
> ldst_p = true;
> else if (cfn == CFN_MASK_STORE)
> {
> @@ -1357,6 +1369,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
> && rhs_code != CFN_GATHER_LOAD
> && rhs_code != CFN_MASK_GATHER_LOAD
> + && rhs_code != CFN_MASK_LEN_GATHER_LOAD
> /* Not grouped loads are handled as externals for BB
> vectorization. For loop vectorization we can handle
> splats the same we handle single element interleaving. */
> @@ -1857,7 +1870,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
> if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
> gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
> || gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
> - || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
> + || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
> + || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD));
> else
> {
> *max_nunits = this_max_nunits;
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index e5ff44c25f1..0bc7beb1342 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -9878,12 +9878,22 @@ vectorizable_load (vec_info *vinfo,
> return false;
> mask_index = internal_fn_mask_index (ifn);
> + slp_tree slp_op = NULL;
> if (mask_index >= 0 && slp_node)
> mask_index = vect_slp_child_index_for_operand (call, mask_index);
> if (mask_index >= 0
> && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
> - &mask, NULL, &mask_dt, &mask_vectype))
> + &mask, &slp_op, &mask_dt, &mask_vectype))
> return false;
> + if (mask_index >= 0 && slp_node
> + && !vect_maybe_update_slp_op_vectype (slp_op, mask_vectype))
> + {
> + /* We don't vectorize the boolean type external SLP mask. */
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "incompatible vector types for invariants\n");
> + return false;
> + }
> }
> tree vectype = STMT_VINFO_VECTYPE (stmt_info);
Hi, Richard.
Assertion failed at this IR:
_427 = _425 & _426;
_429 = present$0_16(D) != 0;
_430 = _425 & _429;
_409 = _430 | _445;
_410 = _409 | _449;
_411 = .LOOP_VECTORIZED (3, 6);
if (_411 != 0)
goto <bb 267>; [100.00%]
else
goto <bb 268>; [100.00%]
<bb 267> [local count: 3280550]:
<bb 83> [local count: 29823181]:
....
pretmp_56 = .MASK_LOAD (_293, 32B, _427); ---> cause assertion failed.
You can take a look this ifcvt IR and search 'pretmp_56 = .MASK_LOAD (_293, 32B, _427);'
RVV is totally the same IR as ARM SVE:
https://godbolt.org/z/rPbzfExWP
I was struggling at this issue for a few days but failed to figure out why.
I am sorry about that. Could you help me with that?
And I adjust the code as follows:
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a9200767f67..42f85839c6e 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -9843,10 +9843,17 @@ vectorizable_load (vec_info *vinfo,
mask_index = internal_fn_mask_index (ifn);
if (mask_index >= 0 && slp_node)
mask_index = vect_slp_child_index_for_operand (call, mask_index);
+ slp_tree slp_op = NULL;
if (mask_index >= 0
&& !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
- &mask, NULL, &mask_dt, &mask_vectype))
+ &mask,
+ ifn == IFN_MASK_LEN_GATHER_LOAD ? &slp_op
+ : NULL,
+ &mask_dt, &mask_vectype))
return false;
+ if (mask_index >= 0 && slp_node
+ && !vect_maybe_update_slp_op_vectype (slp_op, mask_vectype))
+ gcc_unreachable ();
}
As you can see, except MASK_LEN_GATHER_LOAD, other LOADs, I pass 'NULL' same as before.
Only MASK_LEN_GATHER_LOAD passes '&slp_op'. It works fine for RVV. But I don't think it's a correct code, we may need to find another solution.
Thanks.
juzhe.zhong@rivai.ai
From: Richard Sandiford
Date: 2023-10-20 06:19
To: juzhe.zhong\@rivai.ai
CC: gcc-patches; rguenther
Subject: Re: [PATCH V5] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
"juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
> Hi, this patch fix V4 issue:
>
> Previously as Richard S commented:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633178.html
>
> slp_op and mask_vectype are only initialised when mask_index >= 0.
> Shouldn't this code be under mask_index >= 0 too?
> Also, when do we encounter mismatched mask_vectypes? Presumably the SLP
> node has a known vectype by this point. I think a comment would be useful.
>
> Since I didn't encounter mismatched case in the regression of RISC-V and X86, so
> I fix it in V4 patch as follows:
> + if (mask_index >= 0 && slp_node)
> + {
> + bool match_p
> + = vect_maybe_update_slp_op_vectype (slp_op, mask_vectype);
> + gcc_assert (match_p);
> + }
> Add assertion here.
>
> However, recently an ICE suddenly appear today in RISC-V regression:
>
> FAIL: gcc.dg/tree-ssa/pr44306.c (internal compiler error: in vectorizable_load, at tree-vect-stmts.cc:9885)
> FAIL: gcc.dg/tree-ssa/pr44306.c (test for excess errors)
>
> This is because we are encountering that mask_vectype is boolean type and it is external def.
> Then vect_maybe_update_slp_op_vectype will return false.
>
> Then I fix this piece of code in V5 here:
>
> + if (mask_index >= 0 && slp_node
> + && !vect_maybe_update_slp_op_vectype (slp_op, mask_vectype))
> + {
> + /* We don't vectorize the boolean type external SLP mask. */
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "incompatible vector types for invariants\n");
> + return false;
> + }
>
> Bootstrap and Regression on x86 passed.
Why are external defs a problem though? E.g. in pr44306, it looks like
we should be able to create an invariant mask that contains
(!present[0]) || UseDefaultScalingMatrix8x8Flag[0]
(!present[1]) || UseDefaultScalingMatrix8x8Flag[1]
(!present[0]) || UseDefaultScalingMatrix8x8Flag[0]
(!present[1]) || UseDefaultScalingMatrix8x8Flag[1]
...repeating...
The type of the mask can be driven by the code that needs it.
Thanks,
Richard
>
> Thanks.
>
>
> juzhe.zhong@rivai.ai
>
> From: Juzhe-Zhong
> Date: 2023-10-18 20:36
> To: gcc-patches
> CC: richard.sandiford; rguenther; Juzhe-Zhong
> Subject: [PATCH V5] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> This patch fixes this following FAILs in RISC-V regression:
>
> FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"
>
> The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD.
>
> We have 2 following situations of scalar recognized MASK_LEN_GATHER_LOAD:
>
> 1. conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero, condtional mask).
>
> This situation we just need to leverage the current MASK_GATHER_LOAD which can achieve SLP MASK_LEN_GATHER_LOAD.
>
> 2. un-conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero, -1)
>
> Current SLP check will failed on dummy mask -1, so we relax the check in tree-vect-slp.cc and allow it to be materialized.
>
> Consider this following case:
>
> void __attribute__((noipa))
> f (int *restrict y, int *restrict x, int *restrict indices, int n)
> {
> for (int i = 0; i < n; ++i)
> {
> y[i * 2] = x[indices[i * 2]] + 1;
> y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
> }
> }
>
> https://godbolt.org/z/WG3M3n7Mo
>
> GCC unable to SLP using VEC_LOAD_LANES/VEC_STORE_LANES:
>
> f:
> ble a3,zero,.L5
> .L3:
> vsetvli a5,a3,e8,mf4,ta,ma
> vsetvli zero,a5,e32,m1,ta,ma
> vlseg2e32.v v6,(a2)
> vsetvli a4,zero,e64,m2,ta,ma
> vsext.vf2 v2,v6
> vsll.vi v2,v2,2
> vsetvli zero,a5,e32,m1,ta,ma
> vluxei64.v v1,(a1),v2
> vsetvli a4,zero,e64,m2,ta,ma
> vsext.vf2 v2,v7
> vsetvli zero,zero,e32,m1,ta,ma
> vadd.vi v4,v1,1
> vsetvli zero,zero,e64,m2,ta,ma
> vsll.vi v2,v2,2
> vsetvli zero,a5,e32,m1,ta,ma
> vluxei64.v v2,(a1),v2
> vsetvli a4,zero,e32,m1,ta,ma
> slli a6,a5,3
> vadd.vi v5,v2,2
> sub a3,a3,a5
> vsetvli zero,a5,e32,m1,ta,ma
> vsseg2e32.v v4,(a0)
> add a2,a2,a6
> add a0,a0,a6
> bne a3,zero,.L3
> .L5:
> ret
>
> After this patch:
>
> f:
> ble a3,zero,.L5
> li a5,1
> csrr t1,vlenb
> slli a5,a5,33
> srli a7,t1,2
> addi a5,a5,1
> slli a3,a3,1
> neg t3,a7
> vsetvli a4,zero,e64,m1,ta,ma
> vmv.v.x v4,a5
> .L3:
> minu a5,a3,a7
> vsetvli zero,a5,e32,m1,ta,ma
> vle32.v v1,0(a2)
> vsetvli a4,zero,e64,m2,ta,ma
> vsext.vf2 v2,v1
> vsll.vi v2,v2,2
> vsetvli zero,a5,e32,m1,ta,ma
> vluxei64.v v2,(a1),v2
> vsetvli a4,zero,e32,m1,ta,ma
> mv a6,a3
> vadd.vv v2,v2,v4
> vsetvli zero,a5,e32,m1,ta,ma
> vse32.v v2,0(a0)
> add a2,a2,t1
> add a0,a0,t1
> add a3,a3,t3
> bgtu a6,a7,.L3
> .L5:
> ret
>
> Note that I found we are missing conditional mask gather_load SLP test, Append a test for it in this patch.
>
> Tested on RISC-V and Bootstrap && Regression on X86 passed.
>
> Ok for trunk ?
>
> gcc/ChangeLog:
>
> * tree-vect-slp.cc (vect_get_operand_map): Add MASK_LEN_GATHER_LOAD.
> (vect_get_and_check_slp_defs): Ditto.
> (vect_build_slp_tree_1): Ditto.
> (vect_build_slp_tree_2): Ditto.
> * tree-vect-stmts.cc (vectorizable_load): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-gather-6.c: New test.
>
> ---
> gcc/testsuite/gcc.dg/vect/vect-gather-6.c | 15 +++++++++++++++
> gcc/tree-vect-slp.cc | 22 ++++++++++++++++++----
> gcc/tree-vect-stmts.cc | 12 +++++++++++-
> 3 files changed, 44 insertions(+), 5 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-6.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-6.c b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
> new file mode 100644
> index 00000000000..ff55f321854
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +
> +void
> +f (int *restrict y, int *restrict x, int *restrict indices, int *restrict cond, int n)
> +{
> + for (int i = 0; i < n; ++i)
> + {
> + if (cond[i * 2])
> + y[i * 2] = x[indices[i * 2]] + 1;
> + if (cond[i * 2 + 1])
> + y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
> + }
> +}
> +
> +/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target vect_gather_load_ifn } } } */
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index d081999a763..146dba658a2 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -552,6 +552,7 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
> return arg1_map;
> case IFN_MASK_GATHER_LOAD:
> + case IFN_MASK_LEN_GATHER_LOAD:
> return arg1_arg4_map;
> case IFN_MASK_STORE:
> @@ -719,8 +720,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
> {
> tree type = TREE_TYPE (oprnd);
> dt = dts[i];
> - if ((dt == vect_constant_def
> - || dt == vect_external_def)
> + if (dt == vect_external_def
> && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> && (TREE_CODE (type) == BOOLEAN_TYPE
> || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
> @@ -732,6 +732,17 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
> "for variable-length SLP %T\n", oprnd);
> return -1;
> }
> + if (dt == vect_constant_def
> + && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> + && !can_duplicate_and_interleave_p (vinfo, stmts.length (), type))
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "Build SLP failed: invalid type of def "
> + "for variable-length SLP %T\n",
> + oprnd);
> + return -1;
> + }
> /* For the swapping logic below force vect_reduction_def
> for the reduction op in a SLP reduction group. */
> @@ -1096,7 +1107,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> if (cfn == CFN_MASK_LOAD
> || cfn == CFN_GATHER_LOAD
> - || cfn == CFN_MASK_GATHER_LOAD)
> + || cfn == CFN_MASK_GATHER_LOAD
> + || cfn == CFN_MASK_LEN_GATHER_LOAD)
> ldst_p = true;
> else if (cfn == CFN_MASK_STORE)
> {
> @@ -1357,6 +1369,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
> && rhs_code != CFN_GATHER_LOAD
> && rhs_code != CFN_MASK_GATHER_LOAD
> + && rhs_code != CFN_MASK_LEN_GATHER_LOAD
> /* Not grouped loads are handled as externals for BB
> vectorization. For loop vectorization we can handle
> splats the same we handle single element interleaving. */
> @@ -1857,7 +1870,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
> if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
> gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
> || gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
> - || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
> + || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
> + || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD));
> else
> {
> *max_nunits = this_max_nunits;
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index e5ff44c25f1..0bc7beb1342 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -9878,12 +9878,22 @@ vectorizable_load (vec_info *vinfo,
> return false;
> mask_index = internal_fn_mask_index (ifn);
> + slp_tree slp_op = NULL;
> if (mask_index >= 0 && slp_node)
> mask_index = vect_slp_child_index_for_operand (call, mask_index);
> if (mask_index >= 0
> && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
> - &mask, NULL, &mask_dt, &mask_vectype))
> + &mask, &slp_op, &mask_dt, &mask_vectype))
> return false;
> + if (mask_index >= 0 && slp_node
> + && !vect_maybe_update_slp_op_vectype (slp_op, mask_vectype))
> + {
> + /* We don't vectorize the boolean type external SLP mask. */
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "incompatible vector types for invariants\n");
> + return false;
> + }
> }
> tree vectype = STMT_VINFO_VECTYPE (stmt_info);
new file mode 100644
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+void
+f (int *restrict y, int *restrict x, int *restrict indices, int *restrict cond, int n)
+{
+ for (int i = 0; i < n; ++i)
+ {
+ if (cond[i * 2])
+ y[i * 2] = x[indices[i * 2]] + 1;
+ if (cond[i * 2 + 1])
+ y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
+ }
+}
+
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target vect_gather_load_ifn } } } */
@@ -552,6 +552,7 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
return arg1_map;
case IFN_MASK_GATHER_LOAD:
+ case IFN_MASK_LEN_GATHER_LOAD:
return arg1_arg4_map;
case IFN_MASK_STORE:
@@ -719,8 +720,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
{
tree type = TREE_TYPE (oprnd);
dt = dts[i];
- if ((dt == vect_constant_def
- || dt == vect_external_def)
+ if (dt == vect_external_def
&& !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
&& (TREE_CODE (type) == BOOLEAN_TYPE
|| !can_duplicate_and_interleave_p (vinfo, stmts.length (),
@@ -732,6 +732,17 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
"for variable-length SLP %T\n", oprnd);
return -1;
}
+ if (dt == vect_constant_def
+ && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
+ && !can_duplicate_and_interleave_p (vinfo, stmts.length (), type))
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "Build SLP failed: invalid type of def "
+ "for variable-length SLP %T\n",
+ oprnd);
+ return -1;
+ }
/* For the swapping logic below force vect_reduction_def
for the reduction op in a SLP reduction group. */
@@ -1096,7 +1107,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
if (cfn == CFN_MASK_LOAD
|| cfn == CFN_GATHER_LOAD
- || cfn == CFN_MASK_GATHER_LOAD)
+ || cfn == CFN_MASK_GATHER_LOAD
+ || cfn == CFN_MASK_LEN_GATHER_LOAD)
ldst_p = true;
else if (cfn == CFN_MASK_STORE)
{
@@ -1357,6 +1369,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
&& rhs_code != CFN_GATHER_LOAD
&& rhs_code != CFN_MASK_GATHER_LOAD
+ && rhs_code != CFN_MASK_LEN_GATHER_LOAD
/* Not grouped loads are handled as externals for BB
vectorization. For loop vectorization we can handle
splats the same we handle single element interleaving. */
@@ -1857,7 +1870,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
|| gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
- || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
+ || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
+ || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD));
else
{
*max_nunits = this_max_nunits;
@@ -9878,12 +9878,22 @@ vectorizable_load (vec_info *vinfo,
return false;
mask_index = internal_fn_mask_index (ifn);
+ slp_tree slp_op = NULL;
if (mask_index >= 0 && slp_node)
mask_index = vect_slp_child_index_for_operand (call, mask_index);
if (mask_index >= 0
&& !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
- &mask, NULL, &mask_dt, &mask_vectype))
+ &mask, &slp_op, &mask_dt, &mask_vectype))
return false;
+ if (mask_index >= 0 && slp_node
+ && !vect_maybe_update_slp_op_vectype (slp_op, mask_vectype))
+ {
+ /* We don't vectorize the boolean type external SLP mask. */
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "incompatible vector types for invariants\n");
+ return false;
+ }
}
tree vectype = STMT_VINFO_VECTYPE (stmt_info);