[v4] VECT: Refine the type size restriction of call vectorizer
Commit Message
From: Pan Li <pan2.li@intel.com>
Update in v4:
* Append the check to vectorizable_internal_function.
Update in v3:
* Add a function to predicate whether the type size is legal for the call vectorizer.
Update in v2:
* Fix one ICE caused by a type assertion.
* Adjust some test cases for AArch64 SVE and RISC-V vector.
Original log:
vectorizable_call has a restriction on the size of the data type:
DF to DI is allowed but SF to DI isn't. You may see the message below
when trying to vectorize a function call like lrintf.
void
test_lrintf (long *out, float *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_lrintf (in[i]);
}
lrintf.c:5:26: missed: couldn't vectorize loop
lrintf.c:5:26: missed: not vectorized: unsupported data-type
As a result, a standard name pattern such as lrintmn2 cannot be used when
the data type sizes differ, as in SF => DI. This patch refines the data
type size check so that standard names like lrintmn2 are unblocked under
the following condition.
The type size of vectype_out must be exactly the same as the type size
of vectype_in when vectype_out does not participate in the optab
selection. There is no such restriction when vectype_out is part of
the optab query.
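The rule above can be sketched outside of GCC as a small standalone C
model. This is only an illustration, not the GCC code: struct vec_type,
struct fn_info and size_check_passes are hypothetical stand-ins for tree
vector types, direct_internal_fn_info and the new check, and the only
property modelled is the total vector size in bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a vector type; only its size in bits
   matters for this check.  */
struct vec_type { unsigned size_bits; };

/* Hypothetical stand-in for direct_internal_fn_info: a negative
   selector means "use the output type" for that optab operand.  */
struct fn_info { int type0; int type1; };

/* Return true when the size check lets the optab query go ahead.  */
static bool
size_check_passes (const struct fn_info *info,
                   const struct vec_type *vectype_out,
                   const struct vec_type *vectype_in)
{
  assert (info && vectype_out && vectype_in);

  bool same_size_p = vectype_in->size_bits == vectype_out->size_bits;
  const struct vec_type *type0 = info->type0 < 0 ? vectype_out : vectype_in;
  const struct vec_type *type1 = info->type1 < 0 ? vectype_out : vectype_in;

  /* Reject only when vectype_out takes no part in the query and the
     sizes differ.  */
  if (type0 != vectype_out && type1 != vectype_out && !same_size_p)
    return false;

  return true;
}
```

With an input vector of 128 bits and an output vector of 256 bits, a
descriptor such as { -1, 0 } (output type used for the first operand)
passes the check, while { 0, 0 } (input type only) is rejected. The
selector values here are illustrative and are not taken from any
particular internal function definition.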
The tests below pass with this patch.
* The RISC-V regression tests.
* Verified the lrintf standard name on RISC-V.
The tests below are ongoing.
* The x86 bootstrap and regression tests.
* The AArch64 regression tests.
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_internal_function): Add type
size check for when vectype_out doesn't participate in the optab query.
(vectorizable_call): Remove the type size check.
Signed-off-by: Pan Li <pan2.li@intel.com>
---
gcc/tree-vect-stmts.cc | 22 +++++++++-------------
1 file changed, 9 insertions(+), 13 deletions(-)
Comments
The tests below pass with this patch.
* The x86 bootstrap and regression tests.
* The AArch64 regression tests.
* The RISC-V regression tests.
* Verified the lrintf standard name on RVV.
Pan
-----Original Message-----
From: Li, Pan2 <pan2.li@intel.com>
Sent: Tuesday, October 31, 2023 11:10 PM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; Li, Pan2 <pan2.li@intel.com>; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>; richard.guenther@gmail.com
Subject: [PATCH v4] VECT: Refine the type size restriction of call vectorizer
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a9200767f67..799b4ab10c7 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1420,8 +1420,17 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
const direct_internal_fn_info &info = direct_internal_fn (ifn);
if (info.vectorizable)
{
+ bool same_size_p = TYPE_SIZE (vectype_in) == TYPE_SIZE (vectype_out);
tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
+
+ /* The type size of both the vectype_in and vectype_out should be
+ exactly the same when vectype_out isn't participating the optab.
+ While there is no restriction for type size when vectype_out
+ is part of the optab query. */
+ if (type0 != vectype_out && type1 != vectype_out && !same_size_p)
+ return IFN_LAST;
+
if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1),
OPTIMIZE_FOR_SPEED))
return ifn;
@@ -3361,19 +3370,6 @@ vectorizable_call (vec_info *vinfo,
return false;
}
- /* FORNOW: we don't yet support mixtures of vector sizes for calls,
- just mixtures of nunits. E.g. DI->SI versions of __builtin_ctz*
- are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
- by a pack of the two vectors into an SI vector. We would need
- separate code to handle direct VnDI->VnSI IFN_CTZs. */
- if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
- {
- if (dump_enabled_p ())
- dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
- "mismatched vector sizes %T and %T\n",
- vectype_in, vectype_out);
- return false;
- }
if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
!= VECTOR_BOOLEAN_TYPE_P (vectype_in))
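The removed FORNOW comment describes how a DI->SI call is traditionally
vectorized: two same-size VnDI->VnDI operations followed by a pack. A
scalar C model of that scheme is sketched below, assuming 2-lane DI
vectors and a 4-lane SI result. The helper names ctz_v2di,
pack_v2di_to_v4si and ctz_di_to_si are hypothetical and only illustrate
the data flow; they are not GCC functions.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar stand-in for one V2DI->V2DI IFN_CTZ: input and output lanes
   have the same width.  */
static void
ctz_v2di (const uint64_t in[2], uint64_t out[2])
{
  for (int i = 0; i < 2; i++)
    {
      assert (in[i] != 0);	/* __builtin_ctzll (0) is undefined.  */
      out[i] = (uint64_t) __builtin_ctzll (in[i]);
    }
}

/* Pack two V2DI vectors into one V4SI vector by truncating each lane.  */
static void
pack_v2di_to_v4si (const uint64_t lo[2], const uint64_t hi[2],
		   uint32_t out[4])
{
  out[0] = (uint32_t) lo[0];
  out[1] = (uint32_t) lo[1];
  out[2] = (uint32_t) hi[0];
  out[3] = (uint32_t) hi[1];
}

/* DI->SI count-trailing-zeros over four elements: two same-size
   operations followed by one pack.  */
static void
ctz_di_to_si (const uint64_t in[4], uint32_t out[4])
{
  uint64_t lo[2], hi[2];
  ctz_v2di (in, lo);
  ctz_v2di (in + 2, hi);
  pack_v2di_to_v4si (lo, hi, out);
}
```

A direct VnDI->VnSI operation would skip the intermediate DI results and
the pack; that is the case the removed comment says needs separate code.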
> On 31.10.2023 at 16:10, pan2.li@intel.com wrote:
Ok
Thanks,
Richard
Committed, thanks Richard.
Pan
-----Original Message-----
From: Richard Biener <richard.guenther@gmail.com>
Sent: Thursday, November 2, 2023 12:43 AM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
Subject: Re: [PATCH v4] VECT: Refine the type size restriction of call vectorizer