[V2,RFC] Support vectorization for Complex type.
Commit Message
V2 update:
Handle VMAT_ELEMENTWISE, VMAT_CONTIGUOUS_PERMUTE, VMAT_STRIDED_SLP,
VMAT_CONTIGUOUS_REVERSE, VMAT_CONTIGUOUS_DOWN for complex type.
I've run SPECspeed@2017 627.cam4_s; there are some vectorization cases,
but no big performance impact (since this patch only handles load/store).
Any comments?
gcc/ChangeLog:
PR tree-optimization/106010
* tree-vect-data-refs.cc (vect_get_data_access_cost):
Pass complex_p to vect_get_num_copies to avoid ICE.
(vect_analyze_data_refs): Support vectorization for Complex
type with vector scalar types.
(vect_permute_load_chain): Handle Complex type.
* tree-vect-loop.cc (vect_determine_vf_for_stmt_1): VF should
be half of TYPE_VECTOR_SUBPARTS when complex_p.
* tree-vect-slp.cc (vect_record_max_nunits): nunits should be
half of TYPE_VECTOR_SUBPARTS when complex_p.
(vect_optimize_slp): Support permutation for complex type.
(vect_slp_analyze_node_operations_1): Double nunits in
vect_get_num_vectors to get right SLP_TREE_NUMBER_OF_VEC_STMTS
when complex_p.
(vect_slp_analyze_node_operations): Ditto.
(vect_create_constant_vectors): Support CTOR for complex type.
(vect_transform_slp_perm_load): Support permutation for
complex type.
* tree-vect-stmts.cc (vect_init_vector): Support complex type.
(vect_get_vec_defs_for_operand): Get vector type for
complex type.
(vectorizable_store): Get right ncopies/nunits and
elem_type for complex type vector, also return false when
complex_p and !TYPE_VECTOR_SUBPARTS.is_constant ().
(vect_truncate_gather_scatter_offset): Return false for
complex type.
(vectorizable_load): Ditto.
(vect_get_vector_types_for_stmt): Get vector type for
complex type.
(get_group_load_store_type): Handle complex type for
nunits.
(perm_mask_for_reverse): New overload.
(get_negative_load_store_type): Handle complex type,
p_offset should be N - 2 before the address of DR.
(vect_check_scalar_mask): Return false for complex type.
* tree-vectorizer.h (STMT_VINFO_COMPLEX_P): New macro.
(vect_get_num_copies): New overload.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr106010-1a.c: New test.
* gcc.target/i386/pr106010-1b.c: New test.
* gcc.target/i386/pr106010-1c.c: New test.
* gcc.target/i386/pr106010-2a.c: New test.
* gcc.target/i386/pr106010-2b.c: New test.
* gcc.target/i386/pr106010-2c.c: New test.
* gcc.target/i386/pr106010-3a.c: New test.
* gcc.target/i386/pr106010-3b.c: New test.
* gcc.target/i386/pr106010-3c.c: New test.
* gcc.target/i386/pr106010-4a.c: New test.
* gcc.target/i386/pr106010-4b.c: New test.
* gcc.target/i386/pr106010-4c.c: New test.
* gcc.target/i386/pr106010-5a.c: New test.
* gcc.target/i386/pr106010-5b.c: New test.
* gcc.target/i386/pr106010-5c.c: New test.
* gcc.target/i386/pr106010-6a.c: New test.
* gcc.target/i386/pr106010-6b.c: New test.
* gcc.target/i386/pr106010-6c.c: New test.
* gcc.target/i386/pr106010-7a.c: New test.
* gcc.target/i386/pr106010-7b.c: New test.
* gcc.target/i386/pr106010-7c.c: New test.
* gcc.target/i386/pr106010-8a.c: New test.
* gcc.target/i386/pr106010-8b.c: New test.
* gcc.target/i386/pr106010-8c.c: New test.
* gcc.target/i386/pr106010-9a.c: New test.
* gcc.target/i386/pr106010-9b.c: New test.
* gcc.target/i386/pr106010-9c.c: New test.
* gcc.target/i386/pr106010-9d.c: New test.
---
gcc/testsuite/gcc.target/i386/pr106010-1a.c | 58 +++++
gcc/testsuite/gcc.target/i386/pr106010-1b.c | 63 ++++++
gcc/testsuite/gcc.target/i386/pr106010-1c.c | 41 ++++
gcc/testsuite/gcc.target/i386/pr106010-2a.c | 82 +++++++
gcc/testsuite/gcc.target/i386/pr106010-2b.c | 62 ++++++
gcc/testsuite/gcc.target/i386/pr106010-2c.c | 47 ++++
gcc/testsuite/gcc.target/i386/pr106010-3a.c | 80 +++++++
gcc/testsuite/gcc.target/i386/pr106010-3b.c | 126 +++++++++++
gcc/testsuite/gcc.target/i386/pr106010-3c.c | 69 ++++++
gcc/testsuite/gcc.target/i386/pr106010-4a.c | 101 +++++++++
gcc/testsuite/gcc.target/i386/pr106010-4b.c | 67 ++++++
gcc/testsuite/gcc.target/i386/pr106010-4c.c | 54 +++++
gcc/testsuite/gcc.target/i386/pr106010-5a.c | 117 ++++++++++
gcc/testsuite/gcc.target/i386/pr106010-5b.c | 80 +++++++
gcc/testsuite/gcc.target/i386/pr106010-5c.c | 62 ++++++
gcc/testsuite/gcc.target/i386/pr106010-6a.c | 115 ++++++++++
gcc/testsuite/gcc.target/i386/pr106010-6b.c | 157 +++++++++++++
gcc/testsuite/gcc.target/i386/pr106010-6c.c | 80 +++++++
gcc/testsuite/gcc.target/i386/pr106010-7a.c | 58 +++++
gcc/testsuite/gcc.target/i386/pr106010-7b.c | 63 ++++++
gcc/testsuite/gcc.target/i386/pr106010-7c.c | 41 ++++
gcc/testsuite/gcc.target/i386/pr106010-8a.c | 58 +++++
gcc/testsuite/gcc.target/i386/pr106010-8b.c | 53 +++++
gcc/testsuite/gcc.target/i386/pr106010-8c.c | 38 ++++
gcc/testsuite/gcc.target/i386/pr106010-9a.c | 89 ++++++++
gcc/testsuite/gcc.target/i386/pr106010-9b.c | 90 ++++++++
gcc/testsuite/gcc.target/i386/pr106010-9c.c | 90 ++++++++
gcc/testsuite/gcc.target/i386/pr106010-9d.c | 92 ++++++++
gcc/tree-vect-data-refs.cc | 134 +++++++++---
gcc/tree-vect-loop.cc | 7 +-
gcc/tree-vect-slp.cc | 174 +++++++++++----
gcc/tree-vect-stmts.cc | 231 +++++++++++++++++---
gcc/tree-vectorizer.h | 13 ++
33 files changed, 2594 insertions(+), 98 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-1a.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-1b.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-1c.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-2a.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-2b.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-2c.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-3a.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-3b.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-3c.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-4a.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-4b.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-4c.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-5a.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-5b.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-5c.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-6a.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-6b.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-6c.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-7a.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-7b.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-7c.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-8a.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-8b.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-8c.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-9a.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-9b.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-9c.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr106010-9d.c
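As a rough illustration of the lane arithmetic described in the ChangeLog above (a sketch, not code from the patch): a complex element occupies two scalar lanes, so a vector with nunits scalar lanes holds nunits / 2 complex elements, and reversing the order of complex elements has to move lanes in (real, imag) pairs. The helper names below are hypothetical and do not exist in GCC; how the patch itself builds its masks is not shown here.

```c
#include <assert.h>

/* Hypothetical helper, not a GCC function: number of complex elements
   that fit in a vector with NUNITS scalar lanes.  */
static unsigned
complex_elems_per_vector (unsigned nunits)
{
  assert (nunits % 2 == 0);
  return nunits / 2;
}

/* Hypothetical helper, not a GCC function: fill SEL[0..NUNITS-1] with a
   lane selector that reverses the order of complex elements while
   keeping each (real, imag) pair in its original order.  */
static void
complex_reverse_mask (unsigned nunits, unsigned *sel)
{
  assert (nunits % 2 == 0);
  for (unsigned i = 0; i < nunits; i += 2)
    {
      sel[i] = nunits - 2 - i;     /* real part of the mirrored element */
      sel[i + 1] = nunits - 1 - i; /* imaginary part of the same element */
    }
}
```

For a 4-lane vector this yields { 2, 3, 0, 1 }, and for an 8-lane vector { 6, 7, 4, 5, 2, 3, 0, 1 }, the same shape as the VEC_PERM_EXPR selectors that pr106010-3a.c scans for.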
Comments
On Mon, Jul 18, 2022 at 4:31 AM liuhongt <hongtao.liu@intel.com> wrote:
>
> V2 update:
> Handle VMAT_ELEMENTWISE, VMAT_CONTIGUOUS_PERMUTE, VMAT_STRIDED_SLP,
> VMAT_CONTIGUOUS_REVERSE, VMAT_CONTIGUOUS_DOWN for complex type.
>
> I've run SPECspeed@2017 627.cam4_s; there are some vectorization cases,
> but no big performance impact (since this patch only handles load/store).
>
> Any comments?
My original comments still stand (it feels like this should be more generic).
Can we go the way of lowering complex loads/stores first? A large part of
the testcases added by the patch should pass after that.
Thanks,
Richard.
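
To make the lowering suggestion above concrete, here is a source-level sketch of what splitting a complex copy into separate real and imaginary accesses looks like. This is an assumption about the intent, not code from the patch or from GCC: the actual lowering would happen on GIMPLE inside the compiler. The function names are hypothetical; `__real__` and `__imag__` are the GNU C extensions for accessing complex parts.

```c
#include <assert.h>
#include <string.h>

/* Whole-complex copy, as in the foo_pd loops of the testcases.  */
static void
copy_whole (_Complex double *a, const _Complex double *b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = b[i];
}

/* The same copy with each complex load/store split into its real and
   imaginary parts, so only scalar double accesses remain.  */
static void
copy_lowered (_Complex double *a, const _Complex double *b, int n)
{
  for (int i = 0; i < n; i++)
    {
      double re = __real__ b[i];
      double im = __imag__ b[i];
      __real__ a[i] = re;
      __imag__ a[i] = im;
    }
}

/* Self-check: both forms leave identical bytes in the destination.  */
static void
check_lowering (void)
{
  _Complex double src[4], whole[4], lowered[4];
  for (int i = 0; i < 4; i++)
    {
      __real__ src[i] = i;
      __imag__ src[i] = -i;
    }
  copy_whole (whole, src, 4);
  copy_lowered (lowered, src, 4);
  assert (memcmp (whole, lowered, sizeof whole) == 0);
}
```

In the lowered form the loop contains only ordinary interleaved double loads and stores, which is the kind of access the existing vectorizer code already handles.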
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-1a.c b/gcc/testsuite/gcc.target/i386/pr106010-1a.c
> new file mode 100644
> index 00000000000..b608f484934
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-1a.c
> @@ -0,0 +1,58 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-vect-details -mprefer-vector-width=256" } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 6 "vect" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(4\) double>} 2 "vect" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(8\) float>} 2 "vect" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(4\) long long int>} 2 "vect" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(8\) int>} 2 "vect" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(16\) short int>} 2 "vect" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(32\) char>} 2 "vect" } } */
> +
> +#define N 10000
> +void
> +__attribute__((noipa))
> +foo_pd (_Complex double* a, _Complex double* b)
> +{
> + for (int i = 0; i != N; i++)
> + a[i] = b[i];
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_ps (_Complex float* a, _Complex float* b)
> +{
> + for (int i = 0; i != N; i++)
> + a[i] = b[i];
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi64 (_Complex long long* a, _Complex long long* b)
> +{
> + for (int i = 0; i != N; i++)
> + a[i] = b[i];
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi32 (_Complex int* a, _Complex int* b)
> +{
> + for (int i = 0; i != N; i++)
> + a[i] = b[i];
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi16 (_Complex short* a, _Complex short* b)
> +{
> + for (int i = 0; i != N; i++)
> + a[i] = b[i];
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi8 (_Complex char* a, _Complex char* b)
> +{
> + for (int i = 0; i != N; i++)
> + a[i] = b[i];
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-1b.c b/gcc/testsuite/gcc.target/i386/pr106010-1b.c
> new file mode 100644
> index 00000000000..0f377c3a548
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-1b.c
> @@ -0,0 +1,63 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mavx -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */
> +/* { dg-require-effective-target avx } */
> +
> +#include "avx-check.h"
> +#include <string.h>
> +#include "pr106010-1a.c"
> +
> +void
> +avx_test (void)
> +{
> + _Complex double* pd_src = (_Complex double*) malloc (2 * N * sizeof (double));
> + _Complex double* pd_dst = (_Complex double*) malloc (2 * N * sizeof (double));
> + _Complex float* ps_src = (_Complex float*) malloc (2 * N * sizeof (float));
> + _Complex float* ps_dst = (_Complex float*) malloc (2 * N * sizeof (float));
> + _Complex long long* epi64_src = (_Complex long long*) malloc (2 * N * sizeof (long long));
> + _Complex long long* epi64_dst = (_Complex long long*) malloc (2 * N * sizeof (long long));
> + _Complex int* epi32_src = (_Complex int*) malloc (2 * N * sizeof (int));
> + _Complex int* epi32_dst = (_Complex int*) malloc (2 * N * sizeof (int));
> + _Complex short* epi16_src = (_Complex short*) malloc (2 * N * sizeof (short));
> + _Complex short* epi16_dst = (_Complex short*) malloc (2 * N * sizeof (short));
> + _Complex char* epi8_src = (_Complex char*) malloc (2 * N * sizeof (char));
> + _Complex char* epi8_dst = (_Complex char*) malloc (2 * N * sizeof (char));
> + char* p_init = (char*) malloc (2 * N * sizeof (double));
> +
> + __builtin_memset (pd_dst, 0, 2 * N * sizeof (double));
> + __builtin_memset (ps_dst, 0, 2 * N * sizeof (float));
> + __builtin_memset (epi64_dst, 0, 2 * N * sizeof (long long));
> + __builtin_memset (epi32_dst, 0, 2 * N * sizeof (int));
> + __builtin_memset (epi16_dst, 0, 2 * N * sizeof (short));
> + __builtin_memset (epi8_dst, 0, 2 * N * sizeof (char));
> +
> + for (int i = 0; i != 2 * N * sizeof (double); i++)
> + p_init[i] = i;
> +
> + memcpy (pd_src, p_init, 2 * N * sizeof (double));
> + memcpy (ps_src, p_init, 2 * N * sizeof (float));
> + memcpy (epi64_src, p_init, 2 * N * sizeof (long long));
> + memcpy (epi32_src, p_init, 2 * N * sizeof (int));
> + memcpy (epi16_src, p_init, 2 * N * sizeof (short));
> + memcpy (epi8_src, p_init, 2 * N * sizeof (char));
> +
> + foo_pd (pd_dst, pd_src);
> + foo_ps (ps_dst, ps_src);
> + foo_epi64 (epi64_dst, epi64_src);
> + foo_epi32 (epi32_dst, epi32_src);
> + foo_epi16 (epi16_dst, epi16_src);
> + foo_epi8 (epi8_dst, epi8_src);
> + if (__builtin_memcmp (pd_dst, pd_src, N * 2 * sizeof (double)) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (ps_dst, ps_src, N * 2 * sizeof (float)) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi64_dst, epi64_src, N * 2 * sizeof (long long)) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi32_dst, epi32_src, N * 2 * sizeof (int)) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi16_dst, epi16_src, N * 2 * sizeof (short)) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi8_dst, epi8_src, N * 2 * sizeof (char)) != 0)
> + __builtin_abort ();
> +
> + return;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-1c.c b/gcc/testsuite/gcc.target/i386/pr106010-1c.c
> new file mode 100644
> index 00000000000..f07e9fb2d3d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-1c.c
> @@ -0,0 +1,41 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256 -fdump-tree-vect-details" } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(16\) _Float16>} 2 "vect" } } */
> +/* { dg-require-effective-target avx512fp16 } */
> +
> +#include <string.h>
> +
> +static void do_test (void);
> +
> +#define DO_TEST do_test
> +#define AVX512FP16
> +#include "avx512-check.h"
> +
> +#define N 10000
> +
> +void
> +__attribute__((noipa))
> +foo_ph (_Complex _Float16* a, _Complex _Float16* b)
> +{
> + for (int i = 0; i != N; i++)
> + a[i] = b[i];
> +}
> +
> +static void
> +do_test (void)
> +{
> + _Complex _Float16* ph_src = (_Complex _Float16*) malloc (2 * N * sizeof (_Float16));
> + _Complex _Float16* ph_dst = (_Complex _Float16*) malloc (2 * N * sizeof (_Float16));
> + char* p_init = (char*) malloc (2 * N * sizeof (_Float16));
> +
> + __builtin_memset (ph_dst, 0, 2 * N * sizeof (_Float16));
> +
> + for (int i = 0; i != 2 * N * sizeof (_Float16); i++)
> + p_init[i] = i;
> +
> + memcpy (ph_src, p_init, 2 * N * sizeof (_Float16));
> +
> + foo_ph (ph_dst, ph_src);
> + if (__builtin_memcmp (ph_dst, ph_src, N * 2 * sizeof (_Float16)) != 0)
> + __builtin_abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-2a.c b/gcc/testsuite/gcc.target/i386/pr106010-2a.c
> new file mode 100644
> index 00000000000..d2e2f8d4f43
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-2a.c
> @@ -0,0 +1,82 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-slp-details -mprefer-vector-width=256" } */
> +/* { dg-final { scan-tree-dump-times "basic block part vectorized using (?:32|64) byte vectors" 6 "slp2" } }*/
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(4\) double>} 2 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(8\) float>} 2 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(4\) long long int>} 2 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(8\) int>} 2 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(16\) short int>} 2 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(32\) char>} 2 "slp2" } } */
> +
> +void
> +__attribute__((noipa))
> +foo_pd (_Complex double* a, _Complex double* __restrict b)
> +{
> + a[0] = b[0];
> + a[1] = b[1];
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_ps (_Complex float* a, _Complex float* __restrict b)
> +{
> + a[0] = b[0];
> + a[1] = b[1];
> + a[2] = b[2];
> + a[3] = b[3];
> +
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi64 (_Complex long long* a, _Complex long long* __restrict b)
> +{
> + a[0] = b[0];
> + a[1] = b[1];
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi32 (_Complex int* a, _Complex int* __restrict b)
> +{
> + a[0] = b[0];
> + a[1] = b[1];
> + a[2] = b[2];
> + a[3] = b[3];
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi16 (_Complex short* a, _Complex short* __restrict b)
> +{
> + a[0] = b[0];
> + a[1] = b[1];
> + a[2] = b[2];
> + a[3] = b[3];
> + a[4] = b[4];
> + a[5] = b[5];
> + a[6] = b[6];
> + a[7] = b[7];
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi8 (_Complex char* a, _Complex char* __restrict b)
> +{
> + a[0] = b[0];
> + a[1] = b[1];
> + a[2] = b[2];
> + a[3] = b[3];
> + a[4] = b[4];
> + a[5] = b[5];
> + a[6] = b[6];
> + a[7] = b[7];
> + a[8] = b[8];
> + a[9] = b[9];
> + a[10] = b[10];
> + a[11] = b[11];
> + a[12] = b[12];
> + a[13] = b[13];
> + a[14] = b[14];
> + a[15] = b[15];
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-2b.c b/gcc/testsuite/gcc.target/i386/pr106010-2b.c
> new file mode 100644
> index 00000000000..ac360752693
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-2b.c
> @@ -0,0 +1,62 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mavx -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */
> +/* { dg-require-effective-target avx } */
> +
> +#include "avx-check.h"
> +#include <string.h>
> +#include "pr106010-2a.c"
> +
> +void
> +avx_test (void)
> +{
> + _Complex double* pd_src = (_Complex double*) malloc (32);
> + _Complex double* pd_dst = (_Complex double*) malloc (32);
> + _Complex float* ps_src = (_Complex float*) malloc (32);
> + _Complex float* ps_dst = (_Complex float*) malloc (32);
> + _Complex long long* epi64_src = (_Complex long long*) malloc (32);
> + _Complex long long* epi64_dst = (_Complex long long*) malloc (32);
> + _Complex int* epi32_src = (_Complex int*) malloc (32);
> + _Complex int* epi32_dst = (_Complex int*) malloc (32);
> + _Complex short* epi16_src = (_Complex short*) malloc (32);
> + _Complex short* epi16_dst = (_Complex short*) malloc (32);
> + _Complex char* epi8_src = (_Complex char*) malloc (32);
> + _Complex char* epi8_dst = (_Complex char*) malloc (32);
> + char* p = (char* ) malloc (32);
> +
> + __builtin_memset (pd_dst, 0, 32);
> + __builtin_memset (ps_dst, 0, 32);
> + __builtin_memset (epi64_dst, 0, 32);
> + __builtin_memset (epi32_dst, 0, 32);
> + __builtin_memset (epi16_dst, 0, 32);
> + __builtin_memset (epi8_dst, 0, 32);
> +
> + for (int i = 0; i != 32; i++)
> + p[i] = i;
> + __builtin_memcpy (pd_src, p, 32);
> + __builtin_memcpy (ps_src, p, 32);
> + __builtin_memcpy (epi64_src, p, 32);
> + __builtin_memcpy (epi32_src, p, 32);
> + __builtin_memcpy (epi16_src, p, 32);
> + __builtin_memcpy (epi8_src, p, 32);
> +
> + foo_pd (pd_dst, pd_src);
> + foo_ps (ps_dst, ps_src);
> + foo_epi64 (epi64_dst, epi64_src);
> + foo_epi32 (epi32_dst, epi32_src);
> + foo_epi16 (epi16_dst, epi16_src);
> + foo_epi8 (epi8_dst, epi8_src);
> + if (__builtin_memcmp (pd_dst, pd_src, 32) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (ps_dst, ps_src, 32) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi64_dst, epi64_src, 32) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi32_dst, epi32_src, 32) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi16_dst, epi16_src, 32) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi8_dst, epi8_src, 32) != 0)
> + __builtin_abort ();
> +
> + return;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-2c.c b/gcc/testsuite/gcc.target/i386/pr106010-2c.c
> new file mode 100644
> index 00000000000..a002f209ec9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-2c.c
> @@ -0,0 +1,47 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256 -fdump-tree-slp-details" } */
> +/* { dg-require-effective-target avx512fp16 } */
> +
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(16\) _Float16>} 2 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times "basic block part vectorized using (?:32|64) byte vectors" 1 "slp2" } }*/
> +
> +#include <string.h>
> +
> +static void do_test (void);
> +#define DO_TEST do_test
> +#define AVX512FP16
> +#include "avx512-check.h"
> +
> +void
> +__attribute__((noipa))
> +foo_ph (_Complex _Float16* a, _Complex _Float16* __restrict b)
> +{
> + a[0] = b[0];
> + a[1] = b[1];
> + a[2] = b[2];
> + a[3] = b[3];
> + a[4] = b[4];
> + a[5] = b[5];
> + a[6] = b[6];
> + a[7] = b[7];
> +}
> +
> +void
> +do_test (void)
> +{
> + _Complex _Float16* ph_src = (_Complex _Float16*) malloc (32);
> + _Complex _Float16* ph_dst = (_Complex _Float16*) malloc (32);
> + char* p = (char* ) malloc (32);
> +
> + __builtin_memset (ph_dst, 0, 32);
> +
> + for (int i = 0; i != 32; i++)
> + p[i] = i;
> + __builtin_memcpy (ph_src, p, 32);
> +
> + foo_ph (ph_dst, ph_src);
> + if (__builtin_memcmp (ph_dst, ph_src, 32) != 0)
> + __builtin_abort ();
> +
> + return;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-3a.c b/gcc/testsuite/gcc.target/i386/pr106010-3a.c
> new file mode 100644
> index 00000000000..c1b64b56b1c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-3a.c
> @@ -0,0 +1,80 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-slp-details" } */
> +/* { dg-final { scan-tree-dump-times "basic block part vectorized using (?:32|64) byte vectors" 6 "slp2" } }*/
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*VEC_PERM_EXPR.*\{ 2, 3, 0, 1 \}} 2 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*VEC_PERM_EXPR.*\{ 6, 7, 4, 5, 2, 3, 0, 1 \}} 1 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*VEC_PERM_EXPR.*\{ 2, 3, 0, 1, 6, 7, 4, 5 \}} 1 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*VEC_PERM_EXPR.*\{ 14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1 \}} 1 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*VEC_PERM_EXPR.*\{ 14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1, 30, 31, 28, 29, 26, 27, 24, 25, 22, 23, 20, 21, 18, 19, 16, 17 \}} 1 "slp2" } } */
> +
> +void
> +__attribute__((noipa))
> +foo_pd (_Complex double* a, _Complex double* __restrict b)
> +{
> + a[0] = b[1];
> + a[1] = b[0];
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_ps (_Complex float* a, _Complex float* __restrict b)
> +{
> + a[0] = b[1];
> + a[1] = b[0];
> + a[2] = b[3];
> + a[3] = b[2];
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi64 (_Complex long long* a, _Complex long long* __restrict b)
> +{
> + a[0] = b[1];
> + a[1] = b[0];
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi32 (_Complex int* a, _Complex int* __restrict b)
> +{
> + a[0] = b[3];
> + a[1] = b[2];
> + a[2] = b[1];
> + a[3] = b[0];
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi16 (_Complex short* a, _Complex short* __restrict b)
> +{
> + a[0] = b[7];
> + a[1] = b[6];
> + a[2] = b[5];
> + a[3] = b[4];
> + a[4] = b[3];
> + a[5] = b[2];
> + a[6] = b[1];
> + a[7] = b[0];
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi8 (_Complex char* a, _Complex char* __restrict b)
> +{
> + a[0] = b[7];
> + a[1] = b[6];
> + a[2] = b[5];
> + a[3] = b[4];
> + a[4] = b[3];
> + a[5] = b[2];
> + a[6] = b[1];
> + a[7] = b[0];
> + a[8] = b[15];
> + a[9] = b[14];
> + a[10] = b[13];
> + a[11] = b[12];
> + a[12] = b[11];
> + a[13] = b[10];
> + a[14] = b[9];
> + a[15] = b[8];
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-3b.c b/gcc/testsuite/gcc.target/i386/pr106010-3b.c
> new file mode 100644
> index 00000000000..e4fa3f3a541
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-3b.c
> @@ -0,0 +1,126 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */
> +/* { dg-require-effective-target avx2 } */
> +
> +#include "avx2-check.h"
> +#include <string.h>
> +#include "pr106010-3a.c"
> +
> +void
> +avx2_test (void)
> +{
> + _Complex double* pd_src = (_Complex double*) malloc (32);
> + _Complex double* pd_dst = (_Complex double*) malloc (32);
> + _Complex double* pd_exp = (_Complex double*) malloc (32);
> + _Complex float* ps_src = (_Complex float*) malloc (32);
> + _Complex float* ps_dst = (_Complex float*) malloc (32);
> + _Complex float* ps_exp = (_Complex float*) malloc (32);
> + _Complex long long* epi64_src = (_Complex long long*) malloc (32);
> + _Complex long long* epi64_dst = (_Complex long long*) malloc (32);
> + _Complex long long* epi64_exp = (_Complex long long*) malloc (32);
> + _Complex int* epi32_src = (_Complex int*) malloc (32);
> + _Complex int* epi32_dst = (_Complex int*) malloc (32);
> + _Complex int* epi32_exp = (_Complex int*) malloc (32);
> + _Complex short* epi16_src = (_Complex short*) malloc (32);
> + _Complex short* epi16_dst = (_Complex short*) malloc (32);
> + _Complex short* epi16_exp = (_Complex short*) malloc (32);
> + _Complex char* epi8_src = (_Complex char*) malloc (32);
> + _Complex char* epi8_dst = (_Complex char*) malloc (32);
> + _Complex char* epi8_exp = (_Complex char*) malloc (32);
> + char* p = (char* ) malloc (32);
> + char* q = (char* ) malloc (32);
> +
> + __builtin_memset (pd_dst, 0, 32);
> + __builtin_memset (ps_dst, 0, 32);
> + __builtin_memset (epi64_dst, 0, 32);
> + __builtin_memset (epi32_dst, 0, 32);
> + __builtin_memset (epi16_dst, 0, 32);
> + __builtin_memset (epi8_dst, 0, 32);
> +
> + for (int i = 0; i != 32; i++)
> + p[i] = i;
> + __builtin_memcpy (pd_src, p, 32);
> + __builtin_memcpy (ps_src, p, 32);
> + __builtin_memcpy (epi64_src, p, 32);
> + __builtin_memcpy (epi32_src, p, 32);
> + __builtin_memcpy (epi16_src, p, 32);
> + __builtin_memcpy (epi8_src, p, 32);
> +
> + for (int i = 0; i != 16; i++)
> + {
> + p[i] = i + 16;
> + p[i + 16] = i;
> + }
> + __builtin_memcpy (pd_exp, p, 32);
> + __builtin_memcpy (epi64_exp, p, 32);
> +
> + for (int i = 0; i != 8; i++)
> + {
> + p[i] = i + 8;
> + p[i + 8] = i;
> + p[i + 16] = i + 24;
> + p[i + 24] = i + 16;
> + q[i] = i + 24;
> + q[i + 8] = i + 16;
> + q[i + 16] = i + 8;
> + q[i + 24] = i;
> + }
> + __builtin_memcpy (ps_exp, p, 32);
> + __builtin_memcpy (epi32_exp, q, 32);
> +
> +
> + for (int i = 0; i != 4; i++)
> + {
> + q[i] = i + 28;
> + q[i + 4] = i + 24;
> + q[i + 8] = i + 20;
> + q[i + 12] = i + 16;
> + q[i + 16] = i + 12;
> + q[i + 20] = i + 8;
> + q[i + 24] = i + 4;
> + q[i + 28] = i;
> + }
> + __builtin_memcpy (epi16_exp, q, 32);
> +
> + for (int i = 0; i != 2; i++)
> + {
> + q[i] = i + 14;
> + q[i + 2] = i + 12;
> + q[i + 4] = i + 10;
> + q[i + 6] = i + 8;
> + q[i + 8] = i + 6;
> + q[i + 10] = i + 4;
> + q[i + 12] = i + 2;
> + q[i + 14] = i;
> + q[i + 16] = i + 30;
> + q[i + 18] = i + 28;
> + q[i + 20] = i + 26;
> + q[i + 22] = i + 24;
> + q[i + 24] = i + 22;
> + q[i + 26] = i + 20;
> + q[i + 28] = i + 18;
> + q[i + 30] = i + 16;
> + }
> + __builtin_memcpy (epi8_exp, q, 32);
> +
> + foo_pd (pd_dst, pd_src);
> + foo_ps (ps_dst, ps_src);
> + foo_epi64 (epi64_dst, epi64_src);
> + foo_epi32 (epi32_dst, epi32_src);
> + foo_epi16 (epi16_dst, epi16_src);
> + foo_epi8 (epi8_dst, epi8_src);
> + if (__builtin_memcmp (pd_dst, pd_exp, 32) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (ps_dst, ps_exp, 32) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi64_dst, epi64_exp, 32) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi32_dst, epi32_exp, 32) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi16_dst, epi16_exp, 32) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi8_dst, epi8_exp, 32) != 0)
> + __builtin_abort ();
> +
> + return;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-3c.c b/gcc/testsuite/gcc.target/i386/pr106010-3c.c
> new file mode 100644
> index 00000000000..5a5a3d4b992
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-3c.c
> @@ -0,0 +1,69 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256 -fdump-tree-slp-details" } */
> +/* { dg-require-effective-target avx512fp16 } */
> +/* { dg-final { scan-tree-dump-times "basic block part vectorized using (?:32|64) byte vectors" 1 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*VEC_PERM_EXPR.*\{ 2, 3, 0, 1, 8, 9, 6, 7, 14, 15, 12, 13, 4, 5, 10, 11 \}} 1 "slp2" } } */
> +
> +#include <string.h>
> +
> +static void do_test (void);
> +#define DO_TEST do_test
> +#define AVX512FP16
> +#include "avx512-check.h"
> +
> +void
> +__attribute__((noipa))
> +foo_ph (_Complex _Float16* a, _Complex _Float16* __restrict b)
> +{
> + a[0] = b[1];
> + a[1] = b[0];
> + a[2] = b[4];
> + a[3] = b[3];
> + a[4] = b[7];
> + a[5] = b[6];
> + a[6] = b[2];
> + a[7] = b[5];
> +}
> +
> +void
> +do_test (void)
> +{
> + _Complex _Float16* ph_src = (_Complex _Float16*) malloc (32);
> + _Complex _Float16* ph_dst = (_Complex _Float16*) malloc (32);
> + _Complex _Float16* ph_exp = (_Complex _Float16*) malloc (32);
> + char* p = (char*) malloc (32);
> + char* q = (char*) malloc (32);
> +
> + __builtin_memset (ph_dst, 0, 32);
> +
> + for (int i = 0; i != 32; i++)
> + p[i] = i;
> + __builtin_memcpy (ph_src, p, 32);
> +
> + for (int i = 0; i != 4; i++)
> + {
> + p[i] = i + 4;
> + p[i + 4] = i;
> + p[i + 8] = i + 16;
> + p[i + 12] = i + 12;
> + p[i + 16] = i + 28;
> + p[i + 20] = i + 24;
> + p[i + 24] = i + 8;
> + p[i + 28] = i + 20;
> + q[i] = i + 28;
> + q[i + 4] = i + 24;
> + q[i + 8] = i + 20;
> + q[i + 12] = i + 16;
> + q[i + 16] = i + 12;
> + q[i + 20] = i + 8;
> + q[i + 24] = i + 4;
> + q[i + 28] = i;
> + }
> + __builtin_memcpy (ph_exp, p, 32);
> +
> + foo_ph (ph_dst, ph_src);
> + if (__builtin_memcmp (ph_dst, ph_exp, 32) != 0)
> + __builtin_abort ();
> +
> + return;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-4a.c b/gcc/testsuite/gcc.target/i386/pr106010-4a.c
> new file mode 100644
> index 00000000000..b7b0b532bb1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-4a.c
> @@ -0,0 +1,101 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-slp-details" } */
> +/* { dg-final { scan-tree-dump-times "basic block part vectorized using (?:32|64) byte vectors" 6 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(4\) double>} 1 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(8\) float>} 1 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(4\) long long int>} 1 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(8\) int>} 1 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(16\) short int>} 1 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(32\) char>} 1 "slp2" } } */
> +
> +void
> +__attribute__((noipa))
> +foo_pd (_Complex double* a,
> + _Complex double b1,
> + _Complex double b2)
> +{
> + a[0] = b1;
> + a[1] = b2;
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_ps (_Complex float* a,
> + _Complex float b1, _Complex float b2,
> + _Complex float b3, _Complex float b4)
> +{
> + a[0] = b1;
> + a[1] = b2;
> + a[2] = b3;
> + a[3] = b4;
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi64 (_Complex long long* a,
> + _Complex long long b1,
> + _Complex long long b2)
> +{
> + a[0] = b1;
> + a[1] = b2;
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi32 (_Complex int* a,
> + _Complex int b1, _Complex int b2,
> + _Complex int b3, _Complex int b4)
> +{
> + a[0] = b1;
> + a[1] = b2;
> + a[2] = b3;
> + a[3] = b4;
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi16 (_Complex short* a,
> + _Complex short b1, _Complex short b2,
> + _Complex short b3, _Complex short b4,
> + _Complex short b5, _Complex short b6,
> + _Complex short b7, _Complex short b8)
> +{
> + a[0] = b1;
> + a[1] = b2;
> + a[2] = b3;
> + a[3] = b4;
> + a[4] = b5;
> + a[5] = b6;
> + a[6] = b7;
> + a[7] = b8;
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi8 (_Complex char* a,
> + _Complex char b1, _Complex char b2,
> + _Complex char b3, _Complex char b4,
> + _Complex char b5, _Complex char b6,
> + _Complex char b7, _Complex char b8,
> + _Complex char b9, _Complex char b10,
> + _Complex char b11, _Complex char b12,
> + _Complex char b13, _Complex char b14,
> + _Complex char b15, _Complex char b16)
> +{
> + a[0] = b1;
> + a[1] = b2;
> + a[2] = b3;
> + a[3] = b4;
> + a[4] = b5;
> + a[5] = b6;
> + a[6] = b7;
> + a[7] = b8;
> + a[8] = b9;
> + a[9] = b10;
> + a[10] = b11;
> + a[11] = b12;
> + a[12] = b13;
> + a[13] = b14;
> + a[14] = b15;
> + a[15] = b16;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-4b.c b/gcc/testsuite/gcc.target/i386/pr106010-4b.c
> new file mode 100644
> index 00000000000..e2e79508c4b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-4b.c
> @@ -0,0 +1,67 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mavx -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */
> +/* { dg-require-effective-target avx } */
> +
> +#include "avx-check.h"
> +#include <string.h>
> +#include "pr106010-4a.c"
> +
> +void
> +avx_test (void)
> +{
> + _Complex double* pd_src = (_Complex double*) malloc (32);
> + _Complex double* pd_dst = (_Complex double*) malloc (32);
> + _Complex float* ps_src = (_Complex float*) malloc (32);
> + _Complex float* ps_dst = (_Complex float*) malloc (32);
> + _Complex long long* epi64_src = (_Complex long long*) malloc (32);
> + _Complex long long* epi64_dst = (_Complex long long*) malloc (32);
> + _Complex int* epi32_src = (_Complex int*) malloc (32);
> + _Complex int* epi32_dst = (_Complex int*) malloc (32);
> + _Complex short* epi16_src = (_Complex short*) malloc (32);
> + _Complex short* epi16_dst = (_Complex short*) malloc (32);
> + _Complex char* epi8_src = (_Complex char*) malloc (32);
> + _Complex char* epi8_dst = (_Complex char*) malloc (32);
> + char* p = (char*) malloc (32);
> +
> + __builtin_memset (pd_dst, 0, 32);
> + __builtin_memset (ps_dst, 0, 32);
> + __builtin_memset (epi64_dst, 0, 32);
> + __builtin_memset (epi32_dst, 0, 32);
> + __builtin_memset (epi16_dst, 0, 32);
> + __builtin_memset (epi8_dst, 0, 32);
> +
> + for (int i = 0; i != 32; i++)
> + p[i] = i;
> + __builtin_memcpy (pd_src, p, 32);
> + __builtin_memcpy (ps_src, p, 32);
> + __builtin_memcpy (epi64_src, p, 32);
> + __builtin_memcpy (epi32_src, p, 32);
> + __builtin_memcpy (epi16_src, p, 32);
> + __builtin_memcpy (epi8_src, p, 32);
> +
> + foo_pd (pd_dst, pd_src[0], pd_src[1]);
> + foo_ps (ps_dst, ps_src[0], ps_src[1], ps_src[2], ps_src[3]);
> + foo_epi64 (epi64_dst, epi64_src[0], epi64_src[1]);
> + foo_epi32 (epi32_dst, epi32_src[0], epi32_src[1], epi32_src[2], epi32_src[3]);
> + foo_epi16 (epi16_dst, epi16_src[0], epi16_src[1], epi16_src[2], epi16_src[3],
> + epi16_src[4], epi16_src[5], epi16_src[6], epi16_src[7]);
> + foo_epi8 (epi8_dst, epi8_src[0], epi8_src[1], epi8_src[2], epi8_src[3],
> + epi8_src[4], epi8_src[5], epi8_src[6], epi8_src[7],
> + epi8_src[8], epi8_src[9], epi8_src[10], epi8_src[11],
> + epi8_src[12], epi8_src[13], epi8_src[14], epi8_src[15]);
> +
> + if (__builtin_memcmp (pd_dst, pd_src, 32) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (ps_dst, ps_src, 32) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi64_dst, epi64_src, 32) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi32_dst, epi32_src, 32) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi16_dst, epi16_src, 32) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi8_dst, epi8_src, 32) != 0)
> + __builtin_abort ();
> +
> + return;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-4c.c b/gcc/testsuite/gcc.target/i386/pr106010-4c.c
> new file mode 100644
> index 00000000000..8e02aefe3b5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-4c.c
> @@ -0,0 +1,54 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -fdump-tree-slp-details -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */
> +/* { dg-require-effective-target avx512fp16 } */
> +/* { dg-final { scan-tree-dump-times "basic block part vectorized using (?:32|64) byte vectors" 1 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(16\) _Float16>} 1 "slp2" } } */
> +
> +#include <string.h>
> +
> +static void do_test (void);
> +#define DO_TEST do_test
> +#define AVX512FP16
> +#include "avx512-check.h"
> +
> +void
> +__attribute__((noipa))
> +foo_ph (_Complex _Float16* a,
> + _Complex _Float16 b1, _Complex _Float16 b2,
> + _Complex _Float16 b3, _Complex _Float16 b4,
> + _Complex _Float16 b5, _Complex _Float16 b6,
> + _Complex _Float16 b7, _Complex _Float16 b8)
> +{
> + a[0] = b1;
> + a[1] = b2;
> + a[2] = b3;
> + a[3] = b4;
> + a[4] = b5;
> + a[5] = b6;
> + a[6] = b7;
> + a[7] = b8;
> +}
> +
> +void
> +do_test (void)
> +{
> +
> + _Complex _Float16* ph_src = (_Complex _Float16*) malloc (32);
> + _Complex _Float16* ph_dst = (_Complex _Float16*) malloc (32);
> +
> + char* p = (char*) malloc (32);
> +
> + __builtin_memset (ph_dst, 0, 32);
> +
> + for (int i = 0; i != 32; i++)
> + p[i] = i;
> +
> + __builtin_memcpy (ph_src, p, 32);
> +
> + foo_ph (ph_dst, ph_src[0], ph_src[1], ph_src[2], ph_src[3],
> + ph_src[4], ph_src[5], ph_src[6], ph_src[7]);
> +
> + if (__builtin_memcmp (ph_dst, ph_src, 32) != 0)
> + __builtin_abort ();
> + return;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-5a.c b/gcc/testsuite/gcc.target/i386/pr106010-5a.c
> new file mode 100644
> index 00000000000..9d4a6f9846b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-5a.c
> @@ -0,0 +1,117 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-slp-details -mprefer-vector-width=256" } */
> +/* { dg-final { scan-tree-dump-times "basic block part vectorized using (?:32|64) byte vectors" 6 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(4\) double>} 4 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(8\) float>} 4 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(4\) long long int>} 4 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(8\) int>} 4 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(16\) short int>} 4 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(32\) char>} 4 "slp2" } } */
> +
> +void
> +__attribute__((noipa))
> +foo_pd (_Complex double* a, _Complex double* __restrict b)
> +{
> + a[0] = b[2];
> + a[1] = b[3];
> + a[2] = b[0];
> + a[3] = b[1];
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_ps (_Complex float* a, _Complex float* __restrict b)
> +{
> + a[0] = b[4];
> + a[1] = b[5];
> + a[2] = b[6];
> + a[3] = b[7];
> + a[4] = b[0];
> + a[5] = b[1];
> + a[6] = b[2];
> + a[7] = b[3];
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi64 (_Complex long long* a, _Complex long long* __restrict b)
> +{
> + a[0] = b[2];
> + a[1] = b[3];
> + a[2] = b[0];
> + a[3] = b[1];
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi32 (_Complex int* a, _Complex int* __restrict b)
> +{
> + a[0] = b[4];
> + a[1] = b[5];
> + a[2] = b[6];
> + a[3] = b[7];
> + a[4] = b[0];
> + a[5] = b[1];
> + a[6] = b[2];
> + a[7] = b[3];
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi16 (_Complex short* a, _Complex short* __restrict b)
> +{
> + a[0] = b[8];
> + a[1] = b[9];
> + a[2] = b[10];
> + a[3] = b[11];
> + a[4] = b[12];
> + a[5] = b[13];
> + a[6] = b[14];
> + a[7] = b[15];
> + a[8] = b[0];
> + a[9] = b[1];
> + a[10] = b[2];
> + a[11] = b[3];
> + a[12] = b[4];
> + a[13] = b[5];
> + a[14] = b[6];
> + a[15] = b[7];
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi8 (_Complex char* a, _Complex char* __restrict b)
> +{
> + a[0] = b[16];
> + a[1] = b[17];
> + a[2] = b[18];
> + a[3] = b[19];
> + a[4] = b[20];
> + a[5] = b[21];
> + a[6] = b[22];
> + a[7] = b[23];
> + a[8] = b[24];
> + a[9] = b[25];
> + a[10] = b[26];
> + a[11] = b[27];
> + a[12] = b[28];
> + a[13] = b[29];
> + a[14] = b[30];
> + a[15] = b[31];
> + a[16] = b[0];
> + a[17] = b[1];
> + a[18] = b[2];
> + a[19] = b[3];
> + a[20] = b[4];
> + a[21] = b[5];
> + a[22] = b[6];
> + a[23] = b[7];
> + a[24] = b[8];
> + a[25] = b[9];
> + a[26] = b[10];
> + a[27] = b[11];
> + a[28] = b[12];
> + a[29] = b[13];
> + a[30] = b[14];
> + a[31] = b[15];
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-5b.c b/gcc/testsuite/gcc.target/i386/pr106010-5b.c
> new file mode 100644
> index 00000000000..d5c6ebeb5cf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-5b.c
> @@ -0,0 +1,80 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mavx -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */
> +/* { dg-require-effective-target avx } */
> +
> +#include "avx-check.h"
> +#include <string.h>
> +#include "pr106010-5a.c"
> +
> +void
> +avx_test (void)
> +{
> + _Complex double* pd_src = (_Complex double*) malloc (64);
> + _Complex double* pd_dst = (_Complex double*) malloc (64);
> + _Complex double* pd_exp = (_Complex double*) malloc (64);
> + _Complex float* ps_src = (_Complex float*) malloc (64);
> + _Complex float* ps_dst = (_Complex float*) malloc (64);
> + _Complex float* ps_exp = (_Complex float*) malloc (64);
> + _Complex long long* epi64_src = (_Complex long long*) malloc (64);
> + _Complex long long* epi64_dst = (_Complex long long*) malloc (64);
> + _Complex long long* epi64_exp = (_Complex long long*) malloc (64);
> + _Complex int* epi32_src = (_Complex int*) malloc (64);
> + _Complex int* epi32_dst = (_Complex int*) malloc (64);
> + _Complex int* epi32_exp = (_Complex int*) malloc (64);
> + _Complex short* epi16_src = (_Complex short*) malloc (64);
> + _Complex short* epi16_dst = (_Complex short*) malloc (64);
> + _Complex short* epi16_exp = (_Complex short*) malloc (64);
> + _Complex char* epi8_src = (_Complex char*) malloc (64);
> + _Complex char* epi8_dst = (_Complex char*) malloc (64);
> + _Complex char* epi8_exp = (_Complex char*) malloc (64);
> + char* p = (char*) malloc (64);
> + char* q = (char*) malloc (64);
> +
> + __builtin_memset (pd_dst, 0, 64);
> + __builtin_memset (ps_dst, 0, 64);
> + __builtin_memset (epi64_dst, 0, 64);
> + __builtin_memset (epi32_dst, 0, 64);
> + __builtin_memset (epi16_dst, 0, 64);
> + __builtin_memset (epi8_dst, 0, 64);
> +
> + for (int i = 0; i != 64; i++)
> + {
> + p[i] = i;
> + q[i] = (i + 32) % 64;
> + }
> + __builtin_memcpy (pd_src, p, 64);
> + __builtin_memcpy (ps_src, p, 64);
> + __builtin_memcpy (epi64_src, p, 64);
> + __builtin_memcpy (epi32_src, p, 64);
> + __builtin_memcpy (epi16_src, p, 64);
> + __builtin_memcpy (epi8_src, p, 64);
> +
> + __builtin_memcpy (pd_exp, q, 64);
> + __builtin_memcpy (ps_exp, q, 64);
> + __builtin_memcpy (epi64_exp, q, 64);
> + __builtin_memcpy (epi32_exp, q, 64);
> + __builtin_memcpy (epi16_exp, q, 64);
> + __builtin_memcpy (epi8_exp, q, 64);
> +
> + foo_pd (pd_dst, pd_src);
> + foo_ps (ps_dst, ps_src);
> + foo_epi64 (epi64_dst, epi64_src);
> + foo_epi32 (epi32_dst, epi32_src);
> + foo_epi16 (epi16_dst, epi16_src);
> + foo_epi8 (epi8_dst, epi8_src);
> +
> + if (__builtin_memcmp (pd_dst, pd_exp, 64) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (ps_dst, ps_exp, 64) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi64_dst, epi64_exp, 64) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi32_dst, epi32_exp, 64) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi16_dst, epi16_exp, 64) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi8_dst, epi8_exp, 64) != 0)
> + __builtin_abort ();
> +
> + return;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-5c.c b/gcc/testsuite/gcc.target/i386/pr106010-5c.c
> new file mode 100644
> index 00000000000..9ce4e6dd5c0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-5c.c
> @@ -0,0 +1,62 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-slp-details -mprefer-vector-width=256" } */
> +/* { dg-require-effective-target avx512fp16 } */
> +/* { dg-final { scan-tree-dump-times "basic block part vectorized using (?:32|64) byte vectors" 1 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(16\) _Float16>} 4 "slp2" } } */
> +
> +#include <string.h>
> +
> +static void do_test (void);
> +#define DO_TEST do_test
> +#define AVX512FP16
> +#include "avx512-check.h"
> +
> +void
> +__attribute__((noipa))
> +foo_ph (_Complex _Float16* a, _Complex _Float16* __restrict b)
> +{
> + a[0] = b[8];
> + a[1] = b[9];
> + a[2] = b[10];
> + a[3] = b[11];
> + a[4] = b[12];
> + a[5] = b[13];
> + a[6] = b[14];
> + a[7] = b[15];
> + a[8] = b[0];
> + a[9] = b[1];
> + a[10] = b[2];
> + a[11] = b[3];
> + a[12] = b[4];
> + a[13] = b[5];
> + a[14] = b[6];
> + a[15] = b[7];
> +}
> +
> +void
> +do_test (void)
> +{
> + _Complex _Float16* ph_src = (_Complex _Float16*) malloc (64);
> + _Complex _Float16* ph_dst = (_Complex _Float16*) malloc (64);
> + _Complex _Float16* ph_exp = (_Complex _Float16*) malloc (64);
> + char* p = (char*) malloc (64);
> + char* q = (char*) malloc (64);
> +
> + __builtin_memset (ph_dst, 0, 64);
> +
> + for (int i = 0; i != 64; i++)
> + {
> + p[i] = i;
> + q[i] = (i + 32) % 64;
> + }
> + __builtin_memcpy (ph_src, p, 64);
> +
> + __builtin_memcpy (ph_exp, q, 64);
> +
> + foo_ph (ph_dst, ph_src);
> +
> + if (__builtin_memcmp (ph_dst, ph_exp, 64) != 0)
> + __builtin_abort ();
> +
> + return;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-6a.c b/gcc/testsuite/gcc.target/i386/pr106010-6a.c
> new file mode 100644
> index 00000000000..65a90d03684
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-6a.c
> @@ -0,0 +1,115 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-slp-details -mprefer-vector-width=256" } */
> +/* { dg-final { scan-tree-dump-times "basic block part vectorized using (?:32|64) byte vectors" 6 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*VEC_PERM_EXPR.*\{ 2, 3, 0, 1 \}} 4 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*VEC_PERM_EXPR.*\{ 6, 7, 4, 5, 2, 3, 0, 1 \}} 4 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*VEC_PERM_EXPR.*\{ 14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1 \}} 2 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*VEC_PERM_EXPR.*\{ 30, 31, 28, 29, 26, 27, 24, 25, 22, 23, 20, 21, 18, 19, 16, 17, 14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1 \}} 2 "slp2" } } */
> +
> +void
> +__attribute__((noipa))
> +foo_pd (_Complex double* a, _Complex double* __restrict b)
> +{
> + a[0] = b[3];
> + a[1] = b[2];
> + a[2] = b[1];
> + a[3] = b[0];
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_ps (_Complex float* a, _Complex float* __restrict b)
> +{
> + a[0] = b[7];
> + a[1] = b[6];
> + a[2] = b[5];
> + a[3] = b[4];
> + a[4] = b[3];
> + a[5] = b[2];
> + a[6] = b[1];
> + a[7] = b[0];
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi64 (_Complex long long* a, _Complex long long* __restrict b)
> +{
> + a[0] = b[3];
> + a[1] = b[2];
> + a[2] = b[1];
> + a[3] = b[0];
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi32 (_Complex int* a, _Complex int* __restrict b)
> +{
> + a[0] = b[7];
> + a[1] = b[6];
> + a[2] = b[5];
> + a[3] = b[4];
> + a[4] = b[3];
> + a[5] = b[2];
> + a[6] = b[1];
> + a[7] = b[0];
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi16 (_Complex short* a, _Complex short* __restrict b)
> +{
> + a[0] = b[15];
> + a[1] = b[14];
> + a[2] = b[13];
> + a[3] = b[12];
> + a[4] = b[11];
> + a[5] = b[10];
> + a[6] = b[9];
> + a[7] = b[8];
> + a[8] = b[7];
> + a[9] = b[6];
> + a[10] = b[5];
> + a[11] = b[4];
> + a[12] = b[3];
> + a[13] = b[2];
> + a[14] = b[1];
> + a[15] = b[0];
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi8 (_Complex char* a, _Complex char* __restrict b)
> +{
> + a[0] = b[31];
> + a[1] = b[30];
> + a[2] = b[29];
> + a[3] = b[28];
> + a[4] = b[27];
> + a[5] = b[26];
> + a[6] = b[25];
> + a[7] = b[24];
> + a[8] = b[23];
> + a[9] = b[22];
> + a[10] = b[21];
> + a[11] = b[20];
> + a[12] = b[19];
> + a[13] = b[18];
> + a[14] = b[17];
> + a[15] = b[16];
> + a[16] = b[15];
> + a[17] = b[14];
> + a[18] = b[13];
> + a[19] = b[12];
> + a[20] = b[11];
> + a[21] = b[10];
> + a[22] = b[9];
> + a[23] = b[8];
> + a[24] = b[7];
> + a[25] = b[6];
> + a[26] = b[5];
> + a[27] = b[4];
> + a[28] = b[3];
> + a[29] = b[2];
> + a[30] = b[1];
> + a[31] = b[0];
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-6b.c b/gcc/testsuite/gcc.target/i386/pr106010-6b.c
> new file mode 100644
> index 00000000000..1c5bb020939
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-6b.c
> @@ -0,0 +1,157 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */
> +/* { dg-require-effective-target avx2 } */
> +
> +#include "avx2-check.h"
> +#include <string.h>
> +#include "pr106010-6a.c"
> +
> +void
> +avx2_test (void)
> +{
> + _Complex double* pd_src = (_Complex double*) malloc (64);
> + _Complex double* pd_dst = (_Complex double*) malloc (64);
> + _Complex double* pd_exp = (_Complex double*) malloc (64);
> + _Complex float* ps_src = (_Complex float*) malloc (64);
> + _Complex float* ps_dst = (_Complex float*) malloc (64);
> + _Complex float* ps_exp = (_Complex float*) malloc (64);
> + _Complex long long* epi64_src = (_Complex long long*) malloc (64);
> + _Complex long long* epi64_dst = (_Complex long long*) malloc (64);
> + _Complex long long* epi64_exp = (_Complex long long*) malloc (64);
> + _Complex int* epi32_src = (_Complex int*) malloc (64);
> + _Complex int* epi32_dst = (_Complex int*) malloc (64);
> + _Complex int* epi32_exp = (_Complex int*) malloc (64);
> + _Complex short* epi16_src = (_Complex short*) malloc (64);
> + _Complex short* epi16_dst = (_Complex short*) malloc (64);
> + _Complex short* epi16_exp = (_Complex short*) malloc (64);
> + _Complex char* epi8_src = (_Complex char*) malloc (64);
> + _Complex char* epi8_dst = (_Complex char*) malloc (64);
> + _Complex char* epi8_exp = (_Complex char*) malloc (64);
> + char* p = (char*) malloc (64);
> + char* q = (char*) malloc (64);
> +
> + __builtin_memset (pd_dst, 0, 64);
> + __builtin_memset (ps_dst, 0, 64);
> + __builtin_memset (epi64_dst, 0, 64);
> + __builtin_memset (epi32_dst, 0, 64);
> + __builtin_memset (epi16_dst, 0, 64);
> + __builtin_memset (epi8_dst, 0, 64);
> +
> + for (int i = 0; i != 64; i++)
> + p[i] = i;
> +
> + __builtin_memcpy (pd_src, p, 64);
> + __builtin_memcpy (ps_src, p, 64);
> + __builtin_memcpy (epi64_src, p, 64);
> + __builtin_memcpy (epi32_src, p, 64);
> + __builtin_memcpy (epi16_src, p, 64);
> + __builtin_memcpy (epi8_src, p, 64);
> +
> +
> + for (int i = 0; i != 16; i++)
> + {
> + q[i] = i + 48;
> + q[i + 16] = i + 32;
> + q[i + 32] = i + 16;
> + q[i + 48] = i;
> + }
> +
> + __builtin_memcpy (pd_exp, q, 64);
> + __builtin_memcpy (epi64_exp, q, 64);
> +
> + for (int i = 0; i != 8; i++)
> + {
> + q[i] = i + 56;
> + q[i + 8] = i + 48;
> + q[i + 16] = i + 40;
> + q[i + 24] = i + 32;
> + q[i + 32] = i + 24;
> + q[i + 40] = i + 16;
> + q[i + 48] = i + 8;
> + q[i + 56] = i;
> + }
> +
> + __builtin_memcpy (ps_exp, q, 64);
> + __builtin_memcpy (epi32_exp, q, 64);
> +
> + for (int i = 0; i != 4; i++)
> + {
> + q[i] = i + 60;
> + q[i + 4] = i + 56;
> + q[i + 8] = i + 52;
> + q[i + 12] = i + 48;
> + q[i + 16] = i + 44;
> + q[i + 20] = i + 40;
> + q[i + 24] = i + 36;
> + q[i + 28] = i + 32;
> + q[i + 32] = i + 28;
> + q[i + 36] = i + 24;
> + q[i + 40] = i + 20;
> + q[i + 44] = i + 16;
> + q[i + 48] = i + 12;
> + q[i + 52] = i + 8;
> + q[i + 56] = i + 4;
> + q[i + 60] = i;
> + }
> +
> + __builtin_memcpy (epi16_exp, q, 64);
> +
> + for (int i = 0; i != 2; i++)
> + {
> + q[i] = i + 62;
> + q[i + 2] = i + 60;
> + q[i + 4] = i + 58;
> + q[i + 6] = i + 56;
> + q[i + 8] = i + 54;
> + q[i + 10] = i + 52;
> + q[i + 12] = i + 50;
> + q[i + 14] = i + 48;
> + q[i + 16] = i + 46;
> + q[i + 18] = i + 44;
> + q[i + 20] = i + 42;
> + q[i + 22] = i + 40;
> + q[i + 24] = i + 38;
> + q[i + 26] = i + 36;
> + q[i + 28] = i + 34;
> + q[i + 30] = i + 32;
> + q[i + 32] = i + 30;
> + q[i + 34] = i + 28;
> + q[i + 36] = i + 26;
> + q[i + 38] = i + 24;
> + q[i + 40] = i + 22;
> + q[i + 42] = i + 20;
> + q[i + 44] = i + 18;
> + q[i + 46] = i + 16;
> + q[i + 48] = i + 14;
> + q[i + 50] = i + 12;
> + q[i + 52] = i + 10;
> + q[i + 54] = i + 8;
> + q[i + 56] = i + 6;
> + q[i + 58] = i + 4;
> + q[i + 60] = i + 2;
> + q[i + 62] = i;
> + }
> + __builtin_memcpy (epi8_exp, q, 64);
> +
> + foo_pd (pd_dst, pd_src);
> + foo_ps (ps_dst, ps_src);
> + foo_epi64 (epi64_dst, epi64_src);
> + foo_epi32 (epi32_dst, epi32_src);
> + foo_epi16 (epi16_dst, epi16_src);
> + foo_epi8 (epi8_dst, epi8_src);
> +
> + if (__builtin_memcmp (pd_dst, pd_exp, 64) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (ps_dst, ps_exp, 64) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi64_dst, epi64_exp, 64) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi32_dst, epi32_exp, 64) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi16_dst, epi16_exp, 64) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi8_dst, epi8_exp, 64) != 0)
> + __builtin_abort ();
> +
> + return;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-6c.c b/gcc/testsuite/gcc.target/i386/pr106010-6c.c
> new file mode 100644
> index 00000000000..b859d884a7f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-6c.c
> @@ -0,0 +1,80 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256 -fdump-tree-slp-details" } */
> +/* { dg-require-effective-target avx512fp16 } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*VEC_PERM_EXPR.*\{ 14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1 \}} 2 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times "basic block part vectorized using (?:32|64) byte vectors" 1 "slp2" } } */
> +
> +#include <string.h>
> +
> +static void do_test (void);
> +#define DO_TEST do_test
> +#define AVX512FP16
> +#include "avx512-check.h"
> +
> +void
> +__attribute__((noipa))
> +foo_ph (_Complex _Float16* a, _Complex _Float16* __restrict b)
> +{
> + a[0] = b[15];
> + a[1] = b[14];
> + a[2] = b[13];
> + a[3] = b[12];
> + a[4] = b[11];
> + a[5] = b[10];
> + a[6] = b[9];
> + a[7] = b[8];
> + a[8] = b[7];
> + a[9] = b[6];
> + a[10] = b[5];
> + a[11] = b[4];
> + a[12] = b[3];
> + a[13] = b[2];
> + a[14] = b[1];
> + a[15] = b[0];
> +}
> +
> +void
> +do_test (void)
> +{
> + _Complex _Float16* ph_src = (_Complex _Float16*) malloc (64);
> + _Complex _Float16* ph_dst = (_Complex _Float16*) malloc (64);
> + _Complex _Float16* ph_exp = (_Complex _Float16*) malloc (64);
> + char* p = (char*) malloc (64);
> + char* q = (char*) malloc (64);
> +
> + __builtin_memset (ph_dst, 0, 64);
> +
> + for (int i = 0; i != 64; i++)
> + p[i] = i;
> +
> + __builtin_memcpy (ph_src, p, 64);
> +
> + for (int i = 0; i != 4; i++)
> + {
> + q[i] = i + 60;
> + q[i + 4] = i + 56;
> + q[i + 8] = i + 52;
> + q[i + 12] = i + 48;
> + q[i + 16] = i + 44;
> + q[i + 20] = i + 40;
> + q[i + 24] = i + 36;
> + q[i + 28] = i + 32;
> + q[i + 32] = i + 28;
> + q[i + 36] = i + 24;
> + q[i + 40] = i + 20;
> + q[i + 44] = i + 16;
> + q[i + 48] = i + 12;
> + q[i + 52] = i + 8;
> + q[i + 56] = i + 4;
> + q[i + 60] = i;
> + }
> +
> + __builtin_memcpy (ph_exp, q, 64);
> +
> + foo_ph (ph_dst, ph_src);
> +
> + if (__builtin_memcmp (ph_dst, ph_exp, 64) != 0)
> + __builtin_abort ();
> +
> + return;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-7a.c b/gcc/testsuite/gcc.target/i386/pr106010-7a.c
> new file mode 100644
> index 00000000000..2ea01fac927
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-7a.c
> @@ -0,0 +1,58 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-vect-details -mprefer-vector-width=256" } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 6 "vect" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(4\) double>} 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(8\) float>} 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(4\) long long int>} 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(8\) int>} 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(16\) short int>} 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(32\) char>} 1 "vect" } } */
> +
> +#define N 10000
> +void
> +__attribute__((noipa))
> +foo_pd (_Complex double* a, _Complex double b)
> +{
> + for (int i = 0; i != N; i++)
> + a[i] = b;
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_ps (_Complex float* a, _Complex float b)
> +{
> + for (int i = 0; i != N; i++)
> + a[i] = b;
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi64 (_Complex long long* a, _Complex long long b)
> +{
> + for (int i = 0; i != N; i++)
> + a[i] = b;
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi32 (_Complex int* a, _Complex int b)
> +{
> + for (int i = 0; i != N; i++)
> + a[i] = b;
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi16 (_Complex short* a, _Complex short b)
> +{
> + for (int i = 0; i != N; i++)
> + a[i] = b;
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi8 (_Complex char* a, _Complex char b)
> +{
> + for (int i = 0; i != N; i++)
> + a[i] = b;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-7b.c b/gcc/testsuite/gcc.target/i386/pr106010-7b.c
> new file mode 100644
> index 00000000000..26482cc10f5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-7b.c
> @@ -0,0 +1,63 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mavx -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */
> +/* { dg-require-effective-target avx } */
> +
> +#include "avx-check.h"
> +#include <string.h>
> +#include "pr106010-7a.c"
> +
> +void
> +avx_test (void)
> +{
> + _Complex double* pd_src = (_Complex double*) malloc (2 * N * sizeof (double));
> + _Complex double* pd_dst = (_Complex double*) malloc (2 * N * sizeof (double));
> + _Complex float* ps_src = (_Complex float*) malloc (2 * N * sizeof (float));
> + _Complex float* ps_dst = (_Complex float*) malloc (2 * N * sizeof (float));
> + _Complex long long* epi64_src = (_Complex long long*) malloc (2 * N * sizeof (long long));
> + _Complex long long* epi64_dst = (_Complex long long*) malloc (2 * N * sizeof (long long));
> + _Complex int* epi32_src = (_Complex int*) malloc (2 * N * sizeof (int));
> + _Complex int* epi32_dst = (_Complex int*) malloc (2 * N * sizeof (int));
> + _Complex short* epi16_src = (_Complex short*) malloc (2 * N * sizeof (short));
> + _Complex short* epi16_dst = (_Complex short*) malloc (2 * N * sizeof (short));
> + _Complex char* epi8_src = (_Complex char*) malloc (2 * N * sizeof (char));
> + _Complex char* epi8_dst = (_Complex char*) malloc (2 * N * sizeof (char));
> + char* p_init = (char*) malloc (2 * N * sizeof (double));
> +
> + __builtin_memset (pd_dst, 0, 2 * N * sizeof (double));
> + __builtin_memset (ps_dst, 0, 2 * N * sizeof (float));
> + __builtin_memset (epi64_dst, 0, 2 * N * sizeof (long long));
> + __builtin_memset (epi32_dst, 0, 2 * N * sizeof (int));
> + __builtin_memset (epi16_dst, 0, 2 * N * sizeof (short));
> + __builtin_memset (epi8_dst, 0, 2 * N * sizeof (char));
> +
> + for (int i = 0; i != 2 * N * sizeof (double); i++)
> + p_init[i] = i % 2 + 3;
> +
> + memcpy (pd_src, p_init, 2 * N * sizeof (double));
> + memcpy (ps_src, p_init, 2 * N * sizeof (float));
> + memcpy (epi64_src, p_init, 2 * N * sizeof (long long));
> + memcpy (epi32_src, p_init, 2 * N * sizeof (int));
> + memcpy (epi16_src, p_init, 2 * N * sizeof (short));
> + memcpy (epi8_src, p_init, 2 * N * sizeof (char));
> +
> + foo_pd (pd_dst, pd_src[0]);
> + foo_ps (ps_dst, ps_src[0]);
> + foo_epi64 (epi64_dst, epi64_src[0]);
> + foo_epi32 (epi32_dst, epi32_src[0]);
> + foo_epi16 (epi16_dst, epi16_src[0]);
> + foo_epi8 (epi8_dst, epi8_src[0]);
> + if (__builtin_memcmp (pd_dst, pd_src, N * 2 * sizeof (double)) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (ps_dst, ps_src, N * 2 * sizeof (float)) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi64_dst, epi64_src, N * 2 * sizeof (long long)) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi32_dst, epi32_src, N * 2 * sizeof (int)) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi16_dst, epi16_src, N * 2 * sizeof (short)) != 0)
> + __builtin_abort ();
> + if (__builtin_memcmp (epi8_dst, epi8_src, N * 2 * sizeof (char)) != 0)
> + __builtin_abort ();
> +
> + return;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-7c.c b/gcc/testsuite/gcc.target/i386/pr106010-7c.c
> new file mode 100644
> index 00000000000..7f4056a5ecc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-7c.c
> @@ -0,0 +1,41 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256 -fdump-tree-vect-details" } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(16\) _Float16>} 1 "vect" } } */
> +/* { dg-require-effective-target avx512fp16 } */
> +
> +#include <string.h>
> +
> +static void do_test (void);
> +
> +#define DO_TEST do_test
> +#define AVX512FP16
> +#include "avx512-check.h"
> +
> +#define N 10000
> +
> +void
> +__attribute__((noipa))
> +foo_ph (_Complex _Float16* a, _Complex _Float16 b)
> +{
> + for (int i = 0; i != N; i++)
> + a[i] = b;
> +}
> +
> +static void
> +do_test (void)
> +{
> + _Complex _Float16* ph_src = (_Complex _Float16*) malloc (2 * N * sizeof (_Float16));
> + _Complex _Float16* ph_dst = (_Complex _Float16*) malloc (2 * N * sizeof (_Float16));
> + char* p_init = (char*) malloc (2 * N * sizeof (_Float16));
> +
> + __builtin_memset (ph_dst, 0, 2 * N * sizeof (_Float16));
> +
> + for (int i = 0; i != 2 * N * sizeof (_Float16); i++)
> + p_init[i] = i % 2 + 3;
> +
> + memcpy (ph_src, p_init, 2 * N * sizeof (_Float16));
> +
> + foo_ph (ph_dst, ph_src[0]);
> + if (__builtin_memcmp (ph_dst, ph_src, N * 2 * sizeof (_Float16)) != 0)
> + __builtin_abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-8a.c b/gcc/testsuite/gcc.target/i386/pr106010-8a.c
> new file mode 100644
> index 00000000000..11054b60d30
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-8a.c
> @@ -0,0 +1,58 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-vect-details -mprefer-vector-width=256" } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 6 "vect" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(4\) double>} 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(8\) float>} 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(4\) long long int>} 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(8\) int>} 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(16\) short int>} 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(32\) char>} 1 "vect" } } */
> +
> +#define N 10000
> +void
> +__attribute__((noipa))
> +foo_pd (_Complex double* a)
> +{
> + for (int i = 0; i != N; i++)
> + a[i] = 1.0 + 2.0i;
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_ps (_Complex float* a)
> +{
> + for (int i = 0; i != N; i++)
> + a[i] = 1.0f + 2.0fi;
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi64 (_Complex long long* a)
> +{
> + for (int i = 0; i != N; i++)
> + a[i] = 1 + 2i;
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi32 (_Complex int* a)
> +{
> + for (int i = 0; i != N; i++)
> + a[i] = 1 + 2i;
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi16 (_Complex short* a)
> +{
> + for (int i = 0; i != N; i++)
> + a[i] = 1 + 2i;
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_epi8 (_Complex char* a)
> +{
> + for (int i = 0; i != N; i++)
> + a[i] = 1 + 2i;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-8b.c b/gcc/testsuite/gcc.target/i386/pr106010-8b.c
> new file mode 100644
> index 00000000000..6bb0073b691
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-8b.c
> @@ -0,0 +1,53 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mavx -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */
> +/* { dg-require-effective-target avx } */
> +
> +#include "avx-check.h"
> +#include <string.h>
> +#include "pr106010-8a.c"
> +
> +void
> +avx_test (void)
> +{
> + _Complex double pd_src = 1.0 + 2.0i;
> + _Complex double* pd_dst = (_Complex double*) malloc (2 * N * sizeof (double));
> + _Complex float ps_src = 1.0f + 2.0fi;
> + _Complex float* ps_dst = (_Complex float*) malloc (2 * N * sizeof (float));
> + _Complex long long epi64_src = 1 + 2i;
> + _Complex long long* epi64_dst = (_Complex long long*) malloc (2 * N * sizeof (long long));
> + _Complex int epi32_src = 1 + 2i;
> + _Complex int* epi32_dst = (_Complex int*) malloc (2 * N * sizeof (int));
> + _Complex short epi16_src = 1 + 2i;
> + _Complex short* epi16_dst = (_Complex short*) malloc (2 * N * sizeof (short));
> + _Complex char epi8_src = 1 + 2i;
> + _Complex char* epi8_dst = (_Complex char*) malloc (2 * N * sizeof (char));
> +
> + __builtin_memset (pd_dst, 0, 2 * N * sizeof (double));
> + __builtin_memset (ps_dst, 0, 2 * N * sizeof (float));
> + __builtin_memset (epi64_dst, 0, 2 * N * sizeof (long long));
> + __builtin_memset (epi32_dst, 0, 2 * N * sizeof (int));
> + __builtin_memset (epi16_dst, 0, 2 * N * sizeof (short));
> + __builtin_memset (epi8_dst, 0, 2 * N * sizeof (char));
> +
> + foo_pd (pd_dst);
> + foo_ps (ps_dst);
> + foo_epi64 (epi64_dst);
> + foo_epi32 (epi32_dst);
> + foo_epi16 (epi16_dst);
> + foo_epi8 (epi8_dst);
> + for (int i = 0 ; i != N; i++)
> + {
> + if (pd_dst[i] != pd_src)
> + __builtin_abort ();
> + if (ps_dst[i] != ps_src)
> + __builtin_abort ();
> + if (epi64_dst[i] != epi64_src)
> + __builtin_abort ();
> + if (epi32_dst[i] != epi32_src)
> + __builtin_abort ();
> + if (epi16_dst[i] != epi16_src)
> + __builtin_abort ();
> + if (epi8_dst[i] != epi8_src)
> + __builtin_abort ();
> + }
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-8c.c b/gcc/testsuite/gcc.target/i386/pr106010-8c.c
> new file mode 100644
> index 00000000000..61ae131829d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-8c.c
> @@ -0,0 +1,38 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256 -fdump-tree-vect-details" } */
> +/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(16\) _Float16>} 1 "vect" } } */
> +/* { dg-require-effective-target avx512fp16 } */
> +
> +#include <string.h>
> +
> +static void do_test (void);
> +
> +#define DO_TEST do_test
> +#define AVX512FP16
> +#include "avx512-check.h"
> +
> +#define N 10000
> +
> +void
> +__attribute__((noipa))
> +foo_ph (_Complex _Float16* a)
> +{
> + for (int i = 0; i != N; i++)
> + a[i] = 1.0f16 + 2.0f16i;
> +}
> +
> +static void
> +do_test (void)
> +{
> + _Complex _Float16 ph_src = 1.0f16 + 2.0f16i;
> + _Complex _Float16* ph_dst = (_Complex _Float16*) malloc (2 * N * sizeof (_Float16));
> +
> + __builtin_memset (ph_dst, 0, 2 * N * sizeof (_Float16));
> +
> + foo_ph (ph_dst);
> + for (int i = 0; i != N; i++)
> + {
> + if (ph_dst[i] != ph_src)
> + __builtin_abort ();
> + }
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-9a.c b/gcc/testsuite/gcc.target/i386/pr106010-9a.c
> new file mode 100644
> index 00000000000..e922f7b5400
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-9a.c
> @@ -0,0 +1,89 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx2 -fvect-cost-model=unlimited -fdump-tree-vect-details" } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 6 "vect" } } */
> +
> +typedef struct { _Complex double c; double a1; double a2;}
> + cdf;
> +typedef struct { _Complex double c; double a1; double a2; double a3; double a4;}
> + cdf2;
> +typedef struct { _Complex double c1; _Complex double c2; double a1; double a2; double a3; double a4;}
> + cdf3;
> +typedef struct { _Complex double c1; _Complex double c2; double a1; double a2;}
> + cdf4;
> +
> +#define N 100
> +/* VMAT_ELEMENTWISE. */
> +void
> +__attribute__((noipa))
> +foo (cdf* a, cdf* __restrict b)
> +{
> + for (int i = 0; i < N; ++i)
> + {
> + a[i].c = b[i].c;
> + a[i].a1 = b[i].a1;
> + a[i].a2 = b[i].a2;
> + }
> +}
> +
> +/* VMAT_CONTIGUOUS_PERMUTE. */
> +void
> +__attribute__((noipa))
> +foo1 (cdf2* a, cdf2* __restrict b)
> +{
> + for (int i = 0; i < N; ++i)
> + {
> + a[i].c = b[i].c;
> + a[i].a1 = b[i].a1;
> + a[i].a2 = b[i].a2;
> + a[i].a3 = b[i].a3;
> + a[i].a4 = b[i].a4;
> + }
> +}
> +
> +/* VMAT_CONTIGUOUS. */
> +void
> +__attribute__((noipa))
> +foo2 (cdf3* a, cdf3* __restrict b)
> +{
> + for (int i = 0; i < N; ++i)
> + {
> + a[i].c1 = b[i].c1;
> + a[i].c2 = b[i].c2;
> + a[i].a1 = b[i].a1;
> + a[i].a2 = b[i].a2;
> + a[i].a3 = b[i].a3;
> + a[i].a4 = b[i].a4;
> + }
> +}
> +
> +/* VMAT_STRIDED_SLP. */
> +void
> +__attribute__((noipa))
> +foo3 (cdf4* a, cdf4* __restrict b)
> +{
> + for (int i = 0; i < N; ++i)
> + {
> + a[i].c1 = b[i].c1;
> + a[i].c2 = b[i].c2;
> + a[i].a1 = b[i].a1;
> + a[i].a2 = b[i].a2;
> + }
> +}
> +
> +/* VMAT_CONTIGUOUS_REVERSE. */
> +void
> +__attribute__((noipa))
> +foo4 (_Complex double* a, _Complex double* __restrict b)
> +{
> + for (int i = 0; i != N; i++)
> + a[i] = b[N-i-1];
> +}
> +
> +/* VMAT_CONTIGUOUS_DOWN. */
> +void
> +__attribute__((noipa))
> +foo5 (_Complex double* a, _Complex double* __restrict b)
> +{
> + for (int i = 0; i != N; i++)
> + a[N-i-1] = b[0];
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-9b.c b/gcc/testsuite/gcc.target/i386/pr106010-9b.c
> new file mode 100644
> index 00000000000..e220445e6e3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-9b.c
> @@ -0,0 +1,90 @@
> +/* { dg-do run } */
> +/* { dg-options "-O3 -msse2 -fvect-cost-model=unlimited" } */
> +/* { dg-require-effective-target sse2 } */
> +
> +#include <string.h>
> +#include "sse2-check.h"
> +#include "pr106010-9a.c"
> +
> +static void
> +sse2_test (void)
> +{
> + _Complex double* pd_src = (_Complex double*) malloc (N * sizeof (_Complex double));
> + _Complex double* pd_dst = (_Complex double*) malloc (N * sizeof (_Complex double));
> + _Complex double* pd_src2 = (_Complex double*) malloc (N * sizeof (_Complex double));
> + _Complex double* pd_dst2 = (_Complex double*) malloc (N * sizeof (_Complex double));
> + cdf* cdf_src = (cdf*) malloc (N * sizeof (cdf));
> + cdf* cdf_dst = (cdf*) malloc (N * sizeof (cdf));
> + cdf2* cdf2_src = (cdf2*) malloc (N * sizeof (cdf2));
> + cdf2* cdf2_dst = (cdf2*) malloc (N * sizeof (cdf2));
> + cdf3* cdf3_src = (cdf3*) malloc (N * sizeof (cdf3));
> + cdf3* cdf3_dst = (cdf3*) malloc (N * sizeof (cdf3));
> + cdf4* cdf4_src = (cdf4*) malloc (N * sizeof (cdf4));
> + cdf4* cdf4_dst = (cdf4*) malloc (N * sizeof (cdf4));
> +
> + char* p_init = (char*) malloc (N * sizeof (cdf3));
> +
> + __builtin_memset (cdf_dst, 0, N * sizeof (cdf));
> + __builtin_memset (cdf2_dst, 0, N * sizeof (cdf2));
> + __builtin_memset (cdf3_dst, 0, N * sizeof (cdf3));
> + __builtin_memset (cdf4_dst, 0, N * sizeof (cdf4));
> + __builtin_memset (pd_dst, 0, N * sizeof (_Complex double));
> + __builtin_memset (pd_dst2, 0, N * sizeof (_Complex double));
> +
> + for (int i = 0; i != N * sizeof (cdf3); i++)
> + p_init[i] = i;
> +
> + memcpy (cdf_src, p_init, N * sizeof (cdf));
> + memcpy (cdf2_src, p_init, N * sizeof (cdf2));
> + memcpy (cdf3_src, p_init, N * sizeof (cdf3));
> + memcpy (cdf4_src, p_init, N * sizeof (cdf4));
> + memcpy (pd_src, p_init, N * sizeof (_Complex double));
> + for (int i = 0; i != 2 * N * sizeof (double); i++)
> + p_init[i] = i % 16;
> + memcpy (pd_src2, p_init, N * sizeof (_Complex double));
> +
> + foo (cdf_dst, cdf_src);
> + foo1 (cdf2_dst, cdf2_src);
> + foo2 (cdf3_dst, cdf3_src);
> + foo3 (cdf4_dst, cdf4_src);
> + foo4 (pd_dst, pd_src);
> + foo5 (pd_dst2, pd_src2);
> + for (int i = 0; i != N; i++)
> + {
> + p_init[(N - i - 1) * 16] = i * 16;
> + p_init[(N - i - 1) * 16 + 1] = i * 16 + 1;
> + p_init[(N - i - 1) * 16 + 2] = i * 16 + 2;
> + p_init[(N - i - 1) * 16 + 3] = i * 16 + 3;
> + p_init[(N - i - 1) * 16 + 4] = i * 16 + 4;
> + p_init[(N - i - 1) * 16 + 5] = i * 16 + 5;
> + p_init[(N - i - 1) * 16 + 6] = i * 16 + 6;
> + p_init[(N - i - 1) * 16 + 7] = i * 16 + 7;
> + p_init[(N - i - 1) * 16 + 8] = i * 16 + 8;
> + p_init[(N - i - 1) * 16 + 9] = i * 16 + 9;
> + p_init[(N - i - 1) * 16 + 10] = i * 16 + 10;
> + p_init[(N - i - 1) * 16 + 11] = i * 16 + 11;
> + p_init[(N - i - 1) * 16 + 12] = i * 16 + 12;
> + p_init[(N - i - 1) * 16 + 13] = i * 16 + 13;
> + p_init[(N - i - 1) * 16 + 14] = i * 16 + 14;
> + p_init[(N - i - 1) * 16 + 15] = i * 16 + 15;
> + }
> + memcpy (pd_src, p_init, N * 16);
> +
> + if (__builtin_memcmp (pd_dst, pd_src, N * 2 * sizeof (double)) != 0)
> + __builtin_abort ();
> +
> + if (__builtin_memcmp (pd_dst2, pd_src2, N * 2 * sizeof (double)) != 0)
> + __builtin_abort ();
> +
> + if (__builtin_memcmp (cdf_dst, cdf_src, N * sizeof (cdf)) != 0)
> + __builtin_abort ();
> +
> + if (__builtin_memcmp (cdf2_dst, cdf2_src, N * sizeof (cdf2)) != 0)
> + __builtin_abort ();
> +
> + if (__builtin_memcmp (cdf3_dst, cdf3_src, N * sizeof (cdf3)) != 0)
> + __builtin_abort ();
> +
> + if (__builtin_memcmp (cdf4_dst, cdf4_src, N * sizeof (cdf4)) != 0)
> + __builtin_abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-9c.c b/gcc/testsuite/gcc.target/i386/pr106010-9c.c
> new file mode 100644
> index 00000000000..ff51f6195b7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-9c.c
> @@ -0,0 +1,90 @@
> +/* { dg-do run } */
> +/* { dg-options "-O3 -mavx2 -fvect-cost-model=unlimited" } */
> +/* { dg-require-effective-target avx2 } */
> +
> +#include <string.h>
> +#include "avx2-check.h"
> +#include "pr106010-9a.c"
> +
> +static void
> +avx2_test (void)
> +{
> + _Complex double* pd_src = (_Complex double*) malloc (N * sizeof (_Complex double));
> + _Complex double* pd_dst = (_Complex double*) malloc (N * sizeof (_Complex double));
> + _Complex double* pd_src2 = (_Complex double*) malloc (N * sizeof (_Complex double));
> + _Complex double* pd_dst2 = (_Complex double*) malloc (N * sizeof (_Complex double));
> + cdf* cdf_src = (cdf*) malloc (N * sizeof (cdf));
> + cdf* cdf_dst = (cdf*) malloc (N * sizeof (cdf));
> + cdf2* cdf2_src = (cdf2*) malloc (N * sizeof (cdf2));
> + cdf2* cdf2_dst = (cdf2*) malloc (N * sizeof (cdf2));
> + cdf3* cdf3_src = (cdf3*) malloc (N * sizeof (cdf3));
> + cdf3* cdf3_dst = (cdf3*) malloc (N * sizeof (cdf3));
> + cdf4* cdf4_src = (cdf4*) malloc (N * sizeof (cdf4));
> + cdf4* cdf4_dst = (cdf4*) malloc (N * sizeof (cdf4));
> +
> + char* p_init = (char*) malloc (N * sizeof (cdf3));
> +
> + __builtin_memset (cdf_dst, 0, N * sizeof (cdf));
> + __builtin_memset (cdf2_dst, 0, N * sizeof (cdf2));
> + __builtin_memset (cdf3_dst, 0, N * sizeof (cdf3));
> + __builtin_memset (cdf4_dst, 0, N * sizeof (cdf4));
> + __builtin_memset (pd_dst, 0, N * sizeof (_Complex double));
> + __builtin_memset (pd_dst2, 0, N * sizeof (_Complex double));
> +
> + for (int i = 0; i != N * sizeof (cdf3); i++)
> + p_init[i] = i;
> +
> + memcpy (cdf_src, p_init, N * sizeof (cdf));
> + memcpy (cdf2_src, p_init, N * sizeof (cdf2));
> + memcpy (cdf3_src, p_init, N * sizeof (cdf3));
> + memcpy (cdf4_src, p_init, N * sizeof (cdf4));
> + memcpy (pd_src, p_init, N * sizeof (_Complex double));
> + for (int i = 0; i != 2 * N * sizeof (double); i++)
> + p_init[i] = i % 16;
> + memcpy (pd_src2, p_init, N * sizeof (_Complex double));
> +
> + foo (cdf_dst, cdf_src);
> + foo1 (cdf2_dst, cdf2_src);
> + foo2 (cdf3_dst, cdf3_src);
> + foo3 (cdf4_dst, cdf4_src);
> + foo4 (pd_dst, pd_src);
> + foo5 (pd_dst2, pd_src2);
> + for (int i = 0; i != N; i++)
> + {
> + p_init[(N - i - 1) * 16] = i * 16;
> + p_init[(N - i - 1) * 16 + 1] = i * 16 + 1;
> + p_init[(N - i - 1) * 16 + 2] = i * 16 + 2;
> + p_init[(N - i - 1) * 16 + 3] = i * 16 + 3;
> + p_init[(N - i - 1) * 16 + 4] = i * 16 + 4;
> + p_init[(N - i - 1) * 16 + 5] = i * 16 + 5;
> + p_init[(N - i - 1) * 16 + 6] = i * 16 + 6;
> + p_init[(N - i - 1) * 16 + 7] = i * 16 + 7;
> + p_init[(N - i - 1) * 16 + 8] = i * 16 + 8;
> + p_init[(N - i - 1) * 16 + 9] = i * 16 + 9;
> + p_init[(N - i - 1) * 16 + 10] = i * 16 + 10;
> + p_init[(N - i - 1) * 16 + 11] = i * 16 + 11;
> + p_init[(N - i - 1) * 16 + 12] = i * 16 + 12;
> + p_init[(N - i - 1) * 16 + 13] = i * 16 + 13;
> + p_init[(N - i - 1) * 16 + 14] = i * 16 + 14;
> + p_init[(N - i - 1) * 16 + 15] = i * 16 + 15;
> + }
> + memcpy (pd_src, p_init, N * 16);
> +
> + if (__builtin_memcmp (pd_dst, pd_src, N * 2 * sizeof (double)) != 0)
> + __builtin_abort ();
> +
> + if (__builtin_memcmp (pd_dst2, pd_src2, N * 2 * sizeof (double)) != 0)
> + __builtin_abort ();
> +
> + if (__builtin_memcmp (cdf_dst, cdf_src, N * sizeof (cdf)) != 0)
> + __builtin_abort ();
> +
> + if (__builtin_memcmp (cdf2_dst, cdf2_src, N * sizeof (cdf2)) != 0)
> + __builtin_abort ();
> +
> + if (__builtin_memcmp (cdf3_dst, cdf3_src, N * sizeof (cdf3)) != 0)
> + __builtin_abort ();
> +
> + if (__builtin_memcmp (cdf4_dst, cdf4_src, N * sizeof (cdf4)) != 0)
> + __builtin_abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr106010-9d.c b/gcc/testsuite/gcc.target/i386/pr106010-9d.c
> new file mode 100644
> index 00000000000..d4d8f1dd722
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106010-9d.c
> @@ -0,0 +1,92 @@
> +/* { dg-do run } */
> +/* { dg-options "-O3 -mavx512f -mavx512vl -fvect-cost-model=unlimited -mprefer-vector-width=512" } */
> +/* { dg-require-effective-target avx512f } */
> +
> +#include <string.h>
> +#include <stdlib.h>
> +#define AVX512F
> +#include "avx512-check.h"
> +#include "pr106010-9a.c"
> +
> +static void
> +test_512 (void)
> +{
> + _Complex double* pd_src = (_Complex double*) malloc (N * sizeof (_Complex double));
> + _Complex double* pd_dst = (_Complex double*) malloc (N * sizeof (_Complex double));
> + _Complex double* pd_src2 = (_Complex double*) malloc (N * sizeof (_Complex double));
> + _Complex double* pd_dst2 = (_Complex double*) malloc (N * sizeof (_Complex double));
> + cdf* cdf_src = (cdf*) malloc (N * sizeof (cdf));
> + cdf* cdf_dst = (cdf*) malloc (N * sizeof (cdf));
> + cdf2* cdf2_src = (cdf2*) malloc (N * sizeof (cdf2));
> + cdf2* cdf2_dst = (cdf2*) malloc (N * sizeof (cdf2));
> + cdf3* cdf3_src = (cdf3*) malloc (N * sizeof (cdf3));
> + cdf3* cdf3_dst = (cdf3*) malloc (N * sizeof (cdf3));
> + cdf4* cdf4_src = (cdf4*) malloc (N * sizeof (cdf4));
> + cdf4* cdf4_dst = (cdf4*) malloc (N * sizeof (cdf4));
> +
> + char* p_init = (char*) malloc (N * sizeof (cdf3));
> +
> + __builtin_memset (cdf_dst, 0, N * sizeof (cdf));
> + __builtin_memset (cdf2_dst, 0, N * sizeof (cdf2));
> + __builtin_memset (cdf3_dst, 0, N * sizeof (cdf3));
> + __builtin_memset (cdf4_dst, 0, N * sizeof (cdf4));
> + __builtin_memset (pd_dst, 0, N * sizeof (_Complex double));
> + __builtin_memset (pd_dst2, 0, N * sizeof (_Complex double));
> +
> + for (int i = 0; i != N * sizeof (cdf3); i++)
> + p_init[i] = i;
> +
> + memcpy (cdf_src, p_init, N * sizeof (cdf));
> + memcpy (cdf2_src, p_init, N * sizeof (cdf2));
> + memcpy (cdf3_src, p_init, N * sizeof (cdf3));
> + memcpy (cdf4_src, p_init, N * sizeof (cdf4));
> + memcpy (pd_src, p_init, N * sizeof (_Complex double));
> + for (int i = 0; i != 2 * N * sizeof (double); i++)
> + p_init[i] = i % 16;
> + memcpy (pd_src2, p_init, N * sizeof (_Complex double));
> +
> + foo (cdf_dst, cdf_src);
> + foo1 (cdf2_dst, cdf2_src);
> + foo2 (cdf3_dst, cdf3_src);
> + foo3 (cdf4_dst, cdf4_src);
> + foo4 (pd_dst, pd_src);
> + foo5 (pd_dst2, pd_src2);
> + for (int i = 0; i != N; i++)
> + {
> + p_init[(N - i - 1) * 16] = i * 16;
> + p_init[(N - i - 1) * 16 + 1] = i * 16 + 1;
> + p_init[(N - i - 1) * 16 + 2] = i * 16 + 2;
> + p_init[(N - i - 1) * 16 + 3] = i * 16 + 3;
> + p_init[(N - i - 1) * 16 + 4] = i * 16 + 4;
> + p_init[(N - i - 1) * 16 + 5] = i * 16 + 5;
> + p_init[(N - i - 1) * 16 + 6] = i * 16 + 6;
> + p_init[(N - i - 1) * 16 + 7] = i * 16 + 7;
> + p_init[(N - i - 1) * 16 + 8] = i * 16 + 8;
> + p_init[(N - i - 1) * 16 + 9] = i * 16 + 9;
> + p_init[(N - i - 1) * 16 + 10] = i * 16 + 10;
> + p_init[(N - i - 1) * 16 + 11] = i * 16 + 11;
> + p_init[(N - i - 1) * 16 + 12] = i * 16 + 12;
> + p_init[(N - i - 1) * 16 + 13] = i * 16 + 13;
> + p_init[(N - i - 1) * 16 + 14] = i * 16 + 14;
> + p_init[(N - i - 1) * 16 + 15] = i * 16 + 15;
> + }
> + memcpy (pd_src, p_init, N * 16);
> +
> + if (__builtin_memcmp (pd_dst, pd_src, N * 2 * sizeof (double)) != 0)
> + __builtin_abort ();
> +
> + if (__builtin_memcmp (pd_dst2, pd_src2, N * 2 * sizeof (double)) != 0)
> + __builtin_abort ();
> +
> + if (__builtin_memcmp (cdf_dst, cdf_src, N * sizeof (cdf)) != 0)
> + __builtin_abort ();
> +
> + if (__builtin_memcmp (cdf2_dst, cdf2_src, N * sizeof (cdf2)) != 0)
> + __builtin_abort ();
> +
> + if (__builtin_memcmp (cdf3_dst, cdf3_src, N * sizeof (cdf3)) != 0)
> + __builtin_abort ();
> +
> + if (__builtin_memcmp (cdf4_dst, cdf4_src, N * sizeof (cdf4)) != 0)
> + __builtin_abort ();
> +}
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index d20a10a1524..19567bb338a 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -1403,7 +1403,8 @@ vect_get_data_access_cost (vec_info *vinfo, dr_vec_info *dr_info,
> if (PURE_SLP_STMT (stmt_info))
> ncopies = 1;
> else
> - ncopies = vect_get_num_copies (loop_vinfo, STMT_VINFO_VECTYPE (stmt_info));
> + ncopies = vect_get_num_copies (loop_vinfo, STMT_VINFO_VECTYPE (stmt_info),
> + STMT_VINFO_COMPLEX_P (stmt_info));
>
> if (DR_IS_READ (dr_info->dr))
> vect_get_load_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
> @@ -4597,8 +4598,22 @@ vect_analyze_data_refs (vec_info *vinfo, poly_uint64 *min_vf, bool *fatal)
>
> /* Set vectype for STMT. */
> scalar_type = TREE_TYPE (DR_REF (dr));
> - tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
> - if (!vectype)
> + tree adjust_scalar_type = scalar_type;
> + /* Support complex type accesses. Note that gather/scatter is not
> + supported for complex loads/stores. */
> + if (TREE_CODE (scalar_type) == COMPLEX_TYPE
> + && gatherscatter == SG_NONE)
> + {
> + adjust_scalar_type = TREE_TYPE (scalar_type);
> + STMT_VINFO_COMPLEX_P (stmt_info) = true;
> + }
> + tree vectype = get_vectype_for_scalar_type (vinfo, adjust_scalar_type);
> + unsigned HOST_WIDE_INT constant_nunits;
> + if (!vectype
> + /* For complex type, V1DI doesn't make sense. */
> + || (STMT_VINFO_COMPLEX_P (stmt_info)
> + && (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&constant_nunits)
> + || constant_nunits == 1)))
> {
> if (dump_enabled_p ())
> {
> @@ -4635,8 +4650,11 @@ vect_analyze_data_refs (vec_info *vinfo, poly_uint64 *min_vf, bool *fatal)
> }
>
> /* Adjust the minimal vectorization factor according to the
> - vector type. */
> + vector type. Note for complex type, VF is half of
> + TYPE_VECTOR_SUBPARTS. */
> vf = TYPE_VECTOR_SUBPARTS (vectype);
> + if (STMT_VINFO_COMPLEX_P (stmt_info))
> + vf = exact_div (vf, 2);
> *min_vf = upper_bound (*min_vf, vf);
>
> /* Leave the BB vectorizer to pick the vector type later, based on
> @@ -6140,21 +6158,55 @@ vect_permute_load_chain (vec_info *vinfo, vec<tree> dr_chain,
> vec_perm_indices indices;
> for (k = 0; k < 3; k++)
> {
> - for (i = 0; i < nelt; i++)
> - if (3 * i + k < 2 * nelt)
> - sel[i] = 3 * i + k;
> - else
> - sel[i] = 0;
> - indices.new_vector (sel, 2, nelt);
> - perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices);
> + if (STMT_VINFO_COMPLEX_P (stmt_info))
> + {
> + for (i = 0; i < nelt / 2; i++)
> + if (6 * i + 2 * k + 1 < 2 * nelt)
> + {
> + sel[2 * i] = 6 * i + 2 * k;
> + sel[2 * i + 1] = 6 * i + 2 * k + 1;
> + }
> + else
> + {
> + sel[2 * i] = 0;
> + sel[2 * i + 1] = 0;
> + }
>
> - for (i = 0, j = 0; i < nelt; i++)
> - if (3 * i + k < 2 * nelt)
> - sel[i] = i;
> - else
> - sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++);
> - indices.new_vector (sel, 2, nelt);
> - perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices);
> + indices.new_vector (sel, 2, nelt);
> + perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices);
> +
> + for (i = 0, j = 0; i < nelt / 2; i++)
> + if (6 * i + 2 * k + 1 < 2 * nelt)
> + {
> + sel[2 * i] = 2 * i;
> + sel[2 * i + 1] = 2 * i + 1;
> + }
> + else
> + {
> + sel[2 * i] = nelt + ((nelt + 2 * k) % 6) + 6 * j;
> + sel[2 * i + 1] = nelt + ((nelt + 2 * k) % 6) + 6 * (j++) + 1;
> + }
> + indices.new_vector (sel, 2, nelt);
> + perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices);
> + }
> + else
> + {
> + for (i = 0; i < nelt; i++)
> + if (3 * i + k < 2 * nelt)
> + sel[i] = 3 * i + k;
> + else
> + sel[i] = 0;
> + indices.new_vector (sel, 2, nelt);
> + perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices);
> +
> + for (i = 0, j = 0; i < nelt; i++)
> + if (3 * i + k < 2 * nelt)
> + sel[i] = i;
> + else
> + sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++);
> + indices.new_vector (sel, 2, nelt);
> + perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices);
> + }
>
> first_vect = dr_chain[0];
> second_vect = dr_chain[1];
> @@ -6186,17 +6238,43 @@ vect_permute_load_chain (vec_info *vinfo, vec<tree> dr_chain,
>
> /* The encoding has a single stepped pattern. */
> poly_uint64 nelt = TYPE_VECTOR_SUBPARTS (vectype);
> - vec_perm_builder sel (nelt, 1, 3);
> - sel.quick_grow (3);
> - for (i = 0; i < 3; ++i)
> - sel[i] = i * 2;
> - vec_perm_indices indices (sel, 2, nelt);
> - perm_mask_even = vect_gen_perm_mask_checked (vectype, indices);
> + if (STMT_VINFO_COMPLEX_P (stmt_info))
> + {
> + vec_perm_builder sel;
> + unsigned neltc = nelt.to_constant ();
> + sel.new_vector (neltc, neltc, 1);
> + sel.quick_grow (neltc);
> + for (unsigned i = 0; i != neltc / 2; i++)
> + {
> + sel[2 * i] = i * 4;
> + sel[2 * i + 1] = i * 4 + 1;
> + }
> + vec_perm_indices indices (sel, 2, nelt);
> + perm_mask_even = vect_gen_perm_mask_checked (vectype, indices);
>
> - for (i = 0; i < 3; ++i)
> - sel[i] = i * 2 + 1;
> - indices.new_vector (sel, 2, nelt);
> - perm_mask_odd = vect_gen_perm_mask_checked (vectype, indices);
> + for (unsigned i = 0; i != neltc / 2; i++)
> + {
> + sel[2 * i] = i * 4 + 2;
> + sel[2 * i + 1] = i * 4 + 3;
> + }
> + indices.new_vector (sel, 2, nelt);
> + perm_mask_odd = vect_gen_perm_mask_checked (vectype, indices);
> + }
> + else
> + {
> + vec_perm_builder sel (nelt, 1, 3);
> + sel.quick_grow (3);
> + for (i = 0; i < 3; ++i)
> + sel[i] = i * 2;
> +
> + vec_perm_indices indices (sel, 2, nelt);
> + perm_mask_even = vect_gen_perm_mask_checked (vectype, indices);
> +
> + for (i = 0; i < 3; ++i)
> + sel[i] = i * 2 + 1;
> + indices.new_vector (sel, 2, nelt);
> + perm_mask_odd = vect_gen_perm_mask_checked (vectype, indices);
> + }
>
> for (i = 0; i < log_length; i++)
> {
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 3a70c15b593..365fa738022 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -200,7 +200,12 @@ vect_determine_vf_for_stmt_1 (vec_info *vinfo, stmt_vec_info stmt_info,
> }
>
> if (nunits_vectype)
> - vect_update_max_nunits (vf, nunits_vectype);
> + {
> + poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (nunits_vectype);
> + if (STMT_VINFO_COMPLEX_P (stmt_info))
> + nunits = exact_div (nunits, 2);
> + vect_update_max_nunits (vf, nunits);
> + }
>
> return opt_result::success ();
> }
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index dab5daddcc5..5d66ea2f286 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -877,10 +877,14 @@ vect_record_max_nunits (vec_info *vinfo, stmt_vec_info stmt_info,
> return false;
> }
>
> + poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> + if (STMT_VINFO_COMPLEX_P (stmt_info))
> + nunits = exact_div (nunits, 2);
> +
> /* If populating the vector type requires unrolling then fail
> before adjusting *max_nunits for basic-block vectorization. */
> if (is_a <bb_vec_info> (vinfo)
> - && !multiple_p (group_size, TYPE_VECTOR_SUBPARTS (vectype)))
> + && !multiple_p (group_size, nunits))
> {
> if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -891,7 +895,7 @@ vect_record_max_nunits (vec_info *vinfo, stmt_vec_info stmt_info,
> }
>
> /* In case of multiple types we need to detect the smallest type. */
> - vect_update_max_nunits (max_nunits, vectype);
> + vect_update_max_nunits (max_nunits, nunits);
> return true;
> }
>
> @@ -3720,22 +3724,54 @@ vect_optimize_slp (vec_info *vinfo)
> vect_attempt_slp_rearrange_stmts did. This allows us to be lazy
> when permuting constants and invariants keeping the permute
> bijective. */
> - auto_sbitmap load_index (SLP_TREE_LANES (node));
> - bitmap_clear (load_index);
> - for (unsigned j = 0; j < SLP_TREE_LANES (node); ++j)
> - bitmap_set_bit (load_index, SLP_TREE_LOAD_PERMUTATION (node)[j] - imin);
> - unsigned j;
> - for (j = 0; j < SLP_TREE_LANES (node); ++j)
> - if (!bitmap_bit_p (load_index, j))
> - break;
> - if (j != SLP_TREE_LANES (node))
> - continue;
> + /* Permutation of Complex type. */
> + if (STMT_VINFO_COMPLEX_P (dr_stmt))
> + {
> + auto_sbitmap load_index (SLP_TREE_LANES (node) * 2);
> + bitmap_clear (load_index);
> + for (unsigned j = 0; j < SLP_TREE_LANES (node); ++j)
> + {
> + unsigned bit = SLP_TREE_LOAD_PERMUTATION (node)[j] - imin;
> + bitmap_set_bit (load_index, 2 * bit);
> + bitmap_set_bit (load_index, 2 * bit + 1);
> + }
> + unsigned j;
> + for (j = 0; j < SLP_TREE_LANES (node) * 2; ++j)
> + if (!bitmap_bit_p (load_index, j))
> + break;
> + if (j != SLP_TREE_LANES (node) * 2)
> + continue;
>
> - vec<unsigned> perm = vNULL;
> - perm.safe_grow (SLP_TREE_LANES (node), true);
> - for (unsigned j = 0; j < SLP_TREE_LANES (node); ++j)
> - perm[j] = SLP_TREE_LOAD_PERMUTATION (node)[j] - imin;
> - perms.safe_push (perm);
> + vec<unsigned> perm = vNULL;
> + perm.safe_grow (SLP_TREE_LANES (node) * 2, true);
> + for (unsigned j = 0; j < SLP_TREE_LANES (node); ++j)
> + {
> + unsigned cidx = SLP_TREE_LOAD_PERMUTATION (node)[j] - imin;
> + perm[2 * j] = 2 * cidx;
> + perm[2 * j + 1] = 2 * cidx + 1;
> + }
> + perms.safe_push (perm);
> + }
> + else
> + {
> + auto_sbitmap load_index (SLP_TREE_LANES (node));
> + bitmap_clear (load_index);
> + for (unsigned j = 0; j < SLP_TREE_LANES (node); ++j)
> + bitmap_set_bit (load_index,
> + SLP_TREE_LOAD_PERMUTATION (node)[j] - imin);
> + unsigned j;
> + for (j = 0; j < SLP_TREE_LANES (node); ++j)
> + if (!bitmap_bit_p (load_index, j))
> + break;
> + if (j != SLP_TREE_LANES (node))
> + continue;
> +
> + vec<unsigned> perm = vNULL;
> + perm.safe_grow (SLP_TREE_LANES (node), true);
> + for (unsigned j = 0; j < SLP_TREE_LANES (node); ++j)
> + perm[j] = SLP_TREE_LOAD_PERMUTATION (node)[j] - imin;
> + perms.safe_push (perm);
> + }
> vertices[idx].perm_in = perms.length () - 1;
> vertices[idx].perm_out = perms.length () - 1;
> }
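
The lane-to-element expansion in the hunk above can be sketched in isolation. This is only an illustration in plain C, not vectorizer code: `expand_complex_perm` and its parameters are hypothetical names, and it assumes each complex lane L occupies scalar slots 2*L (real part) and 2*L + 1 (imaginary part), as the patch does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the patch: expand a permutation of
   NLANES complex lanes into a permutation of 2 * NLANES scalar elements.
   Lane L is assumed to occupy slots 2*L (real) and 2*L + 1 (imag).  */
static void
expand_complex_perm (const unsigned *lane_perm, size_t nlanes,
                     unsigned *elt_perm)
{
  assert (lane_perm != NULL && elt_perm != NULL);
  for (size_t j = 0; j < nlanes; ++j)
    {
      /* Both halves of a complex value move together.  */
      elt_perm[2 * j] = 2 * lane_perm[j];
      elt_perm[2 * j + 1] = 2 * lane_perm[j] + 1;
    }
}
```

With the lane permutation {1, 0} this fills the element permutation with {2, 3, 0, 1}, so a bijective lane permutation stays bijective after expansion.
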
> @@ -4518,6 +4554,12 @@ vect_slp_analyze_node_operations_1 (vec_info *vinfo, slp_tree node,
> vf = loop_vinfo->vectorization_factor;
> else
> vf = 1;
> + /* For complex type and SLP, double vf to get the right vectype,
> + e.g. vector(4) double for complex double with group size 2: double
> + vf to map vf * group_size to TYPE_VECTOR_SUBPARTS. */
> + if (STMT_VINFO_COMPLEX_P (stmt_info))
> + vf *= 2;
> +
> unsigned int group_size = SLP_TREE_LANES (node);
> tree vectype = SLP_TREE_VECTYPE (node);
> SLP_TREE_NUMBER_OF_VEC_STMTS (node)
> @@ -4763,10 +4805,17 @@ vect_slp_analyze_node_operations (vec_info *vinfo, slp_tree node,
> }
> unsigned group_size = SLP_TREE_LANES (child);
> poly_uint64 vf = 1;
> +
> if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
> vf = loop_vinfo->vectorization_factor;
> +
> + /* V2SF holds just 1 complex value, so multiply by 2
> + to get the real number of vectors. */
> + unsigned cp
> + = STMT_VINFO_COMPLEX_P (SLP_TREE_REPRESENTATIVE (node)) ? 2 : 1;
> +
> SLP_TREE_NUMBER_OF_VEC_STMTS (child)
> - = vect_get_num_vectors (vf * group_size, vector_type);
> + = vect_get_num_vectors (vf * group_size * cp, vector_type);
> /* And cost them. */
> vect_prologue_cost_for_slp (child, cost_vec);
> }
> @@ -6402,6 +6451,11 @@ vect_create_constant_vectors (vec_info *vinfo, slp_tree op_node)
>
> /* We always want SLP_TREE_VECTYPE (op_node) here correctly set. */
> vector_type = SLP_TREE_VECTYPE (op_node);
> + unsigned int cp = 1;
> + /* Handle Complex type vector init.
> + SLP_TREE_REPRESENTATIVE (op_node) could be NULL. */
> + if (TREE_CODE (TREE_TYPE (op_node->ops[0])) == COMPLEX_TYPE)
> + cp = 2;
>
> unsigned int number_of_vectors = SLP_TREE_NUMBER_OF_VEC_STMTS (op_node);
> SLP_TREE_VEC_DEFS (op_node).create (number_of_vectors);
> @@ -6426,9 +6480,9 @@ vect_create_constant_vectors (vec_info *vinfo, slp_tree op_node)
> /* When using duplicate_and_interleave, we just need one element for
> each scalar statement. */
> if (!TYPE_VECTOR_SUBPARTS (vector_type).is_constant (&nunits))
> - nunits = group_size;
> + nunits = group_size * cp;
>
> - number_of_copies = nunits * number_of_vectors / group_size;
> + number_of_copies = nunits * number_of_vectors / (group_size * cp);
>
> number_of_places_left_in_vector = nunits;
> constant_p = true;
> @@ -6460,8 +6514,23 @@ vect_create_constant_vectors (vec_info *vinfo, slp_tree op_node)
> gcc_unreachable ();
> }
> else
> - op = fold_unary (VIEW_CONVERT_EXPR,
> - TREE_TYPE (vector_type), op);
> + {
> + tree scalar_type = TREE_TYPE (vector_type);
> + /* For complex type, insert real and imag part
> + separately. */
> + if (cp == 2)
> + {
> + gcc_assert ((TREE_CODE (TREE_TYPE (op))
> + == COMPLEX_TYPE)
> + && (scalar_type
> + == TREE_TYPE (TREE_TYPE (op))));
> + elts[number_of_places_left_in_vector--]
> + = fold_unary (IMAGPART_EXPR, scalar_type, op);
> + op = fold_unary (REALPART_EXPR, scalar_type, op);
> + }
> + else
> + op = fold_unary (VIEW_CONVERT_EXPR, scalar_type, op);
> + }
> gcc_assert (op && CONSTANT_CLASS_P (op));
> }
> else
> @@ -6481,11 +6550,28 @@ vect_create_constant_vectors (vec_info *vinfo, slp_tree op_node)
> }
> else
> {
> - op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (vector_type),
> - op);
> - init_stmt
> - = gimple_build_assign (new_temp, VIEW_CONVERT_EXPR,
> - op);
> + tree scalar_type = TREE_TYPE (vector_type);
> + if (cp == 2)
> + {
> + gcc_assert ((TREE_CODE (TREE_TYPE (op))
> + == COMPLEX_TYPE)
> + && (scalar_type
> + == TREE_TYPE (TREE_TYPE (op))));
> + tree imag = build1 (IMAGPART_EXPR, scalar_type, op);
> + op = build1 (REALPART_EXPR, scalar_type, op);
> + tree imag_temp = make_ssa_name (scalar_type);
> + elts[number_of_places_left_in_vector--] = imag_temp;
> + init_stmt = gimple_build_assign (imag_temp, imag);
> + gimple_seq_add_stmt (&ctor_seq, init_stmt);
> + init_stmt = gimple_build_assign (new_temp, op);
> + }
> + else
> + {
> + op = build1 (VIEW_CONVERT_EXPR, scalar_type, op);
> + init_stmt
> + = gimple_build_assign (new_temp, VIEW_CONVERT_EXPR,
> + op);
> + }
> }
> gimple_seq_add_stmt (&ctor_seq, init_stmt);
> op = new_temp;
> @@ -6696,15 +6782,17 @@ vect_transform_slp_perm_load (vec_info *vinfo,
> unsigned int nelts_to_build;
> unsigned int nvectors_per_build;
> unsigned int in_nlanes;
> + unsigned int cp = STMT_VINFO_COMPLEX_P (stmt_info) ? 2 : 1;
> bool repeating_p = (group_size == DR_GROUP_SIZE (stmt_info)
> - && multiple_p (nunits, group_size));
> + && multiple_p (nunits, group_size * cp));
> if (repeating_p)
> {
> /* A single vector contains a whole number of copies of the node, so:
> (a) all permutes can use the same mask; and
> (b) the permutes only need a single vector input. */
> - mask.new_vector (nunits, group_size, 3);
> - nelts_to_build = mask.encoded_nelts ();
> + /* For complex type, the mask size is double nelts_to_build. */
> + mask.new_vector (nunits, group_size * cp, 3);
> + nelts_to_build = mask.encoded_nelts () / cp;
> nvectors_per_build = SLP_TREE_VEC_STMTS (node).length ();
> in_nlanes = DR_GROUP_SIZE (stmt_info) * 3;
> }
> @@ -6744,8 +6832,8 @@ vect_transform_slp_perm_load (vec_info *vinfo,
> {
> /* Enforced before the loop when !repeating_p. */
> unsigned int const_nunits = nunits.to_constant ();
> - vec_index = i / const_nunits;
> - mask_element = i % const_nunits;
> + vec_index = i / (const_nunits / cp);
> + mask_element = i % (const_nunits / cp);
> if (vec_index == first_vec_index
> || first_vec_index == -1)
> {
> @@ -6755,7 +6843,7 @@ vect_transform_slp_perm_load (vec_info *vinfo,
> || second_vec_index == -1)
> {
> second_vec_index = vec_index;
> - mask_element += const_nunits;
> + mask_element += (const_nunits / cp);
> }
> else
> {
> @@ -6768,14 +6856,24 @@ vect_transform_slp_perm_load (vec_info *vinfo,
> return false;
> }
>
> - gcc_assert (mask_element < 2 * const_nunits);
> + gcc_assert (mask_element < 2 * const_nunits / cp);
> }
>
> if (mask_element != index)
> noop_p = false;
> - mask[index++] = mask_element;
> + /* Set index for complex type,
> + i.e. a mask like [1, 0] is actually [2, 3, 0, 1]
> + for the vector's scalar type. */
> + if (cp == 2)
> + {
> + mask[2 * index] = 2 * mask_element;
> + mask[2 * index + 1] = 2 * mask_element + 1;
> + }
> + else
> + mask[index] = mask_element;
> + index++;
>
> - if (index == count && !noop_p)
> + if (index * cp == count && !noop_p)
> {
> indices.new_vector (mask, second_vec_index == -1 ? 1 : 2, nunits);
> if (!can_vec_perm_const_p (mode, mode, indices))
> @@ -6799,7 +6897,7 @@ vect_transform_slp_perm_load (vec_info *vinfo,
> ++*n_perms;
> }
>
> - if (index == count)
> + if (index * cp == count)
> {
> if (!analyze_only)
> {
> @@ -6869,7 +6967,7 @@ vect_transform_slp_perm_load (vec_info *vinfo,
> bool load_seen = false;
> for (unsigned i = 0; i < in_nlanes; ++i)
> {
> - if (i % const_nunits == 0)
> + if (i % (const_nunits * cp) == 0)
> {
> if (load_seen)
> *n_loads += 1;
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 72107afc883..d6223c28f1c 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -1397,25 +1397,70 @@ vect_init_vector (vec_info *vinfo, stmt_vec_info stmt_info, tree val, tree type,
> {
> gimple *init_stmt;
> tree new_temp;
> + tree scalar_type = TREE_TYPE (type);
> + gimple_seq stmts = NULL;
> +
> + if (TREE_CODE (TREE_TYPE (val)) == COMPLEX_TYPE)
> + {
> + unsigned HOST_WIDE_INT nunits;
> + gcc_assert (TYPE_VECTOR_SUBPARTS (type).is_constant (&nunits));
> +
> + tree_vector_builder elts (type, nunits, 1);
> + tree imag, real;
> + if (TREE_CODE (val) == COMPLEX_CST)
> + {
> + real = fold_unary (REALPART_EXPR, scalar_type, val);
> + imag = fold_unary (IMAGPART_EXPR, scalar_type, val);
> + }
> + else
> + {
> + real = make_ssa_name (scalar_type);
> + imag = make_ssa_name (scalar_type);
> + init_stmt
> + = gimple_build_assign (real,
> + build1 (REALPART_EXPR, scalar_type, val));
> + gimple_seq_add_stmt (&stmts, init_stmt);
> + init_stmt
> + = gimple_build_assign (imag,
> + build1 (IMAGPART_EXPR, scalar_type, val));
> + gimple_seq_add_stmt (&stmts, init_stmt);
> + }
>
> + /* Build vector as [real,imag,real,imag,...]. */
> + for (unsigned i = 0; i != nunits; i++)
> + {
> + if (i % 2)
> + elts.quick_push (imag);
> + else
> + elts.quick_push (real);
> + }
> + val = gimple_build_vector (&stmts, &elts);
> + if (!gimple_seq_empty_p (stmts))
> + {
> + if (gsi)
> + gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> + else
> + vinfo->insert_seq_on_entry (stmt_info, stmts);
> + }
> + }
> /* We abuse this function to push sth to a SSA name with initial 'val'. */
> - if (! useless_type_conversion_p (type, TREE_TYPE (val)))
> + else if (! useless_type_conversion_p (type, TREE_TYPE (val)))
> {
> gcc_assert (TREE_CODE (type) == VECTOR_TYPE);
> - if (! types_compatible_p (TREE_TYPE (type), TREE_TYPE (val)))
> + if (! types_compatible_p (scalar_type, TREE_TYPE (val)))
> {
> /* Scalar boolean value should be transformed into
> all zeros or all ones value before building a vector. */
> if (VECTOR_BOOLEAN_TYPE_P (type))
> {
> - tree true_val = build_all_ones_cst (TREE_TYPE (type));
> - tree false_val = build_zero_cst (TREE_TYPE (type));
> + tree true_val = build_all_ones_cst (scalar_type);
> + tree false_val = build_zero_cst (scalar_type);
>
> if (CONSTANT_CLASS_P (val))
> val = integer_zerop (val) ? false_val : true_val;
> else
> {
> - new_temp = make_ssa_name (TREE_TYPE (type));
> + new_temp = make_ssa_name (scalar_type);
> init_stmt = gimple_build_assign (new_temp, COND_EXPR,
> val, true_val, false_val);
> vect_init_vector_1 (vinfo, stmt_info, init_stmt, gsi);
> @@ -1424,14 +1469,13 @@ vect_init_vector (vec_info *vinfo, stmt_vec_info stmt_info, tree val, tree type,
> }
> else
> {
> - gimple_seq stmts = NULL;
> if (! INTEGRAL_TYPE_P (TREE_TYPE (val)))
> val = gimple_build (&stmts, VIEW_CONVERT_EXPR,
> - TREE_TYPE (type), val);
> + scalar_type, val);
> else
> /* ??? Condition vectorization expects us to do
> promotion of invariant/external defs. */
> - val = gimple_convert (&stmts, TREE_TYPE (type), val);
> + val = gimple_convert (&stmts, scalar_type, val);
> for (gimple_stmt_iterator gsi2 = gsi_start (stmts);
> !gsi_end_p (gsi2); )
> {
> @@ -1496,7 +1540,12 @@ vect_get_vec_defs_for_operand (vec_info *vinfo, stmt_vec_info stmt_vinfo,
> && VECTOR_BOOLEAN_TYPE_P (stmt_vectype))
> vector_type = truth_type_for (stmt_vectype);
> else
> - vector_type = get_vectype_for_scalar_type (loop_vinfo, TREE_TYPE (op));
> + {
> + tree scalar_type = TREE_TYPE (op);
> + if (STMT_VINFO_COMPLEX_P (stmt_vinfo))
> + scalar_type = TREE_TYPE (scalar_type);
> + vector_type = get_vectype_for_scalar_type (loop_vinfo, scalar_type);
> + }
>
> gcc_assert (vector_type);
> tree vop = vect_init_vector (vinfo, stmt_vinfo, op, vector_type, NULL);
> @@ -1892,6 +1941,13 @@ vect_truncate_gather_scatter_offset (stmt_vec_info stmt_info,
> return false;
> }
>
> + if (STMT_VINFO_COMPLEX_P (stmt_info))
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_NOTE, vect_location,
> + "Complex type doesn't support gather_scatter.\n");
> + return false;
> + }
> /* Get the number of bits in an element. */
> tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> scalar_mode element_mode = SCALAR_TYPE_MODE (TREE_TYPE (vectype));
> @@ -2022,6 +2078,30 @@ perm_mask_for_reverse (tree vectype)
> return vect_gen_perm_mask_checked (vectype, indices);
> }
>
> +static tree
> +perm_mask_for_reverse (tree vectype, bool complex_p)
> +{
> + if (!complex_p)
> + return perm_mask_for_reverse (vectype);
> +
> + unsigned HOST_WIDE_INT nunits;
> + gcc_assert (TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits));
> +
> + /* Every element is encoded explicitly (NUNITS patterns of 1). */
> + vec_perm_builder sel (nunits, nunits, 1);
> + for (unsigned i = 0; i < nunits; i += 2)
> + {
> + sel.quick_push (nunits - 2 - i);
> + sel.quick_push (nunits - 1 - i);
> + }
> +
> + vec_perm_indices indices (sel, 1, nunits);
> + if (!can_vec_perm_const_p (TYPE_MODE (vectype), TYPE_MODE (vectype),
> + indices))
> + return NULL_TREE;
> + return vect_gen_perm_mask_checked (vectype, indices);
> +}
> +
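
To make the selector built by the new overload concrete, here is a minimal stand-alone sketch of the same index computation. `complex_reverse_mask` is a hypothetical name and the function works on a plain array rather than a `vec_perm_builder`; it assumes an even, constant number of elements.

```c
#include <assert.h>

/* Hypothetical helper, not part of the patch: fill SEL with the indices
   that reverse the order of (real, imag) pairs in a vector of NUNITS
   scalar elements while keeping each pair in its original order.  */
static void
complex_reverse_mask (unsigned nunits, unsigned *sel)
{
  assert (nunits % 2 == 0 && sel != 0);
  for (unsigned i = 0; i < nunits; i += 2)
    {
      sel[i] = nunits - 2 - i;      /* real part of the mirrored pair */
      sel[i + 1] = nunits - 1 - i;  /* imag part of the mirrored pair */
    }
}
```

For nunits == 4 this yields {2, 3, 0, 1}, whereas the element-wise reverse used for non-complex types would be {3, 2, 1, 0} and would swap real and imaginary parts.
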
> /* A subroutine of get_load_store_type, with a subset of the same
> arguments. Handle the case where STMT_INFO is a load or store that
> accesses consecutive elements with a negative step. Sets *POFFSET
> @@ -2045,8 +2125,12 @@ get_negative_load_store_type (vec_info *vinfo,
> }
>
> /* For backward running DRs the first access in vectype actually is
> - N-1 elements before the address of the DR. */
> - *poffset = ((-TYPE_VECTOR_SUBPARTS (vectype) + 1)
> + N-1 elements before the address of the DR.
> + For complex type, it's N-2. */
> + unsigned cp = 1;
> + if (STMT_VINFO_COMPLEX_P (stmt_info))
> + cp = 2;
> + *poffset = ((-TYPE_VECTOR_SUBPARTS (vectype) + cp)
> * TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype))));
>
> int misalignment = dr_misalignment (dr_info, vectype, *poffset);
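
The offset arithmetic above can be checked with a small stand-alone sketch. `backward_access_offset` is a hypothetical name; it assumes a constant number of subparts and takes the scalar element size in bytes, mirroring `(-TYPE_VECTOR_SUBPARTS + cp) * TYPE_SIZE_UNIT`.

```c
#include <assert.h>

/* Hypothetical helper, not part of the patch: byte offset of the first
   element accessed by a backward-running vector access, relative to the
   address of the DR.  NUNITS counts scalar elements in the vector and
   ELT_SIZE is the size in bytes of one scalar element.  */
static long
backward_access_offset (unsigned nunits, unsigned elt_size, int complex_p)
{
  /* A complex value spans two scalar elements, so the DR already
     covers 2 of the NUNITS elements instead of 1.  */
  unsigned cp = complex_p ? 2 : 1;
  assert (nunits >= cp && elt_size > 0);
  return ((long) cp - (long) nunits) * (long) elt_size;
}
```

For vector(4) double this gives -24 bytes in the ordinary case and -16 bytes for complex double, i.e. one complex value (two doubles) before the DR rather than three doubles.
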
> @@ -2071,7 +2155,7 @@ get_negative_load_store_type (vec_info *vinfo,
> return VMAT_CONTIGUOUS_DOWN;
> }
>
> - if (!perm_mask_for_reverse (vectype))
> + if (!perm_mask_for_reverse (vectype, STMT_VINFO_COMPLEX_P (stmt_info)))
> {
> if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -2188,6 +2272,8 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
> && !DR_GROUP_NEXT_ELEMENT (stmt_info));
> unsigned HOST_WIDE_INT gap = DR_GROUP_GAP (first_stmt_info);
> poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> + if (STMT_VINFO_COMPLEX_P (stmt_info))
> + nunits = exact_div (nunits, 2);
>
> /* True if the vectorized statements would access beyond the last
> statement in the group. */
> @@ -2352,7 +2438,11 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
> {
> /* First cope with the degenerate case of a single-element
> vector. */
> - if (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 1U))
> + poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> + if (STMT_VINFO_COMPLEX_P (stmt_info))
> + nunits = exact_div (nunits, 2);
> +
> + if (known_eq (nunits, 1U))
> ;
>
> /* Otherwise try using LOAD/STORE_LANES. */
> @@ -2361,6 +2451,8 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
> : vect_store_lanes_supported (vectype, group_size,
> masked_p))
> {
> + if (STMT_VINFO_COMPLEX_P (stmt_info))
> + return false;
> *memory_access_type = VMAT_LOAD_STORE_LANES;
> overrun_p = would_overrun_p;
> }
> @@ -2620,6 +2712,14 @@ vect_check_scalar_mask (vec_info *vinfo, stmt_vec_info stmt_info,
> return false;
> }
>
> + if (STMT_VINFO_COMPLEX_P (stmt_info))
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "Complex type doesn't support mask argument.\n");
> + return false;
> + }
> +
> if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (*mask)))
> {
> if (dump_enabled_p ())
> @@ -7509,8 +7609,17 @@ vectorizable_store (vec_info *vinfo,
> same location twice. */
> gcc_assert (slp == PURE_SLP_STMT (stmt_info));
>
> + if (!STMT_VINFO_DATA_REF (stmt_info))
> + return false;
> +
> tree vectype = STMT_VINFO_VECTYPE (stmt_info), rhs_vectype = NULL_TREE;
> poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> + if (STMT_VINFO_COMPLEX_P (stmt_info))
> + {
> + if (!nunits.is_constant ())
> + return false;
> + nunits = exact_div (nunits, 2);
> + }
>
> if (loop_vinfo)
> {
> @@ -7526,7 +7635,8 @@ vectorizable_store (vec_info *vinfo,
> if (slp)
> ncopies = 1;
> else
> - ncopies = vect_get_num_copies (loop_vinfo, vectype);
> + ncopies = vect_get_num_copies (loop_vinfo, vectype,
> + STMT_VINFO_COMPLEX_P (stmt_info));
>
> gcc_assert (ncopies >= 1);
>
> @@ -7544,11 +7654,10 @@ vectorizable_store (vec_info *vinfo,
> return false;
>
> elem_type = TREE_TYPE (vectype);
> + if (STMT_VINFO_COMPLEX_P (stmt_info))
> + elem_type = build_complex_type (elem_type);
> vec_mode = TYPE_MODE (vectype);
>
> - if (!STMT_VINFO_DATA_REF (stmt_info))
> - return false;
> -
> vect_memory_access_type memory_access_type;
> enum dr_alignment_support alignment_support_scheme;
> int misalignment;
> @@ -7951,21 +8060,31 @@ vectorizable_store (vec_info *vinfo,
> tree lvectype = vectype;
> if (slp)
> {
> + scalar_mode elmode;
> if (group_size < const_nunits
> && const_nunits % group_size == 0)
> {
> nstores = const_nunits / group_size;
> - lnel = group_size;
> - ltype = build_vector_type (elem_type, group_size);
> + if (STMT_VINFO_COMPLEX_P (stmt_info))
> + {
> + lnel = group_size * 2;
> + ltype = build_vector_type (TREE_TYPE (elem_type), group_size * 2);
> + elmode = SCALAR_TYPE_MODE (TREE_TYPE (elem_type));
> + }
> + else
> + {
> + ltype = build_vector_type (elem_type, group_size);
> + lnel = group_size;
> + elmode = SCALAR_TYPE_MODE (elem_type);
> + }
> lvectype = vectype;
>
> /* First check if vec_extract optab doesn't support extraction
> of vector elts directly. */
> - scalar_mode elmode = SCALAR_TYPE_MODE (elem_type);
> machine_mode vmode;
> if (!VECTOR_MODE_P (TYPE_MODE (vectype))
> || !related_vector_mode (TYPE_MODE (vectype), elmode,
> - group_size).exists (&vmode)
> + lnel).exists (&vmode)
> || (convert_optab_handler (vec_extract_optab,
> TYPE_MODE (vectype), vmode)
> == CODE_FOR_nothing))
> @@ -8051,6 +8170,8 @@ vectorizable_store (vec_info *vinfo,
> unsigned int group_el = 0;
> unsigned HOST_WIDE_INT
> elsz = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
> + if (STMT_VINFO_COMPLEX_P (stmt_info))
> + elsz *= 2;
> for (j = 0; j < ncopies; j++)
> {
> vec_oprnd = vec_oprnds[j];
> @@ -8448,7 +8569,9 @@ vectorizable_store (vec_info *vinfo,
>
> if (memory_access_type == VMAT_CONTIGUOUS_REVERSE)
> {
> - tree perm_mask = perm_mask_for_reverse (vectype);
> + tree perm_mask
> + = perm_mask_for_reverse (vectype,
> + STMT_VINFO_COMPLEX_P (stmt_info));
> tree perm_dest = vect_create_destination_var
> (vect_get_store_rhs (stmt_info), vectype);
> tree new_temp = make_ssa_name (perm_dest);
> @@ -8778,6 +8901,12 @@ vectorizable_load (vec_info *vinfo,
>
> tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> + if (STMT_VINFO_COMPLEX_P (stmt_info))
> + {
> + if (!nunits.is_constant ())
> + return false;
> + nunits = exact_div (nunits, 2);
> + }
>
> if (loop_vinfo)
> {
> @@ -8794,7 +8923,8 @@ vectorizable_load (vec_info *vinfo,
> if (slp)
> ncopies = 1;
> else
> - ncopies = vect_get_num_copies (loop_vinfo, vectype);
> + ncopies = vect_get_num_copies (loop_vinfo, vectype,
> + STMT_VINFO_COMPLEX_P (stmt_info));
>
> gcc_assert (ncopies >= 1);
>
> @@ -8822,6 +8952,8 @@ vectorizable_load (vec_info *vinfo,
> }
>
> elem_type = TREE_TYPE (vectype);
> + if (STMT_VINFO_COMPLEX_P (stmt_info))
> + elem_type = build_complex_type (elem_type);
> mode = TYPE_MODE (vectype);
>
> /* FORNOW. In some cases can vectorize even if data-type not supported
> @@ -8870,8 +9002,11 @@ vectorizable_load (vec_info *vinfo,
> if (k > maxk)
> maxk = k;
> tree vectype = SLP_TREE_VECTYPE (slp_node);
> + /* For complex type, halve the nunits. */
> if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits)
> - || maxk >= (DR_GROUP_SIZE (group_info) & ~(nunits - 1)))
> + || maxk >= (DR_GROUP_SIZE (group_info)
> + & ~((STMT_VINFO_COMPLEX_P (group_info)
> + ? nunits >> 1 : nunits) - 1)))
> {
> if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -9098,9 +9233,10 @@ vectorizable_load (vec_info *vinfo,
> }
> else
> {
> + unsigned cp = STMT_VINFO_COMPLEX_P (stmt_info) ? 2 : 1;
> if (grouped_load)
> cst_offset
> - = (tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (vectype)))
> + = (tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (vectype))) * cp
> * vect_get_place_in_interleaving_chain (stmt_info,
> first_stmt_info));
> group_size = 1;
> @@ -9150,6 +9286,8 @@ vectorizable_load (vec_info *vinfo,
> int nloads = const_nunits;
> int lnel = 1;
> tree ltype = TREE_TYPE (vectype);
> + if (STMT_VINFO_COMPLEX_P (stmt_info))
> + ltype = build_complex_type (ltype);
> tree lvectype = vectype;
> auto_vec<tree> dr_chain;
> if (memory_access_type == VMAT_STRIDED_SLP)
> @@ -10080,7 +10218,9 @@ vectorizable_load (vec_info *vinfo,
>
> if (memory_access_type == VMAT_CONTIGUOUS_REVERSE)
> {
> - tree perm_mask = perm_mask_for_reverse (vectype);
> + tree perm_mask
> + = perm_mask_for_reverse (vectype,
> + STMT_VINFO_COMPLEX_P (stmt_info));
> new_temp = permute_vec_elements (vinfo, new_temp, new_temp,
> perm_mask, stmt_info, gsi);
> new_stmt = SSA_NAME_DEF_STMT (new_temp);
> @@ -12499,12 +12639,27 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
> dump_printf_loc (MSG_NOTE, vect_location,
> "get vectype for scalar type: %T\n", scalar_type);
> }
> +
> + tree orig_scalar_type = scalar_type;
> + if (TREE_CODE (scalar_type) == COMPLEX_TYPE)
> + {
> + /* Set complex_p for BB vectorizer. */
> + STMT_VINFO_COMPLEX_P (stmt_info) = true;
> + scalar_type = TREE_TYPE (scalar_type);
> + /* Double group_size for the BB vectorizer so that the following
> + 2 get_vectype_for_scalar_type calls return the wanted vectype.
> + The real group size is not changed; only the "faked" input
> + group_size is. */
> + group_size *= 2;
> + }
> vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
> - if (!vectype)
> + if (!vectype
> + || (STMT_VINFO_COMPLEX_P (stmt_info)
> + && !TYPE_VECTOR_SUBPARTS (vectype).is_constant ()))
> return opt_result::failure_at (stmt,
> "not vectorized:"
> " unsupported data-type %T\n",
> - scalar_type);
> + orig_scalar_type);
>
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
> @@ -12529,16 +12684,30 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
> TREE_TYPE (vectype));
> if (scalar_type != TREE_TYPE (vectype))
> {
> - if (dump_enabled_p ())
> + tree orig_scalar_type = scalar_type;
> + if (TREE_CODE (scalar_type) == COMPLEX_TYPE)
> + {
> + /* Set complex_p for Loop vectorizer. */
> + STMT_VINFO_COMPLEX_P (stmt_info) = true;
> + scalar_type = TREE_TYPE (scalar_type);
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_NOTE, vect_location,
> + "get complex for smallest scalar type: %T\n",
> + scalar_type);
> +
> + }
> + else if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
> "get vectype for smallest scalar type: %T\n",
> scalar_type);
> nunits_vectype = get_vectype_for_scalar_type (vinfo, scalar_type,
> group_size);
> - if (!nunits_vectype)
> + if (!nunits_vectype
> + || (STMT_VINFO_COMPLEX_P (stmt_info)
> + && !TYPE_VECTOR_SUBPARTS (nunits_vectype).is_constant ()))
> return opt_result::failure_at
> (stmt, "not vectorized: unsupported data-type %T\n",
> - scalar_type);
> + orig_scalar_type);
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location, "nunits vectype: %T\n",
> nunits_vectype);
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index e5fdc9e0a14..4a809e492c4 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -1161,6 +1161,9 @@ public:
> vectorization. */
> bool vectorizable;
>
> + /* True if the scalar type of the LHS of this statement is complex. */
> + bool complex_p;
> +
> /* The stmt to which this info struct refers to. */
> gimple *stmt;
>
> @@ -1395,6 +1398,7 @@ struct gather_scatter_info {
> #define STMT_VINFO_REDUC_EPILOGUE_ADJUSTMENT(S) (S)->reduc_epilogue_adjustment
> #define STMT_VINFO_REDUC_IDX(S) (S)->reduc_idx
> #define STMT_VINFO_FORCE_SINGLE_CYCLE(S) (S)->force_single_cycle
> +#define STMT_VINFO_COMPLEX_P(S) (S)->complex_p
>
> #define STMT_VINFO_DR_WRT_VEC_LOOP(S) (S)->dr_wrt_vec_loop
> #define STMT_VINFO_DR_BASE_ADDRESS(S) (S)->dr_wrt_vec_loop.base_address
> @@ -1970,6 +1974,15 @@ vect_get_num_copies (loop_vec_info loop_vinfo, tree vectype)
> return vect_get_num_vectors (LOOP_VINFO_VECT_FACTOR (loop_vinfo), vectype);
> }
>
> +static inline unsigned int
> +vect_get_num_copies (loop_vec_info loop_vinfo, tree vectype, bool complex_p)
> +{
> + poly_uint64 nunits = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> + if (complex_p)
> + nunits *= 2;
> + return vect_get_num_vectors (nunits, vectype);
> +}
> +
> /* Update maximum unit count *MAX_NUNITS so that it accounts for
> NUNITS. *MAX_NUNITS can be 1 if we haven't yet recorded anything. */
>
> --
> 2.18.1
>
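
As a sanity check on how the halved VF and the doubled element count fit together, the bookkeeping in the new vect_get_num_copies overload can be sketched with plain integers. `num_copies` is a hypothetical name; it assumes constant values and that the element count divides evenly, which is what vect_get_num_vectors requires.

```c
#include <assert.h>

/* Hypothetical helper, not part of the patch: number of vector stmts
   needed per scalar stmt.  VF counts scalar iterations, SUBPARTS is the
   number of scalar elements in the vector type.  A complex value takes
   two of those elements, so the element count per VF is doubled.  */
static unsigned
num_copies (unsigned vf, unsigned subparts, int complex_p)
{
  unsigned nelts = complex_p ? vf * 2 : vf;
  assert (subparts != 0 && nelts % subparts == 0);
  return nelts / subparts;
}
```

Assuming the complex double copy loop is vectorized with vector(4) double, the VF recorded by vect_determine_vf_for_stmt_1 would be 4 / 2 = 2, and num_copies (2, 4, 1) gives 1 vector statement per scalar statement. Without the doubling, 2 would not be a multiple of 4, which matches the ICE mentioned in the ChangeLog.
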
new file mode 100644
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-vect-details -mprefer-vector-width=256" } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 6 "vect" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(4\) double>} 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(8\) float>} 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(4\) long long int>} 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(8\) int>} 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(16\) short int>} 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(32\) char>} 2 "vect" } } */
+
+#define N 10000
+void
+__attribute__((noipa))
+foo_pd (_Complex double* a, _Complex double* b)
+{
+ for (int i = 0; i != N; i++)
+ a[i] = b[i];
+}
+
+void
+__attribute__((noipa))
+foo_ps (_Complex float* a, _Complex float* b)
+{
+ for (int i = 0; i != N; i++)
+ a[i] = b[i];
+}
+
+void
+__attribute__((noipa))
+foo_epi64 (_Complex long long* a, _Complex long long* b)
+{
+ for (int i = 0; i != N; i++)
+ a[i] = b[i];
+}
+
+void
+__attribute__((noipa))
+foo_epi32 (_Complex int* a, _Complex int* b)
+{
+ for (int i = 0; i != N; i++)
+ a[i] = b[i];
+}
+
+void
+__attribute__((noipa))
+foo_epi16 (_Complex short* a, _Complex short* b)
+{
+ for (int i = 0; i != N; i++)
+ a[i] = b[i];
+}
+
+void
+__attribute__((noipa))
+foo_epi8 (_Complex char* a, _Complex char* b)
+{
+ for (int i = 0; i != N; i++)
+ a[i] = b[i];
+}
new file mode 100644
@@ -0,0 +1,63 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */
+/* { dg-require-effective-target avx } */
+
+#include "avx-check.h"
+#include <string.h>
+#include "pr106010-1a.c"
+
+void
+avx_test (void)
+{
+ _Complex double* pd_src = (_Complex double*) malloc (2 * N * sizeof (double));
+ _Complex double* pd_dst = (_Complex double*) malloc (2 * N * sizeof (double));
+ _Complex float* ps_src = (_Complex float*) malloc (2 * N * sizeof (float));
+ _Complex float* ps_dst = (_Complex float*) malloc (2 * N * sizeof (float));
+ _Complex long long* epi64_src = (_Complex long long*) malloc (2 * N * sizeof (long long));
+ _Complex long long* epi64_dst = (_Complex long long*) malloc (2 * N * sizeof (long long));
+ _Complex int* epi32_src = (_Complex int*) malloc (2 * N * sizeof (int));
+ _Complex int* epi32_dst = (_Complex int*) malloc (2 * N * sizeof (int));
+ _Complex short* epi16_src = (_Complex short*) malloc (2 * N * sizeof (short));
+ _Complex short* epi16_dst = (_Complex short*) malloc (2 * N * sizeof (short));
+ _Complex char* epi8_src = (_Complex char*) malloc (2 * N * sizeof (char));
+ _Complex char* epi8_dst = (_Complex char*) malloc (2 * N * sizeof (char));
+ char* p_init = (char*) malloc (2 * N * sizeof (double));
+
+ __builtin_memset (pd_dst, 0, 2 * N * sizeof (double));
+ __builtin_memset (ps_dst, 0, 2 * N * sizeof (float));
+ __builtin_memset (epi64_dst, 0, 2 * N * sizeof (long long));
+ __builtin_memset (epi32_dst, 0, 2 * N * sizeof (int));
+ __builtin_memset (epi16_dst, 0, 2 * N * sizeof (short));
+ __builtin_memset (epi8_dst, 0, 2 * N * sizeof (char));
+
+ for (int i = 0; i != 2 * N * sizeof (double); i++)
+ p_init[i] = i;
+
+ memcpy (pd_src, p_init, 2 * N * sizeof (double));
+ memcpy (ps_src, p_init, 2 * N * sizeof (float));
+ memcpy (epi64_src, p_init, 2 * N * sizeof (long long));
+ memcpy (epi32_src, p_init, 2 * N * sizeof (int));
+ memcpy (epi16_src, p_init, 2 * N * sizeof (short));
+ memcpy (epi8_src, p_init, 2 * N * sizeof (char));
+
+ foo_pd (pd_dst, pd_src);
+ foo_ps (ps_dst, ps_src);
+ foo_epi64 (epi64_dst, epi64_src);
+ foo_epi32 (epi32_dst, epi32_src);
+ foo_epi16 (epi16_dst, epi16_src);
+ foo_epi8 (epi8_dst, epi8_src);
+ if (__builtin_memcmp (pd_dst, pd_src, N * 2 * sizeof (double)) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (ps_dst, ps_src, N * 2 * sizeof (float)) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi64_dst, epi64_src, N * 2 * sizeof (long long)) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi32_dst, epi32_src, N * 2 * sizeof (int)) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi16_dst, epi16_src, N * 2 * sizeof (short)) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi8_dst, epi8_src, N * 2 * sizeof (char)) != 0)
+ __builtin_abort ();
+
+ return;
+}
new file mode 100644
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256 -fdump-tree-vect-details" } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(16\) _Float16>} 2 "vect" } } */
+/* { dg-require-effective-target avx512fp16 } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+
+#define N 10000
+
+void
+__attribute__((noipa))
+foo_ph (_Complex _Float16* a, _Complex _Float16* b)
+{
+ for (int i = 0; i != N; i++)
+ a[i] = b[i];
+}
+
+static void
+do_test (void)
+{
+ _Complex _Float16* ph_src = (_Complex _Float16*) malloc (2 * N * sizeof (_Float16));
+ _Complex _Float16* ph_dst = (_Complex _Float16*) malloc (2 * N * sizeof (_Float16));
+ char* p_init = (char*) malloc (2 * N * sizeof (_Float16));
+
+ __builtin_memset (ph_dst, 0, 2 * N * sizeof (_Float16));
+
+ for (int i = 0; i != 2 * N * sizeof (_Float16); i++)
+ p_init[i] = i;
+
+ memcpy (ph_src, p_init, 2 * N * sizeof (_Float16));
+
+ foo_ph (ph_dst, ph_src);
+ if (__builtin_memcmp (ph_dst, ph_src, N * 2 * sizeof (_Float16)) != 0)
+ __builtin_abort ();
+}
new file mode 100644
@@ -0,0 +1,82 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-slp-details -mprefer-vector-width=256" } */
+/* { dg-final { scan-tree-dump-times "basic block part vectorized using (?:32|64) byte vectors" 6 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(4\) double>} 2 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(8\) float>} 2 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(4\) long long int>} 2 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(8\) int>} 2 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(16\) short int>} 2 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(32\) char>} 2 "slp2" } } */
+
+void
+__attribute__((noipa))
+foo_pd (_Complex double* a, _Complex double* __restrict b)
+{
+ a[0] = b[0];
+ a[1] = b[1];
+}
+
+void
+__attribute__((noipa))
+foo_ps (_Complex float* a, _Complex float* __restrict b)
+{
+ a[0] = b[0];
+ a[1] = b[1];
+ a[2] = b[2];
+ a[3] = b[3];
+
+}
+
+void
+__attribute__((noipa))
+foo_epi64 (_Complex long long* a, _Complex long long* __restrict b)
+{
+ a[0] = b[0];
+ a[1] = b[1];
+}
+
+void
+__attribute__((noipa))
+foo_epi32 (_Complex int* a, _Complex int* __restrict b)
+{
+ a[0] = b[0];
+ a[1] = b[1];
+ a[2] = b[2];
+ a[3] = b[3];
+}
+
+void
+__attribute__((noipa))
+foo_epi16 (_Complex short* a, _Complex short* __restrict b)
+{
+ a[0] = b[0];
+ a[1] = b[1];
+ a[2] = b[2];
+ a[3] = b[3];
+ a[4] = b[4];
+ a[5] = b[5];
+ a[6] = b[6];
+ a[7] = b[7];
+}
+
+void
+__attribute__((noipa))
+foo_epi8 (_Complex char* a, _Complex char* __restrict b)
+{
+ a[0] = b[0];
+ a[1] = b[1];
+ a[2] = b[2];
+ a[3] = b[3];
+ a[4] = b[4];
+ a[5] = b[5];
+ a[6] = b[6];
+ a[7] = b[7];
+ a[8] = b[8];
+ a[9] = b[9];
+ a[10] = b[10];
+ a[11] = b[11];
+ a[12] = b[12];
+ a[13] = b[13];
+ a[14] = b[14];
+ a[15] = b[15];
+}
new file mode 100644
@@ -0,0 +1,62 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */
+/* { dg-require-effective-target avx } */
+
+#include "avx-check.h"
+#include <string.h>
+#include "pr106010-2a.c"
+
+void
+avx_test (void)
+{
+ _Complex double* pd_src = (_Complex double*) malloc (32);
+ _Complex double* pd_dst = (_Complex double*) malloc (32);
+ _Complex float* ps_src = (_Complex float*) malloc (32);
+ _Complex float* ps_dst = (_Complex float*) malloc (32);
+ _Complex long long* epi64_src = (_Complex long long*) malloc (32);
+ _Complex long long* epi64_dst = (_Complex long long*) malloc (32);
+ _Complex int* epi32_src = (_Complex int*) malloc (32);
+ _Complex int* epi32_dst = (_Complex int*) malloc (32);
+ _Complex short* epi16_src = (_Complex short*) malloc (32);
+ _Complex short* epi16_dst = (_Complex short*) malloc (32);
+ _Complex char* epi8_src = (_Complex char*) malloc (32);
+ _Complex char* epi8_dst = (_Complex char*) malloc (32);
+ char* p = (char* ) malloc (32);
+
+ __builtin_memset (pd_dst, 0, 32);
+ __builtin_memset (ps_dst, 0, 32);
+ __builtin_memset (epi64_dst, 0, 32);
+ __builtin_memset (epi32_dst, 0, 32);
+ __builtin_memset (epi16_dst, 0, 32);
+ __builtin_memset (epi8_dst, 0, 32);
+
+ for (int i = 0; i != 32; i++)
+ p[i] = i;
+ __builtin_memcpy (pd_src, p, 32);
+ __builtin_memcpy (ps_src, p, 32);
+ __builtin_memcpy (epi64_src, p, 32);
+ __builtin_memcpy (epi32_src, p, 32);
+ __builtin_memcpy (epi16_src, p, 32);
+ __builtin_memcpy (epi8_src, p, 32);
+
+ foo_pd (pd_dst, pd_src);
+ foo_ps (ps_dst, ps_src);
+ foo_epi64 (epi64_dst, epi64_src);
+ foo_epi32 (epi32_dst, epi32_src);
+ foo_epi16 (epi16_dst, epi16_src);
+ foo_epi8 (epi8_dst, epi8_src);
+ if (__builtin_memcmp (pd_dst, pd_src, 32) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (ps_dst, ps_src, 32) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi64_dst, epi64_src, 32) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi32_dst, epi32_src, 32) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi16_dst, epi16_src, 32) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi8_dst, epi8_src, 32) != 0)
+ __builtin_abort ();
+
+ return;
+}
new file mode 100644
@@ -0,0 +1,47 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256 -fdump-tree-slp-details" } */
+/* { dg-require-effective-target avx512fp16 } */
+
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(16\) _Float16>} 2 "slp2" } } */
+/* { dg-final { scan-tree-dump-times "basic block part vectorized using (?:32|64) byte vectors" 1 "slp2" } } */
+
+#include <string.h>
+
+static void do_test (void);
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+
+void
+__attribute__((noipa))
+foo_ph (_Complex _Float16* a, _Complex _Float16* __restrict b)
+{
+ a[0] = b[0];
+ a[1] = b[1];
+ a[2] = b[2];
+ a[3] = b[3];
+ a[4] = b[4];
+ a[5] = b[5];
+ a[6] = b[6];
+ a[7] = b[7];
+}
+
+void
+do_test (void)
+{
+ _Complex _Float16* ph_src = (_Complex _Float16*) malloc (32);
+ _Complex _Float16* ph_dst = (_Complex _Float16*) malloc (32);
+ char* p = (char* ) malloc (32);
+
+ __builtin_memset (ph_dst, 0, 32);
+
+ for (int i = 0; i != 32; i++)
+ p[i] = i;
+ __builtin_memcpy (ph_src, p, 32);
+
+ foo_ph (ph_dst, ph_src);
+ if (__builtin_memcmp (ph_dst, ph_src, 32) != 0)
+ __builtin_abort ();
+
+ return;
+}
new file mode 100644
@@ -0,0 +1,80 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-slp-details" } */
+/* { dg-final { scan-tree-dump-times "basic block part vectorized using (?:32|64) byte vectors" 6 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*VEC_PERM_EXPR.*\{ 2, 3, 0, 1 \}} 2 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*VEC_PERM_EXPR.*\{ 6, 7, 4, 5, 2, 3, 0, 1 \}} 1 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*VEC_PERM_EXPR.*\{ 2, 3, 0, 1, 6, 7, 4, 5 \}} 1 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*VEC_PERM_EXPR.*\{ 14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1 \}} 1 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*VEC_PERM_EXPR.*\{ 14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1, 30, 31, 28, 29, 26, 27, 24, 25, 22, 23, 20, 21, 18, 19, 16, 17 \}} 1 "slp2" } } */
+
+void
+__attribute__((noipa))
+foo_pd (_Complex double* a, _Complex double* __restrict b)
+{
+ a[0] = b[1];
+ a[1] = b[0];
+}
+
+void
+__attribute__((noipa))
+foo_ps (_Complex float* a, _Complex float* __restrict b)
+{
+ a[0] = b[1];
+ a[1] = b[0];
+ a[2] = b[3];
+ a[3] = b[2];
+}
+
+void
+__attribute__((noipa))
+foo_epi64 (_Complex long long* a, _Complex long long* __restrict b)
+{
+ a[0] = b[1];
+ a[1] = b[0];
+}
+
+void
+__attribute__((noipa))
+foo_epi32 (_Complex int* a, _Complex int* __restrict b)
+{
+ a[0] = b[3];
+ a[1] = b[2];
+ a[2] = b[1];
+ a[3] = b[0];
+}
+
+void
+__attribute__((noipa))
+foo_epi16 (_Complex short* a, _Complex short* __restrict b)
+{
+ a[0] = b[7];
+ a[1] = b[6];
+ a[2] = b[5];
+ a[3] = b[4];
+ a[4] = b[3];
+ a[5] = b[2];
+ a[6] = b[1];
+ a[7] = b[0];
+}
+
+void
+__attribute__((noipa))
+foo_epi8 (_Complex char* a, _Complex char* __restrict b)
+{
+ a[0] = b[7];
+ a[1] = b[6];
+ a[2] = b[5];
+ a[3] = b[4];
+ a[4] = b[3];
+ a[5] = b[2];
+ a[6] = b[1];
+ a[7] = b[0];
+ a[8] = b[15];
+ a[9] = b[14];
+ a[10] = b[13];
+ a[11] = b[12];
+ a[12] = b[11];
+ a[13] = b[10];
+ a[14] = b[9];
+ a[15] = b[8];
+}
new file mode 100644
@@ -0,0 +1,126 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */
+/* { dg-require-effective-target avx2 } */
+
+#include "avx2-check.h"
+#include <string.h>
+#include "pr106010-3a.c"
+
+void
+avx2_test (void)
+{
+ _Complex double* pd_src = (_Complex double*) malloc (32);
+ _Complex double* pd_dst = (_Complex double*) malloc (32);
+ _Complex double* pd_exp = (_Complex double*) malloc (32);
+ _Complex float* ps_src = (_Complex float*) malloc (32);
+ _Complex float* ps_dst = (_Complex float*) malloc (32);
+ _Complex float* ps_exp = (_Complex float*) malloc (32);
+ _Complex long long* epi64_src = (_Complex long long*) malloc (32);
+ _Complex long long* epi64_dst = (_Complex long long*) malloc (32);
+ _Complex long long* epi64_exp = (_Complex long long*) malloc (32);
+ _Complex int* epi32_src = (_Complex int*) malloc (32);
+ _Complex int* epi32_dst = (_Complex int*) malloc (32);
+ _Complex int* epi32_exp = (_Complex int*) malloc (32);
+ _Complex short* epi16_src = (_Complex short*) malloc (32);
+ _Complex short* epi16_dst = (_Complex short*) malloc (32);
+ _Complex short* epi16_exp = (_Complex short*) malloc (32);
+ _Complex char* epi8_src = (_Complex char*) malloc (32);
+ _Complex char* epi8_dst = (_Complex char*) malloc (32);
+ _Complex char* epi8_exp = (_Complex char*) malloc (32);
+ char* p = (char* ) malloc (32);
+ char* q = (char* ) malloc (32);
+
+ __builtin_memset (pd_dst, 0, 32);
+ __builtin_memset (ps_dst, 0, 32);
+ __builtin_memset (epi64_dst, 0, 32);
+ __builtin_memset (epi32_dst, 0, 32);
+ __builtin_memset (epi16_dst, 0, 32);
+ __builtin_memset (epi8_dst, 0, 32);
+
+ for (int i = 0; i != 32; i++)
+ p[i] = i;
+ __builtin_memcpy (pd_src, p, 32);
+ __builtin_memcpy (ps_src, p, 32);
+ __builtin_memcpy (epi64_src, p, 32);
+ __builtin_memcpy (epi32_src, p, 32);
+ __builtin_memcpy (epi16_src, p, 32);
+ __builtin_memcpy (epi8_src, p, 32);
+
+ for (int i = 0; i != 16; i++)
+ {
+ p[i] = i + 16;
+ p[i + 16] = i;
+ }
+ __builtin_memcpy (pd_exp, p, 32);
+ __builtin_memcpy (epi64_exp, p, 32);
+
+ for (int i = 0; i != 8; i++)
+ {
+ p[i] = i + 8;
+ p[i + 8] = i;
+ p[i + 16] = i + 24;
+ p[i + 24] = i + 16;
+ q[i] = i + 24;
+ q[i + 8] = i + 16;
+ q[i + 16] = i + 8;
+ q[i + 24] = i;
+ }
+ __builtin_memcpy (ps_exp, p, 32);
+ __builtin_memcpy (epi32_exp, q, 32);
+
+
+ for (int i = 0; i != 4; i++)
+ {
+ q[i] = i + 28;
+ q[i + 4] = i + 24;
+ q[i + 8] = i + 20;
+ q[i + 12] = i + 16;
+ q[i + 16] = i + 12;
+ q[i + 20] = i + 8;
+ q[i + 24] = i + 4;
+ q[i + 28] = i;
+ }
+ __builtin_memcpy (epi16_exp, q, 32);
+
+ for (int i = 0; i != 2; i++)
+ {
+ q[i] = i + 14;
+ q[i + 2] = i + 12;
+ q[i + 4] = i + 10;
+ q[i + 6] = i + 8;
+ q[i + 8] = i + 6;
+ q[i + 10] = i + 4;
+ q[i + 12] = i + 2;
+ q[i + 14] = i;
+ q[i + 16] = i + 30;
+ q[i + 18] = i + 28;
+ q[i + 20] = i + 26;
+ q[i + 22] = i + 24;
+ q[i + 24] = i + 22;
+ q[i + 26] = i + 20;
+ q[i + 28] = i + 18;
+ q[i + 30] = i + 16;
+ }
+ __builtin_memcpy (epi8_exp, q, 32);
+
+ foo_pd (pd_dst, pd_src);
+ foo_ps (ps_dst, ps_src);
+ foo_epi64 (epi64_dst, epi64_src);
+ foo_epi32 (epi32_dst, epi32_src);
+ foo_epi16 (epi16_dst, epi16_src);
+ foo_epi8 (epi8_dst, epi8_src);
+ if (__builtin_memcmp (pd_dst, pd_exp, 32) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (ps_dst, ps_exp, 32) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi64_dst, epi64_exp, 32) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi32_dst, epi32_exp, 32) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi16_dst, epi16_exp, 32) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi8_dst, epi8_exp, 32) != 0)
+ __builtin_abort ();
+
+ return;
+}
new file mode 100644
@@ -0,0 +1,60 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256 -fdump-tree-slp-details" } */
+/* { dg-require-effective-target avx512fp16 } */
+/* { dg-final { scan-tree-dump-times "basic block part vectorized using (?:32|64) byte vectors" 1 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*VEC_PERM_EXPR.*\{ 2, 3, 0, 1, 8, 9, 6, 7, 14, 15, 12, 13, 4, 5, 10, 11 \}} 1 "slp2" } } */
+
+#include <string.h>
+
+static void do_test (void);
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+
+void
+__attribute__((noipa))
+foo_ph (_Complex _Float16* a, _Complex _Float16* __restrict b)
+{
+ a[0] = b[1];
+ a[1] = b[0];
+ a[2] = b[4];
+ a[3] = b[3];
+ a[4] = b[7];
+ a[5] = b[6];
+ a[6] = b[2];
+ a[7] = b[5];
+}
+
+void
+do_test (void)
+{
+ _Complex _Float16* ph_src = (_Complex _Float16*) malloc (32);
+ _Complex _Float16* ph_dst = (_Complex _Float16*) malloc (32);
+ _Complex _Float16* ph_exp = (_Complex _Float16*) malloc (32);
+ char* p = (char* ) malloc (32);
+
+ __builtin_memset (ph_dst, 0, 32);
+
+ for (int i = 0; i != 32; i++)
+ p[i] = i;
+ __builtin_memcpy (ph_src, p, 32);
+
+ for (int i = 0; i != 4; i++)
+ {
+ p[i] = i + 4;
+ p[i + 4] = i;
+ p[i + 8] = i + 16;
+ p[i + 12] = i + 12;
+ p[i + 16] = i + 28;
+ p[i + 20] = i + 24;
+ p[i + 24] = i + 8;
+ p[i + 28] = i + 20;
+ }
+ __builtin_memcpy (ph_exp, p, 32);
+
+ foo_ph (ph_dst, ph_src);
+ if (__builtin_memcmp (ph_dst, ph_exp, 32) != 0)
+ __builtin_abort ();
+
+ return;
+}
new file mode 100644
@@ -0,0 +1,101 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-slp-details" } */
+/* { dg-final { scan-tree-dump-times "basic block part vectorized using (?:32|64) byte vectors" 6 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(4\) double>} 1 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(8\) float>} 1 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(4\) long long int>} 1 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(8\) int>} 1 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(16\) short int>} 1 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(32\) char>} 1 "slp2" } } */
+
+void
+__attribute__((noipa))
+foo_pd (_Complex double* a,
+ _Complex double b1,
+ _Complex double b2)
+{
+ a[0] = b1;
+ a[1] = b2;
+}
+
+void
+__attribute__((noipa))
+foo_ps (_Complex float* a,
+ _Complex float b1, _Complex float b2,
+ _Complex float b3, _Complex float b4)
+{
+ a[0] = b1;
+ a[1] = b2;
+ a[2] = b3;
+ a[3] = b4;
+}
+
+void
+__attribute__((noipa))
+foo_epi64 (_Complex long long* a,
+ _Complex long long b1,
+ _Complex long long b2)
+{
+ a[0] = b1;
+ a[1] = b2;
+}
+
+void
+__attribute__((noipa))
+foo_epi32 (_Complex int* a,
+ _Complex int b1, _Complex int b2,
+ _Complex int b3, _Complex int b4)
+{
+ a[0] = b1;
+ a[1] = b2;
+ a[2] = b3;
+ a[3] = b4;
+}
+
+void
+__attribute__((noipa))
+foo_epi16 (_Complex short* a,
+ _Complex short b1, _Complex short b2,
+ _Complex short b3, _Complex short b4,
+ _Complex short b5, _Complex short b6,
+ _Complex short b7, _Complex short b8)
+{
+ a[0] = b1;
+ a[1] = b2;
+ a[2] = b3;
+ a[3] = b4;
+ a[4] = b5;
+ a[5] = b6;
+ a[6] = b7;
+ a[7] = b8;
+}
+
+void
+__attribute__((noipa))
+foo_epi8 (_Complex char* a,
+ _Complex char b1, _Complex char b2,
+ _Complex char b3, _Complex char b4,
+ _Complex char b5, _Complex char b6,
+ _Complex char b7, _Complex char b8,
+ _Complex char b9, _Complex char b10,
+ _Complex char b11, _Complex char b12,
+ _Complex char b13, _Complex char b14,
+ _Complex char b15, _Complex char b16)
+{
+ a[0] = b1;
+ a[1] = b2;
+ a[2] = b3;
+ a[3] = b4;
+ a[4] = b5;
+ a[5] = b6;
+ a[6] = b7;
+ a[7] = b8;
+ a[8] = b9;
+ a[9] = b10;
+ a[10] = b11;
+ a[11] = b12;
+ a[12] = b13;
+ a[13] = b14;
+ a[14] = b15;
+ a[15] = b16;
+}
new file mode 100644
@@ -0,0 +1,67 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */
+/* { dg-require-effective-target avx } */
+
+#include "avx-check.h"
+#include <string.h>
+#include "pr106010-4a.c"
+
+void
+avx_test (void)
+{
+ _Complex double* pd_src = (_Complex double*) malloc (32);
+ _Complex double* pd_dst = (_Complex double*) malloc (32);
+ _Complex float* ps_src = (_Complex float*) malloc (32);
+ _Complex float* ps_dst = (_Complex float*) malloc (32);
+ _Complex long long* epi64_src = (_Complex long long*) malloc (32);
+ _Complex long long* epi64_dst = (_Complex long long*) malloc (32);
+ _Complex int* epi32_src = (_Complex int*) malloc (32);
+ _Complex int* epi32_dst = (_Complex int*) malloc (32);
+ _Complex short* epi16_src = (_Complex short*) malloc (32);
+ _Complex short* epi16_dst = (_Complex short*) malloc (32);
+ _Complex char* epi8_src = (_Complex char*) malloc (32);
+ _Complex char* epi8_dst = (_Complex char*) malloc (32);
+ char* p = (char* ) malloc (32);
+
+ __builtin_memset (pd_dst, 0, 32);
+ __builtin_memset (ps_dst, 0, 32);
+ __builtin_memset (epi64_dst, 0, 32);
+ __builtin_memset (epi32_dst, 0, 32);
+ __builtin_memset (epi16_dst, 0, 32);
+ __builtin_memset (epi8_dst, 0, 32);
+
+ for (int i = 0; i != 32; i++)
+ p[i] = i;
+ __builtin_memcpy (pd_src, p, 32);
+ __builtin_memcpy (ps_src, p, 32);
+ __builtin_memcpy (epi64_src, p, 32);
+ __builtin_memcpy (epi32_src, p, 32);
+ __builtin_memcpy (epi16_src, p, 32);
+ __builtin_memcpy (epi8_src, p, 32);
+
+ foo_pd (pd_dst, pd_src[0], pd_src[1]);
+ foo_ps (ps_dst, ps_src[0], ps_src[1], ps_src[2], ps_src[3]);
+ foo_epi64 (epi64_dst, epi64_src[0], epi64_src[1]);
+ foo_epi32 (epi32_dst, epi32_src[0], epi32_src[1], epi32_src[2], epi32_src[3]);
+ foo_epi16 (epi16_dst, epi16_src[0], epi16_src[1], epi16_src[2], epi16_src[3],
+ epi16_src[4], epi16_src[5], epi16_src[6], epi16_src[7]);
+ foo_epi8 (epi8_dst, epi8_src[0], epi8_src[1], epi8_src[2], epi8_src[3],
+ epi8_src[4], epi8_src[5], epi8_src[6], epi8_src[7],
+ epi8_src[8], epi8_src[9], epi8_src[10], epi8_src[11],
+ epi8_src[12], epi8_src[13], epi8_src[14], epi8_src[15]);
+
+ if (__builtin_memcmp (pd_dst, pd_src, 32) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (ps_dst, ps_src, 32) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi64_dst, epi64_src, 32) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi32_dst, epi32_src, 32) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi16_dst, epi16_src, 32) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi8_dst, epi8_src, 32) != 0)
+ __builtin_abort ();
+
+ return;
+}
new file mode 100644
@@ -0,0 +1,54 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -fdump-tree-slp-details -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */
+/* { dg-require-effective-target avx512fp16 } */
+/* { dg-final { scan-tree-dump-times "basic block part vectorized using (?:32|64) byte vectors" 1 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(16\) _Float16>} 1 "slp2" } } */
+
+#include <string.h>
+
+static void do_test (void);
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+
+void
+__attribute__((noipa))
+foo_ph (_Complex _Float16* a,
+ _Complex _Float16 b1, _Complex _Float16 b2,
+ _Complex _Float16 b3, _Complex _Float16 b4,
+ _Complex _Float16 b5, _Complex _Float16 b6,
+ _Complex _Float16 b7, _Complex _Float16 b8)
+{
+ a[0] = b1;
+ a[1] = b2;
+ a[2] = b3;
+ a[3] = b4;
+ a[4] = b5;
+ a[5] = b6;
+ a[6] = b7;
+ a[7] = b8;
+}
+
+void
+do_test (void)
+{
+
+ _Complex _Float16* ph_src = (_Complex _Float16*) malloc (32);
+ _Complex _Float16* ph_dst = (_Complex _Float16*) malloc (32);
+
+ char* p = (char* ) malloc (32);
+
+ __builtin_memset (ph_dst, 0, 32);
+
+ for (int i = 0; i != 32; i++)
+ p[i] = i;
+
+ __builtin_memcpy (ph_src, p, 32);
+
+ foo_ph (ph_dst, ph_src[0], ph_src[1], ph_src[2], ph_src[3],
+ ph_src[4], ph_src[5], ph_src[6], ph_src[7]);
+
+ if (__builtin_memcmp (ph_dst, ph_src, 32) != 0)
+ __builtin_abort ();
+ return;
+}
new file mode 100644
@@ -0,0 +1,117 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-slp-details -mprefer-vector-width=256" } */
+/* { dg-final { scan-tree-dump-times "basic block part vectorized using (?:32|64) byte vectors" 6 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(4\) double>} 4 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(8\) float>} 4 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(4\) long long int>} 4 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(8\) int>} 4 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(16\) short int>} 4 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(32\) char>} 4 "slp2" } } */
+
+void
+__attribute__((noipa))
+foo_pd (_Complex double* a, _Complex double* __restrict b)
+{
+ a[0] = b[2];
+ a[1] = b[3];
+ a[2] = b[0];
+ a[3] = b[1];
+}
+
+void
+__attribute__((noipa))
+foo_ps (_Complex float* a, _Complex float* __restrict b)
+{
+ a[0] = b[4];
+ a[1] = b[5];
+ a[2] = b[6];
+ a[3] = b[7];
+ a[4] = b[0];
+ a[5] = b[1];
+ a[6] = b[2];
+ a[7] = b[3];
+}
+
+void
+__attribute__((noipa))
+foo_epi64 (_Complex long long* a, _Complex long long* __restrict b)
+{
+ a[0] = b[2];
+ a[1] = b[3];
+ a[2] = b[0];
+ a[3] = b[1];
+}
+
+void
+__attribute__((noipa))
+foo_epi32 (_Complex int* a, _Complex int* __restrict b)
+{
+ a[0] = b[4];
+ a[1] = b[5];
+ a[2] = b[6];
+ a[3] = b[7];
+ a[4] = b[0];
+ a[5] = b[1];
+ a[6] = b[2];
+ a[7] = b[3];
+}
+
+void
+__attribute__((noipa))
+foo_epi16 (_Complex short* a, _Complex short* __restrict b)
+{
+ a[0] = b[8];
+ a[1] = b[9];
+ a[2] = b[10];
+ a[3] = b[11];
+ a[4] = b[12];
+ a[5] = b[13];
+ a[6] = b[14];
+ a[7] = b[15];
+ a[8] = b[0];
+ a[9] = b[1];
+ a[10] = b[2];
+ a[11] = b[3];
+ a[12] = b[4];
+ a[13] = b[5];
+ a[14] = b[6];
+ a[15] = b[7];
+}
+
+void
+__attribute__((noipa))
+foo_epi8 (_Complex char* a, _Complex char* __restrict b)
+{
+ a[0] = b[16];
+ a[1] = b[17];
+ a[2] = b[18];
+ a[3] = b[19];
+ a[4] = b[20];
+ a[5] = b[21];
+ a[6] = b[22];
+ a[7] = b[23];
+ a[8] = b[24];
+ a[9] = b[25];
+ a[10] = b[26];
+ a[11] = b[27];
+ a[12] = b[28];
+ a[13] = b[29];
+ a[14] = b[30];
+ a[15] = b[31];
+ a[16] = b[0];
+ a[17] = b[1];
+ a[18] = b[2];
+ a[19] = b[3];
+ a[20] = b[4];
+ a[21] = b[5];
+ a[22] = b[6];
+ a[23] = b[7];
+ a[24] = b[8];
+ a[25] = b[9];
+ a[26] = b[10];
+ a[27] = b[11];
+ a[28] = b[12];
+ a[29] = b[13];
+ a[30] = b[14];
+ a[31] = b[15];
+}
new file mode 100644
@@ -0,0 +1,80 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */
+/* { dg-require-effective-target avx } */
+
+#include "avx-check.h"
+#include <string.h>
+#include "pr106010-5a.c"
+
+void
+avx_test (void)
+{
+ _Complex double* pd_src = (_Complex double*) malloc (64);
+ _Complex double* pd_dst = (_Complex double*) malloc (64);
+ _Complex double* pd_exp = (_Complex double*) malloc (64);
+ _Complex float* ps_src = (_Complex float*) malloc (64);
+ _Complex float* ps_dst = (_Complex float*) malloc (64);
+ _Complex float* ps_exp = (_Complex float*) malloc (64);
+ _Complex long long* epi64_src = (_Complex long long*) malloc (64);
+ _Complex long long* epi64_dst = (_Complex long long*) malloc (64);
+ _Complex long long* epi64_exp = (_Complex long long*) malloc (64);
+ _Complex int* epi32_src = (_Complex int*) malloc (64);
+ _Complex int* epi32_dst = (_Complex int*) malloc (64);
+ _Complex int* epi32_exp = (_Complex int*) malloc (64);
+ _Complex short* epi16_src = (_Complex short*) malloc (64);
+ _Complex short* epi16_dst = (_Complex short*) malloc (64);
+ _Complex short* epi16_exp = (_Complex short*) malloc (64);
+ _Complex char* epi8_src = (_Complex char*) malloc (64);
+ _Complex char* epi8_dst = (_Complex char*) malloc (64);
+ _Complex char* epi8_exp = (_Complex char*) malloc (64);
+ char* p = (char* ) malloc (64);
+ char* q = (char* ) malloc (64);
+
+ __builtin_memset (pd_dst, 0, 64);
+ __builtin_memset (ps_dst, 0, 64);
+ __builtin_memset (epi64_dst, 0, 64);
+ __builtin_memset (epi32_dst, 0, 64);
+ __builtin_memset (epi16_dst, 0, 64);
+ __builtin_memset (epi8_dst, 0, 64);
+
+ for (int i = 0; i != 64; i++)
+ {
+ p[i] = i;
+ q[i] = (i + 32) % 64;
+ }
+ __builtin_memcpy (pd_src, p, 64);
+ __builtin_memcpy (ps_src, p, 64);
+ __builtin_memcpy (epi64_src, p, 64);
+ __builtin_memcpy (epi32_src, p, 64);
+ __builtin_memcpy (epi16_src, p, 64);
+ __builtin_memcpy (epi8_src, p, 64);
+
+ __builtin_memcpy (pd_exp, q, 64);
+ __builtin_memcpy (ps_exp, q, 64);
+ __builtin_memcpy (epi64_exp, q, 64);
+ __builtin_memcpy (epi32_exp, q, 64);
+ __builtin_memcpy (epi16_exp, q, 64);
+ __builtin_memcpy (epi8_exp, q, 64);
+
+ foo_pd (pd_dst, pd_src);
+ foo_ps (ps_dst, ps_src);
+ foo_epi64 (epi64_dst, epi64_src);
+ foo_epi32 (epi32_dst, epi32_src);
+ foo_epi16 (epi16_dst, epi16_src);
+ foo_epi8 (epi8_dst, epi8_src);
+
+ if (__builtin_memcmp (pd_dst, pd_exp, 64) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (ps_dst, ps_exp, 64) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi64_dst, epi64_exp, 64) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi32_dst, epi32_exp, 64) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi16_dst, epi16_exp, 64) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi8_dst, epi8_exp, 64) != 0)
+ __builtin_abort ();
+
+ return;
+}
new file mode 100644
@@ -0,0 +1,62 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-slp-details -mprefer-vector-width=256" } */
+/* { dg-require-effective-target avx512fp16 } */
+/* { dg-final { scan-tree-dump-times "basic block part vectorized using (?:32|64) byte vectors" 1 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(16\) _Float16>} 4 "slp2" } } */
+
+#include <string.h>
+
+static void do_test (void);
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+
+void
+__attribute__((noipa))
+foo_ph (_Complex _Float16* a, _Complex _Float16* __restrict b)
+{
+ a[0] = b[8];
+ a[1] = b[9];
+ a[2] = b[10];
+ a[3] = b[11];
+ a[4] = b[12];
+ a[5] = b[13];
+ a[6] = b[14];
+ a[7] = b[15];
+ a[8] = b[0];
+ a[9] = b[1];
+ a[10] = b[2];
+ a[11] = b[3];
+ a[12] = b[4];
+ a[13] = b[5];
+ a[14] = b[6];
+ a[15] = b[7];
+}
+
+void
+do_test (void)
+{
+ _Complex _Float16* ph_src = (_Complex _Float16*) malloc (64);
+ _Complex _Float16* ph_dst = (_Complex _Float16*) malloc (64);
+ _Complex _Float16* ph_exp = (_Complex _Float16*) malloc (64);
+ char* p = (char* ) malloc (64);
+ char* q = (char* ) malloc (64);
+
+ __builtin_memset (ph_dst, 0, 64);
+
+ for (int i = 0; i != 64; i++)
+ {
+ p[i] = i;
+ q[i] = (i + 32) % 64;
+ }
+ __builtin_memcpy (ph_src, p, 64);
+
+ __builtin_memcpy (ph_exp, q, 64);
+
+ foo_ph (ph_dst, ph_src);
+
+ if (__builtin_memcmp (ph_dst, ph_exp, 64) != 0)
+ __builtin_abort ();
+
+ return;
+}
new file mode 100644
@@ -0,0 +1,115 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-slp-details -mprefer-vector-width=256" } */
+/* { dg-final { scan-tree-dump-times "basic block part vectorized using (?:32|64) byte vectors" 6 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*VEC_PERM_EXPR.*\{ 2, 3, 0, 1 \}} 4 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*VEC_PERM_EXPR.*\{ 6, 7, 4, 5, 2, 3, 0, 1 \}} 4 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*VEC_PERM_EXPR.*\{ 14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1 \}} 2 "slp2" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*VEC_PERM_EXPR.*\{ 30, 31, 28, 29, 26, 27, 24, 25, 22, 23, 20, 21, 18, 19, 16, 17, 14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1 \}} 2 "slp2" } } */
+
+void
+__attribute__((noipa))
+foo_pd (_Complex double* a, _Complex double* __restrict b)
+{
+ a[0] = b[3];
+ a[1] = b[2];
+ a[2] = b[1];
+ a[3] = b[0];
+}
+
+void
+__attribute__((noipa))
+foo_ps (_Complex float* a, _Complex float* __restrict b)
+{
+ a[0] = b[7];
+ a[1] = b[6];
+ a[2] = b[5];
+ a[3] = b[4];
+ a[4] = b[3];
+ a[5] = b[2];
+ a[6] = b[1];
+ a[7] = b[0];
+}
+
+void
+__attribute__((noipa))
+foo_epi64 (_Complex long long* a, _Complex long long* __restrict b)
+{
+ a[0] = b[3];
+ a[1] = b[2];
+ a[2] = b[1];
+ a[3] = b[0];
+}
+
+void
+__attribute__((noipa))
+foo_epi32 (_Complex int* a, _Complex int* __restrict b)
+{
+ a[0] = b[7];
+ a[1] = b[6];
+ a[2] = b[5];
+ a[3] = b[4];
+ a[4] = b[3];
+ a[5] = b[2];
+ a[6] = b[1];
+ a[7] = b[0];
+}
+
+void
+__attribute__((noipa))
+foo_epi16 (_Complex short* a, _Complex short* __restrict b)
+{
+ a[0] = b[15];
+ a[1] = b[14];
+ a[2] = b[13];
+ a[3] = b[12];
+ a[4] = b[11];
+ a[5] = b[10];
+ a[6] = b[9];
+ a[7] = b[8];
+ a[8] = b[7];
+ a[9] = b[6];
+ a[10] = b[5];
+ a[11] = b[4];
+ a[12] = b[3];
+ a[13] = b[2];
+ a[14] = b[1];
+ a[15] = b[0];
+}
+
+void
+__attribute__((noipa))
+foo_epi8 (_Complex char* a, _Complex char* __restrict b)
+{
+ a[0] = b[31];
+ a[1] = b[30];
+ a[2] = b[29];
+ a[3] = b[28];
+ a[4] = b[27];
+ a[5] = b[26];
+ a[6] = b[25];
+ a[7] = b[24];
+ a[8] = b[23];
+ a[9] = b[22];
+ a[10] = b[21];
+ a[11] = b[20];
+ a[12] = b[19];
+ a[13] = b[18];
+ a[14] = b[17];
+ a[15] = b[16];
+ a[16] = b[15];
+ a[17] = b[14];
+ a[18] = b[13];
+ a[19] = b[12];
+ a[20] = b[11];
+ a[21] = b[10];
+ a[22] = b[9];
+ a[23] = b[8];
+ a[24] = b[7];
+ a[25] = b[6];
+ a[26] = b[5];
+ a[27] = b[4];
+ a[28] = b[3];
+ a[29] = b[2];
+ a[30] = b[1];
+ a[31] = b[0];
+}
new file mode 100644
@@ -0,0 +1,157 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */
+/* { dg-require-effective-target avx2 } */
+
+#include "avx2-check.h"
+#include <string.h>
+#include "pr106010-6a.c"
+
+void
+avx2_test (void)
+{
+ _Complex double* pd_src = (_Complex double*) malloc (64);
+ _Complex double* pd_dst = (_Complex double*) malloc (64);
+ _Complex double* pd_exp = (_Complex double*) malloc (64);
+ _Complex float* ps_src = (_Complex float*) malloc (64);
+ _Complex float* ps_dst = (_Complex float*) malloc (64);
+ _Complex float* ps_exp = (_Complex float*) malloc (64);
+ _Complex long long* epi64_src = (_Complex long long*) malloc (64);
+ _Complex long long* epi64_dst = (_Complex long long*) malloc (64);
+ _Complex long long* epi64_exp = (_Complex long long*) malloc (64);
+ _Complex int* epi32_src = (_Complex int*) malloc (64);
+ _Complex int* epi32_dst = (_Complex int*) malloc (64);
+ _Complex int* epi32_exp = (_Complex int*) malloc (64);
+ _Complex short* epi16_src = (_Complex short*) malloc (64);
+ _Complex short* epi16_dst = (_Complex short*) malloc (64);
+ _Complex short* epi16_exp = (_Complex short*) malloc (64);
+ _Complex char* epi8_src = (_Complex char*) malloc (64);
+ _Complex char* epi8_dst = (_Complex char*) malloc (64);
+ _Complex char* epi8_exp = (_Complex char*) malloc (64);
+ char* p = (char* ) malloc (64);
+ char* q = (char* ) malloc (64);
+
+ __builtin_memset (pd_dst, 0, 64);
+ __builtin_memset (ps_dst, 0, 64);
+ __builtin_memset (epi64_dst, 0, 64);
+ __builtin_memset (epi32_dst, 0, 64);
+ __builtin_memset (epi16_dst, 0, 64);
+ __builtin_memset (epi8_dst, 0, 64);
+
+ for (int i = 0; i != 64; i++)
+ p[i] = i;
+
+ __builtin_memcpy (pd_src, p, 64);
+ __builtin_memcpy (ps_src, p, 64);
+ __builtin_memcpy (epi64_src, p, 64);
+ __builtin_memcpy (epi32_src, p, 64);
+ __builtin_memcpy (epi16_src, p, 64);
+ __builtin_memcpy (epi8_src, p, 64);
+
+
+ for (int i = 0; i != 16; i++)
+ {
+ q[i] = i + 48;
+ q[i + 16] = i + 32;
+ q[i + 32] = i + 16;
+ q[i + 48] = i;
+ }
+
+ __builtin_memcpy (pd_exp, q, 64);
+ __builtin_memcpy (epi64_exp, q, 64);
+
+ for (int i = 0; i != 8; i++)
+ {
+ q[i] = i + 56;
+ q[i + 8] = i + 48;
+ q[i + 16] = i + 40;
+ q[i + 24] = i + 32;
+ q[i + 32] = i + 24;
+ q[i + 40] = i + 16;
+ q[i + 48] = i + 8;
+ q[i + 56] = i;
+ }
+
+ __builtin_memcpy (ps_exp, q, 64);
+ __builtin_memcpy (epi32_exp, q, 64);
+
+ for (int i = 0; i != 4; i++)
+ {
+ q[i] = i + 60;
+ q[i + 4] = i + 56;
+ q[i + 8] = i + 52;
+ q[i + 12] = i + 48;
+ q[i + 16] = i + 44;
+ q[i + 20] = i + 40;
+ q[i + 24] = i + 36;
+ q[i + 28] = i + 32;
+ q[i + 32] = i + 28;
+ q[i + 36] = i + 24;
+ q[i + 40] = i + 20;
+ q[i + 44] = i + 16;
+ q[i + 48] = i + 12;
+ q[i + 52] = i + 8;
+ q[i + 56] = i + 4;
+ q[i + 60] = i;
+ }
+
+ __builtin_memcpy (epi16_exp, q, 64);
+
+ for (int i = 0; i != 2; i++)
+ {
+ q[i] = i + 62;
+ q[i + 2] = i + 60;
+ q[i + 4] = i + 58;
+ q[i + 6] = i + 56;
+ q[i + 8] = i + 54;
+ q[i + 10] = i + 52;
+ q[i + 12] = i + 50;
+ q[i + 14] = i + 48;
+ q[i + 16] = i + 46;
+ q[i + 18] = i + 44;
+ q[i + 20] = i + 42;
+ q[i + 22] = i + 40;
+ q[i + 24] = i + 38;
+ q[i + 26] = i + 36;
+ q[i + 28] = i + 34;
+ q[i + 30] = i + 32;
+ q[i + 32] = i + 30;
+ q[i + 34] = i + 28;
+ q[i + 36] = i + 26;
+ q[i + 38] = i + 24;
+ q[i + 40] = i + 22;
+ q[i + 42] = i + 20;
+ q[i + 44] = i + 18;
+ q[i + 46] = i + 16;
+ q[i + 48] = i + 14;
+ q[i + 50] = i + 12;
+ q[i + 52] = i + 10;
+ q[i + 54] = i + 8;
+ q[i + 56] = i + 6;
+ q[i + 58] = i + 4;
+ q[i + 60] = i + 2;
+ q[i + 62] = i;
+ }
+ __builtin_memcpy (epi8_exp, q, 64);
+
+ foo_pd (pd_dst, pd_src);
+ foo_ps (ps_dst, ps_src);
+ foo_epi64 (epi64_dst, epi64_src);
+ foo_epi32 (epi32_dst, epi32_src);
+ foo_epi16 (epi16_dst, epi16_src);
+ foo_epi8 (epi8_dst, epi8_src);
+
+ if (__builtin_memcmp (pd_dst, pd_exp, 64) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (ps_dst, ps_exp, 64) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi64_dst, epi64_exp, 64) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi32_dst, epi32_exp, 64) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi16_dst, epi16_exp, 64) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi8_dst, epi8_exp, 64) != 0)
+ __builtin_abort ();
+
+ return;
+}
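Reviewer note, not part of the patch: the unrolled loops that fill `q` in the test above all build the same thing, the 64 source bytes with their complex elements in reverse order. The element size is 16 bytes for `_Complex double` and `_Complex long long`, 8 for `_Complex float` and `_Complex int`, 4 for `_Complex short`, and 2 for `_Complex char`. A minimal sketch of that construction follows; `reverse_blocks` is a hypothetical helper name and does not appear in the testsuite.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (assumption, not in the patch): copy SRC to DST with
   the order of BLOCK-byte elements reversed, keeping the bytes inside each
   element in their original order.  TOTAL must be a multiple of BLOCK.  */
void
reverse_blocks (unsigned char *dst, const unsigned char *src,
                size_t total, size_t block)
{
  assert (block != 0 && total % block == 0);
  size_t nblocks = total / block;
  for (size_t j = 0; j != total; j++)
    dst[j] = src[(nblocks - 1 - j / block) * block + j % block];
}
```

With `total == 64` and `block == 16` this gives the pattern copied into `pd_exp` and `epi64_exp`; `block == 2` gives the `epi8_exp` pattern.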
new file mode 100644
@@ -0,0 +1,80 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256 -fdump-tree-slp-details" } */
+/* { dg-require-effective-target avx512fp16 } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*VEC_PERM_EXPR.*\{ 14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1 \}} 2 "slp2" } } */
+/* { dg-final { scan-tree-dump-times "basic block part vectorized using (?:32|64) byte vectors" 1 "slp2" } } */
+
+#include <string.h>
+
+static void do_test (void);
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+
+void
+__attribute__((noipa))
+foo_ph (_Complex _Float16* a, _Complex _Float16* __restrict b)
+{
+ a[0] = b[15];
+ a[1] = b[14];
+ a[2] = b[13];
+ a[3] = b[12];
+ a[4] = b[11];
+ a[5] = b[10];
+ a[6] = b[9];
+ a[7] = b[8];
+ a[8] = b[7];
+ a[9] = b[6];
+ a[10] = b[5];
+ a[11] = b[4];
+ a[12] = b[3];
+ a[13] = b[2];
+ a[14] = b[1];
+ a[15] = b[0];
+}
+
+static void
+do_test (void)
+{
+ _Complex _Float16* ph_src = (_Complex _Float16*) malloc (64);
+ _Complex _Float16* ph_dst = (_Complex _Float16*) malloc (64);
+ _Complex _Float16* ph_exp = (_Complex _Float16*) malloc (64);
+ char* p = (char* ) malloc (64);
+ char* q = (char* ) malloc (64);
+
+ __builtin_memset (ph_dst, 0, 64);
+
+ for (int i = 0; i != 64; i++)
+ p[i] = i;
+
+ __builtin_memcpy (ph_src, p, 64);
+
+ for (int i = 0; i != 4; i++)
+ {
+ q[i] = i + 60;
+ q[i + 4] = i + 56;
+ q[i + 8] = i + 52;
+ q[i + 12] = i + 48;
+ q[i + 16] = i + 44;
+ q[i + 20] = i + 40;
+ q[i + 24] = i + 36;
+ q[i + 28] = i + 32;
+ q[i + 32] = i + 28;
+ q[i + 36] = i + 24;
+ q[i + 40] = i + 20;
+ q[i + 44] = i + 16;
+ q[i + 48] = i + 12;
+ q[i + 52] = i + 8;
+ q[i + 56] = i + 4;
+ q[i + 60] = i;
+ }
+
+ __builtin_memcpy (ph_exp, q, 64);
+
+ foo_ph (ph_dst, ph_src);
+
+ if (__builtin_memcmp (ph_dst, ph_exp, 64) != 0)
+ __builtin_abort ();
+
+ return;
+}
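Reviewer note, not part of the patch: the selector that the `dg-final` scan in the test above expects, `{ 14, 15, 12, 13, ..., 0, 1 }`, is what reversing eight `_Complex _Float16` elements looks like at the level of a 16-lane `_Float16` vector. Pairs are reversed as units, and the real/imaginary order inside each pair is kept. A minimal sketch, assuming a constant lane count; `complex_reverse_mask` is a hypothetical name and is not the `perm_mask_for_reverse` overload added by the patch.

```c
#include <assert.h>

/* Hypothetical helper (assumption, not in the patch): fill SEL with a
   selector that reverses NELT / 2 complex elements stored as NELT scalar
   lanes.  Each (real, imaginary) pair moves as a unit.  */
void
complex_reverse_mask (unsigned nelt, unsigned *sel)
{
  assert (nelt != 0 && nelt % 2 == 0);
  for (unsigned i = 0; i != nelt / 2; i++)
    {
      sel[2 * i] = nelt - 2 - 2 * i;      /* real part of pair i */
      sel[2 * i + 1] = nelt - 1 - 2 * i;  /* imaginary part of pair i */
    }
}
```

For `nelt == 16` the first four entries are 14, 15, 12, 13 and the last two are 0, 1.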
new file mode 100644
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-vect-details -mprefer-vector-width=256" } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 6 "vect" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(4\) double>} 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(8\) float>} 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(4\) long long int>} 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(8\) int>} 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(16\) short int>} 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(32\) char>} 1 "vect" } } */
+
+#define N 10000
+void
+__attribute__((noipa))
+foo_pd (_Complex double* a, _Complex double b)
+{
+ for (int i = 0; i != N; i++)
+ a[i] = b;
+}
+
+void
+__attribute__((noipa))
+foo_ps (_Complex float* a, _Complex float b)
+{
+ for (int i = 0; i != N; i++)
+ a[i] = b;
+}
+
+void
+__attribute__((noipa))
+foo_epi64 (_Complex long long* a, _Complex long long b)
+{
+ for (int i = 0; i != N; i++)
+ a[i] = b;
+}
+
+void
+__attribute__((noipa))
+foo_epi32 (_Complex int* a, _Complex int b)
+{
+ for (int i = 0; i != N; i++)
+ a[i] = b;
+}
+
+void
+__attribute__((noipa))
+foo_epi16 (_Complex short* a, _Complex short b)
+{
+ for (int i = 0; i != N; i++)
+ a[i] = b;
+}
+
+void
+__attribute__((noipa))
+foo_epi8 (_Complex char* a, _Complex char b)
+{
+ for (int i = 0; i != N; i++)
+ a[i] = b;
+}
new file mode 100644
@@ -0,0 +1,63 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */
+/* { dg-require-effective-target avx } */
+
+#include "avx-check.h"
+#include <string.h>
+#include "pr106010-7a.c"
+
+void
+avx_test (void)
+{
+ _Complex double* pd_src = (_Complex double*) malloc (2 * N * sizeof (double));
+ _Complex double* pd_dst = (_Complex double*) malloc (2 * N * sizeof (double));
+ _Complex float* ps_src = (_Complex float*) malloc (2 * N * sizeof (float));
+ _Complex float* ps_dst = (_Complex float*) malloc (2 * N * sizeof (float));
+ _Complex long long* epi64_src = (_Complex long long*) malloc (2 * N * sizeof (long long));
+ _Complex long long* epi64_dst = (_Complex long long*) malloc (2 * N * sizeof (long long));
+ _Complex int* epi32_src = (_Complex int*) malloc (2 * N * sizeof (int));
+ _Complex int* epi32_dst = (_Complex int*) malloc (2 * N * sizeof (int));
+ _Complex short* epi16_src = (_Complex short*) malloc (2 * N * sizeof (short));
+ _Complex short* epi16_dst = (_Complex short*) malloc (2 * N * sizeof (short));
+ _Complex char* epi8_src = (_Complex char*) malloc (2 * N * sizeof (char));
+ _Complex char* epi8_dst = (_Complex char*) malloc (2 * N * sizeof (char));
+ char* p_init = (char*) malloc (2 * N * sizeof (double));
+
+ __builtin_memset (pd_dst, 0, 2 * N * sizeof (double));
+ __builtin_memset (ps_dst, 0, 2 * N * sizeof (float));
+ __builtin_memset (epi64_dst, 0, 2 * N * sizeof (long long));
+ __builtin_memset (epi32_dst, 0, 2 * N * sizeof (int));
+ __builtin_memset (epi16_dst, 0, 2 * N * sizeof (short));
+ __builtin_memset (epi8_dst, 0, 2 * N * sizeof (char));
+
+ for (int i = 0; i != 2 * N * sizeof (double); i++)
+ p_init[i] = i % 2 + 3;
+
+ memcpy (pd_src, p_init, 2 * N * sizeof (double));
+ memcpy (ps_src, p_init, 2 * N * sizeof (float));
+ memcpy (epi64_src, p_init, 2 * N * sizeof (long long));
+ memcpy (epi32_src, p_init, 2 * N * sizeof (int));
+ memcpy (epi16_src, p_init, 2 * N * sizeof (short));
+ memcpy (epi8_src, p_init, 2 * N * sizeof (char));
+
+ foo_pd (pd_dst, pd_src[0]);
+ foo_ps (ps_dst, ps_src[0]);
+ foo_epi64 (epi64_dst, epi64_src[0]);
+ foo_epi32 (epi32_dst, epi32_src[0]);
+ foo_epi16 (epi16_dst, epi16_src[0]);
+ foo_epi8 (epi8_dst, epi8_src[0]);
+ if (__builtin_memcmp (pd_dst, pd_src, N * 2 * sizeof (double)) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (ps_dst, ps_src, N * 2 * sizeof (float)) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi64_dst, epi64_src, N * 2 * sizeof (long long)) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi32_dst, epi32_src, N * 2 * sizeof (int)) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi16_dst, epi16_src, N * 2 * sizeof (short)) != 0)
+ __builtin_abort ();
+ if (__builtin_memcmp (epi8_dst, epi8_src, N * 2 * sizeof (char)) != 0)
+ __builtin_abort ();
+
+ return;
+}
new file mode 100644
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256 -fdump-tree-vect-details" } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(16\) _Float16>} 1 "vect" } } */
+/* { dg-require-effective-target avx512fp16 } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+
+#define N 10000
+
+void
+__attribute__((noipa))
+foo_ph (_Complex _Float16* a, _Complex _Float16 b)
+{
+ for (int i = 0; i != N; i++)
+ a[i] = b;
+}
+
+static void
+do_test (void)
+{
+ _Complex _Float16* ph_src = (_Complex _Float16*) malloc (2 * N * sizeof (_Float16));
+ _Complex _Float16* ph_dst = (_Complex _Float16*) malloc (2 * N * sizeof (_Float16));
+ char* p_init = (char*) malloc (2 * N * sizeof (_Float16));
+
+ __builtin_memset (ph_dst, 0, 2 * N * sizeof (_Float16));
+
+ for (int i = 0; i != 2 * N * sizeof (_Float16); i++)
+ p_init[i] = i % 2 + 3;
+
+ memcpy (ph_src, p_init, 2 * N * sizeof (_Float16));
+
+ foo_ph (ph_dst, ph_src[0]);
+ if (__builtin_memcmp (ph_dst, ph_src, N * 2 * sizeof (_Float16)) != 0)
+ __builtin_abort ();
+}
new file mode 100644
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-vect-details -mprefer-vector-width=256" } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 6 "vect" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(4\) double>} 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(8\) float>} 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(4\) long long int>} 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(8\) int>} 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(16\) short int>} 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(32\) char>} 1 "vect" } } */
+
+#define N 10000
+void
+__attribute__((noipa))
+foo_pd (_Complex double* a)
+{
+ for (int i = 0; i != N; i++)
+ a[i] = 1.0 + 2.0i;
+}
+
+void
+__attribute__((noipa))
+foo_ps (_Complex float* a)
+{
+ for (int i = 0; i != N; i++)
+ a[i] = 1.0f + 2.0fi;
+}
+
+void
+__attribute__((noipa))
+foo_epi64 (_Complex long long* a)
+{
+ for (int i = 0; i != N; i++)
+ a[i] = 1 + 2i;
+}
+
+void
+__attribute__((noipa))
+foo_epi32 (_Complex int* a)
+{
+ for (int i = 0; i != N; i++)
+ a[i] = 1 + 2i;
+}
+
+void
+__attribute__((noipa))
+foo_epi16 (_Complex short* a)
+{
+ for (int i = 0; i != N; i++)
+ a[i] = 1 + 2i;
+}
+
+void
+__attribute__((noipa))
+foo_epi8 (_Complex char* a)
+{
+ for (int i = 0; i != N; i++)
+ a[i] = 1 + 2i;
+}
new file mode 100644
@@ -0,0 +1,53 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */
+/* { dg-require-effective-target avx } */
+
+#include "avx-check.h"
+#include <string.h>
+#include "pr106010-8a.c"
+
+void
+avx_test (void)
+{
+ _Complex double pd_src = 1.0 + 2.0i;
+ _Complex double* pd_dst = (_Complex double*) malloc (2 * N * sizeof (double));
+ _Complex float ps_src = 1.0 + 2.0i;
+ _Complex float* ps_dst = (_Complex float*) malloc (2 * N * sizeof (float));
+ _Complex long long epi64_src = 1 + 2i;
+ _Complex long long* epi64_dst = (_Complex long long*) malloc (2 * N * sizeof (long long));
+ _Complex int epi32_src = 1 + 2i;
+ _Complex int* epi32_dst = (_Complex int*) malloc (2 * N * sizeof (int));
+ _Complex short epi16_src = 1 + 2i;
+ _Complex short* epi16_dst = (_Complex short*) malloc (2 * N * sizeof (short));
+ _Complex char epi8_src = 1 + 2i;
+ _Complex char* epi8_dst = (_Complex char*) malloc (2 * N * sizeof (char));
+
+ __builtin_memset (pd_dst, 0, 2 * N * sizeof (double));
+ __builtin_memset (ps_dst, 0, 2 * N * sizeof (float));
+ __builtin_memset (epi64_dst, 0, 2 * N * sizeof (long long));
+ __builtin_memset (epi32_dst, 0, 2 * N * sizeof (int));
+ __builtin_memset (epi16_dst, 0, 2 * N * sizeof (short));
+ __builtin_memset (epi8_dst, 0, 2 * N * sizeof (char));
+
+ foo_pd (pd_dst);
+ foo_ps (ps_dst);
+ foo_epi64 (epi64_dst);
+ foo_epi32 (epi32_dst);
+ foo_epi16 (epi16_dst);
+ foo_epi8 (epi8_dst);
+ for (int i = 0; i != N; i++)
+ {
+ if (pd_dst[i] != pd_src)
+ __builtin_abort ();
+ if (ps_dst[i] != ps_src)
+ __builtin_abort ();
+ if (epi64_dst[i] != epi64_src)
+ __builtin_abort ();
+ if (epi32_dst[i] != epi32_src)
+ __builtin_abort ();
+ if (epi16_dst[i] != epi16_src)
+ __builtin_abort ();
+ if (epi8_dst[i] != epi8_src)
+ __builtin_abort ();
+ }
+}
new file mode 100644
@@ -0,0 +1,38 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256 -fdump-tree-vect-details" } */
+/* { dg-final { scan-tree-dump-times {(?n)add new stmt:.*MEM <vector\(16\) _Float16>} 1 "vect" } } */
+/* { dg-require-effective-target avx512fp16 } */
+
+#include <string.h>
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+
+#define N 10000
+
+void
+__attribute__((noipa))
+foo_ph (_Complex _Float16* a)
+{
+ for (int i = 0; i != N; i++)
+ a[i] = 1.0f16 + 2.0f16i;
+}
+
+static void
+do_test (void)
+{
+ _Complex _Float16 ph_src = 1.0f16 + 2.0f16i;
+ _Complex _Float16* ph_dst = (_Complex _Float16*) malloc (2 * N * sizeof (_Float16));
+
+ __builtin_memset (ph_dst, 0, 2 * N * sizeof (_Float16));
+
+ foo_ph (ph_dst);
+ for (int i = 0; i != N; i++)
+ {
+ if (ph_dst[i] != ph_src)
+ __builtin_abort ();
+ }
+}
new file mode 100644
@@ -0,0 +1,89 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx2 -fvect-cost-model=unlimited -fdump-tree-vect-details" } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 6 "vect" } } */
+
+typedef struct { _Complex double c; double a1; double a2;}
+ cdf;
+typedef struct { _Complex double c; double a1; double a2; double a3; double a4;}
+ cdf2;
+typedef struct { _Complex double c1; _Complex double c2; double a1; double a2; double a3; double a4;}
+ cdf3;
+typedef struct { _Complex double c1; _Complex double c2; double a1; double a2;}
+ cdf4;
+
+#define N 100
+/* VMAT_ELEMENTWISE. */
+void
+__attribute__((noipa))
+foo (cdf* a, cdf* __restrict b)
+{
+ for (int i = 0; i < N; ++i)
+ {
+ a[i].c = b[i].c;
+ a[i].a1 = b[i].a1;
+ a[i].a2 = b[i].a2;
+ }
+}
+
+/* VMAT_CONTIGUOUS_PERMUTE. */
+void
+__attribute__((noipa))
+foo1 (cdf2* a, cdf2* __restrict b)
+{
+ for (int i = 0; i < N; ++i)
+ {
+ a[i].c = b[i].c;
+ a[i].a1 = b[i].a1;
+ a[i].a2 = b[i].a2;
+ a[i].a3 = b[i].a3;
+ a[i].a4 = b[i].a4;
+ }
+}
+
+/* VMAT_CONTIGUOUS. */
+void
+__attribute__((noipa))
+foo2 (cdf3* a, cdf3* __restrict b)
+{
+ for (int i = 0; i < N; ++i)
+ {
+ a[i].c1 = b[i].c1;
+ a[i].c2 = b[i].c2;
+ a[i].a1 = b[i].a1;
+ a[i].a2 = b[i].a2;
+ a[i].a3 = b[i].a3;
+ a[i].a4 = b[i].a4;
+ }
+}
+
+/* VMAT_STRIDED_SLP. */
+void
+__attribute__((noipa))
+foo3 (cdf4* a, cdf4* __restrict b)
+{
+ for (int i = 0; i < N; ++i)
+ {
+ a[i].c1 = b[i].c1;
+ a[i].c2 = b[i].c2;
+ a[i].a1 = b[i].a1;
+ a[i].a2 = b[i].a2;
+ }
+}
+
+/* VMAT_CONTIGUOUS_REVERSE. */
+void
+__attribute__((noipa))
+foo4 (_Complex double* a, _Complex double* __restrict b)
+{
+ for (int i = 0; i != N; i++)
+ a[i] = b[N-i-1];
+}
+
+/* VMAT_CONTIGUOUS_DOWN. */
+void
+__attribute__((noipa))
+foo5 (_Complex double* a, _Complex double* __restrict b)
+{
+ for (int i = 0; i != N; i++)
+ a[N-i-1] = b[0];
+}
new file mode 100644
@@ -0,0 +1,90 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -msse2 -fvect-cost-model=unlimited" } */
+/* { dg-require-effective-target sse2 } */
+
+#include <string.h>
+#include "sse2-check.h"
+#include "pr106010-9a.c"
+
+static void
+sse2_test (void)
+{
+ _Complex double* pd_src = (_Complex double*) malloc (N * sizeof (_Complex double));
+ _Complex double* pd_dst = (_Complex double*) malloc (N * sizeof (_Complex double));
+ _Complex double* pd_src2 = (_Complex double*) malloc (N * sizeof (_Complex double));
+ _Complex double* pd_dst2 = (_Complex double*) malloc (N * sizeof (_Complex double));
+ cdf* cdf_src = (cdf*) malloc (N * sizeof (cdf));
+ cdf* cdf_dst = (cdf*) malloc (N * sizeof (cdf));
+ cdf2* cdf2_src = (cdf2*) malloc (N * sizeof (cdf2));
+ cdf2* cdf2_dst = (cdf2*) malloc (N * sizeof (cdf2));
+ cdf3* cdf3_src = (cdf3*) malloc (N * sizeof (cdf3));
+ cdf3* cdf3_dst = (cdf3*) malloc (N * sizeof (cdf3));
+ cdf4* cdf4_src = (cdf4*) malloc (N * sizeof (cdf4));
+ cdf4* cdf4_dst = (cdf4*) malloc (N * sizeof (cdf4));
+
+ char* p_init = (char*) malloc (N * sizeof (cdf3));
+
+ __builtin_memset (cdf_dst, 0, N * sizeof (cdf));
+ __builtin_memset (cdf2_dst, 0, N * sizeof (cdf2));
+ __builtin_memset (cdf3_dst, 0, N * sizeof (cdf3));
+ __builtin_memset (cdf4_dst, 0, N * sizeof (cdf4));
+ __builtin_memset (pd_dst, 0, N * sizeof (_Complex double));
+ __builtin_memset (pd_dst2, 0, N * sizeof (_Complex double));
+
+ for (int i = 0; i != N * sizeof (cdf3); i++)
+ p_init[i] = i;
+
+ memcpy (cdf_src, p_init, N * sizeof (cdf));
+ memcpy (cdf2_src, p_init, N * sizeof (cdf2));
+ memcpy (cdf3_src, p_init, N * sizeof (cdf3));
+ memcpy (cdf4_src, p_init, N * sizeof (cdf4));
+ memcpy (pd_src, p_init, N * sizeof (_Complex double));
+ for (int i = 0; i != 2 * N * sizeof (double); i++)
+ p_init[i] = i % 16;
+ memcpy (pd_src2, p_init, N * sizeof (_Complex double));
+
+ foo (cdf_dst, cdf_src);
+ foo1 (cdf2_dst, cdf2_src);
+ foo2 (cdf3_dst, cdf3_src);
+ foo3 (cdf4_dst, cdf4_src);
+ foo4 (pd_dst, pd_src);
+ foo5 (pd_dst2, pd_src2);
+ for (int i = 0; i != N; i++)
+ {
+ p_init[(N - i - 1) * 16] = i * 16;
+ p_init[(N - i - 1) * 16 + 1] = i * 16 + 1;
+ p_init[(N - i - 1) * 16 + 2] = i * 16 + 2;
+ p_init[(N - i - 1) * 16 + 3] = i * 16 + 3;
+ p_init[(N - i - 1) * 16 + 4] = i * 16 + 4;
+ p_init[(N - i - 1) * 16 + 5] = i * 16 + 5;
+ p_init[(N - i - 1) * 16 + 6] = i * 16 + 6;
+ p_init[(N - i - 1) * 16 + 7] = i * 16 + 7;
+ p_init[(N - i - 1) * 16 + 8] = i * 16 + 8;
+ p_init[(N - i - 1) * 16 + 9] = i * 16 + 9;
+ p_init[(N - i - 1) * 16 + 10] = i * 16 + 10;
+ p_init[(N - i - 1) * 16 + 11] = i * 16 + 11;
+ p_init[(N - i - 1) * 16 + 12] = i * 16 + 12;
+ p_init[(N - i - 1) * 16 + 13] = i * 16 + 13;
+ p_init[(N - i - 1) * 16 + 14] = i * 16 + 14;
+ p_init[(N - i - 1) * 16 + 15] = i * 16 + 15;
+ }
+ memcpy (pd_src, p_init, N * 16);
+
+ if (__builtin_memcmp (pd_dst, pd_src, N * 2 * sizeof (double)) != 0)
+ __builtin_abort ();
+
+ if (__builtin_memcmp (pd_dst2, pd_src2, N * 2 * sizeof (double)) != 0)
+ __builtin_abort ();
+
+ if (__builtin_memcmp (cdf_dst, cdf_src, N * sizeof (cdf)) != 0)
+ __builtin_abort ();
+
+ if (__builtin_memcmp (cdf2_dst, cdf2_src, N * sizeof (cdf2)) != 0)
+ __builtin_abort ();
+
+ if (__builtin_memcmp (cdf3_dst, cdf3_src, N * sizeof (cdf3)) != 0)
+ __builtin_abort ();
+
+ if (__builtin_memcmp (cdf4_dst, cdf4_src, N * sizeof (cdf4)) != 0)
+ __builtin_abort ();
+}
new file mode 100644
@@ -0,0 +1,90 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -mavx2 -fvect-cost-model=unlimited" } */
+/* { dg-require-effective-target avx2 } */
+
+#include <string.h>
+#include "avx2-check.h"
+#include "pr106010-9a.c"
+
+static void
+avx2_test (void)
+{
+ _Complex double* pd_src = (_Complex double*) malloc (N * sizeof (_Complex double));
+ _Complex double* pd_dst = (_Complex double*) malloc (N * sizeof (_Complex double));
+ _Complex double* pd_src2 = (_Complex double*) malloc (N * sizeof (_Complex double));
+ _Complex double* pd_dst2 = (_Complex double*) malloc (N * sizeof (_Complex double));
+ cdf* cdf_src = (cdf*) malloc (N * sizeof (cdf));
+ cdf* cdf_dst = (cdf*) malloc (N * sizeof (cdf));
+ cdf2* cdf2_src = (cdf2*) malloc (N * sizeof (cdf2));
+ cdf2* cdf2_dst = (cdf2*) malloc (N * sizeof (cdf2));
+ cdf3* cdf3_src = (cdf3*) malloc (N * sizeof (cdf3));
+ cdf3* cdf3_dst = (cdf3*) malloc (N * sizeof (cdf3));
+ cdf4* cdf4_src = (cdf4*) malloc (N * sizeof (cdf4));
+ cdf4* cdf4_dst = (cdf4*) malloc (N * sizeof (cdf4));
+
+ char* p_init = (char*) malloc (N * sizeof (cdf3));
+
+ __builtin_memset (cdf_dst, 0, N * sizeof (cdf));
+ __builtin_memset (cdf2_dst, 0, N * sizeof (cdf2));
+ __builtin_memset (cdf3_dst, 0, N * sizeof (cdf3));
+ __builtin_memset (cdf4_dst, 0, N * sizeof (cdf4));
+ __builtin_memset (pd_dst, 0, N * sizeof (_Complex double));
+ __builtin_memset (pd_dst2, 0, N * sizeof (_Complex double));
+
+ for (int i = 0; i != N * sizeof (cdf3); i++)
+ p_init[i] = i;
+
+ memcpy (cdf_src, p_init, N * sizeof (cdf));
+ memcpy (cdf2_src, p_init, N * sizeof (cdf2));
+ memcpy (cdf3_src, p_init, N * sizeof (cdf3));
+ memcpy (cdf4_src, p_init, N * sizeof (cdf4));
+ memcpy (pd_src, p_init, N * sizeof (_Complex double));
+ for (int i = 0; i != 2 * N * sizeof (double); i++)
+ p_init[i] = i % 16;
+ memcpy (pd_src2, p_init, N * sizeof (_Complex double));
+
+ foo (cdf_dst, cdf_src);
+ foo1 (cdf2_dst, cdf2_src);
+ foo2 (cdf3_dst, cdf3_src);
+ foo3 (cdf4_dst, cdf4_src);
+ foo4 (pd_dst, pd_src);
+ foo5 (pd_dst2, pd_src2);
+ for (int i = 0; i != N; i++)
+ {
+ p_init[(N - i - 1) * 16] = i * 16;
+ p_init[(N - i - 1) * 16 + 1] = i * 16 + 1;
+ p_init[(N - i - 1) * 16 + 2] = i * 16 + 2;
+ p_init[(N - i - 1) * 16 + 3] = i * 16 + 3;
+ p_init[(N - i - 1) * 16 + 4] = i * 16 + 4;
+ p_init[(N - i - 1) * 16 + 5] = i * 16 + 5;
+ p_init[(N - i - 1) * 16 + 6] = i * 16 + 6;
+ p_init[(N - i - 1) * 16 + 7] = i * 16 + 7;
+ p_init[(N - i - 1) * 16 + 8] = i * 16 + 8;
+ p_init[(N - i - 1) * 16 + 9] = i * 16 + 9;
+ p_init[(N - i - 1) * 16 + 10] = i * 16 + 10;
+ p_init[(N - i - 1) * 16 + 11] = i * 16 + 11;
+ p_init[(N - i - 1) * 16 + 12] = i * 16 + 12;
+ p_init[(N - i - 1) * 16 + 13] = i * 16 + 13;
+ p_init[(N - i - 1) * 16 + 14] = i * 16 + 14;
+ p_init[(N - i - 1) * 16 + 15] = i * 16 + 15;
+ }
+ memcpy (pd_src, p_init, N * 16);
+
+ if (__builtin_memcmp (pd_dst, pd_src, N * 2 * sizeof (double)) != 0)
+ __builtin_abort ();
+
+ if (__builtin_memcmp (pd_dst2, pd_src2, N * 2 * sizeof (double)) != 0)
+ __builtin_abort ();
+
+ if (__builtin_memcmp (cdf_dst, cdf_src, N * sizeof (cdf)) != 0)
+ __builtin_abort ();
+
+ if (__builtin_memcmp (cdf2_dst, cdf2_src, N * sizeof (cdf2)) != 0)
+ __builtin_abort ();
+
+ if (__builtin_memcmp (cdf3_dst, cdf3_src, N * sizeof (cdf3)) != 0)
+ __builtin_abort ();
+
+ if (__builtin_memcmp (cdf4_dst, cdf4_src, N * sizeof (cdf4)) != 0)
+ __builtin_abort ();
+}
new file mode 100644
@@ -0,0 +1,92 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -mavx512f -mavx512vl -fvect-cost-model=unlimited -mprefer-vector-width=512" } */
+/* { dg-require-effective-target avx512f } */
+
+#include <string.h>
+#include <stdlib.h>
+#define AVX512F
+#include "avx512-check.h"
+#include "pr106010-9a.c"
+
+static void
+test_512 (void)
+{
+ _Complex double* pd_src = (_Complex double*) malloc (N * sizeof (_Complex double));
+ _Complex double* pd_dst = (_Complex double*) malloc (N * sizeof (_Complex double));
+ _Complex double* pd_src2 = (_Complex double*) malloc (N * sizeof (_Complex double));
+ _Complex double* pd_dst2 = (_Complex double*) malloc (N * sizeof (_Complex double));
+ cdf* cdf_src = (cdf*) malloc (N * sizeof (cdf));
+ cdf* cdf_dst = (cdf*) malloc (N * sizeof (cdf));
+ cdf2* cdf2_src = (cdf2*) malloc (N * sizeof (cdf2));
+ cdf2* cdf2_dst = (cdf2*) malloc (N * sizeof (cdf2));
+ cdf3* cdf3_src = (cdf3*) malloc (N * sizeof (cdf3));
+ cdf3* cdf3_dst = (cdf3*) malloc (N * sizeof (cdf3));
+ cdf4* cdf4_src = (cdf4*) malloc (N * sizeof (cdf4));
+ cdf4* cdf4_dst = (cdf4*) malloc (N * sizeof (cdf4));
+
+ char* p_init = (char*) malloc (N * sizeof (cdf3));
+
+ __builtin_memset (cdf_dst, 0, N * sizeof (cdf));
+ __builtin_memset (cdf2_dst, 0, N * sizeof (cdf2));
+ __builtin_memset (cdf3_dst, 0, N * sizeof (cdf3));
+ __builtin_memset (cdf4_dst, 0, N * sizeof (cdf4));
+ __builtin_memset (pd_dst, 0, N * sizeof (_Complex double));
+ __builtin_memset (pd_dst2, 0, N * sizeof (_Complex double));
+
+ for (int i = 0; i != N * sizeof (cdf3); i++)
+ p_init[i] = i;
+
+ memcpy (cdf_src, p_init, N * sizeof (cdf));
+ memcpy (cdf2_src, p_init, N * sizeof (cdf2));
+ memcpy (cdf3_src, p_init, N * sizeof (cdf3));
+ memcpy (cdf4_src, p_init, N * sizeof (cdf4));
+ memcpy (pd_src, p_init, N * sizeof (_Complex double));
+ for (int i = 0; i != 2 * N * sizeof (double); i++)
+ p_init[i] = i % 16;
+ memcpy (pd_src2, p_init, N * sizeof (_Complex double));
+
+ foo (cdf_dst, cdf_src);
+ foo1 (cdf2_dst, cdf2_src);
+ foo2 (cdf3_dst, cdf3_src);
+ foo3 (cdf4_dst, cdf4_src);
+ foo4 (pd_dst, pd_src);
+ foo5 (pd_dst2, pd_src2);
+ for (int i = 0; i != N; i++)
+ {
+ p_init[(N - i - 1) * 16] = i * 16;
+ p_init[(N - i - 1) * 16 + 1] = i * 16 + 1;
+ p_init[(N - i - 1) * 16 + 2] = i * 16 + 2;
+ p_init[(N - i - 1) * 16 + 3] = i * 16 + 3;
+ p_init[(N - i - 1) * 16 + 4] = i * 16 + 4;
+ p_init[(N - i - 1) * 16 + 5] = i * 16 + 5;
+ p_init[(N - i - 1) * 16 + 6] = i * 16 + 6;
+ p_init[(N - i - 1) * 16 + 7] = i * 16 + 7;
+ p_init[(N - i - 1) * 16 + 8] = i * 16 + 8;
+ p_init[(N - i - 1) * 16 + 9] = i * 16 + 9;
+ p_init[(N - i - 1) * 16 + 10] = i * 16 + 10;
+ p_init[(N - i - 1) * 16 + 11] = i * 16 + 11;
+ p_init[(N - i - 1) * 16 + 12] = i * 16 + 12;
+ p_init[(N - i - 1) * 16 + 13] = i * 16 + 13;
+ p_init[(N - i - 1) * 16 + 14] = i * 16 + 14;
+ p_init[(N - i - 1) * 16 + 15] = i * 16 + 15;
+ }
+ memcpy (pd_src, p_init, N * 16);
+
+ if (__builtin_memcmp (pd_dst, pd_src, N * 2 * sizeof (double)) != 0)
+ __builtin_abort ();
+
+ if (__builtin_memcmp (pd_dst2, pd_src2, N * 2 * sizeof (double)) != 0)
+ __builtin_abort ();
+
+ if (__builtin_memcmp (cdf_dst, cdf_src, N * sizeof (cdf)) != 0)
+ __builtin_abort ();
+
+ if (__builtin_memcmp (cdf2_dst, cdf2_src, N * sizeof (cdf2)) != 0)
+ __builtin_abort ();
+
+ if (__builtin_memcmp (cdf3_dst, cdf3_src, N * sizeof (cdf3)) != 0)
+ __builtin_abort ();
+
+ if (__builtin_memcmp (cdf4_dst, cdf4_src, N * sizeof (cdf4)) != 0)
+ __builtin_abort ();
+}
@@ -1403,7 +1403,8 @@ vect_get_data_access_cost (vec_info *vinfo, dr_vec_info *dr_info,
if (PURE_SLP_STMT (stmt_info))
ncopies = 1;
else
- ncopies = vect_get_num_copies (loop_vinfo, STMT_VINFO_VECTYPE (stmt_info));
+ ncopies = vect_get_num_copies (loop_vinfo, STMT_VINFO_VECTYPE (stmt_info),
+ STMT_VINFO_COMPLEX_P (stmt_info));
if (DR_IS_READ (dr_info->dr))
vect_get_load_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
@@ -4597,8 +4598,22 @@ vect_analyze_data_refs (vec_info *vinfo, poly_uint64 *min_vf, bool *fatal)
/* Set vectype for STMT. */
scalar_type = TREE_TYPE (DR_REF (dr));
- tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
- if (!vectype)
+ tree adjust_scalar_type = scalar_type;
+ /* Support complex type access. Note that complex-type loads/stores
+ do not support gather/scatter. */
+ if (TREE_CODE (scalar_type) == COMPLEX_TYPE
+ && gatherscatter == SG_NONE)
+ {
+ adjust_scalar_type = TREE_TYPE (scalar_type);
+ STMT_VINFO_COMPLEX_P (stmt_info) = true;
+ }
+ tree vectype = get_vectype_for_scalar_type (vinfo, adjust_scalar_type);
+ unsigned HOST_WIDE_INT constant_nunits;
+ if (!vectype
+ /* For complex type, V1DI doesn't make sense. */
+ || (STMT_VINFO_COMPLEX_P (stmt_info)
+ && (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&constant_nunits)
+ || constant_nunits == 1)))
{
if (dump_enabled_p ())
{
@@ -4635,8 +4650,11 @@ vect_analyze_data_refs (vec_info *vinfo, poly_uint64 *min_vf, bool *fatal)
}
/* Adjust the minimal vectorization factor according to the
- vector type. */
+ vector type. Note for complex type, VF is half of
+ TYPE_VECTOR_SUBPARTS. */
vf = TYPE_VECTOR_SUBPARTS (vectype);
+ if (STMT_VINFO_COMPLEX_P (stmt_info))
+ vf = exact_div (vf, 2);
*min_vf = upper_bound (*min_vf, vf);
/* Leave the BB vectorizer to pick the vector type later, based on
@@ -6140,21 +6158,55 @@ vect_permute_load_chain (vec_info *vinfo, vec<tree> dr_chain,
vec_perm_indices indices;
for (k = 0; k < 3; k++)
{
- for (i = 0; i < nelt; i++)
- if (3 * i + k < 2 * nelt)
- sel[i] = 3 * i + k;
- else
- sel[i] = 0;
- indices.new_vector (sel, 2, nelt);
- perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices);
+ if (STMT_VINFO_COMPLEX_P (stmt_info))
+ {
+ for (i = 0; i < nelt / 2; i++)
+ if (6 * i + 2 * k + 1 < 2 * nelt)
+ {
+ sel[2 * i] = 6 * i + 2 * k;
+ sel[2 * i + 1] = 6 * i + 2 * k + 1;
+ }
+ else
+ {
+ sel[2 * i] = 0;
+ sel[2 * i + 1] = 0;
+ }
- for (i = 0, j = 0; i < nelt; i++)
- if (3 * i + k < 2 * nelt)
- sel[i] = i;
- else
- sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++);
- indices.new_vector (sel, 2, nelt);
- perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices);
+ indices.new_vector (sel, 2, nelt);
+ perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices);
+
+ for (i = 0, j = 0; i < nelt / 2; i++)
+ if (6 * i + 2 * k + 1 < 2 * nelt)
+ {
+ sel[2 * i] = 2 * i;
+ sel[2 * i + 1] = 2 * i + 1;
+ }
+ else
+ {
+ sel[2 * i] = nelt + ((nelt + 2 * k) % 6) + 6 * j;
+ sel[2 * i + 1] = nelt + ((nelt + 2 * k) % 6) + 6 * (j++) + 1;
+ }
+ indices.new_vector (sel, 2, nelt);
+ perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices);
+ }
+ else
+ {
+ for (i = 0; i < nelt; i++)
+ if (3 * i + k < 2 * nelt)
+ sel[i] = 3 * i + k;
+ else
+ sel[i] = 0;
+ indices.new_vector (sel, 2, nelt);
+ perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices);
+
+ for (i = 0, j = 0; i < nelt; i++)
+ if (3 * i + k < 2 * nelt)
+ sel[i] = i;
+ else
+ sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++);
+ indices.new_vector (sel, 2, nelt);
+ perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices);
+ }
first_vect = dr_chain[0];
second_vect = dr_chain[1];
@@ -6186,17 +6238,43 @@ vect_permute_load_chain (vec_info *vinfo, vec<tree> dr_chain,
/* The encoding has a single stepped pattern. */
poly_uint64 nelt = TYPE_VECTOR_SUBPARTS (vectype);
- vec_perm_builder sel (nelt, 1, 3);
- sel.quick_grow (3);
- for (i = 0; i < 3; ++i)
- sel[i] = i * 2;
- vec_perm_indices indices (sel, 2, nelt);
- perm_mask_even = vect_gen_perm_mask_checked (vectype, indices);
+ if (STMT_VINFO_COMPLEX_P (stmt_info))
+ {
+ vec_perm_builder sel;
+ unsigned neltc = nelt.to_constant ();
+ sel.new_vector (neltc, neltc, 1);
+ sel.quick_grow (neltc);
+ for (unsigned i = 0; i != neltc / 2; i++)
+ {
+ sel[2 * i] = i * 4;
+ sel[2 * i + 1] = i * 4 + 1;
+ }
+ vec_perm_indices indices (sel, 2, nelt);
+ perm_mask_even = vect_gen_perm_mask_checked (vectype, indices);
- for (i = 0; i < 3; ++i)
- sel[i] = i * 2 + 1;
- indices.new_vector (sel, 2, nelt);
- perm_mask_odd = vect_gen_perm_mask_checked (vectype, indices);
+ for (unsigned i = 0; i != neltc / 2; i++)
+ {
+ sel[2 * i] = i * 4 + 2;
+ sel[2 * i + 1] = i * 4 + 3;
+ }
+ indices.new_vector (sel, 2, nelt);
+ perm_mask_odd = vect_gen_perm_mask_checked (vectype, indices);
+ }
+ else
+ {
+ vec_perm_builder sel (nelt, 1, 3);
+ sel.quick_grow (3);
+ for (i = 0; i < 3; ++i)
+ sel[i] = i * 2;
+
+ vec_perm_indices indices (sel, 2, nelt);
+ perm_mask_even = vect_gen_perm_mask_checked (vectype, indices);
+
+ for (i = 0; i < 3; ++i)
+ sel[i] = i * 2 + 1;
+ indices.new_vector (sel, 2, nelt);
+ perm_mask_odd = vect_gen_perm_mask_checked (vectype, indices);
+ }
for (i = 0; i < log_length; i++)
{
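Reviewer note, not part of the patch: in the power-of-two path above, the complex variant of the even/odd masks selects whole (real, imaginary) pairs from the two concatenated input vectors instead of single lanes. A minimal sketch of the index arithmetic, assuming a constant lane count; `complex_even_odd_mask` is a hypothetical name used only for illustration.

```c
#include <assert.h>

/* Hypothetical helper (assumption, not in the patch): fill SEL with the
   selector for the even-numbered (WHICH == 0) or odd-numbered (WHICH == 1)
   complex elements of two concatenated NELT-lane vectors.  */
void
complex_even_odd_mask (unsigned nelt, unsigned which, unsigned *sel)
{
  assert (nelt % 2 == 0 && which < 2);
  for (unsigned i = 0; i != nelt / 2; i++)
    {
      sel[2 * i] = i * 4 + 2 * which;          /* real part */
      sel[2 * i + 1] = i * 4 + 2 * which + 1;  /* imaginary part */
    }
}
```

For `nelt == 8` this yields `{ 0, 1, 4, 5, 8, 9, 12, 13 }` for the even mask and `{ 2, 3, 6, 7, 10, 11, 14, 15 }` for the odd mask, matching the `i * 4` and `i * 4 + 2` stepping in the hunk.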
@@ -200,7 +200,12 @@ vect_determine_vf_for_stmt_1 (vec_info *vinfo, stmt_vec_info stmt_info,
}
if (nunits_vectype)
- vect_update_max_nunits (vf, nunits_vectype);
+ {
+ poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (nunits_vectype);
+ if (STMT_VINFO_COMPLEX_P (stmt_info))
+ nunits = exact_div (nunits, 2);
+ vect_update_max_nunits (vf, nunits);
+ }
return opt_result::success ();
}
@@ -877,10 +877,14 @@ vect_record_max_nunits (vec_info *vinfo, stmt_vec_info stmt_info,
return false;
}
+ poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+ if (STMT_VINFO_COMPLEX_P (stmt_info))
+ nunits = exact_div (nunits, 2);
+
/* If populating the vector type requires unrolling then fail
before adjusting *max_nunits for basic-block vectorization. */
if (is_a <bb_vec_info> (vinfo)
- && !multiple_p (group_size, TYPE_VECTOR_SUBPARTS (vectype)))
+ && !multiple_p (group_size, nunits))
{
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -891,7 +895,7 @@ vect_record_max_nunits (vec_info *vinfo, stmt_vec_info stmt_info,
}
/* In case of multiple types we need to detect the smallest type. */
- vect_update_max_nunits (max_nunits, vectype);
+ vect_update_max_nunits (max_nunits, nunits);
return true;
}
@@ -3720,22 +3724,54 @@ vect_optimize_slp (vec_info *vinfo)
vect_attempt_slp_rearrange_stmts did. This allows us to be lazy
when permuting constants and invariants keeping the permute
bijective. */
- auto_sbitmap load_index (SLP_TREE_LANES (node));
- bitmap_clear (load_index);
- for (unsigned j = 0; j < SLP_TREE_LANES (node); ++j)
- bitmap_set_bit (load_index, SLP_TREE_LOAD_PERMUTATION (node)[j] - imin);
- unsigned j;
- for (j = 0; j < SLP_TREE_LANES (node); ++j)
- if (!bitmap_bit_p (load_index, j))
- break;
- if (j != SLP_TREE_LANES (node))
- continue;
+ /* Permutation of Complex type. */
+ if (STMT_VINFO_COMPLEX_P (dr_stmt))
+ {
+ auto_sbitmap load_index (SLP_TREE_LANES (node) * 2);
+ bitmap_clear (load_index);
+ for (unsigned j = 0; j < SLP_TREE_LANES (node); ++j)
+ {
+ unsigned bit = SLP_TREE_LOAD_PERMUTATION (node)[j] - imin;
+ bitmap_set_bit (load_index, 2 * bit);
+ bitmap_set_bit (load_index, 2 * bit + 1);
+ }
+ unsigned j;
+ for (j = 0; j < SLP_TREE_LANES (node) * 2; ++j)
+ if (!bitmap_bit_p (load_index, j))
+ break;
+ if (j != SLP_TREE_LANES (node) * 2)
+ continue;
- vec<unsigned> perm = vNULL;
- perm.safe_grow (SLP_TREE_LANES (node), true);
- for (unsigned j = 0; j < SLP_TREE_LANES (node); ++j)
- perm[j] = SLP_TREE_LOAD_PERMUTATION (node)[j] - imin;
- perms.safe_push (perm);
+ vec<unsigned> perm = vNULL;
+ perm.safe_grow (SLP_TREE_LANES (node) * 2, true);
+ for (unsigned j = 0; j < SLP_TREE_LANES (node); ++j)
+ {
+ unsigned cidx = SLP_TREE_LOAD_PERMUTATION (node)[j] - imin;
+ perm[2 * j] = 2 * cidx;
+ perm[2 * j + 1] = 2 * cidx + 1;
+ }
+ perms.safe_push (perm);
+ }
+ else
+ {
+ auto_sbitmap load_index (SLP_TREE_LANES (node));
+ bitmap_clear (load_index);
+ for (unsigned j = 0; j < SLP_TREE_LANES (node); ++j)
+ bitmap_set_bit (load_index,
+ SLP_TREE_LOAD_PERMUTATION (node)[j] - imin);
+ unsigned j;
+ for (j = 0; j < SLP_TREE_LANES (node); ++j)
+ if (!bitmap_bit_p (load_index, j))
+ break;
+ if (j != SLP_TREE_LANES (node))
+ continue;
+
+ vec<unsigned> perm = vNULL;
+ perm.safe_grow (SLP_TREE_LANES (node), true);
+ for (unsigned j = 0; j < SLP_TREE_LANES (node); ++j)
+ perm[j] = SLP_TREE_LOAD_PERMUTATION (node)[j] - imin;
+ perms.safe_push (perm);
+ }
vertices[idx].perm_in = perms.length () - 1;
vertices[idx].perm_out = perms.length () - 1;
}
@@ -4518,6 +4554,12 @@ vect_slp_analyze_node_operations_1 (vec_info *vinfo, slp_tree node,
vf = loop_vinfo->vectorization_factor;
else
vf = 1;
+ /* For complex type and SLP, double vf to get the right vectype,
+ i.e. for vector(4) double holding complex double with group size 2,
+ double vf so that vf * group_size maps to TYPE_VECTOR_SUBPARTS. */
+ if (STMT_VINFO_COMPLEX_P (stmt_info))
+ vf *= 2;
+
unsigned int group_size = SLP_TREE_LANES (node);
tree vectype = SLP_TREE_VECTYPE (node);
SLP_TREE_NUMBER_OF_VEC_STMTS (node)
@@ -4763,10 +4805,17 @@ vect_slp_analyze_node_operations (vec_info *vinfo, slp_tree node,
}
unsigned group_size = SLP_TREE_LANES (child);
poly_uint64 vf = 1;
+
if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
vf = loop_vinfo->vectorization_factor;
+
+ /* V2SF holds just one complex element, so multiply by 2
+ to get the real number of vectors. */
+ unsigned cp
+ = STMT_VINFO_COMPLEX_P (SLP_TREE_REPRESENTATIVE (node)) ? 2 : 1;
+
SLP_TREE_NUMBER_OF_VEC_STMTS (child)
- = vect_get_num_vectors (vf * group_size, vector_type);
+ = vect_get_num_vectors (vf * group_size * cp, vector_type);
/* And cost them. */
vect_prologue_cost_for_slp (child, cost_vec);
}
@@ -6402,6 +6451,11 @@ vect_create_constant_vectors (vec_info *vinfo, slp_tree op_node)
/* We always want SLP_TREE_VECTYPE (op_node) here correctly set. */
vector_type = SLP_TREE_VECTYPE (op_node);
+ unsigned int cp = 1;
+ /* Handle Complex type vector init.
+ SLP_TREE_REPRESENTATIVE (op_node) could be NULL. */
+ if (TREE_CODE (TREE_TYPE (op_node->ops[0])) == COMPLEX_TYPE)
+ cp = 2;
unsigned int number_of_vectors = SLP_TREE_NUMBER_OF_VEC_STMTS (op_node);
SLP_TREE_VEC_DEFS (op_node).create (number_of_vectors);
@@ -6426,9 +6480,9 @@ vect_create_constant_vectors (vec_info *vinfo, slp_tree op_node)
/* When using duplicate_and_interleave, we just need one element for
each scalar statement. */
if (!TYPE_VECTOR_SUBPARTS (vector_type).is_constant (&nunits))
- nunits = group_size;
+ nunits = group_size * cp;
- number_of_copies = nunits * number_of_vectors / group_size;
+ number_of_copies = nunits * number_of_vectors / (group_size * cp);
number_of_places_left_in_vector = nunits;
constant_p = true;
@@ -6460,8 +6514,23 @@ vect_create_constant_vectors (vec_info *vinfo, slp_tree op_node)
gcc_unreachable ();
}
else
- op = fold_unary (VIEW_CONVERT_EXPR,
- TREE_TYPE (vector_type), op);
+ {
+ tree scalar_type = TREE_TYPE (vector_type);
+ /* For complex type, insert real and imag part
+ separately. */
+ if (cp == 2)
+ {
+ gcc_assert ((TREE_CODE (TREE_TYPE (op))
+ == COMPLEX_TYPE)
+ && (scalar_type
+ == TREE_TYPE (TREE_TYPE (op))));
+ elts[number_of_places_left_in_vector--]
+ = fold_unary (IMAGPART_EXPR, scalar_type, op);
+ op = fold_unary (REALPART_EXPR, scalar_type, op);
+ }
+ else
+ op = fold_unary (VIEW_CONVERT_EXPR, scalar_type, op);
+ }
gcc_assert (op && CONSTANT_CLASS_P (op));
}
else
@@ -6481,11 +6550,28 @@ vect_create_constant_vectors (vec_info *vinfo, slp_tree op_node)
}
else
{
- op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (vector_type),
- op);
- init_stmt
- = gimple_build_assign (new_temp, VIEW_CONVERT_EXPR,
- op);
+ tree scalar_type = TREE_TYPE (vector_type);
+ if (cp == 2)
+ {
+ gcc_assert ((TREE_CODE (TREE_TYPE (op))
+ == COMPLEX_TYPE)
+ && (scalar_type
+ == TREE_TYPE (TREE_TYPE (op))));
+ tree imag = build1 (IMAGPART_EXPR, scalar_type, op);
+ op = build1 (REALPART_EXPR, scalar_type, op);
+ tree imag_temp = make_ssa_name (scalar_type);
+ elts[number_of_places_left_in_vector--] = imag_temp;
+ init_stmt = gimple_build_assign (imag_temp, imag);
+ gimple_seq_add_stmt (&ctor_seq, init_stmt);
+ init_stmt = gimple_build_assign (new_temp, op);
+ }
+ else
+ {
+ op = build1 (VIEW_CONVERT_EXPR, scalar_type, op);
+ init_stmt
+ = gimple_build_assign (new_temp, VIEW_CONVERT_EXPR,
+ op);
+ }
}
gimple_seq_add_stmt (&ctor_seq, init_stmt);
op = new_temp;
@@ -6696,15 +6782,17 @@ vect_transform_slp_perm_load (vec_info *vinfo,
unsigned int nelts_to_build;
unsigned int nvectors_per_build;
unsigned int in_nlanes;
+ unsigned int cp = STMT_VINFO_COMPLEX_P (stmt_info) ? 2 : 1;
bool repeating_p = (group_size == DR_GROUP_SIZE (stmt_info)
- && multiple_p (nunits, group_size));
+ && multiple_p (nunits, group_size * cp));
if (repeating_p)
{
/* A single vector contains a whole number of copies of the node, so:
(a) all permutes can use the same mask; and
(b) the permutes only need a single vector input. */
- mask.new_vector (nunits, group_size, 3);
- nelts_to_build = mask.encoded_nelts ();
+ /* For complex type, the mask size is double nelts_to_build. */
+ mask.new_vector (nunits, group_size * cp, 3);
+ nelts_to_build = mask.encoded_nelts () / cp;
nvectors_per_build = SLP_TREE_VEC_STMTS (node).length ();
in_nlanes = DR_GROUP_SIZE (stmt_info) * 3;
}
@@ -6744,8 +6832,8 @@ vect_transform_slp_perm_load (vec_info *vinfo,
{
/* Enforced before the loop when !repeating_p. */
unsigned int const_nunits = nunits.to_constant ();
- vec_index = i / const_nunits;
- mask_element = i % const_nunits;
+ vec_index = i / (const_nunits / cp);
+ mask_element = i % (const_nunits / cp);
if (vec_index == first_vec_index
|| first_vec_index == -1)
{
@@ -6755,7 +6843,7 @@ vect_transform_slp_perm_load (vec_info *vinfo,
|| second_vec_index == -1)
{
second_vec_index = vec_index;
- mask_element += const_nunits;
+ mask_element += (const_nunits / cp);
}
else
{
@@ -6768,14 +6856,24 @@ vect_transform_slp_perm_load (vec_info *vinfo,
return false;
}
- gcc_assert (mask_element < 2 * const_nunits);
+ gcc_assert (mask_element < 2 * const_nunits / cp);
}
if (mask_element != index)
noop_p = false;
- mask[index++] = mask_element;
+ /* Set the indices for complex type,
+ i.e. a mask like [1, 0] is actually [2, 3, 0, 1]
+ in terms of the vector's scalar elements. */
+ if (cp == 2)
+ {
+ mask[2 * index] = 2 * mask_element;
+ mask[2 * index + 1] = 2 * mask_element + 1;
+ }
+ else
+ mask[index] = mask_element;
+ index++;
- if (index == count && !noop_p)
+ if (index * cp == count && !noop_p)
{
indices.new_vector (mask, second_vec_index == -1 ? 1 : 2, nunits);
if (!can_vec_perm_const_p (mode, mode, indices))
@@ -6799,7 +6897,7 @@ vect_transform_slp_perm_load (vec_info *vinfo,
++*n_perms;
}
- if (index == count)
+ if (index * cp == count)
{
if (!analyze_only)
{
@@ -6869,7 +6967,7 @@ vect_transform_slp_perm_load (vec_info *vinfo,
bool load_seen = false;
for (unsigned i = 0; i < in_nlanes; ++i)
{
- if (i % const_nunits == 0)
+ if (i % (const_nunits * cp) == 0)
{
if (load_seen)
*n_loads += 1;
@@ -1397,25 +1397,70 @@ vect_init_vector (vec_info *vinfo, stmt_vec_info stmt_info, tree val, tree type,
{
gimple *init_stmt;
tree new_temp;
+ tree scalar_type = TREE_TYPE (type);
+ gimple_seq stmts = NULL;
+
+ if (TREE_CODE (TREE_TYPE (val)) == COMPLEX_TYPE)
+ {
+ unsigned HOST_WIDE_INT nunits;
+ gcc_assert (TYPE_VECTOR_SUBPARTS (type).is_constant (&nunits));
+
+ tree_vector_builder elts (type, nunits, 1);
+ tree imag, real;
+ if (TREE_CODE (val) == COMPLEX_CST)
+ {
+ real = fold_unary (REALPART_EXPR, scalar_type, val);
+ imag = fold_unary (IMAGPART_EXPR, scalar_type, val);
+ }
+ else
+ {
+ real = make_ssa_name (scalar_type);
+ imag = make_ssa_name (scalar_type);
+ init_stmt
+ = gimple_build_assign (real,
+ build1 (REALPART_EXPR, scalar_type, val));
+ gimple_seq_add_stmt (&stmts, init_stmt);
+ init_stmt
+ = gimple_build_assign (imag,
+ build1 (IMAGPART_EXPR, scalar_type, val));
+ gimple_seq_add_stmt (&stmts, init_stmt);
+ }
+ /* Build vector as [real,imag,real,imag,...]. */
+ for (unsigned i = 0; i != nunits; i++)
+ {
+ if (i % 2)
+ elts.quick_push (imag);
+ else
+ elts.quick_push (real);
+ }
+ val = gimple_build_vector (&stmts, &elts);
+ if (!gimple_seq_empty_p (stmts))
+ {
+ if (gsi)
+ gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+ else
+ vinfo->insert_seq_on_entry (stmt_info, stmts);
+ }
+ }
/* We abuse this function to push sth to a SSA name with initial 'val'. */
- if (! useless_type_conversion_p (type, TREE_TYPE (val)))
+ else if (! useless_type_conversion_p (type, TREE_TYPE (val)))
{
gcc_assert (TREE_CODE (type) == VECTOR_TYPE);
- if (! types_compatible_p (TREE_TYPE (type), TREE_TYPE (val)))
+ if (! types_compatible_p (scalar_type, TREE_TYPE (val)))
{
/* Scalar boolean value should be transformed into
all zeros or all ones value before building a vector. */
if (VECTOR_BOOLEAN_TYPE_P (type))
{
- tree true_val = build_all_ones_cst (TREE_TYPE (type));
- tree false_val = build_zero_cst (TREE_TYPE (type));
+ tree true_val = build_all_ones_cst (scalar_type);
+ tree false_val = build_zero_cst (scalar_type);
if (CONSTANT_CLASS_P (val))
val = integer_zerop (val) ? false_val : true_val;
else
{
- new_temp = make_ssa_name (TREE_TYPE (type));
+ new_temp = make_ssa_name (scalar_type);
init_stmt = gimple_build_assign (new_temp, COND_EXPR,
val, true_val, false_val);
vect_init_vector_1 (vinfo, stmt_info, init_stmt, gsi);
@@ -1424,14 +1469,13 @@ vect_init_vector (vec_info *vinfo, stmt_vec_info stmt_info, tree val, tree type,
}
else
{
- gimple_seq stmts = NULL;
if (! INTEGRAL_TYPE_P (TREE_TYPE (val)))
val = gimple_build (&stmts, VIEW_CONVERT_EXPR,
- TREE_TYPE (type), val);
+ scalar_type, val);
else
/* ??? Condition vectorization expects us to do
promotion of invariant/external defs. */
- val = gimple_convert (&stmts, TREE_TYPE (type), val);
+ val = gimple_convert (&stmts, scalar_type, val);
for (gimple_stmt_iterator gsi2 = gsi_start (stmts);
!gsi_end_p (gsi2); )
{
@@ -1496,7 +1540,12 @@ vect_get_vec_defs_for_operand (vec_info *vinfo, stmt_vec_info stmt_vinfo,
&& VECTOR_BOOLEAN_TYPE_P (stmt_vectype))
vector_type = truth_type_for (stmt_vectype);
else
- vector_type = get_vectype_for_scalar_type (loop_vinfo, TREE_TYPE (op));
+ {
+ tree scalar_type = TREE_TYPE (op);
+ if (STMT_VINFO_COMPLEX_P (stmt_vinfo))
+ scalar_type = TREE_TYPE (scalar_type);
+ vector_type = get_vectype_for_scalar_type (loop_vinfo, scalar_type);
+ }
gcc_assert (vector_type);
tree vop = vect_init_vector (vinfo, stmt_vinfo, op, vector_type, NULL);
@@ -1892,6 +1941,13 @@ vect_truncate_gather_scatter_offset (stmt_vec_info stmt_info,
return false;
}
+ if (STMT_VINFO_COMPLEX_P (stmt_info))
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "Complex type doesn't support gather_scatter.\n");
+ return false;
+ }
/* Get the number of bits in an element. */
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
scalar_mode element_mode = SCALAR_TYPE_MODE (TREE_TYPE (vectype));
@@ -2022,6 +2078,30 @@ perm_mask_for_reverse (tree vectype)
return vect_gen_perm_mask_checked (vectype, indices);
}
+static tree
+perm_mask_for_reverse (tree vectype, bool complex_p)
+{
+ if (!complex_p)
+ return perm_mask_for_reverse (vectype);
+
+ unsigned HOST_WIDE_INT nunits;
+ gcc_assert (TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits));
+
+ /* Every element is encoded explicitly; pairs are reversed as units. */
+ vec_perm_builder sel (nunits, nunits, 1);
+ for (unsigned i = 0; i < nunits; i += 2)
+ {
+ sel.quick_push (nunits - 2 - i);
+ sel.quick_push (nunits - 1 - i);
+ }
+
+ vec_perm_indices indices (sel, 1, nunits);
+ if (!can_vec_perm_const_p (TYPE_MODE (vectype), TYPE_MODE (vectype),
+ indices))
+ return NULL_TREE;
+ return vect_gen_perm_mask_checked (vectype, indices);
+}
+
/* A subroutine of get_load_store_type, with a subset of the same
arguments. Handle the case where STMT_INFO is a load or store that
accesses consecutive elements with a negative step. Sets *POFFSET
@@ -2045,8 +2125,12 @@ get_negative_load_store_type (vec_info *vinfo,
}
/* For backward running DRs the first access in vectype actually is
- N-1 elements before the address of the DR. */
- *poffset = ((-TYPE_VECTOR_SUBPARTS (vectype) + 1)
+ N-1 elements before the address of the DR.
+ For complex type, it's N-2. */
+ unsigned cp = 1;
+ if (STMT_VINFO_COMPLEX_P (stmt_info))
+ cp = 2;
+ *poffset = ((-TYPE_VECTOR_SUBPARTS (vectype) + cp)
* TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype))));
int misalignment = dr_misalignment (dr_info, vectype, *poffset);
@@ -2071,7 +2155,7 @@ get_negative_load_store_type (vec_info *vinfo,
return VMAT_CONTIGUOUS_DOWN;
}
- if (!perm_mask_for_reverse (vectype))
+ if (!perm_mask_for_reverse (vectype, STMT_VINFO_COMPLEX_P (stmt_info)))
{
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -2188,6 +2272,8 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
&& !DR_GROUP_NEXT_ELEMENT (stmt_info));
unsigned HOST_WIDE_INT gap = DR_GROUP_GAP (first_stmt_info);
poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+ if (STMT_VINFO_COMPLEX_P (stmt_info))
+ nunits = exact_div (nunits, 2);
/* True if the vectorized statements would access beyond the last
statement in the group. */
@@ -2352,7 +2438,11 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
{
/* First cope with the degenerate case of a single-element
vector. */
- if (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 1U))
+ poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+ if (STMT_VINFO_COMPLEX_P (stmt_info))
+ nunits = exact_div (nunits, 2);
+
+ if (known_eq (nunits, 1U))
;
/* Otherwise try using LOAD/STORE_LANES. */
@@ -2361,6 +2451,8 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
: vect_store_lanes_supported (vectype, group_size,
masked_p))
{
+ if (STMT_VINFO_COMPLEX_P (stmt_info))
+ return false;
*memory_access_type = VMAT_LOAD_STORE_LANES;
overrun_p = would_overrun_p;
}
@@ -2620,6 +2712,14 @@ vect_check_scalar_mask (vec_info *vinfo, stmt_vec_info stmt_info,
return false;
}
+ if (STMT_VINFO_COMPLEX_P (stmt_info))
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "Complex type doesn't support mask argument.\n");
+ return false;
+ }
+
if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (*mask)))
{
if (dump_enabled_p ())
@@ -7509,8 +7609,17 @@ vectorizable_store (vec_info *vinfo,
same location twice. */
gcc_assert (slp == PURE_SLP_STMT (stmt_info));
+ if (!STMT_VINFO_DATA_REF (stmt_info))
+ return false;
+
tree vectype = STMT_VINFO_VECTYPE (stmt_info), rhs_vectype = NULL_TREE;
poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+ if (STMT_VINFO_COMPLEX_P (stmt_info))
+ {
+ if (!nunits.is_constant ())
+ return false;
+ nunits = exact_div (nunits, 2);
+ }
if (loop_vinfo)
{
@@ -7526,7 +7635,8 @@ vectorizable_store (vec_info *vinfo,
if (slp)
ncopies = 1;
else
- ncopies = vect_get_num_copies (loop_vinfo, vectype);
+ ncopies = vect_get_num_copies (loop_vinfo, vectype,
+ STMT_VINFO_COMPLEX_P (stmt_info));
gcc_assert (ncopies >= 1);
@@ -7544,11 +7654,10 @@ vectorizable_store (vec_info *vinfo,
return false;
elem_type = TREE_TYPE (vectype);
+ if (STMT_VINFO_COMPLEX_P (stmt_info))
+ elem_type = build_complex_type (elem_type);
vec_mode = TYPE_MODE (vectype);
- if (!STMT_VINFO_DATA_REF (stmt_info))
- return false;
-
vect_memory_access_type memory_access_type;
enum dr_alignment_support alignment_support_scheme;
int misalignment;
@@ -7951,21 +8060,31 @@ vectorizable_store (vec_info *vinfo,
tree lvectype = vectype;
if (slp)
{
+ scalar_mode elmode;
if (group_size < const_nunits
&& const_nunits % group_size == 0)
{
nstores = const_nunits / group_size;
- lnel = group_size;
- ltype = build_vector_type (elem_type, group_size);
+ if (STMT_VINFO_COMPLEX_P (stmt_info))
+ {
+ lnel = group_size * 2;
+ ltype = build_vector_type (TREE_TYPE (elem_type), group_size * 2);
+ elmode = SCALAR_TYPE_MODE (TREE_TYPE (elem_type));
+ }
+ else
+ {
+ ltype = build_vector_type (elem_type, group_size);
+ lnel = group_size;
+ elmode = SCALAR_TYPE_MODE (elem_type);
+ }
lvectype = vectype;
/* First check if vec_extract optab doesn't support extraction
of vector elts directly. */
- scalar_mode elmode = SCALAR_TYPE_MODE (elem_type);
machine_mode vmode;
if (!VECTOR_MODE_P (TYPE_MODE (vectype))
|| !related_vector_mode (TYPE_MODE (vectype), elmode,
- group_size).exists (&vmode)
+ lnel).exists (&vmode)
|| (convert_optab_handler (vec_extract_optab,
TYPE_MODE (vectype), vmode)
== CODE_FOR_nothing))
@@ -8051,6 +8170,8 @@ vectorizable_store (vec_info *vinfo,
unsigned int group_el = 0;
unsigned HOST_WIDE_INT
elsz = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
+ if (STMT_VINFO_COMPLEX_P (stmt_info))
+ elsz *= 2;
for (j = 0; j < ncopies; j++)
{
vec_oprnd = vec_oprnds[j];
@@ -8448,7 +8569,9 @@ vectorizable_store (vec_info *vinfo,
if (memory_access_type == VMAT_CONTIGUOUS_REVERSE)
{
- tree perm_mask = perm_mask_for_reverse (vectype);
+ tree perm_mask
+ = perm_mask_for_reverse (vectype,
+ STMT_VINFO_COMPLEX_P (stmt_info));
tree perm_dest = vect_create_destination_var
(vect_get_store_rhs (stmt_info), vectype);
tree new_temp = make_ssa_name (perm_dest);
@@ -8778,6 +8901,12 @@ vectorizable_load (vec_info *vinfo,
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+ if (STMT_VINFO_COMPLEX_P (stmt_info))
+ {
+ if (!nunits.is_constant ())
+ return false;
+ nunits = exact_div (nunits, 2);
+ }
if (loop_vinfo)
{
@@ -8794,7 +8923,8 @@ vectorizable_load (vec_info *vinfo,
if (slp)
ncopies = 1;
else
- ncopies = vect_get_num_copies (loop_vinfo, vectype);
+ ncopies = vect_get_num_copies (loop_vinfo, vectype,
+ STMT_VINFO_COMPLEX_P (stmt_info));
gcc_assert (ncopies >= 1);
@@ -8822,6 +8952,8 @@ vectorizable_load (vec_info *vinfo,
}
elem_type = TREE_TYPE (vectype);
+ if (STMT_VINFO_COMPLEX_P (stmt_info))
+ elem_type = build_complex_type (elem_type);
mode = TYPE_MODE (vectype);
/* FORNOW. In some cases can vectorize even if data-type not supported
@@ -8870,8 +9002,11 @@ vectorizable_load (vec_info *vinfo,
if (k > maxk)
maxk = k;
tree vectype = SLP_TREE_VECTYPE (slp_node);
+ /* For complex type, halve the nunits. */
if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits)
- || maxk >= (DR_GROUP_SIZE (group_info) & ~(nunits - 1)))
+ || maxk >= (DR_GROUP_SIZE (group_info)
+ & ~((STMT_VINFO_COMPLEX_P (group_info)
+ ? nunits >> 1 : nunits) - 1)))
{
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -9098,9 +9233,10 @@ vectorizable_load (vec_info *vinfo,
}
else
{
+ unsigned cp = STMT_VINFO_COMPLEX_P (stmt_info) ? 2 : 1;
if (grouped_load)
cst_offset
- = (tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (vectype)))
+ = (tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (vectype))) * cp
* vect_get_place_in_interleaving_chain (stmt_info,
first_stmt_info));
group_size = 1;
@@ -9150,6 +9286,8 @@ vectorizable_load (vec_info *vinfo,
int nloads = const_nunits;
int lnel = 1;
tree ltype = TREE_TYPE (vectype);
+ if (STMT_VINFO_COMPLEX_P (stmt_info))
+ ltype = build_complex_type (ltype);
tree lvectype = vectype;
auto_vec<tree> dr_chain;
if (memory_access_type == VMAT_STRIDED_SLP)
@@ -10080,7 +10218,9 @@ vectorizable_load (vec_info *vinfo,
if (memory_access_type == VMAT_CONTIGUOUS_REVERSE)
{
- tree perm_mask = perm_mask_for_reverse (vectype);
+ tree perm_mask
+ = perm_mask_for_reverse (vectype,
+ STMT_VINFO_COMPLEX_P (stmt_info));
new_temp = permute_vec_elements (vinfo, new_temp, new_temp,
perm_mask, stmt_info, gsi);
new_stmt = SSA_NAME_DEF_STMT (new_temp);
@@ -12499,12 +12639,27 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
dump_printf_loc (MSG_NOTE, vect_location,
"get vectype for scalar type: %T\n", scalar_type);
}
+
+ tree orig_scalar_type = scalar_type;
+ if (TREE_CODE (scalar_type) == COMPLEX_TYPE)
+ {
+ /* Set complex_p for BB vectorizer. */
+ STMT_VINFO_COMPLEX_P (stmt_info) = true;
+ scalar_type = TREE_TYPE (scalar_type);
+ /* Double group_size for the BB vectorizer so that the
+ following two get_vectype_for_scalar_type calls return the wanted
+ vectype. The real group size is unchanged; only the group_size
+ passed to those calls is faked. */
+ group_size *= 2;
+ }
vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
- if (!vectype)
+ if (!vectype
+ || (STMT_VINFO_COMPLEX_P (stmt_info)
+ && !TYPE_VECTOR_SUBPARTS (vectype).is_constant ()))
return opt_result::failure_at (stmt,
"not vectorized:"
" unsupported data-type %T\n",
- scalar_type);
+ orig_scalar_type);
if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
@@ -12529,16 +12684,30 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
TREE_TYPE (vectype));
if (scalar_type != TREE_TYPE (vectype))
{
- if (dump_enabled_p ())
+ tree orig_scalar_type = scalar_type;
+ if (TREE_CODE (scalar_type) == COMPLEX_TYPE)
+ {
+ /* Set complex_p for Loop vectorizer. */
+ STMT_VINFO_COMPLEX_P (stmt_info) = true;
+ scalar_type = TREE_TYPE (scalar_type);
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "get vectype for smallest complex component type: %T\n",
+ scalar_type);
+
+ }
+ else if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
"get vectype for smallest scalar type: %T\n",
scalar_type);
nunits_vectype = get_vectype_for_scalar_type (vinfo, scalar_type,
group_size);
- if (!nunits_vectype)
+ if (!nunits_vectype
+ || (STMT_VINFO_COMPLEX_P (stmt_info)
+ && !TYPE_VECTOR_SUBPARTS (nunits_vectype).is_constant ()))
return opt_result::failure_at
(stmt, "not vectorized: unsupported data-type %T\n",
- scalar_type);
+ orig_scalar_type);
if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location, "nunits vectype: %T\n",
nunits_vectype);
@@ -1161,6 +1161,9 @@ public:
vectorization. */
bool vectorizable;
+ /* True if the scalar type of the LHS of this statement is a complex type. */
+ bool complex_p;
+
/* The stmt to which this info struct refers to. */
gimple *stmt;
@@ -1395,6 +1398,7 @@ struct gather_scatter_info {
#define STMT_VINFO_REDUC_EPILOGUE_ADJUSTMENT(S) (S)->reduc_epilogue_adjustment
#define STMT_VINFO_REDUC_IDX(S) (S)->reduc_idx
#define STMT_VINFO_FORCE_SINGLE_CYCLE(S) (S)->force_single_cycle
+#define STMT_VINFO_COMPLEX_P(S) (S)->complex_p
#define STMT_VINFO_DR_WRT_VEC_LOOP(S) (S)->dr_wrt_vec_loop
#define STMT_VINFO_DR_BASE_ADDRESS(S) (S)->dr_wrt_vec_loop.base_address
@@ -1970,6 +1974,15 @@ vect_get_num_copies (loop_vec_info loop_vinfo, tree vectype)
return vect_get_num_vectors (LOOP_VINFO_VECT_FACTOR (loop_vinfo), vectype);
}
+static inline unsigned int
+vect_get_num_copies (loop_vec_info loop_vinfo, tree vectype, bool complex_p)
+{
+ poly_uint64 nunits = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+ if (complex_p)
+ nunits *= 2;
+ return vect_get_num_vectors (nunits, vectype);
+}
+
/* Update maximum unit count *MAX_NUNITS so that it accounts for
NUNITS. *MAX_NUNITS can be 1 if we haven't yet recorded anything. */