Optimize nested permutation to single VEC_PERM_EXPR [PR54346]

Message ID 20220926065604.783193-1-liwei.xu@intel.com
State Accepted, archived
Headers
Series Optimize nested permutation to single VEC_PERM_EXPR [PR54346] |

Checks

Context Check Description
snail/gcc-patches-check success Github commit url

Commit Message

Liwei Xu Sept. 26, 2022, 6:56 a.m. UTC
  This patch implemented the optimization in PR 54346, which Merges

	c = VEC_PERM_EXPR <a, b, VCST0>;
        d = VEC_PERM_EXPR <c, c, VCST1>;
                to
        d = VEC_PERM_EXPR <a, b, NEW_VCST>;

	Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
        tree-ssa/forwprop-19.c fail to pass but I'm not sure whether it
        is ok to removed it.

gcc/ChangeLog:

	PR target/54346
	* match.pd: Merge the index of VCST then generates the new vec_perm.

gcc/testsuite/ChangeLog:

	PR target/54346
	* gcc.dg/pr54346.c: New test.

Co-authored-by: liuhongt <hongtao.liu@intel.com>
---
 gcc/match.pd                   | 41 ++++++++++++++++++++++++++++++++++
 gcc/testsuite/gcc.dg/pr54346.c | 13 +++++++++++
 2 files changed, 54 insertions(+)
 create mode 100755 gcc/testsuite/gcc.dg/pr54346.c
  

Comments

Richard Biener Sept. 26, 2022, 8:21 a.m. UTC | #1
On Mon, Sep 26, 2022 at 8:58 AM Liwei Xu <liwei.xu@intel.com> wrote:
>
>         This patch implemented the optimization in PR 54346, which Merges
>
>         c = VEC_PERM_EXPR <a, b, VCST0>;
>         d = VEC_PERM_EXPR <c, c, VCST1>;
>                 to
>         d = VEC_PERM_EXPR <a, b, NEW_VCST>;
>
>         Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
>         tree-ssa/forwprop-19.c fail to pass but I'm not sure whether it
>         is ok to removed it.

Looks good, but leave Richard a chance to ask for VLA vector support which
might be trivial to do.

Btw, doesn't this handle the VEC_PERM + VEC_PERM case in
tree-ssa-forwprop.cc:simplify_permutation as well?  Note _that_ does
seem to handle VLA vectors.

Thanks,
Richard.

> gcc/ChangeLog:
>
>         PR target/54346
>         * match.pd: Merge the index of VCST then generates the new vec_perm.
>
> gcc/testsuite/ChangeLog:
>
>         PR target/54346
>         * gcc.dg/pr54346.c: New test.
>
> Co-authored-by: liuhongt <hongtao.liu@intel.com>
> ---
>  gcc/match.pd                   | 41 ++++++++++++++++++++++++++++++++++
>  gcc/testsuite/gcc.dg/pr54346.c | 13 +++++++++++
>  2 files changed, 54 insertions(+)
>  create mode 100755 gcc/testsuite/gcc.dg/pr54346.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 345bcb701a5..9219b0a10e1 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -8086,6 +8086,47 @@ and,
>    (minus (mult (vec_perm @1 @1 @3) @2) @4)))
>
>
> +/* (PR54346) Merge
> +   c = VEC_PERM_EXPR <a, b, VCST0>;
> +   d = VEC_PERM_EXPR <c, c, VCST1>;
> +   to
> +   d = VEC_PERM_EXPR <a, b, NEW_VCST>; */
> +
> +(simplify
> + (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
> + (with
> +  {
> +    if(!TYPE_VECTOR_SUBPARTS (type).is_constant())
> +      return NULL_TREE;
> +
> +    tree op0;
> +    machine_mode result_mode = TYPE_MODE (type);
> +    machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
> +    int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant();
> +    vec_perm_builder builder0;
> +    vec_perm_builder builder1;
> +    vec_perm_builder builder2 (nelts, nelts, 1);
> +
> +    if (!tree_to_vec_perm_builder (&builder0, @3)
> +    || !tree_to_vec_perm_builder (&builder1, @4))
> +      return NULL_TREE;
> +
> +    vec_perm_indices sel0 (builder0, 2, nelts);
> +    vec_perm_indices sel1 (builder1, 1, nelts);
> +
> +    for (int i = 0; i < nelts; i++)
> +      builder2.quick_push (sel0[sel1[i].to_constant()]);
> +
> +    vec_perm_indices sel2 (builder2, 2, nelts);
> +
> +    if (!can_vec_perm_const_p (result_mode, op_mode, sel2, false))
> +      return NULL_TREE;
> +
> +    op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
> +  }
> +  (vec_perm @1 @2 { op0; })))
> +
> +
>  /* Match count trailing zeroes for simplify_count_trailing_zeroes in fwprop.
>     The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
>     constant which when multiplied by a power of 2 contains a unique value
> diff --git a/gcc/testsuite/gcc.dg/pr54346.c b/gcc/testsuite/gcc.dg/pr54346.c
> new file mode 100755
> index 00000000000..d87dc3a79a5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr54346.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-dse1" } */
> +
> +typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
> +
> +void fun (veci a, veci b, veci *i)
> +{
> +  veci c = __builtin_shuffle (a, b, __extension__ (veci) {1, 4, 2, 7});
> +  *i = __builtin_shuffle (c, __extension__ (veci) { 7, 2, 1, 5 });
> +}
> +
> +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 3, 6, 0, 0 }" "dse1" } } */
> +/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "dse1" } } */
> \ No newline at end of file
> --
> 2.18.2
>
  
Richard Sandiford Sept. 30, 2022, 9:17 a.m. UTC | #2
Richard Biener <richard.guenther@gmail.com> writes:
> On Mon, Sep 26, 2022 at 8:58 AM Liwei Xu <liwei.xu@intel.com> wrote:
>>
>>         This patch implemented the optimization in PR 54346, which Merges
>>
>>         c = VEC_PERM_EXPR <a, b, VCST0>;
>>         d = VEC_PERM_EXPR <c, c, VCST1>;
>>                 to
>>         d = VEC_PERM_EXPR <a, b, NEW_VCST>;
>>
>>         Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
>>         tree-ssa/forwprop-19.c fail to pass but I'm not sure whether it
>>         is ok to removed it.
>
> Looks good, but leave Richard a chance to ask for VLA vector support which
> might be trivial to do.

Sorry for the slow reply.  It might be tricky to handle the general case,
so I'd suggest going with this for now and dealing with VLA as a follow-up.
(Probably after Prathamesh's changes to fold_vec_perm_expr.)

Thanks,
Richard

> Btw, doesn't this handle the VEC_PERM + VEC_PERM case in
> tree-ssa-forwprop.cc:simplify_permutation as well?  Note _that_ does
> seem to handle VLA vectors.
>
> Thanks,
> Richard.
>
>> gcc/ChangeLog:
>>
>>         PR target/54346
>>         * match.pd: Merge the index of VCST then generates the new vec_perm.
>>
>> gcc/testsuite/ChangeLog:
>>
>>         PR target/54346
>>         * gcc.dg/pr54346.c: New test.
>>
>> Co-authored-by: liuhongt <hongtao.liu@intel.com>
>> ---
>>  gcc/match.pd                   | 41 ++++++++++++++++++++++++++++++++++
>>  gcc/testsuite/gcc.dg/pr54346.c | 13 +++++++++++
>>  2 files changed, 54 insertions(+)
>>  create mode 100755 gcc/testsuite/gcc.dg/pr54346.c
>>
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index 345bcb701a5..9219b0a10e1 100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -8086,6 +8086,47 @@ and,
>>    (minus (mult (vec_perm @1 @1 @3) @2) @4)))
>>
>>
>> +/* (PR54346) Merge
>> +   c = VEC_PERM_EXPR <a, b, VCST0>;
>> +   d = VEC_PERM_EXPR <c, c, VCST1>;
>> +   to
>> +   d = VEC_PERM_EXPR <a, b, NEW_VCST>; */
>> +
>> +(simplify
>> + (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
>> + (with
>> +  {
>> +    if(!TYPE_VECTOR_SUBPARTS (type).is_constant())
>> +      return NULL_TREE;
>> +
>> +    tree op0;
>> +    machine_mode result_mode = TYPE_MODE (type);
>> +    machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
>> +    int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant();
>> +    vec_perm_builder builder0;
>> +    vec_perm_builder builder1;
>> +    vec_perm_builder builder2 (nelts, nelts, 1);
>> +
>> +    if (!tree_to_vec_perm_builder (&builder0, @3)
>> +    || !tree_to_vec_perm_builder (&builder1, @4))
>> +      return NULL_TREE;
>> +
>> +    vec_perm_indices sel0 (builder0, 2, nelts);
>> +    vec_perm_indices sel1 (builder1, 1, nelts);
>> +
>> +    for (int i = 0; i < nelts; i++)
>> +      builder2.quick_push (sel0[sel1[i].to_constant()]);
>> +
>> +    vec_perm_indices sel2 (builder2, 2, nelts);
>> +
>> +    if (!can_vec_perm_const_p (result_mode, op_mode, sel2, false))
>> +      return NULL_TREE;
>> +
>> +    op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
>> +  }
>> +  (vec_perm @1 @2 { op0; })))
>> +
>> +
>>  /* Match count trailing zeroes for simplify_count_trailing_zeroes in fwprop.
>>     The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
>>     constant which when multiplied by a power of 2 contains a unique value
>> diff --git a/gcc/testsuite/gcc.dg/pr54346.c b/gcc/testsuite/gcc.dg/pr54346.c
>> new file mode 100755
>> index 00000000000..d87dc3a79a5
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr54346.c
>> @@ -0,0 +1,13 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O -fdump-tree-dse1" } */
>> +
>> +typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
>> +
>> +void fun (veci a, veci b, veci *i)
>> +{
>> +  veci c = __builtin_shuffle (a, b, __extension__ (veci) {1, 4, 2, 7});
>> +  *i = __builtin_shuffle (c, __extension__ (veci) { 7, 2, 1, 5 });
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 3, 6, 0, 0 }" "dse1" } } */
>> +/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "dse1" } } */
>> \ No newline at end of file
>> --
>> 2.18.2
>>
  
Li, Pan2 via Gcc-patches Sept. 30, 2022, 9:36 a.m. UTC | #3
Hi Richard 

Sure, I’ll also check the VLA case once I’m back from vacation, thank you for your help.

Best Regards
Liwei

> On 30 Sep 2022, at 5:17 pm, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Mon, Sep 26, 2022 at 8:58 AM Liwei Xu <liwei.xu@intel.com> wrote:
>>> 
>>>        This patch implemented the optimization in PR 54346, which Merges
>>> 
>>>        c = VEC_PERM_EXPR <a, b, VCST0>;
>>>        d = VEC_PERM_EXPR <c, c, VCST1>;
>>>                to
>>>        d = VEC_PERM_EXPR <a, b, NEW_VCST>;
>>> 
>>>        Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
>>>        tree-ssa/forwprop-19.c fail to pass but I'm not sure whether it
>>>        is ok to removed it.
>> 
>> Looks good, but leave Richard a chance to ask for VLA vector support which
>> might be trivial to do.
> 
> Sorry for the slow reply.  It might be tricky to handle the general case,
> so I'd suggest going with this for now and dealing with VLA as a follow-up.
> (Probably after Prathamesh's changes to fold_vec_perm_expr.)
> 
> Thanks,
> Richard
> 
>> Btw, doesn't this handle the VEC_PERM + VEC_PERM case in
>> tree-ssa-forwprop.cc:simplify_permutation as well?  Note _that_ does
>> seem to handle VLA vectors.
>> 
>> Thanks,
>> Richard.
>> 
>>> gcc/ChangeLog:
>>> 
>>>        PR target/54346
>>>        * match.pd: Merge the index of VCST then generates the new vec_perm.
>>> 
>>> gcc/testsuite/ChangeLog:
>>> 
>>>        PR target/54346
>>>        * gcc.dg/pr54346.c: New test.
>>> 
>>> Co-authored-by: liuhongt <hongtao.liu@intel.com>
>>> ---
>>> gcc/match.pd                   | 41 ++++++++++++++++++++++++++++++++++
>>> gcc/testsuite/gcc.dg/pr54346.c | 13 +++++++++++
>>> 2 files changed, 54 insertions(+)
>>> create mode 100755 gcc/testsuite/gcc.dg/pr54346.c
>>> 
>>> diff --git a/gcc/match.pd b/gcc/match.pd
>>> index 345bcb701a5..9219b0a10e1 100644
>>> --- a/gcc/match.pd
>>> +++ b/gcc/match.pd
>>> @@ -8086,6 +8086,47 @@ and,
>>>   (minus (mult (vec_perm @1 @1 @3) @2) @4)))
>>> 
>>> 
>>> +/* (PR54346) Merge
>>> +   c = VEC_PERM_EXPR <a, b, VCST0>;
>>> +   d = VEC_PERM_EXPR <c, c, VCST1>;
>>> +   to
>>> +   d = VEC_PERM_EXPR <a, b, NEW_VCST>; */
>>> +
>>> +(simplify
>>> + (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
>>> + (with
>>> +  {
>>> +    if(!TYPE_VECTOR_SUBPARTS (type).is_constant())
>>> +      return NULL_TREE;
>>> +
>>> +    tree op0;
>>> +    machine_mode result_mode = TYPE_MODE (type);
>>> +    machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
>>> +    int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant();
>>> +    vec_perm_builder builder0;
>>> +    vec_perm_builder builder1;
>>> +    vec_perm_builder builder2 (nelts, nelts, 1);
>>> +
>>> +    if (!tree_to_vec_perm_builder (&builder0, @3)
>>> +    || !tree_to_vec_perm_builder (&builder1, @4))
>>> +      return NULL_TREE;
>>> +
>>> +    vec_perm_indices sel0 (builder0, 2, nelts);
>>> +    vec_perm_indices sel1 (builder1, 1, nelts);
>>> +
>>> +    for (int i = 0; i < nelts; i++)
>>> +      builder2.quick_push (sel0[sel1[i].to_constant()]);
>>> +
>>> +    vec_perm_indices sel2 (builder2, 2, nelts);
>>> +
>>> +    if (!can_vec_perm_const_p (result_mode, op_mode, sel2, false))
>>> +      return NULL_TREE;
>>> +
>>> +    op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
>>> +  }
>>> +  (vec_perm @1 @2 { op0; })))
>>> +
>>> +
>>> /* Match count trailing zeroes for simplify_count_trailing_zeroes in fwprop.
>>>    The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
>>>    constant which when multiplied by a power of 2 contains a unique value
>>> diff --git a/gcc/testsuite/gcc.dg/pr54346.c b/gcc/testsuite/gcc.dg/pr54346.c
>>> new file mode 100755
>>> index 00000000000..d87dc3a79a5
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/pr54346.c
>>> @@ -0,0 +1,13 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O -fdump-tree-dse1" } */
>>> +
>>> +typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
>>> +
>>> +void fun (veci a, veci b, veci *i)
>>> +{
>>> +  veci c = __builtin_shuffle (a, b, __extension__ (veci) {1, 4, 2, 7});
>>> +  *i = __builtin_shuffle (c, __extension__ (veci) { 7, 2, 1, 5 });
>>> +}
>>> +
>>> +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 3, 6, 0, 0 }" "dse1" } } */
>>> +/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "dse1" } } */
>>> \ No newline at end of file
>>> --
>>> 2.18.2
>>>
  
Xi Ruoyao Oct. 12, 2022, 1:51 p.m. UTC | #4
On Mon, 2022-09-26 at 14:56 +0800, Liwei Xu via Gcc-patches wrote:
>         This patch implemented the optimization in PR 54346, which Merges
> 
>         c = VEC_PERM_EXPR <a, b, VCST0>;
>         d = VEC_PERM_EXPR <c, c, VCST1>;
>                 to
>         d = VEC_PERM_EXPR <a, b, NEW_VCST>;
> 
>         Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
>         tree-ssa/forwprop-19.c fail to pass but I'm not sure whether it
>         is ok to removed it.

I'm getting:

FAIL: gcc.dg/pr54346.c scan-tree-dump dse1 "VEC_PERM_EXPR.*{ 3, 6, 0, 0 }"
FAIL: gcc.dg/pr54346.c scan-tree-dump-times dse1 "VEC_PERM_EXPR" 1

on loongarch64-linux-gnu.  Not sure why.
  
Levy Hsu Oct. 13, 2022, 6:15 a.m. UTC | #5
Hi RuoYao

It’s probably because loongarch64 doesn’t support 
can_vec_perm_const_p(result_mode, op_mode, sel2, false)

I’m not sure whether if loongarch will support it or should I just limit the test target for pr54346.c?

Best Regards
Levy

> On 12 Oct 2022, at 9:51 pm, Xi Ruoyao <xry111@xry111.site> wrote:
> 
> pr54346.
  
Xi Ruoyao Oct. 13, 2022, 6:44 a.m. UTC | #6
On Thu, 2022-10-13 at 14:15 +0800, Levy wrote:
> Hi RuoYao
> 
> It’s probably because loongarch64 doesn’t support 
> can_vec_perm_const_p(result_mode, op_mode, sel2, false)
> 
> I’m not sure whether if loongarch will support it or should I just
> limit the test target for pr54346.c?

I'm not sure if we can add TARGET_VECTORIZE_VEC_PERM_CONST when we don't
actually support vector.  (LoongArch has SIMD instructions but the
support in GCC won't be added in a very recent future.)
  
chenglulu Oct. 13, 2022, 8:15 a.m. UTC | #7
在 2022/10/13 下午2:44, Xi Ruoyao 写道:
> On Thu, 2022-10-13 at 14:15 +0800, Levy wrote:
>> Hi RuoYao
>>
>> It’s probably because loongarch64 doesn’t support
>> can_vec_perm_const_p(result_mode, op_mode, sel2, false)
>>
>> I’m not sure whether if loongarch will support it or should I just
>> limit the test target for pr54346.c?
> I'm not sure if we can add TARGET_VECTORIZE_VEC_PERM_CONST when we don't
> actually support vector.  (LoongArch has SIMD instructions but the
> support in GCC won't be added in a very recent future.)
>
If what I understand is correct, I think this might be a better solution.

  /* { dg-do compile } */

+/* { dg-require-effective-target vect_perm } */
  /* { dg-options "-O -fdump-tree-dse1" } */
  
Richard Biener Oct. 13, 2022, 11:10 a.m. UTC | #8
On Thu, Oct 13, 2022 at 10:16 AM Lulu Cheng <chenglulu@loongson.cn> wrote:
>
>
> 在 2022/10/13 下午2:44, Xi Ruoyao 写道:
> > On Thu, 2022-10-13 at 14:15 +0800, Levy wrote:
> >> Hi RuoYao
> >>
> >> It’s probably because loongarch64 doesn’t support
> >> can_vec_perm_const_p(result_mode, op_mode, sel2, false)
> >>
> >> I’m not sure whether if loongarch will support it or should I just
> >> limit the test target for pr54346.c?
> > I'm not sure if we can add TARGET_VECTORIZE_VEC_PERM_CONST when we don't
> > actually support vector.  (LoongArch has SIMD instructions but the
> > support in GCC won't be added in a very recent future.)
> >
> If what I understand is correct, I think this might be a better solution.
>
>   /* { dg-do compile } */
>
> +/* { dg-require-effective-target vect_perm } */
>   /* { dg-options "-O -fdump-tree-dse1" } */

Btw, what forwprop does is check whether any of the original permutations are
not supported and then elide the supportability check for the result.
The reasoning
is that the original permute(s) would be lowered during vectlower so we can as
well do that for the result.  We should just never turn a supported permutation
sequence into a not supported one.

Richard.

>
  
Li, Pan2 via Gcc-patches Oct. 14, 2022, 1:09 a.m. UTC | #9
Hi Richard
 
Thank your for your detailed explanation, I’ll patch the test case with suggestions form LuLu.

Best
Levy

> On 13 Oct 2022, at 7:12 pm, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Thu, Oct 13, 2022 at 10:16 AM Lulu Cheng <chenglulu@loongson.cn> wrote:
>> 
>> 
>>> 在 2022/10/13 下午2:44, Xi Ruoyao 写道:
>>> On Thu, 2022-10-13 at 14:15 +0800, Levy wrote:
>>>> Hi RuoYao
>>>> 
>>>> It’s probably because loongarch64 doesn’t support
>>>> can_vec_perm_const_p(result_mode, op_mode, sel2, false)
>>>> 
>>>> I’m not sure whether if loongarch will support it or should I just
>>>> limit the test target for pr54346.c?
>>> I'm not sure if we can add TARGET_VECTORIZE_VEC_PERM_CONST when we don't
>>> actually support vector.  (LoongArch has SIMD instructions but the
>>> support in GCC won't be added in a very recent future.)
>>> 
>> If what I understand is correct, I think this might be a better solution.
>> 
>>  /* { dg-do compile } */
>> 
>> +/* { dg-require-effective-target vect_perm } */
>>  /* { dg-options "-O -fdump-tree-dse1" } */
> 
> Btw, what forwprop does is check whether any of the original permutations are
> not supported and then elide the supportability check for the result.
> The reasoning
> is that the original permute(s) would be lowered during vectlower so we can as
> well do that for the result.  We should just never turn a supported permutation
> sequence into a not supported one.
> 
> Richard.
>
  
chenglulu Oct. 14, 2022, 1:49 a.m. UTC | #10
在 2022/10/13 下午7:10, Richard Biener 写道:
> On Thu, Oct 13, 2022 at 10:16 AM Lulu Cheng <chenglulu@loongson.cn> wrote:
>>
>> 在 2022/10/13 下午2:44, Xi Ruoyao 写道:
>>> On Thu, 2022-10-13 at 14:15 +0800, Levy wrote:
>>>> Hi RuoYao
>>>>
>>>> It’s probably because loongarch64 doesn’t support
>>>> can_vec_perm_const_p(result_mode, op_mode, sel2, false)
>>>>
>>>> I’m not sure whether if loongarch will support it or should I just
>>>> limit the test target for pr54346.c?
>>> I'm not sure if we can add TARGET_VECTORIZE_VEC_PERM_CONST when we don't
>>> actually support vector.  (LoongArch has SIMD instructions but the
>>> support in GCC won't be added in a very recent future.)
>>>
>> If what I understand is correct, I think this might be a better solution.
>>
>>    /* { dg-do compile } */
>>
>> +/* { dg-require-effective-target vect_perm } */
>>    /* { dg-options "-O -fdump-tree-dse1" } */
> Btw, what forwprop does is check whether any of the original permutations are
> not supported and then elide the supportability check for the result.
> The reasoning
> is that the original permute(s) would be lowered during vectlower so we can as
> well do that for the result.  We should just never turn a supported permutation
> sequence into a not supported one.
>
> Richard.
>
Hi Richard:

I'm very sorry. I don't fully understand what you mean.

Could you give me some more details?


Thanks!

Lulu Cheng
  
Richard Biener Oct. 14, 2022, 6:18 a.m. UTC | #11
On Fri, Oct 14, 2022 at 3:49 AM Lulu Cheng <chenglulu@loongson.cn> wrote:
>
>
> 在 2022/10/13 下午7:10, Richard Biener 写道:
> > On Thu, Oct 13, 2022 at 10:16 AM Lulu Cheng <chenglulu@loongson.cn> wrote:
> >>
> >> 在 2022/10/13 下午2:44, Xi Ruoyao 写道:
> >>> On Thu, 2022-10-13 at 14:15 +0800, Levy wrote:
> >>>> Hi RuoYao
> >>>>
> >>>> It’s probably because loongarch64 doesn’t support
> >>>> can_vec_perm_const_p(result_mode, op_mode, sel2, false)
> >>>>
> >>>> I’m not sure whether if loongarch will support it or should I just
> >>>> limit the test target for pr54346.c?
> >>> I'm not sure if we can add TARGET_VECTORIZE_VEC_PERM_CONST when we don't
> >>> actually support vector.  (LoongArch has SIMD instructions but the
> >>> support in GCC won't be added in a very recent future.)
> >>>
> >> If what I understand is correct, I think this might be a better solution.
> >>
> >>    /* { dg-do compile } */
> >>
> >> +/* { dg-require-effective-target vect_perm } */
> >>    /* { dg-options "-O -fdump-tree-dse1" } */
> > Btw, what forwprop does is check whether any of the original permutations are
> > not supported and then elide the supportability check for the result.
> > The reasoning
> > is that the original permute(s) would be lowered during vectlower so we can as
> > well do that for the result.  We should just never turn a supported permutation
> > sequence into a not supported one.
> >
> > Richard.
> >
> Hi Richard:
>
> I'm very sorry. I don't fully understand what you mean.
>
> Could you give me some more details?

The argument is that if Loongarch64 doesn't support the VEC_PERM_EXPR in the IL
before the transform then it doesn't matter if the transform introduces another
VEC_PERM_EXPR that isn't supported.  Both are later (in the veclower pass)
lowered to scalar operations.

So instead of

  if (can_vec_perm_const_p (result_mode, op_mode, sel2, false))
   ..

you'd do

  if (can_vec_perm_const_p (result_mode, op_mode, sel2, false)
      || !can_vec_perm_const_p ( ... original vec_perm1 ...)
      || !can_vec_perm_const_p ( ... original vec_perm2 ...))

Richard.

>
> Thanks!
>
> Lulu Cheng
>
>
  
Li, Pan2 via Gcc-patches Oct. 14, 2022, 6:34 a.m. UTC | #12
Hi Richard

I just found that modifying match.pd may be a better way since for forwprop-19.c, VEC_PERM_EXPR exists in all gimple until 235t.optimize with current trunk code, that leave us no pass to ‘scan-tree-dump-not’.

Best Regards
Levy

> On 14 Oct 2022, at 2:19 pm, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Fri, Oct 14, 2022 at 3:49 AM Lulu Cheng <chenglulu@loongson.cn> wrote:
>> 
>> 
>>> 在 2022/10/13 下午7:10, Richard Biener 写道:
>>> On Thu, Oct 13, 2022 at 10:16 AM Lulu Cheng <chenglulu@loongson.cn> wrote:
>>>> 
>>>> 在 2022/10/13 下午2:44, Xi Ruoyao 写道:
>>>>> On Thu, 2022-10-13 at 14:15 +0800, Levy wrote:
>>>>>> Hi RuoYao
>>>>>> 
>>>>>> It’s probably because loongarch64 doesn’t support
>>>>>> can_vec_perm_const_p(result_mode, op_mode, sel2, false)
>>>>>> 
>>>>>> I’m not sure whether if loongarch will support it or should I just
>>>>>> limit the test target for pr54346.c?
>>>>> I'm not sure if we can add TARGET_VECTORIZE_VEC_PERM_CONST when we don't
>>>>> actually support vector.  (LoongArch has SIMD instructions but the
>>>>> support in GCC won't be added in a very recent future.)
>>>>> 
>>>> If what I understand is correct, I think this might be a better solution.
>>>> 
>>>>   /* { dg-do compile } */
>>>> 
>>>> +/* { dg-require-effective-target vect_perm } */
>>>>   /* { dg-options "-O -fdump-tree-dse1" } */
>>> Btw, what forwprop does is check whether any of the original permutations are
>>> not supported and then elide the supportability check for the result.
>>> The reasoning
>>> is that the original permute(s) would be lowered during vectlower so we can as
>>> well do that for the result.  We should just never turn a supported permutation
>>> sequence into a not supported one.
>>> 
>>> Richard.
>>> 
>> Hi Richard:
>> 
>> I'm very sorry. I don't fully understand what you mean.
>> 
>> Could you give me some more details?
> 
> The argument is that if Loongarch64 doesn't support the VEC_PERM_EXPR in the IL
> before the transform then it doesn't matter if the transform introduces another
> VEC_PERM_EXPR that isn't supported.  Both are later (in the veclower pass)
> lowered to scalar operations.
> 
> So instead of
> 
>  if (can_vec_perm_const_p (result_mode, op_mode, sel2, false))
>   ..
> 
> you'd do
> 
>  if (can_vec_perm_const_p (result_mode, op_mode, sel2, false)
>      || !can_vec_perm_const_p ( ... original vec_perm1 ...)
>      || !can_vec_perm_const_p ( ... original vec_perm2 ...))
> 
> Richard.
> 
>> 
>> Thanks!
>> 
>> Lulu Cheng
>>
  

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index 345bcb701a5..9219b0a10e1 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8086,6 +8086,47 @@  and,
   (minus (mult (vec_perm @1 @1 @3) @2) @4)))
 
 
+/* (PR54346) Merge 
+   c = VEC_PERM_EXPR <a, b, VCST0>; 
+   d = VEC_PERM_EXPR <c, c, VCST1>;
+   to
+   d = VEC_PERM_EXPR <a, b, NEW_VCST>; */
+   
+(simplify
+ (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
+ (with
+  {
+    if(!TYPE_VECTOR_SUBPARTS (type).is_constant())
+      return NULL_TREE;
+
+    tree op0;
+    machine_mode result_mode = TYPE_MODE (type);
+    machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
+    int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant();
+    vec_perm_builder builder0;
+    vec_perm_builder builder1;
+    vec_perm_builder builder2 (nelts, nelts, 1);
+
+    if (!tree_to_vec_perm_builder (&builder0, @3) 
+    || !tree_to_vec_perm_builder (&builder1, @4))
+      return NULL_TREE;
+
+    vec_perm_indices sel0 (builder0, 2, nelts);
+    vec_perm_indices sel1 (builder1, 1, nelts);
+   
+    for (int i = 0; i < nelts; i++)
+      builder2.quick_push (sel0[sel1[i].to_constant()]);
+
+    vec_perm_indices sel2 (builder2, 2, nelts);
+
+    if (!can_vec_perm_const_p (result_mode, op_mode, sel2, false))
+      return NULL_TREE;
+
+    op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
+  }
+  (vec_perm @1 @2 { op0; })))
+
+
 /* Match count trailing zeroes for simplify_count_trailing_zeroes in fwprop.
    The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
    constant which when multiplied by a power of 2 contains a unique value
diff --git a/gcc/testsuite/gcc.dg/pr54346.c b/gcc/testsuite/gcc.dg/pr54346.c
new file mode 100755
index 00000000000..d87dc3a79a5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr54346.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-dse1" } */
+
+typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
+
+void fun (veci a, veci b, veci *i)
+{
+  veci c = __builtin_shuffle (a, b, __extension__ (veci) {1, 4, 2, 7});
+  *i = __builtin_shuffle (c, __extension__ (veci) { 7, 2, 1, 5 });
+}
+
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 3, 6, 0, 0 }" "dse1" } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "dse1" } } */
\ No newline at end of file