[0/9] vect: Move costing next to the transform for vect load


Message ID

cover.1686573640.git.linkw@linux.ibm.com

Headers

Received-SPF: pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates
 2620:52:3:1:0:246e:9693:128c as permitted sender)
 client-ip=2620:52:3:1:0:246e:9693:128c;
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 26FB138582A4
To: gcc-patches@gcc.gnu.org
Cc: richard.guenther@gmail.com, richard.sandiford@arm.com,
 segher@kernel.crashing.org, bergner@linux.ibm.com
Subject: [PATCH 0/9] vect: Move costing next to the transform for vect load
Date: Mon, 12 Jun 2023 21:03:21 -0500
Message-Id: <cover.1686573640.git.linkw@linux.ibm.com>
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
Precedence: list
From: Kewen Lin via Gcc-patches <gcc-patches@gcc.gnu.org>
Reply-To: Kewen Lin <linkw@linux.ibm.com>
Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org
Sender: "Gcc-patches" <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org>

Series

vect: Move costing next to the transform for vect load

Message

Kewen.Lin June 13, 2023, 2:03 a.m. UTC

This patch series follows Richi's suggestion at the link [1],
which suggests structuring vectorizable_load so that costing
sits next to the transform, making it easier to keep costing
and the transform in sync.  For now, it's a known issue that
what we cost can be inconsistent with what we transform, as
shown by PR82255 and some other associated test cases in the
patches of this series.

Basically this patch series makes costing no longer call the
function vect_model_load_cost.  To make review and bisection
easy, I organized the changes according to the memory access
types of vector load.  For each memory access type, the patch
first follows the handling in vect_model_load_cost to avoid
missing anything, then refines it further by referring to the
transform code.  I also checked them with some typical test
cases to verify.  I hope the subjects of the patches are clear
enough.
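
To illustrate the structure described above, here is a toy C++ model.
It is not the code in tree-vect-stmts.cc: the types, the function name
and the unit costs are all made up for illustration.  A single routine
serves both phases, selected by a costing_p flag, so each cost is
recorded right next to the statement the transform would emit:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins; none of these are GCC's real types or functions.
enum class AccessType { Contiguous, ContiguousReverse, Elementwise };

struct Result {
  unsigned cost = 0;               // accumulated when costing_p is true
  std::vector<std::string> stmts;  // emitted when costing_p is false
};

// One routine walks the same control flow for both phases, so every
// statement the transform would emit has its cost recorded beside it.
Result toy_vectorizable_load(AccessType type, unsigned ncopies,
                             unsigned nelems, bool costing_p) {
  Result r;
  for (unsigned i = 0; i < ncopies; ++i) {
    if (type == AccessType::Elementwise) {
      // nelems scalar loads plus one vector construction per copy.
      if (costing_p)
        r.cost += nelems + 1;
      else {
        for (unsigned j = 0; j < nelems; ++j)
          r.stmts.push_back("scalar_load");
        r.stmts.push_back("vec_construct");
      }
      continue;
    }
    // Contiguous variants: one vector load per copy.
    if (costing_p)
      r.cost += 1;
    else
      r.stmts.push_back("vector_load");
    if (type == AccessType::ContiguousReverse) {
      // The reverse permute is costed exactly where it is emitted.
      if (costing_p)
        r.cost += 1;
      else
        r.stmts.push_back("vec_perm");
    }
  }
  return r;
}
```

With unit costs, the cost returned in costing mode equals the number of
statements emitted in transform mode for every access type in this toy,
which is the kind of consistency the series is after.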

The whole series can be bootstrapped and regtested
incrementally on:
  - x86_64-redhat-linux
  - aarch64-linux-gnu
  - powerpc64-linux-gnu P7, P8 and P9
  - powerpc64le-linux-gnu P8, P9 and P10

Considering that the current vector test buckets are mainly
tested without the cost model, I also verified that the whole
patch series is neutral for SPEC2017 int/fp on Power9 at O2,
O3 and Ofast separately.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563624.html

Kewen Lin (9):
  vect: Move vect_model_load_cost next to the transform in vectorizable_load
  vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER && gs_info.decl
  vect: Adjust vectorizable_load costing on VMAT_INVARIANT
  vect: Adjust vectorizable_load costing on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP
  vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER
  vect: Adjust vectorizable_load costing on VMAT_LOAD_STORE_LANES
  vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_REVERSE
  vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_PERMUTE
  vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS

 .../vect/costmodel/ppc/costmodel-pr82255.c    |  31 +
 .../costmodel/ppc/costmodel-vect-reversed.c   |  22 +
 gcc/testsuite/gcc.target/i386/pr70021.c       |   2 +-
 gcc/tree-vect-stmts.cc                        | 651 ++++++++++--------
 4 files changed, 432 insertions(+), 274 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-reversed.c

Comments

Kewen.Lin June 26, 2023, 6 a.m. UTC | #1

Hi,

I'd like to gently ping this patch series:

https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621460.html

BR,
Kewen


Richard Biener June 30, 2023, 11:37 a.m. UTC | #2

On Tue, Jun 13, 2023 at 4:07 AM Kewen Lin <linkw@linux.ibm.com> wrote:
>
> This patch series follows Richi's suggestion at the link [1],
> which suggest structuring vectorizable_load to make costing
> next to the transform, in order to make it easier to keep
> costing and the transform in sync.  For now, it's a known
> issue that what we cost can be inconsistent with what we
> transform, as the case in PR82255 and some other associated
> test cases in the patches of this series show.
>
> Basically this patch series makes costing not call function
> vect_model_load_cost any more.  To make the review and
> bisection easy, I organized the changes according to the
> memory access types of vector load.  For each memory access
> type, firstly it follows the handlings in the function
> vect_model_load_cost to avoid any missing, then refines
> further by referring to the transform code, I also checked
> them with some typical test cases to verify.  Hope the
> subjects of patches are clear enough.
>
> The whole series can be bootstrapped and regtested
> incrementally on:
>   - x86_64-redhat-linux
>   - aarch64-linux-gnu
>   - powerpc64-linux-gnu P7, P8 and P9
>   - powerpc64le-linux-gnu P8, P9 and P10
>
> By considering the current vector test buckets are mainly
> tested without cost model, I also verified the whole patch
> series was neutral for SPEC2017 int/fp on Power9 at O2,
> O3 and Ofast separately.

I went through the series now and I like it overall (well, I suggested
the change).
Looking at the changes I think we want some followup to reduce the
mess in the final loop nest.  We already have some VMAT_* cases handled
separately, maybe we can split out some more cases.  Maybe we should
bite the bullet and duplicate that loop nest for the different VMAT_* cases.
Maybe we can merge some of the if (!costing_p) checks by clever
re-ordering.  So what this series doesn't improve is overall readability
of the code (indent and our 80 char line limit).

The change also makes it more difficult(?) to separate analysis and transform
though in the end I hope that analysis will actually "code generate" to a (SLP)
data structure so the target will have a chance to see the actual flow of insns.

That said, I'd like to hear from Richard whether he thinks this is a step
in the right direction.

Are you willing to follow up by doing the same re-structuring for
vectorizable_store?

OK from my side with the few comments addressed.  The patch likely needs a
refresh after the RVV changes in this area?

Thanks,
Richard.

> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563624.html
>
> Kewen Lin (9):
>   vect: Move vect_model_load_cost next to the transform in vectorizable_load
>   vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER && gs_info.decl
>   vect: Adjust vectorizable_load costing on VMAT_INVARIANT
>   vect: Adjust vectorizable_load costing on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP
>   vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER
>   vect: Adjust vectorizable_load costing on VMAT_LOAD_STORE_LANES
>   vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_REVERSE
>   vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_PERMUTE
>   vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS
>
>  .../vect/costmodel/ppc/costmodel-pr82255.c    |  31 +
>  .../costmodel/ppc/costmodel-vect-reversed.c   |  22 +
>  gcc/testsuite/gcc.target/i386/pr70021.c       |   2 +-
>  gcc/tree-vect-stmts.cc                        | 651 ++++++++++--------
>  4 files changed, 432 insertions(+), 274 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-reversed.c
>
> --
> 2.31.1
>

Richard Sandiford July 2, 2023, 9:13 a.m. UTC | #3

Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, Jun 13, 2023 at 4:07 AM Kewen Lin <linkw@linux.ibm.com> wrote:
>>
>> This patch series follows Richi's suggestion at the link [1],
>> which suggest structuring vectorizable_load to make costing
>> next to the transform, in order to make it easier to keep
>> costing and the transform in sync.

FTR, I was keeping quiet given that this was following an agreed plan :)

Thanks for organising the series this way.  It made it easier to review.

>> For now, it's a known
>> issue that what we cost can be inconsistent with what we
>> transform, as the case in PR82255 and some other associated
>> test cases in the patches of this series show.
>>
>> Basically this patch series makes costing not call function
>> vect_model_load_cost any more.  To make the review and
>> bisection easy, I organized the changes according to the
>> memory access types of vector load.  For each memory access
>> type, firstly it follows the handlings in the function
>> vect_model_load_cost to avoid any missing, then refines
>> further by referring to the transform code, I also checked
>> them with some typical test cases to verify.  Hope the
>> subjects of patches are clear enough.
>>
>> The whole series can be bootstrapped and regtested
>> incrementally on:
>>   - x86_64-redhat-linux
>>   - aarch64-linux-gnu
>>   - powerpc64-linux-gnu P7, P8 and P9
>>   - powerpc64le-linux-gnu P8, P9 and P10
>>
>> By considering the current vector test buckets are mainly
>> tested without cost model, I also verified the whole patch
>> series was neutral for SPEC2017 int/fp on Power9 at O2,
>> O3 and Ofast separately.
>
> I went through the series now and I like it overall (well, I suggested
> the change).
> Looking at the changes I think we want some followup to reduce the
> mess in the final loop nest.  We already have some VMAT_* cases handled
> separately, maybe we can split out some more cases.  Maybe we should
> bite the bullet and duplicate that loop nest for the different VMAT_* cases.
> Maybe we can merge some of the if (!costing_p) checks by clever
> re-ordering.  So what
> this series doesn't improve is overall readability of the code (indent and our
> 80 char line limit).
>
> The change also makes it more difficult(?) to separate analysis and transform
> though in the end I hope that analysis will actually "code generate" to a (SLP)
> data structure so the target will have a chance to see the actual flow of insns.
>
> That said, I'd like to hear from Richard whether he thinks this is a step
> in the right direction.

Yeah, agree that it's probably better on balance.  It's going to need a
bit of discipline to make sure that we don't accidentally change the IR
during the analysis phase, but I guess that already exists to a lesser
extent with the “before !vec_stmt”/“after !vec_stmt” split.
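
One way to picture that discipline is a guard that runs the costing pass
and checks that nothing was emitted.  The following is only a toy sketch
under made-up names (ToyIR, toy_load and toy_cost_checked are
hypothetical and do not exist in GCC):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the IR: just a list of statements.
struct ToyIR {
  std::vector<std::string> stmts;
};

// Shared routine: counts cost when costing_p is true, emits otherwise.
unsigned toy_load(ToyIR &ir, unsigned ncopies, bool costing_p) {
  unsigned cost = 0;
  for (unsigned i = 0; i < ncopies; ++i) {
    if (costing_p) {
      ++cost;  // analysis: record the cost only
      continue;
    }
    ir.stmts.push_back("vector_load");  // transform: emit the statement
  }
  return cost;
}

// Guard: run the costing pass and verify that the IR is unchanged.
unsigned toy_cost_checked(ToyIR &ir, unsigned ncopies) {
  const ToyIR before = ir;  // snapshot taken before analysis
  unsigned cost = toy_load(ir, ncopies, /*costing_p=*/true);
  assert(ir.stmts == before.stmts && "costing must not modify the IR");
  return cost;
}
```

A snapshot comparison like this would be far too expensive for a real
compiler; it is meant only to make the invariant explicit.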

Thanks,
Richard

Kewen.Lin July 3, 2023, 3:39 a.m. UTC | #4

Hi Richi,

Thanks for your review comments on this and some others!

on 2023/6/30 19:37, Richard Biener wrote:
> On Tue, Jun 13, 2023 at 4:07 AM Kewen Lin <linkw@linux.ibm.com> wrote:
>>
>> This patch series follows Richi's suggestion at the link [1],
>> which suggest structuring vectorizable_load to make costing
>> next to the transform, in order to make it easier to keep
>> costing and the transform in sync.  For now, it's a known
>> issue that what we cost can be inconsistent with what we
>> transform, as the case in PR82255 and some other associated
>> test cases in the patches of this series show.
>>
>> Basically this patch series makes costing not call function
>> vect_model_load_cost any more.  To make the review and
>> bisection easy, I organized the changes according to the
>> memory access types of vector load.  For each memory access
>> type, firstly it follows the handlings in the function
>> vect_model_load_cost to avoid any missing, then refines
>> further by referring to the transform code, I also checked
>> them with some typical test cases to verify.  Hope the
>> subjects of patches are clear enough.
>>
>> The whole series can be bootstrapped and regtested
>> incrementally on:
>>   - x86_64-redhat-linux
>>   - aarch64-linux-gnu
>>   - powerpc64-linux-gnu P7, P8 and P9
>>   - powerpc64le-linux-gnu P8, P9 and P10
>>
>> By considering the current vector test buckets are mainly
>> tested without cost model, I also verified the whole patch
>> series was neutral for SPEC2017 int/fp on Power9 at O2,
>> O3 and Ofast separately.
> 
> I went through the series now and I like it overall (well, I suggested
> the change).
> Looking at the changes I think we want some followup to reduce the
> mess in the final loop nest.  We already have some VMAT_* cases handled
> separately, maybe we can split out some more cases.  Maybe we should

At first glance, the simple parts look to be the handling of
VMAT_LOAD_STORE_LANES and VMAT_GATHER_SCATTER (with ifn and emulated).
It seems fairly straightforward if it's fine to duplicate the nested loop,
but we may need to take care to remove any code that becomes useless.

> bite the bullet and duplicate that loop nest for the different VMAT_* cases.
> Maybe we can merge some of the if (!costing_p) checks by clever
> re-ordering.

I've tried to merge them where possible, for example at the place that
checks VMAT_CONTIGUOUS, VMAT_CONTIGUOUS_REVERSE and VMAT_CONTIGUOUS_PERMUTE.
But I will keep this in mind for the follow-up updates.

> So what
> this series doesn't improve is overall readability of the code (indent and our
> 80 char line limit).

Sorry about that.

> 
> The change also makes it more difficult(?) to separate analysis and transform
> though in the end I hope that analysis will actually "code generate" to a (SLP)
> data structure so the target will have a chance to see the actual flow of insns.
> 
> That said, I'd like to hear from Richard whether he thinks this is a step
> in the right direction.
> 
> Are you willing to followup with doing the same re-structuring to
> vectorizable_store?

Yes, vectorizable_store was also pointed out in your original suggestion [1].
I plan to update it once this series meets your expectations and lands.

> 
> OK from my side with the few comments addressed.  The patch likely needs refresh
> after the RVV changes in this area?

Thanks!  Yes, I've updated 2/9 and 3/9 according to your comments, and updated
5/9 and 9/9 as they had some conflicts when rebasing.  Re-testing is ongoing.
Do the updated versions look good to you?  Is this series ok for trunk if all
the test runs go well again as before?

BR,
Kewen

Richard Biener July 3, 2023, 8:42 a.m. UTC | #5

On Mon, Jul 3, 2023 at 5:39 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> Hi Richi,
>
> Thanks for your review comments on this and some others!
>
> on 2023/6/30 19:37, Richard Biener wrote:
> > On Tue, Jun 13, 2023 at 4:07 AM Kewen Lin <linkw@linux.ibm.com> wrote:
> >>
> >> This patch series follows Richi's suggestion at the link [1],
> >> which suggest structuring vectorizable_load to make costing
> >> next to the transform, in order to make it easier to keep
> >> costing and the transform in sync.  For now, it's a known
> >> issue that what we cost can be inconsistent with what we
> >> transform, as the case in PR82255 and some other associated
> >> test cases in the patches of this series show.
> >>
> >> Basically this patch series makes costing not call function
> >> vect_model_load_cost any more.  To make the review and
> >> bisection easy, I organized the changes according to the
> >> memory access types of vector load.  For each memory access
> >> type, firstly it follows the handlings in the function
> >> vect_model_load_cost to avoid any missing, then refines
> >> further by referring to the transform code, I also checked
> >> them with some typical test cases to verify.  Hope the
> >> subjects of patches are clear enough.
> >>
> >> The whole series can be bootstrapped and regtested
> >> incrementally on:
> >>   - x86_64-redhat-linux
> >>   - aarch64-linux-gnu
> >>   - powerpc64-linux-gnu P7, P8 and P9
> >>   - powerpc64le-linux-gnu P8, P9 and P10
> >>
> >> By considering the current vector test buckets are mainly
> >> tested without cost model, I also verified the whole patch
> >> series was neutral for SPEC2017 int/fp on Power9 at O2,
> >> O3 and Ofast separately.
> >
> > I went through the series now and I like it overall (well, I suggested
> > the change).
> > Looking at the changes I think we want some followup to reduce the
> > mess in the final loop nest.  We already have some VMAT_* cases handled
> > separately, maybe we can split out some more cases.  Maybe we should
>
> At first glance, the simple parts look to be the handlings for
> VMAT_LOAD_STORE_LANES, and VMAT_GATHER_SCATTER (with ifn and emulated).
> It seems a bit straightforward if it's fine to duplicate the nested loop,
> but may need to care about removing some useless code.
>
> > bite the bullet and duplicate that loop nest for the different VMAT_* cases.
> > Maybe we can merge some of the if (!costing_p) checks by clever
> > re-ordering.
>
> I've tried a bit to merge them if possible, like the place to check
> VMAT_CONTIGUOUS, VMAT_CONTIGUOUS_REVERSE and VMAT_CONTIGUOUS_PERMUTE.
> But will keep in mind for the following updates.
>
> > So what
> > this series doesn't improve is overall readability of the code (indent and our
> > 80 char line limit).
>
> Sorry about that.
>
> >
> > The change also makes it more difficult(?) to separate analysis and transform
> > though in the end I hope that analysis will actually "code generate" to a (SLP)
> > data structure so the target will have a chance to see the actual flow of insns.
> >
> > That said, I'd like to hear from Richard whether he thinks this is a step
> > in the right direction.
> >
> > Are you willing to followup with doing the same re-structuring to
> > vectorizable_store?
>
> Yes, vectorizable_store was also pointed out in your original suggestion [1],
> I planned to update this once this series meets your expectations and gets landed.
>
> >
> > OK from my side with the few comments addressed.  The patch likely needs refresh
> > after the RVV changes in this area?
>
> Thanks!  Yes, I've updated 2/9 and 3/9 according to your comments, and updated
> 5/9 and 9/9 as they had some conflicts when rebasing.  Re-testing is ongoing,
> do the updated versions look good to you?  Is this series ok for trunk if all the
> test runs go well again as before?

Yes.

Thanks,
Richard.

> BR,
> Kewen