[committed,0/6] amdgcn: Add V32, V16, V8, V4, and V2 vectors


Message ID

cover.1665485382.git.ams@codesourcery.com

Headers

Received-SPF: pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as
 permitted sender) client-ip=8.43.85.97;
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 1F6313858C20
IronPort-SDR: 
 +F1pE7JiYK6grrU5yJG/eb8YteHKFFWms5J84OK9VhIDNeAdBIVjYZrAP1PHqKSxOxVelKafIf
 dPzISdEDyKnzPxshlfNpg551VVsZIRcbCtQ/QOOO/gwlrPLFf3dOiII7VPmR5/CfRdQtgQ+ien
 Mmi+2GpreUMxUnK0UHzSPvk6Eg4kpI9O0giigOb0xKMknvaLc3LRO6cmLMUy9woZ4WKhD0m2qL
 TQO1zSiIoRKQthbclocxXhNaoWoJ/l69HBw3M+Nb2/FC5vyqrVqRC8Bq32E19AfYa87VSVVR4V
 ruQ=
From: Andrew Stubbs <ams@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [committed 0/6] amdgcn: Add V32, V16, V8, V4, and V2 vectors
Date: Tue, 11 Oct 2022 12:02:02 +0100
Message-ID: <cover.1665485382.git.ams@codesourcery.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
Precedence: list
Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org
Sender: "Gcc-patches" <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org>

Series

amdgcn: Add V32, V16, V8, V4, and V2 vectors

Message

Andrew Stubbs Oct. 11, 2022, 11:02 a.m. UTC

This patch series adds additional vector sizes for the amdgcn backend.

The hardware supports arbitrary vector lengths of up to 64 lanes via
masking, but GCC cannot (yet) make full use of them due to middle-end
limitations.  Adding smaller "virtual" vector sizes makes the backend
a little more complex, but opens up more optimization opportunities
within the current middle-end implementation.  In particular, it
enables many more cases of SLP optimization.
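
As a rough illustration of the idea (a conceptual sketch only, not code from
the patches), a "virtual" 4-lane operation can be modelled as a 64-lane
operation in which only the low four lanes are enabled.  The names kHwLanes,
Vec64, low_lane_mask, masked_add, add_v4 and add4 are hypothetical and exist
only for this example; the real backend expresses this through machine modes
and insn patterns.  The last function shows the kind of straight-line
statement group that SLP looks for, assuming a suitable vector size is
available.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of one 64-lane vector register; not a GCC type.
constexpr int kHwLanes = 64;
using Vec64 = std::array<int32_t, kHwLanes>;

// Build an execution mask with the low `lanes` bits set.
inline uint64_t low_lane_mask(int lanes) {
  assert(lanes >= 0 && lanes <= kHwLanes);
  // Shifting a 64-bit value by 64 is undefined, so handle that case apart.
  return lanes == kHwLanes ? ~uint64_t{0} : ((uint64_t{1} << lanes) - 1);
}

// Add b to a in the lanes enabled by `exec`; disabled lanes keep a's value.
inline Vec64 masked_add(const Vec64 &a, const Vec64 &b, uint64_t exec) {
  Vec64 out = a;
  for (int lane = 0; lane < kHwLanes; ++lane)
    if (exec & (uint64_t{1} << lane))
      out[lane] = a[lane] + b[lane];
  return out;
}

// A "virtual" 4-lane add: a full-width add with only four active lanes.
inline Vec64 add_v4(const Vec64 &a, const Vec64 &b) {
  return masked_add(a, b, low_lane_mask(4));
}

// Four independent, adjacent scalar statements.  A group like this is a
// typical SLP candidate when a 4-lane vector mode exists.
inline void add4(int32_t *out, const int32_t *x, const int32_t *y) {
  out[0] = x[0] + y[0];
  out[1] = x[1] + y[1];
  out[2] = x[2] + y[2];
  out[3] = x[3] + y[3];
}
```

Whether a compiler actually vectorizes add4 depends on the target, the
options, and the cost model; the sketch only shows the shape of the code.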

The patchset gives approximately 100 additional test PASS and a few extra
FAIL.  However, the failures are not new issues, but rather existing
problems that did not show up because the code did not previously
vectorize.  Expanding the testcase to allow 64-lane vectors shows the
same problems there.

I shall backport these patches to the OG12 branch shortly.

Andrew

Andrew Stubbs (6):
  amdgcn: add multiple vector sizes
  amdgcn: Resolve insn conditions at compile time
  amdgcn: Add vec_extract for partial vectors
  amdgcn: vec_init for multiple vector sizes
  amdgcn: Add vector integer negate insn
  amdgcn: vector testsuite tweaks

 gcc/config/gcn/gcn-modes.def                  |   82 ++
 gcc/config/gcn/gcn-protos.h                   |   24 +-
 gcc/config/gcn/gcn-valu.md                    |  399 +++++--
 gcc/config/gcn/gcn.cc                         | 1063 +++++++++++------
 gcc/config/gcn/gcn.h                          |   24 +
 gcc/testsuite/gcc.dg/pr104464.c               |    2 +
 gcc/testsuite/gcc.dg/signbit-2.c              |    5 +-
 gcc/testsuite/gcc.dg/signbit-5.c              |    1 +
 gcc/testsuite/gcc.dg/vect/bb-slp-68.c         |    5 +-
 gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c     |    3 +-
 .../gcc.dg/vect/bb-slp-subgroups-3.c          |    5 +-
 .../gcc.dg/vect/no-vfa-vect-depend-2.c        |    3 +-
 gcc/testsuite/gcc.dg/vect/pr33953.c           |    3 +-
 gcc/testsuite/gcc.dg/vect/pr65947-12.c        |    3 +-
 gcc/testsuite/gcc.dg/vect/pr65947-13.c        |    3 +-
 gcc/testsuite/gcc.dg/vect/pr80631-2.c         |    3 +-
 gcc/testsuite/gcc.dg/vect/slp-reduc-4.c       |    3 +-
 .../gcc.dg/vect/trapv-vect-reduc-4.c          |    3 +-
 gcc/testsuite/lib/target-supports.exp         |    3 +-
 19 files changed, 1183 insertions(+), 454 deletions(-)

Comments

Richard Biener Oct. 11, 2022, 11:29 a.m. UTC | #1

On Tue, Oct 11, 2022 at 1:03 PM Andrew Stubbs <ams@codesourcery.com> wrote:
>
> This patch series adds additional vector sizes for the amdgcn backend.
>
> The hardware supports any arbitrary vector length up to 64-lanes via
> masking, but GCC cannot (yet) make full use of them due to middle-end
> limitations.  Adding smaller "virtual" vector sizes increases the
> complexity of the backend a little, but opens up optimization
> opportunities for the current middle-end implementation somewhat. In
> particular, it enables many more cases of SLP optimization.
>
> The patchset gives approximately 100 additional test PASS and a few extra
> FAIL.  However, the failures are not new issues, but rather existing
> problems that did not show up because the code did not previously
> vectorize.  Expanding the testcase to allow 64-lane vectors shows the
> same problems there.
>
> I shall backport these patches to the OG12 branch shortly.

I suppose that until you change the related_vector_mode hook, the PR107096
issue will not hit you, but at least it's then latent ...

>
> Andrew
>
> Andrew Stubbs (6):
>   amdgcn: add multiple vector sizes
>   amdgcn: Resolve insn conditions at compile time
>   amdgcn: Add vec_extract for partial vectors
>   amdgcn: vec_init for multiple vector sizes
>   amdgcn: Add vector integer negate insn
>   amdgcn: vector testsuite tweaks
>
>  gcc/config/gcn/gcn-modes.def                  |   82 ++
>  gcc/config/gcn/gcn-protos.h                   |   24 +-
>  gcc/config/gcn/gcn-valu.md                    |  399 +++++--
>  gcc/config/gcn/gcn.cc                         | 1063 +++++++++++------
>  gcc/config/gcn/gcn.h                          |   24 +
>  gcc/testsuite/gcc.dg/pr104464.c               |    2 +
>  gcc/testsuite/gcc.dg/signbit-2.c              |    5 +-
>  gcc/testsuite/gcc.dg/signbit-5.c              |    1 +
>  gcc/testsuite/gcc.dg/vect/bb-slp-68.c         |    5 +-
>  gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c     |    3 +-
>  .../gcc.dg/vect/bb-slp-subgroups-3.c          |    5 +-
>  .../gcc.dg/vect/no-vfa-vect-depend-2.c        |    3 +-
>  gcc/testsuite/gcc.dg/vect/pr33953.c           |    3 +-
>  gcc/testsuite/gcc.dg/vect/pr65947-12.c        |    3 +-
>  gcc/testsuite/gcc.dg/vect/pr65947-13.c        |    3 +-
>  gcc/testsuite/gcc.dg/vect/pr80631-2.c         |    3 +-
>  gcc/testsuite/gcc.dg/vect/slp-reduc-4.c       |    3 +-
>  .../gcc.dg/vect/trapv-vect-reduc-4.c          |    3 +-
>  gcc/testsuite/lib/target-supports.exp         |    3 +-
>  19 files changed, 1183 insertions(+), 454 deletions(-)
>
> --
> 2.37.0
>

Andrew Stubbs Oct. 11, 2022, 11:53 a.m. UTC | #2

On 11/10/2022 12:29, Richard Biener wrote:
> On Tue, Oct 11, 2022 at 1:03 PM Andrew Stubbs <ams@codesourcery.com> wrote:
>>
>> This patch series adds additional vector sizes for the amdgcn backend.
>>
>> The hardware supports any arbitrary vector length up to 64-lanes via
>> masking, but GCC cannot (yet) make full use of them due to middle-end
>> limitations.  Adding smaller "virtual" vector sizes increases the
>> complexity of the backend a little, but opens up optimization
>> opportunities for the current middle-end implementation somewhat. In
>> particular, it enables many more cases of SLP optimization.
>>
>> The patchset gives approximately 100 additional test PASS and a few extra
>> FAIL.  However, the failures are not new issues, but rather existing
>> problems that did not show up because the code did not previously
>> vectorize.  Expanding the testcase to allow 64-lane vectors shows the
>> same problems there.
>>
>> I shall backport these patches to the OG12 branch shortly.
> 
> I suppose until you change the related_vector_mode hook the PR107096 issue
> will not hit you but at least it's then latent ...

How do you mean, change it?

static opt_machine_mode
gcn_related_vector_mode (machine_mode vector_mode,
                          scalar_mode element_mode, poly_uint64 nunits)
{
   int n = nunits.to_constant ();

   if (n == 0)
     n = GET_MODE_NUNITS (vector_mode);

   return VnMODE (n, element_mode);
}


It returns what it's asked for, always matching the number of lanes (not 
the bitsize), which is most likely the most natural for GCN.
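
To make the lanes-versus-bitsize distinction concrete, here is a small
stand-alone sketch.  It is a toy model, not the GCC hook: ToyVecMode,
related_by_lanes and related_by_size are hypothetical names, and the
size-preserving policy is included only for contrast.

```cpp
#include <cassert>

// Hypothetical stand-in for a vector machine mode; not GCC's machine_mode.
struct ToyVecMode {
  int lanes;         // number of elements
  int element_bits;  // width of each element in bits
  int total_bits() const { return lanes * element_bits; }
};

// Lane-preserving policy, modelled on the behaviour described above:
// requested_lanes == 0 means "same lane count as the reference mode".
inline ToyVecMode related_by_lanes(ToyVecMode ref, int element_bits,
                                   int requested_lanes) {
  assert(requested_lanes >= 0 && element_bits > 0);
  int n = requested_lanes == 0 ? ref.lanes : requested_lanes;
  return {n, element_bits};
}

// Size-preserving policy, for contrast: keep the total width constant and
// let the lane count change with the element width.
inline ToyVecMode related_by_size(ToyVecMode ref, int element_bits) {
  assert(element_bits > 0 && ref.total_bits() % element_bits == 0);
  return {ref.total_bits() / element_bits, element_bits};
}
```

With a 64-lane, 32-bit reference mode and a 64-bit element, the first policy
still yields 64 lanes, while the second yields 32 lanes of the same total
width.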

Andrew

Richard Biener Oct. 11, 2022, 11:58 a.m. UTC | #3

On Tue, Oct 11, 2022 at 1:53 PM Andrew Stubbs <ams@codesourcery.com> wrote:
>
> On 11/10/2022 12:29, Richard Biener wrote:
> > On Tue, Oct 11, 2022 at 1:03 PM Andrew Stubbs <ams@codesourcery.com> wrote:
> >>
> >> This patch series adds additional vector sizes for the amdgcn backend.
> >>
> >> The hardware supports any arbitrary vector length up to 64-lanes via
> >> masking, but GCC cannot (yet) make full use of them due to middle-end
> >> limitations.  Adding smaller "virtual" vector sizes increases the
> >> complexity of the backend a little, but opens up optimization
> >> opportunities for the current middle-end implementation somewhat. In
> >> particular, it enables many more cases of SLP optimization.
> >>
> >> The patchset gives approximately 100 additional test PASS and a few extra
> >> FAIL.  However, the failures are not new issues, but rather existing
> >> problems that did not show up because the code did not previously
> >> vectorize.  Expanding the testcase to allow 64-lane vectors shows the
> >> same problems there.
> >>
> >> I shall backport these patches to the OG12 branch shortly.
> >
> > I suppose until you change the related_vector_mode hook the PR107096 issue
> > will not hit you but at least it's then latent ...
>
> How do you mean, change it?
>
> static opt_machine_mode
> gcn_related_vector_mode (machine_mode vector_mode,
>                           scalar_mode element_mode, poly_uint64 nunits)
> {
>    int n = nunits.to_constant ();
>
>    if (n == 0)
>      n = GET_MODE_NUNITS (vector_mode);
>
>    return VnMODE (n, element_mode);
> }
>
>
> It returns what it's asked for, always matching the number of lanes (not
> the bitsize), which is most likely the most natural for GCN.

Yes, change it in any way that no longer honors that.  Or discover the
case (not sure if it actually exists) where the vectorizer itself tricks
you into this by passing down nunits != 0 when vectorizing a loop
(I _think_ that's only done for basic-block vectorization currently).

Richard.

>
> Andrew