Message ID | 20230324153046.3996092-1-frederik@codesourcery.com |
---|---|
Series | openmp: OpenMP 5.1 loop transformation directives |
Message
Frederik Harwath
March 24, 2023, 3:30 p.m. UTC
Hi,

this patch series implements the OpenMP 5.1 "unroll" and "tile" constructs. It includes changes to the C, C++, and Fortran front ends for parsing the new constructs and a new middle-end "omp_transform_loops" pass which implements the transformations in a source-language-agnostic way.

The "unroll" and "tile" directives are internally implemented as clauses. This fits the representation of collapsed loop nests by a single internal gomp_for construct: loop transformations can be applied to loops at the different levels of such a loop nest, and this can be represented well with the clause representation. The transformations can also be applied to loops which are not going to be associated with any OpenMP directive after the transformation. This is represented by a new gomp_for kind. Loops of this kind are lowered in the transformation pass since they are not subject to any further OpenMP-specific processing.

The patches are roughly presented in the order of their development: each construct is implemented in the Fortran front end first, including the middle-end additions/changes, followed by a patch that adds the C and C++ front-end changes. This initial implementation supports the loop transformation constructs on the outermost loop of a loop nest only; the support for applying the transformations to inner loops is then added in two further patches.

The patches have been bootstrapped and tested on x86_64-linux-gnu with both nvptx-none and amdgcn-amdhsa offloading.

Best regards,
Frederik

Frederik Harwath (7):
  openmp: Add Fortran support for "omp unroll" directive
  openmp: Add C/C++ support for "omp unroll" directive
  openacc: Rename OMP_CLAUSE_TILE to OMP_CLAUSE_OACC_TILE
  openmp: Add Fortran support for "omp tile"
  openmp: Add C/C++ support for "omp tile"
  openmp: Add Fortran support for loop transformations on inner loops
  openmp: Add C/C++ support for loop transformations on inner loops

gcc/Makefile.in | 1 + gcc/c-family/c-gimplify.cc | 1 + gcc/c-family/c-omp.cc | 12 +- gcc/c-family/c-pragma.cc | 2 + gcc/c-family/c-pragma.h | 7 +- gcc/c/c-parser.cc | 403 +++- gcc/c/c-typeck.cc | 10 +- gcc/cp/cp-gimplify.cc | 3 + gcc/cp/parser.cc | 453 ++++- gcc/cp/pt.cc | 15 +- gcc/cp/semantics.cc | 104 +- gcc/fortran/dump-parse-tree.cc | 30 + gcc/fortran/gfortran.h | 12 +- gcc/fortran/match.h | 2 + gcc/fortran/openmp.cc | 460 ++++- gcc/fortran/parse.cc | 52 +- gcc/fortran/resolve.cc | 6 + gcc/fortran/st.cc | 2 + gcc/fortran/trans-openmp.cc | 187 +- gcc/fortran/trans.cc | 2 + gcc/gimple-pretty-print.cc | 6 + gcc/gimple.h | 1 + gcc/gimplify.cc | 79 +- gcc/omp-general.cc | 22 +- gcc/omp-general.h | 1 + gcc/omp-low.cc | 6 +- gcc/omp-transform-loops.cc | 1773 +++++++++++++++++ gcc/params.opt | 9 + gcc/passes.def | 1 + .../loop-transforms/imperfect-loop-nest.c | 12 + .../gomp/loop-transforms/tile-1.c | 164 ++ .../gomp/loop-transforms/tile-2.c | 183 ++ .../gomp/loop-transforms/tile-3.c | 117 ++ .../gomp/loop-transforms/tile-4.c | 322 +++ .../gomp/loop-transforms/tile-5.c | 150 ++ .../gomp/loop-transforms/tile-6.c | 34 + .../gomp/loop-transforms/tile-7.c | 31 + .../gomp/loop-transforms/tile-8.c | 40 + .../gomp/loop-transforms/unroll-1.c | 133 ++ .../gomp/loop-transforms/unroll-2.c | 95 + .../gomp/loop-transforms/unroll-3.c | 18 + .../gomp/loop-transforms/unroll-4.c | 19 + .../gomp/loop-transforms/unroll-5.c | 19 + .../gomp/loop-transforms/unroll-6.c | 20 + .../gomp/loop-transforms/unroll-7.c | 144 ++ .../gomp/loop-transforms/unroll-inner-1.c | 15 + .../gomp/loop-transforms/unroll-inner-2.c | 31 +
.../gomp/loop-transforms/unroll-non-rect-1.c | 37 + .../gomp/loop-transforms/unroll-non-rect-2.c | 22 + .../gomp/loop-transforms/unroll-simd-1.c | 84 + .../g++.dg/gomp/loop-transforms/tile-1.h | 27 + .../g++.dg/gomp/loop-transforms/tile-1a.C | 27 + .../g++.dg/gomp/loop-transforms/tile-1b.C | 27 + .../g++.dg/gomp/loop-transforms/unroll-1.C | 42 + .../g++.dg/gomp/loop-transforms/unroll-2.C | 47 + .../g++.dg/gomp/loop-transforms/unroll-3.C | 37 + .../gomp/loop-transforms/inner-loops.f90 | 124 ++ .../gomp/loop-transforms/tile-1.f90 | 163 ++ .../gomp/loop-transforms/tile-1a.f90 | 10 + .../gomp/loop-transforms/tile-2.f90 | 80 + .../gomp/loop-transforms/tile-3.f90 | 18 + .../gomp/loop-transforms/tile-4.f90 | 95 + .../loop-transforms/tile-imperfect-nest.f90 | 93 + .../loop-transforms/tile-inner-loops-1.f90 | 16 + .../loop-transforms/tile-inner-loops-2.f90 | 23 + .../loop-transforms/tile-inner-loops-3.f90 | 22 + .../loop-transforms/tile-inner-loops-3a.f90 | 31 + .../loop-transforms/tile-inner-loops-4.f90 | 30 + .../loop-transforms/tile-inner-loops-4a.f90 | 26 + .../loop-transforms/tile-inner-loops-5.f90 | 123 ++ .../tile-non-rectangular-1.f90 | 71 + .../tile-non-rectangular-2.f90 | 12 + .../gomp/loop-transforms/tile-unroll-1.f90 | 57 + .../gomp/loop-transforms/unroll-1.f90 | 277 +++ .../gomp/loop-transforms/unroll-10.f90 | 7 + .../gomp/loop-transforms/unroll-11.f90 | 75 + .../gomp/loop-transforms/unroll-12.f90 | 29 + .../gomp/loop-transforms/unroll-2.f90 | 22 + .../gomp/loop-transforms/unroll-3.f90 | 17 + .../gomp/loop-transforms/unroll-4.f90 | 18 + .../gomp/loop-transforms/unroll-5.f90 | 18 + .../gomp/loop-transforms/unroll-6.f90 | 19 + .../gomp/loop-transforms/unroll-7.f90 | 62 + .../gomp/loop-transforms/unroll-8.f90 | 22 + .../gomp/loop-transforms/unroll-9.f90 | 18 + .../loop-transforms/unroll-inner-loop.f90 | 57 + .../loop-transforms/unroll-no-clause-1.f90 | 20 + .../loop-transforms/unroll-no-clause-2.f90 | 21 + .../loop-transforms/unroll-no-clause-3.f90 | 23 + .../loop-transforms/unroll-non-rect-1.f90 | 31 + .../gomp/loop-transforms/unroll-simd-1.f90 | 244 +++ .../gomp/loop-transforms/unroll-simd-2.f90 | 57 + .../gomp/loop-transforms/unroll-tile-1.f90 | 37 + .../gomp/loop-transforms/unroll-tile-2.f90 | 41 + .../loop-transforms/unroll-tile-inner-1.f90 | 25 + gcc/tree-core.h | 14 +- gcc/tree-nested.cc | 4 +- gcc/tree-pass.h | 1 + gcc/tree-pretty-print.cc | 56 +- gcc/tree.cc | 11 +- gcc/tree.def | 6 + gcc/tree.h | 23 +- .../libgomp.c++/loop-transforms/tile-2.C | 69 + .../libgomp.c++/loop-transforms/tile-3.C | 28 + .../libgomp.c++/loop-transforms/unroll-1.C | 73 + .../libgomp.c++/loop-transforms/unroll-2.C | 34 + .../loop-transforms/unroll-full-tile.C | 84 + .../loop-transforms/matrix-1.h | 70 + .../loop-transforms/matrix-constant-iter.h | 71 + .../loop-transforms/matrix-helper.h | 19 + .../loop-transforms/matrix-no-directive-1.c | 11 + .../matrix-no-directive-unroll-full-1.c | 13 + .../matrix-omp-distribute-parallel-for-1.c | 6 + .../loop-transforms/matrix-omp-for-1.c | 13 + .../matrix-omp-parallel-for-1.c | 13 + .../matrix-omp-parallel-masked-taskloop-1.c | 6 + ...trix-omp-parallel-masked-taskloop-simd-1.c | 6 + .../matrix-omp-target-parallel-for-1.c | 13 + ...p-target-teams-distribute-parallel-for-1.c | 6 + .../loop-transforms/matrix-omp-taskloop-1.c | 6 + ...trix-omp-teams-distribute-parallel-for-1.c | 6 + .../loop-transforms/matrix-simd-1.c | 6 + .../matrix-transform-variants-1.h | 191 ++ .../loop-transforms/unroll-1.c | 76 + .../loop-transforms/unroll-non-rect-1.c | 129 ++ 
.../loop-transforms/inner-1.f90 | 77 + .../loop-transforms/tile-1.f90 | 71 + .../loop-transforms/tile-2.f90 | 117 ++ .../loop-transforms/tile-unroll-1.f90 | 112 ++ .../loop-transforms/tile-unroll-2.f90 | 71 + .../loop-transforms/tile-unroll-3.f90 | 77 + .../loop-transforms/tile-unroll-4.f90 | 75 + .../loop-transforms/unroll-1.f90 | 52 + .../loop-transforms/unroll-2.f90 | 88 + .../loop-transforms/unroll-3.f90 | 59 + .../loop-transforms/unroll-4.f90 | 72 + .../loop-transforms/unroll-5.f90 | 55 + .../loop-transforms/unroll-6.f90 | 105 + .../loop-transforms/unroll-7.f90 | 198 ++ .../loop-transforms/unroll-7a.f90 | 7 + .../loop-transforms/unroll-7b.f90 | 7 + .../loop-transforms/unroll-7c.f90 | 7 + .../loop-transforms/unroll-8.f90 | 38 + .../loop-transforms/unroll-simd-1.f90 | 33 + .../loop-transforms/unroll-tile-1.f90 | 112 ++ .../loop-transforms/unroll-tile-2.f90 | 71 + 146 files changed, 10154 insertions(+), 107 deletions(-) create mode 100644 gcc/omp-transform-loops.cc create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/imperfect-loop-nest.c create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-1.c create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-2.c create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-3.c create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-4.c create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-5.c create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-6.c create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-7.c create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-8.c create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-1.c create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-2.c create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-3.c create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-4.c create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-5.c create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-6.c create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-7.c create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-inner-1.c create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-inner-2.c create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-non-rect-1.c create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-non-rect-2.c create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-simd-1.c create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/tile-1.h create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/tile-1a.C create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/tile-1b.C create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/unroll-1.C create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/unroll-2.C create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/unroll-3.C create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/inner-loops.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-1.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-1a.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-2.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-3.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-4.f90 create 
mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-imperfect-nest.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-1.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-2.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-3.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-3a.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-4.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-4a.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-inner-loops-5.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-non-rectangular-1.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-non-rectangular-2.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-unroll-1.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-1.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-10.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-11.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-12.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-2.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-3.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-4.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-5.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-6.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-7.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-inner-loop.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-no-clause-1.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-no-clause-2.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-no-clause-3.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-non-rect-1.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-simd-1.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-simd-2.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-inner-1.f90 create mode 100644 libgomp/testsuite/libgomp.c++/loop-transforms/tile-2.C create mode 100644 libgomp/testsuite/libgomp.c++/loop-transforms/tile-3.C create mode 100644 libgomp/testsuite/libgomp.c++/loop-transforms/unroll-1.C create mode 100644 libgomp/testsuite/libgomp.c++/loop-transforms/unroll-2.C create mode 100644 libgomp/testsuite/libgomp.c++/loop-transforms/unroll-full-tile.C create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-1.h create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-constant-iter.h create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-helper.h create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-1.c 
create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-unroll-full-1.c create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-distribute-parallel-for-1.c create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-for-1.c create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-for-1.c create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-1.c create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-simd-1.c create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-parallel-for-1.c create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-teams-distribute-parallel-for-1.c create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-taskloop-1.c create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-teams-distribute-parallel-for-1.c create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-simd-1.c create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-transform-variants-1.h create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/unroll-1.c create mode 100644 libgomp/testsuite/libgomp.c-c++-common/loop-transforms/unroll-non-rect-1.c create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/inner-1.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-1.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-1.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-2.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-3.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-4.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-2.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-3.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-4.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-5.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-6.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-7.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-7a.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-7b.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-7c.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-8.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-tile-1.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-tile-2.f90 -- 2.36.1 ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
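For context, here is a minimal usage sketch of the two source-level directives the series implements. The function names, bounds, and loop bodies are invented for illustration and are not taken from the patches or their testsuites; only the OpenMP 5.1 directive syntax is as specified.

    // Minimal illustration (not from the patches): OpenMP 5.1 loop
    // transformation directives on otherwise ordinary loop nests.
    void tile_example (double *a)
    {
      // "tile" rewrites the i/j nest into 8x8 blocks; the two generated
      // "grid" loops are canonical and can be associated with collapse(2).
      #pragma omp for collapse(2)
      #pragma omp tile sizes(8, 8)
      for (int i = 0; i < 64; ++i)
        for (int j = 0; j < 64; ++j)
          a[i * 64 + j] *= 2.0;
    }

    void unroll_example (double *a)
    {
      // "unroll partial(4)" generates a single canonical loop that steps
      // in blocks of 4 and replicates the body; "omp for" then workshares
      // the generated loop.
      #pragma omp for
      #pragma omp unroll partial(4)
      for (int i = 0; i < 64; ++i)
        a[i] += 1.0;
    }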
Comments
On Fri, Mar 24, 2023 at 04:30:38PM +0100, Frederik Harwath wrote:
> this patch series implements the OpenMP 5.1 "unroll" and "tile"
> constructs. It includes changes to the C, C++, and Fortran front ends
> for parsing the new constructs and a new middle-end
> "omp_transform_loops" pass which implements the transformations in a
> source-language-agnostic way.

I'm afraid we can't do it this way, at least not completely.

The OpenMP requirements and what is being discussed for further loop transformations pretty much require parts of it to be done as soon as possible. My understanding is that that is where other implementations implement that too, and I would also prefer GCC not to be the only implementation that takes significantly different decisions in that case from other implementations, like e.g. in the offloading case (where all other implementations preprocess/parse etc. the source multiple times compared to GCC splitting stuff only at IPA time; this affects what can be done with metadirectives, declare variant etc.).

Now, e.g. data sharing is done almost exclusively during gimplification; the proposed pass is later than that, and it needs to be done before the data sharing. Ditto doacross handling. The normal loop constructs (OMP_FOR, OMP_SIMD, OMP_DISTRIBUTE, OMP_LOOP) already need to know, given their collapse/ordered, how many loops they are actually associated with, and the loop transformation constructs can change that.

So, I think we need to do the loop transformations in the FEs; that doesn't mean we need to write everything 3 times, once for each front end. Already now, e.g. various stuff is shared between the C and C++ FEs in c-family, though how much can be shared between c-family and Fortran is to be discovered. Or at least do it partially, to the extent that we compute how many canonical loops the loop transformations result in, what artificial iterators they will use etc., so that during gimplification we can take all that into account and then can do the actual transformations later.

For C, I think the lowering of loop transformation constructs or at least determining what it means can be done right after we actually parse it and before we finalize the OMP_FOR etc. that wraps it if any. As discussed last week at F2F, I think we want to remember in OMP_FOR_ORIG_DECLS the user iterators on the loop transformation constructs and take it into account for data sharing purposes.

For C++ in templates we obviously need to defer that until instantiations, the constants in the clauses etc. could be template parameters etc.

For Fortran, during resolving.

> The "unroll" and "tile" directives are
> internally implemented as clauses. This fits the representation of

So perhaps just use OMP_UNROLL/OMP_TILE as GENERIC constructs like OMP_FOR etc. but with some argument where from the early loop transformation analysis you can remember the important stuff, whether the loop transformation results in a canonical loop nest or not and in the former case with how many nested loops.

And then handle the actual transformation IMHO best at gimplification time, find them in the OMP_FOR etc. body if they are nested in there, let the transformation happen on GENERIC before the containing OMP_FOR etc. if any is actually finalized and from the transformation remember the original user decls and what should happen with them for data sharing (e.g. lastprivate/lastprivate conditional). From the slides I saw last week, a lot of other transformations are in the planning, like loop reversal etc.

And, I think even in OpenMP 5.1 nothing prevents e.g.

#pragma omp for collapse(3) // etc.
#pragma omp tile sizes (4, 2, 2)
#pragma omp tile sizes (4, 8, 16)
for (int i = 0; i < 64; ++i)
  for (int j = 0; j < 64; ++j)
    for (int k = 0; k < 64; ++k)
      body;

where the inner tile takes the i, j and k loops and makes

for (int i1 = 0; i1 < 64; i1 += 4)
  for (int j1 = 0; j1 < 64; j1 += 8)
    for (int k1 = 0; k1 < 64; k1 += 16)
      for (int i2 = 0; i2 < 4; i2++)
        {
          int i = i1 + i2;
          for (int j2 = 0; j2 < 8; j2++)
            {
              int j = j1 + j2;
              for (int k2 = 0; k2 < 16; k2++)
                {
                  int k = k1 + k2;
                  body;
                }
            }
        }

out of it, with 3 outer loops which have canonical loop form (the rest doesn't). And then the second tile takes the outermost 3 of those generated loops and tiles them again, making it again into 3 canonical-loop-form loops plus stuff inside of them. Or one can replace the

#pragma omp for collapse(3) // etc.

with

#pragma omp for
#pragma omp unroll partial(2)

which furthermore unrolls the outermost generated loop from the outer tile, turning it into 1 canonical-loop-form loop plus stuff in it.

Or of course, as you have in your testcases, some loop transformation constructs could be used on more nested loops, not necessarily before the outermost one. But still, in all cases you need to know quite early how many canonical-loop-form nested loops you get from each loop transformation, so that it can be e.g. checked against the collapse/ordered clauses.

Feel free to disagree if you think your approach is able to handle all of this, just put in details on why you think so.

	Jakub
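To make the last point concrete, here is a small invented negative example (not from the thread or the testsuite) of the early checking described above: whether a collapse clause can be satisfied depends on the loops that exist after the transformation, so the front end must already know how many canonical loops the transformation generates.

    // Invented example, intentionally invalid: after "unroll partial(2)"
    // only one canonical loop remains at this nesting level (the unrolled
    // copies of the j/k loops live inside its body), so collapse(3)
    // cannot be satisfied and should be diagnosed.
    void invalid_collapse_example ()
    {
      #pragma omp for collapse(3)   // error: only 1 generated canonical loop
      #pragma omp unroll partial(2)
      for (int i = 0; i < 64; ++i)
        for (int j = 0; j < 64; ++j)
          for (int k = 0; k < 64; ++k)
            ;
    }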
On Mon, May 15, 2023 at 12:19:00PM +0200, Jakub Jelinek via Gcc-patches wrote:
> For C++ in templates we obviously need to defer that until instantiations,
> the constants in the clauses etc. could be template parameters etc.

Even in C++, the question of how many canonical-loop-nest-form loops a transformation generates can probably be answered during parsing, at least for the 5.1/5.2 loop transformations. I think we don't really allow

template <int ...args>
void foo ()
{
  #pragma omp for collapse(2)
  #pragma omp tile sizes(args...)
  for (int i = 0; i < 64; i++)
    for (int j = 0; j < 64; j++)
      for (int k = 0; k < 64; k++)
        ;
}

where the number of arguments of the sizes clause would be determined only after instantiation. Of course, we don't know the exact values...

	Jakub
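As a contrast to the rejected pack-expansion case, a sketch (invented for illustration) of the situation Jakub alludes to in the last sentence: template parameters may appear in the sizes clause, so the values are unknown in the template, but the number of generated canonical loops is already fixed at parse time.

    // The sizes clause has exactly two entries, so the parser knows the
    // tile construct generates two canonical loops and collapse(2) can be
    // checked before instantiation; only the tile sizes A and B stay
    // unknown until the template is instantiated.
    template <int A, int B>
    void foo ()
    {
      #pragma omp for collapse(2)
      #pragma omp tile sizes(A, B)
      for (int i = 0; i < 64; i++)
        for (int j = 0; j < 64; j++)
          ;
    }

    template void foo<4, 8> ();   // explicit instantiation with concrete sizes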
Hi Jakub,

On 15.05.23 12:19, Jakub Jelinek wrote:
> On Fri, Mar 24, 2023 at 04:30:38PM +0100, Frederik Harwath wrote:
>> this patch series implements the OpenMP 5.1 "unroll" and "tile"
>> constructs. It includes changes to the C, C++, and Fortran front ends
>> for parsing the new constructs and a new middle-end
>> "omp_transform_loops" pass which implements the transformations in a
>> source-language-agnostic way.
> I'm afraid we can't do it this way, at least not completely.
>
> The OpenMP requirements and what is being discussed for further loop
> transformations pretty much require parts of it to be done as soon as possible.
> My understanding is that that is where other implementations implement that
> too, and I would also prefer GCC not to be the only implementation that takes
> significantly different decisions in that case from other implementations

The place where different compilers implement the loop transformations was discussed in an OpenMP loop transformation meeting last year. Two compilers (another one and GCC with this patch series) transformed the loops in the middle end after the handling of data sharing, and one planned to do so. Yet another vendor had not yet decided where it will be implemented. Clang currently does everything in the front end, but it was mentioned that this might change in the future, e.g. for code sharing with Flang. Implementing the loop transformations late could potentially complicate the implementation of transformations which require adjustments of the data-sharing clauses, but this is known and, consequently, no such transformations are planned for OpenMP 6.0. In particular, the "apply" clause therefore only permits loop-transforming constructs to be applied to the loops generated from other loop transformations in TR11.

> The normal loop constructs (OMP_FOR, OMP_SIMD, OMP_DISTRIBUTE, OMP_LOOP)
> already need to know, given their collapse/ordered, how many loops they are
> actually associated with, and the loop transformation constructs can change
> that.
> So, I think we need to do the loop transformations in the FEs; that doesn't
> mean we need to write everything 3 times, once for each front end.
> Already now, e.g. various stuff is shared between the C and C++ FEs in c-family,
> though how much can be shared between c-family and Fortran is to be
> discovered.
> Or at least do it partially, to the extent that we compute how many canonical
> loops the loop transformations result in, what artificial iterators they
> will use etc., so that during gimplification we can take all that into
> account and then can do the actual transformations later.

The patches in this patch series already do compute how many canonical loop nests result from the loop transformations in the front end. This is necessary to represent the loop nest that is affected by the loop transformations by a single OMP_FOR to meet the expectations of all later OpenMP code transformations. This is also the major reason why the loop transformations are represented by clauses instead of representing them as "OMP_UNROLL/OMP_TILE as GENERIC constructs like OMP_FOR" as you suggest below. Since the loop transformations may also appear on inner loops of a collapsed loop nest (i.e. within the collapsed depth), representing the transformation by OMP_FOR-like constructs would imply that a collapsed loop nest would have to be broken apart into single loops. Perhaps this could be handled somehow, but the collapsed loop nest would have to be re-assembled to meet the expectations of e.g. gimplification.

The clause representation is also much better suited for the upcoming OpenMP "apply" clause, where the transformations will not appear as directives in front of actual loops but inside of other clauses. In fact, the loop transformation clauses in the implementation already specify the level of a loop nest to which they apply, and it could be possible to re-use this handling for "apply".

My initial reaction also was to implement the loop transformations as OMP_FOR-like constructs, and the patch actually introduces an OMP_LOOP_TRANS construct which is used to represent loops that are not going to be associated with another OpenMP directive after the transformation, e.g.

void foo ()
{
  #pragma omp tile sizes (4, 8, 16)
  for (int i = 0; i < 64; ++i)
    { ... }
}

You suggest to implement the loop transformations during gimplification. I am not sure if gimplification is actually well-suited to implement the depth-first evaluation of the loop transformations. I also believe that gimplification already handles too many things which conceptually are not related to the translation to GIMPLE. Having a separate pass seems to be the right move to achieve a better separation of concerns. I think this will be even more important in the future, as the size of the loop transformation implementation keeps growing. As you mention below, several new constructs are already planned.

> For C, I think the lowering of loop transformation constructs or at least
> determining what it means can be done right after we actually parse it and
> before we finalize the OMP_FOR etc. that wraps it if any. As discussed last
> week at F2F, I think we want to remember in OMP_FOR_ORIG_DECLS the user
> iterators on the loop transformation constructs and take it into account
> for data sharing purposes.
>
> For C++ in templates we obviously need to defer that until instantiations,
> the constants in the clauses etc. could be template parameters etc.
>
> For Fortran, during resolving.
>
>> The "unroll" and "tile" directives are
>> internally implemented as clauses. This fits the representation of
> So perhaps just use OMP_UNROLL/OMP_TILE as GENERIC constructs like
> OMP_FOR etc. but with some argument where from the early loop
> transformation analysis you can remember the important stuff, whether
> the loop transformation results in a canonical loop nest or not
> and in the former case with how many nested loops.
>
> And then handle the actual transformation IMHO best at gimplification
> time, find them in the OMP_FOR etc. body if they are nested in there,
> let the transformation happen on GENERIC before the containing OMP_FOR
> etc. if any is actually finalized and from the transformation remember
> the original user decls and what should happen with them for data sharing
> (e.g. lastprivate/lastprivate conditional).
> From the slides I saw last week, a lot of other transformations are in the
> planning, like loop reversal etc.

Correct, more transformations are planned. I have been following the design discussions and I do not think that any of those constructs will cause problems for the implementation contained in this patch series. Or do you see one that would be problematic?

> And, I think even in OpenMP 5.1 nothing prevents e.g.
>
> #pragma omp for collapse(3) // etc.
> #pragma omp tile sizes (4, 2, 2)
> #pragma omp tile sizes (4, 8, 16)
> for (int i = 0; i < 64; ++i)
>   for (int j = 0; j < 64; ++j)
>     for (int k = 0; k < 64; ++k)
>       body;
>
> where the inner tile takes the i, j and k loops and makes
>
> for (int i1 = 0; i1 < 64; i1 += 4)
>   for (int j1 = 0; j1 < 64; j1 += 8)
>     for (int k1 = 0; k1 < 64; k1 += 16)
>       for (int i2 = 0; i2 < 4; i2++)
>         {
>           int i = i1 + i2;
>           for (int j2 = 0; j2 < 8; j2++)
>             {
>               int j = j1 + j2;
>               for (int k2 = 0; k2 < 16; k2++)
>                 {
>                   int k = k1 + k2;
>                   body;
>                 }
>             }
>         }
>
> out of it, with 3 outer loops which have canonical loop form (the rest
> doesn't). And then the second tile takes the outermost 3 of those generated
> loops and tiles them again, making it again into 3 canonical-loop-form
> loops plus stuff inside of them.
> Or one can replace the
> #pragma omp for collapse(3) // etc.
> with
> #pragma omp for
> #pragma omp unroll partial(2)
> which furthermore unrolls the outermost generated loop from the outer tile,
> turning it into 1 canonical-loop-form loop plus stuff in it.

You are right, this is valid, and the implementation in this patch series is able to cope with it. The patches contain corresponding tests. The most extensive testing for different combinations of loop transformations, collapse values, and other directives is done in the libgomp/testsuite/libgomp.c-c++-common/loop-transforms/matrix-* tests.

> Or of course, as you have in your testcases, some loop transformation
> constructs could be used on more nested loops, not necessarily before
> the outermost one. But still, in all cases you need to know quite early
> how many canonical-loop-form nested loops you get from each loop
> transformation, so that it can be e.g. checked against the collapse/ordered
> clauses.

You are right, and this happens in the implementation. The front ends extend the "collapse" (i.e. the loop nest depth) of the OMP_FOR-like constructs as necessary for the transformations (and reject invalid combinations, e.g. a "collapse" larger than the tiled depth, early), and the transformation pass reduces the "collapse" of the gomp_for to the value requested by the user code after the transformations have been executed.

> Feel free to disagree if you think your approach is able to handle all of
> this, just put in details on why you think so.

I think it is able to handle everything, as I described above.

Thanks,
Frederik
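A small invented sketch of the depth-first evaluation order referred to above; the structure follows the nested-transformation cases discussed in the thread, but the function name, bounds, and body are illustrative only.

    // The "unroll partial(2)" on the j loop is evaluated first and yields
    // one canonical loop; the enclosing "tile" then applies to the i loop
    // plus that generated loop, so sizes() needs exactly two entries.
    void depth_first_example (double (*a)[64])
    {
      #pragma omp tile sizes(4, 4)
      for (int i = 0; i < 64; ++i)
        #pragma omp unroll partial(2)
        for (int j = 0; j < 64; ++j)
          a[i][j] += 1.0;
    }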
On Tue, May 16, 2023 at 11:45:16AM +0200, Frederik Harwath wrote:
> The place where different compilers implement the loop transformations
> was discussed in an OpenMP loop transformation meeting last year. Two
> compilers (another one and GCC with this patch series) transformed the loops
> in the middle end after the handling of data sharing, and one planned to do so.
> Yet another vendor had not yet decided where it will be implemented. Clang
> currently does everything in the front end, but it was mentioned that this
> might change in the future, e.g. for code sharing with Flang. Implementing
> the loop transformations late could potentially
> complicate the implementation of transformations which require adjustments
> of the data-sharing clauses, but this is known and, consequently, no such

When already in the FE we determine how many canonical loops a particular loop transformation creates, the primary change I'd like to see is really to have OMP_UNROLL/OMP_TILE GENERIC statements (see below) and to consider where the best spot to lower them is. I believe for data sharing it is best done during gimplification, before the containing loops are handled: it is already shared code among all the FEs, I think it will make it easier to handle data sharing right, and gimplification is also where doacross processing is done. While there is a restriction that the ordered clause is incompatible with generated loops from a tile construct, there isn't one for unroll (unless "The ordered clause must not appear on a worksharing-loop directive if the associated loops include the generated loops of a tile directive." implicitly covers partial unroll because partial unroll tiles the loop, but it doesn't say it acts as if it were a tile construct), so we'd have to handle

#pragma omp for ordered(2)
for (int i = 0; i < 64; i++)
  #pragma omp unroll partial(4)
  for (int j = 0; j < 64; j++)
    {
      #pragma omp ordered depend (sink: i - 1, j - 2)
      #pragma omp ordered depend (source)
    }

and I think handling it after gimplification is going to be increasingly harder. Of course, another possibility is to ask the lang committee to clarify, unless it has been clarified already in 6.0 (but in TR11 it is not).

Also, I think creating temporaries is easier to be done during gimplification than later.

Another option is, as you implemented, a separate pre-omp-lowering pass, and another one would be to do it in the omplower pass, which actually has several subpasses internally; do it in the scan phase. The disadvantage of a completely separate pass is that we have to walk the whole IL again, while doing it in the scan phase means we avoid that cost. We already do similar transformations there: scan_omp_simd transforms simd constructs into if (...) simd else simt and then we process it with normal scan_omp_for on what we've created. So, if you insist on doing it after gimplification, perhaps for compatibility with other non-LLVM compilers, I'd prefer to do it there rather than in a completely separate pass.

> transformations are planned for OpenMP 6.0. In particular, the "apply"
> clause therefore only permits loop-transforming constructs to be applied to
> the loops generated from other loop
> transformations in TR11.
>
> > The normal loop constructs (OMP_FOR, OMP_SIMD, OMP_DISTRIBUTE, OMP_LOOP)
> > already need to know, given their collapse/ordered, how many loops they are
> > actually associated with, and the loop transformation constructs can change
> > that.
> > So, I think we need to do the loop transformations in the FEs; that doesn't
> > mean we need to write everything 3 times, once for each front end.
> > Already now, e.g. various stuff is shared between the C and C++ FEs in c-family,
> > though how much can be shared between c-family and Fortran is to be
> > discovered.
> > Or at least do it partially, to the extent that we compute how many canonical
> > loops the loop transformations result in, what artificial iterators they
> > will use etc., so that during gimplification we can take all that into
> > account and then can do the actual transformations later.
>
> The patches in this patch series already do compute how many canonical
> loop nests result from the loop transformations in the front end.

Good.

> This is necessary to represent the loop nest that is affected by the
> loop transformations by a single OMP_FOR to meet the expectations
> of all later OpenMP code transformations. This is also the major
> reason why the loop transformations are represented by clauses
> instead of representing them as "OMP_UNROLL/OMP_TILE as
> GENERIC constructs like OMP_FOR" as you suggest below. Since the

I really don't see why. We try to represent what we see in the source as OpenMP constructs as those constructs. We already have a precedent with composite loop constructs, where for the combined constructs which aren't innermost we temporarily use NULL OMP_FOR_{INIT,COND,INCR,ORIG_DECLS} vectors to stand for "this will be some loop, but the details for it aren't known yet, to be filled in later". So, why can't we similarly represent

#pragma omp for collapse(3)
#pragma omp tile sizes (4, 2, 2)
#pragma omp tile sizes (4, 8, 16)
for (int i = 0; i < 64; ++i)
  for (int j = 0; j < 64; ++j)
    for (int k = 0; k < 64; ++k)
      body;

as an OMP_FOR with NULL OMP_FOR_{INIT,COND,INCR,ORIG_DECLS} with the appropriate clauses on it, with an OMP_TILE (again, right clauses, NULL OMP_FOR_{INIT,COND,INCR,ORIG_DECLS}) and another OMP_TILE, this time with all the vectors filled in, in GENERIC?

#pragma omp for collapse(2)
for (int i = 0; i < 64; ++i)
  #pragma omp tile sizes (4)
  for (int j = 0; j < 64; ++j)

would be represented by non-NULL vectors which would have all the inner entries NULL (the outer loop is not a generated loop, the inner one is generated) with the OMP_TILE inside of it.

Then, depending on where the loop transformation is actually performed, we'd either need to preserve such a shape from gimplification until the loop transformations are applied, or it would be done solely on GENERIC and GIMPLE would already have the transformed loops.

Clauses e.g. have the disadvantage that generally they aren't ordered. If it is separate statements, it is e.g. easier to print it right in the original dump, so that people can compare the loops before the transformation and after it.

> You suggest to implement the loop transformations during gimplification.
> I am not sure if gimplification is actually well-suited to implement the
> depth-first evaluation of the loop transformations. I also believe that

Why not? The loop transformation constructs can't be deeply nested in the bodies, they need to be close. gimplify_omp_for already searches the body for the case of composite constructs - if (OMP_FOR_INIT (for_stmt) == NULL_TREE) early in it. So, this would just mean doing it, if that condition is true, also looking for loop transformation constructs (if they are found, pass in the containing OMP_{FOR,SIMD,LOOP,DISTRIBUTE,TASKLOOP}, if any, to a routine that handles the transformation, such that it can update the containing looping construct, if any, during the transformation). That alone would handle the case where the looping construct should work solely with the generated loop. It would need to do the same thing also if OMP_FOR_INIT (for_stmt) is non-NULL but TREE_VEC_ELT (OMP_FOR_INIT (for_stmt), TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt))) is NULL, to handle the case where the generated loops are just some of the inner ones.

And then, when gimplify_omp_for would encounter an OMP_TILE/OMP_UNROLL loop on its own (i.e. not nested inside some other loop), it would similarly find further transform constructs in it like the above, but then would just normally do the loop transformation, with NULL_TREE for the containing loop, meaning it is sequential stuff.

	Jakub
Hi Jakub,

On 16.05.23 13:00, Jakub Jelinek wrote:
> On Tue, May 16, 2023 at 11:45:16AM +0200, Frederik Harwath wrote:
>> The place where different compilers implement the loop transformations
>> was discussed in an OpenMP loop transformation meeting last year. Two
>> compilers (another one and GCC with this patch series) transformed the loops
>> in the middle end after the handling of data sharing, and one planned to do so.
>> Yet another vendor had not yet decided where it will be implemented. Clang
>> currently does everything in the front end, but it was mentioned that this
>> might change in the future, e.g. for code sharing with Flang. Implementing
>> the loop transformations late could potentially
>> complicate the implementation of transformations which require adjustments
>> of the data-sharing clauses, but this is known and, consequently, no such
> When already in the FE we determine how many canonical loops a particular
> loop transformation creates, the primary change I'd like to see is really
> to have OMP_UNROLL/OMP_TILE GENERIC statements (see below) and to consider
> where the best spot to lower them is. I believe for data sharing it is best
> done during gimplification, before the containing loops are handled: it is
> already shared code among all the FEs, I think it will make it easier to
> handle data sharing right, and gimplification is also where doacross
> processing is done. While there is a restriction that the ordered clause is
> incompatible with generated loops from a tile construct, there isn't one
> for unroll (unless
> "The ordered clause must not appear on a worksharing-loop directive if
> the associated loops include the generated loops of a tile directive."
> implicitly covers partial unroll because partial unroll tiles the loop, but
> it doesn't say it acts as if it were a tile construct), so we'd have to handle
>
> #pragma omp for ordered(2)
> for (int i = 0; i < 64; i++)
>   #pragma omp unroll partial(4)
>   for (int j = 0; j < 64; j++)
>     {
>       #pragma omp ordered depend (sink: i - 1, j - 2)
>       #pragma omp ordered depend (source)
>     }
>
> and I think handling it after gimplification is going to be increasingly
> harder. Of course, another possibility is to ask the lang committee to
> clarify, unless it has been clarified already in 6.0 (but in TR11 it is not).

I do not really expect that we will have to handle this. Questions concerning the correctness of code after applying loop transformations have come up several times since I have been following the design meetings, and the result was always either that nothing will be changed, because the loop transformations are not expected to ensure the correctness of enclosing directives, or that the use of the problematic construct in conjunction with loop transformations will be forbidden. Concerning the use of "ordered" on transformed loops, the latter approach was suggested for all transformations, cf. issue #3494 in the private OpenMP spec repository. I see that you have already asked for clarification on unroll. I suppose this could also be fixed after gimplification with reasonable effort. But let's just wait for the result of that discussion before we continue worrying about this.

> Also, I think creating temporaries is easier to be done during
> gimplification than later.

This has not caused problems with the current approach.

> Another option is, as you implemented, a separate pre-omp-lowering pass,
> and another one would be to do it in the omplower pass, which actually has
> several subpasses internally; do it in the scan phase. The disadvantage of
> a completely separate pass is that we have to walk the whole IL again,
> while doing it in the scan phase means we avoid that cost. We already do
> similar transformations there: scan_omp_simd transforms simd constructs
> into if (...) simd else simt and then we process it with normal
> scan_omp_for on what we've created. So, if you insist on doing it after
> gimplification, perhaps for compatibility with other non-LLVM compilers,
> I'd prefer to do it there rather than in a completely separate pass.

I see. This would be possible. My current approach is indeed rather wasteful because the pass is not restricted to functions that actually use loop transformations. I could add an attribute to such functions that could be used to avoid the execution of the pass, and hence the GIMPLE walk, on functions that do not use transformations.

>> This is necessary to represent the loop nest that is affected by the
>> loop transformations by a single OMP_FOR to meet the expectations
>> of all later OpenMP code transformations. This is also the major
>> reason why the loop transformations are represented by clauses
>> instead of representing them as "OMP_UNROLL/OMP_TILE as
>> GENERIC constructs like OMP_FOR" as you suggest below. Since the
> I really don't see why. We try to represent what we see in the source
> as OpenMP constructs as those constructs. We already have a precedent
> with composite loop constructs, where for the combined constructs which
> aren't innermost we temporarily use NULL OMP_FOR_{INIT,COND,INCR,ORIG_DECLS}
> vectors to stand for "this will be some loop, but the details for it aren't
> known yet, to be filled in later". So, why can't we similarly represent
>
> #pragma omp for collapse(3)
> #pragma omp tile sizes (4, 2, 2)
> #pragma omp tile sizes (4, 8, 16)
> for (int i = 0; i < 64; ++i)
>   for (int j = 0; j < 64; ++j)
>     for (int k = 0; k < 64; ++k)
>       body;
>
> as an OMP_FOR with NULL OMP_FOR_{INIT,COND,INCR,ORIG_DECLS}
> with the appropriate clauses on it, with an
> OMP_TILE (again, right clauses, NULL OMP_FOR_{INIT,COND,INCR,ORIG_DECLS})
> and another OMP_TILE, this time with all the vectors filled in, in GENERIC?
>
> #pragma omp for collapse(2)
> for (int i = 0; i < 64; ++i)
>   #pragma omp tile sizes (4)
>   for (int j = 0; j < 64; ++j)
>
> would be represented by non-NULL vectors which would have all the inner
> entries NULL (the outer loop is not a generated loop, the inner one is
> generated) with the OMP_TILE inside of it.
>
> Then, depending on where the loop transformation is actually performed,
> we'd either need to preserve such a shape from gimplification until the
> loop transformations are applied, or it would be done solely on GENERIC and
> GIMPLE would already have the transformed loops.

Thanks for the explanation! I think now I understand how you would do this.

> Clauses e.g. have the disadvantage that generally they aren't ordered.
> If it is separate statements, it is e.g. easier to print it right in the
> original dump, so that people can compare the loops before the
> transformation and after it.

You mean, the clauses are not ordered at the level of the specification? In the implementation they are of course ordered, and the order has proved to be sufficiently stable. But perhaps you mean that you would like to avoid introducing code that relies on the ordering of the clauses?

In this case, I could move the transformations to a separate chain which could be accessed e.g. by OMP_FOR_TRANSFORMS and by gimple_omp_for_transforms per level. This would also allow printing the transformations in the pretty-printing functions at the corresponding levels of the loop nest. This would also be possible somehow with the present representation, but right now the transformations are just printed together with the other clauses on the directive. I considered this to be acceptable because I suppose the dumps will be mostly read by GCC developers. There are also other clauses that are only used internally.

>> You suggest to implement the loop transformations during gimplification.
>> I am not sure if gimplification is actually well-suited to implement the
>> depth-first evaluation of the loop transformations. I also believe that
> Why not? The loop transformation constructs can't be deeply nested in the
> bodies, they need to be close.
> gimplify_omp_for already searches the body for the case of composite
> constructs - if (OMP_FOR_INIT (for_stmt) == NULL_TREE) early in it.
> So, this would just mean doing it, if that condition is true, also looking
> for loop transformation constructs (if they are found, pass in the
> containing OMP_{FOR,SIMD,LOOP,DISTRIBUTE,TASKLOOP}, if any, to a routine
> that handles the transformation, such that it can update the containing
> looping construct, if any, during the transformation).
> That alone would handle the case where the looping construct should work
> solely with the generated loop. It would need to do the same thing
> also if OMP_FOR_INIT (for_stmt) is non-NULL but
> TREE_VEC_ELT (OMP_FOR_INIT (for_stmt), TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)))
> is NULL, to handle the case where the generated loops are just some of the
> inner ones.
> And then, when gimplify_omp_for would encounter an OMP_TILE/OMP_UNROLL
> loop on its own (i.e. not nested inside some other loop), it would similarly
> find further transform constructs in it like the above, but then would just
> normally do the loop transformation, with NULL_TREE for the containing loop,
> meaning it is sequential stuff.

Thanks for the explanation. But actually doing this would require a complete rewrite, which would almost certainly imply that mainline GCC would not support the loop transformations for a long time.

Best regards,
Frederik
On Wed, May 17, 2023 at 01:55:00PM +0200, Frederik Harwath wrote:
> Thanks for the explanation. But actually doing this would require a
> complete rewrite, which would almost certainly imply that mainline GCC
> would not support the loop transformations for a long time.

I don't think it needs a complete rewrite; the change to use OMP_UNROLL/OMP_TILE should actually simplify stuff when you already have some other extra construct to handle the clauses if it isn't nested into something else, so I wouldn't expect it to need more than 2-3 hours of work. It is true that doing the transformation on trees rather than on high GIMPLE is something different, but again it doesn't require everything to be rewritten, and we have code to do code copying both on trees and on high and low GIMPLE in tree-inline.cc, so the unrolling can just use different APIs to perform it. I'd still prefer to do it like that; I think it will pay back in maintenance costs.

If you don't get to this within, say, 2 weeks, I'll try to do the conversion myself.

	Jakub