From patchwork Thu Sep 22 03:17:18 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sandra Loosemore X-Patchwork-Id: 1354 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5044:0:0:0:0:0 with SMTP id h4csp2244566wrt; Wed, 21 Sep 2022 20:18:10 -0700 (PDT) X-Google-Smtp-Source: AMsMyM7wH7UHKJu/BUKHcGk0ZWGZDGMlzxDoajJyEir4w7cHKBjxKKpTyQN1FBzj6ivj1+W1J6Kn X-Received: by 2002:a17:907:a40e:b0:779:d3e1:3413 with SMTP id sg14-20020a170907a40e00b00779d3e13413mr1042616ejc.642.1663816690210; Wed, 21 Sep 2022 20:18:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1663816690; cv=none; d=google.com; s=arc-20160816; b=j1fqCFNLNU1CqNVY3KL63/UgvgDACs7OltwKF1m6SMH2YQTr7lnGaSu7rN/GZ3jSHA yEMMPQdlJ8m63ZHP9lpVdhRLrK3aLFGqq+j0CgCqRXIea/lO2ZNkfTSZF9ZyyhvNJ3r0 WmIKp3MMB0A24WGlx0WbBHMqP5RPnOSftbqTSFl17DF4i6HHWuO5TDN0muhhYGWGBsNO lHFwTW5WkQq5zgLMr+BKfV7tf0NGFnkvSLew/48X+MABqTDdMZREaVmBuUbX2B6WkQ/B J2ykE0O8oQSkH9o41T4zowL93iCz5qcAlHCLzCLATepHTKORefZw8HofG7tU6TEXJB94 vGSg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:in-reply-to:from:references:to :content-language:subject:user-agent:mime-version:date:message-id :ironport-sdr:dmarc-filter:delivered-to; bh=aR/jKpskZzRxvMnPykdRaN/JM/DT57Lc3on1/mL8aXo=; b=UNQhFt9L9iFLAg+7/GRz8P2GWgAgcA69P7+/LCTmekPeWHG+k/ePX4iSS41FruHSgl 0nlb7Wj/T32+CZGkc6Te4WPzjEJLx1K5Khk+RcFyk11N02SgMm5ViqNqJg4qtMF8VVwR 2f+WUDAHcb92qPBWhnvR6xYMigyHMyK8z7214E5Py73aFngn1d/Z/cIkxdTb/tZkn9Rp +Yuaztv7EWO1/0mkqXRxXtU1QELQqkhoFnQyp6ba4OHhaOir8s/MiFGn5Ib/g0Zl08Kn ZGGchthiKGWy8+YhI2QRWleJSB2EoZVVFlsezq3KWtFOsNC7MppI3ZjCJdbQ13aSHamT M8jg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from sourceware.org (server2.sourceware.org. [2620:52:3:1:0:246e:9693:128c]) by mx.google.com with ESMTPS id js1-20020a17090797c100b00763318cc0dcsi4432205ejc.751.2022.09.21.20.18.09 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 21 Sep 2022 20:18:10 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) client-ip=2620:52:3:1:0:246e:9693:128c; Authentication-Results: mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 36C553857C5B for ; Thu, 22 Sep 2022 03:18:01 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180]) by sourceware.org (Postfix) with ESMTPS id 8D5B93858C52 for ; Thu, 22 Sep 2022 03:17:26 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8D5B93858C52 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.93,335,1654588800"; d="scan'208,223";a="83293778" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa3.mentor.iphmx.com with ESMTP; 21 Sep 2022 19:17:25 -0800 IronPort-SDR: h5jFrpV5g281YkOkMmpNu36ynGtF4eTgAPAioyfTPWDxm8P2aRSJKPysGfPA4NR4W59Gnx9Lsz qTpB2U3t+523mCzE3eX7pXPP2gtmpaXcXD8M6zXjsk9R5XYInxWoE8OJwg+6QpD93+6KZz4VBl mENLXcH62Vji6fbkGG4fb1dku+jx96y51C/jPB5TYALaJK2x+nArGYW0y4dksG2hi4KGGv8j+S Rk1GZ/BRW+knNAkthLcRxUns32SXPMkJ26uotJMNA9vyrtfDUcJGZPMqPH6BIqFobWlV4pcoV9 +LY= Message-ID: <001679b1-814a-c1db-5611-c663f6931d11@codesourcery.com> Date: Wed, 21 Sep 2022 21:17:18 -0600 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0 Subject: [PATCH v2] Re: OpenMP: Generate SIMD clones for functions with "declare target" Content-Language: en-US To: Jakub Jelinek , Thomas Schwinge References: <0b64e323-63f9-e4b7-eb7f-83f3b5e3125b@codesourcery.com> From: Sandra Loosemore In-Reply-To: X-ClientProxiedBy: SVR-ORW-MBX-07.mgc.mentorg.com (147.34.90.207) To svr-orw-mbx-13.mgc.mentorg.com (147.34.90.213) X-Spam-Status: No, score=-10.0 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "gcc-patches@gcc.gnu.org" Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1743967270458541110?= X-GMAIL-MSGID: =?utf-8?q?1744638249798380377?= On 9/14/22 12:12, Jakub Jelinek wrote: > If it is pure optimization thing and purely keyed on the definition, > all the simd clones should be local to the TU, never exported from it. OK, here is a revised patch that addresses that. x86_64 target also generates a different set of clones for functions with internal linkage vs external so I hacked that to treat these implicit clones in the same way as other internal clones. There is an existing problem with internal "declare simd" clones in that nothing ever DCEs clones that end up not being useful, or does a scan of the code in the compilation unit before clone generation to avoid generating useless clones in the first place. I haven't tried to solve that problem, but I did attempt to mitigate it for these implicit "declare target" clones by tagging the option OPT_LEVELS_2_PLUS_SPEED_ONLY (instead of enabling it by default all the time) so the clones are not generated by default at -Os and -Og. I added a couple new test cases to check this. On 9/14/22 15:45, Thomas Schwinge wrote: > However, OpenACC and OpenMP support may be active at the same time... > >> + if (attr == NULL_TREE >> + && flag_openmp_target_simd_clone && !flag_openacc) > > ..., so '!flag_openacc' is not the right check here. Instead you'd do > '!oacc_get_fn_attrib (DECL_ATTRIBUTES (node->decl))' (untested) or > similar. This is fixed now too. OK to check in? -Sandra From dfdb9a2162978b964863f351c814211dca8e9a3f Mon Sep 17 00:00:00 2001 From: Sandra Loosemore Date: Thu, 22 Sep 2022 02:16:42 +0000 Subject: [PATCH] OpenMP: Generate SIMD clones for functions with "declare target" This patch causes the IPA simdclone pass to generate clones for functions with the "omp declare target" attribute as if they had "omp declare simd", provided the function appears to be suitable for SIMD execution. The filter is conservative, rejecting functions that write memory or that call other functions not known to be safe. A new option -fopenmp-target-simd-clone is added to control this transformation; it's enabled at -O2 and higher. gcc/ChangeLog: * common.opt (fopenmp-target-simd-clone): New option. * opts.cc (default_options_table): Add -fopenmp-target-simd-clone. * doc/invoke.texi (-fopenmp-target-simd-clone): Document. * omp-simd-clone.cc (auto_simd_check_stmt): New function. (mark_auto_simd_clone): New function. (simd_clone_create): Add force_local argument, make the symbol have internal linkage if it is true. (expand_simd_clones): Also check for cloneable functions with "omp declare target". Pass explicit_p argument to simd_clone.compute_vecsize_and_simdlen target hook. * target.def (TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN): Add bool explicit_p argument. * doc/tm.texi: Regenerated. * config/aarch64/aarch64.cc (aarch64_simd_clone_compute_vecsize_and_simdlen): Update. * config/gcn/gcn.cc (gcn_simd_clone_compute_vecsize_and_simdlen): Update. * config/i386/i386.cc (ix86_simd_clone_compute_vecsize_and_simdlen): Update. gcc/testsuite/ChangeLog: * gcc.dg/gomp/target-simd-clone-1.c: New. * gcc.dg/gomp/target-simd-clone-2.c: New. * gcc.dg/gomp/target-simd-clone-3.c: New. * gcc.dg/gomp/target-simd-clone-4.c: New. * gcc.dg/gomp/target-simd-clone-5.c: New. * gcc.dg/gomp/target-simd-clone-6.c: New. --- gcc/common.opt | 4 + gcc/config/aarch64/aarch64.cc | 24 +- gcc/config/gcn/gcn.cc | 10 +- gcc/config/i386/i386.cc | 27 +- gcc/doc/invoke.texi | 12 +- gcc/doc/tm.texi | 2 +- gcc/omp-simd-clone.cc | 237 ++++++++++++++++-- gcc/opts.cc | 1 + gcc/target.def | 2 +- .../gcc.dg/gomp/target-simd-clone-1.c | 18 ++ .../gcc.dg/gomp/target-simd-clone-2.c | 18 ++ .../gcc.dg/gomp/target-simd-clone-3.c | 17 ++ .../gcc.dg/gomp/target-simd-clone-4.c | 16 ++ .../gcc.dg/gomp/target-simd-clone-5.c | 13 + .../gcc.dg/gomp/target-simd-clone-6.c | 13 + 15 files changed, 362 insertions(+), 52 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/gomp/target-simd-clone-1.c create mode 100644 gcc/testsuite/gcc.dg/gomp/target-simd-clone-2.c create mode 100644 gcc/testsuite/gcc.dg/gomp/target-simd-clone-3.c create mode 100644 gcc/testsuite/gcc.dg/gomp/target-simd-clone-4.c create mode 100644 gcc/testsuite/gcc.dg/gomp/target-simd-clone-5.c create mode 100644 gcc/testsuite/gcc.dg/gomp/target-simd-clone-6.c diff --git a/gcc/common.opt b/gcc/common.opt index fba90ff6dcb..c735c62a8d4 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2217,6 +2217,10 @@ fomit-frame-pointer Common Var(flag_omit_frame_pointer) Optimization When possible do not generate stack frames. +fopenmp-target-simd-clone +Common Var(flag_openmp_target_simd_clone) Optimization +Generate SIMD clones for functions with the OpenMP declare target directive. + fopt-info Common Var(flag_opt_info) Optimization Enable all optimization info dumps on stderr. diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index f199e77cd42..c6d282c55ef 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -26612,7 +26612,8 @@ currently_supported_simd_type (tree t, tree b) static int aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node, struct cgraph_simd_clone *clonei, - tree base_type, int num) + tree base_type, int num, + bool explicit_p) { tree t, ret_type; unsigned int elt_bits, count; @@ -26630,8 +26631,9 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node, || const_simdlen > 1024 || (const_simdlen & (const_simdlen - 1)) != 0)) { - warning_at (DECL_SOURCE_LOCATION (node->decl), 0, - "unsupported simdlen %wd", const_simdlen); + if (explicit_p) + warning_at (DECL_SOURCE_LOCATION (node->decl), 0, + "unsupported simdlen %wd", const_simdlen); return 0; } @@ -26639,7 +26641,9 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node, if (TREE_CODE (ret_type) != VOID_TYPE && !currently_supported_simd_type (ret_type, base_type)) { - if (TYPE_SIZE (ret_type) != TYPE_SIZE (base_type)) + if (!explicit_p) + ; + else if (TYPE_SIZE (ret_type) != TYPE_SIZE (base_type)) warning_at (DECL_SOURCE_LOCATION (node->decl), 0, "GCC does not currently support mixed size types " "for % functions"); @@ -26666,7 +26670,9 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node, if (clonei->args[i].arg_type != SIMD_CLONE_ARG_TYPE_UNIFORM && !currently_supported_simd_type (arg_type, base_type)) { - if (TYPE_SIZE (arg_type) != TYPE_SIZE (base_type)) + if (!explicit_p) + ; + else if (TYPE_SIZE (arg_type) != TYPE_SIZE (base_type)) warning_at (DECL_SOURCE_LOCATION (node->decl), 0, "GCC does not currently support mixed size types " "for % functions"); @@ -26696,9 +26702,11 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node, if (clonei->simdlen.is_constant (&const_simdlen) && maybe_ne (vec_bits, 64U) && maybe_ne (vec_bits, 128U)) { - warning_at (DECL_SOURCE_LOCATION (node->decl), 0, - "GCC does not currently support simdlen %wd for type %qT", - const_simdlen, base_type); + if (explicit_p) + warning_at (DECL_SOURCE_LOCATION (node->decl), 0, + "GCC does not currently support simdlen %wd for " + "type %qT", + const_simdlen, base_type); return 0; } } diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc index ceb69000807..5c80b8df852 100644 --- a/gcc/config/gcn/gcn.cc +++ b/gcc/config/gcn/gcn.cc @@ -4562,7 +4562,8 @@ static int gcn_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *ARG_UNUSED (node), struct cgraph_simd_clone *clonei, tree base_type, - int ARG_UNUSED (num)) + int ARG_UNUSED (num), + bool explicit_p) { unsigned int elt_bits = GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type)); @@ -4572,9 +4573,10 @@ gcn_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *ARG_UNUSED (node { /* Note that x86 has a similar message that is likely to trigger on sizes that are OK for gcn; the user can't win. */ - warning_at (DECL_SOURCE_LOCATION (node->decl), 0, - "unsupported simdlen %wd (amdgcn)", - clonei->simdlen.to_constant ()); + if (explicit_p) + warning_at (DECL_SOURCE_LOCATION (node->decl), 0, + "unsupported simdlen %wd (amdgcn)", + clonei->simdlen.to_constant ()); return 0; } diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index c4d0e36e9c0..99ae388ad56 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -23647,7 +23647,8 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val) static int ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node, struct cgraph_simd_clone *clonei, - tree base_type, int num) + tree base_type, int num, + bool explicit_p) { int ret = 1; @@ -23656,8 +23657,9 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node, || clonei->simdlen > 1024 || (clonei->simdlen & (clonei->simdlen - 1)) != 0)) { - warning_at (DECL_SOURCE_LOCATION (node->decl), 0, - "unsupported simdlen %wd", clonei->simdlen.to_constant ()); + if (explicit_p) + warning_at (DECL_SOURCE_LOCATION (node->decl), 0, + "unsupported simdlen %wd", clonei->simdlen.to_constant ()); return 0; } @@ -23677,8 +23679,9 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node, break; /* FALLTHRU */ default: - warning_at (DECL_SOURCE_LOCATION (node->decl), 0, - "unsupported return type %qT for simd", ret_type); + if (explicit_p) + warning_at (DECL_SOURCE_LOCATION (node->decl), 0, + "unsupported return type %qT for simd", ret_type); return 0; } @@ -23707,13 +23710,14 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node, default: if (clonei->args[i].arg_type == SIMD_CLONE_ARG_TYPE_UNIFORM) break; - warning_at (DECL_SOURCE_LOCATION (node->decl), 0, - "unsupported argument type %qT for simd", arg_type); + if (explicit_p) + warning_at (DECL_SOURCE_LOCATION (node->decl), 0, + "unsupported argument type %qT for simd", arg_type); return 0; } } - if (!TREE_PUBLIC (node->decl)) + if (!TREE_PUBLIC (node->decl) || !explicit_p) { /* If the function isn't exported, we can pick up just one ISA for the clones. */ @@ -23784,9 +23788,10 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node, cnt /= clonei->vecsize_float; if (cnt > (TARGET_64BIT ? 16 : 8)) { - warning_at (DECL_SOURCE_LOCATION (node->decl), 0, - "unsupported simdlen %wd", - clonei->simdlen.to_constant ()); + if (explicit_p) + warning_at (DECL_SOURCE_LOCATION (node->decl), 0, + "unsupported simdlen %wd", + clonei->simdlen.to_constant ()); return 0; } } diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 8def6baa904..e05739a334c 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -204,7 +204,7 @@ in the following sections. -flax-vector-conversions -fms-extensions @gol -foffload=@var{arg} -foffload-options=@var{arg} @gol -fopenacc -fopenacc-dim=@var{geom} @gol --fopenmp -fopenmp-simd @gol +-fopenmp -fopenmp-simd -fopenmp-target-simd-clone @gol -fpermitted-flt-eval-methods=@var{standard} @gol -fplan9-extensions -fsigned-bitfields -funsigned-bitfields @gol -fsigned-char -funsigned-char -fsso-struct=@var{endianness}} @@ -2749,6 +2749,16 @@ Enable handling of OpenMP's SIMD directives with @code{#pragma omp} in C/C++ and @code{!$omp} in Fortran. Other OpenMP directives are ignored. +@item -fopenmp-target-simd-clone +@opindex fopenmp-target-simd-clone +@cindex OpenMP target SIMD clone +In addition to generating SIMD clones for functions marked with the +@code{declare simd} directive, GCC also generates clones +for functions marked with the OpenMP @code{declare target} directive +that are suitable for vectorization when this option is in effect. +It is enabled by default at @option{-O2} and higher (but not @option{-Os} +or @option{-Og}). + @item -fpermitted-flt-eval-methods=@var{style} @opindex fpermitted-flt-eval-methods @opindex fpermitted-flt-eval-methods=c11 diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index c3001c6ded9..d0a366f1908 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -6249,7 +6249,7 @@ The default is @code{NULL_TREE} which means to not vectorize scatter stores. @end deftypefn -@deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN (struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, @var{int}) +@deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN (struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, @var{int}, @var{bool}) This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float} fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also @var{simdlen} field if it was previously 0. diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc index 34cbee5afcd..f9e98b099d1 100644 --- a/gcc/omp-simd-clone.cc +++ b/gcc/omp-simd-clone.cc @@ -51,6 +51,152 @@ along with GCC; see the file COPYING3. If not see #include "stringpool.h" #include "attribs.h" #include "omp-simd-clone.h" +#include "omp-low.h" +#include "omp-general.h" + +/* Helper function for mark_auto_simd_clone; return false if the statement + violates restrictions for an "omp declare simd" function. Specifically, + the function must not + - throw or call setjmp/longjmp + - write memory that could alias parallel calls + - include openmp directives or calls + - call functions that might do those things */ + +static bool +auto_simd_check_stmt (gimple *stmt, tree outer) +{ + tree decl; + + switch (gimple_code (stmt)) + { + case GIMPLE_CALL: + decl = gimple_call_fndecl (stmt); + + /* We can't know whether indirect calls are safe. */ + if (decl == NULL_TREE) + return false; + + /* Calls to functions that are CONST or PURE are ok. */ + if (gimple_call_flags (stmt) & (ECF_CONST | ECF_PURE)) + break; + + /* Calls to functions that are already marked "omp declare simd" are + OK. */ + if (lookup_attribute ("omp declare simd", DECL_ATTRIBUTES (decl))) + break; + + /* Let recursive calls to the current function through. */ + if (decl == outer) + break; + + /* Other function calls are not permitted. */ + return false; + + /* OpenMP directives are not permitted. */ + CASE_GIMPLE_OMP: + return false; + + /* Conservatively reject all EH-related constructs. */ + case GIMPLE_CATCH: + case GIMPLE_EH_FILTER: + case GIMPLE_EH_MUST_NOT_THROW: + case GIMPLE_EH_ELSE: + case GIMPLE_EH_DISPATCH: + case GIMPLE_RESX: + case GIMPLE_TRY: + return false; + + /* Asms are not permitted since we don't know what they do. */ + case GIMPLE_ASM: + return false; + + default: + break; + } + + /* Memory writes are not permitted. + FIXME: this could be relaxed a little to permit writes to + function-local variables that could not alias other instances + of the function running in parallel. */ + if (gimple_store_p (stmt)) + return false; + else + return true; +} + +/* If the function NODE appears suitable for auto-annotation with "declare + simd", add and return such an attribute, otherwise return null. */ + +static tree +mark_auto_simd_clone (struct cgraph_node *node) +{ + tree decl = node->decl; + tree t; + machine_mode m; + tree result; + basic_block bb; + + /* Nothing to do if the function isn't a definition or doesn't + have a body. */ + if (!node->definition || !node->has_gimple_body_p ()) + return NULL_TREE; + + /* Nothing to do if the function already has the "omp declare simd" + attribute, is marked noclone, or is not "omp declare target". */ + if (lookup_attribute ("omp declare simd", DECL_ATTRIBUTES (decl)) + || lookup_attribute ("noclone", DECL_ATTRIBUTES (decl)) + || !lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl))) + return NULL_TREE; + + /* Backends will check for vectorizable arguments/return types in a + target-specific way, but we can immediately filter out functions + that have non-scalar arguments/return types. Also, atomic types + trigger warnings in simd_clone_clauses_extract. */ + t = TREE_TYPE (TREE_TYPE (decl)); + m = TYPE_MODE (t); + if (!(VOID_TYPE_P (t) || is_a (m)) || TYPE_ATOMIC (t)) + return NULL_TREE; + + if (TYPE_ARG_TYPES (TREE_TYPE (decl))) + { + for (tree temp = TYPE_ARG_TYPES (TREE_TYPE (decl)); + temp; temp = TREE_CHAIN (temp)) + { + t = TREE_VALUE (temp); + m = TYPE_MODE (t); + if (!(VOID_TYPE_P (t) || is_a (m)) || TYPE_ATOMIC (t)) + return NULL_TREE; + } + } + else + { + for (tree temp = DECL_ARGUMENTS (decl); temp; temp = DECL_CHAIN (temp)) + { + t = TREE_TYPE (temp); + m = TYPE_MODE (t); + if (!(VOID_TYPE_P (t) || is_a (m)) || TYPE_ATOMIC (t)) + return NULL_TREE; + } + } + + /* Scan the function body to see if it is suitable for SIMD-ization. */ + node->get_body (); + + FOR_EACH_BB_FN (bb, DECL_STRUCT_FUNCTION (decl)) + { + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi); + gsi_next (&gsi)) + if (!auto_simd_check_stmt (gsi_stmt (gsi), decl)) + return NULL_TREE; + } + + /* All is good. */ + result = tree_cons (get_identifier ("omp declare simd"), NULL, + DECL_ATTRIBUTES (decl)); + DECL_ATTRIBUTES (decl) = result; + return result; +} + /* Return the number of elements in vector type VECTYPE, which is associated with a SIMD clone. At present these always have a constant length. */ @@ -430,10 +576,12 @@ simd_clone_mangle (struct cgraph_node *node, return get_identifier (str); } -/* Create a simd clone of OLD_NODE and return it. */ +/* Create a simd clone of OLD_NODE and return it. If FORCE_LOCAL is true, + create it as a local symbol, otherwise copy the symbol linkage and + visibility attributes from OLD_NODE. */ static struct cgraph_node * -simd_clone_create (struct cgraph_node *old_node) +simd_clone_create (struct cgraph_node *old_node, bool force_local) { struct cgraph_node *new_node; if (old_node->definition) @@ -463,23 +611,38 @@ simd_clone_create (struct cgraph_node *old_node) return new_node; set_decl_built_in_function (new_node->decl, NOT_BUILT_IN, 0); - TREE_PUBLIC (new_node->decl) = TREE_PUBLIC (old_node->decl); - DECL_COMDAT (new_node->decl) = DECL_COMDAT (old_node->decl); - DECL_WEAK (new_node->decl) = DECL_WEAK (old_node->decl); - DECL_EXTERNAL (new_node->decl) = DECL_EXTERNAL (old_node->decl); - DECL_VISIBILITY_SPECIFIED (new_node->decl) - = DECL_VISIBILITY_SPECIFIED (old_node->decl); - DECL_VISIBILITY (new_node->decl) = DECL_VISIBILITY (old_node->decl); - DECL_DLLIMPORT_P (new_node->decl) = DECL_DLLIMPORT_P (old_node->decl); - if (DECL_ONE_ONLY (old_node->decl)) - make_decl_one_only (new_node->decl, DECL_ASSEMBLER_NAME (new_node->decl)); - - /* The method cgraph_version_clone_with_body () will force the new - symbol local. Undo this, and inherit external visibility from - the old node. */ - new_node->local = old_node->local; - new_node->externally_visible = old_node->externally_visible; - new_node->calls_declare_variant_alt = old_node->calls_declare_variant_alt; + if (force_local) + { + TREE_PUBLIC (new_node->decl) = 0; + DECL_COMDAT (new_node->decl) = 0; + DECL_WEAK (new_node->decl) = 0; + DECL_EXTERNAL (new_node->decl) = 0; + DECL_VISIBILITY_SPECIFIED (new_node->decl) = 0; + DECL_VISIBILITY (new_node->decl) = VISIBILITY_DEFAULT; + DECL_DLLIMPORT_P (new_node->decl) = 0; + } + else + { + TREE_PUBLIC (new_node->decl) = TREE_PUBLIC (old_node->decl); + DECL_COMDAT (new_node->decl) = DECL_COMDAT (old_node->decl); + DECL_WEAK (new_node->decl) = DECL_WEAK (old_node->decl); + DECL_EXTERNAL (new_node->decl) = DECL_EXTERNAL (old_node->decl); + DECL_VISIBILITY_SPECIFIED (new_node->decl) + = DECL_VISIBILITY_SPECIFIED (old_node->decl); + DECL_VISIBILITY (new_node->decl) = DECL_VISIBILITY (old_node->decl); + DECL_DLLIMPORT_P (new_node->decl) = DECL_DLLIMPORT_P (old_node->decl); + if (DECL_ONE_ONLY (old_node->decl)) + make_decl_one_only (new_node->decl, + DECL_ASSEMBLER_NAME (new_node->decl)); + + /* The method cgraph_version_clone_with_body () will force the new + symbol local. Undo this, and inherit external visibility from + the old node. */ + new_node->local = old_node->local; + new_node->externally_visible = old_node->externally_visible; + new_node->calls_declare_variant_alt + = old_node->calls_declare_variant_alt; + } return new_node; } @@ -1683,13 +1846,32 @@ simd_clone_adjust (struct cgraph_node *node) void expand_simd_clones (struct cgraph_node *node) { - tree attr = lookup_attribute ("omp declare simd", - DECL_ATTRIBUTES (node->decl)); - if (attr == NULL_TREE - || node->inlined_to + tree attr; + bool explicit_p = true; + + if (node->inlined_to || lookup_attribute ("noclone", DECL_ATTRIBUTES (node->decl))) return; + attr = lookup_attribute ("omp declare simd", + DECL_ATTRIBUTES (node->decl)); + + /* See if we can add an "omp declare simd" directive implicitly + before giving up. */ + /* FIXME: OpenACC "#pragma acc routine" translates into + "omp declare target", but appears also to have some other effects + that conflict with generating SIMD clones, causing ICEs. So don't + do this if we've got OpenACC instead of OpenMP. */ + if (attr == NULL_TREE + && flag_openmp_target_simd_clone + && !oacc_get_fn_attrib (node->decl)) + { + attr = mark_auto_simd_clone (node); + explicit_p = false; + } + if (attr == NULL_TREE) + return; + /* Ignore #pragma omp declare simd extern int foo (); @@ -1714,13 +1896,15 @@ expand_simd_clones (struct cgraph_node *node) poly_uint64 orig_simdlen = clone_info->simdlen; tree base_type = simd_clone_compute_base_data_type (node, clone_info); + /* The target can return 0 (no simd clones should be created), 1 (just one ISA of simd clones should be created) or higher count of ISA variants. In that case, clone_info is initialized for the first ISA variant. */ int count = targetm.simd_clone.compute_vecsize_and_simdlen (node, clone_info, - base_type, 0); + base_type, 0, + explicit_p); if (count == 0) continue; @@ -1745,7 +1929,8 @@ expand_simd_clones (struct cgraph_node *node) /* And call the target hook again to get the right ISA. */ targetm.simd_clone.compute_vecsize_and_simdlen (node, clone, base_type, - i / 2); + i / 2, + explicit_p); if ((i & 1) != 0) clone->inbranch = 1; } @@ -1763,7 +1948,7 @@ expand_simd_clones (struct cgraph_node *node) /* Only when we are sure we want to create the clone actually clone the function (or definitions) or create another extern FUNCTION_DECL (for prototypes without definitions). */ - struct cgraph_node *n = simd_clone_create (node); + struct cgraph_node *n = simd_clone_create (node, !explicit_p); if (n == NULL) { if (i == 0) diff --git a/gcc/opts.cc b/gcc/opts.cc index 54e57f36755..b8ca6fdca82 100644 --- a/gcc/opts.cc +++ b/gcc/opts.cc @@ -658,6 +658,7 @@ static const struct default_options default_options_table[] = REORDER_BLOCKS_ALGORITHM_STC }, { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_ftree_loop_vectorize, NULL, 1 }, { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_ftree_slp_vectorize, NULL, 1 }, + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_fopenmp_target_simd_clone, NULL, 1 }, #ifdef INSN_SCHEDULING /* Only run the pre-regalloc scheduling pass if optimizing for speed. */ { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_fschedule_insns, NULL, 1 }, diff --git a/gcc/target.def b/gcc/target.def index 4d49ffc2c88..6e830bed52a 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -1634,7 +1634,7 @@ fields in @var{simd_clone} structure pointed by @var{clone_info} argument and al not determined by the bitsize (in which case @var{simdlen} is always used).\n\ The hook should return 0 if SIMD clones shouldn't be emitted,\n\ or number of @var{vecsize_mangle} variants that should be emitted.", -int, (struct cgraph_node *, struct cgraph_simd_clone *, tree, int), NULL) +int, (struct cgraph_node *, struct cgraph_simd_clone *, tree, int, bool), NULL) DEFHOOK (adjust, diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-1.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-1.c new file mode 100644 index 00000000000..ab027a60970 --- /dev/null +++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-1.c @@ -0,0 +1,18 @@ +/* { dg-options "-fopenmp -O2" } */ + +/* Test that simd clones are generated for functions with "declare target". */ + +#pragma omp declare target +int addit(int a, int b, int c) +{ + return a + b; +} +#pragma omp end declare target + +/* Although addit has external linkage, we expect clones to be generated as + for a function with internal linkage. */ + +/* { dg-final { scan-assembler "\\.type.*_ZGVbN4vvv_addit,.*function" { target i?86-*-* x86_64-*-* } } } */ +/* { dg-final { scan-assembler "\\.type.*_ZGVbM4vvv_addit,.*function" { target i?86-*-* x86_64-*-* } } } */ +/* { dg-final { scan-assembler-not "\\.globl.*_ZGVbN4vvv_addit" { target i?86-*-* x86_64-*-* } } } */ +/* { dg-final { scan-assembler-not "\\.globl.*_ZGVbM4vvv_addit" { target i?86-*-* x86_64-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-2.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-2.c new file mode 100644 index 00000000000..0ccbfe1d765 --- /dev/null +++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-2.c @@ -0,0 +1,18 @@ +/* { dg-options "-fopenmp -O2" } */ + +/* Test that simd clones are not generated for functions with + "declare target" but unsuitable arguments. */ + +struct s { + int a; + int b; +}; + +#pragma omp declare target +int addit (struct s x) +{ + return x.a + x.b; +} +#pragma omp end declare target + +/* { dg-final { scan-assembler-not "_Z.*_addit" { target i?86-*-* x86_64-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-3.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-3.c new file mode 100644 index 00000000000..c313cfe53b0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-3.c @@ -0,0 +1,17 @@ +/* { dg-options "-fopenmp -O2" } */ + +/* Test that simd clones are not generated for functions with + "declare target" but that call possibly side-effecting functions + in the body. */ + +extern int f (int); + +#pragma omp declare target +int addit(int a, int b, int c) +{ + return f(a) + b; +} +#pragma omp end declare target + +/* { dg-final { scan-assembler-not "_Z.*_addit" { target i?86-*-* x86_64-*-* } } } */ + diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-4.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-4.c new file mode 100644 index 00000000000..e32b22f6a59 --- /dev/null +++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-4.c @@ -0,0 +1,16 @@ +/* { dg-options "-fopenmp -O2" } */ + +/* Test that simd clones are not generated for functions with + "declare target" but that write memory in the body. */ + +extern int save; + +#pragma omp declare target +int addit(int a, int b, int c) +{ + save = c; + return a + b; +} +#pragma omp end declare target + +/* { dg-final { scan-assembler-not "_Z.*_addit" { target i?86-*-* x86_64-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-5.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-5.c new file mode 100644 index 00000000000..d39a9ab737f --- /dev/null +++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-5.c @@ -0,0 +1,13 @@ +/* { dg-options "-fopenmp -Os" } */ + +/* Test that simd clones are not generated for functions with + "declare target" at -Os. */ + +#pragma omp declare target +int addit(int a, int b, int c) +{ + return a + b; +} +#pragma omp end declare target + +/* { dg-final { scan-assembler-not "_Z.*_addit" { target i?86-*-* x86_64-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-6.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-6.c new file mode 100644 index 00000000000..a0c529b1c4e --- /dev/null +++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-6.c @@ -0,0 +1,13 @@ +/* { dg-options "-fopenmp -Og" } */ + +/* Test that simd clones are not generated for functions with + "declare target" at -Og. */ + +#pragma omp declare target +int addit(int a, int b, int c) +{ + return a + b; +} +#pragma omp end declare target + +/* { dg-final { scan-assembler-not "_Z.*_addit" { target i?86-*-* x86_64-*-* } } } */ -- 2.31.1