OpenMP: Generate SIMD clones for functions with "declare target"

Message ID 0b64e323-63f9-e4b7-eb7f-83f3b5e3125b@codesourcery.com
State New, archived
Headers
Series OpenMP: Generate SIMD clones for functions with "declare target" |

Commit Message

Sandra Loosemore Sept. 14, 2022, 5:32 p.m. UTC
  This patch is part of the ongoing effort to find more SIMD optimization 
opportunities in OpenMP code.  Here we are looking for functions that 
have the "omp declare target" attribute that are also suitable 
candidates for automatic SIMD cloning.  I've made the filter quite 
conservative, but maybe it could be improved with some further analysis. 
  I added a command-line flag to disable this in case it is buggy :-P or 
leads to excessive code bloat without improving performance in some 
cases, otherwise the SIMD clones are generated in the same way and at 
the same optimization levels as the existing simdclone pass.

I had to modify the TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN hook 
to add a boolean argument to control diagnostics, since GCC shouldn't 
complain about types the target doesn't support in cases where the user 
didn't explicitly ask for clones to be created.  I tested on 
x86_64-linux-gnu-amdgcn, plain x86_64-linux-gnu, and aarch64-linux-gnu 
to get coverage of all 3 backends that implement this hook.  OK for 
mainline?

-Sandra
  

Comments

Jakub Jelinek Sept. 14, 2022, 6:12 p.m. UTC | #1
On Wed, Sep 14, 2022 at 11:32:11AM -0600, Sandra Loosemore wrote:
> This patch is part of the ongoing effort to find more SIMD optimization
> opportunities in OpenMP code.  Here we are looking for functions that have
> the "omp declare target" attribute that are also suitable candidates for
> automatic SIMD cloning.  I've made the filter quite conservative, but maybe
> it could be improved with some further analysis.  I added a command-line
> flag to disable this in case it is buggy :-P or leads to excessive code
> bloat without improving performance in some cases, otherwise the SIMD clones
> are generated in the same way and at the same optimization levels as the
> existing simdclone pass.
> 
> I had to modify the TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN hook to
> add a boolean argument to control diagnostics, since GCC shouldn't complain
> about types the target doesn't support in cases where the user didn't
> explicitly ask for clones to be created.  I tested on
> x86_64-linux-gnu-amdgcn, plain x86_64-linux-gnu, and aarch64-linux-gnu to
> get coverage of all 3 backends that implement this hook.  OK for mainline?

declare simd is an ABI relevant declarative directive, while declare target
is not, all the latter does is say whether the function should be also (or
only) compilable on an offloading target.
Creating simd clones under some option for random declare target functions
(note, declare target is partly auto-discovered property) is perhaps fine
for functions not exported from the translation unit where it is purely
an optimization, but otherwise it is a significant ABI problem,
you export whole new bunch of new exports on the definition side and expect
those to be exported on the use side.  If you compile one TU with
-fopenmp-target-simd-clone and another one without it, program might not
link anymore.  And worse, as it is decided based on the exact implementation
of the function, I assume you can't do that automatically for functions
not defined locally, but whether something has simd clones or not might
change over time based on how you change the implementation.
Say libfoo.so exports a declare target function foo, which is initially
implemented without say using inline asm (or calling one of the "bad"
functions or using exceptions etc.), but then a bugfix comes and needs
to use inline asm or something else in the implementation.  Previously
libfoo.so would export the simd clones, but now it doesn't, so the ABI
of the library changes.

If it is pure optimization thing and purely keyed on the definition,
all the simd clones should be local to the TU, never exported from it.

	Jakub
  
Thomas Schwinge Sept. 14, 2022, 9:45 p.m. UTC | #2
Hi Sandra!

Commenting on just one single item:

On 2022-09-14T11:32:11-0600, Sandra Loosemore <sandra@codesourcery.com> wrote:
> --- a/gcc/omp-simd-clone.cc
> +++ b/gcc/omp-simd-clone.cc

>  void
>  expand_simd_clones (struct cgraph_node *node)
>  {
> -  tree attr = lookup_attribute ("omp declare simd",
> -                             DECL_ATTRIBUTES (node->decl));
> -  if (attr == NULL_TREE
> -      || node->inlined_to
> +  tree attr;
> +  bool error_p = true;
> +
> +  if (node->inlined_to
>        || lookup_attribute ("noclone", DECL_ATTRIBUTES (node->decl)))
>      return;
>
> +  attr = lookup_attribute ("omp declare simd",
> +                        DECL_ATTRIBUTES (node->decl));
> +
> +  /* See if we can add an "omp declare simd" directive implicitly
> +     before giving up.  */
> +  /* FIXME: OpenACC "#pragma acc routine" translates into
> +     "omp declare target", but appears also to have some other effects
> +     that conflict with generating SIMD clones, causing ICEs.  So don't
> +     do this if we've got OpenACC instead of OpenMP.  */

Uh, ICEs...  (But I suppose this processing is not relevant for OpenACC
'routine's.)

However, OpenACC and OpenMP support may be active at the same time...

> +  if (attr == NULL_TREE
> +      && flag_openmp_target_simd_clone && !flag_openacc)

..., so '!flag_openacc' is not the right check here.  Instead you'd do
'!oacc_get_fn_attrib (DECL_ATTRIBUTES (node->decl))' (untested) or
similar.

> +    {
> +      attr = mark_auto_simd_clone (node);
> +      error_p = false;
> +    }
> +  if (attr == NULL_TREE)
> +    return;


Grüße
 Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
  

Patch

From 77df203f8ec191e036580d17b7fa83ae517a8018 Mon Sep 17 00:00:00 2001
From: Sandra Loosemore <sandra@codesourcery.com>
Date: Wed, 14 Sep 2022 00:20:25 +0000
Subject: [PATCH] OpenMP: Generate SIMD clones for functions with "declare
 target"

This patch causes the IPA simdclone pass to generate clones for
functions with the "omp declare target" attribute as if they had
"omp declare simd", provided the function appears to be suitable for
SIMD execution.  The filter is conservative, rejecting functions
that write memory or that call other functions not known to be safe.
A new option -fopenmp-target-simd-clone is added to control this
transformation; it's enabled by default.

gcc/ChangeLog:

	* c-family/c.opt (fopenmp-target-simd-clone): New option.
	* fortran/lang.opt (fopenmp-target-simd-clone): New option.
	* doc/invoke.texi (-fno-openmp-target-simd-clone): Document.
	* omp-simd-clone.cc (auto_simd_check_stmt): New function.
	(mark_auto_simd_clone): New function.
	(expand_simd_clones): Also check for cloneable functions with
	"omp declare target".  Pass error_p argument to
	simd_clone.compute_vecsize_and_simdlen target hook.
	* target.def (TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN):
	Add bool error_p argument.
	* doc/tm.texi: Regenerated.
	* config/aarch64/aarch64.cc
	(aarch64_simd_clone_compute_vecsize_and_simdlen): Update.
	* config/gcn/gcn.cc
	(gcn_simd_clone_compute_vecsize_and_simdlen): Update.
	* config/i386/i386.cc
	(ix86_simd_clone_compute_vecsize_and_simdlen): Update.

gcc/testsuite/ChangeLog:

	* gcc.dg/gomp/target-simd-clone-1.c: New.
	* gcc.dg/gomp/target-simd-clone-2.c: New.
	* gcc.dg/gomp/target-simd-clone-3.c: New.
	* gcc.dg/gomp/target-simd-clone-4.c: New.
---
 gcc/c-family/c.opt                            |   4 +
 gcc/config/aarch64/aarch64.cc                 |  24 ++-
 gcc/config/gcn/gcn.cc                         |  10 +-
 gcc/config/i386/i386.cc                       |  25 ++-
 gcc/doc/invoke.texi                           |  13 +-
 gcc/doc/tm.texi                               |   2 +-
 gcc/fortran/lang.opt                          |   4 +
 gcc/omp-simd-clone.cc                         | 178 +++++++++++++++++-
 gcc/target.def                                |   2 +-
 .../gcc.dg/gomp/target-simd-clone-1.c         |  19 ++
 .../gcc.dg/gomp/target-simd-clone-2.c         |  18 ++
 .../gcc.dg/gomp/target-simd-clone-3.c         |  17 ++
 .../gcc.dg/gomp/target-simd-clone-4.c         |  16 ++
 13 files changed, 301 insertions(+), 31 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/gomp/target-simd-clone-1.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/target-simd-clone-2.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/target-simd-clone-3.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/target-simd-clone-4.c

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index ff6fe861534..0be39ae7709 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1993,6 +1993,10 @@  fopenmp-simd
 C ObjC C++ ObjC++ Var(flag_openmp_simd)
 Enable OpenMP's SIMD directives.
 
+fopenmp-target-simd-clone
+C ObjC C++ ObjC++ Var(flag_openmp_target_simd_clone) Init(1)
+Generate SIMD clones for functions with the OpenMP declare target directive.
+
 foperator-names
 C++ ObjC++
 Recognize C++ keywords like \"compl\" and \"xor\".
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index f199e77cd42..42c5d281537 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -26612,7 +26612,8 @@  currently_supported_simd_type (tree t, tree b)
 static int
 aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 					struct cgraph_simd_clone *clonei,
-					tree base_type, int num)
+					tree base_type, int num,
+					bool error_p)
 {
   tree t, ret_type;
   unsigned int elt_bits, count;
@@ -26630,8 +26631,9 @@  aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	  || const_simdlen > 1024
 	  || (const_simdlen & (const_simdlen - 1)) != 0))
     {
-      warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		  "unsupported simdlen %wd", const_simdlen);
+      if (error_p)
+	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		    "unsupported simdlen %wd", const_simdlen);
       return 0;
     }
 
@@ -26639,7 +26641,9 @@  aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
   if (TREE_CODE (ret_type) != VOID_TYPE
       && !currently_supported_simd_type (ret_type, base_type))
     {
-      if (TYPE_SIZE (ret_type) != TYPE_SIZE (base_type))
+      if (!error_p)
+	;
+      else if (TYPE_SIZE (ret_type) != TYPE_SIZE (base_type))
 	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
 		    "GCC does not currently support mixed size types "
 		    "for %<simd%> functions");
@@ -26666,7 +26670,9 @@  aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
       if (clonei->args[i].arg_type != SIMD_CLONE_ARG_TYPE_UNIFORM
 	  && !currently_supported_simd_type (arg_type, base_type))
 	{
-	  if (TYPE_SIZE (arg_type) != TYPE_SIZE (base_type))
+	  if (!error_p)
+	    ;
+	  else if (TYPE_SIZE (arg_type) != TYPE_SIZE (base_type))
 	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
 			"GCC does not currently support mixed size types "
 			"for %<simd%> functions");
@@ -26696,9 +26702,11 @@  aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
       if (clonei->simdlen.is_constant (&const_simdlen)
 	  && maybe_ne (vec_bits, 64U) && maybe_ne (vec_bits, 128U))
 	{
-	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		      "GCC does not currently support simdlen %wd for type %qT",
-		      const_simdlen, base_type);
+	  if (error_p)
+	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+			"GCC does not currently support simdlen %wd for "
+			"type %qT",
+			const_simdlen, base_type);
 	  return 0;
 	}
     }
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index ceb69000807..771cfac9672 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -4562,7 +4562,8 @@  static int
 gcn_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *ARG_UNUSED (node),
 					    struct cgraph_simd_clone *clonei,
 					    tree base_type,
-					    int ARG_UNUSED (num))
+					    int ARG_UNUSED (num),
+					    bool error_p)
 {
   unsigned int elt_bits = GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
 
@@ -4572,9 +4573,10 @@  gcn_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *ARG_UNUSED (node
     {
       /* Note that x86 has a similar message that is likely to trigger on
 	 sizes that are OK for gcn; the user can't win.  */
-      warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		  "unsupported simdlen %wd (amdgcn)",
-		  clonei->simdlen.to_constant ());
+      if (error_p)
+	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		    "unsupported simdlen %wd (amdgcn)",
+		    clonei->simdlen.to_constant ());
       return 0;
     }
 
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index c4d0e36e9c0..98e5a3f28fe 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -23647,7 +23647,8 @@  ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 static int
 ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 					     struct cgraph_simd_clone *clonei,
-					     tree base_type, int num)
+					     tree base_type, int num,
+					     bool error_p)
 {
   int ret = 1;
 
@@ -23656,8 +23657,9 @@  ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	  || clonei->simdlen > 1024
 	  || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
     {
-      warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		  "unsupported simdlen %wd", clonei->simdlen.to_constant ());
+      if (error_p)
+	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		    "unsupported simdlen %wd", clonei->simdlen.to_constant ());
       return 0;
     }
 
@@ -23677,8 +23679,9 @@  ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	  break;
 	/* FALLTHRU */
       default:
-	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		    "unsupported return type %qT for simd", ret_type);
+	if (error_p)
+	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		      "unsupported return type %qT for simd", ret_type);
 	return 0;
       }
 
@@ -23707,8 +23710,9 @@  ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	default:
 	  if (clonei->args[i].arg_type == SIMD_CLONE_ARG_TYPE_UNIFORM)
 	    break;
-	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		      "unsupported argument type %qT for simd", arg_type);
+	  if (error_p)
+	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+			"unsupported argument type %qT for simd", arg_type);
 	  return 0;
 	}
     }
@@ -23784,9 +23788,10 @@  ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	cnt /= clonei->vecsize_float;
       if (cnt > (TARGET_64BIT ? 16 : 8))
 	{
-	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		      "unsupported simdlen %wd",
-		      clonei->simdlen.to_constant ());
+	  if (error_p)
+	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+			"unsupported simdlen %wd",
+			clonei->simdlen.to_constant ());
 	  return 0;
 	}
       }
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 8def6baa904..f822091af09 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -204,7 +204,7 @@  in the following sections.
 -flax-vector-conversions  -fms-extensions @gol
 -foffload=@var{arg}  -foffload-options=@var{arg} @gol
 -fopenacc  -fopenacc-dim=@var{geom} @gol
--fopenmp  -fopenmp-simd @gol
+-fopenmp  -fopenmp-simd  -fno-openmp-target-simd-clone @gol
 -fpermitted-flt-eval-methods=@var{standard} @gol
 -fplan9-extensions  -fsigned-bitfields  -funsigned-bitfields @gol
 -fsigned-char  -funsigned-char  -fsso-struct=@var{endianness}}
@@ -2749,6 +2749,17 @@  Enable handling of OpenMP's SIMD directives with @code{#pragma omp}
 in C/C++ and @code{!$omp} in Fortran. Other OpenMP directives
 are ignored.
 
+@item -fno-openmp-target-simd-clone
+@opindex fno-openmp-target-simd-clone
+@cindex OpenMP target SIMD clone
+In addition to generating SIMD clones for functions marked with the
+@code{declare simd} directive, by default, GCC also generates clones
+for functions marked with the OpenMP @code{declare target} directive
+that are suitable for vectorization.
+You can disable this behavior and restrict SIMD clone generation only
+to functions explicitly marked @code{declare simd} using
+@option{-fno-openmp-target-simd}.
+
 @item -fpermitted-flt-eval-methods=@var{style}
 @opindex fpermitted-flt-eval-methods
 @opindex fpermitted-flt-eval-methods=c11
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c3001c6ded9..d0a366f1908 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6249,7 +6249,7 @@  The default is @code{NULL_TREE} which means to not vectorize scatter
 stores.
 @end deftypefn
 
-@deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN (struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, @var{int})
+@deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN (struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, @var{int}, @var{bool})
 This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}
 fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also
 @var{simdlen} field if it was previously 0.
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index b18a6d3f4f9..58f7d3a2835 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -712,6 +712,10 @@  fopenmp-simd
 Fortran
 ; Documented in C
 
+fopenmp-target-simd-clone
+Fortran
+; Documented in C
+
 fpack-derived
 Fortran Var(flag_pack_derived)
 Try to lay out derived types as compactly as possible.
diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index 34cbee5afcd..07c7bad0e2c 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -51,6 +51,151 @@  along with GCC; see the file COPYING3.  If not see
 #include "stringpool.h"
 #include "attribs.h"
 #include "omp-simd-clone.h"
+#include "omp-low.h"
+
+/* Helper function for mark_auto_simd_clone; return false if the statement
+   violates restrictions for an "omp declare simd" function.  Specifically,
+   the function must not
+   - throw or call setjmp/longjmp
+   - write memory that could alias parallel calls
+   - include openmp directives or calls
+   - call functions that might do those things */
+
+static bool
+auto_simd_check_stmt (gimple *stmt, tree outer)
+{
+  tree decl;
+
+  switch (gimple_code (stmt))
+    {
+    case GIMPLE_CALL:
+      decl = gimple_call_fndecl (stmt);
+
+      /* We can't know whether indirect calls are safe.  */
+      if (decl == NULL_TREE)
+	return false;
+
+      /* Calls to functions that are CONST or PURE are ok.  */
+      if (gimple_call_flags (stmt) & (ECF_CONST | ECF_PURE))
+	break;
+
+      /* Calls to functions that are already marked "omp declare simd" are
+	 OK.  */
+      if (lookup_attribute ("omp declare simd", DECL_ATTRIBUTES (decl)))
+	break;
+
+      /* Let recursive calls to the current function through.  */
+      if (decl == outer)
+	break;
+
+      /* Other function calls are not permitted.  */
+      return false;
+
+      /* OpenMP directives are not permitted.  */
+    CASE_GIMPLE_OMP:
+      return false;
+
+      /* Conservatively reject all EH-related constructs.  */
+    case GIMPLE_CATCH:
+    case GIMPLE_EH_FILTER:
+    case GIMPLE_EH_MUST_NOT_THROW:
+    case GIMPLE_EH_ELSE:
+    case GIMPLE_EH_DISPATCH:
+    case GIMPLE_RESX:
+    case GIMPLE_TRY:
+      return false;
+
+      /* Asms are not permitted since we don't know what they do.  */
+    case GIMPLE_ASM:
+      return false;
+
+    default:
+      break;
+    }
+
+  /* Memory writes are not permitted.
+     FIXME: this could be relaxed a little to permit writes to
+     function-local variables that could not alias other instances
+     of the function running in parallel.  */
+  if (gimple_store_p (stmt))
+    return false;
+  else
+    return true;
+}
+
+/* If the function NODE appears suitable for auto-annotation with "declare
+   simd", add and return such an attribute, otherwise return null.  */
+
+static tree
+mark_auto_simd_clone (struct cgraph_node *node)
+{
+  tree decl = node->decl;
+  tree t;
+  machine_mode m;
+  tree result;
+  basic_block bb;
+
+  /* Nothing to do if the function isn't a declaration or doesn't
+     have a body.  */
+  if (!node->definition || !node->has_gimple_body_p ())
+    return NULL_TREE;
+
+  /* Nothing to do if the function already has the "omp declare simd"
+     attribute, is marked noclone, or is not "omp declare target".  */
+  if (lookup_attribute ("omp declare simd", DECL_ATTRIBUTES (decl))
+      || lookup_attribute ("noclone", DECL_ATTRIBUTES (decl))
+      || !lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
+    return NULL_TREE;
+
+  /* Backends will check for vectorizable arguments/return types in a
+     target-specific way, but we can immediately filter out functions
+     that have non-scalar arguments/return types.  Also, atomic types
+     trigger warnings in simd_clone_clauses_extract.  */
+  t = TREE_TYPE (TREE_TYPE (decl));
+  m = TYPE_MODE (t);
+  if (!(VOID_TYPE_P (t) || is_a <scalar_mode> (m)) || TYPE_ATOMIC (t))
+    return NULL_TREE;
+
+  if (TYPE_ARG_TYPES (TREE_TYPE (decl)))
+    {
+      for (tree temp = TYPE_ARG_TYPES (TREE_TYPE (decl));
+	   temp; temp = TREE_CHAIN (temp))
+	{
+	  t = TREE_VALUE (temp);
+	  m = TYPE_MODE (t);
+	  if (!(VOID_TYPE_P (t) || is_a <scalar_mode> (m)) || TYPE_ATOMIC (t))
+	    return NULL_TREE;
+	}
+    }
+  else
+    {
+      for (tree temp = DECL_ARGUMENTS (decl); temp; temp = DECL_CHAIN (temp))
+	{
+	  t = TREE_TYPE (temp);
+	  m = TYPE_MODE (t);
+	  if (!(VOID_TYPE_P (t) || is_a <scalar_mode> (m)) || TYPE_ATOMIC (t))
+	    return NULL_TREE;
+	}
+    }
+
+  /* Scan the function body to see if it is suitable for SIMD-ization.  */
+  node->get_body ();
+
+  FOR_EACH_BB_FN (bb, DECL_STRUCT_FUNCTION (decl))
+    {
+      for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
+	   gsi_next (&gsi))
+	if (!auto_simd_check_stmt (gsi_stmt (gsi), decl))
+	  return NULL_TREE;
+    }
+
+  /* All is good.  */
+  result = tree_cons (get_identifier ("omp declare simd"), NULL,
+		      DECL_ATTRIBUTES (decl));
+  DECL_ATTRIBUTES (decl) = result;
+  return result;
+}
+
 
 /* Return the number of elements in vector type VECTYPE, which is associated
    with a SIMD clone.  At present these always have a constant length.  */
@@ -1683,13 +1828,31 @@  simd_clone_adjust (struct cgraph_node *node)
 void
 expand_simd_clones (struct cgraph_node *node)
 {
-  tree attr = lookup_attribute ("omp declare simd",
-				DECL_ATTRIBUTES (node->decl));
-  if (attr == NULL_TREE
-      || node->inlined_to
+  tree attr;
+  bool error_p = true;
+
+  if (node->inlined_to
       || lookup_attribute ("noclone", DECL_ATTRIBUTES (node->decl)))
     return;
 
+  attr = lookup_attribute ("omp declare simd",
+			   DECL_ATTRIBUTES (node->decl));
+
+  /* See if we can add an "omp declare simd" directive implicitly
+     before giving up.  */
+  /* FIXME: OpenACC "#pragma acc routine" translates into
+     "omp declare target", but appears also to have some other effects
+     that conflict with generating SIMD clones, causing ICEs.  So don't
+     do this if we've got OpenACC instead of OpenMP.  */
+  if (attr == NULL_TREE
+      && flag_openmp_target_simd_clone && !flag_openacc)
+    {
+      attr = mark_auto_simd_clone (node);
+      error_p = false;
+    }
+  if (attr == NULL_TREE)
+    return;
+
   /* Ignore
      #pragma omp declare simd
      extern int foo ();
@@ -1714,13 +1877,15 @@  expand_simd_clones (struct cgraph_node *node)
 
       poly_uint64 orig_simdlen = clone_info->simdlen;
       tree base_type = simd_clone_compute_base_data_type (node, clone_info);
+
       /* The target can return 0 (no simd clones should be created),
 	 1 (just one ISA of simd clones should be created) or higher
 	 count of ISA variants.  In that case, clone_info is initialized
 	 for the first ISA variant.  */
       int count
 	= targetm.simd_clone.compute_vecsize_and_simdlen (node, clone_info,
-							  base_type, 0);
+							  base_type, 0,
+							  error_p);
       if (count == 0)
 	continue;
 
@@ -1745,7 +1910,8 @@  expand_simd_clones (struct cgraph_node *node)
 	      /* And call the target hook again to get the right ISA.  */
 	      targetm.simd_clone.compute_vecsize_and_simdlen (node, clone,
 							      base_type,
-							      i / 2);
+							      i / 2,
+							      error_p);
 	      if ((i & 1) != 0)
 		clone->inbranch = 1;
 	    }
diff --git a/gcc/target.def b/gcc/target.def
index 4d49ffc2c88..6e830bed52a 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1634,7 +1634,7 @@  fields in @var{simd_clone} structure pointed by @var{clone_info} argument and al
 not determined by the bitsize (in which case @var{simdlen} is always used).\n\
 The hook should return 0 if SIMD clones shouldn't be emitted,\n\
 or number of @var{vecsize_mangle} variants that should be emitted.",
-int, (struct cgraph_node *, struct cgraph_simd_clone *, tree, int), NULL)
+int, (struct cgraph_node *, struct cgraph_simd_clone *, tree, int, bool), NULL)
 
 DEFHOOK
 (adjust,
diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-1.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-1.c
new file mode 100644
index 00000000000..c367d704002
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-1.c
@@ -0,0 +1,19 @@ 
+/* { dg-options "-fopenmp -fdump-tree-optimized -O2" } */
+
+/* Test that simd clones are generated for functions with "declare target".  */
+
+#pragma omp declare target
+int addit(int a, int b, int c)
+{
+  return a + b;
+}
+#pragma omp end declare target
+
+/* { dg-final { scan-tree-dump "_ZGVbN4vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump "_ZGVbM4vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump "_ZGVcN4vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump "_ZGVcM4vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump "_ZGVdN8vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump "_ZGVdM8vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump "_ZGVeN16vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump "_ZGVeM16vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-2.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-2.c
new file mode 100644
index 00000000000..28df4282623
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-2.c
@@ -0,0 +1,18 @@ 
+/* { dg-options "-fopenmp -fdump-tree-optimized -O2" } */
+
+/* Test that simd clones are not generated for functions with 
+   "declare target" but unsuitable arguments.  */
+
+struct s {
+  int a;
+  int b;
+};
+  
+#pragma omp declare target
+int addit (struct s x)
+{
+  return x.a + x.b;
+}
+#pragma omp end declare target
+
+/* { dg-final { scan-tree-dump-not "_Z.*_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-3.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-3.c
new file mode 100644
index 00000000000..807a2f9204d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-3.c
@@ -0,0 +1,17 @@ 
+/* { dg-options "-fopenmp -fdump-tree-optimized -O2" } */
+
+/* Test that simd clones are not generated for functions with 
+   "declare target" but that call possibly side-effecting functions 
+   in the body.  */
+
+extern int f (int);
+
+#pragma omp declare target
+int addit(int a, int b, int c)
+{
+  return f(a) + b;
+}
+#pragma omp end declare target
+
+/* { dg-final { scan-tree-dump-not "_Z.*_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+
diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-4.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-4.c
new file mode 100644
index 00000000000..76bbcf43b03
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-4.c
@@ -0,0 +1,16 @@ 
+/* { dg-options "-fopenmp -fdump-tree-optimized -O2" } */
+
+/* Test that simd clones are not generated for functions with 
+   "declare target" but that write memory in the body.  */
+
+extern int save;
+
+#pragma omp declare target
+int addit(int a, int b, int c)
+{
+  save = c;
+  return a + b;
+}
+#pragma omp end declare target
+
+/* { dg-final { scan-tree-dump-not "_Z.*_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
-- 
2.31.1