[OpenACC,2.7] Implement reductions for arrays and structs

Message ID 1ee7eb45-6bf1-40e5-9aec-48f2a8d28196@pllab.cs.nthu.edu.tw
State Accepted
Headers
Series [OpenACC,2.7] Implement reductions for arrays and structs |

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

Chung-Lin Tang Jan. 2, 2024, 3:21 p.m. UTC
  Hi Thomas, Andrew,
this patch implements reductions for arrays and structs for OpenACC. Following the pattern for OpenACC reductions, this is mostly in the respective NVPTX/GCN backends' *_goacc_reduction_setup/init/fini/teardown hooks, particularly in the fini part, and [nvptx/gcn]_reduction_update routines. The code is mostly similar between the two targets, with mostly the lack of vector mode handling in GCN.

To Julian, there is a patch to the middle-end neutering, a hack actually, that detects SSA_NAMEs used in reduction array MEM_REFs, and avoids single->parallel copying (by moving those definitions before BUILT_IN_GOACC_SINGLE_COPY_START). This appears to work because reductions do their own initializing of the private copy.

As we discussed in our internal calls, the real proper way is to create the private array in a more appropriate stage, but that is too long a shot for now. The changes here are needed at least for some -O0 cases (when under optimization, propagation of the private copies' local address eliminate the SSA_NAME and things actually just work in that case). So please bear with this hack.

I believe the new added libgomp testcases should be fairly complete. Though note that one case of reduction of * for double arrays has been commented out for now, for there appears to be a (presumably) unrelated issue causing this case to fail (maybe has to do with the loop-based atomic form used by both NVPTX/GCN). Maybe should XFAIL instead of comment out. Will do this in next iteration.

Thanks,
Chung-Lin

2024-01-02  Chung-Lin Tang  <cltang@codesourcery.com>

	gcc/c/ChangeLog:
	* c-parser.cc (c_parser_omp_clause_reduction): Adjustments for
	OpenACC-specific cases.
	* c-typeck.cc (c_oacc_reduction_defined_type_p): New function.
	(c_oacc_reduction_code_name): Likewise.
	(c_finish_omp_clauses): Handle OpenACC cases using new functions.

	gcc/cp/ChangeLog:
	* parser.cc (cp_parser_omp_clause_reduction): Adjustments for
	OpenACC-specific cases.
	* semantics.cc (cp_oacc_reduction_defined_type_p): New function.
	(cp_oacc_reduction_code_name): Likewise.
	(finish_omp_reduction_clause): Handle OpenACC cases using new functions.

	gcc/ChangeLog:
	* config/gcn/gcn-tree.cc (gcn_reduction_update): Additions for
	handling ARRAY_TYPE and RECORD_TYPE reductions.
	(gcn_goacc_reduction_setup): Likewise.
	(gcn_goacc_reduction_init): Likewise.
	(gcn_goacc_reduction_fini): Likewise.
	(gcn_goacc_reduction_teardown): Likewise.

	* config/nvptx/nvptx.cc (nvptx_gen_shuffle): Properly generate
	V2SI shuffle using vec_extract op.
	(nvptx_get_shared_red_addr): Adjust type/alignment calculations to
	use TYPE_SIZE/ALIGN_UNIT instead of machine mode based.
	(nvptx_reduction_update): Additions for handling ARRAY_TYPE and
	RECORD_TYPE reductions.
	(nvptx_goacc_reduction_setup): Likewise.
	(nvptx_goacc_reduction_init): Likewise.
	(nvptx_goacc_reduction_fini): Likewise.
	(nvptx_goacc_reduction_teardown): Likewise.

	* omp-low.cc (scan_sharing_clauses): Adjust ARRAY_REF pointer type
	building to use decl type, rather than generic ptr_type_node.
	(omp_reduction_init_op): Add ARRAY_TYPE and RECORD_TYPE init op
	construction.
	(lower_oacc_reductions): Add code to teardown/recover array access
	MEM_REF in OMP_CLAUSE_DECL, to accomodate for lookup requirements.
	Adjust type/alignment calculations to use TYPE_SIZE/ALIGN_UNIT
	instead of machine mode based.

	* omp-oacc-neuter-broadcast.cc (worker_single_copy):
	Add 'hash_set<tree> *array_reduction_base_vars' parameter.
	Add xxx.

	(neuter_worker_single): Add 'hash_set<tree> *array_reduction_base_vars'
	parameter. Adjust recursive calls to self and worker_single_copy.
	(oacc_do_neutering): Add 'hash_set<tree> *array_reduction_base_vars'
	parameter. Adjust call to neuter_worker_single.
	(execute_omp_oacc_neuter_broadcast): Add local
	'hash_set<tree> array_reduction_base_vars' declaration. Collect MEM_REF
	base-pointer SSA_NAMEs of arrays into array_reduction_base_vars. Add
	'&array_reduction_base_vars' argument to call of oacc_do_neutering.

	* omp-offload.cc (default_goacc_reduction): Add unshare_expr.

	gcc/testsuite/ChangeLog:
	* c-c++-common/goacc/reduction-9.c: New test.
	* c-c++-common/goacc/reduction-10.c: New test.
	* c-c++-common/goacc/reduction-11.c: New test.
	* c-c++-common/goacc/reduction-12.c: New test.
	* c-c++-common/goacc/reduction-13.c: New test.

	libgomp/ChangeLog:
	* testsuite/libgomp.oacc-c-c++-common/reduction.h
	(check_reduction_array_xx): New macro.
	(operator_apply): Likewise.
	(check_reduction_array_op): Likewise.
	(check_reduction_arraysec_op): Likewise.
	(function_apply): Likewise.
	(check_reduction_array_macro): Likewise.
	(check_reduction_arraysec_macro): Likewise.
	(check_reduction_xxx_xx_all): Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-arrays-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction-structs-1.c: New test.
  

Comments

Julian Brown Jan. 10, 2024, 11:33 a.m. UTC | #1
On Tue, 2 Jan 2024 23:21:21 +0800
Chung-Lin Tang <cltang@pllab.cs.nthu.edu.tw> wrote:

> To Julian, there is a patch to the middle-end neutering, a hack
> actually, that detects SSA_NAMEs used in reduction array MEM_REFs,
> and avoids single->parallel copying (by moving those definitions
> before BUILT_IN_GOACC_SINGLE_COPY_START). This appears to work
> because reductions do their own initializing of the private copy.

It looks OK to me I think (bearing in mind your following paragraph, of
course!). I wonder though if maybe non-SSA (i.e. addressable) variables
need to be handled also, i.e. parts like this:

+  /* For accesses of variables used in array reductions, instead of
+     propagating the value for the main thread to all other worker threads
+     (which doesn't make sense as a reduction private var), move the defs
+     of such SSA_NAMEs to before the copy block and leave them alone (each
+     thread should access their own local copy).  */
+  for (gimple_stmt_iterator i = gsi_after_labels (from); !gsi_end_p (i);)
+    {
+      gimple *stmt = gsi_stmt (i);
+      if (gimple_assign_single_p (stmt)
+	  && def_escapes_block->contains (gimple_assign_lhs (stmt))
+	  && TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME)

are only handling SSA-converted variables. But maybe that's OK?

> As we discussed in our internal calls, the real proper way is to
> create the private array in a more appropriate stage, but that is too
> long a shot for now. The changes here are needed at least for some
> -O0 cases (when under optimization, propagation of the private
> copies' local address eliminate the SSA_NAME and things actually just
> work in that case). So please bear with this hack.

HTH,

Julian
  

Patch

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index ed92caca814..d13231bc053 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -17201,13 +17201,21 @@  c_parser_omp_clause_reduction (c_parser *parser, enum omp_clause_code kind,
 		code = MAX_EXPR;
 		break;
 	      }
+	    if (!is_omp)
+	      goto name_error;
 	    reduc_id = c_parser_peek_token (parser)->value;
 	    break;
 	  }
 	default:
-	  c_parser_error (parser,
-			  "expected %<+%>, %<*%>, %<-%>, %<&%>, "
-			  "%<^%>, %<|%>, %<&&%>, %<||%> or identifier");
+	name_error:
+	  if (is_omp)
+	    c_parser_error (parser,
+			    "expected %<+%>, %<*%>, %<-%>, %<&%>, "
+			    "%<^%>, %<|%>, %<&&%>, %<||%> or identifier");
+	  else
+	    c_parser_error (parser,
+			    "expected %<+%>, %<*%>, %<-%>, %<&%>, "
+			    "%<^%>, %<|%>, %<&&%>, %<||%>, %<min%> or %<max%>");
 	  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, 0);
 	  return list;
 	}
@@ -17220,6 +17228,11 @@  c_parser_omp_clause_reduction (c_parser *parser, enum omp_clause_code kind,
 	  nl = c_parser_omp_variable_list (parser, clause_loc, kind, list);
 	  for (c = nl; c != list; c = OMP_CLAUSE_CHAIN (c))
 	    {
+	      OMP_CLAUSE_REDUCTION_CODE (c) = code;
+	      /* OpenACC does not require anything below.  */
+	      if (!is_omp)
+		continue;
+
 	      tree d = OMP_CLAUSE_DECL (c), type;
 	      if (TREE_CODE (d) != TREE_LIST)
 		type = TREE_TYPE (d);
@@ -17241,7 +17254,6 @@  c_parser_omp_clause_reduction (c_parser *parser, enum omp_clause_code kind,
 		}
 	      while (TREE_CODE (type) == ARRAY_TYPE)
 		type = TREE_TYPE (type);
-	      OMP_CLAUSE_REDUCTION_CODE (c) = code;
 	      if (task)
 		OMP_CLAUSE_REDUCTION_TASK (c) = 1;
 	      else if (inscan)
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 2d9139d09d2..3c3bcb5f8f9 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -14604,6 +14604,68 @@  c_oacc_check_attachments (tree c)
   return false;
 }
 
+static bool
+c_oacc_reduction_defined_type_p (enum tree_code reduction_code, tree t)
+{
+  if (TREE_CODE (t) == INTEGER_TYPE)
+    return true;
+
+  if (FLOAT_TYPE_P (t) || TREE_CODE (t) == COMPLEX_TYPE)
+    switch (reduction_code)
+      {
+      case PLUS_EXPR:
+      case MULT_EXPR:
+      case MINUS_EXPR:
+      case TRUTH_ANDIF_EXPR:
+      case TRUTH_ORIF_EXPR:
+	return true;
+      case MIN_EXPR:
+      case MAX_EXPR:
+	return TREE_CODE (t) != COMPLEX_TYPE;
+      case BIT_AND_EXPR:
+      case BIT_XOR_EXPR:
+      case BIT_IOR_EXPR:
+	return false;
+      default:
+	gcc_unreachable ();
+      }
+
+  if (TREE_CODE (t) == ARRAY_TYPE)
+    return c_oacc_reduction_defined_type_p (reduction_code, TREE_TYPE (t));
+
+  if (TREE_CODE (t) == RECORD_TYPE)
+    {
+      for (tree fld = TYPE_FIELDS (t); fld; fld = TREE_CHAIN (fld))
+	if (TREE_CODE (fld) == FIELD_DECL
+	    && !c_oacc_reduction_defined_type_p (reduction_code,
+						 TREE_TYPE (fld)))
+	  return false;
+      return true;
+    }
+
+  return false;
+}
+
+static const char *
+c_oacc_reduction_code_name (enum tree_code reduction_code)
+{
+  switch (reduction_code)
+    {
+    case PLUS_EXPR: return "+";
+    case MULT_EXPR: return "*";
+    case MINUS_EXPR: return "-";
+    case TRUTH_ANDIF_EXPR: return "&&";
+    case TRUTH_ORIF_EXPR: return "||";
+    case MIN_EXPR: return "min";
+    case MAX_EXPR: return "max";
+    case BIT_AND_EXPR: return "&";
+    case BIT_XOR_EXPR: return "^";
+    case BIT_IOR_EXPR: return "|";
+    default:
+      gcc_unreachable ();
+    }
+}
+
 /* For all elements of CLAUSES, validate them against their constraints.
    Remove any elements from the list that are invalid.  */
 
@@ -14794,9 +14856,22 @@  c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 		  break;
 		}
 	    }
-	  if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c) == NULL_TREE
-	      && (FLOAT_TYPE_P (type)
-		  || TREE_CODE (type) == COMPLEX_TYPE))
+	  if (ort == C_ORT_ACC)
+	    {
+	      enum tree_code r_code = OMP_CLAUSE_REDUCTION_CODE (c);
+	      if (!c_oacc_reduction_defined_type_p (r_code, TREE_TYPE (t)))
+		{
+		  const char *r_name = c_oacc_reduction_code_name (r_code);
+		  error_at (OMP_CLAUSE_LOCATION (c),
+			    "%qE has invalid type for %<reduction(%s)%>",
+			    t, r_name);
+		  remove = true;
+		  break;
+		}
+	    }
+	  else if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c) == NULL_TREE
+		   && (FLOAT_TYPE_P (type)
+		       || TREE_CODE (type) == COMPLEX_TYPE))
 	    {
 	      enum tree_code r_code = OMP_CLAUSE_REDUCTION_CODE (c);
 	      const char *r_name = NULL;
diff --git a/gcc/config/gcn/gcn-tree.cc b/gcc/config/gcn/gcn-tree.cc
index c99c1767659..55cca4b1b81 100644
--- a/gcc/config/gcn/gcn-tree.cc
+++ b/gcc/config/gcn/gcn-tree.cc
@@ -296,6 +296,105 @@  gcn_reduction_update (location_t loc, gimple_stmt_iterator *gsi,
   tree type = TREE_TYPE (var);
   tree size = TYPE_SIZE (type);
 
+  if (!VAR_P (ptr))
+    {
+      tree t = make_ssa_name (TREE_TYPE (ptr));
+      gimple_seq seq = NULL;
+      gimplify_assign (t, ptr, &seq);
+      gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
+      ptr = t;
+    }
+
+  if (TREE_CODE (type) == ARRAY_TYPE)
+    {
+      gimple *g;
+      gimple_seq seq = NULL;
+      tree array_type = TREE_TYPE (var);
+      tree array_elem_type = TREE_TYPE (array_type);
+      tree max_index = TYPE_MAX_VALUE (TYPE_DOMAIN (array_type));
+
+      tree init_index = make_ssa_name (TREE_TYPE (max_index));
+      tree loop_index = make_ssa_name (TREE_TYPE (max_index));
+      tree update_index = make_ssa_name (TREE_TYPE (max_index));
+
+      g = gimple_build_assign (init_index,
+			       build_int_cst (TREE_TYPE (init_index), 0));
+      gimple_seq_add_stmt (&seq, g);
+      gimple *init_end = gimple_seq_last (seq);
+      gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
+
+      basic_block init_bb = gsi_bb (*gsi);
+      edge init_edge = split_block (init_bb, init_end);
+      basic_block loop_bb = init_edge->dest;
+      /* Reset the iterator.  */
+      *gsi = gsi_for_stmt (gsi_stmt (*gsi));
+
+      seq = NULL;
+      g = gimple_build_assign (update_index, PLUS_EXPR, loop_index,
+			       build_int_cst (TREE_TYPE (loop_index), 1));
+      gimple_seq_add_stmt (&seq, g);
+
+      g = gimple_build_cond (LE_EXPR, update_index, max_index, NULL, NULL);
+      gimple_seq_add_stmt (&seq, g);
+      gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
+
+      edge post_edge = split_block (loop_bb, g);
+      basic_block post_bb = post_edge->dest;
+      loop_bb = post_edge->src;
+      /* Reset the iterator.  */
+      *gsi = gsi_for_stmt (gsi_stmt (*gsi));
+
+      /* Place where we insert reduction code below.  */
+      gimple_stmt_iterator reduction_code_gsi = gsi_start_bb (loop_bb);
+
+      post_edge->flags ^= EDGE_FALSE_VALUE | EDGE_FALLTHRU;
+      post_edge->probability = profile_probability::even ();
+      edge loop_edge = make_edge (loop_bb, loop_bb, EDGE_TRUE_VALUE);
+      loop_edge->probability = profile_probability::even ();
+      set_immediate_dominator (CDI_DOMINATORS, loop_bb, init_bb);
+      set_immediate_dominator (CDI_DOMINATORS, post_bb, loop_bb);
+
+      gphi *phi = create_phi_node (loop_index, loop_bb);
+      add_phi_arg (phi, init_index, init_edge, loc);
+      add_phi_arg (phi, update_index, loop_edge, loc);
+
+      tree var_aref = build4 (ARRAY_REF, array_elem_type,
+			      var, loop_index, NULL_TREE, NULL_TREE);
+
+      tree red_array = build_simple_mem_ref (ptr);
+      tree red_array_type = TREE_TYPE (red_array);
+      tree red_array_elem_type
+	= build_qualified_type (TREE_TYPE (red_array_type),
+				TYPE_QUALS (red_array_type));
+      tree ptr_aref = build4 (ARRAY_REF, red_array_elem_type,
+			      red_array, loop_index,
+			      NULL_TREE, NULL_TREE);
+
+      gcn_reduction_update (loc, &reduction_code_gsi,
+			    build_fold_addr_expr (ptr_aref),
+			    var_aref, op);
+      return build_simple_mem_ref (ptr);
+    }
+  else if (TREE_CODE (type) == RECORD_TYPE)
+    {
+      for (tree fld = TYPE_FIELDS (type); fld; fld = TREE_CHAIN (fld))
+	if (TREE_CODE (fld) == FIELD_DECL)
+	  {
+	    tree var_fld_ref = build3 (COMPONENT_REF, TREE_TYPE (fld),
+				       var, fld, NULL);
+	    tree ptr_ref = build_simple_mem_ref (ptr);
+	    tree ptr_fld_type
+	      = build_qualified_type (TREE_TYPE (fld),
+				      TYPE_QUALS (TREE_TYPE (ptr_ref)));
+	    tree ptr_fld_ref = build3 (COMPONENT_REF, ptr_fld_type,
+				       ptr_ref, fld, NULL);
+	    gcn_reduction_update (loc, gsi,
+				  build_fold_addr_expr (ptr_fld_ref),
+				  var_fld_ref, op);
+	  }
+      return build_simple_mem_ref (ptr);
+    }
+
   if (size == TYPE_SIZE (unsigned_type_node)
       || size == TYPE_SIZE (long_long_unsigned_type_node))
     return gcn_lockless_update (loc, gsi, ptr, var, op);
@@ -359,11 +458,14 @@  gcn_goacc_reduction_setup (gcall *call)
       gimplify_assign (decl, var, &seq);
     }
 
-  if (lhs)
+  if (lhs
+      && TREE_CODE (TREE_TYPE (var)) != ARRAY_TYPE
+      && TREE_CODE (TREE_TYPE (var)) != RECORD_TYPE)
     gimplify_assign (lhs, var, &seq);
 
   pop_gimplify_context (NULL);
-  gsi_replace_with_seq (&gsi, seq, true);
+  gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
+  gsi_remove (&gsi, true);
 }
 
 /* Expand IFN_GOACC_REDUCTION_INIT.  */
@@ -395,7 +497,8 @@  gcn_goacc_reduction_init (gcall *call)
     gimplify_assign (lhs, init, &seq);
 
   pop_gimplify_context (NULL);
-  gsi_replace_with_seq (&gsi, seq, true);
+  gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
+  gsi_remove (&gsi, true);
 }
 
 /* Expand IFN_GOACC_REDUCTION_FINI.  */
@@ -439,11 +542,13 @@  gcn_goacc_reduction_fini (gcall *call)
       r = gcn_reduction_update (gimple_location (call), &gsi, accum, var, op);
     }
 
-  if (lhs)
+  if (lhs
+      && TREE_CODE (TREE_TYPE (r)) != ARRAY_TYPE
+      && TREE_CODE (TREE_TYPE (r)) != RECORD_TYPE)
     gimplify_assign (lhs, r, &seq);
   pop_gimplify_context (NULL);
-
-  gsi_replace_with_seq (&gsi, seq, true);
+  gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
+  gsi_remove (&gsi, true);
 }
 
 /* Expand IFN_GOACC_REDUCTION_TEARDOWN.  */
@@ -483,8 +588,8 @@  gcn_goacc_reduction_teardown (gcall *call)
     gimplify_assign (lhs, unshare_expr (var), &seq);
 
   pop_gimplify_context (NULL);
-
-  gsi_replace_with_seq (&gsi, seq, true);
+  gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
+  gsi_remove (&gsi, true);
 }
 
 /* Implement TARGET_GOACC_REDUCTION.
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 3fb1deb70fd..ee242c37d25 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -2029,19 +2029,15 @@  nvptx_gen_shuffle (rtx dst, rtx src, rtx idx, nvptx_shuffle_kind kind)
       break;
     case E_V2SImode:
       {
-	rtx src0 = gen_rtx_SUBREG (SImode, src, 0);
-	rtx src1 = gen_rtx_SUBREG (SImode, src, 4);
-	rtx dst0 = gen_rtx_SUBREG (SImode, dst, 0);
-	rtx dst1 = gen_rtx_SUBREG (SImode, dst, 4);
 	rtx tmp0 = gen_reg_rtx (SImode);
 	rtx tmp1 = gen_reg_rtx (SImode);
 	start_sequence ();
-	emit_insn (gen_movsi (tmp0, src0));
-	emit_insn (gen_movsi (tmp1, src1));
+	emit_insn (gen_vec_extractv2sisi (tmp0, src, GEN_INT (0)));
+	emit_insn (gen_vec_extractv2sisi (tmp1, src, GEN_INT (1)));
 	emit_insn (nvptx_gen_shuffle (tmp0, tmp0, idx, kind));
 	emit_insn (nvptx_gen_shuffle (tmp1, tmp1, idx, kind));
-	emit_insn (gen_movsi (dst0, tmp0));
-	emit_insn (gen_movsi (dst1, tmp1));
+	emit_insn (gen_vec_setv2si (dst, tmp0, GEN_INT (0)));
+	emit_insn (gen_vec_setv2si (dst, tmp1, GEN_INT (1)));
 	res = get_insns ();
 	end_sequence ();
       }
@@ -6708,11 +6704,9 @@  nvptx_get_shared_red_addr (tree type, tree offset, bool vector)
   enum nvptx_builtins addr_dim = NVPTX_BUILTIN_WORKER_ADDR;
   if (vector)
     addr_dim = NVPTX_BUILTIN_VECTOR_ADDR;
-  machine_mode mode = TYPE_MODE (type);
   tree fndecl = nvptx_builtin_decl (addr_dim, true);
-  tree size = build_int_cst (unsigned_type_node, GET_MODE_SIZE (mode));
-  tree align = build_int_cst (unsigned_type_node,
-			      GET_MODE_ALIGNMENT (mode) / BITS_PER_UNIT);
+  tree size = TYPE_SIZE_UNIT (type);
+  tree align = build_int_cst (unsigned_type_node, TYPE_ALIGN_UNIT (type));
   tree call = build_call_expr (fndecl, 3, offset, size, align);
 
   return fold_convert (build_pointer_type (type), call);
@@ -7029,6 +7023,105 @@  nvptx_reduction_update (location_t loc, gimple_stmt_iterator *gsi,
   tree type = TREE_TYPE (var);
   tree size = TYPE_SIZE (type);
 
+  if (!VAR_P (ptr))
+    {
+      tree t = make_ssa_name (TREE_TYPE (ptr));
+      gimple_seq seq = NULL;
+      gimplify_assign (t, ptr, &seq);
+      gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
+      ptr = t;
+    }
+
+  if (TREE_CODE (type) == ARRAY_TYPE)
+    {
+      gimple *g;
+      gimple_seq seq = NULL;
+      tree array_type = TREE_TYPE (var);
+      tree array_elem_type = TREE_TYPE (array_type);
+      tree max_index = TYPE_MAX_VALUE (TYPE_DOMAIN (array_type));
+
+      tree init_index = make_ssa_name (TREE_TYPE (max_index));
+      tree loop_index = make_ssa_name (TREE_TYPE (max_index));
+      tree update_index = make_ssa_name (TREE_TYPE (max_index));
+
+      g = gimple_build_assign (init_index,
+			       build_int_cst (TREE_TYPE (init_index), 0));
+      gimple_seq_add_stmt (&seq, g);
+      gimple *init_end = gimple_seq_last (seq);
+      gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
+
+      basic_block init_bb = gsi_bb (*gsi);
+      edge init_edge = split_block (init_bb, init_end);
+      basic_block loop_bb = init_edge->dest;
+      /* Reset the iterator.  */
+      *gsi = gsi_for_stmt (gsi_stmt (*gsi));
+
+      seq = NULL;
+      g = gimple_build_assign (update_index, PLUS_EXPR, loop_index,
+			       build_int_cst (TREE_TYPE (loop_index), 1));
+      gimple_seq_add_stmt (&seq, g);
+
+      g = gimple_build_cond (LE_EXPR, update_index, max_index, NULL, NULL);
+      gimple_seq_add_stmt (&seq, g);
+      gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
+
+      edge post_edge = split_block (loop_bb, g);
+      basic_block post_bb = post_edge->dest;
+      loop_bb = post_edge->src;
+      /* Reset the iterator.  */
+      *gsi = gsi_for_stmt (gsi_stmt (*gsi));
+
+      /* Place where we insert reduction code below.  */
+      gimple_stmt_iterator reduction_code_gsi = gsi_start_bb (loop_bb);
+
+      post_edge->flags ^= EDGE_FALSE_VALUE | EDGE_FALLTHRU;
+      post_edge->probability = profile_probability::even ();
+      edge loop_edge = make_edge (loop_bb, loop_bb, EDGE_TRUE_VALUE);
+      loop_edge->probability = profile_probability::even ();
+      set_immediate_dominator (CDI_DOMINATORS, loop_bb, init_bb);
+      set_immediate_dominator (CDI_DOMINATORS, post_bb, loop_bb);
+
+      gphi *phi = create_phi_node (loop_index, loop_bb);
+      add_phi_arg (phi, init_index, init_edge, loc);
+      add_phi_arg (phi, update_index, loop_edge, loc);
+
+      tree var_aref = build4 (ARRAY_REF, array_elem_type,
+			      var, loop_index, NULL_TREE, NULL_TREE);
+
+      tree red_array = build_simple_mem_ref (ptr);
+      tree red_array_type = TREE_TYPE (red_array);
+      tree red_array_elem_type
+	= build_qualified_type (TREE_TYPE (red_array_type),
+				TYPE_QUALS (red_array_type));
+      tree ptr_aref = build4 (ARRAY_REF, red_array_elem_type,
+			      red_array, loop_index,
+			      NULL_TREE, NULL_TREE);
+
+      nvptx_reduction_update (loc, &reduction_code_gsi,
+			      build_fold_addr_expr (ptr_aref),
+			      var_aref, op, level);
+      return build_simple_mem_ref (ptr);
+    }
+  else if (TREE_CODE (type) == RECORD_TYPE)
+    {
+      for (tree fld = TYPE_FIELDS (type); fld; fld = TREE_CHAIN (fld))
+	if (TREE_CODE (fld) == FIELD_DECL)
+	  {
+	    tree var_fld_ref = build3 (COMPONENT_REF, TREE_TYPE (fld),
+				       var, fld, NULL);
+	    tree ptr_ref = build_simple_mem_ref (ptr);
+	    tree ptr_fld_type
+	      = build_qualified_type (TREE_TYPE (fld),
+				      TYPE_QUALS (TREE_TYPE (ptr_ref)));
+	    tree ptr_fld_ref = build3 (COMPONENT_REF, ptr_fld_type,
+				       ptr_ref, fld, NULL);
+	    nvptx_reduction_update (loc, gsi,
+				    build_fold_addr_expr (ptr_fld_ref),
+				    var_fld_ref, op, level);
+	  }
+      return build_simple_mem_ref (ptr);
+    }
+
   if (size == TYPE_SIZE (unsigned_type_node)
       || size == TYPE_SIZE (long_long_unsigned_type_node))
     return nvptx_lockless_update (loc, gsi, ptr, var, op);
@@ -7059,7 +7152,10 @@  nvptx_goacc_reduction_setup (gcall *call, offload_attrs *oa)
     }
   
   if (level == GOMP_DIM_WORKER
-      || (level == GOMP_DIM_VECTOR && oa->vector_length > PTX_WARP_SIZE))
+      || (level == GOMP_DIM_VECTOR
+	  && (oa->vector_length > PTX_WARP_SIZE
+	      || TREE_CODE (TREE_TYPE (var)) == ARRAY_TYPE
+	      || TREE_CODE (TREE_TYPE (var)) == RECORD_TYPE)))
     {
       /* Store incoming value to worker reduction buffer.  */
       tree offset = gimple_call_arg (call, 5);
@@ -7073,11 +7169,14 @@  nvptx_goacc_reduction_setup (gcall *call, offload_attrs *oa)
       gimplify_assign (ref, var, &seq);
     }
 
-  if (lhs)
+  if (lhs
+      && TREE_CODE (TREE_TYPE (var)) != ARRAY_TYPE
+      && TREE_CODE (TREE_TYPE (var)) != RECORD_TYPE)
     gimplify_assign (lhs, var, &seq);
 
   pop_gimplify_context (NULL);
-  gsi_replace_with_seq (&gsi, seq, true);
+  gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
+  gsi_remove (&gsi, true);
 }
 
 /* NVPTX implementation of GOACC_REDUCTION_INIT. */
@@ -7097,7 +7196,9 @@  nvptx_goacc_reduction_init (gcall *call, offload_attrs *oa)
   
   push_gimplify_context (true);
 
-  if (level == GOMP_DIM_VECTOR && oa->vector_length == PTX_WARP_SIZE)
+  if (level == GOMP_DIM_VECTOR && oa->vector_length == PTX_WARP_SIZE
+      && TREE_CODE (TREE_TYPE (var)) != ARRAY_TYPE
+      && TREE_CODE (TREE_TYPE (var)) != RECORD_TYPE)
     {
       /* Initialize vector-non-zeroes to INIT_VAL (OP).  */
       tree tid = make_ssa_name (integer_type_node);
@@ -7162,7 +7263,8 @@  nvptx_goacc_reduction_init (gcall *call, offload_attrs *oa)
     }
 
   pop_gimplify_context (NULL);
-  gsi_replace_with_seq (&gsi, seq, true);
+  gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
+  gsi_remove (&gsi, true);
 }
 
 /* NVPTX implementation of GOACC_REDUCTION_FINI.  */
@@ -7182,7 +7284,9 @@  nvptx_goacc_reduction_fini (gcall *call, offload_attrs *oa)
 
   push_gimplify_context (true);
 
-  if (level == GOMP_DIM_VECTOR && oa->vector_length == PTX_WARP_SIZE)
+  if (level == GOMP_DIM_VECTOR && oa->vector_length == PTX_WARP_SIZE
+      && TREE_CODE (TREE_TYPE (var)) != ARRAY_TYPE
+      && TREE_CODE (TREE_TYPE (var)) != RECORD_TYPE)
     {
       /* Emit binary shuffle tree.  TODO. Emit this as an actual loop,
 	 but that requires a method of emitting a unified jump at the
@@ -7229,11 +7333,14 @@  nvptx_goacc_reduction_fini (gcall *call, offload_attrs *oa)
 	}
     }
 
-  if (lhs)
+  if (lhs
+      && TREE_CODE (TREE_TYPE (r)) != ARRAY_TYPE
+      && TREE_CODE (TREE_TYPE (r)) != RECORD_TYPE)
     gimplify_assign (lhs, r, &seq);
-  pop_gimplify_context (NULL);
 
-  gsi_replace_with_seq (&gsi, seq, true);
+  pop_gimplify_context (NULL);
+  gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
+  gsi_remove (&gsi, true);
 }
 
 /* NVPTX implementation of GOACC_REDUCTION_TEARDOWN.  */
@@ -7249,7 +7356,10 @@  nvptx_goacc_reduction_teardown (gcall *call, offload_attrs *oa)
   
   push_gimplify_context (true);
   if (level == GOMP_DIM_WORKER
-      || (level == GOMP_DIM_VECTOR && oa->vector_length > PTX_WARP_SIZE))
+      || (level == GOMP_DIM_VECTOR
+	  && (oa->vector_length > PTX_WARP_SIZE
+	      || TREE_CODE (TREE_TYPE (var)) == ARRAY_TYPE
+	      || TREE_CODE (TREE_TYPE (var)) == RECORD_TYPE)))
     {
       /* Read the worker reduction buffer.  */
       tree offset = gimple_call_arg (call, 5);
@@ -7272,11 +7382,11 @@  nvptx_goacc_reduction_teardown (gcall *call, offload_attrs *oa)
     }
 
   if (lhs)
-    gimplify_assign (lhs, var, &seq);
+    gimplify_assign (lhs, unshare_expr (var), &seq);
   
   pop_gimplify_context (NULL);
-
-  gsi_replace_with_seq (&gsi, seq, true);
+  gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
+  gsi_remove (&gsi, true);
 }
 
 /* NVPTX reduction expander.  */
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 379aeb56b15..a5e67bd3d68 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -39620,6 +39620,12 @@  cp_parser_omp_clause_reduction (cp_parser *parser, enum omp_clause_code kind,
 	    code = TRUTH_ANDIF_EXPR;
 	  else if (id == ovl_op_identifier (false, TRUTH_ORIF_EXPR))
 	    code = TRUTH_ORIF_EXPR;
+	  if (code == ERROR_MARK && !is_omp)
+	    {
+	      cp_parser_error (parser, "expected %<+%>, %<*%>, %<-%>, %<&%>, "
+			       "%<^%>, %<|%>, %<&&%>, %<||%>, %<min%> or %<max%>");
+	      goto resync_fail;
+	    }
 	  id = omp_reduction_id (code, id, NULL_TREE);
 	  tree scope = parser->scope;
 	  if (scope)
@@ -39647,6 +39653,10 @@  cp_parser_omp_clause_reduction (cp_parser *parser, enum omp_clause_code kind,
   for (c = nlist; c != list; c = OMP_CLAUSE_CHAIN (c))
     {
       OMP_CLAUSE_REDUCTION_CODE (c) = code;
+      /* OpenACC does not require anything below.  */
+      if (!is_omp)
+	continue;
+
       if (task)
 	OMP_CLAUSE_REDUCTION_TASK (c) = 1;
       else if (inscan)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index e6dba29ee81..d02d53fd508 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -6409,6 +6409,69 @@  cp_check_omp_declare_reduction (tree udr)
   return true;
 }
 
+
+static bool
+cp_oacc_reduction_defined_type_p (enum tree_code reduction_code, tree t)
+{
+  if (TREE_CODE (t) == INTEGER_TYPE)
+    return true;
+
+  if (FLOAT_TYPE_P (t) || TREE_CODE (t) == COMPLEX_TYPE)
+    switch (reduction_code)
+      {
+      case PLUS_EXPR:
+      case MULT_EXPR:
+      case MINUS_EXPR:
+      case TRUTH_ANDIF_EXPR:
+      case TRUTH_ORIF_EXPR:
+	return true;
+      case MIN_EXPR:
+      case MAX_EXPR:
+	return TREE_CODE (t) != COMPLEX_TYPE;
+      case BIT_AND_EXPR:
+      case BIT_XOR_EXPR:
+      case BIT_IOR_EXPR:
+	return false;
+      default:
+	gcc_unreachable ();
+      }
+
+  if (TREE_CODE (t) == ARRAY_TYPE)
+    return cp_oacc_reduction_defined_type_p (reduction_code, TREE_TYPE (t));
+
+  if (TREE_CODE (t) == RECORD_TYPE)
+    {
+      for (tree fld = TYPE_FIELDS (t); fld; fld = TREE_CHAIN (fld))
+	if (TREE_CODE (fld) == FIELD_DECL
+	    && !cp_oacc_reduction_defined_type_p (reduction_code,
+						  TREE_TYPE (fld)))
+	  return false;
+      return true;
+    }
+
+  return false;
+}
+
+static const char *
+cp_oacc_reduction_code_name (enum tree_code reduction_code)
+{
+  switch (reduction_code)
+    {
+    case PLUS_EXPR: return "+";
+    case MULT_EXPR: return "*";
+    case MINUS_EXPR: return "-";
+    case TRUTH_ANDIF_EXPR: return "&&";
+    case TRUTH_ORIF_EXPR: return "||";
+    case MIN_EXPR: return "min";
+    case MAX_EXPR: return "max";
+    case BIT_AND_EXPR: return "&";
+    case BIT_XOR_EXPR: return "^";
+    case BIT_IOR_EXPR: return "|";
+    default:
+      gcc_unreachable ();
+    }
+}
+
 /* Helper function of finish_omp_clauses.  Clone STMT as if we were making
    an inline call.  But, remap
    the OMP_DECL1 VAR_DECL (omp_out resp. omp_orig) to PLACEHOLDER
@@ -6453,7 +6516,8 @@  find_omp_placeholder_r (tree *tp, int *, void *data)
    Return true if there is some error and the clause should be removed.  */
 
 static bool
-finish_omp_reduction_clause (tree c, bool *need_default_ctor, bool *need_dtor)
+finish_omp_reduction_clause (tree c, bool *need_default_ctor, bool *need_dtor,
+			     enum c_omp_region_type ort)
 {
   tree t = OMP_CLAUSE_DECL (c);
   bool predefined = false;
@@ -6554,6 +6618,20 @@  finish_omp_reduction_clause (tree c, bool *need_default_ctor, bool *need_dtor)
       return false;
     }
 
+  if (ort == C_ORT_ACC)
+    {
+      enum tree_code r_code = OMP_CLAUSE_REDUCTION_CODE (c);
+      if (!cp_oacc_reduction_defined_type_p (r_code, TREE_TYPE (t)))
+	{
+	  const char *r_name = cp_oacc_reduction_code_name (r_code);
+	  error_at (OMP_CLAUSE_LOCATION (c),
+		    "%qE has invalid type for %<reduction(%s)%>",
+		    t, r_name);
+	  return true;
+	}
+      return false;
+    }
+
   tree id = OMP_CLAUSE_REDUCTION_PLACEHOLDER (c);
 
   type = TYPE_MAIN_VARIANT (type);
@@ -9250,7 +9328,7 @@  finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 	      && !VAR_P (t) && TREE_CODE (t) != PARM_DECL)
 	    break;
 	  if (finish_omp_reduction_clause (c, &need_default_ctor,
-					   &need_dtor))
+					   &need_dtor, ort))
 	    remove = true;
 	  else
 	    t = OMP_CLAUSE_DECL (c);
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index b2dc5ed931e..749fae4e7a6 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -1712,10 +1712,14 @@  scan_sharing_clauses (tree clauses, omp_context *ctx)
 		    }
 		  gcc_assert (!splay_tree_lookup (ctx->field_map,
 						  (splay_tree_key) decl));
+		  tree ptr_type = ptr_type_node;
+		  if (TREE_CODE (decl) == ARRAY_REF)
+		    ptr_type
+		      = build_pointer_type (TREE_TYPE (TREE_OPERAND (decl, 0)));
 		  tree field
 		    = build_decl (OMP_CLAUSE_LOCATION (c),
-				  FIELD_DECL, NULL_TREE, ptr_type_node);
-		  SET_DECL_ALIGN (field, TYPE_ALIGN (ptr_type_node));
+				  FIELD_DECL, NULL_TREE, ptr_type);
+		  SET_DECL_ALIGN (field, TYPE_ALIGN (ptr_type));
 		  insert_field_into_struct (ctx->record_type, field);
 		  splay_tree_insert (ctx->field_map, (splay_tree_key) decl,
 				     (splay_tree_value) field);
@@ -4420,6 +4424,27 @@  maybe_lookup_decl_in_outer_ctx (tree decl, omp_context *ctx)
 tree
 omp_reduction_init_op (location_t loc, enum tree_code op, tree type)
 {
+  if (TREE_CODE (type) == ARRAY_TYPE)
+    {
+      vec<constructor_elt, va_gc> *v = NULL;
+      HOST_WIDE_INT min = tree_to_shwi (TYPE_MIN_VALUE (TYPE_DOMAIN (type)));
+      HOST_WIDE_INT max = tree_to_shwi (TYPE_MAX_VALUE (TYPE_DOMAIN (type)));
+      tree t = omp_reduction_init_op (loc, op, TREE_TYPE (type));
+      for (HOST_WIDE_INT i = min; i <= max; i++)
+	CONSTRUCTOR_APPEND_ELT (v, size_int (i), t);
+      return build_constructor (type, v);
+    }
+  else if (TREE_CODE (type) == RECORD_TYPE)
+    {
+      vec<constructor_elt, va_gc> *v = NULL;
+      for (tree fld = TYPE_FIELDS (type); fld; fld = TREE_CHAIN (fld))
+	if (TREE_CODE (fld) == FIELD_DECL)
+	  CONSTRUCTOR_APPEND_ELT (v, fld,
+				  omp_reduction_init_op (loc, op,
+							 TREE_TYPE (fld)));
+      return build_constructor (type, v);
+    }
+
   switch (op)
     {
     case PLUS_EXPR:
@@ -7406,6 +7431,21 @@  lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	gcc_checking_assert (!is_oacc_kernels_decomposed_part (ctx));
 
 	tree orig = OMP_CLAUSE_DECL (c);
+	tree addr = NULL_TREE;
+	if (TREE_CODE (orig) == MEM_REF)
+	  {
+	    /* Peel away MEM_REF to get at base array VAR_DECL.  */
+	    addr = TREE_OPERAND (orig, 0);
+	    if (TREE_CODE (addr) == POINTER_PLUS_EXPR)
+	      addr = TREE_OPERAND (addr, 0);
+	    if (TREE_CODE (addr) == ADDR_EXPR)
+	      addr = TREE_OPERAND (addr, 0);
+	    else if (INDIRECT_REF_P (addr))
+	      addr = TREE_OPERAND (addr, 0);
+	    orig = addr;
+	    gcc_assert (!is_variable_sized (addr));
+	  }
+
 	tree var = maybe_lookup_decl (orig, ctx);
 	tree ref_to_res = NULL_TREE;
 	tree incoming, outgoing, v1, v2, v3;
@@ -7476,6 +7516,18 @@  lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	  do_lookup:
 	    /* This is the outermost construct with this reduction,
 	       see if there's a mapping for it.  */
+	    if (TREE_CODE (TREE_TYPE (orig)) == ARRAY_TYPE
+		&& gimple_code (outer->stmt) == GIMPLE_OMP_TARGET)
+	      /* Recover original MEM_REF in OMP_CLAUSE_DECL from array
+		 VAR_DECL discovered above. This is due to field lookup
+		 key based on whole MEM_REF earlier during scanning.  */
+	      for (tree c = gimple_omp_target_clauses (outer->stmt); c;
+		   c = OMP_CLAUSE_CHAIN (c))
+		if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+		    && TREE_CODE (OMP_CLAUSE_DECL (c)) == ARRAY_REF
+		    && TREE_OPERAND (OMP_CLAUSE_DECL (c), 0) == orig)
+		  orig = OMP_CLAUSE_DECL (c);
+
 	    if (gimple_code (outer->stmt) == GIMPLE_OMP_TARGET
 		&& maybe_lookup_field (orig, outer) && !is_private)
 	      {
@@ -7547,10 +7599,10 @@  lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	   variable-sized type.  */
 	fixed_size_mode mode
 	  = as_a <fixed_size_mode> (TYPE_MODE (TREE_TYPE (var)));
-	unsigned align = GET_MODE_ALIGNMENT (mode) /  BITS_PER_UNIT;
+	unsigned align = TYPE_ALIGN_UNIT (TREE_TYPE (var));
 	offset = (offset + align - 1) & ~(align - 1);
 	tree off = build_int_cst (sizetype, offset);
-	offset += GET_MODE_SIZE (mode);
+	offset += tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (var)));
 
 	if (!init_code)
 	  {
diff --git a/gcc/omp-oacc-neuter-broadcast.cc b/gcc/omp-oacc-neuter-broadcast.cc
index 779dc6b1afb..5527509f270 100644
--- a/gcc/omp-oacc-neuter-broadcast.cc
+++ b/gcc/omp-oacc-neuter-broadcast.cc
@@ -991,7 +991,8 @@  worker_single_copy (basic_block from, basic_block to,
 		    hash_set<tree> *worker_partitioned_uses,
 		    tree record_type, record_field_map_t *record_field_map,
 		    unsigned HOST_WIDE_INT placement,
-		    bool isolate_broadcasts, bool has_gang_private_write)
+		    bool isolate_broadcasts, bool has_gang_private_write,
+		    hash_set<tree> *array_reduction_base_vars)
 {
   /* If we only have virtual defs, we'll have no record type, but we still want
      to emit single_copy_start and (particularly) single_copy_end to act as
@@ -1015,6 +1016,37 @@  worker_single_copy (basic_block from, basic_block to,
   edge e = split_block (to, gsi_stmt (gsi));
   basic_block barrier_block = e->dest;
 
+  gimple_seq local_asgns = NULL;
+
+  /* For accesses of variables used in array reductions, instead of
+     propagating the value for the main thread to all other worker threads
+     (which doesn't make sense as a reduction private var), move the defs
+     of such SSA_NAMEs to before the copy block and leave them alone (each
+     thread should access their own local copy).  */
+  for (gimple_stmt_iterator i = gsi_after_labels (from); !gsi_end_p (i);)
+    {
+      gimple *stmt = gsi_stmt (i);
+      if (gimple_assign_single_p (stmt)
+	  && def_escapes_block->contains (gimple_assign_lhs (stmt))
+	  && TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME)
+	{
+	  tree lhs = gimple_assign_lhs (stmt);
+	  tree rhs = gimple_assign_rhs1 (stmt);
+	  if (TREE_CODE (rhs) == ADDR_EXPR)
+	    {
+	      rhs = TREE_OPERAND (rhs, 0);
+	      if (local_var_based_p (rhs)
+		  && array_reduction_base_vars->contains (lhs))
+		{
+		  gsi_remove (&i, false);
+		  gimple_seq_add_stmt (&local_asgns, stmt);
+		  continue;
+		}
+	    }
+	}
+      gsi_next (&i);
+    }
+
   gimple_stmt_iterator start = gsi_after_labels (from);
 
   tree decl = builtin_decl_explicit (BUILT_IN_GOACC_SINGLE_COPY_START);
@@ -1029,6 +1061,9 @@  worker_single_copy (basic_block from, basic_block to,
   gsi_insert_before (&start, call, GSI_NEW_STMT);
   update_stmt (call);
 
+  if (local_asgns)
+    gsi_insert_seq_before (&start, local_asgns, GSI_SAME_STMT);
+
   /* The shared-memory range for this block overflowed.  Add a barrier before
      the GOACC_single_copy_start call.  */
   if (isolate_broadcasts)
@@ -1128,6 +1163,22 @@  worker_single_copy (basic_block from, basic_block to,
 	  if (gimple_nop_p (def_stmt))
 	    continue;
 
+	  /* For accesses of variables used in array reductions, skip creating
+	     the barrier phi. Each thread runs same def_stmt to access
+	     local variable, there is no main/worker divide here.  */
+	  if (gimple_assign_single_p (def_stmt))
+	    {
+	      tree lhs = gimple_assign_lhs (def_stmt);
+	      tree rhs = gimple_assign_rhs1 (def_stmt);
+	      if (TREE_CODE (rhs) == ADDR_EXPR)
+		{
+		  rhs = TREE_OPERAND (rhs, 0);
+		  if (local_var_based_p (rhs)
+		      && array_reduction_base_vars->contains (lhs))
+		    continue;
+		}
+	    }
+
 	  /* The barrier phi takes one result from the actual work of the
 	     block we're neutering, and the other result is constant zero of
 	     the same type.  */
@@ -1248,7 +1299,8 @@  neuter_worker_single (parallel_g *par, unsigned outer_mask,
 		      hash_set<tree> *partitioned_var_uses,
 		      record_field_map_t *record_field_map,
 		      blk_offset_map_t *blk_offset_map,
-		      bitmap writes_gang_private)
+		      bitmap writes_gang_private,
+		      hash_set<tree> *array_reduction_base_vars)
 {
   unsigned mask = outer_mask | par->mask;
 
@@ -1398,7 +1450,8 @@  neuter_worker_single (parallel_g *par, unsigned outer_mask,
 				  &worker_partitioned_uses, record_type,
 				  record_field_map,
 				  offset, !range_allocated,
-				  has_gang_private_write);
+				  has_gang_private_write,
+				  array_reduction_base_vars);
 	    }
 	  else
 	    worker_single_simple (block, block, &def_escapes_block);
@@ -1436,11 +1489,13 @@  neuter_worker_single (parallel_g *par, unsigned outer_mask,
   if (par->inner)
     neuter_worker_single (par->inner, mask, worker_single, vector_single,
 			  prop_set, partitioned_var_uses, record_field_map,
-			  blk_offset_map, writes_gang_private);
+			  blk_offset_map, writes_gang_private,
+			  array_reduction_base_vars);
   if (par->next)
     neuter_worker_single (par->next, outer_mask, worker_single, vector_single,
 			  prop_set, partitioned_var_uses, record_field_map,
-			  blk_offset_map, writes_gang_private);
+			  blk_offset_map, writes_gang_private,
+			  array_reduction_base_vars);
 }
 
 static void
@@ -1587,7 +1642,8 @@  merge_ranges (splay_tree accum, splay_tree sp)
 
 static void
 oacc_do_neutering (unsigned HOST_WIDE_INT bounds_lo,
-		   unsigned HOST_WIDE_INT bounds_hi)
+		   unsigned HOST_WIDE_INT bounds_hi,
+		   hash_set<tree> *array_reduction_base_vars)
 {
   bb_stmt_map_t bb_stmt_map;
   auto_bitmap worker_single, vector_single;
@@ -1792,7 +1848,8 @@  oacc_do_neutering (unsigned HOST_WIDE_INT bounds_lo,
 
   neuter_worker_single (par, mask, worker_single, vector_single, &prop_set,
 			&partitioned_var_uses, &record_field_map,
-			&blk_offset_map, writes_gang_private);
+			&blk_offset_map, writes_gang_private,
+			array_reduction_base_vars);
 
   record_field_map.empty ();
 
@@ -1831,6 +1888,9 @@  execute_omp_oacc_neuter_broadcast ()
       private_size[i] = 0;
     }
 
+  /* Set of base variables referencing arrays used in array reductions.  */
+  hash_set<tree> array_reduction_base_vars;
+
   /* Calculate shared memory size required for reduction variables and
      gang-private memory for this offloaded function.  */
   basic_block bb;
@@ -1869,6 +1929,15 @@  execute_omp_oacc_neuter_broadcast ()
 			   + tree_to_uhwi (TYPE_SIZE_UNIT (var_type)));
 		      reduction_size[level]
 			= MAX (reduction_size[level], limit);
+
+		      tree lhs = gimple_get_lhs (call);
+		      if (TREE_CODE (lhs) == MEM_REF
+			  && TREE_CODE (TREE_OPERAND (lhs, 0)) == SSA_NAME
+			  && TREE_CODE (TREE_TYPE (lhs)) == ARRAY_TYPE)
+			{
+			  tree addr = TREE_OPERAND (lhs, 0);
+			  array_reduction_base_vars.add (addr);
+			}
 		    }
 		}
 	      break;
@@ -1917,7 +1986,7 @@  execute_omp_oacc_neuter_broadcast ()
 
   /* Perform worker partitioning unless we know 'num_workers(1)'.  */
   if (dims[GOMP_DIM_WORKER] != 1)
-    oacc_do_neutering (bounds_lo, bounds_hi);
+    oacc_do_neutering (bounds_lo, bounds_hi, &array_reduction_base_vars);
 
   return 0;
 }
diff --git a/gcc/omp-offload.cc b/gcc/omp-offload.cc
index 1d6dfef74fc..c3eab8c240a 100644
--- a/gcc/omp-offload.cc
+++ b/gcc/omp-offload.cc
@@ -1819,7 +1819,7 @@  default_goacc_reduction (gcall *call)
 
   /* Copy VAR to LHS, if there is an LHS.  */
   if (lhs)
-    gimple_seq_add_stmt (&seq, gimple_build_assign (lhs, var));
+    gimple_seq_add_stmt (&seq, gimple_build_assign (lhs, unshare_expr (var)));
 
   gsi_replace_with_seq (&gsi, seq, true);
 }
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-10.c b/gcc/testsuite/c-c++-common/goacc/reduction-10.c
new file mode 100644
index 00000000000..3716e6f3c49
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-10.c
@@ -0,0 +1,60 @@ 
+/* { dg-additional-options "-fdump-tree-gimple" } */
+/* float array reductions.  */
+
+#define n 1000
+
+int
+main(void)
+{
+  int i, j;
+  float result[n], array[n];
+  int lresult[n];
+
+  /* '+' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (+:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] += array[i];
+
+  /* '*' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (*:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] *= array[i];
+
+  /* 'max' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (max:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] = result[j] > array[i] ? result[j] : array[i];
+
+  /* 'min' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (min:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] = result[j] < array[i] ? result[j] : array[i];
+
+  /* '&&' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (&&:lresult)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      lresult[j] = lresult[j] && (result[j] > array[i]);
+
+  /* '||' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (||:lresult)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      lresult[j] = lresult[j] || (result[j] > array[i]);
+
+  return 0;
+}
+
+/* Check that default copy maps are generated for loop reductions.  */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:result \\\[len: \[0-9\]+\\\]\\)" 6 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:lresult \\\[len: \[0-9\]+\\\]\\)" 2 "gimple" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-11.c b/gcc/testsuite/c-c++-common/goacc/reduction-11.c
new file mode 100644
index 00000000000..3e3af1a27ed
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-11.c
@@ -0,0 +1,60 @@ 
+/* { dg-additional-options "-fdump-tree-gimple" } */
+/* double array reductions.  */
+
+#define n 1000
+
+int
+main(void)
+{
+  int i, j;
+  double result[n], array[n];
+  int lresult[n];
+
+  /* '+' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (+:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] += array[i];
+
+  /* '*' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (*:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] *= array[i];
+
+  /* 'max' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (max:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] = result[j] > array[i] ? result[j] : array[i];
+
+  /* 'min' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (min:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] = result[j] < array[i] ? result[j] : array[i];
+
+  /* '&&' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (&&:lresult)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      lresult[j] = lresult[j] && (result[j] > array[i]);
+
+  /* '||' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (||:lresult)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      lresult[j] = lresult[j] || (result[j] > array[i]);
+
+  return 0;
+}
+
+/* Check that default copy maps are generated for loop reductions.  */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:result \\\[len: \[0-9\]+\\\]\\)" 6 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:lresult \\\[len: \[0-9\]+\\\]\\)" 2 "gimple" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-12.c b/gcc/testsuite/c-c++-common/goacc/reduction-12.c
new file mode 100644
index 00000000000..39571abfa1b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-12.c
@@ -0,0 +1,46 @@ 
+/* { dg-additional-options "-fdump-tree-gimple" } */
+/* complex array reductions.  */
+
+#define n 1000
+
+int
+main(void)
+{
+  int i, j;
+  __complex__ double result[n], array[n];
+  int lresult[n];
+
+  /* '+' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (+:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] += array[i];
+
+  /* '*' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (*:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] *= array[i];
+
+  /* '&&' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (&&:lresult)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      lresult[j] = lresult[j] && (__real__(result[j]) > __real__(array[i]));
+
+  /* '||' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (||:lresult[j])
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      lresult[j] = lresult[j] || (__real__(result[j]) > __real__(array[i]));
+
+  return 0;
+}
+
+/* Check that default copy maps are generated for loop reductions.  */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:result \\\[len: \[0-9\]+\\\]\\)" 4 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:lresult \\\[len: \[0-9\]+\\\]\\)" 2 "gimple" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-13.c b/gcc/testsuite/c-c++-common/goacc/reduction-13.c
new file mode 100644
index 00000000000..1d241bba18d
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-13.c
@@ -0,0 +1,51 @@ 
+/* { dg-additional-options "-fdump-tree-gimple" } */
+/* struct reductions.  */
+
+typedef struct { int x, y; } int_pair;
+typedef struct { float m, n; } flt_pair;
+typedef struct
+{
+  int i;
+  double d;
+  float f;
+  int a[4];
+  int_pair ip;
+  flt_pair fp;
+} rectype;
+
+#define n 1000
+
+int
+main(void)
+{
+  int i;
+  rectype result, array[n];
+
+  /* '+' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (+:result)
+  for (i = 0; i < n; i++)
+    {
+      result.i += array[i].i;
+      result.f += array[i].f;
+      result.ip.x += array[i].ip.x;
+      result.ip.y += array[i].ip.y;
+    }
+
+  /* '*' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (*:result)
+  for (i = 0; i < n; i++)
+    {
+      result.i *= array[i].i;
+      result.f *= array[i].f;
+      result.ip.x *= array[i].ip.x;
+      result.ip.y *= array[i].ip.y;
+    }
+
+  return 0;
+}
+
+/* Check that default copy maps are generated for loop reductions.  */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:result \\\[len: \[0-9\]+\\\]\\)" 2 "gimple" } } */
+
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-9.c b/gcc/testsuite/c-c++-common/goacc/reduction-9.c
new file mode 100644
index 00000000000..04be548814c
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-9.c
@@ -0,0 +1,81 @@ 
+/* { dg-additional-options "-fdump-tree-gimple" } */
+/* Integer array reductions.  */
+
+#define n 1000
+
+int
+main(void)
+{
+  int i, j;
+  int result[n], array[n];
+  int lresult[n];
+
+  /* '+' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (+:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] += array[i];
+
+  /* '*' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (*:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] *= array[i];
+
+  /* 'max' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (max:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] = result[j] > array[i] ? result[j] : array[i];
+
+  /* 'min' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (min:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] = result[j] < array[i] ? result[j] : array[i];
+
+  /* '&' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (&:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] &= array[i];
+
+  /* '|' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (|:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] |= array[i];
+
+  /* '^' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (^:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] ^= array[i];
+
+  /* '&&' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (&&:lresult)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      lresult[j] = lresult[j] && (result[j] > array[i]);
+
+  /* '||' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (||:lresult)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      lresult[j] = lresult[j] || (result[j] > array[i]);
+
+  return 0;
+}
+
+/* Check that default copy maps are generated for loop reductions.  */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:result \\\[len: \[0-9\]+\\\]\\)" 9 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:lresult \\\[len: \[0-9\]+\\\]\\)" 2 "gimple" } } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-arrays-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-arrays-1.c
new file mode 100644
index 00000000000..6f1b86a32a7
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-arrays-1.c
@@ -0,0 +1,69 @@ 
+/* { dg-do run } */
+
+/* Array reductions.  */
+
+#include <stdlib.h>
+#include "reduction.h"
+
+#define ng 8
+#define nw 4
+#define vl 32
+
+#define N 10
+
+#define check_reduction_array_op_all(type, opr, init, b)	\
+  check_reduction_xxx_xx_all(array, op, type, opr, init, b)
+#define check_reduction_arraysec_op_all(type, opr, init, b)	\
+  check_reduction_xxx_xx_all(arraysec, op, type, opr, init, b)
+#define check_reduction_array_macro_all(type, opr, init, b)	\
+  check_reduction_xxx_xx_all(array, macro, type, opr, init, b)
+#define check_reduction_arraysec_macro_all(type, opr, init, b)	\
+  check_reduction_xxx_xx_all(arraysec, macro, type, opr, init, b)
+    
+int
+main (void)
+{
+  const int n = 100;
+  int ints[n];
+  float flts[n];
+  double dbls[n];
+  int cmp_val = 5;
+
+  for (int i = 0; i < n; i++)
+    {
+      ints[i] = i + 1;
+      flts[i] = i + 1;
+      dbls[i] = i + 1;
+    }
+
+  check_reduction_array_op_all (int, +, 0, ints[i]);
+  check_reduction_array_op_all (int, *, 1, ints[i]);
+  check_reduction_array_op_all (int, &, -1, ints[i]);
+  check_reduction_array_op_all (int, |, 0, ints[i]);
+  check_reduction_array_op_all (int, ^, 0, ints[i]);
+  check_reduction_array_op_all (int, &&, 1, (cmp_val > ints[i]));
+  check_reduction_array_op_all (int, ||, 0, (cmp_val > ints[i]));
+  check_reduction_array_macro_all (int, min, n + 1, ints[i]);
+  check_reduction_array_macro_all (int, max, -1, ints[i]);
+
+  check_reduction_array_op_all (float, +, 0, flts[i]);
+  check_reduction_array_op_all (float, *, 1, flts[i]);
+  check_reduction_array_macro_all (float, min, n + 1, flts[i]);
+  check_reduction_array_macro_all (float, max, -1, flts[i]);
+
+  check_reduction_arraysec_op_all (int, +, 0, ints[i]);
+  check_reduction_arraysec_op_all (float, *, 1, flts[i]);
+  check_reduction_arraysec_macro_all (double, min, n + 1, dbls[i]);
+  check_reduction_arraysec_macro_all (double, max, -1, dbls[i]);
+
+  check_reduction_array_op_all (double, +, 0, dbls[i]);
+#if 0
+  /* Currently fails due to unclear issue, presumably unrelated to reduction
+     mechanics. Avoiding for now.  */
+  check_reduction_array_op_all (double, *, 1.0, dbls[i]);
+#endif
+  check_reduction_array_macro_all (double, min, n + 1, dbls[i]);
+  check_reduction_array_macro_all (double, max, -1, dbls[i]);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-structs-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-structs-1.c
new file mode 100644
index 00000000000..22216ff3008
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-structs-1.c
@@ -0,0 +1,121 @@ 
+/* { dg-do run } */
+
+/* Struct reductions.  */
+
+#include <stdlib.h>
+#include "reduction.h"
+
+#define ng 8
+#define nw 4
+#define vl 32
+
+#define N 10
+
+typedef struct { int x, y; } int_pair;
+typedef struct { float m, n; } flt_pair;
+typedef struct
+{
+  int i;
+  double d;
+  float f;
+  int a[N];
+  int_pair ip;
+  flt_pair fp;
+} rectype;
+
+static void
+init_struct (rectype *rec, int val)
+{
+  rec->i = val;
+  rec->d = (double) val;
+  rec->f = (float) val;
+  for (int i = 0; i < N; i++)
+    rec->a[i] = val;
+  rec->ip.x = val;
+  rec->ip.y = val;
+  rec->fp.m = (float) val;
+  rec->fp.n = (float) val;
+}
+
+static int
+struct_eq (rectype *a, rectype *b)
+{
+  if (a->i != b->i || a->d != b->d
+      || a->f != b->f
+      || a->ip.x != b->ip.x
+      || a->ip.y != b->ip.y
+      || a->fp.m != b->fp.m
+      || a->fp.n != b->fp.n)
+    return 0;
+
+  for (int i = 0; i < N; i++)
+    if (a->a[i] != b->a[i])
+      return 0;
+  return 1;
+}
+
+#define check_reduction_struct_xx(type, op, init, b, gwv_par, gwv_loop, apply) \
+  {									\
+    type res, vres;							\
+    init_struct (&res, init);						\
+    DO_PRAGMA (acc parallel gwv_par copy(res))				\
+    DO_PRAGMA (acc loop gwv_loop reduction (op:res))			\
+    for (int i = 0; i < n; i++)						\
+      {									\
+	res.i = apply (op, res.i, b);					\
+	res.d = apply (op, res.d, b);					\
+	res.f = apply (op, res.f, b);					\
+	for (int j = 0; j < N; j++)					\
+	  res.a[j] = apply (op, res.a[j], b);				\
+	res.ip.x = apply (op, res.ip.x, b);				\
+	res.ip.y = apply (op, res.ip.y, b);				\
+	res.fp.m = apply (op, res.fp.m, b);				\
+	res.fp.n = apply (op, res.fp.n, b);				\
+      }									\
+									\
+    init_struct (&vres, init);						\
+    for (int i = 0; i < n; i++)						\
+      {									\
+        vres.i = apply (op, vres.i, b);					\
+	vres.d = apply (op, vres.d, b);					\
+	vres.f = apply (op, vres.f, b);					\
+	for (int j = 0; j < N; j++)					\
+	  vres.a[j] = apply (op, vres.a[j], b);				\
+	vres.ip.x = apply (op, vres.ip.x, b);				\
+	vres.ip.y = apply (op, vres.ip.y, b);				\
+	vres.fp.m = apply (op, vres.fp.m, b);				\
+	vres.fp.n = apply (op, vres.fp.n, b);				\
+      }									\
+									\
+    if (!struct_eq (&res, &vres))					\
+      __builtin_abort ();						\
+  }
+
+#define operator_apply(op, a, b) (a op b)
+#define check_reduction_struct_op(type, op, init, b, gwv_par, gwv_loop)	\
+  check_reduction_struct_xx(type, op, init, b, gwv_par, gwv_loop, operator_apply)
+
+#define function_apply(op, a, b) (op (a, b))
+#define check_reduction_struct_macro(type, op, init, b, gwv_par, gwv_loop) \
+  check_reduction_struct_xx(type, op, init, b, gwv_par, gwv_loop, function_apply)
+
+#define check_reduction_struct_op_all(type, opr, init, b)	\
+  check_reduction_xxx_xx_all (struct, op, type, opr, init, b)
+#define check_reduction_struct_macro_all(type, opr, init, b)		\
+  check_reduction_xxx_xx_all (struct, macro, type, opr, init, b)
+
+int
+main (void)
+{
+  const int n = 10;
+  int ints[n];
+
+  for (int i = 0; i < n; i++)
+    ints[i] = i + 1;
+
+  check_reduction_struct_op_all (rectype, +, 0, ints[i]);
+  check_reduction_struct_op_all (rectype, *, 1, ints[i]);
+  check_reduction_struct_macro_all (rectype, min, n + 1, ints[i]);
+  check_reduction_struct_macro_all (rectype, max, -1, ints[i]);
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction.h b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction.h
index 1b3f8d45ace..c928578eeea 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction.h
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction.h
@@ -37,6 +37,58 @@  DO_PRAGMA (acc loop gwv_loop reduction (op:res))			\
       abort ();								\
   }
 
+#define check_reduction_array_xx(type, var, var_in_clause, op, init, b, \
+				 gwv_par, gwv_loop, apply)		\
+  {									\
+   type var[N], var ## _check[N];					\
+   for (int i = 0; i < N; i++)						\
+     var[i] = var ## _check[i] = (init);				\
+   DO_PRAGMA (acc parallel gwv_par copy (var_in_clause))		\
+   DO_PRAGMA (acc loop gwv_loop reduction (op: var_in_clause))		\
+   for (int i = 0; i < n; i++)						\
+     for (int j = 0; j < N; j++)					\
+       var[j] = apply (op, var[j], (b));				\
+									\
+   for (int i = 0; i < n; i++)						\
+     for (int j = 0; j < N; j++)					\
+       var ## _check[j] = apply (op, var ## _check[j], (b));		\
+									\
+   for (int j = 0; j < N; j++)						\
+     if (var[j] != var ## _check[j])					\
+       abort ();							\
+  }
+
+#define operator_apply(op, a, b) (a op b)
+#define check_reduction_array_op(type, op, init, b, gwv_par, gwv_loop)	\
+  check_reduction_array_xx (type, v, v, op, init, b, gwv_par, gwv_loop,	\
+			    operator_apply)
+#define check_reduction_arraysec_op(type, op, init, b, gwv_par, gwv_loop) \
+  check_reduction_array_xx (type, v, v[:N], op, init, b, gwv_par, gwv_loop, \
+			    operator_apply)
+
+
+#define function_apply(op, a, b) (op (a, b))
+#define check_reduction_array_macro(type, op, init, b, gwv_par, gwv_loop)\
+  check_reduction_array_xx (type, v, v, op, init, b, gwv_par, gwv_loop,	\
+			    function_apply)
+#define check_reduction_arraysec_macro(type, op, init, b, gwv_par, gwv_loop)\
+  check_reduction_array_xx (type, v, v[:N], op, init, b, gwv_par, gwv_loop, \
+			    function_apply)
+
+#define check_reduction_xxx_xx_all(tclass, form, type, op, init, b)	\
+  check_reduction_ ## tclass ## _ ## form (type, op, init, b, num_gangs (ng), gang);	\
+  check_reduction_ ## tclass ## _ ## form (type, op, init, b, num_workers (nw), worker); \
+  check_reduction_ ## tclass ## _ ## form (type, op, init, b, vector_length (vl), vector); \
+  check_reduction_ ## tclass ## _ ## form (type, op, init, b,			\
+					   num_gangs (ng) num_workers (nw), gang worker); \
+  check_reduction_ ## tclass ## _ ## form (type, op, init, b,			\
+					   num_gangs (ng) vector_length (vl), gang vector); \
+  check_reduction_ ## tclass ## _ ## form (type, op, init, b,			\
+					   num_workers (nw) vector_length (vl), worker vector); \
+  check_reduction_ ## tclass ## _ ## form (type, op, init, b, \
+					   num_gangs (ng) num_workers (nw) vector_length (vl), \
+					   gang worker vector);
+
 #define max(a, b) (((a) > (b)) ? (a) : (b))
 #define min(a, b) (((a) < (b)) ? (a) : (b))