[1/3] OpenMP: C support for imperfectly-nested loops

Message ID 20230428232254.628185-2-sandra@codesourcery.com
State Accepted
Headers
Series OpenMP: Support imperfectly-nested loops |

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

Sandra Loosemore April 28, 2023, 11:22 p.m. UTC
  OpenMP 5.0 removed the restriction that multiple collapsed loops must
be perfectly nested, allowing "intervening code" (including nested
BLOCKs) before or after each nested loop.  In GCC this code is moved
into the inner loop body by the respective front ends.

This patch changes the C front end to use recursive descent parsing
on nested loops within an "omp for" construct, rather than an iterative
approach, in order to preserve proper nesting of compound statements.

gcc/c/ChangeLog
	* c-parser.cc (struct c_parser): Add omp_for_parse_state field.
	(struct omp_for_parse_data): New.
	(c_parser_compound_statement_nostart): Special-case nested
	OMP loops and blocks in intervening code.
	(c_parser_while_statement): Reject in intervening code.
	(c_parser_do_statement): Likewise.
	(c_parser_for_statement): Likewise.
	(c_parser_postfix_expression_after_primary): Reject calls to OMP
	runtime routines in intervening code.
	(c_parser_pragma): Reject OMP pragmas in intervening code.
	(c_parser_omp_loop_nest): New, split from c_parser_omp_for_loop.
	(c_parser_omp_for_loop): Rewrite to use recursive descent and
	generalize handling for intervening code.

gcc/ChangeLog
	* omp-api.h: New file.
	* omp-general.cc (omp_runtime_api_procname): New.
	(omp_runtime_api_call): Moved here from omp-low.cc, and make
	non-static.
	* omp-general.h: Include omp-api.h.
	* omp-low.cc (omp_runtime_api_call): Delete this copy.

gcc/testsuite/ChangeLog
	* c-c++-common/goacc/collapse-1.c: Adjust expected error messages.
	* c-c++-common/goacc/tile-2.c: Likewise.
	* c-c++-common/gomp/imperfect1.c: New.
	* c-c++-common/gomp/imperfect2.c: New.
	* c-c++-common/gomp/imperfect3.c: New.
	* c-c++-common/gomp/imperfect4.c: New.
	* c-c++-common/gomp/imperfect5.c: New.
	* gcc.dg/gomp/collapse-1.c: Adjust expected error messages.

libgomp/ChangeLog
	* testsuite/libgomp.c-c++-common/imperfect1.c: New.
	* testsuite/libgomp.c-c++-common/imperfect2.c: New.
	* testsuite/libgomp.c-c++-common/imperfect3.c: New.
	* testsuite/libgomp.c-c++-common/imperfect4.c: New.
	* testsuite/libgomp.c-c++-common/imperfect5.c: New.
	* testsuite/libgomp.c-c++-common/imperfect6.c: New.
	* testsuite/libgomp.c-c++-common/offload-imperfect1.c: New.
	* testsuite/libgomp.c-c++-common/offload-imperfect2.c: New.
	* testsuite/libgomp.c-c++-common/offload-imperfect3.c: New.
	* testsuite/libgomp.c-c++-common/offload-imperfect4.c: New.
---
 gcc/c/c-parser.cc                             | 692 +++++++++++-------
 gcc/omp-api.h                                 |  32 +
 gcc/omp-general.cc                            | 134 ++++
 gcc/omp-general.h                             |   1 +
 gcc/omp-low.cc                                | 129 ----
 gcc/testsuite/c-c++-common/goacc/collapse-1.c |  14 +-
 gcc/testsuite/c-c++-common/goacc/tile-2.c     |   4 +-
 gcc/testsuite/c-c++-common/gomp/imperfect1.c  |  40 +
 gcc/testsuite/c-c++-common/gomp/imperfect2.c  |  36 +
 gcc/testsuite/c-c++-common/gomp/imperfect3.c  |  35 +
 gcc/testsuite/c-c++-common/gomp/imperfect4.c  |  35 +
 gcc/testsuite/c-c++-common/gomp/imperfect5.c  |  59 ++
 gcc/testsuite/gcc.dg/gomp/collapse-1.c        |  10 +-
 .../libgomp.c-c++-common/imperfect1.c         |  76 ++
 .../libgomp.c-c++-common/imperfect2.c         | 114 +++
 .../libgomp.c-c++-common/imperfect3.c         | 119 +++
 .../libgomp.c-c++-common/imperfect4.c         | 117 +++
 .../libgomp.c-c++-common/imperfect5.c         |  49 ++
 .../libgomp.c-c++-common/imperfect6.c         | 115 +++
 .../libgomp.c-c++-common/offload-imperfect1.c |  81 ++
 .../libgomp.c-c++-common/offload-imperfect2.c | 122 +++
 .../libgomp.c-c++-common/offload-imperfect3.c | 125 ++++
 .../libgomp.c-c++-common/offload-imperfect4.c | 122 +++
 23 files changed, 1870 insertions(+), 391 deletions(-)
 create mode 100644 gcc/omp-api.h
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect3.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect4.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect5.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/imperfect1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/imperfect2.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/imperfect3.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/imperfect4.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/imperfect5.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/imperfect6.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/offload-imperfect1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/offload-imperfect2.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/offload-imperfect3.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/offload-imperfect4.c
  

Comments

Jakub Jelinek May 25, 2023, 10 a.m. UTC | #1
On Fri, Apr 28, 2023 at 05:22:52PM -0600, Sandra Loosemore wrote:
> OpenMP 5.0 removed the restriction that multiple collapsed loops must
> be perfectly nested, allowing "intervening code" (including nested
> BLOCKs) before or after each nested loop.  In GCC this code is moved
> into the inner loop body by the respective front ends.
> 
> This patch changes the C front end to use recursive descent parsing
> on nested loops within an "omp for" construct, rather than an iterative
> approach, in order to preserve proper nesting of compound statements.
> 
> gcc/c/ChangeLog
> 	* c-parser.cc (struct c_parser): Add omp_for_parse_state field.
> 	(struct omp_for_parse_data): New.
> 	(c_parser_compound_statement_nostart): Special-case nested
> 	OMP loops and blocks in intervening code.
> 	(c_parser_while_statement): Reject in intervening code.
> 	(c_parser_do_statement): Likewise.
> 	(c_parser_for_statement): Likewise.
> 	(c_parser_postfix_expression_after_primary): Reject calls to OMP
> 	runtime routines in intervening code.
> 	(c_parser_pragma): Reject OMP pragmas in intervening code.
> 	(c_parser_omp_loop_nest): New, split from c_parser_omp_for_loop.
> 	(c_parser_omp_for_loop): Rewrite to use recursive descent and
> 	generalize handling for intervening code.
> 
> gcc/ChangeLog
> 	* omp-api.h: New file.

Why?  Just add those to omp-general.h.

> 	* omp-general.cc (omp_runtime_api_procname): New.
> 	(omp_runtime_api_call): Moved here from omp-low.cc, and make
> 	non-static.
> 	* omp-general.h: Include omp-api.h.
> 	* omp-low.cc (omp_runtime_api_call): Delete this copy.
> 
> gcc/testsuite/ChangeLog
> 	* c-c++-common/goacc/collapse-1.c: Adjust expected error messages.
> 	* c-c++-common/goacc/tile-2.c: Likewise.
> 	* c-c++-common/gomp/imperfect1.c: New.
> 	* c-c++-common/gomp/imperfect2.c: New.
> 	* c-c++-common/gomp/imperfect3.c: New.
> 	* c-c++-common/gomp/imperfect4.c: New.
> 	* c-c++-common/gomp/imperfect5.c: New.
> 	* gcc.dg/gomp/collapse-1.c: Adjust expected error messages.
> 
> libgomp/ChangeLog
> 	* testsuite/libgomp.c-c++-common/imperfect1.c: New.
> 	* testsuite/libgomp.c-c++-common/imperfect2.c: New.
> 	* testsuite/libgomp.c-c++-common/imperfect3.c: New.
> 	* testsuite/libgomp.c-c++-common/imperfect4.c: New.
> 	* testsuite/libgomp.c-c++-common/imperfect5.c: New.
> 	* testsuite/libgomp.c-c++-common/imperfect6.c: New.
> 	* testsuite/libgomp.c-c++-common/offload-imperfect1.c: New.
> 	* testsuite/libgomp.c-c++-common/offload-imperfect2.c: New.
> 	* testsuite/libgomp.c-c++-common/offload-imperfect3.c: New.
> 	* testsuite/libgomp.c-c++-common/offload-imperfect4.c: New.

If the 3 patches are going to be committed separately (which I think is a
good idea), then the *c-c++-common* tests are a problem, because the tests
will then fail after the C FE part is committed before the C++ FE part is
committed.
For the new tests there are 2 options, one is commit them in the C patch
with /* { dg-do run { target c } } */ instead of just
/* { dg-do run } */ etc. and then in the second patch remove those
" { target c }" parts, or commit them in the second patch only.
For the existing tests with adjustments, do the { target c } vs.
{ target c++ } games and tweak in the second patch.

The offload-imperfect* tests should be called target-imperfect* I think,
for consistency with other tests.

In the gcc/testsuite/c-c++-common/gomp/ tests I miss some coverage for
the boundary cases what is and isn't intervening code.
Before your changes, we were allowing multiple levels of {}s,
so
#pragma omp for ordered(2)
for (int i = 0; i < 64; i++)
  {
    {
      {
        for (int j = 0; j < 64; j++)
          ;
      }
    }
  }
which is valid in 5.0 (but should be tested in the testsuite), but also
empty statements, which when reading the 5.1/5.2 spec don't actually seem to
be valid.
#pragma omp for ordered(2)
for (int i = 0; i < 64; i++)
  {
    ;
    ;
    ;
    for (int j = 0; j < 64; j++)
      ;
    ;
  }
because even the empty statement is I think intervening code according to
the grammar.

Another thing I don't really see covered in the testsuite nor in the code
is if some variable declared in intervening code is then used in the inner
loop's init/cond/incr expressions.  I mean something like:
#pragma omp for collapse(2)
for (int i = 0; i < 64; i++)
  {
    int v = (i + 4) * 2;
    for (int j = v; j < 64; j++)
      ;
  }
That just ICEs with your patch, we should diagnose it as invalid.
In the canonical loop form requirements the standard requires that the
expressions are loop invariant with the exceptions of specific cases
allowed for non-rectangular loops.  So, above I'm sure that is violated.
Another case would be
#pragma omp for collapse(2)
for (int i = 0; i < 64; i++)
  {
    int v = (i + 4);
    for (int j = v; j < 64; j++)
      ;
  }
but here it isn't invariant either, although for (int j = i + 4; j < 64; j++)
would be valid non-rectangular loop I think it requires the exact syntax.
Yet another case is
#pragma omp for collapse(2)
for (int i = 0; i < 64; i++)
  {
    int v = 8;
    for (int j = v; j < 64; j++)
      ;
  }
This actually is loop invariant (the value) and not really sure what would
in the standard prevent it from being valid, but it doesn't feel right and
can't be handled easily, because one should be able to compute the number
of iterations before the loop where v isn't in scope or there could be a
different v in scope.
And/or for C++
#pragma omp for collapse(2)
for (int i = 0; i < 64; i++)
  {
    const int v = 8; // or even constexpr
    for (int j = v; j < 64; j++)
      ;
  }
where already during parsing we won't really know we are actually using v
if it isn't ODR used.  Or unless it is in a template and the const/constexpr
variable is value or type dependent.
I think we want to discuss this in omp-lang.

> @@ -1525,6 +1529,23 @@ struct oacc_routine_data {
>  /* Used for parsing objc foreach statements.  */
>  static tree objc_foreach_break_label, objc_foreach_continue_label;
>  
> +/* Used for parsing OMP for loops.  See c_parser_omp_for_loop.  */
> +struct omp_for_parse_data {
> +  tree declv, condv, incrv, initv;
> +  tree pre_body;
> +  tree bindings;
> +  int collapse;

I think it is confusing to call this collapse when it isn't collapse
but ordered ? ordered : collapse.
Call it count like in c_parser_omp_for_loop?

> @@ -6123,6 +6145,15 @@ c_parser_compound_statement_nostart (c_parser *parser)
>    bool last_label = false;
>    bool save_valid_for_pragma = valid_location_for_stdc_pragma_p ();
>    location_t label_loc = UNKNOWN_LOCATION;  /* Quiet warning.  */
> +  struct omp_for_parse_data *omp_for_parse_state;
> +
> +  if (parser->omp_for_parse_state
> +      && (parser->omp_for_parse_state->depth
> +	  < parser->omp_for_parse_state->collapse - 1))
> +    omp_for_parse_state = parser->omp_for_parse_state;
> +  else
> +    omp_for_parse_state = NULL;
> +
>    if (c_parser_next_token_is (parser, CPP_CLOSE_BRACE))
>      {
>        location_t endloc = c_parser_peek_token (parser)->location;

So, here we skip over various cases that need to be covered in the testcases
and decided what to do about.
The first one is:
#pragma omp for collapse(2)
for (int i = 0; i < 64; ++i)
  {
    __label__ a, b;
    goto a; a:;
    for (int j = 0; j < 64; ++j)
      ;
    goto b; b:;
  }
This is a GNU extension, do we want to allow it?
Conceptually it is like variable declarations which can appear in
intervening code, but we need to make sure it is handled properly
(the code registers those in the current scope using declare_label, so
supposedly we want to move those later to the body scope when moving there
other intervening code.  And probably it should be rejected when intervening
code is not allowed (+ test for that).

Next are C2X standard attributes, case XX:/default: (I think these would
violate the single entry single exit requirements of intervening code),
label: (that can be fine as long as it is only jumped to from within the
same intervening code), so again something for the testsuite:
int k = 0;
#pragma omp for collapse(2)
for (int i = 0; i < 64; ++i)
  {
    a: if (k) goto a;
    for (int j = 0; j < 64; ++j)
      ;
    b: if (k) goto b;
  }
Then __extension__, again something that should be considered whether it
acts as intervening code, or is even allowed as if it wasn't intervening
code + testsuite coverage.
Then pragmas which you handle in c_parser_pragma already (but will need some
tweaks for tile/unroll from Frederik).
else is an error, so nothing needs to be done about it.

> @@ -7138,6 +7223,16 @@ c_parser_while_statement (c_parser *parser, bool ivdep, unsigned short unroll,
>    gcc_assert (c_parser_next_token_is_keyword (parser, RID_WHILE));
>    token_indent_info while_tinfo
>      = get_token_indent_info (c_parser_peek_token (parser));
> +
> +  if (parser->omp_for_parse_state
> +      && (parser->omp_for_parse_state->depth
> +	  < parser->omp_for_parse_state->collapse - 1))
> +    {
> +      error_at (c_parser_peek_token (parser)->location,
> +		"loop not permitted in intervening code in OMP loop body");

Please use OpenMP in diagnostics rather than OMP (multiple times).
Also, the while and do cases are not covered in the testsuite, they should
be (there is just test for for in there).
Would be nice to also test it nested in other compound statements, like
#pragma omp for collapse(2)
for (int i = 0; i < 64; ++i)
  {
    if (1)
      {
	for (int j = 0; j < 64; ++j)	// { dg-error "..." }
	  ;
	while (0) ;			// { dg-error "..." }
	do { } while (0);		// { dg-error "..." }
      }
    for (int k = 0; k < 64; ++k)
      ;
  }
I guess especially the do { ... } while (0); limitation could be quite
harmful, that is something used heavily in various macros.
So perhaps for OpenMP 6.0 we should consider there rejecting just
for statements for C/C++?  And/or also allow for loops if inside of some
substatement of say if/do/while/switch, those are also clearly
distinguishable from the main loop.  On the other side, I've never been
really excited by this imperfectly nested loops mess making it into the
standard.

> +      parser->omp_for_parse_state->fail = true;
> +    }
> +
>    c_parser_consume_token (parser);
>    block = c_begin_compound_stmt (flag_isoc99);
>    loc = c_parser_peek_token (parser)->location;
> @@ -11234,6 +11349,14 @@ c_parser_postfix_expression_after_primary (c_parser *parser,
>  		  && fndecl_built_in_p (expr.value, BUILT_IN_NORMAL)
>  		  && vec_safe_length (exprlist) == 1)
>  		warn_for_abs (expr_loc, expr.value, (*exprlist)[0]);
> +	      if (parser->omp_for_parse_state
> +		  && parser->omp_for_parse_state->in_intervening_code
> +		  && omp_runtime_api_call (expr.value))
> +		{
> +		  error_at (expr_loc, "Calls to the OpenMP runtime API are "

s/Calls/calls/, C/C++ FE/middle end diagnostics never start with capital
letter unless it is part of some shorthand/word always spelled with capital
letters.

> +				      "not permitted in intervening code");
> +		  parser->omp_for_parse_state->fail = true;
> +		}
>  	    }
>  
>  	  start = expr.get_start ();

> +  if (decl == NULL || decl == error_mark_node || init == error_mark_node)
> +    omp_for_parse_state->fail = true;
> +  else
> +    {
> +      TREE_VEC_ELT (omp_for_parse_state->declv, omp_for_parse_state->depth)
> +	= decl;
> +      TREE_VEC_ELT (omp_for_parse_state->initv, omp_for_parse_state->depth)
> +	= init;
> +      TREE_VEC_ELT (omp_for_parse_state->condv, omp_for_parse_state->depth)
> +	= cond;
> +      TREE_VEC_ELT (omp_for_parse_state->incrv, omp_for_parse_state->depth)
> +	= incr;

This would be more readable using a temporary:
      int depth = omp_for_parse_state->depth;
      TREE_VEC_ELT (omp_for_parse_state->declv, depth) = decl;
      TREE_VEC_ELT (omp_for_parse_state->initv, depth) = init;
      TREE_VEC_ELT (omp_for_parse_state->condv, depth) = cond;
      TREE_VEC_ELT (omp_for_parse_state->incrv, depth) = incr;

> +    }
> +
> +parse_next:
> +  omp_for_parse_state->want_nested_loop = true;
> +  moreloops = omp_for_parse_state->depth < omp_for_parse_state->collapse - 1;

Shouldn't omp_for_parse_state->want_nested_loop be initialized to moreloops
here?

> +  if (moreloops && c_parser_next_token_is_keyword (parser, RID_FOR))
> +    {
> +      omp_for_parse_state->depth++;
> +      body = c_parser_omp_loop_nest (parser, if_p);
> +      omp_for_parse_state->depth--;
> +    }
> +  else if (moreloops && c_parser_next_token_is (parser, CPP_OPEN_BRACE))
> +    {
> +      /* This is the open brace in the loop-body grammar production.  Rather
> +	 than trying to special-case braces, just parse it as a compound
> +	 statement and handle the nested loop-body case there.  Note that
> +	 when we see a further open brace inside the compound statement
> +	 loop-body, we don't know whether it is the start of intervening
> +	 code that is a compound statement, or a level of braces
> +	 surrounding a nested loop-body.  Use the WANT_NESTED_LOOP state
> +	 bit to ensure we have only one nested loop at each level.  */
> +      omp_for_parse_state->in_intervening_code = true;
> +      body = c_parser_compound_statement (parser, NULL);
> +      omp_for_parse_state->in_intervening_code = false;
> +      if (omp_for_parse_state->want_nested_loop)
> +	{
> +	  /* We have already parsed the whole loop body and not found a
> +	     nested loop.  */
> +	  error_at (omp_for_parse_state->for_loc,
> +		    "not enough nested loops");
> +	  omp_for_parse_state->fail = true;
> +	}
> +      if_p = NULL;
> +    }
> +  else
> +    {
> +      /* This is the final-loop-body case in the grammar: we have
> +	 something that is not a FOR and not an open brace.  */
> +      if (moreloops)
> +	{
> +	  /* If we were expecting a nested loop, give an error and mark
> +	     that parsing has failed, and try to recover by parsing the
> +	     body as regular code without further collapsing.  */
> +	  error_at (omp_for_parse_state->for_loc,
> +		    "not enough nested loops");
> +	  omp_for_parse_state->fail = true;
> +	}
> +      in_statement = IN_OMP_FOR;

And/or temporarily clear
parser->omp_for_parse_state here and reset it back afterwards?
Then
  if (parser->omp_for_parse_state
      && (parser->omp_for_parse_state->depth
          < parser->omp_for_parse_state->collapse - 1))
    omp_for_parse_state = parser->omp_for_parse_state;
  else
    omp_for_parse_state = NULL;
etc. wouldn't be really needed, if parser->omp_for_parse_state would
be non-NULL, we'd want to use it without further checks.

> +      body = push_stmt_list ();
> +      if (omp_for_parse_state->inscan)
> +	c_parser_omp_scan_loop_body (parser, false);
> +      else
> +	add_stmt (c_parser_c99_block_statement (parser, if_p));
> +      body = pop_stmt_list (body);
> +    }
> +  in_statement = save_in_statement;
> +  omp_for_parse_state->want_nested_loop = false;
> +  omp_for_parse_state->in_intervening_code = true;
> +
> +  /* Pop and return the implicit scope surrounding this level of loop.
> +     Any iteration variable bound in loop_scope is pulled out and later
> +     will be added to the scope surrounding the entire OMP_FOR.  That
> +     keeps the gimplifier happy later on, and meanwhile we have already
> +     resolved all references to the iteration variable in its true scope.  */
> +  add_stmt (body);
> +  body = c_end_compound_stmt (loc, loop_scope, true);
> +  if (decl && TREE_CODE (body) == BIND_EXPR)

So, this moves just the iteration var if declared in the for loop and nothing else?

> +    {
> +      tree t = BIND_EXPR_VARS (body);
> +      tree prev = NULL_TREE, next = NULL_TREE;
> +      while (t)
> +	{
> +	  next = DECL_CHAIN (t);
> +	  if (t == decl)
> +	    {
> +	      if (prev)
> +		DECL_CHAIN (prev) = next;
> +	      else
> +		{
> +		  BIND_EXPR_VARS (body) = next;
> +		  BLOCK_VARS (BIND_EXPR_BLOCK (body)) = next;
> +		}
> +	      DECL_CHAIN (t) = omp_for_parse_state->bindings;
> +	      omp_for_parse_state->bindings = t;
> +	      break;
> +	    }
> +	  else
> +	    {
> +	      prev = t;
> +	      t = next;
> +	    }
> +	}
> +      if (BIND_EXPR_VARS (body) == NULL_TREE)
> +	body = BIND_EXPR_BODY (body);
> +    }
> +
> +  return body;
> +}
> +
>  /* Parse the restricted form of loop statements allowed by OpenACC and OpenMP.
>     The real trick here is to determine the loop control variable early
>     so that we can push a new decl if necessary to make it private.

> +#pragma omp for ordered(3)
> +  for (i = 0; i < a1; i++)  /* { dg-error "inner loops must be perfectly nested" } */
> +    {
> +      f1 (0, i);
> +      for (j = 0; j < a2; j++)
> +	{
> +	  f1 (1, j);
> +	  for (k = 0; k < a3; k++)
> +	    {
> +	      f1 (2, k);
> +	      f2 (2, k);

Would be good to stick here #pragma omp ordered doacross(source) and sink,
just to make it valid except for the intervening code.

> +	    }
> +	  f2 (1, j);
> +	}
> +      f2 (0, i);
> +    }
> +}
> +
> diff --git a/gcc/testsuite/c-c++-common/gomp/imperfect4.c b/gcc/testsuite/c-c++-common/gomp/imperfect4.c
> new file mode 100644
> index 00000000000..e5feff730a9
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/gomp/imperfect4.c
> @@ -0,0 +1,35 @@
> +/* { dg-do compile } */
> +
> +/* This test case is expected to fail due to errors.  */
> +
> +static int f1count[3], f2count[3];

Don't declare variables you don't use at all in the test (in multiple
tests).

> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/gomp/imperfect5.c
> @@ -0,0 +1,59 @@
> +/* { dg-do compile } */

I think you'd better just copy some existing scan testcase and
just add intervening code to it.

> +
> +/* This test case is expected to fail due to errors.  */
> +
> +static int f1count[3], f2count[3];
> +
> +int f1 (int depth, int iter);
> +int f2 (int depth, int iter);
> +int ijk (int x, int y, int z);
> +void f3 (int sum);
> +
> +/* This function isn't particularly meaningful, but it should compile without
> +   error.  */
> +int s1 (int a1, int a2, int a3)
> +{
> +  int i, j, k;
> +  int r = 0;
> +
> +#pragma omp simd collapse(3) reduction (inscan, +:r)
> +  for (i = 0; i < a1; i++)
> +    {
> +      for (j = 0; j < a2; j++)
> +	{
> +	  for (k = 0; k < a3; k++)
> +	    {
> +	      r = r + ijk (i, j, k);
> +#pragma omp scan exclusive (r)
> +	      f3 (r);
> +	    }
> +	}
> +    }
> +  return r;
> +}
> +
> +/* Adding intervening code should trigger an error.  */
> +int s2 (int a1, int a2, int a3)
> +{
> +  int i, j, k;
> +  int r = 0;
> +
> +#pragma omp simd collapse(3) reduction (inscan, +:r)
> +  for (i = 0; i < a1; i++)  /* { dg-error "inner loops must be perfectly nested" } */
> +    {
> +      f1 (0, i);
> +      for (j = 0; j < a2; j++)
> +	{
> +	  f1 (1, j);
> +	  for (k = 0; k < a3; k++)
> +	    {
> +	      r = r + ijk (i, j, k);
> +#pragma omp scan exclusive (r)
> +	      f3 (r);
> +	    }
> +	  f2 (1, j);
> +	}
> +      f2 (0, i);
> +    }
> +  return r;
> +}

	Jakub
  
Sandra Loosemore June 14, 2023, 10:41 p.m. UTC | #2
High-order bit:  I've just committed OG13 version of these patches that is integrated with Frederik's previous loop transformation patches that are already on that branch.  The OG13 version incorporates many of the suggestions from this initial review plus a few bug fixes.  I've also made corresponding fixes to the mainline version but I've still got a lot of unfinished items, mostly related to additional tests for corner cases.

On 5/25/23 04:00, Jakub Jelinek wrote:
> On Fri, Apr 28, 2023 at 05:22:52PM -0600, Sandra Loosemore wrote:
>> OpenMP 5.0 removed the restriction that multiple collapsed loops must
>> be perfectly nested, allowing "intervening code" (including nested
>> BLOCKs) before or after each nested loop.  In GCC this code is moved
>> into the inner loop body by the respective front ends.
>>
>> This patch changes the C front end to use recursive descent parsing
>> on nested loops within an "omp for" construct, rather than an iterative
>> approach, in order to preserve proper nesting of compound statements.
>>
>> gcc/c/ChangeLog
>> 	* c-parser.cc (struct c_parser): Add omp_for_parse_state field.
>> 	(struct omp_for_parse_data): New.
>> 	(c_parser_compound_statement_nostart): Special-case nested
>> 	OMP loops and blocks in intervening code.
>> 	(c_parser_while_statement): Reject in intervening code.
>> 	(c_parser_do_statement): Likewise.
>> 	(c_parser_for_statement): Likewise.
>> 	(c_parser_postfix_expression_after_primary): Reject calls to OMP
>> 	runtime routines in intervening code.
>> 	(c_parser_pragma): Reject OMP pragmas in intervening code.
>> 	(c_parser_omp_loop_nest): New, split from c_parser_omp_for_loop.
>> 	(c_parser_omp_for_loop): Rewrite to use recursive descent and
>> 	generalize handling for intervening code.
>>
>> gcc/ChangeLog
>> 	* omp-api.h: New file.
> 
> Why?  Just add those to omp-general.h.

This is for the Fortran front end, which needs this stuff without everything else omp-general.h sucks in.  I remember that I initially did try to put it in omp-general.h but split it out when I ran into some trouble with that, and I thought it was an abstraction violation in the Fortran front end.  I didn't touch this for now; is it important enough that I should spend more time on it?

>> 	* omp-general.cc (omp_runtime_api_procname): New.
>> 	(omp_runtime_api_call): Moved here from omp-low.cc, and make
>> 	non-static.
>> 	* omp-general.h: Include omp-api.h.
>> 	* omp-low.cc (omp_runtime_api_call): Delete this copy.
>>
>> gcc/testsuite/ChangeLog
>> 	* c-c++-common/goacc/collapse-1.c: Adjust expected error messages.
>> 	* c-c++-common/goacc/tile-2.c: Likewise.
>> 	* c-c++-common/gomp/imperfect1.c: New.
>> 	* c-c++-common/gomp/imperfect2.c: New.
>> 	* c-c++-common/gomp/imperfect3.c: New.
>> 	* c-c++-common/gomp/imperfect4.c: New.
>> 	* c-c++-common/gomp/imperfect5.c: New.
>> 	* gcc.dg/gomp/collapse-1.c: Adjust expected error messages.
>>
>> libgomp/ChangeLog
>> 	* testsuite/libgomp.c-c++-common/imperfect1.c: New.
>> 	* testsuite/libgomp.c-c++-common/imperfect2.c: New.
>> 	* testsuite/libgomp.c-c++-common/imperfect3.c: New.
>> 	* testsuite/libgomp.c-c++-common/imperfect4.c: New.
>> 	* testsuite/libgomp.c-c++-common/imperfect5.c: New.
>> 	* testsuite/libgomp.c-c++-common/imperfect6.c: New.
>> 	* testsuite/libgomp.c-c++-common/offload-imperfect1.c: New.
>> 	* testsuite/libgomp.c-c++-common/offload-imperfect2.c: New.
>> 	* testsuite/libgomp.c-c++-common/offload-imperfect3.c: New.
>> 	* testsuite/libgomp.c-c++-common/offload-imperfect4.c: New.
> 
> If the 3 patches are going to be committed separately (which I think is a
> good idea), then the *c-c++-common* tests are a problem, because the tests
> will then fail after the C FE part is committed before the C++ FE part is
> committed.
> For the new tests there are 2 options, one is commit them in the C patch
> with /* { dg-do run { target c } } */ instead of just
> /* { dg-do run } */ etc. and then in the second patch remove those
> " { target c }" parts, or commit them in the second patch only.
> For the existing tests with adjustments, do the { target c } vs.
> { target c++ } games and tweak in the second patch.

OK, I've split the new c-c++-common tests into a separate commit, and done the other rigamarole to adjust the other test cases incrementally with each part of the series.

> The offload-imperfect* tests should be called target-imperfect* I think,
> for consistency with other tests.

Done.

> In the gcc/testsuite/c-c++-common/gomp/ tests I miss some coverage for
> the boundary cases what is and isn't intervening code.
> Before your changes, we were allowing multiple levels of {}s,
> so
> #pragma omp for ordered(2)
> for (int i = 0; i < 64; i++)
>    {
>      {
>        {
>          for (int j = 0; j < 64; j++)
>            ;
>        }
>      }
>    }
> which is valid in 5.0 (but should be tested in the testsuite), but also
> empty statements, which when reading the 5.1/5.2 spec don't actually seem to
> be valid.
> #pragma omp for ordered(2)
> for (int i = 0; i < 64; i++)
>    {
>      ;
>      ;
>      ;
>      for (int j = 0; j < 64; j++)
>        ;
>      ;
>    }
> because even the empty statement is I think intervening code according to
> the grammar.

Do you have recommendations for what the behavior should be now?  Reject these constructs with an error?  Accept them with a warning?  Or accept them quietly for backward compatibility?

> Another thing I don't really see covered in the testsuite nor in the code
> is if some variable declared in intervening code is then used in the inner
> loop's init/cond/incr expressions.  I mean something like:
> #pragma omp for collapse(2)
> for (int i = 0; i < 64; i++)
>    {
>      int v = (i + 4) * 2;
>      for (int j = v; j < 64; j++)
>        ;
>    }
> That just ICEs with your patch, we should diagnose it as invalid.
> In the canonical loop form requirements the standard requires that the
> expressions are loop invariant with the exceptions of specific cases
> allowed for non-rectangular loops.  So, above I'm sure that is violated.
> Another case would be
> #pragma omp for collapse(2)
> for (int i = 0; i < 64; i++)
>    {
>      int v = (i + 4);
>      for (int j = v; j < 64; j++)
>        ;
>    }
> but here it isn't invariant either, although for (int j = i + 4; j < 64; j++)
> would be valid non-rectangular loop I think it requires the exact syntax.
> Yet another case is
> #pragma omp for collapse(2)
> for (int i = 0; i < 64; i++)
>    {
>      int v = 8;
>      for (int j = v; j < 64; j++)
>        ;
>    }
> This actually is loop invariant (the value) and not really sure what would
> in the standard prevent it from being valid, but it doesn't feel right and
> can't be handled easily, because one should be able to compute the number
> of iterations before the loop where v isn't in scope or there could be a
> different v in scope.
> And/or for C++
> #pragma omp for collapse(2)
> for (int i = 0; i < 64; i++)
>    {
>      const int v = 8; // or even constexpr
>      for (int j = v; j < 64; j++)
>        ;
>    }
> where already during parsing we won't really know we are actually using v
> if it isn't ODR used.  Or unless it is in a template and the const/constexpr
> variable is value or type dependent.
> I think we want to discuss this in omp-lang.

I haven't tackled this group of corner cases yet.  I am not an OpenMP expert and I'd be inclined just to leave FIXMEs for anything that is not entirely clear in the spec, though (and just forget about cases where the spec clearly says something else).

>> @@ -1525,6 +1529,23 @@ struct oacc_routine_data {
>>   /* Used for parsing objc foreach statements.  */
>>   static tree objc_foreach_break_label, objc_foreach_continue_label;
>>   
>> +/* Used for parsing OMP for loops.  See c_parser_omp_for_loop.  */
>> +struct omp_for_parse_data {
>> +  tree declv, condv, incrv, initv;
>> +  tree pre_body;
>> +  tree bindings;
>> +  int collapse;
> 
> I think it is confusing to call this collapse when it isn't collapse
> but ordered ? ordered : collapse.
> Call it count like in c_parser_omp_for_loop?

Done.

>> @@ -6123,6 +6145,15 @@ c_parser_compound_statement_nostart (c_parser *parser)
>>     bool last_label = false;
>>     bool save_valid_for_pragma = valid_location_for_stdc_pragma_p ();
>>     location_t label_loc = UNKNOWN_LOCATION;  /* Quiet warning.  */
>> +  struct omp_for_parse_data *omp_for_parse_state;
>> +
>> +  if (parser->omp_for_parse_state
>> +      && (parser->omp_for_parse_state->depth
>> +	  < parser->omp_for_parse_state->collapse - 1))
>> +    omp_for_parse_state = parser->omp_for_parse_state;
>> +  else
>> +    omp_for_parse_state = NULL;
>> +
>>     if (c_parser_next_token_is (parser, CPP_CLOSE_BRACE))
>>       {
>>         location_t endloc = c_parser_peek_token (parser)->location;
> 
> So, here we skip over various cases that need to be covered in the testcases
> and decided what to do about.
> The first one is:
> #pragma omp for collapse(2)
> for (int i = 0; i < 64; ++i)
>    {
>      __label__ a, b;
>      goto a; a:;
>      for (int j = 0; j < 64; ++j)
>        ;
>      goto b; b:;
>    }
> This is a GNU extension, do we want to allow it?
> Conceptually it is like variable declarations which can appear in
> intervening code, but we need to make sure it is handled properly
> (the code registers those in the current scope using declare_label, so
> supposedly we want to move those later to the body scope when moving there
> other intervening code.  And probably it should be rejected when intervening
> code is not allowed (+ test for that).
> 
> Next are C2X standard attributes, case XX:/default: (I think these would
> violate the single entry single exit requirements of intervening code),
> label: (that can be fine as long as it is only jumped to from within the
> same intervening code), so again something for the testsuite:
> int k = 0;
> #pragma omp for collapse(2)
> for (int i = 0; i < 64; ++i)
>    {
>      a: if (k) goto a;
>      for (int j = 0; j < 64; ++j)
>        ;
>      b: if (k) goto b;
>    }
> Then __extension__, again something that should be considered whether it
> acts as intervening code, or is even allowed as if it wasn't intervening
> code + testsuite coverage.
> Then pragmas which you handle in c_parser_pragma already (but will need some
> tweaks for tile/unroll from Frederik).
> else is an error, so nothing needs to be done about it.

I haven't done these testcases + fixes yet either, although the integration with Frederik's tile/unroll work is included in the OG13 version of the patches I've just committed.

>> @@ -7138,6 +7223,16 @@ c_parser_while_statement (c_parser *parser, bool ivdep, unsigned short unroll,
>>     gcc_assert (c_parser_next_token_is_keyword (parser, RID_WHILE));
>>     token_indent_info while_tinfo
>>       = get_token_indent_info (c_parser_peek_token (parser));
>> +
>> +  if (parser->omp_for_parse_state
>> +      && (parser->omp_for_parse_state->depth
>> +	  < parser->omp_for_parse_state->collapse - 1))
>> +    {
>> +      error_at (c_parser_peek_token (parser)->location,
>> +		"loop not permitted in intervening code in OMP loop body");
> 
> Please use OpenMP in diagnostics rather than OMP (multiple times).

Done.

> Also, the while and do cases are not covered in the testsuite, they should
> be (there is just test for for in there).
> Would be nice to also test it nested in other compound statements, like
> #pragma omp for collapse(2)
> for (int i = 0; i < 64; ++i)
>    {
>      if (1)
>        {
> 	for (int j = 0; j < 64; ++j)	// { dg-error "..." }
> 	  ;
> 	while (0) ;			// { dg-error "..." }
> 	do { } while (0);		// { dg-error "..." }
>        }
>      for (int k = 0; k < 64; ++k)
>        ;
>    }
> I guess especially the do { ... } while (0); limitation could be quite
> harmful, that is something used heavily in various macros.
> So perhaps for OpenMP 6.0 we should consider there rejecting just
> for statements for C/C++?  And/or also allow for loops if inside of some
> substatement of say if/do/while/switch, those are also clearly
> distinguishable from the main loop.  On the other side, I've never been
> really excited by this imperfectly nested loops mess making it into the
> standard.

Like I said, I'm not an OpenMP expert and I don't want to mess with anything that isn't already clear in the spec, so my inclination would be not wander off proposing/implementing extensions.

>> +      parser->omp_for_parse_state->fail = true;
>> +    }
>> +
>>     c_parser_consume_token (parser);
>>     block = c_begin_compound_stmt (flag_isoc99);
>>     loc = c_parser_peek_token (parser)->location;
>> @@ -11234,6 +11349,14 @@ c_parser_postfix_expression_after_primary (c_parser *parser,
>>   		  && fndecl_built_in_p (expr.value, BUILT_IN_NORMAL)
>>   		  && vec_safe_length (exprlist) == 1)
>>   		warn_for_abs (expr_loc, expr.value, (*exprlist)[0]);
>> +	      if (parser->omp_for_parse_state
>> +		  && parser->omp_for_parse_state->in_intervening_code
>> +		  && omp_runtime_api_call (expr.value))
>> +		{
>> +		  error_at (expr_loc, "Calls to the OpenMP runtime API are "
> 
> s/Calls/calls/, C/C++ FE/middle end diagnostics never start with capital
> letter unless it is part of some shorthand/word always spelled with capital
> letters.

Done.

>> +  if (decl == NULL || decl == error_mark_node || init == error_mark_node)
>> +    omp_for_parse_state->fail = true;
>> +  else
>> +    {
>> +      TREE_VEC_ELT (omp_for_parse_state->declv, omp_for_parse_state->depth)
>> +	= decl;
>> +      TREE_VEC_ELT (omp_for_parse_state->initv, omp_for_parse_state->depth)
>> +	= init;
>> +      TREE_VEC_ELT (omp_for_parse_state->condv, omp_for_parse_state->depth)
>> +	= cond;
>> +      TREE_VEC_ELT (omp_for_parse_state->incrv, omp_for_parse_state->depth)
>> +	= incr;
> 
> This would be more readable using a temporary:
>        int depth = omp_for_parse_state->depth;
>        TREE_VEC_ELT (omp_for_parse_state->declv, depth) = decl;
>        TREE_VEC_ELT (omp_for_parse_state->initv, depth) = init;
>        TREE_VEC_ELT (omp_for_parse_state->condv, depth) = cond;
>        TREE_VEC_ELT (omp_for_parse_state->incrv, depth) = incr;

Done.

> 
>> +    }
>> +
>> +parse_next:
>> +  omp_for_parse_state->want_nested_loop = true;
>> +  moreloops = omp_for_parse_state->depth < omp_for_parse_state->collapse - 1;
> 
> Shouldn't omp_for_parse_state->want_nested_loop be initialized to moreloops
> here?

Done.

> And/or temporarily clear
> parser->omp_for_parse_state here and reset it back afterwards?
> Then
>    if (parser->omp_for_parse_state
>        && (parser->omp_for_parse_state->depth
>            < parser->omp_for_parse_state->collapse - 1))
>      omp_for_parse_state = parser->omp_for_parse_state;
>    else
>      omp_for_parse_state = NULL;
> etc. wouldn't be really needed, if parser->omp_for_parse_state would
> be non-NULL, we'd want to use it without further checks.

Done.

>> +  /* Pop and return the implicit scope surrounding this level of loop.
>> +     Any iteration variable bound in loop_scope is pulled out and later
>> +     will be added to the scope surrounding the entire OMP_FOR.  That
>> +     keeps the gimplifier happy later on, and meanwhile we have already
>> +     resolved all references to the iteration variable in its true scope.  */
>> +  add_stmt (body);
>> +  body = c_end_compound_stmt (loc, loop_scope, true);
>> +  if (decl && TREE_CODE (body) == BIND_EXPR)
> 
> So, this moves just the iteration var if declared in the for loop and nothing else?

Yes.  I've updated the comments to try to explain this better.

>> +#pragma omp for ordered(3)
>> +  for (i = 0; i < a1; i++)  /* { dg-error "inner loops must be perfectly nested" } */
>> +    {
>> +      f1 (0, i);
>> +      for (j = 0; j < a2; j++)
>> +	{
>> +	  f1 (1, j);
>> +	  for (k = 0; k < a3; k++)
>> +	    {
>> +	      f1 (2, k);
>> +	      f2 (2, k);
> 
> Would be good to stick here #pragma omp ordered doacross(source) and sink,
> just to make it valid except for the intervening code.

Skipped this for now.

>> diff --git a/gcc/testsuite/c-c++-common/gomp/imperfect4.c b/gcc/testsuite/c-c++-common/gomp/imperfect4.c
>> new file mode 100644
>> index 00000000000..e5feff730a9
>> --- /dev/null
>> +++ b/gcc/testsuite/c-c++-common/gomp/imperfect4.c
>> @@ -0,0 +1,35 @@
>> +/* { dg-do compile } */
>> +
>> +/* This test case is expected to fail due to errors.  */
>> +
>> +static int f1count[3], f2count[3];
> 
> Don't declare variables you don't use at all in the test (in multiple
> tests).

Fixed.

>> --- /dev/null
>> +++ b/gcc/testsuite/c-c++-common/gomp/imperfect5.c
>> @@ -0,0 +1,59 @@
>> +/* { dg-do compile } */
> 
> I think you'd better just copy some existing scan testcase and
> just add intervening code to it.

I actually tried to do that originally, but I could not find "some existing scan testcase" with more than one loop that I could copy from.  Tobias helped me come up with the testcase I've presently got.

As I said at the top of this mail, the OG13 version of the patches includes the cleanups I indicated are "done", and I've applied those fixes in parallel to the mainline version I'm still working on.  If it would be helpful I could post that as WIP in its current state, or just continue adding the extra testcases, fixing the ICE, etc before reposting.

-Sandra
  

Patch

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index c9f06930e3a..891e45ecaea 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -249,6 +249,10 @@  struct GTY(()) c_parser {
 
   /* Location of the last consumed token.  */
   location_t last_token_location;
+
+  /* Holds state for parsing collapsed OMP_FOR loops.  Managed by
+     c_parser_omp_for_loop.  */
+  struct omp_for_parse_data * GTY((skip)) omp_for_parse_state;
 };
 
 /* Return a pointer to the Nth token in PARSERs tokens_buf.  */
@@ -1525,6 +1529,23 @@  struct oacc_routine_data {
 /* Used for parsing objc foreach statements.  */
 static tree objc_foreach_break_label, objc_foreach_continue_label;
 
+/* Used for parsing OMP for loops.  See c_parser_omp_for_loop.  */
+struct omp_for_parse_data {
+  tree declv, condv, incrv, initv;
+  tree pre_body;
+  tree bindings;
+  int collapse;
+  int depth;
+  location_t for_loc;
+  bool want_nested_loop : 1;
+  bool ordered : 1;
+  bool in_intervening_code : 1;
+  bool perfect_nesting_fail : 1;
+  bool fail : 1;
+  bool inscan : 1;
+  enum tree_code code;
+};
+
 static bool c_parser_nth_token_starts_std_attributes (c_parser *,
 						      unsigned int);
 static tree c_parser_std_attribute_specifier_sequence (c_parser *);
@@ -1616,6 +1637,7 @@  static void c_parser_omp_threadprivate (c_parser *);
 static void c_parser_omp_barrier (c_parser *);
 static void c_parser_omp_depobj (c_parser *);
 static void c_parser_omp_flush (c_parser *);
+static tree c_parser_omp_loop_nest (c_parser *, bool *);
 static tree c_parser_omp_for_loop (location_t, c_parser *, enum tree_code,
 				   tree, tree *, bool *);
 static void c_parser_omp_taskwait (c_parser *);
@@ -6123,6 +6145,15 @@  c_parser_compound_statement_nostart (c_parser *parser)
   bool last_label = false;
   bool save_valid_for_pragma = valid_location_for_stdc_pragma_p ();
   location_t label_loc = UNKNOWN_LOCATION;  /* Quiet warning.  */
+  struct omp_for_parse_data *omp_for_parse_state;
+
+  if (parser->omp_for_parse_state
+      && (parser->omp_for_parse_state->depth
+	  < parser->omp_for_parse_state->collapse - 1))
+    omp_for_parse_state = parser->omp_for_parse_state;
+  else
+    omp_for_parse_state = NULL;
+
   if (c_parser_next_token_is (parser, CPP_CLOSE_BRACE))
     {
       location_t endloc = c_parser_peek_token (parser)->location;
@@ -6289,6 +6320,15 @@  c_parser_compound_statement_nostart (c_parser *parser)
               continue;
             }
         }
+      else if (omp_for_parse_state
+	       && c_parser_next_token_is_keyword (parser, RID_FOR)
+	       && omp_for_parse_state->want_nested_loop)
+	{
+	  /* Special treatment for collapsed loop nests.  */
+	  omp_for_parse_state->depth++;
+	  add_stmt (c_parser_omp_loop_nest (parser, NULL));
+	  omp_for_parse_state->depth--;
+	}
       else
 	{
 	statement:
@@ -6296,7 +6336,52 @@  c_parser_compound_statement_nostart (c_parser *parser)
 	  last_label = false;
 	  last_stmt = true;
 	  mark_valid_location_for_stdc_pragma (false);
-	  c_parser_statement_after_labels (parser, NULL);
+	  if (omp_for_parse_state
+	      && !c_parser_next_token_is (parser, CPP_OPEN_BRACE))
+	    {
+	      /* Nested loops can only appear directly or nested in
+		 compound statements.  We have neither, so set the bit
+		 to treat everything inside the subsequent statement
+		 as intervening code instead.  */
+	      bool want_nested_loop = omp_for_parse_state->want_nested_loop;
+	      omp_for_parse_state->want_nested_loop = false;
+
+	      /* Only diagnose errors related to perfect nesting once.  */
+	      if (!omp_for_parse_state->perfect_nesting_fail)
+		{
+
+		  /* OpenACC does not (yet) permit intervening code, in
+		     addition to situations forbidden by the OpenMP spec.  */
+		  if (omp_for_parse_state->code == OACC_LOOP)
+		    {
+		      error_at (omp_for_parse_state->for_loc,
+				"inner loops must be perfectly nested in "
+				"%<#pragma acc loop%>");
+		      omp_for_parse_state->perfect_nesting_fail = true;
+		    }
+		  else if (omp_for_parse_state->ordered)
+		    {
+		      error_at (omp_for_parse_state->for_loc,
+				"inner loops must be perfectly nested with "
+				"%<ordered%> clause");
+		      omp_for_parse_state->perfect_nesting_fail = true;
+		    }
+		  else if (omp_for_parse_state->inscan)
+		    {
+		      error_at (omp_for_parse_state->for_loc,
+				"inner loops must be perfectly nested with "
+				"%<reduction%> %<inscan%> clause");
+		      omp_for_parse_state->perfect_nesting_fail = true;
+		    }
+		  /* TODO: Also reject loops with TILE directive.  */
+		  if (omp_for_parse_state->perfect_nesting_fail)
+		    omp_for_parse_state->fail = true;
+		}
+	      c_parser_statement_after_labels (parser, NULL);
+	      omp_for_parse_state->want_nested_loop = want_nested_loop;
+	    }
+	  else
+	    c_parser_statement_after_labels (parser, NULL);
 	}
 
       parser->error = false;
@@ -7138,6 +7223,16 @@  c_parser_while_statement (c_parser *parser, bool ivdep, unsigned short unroll,
   gcc_assert (c_parser_next_token_is_keyword (parser, RID_WHILE));
   token_indent_info while_tinfo
     = get_token_indent_info (c_parser_peek_token (parser));
+
+  if (parser->omp_for_parse_state
+      && (parser->omp_for_parse_state->depth
+	  < parser->omp_for_parse_state->collapse - 1))
+    {
+      error_at (c_parser_peek_token (parser)->location,
+		"loop not permitted in intervening code in OMP loop body");
+      parser->omp_for_parse_state->fail = true;
+    }
+
   c_parser_consume_token (parser);
   block = c_begin_compound_stmt (flag_isoc99);
   loc = c_parser_peek_token (parser)->location;
@@ -7189,6 +7284,16 @@  c_parser_do_statement (c_parser *parser, bool ivdep, unsigned short unroll)
   unsigned char save_in_statement;
   location_t loc;
   gcc_assert (c_parser_next_token_is_keyword (parser, RID_DO));
+
+  if (parser->omp_for_parse_state
+      && (parser->omp_for_parse_state->depth
+	  < parser->omp_for_parse_state->collapse - 1))
+    {
+      error_at (c_parser_peek_token (parser)->location,
+		"loop not permitted in intervening code in OMP loop body");
+      parser->omp_for_parse_state->fail = true;
+    }
+
   c_parser_consume_token (parser);
   if (c_parser_next_token_is (parser, CPP_SEMICOLON))
     warning_at (c_parser_peek_token (parser)->location,
@@ -7295,6 +7400,16 @@  c_parser_for_statement (c_parser *parser, bool ivdep, unsigned short unroll,
   gcc_assert (c_parser_next_token_is_keyword (parser, RID_FOR));
   token_indent_info for_tinfo
     = get_token_indent_info (c_parser_peek_token (parser));
+
+  if (parser->omp_for_parse_state
+      && (parser->omp_for_parse_state->depth
+	  < parser->omp_for_parse_state->collapse - 1))
+    {
+      error_at (for_loc,
+		"loop not permitted in intervening code in OMP loop body");
+      parser->omp_for_parse_state->fail = true;
+    }
+
   c_parser_consume_token (parser);
   /* Open a compound statement in Objective-C as well, just in case this is
      as foreach expression.  */
@@ -11234,6 +11349,14 @@  c_parser_postfix_expression_after_primary (c_parser *parser,
 		  && fndecl_built_in_p (expr.value, BUILT_IN_NORMAL)
 		  && vec_safe_length (exprlist) == 1)
 		warn_for_abs (expr_loc, expr.value, (*exprlist)[0]);
+	      if (parser->omp_for_parse_state
+		  && parser->omp_for_parse_state->in_intervening_code
+		  && omp_runtime_api_call (expr.value))
+		{
+		  error_at (expr_loc, "Calls to the OpenMP runtime API are "
+				      "not permitted in intervening code");
+		  parser->omp_for_parse_state->fail = true;
+		}
 	    }
 
 	  start = expr.get_start ();
@@ -13068,6 +13191,17 @@  c_parser_pragma (c_parser *parser, enum pragma_context context, bool *if_p)
   input_location = c_parser_peek_token (parser)->location;
   id = c_parser_peek_token (parser)->pragma_kind;
   gcc_assert (id != PRAGMA_NONE);
+  if (parser->omp_for_parse_state
+      && parser->omp_for_parse_state->in_intervening_code
+      && id >= PRAGMA_OMP__START_
+      && id <= PRAGMA_OMP__LAST_)
+    {
+      error_at (input_location,
+		"intervening code must not contain OpenMP directives");
+      parser->omp_for_parse_state->fail = true;
+      c_parser_skip_until_found (parser, CPP_PRAGMA_EOL, NULL);
+      return false;
+    }
 
   switch (id)
     {
@@ -20190,6 +20324,274 @@  c_parser_omp_scan_loop_body (c_parser *parser, bool open_brace_parsed)
 			     "expected %<}%>");
 }
 
+
+/* This function parses a single level of a loop nest, invoking itself
+   recursively if necessary.
+
+   loop-nest :: for (...) loop-body
+   loop-body :: loop-nest
+	     |  { [intervening-code] loop-body [intervening-code] }
+	     |  final-loop-body
+   intervening-code :: structured-block-sequence
+   final-loop-body :: structured-block
+
+   For a collapsed loop nest, only a single OMP_FOR is built, pulling out
+   all the iterator information from the inner loops into the
+   parser->omp_for_parse_state structure.
+
+   The iterator decl, init, cond, and incr are stored in vectors.
+
+   Initialization code for iterator variables is collected into
+   parser->omp_for_parse_state->pre_body and ends up inserted directly
+   into the OMP_FOR structure.  */
+
+static tree
+c_parser_omp_loop_nest (c_parser *parser, bool *if_p)
+{
+  tree decl, cond, incr, init;
+  tree body = NULL_TREE;
+  matching_parens parens;
+  bool moreloops;
+  unsigned char save_in_statement;
+  tree loop_scope;
+  location_t loc;
+  struct omp_for_parse_data *omp_for_parse_state
+    = parser->omp_for_parse_state;
+
+  gcc_assert (omp_for_parse_state);
+
+  /* We have already matched the FOR token but not consumed it yet.  */
+  loc = c_parser_peek_token (parser)->location;
+  gcc_assert (c_parser_next_token_is_keyword (parser, RID_FOR));
+  c_parser_consume_token (parser);
+
+  /* Forbid break/continue in the loop initializer, condition, and
+     increment expressions.  */
+  save_in_statement = in_statement;
+  in_statement = IN_OMP_BLOCK;
+
+  /* We are not in intervening code now.  */
+  omp_for_parse_state->in_intervening_code = false;
+
+  if (!parens.require_open (parser))
+    {
+      omp_for_parse_state->fail = true;
+      return NULL_TREE;
+    }
+
+  /* An implicit scope block surrounds each level of FOR loop, for
+     declarations of iteration variables at this loop depth.  */
+  loop_scope = c_begin_compound_stmt (true);
+
+  /* Parse the initialization declaration or expression.  */
+  if (c_parser_next_tokens_start_declaration (parser))
+    {
+      /* This is a declaration, which must be added to the pre_body code.  */
+      tree this_pre_body = push_stmt_list ();
+      c_in_omp_for = true;
+      c_parser_declaration_or_fndef (parser, true, true, true, true, true);
+      c_in_omp_for = false;
+      this_pre_body = pop_stmt_list (this_pre_body);
+      append_to_statement_list_force (this_pre_body,
+				      &(omp_for_parse_state->pre_body));
+      decl = check_for_loop_decls (omp_for_parse_state->for_loc, flag_isoc99);
+      if (decl == NULL)
+	goto error_init;
+      if (DECL_INITIAL (decl) == error_mark_node)
+	decl = error_mark_node;
+      init = decl;
+    }
+  else if (c_parser_next_token_is (parser, CPP_NAME)
+	   && c_parser_peek_2nd_token (parser)->type == CPP_EQ)
+    {
+      struct c_expr decl_exp;
+      struct c_expr init_exp;
+      location_t init_loc;
+
+      decl_exp = c_parser_postfix_expression (parser);
+      decl = decl_exp.value;
+
+      c_parser_require (parser, CPP_EQ, "expected %<=%>");
+
+      init_loc = c_parser_peek_token (parser)->location;
+      init_exp = c_parser_expr_no_commas (parser, NULL);
+      init_exp = default_function_array_read_conversion (init_loc,
+							 init_exp);
+      c_in_omp_for = true;
+      init = build_modify_expr (init_loc, decl, decl_exp.original_type,
+				NOP_EXPR, init_loc, init_exp.value,
+				init_exp.original_type);
+      c_in_omp_for = false;
+      init = c_process_expr_stmt (init_loc, init);
+
+      c_parser_skip_until_found (parser, CPP_SEMICOLON, "expected %<;%>");
+    }
+  else
+    {
+    error_init:
+      c_parser_error (parser,
+		      "expected iteration declaration or initialization");
+      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
+				 "expected %<)%>");
+      omp_for_parse_state->fail = true;
+      goto parse_next;
+    }
+
+  /* Parse the loop condition.  */
+  cond = NULL_TREE;
+  if (c_parser_next_token_is_not (parser, CPP_SEMICOLON))
+    {
+      location_t cond_loc = c_parser_peek_token (parser)->location;
+      c_in_omp_for = true;
+      struct c_expr cond_expr
+	= c_parser_binary_expression (parser, NULL, NULL_TREE);
+      c_in_omp_for = false;
+
+      cond = cond_expr.value;
+      cond = c_objc_common_truthvalue_conversion (cond_loc, cond);
+      switch (cond_expr.original_code)
+	{
+	case GT_EXPR:
+	case GE_EXPR:
+	case LT_EXPR:
+	case LE_EXPR:
+	  break;
+	case NE_EXPR:
+	  if (omp_for_parse_state->code != OACC_LOOP)
+	    break;
+	  /* FALLTHRU.  */
+	default:
+	  /* Can't be cond = error_mark_node, because we want to preserve
+	     the location until c_finish_omp_for.  */
+	  cond = build1 (NOP_EXPR, boolean_type_node, error_mark_node);
+	  break;
+	}
+      protected_set_expr_location (cond, cond_loc);
+    }
+  c_parser_skip_until_found (parser, CPP_SEMICOLON, "expected %<;%>");
+
+  /* Parse the increment expression.  */
+  incr = NULL_TREE;
+  if (c_parser_next_token_is_not (parser, CPP_CLOSE_PAREN))
+    {
+      location_t incr_loc = c_parser_peek_token (parser)->location;
+
+      incr = c_process_expr_stmt (incr_loc,
+				  c_parser_expression (parser).value);
+    }
+  parens.skip_until_found_close (parser);
+
+  if (decl == NULL || decl == error_mark_node || init == error_mark_node)
+    omp_for_parse_state->fail = true;
+  else
+    {
+      TREE_VEC_ELT (omp_for_parse_state->declv, omp_for_parse_state->depth)
+	= decl;
+      TREE_VEC_ELT (omp_for_parse_state->initv, omp_for_parse_state->depth)
+	= init;
+      TREE_VEC_ELT (omp_for_parse_state->condv, omp_for_parse_state->depth)
+	= cond;
+      TREE_VEC_ELT (omp_for_parse_state->incrv, omp_for_parse_state->depth)
+	= incr;
+    }
+
+parse_next:
+  omp_for_parse_state->want_nested_loop = true;
+  moreloops = omp_for_parse_state->depth < omp_for_parse_state->collapse - 1;
+  if (moreloops && c_parser_next_token_is_keyword (parser, RID_FOR))
+    {
+      omp_for_parse_state->depth++;
+      body = c_parser_omp_loop_nest (parser, if_p);
+      omp_for_parse_state->depth--;
+    }
+  else if (moreloops && c_parser_next_token_is (parser, CPP_OPEN_BRACE))
+    {
+      /* This is the open brace in the loop-body grammar production.  Rather
+	 than trying to special-case braces, just parse it as a compound
+	 statement and handle the nested loop-body case there.  Note that
+	 when we see a further open brace inside the compound statement
+	 loop-body, we don't know whether it is the start of intervening
+	 code that is a compound statement, or a level of braces
+	 surrounding a nested loop-body.  Use the WANT_NESTED_LOOP state
+	 bit to ensure we have only one nested loop at each level.  */
+      omp_for_parse_state->in_intervening_code = true;
+      body = c_parser_compound_statement (parser, NULL);
+      omp_for_parse_state->in_intervening_code = false;
+      if (omp_for_parse_state->want_nested_loop)
+	{
+	  /* We have already parsed the whole loop body and not found a
+	     nested loop.  */
+	  error_at (omp_for_parse_state->for_loc,
+		    "not enough nested loops");
+	  omp_for_parse_state->fail = true;
+	}
+      if_p = NULL;
+    }
+  else
+    {
+      /* This is the final-loop-body case in the grammar: we have
+	 something that is not a FOR and not an open brace.  */
+      if (moreloops)
+	{
+	  /* If we were expecting a nested loop, give an error and mark
+	     that parsing has failed, and try to recover by parsing the
+	     body as regular code without further collapsing.  */
+	  error_at (omp_for_parse_state->for_loc,
+		    "not enough nested loops");
+	  omp_for_parse_state->fail = true;
+	}
+      in_statement = IN_OMP_FOR;
+      body = push_stmt_list ();
+      if (omp_for_parse_state->inscan)
+	c_parser_omp_scan_loop_body (parser, false);
+      else
+	add_stmt (c_parser_c99_block_statement (parser, if_p));
+      body = pop_stmt_list (body);
+    }
+  in_statement = save_in_statement;
+  omp_for_parse_state->want_nested_loop = false;
+  omp_for_parse_state->in_intervening_code = true;
+
+  /* Pop and return the implicit scope surrounding this level of loop.
+     Any iteration variable bound in loop_scope is pulled out and later
+     will be added to the scope surrounding the entire OMP_FOR.  That
+     keeps the gimplifier happy later on, and meanwhile we have already
+     resolved all references to the iteration variable in its true scope.  */
+  add_stmt (body);
+  body = c_end_compound_stmt (loc, loop_scope, true);
+  if (decl && TREE_CODE (body) == BIND_EXPR)
+    {
+      tree t = BIND_EXPR_VARS (body);
+      tree prev = NULL_TREE, next = NULL_TREE;
+      while (t)
+	{
+	  next = DECL_CHAIN (t);
+	  if (t == decl)
+	    {
+	      if (prev)
+		DECL_CHAIN (prev) = next;
+	      else
+		{
+		  BIND_EXPR_VARS (body) = next;
+		  BLOCK_VARS (BIND_EXPR_BLOCK (body)) = next;
+		}
+	      DECL_CHAIN (t) = omp_for_parse_state->bindings;
+	      omp_for_parse_state->bindings = t;
+	      break;
+	    }
+	  else
+	    {
+	      prev = t;
+	      t = next;
+	    }
+	}
+      if (BIND_EXPR_VARS (body) == NULL_TREE)
+	body = BIND_EXPR_BODY (body);
+    }
+
+  return body;
+}
+
 /* Parse the restricted form of loop statements allowed by OpenACC and OpenMP.
    The real trick here is to determine the loop control variable early
    so that we can push a new decl if necessary to make it private.
@@ -20200,17 +20602,15 @@  static tree
 c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
 		       tree clauses, tree *cclauses, bool *if_p)
 {
-  tree decl, cond, incr, body, init, stmt, cl;
-  unsigned char save_in_statement;
-  tree declv, condv, incrv, initv, ret = NULL_TREE;
-  tree pre_body = NULL_TREE, this_pre_body;
+  tree body, stmt, cl;
+  tree ret = NULL_TREE;
   tree ordered_cl = NULL_TREE;
-  bool fail = false, open_brace_parsed = false;
-  int i, collapse = 1, ordered = 0, count, nbraces = 0;
+  int i, collapse = 1, ordered = 0, count;
   location_t for_loc;
   bool tiling = false;
   bool inscan = false;
-  vec<tree, va_gc> *for_block = make_tree_vector ();
+  struct omp_for_parse_data data;
+  struct omp_for_parse_data *save_data = parser->omp_for_parse_state;
 
   for (cl = clauses; cl; cl = OMP_CLAUSE_CHAIN (cl))
     if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_COLLAPSE)
@@ -20243,250 +20643,53 @@  c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
   gcc_assert (tiling || (collapse >= 1 && ordered >= 0));
   count = ordered ? ordered : collapse;
 
-  declv = make_tree_vec (count);
-  initv = make_tree_vec (count);
-  condv = make_tree_vec (count);
-  incrv = make_tree_vec (count);
-
   if (!c_parser_next_token_is_keyword (parser, RID_FOR))
     {
       c_parser_error (parser, "for statement expected");
       return NULL;
     }
   for_loc = c_parser_peek_token (parser)->location;
-  c_parser_consume_token (parser);
-
-  /* Forbid break/continue in the loop initializer, condition, and
-     increment expressions.  */
-  save_in_statement = in_statement;
-  in_statement = IN_OMP_BLOCK;
-
-  for (i = 0; i < count; i++)
-    {
-      int bracecount = 0;
-
-      matching_parens parens;
-      if (!parens.require_open (parser))
-	goto pop_scopes;
-
-      /* Parse the initialization declaration or expression.  */
-      if (c_parser_next_tokens_start_declaration (parser))
-	{
-	  if (i > 0)
-	    vec_safe_push (for_block, c_begin_compound_stmt (true));
-	  this_pre_body = push_stmt_list ();
-	  c_in_omp_for = true;
-	  c_parser_declaration_or_fndef (parser, true, true, true, true, true);
-	  c_in_omp_for = false;
-	  if (this_pre_body)
-	    {
-	      this_pre_body = pop_stmt_list (this_pre_body);
-	      if (pre_body)
-		{
-		  tree t = pre_body;   
-		  pre_body = push_stmt_list ();
-		  add_stmt (t);
-		  add_stmt (this_pre_body);
-		  pre_body = pop_stmt_list (pre_body);
-		}
-	      else
-		pre_body = this_pre_body;
-	    }
-	  decl = check_for_loop_decls (for_loc, flag_isoc99);
-	  if (decl == NULL)
-	    goto error_init;
-	  if (DECL_INITIAL (decl) == error_mark_node)
-	    decl = error_mark_node;
-	  init = decl;
-	}
-      else if (c_parser_next_token_is (parser, CPP_NAME)
-	       && c_parser_peek_2nd_token (parser)->type == CPP_EQ)
-	{
-	  struct c_expr decl_exp;
-	  struct c_expr init_exp;
-	  location_t init_loc;
-
-	  decl_exp = c_parser_postfix_expression (parser);
-	  decl = decl_exp.value;
-
-	  c_parser_require (parser, CPP_EQ, "expected %<=%>");
-
-	  init_loc = c_parser_peek_token (parser)->location;
-	  init_exp = c_parser_expr_no_commas (parser, NULL);
-	  init_exp = default_function_array_read_conversion (init_loc,
-							     init_exp);
-	  c_in_omp_for = true;
-	  init = build_modify_expr (init_loc, decl, decl_exp.original_type,
-				    NOP_EXPR, init_loc, init_exp.value,
-				    init_exp.original_type);
-	  c_in_omp_for = false;
-	  init = c_process_expr_stmt (init_loc, init);
-
-	  c_parser_skip_until_found (parser, CPP_SEMICOLON, "expected %<;%>");
-	}
-      else
-	{
-	error_init:
-	  c_parser_error (parser,
-			  "expected iteration declaration or initialization");
-	  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
-				     "expected %<)%>");
-	  fail = true;
-	  goto parse_next;
-	}
-
-      /* Parse the loop condition.  */
-      cond = NULL_TREE;
-      if (c_parser_next_token_is_not (parser, CPP_SEMICOLON))
-	{
-	  location_t cond_loc = c_parser_peek_token (parser)->location;
-	  c_in_omp_for = true;
-	  struct c_expr cond_expr
-	    = c_parser_binary_expression (parser, NULL, NULL_TREE);
-          c_in_omp_for = false;
-
-	  cond = cond_expr.value;
-	  cond = c_objc_common_truthvalue_conversion (cond_loc, cond);
-	  switch (cond_expr.original_code)
-	    {
-	    case GT_EXPR:
-	    case GE_EXPR:
-	    case LT_EXPR:
-	    case LE_EXPR:
-	      break;
-	    case NE_EXPR:
-	      if (code != OACC_LOOP)
-		break;
-	      /* FALLTHRU.  */
-	    default:
-	      /* Can't be cond = error_mark_node, because we want to preserve
-		 the location until c_finish_omp_for.  */
-	      cond = build1 (NOP_EXPR, boolean_type_node, error_mark_node);
-	      break;
-	    }
-	  protected_set_expr_location (cond, cond_loc);
-	}
-      c_parser_skip_until_found (parser, CPP_SEMICOLON, "expected %<;%>");
-
-      /* Parse the increment expression.  */
-      incr = NULL_TREE;
-      if (c_parser_next_token_is_not (parser, CPP_CLOSE_PAREN))
-	{
-	  location_t incr_loc = c_parser_peek_token (parser)->location;
 
-	  incr = c_process_expr_stmt (incr_loc,
-				      c_parser_expression (parser).value);
-	}
-      parens.skip_until_found_close (parser);
-
-      if (decl == NULL || decl == error_mark_node || init == error_mark_node)
-	fail = true;
-      else
-	{
-	  TREE_VEC_ELT (declv, i) = decl;
-	  TREE_VEC_ELT (initv, i) = init;
-	  TREE_VEC_ELT (condv, i) = cond;
-	  TREE_VEC_ELT (incrv, i) = incr;
-	}
-
-    parse_next:
-      if (i == count - 1)
-	break;
-
-      /* FIXME: OpenMP 3.0 draft isn't very clear on what exactly is allowed
-	 in between the collapsed for loops to be still considered perfectly
-	 nested.  Hopefully the final version clarifies this.
-	 For now handle (multiple) {'s and empty statements.  */
-      do
-	{
-	  if (c_parser_next_token_is_keyword (parser, RID_FOR))
-	    {
-	      c_parser_consume_token (parser);
-	      break;
-	    }
-	  else if (c_parser_next_token_is (parser, CPP_OPEN_BRACE))
-	    {
-	      c_parser_consume_token (parser);
-	      bracecount++;
-	    }
-	  else if (bracecount
-		   && c_parser_next_token_is (parser, CPP_SEMICOLON))
-	    c_parser_consume_token (parser);
-	  else
-	    {
-	      c_parser_error (parser, "not enough perfectly nested loops");
-	      if (bracecount)
-		{
-		  open_brace_parsed = true;
-		  bracecount--;
-		}
-	      fail = true;
-	      count = 0;
-	      break;
-	    }
-	}
-      while (1);
-
-      nbraces += bracecount;
-    }
-
-  if (nbraces)
-    if_p = NULL;
-
-  in_statement = IN_OMP_FOR;
-  body = push_stmt_list ();
-
-  if (inscan)
-    c_parser_omp_scan_loop_body (parser, open_brace_parsed);
-  else if (open_brace_parsed)
-    {
-      location_t here = c_parser_peek_token (parser)->location;
-      stmt = c_begin_compound_stmt (true);
-      c_parser_compound_statement_nostart (parser);
-      add_stmt (c_end_compound_stmt (here, stmt, true));
-    }
-  else
-    add_stmt (c_parser_c99_block_statement (parser, if_p));
-
-  body = pop_stmt_list (body);
-  in_statement = save_in_statement;
-
-  while (nbraces)
-    {
-      if (c_parser_next_token_is (parser, CPP_CLOSE_BRACE))
-	{
-	  c_parser_consume_token (parser);
-	  nbraces--;
-	}
-      else if (c_parser_next_token_is (parser, CPP_SEMICOLON))
-	c_parser_consume_token (parser);
-      else
-	{
-	  c_parser_error (parser, "collapsed loops not perfectly nested");
-	  while (nbraces)
-	    {
-	      location_t here = c_parser_peek_token (parser)->location;
-	      stmt = c_begin_compound_stmt (true);
-	      add_stmt (body);
-	      c_parser_compound_statement_nostart (parser);
-	      body = c_end_compound_stmt (here, stmt, true);
-	      nbraces--;
-	    }
-	  goto pop_scopes;
-	}
+  /* Initialize parse state for recursive descent.  */
+  data.declv = make_tree_vec (count);
+  data.initv = make_tree_vec (count);
+  data.condv = make_tree_vec (count);
+  data.incrv = make_tree_vec (count);
+  data.pre_body = NULL_TREE;;
+  data.bindings = NULL_TREE;
+  data.for_loc = for_loc;
+  data.collapse = count;
+  data.depth = 0;
+  data.want_nested_loop = true;
+  data.ordered = ordered > 0;
+  data.in_intervening_code = false;
+  data.perfect_nesting_fail = false;
+  data.fail = false;
+  data.inscan = inscan;
+  data.code = code;
+  parser->omp_for_parse_state = &data;
+
+  body = c_parser_omp_loop_nest (parser, if_p);
+  for (tree t = data.bindings; t; )
+    {
+      tree n = TREE_CHAIN (t);
+      TREE_CHAIN (t) = NULL_TREE;
+      pushdecl (t);
+      t = n;
     }
 
   /* Only bother calling c_finish_omp_for if we haven't already generated
      an error from the initialization parsing.  */
-  if (!fail)
+  if (!data.fail)
     {
       c_in_omp_for = true;
-      stmt = c_finish_omp_for (loc, code, declv, NULL, initv, condv,
-			       incrv, body, pre_body, true);
+      stmt = c_finish_omp_for (loc, code, data.declv, NULL, data.initv,
+			       data.condv, data.incrv,
+			       body, data.pre_body, true);
       c_in_omp_for = false;
 
       /* Check for iterators appearing in lb, b or incr expressions.  */
-      if (stmt && !c_omp_check_loop_iv (stmt, declv, NULL))
+      if (stmt && !c_omp_check_loop_iv (stmt, data.declv, NULL))
 	stmt = NULL_TREE;
 
       if (stmt)
@@ -20538,7 +20741,7 @@  c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
 		else
 		  {
 		    for (i = 0; i < count; i++)
-		      if (TREE_VEC_ELT (declv, i) == OMP_CLAUSE_DECL (*c))
+		      if (TREE_VEC_ELT (data.declv, i) == OMP_CLAUSE_DECL (*c))
 			break;
 		    if (i == count)
 		      c = &OMP_CLAUSE_CHAIN (*c);
@@ -20551,7 +20754,8 @@  c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
 		      }
 		    else
 		      {
-			/* Move lastprivate (decl) clause to OMP_FOR_CLAUSES.  */
+			/* Move lastprivate (decl) clause to
+			   OMP_FOR_CLAUSES.  */
 			tree l = *c;
 			*c = OMP_CLAUSE_CHAIN (*c);
 			if (code == OMP_SIMD)
@@ -20572,16 +20776,8 @@  c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
 	}
       ret = stmt;
     }
-pop_scopes:
-  while (!for_block->is_empty ())
-    {
-      /* FIXME diagnostics: LOC below should be the actual location of
-	 this particular for block.  We need to build a list of
-	 locations to go along with FOR_BLOCK.  */
-      stmt = c_end_compound_stmt (loc, for_block->pop (), true);
-      add_stmt (stmt);
-    }
-  release_tree_vector (for_block);
+
+  parser->omp_for_parse_state = save_data;
   return ret;
 }
 
diff --git a/gcc/omp-api.h b/gcc/omp-api.h
new file mode 100644
index 00000000000..2a7ec7b72a6
--- /dev/null
+++ b/gcc/omp-api.h
@@ -0,0 +1,32 @@ 
+/* Functions for querying whether a function name is reserved by the
+   OpenMP API.  This is used for error checking.
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_OMP_API_H
+#define GCC_OMP_API_H
+
+#include "coretypes.h"
+
+/* In omp-general.cc, but declared in a separate header file for
+   convenience of the Fortran front end.  */
+extern bool omp_runtime_api_procname (const char *name);
+extern bool omp_runtime_api_call (const_tree fndecl);
+
+#endif
diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index eefdcb54590..1e31014c454 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -3013,4 +3013,138 @@  omp_build_component_ref (tree obj, tree field)
   return ret;
 }
 
+/* Return true if NAME is the name of an omp_* runtime API call.  */
+bool
+omp_runtime_api_procname (const char *name)
+{
+  if (!startswith (name, "omp_"))
+    return false;
+
+  static const char *omp_runtime_apis[] =
+    {
+      /* This array has 3 sections.  First omp_* calls that don't
+	 have any suffixes.  */
+      "aligned_alloc",
+      "aligned_calloc",
+      "alloc",
+      "calloc",
+      "free",
+      "get_mapped_ptr",
+      "realloc",
+      "target_alloc",
+      "target_associate_ptr",
+      "target_disassociate_ptr",
+      "target_free",
+      "target_is_accessible",
+      "target_is_present",
+      "target_memcpy",
+      "target_memcpy_async",
+      "target_memcpy_rect",
+      "target_memcpy_rect_async",
+      NULL,
+      /* Now omp_* calls that are available as omp_* and omp_*_; however, the
+	 DECL_NAME is always omp_* without tailing underscore.  */
+      "capture_affinity",
+      "destroy_allocator",
+      "destroy_lock",
+      "destroy_nest_lock",
+      "display_affinity",
+      "fulfill_event",
+      "get_active_level",
+      "get_affinity_format",
+      "get_cancellation",
+      "get_default_allocator",
+      "get_default_device",
+      "get_device_num",
+      "get_dynamic",
+      "get_initial_device",
+      "get_level",
+      "get_max_active_levels",
+      "get_max_task_priority",
+      "get_max_teams",
+      "get_max_threads",
+      "get_nested",
+      "get_num_devices",
+      "get_num_places",
+      "get_num_procs",
+      "get_num_teams",
+      "get_num_threads",
+      "get_partition_num_places",
+      "get_place_num",
+      "get_proc_bind",
+      "get_supported_active_levels",
+      "get_team_num",
+      "get_teams_thread_limit",
+      "get_thread_limit",
+      "get_thread_num",
+      "get_wtick",
+      "get_wtime",
+      "in_explicit_task",
+      "in_final",
+      "in_parallel",
+      "init_lock",
+      "init_nest_lock",
+      "is_initial_device",
+      "pause_resource",
+      "pause_resource_all",
+      "set_affinity_format",
+      "set_default_allocator",
+      "set_lock",
+      "set_nest_lock",
+      "test_lock",
+      "test_nest_lock",
+      "unset_lock",
+      "unset_nest_lock",
+      NULL,
+      /* And finally calls available as omp_*, omp_*_ and omp_*_8_; however,
+	 as DECL_NAME only omp_* and omp_*_8 appear.  */
+      "display_env",
+      "get_ancestor_thread_num",
+      "init_allocator",
+      "get_partition_place_nums",
+      "get_place_num_procs",
+      "get_place_proc_ids",
+      "get_schedule",
+      "get_team_size",
+      "set_default_device",
+      "set_dynamic",
+      "set_max_active_levels",
+      "set_nested",
+      "set_num_teams",
+      "set_num_threads",
+      "set_schedule",
+      "set_teams_thread_limit"
+    };
+
+  int mode = 0;
+  for (unsigned i = 0; i < ARRAY_SIZE (omp_runtime_apis); i++)
+    {
+      if (omp_runtime_apis[i] == NULL)
+	{
+	  mode++;
+	  continue;
+	}
+      size_t len = strlen (omp_runtime_apis[i]);
+      if (strncmp (name + 4, omp_runtime_apis[i], len) == 0
+	  && (name[4 + len] == '\0'
+	      || (mode > 1 && strcmp (name + 4 + len, "_8") == 0)))
+	return true;
+    }
+  return false;
+}
+
+/* Return true if FNDECL is an omp_* runtime API call.  */
+
+bool
+omp_runtime_api_call (const_tree fndecl)
+{
+  tree declname = DECL_NAME (fndecl);
+  if (!declname
+      || (DECL_CONTEXT (fndecl) != NULL_TREE
+	  && TREE_CODE (DECL_CONTEXT (fndecl)) != TRANSLATION_UNIT_DECL)
+      || !TREE_PUBLIC (fndecl))
+    return false;
+  return omp_runtime_api_procname (IDENTIFIER_POINTER (declname));
+}
+
 #include "gt-omp-general.h"
diff --git a/gcc/omp-general.h b/gcc/omp-general.h
index 92717db1628..1a52bfdb56b 100644
--- a/gcc/omp-general.h
+++ b/gcc/omp-general.h
@@ -23,6 +23,7 @@  along with GCC; see the file COPYING3.  If not see
 #define GCC_OMP_GENERAL_H
 
 #include "gomp-constants.h"
+#include "omp-api.h"
 
 /*  Flags for an OpenACC loop.  */
 
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index dddf5b59d8f..77adf97896a 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -4006,135 +4006,6 @@  setjmp_or_longjmp_p (const_tree fndecl)
   return !strcmp (name, "setjmp") || !strcmp (name, "longjmp");
 }
 
-/* Return true if FNDECL is an omp_* runtime API call.  */
-
-static bool
-omp_runtime_api_call (const_tree fndecl)
-{
-  tree declname = DECL_NAME (fndecl);
-  if (!declname
-      || (DECL_CONTEXT (fndecl) != NULL_TREE
-          && TREE_CODE (DECL_CONTEXT (fndecl)) != TRANSLATION_UNIT_DECL)
-      || !TREE_PUBLIC (fndecl))
-    return false;
-
-  const char *name = IDENTIFIER_POINTER (declname);
-  if (!startswith (name, "omp_"))
-    return false;
-
-  static const char *omp_runtime_apis[] =
-    {
-      /* This array has 3 sections.  First omp_* calls that don't
-	 have any suffixes.  */
-      "aligned_alloc",
-      "aligned_calloc",
-      "alloc",
-      "calloc",
-      "free",
-      "get_mapped_ptr",
-      "realloc",
-      "target_alloc",
-      "target_associate_ptr",
-      "target_disassociate_ptr",
-      "target_free",
-      "target_is_accessible",
-      "target_is_present",
-      "target_memcpy",
-      "target_memcpy_async",
-      "target_memcpy_rect",
-      "target_memcpy_rect_async",
-      NULL,
-      /* Now omp_* calls that are available as omp_* and omp_*_; however, the
-	 DECL_NAME is always omp_* without tailing underscore.  */
-      "capture_affinity",
-      "destroy_allocator",
-      "destroy_lock",
-      "destroy_nest_lock",
-      "display_affinity",
-      "fulfill_event",
-      "get_active_level",
-      "get_affinity_format",
-      "get_cancellation",
-      "get_default_allocator",
-      "get_default_device",
-      "get_device_num",
-      "get_dynamic",
-      "get_initial_device",
-      "get_level",
-      "get_max_active_levels",
-      "get_max_task_priority",
-      "get_max_teams",
-      "get_max_threads",
-      "get_nested",
-      "get_num_devices",
-      "get_num_places",
-      "get_num_procs",
-      "get_num_teams",
-      "get_num_threads",
-      "get_partition_num_places",
-      "get_place_num",
-      "get_proc_bind",
-      "get_supported_active_levels",
-      "get_team_num",
-      "get_teams_thread_limit",
-      "get_thread_limit",
-      "get_thread_num",
-      "get_wtick",
-      "get_wtime",
-      "in_explicit_task",
-      "in_final",
-      "in_parallel",
-      "init_lock",
-      "init_nest_lock",
-      "is_initial_device",
-      "pause_resource",
-      "pause_resource_all",
-      "set_affinity_format",
-      "set_default_allocator",
-      "set_lock",
-      "set_nest_lock",
-      "test_lock",
-      "test_nest_lock",
-      "unset_lock",
-      "unset_nest_lock",
-      NULL,
-      /* And finally calls available as omp_*, omp_*_ and omp_*_8_; however,
-	 as DECL_NAME only omp_* and omp_*_8 appear.  */
-      "display_env",
-      "get_ancestor_thread_num",
-      "init_allocator",
-      "get_partition_place_nums",
-      "get_place_num_procs",
-      "get_place_proc_ids",
-      "get_schedule",
-      "get_team_size",
-      "set_default_device",
-      "set_dynamic",
-      "set_max_active_levels",
-      "set_nested",
-      "set_num_teams",
-      "set_num_threads",
-      "set_schedule",
-      "set_teams_thread_limit"
-    };
-
-  int mode = 0;
-  for (unsigned i = 0; i < ARRAY_SIZE (omp_runtime_apis); i++)
-    {
-      if (omp_runtime_apis[i] == NULL)
-	{
-	  mode++;
-	  continue;
-	}
-      size_t len = strlen (omp_runtime_apis[i]);
-      if (strncmp (name + 4, omp_runtime_apis[i], len) == 0
-	  && (name[4 + len] == '\0'
-	      || (mode > 1 && strcmp (name + 4 + len, "_8") == 0)))
-	return true;
-    }
-  return false;
-}
-
 /* Helper function for scan_omp.
 
    Callback for walk_gimple_stmt used to scan for OMP directives in
diff --git a/gcc/testsuite/c-c++-common/goacc/collapse-1.c b/gcc/testsuite/c-c++-common/goacc/collapse-1.c
index 11b14383983..05025e64af7 100644
--- a/gcc/testsuite/c-c++-common/goacc/collapse-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/collapse-1.c
@@ -8,8 +8,8 @@  f1 (void)
 {
   #pragma acc parallel
   #pragma acc loop collapse (2)
-  for (i = 0; i < 5; i++)
-    ;					/* { dg-error "not enough perfectly nested" } */
+  for (i = 0; i < 5; i++)	/* { dg-error "not enough nested loops" } */
+    ;
   {
     for (j = 0; j < 5; j++)
       ;
@@ -40,7 +40,7 @@  f3 (void)
   #pragma acc loop collapse (2)
   for (i = 0; i < 5; i++)
     {
-      int k = foo ();			/* { dg-error "not enough perfectly nested" } */
+      int k = foo ();
       {
 	{
 	  for (j = 0; j < 5; j++)
@@ -56,12 +56,12 @@  f4 (void)
 {
   #pragma acc parallel
   #pragma acc loop collapse (2)
-  for (i = 0; i < 5; i++)
+  for (i = 0; i < 5; i++)	/* { dg-error "inner loops must be perfectly nested" } */
     {
       {
 	for (j = 0; j < 5; j++)
 	  ;
-	foo ();				/* { dg-error "collapsed loops not perfectly nested before" } */
+	foo ();
       }
     }
 }
@@ -71,13 +71,13 @@  f5 (void)
 {
   #pragma acc parallel
   #pragma acc loop collapse (2)
-  for (i = 0; i < 5; i++)
+  for (i = 0; i < 5; i++)	/* { dg-error "inner loops must be perfectly nested" } */
     {
       {
 	for (j = 0; j < 5; j++)
 	  ;
       }
-      foo ();				/* { dg-error "collapsed loops not perfectly nested before" } */
+      foo ();
     }
 }
 
diff --git a/gcc/testsuite/c-c++-common/goacc/tile-2.c b/gcc/testsuite/c-c++-common/goacc/tile-2.c
index c8b240d225b..dc306703260 100644
--- a/gcc/testsuite/c-c++-common/goacc/tile-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/tile-2.c
@@ -3,8 +3,8 @@  int main ()
 #pragma acc parallel
   {
 #pragma acc loop tile (*,*)
-    for (int ix = 0; ix < 30; ix++)
-      ; /* { dg-error "not enough" } */
+    for (int ix = 0; ix < 30; ix++) /* { dg-error "not enough" } */
+      ;
 
 #pragma acc loop tile (*,*)
     for (int ix = 0; ix < 30; ix++)
diff --git a/gcc/testsuite/c-c++-common/gomp/imperfect1.c b/gcc/testsuite/c-c++-common/gomp/imperfect1.c
new file mode 100644
index 00000000000..a95f972e0d7
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/imperfect1.c
@@ -0,0 +1,40 @@ 
+/* { dg-do compile } */
+
+/* This test case is expected to fail due to errors.  */
+
+static int f1count[3], f2count[3];
+
+int f1 (int depth, int iter);
+int f2 (int depth, int iter);
+
+void s1 (int a1, int a2, int a3)
+{
+  int i, j, k;
+
+#pragma omp for collapse(3)
+  for (i = 0; i < a1; i++)
+    {
+      f1 (0, i);
+      for (j = 0; j < a2; j++)
+	{
+#pragma omp barrier	/* { dg-error "intervening code must not contain OpenMP directives" } */
+	  f1 (1, j);
+	  if (i == 2)
+	    continue;	/* { dg-error "invalid exit" } */
+	  else
+	    break;	/* { dg-error "invalid exit" } */
+	  for (k = 0; k < a3; k++)
+	    {
+	      f1 (2, k);
+	      f2 (2, k);
+	    }
+	  f2 (1, j);
+	}
+      for (k = 0; k < a3; k++)	/* { dg-error "loop not permitted in intervening code " } */
+	{
+	  f1 (2, k);
+	  f2 (2, k);
+	}
+      f2 (0, i);
+    }
+}
diff --git a/gcc/testsuite/c-c++-common/gomp/imperfect2.c b/gcc/testsuite/c-c++-common/gomp/imperfect2.c
new file mode 100644
index 00000000000..9649f28503b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/imperfect2.c
@@ -0,0 +1,36 @@ 
+/* { dg-do compile } */
+
+/* This test case is expected to fail due to errors.  */
+
+/* These functions that are part of the OpenMP runtime API would ordinarily
+   be declared in omp.h, but we don't have that here.  */
+extern int omp_get_num_threads(void);
+extern int omp_get_max_threads(void);
+
+static int f1count[3], f2count[3];
+
+int f1 (int depth, int iter);
+int f2 (int depth, int iter);
+
+void s1 (int a1, int a2, int a3)
+{
+  int i, j, k;
+#pragma omp for collapse(3)
+  for (i = 0; i < a1; i++)
+    {
+      f1 (0, i);
+      for (j = 0; j < omp_get_num_threads (); j++)  /* This is OK */
+	{
+	  f1 (1, omp_get_num_threads ());  /* { dg-error "not permitted in intervening code" } */
+	  for (k = omp_get_num_threads (); k < a3; k++)  /* This is OK */
+	    {
+	      f1 (2, omp_get_num_threads ());
+	      f2 (2, omp_get_max_threads ());
+	    }
+	  f2 (1, omp_get_max_threads ());  /* { dg-error "not permitted in intervening code" } */
+	}
+      f2 (0, i);
+    }
+}
+
+
diff --git a/gcc/testsuite/c-c++-common/gomp/imperfect3.c b/gcc/testsuite/c-c++-common/gomp/imperfect3.c
new file mode 100644
index 00000000000..a29157094b0
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/imperfect3.c
@@ -0,0 +1,35 @@ 
+/* { dg-do compile } */
+
+/* This test case is expected to fail due to errors.  */
+
+/* Test that the imperfectly-nested loops with the ordered clause gives
+   an error, and that there is only one error (and not one on every
+   intervening statement).  */
+
+static int f1count[3], f2count[3];
+
+int f1 (int depth, int iter);
+int f2 (int depth, int iter);
+
+void s1 (int a1, int a2, int a3)
+{
+  int i, j, k;
+
+#pragma omp for ordered(3)
+  for (i = 0; i < a1; i++)  /* { dg-error "inner loops must be perfectly nested" } */
+    {
+      f1 (0, i);
+      for (j = 0; j < a2; j++)
+	{
+	  f1 (1, j);
+	  for (k = 0; k < a3; k++)
+	    {
+	      f1 (2, k);
+	      f2 (2, k);
+	    }
+	  f2 (1, j);
+	}
+      f2 (0, i);
+    }
+}
+
diff --git a/gcc/testsuite/c-c++-common/gomp/imperfect4.c b/gcc/testsuite/c-c++-common/gomp/imperfect4.c
new file mode 100644
index 00000000000..e5feff730a9
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/imperfect4.c
@@ -0,0 +1,35 @@ 
+/* { dg-do compile } */
+
+/* This test case is expected to fail due to errors.  */
+
+static int f1count[3], f2count[3];
+
+int f1 (int depth, int iter);
+int f2 (int depth, int iter);
+
+void s1 (int a1, int a2, int a3)
+{
+  int i, j, k;
+
+#pragma omp for collapse(4)
+  for (i = 0; i < a1; i++)	/* { dg-error "not enough nested loops" } */
+    {
+      f1 (0, i);
+      for (j = 0; j < a2; j++)
+	{
+	  f1 (1, j);
+	  for (k = 0; k < a3; k++)
+	    {
+	      /* According to the grammar, this is intervening code; we
+		 don't know that we are also missing a nested for loop
+		 until we have parsed this whole compound expression.  */
+#pragma omp barrier	/* { dg-error "intervening code must not contain OpenMP directives" } */
+	      f1 (2, k);
+	      f2 (2, k);
+	    }
+	  f2 (1, j);
+	}
+      f2 (0, i);
+    }
+}
+
diff --git a/gcc/testsuite/c-c++-common/gomp/imperfect5.c b/gcc/testsuite/c-c++-common/gomp/imperfect5.c
new file mode 100644
index 00000000000..7294a816aba
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/imperfect5.c
@@ -0,0 +1,59 @@ 
+/* { dg-do compile } */
+
+/* This test case is expected to fail due to errors.  */
+
+static int f1count[3], f2count[3];
+
+int f1 (int depth, int iter);
+int f2 (int depth, int iter);
+int ijk (int x, int y, int z);
+void f3 (int sum);
+
+/* This function isn't particularly meaningful, but it should compile without
+   error.  */
+int s1 (int a1, int a2, int a3)
+{
+  int i, j, k;
+  int r = 0;
+
+#pragma omp simd collapse(3) reduction (inscan, +:r)
+  for (i = 0; i < a1; i++)
+    {
+      for (j = 0; j < a2; j++)
+	{
+	  for (k = 0; k < a3; k++)
+	    {
+	      r = r + ijk (i, j, k);
+#pragma omp scan exclusive (r)
+	      f3 (r);
+	    }
+	}
+    }
+  return r;
+}
+
+/* Adding intervening code should trigger an error.  */
+int s2 (int a1, int a2, int a3)
+{
+  int i, j, k;
+  int r = 0;
+
+#pragma omp simd collapse(3) reduction (inscan, +:r)
+  for (i = 0; i < a1; i++)  /* { dg-error "inner loops must be perfectly nested" } */
+    {
+      f1 (0, i);
+      for (j = 0; j < a2; j++)
+	{
+	  f1 (1, j);
+	  for (k = 0; k < a3; k++)
+	    {
+	      r = r + ijk (i, j, k);
+#pragma omp scan exclusive (r)
+	      f3 (r);
+	    }
+	  f2 (1, j);
+	}
+      f2 (0, i);
+    }
+  return r;
+}
diff --git a/gcc/testsuite/gcc.dg/gomp/collapse-1.c b/gcc/testsuite/gcc.dg/gomp/collapse-1.c
index 89b76bb669c..16a102ff3fd 100644
--- a/gcc/testsuite/gcc.dg/gomp/collapse-1.c
+++ b/gcc/testsuite/gcc.dg/gomp/collapse-1.c
@@ -8,8 +8,8 @@  void
 f1 (void)
 {
   #pragma omp for collapse (2)
-  for (i = 0; i < 5; i++)
-    ;					/* { dg-error "not enough perfectly nested" } */
+  for (i = 0; i < 5; i++)	/* { dg-error "not enough nested loops" } */
+    ;
   {
     for (j = 0; j < 5; j++)
       ;
@@ -38,7 +38,7 @@  f3 (void)
   #pragma omp for collapse (2)
   for (i = 0; i < 5; i++)
     {
-      int k = foo ();			/* { dg-error "not enough perfectly nested" } */
+      int k = foo ();
       {
 	{
 	  for (j = 0; j < 5; j++)
@@ -58,7 +58,7 @@  f4 (void)
       {
 	for (j = 0; j < 5; j++)
 	  ;
-	foo ();				/* { dg-error "collapsed loops not perfectly nested before" } */
+	foo ();
       }
     }
 }
@@ -73,7 +73,7 @@  f5 (void)
 	for (j = 0; j < 5; j++)
 	  ;
       }
-      foo ();				/* { dg-error "collapsed loops not perfectly nested before" } */
+      foo ();
     }
 }
 
diff --git a/libgomp/testsuite/libgomp.c-c++-common/imperfect1.c b/libgomp/testsuite/libgomp.c-c++-common/imperfect1.c
new file mode 100644
index 00000000000..cafdcaf25b0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/imperfect1.c
@@ -0,0 +1,76 @@ 
+/* { dg-do run } */
+
+static int f1count[3], f2count[3];
+
+#ifndef __cplusplus
+extern void abort (void);
+#else
+extern "C" void abort (void);
+#endif
+
+int f1 (int depth, int iter)
+{
+  f1count[depth]++;
+  return iter;
+}
+
+int f2 (int depth, int iter)
+{
+  f2count[depth]++;
+  return iter;
+}
+
+void s1 (int a1, int a2, int a3)
+{
+  int i, j, k;
+
+#pragma omp for collapse(3)
+  for (i = 0; i < a1; i++)
+    {
+      f1 (0, i);
+      for (j = 0; j < a2; j++)
+	{
+	  f1 (1, j);
+	  for (k = 0; k < a3; k++)
+	    {
+	      f1 (2, k);
+	      f2 (2, k);
+	    }
+	  f2 (1, j);
+	}
+      f2 (0, i);
+    }
+}
+
+int
+main (void)
+{
+  f1count[0] = 0;
+  f1count[1] = 0;
+  f1count[2] = 0;
+  f2count[0] = 0;
+  f2count[1] = 0;
+  f2count[2] = 0;
+
+  s1 (3, 4, 5);
+
+  /* All intervening code at the same depth must be executed the same
+     number of times. */
+  if (f1count[0] != f2count[0]) abort ();
+  if (f1count[1] != f2count[1]) abort ();
+  if (f1count[2] != f2count[2]) abort ();
+
+  /* Intervening code must be executed at least as many times as the loop
+     that encloses it. */
+  if (f1count[0] < 3) abort ();
+  if (f1count[1] < 3 * 4) abort ();
+
+  /* Intervening code must not be executed more times than the number
+     of logical iterations. */
+  if (f1count[0] > 3 * 4 * 5) abort ();
+  if (f1count[1] > 3 * 4 * 5) abort ();
+
+  /* Check that the innermost loop body is executed exactly the number
+     of logical iterations expected. */
+  if (f1count[2] != 3 * 4 * 5) abort ();
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/imperfect2.c b/libgomp/testsuite/libgomp.c-c++-common/imperfect2.c
new file mode 100644
index 00000000000..e2098006eab
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/imperfect2.c
@@ -0,0 +1,114 @@ 
+/* { dg-do run } */
+
+static int f1count[3], f2count[3];
+static int g1count[3], g2count[3];
+
+#ifndef __cplusplus
+extern void abort (void);
+#else
+extern "C" void abort (void);
+#endif
+
+int f1 (int depth, int iter)
+{
+  f1count[depth]++;
+  return iter;
+}
+
+int f2 (int depth, int iter)
+{
+  f2count[depth]++;
+  return iter;
+}
+
+int g1 (int depth, int iter)
+{
+  g1count[depth]++;
+  return iter;
+}
+
+int g2 (int depth, int iter)
+{
+  g2count[depth]++;
+  return iter;
+}
+
+void s1 (int a1, int a2, int a3)
+{
+  int i, j, k;
+
+#pragma omp for collapse(3)
+  for (i = 0; i < a1; i++)
+    {
+      f1 (0, i);
+      {
+	g1 (0, i);
+	for (j = 0; j < a2; j++)
+	  {
+	    f1 (1, j);
+	    {
+	      g1 (1, j);
+	      for (k = 0; k < a3; k++)
+		{
+		  f1 (2, k);
+		  {
+		    g1 (2, k);
+		    g2 (2, k);
+		  }
+		  f2 (2, k);
+		}
+	      g2 (1, j);
+	    }
+	  f2 (1, j);
+	  }
+	g2 (0, i);
+      }
+      f2 (0, i);
+    }
+}
+
+int
+main (void)
+{
+  f1count[0] = 0;
+  f1count[1] = 0;
+  f1count[2] = 0;
+  f2count[0] = 0;
+  f2count[1] = 0;
+  f2count[2] = 0;
+
+  g1count[0] = 0;
+  g1count[1] = 0;
+  g1count[2] = 0;
+  g2count[0] = 0;
+  g2count[1] = 0;
+  g2count[2] = 0;
+
+  s1 (3, 4, 5);
+
+  /* All intervening code at the same depth must be executed the same
+     number of times. */
+  if (f1count[0] != f2count[0]) abort ();
+  if (f1count[1] != f2count[1]) abort ();
+  if (f1count[2] != f2count[2]) abort ();
+  if (g1count[0] != f1count[0]) abort ();
+  if (g2count[0] != f1count[0]) abort ();
+  if (g1count[1] != f1count[1]) abort ();
+  if (g2count[1] != f1count[1]) abort ();
+  if (g1count[2] != f1count[2]) abort ();
+  if (g2count[2] != f1count[2]) abort ();
+
+  /* Intervening code must be executed at least as many times as the loop
+     that encloses it. */
+  if (f1count[0] < 3) abort ();
+  if (f1count[1] < 3 * 4) abort ();
+
+  /* Intervening code must not be executed more times than the number
+     of logical iterations. */
+  if (f1count[0] > 3 * 4 * 5) abort ();
+  if (f1count[1] > 3 * 4 * 5) abort ();
+
+  /* Check that the innermost loop body is executed exactly the number
+     of logical iterations expected. */
+  if (f1count[2] != 3 * 4 * 5) abort ();
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/imperfect3.c b/libgomp/testsuite/libgomp.c-c++-common/imperfect3.c
new file mode 100644
index 00000000000..feb5e32d1d6
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/imperfect3.c
@@ -0,0 +1,119 @@ 
+/* { dg-do run } */
+
+/* Like imperfect2.c, but includes bindings in the blocks.  */
+
+static int f1count[3], f2count[3];
+static int g1count[3], g2count[3];
+
+#ifndef __cplusplus
+extern void abort (void);
+#else
+extern "C" void abort (void);
+#endif
+
+int f1 (int depth, int iter)
+{
+  f1count[depth]++;
+  return iter;
+}
+
+int f2 (int depth, int iter)
+{
+  f2count[depth]++;
+  return iter;
+}
+
+int g1 (int depth, int iter)
+{
+  g1count[depth]++;
+  return iter;
+}
+
+int g2 (int depth, int iter)
+{
+  g2count[depth]++;
+  return iter;
+}
+
+void s1 (int a1, int a2, int a3)
+{
+  int i, j, k;
+
+#pragma omp for collapse(3)
+  for (i = 0; i < a1; i++)
+    {
+      int local0 = 0;
+      f1 (local0, i);
+      {
+	g1 (local0, i);
+	for (j = 0; j < a2; j++)
+	  {
+	    int local1 = 1;
+	    f1 (local1, j);
+	    {
+	      g1 (local1, j);
+	      for (k = 0; k < a3; k++)
+		{
+		  int local2 = 2;
+		  f1 (local2, k);
+		  {
+		    g1 (local2, k);
+		    g2 (local2, k);
+		  }
+		  f2 (local2, k);
+		}
+	      g2 (local1, j);
+	    }
+	  f2 (local1, j);
+	  }
+	g2 (local0, i);
+      }
+      f2 (local0, i);
+    }
+}
+
+int
+main (void)
+{
+  f1count[0] = 0;
+  f1count[1] = 0;
+  f1count[2] = 0;
+  f2count[0] = 0;
+  f2count[1] = 0;
+  f2count[2] = 0;
+
+  g1count[0] = 0;
+  g1count[1] = 0;
+  g1count[2] = 0;
+  g2count[0] = 0;
+  g2count[1] = 0;
+  g2count[2] = 0;
+
+  s1 (3, 4, 5);
+
+  /* All intervening code at the same depth must be executed the same
+     number of times. */
+  if (f1count[0] != f2count[0]) abort ();
+  if (f1count[1] != f2count[1]) abort ();
+  if (f1count[2] != f2count[2]) abort ();
+  if (g1count[0] != f1count[0]) abort ();
+  if (g2count[0] != f1count[0]) abort ();
+  if (g1count[1] != f1count[1]) abort ();
+  if (g2count[1] != f1count[1]) abort ();
+  if (g1count[2] != f1count[2]) abort ();
+  if (g2count[2] != f1count[2]) abort ();
+
+  /* Intervening code must be executed at least as many times as the loop
+     that encloses it. */
+  if (f1count[0] < 3) abort ();
+  if (f1count[1] < 3 * 4) abort ();
+
+  /* Intervening code must not be executed more times than the number
+     of logical iterations. */
+  if (f1count[0] > 3 * 4 * 5) abort ();
+  if (f1count[1] > 3 * 4 * 5) abort ();
+
+  /* Check that the innermost loop body is executed exactly the number
+     of logical iterations expected. */
+  if (f1count[2] != 3 * 4 * 5) abort ();
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/imperfect4.c b/libgomp/testsuite/libgomp.c-c++-common/imperfect4.c
new file mode 100644
index 00000000000..e29301bfbad
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/imperfect4.c
@@ -0,0 +1,117 @@ 
+/* { dg-do run } */
+
+/* Like imperfect2.c, but includes blocks that are themselves intervening
+   code.  */
+
+static int f1count[3], f2count[3];
+static int g1count[3], g2count[3];
+
+#ifndef __cplusplus
+extern void abort (void);
+#else
+extern "C" void abort (void);
+#endif
+
+int f1 (int depth, int iter)
+{
+  f1count[depth]++;
+  return iter;
+}
+
+int f2 (int depth, int iter)
+{
+  f2count[depth]++;
+  return iter;
+}
+
+int g1 (int depth, int iter)
+{
+  g1count[depth]++;
+  return iter;
+}
+
+int g2 (int depth, int iter)
+{
+  g2count[depth]++;
+  return iter;
+}
+
+void s1 (int a1, int a2, int a3)
+{
+  int i, j, k;
+
+#pragma omp for collapse(3)
+  for (i = 0; i < a1; i++)
+    {
+      { f1 (0, i); }
+      {
+	g1 (0, i);
+	for (j = 0; j < a2; j++)
+	  {
+	    { f1 (1, j); }
+	    {
+	      { g1 (1, j); }
+	      for (k = 0; k < a3; k++)
+		{
+		  f1 (2, k);
+		  {
+		    g1 (2, k);
+		    g2 (2, k);
+		  }
+		  f2 (2, k);
+		}
+	      { g2 (1, j); }
+	    }
+	    { f2 (1, j); }
+	  }
+	{ g2 (0, i); }
+      }
+      { f2 (0, i); }
+    }
+}
+
+int
+main (void)
+{
+  f1count[0] = 0;
+  f1count[1] = 0;
+  f1count[2] = 0;
+  f2count[0] = 0;
+  f2count[1] = 0;
+  f2count[2] = 0;
+
+  g1count[0] = 0;
+  g1count[1] = 0;
+  g1count[2] = 0;
+  g2count[0] = 0;
+  g2count[1] = 0;
+  g2count[2] = 0;
+
+  s1 (3, 4, 5);
+
+  /* All intervening code at the same depth must be executed the same
+     number of times. */
+  if (f1count[0] != f2count[0]) abort ();
+  if (f1count[1] != f2count[1]) abort ();
+  if (f1count[2] != f2count[2]) abort ();
+  if (g1count[0] != f1count[0]) abort ();
+  if (g2count[0] != f1count[0]) abort ();
+  if (g1count[1] != f1count[1]) abort ();
+  if (g2count[1] != f1count[1]) abort ();
+  if (g1count[2] != f1count[2]) abort ();
+  if (g2count[2] != f1count[2]) abort ();
+
+  /* Intervening code must be executed at least as many times as the loop
+     that encloses it. */
+  if (f1count[0] < 3) abort ();
+  if (f1count[1] < 3 * 4) abort ();
+
+  /* Intervening code must not be executed more times than the number
+     of logical iterations. */
+  if (f1count[0] > 3 * 4 * 5) abort ();
+  if (f1count[1] > 3 * 4 * 5) abort ();
+
+  /* Check that the innermost loop body is executed exactly the number
+     of logical iterations expected. */
+  if (f1count[2] != 3 * 4 * 5) abort ();
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/imperfect5.c b/libgomp/testsuite/libgomp.c-c++-common/imperfect5.c
new file mode 100644
index 00000000000..7bd4f12d472
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/imperfect5.c
@@ -0,0 +1,49 @@ 
+/* { dg-do run } */
+
+#ifndef __cplusplus
+extern void abort (void);
+#else
+extern "C" void abort (void);
+#endif
+
+static int inner_loop_count = 0;
+static int intervening_code_count = 0;
+
+void
+g (int x, int y)
+{
+  inner_loop_count++;
+}
+
+int
+foo (int imax, int jmax)
+{
+  int j = 0;
+
+#pragma omp for collapse(2)
+  for (int i = 0; i < imax; ++i)
+    {
+      /* All the intervening code at the same level must be executed
+	 the same number of times.  */
+      ++intervening_code_count;
+      for (int j = 0; j < jmax; ++j)
+	{
+	  g (i, j);
+	}
+      /* This is the outer j, not the one from the inner collapsed loop.  */
+      ++j;
+    }
+  return j;
+}
+
+int
+main (void)
+{
+  int j = foo (5, 3);
+  if (j != intervening_code_count)
+    abort ();
+  if (inner_loop_count != 5 * 3)
+    abort ();
+  if (intervening_code_count < 5 || intervening_code_count > 5 * 3)
+    abort ();
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/imperfect6.c b/libgomp/testsuite/libgomp.c-c++-common/imperfect6.c
new file mode 100644
index 00000000000..808c6540890
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/imperfect6.c
@@ -0,0 +1,115 @@ 
+/* { dg-do run } */
+
+/* Like imperfect4.c, but bind the iteration variables in the loops.  */
+
+static int f1count[3], f2count[3];
+static int g1count[3], g2count[3];
+
+#ifndef __cplusplus
+extern void abort (void);
+#else
+extern "C" void abort (void);
+#endif
+
+int f1 (int depth, int iter)
+{
+  f1count[depth]++;
+  return iter;
+}
+
+int f2 (int depth, int iter)
+{
+  f2count[depth]++;
+  return iter;
+}
+
+int g1 (int depth, int iter)
+{
+  g1count[depth]++;
+  return iter;
+}
+
+int g2 (int depth, int iter)
+{
+  g2count[depth]++;
+  return iter;
+}
+
+void s1 (int a1, int a2, int a3)
+{
+
+#pragma omp for collapse(3)
+  for (int i = 0; i < a1; i++)
+    {
+      { f1 (0, i); }
+      {
+	g1 (0, i);
+	for (int j = 0; j < a2; j++)
+	  {
+	    { f1 (1, j); }
+	    {
+	      { g1 (1, j); }
+	      for (int k = 0; k < a3; k++)
+		{
+		  f1 (2, k);
+		  {
+		    g1 (2, k);
+		    g2 (2, k);
+		  }
+		  f2 (2, k);
+		}
+	      { g2 (1, j); }
+	    }
+	    { f2 (1, j); }
+	  }
+	{ g2 (0, i); }
+      }
+      { f2 (0, i); }
+    }
+}
+
+int
+main (void)
+{
+  f1count[0] = 0;
+  f1count[1] = 0;
+  f1count[2] = 0;
+  f2count[0] = 0;
+  f2count[1] = 0;
+  f2count[2] = 0;
+
+  g1count[0] = 0;
+  g1count[1] = 0;
+  g1count[2] = 0;
+  g2count[0] = 0;
+  g2count[1] = 0;
+  g2count[2] = 0;
+
+  s1 (3, 4, 5);
+
+  /* All intervening code at the same depth must be executed the same
+     number of times. */
+  if (f1count[0] != f2count[0]) abort ();
+  if (f1count[1] != f2count[1]) abort ();
+  if (f1count[2] != f2count[2]) abort ();
+  if (g1count[0] != f1count[0]) abort ();
+  if (g2count[0] != f1count[0]) abort ();
+  if (g1count[1] != f1count[1]) abort ();
+  if (g2count[1] != f1count[1]) abort ();
+  if (g1count[2] != f1count[2]) abort ();
+  if (g2count[2] != f1count[2]) abort ();
+
+  /* Intervening code must be executed at least as many times as the loop
+     that encloses it. */
+  if (f1count[0] < 3) abort ();
+  if (f1count[1] < 3 * 4) abort ();
+
+  /* Intervening code must not be executed more times than the number
+     of logical iterations. */
+  if (f1count[0] > 3 * 4 * 5) abort ();
+  if (f1count[1] > 3 * 4 * 5) abort ();
+
+  /* Check that the innermost loop body is executed exactly the number
+     of logical iterations expected. */
+  if (f1count[2] != 3 * 4 * 5) abort ();
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/offload-imperfect1.c b/libgomp/testsuite/libgomp.c-c++-common/offload-imperfect1.c
new file mode 100644
index 00000000000..53bc611ace3
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/offload-imperfect1.c
@@ -0,0 +1,81 @@ 
+/* { dg-do run } */
+
+/* Like imperfect1.c, but enables offloading.  */
+
+static int f1count[3], f2count[3];
+#pragma omp declare target enter (f1count, f2count)
+
+#ifndef __cplusplus
+extern void abort (void);
+#else
+extern "C" void abort (void);
+#endif
+
+int f1 (int depth, int iter)
+{
+  #pragma omp atomic
+  f1count[depth]++;
+  return iter;
+}
+
+int f2 (int depth, int iter)
+{
+  #pragma omp atomic
+  f2count[depth]++;
+  return iter;
+}
+
+void s1 (int a1, int a2, int a3)
+{
+  int i, j, k;
+
+#pragma omp target parallel for collapse(3) map(always, tofrom:f1count, f2count)
+  for (i = 0; i < a1; i++)
+    {
+      f1 (0, i);
+      for (j = 0; j < a2; j++)
+	{
+	  f1 (1, j);
+	  for (k = 0; k < a3; k++)
+	    {
+	      f1 (2, k);
+	      f2 (2, k);
+	    }
+	  f2 (1, j);
+	}
+      f2 (0, i);
+    }
+}
+
+int
+main (void)
+{
+  f1count[0] = 0;
+  f1count[1] = 0;
+  f1count[2] = 0;
+  f2count[0] = 0;
+  f2count[1] = 0;
+  f2count[2] = 0;
+
+  s1 (3, 4, 5);
+
+  /* All intervening code at the same depth must be executed the same
+     number of times. */
+  if (f1count[0] != f2count[0]) abort ();
+  if (f1count[1] != f2count[1]) abort ();
+  if (f1count[2] != f2count[2]) abort ();
+
+  /* Intervening code must be executed at least as many times as the loop
+     that encloses it. */
+  if (f1count[0] < 3) abort ();
+  if (f1count[1] < 3 * 4) abort ();
+
+  /* Intervening code must not be executed more times than the number
+     of logical iterations. */
+  if (f1count[0] > 3 * 4 * 5) abort ();
+  if (f1count[1] > 3 * 4 * 5) abort ();
+
+  /* Check that the innermost loop body is executed exactly the number
+     of logical iterations expected. */
+  if (f1count[2] != 3 * 4 * 5) abort ();
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/offload-imperfect2.c b/libgomp/testsuite/libgomp.c-c++-common/offload-imperfect2.c
new file mode 100644
index 00000000000..bc2901a517e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/offload-imperfect2.c
@@ -0,0 +1,122 @@ 
+/* { dg-do run } */
+
+/* Like imperfect2.c, but enables offloading.  */
+
+static int f1count[3], f2count[3];
+static int g1count[3], g2count[3];
+#pragma omp declare target enter (f1count, f2count)
+#pragma omp declare target enter (g1count, g2count)
+
+#ifndef __cplusplus
+extern void abort (void);
+#else
+extern "C" void abort (void);
+#endif
+
+int f1 (int depth, int iter)
+{
+  #pragma omp atomic
+  f1count[depth]++;
+  return iter;
+}
+
+int f2 (int depth, int iter)
+{
+  #pragma omp atomic
+  f2count[depth]++;
+  return iter;
+}
+
+int g1 (int depth, int iter)
+{
+  #pragma omp atomic
+  g1count[depth]++;
+  return iter;
+}
+
+int g2 (int depth, int iter)
+{
+  #pragma omp atomic
+  g2count[depth]++;
+  return iter;
+}
+
+void s1 (int a1, int a2, int a3)
+{
+  int i, j, k;
+
+#pragma omp target parallel for collapse(3) map(always, tofrom:f1count, f2count, g1count, g2count)
+  for (i = 0; i < a1; i++)
+    {
+      f1 (0, i);
+      {
+	g1 (0, i);
+	for (j = 0; j < a2; j++)
+	  {
+	    f1 (1, j);
+	    {
+	      g1 (1, j);
+	      for (k = 0; k < a3; k++)
+		{
+		  f1 (2, k);
+		  {
+		    g1 (2, k);
+		    g2 (2, k);
+		  }
+		  f2 (2, k);
+		}
+	      g2 (1, j);
+	    }
+	  f2 (1, j);
+	  }
+	g2 (0, i);
+      }
+      f2 (0, i);
+    }
+}
+
+int
+main (void)
+{
+  f1count[0] = 0;
+  f1count[1] = 0;
+  f1count[2] = 0;
+  f2count[0] = 0;
+  f2count[1] = 0;
+  f2count[2] = 0;
+
+  g1count[0] = 0;
+  g1count[1] = 0;
+  g1count[2] = 0;
+  g2count[0] = 0;
+  g2count[1] = 0;
+  g2count[2] = 0;
+
+  s1 (3, 4, 5);
+
+  /* All intervening code at the same depth must be executed the same
+     number of times. */
+  if (f1count[0] != f2count[0]) abort ();
+  if (f1count[1] != f2count[1]) abort ();
+  if (f1count[2] != f2count[2]) abort ();
+  if (g1count[0] != f1count[0]) abort ();
+  if (g2count[0] != f1count[0]) abort ();
+  if (g1count[1] != f1count[1]) abort ();
+  if (g2count[1] != f1count[1]) abort ();
+  if (g1count[2] != f1count[2]) abort ();
+  if (g2count[2] != f1count[2]) abort ();
+
+  /* Intervening code must be executed at least as many times as the loop
+     that encloses it. */
+  if (f1count[0] < 3) abort ();
+  if (f1count[1] < 3 * 4) abort ();
+
+  /* Intervening code must not be executed more times than the number
+     of logical iterations. */
+  if (f1count[0] > 3 * 4 * 5) abort ();
+  if (f1count[1] > 3 * 4 * 5) abort ();
+
+  /* Check that the innermost loop body is executed exactly the number
+     of logical iterations expected. */
+  if (f1count[2] != 3 * 4 * 5) abort ();
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/offload-imperfect3.c b/libgomp/testsuite/libgomp.c-c++-common/offload-imperfect3.c
new file mode 100644
index 00000000000..ddcfcf4b7eb
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/offload-imperfect3.c
@@ -0,0 +1,125 @@ 
+/* { dg-do run } */
+
+/* Like imperfect3.c, but enables offloading.  */
+
+static int f1count[3], f2count[3];
+static int g1count[3], g2count[3];
+#pragma omp declare target enter (f1count, f2count)
+#pragma omp declare target enter (g1count, g2count)
+
+#ifndef __cplusplus
+extern void abort (void);
+#else
+extern "C" void abort (void);
+#endif
+
+int f1 (int depth, int iter)
+{
+  #pragma omp atomic
+  f1count[depth]++;
+  return iter;
+}
+
+int f2 (int depth, int iter)
+{
+  #pragma omp atomic
+  f2count[depth]++;
+  return iter;
+}
+
+int g1 (int depth, int iter)
+{
+  #pragma omp atomic
+  g1count[depth]++;
+  return iter;
+}
+
+int g2 (int depth, int iter)
+{
+  #pragma omp atomic
+  g2count[depth]++;
+  return iter;
+}
+
+void s1 (int a1, int a2, int a3)
+{
+  int i, j, k;
+
+#pragma omp target parallel for collapse(3) map(always, tofrom:f1count, f2count, g1count, g2count)
+  for (i = 0; i < a1; i++)
+    {
+      int local0 = 0;
+      f1 (local0, i);
+      {
+	g1 (local0, i);
+	for (j = 0; j < a2; j++)
+	  {
+	    int local1 = 1;
+	    f1 (local1, j);
+	    {
+	      g1 (local1, j);
+	      for (k = 0; k < a3; k++)
+		{
+		  int local2 = 2;
+		  f1 (local2, k);
+		  {
+		    g1 (local2, k);
+		    g2 (local2, k);
+		  }
+		  f2 (local2, k);
+		}
+	      g2 (local1, j);
+	    }
+	  f2 (local1, j);
+	  }
+	g2 (local0, i);
+      }
+      f2 (local0, i);
+    }
+}
+
+int
+main (void)
+{
+  f1count[0] = 0;
+  f1count[1] = 0;
+  f1count[2] = 0;
+  f2count[0] = 0;
+  f2count[1] = 0;
+  f2count[2] = 0;
+
+  g1count[0] = 0;
+  g1count[1] = 0;
+  g1count[2] = 0;
+  g2count[0] = 0;
+  g2count[1] = 0;
+  g2count[2] = 0;
+
+  s1 (3, 4, 5);
+
+  /* All intervening code at the same depth must be executed the same
+     number of times. */
+  if (f1count[0] != f2count[0]) abort ();
+  if (f1count[1] != f2count[1]) abort ();
+  if (f1count[2] != f2count[2]) abort ();
+  if (g1count[0] != f1count[0]) abort ();
+  if (g2count[0] != f1count[0]) abort ();
+  if (g1count[1] != f1count[1]) abort ();
+  if (g2count[1] != f1count[1]) abort ();
+  if (g1count[2] != f1count[2]) abort ();
+  if (g2count[2] != f1count[2]) abort ();
+
+  /* Intervening code must be executed at least as many times as the loop
+     that encloses it. */
+  if (f1count[0] < 3) abort ();
+  if (f1count[1] < 3 * 4) abort ();
+
+  /* Intervening code must not be executed more times than the number
+     of logical iterations. */
+  if (f1count[0] > 3 * 4 * 5) abort ();
+  if (f1count[1] > 3 * 4 * 5) abort ();
+
+  /* Check that the innermost loop body is executed exactly the number
+     of logical iterations expected. */
+  if (f1count[2] != 3 * 4 * 5) abort ();
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/offload-imperfect4.c b/libgomp/testsuite/libgomp.c-c++-common/offload-imperfect4.c
new file mode 100644
index 00000000000..ede488977b8
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/offload-imperfect4.c
@@ -0,0 +1,122 @@ 
+/* { dg-do run } */
+
+/* Like imperfect4.c, but enables offloading.  */
+
+static int f1count[3], f2count[3];
+static int g1count[3], g2count[3];
+#pragma omp declare target enter (f1count, f2count)
+#pragma omp declare target enter (g1count, g2count)
+
+#ifndef __cplusplus
+extern void abort (void);
+#else
+extern "C" void abort (void);
+#endif
+
+int f1 (int depth, int iter)
+{
+  #pragma omp atomic
+  f1count[depth]++;
+  return iter;
+}
+
+int f2 (int depth, int iter)
+{
+  #pragma omp atomic
+  f2count[depth]++;
+  return iter;
+}
+
+int g1 (int depth, int iter)
+{
+  #pragma omp atomic
+  g1count[depth]++;
+  return iter;
+}
+
+int g2 (int depth, int iter)
+{
+  #pragma omp atomic
+  g2count[depth]++;
+  return iter;
+}
+
+void s1 (int a1, int a2, int a3)
+{
+  int i, j, k;
+
+#pragma omp target parallel for collapse(3) map(always, tofrom:f1count, f2count, g1count, g2count)
+  for (i = 0; i < a1; i++)
+    {
+      { f1 (0, i); }
+      {
+	g1 (0, i);
+	for (j = 0; j < a2; j++)
+	  {
+	    { f1 (1, j); }
+	    {
+	      { g1 (1, j); }
+	      for (k = 0; k < a3; k++)
+		{
+		  f1 (2, k);
+		  {
+		    g1 (2, k);
+		    g2 (2, k);
+		  }
+		  f2 (2, k);
+		}
+	      { g2 (1, j); }
+	    }
+	    { f2 (1, j); }
+	  }
+	{ g2 (0, i); }
+      }
+      { f2 (0, i); }
+    }
+}
+
+int
+main (void)
+{
+  f1count[0] = 0;
+  f1count[1] = 0;
+  f1count[2] = 0;
+  f2count[0] = 0;
+  f2count[1] = 0;
+  f2count[2] = 0;
+
+  g1count[0] = 0;
+  g1count[1] = 0;
+  g1count[2] = 0;
+  g2count[0] = 0;
+  g2count[1] = 0;
+  g2count[2] = 0;
+
+  s1 (3, 4, 5);
+
+  /* All intervening code at the same depth must be executed the same
+     number of times. */
+  if (f1count[0] != f2count[0]) abort ();
+  if (f1count[1] != f2count[1]) abort ();
+  if (f1count[2] != f2count[2]) abort ();
+  if (g1count[0] != f1count[0]) abort ();
+  if (g2count[0] != f1count[0]) abort ();
+  if (g1count[1] != f1count[1]) abort ();
+  if (g2count[1] != f1count[1]) abort ();
+  if (g1count[2] != f1count[2]) abort ();
+  if (g2count[2] != f1count[2]) abort ();
+
+  /* Intervening code must be executed at least as many times as the loop
+     that encloses it. */
+  if (f1count[0] < 3) abort ();
+  if (f1count[1] < 3 * 4) abort ();
+
+  /* Intervening code must not be executed more times than the number
+     of logical iterations. */
+  if (f1count[0] > 3 * 4 * 5) abort ();
+  if (f1count[1] > 3 * 4 * 5) abort ();
+
+  /* Check that the innermost loop body is executed exactly the number
+     of logical iterations expected. */
+  if (f1count[2] != 3 * 4 * 5) abort ();
+}