libcpp, v3: Named universal character escapes and delimited escape sequence tweaks

Message ID YxMsnC5ei4zydz+4@tucnak
State New, archived

Commit Message

Jakub Jelinek Sept. 3, 2022, 10:29 a.m. UTC
  On Thu, Sep 01, 2022 at 03:00:28PM -0400, Jason Merrill wrote:
> We might as well use the same flag name, and document it to mean what it
> currently means for GCC.

Ok, the following patch introduces -Wunicode (on by default).

> It looks like this is handling \N{abc}, for which "incomplete" seems like
> the wrong description; it's complete, just wrong, and the diagnostic doesn't
> help correct it.

It will also emit the "is not a valid universal character" diagnostic,
with a "did you mean" note, if the name matches loosely; otherwise it will
use the "not terminated with } after ..." wording.
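The "did you mean" note depends on loose name matching under Unicode rule UAX44-LM2 (the patch calls `_cpp_uname2c_uax44_lm2` for it). The rule ignores case, whitespace, underscores and medial hyphens. Below is a simplified C sketch of that idea, not the libcpp implementation. Assumptions: `lm2_fold`, `lm2_loose_equal` and `loose_lookup` are hypothetical names; a three-entry table replaces libcpp's name tree; a hyphen counts as medial when the input has a letter or digit on both sides (UAX44 defines this relative to the published name); and the U+1180 HANGUL JUNGSEONG O-E exception is not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for libcpp's Unicode character name tree.  */
static const char *const toy_names[] = {
  "LATIN SMALL LETTER A WITH ACUTE",
  "HYPHEN-MINUS",
  "TIBETAN LETTER -A",
};

/* Fold NAME (LEN bytes) into OUT using a simplified UAX44-LM2:
   drop whitespace and '_', drop hyphens that have a letter or digit
   on both sides, and upper-case everything else.  OUT needs LEN + 1
   bytes.  */
static void
lm2_fold (const char *name, size_t len, char *out)
{
  size_t n = 0;
  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char) name[i];
      if (isspace (c) || c == '_')
        continue;
      if (c == '-' && i > 0 && i + 1 < len
          && isalnum ((unsigned char) name[i - 1])
          && isalnum ((unsigned char) name[i + 1]))
        continue;               /* Medial hyphen.  */
      out[n++] = (char) toupper (c);
    }
  out[n] = '\0';
}

/* True if A and B are equal under the simplified loose matching.  */
bool
lm2_loose_equal (const char *a, const char *b)
{
  char fa[128], fb[128];
  size_t la = strlen (a), lb = strlen (b);
  assert (la < sizeof fa && lb < sizeof fb);
  lm2_fold (a, la, fa);
  lm2_fold (b, lb, fb);
  return strcmp (fa, fb) == 0;
}

/* Return the canonical name that loosely matches NAME, or NULL.
   A non-NULL result is what a "did you mean \N{%s}?" note would print.  */
const char *
loose_lookup (const char *name)
{
  for (size_t i = 0; i < sizeof toy_names / sizeof toy_names[0]; i++)
    if (lm2_loose_equal (name, toy_names[i]))
      return toy_names[i];
  return NULL;
}
```

With this sketch `\N{abc}` has no loose match, so only the warning would be emitted. `Latin_Small_Letter_A_With_Acute` folds to the same key as `LATIN SMALL LETTER A WITH ACUTE`, which is the note the new tests expect. `TIBETAN LETTER A` does not match `TIBETAN LETTER -A` because that hyphen is not medial.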

Ok if it passes bootstrap/regtest?

2022-09-03  Jakub Jelinek  <jakub@redhat.com>

libcpp/
	* include/cpplib.h (struct cpp_options): Add cpp_warn_unicode member.
	(enum cpp_warning_reason): Add CPP_W_UNICODE.
	* init.cc (cpp_create_reader): Initialize cpp_warn_unicode.
	* charset.cc (_cpp_valid_ucn): In possible identifier contexts, don't
	handle \u{ or \N{ specially in -std=c* modes except -std=c++2{3,b}.
	In possible identifier contexts, don't emit an error and punt
	if \N isn't followed by {, or if \N{} surrounds some lower case
	letters or _.  In possible identifier contexts when not C++23, emit
	a warning rather than an error about unknown character names and
	treat them as separate tokens.  When treating \u{ or \N{ as separate
	tokens, emit warnings.
gcc/
	* doc/invoke.texi (-Wno-unicode): Document.
gcc/c-family/
	* c.opt (Winvalid-utf8): Use ObjC instead of objC.  Remove
	" in comments" from description.
	(Wunicode): New option.
gcc/testsuite/
	* c-c++-common/cpp/delimited-escape-seq-4.c: New test.
	* c-c++-common/cpp/delimited-escape-seq-5.c: New test.
	* c-c++-common/cpp/delimited-escape-seq-6.c: New test.
	* c-c++-common/cpp/delimited-escape-seq-7.c: New test.
	* c-c++-common/cpp/named-universal-char-escape-5.c: New test.
	* c-c++-common/cpp/named-universal-char-escape-6.c: New test.
	* c-c++-common/cpp/named-universal-char-escape-7.c: New test.
	* g++.dg/cpp23/named-universal-char-escape1.C: New test.
	* g++.dg/cpp23/named-universal-char-escape2.C: New test.
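The \u{ part of the libcpp entry can be pictured with the following simplified C model. It is a sketch under stated assumptions, not libcpp code. `classify_delimited_u` and the enum are hypothetical names. `iso_std` and `delimited_escape_seqs` stand in for `CPP_OPTION (pfile, std)` and `CPP_OPTION (pfile, delimited_escape_seqs)`. The overflow and valid-identifier-character checks of `_cpp_valid_ucn` are left out. `*stop` plays the role of `str` in the patch, so `stop - base` is the length printed by the `after %.*s` part of the warning.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical outcomes for \u in a possible identifier context.  */
enum delim_outcome
{
  DELIM_NOT_DELIMITED,     /* '{' not special: ordinary \uXXXX rules apply.  */
  DELIM_WARN_EMPTY,        /* "empty delimited escape sequence; ..."  */
  DELIM_WARN_UNTERMINATED, /* "'\u{' not terminated with '}' after ...; ..."  */
  DELIM_OK                 /* Well-formed \u{hex} (range checks omitted).  */
};

/* BASE points at the backslash of a "\u" escape.  ISO_STD models a
   strict -std=c* or -std=c++* mode (as opposed to -std=gnu*), and
   DELIMITED_ESCAPE_SEQS models C++23.  On return *STOP points where
   scanning stopped.  */
enum delim_outcome
classify_delimited_u (const char *base, bool iso_std,
                      bool delimited_escape_seqs, const char **stop)
{
  assert (base[0] == '\\' && base[1] == 'u');
  const char *str = base + 2;
  *stop = str;
  /* In strict ISO modes before C++23, \u{ is not special in identifiers.  */
  if (*str != '{' || (iso_std && !delimited_escape_seqs))
    return DELIM_NOT_DELIMITED;
  const char *digits = ++str;
  while (isxdigit ((unsigned char) *str))
    str++;
  *stop = str;
  if (*str != '}')
    return DELIM_WARN_UNTERMINATED;
  return str == digits ? DELIM_WARN_EMPTY : DELIM_OK;
}
```

For `a\u{12XYZ})` under -std=gnu99 the model reports an unterminated sequence with `*stop` just past `\u{12`. That matches the `after \u{12` text expected by delimited-escape-seq-4.c. Under -std=c17 the `{` is not special, so the escape is split without a diagnostic, as delimited-escape-seq-5.c expects for C.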



	Jakub
  

Comments

Jakub Jelinek Sept. 3, 2022, 10:54 a.m. UTC | #1
On Sat, Sep 03, 2022 at 12:29:52PM +0200, Jakub Jelinek wrote:
> On Thu, Sep 01, 2022 at 03:00:28PM -0400, Jason Merrill wrote:
> > We might as well use the same flag name, and document it to mean what it
> > currently means for GCC.
> 
> Ok, following patch introduces -Wunicode (on by default).
> 
> > It looks like this is handling \N{abc}, for which "incomplete" seems like
> > the wrong description; it's complete, just wrong, and the diagnostic doesn't
> > help correct it.
> 
> And also will emit the is not a valid universal character with did you mean
> if it matches loosely, otherwise will use the not terminated with } after
> ... wording.
> 
> Ok if it passes bootstrap/regtest?

Actually, it is simpler to treat the !strict case like the strict case,
except that outside of literals it always warns instead of emitting an error.

The following version does that.  The only difference in the testcases is in
the
int f = a\N{abc});
cases, where it emits different diagnostics.
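To make the resulting behavior concrete, here is a simplified C model of how this version classifies a `\N` escape in a possible identifier context. It is a sketch, not libcpp code. `classify_named_escape`, `toy_exact_lookup` and the enum names are hypothetical. The one-entry table stands in for libcpp's `uname2c` tree. `iso_std` and `delimited_escape_seqs` mirror `CPP_OPTION (pfile, std)` and `CPP_OPTION (pfile, delimited_escape_seqs)`. Later checks, such as whether the resolved character may appear in an identifier, are omitted.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

enum named_outcome
{
  NAMED_SILENT_SPLIT,      /* Punt without any diagnostic.  */
  NAMED_WARN_EMPTY,        /* "empty named universal character escape sequence; ..."  */
  NAMED_WARN_UNTERMINATED, /* "'\N{' not terminated with '}' after ...; ..."  */
  NAMED_WARN_INVALID,      /* "... is not a valid universal character; treating it as separate tokens"  */
  NAMED_ERROR_INVALID,     /* Hard error (C++23, strictly formed name).  */
  NAMED_FOUND              /* Name resolved; identifier checks would follow.  */
};

/* Hypothetical exact-name lookup; libcpp uses its uname2c tree.  */
static bool
toy_exact_lookup (const char *name, size_t len)
{
  static const char *const names[] = { "LATIN SMALL LETTER A WITH ACUTE" };
  for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
    if (strlen (names[i]) == len && memcmp (names[i], name, len) == 0)
      return true;
  return false;
}

/* TEXT points just past the "\N" of an escape inside an identifier.  */
enum named_outcome
classify_named_escape (const char *text, bool iso_std,
                       bool delimited_escape_seqs)
{
  assert (text != NULL);
  /* -std=c17, -std=c++20, ...: \N is never special inside identifiers.  */
  if (iso_std && !delimited_escape_seqs)
    return NAMED_SILENT_SPLIT;
  if (*text != '{')
    return NAMED_SILENT_SPLIT;
  const char *name = text + 1, *p = name;
  bool strict = true;
  while (isalnum ((unsigned char) *p) || *p == '_' || *p == ' ' || *p == '-')
    {
      /* Lower case letters and '_' can only ever match loosely.  */
      if (islower ((unsigned char) *p) || *p == '_')
        strict = false;
      p++;
    }
  if (*p != '}')
    return NAMED_WARN_UNTERMINATED;
  if (p == name)
    return NAMED_WARN_EMPTY;
  if (strict && toy_exact_lookup (name, (size_t) (p - name)))
    return NAMED_FOUND;
  /* !strict is handled like strict, except that in identifier context
     it always warns instead of emitting an error.  */
  if (!delimited_escape_seqs || !strict)
    return NAMED_WARN_INVALID;
  return NAMED_ERROR_INVALID;
}
```

With both flags false (a -std=gnu99 or -std=gnu++20 compile), `{abc}` yields `NAMED_WARN_INVALID`, matching the expectation for `int f = a\N{abc});` in named-universal-char-escape-5.c. With both flags true (C++23), `{NON-EXISTENT CHAR}` is a hard error, while the lowercase `{Latin_Small_Letter_A_With_Acute}` still only warns.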

2022-09-03  Jakub Jelinek  <jakub@redhat.com>

libcpp/
	* include/cpplib.h (struct cpp_options): Add cpp_warn_unicode member.
	(enum cpp_warning_reason): Add CPP_W_UNICODE.
	* init.cc (cpp_create_reader): Initialize cpp_warn_unicode.
	* charset.cc (_cpp_valid_ucn): In possible identifier contexts, don't
	handle \u{ or \N{ specially in -std=c* modes except -std=c++2{3,b}.
	In possible identifier contexts, don't emit an error and punt
	if \N isn't followed by {, or if \N{} surrounds some lower case
	letters or _.  In possible identifier contexts when not C++23, emit
	a warning rather than an error about unknown character names and
	treat them as separate tokens.  When treating \u{ or \N{ as separate
	tokens, emit warnings.
gcc/
	* doc/invoke.texi (-Wno-unicode): Document.
gcc/c-family/
	* c.opt (Winvalid-utf8): Use ObjC instead of objC.  Remove
	" in comments" from description.
	(Wunicode): New option.
gcc/testsuite/
	* c-c++-common/cpp/delimited-escape-seq-4.c: New test.
	* c-c++-common/cpp/delimited-escape-seq-5.c: New test.
	* c-c++-common/cpp/delimited-escape-seq-6.c: New test.
	* c-c++-common/cpp/delimited-escape-seq-7.c: New test.
	* c-c++-common/cpp/named-universal-char-escape-5.c: New test.
	* c-c++-common/cpp/named-universal-char-escape-6.c: New test.
	* c-c++-common/cpp/named-universal-char-escape-7.c: New test.
	* g++.dg/cpp23/named-universal-char-escape1.C: New test.
	* g++.dg/cpp23/named-universal-char-escape2.C: New test.

--- libcpp/include/cpplib.h.jj	2022-09-03 09:35:41.465984642 +0200
+++ libcpp/include/cpplib.h	2022-09-03 11:30:57.250677870 +0200
@@ -565,6 +565,10 @@ struct cpp_options
      2 if it should be a pedwarn.  */
   unsigned char cpp_warn_invalid_utf8;
 
+  /* True if libcpp should warn about invalid forms of delimited or named
+     escape sequences.  */
+  bool cpp_warn_unicode;
+
   /* True if -finput-charset= option has been used explicitly.  */
   bool cpp_input_charset_explicit;
 
@@ -675,7 +679,8 @@ enum cpp_warning_reason {
   CPP_W_CXX20_COMPAT,
   CPP_W_EXPANSION_TO_DEFINED,
   CPP_W_BIDIRECTIONAL,
-  CPP_W_INVALID_UTF8
+  CPP_W_INVALID_UTF8,
+  CPP_W_UNICODE
 };
 
 /* Callback for header lookup for HEADER, which is the name of a
--- libcpp/init.cc.jj	2022-09-01 09:47:23.729892618 +0200
+++ libcpp/init.cc	2022-09-03 11:19:10.954452329 +0200
@@ -228,6 +228,7 @@ cpp_create_reader (enum c_lang lang, cpp
   CPP_OPTION (pfile, warn_date_time) = 0;
   CPP_OPTION (pfile, cpp_warn_bidirectional) = bidirectional_unpaired;
   CPP_OPTION (pfile, cpp_warn_invalid_utf8) = 0;
+  CPP_OPTION (pfile, cpp_warn_unicode) = 1;
   CPP_OPTION (pfile, cpp_input_charset_explicit) = 0;
 
   /* Default CPP arithmetic to something sensible for the host for the
--- libcpp/charset.cc.jj	2022-09-01 14:19:47.462235851 +0200
+++ libcpp/charset.cc	2022-09-03 12:42:41.800923600 +0200
@@ -1448,7 +1448,11 @@ _cpp_valid_ucn (cpp_reader *pfile, const
   if (str[-1] == 'u')
     {
       length = 4;
-      if (str < limit && *str == '{')
+      if (str < limit
+	  && *str == '{'
+	  && (!identifier_pos
+	      || CPP_OPTION (pfile, delimited_escape_seqs)
+	      || !CPP_OPTION (pfile, std)))
 	{
 	  str++;
 	  /* Magic value to indicate no digits seen.  */
@@ -1462,8 +1466,22 @@ _cpp_valid_ucn (cpp_reader *pfile, const
   else if (str[-1] == 'N')
     {
       length = 4;
+      if (identifier_pos
+	  && !CPP_OPTION (pfile, delimited_escape_seqs)
+	  && CPP_OPTION (pfile, std))
+	{
+	  *cp = 0;
+	  return false;
+	}
       if (str == limit || *str != '{')
-	cpp_error (pfile, CPP_DL_ERROR, "'\\N' not followed by '{'");
+	{
+	  if (identifier_pos)
+	    {
+	      *cp = 0;
+	      return false;
+	    }
+	  cpp_error (pfile, CPP_DL_ERROR, "'\\N' not followed by '{'");
+	}
       else
 	{
 	  str++;
@@ -1489,15 +1507,19 @@ _cpp_valid_ucn (cpp_reader *pfile, const
 
 	  if (str < limit && *str == '}')
 	    {
-	      if (name == str && identifier_pos)
+	      if (identifier_pos && name == str)
 		{
+		  cpp_warning (pfile, CPP_W_UNICODE,
+			       "empty named universal character escape "
+			       "sequence; treating it as separate tokens");
 		  *cp = 0;
 		  return false;
 		}
 	      if (name == str)
 		cpp_error (pfile, CPP_DL_ERROR,
 			   "empty named universal character escape sequence");
-	      else if (!CPP_OPTION (pfile, delimited_escape_seqs)
+	      else if ((!identifier_pos || strict)
+		       && !CPP_OPTION (pfile, delimited_escape_seqs)
 		       && CPP_OPTION (pfile, cpp_pedantic))
 		cpp_error (pfile, CPP_DL_PEDWARN,
 			   "named universal character escapes are only valid "
@@ -1515,27 +1537,51 @@ _cpp_valid_ucn (cpp_reader *pfile, const
 					   uname2c_tree, NULL);
 		  if (result == (cppchar_t) -1)
 		    {
-		      cpp_error (pfile, CPP_DL_ERROR,
-				 "\\N{%.*s} is not a valid universal "
-				 "character", (int) (str - name), name);
+		      bool ret = true;
+		      if (identifier_pos
+			  && (!CPP_OPTION (pfile, delimited_escape_seqs)
+			      || !strict))
+			ret = cpp_warning (pfile, CPP_W_UNICODE,
+					   "\\N{%.*s} is not a valid "
+					   "universal character; treating it "
+					   "as separate tokens",
+					   (int) (str - name), name);
+		      else
+			cpp_error (pfile, CPP_DL_ERROR,
+				   "\\N{%.*s} is not a valid universal "
+				   "character", (int) (str - name), name);
 
 		      /* Try to do a loose name lookup according to
 			 Unicode loose matching rule UAX44-LM2.  */
 		      char canon_name[uname2c_max_name_len + 1];
 		      result = _cpp_uname2c_uax44_lm2 ((const char *) name,
 						       str - name, canon_name);
-		      if (result != (cppchar_t) -1)
+		      if (result != (cppchar_t) -1 && ret)
 			cpp_error (pfile, CPP_DL_NOTE,
 				   "did you mean \\N{%s}?", canon_name);
 		      else
-			result = 0x40;
+			result = 0xC0;
+		      if (identifier_pos
+			  && (!CPP_OPTION (pfile, delimited_escape_seqs)
+			      || !strict))
+			{
+			  *cp = 0;
+			  return false;
+			}
 		    }
 		}
 	      str++;
 	      extend_char_range (char_range, loc_reader);
 	    }
 	  else if (identifier_pos)
-	    length = 1;
+	    {
+	      cpp_warning (pfile, CPP_W_UNICODE,
+			   "'\\N{' not terminated with '}' after %.*s; "
+			   "treating it as separate tokens",
+			   (int) (str - base), base);
+	      *cp = 0;
+	      return false;
+	    }
 	  else
 	    {
 	      cpp_error (pfile, CPP_DL_ERROR,
@@ -1584,12 +1630,17 @@ _cpp_valid_ucn (cpp_reader *pfile, const
       }
     while (--length);
 
-  if (delimited
-      && str < limit
-      && *str == '}'
-      && (length != 32 || !identifier_pos))
+  if (delimited && str < limit && *str == '}')
     {
-      if (length == 32)
+      if (length == 32 && identifier_pos)
+	{
+	  cpp_warning (pfile, CPP_W_UNICODE,
+		       "empty delimited escape sequence; "
+		       "treating it as separate tokens");
+	  *cp = 0;
+	  return false;
+	}
+      else if (length == 32)
 	cpp_error (pfile, CPP_DL_ERROR,
 		   "empty delimited escape sequence");
       else if (!CPP_OPTION (pfile, delimited_escape_seqs)
@@ -1607,6 +1658,11 @@ _cpp_valid_ucn (cpp_reader *pfile, const
      error message in that case.  */
   if (length && identifier_pos)
     {
+      if (delimited)
+	cpp_warning (pfile, CPP_W_UNICODE,
+		     "'\\u{' not terminated with '}' after %.*s; "
+		     "treating it as separate tokens",
+		     (int) (str - base), base);
       *cp = 0;
       return false;
     }
--- gcc/doc/invoke.texi.jj	2022-09-03 09:35:40.966991672 +0200
+++ gcc/doc/invoke.texi	2022-09-03 11:39:03.875914845 +0200
@@ -365,7 +365,7 @@ Objective-C and Objective-C++ Dialects}.
 -Winfinite-recursion @gol
 -Winit-self  -Winline  -Wno-int-conversion  -Wint-in-bool-context @gol
 -Wno-int-to-pointer-cast  -Wno-invalid-memory-model @gol
--Winvalid-pch  -Winvalid-utf8 -Wjump-misses-init  @gol
+-Winvalid-pch  -Winvalid-utf8  -Wno-unicode  -Wjump-misses-init  @gol
 -Wlarger-than=@var{byte-size}  -Wlogical-not-parentheses  -Wlogical-op  @gol
 -Wlong-long  -Wno-lto-type-mismatch -Wmain  -Wmaybe-uninitialized @gol
 -Wmemset-elt-size  -Wmemset-transposed-args @gol
@@ -9577,6 +9577,12 @@ Warn if an invalid UTF-8 character is fo
 This warning is on by default for C++23 if @option{-finput-charset=UTF-8}
 is used and turned into error with @option{-pedantic-errors}.
 
+@item -Wno-unicode
+@opindex Wunicode
+@opindex Wno-unicode
+Don't diagnose invalid forms of delimited or named escape sequences which are
+treated as separate tokens.  @option{Wunicode} is enabled by default.
+
 @item -Wlong-long
 @opindex Wlong-long
 @opindex Wno-long-long
--- gcc/c-family/c.opt.jj	2022-09-03 09:35:40.206002393 +0200
+++ gcc/c-family/c.opt	2022-09-03 11:17:04.529201926 +0200
@@ -822,8 +822,8 @@ C ObjC C++ ObjC++ CPP(warn_invalid_pch)
 Warn about PCH files that are found but not used.
 
 Winvalid-utf8
-C objC C++ ObjC++ CPP(cpp_warn_invalid_utf8) CppReason(CPP_W_INVALID_UTF8) Var(warn_invalid_utf8) Init(0) Warning
-Warn about invalid UTF-8 characters in comments.
+C ObjC C++ ObjC++ CPP(cpp_warn_invalid_utf8) CppReason(CPP_W_INVALID_UTF8) Var(warn_invalid_utf8) Init(0) Warning
+Warn about invalid UTF-8 characters.
 
 Wjump-misses-init
 C ObjC Var(warn_jump_misses_init) Warning LangEnabledby(C ObjC,Wc++-compat)
@@ -1345,6 +1345,10 @@ Wundef
 C ObjC C++ ObjC++ CPP(warn_undef) CppReason(CPP_W_UNDEF) Var(cpp_warn_undef) Init(0) Warning
 Warn if an undefined macro is used in an #if directive.
 
+Wunicode
+C ObjC C++ ObjC++ CPP(cpp_warn_unicode) CppReason(CPP_W_UNICODE) Var(warn_unicode) Init(1) Warning
+Warn about invalid forms of delimited or named escape sequences.
+
 Wuninitialized
 C ObjC C++ ObjC++ LTO LangEnabledBy(C ObjC C++ ObjC++ LTO,Wall)
 ;
--- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-4.c.jj	2022-09-03 11:13:37.570068845 +0200
+++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-4.c	2022-09-03 11:56:52.818054420 +0200
@@ -0,0 +1,13 @@
+/* P2290R3 - Delimited escape sequences */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=gnu99 -Wno-c++-compat" { target c } } */
+/* { dg-options "-std=gnu++20" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\u{});		/* { dg-warning "empty delimited escape sequence; treating it as separate tokens" } */
+int c = a\u{);		/* { dg-warning "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{; treating it as separate tokens" } */
+int d = a\u{12XYZ});	/* { dg-warning "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{12; treating it as separate tokens" } */
+int e = a\u123);
+int f = a\U1234567);
--- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-5.c.jj	2022-09-03 11:13:37.570068845 +0200
+++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-5.c	2022-09-03 12:01:35.618124647 +0200
@@ -0,0 +1,13 @@
+/* P2290R3 - Delimited escape sequences */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=c17 -Wno-c++-compat" { target c } } */
+/* { dg-options "-std=c++23" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\u{});		/* { dg-warning "empty delimited escape sequence; treating it as separate tokens" "" { target c++23 } } */
+int c = a\u{);		/* { dg-warning "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{; treating it as separate tokens" "" { target c++23 } } */
+int d = a\u{12XYZ});	/* { dg-warning "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{12; treating it as separate tokens" "" { target c++23 } } */
+int e = a\u123);
+int f = a\U1234567);
--- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-6.c.jj	2022-09-03 11:59:36.573778876 +0200
+++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-6.c	2022-09-03 11:59:55.808511591 +0200
@@ -0,0 +1,13 @@
+/* P2290R3 - Delimited escape sequences */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=gnu99 -Wno-c++-compat -Wno-unicode" { target c } } */
+/* { dg-options "-std=gnu++20 -Wno-unicode" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\u{});		/* { dg-bogus "empty delimited escape sequence; treating it as separate tokens" } */
+int c = a\u{);		/* { dg-bogus "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{; treating it as separate tokens" } */
+int d = a\u{12XYZ});	/* { dg-bogus "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{12; treating it as separate tokens" } */
+int e = a\u123);
+int f = a\U1234567);
--- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-7.c.jj	2022-09-03 12:01:48.958939255 +0200
+++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-7.c	2022-09-03 12:02:16.765552854 +0200
@@ -0,0 +1,13 @@
+/* P2290R3 - Delimited escape sequences */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=c17 -Wno-c++-compat -Wno-unicode" { target c } } */
+/* { dg-options "-std=c++23 -Wno-unicode" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\u{});		/* { dg-bogus "empty delimited escape sequence; treating it as separate tokens" } */
+int c = a\u{);		/* { dg-bogus "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{; treating it as separate tokens" } */
+int d = a\u{12XYZ});	/* { dg-bogus "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{12; treating it as separate tokens" } */
+int e = a\u123);
+int f = a\U1234567);
--- gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-5.c.jj	2022-09-03 11:13:37.570068845 +0200
+++ gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-5.c	2022-09-03 12:45:18.968747909 +0200
@@ -0,0 +1,17 @@
+/* P2071R2 - Named universal character escapes */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=gnu99 -Wno-c++-compat" { target c } } */
+/* { dg-options "-std=gnu++20" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\N{});				/* { dg-warning "empty named universal character escape sequence; treating it as separate tokens" } */
+int c = a\N{);				/* { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" } */
+int d = a\N);
+int e = a\NARG);
+int f = a\N{abc});				/* { dg-warning "\\\\N\\\{abc\\\} is not a valid universal character; treating it as separate tokens" } */
+int g = a\N{ABC.123});				/* { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{ABC; treating it as separate tokens" } */
+int h = a\N{NON-EXISTENT CHAR});	/* { dg-warning "\\\\N\\\{NON-EXISTENT CHAR\\\} is not a valid universal character; treating it as separate tokens" } */
+int i = a\N{Latin_Small_Letter_A_With_Acute});	/* { dg-warning "\\\\N\\\{Latin_Small_Letter_A_With_Acute\\\} is not a valid universal character; treating it as separate tokens" } */
+					/* { dg-message "did you mean \\\\N\\\{LATIN SMALL LETTER A WITH ACUTE\\\}\\?" "" { target *-*-* } .-1 } */
--- gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-6.c.jj	2022-09-03 11:13:37.570068845 +0200
+++ gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-6.c	2022-09-03 11:44:34.558316155 +0200
@@ -0,0 +1,17 @@
+/* P2071R2 - Named universal character escapes */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=c17 -Wno-c++-compat" { target c } } */
+/* { dg-options "-std=c++20" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\N{});
+int c = a\N{);
+int d = a\N);
+int e = a\NARG);
+int f = a\N{abc});
+int g = a\N{ABC.123});
+int h = a\N{NON-EXISTENT CHAR});	/* { dg-bogus "is not a valid universal character" } */
+int i = a\N{Latin_Small_Letter_A_With_Acute});
+int j = a\N{LATIN SMALL LETTER A WITH ACUTE});
--- gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-7.c.jj	2022-09-03 12:18:31.296022384 +0200
+++ gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-7.c	2022-09-03 12:45:57.663212248 +0200
@@ -0,0 +1,17 @@
+/* P2071R2 - Named universal character escapes */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=gnu99 -Wno-c++-compat -Wno-unicode" { target c } } */
+/* { dg-options "-std=gnu++20 -Wno-unicode" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\N{});				/* { dg-bogus "empty named universal character escape sequence; treating it as separate tokens" } */
+int c = a\N{);				/* { dg-bogus "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" } */
+int d = a\N);
+int e = a\NARG);
+int f = a\N{abc});				/* { dg-bogus "\\\\N\\\{abc\\\} is not a valid universal character; treating it as separate tokens" } */
+int g = a\N{ABC.123});				/* { dg-bogus "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{ABC; treating it as separate tokens" } */
+int h = a\N{NON-EXISTENT CHAR});	/* { dg-bogus "\\\\N\\\{NON-EXISTENT CHAR\\\} is not a valid universal character; treating it as separate tokens" } */
+int i = a\N{Latin_Small_Letter_A_With_Acute});	/* { dg-bogus "\\\\N\\\{Latin_Small_Letter_A_With_Acute\\\} is not a valid universal character; treating it as separate tokens" } */
+					/* { dg-bogus "did you mean \\\\N\\\{LATIN SMALL LETTER A WITH ACUTE\\\}\\?" "" { target *-*-* } .-1 } */
--- gcc/testsuite/g++.dg/cpp23/named-universal-char-escape1.C.jj	2022-09-03 11:13:37.571068831 +0200
+++ gcc/testsuite/g++.dg/cpp23/named-universal-char-escape1.C	2022-09-03 12:44:03.893787182 +0200
@@ -0,0 +1,16 @@
+// P2071R2 - Named universal character escapes
+// { dg-do compile }
+// { dg-require-effective-target wchar }
+
+#define z(x) 0
+#define a z(
+int b = a\N{});				// { dg-warning "empty named universal character escape sequence; treating it as separate tokens" "" { target c++23 } }
+int c = a\N{);				// { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" "" { target c++23 } }
+int d = a\N);
+int e = a\NARG);
+int f = a\N{abc});			// { dg-warning "\\\\N\\\{abc\\\} is not a valid universal character; treating it as separate tokens" "" { target c++23 } }
+int g = a\N{ABC.123});			// { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{ABC; treating it as separate tokens" "" { target c++23 } }
+int h = a\N{NON-EXISTENT CHAR});	// { dg-error "is not a valid universal character" "" { target c++23 } }
+					// { dg-error "was not declared in this scope" "" { target c++23 } .-1 }
+int i = a\N{Latin_Small_Letter_A_With_Acute});	// { dg-warning "\\\\N\\\{Latin_Small_Letter_A_With_Acute\\\} is not a valid universal character; treating it as separate tokens" "" { target c++23 } }
+					// { dg-message "did you mean \\\\N\\\{LATIN SMALL LETTER A WITH ACUTE\\\}\\?" "" { target c++23 } .-1 }
--- gcc/testsuite/g++.dg/cpp23/named-universal-char-escape2.C.jj	2022-09-03 11:13:37.571068831 +0200
+++ gcc/testsuite/g++.dg/cpp23/named-universal-char-escape2.C	2022-09-03 12:44:31.723401937 +0200
@@ -0,0 +1,18 @@
+// P2071R2 - Named universal character escapes
+// { dg-do compile }
+// { dg-require-effective-target wchar }
+// { dg-options "" }
+
+#define z(x) 0
+#define a z(
+int b = a\N{});				// { dg-warning "empty named universal character escape sequence; treating it as separate tokens" }
+int c = a\N{);				// { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" }
+int d = a\N);
+int e = a\NARG);
+int f = a\N{abc});			// { dg-warning "\\\\N\\\{abc\\\} is not a valid universal character; treating it as separate tokens" }
+int g = a\N{ABC.123});			// { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{ABC; treating it as separate tokens" }
+int h = a\N{NON-EXISTENT CHAR});	// { dg-error "is not a valid universal character" "" { target c++23 } }
+					// { dg-error "was not declared in this scope" "" { target c++23 } .-1 }
+					// { dg-warning "\\\\N\\\{NON-EXISTENT CHAR\\\} is not a valid universal character; treating it as separate tokens" "" { target c++20_down } .-2 }
+int i = a\N{Latin_Small_Letter_A_With_Acute});	// { dg-warning "\\\\N\\\{Latin_Small_Letter_A_With_Acute\\\} is not a valid universal character; treating it as separate tokens" }
+					// { dg-message "did you mean \\\\N\\\{LATIN SMALL LETTER A WITH ACUTE\\\}\\?" "" { target *-*-* } .-1 }


	Jakub
  
Jakub Jelinek Sept. 5, 2022, 7:54 a.m. UTC | #2
On Sat, Sep 03, 2022 at 12:54:31PM +0200, Jakub Jelinek via Gcc-patches wrote:
> On Sat, Sep 03, 2022 at 12:29:52PM +0200, Jakub Jelinek wrote:
> > On Thu, Sep 01, 2022 at 03:00:28PM -0400, Jason Merrill wrote:
> > > We might as well use the same flag name, and document it to mean what it
> > > currently means for GCC.
> > 
> > Ok, following patch introduces -Wunicode (on by default).
> > 
> > > It looks like this is handling \N{abc}, for which "incomplete" seems like
> > > the wrong description; it's complete, just wrong, and the diagnostic doesn't
> > > help correct it.
> > 
> > And also will emit the is not a valid universal character with did you mean
> > if it matches loosely, otherwise will use the not terminated with } after
> > ... wording.
> > 
> > Ok if it passes bootstrap/regtest?
> 
> Actually, treating the !strict case like the strict case except for always
> warning instead of error if outside of literals is simpler.
> 
> The following version does that.  The only difference on the testcases is in
> the
> int f = a\N{abc});
> cases where it emits different diagnostics.

And this version successfully passed bootstrap/regtest.

	Jakub
  
Jason Merrill Sept. 7, 2022, 1:32 a.m. UTC | #3
On 9/3/22 06:54, Jakub Jelinek wrote:
> On Sat, Sep 03, 2022 at 12:29:52PM +0200, Jakub Jelinek wrote:
>> On Thu, Sep 01, 2022 at 03:00:28PM -0400, Jason Merrill wrote:
>>> We might as well use the same flag name, and document it to mean what it
>>> currently means for GCC.
>>
>> Ok, following patch introduces -Wunicode (on by default).
>>
>>> It looks like this is handling \N{abc}, for which "incomplete" seems like
>>> the wrong description; it's complete, just wrong, and the diagnostic doesn't
>>> help correct it.
>>
>> And also will emit the is not a valid universal character with did you mean
>> if it matches loosely, otherwise will use the not terminated with } after
>> ... wording.
>>
>> Ok if it passes bootstrap/regtest?

OK, thanks.

> Actually, treating the !strict case like the strict case except for always
> warning instead of error if outside of literals is simpler.
> 
> The following version does that.  The only difference on the testcases is in
> the
> int f = a\N{abc});
> cases where it emits different diagnostics.
> 
> 2022-09-03  Jakub Jelinek  <jakub@redhat.com>
> 
> libcpp/
> 	* include/cpplib.h (struct cpp_options): Add cpp_warn_unicode member.
> 	(enum cpp_warning_reason): Add CPP_W_UNICODE.
> 	* init.cc (cpp_create_reader): Initialize cpp_warn_unicode.
> 	* charset.cc (_cpp_valid_ucn): In possible identifier contexts, don't
> 	handle \u{ or \N{ specially in -std=c* modes except -std=c++2{3,b}.
> 	In possible identifier contexts, don't emit an error and punt
> 	if \N isn't followed by {, or if \N{} surrounds some lower case
> 	letters or _.  In possible identifier contexts when not C++23, don't
> 	emit an error but warning about unknown character names and treat as
> 	separate tokens.  When treating as separate tokens \u{ or \N{, emit
> 	warnings.
> gcc/
> 	* doc/invoke.texi (-Wno-unicode): Document.
> gcc/c-family/
> 	* c.opt (Winvalid-utf8): Use ObjC instead of objC.  Remove
> 	" in comments" from description.
> 	(Wunicode): New option.
> gcc/testsuite/
> 	* c-c++-common/cpp/delimited-escape-seq-4.c: New test.
> 	* c-c++-common/cpp/delimited-escape-seq-5.c: New test.
> 	* c-c++-common/cpp/delimited-escape-seq-6.c: New test.
> 	* c-c++-common/cpp/delimited-escape-seq-7.c: New test.
> 	* c-c++-common/cpp/named-universal-char-escape-5.c: New test.
> 	* c-c++-common/cpp/named-universal-char-escape-6.c: New test.
> 	* c-c++-common/cpp/named-universal-char-escape-7.c: New test.
> 	* g++.dg/cpp23/named-universal-char-escape1.C: New test.
> 	* g++.dg/cpp23/named-universal-char-escape2.C: New test.
> 
> --- libcpp/include/cpplib.h.jj	2022-09-03 09:35:41.465984642 +0200
> +++ libcpp/include/cpplib.h	2022-09-03 11:30:57.250677870 +0200
> @@ -565,6 +565,10 @@ struct cpp_options
>        2 if it should be a pedwarn.  */
>     unsigned char cpp_warn_invalid_utf8;
>   
> +  /* True if libcpp should warn about invalid forms of delimited or named
> +     escape sequences.  */
> +  bool cpp_warn_unicode;
> +
>     /* True if -finput-charset= option has been used explicitly.  */
>     bool cpp_input_charset_explicit;
>   
> @@ -675,7 +679,8 @@ enum cpp_warning_reason {
>     CPP_W_CXX20_COMPAT,
>     CPP_W_EXPANSION_TO_DEFINED,
>     CPP_W_BIDIRECTIONAL,
> -  CPP_W_INVALID_UTF8
> +  CPP_W_INVALID_UTF8,
> +  CPP_W_UNICODE
>   };
>   
>   /* Callback for header lookup for HEADER, which is the name of a
> --- libcpp/init.cc.jj	2022-09-01 09:47:23.729892618 +0200
> +++ libcpp/init.cc	2022-09-03 11:19:10.954452329 +0200
> @@ -228,6 +228,7 @@ cpp_create_reader (enum c_lang lang, cpp
>     CPP_OPTION (pfile, warn_date_time) = 0;
>     CPP_OPTION (pfile, cpp_warn_bidirectional) = bidirectional_unpaired;
>     CPP_OPTION (pfile, cpp_warn_invalid_utf8) = 0;
> +  CPP_OPTION (pfile, cpp_warn_unicode) = 1;
>     CPP_OPTION (pfile, cpp_input_charset_explicit) = 0;
>   
>     /* Default CPP arithmetic to something sensible for the host for the
> --- libcpp/charset.cc.jj	2022-09-01 14:19:47.462235851 +0200
> +++ libcpp/charset.cc	2022-09-03 12:42:41.800923600 +0200
> @@ -1448,7 +1448,11 @@ _cpp_valid_ucn (cpp_reader *pfile, const
>     if (str[-1] == 'u')
>       {
>         length = 4;
> -      if (str < limit && *str == '{')
> +      if (str < limit
> +	  && *str == '{'
> +	  && (!identifier_pos
> +	      || CPP_OPTION (pfile, delimited_escape_seqs)
> +	      || !CPP_OPTION (pfile, std)))
>   	{
>   	  str++;
>   	  /* Magic value to indicate no digits seen.  */
> @@ -1462,8 +1466,22 @@ _cpp_valid_ucn (cpp_reader *pfile, const
>     else if (str[-1] == 'N')
>       {
>         length = 4;
> +      if (identifier_pos
> +	  && !CPP_OPTION (pfile, delimited_escape_seqs)
> +	  && CPP_OPTION (pfile, std))
> +	{
> +	  *cp = 0;
> +	  return false;
> +	}
>         if (str == limit || *str != '{')
> -	cpp_error (pfile, CPP_DL_ERROR, "'\\N' not followed by '{'");
> +	{
> +	  if (identifier_pos)
> +	    {
> +	      *cp = 0;
> +	      return false;
> +	    }
> +	  cpp_error (pfile, CPP_DL_ERROR, "'\\N' not followed by '{'");
> +	}
>         else
>   	{
>   	  str++;
> @@ -1489,15 +1507,19 @@ _cpp_valid_ucn (cpp_reader *pfile, const
>   
>   	  if (str < limit && *str == '}')
>   	    {
> -	      if (name == str && identifier_pos)
> +	      if (identifier_pos && name == str)
>   		{
> +		  cpp_warning (pfile, CPP_W_UNICODE,
> +			       "empty named universal character escape "
> +			       "sequence; treating it as separate tokens");
>   		  *cp = 0;
>   		  return false;
>   		}
>   	      if (name == str)
>   		cpp_error (pfile, CPP_DL_ERROR,
>   			   "empty named universal character escape sequence");
> -	      else if (!CPP_OPTION (pfile, delimited_escape_seqs)
> +	      else if ((!identifier_pos || strict)
> +		       && !CPP_OPTION (pfile, delimited_escape_seqs)
>   		       && CPP_OPTION (pfile, cpp_pedantic))
>   		cpp_error (pfile, CPP_DL_PEDWARN,
>   			   "named universal character escapes are only valid "
> @@ -1515,27 +1537,51 @@ _cpp_valid_ucn (cpp_reader *pfile, const
>   					   uname2c_tree, NULL);
>   		  if (result == (cppchar_t) -1)
>   		    {
> -		      cpp_error (pfile, CPP_DL_ERROR,
> -				 "\\N{%.*s} is not a valid universal "
> -				 "character", (int) (str - name), name);
> +		      bool ret = true;
> +		      if (identifier_pos
> +			  && (!CPP_OPTION (pfile, delimited_escape_seqs)
> +			      || !strict))
> +			ret = cpp_warning (pfile, CPP_W_UNICODE,
> +					   "\\N{%.*s} is not a valid "
> +					   "universal character; treating it "
> +					   "as separate tokens",
> +					   (int) (str - name), name);
> +		      else
> +			cpp_error (pfile, CPP_DL_ERROR,
> +				   "\\N{%.*s} is not a valid universal "
> +				   "character", (int) (str - name), name);
>   
>   		      /* Try to do a loose name lookup according to
>   			 Unicode loose matching rule UAX44-LM2.  */
>   		      char canon_name[uname2c_max_name_len + 1];
>   		      result = _cpp_uname2c_uax44_lm2 ((const char *) name,
>   						       str - name, canon_name);
> -		      if (result != (cppchar_t) -1)
> +		      if (result != (cppchar_t) -1 && ret)
>   			cpp_error (pfile, CPP_DL_NOTE,
>   				   "did you mean \\N{%s}?", canon_name);
>   		      else
> -			result = 0x40;
> +			result = 0xC0;
> +		      if (identifier_pos
> +			  && (!CPP_OPTION (pfile, delimited_escape_seqs)
> +			      || !strict))
> +			{
> +			  *cp = 0;
> +			  return false;
> +			}
>   		    }
>   		}
>   	      str++;
>   	      extend_char_range (char_range, loc_reader);
>   	    }
>   	  else if (identifier_pos)
> -	    length = 1;
> +	    {
> +	      cpp_warning (pfile, CPP_W_UNICODE,
> +			   "'\\N{' not terminated with '}' after %.*s; "
> +			   "treating it as separate tokens",
> +			   (int) (str - base), base);
> +	      *cp = 0;
> +	      return false;
> +	    }
>   	  else
>   	    {
>   	      cpp_error (pfile, CPP_DL_ERROR,
> @@ -1584,12 +1630,17 @@ _cpp_valid_ucn (cpp_reader *pfile, const
>         }
>       while (--length);
>   
> -  if (delimited
> -      && str < limit
> -      && *str == '}'
> -      && (length != 32 || !identifier_pos))
> +  if (delimited && str < limit && *str == '}')
>       {
> -      if (length == 32)
> +      if (length == 32 && identifier_pos)
> +	{
> +	  cpp_warning (pfile, CPP_W_UNICODE,
> +		       "empty delimited escape sequence; "
> +		       "treating it as separate tokens");
> +	  *cp = 0;
> +	  return false;
> +	}
> +      else if (length == 32)
>   	cpp_error (pfile, CPP_DL_ERROR,
>   		   "empty delimited escape sequence");
>         else if (!CPP_OPTION (pfile, delimited_escape_seqs)
> @@ -1607,6 +1658,11 @@ _cpp_valid_ucn (cpp_reader *pfile, const
>        error message in that case.  */
>     if (length && identifier_pos)
>       {
> +      if (delimited)
> +	cpp_warning (pfile, CPP_W_UNICODE,
> +		     "'\\u{' not terminated with '}' after %.*s; "
> +		     "treating it as separate tokens",
> +		     (int) (str - base), base);
>         *cp = 0;
>         return false;
>       }
> --- gcc/doc/invoke.texi.jj	2022-09-03 09:35:40.966991672 +0200
> +++ gcc/doc/invoke.texi	2022-09-03 11:39:03.875914845 +0200
> @@ -365,7 +365,7 @@ Objective-C and Objective-C++ Dialects}.
>   -Winfinite-recursion @gol
>   -Winit-self  -Winline  -Wno-int-conversion  -Wint-in-bool-context @gol
>   -Wno-int-to-pointer-cast  -Wno-invalid-memory-model @gol
> --Winvalid-pch  -Winvalid-utf8 -Wjump-misses-init  @gol
> +-Winvalid-pch  -Winvalid-utf8  -Wno-unicode  -Wjump-misses-init  @gol
>   -Wlarger-than=@var{byte-size}  -Wlogical-not-parentheses  -Wlogical-op  @gol
>   -Wlong-long  -Wno-lto-type-mismatch -Wmain  -Wmaybe-uninitialized @gol
>   -Wmemset-elt-size  -Wmemset-transposed-args @gol
> @@ -9577,6 +9577,12 @@ Warn if an invalid UTF-8 character is fo
>   This warning is on by default for C++23 if @option{-finput-charset=UTF-8}
>   is used and turned into error with @option{-pedantic-errors}.
>   
> +@item -Wno-unicode
> +@opindex Wunicode
> +@opindex Wno-unicode
> +Don't diagnose invalid forms of delimited or named escape sequences which are
> +treated as separate tokens.  @option{-Wunicode} is enabled by default.
> +
>   @item -Wlong-long
>   @opindex Wlong-long
>   @opindex Wno-long-long
> --- gcc/c-family/c.opt.jj	2022-09-03 09:35:40.206002393 +0200
> +++ gcc/c-family/c.opt	2022-09-03 11:17:04.529201926 +0200
> @@ -822,8 +822,8 @@ C ObjC C++ ObjC++ CPP(warn_invalid_pch)
>   Warn about PCH files that are found but not used.
>   
>   Winvalid-utf8
> -C objC C++ ObjC++ CPP(cpp_warn_invalid_utf8) CppReason(CPP_W_INVALID_UTF8) Var(warn_invalid_utf8) Init(0) Warning
> -Warn about invalid UTF-8 characters in comments.
> +C ObjC C++ ObjC++ CPP(cpp_warn_invalid_utf8) CppReason(CPP_W_INVALID_UTF8) Var(warn_invalid_utf8) Init(0) Warning
> +Warn about invalid UTF-8 characters.
>   
>   Wjump-misses-init
>   C ObjC Var(warn_jump_misses_init) Warning LangEnabledby(C ObjC,Wc++-compat)
> @@ -1345,6 +1345,10 @@ Wundef
>   C ObjC C++ ObjC++ CPP(warn_undef) CppReason(CPP_W_UNDEF) Var(cpp_warn_undef) Init(0) Warning
>   Warn if an undefined macro is used in an #if directive.
>   
> +Wunicode
> +C ObjC C++ ObjC++ CPP(cpp_warn_unicode) CppReason(CPP_W_UNICODE) Var(warn_unicode) Init(1) Warning
> +Warn about invalid forms of delimited or named escape sequences.
> +
>   Wuninitialized
>   C ObjC C++ ObjC++ LTO LangEnabledBy(C ObjC C++ ObjC++ LTO,Wall)
>   ;
> --- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-4.c.jj	2022-09-03 11:13:37.570068845 +0200
> +++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-4.c	2022-09-03 11:56:52.818054420 +0200
> @@ -0,0 +1,13 @@
> +/* P2290R3 - Delimited escape sequences */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target wchar } */
> +/* { dg-options "-std=gnu99 -Wno-c++-compat" { target c } } */
> +/* { dg-options "-std=gnu++20" { target c++ } } */
> +
> +#define z(x) 0
> +#define a z(
> +int b = a\u{});		/* { dg-warning "empty delimited escape sequence; treating it as separate tokens" } */
> +int c = a\u{);		/* { dg-warning "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{; treating it as separate tokens" } */
> +int d = a\u{12XYZ});	/* { dg-warning "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{12; treating it as separate tokens" } */
> +int e = a\u123);
> +int f = a\U1234567);
> --- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-5.c.jj	2022-09-03 11:13:37.570068845 +0200
> +++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-5.c	2022-09-03 12:01:35.618124647 +0200
> @@ -0,0 +1,13 @@
> +/* P2290R3 - Delimited escape sequences */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target wchar } */
> +/* { dg-options "-std=c17 -Wno-c++-compat" { target c } } */
> +/* { dg-options "-std=c++23" { target c++ } } */
> +
> +#define z(x) 0
> +#define a z(
> +int b = a\u{});		/* { dg-warning "empty delimited escape sequence; treating it as separate tokens" "" { target c++23 } } */
> +int c = a\u{);		/* { dg-warning "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{; treating it as separate tokens" "" { target c++23 } } */
> +int d = a\u{12XYZ});	/* { dg-warning "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{12; treating it as separate tokens" "" { target c++23 } } */
> +int e = a\u123);
> +int f = a\U1234567);
> --- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-6.c.jj	2022-09-03 11:59:36.573778876 +0200
> +++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-6.c	2022-09-03 11:59:55.808511591 +0200
> @@ -0,0 +1,13 @@
> +/* P2290R3 - Delimited escape sequences */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target wchar } */
> +/* { dg-options "-std=gnu99 -Wno-c++-compat -Wno-unicode" { target c } } */
> +/* { dg-options "-std=gnu++20 -Wno-unicode" { target c++ } } */
> +
> +#define z(x) 0
> +#define a z(
> +int b = a\u{});		/* { dg-bogus "empty delimited escape sequence; treating it as separate tokens" } */
> +int c = a\u{);		/* { dg-bogus "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{; treating it as separate tokens" } */
> +int d = a\u{12XYZ});	/* { dg-bogus "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{12; treating it as separate tokens" } */
> +int e = a\u123);
> +int f = a\U1234567);
> --- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-7.c.jj	2022-09-03 12:01:48.958939255 +0200
> +++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-7.c	2022-09-03 12:02:16.765552854 +0200
> @@ -0,0 +1,13 @@
> +/* P2290R3 - Delimited escape sequences */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target wchar } */
> +/* { dg-options "-std=c17 -Wno-c++-compat -Wno-unicode" { target c } } */
> +/* { dg-options "-std=c++23 -Wno-unicode" { target c++ } } */
> +
> +#define z(x) 0
> +#define a z(
> +int b = a\u{});		/* { dg-bogus "empty delimited escape sequence; treating it as separate tokens" } */
> +int c = a\u{);		/* { dg-bogus "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{; treating it as separate tokens" } */
> +int d = a\u{12XYZ});	/* { dg-bogus "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{12; treating it as separate tokens" } */
> +int e = a\u123);
> +int f = a\U1234567);
> --- gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-5.c.jj	2022-09-03 11:13:37.570068845 +0200
> +++ gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-5.c	2022-09-03 12:45:18.968747909 +0200
> @@ -0,0 +1,17 @@
> +/* P2071R2 - Named universal character escapes */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target wchar } */
> +/* { dg-options "-std=gnu99 -Wno-c++-compat" { target c } } */
> +/* { dg-options "-std=gnu++20" { target c++ } } */
> +
> +#define z(x) 0
> +#define a z(
> +int b = a\N{});				/* { dg-warning "empty named universal character escape sequence; treating it as separate tokens" } */
> +int c = a\N{);				/* { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" } */
> +int d = a\N);
> +int e = a\NARG);
> +int f = a\N{abc});				/* { dg-warning "\\\\N\\\{abc\\\} is not a valid universal character; treating it as separate tokens" } */
> +int g = a\N{ABC.123});				/* { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{ABC; treating it as separate tokens" } */
> +int h = a\N{NON-EXISTENT CHAR});	/* { dg-warning "\\\\N\\\{NON-EXISTENT CHAR\\\} is not a valid universal character; treating it as separate tokens" } */
> +int i = a\N{Latin_Small_Letter_A_With_Acute});	/* { dg-warning "\\\\N\\\{Latin_Small_Letter_A_With_Acute\\\} is not a valid universal character; treating it as separate tokens" } */
> +					/* { dg-message "did you mean \\\\N\\\{LATIN SMALL LETTER A WITH ACUTE\\\}\\?" "" { target *-*-* } .-1 } */
> --- gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-6.c.jj	2022-09-03 11:13:37.570068845 +0200
> +++ gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-6.c	2022-09-03 11:44:34.558316155 +0200
> @@ -0,0 +1,17 @@
> +/* P2071R2 - Named universal character escapes */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target wchar } */
> +/* { dg-options "-std=c17 -Wno-c++-compat" { target c } } */
> +/* { dg-options "-std=c++20" { target c++ } } */
> +
> +#define z(x) 0
> +#define a z(
> +int b = a\N{});
> +int c = a\N{);
> +int d = a\N);
> +int e = a\NARG);
> +int f = a\N{abc});
> +int g = a\N{ABC.123});
> +int h = a\N{NON-EXISTENT CHAR});	/* { dg-bogus "is not a valid universal character" } */
> +int i = a\N{Latin_Small_Letter_A_With_Acute});
> +int j = a\N{LATIN SMALL LETTER A WITH ACUTE});
> --- gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-7.c.jj	2022-09-03 12:18:31.296022384 +0200
> +++ gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-7.c	2022-09-03 12:45:57.663212248 +0200
> @@ -0,0 +1,17 @@
> +/* P2071R2 - Named universal character escapes */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target wchar } */
> +/* { dg-options "-std=gnu99 -Wno-c++-compat -Wno-unicode" { target c } } */
> +/* { dg-options "-std=gnu++20 -Wno-unicode" { target c++ } } */
> +
> +#define z(x) 0
> +#define a z(
> +int b = a\N{});				/* { dg-bogus "empty named universal character escape sequence; treating it as separate tokens" } */
> +int c = a\N{);				/* { dg-bogus "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" } */
> +int d = a\N);
> +int e = a\NARG);
> +int f = a\N{abc});				/* { dg-bogus "\\\\N\\\{abc\\\} is not a valid universal character; treating it as separate tokens" } */
> +int g = a\N{ABC.123});				/* { dg-bogus "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{ABC; treating it as separate tokens" } */
> +int h = a\N{NON-EXISTENT CHAR});	/* { dg-bogus "\\\\N\\\{NON-EXISTENT CHAR\\\} is not a valid universal character; treating it as separate tokens" } */
> +int i = a\N{Latin_Small_Letter_A_With_Acute});	/* { dg-bogus "\\\\N\\\{Latin_Small_Letter_A_With_Acute\\\} is not a valid universal character; treating it as separate tokens" } */
> +					/* { dg-bogus "did you mean \\\\N\\\{LATIN SMALL LETTER A WITH ACUTE\\\}\\?" "" { target *-*-* } .-1 } */
> --- gcc/testsuite/g++.dg/cpp23/named-universal-char-escape1.C.jj	2022-09-03 11:13:37.571068831 +0200
> +++ gcc/testsuite/g++.dg/cpp23/named-universal-char-escape1.C	2022-09-03 12:44:03.893787182 +0200
> @@ -0,0 +1,16 @@
> +// P2071R2 - Named universal character escapes
> +// { dg-do compile }
> +// { dg-require-effective-target wchar }
> +
> +#define z(x) 0
> +#define a z(
> +int b = a\N{});				// { dg-warning "empty named universal character escape sequence; treating it as separate tokens" "" { target c++23 } }
> +int c = a\N{);				// { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" "" { target c++23 } }
> +int d = a\N);
> +int e = a\NARG);
> +int f = a\N{abc});			// { dg-warning "\\\\N\\\{abc\\\} is not a valid universal character; treating it as separate tokens" "" { target c++23 } }
> +int g = a\N{ABC.123});			// { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{ABC; treating it as separate tokens" "" { target c++23 } }
> +int h = a\N{NON-EXISTENT CHAR});	// { dg-error "is not a valid universal character" "" { target c++23 } }
> +					// { dg-error "was not declared in this scope" "" { target c++23 } .-1 }
> +int i = a\N{Latin_Small_Letter_A_With_Acute});	// { dg-warning "\\\\N\\\{Latin_Small_Letter_A_With_Acute\\\} is not a valid universal character; treating it as separate tokens" "" { target c++23 } }
> +					// { dg-message "did you mean \\\\N\\\{LATIN SMALL LETTER A WITH ACUTE\\\}\\?" "" { target c++23 } .-1 }
> --- gcc/testsuite/g++.dg/cpp23/named-universal-char-escape2.C.jj	2022-09-03 11:13:37.571068831 +0200
> +++ gcc/testsuite/g++.dg/cpp23/named-universal-char-escape2.C	2022-09-03 12:44:31.723401937 +0200
> @@ -0,0 +1,18 @@
> +// P2071R2 - Named universal character escapes
> +// { dg-do compile }
> +// { dg-require-effective-target wchar }
> +// { dg-options "" }
> +
> +#define z(x) 0
> +#define a z(
> +int b = a\N{});				// { dg-warning "empty named universal character escape sequence; treating it as separate tokens" }
> +int c = a\N{);				// { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" }
> +int d = a\N);
> +int e = a\NARG);
> +int f = a\N{abc});			// { dg-warning "\\\\N\\\{abc\\\} is not a valid universal character; treating it as separate tokens" }
> +int g = a\N{ABC.123});			// { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{ABC; treating it as separate tokens" }
> +int h = a\N{NON-EXISTENT CHAR});	// { dg-error "is not a valid universal character" "" { target c++23 } }
> +					// { dg-error "was not declared in this scope" "" { target c++23 } .-1 }
> +					// { dg-warning "\\\\N\\\{NON-EXISTENT CHAR\\\} is not a valid universal character; treating it as separate tokens" "" { target c++20_down } .-2 }
> +int i = a\N{Latin_Small_Letter_A_With_Acute});	// { dg-warning "\\\\N\\\{Latin_Small_Letter_A_With_Acute\\\} is not a valid universal character; treating it as separate tokens" }
> +					// { dg-message "did you mean \\\\N\\\{LATIN SMALL LETTER A WITH ACUTE\\\}\\?" "" { target *-*-* } .-1 }
> 
> 
> 	Jakub
>
  
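As a reading aid, the identifier-position handling of `\u{` (the cases exercised by the `delimited-escape-seq-*.c` tests) can be modelled by a small standalone C function. This is a sketch, not libcpp code. `classify_delimited_u`, its enum, and its parameters are hypothetical. `delimited_escape_seqs` stands in for `CPP_OPTION (pfile, delimited_escape_seqs)`, and `strict_std` stands in for `CPP_OPTION (pfile, std)` (a `-std=c*`/`-std=c++*` rather than `-std=gnu*` mode). Code-point range and overflow checks are deliberately ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical outcome codes for this sketch; they are not libcpp names.  */
enum delim_u_outcome
{
  DELIM_U_NOT_DELIMITED, /* '{' not special here: ordinary \uXXXX rules apply */
  DELIM_U_VALID,         /* \u{hex-digits} with a closing brace */
  DELIM_U_EMPTY,         /* "empty delimited escape sequence; treating it ..." */
  DELIM_U_UNTERMINATED   /* "'\u{' not terminated with '}' after ..." */
};

/* Classify a "\u" escape met where an identifier may continue.
   S points at the backslash.  On DELIM_U_UNTERMINATED, *PREFIX_LEN
   receives the length of the text (backslash included) that the
   warning quotes after the word "after".  */
enum delim_u_outcome
classify_delimited_u (const char *s, bool delimited_escape_seqs,
		      bool strict_std, int *prefix_len)
{
  assert (s[0] == '\\' && s[1] == 'u');

  /* Models the patched test: in identifier position '{' is only
     treated specially in C++23 or in GNU (non-strict) modes.  */
  if (s[2] != '{' || (!delimited_escape_seqs && strict_std))
    return DELIM_U_NOT_DELIMITED;

  /* Skip the hex digits following "\u{".  */
  const char *p = s + 3;
  while (isxdigit ((unsigned char) *p))
    p++;

  if (*p == '}')
    return p == s + 3 ? DELIM_U_EMPTY : DELIM_U_VALID;

  *prefix_len = (int) (p - s);
  return DELIM_U_UNTERMINATED;
}
```

For `\u{12XYZ}` the model yields a prefix length of 5, i.e. `\u{12`, which matches the `after \u{12` text expected by `delimited-escape-seq-4.c`. A caller could print it with `printf ("after %.*s\n", n, s)`, the same `%.*s` idiom the patch uses with `(int) (str - base), base`.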

Patch

--- libcpp/include/cpplib.h.jj	2022-09-03 09:35:41.465984642 +0200
+++ libcpp/include/cpplib.h	2022-09-03 11:30:57.250677870 +0200
@@ -565,6 +565,10 @@  struct cpp_options
      2 if it should be a pedwarn.  */
   unsigned char cpp_warn_invalid_utf8;
 
+  /* True if libcpp should warn about invalid forms of delimited or named
+     escape sequences.  */
+  bool cpp_warn_unicode;
+
   /* True if -finput-charset= option has been used explicitly.  */
   bool cpp_input_charset_explicit;
 
@@ -675,7 +679,8 @@  enum cpp_warning_reason {
   CPP_W_CXX20_COMPAT,
   CPP_W_EXPANSION_TO_DEFINED,
   CPP_W_BIDIRECTIONAL,
-  CPP_W_INVALID_UTF8
+  CPP_W_INVALID_UTF8,
+  CPP_W_UNICODE
 };
 
 /* Callback for header lookup for HEADER, which is the name of a
--- libcpp/init.cc.jj	2022-09-01 09:47:23.729892618 +0200
+++ libcpp/init.cc	2022-09-03 11:19:10.954452329 +0200
@@ -228,6 +228,7 @@  cpp_create_reader (enum c_lang lang, cpp
   CPP_OPTION (pfile, warn_date_time) = 0;
   CPP_OPTION (pfile, cpp_warn_bidirectional) = bidirectional_unpaired;
   CPP_OPTION (pfile, cpp_warn_invalid_utf8) = 0;
+  CPP_OPTION (pfile, cpp_warn_unicode) = 1;
   CPP_OPTION (pfile, cpp_input_charset_explicit) = 0;
 
   /* Default CPP arithmetic to something sensible for the host for the
--- libcpp/charset.cc.jj	2022-09-01 14:19:47.462235851 +0200
+++ libcpp/charset.cc	2022-09-03 11:26:14.858585905 +0200
@@ -1448,7 +1448,11 @@  _cpp_valid_ucn (cpp_reader *pfile, const
   if (str[-1] == 'u')
     {
       length = 4;
-      if (str < limit && *str == '{')
+      if (str < limit
+	  && *str == '{'
+	  && (!identifier_pos
+	      || CPP_OPTION (pfile, delimited_escape_seqs)
+	      || !CPP_OPTION (pfile, std)))
 	{
 	  str++;
 	  /* Magic value to indicate no digits seen.  */
@@ -1462,8 +1466,22 @@  _cpp_valid_ucn (cpp_reader *pfile, const
   else if (str[-1] == 'N')
     {
       length = 4;
+      if (identifier_pos
+	  && !CPP_OPTION (pfile, delimited_escape_seqs)
+	  && CPP_OPTION (pfile, std))
+	{
+	  *cp = 0;
+	  return false;
+	}
       if (str == limit || *str != '{')
-	cpp_error (pfile, CPP_DL_ERROR, "'\\N' not followed by '{'");
+	{
+	  if (identifier_pos)
+	    {
+	      *cp = 0;
+	      return false;
+	    }
+	  cpp_error (pfile, CPP_DL_ERROR, "'\\N' not followed by '{'");
+	}
       else
 	{
 	  str++;
@@ -1472,6 +1490,7 @@  _cpp_valid_ucn (cpp_reader *pfile, const
 	  length = 0;
 	  const uchar *name = str;
 	  bool strict = true;
+	  const uchar *strict_end = name;
 
 	  do
 	    {
@@ -1481,7 +1500,11 @@  _cpp_valid_ucn (cpp_reader *pfile, const
 	      if (!ISIDNUM (c) && c != ' ' && c != '-')
 		break;
 	      if (ISLOWER (c) || c == '_')
-		strict = false;
+		{
+		  if (strict)
+		    strict_end = str;
+		  strict = false;
+		}
 	      str++;
 	      extend_char_range (char_range, loc_reader);
 	    }
@@ -1489,8 +1512,35 @@  _cpp_valid_ucn (cpp_reader *pfile, const
 
 	  if (str < limit && *str == '}')
 	    {
-	      if (name == str && identifier_pos)
+	      if (identifier_pos && (name == str || !strict))
 		{
+		  if (name == str)
+		    cpp_warning (pfile, CPP_W_UNICODE,
+				 "empty named universal character escape "
+				 "sequence; treating it as separate tokens");
+		  else
+		    {
+		      char canon_name[uname2c_max_name_len + 1];
+		      result = _cpp_uname2c_uax44_lm2 ((const char *) name,
+						       str - name, canon_name);
+		      if (result == (cppchar_t) -1)
+			cpp_warning (pfile, CPP_W_UNICODE,
+				     "'\\N{' not terminated with '}' after "
+				     "%.*s; treating it as separate tokens",
+				     (int) (strict_end - base), base);
+		      else
+			{
+			  bool ret
+			    = cpp_warning (pfile, CPP_W_UNICODE,
+					   "\\N{%.*s} is not a valid "
+					   "universal character; treating it "
+					   "as separate tokens",
+					   (int) (str - name), name);
+			  if (ret)
+			    cpp_error (pfile, CPP_DL_NOTE,
+				       "did you mean \\N{%s}?", canon_name);
+			}
+		    }
 		  *cp = 0;
 		  return false;
 		}
@@ -1515,27 +1565,49 @@  _cpp_valid_ucn (cpp_reader *pfile, const
 					   uname2c_tree, NULL);
 		  if (result == (cppchar_t) -1)
 		    {
-		      cpp_error (pfile, CPP_DL_ERROR,
-				 "\\N{%.*s} is not a valid universal "
-				 "character", (int) (str - name), name);
+		      bool ret = true;
+		      if (identifier_pos
+			  && !CPP_OPTION (pfile, delimited_escape_seqs))
+			ret = cpp_warning (pfile, CPP_W_UNICODE,
+					   "\\N{%.*s} is not a valid "
+					   "universal character; treating it "
+					   "as separate tokens",
+					   (int) (str - name), name);
+		      else
+			cpp_error (pfile, CPP_DL_ERROR,
+				   "\\N{%.*s} is not a valid universal "
+				   "character", (int) (str - name), name);
 
 		      /* Try to do a loose name lookup according to
 			 Unicode loose matching rule UAX44-LM2.  */
 		      char canon_name[uname2c_max_name_len + 1];
 		      result = _cpp_uname2c_uax44_lm2 ((const char *) name,
 						       str - name, canon_name);
-		      if (result != (cppchar_t) -1)
+		      if (result != (cppchar_t) -1 && ret)
 			cpp_error (pfile, CPP_DL_NOTE,
 				   "did you mean \\N{%s}?", canon_name);
 		      else
-			result = 0x40;
+			result = 0xC0;
+		      if (identifier_pos
+			  && !CPP_OPTION (pfile, delimited_escape_seqs))
+			{
+			  *cp = 0;
+			  return false;
+			}
 		    }
 		}
 	      str++;
 	      extend_char_range (char_range, loc_reader);
 	    }
 	  else if (identifier_pos)
-	    length = 1;
+	    {
+	      cpp_warning (pfile, CPP_W_UNICODE,
+			   "'\\N{' not terminated with '}' after %.*s; "
+			   "treating it as separate tokens",
+			   (int) (str - base), base);
+	      *cp = 0;
+	      return false;
+	    }
 	  else
 	    {
 	      cpp_error (pfile, CPP_DL_ERROR,
@@ -1584,12 +1656,17 @@  _cpp_valid_ucn (cpp_reader *pfile, const
       }
     while (--length);
 
-  if (delimited
-      && str < limit
-      && *str == '}'
-      && (length != 32 || !identifier_pos))
+  if (delimited && str < limit && *str == '}')
     {
-      if (length == 32)
+      if (length == 32 && identifier_pos)
+	{
+	  cpp_warning (pfile, CPP_W_UNICODE,
+		       "empty delimited escape sequence; "
+		       "treating it as separate tokens");
+	  *cp = 0;
+	  return false;
+	}
+      else if (length == 32)
 	cpp_error (pfile, CPP_DL_ERROR,
 		   "empty delimited escape sequence");
       else if (!CPP_OPTION (pfile, delimited_escape_seqs)
@@ -1607,6 +1684,11 @@  _cpp_valid_ucn (cpp_reader *pfile, const
      error message in that case.  */
   if (length && identifier_pos)
     {
+      if (delimited)
+	cpp_warning (pfile, CPP_W_UNICODE,
+		     "'\\u{' not terminated with '}' after %.*s; "
+		     "treating it as separate tokens",
+		     (int) (str - base), base);
       *cp = 0;
       return false;
     }
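The `\N{NAME}` branch of the `_cpp_valid_ucn` hunks above chooses between three warnings when a brace-terminated name appears in identifier position. The name can be empty, or it can contain a lower-case letter or `_`. In the second case, whether a loose lookup succeeds determines the wording. The standalone C sketch below mirrors that choice, including the `strict_end` prefix used in the "not terminated" message. Its names are hypothetical. `known_names` is a one-entry stand-in for libcpp's Unicode name table. `loose_equal` is a simplified UAX44-LM2 comparison: it ignores case, spaces, `_` and every `-`, whereas the real rule drops only medial hyphens and has a U+1180 exception.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified loose match: case-insensitive, skipping ' ', '_' and '-'.  */
bool
loose_equal (const char *a, size_t alen, const char *b)
{
  size_t i = 0;
  for (;;)
    {
      while (i < alen && (a[i] == ' ' || a[i] == '_' || a[i] == '-'))
	i++;
      while (*b == ' ' || *b == '_' || *b == '-')
	b++;
      if (i == alen || *b == '\0')
	return i == alen && *b == '\0';
      if (toupper ((unsigned char) a[i]) != toupper ((unsigned char) *b))
	return false;
      i++;
      b++;
    }
}

enum named_outcome
{
  NAMED_EMPTY,        /* "empty named universal character escape sequence; ..." */
  NAMED_DID_YOU_MEAN, /* "\N{...} is not a valid universal character; ..." + note */
  NAMED_UNTERMINATED, /* "'\N{' not terminated with '}' after ..." */
  NAMED_STRICT        /* no lower case or '_': left to the exact-name lookup */
};

/* Hypothetical stand-in for the real Unicode name table.  */
static const char *const known_names[] = { "LATIN SMALL LETTER A WITH ACUTE" };

/* S points at the backslash of "\N{NAME}"; NAME may contain only
   letters, digits, '_', ' ' and '-', and the closing brace must exist.  */
enum named_outcome
classify_named (const char *s, int *prefix_len, const char **suggestion)
{
  assert (strncmp (s, "\\N{", 3) == 0);
  const char *name = s + 3;
  const char *end = strchr (name, '}');
  assert (end != NULL);

  if (end == name)
    return NAMED_EMPTY;

  /* Like strict_end in the patch: first lower-case letter or '_'.  */
  const char *strict_end = NULL;
  for (const char *p = name; p < end; p++)
    {
      assert (isalnum ((unsigned char) *p) || *p == '_' || *p == ' '
	      || *p == '-');
      if (strict_end == NULL && (islower ((unsigned char) *p) || *p == '_'))
	strict_end = p;
    }
  if (strict_end == NULL)
    return NAMED_STRICT;

  for (size_t i = 0; i < sizeof known_names / sizeof known_names[0]; i++)
    if (loose_equal (name, (size_t) (end - name), known_names[i]))
      {
	*suggestion = known_names[i];
	return NAMED_DID_YOU_MEAN;
      }

  *prefix_len = (int) (strict_end - s);
  return NAMED_UNTERMINATED;
}
```

For `\N{abc}` the quoted prefix is just `\N{`, as the copy of `named-universal-char-escape-5.c` in this patch expects. For `\N{Latin_Small_Letter_A_With_Acute}` the loose match succeeds, and the table entry becomes the "did you mean" suggestion. In libcpp, that note is only emitted when `cpp_warning` reports that the warning was actually issued.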
--- gcc/doc/invoke.texi.jj	2022-09-03 09:35:40.966991672 +0200
+++ gcc/doc/invoke.texi	2022-09-03 11:39:03.875914845 +0200
@@ -365,7 +365,7 @@  Objective-C and Objective-C++ Dialects}.
 -Winfinite-recursion @gol
 -Winit-self  -Winline  -Wno-int-conversion  -Wint-in-bool-context @gol
 -Wno-int-to-pointer-cast  -Wno-invalid-memory-model @gol
--Winvalid-pch  -Winvalid-utf8 -Wjump-misses-init  @gol
+-Winvalid-pch  -Winvalid-utf8  -Wno-unicode  -Wjump-misses-init  @gol
 -Wlarger-than=@var{byte-size}  -Wlogical-not-parentheses  -Wlogical-op  @gol
 -Wlong-long  -Wno-lto-type-mismatch -Wmain  -Wmaybe-uninitialized @gol
 -Wmemset-elt-size  -Wmemset-transposed-args @gol
@@ -9577,6 +9577,12 @@  Warn if an invalid UTF-8 character is fo
 This warning is on by default for C++23 if @option{-finput-charset=UTF-8}
 is used and turned into error with @option{-pedantic-errors}.
 
+@item -Wno-unicode
+@opindex Wunicode
+@opindex Wno-unicode
+Don't diagnose invalid forms of delimited or named escape sequences which are
+treated as separate tokens.  @option{-Wunicode} is enabled by default.
+
 @item -Wlong-long
 @opindex Wlong-long
 @opindex Wno-long-long
--- gcc/c-family/c.opt.jj	2022-09-03 09:35:40.206002393 +0200
+++ gcc/c-family/c.opt	2022-09-03 11:17:04.529201926 +0200
@@ -822,8 +822,8 @@  C ObjC C++ ObjC++ CPP(warn_invalid_pch)
 Warn about PCH files that are found but not used.
 
 Winvalid-utf8
-C objC C++ ObjC++ CPP(cpp_warn_invalid_utf8) CppReason(CPP_W_INVALID_UTF8) Var(warn_invalid_utf8) Init(0) Warning
-Warn about invalid UTF-8 characters in comments.
+C ObjC C++ ObjC++ CPP(cpp_warn_invalid_utf8) CppReason(CPP_W_INVALID_UTF8) Var(warn_invalid_utf8) Init(0) Warning
+Warn about invalid UTF-8 characters.
 
 Wjump-misses-init
 C ObjC Var(warn_jump_misses_init) Warning LangEnabledby(C ObjC,Wc++-compat)
@@ -1345,6 +1345,10 @@  Wundef
 C ObjC C++ ObjC++ CPP(warn_undef) CppReason(CPP_W_UNDEF) Var(cpp_warn_undef) Init(0) Warning
 Warn if an undefined macro is used in an #if directive.
 
+Wunicode
+C ObjC C++ ObjC++ CPP(cpp_warn_unicode) CppReason(CPP_W_UNICODE) Var(warn_unicode) Init(1) Warning
+Warn about invalid forms of delimited or named escape sequences.
+
 Wuninitialized
 C ObjC C++ ObjC++ LTO LangEnabledBy(C ObjC C++ ObjC++ LTO,Wall)
 ;
--- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-4.c.jj	2022-09-03 11:13:37.570068845 +0200
+++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-4.c	2022-09-03 11:56:52.818054420 +0200
@@ -0,0 +1,13 @@ 
+/* P2290R3 - Delimited escape sequences */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=gnu99 -Wno-c++-compat" { target c } } */
+/* { dg-options "-std=gnu++20" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\u{});		/* { dg-warning "empty delimited escape sequence; treating it as separate tokens" } */
+int c = a\u{);		/* { dg-warning "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{; treating it as separate tokens" } */
+int d = a\u{12XYZ});	/* { dg-warning "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{12; treating it as separate tokens" } */
+int e = a\u123);
+int f = a\U1234567);
--- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-5.c.jj	2022-09-03 11:13:37.570068845 +0200
+++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-5.c	2022-09-03 12:01:35.618124647 +0200
@@ -0,0 +1,13 @@ 
+/* P2290R3 - Delimited escape sequences */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=c17 -Wno-c++-compat" { target c } } */
+/* { dg-options "-std=c++23" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\u{});		/* { dg-warning "empty delimited escape sequence; treating it as separate tokens" "" { target c++23 } } */
+int c = a\u{);		/* { dg-warning "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{; treating it as separate tokens" "" { target c++23 } } */
+int d = a\u{12XYZ});	/* { dg-warning "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{12; treating it as separate tokens" "" { target c++23 } } */
+int e = a\u123);
+int f = a\U1234567);
--- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-6.c.jj	2022-09-03 11:59:36.573778876 +0200
+++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-6.c	2022-09-03 11:59:55.808511591 +0200
@@ -0,0 +1,13 @@ 
+/* P2290R3 - Delimited escape sequences */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=gnu99 -Wno-c++-compat -Wno-unicode" { target c } } */
+/* { dg-options "-std=gnu++20 -Wno-unicode" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\u{});		/* { dg-bogus "empty delimited escape sequence; treating it as separate tokens" } */
+int c = a\u{);		/* { dg-bogus "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{; treating it as separate tokens" } */
+int d = a\u{12XYZ});	/* { dg-bogus "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{12; treating it as separate tokens" } */
+int e = a\u123);
+int f = a\U1234567);
--- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-7.c.jj	2022-09-03 12:01:48.958939255 +0200
+++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-7.c	2022-09-03 12:02:16.765552854 +0200
@@ -0,0 +1,13 @@ 
+/* P2290R3 - Delimited escape sequences */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=c17 -Wno-c++-compat -Wno-unicode" { target c } } */
+/* { dg-options "-std=c++23 -Wno-unicode" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\u{});		/* { dg-bogus "empty delimited escape sequence; treating it as separate tokens" } */
+int c = a\u{);		/* { dg-bogus "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{; treating it as separate tokens" } */
+int d = a\u{12XYZ});	/* { dg-bogus "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{12; treating it as separate tokens" } */
+int e = a\u123);
+int f = a\U1234567);
--- gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-5.c.jj	2022-09-03 11:13:37.570068845 +0200
+++ gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-5.c	2022-09-03 12:12:29.596042747 +0200
@@ -0,0 +1,17 @@ 
+/* P2071R2 - Named universal character escapes */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=gnu99 -Wno-c++-compat" { target c } } */
+/* { dg-options "-std=gnu++20" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\N{});				/* { dg-warning "empty named universal character escape sequence; treating it as separate tokens" } */
+int c = a\N{);				/* { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" } */
+int d = a\N);
+int e = a\NARG);
+int f = a\N{abc});				/* { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" } */
+int g = a\N{ABC.123});				/* { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{ABC; treating it as separate tokens" } */
+int h = a\N{NON-EXISTENT CHAR});	/* { dg-warning "\\\\N\\\{NON-EXISTENT CHAR\\\} is not a valid universal character; treating it as separate tokens" } */
+int i = a\N{Latin_Small_Letter_A_With_Acute});	/* { dg-warning "\\\\N\\\{Latin_Small_Letter_A_With_Acute\\\} is not a valid universal character; treating it as separate tokens" } */
+					/* { dg-message "did you mean \\\\N\\\{LATIN SMALL LETTER A WITH ACUTE\\\}\\?" "" { target *-*-* } .-1 } */
--- gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-6.c.jj	2022-09-03 11:13:37.570068845 +0200
+++ gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-6.c	2022-09-03 11:44:34.558316155 +0200
@@ -0,0 +1,17 @@ 
+/* P2071R2 - Named universal character escapes */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=c17 -Wno-c++-compat" { target c } } */
+/* { dg-options "-std=c++20" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\N{});
+int c = a\N{);
+int d = a\N);
+int e = a\NARG);
+int f = a\N{abc});
+int g = a\N{ABC.123});
+int h = a\N{NON-EXISTENT CHAR});	/* { dg-bogus "is not a valid universal character" } */
+int i = a\N{Latin_Small_Letter_A_With_Acute});
+int j = a\N{LATIN SMALL LETTER A WITH ACUTE});
--- gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-7.c.jj	2022-09-03 12:18:31.296022384 +0200
+++ gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-7.c	2022-09-03 12:19:00.956610699 +0200
@@ -0,0 +1,17 @@ 
+/* P2071R2 - Named universal character escapes */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=gnu99 -Wno-c++-compat -Wno-unicode" { target c } } */
+/* { dg-options "-std=gnu++20 -Wno-unicode" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\N{});				/* { dg-bogus "empty named universal character escape sequence; treating it as separate tokens" } */
+int c = a\N{);				/* { dg-bogus "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" } */
+int d = a\N);
+int e = a\NARG);
+int f = a\N{abc});				/* { dg-bogus "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" } */
+int g = a\N{ABC.123});				/* { dg-bogus "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{ABC; treating it as separate tokens" } */
+int h = a\N{NON-EXISTENT CHAR});	/* { dg-bogus "\\\\N\\\{NON-EXISTENT CHAR\\\} is not a valid universal character; treating it as separate tokens" } */
+int i = a\N{Latin_Small_Letter_A_With_Acute});	/* { dg-bogus "\\\\N\\\{Latin_Small_Letter_A_With_Acute\\\} is not a valid universal character; treating it as separate tokens" } */
+					/* { dg-bogus "did you mean \\\\N\\\{LATIN SMALL LETTER A WITH ACUTE\\\}\\?" "" { target *-*-* } .-1 } */
--- gcc/testsuite/g++.dg/cpp23/named-universal-char-escape1.C.jj	2022-09-03 11:13:37.571068831 +0200
+++ gcc/testsuite/g++.dg/cpp23/named-universal-char-escape1.C	2022-09-03 12:16:49.010442096 +0200
@@ -0,0 +1,16 @@ 
+// P2071R2 - Named universal character escapes
+// { dg-do compile }
+// { dg-require-effective-target wchar }
+
+#define z(x) 0
+#define a z(
+int b = a\N{});				// { dg-warning "empty named universal character escape sequence; treating it as separate tokens" "" { target c++23 } }
+int c = a\N{);				// { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" "" { target c++23 } }
+int d = a\N);
+int e = a\NARG);
+int f = a\N{abc});			// { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" "" { target c++23 } }
+int g = a\N{ABC.123});			// { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{ABC; treating it as separate tokens" "" { target c++23 } }
+int h = a\N{NON-EXISTENT CHAR});	// { dg-error "is not a valid universal character" "" { target c++23 } }
+					// { dg-error "was not declared in this scope" "" { target c++23 } .-1 }
+int i = a\N{Latin_Small_Letter_A_With_Acute});	// { dg-warning "\\\\N\\\{Latin_Small_Letter_A_With_Acute\\\} is not a valid universal character; treating it as separate tokens" "" { target c++23 } }
+					// { dg-message "did you mean \\\\N\\\{LATIN SMALL LETTER A WITH ACUTE\\\}\\?" "" { target c++23 } .-1 }
--- gcc/testsuite/g++.dg/cpp23/named-universal-char-escape2.C.jj	2022-09-03 11:13:37.571068831 +0200
+++ gcc/testsuite/g++.dg/cpp23/named-universal-char-escape2.C	2022-09-03 12:18:03.567407252 +0200
@@ -0,0 +1,18 @@
+// P2071R2 - Named universal character escapes
+// { dg-do compile }
+// { dg-require-effective-target wchar }
+// { dg-options "" }
+
+#define z(x) 0
+#define a z(
+int b = a\N{});				// { dg-warning "empty named universal character escape sequence; treating it as separate tokens" }
+int c = a\N{);				// { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" }
+int d = a\N);
+int e = a\NARG);
+int f = a\N{abc});			// { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" }
+int g = a\N{ABC.123});			// { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{ABC; treating it as separate tokens" }
+int h = a\N{NON-EXISTENT CHAR});	// { dg-error "is not a valid universal character" "" { target c++23 } }
+					// { dg-error "was not declared in this scope" "" { target c++23 } .-1 }
+					// { dg-warning "\\\\N\\\{NON-EXISTENT CHAR\\\} is not a valid universal character; treating it as separate tokens" "" { target c++20_down } .-2 }
+int i = a\N{Latin_Small_Letter_A_With_Acute});	// { dg-warning "\\\\N\\\{Latin_Small_Letter_A_With_Acute\\\} is not a valid universal character; treating it as separate tokens" }
+					// { dg-message "did you mean \\\\N\\\{LATIN SMALL LETTER A WITH ACUTE\\\}\\?" "" { target *-*-* } .-1 }