From patchwork Sat Sep  3 10:29:52 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jakub Jelinek <jakub@redhat.com>
X-Patchwork-Id: 947
Resent-From: Jakub Jelinek <jakub@redhat.com>
Resent-Date: Sat, 3 Sep 2022 12:50:29 +0200
Resent-Message-ID: <YxMxdRMCX62uC+Rp@tucnak>
Resent-To: Jason Merrill <jason@redhat.com>,
 Joseph Myers <joseph@codesourcery.com>, gcc-patches@gcc.gnu.org
Date: Sat, 3 Sep 2022 12:29:52 +0200
To: Jason Merrill <jason@redhat.com>
Subject: [PATCH] libcpp, v3: Named universal character escapes and delimited
 escape sequence tweaks
Message-ID: <YxMsnC5ei4zydz+4@tucnak>
References: <Ywc3pI1lnzq/FvOu@tucnak>
 <alpine.DEB.2.22.394.2208302055240.446383@digraph.polyomino.org.uk>
 <Yw5+nPD8O+JTx3uL@tucnak> <Yw6DA3MhofyzWnje@tucnak>
 <Yw9xsBRmTqkLMlGC@tucnak>
 <5da578e7-9c43-99ea-15c1-aefc641a0654@redhat.com>
 <Yw95MR3YN1aT2ks6@tucnak>
 <df9730f4-d796-7bf6-dd18-d0c9c5a0cf12@redhat.com>
 <YxCULjMrhvN5f7xR@tucnak>
 <37250e6c-80f9-2b93-a381-c1c9b869c04d@redhat.com>
MIME-Version: 1.0
In-Reply-To: <37250e6c-80f9-2b93-a381-c1c9b869c04d@redhat.com>
X-Scanned-By: MIMEDefang 2.78 on 10.11.54.3
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Disposition: inline
X-Spam-Status: No, score=-4.5 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_LOW,
 SPF_HELO_NONE, SPF_NONE, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-Patchwork-Original-From: Jakub Jelinek via Gcc-patches
 <gcc-patches@gcc.gnu.org>
From: Jakub Jelinek <jakub@redhat.com>
Reply-To: Jakub Jelinek <jakub@redhat.com>
Cc: gcc-patches@gcc.gnu.org, Joseph Myers <joseph@codesourcery.com>
Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org
Sender: "Gcc-patches" <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org>

On Thu, Sep 01, 2022 at 03:00:28PM -0400, Jason Merrill wrote:
> We might as well use the same flag name, and document it to mean what it
> currently means for GCC.

OK, the following patch introduces -Wunicode (on by default).

> It looks like this is handling \N{abc}, for which "incomplete" seems like
> the wrong description; it's complete, just wrong, and the diagnostic doesn't
> help correct it.

The patch now also emits the "is not a valid universal character" diagnostic
with a "did you mean" note if the name matches loosely; otherwise it uses
the "not terminated with } after ..." wording.

Ok if it passes bootstrap/regtest?

2022-09-03  Jakub Jelinek  <jakub@redhat.com>

libcpp/
	* include/cpplib.h (struct cpp_options): Add cpp_warn_unicode member.
	(enum cpp_warning_reason): Add CPP_W_UNICODE.
	* init.cc (cpp_create_reader): Initialize cpp_warn_unicode.
	* charset.cc (_cpp_valid_ucn): In possible identifier contexts, don't
	handle \u{ or \N{ specially in -std=c* modes except -std=c++2{3,b}.
	In possible identifier contexts, don't emit an error and punt
	if \N isn't followed by {, or if \N{} surrounds some lower case
	letters or _.  In possible identifier contexts when not C++23, emit
	a warning rather than an error about unknown character names and
	treat them as separate tokens.  When treating \u{ or \N{ as separate
	tokens, emit warnings.
gcc/
	* doc/invoke.texi (-Wno-unicode): Document.
gcc/c-family/
	* c.opt (Winvalid-utf8): Use ObjC instead of objC.  Remove
	" in comments" from description.
	(Wunicode): New option.
gcc/testsuite/
	* c-c++-common/cpp/delimited-escape-seq-4.c: New test.
	* c-c++-common/cpp/delimited-escape-seq-5.c: New test.
	* c-c++-common/cpp/delimited-escape-seq-6.c: New test.
	* c-c++-common/cpp/delimited-escape-seq-7.c: New test.
	* c-c++-common/cpp/named-universal-char-escape-5.c: New test.
	* c-c++-common/cpp/named-universal-char-escape-6.c: New test.
	* c-c++-common/cpp/named-universal-char-escape-7.c: New test.
	* g++.dg/cpp23/named-universal-char-escape1.C: New test.
	* g++.dg/cpp23/named-universal-char-escape2.C: New test.


	Jakub

--- libcpp/include/cpplib.h.jj	2022-09-03 09:35:41.465984642 +0200
+++ libcpp/include/cpplib.h	2022-09-03 11:30:57.250677870 +0200
@@ -565,6 +565,10 @@ struct cpp_options
      2 if it should be a pedwarn.  */
   unsigned char cpp_warn_invalid_utf8;
 
+  /* True if libcpp should warn about invalid forms of delimited or named
+     escape sequences.  */
+  bool cpp_warn_unicode;
+
   /* True if -finput-charset= option has been used explicitly.  */
   bool cpp_input_charset_explicit;
 
@@ -675,7 +679,8 @@ enum cpp_warning_reason {
   CPP_W_CXX20_COMPAT,
   CPP_W_EXPANSION_TO_DEFINED,
   CPP_W_BIDIRECTIONAL,
-  CPP_W_INVALID_UTF8
+  CPP_W_INVALID_UTF8,
+  CPP_W_UNICODE
 };
 
 /* Callback for header lookup for HEADER, which is the name of a
--- libcpp/init.cc.jj	2022-09-01 09:47:23.729892618 +0200
+++ libcpp/init.cc	2022-09-03 11:19:10.954452329 +0200
@@ -228,6 +228,7 @@ cpp_create_reader (enum c_lang lang, cpp
   CPP_OPTION (pfile, warn_date_time) = 0;
   CPP_OPTION (pfile, cpp_warn_bidirectional) = bidirectional_unpaired;
   CPP_OPTION (pfile, cpp_warn_invalid_utf8) = 0;
+  CPP_OPTION (pfile, cpp_warn_unicode) = 1;
   CPP_OPTION (pfile, cpp_input_charset_explicit) = 0;
 
   /* Default CPP arithmetic to something sensible for the host for the
--- libcpp/charset.cc.jj	2022-09-01 14:19:47.462235851 +0200
+++ libcpp/charset.cc	2022-09-03 11:26:14.858585905 +0200
@@ -1448,7 +1448,11 @@ _cpp_valid_ucn (cpp_reader *pfile, const
   if (str[-1] == 'u')
     {
       length = 4;
-      if (str < limit && *str == '{')
+      if (str < limit
+	  && *str == '{'
+	  && (!identifier_pos
+	      || CPP_OPTION (pfile, delimited_escape_seqs)
+	      || !CPP_OPTION (pfile, std)))
 	{
 	  str++;
 	  /* Magic value to indicate no digits seen.  */
@@ -1462,8 +1466,22 @@ _cpp_valid_ucn (cpp_reader *pfile, const
   else if (str[-1] == 'N')
     {
       length = 4;
+      if (identifier_pos
+	  && !CPP_OPTION (pfile, delimited_escape_seqs)
+	  && CPP_OPTION (pfile, std))
+	{
+	  *cp = 0;
+	  return false;
+	}
       if (str == limit || *str != '{')
-	cpp_error (pfile, CPP_DL_ERROR, "'\\N' not followed by '{'");
+	{
+	  if (identifier_pos)
+	    {
+	      *cp = 0;
+	      return false;
+	    }
+	  cpp_error (pfile, CPP_DL_ERROR, "'\\N' not followed by '{'");
+	}
       else
 	{
 	  str++;
@@ -1472,6 +1490,7 @@ _cpp_valid_ucn (cpp_reader *pfile, const
 	  length = 0;
 	  const uchar *name = str;
 	  bool strict = true;
+	  const uchar *strict_end = name;
 
 	  do
 	    {
@@ -1481,7 +1500,11 @@ _cpp_valid_ucn (cpp_reader *pfile, const
 	      if (!ISIDNUM (c) && c != ' ' && c != '-')
 		break;
 	      if (ISLOWER (c) || c == '_')
-		strict = false;
+		{
+		  if (strict)
+		    strict_end = str;
+		  strict = false;
+		}
 	      str++;
 	      extend_char_range (char_range, loc_reader);
 	    }
@@ -1489,8 +1512,35 @@ _cpp_valid_ucn (cpp_reader *pfile, const
 
 	  if (str < limit && *str == '}')
 	    {
-	      if (name == str && identifier_pos)
+	      if (identifier_pos && (name == str || !strict))
 		{
+		  if (name == str)
+		    cpp_warning (pfile, CPP_W_UNICODE,
+				 "empty named universal character escape "
+				 "sequence; treating it as separate tokens");
+		  else
+		    {
+		      char canon_name[uname2c_max_name_len + 1];
+		      result = _cpp_uname2c_uax44_lm2 ((const char *) name,
+						       str - name, canon_name);
+		      if (result == (cppchar_t) -1)
+			cpp_warning (pfile, CPP_W_UNICODE,
+				     "'\\N{' not terminated with '}' after "
+				     "%.*s; treating it as separate tokens",
+				     (int) (strict_end - base), base);
+		      else
+			{
+			  bool ret
+			    = cpp_warning (pfile, CPP_W_UNICODE,
+					   "\\N{%.*s} is not a valid "
+					   "universal character; treating it "
+					   "as separate tokens",
+					   (int) (str - name), name);
+			  if (ret)
+			    cpp_error (pfile, CPP_DL_NOTE,
+				       "did you mean \\N{%s}?", canon_name);
+			}
+		    }
 		  *cp = 0;
 		  return false;
 		}
@@ -1515,27 +1565,49 @@ _cpp_valid_ucn (cpp_reader *pfile, const
 					   uname2c_tree, NULL);
 		  if (result == (cppchar_t) -1)
 		    {
-		      cpp_error (pfile, CPP_DL_ERROR,
-				 "\\N{%.*s} is not a valid universal "
-				 "character", (int) (str - name), name);
+		      bool ret = true;
+		      if (identifier_pos
+			  && !CPP_OPTION (pfile, delimited_escape_seqs))
+			ret = cpp_warning (pfile, CPP_W_UNICODE,
+					   "\\N{%.*s} is not a valid "
+					   "universal character; treating it "
+					   "as separate tokens",
+					   (int) (str - name), name);
+		      else
+			cpp_error (pfile, CPP_DL_ERROR,
+				   "\\N{%.*s} is not a valid universal "
+				   "character", (int) (str - name), name);
 
 		      /* Try to do a loose name lookup according to
 			 Unicode loose matching rule UAX44-LM2.  */
 		      char canon_name[uname2c_max_name_len + 1];
 		      result = _cpp_uname2c_uax44_lm2 ((const char *) name,
 						       str - name, canon_name);
-		      if (result != (cppchar_t) -1)
+		      if (result != (cppchar_t) -1 && ret)
 			cpp_error (pfile, CPP_DL_NOTE,
 				   "did you mean \\N{%s}?", canon_name);
 		      else
-			result = 0x40;
+			result = 0xC0;
+		      if (identifier_pos
+			  && !CPP_OPTION (pfile, delimited_escape_seqs))
+			{
+			  *cp = 0;
+			  return false;
+			}
 		    }
 		}
 	      str++;
 	      extend_char_range (char_range, loc_reader);
 	    }
 	  else if (identifier_pos)
-	    length = 1;
+	    {
+	      cpp_warning (pfile, CPP_W_UNICODE,
+			   "'\\N{' not terminated with '}' after %.*s; "
+			   "treating it as separate tokens",
+			   (int) (str - base), base);
+	      *cp = 0;
+	      return false;
+	    }
 	  else
 	    {
 	      cpp_error (pfile, CPP_DL_ERROR,
@@ -1584,12 +1656,17 @@ _cpp_valid_ucn (cpp_reader *pfile, const
       }
     while (--length);
 
-  if (delimited
-      && str < limit
-      && *str == '}'
-      && (length != 32 || !identifier_pos))
+  if (delimited && str < limit && *str == '}')
     {
-      if (length == 32)
+      if (length == 32 && identifier_pos)
+	{
+	  cpp_warning (pfile, CPP_W_UNICODE,
+		       "empty delimited escape sequence; "
+		       "treating it as separate tokens");
+	  *cp = 0;
+	  return false;
+	}
+      else if (length == 32)
 	cpp_error (pfile, CPP_DL_ERROR,
 		   "empty delimited escape sequence");
       else if (!CPP_OPTION (pfile, delimited_escape_seqs)
@@ -1607,6 +1684,11 @@ _cpp_valid_ucn (cpp_reader *pfile, const
      error message in that case.  */
   if (length && identifier_pos)
     {
+      if (delimited)
+	cpp_warning (pfile, CPP_W_UNICODE,
+		     "'\\u{' not terminated with '}' after %.*s; "
+		     "treating it as separate tokens",
+		     (int) (str - base), base);
       *cp = 0;
       return false;
     }
--- gcc/doc/invoke.texi.jj	2022-09-03 09:35:40.966991672 +0200
+++ gcc/doc/invoke.texi	2022-09-03 11:39:03.875914845 +0200
@@ -365,7 +365,7 @@ Objective-C and Objective-C++ Dialects}.
 -Winfinite-recursion @gol
 -Winit-self  -Winline  -Wno-int-conversion  -Wint-in-bool-context @gol
 -Wno-int-to-pointer-cast  -Wno-invalid-memory-model @gol
--Winvalid-pch  -Winvalid-utf8 -Wjump-misses-init  @gol
+-Winvalid-pch  -Winvalid-utf8  -Wno-unicode  -Wjump-misses-init  @gol
 -Wlarger-than=@var{byte-size}  -Wlogical-not-parentheses  -Wlogical-op  @gol
 -Wlong-long  -Wno-lto-type-mismatch -Wmain  -Wmaybe-uninitialized @gol
 -Wmemset-elt-size  -Wmemset-transposed-args @gol
@@ -9577,6 +9577,12 @@ Warn if an invalid UTF-8 character is fo
 This warning is on by default for C++23 if @option{-finput-charset=UTF-8}
 is used and turned into error with @option{-pedantic-errors}.
 
+@item -Wno-unicode
+@opindex Wunicode
+@opindex Wno-unicode
+Don't diagnose invalid forms of delimited or named escape sequences which are
+treated as separate tokens.  @option{-Wunicode} is enabled by default.
+
 @item -Wlong-long
 @opindex Wlong-long
 @opindex Wno-long-long
--- gcc/c-family/c.opt.jj	2022-09-03 09:35:40.206002393 +0200
+++ gcc/c-family/c.opt	2022-09-03 11:17:04.529201926 +0200
@@ -822,8 +822,8 @@ C ObjC C++ ObjC++ CPP(warn_invalid_pch)
 Warn about PCH files that are found but not used.
 
 Winvalid-utf8
-C objC C++ ObjC++ CPP(cpp_warn_invalid_utf8) CppReason(CPP_W_INVALID_UTF8) Var(warn_invalid_utf8) Init(0) Warning
-Warn about invalid UTF-8 characters in comments.
+C ObjC C++ ObjC++ CPP(cpp_warn_invalid_utf8) CppReason(CPP_W_INVALID_UTF8) Var(warn_invalid_utf8) Init(0) Warning
+Warn about invalid UTF-8 characters.
 
 Wjump-misses-init
 C ObjC Var(warn_jump_misses_init) Warning LangEnabledby(C ObjC,Wc++-compat)
@@ -1345,6 +1345,10 @@ Wundef
 C ObjC C++ ObjC++ CPP(warn_undef) CppReason(CPP_W_UNDEF) Var(cpp_warn_undef) Init(0) Warning
 Warn if an undefined macro is used in an #if directive.
 
+Wunicode
+C ObjC C++ ObjC++ CPP(cpp_warn_unicode) CppReason(CPP_W_UNICODE) Var(warn_unicode) Init(1) Warning
+Warn about invalid forms of delimited or named escape sequences.
+
 Wuninitialized
 C ObjC C++ ObjC++ LTO LangEnabledBy(C ObjC C++ ObjC++ LTO,Wall)
 ;
--- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-4.c.jj	2022-09-03 11:13:37.570068845 +0200
+++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-4.c	2022-09-03 11:56:52.818054420 +0200
@@ -0,0 +1,13 @@
+/* P2290R3 - Delimited escape sequences */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=gnu99 -Wno-c++-compat" { target c } } */
+/* { dg-options "-std=gnu++20" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\u{});		/* { dg-warning "empty delimited escape sequence; treating it as separate tokens" } */
+int c = a\u{);		/* { dg-warning "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{; treating it as separate tokens" } */
+int d = a\u{12XYZ});	/* { dg-warning "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{12; treating it as separate tokens" } */
+int e = a\u123);
+int f = a\U1234567);
--- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-5.c.jj	2022-09-03 11:13:37.570068845 +0200
+++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-5.c	2022-09-03 12:01:35.618124647 +0200
@@ -0,0 +1,13 @@
+/* P2290R3 - Delimited escape sequences */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=c17 -Wno-c++-compat" { target c } } */
+/* { dg-options "-std=c++23" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\u{});		/* { dg-warning "empty delimited escape sequence; treating it as separate tokens" "" { target c++23 } } */
+int c = a\u{);		/* { dg-warning "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{; treating it as separate tokens" "" { target c++23 } } */
+int d = a\u{12XYZ});	/* { dg-warning "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{12; treating it as separate tokens" "" { target c++23 } } */
+int e = a\u123);
+int f = a\U1234567);
--- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-6.c.jj	2022-09-03 11:59:36.573778876 +0200
+++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-6.c	2022-09-03 11:59:55.808511591 +0200
@@ -0,0 +1,13 @@
+/* P2290R3 - Delimited escape sequences */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=gnu99 -Wno-c++-compat -Wno-unicode" { target c } } */
+/* { dg-options "-std=gnu++20 -Wno-unicode" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\u{});		/* { dg-bogus "empty delimited escape sequence; treating it as separate tokens" } */
+int c = a\u{);		/* { dg-bogus "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{; treating it as separate tokens" } */
+int d = a\u{12XYZ});	/* { dg-bogus "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{12; treating it as separate tokens" } */
+int e = a\u123);
+int f = a\U1234567);
--- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-7.c.jj	2022-09-03 12:01:48.958939255 +0200
+++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-7.c	2022-09-03 12:02:16.765552854 +0200
@@ -0,0 +1,13 @@
+/* P2290R3 - Delimited escape sequences */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=c17 -Wno-c++-compat -Wno-unicode" { target c } } */
+/* { dg-options "-std=c++23 -Wno-unicode" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\u{});		/* { dg-bogus "empty delimited escape sequence; treating it as separate tokens" } */
+int c = a\u{);		/* { dg-bogus "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{; treating it as separate tokens" } */
+int d = a\u{12XYZ});	/* { dg-bogus "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{12; treating it as separate tokens" } */
+int e = a\u123);
+int f = a\U1234567);
--- gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-5.c.jj	2022-09-03 11:13:37.570068845 +0200
+++ gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-5.c	2022-09-03 12:12:29.596042747 +0200
@@ -0,0 +1,17 @@
+/* P2071R2 - Named universal character escapes */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=gnu99 -Wno-c++-compat" { target c } } */
+/* { dg-options "-std=gnu++20" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\N{});				/* { dg-warning "empty named universal character escape sequence; treating it as separate tokens" } */
+int c = a\N{);				/* { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" } */
+int d = a\N);
+int e = a\NARG);
+int f = a\N{abc});				/* { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" } */
+int g = a\N{ABC.123});				/* { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{ABC; treating it as separate tokens" } */
+int h = a\N{NON-EXISTENT CHAR});	/* { dg-warning "\\\\N\\\{NON-EXISTENT CHAR\\\} is not a valid universal character; treating it as separate tokens" } */
+int i = a\N{Latin_Small_Letter_A_With_Acute});	/* { dg-warning "\\\\N\\\{Latin_Small_Letter_A_With_Acute\\\} is not a valid universal character; treating it as separate tokens" } */
+					/* { dg-message "did you mean \\\\N\\\{LATIN SMALL LETTER A WITH ACUTE\\\}\\?" "" { target *-*-* } .-1 } */
--- gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-6.c.jj	2022-09-03 11:13:37.570068845 +0200
+++ gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-6.c	2022-09-03 11:44:34.558316155 +0200
@@ -0,0 +1,17 @@
+/* P2071R2 - Named universal character escapes */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=c17 -Wno-c++-compat" { target c } } */
+/* { dg-options "-std=c++20" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\N{});
+int c = a\N{);
+int d = a\N);
+int e = a\NARG);
+int f = a\N{abc});
+int g = a\N{ABC.123});
+int h = a\N{NON-EXISTENT CHAR});	/* { dg-bogus "is not a valid universal character" } */
+int i = a\N{Latin_Small_Letter_A_With_Acute});
+int j = a\N{LATIN SMALL LETTER A WITH ACUTE});
--- gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-7.c.jj	2022-09-03 12:18:31.296022384 +0200
+++ gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-7.c	2022-09-03 12:19:00.956610699 +0200
@@ -0,0 +1,17 @@
+/* P2071R2 - Named universal character escapes */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=gnu99 -Wno-c++-compat -Wno-unicode" { target c } } */
+/* { dg-options "-std=gnu++20 -Wno-unicode" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\N{});				/* { dg-bogus "empty named universal character escape sequence; treating it as separate tokens" } */
+int c = a\N{);				/* { dg-bogus "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" } */
+int d = a\N);
+int e = a\NARG);
+int f = a\N{abc});				/* { dg-bogus "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" } */
+int g = a\N{ABC.123});				/* { dg-bogus "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{ABC; treating it as separate tokens" } */
+int h = a\N{NON-EXISTENT CHAR});	/* { dg-bogus "\\\\N\\\{NON-EXISTENT CHAR\\\} is not a valid universal character; treating it as separate tokens" } */
+int i = a\N{Latin_Small_Letter_A_With_Acute});	/* { dg-bogus "\\\\N\\\{Latin_Small_Letter_A_With_Acute\\\} is not a valid universal character; treating it as separate tokens" } */
+					/* { dg-bogus "did you mean \\\\N\\\{LATIN SMALL LETTER A WITH ACUTE\\\}\\?" "" { target *-*-* } .-1 } */
--- gcc/testsuite/g++.dg/cpp23/named-universal-char-escape1.C.jj	2022-09-03 11:13:37.571068831 +0200
+++ gcc/testsuite/g++.dg/cpp23/named-universal-char-escape1.C	2022-09-03 12:16:49.010442096 +0200
@@ -0,0 +1,16 @@
+// P2071R2 - Named universal character escapes
+// { dg-do compile }
+// { dg-require-effective-target wchar }
+
+#define z(x) 0
+#define a z(
+int b = a\N{});				// { dg-warning "empty named universal character escape sequence; treating it as separate tokens" "" { target c++23 } }
+int c = a\N{);				// { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" "" { target c++23 } }
+int d = a\N);
+int e = a\NARG);
+int f = a\N{abc});			// { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" "" { target c++23 } }
+int g = a\N{ABC.123});			// { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{ABC; treating it as separate tokens" "" { target c++23 } }
+int h = a\N{NON-EXISTENT CHAR});	// { dg-error "is not a valid universal character" "" { target c++23 } }
+					// { dg-error "was not declared in this scope" "" { target c++23 } .-1 }
+int i = a\N{Latin_Small_Letter_A_With_Acute});	// { dg-warning "\\\\N\\\{Latin_Small_Letter_A_With_Acute\\\} is not a valid universal character; treating it as separate tokens" "" { target c++23 } }
+					// { dg-message "did you mean \\\\N\\\{LATIN SMALL LETTER A WITH ACUTE\\\}\\?" "" { target c++23 } .-1 }
--- gcc/testsuite/g++.dg/cpp23/named-universal-char-escape2.C.jj	2022-09-03 11:13:37.571068831 +0200
+++ gcc/testsuite/g++.dg/cpp23/named-universal-char-escape2.C	2022-09-03 12:18:03.567407252 +0200
@@ -0,0 +1,18 @@
+// P2071R2 - Named universal character escapes
+// { dg-do compile }
+// { dg-require-effective-target wchar }
+// { dg-options "" }
+
+#define z(x) 0
+#define a z(
+int b = a\N{});				// { dg-warning "empty named universal character escape sequence; treating it as separate tokens" }
+int c = a\N{);				// { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" }
+int d = a\N);
+int e = a\NARG);
+int f = a\N{abc});			// { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" }
+int g = a\N{ABC.123});			// { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{ABC; treating it as separate tokens" }
+int h = a\N{NON-EXISTENT CHAR});	// { dg-error "is not a valid universal character" "" { target c++23 } }
+					// { dg-error "was not declared in this scope" "" { target c++23 } .-1 }
+					// { dg-warning "\\\\N\\\{NON-EXISTENT CHAR\\\} is not a valid universal character; treating it as separate tokens" "" { target c++20_down } .-2 }
+int i = a\N{Latin_Small_Letter_A_With_Acute});	// { dg-warning "\\\\N\\\{Latin_Small_Letter_A_With_Acute\\\} is not a valid universal character; treating it as separate tokens" }
+					// { dg-message "did you mean \\\\N\\\{LATIN SMALL LETTER A WITH ACUTE\\\}\\?" "" { target *-*-* } .-1 }