From patchwork Wed Aug 31 14:15:55 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 870
Date: Wed, 31 Aug 2022 16:15:55 +0200
From: Jakub Jelinek
Reply-To: Jakub Jelinek
To: Jason Merrill
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] libcpp, v4: Add -Winvalid-utf8 warning [PR106655]
References: <53c4b971-4f14-848c-e921-e10d6f18407f@redhat.com> <95a40255-7d7e-4a59-e0b0-589ee5770238@redhat.com>
In-Reply-To: <95a40255-7d7e-4a59-e0b0-589ee5770238@redhat.com>
List-Id: Gcc-patches mailing list

On Wed, Aug 31, 2022 at 09:55:29AM -0400, Jason Merrill wrote:
> On 8/31/22 07:14, Jakub Jelinek wrote:
> > On Tue, Aug 30, 2022 at 05:51:26PM -0400, Jason Merrill wrote:
> > > This hunk now seems worth factoring out of the four places it occurs.
> > >
> > > It also seems the comment for _cpp_valid_utf8 needs to be updated: it
> > > currently says it's not called when parsing a string.
> >
> > Ok, so like this?
>
> OK, thanks.

Actually, it isn't enough to diagnose this in comments and
character/string literals, sorry for finding that out only today.

We don't accept invalid UTF-8 in identifiers: it fails the checking
there (most of the time without errors), so we create CPP_OTHER tokens
out of those bytes and then typically diagnose them when the token is
used somewhere.  Except the token doesn't have to be used anywhere; it
can be discarded.  So if we have say

#define I(x)
I(���)

like in the Winvalid-utf8-3.c test, we silently accept it.

This updated version extends the diagnostics even to those cases.
I can't use _cpp_handle_multibyte_utf8 in that case because it needs
different treatment (no bidi stuff, which is already emitted from
forms_identifier_p etc.).

Tested so far on the new tests, ok for trunk if it passes full
bootstrap/regtest?
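For readers who want a concrete picture of what "invalid UTF-8
character" means here, below is a minimal standalone sketch.  It is not
libcpp code: utf8_valid_len is a hypothetical helper name, and the
checks it performs (no stray continuation bytes, no overlong forms, no
surrogates, nothing above 0x10FFFF) are the usual UTF-8 well-formedness
rules, not a copy of what _cpp_valid_utf8 does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libcpp.  Return the length in bytes
   of the well-formed UTF-8 sequence starting at P, looking at no more
   than AVAIL bytes, or 0 if the bytes there are not valid UTF-8.  */

static size_t
utf8_valid_len (const unsigned char *p, size_t avail)
{
  unsigned long c, min;
  size_t n, i;

  assert (p != NULL);
  if (avail == 0)
    return 0;

  if (p[0] < 0x80)
    return 1;			/* Plain ASCII.  */
  else if (p[0] < 0xC0)
    return 0;			/* Stray continuation byte.  */
  else if (p[0] < 0xE0)
    {
      n = 2;
      c = p[0] & 0x1F;
      min = 0x80;
    }
  else if (p[0] < 0xF0)
    {
      n = 3;
      c = p[0] & 0x0F;
      min = 0x800;
    }
  else if (p[0] < 0xF8)
    {
      n = 4;
      c = p[0] & 0x07;
      min = 0x10000;
    }
  else
    return 0;			/* 0xF8..0xFF never start a sequence.  */

  if (avail < n)
    return 0;			/* Truncated sequence.  */

  for (i = 1; i < n; i++)
    {
      if ((p[i] & 0xC0) != 0x80)
	return 0;		/* Expected a continuation byte.  */
      c = (c << 6) | (p[i] & 0x3F);
    }

  if (c < min)
    return 0;			/* Overlong encoding.  */
  if (c >= 0xD800 && c <= 0xDFFF)
    return 0;			/* UTF-16 surrogate.  */
  if (c > 0x10FFFF)
    return 0;			/* Beyond the UCS codespace.  */
  return n;
}
```

With this sketch, the three bytes E2 82 AC are accepted as one
character of length 3, while a lone 0x80, the overlong pair C0 80 and
the out-of-range sequence F4 90 80 80 all yield 0, which is the kind of
input the new warning is about.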

2022-08-31  Jakub Jelinek

	PR c++/106655
libcpp/
	* include/cpplib.h (struct cpp_options): Implement C++23
	P2295R6 - Support for UTF-8 as a portable source file encoding.
	Add cpp_warn_invalid_utf8 and cpp_input_charset_explicit fields.
	(enum cpp_warning_reason): Add CPP_W_INVALID_UTF8 enumerator.
	* init.cc (cpp_create_reader): Initialize cpp_warn_invalid_utf8
	and cpp_input_charset_explicit.
	* charset.cc (_cpp_valid_utf8): Adjust function comment.
	* lex.cc (UCS_LIMIT): Define.
	(utf8_continuation): New const variable.
	(utf8_signifier): Move earlier in the file.
	(_cpp_warn_invalid_utf8, _cpp_handle_multibyte_utf8): New functions.
	(_cpp_skip_block_comment): Handle -Winvalid-utf8 warning.
	(skip_line_comment): Likewise.
	(lex_raw_string, lex_string): Likewise.
	(_cpp_lex_direct): Likewise.
gcc/
	* doc/invoke.texi (-Winvalid-utf8): Document it.
gcc/c-family/
	* c.opt (-Winvalid-utf8): New warning.
	* c-opts.cc (c_common_handle_option) <case OPT_finput_charset_>:
	Set cpp_opts->cpp_input_charset_explicit.
	(c_common_post_options): If -finput-charset=UTF-8 is explicit
	in C++23, enable -Winvalid-utf8 by default and if -pedantic
	or -pedantic-errors, make it a pedwarn.
gcc/testsuite/
	* c-c++-common/cpp/Winvalid-utf8-1.c: New test.
	* c-c++-common/cpp/Winvalid-utf8-2.c: New test.
	* c-c++-common/cpp/Winvalid-utf8-3.c: New test.
	* g++.dg/cpp23/Winvalid-utf8-1.C: New test.
	* g++.dg/cpp23/Winvalid-utf8-2.C: New test.
	* g++.dg/cpp23/Winvalid-utf8-3.C: New test.
	* g++.dg/cpp23/Winvalid-utf8-4.C: New test.
	* g++.dg/cpp23/Winvalid-utf8-5.C: New test.
	* g++.dg/cpp23/Winvalid-utf8-6.C: New test.
	* g++.dg/cpp23/Winvalid-utf8-7.C: New test.
	* g++.dg/cpp23/Winvalid-utf8-8.C: New test.
	* g++.dg/cpp23/Winvalid-utf8-9.C: New test.
	* g++.dg/cpp23/Winvalid-utf8-10.C: New test.
	* g++.dg/cpp23/Winvalid-utf8-11.C: New test.
	* g++.dg/cpp23/Winvalid-utf8-12.C: New test.

	Jakub

--- libcpp/include/cpplib.h.jj	2022-08-31 10:19:45.226452609 +0200
+++ libcpp/include/cpplib.h	2022-08-31 12:25:42.451125755 +0200
@@ -560,6 +560,13 @@ struct cpp_options
      cpp_bidirectional_level.  */
   unsigned char cpp_warn_bidirectional;
 
+  /* True if libcpp should warn about invalid UTF-8 characters in comments.
+     2 if it should be a pedwarn.  */
+  unsigned char cpp_warn_invalid_utf8;
+
+  /* True if -finput-charset= option has been used explicitly.  */
+  bool cpp_input_charset_explicit;
+
   /* Dependency generation.  */
   struct
   {
@@ -666,7 +673,8 @@ enum cpp_warning_reason {
   CPP_W_CXX11_COMPAT,
   CPP_W_CXX20_COMPAT,
   CPP_W_EXPANSION_TO_DEFINED,
-  CPP_W_BIDIRECTIONAL
+  CPP_W_BIDIRECTIONAL,
+  CPP_W_INVALID_UTF8
 };
 
 /* Callback for header lookup for HEADER, which is the name of a
--- libcpp/init.cc.jj	2022-08-31 10:19:45.260452148 +0200
+++ libcpp/init.cc	2022-08-31 12:25:42.451125755 +0200
@@ -227,6 +227,8 @@ cpp_create_reader (enum c_lang lang, cpp
   CPP_OPTION (pfile, ext_numeric_literals) = 1;
   CPP_OPTION (pfile, warn_date_time) = 0;
   CPP_OPTION (pfile, cpp_warn_bidirectional) = bidirectional_unpaired;
+  CPP_OPTION (pfile, cpp_warn_invalid_utf8) = 0;
+  CPP_OPTION (pfile, cpp_input_charset_explicit) = 0;
 
   /* Default CPP arithmetic to something sensible for the host for the
      benefit of dumb users like fix-header.  */
--- libcpp/charset.cc.jj	2022-08-26 16:06:10.578493272 +0200
+++ libcpp/charset.cc	2022-08-31 12:34:18.921176118 +0200
@@ -1742,9 +1742,9 @@ convert_ucn (cpp_reader *pfile, const uc
    case, no diagnostic is emitted, and the return value of FALSE should cause
    a new token to be formed.
 
-   Unlike _cpp_valid_ucn, this will never be called when lexing a string; only
-   a potential identifier, or a CPP_OTHER token.  NST is unused in the latter
-   case.
+   _cpp_valid_utf8 can be called when lexing a potential identifier, or a
+   CPP_OTHER token or for the purposes of -Winvalid-utf8 warning in string or
+   character literals.  NST is unused when not in a potential identifier.
 
    As in _cpp_valid_ucn, IDENTIFIER_POS is 0 when not in an identifier, 1 for
    the start of an identifier, or 2 otherwise.  */
--- libcpp/lex.cc.jj	2022-08-31 10:19:45.327451236 +0200
+++ libcpp/lex.cc	2022-08-31 15:23:53.753556178 +0200
@@ -50,6 +50,9 @@ static const struct token_spelling token
 #define TOKEN_SPELL(token) (token_spellings[(token)->type].category)
 #define TOKEN_NAME(token) (token_spellings[(token)->type].name)
 
+/* ISO 10646 defines the UCS codespace as the range 0-0x10FFFF inclusive.  */
+#define UCS_LIMIT 0x10FFFF
+
 static void add_line_note (cpp_buffer *, const uchar *, unsigned int);
 static int skip_line_comment (cpp_reader *);
 static void skip_whitespace (cpp_reader *, cppchar_t);
@@ -1704,6 +1707,120 @@ maybe_warn_bidi_on_char (cpp_reader *pfi
   bidi::on_char (kind, ucn_p, loc);
 }
 
+static const cppchar_t utf8_continuation = 0x80;
+static const cppchar_t utf8_signifier = 0xC0;
+
+/* Emit -Winvalid-utf8 warning on invalid UTF-8 character starting
+   at PFILE->buffer->cur.  Return a pointer after the diagnosed
+   invalid character.  */
+
+static const uchar *
+_cpp_warn_invalid_utf8 (cpp_reader *pfile)
+{
+  cpp_buffer *buffer = pfile->buffer;
+  const uchar *cur = buffer->cur;
+  bool pedantic = (CPP_PEDANTIC (pfile)
+                   && CPP_OPTION (pfile, cpp_warn_invalid_utf8) == 2);
+
+  if (cur[0] < utf8_signifier
+      || cur[1] < utf8_continuation || cur[1] >= utf8_signifier)
+    {
+      if (pedantic)
+        cpp_error_with_line (pfile, CPP_DL_PEDWARN,
+                             pfile->line_table->highest_line,
+                             CPP_BUF_COL (buffer),
+                             "invalid UTF-8 character <%x>",
+                             cur[0]);
+      else
+        cpp_warning_with_line (pfile, CPP_W_INVALID_UTF8,
+                               pfile->line_table->highest_line,
+                               CPP_BUF_COL (buffer),
+                               "invalid UTF-8 character <%x>",
+                               cur[0]);
+      return cur + 1;
+    }
+  else if (cur[2] < utf8_continuation || cur[2] >= utf8_signifier)
+    {
+      if (pedantic)
+        cpp_error_with_line (pfile, CPP_DL_PEDWARN,
+                             pfile->line_table->highest_line,
+                             CPP_BUF_COL (buffer),
+                             "invalid UTF-8 character <%x><%x>",
+                             cur[0], cur[1]);
+      else
+        cpp_warning_with_line (pfile, CPP_W_INVALID_UTF8,
+                               pfile->line_table->highest_line,
+                               CPP_BUF_COL (buffer),
+                               "invalid UTF-8 character <%x><%x>",
+                               cur[0], cur[1]);
+      return cur + 2;
+    }
+  else if (cur[3] < utf8_continuation || cur[3] >= utf8_signifier)
+    {
+      if (pedantic)
+        cpp_error_with_line (pfile, CPP_DL_PEDWARN,
+                             pfile->line_table->highest_line,
+                             CPP_BUF_COL (buffer),
+                             "invalid UTF-8 character <%x><%x><%x>",
+                             cur[0], cur[1], cur[2]);
+      else
+        cpp_warning_with_line (pfile, CPP_W_INVALID_UTF8,
+                               pfile->line_table->highest_line,
+                               CPP_BUF_COL (buffer),
+                               "invalid UTF-8 character <%x><%x><%x>",
+                               cur[0], cur[1], cur[2]);
+      return cur + 3;
+    }
+  else
+    {
+      if (pedantic)
+        cpp_error_with_line (pfile, CPP_DL_PEDWARN,
+                             pfile->line_table->highest_line,
+                             CPP_BUF_COL (buffer),
+                             "invalid UTF-8 character <%x><%x><%x><%x>",
+                             cur[0], cur[1], cur[2], cur[3]);
+      else
+        cpp_warning_with_line (pfile, CPP_W_INVALID_UTF8,
+                               pfile->line_table->highest_line,
+                               CPP_BUF_COL (buffer),
+                               "invalid UTF-8 character <%x><%x><%x><%x>",
+                               cur[0], cur[1], cur[2], cur[3]);
+      return cur + 4;
+    }
+}
+
+/* Helper function of *skip_*_comment and lex*_string.  For C,
+   character at CUR[-1] with MSB set handle -Wbidi-chars* and
+   -Winvalid-utf8 diagnostics and return pointer to first character
+   that should be processed next.  */
+
+static inline const uchar *
+_cpp_handle_multibyte_utf8 (cpp_reader *pfile, uchar c,
+                            const uchar *cur, bool warn_bidi_p,
+                            bool warn_invalid_utf8_p)
+{
+  /* If this is a beginning of a UTF-8 encoding, it might be
+     a bidirectional control character.  */
+  if (c == bidi::utf8_start && warn_bidi_p)
+    {
+      location_t loc;
+      bidi::kind kind = get_bidi_utf8 (pfile, cur - 1, &loc);
+      maybe_warn_bidi_on_char (pfile, kind, /*ucn_p=*/false, loc);
+    }
+  if (!warn_invalid_utf8_p)
+    return cur;
+  if (c >= utf8_signifier)
+    {
+      cppchar_t s;
+      const uchar *pstr = cur - 1;
+      if (_cpp_valid_utf8 (pfile, &pstr, pfile->buffer->rlimit, 0, NULL, &s)
+          && s <= UCS_LIMIT)
+        return pstr;
+    }
+  pfile->buffer->cur = cur - 1;
+  return _cpp_warn_invalid_utf8 (pfile);
+}
+
 /* Skip a C-style block comment.  We find the end of the comment by
    seeing if an asterisk is before every '/' we encounter.  Returns
    nonzero if comment terminated by EOF, zero otherwise.
@@ -1716,6 +1833,8 @@ _cpp_skip_block_comment (cpp_reader *pfi
   const uchar *cur = buffer->cur;
   uchar c;
   const bool warn_bidi_p = pfile->warn_bidi_p ();
+  const bool warn_invalid_utf8_p = CPP_OPTION (pfile, cpp_warn_invalid_utf8);
+  const bool warn_bidi_or_invalid_utf8_p = warn_bidi_p | warn_invalid_utf8_p;
 
   cur++;
   if (*cur == '/')
@@ -1765,14 +1884,10 @@ _cpp_skip_block_comment (cpp_reader *pfi
 
           cur = buffer->cur;
         }
-      /* If this is a beginning of a UTF-8 encoding, it might be
-         a bidirectional control character.  */
-      else if (__builtin_expect (c == bidi::utf8_start, 0) && warn_bidi_p)
-        {
-          location_t loc;
-          bidi::kind kind = get_bidi_utf8 (pfile, cur - 1, &loc);
-          maybe_warn_bidi_on_char (pfile, kind, /*ucn_p=*/false, loc);
-        }
+      else if (__builtin_expect (c >= utf8_continuation, 0)
+               && warn_bidi_or_invalid_utf8_p)
+        cur = _cpp_handle_multibyte_utf8 (pfile, c, cur, warn_bidi_p,
+                                          warn_invalid_utf8_p);
     }
 
   buffer->cur = cur;
@@ -1789,11 +1904,13 @@ skip_line_comment (cpp_reader *pfile)
   cpp_buffer *buffer = pfile->buffer;
   location_t orig_line = pfile->line_table->highest_line;
   const bool warn_bidi_p = pfile->warn_bidi_p ();
+  const bool warn_invalid_utf8_p = CPP_OPTION (pfile, cpp_warn_invalid_utf8);
+  const bool warn_bidi_or_invalid_utf8_p = warn_bidi_p | warn_invalid_utf8_p;
 
-  if (!warn_bidi_p)
+  if (!warn_bidi_or_invalid_utf8_p)
     while (*buffer->cur != '\n')
       buffer->cur++;
-  else
+  else if (!warn_invalid_utf8_p)
     {
       while (*buffer->cur != '\n'
              && *buffer->cur != bidi::utf8_start)
@@ -1813,6 +1930,22 @@ skip_line_comment (cpp_reader *pfile)
           maybe_warn_bidi_on_close (pfile, buffer->cur);
         }
     }
+  else
+    {
+      while (*buffer->cur != '\n')
+        {
+          if (*buffer->cur < utf8_continuation)
+            {
+              buffer->cur++;
+              continue;
+            }
+          buffer->cur
+            = _cpp_handle_multibyte_utf8 (pfile, *buffer->cur, buffer->cur + 1,
+                                          warn_bidi_p, warn_invalid_utf8_p);
+        }
+      if (warn_bidi_p)
+        maybe_warn_bidi_on_close (pfile, buffer->cur);
+    }
 
   _cpp_process_line_notes (pfile, true);
   return orig_line != pfile->line_table->highest_line;
@@ -1919,8 +2052,6 @@ warn_about_normalization (cpp_reader *pf
     }
 }
 
-static const cppchar_t utf8_signifier = 0xC0;
-
 /* Returns TRUE if the sequence starting at buffer->cur is valid in
    an identifier.  FIRST is TRUE if this starts an identifier.  */
 
@@ -2361,6 +2492,8 @@ lex_raw_string (cpp_reader *pfile, cpp_t
 {
   const uchar *pos = base;
   const bool warn_bidi_p = pfile->warn_bidi_p ();
+  const bool warn_invalid_utf8_p = CPP_OPTION (pfile, cpp_warn_invalid_utf8);
+  const bool warn_bidi_or_invalid_utf8_p = warn_bidi_p | warn_invalid_utf8_p;
 
   /* 'tis a pity this information isn't passed down from the lexer's
      initial categorization of the token.  */
@@ -2597,13 +2730,10 @@ lex_raw_string (cpp_reader *pfile, cpp_t
           pos = base = pfile->buffer->cur;
           note = &pfile->buffer->notes[pfile->buffer->cur_note];
         }
-      else if (__builtin_expect ((unsigned char) c == bidi::utf8_start, 0)
-               && warn_bidi_p)
-        {
-          location_t loc;
-          bidi::kind kind = get_bidi_utf8 (pfile, pos - 1, &loc);
-          maybe_warn_bidi_on_char (pfile, kind, /*ucn_p=*/false, loc);
-        }
+      else if (__builtin_expect ((unsigned char) c >= utf8_continuation, 0)
+               && warn_bidi_or_invalid_utf8_p)
+        pos = _cpp_handle_multibyte_utf8 (pfile, c, pos, warn_bidi_p,
+                                          warn_invalid_utf8_p);
     }
 
   if (warn_bidi_p)
@@ -2704,6 +2834,8 @@ lex_string (cpp_reader *pfile, cpp_token
     terminator = '>', type = CPP_HEADER_NAME;
 
   const bool warn_bidi_p = pfile->warn_bidi_p ();
+  const bool warn_invalid_utf8_p = CPP_OPTION (pfile, cpp_warn_invalid_utf8);
+  const bool warn_bidi_or_invalid_utf8_p = warn_bidi_p | warn_invalid_utf8_p;
   for (;;)
     {
       cppchar_t c = *cur++;
@@ -2745,12 +2877,10 @@ lex_string (cpp_reader *pfile, cpp_token
         }
       else if (c == '\0')
         saw_NUL = true;
-      else if (__builtin_expect (c == bidi::utf8_start, 0) && warn_bidi_p)
-        {
-          location_t loc;
-          bidi::kind kind = get_bidi_utf8 (pfile, cur - 1, &loc);
-          maybe_warn_bidi_on_char (pfile, kind, /*ucn_p=*/false, loc);
-        }
+      else if (__builtin_expect (c >= utf8_continuation, 0)
+               && warn_bidi_or_invalid_utf8_p)
+        cur = _cpp_handle_multibyte_utf8 (pfile, c, cur, warn_bidi_p,
+                                          warn_invalid_utf8_p);
     }
 
   if (saw_NUL && !pfile->state.skipping)
@@ -4052,6 +4182,7 @@ _cpp_lex_direct (cpp_reader *pfile)
     default:
       {
         const uchar *base = --buffer->cur;
+        static int no_warn_cnt;
 
         /* Check for an extended identifier ($ or UCN or UTF-8).  */
         struct normalize_state nst = INITIAL_NORMALIZE_STATE;
@@ -4072,7 +4203,33 @@ _cpp_lex_direct (cpp_reader *pfile)
             const uchar *pstr = base;
             cppchar_t s;
             if (_cpp_valid_utf8 (pfile, &pstr, buffer->rlimit, 0, NULL, &s))
-              buffer->cur = pstr;
+              {
+                if (s > UCS_LIMIT && CPP_OPTION (pfile, cpp_warn_invalid_utf8))
+                  {
+                    buffer->cur = base;
+                    _cpp_warn_invalid_utf8 (pfile);
+                  }
+                buffer->cur = pstr;
+              }
+            else if (CPP_OPTION (pfile, cpp_warn_invalid_utf8))
+              {
+                buffer->cur = base;
+                const uchar *end = _cpp_warn_invalid_utf8 (pfile);
+                buffer->cur = base + 1;
+                no_warn_cnt = end - buffer->cur;
+              }
+          }
+        else if (c >= utf8_continuation
+                 && CPP_OPTION (pfile, cpp_warn_invalid_utf8))
+          {
+            if (no_warn_cnt)
+              --no_warn_cnt;
+            else
+              {
+                buffer->cur = base;
+                _cpp_warn_invalid_utf8 (pfile);
+                buffer->cur = base + 1;
+              }
           }
         create_literal (pfile, result, base, buffer->cur - base, CPP_OTHER);
         break;
--- gcc/doc/invoke.texi.jj	2022-08-31 10:20:20.224976860 +0200
+++ gcc/doc/invoke.texi	2022-08-31 14:58:17.654138549 +0200
@@ -365,9 +365,9 @@ Objective-C and Objective-C++ Dialects}.
 -Winfinite-recursion @gol
 -Winit-self -Winline -Wno-int-conversion -Wint-in-bool-context @gol
 -Wno-int-to-pointer-cast -Wno-invalid-memory-model @gol
--Winvalid-pch -Wjump-misses-init -Wlarger-than=@var{byte-size} @gol
--Wlogical-not-parentheses -Wlogical-op -Wlong-long @gol
--Wno-lto-type-mismatch -Wmain -Wmaybe-uninitialized @gol
+-Winvalid-pch -Winvalid-utf8 -Wjump-misses-init @gol
+-Wlarger-than=@var{byte-size} -Wlogical-not-parentheses -Wlogical-op @gol
+-Wlong-long -Wno-lto-type-mismatch -Wmain -Wmaybe-uninitialized @gol
 -Wmemset-elt-size -Wmemset-transposed-args @gol
 -Wmisleading-indentation -Wmissing-attributes -Wmissing-braces @gol
 -Wmissing-field-initializers -Wmissing-format-attribute @gol
@@ -9569,6 +9569,13 @@ different size.
 Warn if a precompiled header (@pxref{Precompiled Headers}) is found in
 the search path but cannot be used.
 
+@item -Winvalid-utf8
+@opindex Winvalid-utf8
+@opindex Wno-invalid-utf8
+Warn if an invalid UTF-8 character is found.
+This warning is on by default for C++23 if @option{-finput-charset=UTF-8}
+is used and turned into error with @option{-pedantic-errors}.
+
 @item -Wlong-long
 @opindex Wlong-long
 @opindex Wno-long-long
--- gcc/c-family/c.opt.jj	2022-08-31 10:19:45.145453711 +0200
+++ gcc/c-family/c.opt	2022-08-31 12:25:42.457125674 +0200
@@ -821,6 +821,10 @@ Winvalid-pch
 C ObjC C++ ObjC++ CPP(warn_invalid_pch) CppReason(CPP_W_INVALID_PCH) Var(cpp_warn_invalid_pch) Init(0) Warning
 Warn about PCH files that are found but not used.
 
+Winvalid-utf8
+C objC C++ ObjC++ CPP(cpp_warn_invalid_utf8) CppReason(CPP_W_INVALID_UTF8) Var(warn_invalid_utf8) Init(0) Warning
+Warn about invalid UTF-8 characters in comments.
+
 Wjump-misses-init
 C ObjC Var(warn_jump_misses_init) Warning LangEnabledby(C ObjC,Wc++-compat)
 Warn when a jump misses a variable initialization.
--- gcc/c-family/c-opts.cc.jj	2022-08-31 10:19:45.080454594 +0200
+++ gcc/c-family/c-opts.cc	2022-08-31 12:25:42.457125674 +0200
@@ -534,6 +534,7 @@ c_common_handle_option (size_t scode, co
 
     case OPT_finput_charset_:
       cpp_opts->input_charset = arg;
+      cpp_opts->cpp_input_charset_explicit = 1;
       break;
 
     case OPT_ftemplate_depth_:
@@ -1152,6 +1153,17 @@ c_common_post_options (const char **pfil
   lang_hooks.preprocess_options (parse_in);
   cpp_post_options (parse_in);
   init_global_opts_from_cpp (&global_options, cpp_get_options (parse_in));
+  /* For C++23 and explicit -finput-charset=UTF-8, turn on -Winvalid-utf8
+     by default and make it a pedwarn unless -Wno-invalid-utf8.  */
+  if (cxx_dialect >= cxx23
+      && cpp_opts->cpp_input_charset_explicit
+      && strcmp (cpp_opts->input_charset, "UTF-8") == 0
+      && (cpp_opts->cpp_warn_invalid_utf8
+          || !global_options_set.x_warn_invalid_utf8))
+    {
+      global_options.x_warn_invalid_utf8 = 1;
+      cpp_opts->cpp_warn_invalid_utf8 = cpp_opts->cpp_pedantic ? 2 : 1;
+    }
 
   /* Let diagnostics infrastructure know how to convert input files the same
      way libcpp will do it, namely using the configured input charset and
--- gcc/testsuite/c-c++-common/cpp/Winvalid-utf8-1.c.jj	2022-08-31 12:25:42.458125660 +0200
+++ gcc/testsuite/c-c++-common/cpp/Winvalid-utf8-1.c	2022-08-31 14:58:28.977986756 +0200
@@ -0,0 +1,43 @@
+// P2295R6 - Support for UTF-8 as a portable source file encoding
+// This test intentionally contains various byte sequences which are not valid UTF-8
+// { dg-do preprocess }
+// { dg-options "-finput-charset=UTF-8 -Winvalid-utf8" }
+
+// a€߿ࠀ퟿𐀀􏿿a { dg-bogus "invalid UTF-8 character" }
+// a�a { dg-warning "invalid UTF-8 character <80>" }
+// a�a { dg-warning "invalid UTF-8 character " }
+// a�a { dg-warning "invalid UTF-8 character " }
+// a�a { dg-warning "invalid UTF-8 character " }
+// a�a { dg-warning "invalid UTF-8 character " }
+// a�a { dg-warning "invalid UTF-8 character " }
+// a�a { dg-warning "invalid UTF-8 character " }
+// a�a { dg-warning "invalid UTF-8 character " }
+// a���a { dg-warning "invalid UTF-8 character <80>" }
+// a���a { dg-warning "invalid UTF-8 character <9f><80>" }
+// a��a { dg-warning "invalid UTF-8 character " }
+// a��a { dg-warning "invalid UTF-8 character <80>" }
+// a���a { dg-warning "invalid UTF-8 character <80>" }
+// a����a { dg-warning "invalid UTF-8 character <80><80><80>" }
+// a����a { dg-warning "invalid UTF-8 character <8f>" }
+// a����a { dg-warning "invalid UTF-8 character <90><80><80>" }
+// a������a { dg-warning "invalid UTF-8 character " }
+// { dg-warning "invalid UTF-8 character " "" { target *-*-* } .-1 }
+/* a€߿ࠀ퟿𐀀􏿿a { dg-bogus "invalid UTF-8 character" } */
+/* a�a { dg-warning "invalid UTF-8 character <80>" } */
+/* a�a { dg-warning "invalid UTF-8 character " } */
+/* a�a { dg-warning "invalid UTF-8 character " } */
+/* a�a { dg-warning "invalid UTF-8 character " } */
+/* a�a { dg-warning "invalid UTF-8 character " } */
+/* a�a { dg-warning "invalid UTF-8 character " } */
+/* a�a { dg-warning "invalid UTF-8 character " } */
+/* a�a { dg-warning "invalid UTF-8 character " } */
+/* a���a { dg-warning "invalid UTF-8 character <80>" } */
+/* a���a { dg-warning "invalid UTF-8 character <9f><80>" } */
+/* a��a { dg-warning "invalid UTF-8 character " } */
+/* a��a { dg-warning "invalid UTF-8 character <80>" } */
+/* a���a { dg-warning "invalid UTF-8 character <80>" } */
+/* a����a { dg-warning "invalid UTF-8 character <80><80><80>" } */
+/* a����a { dg-warning "invalid UTF-8 character <8f>" } */
+/* a����a { dg-warning "invalid UTF-8 character <90><80><80>" } */
+/* a������a { dg-warning "invalid UTF-8 character " } */
+/* { dg-warning "invalid UTF-8 character " "" { target *-*-* } .-1 } */
--- gcc/testsuite/c-c++-common/cpp/Winvalid-utf8-2.c.jj	2022-08-31 12:25:42.458125660 +0200
+++ gcc/testsuite/c-c++-common/cpp/Winvalid-utf8-2.c	2022-08-31 14:58:38.091864588 +0200
@@ -0,0 +1,88 @@
+// P2295R6 - Support for UTF-8 as a portable source file encoding
+// This test intentionally contains various byte sequences which are not valid UTF-8
+// { dg-do preprocess { target { c || c++11 } } }
+// { dg-require-effective-target wchar }
+// { dg-options "-finput-charset=UTF-8 -Winvalid-utf8" }
+// { dg-additional-options "-std=gnu99" { target c } }
+
+#ifndef __cplusplus
+#include
+typedef __CHAR16_TYPE__ char16_t;
+typedef __CHAR32_TYPE__ char32_t;
+#endif
+
+char32_t a = U'�'; // { dg-warning "invalid UTF-8 character <80>" }
+char32_t b = U'�'; // { dg-warning "invalid UTF-8 character " }
+char32_t c = U'�'; // { dg-warning "invalid UTF-8 character " }
+char32_t d = U'�'; // { dg-warning "invalid UTF-8 character " }
+char32_t e = U'�'; // { dg-warning "invalid UTF-8 character " }
+char32_t f = U'�'; // { dg-warning "invalid UTF-8 character " }
+char32_t g = U'�'; // { dg-warning "invalid UTF-8 character " }
+char32_t h = U'�'; // { dg-warning "invalid UTF-8 character " }
+char32_t i = U'���'; // { dg-warning "invalid UTF-8 character <80>" }
+char32_t j = U'���'; // { dg-warning "invalid UTF-8 character <9f><80>" }
+char32_t k = U'��'; // { dg-warning "invalid UTF-8 character " }
+char32_t l = U'��'; // { dg-warning "invalid UTF-8 character <80>" }
+char32_t m = U'���'; // { dg-warning "invalid UTF-8 character <80>" }
+char32_t n = U'����'; // { dg-warning "invalid UTF-8 character <80><80><80>" }
+char32_t o = U'����'; // { dg-warning "invalid UTF-8 character <8f>" }
+char32_t p = U'����'; // { dg-warning "invalid UTF-8 character <90><80><80>" }
+char32_t q = U'������'; // { dg-warning "invalid UTF-8 character " }
+ // { dg-warning "invalid UTF-8 character " "" { target *-*-* } .-1 }
+const char32_t *A = U"€߿ࠀ퟿𐀀􏿿"; // { dg-bogus "invalid UTF-8 character" }
+const char32_t *B = U"�"; // { dg-warning "invalid UTF-8 character <80>" }
+const char32_t *C = U"�"; // { dg-warning "invalid UTF-8 character " }
+const char32_t *D = U"�"; // { dg-warning "invalid UTF-8 character " }
+const char32_t *E = U"�"; // { dg-warning "invalid UTF-8 character " }
+const char32_t *F = U"�"; // { dg-warning "invalid UTF-8 character " }
+const char32_t *G = U"�"; // { dg-warning "invalid UTF-8 character " }
+const char32_t *H = U"�"; // { dg-warning "invalid UTF-8 character " }
+const char32_t *I = U"�"; // { dg-warning "invalid UTF-8 character " }
+const char32_t *J = U"���"; // { dg-warning "invalid UTF-8 character <80>" }
+const char32_t *K = U"���"; // { dg-warning "invalid UTF-8 character <9f><80>" }
+const char32_t *L = U"��"; // { dg-warning "invalid UTF-8 character " }
+const char32_t *M = U"��"; // { dg-warning "invalid UTF-8 character <80>" }
+const char32_t *N = U"���"; // { dg-warning "invalid UTF-8 character <80>" }
+const char32_t *O = U"����"; // { dg-warning "invalid UTF-8 character <80><80><80>" }
+const char32_t *P = U"����"; // { dg-warning "invalid UTF-8 character <8f>" }
+const char32_t *Q = U"����"; // { dg-warning "invalid UTF-8 character <90><80><80>" }
+const char32_t *R = U"������"; // { dg-warning "invalid UTF-8 character " }
+ // { dg-warning "invalid UTF-8 character " "" { target *-*-* } .-1 }
+const char32_t *A1 = UR"(€߿ࠀ퟿𐀀􏿿)"; // { dg-bogus "invalid UTF-8 character" }
+const char32_t *B1 = UR"(�)"; // { dg-warning "invalid UTF-8 character <80>" }
+const char32_t *C1 = UR"(�)"; // { dg-warning "invalid UTF-8 character " }
+const char32_t *D1 = UR"(�)"; // { dg-warning "invalid UTF-8 character " }
+const char32_t *E1 = UR"(�)"; // { dg-warning "invalid UTF-8 character " }
+const char32_t *F1 = UR"(�)"; // { dg-warning "invalid UTF-8 character " }
+const char32_t *G1 = UR"(�)"; // { dg-warning "invalid UTF-8 character " }
+const char32_t *H1 = UR"(�)"; // { dg-warning "invalid UTF-8 character " }
+const char32_t *I1 = UR"(�)"; // { dg-warning "invalid UTF-8 character " }
+const char32_t *J1 = UR"(���)"; // { dg-warning "invalid UTF-8 character <80>" }
+const char32_t *K1 = UR"(���)"; // { dg-warning "invalid UTF-8 character <9f><80>" }
+const char32_t *L1 = UR"(��)"; // { dg-warning "invalid UTF-8 character " }
+const char32_t *M1 = UR"(��)"; // { dg-warning "invalid UTF-8 character <80>" }
+const char32_t *N1 = UR"(���)"; // { dg-warning "invalid UTF-8 character <80>" }
+const char32_t *O1 = UR"(����)"; // { dg-warning "invalid UTF-8 character <80><80><80>" }
+const char32_t *P1 = UR"(����)"; // { dg-warning "invalid UTF-8 character <8f>" }
+const char32_t *Q1 = UR"(����)"; // { dg-warning "invalid UTF-8 character <90><80><80>" }
+const char32_t *R1 = UR"(������)"; // { dg-warning "invalid UTF-8 character " }
+ // { dg-warning "invalid UTF-8 character " "" { target *-*-* } .-1 }
+const char *A2 = u8"€߿ࠀ퟿𐀀􏿿"; // { dg-bogus "invalid UTF-8 character" }
+const char *B2 = u8"�"; // { dg-warning "invalid UTF-8 character <80>" }
+const char *C2 = u8"�"; // { dg-warning "invalid UTF-8 character " }
+const char *D2 = u8"�"; // { dg-warning "invalid UTF-8 character " }
+const char *E2 = u8"�"; // { dg-warning "invalid UTF-8 character " }
+const char *F2 = u8"�"; // { dg-warning "invalid UTF-8 character " }
+const char *G2 = u8"�"; // { dg-warning "invalid UTF-8 character " }
+const char *H2 = u8"�"; // { dg-warning "invalid UTF-8 character " }
+const char *I2 = u8"�"; // { dg-warning "invalid UTF-8 character " }
+const char *J2 = u8"���"; // { dg-warning "invalid UTF-8 character <80>" }
+const char *K2 = u8"���"; // { dg-warning "invalid UTF-8 character <9f><80>" }
+const char *L2 = u8"��"; // { dg-warning "invalid UTF-8 character " }
+const char *M2 = u8"��"; // { dg-warning "invalid UTF-8 character <80>" }
+const char *N2 = u8"���"; // { dg-warning "invalid UTF-8 character <80>" }
+const char *O2 = u8"����"; // { dg-warning "invalid UTF-8 character <80><80><80>" }
+const char *P2 = u8"����"; // { dg-warning "invalid UTF-8 character <8f>" }
+const char *Q2 = u8"����"; // { dg-warning "invalid UTF-8 character <90><80><80>" }
+const char *R2 = u8"������"; // { dg-warning "invalid UTF-8 character " }
+ // { dg-warning "invalid UTF-8 character " "" { target *-*-* } .-1 }
--- gcc/testsuite/c-c++-common/cpp/Winvalid-utf8-3.c.jj	2022-08-31 13:15:18.456085432 +0200
+++ gcc/testsuite/c-c++-common/cpp/Winvalid-utf8-3.c	2022-08-31 15:55:45.469984253 +0200
@@ -0,0 +1,27 @@
+// P2295R6 - Support for UTF-8 as a portable source file encoding
+// This test intentionally contains various byte sequences which are not valid UTF-8
+// { dg-do preprocess }
+// { dg-options "-finput-charset=UTF-8 -Winvalid-utf8" }
+
+#define I(x)
+I(€߿ࠀ퟿𐀀􏿿) // { dg-bogus "invalid UTF-8 character" }
+ // { dg-error "is not valid in an identifier" "" { target c++ } .-1 }
+I(�) // { dg-warning "invalid UTF-8 character <80>" }
+I(�) // { dg-warning "invalid UTF-8 character " }
+I(�) // { dg-warning "invalid UTF-8 character " }
+I(�) // { dg-warning "invalid UTF-8 character " }
+I(�) // { dg-warning "invalid UTF-8 character " }
+I(�) // { dg-warning "invalid UTF-8 character " }
+I(�) // { dg-warning "invalid UTF-8 character " }
+I(�) // { dg-warning "invalid UTF-8 character " }
+I(���) // { dg-warning "invalid UTF-8 character <80>" }
+I(���) // { dg-warning "invalid UTF-8 character <9f><80>" }
+I(��) // { dg-warning "invalid UTF-8 character " }
+I(��) // { dg-warning "invalid UTF-8 character <80>" }
+I(���) // { dg-warning "invalid UTF-8 character <80>" }
+I(����) // { dg-warning "invalid UTF-8 character <80><80><80>" }
+I(����) // { dg-warning "invalid UTF-8 character <8f>" }
+I(����) // { dg-warning "invalid UTF-8 character <90><80><80>" "" { target c } }
+ // { dg-error "is not valid in an identifier" "" { target c++ } .-1 }
+I(������) // { dg-warning "invalid UTF-8 character " "" { target c } }
+ // { dg-error "is not valid in an identifier" "" { target c++ } .-1 }
--- gcc/testsuite/g++.dg/cpp23/Winvalid-utf8-1.C.jj	2022-08-31 12:25:42.458125660 +0200
+++ gcc/testsuite/g++.dg/cpp23/Winvalid-utf8-1.C	2022-08-31 14:59:22.296272041 +0200
@@ -0,0 +1,43 @@
+// P2295R6 - Support for UTF-8 as a portable source file encoding
+// This test intentionally contains various byte sequences which are not valid UTF-8
+// { dg-do preprocess }
+// { dg-options "-finput-charset=UTF-8" }
+
+// a€߿ࠀ퟿𐀀􏿿a { dg-bogus "invalid UTF-8 character" }
+// a�a { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } }
+// a�a { dg-warning "invalid UTF-8 character " "" { target c++23 } }
+// a�a { dg-warning "invalid UTF-8 character " "" { target c++23 } }
+// a�a { dg-warning "invalid UTF-8 character " "" { target c++23 } }
+// a�a { dg-warning "invalid UTF-8 character " "" { target c++23 } }
+// a�a { dg-warning "invalid UTF-8 character " "" { target c++23 } }
+// a�a { dg-warning "invalid UTF-8 character " "" {
target c++23 } } +// a�a { dg-warning "invalid UTF-8 character " "" { target c++23 } } +// a���a { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +// a���a { dg-warning "invalid UTF-8 character <9f><80>" "" { target c++23 } } +// a��a { dg-warning "invalid UTF-8 character " "" { target c++23 } } +// a��a { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +// a���a { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +// a����a { dg-warning "invalid UTF-8 character <80><80><80>" "" { target c++23 } } +// a����a { dg-warning "invalid UTF-8 character <8f>" "" { target c++23 } } +// a����a { dg-warning "invalid UTF-8 character <90><80><80>" "" { target c++23 } } +// a������a { dg-warning "invalid UTF-8 character " "" { target c++23 } } +// { dg-warning "invalid UTF-8 character " "" { target c++23 } .-1 } +/* a€߿ࠀ퟿𐀀􏿿a { dg-bogus "invalid UTF-8 character" } */ +/* a�a { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } */ +/* a�a { dg-warning "invalid UTF-8 character " "" { target c++23 } } */ +/* a�a { dg-warning "invalid UTF-8 character " "" { target c++23 } } */ +/* a�a { dg-warning "invalid UTF-8 character " "" { target c++23 } } */ +/* a�a { dg-warning "invalid UTF-8 character " "" { target c++23 } } */ +/* a�a { dg-warning "invalid UTF-8 character " "" { target c++23 } } */ +/* a�a { dg-warning "invalid UTF-8 character " "" { target c++23 } } */ +/* a�a { dg-warning "invalid UTF-8 character " "" { target c++23 } } */ +/* a���a { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } */ +/* a���a { dg-warning "invalid UTF-8 character <9f><80>" "" { target c++23 } } */ +/* a?�a { dg-warning "invalid UTF-8 character " "" { target c++23 } } */ +/* a��a { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } */ +/* a���a { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } */ +/* a����a { dg-warning "invalid UTF-8 character <80><80><80>" "" { target c++23 } } */ +/* a����a { 
dg-warning "invalid UTF-8 character <8f>" "" { target c++23 } } */ +/* a����a { dg-warning "invalid UTF-8 character <90><80><80>" "" { target c++23 } } */ +/* a������a { dg-warning "invalid UTF-8 character " "" { target c++23 } } */ +/* { dg-warning "invalid UTF-8 character " "" { target c++23 } .-1 } */ --- gcc/testsuite/g++.dg/cpp23/Winvalid-utf8-2.C.jj 2022-08-31 12:25:42.458125660 +0200 +++ gcc/testsuite/g++.dg/cpp23/Winvalid-utf8-2.C 2022-08-31 14:59:27.314204777 +0200 @@ -0,0 +1,43 @@ +// P2295R6 - Support for UTF-8 as a portable source file encoding +// This test intentionally contains various byte sequences which are not valid UTF-8 +// { dg-do preprocess } +// { dg-options "-finput-charset=UTF-8 -pedantic" } + +// a€߿ࠀ퟿𐀀􏿿a { dg-bogus "invalid UTF-8 character" } +// a�a { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +// a�a { dg-warning "invalid UTF-8 character " "" { target c++23 } } +// a�a { dg-warning "invalid UTF-8 character " "" { target c++23 } } +// a�a { dg-warning "invalid UTF-8 character " "" { target c++23 } } +// a?a { dg-warning "invalid UTF-8 character " "" { target c++23 } } +// a�a { dg-warning "invalid UTF-8 character " "" { target c++23 } } +// a�a { dg-warning "invalid UTF-8 character " "" { target c++23 } } +// a�a { dg-warning "invalid UTF-8 character " "" { target c++23 } } +// a���a { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +// a���a { dg-warning "invalid UTF-8 character <9f><80>" "" { target c++23 } } +// a��a { dg-warning "invalid UTF-8 character " "" { target c++23 } } +// a��a { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +// a���a { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +// a����a { dg-warning "invalid UTF-8 character <80><80><80>" "" { target c++23 } } +// a����a { dg-warning "invalid UTF-8 character <8f>" "" { target c++23 } } +// a����a { dg-warning "invalid UTF-8 character <90><80><80>" "" { target c++23 } } +// a������a { dg-warning 
"invalid UTF-8 character " "" { target c++23 } } +// { dg-warning "invalid UTF-8 character " "" { target c++23 } .-1 } +/* a€߿ࠀ퟿𐀀􏿿a { dg-bogus "invalid UTF-8 character" } */ +/* a�a { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } */ +/* a�a { dg-warning "invalid UTF-8 character " "" { target c++23 } } */ +/* a�a { dg-warning "invalid UTF-8 character " "" { target c++23 } } */ +/* a�a { dg-warning "invalid UTF-8 character " "" { target c++23 } } */ +/* a�a { dg-warning "invalid UTF-8 character " "" { target c++23 } } */ +/* a�a { dg-warning "invalid UTF-8 character " "" { target c++23 } } */ +/* a�a { dg-warning "invalid UTF-8 character " "" { target c++23 } } */ +/* a�a { dg-warning "invalid UTF-8 character " "" { target c++23 } } */ +/* a���a { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } */ +/* a���a { dg-warning "invalid UTF-8 character <9f><80>" "" { target c++23 } } */ +/* a��a { dg-warning "invalid UTF-8 character " "" { target c++23 } } */ +/* a��a { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } */ +/* a���a { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } */ +/* a����a { dg-warning "invalid UTF-8 character <80><80><80>" "" { target c++23 } } */ +/* a����a { dg-warning "invalid UTF-8 character <8f>" "" { target c++23 } } */ +/* a����a { dg-warning "invalid UTF-8 character <90><80><80>" "" { target c++23 } } */ +/* a������a { dg-warning "invalid UTF-8 character " "" { target c++23 } } */ +/* { dg-warning "invalid UTF-8 character " "" { target c++23 } .-1 } */ --- gcc/testsuite/g++.dg/cpp23/Winvalid-utf8-3.C.jj 2022-08-31 12:25:42.458125660 +0200 +++ gcc/testsuite/g++.dg/cpp23/Winvalid-utf8-3.C 2022-08-31 14:59:33.624120194 +0200 @@ -0,0 +1,43 @@ +// P2295R6 - Support for UTF-8 as a portable source file encoding +// This test intentionally contains various byte sequences which are not valid UTF-8 +// { dg-do preprocess } +// { dg-options "-finput-charset=UTF-8 -pedantic-errors" } + +// 
a€߿ࠀ퟿𐀀􏿿a { dg-bogus "invalid UTF-8 character" } +// a�a { dg-error "invalid UTF-8 character <80>" "" { target c++23 } } +// a�a { dg-error "invalid UTF-8 character " "" { target c++23 } } +// a�a { dg-error "invalid UTF-8 character " "" { target c++23 } } +// a�a { dg-error "invalid UTF-8 character " "" { target c++23 } } +// a�a { dg-error "invalid UTF-8 character " "" { target c++23 } } +// a�a { dg-error "invalid UTF-8 character " "" { target c++23 } } +// a�a { dg-error "invalid UTF-8 character " "" { target c++23 } } +// a�a { dg-error "invalid UTF-8 character " "" { target c++23 } } +// a���a { dg-error "invalid UTF-8 character <80>" "" { target c++23 } } +// a���a { dg-error "invalid UTF-8 character <9f><80>" "" { target c++23 } } +// a��a { dg-error "invalid UTF-8 character " "" { target c++23 } } +// a��a { dg-error "invalid UTF-8 character <80>" "" { target c++23 } } +// a���a { dg-error "invalid UTF-8 character <80>" "" { target c++23 } } +// a����a { dg-error "invalid UTF-8 character <80><80><80>" "" { target c++23 } } +// a����a { dg-error "invalid UTF-8 character <8f>" "" { target c++23 } } +// a����a { dg-error "invalid UTF-8 character <90><80><80>" "" { target c++23 } } +// a������a { dg-error "invalid UTF-8 character " "" { target c++23 } } +// { dg-error "invalid UTF-8 character " "" { target c++23 } .-1 } +/* a€߿ࠀ퟿𐀀􏿿a { dg-bogus "invalid UTF-8 character" } */ +/* a�a { dg-error "invalid UTF-8 character <80>" "" { target c++23 } } */ +/* a�a { dg-error "invalid UTF-8 character " "" { target c++23 } } */ +/* a�a { dg-error "invalid UTF-8 character " "" { target c++23 } } */ +/* a�a { dg-error "invalid UTF-8 character " "" { target c++23 } } */ +/* a�a { dg-error "invalid UTF-8 character " "" { target c++23 } } */ +/* a�a { dg-error "invalid UTF-8 character " "" { target c++23 } } */ +/* a�a { dg-error "invalid UTF-8 character " "" { target c++23 } } */ +/* a�a { dg-error "invalid UTF-8 character " "" { target c++23 } } */ +/* a���a { dg-error 
"invalid UTF-8 character <80>" "" { target c++23 } } */ +/* a���a { dg-error "invalid UTF-8 character <9f><80>" "" { target c++23 } } */ +/* a��a { dg-error "invalid UTF-8 character " "" { target c++23 } } */ +/* a��a { dg-error "invalid UTF-8 character <80>" "" { target c++23 } } */ +/* a���a { dg-error "invalid UTF-8 character <80>" "" { target c++23 } } */ +/* a����a { dg-error "invalid UTF-8 character <80><80><80>" "" { target c++23 } } */ +/* a����a { dg-error "invalid UTF-8 character <8f>" "" { target c++23 } } */ +/* a����a { dg-error "invalid UTF-8 character <90><80><80>" "" { target c++23 } } */ +/* a������a { dg-error "invalid UTF-8 character " "" { target c++23 } } */ +/* { dg-error "invalid UTF-8 character " "" { target c++23 } .-1 } */ --- gcc/testsuite/g++.dg/cpp23/Winvalid-utf8-4.C.jj 2022-08-31 12:25:42.458125660 +0200 +++ gcc/testsuite/g++.dg/cpp23/Winvalid-utf8-4.C 2022-08-31 14:59:37.965062005 +0200 @@ -0,0 +1,43 @@ +// P2295R6 - Support for UTF-8 as a portable source file encoding +// This test intentionally contains various byte sequences which are not valid UTF-8 +// { dg-do preprocess } +// { dg-options "-finput-charset=UTF-8 -pedantic-errors -Wno-invalid-utf8" } + +// a€߿ࠀ퟿𐀀􏿿a { dg-bogus "invalid UTF-8 character" } +// a�a { dg-bogus "invalid UTF-8 character <80>" } +// a�a { dg-bogus "invalid UTF-8 character " } +// a�a { dg-bogus "invalid UTF-8 character " } +// a�a { dg-bogus "invalid UTF-8 character " } +// a�a { dg-bogus "invalid UTF-8 character " } +// a�a { dg-bogus "invalid UTF-8 character " } +// a�a { dg-bogus "invalid UTF-8 character " } +// a�a { dg-bogus "invalid UTF-8 character " } +// a���a { dg-bogus "invalid UTF-8 character <80>" } +// a���a { dg-bogus "invalid UTF-8 character <9f><80>" } +// a��a { dg-bogus "invalid UTF-8 character " } +// a��a { dg-bogus "invalid UTF-8 character <80>" } +// a���a { dg-bogus "invalid UTF-8 character <80>" } +// a����a { dg-bogus "invalid UTF-8 character <80><80><80>" } +// a����a { 
dg-bogus "invalid UTF-8 character <8f>" } +// a����a { dg-bogus "invalid UTF-8 character <90><80><80>" } +// a������a { dg-bogus "invalid UTF-8 character " } +// { dg-bogus "invalid UTF-8 character " "" { target *-*-* } .-1 } +/* a€߿ࠀ퟿𐀀􏿿a { dg-bogus "invalid UTF-8 character" } */ +/* a�a { dg-bogus "invalid UTF-8 character <80>" } */ +/* a�a { dg-bogus "invalid UTF-8 character " } */ +/* a�a { dg-bogus "invalid UTF-8 character " } */ +/* a�a { dg-bogus "invalid UTF-8 character " } */ +/* a�a { dg-bogus "invalid UTF-8 character " } */ +/* a�a { dg-bogus "invalid UTF-8 character " } */ +/* a�a { dg-bogus "invalid UTF-8 character " } */ +/* a�a { dg-bogus "invalid UTF-8 character " } */ +/* a���a { dg-bogus "invalid UTF-8 character <80>" } */ +/* a���a { dg-bogus "invalid UTF-8 character <9f><80>" } */ +/* a��a { dg-bogus "invalid UTF-8 character " } */ +/* a��a { dg-bogus "invalid UTF-8 character <80>" } */ +/* a���a { dg-bogus "invalid UTF-8 character <80>" } */ +/* a����a { dg-bogus "invalid UTF-8 character <80><80><80>" } */ +/* a����a { dg-bogus "invalid UTF-8 character <8f>" } */ +/* a����a { dg-bogus "invalid UTF-8 character <90><80><80>" } */ +/* a������a { dg-bogus "invalid UTF-8 character " } */ +/* { dg-bogus "invalid UTF-8 character " "" { target *-*-* } .-1 } */ --- gcc/testsuite/g++.dg/cpp23/Winvalid-utf8-5.C.jj 2022-08-31 12:25:42.458125660 +0200 +++ gcc/testsuite/g++.dg/cpp23/Winvalid-utf8-5.C 2022-08-31 14:59:49.089912880 +0200 @@ -0,0 +1,80 @@ +// P2295R6 - Support for UTF-8 as a portable source file encoding +// This test intentionally contains various byte sequences which are not valid UTF-8 +// { dg-do preprocess { target c++11 } } +// { dg-options "-finput-charset=UTF-8" } + +char32_t a = U'?'; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +char32_t b = U'�'; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +char32_t c = U'�'; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +char32_t d = 
U'�'; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +char32_t e = U'�'; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +char32_t f = U'�'; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +char32_t g = U'�'; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +char32_t h = U'�'; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +char32_t i = U'���'; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +char32_t j = U'���'; // { dg-warning "invalid UTF-8 character <9f><80>" "" { target c++23 } } +char32_t k = U'��'; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +char32_t l = U'��'; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +char32_t m = U'���'; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +char32_t n = U'����'; // { dg-warning "invalid UTF-8 character <80><80><80>" "" { target c++23 } } +char32_t o = U'����'; // { dg-warning "invalid UTF-8 character <8f>" "" { target c++23 } } +char32_t p = U'����'; // { dg-warning "invalid UTF-8 character <90><80><80>" "" { target c++23 } } +char32_t q = U'������'; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } + // { dg-warning "invalid UTF-8 character " "" { target c++23 } .-1 } +auto A = U"€߿ࠀ퟿𐀀􏿿"; // { dg-bogus "invalid UTF-8 character" } +auto B = U"�"; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +auto C = U"�"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto D = U"�"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto E = U"�"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto F = U"�"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto G = U"�"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto H = U"�"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto I = U"�"; // { dg-warning "invalid UTF-8 
character " "" { target c++23 } } +auto J = U"���"; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +auto K = U"���"; // { dg-warning "invalid UTF-8 character <9f><80>" "" { target c++23 } } +auto L = U"��"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto M = U"��"; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +auto N = U"���"; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +auto O = U"����"; // { dg-warning "invalid UTF-8 character <80><80><80>" "" { target c++23 } } +auto P = U"����"; // { dg-warning "invalid UTF-8 character <8f>" "" { target c++23 } } +auto Q = U"����"; // { dg-warning "invalid UTF-8 character <90><80><80>" "" { target c++23 } } +auto R = U"������"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } + // { dg-warning "invalid UTF-8 character " "" { target c++23 } .-1 } +auto A1 = UR"(€߿ࠀ퟿𐀀􏿿)"; // { dg-bogus "invalid UTF-8 character" } +auto B1 = UR"(�)"; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +auto C1 = UR"(�)"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto D1 = UR"(�)"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto E1 = UR"(�)"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto F1 = UR"(�)"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto G1 = UR"(�)"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto H1 = UR"(�)"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto I1 = UR"(�)"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto J1 = UR"(���)"; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +auto K1 = UR"(���)"; // { dg-warning "invalid UTF-8 character <9f><80>" "" { target c++23 } } +auto L1 = UR"(��)"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto M1 = UR"(��)"; // { dg-warning "invalid UTF-8 character <80>" "" 
{ target c++23 } } +auto N1 = UR"(���)"; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +auto O1 = UR"(����)"; // { dg-warning "invalid UTF-8 character <80><80><80>" "" { target c++23 } } +auto P1 = UR"(����)"; // { dg-warning "invalid UTF-8 character <8f>" "" { target c++23 } } +auto Q1 = UR"(����)"; // { dg-warning "invalid UTF-8 character <90><80><80>" "" { target c++23 } } +auto R1 = UR"(������)"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } + // { dg-warning "invalid UTF-8 character " "" { target c++23 } .-1 } +auto A2 = u8"€߿ࠀ퟿𐀀􏿿"; // { dg-bogus "invalid UTF-8 character" } +auto B2 = u8"�"; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +auto C2 = u8"�"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto D2 = u8"�"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto E2 = u8"�"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto F2 = u8"�"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto G2 = u8"�"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto H2 = u8"�"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto I2 = u8"�"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto J2 = u8"���"; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +auto K2 = u8"���"; // { dg-warning "invalid UTF-8 character <9f><80>" "" { target c++23 } } +auto L2 = u8"��"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto M2 = u8"��"; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +auto N2 = u8"���"; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +auto O2 = u8"����"; // { dg-warning "invalid UTF-8 character <80><80><80>" "" { target c++23 } } +auto P2 = u8"����"; // { dg-warning "invalid UTF-8 character <8f>" "" { target c++23 } } +auto Q2 = u8"����"; // { dg-warning "invalid UTF-8 character 
<90><80><80>" "" { target c++23 } } +auto R2 = u8"������"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } + // { dg-warning "invalid UTF-8 character " "" { target c++23 } .-1 } --- gcc/testsuite/g++.dg/cpp23/Winvalid-utf8-6.C.jj 2022-08-31 12:25:42.458125660 +0200 +++ gcc/testsuite/g++.dg/cpp23/Winvalid-utf8-6.C 2022-08-31 14:59:58.128791717 +0200 @@ -0,0 +1,80 @@ +// P2295R6 - Support for UTF-8 as a portable source file encoding +// This test intentionally contains various byte sequences which are not valid UTF-8 +// { dg-do preprocess { target c++11 } } +// { dg-options "-finput-charset=UTF-8 -pedantic" } + +char32_t a = U'�'; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +char32_t b = U'�'; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +char32_t c = U'�'; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +char32_t d = U'�'; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +char32_t e = U'�'; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +char32_t f = U'�'; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +char32_t g = U'�'; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +char32_t h = U'�'; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +char32_t i = U'���'; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +char32_t j = U'���'; // { dg-warning "invalid UTF-8 character <9f><80>" "" { target c++23 } } +char32_t k = U'��'; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +char32_t l = U'��'; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +char32_t m = U'���'; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +char32_t n = U'����'; // { dg-warning "invalid UTF-8 character <80><80><80>" "" { target c++23 } } +char32_t o = U'����'; // { dg-warning "invalid UTF-8 character <8f>" "" { target c++23 } } +char32_t p = U'����'; // { 
dg-warning "invalid UTF-8 character <90><80><80>" "" { target c++23 } } +char32_t q = U'������'; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } + // { dg-warning "invalid UTF-8 character " "" { target c++23 } .-1 } +auto A = U"€߿ࠀ퟿𐀀􏿿"; // { dg-bogus "invalid UTF-8 character" } +auto B = U"�"; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +auto C = U"�"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto D = U"�"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto E = U"�"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto F = U"�"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto G = U"�"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto H = U"�"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto I = U"�"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto J = U"���"; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +auto K = U"���"; // { dg-warning "invalid UTF-8 character <9f><80>" "" { target c++23 } } +auto L = U"��"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto M = U"��"; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +auto N = U"���"; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +auto O = U"����"; // { dg-warning "invalid UTF-8 character <80><80><80>" "" { target c++23 } } +auto P = U"����"; // { dg-warning "invalid UTF-8 character <8f>" "" { target c++23 } } +auto Q = U"����"; // { dg-warning "invalid UTF-8 character <90><80><80>" "" { target c++23 } } +auto R = U"������"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } + // { dg-warning "invalid UTF-8 character " "" { target c++23 } .-1 } +auto A1 = UR"(€߿ࠀ퟿𐀀􏿿)"; // { dg-bogus "invalid UTF-8 character" } +auto B1 = UR"(�)"; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +auto C1 = UR"(�)"; // 
{ dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto D1 = UR"(�)"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto E1 = UR"(�)"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto F1 = UR"(�)"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto G1 = UR"(�)"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto H1 = UR"(�)"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto I1 = UR"(�)"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto J1 = UR"(���)"; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +auto K1 = UR"(���)"; // { dg-warning "invalid UTF-8 character <9f><80>" "" { target c++23 } } +auto L1 = UR"(��)"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto M1 = UR"(??)"; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +auto N1 = UR"(���)"; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +auto O1 = UR"(����)"; // { dg-warning "invalid UTF-8 character <80><80><80>" "" { target c++23 } } +auto P1 = UR"(����)"; // { dg-warning "invalid UTF-8 character <8f>" "" { target c++23 } } +auto Q1 = UR"(����)"; // { dg-warning "invalid UTF-8 character <90><80><80>" "" { target c++23 } } +auto R1 = UR"(������)"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } + // { dg-warning "invalid UTF-8 character " "" { target c++23 } .-1 } +auto A2 = u8"€߿ࠀ퟿𐀀􏿿"; // { dg-bogus "invalid UTF-8 character" } +auto B2 = u8"�"; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +auto C2 = u8"�"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto D2 = u8"�"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto E2 = u8"�"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto F2 = u8"�"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto G2 = u8"�"; // { dg-warning 
"invalid UTF-8 character " "" { target c++23 } } +auto H2 = u8"�"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto I2 = u8"�"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto J2 = u8"���"; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +auto K2 = u8"���"; // { dg-warning "invalid UTF-8 character <9f><80>" "" { target c++23 } } +auto L2 = u8"��"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } +auto M2 = u8"��"; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +auto N2 = u8"���"; // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } } +auto O2 = u8"����"; // { dg-warning "invalid UTF-8 character <80><80><80>" "" { target c++23 } } +auto P2 = u8"����"; // { dg-warning "invalid UTF-8 character <8f>" "" { target c++23 } } +auto Q2 = u8"����"; // { dg-warning "invalid UTF-8 character <90><80><80>" "" { target c++23 } } +auto R2 = u8"������"; // { dg-warning "invalid UTF-8 character " "" { target c++23 } } + // { dg-warning "invalid UTF-8 character " "" { target c++23 } .-1 } --- gcc/testsuite/g++.dg/cpp23/Winvalid-utf8-7.C.jj 2022-08-31 12:25:42.459125647 +0200 +++ gcc/testsuite/g++.dg/cpp23/Winvalid-utf8-7.C 2022-08-31 15:00:05.749689562 +0200 @@ -0,0 +1,80 @@ +// P2295R6 - Support for UTF-8 as a portable source file encoding +// This test intentionally contains various byte sequences which are not valid UTF-8 +// { dg-do preprocess { target c++11 } } +// { dg-options "-finput-charset=UTF-8 -pedantic-errors" } + +char32_t a = U'�'; // { dg-error "invalid UTF-8 character <80>" "" { target c++23 } } +char32_t b = U'�'; // { dg-error "invalid UTF-8 character " "" { target c++23 } } +char32_t c = U'�'; // { dg-error "invalid UTF-8 character " "" { target c++23 } } +char32_t d = U'�'; // { dg-error "invalid UTF-8 character " "" { target c++23 } } +char32_t e = U'�'; // { dg-error "invalid UTF-8 character " "" { target c++23 } } +char32_t f = U'�'; // { 
dg-error "invalid UTF-8 character " "" { target c++23 } } +char32_t g = U'�'; // { dg-error "invalid UTF-8 character " "" { target c++23 } } +char32_t h = U'�'; // { dg-error "invalid UTF-8 character " "" { target c++23 } } +char32_t i = U'���'; // { dg-error "invalid UTF-8 character <80>" "" { target c++23 } } +char32_t j = U'���'; // { dg-error "invalid UTF-8 character <9f><80>" "" { target c++23 } } +char32_t k = U'��'; // { dg-error "invalid UTF-8 character " "" { target c++23 } } +char32_t l = U'��'; // { dg-error "invalid UTF-8 character <80>" "" { target c++23 } } +char32_t m = U'���'; // { dg-error "invalid UTF-8 character <80>" "" { target c++23 } } +char32_t n = U'����'; // { dg-error "invalid UTF-8 character <80><80><80>" "" { target c++23 } } +char32_t o = U'����'; // { dg-error "invalid UTF-8 character <8f>" "" { target c++23 } } +char32_t p = U'����'; // { dg-error "invalid UTF-8 character <90><80><80>" "" { target c++23 } } +char32_t q = U'������'; // { dg-error "invalid UTF-8 character " "" { target c++23 } } + // { dg-error "invalid UTF-8 character " "" { target c++23 } .-1 } +auto A = U"€߿ࠀ퟿𐀀􏿿"; // { dg-bogus "invalid UTF-8 character" } +auto B = U"�"; // { dg-error "invalid UTF-8 character <80>" "" { target c++23 } } +auto C = U"�"; // { dg-error "invalid UTF-8 character " "" { target c++23 } } +auto D = U"�"; // { dg-error "invalid UTF-8 character " "" { target c++23 } } +auto E = U"�"; // { dg-error "invalid UTF-8 character " "" { target c++23 } } +auto F = U"�"; // { dg-error "invalid UTF-8 character " "" { target c++23 } } +auto G = U"�"; // { dg-error "invalid UTF-8 character " "" { target c++23 } } +auto H = U"�"; // { dg-error "invalid UTF-8 character " "" { target c++23 } } +auto I = U"�"; // { dg-error "invalid UTF-8 character " "" { target c++23 } } +auto J = U"���"; // { dg-error "invalid UTF-8 character <80>" "" { target c++23 } } +auto K = U"���"; // { dg-error "invalid UTF-8 character <9f><80>" "" { target c++23 } } +auto L = 
U"��"; // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+auto M = U"��"; // { dg-error "invalid UTF-8 character <80>" "" { target c++23 } }
+auto N = U"���"; // { dg-error "invalid UTF-8 character <80>" "" { target c++23 } }
+auto O = U"����"; // { dg-error "invalid UTF-8 character <80><80><80>" "" { target c++23 } }
+auto P = U"����"; // { dg-error "invalid UTF-8 character <8f>" "" { target c++23 } }
+auto Q = U"����"; // { dg-error "invalid UTF-8 character <90><80><80>" "" { target c++23 } }
+auto R = U"������"; // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+ // { dg-error "invalid UTF-8 character " "" { target c++23 } .-1 }
+auto A1 = UR"(€߿ࠀ퟿𐀀􏿿)"; // { dg-bogus "invalid UTF-8 character" }
+auto B1 = UR"(�)"; // { dg-error "invalid UTF-8 character <80>" "" { target c++23 } }
+auto C1 = UR"(�)"; // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+auto D1 = UR"(�)"; // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+auto E1 = UR"(�)"; // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+auto F1 = UR"(�)"; // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+auto G1 = UR"(�)"; // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+auto H1 = UR"(�)"; // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+auto I1 = UR"(�)"; // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+auto J1 = UR"(���)"; // { dg-error "invalid UTF-8 character <80>" "" { target c++23 } }
+auto K1 = UR"(���)"; // { dg-error "invalid UTF-8 character <9f><80>" "" { target c++23 } }
+auto L1 = UR"(��)"; // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+auto M1 = UR"(��)"; // { dg-error "invalid UTF-8 character <80>" "" { target c++23 } }
+auto N1 = UR"(���)"; // { dg-error "invalid UTF-8 character <80>" "" { target c++23 } }
+auto O1 = UR"(����)"; // { dg-error "invalid UTF-8 character <80><80><80>" "" { target c++23 } }
+auto P1 = UR"(����)"; // { dg-error "invalid UTF-8 character <8f>" "" { target c++23 } }
+auto Q1 = UR"(����)"; // { dg-error "invalid UTF-8 character <90><80><80>" "" { target c++23 } }
+auto R1 = UR"(������)"; // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+ // { dg-error "invalid UTF-8 character " "" { target c++23 } .-1 }
+auto A2 = u8"€߿ࠀ퟿𐀀􏿿"; // { dg-bogus "invalid UTF-8 character" }
+auto B2 = u8"�"; // { dg-error "invalid UTF-8 character <80>" "" { target c++23 } }
+auto C2 = u8"�"; // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+auto D2 = u8"�"; // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+auto E2 = u8"�"; // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+auto F2 = u8"�"; // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+auto G2 = u8"�"; // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+auto H2 = u8"�"; // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+auto I2 = u8"�"; // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+auto J2 = u8"���"; // { dg-error "invalid UTF-8 character <80>" "" { target c++23 } }
+auto K2 = u8"���"; // { dg-error "invalid UTF-8 character <9f><80>" "" { target c++23 } }
+auto L2 = u8"��"; // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+auto M2 = u8"��"; // { dg-error "invalid UTF-8 character <80>" "" { target c++23 } }
+auto N2 = u8"���"; // { dg-error "invalid UTF-8 character <80>" "" { target c++23 } }
+auto O2 = u8"����"; // { dg-error "invalid UTF-8 character <80><80><80>" "" { target c++23 } }
+auto P2 = u8"����"; // { dg-error "invalid UTF-8 character <8f>" "" { target c++23 } }
+auto Q2 = u8"����"; // { dg-error "invalid UTF-8 character <90><80><80>" "" { target c++23 } }
+auto R2 = u8"������"; // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+ // { dg-error "invalid UTF-8 character " "" { target c++23 } .-1 }
--- gcc/testsuite/g++.dg/cpp23/Winvalid-utf8-8.C.jj 2022-08-31 12:25:42.459125647 +0200
+++ gcc/testsuite/g++.dg/cpp23/Winvalid-utf8-8.C 2022-08-31 15:00:11.378614108 +0200
@@ -0,0 +1,80 @@
+// P2295R6 - Support for UTF-8 as a portable source file encoding
+// This test intentionally contains various byte sequences which are not valid UTF-8
+// { dg-do preprocess { target c++11 } }
+// { dg-options "-finput-charset=UTF-8 -pedantic-errors -Wno-invalid-utf8" }
+
+char32_t a = U'�'; // { dg-bogus "invalid UTF-8 character <80>" "" { target c++23 } }
+char32_t b = U'�'; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+char32_t c = U'�'; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+char32_t d = U'�'; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+char32_t e = U'�'; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+char32_t f = U'�'; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+char32_t g = U'�'; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+char32_t h = U'�'; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+char32_t i = U'���'; // { dg-bogus "invalid UTF-8 character <80>" "" { target c++23 } }
+char32_t j = U'���'; // { dg-bogus "invalid UTF-8 character <9f><80>" "" { target c++23 } }
+char32_t k = U'��'; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+char32_t l = U'��'; // { dg-bogus "invalid UTF-8 character <80>" "" { target c++23 } }
+char32_t m = U'���'; // { dg-bogus "invalid UTF-8 character <80>" "" { target c++23 } }
+char32_t n = U'����'; // { dg-bogus "invalid UTF-8 character <80><80><80>" "" { target c++23 } }
+char32_t o = U'����'; // { dg-bogus "invalid UTF-8 character <8f>" "" { target c++23 } }
+char32_t p = U'����'; // { dg-bogus "invalid UTF-8 character <90><80><80>" "" { target c++23 } }
+char32_t q = U'������'; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+ // { dg-bogus "invalid UTF-8 character " "" { target c++23 } .-1 }
+auto A = U"€߿ࠀ퟿𐀀􏿿"; // { dg-bogus "invalid UTF-8 character" }
+auto B = U"�"; // { dg-bogus "invalid UTF-8 character <80>" "" { target c++23 } }
+auto C = U"�"; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+auto D = U"�"; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+auto E = U"�"; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+auto F = U"�"; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+auto G = U"�"; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+auto H = U"�"; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+auto I = U"�"; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+auto J = U"���"; // { dg-bogus "invalid UTF-8 character <80>" "" { target c++23 } }
+auto K = U"���"; // { dg-bogus "invalid UTF-8 character <9f><80>" "" { target c++23 } }
+auto L = U"��"; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+auto M = U"��"; // { dg-bogus "invalid UTF-8 character <80>" "" { target c++23 } }
+auto N = U"���"; // { dg-bogus "invalid UTF-8 character <80>" "" { target c++23 } }
+auto O = U"����"; // { dg-bogus "invalid UTF-8 character <80><80><80>" "" { target c++23 } }
+auto P = U"����"; // { dg-bogus "invalid UTF-8 character <8f>" "" { target c++23 } }
+auto Q = U"����"; // { dg-bogus "invalid UTF-8 character <90><80><80>" "" { target c++23 } }
+auto R = U"������"; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+ // { dg-bogus "invalid UTF-8 character " "" { target c++23 } .-1 }
+auto A1 = UR"(€߿ࠀ퟿𐀀􏿿)"; // { dg-bogus "invalid UTF-8 character" }
+auto B1 = UR"(�)"; // { dg-bogus "invalid UTF-8 character <80>" "" { target c++23 } }
+auto C1 = UR"(�)"; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+auto D1 = UR"(�)"; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+auto E1 = UR"(�)"; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+auto F1 = UR"(�)"; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+auto G1 = UR"(�)"; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+auto H1 = UR"(�)"; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+auto I1 = UR"(�)"; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+auto J1 = UR"(���)"; // { dg-bogus "invalid UTF-8 character <80>" "" { target c++23 } }
+auto K1 = UR"(���)"; // { dg-bogus "invalid UTF-8 character <9f><80>" "" { target c++23 } }
+auto L1 = UR"(��)"; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+auto M1 = UR"(��)"; // { dg-bogus "invalid UTF-8 character <80>" "" { target c++23 } }
+auto N1 = UR"(���)"; // { dg-bogus "invalid UTF-8 character <80>" "" { target c++23 } }
+auto O1 = UR"(����)"; // { dg-bogus "invalid UTF-8 character <80><80><80>" "" { target c++23 } }
+auto P1 = UR"(����)"; // { dg-bogus "invalid UTF-8 character <8f>" "" { target c++23 } }
+auto Q1 = UR"(����)"; // { dg-bogus "invalid UTF-8 character <90><80><80>" "" { target c++23 } }
+auto R1 = UR"(������)"; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+ // { dg-bogus "invalid UTF-8 character " "" { target c++23 } .-1 }
+auto A2 = u8"€߿ࠀ퟿𐀀􏿿"; // { dg-bogus "invalid UTF-8 character" }
+auto B2 = u8"�"; // { dg-bogus "invalid UTF-8 character <80>" "" { target c++23 } }
+auto C2 = u8"�"; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+auto D2 = u8"�"; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+auto E2 = u8"�"; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+auto F2 = u8"�"; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+auto G2 = u8"�"; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+auto H2 = u8"�"; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+auto I2 = u8"�"; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+auto J2 = u8"���"; // { dg-bogus "invalid UTF-8 character <80>" "" { target c++23 } }
+auto K2 = u8"���"; // { dg-bogus "invalid UTF-8 character <9f><80>" "" { target c++23 } }
+auto L2 = u8"��"; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+auto M2 = u8"��"; // { dg-bogus "invalid UTF-8 character <80>" "" { target c++23 } }
+auto N2 = u8"���"; // { dg-bogus "invalid UTF-8 character <80>" "" { target c++23 } }
+auto O2 = u8"����"; // { dg-bogus "invalid UTF-8 character <80><80><80>" "" { target c++23 } }
+auto P2 = u8"����"; // { dg-bogus "invalid UTF-8 character <8f>" "" { target c++23 } }
+auto Q2 = u8"����"; // { dg-bogus "invalid UTF-8 character <90><80><80>" "" { target c++23 } }
+auto R2 = u8"������"; // { dg-bogus "invalid UTF-8 character " "" { target c++23 } }
+ // { dg-bogus "invalid UTF-8 character " "" { target c++23 } .-1 }
--- gcc/testsuite/g++.dg/cpp23/Winvalid-utf8-9.C.jj 2022-08-31 15:56:45.801177822 +0200
+++ gcc/testsuite/g++.dg/cpp23/Winvalid-utf8-9.C 2022-08-31 15:57:42.281422873 +0200
@@ -0,0 +1,25 @@
+// P2295R6 - Support for UTF-8 as a portable source file encoding
+// This test intentionally contains various byte sequences which are not valid UTF-8
+// { dg-do preprocess }
+// { dg-options "-finput-charset=UTF-8" }
+
+#define I(x)
+I(€߿ࠀ퟿𐀀􏿿) // { dg-bogus "invalid UTF-8 character" }
+ // { dg-error "is not valid in an identifier" "" { target *-*-* } .-1 }
+I(�) // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } }
+I(�) // { dg-warning "invalid UTF-8 character " "" { target c++23 } }
+I(�) // { dg-warning "invalid UTF-8 character " "" { target c++23 } }
+I(�) // { dg-warning "invalid UTF-8 character " "" { target c++23 } }
+I(�) // { dg-warning "invalid UTF-8 character " "" { target c++23 } }
+I(�) // { dg-warning "invalid UTF-8 character " "" { target c++23 } }
+I(�) // { dg-warning "invalid UTF-8 character " "" { target c++23 } }
+I(�) // { dg-warning "invalid UTF-8 character " "" { target c++23 } }
+I(���) // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } }
+I(���) // { dg-warning "invalid UTF-8 character <9f><80>" "" { target c++23 } }
+I(��) // { dg-warning "invalid UTF-8 character " "" { target c++23 } }
+I(��) // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } }
+I(���) // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } }
+I(����) // { dg-warning "invalid UTF-8 character <80><80><80>" "" { target c++23 } }
+I(����) // { dg-warning "invalid UTF-8 character <8f>" "" { target c++23 } }
+I(����) // { dg-error "is not valid in an identifier" }
+I(������) // { dg-error "is not valid in an identifier" }
--- gcc/testsuite/g++.dg/cpp23/Winvalid-utf8-10.C.jj 2022-08-31 15:58:04.029132184 +0200
+++ gcc/testsuite/g++.dg/cpp23/Winvalid-utf8-10.C 2022-08-31 15:58:09.934053256 +0200
@@ -0,0 +1,25 @@
+// P2295R6 - Support for UTF-8 as a portable source file encoding
+// This test intentionally contains various byte sequences which are not valid UTF-8
+// { dg-do preprocess }
+// { dg-options "-finput-charset=UTF-8 -pedantic" }
+
+#define I(x)
+I(€߿ࠀ퟿𐀀􏿿) // { dg-bogus "invalid UTF-8 character" }
+ // { dg-error "is not valid in an identifier" "" { target *-*-* } .-1 }
+I(�) // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } }
+I(�) // { dg-warning "invalid UTF-8 character " "" { target c++23 } }
+I(�) // { dg-warning "invalid UTF-8 character " "" { target c++23 } }
+I(�) // { dg-warning "invalid UTF-8 character " "" { target c++23 } }
+I(�) // { dg-warning "invalid UTF-8 character " "" { target c++23 } }
+I(�) // { dg-warning "invalid UTF-8 character " "" { target c++23 } }
+I(�) // { dg-warning "invalid UTF-8 character " "" { target c++23 } }
+I(�) // { dg-warning "invalid UTF-8 character " "" { target c++23 } }
+I(���) // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } }
+I(���) // { dg-warning "invalid UTF-8 character <9f><80>" "" { target c++23 } }
+I(��) // { dg-warning "invalid UTF-8 character " "" { target c++23 } }
+I(��) // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } }
+I(���) // { dg-warning "invalid UTF-8 character <80>" "" { target c++23 } }
+I(����) // { dg-warning "invalid UTF-8 character <80><80><80>" "" { target c++23 } }
+I(����) // { dg-warning "invalid UTF-8 character <8f>" "" { target c++23 } }
+I(����) // { dg-error "is not valid in an identifier" }
+I(������) // { dg-error "is not valid in an identifier" }
--- gcc/testsuite/g++.dg/cpp23/Winvalid-utf8-11.C.jj 2022-08-31 15:58:18.825934408 +0200
+++ gcc/testsuite/g++.dg/cpp23/Winvalid-utf8-11.C 2022-08-31 15:58:30.929772616 +0200
@@ -0,0 +1,25 @@
+// P2295R6 - Support for UTF-8 as a portable source file encoding
+// This test intentionally contains various byte sequences which are not valid UTF-8
+// { dg-do preprocess }
+// { dg-options "-finput-charset=UTF-8 -pedantic-errors" }
+
+#define I(x)
+I(€߿ࠀ퟿𐀀􏿿) // { dg-bogus "invalid UTF-8 character" }
+ // { dg-error "is not valid in an identifier" "" { target *-*-* } .-1 }
+I(�) // { dg-error "invalid UTF-8 character <80>" "" { target c++23 } }
+I(�) // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+I(�) // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+I(�) // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+I(�) // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+I(�) // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+I(�) // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+I(�) // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+I(���) // { dg-error "invalid UTF-8 character <80>" "" { target c++23 } }
+I(���) // { dg-error "invalid UTF-8 character <9f><80>" "" { target c++23 } }
+I(��) // { dg-error "invalid UTF-8 character " "" { target c++23 } }
+I(��) // { dg-error "invalid UTF-8 character <80>" "" { target c++23 } }
+I(���) // { dg-error "invalid UTF-8 character <80>" "" { target c++23 } }
+I(����) // { dg-error "invalid UTF-8 character <80><80><80>" "" { target c++23 } }
+I(����) // { dg-error "invalid UTF-8 character <8f>" "" { target c++23 } }
+I(����) // { dg-error "is not valid in an identifier" }
+I(������) // { dg-error "is not valid in an identifier" }
--- gcc/testsuite/g++.dg/cpp23/Winvalid-utf8-12.C.jj 2022-08-31 15:58:52.404485572 +0200
+++ gcc/testsuite/g++.dg/cpp23/Winvalid-utf8-12.C 2022-08-31 15:59:19.735120251 +0200
@@ -0,0 +1,25 @@
+// P2295R6 - Support for UTF-8 as a portable source file encoding
+// This test intentionally contains various byte sequences which are not valid UTF-8
+// { dg-do preprocess }
+// { dg-options "-finput-charset=UTF-8 -pedantic-errors -Wno-invalid-utf8" }
+
+#define I(x)
+I(€߿ࠀ퟿𐀀􏿿) // { dg-bogus "invalid UTF-8 character" }
+ // { dg-error "is not valid in an identifier" "" { target *-*-* } .-1 }
+I(�) // { dg-bogus "invalid UTF-8 character <80>" }
+I(�) // { dg-bogus "invalid UTF-8 character " }
+I(�) // { dg-bogus "invalid UTF-8 character " }
+I(�) // { dg-bogus "invalid UTF-8 character " }
+I(�) // { dg-bogus "invalid UTF-8 character " }
+I(�) // { dg-bogus "invalid UTF-8 character " }
+I(�) // { dg-bogus "invalid UTF-8 character " }
+I(�) // { dg-bogus "invalid UTF-8 character " }
+I(���) // { dg-bogus "invalid UTF-8 character <80>" }
+I(���) // { dg-bogus "invalid UTF-8 character <9f><80>" }
+I(��) // { dg-bogus "invalid UTF-8 character " }
+I(?�) // { dg-bogus "invalid UTF-8 character <80>" }
+I(���) // { dg-bogus "invalid UTF-8 character <80>" }
+I(����) // { dg-bogus "invalid UTF-8 character <80><80><80>" }
+I(����) // { dg-bogus "invalid UTF-8 character <8f>" }
+I(����) // { dg-error "is not valid in an identifier" }
+I(������) // { dg-error "is not valid in an identifier" }