From patchwork Tue Aug 2 18:36:02 2022
X-Patchwork-Submitter: Tom Honermann
X-Patchwork-Id: 361
From: Tom Honermann
To: gcc-patches@gcc.gnu.org
Subject: [PATCH v4 2/2] preprocessor/106426: Treat u8 character literals as unsigned in char8_t modes.
Date: Tue, 2 Aug 2022 14:36:02 -0400
Message-Id: <20220802183602.1575950-3-tom@honermann.net>
In-Reply-To: <20220802183602.1575950-1-tom@honermann.net>
References: <20220802183602.1575950-1-tom@honermann.net>
X-Mailer: git-send-email 2.32.0

This patch corrects handling of UTF-8 character literals in preprocessing
directives so that they are treated as unsigned types in char8_t enabled
C++ modes (C++17 with -fchar8_t or C++20 without -fno-char8_t). Previously,
UTF-8 character literals were always treated as having the same type as
ordinary character literals (signed or unsigned depending on the target or
on use of the -fsigned-char or -funsigned-char options).

	PR preprocessor/106426

gcc/c-family/ChangeLog:
	* c-opts.cc (c_common_post_options): Assign cpp_opts->unsigned_utf8char
	subject to -fchar8_t, -fsigned-char, and/or -funsigned-char.

gcc/testsuite/ChangeLog:
	* g++.dg/ext/char8_t-char-literal-1.C: Check signedness of u8 literals.
	* g++.dg/ext/char8_t-char-literal-2.C: Check signedness of u8 literals.

libcpp/ChangeLog:
	* charset.cc (narrow_str_to_charconst): Set signedness of CPP_UTF8CHAR
	literals based on unsigned_utf8char.
	* include/cpplib.h (cpp_options): Add unsigned_utf8char.
	* init.cc (cpp_create_reader): Initialize unsigned_utf8char.
---
 gcc/c-family/c-opts.cc                            | 1 +
 gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C | 6 +++++-
 gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C | 4 ++++
 libcpp/charset.cc                                 | 4 ++--
 libcpp/include/cpplib.h                           | 4 ++--
 libcpp/init.cc                                    | 1 +
 6 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index 108adc5caf8..02ce1e86cdb 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -1062,6 +1062,7 @@ c_common_post_options (const char **pfilename)
   /* char8_t support is implicitly enabled in C++20 and C2X. */
   if (flag_char8_t == -1)
     flag_char8_t = (cxx_dialect >= cxx20) || flag_isoc2x;
+  cpp_opts->unsigned_utf8char = flag_char8_t ? 1 : cpp_opts->unsigned_char;
 
   if (flag_extern_tls_init)
     {
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C b/gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C
index 8ed85ccfdcd..2994dd38516 100644
--- a/gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C
+++ b/gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C
@@ -1,6 +1,6 @@
 // Test that UTF-8 character literals have type char if -fchar8_t is not enabled.
 // { dg-do compile }
-// { dg-options "-std=c++17 -fno-char8_t" }
+// { dg-options "-std=c++17 -fsigned-char -fno-char8_t" }
 
 template
   struct is_same
@@ -10,3 +10,7 @@ template
   { static const bool value = true; };
 
 static_assert(is_same::value, "Error");
+
+#if u8'\0' - 1 > 0
+#error "UTF-8 character literals not signed in preprocessor"
+#endif
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C b/gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C
index 7861736689c..db4fe70046d 100644
--- a/gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C
+++ b/gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C
@@ -10,3 +10,7 @@ template
   { static const bool value = true; };
 
 static_assert(is_same::value, "Error");
+
+#if u8'\0' - 1 < 0
+#error "UTF-8 character literals not unsigned in preprocessor"
+#endif
diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index ca8b7cf7aa5..12e31632228 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1960,8 +1960,8 @@ narrow_str_to_charconst (cpp_reader *pfile, cpp_string str,
   /* Multichar constants are of type int and therefore signed. */
   if (i > 1)
     unsigned_p = 0;
-  else if (type == CPP_UTF8CHAR && !CPP_OPTION (pfile, cplusplus))
-    unsigned_p = 1;
+  else if (type == CPP_UTF8CHAR)
+    unsigned_p = CPP_OPTION (pfile, unsigned_utf8char);
   else
     unsigned_p = CPP_OPTION (pfile, unsigned_char);
 
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index 3eba6f74b57..f9c042db034 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -581,8 +581,8 @@ struct cpp_options
      ints and target wide characters, respectively. */
   size_t precision, char_precision, int_precision, wchar_precision;
 
-  /* True means chars (wide chars) are unsigned. */
-  bool unsigned_char, unsigned_wchar;
+  /* True means chars (wide chars, UTF-8 chars) are unsigned. */
+  bool unsigned_char, unsigned_wchar, unsigned_utf8char;
 
   /* True if the most significant byte in a word has the lowest
      address in memory. */
diff --git a/libcpp/init.cc b/libcpp/init.cc
index f4ab83d2145..0242da5f55c 100644
--- a/libcpp/init.cc
+++ b/libcpp/init.cc
@@ -231,6 +231,7 @@ cpp_create_reader (enum c_lang lang, cpp_hash_table *table,
   CPP_OPTION (pfile, int_precision) = CHAR_BIT * sizeof (int);
   CPP_OPTION (pfile, unsigned_char) = 0;
   CPP_OPTION (pfile, unsigned_wchar) = 1;
+  CPP_OPTION (pfile, unsigned_utf8char) = 1;
   CPP_OPTION (pfile, bytes_big_endian) = 1; /* does not matter */
 
   /* Default to no charset conversion. */
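
Illustration (not part of the patch): the option logic above can be modelled
outside GCC to see why the two new tests use `u8'\0' - 1`. The following is
only a rough sketch under stated assumptions. ModelOptions,
model_unsigned_utf8char, model_zero_minus_one_is_positive and self_check are
hypothetical names invented here; none of them exist in GCC or libcpp. The
sketch assumes that #if arithmetic is carried out in intmax_t or uintmax_t,
so an unsigned zero minus one wraps to a large positive value while a signed
zero minus one is -1.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two inputs the patch consults; this is
// not one of GCC's real data structures.
struct ModelOptions {
  bool char8_t_enabled;  // models flag_char8_t after it has been defaulted
  bool unsigned_char;    // models cpp_opts->unsigned_char
};

// Models the assignment added to c_common_post_options:
//   cpp_opts->unsigned_utf8char = flag_char8_t ? 1 : cpp_opts->unsigned_char;
constexpr bool model_unsigned_utf8char(ModelOptions o) {
  return o.char8_t_enabled ? true : o.unsigned_char;
}

// Models the sign of `u8'\0' - 1` inside an #if expression, assuming the
// arithmetic is done in uintmax_t (unsigned literal) or intmax_t (signed
// literal). Returns true when the result compares greater than zero.
constexpr bool model_zero_minus_one_is_positive(bool literal_is_unsigned) {
  return literal_is_unsigned
             ? (std::uintmax_t{0} - std::uintmax_t{1}) > 0
             : (std::intmax_t{0} - std::intmax_t{1}) > 0;
}

// char8_t enabled: the literal is unsigned whatever the char signedness is.
static_assert(model_unsigned_utf8char({true, false}), "char8_t mode");
// char8_t disabled: the literal follows -fsigned-char / -funsigned-char.
static_assert(!model_unsigned_utf8char({false, false}), "-fsigned-char");
static_assert(model_unsigned_utf8char({false, true}), "-funsigned-char");

// Runtime version of the same checks, tied to the two #if tests.
inline void self_check() {
  // char8_t-char-literal-2.C expects `u8'\0' - 1 < 0` to be false.
  assert(model_zero_minus_one_is_positive(
      model_unsigned_utf8char({true, false})));
  // char8_t-char-literal-1.C expects `u8'\0' - 1 > 0` to be false.
  assert(!model_zero_minus_one_is_positive(
      model_unsigned_utf8char({false, false})));
}
```

Calling self_check() from a main function exercises both cases at run time;
the static_assert lines already check the option mapping at compile time.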