From patchwork Mon Aug 1 18:32:29 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom Honermann X-Patchwork-Id: 334 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:6a10:b5d6:b0:2b9:3548:2db5 with SMTP id v22csp2504491pxt; Mon, 1 Aug 2022 11:33:18 -0700 (PDT) X-Google-Smtp-Source: AGRyM1sDP6mK5LtiV/F5osOfhd/fkxYlNAte6V7StM9CcM277menAfeOxyitcjeYYkWXCs7txzY4 X-Received: by 2002:a17:906:9b86:b0:6fe:d37f:b29d with SMTP id dd6-20020a1709069b8600b006fed37fb29dmr13446278ejc.327.1659378798646; Mon, 01 Aug 2022 11:33:18 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1659378798; cv=none; d=google.com; s=arc-20160816; b=RTMRgNHRBklD5PU8br5BSPUo67PTlWWFcckA9ihLdTLK8PWYJ0WZQGHBb/EUQSYuTg 9AYWXRvdjZmQwwohf7l6vMLyGm2F7CM4V7FjEAd7eC9b4V0uc1ZyK2LCxe0KLS/JCO96 YVv8Ahk+Fg61H8/IZrPuNYoAq978JfD1HhrIKL0N3hCEcXSqfneD63ISpCBBCy4N3pBh v2bKbP6xhdWZjQjllTrSG9p0D9iqDW36h3Ky6hX42Z4Dn1DhQu3IR1xBww8CuMQTh7OH fK4r70L33sXYAKA1A9nnOP92dz/BtMotEArXk6515WHrk8RUO2pgsrociEYQEfrLUI9I nKcg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:reply-to:from:list-subscribe:list-help:list-post :list-archive:list-unsubscribe:list-id:precedence :content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:dmarc-filter:delivered-to:dkim-signature :dkim-filter; bh=0RXx9YjFmVZfpRqDS698JMsoQl+yUNlBTOGngfi6kdI=; b=FvSD8TQi2vg8+dFe16LVUBDBu+dkpqSQ7XcyQ0QmMgmjG0k2BjmrS1yFiD6eJl6p19 V7LZ0fHSZX42AKgqhZ4rJM1GQ59nt6mKlwUqoz7VRlUo1ic5gF8abryhzUu039rk93eY I7xmlmKJt2F0D5hOeq9nR5ig9u55FNdReqwJGG4pazjJSs5Anxp7rOWcdsY3tI/UJCde P6957eTgMyV+pYo0+hUFFPO9eskITd/Ssqp3o7ayvkTFXQgwseGpkeXovvNrIvV7QgqK SkYysRMG3ncAm+XfqdvkO67OH/3PKVEEjGARG/rKA0fAy4u45GoQIHoauDD3TkLdFkaZ cEMQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b="R/iTfSrj"; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from sourceware.org (server2.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id cr21-20020a170906d55500b0073087f30394si3029959ejc.854.2022.08.01.11.33.18 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 01 Aug 2022 11:33:18 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b="R/iTfSrj"; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 6A7DD38582BC for ; Mon, 1 Aug 2022 18:33:17 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 6A7DD38582BC DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1659378797; bh=0RXx9YjFmVZfpRqDS698JMsoQl+yUNlBTOGngfi6kdI=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=R/iTfSrj0GEzlrLi7ZQikqnyyAFSH68YYnLZuVHtodtcN2yY4BMkHykTaI+TmJX1I ghwTHdGcesvJ2lmSA49F7JC5+ghwckrxoKp7L7LVAUV1OIi7Sef48Q3G3+xtLcCXwq ff9lEIm902I+w+fe/ElHF6/s/6rxwI5nDr1Is2L0= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from smtp67.iad3a.emailsrvr.com (smtp67.iad3a.emailsrvr.com [173.203.187.67]) by sourceware.org (Postfix) with ESMTPS id B259938582A5 for ; Mon, 1 Aug 2022 18:32:34 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B259938582A5 X-Auth-ID: tom@honermann.net Received: by smtp17.relay.iad3a.emailsrvr.com (Authenticated sender: tom-AT-honermann.net) with ESMTPSA id 21DC9255C0; Mon, 1 Aug 2022 14:32:34 -0400 (EDT) To: gcc-patches@gcc.gnu.org Subject: [PATCH 1/3 v2] C: Implement C2X N2653 char8_t and UTF-8 string literal changes Date: Mon, 1 Aug 2022 14:32:29 -0400 Message-Id: <20220801183229.1325250-1-tom@honermann.net> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220725175948.1424695-2-tom@honermann.net> References: <20220725175948.1424695-2-tom@honermann.net> MIME-Version: 1.0 X-Classification-ID: 5112a87d-c8e1-47b2-9c2a-6939e4c2f8fc-1-1 X-Spam-Status: No, score=-10.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Tom Honermann via Gcc-patches From: Tom Honermann Reply-To: Tom Honermann Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1739348618757383551?= X-GMAIL-MSGID: =?utf-8?q?1739984783636971572?= This patch implements the core language and compiler dependent library changes adopted for C2X via WG14 N2653. The changes include: - Change of type for UTF-8 string literals from array of const char to array of const char8_t (unsigned char). - A new atomic_char8_t typedef. - A new ATOMIC_CHAR8_T_LOCK_FREE macro defined in terms of the existing __GCC_ATOMIC_CHAR8_T_LOCK_FREE predefined macro. gcc/ChangeLog: * ginclude/stdatomic.h (atomic_char8_t, ATOMIC_CHAR8_T_LOCK_FREE): New typedef and macro. gcc/c/ChangeLog: * c-parser.c (c_parser_string_literal): Use char8_t as the type of CPP_UTF8STRING when char8_t support is enabled. * c-typeck.c (digest_init): Allow initialization of an array of character type by a string literal with type array of char8_t. gcc/c-family/ChangeLog: * c-lex.c (lex_string, lex_charconst): Use char8_t as the type of CPP_UTF8CHAR and CPP_UTF8STRING when char8_t support is enabled. * c-opts.c (c_common_post_options): Set flag_char8_t if targeting C2x. --- gcc/c-family/c-lex.cc | 13 +++++++++---- gcc/c-family/c-opts.cc | 4 ++-- gcc/c/c-parser.cc | 16 ++++++++++++++-- gcc/c/c-typeck.cc | 2 +- gcc/ginclude/stdatomic.h | 6 ++++++ 5 files changed, 32 insertions(+), 9 deletions(-) diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc index 8bfa4f4024f..0b6f94e18a8 100644 --- a/gcc/c-family/c-lex.cc +++ b/gcc/c-family/c-lex.cc @@ -1352,7 +1352,14 @@ lex_string (const cpp_token *tok, tree *valp, bool objc_string, bool translate) default: case CPP_STRING: case CPP_UTF8STRING: - value = build_string (1, ""); + if (type == CPP_UTF8STRING && flag_char8_t) + { + value = build_string (TYPE_PRECISION (char8_type_node) + / TYPE_PRECISION (char_type_node), + ""); /* char8_t is 8 bits */ + } + else + value = build_string (1, ""); break; case CPP_STRING16: value = build_string (TYPE_PRECISION (char16_type_node) @@ -1425,9 +1432,7 @@ lex_charconst (const cpp_token *token) type = char16_type_node; else if (token->type == CPP_UTF8CHAR) { - if (!c_dialect_cxx ()) - type = unsigned_char_type_node; - else if (flag_char8_t) + if (flag_char8_t) type = char8_type_node; else type = char_type_node; diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc index b9f01a65ed7..108adc5caf8 100644 --- a/gcc/c-family/c-opts.cc +++ b/gcc/c-family/c-opts.cc @@ -1059,9 +1059,9 @@ c_common_post_options (const char **pfilename) if (flag_sized_deallocation == -1) flag_sized_deallocation = (cxx_dialect >= cxx14); - /* char8_t support is new in C++20. */ + /* char8_t support is implicitly enabled in C++20 and C2X. */ if (flag_char8_t == -1) - flag_char8_t = (cxx_dialect >= cxx20); + flag_char8_t = (cxx_dialect >= cxx20) || flag_isoc2x; if (flag_extern_tls_init) { diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc index 92049d1a101..fa9395986de 100644 --- a/gcc/c/c-parser.cc +++ b/gcc/c/c-parser.cc @@ -7447,7 +7447,14 @@ c_parser_string_literal (c_parser *parser, bool translate, bool wide_ok) default: case CPP_STRING: case CPP_UTF8STRING: - value = build_string (1, ""); + if (type == CPP_UTF8STRING && flag_char8_t) + { + value = build_string (TYPE_PRECISION (char8_type_node) + / TYPE_PRECISION (char_type_node), + ""); /* char8_t is 8 bits */ + } + else + value = build_string (1, ""); break; case CPP_STRING16: value = build_string (TYPE_PRECISION (char16_type_node) @@ -7472,9 +7479,14 @@ c_parser_string_literal (c_parser *parser, bool translate, bool wide_ok) { default: case CPP_STRING: - case CPP_UTF8STRING: TREE_TYPE (value) = char_array_type_node; break; + case CPP_UTF8STRING: + if (flag_char8_t) + TREE_TYPE (value) = char8_array_type_node; + else + TREE_TYPE (value) = char_array_type_node; + break; case CPP_STRING16: TREE_TYPE (value) = char16_array_type_node; break; diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc index fd0a7f81a7a..231f4e980b6 100644 --- a/gcc/c/c-typeck.cc +++ b/gcc/c/c-typeck.cc @@ -8045,7 +8045,7 @@ digest_init (location_t init_loc, tree type, tree init, tree origtype, if (char_array) { - if (typ2 != char_type_node) + if (typ2 != char_type_node && typ2 != char8_type_node) incompat_string_cst = true; } else if (!comptypes (typ1, typ2)) diff --git a/gcc/ginclude/stdatomic.h b/gcc/ginclude/stdatomic.h index bfcfdf664c7..9f2475b739d 100644 --- a/gcc/ginclude/stdatomic.h +++ b/gcc/ginclude/stdatomic.h @@ -49,6 +49,9 @@ typedef _Atomic long atomic_long; typedef _Atomic unsigned long atomic_ulong; typedef _Atomic long long atomic_llong; typedef _Atomic unsigned long long atomic_ullong; +#ifdef __CHAR8_TYPE__ +typedef _Atomic __CHAR8_TYPE__ atomic_char8_t; +#endif typedef _Atomic __CHAR16_TYPE__ atomic_char16_t; typedef _Atomic __CHAR32_TYPE__ atomic_char32_t; typedef _Atomic __WCHAR_TYPE__ atomic_wchar_t; @@ -97,6 +100,9 @@ extern void atomic_signal_fence (memory_order); #define ATOMIC_BOOL_LOCK_FREE __GCC_ATOMIC_BOOL_LOCK_FREE #define ATOMIC_CHAR_LOCK_FREE __GCC_ATOMIC_CHAR_LOCK_FREE +#ifdef __GCC_ATOMIC_CHAR8_T_LOCK_FREE +#define ATOMIC_CHAR8_T_LOCK_FREE __GCC_ATOMIC_CHAR8_T_LOCK_FREE +#endif #define ATOMIC_CHAR16_T_LOCK_FREE __GCC_ATOMIC_CHAR16_T_LOCK_FREE #define ATOMIC_CHAR32_T_LOCK_FREE __GCC_ATOMIC_CHAR32_T_LOCK_FREE #define ATOMIC_WCHAR_T_LOCK_FREE __GCC_ATOMIC_WCHAR_T_LOCK_FREE