From patchwork Wed Jan 31 09:50:50 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jonathan Wakely
X-Patchwork-Id: 194664
From: Jonathan Wakely
To: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org
Cc: Ewan Higgs
Subject: [committed] libstdc++: Add "ASCII" as an alias for std::text_encoding::id::ASCII
Date: Wed, 31 Jan 2024 09:50:50 +0000
Message-ID: <20240131095245.1915153-1-jwakely@redhat.com>

SG16 (Unicode and Text Study Group) and LWG are overwhelmingly in favour
of adding this alias, so let's not wait for the issue to get voted into
the working draft.

Tested aarch64-linux. Pushed to trunk.

-- >8 --

As noted in LWG 4043, "ASCII" is not an alias for any known registered
character encoding, so std::text_encoding("ASCII").mib() == id::other.
Add the alias "ASCII" to the implementation-defined superset of aliases
for that encoding.

libstdc++-v3/ChangeLog:

	* include/bits/text_encoding-data.h: Regenerate.
	* scripts/gen_text_encoding_data.py: Add extra_aliases dict
	containing "ASCII".
	* testsuite/std/text_encoding/cons.cc: Check "ascii" is known.

Co-authored-by: Ewan Higgs
Signed-off-by: Ewan Higgs
---
 .../include/bits/text_encoding-data.h         |  3 ++-
 .../scripts/gen_text_encoding_data.py         | 24 ++++++++++++++++++-
 .../testsuite/std/text_encoding/cons.cc       |  5 ++++
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/text_encoding-data.h b/libstdc++-v3/include/bits/text_encoding-data.h
index 7ac2e9dc3d9..5041e738d21 100644
--- a/libstdc++-v3/include/bits/text_encoding-data.h
+++ b/libstdc++-v3/include/bits/text_encoding-data.h
@@ -14,6 +14,7 @@
   { 3, "IBM367" },
   { 3, "cp367" },
   { 3, "csASCII" },
+  { 3, "ASCII" }, // libstdc++ extension
   { 4, "ISO_8859-1:1987" },
   { 4, "iso-ir-100" },
   { 4, "ISO_8859-1" },
@@ -417,7 +418,7 @@
   { 104, "csISO2022CN" },
   { 105, "ISO-2022-CN-EXT" },
   { 105, "csISO2022CNEXT" },
-#define _GLIBCXX_TEXT_ENCODING_UTF8_OFFSET 413
+#define _GLIBCXX_TEXT_ENCODING_UTF8_OFFSET 414
   { 106, "UTF-8" },
   { 106, "csUTF8" },
   { 109, "ISO-8859-13" },
diff --git a/libstdc++-v3/scripts/gen_text_encoding_data.py b/libstdc++-v3/scripts/gen_text_encoding_data.py
index 2d6f3e4077a..f0ebb42d8c2 100755
--- a/libstdc++-v3/scripts/gen_text_encoding_data.py
+++ b/libstdc++-v3/scripts/gen_text_encoding_data.py
@@ -36,6 +36,18 @@ print("#ifndef _GLIBCXX_GET_ENCODING_DATA")
 print('# error "This is not a public header, do not include it directly"')
 print("#endif\n")
 
+# We need to generate a list of initializers of the form { mib, alias }, e.g.,
+# { 3, "US-ASCII" },
+# { 3, "ISO646-US" },
+# { 3, "csASCII" },
+# { 4, "ISO_8859-1:1987" },
+# { 4, "latin1" },
+# The initializers must be sorted by the mib value. The first entry for
+# a given mib must be the primary name for the encoding. Any aliases for
+# the encoding come after the primary name.
+# We also define a macro _GLIBCXX_TEXT_ENCODING_UTF8_OFFSET which is the
+# offset into the list of the mib=106, alias="UTF-8" entry. This is used
+# to optimize the common case, so we don't need to search for "UTF-8".
 
 charsets = {}
 with open(sys.argv[1], newline='') as f:
@@ -52,10 +64,15 @@ with open(sys.argv[1], newline='') as f:
             aliases.remove(name)
         charsets[mib] = [name] + aliases
 
-# Remove "NATS-DANO" and "NATS-DANO-ADD"
+# Remove "NATS-DANO" and "NATS-DANO-ADD" as specified by the C++ standard.
 charsets.pop(33, None)
 charsets.pop(34, None)
 
+# This is not an official IANA alias, but we include it in the
+# implementation-defined superset of aliases for US-ASCII.
+# See also LWG 4043.
+extra_aliases = {3: ["ASCII"]}
+
 count = 0
 for mib in sorted(charsets.keys()):
     names = charsets[mib]
@@ -64,6 +81,11 @@ for mib in sorted(charsets.keys()):
     for name in names:
         print(' {{ {:4}, "{}" }},'.format(mib, name))
     count += len(names)
+    if mib in extra_aliases:
+        names = extra_aliases[mib]
+        for name in names:
+            print(' {{ {:4}, "{}" }}, // libstdc++ extension'.format(mib, name))
+        count += len(names)
 
 # <text_encoding> gives an error if this macro is left defined.
 # Do this last, so that the generated output is not usable unless we reach here.
diff --git a/libstdc++-v3/testsuite/std/text_encoding/cons.cc b/libstdc++-v3/testsuite/std/text_encoding/cons.cc
index b9d93641de4..8fcc2ec8c3b 100644
--- a/libstdc++-v3/testsuite/std/text_encoding/cons.cc
+++ b/libstdc++-v3/testsuite/std/text_encoding/cons.cc
@@ -53,6 +53,11 @@ test_construct_by_name()
   VERIFY( e4.name() == s );
   VERIFY( ! e4.aliases().empty() );
   VERIFY( e4.aliases().front() == "US-ASCII"sv ); // primary name
+
+  s = "ascii";
+  std::text_encoding e5(s);
+  VERIFY( e5.mib() == std::text_encoding::ASCII );
+  VERIFY( e5.name() == s );
 }
 
 constexpr void
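
Illustrative note (not part of the patch): the regenerated header moves
_GLIBCXX_TEXT_ENCODING_UTF8_OFFSET from 413 to 414 because the extra "ASCII"
entry is emitted before the UTF-8 entry, so every later index shifts by one.
The sketch below mimics the generator loop on a tiny in-memory table. The
three-entry charsets dict, the generate() helper and the way the offset is
recorded (taking the running count when the primary name is "UTF-8") are
assumptions made for illustration; the real script reads the IANA CSV and
prints the header directly.

```python
# Hypothetical miniature of the IANA data: mib -> [primary name, aliases...].
charsets = {
    3: ["US-ASCII", "csASCII"],
    4: ["ISO_8859-1:1987", "latin1"],
    106: ["UTF-8", "csUTF8"],
}

# Same shape as the dict added by the patch.
extra_aliases = {3: ["ASCII"]}


def generate(charsets, extra_aliases):
    """Return (initializer lines, index of the "UTF-8" entry).

    Assumption: the offset is the number of entries emitted before the
    primary "UTF-8" name, which is how this sketch models the macro.
    """
    lines = []
    utf8_offset = None
    count = 0
    for mib in sorted(charsets):
        names = charsets[mib]
        if names[0] == "UTF-8":
            utf8_offset = count
        for name in names:
            lines.append('{{ {:4}, "{}" }},'.format(mib, name))
        count += len(names)
        # Extra aliases go after the registered names for the same mib,
        # so the primary name stays first and the list stays sorted by mib.
        for name in extra_aliases.get(mib, []):
            lines.append(
                '{{ {:4}, "{}" }}, // libstdc++ extension'.format(mib, name))
            count += 1
    return lines, utf8_offset


lines, utf8_offset = generate(charsets, extra_aliases)
_, old_offset = generate(charsets, {})

for line in lines:
    print(line)
print("UTF-8 offset without extra alias:", old_offset)
print("UTF-8 offset with extra alias:   ", utf8_offset)
```

In this toy table the offset grows by exactly one, which mirrors the
413 -> 414 change in the generated header.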