From patchwork Tue Nov 14 07:33:54 2023
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 164745
Date: Tue, 14 Nov 2023 08:33:54 +0100
From: Jakub Jelinek
To: Jason Merrill
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] libcpp, contrib: Update to Unicode 15.1

Hi!

On Tue, Nov 14, 2023 at 08:23:27AM +0100, Jakub Jelinek wrote:
> The following patch (in plaintext just a pseudo-patch where I've left out
> the too big parts of either wget downloaded or regenerated files out with
> ..., full patch attached compressed) updates to Unicode 15.1 from 15.0
> we had last year. Apparently Unicode forgot to add a new range to 4-8 Table
> we are using, but from the other files it is clear what should have been
> added; I've filed a bugreport against Unicode.
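For illustration only, not part of the patch: Table 4-8 ranges matter because the names of the ideographs they cover are derived (a prefix plus the upper-case hexadecimal code point) rather than spelled out in UnicodeData.txt, so a range missing from generated_ranges presumably means such a name cannot be resolved, for example in a C++23 \N{...} escape. The Python sketch assumes a hypothetical `name_to_codepoint` helper and lists only a subset of the CJK ranges; it is not how makeuname2c.cc or uname2c.h are implemented.

```python
# Illustration only: a hypothetical subset of the CJK UNIFIED IDEOGRAPH
# ranges whose names Unicode Table 4-8 derives algorithmically
# (prefix + hex code point).  This is not GCC's makeuname2c.cc code.
PREFIX = "CJK UNIFIED IDEOGRAPH-"

CJK_RANGES = [
    (0x2B740, 0x2B81D),
    (0x2B820, 0x2CEA1),
    (0x2CEB0, 0x2EBE0),
    (0x2EBF0, 0x2EE5D),  # CJK Extension I, new in Unicode 15.1
    (0x30000, 0x3134A),
    (0x31350, 0x323AF),
]


def name_to_codepoint(name, ranges=CJK_RANGES):
    """Return the code point named by NAME, or None if no range covers it."""
    if not name.startswith(PREFIX):
        return None
    try:
        cp = int(name[len(PREFIX):], 16)
    except ValueError:
        return None
    # Only the canonical spelling (upper-case hex, at least four digits,
    # no extra leading zeros) is a valid character name.
    if f"{PREFIX}{cp:04X}" != name:
        return None
    for low, high in ranges:
        if low <= cp <= high:
            return cp
    return None


# The same lookup without the 15.1 range cannot resolve U+2EBF0.
OLD_RANGES = [r for r in CJK_RANGES if r != (0x2EBF0, 0x2EE5D)]
print(hex(name_to_codepoint("CJK UNIFIED IDEOGRAPH-2EBF0")))  # 0x2ebf0
print(name_to_codepoint("CJK UNIFIED IDEOGRAPH-2EBF0", OLD_RANGES))  # None
```

The round-trip comparison is a cheap way to reject non-canonical spellings such as lower-case hex or padded digits without writing a separate validator.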
Reposted, because the attachment was still too big even after compression.
This compressed patch leaves out the uname2c.h changes; I will post those
as a separate mail.

2023-11-14  Jakub Jelinek

contrib/
	* unicode/README: Adjust glibc git commit hash, number of Unicode
	data files to be updated and latest Unicode version.
	* unicode/from_glibc/utf8_gen.py: Update from glibc.
	* unicode/UnicodeData.txt: Update from Unicode 15.1.
	* unicode/EastAsianWidth.txt: Likewise.
	* unicode/DerivedNormalizationProps.txt: Likewise.
	* unicode/NameAliases.txt: Likewise.
	* unicode/DerivedCoreProperties.txt: Likewise.
	* unicode/PropList.txt: Likewise.
libcpp/
	* makeucnid.cc (write_copyright): Update copyright year.
	* makeuname2c.cc (write_copyright): Likewise.
	(struct generated): Update latest Unicode version.
	(generated_ranges): Add 2ebf0-2ee5d CJK UNIFIED IDEOGRAPH range, which
	Unicode forgot to add to Table 4-8 but which is clearly expected to be
	there given the 15.1 additions.
	* ucnid.h: Regenerated.
	* uname2c.h: Regenerated.
	* generated_cpp_wcwidth.h: Regenerated.

...

	Jakub

--- contrib/unicode/README.jj	2023-03-16 10:28:18.226187960 +0100
+++ contrib/unicode/README	2023-11-13 13:53:22.777991374 +0100
@@ -30,7 +30,7 @@ localedata/unicode-gen/unicode_utils.py
 localedata/unicode-gen/utf8_gen.py
 
 And the most recent versions added to GCC are from glibc git commit:
-4c721f24fc190d1dc935eb0bab283de7cf13182e
+71de3aead9fffe89556e80ebc94aa918d8ee7bca
 
 The script gen_wcwidth.py found here contains the GCC-specific code to
 map glibc's output to the lookup tables we require. This script should not need
@@ -40,14 +40,14 @@ produce ucnid.h.
 
 The procedure to update GCC's Unicode support is the following:
 
-1. Update the five Unicode data files from the above URLs.
+1. Update the six Unicode data files from the above URLs.
 
 2. Update the two glibc files in from_glibc/ from glibc's git. Update
    the commit number above in this README.
 
 3. Run ./gen_wcwidth.py X.Y > ../../libcpp/generated_cpp_wcwidth.h
    (where X.Y is the version of the Unicode standard corresponding to the
-   Unicode data files being used, most recently, 15.0.0).
+   Unicode data files being used, most recently, 15.1.0).
 
 4. Update Unicode Copyright years in libcpp/makeucnid.cc and in
    libcpp/makeuname2c.cc up to the year in which the Unicode
--- contrib/unicode/from_glibc/utf8_gen.py.jj	2023-01-16 11:52:15.879737071 +0100
+++ contrib/unicode/from_glibc/utf8_gen.py	2023-10-12 09:42:01.018694503 +0200
@@ -350,7 +350,7 @@ if __name__ == "__main__":
             # the EastAsianWidth.txt file.
             if re.match(r'.*\.\..*', LINE):
                 continue
-            if re.match(r'^[^;]*;[WF]', LINE):
+            if re.match(r'^[^;]*;\s*[WF]\s*', LINE):
                 EAST_ASIAN_WIDTH_LINES.append(LINE.strip())
     with open(ARGS.prop_list_file, mode='r') as PROP_LIST_FILE:
         PROP_LIST_LINES = []
--- contrib/unicode/UnicodeData.txt.jj	2023-03-14 12:24:55.545729148 +0100
+++ contrib/unicode/UnicodeData.txt	2023-08-28 18:08:58.000000000 +0200
@@ -11231,6 +11231,10 @@
 2FF9;IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM UPPER RIGHT;So;0;ON;;;;;N;;;;;
 2FFA;IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LOWER LEFT;So;0;ON;;;;;N;;;;;
 2FFB;IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID;So;0;ON;;;;;N;;;;;
+2FFC;IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM RIGHT;So;0;ON;;;;;N;;;;;
+2FFD;IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LOWER RIGHT;So;0;ON;;;;;N;;;;;
+2FFE;IDEOGRAPHIC DESCRIPTION CHARACTER HORIZONTAL REFLECTION;So;0;ON;;;;;N;;;;;
+2FFF;IDEOGRAPHIC DESCRIPTION CHARACTER ROTATION;So;0;ON;;;;;N;;;;;
 3000;IDEOGRAPHIC SPACE;Zs;0;WS; 0020;;;;N;;;;;
 3001;IDEOGRAPHIC COMMA;Po;0;ON;;;;;N;;;;;
 3002;IDEOGRAPHIC FULL STOP;Po;0;ON;;;;;N;IDEOGRAPHIC PERIOD;;;;
@@ -11705,6 +11709,7 @@
 31E1;CJK STROKE HZZZG;So;0;ON;;;;;N;;;;;
 31E2;CJK STROKE PG;So;0;ON;;;;;N;;;;;
 31E3;CJK STROKE Q;So;0;ON;;;;;N;;;;;
+31EF;IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION;So;0;ON;;;;;N;;;;;
 31F0;KATAKANA LETTER SMALL KU;Lo;0;L;;;;;N;;;;;
 31F1;KATAKANA LETTER SMALL SI;Lo;0;L;;;;;N;;;;;
 31F2;KATAKANA LETTER SMALL SU;Lo;0;L;;;;;N;;;;;
@@ -34035,6 +34040,8 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N
 2CEA1;;Lo;0;L;;;;;N;;;;;
 2CEB0;;Lo;0;L;;;;;N;;;;;
 2EBE0;;Lo;0;L;;;;;N;;;;;
+2EBF0;;Lo;0;L;;;;;N;;;;;
+2EE5D;;Lo;0;L;;;;;N;;;;;
 2F800;CJK COMPATIBILITY IDEOGRAPH-2F800;Lo;0;L;4E3D;;;;N;;;;;
 2F801;CJK COMPATIBILITY IDEOGRAPH-2F801;Lo;0;L;4E38;;;;N;;;;;
 2F802;CJK COMPATIBILITY IDEOGRAPH-2F802;Lo;0;L;4E41;;;;N;;;;;
--- contrib/unicode/EastAsianWidth.txt.jj	2023-03-14 12:24:55.496729855 +0100
+++ contrib/unicode/EastAsianWidth.txt	2023-08-28 18:08:56.000000000 +0200
@@ -1,11 +1,11 @@
-# EastAsianWidth-15.0.0.txt
-# Date: 2022-05-24, 17:40:20 GMT [KW, LI]
-# © 2022 Unicode®, Inc.
+# EastAsianWidth-15.1.0.txt
+# Date: 2023-07-28, 23:34:08 GMT
+# © 2023 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use, see https://www.unicode.org/terms_of_use.html
 #
 # Unicode Character Database
-# For documentation, see https://www.unicode.org/reports/tr44/
+# For documentation, see https://www.unicode.org/reports/tr44/
 #
 # East_Asian_Width Property
 #
...
--- contrib/unicode/DerivedNormalizationProps.txt.jj	2023-03-14 12:24:55.480730086 +0100
+++ contrib/unicode/DerivedNormalizationProps.txt	2023-08-28 18:08:56.000000000 +0200
@@ -1,6 +1,6 @@
-# DerivedNormalizationProps-15.0.0.txt
-# Date: 2022-04-02, 01:29:03 GMT
-# © 2022 Unicode®, Inc.
+# DerivedNormalizationProps-15.1.0.txt
+# Date: 2023-05-02, 13:20:58 GMT
+# © 2023 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use, see https://www.unicode.org/terms_of_use.html
 #
...
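The utf8_gen.py hunk relaxes the East_Asian_Width filter so that blanks may surround the width field. Presumably (an assumption inferred only from the regex change) the 15.1 EastAsianWidth.txt aligns its columns with spaces. The Python sketch compares the old and new patterns on made-up single-code-point lines in both layouts; it deliberately leaves out the script's range-skipping check, and `wide_lines` is a hypothetical helper.

```python
import re

# The old and new East_Asian_Width filters from the utf8_gen.py hunk.
OLD_WIDE = re.compile(r'^[^;]*;[WF]')
NEW_WIDE = re.compile(r'^[^;]*;\s*[WF]\s*')

# Illustrative lines, not copied verbatim from either data file:
# an unpadded layout and an aligned layout with blanks around ';'.
UNPADDED = ["3000;F           # Zs         IDEOGRAPHIC SPACE",
            "20A9;H           # Sc         WON SIGN",
            "FF01;F           # Po         FULLWIDTH EXCLAMATION MARK"]
PADDED = ["3000           ; F  # Zs         IDEOGRAPHIC SPACE",
          "20A9           ; H  # Sc         WON SIGN",
          "FF01           ; F  # Po         FULLWIDTH EXCLAMATION MARK"]


def wide_lines(lines, pattern):
    """Keep the lines whose width field is W or F, as utf8_gen.py does."""
    return [line.strip() for line in lines if pattern.match(line)]


for label, pattern in (("old", OLD_WIDE), ("new", NEW_WIDE)):
    print(label, len(wide_lines(UNPADDED, pattern)),
          len(wide_lines(PADDED, pattern)))
# old 2 0
# new 2 2
```

With the old pattern every padded line is silently dropped, which would leave the generated width tables without any wide characters; the new pattern accepts both layouts.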
--- contrib/unicode/NameAliases.txt.jj	2023-03-16 10:28:18.226187960 +0100
+++ contrib/unicode/NameAliases.txt	2023-08-28 18:08:56.000000000 +0200
@@ -1,6 +1,6 @@
-# NameAliases-15.0.0.txt
-# Date: 2022-07-26, 20:13:00 GMT [KW]
-# © 2022 Unicode®, Inc.
+# NameAliases-15.1.0.txt
+# Date: 2023-01-05
+# © 2023 Unicode®, Inc.
 # For terms of use, see https://www.unicode.org/terms_of_use.html
 #
 # Unicode Character Database
--- contrib/unicode/DerivedCoreProperties.txt.jj	2023-03-14 12:24:55.468730260 +0100
+++ contrib/unicode/DerivedCoreProperties.txt	2023-08-28 18:08:56.000000000 +0200
@@ -1,6 +1,6 @@
-# DerivedCoreProperties-15.0.0.txt
-# Date: 2022-08-05, 22:17:05 GMT
-# © 2022 Unicode®, Inc.
+# DerivedCoreProperties-15.1.0.txt
+# Date: 2023-08-07, 15:21:24 GMT
+# © 2023 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use, see https://www.unicode.org/terms_of_use.html
 #
@@ -1397,11 +1397,12 @@ FFDA..FFDC ; Alphabetic # Lo [3] HA
 2B740..2B81D ; Alphabetic # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
 2B820..2CEA1 ; Alphabetic # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
 2CEB0..2EBE0 ; Alphabetic # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
+2EBF0..2EE5D ; Alphabetic # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
 2F800..2FA1D ; Alphabetic # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
 30000..3134A ; Alphabetic # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
 31350..323AF ; Alphabetic # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
 
-# Total code points: 137765
+# Total code points: 138387
 
 # ================================================
 
@@ -6853,11 +6854,12 @@ FFDA..FFDC ; ID_Start # Lo [3] HALF
 2B740..2B81D ; ID_Start # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
 2B820..2CEA1 ; ID_Start # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
 2CEB0..2EBE0 ; ID_Start # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
+2EBF0..2EE5D ; ID_Start # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
 2F800..2FA1D ; ID_Start # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
 30000..3134A ; ID_Start # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
 31350..323AF ; ID_Start # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
 
-# Total code points: 136345
+# Total code points: 136967
 
 # ================================================
 
@@ -7438,6 +7440,7 @@ FFDA..FFDC ; ID_Start # Lo [3] HALF
 1FE0..1FEC ; ID_Continue # L& [13] GREEK SMALL LETTER UPSILON WITH VRACHY..GREEK CAPITAL LETTER RHO WITH DASIA
 1FF2..1FF4 ; ID_Continue # L& [3] GREEK SMALL LETTER OMEGA WITH VARIA AND YPOGEGRAMMENI..GREEK SMALL LETTER OMEGA WITH OXIA AND YPOGEGRAMMENI
 1FF6..1FFC ; ID_Continue # L& [7] GREEK SMALL LETTER OMEGA WITH PERISPOMENI..GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI
+200C..200D ; ID_Continue # Cf [2] ZERO WIDTH NON-JOINER..ZERO WIDTH JOINER
 203F..2040 ; ID_Continue # Pc [2] UNDERTIE..CHARACTER TIE
 2054 ; ID_Continue # Pc INVERTED UNDERTIE
 2071 ; ID_Continue # Lm SUPERSCRIPT LATIN SMALL LETTER I
@@ -7504,6 +7507,7 @@ FFDA..FFDC ; ID_Start # Lo [3] HALF
 309D..309E ; ID_Continue # Lm [2] HIRAGANA ITERATION MARK..HIRAGANA VOICED ITERATION MARK
 309F ; ID_Continue # Lo HIRAGANA DIGRAPH YORI
 30A1..30FA ; ID_Continue # Lo [90] KATAKANA LETTER SMALL A..KATAKANA LETTER VO
+30FB ; ID_Continue # Po KATAKANA MIDDLE DOT
 30FC..30FE ; ID_Continue # Lm [3] KATAKANA-HIRAGANA PROLONGED SOUND MARK..KATAKANA VOICED ITERATION MARK
 30FF ; ID_Continue # Lo KATAKANA DIGRAPH KOTO
 3105..312F ; ID_Continue # Lo [43] BOPOMOFO LETTER B..BOPOMOFO LETTER NN
@@ -7683,6 +7687,7 @@ FF10..FF19 ; ID_Continue # Nd [10] F
 FF21..FF3A ; ID_Continue # L& [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z
 FF3F ; ID_Continue # Pc FULLWIDTH LOW LINE
 FF41..FF5A ; ID_Continue # L& [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL LETTER Z
+FF65 ; ID_Continue # Po HALFWIDTH KATAKANA MIDDLE DOT
 FF66..FF6F ; ID_Continue # Lo [10] HALFWIDTH KATAKANA LETTER WO..HALFWIDTH KATAKANA LETTER SMALL TU
 FF70 ; ID_Continue # Lm HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK
 FF71..FF9D ; ID_Continue # Lo [45] HALFWIDTH KATAKANA LETTER A..HALFWIDTH KATAKANA LETTER N
@@ -8207,12 +8212,13 @@ FFDA..FFDC ; ID_Continue # Lo [3] H
 2B740..2B81D ; ID_Continue # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
 2B820..2CEA1 ; ID_Continue # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
 2CEB0..2EBE0 ; ID_Continue # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
+2EBF0..2EE5D ; ID_Continue # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
 2F800..2FA1D ; ID_Continue # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
 30000..3134A ; ID_Continue # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
 31350..323AF ; ID_Continue # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
 E0100..E01EF ; ID_Continue # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
 
-# Total code points: 139482
+# Total code points: 140108
 
 # ================================================
 
@@ -8962,11 +8968,12 @@ FFDA..FFDC ; XID_Start # Lo [3] HAL
 2B740..2B81D ; XID_Start # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
 2B820..2CEA1 ; XID_Start # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
 2CEB0..2EBE0 ; XID_Start # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
+2EBF0..2EE5D ; XID_Start # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
 2F800..2FA1D ; XID_Start # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
 30000..3134A ; XID_Start # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
 31350..323AF ; XID_Start # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
 
-# Total code points: 136322
+# Total code points: 136944
 
 # ================================================
 
@@ -9543,6 +9550,7 @@ FFDA..FFDC ; XID_Start # Lo [3] HAL
 1FE0..1FEC ; XID_Continue # L& [13] GREEK SMALL LETTER UPSILON WITH VRACHY..GREEK CAPITAL LETTER RHO WITH DASIA
 1FF2..1FF4 ; XID_Continue # L& [3] GREEK SMALL LETTER OMEGA WITH VARIA AND YPOGEGRAMMENI..GREEK SMALL LETTER OMEGA WITH OXIA AND YPOGEGRAMMENI
 1FF6..1FFC ; XID_Continue # L& [7] GREEK SMALL LETTER OMEGA WITH PERISPOMENI..GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI
+200C..200D ; XID_Continue # Cf [2] ZERO WIDTH NON-JOINER..ZERO WIDTH JOINER
 203F..2040 ; XID_Continue # Pc [2] UNDERTIE..CHARACTER TIE
 2054 ; XID_Continue # Pc INVERTED UNDERTIE
 2071 ; XID_Continue # Lm SUPERSCRIPT LATIN SMALL LETTER I
@@ -9608,6 +9616,7 @@ FFDA..FFDC ; XID_Start # Lo [3] HAL
 309D..309E ; XID_Continue # Lm [2] HIRAGANA ITERATION MARK..HIRAGANA VOICED ITERATION MARK
 309F ; XID_Continue # Lo HIRAGANA DIGRAPH YORI
 30A1..30FA ; XID_Continue # Lo [90] KATAKANA LETTER SMALL A..KATAKANA LETTER VO
+30FB ; XID_Continue # Po KATAKANA MIDDLE DOT
 30FC..30FE ; XID_Continue # Lm [3] KATAKANA-HIRAGANA PROLONGED SOUND MARK..KATAKANA VOICED ITERATION MARK
 30FF ; XID_Continue # Lo KATAKANA DIGRAPH KOTO
 3105..312F ; XID_Continue # Lo [43] BOPOMOFO LETTER B..BOPOMOFO LETTER NN
@@ -9793,6 +9802,7 @@ FF10..FF19 ; XID_Continue # Nd [10]
 FF21..FF3A ; XID_Continue # L& [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z
 FF3F ; XID_Continue # Pc FULLWIDTH LOW LINE
 FF41..FF5A ; XID_Continue # L& [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL LETTER Z
+FF65 ; XID_Continue # Po HALFWIDTH KATAKANA MIDDLE DOT
 FF66..FF6F ; XID_Continue # Lo [10] HALFWIDTH KATAKANA LETTER WO..HALFWIDTH KATAKANA LETTER SMALL TU
 FF70 ; XID_Continue # Lm HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK
 FF71..FF9D ; XID_Continue # Lo [45] HALFWIDTH KATAKANA LETTER A..HALFWIDTH KATAKANA LETTER N
@@ -10317,12 +10327,13 @@ FFDA..FFDC ; XID_Continue # Lo [3]
 2B740..2B81D ; XID_Continue # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
 2B820..2CEA1 ; XID_Continue # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
 2CEB0..2EBE0 ; XID_Continue # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
+2EBF0..2EE5D ; XID_Continue # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
 2F800..2FA1D ; XID_Continue # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
 30000..3134A ; XID_Continue # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
 31350..323AF ; XID_Continue # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
 E0100..E01EF ; XID_Continue # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
 
-# Total code points: 139463
+# Total code points: 140089
 
 # ================================================
 
@@ -10335,6 +10346,15 @@ E0100..E01EF ; XID_Continue # Mn [240]
 # - FFF9..FFFB (Interlinear annotation format characters)
 # - 13430..13440 (Egyptian hieroglyph format characters)
 # - Prepended_Concatenation_Mark (Exceptional format characters that should be visible)
+#
+# There are currently no stability guarantees for DICP. However, the
+# values of DICP interact with the derivation of XID_Continue
+# and NFKC_CF, for which there are stability guarantees.
+# Maintainers of this property should note that in the
+# unlikely case that the DICP value changes for an existing character
+# which is also XID_Continue=Yes, then exceptions must be put
+# in place to ensure that the NFKC_CF mapping value for that
+# existing character does not change.
 
 00AD ; Default_Ignorable_Code_Point # Cf SOFT HYPHEN
 034F ; Default_Ignorable_Code_Point # Mn COMBINING GRAPHEME JOINER
@@ -11602,7 +11622,7 @@ E0100..E01EF ; Grapheme_Extend # Mn [24
 2E80..2E99 ; Grapheme_Base # So [26] CJK RADICAL REPEAT..CJK RADICAL RAP
 2E9B..2EF3 ; Grapheme_Base # So [89] CJK RADICAL CHOKE..CJK RADICAL C-SIMPLIFIED TURTLE
 2F00..2FD5 ; Grapheme_Base # So [214] KANGXI RADICAL ONE..KANGXI RADICAL FLUTE
-2FF0..2FFB ; Grapheme_Base # So [12] IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT..IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID
+2FF0..2FFF ; Grapheme_Base # So [16] IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT..IDEOGRAPHIC DESCRIPTION CHARACTER ROTATION
 3000 ; Grapheme_Base # Zs IDEOGRAPHIC SPACE
 3001..3003 ; Grapheme_Base # Po [3] IDEOGRAPHIC COMMA..DITTO MARK
 3004 ; Grapheme_Base # So JAPANESE INDUSTRIAL STANDARD SYMBOL
@@ -11657,6 +11677,7 @@ E0100..E01EF ; Grapheme_Extend # Mn [24
 3196..319F ; Grapheme_Base # So [10] IDEOGRAPHIC ANNOTATION TOP MARK..IDEOGRAPHIC ANNOTATION MAN MARK
 31A0..31BF ; Grapheme_Base # Lo [32] BOPOMOFO LETTER BU..BOPOMOFO LETTER AH
 31C0..31E3 ; Grapheme_Base # So [36] CJK STROKE T..CJK STROKE Q
+31EF ; Grapheme_Base # So IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION
 31F0..31FF ; Grapheme_Base # Lo [16] KATAKANA LETTER SMALL KU..KATAKANA LETTER SMALL RO
 3200..321E ; Grapheme_Base # So [31] PARENTHESIZED HANGUL KIYEOK..PARENTHESIZED KOREAN CHARACTER O HU
 3220..3229 ; Grapheme_Base # No [10] PARENTHESIZED IDEOGRAPH ONE..PARENTHESIZED IDEOGRAPH TEN
@@ -12497,11 +12518,12 @@ FFFC..FFFD ; Grapheme_Base # So [2]
 2B740..2B81D ; Grapheme_Base # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
 2B820..2CEA1 ; Grapheme_Base # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
 2CEB0..2EBE0 ; Grapheme_Base # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
+2EBF0..2EE5D ; Grapheme_Base # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
 2F800..2FA1D ; Grapheme_Base # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
 30000..3134A ; Grapheme_Base # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
 31350..323AF ; Grapheme_Base # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
 
-# Total code points: 146986
+# Total code points: 147613
 
 # ================================================
 
...
--- contrib/unicode/PropList.txt.jj	2023-03-14 12:24:55.497729841 +0100
+++ contrib/unicode/PropList.txt	2023-08-28 18:08:56.000000000 +0200
@@ -1,6 +1,6 @@
-# PropList-15.0.0.txt
-# Date: 2022-08-05, 22:17:16 GMT
-# © 2022 Unicode®, Inc.
+# PropList-15.1.0.txt
+# Date: 2023-08-01, 21:56:53 GMT
+# © 2023 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use, see https://www.unicode.org/terms_of_use.html
 #
...
--- libcpp/makeuname2c.cc.jj	2023-03-16 10:19:01.734373423 +0100
+++ libcpp/makeuname2c.cc	2023-11-13 13:42:08.912442830 +0100
@@ -69,7 +69,7 @@ struct entry { const char *name; unsigne
 static struct entry *entries;
 static unsigned long num_allocated, num_entries;
 
-/* Unicode 15 Table 4-8. */
+/* Unicode 15.1 Table 4-8. */
 struct generated {
   const char *prefix;
   /* max_high is a workaround for UnicodeData.txt inconsistencies
@@ -87,6 +87,7 @@ static struct generated generated_ranges
   { "CJK UNIFIED IDEOGRAPH-", 0x2b740, 0x2b81d, 0, 1, 0 },
   { "CJK UNIFIED IDEOGRAPH-", 0x2b820, 0x2cea1, 0, 1, 0 },
   { "CJK UNIFIED IDEOGRAPH-", 0x2ceb0, 0x2ebe0, 0, 1, 0 },
+  { "CJK UNIFIED IDEOGRAPH-", 0x2ebf0, 0x2ee5d, 0, 1, 0 },
   { "CJK UNIFIED IDEOGRAPH-", 0x30000, 0x3134a, 0, 1, 0 },
   { "CJK UNIFIED IDEOGRAPH-", 0x31350, 0x323af, 0, 1, 0 },
   { "TANGUT IDEOGRAPH-", 0x17000, 0x187f7, 0, 2, 0 },
@@ -669,7 +670,7 @@ write_copyright (void)
    .\n\
 \n\
 \n\
-   Copyright (C) 1991-2022 Unicode, Inc. All rights reserved.\n\
+   Copyright (C) 1991-2023 Unicode, Inc. All rights reserved.\n\
    Distributed under the Terms of Use in\n\
    http://www.unicode.org/copyright.html.\n\
 \n\
--- libcpp/makeucnid.cc.jj	2023-03-16 10:19:01.722373601 +0100
+++ libcpp/makeucnid.cc	2023-11-13 13:42:21.728263043 +0100
@@ -467,7 +467,7 @@ write_copyright (void)
    .\n\
 \n\
 \n\
-   Copyright (C) 1991-2022 Unicode, Inc. All rights reserved.\n\
+   Copyright (C) 1991-2023 Unicode, Inc. All rights reserved.\n\
    Distributed under the Terms of Use in\n\
    http://www.unicode.org/copyright.html.\n\
 \n\
--- libcpp/ucnid.h.jj	2023-03-16 10:19:01.735373409 +0100
+++ libcpp/ucnid.h	2023-11-13 13:42:50.819854928 +0100
@@ -16,7 +16,7 @@
    .
 
 
-   Copyright (C) 1991-2022 Unicode, Inc. All rights reserved.
+   Copyright (C) 1991-2023 Unicode, Inc. All rights reserved.
    Distributed under the Terms of Use in
    http://www.unicode.org/copyright.html.
 
@@ -1379,7 +1379,8 @@ static const struct ucnrange ucnranges[]
 { 0| 0| 0|C11| 0| 0| 0|CID|NFC| 0| 0, 0, 0x1ffe },
 { 0| 0| 0|C11| 0| 0| 0|CID|NFC|NKC| 0, 0, 0x1fff },
 { 0| 0| 0| 0| 0| 0| 0|CID| 0| 0| 0, 0, 0x200a },
-{ 0| 0| 0|C11| 0| 0| 0|CID|NFC|NKC| 0, 0, 0x200d },
+{ 0| 0| 0|C11| 0| 0| 0|CID|NFC|NKC| 0, 0, 0x200b },
+{ 0| 0| 0|C11| 0|CXX23|NXX23|CID|NFC|NKC| 0, 0, 0x200d },
 { 0| 0| 0| 0| 0| 0| 0|CID|NFC|NKC| 0, 0, 0x2029 },
 { 0| 0| 0|C11| 0| 0| 0|CID|NFC|NKC| 0, 0, 0x202e },
 { 0| 0| 0| 0| 0| 0| 0|CID|NFC| 0| 0, 0, 0x203e },
@@ -1625,7 +1626,7 @@ static const struct ucnrange ucnranges[]
 { C99| 0|CXX|C11| 0|CXX23| 0| 0|NFC|NKC| 0, 0, 0x30f4 },
 { C99| 0|CXX|C11| 0|CXX23| 0|CID|NFC|NKC| 0, 0, 0x30f6 },
 { 0| 0|CXX|C11| 0|CXX23| 0| 0|NFC|NKC| 0, 0, 0x30fa },
-{ C99| 0|CXX|C11| 0| 0| 0|CID|NFC|NKC| 0, 0, 0x30fb },
+{ C99| 0|CXX|C11| 0|CXX23|NXX23|CID|NFC|NKC| 0, 0, 0x30fb },
 { C99| 0|CXX|C11| 0|CXX23| 0|CID|NFC|NKC| 0, 0, 0x30fc },
 { 0| 0|CXX|C11| 0|CXX23| 0|CID|NFC|NKC| 0, 0, 0x30fd },
 { 0| 0|CXX|C11| 0|CXX23| 0| 0|NFC|NKC| 0, 0, 0x30fe },
@@ -1906,7 +1907,8 @@ static const struct ucnrange ucnranges[]
 { 0| 0| 0|C11| 0|CXX23|NXX23|CID|NFC| 0| 0, 0, 0xff3f },
 { 0| 0| 0|C11| 0| 0| 0|CID|NFC| 0| 0, 0, 0xff40 },
 { 0| 0|CXX|C11| 0|CXX23| 0|CID|NFC| 0| 0, 0, 0xff5a },
-{ 0| 0| 0|C11| 0| 0| 0|CID|NFC| 0| 0, 0, 0xff65 },
+{ 0| 0| 0|C11| 0| 0| 0|CID|NFC| 0| 0, 0, 0xff64 },
+{ 0| 0| 0|C11| 0|CXX23|NXX23|CID|NFC| 0| 0, 0, 0xff65 },
 { 0| 0|CXX|C11| 0|CXX23| 0|CID|NFC| 0| 0, 0, 0xff9d },
 { 0| 0|CXX|C11| 0|CXX23|NXX23|CID|NFC| 0| 0, 0, 0xff9f },
 { 0| 0|CXX|C11| 0|CXX23| 0|CID|NFC| 0| 0, 0, 0xffbe },
@@ -2786,6 +2788,8 @@ static const struct ucnrange ucnranges[]
 { 0| 0| 0|C11| 0|CXX23| 0|CID|NFC|NKC| 0, 0, 0x2cea1 },
 { 0| 0| 0|C11| 0| 0| 0|CID|NFC|NKC| 0, 0, 0x2ceaf },
 { 0| 0| 0|C11| 0|CXX23| 0|CID|NFC|NKC| 0, 0, 0x2ebe0 },
+{ 0| 0| 0|C11| 0| 0| 0|CID|NFC|NKC| 0, 0, 0x2ebef },
+{ 0| 0| 0|C11| 0|CXX23| 0|CID|NFC|NKC| 0, 0, 0x2ee5d },
 { 0| 0| 0|C11| 0| 0| 0|CID|NFC|NKC| 0, 0, 0x2f7ff },
 { 0| 0| 0|C11| 0|CXX23| 0| 0| 0| 0| 0, 0, 0x2fa1d },
 { 0| 0| 0|C11| 0| 0| 0|CID|NFC|NKC| 0, 0, 0x2fffd },
--- libcpp/uname2c.h.jj	2023-03-16 10:19:01.739373350 +0100
+++ libcpp/uname2c.h	2023-11-13 13:42:43.912951822 +0100
@@ -16,7 +16,7 @@
    .
 
 
-   Copyright (C) 1991-2022 Unicode, Inc. All rights reserved.
+   Copyright (C) 1991-2023 Unicode, Inc. All rights reserved.
    Distributed under the Terms of Use in
    http://www.unicode.org/copyright.html.
 
@@ -52,7 +52,7 @@
    use or other dealings in these Data Files or Software without prior
    written authorization of the copyright holder. */
 
-static const char uname2c_dict[59891] =
+static const char uname2c_dict[59919] =
 "DIVIDED BY HORIZONTAL BAR AND TOP HALF DIVIDED BY VERTICAL BARUIGHUR KIRGHIZ "
 "YEH WITH HAMZA ABOVE WITH ALEF MAKSURA LANTED EQUAL ABOVE GREATER-THAN ABOVE "
 "SLANTED EQUAL WITH EXCLAMATION MARK WITH LEFT RIGHT ARROW ABOVELANTED EQUAL A"
...
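The ucnid.h hunks above split runs of code points: each ucnranges[] row appears to carry a set of flags plus the last code point of its run, so giving U+200C..U+200D (newly XID_Continue in 15.1) the CXX23|NXX23 flags needs a new row ending at 0x200b. A minimal Python sketch of querying such a table with a binary search for the first run whose end is not below the code point; the numeric flag values, the excerpted rows, the hypothetical `flags_for` helper and the reading of NXX23 as "allowed in C++23 identifiers, but not at the start" are assumptions, and this is not libcpp's lookup code.

```python
import bisect

# Hypothetical flag bits standing in for the ucnid.h macros; only three
# of the table's columns are modelled here.
C11, CXX23, NXX23 = 1, 2, 4

# Rows are (flags, last code point of the run), sorted by end.  These are
# excerpts around U+200B..U+200D only; answers for code points outside
# the excerpt are meaningless.
OLD_TABLE = [(0, 0x200A), (C11, 0x200D), (0, 0x2029)]
NEW_TABLE = [(0, 0x200A), (C11, 0x200B), (C11 | CXX23 | NXX23, 0x200D),
             (0, 0x2029)]


def flags_for(table, cp):
    """Flags of the first run whose end is >= cp, or None past the table."""
    ends = [end for _, end in table]
    i = bisect.bisect_left(ends, cp)
    return table[i][0] if i < len(table) else None


for cp in (0x200B, 0x200C, 0x200D):
    old, new = flags_for(OLD_TABLE, cp), flags_for(NEW_TABLE, cp)
    print(f"U+{cp:04X}: CXX23 before={bool(old & CXX23)} "
          f"after={bool(new & CXX23)}")
```

Storing only run ends keeps the table compact and makes every lookup a single binary search, at the cost of inserting a row whenever one code point in the middle of a run changes its properties, which is exactly what these hunks do for 0x200b, 0x30fb and 0xff64.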
--- libcpp/generated_cpp_wcwidth.h.jj	2023-03-14 12:24:55.976722924 +0100
+++ libcpp/generated_cpp_wcwidth.h	2023-11-13 13:54:30.472042026 +0100
@@ -1,5 +1,5 @@
 /* Generated by contrib/unicode/gen_wcwidth.py, with the help of glibc's
-   utf8_gen.py, using version 15.0.0 of the Unicode standard. */
+   utf8_gen.py, using version 15.1.0 of the Unicode standard. */
 
 static const cppchar_t wcwidth_range_ends[] = {
   0x2ff, 0x36f, 0x482, 0x489, 0x590, 0x5bd, 0x5be, 0x5bf,
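The generated_cpp_wcwidth.h hunk only changes the version comment, but its wcwidth_range_ends[] table hints at the representation: sorted run end points (0x2ff, 0x36f, ...). Assuming each end is paired with a single width for its whole run (the companion data is not shown in this excerpt), the idea can be sketched in Python with a hypothetical toy width function and hypothetical helper names; this is not gen_wcwidth.py's code and the widths are not glibc's data.

```python
import bisect


def compress_widths(width_of, last_cp):
    """Collapse per-code-point widths into parallel (range_ends, widths) lists.

    Entry i covers the code points after range_ends[i - 1] up to and
    including range_ends[i]; all of them have width widths[i]."""
    ends, widths = [], []
    for cp in range(last_cp + 1):
        w = width_of(cp)
        if widths and widths[-1] == w:
            ends[-1] = cp  # same width as the current run: extend it
        else:
            ends.append(cp)  # width changed: start a new run
            widths.append(w)
    return ends, widths


def lookup(ends, widths, cp, default=1):
    """Width of cp via binary search; DEFAULT past the last range end."""
    i = bisect.bisect_left(ends, cp)
    return widths[i] if i < len(ends) else default


def toy_width(cp):
    """Hypothetical width function for the demo only."""
    if 0x300 <= cp <= 0x36F:    # combining diacritical marks
        return 0
    if 0x1100 <= cp <= 0x115F:  # Hangul Jamo initial consonants
        return 2
    return 1


ends, widths = compress_widths(toy_width, 0x1200)
print([hex(e) for e in ends], widths)
# ['0x2ff', '0x36f', '0x10ff', '0x115f', '0x1200'] [1, 0, 1, 2, 1]
```

A few thousand code points collapse to five runs here; the real tables benefit the same way because long stretches of Unicode share one width.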