Message ID | 20221027231645.67623-2-ben.boeckel@kitware.com |
---|---|
State | Accepted |
Headers | To: gcc-patches@gcc.gnu.org; Subject: [PATCH v2 1/3] libcpp: reject codepoints above 0x10FFFF; Date: Thu, 27 Oct 2022 19:16:42 -0400; Message-Id: <20221027231645.67623-2-ben.boeckel@kitware.com>; In-Reply-To: <20221027231645.67623-1-ben.boeckel@kitware.com>; References: <20221027231645.67623-1-ben.boeckel@kitware.com>; From: Ben Boeckel via Gcc-patches <gcc-patches@gcc.gnu.org>; Reply-To: Ben Boeckel <ben.boeckel@kitware.com>; Cc: gcc@gcc.gnu.org, brad.king@kitware.com, fortran@gcc.gnu.org, anlauf@gmx.de, Ben Boeckel <ben.boeckel@kitware.com>, nathan@acm.org |
Series | RFC: P1689R5 support |
Checks
Context | Check | Description |
---|---|---|
snail/gcc-patch-check | success | Github commit url |
Commit Message
Ben Boeckel
Oct. 27, 2022, 11:16 p.m. UTC
Unicode does not support such values because they are unrepresentable in
UTF-16.
Signed-off-by: Ben Boeckel <ben.boeckel@kitware.com>
---
libcpp/ChangeLog | 6 ++++++
libcpp/charset.cc | 4 ++--
2 files changed, 8 insertions(+), 2 deletions(-)
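The UTF-16 limit named in the commit message can be checked with a little arithmetic. The following is only a sketch under stated assumptions: the helper names `utf16_pair_to_codepoint` and `utf16_max_codepoint` are hypothetical and do not exist in libcpp. It shows that the largest surrogate pair, 0xDBFF followed by 0xDFFF, decodes to exactly 0x10FFFF, so no higher code point has a UTF-16 form.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of libcpp: combine a UTF-16 surrogate
   pair into a code point.  HI must be a high surrogate (0xD800-0xDBFF)
   and LO a low surrogate (0xDC00-0xDFFF).  */
static uint32_t
utf16_pair_to_codepoint (uint16_t hi, uint16_t lo)
{
  assert (hi >= 0xD800 && hi <= 0xDBFF);
  assert (lo >= 0xDC00 && lo <= 0xDFFF);
  /* Each surrogate carries 10 payload bits; the pair addresses the
     range that starts at 0x10000.  */
  return 0x10000u + (((uint32_t) (hi - 0xD800) << 10)
		     | (uint32_t) (lo - 0xDC00));
}

/* Hypothetical helper: the largest code point any surrogate pair can
   express, obtained by feeding in the maximum of both ranges.  */
static uint32_t
utf16_max_codepoint (void)
{
  return utf16_pair_to_codepoint (0xDBFF, 0xDFFF);
}
```

Calling `utf16_max_codepoint ()` yields 0x10FFFF, and the smallest pair (0xD800, 0xDC00) yields 0x10000, which is why the surrogate range 0xD800-0xDFFF itself is also excluded from valid code points.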
Comments
On Thu, 2022-10-27 at 19:16 -0400, Ben Boeckel wrote:
> Unicode does not support such values because they are unrepresentable
> in UTF-16.

Wikipedia pointed me to RFC 3629, which was when UTF-8 introduced this restriction, whereas libcpp was implementing the higher upper limit from the earlier, superseded RFC 2279.

The patch looks good to me, assuming it bootstraps and passes usual regression testing, but...

>
> Signed-off-by: Ben Boeckel <ben.boeckel@kitware.com>
> ---
>  libcpp/ChangeLog  | 6 ++++++
>  libcpp/charset.cc | 4 ++--
>  2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
> index 18d5bcceaf0..4d707277531 100644
> --- a/libcpp/ChangeLog
> +++ b/libcpp/ChangeLog
> @@ -1,3 +1,9 @@
> +2022-10-27  Ben Boeckel  <ben.boeckel@kitware.com>
> +
> +	* include/charset.cc: Reject encodings of codepoints above 0x10FFFF.
> +	UTF-16 does not support such codepoints and therefore all Unicode
> +	rejects such values.
> +
>  2022-10-19  Lewis Hyatt  <lhyatt@gmail.com>

...AIUI we now put ChangeLog entries in the blurb part of the patch, so that server-side git scripts add them to the actual ChangeLog file.

Does the patch pass:
  ./contrib/gcc-changelog/git_check_commit.py
?

Thanks
Dave

>
>  	* include/cpplib.h (struct cpp_string): Use new "string_length" GTY.
> diff --git a/libcpp/charset.cc b/libcpp/charset.cc
> index 12a398e7527..e9da6674b5f 100644
> --- a/libcpp/charset.cc
> +++ b/libcpp/charset.cc
> @@ -216,7 +216,7 @@ one_utf8_to_cppchar (const uchar **inbufp, size_t *inbytesleftp,
>    if (c <= 0x3FFFFFF && nbytes > 5) return EILSEQ;
>
>    /* Make sure the character is valid.  */
> -  if (c > 0x7FFFFFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
> +  if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
>
>    *cp = c;
>    *inbufp = inbuf;
> @@ -320,7 +320,7 @@ one_utf32_to_utf8 (iconv_t bigend, const uchar **inbufp, size_t *inbytesleftp,
>    s += inbuf[bigend ? 2 : 1] << 8;
>    s += inbuf[bigend ? 3 : 0];
>
> -  if (s >= 0x7FFFFFFF || (s >= 0xD800 && s <= 0xDFFF))
> +  if (s > 0x10FFFF || (s >= 0xD800 && s <= 0xDFFF))
>      return EILSEQ;
>
>    rval = one_cppchar_to_utf8 (s, outbufp, outbytesleftp);
On 10/27/22 13:16, Ben Boeckel wrote:
> Unicode does not support such values because they are unrepresentable in
> UTF-16.
>
> Signed-off-by: Ben Boeckel <ben.boeckel@kitware.com>
> ---
>  libcpp/ChangeLog  | 6 ++++++
>  libcpp/charset.cc | 4 ++--
>  2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
> index 18d5bcceaf0..4d707277531 100644
> --- a/libcpp/ChangeLog
> +++ b/libcpp/ChangeLog
> @@ -1,3 +1,9 @@
> +2022-10-27  Ben Boeckel  <ben.boeckel@kitware.com>
> +
> +	* include/charset.cc: Reject encodings of codepoints above 0x10FFFF.
> +	UTF-16 does not support such codepoints and therefore all Unicode
> +	rejects such values.
> +
>  2022-10-19  Lewis Hyatt  <lhyatt@gmail.com>
>
>  	* include/cpplib.h (struct cpp_string): Use new "string_length" GTY.
> diff --git a/libcpp/charset.cc b/libcpp/charset.cc
> index 12a398e7527..e9da6674b5f 100644
> --- a/libcpp/charset.cc
> +++ b/libcpp/charset.cc
> @@ -216,7 +216,7 @@ one_utf8_to_cppchar (const uchar **inbufp, size_t *inbytesleftp,
>    if (c <= 0x3FFFFFF && nbytes > 5) return EILSEQ;
>
>    /* Make sure the character is valid.  */
> -  if (c > 0x7FFFFFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
> +  if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;

Please also adjust the comment before the function that talks about the 0x7FFFFFFF maximum.

>
>    *cp = c;
>    *inbufp = inbuf;
> @@ -320,7 +320,7 @@ one_utf32_to_utf8 (iconv_t bigend, const uchar **inbufp, size_t *inbytesleftp,
>    s += inbuf[bigend ? 2 : 1] << 8;
>    s += inbuf[bigend ? 3 : 0];
>
> -  if (s >= 0x7FFFFFFF || (s >= 0xD800 && s <= 0xDFFF))
> +  if (s > 0x10FFFF || (s >= 0xD800 && s <= 0xDFFF))
>      return EILSEQ;
>
>    rval = one_cppchar_to_utf8 (s, outbufp, outbytesleftp);
diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
index 18d5bcceaf0..4d707277531 100644
--- a/libcpp/ChangeLog
+++ b/libcpp/ChangeLog
@@ -1,3 +1,9 @@
+2022-10-27  Ben Boeckel  <ben.boeckel@kitware.com>
+
+	* include/charset.cc: Reject encodings of codepoints above 0x10FFFF.
+	UTF-16 does not support such codepoints and therefore all Unicode
+	rejects such values.
+
 2022-10-19  Lewis Hyatt  <lhyatt@gmail.com>

 	* include/cpplib.h (struct cpp_string): Use new "string_length" GTY.
diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index 12a398e7527..e9da6674b5f 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -216,7 +216,7 @@ one_utf8_to_cppchar (const uchar **inbufp, size_t *inbytesleftp,
   if (c <= 0x3FFFFFF && nbytes > 5) return EILSEQ;

   /* Make sure the character is valid.  */
-  if (c > 0x7FFFFFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
+  if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;

   *cp = c;
   *inbufp = inbuf;
@@ -320,7 +320,7 @@ one_utf32_to_utf8 (iconv_t bigend, const uchar **inbufp, size_t *inbytesleftp,
   s += inbuf[bigend ? 2 : 1] << 8;
   s += inbuf[bigend ? 3 : 0];

-  if (s >= 0x7FFFFFFF || (s >= 0xD800 && s <= 0xDFFF))
+  if (s > 0x10FFFF || (s >= 0xD800 && s <= 0xDFFF))
     return EILSEQ;

   rval = one_cppchar_to_utf8 (s, outbufp, outbytesleftp);
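
For readers who want to try the new bound outside of libcpp, here is a minimal standalone sketch. It assumes nothing about libcpp internals: `is_unicode_scalar` and `utf8_encode` are hypothetical names introduced for illustration, and only the range condition is taken from the patched lines above. It also illustrates the RFC 3629 consequence mentioned in the review: once code points stop at 0x10FFFF, a UTF-8 sequence never needs more than four bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the patched condition: a value is
   acceptable if it is at most 0x10FFFF and not a surrogate.  */
static bool
is_unicode_scalar (uint32_t c)
{
  return !(c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF));
}

/* Hypothetical helper: encode C as UTF-8 into OUT and return the
   number of bytes written (1 to 4).  C must pass is_unicode_scalar.  */
static int
utf8_encode (uint32_t c, unsigned char out[4])
{
  assert (is_unicode_scalar (c));
  if (c < 0x80)
    {
      out[0] = (unsigned char) c;
      return 1;
    }
  if (c < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (c >> 6));
      out[1] = (unsigned char) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      out[0] = (unsigned char) (0xE0 | (c >> 12));
      out[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (c & 0x3F));
      return 3;
    }
  /* 0x10000..0x10FFFF: the longest form permitted by RFC 3629.  */
  out[0] = (unsigned char) (0xF0 | (c >> 18));
  out[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
  out[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
  out[3] = (unsigned char) (0x80 | (c & 0x3F));
  return 4;
}
```

With this sketch, 0x10FFFF encodes as the four bytes F4 8F BF BF, while 0x110000 and any value in 0xD800-0xDFFF are rejected by `is_unicode_scalar` before encoding is attempted.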