From patchwork Tue Jun 6 20:50:22 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ben Boeckel
X-Patchwork-Id: 104108
To: gcc-patches@gcc.gnu.org
Cc: Ben Boeckel, jason@redhat.com, nathan@acm.org, fortran@gcc.gnu.org,
 gcc@gcc.gnu.org, brad.king@kitware.com
Subject: [PATCH v6 1/4] libcpp: reject codepoints above 0x10FFFF
Date: Tue, 6 Jun 2023 16:50:22 -0400
Message-Id: <20230606205025.3164738-2-ben.boeckel@kitware.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230606205025.3164738-1-ben.boeckel@kitware.com>
References: <20230606205025.3164738-1-ben.boeckel@kitware.com>
List-Id: Gcc-patches mailing list
From: Ben Boeckel
Reply-To: Ben Boeckel

Unicode does not support such values because they are unrepresentable in
UTF-16.

libcpp/

        * charset.cc: Reject encodings of codepoints above 0x10FFFF.
        UTF-16 does not support such codepoints and therefore all
        Unicode rejects such values.

Signed-off-by: Ben Boeckel
---
 libcpp/charset.cc | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index d7f323b2cd5..3b34d804cf1 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1886,6 +1886,13 @@ cpp_valid_utf8_p (const char *buffer, size_t num_bytes)
       int err = one_utf8_to_cppchar (&iter, &bytesleft, &cp);
       if (err)
         return false;
+
+      /* Additionally, Unicode declares that all codepoints above 0x10FFFF are
+         invalid because they cannot be represented in UTF-16.
+
+         Reject such values.  */
+      if (cp > 0x10FFFF)
+        return false;
     }
   /* No problems encountered.  */
   return true;
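
For reference, the reasoning in the commit message can be checked with a
small standalone sketch. It is not part of the patch and does not use any
libcpp API: the names max_codepoint, from_surrogates and in_unicode_range
are hypothetical and exist only for this illustration. It shows that the
largest UTF-16 surrogate pair decodes to exactly 0x10FFFF, so 0x10FFFF is
itself a valid codepoint and only values strictly above it have no UTF-16
form.

```cpp
#include <cstdint>

// Hypothetical names, for illustration only; not part of libcpp.
constexpr std::uint32_t max_codepoint = 0x10FFFF;

// Combine a UTF-16 surrogate pair into the codepoint it encodes.
// Assumes high is in [0xD800, 0xDBFF] and low is in [0xDC00, 0xDFFF].
constexpr std::uint32_t
from_surrogates (std::uint16_t high, std::uint16_t low)
{
  return 0x10000u
         + ((static_cast<std::uint32_t> (high) - 0xD800u) << 10)
         + (static_cast<std::uint32_t> (low) - 0xDC00u);
}

// True if cp lies within the Unicode codespace.  The bound is inclusive.
constexpr bool
in_unicode_range (std::uint32_t cp)
{
  return cp <= max_codepoint;
}

// The largest surrogate pair encodes exactly 0x10FFFF, so nothing above
// that value can be represented in UTF-16.
static_assert (from_surrogates (0xDBFF, 0xDFFF) == max_codepoint, "");
static_assert (in_unicode_range (0x10FFFF), "");
static_assert (!in_unicode_range (0x110000), "");
```

The static_assert lines are evaluated at compile time, so building the
file with any C++11 compiler is enough to confirm the boundary.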