From patchwork Thu Oct 27 23:16:43 2022
X-Patchwork-Submitter: Ben Boeckel
X-Patchwork-Id: 12002
From: Ben Boeckel
To: gcc-patches@gcc.gnu.org
Cc: gcc@gcc.gnu.org, brad.king@kitware.com, fortran@gcc.gnu.org, anlauf@gmx.de, Ben Boeckel, nathan@acm.org
Subject: [PATCH v2 2/3] libcpp: add a function to determine UTF-8 validity of a C string
Date: Thu, 27 Oct 2022 19:16:43 -0400
Message-Id: <20221027231645.67623-3-ben.boeckel@kitware.com>
In-Reply-To: <20221027231645.67623-1-ben.boeckel@kitware.com>
References: <20221027231645.67623-1-ben.boeckel@kitware.com>

This simplifies the interface for callers that check UTF-8 validity and
only need a simple "yes" or "no" answer.

Signed-off-by: Ben Boeckel
---
 libcpp/ChangeLog  |  6 ++++++
 libcpp/charset.cc | 18 ++++++++++++++++++
 libcpp/internal.h |  2 ++
 3 files changed, 26 insertions(+)

diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
index 4d707277531..4e2c7900ae2 100644
--- a/libcpp/ChangeLog
+++ b/libcpp/ChangeLog
@@ -1,3 +1,9 @@
+2022-10-27  Ben Boeckel
+
+	* include/charset.cc: Add `_cpp_valid_utf8_str` which determines
+	whether a C string is valid UTF-8 or not.
+	* include/internal.h: Add prototype for `_cpp_valid_utf8_str`.
+
 2022-10-27  Ben Boeckel
 
 	* include/charset.cc: Reject encodings of codepoints above 0x10FFFF.
diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index e9da6674b5f..0524ab6beba 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1864,6 +1864,24 @@ _cpp_valid_utf8 (cpp_reader *pfile,
   return true;
 }
 
+extern bool
+_cpp_valid_utf8_str (const char *name)
+{
+  const uchar* in = (const uchar*)name;
+  size_t len = strlen(name);
+  cppchar_t cp;
+
+  while (*in)
+    {
+      if (one_utf8_to_cppchar(&in, &len, &cp))
+        {
+          return false;
+        }
+    }
+
+  return true;
+}
+
 /* Subroutine of convert_hex and convert_oct.  N is the representation
    in the execution character set of a numeric escape; write it into the
    string buffer TBUF and update the end-of-string pointer therein.  WIDE
diff --git a/libcpp/internal.h b/libcpp/internal.h
index badfd1b40da..4f2dd4a2f5c 100644
--- a/libcpp/internal.h
+++ b/libcpp/internal.h
@@ -834,6 +834,8 @@ extern bool _cpp_valid_utf8 (cpp_reader *pfile,
 			     struct normalize_state *nst,
 			     cppchar_t *cp);
 
+extern bool _cpp_valid_utf8_str (const char *str);
+
 extern void _cpp_destroy_iconv (cpp_reader *);
 extern unsigned char *_cpp_convert_input (cpp_reader *, const char *,
 					  unsigned char *, size_t, size_t,
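
For readers without a libcpp tree at hand, the following is a minimal
stand-alone sketch of the kind of yes/no check that `_cpp_valid_utf8_str`
exposes. It does not call `one_utf8_to_cppchar` (which is internal to
charset.cc); the name `sketch_valid_utf8_str` and the decoding loop are
assumptions made for illustration only, and the sketch applies the usual
well-formedness rules (no overlong forms, no surrogates, nothing above
0x10FFFF) rather than reproducing libcpp's exact behaviour.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper for illustration; not part of libcpp.
// Returns true if the NUL-terminated string S is well-formed UTF-8.
static bool
sketch_valid_utf8_str (const char *s)
{
  const unsigned char *in = reinterpret_cast<const unsigned char *> (s);
  std::size_t len = std::strlen (s);

  while (len > 0)
    {
      unsigned char c = in[0];
      std::size_t n;      // total bytes in this sequence
      unsigned long cp;   // code point decoded so far
      unsigned long min;  // smallest code point allowed for this length

      if (c < 0x80)
        { n = 1; cp = c; min = 0; }
      else if ((c & 0xE0) == 0xC0)
        { n = 2; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0)
        { n = 3; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0)
        { n = 4; cp = c & 0x07; min = 0x10000; }
      else
        return false;  // stray continuation byte or invalid lead byte

      if (n > len)
        return false;  // sequence is cut off by the end of the string

      for (std::size_t i = 1; i < n; i++)
        {
          if ((in[i] & 0xC0) != 0x80)
            return false;  // expected a continuation byte (10xxxxxx)
          cp = (cp << 6) | (in[i] & 0x3F);
        }

      // Reject overlong encodings, surrogates, and values past U+10FFFF.
      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;

      in += n;
      len -= n;
    }

  return true;
}
```

As with the function added by the patch, the caller passes only a C string
and gets back a boolean; there is no reader state, normalization state, or
decoded code point to manage.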