From patchwork Tue Feb 28 09:50:42 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jonathan Wakely <jwakely@redhat.com>
X-Patchwork-Id: 62387
To: libstdc++@gcc.gnu.org,
	gcc-patches@gcc.gnu.org
Subject: [committed] libstdc++: Add likely/unlikely attributes to <codecvt>
 implementation
Date: Tue, 28 Feb 2023 09:50:42 +0000
Message-Id: <20230228095042.1192997-1-jwakely@redhat.com>
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-Patchwork-Original-From: Jonathan Wakely via Gcc-patches
 <gcc-patches@gcc.gnu.org>
From: Jonathan Wakely <jwakely@redhat.com>
Reply-To: Jonathan Wakely <jwakely@redhat.com>

Tested powerpc64le-linux. Pushed to trunk.

-- >8 --

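Mark the error-handling branches (incomplete or invalid sequences,
out-of-range code points, insufficient output space) as [[unlikely]],
and the ASCII and well-formed surrogate pair branches as [[likely]].
For the common case of converting valid text this improves performance
significantly.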

libstdc++-v3/ChangeLog:

	* src/c++11/codecvt.cc: Add [[likely]] and [[unlikely]]
	attributes.
---
 libstdc++-v3/src/c++11/codecvt.cc | 92 +++++++++++++++----------------
 1 file changed, 46 insertions(+), 46 deletions(-)
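
For readers unfamiliar with the attribute syntax, here is a minimal,
self-contained sketch of the pattern the patch applies to
read_utf8_code_point. It is not the libstdc++ code: the function name
decode_short_utf8 and the two sentinel constants are hypothetical, and
the sketch only handles 1-byte and 2-byte sequences. The attribute goes
after the condition, or directly after `else`, and marks the statement
that follows it.

```cpp
#include <cstddef>

// Hypothetical sentinels for this sketch (not the libstdc++ names).
constexpr char32_t sketch_incomplete = char32_t(-2);
constexpr char32_t sketch_invalid = char32_t(-1);

// Decode one code point of at most two bytes from [p, p + avail).
// Advances p only when a code point was decoded successfully.
inline char32_t
decode_short_utf8(const unsigned char*& p, std::size_t avail)
{
  if (avail == 0) [[unlikely]]
    return sketch_incomplete;
  char32_t c1 = p[0];
  if (c1 < 0x80) [[likely]] // ASCII fast path
    {
      ++p;
      return c1;
    }
  else if (c1 < 0xC2) [[unlikely]] // continuation or overlong 2-byte sequence
    return sketch_invalid;
  else if (c1 < 0xE0) // 2-byte sequence
    {
      if (avail < 2) [[unlikely]]
        return sketch_incomplete;
      char32_t c2 = p[1];
      if ((c2 & 0xC0) != 0x80) [[unlikely]]
        return sketch_invalid;
      p += 2;
      // 0x3080 == (0xC0 << 6) + 0x80 removes both marker prefixes at once.
      return (c1 << 6) + c2 - 0x3080;
    }
  else [[unlikely]] // longer sequences are left out of this sketch
    return sketch_invalid;
}
```

The attributes are hints only. They do not change what the function
returns, only how the compiler may lay out the branches.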

diff --git a/libstdc++-v3/src/c++11/codecvt.cc b/libstdc++-v3/src/c++11/codecvt.cc
index e333e795f48..02f05752de8 100644
--- a/libstdc++-v3/src/c++11/codecvt.cc
+++ b/libstdc++-v3/src/c++11/codecvt.cc
@@ -256,19 +256,19 @@ namespace
       return incomplete_mb_character;
     char32_t c1 = (unsigned char) from[0];
     // https://en.wikipedia.org/wiki/UTF-8#Sample_code
-    if (c1 < 0x80)
+    if (c1 < 0x80) [[likely]]
     {
       ++from;
       return c1;
     }
-    else if (c1 < 0xC2) // continuation or overlong 2-byte sequence
+    else if (c1 < 0xC2) [[unlikely]] // continuation or overlong 2-byte sequence
       return invalid_mb_sequence;
     else if (c1 < 0xE0) // 2-byte sequence
     {
-      if (avail < 2)
+      if (avail < 2) [[unlikely]]
 	return incomplete_mb_character;
       char32_t c2 = (unsigned char) from[1];
-      if ((c2 & 0xC0) != 0x80)
+      if ((c2 & 0xC0) != 0x80) [[unlikely]]
 	return invalid_mb_sequence;
       char32_t c = (c1 << 6) + c2 - 0x3080;
       if (c <= maxcode)
@@ -277,17 +277,17 @@ namespace
     }
     else if (c1 < 0xF0) // 3-byte sequence
     {
-      if (avail < 2)
+      if (avail < 2) [[unlikely]]
 	return incomplete_mb_character;
       char32_t c2 = (unsigned char) from[1];
-      if ((c2 & 0xC0) != 0x80)
+      if ((c2 & 0xC0) != 0x80) [[unlikely]]
 	return invalid_mb_sequence;
-      if (c1 == 0xE0 && c2 < 0xA0) // overlong
+      if (c1 == 0xE0 && c2 < 0xA0) [[unlikely]] // overlong
 	return invalid_mb_sequence;
-      if (avail < 3)
+      if (avail < 3) [[unlikely]]
 	return incomplete_mb_character;
       char32_t c3 = (unsigned char) from[2];
-      if ((c3 & 0xC0) != 0x80)
+      if ((c3 & 0xC0) != 0x80) [[unlikely]]
 	return invalid_mb_sequence;
       char32_t c = (c1 << 12) + (c2 << 6) + c3 - 0xE2080;
       if (c <= maxcode)
@@ -296,31 +296,31 @@ namespace
     }
     else if (c1 < 0xF5 && maxcode > 0xFFFF) // 4-byte sequence
     {
-      if (avail < 2)
+      if (avail < 2) [[unlikely]]
 	return incomplete_mb_character;
       char32_t c2 = (unsigned char) from[1];
-      if ((c2 & 0xC0) != 0x80)
+      if ((c2 & 0xC0) != 0x80) [[unlikely]]
 	return invalid_mb_sequence;
-      if (c1 == 0xF0 && c2 < 0x90) // overlong
+      if (c1 == 0xF0 && c2 < 0x90) [[unlikely]] // overlong
 	return invalid_mb_sequence;
-      if (c1 == 0xF4 && c2 >= 0x90) // > U+10FFFF
+      if (c1 == 0xF4 && c2 >= 0x90) [[unlikely]] // > U+10FFFF
 	return invalid_mb_sequence;
-      if (avail < 3)
+      if (avail < 3) [[unlikely]]
 	return incomplete_mb_character;
       char32_t c3 = (unsigned char) from[2];
-      if ((c3 & 0xC0) != 0x80)
+      if ((c3 & 0xC0) != 0x80) [[unlikely]]
 	return invalid_mb_sequence;
-      if (avail < 4)
+      if (avail < 4) [[unlikely]]
 	return incomplete_mb_character;
       char32_t c4 = (unsigned char) from[3];
-      if ((c4 & 0xC0) != 0x80)
+      if ((c4 & 0xC0) != 0x80) [[unlikely]]
 	return invalid_mb_sequence;
       char32_t c = (c1 << 18) + (c2 << 12) + (c3 << 6) + c4 - 0x3C82080;
       if (c <= maxcode)
 	from += 4;
       return c;
     }
-    else // > U+10FFFF
+    else [[unlikely]] // > U+10FFFF
       return invalid_mb_sequence;
   }
 
@@ -330,20 +330,20 @@ namespace
   {
     if (code_point < 0x80)
       {
-	if (to.size() < 1)
+	if (to.size() < 1) [[unlikely]]
 	  return false;
 	to = code_point;
       }
     else if (code_point <= 0x7FF)
       {
-	if (to.size() < 2)
+	if (to.size() < 2) [[unlikely]]
 	  return false;
 	to = (code_point >> 6) + 0xC0;
 	to = (code_point & 0x3F) + 0x80;
       }
     else if (code_point <= 0xFFFF)
       {
-	if (to.size() < 3)
+	if (to.size() < 3) [[unlikely]]
 	  return false;
 	to = (code_point >> 12) + 0xE0;
 	to = ((code_point >> 6) & 0x3F) + 0x80;
@@ -351,14 +351,14 @@ namespace
       }
     else if (code_point <= 0x10FFFF)
       {
-	if (to.size() < 4)
+	if (to.size() < 4) [[unlikely]]
 	  return false;
 	to = (code_point >> 18) + 0xF0;
 	to = ((code_point >> 12) & 0x3F) + 0x80;
 	to = ((code_point >> 6) & 0x3F) + 0x80;
 	to = (code_point & 0x3F) + 0x80;
       }
-    else
+    else [[unlikely]]
       return false;
     return true;
   }
@@ -403,16 +403,16 @@ namespace
 			  unsigned long maxcode, codecvt_mode mode)
     {
       const size_t avail = from.size();
-      if (avail == 0)
+      if (avail == 0) [[unlikely]]
 	return incomplete_mb_character;
       int inc = 1;
       char32_t c = adjust_byte_order(from[0], mode);
       if (is_high_surrogate(c))
 	{
-	  if (avail < 2)
+	  if (avail < 2) [[unlikely]]
 	    return incomplete_mb_character;
 	  const char16_t c2 = adjust_byte_order(from[1], mode);
-	  if (is_low_surrogate(c2))
+	  if (is_low_surrogate(c2)) [[likely]]
 	    {
 	      c = surrogate_pair_to_code_point(c, c2);
 	      inc = 2;
@@ -420,7 +420,7 @@ namespace
 	  else
 	    return invalid_mb_sequence;
 	}
-      else if (is_low_surrogate(c))
+      else if (is_low_surrogate(c)) [[unlikely]]
 	return invalid_mb_sequence;
       if (c <= maxcode)
 	from += inc;
@@ -464,9 +464,9 @@ namespace
     while (from.size() && to.size())
       {
 	const char32_t codepoint = read_utf8_code_point(from, maxcode);
-	if (codepoint == incomplete_mb_character)
+	if (codepoint == incomplete_mb_character) [[unlikely]]
 	  return codecvt_base::partial;
-	if (codepoint > maxcode)
+	if (codepoint > maxcode) [[unlikely]]
 	  return codecvt_base::error;
 	to = codepoint;
       }
@@ -479,14 +479,14 @@ namespace
   ucs4_out(range<const char32_t>& from, range<C>& to,
            unsigned long maxcode = max_code_point, codecvt_mode mode = {})
   {
-    if (!write_utf8_bom(to, mode))
+    if (!write_utf8_bom(to, mode)) [[unlikely]]
       return codecvt_base::partial;
     while (from.size())
       {
 	const char32_t c = from[0];
-	if (c > maxcode)
+	if (c > maxcode) [[unlikely]]
 	  return codecvt_base::error;
-	if (!write_utf8_code_point(to, c))
+	if (!write_utf8_code_point(to, c)) [[unlikely]]
 	  return codecvt_base::partial;
 	++from;
       }
@@ -502,9 +502,9 @@ namespace
     while (from.size() && to.size())
       {
 	const char32_t codepoint = read_utf16_code_point(from, maxcode, mode);
-	if (codepoint == incomplete_mb_character)
+	if (codepoint == incomplete_mb_character) [[unlikely]]
 	  return codecvt_base::partial;
-	if (codepoint > maxcode)
+	if (codepoint > maxcode) [[unlikely]]
 	  return codecvt_base::error;
 	to = codepoint;
       }
@@ -516,14 +516,14 @@ namespace
   ucs4_out(range<const char32_t>& from, range<char16_t, false>& to,
            unsigned long maxcode = max_code_point, codecvt_mode mode = {})
   {
-    if (!write_utf16_bom(to, mode))
+    if (!write_utf16_bom(to, mode)) [[unlikely]]
       return codecvt_base::partial;
     while (from.size())
       {
 	const char32_t c = from[0];
-	if (c > maxcode)
+	if (c > maxcode) [[unlikely]]
 	  return codecvt_base::error;
-	if (!write_utf16_code_point(to, c, mode))
+	if (!write_utf16_code_point(to, c, mode)) [[unlikely]]
 	  return codecvt_base::partial;
 	++from;
       }
@@ -544,11 +544,11 @@ namespace
       {
 	auto orig = from;
 	const char32_t codepoint = read_utf8_code_point(from, maxcode);
-	if (codepoint == incomplete_mb_character)
+	if (codepoint == incomplete_mb_character) [[unlikely]]
 	  return codecvt_base::partial;
 	if (codepoint > maxcode)
 	  return codecvt_base::error;
-	if (!write_utf16_code_point(to, codepoint, mode))
+	if (!write_utf16_code_point(to, codepoint, mode)) [[unlikely]]
 	  {
 	    from = orig; // rewind to previous position
 	    return codecvt_base::partial;
@@ -564,7 +564,7 @@ namespace
 	    unsigned long maxcode = max_code_point, codecvt_mode mode = {},
 	    surrogates s = surrogates::allowed)
   {
-    if (!write_utf8_bom(to, mode))
+    if (!write_utf8_bom(to, mode)) [[unlikely]]
       return codecvt_base::partial;
     while (from.size())
       {
@@ -572,14 +572,14 @@ namespace
 	int inc = 1;
 	if (is_high_surrogate(c))
 	  {
-	    if (s == surrogates::disallowed)
+	    if (s == surrogates::disallowed) [[unlikely]]
 	      return codecvt_base::error; // No surrogates in UCS-2
 
-	    if (from.size() < 2)
+	    if (from.size() < 2) [[unlikely]]
 	      return codecvt_base::partial; // stop converting at this point
 
 	    const char32_t c2 = from[1];
-	    if (is_low_surrogate(c2))
+	    if (is_low_surrogate(c2)) [[likely]]
 	      {
 		c = surrogate_pair_to_code_point(c, c2);
 		inc = 2;
@@ -587,11 +587,11 @@ namespace
 	    else
 	      return codecvt_base::error;
 	  }
-	else if (is_low_surrogate(c))
+	else if (is_low_surrogate(c)) [[unlikely]]
 	  return codecvt_base::error;
-	if (c > maxcode)
+	if (c > maxcode) [[unlikely]]
 	  return codecvt_base::error;
-	if (!write_utf8_code_point(to, c))
+	if (!write_utf8_code_point(to, c)) [[unlikely]]
 	  return codecvt_base::partial;
 	from += inc;
       }
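
The output side follows the same idea: the "not enough room" checks and
the final out-of-range branch are the ones marked [[unlikely]]. The
sketch below mirrors the arithmetic of write_utf8_code_point but is not
the libstdc++ code. The name encode_utf8_sketch, the raw pointer plus
capacity interface, and the "return 0 on failure" convention are
assumptions made for illustration. Like the code in the patch, it does
not reject surrogate code points.

```cpp
#include <cstddef>

// Hypothetical helper: write cp as UTF-8 into out[0..cap).
// Returns the number of bytes written, or 0 if cp is above U+10FFFF
// or the buffer is too small.
inline std::size_t
encode_utf8_sketch(char32_t cp, unsigned char* out, std::size_t cap)
{
  if (cp < 0x80)
    {
      if (cap < 1) [[unlikely]]
        return 0;
      out[0] = static_cast<unsigned char>(cp);
      return 1;
    }
  else if (cp <= 0x7FF)
    {
      if (cap < 2) [[unlikely]]
        return 0;
      out[0] = static_cast<unsigned char>((cp >> 6) + 0xC0);
      out[1] = static_cast<unsigned char>((cp & 0x3F) + 0x80);
      return 2;
    }
  else if (cp <= 0xFFFF)
    {
      if (cap < 3) [[unlikely]]
        return 0;
      out[0] = static_cast<unsigned char>((cp >> 12) + 0xE0);
      out[1] = static_cast<unsigned char>(((cp >> 6) & 0x3F) + 0x80);
      out[2] = static_cast<unsigned char>((cp & 0x3F) + 0x80);
      return 3;
    }
  else if (cp <= 0x10FFFF)
    {
      if (cap < 4) [[unlikely]]
        return 0;
      out[0] = static_cast<unsigned char>((cp >> 18) + 0xF0);
      out[1] = static_cast<unsigned char>(((cp >> 12) & 0x3F) + 0x80);
      out[2] = static_cast<unsigned char>(((cp >> 6) & 0x3F) + 0x80);
      out[3] = static_cast<unsigned char>((cp & 0x3F) + 0x80);
      return 4;
    }
  else [[unlikely]] // > U+10FFFF
    return 0;
}
```

For example, encoding U+20AC into a 4-byte buffer writes the three bytes
E2 82 AC, while the same call with a 2-byte buffer writes nothing and
returns 0.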