From patchwork Wed Feb 8 23:38:21 2023
X-Patchwork-Submitter: Dimitrij Mijoski
X-Patchwork-Id: 54637
From: Dimitrij Mijoski
Reply-To: Dimitrij Mijoski
Subject: [PATCH] libstdc++: testsuite: Add char8_t to codecvt_unicode
To: gcc-patches@gcc.gnu.org, libstdc++@gcc.gnu.org
Date: Thu, 09 Feb 2023 00:38:21 +0100
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

libstdc++-v3/ChangeLog:

	* testsuite/22_locale/codecvt/codecvt_unicode.cc: Rename
	functions.
	* testsuite/22_locale/codecvt/codecvt_unicode.h: Make more
	generic so it accepts char8_t.
	* testsuite/22_locale/codecvt/codecvt_unicode_wchar_t.cc:
	Rename functions.
	* testsuite/22_locale/codecvt/codecvt_unicode_char8_t.cc: New
	test.
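
Note for reviewers: the core of the change to codecvt_unicode.h is that each
test helper now takes both the internal and the external character type as
template parameters, instead of hard-coding char as the external type. The
sketch below illustrates that pattern in isolation. It is not code from the
patch: convert_in and demo_utf8_to_utf32 are hypothetical names chosen for
illustration, and the example assumes the classic locale's
codecvt<char32_t, char, mbstate_t> facet (deprecated since C++20, so a
compiler may warn).

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>

// Hypothetical helper, not part of the patch: calls cvt.in() once and
// reports how many external units were consumed and how many internal
// units were produced.  Both character types are deduced from the facet.
template <typename InternT, typename ExternT>
std::codecvt_base::result
convert_in (const std::codecvt<InternT, ExternT, std::mbstate_t> &cvt,
            const ExternT *in, std::size_t in_size,
            InternT *out, std::size_t out_size,
            std::size_t &consumed, std::size_t &produced)
{
  auto state = std::mbstate_t{};
  const ExternT *in_next = nullptr;
  InternT *out_next = nullptr;
  auto res = cvt.in (state, in, in + in_size, in_next,
                     out, out + out_size, out_next);
  consumed = static_cast<std::size_t> (in_next - in);
  produced = static_cast<std::size_t> (out_next - out);
  return res;
}

// Hypothetical demo: UTF-8 "b" followed by U+0448 converted to UTF-32
// through the classic-locale facet.
bool
demo_utf8_to_utf32 ()
{
  using cvt_c32 = std::codecvt<char32_t, char, std::mbstate_t>;
  auto &cvt = std::use_facet<cvt_c32> (std::locale::classic ());

  // Bytes are spelled out so the example does not depend on the
  // source file encoding.
  const char in[] = {'b', static_cast<char> (0xD1), static_cast<char> (0x88)};
  char32_t out[2] = {};
  std::size_t consumed = 0, produced = 0;
  auto res = convert_in (cvt, in, 3, out, 2, consumed, produced);

  return res == std::codecvt_base::ok && consumed == 3 && produced == 2
         && out[0] == U'b' && out[1] == char32_t (0x0448);
}
```

With the external type as a template parameter, the same helper could in
principle be instantiated for a codecvt<char32_t, char8_t, mbstate_t> facet
by passing a char8_t input buffer, which is the kind of reuse the generalized
header is aiming for.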
--- .../22_locale/codecvt/codecvt_unicode.cc | 16 +- .../22_locale/codecvt/codecvt_unicode.h | 807 +++++++++--------- .../codecvt/codecvt_unicode_char8_t.cc | 53 ++ .../codecvt/codecvt_unicode_wchar_t.cc | 6 +- 4 files changed, 484 insertions(+), 398 deletions(-) create mode 100644 libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode_char8_t.cc diff --git a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.cc b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.cc index df1a2b4cc..eafb53a8c 100644 --- a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.cc +++ b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.cc @@ -27,38 +27,38 @@ void test_utf8_utf32_codecvts () { using codecvt_c32 = codecvt; - auto loc_c = locale::classic (); + auto &loc_c = locale::classic (); VERIFY (has_facet (loc_c)); auto &cvt = use_facet (loc_c); - test_utf8_utf32_codecvts (cvt); + test_utf8_utf32_cvt (cvt); codecvt_utf8 cvt2; - test_utf8_utf32_codecvts (cvt2); + test_utf8_utf32_cvt (cvt2); } void test_utf8_utf16_codecvts () { using codecvt_c16 = codecvt; - auto loc_c = locale::classic (); + auto &loc_c = locale::classic (); VERIFY (has_facet (loc_c)); auto &cvt = use_facet (loc_c); - test_utf8_utf16_cvts (cvt); + test_utf8_utf16_cvt (cvt); codecvt_utf8_utf16 cvt2; - test_utf8_utf16_cvts (cvt2); + test_utf8_utf16_cvt (cvt2); codecvt_utf8_utf16 cvt3; - test_utf8_utf16_cvts (cvt3); + test_utf8_utf16_cvt (cvt3); } void test_utf8_ucs2_codecvts () { codecvt_utf8 cvt; - test_utf8_ucs2_cvts (cvt); + test_utf8_ucs2_cvt (cvt); } int diff --git a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.h b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.h index fbdc7a35b..690c07215 100644 --- a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.h +++ b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.h @@ -42,33 +42,33 @@ auto constexpr array_size (const T (&)[N]) -> size_t return N; } -template +template void -utf8_to_utf32_in_ok (const 
std::codecvt &cvt) +utf8_to_utf32_in_ok (const std::codecvt &cvt) { using namespace std; // UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP - const char in[] = "bш\uAAAA\U0010AAAA"; - const char32_t exp_literal[] = U"bш\uAAAA\U0010AAAA"; - CharT exp[array_size (exp_literal)] = {}; - std::copy (begin (exp_literal), end (exp_literal), begin (exp)); - - static_assert (array_size (in) == 11, ""); - static_assert (array_size (exp_literal) == 5, ""); - static_assert (array_size (exp) == 5, ""); - VERIFY (char_traits::length (in) == 10); - VERIFY (char_traits::length (exp_literal) == 4); - VERIFY (char_traits::length (exp) == 4); + const unsigned char input[] = "bш\uAAAA\U0010AAAA"; + const char32_t expected[] = U"bш\uAAAA\U0010AAAA"; + static_assert (array_size (input) == 11, ""); + static_assert (array_size (expected) == 5, ""); + + ExternT in[array_size (input)]; + InternT exp[array_size (expected)]; + copy (begin (input), end (input), begin (in)); + copy (begin (expected), end (expected), begin (exp)); + VERIFY (char_traits::length (in) == 10); + VERIFY (char_traits::length (exp) == 4); test_offsets_ok offsets[] = {{0, 0}, {1, 1}, {3, 2}, {6, 3}, {10, 4}}; for (auto t : offsets) { - CharT out[array_size (exp) - 1] = {}; + InternT out[array_size (exp) - 1] = {}; VERIFY (t.in_size <= array_size (in)); VERIFY (t.out_size <= array_size (out)); auto state = mbstate_t{}; - auto in_next = (const char *) nullptr; - auto out_next = (CharT *) nullptr; + auto in_next = (const ExternT *) nullptr; + auto out_next = (InternT *) nullptr; auto res = codecvt_base::result (); res = cvt.in (state, in, in + t.in_size, in_next, out, out + t.out_size, @@ -76,19 +76,19 @@ utf8_to_utf32_in_ok (const std::codecvt &cvt) VERIFY (res == cvt.ok); VERIFY (in_next == in + t.in_size); VERIFY (out_next == out + t.out_size); - VERIFY (char_traits::compare (out, exp, t.out_size) == 0); + VERIFY (char_traits::compare (out, exp, t.out_size) == 0); if (t.out_size < array_size (out)) VERIFY 
(out[t.out_size] == 0); } for (auto t : offsets) { - CharT out[array_size (exp)] = {}; + InternT out[array_size (exp)] = {}; VERIFY (t.in_size <= array_size (in)); VERIFY (t.out_size <= array_size (out)); auto state = mbstate_t{}; - auto in_next = (const char *) nullptr; - auto out_next = (CharT *) nullptr; + auto in_next = (const ExternT *) nullptr; + auto out_next = (InternT *) nullptr; auto res = codecvt_base::result (); res @@ -96,29 +96,29 @@ utf8_to_utf32_in_ok (const std::codecvt &cvt) VERIFY (res == cvt.ok); VERIFY (in_next == in + t.in_size); VERIFY (out_next == out + t.out_size); - VERIFY (char_traits::compare (out, exp, t.out_size) == 0); + VERIFY (char_traits::compare (out, exp, t.out_size) == 0); if (t.out_size < array_size (out)) VERIFY (out[t.out_size] == 0); } } -template +template void -utf8_to_utf32_in_partial (const std::codecvt &cvt) +utf8_to_utf32_in_partial (const std::codecvt &cvt) { using namespace std; // UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP - const char in[] = "bш\uAAAA\U0010AAAA"; - const char32_t exp_literal[] = U"bш\uAAAA\U0010AAAA"; - CharT exp[array_size (exp_literal)] = {}; - std::copy (begin (exp_literal), end (exp_literal), begin (exp)); - - static_assert (array_size (in) == 11, ""); - static_assert (array_size (exp_literal) == 5, ""); - static_assert (array_size (exp) == 5, ""); - VERIFY (char_traits::length (in) == 10); - VERIFY (char_traits::length (exp_literal) == 4); - VERIFY (char_traits::length (exp) == 4); + const unsigned char input[] = "bш\uAAAA\U0010AAAA"; + const char32_t expected[] = U"bш\uAAAA\U0010AAAA"; + static_assert (array_size (input) == 11, ""); + static_assert (array_size (expected) == 5, ""); + + ExternT in[array_size (input)]; + InternT exp[array_size (expected)]; + copy (begin (input), end (input), begin (in)); + copy (begin (expected), end (expected), begin (exp)); + VERIFY (char_traits::length (in) == 10); + VERIFY (char_traits::length (exp) == 4); test_offsets_partial offsets[] = 
{ {1, 0, 0, 0}, // no space for first CP @@ -144,14 +144,14 @@ utf8_to_utf32_in_partial (const std::codecvt &cvt) for (auto t : offsets) { - CharT out[array_size (exp) - 1] = {}; + InternT out[array_size (exp) - 1] = {}; VERIFY (t.in_size <= array_size (in)); VERIFY (t.out_size <= array_size (out)); VERIFY (t.expected_in_next <= t.in_size); VERIFY (t.expected_out_next <= t.out_size); auto state = mbstate_t{}; - auto in_next = (const char *) nullptr; - auto out_next = (CharT *) nullptr; + auto in_next = (const ExternT *) nullptr; + auto out_next = (InternT *) nullptr; auto res = codecvt_base::result (); res = cvt.in (state, in, in + t.in_size, in_next, out, out + t.out_size, @@ -159,37 +159,38 @@ utf8_to_utf32_in_partial (const std::codecvt &cvt) VERIFY (res == cvt.partial); VERIFY (in_next == in + t.expected_in_next); VERIFY (out_next == out + t.expected_out_next); - VERIFY (char_traits::compare (out, exp, t.expected_out_next) == 0); + VERIFY (char_traits::compare (out, exp, t.expected_out_next) + == 0); if (t.expected_out_next < array_size (out)) VERIFY (out[t.expected_out_next] == 0); } } -template +template void -utf8_to_utf32_in_error (const std::codecvt &cvt) +utf8_to_utf32_in_error (const std::codecvt &cvt) { using namespace std; // UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP - const char valid_in[] = "bш\uAAAA\U0010AAAA"; - const char32_t exp_literal[] = U"bш\uAAAA\U0010AAAA"; - CharT exp[array_size (exp_literal)] = {}; - std::copy (begin (exp_literal), end (exp_literal), begin (exp)); + const unsigned char input[] = "bш\uAAAA\U0010AAAA"; + const char32_t expected[] = U"bш\uAAAA\U0010AAAA"; + static_assert (array_size (input) == 11, ""); + static_assert (array_size (expected) == 5, ""); - static_assert (array_size (valid_in) == 11, ""); - static_assert (array_size (exp_literal) == 5, ""); - static_assert (array_size (exp) == 5, ""); - VERIFY (char_traits::length (valid_in) == 10); - VERIFY (char_traits::length (exp_literal) == 4); - VERIFY 
(char_traits::length (exp) == 4); + ExternT in[array_size (input)]; + InternT exp[array_size (expected)]; + copy (begin (input), end (input), begin (in)); + copy (begin (expected), end (expected), begin (exp)); + VERIFY (char_traits::length (in) == 10); + VERIFY (char_traits::length (exp) == 4); - test_offsets_error offsets[] = { + test_offsets_error offsets[] = { // replace leading byte with invalid byte - {1, 4, 0, 0, '\xFF', 0}, - {3, 4, 1, 1, '\xFF', 1}, - {6, 4, 3, 2, '\xFF', 3}, - {10, 4, 6, 3, '\xFF', 6}, + {1, 4, 0, 0, 0xFF, 0}, + {3, 4, 1, 1, 0xFF, 1}, + {6, 4, 3, 2, 0xFF, 3}, + {10, 4, 6, 3, 0xFF, 6}, // replace first trailing byte with ASCII byte {3, 4, 1, 1, 'z', 2}, @@ -197,21 +198,21 @@ utf8_to_utf32_in_error (const std::codecvt &cvt) {10, 4, 6, 3, 'z', 7}, // replace first trailing byte with invalid byte - {3, 4, 1, 1, '\xFF', 2}, - {6, 4, 3, 2, '\xFF', 4}, - {10, 4, 6, 3, '\xFF', 7}, + {3, 4, 1, 1, 0xFF, 2}, + {6, 4, 3, 2, 0xFF, 4}, + {10, 4, 6, 3, 0xFF, 7}, // replace second trailing byte with ASCII byte {6, 4, 3, 2, 'z', 5}, {10, 4, 6, 3, 'z', 8}, // replace second trailing byte with invalid byte - {6, 4, 3, 2, '\xFF', 5}, - {10, 4, 6, 3, '\xFF', 8}, + {6, 4, 3, 2, 0xFF, 5}, + {10, 4, 6, 3, 0xFF, 8}, // replace third trailing byte {10, 4, 6, 3, 'z', 9}, - {10, 4, 6, 3, '\xFF', 9}, + {10, 4, 6, 3, 0xFF, 9}, // replace first trailing byte with ASCII byte, also incomplete at end {5, 4, 3, 2, 'z', 4}, @@ -219,30 +220,29 @@ utf8_to_utf32_in_error (const std::codecvt &cvt) {9, 4, 6, 3, 'z', 7}, // replace first trailing byte with invalid byte, also incomplete at end - {5, 4, 3, 2, '\xFF', 4}, - {8, 4, 6, 3, '\xFF', 7}, - {9, 4, 6, 3, '\xFF', 7}, + {5, 4, 3, 2, 0xFF, 4}, + {8, 4, 6, 3, 0xFF, 7}, + {9, 4, 6, 3, 0xFF, 7}, // replace second trailing byte with ASCII byte, also incomplete at end {9, 4, 6, 3, 'z', 8}, // replace second trailing byte with invalid byte, also incomplete at end - {9, 4, 6, 3, '\xFF', 8}, + {9, 4, 6, 3, 0xFF, 8}, }; for (auto t : 
offsets) { - char in[array_size (valid_in)] = {}; - CharT out[array_size (exp) - 1] = {}; + InternT out[array_size (exp) - 1] = {}; VERIFY (t.in_size <= array_size (in)); VERIFY (t.out_size <= array_size (out)); VERIFY (t.expected_in_next <= t.in_size); VERIFY (t.expected_out_next <= t.out_size); - char_traits::copy (in, valid_in, array_size (valid_in)); + auto old_char = in[t.replace_pos]; in[t.replace_pos] = t.replace_char; auto state = mbstate_t{}; - auto in_next = (const char *) nullptr; - auto out_next = (CharT *) nullptr; + auto in_next = (const ExternT *) nullptr; + auto out_next = (InternT *) nullptr; auto res = codecvt_base::result (); res = cvt.in (state, in, in + t.in_size, in_next, out, out + t.out_size, @@ -250,48 +250,51 @@ utf8_to_utf32_in_error (const std::codecvt &cvt) VERIFY (res == cvt.error); VERIFY (in_next == in + t.expected_in_next); VERIFY (out_next == out + t.expected_out_next); - VERIFY (char_traits::compare (out, exp, t.expected_out_next) == 0); + VERIFY (char_traits::compare (out, exp, t.expected_out_next) + == 0); if (t.expected_out_next < array_size (out)) VERIFY (out[t.expected_out_next] == 0); + + in[t.replace_pos] = old_char; } } -template +template void -utf8_to_utf32_in (const std::codecvt &cvt) +utf8_to_utf32_in (const std::codecvt &cvt) { utf8_to_utf32_in_ok (cvt); utf8_to_utf32_in_partial (cvt); utf8_to_utf32_in_error (cvt); } -template +template void -utf32_to_utf8_out_ok (const std::codecvt &cvt) +utf32_to_utf8_out_ok (const std::codecvt &cvt) { using namespace std; // UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP - const char32_t in_literal[] = U"bш\uAAAA\U0010AAAA"; - const char exp[] = "bш\uAAAA\U0010AAAA"; - CharT in[array_size (in_literal)] = {}; - copy (begin (in_literal), end (in_literal), begin (in)); - - static_assert (array_size (in_literal) == 5, ""); - static_assert (array_size (in) == 5, ""); - static_assert (array_size (exp) == 11, ""); - VERIFY (char_traits::length (in_literal) == 4); - VERIFY 
(char_traits::length (in) == 4); - VERIFY (char_traits::length (exp) == 10); + const char32_t input[] = U"bш\uAAAA\U0010AAAA"; + const unsigned char expected[] = "bш\uAAAA\U0010AAAA"; + static_assert (array_size (input) == 5, ""); + static_assert (array_size (expected) == 11, ""); + + InternT in[array_size (input)]; + ExternT exp[array_size (expected)]; + copy (begin (input), end (input), begin (in)); + copy (begin (expected), end (expected), begin (exp)); + VERIFY (char_traits::length (in) == 4); + VERIFY (char_traits::length (exp) == 10); const test_offsets_ok offsets[] = {{0, 0}, {1, 1}, {2, 3}, {3, 6}, {4, 10}}; for (auto t : offsets) { - char out[array_size (exp) - 1] = {}; + ExternT out[array_size (exp) - 1] = {}; VERIFY (t.in_size <= array_size (in)); VERIFY (t.out_size <= array_size (out)); auto state = mbstate_t{}; - auto in_next = (const CharT *) nullptr; - auto out_next = (char *) nullptr; + auto in_next = (const InternT *) nullptr; + auto out_next = (ExternT *) nullptr; auto res = codecvt_base::result (); res = cvt.out (state, in, in + t.in_size, in_next, out, out + t.out_size, @@ -299,29 +302,29 @@ utf32_to_utf8_out_ok (const std::codecvt &cvt) VERIFY (res == cvt.ok); VERIFY (in_next == in + t.in_size); VERIFY (out_next == out + t.out_size); - VERIFY (char_traits::compare (out, exp, t.out_size) == 0); + VERIFY (char_traits::compare (out, exp, t.out_size) == 0); if (t.out_size < array_size (out)) VERIFY (out[t.out_size] == 0); } } -template +template void -utf32_to_utf8_out_partial (const std::codecvt &cvt) +utf32_to_utf8_out_partial (const std::codecvt &cvt) { using namespace std; // UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP - const char32_t in_literal[] = U"bш\uAAAA\U0010AAAA"; - const char exp[] = "bш\uAAAA\U0010AAAA"; - CharT in[array_size (in_literal)] = {}; - copy (begin (in_literal), end (in_literal), begin (in)); - - static_assert (array_size (in_literal) == 5, ""); - static_assert (array_size (in) == 5, ""); - static_assert 
(array_size (exp) == 11, ""); - VERIFY (char_traits::length (in_literal) == 4); - VERIFY (char_traits::length (in) == 4); - VERIFY (char_traits::length (exp) == 10); + const char32_t input[] = U"bш\uAAAA\U0010AAAA"; + const unsigned char expected[] = "bш\uAAAA\U0010AAAA"; + static_assert (array_size (input) == 5, ""); + static_assert (array_size (expected) == 11, ""); + + InternT in[array_size (input)]; + ExternT exp[array_size (expected)]; + copy (begin (input), end (input), begin (in)); + copy (begin (expected), end (expected), begin (exp)); + VERIFY (char_traits::length (in) == 4); + VERIFY (char_traits::length (exp) == 10); const test_offsets_partial offsets[] = { {1, 0, 0, 0}, // no space for first CP @@ -340,14 +343,14 @@ utf32_to_utf8_out_partial (const std::codecvt &cvt) }; for (auto t : offsets) { - char out[array_size (exp) - 1] = {}; + ExternT out[array_size (exp) - 1] = {}; VERIFY (t.in_size <= array_size (in)); VERIFY (t.out_size <= array_size (out)); VERIFY (t.expected_in_next <= t.in_size); VERIFY (t.expected_out_next <= t.out_size); auto state = mbstate_t{}; - auto in_next = (const CharT *) nullptr; - auto out_next = (char *) nullptr; + auto in_next = (const InternT *) nullptr; + auto out_next = (ExternT *) nullptr; auto res = codecvt_base::result (); res = cvt.out (state, in, in + t.in_size, in_next, out, out + t.out_size, @@ -355,44 +358,49 @@ utf32_to_utf8_out_partial (const std::codecvt &cvt) VERIFY (res == cvt.partial); VERIFY (in_next == in + t.expected_in_next); VERIFY (out_next == out + t.expected_out_next); - VERIFY (char_traits::compare (out, exp, t.expected_out_next) == 0); + VERIFY (char_traits::compare (out, exp, t.expected_out_next) + == 0); if (t.expected_out_next < array_size (out)) VERIFY (out[t.expected_out_next] == 0); } } -template +template void -utf32_to_utf8_out_error (const std::codecvt &cvt) +utf32_to_utf8_out_error (const std::codecvt &cvt) { using namespace std; - const char32_t valid_in[] = U"bш\uAAAA\U0010AAAA"; - const 
char exp[] = "bш\uAAAA\U0010AAAA"; - - static_assert (array_size (valid_in) == 5, ""); - static_assert (array_size (exp) == 11, ""); - VERIFY (char_traits::length (valid_in) == 4); - VERIFY (char_traits::length (exp) == 10); - - test_offsets_error offsets[] = {{4, 10, 0, 0, 0x00110000, 0}, - {4, 10, 1, 1, 0x00110000, 1}, - {4, 10, 2, 3, 0x00110000, 2}, - {4, 10, 3, 6, 0x00110000, 3}}; + // UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP + const char32_t input[] = U"bш\uAAAA\U0010AAAA"; + const unsigned char expected[] = "bш\uAAAA\U0010AAAA"; + static_assert (array_size (input) == 5, ""); + static_assert (array_size (expected) == 11, ""); + + InternT in[array_size (input)]; + ExternT exp[array_size (expected)]; + copy (begin (input), end (input), begin (in)); + copy (begin (expected), end (expected), begin (exp)); + VERIFY (char_traits::length (in) == 4); + VERIFY (char_traits::length (exp) == 10); + + test_offsets_error offsets[] = {{4, 10, 0, 0, 0x00110000, 0}, + {4, 10, 1, 1, 0x00110000, 1}, + {4, 10, 2, 3, 0x00110000, 2}, + {4, 10, 3, 6, 0x00110000, 3}}; for (auto t : offsets) { - CharT in[array_size (valid_in)] = {}; - char out[array_size (exp) - 1] = {}; + ExternT out[array_size (exp) - 1] = {}; VERIFY (t.in_size <= array_size (in)); VERIFY (t.out_size <= array_size (out)); VERIFY (t.expected_in_next <= t.in_size); VERIFY (t.expected_out_next <= t.out_size); - copy (begin (valid_in), end (valid_in), begin (in)); + auto old_char = in[t.replace_pos]; in[t.replace_pos] = t.replace_char; auto state = mbstate_t{}; - auto in_next = (const CharT *) nullptr; - auto out_next = (char *) nullptr; + auto in_next = (const InternT *) nullptr; + auto out_next = (ExternT *) nullptr; auto res = codecvt_base::result (); res = cvt.out (state, in, in + t.in_size, in_next, out, out + t.out_size, @@ -400,56 +408,59 @@ utf32_to_utf8_out_error (const std::codecvt &cvt) VERIFY (res == cvt.error); VERIFY (in_next == in + t.expected_in_next); VERIFY (out_next == out + 
t.expected_out_next); - VERIFY (char_traits::compare (out, exp, t.expected_out_next) == 0); + VERIFY (char_traits::compare (out, exp, t.expected_out_next) + == 0); if (t.expected_out_next < array_size (out)) VERIFY (out[t.expected_out_next] == 0); + + in[t.replace_pos] = old_char; } } -template +template void -utf32_to_utf8_out (const std::codecvt &cvt) +utf32_to_utf8_out (const std::codecvt &cvt) { utf32_to_utf8_out_ok (cvt); utf32_to_utf8_out_partial (cvt); utf32_to_utf8_out_error (cvt); } -template +template void -test_utf8_utf32_codecvts (const std::codecvt &cvt) +test_utf8_utf32_cvt (const std::codecvt &cvt) { utf8_to_utf32_in (cvt); utf32_to_utf8_out (cvt); } -template +template void -utf8_to_utf16_in_ok (const std::codecvt &cvt) +utf8_to_utf16_in_ok (const std::codecvt &cvt) { using namespace std; // UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP - const char in[] = "bш\uAAAA\U0010AAAA"; - const char16_t exp_literal[] = u"bш\uAAAA\U0010AAAA"; - CharT exp[array_size (exp_literal)] = {}; - copy (begin (exp_literal), end (exp_literal), begin (exp)); - - static_assert (array_size (in) == 11, ""); - static_assert (array_size (exp_literal) == 6, ""); - static_assert (array_size (exp) == 6, ""); - VERIFY (char_traits::length (in) == 10); - VERIFY (char_traits::length (exp_literal) == 5); - VERIFY (char_traits::length (exp) == 5); + const unsigned char input[] = "bш\uAAAA\U0010AAAA"; + const char16_t expected[] = u"bш\uAAAA\U0010AAAA"; + static_assert (array_size (input) == 11, ""); + static_assert (array_size (expected) == 6, ""); + + ExternT in[array_size (input)]; + InternT exp[array_size (expected)]; + copy (begin (input), end (input), begin (in)); + copy (begin (expected), end (expected), begin (exp)); + VERIFY (char_traits::length (in) == 10); + VERIFY (char_traits::length (exp) == 5); test_offsets_ok offsets[] = {{0, 0}, {1, 1}, {3, 2}, {6, 3}, {10, 5}}; for (auto t : offsets) { - CharT out[array_size (exp) - 1] = {}; + InternT out[array_size 
(exp) - 1] = {}; VERIFY (t.in_size <= array_size (in)); VERIFY (t.out_size <= array_size (out)); auto state = mbstate_t{}; - auto in_next = (const char *) nullptr; - auto out_next = (CharT *) nullptr; + auto in_next = (const ExternT *) nullptr; + auto out_next = (InternT *) nullptr; auto res = codecvt_base::result (); res = cvt.in (state, in, in + t.in_size, in_next, out, out + t.out_size, @@ -457,19 +468,19 @@ utf8_to_utf16_in_ok (const std::codecvt &cvt) VERIFY (res == cvt.ok); VERIFY (in_next == in + t.in_size); VERIFY (out_next == out + t.out_size); - VERIFY (char_traits::compare (out, exp, t.out_size) == 0); + VERIFY (char_traits::compare (out, exp, t.out_size) == 0); if (t.out_size < array_size (out)) VERIFY (out[t.out_size] == 0); } for (auto t : offsets) { - CharT out[array_size (exp)] = {}; + InternT out[array_size (exp)] = {}; VERIFY (t.in_size <= array_size (in)); VERIFY (t.out_size <= array_size (out)); auto state = mbstate_t{}; - auto in_next = (const char *) nullptr; - auto out_next = (CharT *) nullptr; + auto in_next = (const ExternT *) nullptr; + auto out_next = (InternT *) nullptr; auto res = codecvt_base::result (); res @@ -477,29 +488,29 @@ utf8_to_utf16_in_ok (const std::codecvt &cvt) VERIFY (res == cvt.ok); VERIFY (in_next == in + t.in_size); VERIFY (out_next == out + t.out_size); - VERIFY (char_traits::compare (out, exp, t.out_size) == 0); + VERIFY (char_traits::compare (out, exp, t.out_size) == 0); if (t.out_size < array_size (out)) VERIFY (out[t.out_size] == 0); } } -template +template void -utf8_to_utf16_in_partial (const std::codecvt &cvt) +utf8_to_utf16_in_partial (const std::codecvt &cvt) { using namespace std; // UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP - const char in[] = "bш\uAAAA\U0010AAAA"; - const char16_t exp_literal[] = u"bш\uAAAA\U0010AAAA"; - CharT exp[array_size (exp_literal)] = {}; - copy (begin (exp_literal), end (exp_literal), begin (exp)); - - static_assert (array_size (in) == 11, ""); - static_assert 
(array_size (exp_literal) == 6, ""); - static_assert (array_size (exp) == 6, ""); - VERIFY (char_traits::length (in) == 10); - VERIFY (char_traits::length (exp_literal) == 5); - VERIFY (char_traits::length (exp) == 5); + const unsigned char input[] = "bш\uAAAA\U0010AAAA"; + const char16_t expected[] = u"bш\uAAAA\U0010AAAA"; + static_assert (array_size (input) == 11, ""); + static_assert (array_size (expected) == 6, ""); + + ExternT in[array_size (input)]; + InternT exp[array_size (expected)]; + copy (begin (input), end (input), begin (in)); + copy (begin (expected), end (expected), begin (exp)); + VERIFY (char_traits::length (in) == 10); + VERIFY (char_traits::length (exp) == 5); test_offsets_partial offsets[] = { {1, 0, 0, 0}, // no space for first CP @@ -530,14 +541,14 @@ utf8_to_utf16_in_partial (const std::codecvt &cvt) for (auto t : offsets) { - CharT out[array_size (exp) - 1] = {}; + InternT out[array_size (exp) - 1] = {}; VERIFY (t.in_size <= array_size (in)); VERIFY (t.out_size <= array_size (out)); VERIFY (t.expected_in_next <= t.in_size); VERIFY (t.expected_out_next <= t.out_size); auto state = mbstate_t{}; - auto in_next = (const char *) nullptr; - auto out_next = (CharT *) nullptr; + auto in_next = (const ExternT *) nullptr; + auto out_next = (InternT *) nullptr; auto res = codecvt_base::result (); res = cvt.in (state, in, in + t.in_size, in_next, out, out + t.out_size, @@ -545,36 +556,38 @@ utf8_to_utf16_in_partial (const std::codecvt &cvt) VERIFY (res == cvt.partial); VERIFY (in_next == in + t.expected_in_next); VERIFY (out_next == out + t.expected_out_next); - VERIFY (char_traits::compare (out, exp, t.expected_out_next) == 0); + VERIFY (char_traits::compare (out, exp, t.expected_out_next) + == 0); if (t.expected_out_next < array_size (out)) VERIFY (out[t.expected_out_next] == 0); } } -template +template void -utf8_to_utf16_in_error (const std::codecvt &cvt) +utf8_to_utf16_in_error (const std::codecvt &cvt) { using namespace std; - const char 
valid_in[] = "bш\uAAAA\U0010AAAA"; - const char16_t exp_literal[] = u"bш\uAAAA\U0010AAAA"; - CharT exp[array_size (exp_literal)] = {}; - copy (begin (exp_literal), end (exp_literal), begin (exp)); + // UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP + const unsigned char input[] = "bш\uAAAA\U0010AAAA"; + const char16_t expected[] = u"bш\uAAAA\U0010AAAA"; + static_assert (array_size (input) == 11, ""); + static_assert (array_size (expected) == 6, ""); - static_assert (array_size (valid_in) == 11, ""); - static_assert (array_size (exp_literal) == 6, ""); - static_assert (array_size (exp) == 6, ""); - VERIFY (char_traits::length (valid_in) == 10); - VERIFY (char_traits::length (exp_literal) == 5); - VERIFY (char_traits::length (exp) == 5); + ExternT in[array_size (input)]; + InternT exp[array_size (expected)]; + copy (begin (input), end (input), begin (in)); + copy (begin (expected), end (expected), begin (exp)); + VERIFY (char_traits::length (in) == 10); + VERIFY (char_traits::length (exp) == 5); - test_offsets_error offsets[] = { + test_offsets_error offsets[] = { // replace leading byte with invalid byte - {1, 5, 0, 0, '\xFF', 0}, - {3, 5, 1, 1, '\xFF', 1}, - {6, 5, 3, 2, '\xFF', 3}, - {10, 5, 6, 3, '\xFF', 6}, + {1, 5, 0, 0, 0xFF, 0}, + {3, 5, 1, 1, 0xFF, 1}, + {6, 5, 3, 2, 0xFF, 3}, + {10, 5, 6, 3, 0xFF, 6}, // replace first trailing byte with ASCII byte {3, 5, 1, 1, 'z', 2}, @@ -582,21 +595,21 @@ utf8_to_utf16_in_error (const std::codecvt &cvt) {10, 5, 6, 3, 'z', 7}, // replace first trailing byte with invalid byte - {3, 5, 1, 1, '\xFF', 2}, - {6, 5, 3, 2, '\xFF', 4}, - {10, 5, 6, 3, '\xFF', 7}, + {3, 5, 1, 1, 0xFF, 2}, + {6, 5, 3, 2, 0xFF, 4}, + {10, 5, 6, 3, 0xFF, 7}, // replace second trailing byte with ASCII byte {6, 5, 3, 2, 'z', 5}, {10, 5, 6, 3, 'z', 8}, // replace second trailing byte with invalid byte - {6, 5, 3, 2, '\xFF', 5}, - {10, 5, 6, 3, '\xFF', 8}, + {6, 5, 3, 2, 0xFF, 5}, + {10, 5, 6, 3, 0xFF, 8}, // replace third trailing byte 
{10, 5, 6, 3, 'z', 9}, - {10, 5, 6, 3, '\xFF', 9}, + {10, 5, 6, 3, 0xFF, 9}, // replace first trailing byte with ASCII byte, also incomplete at end {5, 5, 3, 2, 'z', 4}, @@ -604,30 +617,29 @@ utf8_to_utf16_in_error (const std::codecvt &cvt) {9, 5, 6, 3, 'z', 7}, // replace first trailing byte with invalid byte, also incomplete at end - {5, 5, 3, 2, '\xFF', 4}, - {8, 5, 6, 3, '\xFF', 7}, - {9, 5, 6, 3, '\xFF', 7}, + {5, 5, 3, 2, 0xFF, 4}, + {8, 5, 6, 3, 0xFF, 7}, + {9, 5, 6, 3, 0xFF, 7}, // replace second trailing byte with ASCII byte, also incomplete at end {9, 5, 6, 3, 'z', 8}, // replace second trailing byte with invalid byte, also incomplete at end - {9, 5, 6, 3, '\xFF', 8}, + {9, 5, 6, 3, 0xFF, 8}, }; for (auto t : offsets) { - char in[array_size (valid_in)] = {}; - CharT out[array_size (exp) - 1] = {}; + InternT out[array_size (exp) - 1] = {}; VERIFY (t.in_size <= array_size (in)); VERIFY (t.out_size <= array_size (out)); VERIFY (t.expected_in_next <= t.in_size); VERIFY (t.expected_out_next <= t.out_size); - char_traits::copy (in, valid_in, array_size (valid_in)); + auto old_char = in[t.replace_pos]; in[t.replace_pos] = t.replace_char; auto state = mbstate_t{}; - auto in_next = (const char *) nullptr; - auto out_next = (CharT *) nullptr; + auto in_next = (const ExternT *) nullptr; + auto out_next = (InternT *) nullptr; auto res = codecvt_base::result (); res = cvt.in (state, in, in + t.in_size, in_next, out, out + t.out_size, @@ -635,48 +647,51 @@ utf8_to_utf16_in_error (const std::codecvt &cvt) VERIFY (res == cvt.error); VERIFY (in_next == in + t.expected_in_next); VERIFY (out_next == out + t.expected_out_next); - VERIFY (char_traits::compare (out, exp, t.expected_out_next) == 0); + VERIFY (char_traits::compare (out, exp, t.expected_out_next) + == 0); if (t.expected_out_next < array_size (out)) VERIFY (out[t.expected_out_next] == 0); + + in[t.replace_pos] = old_char; } } -template +template void -utf8_to_utf16_in (const std::codecvt &cvt) +utf8_to_utf16_in 
(const std::codecvt &cvt) { utf8_to_utf16_in_ok (cvt); utf8_to_utf16_in_partial (cvt); utf8_to_utf16_in_error (cvt); } -template +template void -utf16_to_utf8_out_ok (const std::codecvt &cvt) +utf16_to_utf8_out_ok (const std::codecvt &cvt) { using namespace std; // UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP - const char16_t in_literal[] = u"bш\uAAAA\U0010AAAA"; - const char exp[] = "bш\uAAAA\U0010AAAA"; - CharT in[array_size (in_literal)]; - copy (begin (in_literal), end (in_literal), begin (in)); - - static_assert (array_size (in_literal) == 6, ""); - static_assert (array_size (exp) == 11, ""); - static_assert (array_size (in) == 6, ""); - VERIFY (char_traits::length (in_literal) == 5); - VERIFY (char_traits::length (exp) == 10); - VERIFY (char_traits::length (in) == 5); + const char16_t input[] = u"bш\uAAAA\U0010AAAA"; + const unsigned char expected[] = "bш\uAAAA\U0010AAAA"; + static_assert (array_size (input) == 6, ""); + static_assert (array_size (expected) == 11, ""); + + InternT in[array_size (input)]; + ExternT exp[array_size (expected)]; + copy (begin (input), end (input), begin (in)); + copy (begin (expected), end (expected), begin (exp)); + VERIFY (char_traits::length (in) == 5); + VERIFY (char_traits::length (exp) == 10); const test_offsets_ok offsets[] = {{0, 0}, {1, 1}, {2, 3}, {3, 6}, {5, 10}}; for (auto t : offsets) { - char out[array_size (exp) - 1] = {}; + ExternT out[array_size (exp) - 1] = {}; VERIFY (t.in_size <= array_size (in)); VERIFY (t.out_size <= array_size (out)); auto state = mbstate_t{}; - auto in_next = (const CharT *) nullptr; - auto out_next = (char *) nullptr; + auto in_next = (const InternT *) nullptr; + auto out_next = (ExternT *) nullptr; auto res = codecvt_base::result (); res = cvt.out (state, in, in + t.in_size, in_next, out, out + t.out_size, @@ -684,29 +699,29 @@ utf16_to_utf8_out_ok (const std::codecvt &cvt) VERIFY (res == cvt.ok); VERIFY (in_next == in + t.in_size); VERIFY (out_next == out + t.out_size); 
- VERIFY (char_traits::compare (out, exp, t.out_size) == 0); + VERIFY (char_traits::compare (out, exp, t.out_size) == 0); if (t.out_size < array_size (out)) VERIFY (out[t.out_size] == 0); } } -template +template void -utf16_to_utf8_out_partial (const std::codecvt &cvt) +utf16_to_utf8_out_partial (const std::codecvt &cvt) { using namespace std; // UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP - const char16_t in_literal[] = u"bш\uAAAA\U0010AAAA"; - const char exp[] = "bш\uAAAA\U0010AAAA"; - CharT in[array_size (in_literal)]; - copy (begin (in_literal), end (in_literal), begin (in)); - - static_assert (array_size (in_literal) == 6, ""); - static_assert (array_size (exp) == 11, ""); - static_assert (array_size (in) == 6, ""); - VERIFY (char_traits::length (in_literal) == 5); - VERIFY (char_traits::length (exp) == 10); - VERIFY (char_traits::length (in) == 5); + const char16_t input[] = u"bш\uAAAA\U0010AAAA"; + const unsigned char expected[] = "bш\uAAAA\U0010AAAA"; + static_assert (array_size (input) == 6, ""); + static_assert (array_size (expected) == 11, ""); + + InternT in[array_size (input)]; + ExternT exp[array_size (expected)]; + copy (begin (input), end (input), begin (in)); + copy (begin (expected), end (expected), begin (exp)); + VERIFY (char_traits::length (in) == 5); + VERIFY (char_traits::length (exp) == 10); const test_offsets_partial offsets[] = { {1, 0, 0, 0}, // no space for first CP @@ -732,14 +747,14 @@ utf16_to_utf8_out_partial (const std::codecvt &cvt) }; for (auto t : offsets) { - char out[array_size (exp) - 1] = {}; + ExternT out[array_size (exp) - 1] = {}; VERIFY (t.in_size <= array_size (in)); VERIFY (t.out_size <= array_size (out)); VERIFY (t.expected_in_next <= t.in_size); VERIFY (t.expected_out_next <= t.out_size); auto state = mbstate_t{}; - auto in_next = (const CharT *) nullptr; - auto out_next = (char *) nullptr; + auto in_next = (const InternT *) nullptr; + auto out_next = (ExternT *) nullptr; auto res = 
codecvt_base::result (); res = cvt.out (state, in, in + t.in_size, in_next, out, out + t.out_size, @@ -747,26 +762,32 @@ utf16_to_utf8_out_partial (const std::codecvt &cvt) VERIFY (res == cvt.partial); VERIFY (in_next == in + t.expected_in_next); VERIFY (out_next == out + t.expected_out_next); - VERIFY (char_traits::compare (out, exp, t.expected_out_next) == 0); + VERIFY (char_traits::compare (out, exp, t.expected_out_next) + == 0); if (t.expected_out_next < array_size (out)) VERIFY (out[t.expected_out_next] == 0); } } -template +template void -utf16_to_utf8_out_error (const std::codecvt &cvt) +utf16_to_utf8_out_error (const std::codecvt &cvt) { using namespace std; - const char16_t valid_in[] = u"bш\uAAAA\U0010AAAA"; - const char exp[] = "bш\uAAAA\U0010AAAA"; - - static_assert (array_size (valid_in) == 6, ""); - static_assert (array_size (exp) == 11, ""); - VERIFY (char_traits::length (valid_in) == 5); - VERIFY (char_traits::length (exp) == 10); - - test_offsets_error offsets[] = { + // UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP + const char16_t input[] = u"bш\uAAAA\U0010AAAA"; + const unsigned char expected[] = "bш\uAAAA\U0010AAAA"; + static_assert (array_size (input) == 6, ""); + static_assert (array_size (expected) == 11, ""); + + InternT in[array_size (input)]; + ExternT exp[array_size (expected)]; + copy (begin (input), end (input), begin (in)); + copy (begin (expected), end (expected), begin (exp)); + VERIFY (char_traits::length (in) == 5); + VERIFY (char_traits::length (exp) == 10); + + test_offsets_error offsets[] = { {5, 10, 0, 0, 0xD800, 0}, {5, 10, 0, 0, 0xDBFF, 0}, {5, 10, 0, 0, 0xDC00, 0}, @@ -796,18 +817,17 @@ utf16_to_utf8_out_error (const std::codecvt &cvt) for (auto t : offsets) { - CharT in[array_size (valid_in)] = {}; - char out[array_size (exp) - 1] = {}; + ExternT out[array_size (exp) - 1] = {}; VERIFY (t.in_size <= array_size (in)); VERIFY (t.out_size <= array_size (out)); VERIFY (t.expected_in_next <= t.in_size); VERIFY 
(t.expected_out_next <= t.out_size); - copy (begin (valid_in), end (valid_in), begin (in)); + auto old_char = in[t.replace_pos]; in[t.replace_pos] = t.replace_char; auto state = mbstate_t{}; - auto in_next = (const CharT *) nullptr; - auto out_next = (char *) nullptr; + auto in_next = (const InternT *) nullptr; + auto out_next = (ExternT *) nullptr; auto res = codecvt_base::result (); res = cvt.out (state, in, in + t.in_size, in_next, out, out + t.out_size, @@ -815,56 +835,59 @@ utf16_to_utf8_out_error (const std::codecvt &cvt) VERIFY (res == cvt.error); VERIFY (in_next == in + t.expected_in_next); VERIFY (out_next == out + t.expected_out_next); - VERIFY (char_traits::compare (out, exp, t.expected_out_next) == 0); + VERIFY (char_traits::compare (out, exp, t.expected_out_next) + == 0); if (t.expected_out_next < array_size (out)) VERIFY (out[t.expected_out_next] == 0); + + in[t.replace_pos] = old_char; } } -template +template void -utf16_to_utf8_out (const std::codecvt &cvt) +utf16_to_utf8_out (const std::codecvt &cvt) { utf16_to_utf8_out_ok (cvt); utf16_to_utf8_out_partial (cvt); utf16_to_utf8_out_error (cvt); } -template +template void -test_utf8_utf16_cvts (const std::codecvt &cvt) +test_utf8_utf16_cvt (const std::codecvt &cvt) { utf8_to_utf16_in (cvt); utf16_to_utf8_out (cvt); } -template +template void -utf8_to_ucs2_in_ok (const std::codecvt &cvt) +utf8_to_ucs2_in_ok (const std::codecvt &cvt) { using namespace std; // UTF-8 string of 1-byte CP, 2-byte CP and 3-byte CP - const char in[] = "bш\uAAAA"; - const char16_t exp_literal[] = u"bш\uAAAA"; - CharT exp[array_size (exp_literal)] = {}; - copy (begin (exp_literal), end (exp_literal), begin (exp)); - - static_assert (array_size (in) == 7, ""); - static_assert (array_size (exp_literal) == 4, ""); - static_assert (array_size (exp) == 4, ""); - VERIFY (char_traits::length (in) == 6); - VERIFY (char_traits::length (exp_literal) == 3); - VERIFY (char_traits::length (exp) == 3); + const unsigned char input[] = 
"bш\uAAAA"; + const char16_t expected[] = u"bш\uAAAA"; + static_assert (array_size (input) == 7, ""); + static_assert (array_size (expected) == 4, ""); + + ExternT in[array_size (input)]; + InternT exp[array_size (expected)]; + copy (begin (input), end (input), begin (in)); + copy (begin (expected), end (expected), begin (exp)); + VERIFY (char_traits::length (in) == 6); + VERIFY (char_traits::length (exp) == 3); test_offsets_ok offsets[] = {{0, 0}, {1, 1}, {3, 2}, {6, 3}}; for (auto t : offsets) { - CharT out[array_size (exp) - 1] = {}; + InternT out[array_size (exp) - 1] = {}; VERIFY (t.in_size <= array_size (in)); VERIFY (t.out_size <= array_size (out)); auto state = mbstate_t{}; - auto in_next = (const char *) nullptr; - auto out_next = (CharT *) nullptr; + auto in_next = (const ExternT *) nullptr; + auto out_next = (InternT *) nullptr; auto res = codecvt_base::result (); res = cvt.in (state, in, in + t.in_size, in_next, out, out + t.out_size, @@ -872,19 +895,19 @@ utf8_to_ucs2_in_ok (const std::codecvt &cvt) VERIFY (res == cvt.ok); VERIFY (in_next == in + t.in_size); VERIFY (out_next == out + t.out_size); - VERIFY (char_traits::compare (out, exp, t.out_size) == 0); + VERIFY (char_traits::compare (out, exp, t.out_size) == 0); if (t.out_size < array_size (out)) VERIFY (out[t.out_size] == 0); } for (auto t : offsets) { - CharT out[array_size (exp)] = {}; + InternT out[array_size (exp)] = {}; VERIFY (t.in_size <= array_size (in)); VERIFY (t.out_size <= array_size (out)); auto state = mbstate_t{}; - auto in_next = (const char *) nullptr; - auto out_next = (CharT *) nullptr; + auto in_next = (const ExternT *) nullptr; + auto out_next = (InternT *) nullptr; auto res = codecvt_base::result (); res @@ -892,29 +915,29 @@ utf8_to_ucs2_in_ok (const std::codecvt &cvt) VERIFY (res == cvt.ok); VERIFY (in_next == in + t.in_size); VERIFY (out_next == out + t.out_size); - VERIFY (char_traits::compare (out, exp, t.out_size) == 0); + VERIFY (char_traits::compare (out, exp, 
t.out_size) == 0); if (t.out_size < array_size (out)) VERIFY (out[t.out_size] == 0); } } -template +template void -utf8_to_ucs2_in_partial (const std::codecvt &cvt) +utf8_to_ucs2_in_partial (const std::codecvt &cvt) { using namespace std; // UTF-8 string of 1-byte CP, 2-byte CP and 3-byte CP - const char in[] = "bш\uAAAA"; - const char16_t exp_literal[] = u"bш\uAAAA"; - CharT exp[array_size (exp_literal)] = {}; - copy (begin (exp_literal), end (exp_literal), begin (exp)); - - static_assert (array_size (in) == 7, ""); - static_assert (array_size (exp_literal) == 4, ""); - static_assert (array_size (exp) == 4, ""); - VERIFY (char_traits::length (in) == 6); - VERIFY (char_traits::length (exp_literal) == 3); - VERIFY (char_traits::length (exp) == 3); + const unsigned char input[] = "bш\uAAAA"; + const char16_t expected[] = u"bш\uAAAA"; + static_assert (array_size (input) == 7, ""); + static_assert (array_size (expected) == 4, ""); + + ExternT in[array_size (input)]; + InternT exp[array_size (expected)]; + copy (begin (input), end (input), begin (in)); + copy (begin (expected), end (expected), begin (exp)); + VERIFY (char_traits::length (in) == 6); + VERIFY (char_traits::length (exp) == 3); test_offsets_partial offsets[] = { {1, 0, 0, 0}, // no space for first CP @@ -932,14 +955,14 @@ utf8_to_ucs2_in_partial (const std::codecvt &cvt) for (auto t : offsets) { - CharT out[array_size (exp) - 1] = {}; + InternT out[array_size (exp) - 1] = {}; VERIFY (t.in_size <= array_size (in)); VERIFY (t.out_size <= array_size (out)); VERIFY (t.expected_in_next <= t.in_size); VERIFY (t.expected_out_next <= t.out_size); auto state = mbstate_t{}; - auto in_next = (const char *) nullptr; - auto out_next = (CharT *) nullptr; + auto in_next = (const ExternT *) nullptr; + auto out_next = (InternT *) nullptr; auto res = codecvt_base::result (); res = cvt.in (state, in, in + t.in_size, in_next, out, out + t.out_size, @@ -947,36 +970,37 @@ utf8_to_ucs2_in_partial (const std::codecvt &cvt) VERIFY 
(res == cvt.partial); VERIFY (in_next == in + t.expected_in_next); VERIFY (out_next == out + t.expected_out_next); - VERIFY (char_traits::compare (out, exp, t.expected_out_next) == 0); + VERIFY (char_traits::compare (out, exp, t.expected_out_next) + == 0); if (t.expected_out_next < array_size (out)) VERIFY (out[t.expected_out_next] == 0); } } -template +template void -utf8_to_ucs2_in_error (const std::codecvt &cvt) +utf8_to_ucs2_in_error (const std::codecvt &cvt) { using namespace std; - const char valid_in[] = "bш\uAAAA\U0010AAAA"; - const char16_t exp_literal[] = u"bш\uAAAA\U0010AAAA"; - CharT exp[array_size (exp_literal)] = {}; - copy (begin (exp_literal), end (exp_literal), begin (exp)); + const unsigned char input[] = "bш\uAAAA\U0010AAAA"; + const char16_t expected[] = u"bш\uAAAA\U0010AAAA"; + static_assert (array_size (input) == 11, ""); + static_assert (array_size (expected) == 6, ""); - static_assert (array_size (valid_in) == 11, ""); - static_assert (array_size (exp_literal) == 6, ""); - static_assert (array_size (exp) == 6, ""); - VERIFY (char_traits::length (valid_in) == 10); - VERIFY (char_traits::length (exp_literal) == 5); - VERIFY (char_traits::length (exp) == 5); + ExternT in[array_size (input)]; + InternT exp[array_size (expected)]; + copy (begin (input), end (input), begin (in)); + copy (begin (expected), end (expected), begin (exp)); + VERIFY (char_traits::length (in) == 10); + VERIFY (char_traits::length (exp) == 5); - test_offsets_error offsets[] = { + test_offsets_error offsets[] = { // replace leading byte with invalid byte - {1, 5, 0, 0, '\xFF', 0}, - {3, 5, 1, 1, '\xFF', 1}, - {6, 5, 3, 2, '\xFF', 3}, - {10, 5, 6, 3, '\xFF', 6}, + {1, 5, 0, 0, 0xFF, 0}, + {3, 5, 1, 1, 0xFF, 1}, + {6, 5, 3, 2, 0xFF, 3}, + {10, 5, 6, 3, 0xFF, 6}, // replace first trailing byte with ASCII byte {3, 5, 1, 1, 'z', 2}, @@ -984,21 +1008,21 @@ utf8_to_ucs2_in_error (const std::codecvt &cvt) {10, 5, 6, 3, 'z', 7}, // replace first trailing byte with invalid byte - 
{3, 5, 1, 1, '\xFF', 2}, - {6, 5, 3, 2, '\xFF', 4}, - {10, 5, 6, 3, '\xFF', 7}, + {3, 5, 1, 1, 0xFF, 2}, + {6, 5, 3, 2, 0xFF, 4}, + {10, 5, 6, 3, 0xFF, 7}, // replace second trailing byte with ASCII byte {6, 5, 3, 2, 'z', 5}, {10, 5, 6, 3, 'z', 8}, // replace second trailing byte with invalid byte - {6, 5, 3, 2, '\xFF', 5}, - {10, 5, 6, 3, '\xFF', 8}, + {6, 5, 3, 2, 0xFF, 5}, + {10, 5, 6, 3, 0xFF, 8}, // replace third trailing byte {10, 5, 6, 3, 'z', 9}, - {10, 5, 6, 3, '\xFF', 9}, + {10, 5, 6, 3, 0xFF, 9}, // When we see a leading byte of 4-byte CP, we should return error, no // matter if it is incomplete at the end or has errors in the trailing @@ -1020,36 +1044,35 @@ utf8_to_ucs2_in_error (const std::codecvt &cvt) {5, 5, 3, 2, 'z', 4}, // replace first trailing byte with invalid byte, also incomplete at end - {5, 5, 3, 2, '\xFF', 4}, + {5, 5, 3, 2, 0xFF, 4}, // replace first trailing byte with ASCII byte, also incomplete at end {8, 5, 6, 3, 'z', 7}, {9, 5, 6, 3, 'z', 7}, // replace first trailing byte with invalid byte, also incomplete at end - {8, 5, 6, 3, '\xFF', 7}, - {9, 5, 6, 3, '\xFF', 7}, + {8, 5, 6, 3, 0xFF, 7}, + {9, 5, 6, 3, 0xFF, 7}, // replace second trailing byte with ASCII byte, also incomplete at end {9, 5, 6, 3, 'z', 8}, // replace second trailing byte with invalid byte, also incomplete at end - {9, 5, 6, 3, '\xFF', 8}, + {9, 5, 6, 3, 0xFF, 8}, }; for (auto t : offsets) { - char in[array_size (valid_in)] = {}; - CharT out[array_size (exp) - 1] = {}; + InternT out[array_size (exp) - 1] = {}; VERIFY (t.in_size <= array_size (in)); VERIFY (t.out_size <= array_size (out)); VERIFY (t.expected_in_next <= t.in_size); VERIFY (t.expected_out_next <= t.out_size); - char_traits::copy (in, valid_in, array_size (valid_in)); + auto old_char = in[t.replace_pos]; in[t.replace_pos] = t.replace_char; auto state = mbstate_t{}; - auto in_next = (const char *) nullptr; - auto out_next = (CharT *) nullptr; + auto in_next = (const ExternT *) nullptr; + auto out_next = 
(InternT *) nullptr; auto res = codecvt_base::result (); res = cvt.in (state, in, in + t.in_size, in_next, out, out + t.out_size, @@ -1057,48 +1080,51 @@ utf8_to_ucs2_in_error (const std::codecvt &cvt) VERIFY (res == cvt.error); VERIFY (in_next == in + t.expected_in_next); VERIFY (out_next == out + t.expected_out_next); - VERIFY (char_traits::compare (out, exp, t.expected_out_next) == 0); + VERIFY (char_traits::compare (out, exp, t.expected_out_next) + == 0); if (t.expected_out_next < array_size (out)) VERIFY (out[t.expected_out_next] == 0); + + in[t.replace_pos] = old_char; } } -template +template void -utf8_to_ucs2_in (const std::codecvt &cvt) +utf8_to_ucs2_in (const std::codecvt &cvt) { utf8_to_ucs2_in_ok (cvt); utf8_to_ucs2_in_partial (cvt); utf8_to_ucs2_in_error (cvt); } -template +template void -ucs2_to_utf8_out_ok (const std::codecvt &cvt) +ucs2_to_utf8_out_ok (const std::codecvt &cvt) { using namespace std; // UTF-8 string of 1-byte CP, 2-byte CP and 3-byte CP - const char16_t in_literal[] = u"bш\uAAAA"; - const char exp[] = "bш\uAAAA"; - CharT in[array_size (in_literal)] = {}; - copy (begin (in_literal), end (in_literal), begin (in)); - - static_assert (array_size (in_literal) == 4, ""); - static_assert (array_size (exp) == 7, ""); - static_assert (array_size (in) == 4, ""); - VERIFY (char_traits::length (in_literal) == 3); - VERIFY (char_traits::length (exp) == 6); - VERIFY (char_traits::length (in) == 3); + const char16_t input[] = u"bш\uAAAA"; + const unsigned char expected[] = "bш\uAAAA"; + static_assert (array_size (input) == 4, ""); + static_assert (array_size (expected) == 7, ""); + + InternT in[array_size (input)]; + ExternT exp[array_size (expected)]; + copy (begin (input), end (input), begin (in)); + copy (begin (expected), end (expected), begin (exp)); + VERIFY (char_traits::length (in) == 3); + VERIFY (char_traits::length (exp) == 6); const test_offsets_ok offsets[] = {{0, 0}, {1, 1}, {2, 3}, {3, 6}}; for (auto t : offsets) { - char 
out[array_size (exp) - 1] = {}; + ExternT out[array_size (exp) - 1] = {}; VERIFY (t.in_size <= array_size (in)); VERIFY (t.out_size <= array_size (out)); auto state = mbstate_t{}; - auto in_next = (const CharT *) nullptr; - auto out_next = (char *) nullptr; + auto in_next = (const InternT *) nullptr; + auto out_next = (ExternT *) nullptr; auto res = codecvt_base::result (); res = cvt.out (state, in, in + t.in_size, in_next, out, out + t.out_size, @@ -1106,29 +1132,29 @@ ucs2_to_utf8_out_ok (const std::codecvt &cvt) VERIFY (res == cvt.ok); VERIFY (in_next == in + t.in_size); VERIFY (out_next == out + t.out_size); - VERIFY (char_traits::compare (out, exp, t.out_size) == 0); + VERIFY (char_traits::compare (out, exp, t.out_size) == 0); if (t.out_size < array_size (out)) VERIFY (out[t.out_size] == 0); } } -template +template void -ucs2_to_utf8_out_partial (const std::codecvt &cvt) +ucs2_to_utf8_out_partial (const std::codecvt &cvt) { using namespace std; // UTF-8 string of 1-byte CP, 2-byte CP and 3-byte CP - const char16_t in_literal[] = u"bш\uAAAA"; - const char exp[] = "bш\uAAAA"; - CharT in[array_size (in_literal)] = {}; - copy (begin (in_literal), end (in_literal), begin (in)); - - static_assert (array_size (in_literal) == 4, ""); - static_assert (array_size (exp) == 7, ""); - static_assert (array_size (in) == 4, ""); - VERIFY (char_traits::length (in_literal) == 3); - VERIFY (char_traits::length (exp) == 6); - VERIFY (char_traits::length (in) == 3); + const char16_t input[] = u"bш\uAAAA"; + const unsigned char expected[] = "bш\uAAAA"; + static_assert (array_size (input) == 4, ""); + static_assert (array_size (expected) == 7, ""); + + InternT in[array_size (input)]; + ExternT exp[array_size (expected)]; + copy (begin (input), end (input), begin (in)); + copy (begin (expected), end (expected), begin (exp)); + VERIFY (char_traits::length (in) == 3); + VERIFY (char_traits::length (exp) == 6); const test_offsets_partial offsets[] = { {1, 0, 0, 0}, // no space for first 
CP @@ -1142,14 +1168,14 @@ ucs2_to_utf8_out_partial (const std::codecvt &cvt) }; for (auto t : offsets) { - char out[array_size (exp) - 1] = {}; + ExternT out[array_size (exp) - 1] = {}; VERIFY (t.in_size <= array_size (in)); VERIFY (t.out_size <= array_size (out)); VERIFY (t.expected_in_next <= t.in_size); VERIFY (t.expected_out_next <= t.out_size); auto state = mbstate_t{}; - auto in_next = (const CharT *) nullptr; - auto out_next = (char *) nullptr; + auto in_next = (const InternT *) nullptr; + auto out_next = (ExternT *) nullptr; auto res = codecvt_base::result (); res = cvt.out (state, in, in + t.in_size, in_next, out, out + t.out_size, @@ -1157,26 +1183,31 @@ ucs2_to_utf8_out_partial (const std::codecvt &cvt) VERIFY (res == cvt.partial); VERIFY (in_next == in + t.expected_in_next); VERIFY (out_next == out + t.expected_out_next); - VERIFY (char_traits::compare (out, exp, t.expected_out_next) == 0); + VERIFY (char_traits::compare (out, exp, t.expected_out_next) + == 0); if (t.expected_out_next < array_size (out)) VERIFY (out[t.expected_out_next] == 0); } } -template +template void -ucs2_to_utf8_out_error (const std::codecvt &cvt) +ucs2_to_utf8_out_error (const std::codecvt &cvt) { using namespace std; - const char16_t valid_in[] = u"bш\uAAAA\U0010AAAA"; - const char exp[] = "bш\uAAAA\U0010AAAA"; - - static_assert (array_size (valid_in) == 6, ""); - static_assert (array_size (exp) == 11, ""); - VERIFY (char_traits::length (valid_in) == 5); - VERIFY (char_traits::length (exp) == 10); - - test_offsets_error offsets[] = { + const char16_t input[] = u"bш\uAAAA\U0010AAAA"; + const unsigned char expected[] = "bш\uAAAA\U0010AAAA"; + static_assert (array_size (input) == 6, ""); + static_assert (array_size (expected) == 11, ""); + + InternT in[array_size (input)]; + ExternT exp[array_size (expected)]; + copy (begin (input), end (input), begin (in)); + copy (begin (expected), end (expected), begin (exp)); + VERIFY (char_traits::length (in) == 5); + VERIFY 
(char_traits::length (exp) == 10); + + test_offsets_error offsets[] = { {5, 10, 0, 0, 0xD800, 0}, {5, 10, 0, 0, 0xDBFF, 0}, {5, 10, 0, 0, 0xDC00, 0}, @@ -1219,18 +1250,17 @@ ucs2_to_utf8_out_error (const std::codecvt &cvt) for (auto t : offsets) { - CharT in[array_size (valid_in)] = {}; - char out[array_size (exp) - 1] = {}; + ExternT out[array_size (exp) - 1] = {}; VERIFY (t.in_size <= array_size (in)); VERIFY (t.out_size <= array_size (out)); VERIFY (t.expected_in_next <= t.in_size); VERIFY (t.expected_out_next <= t.out_size); - copy (begin (valid_in), end (valid_in), begin (in)); + auto old_char = in[t.replace_pos]; in[t.replace_pos] = t.replace_char; auto state = mbstate_t{}; - auto in_next = (const CharT *) nullptr; - auto out_next = (char *) nullptr; + auto in_next = (const InternT *) nullptr; + auto out_next = (ExternT *) nullptr; auto res = codecvt_base::result (); res = cvt.out (state, in, in + t.in_size, in_next, out, out + t.out_size, @@ -1238,24 +1268,27 @@ ucs2_to_utf8_out_error (const std::codecvt &cvt) VERIFY (res == cvt.error); VERIFY (in_next == in + t.expected_in_next); VERIFY (out_next == out + t.expected_out_next); - VERIFY (char_traits::compare (out, exp, t.expected_out_next) == 0); + VERIFY (char_traits::compare (out, exp, t.expected_out_next) + == 0); if (t.expected_out_next < array_size (out)) VERIFY (out[t.expected_out_next] == 0); + + in[t.replace_pos] = old_char; } } -template +template void -ucs2_to_utf8_out (const std::codecvt &cvt) +ucs2_to_utf8_out (const std::codecvt &cvt) { ucs2_to_utf8_out_ok (cvt); ucs2_to_utf8_out_partial (cvt); ucs2_to_utf8_out_error (cvt); } -template +template void -test_utf8_ucs2_cvts (const std::codecvt &cvt) +test_utf8_ucs2_cvt (const std::codecvt &cvt) { utf8_to_ucs2_in (cvt); ucs2_to_utf8_out (cvt); diff --git a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode_char8_t.cc b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode_char8_t.cc new file mode 100644 index 000000000..8ab5ba79f --- 
/dev/null +++ b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode_char8_t.cc @@ -0,0 +1,53 @@ +// Copyright (C) 2020-2023 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// You should have received a copy of the GNU General Public License along +// with this library; see the file COPYING3. If not see +// . + +// { dg-do run { target c++11 } } +// { dg-require-cstdint "" } +// { dg-options "-fchar8_t" } + +#include "codecvt_unicode.h" + +using namespace std; + +void +test_utf8_utf32_codecvts () +{ + using codecvt_c32_c8 = codecvt; + auto &loc_c = locale::classic (); + VERIFY (has_facet (loc_c)); + + auto &cvt = use_facet (loc_c); + test_utf8_utf32_cvt (cvt); +} + +void +test_utf8_utf16_codecvts () +{ + using codecvt_c16_c8 = codecvt; + auto &loc_c = locale::classic (); + VERIFY (has_facet (loc_c)); + + auto &cvt = use_facet (loc_c); + test_utf8_utf16_cvt (cvt); +} + +int +main () +{ + test_utf8_utf32_codecvts (); + test_utf8_utf16_codecvts (); +} diff --git a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode_wchar_t.cc b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode_wchar_t.cc index 4fd1bfec6..6e9152b50 100644 --- a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode_wchar_t.cc +++ b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode_wchar_t.cc @@ -28,7 +28,7 @@ test_utf8_utf32_codecvts () { #if __SIZEOF_WCHAR_T__ == 4 codecvt_utf8 cvt; - test_utf8_utf32_codecvts (cvt); + test_utf8_utf32_cvt (cvt); #endif } 
@@ -37,7 +37,7 @@ test_utf8_utf16_codecvts () { #if __SIZEOF_WCHAR_T__ >= 2 codecvt_utf8_utf16 cvt; - test_utf8_utf16_cvts (cvt); + test_utf8_utf16_cvt (cvt); #endif } @@ -46,7 +46,7 @@ test_utf8_ucs2_codecvts () { #if __SIZEOF_WCHAR_T__ == 2 codecvt_utf8 cvt; - test_utf8_ucs2_cvts (cvt); + test_utf8_ucs2_cvt (cvt); #endif }
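For readers following the tests above, here is a minimal sketch of the calling pattern they all share: call `cvt.in` on a bounded input and output range, then inspect the result code together with how far `in_next` and `out_next` advanced. The names `conv_outcome` and `utf8_to_utf16` are hypothetical helpers introduced only for this illustration and are not part of the patch. The sketch assumes the standard `std::codecvt<char16_t, char, std::mbstate_t>` facet obtained from the classic locale; that specialization is deprecated since C++20, so a compiler may warn.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>

// Hypothetical result type for this sketch (not part of the patch).
struct conv_outcome
{
  std::codecvt_base::result res; // ok, partial, error or noconv
  std::size_t consumed;          // UTF-8 bytes read from the input
  std::size_t produced;          // UTF-16 code units written to the output
};

// Hypothetical helper: decode in_size bytes of UTF-8 into at most
// out_size UTF-16 code units, using the facet from the classic locale.
inline conv_outcome
utf8_to_utf16 (const char *in, std::size_t in_size,
               char16_t *out, std::size_t out_size)
{
  assert (in != nullptr && out != nullptr);

  using cvt_t = std::codecvt<char16_t, char, std::mbstate_t>;
  const cvt_t &cvt = std::use_facet<cvt_t> (std::locale::classic ());

  std::mbstate_t state{};
  const char *in_next = nullptr;
  char16_t *out_next = nullptr;

  // Same call shape as in the tests: [in, in + in_size) is the input
  // range, [out, out + out_size) is the output range.
  std::codecvt_base::result res
    = cvt.in (state, in, in + in_size, in_next, out, out + out_size, out_next);

  return {res, static_cast<std::size_t> (in_next - in),
          static_cast<std::size_t> (out_next - out)};
}
```

With the three-byte input `"b\xD1\x88"` (the first two code points of the test string) and room for two code units, the call should report `ok` with 3 bytes consumed and 2 units produced. Passing only the first two bytes leaves the second code point incomplete, which is the situation the `*_partial` tests probe.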