[v6,1/4] libcpp: reject codepoints above 0x10FFFF

Message ID 20230606205025.3164738-2-ben.boeckel@kitware.com
State Accepted
Series P1689R5 support

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

Ben Boeckel June 6, 2023, 8:50 p.m. UTC
Unicode does not allow codepoints above 0x10FFFF because they are
unrepresentable in UTF-16.

libcpp/

	* charset.cc (cpp_valid_utf8_p): Reject encodings of codepoints
	above 0x10FFFF.  UTF-16 cannot represent such codepoints, so
	Unicode excludes them entirely.

Signed-off-by: Ben Boeckel <ben.boeckel@kitware.com>
---
 libcpp/charset.cc | 7 +++++++
 1 file changed, 7 insertions(+)
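
Note on the limit: the 0x10FFFF bound follows from UTF-16 surrogate-pair
arithmetic. The sketch below is illustrative only; the constant and function
names are assumptions made for this note and are not taken from libcpp.

```cpp
#include <cstdint>

// Illustrative names; none of these exist in libcpp.
constexpr std::uint32_t kHighSurrogateFirst = 0xD800;
constexpr std::uint32_t kLowSurrogateFirst = 0xDC00;

// Combine a UTF-16 surrogate pair into the codepoint it encodes.
// Each surrogate contributes 10 bits on top of the 0x10000 base.
constexpr std::uint32_t
combine_surrogates (std::uint32_t high, std::uint32_t low)
{
  return 0x10000 + ((high - kHighSurrogateFirst) << 10)
	 + (low - kLowSurrogateFirst);
}

// The largest possible pair (0xDBFF, 0xDFFF) yields the largest codepoint.
static_assert (combine_surrogates (0xDBFF, 0xDFFF) == 0x10FFFF,
	       "UTF-16 tops out at U+10FFFF");
```

Since no surrogate pair can encode a larger value, a UTF-8 sequence that
decodes above this bound has no UTF-16 counterpart.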
  

Patch

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index d7f323b2cd5..3b34d804cf1 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1886,6 +1886,13 @@  cpp_valid_utf8_p (const char *buffer, size_t num_bytes)
       int err = one_utf8_to_cppchar (&iter, &bytesleft, &cp);
       if (err)
 	return false;
+
+      /* Additionally, Unicode declares that all codepoints above 0x10FFFF are
+	 invalid because they cannot be represented in UTF-16.
+
+	 Reject such values.  */
+      if (cp > 0x10FFFF)
+	return false;
     }
   /* No problems encountered.  */
   return true;
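
Boundary note: the subject line and the comment both describe rejecting
codepoints above 0x10FFFF, so U+10FFFF itself should remain valid and
0x110000 should be the first rejected value. A minimal standalone sketch of
that range check, using a hypothetical helper that is not part of libcpp:

```cpp
#include <cstdint>

// Hypothetical stand-in for the range check; not a libcpp function.
constexpr std::uint32_t kMaxCodepoint = 0x10FFFF;

// True when cp lies within the Unicode codespace (0 through 0x10FFFF).
// This checks only the upper bound, not surrogates or other exclusions.
constexpr bool
codepoint_in_unicode_range (std::uint32_t cp)
{
  return cp <= kMaxCodepoint;
}

static_assert (codepoint_in_unicode_range (0x10FFFF),
	       "the last codepoint is valid");
static_assert (!codepoint_in_unicode_range (0x110000),
	       "the first value past the limit is rejected");
```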