[v4,2/2] preprocessor/106426: Treat u8 character literals as unsigned in char8_t modes.

Message ID 20220802183602.1575950-3-tom@honermann.net
State New, archived
Series Implement C2X N2653 (char8_t) and correct UTF-8 character literal type in preprocessor directives for C++

Commit Message

Tom Honermann Aug. 2, 2022, 6:36 p.m. UTC
This patch corrects the handling of UTF-8 character literals in preprocessing
directives so that they are treated as unsigned types in char8_t-enabled
C++ modes (C++17 with -fchar8_t or C++20 without -fno-char8_t). Previously,
UTF-8 character literals were always treated as having the same type as
ordinary character literals (signed or unsigned depending on the target or
on use of the -fsigned-char or -funsigned-char options).
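
As an illustration of the behavior described above, here is a minimal sketch
that compares the preprocessor's view of a UTF-8 character literal with the
language-level type. The macro and variable names are hypothetical and are not
part of the patch. The #if probe is the one used by the tests in this patch.
Whether the two views agree depends on the compiler version and on the
-std=, -fchar8_t, -fsigned-char, and -funsigned-char options in effect.

```cpp
#include <type_traits>

// Preprocessor view. If the literal is treated as unsigned, the #if
// arithmetic is unsigned and 0 - 1 wraps to a large positive value;
// if it is treated as signed, the result is -1.
#if u8'\0' - 1 < 0
#  define U8_CHAR_SIGNED_IN_PREPROCESSOR 1
#else
#  define U8_CHAR_SIGNED_IN_PREPROCESSOR 0
#endif

// Language view. Integer promotion would hide the signedness in an
// expression such as u8'\0' - 1, so inspect the literal's type instead
// (char without char8_t support, char8_t with it).
constexpr bool u8_char_signed_in_language =
    std::is_signed<decltype(u8'\0')>::value;

// True when the preprocessor and the language agree on signedness,
// which is the outcome this patch aims for.
constexpr bool u8_char_views_agree =
    (U8_CHAR_SIGNED_IN_PREPROCESSOR != 0) == u8_char_signed_in_language;
```

With a compiler that includes this fix, u8_char_views_agree would be expected
to be true in both char8_t-enabled and char8_t-disabled modes. Without the
fix, a char8_t-enabled mode on a signed-char target is the case in which the
two views could disagree.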

	PR preprocessor/106426

gcc/c-family/ChangeLog:
	* c-opts.cc (c_common_post_options): Assign cpp_opts->unsigned_utf8char
	subject to -fchar8_t, -fsigned-char, and/or -funsigned-char.

gcc/testsuite/ChangeLog:
	* g++.dg/ext/char8_t-char-literal-1.C: Check signedness of u8 literals.
	* g++.dg/ext/char8_t-char-literal-2.C: Check signedness of u8 literals.

libcpp/ChangeLog:
	* charset.cc (narrow_str_to_charconst): Set signedness of CPP_UTF8CHAR
	literals based on unsigned_utf8char.
	* include/cpplib.h (cpp_options): Add unsigned_utf8char.
	* init.cc (cpp_create_reader): Initialize unsigned_utf8char.
---
 gcc/c-family/c-opts.cc                            | 1 +
 gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C | 6 +++++-
 gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C | 4 ++++
 libcpp/charset.cc                                 | 4 ++--
 libcpp/include/cpplib.h                           | 4 ++--
 libcpp/init.cc                                    | 1 +
 6 files changed, 15 insertions(+), 5 deletions(-)
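
The option flow summarized in the ChangeLog entries above can be modeled
outside of GCC. The following is a simplified sketch, not the GCC
implementation: PreprocessorOptions, LiteralKind, resolve_char8_t,
make_options, and literal_is_unsigned are hypothetical names standing in for
cpp_options, the CPP_UTF8CHAR token type, and the logic in
c_common_post_options and narrow_str_to_charconst.

```cpp
// Hypothetical stand-in for the two cpp_options fields involved.
struct PreprocessorOptions {
  bool unsigned_char;      // signedness of ordinary character literals
  bool unsigned_utf8char;  // signedness of UTF-8 character literals
};

enum class LiteralKind { Ordinary, Utf8 };

// flag_char8_t is tri-state: -1 means "not given on the command line",
// in which case the language dialect decides.
constexpr bool resolve_char8_t(int flag_char8_t, bool dialect_enables_char8_t) {
  return flag_char8_t == -1 ? dialect_enables_char8_t : flag_char8_t != 0;
}

// Mirrors the assignment added to c_common_post_options: UTF-8 literals
// are unsigned when char8_t is enabled, otherwise they follow plain char.
constexpr PreprocessorOptions make_options(bool unsigned_char,
                                           bool char8_t_enabled) {
  return PreprocessorOptions{unsigned_char,
                             char8_t_enabled ? true : unsigned_char};
}

// Mirrors the branch order in narrow_str_to_charconst: multichar
// constants have type int and are signed; UTF-8 literals consult
// unsigned_utf8char; everything else consults unsigned_char.
constexpr bool literal_is_unsigned(const PreprocessorOptions &opts,
                                   LiteralKind kind, int char_count) {
  return char_count > 1                ? false
         : kind == LiteralKind::Utf8   ? opts.unsigned_utf8char
                                       : opts.unsigned_char;
}
```

In this model, make_options(false, true) describes a signed-char target with
char8_t enabled: a single-character UTF-8 literal is unsigned there, while an
ordinary character literal remains signed.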
  

Comments

Joseph Myers Aug. 2, 2022, 10:14 p.m. UTC | #1
On Tue, 2 Aug 2022, Tom Honermann via Gcc-patches wrote:

> This patch corrects handling of UTF-8 character literals in preprocessing
> directives so that they are treated as unsigned types in char8_t enabled
> C++ modes (C++17 with -fchar8_t or C++20 without -fno-char8_t). Previously,
> UTF-8 character literals were always treated as having the same type as
> ordinary character literals (signed or unsigned dependent on target or use
> of the -fsigned-char or -funsigned-char options).

OK in the absence of C++ maintainer objections within 72 hours.  (This is 
the case where, when I added support for such literals for C (commit 
7c5890cc0a0ecea0e88cc39e9fba6385fb579e61), I raised the question of 
whether they should be unsigned in the preprocessor for C++ as well.)
  
Tom Honermann Aug. 8, 2022, 1:45 p.m. UTC | #2
On 8/2/22 6:14 PM, Joseph Myers wrote:
> On Tue, 2 Aug 2022, Tom Honermann via Gcc-patches wrote:
>
>> This patch corrects handling of UTF-8 character literals in preprocessing
>> directives so that they are treated as unsigned types in char8_t enabled
>> C++ modes (C++17 with -fchar8_t or C++20 without -fno-char8_t). Previously,
>> UTF-8 character literals were always treated as having the same type as
>> ordinary character literals (signed or unsigned dependent on target or use
>> of the -fsigned-char or -funsigned-char options).
> OK in the absence of C++ maintainer objections within 72 hours.  (This is
> the case where, when I added support for such literals for C (commit
> 7c5890cc0a0ecea0e88cc39e9fba6385fb579e61), I raised the question of
> whether they should be unsigned in the preprocessor for C++ as well.)

Joseph, would you be so kind as to commit this patch series for me? I 
don't have commit access. Thank you in advance!

Tom.
  
Joseph Myers Aug. 8, 2022, 8:01 p.m. UTC | #3
On Mon, 8 Aug 2022, Tom Honermann via Gcc-patches wrote:

> On 8/2/22 6:14 PM, Joseph Myers wrote:
> > On Tue, 2 Aug 2022, Tom Honermann via Gcc-patches wrote:
> > 
> > > This patch corrects handling of UTF-8 character literals in preprocessing
> > > directives so that they are treated as unsigned types in char8_t enabled
> > > C++ modes (C++17 with -fchar8_t or C++20 without -fno-char8_t).
> > > Previously,
> > > UTF-8 character literals were always treated as having the same type as
> > > ordinary character literals (signed or unsigned dependent on target or use
> > > of the -fsigned-char or -funsigned-char options).
> > OK in the absence of C++ maintainer objections within 72 hours.  (This is
> > the case where, when I added support for such literals for C (commit
> > 7c5890cc0a0ecea0e88cc39e9fba6385fb579e61), I raised the question of
> > whether they should be unsigned in the preprocessor for C++ as well.)
> 
> Joseph, would you be so kind as to commit this patch series for me? I don't
> have commit access. Thank you in advance!

Done.
  

Patch

diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index 108adc5caf8..02ce1e86cdb 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -1062,6 +1062,7 @@  c_common_post_options (const char **pfilename)
   /* char8_t support is implicitly enabled in C++20 and C2X.  */
   if (flag_char8_t == -1)
     flag_char8_t = (cxx_dialect >= cxx20) || flag_isoc2x;
+  cpp_opts->unsigned_utf8char = flag_char8_t ? 1 : cpp_opts->unsigned_char;
 
   if (flag_extern_tls_init)
     {
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C b/gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C
index 8ed85ccfdcd..2994dd38516 100644
--- a/gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C
+++ b/gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C
@@ -1,6 +1,6 @@ 
 // Test that UTF-8 character literals have type char if -fchar8_t is not enabled.
 // { dg-do compile }
-// { dg-options "-std=c++17 -fno-char8_t" }
+// { dg-options "-std=c++17 -fsigned-char -fno-char8_t" }
 
 template<typename T1, typename T2>
   struct is_same
@@ -10,3 +10,7 @@  template<typename T>
   { static const bool value = true; };
 
 static_assert(is_same<decltype(u8'x'), char>::value, "Error");
+
+#if u8'\0' - 1 > 0
+#error "UTF-8 character literals not signed in preprocessor"
+#endif
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C b/gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C
index 7861736689c..db4fe70046d 100644
--- a/gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C
+++ b/gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C
@@ -10,3 +10,7 @@  template<typename T>
   { static const bool value = true; };
 
 static_assert(is_same<decltype(u8'x'), char8_t>::value, "Error");
+
+#if u8'\0' - 1 < 0
+#error "UTF-8 character literals not unsigned in preprocessor"
+#endif
diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index ca8b7cf7aa5..12e31632228 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1960,8 +1960,8 @@  narrow_str_to_charconst (cpp_reader *pfile, cpp_string str,
   /* Multichar constants are of type int and therefore signed.  */
   if (i > 1)
     unsigned_p = 0;
-  else if (type == CPP_UTF8CHAR && !CPP_OPTION (pfile, cplusplus))
-    unsigned_p = 1;
+  else if (type == CPP_UTF8CHAR)
+    unsigned_p = CPP_OPTION (pfile, unsigned_utf8char);
   else
     unsigned_p = CPP_OPTION (pfile, unsigned_char);
 
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index 3eba6f74b57..f9c042db034 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -581,8 +581,8 @@  struct cpp_options
      ints and target wide characters, respectively.  */
   size_t precision, char_precision, int_precision, wchar_precision;
 
-  /* True means chars (wide chars) are unsigned.  */
-  bool unsigned_char, unsigned_wchar;
+  /* True means chars (wide chars, UTF-8 chars) are unsigned.  */
+  bool unsigned_char, unsigned_wchar, unsigned_utf8char;
 
   /* True if the most significant byte in a word has the lowest
      address in memory.  */
diff --git a/libcpp/init.cc b/libcpp/init.cc
index f4ab83d2145..0242da5f55c 100644
--- a/libcpp/init.cc
+++ b/libcpp/init.cc
@@ -231,6 +231,7 @@  cpp_create_reader (enum c_lang lang, cpp_hash_table *table,
   CPP_OPTION (pfile, int_precision) = CHAR_BIT * sizeof (int);
   CPP_OPTION (pfile, unsigned_char) = 0;
   CPP_OPTION (pfile, unsigned_wchar) = 1;
+  CPP_OPTION (pfile, unsigned_utf8char) = 1;
   CPP_OPTION (pfile, bytes_big_endian) = 1;  /* does not matter */
 
   /* Default to no charset conversion.  */