[v3,0/3] RFC: P1689R5 support

Message ID 20221109021048.2123704-1-ben.boeckel@kitware.com
Headers
Series RFC: P1689R5 support |

Message

Ben Boeckel Nov. 9, 2022, 2:10 a.m. UTC
  Hi,

This patch adds initial support for ISO C++'s [P1689R5][], a format for
describing C++ module requirements and provisions based on the source
code. This is required because compiling C++ with modules is not
embarrassingly parallel and need to be ordered to ensure that `import
some_module;` can be satisfied in time by making sure that the TU with
`export import some_module;` is compiled first.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html

I'd like feedback on the approach taken here with respect to the
user-visible flags. I'll also note that header units are not supported
at this time because the current `-E` behavior with respect to `import
<some_header>;` is to search for an appropriate `.gcm` file which is not
something such a "scan" can support. A new mode will likely need to be
created (e.g., replacing `-E` with `-fc++-module-scanning` or something)
where headers are looked up "normally" and processed only as much as
scanning requires.

For the record, Clang has patches with similar flags and behavior by
Chuanqi Xu here:

    https://reviews.llvm.org/D134269

with the same flags.

Thanks,

--Ben

---
v2 -> v3:

- changelog entries moved to commit messages
- documentation updated/added in the UTF-8 routine editing

v1 -> v2:

- removal of the `deps_write(extra)` parameter to option-checking where
  ndeeded
- default parameter of `cpp_finish(fdeps_stream = NULL)`
- unification of libcpp UTF-8 validity functions from v1
- test cases for flag parsing states (depflags-*) and p1689 output
  (p1689-*)

Ben Boeckel (3):
  libcpp: reject codepoints above 0x10FFFF
  libcpp: add a function to determine UTF-8 validity of a C string
  p1689r5: initial support

 gcc/c-family/c-opts.cc                        |  40 +++-
 gcc/c-family/c.opt                            |  12 +
 gcc/cp/module.cc                              |   3 +-
 gcc/doc/invoke.texi                           |  15 ++
 gcc/testsuite/g++.dg/modules/depflags-f-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-f.C     |   1 +
 gcc/testsuite/g++.dg/modules/depflags-fi.C    |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj.C    |   4 +
 .../g++.dg/modules/depflags-fjo-MD.C          |   4 +
 gcc/testsuite/g++.dg/modules/depflags-fjo.C   |   5 +
 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fo.C    |   4 +
 gcc/testsuite/g++.dg/modules/depflags-j-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-j.C     |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo.C    |   4 +
 gcc/testsuite/g++.dg/modules/depflags-o-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-o.C     |   3 +
 gcc/testsuite/g++.dg/modules/modules.exp      |   1 +
 gcc/testsuite/g++.dg/modules/p1689-1.C        |  18 ++
 gcc/testsuite/g++.dg/modules/p1689-1.exp.json |  27 +++
 gcc/testsuite/g++.dg/modules/p1689-2.C        |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-2.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-3.C        |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-3.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-4.C        |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-4.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.C        |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/test-p1689.py    | 222 ++++++++++++++++++
 gcc/testsuite/lib/modules.exp                 |  71 ++++++
 libcpp/charset.cc                             |  28 ++-
 libcpp/include/cpplib.h                       |  12 +-
 libcpp/include/mkdeps.h                       |  17 +-
 libcpp/init.cc                                |  13 +-
 libcpp/internal.h                             |   2 +
 libcpp/mkdeps.cc                              | 149 +++++++++++-
 38 files changed, 773 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fi.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-1.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-1.exp.json
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-2.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-2.exp.json
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-3.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-3.exp.json
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-4.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-4.exp.json
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-5.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-5.exp.json
 create mode 100644 gcc/testsuite/g++.dg/modules/test-p1689.py
 create mode 100644 gcc/testsuite/lib/modules.exp


base-commit: f95d3d5de72a1c43e8d529bad3ef59afc3214705
  

Comments

Jason Merrill Nov. 16, 2022, midnight UTC | #1
On 11/8/22 16:10, Ben Boeckel wrote:
> Unicode does not support such values because they are unrepresentable in
> UTF-16.
> 
> libcpp/
> 
> 	* charset.cc: Reject encodings of codepoints above 0x10FFFF.
> 	UTF-16 does not support such codepoints and therefore all
> 	Unicode rejects such values.

OK.

> Signed-off-by: Ben Boeckel <ben.boeckel@kitware.com>
> ---
>   libcpp/charset.cc | 8 ++++++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/libcpp/charset.cc b/libcpp/charset.cc
> index 12a398e7527..324b5b19136 100644
> --- a/libcpp/charset.cc
> +++ b/libcpp/charset.cc
> @@ -158,6 +158,10 @@ struct _cpp_strbuf
>      encoded as any of DF 80, E0 9F 80, F0 80 9F 80, F8 80 80 9F 80, or
>      FC 80 80 80 9F 80.  Only the first is valid.
>   
> +   Additionally, Unicode declares that all codepoints above 0010FFFF are
> +   invalid because they cannot be represented in UTF-16. As such, all 5- and
> +   6-byte encodings are invalid.
> +
>      An implementation note: the transformation from UTF-16 to UTF-8, or
>      vice versa, is easiest done by using UTF-32 as an intermediary.  */
>   
> @@ -216,7 +220,7 @@ one_utf8_to_cppchar (const uchar **inbufp, size_t *inbytesleftp,
>     if (c <= 0x3FFFFFF && nbytes > 5) return EILSEQ;
>   
>     /* Make sure the character is valid.  */
> -  if (c > 0x7FFFFFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
> +  if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
>   
>     *cp = c;
>     *inbufp = inbuf;
> @@ -320,7 +324,7 @@ one_utf32_to_utf8 (iconv_t bigend, const uchar **inbufp, size_t *inbytesleftp,
>     s += inbuf[bigend ? 2 : 1] << 8;
>     s += inbuf[bigend ? 3 : 0];
>   
> -  if (s >= 0x7FFFFFFF || (s >= 0xD800 && s <= 0xDFFF))
> +  if (s > 0x10FFFF || (s >= 0xD800 && s <= 0xDFFF))
>       return EILSEQ;
>   
>     rval = one_cppchar_to_utf8 (s, outbufp, outbytesleftp);