From: Lewis Hyatt
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] c++: libcpp: Support raw strings with newlines in directives [PR55971]
Date: Thu, 27 Oct 2022 13:30:11 -0400
Message-Id: <38b67944c0759299533ad163d002247996fa5e33.1666891579.git.lhyatt@gmail.com>

Hello-

May I please ask for a review of this patch from June? I realize it's a
10-year-old PR that doesn't seem to be bothering people much, but I still feel
like it's an unfortunate gap in C++11 support that is not hard to fix.

The original submission is here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596820.html

I have attached a new, simplified version here: all the _Pragma-related stuff
has been removed, and I will handle that in a later patch instead. I also
removed the changes to c-ppoutput.cc that I realized were not needed after
all.

Bootstrap+regtest all languages on x86-64 Linux still looks good.
Thanks!
-Lewis

-- >8 --

It's not currently possible to use a C++11 raw string containing a newline as
part of the definition of a macro, or in any other preprocessing directive,
such as:

#define X R"(two
lines)"

#error R"(this error has
two lines)"

Add support for that by relaxing the conditions under which
_cpp_get_fresh_line() refuses to get a new line. For the case of lexing a raw
string, it's OK to do so as long as there is another line within the current
buffer. The code in _cpp_get_fresh_line() was refactored into a new function
get_fresh_line_impl(), so that the new logic is applied only when processing a
raw string and at no other time.

libcpp/ChangeLog:

        PR preprocessor/55971
        * lex.cc (get_fresh_line_impl): New function refactoring the code
        from...
        (_cpp_get_fresh_line): ...here.
        (lex_raw_string): Use get_fresh_line_impl<true>() to support raw
        strings containing newlines when processing a directive.

gcc/testsuite/ChangeLog:

        PR preprocessor/55971
        * c-c++-common/raw-string-directive-1.c: New test.
        * c-c++-common/raw-string-directive-2.c: New test.

gcc/c-family/ChangeLog:

        PR preprocessor/55971
        * c-ppoutput.cc (account_for_newlines): Update comment.
---
 gcc/c-family/c-ppoutput.cc                    | 10 ++-
 .../c-c++-common/raw-string-directive-1.c     | 74 +++++++++++++++++++
 .../c-c++-common/raw-string-directive-2.c     | 33 +++++++++
 libcpp/lex.cc                                 | 41 +++++++---
 4 files changed, 148 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/raw-string-directive-1.c
 create mode 100644 gcc/testsuite/c-c++-common/raw-string-directive-2.c

diff --git a/gcc/c-family/c-ppoutput.cc b/gcc/c-family/c-ppoutput.cc
index a99d9e9c5ca..6e054358e9e 100644
--- a/gcc/c-family/c-ppoutput.cc
+++ b/gcc/c-family/c-ppoutput.cc
@@ -433,7 +433,15 @@ scan_translation_unit_directives_only (cpp_reader *pfile)
     lang_hooks.preprocess_token (pfile, NULL, streamer.filter);
 }
 
-/* Adjust print.src_line for newlines embedded in output.  */
+/* Adjust print.src_line for newlines embedded in output.  For example, if a raw
+   string literal contains newlines, then we need to increment our notion of the
+   current line to keep in sync and avoid outputting a line marker
+   unnecessarily.  If a raw string literal containing newlines is the result of
+   macro expansion, then we have the opposite problem, where the token takes up
+   more lines in the output than it did in the input, and hence a line marker is
+   needed to restore the correct state for subsequent lines.  In this case,
+   incrementing print.src_line still does the job, because it will cause us to
+   emit the line marker the next time a token is streamed.  */
 static void
 account_for_newlines (const unsigned char *str, size_t len)
 {
diff --git a/gcc/testsuite/c-c++-common/raw-string-directive-1.c b/gcc/testsuite/c-c++-common/raw-string-directive-1.c
new file mode 100644
index 00000000000..d6525e107bc
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/raw-string-directive-1.c
@@ -0,0 +1,74 @@
+/* { dg-do compile } */
+/* { dg-options "-std=gnu99" { target c } } */
+/* { dg-options "-std=c++11" { target c++ } } */
+
+/* Test that multi-line raw strings are lexed OK for all preprocessing
+   directives where one could appear.  Test raw-string-directive-2.c
+   checks that #define is also processed properly.  */
+
+/* Note that in cases where we cause GCC to produce a multi-line error
+   message, we construct the string so that the second line looks enough
+   like an error message for DejaGNU to process it as such, so that we
+   can use dg-warning or dg-error directives to check for it.  */
+
+#warning R"delim(line1 /* { dg-warning "line1" } */
+file:15:1: warning: line2)delim" /* { dg-warning "line2" } */
+
+#error R"delim(line3 /* { dg-error "line3" } */
+file:18:1: error: line4)delim" /* { dg-error "line4" } */
+
+#define X1 R"(line 5
+line 6
+line 7
+line 8
+/*
+//
+line 9)" R"delim(
+line10)delim"
+
+#define X2(a) X1 #a R"(line 11
+/*
+line12
+)"
+
+#if R"(line 13 /* { dg-error "line13" } */
+file:35:1: error: line14)" /* { dg-error "line14\\)\"\" is not valid" } */
+#endif R"(line 15 /* { dg-warning "extra tokens at end of #endif" } */
+\
+line16)" ""
+
+#ifdef XYZ R"(line17 /* { dg-warning "extra tokens at end of #ifdef" } */
+\
+\
+line18)"
+#endif
+
+#if 1
+#else R"(line23 /* { dg-warning "extra tokens at end of #else" } */
+\
+
+line24)"
+#endif
+
+#if 0
+#elif R"(line 25 /* { dg-error "line25" } */
+file:55:1: error: line26)" /* { dg-error "line26\\)\"\" is not valid" } */
+#endif
+
+#line 60 R"(file:60:1: warning: this file has a space
+in it!)"
+#warning "line27" /* { dg-warning "line27" } */
+/* { dg-warning "this file has a space" "#line check" { target *-*-* } 60 } */
+#line 63 "file"
+
+#undef X1 R"(line28 /* { dg-warning "extra tokens at end of #undef" } */
+line29
+\
+)"
+
+#ident R"(line30
+line31)" R"(line 32 /* { dg-warning "extra tokens at end of #ident" } */
+line 33)"
+
+#pragma GCC diagnostic ignored R"(-Woption /* { dg-warning "-Wpragmas" } */
+-with-a-newline)"
diff --git a/gcc/testsuite/c-c++-common/raw-string-directive-2.c b/gcc/testsuite/c-c++-common/raw-string-directive-2.c
new file mode 100644
index 00000000000..6fc673ccd82
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/raw-string-directive-2.c
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+/* { dg-options "-std=gnu99" { target c } } */
+/* { dg-options "-std=c++11" { target c++ } } */
+
+#define S1 R"(three
+line
+string)"
+
+#define S2 R"(pasted
+two line)" " string"
+
+#define X(a, b) a b R"(
+one more)"
+
+const char *s1 = S1;
+const char *s2 = S2;
+const char *s3 = X(S1, R"(
+with this line plus)");
+
+int main ()
+{
+  const char s1_correct[] = "three\nline\nstring";
+  if (__builtin_memcmp (s1, s1_correct, sizeof s1_correct))
+    __builtin_abort ();
+
+  const char s2_correct[] = "pasted\ntwo line string";
+  if (__builtin_memcmp (s2, s2_correct, sizeof s2_correct))
+    __builtin_abort ();
+
+  const char s3_correct[] = "three\nline\nstring\nwith this line plus\none more";
+  if (__builtin_memcmp (s3, s3_correct, sizeof s3_correct))
+    __builtin_abort ();
+}
diff --git a/libcpp/lex.cc b/libcpp/lex.cc
index cc12a52d282..b1107920c94 100644
--- a/libcpp/lex.cc
+++ b/libcpp/lex.cc
@@ -1076,6 +1076,9 @@ _cpp_clean_line (cpp_reader *pfile)
   buffer->next_line = s + 1;
 }
 
+template <bool lexing_raw_string>
+static bool get_fresh_line_impl (cpp_reader *pfile);
+
 /* Return true if the trigraph indicated by NOTE should be warned
    about in a comment.  */
 static bool
@@ -2695,9 +2698,8 @@ lex_raw_string (cpp_reader *pfile, cpp_token *token, const uchar *base)
         {
           pos--;
           pfile->buffer->cur = pos;
-          if (pfile->state.in_directive
-              || (pfile->state.parsing_args
-                  && pfile->buffer->next_line >= pfile->buffer->rlimit))
+          if ((pfile->state.in_directive || pfile->state.parsing_args)
+              && pfile->buffer->next_line >= pfile->buffer->rlimit)
             {
               cpp_error_with_line (pfile, CPP_DL_ERROR, token->src_loc, 0,
                                    "unterminated raw string");
@@ -2712,7 +2714,7 @@ lex_raw_string (cpp_reader *pfile, cpp_token *token, const uchar *base)
           CPP_INCREMENT_LINE (pfile, 0);
           pfile->buffer->need_line = true;
 
-          if (!_cpp_get_fresh_line (pfile))
+          if (!get_fresh_line_impl<true> (pfile))
             {
               /* We ran out of file and failed to get a line.  */
               location_t src_loc = token->src_loc;
@@ -2724,8 +2726,15 @@ lex_raw_string (cpp_reader *pfile, cpp_token *token, const uchar *base)
                 _cpp_release_buff (pfile, accum.first);
               cpp_error_with_line (pfile, CPP_DL_ERROR, src_loc, 0,
                                    "unterminated raw string");
-              /* Now pop the buffer that _cpp_get_fresh_line did not.  */
+
+              /* Now pop the buffer that get_fresh_line_impl() did not.  Popping
+                 is not safe if processing a directive, however this cannot
+                 happen as we already checked above that a line would be
+                 available, and get_fresh_line_impl() can't fail in this
+                 case.  */
+              gcc_assert (!pfile->state.in_directive);
               _cpp_pop_buffer (pfile);
+
               return;
             }
 
@@ -3659,11 +3668,14 @@ _cpp_lex_token (cpp_reader *pfile)
 }
 
 /* Returns true if a fresh line has been loaded.  */
-bool
-_cpp_get_fresh_line (cpp_reader *pfile)
+template <bool lexing_raw_string>
+static bool
+get_fresh_line_impl (cpp_reader *pfile)
 {
-  /* We can't get a new line until we leave the current directive.  */
-  if (pfile->state.in_directive)
+  /* We can't get a new line until we leave the current directive, unless we
+     are lexing a raw string, in which case it will be OK as long as we don't
+     pop the current buffer.  */
+  if (!lexing_raw_string && pfile->state.in_directive)
     return false;
 
   for (;;)
@@ -3679,6 +3691,10 @@ _cpp_get_fresh_line (cpp_reader *pfile)
           return true;
         }
 
+      /* We can't change buffers until we leave the current directive.  */
+      if (lexing_raw_string && pfile->state.in_directive)
+        return false;
+
       /* First, get out of parsing arguments state.  */
       if (pfile->state.parsing_args)
         return false;
@@ -3706,6 +3722,13 @@ _cpp_get_fresh_line (cpp_reader *pfile)
     }
 }
 
+bool
+_cpp_get_fresh_line (cpp_reader *pfile)
+{
+  return get_fresh_line_impl<false> (pfile);
+}
+
+
 #define IF_NEXT_IS(CHAR, THEN_TYPE, ELSE_TYPE)  \
   do                                            \
     {                                           \