From: Lewis Hyatt
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] pch: Fix streaming of strings with embedded null bytes
Date: Tue, 18 Oct 2022 18:14:54 -0400

When a GTY'ed struct is streamed to PCH, any plain char* pointers it
contains (whether they live in GC-controlled memory or not) will be marked
for PCH output by the routine gt_pch_note_object in ggc-common.cc. This
routine special-cases plain char* strings, and in particular it uses
strlen() to get their length. Thus it does not handle strings with embedded
null bytes, but it is possible for something PCH cares about (such as a
string literal token in a macro definition) to contain such embedded nulls.

To fix that up, add a new GTY option "string_length" so that
gt_pch_note_object can be informed of the actual length it ought to use,
and use it in the relevant libcpp structs (cpp_string and ht_identifier)
accordingly.
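To make the failure mode concrete, here is a minimal sketch that models the size computation described above: without an explicit length, a plain string's size is strlen + 1, which stops at the first embedded null; with an override (using (size_t)-1 as "not provided", as the patch does for gt_pch_note_object), every byte is covered. The names `toy_string`, `no_length_override`, `pch_string_size` and `demo_embedded_null` are hypothetical stand-ins, not GCC code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for libcpp's cpp_string: LEN does not count the
// null terminator, as described in the patch's cpplib.h comment.
struct toy_string
{
  unsigned int len;
  const unsigned char *text;
};

// Sentinel meaning "no explicit length", mirroring the (size_t)-1 default
// argument that the patch adds to gt_pch_note_object.
constexpr std::size_t no_length_override = static_cast<std::size_t> (-1);

// Hypothetical model of the size computation for a plain string: use the
// override if one was given, otherwise strlen + 1 (which stops at the first
// null byte, so anything after an embedded null is lost).
std::size_t
pch_string_size (const void *obj,
                 std::size_t length_override = no_length_override)
{
  if (length_override != no_length_override)
    return length_override;
  return std::strlen (static_cast<const char *> (obj)) + 1;
}

// Usage: the bytes of the testcase's R"(ABC<NUL>[!])" literal.
void
demo_embedded_null ()
{
  static const unsigned char bytes[] = "ABC\0[!]";
  const toy_string s = { static_cast<unsigned int> (sizeof (bytes) - 1),
                         bytes };
  assert (s.len == 7);
  // strlen-based size: only "ABC" plus one null byte.
  assert (pch_string_size (s.text) == 4);
  // "1 + %h.len"-style size: every byte, final terminator included.
  assert (pch_string_size (s.text, 1 + s.len) == 8);
}
```

The sentinel-plus-default-argument design lets every existing caller keep its behavior unchanged, while only the new string-specific path passes a real length.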
gcc/ChangeLog:

	* gengtype.cc (output_escaped_param): Add missing const.
	(get_string_option): Add missing check for option type.
	(walk_type): Support new "string_length" GTY option.
	(write_types_process_field): Likewise.
	* ggc-common.cc (gt_pch_note_object): Add optional length argument.
	* ggc.h (gt_pch_note_object): Adjust prototype for new argument.
	(gt_pch_n_S2): Declare...
	* stringpool.cc (gt_pch_n_S2): ...new function.
	* doc/gty.texi: Document new GTY((string_length)) option.

libcpp/ChangeLog:

	* include/cpplib.h (struct cpp_string): Use new "string_length" GTY.
	* include/symtab.h (struct ht_identifier): Likewise.

gcc/testsuite/ChangeLog:

	* g++.dg/pch/pch-string-nulls.C: New test.
	* g++.dg/pch/pch-string-nulls.Hs: New test.
---

Notes:
    Hello-

    This fixes a small glitch with PCH files that I doubt matters in
    practice. However, I think the new GTY((string_length)) option should
    also be useful for other things (including for another patch I am
    working on), and the glitch seems worth fixing to me anyway. Please let
    me know if it looks OK, or if you would prefer another approach.

    I did consider reusing GTY((length)) for this purpose, but it seemed
    much more straightforward to do it with a new option, and it's really
    about something different, since it isn't related to marking of
    GC-controlled memory.

    BTW, the testcase (pch-string-nulls.Hs) needs to have a literal null
    byte in it. That wasn't emailing well, so I temporarily have it as the
    string "^@" in this patch, for illustration.

    Bootstrap + regtest all languages looks good on x86-64 Linux. Thanks!
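Because the testcase header must contain a literal null byte that does not survive email (shown as "^@"), one way to regenerate it locally is to write the bytes programmatically. This is a hypothetical helper sketch, not part of the patch; only the file name and its two lines of content come from the patch, and `make_pch_string_nulls_hs` / `write_pch_string_nulls_hs` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper (not part of the patch): build the exact bytes of
// pch-string-nulls.Hs, with a real null byte where the email shows "^@".
std::string
make_pch_string_nulls_hs ()
{
  std::string contents
    = "/* Note that there is a null byte following \"ABC\".  */\n"
      "#define X R\"(ABC";
  contents.push_back ('\0');  // The literal null byte inside the raw string.
  contents += "[!])\"\n";
  return contents;
}

// Hypothetical helper: write the contents in binary mode so the null byte
// is stored as-is.  Returns false if the stream reports an error.
bool
write_pch_string_nulls_hs (const std::string &path)
{
  const std::string data = make_pch_string_nulls_hs ();
  std::ofstream out (path, std::ios::binary);
  out.write (data.data (), static_cast<std::streamsize> (data.size ()));
  return static_cast<bool> (out);
}
```

Using std::string with push_back('\0') avoids relying on a C string literal, whose contents would be cut short at the null by any strlen-based handling.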
    -Lewis

 gcc/doc/gty.texi                             | 21 +++++++++++++++-
 gcc/gengtype.cc                              | 25 ++++++++++++++++----
 gcc/ggc-common.cc                            |  7 ++++--
 gcc/ggc.h                                    |  4 +++-
 gcc/stringpool.cc                            |  7 ++++++
 gcc/testsuite/g++.dg/pch/pch-string-nulls.C  |  3 +++
 gcc/testsuite/g++.dg/pch/pch-string-nulls.Hs |  2 ++
 libcpp/include/cpplib.h                      |  6 ++++-
 libcpp/include/symtab.h                      |  5 +++-
 9 files changed, 70 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pch/pch-string-nulls.C
 create mode 100644 gcc/testsuite/g++.dg/pch/pch-string-nulls.Hs

diff --git a/gcc/doc/gty.texi b/gcc/doc/gty.texi
index 81aafd11ce3..4f791b300ba 100644
--- a/gcc/doc/gty.texi
+++ b/gcc/doc/gty.texi
@@ -196,7 +196,26 @@ static GTY((length("reg_known_value_size"))) rtx *reg_known_value;
 Note that the @code{length} option is only meant for use with arrays of
 non-atomic objects, that is, objects that contain pointers pointing to
 other GTY-managed objects.  For other GC-allocated arrays and strings
-you should use @code{atomic}.
+you should use @code{atomic} or @code{string_length}.
+
+@findex string_length
+@item string_length ("@var{expression}")
+
+In order to simplify production of PCH, a structure member that is a plain
+array of bytes (an optionally @code{const} and/or @code{unsigned} @code{char
+*}) is treated specially by the infrastructure.  Even if such an array has not
+been allocated in GC-controlled memory, it will still be written properly into
+a PCH.  The machinery responsible for this needs to know the length of the
+data; by default, the length is determined by calling @code{strlen} on the
+pointer.  The @code{string_length} option specifies an alternate way to
+determine the length, such as by inspecting another struct member:
+
+@smallexample
+struct GTY(()) non_terminated_string @{
+  size_t sz;
+  const char * GTY((string_length ("%h.sz"))) data;
+@};
+@end smallexample
 
 @findex skip
 @item skip
diff --git a/gcc/gengtype.cc b/gcc/gengtype.cc
index 42363439bd8..28bf05e9c57 100644
--- a/gcc/gengtype.cc
+++ b/gcc/gengtype.cc
@@ -2403,7 +2403,7 @@ struct write_types_data
   enum write_types_kinds kind;
 };
 
-static void output_escaped_param (struct walk_type_data *d,
+static void output_escaped_param (const struct walk_type_data *d,
                                   const char *, const char *);
 static void output_mangled_typename (outf_p, const_type_p);
 static void walk_type (type_p t, struct walk_type_data *d);
@@ -2537,7 +2537,7 @@ output_mangled_typename (outf_p of, const_type_p t)
    print error messages.  */
 
 static void
-output_escaped_param (struct walk_type_data *d, const char *param,
+output_escaped_param (const struct walk_type_data *d, const char *param,
                       const char *oname)
 {
   const char *p;
@@ -2576,7 +2576,7 @@ const char *
 get_string_option (options_p opt, const char *key)
 {
   for (; opt; opt = opt->next)
-    if (strcmp (opt->name, key) == 0)
+    if (opt->kind == OPTION_STRING && strcmp (opt->name, key) == 0)
       return opt->info.string;
   return NULL;
 }
@@ -2700,6 +2700,8 @@ walk_type (type_p t, struct walk_type_data *d)
       ;
     else if (strcmp (oo->name, "callback") == 0)
       ;
+    else if (strcmp (oo->name, "string_length") == 0)
+      ;
     else
       error_at_line (d->line, "unknown option `%s'\n", oo->name);
 
@@ -3251,7 +3253,22 @@ write_types_process_field (type_p f, const struct walk_type_data *d)
         {
           oprintf (d->of, "%*sgt_%s_", d->indent, "", wtd->prefix);
           output_mangled_typename (d->of, f);
-          oprintf (d->of, " (%s%s);\n", cast, d->val);
+
+          /* Check if we need to call the special pch note version
+             for strings that takes an explicit length.  */
+          const auto length_override
+            = (f->kind == TYPE_STRING && !strcmp (wtd->prefix, "pch_n")
+               ? get_string_option (d->opt, "string_length")
+               : nullptr);
+          if (length_override)
+            {
+              oprintf (d->of, "2 (%s%s, ", cast, d->val);
+              output_escaped_param (d, length_override, "string_length");
+            }
+          else
+            oprintf (d->of, " (%s%s", cast, d->val);
+
+          oprintf (d->of, ");\n");
           if (d->reorder_fn && wtd->reorder_note_routine)
             oprintf (d->of, "%*s%s (%s%s, %s%s, %s);\n", d->indent, "",
                      wtd->reorder_note_routine, cast, d->val, cast, d->val,
diff --git a/gcc/ggc-common.cc b/gcc/ggc-common.cc
index 8b3389e8760..62da09d66a7 100644
--- a/gcc/ggc-common.cc
+++ b/gcc/ggc-common.cc
@@ -253,7 +253,8 @@ static vec reloc_addrs_vec;
 
 int
 gt_pch_note_object (void *obj, void *note_ptr_cookie,
-                    gt_note_pointers note_ptr_fn)
+                    gt_note_pointers note_ptr_fn,
+                    size_t length_override)
 {
   struct ptr_data **slot;
 
@@ -273,7 +274,9 @@ gt_pch_note_object (void *obj, void *note_ptr_cookie,
   (*slot)->obj = obj;
   (*slot)->note_ptr_fn = note_ptr_fn;
   (*slot)->note_ptr_cookie = note_ptr_cookie;
-  if (note_ptr_fn == gt_pch_p_S)
+  if (length_override != (size_t)-1)
+    (*slot)->size = length_override;
+  else if (note_ptr_fn == gt_pch_p_S)
     (*slot)->size = strlen ((const char *)obj) + 1;
   else
     (*slot)->size = ggc_get_size (obj);
diff --git a/gcc/ggc.h b/gcc/ggc.h
index aeec1bafb9b..7bc74ec82b5 100644
--- a/gcc/ggc.h
+++ b/gcc/ggc.h
@@ -44,7 +44,8 @@ typedef void (*gt_handle_reorder) (void *, void *, gt_pointer_operator,
                                    void *);
 
 /* Used by the gt_pch_n_* routines.  Register an object in the hash table.  */
-extern int gt_pch_note_object (void *, void *, gt_note_pointers);
+extern int gt_pch_note_object (void *, void *, gt_note_pointers,
+                               size_t length_override = (size_t)-1);
 
 /* Used by the gt_pch_p_* routines.  Register address of a callback
    pointer.  */
@@ -101,6 +102,7 @@ extern int ggc_marked_p (const void *);
 
 /* PCH and GGC handling for strings, mostly trivial.  */
 extern void gt_pch_n_S (const void *);
+extern void gt_pch_n_S2 (const void *, size_t);
 extern void gt_ggc_m_S (const void *);
 
 /* End of GTY machinery API.  */
diff --git a/gcc/stringpool.cc b/gcc/stringpool.cc
index 57509d58e15..20dbef5580c 100644
--- a/gcc/stringpool.cc
+++ b/gcc/stringpool.cc
@@ -196,6 +196,13 @@ gt_pch_n_S (const void *x)
                       &gt_pch_p_S);
 }
 
+void
+gt_pch_n_S2 (const void *x, size_t string_len)
+{
+  gt_pch_note_object (CONST_CAST (void *, x), CONST_CAST (void *, x),
+                      &gt_pch_p_S, string_len);
+}
+
 
 /* User-callable entry point for marking string X.  */
 
diff --git a/gcc/testsuite/g++.dg/pch/pch-string-nulls.C b/gcc/testsuite/g++.dg/pch/pch-string-nulls.C
new file mode 100644
index 00000000000..dfeb21adf71
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pch/pch-string-nulls.C
@@ -0,0 +1,3 @@
+// { dg-do compile { target c++11 } }
+#include "pch-string-nulls.H"
+static_assert (X[4] == '[' && X[5] == '!' && X[6] == ']', "error");
diff --git a/gcc/testsuite/g++.dg/pch/pch-string-nulls.Hs b/gcc/testsuite/g++.dg/pch/pch-string-nulls.Hs
new file mode 100644
index 00000000000..8f8bc187f8c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pch/pch-string-nulls.Hs
@@ -0,0 +1,2 @@
+/* Note that there is a null byte following "ABC".  */
+#define X R"(ABC^@[!])"
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index d5ef12a30ea..1d34c00669f 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -179,7 +179,11 @@ enum c_lang {CLK_GNUC89 = 0, CLK_GNUC99, CLK_GNUC11, CLK_GNUC17, CLK_GNUC2X,
 /* Payload of a NUMBER, STRING, CHAR or COMMENT token.  */
 struct GTY(()) cpp_string {
   unsigned int len;
-  const unsigned char *text;
+
+  /* TEXT is always null terminated (terminator not included in len); but this
+     GTY markup arranges that PCH streaming works properly even if there is a
+     null byte in the middle of the string.  */
+  const unsigned char * GTY((string_length ("1 + %h.len"))) text;
 };
 
 /* Flags for the cpp_token structure.  */
diff --git a/libcpp/include/symtab.h b/libcpp/include/symtab.h
index 53efe6c3943..8b45fd5c2ce 100644
--- a/libcpp/include/symtab.h
+++ b/libcpp/include/symtab.h
@@ -29,7 +29,10 @@ along with this program; see the file COPYING3.  If not see
 typedef struct ht_identifier ht_identifier;
 typedef struct ht_identifier *ht_identifier_ptr;
 struct GTY(()) ht_identifier {
-  const unsigned char *str;
+  /* This GTY markup arranges that the null-terminated identifier would still
+     stream to PCH correctly, if a null byte were to make its way into an
+     identifier somehow.  */
+  const unsigned char * GTY((string_length ("1 + %h.len"))) str;
   unsigned int len;
   unsigned int hash_value;
 };
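The write_types_process_field change decides, per string field, whether the generated PCH walker calls gt_pch_n_S or the new gt_pch_n_S2 with a length expression. The following is a simplified, hypothetical model of that decision, not gengtype's actual code. It assumes that "%h" in the option text stands for the structure holding the field (as the GTY documentation describes for expression-valued options) and that the holder expression looks like "(*x)"; `substitute_holder` and `emit_string_note` are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical: replace every "%h" in EXPR with HOLDER.  The real
// output_escaped_param handles more escapes than this.
std::string
substitute_holder (const std::string &expr, const std::string &holder)
{
  std::string out;
  for (std::size_t i = 0; i < expr.size (); ++i)
    {
      if (expr[i] == '%' && i + 1 < expr.size () && expr[i + 1] == 'h')
        {
          out += holder;
          ++i;  // Skip the 'h' as well.
        }
      else
        out += expr[i];
    }
  return out;
}

// Hypothetical: the call emitted for string field FIELD of HOLDER during
// the walk named by PREFIX.  LENGTH_EXPR is the string_length option text,
// or empty if the field has no such option.
std::string
emit_string_note (const std::string &prefix, const std::string &holder,
                  const std::string &field, const std::string &length_expr)
{
  const std::string val = holder + "." + field;
  std::string call = "gt_" + prefix + "_S";
  // Only the PCH note walk uses the explicit-length variant, as in the patch;
  // GC marking ignores the option.
  if (prefix == "pch_n" && !length_expr.empty ())
    call += "2 (" + val + ", " + substitute_holder (length_expr, holder);
  else
    call += " (" + val;
  return call + ");";
}
```

For cpp_string's `text` field with `string_length ("1 + %h.len")`, this model yields `gt_pch_n_S2 ((*x).text, 1 + (*x).len);` for the PCH walk, while the GC marking walk still gets a plain `gt_ggc_m_S ((*x).text);`.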