From patchwork Wed Sep 7 22:04:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lewis Hyatt X-Patchwork-Id: 1080 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5044:0:0:0:0:0 with SMTP id h4csp1342559wrt; Wed, 7 Sep 2022 15:05:34 -0700 (PDT) X-Google-Smtp-Source: AA6agR5FtzO3BPIGyRDoLmGm7xoJ6++l8kgqNDHZ0mXUPyHXhili6mxFR2La8KulgmI3P9E9mAlA X-Received: by 2002:a17:907:7f91:b0:741:a071:cdb3 with SMTP id qk17-20020a1709077f9100b00741a071cdb3mr3695935ejc.522.1662588334481; Wed, 07 Sep 2022 15:05:34 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1662588334; cv=none; d=google.com; s=arc-20160816; b=pYvMhT+lItF9EXmvobPMu1EraDMgHbaEpioK98IsritqjESSJ8WU5sWKVjpAR6sowK 2DfZFqvTvJjLVdA1vR0hIGiYo0UM66e/KYFyWUBI5Mxm2XAhVqipr8bx1NYZvDZMov3/ IKGznzG+kbupA2cKxEV8okwQhCqqb3c8joQepTyKPd3+jp+IowE3MXGco7z7mLVKyI82 miOYQTRdeJW/2eE484HJiUTtWLRlhy5p1azI54/a4Rvpadq41itzp2ZDhybVG/+z8ckm qEDw6HG4Rpm+6F0fvFVwU+tlpNVwxff/iko/K6FKjm1wyG4Po1MpQrzyKa9qUroXe6mO Yisw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:reply-to:from:list-subscribe:list-help:list-post :list-archive:list-unsubscribe:list-id:precedence :content-transfer-encoding:mime-version:message-id:date:subject:to :dmarc-filter:delivered-to:dkim-signature:dkim-filter; bh=HbH3Pc9BCBTSfmkzLkYHU1ijJWl9E8GswL9Opf1LtQc=; b=FZqFzrhAxV0Y6AMcgS597BULlW67xg7yOnWtwj5jAS51cs1TkmucQbvrSWeo2M7479 bjuy/FvTbBxarHhhdZi/hG6cxncpY9EoSP3ErZVtSQbW9Vgdtgun7qdGO7hRwvgR9/8e thxmYo81MDE3yEN66HmYr06jgVs+UHH8XJdWAFIthmUvWU2KOTuBN6o2EjbTqjDSyMay /WHs4Ma/AKXpZ+ZfMDD5ARQpm0ghVORcHgSGAhhoquUbv82yZyQJzYQLfXI6otM5xAkH dY2xJsiNTm/KvNjQQHXsIDIncv5zy5DoPSfExs1GhqEJ/q3Z0qbIJyf5RdTzg4Jy4AYa dX8g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=CoFSAvvE; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from sourceware.org (server2.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id j21-20020a17090643d500b0073092b8be95si363443ejn.705.2022.09.07.15.05.34 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 07 Sep 2022 15:05:34 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=CoFSAvvE; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 73593385AC1C for ; Wed, 7 Sep 2022 22:05:33 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 73593385AC1C DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1662588333; bh=HbH3Pc9BCBTSfmkzLkYHU1ijJWl9E8GswL9Opf1LtQc=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=CoFSAvvEzkmrziOrsWoLRKCvJUXCkCvLhfZPdltVb64jFfaEcqX2gFDd2sbGMmQ/J VS5Rbfb+Xdu7XYhmU8xRFI8OC6UYymuwydBwEq38L9gfty+4px/Nza7Eq7aA83wk0I yw0rny3xsuUZiB7+3WOBueTvZc4LlBU6CyRs1sqg= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-qv1-xf29.google.com (mail-qv1-xf29.google.com [IPv6:2607:f8b0:4864:20::f29]) by sourceware.org (Postfix) with ESMTPS id E1C573858407 for ; Wed, 7 Sep 2022 22:04:48 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E1C573858407 Received: by mail-qv1-xf29.google.com with SMTP id g4so1968607qvo.3 for ; Wed, 07 Sep 2022 15:04:48 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date; bh=HbH3Pc9BCBTSfmkzLkYHU1ijJWl9E8GswL9Opf1LtQc=; b=hc4ge0H4e4WshlXZeLZoqpKHhO1G4JAAb/W9oqK+7fvBOZkqjoqf3gbRHYe4DKaxlp sljhQ8ZXY6JW0NXwd5Gbv4L58SPIBeL/gYsLp+b3FE4PYTHoF7warNjrBXD+sLFAvbRR YpXtjBogTqNYgY1OZyNeiF0K1bH8k+UVpYSJwA/MdsX6Q1UtSFfHlnklq78cDofFS81h a1mN5qIP6k4YmSos9tsxMsdogGvsKaOH50pm64M+7YpIHG6Ox3RvebGSmcmuzmSfTz3n 8SEXQumRMwl26G5Me8lcWrmChzJrfCaKU2z1F8LMFiJ58iLtkCkCC7WpKMadSTBjj6fd OUaA== X-Gm-Message-State: ACgBeo3txiq/9stleiaUS8YBDzND+aXZsfkC6+hle/s5Jh9ghSAzSjR9 4Q8EWMN2HuLvLVC0T9yRoloRszC+OnE= X-Received: by 2002:ad4:5def:0:b0:476:c97f:e28d with SMTP id jn15-20020ad45def000000b00476c97fe28dmr5132934qvb.42.1662588288031; Wed, 07 Sep 2022 15:04:48 -0700 (PDT) Received: from localhost.localdomain (96-67-140-173-static.hfc.comcastbusiness.net. [96.67.140.173]) by smtp.gmail.com with ESMTPSA id q29-20020a37f71d000000b006b615cd8c13sm14396661qkj.106.2022.09.07.15.04.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 07 Sep 2022 15:04:47 -0700 (PDT) To: gcc-patches@gcc.gnu.org Subject: [PATCH] pch: Fix the reconstruction of adhoc data hash table Date: Wed, 7 Sep 2022 18:04:36 -0400 Message-Id: X-Mailer: git-send-email 2.25.1 MIME-Version: 1.0 X-Spam-Status: No, score=-3039.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Lewis Hyatt via Gcc-patches From: Lewis Hyatt Reply-To: Lewis Hyatt Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1743350225004673261?= X-GMAIL-MSGID: =?utf-8?q?1743350225004673261?= The function rebuild_location_adhoc_htab() was meant to reconstruct the adhoc location hash map after restoring a line_maps instance from a PCH. However, the function has never performed as intended because it missed the last step of adding the data into the newly reconstructed hash map. This patch fixes that. It does not seem possible to construct a test case such that the current incorrect behavior is observable as a compiler issue. It would be observable, if it were possible for a precompiled header to contain an adhoc location with a non-zero custom data pointer. But currently, such data pointers are used only by the middle end to track inlining information, and this happens later, too late to show up in a PCH. I also noted that location_adhoc_data_update, which updates the hash map pointers in a different scenario, was relying on undefined pointer arithmetic behavior. I'm not aware of this having caused any issue in practice, but in this patch I have also changed it to use defined pointer operations instead. libcpp/ChangeLog: * line-map.cc (location_adhoc_data_update): Remove reliance on undefined behavior. (get_combined_adhoc_loc): Likewise. (rebuild_location_adhoc_htab): Fix issue where the htab was not properly updated. --- Notes: Hello- While working on something unrelated in line-map.cc, I noticed that the function rebuild_location_adhoc_htab(), whose job is to reconstruct the adhoc data hash table after a line_maps instance is reconstructed from PCH, doesn't actually rebuild the hash table at all: void rebuild_location_adhoc_htab (line_maps *set) { unsigned i; set->location_adhoc_data_map.htab = htab_create (100, location_adhoc_data_hash, location_adhoc_data_eq, NULL); for (i = 0; i < set->location_adhoc_data_map.curr_loc; i++) htab_find_slot (set->location_adhoc_data_map.htab, set->location_adhoc_data_map.data + i, INSERT); ^^^^^^^^^^^^^^ } In order to have the intended effect, it needs to set the return value of htab_find_slot to be set->location_adhoc_data_map.data + i, otherwise it doesn't effectively do anything except make the hash table think it has curr_loc elements set, when in fact it has 0. Subsequent calls to htab_traverse, for instance, will do nothing, and any lookups will also fail. I tried for some time to construct a test case that would demonstrate an observable consequence of this issue, but I don't think it's possible... The nontrivial uses of this hash map are in the middle end (e.g. to produce the trace of where a given expression was inlined from), and all this happens after PCH was read in, and doesn't require any state to be read from the PCH. It would become apparent in the future, however, if the ability to attach arbitrary data to an adhoc location were used in other ways, perhaps somewhere in libcpp; in that hypothetical case, the data would be lost when reading back in the PCH. There is another kinda related function, location_adhoc_data_update, which updates all the pointers in the hash map whenever they are invalidated. It seems to me, that this function invokes undefined behavior, since it adds an arbitrary offset to the pointers, which do not necessarily point into the same array after they were realloced. I don't think it's led to any problems in practice but in this patch I also changed that to use well-defined operations. Note sure how people may feel about that, since it does require on the surface 2x as many operations with this change, but I can't see how the current approach is guaranteed to be valid on all architectures? Bootstrap + regtest looks good on x86-64 Linux. Thanks a lot to whoever may have time to take a look at it. -Lewis libcpp/line-map.cc | 41 +++++++++++++++++++++++++++-------------- 1 file changed, 27 insertions(+), 14 deletions(-) diff --git a/libcpp/line-map.cc b/libcpp/line-map.cc index 62077c3857c..391f1d4bbc1 100644 --- a/libcpp/line-map.cc +++ b/libcpp/line-map.cc @@ -85,27 +85,38 @@ location_adhoc_data_eq (const void *l1, const void *l2) && lb1->data == lb2->data); } -/* Update the hashtable when location_adhoc_data is reallocated. */ +/* Update the hashtable when location_adhoc_data_map::data is reallocated. + The param is an array of two pointers, the previous value of the data + pointer, and then the new value. The pointers stored in the hash map + are then rebased to be relative to the new data pointer instead of the + old one. */ static int -location_adhoc_data_update (void **slot, void *data) +location_adhoc_data_update (void **slot_v, void *param_v) { - *((char **) slot) - = (char *) ((uintptr_t) *((char **) slot) + *((ptrdiff_t *) data)); + const auto slot = reinterpret_cast (slot_v); + const auto param = static_cast (param_v); + *slot = (*slot - param[0]) + param[1]; return 1; } -/* Rebuild the hash table from the location adhoc data. */ +/* The adhoc data hash table is not part of the GGC infrastructure, so it was + not initialized when SET was reconstructed from PCH; take care of that by + rebuilding it from scratch. */ void rebuild_location_adhoc_htab (line_maps *set) { - unsigned i; set->location_adhoc_data_map.htab = htab_create (100, location_adhoc_data_hash, location_adhoc_data_eq, NULL); - for (i = 0; i < set->location_adhoc_data_map.curr_loc; i++) - htab_find_slot (set->location_adhoc_data_map.htab, - set->location_adhoc_data_map.data + i, INSERT); + for (auto p = set->location_adhoc_data_map.data, + end = p + set->location_adhoc_data_map.curr_loc; + p != end; ++p) + { + const auto slot = reinterpret_cast + (htab_find_slot (set->location_adhoc_data_map.htab, p, INSERT)); + *slot = p; + } } /* Helper function for get_combined_adhoc_loc. @@ -211,8 +222,7 @@ get_combined_adhoc_loc (line_maps *set, if (set->location_adhoc_data_map.curr_loc >= set->location_adhoc_data_map.allocated) { - char *orig_data = (char *) set->location_adhoc_data_map.data; - ptrdiff_t offset; + const auto orig_data = set->location_adhoc_data_map.data; /* Cast away extern "C" from the type of xrealloc. */ line_map_realloc reallocator = (set->reallocator ? set->reallocator @@ -226,10 +236,13 @@ get_combined_adhoc_loc (line_maps *set, reallocator (set->location_adhoc_data_map.data, set->location_adhoc_data_map.allocated * sizeof (struct location_adhoc_data)); - offset = (char *) (set->location_adhoc_data_map.data) - orig_data; if (set->location_adhoc_data_map.allocated > 128) - htab_traverse (set->location_adhoc_data_map.htab, - location_adhoc_data_update, &offset); + { + location_adhoc_data *param[2] + = {orig_data, set->location_adhoc_data_map.data}; + htab_traverse (set->location_adhoc_data_map.htab, + location_adhoc_data_update, param); + } } *slot = set->location_adhoc_data_map.data + set->location_adhoc_data_map.curr_loc;