From patchwork Fri Mar 17 20:53:49 2023
X-Patchwork-Submitter: David Malcolm
X-Patchwork-Id: 71430
From: David Malcolm
To: gcc-patches@gcc.gnu.org
Cc: David Malcolm
Subject: [PATCH] json: preserve key-insertion order [PR109163]
Date: Fri, 17 Mar 2023 16:53:49 -0400
Message-Id: <20230317205349.3635562-1-dmalcolm@redhat.com>

PR other/109163 notes that when we write out JSON files, we traverse the
keys within each object via hash_map iteration, and thus the ordering is
non-deterministic: it can vary arbitrarily from run to run and between
machines, making it harder for users to compare results and determine if
anything has "really" changed.

I'm running into this issue with SARIF output, but there are several
places where we currently emit JSON:
* -fsave-optimization-record emits SRCFILE.opt-record.json.gz
  ("This option is experimental and the format of the data within the
  compressed JSON file is subject to change."); see
  optinfo-emit-json.{h,cc}, dumpfile.cc, etc.
* -fdiagnostics-format= with the various "sarif" and "json" options
* -fdump-analyzer-json, a developer option in the analyzer
* gcov has: "-j, --json-format: Output JSON intermediate format into
  .gcov.json.gz file"

This patch adds an auto_vec to class json::object to preserve
key-insertion order, and uses it when writing out objects.  This may
slightly slow down JSON output, but I believe that this isn't normally
a bottleneck, and that the benefits to the user of deterministic output
are worth it.

I first attempted to use ordered_hash_map.h for this, but ran into
impenetrable template errors, so this patch takes the simpler approach
of just adding an auto_vec to json::object.

Testing showed a failure of diagnostic-format-json-5.c, which used a
convoluted set of regexps to consume the output; I believe that this
was brittle, and was intermittently failing for some of the random
orderings of output.  I rewrote these regexps to work with the expected
output order.  The other such tests seem to pass with the
now-deterministic orderings.

Lightly tested with valgrind.  I manually verified that the SARIF output
is now deterministic.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

OK for trunk?
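The approach described above (a hash map for lookup plus a vector recording the order in which keys were first inserted) can be sketched in standalone C++. This is an illustrative sketch only, not GCC's code: the class name `ordered_object` is hypothetical, `std::unordered_map` and `std::vector` stand in for GCC's `hash_map` and `auto_vec`, and values are plain strings rather than owned `json::value` pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for json::object (illustration only): a hash map
// gives fast lookup, and a vector records the order in which keys were
// first inserted.
class ordered_object
{
public:
  // Modelled on the patched object::set: setting an existing key replaces
  // its value in place; only a new key is appended to m_keys.
  void set (const std::string &key, const std::string &value)
  {
    auto it = m_map.find (key);
    if (it != m_map.end ())
      it->second = value;
    else
      {
        m_map.emplace (key, value);
        m_keys.push_back (key);
      }
  }

  // Modelled on the patched object::print: iterate over m_keys rather
  // than m_map, so the output does not depend on the hash table's layout.
  // As in the original, keys and values are not escaped.
  std::string print () const
  {
    std::string out = "{";
    for (std::size_t i = 0; i < m_keys.size (); ++i)
      {
        if (i > 0)
          out += ", ";
        const std::string &key = m_keys[i];
        out += "\"" + key + "\": \"" + m_map.at (key) + "\"";
      }
    return out + "}";
  }

private:
  std::unordered_map<std::string, std::string> m_map;
  std::vector<std::string> m_keys;
};
```

As in the patched `object::set`, re-setting an existing key keeps its original position: after `set ("foo", "bar")`, `set ("baz", "quux")` and `set ("foo", "x")`, `print ()` returns `{"foo": "x", "baz": "quux"}`. In this sketch each key string is stored twice (once in the map, once in the vector), trading a little memory for deterministic output.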

gcc/ChangeLog:

	PR other/109163
	* json.cc: Update comments to indicate that we now preserve
	insertion order of keys within objects.
	(object::print): Traverse keys in insertion order.
	(object::set): Preserve insertion order of keys.
	(selftest::test_writing_objects): Add an additional key to verify
	that we preserve insertion order.
	* json.h (object::m_keys): New field.

gcc/testsuite/ChangeLog:

	PR other/109163
	* c-c++-common/diagnostic-format-json-1.c: Update comment.
	* c-c++-common/diagnostic-format-json-2.c: Likewise.
	* c-c++-common/diagnostic-format-json-3.c: Likewise.
	* c-c++-common/diagnostic-format-json-4.c: Likewise.
	* c-c++-common/diagnostic-format-json-5.c: Rewrite regexps.
	* c-c++-common/diagnostic-format-json-stderr-1.c: Update comment.

Signed-off-by: David Malcolm
---
 gcc/json.cc                                 |  40 ++++---
 gcc/json.h                                  |  10 +-
 .../c-c++-common/diagnostic-format-json-1.c |   3 +-
 .../c-c++-common/diagnostic-format-json-2.c |   3 +-
 .../c-c++-common/diagnostic-format-json-3.c |   3 +-
 .../c-c++-common/diagnostic-format-json-4.c |   3 +-
 .../c-c++-common/diagnostic-format-json-5.c | 100 ++++++++++--------
 .../diagnostic-format-json-stderr-1.c       |   3 +-
 8 files changed, 95 insertions(+), 70 deletions(-)

diff --git a/gcc/json.cc b/gcc/json.cc
index 01879c9c466..741e97b20e5 100644
--- a/gcc/json.cc
+++ b/gcc/json.cc
@@ -31,8 +31,11 @@ using namespace json;
 /* class json::value.  */
 
 /* Dump this json::value tree to OUTF.
-   No formatting is done.  There are no guarantees about the order
-   in which the key/value pairs of json::objects are printed.  */
+
+   No formatting is done.
+
+   The key/value pairs of json::objects are printed in the order
+   in which the keys were originally inserted.  */
 
 void
 value::dump (FILE *outf) const
@@ -44,7 +47,7 @@ value::dump (FILE *outf) const
 }
 
 /* class json::object, a subclass of json::value, representing
-   an unordered collection of key/value pairs.  */
+   an ordered collection of key/value pairs.  */
 
 /* json:object's dtor.  */
 
@@ -62,14 +65,17 @@ object::~object ()
 void
 object::print (pretty_printer *pp) const
 {
-  /* Note that the order is not guaranteed.  */
   pp_character (pp, '{');
-  for (map_t::iterator it = m_map.begin (); it != m_map.end (); ++it)
+
+  /* Iterate in the order that the keys were inserted.  */
+  unsigned i;
+  const char *key;
+  FOR_EACH_VEC_ELT (m_keys, i, key)
     {
-      if (it != m_map.begin ())
+      if (i > 0)
         pp_string (pp, ", ");
-      const char *key = const_cast ((*it).first);
-      value *value = (*it).second;
+      map_t &mut_map = const_cast (m_map);
+      value *value = *mut_map.get (key);
       pp_doublequote (pp);
       pp_string (pp, key); // FIXME: escaping?
       pp_doublequote (pp);
@@ -97,9 +103,13 @@ object::set (const char *key, value *v)
       *ptr = v;
     }
   else
-    /* If the key wasn't already present, take a copy of the key,
-       and store the value.  */
-    m_map.put (xstrdup (key), v);
+    {
+      /* If the key wasn't already present, take a copy of the key,
+         and store the value.  */
+      char *owned_key = xstrdup (key);
+      m_map.put (owned_key, v);
+      m_keys.safe_push (owned_key);
+    }
 }
 
 /* Get the json::value * for KEY.
@@ -295,15 +305,17 @@ test_object_get ()
   ASSERT_EQ (obj.get ("not-present"), NULL);
 }
 
-/* Verify that JSON objects are written correctly.  We can't test more than
-   one key/value pair, as we don't impose a guaranteed ordering.  */
+/* Verify that JSON objects are written correctly.  */
 
 static void
 test_writing_objects ()
 {
   object obj;
   obj.set ("foo", new json::string ("bar"));
-  assert_print_eq (obj, "{\"foo\": \"bar\"}");
+  obj.set ("baz", new json::string ("quux"));
+  /* This test relies on json::object writing out key/value pairs
+     in key-insertion order.  */
+  assert_print_eq (obj, "{\"foo\": \"bar\", \"baz\": \"quux\"}");
 }
 
 /* Verify that JSON arrays are written correctly.  */
diff --git a/gcc/json.h b/gcc/json.h
index aa52ba2951a..057119db277 100644
--- a/gcc/json.h
+++ b/gcc/json.h
@@ -82,8 +82,11 @@ class value
   void dump (FILE *) const;
 };
 
-/* Subclass of value for objects: an unordered collection of
-   key/value pairs.  */
+/* Subclass of value for objects: a collection of key/value pairs
+   preserving the ordering in which keys were inserted.
+
+   Preserving the order eliminates non-determinism in the output,
+   making it easier for the user to compare repeated invocations.  */
 
 class object : public value
 {
@@ -100,6 +103,9 @@ class object : public value
   typedef hash_map > map_t;
   map_t m_map;
+
+  /* Keep track of order in which keys were inserted.  */
+  auto_vec m_keys;
 };
 
 /* Subclass of value for arrays.  */
diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-1.c b/gcc/testsuite/c-c++-common/diagnostic-format-json-1.c
index af57eb636d5..6bab30e3e6c 100644
--- a/gcc/testsuite/c-c++-common/diagnostic-format-json-1.c
+++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-1.c
@@ -4,8 +4,7 @@
 #error message
 
 /* Use dg-regexp to consume the JSON output starting with
-   the innermost values, and working outwards.
-   We can't rely on any ordering of the keys.  */
+   the innermost values, and working outwards.  */
 
 /* { dg-regexp "\"kind\": \"error\"" } */
 /* { dg-regexp "\"column-origin\": 1" } */
diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-2.c b/gcc/testsuite/c-c++-common/diagnostic-format-json-2.c
index edb802efb8d..3c12103c9f8 100644
--- a/gcc/testsuite/c-c++-common/diagnostic-format-json-2.c
+++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-2.c
@@ -4,8 +4,7 @@
 #warning message
 
 /* Use dg-regexp to consume the JSON output starting with
-   the innermost values, and working outwards.
-   We can't rely on any ordering of the keys.  */
+   the innermost values, and working outwards.  */
 
 /* { dg-regexp "\"kind\": \"warning\"" } */
 /* { dg-regexp "\"column-origin\": 1" } */
diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-3.c b/gcc/testsuite/c-c++-common/diagnostic-format-json-3.c
index bb7b8dc5d16..11d74624ff1 100644
--- a/gcc/testsuite/c-c++-common/diagnostic-format-json-3.c
+++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-3.c
@@ -4,8 +4,7 @@
 #warning message
 
 /* Use dg-regexp to consume the JSON output starting with
-   the innermost values, and working outwards.
-   We can't rely on any ordering of the keys.  */
+   the innermost values, and working outwards.  */
 
 /* { dg-regexp "\"kind\": \"error\"" } */
 /* { dg-regexp "\"column-origin\": 1" } */
diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-4.c b/gcc/testsuite/c-c++-common/diagnostic-format-json-4.c
index 8ac90723cbd..cec1cf924b4 100644
--- a/gcc/testsuite/c-c++-common/diagnostic-format-json-4.c
+++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-4.c
@@ -10,8 +10,7 @@ int test (void)
 }
 
 /* Use dg-regexp to consume the JSON output starting with
-   the innermost values, and working outwards.
-   We can't rely on any ordering of the keys.  */
+   the innermost values, and working outwards.  */
 
 /* Verify nested diagnostics.  */
 
diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-5.c b/gcc/testsuite/c-c++-common/diagnostic-format-json-5.c
index 8d2eb0c5089..86f8c5fb374 100644
--- a/gcc/testsuite/c-c++-common/diagnostic-format-json-5.c
+++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-5.c
@@ -8,49 +8,61 @@ int test (struct s *ptr)
   return ptr->colour;
 }
 
-/* Use dg-regexp to consume the JSON output starting with
-   the innermost values, and working outwards.
-   We can't rely on any ordering of the keys.  */
+/* Verify fix-it hints.
 
-/* { dg-regexp "\"kind\": \"error\"" } */
-/* { dg-regexp "\"column-origin\": 1" } */
-/* { dg-regexp "\"escape-source\": false" } */
-/* { dg-regexp "\"message\": \".*\"" } */
+   Use dg-regexp to consume the JSON output from start to
+   finish, relying on the ordering of the keys.
+   The following uses indentation to visualize the structure
+   of the JSON (although the actual output is all on one line).
 
-/* Verify fix-it hints.  */
-
-/* { dg-regexp "\"string\": \"color\"" } */
-
-/* { dg-regexp "\"start\": \{" } */
-/* { dg-regexp "\"file\": \"\[^\n\r\"\]*diagnostic-format-json-5.c\"" } */
-/* { dg-regexp "\"line\": 8" } */
-/* { dg-regexp "\"column\": 15" } */
-/* { dg-regexp "\"display-column\": 15" } */
-/* { dg-regexp "\"byte-column\": 15" } */
-
-/* { dg-regexp "\"next\": \{" } */
-/* { dg-regexp "\"file\": \"\[^\n\r\"\]*diagnostic-format-json-5.c\"" } */
-/* { dg-regexp "\"line\": 8" } */
-/* { dg-regexp "\"column\": 21" } */
-/* { dg-regexp "\"display-column\": 21" } */
-/* { dg-regexp "\"byte-column\": 21" } */
-
-/* { dg-regexp "\"fixits\": \[\[\{\}, \]*\]" } */
-
-/* { dg-regexp "\"caret\": \{" } */
-/* { dg-regexp "\"file\": \"\[^\n\r\"\]*diagnostic-format-json-5.c\"" } */
-/* { dg-regexp "\"line\": 8" } */
-/* { dg-regexp "\"column\": 15" } */
-/* { dg-regexp "\"display-column\": 15" } */
-/* { dg-regexp "\"byte-column\": 15" } */
-
-/* { dg-regexp "\"finish\": \{" } */
-/* { dg-regexp "\"file\": \"\[^\n\r\"\]*diagnostic-format-json-5.c\"" } */
-/* { dg-regexp "\"line\": 8" } */
-/* { dg-regexp "\"column\": 20" } */
-/* { dg-regexp "\"display-column\": 20" } */
-/* { dg-regexp "\"byte-column\": 20" } */
-
-/* { dg-regexp "\"locations\": \[\[\{\}, \]*\]" } */
-/* { dg-regexp "\"children\": \[\[\]\[\]\]" } */
-/* { dg-regexp "\[\[\{\}, \]*\]" } */
+   { dg-regexp {\[} }
+     { dg-regexp {\{} }
+       { dg-regexp {"kind": "error"} }
+       { dg-regexp {, "message": "'struct s' has no member named 'colour'; did you mean 'color'\?"} }
+       { dg-regexp {, "children": \[\]} }
+       { dg-regexp {, "column-origin": 1} }
+       { dg-regexp {, "locations": } }
+         { dg-regexp {\[} }
+           { dg-regexp {\{} }
+             { dg-regexp {"caret": } }
+               { dg-regexp {\{} }
+                 { dg-regexp {"file": "[^\n\r"]*diagnostic-format-json-5.c"} }
+                 { dg-regexp {, "line": 8} }
+                 { dg-regexp {, "display-column": 15} }
+                 { dg-regexp {, "byte-column": 15} }
+                 { dg-regexp {, "column": 15} }
+               { dg-regexp {\}} }
+             { dg-regexp {, "finish": } }
+               { dg-regexp {\{} }
+                 { dg-regexp {"file": "[^\n\r"]*diagnostic-format-json-5.c"} }
+                 { dg-regexp {, "line": 8} }
+                 { dg-regexp {, "display-column": 20} }
+                 { dg-regexp {, "byte-column": 20} }
+                 { dg-regexp {, "column": 20} }
+               { dg-regexp {\}} }
+           { dg-regexp {\}} }
+         { dg-regexp {\]} }
+       { dg-regexp {, "fixits": } }
+         { dg-regexp {\[} }
+           { dg-regexp {\{} }
+             { dg-regexp {"start": } }
+               { dg-regexp {\{} }
+                 { dg-regexp {"file": "[^\n\r"]*diagnostic-format-json-5.c"} }
+                 { dg-regexp {, "line": 8} }
+                 { dg-regexp {, "display-column": 15} }
+                 { dg-regexp {, "byte-column": 15} }
+                 { dg-regexp {, "column": 15} }
+               { dg-regexp {\}} }
+             { dg-regexp {, "next": } }
+               { dg-regexp {\{} }
+                 { dg-regexp {"file": "[^\n\r"]*diagnostic-format-json-5.c"} }
+                 { dg-regexp {, "line": 8} }
+                 { dg-regexp {, "display-column": 21} }
+                 { dg-regexp {, "byte-column": 21} }
+                 { dg-regexp {, "column": 21} }
+               { dg-regexp {\}} }
+             { dg-regexp {, "string": "color"} }
+           { dg-regexp {\}} }
+         { dg-regexp {\]} }
+       { dg-regexp {, "escape-source": false\}} }
+   { dg-regexp {\]} } */
diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-stderr-1.c b/gcc/testsuite/c-c++-common/diagnostic-format-json-stderr-1.c
index 02f780bce10..bcfa92110f5 100644
--- a/gcc/testsuite/c-c++-common/diagnostic-format-json-stderr-1.c
+++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-stderr-1.c
@@ -6,8 +6,7 @@
 #error message
 
 /* Use dg-regexp to consume the JSON output starting with
-   the innermost values, and working outwards.
-   We can't rely on any ordering of the keys.  */
+   the innermost values, and working outwards.  */
 
 /* { dg-regexp "\"kind\": \"error\"" } */
 /* { dg-regexp "\"column-origin\": 1" } */
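The rewritten diagnostic-format-json-5.c expectations can spell out the output piece by piece only because the key order is now fixed. The following C++ sketch models that idea with a hypothetical helper, `consume_in_order`, under a deliberately simplified assumption: each pattern must match exactly at the current position and consumes what it matched. This is not how DejaGnu processes `dg-regexp`; it only illustrates why a fixed key order makes such a sequence of patterns reliable.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (simplified model, not DejaGnu's dg-regexp logic):
// each pattern must match immediately at the current position in OUTPUT,
// and the matched text is consumed before the next pattern is tried.
// Returns true only if every pattern matches in sequence and no text is
// left over afterwards.
bool
consume_in_order (const std::string &output,
                  const std::vector<std::string> &patterns)
{
  std::string::const_iterator pos = output.cbegin ();
  for (const std::string &pattern : patterns)
    {
      std::smatch m;
      // match_continuous requires the match to begin exactly at POS.
      if (!std::regex_search (pos, output.cend (), m, std::regex (pattern),
                              std::regex_constants::match_continuous))
        return false;
      pos = m[0].second;  // Consume the matched text.
    }
  return pos == output.cend ();
}
```

With the patterns `\[`, `\{`, `"kind": "error"`, `, "column-origin": 1`, `\}` and `\]`, the output `[{"kind": "error", "column-origin": 1}]` is fully consumed, while the same object with its two keys swapped is rejected.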