From: Lewis Hyatt
To: gcc-patches@gcc.gnu.org
Cc: David Malcolm, Lewis Hyatt
Subject: [PATCH v4 8/8] diagnostics: Support generated data locations in SARIF output
Date: Wed, 9 Aug 2023 18:14:14 -0400

The diagnostics routines for SARIF output need to read the source code
back in to generate "snippet" and "contents" records, so they must be
able to cope with generated data locations (such as the contents of a
_Pragma string). Add support for that in diagnostic-format-sarif.cc.

gcc/ChangeLog:

	* diagnostic-format-sarif.cc (class sarif_builder): Adapt interface
	to support generated data locations.
	(sarif_builder::maybe_make_physical_location_object): Change the
	m_filenames hash_set to support generated data.
	(sarif_builder::make_artifact_location_object): Use a source_id rather
	than a plain file name.
	(sarif_builder::maybe_make_region_object): Adapt to expanded_location
	interface changes.
	(sarif_builder::maybe_make_region_object_for_context): Likewise.
	(sarif_builder::make_artifact_object): Likewise.
	(sarif_builder::make_run_object): Handle generated data.
	(sarif_builder::maybe_make_artifact_content_object): Likewise.
	(get_source_lines): Likewise.

gcc/testsuite/ChangeLog:

	* c-c++-common/diagnostic-format-sarif-file-5.c: New test.
---
 gcc/diagnostic-format-sarif.cc                | 88 +++++++++++--------
 .../diagnostic-format-sarif-file-5.c          | 31 +++++++
 2 files changed, 82 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/diagnostic-format-sarif-file-5.c

diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc
index 1eff71962d7..c7c0e5d4b0a 100644
--- a/gcc/diagnostic-format-sarif.cc
+++ b/gcc/diagnostic-format-sarif.cc
@@ -174,7 +174,7 @@ private:
   json::array *maybe_make_kinds_array (diagnostic_event::meaning m) const;
   json::object *maybe_make_physical_location_object (location_t loc);
   json::object *make_artifact_location_object (location_t loc);
-  json::object *make_artifact_location_object (const char *filename);
+  json::object *make_artifact_location_object (source_id src);
   json::object *make_artifact_location_object_for_pwd () const;
   json::object *maybe_make_region_object (location_t loc) const;
   json::object *maybe_make_region_object_for_context (location_t loc) const;
@@ -197,9 +197,9 @@ private:
   json::object *make_reporting_descriptor_object_for_cwe_id (int cwe_id) const;
   json::object *
   make_reporting_descriptor_reference_object_for_cwe_id (int cwe_id);
-  json::object *make_artifact_object (const char *filename);
-  json::object *maybe_make_artifact_content_object (const char *filename) const;
-  json::object *maybe_make_artifact_content_object (const char *filename,
+  json::object *make_artifact_object (source_id src);
+  json::object *maybe_make_artifact_content_object (source_id src) const;
+  json::object *maybe_make_artifact_content_object (source_id src,
                                                     int start_line,
                                                     int end_line) const;
   json::object *make_fix_object (const rich_location &rich_loc);
@@ -220,7 +220,11 @@ private:
      diagnostic group.  */
   sarif_result *m_cur_group_result;
 
-  hash_set m_filenames;
+  /* If the second member is >0, then this is a buffer of generated content,
+     with that length, not a filename.  */
+  hash_set ,
+                     int_hash >
+           > m_filenames;
   bool m_seen_any_relative_paths;
   hash_set m_rule_id_set;
   json::array *m_rules_arr;
@@ -787,7 +791,8 @@ sarif_builder::maybe_make_physical_location_object (location_t loc)
   /* "artifactLocation" property (SARIF v2.1.0 section 3.29.3).  */
   json::object *artifact_loc_obj = make_artifact_location_object (loc);
   phys_loc_obj->set ("artifactLocation", artifact_loc_obj);
-  m_filenames.add (LOCATION_FILE (loc));
+  const auto src = LOCATION_SRC (loc);
+  m_filenames.add ({src.get_filename_or_buffer (), src.get_buffer_len ()});
 
   /* "region" property (SARIF v2.1.0 section 3.29.4).  */
   if (json::object *region_obj = maybe_make_region_object (loc))
@@ -811,7 +816,7 @@ sarif_builder::maybe_make_physical_location_object (location_t loc)
 json::object *
 sarif_builder::make_artifact_location_object (location_t loc)
 {
-  return make_artifact_location_object (LOCATION_FILE (loc));
+  return make_artifact_location_object (LOCATION_SRC (loc));
 }
 
 /* The ID value for use in "uriBaseId" properties (SARIF v2.1.0 section 3.4.4)
@@ -823,10 +828,13 @@ sarif_builder::make_artifact_location_object (location_t loc)
    or return NULL.  */
 
 json::object *
-sarif_builder::make_artifact_location_object (const char *filename)
+sarif_builder::make_artifact_location_object (source_id src)
 {
   json::object *artifact_loc_obj = new json::object ();
 
+  const auto filename = src.is_buffer ()
+    ? special_fname_generated () : src.get_filename_or_buffer ();
+
   /* "uri" property (SARIF v2.1.0 section 3.4.3).  */
   artifact_loc_obj->set ("uri", new json::string (filename));
 
@@ -912,9 +920,9 @@ sarif_builder::maybe_make_region_object (location_t loc) const
   expanded_location exploc_start = expand_location (start_loc);
   expanded_location exploc_finish = expand_location (finish_loc);
 
-  if (exploc_start.file !=exploc_caret.file)
+  if (exploc_start.src != exploc_caret.src)
     return NULL;
-  if (exploc_finish.file !=exploc_caret.file)
+  if (exploc_finish.src != exploc_caret.src)
     return NULL;
 
   json::object *region_obj = new json::object ();
@@ -963,9 +971,9 @@ sarif_builder::maybe_make_region_object_for_context (location_t loc) const
   expanded_location exploc_start = expand_location (start_loc);
   expanded_location exploc_finish = expand_location (finish_loc);
 
-  if (exploc_start.file !=exploc_caret.file)
+  if (exploc_start.src != exploc_caret.src)
     return NULL;
-  if (exploc_finish.file !=exploc_caret.file)
+  if (exploc_finish.src != exploc_caret.src)
     return NULL;
 
   json::object *region_obj = new json::object ();
@@ -979,9 +987,9 @@ sarif_builder::maybe_make_region_object_for_context (location_t loc) const
 
   /* "snippet" property (SARIF v2.1.0 section 3.30.13).  */
   if (json::object *artifact_content_obj
-        = maybe_make_artifact_content_object (exploc_start.file,
-                                              exploc_start.line,
-                                              exploc_finish.line))
+        = maybe_make_artifact_content_object (exploc_start.src,
+                                              exploc_start.line,
+                                              exploc_finish.line))
     region_obj->set ("snippet", artifact_content_obj);
 
   return region_obj;
@@ -1298,7 +1306,10 @@ sarif_builder::make_run_object (sarif_invocation *invocation_obj,
   json::array *artifacts_arr = new json::array ();
   for (auto iter : m_filenames)
     {
-      json::object *artifact_obj = make_artifact_object (iter);
+      const auto src = iter.second
+        ? source_id {iter.first, iter.second} /* Memory buffer.  */
+        : source_id {iter.first}; /* Filename.  */
+      json::object *artifact_obj = make_artifact_object (src);
       artifacts_arr->append (artifact_obj);
     }
   run_obj->set ("artifacts", artifacts_arr);
@@ -1472,37 +1483,37 @@ sarif_builder::maybe_make_cwe_taxonomy_object () const
 /* Make an artifact object (SARIF v2.1.0 section 3.24).  */
 
 json::object *
-sarif_builder::make_artifact_object (const char *filename)
+sarif_builder::make_artifact_object (source_id src)
 {
   json::object *artifact_obj = new json::object ();
 
   /* "location" property (SARIF v2.1.0 section 3.24.2).  */
-  json::object *artifact_loc_obj = make_artifact_location_object (filename);
+  json::object *artifact_loc_obj = make_artifact_location_object (src);
   artifact_obj->set ("location", artifact_loc_obj);
 
   /* "contents" property (SARIF v2.1.0 section 3.24.8).  */
   if (json::object *artifact_content_obj
-        = maybe_make_artifact_content_object (filename))
+        = maybe_make_artifact_content_object (src))
     artifact_obj->set ("contents", artifact_content_obj);
 
   /* "sourceLanguage" property (SARIF v2.1.0 section 3.24.10).  */
   if (m_context->m_client_data_hooks)
     if (const char *source_lang
         = m_context->m_client_data_hooks->maybe_get_sarif_source_language
-            (filename))
+            (src.get_filename_or_buffer ()))
       artifact_obj->set ("sourceLanguage", new json::string (source_lang));
 
   return artifact_obj;
 }
 
 /* Make an artifactContent object (SARIF v2.1.0 section 3.3) for the
-   full contents of FILENAME.  */
+   full contents of SRC.  */
 
 json::object *
-sarif_builder::maybe_make_artifact_content_object (const char *filename) const
+sarif_builder::maybe_make_artifact_content_object (source_id src) const
 {
   /* Let input.cc handle any charset conversion.  */
-  char_span utf8_content = get_source_file_content (filename);
+  char_span utf8_content = get_source_file_content (src);
   if (!utf8_content)
     return NULL;
 
@@ -1518,10 +1529,12 @@ sarif_builder::maybe_make_artifact_content_object (const char *filename) const
 }
 
 /* Attempt to read the given range of lines from FILENAME; return
-   a freshly-allocated 0-terminated buffer containing them, or NULL.  */
+   a freshly-allocated buffer containing them, or NULL.
+   The buffer is null-terminated, but could also contain embedded null
+   bytes, so the char_span's length() accessor should be used.  */
 
-static char *
-get_source_lines (const char *filename,
+static char_span
+get_source_lines (source_id src,
                   int start_line,
                   int end_line)
 {
@@ -1529,9 +1542,9 @@ get_source_lines (const char *filename,
 
   for (int line = start_line; line <= end_line; line++)
     {
-      char_span line_content = location_get_source_line (filename, line);
+      char_span line_content = location_get_source_line (src, line);
       if (!line_content.get_buffer ())
-        return NULL;
+        return char_span (nullptr, 0);
       result.reserve (line_content.length () + 1);
       for (size_t i = 0; i < line_content.length (); i++)
         result.quick_push (line_content[i]);
@@ -1539,33 +1552,34 @@ get_source_lines (const char *filename,
     }
   result.safe_push ('\0');
 
-  return xstrdup (result.address ());
+  return char_span (xstrdup (result.address ()), result.length () - 1);
 }
 
 /* Make an artifactContent object (SARIF v2.1.0 section 3.3) for the given
-   run of lines within FILENAME (including the endpoints).  */
+   run of lines in the source code identified by SRC (including the
+   endpoints).  */
 
 json::object *
-sarif_builder::maybe_make_artifact_content_object (const char *filename,
+sarif_builder::maybe_make_artifact_content_object (source_id src,
                                                    int start_line,
                                                    int end_line) const
 {
-  char *text_utf8 = get_source_lines (filename, start_line, end_line);
+  const char_span text_utf8 = get_source_lines (src, start_line, end_line);
 
   if (!text_utf8)
     return NULL;
 
   /* Don't add it if it's not valid UTF-8.  */
-  if (!cpp_valid_utf8_p(text_utf8, strlen(text_utf8)))
+  if (!cpp_valid_utf8_p (text_utf8.get_buffer (), text_utf8.length ()))
     {
-      free (text_utf8);
+      free (const_cast (text_utf8.get_buffer ()));
       return NULL;
     }
 
   json::object *artifact_content_obj = new json::object ();
-  artifact_content_obj->set ("text", new json::string (text_utf8));
-  free (text_utf8);
-
+  artifact_content_obj->set ("text", new json::string (text_utf8.get_buffer (),
+                                                       text_utf8.length ()));
+  free (const_cast (text_utf8.get_buffer ()));
   return artifact_content_obj;
 }
 
diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-sarif-file-5.c b/gcc/testsuite/c-c++-common/diagnostic-format-sarif-file-5.c
new file mode 100644
index 00000000000..2ca6a069d3f
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/diagnostic-format-sarif-file-5.c
@@ -0,0 +1,31 @@
+/* The goal is to test SARIF output of generated data, such as a _Pragma string.
+   But SARIF output as of yet does not output macro definitions, so such
+   generated data buffers never end up in the typical SARIF output.  One way we
+   can achieve it is to use -fdump-internal-locations, which outputs top-level
+   diagnostic notes inside macro definitions, that SARIF will end up processing.
+   It also outputs a lot of other stuff to stderr (not to the SARIF file) that
+   is not relevant to this test, so we use a blanket dg-regexp to filter all of
+   that away.  */
+
+/* { dg-do compile } */
+/* { dg-options "-fdiagnostics-format=sarif-file -fdump-internal-locations" } */
+/* { dg-allow-blank-lines-in-output "" } */
+
+_Pragma("GCC diagnostic push")
+
+/* { dg-regexp {(.|[\n\r])*} } */
+
+/* Because of the way -fdump-internal-locations works, these regexes themselves
+   will end up in the sarif output also.  But due to the escaping, they don't
+   match themselves, so they still test what we need.  */
+
+/* Four of this pair are output for the tokens inside the
+   _Pragma string (3 plus a PRAGMA_EOL).  */
+
+/* { dg-final { scan-sarif-file "\"artifactLocation\": \{\"uri\": \"\"," } } */
+/* { dg-final { scan-sarif-file "\"snippet\": \{\"text\": \"GCC diagnostic push\\\\n\"" } } */
+
+/* One of this pair is output for the overall internal location.  */
+
+/* { dg-final { scan-sarif-file "\{\"location\": \{\"uri\": \"\"," } } */
+/* { dg-final { scan-sarif-file "\"contents\": \{\"text\": \"GCC diagnostic push\\\\n\\\\0" } } */
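For readers unfamiliar with the source_id abstraction, here is a minimal, standalone C++ sketch of the two ideas the SARIF changes in this patch rely on. It deliberately avoids GCC's internal API. The names source_ref, artifact_key, artifact_uri(), artifact_uris() and get_lines(), as well as the "(generated)" URI, are hypothetical stand-ins invented for illustration. The sketch shows two things. First, it keys each artifact on a (pointer, length) pair, where a length of 0 means a plain filename and a length greater than 0 means a buffer of generated data, as the new m_filenames comment describes. Second, it extracts lines using explicit lengths, so embedded NUL bytes survive. A trailing NUL of this kind is visible in the test's "contents" regexp, and strlen() would drop it.

```cpp
// Standalone sketch; every name here is hypothetical, not GCC's API.
#include <cassert>
#include <cstddef>
#include <cstring>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for source_id: LEN == 0 means DATA is a
// NUL-terminated filename; LEN > 0 means DATA is a buffer of generated
// content that is exactly LEN bytes long.
struct source_ref
{
  const char *data;
  std::size_t len;

  bool is_buffer () const { return len > 0; }
};

// Key used to deduplicate artifacts.  Both members take part in the
// comparison, and DATA is compared by pointer identity (an assumption
// made for this sketch).
using artifact_key = std::pair<const char *, std::size_t>;

// URI to report for a source.  "(generated)" is a hypothetical
// placeholder, not the string GCC actually emits.
const char *
artifact_uri (const source_ref &src)
{
  return src.is_buffer () ? "(generated)" : src.data;
}

// Rebuild a source_ref from each stored key (similar in spirit to the
// loop in make_run_object) and collect the URIs.
std::vector<std::string>
artifact_uris (const std::set<artifact_key> &keys)
{
  std::vector<std::string> uris;
  for (const auto &k : keys)
    uris.push_back (artifact_uri (source_ref {k.first, k.second}));
  return uris;
}

// Return lines START_LINE..END_LINE (1-based, inclusive) of the LEN-byte
// buffer BUF, each terminated by '\n'.  Explicit lengths are used
// throughout, so embedded NUL bytes are preserved.
std::string
get_lines (const char *buf, std::size_t len, int start_line, int end_line)
{
  std::string result;
  int line = 1;
  std::size_t line_start = 0;
  for (std::size_t i = 0; i < len; ++i)
    if (buf[i] == '\n')
      {
        if (line >= start_line && line <= end_line)
          result.append (buf + line_start, i - line_start + 1);
        ++line;
        line_start = i + 1;
      }
  // A final line with no trailing newline.
  if (line_start < len && line >= start_line && line <= end_line)
    {
      result.append (buf + line_start, len - line_start);
      result.push_back ('\n');
    }
  return result;
}
```

Consider the 21-byte array "GCC diagnostic push\n" together with its terminating NUL. std::strlen() reports only 20 bytes, but a length-aware std::string built with sizeof keeps all 21. The same distinction explains why get_source_lines() in the patch now returns a char_span carrying an explicit length, rather than a plain char *.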