[v4,8/8] diagnostics: Support generated data locations in SARIF output

Message ID 20230809221414.2849878-9-lhyatt@gmail.com
State Accepted
Series diagnostics: libcpp: Overhaul locations for _Pragma tokens

Checks

Context                 Check    Description
snail/gcc-patch-check   success  Github commit url

Commit Message

Lewis Hyatt Aug. 9, 2023, 10:14 p.m. UTC
  The diagnostics routines for SARIF output need to read the source code back
in, so that they can generate "snippet" and "content" records, so they need to
be able to cope with generated data locations.  Add support for that in
diagnostic-format-sarif.cc.

gcc/ChangeLog:

	* diagnostic-format-sarif.cc (class sarif_builder): Adapt interface
	to support generated data locations.
	(sarif_builder::maybe_make_physical_location_object): Change the
	m_filenames hash_set to support generated data.
	(sarif_builder::make_artifact_location_object): Use a source_id rather
	than a plain file name.
	(sarif_builder::maybe_make_region_object): Adapt to
	expanded_location interface changes.
	(sarif_builder::maybe_make_region_object_for_context): Likewise.
	(sarif_builder::make_artifact_object): Likewise.
	(sarif_builder::make_run_object): Handle generated data.
	(sarif_builder::maybe_make_artifact_content_object): Likewise.
	(get_source_lines): Likewise.

gcc/testsuite/ChangeLog:

	* c-c++-common/diagnostic-format-sarif-file-5.c: New test.
---
 gcc/diagnostic-format-sarif.cc                | 88 +++++++++++--------
 .../diagnostic-format-sarif-file-5.c          | 31 +++++++
 2 files changed, 82 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/diagnostic-format-sarif-file-5.c
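
As a rough illustration of the interface change described in the ChangeLog, the sketch below models a source as either a file name or an in-memory buffer with an explicit length, following the patch's convention that a nonzero length means generated content. The names source_ref, artifact_uri and buffer_contents are hypothetical and exist only for this example; the real source_id class is defined elsewhere in the series and may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the patch's source_id: either a file name
// (len == 0) or a buffer of generated content of that length (len > 0).
struct source_ref
{
  const char *ptr;
  std::size_t len;

  bool is_buffer() const { return len > 0; }
};

// Name to report in the artifact's "uri" property.  A generated buffer
// has no path on disk, so a placeholder name is used instead.
std::string artifact_uri(const source_ref &src)
{
  return src.is_buffer() ? std::string("<generated>") : std::string(src.ptr);
}

// Copy a generated buffer using its recorded length, so that embedded
// NUL bytes survive; a strlen-based copy would stop at the first one.
std::string buffer_contents(const source_ref &src)
{
  assert(src.is_buffer());
  return std::string(src.ptr, src.len);
}
```

Keeping the length next to the pointer is what allows one container to hold both kinds of source: a lookup can tell a file name from a buffer without guessing from the bytes themselves.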
  

Comments

David Malcolm Aug. 15, 2023, 5:04 p.m. UTC | #1
On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> The diagnostics routines for SARIF output need to read the source code back
> in, so that they can generate "snippet" and "content" records, so they need to
> be able to cope with generated data locations.  Add support for that in
> diagnostic-format-sarif.cc.
> 
> gcc/ChangeLog:
> 
>         * diagnostic-format-sarif.cc (class sarif_builder): Adapt interface
>         to support generated data locations.
>         (sarif_builder::maybe_make_physical_location_object): Change the
>         m_filenames hash_set to support generated data.
>         (sarif_builder::make_artifact_location_object): Use a source_id rather
>         than a plain file name.
>         (sarif_builder::maybe_make_region_object): Adapt to
>         expanded_location interface changes.
>         (sarif_builder::maybe_make_region_object_for_context): Likewise.
>         (sarif_builder::make_artifact_object): Likewise.
>         (sarif_builder::make_run_object): Handle generated data.
>         (sarif_builder::maybe_make_artifact_content_object): Likewise.
>         (get_source_lines): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>         * c-c++-common/diagnostic-format-sarif-file-5.c: New test.

I'm not sure if generated data is allowed as part of a SARIF artefact,
or if there's a more standard-compliant way of representing this; SARIF
says an artefact is a "sequence of bytes addressable via a URI".

Can you post a simple example of the generated .sarif JSON please? 
e.g. from the new test, so that we can see what it looks like.

You could run it through:

  python -m json.tool 

to format it for easier reading.


Thanks
Dave
  
Lewis Hyatt Aug. 15, 2023, 5:51 p.m. UTC | #2
On Tue, Aug 15, 2023 at 01:04:04PM -0400, David Malcolm wrote:
> On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > The diagnostics routines for SARIF output need to read the source code back
> > in, so that they can generate "snippet" and "content" records, so they need to
> > be able to cope with generated data locations.  Add support for that in
> > diagnostic-format-sarif.cc.
> > 
> > gcc/ChangeLog:
> > 
> >         * diagnostic-format-sarif.cc (class sarif_builder): Adapt interface
> >         to support generated data locations.
> >         (sarif_builder::maybe_make_physical_location_object): Change the
> >         m_filenames hash_set to support generated data.
> >         (sarif_builder::make_artifact_location_object): Use a source_id rather
> >         than a plain file name.
> >         (sarif_builder::maybe_make_region_object): Adapt to
> >         expanded_location interface changes.
> >         (sarif_builder::maybe_make_region_object_for_context): Likewise.
> >         (sarif_builder::make_artifact_object): Likewise.
> >         (sarif_builder::make_run_object): Handle generated data.
> >         (sarif_builder::maybe_make_artifact_content_object): Likewise.
> >         (get_source_lines): Likewise.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> >         * c-c++-common/diagnostic-format-sarif-file-5.c: New test.
> 
> I'm not sure if generated data is allowed as part of a SARIF artefact,
> or if there's a more standard-compliant way of representing this; SARIF
> says an artefact is a "sequence of bytes addressable via a URI".
> 
> Can you post a simple example of the generated .sarif JSON please? 
> e.g. from the new test, so that we can see what it looks like.
> 
> You could run it through:
> 
>   python -m json.tool 
> 
> to format it for easier reading.

For a simple example like:

_Pragma("GCC diagnostic ignored \"-Wnot-an-option\"")

for which the normal output is:

=====
In buffer generated from t.cpp:1:
<generated>:1:24: warning: unknown option after ‘#pragma GCC diagnostic’ kind [-Wpragmas]
    1 | GCC diagnostic ignored "-Wnot-an-option"
      |                        ^~~~~~~~~~~~~~~~~
t.cpp:1:1: note: in <_Pragma directive>
    1 | _Pragma("GCC diagnostic ignored \"-Wnot-an-option\"")
      | ^~~~~~~
=====

The SARIF output does not end up referencing any generated data locations,
because those are logically part of the "expansion" of the _Pragma
directive, and it doesn't output macro expansions.  In order for SARIF to
currently do something with generated data, it needs to see a generated data
location in a non-macro context. The only way to get GCC to do that, right
now, is with -fdump-internal-locations, which is what the new test case
does. That just unfortunately generates a larger amount of output. I attached
it, in case that's still helpful, for the following program:

=====
_Pragma("GCC diagnostic push")
=====

I guess there's potentially already a problem here because 'python -m
json.tool' is unhappy with this output and refuses to process it:

=====
Invalid \escape: line 1 column 3436 (char 3435)
=====

The related text is:
=====
{"location": {"uri": "<generated>", "uriBaseId": "PWD"},
"contents":{"text": "GCC diagnostic push\n\0"}
=====

And the \0 is not allowed it seems?

I also attached the output of 'python -m json.tool' anyway, after manually
removing the \0.
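
The "Invalid \escape" error is consistent with the JSON grammar (RFC 8259, section 7), which has no \0 escape: a NUL byte inside a string has to be written as \u0000. A minimal sketch of an escaper that takes an explicit length, so that embedded NUL bytes are neither dropped nor emitted as \0, might look like the following. json_escape is a hypothetical helper written for illustration; it is not the code in GCC's JSON support.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Escape LEN bytes starting at BYTES for use inside a JSON string
// literal.  Hypothetical helper: the explicit length means embedded NUL
// bytes are processed rather than treated as a terminator.
std::string json_escape(const char *bytes, std::size_t len)
{
  std::string out;
  for (std::size_t i = 0; i < len; i++)
    {
      const unsigned char ch = static_cast<unsigned char>(bytes[i]);
      switch (ch)
        {
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\b': out += "\\b"; break;
        case '\f': out += "\\f"; break;
        case '\n': out += "\\n"; break;
        case '\r': out += "\\r"; break;
        case '\t': out += "\\t"; break;
        default:
          if (ch < 0x20)
            {
              // Remaining control characters, including NUL, only have
              // the six-character \uXXXX form in JSON.
              char buf[7];
              std::snprintf(buf, sizeof buf, "\\u%04x",
                            static_cast<unsigned>(ch));
              out += buf;
            }
          else
            // Bytes >= 0x20 are passed through unchanged (the input is
            // assumed to be UTF-8 already).
            out += static_cast<char>(ch);
        }
    }
  return out;
}
```

With this, the 21 bytes "GCC diagnostic push\n\0" would come out as GCC diagnostic push\n\u0000, which 'python -m json.tool' should accept.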

Is it better to just skip these locations for now?

-Lewis
{"$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json", "version": "2.1.0", "runs": [{"tool": {"driver": {"name": "GNU C++17", "fullName": "GNU C++17 (GCC) version 14.0.0 20230811 (experimental) (x86_64-pc-linux-gnu)", "version": "14.0.0 20230811 (experimental)", "informationUri": "https://gcc.gnu.org/gcc-14/", "rules": []}}, "invocations": [{"executionSuccessful": true, "toolExecutionNotifications": []}], "originalUriBaseIds": {"PWD": {"uri": "file:///home/lewis/"}}, "artifacts": [{"location": {"uri": "t.cpp", "uriBaseId": "PWD"}, "contents": {"text": "_Pragma(\"GCC diagnostic push\")\n"}, "sourceLanguage": "cplusplus"}, {"location": {"uri": "/usr/include/stdc-predef.h"}, "contents": {"text": "/* Copyright (C) 1991-2022 Free Software Foundation, Inc.\n   This file is part of the GNU C Library.\n\n   The GNU C Library is free software; you can redistribute it and/or\n   modify it under the terms of the GNU Lesser General Public\n   License as published by the Free Software Foundation; either\n   version 2.1 of the License, or (at your option) any later version.\n\n   The GNU C Library is distributed in the hope that it will be useful,\n   but WITHOUT ANY WARRANTY; without even the implied warranty of\n   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU\n   Lesser General Public License for more details.\n\n   You should have received a copy of the GNU Lesser General Public\n   License along with the GNU C Library; if not, see\n   <https://www.gnu.org/licenses/>.  */\n\n#ifndef\t_STDC_PREDEF_H\n#define\t_STDC_PREDEF_H\t1\n\n/* This header is separate from features.h so that the compiler can\n   include it implicitly at the start of every compilation.  It must\n   not itself include <features.h> or any other header that includes\n   <features.h> because the implicit include comes before any feature\n   test macros that may be defined in a source file before it first\n   explicitly includes a system header.  GCC knows the name of this\n   header in order to preinclude it.  */\n\n/* glibc's intent is to support the IEC 559 math functionality, real\n   and complex.  If the GCC (4.9 and later) predefined macros\n   specifying compiler intent are available, use them to determine\n   whether the overall intent is to support these features; otherwise,\n   presume an older compiler has intent to support these features and\n   define these macros by default.  */\n\n#ifdef __GCC_IEC_559\n# if __GCC_IEC_559 > 0\n#  define __STDC_IEC_559__\t\t1\n#  define __STDC_IEC_60559_BFP__ \t201404L\n# endif\n#else\n# define __STDC_IEC_559__\t\t1\n# define __STDC_IEC_60559_BFP__ \t201404L\n#endif\n\n#ifdef __GCC_IEC_559_COMPLEX\n# if __GCC_IEC_559_COMPLEX > 0\n#  define __STDC_IEC_559_COMPLEX__\t1\n#  define __STDC_IEC_60559_COMPLEX__\t201404L\n# endif\n#else\n# define __STDC_IEC_559_COMPLEX__\t1\n# define __STDC_IEC_60559_COMPLEX__\t201404L\n#endif\n\n/* wchar_t uses Unicode 10.0.0.  Version 10.0 of the Unicode Standard is\n   synchronized with ISO/IEC 10646:2017, fifth edition, plus\n   the following additions from Amendment 1 to the fifth edition:\n   - 56 emoji characters\n   - 285 hentaigana\n   - 3 additional Zanabazar Square characters */\n#define __STDC_ISO_10646__\t\t201706L\n\n#endif\n"}, "sourceLanguage": "cplusplus"}, {"location": {"uri": "<generated>", "uriBaseId": "PWD"}, "contents": {"text": "GCC diagnostic push\n\0"}, "sourceLanguage": "cplusplus"}], "results": [{"ruleId": "note", "level": "note", "message": {"text": "expansion point is location 258918"}, "locations": [{"physicalLocation": {"artifactLocation": {"uri": "t.cpp", "uriBaseId": "PWD"}, "region": {"startLine": 1, "startColumn": 1, "endColumn": 8}, "contextRegion": {"startLine": 1, "snippet": {"text": "_Pragma(\"GCC diagnostic push\")\n"}}}}]}, {"ruleId": "note", "level": "note", "message": {"text": "token 0 has ‘x-location == y-location == 259906’"}, "locations": [{"physicalLocation": {"artifactLocation": {"uri": "<generated>", "uriBaseId": "PWD"}, "region": {"startLine": 1, "startColumn": 1, "endColumn": 4}, "contextRegion": {"startLine": 1, "snippet": {"text": "GCC diagnostic push\n"}}}}]}, {"ruleId": "note", "level": "note", "message": {"text": "token 1 has ‘x-location == y-location == 260387’"}, "locations": [{"physicalLocation": {"artifactLocation": {"uri": "<generated>", "uriBaseId": "PWD"}, "region": {"startLine": 1, "startColumn": 16, "endColumn": 20}, "contextRegion": {"startLine": 1, "snippet": {"text": "GCC diagnostic push\n"}}}}]}, {"ruleId": "note", "level": "note", "message": {"text": "token 2 has ‘x-location == y-location == 260512’"}, "locations": [{"physicalLocation": {"artifactLocation": {"uri": "<generated>", "uriBaseId": "PWD"}, "region": {"startLine": 1, "startColumn": 20, "endColumn": 21}, "contextRegion": {"startLine": 1, "snippet": {"text": "GCC diagnostic push\n"}}}}]}, {"ruleId": "note", "level": "note", "message": {"text": "expansion point is location 189172"}, "locations": [{"physicalLocation": {"artifactLocation": {"uri": "/usr/include/stdc-predef.h"}, "region": {"startLine": 47, "startColumn": 6, "endColumn": 27}, "contextRegion": {"startLine": 47, "snippet": {"text": "# if __GCC_IEC_559_COMPLEX > 0\n"}}}}]}, {"ruleId": "note", "level": "note", "message": {"text": "token 0 has ‘x-location == y-location == 1’"}, "locations": [{}]}, {"ruleId": "note", "level": "note", "message": {"text": "expansion point is location 148204"}, "locations": [{"physicalLocation": {"artifactLocation": {"uri": "/usr/include/stdc-predef.h"}, "region": {"startLine": 37, "startColumn": 6, "endColumn": 19}, "contextRegion": {"startLine": 37, "snippet": {"text": "# if __GCC_IEC_559 > 0\n"}}}}]}, {"ruleId": "note", "level": "note", "message": {"text": "token 0 has ‘x-location == y-location == 1’"}, "locations": [{}]}]}]}
{
    "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
    "runs": [
        {
            "artifacts": [
                {
                    "contents": {
                        "text": "_Pragma(\"GCC diagnostic push\")\n"
                    },
                    "location": {
                        "uri": "t.cpp",
                        "uriBaseId": "PWD"
                    },
                    "sourceLanguage": "cplusplus"
                },
                {
                    "contents": {
                        "text": "/* Copyright (C) 1991-2022 Free Software Foundation, Inc.\n   This file is part of the GNU C Library.\n\n   The GNU C Library is free software; you can redistribute it and/or\n   modify it under the terms of the GNU Lesser General Public\n   License as published by the Free Software Foundation; either\n   version 2.1 of the License, or (at your option) any later version.\n\n   The GNU C Library is distributed in the hope that it will be useful,\n   but WITHOUT ANY WARRANTY; without even the implied warranty of\n   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU\n   Lesser General Public License for more details.\n\n   You should have received a copy of the GNU Lesser General Public\n   License along with the GNU C Library; if not, see\n   <https://www.gnu.org/licenses/>.  */\n\n#ifndef\t_STDC_PREDEF_H\n#define\t_STDC_PREDEF_H\t1\n\n/* This header is separate from features.h so that the compiler can\n   include it implicitly at the start of every compilation.  It must\n   not itself include <features.h> or any other header that includes\n   <features.h> because the implicit include comes before any feature\n   test macros that may be defined in a source file before it first\n   explicitly includes a system header.  GCC knows the name of this\n   header in order to preinclude it.  */\n\n/* glibc's intent is to support the IEC 559 math functionality, real\n   and complex.  If the GCC (4.9 and later) predefined macros\n   specifying compiler intent are available, use them to determine\n   whether the overall intent is to support these features; otherwise,\n   presume an older compiler has intent to support these features and\n   define these macros by default.  */\n\n#ifdef __GCC_IEC_559\n# if __GCC_IEC_559 > 0\n#  define __STDC_IEC_559__\t\t1\n#  define __STDC_IEC_60559_BFP__ \t201404L\n# endif\n#else\n# define __STDC_IEC_559__\t\t1\n# define __STDC_IEC_60559_BFP__ \t201404L\n#endif\n\n#ifdef __GCC_IEC_559_COMPLEX\n# if __GCC_IEC_559_COMPLEX > 0\n#  define __STDC_IEC_559_COMPLEX__\t1\n#  define __STDC_IEC_60559_COMPLEX__\t201404L\n# endif\n#else\n# define __STDC_IEC_559_COMPLEX__\t1\n# define __STDC_IEC_60559_COMPLEX__\t201404L\n#endif\n\n/* wchar_t uses Unicode 10.0.0.  Version 10.0 of the Unicode Standard is\n   synchronized with ISO/IEC 10646:2017, fifth edition, plus\n   the following additions from Amendment 1 to the fifth edition:\n   - 56 emoji characters\n   - 285 hentaigana\n   - 3 additional Zanabazar Square characters */\n#define __STDC_ISO_10646__\t\t201706L\n\n#endif\n"
                    },
                    "location": {
                        "uri": "/usr/include/stdc-predef.h"
                    },
                    "sourceLanguage": "cplusplus"
                },
                {
                    "contents": {
                        "text": "GCC diagnostic push\n"
                    },
                    "location": {
                        "uri": "<generated>",
                        "uriBaseId": "PWD"
                    },
                    "sourceLanguage": "cplusplus"
                }
            ],
            "invocations": [
                {
                    "executionSuccessful": true,
                    "toolExecutionNotifications": []
                }
            ],
            "originalUriBaseIds": {
                "PWD": {
                    "uri": "file:///home/lewis/"
                }
            },
            "results": [
                {
                    "level": "note",
                    "locations": [
                        {
                            "physicalLocation": {
                                "artifactLocation": {
                                    "uri": "t.cpp",
                                    "uriBaseId": "PWD"
                                },
                                "contextRegion": {
                                    "snippet": {
                                        "text": "_Pragma(\"GCC diagnostic push\")\n"
                                    },
                                    "startLine": 1
                                },
                                "region": {
                                    "endColumn": 8,
                                    "startColumn": 1,
                                    "startLine": 1
                                }
                            }
                        }
                    ],
                    "message": {
                        "text": "expansion point is location 258918"
                    },
                    "ruleId": "note"
                },
                {
                    "level": "note",
                    "locations": [
                        {
                            "physicalLocation": {
                                "artifactLocation": {
                                    "uri": "<generated>",
                                    "uriBaseId": "PWD"
                                },
                                "contextRegion": {
                                    "snippet": {
                                        "text": "GCC diagnostic push\n"
                                    },
                                    "startLine": 1
                                },
                                "region": {
                                    "endColumn": 4,
                                    "startColumn": 1,
                                    "startLine": 1
                                }
                            }
                        }
                    ],
                    "message": {
                        "text": "token 0 has \u2018x-location == y-location == 259906\u2019"
                    },
                    "ruleId": "note"
                },
                {
                    "level": "note",
                    "locations": [
                        {
                            "physicalLocation": {
                                "artifactLocation": {
                                    "uri": "<generated>",
                                    "uriBaseId": "PWD"
                                },
                                "contextRegion": {
                                    "snippet": {
                                        "text": "GCC diagnostic push\n"
                                    },
                                    "startLine": 1
                                },
                                "region": {
                                    "endColumn": 20,
                                    "startColumn": 16,
                                    "startLine": 1
                                }
                            }
                        }
                    ],
                    "message": {
                        "text": "token 1 has \u2018x-location == y-location == 260387\u2019"
                    },
                    "ruleId": "note"
                },
                {
                    "level": "note",
                    "locations": [
                        {
                            "physicalLocation": {
                                "artifactLocation": {
                                    "uri": "<generated>",
                                    "uriBaseId": "PWD"
                                },
                                "contextRegion": {
                                    "snippet": {
                                        "text": "GCC diagnostic push\n"
                                    },
                                    "startLine": 1
                                },
                                "region": {
                                    "endColumn": 21,
                                    "startColumn": 20,
                                    "startLine": 1
                                }
                            }
                        }
                    ],
                    "message": {
                        "text": "token 2 has \u2018x-location == y-location == 260512\u2019"
                    },
                    "ruleId": "note"
                },
                {
                    "level": "note",
                    "locations": [
                        {
                            "physicalLocation": {
                                "artifactLocation": {
                                    "uri": "/usr/include/stdc-predef.h"
                                },
                                "contextRegion": {
                                    "snippet": {
                                        "text": "# if __GCC_IEC_559_COMPLEX > 0\n"
                                    },
                                    "startLine": 47
                                },
                                "region": {
                                    "endColumn": 27,
                                    "startColumn": 6,
                                    "startLine": 47
                                }
                            }
                        }
                    ],
                    "message": {
                        "text": "expansion point is location 189172"
                    },
                    "ruleId": "note"
                },
                {
                    "level": "note",
                    "locations": [
                        {}
                    ],
                    "message": {
                        "text": "token 0 has \u2018x-location == y-location == 1\u2019"
                    },
                    "ruleId": "note"
                },
                {
                    "level": "note",
                    "locations": [
                        {
                            "physicalLocation": {
                                "artifactLocation": {
                                    "uri": "/usr/include/stdc-predef.h"
                                },
                                "contextRegion": {
                                    "snippet": {
                                        "text": "# if __GCC_IEC_559 > 0\n"
                                    },
                                    "startLine": 37
                                },
                                "region": {
                                    "endColumn": 19,
                                    "startColumn": 6,
                                    "startLine": 37
                                }
                            }
                        }
                    ],
                    "message": {
                        "text": "expansion point is location 148204"
                    },
                    "ruleId": "note"
                },
                {
                    "level": "note",
                    "locations": [
                        {}
                    ],
                    "message": {
                        "text": "token 0 has \u2018x-location == y-location == 1\u2019"
                    },
                    "ruleId": "note"
                }
            ],
            "tool": {
                "driver": {
                    "fullName": "GNU C++17 (GCC) version 14.0.0 20230811 (experimental) (x86_64-pc-linux-gnu)",
                    "informationUri": "https://gcc.gnu.org/gcc-14/",
                    "name": "GNU C++17",
                    "rules": [],
                    "version": "14.0.0 20230811 (experimental)"
                }
            }
        }
    ],
    "version": "2.1.0"
}
  

Patch

diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc
index 1eff71962d7..c7c0e5d4b0a 100644
--- a/gcc/diagnostic-format-sarif.cc
+++ b/gcc/diagnostic-format-sarif.cc
@@ -174,7 +174,7 @@  private:
   json::array *maybe_make_kinds_array (diagnostic_event::meaning m) const;
   json::object *maybe_make_physical_location_object (location_t loc);
   json::object *make_artifact_location_object (location_t loc);
-  json::object *make_artifact_location_object (const char *filename);
+  json::object *make_artifact_location_object (source_id src);
   json::object *make_artifact_location_object_for_pwd () const;
   json::object *maybe_make_region_object (location_t loc) const;
   json::object *maybe_make_region_object_for_context (location_t loc) const;
@@ -197,9 +197,9 @@  private:
   json::object *make_reporting_descriptor_object_for_cwe_id (int cwe_id) const;
   json::object *
   make_reporting_descriptor_reference_object_for_cwe_id (int cwe_id);
-  json::object *make_artifact_object (const char *filename);
-  json::object *maybe_make_artifact_content_object (const char *filename) const;
-  json::object *maybe_make_artifact_content_object (const char *filename,
+  json::object *make_artifact_object (source_id src);
+  json::object *maybe_make_artifact_content_object (source_id src) const;
+  json::object *maybe_make_artifact_content_object (source_id src,
 						    int start_line,
 						    int end_line) const;
   json::object *make_fix_object (const rich_location &rich_loc);
@@ -220,7 +220,11 @@  private:
      diagnostic group.  */
   sarif_result *m_cur_group_result;
 
-  hash_set <const char *> m_filenames;
+  /* If the second member is >0, then this is a buffer of generated content,
+     with that length, not a filename.  */
+  hash_set <pair_hash <nofree_ptr_hash <const char>,
+		       int_hash <unsigned int, -1U> >
+	    > m_filenames;
   bool m_seen_any_relative_paths;
   hash_set <free_string_hash> m_rule_id_set;
   json::array *m_rules_arr;
@@ -787,7 +791,8 @@  sarif_builder::maybe_make_physical_location_object (location_t loc)
   /* "artifactLocation" property (SARIF v2.1.0 section 3.29.3).  */
   json::object *artifact_loc_obj = make_artifact_location_object (loc);
   phys_loc_obj->set ("artifactLocation", artifact_loc_obj);
-  m_filenames.add (LOCATION_FILE (loc));
+  const auto src = LOCATION_SRC (loc);
+  m_filenames.add ({src.get_filename_or_buffer (), src.get_buffer_len ()});
 
   /* "region" property (SARIF v2.1.0 section 3.29.4).  */
   if (json::object *region_obj = maybe_make_region_object (loc))
@@ -811,7 +816,7 @@  sarif_builder::maybe_make_physical_location_object (location_t loc)
 json::object *
 sarif_builder::make_artifact_location_object (location_t loc)
 {
-  return make_artifact_location_object (LOCATION_FILE (loc));
+  return make_artifact_location_object (LOCATION_SRC (loc));
 }
 
 /* The ID value for use in "uriBaseId" properties (SARIF v2.1.0 section 3.4.4)
@@ -823,10 +828,13 @@  sarif_builder::make_artifact_location_object (location_t loc)
    or return NULL.  */
 
 json::object *
-sarif_builder::make_artifact_location_object (const char *filename)
+sarif_builder::make_artifact_location_object (source_id src)
 {
   json::object *artifact_loc_obj = new json::object ();
 
+  const auto filename = src.is_buffer ()
+    ? special_fname_generated () : src.get_filename_or_buffer ();
+
   /* "uri" property (SARIF v2.1.0 section 3.4.3).  */
   artifact_loc_obj->set ("uri", new json::string (filename));
 
@@ -912,9 +920,9 @@  sarif_builder::maybe_make_region_object (location_t loc) const
   expanded_location exploc_start = expand_location (start_loc);
   expanded_location exploc_finish = expand_location (finish_loc);
 
-  if (exploc_start.file !=exploc_caret.file)
+  if (exploc_start.src != exploc_caret.src)
     return NULL;
-  if (exploc_finish.file !=exploc_caret.file)
+  if (exploc_finish.src != exploc_caret.src)
     return NULL;
 
   json::object *region_obj = new json::object ();
@@ -963,9 +971,9 @@  sarif_builder::maybe_make_region_object_for_context (location_t loc) const
   expanded_location exploc_start = expand_location (start_loc);
   expanded_location exploc_finish = expand_location (finish_loc);
 
-  if (exploc_start.file !=exploc_caret.file)
+  if (exploc_start.src != exploc_caret.src)
     return NULL;
-  if (exploc_finish.file !=exploc_caret.file)
+  if (exploc_finish.src != exploc_caret.src)
     return NULL;
 
   json::object *region_obj = new json::object ();
@@ -979,9 +987,9 @@  sarif_builder::maybe_make_region_object_for_context (location_t loc) const
 
   /* "snippet" property (SARIF v2.1.0 section 3.30.13).  */
   if (json::object *artifact_content_obj
-	 = maybe_make_artifact_content_object (exploc_start.file,
-					       exploc_start.line,
-					       exploc_finish.line))
+      = maybe_make_artifact_content_object (exploc_start.src,
+					    exploc_start.line,
+					    exploc_finish.line))
     region_obj->set ("snippet", artifact_content_obj);
 
   return region_obj;
@@ -1298,7 +1306,10 @@  sarif_builder::make_run_object (sarif_invocation *invocation_obj,
   json::array *artifacts_arr = new json::array ();
   for (auto iter : m_filenames)
     {
-      json::object *artifact_obj = make_artifact_object (iter);
+      const auto src = iter.second
+	? source_id {iter.first, iter.second} /* Memory buffer.  */
+	: source_id {iter.first}; /* Filename.  */
+      json::object *artifact_obj = make_artifact_object (src);
       artifacts_arr->append (artifact_obj);
     }
   run_obj->set ("artifacts", artifacts_arr);
@@ -1472,37 +1483,37 @@  sarif_builder::maybe_make_cwe_taxonomy_object () const
 /* Make an artifact object (SARIF v2.1.0 section 3.24).  */
 
 json::object *
-sarif_builder::make_artifact_object (const char *filename)
+sarif_builder::make_artifact_object (source_id src)
 {
   json::object *artifact_obj = new json::object ();
 
   /* "location" property (SARIF v2.1.0 section 3.24.2).  */
-  json::object *artifact_loc_obj = make_artifact_location_object (filename);
+  json::object *artifact_loc_obj = make_artifact_location_object (src);
   artifact_obj->set ("location", artifact_loc_obj);
 
   /* "contents" property (SARIF v2.1.0 section 3.24.8).  */
   if (json::object *artifact_content_obj
-	= maybe_make_artifact_content_object (filename))
+	= maybe_make_artifact_content_object (src))
     artifact_obj->set ("contents", artifact_content_obj);
 
   /* "sourceLanguage" property (SARIF v2.1.0 section 3.24.10).  */
   if (m_context->m_client_data_hooks)
     if (const char *source_lang
 	= m_context->m_client_data_hooks->maybe_get_sarif_source_language
-	    (filename))
+	    (src.get_filename_or_buffer ()))
       artifact_obj->set ("sourceLanguage", new json::string (source_lang));
 
   return artifact_obj;
 }
 
 /* Make an artifactContent object (SARIF v2.1.0 section 3.3) for the
-   full contents of FILENAME.  */
+   full contents of SRC.  */
 
 json::object *
-sarif_builder::maybe_make_artifact_content_object (const char *filename) const
+sarif_builder::maybe_make_artifact_content_object (source_id src) const
 {
   /* Let input.cc handle any charset conversion.  */
-  char_span utf8_content = get_source_file_content (filename);
+  char_span utf8_content = get_source_file_content (src);
   if (!utf8_content)
     return NULL;
 
@@ -1518,10 +1529,12 @@  sarif_builder::maybe_make_artifact_content_object (const char *filename) const
 }
 
 /* Attempt to read the given range of lines from FILENAME; return
-   a freshly-allocated 0-terminated buffer containing them, or NULL.  */
+   a char_span over a freshly-allocated buffer containing them, or a null
+   char_span on failure.  The buffer is null-terminated but may also contain
+   embedded null bytes, so use the char_span's length () accessor.  */
 
-static char *
-get_source_lines (const char *filename,
+static char_span
+get_source_lines (source_id src,
 		  int start_line,
 		  int end_line)
 {
@@ -1529,9 +1542,9 @@  get_source_lines (const char *filename,
 
   for (int line = start_line; line <= end_line; line++)
     {
-      char_span line_content = location_get_source_line (filename, line);
+      char_span line_content = location_get_source_line (src, line);
       if (!line_content.get_buffer ())
-	return NULL;
+	return char_span (nullptr, 0);
       result.reserve (line_content.length () + 1);
       for (size_t i = 0; i < line_content.length (); i++)
 	result.quick_push (line_content[i]);
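@@ -1539,33 +1552,36 @@  get_source_lines (const char *filename,
     }
   result.safe_push ('\0');
 
-  return xstrdup (result.address ());
+  /* Copy by length: xstrdup would stop at the first embedded null byte.  */
+  return char_span (XDUPVEC (char, result.address (), result.length ()),
+		    result.length () - 1);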
 }
 
 /* Make an artifactContent object (SARIF v2.1.0 section 3.3) for the given
-   run of lines within FILENAME (including the endpoints).  */
+   run of lines in the source code identified by SRC (including the
+   endpoints).  */
 
 json::object *
-sarif_builder::maybe_make_artifact_content_object (const char *filename,
+sarif_builder::maybe_make_artifact_content_object (source_id src,
 						   int start_line,
 						   int end_line) const
 {
-  char *text_utf8 = get_source_lines (filename, start_line, end_line);
+  const char_span text_utf8 = get_source_lines (src, start_line, end_line);
 
   if (!text_utf8)
     return NULL;
 
   /* Don't add it if it's not valid UTF-8.  */
-  if (!cpp_valid_utf8_p(text_utf8, strlen(text_utf8)))
+  if (!cpp_valid_utf8_p (text_utf8.get_buffer (), text_utf8.length ()))
     {
-      free (text_utf8);
+      free (const_cast<char *> (text_utf8.get_buffer ()));
       return NULL;
     }
 
   json::object *artifact_content_obj = new json::object ();
-  artifact_content_obj->set ("text", new json::string (text_utf8));
-  free (text_utf8);
-
+  artifact_content_obj->set ("text", new json::string (text_utf8.get_buffer (),
+						       text_utf8.length ()));
+  free (const_cast<char *> (text_utf8.get_buffer ()));
   return artifact_content_obj;
 }
 
diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-sarif-file-5.c b/gcc/testsuite/c-c++-common/diagnostic-format-sarif-file-5.c
new file mode 100644
index 00000000000..2ca6a069d3f
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/diagnostic-format-sarif-file-5.c
@@ -0,0 +1,31 @@ 
+/* The goal is to test SARIF output of generated data, such as a _Pragma string.
+   SARIF output does not yet include macro definitions, however, so generated
+   data buffers never end up in typical SARIF output.  One way to get them
+   there is -fdump-internal-locations, which emits top-level diagnostic notes
+   inside macro definitions; SARIF does end up processing those notes.  The
+   option also writes a lot of other output to stderr (not to the SARIF file)
+   that is not relevant to this test, so we use a blanket dg-regexp to filter
+   all of that away.  */
+
+/* { dg-do compile } */
+/* { dg-options "-fdiagnostics-format=sarif-file -fdump-internal-locations" } */
+/* { dg-allow-blank-lines-in-output "" } */
+
+_Pragma("GCC diagnostic push")
+
+/* { dg-regexp {(.|[\n\r])*} } */
+
+/* Because of the way -fdump-internal-locations works, these regexes themselves
+   will end up in the sarif output also.  But due to the escaping, they don't
+   match themselves, so they still test what we need.  */
+
+/* This pair is output four times, once for each token inside the
+   _Pragma string (3 plus a PRAGMA_EOL).  */
+
+/* { dg-final { scan-sarif-file "\"artifactLocation\": \{\"uri\": \"<generated>\"," } } */
+/* { dg-final { scan-sarif-file "\"snippet\": \{\"text\": \"GCC diagnostic push\\\\n\"" } } */
+
+/* This pair is output once, for the overall internal location.  */
+
+/* { dg-final { scan-sarif-file "\{\"location\": \{\"uri\": \"<generated>\"," } } */
+/* { dg-final { scan-sarif-file "\"contents\": \{\"text\": \"GCC diagnostic push\\\\n\\\\0" } } */
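
For readers following the switch from "char *" to char_span in
get_source_lines, here is a minimal standalone sketch of why the length has
to travel with the buffer once embedded null bytes are possible.  It is not
GCC code: "span" and "join_lines" are hypothetical stand-ins for char_span
and get_source_lines, and std::string/std::vector replace GCC's own
containers.  The copy is made with memcpy over the known length, since a
strdup-style copy would stop at the first null byte.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for GCC's char_span: a pointer plus an explicit
// length, so that embedded NUL bytes do not cut the data short.
struct span
{
  const char *buf;
  std::size_t len;
};

// Concatenate LINES into one malloc'd buffer.  The buffer is
// NUL-terminated for convenience, but LEN is the authoritative size.
// Returns a null span if allocation fails.
static span
join_lines (const std::vector<std::string> &lines)
{
  std::string joined;
  for (const std::string &line : lines)
    joined += line;

  char *copy = static_cast<char *> (std::malloc (joined.size () + 1));
  if (!copy)
    return span {nullptr, 0};

  // Copy by length; strdup/strlen would stop at the first NUL byte.
  std::memcpy (copy, joined.data (), joined.size ());
  copy[joined.size ()] = '\0';
  return span {copy, joined.size ()};
}
```

A caller that passes lines containing a null byte gets back a span whose len
covers all of the data, while strlen on the same buffer reports only the
bytes before the first null.  The caller owns the buffer and releases it
with free, mirroring the free (const_cast<char *> (...)) calls in the patch.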