[v4,4/8] diagnostics: Support obtaining source code lines from generated data buffers
Checks
Commit Message
This patch enhances location_get_source_line(), which is the primary
interface provided by the diagnostics infrastructure to obtain the line of
source code corresponding to a given location, so that it understands
generated data locations in addition to normal file-based locations. This
involves changing the argument to location_get_source_line() from a plain
file name, to a source_id object that can represent either type of location.
gcc/ChangeLog:
* input.cc (class data_cache_slot): New class.
(file_cache::lookup_data): New function.
(diagnostics_file_cache_forcibly_evict_data): New function.
(file_cache::forcibly_evict_data): New function.
(file_cache::evicted_cache_tab_entry): Generalize (via a template)
to work for both file_cache_slot and data_cache_slot.
(file_cache::add_file): Adapt for new interface to
evicted_cache_tab_entry.
(file_cache::add_data): New function.
(data_cache_slot::create): New function.
(file_cache::file_cache): Support the new m_data_slots member.
(file_cache::~file_cache): Likewise.
(file_cache::lookup_or_add_data): New function.
(file_cache::lookup_or_add): New function that calls either
lookup_or_add_data or lookup_or_add_file as appropriate.
(location_get_source_line): Change the FILE_PATH argument to a
source_id SRC, and use it to support obtaining source lines from
generated data as well as from files.
(location_compute_display_column): Support generated data using the
new features of location_get_source_line.
(dump_location_info): Likewise.
* input.h (location_get_source_line): Adjust prototype. Add a new
convenience overload taking an expanded_location.
(class cache_data_source): Declare.
(class data_cache_slot): Declare.
(class file_cache): Declare new members.
(diagnostics_file_cache_forcibly_evict_data): Declare.
---
gcc/input.cc | 171 ++++++++++++++++++++++++++++++++++++++++-----------
gcc/input.h | 23 +++++--
2 files changed, 153 insertions(+), 41 deletions(-)
Comments
On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> This patch enhances location_get_source_line(), which is the primary
> interface provided by the diagnostics infrastructure to obtain the line of
> source code corresponding to a given location, so that it understands
> generated data locations in addition to normal file-based locations. This
> involves changing the argument to location_get_source_line() from a plain
> file name, to a source_id object that can represent either type of location.
>
> gcc/ChangeLog:
>
> * input.cc (class data_cache_slot): New class.
> (file_cache::lookup_data): New function.
> (diagnostics_file_cache_forcibly_evict_data): New function.
> (file_cache::forcibly_evict_data): New function.
> (file_cache::evicted_cache_tab_entry): Generalize (via a template)
> to work for both file_cache_slot and data_cache_slot.
> (file_cache::add_file): Adapt for new interface to
> evicted_cache_tab_entry.
> (file_cache::add_data): New function.
> (data_cache_slot::create): New function.
> (file_cache::file_cache): Support the new m_data_slots member.
> (file_cache::~file_cache): Likewise.
> (file_cache::lookup_or_add_data): New function.
> (file_cache::lookup_or_add): New function that calls either
> lookup_or_add_data or lookup_or_add_file as appropriate.
> (location_get_source_line): Change the FILE_PATH argument to a
> source_id SRC, and use it to support obtaining source lines from
> generated data as well as from files.
> (location_compute_display_column): Support generated data using the
> new features of location_get_source_line.
> (dump_location_info): Likewise.
> * input.h (location_get_source_line): Adjust prototype. Add a new
> convenience overload taking an expanded_location.
> (class cache_data_source): Declare.
> (class data_cache_slot): Declare.
> (class file_cache): Declare new members.
> (diagnostics_file_cache_forcibly_evict_data): Declare.
> ---
> gcc/input.cc | 171 ++++++++++++++++++++++++++++++++++++++++-----------
> gcc/input.h | 23 +++++--
> 2 files changed, 153 insertions(+), 41 deletions(-)
>
> diff --git a/gcc/input.cc b/gcc/input.cc
> index 9377020b460..790279d4273 100644
> --- a/gcc/input.cc
> +++ b/gcc/input.cc
> @@ -207,6 +207,28 @@ private:
> void maybe_grow ();
> };
>
> +/* This is the implementation of cache_data_source for generated
> + data that is already in memory. */
> +class data_cache_slot final : public cache_data_source
It occurred to me: why are we caching accessing a buffer that's already
in memory - but we're also caching the line-splitting information, and
providing the line-splitting algorithm with a consistent interface to
the data, right?
[...snip...]
> @@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file (const char *file_path)
> global_dc->m_file_cache->forcibly_evict_file (file_path);
> }
>
> +void
> +diagnostics_file_cache_forcibly_evict_data (const char *data,
> + unsigned int data_len)
> +{
> + if (!global_dc->m_file_cache)
> + return;
> + global_dc->m_file_cache->forcibly_evict_data (data, data_len);
Maybe we should rename diagnostic_context's m_file_cache to
m_source_cache? (and class file_cache for that matter?) But if so,
that can/should be a followup/separate patch.
[...snip...]
> @@ -525,10 +582,22 @@ file_cache_slot::create (const file_cache::input_context &in_context,
> return true;
> }
>
> +void
> +data_cache_slot::create (const char *data, unsigned int data_len,
> + unsigned int highest_use_count)
> +{
> + reset ();
> + on_create (highest_use_count + 1,
> + total_lines_num (source_id {data, data_len}));
> + m_data_begin = data;
> + m_data_end = data + data_len;
> +}
> +
> /* file_cache's ctor. */
>
> file_cache::file_cache ()
> -: m_file_slots (new file_cache_slot[num_file_slots])
> + : m_file_slots (new file_cache_slot[num_file_slots]),
> + m_data_slots (new data_cache_slot[num_file_slots])
Should "num_file_slots" be renamed to "num_slots"?
I assume you're using the same value for both kinds of slot since the
file_cache::evicted_cache_tab_entry template uses this. I suppose the
number could be passed in as an argument to that function if we wanted
to have different sizes for the two kinds, but I don't think it
matters.
[...snip...]
> @@ -912,26 +1000,22 @@ cache_data_source::read_line_num (size_t line_num,
> If the function fails, a NULL char_span is returned. */
>
> char_span
> -location_get_source_line (const char *file_path, int line)
> +location_get_source_line (source_id src, int line)
> {
> - const char *buffer = NULL;
> - ssize_t len;
> -
> - if (line == 0)
> - return char_span (NULL, 0);
> -
> - if (file_path == NULL)
> - return char_span (NULL, 0);
> + const char_span fail (nullptr, 0);
> + if (!src || line <= 0)
> + return fail;
Looking at source_id's operator bool, are there effectively three kinds
of source_id?
(a) file names
(b) generated buffer
(c) NULL == m_filename_or_buffer
What does (c) mean? Is it a "something's gone wrong/error" state? Or
is this more a special-case of (a)? (in that the m_len for such a case
would be zero)
Should source_id's 2-param ctor have an assert that the ptr is non-
NULL?
[...snip...]
The patch is OK for trunk as-is, but note the question about the
source_id ctor above.
Thanks
Dave
On Tue, Aug 15, 2023 at 12:15:15PM -0400, David Malcolm wrote:
> On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > This patch enhances location_get_source_line(), which is the primary
> > interface provided by the diagnostics infrastructure to obtain the line of
> > source code corresponding to a given location, so that it understands
> > generated data locations in addition to normal file-based locations. This
> > involves changing the argument to location_get_source_line() from a plain
> > file name, to a source_id object that can represent either type of location.
> >
> > gcc/ChangeLog:
> >
> > * input.cc (class data_cache_slot): New class.
> > (file_cache::lookup_data): New function.
> > (diagnostics_file_cache_forcibly_evict_data): New function.
> > (file_cache::forcibly_evict_data): New function.
> > (file_cache::evicted_cache_tab_entry): Generalize (via a template)
> > to work for both file_cache_slot and data_cache_slot.
> > (file_cache::add_file): Adapt for new interface to
> > evicted_cache_tab_entry.
> > (file_cache::add_data): New function.
> > (data_cache_slot::create): New function.
> > (file_cache::file_cache): Support the new m_data_slots member.
> > (file_cache::~file_cache): Likewise.
> > (file_cache::lookup_or_add_data): New function.
> > (file_cache::lookup_or_add): New function that calls either
> > lookup_or_add_data or lookup_or_add_file as appropriate.
> > (location_get_source_line): Change the FILE_PATH argument to a
> > source_id SRC, and use it to support obtaining source lines from
> > generated data as well as from files.
> > (location_compute_display_column): Support generated data using the
> > new features of location_get_source_line.
> > (dump_location_info): Likewise.
> > * input.h (location_get_source_line): Adjust prototype. Add a new
> > convenience overload taking an expanded_location.
> > (class cache_data_source): Declare.
> > (class data_cache_slot): Declare.
> > (class file_cache): Declare new members.
> > (diagnostics_file_cache_forcibly_evict_data): Declare.
> > ---
> > gcc/input.cc | 171 ++++++++++++++++++++++++++++++++++++++++-----------
> > gcc/input.h | 23 +++++--
> > 2 files changed, 153 insertions(+), 41 deletions(-)
> >
> > diff --git a/gcc/input.cc b/gcc/input.cc
> > index 9377020b460..790279d4273 100644
> > --- a/gcc/input.cc
> > +++ b/gcc/input.cc
> > @@ -207,6 +207,28 @@ private:
> > void maybe_grow ();
> > };
> >
> > +/* This is the implementation of cache_data_source for generated
> > + data that is already in memory. */
> > +class data_cache_slot final : public cache_data_source
>
> It occurred to me: why are we caching accessing a buffer that's already
> in memory - but we're also caching the line-splitting information, and
> providing the line-splitting algorithm with a consistent interface to
> the data, right?
>
Yeah, for the current _Pragma use case, multi-line buffers are not going to
be common, but they can occur. I was mainly motivated by the consistent
interface, and by the assumption that the overhead is not critical given a
diagnostic is being issued.
> [...snip...]
>
> > @@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file (const char *file_path)
> > global_dc->m_file_cache->forcibly_evict_file (file_path);
> > }
> >
> > +void
> > +diagnostics_file_cache_forcibly_evict_data (const char *data,
> > + unsigned int data_len)
> > +{
> > + if (!global_dc->m_file_cache)
> > + return;
> > + global_dc->m_file_cache->forcibly_evict_data (data, data_len);
>
> Maybe we should rename diagnostic_context's m_file_cache to
> m_source_cache? (and class file_cache for that matter?) But if so,
> that can/should be a followup/separate patch.
>
Yes, we should. Believe it or not, I was trying to minimize the size of the
patch :) So I didn't make such changes, but they will make things more
clear.
> [...snip...]
>
> > @@ -525,10 +582,22 @@ file_cache_slot::create (const file_cache::input_context &in_context,
> > return true;
> > }
> >
> > +void
> > +data_cache_slot::create (const char *data, unsigned int data_len,
> > + unsigned int highest_use_count)
> > +{
> > + reset ();
> > + on_create (highest_use_count + 1,
> > + total_lines_num (source_id {data, data_len}));
> > + m_data_begin = data;
> > + m_data_end = data + data_len;
> > +}
> > +
> > /* file_cache's ctor. */
> >
> > file_cache::file_cache ()
> > -: m_file_slots (new file_cache_slot[num_file_slots])
> > + : m_file_slots (new file_cache_slot[num_file_slots]),
> > + m_data_slots (new data_cache_slot[num_file_slots])
>
> Should "num_file_slots" be renamed to "num_slots"?
>
> I assume you're using the same value for both kinds of slot since the
> file_cache::evicted_cache_tab_entry template uses this. I suppose the
> number could be passed in as an argument to that function if we wanted
> to have different sizes for the two kinds, but I don't think it
> matters.
>
Yes that's right... would rename num_file_slots too.
> [...snip...]
>
> > @@ -912,26 +1000,22 @@ cache_data_source::read_line_num (size_t line_num,
> > If the function fails, a NULL char_span is returned. */
> >
> > char_span
> > -location_get_source_line (const char *file_path, int line)
> > +location_get_source_line (source_id src, int line)
> > {
> > - const char *buffer = NULL;
> > - ssize_t len;
> > -
> > - if (line == 0)
> > - return char_span (NULL, 0);
> > -
> > - if (file_path == NULL)
> > - return char_span (NULL, 0);
> > + const char_span fail (nullptr, 0);
> > + if (!src || line <= 0)
> > + return fail;
>
> Looking at source_id's operator bool, are there effectively three kinds
> of source_id?
>
> (a) file names
> (b) generated buffer
> (c) NULL == m_filename_or_buffer
>
> What does (c) mean? Is it a "something's gone wrong/error" state? Or
> is this more a special-case of (a)? (in that the m_len for such a case
> would be zero)
>
> Should source_id's 2-param ctor have an assert that the ptr is non-
> NULL?
>
> [...snip...]
>
> The patch is OK for trunk as-is, but note the question about the
> source_id ctor above.
>
Thanks. (c) has the same meaning as a NULL file name currently does, so a
default-constructed source_id is not an in-memory buffer, but is rather a
NULL filename. linemap_add() for instance, will interpret a NULL filename
for an LC_LEAVE map, as a request to copy it from the natural values being
returned to. I think the source_id constructor needs to accept a NULL
filename to remain backwards compatible. With the current design of
source_id, it is safe always to change a 'const char*' file name argument to
a source_id argument instead; it will work just how it did before because it
has an implicit constructor. But if the constructor would assert on a
non-NULL pointer, that would necessitate changing all call sites that
currently expect they can pass a NULL pointer there. (For example, there are
several calls to _cpp_do_file_change() within libcpp that take advantage of
being able to pass a NULL filename to linemap_add.)
-Lewis
On Tue, 2023-08-15 at 14:15 -0400, Lewis Hyatt wrote:
> On Tue, Aug 15, 2023 at 12:15:15PM -0400, David Malcolm wrote:
> > On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > > This patch enhances location_get_source_line(), which is the
> > > primary
> > > interface provided by the diagnostics infrastructure to obtain
> > > the line of
> > > source code corresponding to a given location, so that it
> > > understands
> > > generated data locations in addition to normal file-based
> > > locations. This
> > > involves changing the argument to location_get_source_line() from
> > > a plain
> > > file name, to a source_id object that can represent either type
> > > of location.
> > >
[...]
> > >
> > >
> > > diff --git a/gcc/input.cc b/gcc/input.cc
> > > index 9377020b460..790279d4273 100644
> > > --- a/gcc/input.cc
> > > +++ b/gcc/input.cc
> > > @@ -207,6 +207,28 @@ private:
> > > void maybe_grow ();
> > > };
> > >
> > > +/* This is the implementation of cache_data_source for generated
> > > + data that is already in memory. */
> > > +class data_cache_slot final : public cache_data_source
> >
> > It occurred to me: why are we caching accessing a buffer that's
> > already
> > in memory - but we're also caching the line-splitting information,
> > and
> > providing the line-splitting algorithm with a consistent interface
> > to
> > the data, right?
> >
>
> Yeah, for the current _Pragma use case, multi-line buffers are not
> going to
> be common, but they can occur. I was mainly motivated by the
> consistent
> interface, and by the assumption that the overhead is not critical
> given a
> diagnostic is being issued.
(nods)
>
> > [...snip...]
> >
> > > @@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file
> > > (const char *file_path)
> > > global_dc->m_file_cache->forcibly_evict_file (file_path);
> > > }
> > >
> > > +void
> > > +diagnostics_file_cache_forcibly_evict_data (const char *data,
> > > + unsigned int
> > > data_len)
> > > +{
> > > + if (!global_dc->m_file_cache)
> > > + return;
> > > + global_dc->m_file_cache->forcibly_evict_data (data, data_len);
> >
> > Maybe we should rename diagnostic_context's m_file_cache to
> > m_source_cache? (and class file_cache for that matter?) But if
> > so,
> > that can/should be a followup/separate patch.
> >
>
> Yes, we should. Believe it or not, I was trying to minimize the size
> of the
> patch :)
:)
Thanks for splitting it up, BTW.
[...]
> >
> > > @@ -912,26 +1000,22 @@ cache_data_source::read_line_num (size_t
> > > line_num,
> > > If the function fails, a NULL char_span is returned. */
> > >
> > > char_span
> > > -location_get_source_line (const char *file_path, int line)
> > > +location_get_source_line (source_id src, int line)
> > > {
> > > - const char *buffer = NULL;
> > > - ssize_t len;
> > > -
> > > - if (line == 0)
> > > - return char_span (NULL, 0);
> > > -
> > > - if (file_path == NULL)
> > > - return char_span (NULL, 0);
> > > + const char_span fail (nullptr, 0);
> > > + if (!src || line <= 0)
> > > + return fail;
> >
> > Looking at source_id's operator bool, are there effectively three
> > kinds
> > of source_id?
> >
> > (a) file names
> > (b) generated buffer
> > (c) NULL == m_filename_or_buffer
> >
> > What does (c) mean? Is it a "something's gone wrong/error" state?
> > Or
> > is this more a special-case of (a)? (in that the m_len for such a
> > case
> > would be zero)
> >
> > Should source_id's 2-param ctor have an assert that the ptr is non-
> > NULL?
> >
> > [...snip...]
> >
> > The patch is OK for trunk as-is, but note the question about the
> > source_id ctor above.
> >
>
> Thanks. (c) has the same meaning as a NULL file name currently does,
> so a
> default-constructed source_id is not an in-memory buffer, but is
> rather a
> NULL filename. linemap_add() for instance, will interpret a NULL
> filename
> for an LC_LEAVE map, as a request to copy it from the natural values
> being
> returned to. I think the source_id constructor needs to accept a NULL
> filename to remain backwards compatible. With the current design of
> source_id, it is safe always to change a 'const char*' file name
> argument to
> a source_id argument instead; it will work just how it did before
> because it
> has an implicit constructor. But if the constructor would assert on a
> non-NULL pointer, that would necessitate changing all call sites that
> currently expect they can pass a NULL pointer there. (For example,
> there are
> several calls to _cpp_do_file_change() within libcpp that take
> advantage of
> being able to pass a NULL filename to linemap_add.)
Yes, it's OK for this ctor to accept NULL;
source_id (const char *filename = nullptr)
and I see you added the default arg.
I was referring to this ctor:
source_id (const char *buffer, unsigned buffer_len)
Is it ever OK for "buffer" to be NULL in this 2-param ctor, or can we
assert that it's non-NULL in this ctor? Does the generated data case
ever return NULL?
This is more of a patch 1 thing, of course.
Thanks
Dave
On Tue, Aug 15, 2023 at 3:46 PM David Malcolm <dmalcolm@redhat.com> wrote:
>
> On Tue, 2023-08-15 at 14:15 -0400, Lewis Hyatt wrote:
> > On Tue, Aug 15, 2023 at 12:15:15PM -0400, David Malcolm wrote:
> > > On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > > > This patch enhances location_get_source_line(), which is the
> > > > primary
> > > > interface provided by the diagnostics infrastructure to obtain
> > > > the line of
> > > > source code corresponding to a given location, so that it
> > > > understands
> > > > generated data locations in addition to normal file-based
> > > > locations. This
> > > > involves changing the argument to location_get_source_line() from
> > > > a plain
> > > > file name, to a source_id object that can represent either type
> > > > of location.
> > > >
>
> [...]
>
> > > >
> > > >
> > > > diff --git a/gcc/input.cc b/gcc/input.cc
> > > > index 9377020b460..790279d4273 100644
> > > > --- a/gcc/input.cc
> > > > +++ b/gcc/input.cc
> > > > @@ -207,6 +207,28 @@ private:
> > > > void maybe_grow ();
> > > > };
> > > >
> > > > +/* This is the implementation of cache_data_source for generated
> > > > + data that is already in memory. */
> > > > +class data_cache_slot final : public cache_data_source
> > >
> > > It occurred to me: why are we caching accessing a buffer that's
> > > already
> > > in memory - but we're also caching the line-splitting information,
> > > and
> > > providing the line-splitting algorithm with a consistent interface
> > > to
> > > the data, right?
> > >
> >
> > Yeah, for the current _Pragma use case, multi-line buffers are not
> > going to
> > be common, but they can occur. I was mainly motivated by the
> > consistent
> > interface, and by the assumption that the overhead is not critical
> > given a
> > diagnostic is being issued.
>
> (nods)
>
> >
> > > [...snip...]
> > >
> > > > @@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file
> > > > (const char *file_path)
> > > > global_dc->m_file_cache->forcibly_evict_file (file_path);
> > > > }
> > > >
> > > > +void
> > > > +diagnostics_file_cache_forcibly_evict_data (const char *data,
> > > > + unsigned int
> > > > data_len)
> > > > +{
> > > > + if (!global_dc->m_file_cache)
> > > > + return;
> > > > + global_dc->m_file_cache->forcibly_evict_data (data, data_len);
> > >
> > > Maybe we should rename diagnostic_context's m_file_cache to
> > > m_source_cache? (and class file_cache for that matter?) But if
> > > so,
> > > that can/should be a followup/separate patch.
> > >
> >
> > Yes, we should. Believe it or not, I was trying to minimize the size
> > of the
> > patch :)
>
> :)
>
> Thanks for splitting it up, BTW.
>
> [...]
>
>
> > >
> > > > @@ -912,26 +1000,22 @@ cache_data_source::read_line_num (size_t
> > > > line_num,
> > > > If the function fails, a NULL char_span is returned. */
> > > >
> > > > char_span
> > > > -location_get_source_line (const char *file_path, int line)
> > > > +location_get_source_line (source_id src, int line)
> > > > {
> > > > - const char *buffer = NULL;
> > > > - ssize_t len;
> > > > -
> > > > - if (line == 0)
> > > > - return char_span (NULL, 0);
> > > > -
> > > > - if (file_path == NULL)
> > > > - return char_span (NULL, 0);
> > > > + const char_span fail (nullptr, 0);
> > > > + if (!src || line <= 0)
> > > > + return fail;
> > >
> > > Looking at source_id's operator bool, are there effectively three
> > > kinds
> > > of source_id?
> > >
> > > (a) file names
> > > (b) generated buffer
> > > (c) NULL == m_filename_or_buffer
> > >
> > > What does (c) mean? Is it a "something's gone wrong/error" state?
> > > Or
> > > is this more a special-case of (a)? (in that the m_len for such a
> > > case
> > > would be zero)
> > >
> > > Should source_id's 2-param ctor have an assert that the ptr is non-
> > > NULL?
> > >
> > > [...snip...]
> > >
> > > The patch is OK for trunk as-is, but note the question about the
> > > source_id ctor above.
> > >
> >
> > Thanks. (c) has the same meaning as a NULL file name currently does,
> > so a
> > default-constructed source_id is not an in-memory buffer, but is
> > rather a
> > NULL filename. linemap_add() for instance, will interpret a NULL
> > filename
> > for an LC_LEAVE map, as a request to copy it from the natural values
> > being
> > returned to. I think the source_id constructor needs to accept a NULL
> > filename to remain backwards compatible. With the current design of
> > source_id, it is safe always to change a 'const char*' file name
> > argument to
> > a source_id argument instead; it will work just how it did before
> > because it
> > has an implicit constructor. But if the constructor would assert on a
> > non-NULL pointer, that would necessitate changing all call sites that
> > currently expect they can pass a NULL pointer there. (For example,
> > there are
> > several calls to _cpp_do_file_change() within libcpp that take
> > advantage of
> > being able to pass a NULL filename to linemap_add.)
>
> Yes, it's OK for this ctor to accept NULL;
> source_id (const char *filename = nullptr)
> and I see you added the default arg.
>
> I was referring to this ctor:
> source_id (const char *buffer, unsigned buffer_len)
> Is it ever OK for "buffer" to be NULL in this 2-param ctor, or can we
> assert that it's non-NULL in this ctor? Does the generated data case
> ever return NULL?
>
Oh, I see. This should never be NULL and I can add an assert for that.
-Lewis
On Tue, Aug 15, 2023 at 04:08:47PM -0400, Lewis Hyatt wrote:
> On Tue, Aug 15, 2023 at 3:46 PM David Malcolm <dmalcolm@redhat.com> wrote:
> >
> > On Tue, 2023-08-15 at 14:15 -0400, Lewis Hyatt wrote:
> > > On Tue, Aug 15, 2023 at 12:15:15PM -0400, David Malcolm wrote:
> > > > On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > > > > This patch enhances location_get_source_line(), which is the
> > > > > primary
> > > > > interface provided by the diagnostics infrastructure to obtain
> > > > > the line of
> > > > > source code corresponding to a given location, so that it
> > > > > understands
> > > > > generated data locations in addition to normal file-based
> > > > > locations. This
> > > > > involves changing the argument to location_get_source_line() from
> > > > > a plain
> > > > > file name, to a source_id object that can represent either type
> > > > > of location.
> > > > >
> >
> > [...]
> >
> > > > >
> > > > >
> > > > > diff --git a/gcc/input.cc b/gcc/input.cc
> > > > > index 9377020b460..790279d4273 100644
> > > > > --- a/gcc/input.cc
> > > > > +++ b/gcc/input.cc
> > > > > @@ -207,6 +207,28 @@ private:
> > > > > void maybe_grow ();
> > > > > };
> > > > >
> > > > > +/* This is the implementation of cache_data_source for generated
> > > > > + data that is already in memory. */
> > > > > +class data_cache_slot final : public cache_data_source
> > > >
> > > > It occurred to me: why are we caching accessing a buffer that's
> > > > already
> > > > in memory - but we're also caching the line-splitting information,
> > > > and
> > > > providing the line-splitting algorithm with a consistent interface
> > > > to
> > > > the data, right?
> > > >
> > >
> > > Yeah, for the current _Pragma use case, multi-line buffers are not
> > > going to
> > > be common, but they can occur. I was mainly motivated by the
> > > consistent
> > > interface, and by the assumption that the overhead is not critical
> > > given a
> > > diagnostic is being issued.
> >
> > (nods)
> >
> > >
> > > > [...snip...]
> > > >
> > > > > @@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file
> > > > > (const char *file_path)
> > > > > global_dc->m_file_cache->forcibly_evict_file (file_path);
> > > > > }
> > > > >
> > > > > +void
> > > > > +diagnostics_file_cache_forcibly_evict_data (const char *data,
> > > > > + unsigned int
> > > > > data_len)
> > > > > +{
> > > > > + if (!global_dc->m_file_cache)
> > > > > + return;
> > > > > + global_dc->m_file_cache->forcibly_evict_data (data, data_len);
> > > >
> > > > Maybe we should rename diagnostic_context's m_file_cache to
> > > > m_source_cache? (and class file_cache for that matter?) But if
> > > > so,
> > > > that can/should be a followup/separate patch.
> > > >
> > >
> > > Yes, we should. Believe it or not, I was trying to minimize the size
> > > of the
> > > patch :)
> >
> > :)
> >
> > Thanks for splitting it up, BTW.
> >
> > [...]
> >
> >
> > > >
> > > > > @@ -912,26 +1000,22 @@ cache_data_source::read_line_num (size_t
> > > > > line_num,
> > > > > If the function fails, a NULL char_span is returned. */
> > > > >
> > > > > char_span
> > > > > -location_get_source_line (const char *file_path, int line)
> > > > > +location_get_source_line (source_id src, int line)
> > > > > {
> > > > > - const char *buffer = NULL;
> > > > > - ssize_t len;
> > > > > -
> > > > > - if (line == 0)
> > > > > - return char_span (NULL, 0);
> > > > > -
> > > > > - if (file_path == NULL)
> > > > > - return char_span (NULL, 0);
> > > > > + const char_span fail (nullptr, 0);
> > > > > + if (!src || line <= 0)
> > > > > + return fail;
> > > >
> > > > Looking at source_id's operator bool, are there effectively three
> > > > kinds
> > > > of source_id?
> > > >
> > > > (a) file names
> > > > (b) generated buffer
> > > > (c) NULL == m_filename_or_buffer
> > > >
> > > > What does (c) mean? Is it a "something's gone wrong/error" state?
> > > > Or
> > > > is this more a special-case of (a)? (in that the m_len for such a
> > > > case
> > > > would be zero)
> > > >
> > > > Should source_id's 2-param ctor have an assert that the ptr is non-
> > > > NULL?
> > > >
> > > > [...snip...]
> > > >
> > > > The patch is OK for trunk as-is, but note the question about the
> > > > source_id ctor above.
> > > >
> > >
> > > Thanks. (c) has the same meaning as a NULL file name currently does,
> > > so a
> > > default-constructed source_id is not an in-memory buffer, but is
> > > rather a
> > > NULL filename. linemap_add() for instance, will interpret a NULL
> > > filename
> > > for an LC_LEAVE map, as a request to copy it from the natural values
> > > being
> > > returned to. I think the source_id constructor needs to accept a NULL
> > > filename to remain backwards compatible. With the current design of
> > > source_id, it is safe always to change a 'const char*' file name
> > > argument to
> > > a source_id argument instead; it will work just how it did before
> > > because it
> > > has an implicit constructor. But if the constructor would assert on a
> > > non-NULL pointer, that would necessitate changing all call sites that
> > > currently expect they can pass a NULL pointer there. (For example,
> > > there are
> > > several calls to _cpp_do_file_change() within libcpp that take
> > > advantage of
> > > being able to pass a NULL filename to linemap_add.)
> >
> > Yes, it's OK for this ctor to accept NULL;
> > source_id (const char *filename = nullptr)
> > and I see you added the default arg.
> >
> > I was referring to this ctor:
> > source_id (const char *buffer, unsigned buffer_len)
> > Is it ever OK for "buffer" to be NULL in this 2-param ctor, or can we
> > assert that it's non-NULL in this ctor? Does the generated data case
> > ever return NULL?
> >
>
> Oh, I see. This should never be NULL and I can add an assert for that.
>
This tweak (incremental to patch 1/8) accomplishes that.
-Lewis
-- >8 --
diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
index 395c4612dbe..ad20b140cce 100644
--- a/libcpp/include/line-map.h
+++ b/libcpp/include/line-map.h
@@ -604,7 +604,7 @@ public:
: m_filename_or_buffer (buffer),
m_len (buffer_len)
{
- linemap_assert (buffer_len > 0);
+ linemap_assert (buffer && buffer_len > 0);
}
explicit operator bool () const { return m_filename_or_buffer; }
@@ -207,6 +207,28 @@ private:
void maybe_grow ();
};
+/* This is the implementation of cache_data_source for generated
+ data that is already in memory. */
+class data_cache_slot final : public cache_data_source
+{
+public:
+ void create (const char *data, unsigned int data_len,
+ unsigned int highest_use_count);
+ bool represents_data (const char *data, unsigned int) const
+ {
+ /* We can just use pointer equality here since the generated data lives in
+ memory in one persistent place. It isn't anticipated there would be
+ several generated data buffers with the same content, so we don't mind
+ that in such a case we will store it twice. */
+ return m_data_begin == data;
+ }
+
+protected:
+ /* In contrast to file_cache_slot, we do not own a buffer. The buffer
+ passed to create() needs to outlive this object. */
+ bool get_more_data () override { return false; }
+};
+
/* Current position in real source file. */
location_t input_location = UNKNOWN_LOCATION;
@@ -382,6 +404,21 @@ file_cache::lookup_file (const char *file_path)
return r;
}
+data_cache_slot *
+file_cache::lookup_data (const char *data, unsigned int data_len)
+{
+ for (unsigned int i = 0; i != num_file_slots; ++i)
+ {
+ const auto slot = m_data_slots + i;
+ if (slot->represents_data (data, data_len))
+ {
+ slot->inc_use_count ();
+ return slot;
+ }
+ }
+ return nullptr;
+}
+
/* Purge any mention of FILENAME from the cache of files used for
printing source code. For use in selftests when working
with tempfiles. */
@@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file (const char *file_path)
global_dc->m_file_cache->forcibly_evict_file (file_path);
}
+void
+diagnostics_file_cache_forcibly_evict_data (const char *data,
+ unsigned int data_len)
+{
+ if (!global_dc->m_file_cache)
+ return;
+ global_dc->m_file_cache->forcibly_evict_data (data, data_len);
+}
+
void
file_cache::forcibly_evict_file (const char *file_path)
{
@@ -410,36 +456,36 @@ file_cache::forcibly_evict_file (const char *file_path)
r->reset ();
}
+void
+file_cache::forcibly_evict_data (const char *data, unsigned int data_len)
+{
+ if (auto r = lookup_data (data, data_len))
+ r->reset ();
+}
+
/* Return the cache that has been less used, recently, or the
first empty one. If HIGHEST_USE_COUNT is non-null,
*HIGHEST_USE_COUNT is set to the highest use count of the entries
in the cache table. */
-file_cache_slot*
-file_cache::evicted_cache_tab_entry (unsigned *highest_use_count)
+template <class Slot>
+Slot *
+file_cache::evicted_cache_tab_entry (Slot *slots,
+ unsigned int *highest_use_count)
{
- diagnostic_file_cache_init ();
-
- file_cache_slot *to_evict = &m_file_slots[0];
+ auto to_evict = &slots[0];
unsigned huc = to_evict->get_use_count ();
for (unsigned i = 1; i < num_file_slots; ++i)
{
- file_cache_slot *c = &m_file_slots[i];
- bool c_is_empty = (c->get_file_path () == NULL);
-
+ auto c = &slots[i];
if (c->get_use_count () < to_evict->get_use_count ()
- || (to_evict->get_file_path () && c_is_empty))
+ || (!to_evict->unused () && c->unused ()))
/* We evict C because it's either an entry with a lower use
count or one that is empty. */
to_evict = c;
if (huc < c->get_use_count ())
huc = c->get_use_count ();
-
- if (c_is_empty)
- /* We've reached the end of the cache; subsequent elements are
- all empty. */
- break;
}
if (highest_use_count)
@@ -463,12 +509,23 @@ file_cache::add_file (const char *file_path)
return NULL;
unsigned highest_use_count = 0;
- file_cache_slot *r = evicted_cache_tab_entry (&highest_use_count);
+ file_cache_slot *r = evicted_cache_tab_entry (m_file_slots,
+ &highest_use_count);
if (!r->create (in_context, file_path, fp, highest_use_count))
return NULL;
return r;
}
+data_cache_slot *
+file_cache::add_data (const char *data, unsigned int data_len)
+{
+ unsigned int highest_use_count = 0;
+ data_cache_slot *r = evicted_cache_tab_entry (m_data_slots,
+ &highest_use_count);
+ r->create (data, data_len, highest_use_count);
+ return r;
+}
+
/* Get a borrowed char_span to the full content of this file
as decoded according to the input charset, encoded as UTF-8. */
@@ -525,10 +582,22 @@ file_cache_slot::create (const file_cache::input_context &in_context,
return true;
}
+void
+data_cache_slot::create (const char *data, unsigned int data_len,
+ unsigned int highest_use_count)
+{
+ reset ();
+ on_create (highest_use_count + 1,
+ total_lines_num (source_id {data, data_len}));
+ m_data_begin = data;
+ m_data_end = data + data_len;
+}
+
/* file_cache's ctor. */
file_cache::file_cache ()
-: m_file_slots (new file_cache_slot[num_file_slots])
+ : m_file_slots (new file_cache_slot[num_file_slots]),
+ m_data_slots (new data_cache_slot[num_file_slots])
{
initialize_input_context (nullptr, false);
}
@@ -537,6 +606,7 @@ file_cache::file_cache ()
file_cache::~file_cache ()
{
+ delete[] m_data_slots;
delete[] m_file_slots;
}
@@ -554,6 +624,24 @@ file_cache::lookup_or_add_file (const char *file_path)
return r;
}
+data_cache_slot *
+file_cache::lookup_or_add_data (const char *data, unsigned int data_len)
+{
+ data_cache_slot *r = lookup_data (data, data_len);
+ if (!r)
+ r = add_data (data, data_len);
+ return r;
+}
+
+cache_data_source *
+file_cache::lookup_or_add (source_id src)
+{
+ if (src.is_buffer ())
+ return lookup_or_add_data (src.get_filename_or_buffer (),
+ src.get_buffer_len ());
+ return src ? lookup_or_add_file (src.get_filename_or_buffer ()) : nullptr;
+}
+
cache_data_source::cache_data_source ()
: m_data_begin (nullptr), m_data_end (nullptr),
m_use_count (0),
@@ -912,26 +1000,22 @@ cache_data_source::read_line_num (size_t line_num,
If the function fails, a NULL char_span is returned. */
char_span
-location_get_source_line (const char *file_path, int line)
+location_get_source_line (source_id src, int line)
{
- const char *buffer = NULL;
- ssize_t len;
-
- if (line == 0)
- return char_span (NULL, 0);
-
- if (file_path == NULL)
- return char_span (NULL, 0);
+ const char_span fail (nullptr, 0);
+ if (!src || line <= 0)
+ return fail;
diagnostic_file_cache_init ();
+ const auto c = global_dc->m_file_cache->lookup_or_add (src);
+ if (!c)
+ return fail;
- file_cache_slot *c = global_dc->m_file_cache->lookup_or_add_file (file_path);
- if (c == NULL)
- return char_span (NULL, 0);
-
+ const char *buffer = NULL;
+ ssize_t len;
bool read = c->read_line_num (line, &buffer, &len);
if (!read)
- return char_span (NULL, 0);
+ return fail;
return char_span (buffer, len);
}
@@ -1193,9 +1277,9 @@ int
location_compute_display_column (expanded_location exploc,
const cpp_char_column_policy &policy)
{
- if (!(exploc.file && *exploc.file && exploc.line && exploc.column))
+ if (!(exploc.src && exploc.line && exploc.column))
return exploc.column;
- char_span line = location_get_source_line (exploc.file, exploc.line);
+ char_span line = location_get_source_line (exploc);
/* If line is NULL, this function returns exploc.column which is the
desired fallback. */
return cpp_byte_column_to_display_column (line.get_buffer (), line.length (),
@@ -1425,13 +1509,26 @@ dump_location_info (FILE *stream)
{
/* Beginning of a new source line: draw the line. */
- char_span line_text = location_get_source_line (exploc.file,
- exploc.line);
+ char_span line_text = location_get_source_line (exploc);
if (!line_text)
break;
+
+ const char *fn1, *fn2;
+ if (exploc.src.is_buffer ())
+ {
+ fn1 = ORDINARY_MAP_CONTAINING_FILE_NAME (line_table, map);
+ fn2 = special_fname_generated ();
+ }
+ else
+ {
+ fn1 = exploc.file;
+ fn2 = "";
+ }
+
fprintf (stream,
- "%s:%3i|loc:%5i|%.*s\n",
- exploc.file, exploc.line,
+ "%s%s:%3i|loc:%5i|%.*s\n",
+ fn1, fn2,
+ exploc.line,
loc,
(int)line_text.length (), line_text.get_buffer ());
@@ -1450,7 +1547,7 @@ dump_location_info (FILE *stream)
if (len_loc < 5)
len_loc = 5;
- int indent = 6 + strlen (exploc.file) + len_lnum + len_loc;
+ int indent = 6 + strlen (fn1) + strlen (fn2) + len_lnum + len_loc;
/* Thousands. */
if (end_location > 999)
@@ -114,15 +114,21 @@ class char_span
size_t m_n_elts;
};
-extern char_span location_get_source_line (const char *file_path, int line);
+extern char_span location_get_source_line (source_id src, int line);
+inline char_span location_get_source_line (expanded_location exploc)
+{
+ return location_get_source_line (exploc.src, exploc.line);
+}
extern char *get_source_text_between (location_t, location_t);
extern char_span get_source_file_content (const char *file_path);
extern bool location_missing_trailing_newline (const char *file_path);
-/* Forward decl of slot within file_cache, so that the definition doesn't
+/* Forward decl of slots within file_cache, so that the definition doesn't
need to be in this header. */
+class cache_data_source;
class file_cache_slot;
+class data_cache_slot;
/* A cache of source files for use when emitting diagnostics
(and in a few places in the C/C++ frontends).
@@ -140,7 +146,10 @@ class file_cache
~file_cache ();
file_cache_slot *lookup_or_add_file (const char *file_path);
+ data_cache_slot *lookup_or_add_data (const char *data, unsigned int data_len);
+ cache_data_source *lookup_or_add (source_id src);
void forcibly_evict_file (const char *file_path);
+ void forcibly_evict_data (const char *data, unsigned int data_len);
/* See comments in diagnostic.h about the input conversion context. */
struct input_context
@@ -152,13 +161,17 @@ class file_cache
bool should_skip_bom);
private:
- file_cache_slot *evicted_cache_tab_entry (unsigned *highest_use_count);
+ template <class Slot>
+ Slot *evicted_cache_tab_entry (Slot *slots, unsigned int *highest_use_count);
+
file_cache_slot *add_file (const char *file_path);
+ data_cache_slot *add_data (const char *data, unsigned int data_len);
file_cache_slot *lookup_file (const char *file_path);
+ data_cache_slot *lookup_data (const char *data, unsigned int data_len);
- private:
static const size_t num_file_slots = 16;
file_cache_slot *m_file_slots;
+ data_cache_slot *m_data_slots;
input_context in_context;
};
@@ -256,6 +269,8 @@ void dump_location_info (FILE *stream);
void diagnostics_file_cache_fini (void);
void diagnostics_file_cache_forcibly_evict_file (const char *file_path);
+void diagnostics_file_cache_forcibly_evict_data (const char *data,
+ unsigned int data_len);
class GTY(()) string_concat
{