ld: Sort section contributions in PDB files

Message ID 20230220141328.20441-1-mark@harmstone.com
State Accepted
Headers
Series ld: Sort section contributions in PDB files |

Checks

Context Check Description
snail/binutils-gdb-check success Github commit url

Commit Message

Mark Harmstone Feb. 20, 2023, 2:13 p.m. UTC
  Microsoft's DIA library, and thus also MSVC and WinDbg, expects section
contributions to be ordered by section number and offset, otherwise it's
unable to resolve line numbers.
---
 ld/pdb.c                                  | 81 +++++++++++++++++------
 ld/testsuite/ld-pe/pdb2-section-contrib.d |  6 +-
 2 files changed, 65 insertions(+), 22 deletions(-)
  

Comments

Jan Beulich Feb. 21, 2023, 8:26 a.m. UTC | #1
On 20.02.2023 15:13, Mark Harmstone wrote:
> Microsoft's DIA library, and thus also MSVC and WinDbg, expects section
> contributions to be ordered by section number and offset, otherwise it's
> unable to resolve line numbers.
> ---
>  ld/pdb.c                                  | 81 +++++++++++++++++------
>  ld/testsuite/ld-pe/pdb2-section-contrib.d |  6 +-
>  2 files changed, 65 insertions(+), 22 deletions(-)

In principle okay, but I'd like to re-raise the question of excess
casting (hence including Nick and Alan as the far more experienced
binutils maintainers):

> @@ -4118,6 +4125,27 @@ find_section_number (bfd *abfd, asection *sect)
>    return 0;
>  }
>  
> +/* Used as parameter to qsort, to sort section contributions by section and
> +   offset.  */
> +static int
> +section_contribs_compare (const void *p1, const void *p2)
> +{
> +  const struct in_sc *sc1 = (const struct in_sc *) p1;
> +  const struct in_sc *sc2 = (const struct in_sc *) p2;

In ANSI C there's no need for these casts; it may be that they were
needed in pre-ANSI dialects like K&R. Personally I view _any_ cast
as latently dangerous, and hence I'd prefer if casts were used only
if there's no other option.

> @@ -4177,32 +4209,43 @@ create_section_contrib_substream (bfd *abfd, void **data, uint32_t *size)
>      {
>        for (asection *s = in->sections; s; s = s->next)
>  	{
> -	  uint16_t sect_num;
> -
>  	  if (s->size == 0 || discarded_section (s))
>  	    continue;
>  
> -	  sect_num = find_section_number (abfd, s->output_section);
> -
> -	  memcpy (&sc->characteristics,
> -		  sect_flags + ((sect_num - 1) * sizeof (uint32_t)),
> -		  sizeof (uint32_t));
> -
> -	  bfd_putl16 (sect_num, &sc->section);
> -	  bfd_putl16 (0, &sc->padding1);
> -	  bfd_putl32 (s->output_offset, &sc->offset);
> -	  bfd_putl32 (s->size, &sc->size);
> -	  bfd_putl16 (mod_index, &sc->module_index);
> -	  bfd_putl16 (0, &sc->padding2);
> -	  bfd_putl32 (0, &sc->data_crc);
> -	  bfd_putl32 (0, &sc->reloc_crc);
> +	  sc2->s = s;
> +	  sc2->sect_num = find_section_number (abfd, s->output_section);
> +	  sc2->mod_index = mod_index;
>  
> -	  sc++;
> +	  sc2++;
>  	}
>  
>        mod_index++;
>      }
>  
> +  qsort (sc_in, num_sc, sizeof (struct in_sc), section_contribs_compare);
> +
> +  sc =
> +    (struct section_contribution *) ((uint8_t *) *data + sizeof (uint32_t));

This one's more interesting: Some cast is needed here at least as long as
we don't mean to allow use of GNU extensions (here: arithmetic on pointers
to void). But seeing that this causes a line length issue, at a minimum
I'd recommend to go with

  sc = (void *) ((uint8_t *) *data + sizeof (uint32_t));

(Ideally sc would be pointer-to-const and the cast here then also one to
pointer-to-const.)

Nick, Alan - thoughts?

Jan
  
Nick Clifton Feb. 21, 2023, 10:49 a.m. UTC | #2
Hi Jan, Hi Mark,

> In principle okay, but I'd like to re-raise the question of excess
> casting (hence including Nick and Alan as the far more experienced
> binutils maintainers):

Normally I would only be worried about excess casting if it is likely
to obscure a problem.  Unnecessary casting might be niggling from a
readability point of view, but I would not normally consider it to be
a reason to reject a patch.  Dangerous casting is another matter though...


>> +section_contribs_compare (const void *p1, const void *p2)
>> +{
>> +  const struct in_sc *sc1 = (const struct in_sc *) p1;
>> +  const struct in_sc *sc2 = (const struct in_sc *) p2;
> 
> In ANSI C there's no need for these casts; it may be that they were
> needed in pre-ANSI dialects like K&R.

Agreed - the casts are not needed here.

> Personally I view _any_ cast
> as latently dangerous, and hence I'd prefer if casts were used only
> if there's no other option.

Well I think casts from a void type to something else are usually reasonable,
But otherwise I would agree that they are often suspicious.


On a related note - I would consider this line to be problematic:

   sc_in = xmalloc (num_sc * sizeof (struct in_sc));

The code here implies that "sc_in" is a pointer to the "struct in_sc" type.
If at some future date the code is changed and the type of "sc_in" is changed
then the above line will still work, but the wrong amount of space will be
allocated.  So I would suggest changing it to either:

   sc_in = xmalloc (num_sc * sizeof (* sc_in));

Or:

   sc_in = xmalloc (num_sc * sizeof * sc_in);  /* I like this version, but nobody else does ... :-) */

Or:

   sc_in = XNEWVEC (typeof (sc_in), num_sc);


>> +  sc =
>> +    (struct section_contribution *) ((uint8_t *) *data + sizeof (uint32_t));
> 
> This one's more interesting: Some cast is needed here at least as long as
> we don't mean to allow use of GNU extensions (here: arithmetic on pointers
> to void). But seeing that this causes a line length issue, at a minimum
> I'd recommend to go with
> 
>    sc = (void *) ((uint8_t *) *data + sizeof (uint32_t));
> 
> (Ideally sc would be pointer-to-const and the cast here then also one to
> pointer-to-const.)
> 
> Nick, Alan - thoughts?

The code certainly is messy.  I do not like the implicit casting of a void
pointer to a structure pointer, so personally I would keep the assignment
cast and try to eliminate the cast of *data by using an intermediary variable:

   uint32_t * ptr = * data;
   sc = (struct section_contribution *) (ptr + 1); /* Skip the version word.  */

Even this looks wrong to me, since it assumes that it is OK to cast a 4 byte
aligned pointer to a structure pointer, with no guarantee that the structure
does not need a larger alignment.  I get that the code is computing the
contents of a section which starts with a version number followed by a set
of filled in objects, and that this is hard to express cleanly in C.  But still
I would not be surprised if a static analysis tool flagged this code as a
potential problem.

Cheers
   Nick
  
Jan Beulich Feb. 21, 2023, 11:03 a.m. UTC | #3
On 21.02.2023 11:49, Nick Clifton wrote:
> On a related note - I would consider this line to be problematic:
> 
>    sc_in = xmalloc (num_sc * sizeof (struct in_sc));
> 
> The code here implies that "sc_in" is a pointer to the "struct in_sc" type.
> If at some future date the code is changed and the type of "sc_in" is changed
> then the above line will still work, but the wrong amount of space will be
> allocated.

Oh, indeed, another pattern I would normally feel tempted to comment on, just
that I've overlooked it this time.

>  So I would suggest changing it to either:
> 
>    sc_in = xmalloc (num_sc * sizeof (* sc_in));

Yes.

> Or:
> 
>    sc_in = xmalloc (num_sc * sizeof * sc_in);  /* I like this version, but nobody else does ... :-) */

Well ... * is commutative as a binary operator, so how about re-writing
it to

   sc_in = xmalloc (num_sc * sc_in * sizeof);

;-) ?

> Or:
> 
>    sc_in = XNEWVEC (typeof (sc_in), num_sc);

I guess this one's the form that's best in line with what's used elsewhere
in binutils.

Jan
  
Pedro Alves Feb. 21, 2023, 6:44 p.m. UTC | #4
On 2023-02-21 11:03 a.m., Jan Beulich via Binutils wrote:
> On 21.02.2023 11:49, Nick Clifton wrote:
>> On a related note - I would consider this line to be problematic:
>>
>>    sc_in = xmalloc (num_sc * sizeof (struct in_sc));
>>
>> The code here implies that "sc_in" is a pointer to the "struct in_sc" type.
>> If at some future date the code is changed and the type of "sc_in" is changed
>> then the above line will still work, but the wrong amount of space will be
>> allocated.
> 
> Oh, indeed, another pattern I would normally feel tempted to comment on, just
> that I've overlooked it this time.
> 
>>  So I would suggest changing it to either:
>>
>>    sc_in = xmalloc (num_sc * sizeof (* sc_in));
> 
> Yes.
> 
>> Or:
>>
>>    sc_in = xmalloc (num_sc * sizeof * sc_in);  /* I like this version, but nobody else does ... :-) */
> 
> Well ... * is commutative as a binary operator, so how about re-writing
> it to
> 
>    sc_in = xmalloc (num_sc * sc_in * sizeof);
> 
> ;-) ?
> 
>> Or:
>>
>>    sc_in = XNEWVEC (typeof (sc_in), num_sc);
> 
> I guess this one's the form that's best in line with what's used elsewhere
> in binutils.

'typeof' is a GNU extension, though.

Switch to C++ and use 'decltype'? :-D
  
Alan Modra Feb. 22, 2023, 5:48 a.m. UTC | #5
On Tue, Feb 21, 2023 at 09:26:33AM +0100, Jan Beulich wrote:
> On 20.02.2023 15:13, Mark Harmstone wrote:
> > +/* Used as parameter to qsort, to sort section contributions by section and
> > +   offset.  */
> > +static int
> > +section_contribs_compare (const void *p1, const void *p2)
> > +{
> > +  const struct in_sc *sc1 = (const struct in_sc *) p1;
> > +  const struct in_sc *sc2 = (const struct in_sc *) p2;
> 
> In ANSI C there's no need for these casts; it may be that they were
> needed in pre-ANSI dialects like K&R. Personally I view _any_ cast
> as latently dangerous, and hence I'd prefer if casts were used only
> if there's no other option.

I agree that it's fine to write this without the casts, and I've even
used the cast to void* you mention later in your email to shorten
lines myself.  I agree that casts are inherently dangerous too, and
that it's a good idea to not use them unless necessary.  Also, it's
really, really annoying to need casts because something like
  os = &lang_os_list.head->output_section_statement;
gets a ubsan warning when lang_os_list.head is NULL, forcing you to
use a cast or accessor that loses the type checking or to write
horrendous code.  There have been bugs in ld list handling due to
casts.

However, I'm inclined to say that a cast in a qsort comparison
function, or to cast the return of malloc or similar is mostly a
matter of style.  If a contributor wants to write it that way, I'm
fine with approving new code with these casts.  After all, there is
plenty of such code in binutils already.
  
Jan Beulich Feb. 22, 2023, 7:08 a.m. UTC | #6
On 22.02.2023 06:48, Alan Modra wrote:
> On Tue, Feb 21, 2023 at 09:26:33AM +0100, Jan Beulich wrote:
>> On 20.02.2023 15:13, Mark Harmstone wrote:
>>> +/* Used as parameter to qsort, to sort section contributions by section and
>>> +   offset.  */
>>> +static int
>>> +section_contribs_compare (const void *p1, const void *p2)
>>> +{
>>> +  const struct in_sc *sc1 = (const struct in_sc *) p1;
>>> +  const struct in_sc *sc2 = (const struct in_sc *) p2;
>>
>> In ANSI C there's no need for these casts; it may be that they were
>> needed in pre-ANSI dialects like K&R. Personally I view _any_ cast
>> as latently dangerous, and hence I'd prefer if casts were used only
>> if there's no other option.
> 
> I agree that it's fine to write this without the casts, and I've even
> used the cast to void* you mention later in your email to shorten
> lines myself.  I agree that casts are inherently dangerous too, and
> that it's a good idea to not use them unless necessary.  Also, it's
> really, really annoying to need casts because something like
>   os = &lang_os_list.head->output_section_statement;
> gets a ubsan warning when lang_os_list.head is NULL, forcing you to
> use a cast or accessor that loses the type checking or to write
> horrendous code.  There have been bugs in ld list handling due to
> casts.
> 
> However, I'm inclined to say that a cast in a qsort comparison
> function, or to cast the return of malloc or similar is mostly a
> matter of style.  If a contributor wants to write it that way, I'm
> fine with approving new code with these casts.  After all, there is
> plenty of such code in binutils already.

Now that's an interesting perspective. Depending on feedback on the
topic I was meaning to possibly conclude that such unnecessary casts
would be ripe for removal. Perhaps not by hunting them down and
replacing them in an isolated effort, but as code is being touched
anyway.

Jan
  
Alan Modra Feb. 22, 2023, 10:52 a.m. UTC | #7
On Wed, Feb 22, 2023 at 08:08:43AM +0100, Jan Beulich wrote:
> On 22.02.2023 06:48, Alan Modra wrote:
> > On Tue, Feb 21, 2023 at 09:26:33AM +0100, Jan Beulich wrote:
> >> On 20.02.2023 15:13, Mark Harmstone wrote:
> >>> +/* Used as parameter to qsort, to sort section contributions by section and
> >>> +   offset.  */
> >>> +static int
> >>> +section_contribs_compare (const void *p1, const void *p2)
> >>> +{
> >>> +  const struct in_sc *sc1 = (const struct in_sc *) p1;
> >>> +  const struct in_sc *sc2 = (const struct in_sc *) p2;
> >>
> >> In ANSI C there's no need for these casts; it may be that they were
> >> needed in pre-ANSI dialects like K&R. Personally I view _any_ cast
> >> as latently dangerous, and hence I'd prefer if casts were used only
> >> if there's no other option.
> > 
> > I agree that it's fine to write this without the casts, and I've even
> > used the cast to void* you mention later in your email to shorten
> > lines myself.  I agree that casts are inherently dangerous too, and
> > that it's a good idea to not use them unless necessary.  Also, it's
> > really, really annoying to need casts because something like
> >   os = &lang_os_list.head->output_section_statement;
> > gets a ubsan warning when lang_os_list.head is NULL, forcing you to
> > use a cast or accessor that loses the type checking or to write
> > horrendous code.  There have been bugs in ld list handling due to
> > casts.
> > 
> > However, I'm inclined to say that a cast in a qsort comparison
> > function, or to cast the return of malloc or similar is mostly a
> > matter of style.  If a contributor wants to write it that way, I'm
> > fine with approving new code with these casts.  After all, there is
> > plenty of such code in binutils already.
> 
> Now that's an interesting perspective. Depending on feedback on the
> topic I was meaning to possibly conclude that such unnecessary casts
> would be ripe for removal. Perhaps not by hunting them down and
> replacing them in an isolated effort, but as code is being touched
> anyway.

I'm fine with doing that too, particularly as you touch code.

What I was trying to say, is that I think it's important for
maintainers to allow contributors some freedom in style, to tolerate
things that don't matter that much.
  
Mark Harmstone Feb. 27, 2023, 12:50 a.m. UTC | #8
On 22/2/23 10:52, Alan Modra wrote:
> On Wed, Feb 22, 2023 at 08:08:43AM +0100, Jan Beulich wrote:
>> On 22.02.2023 06:48, Alan Modra wrote:
>>> On Tue, Feb 21, 2023 at 09:26:33AM +0100, Jan Beulich wrote:
>>>> On 20.02.2023 15:13, Mark Harmstone wrote:
>>>>> +/* Used as parameter to qsort, to sort section contributions by section and
>>>>> +   offset.  */
>>>>> +static int
>>>>> +section_contribs_compare (const void *p1, const void *p2)
>>>>> +{
>>>>> +  const struct in_sc *sc1 = (const struct in_sc *) p1;
>>>>> +  const struct in_sc *sc2 = (const struct in_sc *) p2;
>>>> In ANSI C there's no need for these casts; it may be that they were
>>>> needed in pre-ANSI dialects like K&R. Personally I view _any_ cast
>>>> as latently dangerous, and hence I'd prefer if casts were used only
>>>> if there's no other option.
>>> I agree that it's fine to write this without the casts, and I've even
>>> used the cast to void* you mention later in your email to shorten
>>> lines myself.  I agree that casts are inherently dangerous too, and
>>> that it's a good idea to not use them unless necessary.  Also, it's
>>> really, really annoying to need casts because something like
>>>    os = &lang_os_list.head->output_section_statement;
>>> gets a ubsan warning when lang_os_list.head is NULL, forcing you to
>>> use a cast or accessor that loses the type checking or to write
>>> horrendous code.  There have been bugs in ld list handling due to
>>> casts.
>>>
>>> However, I'm inclined to say that a cast in a qsort comparison
>>> function, or to cast the return of malloc or similar is mostly a
>>> matter of style.  If a contributor wants to write it that way, I'm
>>> fine with approving new code with these casts.  After all, there is
>>> plenty of such code in binutils already.
>> Now that's an interesting perspective. Depending on feedback on the
>> topic I was meaning to possibly conclude that such unnecessary casts
>> would be ripe for removal. Perhaps not by hunting them down and
>> replacing them in an isolated effort, but as code is being touched
>> anyway.
> I'm fine with doing that too, particularly as you touch code.
>
> What I was trying to say, is that I think it's important for
> maintainers to allow contributors some freedom in style, to tolerate
> things that don't matter that much.
>
Thanks all, I'll resubmit with the changes suggested.

Jan, something to bear in mind is that doing such a thing would be counter-productive if binutils were ever to be converted to C++. I don't know if anybody's looked into such a thing in earnest - I'm guessing the project's too large for it to be worth anyone's time. But it'd be much more pleasant if the files in emultempl etc. were nice templated cpp files, rather than what's there at present :-)

Mark
  

Patch

diff --git a/ld/pdb.c b/ld/pdb.c
index 12c4ac4b112..6ac45dea4f2 100644
--- a/ld/pdb.c
+++ b/ld/pdb.c
@@ -107,6 +107,13 @@  struct globals
   htab_t hashmap;
 };
 
+struct in_sc
+{
+  asection *s;
+  uint16_t sect_num;
+  uint16_t mod_index;
+};
+
 static const uint32_t crc_table[] =
 {
   0x00000000, 0x77073096, 0xee0e612c, 0x990951ba, 0x076dc419, 0x706af48f,
@@ -4118,6 +4125,27 @@  find_section_number (bfd *abfd, asection *sect)
   return 0;
 }
 
+/* Used as parameter to qsort, to sort section contributions by section and
+   offset.  */
+static int
+section_contribs_compare (const void *p1, const void *p2)
+{
+  const struct in_sc *sc1 = (const struct in_sc *) p1;
+  const struct in_sc *sc2 = (const struct in_sc *) p2;
+
+  if (sc1->sect_num < sc2->sect_num)
+    return -1;
+  if (sc1->sect_num > sc2->sect_num)
+    return 1;
+
+  if (sc1->s->output_offset < sc2->s->output_offset)
+    return -1;
+  if (sc1->s->output_offset > sc2->s->output_offset)
+    return 1;
+
+  return 0;
+}
+
 /* Create the substream which maps addresses in the image file to locations
    in the original object files.  */
 static bool
@@ -4128,6 +4156,7 @@  create_section_contrib_substream (bfd *abfd, void **data, uint32_t *size)
   uint16_t mod_index;
   char *sect_flags;
   file_ptr offset;
+  struct in_sc *sc_in, *sc2;
 
   for (bfd *in = coff_data (abfd)->link_info->input_bfds; in;
        in = in->link.next)
@@ -4168,8 +4197,11 @@  create_section_contrib_substream (bfd *abfd, void **data, uint32_t *size)
       offset += sizeof (struct external_scnhdr);
     }
 
-  sc =
-    (struct section_contribution *) ((uint8_t *) *data + sizeof (uint32_t));
+  /* Microsoft's DIA expects section contributions to be sorted by section
+     number and offset, otherwise it will be unable to resolve line numbers.  */
+
+  sc_in = xmalloc (num_sc * sizeof (struct in_sc));
+  sc2 = sc_in;
 
   mod_index = 0;
   for (bfd *in = coff_data (abfd)->link_info->input_bfds; in;
@@ -4177,32 +4209,43 @@  create_section_contrib_substream (bfd *abfd, void **data, uint32_t *size)
     {
       for (asection *s = in->sections; s; s = s->next)
 	{
-	  uint16_t sect_num;
-
 	  if (s->size == 0 || discarded_section (s))
 	    continue;
 
-	  sect_num = find_section_number (abfd, s->output_section);
-
-	  memcpy (&sc->characteristics,
-		  sect_flags + ((sect_num - 1) * sizeof (uint32_t)),
-		  sizeof (uint32_t));
-
-	  bfd_putl16 (sect_num, &sc->section);
-	  bfd_putl16 (0, &sc->padding1);
-	  bfd_putl32 (s->output_offset, &sc->offset);
-	  bfd_putl32 (s->size, &sc->size);
-	  bfd_putl16 (mod_index, &sc->module_index);
-	  bfd_putl16 (0, &sc->padding2);
-	  bfd_putl32 (0, &sc->data_crc);
-	  bfd_putl32 (0, &sc->reloc_crc);
+	  sc2->s = s;
+	  sc2->sect_num = find_section_number (abfd, s->output_section);
+	  sc2->mod_index = mod_index;
 
-	  sc++;
+	  sc2++;
 	}
 
       mod_index++;
     }
 
+  qsort (sc_in, num_sc, sizeof (struct in_sc), section_contribs_compare);
+
+  sc =
+    (struct section_contribution *) ((uint8_t *) *data + sizeof (uint32_t));
+
+  for (unsigned int i = 0; i < num_sc; i++)
+    {
+      memcpy (&sc->characteristics,
+	      sect_flags + ((sc_in[i].sect_num - 1) * sizeof (uint32_t)),
+	      sizeof (uint32_t));
+
+      bfd_putl16 (sc_in[i].sect_num, &sc->section);
+      bfd_putl16 (0, &sc->padding1);
+      bfd_putl32 (sc_in[i].s->output_offset, &sc->offset);
+      bfd_putl32 (sc_in[i].s->size, &sc->size);
+      bfd_putl16 (sc_in[i].mod_index, &sc->module_index);
+      bfd_putl16 (0, &sc->padding2);
+      bfd_putl32 (0, &sc->data_crc);
+      bfd_putl32 (0, &sc->reloc_crc);
+
+      sc++;
+    }
+
+  free (sc_in);
   free (sect_flags);
 
   return true;
diff --git a/ld/testsuite/ld-pe/pdb2-section-contrib.d b/ld/testsuite/ld-pe/pdb2-section-contrib.d
index dd9437214bb..65ed76dafdc 100644
--- a/ld/testsuite/ld-pe/pdb2-section-contrib.d
+++ b/ld/testsuite/ld-pe/pdb2-section-contrib.d
@@ -4,9 +4,9 @@  tmpdir/pdb2-sc:     file format binary
 Contents of section .data:
  0000 2dba2ef1 01000000 00000000 10000000  -...............
  0010 20000060 00000000 00000000 00000000   ..`............
- 0020 02000000 00000000 3d000000 40000040  ........=...@..@
- 0030 00000000 00000000 00000000 01000000  ................
- 0040 10000000 10000000 20000060 01000000  ........ ..`....
+ 0020 01000000 10000000 10000000 20000060  ............ ..`
+ 0030 01000000 00000000 00000000 02000000  ................
+ 0040 00000000 3d000000 40000040 00000000  ....=...@..@....
  0050 00000000 00000000 04000000 00000000  ................
  0060 0c000000 40000042 02000000 00000000  ....@..B........
  0070 00000000                             ....            
\ No newline at end of file