bfd: use less memory in string merging

Message ID alpine.LSU.2.20.2311071651280.15233@wotan.suse.de
State Accepted
Series bfd: use less memory in string merging

Checks

Context                    Check    Description
snail/binutils-gdb-check   success  Github commit url

Commit Message

Michael Matz Nov. 7, 2023, 4:51 p.m. UTC
The offset-to-entry mappings are allocated in blocks, which becomes
wasteful when there are extremely many small input files or
sections.  This caused the link of a large project (Qt5WebEngine) to
fail on 32-bit x86 due to address space limits.  It barely fit into
the address space before the new string merging, and was then pushed
over the limit by this waste.

So instead of leaving the waste in place, reallocate the maps to
their final size once it is known.  Now the link barely fits again.

bfd/
    * merge.c (record_section): Reallocate offset maps to their
    final size.
---

Regtested on many targets; okay for master?

 bfd/merge.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
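
The waste described in the commit message can be illustrated with a toy model. This is a minimal sketch, not the code in bfd/merge.c: `struct toy_map` and the `toy_map_*` helpers are hypothetical, plain `realloc` stands in for `bfd_realloc`, and only the block size of 2048 and the final size of "used + 1" are taken from the patch.

```c
#include <assert.h>
#include <stdlib.h>

#define MAP_BLOCK 2048  /* growth step named in the patch comment */

/* Hypothetical stand-in for one per-section offset array.  */
struct toy_map
{
  unsigned int *ofs;   /* stored offsets */
  size_t used;         /* entries stored so far */
  size_t capacity;     /* entries allocated */
};

/* Append one offset, growing the array a whole block at a time.
   Returns 0 on success, -1 if the allocation failed.  */
static int
toy_map_append (struct toy_map *m, unsigned int value)
{
  assert (m != NULL);
  if (m->used == m->capacity)
    {
      size_t cap = m->capacity + MAP_BLOCK;
      unsigned int *p = realloc (m->ofs, cap * sizeof (m->ofs[0]));
      if (p == NULL)
        return -1;
      m->ofs = p;
      m->capacity = cap;
    }
  m->ofs[m->used++] = value;
  return 0;
}

/* Bytes that are allocated but not holding an entry.  */
static size_t
toy_map_slack_bytes (const struct toy_map *m)
{
  assert (m != NULL);
  return (m->capacity - m->used) * sizeof (m->ofs[0]);
}

/* Trim the array to used + 1 entries, mirroring the patch's
   "noffsetmap + 1".  If realloc fails, the larger array is kept.  */
static void
toy_map_trim (struct toy_map *m)
{
  assert (m != NULL);
  size_t want = m->used + 1;
  if (want >= m->capacity)
    return;   /* nothing to give back */
  unsigned int *p = realloc (m->ofs, want * sizeof (m->ofs[0]));
  if (p != NULL)
    {
      m->ofs = p;
      m->capacity = want;
    }
}
```

In this model a section with three entries holds 2048 slots before trimming and 4 afterwards. Assuming a 4-byte `unsigned int`, that is 2045 * 4 = 8180 bytes of slack per array, which adds up when there are very many small sections.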
  

Comments

Jan Beulich Nov. 8, 2023, 7:41 a.m. UTC | #1
On 07.11.2023 17:51, Michael Matz wrote:
> --- a/bfd/merge.c
> +++ b/bfd/merge.c
> @@ -767,6 +767,18 @@ record_section (struct sec_merge_info *sinfo,
>  
>    free (contents);
>    contents = NULL;
> +
> +  /* We allocate the ofsmap arrays in blocks of 2048 elements.
> +     In case we have very many small input files/sections,
> +     this might waste large amounts of memory, so reallocate these
> +     arrays here to their true size.  */
> +  amt = secinfo->noffsetmap + 1;
> +  secinfo->map_ofs = bfd_realloc (secinfo->map_ofs,
> +				  amt * sizeof(secinfo->map_ofs[0]));
> +  BFD_ASSERT (secinfo->map_ofs);
> +  secinfo->map = bfd_realloc (secinfo->map, amt * sizeof(secinfo->map[0]));
> +  BFD_ASSERT (secinfo->map);

Re-use of the same block when shrinking isn't guaranteed, so depending
on the underlying allocator this may actually add memory pressure (and
the allocations may therefore also fail). I think it would be nice to
be independent of such an implementation detail of the underlying
library. (It may also be worthwhile then to shrink the larger of the
two arrays first. Otoh the comment ahead of mapofs_type already
indicates that this type may need widening at some point.)
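
The concern about shrinking can be probed with a small helper. A sketch under stated assumptions: `shrink_and_report` is a hypothetical name and plain `realloc` stands in for `bfd_realloc`. The C standard permits `realloc` to return a different address even when the new size is smaller, and whether it does so depends on the allocator, so no particular result is claimed here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Shrink *BLOCK to NEW_SIZE bytes.
   Returns 1 if the allocator moved the block, 0 if it stayed in
   place, and -1 if realloc failed (*BLOCK is then left untouched
   and still valid).  */
static int
shrink_and_report (void **block, size_t new_size)
{
  assert (block != NULL && *block != NULL && new_size > 0);

  /* Record the address as an integer first: the old pointer value
     should not be inspected once realloc has succeeded.  */
  uintptr_t before = (uintptr_t) *block;

  void *p = realloc (*block, new_size);
  if (p == NULL)
    return -1;

  *block = p;
  return (uintptr_t) p != before;
}
```

A result of 1 means the allocator had to obtain a second block and copy into it before releasing the first one, which is the transient extra pressure described above.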

Jan
  
Michael Matz Nov. 8, 2023, 1:39 p.m. UTC | #2
Hello,

On Wed, 8 Nov 2023, Jan Beulich wrote:

> > +  /* We allocate the ofsmap arrays in blocks of 2048 elements.
> > +     In case we have very many small input files/sections,
> > +     this might waste large amounts of memory, so reallocate these
> > +     arrays here to their true size.  */
> > +  amt = secinfo->noffsetmap + 1;
> > +  secinfo->map_ofs = bfd_realloc (secinfo->map_ofs,
> > +				  amt * sizeof(secinfo->map_ofs[0]));
> > +  BFD_ASSERT (secinfo->map_ofs);
> > +  secinfo->map = bfd_realloc (secinfo->map, amt * sizeof(secinfo->map[0]));
> > +  BFD_ASSERT (secinfo->map);
> 
> Re-use of the same block when shrinking isn't guaranteed, so depending
> on the underlying allocator this may actually add memory pressure (and
> the allocations may therefore also fail).

That's true, strictly speaking, but if this doesn't average out over
thousands of blocks, then it's such a low-quality malloc(3) that it will
have many other problems as well.  Certainly so in the cases that led me
to the above (links running nearly out of 32-bit address space).  So I
had that worry as well and rejected it.

> I think it would be nice to be independent of such an implementation 
> detail of the underlying library.

Yes.  But do you mean it as a prerequisite for the patch?  In that case
I'll try something like a bucket allocator for the offsetmap blocks,
though I think it's a bit extreme to work around lousy mallocs
these days.

> (It may also be worthwhile then to shrink the larger of the
> two arrays first. Otoh the comment ahead of mapofs_type already
> indicates that this type may need widening at some point.)

That is true nevertheless, so consider this changed in the patch.


Ciao,
Michael.
  
Jan Beulich Nov. 9, 2023, 7:59 a.m. UTC | #3
On 08.11.2023 14:39, Michael Matz wrote:
> Hello,
> 
> On Wed, 8 Nov 2023, Jan Beulich wrote:
> 
>>> +  /* We allocate the ofsmap arrays in blocks of 2048 elements.
>>> +     In case we have very many small input files/sections,
>>> +     this might waste large amounts of memory, so reallocate these
>>> +     arrays here to their true size.  */
>>> +  amt = secinfo->noffsetmap + 1;
>>> +  secinfo->map_ofs = bfd_realloc (secinfo->map_ofs,
>>> +				  amt * sizeof(secinfo->map_ofs[0]));
>>> +  BFD_ASSERT (secinfo->map_ofs);
>>> +  secinfo->map = bfd_realloc (secinfo->map, amt * sizeof(secinfo->map[0]));
>>> +  BFD_ASSERT (secinfo->map);
>>
>> Re-use of the same block when shrinking isn't guaranteed, so depending
>> on the underlying allocator this may actually add memory pressure (and
>> the allocations may therefore also fail).
> 
> That's true, strictly speaking, but if this doesn't average out over
> thousands of blocks, then it's such a low-quality malloc(3) that it will
> have many other problems as well.  Certainly so in the cases that led me
> to the above (links running nearly out of 32-bit address space).  So I
> had that worry as well and rejected it.
> 
>> I think it would be nice to be independent of such an implementation 
>> detail of the underlying library.
> 
> Yes.  But do you mean it as a prerequisite for the patch?  In that case
> I'll try something like a bucket allocator for the offsetmap blocks,
> though I think it's a bit extreme to work around lousy mallocs
> these days.

I definitely wouldn't go as far as asking for such a rework. What I'd
like to see though is that the realloc() return values be latched into
a local, and instead of failing upon the function returning NULL the
old pointers in the struct simply not be updated. Preferably with that
adjustment okay to put in.
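
The requested adjustment could look roughly like this. A minimal sketch, not the committed change: `struct toy_secinfo`, `toy_shrink_maps`, and `failing_realloc` are hypothetical, the element types are assumptions, and the allocator is passed in as a function pointer only so that a failure can be simulated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocator signature, so a failing stub can be swapped in.  */
typedef void *(*realloc_fn) (void *, size_t);

/* Hypothetical stand-in for the two parallel arrays in the patch.  */
struct toy_secinfo
{
  unsigned int *map_ofs;   /* element type is an assumption */
  void **map;              /* element type is an assumption */
  size_t noffsetmap;
};

/* Try to shrink both arrays to noffsetmap + 1 entries.  Each result
   is latched into a local, and the struct is updated only on success,
   so a failed shrink leaves the old, still valid block in place.
   Returns the number of arrays actually shrunk (0 to 2).  */
static int
toy_shrink_maps (struct toy_secinfo *s, realloc_fn re)
{
  assert (s != NULL && re != NULL);
  size_t amt = s->noffsetmap + 1;
  int shrunk = 0;

  /* The array assumed to have the larger elements goes first.  */
  void *p = re (s->map, amt * sizeof (s->map[0]));
  if (p != NULL)
    {
      s->map = p;
      shrunk++;
    }

  p = re (s->map_ofs, amt * sizeof (s->map_ofs[0]));
  if (p != NULL)
    {
      s->map_ofs = p;
      shrunk++;
    }
  return shrunk;
}

/* Stub that always fails, to exercise the keep-old-pointer path.  */
static void *
failing_realloc (void *ptr, size_t size)
{
  (void) ptr;
  (void) size;
  return NULL;
}
```

Calling `toy_shrink_maps (&s, realloc)` performs the real shrink, while passing `failing_realloc` shows that both pointers survive unchanged when the allocator refuses.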

Jan

  
Michael Matz Nov. 9, 2023, 4:45 p.m. UTC | #4
Hey,

On Thu, 9 Nov 2023, Jan Beulich wrote:

> >> I think it would be nice to be independent of such an implementation 
> >> detail of the underlying library.
> > 
> > Yes.  But do you mean it as a prerequisite for the patch?  In that case
> > I'll try something like a bucket allocator for the offsetmap blocks,
> > though I think it's a bit extreme to work around lousy mallocs
> > these days.
> 
> I definitely wouldn't go as far as asking for such a rework. What I'd
> like to see though is that the realloc() return values be latched into
> a local, and instead of failing upon the function returning NULL the
> old pointers in the struct simply not be updated. Preferably with that
> adjustment okay to put in.

Oh, that makes sense, yes.  (The contract on the bfd_realloc wrapper is a
bit unhelpful here: it will invariably set bfd_error_no_memory when
returning NULL.  But I still agree with you that it's nicer not to
overwrite the existing pointer when realloc didn't work.)
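
The side effect mentioned here can be shown with a toy wrapper. This is a sketch only: `toy_realloc`, `toy_last_error`, `toy_force_failure`, and `toy_try_shrink` are hypothetical names and not the BFD API. They merely imitate the described behavior of recording a no-memory error whenever NULL is returned.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature of an error-recording allocator wrapper.  */
enum toy_error { TOY_ERROR_NONE, TOY_ERROR_NO_MEMORY };
static enum toy_error toy_last_error = TOY_ERROR_NONE;

/* When nonzero, toy_realloc pretends the allocator is exhausted.  */
static int toy_force_failure = 0;

static void *
toy_realloc (void *ptr, size_t size)
{
  void *p = toy_force_failure ? NULL : realloc (ptr, size);
  if (p == NULL)
    toy_last_error = TOY_ERROR_NO_MEMORY;   /* always recorded on failure */
  return p;
}

/* Best-effort shrink to N elements: keep the old block if the
   wrapper fails.  Returns 1 if shrunk, 0 if the old block was kept.  */
static int
toy_try_shrink (unsigned int **arr, size_t n)
{
  assert (arr != NULL && *arr != NULL && n > 0);
  unsigned int *p = toy_realloc (*arr, n * sizeof (**arr));
  if (p == NULL)
    return 0;   /* *arr is still valid, but toy_last_error is now set */
  *arr = p;
  return 1;
}
```

In this model a failed best-effort shrink leaves the data usable but also leaves the error indicator set, so a caller that treats the failure as harmless would have to decide whether to reset it.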


Ciao,
Michael.

  

Patch

diff --git a/bfd/merge.c b/bfd/merge.c
index 4aa2f838679..ccefb707c47 100644
--- a/bfd/merge.c
+++ b/bfd/merge.c
@@ -767,6 +767,18 @@  record_section (struct sec_merge_info *sinfo,
 
   free (contents);
   contents = NULL;
+
+  /* We allocate the ofsmap arrays in blocks of 2048 elements.
+     In case we have very many small input files/sections,
+     this might waste large amounts of memory, so reallocate these
+     arrays here to their true size.  */
+  amt = secinfo->noffsetmap + 1;
+  secinfo->map_ofs = bfd_realloc (secinfo->map_ofs,
+				  amt * sizeof(secinfo->map_ofs[0]));
+  BFD_ASSERT (secinfo->map_ofs);
+  secinfo->map = bfd_realloc (secinfo->map, amt * sizeof(secinfo->map[0]));
+  BFD_ASSERT (secinfo->map);
+
   /*printf ("ZZZ %s:%s %u entries\n", sec->owner->filename, sec->name,
 	  (unsigned)secinfo->noffsetmap);*/