[0/2] gas,bpf: cleanup bad symbols created while parsing

Message ID 20231114175805.7783-1-david.faust@oracle.com
Headers
Series gas,bpf: cleanup bad symbols created while parsing |

Message

David Faust Nov. 14, 2023, 5:58 p.m. UTC
  To support the "pseudo-C" asm dialect in BPF, the BPF parser must
often attempt multiple different templates for a single instruction.
In some cases, this can lead to a call to expression () which creates a
symbol, and only later the template is determined not to match and the
expression is discarded.

However, symbols created during this process are added to the symbol
table and are not removed if the expression is discarded.

This is a problem for BPF: generally the assembled object will be
loaded directly by the Linux kernel, without being linked.  The kernel
BPF loader requires that BTF information is available for every symbol
in a loaded BPF program, which will not be available for these symbols
erroneously created by the parser.

Patch 1 adds a symbol_table_remove () function to symbols.c and
exposes it in the header, since symbol_remove () is not sufficient to
prevent such symbols being written in the symbol table of the resulting
ELF object.

Patch 2 detects cases in the BPF parser where a call to expression ()
created any new symbol(s), but the parsing of the instruction as a
whole against the current template failed.  In that case the created
symbols are deleted.

David Faust (2):
  gas: add symbol_table_remove
  bpf: remove symbols created during failed parse

 gas/config/tc-bpf.c                     | 30 +++++++++++++++++++++++++
 gas/symbols.c                           | 10 +++++++++
 gas/symbols.h                           |  1 +
 gas/testsuite/gas/bpf/asm-extra-sym-1.d |  7 ++++++
 gas/testsuite/gas/bpf/asm-extra-sym-1.s |  1 +
 gas/testsuite/gas/bpf/asm-extra-sym-2.d |  7 ++++++
 gas/testsuite/gas/bpf/asm-extra-sym-2.s |  8 +++++++
 gas/testsuite/gas/bpf/bpf.exp           |  4 ++++
 8 files changed, 68 insertions(+)
 create mode 100644 gas/testsuite/gas/bpf/asm-extra-sym-1.d
 create mode 100644 gas/testsuite/gas/bpf/asm-extra-sym-1.s
 create mode 100644 gas/testsuite/gas/bpf/asm-extra-sym-2.d
 create mode 100644 gas/testsuite/gas/bpf/asm-extra-sym-2.s
  

Comments

David Faust Nov. 14, 2023, 10:13 p.m. UTC | #1
On 11/14/23 09:58, David Faust wrote:
...
> 
> Patch 1 adds a symbol_table_remove () function to symbols.c and
> exposes it in the header, since symbol_remove () is not sufficient to
> prevent such symbols being written in the symbol table of the resulting
> ELF object.

Hm, I misspoke here and inverted the reasoning.  To correct myself:

symbol_remove () is sufficient to prevent the symbol from being emitted.

But it does not remove the symbol from the hash table used by the
symbol_find () family of routines.  That means later calls to e.g.
symbol_find_or_make () will return a reference to the old (removed)
symbol.  The symbol will not be re-added to the symbol list, so it will
not be emitted, even in cases where it is real and needed.

Adding and using symbol_table_remove () allows a removed symbol to be
potentially re-added later.
  
Jan Beulich Nov. 15, 2023, 9:49 a.m. UTC | #2
On 14.11.2023 18:58, David Faust wrote:
> To support the "pseudo-C" asm dialect in BPF, the BPF parser must
> often attempt multiple different templates for a single instruction.
> In some cases, this can lead to a call to expression () which creates a
> symbol, and only later the template is determined not to match and the
> expression is discarded.
> 
> However, symbols created during this process are added to the symbol
> table and are not removed if the expression is discarded.
> 
> This is a problem for BPF: generally the assembled object will be
> loaded directly by the Linux kernel, without being linked.  The kernel
> BPF loader requires that BTF information is available for every symbol
> in a loaded BPF program, which will not be available for these symbols
> erroneously created by the parser.
> 
> Patch 1 adds a symbol_table_remove () function to symbols.c and
> exposes it in the header, since symbol_remove () is not sufficient to
> prevent such symbols being written in the symbol table of the resulting
> ELF object.
> 
> Patch 2 detects cases in the BPF parser where a call to expression ()
> created any new symbol(s), but the parsing of the instruction as a
> whole against the current template failed.  In that case the created
> symbols are deleted.

While this may be a workaround (and perhaps even a viable one), I think
it would be better to suppress symbol table insertion in the first place.
I had to solve a somewhat related issue for RISC-V (and I expect MIPS
would want to also be switched to a similar approach), see 7a29ee290307
("RISC-V: adjust logic to avoid register name symbols"). I've looked at
the testcases added by patch 2. While the first one's purpose is clear
(thanks to the comment there), I can't really figure what the 2nd aims to
test. Which may mean that I'm not properly understanding what (set of)
condition(s) is/are involved here, and hence whether using one or more of
the existing target hooks can indeed help here (without needing to
transiently insert any symbols, and then having target-specific code
depend on how exactly symbols are inserted).

Jan
  
David Faust Nov. 15, 2023, 9:57 p.m. UTC | #3
On 11/15/23 01:49, Jan Beulich wrote:
> On 14.11.2023 18:58, David Faust wrote:
>> To support the "pseudo-C" asm dialect in BPF, the BPF parser must
>> often attempt multiple different templates for a single instruction.
>> In some cases, this can lead to a call to expression () which creates a
>> symbol, and only later the template is determined not to match and the
>> expression is discarded.
>>
>> However, symbols created during this process are added to the symbol
>> table and are not removed if the expression is discarded.
>>
>> This is a problem for BPF: generally the assembled object will be
>> loaded directly by the Linux kernel, without being linked.  The kernel
>> BPF loader requires that BTF information is available for every symbol
>> in a loaded BPF program, which will not be available for these symbols
>> erroneously created by the parser.
>>
>> Patch 1 adds a symbol_table_remove () function to symbols.c and
>> exposes it in the header, since symbol_remove () is not sufficient to
>> prevent such symbols being written in the symbol table of the resulting
>> ELF object.
>>
>> Patch 2 detects cases in the BPF parser where a call to expression ()
>> created any new symbol(s), but the parsing of the instruction as a
>> whole against the current template failed.  In that case the created
>> symbols are deleted.
> 
> While this may be a workaround (and perhaps even a viable one), I think
> it would be better to suppress symbol table insertion in the first place.
> I had to solve a somewhat related issue for RISC-V (and I expect MIPS
> would want to also be switched to a similar approach), see 7a29ee290307
> ("RISC-V: adjust logic to avoid register name symbols"). I've looked at
> the testcases added by patch 2. While the first one's purpose is clear
> (thanks to the comment there), I can't really figure what the 2nd aims to
> test. Which may mean that I'm not properly understanding what (set of)
> condition(s) is/are involved here, and hence whether using one or more of
> the existing target hooks can indeed help here (without needing to
> transiently insert any symbols, and then having target-specific code
> depend on how exactly symbols are inserted).

Hi Jan,
Thank you for the review. This is very helpful.

I agree that it is be better to suppress symbol table insertion. To be
honest, I did not think of a good way to do it before this workaround.
So, thank you very much for the pointer to 7a29ee290307 because that is
certainly a nicer approach :). I think something similar to that will
work for BPF too, and I plan to rewrite the patch to use that approach.

As for the 2nd test, before adding and using symbol_table_remove ()
it would fail because the symbol 'foo' was not emitted in the object.
This happened because a failed instruction parse would now remove
any symbol created while parsing the instruction, including the real and
used 'foo'. The symbol was never "un-removed", because it remained in
the symbol hash table. So all later calls to find_symbol ('foo')
returned a reference to the removed symbol, rather than failing and
allowing 'foo' to be remade by a later instruction parse. With an
approach that suppresses insertion of wrong symbols rather than trying
to remove them later, this will not be an issue.
(The existing gas/bpf/indcall-1-pseudoc test was also failing for the
same reason.)

Thanks
David