btf: do not use the CHAR `encoding' bit for BTF

Message ID 877d458k7r.fsf@oracle.com
State New, archived

Commit Message

Jose E. Marchesi via Gcc-patches July 22, 2022, 11:23 a.m. UTC
Contrary to CTF and our previous expectations, it turns out that in
BTF, as per [1]:

1) The `encoding' field in integer types shall not be treated as a
   bitmap, but as an enumeration, i.e. these bits are mutually
   exclusive.

2) The CHAR bit in `encoding' shall _not_ be set when emitting types
   for `char' or `unsigned char'.

Consequently this patch clears the CHAR bit before emitting the
variable part of BTF integral types.  It also updates the testsuite
accordingly, expanding it to check for BOOL bits.

[1] https://lore.kernel.org/bpf/a73586ad-f2dc-0401-1eba-2004357b7edf@fb.com/T/#t

gcc/ChangeLog:

	* btfout.cc (output_asm_btf_vlen_bytes): Do not use the CHAR
	encoding bit in BTF.

gcc/testsuite/ChangeLog:

	* gcc.dg/debug/btf/btf-int-1.c: Do not check for char bits in
	bti_encoding and check for bool bits.
---
 gcc/btfout.cc                              |  4 ++++
 gcc/testsuite/gcc.dg/debug/btf/btf-int-1.c | 18 +++++++++++-------
 2 files changed, 15 insertions(+), 7 deletions(-)
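
To make the effect of the change concrete, here is a minimal sketch in C of how the 32-bit word that follows a BTF_KIND_INT type could be packed once the CHAR bit is cleared. The layout (encoding << 24, offset << 16, bits) is taken from the comment in btf-int-1.c; the names ENC_SIGNED, ENC_CHAR, ENC_BOOL and pack_int_word are hypothetical stand-ins for this illustration and are not the identifiers used in btfout.cc.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the BTF integer encodings; the values
   follow the layout comment in btf-int-1.c (1, 2 and 4, shifted by 24).  */
enum { ENC_SIGNED = 1, ENC_CHAR = 2, ENC_BOOL = 4 };

/* Pack the word emitted after a BTF_KIND_INT type:
     | 0 | encoding | offset | 00 | bits |
   The CHAR encoding is dropped first, mirroring what the patch does
   before it calls BTF_INT_DATA.  */
static uint32_t
pack_int_word (uint32_t format, uint32_t offset, uint32_t bits)
{
  format &= ~(uint32_t) ENC_CHAR;
  return ((format & 0x0f) << 24) | ((offset & 0xff) << 16) | (bits & 0xff);
}
```

Under these assumptions, and with an 8-bit char, pack_int_word (ENC_SIGNED | ENC_CHAR, 0, 8) gives 0x1000008, pack_int_word (ENC_CHAR, 0, 8) gives 0x8, and pack_int_word (ENC_BOOL, 0, 8) gives 0x4000008. These are the shapes the updated scan-assembler patterns in the test look for.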
  

Comments

Indu Bhagat July 26, 2022, 9:58 p.m. UTC | #1
On 7/22/22 4:23 AM, Jose E. Marchesi via Gcc-patches wrote:
> 
> Contrary to CTF and our previous expectations, as per [1], turns out
> that in BTF:
> 
> 1) The `encoding' field in integer types shall not be treated as a
>     bitmap, but as an enumerated, i.e. these bits are exclusive to each
>     other.
> 
> 2) The CHAR bit in `encoding' shall _not_ be set when emitting types
>     for char nor `unsigned char'.
> 

Hmm...well.  At this time, I suggest we make a note of this in the btf.h 
for posterity that BTF_INT_CHAR is to not be used (i.e., BTF_INT_CHAR 
should not be set for char / unsigned char).

> Consequently this patch clears the CHAR bit before emitting the
> variable part of BTF integral types.  It also updates the testsuite
> accordingly, expanding it to check for BOOL bits.
> 
> [1] https://lore.kernel.org/bpf/a73586ad-f2dc-0401-1eba-2004357b7edf@fb.com/T/#t
> 
> gcc/ChangeLog:
> 
> 	* btfout.cc (output_asm_btf_vlen_bytes): Do not use the CHAR
> 	encoding bit in BTF.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/debug/btf/btf-int-1.c: Do not check for char bits in
> 	bti_encoding and check for bool bits.
> ---
>   gcc/btfout.cc                              |  4 ++++
>   gcc/testsuite/gcc.dg/debug/btf/btf-int-1.c | 18 +++++++++++-------
>   2 files changed, 15 insertions(+), 7 deletions(-)
> 
> diff --git a/gcc/btfout.cc b/gcc/btfout.cc
> index 31af50521da..576f73d47cf 100644
> --- a/gcc/btfout.cc
> +++ b/gcc/btfout.cc
> @@ -914,6 +914,10 @@ output_asm_btf_vlen_bytes (ctf_container_ref ctfc, ctf_dtdef_ref dtd)
>         if (dtd->dtd_data.ctti_size < 1)
>   	break;
>   
> +      /* In BTF the CHAR `encoding' seems to not be used, so clear it
> +         here.  */
> +      dtd->dtd_u.dtu_enc.cte_format &= ~BTF_INT_CHAR;
> +

[Added David Faust]

What do you think about doing this in btf_dtd_emit_preprocess_cb () for 
types where kind == BTF_KIND_INT?  This is the place where BTF-specific 
massaging of type info takes place.

>         encoding = BTF_INT_DATA (dtd->dtd_u.dtu_enc.cte_format,
>   			       dtd->dtd_u.dtu_enc.cte_offset,
>   			       dtd->dtd_u.dtu_enc.cte_bits);
> diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-int-1.c b/gcc/testsuite/gcc.dg/debug/btf/btf-int-1.c
> index 2381decd6ff..87d9758e9cb 100644
> --- a/gcc/testsuite/gcc.dg/debug/btf/btf-int-1.c
> +++ b/gcc/testsuite/gcc.dg/debug/btf/btf-int-1.c
> @@ -4,7 +4,8 @@
>      | 0 | encoding | offset | 00 | bits |
>      encoding:
>        signed  1 << 24
> -     char    2 << 24
> +     char    2 << 24  (not used)
> +     bool    4 << 24
>   
>      All offsets in this test should be 0.
>      This test does _not_ check number of bits, as it may vary between targets.
> @@ -13,13 +14,14 @@
>   /* { dg-do compile } */
>   /* { dg-options "-O0 -gbtf -dA" } */
>   
> -/* Check for 8 BTF_KIND_INT types.  */
> -/* { dg-final { scan-assembler-times "\[\t \]0x1000000\[\t \]+\[^\n\]*btt_info" 8 } } */
> +/* Check for 9 BTF_KIND_INT types.  */
> +/* { dg-final { scan-assembler-times "\[\t \]0x1000000\[\t \]+\[^\n\]*btt_info" 9 } } */
>   
> -/* Check the signed/char flags, but not bit size. */
> -/* { dg-final { scan-assembler-times "\[\t \]0x10000..\[\t \]+\[^\n\]*bti_encoding" 3 } } */
> -/* { dg-final { scan-assembler-times "\[\t \]0x20000..\[\t \]+\[^\n\]*bti_encoding" 1 } } */
> -/* { dg-final { scan-assembler-times "\[\t \]0x30000..\[\t \]+\[^\n\]*bti_encoding" 1 } } */
> +/* Check the signed flags, but not bit size. */
> +/* { dg-final { scan-assembler-times "\[\t \]0x10000..\[\t \]+\[^\n\]*bti_encoding" 4 } } */
> +/* { dg-final { scan-assembler-times "\[\t \]0x..\[\t \]+\[^\n\]*bti_encoding" 3 } } */
> +/* { dg-final { scan-assembler-times "\[\t \]0x.\[\t \]+\[^\n\]*bti_encoding" 1 } } */
> +/* { dg-final { scan-assembler-times "\[\t \]0x40000..\[\t \]+\[^\n\]*bti_encoding" 1 } } */
>   
>   /* Check that there is a string entry for each type name.  */
>   /* { dg-final { scan-assembler-times "ascii \"unsigned char.0\"\[\t \]+\[^\n\]*btf_string" 1 } } */
> @@ -42,3 +44,5 @@ signed int f = -66;
>   
>   unsigned long int g = 77;
>   signed long int h = 88;
> +
> +_Bool x = 1;
>
  
David Faust Aug. 1, 2022, 6:52 p.m. UTC | #2
On 7/26/22 14:58, Indu Bhagat wrote:
> On 7/22/22 4:23 AM, Jose E. Marchesi via Gcc-patches wrote:
>>
>> Contrary to CTF and our previous expectations, as per [1], turns out
>> that in BTF:
>>
>> 1) The `encoding' field in integer types shall not be treated as a
>>     bitmap, but as an enumerated, i.e. these bits are exclusive to each
>>     other.
>>
>> 2) The CHAR bit in `encoding' shall _not_ be set when emitting types
>>     for char nor `unsigned char'.
>>
> 
> Hmm...well.  At this time, I suggest we make a note of this in the btf.h 
> for posterity that BTF_INT_CHAR is to not be used (i.e., BTF_INT_CHAR 
> should not be set for char / unsigned char).

Agreed it would be good to add this note.

> 
>> Consequently this patch clears the CHAR bit before emitting the
>> variable part of BTF integral types.  It also updates the testsuite
>> accordingly, expanding it to check for BOOL bits.
>>
>> [1] https://lore.kernel.org/bpf/a73586ad-f2dc-0401-1eba-2004357b7edf@fb.com/T/#t
>>
>> gcc/ChangeLog:
>>
>> 	* btfout.cc (output_asm_btf_vlen_bytes): Do not use the CHAR
>> 	encoding bit in BTF.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	* gcc.dg/debug/btf/btf-int-1.c: Do not check for char bits in
>> 	bti_encoding and check for bool bits.
>> ---
>>   gcc/btfout.cc                              |  4 ++++
>>   gcc/testsuite/gcc.dg/debug/btf/btf-int-1.c | 18 +++++++++++-------
>>   2 files changed, 15 insertions(+), 7 deletions(-)
>>
>> diff --git a/gcc/btfout.cc b/gcc/btfout.cc
>> index 31af50521da..576f73d47cf 100644
>> --- a/gcc/btfout.cc
>> +++ b/gcc/btfout.cc
>> @@ -914,6 +914,10 @@ output_asm_btf_vlen_bytes (ctf_container_ref ctfc, ctf_dtdef_ref dtd)
>>         if (dtd->dtd_data.ctti_size < 1)
>>   	break;
>>   
>> +      /* In BTF the CHAR `encoding' seems to not be used, so clear it
>> +         here.  */
>> +      dtd->dtd_u.dtu_enc.cte_format &= ~BTF_INT_CHAR;
>> +
> 
> [Added David Faust]
> 
> What do you think about doing this in btf_dtd_emit_preprocess_cb () for 
> types where kind == BTF_KIND_INT. This is the place where BTF specific 
> massaging of type info takes place.

Sorry for the delay.

I think this could be done in either place. I lean slightly in favor
of doing it here as implemented in this patch. Other integer encodings
are not specifically massaged into BTF; we leave the in-memory
representation the same as CTF and just write them out according to the
BTF rules since both formats hold the same information, and this function
is where we do that.

In my opinion, this is a similar case. It is not wrong to hold on to the
information (internally) that this is a char type; rather, it is a quirk
of the other BTF encoding rules which means it _is_ wrong to ever set the
bit saying as much when writing the BTF info (at least for now...).

So, this patch LGTM, but please add a brief note in btf.h that the CHAR
flag is currently not to be used.

Thanks

> 
>>         encoding = BTF_INT_DATA (dtd->dtd_u.dtu_enc.cte_format,
>>   			       dtd->dtd_u.dtu_enc.cte_offset,
>>   			       dtd->dtd_u.dtu_enc.cte_bits);
>> diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-int-1.c b/gcc/testsuite/gcc.dg/debug/btf/btf-int-1.c
>> index 2381decd6ff..87d9758e9cb 100644
>> --- a/gcc/testsuite/gcc.dg/debug/btf/btf-int-1.c
>> +++ b/gcc/testsuite/gcc.dg/debug/btf/btf-int-1.c
>> @@ -4,7 +4,8 @@
>>      | 0 | encoding | offset | 00 | bits |
>>      encoding:
>>        signed  1 << 24
>> -     char    2 << 24
>> +     char    2 << 24  (not used)
>> +     bool    4 << 24
>>   
>>      All offsets in this test should be 0.
>>      This test does _not_ check number of bits, as it may vary between targets.
>> @@ -13,13 +14,14 @@
>>   /* { dg-do compile } */
>>   /* { dg-options "-O0 -gbtf -dA" } */
>>   
>> -/* Check for 8 BTF_KIND_INT types.  */
>> -/* { dg-final { scan-assembler-times "\[\t \]0x1000000\[\t \]+\[^\n\]*btt_info" 8 } } */
>> +/* Check for 9 BTF_KIND_INT types.  */
>> +/* { dg-final { scan-assembler-times "\[\t \]0x1000000\[\t \]+\[^\n\]*btt_info" 9 } } */
>>   
>> -/* Check the signed/char flags, but not bit size. */
>> -/* { dg-final { scan-assembler-times "\[\t \]0x10000..\[\t \]+\[^\n\]*bti_encoding" 3 } } */
>> -/* { dg-final { scan-assembler-times "\[\t \]0x20000..\[\t \]+\[^\n\]*bti_encoding" 1 } } */
>> -/* { dg-final { scan-assembler-times "\[\t \]0x30000..\[\t \]+\[^\n\]*bti_encoding" 1 } } */
>> +/* Check the signed flags, but not bit size. */
>> +/* { dg-final { scan-assembler-times "\[\t \]0x10000..\[\t \]+\[^\n\]*bti_encoding" 4 } } */
>> +/* { dg-final { scan-assembler-times "\[\t \]0x..\[\t \]+\[^\n\]*bti_encoding" 3 } } */
>> +/* { dg-final { scan-assembler-times "\[\t \]0x.\[\t \]+\[^\n\]*bti_encoding" 1 } } */
>> +/* { dg-final { scan-assembler-times "\[\t \]0x40000..\[\t \]+\[^\n\]*bti_encoding" 1 } } */
>>   
>>   /* Check that there is a string entry for each type name.  */
>>   /* { dg-final { scan-assembler-times "ascii \"unsigned char.0\"\[\t \]+\[^\n\]*btf_string" 1 } } */
>> @@ -42,3 +44,5 @@ signed int f = -66;
>>   
>>   unsigned long int g = 77;
>>   signed long int h = 88;
>> +
>> +_Bool x = 1;
>>
>
  
Jose E. Marchesi Aug. 2, 2022, 3:42 p.m. UTC | #3
> On 7/26/22 14:58, Indu Bhagat wrote:
>> On 7/22/22 4:23 AM, Jose E. Marchesi via Gcc-patches wrote:
>>>
>>> Contrary to CTF and our previous expectations, as per [1], turns out
>>> that in BTF:
>>>
>>> 1) The `encoding' field in integer types shall not be treated as a
>>>     bitmap, but as an enumerated, i.e. these bits are exclusive to each
>>>     other.
>>>
>>> 2) The CHAR bit in `encoding' shall _not_ be set when emitting types
>>>     for char nor `unsigned char'.
>>>
>> 
>> Hmm...well.  At this time, I suggest we make a note of this in the btf.h 
>> for posterity that BTF_INT_CHAR is to not be used (i.e., BTF_INT_CHAR 
>> should not be set for char / unsigned char).
>
> Agreed it would be good to add this note.

Hmm, I am not sure such a comment actually belongs in include/btf.h,
which is not specific to the compiler and is supposed to reflect the BTF
format per se.  The CHAR bit is documented in the kernel documentation
and it may be used at some point by libbpf, or who knows what.

That's why I put the comment in btfout.cc instead, to make it clear that
BTF_INT_CHAR is indeed not to be set for char / unsigned char by the
compiler:

>>> +      /* In BTF the CHAR `encoding' seems to not be used, so clear it
>>> +         here.  */
>>> +      dtd->dtd_u.dtu_enc.cte_format &= ~BTF_INT_CHAR;
  
David Faust Aug. 2, 2022, 4:05 p.m. UTC | #4
On 8/2/22 08:42, Jose E. Marchesi wrote:
> 
>> On 7/26/22 14:58, Indu Bhagat wrote:
>>> On 7/22/22 4:23 AM, Jose E. Marchesi via Gcc-patches wrote:
>>>>
>>>> Contrary to CTF and our previous expectations, as per [1], turns out
>>>> that in BTF:
>>>>
>>>> 1) The `encoding' field in integer types shall not be treated as a
>>>>     bitmap, but as an enumerated, i.e. these bits are exclusive to each
>>>>     other.
>>>>
>>>> 2) The CHAR bit in `encoding' shall _not_ be set when emitting types
>>>>     for char nor `unsigned char'.
>>>>
>>>
>>> Hmm...well.  At this time, I suggest we make a note of this in the btf.h 
>>> for posterity that BTF_INT_CHAR is to not be used (i.e., BTF_INT_CHAR 
>>> should not be set for char / unsigned char).
>>
>> Agreed it would be good to add this note.
> 
> Hmm, I am not sure such a comment actually belongs in include/btf.h,
> which is not specific to the compiler and is supposed to reflect the BTF
> format per se.  The CHAR bit is documented in the kernel documentation
> and it may be used at some point by libbpf, or who knows what.

OK you make a good point.

In that case the patch LGTM to commit. Thanks!

> 
> That's why I put the comment in btfout.cc instead, to make it clear that
> BTF_INT_CHAR is indeed not to be set for char / unsigned char by the
> compiler:
> 
>>>> +      /* In BTF the CHAR `encoding' seems to not be used, so clear it
>>>> +         here.  */
>>>> +      dtd->dtd_u.dtu_enc.cte_format &= ~BTF_INT_CHAR;
  
Jose E. Marchesi Aug. 2, 2022, 5:26 p.m. UTC | #5
> On 8/2/22 08:42, Jose E. Marchesi wrote:
>> 
>>> On 7/26/22 14:58, Indu Bhagat wrote:
>>>> On 7/22/22 4:23 AM, Jose E. Marchesi via Gcc-patches wrote:
>>>>>
>>>>> Contrary to CTF and our previous expectations, as per [1], turns out
>>>>> that in BTF:
>>>>>
>>>>> 1) The `encoding' field in integer types shall not be treated as a
>>>>>     bitmap, but as an enumerated, i.e. these bits are exclusive to each
>>>>>     other.
>>>>>
>>>>> 2) The CHAR bit in `encoding' shall _not_ be set when emitting types
>>>>>     for char nor `unsigned char'.
>>>>>
>>>>
>>>> Hmm...well.  At this time, I suggest we make a note of this in the btf.h 
>>>> for posterity that BTF_INT_CHAR is to not be used (i.e., BTF_INT_CHAR 
>>>> should not be set for char / unsigned char).
>>>
>>> Agreed it would be good to add this note.
>> 
>> Hmm, I am not sure such a comment actually belongs in include/btf.h,
>> which is not specific to the compiler and is supposed to reflect the BTF
>> format per se.  The CHAR bit is documented in the kernel documentation
>> and it may be used at some point by libbpf, or who knows what.
>
> OK you make a good point.
>
> In that case the patch LGTM to commit. Thanks!

Pushed to master.
Thanks!

>> 
>> That's why I put the comment in btfout.cc instead, to make it clear that
>> BTF_INT_CHAR is indeed not to be set for char / unsigned char by the
>> compiler:
>> 
>>>>> +      /* In BTF the CHAR `encoding' seems to not be used, so clear it
>>>>> +         here.  */
>>>>> +      dtd->dtd_u.dtu_enc.cte_format &= ~BTF_INT_CHAR;
  

Patch

diff --git a/gcc/btfout.cc b/gcc/btfout.cc
index 31af50521da..576f73d47cf 100644
--- a/gcc/btfout.cc
+++ b/gcc/btfout.cc
@@ -914,6 +914,10 @@  output_asm_btf_vlen_bytes (ctf_container_ref ctfc, ctf_dtdef_ref dtd)
       if (dtd->dtd_data.ctti_size < 1)
 	break;
 
+      /* In BTF the CHAR `encoding' seems to not be used, so clear it
+         here.  */
+      dtd->dtd_u.dtu_enc.cte_format &= ~BTF_INT_CHAR;
+
       encoding = BTF_INT_DATA (dtd->dtd_u.dtu_enc.cte_format,
 			       dtd->dtd_u.dtu_enc.cte_offset,
 			       dtd->dtd_u.dtu_enc.cte_bits);
diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-int-1.c b/gcc/testsuite/gcc.dg/debug/btf/btf-int-1.c
index 2381decd6ff..87d9758e9cb 100644
--- a/gcc/testsuite/gcc.dg/debug/btf/btf-int-1.c
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf-int-1.c
@@ -4,7 +4,8 @@ 
    | 0 | encoding | offset | 00 | bits |
    encoding:
      signed  1 << 24
-     char    2 << 24
+     char    2 << 24  (not used)
+     bool    4 << 24
 
    All offsets in this test should be 0.
    This test does _not_ check number of bits, as it may vary between targets.
@@ -13,13 +14,14 @@ 
 /* { dg-do compile } */
 /* { dg-options "-O0 -gbtf -dA" } */
 
-/* Check for 8 BTF_KIND_INT types.  */
-/* { dg-final { scan-assembler-times "\[\t \]0x1000000\[\t \]+\[^\n\]*btt_info" 8 } } */
+/* Check for 9 BTF_KIND_INT types.  */
+/* { dg-final { scan-assembler-times "\[\t \]0x1000000\[\t \]+\[^\n\]*btt_info" 9 } } */
 
-/* Check the signed/char flags, but not bit size. */
-/* { dg-final { scan-assembler-times "\[\t \]0x10000..\[\t \]+\[^\n\]*bti_encoding" 3 } } */
-/* { dg-final { scan-assembler-times "\[\t \]0x20000..\[\t \]+\[^\n\]*bti_encoding" 1 } } */
-/* { dg-final { scan-assembler-times "\[\t \]0x30000..\[\t \]+\[^\n\]*bti_encoding" 1 } } */
+/* Check the signed flags, but not bit size. */
+/* { dg-final { scan-assembler-times "\[\t \]0x10000..\[\t \]+\[^\n\]*bti_encoding" 4 } } */
+/* { dg-final { scan-assembler-times "\[\t \]0x..\[\t \]+\[^\n\]*bti_encoding" 3 } } */
+/* { dg-final { scan-assembler-times "\[\t \]0x.\[\t \]+\[^\n\]*bti_encoding" 1 } } */
+/* { dg-final { scan-assembler-times "\[\t \]0x40000..\[\t \]+\[^\n\]*bti_encoding" 1 } } */
 
 /* Check that there is a string entry for each type name.  */
 /* { dg-final { scan-assembler-times "ascii \"unsigned char.0\"\[\t \]+\[^\n\]*btf_string" 1 } } */
@@ -42,3 +44,5 @@  signed int f = -66;
 
 unsigned long int g = 77;
 signed long int h = 88;
+
+_Bool x = 1;
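
The layout comment and the patterns in the test above can be read back with a small decoder. This is a minimal sketch in C, assuming the field widths that the comment suggests (4 bits of encoding, 8 of offset, 8 of bits); unpack_int_word, encoding_is_emitted and the ENC_* names are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the BTF integer encodings.  */
enum { ENC_SIGNED = 1, ENC_CHAR = 2, ENC_BOOL = 4 };

struct int_fields
{
  uint32_t encoding;
  uint32_t offset;
  uint32_t bits;
};

/* Split a bti_encoding word laid out as
     | 0 | encoding | offset | 00 | bits |
   into its three fields.  */
static struct int_fields
unpack_int_word (uint32_t word)
{
  struct int_fields f;
  f.encoding = (word >> 24) & 0x0f;
  f.offset = (word >> 16) & 0xff;
  f.bits = word & 0xff;
  return f;
}

/* Treat `encoding' as an enumeration: after this patch only "none",
   signed and bool are expected; CHAR and any combination of bits
   are not.  */
static bool
encoding_is_emitted (uint32_t encoding)
{
  return encoding == 0 || encoding == ENC_SIGNED || encoding == ENC_BOOL;
}
```

For instance, unpacking 0x1000020 gives encoding 1 (signed), offset 0 and 32 bits, while a word such as 0x3000008, with both the signed and char bits set, is the kind of value the compiler no longer emits after this patch.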