[v2,0/5] LoongArch tls le model linker relaxation support.

Message ID 20231202065334.25904-1-changjiachen@stu.xupt.edu.cn

Message

changjiachen Dec. 2, 2023, 6:53 a.m. UTC
  This is v2 of the patch series adding linker relaxation support for the LoongArch TLS LE model.

Changes from v1:

* Revised part of the explanatory text in v1-0000-cover-letter.patch.

Before the change:

example: __thread int a = 1;

old insn sequence:

lu12i.w $r12,%le_hi20_r(a)
ori     $r12,$r12,%le_lo12_r(a)
add.d   $r12,$r12,$r2,%le_add_r(a)
li.w  	$r13,$r0,1
stptr.w $r13,$r12,0

new insn sequence:

lu12i.w $r12,%le_hi20_r(a)
add.d   $r12,$r12,$r2,%le_add_r(a)
li.w  	$r13,$r0,1
st.w    $r13,$r12,%le_lo12_r(a)

After the change:

example: __thread int a = 1;

old insn sequence (at the -O0 optimization level):

lu12i.w $r12,%le_hi20(a)
ori     $r12,$r12,%le_lo12(a)
add.d   $r12,$r12,$r2
addi.w  $r13,$r0,1
stptr.w $r13,$r12,0

new insn sequence (at the -O0 optimization level):

lu12i.w $r12,%le_hi20_r(a)
add.d   $r12,$r12,$r2,%le_add_r(a)
addi.w  $r13,$r0,1
st.w    $r13,$r12,%le_lo12_r(a)
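
To make the difference between the two sequences concrete, here is a minimal C sketch of the address arithmetic. It is not taken from the patches: the function names are hypothetical, and it assumes that %le_hi20_r rounds the high part up when bit 11 of the offset is set, so that the sign-extended 12-bit immediate of st.w lands on the same address.

```c
#include <stdint.h>

/* Old sequence: lu12i.w + ori build the full offset, add.d adds $tp,
   and stptr.w stores with immediate 0. */
static int64_t le_addr_old(int64_t tp, int32_t off)
{
    uint32_t u = (uint32_t)off;
    uint32_t hi20 = u >> 12;                      /* %le_hi20(a) */
    uint32_t lo12 = u & 0xfffu;                   /* %le_lo12(a), zero-extended by ori */
    int32_t r12 = (int32_t)((hi20 << 12) | lo12); /* lu12i.w + ori */
    return tp + r12;                              /* add.d */
}

/* New sequence: lu12i.w loads only the (assumed rounded) high part,
   add.d adds $tp, and st.w supplies the low 12 bits as a signed immediate.
   Offsets within 0x800 of INT32_MAX are outside the scope of this sketch. */
static int64_t le_addr_new(int64_t tp, int32_t off)
{
    uint32_t u = (uint32_t)off;
    uint32_t hi20 = ((u + 0x800u) >> 12) & 0xfffffu; /* assumed %le_hi20_r(a) */
    int32_t lo12 = (int32_t)(u & 0xfffu);            /* %le_lo12_r(a) */
    if (lo12 >= 0x800)
        lo12 -= 0x1000;                              /* st.w sign-extends si12 */
    int32_t r12 = (int32_t)(hi20 << 12);             /* lu12i.w */
    return tp + r12 + lo12;                          /* add.d, then st.w base + imm */
}
```

Under these assumptions both functions return the same address for any offset that is not within 0x800 of INT32_MAX; the new form simply moves the low 12 bits from a separate ori into the store itself.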

changjiachen (5):
  LoongArch: bfd: Add support for tls le relax.
  LoongArch: include: Add support for tls le relax.
  LoongArch: opcodes: Add support for tls le relax.
  LoongArch: gas: Add support for tls le relax.
  LoongArch: ld: Add support for tls le relax.

 bfd/bfd-in2.h                                 |   4 +
 bfd/elfnn-loongarch.c                         |  74 +++++++++
 bfd/elfxx-loongarch.c                         |  50 ++++++
 bfd/libbfd.h                                  |   3 +
 bfd/reloc.c                                   |   6 +
 gas/config/tc-loongarch.c                     |  12 +-
 gas/testsuite/gas/loongarch/reloc.d           |  18 +++
 gas/testsuite/gas/loongarch/reloc.s           |  11 ++
 include/elf/loongarch.h                       |  13 ++
 ld/testsuite/ld-loongarch-elf/old-tls-le.s    |  19 +++
 .../relax-bound-check-tls-le.s                |  48 ++++++
 .../ld-loongarch-elf/relax-check-tls-le.s     |  43 ++++++
 ld/testsuite/ld-loongarch-elf/relax-tls-le.s  |  17 ++
 ld/testsuite/ld-loongarch-elf/relax.exp       | 146 +++++++++++++++++-
 .../tls-relax-compatible-check-old.s          |  39 +++++
 opcodes/loongarch-opc.c                       |   1 +
 16 files changed, 501 insertions(+), 3 deletions(-)
 create mode 100644 ld/testsuite/ld-loongarch-elf/old-tls-le.s
 create mode 100644 ld/testsuite/ld-loongarch-elf/relax-bound-check-tls-le.s
 create mode 100644 ld/testsuite/ld-loongarch-elf/relax-check-tls-le.s
 create mode 100644 ld/testsuite/ld-loongarch-elf/relax-tls-le.s
 create mode 100644 ld/testsuite/ld-loongarch-elf/tls-relax-compatible-check-old.s
  

Comments

Jinyang He Dec. 4, 2023, 2:25 a.m. UTC | #1
On 2023-12-02 14:53, changjiachen wrote:
> This is the v2 version of patches to support loongarch linker tls le model relax.
>
> Changes from v1:
>
> * Modified v1-0000-cover-letter.patch part of the explanatory content.
>
> Before Modify:
>
> example: __thread int a = 1;
>
> old insn sequence:
>
> lu12i.w $r12,%le_hi20_r(a)
> ori     $r12,$r12,%le_lo12_r(a)
> add.d   $r12,$r12,$r2,%le_add_r(a)
> li.w  	$r13,$r0,1
> stptr.w $r13,$r12,0
>
> new insn sequence:
>
> lu12i.w $r12,%le_hi20_r(a)
> add.d   $r12,$r12,$r2,%le_add_r(a)
> li.w  	$r13,$r0,1
> st.w    $r13,$r12,%le_lo12_r(a)
>
> After Modify:
>
> example: __thread int a = 1;
>
> old insn sequence(at the O0 optimization level):

If the sequence appears only at -O0, is it worth optimizing by relaxation?


>
> lu12i.w $r12,%le_hi20(a)
> ori     $r12,$r12,%le_lo12(a)
> add.d   $r12,$r12,$r2
> addi.w  $r13,$r0,1
> stptr.w $r13,$r12,0
>
> new insn sequence(at the O0 optimization level):
>
> lu12i.w $r12,%le_hi20_r(a)
> add.d   $r12,$r12,$r2,%le_add_r(a)
And here, if the sequence appears at other optimization levels, will the
register value ($r12) being different between the old sequence and the
new sequence cause other problems, e.g. a worse sequence? Have you tried
this relaxation at other optimization levels?


Thanks.

> addi.w  $r13,$r0,1
> st.w    $r13,$r12,%le_lo12_r(a)
>
  
changjiachen Dec. 4, 2023, 3:39 a.m. UTC | #2
The sequences above are only a simple illustration at -O0.
The relaxation also works with -O2 and -O3 enabled.

example:
test.c:
__thread int count1;
int main(){
    count1 = 1;
}
(-O2, without relaxation)
0000000120000480 <main>:
   120000480: 1400000c lu12i.w     $t0, 0
   120000484: 0280040d li.w        $t1, 1
   120000488: 0010898c add.d       $t0, $t0, $tp
   12000048c: 00150004 move        $a0, $zero
   120000490: 2980018d st.w        $t1, $t0, 0
   120000494: 4c000020 ret         



(-O2, with relaxation)
0000000120000480 <main>:
   120000480: 0280040d li.w        $t1, 1
   120000484: 00150004 move        $a0, $zero
   120000488: 2980004d st.w        $t1, $tp, 0
   12000048c: 4c000020 ret         



As you can see, with -O2 the compiler reorders the instructions, but the
relaxation still applies: the lu12i.w and add.d are removed, the store uses $tp
directly as its base, and the address computed for the TLS variable count1 is
the same before and after relaxation. The result with -O3 is similar to -O2.
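
The rewrite of the store in the two dumps above can be sketched in C as follows. This is only an illustration of the idea, not the code in bfd/elfnn-loongarch.c: the helper names are hypothetical, and the range check assumes the relaxation is valid exactly when the TP-relative offset fits the signed 12-bit immediate of st.w.

```c
#include <stdbool.h>
#include <stdint.h>

enum { REG_TP = 2 };  /* $tp is $r2 */

/* True if the TP-relative offset fits a signed 12-bit immediate. */
static bool tls_le_offset_fits_si12(int64_t off)
{
    return off >= -2048 && off <= 2047;
}

/* Replace the rj (base register) field, bits 9..5, of the insn. */
static uint32_t insn_set_rj(uint32_t insn, unsigned reg)
{
    return (insn & ~(0x1fu << 5)) | ((reg & 0x1fu) << 5);
}

/* Replace the si12 immediate field, bits 21..10, of the insn. */
static uint32_t insn_set_si12(uint32_t insn, int32_t imm)
{
    return (insn & ~(0xfffu << 10)) | (((uint32_t)imm & 0xfffu) << 10);
}

/* If the offset is small enough, rewrite "st.w rd, rj, lo12" into
   "st.w rd, $tp, off"; the caller would then delete the lu12i.w and
   add.d. Otherwise return the insn unchanged. */
static uint32_t relax_tls_le_store(uint32_t st_insn, int64_t off)
{
    if (!tls_le_offset_fits_si12(off))
        return st_insn;
    return insn_set_si12(insn_set_rj(st_insn, REG_TP), (int32_t)off);
}
```

With the encodings from the dumps, relax_tls_le_store(0x2980018d, 0) gives 0x2980004d, i.e. st.w $t1, $t0, 0 becomes st.w $t1, $tp, 0.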







  
Jinyang He Dec. 4, 2023, 8:57 a.m. UTC | #3
On 2023-12-04 11:39, 常佳琛 wrote:
> The above is a simple explanation of the O0 optimization,
> which is currently available with O2 and O3 turned on.
>
> example:
> test.c:
> __thread int count1;
> int main(){
>     count1 = 1;
> }
> (Enable O2 option and no relax)
> 0000000120000480 <main>:
>    120000480:1400000c lu12i.w     $t0, 0
>    120000484:0280040d li.w $t1, 1
>    120000488:0010898c add.d       $t0, $t0, $tp
>    12000048c:00150004 move        $a0, $zero
>    120000490:2980018d st.w $t1, $t0, 0
>    120000494:4c000020 ret
>
> (Enable O2 option and relax)
> 0000000120000480 <main>:
>    120000480:0280040d li.w $t1, 1
>    120000484:00150004 move        $a0, $zero
>    120000488:2980004d st.w $t1, $tp, 0
>    12000048c:4c000020 ret
>
> As you can see, with the O2 option turned on, the order of 
> instructions changes,
> but the relax optimization is still not affected, and the address 
> calculation of the
> tls variable count1 is correct before and after optimization. The 
> situation of enabling
> O3 is similar to that of enabling O2.
>
>

How can I get your gcc (or the patches)? I tried comparing accesses to a
thread-local variable and a non-thread-local variable with the old gcc.

Condition:
__thread int a;
int b;
extern int foo(int *);

Comparison with the old gcc:

a = 1;                                 b = 1;

lu12i.w $r12,%le_hi20(a)               pcalau12i $r12,%pc_hi20(b)
ori     $r12,$r12,%le_lo12(a)
addi.w  $r13,$r0,1                     addi.w  $r13,$r0,1
stx.w   $r13,$r12,$r2                  st.w    $r13,$r12,%pc_lo12(b)


a = 1; return foo(&a);                 b = 1; return foo(&b);

lu12i.w $r12,%le_hi20(a)               pcalau12i $r4,%pc_hi20(b)
ori     $r12,$r12,%le_lo12(a)          addi.d  $r4,$r4,%pc_lo12(b)
addi.w  $r13,$r0,1                     addi.w  $r12,$r0,1
add.d   $r4,$r12,$r2
stx.w   $r13,$r12,$r2                  stptr.w $r12,$r4,0
b       %plt(foo)                      b       %plt(foo)

My worry is the case where we need the address of the thread-local
variable after accessing it, which may produce a worse sequence with your
gcc. For the non-thread-local variable, gcc loads the address into a
register first and then accesses the variable through that register. How
does your gcc handle this case?
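
For anyone who wants to reproduce the comparison, the four cases can be collected into one C file along the following lines. The function names are hypothetical, and foo() is given a stand-in body here only so that the file links on its own; in the comparison above it is an external function.

```c
__thread int a;   /* thread-local variable, LE model when linked statically */
int b;            /* ordinary global variable */

/* Stand-in for the external foo(); the body is an assumption. */
__attribute__((noinline)) int foo(int *p)
{
    return *p + 1;
}

/* Plain stores: only the store itself needs the address. */
int store_tls(void)    { a = 1; return a; }
int store_global(void) { b = 1; return b; }

/* Store followed by taking the address: the address must also end up
   in a register, which is the case in question. */
int store_then_addr_tls(void)    { a = 1; return foo(&a); }
int store_then_addr_global(void) { b = 1; return foo(&b); }
```

Compiling this with -O2 -S using the old and the patched gcc and comparing store_then_addr_tls in both outputs should show whether the new relocations force a second address computation.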


  
changjiachen Dec. 4, 2023, 9:25 a.m. UTC | #4
From: Jinyang He <hejinyang@loongson.cn>
Date: 2023-12-04 16:57:55
To: "常佳琛" <changjiachen@stu.xupt.edu.cn>,binutils@sourceware.org
Cc: xuchenghua@loongson.cn,chenglulu@loongson.cn,liuzhensong@loongson.cn,xry111@xry111.site,i.swmail@xen0n.name,maskray@google.com,cailulu@loongson.cn,luweining@loongson.cn,wanglei@loongson.cn,Lazy_Linux@126.com,mengqinggang@loongson.cn
Subject: Re: [PATCH v2 0/5] LoongArch tls le model linker relaxation support.
>On 2023-12-04 11:39, 常佳琛 wrote:
>> The above is a simple explanation of the O0 optimization,
>> which is currently available with O2 and O3 turned on.
>>
>> example:
>> test.c:
>> __thread int count1;
>> int main(){
>>     count1 = 1;
>> }
>> (Enable O2 option and no relax)
>> 0000000120000480 <main>:
>>    120000480:1400000c lu12i.w     $t0, 0
>>    120000484:0280040d li.w $t1, 1
>>    120000488:0010898c add.d       $t0, $t0, $tp
>>    12000048c:00150004 move        $a0, $zero
>>    120000490:2980018d st.w $t1, $t0, 0
>>    120000494:4c000020 ret
>>
>> (Enable O2 option and relax)
>> 0000000120000480 <main>:
>>    120000480:0280040d li.w $t1, 1
>>    120000484:00150004 move        $a0, $zero
>>    120000488:2980004d st.w $t1, $tp, 0
>>    12000048c:4c000020 ret
>>
>> As you can see, with the O2 option turned on, the order of 
>> instructions changes,
>> but the relax optimization is still not affected, and the address 
>> calculation of the
>> tls variable count1 is correct before and after optimization. The 
>> situation of enabling
>> O3 is similar to that of enabling O2.
>>
>>
>
>How can I get your gcc (or patches)? I tried to compare access to 
>non-thread var with old gcc.
Reply:
There are still some issues on the gcc side that need to be worked out.
The gcc patch will be posted on Tuesday or Wednesday of this week, so you may have to wait a while.


changjiachen
>
>Condition:
>__thread int a;
>int b;
>extern int foo(int *);
>
>Compare in old gcc:
>
>a = 1;                                 b = 1;
>
>lu12i.w $r12,%le_hi20(a)               pcalau12i $r12,%pc_hi20(b)
>ori     $r12,$r12,%le_lo12(a)
>addi.w  $r13,$r0,1                     addi.w  $r13,$r0,1
>stx.w   $r13,$r12,$r2                  st.w    $r13,$r12,%pc_lo12(b)
>
>
>a = 1; return foo(&a);                 b = 1; return foo(&b);
>
>lu12i.w $r12,%le_hi20(a)               pcalau12i $r4,%pc_hi20(b)
>ori     $r12,$r12,%le_lo12(a)          addi.d  $r4,$r4,%pc_lo12(b)
>addi.w  $r13,$r0,1                     addi.w  $r12,$r0,1
>add.d   $r4,$r12,$r2
>stx.w   $r13,$r12,$r2                  stptr.w $r12,$r4,0
>b       %plt(foo)                      b       %plt(foo)
>
>I worry about this case we need the address of the thread-var after
>accessing it, which may cause worse sequence in your gcc. For the
>non-thread-var it load the address to a register first and then
>access it by that register. How about your gcc handle this case?
>
>


  
Xi Ruoyao Dec. 4, 2023, 9:37 a.m. UTC | #5
On Mon, 2023-12-04 at 17:25 +0800, 常佳琛 wrote:
> > How can I get your gcc (or patches)? I tried to compare access to 
> > non-thread var with old gcc.
> Reply :
> There are still some issues with gcc that need to be worked out.
> As for gcc patch, it will be shipped on Tuesday or Wednesday of this
> week, you may have to wait for a while.

Let's not add huge chunks of new features to GCC at the moment, because
we are in stage 3 (general bugfixing) of GCC 14 development.  We should
concentrate on fixing bugs and avoid potentially introducing new ones.

You may still post the GCC patch for a review though.
  
Xi Ruoyao Dec. 4, 2023, 9:42 a.m. UTC | #6
On Mon, 2023-12-04 at 17:37 +0800, Xi Ruoyao wrote:
> On Mon, 2023-12-04 at 17:25 +0800, 常佳琛 wrote:
> > > How can I get your gcc (or patches)? I tried to compare access to 
> > > non-thread var with old gcc.
> > Reply :
> > There are still some issues with gcc that need to be worked out.
> > As for gcc patch, it will be shipped on Tuesday or Wednesday of this
> > week, you may have to wait for a while.
> 
> Let's not add huge thunks of new features into GCC at the moment because
> we are in stage 3 (general bugfixing) of GCC 14 development.  So we
> should concentrate on fixing bugs and avoid from potentially introducing
> new bugs.
> 
> You may still post the GCC patch for a review though.

FWIW if you want this for GCC 14, IMO you can just remove SYMBOL_TLS_LE
from loongarch_explicit_relocs_p in GCC (so GCC will always generate
la.tls.le instead of the real instruction sequence to load the address of
an LE TLS symbol, unless -mexplicit-relocs=always is given) and expand
la.tls.le as you wish in GAS.  This would be a one-line change in GCC, and
it is more acceptable than a huge diff in stage 3.