[v4,0/6] LoongArch linker relaxation support.

Message ID 20230522013441.3074776-1-mengqinggang@loongson.cn

Message

mengqinggang May 22, 2023, 1:34 a.m. UTC
This is v4 of the patch series adding LoongArch linker relaxation support.
This version is mainly a rebase onto the master branch.

The binutils, GCC, glibc and SPEC2006 test cases all pass.

Currently, only instructions expanded from macros (la.local, la.global, etc.)
at assembly time can be relaxed, because GCC instruction scheduling produces
some special cases that relaxation cannot handle. GCC can be given the
-mno-explicit-relocs option to generate the macro instructions.

Two code sequences can be relaxed on LoongArch. The first is
"pcalau12i + addi.d", which can be relaxed to pcaddi. The second is
"pcalau12i + ld.d", which can be relaxed to "pcalau12i + addi.d" and then
relaxed once more to pcaddi. The pcaddi instruction can address a signed
22-bit, 4-byte-aligned offset relative to the PC.
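
To make the pcaddi reach concrete, here is a minimal Python sketch. It is not
taken from the patches: the helper name fits_pcaddi and the constants are
hypothetical, and it assumes the usual encoding in which pcaddi carries a
20-bit signed immediate that is shifted left by 2, which gives the signed
22-bit byte offset described above.

```python
# Illustrative sketch only; names are hypothetical and not part of binutils.

PCADDI_IMM_BITS = 20  # assumed width of the signed immediate field
PCADDI_SHIFT = 2      # immediate is scaled by 4 (4-byte granularity)


def fits_pcaddi(pc: int, target: int) -> bool:
    """Return True if `target` can be reached from `pc` by one pcaddi."""
    offset = target - pc
    # The low two bits cannot be encoded, so the offset must be 4-byte aligned.
    if offset % (1 << PCADDI_SHIFT) != 0:
        return False
    # 20 immediate bits plus 2 implied zero bits: a signed 22-bit byte offset.
    bits = PCADDI_IMM_BITS + PCADDI_SHIFT
    return -(1 << (bits - 1)) <= offset < (1 << (bits - 1))
```

Under these assumptions a symbol is a candidate for the pcaddi form only when
it lies within roughly +/-2 MiB of the instruction and the distance is a
multiple of 4.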

In the future, the TLS LE code sequence and function calls in the medium
code model will be relaxed too.

For the .align directive, some minor problems cannot be solved perfectly (see
http://maskray.me/blog/2021-03-14-the-dark-side-of-riscv-linker-relaxation).

The new relocations are documented here:
  https://github.com/loongson/LoongArch-Documentation/pull/77

mengqinggang (6):
  LoongArch: include: Add support for linker relaxation.
  LoongArch: bfd: Add support for linker relaxation.
  LoongArch: opcodes: Add support for linker relaxation.
  LoongArch: binutils: Add support for linker relaxation.
  LoongArch: gas: Add support for linker relaxation.
  LoongArch: ld: Add support for linker relaxation.

 bfd/bfd-in2.h                                 |   8 +
 bfd/elfnn-loongarch.c                         | 582 +++++++++++++--
 bfd/elfxx-loongarch.c                         | 676 +++++++++++++-----
 bfd/elfxx-loongarch.h                         |  10 +-
 bfd/libbfd.h                                  |   8 +
 bfd/reloc.c                                   |  22 +
 binutils/readelf.c                            |  84 ++-
 binutils/testsuite/binutils-all/readelf.exp   |  13 +-
 gas/config/tc-loongarch.c                     | 427 +++++++++--
 gas/config/tc-loongarch.h                     |  48 +-
 gas/testsuite/gas/all/align.d                 |   5 +-
 gas/testsuite/gas/all/gas.exp                 |  10 +-
 gas/testsuite/gas/all/relax.d                 |   4 +
 gas/testsuite/gas/elf/dwarf-5-irp.d           |   3 +-
 gas/testsuite/gas/elf/dwarf-5-loc0.d          |   3 +-
 gas/testsuite/gas/elf/dwarf-5-macro-include.d |   2 +-
 gas/testsuite/gas/elf/dwarf-5-macro.d         |   2 +-
 gas/testsuite/gas/elf/dwarf2-11.d             |   3 +-
 gas/testsuite/gas/elf/dwarf2-15.d             |   3 +-
 gas/testsuite/gas/elf/dwarf2-16.d             |   3 +-
 gas/testsuite/gas/elf/dwarf2-17.d             |   3 +-
 gas/testsuite/gas/elf/dwarf2-18.d             |   3 +-
 gas/testsuite/gas/elf/dwarf2-19.d             |   3 +-
 gas/testsuite/gas/elf/dwarf2-5.d              |   3 +-
 gas/testsuite/gas/elf/ehopt0.d                |   3 +
 gas/testsuite/gas/elf/elf.exp                 |   3 +
 gas/testsuite/gas/elf/section11.d             |   4 +-
 gas/testsuite/gas/lns/lns.exp                 |   1 +
 gas/testsuite/gas/loongarch/jmp_op.d          |  65 +-
 gas/testsuite/gas/loongarch/li.d              |   8 +-
 gas/testsuite/gas/loongarch/macro_op.d        |  68 +-
 gas/testsuite/gas/loongarch/macro_op_32.d     |  24 +-
 .../gas/loongarch/macro_op_large_abs.d        |  32 +-
 .../gas/loongarch/macro_op_large_pc.d         | 134 ++--
 gas/testsuite/gas/loongarch/relax_align.d     |  26 +
 gas/testsuite/gas/loongarch/relax_align.s     |   5 +
 gas/testsuite/gas/loongarch/uleb128.d         |  36 +
 gas/testsuite/gas/loongarch/uleb128.s         |  20 +
 include/elf/loongarch.h                       |  20 +
 include/opcode/loongarch.h                    |   3 +
 ld/emultempl/loongarchelf.em                  |   3 +
 ld/testsuite/ld-elf/compressed1d.d            |   3 +
 ld/testsuite/ld-elf/pr26936.d                 |   4 +-
 ld/testsuite/ld-loongarch-elf/disas-jirl-32.d |   2 +
 ld/testsuite/ld-loongarch-elf/disas-jirl.d    |   4 +-
 ld/testsuite/ld-loongarch-elf/jmp_op.d        |  65 +-
 ld/testsuite/ld-loongarch-elf/macro_op.d      |  84 ++-
 ld/testsuite/ld-loongarch-elf/macro_op_32.d   |  24 +-
 ld/testsuite/ld-loongarch-elf/relax-align.dd  |   7 +
 ld/testsuite/ld-loongarch-elf/relax-align.s   |   9 +
 ld/testsuite/ld-loongarch-elf/relax.exp       |  73 ++
 ld/testsuite/ld-loongarch-elf/relax.s         |  16 +
 ld/testsuite/ld-loongarch-elf/uleb128.dd      |  10 +
 ld/testsuite/ld-loongarch-elf/uleb128.s       |  21 +
 opcodes/loongarch-opc.c                       |   5 +-
 55 files changed, 2189 insertions(+), 521 deletions(-)
 create mode 100644 gas/testsuite/gas/loongarch/relax_align.d
 create mode 100644 gas/testsuite/gas/loongarch/relax_align.s
 create mode 100644 gas/testsuite/gas/loongarch/uleb128.d
 create mode 100644 gas/testsuite/gas/loongarch/uleb128.s
 create mode 100644 ld/testsuite/ld-loongarch-elf/relax-align.dd
 create mode 100644 ld/testsuite/ld-loongarch-elf/relax-align.s
 create mode 100644 ld/testsuite/ld-loongarch-elf/relax.exp
 create mode 100644 ld/testsuite/ld-loongarch-elf/relax.s
 create mode 100644 ld/testsuite/ld-loongarch-elf/uleb128.dd
 create mode 100644 ld/testsuite/ld-loongarch-elf/uleb128.s
  

Comments

Xi Ruoyao May 22, 2023, 5:40 a.m. UTC | #1
On Mon, 2023-05-22 at 09:34 +0800, mengqinggang wrote:
> This is v4 of the patch series adding LoongArch linker relaxation support.
> This version is mainly a rebase onto the master branch.
> 
> The binutils, GCC, glibc and SPEC2006 test cases all pass.

Have you tried the kernel and GRUB?  AFAIK they are the most "fragile"
packages with regard to this kind of optimization.  The RISC-V ports of
them use -mno-relax.

> Currently, only instructions expanded from macros (la.local, la.global, etc.)
> at assembly time can be relaxed, because GCC instruction scheduling produces
> some special cases that relaxation cannot handle. GCC can be given the
> -mno-explicit-relocs option to generate the macro instructions.

I guess -fsection-anchors (enabled by default at any optimization level
except -O0) can also affect this.  Maybe we should change GCC to use
-mno-explicit-relocs, and maybe -fno-section-anchors, for -Os then.  For
-O1/-O2/-O3 the benefit of scheduling is more important on a modern CPU.

> Two code sequences can be relaxed on LoongArch. The first is
> "pcalau12i + addi.d", which can be relaxed to pcaddi. The second is
> "pcalau12i + ld.d", which can be relaxed to "pcalau12i + addi.d" and then
> relaxed once more to pcaddi. The pcaddi instruction can address a signed
> 22-bit, 4-byte-aligned offset relative to the PC.
> 
> In the future, the TLS LE code sequence and function calls in the medium
> code model will be relaxed too.
> 
> For the .align directive, some minor problems cannot be solved perfectly (see
> http://maskray.me/blog/2021-03-14-the-dark-side-of-riscv-linker-relaxation).
> 
> The new relocations are documented here:
>   https://github.com/loongson/LoongArch-Documentation/pull/77

But the repo is archived, so any PR in it should be considered dead.  May
I "hijack" the discussion to ask about the rationale for archiving it?  Note
that if you mean "it's stable and should not be changed w/o a major
update", you should create a tag and a release instead of archiving it.
Archiving basically means "the repo is dead or moved elsewhere".
  
mengqinggang May 22, 2023, 8:14 a.m. UTC | #2
The new ABI document is currently being organized, and there will be a
new repo that stores only the ABI document.


  
Andreas Schwab May 22, 2023, 8:18 a.m. UTC | #3
On May 22 2023, Xi Ruoyao via Binutils wrote:

> Have you tried the kernel and GRUB?  AFAIK they are the most "fragile"
> packages with regard to this kind of optimization.  The RISC-V ports of
> them use -mno-relax.

GRUB does not use -mno-relax (it just ignores R_RISCV_RELAX).
  
mengqinggang May 22, 2023, 10:04 a.m. UTC | #4
As for -fsection-anchors, I think it is less affected by relaxation.  Could
you please give a specific example of the problem?


On the LoongArch architecture, -mno-explicit-relocs may give better
performance than -mexplicit-relocs for some large programs.


  
Xi Ruoyao May 22, 2023, 11:04 a.m. UTC | #5
On Mon, 2023-05-22 at 18:04 +0800, mengqinggang wrote:
> As for -fsection-anchors, I think it is less affected by relaxation.  Could
> you please give a specific example of the problem?

Alright, it seems I'd misunderstood how -fsection-anchors behaves with
-mno-explicit-relocs.  It generates things like:

la.pcrel t0, .ANCHOR0 + 8
la.pcrel t1, .ANCHOR0 + 16

Not

la.pcrel t2, .ANCHOR0
addi.d t0, t2, 8
addi.d t1, t2, 16

which may puzzle the relaxation pass.

> On the LoongArch architecture, -mno-explicit-relocs may give better
> performance than -mexplicit-relocs for some large programs.

I'd consider this situation "bad", as a distro maintainer will need to
decide this on a per-package or even per-link-unit basis if (s)he really
wants to squeeze out the last drop of performance...  Is there any
possibility of making both scheduling and relaxation work?
  
mengqinggang May 23, 2023, 2:04 a.m. UTC | #6
With -mexplicit-relocs, two addi.d instructions may share one pcalau12i
after GCC scheduling, like this:

     pcalau12i $t0, %pc_hi20(a)
     beq $t1, $t2, L1
     addi.d $t0, $t0, %pc_lo12(a)
L1:
     addi.d $t0, $t0, %pc_lo12(a)

If the pcalau12i and the first addi.d are relaxed to a single pcaddi, the
last addi.d would compute a wrong address.
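
The failure can be reproduced with a small Python model of the three
instructions. This is only a sketch under stated assumptions: the function
names, the addresses PC and A, and the simplified %pc_hi20/%pc_lo12
arithmetic are illustrative, and the "naive relaxation" shown (pcalau12i
rewritten to pcaddi, first addi.d dropped, second addi.d left in place) is an
assumed scenario, not the logic of the actual patches.

```python
# Illustrative model only; names and addresses are hypothetical.

def sext(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of `value`."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def pcalau12i(pc: int, si20: int) -> int:
    # Add the page delta to PC, then clear the low 12 bits.
    return (pc + (sext(si20, 20) << 12)) & ~0xFFF


def addi_d(rj: int, si12: int) -> int:
    return rj + sext(si12, 12)


def pcaddi(pc: int, si20: int) -> int:
    return pc + (sext(si20, 20) << 2)


def pc_hi20(pc: int, sym: int) -> int:
    # Simplified: page delta, rounded so the signed low 12 bits land on sym.
    return (((sym + 0x800) >> 12) - (pc >> 12)) & 0xFFFFF


def pc_lo12(sym: int) -> int:
    return sym & 0xFFF


PC = 0x10000  # address of the pcalau12i (assumed)
A = 0x12344   # address of symbol `a` (assumed)

# Unrelaxed: either addi.d completes the address correctly.
t0 = pcalau12i(PC, pc_hi20(PC, A))
unrelaxed = addi_d(t0, pc_lo12(A))      # 0x12344

# Naive relaxation: $t0 already holds `a` after pcaddi, but the addi.d at L1
# still adds %pc_lo12(a) on top of it.
t0 = pcaddi(PC, (A - PC) >> 2)
after_relax = addi_d(t0, pc_lo12(A))    # 0x12688, not the address of `a`
```

In this model the path that reaches L1 ends up 0x344 bytes past `a`, which is
why a pcalau12i shared by several low-part instructions cannot be relaxed
pair by pair.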


I guess RISC-V function call relaxation faces the same question.  RISC-V
function calls use the 'call' instruction with both -mexplicit-relocs and
-mno-explicit-relocs, and the R_RISCV_CALL relocation covers 8 bytes, so
it can directly process the two instructions expanded from the 'call'
instruction.


  
mengqinggang May 24, 2023, 9:58 a.m. UTC | #7
There are some "unsupported relocation" errors when building GRUB, because
GRUB currently does not support the relax/b16/b21/add/sub relocations.


For the kernel, I only did a simple test.  It compiles correctly, and the
OS boots normally.

