arm64: enable dead code elimination

Message ID 20230717080739.1000460-1-wangkefeng.wang@huawei.com
State New

Commit Message

Kefeng Wang July 17, 2023, 8:07 a.m. UTC
Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for arm64, allowing the
user to enable dead code elimination. For this to work, annotate the
necessary tables with KEEP in the linker script and use wildcards so that
compiler-generated sections land in the right place.

The following comparison is based on 6.5-rc2 with defconfig:

$ ./scripts/bloat-o-meter vmlinux-base vmlinux-new
add/remove: 3/1106 grow/shrink: 4102/6964 up/down: 35704/-99980 (-64276)
Function                                     old     new   delta
...
Total: Before=17888959, After=17824683, chg -0.36%

add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-44 (-44)
Data                                         old     new   delta
...
Total: Before=4820808, After=4820764, chg -0.00%

add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-1096 (-1096)
RO Data                                      old     new   delta
...
Total: Before=5179123, After=5178027, chg -0.02%

$ size vmlinux-base vmlinux-new
   text	   data	     bss      dec       hex	filename
25433734  15385766  630656  41450156  2787aac	vmlinux-base
24756738  15360870  629888  40747496  26dc1e8	vmlinux-new

Memory available after booting on qemu (704K saved):
base: 8084532K/8388608K
new:  8085236K/8388608K

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 arch/arm64/Kconfig              | 1 +
 arch/arm64/kernel/vmlinux.lds.S | 5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)
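
The size(1) and boot-memory figures in the commit message can be sanity-checked with a few lines of Python. This is only an arithmetic sketch over the numbers quoted above; nothing here is newly measured, and the helper name `pct` is made up for illustration.

```python
# Figures copied from the commit message; nothing here is measured.
base = {"text": 25433734, "data": 15385766, "bss": 630656, "dec": 41450156}
new = {"text": 24756738, "data": 15360870, "bss": 629888, "dec": 40747496}


def pct(part, whole):
    """Return part/whole as a percentage (hypothetical helper)."""
    return 100.0 * part / whole


# Per-column savings reported by size(1), in bytes.
delta = {k: base[k] - new[k] for k in base}
for k, v in delta.items():
    print(f"{k:>4}: -{v} bytes ({pct(v, base[k]):.2f}%)")

# Memory available after boot, in K, out of 8388608K total.
mem_base_k, mem_new_k, mem_total_k = 8084532, 8085236, 8388608
saved_k = mem_new_k - mem_base_k
print(f"boot: +{saved_k}K available ({pct(saved_k, mem_total_k):.4f}% of RAM)")
```

The text, data and bss savings add up to the `dec` difference of 702660 bytes, and the 704K gained at boot is roughly 0.008% of the 8388608K total.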
  

Comments

Will Deacon July 17, 2023, 9:24 a.m. UTC | #1
On Mon, Jul 17, 2023 at 04:07:39PM +0800, Kefeng Wang wrote:
> Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for arm64, allowing the
> user to enable dead code elimination. In order for this to work, ensure
> that we keep the necessary tables by annotating them with KEEP, also it
> requires further changes to linker script to KEEP some tables and wildcard
> compiler generated sections into the right place.
> 
> The following comparison is based 6.5-rc2 with defconfig,
> 
> $ ./scripts/bloat-o-meter vmlinux-base vmlinux-new
> add/remove: 3/1106 grow/shrink: 4102/6964 up/down: 35704/-99980 (-64276)
> Function                                     old     new   delta
> ...
> Total: Before=17888959, After=17824683, chg -0.36%
> 
> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-44 (-44)
> Data                                         old     new   delta
> ...
> Total: Before=4820808, After=4820764, chg -0.00%
> 
> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-1096 (-1096)
> RO Data                                      old     new   delta
> ...
> Total: Before=5179123, After=5178027, chg -0.02%
> 
> $ size vmlinux-base vmlinux
>    text	   data	     bss      dec       hex	filename
> 25433734  15385766  630656  41450156  2787aac	vmlinux-base
> 24756738  15360870  629888  40747496  26dc1e8	vmlinux-new
> 
> Memory available after booting, saving 704k on qemu,
> base: 8084532K/8388608K
> new:  8085236K/8388608K

Is that a 0.009% improvement? Is it really worth the hassle?

x86 doesn't select this and risc-v had to turn it off for LLD, so it feels
like we're just creating a rod for our own back by selecting it.

Will

> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  arch/arm64/Kconfig              | 1 +
>  arch/arm64/kernel/vmlinux.lds.S | 5 +++--
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a2511b30d0f6..73bb908ec62f 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -148,6 +148,7 @@ config ARM64
>  	select GENERIC_VDSO_TIME_NS
>  	select HARDIRQS_SW_RESEND
>  	select HAS_IOPORT
> +	select HAVE_LD_DEAD_CODE_DATA_ELIMINATION
>  	select HAVE_MOVE_PMD
>  	select HAVE_MOVE_PUD
>  	select HAVE_PCI
> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index 3cd7e76cc562..bb4ce6cd6896 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -238,7 +238,7 @@ SECTIONS
>  	. = ALIGN(4);
>  	.altinstructions : {
>  		__alt_instructions = .;
> -		*(.altinstructions)
> +		KEEP(*(.altinstructions))
>  		__alt_instructions_end = .;
>  	}
>  
> @@ -258,8 +258,9 @@ SECTIONS
>  		INIT_CALLS
>  		CON_INITCALL
>  		INIT_RAM_FS
> -		*(.init.altinstructions .init.bss)	/* from the EFI stub */
> +		KEEP(*(.init.altinstructions .init.bss*))	/* from the EFI stub */
>  	}
> +
>  	.exit.data : {
>  		EXIT_DATA
>  	}
> -- 
> 2.27.0
>
  
Marc Zyngier July 17, 2023, 9:42 a.m. UTC | #2
On 2023-07-17 09:07, Kefeng Wang wrote:
> Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for arm64, allowing 
> the
> user to enable dead code elimination. In order for this to work, ensure
> that we keep the necessary tables by annotating them with KEEP, also it
> requires further changes to linker script to KEEP some tables and 
> wildcard
> compiler generated sections into the right place.
> 
> The following comparison is based 6.5-rc2 with defconfig,
> 
> $ ./scripts/bloat-o-meter vmlinux-base vmlinux-new
> add/remove: 3/1106 grow/shrink: 4102/6964 up/down: 35704/-99980 
> (-64276)
> Function                                     old     new   delta
> ...
> Total: Before=17888959, After=17824683, chg -0.36%
> 
> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-44 (-44)
> Data                                         old     new   delta
> ...
> Total: Before=4820808, After=4820764, chg -0.00%
> 
> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-1096 (-1096)
> RO Data                                      old     new   delta
> ...
> Total: Before=5179123, After=5178027, chg -0.02%
> 
> $ size vmlinux-base vmlinux
>    text	   data	     bss      dec       hex	filename
> 25433734  15385766  630656  41450156  2787aac	vmlinux-base
> 24756738  15360870  629888  40747496  26dc1e8	vmlinux-new
> 
> Memory available after booting, saving 704k on qemu,
> base: 8084532K/8388608K
> new:  8085236K/8388608K
> 
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>

I took this patch for a spin in my tree, and ended up with:

   CC      .vmlinux.export.o
   UPD     include/generated/utsversion.h
   CC      init/version-timestamp.o
   LD      .tmp_vmlinux.kallsyms1
ld: init/main.o(__patchable_function_entries): error: need linked-to section for --gc-sections
make[2]: *** [scripts/Makefile.vmlinux:36: vmlinux] Error 1
make[1]: *** [/home/maz/hot-poop/arm-platforms/Makefile:1238: vmlinux] Error 2
make: *** [Makefile:234: __sub-make] Error 2

so it's probably not ready for prime time.

         M.
  
Kefeng Wang July 17, 2023, 11:20 a.m. UTC | #3
On 2023/7/17 17:24, Will Deacon wrote:
> On Mon, Jul 17, 2023 at 04:07:39PM +0800, Kefeng Wang wrote:
>> Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for arm64, allowing the
>> user to enable dead code elimination. In order for this to work, ensure
>> that we keep the necessary tables by annotating them with KEEP, also it
>> requires further changes to linker script to KEEP some tables and wildcard
>> compiler generated sections into the right place.
>>
>> The following comparison is based 6.5-rc2 with defconfig,
>>
>> $ ./scripts/bloat-o-meter vmlinux-base vmlinux-new
>> add/remove: 3/1106 grow/shrink: 4102/6964 up/down: 35704/-99980 (-64276)
>> Function                                     old     new   delta
>> ...
>> Total: Before=17888959, After=17824683, chg -0.36%
>>
>> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-44 (-44)
>> Data                                         old     new   delta
>> ...
>> Total: Before=4820808, After=4820764, chg -0.00%
>>
>> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-1096 (-1096)
>> RO Data                                      old     new   delta
>> ...
>> Total: Before=5179123, After=5178027, chg -0.02%
>>
>> $ size vmlinux-base vmlinux
>>     text	   data	     bss      dec       hex	filename
>> 25433734  15385766  630656  41450156  2787aac	vmlinux-base
>> 24756738  15360870  629888  40747496  26dc1e8	vmlinux-new
>>
>> Memory available after booting, saving 704k on qemu,
>> base: 8084532K/8388608K
>> new:  8085236K/8388608K
> 
> Is that a 0.009% improvement? Is it really worth the hassle?
> 
> x86 doesn't select this and risc-v had to turn it off for LLD, so it feels
> like we're just creating a rod for our own back by selecting it.


LD_DEAD_CODE_DATA_ELIMINATION is mainly useful for small configs on small
systems. risc-v targets resource-limited boards, and maybe x86 has no
strong need for it, but we would like to use it on some embedded boards.
If no one tries it, this feature will never become stable :)

> 
> Will
>
  
Kefeng Wang July 17, 2023, 11:56 a.m. UTC | #4
On 2023/7/17 17:42, Marc Zyngier wrote:
> On 2023-07-17 09:07, Kefeng Wang wrote:
>> Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for arm64, allowing the
>> user to enable dead code elimination. In order for this to work, ensure
>> that we keep the necessary tables by annotating them with KEEP, also it
>> requires further changes to linker script to KEEP some tables and 
>> wildcard
>> compiler generated sections into the right place.
>>
>> The following comparison is based 6.5-rc2 with defconfig,
>>
>> $ ./scripts/bloat-o-meter vmlinux-base vmlinux-new
>> add/remove: 3/1106 grow/shrink: 4102/6964 up/down: 35704/-99980 (-64276)
>> Function                                     old     new   delta
>> ...
>> Total: Before=17888959, After=17824683, chg -0.36%
>>
>> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-44 (-44)
>> Data                                         old     new   delta
>> ...
>> Total: Before=4820808, After=4820764, chg -0.00%
>>
>> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-1096 (-1096)
>> RO Data                                      old     new   delta
>> ...
>> Total: Before=5179123, After=5178027, chg -0.02%
>>
>> $ size vmlinux-base vmlinux
>>    text       data         bss      dec       hex    filename
>> 25433734  15385766  630656  41450156  2787aac    vmlinux-base
>> 24756738  15360870  629888  40747496  26dc1e8    vmlinux-new
>>
>> Memory available after booting, saving 704k on qemu,
>> base: 8084532K/8388608K
>> new:  8085236K/8388608K
>>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> 
> I took this patch for a spin in my tree, and ended up with:
> 
>    CC      .vmlinux.export.o
>    UPD     include/generated/utsversion.h
>    CC      init/version-timestamp.o
>    LD      .tmp_vmlinux.kallsyms1
> ld: init/main.o(__patchable_function_entries): error: need linked-to 
> section for --gc-sections
> make[2]: *** [scripts/Makefile.vmlinux:36: vmlinux] Error 1
> make[1]: *** [/home/maz/hot-poop/arm-platforms/Makefile:1238: vmlinux] 
> Error 2
> make: *** [Makefile:234: __sub-make] Error 2

I can't reproduce this error with CONFIG_FTRACE_MCOUNT_RECORD or
allyesconfig. Does it need a special config or gcc version?
> 
> so it's probably not ready for prime time.
> 
>          M.
  
Marc Zyngier July 17, 2023, 12:15 p.m. UTC | #5
On Mon, 17 Jul 2023 12:56:39 +0100,
Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> 
> 
> 
> On 2023/7/17 17:42, Marc Zyngier wrote:
> > On 2023-07-17 09:07, Kefeng Wang wrote:
> >> Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for arm64, allowing the
> >> user to enable dead code elimination. In order for this to work, ensure
> >> that we keep the necessary tables by annotating them with KEEP, also it
> >> requires further changes to linker script to KEEP some tables and
> >> wildcard
> >> compiler generated sections into the right place.
> >> 
> >> The following comparison is based 6.5-rc2 with defconfig,
> >> 
> >> $ ./scripts/bloat-o-meter vmlinux-base vmlinux-new
> >> add/remove: 3/1106 grow/shrink: 4102/6964 up/down: 35704/-99980 (-64276)
> >> Function                                     old     new   delta
> >> ...
> >> Total: Before=17888959, After=17824683, chg -0.36%
> >> 
> >> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-44 (-44)
> >> Data                                         old     new   delta
> >> ...
> >> Total: Before=4820808, After=4820764, chg -0.00%
> >> 
> >> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-1096 (-1096)
> >> RO Data                                      old     new   delta
> >> ...
> >> Total: Before=5179123, After=5178027, chg -0.02%
> >> 
> >> $ size vmlinux-base vmlinux
> >>    text       data         bss      dec       hex    filename
> >> 25433734  15385766  630656  41450156  2787aac    vmlinux-base
> >> 24756738  15360870  629888  40747496  26dc1e8    vmlinux-new
> >> 
> >> Memory available after booting, saving 704k on qemu,
> >> base: 8084532K/8388608K
> >> new:  8085236K/8388608K
> >> 
> >> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> > 
> > I took this patch for a spin in my tree, and ended up with:
> > 
> >    CC      .vmlinux.export.o
> >    UPD     include/generated/utsversion.h
> >    CC      init/version-timestamp.o
> >    LD      .tmp_vmlinux.kallsyms1
> > ld: init/main.o(__patchable_function_entries): error: need linked-to
> > section for --gc-sections
> > make[2]: *** [scripts/Makefile.vmlinux:36: vmlinux] Error 1
> > make[1]: *** [/home/maz/hot-poop/arm-platforms/Makefile:1238:
> > vmlinux] Error 2
> > make: *** [Makefile:234: __sub-make] Error 2
> 
> I don't find this error with CONFIG_FTRACE_MCOUNT_RECORD or
> allyesconfig, does it need special config or gcc version?

You tell me!

gcc (Debian 10.2.1-6) 10.2.1 20210110
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

so hardly something special. This is built with the current state of
my NV tree, available here[1]. As for the configuration, have a look
here[2].

	M.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.6-WIP
[2] https://paste.debian.net/1286106/
  
Kefeng Wang July 18, 2023, 11:11 a.m. UTC | #6
On 2023/7/17 20:15, Marc Zyngier wrote:
> On Mon, 17 Jul 2023 12:56:39 +0100,
> Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>>
>>
>>
>> On 2023/7/17 17:42, Marc Zyngier wrote:
>>> On 2023-07-17 09:07, Kefeng Wang wrote:
>>>> Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for arm64, allowing the
>>>> user to enable dead code elimination. In order for this to work, ensure
>>>> that we keep the necessary tables by annotating them with KEEP, also it
>>>> requires further changes to linker script to KEEP some tables and
>>>> wildcard
>>>> compiler generated sections into the right place.
>>>>
>>>> The following comparison is based 6.5-rc2 with defconfig,
>>>>
...
>>>
>>> I took this patch for a spin in my tree, and ended up with:
>>>
>>>     CC      .vmlinux.export.o
>>>     UPD     include/generated/utsversion.h
>>>     CC      init/version-timestamp.o
>>>     LD      .tmp_vmlinux.kallsyms1
>>> ld: init/main.o(__patchable_function_entries): error: need linked-to
>>> section for --gc-sections
>>> make[2]: *** [scripts/Makefile.vmlinux:36: vmlinux] Error 1
>>> make[1]: *** [/home/maz/hot-poop/arm-platforms/Makefile:1238:
>>> vmlinux] Error 2
>>> make: *** [Makefile:234: __sub-make] Error 2
>>
>> I don't find this error with CONFIG_FTRACE_MCOUNT_RECORD or
>> allyesconfig, does it need special config or gcc version?
> 
> You tell me!
> 
> gcc (Debian 10.2.1-6) 10.2.1 20210110
> Copyright (C) 2020 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> 
> so hardly something special. This is built with the current state of
> my NV tree, available here[1] As for the configuration, have a look
> here[2].

1) I can reproduce it with gcc 10.3.1/ld (GNU Binutils) 2.37, but not with
the cross-compiler gcc 9.3/ld (GNU Binutils for Ubuntu) 2.34.

2) With allyesconfig on arm64 there is the same issue that commit
f7584322e4fe ("riscv: disable HAVE_LD_DEAD_CODE_DATA_ELIMINATION for LLD")
describes: the link takes too long, with most of the time spent in
bfd_flavour_name():

Samples: 257K of event 'cycles', Event count (approx.): 203974259359

  Overhead  Shared Object         Symbol                 IPC   [IPC Coverage]
-   61.11%  libbfd-2.34-arm64.so  [.] bfd_flavour_name   -     -
      bfd_flavour_name
-    6.55%  libbfd-2.34-arm64.so  [.] bfd_hash_traverse  -     -


Just like you said, it is not ready for prime time, so please ignore
this patch :(


> 
> 	M.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.6-WIP
> [2] https://paste.debian.net/1286106/
>
  
liuyuntao (F) Jan. 25, 2024, 1:45 p.m. UTC | #7
On 2023/7/17 17:24, Will Deacon wrote:
> On Mon, Jul 17, 2023 at 04:07:39PM +0800, Kefeng Wang wrote:
>> Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for arm64, allowing the
>> user to enable dead code elimination. In order for this to work, ensure
>> that we keep the necessary tables by annotating them with KEEP, also it
>> requires further changes to linker script to KEEP some tables and wildcard
>> compiler generated sections into the right place.
>>
>> The following comparison is based 6.5-rc2 with defconfig,
>>
>> $ ./scripts/bloat-o-meter vmlinux-base vmlinux-new
>> add/remove: 3/1106 grow/shrink: 4102/6964 up/down: 35704/-99980 (-64276)
>> Function                                     old     new   delta
>> ...
>> Total: Before=17888959, After=17824683, chg -0.36%
>>
>> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-44 (-44)
>> Data                                         old     new   delta
>> ...
>> Total: Before=4820808, After=4820764, chg -0.00%
>>
>> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-1096 (-1096)
>> RO Data                                      old     new   delta
>> ...
>> Total: Before=5179123, After=5178027, chg -0.02%
>>
>> $ size vmlinux-base vmlinux
>>     text	   data	     bss      dec       hex	filename
>> 25433734  15385766  630656  41450156  2787aac	vmlinux-base
>> 24756738  15360870  629888  40747496  26dc1e8	vmlinux-new
>>
>> Memory available after booting, saving 704k on qemu,
>> base: 8084532K/8388608K
>> new:  8085236K/8388608K
> 
> Is that a 0.009% improvement? Is it really worth the hassle?
> 
> x86 doesn't select this and risc-v had to turn it off for LLD, so it feels
> like we're just creating a rod for our own back by selecting it.

I tested this patch and found that the smaller the config, the larger the
relative reduction in the size of the build. This may be useful in
scenarios such as embedded systems, where size is particularly critical.

As with selecting CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for RISC-V,
this boots fine on qemu. With defconfig the build shrinks by ~1.6%, and
with tinyconfig by ~18.7%.

defconfig:
    text        data       bss         dec        hex
26839348    16695234    629456    44164038    2a1e3c6    before
26140556    16667058    628880    43436494    296c9ce    after

tinyconfig:
    text        data       bss         dec        hex
 1259568      272100    104312     1635980     18f68c    before
  967056      258716    103824     1329596     1449bc    after

          | tinyconfig             | defconfig
  --------|------------------------|---------------------
   No DCE | 1635980                | 44164038
      DCE | 1329596                | 43436494
   Shrink |  306384 (~18.7%)       |   727544 (~1.6%)
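
For anyone who wants to redo the shrink figures, the sketch below recomputes them from the size(1) rows quoted above. The dictionary layout and the `shrink` helper are illustrative assumptions, not part of any kernel tooling.

```python
# Rows copied from the size(1) output quoted above: (text, data, bss, dec).
builds = {
    "defconfig": {
        "before": (26839348, 16695234, 629456, 44164038),
        "after": (26140556, 16667058, 628880, 43436494),
    },
    "tinyconfig": {
        "before": (1259568, 272100, 104312, 1635980),
        "after": (967056, 258716, 103824, 1329596),
    },
}


def shrink(before, after):
    """Return (bytes saved, percent saved) for the 'dec' column."""
    saved = before[3] - after[3]
    return saved, 100.0 * saved / before[3]


results = {name: shrink(b["before"], b["after"]) for name, b in builds.items()}
for name, (saved, percent) in results.items():
    print(f"{name}: {saved} bytes saved (~{percent:.1f}%)")
```

The percentages are taken relative to the "before" total, which is how the ~18.7% and ~1.6% figures in the table come out.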

> 
> Will
>
  

Patch

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a2511b30d0f6..73bb908ec62f 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -148,6 +148,7 @@  config ARM64
 	select GENERIC_VDSO_TIME_NS
 	select HARDIRQS_SW_RESEND
 	select HAS_IOPORT
+	select HAVE_LD_DEAD_CODE_DATA_ELIMINATION
 	select HAVE_MOVE_PMD
 	select HAVE_MOVE_PUD
 	select HAVE_PCI
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 3cd7e76cc562..bb4ce6cd6896 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -238,7 +238,7 @@  SECTIONS
 	. = ALIGN(4);
 	.altinstructions : {
 		__alt_instructions = .;
-		*(.altinstructions)
+		KEEP(*(.altinstructions))
 		__alt_instructions_end = .;
 	}
 
@@ -258,8 +258,9 @@  SECTIONS
 		INIT_CALLS
 		CON_INITCALL
 		INIT_RAM_FS
-		*(.init.altinstructions .init.bss)	/* from the EFI stub */
+		KEEP(*(.init.altinstructions .init.bss*))	/* from the EFI stub */
 	}
+
 	.exit.data : {
 		EXIT_DATA
 	}