[00/46] gcc-LTO support for the kernel

Message ID 20221114114344.18650-1-jirislaby@kernel.org

Message

Jiri Slaby Nov. 14, 2022, 11:42 a.m. UTC
  Hi,

this is the first call for comments (and kbuild complaints) for this
support of gcc (full) LTO in the kernel. Most of the patches come from
Andi. Martin and I rebased them to new kernels and fixed the issues
known to us. I also updated most of the commit logs and reordered the
patches into groups with similar intent.

The very first patch comes from Alexander and is pending on some x86
queue already (I believe). I am attaching it only for completeness.
Without that, the kernel does not boot (LTO reorders a lot).

In our measurements, the performance differences are negligible.

The kernel is bigger with gcc LTO due to more inlining. The next step
might be to play with non-static functions as we export everything, so
the compiler cannot actually drop anything (esp. inlined and no longer
needed functions).

Cc: Alexander Potapenko <glider@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Alexey Makhalov <amakhalov@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Hao Luo <haoluo@google.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Hubicka <jh@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Martin Liska <mliska@suse.cz>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Richard Biener <RGuenther@suse.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Song Liu <song@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: VMware PV-Drivers Reviewers <pv-drivers@vmware.com>
Cc: Yonghong Song <yhs@fb.com>

Alexander Lobakin (1):
  x86/boot: robustify calling startup_{32,64}() from the decompressor
    code

Andi Kleen (36):
  Compiler Attributes, lto: introduce __noreorder
  tracepoint, lto: Mark static call functions as __visible
  static_call, lto: Mark static keys as __visible
  static_call, lto: Mark static_call_return0() as __visible
  static_call, lto: Mark func_a() as __visible_on_lto
  x86/alternative, lto: Mark int3_*() as global and __visible
  x86/paravirt, lto: Mark native_steal_clock() as __visible_on_lto
  x86/preempt, lto: Mark preempt_schedule_*thunk() as __visible
  x86/xen, lto: Mark xen_vcpu_stolen() as __visible
  x86, lto: Mark gdt_page and native_sched_clock() as __visible
  amd, lto: Mark amd pmu and pstate functions as __visible_on_lto
  entry, lto: Mark raw_irqentry_exit_cond_resched() as __visible
  export, lto: Mark __kstrtab* in EXPORT_SYMBOL() as global and
    __visible
  softirq, lto: Mark irq_enter/exit_rcu() as __visible
  btf, lto: Make all BTF IDs global on LTO
  init.h, lto: mark initcalls as __noreorder
  bpf, lto: mark interpreter jump table as __noreorder
  sched, lto: mark sched classes as __noreorder
  linkage, lto: use C version for SYSCALL_ALIAS() / cond_syscall()
  scripts, lto: re-add gcc-ld
  scripts, lto: use CONFIG_LTO for many LTO specific actions
  Kbuild, lto: Add Link Time Optimization support
  x86/purgatory, lto: Disable gcc LTO for purgatory
  x86/realmode, lto: Disable gcc LTO for real mode code
  x86/vdso, lto: Disable gcc LTO for the vdso
  scripts, lto: disable gcc LTO for some mod sources
  Kbuild, lto: disable gcc LTO for bounds+asm-offsets
  lib/string, lto: disable gcc LTO for string.o
  Compiler attributes, lto: disable __flatten with LTO
  Kbuild, lto: don't include weak source file symbols in System.map
  x86, lto: Disable relative init pointers with gcc LTO
  x86/livepatch, lto: Disable live patching with gcc LTO
  x86/lib, lto: Mark 32bit mem{cpy,move,set} as __used
  scripts, lto: check C symbols for modversions
  scripts/bloat-o-meter, lto: handle gcc LTO
  x86, lto: Finally enable gcc LTO for x86

Jiri Slaby (5):
  kbuild: pass jobserver to cmd_ld_vmlinux.o
  compiler.h: introduce __visible_on_lto
  compiler.h: introduce __global_on_lto
  btf, lto: pass scope as strings
  x86/apic, lto: Mark apic_driver*() as __noreorder

Martin Liska (4):
  kbuild: lto: preserve MAKEFLAGS for module linking
  x86/sev, lto: Mark cpuid_table_copy as __visible_on_lto
  mm/kasan, lto: Mark kasan mem{cpy,move,set} as __used
  kasan, lto: remove extra BUILD_BUG() in memory_is_poisoned

 Documentation/kbuild/index.rst      |  2 +
 Documentation/kbuild/lto-build.rst  | 76 +++++++++++++++++++++++++++++
 Kbuild                              |  3 ++
 Makefile                            |  6 ++-
 arch/Kconfig                        | 52 ++++++++++++++++++++
 arch/x86/Kconfig                    |  5 +-
 arch/x86/boot/compressed/head_32.S  |  2 +-
 arch/x86/boot/compressed/head_64.S  |  2 +-
 arch/x86/boot/compressed/misc.c     | 16 +++---
 arch/x86/entry/vdso/Makefile        |  2 +
 arch/x86/events/amd/core.c          |  2 +-
 arch/x86/include/asm/apic.h         |  4 +-
 arch/x86/include/asm/preempt.h      |  4 +-
 arch/x86/kernel/alternative.c       |  5 +-
 arch/x86/kernel/cpu/common.c        |  2 +-
 arch/x86/kernel/paravirt.c          |  2 +-
 arch/x86/kernel/sev-shared.c        |  2 +-
 arch/x86/kernel/tsc.c               |  2 +-
 arch/x86/lib/memcpy_32.c            |  6 +--
 arch/x86/purgatory/Makefile         |  2 +
 arch/x86/realmode/Makefile          |  1 +
 drivers/cpufreq/amd-pstate.c        | 15 +++---
 drivers/xen/time.c                  |  2 +-
 include/asm-generic/vmlinux.lds.h   |  2 +-
 include/linux/btf_ids.h             | 24 ++++-----
 include/linux/compiler.h            |  8 +++
 include/linux/compiler_attributes.h | 15 ++++++
 include/linux/export.h              |  6 ++-
 include/linux/init.h                |  2 +-
 include/linux/linkage.h             | 16 +++---
 include/linux/static_call.h         | 12 ++---
 include/linux/tracepoint.h          |  4 +-
 kernel/bpf/core.c                   |  2 +-
 kernel/entry/common.c               |  2 +-
 kernel/kallsyms.c                   |  2 +-
 kernel/livepatch/Kconfig            |  1 +
 kernel/sched/sched.h                |  1 +
 kernel/softirq.c                    |  4 +-
 kernel/static_call.c                |  2 +-
 kernel/static_call_inline.c         |  6 +--
 kernel/time/posix-stubs.c           | 19 +++++++-
 lib/Makefile                        |  2 +
 mm/kasan/generic.c                  |  2 +-
 mm/kasan/shadow.c                   |  6 +--
 scripts/Makefile.build              | 17 ++++---
 scripts/Makefile.lib                |  2 +-
 scripts/Makefile.lto                | 43 ++++++++++++++++
 scripts/Makefile.modfinal           |  2 +-
 scripts/Makefile.vmlinux            |  3 +-
 scripts/Makefile.vmlinux_o          |  6 +--
 scripts/bloat-o-meter               |  2 +-
 scripts/gcc-ld                      | 40 +++++++++++++++
 scripts/link-vmlinux.sh             |  9 ++--
 scripts/mksysmap                    |  2 +
 scripts/mod/Makefile                |  3 ++
 scripts/module.lds.S                |  2 +-
 56 files changed, 384 insertions(+), 100 deletions(-)
 create mode 100644 Documentation/kbuild/lto-build.rst
 create mode 100644 scripts/Makefile.lto
 create mode 100755 scripts/gcc-ld
  

Comments

Ard Biesheuvel Nov. 14, 2022, 11:56 a.m. UTC | #1
On Mon, 14 Nov 2022 at 12:44, Jiri Slaby (SUSE) <jirislaby@kernel.org> wrote:
>
> Hi,
>
> this is the first call for comments (and kbuild complaints) for this
> support of gcc (full) LTO in the kernel. Most of the patches come from
> Andi. Martin and I rebased them to new kernels and fixed the issues
> known to us. I also updated most of the commit logs and reordered the
> patches into groups with similar intent.
>
> The very first patch comes from Alexander and is pending on some x86
> queue already (I believe). I am attaching it only for completeness.
> Without that, the kernel does not boot (LTO reorders a lot).
>

You didn't cc me on that patch so I will reply here: I don't think
this is the right solution.
On x86, there is a lot of stuff injected into .head.text that simply
does not belong there, and getting rid of the __head annotation and
dropping __HEAD from the Xen pvh head.S file would be a much better
solution.
  
Jiri Slaby Nov. 14, 2022, 12:04 p.m. UTC | #2
On 14. 11. 22, 12:56, Ard Biesheuvel wrote:
> On Mon, 14 Nov 2022 at 12:44, Jiri Slaby (SUSE) <jirislaby@kernel.org> wrote:
>>
>> Hi,
>>
>> this is the first call for comments (and kbuild complaints) for this
>> support of gcc (full) LTO in the kernel. Most of the patches come from
>> Andi. Martin and I rebased them to new kernels and fixed the issues
>> known to us. I also updated most of the commit logs and reordered the
>> patches into groups with similar intent.
>>
>> The very first patch comes from Alexander and is pending on some x86
>> queue already (I believe). I am attaching it only for completeness.
>> Without that, the kernel does not boot (LTO reorders a lot).
>>
> 
> You didn't cc me on that patch so I will reply here: I don't think
> this is the right solution.
> On x86, there is a lot of stuff injected into .head.text that simply
> does not belong there, and getting rid of the __head annotation and
> dropping __HEAD from the Xen pvh head.S file would be a much better
> solution.

I think Alexander was working on that too, though I'm not sure. Anyway, we
still have the other fix: putting startup_64() into a special section and
placing that section at the beginning of vmlinux using the linker script.
(Until .head.text is completely gone for good -- same as on arm, you wrote
somewhere.)

In any case, that patch was added only for reference, if anyone wants to 
give the series a try. Next time, I can attach the other workaround ;).

I don't expect anyone will take the series as is. There will be a lot of 
comments, I suppose. Hence many re-spins...

thanks,
  
Ard Biesheuvel Nov. 14, 2022, 7:40 p.m. UTC | #3
On Mon, 14 Nov 2022 at 12:44, Jiri Slaby (SUSE) <jirislaby@kernel.org> wrote:
>
> Hi,
>
> this is the first call for comments (and kbuild complaints) for this
> support of gcc (full) LTO in the kernel. Most of the patches come from
> Andi. Martin and I rebased them to new kernels and fixed the issues
> known to us. I also updated most of the commit logs and reordered the
> patches into groups with similar intent.
>
> The very first patch comes from Alexander and is pending on some x86
> queue already (I believe). I am attaching it only for completeness.
> Without that, the kernel does not boot (LTO reorders a lot).
>
> In our measurements, the performance differences are negligible.
>
> The kernel is bigger with gcc LTO due to more inlining.

OK, so if I understand this correctly:
- the performance is the same
- the resulting image is bigger
- we need a whole lot of ugly hacks to placate the linker.

Pardon my cynicism, but this cover letter does not mention any
advantages of LTO, so what is the point of all of this?

(On Clang, LTO was needed for CFI, but this is not even the case anymore)
  
Peter Zijlstra Nov. 17, 2022, 8:28 a.m. UTC | #4
On Mon, Nov 14, 2022 at 08:40:50PM +0100, Ard Biesheuvel wrote:
> On Mon, 14 Nov 2022 at 12:44, Jiri Slaby (SUSE) <jirislaby@kernel.org> wrote:
> >
> > Hi,
> >
> > this is the first call for comments (and kbuild complaints) for this
> > support of gcc (full) LTO in the kernel. Most of the patches come from
> > Andi. Martin and I rebased them to new kernels and fixed the issues
> > known to us. I also updated most of the commit logs and reordered the
> > patches into groups with similar intent.
> >
> > The very first patch comes from Alexander and is pending on some x86
> > queue already (I believe). I am attaching it only for completeness.
> > Without that, the kernel does not boot (LTO reorders a lot).
> >
> > In our measurements, the performance differences are negligible.
> >
> > The kernel is bigger with gcc LTO due to more inlining.
> 
> OK, so if I understand this correctly:
> - the performance is the same
> - the resulting image is bigger
> - we need a whole lot of ugly hacks to placate the linker.
> 
> Pardon my cynicism, but this cover letter does not mention any
> advantages of LTO, so what is the point of all of this?

Seconded; I really hate all the ugly required for the GCC-LTO
'solution'. There not actually being any benefit just makes it a very
simple decision to drop all these patches on the floor.
  
Richard Biener Nov. 17, 2022, 8:50 a.m. UTC | #5
On Thu, 17 Nov 2022, Peter Zijlstra wrote:

> On Mon, Nov 14, 2022 at 08:40:50PM +0100, Ard Biesheuvel wrote:
> > On Mon, 14 Nov 2022 at 12:44, Jiri Slaby (SUSE) <jirislaby@kernel.org> wrote:
> > >
> > > Hi,
> > >
> > > this is the first call for comments (and kbuild complaints) for this
> > > support of gcc (full) LTO in the kernel. Most of the patches come from
> > > Andi. Martin and I rebased them to new kernels and fixed the issues
> > > known to us. I also updated most of the commit logs and reordered the
> > > patches into groups with similar intent.
> > >
> > > The very first patch comes from Alexander and is pending on some x86
> > > queue already (I believe). I am attaching it only for completeness.
> > > Without that, the kernel does not boot (LTO reorders a lot).
> > >
> > > In our measurements, the performance differences are negligible.
> > >
> > > The kernel is bigger with gcc LTO due to more inlining.
> > 
> > OK, so if I understand this correctly:
> > - the performance is the same
> > - the resulting image is bigger
> > - we need a whole lot of ugly hacks to placate the linker.
> > 
> > Pardon my cynicism, but this cover letter does not mention any
> > advantages of LTO, so what is the point of all of this?
> 
> Seconded; I really hate all the ugly required for the GCC-LTO
> 'solution'. There not actually being any benefit just makes it a very
> simple decision to drop all these patches on the floor.

I'd say that instead a prerequisite for the series would be to actually
enforce hidden visibility for everything not part of the kernel module
API so the compiler can throw away unused functions.  Currently it has
to keep everything because with a shared object there might be external
references to everything exported from individual TUs.
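
As a rough user-space illustration of the annotation meant here (not
taken from the series): GCC and Clang accept a visibility attribute on
ELF targets. The names DEMO_API, DEMO_HIDDEN, demo_api_compute() and
demo_internal_scale() are made up for this sketch. The kernel is not
linked as a shared object, so this only shows the mechanism.

```c
#include <assert.h>

/* Hypothetical helper macros; the names are invented for this sketch. */
#define DEMO_API	__attribute__((visibility("default")))
#define DEMO_HIDDEN	__attribute__((visibility("hidden")))

/*
 * Hidden: no other ELF module can bind to this symbol, so a
 * whole-program optimizer is free to inline it and drop the
 * out-of-line copy once all callers are known.
 */
DEMO_HIDDEN int demo_internal_scale(int x)
{
	return 3 * x;
}

/* Default visibility: part of the (hypothetical) external API. */
DEMO_API int demo_api_compute(int x)
{
	return demo_internal_scale(x) + 1;
}

/* Small self-check; call it from a main() of your choice. */
void demo_selfcheck(void)
{
	assert(demo_api_compute(2) == 7);
}
```

When such a file is linked into a shared object, only demo_api_compute()
ends up in the dynamic symbol table.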

There was a size benefit mentioned for module-less monolithic kernels
as likely used in embedded setups, not sure if that's enough motivation
to properly annotate symbols with visibility - and as far as I understand
all these 'required' are actually such fixes.

Richard.
  
Peter Zijlstra Nov. 17, 2022, 11:42 a.m. UTC | #6
On Thu, Nov 17, 2022 at 08:50:59AM +0000, Richard Biener wrote:
> On Thu, 17 Nov 2022, Peter Zijlstra wrote:
> 
> > On Mon, Nov 14, 2022 at 08:40:50PM +0100, Ard Biesheuvel wrote:
> > > On Mon, 14 Nov 2022 at 12:44, Jiri Slaby (SUSE) <jirislaby@kernel.org> wrote:
> > > >
> > > > Hi,
> > > >
> > > > this is the first call for comments (and kbuild complaints) for this
> > > > support of gcc (full) LTO in the kernel. Most of the patches come from
> > > > Andi. Martin and I rebased them to new kernels and fixed the issues
> > > > known to us. I also updated most of the commit logs and reordered the
> > > > patches into groups with similar intent.
> > > >
> > > > The very first patch comes from Alexander and is pending on some x86
> > > > queue already (I believe). I am attaching it only for completeness.
> > > > Without that, the kernel does not boot (LTO reorders a lot).
> > > >
> > > > In our measurements, the performance differences are negligible.
> > > >
> > > > The kernel is bigger with gcc LTO due to more inlining.
> > > 
> > > OK, so if I understand this correctly:
> > > - the performance is the same
> > > - the resulting image is bigger
> > > - we need a whole lot of ugly hacks to placate the linker.
> > > 
> > > Pardon my cynicism, but this cover letter does not mention any
> > > advantages of LTO, so what is the point of all of this?
> > 
> > Seconded; I really hate all the ugly required for the GCC-LTO
> > 'solution'. There not actually being any benefit just makes it a very
> > simple decision to drop all these patches on the floor.
> 
> I'd say that instead a prerequisite for the series would be to actually
> enforce hidden visibility for everything not part of the kernel module
> API so the compiler can throw away unused functions.  Currently it has
> to keep everything because with a shared object there might be external
> references to everything exported from individual TUs.

I'm not sure what you're on about; only symbols annotated with
EXPORT_SYMBOL*() are accessible from modules (aka DSOs) and those will
have their address taken. You can freely eliminate any unused symbol.

> There was a size benefit mentioned for module-less monolithic kernels
> as likely used in embedded setups, not sure if that's enough motivation
> to properly annotate symbols with visibility - and as far as I understand
> all these 'required' are actually such fixes.

I'm not seeing how littering __visible is useful or desired, doubly so
for that static hack, that's just a crude work around for GCC LTO being
inferior for not being able to read inline asm.
  
Thomas Gleixner Nov. 17, 2022, 11:48 a.m. UTC | #7
On Thu, Nov 17 2022 at 08:50, Richard Biener wrote:
> On Thu, 17 Nov 2022, Peter Zijlstra wrote:
>> Seconded; I really hate all the ugly required for the GCC-LTO
>> 'solution'. There not actually being any benefit just makes it a very
>> simple decision to drop all these patches on the floor.
>
> I'd say that instead a prerequisite for the series would be to actually
> enforce hidden visibility for everything not part of the kernel module
> API so the compiler can throw away unused functions.  Currently it has
> to keep everything because with a shared object there might be external
> references to everything exported from individual TUs.
>
> There was a size benefit mentioned for module-less monolithic kernels
> as likely used in embedded setups, not sure if that's enough motivation
> to properly annotate symbols with visibility - and as far as I understand
> all these 'required' are actually such fixes.

To accommodate a broken tool which cannot figure out which functions are
referenced in the final lump and which are not, right?

Can we pretty please fix the tool instead of proliferating the
brokenness?

Thanks,

        tglx
  
Ard Biesheuvel Nov. 17, 2022, 11:49 a.m. UTC | #8
On Thu, 17 Nov 2022 at 12:43, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Nov 17, 2022 at 08:50:59AM +0000, Richard Biener wrote:
> > On Thu, 17 Nov 2022, Peter Zijlstra wrote:
> >
> > > On Mon, Nov 14, 2022 at 08:40:50PM +0100, Ard Biesheuvel wrote:
> > > > On Mon, 14 Nov 2022 at 12:44, Jiri Slaby (SUSE) <jirislaby@kernel.org> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > this is the first call for comments (and kbuild complaints) for this
> > > > > support of gcc (full) LTO in the kernel. Most of the patches come from
> > > > > Andi. Martin and I rebased them to new kernels and fixed the issues
> > > > > known to us. I also updated most of the commit logs and reordered the
> > > > > patches into groups with similar intent.
> > > > >
> > > > > The very first patch comes from Alexander and is pending on some x86
> > > > > queue already (I believe). I am attaching it only for completeness.
> > > > > Without that, the kernel does not boot (LTO reorders a lot).
> > > > >
> > > > > In our measurements, the performance differences are negligible.
> > > > >
> > > > > The kernel is bigger with gcc LTO due to more inlining.
> > > >
> > > > OK, so if I understand this correctly:
> > > > - the performance is the same
> > > > - the resulting image is bigger
> > > > - we need a whole lot of ugly hacks to placate the linker.
> > > >
> > > > Pardon my cynicism, but this cover letter does not mention any
> > > > advantages of LTO, so what is the point of all of this?
> > >
> > > Seconded; I really hate all the ugly required for the GCC-LTO
> > > 'solution'. There not actually being any benefit just makes it a very
> > > simple decision to drop all these patches on the floor.
> >
> > I'd say that instead a prerequisite for the series would be to actually
> > enforce hidden visibility for everything not part of the kernel module
> > API so the compiler can throw away unused functions.  Currently it has
> > to keep everything because with a shared object there might be external
> > references to everything exported from individual TUs.
>
> I'm not sure what you're on about; only symbols annotated with
> EXPORT_SYMBOL*() are accessible from modules (aka DSOs) and those will
> have their address taken. You can freely eliminate any unused symbol.
>
> > There was a size benefit mentioned for module-less monolithic kernels
> > as likely used in embedded setups, not sure if that's enough motivation
> > to properly annotate symbols with visibility - and as far as I understand
> > all these 'required' are actually such fixes.
>
> I'm not seeing how littering __visible is useful or desired, doubly so
> for that static hack, that's just a crude work around for GCC LTO being
> inferior for not being able to read inline asm.

We have an __ADDRESSABLE() macro and asmlinkage modifier to annotate
symbols that may appear to the compiler as though they are never
referenced.

Would it be possible to repurpose those so that the LTO code knows
which symbols it must not remove?
  
Richard Biener Nov. 17, 2022, 1:55 p.m. UTC | #9
On Thu, 17 Nov 2022, Ard Biesheuvel wrote:

> On Thu, 17 Nov 2022 at 12:43, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, Nov 17, 2022 at 08:50:59AM +0000, Richard Biener wrote:
> > > On Thu, 17 Nov 2022, Peter Zijlstra wrote:
> > >
> > > > On Mon, Nov 14, 2022 at 08:40:50PM +0100, Ard Biesheuvel wrote:
> > > > > On Mon, 14 Nov 2022 at 12:44, Jiri Slaby (SUSE) <jirislaby@kernel.org> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > this is the first call for comments (and kbuild complaints) for this
> > > > > > support of gcc (full) LTO in the kernel. Most of the patches come from
> > > > > > Andi. Martin and I rebased them to new kernels and fixed the issues
> > > > > > known to us. I also updated most of the commit logs and reordered the
> > > > > > patches into groups with similar intent.
> > > > > >
> > > > > > The very first patch comes from Alexander and is pending on some x86
> > > > > > queue already (I believe). I am attaching it only for completeness.
> > > > > > Without that, the kernel does not boot (LTO reorders a lot).
> > > > > >
> > > > > > In our measurements, the performance differences are negligible.
> > > > > >
> > > > > > The kernel is bigger with gcc LTO due to more inlining.
> > > > >
> > > > > OK, so if I understand this correctly:
> > > > > - the performance is the same
> > > > > - the resulting image is bigger
> > > > > - we need a whole lot of ugly hacks to placate the linker.
> > > > >
> > > > > Pardon my cynicism, but this cover letter does not mention any
> > > > > advantages of LTO, so what is the point of all of this?
> > > >
> > > > Seconded; I really hate all the ugly required for the GCC-LTO
> > > > 'solution'. There not actually being any benefit just makes it a very
> > > > simple decision to drop all these patches on the floor.
> > >
> > > I'd say that instead a prerequisite for the series would be to actually
> > > enforce hidden visibility for everything not part of the kernel module
> > > API so the compiler can throw away unused functions.  Currently it has
> > > to keep everything because with a shared object there might be external
> > > references to everything exported from individual TUs.
> >
> > I'm not sure what you're on about; only symbols annotated with
> > EXPORT_SYMBOL*() are accessible from modules (aka DSOs) and those will
> > have their address taken. You can freely eliminate any unused symbol.

But IIRC that's not reflected on the ELF level by making EXPORT_SYMBOL*()
symbols public and the rest hidden - instead all symbols global in the C TUs
will become public and the module dynamic loader details are hidden from
GCC's view of the kernel image as ELF relocatable object.

> > > There was a size benefit mentioned for module-less monolithic kernels
> > > as likely used in embedded setups, not sure if that's enough motivation
> > > to properly annotate symbols with visibility - and as far as I understand
> > > all these 'required' are actually such fixes.
> >
> > I'm not seeing how littering __visible is useful or desired, doubly so
> > for that static hack, that's just a crude work around for GCC LTO being
> > inferior for not being able to read inline asm.
> 
> We have an __ADDRESSABLE() macro and asmlinkage modifier to annotate
> symbols that may appear to the compiler as though they are never
> referenced.
> 
> Would it be possible to repurpose those so that the LTO code knows
> which symbols it must not remove?

I find

/*
 * Force the compiler to emit 'sym' as a symbol, so that we can reference
 * it from inline assembler. Necessary in case 'sym' could be inlined
 * otherwise, or eliminated entirely due to lack of references that are
 * visible to the compiler.
 */
#define ___ADDRESSABLE(sym, __attrs) \
	static void * __used __attrs \
		__UNIQUE_ID(__PASTE(__addressable_,sym)) = (void *)&sym;
#define __ADDRESSABLE(sym) \
	___ADDRESSABLE(sym, __section(".discard.addressable"))

that should be enough to force LTO keeping 'sym' - unless there's
a linker script that discards .discard.addressable which I fear LTO
will notice, losing the effect.  A more direct way would be to attach
__used to 'sym' directly.  __ADDRESSABLE doesn't seem to be used
directly but instead I see cases like

#define __define_initcall_stub(__stub, fn)                      \
        int __init __stub(void);                                \
        int __init __stub(void)                                 \
        {                                                       \
                return fn();                                    \
        }                                                       \
        __ADDRESSABLE(__stub)

where one could have added __used to the __stub prototypes instead?
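
To make the two variants concrete, here is a minimal user-space sketch
(GCC or Clang assumed). DEMO_USED, DEMO_SECTION() and DEMO_ADDRESSABLE()
are local stand-ins for the kernel helpers, the section name
".demo.addressable" is made up, and the unique-name machinery of the
real macro is replaced by simple token pasting.

```c
#include <assert.h>

/* Local stand-ins for the kernel's __used and __section() helpers. */
#define DEMO_USED		__attribute__((used))
#define DEMO_SECTION(name)	__attribute__((section(name)))

/*
 * Simplified form of the quoted macro: store the symbol's address in a
 * pointer object that is itself marked "used". The kernel builds the
 * variable name with __UNIQUE_ID(); plain pasting is used here.
 */
#define DEMO_ADDRESSABLE(sym)						\
	static void *DEMO_USED DEMO_SECTION(".demo.addressable")	\
		demo_addressable_##sym = (void *)&sym

/* Variant 1: kept alive indirectly, through the pointer object. */
static int kept_by_pointer(void)
{
	return 1;
}
DEMO_ADDRESSABLE(kept_by_pointer);

/* Variant 2: the attribute sits on the function itself. */
static int DEMO_USED kept_directly(void)
{
	return 2;
}

/* Small self-check; call it from a main() of your choice. */
void demo_selfcheck(void)
{
	/* The pointer object holds the address of the function. */
	assert(demo_addressable_kept_by_pointer == (void *)&kept_by_pointer);
}
```

In both variants the compiler has to emit the function even though no C
code calls it. Variant 2 needs no extra object and no extra section.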

The folks who worked on LTO enablement of the kernel should know the
real issue better - I understand asm()s are a pain because GCC
refuses to parse the assembler string heuristically for used
symbols (but it can never be more than heuristics).  The issue with
asm()s is not so much elimination (__used solves that) but that
GCC can end up moving the asm() and the referred-to symbols to
different link-time units causing unresolved symbols for non-global
symbols.  -fno-toplevel-reorder should fix that at some cost.
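
A minimal sketch of such a reference, assuming a 64-bit ELF target and a
GNU-compatible assembler (demo_target, demo_ptr and
demo_read_through_asm() are made-up names). In a single translation
unit without LTO this works; the point is only that the compiler never
sees the reference inside the asm string.

```c
#include <assert.h>

/*
 * No C code refers to this variable; only the asm below does.
 * "used" keeps it from being eliminated, but it stays a local symbol.
 */
static int demo_target __attribute__((used)) = 7;

/*
 * Top-level asm: the string goes to the assembler unparsed, so the
 * compiler does not know that it mentions "demo_target". If the asm and
 * the variable ended up in different link-time units, the local symbol
 * could no longer be resolved.
 */
__asm__(".pushsection .data\n"
	".balign 8\n"
	".globl demo_ptr\n"
	"demo_ptr:\n"
	"\t.quad demo_target\n"
	".popsection\n");

/* Defined by the asm above as a pointer-sized object. */
extern int *demo_ptr;

int demo_read_through_asm(void)
{
	return *demo_ptr;
}

/* Small self-check; call it from a main() of your choice. */
void demo_selfcheck(void)
{
	assert(demo_read_through_asm() == 7);
}
```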

Richard.
  
Peter Zijlstra Nov. 17, 2022, 2:32 p.m. UTC | #10
On Thu, Nov 17, 2022 at 01:55:07PM +0000, Richard Biener wrote:

> > > I'm not sure what you're on about; only symbols annotated with
> > > EXPORT_SYMBOL*() are accessible from modules (aka DSOs) and those will
> > > have their address taken. You can freely eliminate any unused symbol.
> 
> But IIRC that's not reflected on the ELF level by making EXPORT_SYMBOL*()
> symbols public and the rest hidden - instead all symbols global in the C TUs
> will become public and the module dynamic loader details are hidden from
> GCC's view of the kernel image as ELF relocatable object.

It is reflected by keeping their address in __ksymtab_$foo sections, as
such their address 'escapes'.
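
A user-space imitation of that mechanism, as a sketch only (GCC or
Clang on ELF with GNU ld or lld assumed). DEMO_EXPORT(), struct
demo_export, the section name demo_ksymtab and demo_lookup() are
invented for illustration; the kernel's real EXPORT_SYMBOL() and
struct kernel_symbol look different.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*demo_fn)(int);

/* Hypothetical table entry: a name and the symbol's address. */
struct demo_export {
	const char *name;
	demo_fn fn;
};

/*
 * Stand-in for EXPORT_SYMBOL(): store the function's address in a
 * dedicated section. The entry is marked "used", so the stored address
 * counts as a reference the compiler cannot ignore.
 */
#define DEMO_EXPORT(sym)						\
	static const struct demo_export demo_export_##sym		\
	__attribute__((used, section("demo_ksymtab"))) = { #sym, sym }

int twice(int x)
{
	return 2 * x;
}
DEMO_EXPORT(twice);

/* Not exported: its address is not stored in the table. */
int unexported(int x)
{
	return x + 1;
}

/* GNU ld and lld provide these for sections named like C identifiers. */
extern const struct demo_export __start_demo_ksymtab[];
extern const struct demo_export __stop_demo_ksymtab[];

/* Roughly what a module loader does: resolve a name via the table. */
demo_fn demo_lookup(const char *name)
{
	const struct demo_export *e;

	for (e = __start_demo_ksymtab; e < __stop_demo_ksymtab; e++)
		if (strcmp(e->name, name) == 0)
			return e->fn;
	return NULL;
}

/* Small self-check; call it from a main() of your choice. */
void demo_selfcheck(void)
{
	assert(demo_lookup("twice") != NULL);
	assert(demo_lookup("unexported") == NULL);
}
```

Only twice() can be found by name at run time; unexported() is reachable
solely through references the toolchain can see.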

> > We have an __ADDRESSABLE() macro and asmlinkage modifier to annotate
> > symbols that may appear to the compiler as though they are never
> > referenced.
> > 
> > Would it be possible to repurpose those so that the LTO code knows
> > which symbols it must not remove?
> 
> I find
> 
> /*
>  * Force the compiler to emit 'sym' as a symbol, so that we can reference
>  * it from inline assembler. Necessary in case 'sym' could be inlined
>  * otherwise, or eliminated entirely due to lack of references that are
>  * visible to the compiler.
>  */
> #define ___ADDRESSABLE(sym, __attrs) \
> 	static void * __used __attrs \
> 		__UNIQUE_ID(__PASTE(__addressable_,sym)) = (void *)&sym;
> #define __ADDRESSABLE(sym) \
> 	___ADDRESSABLE(sym, __section(".discard.addressable"))
> 
> that should be enough to force LTO keeping 'sym' - unless there's
> a linker script that discards .discard.addressable which I fear LTO
> will notice, losing the effect.

The initial LTO link pass will not discard .discard sections in order to
generate a regular ELF object file. This object file is then fed to
objtool and the kallsyms tool and eventually linked with the linker
script in a multi-stage link pass.

Also see scripts/link-vmlinux.sh for all the horrible details.

> The folks who worked on LTO enablement of the kernel should know the
> real issue better - I understand asm()s are a pain because GCC
> refuses to parse the assembler string heuristically for used
> symbols (but it can never be more than heuristics). 

I don't understand why it can't be more than heuristics; eventually the
asm() contents end up in a real assembler and it has to make sense.

Might as well parse it directly -- isn't that what clang-ias does?

> The issue with asm()s is not so much elimination (__used solves that)
> but that GCC can end up moving the asm() and the referred-to symbols to
> different link-time units causing unresolved symbols for non-global
> symbols.  -fno-toplevel-reorder should fix that at some cost.

I thought the whole point of LTO was that there was only a single link
time unit, translate all the TUs into intermediate gunk and then collect
the whole lot in one go.
  
Richard Biener Nov. 17, 2022, 2:40 p.m. UTC | #11
On Thu, 17 Nov 2022, Peter Zijlstra wrote:

> On Thu, Nov 17, 2022 at 01:55:07PM +0000, Richard Biener wrote:
> 
> > > > I'm not sure what you're on about; only symbols annotated with
> > > > EXPORT_SYMBOL*() are accessible from modules (aka DSOs) and those will
> > > > have their address taken. You can freely eliminate any unused symbol.
> > 
> > But IIRC that's not reflected on the ELF level by making EXPORT_SYMBOL*()
> > symbols public and the rest hidden - instead all symbols global in the C TUs
> > will become public and the module dynamic loader details are hidden from
> > GCC's view of the kernel image as ELF relocatable object.
> 
> It is reflected by keeping their address in __ksymtab_$foo sections, as
> such their address 'escapes'.

That's not enough to make symbols not appearing in __ksymtab_$foo
sections eliminatable.
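
To illustrate the distinction, here is a minimal user-space sketch, not
the kernel's actual EXPORT_SYMBOL() implementation. It assumes an ELF
target and a GNU-compatible linker that provides __start_/__stop_
symbols for sections whose names are valid C identifiers. All demo_*
names and the section name are hypothetical. Storing the address in a
table section keeps the exported function alive, but nothing in the
object tells the compiler that the other global function is unreachable.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical export-table entry: a name plus the escaped address. */
struct demo_export {
	const char *name;
	int (*fn)(void);
};

/* Record the address of 'sym' in a dedicated section (sketch only). */
#define DEMO_EXPORT_SYMBOL(sym)					\
	static const struct demo_export demo_export_##sym	\
	__attribute__((used, section("demo_ksymtab"))) =	\
		{ #sym, sym }

int exported_fn(void)   { return 1; }
int unexported_fn(void) { return 2; }	/* still a global ELF symbol */

DEMO_EXPORT_SYMBOL(exported_fn);

/* Section bounds, assumed to be provided by the linker. */
extern const struct demo_export __start_demo_ksymtab[];
extern const struct demo_export __stop_demo_ksymtab[];

/* Look a function up by name, the way a module loader might. */
int (*demo_lookup(const char *name))(void)
{
	const struct demo_export *e;

	for (e = __start_demo_ksymtab; e < __stop_demo_ksymtab; e++)
		if (strcmp(e->name, name) == 0)
			return e->fn;
	return NULL;
}
```

In this sketch unexported_fn() cannot be found through the table, yet it
keeps default visibility, so a compiler looking only at the relocatable
object has no basis for treating it as removable.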

> > > We have an __ADDRESSABLE() macro and asmlinkage modifier to annotate
> > > symbols that may appear to the compiler as though they are never
> > > referenced.
> > > 
> > > Would it be possible to repurpose those so that the LTO code knows
> > > which symbols it must not remove?
> > 
> > I find
> > 
> > /*
> >  * Force the compiler to emit 'sym' as a symbol, so that we can reference
> >  * it from inline assembler. Necessary in case 'sym' could be inlined
> >  * otherwise, or eliminated entirely due to lack of references that are
> >  * visible to the compiler.
> >  */
> > #define ___ADDRESSABLE(sym, __attrs) \
> > 	static void * __used __attrs \
> > 		__UNIQUE_ID(__PASTE(__addressable_,sym)) = (void *)&sym;
> > #define __ADDRESSABLE(sym) \
> > 	___ADDRESSABLE(sym, __section(".discard.addressable"))
> > 
> > that should be enough to force LTO keeping 'sym' - unless there's
> > a linker script that discards .discard.addressable which I fear LTO
> > will notice, losing the effect.
> 
> The initial LTO link pass will not discard .discard sections in order to
> generate a regular ELF object file. This object file is then fed to
> objtool and the kallsyms tool and eventually linked with the linker
> script in a multi-stage link pass.
> 
> Also see scripts/link-vmlinux.sh for all the horrible details.
> 
> > The folks who worked on LTO enablement of the kernel should know the
> > real issue better - I understand asm()s are a pain because GCC
> > refuses to parse the assembler string heuristically for used
> > symbols (but it can never be more than heuristics). 
> 
> I don't understand why it can't be more than heuristics; eventually the
> asm() contents end up in a real assembler and it has to make sense.
> 
> Might as well parse it directly -- isn't that what clang-ias does?

GCC doesn't have an integrated assembler and the actual assembler text
that's emitted is not known at the stage we need to know the symbol.
Which means for GCC it would be heuristics.

> > The issue with asm()s is not so much elimination (__used solves that)
> > but that GCC can end up moving the asm() and the referred-to symbols to
> > different link-time units causing unresolved symbols for non-global
> > symbols.  -fno-toplevel-reorder should fix that at some cost.
> 
> I thought the whole point of LTO was that there was only a single link
> time unit, translate all the TUs into intermediate gunk and then collect
> the whole lot in one go.

that's what it does, but it fans out to parallelize the final compile,
dividing the whole lot again which is where this problem can appear
if GCC doesn't see that asm() X uses symbol Y.

Richard.
  
Ard Biesheuvel Nov. 17, 2022, 3:15 p.m. UTC | #12
On Thu, 17 Nov 2022 at 14:55, Richard Biener <rguenther@suse.de> wrote:
>
> On Thu, 17 Nov 2022, Ard Biesheuvel wrote:
>
...
> > We have an __ADDRESSABLE() macro and asmlinkage modifier to annotate
> > symbols that may appear to the compiler as though they are never
> > referenced.
> >
> > Would it be possible to repurpose those so that the LTO code knows
> > which symbols it must not remove?
>
> I find
>
> /*
>  * Force the compiler to emit 'sym' as a symbol, so that we can reference
>  * it from inline assembler. Necessary in case 'sym' could be inlined
>  * otherwise, or eliminated entirely due to lack of references that are
>  * visible to the compiler.
>  */
> #define ___ADDRESSABLE(sym, __attrs) \
>         static void * __used __attrs \
>                 __UNIQUE_ID(__PASTE(__addressable_,sym)) = (void *)&sym;
> #define __ADDRESSABLE(sym) \
>         ___ADDRESSABLE(sym, __section(".discard.addressable"))
>
> that should be enough to force LTO keeping 'sym' - unless there's
> a linker script that discards .discard.addressable which I fear LTO
> will notice, losing the effect.  A more direct way would be to attach
> __used to 'sym' directly.  __ADDRESSABLE doesn't seem to be used
> directly but instead I see cases like
>
> #define __define_initcall_stub(__stub, fn)                      \
>         int __init __stub(void);                                \
>         int __init __stub(void)                                 \
>         {                                                       \
>                 return fn();                                    \
>         }                                                       \
>         __ADDRESSABLE(__stub)
>
> where one could have added __used to the __stub prototypes instead?
>

Probably, yes.

But my point was not really about the implementation of those things,
more about whether we could redefine them to something else that would
help the compiler infer that this symbol needs to be retained.

asmlinkage in particular seems relevant, which is currently only used
for C++ inclusion or for setting regparm(0) on i386.
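
As a rough sketch of what such a redefinition could look like (an
assumption for illustration, not what the kernel or this series does):
a hypothetical demo_asmlinkage macro that adds the used attribute, plus
externally_visible where the compiler reports support for it. The
demo_* names are made up.

```c
/* Fall back gracefully on compilers without __has_attribute. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* externally_visible is a GCC attribute; only use it when available. */
#if __has_attribute(externally_visible)
#define DEMO_VISIBLE __attribute__((externally_visible))
#else
#define DEMO_VISIBLE
#endif

/* Hypothetical stand-in for asmlinkage that also says "retain this". */
#define demo_asmlinkage DEMO_VISIBLE __attribute__((used))

/* An entry point that, in real code, would be called from assembly. */
demo_asmlinkage int demo_entry(int x);

demo_asmlinkage int demo_entry(int x)
{
	return x + 1;
}
```

With this, every function already marked as an assembly entry point
would carry the retention hint without touching each declaration.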