[v2,0/2] x86: Don't save callee-saved registers if not needed

Message ID 20240122154540.65652-1-hjl.tools@gmail.com

H.J. Lu Jan. 22, 2024, 3:45 p.m. UTC
  Changes in v2:

1. Rebase against commit f9df00340e3
2. Don't add redundant clobbered_registers check in ix86_expand_call.

In some cases, there is no need to save callee-saved registers:

1. If a noreturn function neither throws nor supports exceptions, it can
skip saving callee-saved registers.

2. When an interrupt handler is implemented by an assembly stub which does:

  1. Save all registers.
  2. Call a C function.
  3. Restore all registers.
  4. Return from interrupt.

it is completely unnecessary to save and restore any registers in the C
function called by the assembly stub, even if they would normally be
callee-saved.
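
As a rough sketch of the second case, the C function below is the kind of
function an assembly interrupt stub might call.  The names
timer_tick_handler, get_tick_count, tick_count and the NO_CALLEE_SAVED macro
are hypothetical and not part of the patch set.  The macro expands to nothing
when the compiler doesn't provide the attribute, so the sketch still compiles
with an unpatched GCC.

```c
#include <stdint.h>

/* Use the attribute only when the compiler provides it.  This assumes
   __has_attribute is available; otherwise the macro stays empty and the
   function follows the default calling convention.  */
#if defined(__has_attribute)
# if __has_attribute(no_callee_saved_registers)
#  define NO_CALLEE_SAVED __attribute__((no_callee_saved_registers))
# endif
#endif
#ifndef NO_CALLEE_SAVED
# define NO_CALLEE_SAVED
#endif

/* Hypothetical state updated by the handler.  */
static volatile uint64_t tick_count;

/* Hypothetical C body of an interrupt handler.  The assembly stub is
   assumed to save and restore all registers around the call, so this
   function has no need to preserve callee-saved registers itself.  */
NO_CALLEE_SAVED void
timer_tick_handler (void)
{
  tick_count = tick_count + 1;
}

/* Ordinary function using the default register handling.  */
uint64_t
get_tick_count (void)
{
  return tick_count;
}
```

When such a function is called directly from ordinary C code instead of from
a stub, the attribute must be visible on the declaration so that the caller
knows it cannot rely on callee-saved registers being preserved.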

This patch set adds the no_callee_saved_registers function attribute,
which complements the no_caller_saved_registers function attribute, and
classifies the x86 backend's call-saved register handling into three types:

  1. Default call-saved registers.
  2. No caller-saved registers with no_caller_saved_registers attribute.
  3. No callee-saved registers with no_callee_saved_registers attribute.

Functions of the no callee-saved registers type don't save callee-saved
registers.  A noreturn function that neither throws nor supports
exceptions is classified as the no callee-saved registers type.
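
The classification described above can be modelled roughly as follows.  This
is only an illustrative sketch, not the code in the patches: the enum, the
struct and the function name are hypothetical, and the order in which the
two attributes are checked is an assumption.

```c
#include <stdbool.h>

/* Hypothetical names for the three handling types listed above.  */
enum call_saved_type
{
  CALL_SAVED_DEFAULT,
  CALL_SAVED_NO_CALLER_SAVED,
  CALL_SAVED_NO_CALLEE_SAVED
};

/* Hypothetical summary of the function properties that matter here.  */
struct func_props
{
  bool has_no_caller_saved_attr;  /* no_caller_saved_registers given */
  bool has_no_callee_saved_attr;  /* no_callee_saved_registers given */
  bool is_noreturn;               /* function never returns */
  bool may_throw;                 /* function can throw */
  bool exceptions_enabled;        /* function supports exceptions */
};

/* Pick the handling type for a function.  An explicit attribute decides
   the type directly; otherwise a noreturn function that neither throws
   nor supports exceptions is treated as no callee-saved registers.  */
enum call_saved_type
classify_call_saved (const struct func_props *f)
{
  if (f->has_no_caller_saved_attr)
    return CALL_SAVED_NO_CALLER_SAVED;
  if (f->has_no_callee_saved_attr)
    return CALL_SAVED_NO_CALLEE_SAVED;
  if (f->is_noreturn && !f->may_throw && !f->exceptions_enabled)
    return CALL_SAVED_NO_CALLEE_SAVED;
  return CALL_SAVED_DEFAULT;
}
```

In this model a noreturn function that may throw stays in the default type,
since an unwinder could still need the saved registers.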

With these changes, __libc_start_main in glibc 2.39, which is a noreturn
function, is changed from

__libc_start_main:
	endbr64
	push   %r15
	push   %r14
	mov    %rcx,%r14
	push   %r13
	push   %r12
	push   %rbp
	mov    %esi,%ebp
	push   %rbx
	mov    %rdx,%rbx
	sub    $0x28,%rsp
	mov    %rdi,(%rsp)
	mov    %fs:0x28,%rax
	mov    %rax,0x18(%rsp)
	xor    %eax,%eax
	test   %r9,%r9

to

__libc_start_main:
	endbr64
	sub    $0x28,%rsp
	mov    %esi,%ebp
	mov    %rdx,%rbx
	mov    %rcx,%r14
	mov    %rdi,(%rsp)
	mov    %fs:0x28,%rax
	mov    %rax,0x18(%rsp)
	xor    %eax,%eax
	test   %r9,%r9

In Linux kernel 6.7.0 on x86-64, do_exit is changed from

do_exit:
        endbr64
        call   <do_exit+0x9>
        push   %r15
        push   %r14
        push   %r13
        push   %r12
        mov    %rdi,%r12
        push   %rbp
        push   %rbx
        mov    %gs:0x0,%rbx
        sub    $0x28,%rsp
        mov    %gs:0x28,%rax
        mov    %rax,0x20(%rsp)
        xor    %eax,%eax
        call   *0x0(%rip)        # <do_exit+0x39>
        test   $0x2,%ah
        je     <do_exit+0x8d3>

to

do_exit:
        endbr64
        call   <do_exit+0x9>
        sub    $0x28,%rsp
        mov    %rdi,%r12
        mov    %gs:0x28,%rax
        mov    %rax,0x20(%rsp)
        xor    %eax,%eax
        mov    %gs:0x0,%rbx
        call   *0x0(%rip)        # <do_exit+0x2f>
        test   $0x2,%ah
        je     <do_exit+0x8c9>

I compared GCC master branch bootstrap and test times on a slow machine
running Linux 6.6 kernels compiled with the original GCC 13 and with GCC 13
plus the backported patch.  The performance data isn't precise since the
measurements were done on different days with different GCC sources under
different 6.6 kernel versions.

GCC master branch build time in seconds:

before                after                  improvement
30043.75user          30013.16user           0%
1274.85system         1243.72system          2.4%

GCC master branch test time in seconds (new tests added):

before                after                  improvement
216035.90user         216547.51user          0%
27365.51system        26658.54system         2.6%
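
The improvement column can be reproduced, approximately, as
(before - after) / before.  The helper below is a hypothetical sketch of
that arithmetic and not part of the patches.  Applied to the system-time
rows above, it gives roughly 2.4% for the build and 2.6% for the tests.

```c
/* Hypothetical helper: percentage improvement of AFTER relative to
   BEFORE.  A positive result means less time was spent after the
   change.  */
double
improvement_percent (double before, double after)
{
  return (before - after) / before * 100.0;
}
```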

I backported the patches to GCC 13 and rebuilt the system glibc and kernel
on Fedora 39.  The systems perform normally.

H.J. Lu (2):
  x86: Add no_callee_saved_registers function attribute
  x86: Don't save callee-saved registers in noreturn functions

 gcc/config/i386/i386-expand.cc                | 58 +++++++++++++--
 gcc/config/i386/i386-options.cc               | 61 ++++++++++++----
 gcc/config/i386/i386.cc                       | 70 +++++++++++++++----
 gcc/config/i386/i386.h                        | 20 +++++-
 gcc/doc/extend.texi                           |  8 +++
 .../gcc.dg/torture/no-callee-saved-run-1a.c   | 23 ++++++
 .../gcc.dg/torture/no-callee-saved-run-1b.c   | 59 ++++++++++++++++
 .../gcc.target/i386/no-callee-saved-1.c       | 30 ++++++++
 .../gcc.target/i386/no-callee-saved-10.c      | 46 ++++++++++++
 .../gcc.target/i386/no-callee-saved-11.c      | 11 +++
 .../gcc.target/i386/no-callee-saved-12.c      | 10 +++
 .../gcc.target/i386/no-callee-saved-13.c      | 16 +++++
 .../gcc.target/i386/no-callee-saved-14.c      | 16 +++++
 .../gcc.target/i386/no-callee-saved-15.c      | 17 +++++
 .../gcc.target/i386/no-callee-saved-16.c      | 16 +++++
 .../gcc.target/i386/no-callee-saved-17.c      | 16 +++++
 .../gcc.target/i386/no-callee-saved-18.c      | 51 ++++++++++++++
 .../gcc.target/i386/no-callee-saved-2.c       | 30 ++++++++
 .../gcc.target/i386/no-callee-saved-3.c       |  8 +++
 .../gcc.target/i386/no-callee-saved-4.c       |  8 +++
 .../gcc.target/i386/no-callee-saved-5.c       | 11 +++
 .../gcc.target/i386/no-callee-saved-6.c       | 12 ++++
 .../gcc.target/i386/no-callee-saved-7.c       | 49 +++++++++++++
 .../gcc.target/i386/no-callee-saved-8.c       | 50 +++++++++++++
 .../gcc.target/i386/no-callee-saved-9.c       | 49 +++++++++++++
 gcc/testsuite/gcc.target/i386/pr38534-1.c     | 26 +++++++
 gcc/testsuite/gcc.target/i386/pr38534-2.c     | 18 +++++
 gcc/testsuite/gcc.target/i386/pr38534-3.c     | 19 +++++
 gcc/testsuite/gcc.target/i386/pr38534-4.c     | 18 +++++
 .../gcc.target/i386/stack-check-17.c          | 19 ++---
 30 files changed, 797 insertions(+), 48 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/no-callee-saved-run-1a.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/no-callee-saved-run-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-14.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-17.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-18.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-9.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-4.c