[tip: x86/boot] x86/boot: Use 32-bit XOR to clear registers

Message ID 170929679278.398.4143931058196373559.tip-bot2@tip-bot2
State: New
Series: [tip: x86/boot] x86/boot: Use 32-bit XOR to clear registers

Commit Message

tip-bot2 for Uros Bizjak March 1, 2024, 12:39 p.m. UTC
  The following commit has been merged into the x86/boot branch of tip:

Commit-ID:     721f791ce1cddfa5f2bf524ac14741bfa0f72697
Gitweb:        https://git.kernel.org/tip/721f791ce1cddfa5f2bf524ac14741bfa0f72697
Author:        Uros Bizjak <ubizjak@gmail.com>
AuthorDate:    Wed, 24 Jan 2024 11:38:59 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Fri, 01 Mar 2024 12:47:37 +01:00

x86/boot: Use 32-bit XOR to clear registers

x86_64 zero extends 32-bit operations, so for 64-bit operands,
XORL r32,r32 is functionally equal to XORQ r64,r64, but avoids
a REX prefix byte when legacy registers are used.

Slightly smaller code generated, no change in functionality.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240124103859.611372-1-ubizjak@gmail.com
---
 arch/x86/kernel/head_64.S         | 6 +++---
 arch/x86/kernel/sev_verify_cbit.S | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)
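
(Editor's aside, not part of the original posting.) The equivalence the commit message relies on can be sketched as follows: on x86-64, any write to a 32-bit register zero-extends into bits 63:32 of the full register, so a 32-bit self-XOR clears all 64 bits. The sketch below models register values as plain Python integers:

```python
MASK64 = (1 << 64) - 1

def xorq(dst: int, src: int) -> int:
    """64-bit XOR: operates on the full register."""
    return (dst ^ src) & MASK64

def xorl(dst: int, src: int) -> int:
    """32-bit XOR: operates on the low 32 bits; x86-64 then
    zero-extends the result into bits 63:32."""
    return (dst & 0xFFFFFFFF) ^ (src & 0xFFFFFFFF)

# Self-XOR clears the whole 64-bit register either way.
r15 = 0xDEADBEEF12345678
assert xorq(r15, r15) == 0
assert xorl(r15, r15) == 0

# For distinct operands the two are NOT equivalent in general:
assert xorq(1 << 40, 0) == 1 << 40   # upper bits participate
assert xorl(1 << 40, 0) == 0         # upper bits are discarded
```

So the substitution is only valid because every changed instruction XORs a register with itself.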
  

Comments

Ard Biesheuvel March 1, 2024, 12:44 p.m. UTC | #1
On Fri, 1 Mar 2024 at 13:39, tip-bot2 for Uros Bizjak
<tip-bot2@linutronix.de> wrote:
>
> The following commit has been merged into the x86/boot branch of tip:
>
> [...]
>
> x86/boot: Use 32-bit XOR to clear registers
>
> x86_64 zero extends 32-bit operations, so for 64-bit operands,
> XORL r32,r32 is functionally equal to XORQ r64,r64, but avoids
> a REX prefix byte when legacy registers are used.
>

.. and so this change is pointless churn when not using legacy
registers, right?

> Slightly smaller code generated, no change in functionality.
>
> [...]
>
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index d295bf6..86136a7 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -169,7 +169,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
>         ANNOTATE_NOENDBR
>
>         /* Clear %R15 which holds the boot_params pointer on the boot CPU */
> -       xorq    %r15, %r15
> +       xorl    %r15d, %r15d
>

   0: 4d 31 ff              xor    %r15,%r15
   3: 45 31 ff              xor    %r15d,%r15d
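
(Editor's aside, not part of the thread.) The disassembly above can be reproduced with a minimal encoder for the `xor r,r` form (opcode 0x31 /r, following the Intel SDM encoding rules): the REX prefix carries W (64-bit operand size) plus the R and B register-extension bits, so for r8-r15 both forms are three bytes, while for the legacy registers the 32-bit form drops the prefix entirely:

```python
def xor_self(reg: int, width: int) -> bytes:
    """Encode `xor r,r` (opcode 0x31 /r) for register number reg (0-15).

    width is 32 or 64.  REX.W (0x08) selects 64-bit operand size;
    REX.R (0x04) and REX.B (0x01) extend both ModRM register fields,
    which for a self-XOR name the same register r8-r15.
    """
    rex = 0
    if width == 64:
        rex |= 0x40 | 0x08           # REX.W
    if reg >= 8:
        rex |= 0x40 | 0x04 | 0x01    # REX.R | REX.B
    modrm = 0xC0 | ((reg & 7) << 3) | (reg & 7)
    prefix = bytes([rex]) if rex else b""
    return prefix + bytes([0x31, modrm])

RAX, R15 = 0, 15
print(xor_self(R15, 64).hex())  # 4d31ff  (xorq %r15, %r15   - 3 bytes)
print(xor_self(R15, 32).hex())  # 4531ff  (xorl %r15d, %r15d - 3 bytes)
print(xor_self(RAX, 64).hex())  # 4831c0  (xorq %rax, %rax   - 3 bytes)
print(xor_self(RAX, 32).hex())  # 31c0    (xorl %eax, %eax   - 2 bytes)
```

This confirms Ard's point: for %r15 the change saves nothing; only the %rax, %rcx, and %rsp cases in the patch shrink by one byte.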


>         /*
>          * Retrieve the modifier (SME encryption mask if SME is active) to be
> @@ -178,7 +178,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
>  #ifdef CONFIG_AMD_MEM_ENCRYPT
>         movq    sme_me_mask, %rax
>  #else
> -       xorq    %rax, %rax
> +       xorl    %eax, %eax
>  #endif
>

This conflicts with my RIP-relative boot cleanup series.

Uros Bizjak March 1, 2024, 12:50 p.m. UTC | #2
On Fri, Mar 1, 2024 at 1:45 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Fri, 1 Mar 2024 at 13:39, tip-bot2 for Uros Bizjak
> <tip-bot2@linutronix.de> wrote:
> >
> > The following commit has been merged into the x86/boot branch of tip:
> >
> > [...]
> >
> > x86/boot: Use 32-bit XOR to clear registers
> >
> > x86_64 zero extends 32-bit operations, so for 64-bit operands,
> > XORL r32,r32 is functionally equal to XORQ r64,r64, but avoids
> > a REX prefix byte when legacy registers are used.
> >
>
> ... and so this change is pointless churn when not using legacy
> registers, right?

Although there is no code size change with REX registers, it would
look weird to use XORQ with REX registers and XORL with legacy regs.
Please see arch/x86/kvm/{vmx,svm}/vmenter.S where this approach is
also used.

Uros.

Ard Biesheuvel March 1, 2024, 1:10 p.m. UTC | #3
On Fri, 1 Mar 2024 at 13:51, Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Fri, Mar 1, 2024 at 1:45 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Fri, 1 Mar 2024 at 13:39, tip-bot2 for Uros Bizjak
> > <tip-bot2@linutronix.de> wrote:
> > >
> > > The following commit has been merged into the x86/boot branch of tip:
> > >
> > > [...]
> > >
> > > x86/boot: Use 32-bit XOR to clear registers
> > >
> > > x86_64 zero extends 32-bit operations, so for 64-bit operands,
> > > XORL r32,r32 is functionally equal to XORQ r64,r64, but avoids
> > > a REX prefix byte when legacy registers are used.
> > >
> >
> > ... and so this change is pointless churn when not using legacy
> > registers, right?
>
> Although there is no code size change with REX registers, it would
> look weird to use XORQ with REX registers and XORL with legacy regs.

You are changing an isolated occurrence of XORQ into XORL on the basis
that XORQ 'looks weird', and XORQ would only produce a longer opcode if
the occurrence in question were using a different register than it
actually does.

Apologies for the bluntness, but in my book, this really falls firmly
into the 'pointless churn' territory. The startup code is not
performance critical, neither in terms of size nor in speed, and so
I'd prefer to avoid these kinds of changes. Just my 2c, though - Ingo
has already merged the patch.
  
Andrew Cooper March 1, 2024, 4:02 p.m. UTC | #4
On 01/03/2024 1:10 pm, Ard Biesheuvel wrote:
> On Fri, 1 Mar 2024 at 13:51, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Fri, Mar 1, 2024 at 1:45 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>>> On Fri, 1 Mar 2024 at 13:39, tip-bot2 for Uros Bizjak
>>> <tip-bot2@linutronix.de> wrote:
>>>> The following commit has been merged into the x86/boot branch of tip:
>>>>
>>>> [...]
>>>>
>>>> x86/boot: Use 32-bit XOR to clear registers
>>>>
>>>> x86_64 zero extends 32-bit operations, so for 64-bit operands,
>>>> XORL r32,r32 is functionally equal to XORQ r64,r64, but avoids
>>>> a REX prefix byte when legacy registers are used.
>>>>
>>> ... and so this change is pointless churn when not using legacy
>>> registers, right?
>> Although there is no code size change with REX registers, it would
>> look weird to use XORQ with REX registers and XORL with legacy regs.
> You are changing an isolated occurrence of XORQ into XORL on the basis
> that XORQ 'looks weird', and XORQ would only produce a longer opcode if
> the occurrence in question were using a different register than it
> actually does.
>
> Apologies for the bluntness, but in my book, this really falls firmly
> into the 'pointless churn' territory. The startup code is not
> performance critical, neither in terms of size nor in speed, and so
> I'd prefer to avoid these kinds of changes. Just my 2c, though - Ingo
> has already merged the patch.

Without trying to get into an argument here...

The better reason is that Silvermont Atoms don't recognise the 64-bit
form as a zeroing idiom.  They only recognise the 32-bit form of the idiom.

Therefore in fastpaths it is (marginally) important to xorl %r15d, %r15d
rather than xorq %r15, %r15.

But this instance is not a fastpath, and it also doesn't save any
encoding space, so I'm not sure it was really worth changing.

~Andrew
  
Ard Biesheuvel March 1, 2024, 4:35 p.m. UTC | #5
On Fri, 1 Mar 2024 at 17:02, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> On 01/03/2024 1:10 pm, Ard Biesheuvel wrote:
> > On Fri, 1 Mar 2024 at 13:51, Uros Bizjak <ubizjak@gmail.com> wrote:
> >> On Fri, Mar 1, 2024 at 1:45 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> >>> On Fri, 1 Mar 2024 at 13:39, tip-bot2 for Uros Bizjak
> >>> <tip-bot2@linutronix.de> wrote:
> >>>> The following commit has been merged into the x86/boot branch of tip:
> >>>>
> >>>> [...]
> >>>>
> >>>> x86/boot: Use 32-bit XOR to clear registers
> >>>>
> >>>> x86_64 zero extends 32-bit operations, so for 64-bit operands,
> >>>> XORL r32,r32 is functionally equal to XORQ r64,r64, but avoids
> >>>> a REX prefix byte when legacy registers are used.
> >>>>
> >>> ... and so this change is pointless churn when not using legacy
> >>> registers, right?
> >> Although there is no code size change with REX registers, it would
> >> look weird to use XORQ with REX registers and XORL with legacy regs.
> > You are changing an isolated occurrence of XORQ into XORL on the basis
> > that XORQ 'looks weird', and XORQ would only produce a longer opcode if
> > the occurrence in question were using a different register than it
> > actually does.
> >
> > Apologies for the bluntness, but in my book, this really falls firmly
> > into the 'pointless churn' territory. The startup code is not
> > performance critical, neither in terms of size nor in speed, and so
> > I'd prefer to avoid these kinds of changes. Just my 2c, though - Ingo
> > has already merged the patch.
>
> Without trying to get into an argument here...
>

No worries, we're all friends here :-)

> The better reason is that Silvermont Atoms don't recognise the 64-bit
> form as a zeroing idiom.  They only recognise the 32-bit form of the idiom.
>
> Therefore in fastpaths it is (marginally) important to xorl %r15d, %r15d
> rather than xorq %r15, %r15.
>
> But this instance is not a fastpath, and it also doesn't save any
> encoding space, so I'm not sure it was really worth changing.
>

Yeah, that is my point, really. But let's move on. And apologies to
Uros for the tone - it didn't sound as grumpy in my head when I typed
it as it does now reading back the thread.
  

Patch

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index d295bf6..86136a7 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -169,7 +169,7 @@  SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
 	ANNOTATE_NOENDBR
 
 	/* Clear %R15 which holds the boot_params pointer on the boot CPU */
-	xorq	%r15, %r15
+	xorl	%r15d, %r15d
 
 	/*
 	 * Retrieve the modifier (SME encryption mask if SME is active) to be
@@ -178,7 +178,7 @@  SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 	movq	sme_me_mask, %rax
 #else
-	xorq	%rax, %rax
+	xorl	%eax, %eax
 #endif
 
 	/* Form the CR3 value being sure to include the CR3 modifier */
@@ -295,7 +295,7 @@  SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
 
 .Llookup_AP:
 	/* EAX contains the APIC ID of the current CPU */
-	xorq	%rcx, %rcx
+	xorl	%ecx, %ecx
 	leaq	cpuid_to_apicid(%rip), %rbx
 
 .Lfind_cpunr:
diff --git a/arch/x86/kernel/sev_verify_cbit.S b/arch/x86/kernel/sev_verify_cbit.S
index 3355e27..1ab65f6 100644
--- a/arch/x86/kernel/sev_verify_cbit.S
+++ b/arch/x86/kernel/sev_verify_cbit.S
@@ -77,7 +77,7 @@  SYM_FUNC_START(sev_verify_cbit)
 	 * The check failed, prevent any forward progress to prevent ROP
 	 * attacks, invalidate the stack and go into a hlt loop.
 	 */
-	xorq	%rsp, %rsp
+	xorl	%esp, %esp
 	subq	$0x1000, %rsp
 2:	hlt
 	jmp 2b