[v3,3/6] x86/gsseg: make asm_load_gs_index() take an u16

Message ID 20221013200134.1487-4-xin3.li@intel.com
State New
Headers
Series Enable LKGS instruction |

Commit Message

Li, Xin3 Oct. 13, 2022, 8:01 p.m. UTC
  From: "H. Peter Anvin (Intel)" <hpa@zytor.com>

Let gcc know that only the low 16 bits of load_gs_index() argument
actually matter. It might allow it to create slightly better
code. However, do not propagate this into the prototypes of functions
that end up being paravirtualized, to avoid unnecessary changes.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/entry/entry_64.S            | 2 +-
 arch/x86/include/asm/special_insns.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
  

Comments

David Laight Oct. 14, 2022, 12:28 p.m. UTC | #1
From: Xin Li
> Sent: 13 October 2022 21:02
> 
> From: "H. Peter Anvin (Intel)" <hpa@zytor.com>
> 
> Let gcc know that only the low 16 bits of load_gs_index() argument
> actually matter. It might allow it to create slightly better
> code. However, do not propagate this into the prototypes of functions
> that end up being paravirtualized, to avoid unnecessary changes.

Using u16 will almost always make the code worse.
At some point the value has to be masked and/or extended
to ensure an out of range value doesn't appear in
a register.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
  
Li, Xin3 Oct. 15, 2022, 12:13 a.m. UTC | #2
> >
> > From: "H. Peter Anvin (Intel)" <hpa@zytor.com>
> >
> > Let gcc know that only the low 16 bits of load_gs_index() argument
> > actually matter. It might allow it to create slightly better code.
> > However, do not propagate this into the prototypes of functions that
> > end up being paravirtualized, to avoid unnecessary changes.
> 
> Using u16 will almost always make the code worse.
> At some point the value has to be masked and/or extended to ensure an out of
> range value doesn't appear in a register.

Any potential issue with this patch set?

> 
> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1
> 1PT, UK Registration No: 1397386 (Wales)
  
H. Peter Anvin Oct. 15, 2022, 2:41 a.m. UTC | #3
On October 14, 2022 5:28:25 AM PDT, David Laight <David.Laight@ACULAB.COM> wrote:
>From: Xin Li
>> Sent: 13 October 2022 21:02
>> 
>> From: "H. Peter Anvin (Intel)" <hpa@zytor.com>
>> 
>> Let gcc know that only the low 16 bits of load_gs_index() argument
>> actually matter. It might allow it to create slightly better
>> code. However, do not propagate this into the prototypes of functions
>> that end up being paravirtualized, to avoid unnecessary changes.
>
>Using u16 will almost always make the code worse.
>At some point the value has to be masked and/or extended
>to ensure an out of range value doesn't appear in
>a register.
>
>	David
>
>-
>Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
>Registration No: 1397386 (Wales)
>
>

Is that a general statement or are you actually invoking it in this case? This is about it being a narrowing input, *removing* such constraints.
  
David Laight Oct. 17, 2022, 7:49 a.m. UTC | #4
From: H. Peter Anvin
> Sent: 15 October 2022 03:41
> 
> On October 14, 2022 5:28:25 AM PDT, David Laight <David.Laight@ACULAB.COM> wrote:
> >From: Xin Li
> >> Sent: 13 October 2022 21:02
> >>
> >> From: "H. Peter Anvin (Intel)" <hpa@zytor.com>
> >>
> >> Let gcc know that only the low 16 bits of load_gs_index() argument
> >> actually matter. It might allow it to create slightly better
> >> code. However, do not propagate this into the prototypes of functions
> >> that end up being paravirtualized, to avoid unnecessary changes.
> >
> >Using u16 will almost always make the code worse.
> >At some point the value has to be masked and/or extended
> >to ensure an out of range value doesn't appear in
> >a register.
> >
> >	David
> 
> Is that a general statement or are you actually invoking it in this case?
> This is about it being a narrowing input, *removing* such constraints.

It is a general statement.
You suggested you might get better code.
If fact you'll probably get worse code.
It might not matter here, but ...

Most modern calling conventions use cpu register to pass arguments
and results.
So the compiler is required to ensure that u16 values are in range
in either the caller or called code (or both).
Just because the domain of a value is small doesn't mean that
the best type isn't 'int' or 'unsigned int'.

Additionally (except on x86) any arithmetic on sub-32bit values
requires additional instructions to mask the result.

Even on x86-64 if you index an array with an 'int' the compiler
has to generate code to sign extend the value to 64 bits.
You get better code for 'signed long' or unsigned types.
This is probably true for all 64bit architectures.

Since (most) cpu have both sign extending an zero extending
loads from memory, it can make sense to use u8 and u16 to
reduce the size of structures.
But for function arguments and function locals it almost
always makes the code worse.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
  
H. Peter Anvin Oct. 17, 2022, 6:39 p.m. UTC | #5
On October 17, 2022 12:49:41 AM PDT, David Laight <David.Laight@ACULAB.COM> wrote:
>From: H. Peter Anvin
>> Sent: 15 October 2022 03:41
>> 
>> On October 14, 2022 5:28:25 AM PDT, David Laight <David.Laight@ACULAB.COM> wrote:
>> >From: Xin Li
>> >> Sent: 13 October 2022 21:02
>> >>
>> >> From: "H. Peter Anvin (Intel)" <hpa@zytor.com>
>> >>
>> >> Let gcc know that only the low 16 bits of load_gs_index() argument
>> >> actually matter. It might allow it to create slightly better
>> >> code. However, do not propagate this into the prototypes of functions
>> >> that end up being paravirtualized, to avoid unnecessary changes.
>> >
>> >Using u16 will almost always make the code worse.
>> >At some point the value has to be masked and/or extended
>> >to ensure an out of range value doesn't appear in
>> >a register.
>> >
>> >	David
>> 
>> Is that a general statement or are you actually invoking it in this case?
>> This is about it being a narrowing input, *removing* such constraints.
>
>It is a general statement.
>You suggested you might get better code.
>If fact you'll probably get worse code.
>It might not matter here, but ...
>
>Most modern calling conventions use cpu register to pass arguments
>and results.
>So the compiler is required to ensure that u16 values are in range
>in either the caller or called code (or both).
>Just because the domain of a value is small doesn't mean that
>the best type isn't 'int' or 'unsigned int'.
>
>Additionally (except on x86) any arithmetic on sub-32bit values
>requires additional instructions to mask the result.
>
>Even on x86-64 if you index an array with an 'int' the compiler
>has to generate code to sign extend the value to 64 bits.
>You get better code for 'signed long' or unsigned types.
>This is probably true for all 64bit architectures.
>
>Since (most) cpu have both sign extending an zero extending
>loads from memory, it can make sense to use u8 and u16 to
>reduce the size of structures.
>But for function arguments and function locals it almost
>always makes the code worse.
>
>	David
>
>-
>Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
>Registration No: 1397386 (Wales)
>

Ok. You are plain incorrect in this case for two reasons:

1. The x86-64 calling convention makes it up to the receiver (callee for arguments, caller for returns) to do such masking of values.

2. The consumer of the values here does not need any masking or extensions.

So this is simply telling the compiler what the programmer knows.
  

Patch

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 9953d966d124..e0c48998d2fb 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -779,7 +779,7 @@  _ASM_NOKPROBE(common_interrupt_return)
 
 /*
  * Reload gs selector with exception handling
- * edi:  new selector
+ *  di:  new selector
  *
  * Is in entry.text as it shouldn't be instrumented.
  */
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 35f709f619fb..a71d0e8d4684 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -120,7 +120,7 @@  static inline void native_wbinvd(void)
 	asm volatile("wbinvd": : :"memory");
 }
 
-extern asmlinkage void asm_load_gs_index(unsigned int selector);
+extern asmlinkage void asm_load_gs_index(u16 selector);
 
 static inline void native_load_gs_index(unsigned int selector)
 {