[v4,5/5] x86/gsseg: use the LKGS instruction if available for load_gs_index()

Message ID 20221019095035.10823-6-xin3.li@intel.com
State New
Series Enable LKGS instruction

Commit Message

Li, Xin3 Oct. 19, 2022, 9:50 a.m. UTC
  From: "H. Peter Anvin (Intel)" <hpa@zytor.com>

The LKGS instruction atomically loads a segment descriptor into the
%gs descriptor registers, *except* that %gs.base is unchanged, and the
base is instead loaded into MSR_IA32_KERNEL_GS_BASE, which is exactly
what we want this function to do.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---

Changes since v3:
* We want less ASM not more, thus keep local_irq_save/restore() inside
  native_load_gs_index() (Thomas Gleixner).
* For paravirt enabled kernels, initialize pv_ops.cpu.load_gs_index to
  native_lkgs (Thomas Gleixner).

Changes since V2:
* Mark DI as input and output (+D) as in V1, since the exception handler
  modifies it (Brian Gerst).

Changes since V1:
* Use EX_TYPE_ZERO_REG instead of fixup code in the obsolete .fixup code
  section (Peter Zijlstra).
* Add a comment that states the LKGS_DI macro will be replaced with "lkgs %di"
  once binutils supports the LKGS instruction (Peter Zijlstra).
---
 arch/x86/include/asm/gsseg.h | 33 +++++++++++++++++++++++++++++----
 arch/x86/kernel/cpu/common.c |  1 +
 2 files changed, 30 insertions(+), 4 deletions(-)
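
For context, a rough before/after sketch of why LKGS matters here (illustrative
pseudo-assembly, not the kernel's actual asm_load_gs_index() listing):

	# Legacy path: a plain segment load also reloads the hidden %gs.base,
	# so the kernel base has to be swapped aside and back, with interrupts
	# disabled around the whole sequence:
	swapgs				# %gs.base <-> MSR_IA32_KERNEL_GS_BASE
	movl	%edi, %gs		# load the selector; clobbers %gs.base
	swapgs				# the newly loaded base lands in the MSR

	# LKGS path: one instruction, no SWAPGS bracket, no IRQ disabling:
	lkgs	%di			# selector -> %gs, base -> MSR_IA32_KERNEL_GS_BASE

On a faulting selector, the _ASM_EXTABLE_TYPE_REG(..., EX_TYPE_ZERO_REG, ...)
entry in native_lkgs() (as I read it) zeroes the register holding the selector
and resumes at the 1: label, so the load is retried with a NULL selector
instead of propagating the fault.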
  

Comments

Juergen Gross Oct. 19, 2022, 12:16 p.m. UTC | #1
On 19.10.22 11:50, Xin Li wrote:
> From: "H. Peter Anvin (Intel)" <hpa@zytor.com>
> 
> The LKGS instruction atomically loads a segment descriptor into the
> %gs descriptor registers, *except* that %gs.base is unchanged, and the
> base is instead loaded into MSR_IA32_KERNEL_GS_BASE, which is exactly
> what we want this function to do.
> 
> Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Brian Gerst <brgerst@gmail.com>
> Signed-off-by: Xin Li <xin3.li@intel.com>
> ---
> 
> Changes since v3:
> * We want less ASM not more, thus keep local_irq_save/restore() inside
>    native_load_gs_index() (Thomas Gleixner).
> * For paravirt enabled kernels, initialize pv_ops.cpu.load_gs_index to
>    native_lkgs (Thomas Gleixner).
> 
> Changes since V2:
> * Mark DI as input and output (+D) as in V1, since the exception handler
>    modifies it (Brian Gerst).
> 
> Changes since V1:
> * Use EX_TYPE_ZERO_REG instead of fixup code in the obsolete .fixup code
>    section (Peter Zijlstra).
> * Add a comment that states the LKGS_DI macro will be replaced with "lkgs %di"
>    once binutils supports the LKGS instruction (Peter Zijlstra).
> ---
>   arch/x86/include/asm/gsseg.h | 33 +++++++++++++++++++++++++++++----
>   arch/x86/kernel/cpu/common.c |  1 +
>   2 files changed, 30 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/gsseg.h b/arch/x86/include/asm/gsseg.h
> index d15577c39e8d..ab6a595cea70 100644
> --- a/arch/x86/include/asm/gsseg.h
> +++ b/arch/x86/include/asm/gsseg.h
> @@ -14,17 +14,42 @@
>   
>   extern asmlinkage void asm_load_gs_index(u16 selector);
>   
> +/* Replace with "lkgs %di" once binutils support LKGS instruction */
> +#define LKGS_DI _ASM_BYTES(0xf2,0x0f,0x00,0xf7)
> +
> +static inline void native_lkgs(unsigned int selector)
> +{
> +	u16 sel = selector;
> +	asm_inline volatile("1: " LKGS_DI
> +			    _ASM_EXTABLE_TYPE_REG(1b, 1b, EX_TYPE_ZERO_REG, %k[sel])
> +			    : [sel] "+D" (sel));
> +}
> +
>   static inline void native_load_gs_index(unsigned int selector)
>   {
> -	unsigned long flags;
> +	if (cpu_feature_enabled(X86_FEATURE_LKGS)) {
> +		native_lkgs(selector);
> +	} else {
> +		unsigned long flags;
>   
> -	local_irq_save(flags);
> -	asm_load_gs_index(selector);
> -	local_irq_restore(flags);
> +		local_irq_save(flags);
> +		asm_load_gs_index(selector);
> +		local_irq_restore(flags);
> +	}
>   }
>   
>   #endif /* CONFIG_X86_64 */
>   
> +static inline void __init lkgs_init(void)
> +{
> +#ifdef CONFIG_PARAVIRT_XXL
> +#ifdef CONFIG_X86_64
> +	if (cpu_feature_enabled(X86_FEATURE_LKGS))
> +		pv_ops.cpu.load_gs_index = native_lkgs;

For this to work correctly when running as a Xen PV guest, you need to add

	setup_clear_cpu_cap(X86_FEATURE_LKGS);

to xen_init_capabilities() in arch/x86/xen/enlighten_pv.c, as otherwise
the Xen specific .load_gs_index vector will be overwritten.


Juergen
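
For reference, the change Juergen describes would look roughly like this
(a minimal sketch; the context line is recalled from
arch/x86/xen/enlighten_pv.c and not verified against the current tree):

--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ ... @@ static void __init xen_init_capabilities(void)
 	setup_force_cpu_cap(X86_FEATURE_XENPV);
+	/* Keep the Xen PV specific .load_gs_index; never replace it with LKGS. */
+	setup_clear_cpu_cap(X86_FEATURE_LKGS);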
  
Li, Xin3 Oct. 19, 2022, 5:45 p.m. UTC | #2
> > +static inline void __init lkgs_init(void)
> > +{
> > +#ifdef CONFIG_PARAVIRT_XXL
> > +#ifdef CONFIG_X86_64
> > +	if (cpu_feature_enabled(X86_FEATURE_LKGS))
> > +		pv_ops.cpu.load_gs_index = native_lkgs;
> 
> For this to work correctly when running as a Xen PV guest, you need to add
> 
> 	setup_clear_cpu_cap(X86_FEATURE_LKGS);
> 
> to xen_init_capabilities() in arch/x86/xen/enlighten_pv.c, as otherwise the Xen
> specific .load_gs_index vector will be overwritten.

Yeah, we definitely should add it to disable LKGS in a Xen PV guest.

So does it mean that Xen PV uses a black list during feature detection?
If yes, then new features often need to be masked with an explicit
call to setup_clear_cpu_cap().

Wouldn't a white list be better?
Then the job would be mostly on the Xen PV side, which could selectively
enable a new feature, sometimes with Xen PV specific handling code added.

Xin

> 
> 
> Juergen
  
H. Peter Anvin Oct. 19, 2022, 6:01 p.m. UTC | #3
On October 19, 2022 10:45:07 AM PDT, "Li, Xin3" <xin3.li@intel.com> wrote:
>> > +static inline void __init lkgs_init(void)
>> > +{
>> > +#ifdef CONFIG_PARAVIRT_XXL
>> > +#ifdef CONFIG_X86_64
>> > +	if (cpu_feature_enabled(X86_FEATURE_LKGS))
>> > +		pv_ops.cpu.load_gs_index = native_lkgs;
>> 
>> For this to work correctly when running as a Xen PV guest, you need to add
>> 
>> 	setup_clear_cpu_cap(X86_FEATURE_LKGS);
>> 
>> to xen_init_capabilities() in arch/x86/xen/enlighten_pv.c, as otherwise the Xen
>> specific .load_gs_index vector will be overwritten.
>
>Yeah, we definitely should add it to disable LKGS in a Xen PV guest.
>
>So does it mean that the Xen PV uses a black list during feature detection?
>If yes then new features are often required to be masked with an explicit
>call to setup_clear_cpu_cap.
>
>Wouldn't a white list be better?
>Then the job is more just on the Xen PV side, and it can selectively enable
>a new feature, sometimes with Xen PV specific handling code added.
>
>Xin
>
>> 
>> 
>> Juergen
>

Most things don't frob the paravirt list.

Maybe we should make the paravirt frobbing a separate patch, as it is separable.
  
Juergen Gross Oct. 20, 2022, 4:53 a.m. UTC | #4
On 19.10.22 19:45, Li, Xin3 wrote:
>>> +static inline void __init lkgs_init(void)
>>> +{
>>> +#ifdef CONFIG_PARAVIRT_XXL
>>> +#ifdef CONFIG_X86_64
>>> +	if (cpu_feature_enabled(X86_FEATURE_LKGS))
>>> +		pv_ops.cpu.load_gs_index = native_lkgs;
>>
>> For this to work correctly when running as a Xen PV guest, you need to add
>>
>> 	setup_clear_cpu_cap(X86_FEATURE_LKGS);
>>
>> to xen_init_capabilities() in arch/x86/xen/enlighten_pv.c, as otherwise the Xen
>> specific .load_gs_index vector will be overwritten.
> 
> Yeah, we definitely should add it to disable LKGS in a Xen PV guest.
> 
> So does it mean that the Xen PV uses a black list during feature detection?
> If yes then new features are often required to be masked with an explicit
> call to setup_clear_cpu_cap.
> 
> Wouldn't a white list be better?
> Then the job is more just on the Xen PV side, and it can selectively enable
> a new feature, sometimes with Xen PV specific handling code added.

This is not how it works. Feature detection is generic code, so we'd need to
tweak that for switching to a whitelist.

Additionally most features don't require any Xen PV specific handling. This is
needed for some paravirtualized privileged operations only. So switching to a
whitelist would add more effort.


Juergen
  
Juergen Gross Oct. 20, 2022, 4:54 a.m. UTC | #5
On 19.10.22 20:01, H. Peter Anvin wrote:
> On October 19, 2022 10:45:07 AM PDT, "Li, Xin3" <xin3.li@intel.com> wrote:
>>>> +static inline void __init lkgs_init(void)
>>>> +{
>>>> +#ifdef CONFIG_PARAVIRT_XXL
>>>> +#ifdef CONFIG_X86_64
>>>> +	if (cpu_feature_enabled(X86_FEATURE_LKGS))
>>>> +		pv_ops.cpu.load_gs_index = native_lkgs;
>>>
>>> For this to work correctly when running as a Xen PV guest, you need to add
>>>
>>> 	setup_clear_cpu_cap(X86_FEATURE_LKGS);
>>>
>>> to xen_init_capabilities() in arch/x86/xen/enlighten_pv.c, as otherwise the Xen
>>> specific .load_gs_index vector will be overwritten.
>>
>> Yeah, we definitely should add it to disable LKGS in a Xen PV guest.
>>
>> So does it mean that the Xen PV uses a black list during feature detection?
>> If yes then new features are often required to be masked with an explicit
>> call to setup_clear_cpu_cap.
>>
>> Wouldn't a white list be better?
>> Then the job is more just on the Xen PV side, and it can selectively enable
>> a new feature, sometimes with Xen PV specific handling code added.
>>
>> Xin
>>
>>>
>>>
>>> Juergen
>>
> 
> Most things don't frob the paravirt list.
> 
> Maybe we should make the paravirt frobbing a separate patch, as it is separable.

Works for me.


Juergen
  
Li, Xin3 Oct. 20, 2022, 5:58 a.m. UTC | #6
> On 19.10.22 19:45, Li, Xin3 wrote:
> >>> +static inline void __init lkgs_init(void)
> >>> +{
> >>> +#ifdef CONFIG_PARAVIRT_XXL
> >>> +#ifdef CONFIG_X86_64
> >>> +	if (cpu_feature_enabled(X86_FEATURE_LKGS))
> >>> +		pv_ops.cpu.load_gs_index = native_lkgs;
> >>
> >> For this to work correctly when running as a Xen PV guest, you need
> >> to add
> >>
> >> 	setup_clear_cpu_cap(X86_FEATURE_LKGS);
> >>
> >> to xen_init_capabilities() in arch/x86/xen/enlighten_pv.c, as
> >> otherwise the Xen specific .load_gs_index vector will be overwritten.
> >
> > Yeah, we definitely should add it to disable LKGS in a Xen PV guest.
> >
> > So does it mean that the Xen PV uses a black list during feature detection?
> > If yes then new features are often required to be masked with an
> > explicit call to setup_clear_cpu_cap.
> >
> > Wouldn't a white list be better?
> > Then the job is more just on the Xen PV side, and it can selectively
> > enable a new feature, sometimes with Xen PV specific handling code added.
> 
> This is not how it works. Feature detection is generic code, so we'd need to
> tweak that for switching to a whitelist.
>

Yes, a Xen PV guest is basically a Linux system.  However IIRC, the Xen PV
CPUID is para-virtualized, so it's the Xen hypervisor's responsibility to decide
which features are exposed to a Xen PV guest.  No?

> Additionally most features don't require any Xen PV specific handling. This is
> needed for some paravirtualized privileged operations only. So switching to a
> whitelist would add more effort.
> 

LKGS is allowed only in ring 0, thus only the Xen hypervisor could use it.

Xin

> 
> Juergen
  
Li, Xin3 Oct. 20, 2022, 6:05 a.m. UTC | #7
> >
> > Most things don't frob the paravirt list.
> >
> > Maybe we should make the paravirt frobbing a separate patch, at it is
> separable.
> 
> Works for me.

Thanks, I will send out the patch after Xen PV testing (I need to set it up first).

Xin

> 
> 
> Juergen
  
Juergen Gross Oct. 20, 2022, 6:08 a.m. UTC | #8
On 20.10.22 07:58, Li, Xin3 wrote:
>> On 19.10.22 19:45, Li, Xin3 wrote:
>>>>> +static inline void __init lkgs_init(void)
>>>>> +{
>>>>> +#ifdef CONFIG_PARAVIRT_XXL
>>>>> +#ifdef CONFIG_X86_64
>>>>> +	if (cpu_feature_enabled(X86_FEATURE_LKGS))
>>>>> +		pv_ops.cpu.load_gs_index = native_lkgs;
>>>>
>>>> For this to work correctly when running as a Xen PV guest, you need
>>>> to add
>>>>
>>>> 	setup_clear_cpu_cap(X86_FEATURE_LKGS);
>>>>
>>>> to xen_init_capabilities() in arch/x86/xen/enlighten_pv.c, as
>>>> otherwise the Xen specific .load_gs_index vector will be overwritten.
>>>
>>> Yeah, we definitely should add it to disable LKGS in a Xen PV guest.
>>>
>>> So does it mean that the Xen PV uses a black list during feature detection?
>>> If yes then new features are often required to be masked with an
>>> explicit call to setup_clear_cpu_cap.
>>>
>>> Wouldn't a white list be better?
>>> Then the job is more just on the Xen PV side, and it can selectively
>>> enable a new feature, sometimes with Xen PV specific handling code added.
>>
>> This is not how it works. Feature detection is generic code, so we'd need to
>> tweak that for switching to a whitelist.
>>
> 
> Yes, a Xen PV guest is basically a Linux system.  However IIRC, the Xen PV
> CPUID is para-virtualized, so it's Xen hypervisor's responsibility to decide
> features exposed to a Xen PV guest.  No?

In theory you are right, of course.

OTOH the Xen PV interface has a long and complicated history, and we have to
deal with old hypervisor versions, too.

>> Additionally most features don't require any Xen PV specific handling. This is
>> needed for some paravirtualized privileged operations only. So switching to a
>> whitelist would add more effort.
>>
> 
> LKGS is allowed only in ring 0, thus only Xen hypervisor could use it.

Right, it would be one of the features where a whitelist would be nice.

OTOH today only 11 features need special handling in Xen PV guests, while
the other 300+ features don't.


Juergen
  
Li, Xin3 Oct. 20, 2022, 6:24 a.m. UTC | #9
> On 20.10.22 07:58, Li, Xin3 wrote:
> >> On 19.10.22 19:45, Li, Xin3 wrote:
> >>>>> +static inline void __init lkgs_init(void)
> >>>>> +{
> >>>>> +#ifdef CONFIG_PARAVIRT_XXL
> >>>>> +#ifdef CONFIG_X86_64
> >>>>> +	if (cpu_feature_enabled(X86_FEATURE_LKGS))
> >>>>> +		pv_ops.cpu.load_gs_index = native_lkgs;
> >>>>
> >>>> For this to work correctly when running as a Xen PV guest, you need
> >>>> to add
> >>>>
> >>>> 	setup_clear_cpu_cap(X86_FEATURE_LKGS);
> >>>>
> >>>> to xen_init_capabilities() in arch/x86/xen/enlighten_pv.c, as
> >>>> otherwise the Xen specific .load_gs_index vector will be overwritten.
> >>>
> >>> Yeah, we definitely should add it to disable LKGS in a Xen PV guest.
> >>>
> >>> So does it mean that the Xen PV uses a black list during feature detection?
> >>> If yes then new features are often required to be masked with an
> >>> explicit call to setup_clear_cpu_cap.
> >>>
> >>> Wouldn't a white list be better?
> >>> Then the job is more just on the Xen PV side, and it can selectively
> >>> enable a new feature, sometimes with Xen PV specific handling code
> added.
> >>
> >> This is not how it works. Feature detection is generic code, so we'd
> >> need to tweak that for switching to a whitelist.
> >>
> >
> > Yes, a Xen PV guest is basically a Linux system.  However IIRC, the
> > Xen PV CPUID is para-virtualized, so it's Xen hypervisor's
> > responsibility to decide features exposed to a Xen PV guest.  No?
> 
> In theory you are right, of course.
> 
> OTOH the Xen PV interface has a long and complicated history, and we have to
> deal with old hypervisor versions, too.
> 
> >> Additionally most features don't require any Xen PV specific
> >> handling. This is needed for some paravirtualized privileged
> >> operations only. So switching to a whitelist would add more effort.
> >>
> >
> > LKGS is allowed only in ring 0, thus only Xen hypervisor could use it.
> 
> Right, it would be one of the features where a whitelist would be nice.
> 
> OTOH today only 11 features need special handling in Xen PV guests, while the
> rest of more than 300 features doesn't.
> 

Got to say, nothing is more convincing than strong data.
Xin

> 
> Juergen
  

Patch

diff --git a/arch/x86/include/asm/gsseg.h b/arch/x86/include/asm/gsseg.h
index d15577c39e8d..ab6a595cea70 100644
--- a/arch/x86/include/asm/gsseg.h
+++ b/arch/x86/include/asm/gsseg.h
@@ -14,17 +14,42 @@ 
 
 extern asmlinkage void asm_load_gs_index(u16 selector);
 
+/* Replace with "lkgs %di" once binutils support LKGS instruction */
+#define LKGS_DI _ASM_BYTES(0xf2,0x0f,0x00,0xf7)
+
+static inline void native_lkgs(unsigned int selector)
+{
+	u16 sel = selector;
+	asm_inline volatile("1: " LKGS_DI
+			    _ASM_EXTABLE_TYPE_REG(1b, 1b, EX_TYPE_ZERO_REG, %k[sel])
+			    : [sel] "+D" (sel));
+}
+
 static inline void native_load_gs_index(unsigned int selector)
 {
-	unsigned long flags;
+	if (cpu_feature_enabled(X86_FEATURE_LKGS)) {
+		native_lkgs(selector);
+	} else {
+		unsigned long flags;
 
-	local_irq_save(flags);
-	asm_load_gs_index(selector);
-	local_irq_restore(flags);
+		local_irq_save(flags);
+		asm_load_gs_index(selector);
+		local_irq_restore(flags);
+	}
 }
 
 #endif /* CONFIG_X86_64 */
 
+static inline void __init lkgs_init(void)
+{
+#ifdef CONFIG_PARAVIRT_XXL
+#ifdef CONFIG_X86_64
+	if (cpu_feature_enabled(X86_FEATURE_LKGS))
+		pv_ops.cpu.load_gs_index = native_lkgs;
+#endif
+#endif
+}
+
 #ifndef CONFIG_PARAVIRT_XXL
 
 static inline void load_gs_index(unsigned int selector)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 3e508f239098..d6eb4f60b47d 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1939,6 +1939,7 @@  void __init identify_boot_cpu(void)
 	setup_cr_pinning();
 
 	tsx_init();
+	lkgs_init();
 }
 
 void identify_secondary_cpu(struct cpuinfo_x86 *c)
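
As an aside on the hard-coded LKGS_DI byte string above: per the ISA
extensions documentation (worth double-checking against the current manual),
LKGS is encoded as F2 0F 00 /6, so the bytes decode as

	0xf2		mandatory F2 prefix of the LKGS encoding
	0x0f, 0x00	two-byte opcode (the "group 6" 0F 00 opcode)
	0xf7		ModRM: mod=11 (register operand), reg=110 (/6), rm=111 (%di)

which is exactly the "lkgs %di" form the comment says the macro will become
once binutils can assemble the mnemonic.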