[1/1] x86/hyperv: Disable IBT when hypercall page lacks ENDBR instruction

Message ID 1689885237-32662-1-git-send-email-mikelley@microsoft.com
State New
Series [1/1] x86/hyperv: Disable IBT when hypercall page lacks ENDBR instruction

Commit Message

Michael Kelley (LINUX) July 20, 2023, 8:33 p.m. UTC
  On hardware that supports Indirect Branch Tracking (IBT), Hyper-V VMs
with ConfigVersion 9.3 or later support IBT in the guest. However,
current versions of Hyper-V have a bug: there's no ENDBR64
instruction at the beginning of the hypercall page. Since hypercalls are
made with an indirect call to the hypercall page, all hypercall attempts
fail with an exception and Linux panics.

A Hyper-V fix to add ENDBR64 is in progress. In the meantime, guard
against the Linux panic by clearing X86_FEATURE_IBT if the hypercall page
doesn't start with ENDBR. The VM will then boot and run without IBT.
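As a userspace sketch of the check described above, the following tests whether a page begins with the ENDBR64 encoding (f3 0f 1e fa). It is illustrative only, not the kernel code: the helper names are hypothetical, the "page" is just a byte array, and a page starting with vmcall (0f 01 c1) stands in for the buggy case.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * ENDBR64 is encoded as f3 0f 1e fa. Read as a little-endian u32 this is
 * 0xfa1e0ff3, the value the kernel's gen_endbr() returns on x86-64.
 */
#define ENDBR64_LE32 0xfa1e0ff3u

/* Hypothetical helper: assemble the first four bytes as a little-endian u32. */
static uint32_t first_u32_le(const uint8_t *page)
{
	return (uint32_t)page[0] | (uint32_t)page[1] << 8 |
	       (uint32_t)page[2] << 16 | (uint32_t)page[3] << 24;
}

/* Hypothetical helper: does the (simulated) hypercall page start with ENDBR64? */
static bool page_starts_with_endbr64(const uint8_t *page)
{
	return first_u32_le(page) == ENDBR64_LE32;
}

/*
 * Mirrors the patch's decision: IBT stays enabled only if it was enabled
 * to begin with and the entry point of the page carries ENDBR64.
 */
static bool ibt_stays_enabled(bool ibt_enabled, const uint8_t *hypercall_page)
{
	return ibt_enabled && page_starts_with_endbr64(hypercall_page);
}
```

The patch itself does a single 32-bit load, `*(u32 *)hv_hypercall_pg != gen_endbr()`; assembling the value byte by byte here only keeps the sketch independent of host endianness.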

If future Linux 32-bit kernels were to support IBT, additional hypercall
page hackery would be needed to make IBT work for such kernels in a
Hyper-V VM.

Cc: stable@vger.kernel.org
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
---
 arch/x86/hyperv/hv_init.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
  

Comments

Peter Zijlstra July 20, 2023, 9:15 p.m. UTC | #1
On Thu, Jul 20, 2023 at 01:33:57PM -0700, Michael Kelley wrote:
> On hardware that supports Indirect Branch Tracking (IBT), Hyper-V VMs
> with ConfigVersion 9.3 or later support IBT in the guest. However,
> current versions of Hyper-V have a bug in that there's not an ENDBR64
> instruction at the beginning of the hypercall page. 

Whoops :/

> Since hypercalls are
> made with an indirect call to the hypercall page, all hypercall attempts
> fail with an exception and Linux panics.
> 
> A Hyper-V fix is in progress to add ENDBR64. But guard against the Linux
> panic by clearing X86_FEATURE_IBT if the hypercall page doesn't start
> with ENDBR. The VM will boot and run without IBT.
> 
> If future Linux 32-bit kernels were to support IBT, additional hypercall
> page hackery would be needed to make IBT work for such kernels in a
> Hyper-V VM.

There are currently no plans to add IBT support to 32bit.

> Cc: stable@vger.kernel.org
> Signed-off-by: Michael Kelley <mikelley@microsoft.com>
> ---
>  arch/x86/hyperv/hv_init.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 6c04b52..5cbee24 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -14,6 +14,7 @@
>  #include <asm/apic.h>
>  #include <asm/desc.h>
>  #include <asm/sev.h>
> +#include <asm/ibt.h>
>  #include <asm/hypervisor.h>
>  #include <asm/hyperv-tlfs.h>
>  #include <asm/mshyperv.h>
> @@ -472,6 +473,26 @@ void __init hyperv_init(void)
>  	}
>  
>  	/*
> +	 * Some versions of Hyper-V that provide IBT in guest VMs have a bug
> +	 * in that there's no ENDBR64 instruction at the entry to the
> +	 * hypercall page. Because hypercalls are invoked via an indirect call
> +	 * to the hypercall page, all hypercall attempts fail when IBT is
> +	 * enabled, and Linux panics. For such buggy versions, disable IBT.
> +	 *
> +	 * Fixed versions of Hyper-V always provide ENDBR64 on the hypercall
> +	 * page, so if future Linux kernel versions enable IBT for 32-bit
> +	 * builds, additional hypercall page hackery will be required here
> +	 * to provide an ENDBR32.
> +	 */
> +#ifdef CONFIG_X86_KERNEL_IBT
> +	if (cpu_feature_enabled(X86_FEATURE_IBT) &&
> +	    *(u32 *)hv_hypercall_pg != gen_endbr()) {
> +		setup_clear_cpu_cap(X86_FEATURE_IBT);
> +		pr_info("Hyper-V: Disabling IBT because of Hyper-V bug\n");
> +	}
> +#endif

pr_warn() perhaps?

Other than that, this seems fairly straightforward. One thing I
wondered about: wouldn't it be possible to rewrite the indirect
hypercall thingies as direct calls? I mean, once we have the hypercall
page mapped, the address is known, right?
  
Michael Kelley (LINUX) July 21, 2023, 12:41 a.m. UTC | #2
From: Peter Zijlstra <peterz@infradead.org> Sent: Thursday, July 20, 2023 2:16 PM
> 
> On Thu, Jul 20, 2023 at 01:33:57PM -0700, Michael Kelley wrote:
> > On hardware that supports Indirect Branch Tracking (IBT), Hyper-V VMs
> > with ConfigVersion 9.3 or later support IBT in the guest. However,
> > current versions of Hyper-V have a bug in that there's not an ENDBR64
> > instruction at the beginning of the hypercall page.
> 
> Whoops :/
> 
> > Since hypercalls are
> > made with an indirect call to the hypercall page, all hypercall attempts
> > fail with an exception and Linux panics.
> >
> > A Hyper-V fix is in progress to add ENDBR64. But guard against the Linux
> > panic by clearing X86_FEATURE_IBT if the hypercall page doesn't start
> > with ENDBR. The VM will boot and run without IBT.
> >
> > If future Linux 32-bit kernels were to support IBT, additional hypercall
> > page hackery would be needed to make IBT work for such kernels in a
> > Hyper-V VM.
> 
> There are currently no plans to add IBT support to 32bit.

That's what I thought.

> 
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Michael Kelley <mikelley@microsoft.com>
> > ---
> >  arch/x86/hyperv/hv_init.c | 21 +++++++++++++++++++++
> >  1 file changed, 21 insertions(+)
> >
> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> > index 6c04b52..5cbee24 100644
> > --- a/arch/x86/hyperv/hv_init.c
> > +++ b/arch/x86/hyperv/hv_init.c
> > @@ -14,6 +14,7 @@
> >  #include <asm/apic.h>
> >  #include <asm/desc.h>
> >  #include <asm/sev.h>
> > +#include <asm/ibt.h>
> >  #include <asm/hypervisor.h>
> >  #include <asm/hyperv-tlfs.h>
> >  #include <asm/mshyperv.h>
> > @@ -472,6 +473,26 @@ void __init hyperv_init(void)
> >  	}
> >
> >  	/*
> > +	 * Some versions of Hyper-V that provide IBT in guest VMs have a bug
> > +	 * in that there's no ENDBR64 instruction at the entry to the
> > +	 * hypercall page. Because hypercalls are invoked via an indirect call
> > +	 * to the hypercall page, all hypercall attempts fail when IBT is
> > +	 * enabled, and Linux panics. For such buggy versions, disable IBT.
> > +	 *
> > +	 * Fixed versions of Hyper-V always provide ENDBR64 on the hypercall
> > +	 * page, so if future Linux kernel versions enable IBT for 32-bit
> > +	 * builds, additional hypercall page hackery will be required here
> > +	 * to provide an ENDBR32.
> > +	 */
> > +#ifdef CONFIG_X86_KERNEL_IBT
> > +	if (cpu_feature_enabled(X86_FEATURE_IBT) &&
> > +	    *(u32 *)hv_hypercall_pg != gen_endbr()) {
> > +		setup_clear_cpu_cap(X86_FEATURE_IBT);
> > +		pr_info("Hyper-V: Disabling IBT because of Hyper-V bug\n");
> > +	}
> > +#endif
> 
> pr_warn() perhaps?

I wanted pr_info() so there's an immediate way to check for this
case in the dmesg output if a user complains that IBT isn't
enabled when expected. In some sense, the message is temporary:
once the Hyper-V fix is available and users install it, the message
will go away. The pipeline for the Hyper-V fix is long, so
availability is at least several months away, while this Linux
workaround will be available much sooner. Once it is picked up on
stable branches, we will avoid situations like the one we saw, where
someone upgraded Fedora 38 from a 6.2 to a 6.3 kernel and the 6.3
kernel wouldn't boot because it has kernel IBT enabled.

> 
> Other than that, this seems fairly straight forward. One thing I
> wondered about; wouldn't it be possible to re-write the indirect
> hypercall thingies to a direct call? I mean, once we have the hypercall
> page mapped, the address is known right?

Yes, the address is known, and it does not change across events such
as hibernation. But the indirect call instruction is part of an inline
assembly sequence, so the call instructions that would need rewriting
are scattered throughout the code. There's also the SEV-SNP case in the
latest version of Tianyu Lan's patch set [1], where vmmcall may be used
instead, based on your recent enhancement for nested ALTERNATIVE.
Rewriting seems like more complexity than is warranted for a mostly
interim situation that lasts only until the Hyper-V fix is available
and users install it.

Michael

[1] https://lore.kernel.org/lkml/20230718032304.136888-6-ltykernel@gmail.com/
  
Peter Zijlstra July 21, 2023, 7:58 a.m. UTC | #3
On Fri, Jul 21, 2023 at 12:41:35AM +0000, Michael Kelley (LINUX) wrote:

> > Other than that, this seems fairly straight forward. One thing I
> > wondered about; wouldn't it be possible to re-write the indirect
> > hypercall thingies to a direct call? I mean, once we have the hypercall
> > page mapped, the address is known right?
> 
> Yes, the address is known.  It does not change across things like
> hibernation.  But the indirect call instruction is part of an inline assembly
> sequence, so the call instructions that need re-writing are scattered
> throughout the code. There's also the SEV-SNP case from the
> latest version of Tianyu Lan's patch set [1] where vmmcall may be used
> instead, based on your recent enhancement for nested ALTERNATIVE.
> Re-writing seems like that's more complexity than warranted for a
> mostly interim situation until the Hyper-V patch is available and
> users install it.

Well, we already have a lot of infrastructure for this; it is very
much like the paravirt patching.

Also, direct calls are both faster and have fewer speculation issues,
so it might still be worth looking at.

The way to do something like this would be:


	/* basic asm shown; in extended asm with operands, write %%rip */
	asm volatile (ANNOTATE_RETPOLINE_SAFE
		      "1: call *hv_hypercall_pg(%rip)	\n\t"
		      ".pushsection .hv_call_sites	\n\t"
		      ".long 1b - .			\n\t"
		      ".popsection			\n\t");


And then (see alternative.c for many other examples):


extern s32 __hv_call_sites_begin[], __hv_call_sites_end[]; /* from the linker script */

static void __init patch_hypercalls(void)
{
	s32 *s;
	int ret;

	for (s = __hv_call_sites_begin; s < __hv_call_sites_end; s++) {
		u8 *addr = (u8 *)s + *s;
		struct insn insn;
		u8 buf[6];

		ret = insn_decode_kernel(&insn, addr);
		if (WARN_ON_ONCE(ret < 0))
			continue;

		/*
		 * indirect call: ff 15 disp32
		 * direct call:   2e e8 disp32
		 */
		if (insn.length == 6 &&
		    insn.opcode.bytes[0] == 0xFF &&
		    X86_MODRM_REG(insn.modrm.bytes[0]) == 2) {

			/* verify it was calling through hv_hypercall_pg */
			if (WARN_ON_ONCE(addr + 6 + insn.displacement.value !=
					 (u8 *)&hv_hypercall_pg))
				continue;

			/*
			 * write a CS padded direct call -- assumes the
			 * hypercall page is in the 2G immediate range
			 * of the kernel text
			 */
			buf[0] = 0x2e; /* CS prefix */
			buf[1] = CALL_INSN_OPCODE;
			*(s32 *)&buf[2] = (u8 *)hv_hypercall_pg - (addr + 6);
			text_poke_early(addr, buf, sizeof(buf));
		}
	}
}


See, easy :-)
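To make the site-table rewrite above concrete outside the kernel, here is a self-contained userspace simulation. It is a sketch under stated assumptions, not kernel code: "addresses" are offsets into a byte array, each table entry stores `site - entry` the way `.long 1b - .` does, and an 8-byte slot holds the call target the way `hv_hypercall_pg` holds the hypercall page address. The function `patch_call_sites()` is hypothetical; unlike the kernel version it matches raw bytes instead of using the instruction decoder, and it leaves a site alone when a rel32 cannot reach the target (the 2G-range concern raised in the thread).

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian accessors for the simulated memory image. */
static int32_t load_s32(const uint8_t *p)
{
	return (int32_t)((uint32_t)p[0] | (uint32_t)p[1] << 8 |
			 (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24);
}

static void store_s32(uint8_t *p, int32_t v)
{
	uint32_t u = (uint32_t)v;

	for (int i = 0; i < 4; i++)
		p[i] = (uint8_t)(u >> (8 * i));
}

static uint64_t load_u64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Hypothetical helper. The site table at sites_addr holds nsites s32
 * entries, each equal to (site - entry). The 8-byte slot at slot_addr
 * holds the call target. Returns the number of call sites rewritten.
 */
static size_t patch_call_sites(uint8_t *mem, size_t mem_len,
			       uint64_t sites_addr, size_t nsites,
			       uint64_t slot_addr)
{
	size_t patched = 0;
	uint64_t target;

	if (slot_addr + 8 > mem_len)
		return 0;
	target = load_u64(mem + slot_addr);

	for (size_t i = 0; i < nsites; i++) {
		uint64_t entry = sites_addr + 4 * i;
		uint64_t site;
		uint8_t *insn;
		int64_t rel;

		if (entry + 4 > mem_len)
			break;
		site = entry + (uint64_t)(int64_t)load_s32(mem + entry);
		if (site + 6 > mem_len)
			continue;
		insn = mem + site;

		/* Only "call *disp32(%rip)": opcode ff, ModRM 0x15 (/2, RIP-relative). */
		if (insn[0] != 0xff || insn[1] != 0x15)
			continue;

		/* The RIP-relative operand must point at the slot. */
		if (site + 6 + (uint64_t)(int64_t)load_s32(insn + 2) != slot_addr)
			continue;

		/* A direct call's rel32 is relative to the end of the 6-byte insn. */
		rel = (int64_t)(target - (site + 6));
		if (rel < INT32_MIN || rel > INT32_MAX)
			continue;

		insn[0] = 0x2e;		/* CS segment prefix keeps the length at 6 */
		insn[1] = 0xe8;		/* CALL rel32 */
		store_s32(insn + 2, (int32_t)rel);
		patched++;
	}
	return patched;
}
```

Storing 32-bit relative offsets rather than absolute pointers keeps each table entry at 4 bytes and keeps it valid when the whole image moves, roughly the same convention the kernel's alternatives table uses. Running the function a second time is harmless, because already-patched sites no longer start with `ff 15`.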
  
Michael Kelley (LINUX) July 21, 2023, 2 p.m. UTC | #4
From: Peter Zijlstra <peterz@infradead.org> Sent: Friday, July 21, 2023 12:59 AM
> 
> On Fri, Jul 21, 2023 at 12:41:35AM +0000, Michael Kelley (LINUX) wrote:
> 
> > > Other than that, this seems fairly straight forward. One thing I
> > > wondered about; wouldn't it be possible to re-write the indirect
> > > hypercall thingies to a direct call? I mean, once we have the hypercall
> > > page mapped, the address is known right?
> >
> > Yes, the address is known.  It does not change across things like
> > hibernation.  But the indirect call instruction is part of an inline assembly
> > sequence, so the call instructions that need re-writing are scattered
> > throughout the code. There's also the SEV-SNP case from the
> > latest version of Tianyu Lan's patch set [1] where vmmcall may be used
> > instead, based on your recent enhancement for nested ALTERNATIVE.
> > Re-writing seems like that's more complexity than warranted for a
> > mostly interim situation until the Hyper-V patch is available and
> > users install it.
> 
> Well, we have a lot of infrastructure for this already. Specifically
> this is very like the paravirt patching.
> 
> Also, direct calls are both faster and have less speculation issues, so
> it might still be worth looking at.
> 
> The way to do something like this would be:
> 
> 
> 	asm volatile ("   ANNOTATE_RETPOLINE_SAFE	\n\t"
> 		      "1: call *hv_hypercall_page	\n\t"
> 		      ".pushsection .hv_call_sites	\n\t"
> 		      ".long 1b - .			\n\t"
> 		      ".popsection			\n\t");
> 
> 
> And then (see alternative.c for many other examples):
> 
> 
> patch_hypercalls()
> {
> 	s32 *s;
> 
> 	for (s = __hv_call_sites_begin; s < __hv_call_sites_end; s++) {
> 		void *addr = (void *)s + *s;
> 		struct insn insn;
> 
> 		ret = insn_decode_kernel(&insn, addr);
> 		if (WARN_ON_ONCE(ret < 0))
> 			continue;
> 
> 		/*
> 		 * indirect call: ff 15 disp32
> 		 * direct call:   2e e8 disp32
> 		 */
> 		if (insn.length == 6 &&
> 		    insn.opcode.bytes[0] == 0xFF &&
> 		    X86_MODRM_REG(insn.modrm.bytes[0]) == 2) {
> 
> 			/* verify it was calling hy_hypercall_page */
> 			if (WARN_ON_ONCE(addr + 6 + insn.displacement.value != &hv_hypercall_page))
> 				continue;
> 
> 			/*
> 			 * write a CS padded direct call -- assumes the
> 			 * hypercall page is in the 2G immediate range
> 			 * of the kernel text

Probably not true -- the hypercall page has a vmalloc address.

> 			 */
> 			addr[0] = 0x2e; /* CS prefix */
> 			addr[1] = CALL_INSN_OPCODE;
> 			(s32 *)&Addr[2] = *hv_hypercall_page - (addr + 6);
> 		}
> 	}
> }
> 
> 
> See, easy :-)

OK, worth looking into.  This is a corner of the Linux kernel code that
I've never looked at before.  I appreciate the pointers.

Hypercall sites also exist in loadable modules, so we would need to hook
into module_finalize() as well. Processing a new section type looks
straightforward.

But altogether, this feels like more change than should go into a bug
fix to be backported to stable kernels. It's something to look at for a
future kernel release.

Michael
  
Michael Kelley (LINUX) July 21, 2023, 2:05 p.m. UTC | #5
From: Michael Kelley (LINUX) <mikelley@microsoft.com> Sent: Thursday, July 20, 2023 5:42 PM
> 
> From: Peter Zijlstra <peterz@infradead.org> Sent: Thursday, July 20, 2023 2:16 PM
> >
> > > @@ -472,6 +473,26 @@ void __init hyperv_init(void)
> > >  	}
> > >
> > >  	/*
> > > +	 * Some versions of Hyper-V that provide IBT in guest VMs have a bug
> > > +	 * in that there's no ENDBR64 instruction at the entry to the
> > > +	 * hypercall page. Because hypercalls are invoked via an indirect call
> > > +	 * to the hypercall page, all hypercall attempts fail when IBT is
> > > +	 * enabled, and Linux panics. For such buggy versions, disable IBT.
> > > +	 *
> > > +	 * Fixed versions of Hyper-V always provide ENDBR64 on the hypercall
> > > +	 * page, so if future Linux kernel versions enable IBT for 32-bit
> > > +	 * builds, additional hypercall page hackery will be required here
> > > +	 * to provide an ENDBR32.
> > > +	 */
> > > +#ifdef CONFIG_X86_KERNEL_IBT
> > > +	if (cpu_feature_enabled(X86_FEATURE_IBT) &&
> > > +	    *(u32 *)hv_hypercall_pg != gen_endbr()) {
> > > +		setup_clear_cpu_cap(X86_FEATURE_IBT);
> > > +		pr_info("Hyper-V: Disabling IBT because of Hyper-V bug\n");
> > > +	}
> > > +#endif
> >
> > pr_warn() perhaps?
> 
> I wanted pr_info() so there's an immediate way to check for this
> case in the dmesg output if a user complains about IBT not being
> enabled when he expects it.   In some sense, the message is temporary
> because once the Hyper-V patch is available and users install it,
> the message will go away.  The pipeline for the Hyper-V patch is a
> bit long, so availability is at least several months away.  This Linux
> workaround will be available much faster.  Once it is picked up on
> stable branches, we will avoid the situations like we saw where
> someone upgraded Fedora 38 from a 6.2 to a 6.3 kernel, and the 6.3
> kernel wouldn't boot because it has kernel IBT enabled.
> 

I realized in the middle of the night that my reply was nonsense. :-(
pr_warn() makes the message visible when pr_info() might not.  I'm
happy to change to pr_warn().

Michael
  
David Laight July 21, 2023, 2:07 p.m. UTC | #6
...
> I realized in the middle of the night that my reply was nonsense. :-(
> pr_warn() makes the message visible when pr_info() might not.  I'm
> happy to change to pr_warn().

PANIC_ON_WARN??

	David

  
Michael Kelley (LINUX) July 21, 2023, 2:21 p.m. UTC | #7
From: David Laight <David.Laight@ACULAB.COM> Sent: Friday, July 21, 2023 7:07 AM
> 
> ...
> > I realized in the middle of the night that my reply was nonsense. :-(
> > pr_warn() makes the message visible when pr_info() might not.  I'm
> > happy to change to pr_warn().
> 
> PANIC_ON_WARN??
> 

panic_on_warn applies to WARN() and its variants. pr_warn() is unrelated;
it just logs at kernel log level 4 (KERN_WARNING) instead of level 6
(KERN_INFO), which pr_info() uses.
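The practical difference between the two levels is how printk filters console output. A minimal sketch of the rule from the kernel's printk documentation: every message goes into the log buffer that dmesg reads, and it is also printed on the console only when its level number is lower than console_loglevel. The enum names and the helper are illustrative, not kernel identifiers.

```c
#include <stdbool.h>

/* Numeric printk levels (lower value = more severe), as in kern_levels.h. */
enum printk_level {
	LVL_EMERG   = 0,
	LVL_ALERT   = 1,
	LVL_CRIT    = 2,
	LVL_ERR     = 3,
	LVL_WARNING = 4,	/* pr_warn() */
	LVL_NOTICE  = 5,
	LVL_INFO    = 6,	/* pr_info() */
	LVL_DEBUG   = 7,
};

/*
 * Hypothetical helper: a message is printed on the console only if its
 * level is numerically below console_loglevel. It is logged either way.
 */
static bool printed_on_console(enum printk_level level, int console_loglevel)
{
	return (int)level < console_loglevel;
}
```

With console_loglevel set to 5, for example, a pr_warn() message would reach the console while a pr_info() message would not; both would still appear in dmesg.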

Michael
  
Peter Zijlstra July 21, 2023, 6:49 p.m. UTC | #8
On Fri, Jul 21, 2023 at 02:00:35PM +0000, Michael Kelley (LINUX) wrote:

> > Well, we have a lot of infrastructure for this already. Specifically
> > this is very like the paravirt patching.
> > 
> > Also, direct calls are both faster and have less speculation issues, so
> > it might still be worth looking at.
> > 
> > The way to do something like this would be:
> > 
> > 
> > 	asm volatile ("   ANNOTATE_RETPOLINE_SAFE	\n\t"
> > 		      "1: call *hv_hypercall_page	\n\t"
> > 		      ".pushsection .hv_call_sites	\n\t"
> > 		      ".long 1b - .			\n\t"
> > 		      ".popsection			\n\t");
> > 
> > 
> > And then (see alternative.c for many other examples):
> > 
> > 
> > patch_hypercalls()
> > {
> > 	s32 *s;
> > 
> > 	for (s = __hv_call_sites_begin; s < __hv_call_sites_end; s++) {
> > 		void *addr = (void *)s + *s;
> > 		struct insn insn;
> > 
> > 		ret = insn_decode_kernel(&insn, addr);
> > 		if (WARN_ON_ONCE(ret < 0))
> > 			continue;
> > 
> > 		/*
> > 		 * indirect call: ff 15 disp32
> > 		 * direct call:   2e e8 disp32
> > 		 */
> > 		if (insn.length == 6 &&
> > 		    insn.opcode.bytes[0] == 0xFF &&
> > 		    X86_MODRM_REG(insn.modrm.bytes[0]) == 2) {
> > 
> > 			/* verify it was calling hy_hypercall_page */
> > 			if (WARN_ON_ONCE(addr + 6 + insn.displacement.value != &hv_hypercall_page))
> > 				continue;
> > 
> > 			/*
> > 			 * write a CS padded direct call -- assumes the
> > 			 * hypercall page is in the 2G immediate range
> > 			 * of the kernel text
> 
> Probably not true -- the hypercall page has a vmalloc address.

See module_alloc(), which uses vmalloc but constrains the address to stay
within the 2G immediate address limit.

> > 			 */
> > 			addr[0] = 0x2e; /* CS prefix */
> > 			addr[1] = CALL_INSN_OPCODE;
> > 			(s32 *)&Addr[2] = *hv_hypercall_page - (addr + 6);
			*(s32 *)...
> > 		}
> > 	}
> > }
> > 
> > 
> > See, easy :-)
> 
> OK, worth looking into.  This is a corner of the Linux kernel code that
> I've never looked at before.  I appreciate the pointers.

No problem, I've been doing too much of this the past few years :-)

> Hypercall sites also exist in loadable modules, so would need to hook
> into module_finalize() as well.  Processing a new section type looks
> straightforward.

Yep,

> But altogether, this feels like more change than should go as a bug
> fix to be backported to stable kernels.  It's something to look at for a
> future kernel release.

Agreed!
  

Patch

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 6c04b52..5cbee24 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -14,6 +14,7 @@ 
 #include <asm/apic.h>
 #include <asm/desc.h>
 #include <asm/sev.h>
+#include <asm/ibt.h>
 #include <asm/hypervisor.h>
 #include <asm/hyperv-tlfs.h>
 #include <asm/mshyperv.h>
@@ -472,6 +473,26 @@  void __init hyperv_init(void)
 	}
 
 	/*
+	 * Some versions of Hyper-V that provide IBT in guest VMs have a bug
+	 * in that there's no ENDBR64 instruction at the entry to the
+	 * hypercall page. Because hypercalls are invoked via an indirect call
+	 * to the hypercall page, all hypercall attempts fail when IBT is
+	 * enabled, and Linux panics. For such buggy versions, disable IBT.
+	 *
+	 * Fixed versions of Hyper-V always provide ENDBR64 on the hypercall
+	 * page, so if future Linux kernel versions enable IBT for 32-bit
+	 * builds, additional hypercall page hackery will be required here
+	 * to provide an ENDBR32.
+	 */
+#ifdef CONFIG_X86_KERNEL_IBT
+	if (cpu_feature_enabled(X86_FEATURE_IBT) &&
+	    *(u32 *)hv_hypercall_pg != gen_endbr()) {
+		setup_clear_cpu_cap(X86_FEATURE_IBT);
+		pr_info("Hyper-V: Disabling IBT because of Hyper-V bug\n");
+	}
+#endif
+
+	/*
 	 * hyperv_init() is called before LAPIC is initialized: see
 	 * apic_intr_mode_init() -> x86_platform.apic_post_init() and
 	 * apic_bsp_setup() -> setup_local_APIC(). The direct-mode STIMER