[v2,02/18] x86/reboot: Expose VMCS crash hooks if and only if KVM_INTEL is enabled

Message ID 20230310214232.806108-3-seanjc@google.com
State New
Series x86/reboot: KVM: Clean up "emergency" virt code

Commit Message

Sean Christopherson March 10, 2023, 9:42 p.m. UTC
  Expose the crash/reboot hooks used by KVM to do VMCLEAR+VMXOFF if and
only if there's a potential in-tree user, KVM_INTEL.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/reboot.h | 2 ++
 arch/x86/kernel/reboot.c      | 4 ++++
 2 files changed, 6 insertions(+)
  

Comments

Kai Huang March 13, 2023, 12:31 a.m. UTC | #1
Hi Sean,

Thanks for copying me.

On Fri, 2023-03-10 at 13:42 -0800, Sean Christopherson wrote:
> Expose the crash/reboot hooks used by KVM to do VMCLEAR+VMXOFF if and
> only if there's a potential in-tree user, KVM_INTEL.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/reboot.h | 2 ++
>  arch/x86/kernel/reboot.c      | 4 ++++
>  2 files changed, 6 insertions(+)
> 
> diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
> index 2551baec927d..33c8e911e0de 100644
> --- a/arch/x86/include/asm/reboot.h
> +++ b/arch/x86/include/asm/reboot.h
> @@ -25,8 +25,10 @@ void __noreturn machine_real_restart(unsigned int type);
>  #define MRR_BIOS	0
>  #define MRR_APM		1
>  
> +#if IS_ENABLED(CONFIG_KVM_INTEL)
>  typedef void crash_vmclear_fn(void);
>  extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
> +#endif
>  void cpu_emergency_disable_virtualization(void);
>  
>  typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
> diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
> index 299b970e5f82..6c0b1634b884 100644
> --- a/arch/x86/kernel/reboot.c
> +++ b/arch/x86/kernel/reboot.c
> @@ -787,6 +787,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
>  }
>  #endif
>  
> +#if IS_ENABLED(CONFIG_KVM_INTEL)
>  /*
>   * This is used to VMCLEAR all VMCSs loaded on the
>   * processor. And when loading kvm_intel module, the
> @@ -807,6 +808,7 @@ static inline void cpu_crash_vmclear_loaded_vmcss(void)
>  		do_vmclear_operation();
>  	rcu_read_unlock();
>  }
> +#endif
>  
>  /* This is the CPU performing the emergency shutdown work. */
>  int crashing_cpu = -1;
> @@ -818,7 +820,9 @@ int crashing_cpu = -1;
>   */
>  void cpu_emergency_disable_virtualization(void)
>  {
> +#if IS_ENABLED(CONFIG_KVM_INTEL)
>  	cpu_crash_vmclear_loaded_vmcss();
> +#endif
>  
>  	cpu_emergency_vmxoff();

The changelog says it exposes the *hooks* (plural) used to do "VMCLEAR+VMXOFF"
only when KVM_INTEL is on, but here only "VMCLEAR" is wrapped in
CONFIG_KVM_INTEL.  So either the changelog needs improvement, or the code
should be adjusted?

Personally, I think it's better to move VMXOFF part within CONFIG_KVM_INTEL too,
if you want to do this.

But I am not sure whether we want to do this (having CONFIG_KVM_INTEL around the
relevant code).  In later patches, you mentioned the case of out-of-tree
hypervisor, for instance, below in the changelog of patch 04:

	There's no need to attempt VMXOFF if KVM (or some other out-of-tree 
	hypervisor) isn't loaded/active...

This means we want to handle VMCLEAR+VMXOFF in the case of an out-of-tree
hypervisor too.  So, shouldn't the hooks always exist, rather than only being
available when KVM_INTEL or KVM_AMD is on, so that an out-of-tree hypervisor can
register its callbacks?


>  	cpu_emergency_svm_disable();
> -- 
> 2.40.0.rc1.284.g88254d51c5-goog
>
  
Sean Christopherson March 13, 2023, 6:31 p.m. UTC | #2
On Mon, Mar 13, 2023, Huang, Kai wrote:
> Hi Sean,
> 
> Thanks for copying me.
> 
> On Fri, 2023-03-10 at 13:42 -0800, Sean Christopherson wrote:
> > Expose the crash/reboot hooks used by KVM to do VMCLEAR+VMXOFF if and
> > only if there's a potential in-tree user, KVM_INTEL.
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---

...

> > diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
> > index 299b970e5f82..6c0b1634b884 100644
> > --- a/arch/x86/kernel/reboot.c
> > +++ b/arch/x86/kernel/reboot.c
> > @@ -787,6 +787,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
> >  }
> >  #endif
> >  
> > +#if IS_ENABLED(CONFIG_KVM_INTEL)
> >  /*
> >   * This is used to VMCLEAR all VMCSs loaded on the
> >   * processor. And when loading kvm_intel module, the
> > @@ -807,6 +808,7 @@ static inline void cpu_crash_vmclear_loaded_vmcss(void)
> >  		do_vmclear_operation();
> >  	rcu_read_unlock();
> >  }
> > +#endif
> >  
> >  /* This is the CPU performing the emergency shutdown work. */
> >  int crashing_cpu = -1;
> > @@ -818,7 +820,9 @@ int crashing_cpu = -1;
> >   */
> >  void cpu_emergency_disable_virtualization(void)
> >  {
> > +#if IS_ENABLED(CONFIG_KVM_INTEL)
> >  	cpu_crash_vmclear_loaded_vmcss();
> > +#endif
> >  
> >  	cpu_emergency_vmxoff();
> 
> The changelog says it exposes the *hooks* (plural) used to do "VMCLEAR+VMXOFF"
> only when KVM_INTEL is on, but here only "VMCLEAR" is wrapped in
> CONFIG_KVM_INTEL.  So either the changelog needs improvement, or the code
> should be adjusted?

I'll reword the changelog; "hooks" in my head was referring to the register and
unregister "hooks", not the callback itself.

> Personally, I think it's better to move VMXOFF part within CONFIG_KVM_INTEL too,
> if you want to do this.

That happens eventually in the final third of this series.

> But I am not sure whether we want to do this (having CONFIG_KVM_INTEL around the
> relevant code).  In later patches, you mentioned the case of out-of-tree
> hypervisor, for instance, below in the changelog of patch 04:
> 
> 	There's no need to attempt VMXOFF if KVM (or some other out-of-tree
> 	hypervisor) isn't loaded/active...
> 
> This means we want to handle VMCLEAR+VMXOFF in the case of an out-of-tree
> hypervisor too.  So, shouldn't the hooks always exist, rather than only being
> available when KVM_INTEL or KVM_AMD is on, so that an out-of-tree hypervisor can
> register its callbacks?

Ah, I see how I confused things with that statement.  My intent was only to call
out that, technically, a non-NULL callback doesn't mean KVM is loaded.  I didn't
intend to sign the kernel up for going out of its way to support out-of-tree hypervisors.

Does it read better if I add a "that piggybacked the callback" qualifier?

  There's no need to attempt VMXOFF if KVM (or some other out-of-tree hypervisor
  that piggybacked the callback) isn't loaded/active, i.e. if the CPU can't
  possibly be post-VMXON.
  
Kai Huang March 14, 2023, 1:19 a.m. UTC | #3
> 
> > But I am not sure whether we want to do this (having CONFIG_KVM_INTEL around the
> > relevant code).  In later patches, you mentioned the case of out-of-tree
> > hypervisor, for instance, below in the changelog of patch 04:
> > 
> > 	There's no need to attempt VMXOFF if KVM (or some other out-of-tree
> > 	hypervisor) isn't loaded/active...
> > 
> > This means we want to handle VMCLEAR+VMXOFF in the case of an out-of-tree
> > hypervisor too.  So, shouldn't the hooks always exist, rather than only being
> > available when KVM_INTEL or KVM_AMD is on, so that an out-of-tree hypervisor can
> > register its callbacks?
> 
> Ah, I see how I confused things with that statement.  My intent was only to call
> out that, technically, a non-NULL callback doesn't mean KVM is loaded.  I didn't
> intend to sign the kernel up for going out of its way to support out-of-tree hypervisors.

I interpret this as:

The kernel doesn't officially support out-of-tree hypervisors, but it provides a
callback that an out-of-tree hypervisor can use to handle emergency virt
disable.  But that callback is only available when KVM is turned on in the
Kconfig.

?

> 
> Does it read better if I add a "that piggybacked the callback" qualifier?
> 
>   There's no need to attempt VMXOFF if KVM (or some other out-of-tree hypervisor
>   that piggybacked the callback) isn't loaded/active, i.e. if the CPU can't
>   possibly be post-VMXON. 

I think so?

But overall I just think having to consider an out-of-tree hypervisor (we are
talking about a loadable module, right?) only makes things more confusing.  I
guess we can either:

1) Not mention out-of-tree hypervisors at all.  This means the kernel doesn't
officially provide mechanisms to support an out-of-tree hypervisor (a module).
If someone wants to do that, they take the risk.

2) Have the kernel officially provide the callback to handle emergency virt
disable for an out-of-tree hypervisor (module) to use.  But then this callback
should be available even when KVM is off.

?
  

Patch

diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index 2551baec927d..33c8e911e0de 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -25,8 +25,10 @@  void __noreturn machine_real_restart(unsigned int type);
 #define MRR_BIOS	0
 #define MRR_APM		1
 
+#if IS_ENABLED(CONFIG_KVM_INTEL)
 typedef void crash_vmclear_fn(void);
 extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
+#endif
 void cpu_emergency_disable_virtualization(void);
 
 typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 299b970e5f82..6c0b1634b884 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -787,6 +787,7 @@  void machine_crash_shutdown(struct pt_regs *regs)
 }
 #endif
 
+#if IS_ENABLED(CONFIG_KVM_INTEL)
 /*
  * This is used to VMCLEAR all VMCSs loaded on the
  * processor. And when loading kvm_intel module, the
@@ -807,6 +808,7 @@  static inline void cpu_crash_vmclear_loaded_vmcss(void)
 		do_vmclear_operation();
 	rcu_read_unlock();
 }
+#endif
 
 /* This is the CPU performing the emergency shutdown work. */
 int crashing_cpu = -1;
@@ -818,7 +820,9 @@  int crashing_cpu = -1;
  */
 void cpu_emergency_disable_virtualization(void)
 {
+#if IS_ENABLED(CONFIG_KVM_INTEL)
 	cpu_crash_vmclear_loaded_vmcss();
+#endif
 
 	cpu_emergency_vmxoff();
 	cpu_emergency_svm_disable();
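
For readers following the thread, the pattern under discussion can be sketched in plain userspace C.  This is only an illustration under stated assumptions, not the kernel code: HAVE_VMX_USER is a hypothetical stand-in for IS_ENABLED(CONFIG_KVM_INTEL), the register/unregister helpers and the counters are made up for the example, and a plain pointer replaces the RCU-protected one used in reboot.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_KVM_INTEL). */
#ifndef HAVE_VMX_USER
#define HAVE_VMX_USER 1
#endif

/* Counters so the effect of each call can be observed. */
int vmclear_calls;
int vmxoff_calls;

#if HAVE_VMX_USER
typedef void crash_vmclear_fn(void);

/* NULL until a hypervisor module registers its callback. */
crash_vmclear_fn *crash_vmclear_hook;

/* Illustrative "register" hook; the kernel uses rcu_assign_pointer(). */
void register_crash_vmclear(crash_vmclear_fn *fn)
{
	assert(fn != NULL);
	crash_vmclear_hook = fn;
}

/* Illustrative "unregister" hook. */
void unregister_crash_vmclear(void)
{
	crash_vmclear_hook = NULL;
}

/* Invoke the callback only if one is currently registered. */
void crash_vmclear_loaded(void)
{
	crash_vmclear_fn *fn = crash_vmclear_hook;

	if (fn)
		fn();
}
#endif

/* Example callback a hypervisor module might register. */
void demo_vmclear(void)
{
	vmclear_calls++;
}

/* Stand-in for cpu_emergency_vmxoff(); not guarded at this point in the series. */
void emergency_vmxoff(void)
{
	vmxoff_calls++;
}

/* Mirrors the shape of cpu_emergency_disable_virtualization() in this patch. */
void emergency_disable_virtualization(void)
{
#if HAVE_VMX_USER
	crash_vmclear_loaded();
#endif
	emergency_vmxoff();
}
```

With HAVE_VMX_USER set to 0, the hook type, pointer, and helpers do not exist at all, so nothing outside the guarded code can register a callback; that is the trade-off raised in the review.  The sketch also shows why a non-NULL pointer only means that something registered a callback, not which module did.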