[1/3] kvm: wire up KVM_CAP_VM_GPA_BITS for x86

Message ID 20240301101410.356007-2-kraxel@redhat.com
State New
Headers
Series [1/3] kvm: wire up KVM_CAP_VM_GPA_BITS for x86 |

Commit Message

Gerd Hoffmann March 1, 2024, 10:14 a.m. UTC
  Add new guest_phys_bits field to kvm_caps, return the value to
userspace when asked for KVM_CAP_VM_GPA_BITS capability.

Initialize guest_phys_bits with boot_cpu_data.x86_phys_bits.
Vendor modules (i.e. vmx and svm) can adjust this field in case
additional restrictions apply, for example in case EPT has no
support for 5-level paging.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 arch/x86/kvm/x86.h | 2 ++
 arch/x86/kvm/x86.c | 5 +++++
 2 files changed, 7 insertions(+)
  

Comments

Tao Su March 1, 2024, 4:13 p.m. UTC | #1
On Fri, Mar 01, 2024 at 11:14:07AM +0100, Gerd Hoffmann wrote:
> Add new guest_phys_bits field to kvm_caps, return the value to
> userspace when asked for KVM_CAP_VM_GPA_BITS capability.
> 
> Initialize guest_phys_bits with boot_cpu_data.x86_phys_bits.
> Vendor modules (i.e. vmx and svm) can adjust this field in case
> additional restrictions apply, for example in case EPT has no
> support for 5-level paging.
> 
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>  arch/x86/kvm/x86.h | 2 ++
>  arch/x86/kvm/x86.c | 5 +++++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 2f7e19166658..e03aec3527f8 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -24,6 +24,8 @@ struct kvm_caps {
>  	bool has_bus_lock_exit;
>  	/* notify VM exit supported? */
>  	bool has_notify_vmexit;
> +	/* usable guest phys bits */
> +	u32  guest_phys_bits;
>  
>  	u64 supported_mce_cap;
>  	u64 supported_xcr0;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 48a61d283406..e270b9b708d1 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4784,6 +4784,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  		if (kvm_is_vm_type_supported(KVM_X86_SW_PROTECTED_VM))
>  			r |= BIT(KVM_X86_SW_PROTECTED_VM);
>  		break;
> +	case KVM_CAP_VM_GPA_BITS:
> +		r = kvm_caps.guest_phys_bits;
> +		break;
>  	default:
>  		break;
>  	}
> @@ -9706,6 +9709,8 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
>  	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
>  		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, host_arch_capabilities);
>  
> +	kvm_caps.guest_phys_bits = boot_cpu_data.x86_phys_bits;

When KeyID_bits is non-zero, MAXPHYADDR != boot_cpu_data.x86_phys_bits
here, you can check in detect_tme().

Thanks,
Tao

> +
>  	r = ops->hardware_setup();
>  	if (r != 0)
>  		goto out_mmu_exit;
> -- 
> 2.44.0
> 
>
  
Gerd Hoffmann March 4, 2024, 8:43 a.m. UTC | #2
> > +	kvm_caps.guest_phys_bits = boot_cpu_data.x86_phys_bits;
> 
> When KeyID_bits is non-zero, MAXPHYADDR != boot_cpu_data.x86_phys_bits
> here, you can check in detect_tme().

from detect_tme():

        /*
         * KeyID bits effectively lower the number of physical address
         * bits.  Update cpuinfo_x86::x86_phys_bits accordingly.
         */
        c->x86_phys_bits -= keyid_bits;

This looks like x86_phys_bits gets adjusted if needed.

take care,
  Gerd
  
Tao Su March 4, 2024, 8:59 a.m. UTC | #3
On Mon, Mar 04, 2024 at 09:43:53AM +0100, Gerd Hoffmann wrote:
> > > +	kvm_caps.guest_phys_bits = boot_cpu_data.x86_phys_bits;
> > 
> > When KeyID_bits is non-zero, MAXPHYADDR != boot_cpu_data.x86_phys_bits
> > here, you can check in detect_tme().
> 
> from detect_tme():
> 
>         /*
>          * KeyID bits effectively lower the number of physical address
>          * bits.  Update cpuinfo_x86::x86_phys_bits accordingly.
>          */
>         c->x86_phys_bits -= keyid_bits;
> 
> This looks like x86_phys_bits gets adjusted if needed.

If TDP is enabled and supports 5-level, we want kvm_caps.guest_phys_bits=52,
but c->x86_phys_bits!=52 here. Maybe we need to set kvm_caps.guest_phys_bits
according to whether TDP is enabled or not, like leaf 0x80000008 in
__do_cpuid_func().

Thanks,
Tao
  
Gerd Hoffmann March 4, 2024, 11:47 a.m. UTC | #4
On Mon, Mar 04, 2024 at 04:59:32PM +0800, Tao Su wrote:
> On Mon, Mar 04, 2024 at 09:43:53AM +0100, Gerd Hoffmann wrote:
> > > > +	kvm_caps.guest_phys_bits = boot_cpu_data.x86_phys_bits;
> > > 
> > > When KeyID_bits is non-zero, MAXPHYADDR != boot_cpu_data.x86_phys_bits
> > > here, you can check in detect_tme().
> > 
> > from detect_tme():
> > 
> >         /*
> >          * KeyID bits effectively lower the number of physical address
> >          * bits.  Update cpuinfo_x86::x86_phys_bits accordingly.
> >          */
> >         c->x86_phys_bits -= keyid_bits;
> > 
> > This looks like x86_phys_bits gets adjusted if needed.
> 
> If TDP is enabled and supports 5-level, we want kvm_caps.guest_phys_bits=52,
> but c->x86_phys_bits!=52 here.

Do you talk about EPT or NPT or both?

> Maybe we need to set kvm_caps.guest_phys_bits
> according to whether TDP is enabled or not, like leaf 0x80000008 in
> __do_cpuid_func().

See patches 2+3 of this series.

Maybe it is better to just not set kvm_caps.guest_phys_bits in generic
kvm code and leave that completely to vmx / svm vendor modules.  Or let
the generic code handle the !tdp_enabled case and have the vendor
modules override (considering EPT / NPT limitations) in case tdp is
enabled.

take care,
  Gerd
  
Sean Christopherson March 4, 2024, 3:15 p.m. UTC | #5
On Fri, Mar 01, 2024, Gerd Hoffmann wrote:
> Add new guest_phys_bits field to kvm_caps, return the value to
> userspace when asked for KVM_CAP_VM_GPA_BITS capability.
> 
> Initialize guest_phys_bits with boot_cpu_data.x86_phys_bits.
> Vendor modules (i.e. vmx and svm) can adjust this field in case
> additional restrictions apply, for example in case EPT has no
> support for 5-level paging.
> 
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>  arch/x86/kvm/x86.h | 2 ++
>  arch/x86/kvm/x86.c | 5 +++++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 2f7e19166658..e03aec3527f8 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -24,6 +24,8 @@ struct kvm_caps {
>  	bool has_bus_lock_exit;
>  	/* notify VM exit supported? */
>  	bool has_notify_vmexit;
> +	/* usable guest phys bits */
> +	u32  guest_phys_bits;
>  
>  	u64 supported_mce_cap;
>  	u64 supported_xcr0;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 48a61d283406..e270b9b708d1 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4784,6 +4784,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  		if (kvm_is_vm_type_supported(KVM_X86_SW_PROTECTED_VM))
>  			r |= BIT(KVM_X86_SW_PROTECTED_VM);
>  		break;
> +	case KVM_CAP_VM_GPA_BITS:
> +		r = kvm_caps.guest_phys_bits;

This is not a fast path, just compute the effective guest.MAXPHYADDR on the fly
using tdp_root_level and max_tdp_level.  But as pointed out and discussed in the
previous thread, adverising a guest.MAXPHYADDR that is smaller than host.MAXPHYADDR
simply doesn't work[*].

I thought the plan was to add a way for KVM to advertise the maximum *addressable*
GPA, and figure out a way to communicate that to the guest, e.g. so that firmware
doesn't try to use legal GPAs that the host cannot address.

Paolo, any update on this?

[*] https://lore.kernel.org/all/CALMp9eTutnTxCjQjs-nxP=XC345vTmJJODr+PcSOeaQpBW0Skw@mail.gmail.com
  

Patch

diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 2f7e19166658..e03aec3527f8 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -24,6 +24,8 @@  struct kvm_caps {
 	bool has_bus_lock_exit;
 	/* notify VM exit supported? */
 	bool has_notify_vmexit;
+	/* usable guest phys bits */
+	u32  guest_phys_bits;
 
 	u64 supported_mce_cap;
 	u64 supported_xcr0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 48a61d283406..e270b9b708d1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4784,6 +4784,9 @@  int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		if (kvm_is_vm_type_supported(KVM_X86_SW_PROTECTED_VM))
 			r |= BIT(KVM_X86_SW_PROTECTED_VM);
 		break;
+	case KVM_CAP_VM_GPA_BITS:
+		r = kvm_caps.guest_phys_bits;
+		break;
 	default:
 		break;
 	}
@@ -9706,6 +9709,8 @@  static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
 		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, host_arch_capabilities);
 
+	kvm_caps.guest_phys_bits = boot_cpu_data.x86_phys_bits;
+
 	r = ops->hardware_setup();
 	if (r != 0)
 		goto out_mmu_exit;