[1/3] kvm: wire up KVM_CAP_VM_GPA_BITS for x86

Message ID	20240301101410.356007-2-kraxel@redhat.com
State	New
Headers	Received-SPF: pass (google.com: domain of linux-kernel+bounces-88243-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) client-ip=147.75.80.249;
	From: Gerd Hoffmann <kraxel@redhat.com>
	To: kvm@vger.kernel.org
	Cc: Gerd Hoffmann <kraxel@redhat.com>, Sean Christopherson <seanjc@google.com>, Paolo Bonzini <pbonzini@redhat.com>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>, x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)), "H. Peter Anvin" <hpa@zytor.com>, linux-kernel@vger.kernel.org (open list:X86 ARCHITECTURE (32-BIT AND 64-BIT))
	Subject: [PATCH 1/3] kvm: wire up KVM_CAP_VM_GPA_BITS for x86
	Date: Fri, 1 Mar 2024 11:14:07 +0100
	Message-ID: <20240301101410.356007-2-kraxel@redhat.com>
	In-Reply-To: <20240301101410.356007-1-kraxel@redhat.com>
	References: <20240301101410.356007-1-kraxel@redhat.com>
	Precedence: bulk
	MIME-Version: 1.0
	Content-Transfer-Encoding: 8bit
Series	[1/3] kvm: wire up KVM_CAP_VM_GPA_BITS for x86
	[2/3] kvm/vmx: limit guest_phys_bits to 48 without 5-level ept
	[3/3] kvm/svm: limit guest_phys_bits to 48 in 4-level paging mode

Commit Message

Gerd Hoffmann March 1, 2024, 10:14 a.m. UTC

Add new guest_phys_bits field to kvm_caps, return the value to
userspace when asked for KVM_CAP_VM_GPA_BITS capability.

Initialize guest_phys_bits with boot_cpu_data.x86_phys_bits.
Vendor modules (i.e. vmx and svm) can adjust this field in case
additional restrictions apply, for example in case EPT has no
support for 5-level paging.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 arch/x86/kvm/x86.h | 2 ++
 arch/x86/kvm/x86.c | 5 +++++
 2 files changed, 7 insertions(+)

Comments

Tao Su March 1, 2024, 4:13 p.m. UTC | #1

On Fri, Mar 01, 2024 at 11:14:07AM +0100, Gerd Hoffmann wrote:
> Add new guest_phys_bits field to kvm_caps, return the value to
> userspace when asked for KVM_CAP_VM_GPA_BITS capability.
> 
> Initialize guest_phys_bits with boot_cpu_data.x86_phys_bits.
> Vendor modules (i.e. vmx and svm) can adjust this field in case
> additional restrictions apply, for example in case EPT has no
> support for 5-level paging.
> 
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>  arch/x86/kvm/x86.h | 2 ++
>  arch/x86/kvm/x86.c | 5 +++++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 2f7e19166658..e03aec3527f8 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -24,6 +24,8 @@ struct kvm_caps {
>  	bool has_bus_lock_exit;
>  	/* notify VM exit supported? */
>  	bool has_notify_vmexit;
> +	/* usable guest phys bits */
> +	u32  guest_phys_bits;
>  
>  	u64 supported_mce_cap;
>  	u64 supported_xcr0;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 48a61d283406..e270b9b708d1 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4784,6 +4784,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  		if (kvm_is_vm_type_supported(KVM_X86_SW_PROTECTED_VM))
>  			r |= BIT(KVM_X86_SW_PROTECTED_VM);
>  		break;
> +	case KVM_CAP_VM_GPA_BITS:
> +		r = kvm_caps.guest_phys_bits;
> +		break;
>  	default:
>  		break;
>  	}
> @@ -9706,6 +9709,8 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
>  	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
>  		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, host_arch_capabilities);
>  
> +	kvm_caps.guest_phys_bits = boot_cpu_data.x86_phys_bits;

When KeyID_bits is non-zero, MAXPHYADDR != boot_cpu_data.x86_phys_bits
here, you can check in detect_tme().

Thanks,
Tao

> +
>  	r = ops->hardware_setup();
>  	if (r != 0)
>  		goto out_mmu_exit;
> -- 
> 2.44.0
> 
>

Gerd Hoffmann March 4, 2024, 8:43 a.m. UTC | #2

> > +	kvm_caps.guest_phys_bits = boot_cpu_data.x86_phys_bits;
> 
> When KeyID_bits is non-zero, MAXPHYADDR != boot_cpu_data.x86_phys_bits
> here, you can check in detect_tme().

from detect_tme():

        /*
         * KeyID bits effectively lower the number of physical address
         * bits.  Update cpuinfo_x86::x86_phys_bits accordingly.
         */
        c->x86_phys_bits -= keyid_bits;

This looks like x86_phys_bits gets adjusted if needed.

take care,
  Gerd
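
The adjustment quoted from detect_tme() above can be illustrated with a small standalone sketch. This is not kernel code: usable_phys_bits() is a hypothetical helper, and the underflow guard is an assumption of the sketch rather than something taken from the thread.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: models the adjustment quoted from
 * detect_tme(), where KeyID bits are taken out of the physical address
 * width and therefore reduce the number of usable address bits.
 */
static inline uint32_t usable_phys_bits(uint32_t reported_phys_bits,
					uint32_t keyid_bits)
{
	/* Underflow guard; an assumption of this sketch. */
	if (keyid_bits > reported_phys_bits)
		return 0;

	return reported_phys_bits - keyid_bits;
}
```

For example, a reported width of 52 bits with 6 KeyID bits gives usable_phys_bits(52, 6) == 46, which is the already-reduced value that the patch would copy into kvm_caps.guest_phys_bits.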

Tao Su March 4, 2024, 8:59 a.m. UTC | #3

On Mon, Mar 04, 2024 at 09:43:53AM +0100, Gerd Hoffmann wrote:
> > > +	kvm_caps.guest_phys_bits = boot_cpu_data.x86_phys_bits;
> > 
> > When KeyID_bits is non-zero, MAXPHYADDR != boot_cpu_data.x86_phys_bits
> > here, you can check in detect_tme().
> 
> from detect_tme():
> 
>         /*
>          * KeyID bits effectively lower the number of physical address
>          * bits.  Update cpuinfo_x86::x86_phys_bits accordingly.
>          */
>         c->x86_phys_bits -= keyid_bits;
> 
> This looks like x86_phys_bits gets adjusted if needed.

If TDP is enabled and supports 5-level, we want kvm_caps.guest_phys_bits=52,
but c->x86_phys_bits!=52 here. Maybe we need to set kvm_caps.guest_phys_bits
according to whether TDP is enabled or not, like leaf 0x80000008 in
__do_cpuid_func().

Thanks,
Tao

Gerd Hoffmann March 4, 2024, 11:47 a.m. UTC | #4

On Mon, Mar 04, 2024 at 04:59:32PM +0800, Tao Su wrote:
> On Mon, Mar 04, 2024 at 09:43:53AM +0100, Gerd Hoffmann wrote:
> > > > +	kvm_caps.guest_phys_bits = boot_cpu_data.x86_phys_bits;
> > > 
> > > When KeyID_bits is non-zero, MAXPHYADDR != boot_cpu_data.x86_phys_bits
> > > here, you can check in detect_tme().
> > 
> > from detect_tme():
> > 
> >         /*
> >          * KeyID bits effectively lower the number of physical address
> >          * bits.  Update cpuinfo_x86::x86_phys_bits accordingly.
> >          */
> >         c->x86_phys_bits -= keyid_bits;
> > 
> > This looks like x86_phys_bits gets adjusted if needed.
> 
> If TDP is enabled and supports 5-level, we want kvm_caps.guest_phys_bits=52,
> but c->x86_phys_bits!=52 here.

Are you talking about EPT, NPT, or both?

> Maybe we need to set kvm_caps.guest_phys_bits
> according to whether TDP is enabled or not, like leaf 0x80000008 in
> __do_cpuid_func().

See patches 2+3 of this series.

Maybe it is better to just not set kvm_caps.guest_phys_bits in generic
kvm code and leave that completely to vmx / svm vendor modules.  Or let
the generic code handle the !tdp_enabled case and have the vendor
modules override (considering EPT / NPT limitations) in case tdp is
enabled.

take care,
  Gerd
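
The second alternative above (generic code covers the !tdp_enabled case, vendor modules override when TDP is enabled) might look roughly like the following standalone sketch. All names here (caps_sketch, generic_init, vendor_setup, tdp_limit_bits) are hypothetical, and using 0 as a "not set yet" placeholder is an assumption of the sketch, not something proposed in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the guest_phys_bits field in kvm_caps. */
struct caps_sketch {
	uint32_t guest_phys_bits;
};

/*
 * Generic code: only provides a value when TDP is disabled.
 * 0 means "left for the vendor module to fill in" (sketch assumption).
 */
static void generic_init(struct caps_sketch *caps, uint32_t host_phys_bits,
			 bool tdp_enabled)
{
	caps->guest_phys_bits = tdp_enabled ? 0 : host_phys_bits;
}

/*
 * Vendor code (vmx / svm): overrides the value when TDP is enabled,
 * clamping to whatever the EPT / NPT configuration can address.
 */
static void vendor_setup(struct caps_sketch *caps, uint32_t host_phys_bits,
			 bool tdp_enabled, uint32_t tdp_limit_bits)
{
	if (!tdp_enabled)
		return;

	caps->guest_phys_bits = host_phys_bits < tdp_limit_bits ?
				host_phys_bits : tdp_limit_bits;
}
```

Calling generic_init() followed by vendor_setup() mirrors the order in the patch, where the assignment sits just before ops->hardware_setup().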

Sean Christopherson March 4, 2024, 3:15 p.m. UTC | #5

On Fri, Mar 01, 2024, Gerd Hoffmann wrote:
> Add new guest_phys_bits field to kvm_caps, return the value to
> userspace when asked for KVM_CAP_VM_GPA_BITS capability.
> 
> Initialize guest_phys_bits with boot_cpu_data.x86_phys_bits.
> Vendor modules (i.e. vmx and svm) can adjust this field in case
> additional restrictions apply, for example in case EPT has no
> support for 5-level paging.
> 
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>  arch/x86/kvm/x86.h | 2 ++
>  arch/x86/kvm/x86.c | 5 +++++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 2f7e19166658..e03aec3527f8 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -24,6 +24,8 @@ struct kvm_caps {
>  	bool has_bus_lock_exit;
>  	/* notify VM exit supported? */
>  	bool has_notify_vmexit;
> +	/* usable guest phys bits */
> +	u32  guest_phys_bits;
>  
>  	u64 supported_mce_cap;
>  	u64 supported_xcr0;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 48a61d283406..e270b9b708d1 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4784,6 +4784,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  		if (kvm_is_vm_type_supported(KVM_X86_SW_PROTECTED_VM))
>  			r |= BIT(KVM_X86_SW_PROTECTED_VM);
>  		break;
> +	case KVM_CAP_VM_GPA_BITS:
> +		r = kvm_caps.guest_phys_bits;

This is not a fast path; just compute the effective guest.MAXPHYADDR on the fly
using tdp_root_level and max_tdp_level.  But as pointed out and discussed in the
previous thread, advertising a guest.MAXPHYADDR that is smaller than host.MAXPHYADDR
simply doesn't work[*].

I thought the plan was to add a way for KVM to advertise the maximum *addressable*
GPA, and figure out a way to communicate that to the guest, e.g. so that firmware
doesn't try to use legal GPAs that the host cannot address.

Paolo, any update on this?

[*] https://lore.kernel.org/all/CALMp9eTutnTxCjQjs-nxP=XC345vTmJJODr+PcSOeaQpBW0Skw@mail.gmail.com
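
A standalone sketch of the on-the-fly computation suggested above, for illustration only. It assumes the usual x86-64 paging geometry (a 12-bit page offset plus 9 bits per level, so 4 levels cover 48 bits and 5 levels cover 57 bits) and assumes that a non-zero forced root level takes precedence over the maximum level. The function and parameter names are hypothetical and are not KVM's.

```c
#include <stdbool.h>
#include <stdint.h>

/* 12-bit page offset plus 9 index bits per paging level. */
static uint32_t tdp_level_to_gpa_bits(uint32_t level)
{
	return 12 + 9 * level;
}

/*
 * Hypothetical helper: widest guest physical address the host can map.
 * forced_root_level == 0 means "no forced level" (sketch assumption).
 */
static uint32_t effective_guest_phys_bits(uint32_t host_phys_bits,
					  bool tdp_enabled,
					  uint32_t forced_root_level,
					  uint32_t max_tdp_level)
{
	uint32_t level, mappable;

	/* Without TDP, the paging-level limit does not apply in this sketch. */
	if (!tdp_enabled)
		return host_phys_bits;

	level = forced_root_level ? forced_root_level : max_tdp_level;
	mappable = tdp_level_to_gpa_bits(level);

	return host_phys_bits < mappable ? host_phys_bits : mappable;
}
```

With a 52-bit host and only 4-level TDP, this yields 48, which is the case patches 2/3 and 3/3 of the series address; with 5-level TDP the host width of 52 is returned unchanged.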

Patch

diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 2f7e19166658..e03aec3527f8 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -24,6 +24,8 @@  struct kvm_caps {
 	bool has_bus_lock_exit;
 	/* notify VM exit supported? */
 	bool has_notify_vmexit;
+	/* usable guest phys bits */
+	u32  guest_phys_bits;
 
 	u64 supported_mce_cap;
 	u64 supported_xcr0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 48a61d283406..e270b9b708d1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4784,6 +4784,9 @@  int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		if (kvm_is_vm_type_supported(KVM_X86_SW_PROTECTED_VM))
 			r |= BIT(KVM_X86_SW_PROTECTED_VM);
 		break;
+	case KVM_CAP_VM_GPA_BITS:
+		r = kvm_caps.guest_phys_bits;
+		break;
 	default:
 		break;
 	}
@@ -9706,6 +9709,8 @@  static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
 		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, host_arch_capabilities);
 
+	kvm_caps.guest_phys_bits = boot_cpu_data.x86_phys_bits;
+
 	r = ops->hardware_setup();
 	if (r != 0)
 		goto out_mmu_exit;