From patchwork Sun Oct 30 06:22:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Isaku Yamahata X-Patchwork-Id: 12830 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp1665223wru; Sat, 29 Oct 2022 23:25:28 -0700 (PDT) X-Google-Smtp-Source: AMsMyM5BsuxK/qSogv7HQc3RAElPPNyy5Wxe3Y74YcxqAoDu5LQe3DNLWP4hi6upvzB3keHPw5gV X-Received: by 2002:a50:ee87:0:b0:461:a09b:aae5 with SMTP id f7-20020a50ee87000000b00461a09baae5mr7650127edr.24.1667111128089; Sat, 29 Oct 2022 23:25:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667111128; cv=none; d=google.com; s=arc-20160816; b=Mo4NzZyHSJ18DuGXbz/RwVWjImvvv8tDcDvNldiC1URcwiA9B5aFKFklW5GAesTTzY JC6EC+0At4Y+Rpbi9M3PpHQL/vcWUUJzJK/8kx/xAWxnNUtXPceiP1InvkSFvqM5I6PN s2mlSXlrXmx3uPzb/GSp2lboN9wMRPKbGzZ50K12dPEIbr4Mm7wsHWEd4CM8u/oYTiiI u70gL0uBWwhhhygzPn2eda7SUjEz8g3bN/KOg+y33Tui0Z0NISCykCUC3jWUraFbjt7g mzw800hn46ha55mkh33KBSh9yJ1VCOa7Jr41/KYl0SfuUWQGrS7jIyyLGgHmko6kdccw 8cuA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=ZteNeb4mkj37Hzzn537NNuYlS8RHkAafgGEaQZZv1bE=; b=jT7v1Hu/eqyjFIcbudqI4fb/djIokeUzb47J1qC0D63653HKx9fuKzQDEMEUsbEwld jnHVFcP4a6QoXFBxjxccVZtKkJjUe7johq+nLxEg2oHD7zX3KKY2hK213OEuB0WYXjTm 9mmsoDx5MYh5IBreOduyBhE4sAMFbzyAE2Kwt2jBMahCwGD8nGEJ4ooqOaW4zUVSVJ2K qNP4salGBy0BQ87+MLi8ag3AWKQPt6aehxMydEIPTzIds9kjfN2hfklO1vMH1oU2fd2m PIxDxCA8Peirp/AnkRVJG87py2T6Bms96s+/V6MRp45JCC3+VW6J3nON20NqGBkOGwZn 8ucw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=PcRz3Wlv; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id a27-20020a50c31b000000b00457e9f88b90si3715362edb.246.2022.10.29.23.25.00; Sat, 29 Oct 2022 23:25:28 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=PcRz3Wlv; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229552AbiJ3GYU (ORCPT + 99 others); Sun, 30 Oct 2022 02:24:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46846 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229647AbiJ3GYC (ORCPT ); Sun, 30 Oct 2022 02:24:02 -0400 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 02043A8; Sat, 29 Oct 2022 23:24:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1667111040; x=1698647040; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Wz0IOBwmt08sjnia6i0n3OJTM/lMAvZ79Lb70SgXMts=; b=PcRz3WlvN2kzIZzgkGZNjbmHgKfH4IfslmkHfAIsJw4rFyq0IEQnj+N9 PPYaxj0J4uERFOvNGUxubutU5FYbuMg5CSAEu9HJnB2/UkDUS1yBxH7+G Kwws201DeQMdKlMkNP5MG3uKVD14UYnI0XaPTK5NoIjuNjakuWUBhLYgU wSaRKAvw7QdIRAO0rNHHla8aIXh+bBLCBmrIVS5KuE6sBvxu1IlZoXMtg UkdTH2CV3k1qcGU3zPqe3aXsO2DSZLwj8hlVamqh9UhI5m0jAXStEOfjY RyCakXK7MXiKZid85dQJqrW7N6iYuPPdh3X+jDR5bp1Q5Y7V1j4h0WSrV Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10515"; a="395037116" X-IronPort-AV: E=Sophos;i="5.95,225,1661842800"; d="scan'208";a="395037116" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Oct 2022 23:23:57 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10515"; a="878392847" X-IronPort-AV: E=Sophos;i="5.95,225,1661842800"; d="scan'208";a="878392847" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Oct 2022 23:23:57 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Sean Christopherson , Xiaoyao Li Subject: [PATCH v10 006/108] KVM: x86: Introduce vm_type to differentiate default VMs from confidential VMs Date: Sat, 29 Oct 2022 23:22:07 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 X-Spam-Status: No, score=-4.9 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748092718308454117?= X-GMAIL-MSGID: =?utf-8?q?1748092718308454117?= From: Sean Christopherson Unlike default VMs, confidential VMs (Intel TDX and AMD SEV-ES) don't allow some operations (e.g., memory read/write, register state access, etc). Introduce vm_type to track the type of the VM to x86 KVM. Other arch KVMs already use vm_type, KVM_INIT_VM accepts vm_type, and x86 KVM callback vm_init accepts vm_type. So follow them. Further, a different policy can be made based on vm_type. Define KVM_X86_DEFAULT_VM for default VM as default and define KVM_X86_TDX_VM for Intel TDX VM. The wrapper function will be defined as "bool is_td(kvm) { return vm_type == VM_TYPE_TDX; }" Add a capability KVM_CAP_VM_TYPES to effectively allow device model, e.g. qemu, to query what VM types are supported by KVM. This (introduce a new capability and add vm_type) is chosen to align with other arch KVMs that have VM types already. Other arch KVMs uses different name to query supported vm types and there is no common name for it, so new name was chosen. Co-developed-by: Xiaoyao Li Signed-off-by: Xiaoyao Li Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- Documentation/virt/kvm/api.rst | 21 +++++++++++++++++++++ arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/include/uapi/asm/kvm.h | 3 +++ arch/x86/kvm/svm/svm.c | 6 ++++++ arch/x86/kvm/vmx/main.c | 1 + arch/x86/kvm/vmx/tdx.h | 6 +----- arch/x86/kvm/vmx/vmx.c | 5 +++++ arch/x86/kvm/vmx/x86_ops.h | 1 + arch/x86/kvm/x86.c | 9 ++++++++- include/uapi/linux/kvm.h | 1 + tools/arch/x86/include/uapi/asm/kvm.h | 3 +++ tools/include/uapi/linux/kvm.h | 1 + 13 files changed, 54 insertions(+), 6 deletions(-) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 08253cf498d1..b6f08e8a8320 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -147,10 +147,31 @@ described as 'basic' will be available. The new VM has no virtual cpus and no memory. You probably want to use 0 as machine type. +X86: +^^^^ + +Supported vm type can be queried from KVM_CAP_VM_TYPES, which returns the +bitmap of supported vm types. The 1-setting of bit @n means vm type with +value @n is supported. + +S390: +^^^^^ + In order to create user controlled virtual machines on S390, check KVM_CAP_S390_UCONTROL and use the flag KVM_VM_S390_UCONTROL as privileged user (CAP_SYS_ADMIN). +MIPS: +^^^^^ + +To use hardware assisted virtualization on MIPS (VZ ASE) rather than +the default trap & emulate implementation (which changes the virtual +memory layout to fit in user mode), check KVM_CAP_MIPS_VZ and use the +flag KVM_VM_MIPS_VZ. + +ARM64: +^^^^^^ + On arm64, the physical address size for a VM (IPA Size limit) is limited to 40bits by default. The limit can be configured if the host supports the extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h index 15c1515a0760..8a5c5ae70bc5 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -19,6 +19,7 @@ KVM_X86_OP(hardware_disable) KVM_X86_OP(hardware_unsetup) KVM_X86_OP(has_emulated_msr) KVM_X86_OP(vcpu_after_set_cpuid) +KVM_X86_OP(is_vm_type_supported) KVM_X86_OP(vm_init) KVM_X86_OP_OPTIONAL(vm_destroy) KVM_X86_OP_OPTIONAL_RET0(vcpu_precreate) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 66b1ff1cec61..2a41a93a80f3 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1151,6 +1151,7 @@ enum kvm_apicv_inhibit { }; struct kvm_arch { + unsigned long vm_type; unsigned long n_used_mmu_pages; unsigned long n_requested_mmu_pages; unsigned long n_max_mmu_pages; @@ -1468,6 +1469,7 @@ struct kvm_x86_ops { bool (*has_emulated_msr)(struct kvm *kvm, u32 index); void (*vcpu_after_set_cpuid)(struct kvm_vcpu *vcpu); + bool (*is_vm_type_supported)(unsigned long vm_type); unsigned int vm_size; int (*vm_init)(struct kvm *kvm); void (*vm_destroy)(struct kvm *kvm); diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index 46de10a809ec..54b08789c402 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -532,4 +532,7 @@ struct kvm_pmu_event_filter { #define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TSC) */ #define KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */ +#define KVM_X86_DEFAULT_VM 0 +#define KVM_X86_TDX_VM 1 + #endif /* _ASM_X86_KVM_H */ diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index aaf5f0623011..2bcf2e1a5271 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -4713,6 +4713,11 @@ static void svm_vm_destroy(struct kvm *kvm) sev_vm_destroy(kvm); } +static bool svm_is_vm_type_supported(unsigned long type) +{ + return type == KVM_X86_DEFAULT_VM; +} + static int svm_vm_init(struct kvm *kvm) { if (!pause_filter_count || !pause_filter_thresh) @@ -4740,6 +4745,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = { .vcpu_free = svm_vcpu_free, .vcpu_reset = svm_vcpu_reset, + .is_vm_type_supported = svm_is_vm_type_supported, .vm_size = sizeof(struct kvm_svm), .vm_init = svm_vm_init, .vm_destroy = svm_vm_destroy, diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index a5cbf3ca2055..22bf49afc761 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -33,6 +33,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .hardware_disable = vmx_hardware_disable, .has_emulated_msr = vmx_has_emulated_msr, + .is_vm_type_supported = vmx_is_vm_type_supported, .vm_size = sizeof(struct kvm_vmx), .vm_init = vmx_vm_init, .vm_destroy = vmx_vm_destroy, diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 060bf48ec3d6..473013265bd8 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -15,11 +15,7 @@ struct vcpu_tdx { static inline bool is_td(struct kvm *kvm) { - /* - * TDX VM type isn't defined yet. - * return kvm->arch.vm_type == KVM_X86_TDX_VM; - */ - return false; + return kvm->arch.vm_type == KVM_X86_TDX_VM; } static inline bool is_td_vcpu(struct kvm_vcpu *vcpu) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 68aef67c5eb7..dc05b78e0a1e 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7410,6 +7410,11 @@ int vmx_vcpu_create(struct kvm_vcpu *vcpu) return err; } +bool vmx_is_vm_type_supported(unsigned long type) +{ + return type == KVM_X86_DEFAULT_VM; +} + #define L1TF_MSG_SMT "L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n" #define L1TF_MSG_L1D "L1TF CPU bug present and virtualization mitigation disabled, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n" diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 6196d651a00a..d4877f4f93de 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -27,6 +27,7 @@ void vmx_hardware_unsetup(void); int vmx_check_processor_compatibility(void); int vmx_hardware_enable(void); void vmx_hardware_disable(void); +bool vmx_is_vm_type_supported(unsigned long type); int vmx_vm_init(struct kvm *kvm); void vmx_vm_destroy(struct kvm *kvm); int vmx_vcpu_precreate(struct kvm *kvm); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 715a53d4fc3d..91053fdc4512 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4520,6 +4520,11 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_X86_NOTIFY_VMEXIT: r = kvm_caps.has_notify_vmexit; break; + case KVM_CAP_VM_TYPES: + r = BIT(KVM_X86_DEFAULT_VM); + if (static_call(kvm_x86_is_vm_type_supported)(KVM_X86_TDX_VM)) + r |= BIT(KVM_X86_TDX_VM); + break; default: break; } @@ -12507,9 +12512,11 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) int ret; unsigned long flags; - if (type) + if (!static_call(kvm_x86_is_vm_type_supported)(type)) return -EINVAL; + kvm->arch.vm_type = type; + ret = kvm_page_track_init(kvm); if (ret) goto out; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index fa60b032a405..49386e4de8b8 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1216,6 +1216,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_S390_CPU_TOPOLOGY 222 #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223 #define KVM_CAP_PRIVATE_MEM 224 +#define KVM_CAP_VM_TYPES 225 #ifdef KVM_CAP_IRQ_ROUTING diff --git a/tools/arch/x86/include/uapi/asm/kvm.h b/tools/arch/x86/include/uapi/asm/kvm.h index 46de10a809ec..54b08789c402 100644 --- a/tools/arch/x86/include/uapi/asm/kvm.h +++ b/tools/arch/x86/include/uapi/asm/kvm.h @@ -532,4 +532,7 @@ struct kvm_pmu_event_filter { #define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TSC) */ #define KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */ +#define KVM_X86_DEFAULT_VM 0 +#define KVM_X86_TDX_VM 1 + #endif /* _ASM_X86_KVM_H */ diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h index 0d5d4419139a..812b771d8702 100644 --- a/tools/include/uapi/linux/kvm.h +++ b/tools/include/uapi/linux/kvm.h @@ -1178,6 +1178,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_S390_ZPCI_OP 221 #define KVM_CAP_S390_CPU_TOPOLOGY 222 #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223 +#define KVM_CAP_VM_TYPES 225 #ifdef KVM_CAP_IRQ_ROUTING