[v10,024/108] KVM: TDX: allocate/free TDX vcpu structure
Commit Message
From: Isaku Yamahata <isaku.yamahata@intel.com>
The next step of TDX guest creation is to create vcpus. Allocate the TDX
vcpu structures and initialize them. Allocate the pages that the TDX module
needs for the TDX vcpu.
In the conventional case, the CPUID table is empty at vcpu initialization
and is configured afterwards. Because TDX supports only x2APIC mode, the
CPUID table is forcibly initialized to advertise x2APIC at vcpu
initialization.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
arch/x86/kvm/vmx/main.c | 40 +++++++++--
arch/x86/kvm/vmx/tdx.c | 138 +++++++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/x86_ops.h | 8 +++
3 files changed, 182 insertions(+), 4 deletions(-)
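For reference, the callbacks wired up below sit behind the KVM_CREATE_VCPU
ioctl: KVM's common code invokes .vcpu_precreate and .vcpu_create, and the
x86 code then resets the new vcpu via .vcpu_reset. A minimal userspace
sketch that exercises this path (error handling omitted; a TD would
additionally require the TDX VM type and KVM_MEMORY_ENCRYPT_OP
configuration before vcpu creation):

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
	int kvm = open("/dev/kvm", O_RDWR);
	int vm = ioctl(kvm, KVM_CREATE_VM, 0);	/* default VM type */
	/* Triggers .vcpu_precreate, .vcpu_create and .vcpu_reset. */
	int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);

	printf("vcpu fd: %d\n", vcpu);
	return 0;
}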
Comments
On Sat, Oct 29, 2022 at 11:22:25PM -0700, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> The next step of TDX guest creation is to create vcpus. Allocate the TDX
> vcpu structures and initialize them. Allocate the pages that the TDX
> module needs for the TDX vcpu.
>
> In the conventional case, the CPUID table is empty at vcpu initialization
> and is configured afterwards. Because TDX supports only x2APIC mode, the
> CPUID table is forcibly initialized to advertise x2APIC at vcpu
> initialization.
>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
> arch/x86/kvm/vmx/main.c | 40 +++++++++--
> arch/x86/kvm/vmx/tdx.c | 138 +++++++++++++++++++++++++++++++++++++
> arch/x86/kvm/vmx/x86_ops.h | 8 +++
> 3 files changed, 182 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
> index b4e4c6c677f6..c125b2e3e8b4 100644
> --- a/arch/x86/kvm/vmx/main.c
> +++ b/arch/x86/kvm/vmx/main.c
> @@ -63,6 +63,38 @@ static void vt_vm_free(struct kvm *kvm)
> return tdx_vm_free(kvm);
> }
>
> +static int vt_vcpu_precreate(struct kvm *kvm)
> +{
> + if (is_td(kvm))
> + return 0;
> +
> + return vmx_vcpu_precreate(kvm);
> +}
> +
> +static int vt_vcpu_create(struct kvm_vcpu *vcpu)
> +{
> + if (is_td_vcpu(vcpu))
> + return tdx_vcpu_create(vcpu);
> +
> + return vmx_vcpu_create(vcpu);
> +}
> +
> +static void vt_vcpu_free(struct kvm_vcpu *vcpu)
> +{
> + if (is_td_vcpu(vcpu))
> + return tdx_vcpu_free(vcpu);
> +
> + return vmx_vcpu_free(vcpu);
> +}
> +
> +static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> +{
> + if (is_td_vcpu(vcpu))
> + return tdx_vcpu_reset(vcpu, init_event);
> +
> + return vmx_vcpu_reset(vcpu, init_event);
> +}
> +
> static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
> {
> if (!is_td(kvm))
> @@ -89,10 +121,10 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
> .vm_destroy = vt_vm_destroy,
> .vm_free = vt_vm_free,
>
> - .vcpu_precreate = vmx_vcpu_precreate,
> - .vcpu_create = vmx_vcpu_create,
> - .vcpu_free = vmx_vcpu_free,
> - .vcpu_reset = vmx_vcpu_reset,
> + .vcpu_precreate = vt_vcpu_precreate,
> + .vcpu_create = vt_vcpu_create,
> + .vcpu_free = vt_vcpu_free,
> + .vcpu_reset = vt_vcpu_reset,
>
> .prepare_switch_to_guest = vmx_prepare_switch_to_guest,
> .vcpu_load = vmx_vcpu_load,
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 54045e0576e7..0625c354b341 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -49,6 +49,11 @@ static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid)
> return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits);
> }
>
> +static inline bool is_td_vcpu_created(struct vcpu_tdx *tdx)
> +{
> + return tdx->tdvpr.added;
> +}
> +
> static inline bool is_td_created(struct kvm_tdx *kvm_tdx)
> {
> return kvm_tdx->tdr.added;
> @@ -296,6 +301,139 @@ int tdx_vm_init(struct kvm *kvm)
> return 0;
> }
>
> +int tdx_vcpu_create(struct kvm_vcpu *vcpu)
> +{
> + struct vcpu_tdx *tdx = to_tdx(vcpu);
> + int ret, i;
> +
> + /* TDX only supports x2APIC, which requires an in-kernel local APIC. */
> + if (!vcpu->arch.apic)
> + return -EINVAL;
> +
> + fpstate_set_confidential(&vcpu->arch.guest_fpu);
> +
> + ret = tdx_alloc_td_page(&tdx->tdvpr);
> + if (ret)
> + return ret;
> +
> + tdx->tdvpx = kcalloc(tdx_caps.tdvpx_nr_pages, sizeof(*tdx->tdvpx),
> + GFP_KERNEL_ACCOUNT);
> + if (!tdx->tdvpx) {
> + ret = -ENOMEM;
> + goto free_tdvpr;
> + }
> + for (i = 0; i < tdx_caps.tdvpx_nr_pages; i++) {
> + ret = tdx_alloc_td_page(&tdx->tdvpx[i]);
> + if (ret)
> + goto free_tdvpx;
> + }
> +
> + vcpu->arch.efer = EFER_SCE | EFER_LME | EFER_LMA | EFER_NX;
> +
> + vcpu->arch.cr0_guest_owned_bits = -1ul;
> + vcpu->arch.cr4_guest_owned_bits = -1ul;
> +
> + vcpu->arch.tsc_offset = to_kvm_tdx(vcpu->kvm)->tsc_offset;
> + vcpu->arch.l1_tsc_offset = vcpu->arch.tsc_offset;
> + vcpu->arch.guest_state_protected =
> + !(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTRIBUTE_DEBUG);
> +
> + return 0;
> +
> +free_tdvpx:
> + /* @i points at the TDVPX page that failed allocation. */
> + for (--i; i >= 0; i--)
> + free_page(tdx->tdvpx[i].va);
> + kfree(tdx->tdvpx);
> + tdx->tdvpx = NULL;
> +free_tdvpr:
> + free_page(tdx->tdvpr.va);
> +
> + return ret;
> +}
> +
> +void tdx_vcpu_free(struct kvm_vcpu *vcpu)
> +{
> + struct vcpu_tdx *tdx = to_tdx(vcpu);
> + int i;
> +
> + /* Can't reclaim or free pages if teardown failed. */
> + if (is_hkid_assigned(to_kvm_tdx(vcpu->kvm)))
> + return;
> +
> + if (tdx->tdvpx) {
> + for (i = 0; i < tdx_caps.tdvpx_nr_pages; i++)
> + tdx_reclaim_td_page(&tdx->tdvpx[i]);
> + kfree(tdx->tdvpx);
> + tdx->tdvpx = NULL;
> + }
> + tdx_reclaim_td_page(&tdx->tdvpr);
> +}
> +
> +void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> +{
> + struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
> + struct vcpu_tdx *tdx = to_tdx(vcpu);
> + struct msr_data apic_base_msr;
> + u64 err;
> + int i;
> +
> + /* TDX doesn't support INIT event. */
> + if (WARN_ON_ONCE(init_event))
> + goto td_bugged;
> + if (WARN_ON_ONCE(is_td_vcpu_created(tdx)))
> + goto td_bugged;
> +
> + err = tdh_vp_create(kvm_tdx->tdr.pa, tdx->tdvpr.pa);
> + if (WARN_ON_ONCE(err)) {
> + pr_tdx_error(TDH_VP_CREATE, err, NULL);
> + goto td_bugged;
> + }
> + tdx_mark_td_page_added(&tdx->tdvpr);
> +
> + for (i = 0; i < tdx_caps.tdvpx_nr_pages; i++) {
> + err = tdh_vp_addcx(tdx->tdvpr.pa, tdx->tdvpx[i].pa);
> + if (WARN_ON_ONCE(err)) {
> + pr_tdx_error(TDH_VP_ADDCX, err, NULL);
> + goto td_bugged;
> + }
> + tdx_mark_td_page_added(&tdx->tdvpx[i]);
> + }
> +
> + if (!vcpu->arch.cpuid_entries) {
> + /*
> + * On vcpu creation, the cpuid entries are empty. Forcibly
> + * enable the x2APIC feature to allow x2APIC mode.
> + */
> + struct kvm_cpuid_entry2 *e;
> +
> + e = kvmalloc_array(1, sizeof(*e), GFP_KERNEL_ACCOUNT);
A NULL check is necessary for kvmalloc_array().
> + *e = (struct kvm_cpuid_entry2) {
> + .function = 1, /* Features for X2APIC */
> + .index = 0,
> + .eax = 0,
> + .ebx = 0,
> + .ecx = 1ULL << 21, /* X2APIC */
> + .edx = 0,
> + };
> + vcpu->arch.cpuid_entries = e;
> + vcpu->arch.cpuid_nent = 1;
> + }
> + apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | LAPIC_MODE_X2APIC;
> + if (kvm_vcpu_is_reset_bsp(vcpu))
> + apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
> + apic_base_msr.host_initiated = true;
> + if (WARN_ON_ONCE(kvm_set_apic_base(vcpu, &apic_base_msr)))
> + goto td_bugged;
> +
> + vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> +
> + return;
> +
> +td_bugged:
> + vcpu->kvm->vm_bugged = true;
> +}
> +
> int tdx_dev_ioctl(void __user *argp)
> {
> struct kvm_tdx_capabilities __user *user_caps;
> diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
> index 93ffe2deb8e8..f6841c3dd12d 100644
> --- a/arch/x86/kvm/vmx/x86_ops.h
> +++ b/arch/x86/kvm/vmx/x86_ops.h
> @@ -141,6 +141,10 @@ int tdx_vm_init(struct kvm *kvm);
> void tdx_mmu_release_hkid(struct kvm *kvm);
> void tdx_vm_free(struct kvm *kvm);
>
> +int tdx_vcpu_create(struct kvm_vcpu *vcpu);
> +void tdx_vcpu_free(struct kvm_vcpu *vcpu);
> +void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
> +
> int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
> #else
> static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return 0; }
> @@ -154,6 +158,10 @@ static inline void tdx_mmu_release_hkid(struct kvm *kvm) {}
> static inline void tdx_flush_shadow_all_private(struct kvm *kvm) {}
> static inline void tdx_vm_free(struct kvm *kvm) {}
>
> +static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTSUPP; }
> +static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {}
> +static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) {}
> +
> static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
> #endif
>
> --
> 2.25.1
>
On Mon, Nov 14, 2022 at 02:46:22PM +0800,
Yuan Yao <yuan.yao@linux.intel.com> wrote:
> On Sat, Oct 29, 2022 at 11:22:25PM -0700, isaku.yamahata@intel.com wrote:
[...]
> > + if (!vcpu->arch.cpuid_entries) {
> > + /*
> > + * On vcpu creation, the cpuid entries are empty. Forcibly
> > + * enable the x2APIC feature to allow x2APIC mode.
> > + */
> > + struct kvm_cpuid_entry2 *e;
> > +
> > + e = kvmalloc_array(1, sizeof(*e), GFP_KERNEL_ACCOUNT);
>
> A NULL check is necessary for kvmalloc_array().
Fixed. Because the vcpu_reset() callback cannot return an error, this
logic is moved to tdx_vcpu_create(); see the sketch below.
Thanks,
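Untested sketch of the logic moved into tdx_vcpu_create() (hypothetical,
not the actual v11 code; unlike vcpu_reset(), tdx_vcpu_create() can
propagate an allocation failure):

	struct kvm_cpuid_entry2 *e;

	/*
	 * On vcpu creation, the cpuid entries are empty. Forcibly enable
	 * the x2APIC feature to allow x2APIC mode.
	 */
	e = kvmalloc_array(1, sizeof(*e), GFP_KERNEL_ACCOUNT);
	if (!e)
		return -ENOMEM;	/* the NULL check vcpu_reset() couldn't have */

	*e = (struct kvm_cpuid_entry2) {
		.function = 1,	/* Features for X2APIC */
		.index = 0,
		.eax = 0,
		.ebx = 0,
		.ecx = 1ULL << 21,	/* X2APIC */
		.edx = 0,
	};
	vcpu->arch.cpuid_entries = e;
	vcpu->arch.cpuid_nent = 1;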