diff mbox series

[v19,092/130] KVM: TDX: Implement interrupt injection

Message ID	b2d9539b23f155b95864db3eacce55e0e24eed4d.1708933498.git.isaku.yamahata@intel.com
State	New
Headers	Received-SPF: pass (google.com: domain of linux-kernel+bounces-80856-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) client-ip=147.75.199.223; From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini <pbonzini@redhat.com>, erdemaktas@google.com, Sean Christopherson <seanjc@google.com>, Sagi Shahar <sagis@google.com>, Kai Huang <kai.huang@intel.com>, chen.bo@intel.com, hang.yuan@intel.com, tina.zhang@intel.com Subject: [PATCH v19 092/130] KVM: TDX: Implement interrupt injection Date: Mon, 26 Feb 2024 00:26:34 -0800 Message-Id: <b2d9539b23f155b95864db3eacce55e0e24eed4d.1708933498.git.isaku.yamahata@intel.com> In-Reply-To: <cover.1708933498.git.isaku.yamahata@intel.com> References: <cover.1708933498.git.isaku.yamahata@intel.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-getmail-retrieved-from-mailbox: INBOX
Series	[v19,001/130] x86/virt/tdx: Rename _offset to _member for TD_SYSINFO_MAP() macro \| [v19,001/130] x86/virt/tdx: Rename _offset to _member for TD_SYSINFO_MAP() macro [v19,002/130] x86/virt/tdx: Move TDMR metadata fields map table to local variable [v19,003/130] x86/virt/tdx: Unbind global metadata read with 'struct tdx_tdmr_sysinfo' [v19,004/130] x86/virt/tdx: Support global metadata read for all element sizes [v19,005/130] x86/virt/tdx: Export global metadata read infrastructure [v19,006/130] x86/virt/tdx: Export TDX KeyID information [v19,007/130] x86/virt/tdx: Export SEAMCALL functions [v19,008/130] x86/tdx: Warning with 32bit build shift-count-overflow [v19,009/130] KVM: x86: Add gmem hook for determining max NPT mapping level [v19,010/130] KVM: x86: Pass is_private to gmem hook of gmem_max_level [v19,011/130] KVM: Add new members to struct kvm_gfn_range to operate on [v19,012/130] KVM: x86/mmu: Pass around full 64-bit error code for the KVM page fault [v19,013/130] KVM: x86: Use PFERR_GUEST_ENC_MASK to indicate fault is private [v19,014/130] KVM: Add KVM vcpu ioctl to pre-populate guest memory [v19,015/130] KVM: Document KVM_MEMORY_MAPPING ioctl [v19,016/130] KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by TDX [v19,017/130] KVM: x86: Implement kvm_arch_{, pre_}vcpu_memory_mapping() [v19,018/130] KVM: x86/mmu: Assume guest MMIOs are shared [v19,019/130] KVM: x86: Add is_vm_type_supported callback [v19,020/130] KVM: VMX: Move out vmx_x86_ops to 'main.c' to wrap VMX and TDX [v19,021/130] KVM: x86/vmx: initialize loaded_vmcss_on_cpu in vmx_init() [v19,022/130] KVM: x86/vmx: Refactor KVM VMX module init/exit functions [v19,023/130] KVM: TDX: Initialize the TDX module when loading the KVM intel kernel module [v19,024/130] KVM: TDX: Add placeholders for TDX VM/vcpu structure [v19,025/130] KVM: TDX: Make TDX VM type supported [v19,026/130,MARKER] The start of TDX KVM patch series: TDX architectural definitions [v19,027/130] KVM: TDX: Define TDX architectural definitions [v19,028/130] KVM: TDX: Add TDX "architectural" error codes [v19,029/130] KVM: TDX: Add C wrapper functions for SEAMCALLs to the TDX module [v19,030/130] KVM: TDX: Add helper functions to print TDX SEAMCALL error [v19,031/130,MARKER] The start of TDX KVM patch series: TD VM creation/destruction [v19,032/130] KVM: TDX: Add helper functions to allocate/free TDX private host key id [v19,033/130] KVM: TDX: Add helper function to read TDX metadata in array [v19,034/130] KVM: TDX: Get system-wide info about TDX module on initialization [v19,035/130] KVM: TDX: Add place holder for TDX VM specific mem_enc_op ioctl [v19,036/130] KVM: TDX: x86: Add ioctl to get TDX systemwide parameters [v19,037/130] KVM: TDX: Make KVM_CAP_MAX_VCPUS backend specific [v19,038/130] KVM: TDX: create/destroy VM structure [v19,039/130] KVM: TDX: initialize VM with TDX specific parameters [v19,040/130] KVM: TDX: Make pmu_intel.c ignore guest TD case [v19,041/130] KVM: TDX: Refuse to unplug the last cpu on the package [v19,042/130,MARKER] The start of TDX KVM patch series: TD vcpu creation/destruction [v19,043/130] KVM: TDX: create/free TDX vcpu structure [v19,044/130] KVM: TDX: Do TDX specific vcpu initialization [v19,045/130,MARKER] The start of TDX KVM patch series: KVM MMU GPA shared bits [v19,046/130] KVM: x86/mmu: Add address conversion functions for TDX shared bit of GPA [v19,047/130,MARKER] The start of TDX KVM patch series: KVM TDP refactoring for TDX [v19,048/130] KVM: Allow page-sized MMU caches to be initialized with custom 64-bit values [v19,049/130] KVM: x86/mmu: Replace hardcoded value 0 for the initial value for SPTE [v19,050/130] KVM: x86/mmu: Allow non-zero value for non-present SPTE and removed SPTE [v19,051/130] KVM: x86/mmu: Add Suppress VE bit to shadow_mmio_mask/shadow_present_mask [v19,052/130] KVM: x86/mmu: Track shadow MMIO value on a per-VM basis [v19,053/130] KVM: x86/mmu: Disallow fast page fault on private GPA [v19,054/130] KVM: VMX: Introduce test mode related to EPT violation VE [v19,055/130,MARKER] The start of TDX KVM patch series: KVM TDP MMU hooks [v19,056/130] KVM: x86/tdp_mmu: Init role member of struct kvm_mmu_page at allocation [v19,057/130] KVM: x86/mmu: Add a new is_private member for union kvm_mmu_page_role [v19,058/130] KVM: x86/mmu: Add a private pointer to struct kvm_mmu_page [v19,059/130] KVM: x86/tdp_mmu: Don't zap private pages for unsupported cases [v19,060/130] KVM: x86/tdp_mmu: Apply mmu notifier callback to only shared GPA [v19,061/130] KVM: x86/tdp_mmu: Sprinkle __must_check [v19,062/130] KVM: x86/tdp_mmu: Support TDX private mapping for TDP MMU [v19,063/130,MARKER] The start of TDX KVM patch series: TDX EPT violation [v19,064/130] KVM: x86/mmu: Do not enable page track for TD guest [v19,065/130] KVM: VMX: Split out guts of EPT violation to common/exposed function [v19,066/130] KVM: TDX: Add accessors VMX VMCS helpers [v19,067/130] KVM: TDX: Add load_mmu_pgd method for TDX [v19,068/130] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with operand SEPT [v19,069/130] KVM: TDX: Require TDP MMU and mmio caching for TDX [v19,070/130] KVM: TDX: TDP MMU TDX support [v19,071/130] KVM: TDX: MTRR: implement get_mt_mask() for TDX [v19,072/130,MARKER] The start of TDX KVM patch series: TD finalization [v19,073/130] KVM: x86: Add hooks in kvm_arch_vcpu_memory_mapping() [v19,074/130] KVM: TDX: Create initial guest memory [v19,075/130] KVM: TDX: Extend memory measurement with initial guest memory [v19,076/130] KVM: TDX: Finalize VM initialization [v19,077/130,MARKER] The start of TDX KVM patch series: TD vcpu enter/exit [v19,078/130] KVM: TDX: Implement TDX vcpu enter/exit path [v19,079/130] KVM: TDX: vcpu_run: save/restore host state(host kernel gs) [v19,080/130] KVM: TDX: restore host xsave state when exit from the guest TD [v19,081/130] KVM: x86: Allow to update cached values in kvm_user_return_msrs w/o wrmsr [v19,082/130] KVM: TDX: restore user ret MSRs [v19,083/130] KVM: TDX: Add TSX_CTRL msr into uret_msrs list [v19,084/130,MARKER] The start of TDX KVM patch series: TD vcpu exits/interrupts/hypercalls [v19,085/130] KVM: TDX: Complete interrupts after tdexit [v19,086/130] KVM: TDX: restore debug store when TD exit [v19,087/130] KVM: TDX: handle vcpu migration over logical processor [v19,088/130] KVM: x86: Add a switch_db_regs flag to handle TDX's auto-switched behavior [v19,089/130] KVM: TDX: Add support for find pending IRQ in a protected local APIC [v19,090/130] KVM: x86: Assume timer IRQ was injected if APIC state is proteced [v19,091/130] KVM: TDX: remove use of struct vcpu_vmx from posted_interrupt.c [v19,092/130] KVM: TDX: Implement interrupt injection [v19,093/130] KVM: TDX: Implements vcpu request_immediate_exit [v19,094/130] KVM: TDX: Implement methods to inject NMI [v19,095/130] KVM: VMX: Modify NMI and INTR handlers to take intr_info as function argument [v19,096/130] KVM: VMX: Move NMI/exception handler to common helper [v19,097/130] KVM: x86: Split core of hypercall emulation to helper function [v19,098/130] KVM: TDX: Add a place holder to handle TDX VM exit [v19,099/130] KVM: TDX: Handle vmentry failure for INTEL TD guest [v19,100/130] KVM: TDX: handle EXIT_REASON_OTHER_SMI [v19,101/130] KVM: TDX: handle ept violation/misconfig exit [v19,102/130] KVM: TDX: handle EXCEPTION_NMI and EXTERNAL_INTERRUPT [v19,103/130] KVM: TDX: Handle EXIT_REASON_OTHER_SMI with MSMI [v19,104/130] KVM: TDX: Add a place holder for handler of TDX hypercalls (TDG.VP.VMCALL) [v19,105/130] KVM: TDX: handle KVM hypercall with TDG.VP.VMCALL [v19,106/130] KVM: TDX: Add KVM Exit for TDX TDG.VP.VMCALL [v19,107/130] KVM: TDX: Handle TDX PV CPUID hypercall [v19,108/130] KVM: TDX: Handle TDX PV HLT hypercall [v19,109/130] KVM: TDX: Handle TDX PV port io hypercall [v19,110/130] KVM: TDX: Handle TDX PV MMIO hypercall [v19,111/130] KVM: TDX: Implement callbacks for MSR operations for TDX [v19,112/130] KVM: TDX: Handle TDX PV rdmsr/wrmsr hypercall [v19,113/130] KVM: TDX: Handle MSR MTRRCap and MTRRDefType access [v19,114/130] KVM: TDX: Handle MSR IA32_FEAT_CTL MSR and IA32_MCG_EXT_CTL [v19,115/130] KVM: TDX: Handle TDG.VP.VMCALL<GetTdVmCallInfo> hypercall [v19,116/130] KVM: TDX: Silently discard SMI request [v19,117/130] KVM: TDX: Silently ignore INIT/SIPI [v19,118/130] KVM: TDX: Add methods to ignore accesses to CPU state [v19,119/130] KVM: TDX: Add methods to ignore guest instruction emulation [v19,120/130] KVM: TDX: Add a method to ignore dirty logging [v19,121/130] KVM: TDX: Add methods to ignore VMX preemption timer [v19,122/130] KVM: TDX: Add methods to ignore accesses to TSC [v19,123/130] KVM: TDX: Ignore setting up mce [v19,124/130] KVM: TDX: Add a method to ignore for TDX to ignore hypercall patch [v19,125/130] KVM: TDX: Add methods to ignore virtual apic related operation [v19,126/130] KVM: TDX: Inhibit APICv for TDX guest [v19,127/130] Documentation/virt/kvm: Document on Trust Domain Extensions(TDX) [v19,128/130] KVM: x86: design documentation on TDX support of x86 KVM TDP MMU [v19,129/130] RFC: KVM: x86: Add x86 callback to check cpuid [v19,130/130] RFC: KVM: x86, TDX: Add check for KVM_SET_CPUID2

Commit Message

Isaku Yamahata Feb. 26, 2024, 8:26 a.m. UTC

  From: Isaku Yamahata <isaku.yamahata@intel.com>

TDX supports interrupt inject into vcpu with posted interrupt.  Wire up the
corresponding kvm x86 operations to posted interrupt.  Move
kvm_vcpu_trigger_posted_interrupt() from vmx.c to common.h to share the
code.

VMX can inject interrupt by setting interrupt information field,
VM_ENTRY_INTR_INFO_FIELD, of VMCS.  TDX supports interrupt injection only
by posted interrupt.  Ignore the execution path to access
VM_ENTRY_INTR_INFO_FIELD.

As cpu state is protected and apicv is enabled for the TDX guest, VMM can
inject interrupt by updating posted interrupt descriptor.  Treat interrupt
can be injected always.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/vmx/common.h      | 71 ++++++++++++++++++++++++++
 arch/x86/kvm/vmx/main.c        | 93 ++++++++++++++++++++++++++++++----
 arch/x86/kvm/vmx/posted_intr.c |  2 +-
 arch/x86/kvm/vmx/posted_intr.h |  2 +
 arch/x86/kvm/vmx/tdx.c         | 25 +++++++++
 arch/x86/kvm/vmx/vmx.c         | 67 +-----------------------
 arch/x86/kvm/vmx/x86_ops.h     |  7 ++-
 7 files changed, 190 insertions(+), 77 deletions(-)

diff mbox series

Patch

diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h
index 235908f3e044..6f21d0d48809 100644
--- a/arch/x86/kvm/vmx/common.h
+++ b/arch/x86/kvm/vmx/common.h
@@ -4,6 +4,7 @@ 
 
 #include <linux/kvm_host.h>
 
+#include "posted_intr.h"
 #include "mmu.h"
 
 static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa,
@@ -30,4 +31,74 @@  static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa,
 	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
 }
 
+static inline void kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu,
+						     int pi_vec)
+{
+#ifdef CONFIG_SMP
+	if (vcpu->mode == IN_GUEST_MODE) {
+		/*
+		 * The vector of the virtual has already been set in the PIR.
+		 * Send a notification event to deliver the virtual interrupt
+		 * unless the vCPU is the currently running vCPU, i.e. the
+		 * event is being sent from a fastpath VM-Exit handler, in
+		 * which case the PIR will be synced to the vIRR before
+		 * re-entering the guest.
+		 *
+		 * When the target is not the running vCPU, the following
+		 * possibilities emerge:
+		 *
+		 * Case 1: vCPU stays in non-root mode. Sending a notification
+		 * event posts the interrupt to the vCPU.
+		 *
+		 * Case 2: vCPU exits to root mode and is still runnable. The
+		 * PIR will be synced to the vIRR before re-entering the guest.
+		 * Sending a notification event is ok as the host IRQ handler
+		 * will ignore the spurious event.
+		 *
+		 * Case 3: vCPU exits to root mode and is blocked. vcpu_block()
+		 * has already synced PIR to vIRR and never blocks the vCPU if
+		 * the vIRR is not empty. Therefore, a blocked vCPU here does
+		 * not wait for any requested interrupts in PIR, and sending a
+		 * notification event also results in a benign, spurious event.
+		 */
+
+		if (vcpu != kvm_get_running_vcpu())
+			__apic_send_IPI_mask(get_cpu_mask(vcpu->cpu), pi_vec);
+		return;
+	}
+#endif
+	/*
+	 * The vCPU isn't in the guest; wake the vCPU in case it is blocking,
+	 * otherwise do nothing as KVM will grab the highest priority pending
+	 * IRQ via ->sync_pir_to_irr() in vcpu_enter_guest().
+	 */
+	kvm_vcpu_wake_up(vcpu);
+}
+
+/*
+ * Send interrupt to vcpu via posted interrupt way.
+ * 1. If target vcpu is running(non-root mode), send posted interrupt
+ * notification to vcpu and hardware will sync PIR to vIRR atomically.
+ * 2. If target vcpu isn't running(root mode), kick it to pick up the
+ * interrupt from PIR in next vmentry.
+ */
+static inline void __vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu,
+						  struct pi_desc *pi_desc, int vector)
+{
+	if (pi_test_and_set_pir(vector, pi_desc))
+		return;
+
+	/* If a previous notification has sent the IPI, nothing to do.  */
+	if (pi_test_and_set_on(pi_desc))
+		return;
+
+	/*
+	 * The implied barrier in pi_test_and_set_on() pairs with the smp_mb_*()
+	 * after setting vcpu->mode in vcpu_enter_guest(), thus the vCPU is
+	 * guaranteed to see PID.ON=1 and sync the PIR to IRR if triggering a
+	 * posted interrupt "fails" because vcpu->mode != IN_GUEST_MODE.
+	 */
+	kvm_vcpu_trigger_posted_interrupt(vcpu, POSTED_INTR_VECTOR);
+}
+
 #endif /* __KVM_X86_VMX_COMMON_H */
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 5fd99e844b86..f2c9d6358f9e 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -228,6 +228,34 @@  static bool vt_protected_apic_has_interrupt(struct kvm_vcpu *vcpu)
 	return tdx_protected_apic_has_interrupt(vcpu);
 }
 
+static void vt_apicv_pre_state_restore(struct kvm_vcpu *vcpu)
+{
+	struct pi_desc *pi = vcpu_to_pi_desc(vcpu);
+
+	pi_clear_on(pi);
+	memset(pi->pir, 0, sizeof(pi->pir));
+}
+
+static int vt_sync_pir_to_irr(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return -1;
+
+	return vmx_sync_pir_to_irr(vcpu);
+}
+
+static void vt_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
+			   int trig_mode, int vector)
+{
+	if (is_td_vcpu(apic->vcpu)) {
+		tdx_deliver_interrupt(apic, delivery_mode, trig_mode,
+					     vector);
+		return;
+	}
+
+	vmx_deliver_interrupt(apic, delivery_mode, trig_mode, vector);
+}
+
 static void vt_flush_tlb_all(struct kvm_vcpu *vcpu)
 {
 	if (is_td_vcpu(vcpu)) {
@@ -297,6 +325,53 @@  static void vt_sched_in(struct kvm_vcpu *vcpu, int cpu)
 	vmx_sched_in(vcpu, cpu);
 }
 
+static void vt_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+	vmx_set_interrupt_shadow(vcpu, mask);
+}
+
+static u32 vt_get_interrupt_shadow(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return 0;
+
+	return vmx_get_interrupt_shadow(vcpu);
+}
+
+static void vt_inject_irq(struct kvm_vcpu *vcpu, bool reinjected)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_inject_irq(vcpu, reinjected);
+}
+
+static void vt_cancel_injection(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_cancel_injection(vcpu);
+}
+
+static int vt_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+{
+	if (is_td_vcpu(vcpu))
+		return true;
+
+	return vmx_interrupt_allowed(vcpu, for_injection);
+}
+
+static void vt_enable_irq_window(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_enable_irq_window(vcpu);
+}
+
 static u8 vt_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 {
 	if (is_td_vcpu(vcpu))
@@ -426,31 +501,31 @@  struct kvm_x86_ops vt_x86_ops __initdata = {
 	.handle_exit = vmx_handle_exit,
 	.skip_emulated_instruction = vmx_skip_emulated_instruction,
 	.update_emulated_instruction = vmx_update_emulated_instruction,
-	.set_interrupt_shadow = vmx_set_interrupt_shadow,
-	.get_interrupt_shadow = vmx_get_interrupt_shadow,
+	.set_interrupt_shadow = vt_set_interrupt_shadow,
+	.get_interrupt_shadow = vt_get_interrupt_shadow,
 	.patch_hypercall = vmx_patch_hypercall,
-	.inject_irq = vmx_inject_irq,
+	.inject_irq = vt_inject_irq,
 	.inject_nmi = vmx_inject_nmi,
 	.inject_exception = vmx_inject_exception,
-	.cancel_injection = vmx_cancel_injection,
-	.interrupt_allowed = vmx_interrupt_allowed,
+	.cancel_injection = vt_cancel_injection,
+	.interrupt_allowed = vt_interrupt_allowed,
 	.nmi_allowed = vmx_nmi_allowed,
 	.get_nmi_mask = vmx_get_nmi_mask,
 	.set_nmi_mask = vmx_set_nmi_mask,
 	.enable_nmi_window = vmx_enable_nmi_window,
-	.enable_irq_window = vmx_enable_irq_window,
+	.enable_irq_window = vt_enable_irq_window,
 	.update_cr8_intercept = vmx_update_cr8_intercept,
 	.set_virtual_apic_mode = vmx_set_virtual_apic_mode,
 	.set_apic_access_page_addr = vmx_set_apic_access_page_addr,
 	.refresh_apicv_exec_ctrl = vmx_refresh_apicv_exec_ctrl,
 	.load_eoi_exitmap = vmx_load_eoi_exitmap,
-	.apicv_pre_state_restore = vmx_apicv_pre_state_restore,
+	.apicv_pre_state_restore = vt_apicv_pre_state_restore,
 	.required_apicv_inhibits = VMX_REQUIRED_APICV_INHIBITS,
 	.hwapic_irr_update = vmx_hwapic_irr_update,
 	.hwapic_isr_update = vmx_hwapic_isr_update,
 	.guest_apic_has_interrupt = vmx_guest_apic_has_interrupt,
-	.sync_pir_to_irr = vmx_sync_pir_to_irr,
-	.deliver_interrupt = vmx_deliver_interrupt,
+	.sync_pir_to_irr = vt_sync_pir_to_irr,
+	.deliver_interrupt = vt_deliver_interrupt,
 	.dy_apicv_has_pending_interrupt = pi_has_pending_interrupt,
 	.protected_apic_has_interrupt = vt_protected_apic_has_interrupt,
 
diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
index b66add9da0f3..c86768b83f0b 100644
--- a/arch/x86/kvm/vmx/posted_intr.c
+++ b/arch/x86/kvm/vmx/posted_intr.c
@@ -52,7 +52,7 @@  static inline struct vcpu_pi *vcpu_to_pi(struct kvm_vcpu *vcpu)
 	return (struct vcpu_pi *)vcpu;
 }
 
-static inline struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
+struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
 {
 	return &vcpu_to_pi(vcpu)->pi_desc;
 }
diff --git a/arch/x86/kvm/vmx/posted_intr.h b/arch/x86/kvm/vmx/posted_intr.h
index 2fe8222308b2..0f9983b6910b 100644
--- a/arch/x86/kvm/vmx/posted_intr.h
+++ b/arch/x86/kvm/vmx/posted_intr.h
@@ -105,6 +105,8 @@  struct vcpu_pi {
 	/* Until here common layout betwwn vcpu_vmx and vcpu_tdx. */
 };
 
+struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu);
+
 void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu);
 void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu);
 void pi_wakeup_handler(void);
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 1da58c36217c..1dfa9b503e0d 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -7,6 +7,7 @@ 
 
 #include "capabilities.h"
 #include "x86_ops.h"
+#include "common.h"
 #include "mmu.h"
 #include "tdx_arch.h"
 #include "tdx.h"
@@ -603,6 +604,9 @@  int tdx_vcpu_create(struct kvm_vcpu *vcpu)
 	tdx->host_state_need_save = true;
 	tdx->host_state_need_restore = false;
 
+	tdx->pi_desc.nv = POSTED_INTR_VECTOR;
+	tdx->pi_desc.sn = 1;
+
 	return 0;
 }
 
@@ -610,6 +614,7 @@  void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct vcpu_tdx *tdx = to_tdx(vcpu);
 
+	vmx_vcpu_pi_load(vcpu, cpu);
 	if (vcpu->cpu == cpu)
 		return;
 
@@ -848,6 +853,12 @@  fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu)
 
 	trace_kvm_entry(vcpu);
 
+	if (pi_test_on(&tdx->pi_desc)) {
+		apic->send_IPI_self(POSTED_INTR_VECTOR);
+
+		kvm_wait_lapic_expire(vcpu);
+	}
+
 	tdx_vcpu_enter_exit(tdx);
 
 	tdx_user_return_update_cache(vcpu);
@@ -1213,6 +1224,16 @@  static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 	return tdx_sept_drop_private_spte(kvm, gfn, level, pfn);
 }
 
+void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
+			   int trig_mode, int vector)
+{
+	struct kvm_vcpu *vcpu = apic->vcpu;
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+
+	/* TDX supports only posted interrupt.  No lapic emulation. */
+	__vmx_deliver_posted_interrupt(vcpu, &tdx->pi_desc, vector);
+}
+
 static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
 {
 	struct kvm_tdx_capabilities __user *user_caps;
@@ -1972,6 +1993,10 @@  int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
 	if (ret)
 		return ret;
 
+	td_vmcs_write16(tdx, POSTED_INTR_NV, POSTED_INTR_VECTOR);
+	td_vmcs_write64(tdx, POSTED_INTR_DESC_ADDR, __pa(&tdx->pi_desc));
+	td_vmcs_setbit32(tdx, PIN_BASED_VM_EXEC_CONTROL, PIN_BASED_POSTED_INTR);
+
 	tdx->initialized = true;
 	return 0;
 }
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 162bb134aae6..1349ec438837 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4157,50 +4157,6 @@  void vmx_msr_filter_changed(struct kvm_vcpu *vcpu)
 		pt_update_intercept_for_msr(vcpu);
 }
 
-static inline void kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu,
-						     int pi_vec)
-{
-#ifdef CONFIG_SMP
-	if (vcpu->mode == IN_GUEST_MODE) {
-		/*
-		 * The vector of the virtual has already been set in the PIR.
-		 * Send a notification event to deliver the virtual interrupt
-		 * unless the vCPU is the currently running vCPU, i.e. the
-		 * event is being sent from a fastpath VM-Exit handler, in
-		 * which case the PIR will be synced to the vIRR before
-		 * re-entering the guest.
-		 *
-		 * When the target is not the running vCPU, the following
-		 * possibilities emerge:
-		 *
-		 * Case 1: vCPU stays in non-root mode. Sending a notification
-		 * event posts the interrupt to the vCPU.
-		 *
-		 * Case 2: vCPU exits to root mode and is still runnable. The
-		 * PIR will be synced to the vIRR before re-entering the guest.
-		 * Sending a notification event is ok as the host IRQ handler
-		 * will ignore the spurious event.
-		 *
-		 * Case 3: vCPU exits to root mode and is blocked. vcpu_block()
-		 * has already synced PIR to vIRR and never blocks the vCPU if
-		 * the vIRR is not empty. Therefore, a blocked vCPU here does
-		 * not wait for any requested interrupts in PIR, and sending a
-		 * notification event also results in a benign, spurious event.
-		 */
-
-		if (vcpu != kvm_get_running_vcpu())
-			__apic_send_IPI_mask(get_cpu_mask(vcpu->cpu), pi_vec);
-		return;
-	}
-#endif
-	/*
-	 * The vCPU isn't in the guest; wake the vCPU in case it is blocking,
-	 * otherwise do nothing as KVM will grab the highest priority pending
-	 * IRQ via ->sync_pir_to_irr() in vcpu_enter_guest().
-	 */
-	kvm_vcpu_wake_up(vcpu);
-}
-
 static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
 						int vector)
 {
@@ -4253,20 +4209,7 @@  static int vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
 	if (!vcpu->arch.apic->apicv_active)
 		return -1;
 
-	if (pi_test_and_set_pir(vector, &vmx->pi_desc))
-		return 0;
-
-	/* If a previous notification has sent the IPI, nothing to do.  */
-	if (pi_test_and_set_on(&vmx->pi_desc))
-		return 0;
-
-	/*
-	 * The implied barrier in pi_test_and_set_on() pairs with the smp_mb_*()
-	 * after setting vcpu->mode in vcpu_enter_guest(), thus the vCPU is
-	 * guaranteed to see PID.ON=1 and sync the PIR to IRR if triggering a
-	 * posted interrupt "fails" because vcpu->mode != IN_GUEST_MODE.
-	 */
-	kvm_vcpu_trigger_posted_interrupt(vcpu, POSTED_INTR_VECTOR);
+	__vmx_deliver_posted_interrupt(vcpu, &vmx->pi_desc, vector);
 	return 0;
 }
 
@@ -6946,14 +6889,6 @@  void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
 	vmcs_write64(EOI_EXIT_BITMAP3, eoi_exit_bitmap[3]);
 }
 
-void vmx_apicv_pre_state_restore(struct kvm_vcpu *vcpu)
-{
-	struct vcpu_vmx *vmx = to_vmx(vcpu);
-
-	pi_clear_on(&vmx->pi_desc);
-	memset(vmx->pi_desc.pir, 0, sizeof(vmx->pi_desc.pir));
-}
-
 void vmx_do_interrupt_irqoff(unsigned long entry);
 void vmx_do_nmi_irqoff(void);
 
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index f3f90186926c..12a212e71827 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -58,7 +58,6 @@  int vmx_check_intercept(struct kvm_vcpu *vcpu,
 bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu);
 void vmx_migrate_timers(struct kvm_vcpu *vcpu);
 void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu);
-void vmx_apicv_pre_state_restore(struct kvm_vcpu *vcpu);
 bool vmx_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason);
 void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr);
 void vmx_hwapic_isr_update(int max_isr);
@@ -158,6 +157,9 @@  void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu);
 u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
 
+void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
+			   int trig_mode, int vector);
+
 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
 
 void tdx_flush_tlb(struct kvm_vcpu *vcpu);
@@ -198,6 +200,9 @@  static inline void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) {}
 static inline bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu) { return false; }
 static inline u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) { return 0; }
 
+static inline void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
+					 int trig_mode, int vector) {}
+
 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }
 
 static inline void tdx_flush_tlb(struct kvm_vcpu *vcpu) {}