[v1,13/23] KVM: VMX: Handle VMX nested exception for FRED

Message ID 20231108183003.5981-14-xin3.li@intel.com
State New
Series Enable FRED with KVM VMX

Commit Message

Li, Xin3 Nov. 8, 2023, 6:29 p.m. UTC
Set VMX nested exception bit in the VM-entry interruption information
VMCS field when injecting a nested exception using FRED event delivery
to ensure:
  1) The nested exception is injected on a correct stack level.
  2) The nested bit defined in FRED stack frame is set.

The event stack level used by FRED event delivery depends on whether the
event was a nested exception encountered during delivery of another event,
because a nested exception is "regarded" as happening on ring 0.  E.g.,
when #PF is configured to use stack level 1 in IA32_FRED_STKLVLS MSR:
  - nested #PF will be delivered on stack level 1 when triggered from
    user level.
  - normal #PF will be delivered on stack level 0 when triggered from
    user level.

The VMX nested-exception support ensures the correct event stack level is
chosen when a VM entry injects a nested exception.
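
The stack-level rule described above can be modeled roughly as follows. This
is only an illustrative sketch based on one reading of the FRED spec, not
kernel code: the function names are hypothetical, and it ignores maskable
interrupts and the #DF special case. It assumes IA32_FRED_STKLVLS holds a
2-bit stack level for each of vectors 0..31.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: read the 2-bit stack level configured for a vector. */
static unsigned int fred_configured_stack_level(uint64_t stklvls,
						unsigned int vector)
{
	assert(vector < 32);
	return (stklvls >> (2 * vector)) & 0x3;
}

/*
 * Hypothetical model of stack-level selection for an exception.
 *
 * A non-nested exception arriving from ring 3 is delivered on level 0.
 * A nested exception is "regarded" as happening in ring 0, so the
 * configured level applies, and the level never drops below the
 * current one.
 */
static unsigned int fred_event_stack_level(uint64_t stklvls,
					   unsigned int vector,
					   bool from_ring3, bool nested,
					   unsigned int current_level)
{
	unsigned int level;

	if (from_ring3 && !nested)
		return 0;

	level = fred_configured_stack_level(stklvls, vector);
	return level > current_level ? level : current_level;
}
```

With #PF (vector 14) configured for stack level 1, the model yields level 0
for a normal user-level #PF and level 1 for a nested one, matching the
example above.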

Tested-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  6 ++++--
 arch/x86/include/asm/vmx.h      |  4 +++-
 arch/x86/kvm/svm/svm.c          |  4 ++--
 arch/x86/kvm/vmx/vmx.c          | 26 +++++++++++++++++++++-----
 arch/x86/kvm/x86.c              | 22 +++++++++++++---------
 arch/x86/kvm/x86.h              |  1 +
 6 files changed, 44 insertions(+), 19 deletions(-)
  

Comments

Chao Gao Nov. 14, 2023, 7:40 a.m. UTC | #1
> 	/* Require Write-Back (WB) memory type for VMCS accesses. */
>@@ -7313,11 +7328,12 @@ static void __vmx_complete_interrupts(struct kvm_vcpu *vcpu,
> 			}
> 		}
> 
>-		if (idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK) {
>-			u32 err = vmcs_read32(error_code_field);
>-			kvm_requeue_exception_e(vcpu, vector, err);
>-		} else
>-			kvm_requeue_exception(vcpu, vector);
>+		if (idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK)
>+			kvm_requeue_exception_e(vcpu, vector, vmcs_read32(error_code_field),
>+						idt_vectoring_info & INTR_INFO_NESTED_EXCEPTION_MASK);
>+		else
>+			kvm_requeue_exception(vcpu, vector,
>+					      idt_vectoring_info & INTR_INFO_NESTED_EXCEPTION_MASK);

The exiting-event identification can also have bit 13 set, indicating that a
nested exception was encountered and caused the VM exit. When reinjecting the
exception into the guest, KVM needs to set the "nested" bit, right? I suspect
some changes to, e.g., handle_exception_nmi() are needed.
  
Li, Xin3 Nov. 15, 2023, 3:03 a.m. UTC | #2
> >+		if (idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK)
> >+			kvm_requeue_exception_e(vcpu, vector,
> vmcs_read32(error_code_field),
> >+						idt_vectoring_info &
> INTR_INFO_NESTED_EXCEPTION_MASK);
> >+		else
> >+			kvm_requeue_exception(vcpu, vector,
> >+					      idt_vectoring_info &
> INTR_INFO_NESTED_EXCEPTION_MASK);
> 
> Exiting-event identification can also have bit 13 set, indicating a nested
> exception encountered and caused VM-exit. when reinjecting the exception to
> guests, kvm needs to set the "nested" bit, right? I suspect some changes
> to e.g., handle_exception_nmi() are needed.

The current patch relies on kvm_multiple_exception() to do that.  But TBH,
I'm not sure it can recognize all nested cases.  I probably should revisit
it.
  
Li, Xin3 Dec. 6, 2023, 8:37 a.m. UTC | #3
> Subject: RE: [PATCH v1 13/23] KVM: VMX: Handle VMX nested exception for FRED
> 
> > >+		if (idt_vectoring_info &
> VECTORING_INFO_DELIVER_CODE_MASK)
> > >+			kvm_requeue_exception_e(vcpu, vector,
> > vmcs_read32(error_code_field),
> > >+						idt_vectoring_info &
> > INTR_INFO_NESTED_EXCEPTION_MASK);
> > >+		else
> > >+			kvm_requeue_exception(vcpu, vector,
> > >+					      idt_vectoring_info &
> > INTR_INFO_NESTED_EXCEPTION_MASK);
> >
> > Exiting-event identification can also have bit 13 set, indicating a
> > nested exception encountered and caused VM-exit. when reinjecting the
> > exception to guests, kvm needs to set the "nested" bit, right? I
> > suspect some changes to e.g., handle_exception_nmi() are needed.
> 
> The current patch relies on kvm_multiple_exception() to do that.  But TBH, I'm
> not sure it can recognize all nested cases.  I probably should revisit it.

So the conclusion is that kvm_multiple_exception() is smart enough, and
a VMM doesn't have to check bit 13 of the Exiting-event identification.

In FRED spec 5.0, section 9.2 - New VMX Feature: VMX Nested-Exception
Support, there is a statement at the end of Exiting-event identification:

(The value of this bit is always identical to that of the valid bit of
the original-event identification field.)

It means that even w/o VMX Nested-Exception support, a VMM already knows
whether an exception is a nested exception encountered during delivery of
another event when a VM exit is caused by an exception (exit reason 0).  KVM
does this by reading IDT_VECTORING_INFO_FIELD and calling
vmx_complete_interrupts() immediately after each VM exit.

vmx_complete_interrupts() simply queues the original exception if there is
one, and later the nested exception causing the VM exit could be cancelled
if it is a shadow page fault.  However, if the shadow page fault is caused
by a guest page fault, KVM injects it as a nested exception to have the guest
fix its page table.

I will add comments about this background in the next iteration.
  
Chao Gao Dec. 7, 2023, 8:42 a.m. UTC | #4
On Wed, Dec 06, 2023 at 04:37:39PM +0800, Li, Xin3 wrote:
>> Subject: RE: [PATCH v1 13/23] KVM: VMX: Handle VMX nested exception for FRED
>> 
>> > >+		if (idt_vectoring_info &
>> VECTORING_INFO_DELIVER_CODE_MASK)
>> > >+			kvm_requeue_exception_e(vcpu, vector,
>> > vmcs_read32(error_code_field),
>> > >+						idt_vectoring_info &
>> > INTR_INFO_NESTED_EXCEPTION_MASK);
>> > >+		else
>> > >+			kvm_requeue_exception(vcpu, vector,
>> > >+					      idt_vectoring_info &
>> > INTR_INFO_NESTED_EXCEPTION_MASK);
>> >
>> > Exiting-event identification can also have bit 13 set, indicating a
>> > nested exception encountered and caused VM-exit. when reinjecting the
>> > exception to guests, kvm needs to set the "nested" bit, right? I
>> > suspect some changes to e.g., handle_exception_nmi() are needed.
>> 
>> The current patch relies on kvm_multiple_exception() to do that.  But TBH, I'm
>> not sure it can recognize all nested cases.  I probably should revisit it.
>
>So the conclusion is that kvm_multiple_exception() is smart enough, and
>a VMM doesn't have to check bit 13 of the Exiting-event identification.
>
>In FRED spec 5.0, section 9.2 - New VMX Feature: VMX Nested-Exception
>Support, there is a statement at the end of Exiting-event identification:
>
>(The value of this bit is always identical to that of the valid bit of
>the original-event identification field.)
>
>It means that even w/o VMX Nested-Exception support, a VMM already knows
>if an exception is a nested exception encountered during delivery of
>another event in an exception caused VM exit (exit reason 0).  This is
>done in KVM through reading IDT_VECTORING_INFO_FIELD and calling
>vmx_complete_interrupts() immediately after VM exits.
>
>vmx_complete_interrupts() simply queues the original exception if there is
>one, and later the nested exception causing the VM exit could be cancelled
>if it is a shadow page fault.  However if the shadow page fault is caused
>by a guest page fault, KVM injects it as a nested exception to have guest
>fix its page table.
>
>I will add comments about this background in the next iteration.

Is it possible that the CPU encounters an exception and causes a VM exit while
injecting an __interrupt__? In this case, no __exception__ will be (re-)queued
by vmx_complete_interrupts().
  
Li, Xin3 Dec. 7, 2023, 10:09 a.m. UTC | #5
> >> > Exiting-event identification can also have bit 13 set, indicating a
> >> > nested exception encountered and caused VM-exit. when reinjecting the
> >> > exception to guests, kvm needs to set the "nested" bit, right? I
> >> > suspect some changes to e.g., handle_exception_nmi() are needed.
> >>
> >> The current patch relies on kvm_multiple_exception() to do that.  But TBH, I'm
> >> not sure it can recognize all nested cases.  I probably should revisit it.
> >
> >So the conclusion is that kvm_multiple_exception() is smart enough, and
> >a VMM doesn't have to check bit 13 of the Exiting-event identification.
> >
> >In FRED spec 5.0, section 9.2 - New VMX Feature: VMX Nested-Exception
> >Support, there is a statement at the end of Exiting-event identification:
> >
> >(The value of this bit is always identical to that of the valid bit of
> >the original-event identification field.)
> >
> >It means that even w/o VMX Nested-Exception support, a VMM already knows
> >if an exception is a nested exception encountered during delivery of
> >another event in an exception caused VM exit (exit reason 0).  This is
> >done in KVM through reading IDT_VECTORING_INFO_FIELD and calling
> >vmx_complete_interrupts() immediately after VM exits.
> >
> >vmx_complete_interrupts() simply queues the original exception if there is
> >one, and later the nested exception causing the VM exit could be cancelled
> >if it is a shadow page fault.  However if the shadow page fault is caused
> >by a guest page fault, KVM injects it as a nested exception to have guest
> >fix its page table.
> >
> >I will add comments about this background in the next iteration.
> 
> is it possible that the CPU encounters an exception and causes VM-exit during
> injecting an __interrupt__? in this case, no __exception__ will be (re-)queued
> by vmx_complete_interrupts().

I guess the following case is what you're suggesting:
KVM injects an external interrupt after shadow page tables are nuked.

vmx_complete_interrupts() is called after each VM exit to clear both
interrupt and exception queues, which means it always pushes the
deepest event if there is an original event.  In the above case, the
original event is the external interrupt KVM just tried to inject.
  
Chao Gao Dec. 8, 2023, 1:56 a.m. UTC | #6
On Thu, Dec 07, 2023 at 06:09:46PM +0800, Li, Xin3 wrote:
>> >> > Exiting-event identification can also have bit 13 set, indicating a
>> >> > nested exception encountered and caused VM-exit. when reinjecting the
>> >> > exception to guests, kvm needs to set the "nested" bit, right? I
>> >> > suspect some changes to e.g., handle_exception_nmi() are needed.
>> >>
>> >> The current patch relies on kvm_multiple_exception() to do that.  But TBH, I'm
>> >> not sure it can recognize all nested cases.  I probably should revisit it.
>> >
>> >So the conclusion is that kvm_multiple_exception() is smart enough, and
>> >a VMM doesn't have to check bit 13 of the Exiting-event identification.
>> >
>> >In FRED spec 5.0, section 9.2 - New VMX Feature: VMX Nested-Exception
>> >Support, there is a statement at the end of Exiting-event identification:
>> >
>> >(The value of this bit is always identical to that of the valid bit of
>> >the original-event identification field.)
>> >
>> >It means that even w/o VMX Nested-Exception support, a VMM already knows
>> >if an exception is a nested exception encountered during delivery of
>> >another event in an exception caused VM exit (exit reason 0).  This is
>> >done in KVM through reading IDT_VECTORING_INFO_FIELD and calling
>> >vmx_complete_interrupts() immediately after VM exits.
>> >
>> >vmx_complete_interrupts() simply queues the original exception if there is
>> >one, and later the nested exception causing the VM exit could be cancelled
>> >if it is a shadow page fault.  However if the shadow page fault is caused
>> >by a guest page fault, KVM injects it as a nested exception to have guest
>> >fix its page table.
>> >
>> >I will add comments about this background in the next iteration.
>> 
>> is it possible that the CPU encounters an exception and causes VM-exit during
>> injecting an __interrupt__? in this case, no __exception__ will be (re-)queued
>> by vmx_complete_interrupts().
>
>I guess the following case is what you're suggesting:
>KVM injects an external interrupt after shadow page tables are nuked.
>
>vmx_complete_interrupts() are called after each VM exit to clear both
>interrupt and exception queues, which means it always pushes the
>deepest event if there is an original event.  In the above case, the
>original event is the external interrupt KVM just tried to inject.

In my understanding, your point is:
1. If bit 13 of the exiting-event identification is set, the original-event
identification field should be valid.
2. vmx_complete_interrupts() runs immediately after VM exits, reads the
original-event identification, and reinjects the event there.
3. If KVM injects the exception in the exiting-event identification
into the guest, KVM doesn't need to read bit 13 because kvm_multiple_exception()
is "smart enough" to recognize the exception as a nested exception: if
bit 13 is 1, an exception must have been queued in #2.

My question is:
What if the event in the original-event identification is an interrupt, e.g.,
an external interrupt or NMI, rather than an exception? vmx_complete_interrupts()
won't queue an exception, so how can KVM or kvm_multiple_exception() know that
the exception that caused the VM exit is a nested exception w/o reading bit 13
of the exiting-event identification?
  
Li, Xin3 Dec. 8, 2023, 11:48 p.m. UTC | #7
> >> >> > Exiting-event identification can also have bit 13 set, indicating a
> >> >> > nested exception encountered and caused VM-exit. when reinjecting the
> >> >> > exception to guests, kvm needs to set the "nested" bit, right? I
> >> >> > suspect some changes to e.g., handle_exception_nmi() are needed.
> >> >>
> >> >> The current patch relies on kvm_multiple_exception() to do that.  But TBH,
> I'm
> >> >> not sure it can recognize all nested cases.  I probably should revisit it.
> >> >
> >> >So the conclusion is that kvm_multiple_exception() is smart enough, and
> >> >a VMM doesn't have to check bit 13 of the Exiting-event identification.
> >> >
> >> >In FRED spec 5.0, section 9.2 - New VMX Feature: VMX Nested-Exception
> >> >Support, there is a statement at the end of Exiting-event identification:
> >> >
> >> >(The value of this bit is always identical to that of the valid bit of
> >> >the original-event identification field.)
> >> >
> >> >It means that even w/o VMX Nested-Exception support, a VMM already
> knows
> >> >if an exception is a nested exception encountered during delivery of
> >> >another event in an exception caused VM exit (exit reason 0).  This is
> >> >done in KVM through reading IDT_VECTORING_INFO_FIELD and calling
> >> >vmx_complete_interrupts() immediately after VM exits.
> >> >
> >> >vmx_complete_interrupts() simply queues the original exception if there is
> >> >one, and later the nested exception causing the VM exit could be cancelled
> >> >if it is a shadow page fault.  However if the shadow page fault is caused
> >> >by a guest page fault, KVM injects it as a nested exception to have guest
> >> >fix its page table.
> >> >
> >> >I will add comments about this background in the next iteration.
> >>
> >> is it possible that the CPU encounters an exception and causes VM-exit during
> >> injecting an __interrupt__? in this case, no __exception__ will be (re-)queued
> >> by vmx_complete_interrupts().
> >
> >I guess the following case is what you're suggesting:
> >KVM injects an external interrupt after shadow page tables are nuked.
> >
> >vmx_complete_interrupts() are called after each VM exit to clear both
> >interrupt and exception queues, which means it always pushes the
> >deepest event if there is an original event.  In the above case, the
> >original event is the external interrupt KVM just tried to inject.
> 
> in my understanding, your point is:
> 1. if bit 13 of the Exiting-event identification is set. the original-event
> identification field should be valid.
> 2. vmx_complete_interrupts() is done immediately after VM exits and reads
> original-event identification and reinjects the event there.
> 3. if KVM injects the exception in exiting-event identification
> to guest, KVM doesn't need to read the bit 13 because kvm_multiple_exception()
> is "smart enough" and recognize the exception as nested-exception because if
> bit 13 is 1, one exception must has been queued in #2.
> 
> my question is:
> what if the event in original-event identification is an interrupt e.g.,
> external interrupt or NMI, rather than exception.  vmx_complete_interrupts()
> won't queue an exception, then how can KVM or kvm_multiple_exception()
> know the
> exception that caused VM-exit is an nested exception w/o reading bit 13 of the
> Exiting-event identification?

The good news is that vmx_complete_interrupts() still queues the event
even if it's not a hardware exception.  It's just that kvm_multiple_exception()
doesn't check whether there is an original interrupt or NMI because IDT event
delivery doesn't care about such a case.

I think your point is more that we should check it when FRED is enabled
for a guest.  Yes, architecturally we should do it.

What I want to emphasize is that bit 13 of the exiting-event identification
is set to the valid bit of the original-event identification; they are
logically the same thing when FRED is enabled.  It doesn't matter which one
a VMM reads and uses.  But a VMM doesn't need to differentiate FRED and IDT
if it reads the info from original-event identification.
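
To make the equivalence concrete, here is a minimal sketch of the two ways a
VMM could derive the "nested" flag. The helper names are hypothetical and
this is not what the patch implements; the mask names and values follow
arch/x86/include/asm/vmx.h as modified by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INTR_INFO_NESTED_EXCEPTION_MASK	0x2000u		/* bit 13 */
#define INTR_INFO_VALID_MASK		0x80000000u	/* bit 31 */

/*
 * Hypothetical helper: FRED-only view.  Bit 13 of the exiting-event
 * identification is meaningful only with VMX nested-exception support.
 */
static bool nested_from_exit_info(uint32_t exit_intr_info)
{
	return (exit_intr_info & INTR_INFO_VALID_MASK) &&
	       (exit_intr_info & INTR_INFO_NESTED_EXCEPTION_MASK);
}

/*
 * Hypothetical helper: works for both IDT and FRED.  A valid
 * original-event identification means the exiting exception was
 * encountered while delivering another event.
 */
static bool nested_from_vectoring_info(uint32_t idt_vectoring_info)
{
	return (idt_vectoring_info & INTR_INFO_VALID_MASK) != 0;
}
```

The second helper does not depend on the original event being an exception,
so it also covers an exception hit while delivering an interrupt or NMI.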
  

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1e5a6d9439f8..2ae8cc83dbb3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -721,6 +721,7 @@  struct kvm_queued_exception {
 	u32 error_code;
 	unsigned long payload;
 	bool has_payload;
+	bool nested;
 };
 
 struct kvm_vcpu_arch {
@@ -2015,8 +2016,9 @@  int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu);
 void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr);
 void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
 void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr, unsigned long payload);
-void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr);
-void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
+void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr, bool nested);
+void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr,
+			     u32 error_code, bool nested);
 void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
 void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
 				    struct x86_exception *fault);
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 97729248e844..020dfd3f6b44 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -132,6 +132,7 @@ 
 /* VMX_BASIC bits and bitmasks */
 #define VMX_BASIC_32BIT_PHYS_ADDR_ONLY		BIT_ULL(48)
 #define VMX_BASIC_INOUT				BIT_ULL(54)
+#define VMX_BASIC_NESTED_EXCEPTION		BIT_ULL(58)
 
 /* VMX_MISC bits and bitmasks */
 #define VMX_MISC_INTEL_PT			BIT_ULL(14)
@@ -404,8 +405,9 @@  enum vmcs_field {
 #define INTR_INFO_INTR_TYPE_MASK        0x700           /* 10:8 */
 #define INTR_INFO_DELIVER_CODE_MASK     0x800           /* 11 */
 #define INTR_INFO_UNBLOCK_NMI		0x1000		/* 12 */
+#define INTR_INFO_NESTED_EXCEPTION_MASK	0x2000		/* 13 */
 #define INTR_INFO_VALID_MASK            0x80000000      /* 31 */
-#define INTR_INFO_RESVD_BITS_MASK       0x7ffff000
+#define INTR_INFO_RESVD_BITS_MASK       0x7fffd000
 
 #define VECTORING_INFO_VECTOR_MASK           	INTR_INFO_VECTOR_MASK
 #define VECTORING_INFO_TYPE_MASK        	INTR_INFO_INTR_TYPE_MASK
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 712146312358..78a9ff5cfcad 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4047,10 +4047,10 @@  static void svm_complete_interrupts(struct kvm_vcpu *vcpu)
 
 		if (exitintinfo & SVM_EXITINTINFO_VALID_ERR) {
 			u32 err = svm->vmcb->control.exit_int_info_err;
-			kvm_requeue_exception_e(vcpu, vector, err);
+			kvm_requeue_exception_e(vcpu, vector, err, false);
 
 		} else
-			kvm_requeue_exception(vcpu, vector);
+			kvm_requeue_exception(vcpu, vector, false);
 		break;
 	case SVM_EXITINTINFO_TYPE_INTR:
 		kvm_queue_interrupt(vcpu, vector, false);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 67fd4a56d031..518e68ee5a0d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1901,6 +1901,8 @@  static void vmx_inject_exception(struct kvm_vcpu *vcpu)
 				event_data = vcpu->arch.guest_fpu.xfd_err;
 
 			vmcs_write64(INJECTED_EVENT_DATA, event_data);
+
+			intr_info |= ex->nested ? INTR_INFO_NESTED_EXCEPTION_MASK : 0;
 		}
 	}
 
@@ -2851,6 +2853,19 @@  static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	/* IA-32 SDM Vol 3B: 64-bit CPUs always have VMX_BASIC_MSR[48]==0. */
 	if (basic_msr & VMX_BASIC_32BIT_PHYS_ADDR_ONLY)
 		return -EIO;
+
+	/*
+	 * FRED draft Spec 5.0 Section 9.2:
+	 *
+	 * Any processor that enumerates support for FRED transitions
+	 * will also enumerate VMX nested-exception support.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_FRED) &&
+	    !(basic_msr & VMX_BASIC_NESTED_EXCEPTION)) {
+		pr_warn_once("FRED enabled but no VMX nested-exception support\n");
+		if (error_on_inconsistent_vmcs_config)
+			return -EIO;
+	}
 #endif
 
 	/* Require Write-Back (WB) memory type for VMCS accesses. */
@@ -7313,11 +7328,12 @@  static void __vmx_complete_interrupts(struct kvm_vcpu *vcpu,
 			}
 		}
 
-		if (idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK) {
-			u32 err = vmcs_read32(error_code_field);
-			kvm_requeue_exception_e(vcpu, vector, err);
-		} else
-			kvm_requeue_exception(vcpu, vector);
+		if (idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK)
+			kvm_requeue_exception_e(vcpu, vector, vmcs_read32(error_code_field),
+						idt_vectoring_info & INTR_INFO_NESTED_EXCEPTION_MASK);
+		else
+			kvm_requeue_exception(vcpu, vector,
+					      idt_vectoring_info & INTR_INFO_NESTED_EXCEPTION_MASK);
 		break;
 	case INTR_TYPE_SOFT_INTR:
 		vcpu->arch.event_exit_inst_len = vmcs_read32(instr_len_field);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d190bfc63fc4..51c07730f1b6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -645,7 +645,8 @@  static void kvm_leave_nested(struct kvm_vcpu *vcpu)
 
 static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 		unsigned nr, bool has_error, u32 error_code,
-	        bool has_payload, unsigned long payload, bool reinject)
+	        bool has_payload, unsigned long payload,
+		bool reinject, bool nested)
 {
 	u32 prev_nr;
 	int class1, class2;
@@ -678,6 +679,7 @@  static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 			 */
 			WARN_ON_ONCE(kvm_is_exception_pending(vcpu));
 			vcpu->arch.exception.injected = true;
+			vcpu->arch.exception.nested = nested;
 			if (WARN_ON_ONCE(has_payload)) {
 				/*
 				 * For a reinjected event, KVM delivers its
@@ -727,6 +729,8 @@  static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 
 		kvm_queue_exception_e(vcpu, DF_VECTOR, 0);
 	} else {
+		vcpu->arch.exception.nested = true;
+
 		/* replace previous exception with a new one in a hope
 		   that instruction re-execution will regenerate lost
 		   exception */
@@ -736,20 +740,20 @@  static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 
 void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr)
 {
-	kvm_multiple_exception(vcpu, nr, false, 0, false, 0, false);
+	kvm_multiple_exception(vcpu, nr, false, 0, false, 0, false, false);
 }
 EXPORT_SYMBOL_GPL(kvm_queue_exception);
 
-void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr)
+void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr, bool nested)
 {
-	kvm_multiple_exception(vcpu, nr, false, 0, false, 0, true);
+	kvm_multiple_exception(vcpu, nr, false, 0, false, 0, true, nested);
 }
 EXPORT_SYMBOL_GPL(kvm_requeue_exception);
 
 void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr,
 			   unsigned long payload)
 {
-	kvm_multiple_exception(vcpu, nr, false, 0, true, payload, false);
+	kvm_multiple_exception(vcpu, nr, false, 0, true, payload, false, false);
 }
 EXPORT_SYMBOL_GPL(kvm_queue_exception_p);
 
@@ -757,7 +761,7 @@  static void kvm_queue_exception_e_p(struct kvm_vcpu *vcpu, unsigned nr,
 				    u32 error_code, unsigned long payload)
 {
 	kvm_multiple_exception(vcpu, nr, true, error_code,
-			       true, payload, false);
+			       true, payload, false, false);
 }
 
 int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err)
@@ -829,13 +833,13 @@  void kvm_inject_nmi(struct kvm_vcpu *vcpu)
 
 void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code)
 {
-	kvm_multiple_exception(vcpu, nr, true, error_code, false, 0, false);
+	kvm_multiple_exception(vcpu, nr, true, error_code, false, 0, false, false);
 }
 EXPORT_SYMBOL_GPL(kvm_queue_exception_e);
 
-void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code)
+void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code, bool nested)
 {
-	kvm_multiple_exception(vcpu, nr, true, error_code, false, 0, true);
+	kvm_multiple_exception(vcpu, nr, true, error_code, false, 0, true, nested);
 }
 EXPORT_SYMBOL_GPL(kvm_requeue_exception_e);
 
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 60da8cbe6759..63e543c6834b 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -108,6 +108,7 @@  static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu)
 {
 	vcpu->arch.exception.pending = false;
 	vcpu->arch.exception.injected = false;
+	vcpu->arch.exception.nested = false;
 	vcpu->arch.exception_vmexit.pending = false;
 }