[v2] KVM: x86: Service NMI requests after PMI requests in VM-Enter path

Message ID 20231002040839.2630027-1-mizhang@google.com
State New
Series [v2] KVM: x86: Service NMI requests after PMI requests in VM-Enter path

Commit Message

Mingwei Zhang Oct. 2, 2023, 4:08 a.m. UTC
  Service NMI requests after PMI requests in vcpu_enter_guest() so that KVM
does not need to cancel and redo the VM-Enter. Because APIC emulation
"injects" NMIs via KVM_REQ_NMI, handling PMI requests after NMI requests
means KVM won't detect the pending NMI request until the final check for
outstanding requests. Detecting requests at the final stage is costly as
KVM has already loaded guest state, potentially queued events for
injection, disabled IRQs, dropped SRCU, etc., most of which needs to be
unwound.
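
For illustration, here is a minimal standalone C model of the ordering
problem. The names mirror KVM's kvm_check_request(), kvm_pmu_deliver_pmi()
and kvm_request_pending(), but this is a sketch of the control flow, not
kernel code:

    #include <stdbool.h>
    #include <stdio.h>

    enum { REQ_NMI = 1 << 0, REQ_PMI = 1 << 1 };

    static unsigned requests = REQ_PMI;     /* guest PMU counter overflowed */

    static bool check_request(unsigned req) /* models kvm_check_request() */
    {
            if (!(requests & req))
                    return false;
            requests &= ~req;
            return true;
    }

    static void deliver_pmi(void)           /* models kvm_pmu_deliver_pmi() */
    {
            requests |= REQ_NMI;            /* LVTPC is typically set to NMI */
    }

    int main(void)
    {
            /* Old order: NMI checked before PMI. */
            if (check_request(REQ_NMI))
                    puts("NMI serviced");   /* nothing pending yet */
            if (check_request(REQ_PMI))
                    deliver_pmi();          /* raises REQ_NMI too late */

            /* ...guest state loaded, events queued, IRQs disabled... */
            if (requests)                   /* models kvm_request_pending() */
                    puts("pending NMI found late: cancel and redo VM-Enter");
            return 0;
    }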

Move the code snippet processing KVM_REQ_NMI to after the one processing
KVM_REQ_PMI to make KVM vPMU more efficient. Note that changing the order
of the code snippets does not make any functional change to KVM. KVM
allows two pending NMIs, whereas the x86 architecture allows only one to
be pending, because KVM may not be able to process NMIs quickly enough.
Therefore, if a PMI-generated NMI coincides with two other pending NMIs
(or with one other pending NMI while NMIs are blocked), it will be
dropped. That is architecturally OK, since the x86 CPU does not guarantee
the delivery of every NMI.
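
A condensed model of that coalescing behavior, patterned on process_nmi()
in arch/x86/kvm/x86.c (the limit-selection logic is simplified here; a cap
of two is the common case):

    #include <stdio.h>

    #define NMI_LIMIT 2U  /* KVM's cap on pending NMIs in the common case */

    static unsigned nmi_queued, nmi_pending;

    static void inject_nmi(void)  /* models kvm_inject_nmi() */
    {
            nmi_queued++;
    }

    static void process_nmi(void) /* condensed from process_nmi() */
    {
            nmi_pending += nmi_queued;
            nmi_queued = 0;
            if (nmi_pending > NMI_LIMIT)
                    nmi_pending = NMI_LIMIT; /* excess NMIs are dropped */
    }

    int main(void)
    {
            inject_nmi();         /* pending NMI #1 */
            inject_nmi();         /* pending NMI #2 */
            inject_nmi();         /* PMI-generated NMI */
            process_nmi();
            printf("pending NMIs: %u (third coalesced)\n", nmi_pending);
            return 0;
    }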

Running SPEC2017 benchmark programs alongside Intel VTune in a VM shows
that this change eliminates 800~1500 canceled VM-Enters per second.

Some gory details:

Probe the invocation to vmx_cancel_injection():

    $ perf probe -a vmx_cancel_injection
    $ perf stat -a -e probe:vmx_cancel_injection -I 10000  # count per 10-second interval
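
The probe can be removed afterwards with the standard perf CLI (added
here for completeness, not part of the original measurement):

    $ perf probe -d vmx_cancel_injection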

Partial results when SPEC2017 with Intel vtune are running in the VM:

On kernel without the change:
    10.010018010              14254      probe:vmx_cancel_injection
    20.037646388              15207      probe:vmx_cancel_injection
    30.078739816              15261      probe:vmx_cancel_injection
    40.114033258              15085      probe:vmx_cancel_injection
    50.149297460              15112      probe:vmx_cancel_injection
    60.185103088              15104      probe:vmx_cancel_injection

On kernel with the change:
    10.003595390                 40      probe:vmx_cancel_injection
    20.017855682                 31      probe:vmx_cancel_injection
    30.028355883                 34      probe:vmx_cancel_injection
    40.038686298                 31      probe:vmx_cancel_injection
    50.048795162                 20      probe:vmx_cancel_injection
    60.069057747                 19      probe:vmx_cancel_injection

v1: https://lore.kernel.org/all/20230927040939.342643-1-mizhang@google.com/

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
---
 arch/x86/kvm/x86.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


base-commit: 73554b29bd70546c1a9efc9c160641ef1b849358
  

Comments

Sean Christopherson Oct. 20, 2023, 10:56 p.m. UTC | #1
On Mon, 02 Oct 2023 04:08:39 +0000, Mingwei Zhang wrote:
> Service NMI requests after PMI requests in vcpu_enter_guest() so that KVM
> does not need to cancel and redo the VM-Enter. Because APIC emulation
> "injects" NMIs via KVM_REQ_NMI, handling PMI requests after NMI requests
> means KVM won't detect the pending NMI request until the final check for
> outstanding requests. Detecting requests at the final stage is costly as
> KVM has already loaded guest state, potentially queued events for
> injection, disabled IRQs, dropped SRCU, etc., most of which needs to be
> unwound.
> 
> [...]

Applied to kvm-x86 pmu, thanks!

I made a tweak to the code and massaged one part of the changelog.  For the
code, I hoisted PMU/PMI above SMI too, mainly to keep SMI+NMI together, but
also because *technically* the guest could configure LVTPC to send an SMI (LOL).
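
I.e. the resulting order of checks looks roughly like this (a sketch
reconstructed from the description above, not the verbatim committed diff;
the CONFIG_KVM_SMM guard matches the #ifdef visible in the patch context):

    if (kvm_check_request(KVM_REQ_PMU, vcpu))
            kvm_pmu_handle_event(vcpu);
    if (kvm_check_request(KVM_REQ_PMI, vcpu))
            kvm_pmu_deliver_pmi(vcpu);
    #ifdef CONFIG_KVM_SMM
    if (kvm_check_request(KVM_REQ_SMI, vcpu))
            process_smi(vcpu);
    #endif
    if (kvm_check_request(KVM_REQ_NMI, vcpu))
            process_nmi(vcpu);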

Regarding the changelog, I replaced the justification about correctness with
this:

    Note that changing the order of request processing doesn't change the end
    result, as KVM's final check for outstanding requests prevents entering
    the guest until all requests are serviced.  I.e. KVM will ultimately
    coalesce events (or not) regardless of the ordering.
    
The architectural behavior of NMIs and KVM's unintuitive simultaneous-NMI
handling simply don't matter as far as this patch is concerned, especially
when considering the SMI technicality.  E.g. the net effect would be the same
even if KVM allowed only a single pending NMI.

Please holler if you disagree with either/both of the above changes.

[1/1] KVM: x86: Service NMI requests after PMI requests in VM-Enter path
      https://github.com/kvm-x86/linux/commit/4b09cc132a59

--
https://github.com/kvm-x86/linux/tree/next
  
Mingwei Zhang Oct. 21, 2023, 2:36 a.m. UTC | #2
On Fri, Oct 20, 2023 at 3:58 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, 02 Oct 2023 04:08:39 +0000, Mingwei Zhang wrote:
> > Service NMI requests after PMI requests in vcpu_enter_guest() so that KVM
> > does not need to cancel and redo the VM-Enter. Because APIC emulation
> > "injects" NMIs via KVM_REQ_NMI, handling PMI requests after NMI requests
> > means KVM won't detect the pending NMI request until the final check for
> > outstanding requests. Detecting requests at the final stage is costly as
> > KVM has already loaded guest state, potentially queued events for
> > injection, disabled IRQs, dropped SRCU, etc., most of which needs to be
> > unwound.
> >
> > [...]
>
> Applied to kvm-x86 pmu, thanks!
>
> I made a tweak to the code and massaged one part of the changelog.  For the
> code, I hoisted PMU/PMI above SMI too, mainly to keep SMI+NMI together, but
> also because *technically* the guest could configure LVTPC to send an SMI (LOL).
>
> Regarding the changelog, I replaced the justification about correctness with
> this:
>
>     Note that changing the order of request processing doesn't change the end
>     result, as KVM's final check for outstanding requests prevents entering
>     the guest until all requests are serviced.  I.e. KVM will ultimately
>     coalesce events (or not) regardless of the ordering.
>
> The architectural behavior of NMIs and KVM's unintuitive simultaneous-NMI
> handling simply don't matter as far as this patch is concerned, especially
> when considering the SMI technicality.  E.g. the net effect would be the same
> even if KVM allowed only a single pending NMI.
>
> Please holler if you disagree with either/both of the above changes.

That works for me. Initially, I thought you would touch the SMI code
(if so, I would have pushed back). Nothing like that shows up in the
change, so LGTM.

Thanks.
-Mingwei
>
> [1/1] KVM: x86: Service NMI requests after PMI requests in VM-Enter path
>       https://github.com/kvm-x86/linux/commit/4b09cc132a59
>
> --
> https://github.com/kvm-x86/linux/tree/next
  

Patch

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 42a4e8f5e89a..302b6f8ddfb1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10580,12 +10580,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		if (kvm_check_request(KVM_REQ_SMI, vcpu))
 			process_smi(vcpu);
 #endif
-		if (kvm_check_request(KVM_REQ_NMI, vcpu))
-			process_nmi(vcpu);
 		if (kvm_check_request(KVM_REQ_PMU, vcpu))
 			kvm_pmu_handle_event(vcpu);
 		if (kvm_check_request(KVM_REQ_PMI, vcpu))
 			kvm_pmu_deliver_pmi(vcpu);
+		if (kvm_check_request(KVM_REQ_NMI, vcpu))
+			process_nmi(vcpu);
 		if (kvm_check_request(KVM_REQ_IOAPIC_EOI_EXIT, vcpu)) {
 			BUG_ON(vcpu->arch.pending_ioapic_eoi > 255);
 			if (test_bit(vcpu->arch.pending_ioapic_eoi,