[v4,0/4] KVM: x86/mmu: Pre-check for mmu_notifier retry

Message ID 20240209222858.396696-1-seanjc@google.com

Message

Sean Christopherson Feb. 9, 2024, 10:28 p.m. UTC
Retry page faults without acquiring mmu_lock, and potentially even without
resolving a pfn, if the gfn is covered by an active invalidation.  This
avoids resource and lock contention, which can be especially beneficial
for preemptible kernels as KVM can get stuck bouncing mmu_lock between a
vCPU and the invalidation task the vCPU is waiting on to finish.
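
As a rough illustration of the idea above (not the code from this series), the
lockless pre-check can be modeled in userspace C as below. The struct, its
fields, and should_retry_fault() are hypothetical names invented for this
sketch; the model is single-threaded, so it omits the READ_ONCE()-style
accesses and memory ordering a real implementation would need.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-VM invalidation state. */
struct inval_state {
	bool     in_progress;  /* an invalidation is currently running */
	uint64_t range_start;  /* first gfn covered by it (inclusive) */
	uint64_t range_end;    /* end of the covered range (exclusive) */
	uint64_t seq;          /* changes whenever an invalidation finishes */
};

/*
 * Lockless pre-check (assumed shape, for illustration only).
 *
 * Returns true if the fault on @gfn should be retried right away, i.e.
 * without resolving a pfn and without taking the lock.  The answer may be
 * stale by the time the caller acts on it, so a "false" result is only a
 * hint: the authoritative check still has to be repeated under the lock.
 */
static bool should_retry_fault(const struct inval_state *s,
			       uint64_t seq_snapshot, uint64_t gfn)
{
	/* The gfn is covered by an active invalidation: bail out early. */
	if (s->in_progress && gfn >= s->range_start && gfn < s->range_end)
		return true;

	/* An invalidation completed since the snapshot was taken. */
	return s->seq != seq_snapshot;
}
```

In this model a fault handler would snapshot seq, call should_retry_fault()
before doing any expensive work, and return "retry" to the vCPU on a true
result instead of contending for the lock held by the invalidation task.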

v4: 
 - Pre-check for retry before resolving the pfn, too. [Yan]
 - Add a patch to fix a private/shared vs. memslot validity check
   priority inversion bug.
 - Refactor kvm_faultin_pfn() to clean up the handling of noslot faults.

v3:
 - https://lkml.kernel.org/r/20240203003518.387220-1-seanjc%40google.com
 - Release the pfn, i.e. put the struct page reference if one was held,
   as the caller doesn't expect to get a reference on "failure". [Yuan]
 - Fix a typo in the comment.

v2:
 - Introduce a dedicated helper and collapse to a single patch (because
   adding an unused helper would be quite silly).
 - Add a comment to explain the "unsafe" check in kvm_faultin_pfn(). [Kai]
 - Add Kai's Ack.

v1: https://lore.kernel.org/all/20230825020733.2849862-1-seanjc@google.com

Sean Christopherson (4):
  KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is
    changing
  KVM: x86/mmu: Move private vs. shared check above slot validity checks
  KVM: x86/mmu: Move slot checks from __kvm_faultin_pfn() to
    kvm_faultin_pfn()
  KVM: x86/mmu: Handle no-slot faults at the beginning of
    kvm_faultin_pfn()

 arch/x86/kvm/mmu/mmu.c          | 134 ++++++++++++++++++++++----------
 arch/x86/kvm/mmu/mmu_internal.h |   5 +-
 include/linux/kvm_host.h        |  26 +++++++
 3 files changed, 122 insertions(+), 43 deletions(-)


base-commit: 60eedcfceda9db46f1b333e5e1aa9359793f04fb
  

Comments

Friedrich Weber Feb. 14, 2024, 1:17 p.m. UTC | #1
On 09/02/2024 23:28, Sean Christopherson wrote:
> Retry page faults without acquiring mmu_lock, and potentially even without
> resolving a pfn, if the gfn is covered by an active invalidation.  This
> avoids resource and lock contention, which can be especially beneficial
> for preemptible kernels as KVM can get stuck bouncing mmu_lock between a
> vCPU and the invalidation task the vCPU is waiting on to finish.
> 
> v4: 
>  - Pre-check for retry before resolving the pfn, too. [Yan]
>  - Add a patch to fix a private/shared vs. memslot validity check
>    priority inversion bug.
>  - Refactor kvm_faultin_pfn() to clean up the handling of noslot faults.

Can confirm that v4 also fixes the temporary guest hangs [1] I'm seeing
in combination with KSM and NUMA balancing:
* On 60eedcfc, the reproducer [1] triggers temporary hangs
* With the four patches applied on top of 60eedcfc, the reproducer does
not trigger hangs

Thanks a lot for looking into this!

[1]
https://lore.kernel.org/kvm/832697b9-3652-422d-a019-8c0574a188ac@proxmox.com/