[v4,00/18] NUMA aware page table allocation

Message ID 20230306224127.1689967-1-vipinsh@google.com

Message

Vipin Sharma March 6, 2023, 10:41 p.m. UTC
  Hi,

This series builds on the feedback from v3.

The biggest feature change is that NUMA aware page tables are now
enabled on a per-VM basis instead of via a module parameter covering
all VMs on a host. This was decided after an internal discussion, to
avoid forcing every VM on a host to be NUMA aware. We still need to
collect more data on how much performance degradation a VM can see in
negative testing, where its vCPUs always access memory on remote NUMA
nodes instead of staying local, compared to a VM which is not NUMA
aware.

Other changes are listed in the v4 change log below.

Thanks
Vipin

v4:
- Removed module parameter for enabling NUMA aware page table.
- Added new capability KVM_CAP_NUMA_AWARE_PAGE_TABLE to enable this
  feature per VM (see the usage sketch after this list).
- Added documentation for the new capability.
- Holding a mutex just before the top-up and releasing it after the
  fault/split is addressed (see the locking sketch after this list).
  The previous version acquired a spinlock twice: once for the top-up
  and again to fetch the page from the cache.
- Using the existing slots_lock for split_shadow_page_cache operations.
- KVM MMU shrinker will also shrink mmu_shadowed_info_cache besides
  split_shadow_page_cache and mmu_shadow_page_cache.
- Reduced cache default size to 4.
- Split patches into smaller ones.
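
A minimal sketch of how userspace might enable the new capability,
using the standard KVM_ENABLE_CAP ioctl on the VM file descriptor.
Treat this as illustrative only; whether the capability takes any
arguments is defined by the series itself:

  #include <linux/kvm.h>
  #include <sys/ioctl.h>

  static int enable_numa_aware_page_tables(int vm_fd)
  {
          struct kvm_enable_cap cap = {
                  .cap = KVM_CAP_NUMA_AWARE_PAGE_TABLE,
          };

          /* Per-VM opt-in, replacing the v3 module parameter. */
          return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
  }

And a sketch of the locking change around the cache top-up; the names
mmu_cache_lock and min_pages are hypothetical. The point is that one
mutex now covers both the top-up and the consumption of cache pages:

  mutex_lock(&kvm->arch.mmu_cache_lock);
  r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
                                 min_pages);
  /* On success, resolve the fault/split, drawing pages from the cache. */
  mutex_unlock(&kvm->arch.mmu_cache_lock);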

v3: https://lore.kernel.org/lkml/20221222023457.1764-1-vipinsh@google.com/
- Split patches into smaller ones.
- Repurposed KVM MMU shrinker to free cache pages instead of the
  oldest page table pages (see the shrinker sketch after this list).
- Reduced cache size from 40 to 5.
- Removed the __weak function; the node value is now initialized in
  all architectures.
- Some name changes.
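
A rough sketch of the repurposed shrinker shape, following the
kernel's struct shrinker API; the global counter name is hypothetical
and the scan body is elided to a comment:

  static struct percpu_counter kvm_total_unused_cached_pages;

  static unsigned long mmu_shrink_count(struct shrinker *shrink,
                                        struct shrink_control *sc)
  {
          /* Report unused cache pages instead of active page tables. */
          return percpu_counter_read_positive(&kvm_total_unused_cached_pages);
  }

  static unsigned long mmu_shrink_scan(struct shrinker *shrink,
                                       struct shrink_control *sc)
  {
          unsigned long freed = 0;

          /*
           * Walk the VM list and free unused pages from the MMU page
           * caches, honoring sc->nr_to_scan; the caches' locks must
           * be taken, which is what the v4 mutex changes enable.
           */
          return freed;
  }

  static struct shrinker mmu_shrinker = {
          .count_objects = mmu_shrink_count,
          .scan_objects  = mmu_shrink_scan,
          .seeks         = DEFAULT_SEEKS,
  };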

v2: https://lore.kernel.org/lkml/20221201195718.1409782-1-vipinsh@google.com/
- All page table pages will be allocated on the underlying physical
  page's NUMA node.
- Introduced a module parameter, numa_aware_pagetable, to disable this
  feature.
- Using kvm_pfn_to_refcounted_page() to get the page for a pfn (see
  the node-lookup sketch below).
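
A minimal sketch of deriving the NUMA node from the faulting pfn.
kvm_pfn_to_refcounted_page() and page_to_nid() are existing kernel
helpers; the wrapper name and the fallback policy are illustrative:

  static int page_table_nid_for_pfn(kvm_pfn_t pfn)
  {
          struct page *page = kvm_pfn_to_refcounted_page(pfn);

          /* Non-refcounted pfns have no struct page; stay local. */
          if (!page)
                  return numa_mem_id();

          return page_to_nid(page);
  }

  /* The page table page is then allocated on that node, e.g.: */
  page = alloc_pages_node(nid, GFP_KERNEL_ACCOUNT | __GFP_ZERO, 0);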

v1: https://lore.kernel.org/all/20220801151928.270380-1-vipinsh@google.com/

Vipin Sharma (18):
  KVM: x86/mmu: Change KVM mmu shrinker to no-op
  KVM: x86/mmu: Remove zapped_obsolete_pages from struct kvm_arch{}
  KVM: x86/mmu: Track count of pages in KVM MMU page caches globally
  KVM: x86/mmu: Shrink shadow page caches via MMU shrinker
  KVM: x86/mmu: Add split_shadow_page_cache pages to global count of MMU
    cache pages
  KVM: x86/mmu: Shrink split_shadow_page_cache via MMU shrinker
  KVM: x86/mmu: Unconditionally count allocations from MMU page caches
  KVM: x86/mmu: Track unused mmu_shadowed_info_cache pages count via
    global counter
  KVM: x86/mmu: Shrink mmu_shadowed_info_cache via MMU shrinker
  KVM: x86/mmu: Add per VM NUMA aware page table capability
  KVM: x86/mmu: Add documentation of NUMA aware page table capability
  KVM: x86/mmu: Allocate NUMA aware page tables on TDP huge page splits
  KVM: mmu: Add common initialization logic for struct
    kvm_mmu_memory_cache{}
  KVM: mmu: Initialize kvm_mmu_memory_cache.gfp_zero to __GFP_ZERO by
    default
  KVM: mmu: Add NUMA node support in struct kvm_mmu_memory_cache{}
  KVM: x86/mmu: Allocate numa aware page tables during page fault
  KVM: x86/mmu: Allocate shadow mmu page table on huge page split on the
    same NUMA node
  KVM: x86/mmu: Reduce default mmu memory cache size

 Documentation/virt/kvm/api.rst   |  29 +++
 arch/arm64/kvm/arm.c             |   2 +-
 arch/arm64/kvm/mmu.c             |   2 +-
 arch/mips/kvm/mips.c             |   3 +
 arch/riscv/kvm/mmu.c             |   8 +-
 arch/riscv/kvm/vcpu.c            |   2 +-
 arch/x86/include/asm/kvm_host.h  |  17 +-
 arch/x86/include/asm/kvm_types.h |   6 +-
 arch/x86/kvm/mmu/mmu.c           | 319 +++++++++++++++++++------------
 arch/x86/kvm/mmu/mmu_internal.h  |  38 ++++
 arch/x86/kvm/mmu/paging_tmpl.h   |  29 +--
 arch/x86/kvm/mmu/tdp_mmu.c       |  23 ++-
 arch/x86/kvm/x86.c               |  18 +-
 include/linux/kvm_host.h         |   2 +
 include/linux/kvm_types.h        |  21 ++
 include/uapi/linux/kvm.h         |   1 +
 virt/kvm/kvm_main.c              |  24 ++-
 17 files changed, 386 insertions(+), 158 deletions(-)
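
To make the kvm_mmu_memory_cache{} changes in the "KVM: mmu:" patches
above concrete, here is an illustrative sketch of the common
initialization with a NUMA node field. The gfp_zero = __GFP_ZERO
default is stated by the patch titles; the NUMA_NO_NODE default and
the field placement are assumptions, not the series' exact code:

  struct kvm_mmu_memory_cache {
          gfp_t gfp_zero;
          int node;
          /* ... existing fields: kmem_cache, nobjs, objects[] ... */
  };

  #define INIT_KVM_MMU_MEMORY_CACHE(_cache)                \
          do {                                             \
                  (_cache)->gfp_zero = __GFP_ZERO;         \
                  (_cache)->node = NUMA_NO_NODE;           \
          } while (0)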
  

Comments

Mingwei Zhang March 7, 2023, 6:19 p.m. UTC | #1
On Mon, Mar 06, 2023, Vipin Sharma wrote:
> [...]
> v4:
> - Removed module parameter for enabling NUMA aware page table.

Could you have a space before the dash? I think mutt mistakenly
treats it as a 'diff' where you remove a line.
> [...]
> -- 
> 2.40.0.rc0.216.gc4246ad0f0-goog
> 

May I know your base? It seems I cannot apply the series to kvm/master
or kvm/queue without manual manipulation.
  
Vipin Sharma March 7, 2023, 6:33 p.m. UTC | #2
On Tue, Mar 7, 2023 at 10:19 AM Mingwei Zhang <mizhang@google.com> wrote:
>
> On Mon, Mar 06, 2023, Vipin Sharma wrote:

> > v4:
> > - Removed module parameter for enabling NUMA aware page table.
>
> Could you have a space before the dash? I think mutt mistakenly
> treats it as a 'diff' where you remove a line.

From the next version on, I will add a space before the dash.


> > --
> > 2.40.0.rc0.216.gc4246ad0f0-goog
> >
>
> May I know your base? It seems I cannot apply the series to kvm/master
> or kvm/queue without manual manipulation.

My patch series is based on the latest kvm/queue branch, which is
currently at commit 45dd9bc75d9a ("KVM: SVM: hyper-v: placate modpost
section mismatch error").

What manual manipulation do you have to do to apply this series?