Message ID | 20230306224127.1689967-1-vipinsh@google.com |
---|---|
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp2114591wrd; Mon, 6 Mar 2023 14:46:03 -0800 (PST) X-Google-Smtp-Source: AK7set833uiwce2u4FW3DD1kfzdrzUrjstIO5BA3NHcO82RKVG4qUZKmyneNpmuLaBtSycee+HS3 X-Received: by 2002:a17:907:7244:b0:8e7:916f:193d with SMTP id ds4-20020a170907724400b008e7916f193dmr15907364ejc.28.1678142763061; Mon, 06 Mar 2023 14:46:03 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1678142763; cv=none; d=google.com; s=arc-20160816; b=n41x5sxNfnz6RuFdcwoh1kDPcURA0zxaXqTBWQJG5U0hf7LmWApedMfeMd1NgYsF21 ICJQhzhhVBQkBUKGvmRihecGDNDhT9VKmVHyP1ST2g+tBzLtVPh4eb8JmqFmMSYIdwqF 6qsMaY2BqHT4dWFpP0WVdE/l8EiYdDfGLpw9rEJmiWKT6KBszju675OmcYO6jdutlhDc esecED4BVeItyrsLGF15BjNuQZNG0aBqeuoLmpfCkzQr3vIp6Plg3fg6pA3HCLj5V0H6 Q0nHa9cOKqY6PT6L6HWHnqajHmX3loW0ZafB4dgTSMju4o6VD/VNb2tCftPdhhJ0xBpX DvLg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:mime-version:date :dkim-signature; bh=bFNJWG6qjm2MPxLP7ev67uNQscemoGBQSlcIHBm/wyc=; b=bOKr6QyFDee/069tHfMjCAtWrOqjq7E3yWTJv6N4sasOchvHJhYnMuUgjq13nCLWO+ vvqyevftIoRAaou7Dld1xVbc66K+QWaKOs/eTyKtykoHTwH0D37fCcxYm+1YNTnejzxS 4PeHECNZN0clBBJG9HGkn3KZ0OQoo16SiYyCjve1iUWjeFO8ygb5dtcPOx1fYzhfmNAi a7qUnHkSrcqXOSQwVHwsum1QpvBm5g3qADmr4XPhvNfgs2xX6/+2y/U+GKIJgKGqM8MC NU3XI8xI6MIb0cM4i+y8WNt6WTCidFHug0ZD3tw7WjRXrYI9RNgkU1lIdKPiuAZYUNhg mI5w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=C7ey9k8k; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id h26-20020a50ed9a000000b004acbe796d63si11791389edr.463.2023.03.06.14.44.27; Mon, 06 Mar 2023 14:46:03 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=C7ey9k8k; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230101AbjCFWll (ORCPT <rfc822;toshivichauhan@gmail.com> + 99 others); Mon, 6 Mar 2023 17:41:41 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42978 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229718AbjCFWlk (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 6 Mar 2023 17:41:40 -0500 Received: from mail-pf1-x44a.google.com (mail-pf1-x44a.google.com [IPv6:2607:f8b0:4864:20::44a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C267874A74 for <linux-kernel@vger.kernel.org>; Mon, 6 Mar 2023 14:41:34 -0800 (PST) Received: by mail-pf1-x44a.google.com with SMTP id y35-20020a056a00182300b005e8e2c6afe2so6186785pfa.12 for <linux-kernel@vger.kernel.org>; Mon, 06 Mar 2023 14:41:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1678142494; h=cc:to:from:subject:message-id:mime-version:date:from:to:cc:subject :date:message-id:reply-to; bh=bFNJWG6qjm2MPxLP7ev67uNQscemoGBQSlcIHBm/wyc=; b=C7ey9k8kuVCYIYxg7pb/31RB4zdzj0lDq+3l5WlFPkoz5lEb4hLTDreF0q/eEjxDM8 bBip4kKF/B/Fzc+u4bI9iSMbiEbYNzCMn64/UPrjVW7/6EO5wcixYlAA+waXBdkR17lB EIZPnmJeuxODV5D50rWDNHTVpcp8LCE8875YkAU5IZTO+uHE02KGiKtqNy64GPQdMTXI YmM1E60e9EjAzPfrEs0O1dUCkHSCQWcYn++QXl/HAHXSUsOhoVu0h0NCAq/LdgI/sk/L ZrJn9jbv6c8bEspACsek8WeEVXfEJRc98+PQCxJyfclbW1ekDOaMbC9V6ed9Oqru+YAt 40kA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1678142494; h=cc:to:from:subject:message-id:mime-version:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=bFNJWG6qjm2MPxLP7ev67uNQscemoGBQSlcIHBm/wyc=; b=vriQnmoDWJVumQAXV94dWgUemwLkOkJeM/14JLGZBOrpCPkjTfHkb3bFEVASq10/Tp S2PzNC/4SCR5dPwQMJiyAPLJJRXt3ypL8XLyH8BjHOYnUrBbzy/SjkdvFbPDTjvM6tsA KLh49IJrM86AxcG+CuOb50QzqP7vSS3W8VQWz4K8mz16aSFm5gJDah28Cal+pWFtlGpg 5h2PTZAJb+pISOHlKHu/Zaja2hr8RIfuPrelSZMS9YdHj+EHNJwDlkaH8eu1fyfrR7AJ HGnu7SP6Zh/gU7YIC+wbrJQpXI2vQNR7LDIidRyR0Nj71ObLNwv8QRjzf3twadOzrWlA sDhA== X-Gm-Message-State: AO0yUKXSw3Usv3FPNCYEFC3JZ8HD1X0aOcMUX7Nfh3k3eUj7rk8emFt4 mlorjdQFxYqIYqwz3H5UOcEQTKh3eC+o X-Received: from vipin.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:479f]) (user=vipinsh job=sendgmr) by 2002:a62:8307:0:b0:5a8:4dc1:5916 with SMTP id h7-20020a628307000000b005a84dc15916mr5433004pfe.2.1678142494296; Mon, 06 Mar 2023 14:41:34 -0800 (PST) Date: Mon, 6 Mar 2023 14:41:09 -0800 Mime-Version: 1.0 X-Mailer: git-send-email 2.40.0.rc0.216.gc4246ad0f0-goog Message-ID: <20230306224127.1689967-1-vipinsh@google.com> Subject: [Patch v4 00/18] NUMA aware page table allocation From: Vipin Sharma <vipinsh@google.com> To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com Cc: jmattson@google.com, mizhang@google.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma <vipinsh@google.com> Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1759660226158396935?= X-GMAIL-MSGID: =?utf-8?q?1759660226158396935?= |
Series |
NUMA aware page table allocation
|
|
Message
Vipin Sharma
March 6, 2023, 10:41 p.m. UTC
Hi, This series build up based on the feedback on v3. Biggest change in features is to enable NUMA aware page table per VM basis instead of using a module parameter for all VMs on a host. This was decided based on an internal discussion to avoid forcing all VMs to be NUMA aware on a host. We need to collect more data to see how much performance degradation a VM can get in negative testing, where vCPUs in VM are always accessing remote NUMA nodes memory instead of staying local compared to a VM which is not NUMA aware. There are other changes which are mentioned in the change log below for v4. Thanks Vipin v4: - Removed module parameter for enabling NUMA aware page table. - Added new capability KVM_CAP_NUMA_AWARE_PAGE_TABLE to enable this feature per VM. - Added documentation for the new capability. - Holding mutex just before the top up and releasing it after the fault/split is addressed. Previous version were using spinlocks two times, first time for topup and second time fetching the page from cache. - Using the existing slots_lock for split_shadow_page_cache operations. - KVM MMU shrinker will also shrink mm_shadow_info_cache besides split_shadow_page_cache and mmu_shadow_page_cache. - Reduced cache default size to 4. - Split patches into smaller ones. v3: https://lore.kernel.org/lkml/20221222023457.1764-1-vipinsh@google.com/ - Split patches into smaller ones. - Repurposed KVM MMU shrinker to free cache pages instead of oldest page table pages - Reduced cache size from 40 to 5 - Removed __weak function and initializing node value in all architectures. - Some name changes. v2: https://lore.kernel.org/lkml/20221201195718.1409782-1-vipinsh@google.com/ - All page table pages will be allocated on underlying physical page's NUMA node. - Introduced module parameter, numa_aware_pagetable, to disable this feature. - Using kvm_pfn_to_refcounted_page to get page from a pfn. v1: https://lore.kernel.org/all/20220801151928.270380-1-vipinsh@google.com/ Vipin Sharma (18): KVM: x86/mmu: Change KVM mmu shrinker to no-op KVM: x86/mmu: Remove zapped_obsolete_pages from struct kvm_arch{} KVM: x86/mmu: Track count of pages in KVM MMU page caches globally KVM: x86/mmu: Shrink shadow page caches via MMU shrinker KVM: x86/mmu: Add split_shadow_page_cache pages to global count of MMU cache pages KVM: x86/mmu: Shrink split_shadow_page_cache via MMU shrinker KVM: x86/mmu: Unconditionally count allocations from MMU page caches KVM: x86/mmu: Track unused mmu_shadowed_info_cache pages count via global counter KVM: x86/mmu: Shrink mmu_shadowed_info_cache via MMU shrinker KVM: x86/mmu: Add per VM NUMA aware page table capability KVM: x86/mmu: Add documentation of NUMA aware page table capability KVM: x86/mmu: Allocate NUMA aware page tables on TDP huge page splits KVM: mmu: Add common initialization logic for struct kvm_mmu_memory_cache{} KVM: mmu: Initialize kvm_mmu_memory_cache.gfp_zero to __GFP_ZERO by default KVM: mmu: Add NUMA node support in struct kvm_mmu_memory_cache{} KVM: x86/mmu: Allocate numa aware page tables during page fault KVM: x86/mmu: Allocate shadow mmu page table on huge page split on the same NUMA node KVM: x86/mmu: Reduce default mmu memory cache size Documentation/virt/kvm/api.rst | 29 +++ arch/arm64/kvm/arm.c | 2 +- arch/arm64/kvm/mmu.c | 2 +- arch/mips/kvm/mips.c | 3 + arch/riscv/kvm/mmu.c | 8 +- arch/riscv/kvm/vcpu.c | 2 +- arch/x86/include/asm/kvm_host.h | 17 +- arch/x86/include/asm/kvm_types.h | 6 +- arch/x86/kvm/mmu/mmu.c | 319 +++++++++++++++++++------------ arch/x86/kvm/mmu/mmu_internal.h | 38 ++++ arch/x86/kvm/mmu/paging_tmpl.h | 29 +-- arch/x86/kvm/mmu/tdp_mmu.c | 23 ++- arch/x86/kvm/x86.c | 18 +- include/linux/kvm_host.h | 2 + include/linux/kvm_types.h | 21 ++ include/uapi/linux/kvm.h | 1 + virt/kvm/kvm_main.c | 24 ++- 17 files changed, 386 insertions(+), 158 deletions(-)
Comments
On Mon, Mar 06, 2023, Vipin Sharma wrote: > Hi, > > This series build up based on the feedback on v3. > > Biggest change in features is to enable NUMA aware page table per VM > basis instead of using a module parameter for all VMs on a host. This > was decided based on an internal discussion to avoid forcing all VMs to > be NUMA aware on a host. We need to collect more data to see how much > performance degradation a VM can get in negative testing, where vCPUs in > VM are always accessing remote NUMA nodes memory instead of staying > local compared to a VM which is not NUMA aware. > > There are other changes which are mentioned in the change log below for > v4. > > Thanks > Vipin > > v4: > - Removed module parameter for enabling NUMA aware page table. Could you have a space before the dash? I think the mutt mistakenly treats it as a 'diff' where you removes a line. > - Added new capability KVM_CAP_NUMA_AWARE_PAGE_TABLE to enable this > feature per VM. > - Added documentation for the new capability. > - Holding mutex just before the top up and releasing it after the > fault/split is addressed. Previous version were using spinlocks two > times, first time for topup and second time fetching the page from > cache. > - Using the existing slots_lock for split_shadow_page_cache operations. > - KVM MMU shrinker will also shrink mm_shadow_info_cache besides > split_shadow_page_cache and mmu_shadow_page_cache. > - Reduced cache default size to 4. > - Split patches into smaller ones. > > v3: https://lore.kernel.org/lkml/20221222023457.1764-1-vipinsh@google.com/ > - Split patches into smaller ones. > - Repurposed KVM MMU shrinker to free cache pages instead of oldest page table > pages > - Reduced cache size from 40 to 5 > - Removed __weak function and initializing node value in all architectures. > - Some name changes. > > v2: https://lore.kernel.org/lkml/20221201195718.1409782-1-vipinsh@google.com/ > - All page table pages will be allocated on underlying physical page's > NUMA node. > - Introduced module parameter, numa_aware_pagetable, to disable this > feature. > - Using kvm_pfn_to_refcounted_page to get page from a pfn. > > v1: https://lore.kernel.org/all/20220801151928.270380-1-vipinsh@google.com/ > > Vipin Sharma (18): > KVM: x86/mmu: Change KVM mmu shrinker to no-op > KVM: x86/mmu: Remove zapped_obsolete_pages from struct kvm_arch{} > KVM: x86/mmu: Track count of pages in KVM MMU page caches globally > KVM: x86/mmu: Shrink shadow page caches via MMU shrinker > KVM: x86/mmu: Add split_shadow_page_cache pages to global count of MMU > cache pages > KVM: x86/mmu: Shrink split_shadow_page_cache via MMU shrinker > KVM: x86/mmu: Unconditionally count allocations from MMU page caches > KVM: x86/mmu: Track unused mmu_shadowed_info_cache pages count via > global counter > KVM: x86/mmu: Shrink mmu_shadowed_info_cache via MMU shrinker > KVM: x86/mmu: Add per VM NUMA aware page table capability > KVM: x86/mmu: Add documentation of NUMA aware page table capability > KVM: x86/mmu: Allocate NUMA aware page tables on TDP huge page splits > KVM: mmu: Add common initialization logic for struct > kvm_mmu_memory_cache{} > KVM: mmu: Initialize kvm_mmu_memory_cache.gfp_zero to __GFP_ZERO by > default > KVM: mmu: Add NUMA node support in struct kvm_mmu_memory_cache{} > KVM: x86/mmu: Allocate numa aware page tables during page fault > KVM: x86/mmu: Allocate shadow mmu page table on huge page split on the > same NUMA node > KVM: x86/mmu: Reduce default mmu memory cache size > > Documentation/virt/kvm/api.rst | 29 +++ > arch/arm64/kvm/arm.c | 2 +- > arch/arm64/kvm/mmu.c | 2 +- > arch/mips/kvm/mips.c | 3 + > arch/riscv/kvm/mmu.c | 8 +- > arch/riscv/kvm/vcpu.c | 2 +- > arch/x86/include/asm/kvm_host.h | 17 +- > arch/x86/include/asm/kvm_types.h | 6 +- > arch/x86/kvm/mmu/mmu.c | 319 +++++++++++++++++++------------ > arch/x86/kvm/mmu/mmu_internal.h | 38 ++++ > arch/x86/kvm/mmu/paging_tmpl.h | 29 +-- > arch/x86/kvm/mmu/tdp_mmu.c | 23 ++- > arch/x86/kvm/x86.c | 18 +- > include/linux/kvm_host.h | 2 + > include/linux/kvm_types.h | 21 ++ > include/uapi/linux/kvm.h | 1 + > virt/kvm/kvm_main.c | 24 ++- > 17 files changed, 386 insertions(+), 158 deletions(-) > > -- > 2.40.0.rc0.216.gc4246ad0f0-goog > May I know your base? It seems I cannot apply the series to kvm/master or kvm/queue without manual manipulation.
On Tue, Mar 7, 2023 at 10:19 AM Mingwei Zhang <mizhang@google.com> wrote: > > On Mon, Mar 06, 2023, Vipin Sharma wrote: > > v4: > > - Removed module parameter for enabling NUMA aware page table. > > Could you have a space before the dash? I think the mutt mistakenly > treats it as a 'diff' where you removes a line. From next version I will add a space before dash. > > -- > > 2.40.0.rc0.216.gc4246ad0f0-goog > > > > May I know your base? It seems I cannot apply the series to kvm/master > or kvm/queue without manual manipulation. My patch series is on the latest kvm/queue branch which is currently on commit 45dd9bc75d9a ("KVM: SVM: hyper-v: placate modpost section mismatch error") What manual manipulation do you have to do to apply this series?