From patchwork Tue Aug 8 07:13:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13301 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:c44e:0:b0:3f2:4152:657d with SMTP id w14csp2299915vqr; Tue, 8 Aug 2023 11:03:57 -0700 (PDT) X-Google-Smtp-Source: AGHT+IElv5rRpR7iEH1+eheYVGaiLHJ/Do19pgBpLkiTPkE9QEoA7zifIvE1TRNmRw+7kxoyDdaD X-Received: by 2002:a05:6a20:8f09:b0:132:cd2d:16fd with SMTP id b9-20020a056a208f0900b00132cd2d16fdmr289824pzk.38.1691517837333; Tue, 08 Aug 2023 11:03:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1691517837; cv=none; d=google.com; s=arc-20160816; b=si2ZEJImPwIxr2Tq/8AoGAU2FkSatrCTEVVfnUMh9GnxUWy3J0G1gXR55n+ZXLnhEA LiNAxQNKFcUXtewNKssZv39bouCMW5lpb0i6S1yPy9/90BzUYIacQcQHUi/MRMKqtCSx a0fVIP9Aplo1t3+xsl0OGCWh6ThpC7cESu56vc1LglCr1hHV0f+Ta/V9ccJ1b8n9igX+ HJdbifK7HFsVSOeDQz3jK+VPloQrSoneyQ6L+jRsy1J0lYtpTf/F1r5u1V6hMU3BJ4yL OH2T7i4Tfbo1La/r9Cwx8WY8c/n/Qtdrg1Xi74A+tVmQR5tn61zeTAho866vcuBVBwtm v8lg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:message-id:date:subject:cc:to:from :dkim-signature; bh=yb3PScxP/yXgdF6uacGEI8JsPThoaWGXGlLFr70kzOQ=; fh=66ZP7DriuAjqdI9o/zCDiI1Ib6I2mhfDuCT5DLSNGrg=; b=uzCQm2s7R6AgyobXo8k3huQKrInJ/QbYdF0Eag3gDJtdi+YbT1I6Ofvgsht+W7Er7G 5Lv4q7V/0ngh5HMm31LnSPFmcD8sXkDIiyZyXmWWdzfFutguNcRupqQuaRoQVe9+AoVy EeDyugztHMvOqOaTp48al1FAZ5l81ykOSp0Cra5KaR64BNLdbkgNKLN7EJVE7zs4yX5s 3S0cImorPG9WUohdr7ihvMjGMJKLSbq7hJ2OZPXsa74tG0NUC9wV/JbigXIcLUnTQ7bL PNqEIXaNW5MAxLXLU4P26bslZCJcZxnF63J6gjAOv0u14G8hsSd5b5eZv2YV4Doy+1Km BtQQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=WMRta5pX; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id q17-20020a632a11000000b0055c8d58cee9si7014960pgq.714.2023.08.08.11.03.41; Tue, 08 Aug 2023 11:03:57 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=WMRta5pX; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233803AbjHHQ6X (ORCPT + 99 others); Tue, 8 Aug 2023 12:58:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43726 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233487AbjHHQ53 (ORCPT ); Tue, 8 Aug 2023 12:57:29 -0400 Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 491361FFB; Tue, 8 Aug 2023 08:42:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1691509361; x=1723045361; h=from:to:cc:subject:date:message-id; bh=Vk5oRN79w/cLbLNx3NSjT6+SGZdwpnI5SrG+5/rFROM=; b=WMRta5pXM2WzqBbEDFnIePz90d6/yUTY0Nc79rR+LbpjwpBfAoUJLU// rCXoJ1YhtJ6uvXBzjmvOWexf1Hgk/yAThrosO9TJfGnSVKXpfF5latMz5 gSy+9vDOf9NChui6e1rDYju4ITdV+/tifIWNgKUly7V2YnUam8i+blXmJ BSnQj95H9F0mQh3uGKj+paJunZLmKmYipGjucW6H185zlN4von9OdAecw Xo9dvO1aWcRiAvUOdxvleSO7OgFpa5RVAH8EDa0pJg3injbaC6qBjPIDK 2DOY9eMR6xiEnfvnmJhyb3Sk20GBZErfG/2ULee3QYl2Sn4oQbDd/GXeC Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10795"; a="457130721" X-IronPort-AV: E=Sophos;i="6.01,263,1684825200"; d="scan'208";a="457130721" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2023 00:40:25 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10795"; a="731281811" X-IronPort-AV: E=Sophos;i="6.01,263,1684825200"; d="scan'208";a="731281811" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Aug 2023 00:40:22 -0700 From: Yan Zhao To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: pbonzini@redhat.com, seanjc@google.com, mike.kravetz@oracle.com, apopple@nvidia.com, jgg@nvidia.com, rppt@kernel.org, akpm@linux-foundation.org, kevin.tian@intel.com, Yan Zhao Subject: [RFC PATCH 0/3] Reduce NUMA balance caused TLB-shootdowns in a VM Date: Tue, 8 Aug 2023 15:13:29 +0800 Message-Id: <20230808071329.19995-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_NONE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1773685008285703124 X-GMAIL-MSGID: 1773685008285703124 This is an RFC series trying to fix the issue of unnecessary NUMA protection and TLB-shootdowns found in VMs with assigned devices or VFIO mediated devices during NUMA balance. For VMs with assigned devices or VFIO mediated devices, all or part of guest memory are pinned for long-term. Auto NUMA balancing will periodically selects VMAs of a process and change protections to PROT_NONE even though some or all pages in the selected ranges are long-term pinned for DMAs, which is true for VMs with assigned devices or VFIO mediated devices. Though this will not cause real problem because NUMA migration will ultimately reject migration of those kind of pages and restore those PROT_NONE PTEs, it causes KVM's secondary MMU to be zapped periodically with equal SPTEs finally faulted back, wasting CPU cycles and generating unnecessary TLB-shootdowns. This series first introduces a new flag MMU_NOTIFIER_RANGE_NUMA in patch 1 to work with mmu notifier event type MMU_NOTIFY_PROTECTION_VMA, so that the subscriber (e.g.KVM) of the mmu notifier can know that an invalidation event is sent for NUMA migration purpose in specific. Then, with patch 3, during zapping KVM's secondary MMU, KVM can check and keep accessing the long-term pinned pages even though it's PROT_NONE-mapped in the primary MMU. Patch 2 skips setting PROT_NONE to long-term pinned pages in the primary MMU to avoid NUMA protection introduced page faults and restoration of old huge PMDs/PTEs in primary MMU. As change_pmd_range() will first send .invalidate_range_start() before going down and checking the pages to skip, patch 1 and 3 are still required for KVM. In my test environment, with this series, during boot-up with a VM with assigned devices: TLB shootdown count in KVM caused by .invalidate_range_start() sent for NUMA balancing in change_pmd_range() is reduced from 9000+ on average to 0. Yan Zhao (3): mm/mmu_notifier: introduce a new mmu notifier flag MMU_NOTIFIER_RANGE_NUMA mm: don't set PROT_NONE to maybe-dma-pinned pages for NUMA-migrate purpose KVM: x86/mmu: skip zap maybe-dma-pinned pages for NUMA migration arch/x86/kvm/mmu/mmu.c | 4 ++-- arch/x86/kvm/mmu/tdp_mmu.c | 26 ++++++++++++++++++++++---- arch/x86/kvm/mmu/tdp_mmu.h | 4 ++-- include/linux/kvm_host.h | 1 + include/linux/mmu_notifier.h | 1 + mm/huge_memory.c | 5 +++++ mm/mprotect.c | 9 ++++++++- virt/kvm/kvm_main.c | 5 +++++ 8 files changed, 46 insertions(+), 9 deletions(-) base-commit: fdf0eaf11452d72945af31804e2a1048ee1b574c