From patchwork Tue Aug 8 07:14:48 2023
From: Yan Zhao <yan.y.zhao@intel.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: pbonzini@redhat.com, seanjc@google.com, mike.kravetz@oracle.com,
    apopple@nvidia.com, jgg@nvidia.com, rppt@kernel.org,
    akpm@linux-foundation.org, kevin.tian@intel.com,
    Yan Zhao <yan.y.zhao@intel.com>
Subject: [RFC PATCH 1/3] mm/mmu_notifier: introduce a new mmu notifier flag
 MMU_NOTIFIER_RANGE_NUMA
Date: Tue, 8 Aug 2023 15:14:48 +0800
Message-Id: <20230808071448.20105-1-yan.y.zhao@intel.com>
In-Reply-To: <20230808071329.19995-1-yan.y.zhao@intel.com>
References: <20230808071329.19995-1-yan.y.zhao@intel.com>

Introduce a new mmu notifier flag, MMU_NOTIFIER_RANGE_NUMA, to indicate
that an MMU_NOTIFY_PROTECTION_VMA notification is specifically for NUMA
balancing. With this more precise information, an mmu notifier subscriber
such as KVM can apply performance optimizations.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 include/linux/mmu_notifier.h | 1 +
 mm/mprotect.c                | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 64a3e051c3c4..a6dc829a4bce 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -60,6 +60,7 @@ enum mmu_notifier_event {
 };

 #define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
+#define MMU_NOTIFIER_RANGE_NUMA (1 << 1)

 struct mmu_notifier_ops {
 	/*
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 6f658d483704..cb99a7d66467 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -381,7 +381,9 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 		/* invoke the mmu notifier if the pmd is populated */
 		if (!range.start) {
 			mmu_notifier_range_init(&range,
-						MMU_NOTIFY_PROTECTION_VMA, 0,
+						MMU_NOTIFY_PROTECTION_VMA,
+						cp_flags & MM_CP_PROT_NUMA ?
+						MMU_NOTIFIER_RANGE_NUMA : 0,
 						vma->vm_mm, addr, end);
 			mmu_notifier_invalidate_range_start(&range);
 		}
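[Editor's illustration, not part of the series: a subscriber-side check
could look like the sketch below. The callback name
example_invalidate_range_start() is hypothetical; the event and flags
fields of struct mmu_notifier_range are the ones this patch extends.]

	/*
	 * Hypothetical subscriber callback (sketch): tell a
	 * NUMA-balancing-only protection change apart from other
	 * MMU_NOTIFY_PROTECTION_VMA notifications.
	 */
	static int example_invalidate_range_start(struct mmu_notifier *mn,
			const struct mmu_notifier_range *range)
	{
		if (range->event == MMU_NOTIFY_PROTECTION_VMA &&
		    (range->flags & MMU_NOTIFIER_RANGE_NUMA)) {
			/* protection change is for NUMA balancing only;
			 * a cheaper reaction may be possible here */
		}
		return 0;
	}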
From patchwork Tue Aug 8 07:15:46 2023
From: Yan Zhao <yan.y.zhao@intel.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: pbonzini@redhat.com, seanjc@google.com, mike.kravetz@oracle.com,
    apopple@nvidia.com, jgg@nvidia.com, rppt@kernel.org,
    akpm@linux-foundation.org, kevin.tian@intel.com,
    Yan Zhao <yan.y.zhao@intel.com>
Subject: [RFC PATCH 2/3] mm: don't set PROT_NONE to maybe-dma-pinned pages
 for NUMA-migrate purpose
Date: Tue, 8 Aug 2023 15:15:46 +0800
Message-Id: <20230808071546.20173-1-yan.y.zhao@intel.com>
In-Reply-To: <20230808071329.19995-1-yan.y.zhao@intel.com>
References: <20230808071329.19995-1-yan.y.zhao@intel.com>

Don't set PROT_NONE on exclusive anonymous, maybe-dma-pinned pages for
NUMA migration purposes.
For exclusive anonymous pages that are page_maybe_dma_pinned(), NUMA
migration will eventually abandon migrating them in try_to_migrate_one()
(i.e. after page_try_share_anon_rmap() returns -EBUSY). So skip setting
PROT_NONE on such pages earlier, in the change_protection_range() phase,
to avoid the later futile page faults, detection, and restoration of the
original PTEs/PMDs.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 mm/huge_memory.c | 5 +++++
 mm/mprotect.c    | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index eb3678360b97..a71cf686e3b2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1875,6 +1875,11 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			goto unlock;

 		page = pmd_page(*pmd);
+
+		if (PageAnon(page) && PageAnonExclusive(page) &&
+		    page_maybe_dma_pinned(page))
+			goto unlock;
+
 		toptier = node_is_toptier(page_to_nid(page));
 		/*
 		 * Skip scanning top tier node if normal numa
diff --git a/mm/mprotect.c b/mm/mprotect.c
index cb99a7d66467..a1f63df34b86 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -146,6 +146,11 @@ static long change_pte_range(struct mmu_gather *tlb,
 				nid = page_to_nid(page);
 				if (target_node == nid)
 					continue;
+
+				if (PageAnon(page) && PageAnonExclusive(page) &&
+				    page_maybe_dma_pinned(page))
+					continue;
+
 				toptier = node_is_toptier(nid);
 				/*
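[Editor's illustration, not part of the series: the same three-part test
appears in both change_huge_pmd() and change_pte_range(). Purely as a
reading aid, the predicate could be written as the helper below; the name
numa_skip_prot_none() is made up, not something the patch adds.]

	/*
	 * Sketch of the repeated predicate: NUMA balancing should leave
	 * this page mapped, because try_to_migrate_one() would refuse
	 * to migrate it anyway.
	 */
	static inline bool numa_skip_prot_none(struct page *page)
	{
		return PageAnon(page) && PageAnonExclusive(page) &&
		       page_maybe_dma_pinned(page);
	}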
From patchwork Tue Aug 8 07:17:02 2023
From: Yan Zhao <yan.y.zhao@intel.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: pbonzini@redhat.com, seanjc@google.com, mike.kravetz@oracle.com,
    apopple@nvidia.com, jgg@nvidia.com, rppt@kernel.org,
    akpm@linux-foundation.org, kevin.tian@intel.com,
    Yan Zhao <yan.y.zhao@intel.com>
Subject: [RFC PATCH 3/3] KVM: x86/mmu: skip zap maybe-dma-pinned pages for
 NUMA migration
Date: Tue, 8 Aug 2023 15:17:02 +0800
Message-Id: <20230808071702.20269-1-yan.y.zhao@intel.com>
In-Reply-To: <20230808071329.19995-1-yan.y.zhao@intel.com>
References: <20230808071329.19995-1-yan.y.zhao@intel.com>

Skip zapping pages that are exclusive anonymous and maybe-dma-pinned
in the TDP MMU when the zap is for NUMA migration, to avoid unnecessary
zaps and TLB shootdowns.

For NUMA balancing, change_pmd_range() sends an .invalidate_range_start()
and .invalidate_range_end() pair unconditionally before setting a huge PMD
or PTE to PROT_NONE. Regardless of whether PROT_NONE is set under
change_pmd_range(), NUMA migration will eventually reject migrating
exclusive anonymous, maybe-dma-pinned pages in the later
try_to_migrate_one() phase and restore the affected huge PMD or PTE.
Therefore, if KVM can detect such pages in the zap phase, the zaps and TLB
shootdowns caused by this kind of protection can be avoided.

Corner cases like the one below are still fine:
1. Auto NUMA balancing selects a PMD range to set PROT_NONE in
   change_pmd_range().
2. A page is maybe-dma-pinned at the time .invalidate_range_start() is
   sent with event type MMU_NOTIFY_PROTECTION_VMA,
   ==> so it is not zapped in KVM's secondary MMU.
3. The page is unpinned after .invalidate_range_start() is sent, so it is
   no longer maybe-dma-pinned and is set to PROT_NONE in the primary MMU.
4. For some reason, a page fault is triggered in the primary MMU and the
   page is found to be suitable for NUMA migration.
5. try_to_migrate_one() sends an .invalidate_range_start() notification
   with event type MMU_NOTIFY_CLEAR to KVM,
   ==> and KVM zaps the pages in the secondary MMU.
6. The old page is replaced by a new page in the primary MMU.

If step 4 does not happen, KVM keeps accessing a page that might not be on
the best NUMA node, but this is fixed by the next round of step 1 in auto
NUMA balancing, since change_pmd_range() sends the mmu notification
without checking whether PROT_NONE is already set.

Currently this patch skips only exclusive anonymous maybe-dma-pinned
pages for the NUMA-migration protection purpose. Other page types, e.g.
is_zone_device_page() or PageKsm() pages, can be included later if
necessary.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 arch/x86/kvm/mmu/mmu.c     |  4 ++--
 arch/x86/kvm/mmu/tdp_mmu.c | 26 ++++++++++++++++++++++----
 arch/x86/kvm/mmu/tdp_mmu.h |  4 ++--
 include/linux/kvm_host.h   |  1 +
 virt/kvm/kvm_main.c        |  5 +++++
 5 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d72f2b20f430..9dccc25b1389 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6307,8 +6307,8 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)

 	if (tdp_mmu_enabled) {
 		for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
-			flush = kvm_tdp_mmu_zap_leafs(kvm, i, gfn_start,
-						      gfn_end, true, flush);
+			flush = kvm_tdp_mmu_zap_leafs(kvm, i, gfn_start, gfn_end,
+						      true, flush, false);
 	}

 	if (flush)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 6250bd3d20c1..17762b5a2b98 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -838,7 +838,8 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
  * operation can cause a soft lockup.
  */
 static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
-			      gfn_t start, gfn_t end, bool can_yield, bool flush)
+			      gfn_t start, gfn_t end, bool can_yield, bool flush,
+			      bool skip_pinned)
 {
 	struct tdp_iter iter;

@@ -859,6 +860,21 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
 		    !is_last_spte(iter.old_spte, iter.level))
 			continue;

+		if (skip_pinned) {
+			kvm_pfn_t pfn = spte_to_pfn(iter.old_spte);
+			struct page *page = kvm_pfn_to_refcounted_page(pfn);
+			struct folio *folio;
+
+			if (!page)
+				continue;
+
+			folio = page_folio(page);
+
+			if (folio_test_anon(folio) && PageAnonExclusive(&folio->page) &&
+			    folio_maybe_dma_pinned(folio))
+				continue;
+		}
+
 		tdp_mmu_iter_set_spte(kvm, &iter, 0);
 		flush = true;
 	}
@@ -878,12 +894,13 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
  * more SPTEs were zapped since the MMU lock was last acquired.
  */
 bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, gfn_t end,
-			   bool can_yield, bool flush)
+			   bool can_yield, bool flush, bool skip_pinned)
 {
 	struct kvm_mmu_page *root;

 	for_each_tdp_mmu_root_yield_safe(kvm, root, as_id)
-		flush = tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, flush);
+		flush = tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, flush,
+					  skip_pinned);

 	return flush;
 }
@@ -1147,7 +1164,8 @@ bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range,
 				 bool flush)
 {
 	return kvm_tdp_mmu_zap_leafs(kvm, range->slot->as_id, range->start,
-				     range->end, range->may_block, flush);
+				     range->end, range->may_block, flush,
+				     range->skip_pinned);
 }

 typedef bool (*tdp_handler_t)(struct kvm *kvm, struct tdp_iter *iter,
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 0a63b1afabd3..2a9de44bc5c3 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -20,8 +20,8 @@ __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm_mmu_page *root)
 void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
 			  bool shared);

-bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start,
-			   gfn_t end, bool can_yield, bool flush);
+bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, gfn_t end,
+			   bool can_yield, bool flush, bool skip_pinned);
 bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp);
 void kvm_tdp_mmu_zap_all(struct kvm *kvm);
 void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9125d0ab642d..f883d6b59545 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -266,6 +266,7 @@ struct kvm_gfn_range {
 	gfn_t end;
 	union kvm_mmu_notifier_arg arg;
 	bool may_block;
+	bool skip_pinned;
 };
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
 bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f84ef9399aee..1202c1daa568 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -532,6 +532,7 @@ struct kvm_hva_range {
 	on_unlock_fn_t on_unlock;
 	bool flush_on_ret;
 	bool may_block;
+	bool skip_pinned;
 };

 /*
@@ -595,6 +596,7 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
 	 */
 	gfn_range.arg = range->arg;
 	gfn_range.may_block = range->may_block;
+	gfn_range.skip_pinned = range->skip_pinned;

 	/*
 	 * {gfn(page) | page intersects with [hva_start, hva_end)} =
@@ -754,6 +756,9 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 		.on_unlock	=
 kvm_arch_guest_memory_reclaimed,
 		.flush_on_ret	= true,
 		.may_block	= mmu_notifier_range_blockable(range),
+		.skip_pinned	= test_bit(MMF_HAS_PINNED, &range->mm->flags) &&
+				  (range->event == MMU_NOTIFY_PROTECTION_VMA) &&
+				  (range->flags & MMU_NOTIFIER_RANGE_NUMA),
 	};

 	trace_kvm_unmap_hva_range(range->start, range->end);
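[Editor's illustration, not part of the series: the .skip_pinned
initializer above combines three conditions. Restated as a standalone
predicate for readability — the helper name is hypothetical, the logic is
exactly the initializer from this patch:]

	/*
	 * Sketch: skip zapping possibly-pinned pages only when (a) the mm
	 * has ever pinned pages (MMF_HAS_PINNED), (b) the event is a VMA
	 * protection change, and (c) patch 1's MMU_NOTIFIER_RANGE_NUMA
	 * flag marks it as NUMA balancing.
	 */
	static bool range_is_numa_protection(const struct mmu_notifier_range *range)
	{
		return test_bit(MMF_HAS_PINNED, &range->mm->flags) &&
		       range->event == MMU_NOTIFY_PROTECTION_VMA &&
		       (range->flags & MMU_NOTIFIER_RANGE_NUMA);
	}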