From patchwork Thu Oct 20 07:49:00 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Baolin Wang
X-Patchwork-Id: 6023
From: Baolin Wang
To: akpm@linux-foundation.org
Cc: david@redhat.com, ying.huang@intel.com, ziy@nvidia.com,
 shy828301@gmail.com, baolin.wang@linux.alibaba.com,
 jingshan@linux.alibaba.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Subject: [PATCH 1/2] mm: gup: Re-pin pages in case of trying several times to migrate
Date: Thu, 20 Oct 2022 15:49:00 +0800
X-Mailer: git-send-email 1.8.3.1

migrate_pages() returns the number of {normal page, THP, hugetlb} that
were not migrated, or an error code. That means it can still return a
non-zero failure count even though the pages were eventually migrated
successfully after several retries.

So we should not use the return value of migrate_pages() to determine
whether any pages failed to migrate. Instead, we can check
'movable_page_list' to see whether any pages remain in the list; those
are the pages that failed to migrate. This can mitigate longterm
pinning failures.
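The difference between the two checks can be sketched in userspace C as
below. This is a toy model only, not kernel code: 'struct toy_list',
toy_migrate(), old_check_fails() and new_check_fails() are hypothetical
names, and the model simply assumes that a failed attempt is still
counted in the return value when a later retry succeeds.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for 'movable_page_list': pages still queued. */
struct toy_list {
	size_t nr_pages;
};

static bool toy_list_empty(const struct toy_list *list)
{
	return list->nr_pages == 0;
}

/*
 * Hypothetical stand-in for migrate_pages(). Assumption of this model:
 * one page fails on its first attempt and succeeds on the retry, and
 * the failed attempt is still reported in the return value.
 */
static int toy_migrate(struct toy_list *list)
{
	int nr_failed_attempts = 0;
	bool transient_failure = true;

	while (list->nr_pages > 0) {
		if (transient_failure) {
			/* First attempt fails; the page stays queued. */
			nr_failed_attempts++;
			transient_failure = false;
			continue;
		}
		/* Migration succeeded; the page leaves the list. */
		list->nr_pages--;
	}
	return nr_failed_attempts;
}

/* Old check: any non-zero return value is treated as a failure. */
static bool old_check_fails(int ret)
{
	return ret != 0;
}

/* New check: only an error code or leftover pages mean failure. */
static bool new_check_fails(int ret, const struct toy_list *list)
{
	return ret < 0 || !toy_list_empty(list);
}
```

In this model, migrating a list of three pages leaves the list empty but
returns 1, so the old check reports a failure while the new check does
not.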

Signed-off-by: Baolin Wang
---
 mm/gup.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 5182aba..bd8cfcd 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1914,9 +1914,10 @@ static int migrate_longterm_unpinnable_pages(
 		.gfp_mask = GFP_USER | __GFP_NOWARN,
 	};
 
-	if (migrate_pages(movable_page_list, alloc_migration_target,
-			  NULL, (unsigned long)&mtc, MIGRATE_SYNC,
-			  MR_LONGTERM_PIN, NULL)) {
+	ret = migrate_pages(movable_page_list, alloc_migration_target,
+			    NULL, (unsigned long)&mtc, MIGRATE_SYNC,
+			    MR_LONGTERM_PIN, NULL);
+	if (ret < 0 || !list_empty(movable_page_list)) {
 		ret = -ENOMEM;
 		goto err;
 	}

From patchwork Thu Oct 20 07:49:01 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Baolin Wang
X-Patchwork-Id: 6022
From: Baolin Wang
To: akpm@linux-foundation.org
Cc: david@redhat.com, ying.huang@intel.com, ziy@nvidia.com,
 shy828301@gmail.com, baolin.wang@linux.alibaba.com,
 jingshan@linux.alibaba.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Subject: [PATCH 2/2] mm: migrate: Try again if THP split is failed due to page refcnt
Date: Thu, 20 Oct 2022 15:49:01 +0800
X-Mailer: git-send-email 1.8.3.1

When creating a virtual machine, we use memfd_create() to get a file
descriptor that can be used to create shared memory mappings with
mmap(); the mmap() call sets the MAP_POPULATE flag to allocate physical
pages for the virtual machine.

When allocating physical pages for the guest, the host can fall back to
allocating some CMA pages for the guest when over half of the zone's
free memory is in the CMA area.

In the guest OS, when an application wants to do a data transaction
with DMA, our QEMU calls the VFIO_IOMMU_MAP_DMA ioctl to longterm-pin
the DMA pages and create IOMMU mappings for them.
However, when calling the VFIO_IOMMU_MAP_DMA ioctl to pin the physical
pages, we found that the longterm pin sometimes fails.

After some investigation, we found that the pages used for the DMA
mapping can contain some CMA pages, and these CMA pages can cause the
longterm pin to fail because they cannot be migrated. The migration
failure may be caused by a temporary reference count or by a memory
allocation failure. As a result, the VFIO_IOMMU_MAP_DMA ioctl returns
an error, which makes the application fail to start.

In one migration failure case that I observed (which is not easy to
reproduce), the 'thp_migration_fail' count is 1 and the
'thp_split_page_failed' count is also 1. That means that when migrating
a THP in the CMA area, a new THP could not be allocated due to memory
fragmentation, so the THP was split instead. However, the THP split
also failed, probably because of a temporary reference count on this
THP. The temporary reference count can be caused by dropping page
caches (I observed a drop caches operation in the system), but the
shmem page caches cannot be dropped because they are already dirty at
that time.

For a THP split failure caused by a temporary reference count in
particular, we can try again to mitigate the migration failure,
according to the previous discussion [1].
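The retry idea can be sketched in userspace C as below. This is a toy
model under stated assumptions, not the kernel implementation:
'struct toy_thp', toy_split() and toy_split_with_retry() are
hypothetical names, and the temporary reference is modelled as a counter
that drops by one on every pass, as if its holder had finished in the
meantime.

```c
#include <errno.h>

/* Hypothetical model of a THP holding some temporary extra references. */
struct toy_thp {
	int extra_refs;
};

/*
 * Hypothetical stand-in for the split step: returns -EAGAIN while extra
 * references are held (a condition expected to clear), 0 on success.
 */
static int toy_split(const struct toy_thp *thp)
{
	return thp->extra_refs > 0 ? -EAGAIN : 0;
}

/*
 * Try to split, retrying on -EAGAIN for at most 'max_passes' passes.
 * Each failed pass drops one temporary reference to model its holder
 * finishing. Returns 0 once the split succeeds, -EAGAIN otherwise.
 */
static int toy_split_with_retry(struct toy_thp *thp, int max_passes)
{
	int rc = -EAGAIN;

	for (int pass = 0; pass < max_passes && rc == -EAGAIN; pass++) {
		rc = toy_split(thp);
		if (rc == -EAGAIN && thp->extra_refs > 0)
			thp->extra_refs--;
	}
	return rc;
}
```

With a single pass, a THP that still holds a temporary reference fails
with -EAGAIN; given more passes, the same THP can be split once the
reference is gone. A hard error such as -EBUSY would not be worth
retrying in this model, which is why the distinct -EAGAIN code matters.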

[1] https://lore.kernel.org/all/470dc638-a300-f261-94b4-e27250e42f96@redhat.com/

Signed-off-by: Baolin Wang
---
 mm/huge_memory.c |  4 ++--
 mm/migrate.c     | 18 +++++++++++++++---
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ad17c8d..a79f03b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2666,7 +2666,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	 * split PMDs
 	 */
 	if (!can_split_folio(folio, &extra_pins)) {
-		ret = -EBUSY;
+		ret = -EAGAIN;
 		goto out_unlock;
 	}
 
@@ -2716,7 +2716,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 			xas_unlock(&xas);
 		local_irq_enable();
 		remap_page(folio, folio_nr_pages(folio));
-		ret = -EBUSY;
+		ret = -EAGAIN;
 	}
 
 out_unlock:
diff --git a/mm/migrate.c b/mm/migrate.c
index 8e5eb6e..55c7855 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1506,9 +1506,21 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 				if (is_thp) {
 					nr_thp_failed++;
 					/* THP NUMA faulting doesn't split THP to retry. */
-					if (!nosplit && !try_split_thp(page, &thp_split_pages)) {
-						nr_thp_split++;
-						break;
+					if (!nosplit) {
+						rc = try_split_thp(page, &thp_split_pages);
+						if (!rc) {
+							nr_thp_split++;
+							break;
+						} else if (reason == MR_LONGTERM_PIN &&
+							   rc == -EAGAIN) {
+							/*
+							 * Try again to split THP to mitigate
+							 * the failure of longterm pinning.
+							 */
+							thp_retry++;
+							nr_retry_pages += nr_subpages;
+							break;
+						}
 					}
 				} else if (!no_subpage_counting) {
 					nr_failed++;