From patchwork Mon Oct 24 08:34:21 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Baolin Wang
X-Patchwork-Id: 8246
From: Baolin Wang
To: akpm@linux-foundation.org
Cc: david@redhat.com, ying.huang@intel.com, ziy@nvidia.com,
 shy828301@gmail.com, apopple@nvidia.com, baolin.wang@linux.alibaba.com,
 jingshan@linux.alibaba.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Subject: [PATCH v3 1/2] mm: migrate: Fix return value if all subpages of THPs
 are migrated successfully
Date: Mon, 24 Oct 2022 16:34:21 +0800
X-Mailer: git-send-email 1.8.3.1
X-Mailing-List: linux-kernel@vger.kernel.org

During THP migration, if THPs are not migrated but they are split and all
subpages are migrated successfully, migrate_pages() will still return the
number of THP pages that were not migrated. This will confuse the callers
of migrate_pages(). For example, longterm pinning will fail even though
all pages are migrated successfully.

Thus we should return 0 to indicate that all pages are migrated in this
case.

Fixes: b5bade978e9b ("mm: migrate: fix the return value of migrate_pages()")
Signed-off-by: Baolin Wang
Reviewed-by: Alistair Popple
Cc:
Reviewed-by: Yang Shi
---
Changes from v2:
 - Add Fixes tag suggested by Yang Shi and Huang, Ying.
 - Drop 'nr_thp_split' validation suggested by Alistair.
 - Add reviewed tag from Alistair.
 - Update the commit message suggested by Andrew.

Changes from v1:
 - Fix the return value of migrate_pages() instead of fixing the
   callers' validation.
---
 mm/migrate.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/migrate.c b/mm/migrate.c
index 8e5eb6e..2eb16f8 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1582,6 +1582,13 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 	 */
 	list_splice(&ret_pages, from);
 
+	/*
+	 * Return 0 in case all subpages of fail-to-migrate THPs are
+	 * migrated successfully.
+	 */
+	if (list_empty(from))
+		rc = 0;
+
 	count_vm_events(PGMIGRATE_SUCCESS, nr_succeeded);
 	count_vm_events(PGMIGRATE_FAIL, nr_failed_pages);
 	count_vm_events(THP_MIGRATION_SUCCESS, nr_thp_succeeded);

From patchwork Mon Oct 24 08:34:22 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Baolin Wang
X-Patchwork-Id: 8245
From: Baolin Wang
To: akpm@linux-foundation.org
Cc: david@redhat.com, ying.huang@intel.com, ziy@nvidia.com,
 shy828301@gmail.com, apopple@nvidia.com, baolin.wang@linux.alibaba.com,
 jingshan@linux.alibaba.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Subject: [PATCH v3 2/2] mm: migrate: Try again if THP split is failed due to
 page refcnt
Date: Mon, 24 Oct 2022 16:34:22 +0800
Message-Id: <6784730480a1df82e8f4cba1ed088e4ac767994b.1666599848.git.baolin.wang@linux.alibaba.com>
X-Mailer: git-send-email 1.8.3.1
X-Mailing-List: linux-kernel@vger.kernel.org

When creating a virtual machine, we use memfd_create() to get a file
descriptor which can be used to create shared memory mappings with
mmap(); the mmap() call sets the MAP_POPULATE flag to allocate physical
pages for the virtual machine. When allocating physical pages for the
guest, the host can fall back to allocating some CMA pages for the guest
when over half of the zone's free memory is in the CMA area.

In the guest OS, when an application wants to do some data transactions
with DMA, our QEMU calls the VFIO_IOMMU_MAP_DMA ioctl to longterm-pin the
DMA pages and create IOMMU mappings for them. However, when calling the
VFIO_IOMMU_MAP_DMA ioctl to pin the physical pages, we found that the
longterm-pin sometimes fails.

After some investigation, we found that the pages used for the DMA
mapping can contain some CMA pages, and these CMA pages can cause the
longterm-pin to fail when they can not be migrated. The migration
failure may be caused by a temporary reference count or by a memory
allocation failure. As a result, the VFIO_IOMMU_MAP_DMA ioctl returns an
error, which makes the application fail to start.

I observed one migration failure case (which is not easy to reproduce)
in which the 'thp_migration_fail' count is 1 and the
'thp_split_page_failed' count is also 1. That means that when migrating
a THP which is in the CMA area, a new THP could not be allocated due to
memory fragmentation, so the THP was split. However, the THP split also
failed, probably because of a temporary reference count on this THP. The
temporary reference count can be caused by dropping page caches (I
observed the drop caches operation in the system), but the shmem page
caches can not be dropped because they are already dirty at that time.

Especially for a THP split failure caused by a temporary reference
count, we can try again to mitigate the migration failure in this case,
according to the previous discussion [1].

[1] https://lore.kernel.org/all/470dc638-a300-f261-94b4-e27250e42f96@redhat.com/

Signed-off-by: Baolin Wang
Reviewed-by: "Huang, Ying"
---
Changes from v2:
 - Add reviewed tag from Ying. Thanks.

Changes from v1:
 - Use another variable to save the return value of THP split.
---
 mm/huge_memory.c |  4 ++--
 mm/migrate.c     | 19 ++++++++++++++++---
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ad17c8d..a79f03b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2666,7 +2666,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	 * split PMDs
 	 */
 	if (!can_split_folio(folio, &extra_pins)) {
-		ret = -EBUSY;
+		ret = -EAGAIN;
 		goto out_unlock;
 	}
 
@@ -2716,7 +2716,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 			xas_unlock(&xas);
 		local_irq_enable();
 		remap_page(folio, folio_nr_pages(folio));
-		ret = -EBUSY;
+		ret = -EAGAIN;
 	}
 
 out_unlock:
diff --git a/mm/migrate.c b/mm/migrate.c
index 2eb16f8..14562ab 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1506,9 +1506,22 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 				if (is_thp) {
 					nr_thp_failed++;
 					/* THP NUMA faulting doesn't split THP to retry. */
-					if (!nosplit && !try_split_thp(page, &thp_split_pages)) {
-						nr_thp_split++;
-						break;
+					if (!nosplit) {
+						int ret = try_split_thp(page, &thp_split_pages);
+
+						if (!ret) {
+							nr_thp_split++;
+							break;
+						} else if (reason == MR_LONGTERM_PIN &&
+							   ret == -EAGAIN) {
+							/*
+							 * Try again to split THP to mitigate
+							 * the failure of longterm pinning.
+							 */
+							thp_retry++;
+							nr_retry_pages += nr_subpages;
+							break;
+						}
 					}
 				} else if (!no_subpage_counting) {
 					nr_failed++;
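
For readers following the series outside the kernel tree, the decision
made by the mm/migrate.c hunk of patch 2/2 can be modelled in plain
userspace C. This is only a sketch under stated assumptions, not the
kernel code: the enum names and model_thp_fallback() are hypothetical
and exist only for this illustration, and the real code updates counters
(nr_thp_split, thp_retry, nr_retry_pages) and breaks out of a switch
instead of returning a label.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for two of the kernel's migrate_reason values. */
enum model_reason {
	MODEL_MR_COMPACTION,
	MODEL_MR_LONGTERM_PIN,
};

/* Hypothetical labels for the three outcomes of the patched branch. */
enum model_action {
	MODEL_SPLIT_DONE,	/* split succeeded: migrate the subpages instead */
	MODEL_RETRY_THP,	/* split returned -EAGAIN: try the THP again */
	MODEL_FAIL_THP,		/* no split or no retry: the THP stays failed */
};

/*
 * Model of the fallback taken when a THP could not be migrated.
 * 'split_ret' plays the role of the try_split_thp() return value;
 * it is ignored when 'nosplit' is set, as no split is attempted then.
 */
enum model_action model_thp_fallback(bool nosplit, int split_ret,
				     enum model_reason reason)
{
	if (nosplit)
		return MODEL_FAIL_THP;

	if (split_ret == 0)
		return MODEL_SPLIT_DONE;

	/* Only longterm pinning retries, and only on a transient failure. */
	if (reason == MODEL_MR_LONGTERM_PIN && split_ret == -EAGAIN)
		return MODEL_RETRY_THP;

	return MODEL_FAIL_THP;
}
```

In this model, a split failure with -EAGAIN is retried only for the
longterm-pin reason; any other reason, or any other error code, leaves
the THP counted as failed, which matches the intent stated in the commit
message.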