From patchwork Fri Oct 21 10:16:23 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Baolin Wang
X-Patchwork-Id: 6620
Return-Path:
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp610801wrr; Fri, 21 Oct 2022 03:18:16 -0700 (PDT)
X-Google-Smtp-Source: AMsMyM5hEgqqv/1itfOq8ZWs79JCnkn3wGpm9vuy2AIOpWbenra2UWWnTiqU6mQQhXrFjxhEfwan
X-Received: by 2002:a17:907:6d9b:b0:78d:f24b:e358 with SMTP id sb27-20020a1709076d9b00b0078df24be358mr14703228ejc.714.1666347496627; Fri, 21 Oct 2022 03:18:16 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1666347496; cv=none; d=google.com; s=arc-20160816; b=mbbxLSVp/9H2AmQzKSIjVO+PuxsR6TPJFYkZRPsDnOVBZ+eAtelbJ1Ukb3x6nxUjtE 3SzASFsu81m3R8CcCZrq2FQrKwHHD30iYW3Eokf3YUYfODmoscCKxIZLm9lCLzMErrsu iZ1Ww32+2WwjsW3NfgI/xWyy5mDi/9Xn3cdMSBbuZ0icfvBoYGtKI8/VtnasbR+hH4kD GPNw6jTtvgp1CWuk9wPRktt3AIIvTQBwOa0UqByjE1Lh9ltgOzcO6LV11af+GSdlyRH1 kFc4qnT/fnnfh1VFAcubz1zQYf0KTL8uptXTW1m/EcaORMZ7xcgps+EsWbHIGQVF/Xio M12g==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:message-id:date:subject:cc:to:from; bh=VgX9H1qAS+UbCF4uoAVvy1TMCj1OGBBNsSmsDI0cBG4=; b=Xsv8xIbmPTvlMddQiLPdFRnhUisatDdYpAs3PExVLmo1z8a2bttkEJloIeFYapDH9F uaF+PXdMagJYnikOoKLevltH5qHXhQelAt6rZl6SLwQAStyMooh3P5jx0CFZMeODWXr2 h4QvpClBCrH3VVDZUal72vZEXeM3sfiWisNhPqCJq3bmq/lcumtP0bEpE8QMmQ+iNSVg Fjhlw/oCP4ZIbVEBnFecRIQ6U+NLS9n3wLNQL5u243PfnkpuLpSO2PjxVk6tZ+rAxZ+Y 82Ry4n+uw9AzOtccfor1PjO3z/yloemUT9yGcwrLuwGuYp13Xo4Xml5tm9Wm18b+fsjy cVOw==
ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id du6-20020a17090772c600b0078c0c866a18si20818898ejc.19.2022.10.21.03.17.51; Fri, 21 Oct 2022 03:18:16 -0700 (PDT)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230446AbiJUKQl (ORCPT + 99 others); Fri, 21 Oct 2022 06:16:41 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43096 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230430AbiJUKQi (ORCPT ); Fri, 21 Oct 2022 06:16:38 -0400
Received: from out30-42.freemail.mail.aliyun.com (out30-42.freemail.mail.aliyun.com [115.124.30.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 30B401AD680 for ; Fri, 21 Oct 2022 03:16:36 -0700 (PDT)
X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R181e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018045192;MF=baolin.wang@linux.alibaba.com;NM=1;PH=DS;RN=10;SR=0;TI=SMTPD_---0VSjF-ZZ_1666347392;
Received: from localhost(mailfrom:baolin.wang@linux.alibaba.com fp:SMTPD_---0VSjF-ZZ_1666347392) by smtp.aliyun-inc.com; Fri, 21 Oct 2022 18:16:33 +0800
From: Baolin Wang
To: akpm@linux-foundation.org
Cc: david@redhat.com, ying.huang@intel.com, ziy@nvidia.com, shy828301@gmail.com, apopple@nvidia.com, baolin.wang@linux.alibaba.com, jingshan@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2 1/2] mm: migrate: Fix return value if all subpages of THPs are migrated successfully
Date: Fri, 21 Oct 2022 18:16:23 +0800
Message-Id:
X-Mailer: git-send-email 1.8.3.1
X-Spam-Status: No, score=-9.9 required=5.0 tests=BAYES_00, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2, SPF_HELO_NONE,SPF_PASS,UNPARSEABLE_RELAY,USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1747291992492526484?=
X-GMAIL-MSGID: =?utf-8?q?1747291992492526484?=

During THP migration, if THPs are split and all of their subpages are
migrated successfully, migrate_pages() still returns the number of THPs
that were not migrated. This confuses callers of migrate_pages(); for
example, it makes longterm pinning fail even though all pages were
migrated successfully.

Thus we should return 0 to indicate that all pages were migrated in this
case.

Signed-off-by: Baolin Wang
Reviewed-by: Alistair Popple
---
Changes from v1:
 - Fix the return value of migrate_pages() instead of fixing the
   callers' validation.
---
 mm/migrate.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/migrate.c b/mm/migrate.c
index 8e5eb6e..1da0dbc 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1582,6 +1582,13 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 	 */
 	list_splice(&ret_pages, from);
 
+	/*
+	 * Return 0 in case all subpages of fail-to-migrate THPs are
+	 * migrated successfully.
+	 */
+	if (nr_thp_split && list_empty(from))
+		rc = 0;
+
 	count_vm_events(PGMIGRATE_SUCCESS, nr_succeeded);
 	count_vm_events(PGMIGRATE_FAIL, nr_failed_pages);
 	count_vm_events(THP_MIGRATION_SUCCESS, nr_thp_succeeded);

From patchwork Fri Oct 21 10:16:24 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Baolin Wang
X-Patchwork-Id: 6621
Return-Path:
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp610924wrr; Fri, 21 Oct 2022 03:18:32 -0700 (PDT)
X-Google-Smtp-Source: AMsMyM4t8sZu3tXkO0BRI+0WU1QBAwc4D9PLRHR9Qup8s3ODfsDtdsKA1hkBq+50UPYPjhZQn5Fj
X-Received: by 2002:a17:906:1ec5:b0:78d:b3d1:183b with SMTP id m5-20020a1709061ec500b0078db3d1183bmr14763571ejj.709.1666347512424; Fri, 21 Oct 2022 03:18:32 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1666347512; cv=none; d=google.com; s=arc-20160816; b=emhdf4URQaZEv2FZ2KWMuzGagURr7rVJ9VKMEufyG/24bClYueC47W3EZNi4Ce2roM 6GW5gRhXHDaK4EgXPWJXWl5w9+QXU+vA+U0U4h5Klx5M/bXlYDcBeM1y/Ke1Z3F23lrK NQWMQY/s//rvEZaKd6S3Amj5/9k2UoFtOQwVE81vvpRq4QocolEOSxpUDGJiQnM988Dj XJ6NI6jcXO/+6BL/rFZIrdb9Ne3omeTBT6r98NzhHCQlya68L0No09ysFpRaOf2iz3N/ 8aqY8Mil8WL6lLwlWR+s0mvYNfI4YGeHvOV8J0oPRAKK1xvK8Fa+NrhRVFVmtcwqnGso L3TQ==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:references:in-reply-to :message-id:date:subject:cc:to:from; bh=N0k4jT1y19ruhynEFfZshoJ2zJKmqfD5pvpkeHcAZ10=; b=qGbJyYlTuYcdRY4kE9RgUAQ0a8RrlERcW0hB2zdlDqSNYVJMvLr4Zz4CL4fUvwcxKz nkMATuQGV+N17l/Nk86BhA0AmT2+9SQBcEF7d1+zMEEMjJlHbr1YvpuCTgON4OAp5QaP fY11A+2y9wFlGvm8Ug4sYDoq9cznIcaFH/QOXlj7ZeClOT6fs6y5UHKzcdPbSm9UNe8A y4qz9Kzbg/jKTQo8O5kSrpYWv82W2mCoExv9W+GHMpZzAqMbNXtqLSCiVXRdzlQJtqKe WlC4GkpD2Tzot/L+XlUJwF//08B7O7qjGE2GRNLfFWMcntdHxIhFH9oS9tFe/INNqOZx FFMw==
ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 4-20020a508e04000000b0045a22a21abesi17330064edw.299.2022.10.21.03.18.07; Fri, 21 Oct 2022 03:18:32 -0700 (PDT)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230454AbiJUKQo (ORCPT + 99 others); Fri, 21 Oct 2022 06:16:44 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43170 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230423AbiJUKQj (ORCPT ); Fri, 21 Oct 2022 06:16:39 -0400
Received: from out30-54.freemail.mail.aliyun.com (out30-54.freemail.mail.aliyun.com [115.124.30.54]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 11B4B18A008 for ; Fri, 21 Oct 2022 03:16:37 -0700 (PDT)
X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R161e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018046060;MF=baolin.wang@linux.alibaba.com;NM=1;PH=DS;RN=10;SR=0;TI=SMTPD_---0VSj.fqa_1666347393;
Received: from localhost(mailfrom:baolin.wang@linux.alibaba.com fp:SMTPD_---0VSj.fqa_1666347393) by smtp.aliyun-inc.com; Fri, 21 Oct 2022 18:16:34 +0800
From: Baolin Wang
To: akpm@linux-foundation.org
Cc: david@redhat.com, ying.huang@intel.com, ziy@nvidia.com, shy828301@gmail.com, apopple@nvidia.com, baolin.wang@linux.alibaba.com, jingshan@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2 2/2] mm: migrate: Try again if THP split is failed due to page refcnt
Date: Fri, 21 Oct 2022 18:16:24 +0800
Message-Id: <88831f1764c8fbd5b5fdad27cd5ae3d2ca796e44.1666335603.git.baolin.wang@linux.alibaba.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To:
References:
In-Reply-To:
References:
X-Spam-Status: No, score=-9.9 required=5.0 tests=BAYES_00, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2, SPF_HELO_NONE,SPF_PASS,UNPARSEABLE_RELAY,USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1747292009412288510?=
X-GMAIL-MSGID: =?utf-8?q?1747292009412288510?=

When creating a virtual machine, we use memfd_create() to get a file
descriptor that can be used to create shared memory mappings with
mmap(). The mmap() call sets the MAP_POPULATE flag to allocate physical
pages for the virtual machine.

When allocating physical pages for the guest, the host can fall back to
allocating some CMA pages for the guest when over half of the zone's
free memory is in the CMA area.

In the guest OS, when an application wants to do some data transactions
with DMA, our QEMU calls the VFIO_IOMMU_MAP_DMA ioctl to longterm-pin
the DMA pages and create IOMMU mappings for them. However, we found that
the VFIO_IOMMU_MAP_DMA ioctl sometimes fails to longterm-pin the
physical pages.

After some investigation, we found that the pages used for DMA mapping
can contain some CMA pages, and that these CMA pages can make the
longterm pin fail because they fail to migrate. The migration failure
may be caused by a temporary reference count or by a memory allocation
failure. As a result the VFIO_IOMMU_MAP_DMA ioctl returns an error, and
the application fails to start.

In one migration failure case that I observed (it is not easy to
reproduce), the 'thp_migration_fail' count is 1 and the
'thp_split_page_failed' count is also 1.

That means that when migrating a THP in the CMA area, a new THP could
not be allocated due to memory fragmentation, so the THP was split.
However, the THP split also failed, probably because of a temporary
reference count on this THP. The temporary reference count can be caused
by dropping page caches (I observed the drop caches operation in the
system), but the shmem page caches cannot be dropped because they are
already dirty at that time.

For a THP split failure caused by a temporary reference count in
particular, we can try again to mitigate the migration failure,
according to the previous discussion [1].

[1] https://lore.kernel.org/all/470dc638-a300-f261-94b4-e27250e42f96@redhat.com/

Signed-off-by: Baolin Wang
Reviewed-by: "Huang, Ying"
---
Changes from v1:
 - Use another variable to save the return value of THP split.
---
 mm/huge_memory.c |  4 ++--
 mm/migrate.c     | 19 ++++++++++++++++---
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ad17c8d..a79f03b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2666,7 +2666,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	 * split PMDs
 	 */
 	if (!can_split_folio(folio, &extra_pins)) {
-		ret = -EBUSY;
+		ret = -EAGAIN;
 		goto out_unlock;
 	}
 
@@ -2716,7 +2716,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 			xas_unlock(&xas);
 		local_irq_enable();
 		remap_page(folio, folio_nr_pages(folio));
-		ret = -EBUSY;
+		ret = -EAGAIN;
 	}
 
 out_unlock:
diff --git a/mm/migrate.c b/mm/migrate.c
index 1da0dbc..6d49a3e 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1506,9 +1506,22 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 				if (is_thp) {
 					nr_thp_failed++;
 					/* THP NUMA faulting doesn't split THP to retry. */
-					if (!nosplit && !try_split_thp(page, &thp_split_pages)) {
-						nr_thp_split++;
-						break;
+					if (!nosplit) {
+						int ret = try_split_thp(page, &thp_split_pages);
+
+						if (!ret) {
+							nr_thp_split++;
+							break;
+						} else if (reason == MR_LONGTERM_PIN &&
+							   ret == -EAGAIN) {
+							/*
+							 * Try again to split THP to mitigate
+							 * the failure of longterm pinning.
+							 */
+							thp_retry++;
+							nr_retry_pages += nr_subpages;
+							break;
+						}
 					}
 				} else if (!no_subpage_counting) {
 					nr_failed++;
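
The decisions made by the two patches can be checked in isolation with a small userspace model. This is only a sketch, not kernel code: classify_split_result(), final_return_code(), enum split_action and enum model_reason are hypothetical names introduced here, the MODEL_MR_* values are arbitrary stand-ins for the kernel's migrate reasons, and only the branch structure of the hunks above is mirrored (the counters and the retry loop itself are left out).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the kernel's enum migrate_reason; values are arbitrary here. */
enum model_reason {
	MODEL_MR_OTHER,
	MODEL_MR_LONGTERM_PIN,
};

/* Hypothetical names for the three outcomes of the reworked branch. */
enum split_action {
	SPLIT_DONE,	/* split succeeded: count it and migrate the subpages */
	SPLIT_RETRY,	/* transient failure: retry this THP on a later pass */
	SPLIT_FAILED,	/* fall through to the normal failure accounting */
};

/*
 * Mirrors the branch structure of the patched hunk in patch 2/2.
 * split_ret is what try_split_thp() would have returned: 0 or a
 * negative errno.
 */
enum split_action classify_split_result(bool nosplit, int split_ret,
					enum model_reason reason)
{
	assert(split_ret <= 0);

	if (nosplit)
		return SPLIT_FAILED;	/* no split is attempted at all */
	if (split_ret == 0)
		return SPLIT_DONE;
	if (reason == MODEL_MR_LONGTERM_PIN && split_ret == -EAGAIN)
		return SPLIT_RETRY;
	return SPLIT_FAILED;
}

/*
 * Mirrors the check added by patch 1/2: report success when THPs were
 * split and nothing is left on the caller's list.
 */
int final_return_code(int rc, int nr_thp_split, bool from_list_empty)
{
	if (nr_thp_split && from_list_empty)
		return 0;
	return rc;
}
```

In this model, classify_split_result(false, -EAGAIN, MODEL_MR_LONGTERM_PIN) yields SPLIT_RETRY, while the same -EAGAIN under any other reason is still treated as a failure, and final_return_code(1, 1, true) yields 0.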