Message ID | 20230214223236.58430-1-sj@kernel.org |
---|---|
State | New |
Headers |
From: SeongJae Park <sj@kernel.org>
To: akpm@linux-foundation.org
Cc: david@redhat.com, osalvador@suse.de, linux-mm@kvack.org, linux-kernel@vger.kernel.org, SeongJae Park <sj@kernel.org>
Subject: [PATCH] mm/memory_hotplug: return zero from do_migrate_range() for only success
Date: Tue, 14 Feb 2023 22:32:36 +0000
Message-Id: <20230214223236.58430-1-sj@kernel.org>
X-Mailer: git-send-email 2.25.1
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org |
Series |
mm/memory_hotplug: return zero from do_migrate_range() for only success
|
|
Commit Message
SeongJae Park
Feb. 14, 2023, 10:32 p.m. UTC
In the usual case, do_migrate_range() returns the return value of
migrate_pages(), where zero means perfect success. If every page fails
to be isolated, however, it returns the return value of
isolate_{lru,movable}_page(), or zero if all pfns were invalid, hugetlb
or hwpoisoned. So a zero return from do_migrate_range() means either
perfect success or one of these special cases of total isolation
failure.

The return value is not currently checked by any caller, so it might be
better to simply make it a void function. However, there is a TODO for
checking the return value.

Document the return value and return zero only on success, so that its
meaning is easier to understand.

Signed-off-by: SeongJae Park <sj@kernel.org>
---
mm/memory_hotplug.c | 7 +++++++
1 file changed, 7 insertions(+)
Comments
On 14.02.23 23:32, SeongJae Park wrote:
> do_migrate_range() returns migrate_pages() return value, which zero
> means perfect success, in usual cases. If all pages are failed to be
> isolated, however, it returns isolate_{lru,movable}_page() return
> values, or zero if all pfn were invalid, were hugetlb or hwpoisoned. So
> do_migrate_range() returning zero means either perfect success, or
> special cases of isolation total failure.
>
> Actually, the return value is not checked by any caller, so it might be
> better to simply make it a void function. However, there is a TODO for
> checking the return value.

I'd prefer to not add more dead code ;) Let's not return an error instead.

It's still unclear which kind of fatal migration issues we actually care
about and how to really detect them.

On Wed, 15 Feb 2023 14:16:05 +0100 David Hildenbrand <david@redhat.com> wrote:

> On 14.02.23 23:32, SeongJae Park wrote:
> > do_migrate_range() returns migrate_pages() return value, which zero
> > means perfect success, in usual cases. If all pages are failed to be
> > isolated, however, it returns isolate_{lru,movable}_page() return
> > values, or zero if all pfn were invalid, were hugetlb or hwpoisoned. So
> > do_migrate_range() returning zero means either perfect success, or
> > special cases of isolation total failure.
> >
> > Actually, the return value is not checked by any caller, so it might be
> > better to simply make it a void function. However, there is a TODO for
> > checking the return value.
>
> I'd prefer to not add more dead code ;) Let's not return an error instead.

Makes sense, I will send next spin soon.

>
> It's still unclear which kind of fatal migration issues we actually care
> about and how to really detect them.

What do you think about treating the isolation/migration rate limit
(migrate_rs) hit in do_migrate_range() as fatal? It warns for the event
already, so definitely a bad sign.

If that's not that bad enough to be treated as fatal, I think we could have yet
another rate limit to be considered fatal.


Thanks,
SJ

>
> --
> Thanks,
>
> David / dhildenb

On 15.02.23 19:03, SeongJae Park wrote:
> On Wed, 15 Feb 2023 14:16:05 +0100 David Hildenbrand <david@redhat.com> wrote:
>
>> On 14.02.23 23:32, SeongJae Park wrote:
>>> do_migrate_range() returns migrate_pages() return value, which zero
>>> means perfect success, in usual cases. If all pages are failed to be
>>> isolated, however, it returns isolate_{lru,movable}_page() return
>>> values, or zero if all pfn were invalid, were hugetlb or hwpoisoned. So
>>> do_migrate_range() returning zero means either perfect success, or
>>> special cases of isolation total failure.
>>>
>>> Actually, the return value is not checked by any caller, so it might be
>>> better to simply make it a void function. However, there is a TODO for
>>> checking the return value.
>>
>> I'd prefer to not add more dead code ;) Let's not return an error instead.
>
> Makes sense, I will send next spin soon.
>
>>
>> It's still unclear which kind of fatal migration issues we actually care
>> about and how to really detect them.
>
> What do you think about treating the isolation/migration rate limit
> (migrate_rs) hit in do_migrate_range() as fatal? It warns for the event
> already, so definitely a bad sign.
>
> If that's not that bad enough to be treated as fatal, I think we could have yet
> another rate limit to be considered fatal.

IIRC, there are some setups where offlining might take several minutes
(e.g., heavy O_DIRECT load) and that's to be expected.

So the existing code warns for better debugging, but keeps trying. So
the ratelimit is rather to not produce too much debug output, not to
really indicate that something is fatal.

On Wed, 15 Feb 2023 21:00:50 +0100 David Hildenbrand <david@redhat.com> wrote:

> On 15.02.23 19:03, SeongJae Park wrote:
> > On Wed, 15 Feb 2023 14:16:05 +0100 David Hildenbrand <david@redhat.com> wrote:
> >
> >> On 14.02.23 23:32, SeongJae Park wrote:
> >>> do_migrate_range() returns migrate_pages() return value, which zero
> >>> means perfect success, in usual cases. If all pages are failed to be
> >>> isolated, however, it returns isolate_{lru,movable}_page() return
> >>> values, or zero if all pfn were invalid, were hugetlb or hwpoisoned. So
> >>> do_migrate_range() returning zero means either perfect success, or
> >>> special cases of isolation total failure.
> >>>
> >>> Actually, the return value is not checked by any caller, so it might be
> >>> better to simply make it a void function. However, there is a TODO for
> >>> checking the return value.
> >>
> >> I'd prefer to not add more dead code ;) Let's not return an error instead.
> >
> > Makes sense, I will send next spin soon.
> >
> >>
> >> It's still unclear which kind of fatal migration issues we actually care
> >> about and how to really detect them.
> >
> > What do you think about treating the isolation/migration rate limit
> > (migrate_rs) hit in do_migrate_range() as fatal? It warns for the event
> > already, so definitely a bad sign.
> >
> > If that's not that bad enough to be treated as fatal, I think we could have yet
> > another rate limit to be considered fatal.
>
> IIRC, there are some setups where offlining might take several minutes
> (e.g., heavy O_DIRECT load) and that's to be expected.
>
> So the existing code warns for better debugging, but keeps trying. So
> the ratelimit is rather to not produce too much debug output, not to
> really indicate that something is fatal.

Thank you for clarification, David!


Thanks,
SJ

>
> --
> Thanks,
>
> David / dhildenb

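The distinction drawn in the thread, that hitting the rate limit only suppresses warnings and does not stop the retries, can be sketched in user space. This is a simplified fixed-window model and an assumption for illustration, not the kernel's ratelimit implementation: struct rl_state, rl_allow() and retry_with_limited_warnings() are hypothetical names, and time is passed in as a plain number so that the behaviour is deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed-window rate limit state (not the kernel's struct). */
struct rl_state {
	long interval;	/* window length, in arbitrary time units */
	int burst;	/* warnings allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* warnings emitted in the current window */
	int missed;	/* warnings suppressed so far */
};

/* Return true if a warning may be emitted at time 'now'. */
static bool rl_allow(struct rl_state *rs, long now)
{
	assert(rs != NULL);

	/* Start a new window once the interval has elapsed. */
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}

/*
 * Model a retry loop in which every attempt fails.  Each failure is
 * retried regardless of the rate limit; the limit only decides whether a
 * warning is emitted.  Returns the number of warnings emitted.
 */
static int retry_with_limited_warnings(struct rl_state *rs, int failures,
				       long start, long step)
{
	int warned = 0;
	int i;

	for (i = 0; i < failures; i++) {
		if (rl_allow(rs, start + i * step))
			warned++;
		/* No early exit here: a suppressed warning is not fatal. */
	}
	return warned;
}
```

With an interval of 5 and a burst of 2, seven consecutive failures at times 0 through 6 emit four warnings and suppress three, while all seven attempts still run. Treating the limit as fatal would mean adding an early exit where the comment sits, which is what the thread argues against.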
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a1e8c3e9ab08..db2c02d502a2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1620,6 +1620,12 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
 	return 0;
 }
 
+/*
+ * migrate pages in the given pfn range.
+ *
+ * Returns the number of {normal folio, large folio, hugetlb} that were not
+ * migrated, or an error code.
+ */
 static int
 do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 {
@@ -1685,6 +1691,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 		}
 		put_page(page);
 	}
+	ret = -ENOENT;
 	if (!list_empty(&source)) {
 		nodemask_t nmask = node_states[N_MEMORY];
 		struct migration_target_control mtc = {
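
The effect of the one-line change on the return value can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: enum page_outcome, model_do_migrate_range() and the patched flag are hypothetical names, -EBUSY is an assumed stand-in for whatever error an isolation helper returns, and migrate_ret stands in for the value migrate_pages() would return for the isolated pages.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-page outcome of the scan loop (not a kernel type). */
enum page_outcome {
	PAGE_SKIPPED,		/* invalid pfn, hugetlb or hwpoisoned */
	PAGE_ISOLATE_FAILED,	/* isolation helper returned an error */
	PAGE_ISOLATED,		/* page queued on the migration list */
};

/*
 * Model of the value do_migrate_range() ends up returning.
 *
 * migrate_ret: assumed migrate_pages() result for the queued pages.
 * patched:     nonzero to apply the "ret = -ENOENT" line from the patch.
 */
static int model_do_migrate_range(const enum page_outcome *pages, size_t n,
				  int migrate_ret, int patched)
{
	size_t queued = 0;
	int ret = 0;
	size_t i;

	assert(pages != NULL || n == 0);

	for (i = 0; i < n; i++) {
		switch (pages[i]) {
		case PAGE_SKIPPED:
			break;			/* ret is left untouched */
		case PAGE_ISOLATE_FAILED:
			ret = -EBUSY;		/* assumed isolation error */
			break;
		case PAGE_ISOLATED:
			queued++;
			break;
		}
	}

	if (patched)
		ret = -ENOENT;	/* default when nothing gets migrated */

	if (queued > 0)
		ret = migrate_ret;	/* list not empty: migration result wins */

	return ret;
}
```

In this model, a range consisting only of skipped pages yields 0 without the patch, which a caller could not tell apart from a fully successful migration, and -ENOENT with it. When at least one page is isolated, both variants return the migration result unchanged.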