Message ID | 20230224141145.96814-4-ying.huang@intel.com |
---|---|
State | New |
Headers | From: Huang Ying <ying.huang@intel.com>; To: Andrew Morton <akpm@linux-foundation.org>; Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org; Subject: [PATCH 3/3] migrate_pages: try migrate in batch asynchronously firstly; Date: Fri, 24 Feb 2023 22:11:45 +0800; In-Reply-To: <20230224141145.96814-1-ying.huang@intel.com> |
Series | migrate_pages: fix deadlock in batched synchronous migration |
Commit Message
Huang, Ying
Feb. 24, 2023, 2:11 p.m. UTC
When more than one folio is locked, we cannot wait synchronously for a
lock or bit (e.g., the page lock, a buffer head lock, or the writeback
bit); otherwise, a deadlock may be triggered. This makes it hard to
batch synchronous migration directly.

This patch re-enables batching for synchronous migration by first
trying to migrate the whole batch asynchronously. Any folios that fail
to migrate asynchronously are then migrated synchronously, one by one.

Tests show that this effectively restores the TLB flush batching
performance for synchronous migration.
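A minimal user-space model of the strategy described above may help. Everything in it is hypothetical (`struct toy_folio`, `toy_migrate_batch_async()` and the other helpers are not kernel APIs); it only captures the control flow: a batched pass that never waits, followed by a one-folio-at-a-time pass that may wait, because holding a single folio is the same situation as before batching was introduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a folio; not the kernel's struct folio. */
struct toy_folio {
	bool busy;     /* lock or writeback bit currently held elsewhere */
	bool migrated; /* set once the folio has been moved */
};

/*
 * Batched asynchronous pass: many folios may be locked at once, so this
 * pass never waits. Busy folios are skipped and left for the fallback.
 * Returns the number of folios left unmigrated.
 */
static size_t toy_migrate_batch_async(struct toy_folio *f, size_t n)
{
	size_t left = 0;

	for (size_t i = 0; i < n; i++) {
		if (f[i].busy)
			left++;
		else
			f[i].migrated = true;
	}
	return left;
}

/*
 * Synchronous fallback: one folio at a time, so waiting cannot deadlock
 * against another folio of the same batch. The model assumes every wait
 * eventually completes.
 */
static void toy_migrate_one_by_one_sync(struct toy_folio *f, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (f[i].migrated)
			continue;
		f[i].busy = false; /* model: the wait for the lock/bit finishes */
		f[i].migrated = true;
	}
}

/*
 * Sync-mode entry point: try the whole batch asynchronously first, then
 * fall back. Returns how many folios needed the slow path.
 */
static size_t toy_migrate_sync_mode(struct toy_folio *f, size_t n)
{
	size_t left;

	assert(f != NULL || n == 0);
	left = toy_migrate_batch_async(f, n);
	if (left)
		toy_migrate_one_by_one_sync(f, n);
	return left;
}
```

In the common case where nothing is busy, the whole batch finishes in the asynchronous pass and the TLB flushes stay batched; only the contended folios pay for the serial, blocking path.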
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Xu, Pengfei" <pengfei.xu@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Stefan Roesch <shr@devkernel.io>
Cc: Tejun Heo <tj@kernel.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
---
mm/migrate.c | 65 ++++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 55 insertions(+), 10 deletions(-)
Comments
On Fri, 24 Feb 2023, Huang Ying wrote:
> [...]
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> Cc: Hugh Dickins <hughd@google.com>

I'm not sure whether my 48 hours on two machines counts for a
Tested-by: Hugh Dickins <hughd@google.com>
or not; but it certainly looks like you've fixed my deadlock.

> [...]
> ---
>  mm/migrate.c | 65 ++++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 55 insertions(+), 10 deletions(-)

I was initially disappointed, that this was more complicated than I had
thought it should be; but came to understand why. My "change the mode
to MIGRATE_ASYNC after the first" model would have condemned most of the
MIGRATE_SYNC batch of pages to be handled as lightly as MIGRATE_ASYNC:
not good enough, you're right be trying harder here.

>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 91198b487e49..c17ce5ee8d92 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1843,6 +1843,51 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
>  	return rc;
>  }
>
> +static int migrate_pages_sync(struct list_head *from, new_page_t get_new_page,
> +		free_page_t put_new_page, unsigned long private,
> +		enum migrate_mode mode, int reason, struct list_head *ret_folios,
> +		struct list_head *split_folios, struct migrate_pages_stats *stats)
> +{
> +	int rc, nr_failed = 0;
> +	LIST_HEAD(folios);
> +	struct migrate_pages_stats astats;
> +
> +	memset(&astats, 0, sizeof(astats));
> +	/* Try to migrate in batch with MIGRATE_ASYNC mode firstly */
> +	rc = migrate_pages_batch(from, get_new_page, put_new_page, private, MIGRATE_ASYNC,
> +				 reason, &folios, split_folios, &astats,
> +				 NR_MAX_MIGRATE_PAGES_RETRY);

I wonder if that and below would better be NR_MAX_MIGRATE_PAGES_RETRY / 2.

Though I've never got down to adjusting that number (and it's not a job
to be done in this set of patches), those 10 retries sometimes terrify
me, from a latency point of view. They can have such different weights:
in the unmapped case, 10 retries is okay; but when a pinned page is mapped
into 1000 processes, the thought of all that unmapping and TLB flushing
and remapping is terrifying.

Since you're retrying below, halve both numbers of retries for now?

> +	stats->nr_succeeded += astats.nr_succeeded;
> +	stats->nr_thp_succeeded += astats.nr_thp_succeeded;
> +	stats->nr_thp_split += astats.nr_thp_split;
> +	if (rc < 0) {
> +		stats->nr_failed_pages += astats.nr_failed_pages;
> +		stats->nr_thp_failed += astats.nr_thp_failed;
> +		list_splice_tail(&folios, ret_folios);
> +		return rc;
> +	}
> +	stats->nr_thp_failed += astats.nr_thp_split;
> +	nr_failed += astats.nr_thp_split;
> +	/*
> +	 * Fall back to migrate all failed folios one by one synchronously. All
> +	 * failed folios except split THPs will be retried, so their failure
> +	 * isn't counted
> +	 */
> +	list_splice_tail_init(&folios, from);
> +	while (!list_empty(from)) {
> +		list_move(from->next, &folios);
> +		rc = migrate_pages_batch(&folios, get_new_page, put_new_page,
> +					 private, mode, reason, ret_folios,
> +					 split_folios, stats, NR_MAX_MIGRATE_PAGES_RETRY);

NR_MAX_MIGRATE_PAGES_RETRY / 2 ?

> +		list_splice_tail_init(&folios, ret_folios);
> +		if (rc < 0)
> +			return rc;
> +		nr_failed += rc;
> +	}
> +
> +	return nr_failed;
> +}
> +
>  /*
>   * migrate_pages - migrate the folios specified in a list, to the free folios
>   * supplied as the target for the page migration
> @@ -1874,7 +1919,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  		enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
>  {
>  	int rc, rc_gather;
> -	int nr_pages, batch;
> +	int nr_pages;
>  	struct folio *folio, *folio2;
>  	LIST_HEAD(folios);
>  	LIST_HEAD(ret_folios);
> @@ -1890,10 +1935,6 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  	if (rc_gather < 0)
>  		goto out;
>
> -	if (mode == MIGRATE_ASYNC)
> -		batch = NR_MAX_BATCHED_MIGRATION;
> -	else
> -		batch = 1;
>  again:
>  	nr_pages = 0;
>  	list_for_each_entry_safe(folio, folio2, from, lru) {
> @@ -1904,16 +1945,20 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  		}
>
>  		nr_pages += folio_nr_pages(folio);
> -		if (nr_pages >= batch)
> +		if (nr_pages >= NR_MAX_BATCHED_MIGRATION)
>  			break;
>  	}
> -	if (nr_pages >= batch)
> +	if (nr_pages >= NR_MAX_BATCHED_MIGRATION)
>  		list_cut_before(&folios, from, &folio2->lru);
>  	else
>  		list_splice_init(from, &folios);
> -	rc = migrate_pages_batch(&folios, get_new_page, put_new_page, private,
> -			mode, reason, &ret_folios, &split_folios, &stats,
> -			NR_MAX_MIGRATE_PAGES_RETRY);
> +	if (mode == MIGRATE_ASYNC)
> +		rc = migrate_pages_batch(&folios, get_new_page, put_new_page, private,
> +				mode, reason, &ret_folios, &split_folios, &stats,
> +				NR_MAX_MIGRATE_PAGES_RETRY);
> +	else
> +		rc = migrate_pages_sync(&folios, get_new_page, put_new_page, private,
> +				mode, reason, &ret_folios, &split_folios, &stats);
>  	list_splice_tail_init(&folios, &ret_folios);
>  	if (rc < 0) {
>  		rc_gather = rc;
> -- 
> 2.39.1
Hugh Dickins <hughd@google.com> writes:

> On Fri, 24 Feb 2023, Huang Ying wrote:
>
>> [...]
>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>> Cc: Hugh Dickins <hughd@google.com>
>
> I'm not sure whether my 48 hours on two machines counts for a
> Tested-by: Hugh Dickins <hughd@google.com>
> or not; but it certainly looks like you've fixed my deadlock.

Thank you very much for testing the series!

> [...]
>> +	/* Try to migrate in batch with MIGRATE_ASYNC mode firstly */
>> +	rc = migrate_pages_batch(from, get_new_page, put_new_page, private, MIGRATE_ASYNC,
>> +				 reason, &folios, split_folios, &astats,
>> +				 NR_MAX_MIGRATE_PAGES_RETRY);
>
> I wonder if that and below would better be NR_MAX_MIGRATE_PAGES_RETRY / 2.
>
> Though I've never got down to adjusting that number (and it's not a job
> to be done in this set of patches), those 10 retries sometimes terrify
> me, from a latency point of view. They can have such different weights:
> in the unmapped case, 10 retries is okay; but when a pinned page is mapped
> into 1000 processes, the thought of all that unmapping and TLB flushing
> and remapping is terrifying.
>
> Since you're retrying below, halve both numbers of retries for now?

Yes. These are reasonable concerns.

And in the original implementation, we only wait to lock page and wait
the writeback to complete if pass > 2. This is kind of trying to
migrate asynchronously for 3 times before the real synchronous
migration. So, should we delete the "force" logic (in
migrate_folio_unmap()), and try to migrate asynchronously for 3 times in
batch before migrating synchronously for 7 times one by one?

> [...]

Best Regards,
Huang, Ying
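The "3 times ... 7 times" split in the reply above follows from the old rule it describes: with 10 passes and waiting allowed only when `pass > 2`, passes 0 to 2 never wait and passes 3 to 9 may. The small model below just counts this. `toy_force()` and `TOY_NR_RETRY` are illustrative stand-ins rather than kernel code, and the value 10 is taken from the "10 retries" discussed in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for NR_MAX_MIGRATE_PAGES_RETRY, using the "10 retries" from the thread. */
#define TOY_NR_RETRY 10

/* Old rule as described in the thread: only wait for the lock/writeback when pass > 2. */
static bool toy_force(int pass)
{
	return pass > 2;
}

/* Count how many of nr_pass passes are non-blocking vs. allowed to block. */
static void toy_count_passes(int nr_pass, int *nonblocking, int *blocking)
{
	*nonblocking = 0;
	*blocking = 0;
	for (int pass = 0; pass < nr_pass; pass++) {
		if (toy_force(pass))
			(*blocking)++;
		else
			(*nonblocking)++;
	}
}
```

Under the same rule, halving the budget to 5 passes, as suggested earlier in the thread, would leave 3 non-blocking passes and only 2 blocking ones.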
On Tue, 28 Feb 2023, Huang, Ying wrote:
> Hugh Dickins <hughd@google.com> writes:
> > On Fri, 24 Feb 2023, Huang Ying wrote:
> >> [...]
> >> +	rc = migrate_pages_batch(from, get_new_page, put_new_page, private, MIGRATE_ASYNC,
> >> +				 reason, &folios, split_folios, &astats,
> >> +				 NR_MAX_MIGRATE_PAGES_RETRY);
> >
> > I wonder if that and below would better be NR_MAX_MIGRATE_PAGES_RETRY / 2.
> >
> > Though I've never got down to adjusting that number (and it's not a job
> > to be done in this set of patches), those 10 retries sometimes terrify
> > me, from a latency point of view. They can have such different weights:
> > in the unmapped case, 10 retries is okay; but when a pinned page is mapped
> > into 1000 processes, the thought of all that unmapping and TLB flushing
> > and remapping is terrifying.
> >
> > Since you're retrying below, halve both numbers of retries for now?
>
> Yes. These are reasonable concerns.
>
> And in the original implementation, we only wait to lock page and wait
> the writeback to complete if pass > 2. This is kind of trying to
> migrate asynchronously for 3 times before the real synchronous
> migration. So, should we delete the "force" logic (in
> migrate_folio_unmap()), and try to migrate asynchronously for 3 times in
> batch before migrating synchronously for 7 times one by one?

Oh, that's a good idea (but please don't imagine I've thought it through):
I hadn't realized the way in which your migrate_pages_sync() addition is
kind of duplicating the way that the "force" argument conditions behaviour,
It would be very appealing to delete the "force" argument now if you can.

But aside from that, you've also made me wonder (again, please remember I
don't have a good picture of the new migrate_pages() sequence in my head)
whether you have already made a *great* strike against my 10 retries
terror. Am I reading it right, that the unmapping is now done on the
first try, and the remove_migration_ptes after the last try (all the
pages involved having remained locked throughout)?

Hugh
On 2/24/2023 10:11 PM, Huang Ying wrote:
> [...]
> -	rc = migrate_pages_batch(&folios, get_new_page, put_new_page, private,
> -			mode, reason, &ret_folios, &split_folios, &stats,
> -			NR_MAX_MIGRATE_PAGES_RETRY);
> +	if (mode == MIGRATE_ASYNC)
> +		rc = migrate_pages_batch(&folios, get_new_page, put_new_page, private,
> +				mode, reason, &ret_folios, &split_folios, &stats,
> +				NR_MAX_MIGRATE_PAGES_RETRY);
> +	else
> +		rc = migrate_pages_sync(&folios, get_new_page, put_new_page, private,
> +				mode, reason, &ret_folios, &split_folios, &stats);

For split folios, it seems also reasonable to use migrate_pages_sync()
instead of always using fixed MIGRATE_ASYNC mode?

>  	list_splice_tail_init(&folios, &ret_folios);
>  	if (rc < 0) {
>  		rc_gather = rc;
Hugh Dickins <hughd@google.com> writes:

> On Tue, 28 Feb 2023, Huang, Ying wrote:
>> Hugh Dickins <hughd@google.com> writes:
>> > On Fri, 24 Feb 2023, Huang Ying wrote:
>> >> [...]
>> >
>> > Since you're retrying below, halve both numbers of retries for now?
>>
>> Yes. These are reasonable concerns.
>>
>> And in the original implementation, we only wait to lock page and wait
>> the writeback to complete if pass > 2. This is kind of trying to
>> migrate asynchronously for 3 times before the real synchronous
>> migration. So, should we delete the "force" logic (in
>> migrate_folio_unmap()), and try to migrate asynchronously for 3 times in
>> batch before migrating synchronously for 7 times one by one?
>
> Oh, that's a good idea (but please don't imagine I've thought it through):
> I hadn't realized the way in which your migrate_pages_sync() addition is
> kind of duplicating the way that the "force" argument conditions behaviour,
> It would be very appealing to delete the "force" argument now if you can.

Sure. Will do that in the next version.

> But aside from that, you've also made me wonder (again, please remember I
> don't have a good picture of the new migrate_pages() sequence in my head)
> whether you have already made a *great* strike against my 10 retries
> terror. Am I reading it right, that the unmapping is now done on the
> first try, and the remove_migration_ptes after the last try (all the
> pages involved having remained locked throughout)?

Yes. You are right. Now, unmapping and moving are two separate steps,
and they are retried separately. After a folio has been unmapped
successfully, we will not remap/unmap it 10 times if the folio is pinned
so that failed to move (migrate_folio_move()). So the latency caused by
retrying is much better now.

But I still tend to keep the total retry number as before. Do you agree?

Best Regards,
Huang, Ying
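The point made in this reply can be illustrated with a simplified model that counts how often a folio has its mappings torn down and restored. In the per-pass scheme behind the "10 retries terror", every retry unmaps, tries to move, and remaps; in the split scheme, unmapping happens once, only the move step is retried, and the migration entries are replaced once after the last try. All names here are hypothetical, and the model ignores TLB batching, locking and error handling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; not the kernel's struct folio. */
struct toy_mapped_folio {
	int pinned_for; /* failing move attempts before the pin drops; < 0 means never */
	int unmaps;     /* times all mappings were replaced by migration entries */
	int remaps;     /* times migration entries were replaced by real PTEs */
	bool moved;
};

/* One attempt at the move step; fails while the folio is pinned. */
static bool toy_try_move(struct toy_mapped_folio *f)
{
	if (f->pinned_for < 0)
		return false;
	if (f->pinned_for > 0) {
		f->pinned_for--;
		return false;
	}
	return true;
}

/* Per-pass scheme: each retry does unmap, move attempt, remap. */
static void toy_migrate_per_pass(struct toy_mapped_folio *f, int nr_pass)
{
	for (int pass = 0; pass < nr_pass && !f->moved; pass++) {
		f->unmaps++;
		f->moved = toy_try_move(f);
		f->remaps++; /* to the new folio on success, back to the old one otherwise */
	}
}

/* Split scheme: unmap once, retry only the move, restore PTEs once. */
static void toy_migrate_split(struct toy_mapped_folio *f, int nr_pass)
{
	f->unmaps++;
	for (int pass = 0; pass < nr_pass && !f->moved; pass++)
		f->moved = toy_try_move(f);
	f->remaps++; /* after the last try: new folio on success, old one on failure */
}
```

For a folio that stays pinned through 10 passes, the per-pass model unmaps and remaps it 10 times, while the split model does each only once; multiplied by the number of processes mapping the folio, that is the latency difference being discussed.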
Baolin Wang <baolin.wang@linux.alibaba.com> writes:

> On 2/24/2023 10:11 PM, Huang Ying wrote:
>> When we have locked more than one folios, we cannot wait the lock or
>> bit (e.g., page lock, buffer head lock, writeback bit) synchronously.
>> Otherwise deadlock may be triggered. This make it hard to batch the
>> synchronous migration directly.
>> This patch re-enables batching synchronous migration via trying to
>> migrate in batch asynchronously firstly. And any folios that are
>> failed to be migrated asynchronously will be migrated synchronously
>> one by one.
>> Test shows that this can restore the TLB flushing batching
>> performance
>> for synchronous migration effectively.
>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>> Cc: Hugh Dickins <hughd@google.com>
>> Cc: "Xu, Pengfei" <pengfei.xu@intel.com>
>> Cc: Christoph Hellwig <hch@lst.de>
>> Cc: Stefan Roesch <shr@devkernel.io>
>> Cc: Tejun Heo <tj@kernel.org>
>> Cc: Xin Hao <xhao@linux.alibaba.com>
>> Cc: Zi Yan <ziy@nvidia.com>
>> Cc: Yang Shi <shy828301@gmail.com>
>> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>> ---
>>   mm/migrate.c | 65 ++++++++++++++++++++++++++++++++++++++++++++--------
>>   1 file changed, 55 insertions(+), 10 deletions(-)
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 91198b487e49..c17ce5ee8d92 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1843,6 +1843,51 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
>>  	return rc;
>>  }
>> +static int migrate_pages_sync(struct list_head *from, new_page_t
>> get_new_page,
>> +		free_page_t put_new_page, unsigned long private,
>> +		enum migrate_mode mode, int reason, struct list_head *ret_folios,
>> +		struct list_head *split_folios, struct migrate_pages_stats *stats)
>> +{
>> +	int rc, nr_failed = 0;
>> +	LIST_HEAD(folios);
>> +	struct migrate_pages_stats astats;
>> +
>> +	memset(&astats, 0, sizeof(astats));
>> +	/* Try to migrate in batch with MIGRATE_ASYNC mode firstly */
>> +	rc = migrate_pages_batch(from, get_new_page, put_new_page, private, MIGRATE_ASYNC,
>> +				 reason, &folios, split_folios, &astats,
>> +				 NR_MAX_MIGRATE_PAGES_RETRY);
>> +	stats->nr_succeeded += astats.nr_succeeded;
>> +	stats->nr_thp_succeeded += astats.nr_thp_succeeded;
>> +	stats->nr_thp_split += astats.nr_thp_split;
>> +	if (rc < 0) {
>> +		stats->nr_failed_pages += astats.nr_failed_pages;
>> +		stats->nr_thp_failed += astats.nr_thp_failed;
>> +		list_splice_tail(&folios, ret_folios);
>> +		return rc;
>> +	}
>> +	stats->nr_thp_failed += astats.nr_thp_split;
>> +	nr_failed += astats.nr_thp_split;
>> +	/*
>> +	 * Fall back to migrate all failed folios one by one synchronously. All
>> +	 * failed folios except split THPs will be retried, so their failure
>> +	 * isn't counted
>> +	 */
>> +	list_splice_tail_init(&folios, from);
>> +	while (!list_empty(from)) {
>> +		list_move(from->next, &folios);
>> +		rc = migrate_pages_batch(&folios, get_new_page, put_new_page,
>> +					 private, mode, reason, ret_folios,
>> +					 split_folios, stats, NR_MAX_MIGRATE_PAGES_RETRY);
>> +		list_splice_tail_init(&folios, ret_folios);
>> +		if (rc < 0)
>> +			return rc;
>> +		nr_failed += rc;
>> +	}
>> +
>> +	return nr_failed;
>> +}
>> +
>>  /*
>>   * migrate_pages - migrate the folios specified in a list, to the free folios
>>   *		   supplied as the target for the page migration
>> @@ -1874,7 +1919,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>  		enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
>>  {
>>  	int rc, rc_gather;
>> -	int nr_pages, batch;
>> +	int nr_pages;
>>  	struct folio *folio, *folio2;
>>  	LIST_HEAD(folios);
>>  	LIST_HEAD(ret_folios);
>> @@ -1890,10 +1935,6 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>  	if (rc_gather < 0)
>>  		goto out;
>> -	if (mode == MIGRATE_ASYNC)
>> -		batch = NR_MAX_BATCHED_MIGRATION;
>> -	else
>> -		batch = 1;
>>  again:
>>  	nr_pages = 0;
>>  	list_for_each_entry_safe(folio, folio2, from, lru) {
>> @@ -1904,16 +1945,20 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>  		}
>>  		nr_pages += folio_nr_pages(folio);
>> -		if (nr_pages >= batch)
>> +		if (nr_pages >= NR_MAX_BATCHED_MIGRATION)
>>  			break;
>>  	}
>> -	if (nr_pages >= batch)
>> +	if (nr_pages >= NR_MAX_BATCHED_MIGRATION)
>>  		list_cut_before(&folios, from, &folio2->lru);
>>  	else
>>  		list_splice_init(from, &folios);
>> -	rc = migrate_pages_batch(&folios, get_new_page, put_new_page, private,
>> -				 mode, reason, &ret_folios, &split_folios, &stats,
>> -				 NR_MAX_MIGRATE_PAGES_RETRY);
>> +	if (mode == MIGRATE_ASYNC)
>> +		rc = migrate_pages_batch(&folios, get_new_page, put_new_page, private,
>> +				mode, reason, &ret_folios, &split_folios, &stats,
>> +				NR_MAX_MIGRATE_PAGES_RETRY);
>> +	else
>> +		rc = migrate_pages_sync(&folios, get_new_page, put_new_page, private,
>> +				mode, reason, &ret_folios, &split_folios, &stats);
>
> For split folios, it seems also reasonable to use migrate_pages_sync()
> instead of always using fixed MIGRATE_ASYNC mode?

For split folios, we only try to migrate them with minimal effort.
Previously, we decreased the retry number from 10 to 1. Now, I think
that it's reasonable to change the migration mode to MIGRATE_ASYNC to
reduce latency. They have been counted as failures anyway.

>>  	list_splice_tail_init(&folios, &ret_folios);
>>  	if (rc < 0) {
>>  		rc_gather = rc;

Best Regards,
Huang, Ying
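Ying's answer about split folios amounts to a small policy: the folios produced by splitting a THP that failed to migrate get a single, non-blocking attempt, because the original THP has already been counted as one failure. The following is a hypothetical sketch of that decision only; the `model_*` names are invented, the enum loosely mirrors `enum migrate_mode`, and the value 10 stands in for `NR_MAX_MIGRATE_PAGES_RETRY`.

```c
#include <assert.h>
#include <stdbool.h>

/* Loosely mirrors enum migrate_mode; hypothetical names. */
enum model_migrate_mode {
	MODEL_MIGRATE_ASYNC,
	MODEL_MIGRATE_SYNC_LIGHT,
	MODEL_MIGRATE_SYNC,
};

#define MODEL_MAX_RETRY 10	/* stand-in for NR_MAX_MIGRATE_PAGES_RETRY */

struct model_attempt {
	enum model_migrate_mode mode;
	int max_retry;
};

/*
 * Pick the mode and retry budget for one round of migration.
 * Split folios get minimal effort: one pass that never blocks. Their
 * parent THP was already counted as a failure, so giving up early does
 * not change the reported failure count.
 */
static struct model_attempt model_pick_attempt(enum model_migrate_mode caller_mode,
					       bool split_folios)
{
	struct model_attempt a;

	if (split_folios) {
		a.mode = MODEL_MIGRATE_ASYNC;
		a.max_retry = 1;
	} else {
		a.mode = caller_mode;
		a.max_retry = MODEL_MAX_RETRY;
	}
	return a;
}
```

The design choice is that split folios never inherit the caller's synchronous mode, so the leftovers of a failed THP cannot add blocking waits on top of an already failed attempt.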
On Wed, 1 Mar 2023, Huang, Ying wrote:
> Hugh Dickins <hughd@google.com> writes:
> > On Tue, 28 Feb 2023, Huang, Ying wrote:
> >> Hugh Dickins <hughd@google.com> writes:
> >> > On Fri, 24 Feb 2023, Huang Ying wrote:
> >> >>
> >> >> diff --git a/mm/migrate.c b/mm/migrate.c
> >> >> index 91198b487e49..c17ce5ee8d92 100644
> >> >> --- a/mm/migrate.c
> >> >> +++ b/mm/migrate.c
> >> >> @@ -1843,6 +1843,51 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
> >> >>  	return rc;
> >> >>  }
> >> >>
> >> >> +static int migrate_pages_sync(struct list_head *from, new_page_t get_new_page,
> >> >> +		free_page_t put_new_page, unsigned long private,
> >> >> +		enum migrate_mode mode, int reason, struct list_head *ret_folios,
> >> >> +		struct list_head *split_folios, struct migrate_pages_stats *stats)
> >> >> +{
> >> >> +	int rc, nr_failed = 0;
> >> >> +	LIST_HEAD(folios);
> >> >> +	struct migrate_pages_stats astats;
> >> >> +
> >> >> +	memset(&astats, 0, sizeof(astats));
> >> >> +	/* Try to migrate in batch with MIGRATE_ASYNC mode firstly */
> >> >> +	rc = migrate_pages_batch(from, get_new_page, put_new_page, private, MIGRATE_ASYNC,
> >> >> +				 reason, &folios, split_folios, &astats,
> >> >> +				 NR_MAX_MIGRATE_PAGES_RETRY);
> >> >
> >> > I wonder if that and below would better be NR_MAX_MIGRATE_PAGES_RETRY / 2.
> >> >
> >> > Though I've never got down to adjusting that number (and it's not a job
> >> > to be done in this set of patches), those 10 retries sometimes terrify
> >> > me, from a latency point of view. They can have such different weights:
> >> > in the unmapped case, 10 retries is okay; but when a pinned page is mapped
> >> > into 1000 processes, the thought of all that unmapping and TLB flushing
> >> > and remapping is terrifying.
> >> >
> >> > Since you're retrying below, halve both numbers of retries for now?
> >>
> >> Yes. These are reasonable concerns.
> >>
> >> And in the original implementation, we only wait to lock page and wait
> >> the writeback to complete if pass > 2. This is kind of trying to
> >> migrate asynchronously for 3 times before the real synchronous
> >> migration. So, should we delete the "force" logic (in
> >> migrate_folio_unmap()), and try to migrate asynchronously for 3 times in
> >> batch before migrating synchronously for 7 times one by one?
> >
> > Oh, that's a good idea (but please don't imagine I've thought it through):
> > I hadn't realized the way in which your migrate_pages_sync() addition is
> > kind of duplicating the way that the "force" argument conditions behaviour,
> > It would be very appealing to delete the "force" argument now if you can.
>
> Sure. Will do that in the next version.
>
> > But aside from that, you've also made me wonder (again, please remember I
> > don't have a good picture of the new migrate_pages() sequence in my head)
> > whether you have already made a *great* strike against my 10 retries
> > terror. Am I reading it right, that the unmapping is now done on the
> > first try, and the remove_migration_ptes after the last try (all the
> > pages involved having remained locked throughout)?
>
> Yes. You are right. Now, unmapping and moving are two separate steps,
> and they are retried separately. After a folio has been unmapped
> successfully, we will not remap/unmap it 10 times if the folio is pinned
> so that failed to move (migrate_folio_move()). So the latency caused by
> retrying is much better now. But I still tend to keep the total retry
> number as before. Do you agree?

Yes, I agree, keep the total retry number 10 as before: maybe someone in
future will show that more than 5 is a waste of time, but there's little
need to get into that now: if you've put an end to that 10 times unmapping
and remapping, that's a great step forward, quite apart from the TLB flush
batching itself.

(I did change "no need" to "little need" above: I do have some
anxiety about the increased latencies from keeping folios locked and
migration entries in place for significantly longer than before your
batching: I won't be surprised if the maximum batch size has to be
lowered, if reports of latency spikes come in; and that might extend
to the retry count too.)

Hugh
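The budget Ying proposed and Hugh agreed to keep (10 attempts in total, split into roughly 3 batched asynchronous passes followed by 7 synchronous one-by-one passes) can be sketched as a small model. This is hypothetical, not kernel code: every name is invented, and the model assumes that a folio which needs a blocking wait (for example on its lock or on writeback) can never make progress in a non-blocking pass.

```c
#include <assert.h>
#include <stdbool.h>

struct model_budget {
	int async_passes;	/* batched, non-blocking passes */
	int sync_passes;	/* one-by-one passes that may block */
};

/* Split a fixed total so that the overall number of attempts stays the same. */
static struct model_budget model_split_budget(int total, int async_passes)
{
	struct model_budget b;

	if (async_passes > total)
		async_passes = total;
	b.async_passes = async_passes;
	b.sync_passes = total - async_passes;
	return b;
}

/*
 * Return the 1-based pass on which the folio migrates, or 0 if the budget
 * runs out. needs_wait: the folio is busy (locked, under writeback), so a
 * non-blocking pass always backs off. ready_from: first pass on which the
 * folio can be migrated at all.
 */
static int model_run(struct model_budget b, bool needs_wait, int ready_from)
{
	int pass = 0;

	for (int i = 0; i < b.async_passes; i++) {
		pass++;
		if (!needs_wait && pass >= ready_from)
			return pass;
	}
	for (int i = 0; i < b.sync_passes; i++) {
		pass++;
		if (pass >= ready_from)	/* a sync pass may sleep until the wait is over */
			return pass;
	}
	return 0;
}
```

With `model_split_budget(10, 3)`, a busy folio is first handled on pass 4 (the first synchronous pass), an idle one can succeed in the batched phase, and the total stays at 10 either way.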
Hugh Dickins <hughd@google.com> writes:

> On Wed, 1 Mar 2023, Huang, Ying wrote:
>> Hugh Dickins <hughd@google.com> writes:
>> > On Tue, 28 Feb 2023, Huang, Ying wrote:
>> >> Hugh Dickins <hughd@google.com> writes:
>> >> > On Fri, 24 Feb 2023, Huang Ying wrote:
>> >> >>
>> >> >> diff --git a/mm/migrate.c b/mm/migrate.c
>> >> >> index 91198b487e49..c17ce5ee8d92 100644
>> >> >> --- a/mm/migrate.c
>> >> >> +++ b/mm/migrate.c
>> >> >> @@ -1843,6 +1843,51 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
>> >> >>  	return rc;
>> >> >>  }
>> >> >>
>> >> >> +static int migrate_pages_sync(struct list_head *from, new_page_t get_new_page,
>> >> >> +		free_page_t put_new_page, unsigned long private,
>> >> >> +		enum migrate_mode mode, int reason, struct list_head *ret_folios,
>> >> >> +		struct list_head *split_folios, struct migrate_pages_stats *stats)
>> >> >> +{
>> >> >> +	int rc, nr_failed = 0;
>> >> >> +	LIST_HEAD(folios);
>> >> >> +	struct migrate_pages_stats astats;
>> >> >> +
>> >> >> +	memset(&astats, 0, sizeof(astats));
>> >> >> +	/* Try to migrate in batch with MIGRATE_ASYNC mode firstly */
>> >> >> +	rc = migrate_pages_batch(from, get_new_page, put_new_page, private, MIGRATE_ASYNC,
>> >> >> +				 reason, &folios, split_folios, &astats,
>> >> >> +				 NR_MAX_MIGRATE_PAGES_RETRY);
>> >> >
>> >> > I wonder if that and below would better be NR_MAX_MIGRATE_PAGES_RETRY / 2.
>> >> >
>> >> > Though I've never got down to adjusting that number (and it's not a job
>> >> > to be done in this set of patches), those 10 retries sometimes terrify
>> >> > me, from a latency point of view. They can have such different weights:
>> >> > in the unmapped case, 10 retries is okay; but when a pinned page is mapped
>> >> > into 1000 processes, the thought of all that unmapping and TLB flushing
>> >> > and remapping is terrifying.
>> >> >
>> >> > Since you're retrying below, halve both numbers of retries for now?
>> >>
>> >> Yes. These are reasonable concerns.
>> >>
>> >> And in the original implementation, we only wait to lock page and wait
>> >> the writeback to complete if pass > 2. This is kind of trying to
>> >> migrate asynchronously for 3 times before the real synchronous
>> >> migration. So, should we delete the "force" logic (in
>> >> migrate_folio_unmap()), and try to migrate asynchronously for 3 times in
>> >> batch before migrating synchronously for 7 times one by one?
>> >
>> > Oh, that's a good idea (but please don't imagine I've thought it through):
>> > I hadn't realized the way in which your migrate_pages_sync() addition is
>> > kind of duplicating the way that the "force" argument conditions behaviour,
>> > It would be very appealing to delete the "force" argument now if you can.
>>
>> Sure. Will do that in the next version.
>>
>> > But aside from that, you've also made me wonder (again, please remember I
>> > don't have a good picture of the new migrate_pages() sequence in my head)
>> > whether you have already made a *great* strike against my 10 retries
>> > terror. Am I reading it right, that the unmapping is now done on the
>> > first try, and the remove_migration_ptes after the last try (all the
>> > pages involved having remained locked throughout)?
>>
>> Yes. You are right. Now, unmapping and moving are two separate steps,
>> and they are retried separately. After a folio has been unmapped
>> successfully, we will not remap/unmap it 10 times if the folio is pinned
>> so that failed to move (migrate_folio_move()). So the latency caused by
>> retrying is much better now. But I still tend to keep the total retry
>> number as before. Do you agree?
>
> Yes, I agree, keep the total retry number 10 as before: maybe someone in
> future will show that more than 5 is a waste of time, but there's little
> need to get into that now: if you've put an end to that 10 times unmapping
> and remapping, that's a great step forward, quite apart from the TLB flush
> batching itself.
>
> (I did change "no need" to "little need" above: I do have some
> anxiety about the increased latencies from keeping folios locked and
> migration entries in place for significantly longer than before your
> batching: I won't be surprised if the maximum batch size has to be
> lowered, if reports of latency spikes come in; and that might extend
> to the retry count too.)

Yes. Latency is always a concern for batching. We may revisit this when
needed. One good thing now is that we never wait for the lock or bit in
batched mode.

Latency tolerance depends on the caller too. For example, when we
migrate some cold pages from DRAM to CXL MEM, we can tolerate relatively
long latency. So, when necessary, we can also add a parameter to
migrate_pages() to restrict the batch size and the retry number.

Best Regards,
Huang, Ying
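Ying's idea of a migrate_pages() parameter that lets latency-sensitive callers restrict the batch size and retry number could take a shape like the following. No such parameter exists in the patch: `struct model_migrate_limits` and the helpers are hypothetical, and the defaults of 512 pages and 10 retries are assumptions standing in for `NR_MAX_BATCHED_MIGRATION` and `NR_MAX_MIGRATE_PAGES_RETRY`. In this sketch a caller may only tighten the defaults, never loosen them, and batches are counted in base pages only.

```c
#include <assert.h>

#define MODEL_DEFAULT_BATCH_PAGES 512	/* assumed value of NR_MAX_BATCHED_MIGRATION */
#define MODEL_DEFAULT_RETRY 10		/* assumed value of NR_MAX_MIGRATE_PAGES_RETRY */

/* Hypothetical per-call limits; 0 means "use the default". */
struct model_migrate_limits {
	unsigned int max_batch_pages;
	unsigned int max_retry;
};

/* A caller may lower a limit, but never raise it above the default. */
static unsigned int model_effective(unsigned int requested, unsigned int dflt)
{
	if (requested == 0 || requested > dflt)
		return dflt;
	return requested;
}

/* Number of batches needed for nr_pages base pages at a given batch size. */
static unsigned int model_nr_batches(unsigned int nr_pages, unsigned int batch_pages)
{
	assert(batch_pages > 0);
	return (nr_pages + batch_pages - 1) / batch_pages;
}
```

The trade-off is the one Hugh raised: a smaller batch keeps fewer folios locked, with migration entries installed, at any one time, at the cost of less TLB-flush batching. For 1000 base pages, a 64-page limit means 16 batches instead of 2 with the assumed 512-page default.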
On 3/1/2023 2:18 PM, Huang, Ying wrote:
> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
>
>> On 2/24/2023 10:11 PM, Huang Ying wrote:
>>> When we have locked more than one folios, we cannot wait the lock or
>>> bit (e.g., page lock, buffer head lock, writeback bit) synchronously.
>>> Otherwise deadlock may be triggered. This make it hard to batch the
>>> synchronous migration directly.
>>> This patch re-enables batching synchronous migration via trying to
>>> migrate in batch asynchronously firstly. And any folios that are
>>> failed to be migrated asynchronously will be migrated synchronously
>>> one by one.
>>> Test shows that this can restore the TLB flushing batching
>>> performance
>>> for synchronous migration effectively.
>>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>>> Cc: Hugh Dickins <hughd@google.com>
>>> Cc: "Xu, Pengfei" <pengfei.xu@intel.com>
>>> Cc: Christoph Hellwig <hch@lst.de>
>>> Cc: Stefan Roesch <shr@devkernel.io>
>>> Cc: Tejun Heo <tj@kernel.org>
>>> Cc: Xin Hao <xhao@linux.alibaba.com>
>>> Cc: Zi Yan <ziy@nvidia.com>
>>> Cc: Yang Shi <shy828301@gmail.com>
>>> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> Cc: Matthew Wilcox <willy@infradead.org>
>>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>>> ---
>>>   mm/migrate.c | 65 ++++++++++++++++++++++++++++++++++++++++++++--------
>>>   1 file changed, 55 insertions(+), 10 deletions(-)
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index 91198b487e49..c17ce5ee8d92 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -1843,6 +1843,51 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
>>>  	return rc;
>>>  }
>>> +static int migrate_pages_sync(struct list_head *from, new_page_t
>>> get_new_page,
>>> +		free_page_t put_new_page, unsigned long private,
>>> +		enum migrate_mode mode, int reason, struct list_head *ret_folios,
>>> +		struct list_head *split_folios, struct migrate_pages_stats *stats)
>>> +{
>>> +	int rc, nr_failed = 0;
>>> +	LIST_HEAD(folios);
>>> +	struct migrate_pages_stats astats;
>>> +
>>> +	memset(&astats, 0, sizeof(astats));
>>> +	/* Try to migrate in batch with MIGRATE_ASYNC mode firstly */
>>> +	rc = migrate_pages_batch(from, get_new_page, put_new_page, private, MIGRATE_ASYNC,
>>> +				 reason, &folios, split_folios, &astats,
>>> +				 NR_MAX_MIGRATE_PAGES_RETRY);
>>> +	stats->nr_succeeded += astats.nr_succeeded;
>>> +	stats->nr_thp_succeeded += astats.nr_thp_succeeded;
>>> +	stats->nr_thp_split += astats.nr_thp_split;
>>> +	if (rc < 0) {
>>> +		stats->nr_failed_pages += astats.nr_failed_pages;
>>> +		stats->nr_thp_failed += astats.nr_thp_failed;
>>> +		list_splice_tail(&folios, ret_folios);
>>> +		return rc;
>>> +	}
>>> +	stats->nr_thp_failed += astats.nr_thp_split;
>>> +	nr_failed += astats.nr_thp_split;
>>> +	/*
>>> +	 * Fall back to migrate all failed folios one by one synchronously. All
>>> +	 * failed folios except split THPs will be retried, so their failure
>>> +	 * isn't counted
>>> +	 */
>>> +	list_splice_tail_init(&folios, from);
>>> +	while (!list_empty(from)) {
>>> +		list_move(from->next, &folios);
>>> +		rc = migrate_pages_batch(&folios, get_new_page, put_new_page,
>>> +					 private, mode, reason, ret_folios,
>>> +					 split_folios, stats, NR_MAX_MIGRATE_PAGES_RETRY);
>>> +		list_splice_tail_init(&folios, ret_folios);
>>> +		if (rc < 0)
>>> +			return rc;
>>> +		nr_failed += rc;
>>> +	}
>>> +
>>> +	return nr_failed;
>>> +}
>>> +
>>>  /*
>>>   * migrate_pages - migrate the folios specified in a list, to the free folios
>>>   *		   supplied as the target for the page migration
>>> @@ -1874,7 +1919,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>  		enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
>>>  {
>>>  	int rc, rc_gather;
>>> -	int nr_pages, batch;
>>> +	int nr_pages;
>>>  	struct folio *folio, *folio2;
>>>  	LIST_HEAD(folios);
>>>  	LIST_HEAD(ret_folios);
>>> @@ -1890,10 +1935,6 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>  	if (rc_gather < 0)
>>>  		goto out;
>>> -	if (mode == MIGRATE_ASYNC)
>>> -		batch = NR_MAX_BATCHED_MIGRATION;
>>> -	else
>>> -		batch = 1;
>>>  again:
>>>  	nr_pages = 0;
>>>  	list_for_each_entry_safe(folio, folio2, from, lru) {
>>> @@ -1904,16 +1945,20 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>  		}
>>>  		nr_pages += folio_nr_pages(folio);
>>> -		if (nr_pages >= batch)
>>> +		if (nr_pages >= NR_MAX_BATCHED_MIGRATION)
>>>  			break;
>>>  	}
>>> -	if (nr_pages >= batch)
>>> +	if (nr_pages >= NR_MAX_BATCHED_MIGRATION)
>>>  		list_cut_before(&folios, from, &folio2->lru);
>>>  	else
>>>  		list_splice_init(from, &folios);
>>> -	rc = migrate_pages_batch(&folios, get_new_page, put_new_page, private,
>>> -				 mode, reason, &ret_folios, &split_folios, &stats,
>>> -				 NR_MAX_MIGRATE_PAGES_RETRY);
>>> +	if (mode == MIGRATE_ASYNC)
>>> +		rc = migrate_pages_batch(&folios, get_new_page, put_new_page, private,
>>> +				mode, reason, &ret_folios, &split_folios, &stats,
>>> +				NR_MAX_MIGRATE_PAGES_RETRY);
>>> +	else
>>> +		rc = migrate_pages_sync(&folios, get_new_page, put_new_page, private,
>>> +				mode, reason, &ret_folios, &split_folios, &stats);
>>
>> For split folios, it seems also reasonable to use migrate_pages_sync()
>> instead of always using fixed MIGRATE_ASYNC mode?
>
> For split folios, we only try to migrate them with minimal effort.
> Previously, we decrease the retry number from 10 to 1. Now, I think
> that it's reasonable to change the migration mode to MIGRATE_ASYNC to
> reduce latency. They have been counted as failure anyway.

Sounds reasonable. Thanks for the explanation.

Please feel free to add:
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
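The rule in the quoted commit message (once more than one folio is locked, never sleep waiting for another lock or bit) can be illustrated in userspace with POSIX mutexes standing in for folio locks. This is a hypothetical analogue, not kernel code: the batched pass only uses `pthread_mutex_trylock()` and leaves busy folios behind, and only the one-by-one fallback, which holds no other folio lock, is allowed to block in `pthread_mutex_lock()`. That roughly mirrors how a `MIGRATE_ASYNC` attempt backs off when `folio_trylock()` fails; `struct model_folio` and the helpers are invented names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a folio and its lock. */
struct model_folio {
	pthread_mutex_t lock;
	bool locked_by_batch;
};

/*
 * Batched pass: try to lock every folio without sleeping. Blocking here
 * while already holding other folio locks could deadlock against a task
 * that holds this lock and waits for one of ours, so busy folios are
 * skipped and left for the synchronous fallback. Returns how many were locked.
 */
static size_t model_lock_batch(struct model_folio *f, size_t n)
{
	size_t locked = 0;

	for (size_t i = 0; i < n; i++) {
		f[i].locked_by_batch = pthread_mutex_trylock(&f[i].lock) == 0;
		if (f[i].locked_by_batch)
			locked++;
	}
	return locked;
}

static void model_unlock_batch(struct model_folio *f, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (f[i].locked_by_batch) {
			pthread_mutex_unlock(&f[i].lock);
			f[i].locked_by_batch = false;
		}
	}
}

/* Synchronous fallback: one folio at a time, no other folio lock held, so sleeping is safe. */
static void model_lock_one(struct model_folio *f)
{
	pthread_mutex_lock(&f->lock);
}

static void model_unlock_one(struct model_folio *f)
{
	pthread_mutex_unlock(&f->lock);
}
```

Holding at most one lock while blocking is what removes the possibility of a lock-ordering cycle, which is why the patch retries leftover folios one by one rather than in a synchronous batch.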
diff --git a/mm/migrate.c b/mm/migrate.c
index 91198b487e49..c17ce5ee8d92 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1843,6 +1843,51 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 	return rc;
 }
 
+static int migrate_pages_sync(struct list_head *from, new_page_t get_new_page,
+		free_page_t put_new_page, unsigned long private,
+		enum migrate_mode mode, int reason, struct list_head *ret_folios,
+		struct list_head *split_folios, struct migrate_pages_stats *stats)
+{
+	int rc, nr_failed = 0;
+	LIST_HEAD(folios);
+	struct migrate_pages_stats astats;
+
+	memset(&astats, 0, sizeof(astats));
+	/* Try to migrate in batch with MIGRATE_ASYNC mode firstly */
+	rc = migrate_pages_batch(from, get_new_page, put_new_page, private, MIGRATE_ASYNC,
+				 reason, &folios, split_folios, &astats,
+				 NR_MAX_MIGRATE_PAGES_RETRY);
+	stats->nr_succeeded += astats.nr_succeeded;
+	stats->nr_thp_succeeded += astats.nr_thp_succeeded;
+	stats->nr_thp_split += astats.nr_thp_split;
+	if (rc < 0) {
+		stats->nr_failed_pages += astats.nr_failed_pages;
+		stats->nr_thp_failed += astats.nr_thp_failed;
+		list_splice_tail(&folios, ret_folios);
+		return rc;
+	}
+	stats->nr_thp_failed += astats.nr_thp_split;
+	nr_failed += astats.nr_thp_split;
+	/*
+	 * Fall back to migrate all failed folios one by one synchronously. All
+	 * failed folios except split THPs will be retried, so their failure
+	 * isn't counted
+	 */
+	list_splice_tail_init(&folios, from);
+	while (!list_empty(from)) {
+		list_move(from->next, &folios);
+		rc = migrate_pages_batch(&folios, get_new_page, put_new_page,
+					 private, mode, reason, ret_folios,
+					 split_folios, stats, NR_MAX_MIGRATE_PAGES_RETRY);
+		list_splice_tail_init(&folios, ret_folios);
+		if (rc < 0)
+			return rc;
+		nr_failed += rc;
+	}
+
+	return nr_failed;
+}
+
 /*
  * migrate_pages - migrate the folios specified in a list, to the free folios
  *		   supplied as the target for the page migration
@@ -1874,7 +1919,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 		enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
 {
 	int rc, rc_gather;
-	int nr_pages, batch;
+	int nr_pages;
 	struct folio *folio, *folio2;
 	LIST_HEAD(folios);
 	LIST_HEAD(ret_folios);
@@ -1890,10 +1935,6 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 	if (rc_gather < 0)
 		goto out;
 
-	if (mode == MIGRATE_ASYNC)
-		batch = NR_MAX_BATCHED_MIGRATION;
-	else
-		batch = 1;
 again:
 	nr_pages = 0;
 	list_for_each_entry_safe(folio, folio2, from, lru) {
@@ -1904,16 +1945,20 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 		}
 
 		nr_pages += folio_nr_pages(folio);
-		if (nr_pages >= batch)
+		if (nr_pages >= NR_MAX_BATCHED_MIGRATION)
 			break;
 	}
-	if (nr_pages >= batch)
+	if (nr_pages >= NR_MAX_BATCHED_MIGRATION)
 		list_cut_before(&folios, from, &folio2->lru);
 	else
 		list_splice_init(from, &folios);
-	rc = migrate_pages_batch(&folios, get_new_page, put_new_page, private,
-				 mode, reason, &ret_folios, &split_folios, &stats,
-				 NR_MAX_MIGRATE_PAGES_RETRY);
+	if (mode == MIGRATE_ASYNC)
+		rc = migrate_pages_batch(&folios, get_new_page, put_new_page, private,
+				mode, reason, &ret_folios, &split_folios, &stats,
+				NR_MAX_MIGRATE_PAGES_RETRY);
+	else
+		rc = migrate_pages_sync(&folios, get_new_page, put_new_page, private,
+				mode, reason, &ret_folios, &split_folios, &stats);
 	list_splice_tail_init(&folios, &ret_folios);
 	if (rc < 0) {
 		rc_gather = rc;
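The control flow of `migrate_pages_sync()` in the patch (one batched `MIGRATE_ASYNC` pass over the whole list, then each leftover folio retried on its own in the caller's synchronous mode) can be modeled over a plain array. This is a simplified, hypothetical sketch: per-folio outcomes are given up front, retries and THP splitting are ignored, and `struct model_item` and `struct model_stats` are invented names rather than the kernel's `struct migrate_pages_stats`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-folio outcome that drives the model. */
struct model_item {
	bool async_ok;	/* would migrate without blocking */
	bool sync_ok;	/* would migrate if allowed to block */
	bool done;
};

struct model_stats {
	int nr_succeeded;
	int nr_failed;
	int nr_sync_calls;	/* one-folio synchronous batches issued */
};

static int model_migrate_pages_sync(struct model_item *items, size_t n,
				    struct model_stats *st)
{
	int nr_failed = 0;

	/* Phase 1: one batched async pass; TLB flushes can be batched here. */
	for (size_t i = 0; i < n; i++) {
		if (items[i].async_ok) {
			items[i].done = true;
			st->nr_succeeded++;
		}
	}

	/*
	 * Phase 2: retry the leftovers one by one in sync mode. Async
	 * failures are not counted, because every such folio gets this
	 * second chance; only failures here are.
	 */
	for (size_t i = 0; i < n; i++) {
		if (items[i].done)
			continue;
		st->nr_sync_calls++;
		if (items[i].sync_ok) {
			items[i].done = true;
			st->nr_succeeded++;
		} else {
			nr_failed++;
		}
	}

	st->nr_failed += nr_failed;
	return nr_failed;
}
```

In this model, the TLB-flush batching is lost only for the folios that actually need to block, which is consistent with the commit message's claim that the patch restores batching for synchronous migration.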