Message ID | 20221227002859.27740-3-ying.huang@intel.com |
---|---|
State | New |
Series | migrate_pages(): batch TLB flushing |
Commit Message
Huang, Ying
Dec. 27, 2022, 12:28 a.m. UTC
This is a preparation patch to batch the folio unmapping and moving
for non-hugetlb folios. Based on that, we can batch the TLB shootdown
during folio migration and make it possible to use a hardware
accelerator for folio copying.

In this patch, the migration of hugetlb folios and non-hugetlb folios
is separated in migrate_pages() to make it easy to change the
non-hugetlb folio migration implementation.
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Bharata B Rao <bharata@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: haoxin <xhao@linux.alibaba.com>
---
mm/migrate.c | 114 ++++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 99 insertions(+), 15 deletions(-)
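
Schematically, the patch turns migrate_pages() into a two-phase
operation: a new migrate_hugetlbs() helper fully handles hugetlb folios
first, and only then does the existing retry loop run over what
remains. The outline below is a simplified sketch of the resulting
shape, drawn from the diff at the bottom of this page (declarations,
statistics, folio splitting, and most error handling elided):

	memset(&stats, 0, sizeof(stats));
	/* Phase 1: migrate hugetlb folios, with their own retry loop. */
	rc = migrate_hugetlbs(from, get_new_page, put_new_page, private,
			      mode, reason, &stats, &ret_folios);
	if (rc < 0)		/* -ENOMEM: abort the whole operation */
		goto out;
	nr_failed = rc;		/* number of hugetlb folios that failed */

	/* Phase 2: the pre-existing retry loop, now non-hugetlb only. */
	for (pass = 0; pass < 10 && (retry || large_retry); pass++) {
		list_for_each_entry_safe(folio, folio2, from, lru) {
			/* hugetlb folios left here failed all phase-1 retries */
			if (folio_test_hugetlb(folio)) {
				list_move_tail(&folio->lru, &ret_folios);
				continue;
			}
			rc = unmap_and_move(get_new_page, put_new_page,
					    private, folio, pass > 2, mode,
					    reason, &ret_folios);
			/* ... same rc handling as before ... */
		}
	}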
Comments
On Tue, 27 Dec 2022 08:28:53 +0800 Huang Ying <ying.huang@intel.com> wrote:

> This is a preparation patch to batch the folio unmapping and moving
> for non-hugetlb folios. Based on that, we can batch the TLB shootdown
> during folio migration and make it possible to use a hardware
> accelerator for folio copying.
>
> In this patch, the migration of hugetlb folios and non-hugetlb folios
> is separated in migrate_pages() to make it easy to change the
> non-hugetlb folio migration implementation.
>
> ...
>
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1404,6 +1404,87 @@ struct migrate_pages_stats {
>  	int nr_thp_split;
>  };
>
> +static int migrate_hugetlbs(struct list_head *from, new_page_t get_new_page,
> +			    free_page_t put_new_page, unsigned long private,
> +			    enum migrate_mode mode, int reason,
> +			    struct migrate_pages_stats *stats,
> +			    struct list_head *ret_folios)
> +{
> +	int retry = 1;
> +	int nr_failed = 0;
> +	int nr_retry_pages = 0;
> +	int pass = 0;
> +	struct folio *folio, *folio2;
> +	int rc = 0, nr_pages;
> +
> +	for (pass = 0; pass < 10 && retry; pass++) {

Why 10?

> [...]
>
> +out:
> +	nr_failed += retry;
> +	stats->nr_failed_pages += nr_retry_pages;
> +	if (rc != -ENOMEM)
> +		rc = nr_failed;
> +
> +	return rc;
> +}

The interpretation of the return value of this function is somewhat
unobvious.

I suggest that this function be fully commented.

Why does a retry contribute to nr_failed? What is the interpretation of
nr_failed? Etcetera.
Andrew Morton <akpm@linux-foundation.org> writes:

> On Tue, 27 Dec 2022 08:28:53 +0800 Huang Ying <ying.huang@intel.com> wrote:
>
> [...]
>
>> +	for (pass = 0; pass < 10 && retry; pass++) {
>
> Why 10?

This is inherited from the original max pass number in migrate_pages(),
which was introduced in commit 49d2e9cc4544 ("[PATCH] Swap Migration
V5: migrate_pages() function"). From the code and commit message, I
cannot find out why. I guess that we need some magic number anyway.

Now, because the magic number is used in 2 places (migrate_pages() and
migrate_hugetlbs()), would it be better to define it as a constant
macro?

> [...]
>
> The interpretation of the return value of this function is somewhat
> unobvious.
>
> I suggest that this function be fully commented.
>
> Why does a retry contribute to nr_failed? What is the interpretation of
> nr_failed? Etcetera.

Sure. Will do that in the next version.

Best Regards,
Huang, Ying
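
As a concrete illustration of the constant-macro suggestion above, the
shared retry bound could become a named constant along the following
lines. This is only a sketch; the macro name is hypothetical and not
something this patch introduces:

/* Hypothetical name for the shared maximum number of retry passes. */
#define NR_MAX_MIGRATE_PAGES_RETRY	10

	/* Used identically in migrate_hugetlbs() and migrate_pages(): */
	for (pass = 0; pass < NR_MAX_MIGRATE_PAGES_RETRY && retry; pass++) {
		/* ... */
	}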
Huang Ying <ying.huang@intel.com> writes:

> This is a preparation patch to batch the folio unmapping and moving
> for non-hugetlb folios. Based on that, we can batch the TLB shootdown
> during folio migration and make it possible to use a hardware
> accelerator for folio copying.
>
> In this patch, the migration of hugetlb folios and non-hugetlb folios
> is separated in migrate_pages() to make it easy to change the
> non-hugetlb folio migration implementation.
>
> [...]
>
> +			/*
> +			 * The rules are:
> +			 *	Success: hugetlb folio will be put back
> +			 *	-EAGAIN: stay on the from list
> +			 *	-ENOMEM: stay on the from list
> +			 *	-ENOSYS: stay on the from list
> +			 *	Other errno: put on ret_folios list
> +			 */
> +			switch(rc) {
> +			case -ENOSYS:
> +				/* Hugetlb migration is unsupported */
> +				nr_failed++;
> +				stats->nr_failed_pages += nr_pages;
> +				list_move_tail(&folio->lru, ret_folios);
> +				break;
> +			case -ENOMEM:
> +				/*
> +				 * When memory is low, don't bother to try to migrate
> +				 * other folios, just exit.
> +				 */
> +				nr_failed++;

This currently isn't relevant for -ENOMEM and I think it would be
clearer if it was dropped.

> +				stats->nr_failed_pages += nr_pages;

Makes sense not to continue migration with low memory, but shouldn't we
add the remaining unmigrated hugetlb folios to stats->nr_failed_pages
as well? I.e. don't we still have to continue the iteration to find and
account for these?

> +				goto out;

Given this is the only use of the out label, and that there is a
special case for -ENOMEM there anyway, I think it would be clearer to
return directly.

> [...]
>
> @@ -1462,30 +1549,28 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  		nr_retry_pages = 0;
>
>  		list_for_each_entry_safe(folio, folio2, from, lru) {
> +			if (folio_test_hugetlb(folio)) {

How do we hit this case? Shouldn't migrate_hugetlbs() have already
moved any hugetlb folios off the from list?

> +				list_move_tail(&folio->lru, &ret_folios);
> +				continue;
> +			}
>
> [...]
Alistair Popple <apopple@nvidia.com> writes:

> Huang Ying <ying.huang@intel.com> writes:
>
> [...]
>
>> +			case -ENOMEM:
>> +				/*
>> +				 * When memory is low, don't bother to try to migrate
>> +				 * other folios, just exit.
>> +				 */
>> +				nr_failed++;
>
> This currently isn't relevant for -ENOMEM and I think it would be
> clearer if it was dropped.

OK.

>> +				stats->nr_failed_pages += nr_pages;
>
> Makes sense not to continue migration with low memory, but shouldn't we
> add the remaining unmigrated hugetlb folios to stats->nr_failed_pages
> as well? I.e. don't we still have to continue the iteration to find and
> account for these?

I think nr_failed_pages only counts tried pages. IIUC, it's the
original behavior, and the behavior for non-hugetlb pages too.

>> +				goto out;
>
> Given this is the only use of the out label, and that there is a
> special case for -ENOMEM there anyway, I think it would be clearer to
> return directly.

Sounds good. Will do that in the next version.

>> +			if (folio_test_hugetlb(folio)) {
>
> How do we hit this case? Shouldn't migrate_hugetlbs() have already
> moved any hugetlb folios off the from list?

Retried hugetlb folios will be kept in the from list.

Best Regards,
Huang, Ying
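
Taken together, the two changes agreed above (dropping the nr_failed++
and returning directly instead of jumping to the single-use out label)
could make the -ENOMEM case look roughly like the sketch below. Folding
nr_retry_pages into the failure count on the way out is one plausible
way to preserve the accounting the out label used to do; this is an
illustration, not code from the series:

			case -ENOMEM:
				/*
				 * When memory is low, don't bother to try
				 * to migrate other folios, just bail out.
				 * Charge the pages retried so far as failed,
				 * as the out label previously did.
				 */
				stats->nr_failed_pages += nr_pages + nr_retry_pages;
				return -ENOMEM;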
"Huang, Ying" <ying.huang@intel.com> writes: > Alistair Popple <apopple@nvidia.com> writes: > >> Huang Ying <ying.huang@intel.com> writes: >> >>> This is a preparation patch to batch the folio unmapping and moving >>> for the non-hugetlb folios. Based on that we can batch the TLB >>> shootdown during the folio migration and make it possible to use some >>> hardware accelerator for the folio copying. >>> >>> In this patch the hugetlb folios and non-hugetlb folios migration is >>> separated in migrate_pages() to make it easy to change the non-hugetlb >>> folios migration implementation. >>> >>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com> >>> Cc: Zi Yan <ziy@nvidia.com> >>> Cc: Yang Shi <shy828301@gmail.com> >>> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> >>> Cc: Oscar Salvador <osalvador@suse.de> >>> Cc: Matthew Wilcox <willy@infradead.org> >>> Cc: Bharata B Rao <bharata@amd.com> >>> Cc: Alistair Popple <apopple@nvidia.com> >>> Cc: haoxin <xhao@linux.alibaba.com> >>> --- >>> mm/migrate.c | 114 ++++++++++++++++++++++++++++++++++++++++++++------- >>> 1 file changed, 99 insertions(+), 15 deletions(-) >>> >>> diff --git a/mm/migrate.c b/mm/migrate.c >>> index ec9263a33d38..bdbe73fe2eb7 100644 >>> --- a/mm/migrate.c >>> +++ b/mm/migrate.c >>> @@ -1404,6 +1404,87 @@ struct migrate_pages_stats { >>> int nr_thp_split; >>> }; >>> >>> +static int migrate_hugetlbs(struct list_head *from, new_page_t get_new_page, >>> + free_page_t put_new_page, unsigned long private, >>> + enum migrate_mode mode, int reason, >>> + struct migrate_pages_stats *stats, >>> + struct list_head *ret_folios) >>> +{ >>> + int retry = 1; >>> + int nr_failed = 0; >>> + int nr_retry_pages = 0; >>> + int pass = 0; >>> + struct folio *folio, *folio2; >>> + int rc = 0, nr_pages; >>> + >>> + for (pass = 0; pass < 10 && retry; pass++) { >>> + retry = 0; >>> + nr_retry_pages = 0; >>> + >>> + list_for_each_entry_safe(folio, folio2, from, lru) { >>> + if (!folio_test_hugetlb(folio)) >>> + continue; >>> + >>> + nr_pages = folio_nr_pages(folio); >>> + >>> + cond_resched(); >>> + >>> + rc = unmap_and_move_huge_page(get_new_page, >>> + put_new_page, private, >>> + &folio->page, pass > 2, mode, >>> + reason, ret_folios); >>> + /* >>> + * The rules are: >>> + * Success: hugetlb folio will be put back >>> + * -EAGAIN: stay on the from list >>> + * -ENOMEM: stay on the from list >>> + * -ENOSYS: stay on the from list >>> + * Other errno: put on ret_folios list >>> + */ >>> + switch(rc) { >>> + case -ENOSYS: >>> + /* Hugetlb migration is unsupported */ >>> + nr_failed++; >>> + stats->nr_failed_pages += nr_pages; >>> + list_move_tail(&folio->lru, ret_folios); >>> + break; >>> + case -ENOMEM: >>> + /* >>> + * When memory is low, don't bother to try to migrate >>> + * other folios, just exit. >>> + */ >>> + nr_failed++; >> >> This currently isn't relevant for -ENOMEM and I think it would be >> clearer if it was dropped. > > OK. > >>> + stats->nr_failed_pages += nr_pages; >> >> Makes sense not to continue migration with low memory, but shouldn't we >> add the remaining unmigrated hugetlb folios to stats->nr_failed_pages as >> well? Ie. don't we still have to continue the iteration to to find and >> account for these? > > I think nr_failed_pages only counts tried pages. IIUC, it's the > original behavior and behavior for non-hugetlb pages too. Hmm, I agree it seems this is the original behavior but that behaviour seems arbitrary and wrong IMHO. The page failed to migrate, therefore it should count as such. 
The fact we didn't even try seems irrelevant. Indeed it looks like this was introduced because it was confusing to see no failures even though migrate_pages() was called - see dfef2ef4027b ("mm, migrate: increment fail count on ENOMEM"). But that seems inconsistent - why count this one folio as failed because of the allocation failure while other folios which would also likely cause allocation failures don't get counted? Fixing it is probably outside the scope of this series so I won't insist, but it would be nice as it could still lead to confusion in some scenarios. [...] >>> @@ -1462,30 +1549,28 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page, >>> nr_retry_pages = 0; >>> >>> list_for_each_entry_safe(folio, folio2, from, lru) { >>> + if (folio_test_hugetlb(folio)) { >> >> How do we hit this case? Shouldn't migrate_hugetlbs() have already moved >> any hugetlb folios off the from list? > > Retried hugetlb folios will be kept in from list. Couldn't migrate_hugetlbs() remove the failing retried pages from the list on the final pass? That seems cleaner to me. >>> + list_move_tail(&folio->lru, &ret_folios); >>> + continue; >>> + } >>> + >>> /* >>> * Large folio statistics is based on the source large >>> * folio. Capture required information that might get >>> * lost during migration. >>> */ >>> - is_large = folio_test_large(folio) && !folio_test_hugetlb(folio); >>> + is_large = folio_test_large(folio); >>> is_thp = is_large && folio_test_pmd_mappable(folio); >>> nr_pages = folio_nr_pages(folio); >>> + >>> cond_resched(); >>> >>> - if (folio_test_hugetlb(folio)) >>> - rc = unmap_and_move_huge_page(get_new_page, >>> - put_new_page, private, >>> - &folio->page, pass > 2, mode, >>> - reason, >>> - &ret_folios); >>> - else >>> - rc = unmap_and_move(get_new_page, put_new_page, >>> - private, folio, pass > 2, mode, >>> - reason, &ret_folios); >>> + rc = unmap_and_move(get_new_page, put_new_page, >>> + private, folio, pass > 2, mode, >>> + reason, &ret_folios); >>> /* >>> * The rules are: >>> - * Success: non hugetlb folio will be freed, hugetlb >>> - * folio will be put back >>> + * Success: folio will be freed >>> * -EAGAIN: stay on the from list >>> * -ENOMEM: stay on the from list >>> * -ENOSYS: stay on the from list >>> @@ -1512,7 +1597,6 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page, >>> stats.nr_thp_split += is_thp; >>> break; >>> } >>> - /* Hugetlb migration is unsupported */ >>> } else if (!no_split_folio_counting) { >>> nr_failed++; >>> } > > Best Regards, > Huang, Ying
[snip]

>>>> +			if (folio_test_hugetlb(folio)) {
>>>
>>> How do we hit this case? Shouldn't migrate_hugetlbs() have already
>>> moved any hugetlb folios off the from list?
>>
>> Retried hugetlb folios will be kept in the from list.
>
> Couldn't migrate_hugetlbs() remove the failing retried pages from the
> list on the final pass? That seems cleaner to me.

To do that, we need to go through the folio list again to remove all
hugetlb pages. It could be time-consuming in some cases. So I think
that it's better to keep this.

Best Regards,
Huang, Ying
"Huang, Ying" <ying.huang@intel.com> writes: > [snip] > >> >>>>> @@ -1462,30 +1549,28 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page, >>>>> nr_retry_pages = 0; >>>>> >>>>> list_for_each_entry_safe(folio, folio2, from, lru) { >>>>> + if (folio_test_hugetlb(folio)) { >>>> >>>> How do we hit this case? Shouldn't migrate_hugetlbs() have already moved >>>> any hugetlb folios off the from list? >>> >>> Retried hugetlb folios will be kept in from list. >> >> Couldn't migrate_hugetlbs() remove the failing retried pages from the >> list on the final pass? That seems cleaner to me. > > To do that, we need to go through the folio list again to remove all > hugetlb pages. It could be time-consuming in some cases. So I think > that it's better to keep this. Why? Couldn't we test pass == 9 and remove it from the list if it fails the final retry in migrate_hugetlbs()? In any case if it's on the list due to failed retries we have already passed over it 10 times, so the extra loop hardly seems like a problem. - Alistair > Best Regards, > Huang, Ying > >>>>> + list_move_tail(&folio->lru, &ret_folios); >>>>> + continue; >>>>> + } >>>>> + >>>>> /* >>>>> * Large folio statistics is based on the source large >>>>> * folio. Capture required information that might get >>>>> * lost during migration. >>>>> */ >>>>> - is_large = folio_test_large(folio) && !folio_test_hugetlb(folio); >>>>> + is_large = folio_test_large(folio); >>>>> is_thp = is_large && folio_test_pmd_mappable(folio); >>>>> nr_pages = folio_nr_pages(folio); >>>>> + >>>>> cond_resched(); >>>>> >>>>> - if (folio_test_hugetlb(folio)) >>>>> - rc = unmap_and_move_huge_page(get_new_page, >>>>> - put_new_page, private, >>>>> - &folio->page, pass > 2, mode, >>>>> - reason, >>>>> - &ret_folios); >>>>> - else >>>>> - rc = unmap_and_move(get_new_page, put_new_page, >>>>> - private, folio, pass > 2, mode, >>>>> - reason, &ret_folios); >>>>> + rc = unmap_and_move(get_new_page, put_new_page, >>>>> + private, folio, pass > 2, mode, >>>>> + reason, &ret_folios); >>>>> /* >>>>> * The rules are: >>>>> - * Success: non hugetlb folio will be freed, hugetlb >>>>> - * folio will be put back >>>>> + * Success: folio will be freed >>>>> * -EAGAIN: stay on the from list >>>>> * -ENOMEM: stay on the from list >>>>> * -ENOSYS: stay on the from list >>>>> @@ -1512,7 +1597,6 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page, >>>>> stats.nr_thp_split += is_thp; >>>>> break; >>>>> } >>>>> - /* Hugetlb migration is unsupported */ >>>>> } else if (!no_split_folio_counting) { >>>>> nr_failed++; >>>>> }
Alistair Popple <apopple@nvidia.com> writes:

> "Huang, Ying" <ying.huang@intel.com> writes:
>
> [snip]
>
>> To do that, we need to go through the folio list again to remove all
>> hugetlb pages. It could be time-consuming in some cases. So I think
>> that it's better to keep this.
>
> Why? Couldn't we test pass == 9 and remove it from the list if it fails
> the final retry in migrate_hugetlbs()? In any case if it's on the list
> due to failed retries we have already passed over it 10 times, so the
> extra loop hardly seems like a problem.

Yes, that's possible. But "test pass == 9" looks more tricky than the
current code.

Feel free to change the code as you suggested on top of this series. If
no one else objects, I'm OK with that. OK?

Best Regards,
Huang, Ying
"Huang, Ying" <ying.huang@intel.com> writes: > Alistair Popple <apopple@nvidia.com> writes: > >> "Huang, Ying" <ying.huang@intel.com> writes: >> >>> [snip] >>> >>>> >>>>>>> @@ -1462,30 +1549,28 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page, >>>>>>> nr_retry_pages = 0; >>>>>>> >>>>>>> list_for_each_entry_safe(folio, folio2, from, lru) { >>>>>>> + if (folio_test_hugetlb(folio)) { >>>>>> >>>>>> How do we hit this case? Shouldn't migrate_hugetlbs() have already moved >>>>>> any hugetlb folios off the from list? >>>>> >>>>> Retried hugetlb folios will be kept in from list. >>>> >>>> Couldn't migrate_hugetlbs() remove the failing retried pages from the >>>> list on the final pass? That seems cleaner to me. >>> >>> To do that, we need to go through the folio list again to remove all >>> hugetlb pages. It could be time-consuming in some cases. So I think >>> that it's better to keep this. >> >> Why? Couldn't we test pass == 9 and remove it from the list if it fails >> the final retry in migrate_hugetlbs()? In any case if it's on the list >> due to failed retries we have already passed over it 10 times, so the >> extra loop hardly seems like a problem. > > Yes. That's possible. But "test pass == 9" looks more tricky than the > current code. > > Feel free to change the code as you suggested on top this series. If no > others object, I'm OK with that. OK? Sure. Part of my problem when reviewing this series is that everytime I look at migrate_pages(), and in particular the number of conditionals that are sufficiently non-obvious to require extensive comments, I can't help but think it all needs some refactoring before making it any more complicated. However perhaps I am alone in that. Either way this kind of refactoring has been on my TODO list for a while - I have a WIP series to converge some of the migrate_device.c code which I will need to rebase on this anyway so as you suggest I could make a lot of my suggested changes on top of this series. Regards, Alistair > Best Regards, > Huang, Ying > >>> >>>>>>> + list_move_tail(&folio->lru, &ret_folios); >>>>>>> + continue; >>>>>>> + } >>>>>>> + >>>>>>> /* >>>>>>> * Large folio statistics is based on the source large >>>>>>> * folio. Capture required information that might get >>>>>>> * lost during migration. 
>>>>>>> */ >>>>>>> - is_large = folio_test_large(folio) && !folio_test_hugetlb(folio); >>>>>>> + is_large = folio_test_large(folio); >>>>>>> is_thp = is_large && folio_test_pmd_mappable(folio); >>>>>>> nr_pages = folio_nr_pages(folio); >>>>>>> + >>>>>>> cond_resched(); >>>>>>> >>>>>>> - if (folio_test_hugetlb(folio)) >>>>>>> - rc = unmap_and_move_huge_page(get_new_page, >>>>>>> - put_new_page, private, >>>>>>> - &folio->page, pass > 2, mode, >>>>>>> - reason, >>>>>>> - &ret_folios); >>>>>>> - else >>>>>>> - rc = unmap_and_move(get_new_page, put_new_page, >>>>>>> - private, folio, pass > 2, mode, >>>>>>> - reason, &ret_folios); >>>>>>> + rc = unmap_and_move(get_new_page, put_new_page, >>>>>>> + private, folio, pass > 2, mode, >>>>>>> + reason, &ret_folios); >>>>>>> /* >>>>>>> * The rules are: >>>>>>> - * Success: non hugetlb folio will be freed, hugetlb >>>>>>> - * folio will be put back >>>>>>> + * Success: folio will be freed >>>>>>> * -EAGAIN: stay on the from list >>>>>>> * -ENOMEM: stay on the from list >>>>>>> * -ENOSYS: stay on the from list >>>>>>> @@ -1512,7 +1597,6 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page, >>>>>>> stats.nr_thp_split += is_thp; >>>>>>> break; >>>>>>> } >>>>>>> - /* Hugetlb migration is unsupported */ >>>>>>> } else if (!no_split_folio_counting) { >>>>>>> nr_failed++; >>>>>>> }
diff --git a/mm/migrate.c b/mm/migrate.c
index ec9263a33d38..bdbe73fe2eb7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1404,6 +1404,87 @@ struct migrate_pages_stats {
 	int nr_thp_split;
 };
 
+static int migrate_hugetlbs(struct list_head *from, new_page_t get_new_page,
+			    free_page_t put_new_page, unsigned long private,
+			    enum migrate_mode mode, int reason,
+			    struct migrate_pages_stats *stats,
+			    struct list_head *ret_folios)
+{
+	int retry = 1;
+	int nr_failed = 0;
+	int nr_retry_pages = 0;
+	int pass = 0;
+	struct folio *folio, *folio2;
+	int rc = 0, nr_pages;
+
+	for (pass = 0; pass < 10 && retry; pass++) {
+		retry = 0;
+		nr_retry_pages = 0;
+
+		list_for_each_entry_safe(folio, folio2, from, lru) {
+			if (!folio_test_hugetlb(folio))
+				continue;
+
+			nr_pages = folio_nr_pages(folio);
+
+			cond_resched();
+
+			rc = unmap_and_move_huge_page(get_new_page,
+						      put_new_page, private,
+						      &folio->page, pass > 2, mode,
+						      reason, ret_folios);
+			/*
+			 * The rules are:
+			 *	Success: hugetlb folio will be put back
+			 *	-EAGAIN: stay on the from list
+			 *	-ENOMEM: stay on the from list
+			 *	-ENOSYS: stay on the from list
+			 *	Other errno: put on ret_folios list
+			 */
+			switch(rc) {
+			case -ENOSYS:
+				/* Hugetlb migration is unsupported */
+				nr_failed++;
+				stats->nr_failed_pages += nr_pages;
+				list_move_tail(&folio->lru, ret_folios);
+				break;
+			case -ENOMEM:
+				/*
+				 * When memory is low, don't bother to try to migrate
+				 * other folios, just exit.
+				 */
+				nr_failed++;
+				stats->nr_failed_pages += nr_pages;
+				goto out;
+			case -EAGAIN:
+				retry++;
+				nr_retry_pages += nr_pages;
+				break;
+			case MIGRATEPAGE_SUCCESS:
+				stats->nr_succeeded += nr_pages;
+				break;
+			default:
+				/*
+				 * Permanent failure (-EBUSY, etc.):
+				 * unlike -EAGAIN case, the failed folio is
+				 * removed from migration folio list and not
+				 * retried in the next outer loop.
+				 */
+				nr_failed++;
+				stats->nr_failed_pages += nr_pages;
+				break;
+			}
+		}
+	}
+out:
+	nr_failed += retry;
+	stats->nr_failed_pages += nr_retry_pages;
+	if (rc != -ENOMEM)
+		rc = nr_failed;
+
+	return rc;
+}
+
 /*
  * migrate_pages - migrate the folios specified in a list, to the free folios
  * supplied as the target for the page migration
@@ -1437,7 +1518,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 	int retry = 1;
 	int large_retry = 1;
 	int thp_retry = 1;
-	int nr_failed = 0;
+	int nr_failed;
 	int nr_retry_pages = 0;
 	int nr_large_failed = 0;
 	int pass = 0;
@@ -1454,6 +1535,12 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 	trace_mm_migrate_pages_start(mode, reason);
 
 	memset(&stats, 0, sizeof(stats));
+	rc = migrate_hugetlbs(from, get_new_page, put_new_page, private, mode, reason,
+			      &stats, &ret_folios);
+	if (rc < 0)
+		goto out;
+	nr_failed = rc;
+
 split_folio_migration:
 	for (pass = 0; pass < 10 && (retry || large_retry); pass++) {
 		retry = 0;
@@ -1462,30 +1549,28 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 		nr_retry_pages = 0;
 
 		list_for_each_entry_safe(folio, folio2, from, lru) {
+			if (folio_test_hugetlb(folio)) {
+				list_move_tail(&folio->lru, &ret_folios);
+				continue;
+			}
+
 			/*
 			 * Large folio statistics is based on the source large
 			 * folio. Capture required information that might get
 			 * lost during migration.
 			 */
-			is_large = folio_test_large(folio) && !folio_test_hugetlb(folio);
+			is_large = folio_test_large(folio);
 			is_thp = is_large && folio_test_pmd_mappable(folio);
 			nr_pages = folio_nr_pages(folio);
+
 			cond_resched();
 
-			if (folio_test_hugetlb(folio))
-				rc = unmap_and_move_huge_page(get_new_page,
-						put_new_page, private,
-						&folio->page, pass > 2, mode,
-						reason,
-						&ret_folios);
-			else
-				rc = unmap_and_move(get_new_page, put_new_page,
-						private, folio, pass > 2, mode,
-						reason, &ret_folios);
+			rc = unmap_and_move(get_new_page, put_new_page,
+					    private, folio, pass > 2, mode,
+					    reason, &ret_folios);
 			/*
 			 * The rules are:
-			 *	Success: non hugetlb folio will be freed, hugetlb
-			 *	folio will be put back
+			 *	Success: folio will be freed
 			 *	-EAGAIN: stay on the from list
 			 *	-ENOMEM: stay on the from list
 			 *	-ENOSYS: stay on the from list
@@ -1512,7 +1597,6 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 					stats.nr_thp_split += is_thp;
 					break;
 				}
-				/* Hugetlb migration is unsupported */
 			} else if (!no_split_folio_counting) {
 				nr_failed++;
 			}