[v3] sched/numa, mm: do not promote folios to nodes not set N_MEMORY

Message ID 20240216114045.24828-1-byungchul@sk.com
State New
Headers
Series [v3] sched/numa, mm: do not promote folios to nodes not set N_MEMORY |

Commit Message

Byungchul Park Feb. 16, 2024, 11:40 a.m. UTC
  Changes from v2:
	1. Rewrite the comment in code and the commit message becasue it
	   turns out that this patch is not the real fix for the oops
	   descriped. The real fix goes in another patch below:

	   https://lore.kernel.org/lkml/20240216111502.79759-1-byungchul@sk.com/

Changes from v1:
	1. Trim the verbose oops in the commit message. (feedbacked by
	   Phil Auld)
	2. Rewrite a comment in code. (feedbacked by Phil Auld)

--->8---
From 150af2f78e19217a1d03e47e3ee5279684590fb4 Mon Sep 17 00:00:00 2001
From: Byungchul Park <byungchul@sk.com>
Date: Fri, 16 Feb 2024 20:18:10 +0900
Subject: [PATCH v3] sched/numa, mm: do not promote folios to nodes not set N_MEMORY

A numa node might not have its local memory but CPUs. Promoting a folio
to the node's local memory is nonsense. So avoid nodes not set N_MEMORY
from getting promoted.

Signed-off-by: Byungchul Park <byungchul@sk.com>
---
 kernel/sched/fair.c | 7 +++++++
 1 file changed, 7 insertions(+)
  

Comments

Oscar Salvador Feb. 16, 2024, 1:51 p.m. UTC | #1
On Fri, Feb 16, 2024 at 08:40:45PM +0900, Byungchul Park wrote:
> From 150af2f78e19217a1d03e47e3ee5279684590fb4 Mon Sep 17 00:00:00 2001
> From: Byungchul Park <byungchul@sk.com>
> Date: Fri, 16 Feb 2024 20:18:10 +0900
> Subject: [PATCH v3] sched/numa, mm: do not promote folios to nodes not set N_MEMORY

"do not try to promote folios to memoryless nodes"

because AFAICS we are just trying.
Even if should_numa_migrate_memory() returns true, I assume that we will
fail somewhere down the chain e.g: migrate_pages() when we see that this
node does not any memory, right?

> A numa node might not have its local memory but CPUs. Promoting a folio
> to the node's local memory is nonsense. So avoid nodes not set N_MEMORY
> from getting promoted.

If you talk about memoryless nodes everybody gets it better IMHO.
"Memoryless nodes do not have any memory to migrate to, so stop trying it."


> Signed-off-by: Byungchul Park <byungchul@sk.com>
> ---
>  kernel/sched/fair.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index d7a3c63a2171..7ed9ef3c0134 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1828,6 +1828,13 @@ bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
>  	int dst_nid = cpu_to_node(dst_cpu);
>  	int last_cpupid, this_cpupid;
>  
> +	/*
> +	 * A node of dst_nid might not have its local memory. Promoting
> +	 * a folio to the node is meaningless.
> +	 */
> +	if (!node_state(dst_nid, N_MEMORY))
> +		return false;

"Cannot migrate to memoryless nodes"

seems shorter and more clear.

So, what happens when we return true here? will we fail at
migrate_pages() I guess? That is quite down the road so I guess
this check can save us some time.
  
Oscar Salvador Feb. 18, 2024, 7:46 a.m. UTC | #2
On Fri, Feb 16, 2024 at 02:51:24PM +0100, Oscar Salvador wrote:
> On Fri, Feb 16, 2024 at 08:40:45PM +0900, Byungchul Park wrote:
> > From 150af2f78e19217a1d03e47e3ee5279684590fb4 Mon Sep 17 00:00:00 2001
> > From: Byungchul Park <byungchul@sk.com>
> > Date: Fri, 16 Feb 2024 20:18:10 +0900
> > Subject: [PATCH v3] sched/numa, mm: do not promote folios to nodes not set N_MEMORY
> 
> "do not try to promote folios to memoryless nodes"

Thinking some more, promote might be misleading, just something like
"do not try to migrate memory to memoryless nodes".

As this is not a bug fix but an optimization, as we will fail anyways
in migrate_misplaced_folio() when migrate_balanced_pgdat() notices that
we do not have any memory on that code.

With the other comments addressed:

Reviewed-by: Oscar Salvador <osalvador@suse.de>

> 
> because AFAICS we are just trying.
> Even if should_numa_migrate_memory() returns true, I assume that we will
> fail somewhere down the chain e.g: migrate_pages() when we see that this
> node does not any memory, right?
> 
> > A numa node might not have its local memory but CPUs. Promoting a folio
> > to the node's local memory is nonsense. So avoid nodes not set N_MEMORY
> > from getting promoted.
> 
> If you talk about memoryless nodes everybody gets it better IMHO.
> "Memoryless nodes do not have any memory to migrate to, so stop trying it."
> 
> 
> > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > ---
> >  kernel/sched/fair.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index d7a3c63a2171..7ed9ef3c0134 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -1828,6 +1828,13 @@ bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
> >  	int dst_nid = cpu_to_node(dst_cpu);
> >  	int last_cpupid, this_cpupid;
> >  
> > +	/*
> > +	 * A node of dst_nid might not have its local memory. Promoting
> > +	 * a folio to the node is meaningless.
> > +	 */
> > +	if (!node_state(dst_nid, N_MEMORY))
> > +		return false;
> 
> "Cannot migrate to memoryless nodes"
> 
> seems shorter and more clear.
> 
> So, what happens when we return true here? will we fail at
> migrate_pages() I guess? That is quite down the road so I guess
> this check can save us some time.
> 
> 
> -- 
> Oscar Salvador
> SUSE Labs
>
  
Byungchul Park Feb. 19, 2024, 2:08 a.m. UTC | #3
On Fri, Feb 16, 2024 at 02:51:24PM +0100, Oscar Salvador wrote:
> On Fri, Feb 16, 2024 at 08:40:45PM +0900, Byungchul Park wrote:
> > From 150af2f78e19217a1d03e47e3ee5279684590fb4 Mon Sep 17 00:00:00 2001
> > From: Byungchul Park <byungchul@sk.com>
> > Date: Fri, 16 Feb 2024 20:18:10 +0900
> > Subject: [PATCH v3] sched/numa, mm: do not promote folios to nodes not set N_MEMORY
> 
> "do not try to promote folios to memoryless nodes"

Thank you. I will.

> because AFAICS we are just trying.
> Even if should_numa_migrate_memory() returns true, I assume that we will
> fail somewhere down the chain e.g: migrate_pages() when we see that this
> node does not any memory, right?

Yes.

> > A numa node might not have its local memory but CPUs. Promoting a folio
> > to the node's local memory is nonsense. So avoid nodes not set N_MEMORY
> > from getting promoted.
> 
> If you talk about memoryless nodes everybody gets it better IMHO.
> "Memoryless nodes do not have any memory to migrate to, so stop trying it."

Much better.

> > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > ---
> >  kernel/sched/fair.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index d7a3c63a2171..7ed9ef3c0134 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -1828,6 +1828,13 @@ bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
> >  	int dst_nid = cpu_to_node(dst_cpu);
> >  	int last_cpupid, this_cpupid;
> >  
> > +	/*
> > +	 * A node of dst_nid might not have its local memory. Promoting
> > +	 * a folio to the node is meaningless.
> > +	 */
> > +	if (!node_state(dst_nid, N_MEMORY))
> > +		return false;
> 
> "Cannot migrate to memoryless nodes"
> 
> seems shorter and more clear.

Agree.

	Byungchul

> So, what happens when we return true here? will we fail at
> migrate_pages() I guess? That is quite down the road so I guess
> this check can save us some time.
> 
> 
> -- 
> Oscar Salvador
> SUSE Labs
  
Byungchul Park Feb. 19, 2024, 2:10 a.m. UTC | #4
On Sun, Feb 18, 2024 at 08:46:16AM +0100, Oscar Salvador wrote:
> On Fri, Feb 16, 2024 at 02:51:24PM +0100, Oscar Salvador wrote:
> > On Fri, Feb 16, 2024 at 08:40:45PM +0900, Byungchul Park wrote:
> > > From 150af2f78e19217a1d03e47e3ee5279684590fb4 Mon Sep 17 00:00:00 2001
> > > From: Byungchul Park <byungchul@sk.com>
> > > Date: Fri, 16 Feb 2024 20:18:10 +0900
> > > Subject: [PATCH v3] sched/numa, mm: do not promote folios to nodes not set N_MEMORY
> > 
> > "do not try to promote folios to memoryless nodes"
> 
> Thinking some more, promote might be misleading, just something like
> "do not try to migrate memory to memoryless nodes".

Thank you. I will.

> As this is not a bug fix but an optimization, as we will fail anyways
> in migrate_misplaced_folio() when migrate_balanced_pgdat() notices that
> we do not have any memory on that code.
> 
> With the other comments addressed:
> 
> Reviewed-by: Oscar Salvador <osalvador@suse.de>

Thank you for reviewing.

	Byungchul

> > because AFAICS we are just trying.
> > Even if should_numa_migrate_memory() returns true, I assume that we will
> > fail somewhere down the chain e.g: migrate_pages() when we see that this
> > node does not any memory, right?
> > 
> > > A numa node might not have its local memory but CPUs. Promoting a folio
> > > to the node's local memory is nonsense. So avoid nodes not set N_MEMORY
> > > from getting promoted.
> > 
> > If you talk about memoryless nodes everybody gets it better IMHO.
> > "Memoryless nodes do not have any memory to migrate to, so stop trying it."
> > 
> > 
> > > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > > ---
> > >  kernel/sched/fair.c | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > > 
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index d7a3c63a2171..7ed9ef3c0134 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -1828,6 +1828,13 @@ bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
> > >  	int dst_nid = cpu_to_node(dst_cpu);
> > >  	int last_cpupid, this_cpupid;
> > >  
> > > +	/*
> > > +	 * A node of dst_nid might not have its local memory. Promoting
> > > +	 * a folio to the node is meaningless.
> > > +	 */
> > > +	if (!node_state(dst_nid, N_MEMORY))
> > > +		return false;
> > 
> > "Cannot migrate to memoryless nodes"
> > 
> > seems shorter and more clear.
> > 
> > So, what happens when we return true here? will we fail at
> > migrate_pages() I guess? That is quite down the road so I guess
> > this check can save us some time.
> > 
> > 
> > -- 
> > Oscar Salvador
> > SUSE Labs
> > 
> 
> -- 
> Oscar Salvador
> SUSE Labs
  
David Hildenbrand Feb. 19, 2024, 9:30 a.m. UTC | #5
On 16.02.24 12:40, Byungchul Park wrote:
> Changes from v2:
> 	1. Rewrite the comment in code and the commit message becasue it
> 	   turns out that this patch is not the real fix for the oops
> 	   descriped. The real fix goes in another patch below:
> 
> 	   https://lore.kernel.org/lkml/20240216111502.79759-1-byungchul@sk.com/
> 
> Changes from v1:
> 	1. Trim the verbose oops in the commit message. (feedbacked by
> 	   Phil Auld)
> 	2. Rewrite a comment in code. (feedbacked by Phil Auld)
> 
> --->8---
>  From 150af2f78e19217a1d03e47e3ee5279684590fb4 Mon Sep 17 00:00:00 2001
> From: Byungchul Park <byungchul@sk.com>
> Date: Fri, 16 Feb 2024 20:18:10 +0900
> Subject: [PATCH v3] sched/numa, mm: do not promote folios to nodes not set N_MEMORY
> 
> A numa node might not have its local memory but CPUs. Promoting a folio
> to the node's local memory is nonsense. So avoid nodes not set N_MEMORY
> from getting promoted.

So there is no bug/panic that can be triggered and this is not a "fix" 
but an optimization?

> 
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> ---
>   kernel/sched/fair.c | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index d7a3c63a2171..7ed9ef3c0134 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1828,6 +1828,13 @@ bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
>   	int dst_nid = cpu_to_node(dst_cpu);
>   	int last_cpupid, this_cpupid;
>   
> +	/*
> +	 * A node of dst_nid might not have its local memory. Promoting
> +	 * a folio to the node is meaningless.
> +	 */
> +	if (!node_state(dst_nid, N_MEMORY))
> +		return false;
> +
>   	/*
>   	 * The pages in slow memory node should be migrated according
>   	 * to hot/cold instead of private/shared.

Acked-by: David Hildenbrand <david@redhat.com>
  
Phil Auld Feb. 19, 2024, 3 p.m. UTC | #6
On Mon, Feb 19, 2024 at 11:08:54AM +0900 Byungchul Park wrote:
> On Fri, Feb 16, 2024 at 02:51:24PM +0100, Oscar Salvador wrote:
> > On Fri, Feb 16, 2024 at 08:40:45PM +0900, Byungchul Park wrote:
> > > From 150af2f78e19217a1d03e47e3ee5279684590fb4 Mon Sep 17 00:00:00 2001
> > > From: Byungchul Park <byungchul@sk.com>
> > > Date: Fri, 16 Feb 2024 20:18:10 +0900
> > > Subject: [PATCH v3] sched/numa, mm: do not promote folios to nodes not set N_MEMORY
> > 
> > "do not try to promote folios to memoryless nodes"
> 
> Thank you. I will.
> 
> > because AFAICS we are just trying.
> > Even if should_numa_migrate_memory() returns true, I assume that we will
> > fail somewhere down the chain e.g: migrate_pages() when we see that this
> > node does not any memory, right?
> 
> Yes.
> 
> > > A numa node might not have its local memory but CPUs. Promoting a folio
> > > to the node's local memory is nonsense. So avoid nodes not set N_MEMORY
> > > from getting promoted.
> > 
> > If you talk about memoryless nodes everybody gets it better IMHO.
> > "Memoryless nodes do not have any memory to migrate to, so stop trying it."
> 
> Much better.
> 
> > > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > > ---
> > >  kernel/sched/fair.c | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > > 
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index d7a3c63a2171..7ed9ef3c0134 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -1828,6 +1828,13 @@ bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
> > >  	int dst_nid = cpu_to_node(dst_cpu);
> > >  	int last_cpupid, this_cpupid;
> > >  
> > > +	/*
> > > +	 * A node of dst_nid might not have its local memory. Promoting
> > > +	 * a folio to the node is meaningless.
> > > +	 */
> > > +	if (!node_state(dst_nid, N_MEMORY))
> > > +		return false;
> > 
> > "Cannot migrate to memoryless nodes"
> > 
> > seems shorter and more clear.
> 
> Agree.
>

We clearly have dst_cpu when we call this so I still think a
check could be done farther up. But this one still looks reasonable
to me.

Thanks,
Phil


Reviewed-by: Phil Auld <pauld@redhat.com>


> 	Byungchul
> 
> > So, what happens when we return true here? will we fail at
> > migrate_pages() I guess? That is quite down the road so I guess
> > this check can save us some time.
> > 
> > 
> > -- 
> > Oscar Salvador
> > SUSE Labs
> 

--
  
Davidlohr Bueso Feb. 19, 2024, 7:26 p.m. UTC | #7
On Sun, 18 Feb 2024, Oscar Salvador wrote:

>On Fri, Feb 16, 2024 at 02:51:24PM +0100, Oscar Salvador wrote:
>> On Fri, Feb 16, 2024 at 08:40:45PM +0900, Byungchul Park wrote:
>> > From 150af2f78e19217a1d03e47e3ee5279684590fb4 Mon Sep 17 00:00:00 2001
>> > From: Byungchul Park <byungchul@sk.com>
>> > Date: Fri, 16 Feb 2024 20:18:10 +0900
>> > Subject: [PATCH v3] sched/numa, mm: do not promote folios to nodes not set N_MEMORY
>>
>> "do not try to promote folios to memoryless nodes"
>
>Thinking some more, promote might be misleading, just something like
>"do not try to migrate memory to memoryless nodes".

Yes. Does this also want an unlikely()? Not that it would be measurable.

>As this is not a bug fix but an optimization, as we will fail anyways
>in migrate_misplaced_folio() when migrate_balanced_pgdat() notices that
>we do not have any memory on that code.

This should be in the changelog and the subject is misleading as well.

Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>

>
>With the other comments addressed:
>
>Reviewed-by: Oscar Salvador <osalvador@suse.de>
>
>>
>> because AFAICS we are just trying.
>> Even if should_numa_migrate_memory() returns true, I assume that we will
>> fail somewhere down the chain e.g: migrate_pages() when we see that this
>> node does not any memory, right?
>>
>> > A numa node might not have its local memory but CPUs. Promoting a folio
>> > to the node's local memory is nonsense. So avoid nodes not set N_MEMORY
>> > from getting promoted.
>>
>> If you talk about memoryless nodes everybody gets it better IMHO.
>> "Memoryless nodes do not have any memory to migrate to, so stop trying it."
>>
>>
>> > Signed-off-by: Byungchul Park <byungchul@sk.com>
>> > ---
>> >  kernel/sched/fair.c | 7 +++++++
>> >  1 file changed, 7 insertions(+)
>> >
>> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> > index d7a3c63a2171..7ed9ef3c0134 100644
>> > --- a/kernel/sched/fair.c
>> > +++ b/kernel/sched/fair.c
>> > @@ -1828,6 +1828,13 @@ bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
>> >  	int dst_nid = cpu_to_node(dst_cpu);
>> >  	int last_cpupid, this_cpupid;
>> >
>> > +	/*
>> > +	 * A node of dst_nid might not have its local memory. Promoting
>> > +	 * a folio to the node is meaningless.
>> > +	 */
>> > +	if (!node_state(dst_nid, N_MEMORY))
>> > +		return false;
>>
>> "Cannot migrate to memoryless nodes"
>>
>> seems shorter and more clear.
>>
>> So, what happens when we return true here? will we fail at
>> migrate_pages() I guess? That is quite down the road so I guess
>> this check can save us some time.
>>
>>
>> --
>> Oscar Salvador
>> SUSE Labs
>>
>
>-- 
>Oscar Salvador
>SUSE Labs
>
  

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d7a3c63a2171..7ed9ef3c0134 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1828,6 +1828,13 @@  bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
 	int dst_nid = cpu_to_node(dst_cpu);
 	int last_cpupid, this_cpupid;
 
+	/*
+	 * A node of dst_nid might not have its local memory. Promoting
+	 * a folio to the node is meaningless.
+	 */
+	if (!node_state(dst_nid, N_MEMORY))
+		return false;
+
 	/*
 	 * The pages in slow memory node should be migrated according
 	 * to hot/cold instead of private/shared.