diff mbox series

[1/3] mm/mempolicy: Use the already fetched local variable

Message ID	9c3f7b743477560d1c5b12b8c111a584a2cc92ee.1708097962.git.donettom@linux.ibm.com
State	New
Headers	Received-SPF: pass (google.com: domain of linux-kernel+bounces-69718-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) client-ip=2604:1380:4601:e00::3; From: Donet Tom <donettom@linux.ibm.com> To: Andrew Morton <akpm@linux-foundation.org>, linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Aneesh Kumar <aneesh.kumar@kernel.org>, Huang Ying <ying.huang@intel.com>, Dave Hansen <dave.hansen@linux.intel.com>, Mel Gorman <mgorman@suse.de>, Ben Widawsky <ben.widawsky@intel.com>, Feng Tang <feng.tang@intel.com>, Michal Hocko <mhocko@kernel.org>, Andrea Arcangeli <aarcange@redhat.com>, Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>, Rik van Riel <riel@surriel.com>, Johannes Weiner <hannes@cmpxchg.org>, Matthew Wilcox <willy@infradead.org>, Mike Kravetz <mike.kravetz@oracle.com>, Vlastimil Babka <vbabka@suse.cz>, Dan Williams <dan.j.williams@intel.com>, Hugh Dickins <hughd@google.com>, Kefeng Wang <wangkefeng.wang@huawei.com>, Suren Baghdasaryan <surenb@google.com> Subject: [PATCH 1/3] mm/mempolicy: Use the already fetched local variable Date: Sat, 17 Feb 2024 01:31:33 -0600 Message-Id: <9c3f7b743477560d1c5b12b8c111a584a2cc92ee.1708097962.git.donettom@linux.ibm.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	[1/3] mm/mempolicy: Use the already fetched local variable \| [1/3] mm/mempolicy: Use the already fetched local variable [2/3] mm/mempolicy: Avoid the fallthrough with MPOLD_BIND in mpol_misplaced. [3/3] mm/numa_balancing:Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy

Commit Message

Donet Tom Feb. 17, 2024, 7:31 a.m. UTC

  Avoid doing a per cpu read and use the local variable thisnid. IMHO
this also makes the code more readable.

Signed-off-by: Aneesh Kumar K.V (IBM) <aneesh.kumar@kernel.org>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
---
 mm/mempolicy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Andrew Morton Feb. 18, 2024, 9:38 p.m. UTC | #1

On Sat, 17 Feb 2024 01:31:33 -0600 Donet Tom <donettom@linux.ibm.com> wrote:

> Avoid doing a per cpu read and use the local variable thisnid. IMHO
> this also makes the code more readable.
> 
> ...
>
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -2526,7 +2526,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
>  		if (node_isset(curnid, pol->nodes))
>  			goto out;
>  		z = first_zones_zonelist(
> -				node_zonelist(numa_node_id(), GFP_HIGHUSER),
> +				node_zonelist(thisnid, GFP_HIGHUSER),
>  				gfp_zone(GFP_HIGHUSER),
>  				&pol->nodes);
>  		polnid = zone_to_nid(z->zone);

	int thisnid = cpu_to_node(thiscpu);

Is there any dofference between numa_node_id() and
cpu_to_node(raw_smp_processor_id())?  And it it explicable that we're
using one here and not the other?

Donet Tom Feb. 19, 2024, 8:34 a.m. UTC | #2

On 2/19/24 03:08, Andrew Morton wrote:
> On Sat, 17 Feb 2024 01:31:33 -0600 Donet Tom <donettom@linux.ibm.com> wrote:
>
>> Avoid doing a per cpu read and use the local variable thisnid. IMHO
>> this also makes the code more readable.
>>
>> ...
>>
>> --- a/mm/mempolicy.c
>> +++ b/mm/mempolicy.c
>> @@ -2526,7 +2526,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
>>   		if (node_isset(curnid, pol->nodes))
>>   			goto out;
>>   		z = first_zones_zonelist(
>> -				node_zonelist(numa_node_id(), GFP_HIGHUSER),
>> +				node_zonelist(thisnid, GFP_HIGHUSER),
>>   				gfp_zone(GFP_HIGHUSER),
>>   				&pol->nodes);
>>   		polnid = zone_to_nid(z->zone);
> 	int thisnid = cpu_to_node(thiscpu);
>
> Is there any dofference between numa_node_id() and
> cpu_to_node(raw_smp_processor_id())?  And it it explicable that we're
> using one here and not the other?

Hi Andrew

Both numa_node_id() and cpu_to_node(raw_smp_processor_id()) return the current execution node id,
Since the current execution node is already fetched at the beginning (thisnid) we can reuse it instead of getting it again.

Thanks
Donet Tom

>

Andrew Morton Feb. 20, 2024, 1:21 a.m. UTC | #3

On Mon, 19 Feb 2024 14:04:23 +0530 Donet Tom <donettom@linux.ibm.com> wrote:

> >> --- a/mm/mempolicy.c
> >> +++ b/mm/mempolicy.c
> >> @@ -2526,7 +2526,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
> >>   		if (node_isset(curnid, pol->nodes))
> >>   			goto out;
> >>   		z = first_zones_zonelist(
> >> -				node_zonelist(numa_node_id(), GFP_HIGHUSER),
> >> +				node_zonelist(thisnid, GFP_HIGHUSER),
> >>   				gfp_zone(GFP_HIGHUSER),
> >>   				&pol->nodes);
> >>   		polnid = zone_to_nid(z->zone);
> > 	int thisnid = cpu_to_node(thiscpu);
> >
> > Is there any dofference between numa_node_id() and
> > cpu_to_node(raw_smp_processor_id())?  And it it explicable that we're
> > using one here and not the other?
> 
> Hi Andrew
> 
> Both numa_node_id() and cpu_to_node(raw_smp_processor_id()) return the current execution node id,
> Since the current execution node is already fetched at the beginning (thisnid) we can reuse it instead of getting it again.

Sure, but mine was a broader thought: why do we have both?  Is one
preferable and if so why?

Aneesh Kumar K.V Feb. 20, 2024, 4:10 a.m. UTC | #4

On 2/20/24 6:51 AM, Andrew Morton wrote:
> On Mon, 19 Feb 2024 14:04:23 +0530 Donet Tom <donettom@linux.ibm.com> wrote:
> 
>>>> --- a/mm/mempolicy.c
>>>> +++ b/mm/mempolicy.c
>>>> @@ -2526,7 +2526,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
>>>>   		if (node_isset(curnid, pol->nodes))
>>>>   			goto out;
>>>>   		z = first_zones_zonelist(
>>>> -				node_zonelist(numa_node_id(), GFP_HIGHUSER),
>>>> +				node_zonelist(thisnid, GFP_HIGHUSER),
>>>>   				gfp_zone(GFP_HIGHUSER),
>>>>   				&pol->nodes);
>>>>   		polnid = zone_to_nid(z->zone);
>>> 	int thisnid = cpu_to_node(thiscpu);
>>>
>>> Is there any dofference between numa_node_id() and
>>> cpu_to_node(raw_smp_processor_id())?  And it it explicable that we're
>>> using one here and not the other?
>>
>> Hi Andrew
>>
>> Both numa_node_id() and cpu_to_node(raw_smp_processor_id()) return the current execution node id,
>> Since the current execution node is already fetched at the beginning (thisnid) we can reuse it instead of getting it again.
> 
> Sure, but mine was a broader thought: why do we have both?  Is one
> preferable and if so why?

IIUC these are two helpers to fetch current numa node id. and either of them can be used based on need. The default implementation shows the details.
(One small difference is numa_node_id() can use optimized per cpu reader because it is fetching the per cpu variable of the currently running cpu.)

#ifndef numa_node_id
/* Returns the number of the current Node. */
static inline int numa_node_id(void)
{
	return raw_cpu_read(numa_node);
}
#endif

#ifndef cpu_to_node
static inline int cpu_to_node(int cpu)
{
	return per_cpu(numa_node, cpu);
}
#endif

In mpol_misplaced function, we need the cpu details because we are using that in other place (should_numa_migreate_memory()). So it makes it easy
to use cpu_to_node(thiscpu) instead of numa_node_id(). 

-aneesh

Huang, Ying Feb. 20, 2024, 6:25 a.m. UTC | #5

"Aneesh Kumar K.V" <aneesh.kumar@kernel.org> writes:

> On 2/20/24 6:51 AM, Andrew Morton wrote:
>> On Mon, 19 Feb 2024 14:04:23 +0530 Donet Tom <donettom@linux.ibm.com> wrote:
>> 
>>>>> --- a/mm/mempolicy.c
>>>>> +++ b/mm/mempolicy.c
>>>>> @@ -2526,7 +2526,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
>>>>>   		if (node_isset(curnid, pol->nodes))
>>>>>   			goto out;
>>>>>   		z = first_zones_zonelist(
>>>>> -				node_zonelist(numa_node_id(), GFP_HIGHUSER),
>>>>> +				node_zonelist(thisnid, GFP_HIGHUSER),
>>>>>   				gfp_zone(GFP_HIGHUSER),
>>>>>   				&pol->nodes);
>>>>>   		polnid = zone_to_nid(z->zone);
>>>> 	int thisnid = cpu_to_node(thiscpu);
>>>>
>>>> Is there any dofference between numa_node_id() and
>>>> cpu_to_node(raw_smp_processor_id())?  And it it explicable that we're
>>>> using one here and not the other?
>>>
>>> Hi Andrew
>>>
>>> Both numa_node_id() and cpu_to_node(raw_smp_processor_id()) return the current execution node id,
>>> Since the current execution node is already fetched at the beginning (thisnid) we can reuse it instead of getting it again.
>> 
>> Sure, but mine was a broader thought: why do we have both?  Is one
>> preferable and if so why?
>
> IIUC these are two helpers to fetch current numa node id. and either of them can be used based on need. The default implementation shows the details.
> (One small difference is numa_node_id() can use optimized per cpu reader because it is fetching the per cpu variable of the currently running cpu.)
>
> #ifndef numa_node_id
> /* Returns the number of the current Node. */
> static inline int numa_node_id(void)
> {
> 	return raw_cpu_read(numa_node);
> }
> #endif
>
> #ifndef cpu_to_node
> static inline int cpu_to_node(int cpu)
> {
> 	return per_cpu(numa_node, cpu);
> }
> #endif
>
> In mpol_misplaced function, we need the cpu details because we are using that in other place (should_numa_migreate_memory()). So it makes it easy
> to use cpu_to_node(thiscpu) instead of numa_node_id(). 

IIUC, numa_node_id() is faster than cpu_to_node(thiscpu), even if we
have thiscpu already.  cpu_to_node() is mainly used to get the node of
NOT current CPU.  So, IMHO, we should only use numa_node_id() in this
function.

--
Best Regards,
Huang, Ying

Aneesh Kumar K.V Feb. 20, 2024, 6:32 a.m. UTC | #6

On 2/20/24 11:55 AM, Huang, Ying wrote:
> "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> writes:
> 
>> On 2/20/24 6:51 AM, Andrew Morton wrote:
>>> On Mon, 19 Feb 2024 14:04:23 +0530 Donet Tom <donettom@linux.ibm.com> wrote:
>>>
>>>>>> --- a/mm/mempolicy.c
>>>>>> +++ b/mm/mempolicy.c
>>>>>> @@ -2526,7 +2526,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
>>>>>>   		if (node_isset(curnid, pol->nodes))
>>>>>>   			goto out;
>>>>>>   		z = first_zones_zonelist(
>>>>>> -				node_zonelist(numa_node_id(), GFP_HIGHUSER),
>>>>>> +				node_zonelist(thisnid, GFP_HIGHUSER),
>>>>>>   				gfp_zone(GFP_HIGHUSER),
>>>>>>   				&pol->nodes);
>>>>>>   		polnid = zone_to_nid(z->zone);
>>>>> 	int thisnid = cpu_to_node(thiscpu);
>>>>>
>>>>> Is there any dofference between numa_node_id() and
>>>>> cpu_to_node(raw_smp_processor_id())?  And it it explicable that we're
>>>>> using one here and not the other?
>>>>
>>>> Hi Andrew
>>>>
>>>> Both numa_node_id() and cpu_to_node(raw_smp_processor_id()) return the current execution node id,
>>>> Since the current execution node is already fetched at the beginning (thisnid) we can reuse it instead of getting it again.
>>>
>>> Sure, but mine was a broader thought: why do we have both?  Is one
>>> preferable and if so why?
>>
>> IIUC these are two helpers to fetch current numa node id. and either of them can be used based on need. The default implementation shows the details.
>> (One small difference is numa_node_id() can use optimized per cpu reader because it is fetching the per cpu variable of the currently running cpu.)
>>
>> #ifndef numa_node_id
>> /* Returns the number of the current Node. */
>> static inline int numa_node_id(void)
>> {
>> 	return raw_cpu_read(numa_node);
>> }
>> #endif
>>
>> #ifndef cpu_to_node
>> static inline int cpu_to_node(int cpu)
>> {
>> 	return per_cpu(numa_node, cpu);
>> }
>> #endif
>>
>> In mpol_misplaced function, we need the cpu details because we are using that in other place (should_numa_migreate_memory()). So it makes it easy
>> to use cpu_to_node(thiscpu) instead of numa_node_id(). 
> 
> IIUC, numa_node_id() is faster than cpu_to_node(thiscpu), even if we
> have thiscpu already.  cpu_to_node() is mainly used to get the node of
> NOT current CPU.  So, IMHO, we should only use numa_node_id() in this
> function.
> 

This change?

modified   mm/mempolicy.c
@@ -2502,8 +2502,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
 	pgoff_t ilx;
 	struct zoneref *z;
 	int curnid = folio_nid(folio);
-	int thiscpu = raw_smp_processor_id();
-	int thisnid = cpu_to_node(thiscpu);
+	int thisnid = numa_node_id();
 	int polnid = NUMA_NO_NODE;
 	int ret = NUMA_NO_NODE;
 
@@ -2573,7 +2572,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
 		polnid = thisnid;
 
 		if (!should_numa_migrate_memory(current, folio, curnid,
-						thiscpu))
+						raw_smp_processor_id()))
 			goto out;
 	}
 



-aneesh

Aneesh Kumar K.V Feb. 20, 2024, 7:03 a.m. UTC | #7

On 2/20/24 12:02 PM, Aneesh Kumar K.V wrote:
> On 2/20/24 11:55 AM, Huang, Ying wrote:
>> "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> writes:
>>
>>> On 2/20/24 6:51 AM, Andrew Morton wrote:
>>>> On Mon, 19 Feb 2024 14:04:23 +0530 Donet Tom <donettom@linux.ibm.com> wrote:
>>>>
>>>>>>> --- a/mm/mempolicy.c
>>>>>>> +++ b/mm/mempolicy.c
>>>>>>> @@ -2526,7 +2526,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
>>>>>>>   		if (node_isset(curnid, pol->nodes))
>>>>>>>   			goto out;
>>>>>>>   		z = first_zones_zonelist(
>>>>>>> -				node_zonelist(numa_node_id(), GFP_HIGHUSER),
>>>>>>> +				node_zonelist(thisnid, GFP_HIGHUSER),
>>>>>>>   				gfp_zone(GFP_HIGHUSER),
>>>>>>>   				&pol->nodes);
>>>>>>>   		polnid = zone_to_nid(z->zone);
>>>>>> 	int thisnid = cpu_to_node(thiscpu);
>>>>>>
>>>>>> Is there any dofference between numa_node_id() and
>>>>>> cpu_to_node(raw_smp_processor_id())?  And it it explicable that we're
>>>>>> using one here and not the other?
>>>>>
>>>>> Hi Andrew
>>>>>
>>>>> Both numa_node_id() and cpu_to_node(raw_smp_processor_id()) return the current execution node id,
>>>>> Since the current execution node is already fetched at the beginning (thisnid) we can reuse it instead of getting it again.
>>>>
>>>> Sure, but mine was a broader thought: why do we have both?  Is one
>>>> preferable and if so why?
>>>
>>> IIUC these are two helpers to fetch current numa node id. and either of them can be used based on need. The default implementation shows the details.
>>> (One small difference is numa_node_id() can use optimized per cpu reader because it is fetching the per cpu variable of the currently running cpu.)
>>>
>>> #ifndef numa_node_id
>>> /* Returns the number of the current Node. */
>>> static inline int numa_node_id(void)
>>> {
>>> 	return raw_cpu_read(numa_node);
>>> }
>>> #endif
>>>
>>> #ifndef cpu_to_node
>>> static inline int cpu_to_node(int cpu)
>>> {
>>> 	return per_cpu(numa_node, cpu);
>>> }
>>> #endif
>>>
>>> In mpol_misplaced function, we need the cpu details because we are using that in other place (should_numa_migreate_memory()). So it makes it easy
>>> to use cpu_to_node(thiscpu) instead of numa_node_id(). 
>>
>> IIUC, numa_node_id() is faster than cpu_to_node(thiscpu), even if we
>> have thiscpu already.  cpu_to_node() is mainly used to get the node of
>> NOT current CPU.  So, IMHO, we should only use numa_node_id() in this
>> function.
>>
> 
> This change?
> 
> modified   mm/mempolicy.c
> @@ -2502,8 +2502,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
>  	pgoff_t ilx;
>  	struct zoneref *z;
>  	int curnid = folio_nid(folio);
> -	int thiscpu = raw_smp_processor_id();
> -	int thisnid = cpu_to_node(thiscpu);
> +	int thisnid = numa_node_id();
>  	int polnid = NUMA_NO_NODE;
>  	int ret = NUMA_NO_NODE;
>  
> @@ -2573,7 +2572,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
>  		polnid = thisnid;
>  
>  		if (!should_numa_migrate_memory(current, folio, curnid,
> -						thiscpu))
> +						raw_smp_processor_id()))
>  			goto out;
>  	}
>  
> 

One of the problem with the above change will be the need to make sure smp processor id remaining stable, which
I am not sure we want in this function. With that we can end up with processor id not related to the numa node id
we are using. 

-aneesh

Huang, Ying Feb. 20, 2024, 7:22 a.m. UTC | #8

"Aneesh Kumar K.V" <aneesh.kumar@kernel.org> writes:

> On 2/20/24 12:02 PM, Aneesh Kumar K.V wrote:
>> On 2/20/24 11:55 AM, Huang, Ying wrote:
>>> "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> writes:
>>>
>>>> On 2/20/24 6:51 AM, Andrew Morton wrote:
>>>>> On Mon, 19 Feb 2024 14:04:23 +0530 Donet Tom <donettom@linux.ibm.com> wrote:
>>>>>
>>>>>>>> --- a/mm/mempolicy.c
>>>>>>>> +++ b/mm/mempolicy.c
>>>>>>>> @@ -2526,7 +2526,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
>>>>>>>>   		if (node_isset(curnid, pol->nodes))
>>>>>>>>   			goto out;
>>>>>>>>   		z = first_zones_zonelist(
>>>>>>>> -				node_zonelist(numa_node_id(), GFP_HIGHUSER),
>>>>>>>> +				node_zonelist(thisnid, GFP_HIGHUSER),
>>>>>>>>   				gfp_zone(GFP_HIGHUSER),
>>>>>>>>   				&pol->nodes);
>>>>>>>>   		polnid = zone_to_nid(z->zone);
>>>>>>> 	int thisnid = cpu_to_node(thiscpu);
>>>>>>>
>>>>>>> Is there any dofference between numa_node_id() and
>>>>>>> cpu_to_node(raw_smp_processor_id())?  And it it explicable that we're
>>>>>>> using one here and not the other?
>>>>>>
>>>>>> Hi Andrew
>>>>>>
>>>>>> Both numa_node_id() and cpu_to_node(raw_smp_processor_id()) return the current execution node id,
>>>>>> Since the current execution node is already fetched at the beginning (thisnid) we can reuse it instead of getting it again.
>>>>>
>>>>> Sure, but mine was a broader thought: why do we have both?  Is one
>>>>> preferable and if so why?
>>>>
>>>> IIUC these are two helpers to fetch current numa node id. and either of them can be used based on need. The default implementation shows the details.
>>>> (One small difference is numa_node_id() can use optimized per cpu reader because it is fetching the per cpu variable of the currently running cpu.)
>>>>
>>>> #ifndef numa_node_id
>>>> /* Returns the number of the current Node. */
>>>> static inline int numa_node_id(void)
>>>> {
>>>> 	return raw_cpu_read(numa_node);
>>>> }
>>>> #endif
>>>>
>>>> #ifndef cpu_to_node
>>>> static inline int cpu_to_node(int cpu)
>>>> {
>>>> 	return per_cpu(numa_node, cpu);
>>>> }
>>>> #endif
>>>>
>>>> In mpol_misplaced function, we need the cpu details because we are using that in other place (should_numa_migreate_memory()). So it makes it easy
>>>> to use cpu_to_node(thiscpu) instead of numa_node_id(). 
>>>
>>> IIUC, numa_node_id() is faster than cpu_to_node(thiscpu), even if we
>>> have thiscpu already.  cpu_to_node() is mainly used to get the node of
>>> NOT current CPU.  So, IMHO, we should only use numa_node_id() in this
>>> function.
>>>
>> 
>> This change?
>> 
>> modified   mm/mempolicy.c
>> @@ -2502,8 +2502,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
>>  	pgoff_t ilx;
>>  	struct zoneref *z;
>>  	int curnid = folio_nid(folio);
>> -	int thiscpu = raw_smp_processor_id();
>> -	int thisnid = cpu_to_node(thiscpu);
>> +	int thisnid = numa_node_id();
>>  	int polnid = NUMA_NO_NODE;
>>  	int ret = NUMA_NO_NODE;
>>  
>> @@ -2573,7 +2572,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
>>  		polnid = thisnid;
>>  
>>  		if (!should_numa_migrate_memory(current, folio, curnid,
>> -						thiscpu))
>> +						raw_smp_processor_id()))
>>  			goto out;
>>  	}
>>  
>> 
>
> One of the problem with the above change will be the need to make sure smp processor id remaining stable, which
> I am not sure we want in this function. With that we can end up with processor id not related to the numa node id
> we are using. 

This isn't an issue now, because mpol_misplaced() are always called with
PTL held.  And, we can still keep thiscpu local variable.

--
Best Regards,
Huang, Ying

Michal Hocko Feb. 20, 2024, 9:03 a.m. UTC | #9

On Tue 20-02-24 15:22:07, Huang, Ying wrote:
[...]
> This isn't an issue now, because mpol_misplaced() are always called with
> PTL held.  And, we can still keep thiscpu local variable.

yes, this is the case but it would be better if we made that assumption
official by lockdep_assert_held

Aneesh Kumar K.V March 3, 2024, 6:17 a.m. UTC | #10

Michal Hocko <mhocko@suse.com> writes:

> On Tue 20-02-24 15:22:07, Huang, Ying wrote:
> [...]
>> This isn't an issue now, because mpol_misplaced() are always called with
>> PTL held.  And, we can still keep thiscpu local variable.
>
> yes, this is the case but it would be better if we made that assumption
> official by lockdep_assert_held
>

How about this folded into this patch?

2 files changed, 12 insertions(+), 4 deletions(-)
mm/memory.c    |  6 ++++--
mm/mempolicy.c | 10 ++++++++--

modified   mm/memory.c
@@ -4879,9 +4879,11 @@ static vm_fault_t do_fault(struct vm_fault *vmf)
 	return ret;
 }
 
-int numa_migrate_prep(struct folio *folio, struct vm_area_struct *vma,
+int numa_migrate_prep(struct folio *folio, struct vm_fault *vmf,
 		      unsigned long addr, int page_nid, int *flags)
 {
+	struct vm_area_struct *vma = vmf->vma;
+
 	folio_get(folio);
 
 	/* Record the current PID acceesing VMA */
@@ -4893,7 +4895,7 @@ int numa_migrate_prep(struct folio *folio, struct vm_area_struct *vma,
 		*flags |= TNF_FAULT_LOCAL;
 	}
 
-	return mpol_misplaced(folio, vma, addr);
+	return mpol_misplaced(folio, vmf, addr);
 }
 
 static vm_fault_t do_numa_page(struct vm_fault *vmf)
modified   mm/mempolicy.c
@@ -2495,18 +2495,24 @@ static inline bool mpol_preferred_should_numa_migrate(int exec_node, int folio_n
  * Return: NUMA_NO_NODE if the page is in a node that is valid for this
  * policy, or a suitable node ID to allocate a replacement folio from.
  */
-int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
+int mpol_misplaced(struct folio *folio, struct vm_fault *vmf,
 		   unsigned long addr)
 {
 	struct mempolicy *pol;
 	pgoff_t ilx;
 	struct zoneref *z;
 	int curnid = folio_nid(folio);
+	struct vm_area_struct *vma = vmf->vma;
 	int thiscpu = raw_smp_processor_id();
-	int thisnid = cpu_to_node(thiscpu);
+	int thisnid = numa_node_id();
 	int polnid = NUMA_NO_NODE;
 	int ret = NUMA_NO_NODE;
 
+	/*
+	 * Make sure ptl is held so that we don't preempt and we
+	 * have a stable smp processor id
+	 */
+	lockdep_assert_held(vmf->ptl);
 	pol = get_vma_policy(vma, addr, folio_order(folio), &ilx);
 	if (!(pol->flags & MPOL_F_MOF))
 		goto out;

[back]

Huang, Ying March 4, 2024, 1:49 a.m. UTC | #11

Aneesh Kumar K.V <aneesh.kumar@kernel.org> writes:

> Michal Hocko <mhocko@suse.com> writes:
>
>> On Tue 20-02-24 15:22:07, Huang, Ying wrote:
>> [...]
>>> This isn't an issue now, because mpol_misplaced() are always called with
>>> PTL held.  And, we can still keep thiscpu local variable.
>>
>> yes, this is the case but it would be better if we made that assumption
>> official by lockdep_assert_held
>>
>
> How about this folded into this patch?
>
> 2 files changed, 12 insertions(+), 4 deletions(-)
> mm/memory.c    |  6 ++++--
> mm/mempolicy.c | 10 ++++++++--
>
> modified   mm/memory.c
> @@ -4879,9 +4879,11 @@ static vm_fault_t do_fault(struct vm_fault *vmf)
>  	return ret;
>  }
>  
> -int numa_migrate_prep(struct folio *folio, struct vm_area_struct *vma,
> +int numa_migrate_prep(struct folio *folio, struct vm_fault *vmf,
>  		      unsigned long addr, int page_nid, int *flags)
>  {
> +	struct vm_area_struct *vma = vmf->vma;
> +
>  	folio_get(folio);
>  
>  	/* Record the current PID acceesing VMA */
> @@ -4893,7 +4895,7 @@ int numa_migrate_prep(struct folio *folio, struct vm_area_struct *vma,
>  		*flags |= TNF_FAULT_LOCAL;
>  	}
>  
> -	return mpol_misplaced(folio, vma, addr);
> +	return mpol_misplaced(folio, vmf, addr);
>  }
>  
>  static vm_fault_t do_numa_page(struct vm_fault *vmf)
> modified   mm/mempolicy.c
> @@ -2495,18 +2495,24 @@ static inline bool mpol_preferred_should_numa_migrate(int exec_node, int folio_n
>   * Return: NUMA_NO_NODE if the page is in a node that is valid for this
>   * policy, or a suitable node ID to allocate a replacement folio from.
>   */
> -int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
> +int mpol_misplaced(struct folio *folio, struct vm_fault *vmf,
>  		   unsigned long addr)
>  {
>  	struct mempolicy *pol;
>  	pgoff_t ilx;
>  	struct zoneref *z;
>  	int curnid = folio_nid(folio);
> +	struct vm_area_struct *vma = vmf->vma;
>  	int thiscpu = raw_smp_processor_id();
> -	int thisnid = cpu_to_node(thiscpu);
> +	int thisnid = numa_node_id();
>  	int polnid = NUMA_NO_NODE;
>  	int ret = NUMA_NO_NODE;
>  
> +	/*
> +	 * Make sure ptl is held so that we don't preempt and we
> +	 * have a stable smp processor id
> +	 */
> +	lockdep_assert_held(vmf->ptl);
>  	pol = get_vma_policy(vma, addr, folio_order(folio), &ilx);
>  	if (!(pol->flags & MPOL_F_MOF))
>  		goto out;
>
> [back]
>  

LGTM, Thanks!

--
Best Regards,
Huang, Ying

diff mbox series

Patch

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 10a590ee1c89..8478574c000c 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2526,7 +2526,7 @@  int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
 		if (node_isset(curnid, pol->nodes))
 			goto out;
 		z = first_zones_zonelist(
-				node_zonelist(numa_node_id(), GFP_HIGHUSER),
+				node_zonelist(thisnid, GFP_HIGHUSER),
 				gfp_zone(GFP_HIGHUSER),
 				&pol->nodes);
 		polnid = zone_to_nid(z->zone);