[v4,4/7] hugetlb: pass *next_nid_to_alloc directly to for_each_node_mask_to_alloc

Message ID 20240118123911.88833-5-gang.li@linux.dev
State New
Headers
Series hugetlb: parallelize hugetlb page init on boot |

Commit Message

Gang Li Jan. 18, 2024, 12:39 p.m. UTC
  With parallelization of hugetlb allocation across different threads, each
thread works on a differnet node to allocate pages from, instead of all
allocating from a common node h->next_nid_to_alloc.  To address this, it's
necessary to assign a separate next_nid_to_alloc for each thread.

Consequently, the hstate_next_node_to_alloc and for_each_node_mask_to_alloc
have been modified to directly accept a *next_nid_to_alloc parameter,
ensuring thread-specific allocation and avoiding concurrent access issues.

Signed-off-by: Gang Li <gang.li@linux.dev>
Tested-by: David Rientjes <rientjes@google.com>
---
 mm/hugetlb.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)
  

Comments

Tim Chen Jan. 18, 2024, 11:01 p.m. UTC | #1
On Thu, 2024-01-18 at 20:39 +0800, Gang Li wrote:
> With parallelization of hugetlb allocation across different threads, each
> thread works on a differnet node to allocate pages from, instead of all
> allocating from a common node h->next_nid_to_alloc.  To address this, it's
> necessary to assign a separate next_nid_to_alloc for each thread.
> 
> Consequently, the hstate_next_node_to_alloc and for_each_node_mask_to_alloc
> have been modified to directly accept a *next_nid_to_alloc parameter,
> ensuring thread-specific allocation and avoiding concurrent access issues.
> 
> Signed-off-by: Gang Li <gang.li@linux.dev>
> Tested-by: David Rientjes <rientjes@google.com>

Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>

> ---
  
Muchun Song Jan. 19, 2024, 2:54 a.m. UTC | #2
> On Jan 18, 2024, at 20:39, Gang Li <gang.li@linux.dev> wrote:
> 
> With parallelization of hugetlb allocation across different threads, each
> thread works on a differnet node to allocate pages from, instead of all
> allocating from a common node h->next_nid_to_alloc.  To address this, it's
> necessary to assign a separate next_nid_to_alloc for each thread.
> 
> Consequently, the hstate_next_node_to_alloc and for_each_node_mask_to_alloc
> have been modified to directly accept a *next_nid_to_alloc parameter,
> ensuring thread-specific allocation and avoiding concurrent access issues.
> 
> Signed-off-by: Gang Li <gang.li@linux.dev>
> Tested-by: David Rientjes <rientjes@google.com>

Reviewed-by: Muchun Song <muchun.song@linux.dev>

Thanks.
  
Muchun Song Jan. 22, 2024, 6:16 a.m. UTC | #3
On 2024/1/18 20:39, Gang Li wrote:
> With parallelization of hugetlb allocation across different threads, each
> thread works on a differnet node to allocate pages from, instead of all
> allocating from a common node h->next_nid_to_alloc.  To address this, it's
> necessary to assign a separate next_nid_to_alloc for each thread.
>
> Consequently, the hstate_next_node_to_alloc and for_each_node_mask_to_alloc
> have been modified to directly accept a *next_nid_to_alloc parameter,
> ensuring thread-specific allocation and avoiding concurrent access issues.
>
> Signed-off-by: Gang Li <gang.li@linux.dev>
> Tested-by: David Rientjes <rientjes@google.com>
> ---
>   mm/hugetlb.c | 22 ++++++++++++----------
>   1 file changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 98ae108e1fac..effe5539e545 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1464,15 +1464,15 @@ static int get_valid_node_allowed(int nid, nodemask_t *nodes_allowed)
>    * next node from which to allocate, handling wrap at end of node
>    * mask.
>    */
> -static int hstate_next_node_to_alloc(struct hstate *h,
> +static int hstate_next_node_to_alloc(int *next_node,
>   					nodemask_t *nodes_allowed)
>   {
>   	int nid;
>   
>   	VM_BUG_ON(!nodes_allowed);
>   
> -	nid = get_valid_node_allowed(h->next_nid_to_alloc, nodes_allowed);
> -	h->next_nid_to_alloc = next_node_allowed(nid, nodes_allowed);
> +	nid = get_valid_node_allowed(*next_node, nodes_allowed);
> +	*next_node = next_node_allowed(nid, nodes_allowed);
>   
>   	return nid;
>   }
> @@ -1495,10 +1495,10 @@ static int hstate_next_node_to_free(struct hstate *h, nodemask_t *nodes_allowed)
>   	return nid;
>   }
>   
> -#define for_each_node_mask_to_alloc(hs, nr_nodes, node, mask)		\
> +#define for_each_node_mask_to_alloc(next_node, nr_nodes, node, mask)		\
>   	for (nr_nodes = nodes_weight(*mask);				\
>   		nr_nodes > 0 &&						\
> -		((node = hstate_next_node_to_alloc(hs, mask)) || 1);	\
> +		((node = hstate_next_node_to_alloc(next_node, mask)) || 1);	\
>   		nr_nodes--)
>   
>   #define for_each_node_mask_to_free(hs, nr_nodes, node, mask)		\
> @@ -2350,12 +2350,13 @@ static void prep_and_add_allocated_folios(struct hstate *h,
>    */
>   static struct folio *alloc_pool_huge_folio(struct hstate *h,
>   					nodemask_t *nodes_allowed,
> -					nodemask_t *node_alloc_noretry)
> +					nodemask_t *node_alloc_noretry,
> +					int *next_node)
>   {
>   	gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
>   	int nr_nodes, node;
>   
> -	for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
> +	for_each_node_mask_to_alloc(next_node, nr_nodes, node, nodes_allowed) {


A small question here, why not pass h->next_nid_to_alloc to
for_each_node_mask_to_alloc()? What's the purpose of the third
parameter of alloc_pool_huge_folio()?

Thanks.
  
Gang Li Jan. 22, 2024, 9:14 a.m. UTC | #4
On 2024/1/22 14:16, Muchun Song wrote:
> On 2024/1/18 20:39, Gang Li wrote:
>>   static struct folio *alloc_pool_huge_folio(struct hstate *h,
>>                       nodemask_t *nodes_allowed,
>> -                    nodemask_t *node_alloc_noretry)
>> +                    nodemask_t *node_alloc_noretry,
>> +                    int *next_node)
>>   {
>>       gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
>>       int nr_nodes, node;
>> -    for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
>> +    for_each_node_mask_to_alloc(next_node, nr_nodes, node, 
>> nodes_allowed) {
> 
> 
> A small question here, why not pass h->next_nid_to_alloc to
> for_each_node_mask_to_alloc()? What's the purpose of the third
> parameter of alloc_pool_huge_folio()?
> 
> Thanks.
> 

In hugetlb_alloc_node->alloc_pool_huge_folio, hugetlb is initialized in
parallel at boot time, then it needs each thread to have its own
next_nid, and can't use the global h->next_nid_to_alloc. so an extra 
parameter is added.

And h->next_nid_to_alloc in set_max_huge_pages->alloc_pool_huge_folio
can not be removed. Because if the user calls set_max_huge_pages
frequently and only adds 1 page at a time, that would result in each
request being made on the same node if local variables are used.
  
Muchun Song Jan. 22, 2024, 9:50 a.m. UTC | #5
> On Jan 22, 2024, at 17:14, Gang Li <gang.li@linux.dev> wrote:
> 
> On 2024/1/22 14:16, Muchun Song wrote:
>> On 2024/1/18 20:39, Gang Li wrote:
>>>   static struct folio *alloc_pool_huge_folio(struct hstate *h,
>>>                       nodemask_t *nodes_allowed,
>>> -                    nodemask_t *node_alloc_noretry)
>>> +                    nodemask_t *node_alloc_noretry,
>>> +                    int *next_node)
>>>   {
>>>       gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
>>>       int nr_nodes, node;
>>> -    for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
>>> +    for_each_node_mask_to_alloc(next_node, nr_nodes, node, nodes_allowed) {
>> A small question here, why not pass h->next_nid_to_alloc to
>> for_each_node_mask_to_alloc()? What's the purpose of the third
>> parameter of alloc_pool_huge_folio()?
>> Thanks.
> 
> In hugetlb_alloc_node->alloc_pool_huge_folio, hugetlb is initialized in
> parallel at boot time, then it needs each thread to have its own
> next_nid, and can't use the global h->next_nid_to_alloc. so an extra parameter is added.

Yes. When I read your patch 6, I realized this.

> 
> And h->next_nid_to_alloc in set_max_huge_pages->alloc_pool_huge_folio
> can not be removed. Because if the user calls set_max_huge_pages
> frequently and only adds 1 page at a time, that would result in each
> request being made on the same node if local variables are used.
  

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 98ae108e1fac..effe5539e545 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1464,15 +1464,15 @@  static int get_valid_node_allowed(int nid, nodemask_t *nodes_allowed)
  * next node from which to allocate, handling wrap at end of node
  * mask.
  */
-static int hstate_next_node_to_alloc(struct hstate *h,
+static int hstate_next_node_to_alloc(int *next_node,
 					nodemask_t *nodes_allowed)
 {
 	int nid;
 
 	VM_BUG_ON(!nodes_allowed);
 
-	nid = get_valid_node_allowed(h->next_nid_to_alloc, nodes_allowed);
-	h->next_nid_to_alloc = next_node_allowed(nid, nodes_allowed);
+	nid = get_valid_node_allowed(*next_node, nodes_allowed);
+	*next_node = next_node_allowed(nid, nodes_allowed);
 
 	return nid;
 }
@@ -1495,10 +1495,10 @@  static int hstate_next_node_to_free(struct hstate *h, nodemask_t *nodes_allowed)
 	return nid;
 }
 
-#define for_each_node_mask_to_alloc(hs, nr_nodes, node, mask)		\
+#define for_each_node_mask_to_alloc(next_node, nr_nodes, node, mask)		\
 	for (nr_nodes = nodes_weight(*mask);				\
 		nr_nodes > 0 &&						\
-		((node = hstate_next_node_to_alloc(hs, mask)) || 1);	\
+		((node = hstate_next_node_to_alloc(next_node, mask)) || 1);	\
 		nr_nodes--)
 
 #define for_each_node_mask_to_free(hs, nr_nodes, node, mask)		\
@@ -2350,12 +2350,13 @@  static void prep_and_add_allocated_folios(struct hstate *h,
  */
 static struct folio *alloc_pool_huge_folio(struct hstate *h,
 					nodemask_t *nodes_allowed,
-					nodemask_t *node_alloc_noretry)
+					nodemask_t *node_alloc_noretry,
+					int *next_node)
 {
 	gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
 	int nr_nodes, node;
 
-	for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
+	for_each_node_mask_to_alloc(next_node, nr_nodes, node, nodes_allowed) {
 		struct folio *folio;
 
 		folio = only_alloc_fresh_hugetlb_folio(h, gfp_mask, node,
@@ -3310,7 +3311,7 @@  int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 		goto found;
 	}
 	/* allocate from next node when distributing huge pages */
-	for_each_node_mask_to_alloc(h, nr_nodes, node, &node_states[N_MEMORY]) {
+	for_each_node_mask_to_alloc(&h->next_nid_to_alloc, nr_nodes, node, &node_states[N_MEMORY]) {
 		m = memblock_alloc_try_nid_raw(
 				huge_page_size(h), huge_page_size(h),
 				0, MEMBLOCK_ALLOC_ACCESSIBLE, node);
@@ -3679,7 +3680,7 @@  static int adjust_pool_surplus(struct hstate *h, nodemask_t *nodes_allowed,
 	VM_BUG_ON(delta != -1 && delta != 1);
 
 	if (delta < 0) {
-		for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
+		for_each_node_mask_to_alloc(&h->next_nid_to_alloc, nr_nodes, node, nodes_allowed) {
 			if (h->surplus_huge_pages_node[node])
 				goto found;
 		}
@@ -3794,7 +3795,8 @@  static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 		cond_resched();
 
 		folio = alloc_pool_huge_folio(h, nodes_allowed,
-						node_alloc_noretry);
+						node_alloc_noretry,
+						&h->next_nid_to_alloc);
 		if (!folio) {
 			prep_and_add_allocated_folios(h, &page_list);
 			spin_lock_irq(&hugetlb_lock);