[1/1] hugetlb: fix CONFIG_PADATA dependency for non-SMP system

Message ID 20240204072525.1986626-1-gang.li@linux.dev
State New
Headers
Series [1/1] hugetlb: fix CONFIG_PADATA dependency for non-SMP system |

Commit Message

Gang Li Feb. 4, 2024, 7:25 a.m. UTC
  Randy Dunlap and kernel test robot reported a warning:

```
WARNING: unmet direct dependencies detected for PADATA
  Depends on [n]: SMP [=n]
  Selected by [y]:
  - HUGETLBFS [=y] && (X86 [=y] || SPARC64 || ARCH_SUPPORTS_HUGETLBFS [=n] || BROKEN [=n]) && (SYSFS [=y] || SYSCTL [=n])
```

hugetlb parallelization depends on PADATA, and PADATA depends on SMP, so
when the SMP config is disabled, the dependency of hugetlb on padata
should be downgraded to single thread.

Fixes: f2f635264b98 ("hugetlb: have CONFIG_HUGETLBFS select CONFIG_PADATA")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Closes: https://lore.kernel.org/lkml/ec5dc528-2c3c-4444-9e88-d2c48395b433@infradead.org/
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202402020454.6EPkP1hi-lkp@intel.com/
Signed-off-by: Gang Li <ligang.bdlg@bytedance.com>
```
Hi Andrew, this fix patch is based on mm/mm-unstable.
Thanks!
```
---
 fs/Kconfig   | 2 +-
 mm/hugetlb.c | 9 ++++++++-
 2 files changed, 9 insertions(+), 2 deletions(-)
  

Comments

Muchun Song Feb. 4, 2024, 7:44 a.m. UTC | #1
On 2024/2/4 15:25, Gang Li wrote:
> Randy Dunlap and kernel test robot reported a warning:
>
> ```
> WARNING: unmet direct dependencies detected for PADATA
>    Depends on [n]: SMP [=n]
>    Selected by [y]:
>    - HUGETLBFS [=y] && (X86 [=y] || SPARC64 || ARCH_SUPPORTS_HUGETLBFS [=n] || BROKEN [=n]) && (SYSFS [=y] || SYSCTL [=n])
> ```
>
> hugetlb parallelization depends on PADATA, and PADATA depends on SMP, so
> when the SMP config is disabled, the dependency of hugetlb on padata
> should be downgraded to single thread.
>
> Fixes: f2f635264b98 ("hugetlb: have CONFIG_HUGETLBFS select CONFIG_PADATA")
> Reported-by: Randy Dunlap <rdunlap@infradead.org>
> Closes: https://lore.kernel.org/lkml/ec5dc528-2c3c-4444-9e88-d2c48395b433@infradead.org/
> Reported-by: kernel test robot <lkp@intel.com>
> Closes: https://lore.kernel.org/oe-kbuild-all/202402020454.6EPkP1hi-lkp@intel.com/
> Signed-off-by: Gang Li <ligang.bdlg@bytedance.com>
> ```
> Hi Andrew, this fix patch is based on mm/mm-unstable.
> Thanks!
> ```
> ---
>   fs/Kconfig   | 2 +-
>   mm/hugetlb.c | 9 ++++++++-
>   2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 3abc107ab2fbd..f2bc73fc0417e 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -261,7 +261,7 @@ menuconfig HUGETLBFS
>   	depends on X86 || SPARC64 || ARCH_SUPPORTS_HUGETLBFS || BROKEN
>   	depends on (SYSFS || SYSCTL)
>   	select MEMFD_CREATE
> -	select PADATA
> +	select PADATA if SMP

I don't think it is a clear way to fix this. If someone want to
use PADATA in a non-SMP system, he should be carefully to handle
the non-SMP case himself. I think the better way is to make PADATA
handle the non-SMP case, I think it should be easy for it, which
could just call ->thread_fn() many times instead of creating many
threads in the non-SMP case.

Thanks.

>   	help
>   	  hugetlbfs is a filesystem backing for HugeTLB pages, based on
>   	  ramfs. For architectures that support it, say Y here and read
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index bf3d5dfb921e6..1b01b244fb50b 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3457,6 +3457,7 @@ static void __init gather_bootmem_prealloc_node(unsigned long start, unsigned lo
>   
>   static void __init gather_bootmem_prealloc(void)
>   {
> +#ifdef CONFIG_PADATA
>   	struct padata_mt_job job = {
>   		.thread_fn	= gather_bootmem_prealloc_node,
>   		.fn_arg		= NULL,
> @@ -3469,6 +3470,9 @@ static void __init gather_bootmem_prealloc(void)
>   	};
>   
>   	padata_do_multithreaded(&job);
> +#else
> +	gather_bootmem_prealloc_node(0, 0, NULL);
> +#endif
>   }
>   
>   static void __init hugetlb_hstate_alloc_pages_onenode(struct hstate *h, int nid)
> @@ -3568,6 +3572,7 @@ static unsigned long __init hugetlb_gigantic_pages_alloc_boot(struct hstate *h)
>   
>   static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
>   {
> +#ifdef CONFIG_PADATA
>   	struct padata_mt_job job = {
>   		.fn_arg		= h,
>   		.align		= 1,
> @@ -3600,7 +3605,9 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
>   	job.max_threads	= num_node_state(N_MEMORY) * 2;
>   	job.min_chunk	= h->max_huge_pages / num_node_state(N_MEMORY) / 2;
>   	padata_do_multithreaded(&job);
> -
> +#else
> +	hugetlb_pages_alloc_boot_node(0, h->max_huge_pages, h);
> +#endif
>   	return h->nr_huge_pages;
>   }
>
  
Gang Li Feb. 4, 2024, 7:48 a.m. UTC | #2
On 2024/2/4 15:44, Muchun Song wrote:
> I don't think it is a clear way to fix this. If someone want to
> use PADATA in a non-SMP system, he should be carefully to handle
> the non-SMP case himself. I think the better way is to make PADATA
> handle the non-SMP case, I think it should be easy for it, which
> could just call ->thread_fn() many times instead of creating many
> threads in the non-SMP case.
> 
> Thanks.
>

Sounds good, I'll take a look at padata and send a new patch.
  
Gang Li Feb. 5, 2024, 6:55 a.m. UTC | #3
On 2024/2/4 15:48, Gang Li wrote:
> On 2024/2/4 15:44, Muchun Song wrote:
>> I don't think it is a clear way to fix this. If someone want to
>> use PADATA in a non-SMP system, he should be carefully to handle
>> the non-SMP case himself. I think the better way is to make PADATA
>> handle the non-SMP case, I think it should be easy for it, which
>> could just call ->thread_fn() many times instead of creating many
>> threads in the non-SMP case.
>>
>> Thanks.
>>
> 
> Sounds good, I'll take a look at padata and send a new patch.

1. delete the dependency on SMP

PADATA only depends on workqueue and completion. It works well with !SMP
currently but has no performance benefits. What we can do is make PADATA
handle the non-SMP case more elegantly.

PADATA has two parts: "Running Multithreaded Jobs" and "Running
Serialized Jobs".

"Running Multithreaded Jobs", which hugetlb parallelization relies on
can be easily deparallelize through this patch:

```
@@ -495,7 +495,7 @@ void __init padata_do_multithreaded(struct 
padata_mt_job *job)
        nworks = max(job->size / max(job->min_chunk, job->align), 1ul);
        nworks = min(nworks, job->max_threads);

-      if (nworks == 1) {
+      if (nworks == 1 || !IS_ENABLED(CONFIG_SMP)) {
                /* Single thread, no coordination needed, cut to the 
chase. */
                job->thread_fn(job->start, job->start + job->size, 
job->fn_arg);
                return;
```

However, "Running Serialized Jobs" is more challenging due to its
various workers queuing each other, making it more complex than "Running
Multithreaded Jobs." I am currently in the process of deciphering the
code.

To eliminate kconfig warnings, other methods could be considered:

2. Split hugetlb parallelization into a separate kconfig.
3. Wrap hugetlb parallelization with SMP or PADATA macros (already ruled 
out).
4. Split PADATA into PADATA_SERIALIZED and PADATA_MULTITHREADED (too heavy).

Anyway, this is only FYI. I will continue exploring how to deparallelize
"Running Serialized Jobs."
  
Muchun Song Feb. 5, 2024, 7:37 a.m. UTC | #4
> On Feb 5, 2024, at 14:55, Gang Li <gang.li@linux.dev> wrote:
> 
> 
> On 2024/2/4 15:48, Gang Li wrote:
>> On 2024/2/4 15:44, Muchun Song wrote:
>>> I don't think it is a clear way to fix this. If someone want to
>>> use PADATA in a non-SMP system, he should be carefully to handle
>>> the non-SMP case himself. I think the better way is to make PADATA
>>> handle the non-SMP case, I think it should be easy for it, which
>>> could just call ->thread_fn() many times instead of creating many
>>> threads in the non-SMP case.
>>> 
>>> Thanks.
>>> 
>> Sounds good, I'll take a look at padata and send a new patch.
> 
> 1. delete the dependency on SMP
> 
> PADATA only depends on workqueue and completion. It works well with !SMP
> currently but has no performance benefits. What we can do is make PADATA
> handle the non-SMP case more elegantly.
> 
> PADATA has two parts: "Running Multithreaded Jobs" and "Running
> Serialized Jobs".
> 
> "Running Multithreaded Jobs", which hugetlb parallelization relies on
> can be easily deparallelize through this patch:
> 
> ```
> @@ -495,7 +495,7 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
>       nworks = max(job->size / max(job->min_chunk, job->align), 1ul);
>       nworks = min(nworks, job->max_threads);
> 
> -      if (nworks == 1) {
> +      if (nworks == 1 || !IS_ENABLED(CONFIG_SMP)) {
>               /* Single thread, no coordination needed, cut to the chase. */
>               job->thread_fn(job->start, job->start + job->size, job->fn_arg);
>               return;
> ```
> 
> However, "Running Serialized Jobs" is more challenging due to its
> various workers queuing each other, making it more complex than "Running
> Multithreaded Jobs." I am currently in the process of deciphering the
> code.

Actually, I did not get it. Why the above code cannot work? The above
code already make it serialized in one call, right? What do I miss here?

Thanks.

> 
> To eliminate kconfig warnings, other methods could be considered:
> 
> 2. Split hugetlb parallelization into a separate kconfig.
> 3. Wrap hugetlb parallelization with SMP or PADATA macros (already ruled out).
> 4. Split PADATA into PADATA_SERIALIZED and PADATA_MULTITHREADED (too heavy).
> 
> Anyway, this is only FYI. I will continue exploring how to deparallelize
> "Running Serialized Jobs."
  
Gang Li Feb. 5, 2024, 8:08 a.m. UTC | #5
On 2024/2/5 15:37, Muchun Song wrote:
> Actually, I did not get it. Why the above code cannot work? The above
> code already make it serialized in one call, right? What do I miss here?
> 
> Thanks.
> 

PADATA consists of two distinct functionality:

One part is `padata_do_multithreaded`, which disregards
order and simply divides tasks into several groups for parallel
execution. My patch use `padata_do_multithreaded`.

The other part is composed of a set of APIs that, while handling data in
an out-of-order parallel manner, can eventually return the data with
ordered sequence. Only `crypto/pcrypt.c` use them. I guess these APIs
are designed specifically for `crypto/pcrypt.c`.
```
padata_alloc
padata_alloc_shell
padata_do_parallel
padata_do_serial
padata_free_shell
padata_free
```
  

Patch

diff --git a/fs/Kconfig b/fs/Kconfig
index 3abc107ab2fbd..f2bc73fc0417e 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -261,7 +261,7 @@  menuconfig HUGETLBFS
 	depends on X86 || SPARC64 || ARCH_SUPPORTS_HUGETLBFS || BROKEN
 	depends on (SYSFS || SYSCTL)
 	select MEMFD_CREATE
-	select PADATA
+	select PADATA if SMP
 	help
 	  hugetlbfs is a filesystem backing for HugeTLB pages, based on
 	  ramfs. For architectures that support it, say Y here and read
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bf3d5dfb921e6..1b01b244fb50b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3457,6 +3457,7 @@  static void __init gather_bootmem_prealloc_node(unsigned long start, unsigned lo
 
 static void __init gather_bootmem_prealloc(void)
 {
+#ifdef CONFIG_PADATA
 	struct padata_mt_job job = {
 		.thread_fn	= gather_bootmem_prealloc_node,
 		.fn_arg		= NULL,
@@ -3469,6 +3470,9 @@  static void __init gather_bootmem_prealloc(void)
 	};
 
 	padata_do_multithreaded(&job);
+#else
+	gather_bootmem_prealloc_node(0, 0, NULL);
+#endif
 }
 
 static void __init hugetlb_hstate_alloc_pages_onenode(struct hstate *h, int nid)
@@ -3568,6 +3572,7 @@  static unsigned long __init hugetlb_gigantic_pages_alloc_boot(struct hstate *h)
 
 static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
 {
+#ifdef CONFIG_PADATA
 	struct padata_mt_job job = {
 		.fn_arg		= h,
 		.align		= 1,
@@ -3600,7 +3605,9 @@  static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
 	job.max_threads	= num_node_state(N_MEMORY) * 2;
 	job.min_chunk	= h->max_huge_pages / num_node_state(N_MEMORY) / 2;
 	padata_do_multithreaded(&job);
-
+#else
+	hugetlb_pages_alloc_boot_node(0, h->max_huge_pages, h);
+#endif
 	return h->nr_huge_pages;
 }