mm: hugetlb: Add Kconfig option to set default nr_overcommit_hugepages

Message ID 88fc41edeb5667534cde344c9220fcdfc00047b1.1686359973.git.josh@joshtriplett.org
State New
Series mm: hugetlb: Add Kconfig option to set default nr_overcommit_hugepages

Commit Message

Josh Triplett June 10, 2023, 1:21 a.m. UTC
The default kernel configuration does not allow any huge page allocation
until after setting nr_hugepages or nr_overcommit_hugepages to a
non-zero value; without setting those, mmap attempts with MAP_HUGETLB
will always fail with -ENOMEM. nr_overcommit_hugepages allows userspace
to attempt to allocate huge pages at runtime, succeeding if the kernel
can find or assemble a free huge page.

Provide a Kconfig option to make nr_overcommit_hugepages default to
unlimited, which permits userspace to always attempt huge page
allocation on a best-effort basis. This makes it easier and more
worthwhile for random applications and libraries to opportunistically
attempt MAP_HUGETLB allocations without special configuration.

In particular, current versions of liburing with IORING_SETUP_NO_MMAP
attempt to allocate the rings in a huge page. This seems likely to lead
to more applications and libraries attempting to use huge pages.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
---
 mm/Kconfig   | 14 ++++++++++++++
 mm/hugetlb.c |  2 ++
 2 files changed, 16 insertions(+)
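
The opportunistic pattern the commit message describes can be sketched in
userspace C as follows. This is a minimal illustration, not liburing's actual
code: the helper name alloc_prefer_huge() is hypothetical, and the caller is
assumed to pass a length that is a multiple of the huge page size. The helper
tries mmap() with MAP_HUGETLB first and falls back to ordinary pages when the
kernel refuses, as it does with ENOMEM when both nr_hugepages and
nr_overcommit_hugepages are 0.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not from liburing or the kernel tree): map `len`
 * bytes of anonymous memory, preferring a hugetlb-backed mapping.
 * Sets *used_huge to 1 if MAP_HUGETLB succeeded, 0 if the fallback was
 * used. Returns NULL if both attempts fail.
 */
static void *alloc_prefer_huge(size_t len, int *used_huge)
{
	void *p = MAP_FAILED;

	assert(used_huge != NULL);

#ifdef MAP_HUGETLB
	/* Best-effort attempt; expected to fail when no huge page can be had. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
#endif
	if (p != MAP_FAILED) {
		*used_huge = 1;
		return p;
	}

	/* Fallback: ordinary pages, no special configuration needed. */
	*used_huge = 0;
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}
```

A caller might request one huge page, assuming a 2 MiB huge page size, with
alloc_prefer_huge(2UL << 20, &used_huge), and later release it with munmap()
using the same length.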
  

Comments

David Rientjes June 11, 2023, 5:20 a.m. UTC | #1
On Fri, 9 Jun 2023, Josh Triplett wrote:

> The default kernel configuration does not allow any huge page allocation
> until after setting nr_hugepages or nr_overcommit_hugepages to a
> non-zero value; without setting those, mmap attempts with MAP_HUGETLB
> will always fail with -ENOMEM. nr_overcommit_hugepages allows userspace
> to attempt to allocate huge pages at runtime, succeeding if the kernel
> can find or assemble a free huge page.
> 
> Provide a Kconfig option to make nr_overcommit_hugepages default to
> unlimited, which permits userspace to always attempt huge page
> allocation on a best-effort basis. This makes it easier and more
> worthwhile for random applications and libraries to opportunistically
> attempt MAP_HUGETLB allocations without special configuration.
> 
> In particular, current versions of liburing with IORING_SETUP_NO_MMAP
> attempt to allocate the rings in a huge page. This seems likely to lead
> to more applications and libraries attempting to use huge pages.
> 
> Signed-off-by: Josh Triplett <josh@joshtriplett.org>

Why not do this in an initscript?

Or, if absolutely necessary, a kernel command line parameter?

A Kconfig option to set a default value to be ULONG_MAX seems strange if 
you can just write the value to procfs.
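
The runtime alternative raised above, writing the value to procfs at boot,
could look roughly like this in C. write_ulong_sysctl() is a hypothetical
helper. The path /proc/sys/vm/nr_overcommit_hugepages is the usual location of
this sysctl and writing it requires root. Whether the kernel accepts ULONG_MAX
as input is an assumption here, not something verified against the sysctl
handler.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

#define OVERCOMMIT_PATH "/proc/sys/vm/nr_overcommit_hugepages"

/*
 * Hypothetical helper: write one unsigned long, in decimal, to `path`.
 * Returns 0 on success, -1 on failure.
 *
 * Typical call (as root):
 *	write_ulong_sysctl(OVERCOMMIT_PATH, ULONG_MAX);
 */
static int write_ulong_sysctl(const char *path, unsigned long value)
{
	FILE *f;
	int ok;

	assert(path != NULL);

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%lu\n", value) > 0;

	/* fclose() flushes the buffer, so a rejected write shows up here. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

The helper takes the path as a parameter so it can be exercised against an
ordinary file without touching the running system's settings.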
  
David Hildenbrand June 12, 2023, 9:12 a.m. UTC | #2
On 11.06.23 07:20, David Rientjes wrote:
> On Fri, 9 Jun 2023, Josh Triplett wrote:
> 
>> The default kernel configuration does not allow any huge page allocation
>> until after setting nr_hugepages or nr_overcommit_hugepages to a
>> non-zero value; without setting those, mmap attempts with MAP_HUGETLB
>> will always fail with -ENOMEM. nr_overcommit_hugepages allows userspace
>> to attempt to allocate huge pages at runtime, succeeding if the kernel
>> can find or assemble a free huge page.
>>
>> Provide a Kconfig option to make nr_overcommit_hugepages default to
>> unlimited, which permits userspace to always attempt huge page
>> allocation on a best-effort basis. This makes it easier and more
>> worthwhile for random applications and libraries to opportunistically
>> attempt MAP_HUGETLB allocations without special configuration.
>>
>> In particular, current versions of liburing with IORING_SETUP_NO_MMAP
>> attempt to allocate the rings in a huge page. This seems likely to lead
>> to more applications and libraries attempting to use huge pages.
>>
>> Signed-off-by: Josh Triplett <josh@joshtriplett.org>
> 
> Why not do this in an initscript?
> 
> Or, if absolutely necessary, a kernel command line parameter?
> 
> A Kconfig option to set a default value to be ULONG_MAX seems strange if
> you can just write the value to procfs.
> 

Agreed, not to mention that huge pages in some environments can cause
trouble (some architectures -- or with gigantic huge pages -- don't
support huge page migration and you can run into trouble with
ZONE_MOVABLE or MIGRATE_CMA, because you'll end up "consuming" all
memory for unmovable allocations in the system), and we shouldn't
advocate the use of unlimited overcommit for huge pages ...
  
Mike Kravetz June 12, 2023, 4:42 p.m. UTC | #3
On 06/12/23 11:12, David Hildenbrand wrote:
> On 11.06.23 07:20, David Rientjes wrote:
> > On Fri, 9 Jun 2023, Josh Triplett wrote:
> > 
> > > The default kernel configuration does not allow any huge page allocation
> > > until after setting nr_hugepages or nr_overcommit_hugepages to a
> > > non-zero value; without setting those, mmap attempts with MAP_HUGETLB
> > > will always fail with -ENOMEM. nr_overcommit_hugepages allows userspace
> > > to attempt to allocate huge pages at runtime, succeeding if the kernel
> > > can find or assemble a free huge page.
> > > 
> > > Provide a Kconfig option to make nr_overcommit_hugepages default to
> > > unlimited, which permits userspace to always attempt huge page
> > > allocation on a best-effort basis. This makes it easier and more
> > > worthwhile for random applications and libraries to opportunistically
> > > attempt MAP_HUGETLB allocations without special configuration.
> > > 
> > > In particular, current versions of liburing with IORING_SETUP_NO_MMAP
> > > attempt to allocate the rings in a huge page. This seems likely to lead
> > > to more applications and libraries attempting to use huge pages.
> > > 
> > > Signed-off-by: Josh Triplett <josh@joshtriplett.org>
> > 
> > Why not do this in an initscript?
> > 
> > Or, if absolutely necessary, a kernel command line parameter?
> > 
> > A Kconfig option to set a default value to be ULONG_MAX seems strange if
> > you can just write the value to procfs.
> > 
> 
> Agreed, not to mention that huge pages in some environments can cause trouble
> (some architectures -- or with gigantic huge pages -- don't support huge
> page migration and you can run into trouble with ZONE_MOVABLE or
> MIGRATE_CMA, because you'll end up "consuming" all memory for unmovable
> allocations in the system), and we shouldn't advocate the use of unlimited
> overcommit for huge pages ...
> 

Agree with David(s).  Such an option should really be decided by a sysadmin.

Any reason why liburing cannot use THP?  Seems like that would provide the
desired functionality.
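
For comparison, the THP route suggested above needs no hugetlb pool or
overcommit setting: userspace allocates ordinary anonymous memory and hints
the kernel with madvise(MADV_HUGEPAGE). A minimal sketch follows, assuming a
2 MiB THP size (true on x86-64, not universal) and a hypothetical helper name
alloc_thp_hint(). The hint is advisory and only has an effect when THP is
enabled in "always" or "madvise" mode. Whether this fits liburing's ring
allocation is the open question in the thread, and the sketch does not settle
it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Assumption: 2 MiB transparent huge pages, as on x86-64. */
#define ASSUMED_THP_SIZE (2UL << 20)

/*
 * Hypothetical helper: allocate `len` bytes aligned to the assumed THP
 * size and ask the kernel to back the range with transparent huge pages.
 * `len` must be a non-zero multiple of ASSUMED_THP_SIZE.
 * Sets *advised to 1 if madvise() accepted the hint, 0 otherwise.
 * Returns NULL on bad length or allocation failure; free with free().
 */
static void *alloc_thp_hint(size_t len, int *advised)
{
	void *p = NULL;

	assert(advised != NULL);
	*advised = 0;

	if (len == 0 || len % ASSUMED_THP_SIZE != 0)
		return NULL;
	if (posix_memalign(&p, ASSUMED_THP_SIZE, len) != 0)
		return NULL;

#ifdef MADV_HUGEPAGE
	/* A failed hint is not fatal; the memory is still usable. */
	if (madvise(p, len, MADV_HUGEPAGE) == 0)
		*advised = 1;
#endif
	return p;
}
```

Unlike the MAP_HUGETLB case, a refused hint here still leaves the caller with
usable memory, so no separate fallback path is needed.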
  

Patch

diff --git a/mm/Kconfig b/mm/Kconfig
index 7672a22647b4..32c13610c5c4 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -824,6 +824,20 @@  config READ_ONLY_THP_FOR_FS
 
 endif # TRANSPARENT_HUGEPAGE
 
+config HUGEPAGE_OVERCOMMIT_DEFAULT_UNLIMITED
+	bool "Allow huge page allocation attempts by default"
+	depends on HUGETLB_PAGE
+	help
+	  By default, the kernel does not allow any huge page allocation until
+	  after setting nr_hugepages or nr_overcommit_hugepages to a non-zero
+	  value. nr_overcommit_hugepages allows userspace to attempt to
+	  allocate huge pages at runtime, succeeding if the kernel can find or
+	  assemble a free huge page.
+
+	  Enable this option to make nr_overcommit_hugepages default to
+	  unlimited, which permits userspace to always attempt hugepage
+	  allocation.
+
 #
 # UP and nommu archs use km based percpu allocator
 #
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f154019e6b84..65abbe254e10 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4305,6 +4305,8 @@  void __init hugetlb_add_hstate(unsigned int order)
 	mutex_init(&h->resize_lock);
 	h->order = order;
 	h->mask = ~(huge_page_size(h) - 1);
+	if (IS_ENABLED(CONFIG_HUGEPAGE_OVERCOMMIT_DEFAULT_UNLIMITED))
+		h->nr_overcommit_huge_pages = ULONG_MAX;
 	for (i = 0; i < MAX_NUMNODES; ++i)
 		INIT_LIST_HEAD(&h->hugepage_freelists[i]);
 	INIT_LIST_HEAD(&h->hugepage_activelist);