Message ID | 88fc41edeb5667534cde344c9220fcdfc00047b1.1686359973.git.josh@joshtriplett.org |
---|---|
State | New |
Headers |
Date: Fri, 9 Jun 2023 18:21:19 -0700
From: Josh Triplett <josh@joshtriplett.org>
To: Andrew Morton <akpm@linux-foundation.org>, Mike Kravetz <mike.kravetz@oracle.com>, Muchun Song <muchun.song@linux.dev>, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Subject: [PATCH] mm: hugetlb: Add Kconfig option to set default nr_overcommit_hugepages
Message-ID: <88fc41edeb5667534cde344c9220fcdfc00047b1.1686359973.git.josh@joshtriplett.org>
List-ID: <linux-kernel.vger.kernel.org> |
Series | mm: hugetlb: Add Kconfig option to set default nr_overcommit_hugepages |
Commit Message
Josh Triplett
June 10, 2023, 1:21 a.m. UTC
The default kernel configuration does not allow any huge page allocation
until after setting nr_hugepages or nr_overcommit_hugepages to a
non-zero value; without setting those, mmap attempts with MAP_HUGETLB
will always fail with -ENOMEM. nr_overcommit_hugepages allows userspace
to attempt to allocate huge pages at runtime, succeeding if the kernel
can find or assemble a free huge page.

Provide a Kconfig option to make nr_overcommit_hugepages default to
unlimited, which permits userspace to always attempt huge page
allocation on a best-effort basis. This makes it easier and more
worthwhile for random applications and libraries to opportunistically
attempt MAP_HUGETLB allocations without special configuration.

In particular, current versions of liburing with IORING_SETUP_NO_MMAP
attempt to allocate the rings in a huge page. This seems likely to lead
to more applications and libraries attempting to use huge pages.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
---
mm/Kconfig | 14 ++++++++++++++
mm/hugetlb.c | 2 ++
2 files changed, 16 insertions(+)
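The opportunistic MAP_HUGETLB pattern that the commit message has in mind might look roughly like the following in userspace. This is only a sketch: the helper name alloc_prefer_huge() is hypothetical and not taken from liburing or from the patch, and it assumes the caller passes a length that is a multiple of the system's default huge page size.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not from the patch or liburing): try to get
 * anonymous memory backed by huge pages, and fall back to regular pages
 * if the kernel refuses (for example with ENOMEM when neither
 * nr_hugepages nr nr_overcommit_hugepages allows the allocation).
 *
 * Assumption: len is a multiple of the default huge page size, so the
 * same len can later be passed to munmap() in either case.
 */
static void *alloc_prefer_huge(size_t len, int *used_huge)
{
	void *p;

	assert(used_huge != NULL);

#ifdef MAP_HUGETLB
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (p != MAP_FAILED) {
		*used_huge = 1;
		return p;
	}
#endif

	/* Best-effort attempt failed: use ordinary pages instead. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	*used_huge = 0;
	return p == MAP_FAILED ? NULL : p;
}
```

A caller would pass, say, 2 MiB on a typical x86-64 configuration and check used_huge afterwards; on a kernel with both counters at zero the first mmap() is expected to fail and the fallback path is taken.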
Comments
On Fri, 9 Jun 2023, Josh Triplett wrote:
> The default kernel configuration does not allow any huge page allocation
> until after setting nr_hugepages or nr_overcommit_hugepages to a
> non-zero value; without setting those, mmap attempts with MAP_HUGETLB
> will always fail with -ENOMEM. nr_overcommit_hugepages allows userspace
> to attempt to allocate huge pages at runtime, succeeding if the kernel
> can find or assemble a free huge page.
>
> Provide a Kconfig option to make nr_overcommit_hugepages default to
> unlimited, which permits userspace to always attempt huge page
> allocation on a best-effort basis. This makes it easier and more
> worthwhile for random applications and libraries to opportunistically
> attempt MAP_HUGETLB allocations without special configuration.
>
> In particular, current versions of liburing with IORING_SETUP_NO_MMAP
> attempt to allocate the rings in a huge page. This seems likely to lead
> to more applications and libraries attempting to use huge pages.
>
> Signed-off-by: Josh Triplett <josh@joshtriplett.org>

Why not do this in an initscript?

Or, if absolutely necessary, a kernel command line parameter?

A Kconfig option to set a default value to be ULONG_MAX seems strange if
you can just write the value to procfs.
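For reference, the procfs write suggested in this reply could be done from a small C program at boot instead of a shell initscript. The sketch below is an illustration only: write_ulong() and the OVERCOMMIT_PATH macro are hypothetical names, the path is the standard sysctl location for the default huge page size, and writing it requires root.

```c
#include <assert.h>
#include <stdio.h>

/* Standard sysctl file for the default hstate; writing needs privileges. */
#define OVERCOMMIT_PATH "/proc/sys/vm/nr_overcommit_hugepages"

/*
 * Hypothetical helper: write an unsigned long in decimal to a file.
 * Returns 0 on success and -1 on any failure (open, write or close).
 */
static int write_ulong(const char *path, unsigned long value)
{
	FILE *f;
	int ok;

	assert(path != NULL);

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%lu\n", value) > 0;
	/* The sysctl handler may only report an error when the data is flushed. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

An init-time tool would call write_ulong(OVERCOMMIT_PATH, ULONG_MAX) (with <limits.h> included) to get the same effect as the proposed Kconfig default, or a smaller value if the administrator wants a cap.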
On 11.06.23 07:20, David Rientjes wrote:
> On Fri, 9 Jun 2023, Josh Triplett wrote:
>
>> [...]
>
> Why not do this in an initscript?
>
> Or, if absolutely necessary, a kernel command line parameter?
>
> A Kconfig option to set a default value to be ULONG_MAX seems strange if
> you can just write the value to procfs.
>

Agreed, not to mention that huge pages in some environment can cause trouble
(some architectures -- or with gigantic huge pages -- don't support huge
page migration and you can run into trouble with ZONE_MOVABLE or
MIGRATE_CMA, because you'll end up "consuming" all memory for unmovable
allocations in the system), and we shouldn't advocate the use of unlimited
overcommit for huge pages ...
On 06/12/23 11:12, David Hildenbrand wrote:
> On 11.06.23 07:20, David Rientjes wrote:
> > On Fri, 9 Jun 2023, Josh Triplett wrote:
> >
> > > [...]
> >
> > Why not do this in an initscript?
> >
> > Or, if absolutely necessary, a kernel command line parameter?
> >
> > A Kconfig option to set a default value to be ULONG_MAX seems strange if
> > you can just write the value to procfs.
> >
>
> Agreed, not to mention that huge pages in some environment can cause trouble
> (some architectures -- or with gigantic huge pages -- don't support huge
> page migration and you can run into trouble with ZONE_MOVABLE or
> MIGRATE_CMA, because you'll end up "consuming" all memory for unmovable
> allocations in the system), and we shouldn't advocate the use of unlimited
> overcommit for huge pages ...
>

Agree with David(s). Such an option should really be decided by a sysadmin.

Any reason why liburing can not use THP? Seems like that would provide the
desired functionality.
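As a rough illustration of the THP alternative raised here, a userspace allocation can ask for transparent huge pages with madvise(MADV_HUGEPAGE) on an aligned anonymous mapping. This is a sketch, not what liburing does: alloc_thp_hint() is a hypothetical name, THP_ALIGN assumes a 2 MiB PMD-sized huge page (typical on x86-64), and whether the kernel actually backs the range with a huge page depends on the system's THP settings.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumption: 2 MiB transparent huge pages, as on typical x86-64 systems. */
#define THP_ALIGN (2UL << 20)

/*
 * Hypothetical helper: map len bytes of anonymous memory aligned to
 * THP_ALIGN and hint that the range should use transparent huge pages.
 * len must be a non-zero multiple of THP_ALIGN. *hinted is set to 1 if
 * madvise() accepted the hint; that is still only a request.
 */
static void *alloc_thp_hint(size_t len, int *hinted)
{
	size_t total, head, tail;
	uintptr_t start;
	char *raw;

	assert(hinted != NULL);
	assert(len > 0 && len % THP_ALIGN == 0);

	/* Over-allocate so an aligned window of len bytes always fits. */
	total = len + THP_ALIGN;
	raw = mmap(NULL, total, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return NULL;

	start = ((uintptr_t)raw + THP_ALIGN - 1) & ~(uintptr_t)(THP_ALIGN - 1);

	/* Trim the unaligned head and the unused tail. */
	head = start - (uintptr_t)raw;
	if (head)
		munmap(raw, head);
	tail = total - head - len;
	if (tail)
		munmap((char *)start + len, tail);

	*hinted = 0;
#ifdef MADV_HUGEPAGE
	if (madvise((void *)start, len, MADV_HUGEPAGE) == 0)
		*hinted = 1;
#endif
	return (void *)start;
}
```

Unlike MAP_HUGETLB, this path does not depend on nr_hugepages or nr_overcommit_hugepages, and the memory is released with a plain munmap(p, len).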
diff --git a/mm/Kconfig b/mm/Kconfig
index 7672a22647b4..32c13610c5c4 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -824,6 +824,20 @@ config READ_ONLY_THP_FOR_FS
 
 endif # TRANSPARENT_HUGEPAGE
 
+config HUGEPAGE_OVERCOMMIT_DEFAULT_UNLIMITED
+	bool "Allow huge page allocation attempts by default"
+	depends on HUGETLB_PAGE
+	help
+	  By default, the kernel does not allow any huge page allocation until
+	  after setting nr_hugepages or nr_overcommit_hugepages to a non-zero
+	  value. nr_overcommit_hugepages allows userspace to attempt to
+	  allocate huge pages at runtime, succeeding if the kernel can find or
+	  assemble a free huge page.
+
+	  Enable this option to make nr_overcommit_hugepages default to
+	  unlimited, which permits userspace to always attempt hugepage
+	  allocation.
+
 #
 # UP and nommu archs use km based percpu allocator
 #
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f154019e6b84..65abbe254e10 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4305,6 +4305,8 @@ void __init hugetlb_add_hstate(unsigned int order)
 	mutex_init(&h->resize_lock);
 	h->order = order;
 	h->mask = ~(huge_page_size(h) - 1);
+	if (IS_ENABLED(CONFIG_HUGEPAGE_OVERCOMMIT_DEFAULT_UNLIMITED))
+		h->nr_overcommit_huge_pages = ULONG_MAX;
 	for (i = 0; i < MAX_NUMNODES; ++i)
 		INIT_LIST_HEAD(&h->hugepage_freelists[i]);
 	INIT_LIST_HEAD(&h->hugepage_activelist);