[v3] mm: hugetlb_vmemmap: provide stronger vmemmap allocation guarantees

Message ID 20230508234059.2529638-1-pasha.tatashin@soleen.com
State New
Headers
Series [v3] mm: hugetlb_vmemmap: provide stronger vmemmap allocation guarantees |

Commit Message

Pasha Tatashin May 8, 2023, 11:40 p.m. UTC
  HugeTLB pages have a struct page optimizations where struct pages for tail
pages are freed. However, when HugeTLB pages are destroyed, the memory for
struct pages (vmemmap) need to be allocated again.

Currently, __GFP_NORETRY flag is used to allocate the memory for vmemmap,
but given that this flag makes very little effort to actually reclaim
memory the returning of huge pages back to the system can be problem. Lets
use __GFP_RETRY_MAYFAIL instead. This flag is also performs graceful
reclaim without causing ooms, but at least it may perform a few retries,
and will fail only when there is genuinely little amount of unused memory
in the system.

Freeing a 1G page requires 16M of free memory. A machine might need to be
reconfigured from one task to another, and release a large number of 1G pages
back to the system if allocating 16M fails, the release won't work.

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Suggested-by: David Rientjes <rientjes@google.com>
---
 mm/hugetlb_vmemmap.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

Changelog:
v3
  - updated patch log to include details about when the failure can happen.
v2
  - removed gfp_mask argument from alloc_vmemmap_page_list as suggested by
    David Rientjes.
  - Fixed spelling in the patch title.
  

Comments

Mike Kravetz May 9, 2023, 3:29 a.m. UTC | #1
On 05/08/23 23:40, Pasha Tatashin wrote:
> HugeTLB pages have a struct page optimizations where struct pages for tail
> pages are freed. However, when HugeTLB pages are destroyed, the memory for
> struct pages (vmemmap) need to be allocated again.
> 
> Currently, __GFP_NORETRY flag is used to allocate the memory for vmemmap,
> but given that this flag makes very little effort to actually reclaim
> memory the returning of huge pages back to the system can be problem. Lets
> use __GFP_RETRY_MAYFAIL instead. This flag is also performs graceful
> reclaim without causing ooms, but at least it may perform a few retries,
> and will fail only when there is genuinely little amount of unused memory
> in the system.

It may be worth mentioning that no failures to allocate vmemmap pages
with __GFP_NORETRY have been seen/reported.  At this time, this is only
a theoretical issue.  However, making this change lessens the likelihood
of this theoretical issue ever being seen.

> Freeing a 1G page requires 16M of free memory. A machine might need to be
> reconfigured from one task to another, and release a large number of 1G pages
> back to the system if allocating 16M fails, the release won't work.
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> Suggested-by: David Rientjes <rientjes@google.com>
> ---
>  mm/hugetlb_vmemmap.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)

I understand Michal's concerns that this is only a theoretical fix for a
theoretical problem.  However, it seems pretty obvious that an allocation with
__GFP_RETRY_MAYFAIL is more likely to succeed than one with __GFP_NORETRY.
So, this patch will at least make the theoretical problem less likely to
happen.  For this reason I would be inclined to make the change.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

> 
> Changelog:
> v3
>   - updated patch log to include details about when the failure can happen.
> v2
>   - removed gfp_mask argument from alloc_vmemmap_page_list as suggested by
>     David Rientjes.
>   - Fixed spelling in the patch title.
> 
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index 27f001e0f0a2..f42079b73f82 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -384,8 +384,9 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
>  }
>  
>  static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
> -				   gfp_t gfp_mask, struct list_head *list)
> +				   struct list_head *list)
>  {
> +	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE;
>  	unsigned long nr_pages = (end - start) >> PAGE_SHIFT;
>  	int nid = page_to_nid((struct page *)start);
>  	struct page *page, *next;
> @@ -413,12 +414,11 @@ static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
>   * @end:	end address of the vmemmap virtual address range that we want to
>   *		remap.
>   * @reuse:	reuse address.
> - * @gfp_mask:	GFP flag for allocating vmemmap pages.
>   *
>   * Return: %0 on success, negative error code otherwise.
>   */
>  static int vmemmap_remap_alloc(unsigned long start, unsigned long end,
> -			       unsigned long reuse, gfp_t gfp_mask)
> +			       unsigned long reuse)
>  {
>  	LIST_HEAD(vmemmap_pages);
>  	struct vmemmap_remap_walk walk = {
> @@ -430,7 +430,7 @@ static int vmemmap_remap_alloc(unsigned long start, unsigned long end,
>  	/* See the comment in the vmemmap_remap_free(). */
>  	BUG_ON(start - reuse != PAGE_SIZE);
>  
> -	if (alloc_vmemmap_page_list(start, end, gfp_mask, &vmemmap_pages))
> +	if (alloc_vmemmap_page_list(start, end, &vmemmap_pages))
>  		return -ENOMEM;
>  
>  	mmap_read_lock(&init_mm);
> @@ -476,8 +476,7 @@ int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head)
>  	 * When a HugeTLB page is freed to the buddy allocator, previously
>  	 * discarded vmemmap pages must be allocated and remapping.
>  	 */
> -	ret = vmemmap_remap_alloc(vmemmap_start, vmemmap_end, vmemmap_reuse,
> -				  GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE);
> +	ret = vmemmap_remap_alloc(vmemmap_start, vmemmap_end, vmemmap_reuse);
>  	if (!ret) {
>  		ClearHPageVmemmapOptimized(head);
>  		static_branch_dec(&hugetlb_optimize_vmemmap_key);
> -- 
> 2.40.1.521.gf1e218fcd8-goog
>
  

Patch

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 27f001e0f0a2..f42079b73f82 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -384,8 +384,9 @@  static int vmemmap_remap_free(unsigned long start, unsigned long end,
 }
 
 static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
-				   gfp_t gfp_mask, struct list_head *list)
+				   struct list_head *list)
 {
+	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE;
 	unsigned long nr_pages = (end - start) >> PAGE_SHIFT;
 	int nid = page_to_nid((struct page *)start);
 	struct page *page, *next;
@@ -413,12 +414,11 @@  static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
  * @end:	end address of the vmemmap virtual address range that we want to
  *		remap.
  * @reuse:	reuse address.
- * @gfp_mask:	GFP flag for allocating vmemmap pages.
  *
  * Return: %0 on success, negative error code otherwise.
  */
 static int vmemmap_remap_alloc(unsigned long start, unsigned long end,
-			       unsigned long reuse, gfp_t gfp_mask)
+			       unsigned long reuse)
 {
 	LIST_HEAD(vmemmap_pages);
 	struct vmemmap_remap_walk walk = {
@@ -430,7 +430,7 @@  static int vmemmap_remap_alloc(unsigned long start, unsigned long end,
 	/* See the comment in the vmemmap_remap_free(). */
 	BUG_ON(start - reuse != PAGE_SIZE);
 
-	if (alloc_vmemmap_page_list(start, end, gfp_mask, &vmemmap_pages))
+	if (alloc_vmemmap_page_list(start, end, &vmemmap_pages))
 		return -ENOMEM;
 
 	mmap_read_lock(&init_mm);
@@ -476,8 +476,7 @@  int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head)
 	 * When a HugeTLB page is freed to the buddy allocator, previously
 	 * discarded vmemmap pages must be allocated and remapping.
 	 */
-	ret = vmemmap_remap_alloc(vmemmap_start, vmemmap_end, vmemmap_reuse,
-				  GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE);
+	ret = vmemmap_remap_alloc(vmemmap_start, vmemmap_end, vmemmap_reuse);
 	if (!ret) {
 		ClearHPageVmemmapOptimized(head);
 		static_branch_dec(&hugetlb_optimize_vmemmap_key);