mm: hugetlb_vmemmap: provide stronger vmemmap allocation guarantees

Message ID 20230412152337.1203254-1-pasha.tatashin@soleen.com
State New
Series mm: hugetlb_vmemmap: provide stronger vmemmap allocation guarantees

Commit Message

Pasha Tatashin April 12, 2023, 3:23 p.m. UTC
HugeTLB pages have a struct page optimization where the struct pages for
tail pages are freed. However, when HugeTLB pages are destroyed, the memory
for the struct pages (vmemmap) needs to be allocated again.

Currently, the __GFP_NORETRY flag is used to allocate the memory for vmemmap,
but given that this flag makes very little effort to actually reclaim
memory, returning huge pages back to the system can be a problem. Let's
use __GFP_RETRY_MAYFAIL instead. This flag also performs graceful
reclaim without causing OOMs, but at least it may perform a few retries,
and will fail only when there is genuinely little unused memory in the
system.

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Suggested-by: David Rientjes <rientjes@google.com>
---
 mm/hugetlb_vmemmap.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
  

Comments

David Rientjes April 12, 2023, 5:54 p.m. UTC | #1
On Wed, 12 Apr 2023, Pasha Tatashin wrote:

Thanks Pasha, this definitely makes sense.  We want to free the hugetlb 
page back to the system so it would be a shame to have to strand it in the 
hugetlb pool because we can't allocate the tail pages (we want to free 
more memory than we're allocating).

> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index a559037cce00..c4226d2af7cc 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -475,9 +475,12 @@ int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head)
>  	 * the range is mapped to the page which @vmemmap_reuse is mapped to.
>  	 * When a HugeTLB page is freed to the buddy allocator, previously
>  	 * discarded vmemmap pages must be allocated and remapping.
> +	 *
> +	 * Use __GFP_RETRY_MAYFAIL to fail only when there is genuinely little
> +	 * unused memory in the system.
>  	 */
>  	ret = vmemmap_remap_alloc(vmemmap_start, vmemmap_end, vmemmap_reuse,
> -				  GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE);
> +				  GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE);
>  	if (!ret) {
>  		ClearHPageVmemmapOptimized(head);
>  		static_branch_dec(&hugetlb_optimize_vmemmap_key);

The behavior of __GFP_RETRY_MAYFAIL is different for high-order memory (at 
least larger than PAGE_ALLOC_COSTLY_ORDER).  The order that we're 
allocating would depend on the implementation of alloc_vmemmap_page_list() 
so likely best to move the gfp mask to that function.
  
Mike Kravetz April 12, 2023, 7:57 p.m. UTC | #2
On 04/12/23 10:54, David Rientjes wrote:
> Thanks Pasha, this definitely makes sense.  We want to free the hugetlb 
> page back to the system so it would be a shame to have to strand it in the 
> hugetlb pool because we can't allocate the tail pages (we want to free 
> more memory than we're allocating).

Agree.

The hugetlb vmemmap freeing series went through more than 20 revisions
before being merged.  One issue with much discussion was the need to
allocate vmemmap pages when hugetlb pages were returned to buddy.

It looks like the current set of GFP flags was suggested here:
https://lore.kernel.org/linux-mm/YC4ji+pMhtOs+KVM@dhcp22.suse.cz/

However, it was also mentioned that __GFP_RETRY_MAYFAIL could be used
instead of __GFP_NORETRY here:
https://lore.kernel.org/linux-mm/YCafit5ruRJ+SL8I@dhcp22.suse.cz/

Adding Michal on Cc: since these were his suggestions.

> The behavior of __GFP_RETRY_MAYFAIL is different for high-order memory (at 
> least larger than PAGE_ALLOC_COSTLY_ORDER).  The order that we're 
> allocating would depend on the implementation of alloc_vmemmap_page_list() 
> so likely best to move the gfp mask to that function.

Good point.
  
Pasha Tatashin April 12, 2023, 7:57 p.m. UTC | #3
On Wed, Apr 12, 2023 at 1:54 PM David Rientjes <rientjes@google.com> wrote:
>
> Thanks Pasha, this definitely makes sense.  We want to free the hugetlb
> page back to the system so it would be a shame to have to strand it in the
> hugetlb pool because we can't allocate the tail pages (we want to free
> more memory than we're allocating).
>
> The behavior of __GFP_RETRY_MAYFAIL is different for high-order memory (at
> least larger than PAGE_ALLOC_COSTLY_ORDER).  The order that we're
> allocating would depend on the implementation of alloc_vmemmap_page_list()
> so likely best to move the gfp mask to that function.

Thank you, David. This makes sense; I will send the 2nd version soon.

Pasha
  
Pasha Tatashin April 12, 2023, 8 p.m. UTC | #4
On Wed, Apr 12, 2023 at 3:57 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 04/12/23 10:54, David Rientjes wrote:
> > Thanks Pasha, this definitely makes sense.  We want to free the hugetlb
> > page back to the system so it would be a shame to have to strand it in the
> > hugetlb pool because we can't allocate the tail pages (we want to free
> > more memory than we're allocating).
>
> Agree.
>
> The hugetlb vmemmap freeing series went through more than 20 revisions
> before being merged.  One issue with much discussion was the need to
> allocate vmemmap pages when hugetlb pages were returned to buddy.
>
> It looks like the current set of GFP flags was suggested here:
> https://lore.kernel.org/linux-mm/YC4ji+pMhtOs+KVM@dhcp22.suse.cz/
>
> Although, it was also mentioned that __GFP_RETRY_MAYFAIL could be used
> instead of __GFP_NORETRY here:
> https://lore.kernel.org/linux-mm/YCafit5ruRJ+SL8I@dhcp22.suse.cz/
>
> Adding Michal on Cc: since these were his suggestions.

Thank you for the background, Mike. I have sent the 2nd version and
added Michal to that patch.

Pasha
  

Patch

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index a559037cce00..c4226d2af7cc 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -475,9 +475,12 @@  int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head)
 	 * the range is mapped to the page which @vmemmap_reuse is mapped to.
 	 * When a HugeTLB page is freed to the buddy allocator, previously
 	 * discarded vmemmap pages must be allocated and remapping.
+	 *
+	 * Use __GFP_RETRY_MAYFAIL to fail only when there is genuinely little
+	 * unused memory in the system.
 	 */
 	ret = vmemmap_remap_alloc(vmemmap_start, vmemmap_end, vmemmap_reuse,
-				  GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE);
+				  GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE);
 	if (!ret) {
 		ClearHPageVmemmapOptimized(head);
 		static_branch_dec(&hugetlb_optimize_vmemmap_key);