[RFC,v1,4/5] maple_tree: avoid bulk alloc/free to use percpu array more

Message ID 20230808095342.12637-11-vbabka@suse.cz
State New
Headers
Series SLUB percpu array caches and maple tree nodes |

Commit Message

Vlastimil Babka Aug. 8, 2023, 9:53 a.m. UTC
  Using bulk alloc/free on a cache with percpu array should not be
necessary and the bulk alloc actually bypasses the array (the prefill
functionality currently relies on this).

The simplest change is just to convert the respective maple tree
wrappers to do a loop of normal alloc/free.
---
 lib/maple_tree.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)
  

Comments

Peng Zhang Aug. 8, 2023, 11:17 a.m. UTC | #1
在 2023/8/8 17:53, Vlastimil Babka 写道:
> Using bulk alloc/free on a cache with percpu array should not be
> necessary and the bulk alloc actually bypasses the array (the prefill
> functionality currently relies on this).
> 
> The simplest change is just to convert the respective maple tree
> wrappers to do a loop of normal alloc/free.
> ---
>   lib/maple_tree.c | 11 +++++++++--
>   1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index 1196d0a17f03..7a8e7c467d7c 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -161,12 +161,19 @@ static inline struct maple_node *mt_alloc_one(gfp_t gfp)
>   
>   static inline int mt_alloc_bulk(gfp_t gfp, size_t size, void **nodes)
>   {
> -	return kmem_cache_alloc_bulk(maple_node_cache, gfp, size, nodes);
> +	int allocated = 0;
> +	for (size_t i = 0; i < size; i++) {
> +		nodes[i] = kmem_cache_alloc(maple_node_cache, gfp);
> +		if (nodes[i])
If the i-th allocation fails, node[i] will be NULL. This is wrong. We'd
better guarantee that mt_alloc_bulk() allocates completely successfully,
or returns 0. The following cases are not allowed:
nodes: [addr1][addr2][NULL][addr3].
> +			allocated++;
> +	}
> +	return allocated;
>   }
>   
>   static inline void mt_free_bulk(size_t size, void __rcu **nodes)
>   {
> -	kmem_cache_free_bulk(maple_node_cache, size, (void **)nodes);
> +	for (size_t i = 0; i < size; i++)
> +		kmem_cache_free(maple_node_cache, nodes[i]);
>   }
>   
>   static void mt_free_rcu(struct rcu_head *head)
  
Liam R. Howlett Aug. 8, 2023, 2:29 p.m. UTC | #2
* Peng Zhang <zhangpeng.00@bytedance.com> [230808 07:17]:
> 
> 
> 在 2023/8/8 17:53, Vlastimil Babka 写道:
> > Using bulk alloc/free on a cache with percpu array should not be
> > necessary and the bulk alloc actually bypasses the array (the prefill
> > functionality currently relies on this).
> > 
> > The simplest change is just to convert the respective maple tree
> > wrappers to do a loop of normal alloc/free.
> > ---
> >   lib/maple_tree.c | 11 +++++++++--
> >   1 file changed, 9 insertions(+), 2 deletions(-)
> > 
> > diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> > index 1196d0a17f03..7a8e7c467d7c 100644
> > --- a/lib/maple_tree.c
> > +++ b/lib/maple_tree.c
> > @@ -161,12 +161,19 @@ static inline struct maple_node *mt_alloc_one(gfp_t gfp)
> >   static inline int mt_alloc_bulk(gfp_t gfp, size_t size, void **nodes)
> >   {
> > -	return kmem_cache_alloc_bulk(maple_node_cache, gfp, size, nodes);
> > +	int allocated = 0;
> > +	for (size_t i = 0; i < size; i++) {
> > +		nodes[i] = kmem_cache_alloc(maple_node_cache, gfp);
> > +		if (nodes[i])
> If the i-th allocation fails, node[i] will be NULL. This is wrong. We'd
> better guarantee that mt_alloc_bulk() allocates completely successfully,
> or returns 0. The following cases are not allowed:
> nodes: [addr1][addr2][NULL][addr3].

Thanks for pointing this out Peng.

We can handle a lower number than requested being returned, but we
cannot handle the sparse data.

The kmem_cache_alloc_bulk() can return a failure today - leaving the
array to be cleaned by the caller, so if this is changed to a full
success or full fail, then we will also have to change the caller to
handle whatever state is returned if it differs from
kmem_cache_alloc_bulk().

It might be best to return the size already allocated when a failure is
encountered. This will make the caller, mas_alloc_nodes(), request more
nodes.  Only in the case of zero allocations would this be seen as an
OOM event.

Vlastimil, Is the first kmem_cache_alloc() call failing a possibility?
If so, what should be the corrective action?

> > +			allocated++;
> > +	}
> > +	return allocated;
> >   }
> >   static inline void mt_free_bulk(size_t size, void __rcu **nodes)
> >   {
> > -	kmem_cache_free_bulk(maple_node_cache, size, (void **)nodes);
> > +	for (size_t i = 0; i < size; i++)
> > +		kmem_cache_free(maple_node_cache, nodes[i]);
> >   }
> >   static void mt_free_rcu(struct rcu_head *head)
  

Patch

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 1196d0a17f03..7a8e7c467d7c 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -161,12 +161,19 @@  static inline struct maple_node *mt_alloc_one(gfp_t gfp)
 
 static inline int mt_alloc_bulk(gfp_t gfp, size_t size, void **nodes)
 {
-	return kmem_cache_alloc_bulk(maple_node_cache, gfp, size, nodes);
+	int allocated = 0;
+	for (size_t i = 0; i < size; i++) {
+		nodes[i] = kmem_cache_alloc(maple_node_cache, gfp);
+		if (nodes[i])
+			allocated++;
+	}
+	return allocated;
 }
 
 static inline void mt_free_bulk(size_t size, void __rcu **nodes)
 {
-	kmem_cache_free_bulk(maple_node_cache, size, (void **)nodes);
+	for (size_t i = 0; i < size; i++)
+		kmem_cache_free(maple_node_cache, nodes[i]);
 }
 
 static void mt_free_rcu(struct rcu_head *head)