[RFC,v2,3/6] slub: Don't freeze slabs for cpu partial

Message ID 20231021144317.3400916-4-chengming.zhou@linux.dev
State New
Series slub: Delay freezing of CPU partial slabs

Commit Message

Chengming Zhou Oct. 21, 2023, 2:43 p.m. UTC
  From: Chengming Zhou <zhouchengming@bytedance.com>

Currently we freeze slabs when moving them from the node partial list to
the cpu partial list; this method needs two cmpxchg_double operations:

1. freeze the slab (acquire_slab()) under the node list_lock
2. get_freelist() when the slab is picked for use in ___slab_alloc()

Actually we don't need to freeze when moving slabs out of the node
partial list; we can delay freezing until the slab freelist is used in
___slab_alloc(), so we can save one cmpxchg_double().
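
Purely for illustration (not part of the patch): a userspace toy model of
the saving, where a single 64-bit CAS stands in for cmpxchg_double() on
freelist+counters (the freelist half isn't modeled) and all names are
made up:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* One 64-bit word stands in for the slab's freelist+counters pair. */
struct toy_slab {
	_Atomic uint64_t state;		/* bit 0 = frozen, rest = counters */
};

static bool try_cas(struct toy_slab *s, uint64_t old, uint64_t new)
{
	return atomic_compare_exchange_strong(&s->state, &old, new);
}

int main(void)
{
	struct toy_slab slab = { .state = 8 };	/* unfrozen, some objects */
	uint64_t v;
	int nr_cas = 0;

	/* Old scheme: CAS #1 freezes the slab when it leaves the node
	 * partial list (acquire_slab() under the node list_lock)... */
	v = atomic_load(&slab.state);
	nr_cas += try_cas(&slab, v, v | 1);
	/* ...and CAS #2 takes the freelist when the slab is picked in
	 * ___slab_alloc() (get_freelist()). */
	v = atomic_load(&slab.state);
	nr_cas += try_cas(&slab, v, v);
	printf("old scheme: %d CAS\n", nr_cas);		/* 2 */

	/* New scheme: list moves are plain pointer manipulation under
	 * the list_lock; the only CAS is the combined freeze-and-take-
	 * freelist in ___slab_alloc(). */
	nr_cas = 0;
	v = atomic_load(&slab.state);
	nr_cas += try_cas(&slab, v, v | 1);
	printf("new scheme: %d CAS\n", nr_cas);		/* 1 */

	return 0;
}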

And there are other good points:

1. Moving slabs between the node partial list and the cpu partial list
   becomes simpler, since we don't need to freeze or unfreeze at all.

2. There would be less contention on the node list_lock, since we only
   need to freeze one slab under it. (In fact, we can later move slabs
   out of the node partial list without freezing any slab at all, so
   contention on a slab won't transfer to node list_lock contention.)

We can achieve this because no concurrent path manipulates the partial
slab list except the __slab_free() path, which is now serialized.
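
That is, with this scheme a partial-list move stays a plain list
operation under the lock, roughly (a shape sketch only, using the
existing helpers):

	spin_lock_irqsave(&n->list_lock, flags);
	remove_partial(n, slab);	/* or add_partial() when putting back */
	spin_unlock_irqrestore(&n->list_lock, flags);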

Note this patch only changes the parts that move the partial slabs, to
keep code review easy; the other parts will be fixed in the following
patches. Specifically, this patch changes three paths:
1. get a partial slab from a node: get_partial_node()
2. put a partial slab back to a node: __unfreeze_partials()
3. cache a partial slab on a cpu in __slab_free()

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 mm/slub.c | 63 +++++++++++++++++--------------------------------------
 1 file changed, 19 insertions(+), 44 deletions(-)
  

Comments

Vlastimil Babka Oct. 23, 2023, 4 p.m. UTC | #1
On 10/21/23 16:43, chengming.zhou@linux.dev wrote:
> From: Chengming Zhou <zhouchengming@bytedance.com>
> 
> Currently we freeze slabs when moving them from the node partial list to
> the cpu partial list; this method needs two cmpxchg_double operations:
> 
> 1. freeze the slab (acquire_slab()) under the node list_lock
> 2. get_freelist() when the slab is picked for use in ___slab_alloc()
> 
> Actually we don't need to freeze when moving slabs out of the node
> partial list; we can delay freezing until the slab freelist is used in
> ___slab_alloc(), so we can save one cmpxchg_double().
> 
> And there are other good points:
> 
> 1. Moving slabs between the node partial list and the cpu partial list
>    becomes simpler, since we don't need to freeze or unfreeze at all.
> 
> 2. There would be less contention on the node list_lock, since we only
>    need to freeze one slab under it. (In fact, we can later move slabs
>    out of the node partial list without freezing any slab at all, so
>    contention on a slab won't transfer to node list_lock contention.)
> 
> We can achieve this because no concurrent path manipulates the partial
> slab list except the __slab_free() path, which is now serialized.
> 
> Note this patch only changes the parts that move the partial slabs, to
> keep code review easy; the other parts will be fixed in the following
> patches. Specifically, this patch changes three paths:
> 1. get a partial slab from a node: get_partial_node()
> 2. put a partial slab back to a node: __unfreeze_partials()
> 3. cache a partial slab on a cpu in __slab_free()

So the problem with this approach, where one patch breaks things and another
one fixes them up, is that git bisect becomes problematic. So we shouldn't
do that, and instead bite the bullet and deal with a potentially large patch.
At some level it's unavoidable, as one has to grasp all the moving pieces
anyway to see e.g. whether the changes in the allocation path are compatible
with the changes in freeing.
When possible, we can do preparatory stuff that doesn't break things, like in
patches 1 and 2; maybe get_cpu_partial() could also be introduced (even if
unused) before switching the world over to the new scheme in a single patch
(with possible cleanups afterwards). So would it be possible to redo it in
such a way, please?

<snip>

> @@ -2621,23 +2622,7 @@ static void __unfreeze_partials(struct kmem_cache *s, struct slab *partial_slab)
>  			spin_lock_irqsave(&n->list_lock, flags);
>  		}
>  
> -		do {
> -
> -			old.freelist = slab->freelist;
> -			old.counters = slab->counters;
> -			VM_BUG_ON(!old.frozen);
> -
> -			new.counters = old.counters;
> -			new.freelist = old.freelist;
> -
> -			new.frozen = 0;
> -
> -		} while (!__slab_update_freelist(s, slab,
> -				old.freelist, old.counters,
> -				new.freelist, new.counters,
> -				"unfreezing slab"));
> -
> -		if (unlikely(!new.inuse && n->nr_partial >= s->min_partial)) {
> +		if (unlikely(!slab->inuse && n->nr_partial >= s->min_partial)) {
>  			slab->next = slab_to_discard;
>  			slab_to_discard = slab;
>  		} else {
> @@ -3634,18 +3619,8 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
>  		was_frozen = new.frozen;
>  		new.inuse -= cnt;
>  		if ((!new.inuse || !prior) && !was_frozen) {
> -
> -			if (kmem_cache_has_cpu_partial(s) && !prior) {
> -
> -				/*
> -				 * Slab was on no list before and will be
> -				 * partially empty
> -				 * We can defer the list move and instead
> -				 * freeze it.
> -				 */
> -				new.frozen = 1;
> -
> -			} else { /* Needs to be taken off a list */
> +			/* Needs to be taken off a list */
> +			if (!kmem_cache_has_cpu_partial(s) || prior) {
>  
>  				n = get_node(s, slab_nid(slab));
>  				/*
> @@ -3675,7 +3650,7 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
>  			 * activity can be necessary.
>  			 */
>  			stat(s, FREE_FROZEN);
> -		} else if (new.frozen) {
> +		} else if (kmem_cache_has_cpu_partial(s) && !prior) {
>  			/*
>  			 * If we just froze the slab then put it onto the
>  			 * per cpu partial list.

I think this comment is now misleading: we didn't freeze the slab, so it
should now be something like "if we started with a full slab...".
  
Chengming Zhou Oct. 24, 2023, 2:39 a.m. UTC | #2
On 2023/10/24 00:00, Vlastimil Babka wrote:
> On 10/21/23 16:43, chengming.zhou@linux.dev wrote:
>> From: Chengming Zhou <zhouchengming@bytedance.com>
>>
>> Currently we freeze slabs when moving them from the node partial list to
>> the cpu partial list; this method needs two cmpxchg_double operations:
>>
>> 1. freeze the slab (acquire_slab()) under the node list_lock
>> 2. get_freelist() when the slab is picked for use in ___slab_alloc()
>>
>> Actually we don't need to freeze when moving slabs out of the node
>> partial list; we can delay freezing until the slab freelist is used in
>> ___slab_alloc(), so we can save one cmpxchg_double().
>>
>> And there are other good points:
>>
>> 1. Moving slabs between the node partial list and the cpu partial list
>>    becomes simpler, since we don't need to freeze or unfreeze at all.
>>
>> 2. There would be less contention on the node list_lock, since we only
>>    need to freeze one slab under it. (In fact, we can later move slabs
>>    out of the node partial list without freezing any slab at all, so
>>    contention on a slab won't transfer to node list_lock contention.)
>>
>> We can achieve this because no concurrent path manipulates the partial
>> slab list except the __slab_free() path, which is now serialized.
>>
>> Note this patch only changes the parts that move the partial slabs, to
>> keep code review easy; the other parts will be fixed in the following
>> patches. Specifically, this patch changes three paths:
>> 1. get a partial slab from a node: get_partial_node()
>> 2. put a partial slab back to a node: __unfreeze_partials()
>> 3. cache a partial slab on a cpu in __slab_free()
> 
> So the problem with this approach, where one patch breaks things and another
> one fixes them up, is that git bisect becomes problematic. So we shouldn't
> do that, and instead bite the bullet and deal with a potentially large patch.
> At some level it's unavoidable, as one has to grasp all the moving pieces
> anyway to see e.g. whether the changes in the allocation path are compatible
> with the changes in freeing.
> When possible, we can do preparatory stuff that doesn't break things, like in
> patches 1 and 2; maybe get_cpu_partial() could also be introduced (even if
> unused) before switching the world over to the new scheme in a single patch
> (with possible cleanups afterwards). So would it be possible to redo it in
> such a way, please?

Ok, I will change it to this way. I didn't know how to handle this
potentially large patch gracefully at first, so your detailed advice is
very helpful to me. I will prepare all the possible preparatory stuff,
then change all parts over in one patch.
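
For the unused helper you mentioned, something roughly like the sketch
below is what I have in mind (just to show the direction; the final name
and signature may well differ):

/*
 * Sketch: take one slab off the node partial list without freezing it,
 * so the caller can put it on the cpu partial list.
 */
static struct slab *get_cpu_partial(struct kmem_cache_node *n)
{
	struct slab *slab;

	lockdep_assert_held(&n->list_lock);

	slab = list_first_entry_or_null(&n->partial, struct slab, slab_list);
	if (slab)
		remove_partial(n, slab);

	return slab;
}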

> 
> <snip>
> 
>> @@ -2621,23 +2622,7 @@ static void __unfreeze_partials(struct kmem_cache *s, struct slab *partial_slab)
>>  			spin_lock_irqsave(&n->list_lock, flags);
>>  		}
>>  
>> -		do {
>> -
>> -			old.freelist = slab->freelist;
>> -			old.counters = slab->counters;
>> -			VM_BUG_ON(!old.frozen);
>> -
>> -			new.counters = old.counters;
>> -			new.freelist = old.freelist;
>> -
>> -			new.frozen = 0;
>> -
>> -		} while (!__slab_update_freelist(s, slab,
>> -				old.freelist, old.counters,
>> -				new.freelist, new.counters,
>> -				"unfreezing slab"));
>> -
>> -		if (unlikely(!new.inuse && n->nr_partial >= s->min_partial)) {
>> +		if (unlikely(!slab->inuse && n->nr_partial >= s->min_partial)) {
>>  			slab->next = slab_to_discard;
>>  			slab_to_discard = slab;
>>  		} else {
>> @@ -3634,18 +3619,8 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
>>  		was_frozen = new.frozen;
>>  		new.inuse -= cnt;
>>  		if ((!new.inuse || !prior) && !was_frozen) {
>> -
>> -			if (kmem_cache_has_cpu_partial(s) && !prior) {
>> -
>> -				/*
>> -				 * Slab was on no list before and will be
>> -				 * partially empty
>> -				 * We can defer the list move and instead
>> -				 * freeze it.
>> -				 */
>> -				new.frozen = 1;
>> -
>> -			} else { /* Needs to be taken off a list */
>> +			/* Needs to be taken off a list */
>> +			if (!kmem_cache_has_cpu_partial(s) || prior) {
>>  
>>  				n = get_node(s, slab_nid(slab));
>>  				/*
>> @@ -3675,7 +3650,7 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
>>  			 * activity can be necessary.
>>  			 */
>>  			stat(s, FREE_FROZEN);
>> -		} else if (new.frozen) {
>> +		} else if (kmem_cache_has_cpu_partial(s) && !prior) {
>>  			/*
>>  			 * If we just froze the slab then put it onto the
>>  			 * per cpu partial list.
> 
> I think this comment is now misleading: we didn't freeze the slab, so it
> should now be something like "if we started with a full slab...".

Ok, will check and change these inconsistent comments while at it.
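
For example, that one could become something like:

		} else if (kmem_cache_has_cpu_partial(s) && !prior) {
			/*
			 * If we started with a full slab then put it onto
			 * the per cpu partial list.
			 */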

Thanks!
  

Patch

diff --git a/mm/slub.c b/mm/slub.c
index adeff8df85ec..61ee82ea21b6 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2277,7 +2277,9 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
 	struct slab *slab, *slab2;
 	void *object = NULL;
 	unsigned long flags;
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 	unsigned int partial_slabs = 0;
+#endif
 
 	/*
 	 * Racy check. If we mistakenly see no partial slabs then we
@@ -2303,20 +2305,22 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
 			continue;
 		}
 
-		t = acquire_slab(s, n, slab, object == NULL);
-		if (!t)
-			break;
-
 		if (!object) {
-			*pc->slab = slab;
-			stat(s, ALLOC_FROM_PARTIAL);
-			object = t;
-		} else {
-			put_cpu_partial(s, slab, 0);
-			stat(s, CPU_PARTIAL_NODE);
-			partial_slabs++;
+			t = acquire_slab(s, n, slab, object == NULL);
+			if (t) {
+				*pc->slab = slab;
+				stat(s, ALLOC_FROM_PARTIAL);
+				object = t;
+				continue;
+			}
 		}
+
 #ifdef CONFIG_SLUB_CPU_PARTIAL
+		remove_partial(n, slab);
+		put_cpu_partial(s, slab, 0);
+		stat(s, CPU_PARTIAL_NODE);
+		partial_slabs++;
+
 		if (!kmem_cache_has_cpu_partial(s)
 			|| partial_slabs > s->cpu_partial_slabs / 2)
 			break;
@@ -2606,9 +2610,6 @@ static void __unfreeze_partials(struct kmem_cache *s, struct slab *partial_slab)
 	unsigned long flags = 0;
 
 	while (partial_slab) {
-		struct slab new;
-		struct slab old;
-
 		slab = partial_slab;
 		partial_slab = slab->next;
 
@@ -2621,23 +2622,7 @@ static void __unfreeze_partials(struct kmem_cache *s, struct slab *partial_slab)
 			spin_lock_irqsave(&n->list_lock, flags);
 		}
 
-		do {
-
-			old.freelist = slab->freelist;
-			old.counters = slab->counters;
-			VM_BUG_ON(!old.frozen);
-
-			new.counters = old.counters;
-			new.freelist = old.freelist;
-
-			new.frozen = 0;
-
-		} while (!__slab_update_freelist(s, slab,
-				old.freelist, old.counters,
-				new.freelist, new.counters,
-				"unfreezing slab"));
-
-		if (unlikely(!new.inuse && n->nr_partial >= s->min_partial)) {
+		if (unlikely(!slab->inuse && n->nr_partial >= s->min_partial)) {
 			slab->next = slab_to_discard;
 			slab_to_discard = slab;
 		} else {
@@ -3634,18 +3619,8 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
 		was_frozen = new.frozen;
 		new.inuse -= cnt;
 		if ((!new.inuse || !prior) && !was_frozen) {
-
-			if (kmem_cache_has_cpu_partial(s) && !prior) {
-
-				/*
-				 * Slab was on no list before and will be
-				 * partially empty
-				 * We can defer the list move and instead
-				 * freeze it.
-				 */
-				new.frozen = 1;
-
-			} else { /* Needs to be taken off a list */
+			/* Needs to be taken off a list */
+			if (!kmem_cache_has_cpu_partial(s) || prior) {
 
 				n = get_node(s, slab_nid(slab));
 				/*
@@ -3675,7 +3650,7 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
 			 * activity can be necessary.
 			 */
 			stat(s, FREE_FROZEN);
-		} else if (new.frozen) {
+		} else if (kmem_cache_has_cpu_partial(s) && !prior) {
 			/*
 			 * If we just froze the slab then put it onto the
 			 * per cpu partial list.