[RFC,71/86] treewide: lib: remove cond_resched()

Message ID 20231107230822.371443-15-ankur.a.arora@oracle.com
State New
Series Make the kernel preemptible

Commit Message

Ankur Arora Nov. 7, 2023, 11:08 p.m. UTC
  There are broadly three sets of uses of cond_resched():

1.  Calls to cond_resched() out of the goodness of our heart,
    otherwise known as avoiding lockup splats.

2.  Open coded variants of cond_resched_lock() which call
    cond_resched().

3.  Retry or error handling loops, where cond_resched() is used as a
    quick alternative to spinning in a tight-loop.
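
As a rough illustration (the callers below are hypothetical, not taken
from this series), the three patterns look like:

	/* Set 1: scheduling point added only to avoid softlockup splats. */
	for (i = 0; i < nr_items; i++) {
		process_item(&items[i]);	/* hypothetical work item */
		cond_resched();
	}

	/* Set 2: open-coded equivalent of cond_resched_lock(&lock). */
	if (need_resched() || spin_needbreak(&lock)) {
		spin_unlock(&lock);
		cond_resched();
		spin_lock(&lock);
	}

	/* Set 3: cond_resched() standing in for real backoff in a retry loop. */
	do {
		err = try_operation();		/* hypothetical operation */
		if (err == -EAGAIN)
			cond_resched();
	} while (err == -EAGAIN);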

When running under a full preemption model, cond_resched() reduces
to a NOP (not even a barrier), so removing it obviously cannot matter.

But considering only voluntary preemption models (say, for code that
has mostly been tested under those), for set-1 and set-2 the
scheduler can now preempt kernel tasks running beyond their time
quanta anywhere they are preemptible() [1], which removes any need
for these explicitly placed scheduling points.

The cond_resched() calls in set-3 are a little more difficult.
To start with, given its NOP character under full preemption, it
never actually saved us from a tight loop.
With voluntary preemption, it's not a NOP, but it might as well be --
for most workloads the scheduler does not have an inexhaustible supply
of runnable tasks on the runqueue.

So, cond_resched() is useful for avoiding softlockup splats, but not
terribly good for error handling. Ideally, these should be replaced
with some kind of timed or event wait.
For now we use cond_resched_stall(), which tries to schedule if
possible, and executes a cpu_relax() if not.
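
As a rough sketch of the intended semantics (the real definition is in
an earlier patch in this series, so treat this as an approximation):

	/* Approximation of cond_resched_stall(); not the actual definition. */
	static inline int cond_resched_stall(void)
	{
		if (need_resched()) {
			schedule();	/* a reschedule can make progress */
			return 1;
		}
		cpu_relax();		/* otherwise just relax the CPU briefly */
		return 0;
	}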

Almost all the cond_resched() calls are from set-1. Remove them.

[1] https://lore.kernel.org/lkml/20231107215742.363031-1-ankur.a.arora@oracle.com/

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net> 
Cc: Kees Cook <keescook@chromium.org> 
Cc: Eric Dumazet <edumazet@google.com> 
Cc: Jakub Kicinski <kuba@kernel.org> 
Cc: Paolo Abeni <pabeni@redhat.com> 
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
 lib/crc32test.c          |  2 --
 lib/crypto/mpi/mpi-pow.c |  1 -
 lib/memcpy_kunit.c       |  5 -----
 lib/random32.c           |  1 -
 lib/rhashtable.c         |  2 --
 lib/test_bpf.c           |  3 ---
 lib/test_lockup.c        |  2 +-
 lib/test_maple_tree.c    |  8 --------
 lib/test_rhashtable.c    | 10 ----------
 9 files changed, 1 insertion(+), 33 deletions(-)
  

Comments

Herbert Xu Nov. 8, 2023, 9:15 a.m. UTC | #1
On Tue, Nov 07, 2023 at 03:08:07PM -0800, Ankur Arora wrote:
> There are broadly three sets of uses of cond_resched():
> 
> 1.  Calls to cond_resched() out of the goodness of our heart,
>     otherwise known as avoiding lockup splats.
> 
> 2.  Open coded variants of cond_resched_lock() which call
>     cond_resched().
> 
> 3.  Retry or error handling loops, where cond_resched() is used as a
>     quick alternative to spinning in a tight-loop.
> 
> When running under a full preemption model, the cond_resched() reduces
> to a NOP (not even a barrier) so removing it obviously cannot matter.
> 
> But considering only voluntary preemption models (for say code that
> has been mostly tested under those), for set-1 and set-2 the
> scheduler can now preempt kernel tasks running beyond their time
> quanta anywhere they are preemptible() [1]. Which removes any need
> for these explicitly placed scheduling points.
> 
> The cond_resched() calls in set-3 are a little more difficult.
> To start with, given it's NOP character under full preemption, it
> never actually saved us from a tight loop.
> With voluntary preemption, it's not a NOP, but it might as well be --
> for most workloads the scheduler does not have an interminable supply
> of runnable tasks on the runqueue.
> 
> So, cond_resched() is useful to not get softlockup splats, but not
> terribly good for error handling. Ideally, these should be replaced
> with some kind of timed or event wait.
> For now we use cond_resched_stall(), which tries to schedule if
> possible, and executes a cpu_relax() if not.
> 
> Almost all the cond_resched() calls are from set-1. Remove them.
> 
> [1] https://lore.kernel.org/lkml/20231107215742.363031-1-ankur.a.arora@oracle.com/
> 
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: "David S. Miller" <davem@davemloft.net> 
> Cc: Kees Cook <keescook@chromium.org> 
> Cc: Eric Dumazet <edumazet@google.com> 
> Cc: Jakub Kicinski <kuba@kernel.org> 
> Cc: Paolo Abeni <pabeni@redhat.com> 
> Cc: Thomas Graf <tgraf@suug.ch>
> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> ---
>  lib/crc32test.c          |  2 --
>  lib/crypto/mpi/mpi-pow.c |  1 -
>  lib/memcpy_kunit.c       |  5 -----
>  lib/random32.c           |  1 -
>  lib/rhashtable.c         |  2 --
>  lib/test_bpf.c           |  3 ---
>  lib/test_lockup.c        |  2 +-
>  lib/test_maple_tree.c    |  8 --------
>  lib/test_rhashtable.c    | 10 ----------
>  9 files changed, 1 insertion(+), 33 deletions(-)

Nack.
  
Steven Rostedt Nov. 8, 2023, 3:08 p.m. UTC | #2
On Wed, 8 Nov 2023 17:15:11 +0800
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> >  lib/crc32test.c          |  2 --
> >  lib/crypto/mpi/mpi-pow.c |  1 -
> >  lib/memcpy_kunit.c       |  5 -----
> >  lib/random32.c           |  1 -
> >  lib/rhashtable.c         |  2 --
> >  lib/test_bpf.c           |  3 ---
> >  lib/test_lockup.c        |  2 +-
> >  lib/test_maple_tree.c    |  8 --------
> >  lib/test_rhashtable.c    | 10 ----------
> >  9 files changed, 1 insertion(+), 33 deletions(-)  
> 
> Nack.

A "Nack" with no commentary is completely useless and borderline offensive.

What is your rationale for the Nack?

The cond_resched() is going away if the patches earlier in the series get
implemented. So either it is removed from your code, or it will become a
nop that just wastes bits in the source tree. Your choice.

-- Steve
  
Kees Cook Nov. 8, 2023, 7:15 p.m. UTC | #3
On Tue, Nov 07, 2023 at 03:08:07PM -0800, Ankur Arora wrote:
> There are broadly three sets of uses of cond_resched():
> 
> 1.  Calls to cond_resched() out of the goodness of our heart,
>     otherwise known as avoiding lockup splats.
> 
> 2.  Open coded variants of cond_resched_lock() which call
>     cond_resched().
> 
> 3.  Retry or error handling loops, where cond_resched() is used as a
>     quick alternative to spinning in a tight-loop.
> 
> When running under a full preemption model, the cond_resched() reduces
> to a NOP (not even a barrier) so removing it obviously cannot matter.
> 
> But considering only voluntary preemption models (for say code that
> has been mostly tested under those), for set-1 and set-2 the
> scheduler can now preempt kernel tasks running beyond their time
> quanta anywhere they are preemptible() [1]. Which removes any need
> for these explicitly placed scheduling points.
> 
> The cond_resched() calls in set-3 are a little more difficult.
> To start with, given it's NOP character under full preemption, it
> never actually saved us from a tight loop.
> With voluntary preemption, it's not a NOP, but it might as well be --
> for most workloads the scheduler does not have an interminable supply
> of runnable tasks on the runqueue.
> 
> So, cond_resched() is useful to not get softlockup splats, but not
> terribly good for error handling. Ideally, these should be replaced
> with some kind of timed or event wait.
> For now we use cond_resched_stall(), which tries to schedule if
> possible, and executes a cpu_relax() if not.
> 
> Almost all the cond_resched() calls are from set-1. Remove them.

For the memcpy_kunit.c cases, I don't think there are preemption
locations in its loops. Perhaps I'm misunderstanding something? Why will
the memcpy test no longer produce softlockup splats?

-Kees

> 
> [1] https://lore.kernel.org/lkml/20231107215742.363031-1-ankur.a.arora@oracle.com/
> 
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: "David S. Miller" <davem@davemloft.net> 
> Cc: Kees Cook <keescook@chromium.org> 
> Cc: Eric Dumazet <edumazet@google.com> 
> Cc: Jakub Kicinski <kuba@kernel.org> 
> Cc: Paolo Abeni <pabeni@redhat.com> 
> Cc: Thomas Graf <tgraf@suug.ch>
> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> ---
>  lib/crc32test.c          |  2 --
>  lib/crypto/mpi/mpi-pow.c |  1 -
>  lib/memcpy_kunit.c       |  5 -----
>  lib/random32.c           |  1 -
>  lib/rhashtable.c         |  2 --
>  lib/test_bpf.c           |  3 ---
>  lib/test_lockup.c        |  2 +-
>  lib/test_maple_tree.c    |  8 --------
>  lib/test_rhashtable.c    | 10 ----------
>  9 files changed, 1 insertion(+), 33 deletions(-)
> 
> diff --git a/lib/crc32test.c b/lib/crc32test.c
> index 9b4af79412c4..3eee90482e9a 100644
> --- a/lib/crc32test.c
> +++ b/lib/crc32test.c
> @@ -729,7 +729,6 @@ static int __init crc32c_combine_test(void)
>  			      crc_full == test[i].crc32c_le))
>  				errors++;
>  			runs++;
> -			cond_resched();
>  		}
>  	}
>  
> @@ -817,7 +816,6 @@ static int __init crc32_combine_test(void)
>  			      crc_full == test[i].crc_le))
>  				errors++;
>  			runs++;
> -			cond_resched();
>  		}
>  	}
>  
> diff --git a/lib/crypto/mpi/mpi-pow.c b/lib/crypto/mpi/mpi-pow.c
> index 2fd7a46d55ec..074534900b7e 100644
> --- a/lib/crypto/mpi/mpi-pow.c
> +++ b/lib/crypto/mpi/mpi-pow.c
> @@ -242,7 +242,6 @@ int mpi_powm(MPI res, MPI base, MPI exp, MPI mod)
>  				}
>  				e <<= 1;
>  				c--;
> -				cond_resched();
>  			}
>  
>  			i--;
> diff --git a/lib/memcpy_kunit.c b/lib/memcpy_kunit.c
> index 440aee705ccc..c2a6b09fe93a 100644
> --- a/lib/memcpy_kunit.c
> +++ b/lib/memcpy_kunit.c
> @@ -361,8 +361,6 @@ static void copy_large_test(struct kunit *test, bool use_memmove)
>  			/* Zero out what we copied for the next cycle. */
>  			memset(large_dst + offset, 0, bytes);
>  		}
> -		/* Avoid stall warnings if this loop gets slow. */
> -		cond_resched();
>  	}
>  }
>  
> @@ -489,9 +487,6 @@ static void memmove_overlap_test(struct kunit *test)
>  			for (int s_off = s_start; s_off < s_end;
>  			     s_off = next_step(s_off, s_start, s_end, window_step))
>  				inner_loop(test, bytes, d_off, s_off);
> -
> -			/* Avoid stall warnings. */
> -			cond_resched();
>  		}
>  	}
>  }
> diff --git a/lib/random32.c b/lib/random32.c
> index 32060b852668..10bc804d99d6 100644
> --- a/lib/random32.c
> +++ b/lib/random32.c
> @@ -287,7 +287,6 @@ static int __init prandom_state_selftest(void)
>  			errors++;
>  
>  		runs++;
> -		cond_resched();
>  	}
>  
>  	if (errors)
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index 6ae2ba8e06a2..5ff0f521bf29 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -328,7 +328,6 @@ static int rhashtable_rehash_table(struct rhashtable *ht)
>  		err = rhashtable_rehash_chain(ht, old_hash);
>  		if (err)
>  			return err;
> -		cond_resched();
>  	}
>  
>  	/* Publish the new table pointer. */
> @@ -1147,7 +1146,6 @@ void rhashtable_free_and_destroy(struct rhashtable *ht,
>  		for (i = 0; i < tbl->size; i++) {
>  			struct rhash_head *pos, *next;
>  
> -			cond_resched();
>  			for (pos = rht_ptr_exclusive(rht_bucket(tbl, i)),
>  			     next = !rht_is_a_nulls(pos) ?
>  					rht_dereference(pos->next, ht) : NULL;
> diff --git a/lib/test_bpf.c b/lib/test_bpf.c
> index ecde4216201e..15b4d32712d8 100644
> --- a/lib/test_bpf.c
> +++ b/lib/test_bpf.c
> @@ -14758,7 +14758,6 @@ static __init int test_skb_segment(void)
>  	for (i = 0; i < ARRAY_SIZE(skb_segment_tests); i++) {
>  		const struct skb_segment_test *test = &skb_segment_tests[i];
>  
> -		cond_resched();
>  		if (exclude_test(i))
>  			continue;
>  
> @@ -14787,7 +14786,6 @@ static __init int test_bpf(void)
>  		struct bpf_prog *fp;
>  		int err;
>  
> -		cond_resched();
>  		if (exclude_test(i))
>  			continue;
>  
> @@ -15171,7 +15169,6 @@ static __init int test_tail_calls(struct bpf_array *progs)
>  		u64 duration;
>  		int ret;
>  
> -		cond_resched();
>  		if (exclude_test(i))
>  			continue;
>  
> diff --git a/lib/test_lockup.c b/lib/test_lockup.c
> index c3fd87d6c2dd..9af5d34c98f6 100644
> --- a/lib/test_lockup.c
> +++ b/lib/test_lockup.c
> @@ -381,7 +381,7 @@ static void test_lockup(bool master)
>  			touch_nmi_watchdog();
>  
>  		if (call_cond_resched)
> -			cond_resched();
> +			cond_resched_stall();
>  
>  		test_wait(cooldown_secs, cooldown_nsecs);
>  
> diff --git a/lib/test_maple_tree.c b/lib/test_maple_tree.c
> index 464eeb90d5ad..321fd5d8aef3 100644
> --- a/lib/test_maple_tree.c
> +++ b/lib/test_maple_tree.c
> @@ -2672,7 +2672,6 @@ static noinline void __init check_dup(struct maple_tree *mt)
>  		rcu_barrier();
>  	}
>  
> -	cond_resched();
>  	mt_cache_shrink();
>  	/* Check with a value at zero, no gap */
>  	for (i = 1000; i < 2000; i++) {
> @@ -2682,7 +2681,6 @@ static noinline void __init check_dup(struct maple_tree *mt)
>  		rcu_barrier();
>  	}
>  
> -	cond_resched();
>  	mt_cache_shrink();
>  	/* Check with a value at zero and unreasonably large */
>  	for (i = big_start; i < big_start + 10; i++) {
> @@ -2692,7 +2690,6 @@ static noinline void __init check_dup(struct maple_tree *mt)
>  		rcu_barrier();
>  	}
>  
> -	cond_resched();
>  	mt_cache_shrink();
>  	/* Small to medium size not starting at zero*/
>  	for (i = 200; i < 1000; i++) {
> @@ -2702,7 +2699,6 @@ static noinline void __init check_dup(struct maple_tree *mt)
>  		rcu_barrier();
>  	}
>  
> -	cond_resched();
>  	mt_cache_shrink();
>  	/* Unreasonably large not starting at zero*/
>  	for (i = big_start; i < big_start + 10; i++) {
> @@ -2710,7 +2706,6 @@ static noinline void __init check_dup(struct maple_tree *mt)
>  		check_dup_gaps(mt, i, false, 5);
>  		mtree_destroy(mt);
>  		rcu_barrier();
> -		cond_resched();
>  		mt_cache_shrink();
>  	}
>  
> @@ -2720,7 +2715,6 @@ static noinline void __init check_dup(struct maple_tree *mt)
>  		check_dup_gaps(mt, i, false, 5);
>  		mtree_destroy(mt);
>  		rcu_barrier();
> -		cond_resched();
>  		if (i % 2 == 0)
>  			mt_cache_shrink();
>  	}
> @@ -2732,7 +2726,6 @@ static noinline void __init check_dup(struct maple_tree *mt)
>  		check_dup_gaps(mt, i, true, 5);
>  		mtree_destroy(mt);
>  		rcu_barrier();
> -		cond_resched();
>  	}
>  
>  	mt_cache_shrink();
> @@ -2743,7 +2736,6 @@ static noinline void __init check_dup(struct maple_tree *mt)
>  		mtree_destroy(mt);
>  		rcu_barrier();
>  		mt_cache_shrink();
> -		cond_resched();
>  	}
>  }
>  
> diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
> index c20f6cb4bf55..e5d1f272f2c6 100644
> --- a/lib/test_rhashtable.c
> +++ b/lib/test_rhashtable.c
> @@ -119,7 +119,6 @@ static int insert_retry(struct rhashtable *ht, struct test_obj *obj,
>  
>  	do {
>  		retries++;
> -		cond_resched();
>  		err = rhashtable_insert_fast(ht, &obj->node, params);
>  		if (err == -ENOMEM && enomem_retry) {
>  			enomem_retries++;
> @@ -253,8 +252,6 @@ static s64 __init test_rhashtable(struct rhashtable *ht, struct test_obj *array,
>  
>  			rhashtable_remove_fast(ht, &obj->node, test_rht_params);
>  		}
> -
> -		cond_resched();
>  	}
>  
>  	end = ktime_get_ns();
> @@ -371,8 +368,6 @@ static int __init test_rhltable(unsigned int entries)
>  		u32 i = get_random_u32_below(entries);
>  		u32 prand = get_random_u32_below(4);
>  
> -		cond_resched();
> -
>  		err = rhltable_remove(&rhlt, &rhl_test_objects[i].list_node, test_rht_params);
>  		if (test_bit(i, obj_in_table)) {
>  			clear_bit(i, obj_in_table);
> @@ -412,7 +407,6 @@ static int __init test_rhltable(unsigned int entries)
>  	}
>  
>  	for (i = 0; i < entries; i++) {
> -		cond_resched();
>  		err = rhltable_remove(&rhlt, &rhl_test_objects[i].list_node, test_rht_params);
>  		if (test_bit(i, obj_in_table)) {
>  			if (WARN(err, "cannot remove element at slot %d", i))
> @@ -607,8 +601,6 @@ static int thread_lookup_test(struct thread_data *tdata)
>  			       obj->value.tid, obj->value.id, key.tid, key.id);
>  			err++;
>  		}
> -
> -		cond_resched();
>  	}
>  	return err;
>  }
> @@ -660,8 +652,6 @@ static int threadfunc(void *data)
>  				goto out;
>  			}
>  			tdata->objs[i].value.id = TEST_INSERT_FAIL;
> -
> -			cond_resched();
>  		}
>  		err = thread_lookup_test(tdata);
>  		if (err) {
> -- 
> 2.31.1
>
  
Steven Rostedt Nov. 8, 2023, 7:41 p.m. UTC | #4
On Wed, 8 Nov 2023 11:15:37 -0800
Kees Cook <keescook@chromium.org> wrote:

> FOr the memcpy_kunit.c cases, I don't think there are preemption
> locations in its loops. Perhaps I'm misunderstanding something? Why will
> the memcpy test no longer produce softlockup splats?

This patchset will switch over to a NEED_RESCHED_LAZY mechanism, so that
the VOLUNTARY and NONE preemption models will be forced to preempt if a
task is in the kernel for too long.

Time slice is over: set NEED_RESCHED_LAZY

For VOLUNTARY and NONE, NEED_RESCHED_LAZY will not preempt the kernel (but
will preempt user space).

If the task has been in the kernel for over 1 tick (1ms at 1000Hz, 4ms at
250Hz, etc.), i.e. NEED_RESCHED_LAZY is still set after one tick, then set
NEED_RESCHED.

NEED_RESCHED will now cause a reschedule in the kernel as soon as it is able
to, regardless of preemption model. (PREEMPT_NONE will now use
preempt_disable().)

This allows us to get rid of all the cond_resched()s throughout the kernel,
as this will be the new mechanism to keep tasks from running inside the
kernel for too long. The watchdog timeout is always longer than one tick.
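
In illustrative pseudo-C (time_slice_expired() is a made-up placeholder
and TIF_NEED_RESCHED_LAZY is the flag this series proposes; the actual
implementation lives in the scheduler patches earlier in the series):

	/* Illustrative only: tick-time escalation from lazy to forced resched. */
	static void tick_escalate(struct task_struct *curr)
	{
		if (!time_slice_expired(curr))
			return;

		if (!test_tsk_thread_flag(curr, TIF_NEED_RESCHED_LAZY)) {
			/* Slice just expired: lazy flag only.  Kernel code keeps
			 * running; user space reschedules on return to user. */
			set_tsk_thread_flag(curr, TIF_NEED_RESCHED_LAZY);
		} else {
			/* Still in the kernel a full tick later: force it. */
			set_tsk_need_resched(curr);
		}
	}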

-- Steve
  
Kees Cook Nov. 8, 2023, 10:16 p.m. UTC | #5
On Wed, Nov 08, 2023 at 02:41:44PM -0500, Steven Rostedt wrote:
> On Wed, 8 Nov 2023 11:15:37 -0800
> Kees Cook <keescook@chromium.org> wrote:
> 
> > FOr the memcpy_kunit.c cases, I don't think there are preemption
> > locations in its loops. Perhaps I'm misunderstanding something? Why will
> > the memcpy test no longer produce softlockup splats?
> 
> This patchset will switch over to a NEED_RESCHED_LAZY routine, so that
> VOLUNTARY and NONE preemption models will be forced to preempt if its in
> the kernel for too long.
> 
> Time slice is over: set NEED_RESCHED_LAZY
> 
> For VOLUNTARY and NONE, NEED_RESCHED_LAZY will not preempt the kernel (but
> will preempt user space).
> 
> If in the kernel for over 1 tick (1ms for 1000Hz, 4ms for 250Hz, etc),
> if NEED_RESCHED_LAZY is still set after one tick, then set NEED_RESCHED.
> 
> NEED_RESCHED will now schedule in the kernel once it is able to regardless
> of preemption model. (PREEMPT_NONE will now use preempt_disable()).
> 
> This allows us to get rid of all cond_resched()s throughout the kernel as
> this will be the new mechanism to keep from running inside the kernel for
> too long. The watchdog is always longer than one tick.

Okay, it sounds like it's taken care of. :)

Acked-by: Kees Cook <keescook@chromium.org> # for lib/memcpy_kunit.c
  
Steven Rostedt Nov. 8, 2023, 10:21 p.m. UTC | #6
On Wed, 8 Nov 2023 14:16:25 -0800
Kees Cook <keescook@chromium.org> wrote:

> Okay, it sounds like it's taken care of. :)
> 
> Acked-by: Kees Cook <keescook@chromium.org> # for lib/memcpy_kunit.c

Thanks Kees,

But I have to admit (and Ankur is now aware) that it was premature to send
the cond_resched() removal patches with this RFC. It may be a year before
we get everything straightened out with the new preempt models.

So, expect to see this patch again sometime next year ;-)

Hopefully, you can ack it then.

-- Steve
  
Herbert Xu Nov. 9, 2023, 4:19 a.m. UTC | #7
On Wed, Nov 08, 2023 at 10:08:18AM -0500, Steven Rostedt wrote:
>
> A "Nack" with no commentary is completely useless and borderline offensive.

Well, you just sent me an email out of the blue, with zero context
about what you were doing, and you're complaining to me about giving
you a curt response?

> What is your rationale for the Nack?

Next time perhaps consider sending the cover letter and the important
patches to everyone rather than the mailing list.

> The cond_resched() is going away if the patches earlier in the series gets
> implemented. So either it is removed from your code, or it will become a
> nop, and just wasting bits in the source tree. Your choice.

This is exactly what I should have received.

Thanks,
  
Steven Rostedt Nov. 9, 2023, 4:43 a.m. UTC | #8
On Thu, 9 Nov 2023 12:19:55 +0800
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> On Wed, Nov 08, 2023 at 10:08:18AM -0500, Steven Rostedt wrote:
> >
> > A "Nack" with no commentary is completely useless and borderline offensive.  
> 
> Well you just sent me an email out of the blue, with zero context
> about what you were doing, and you're complaining to me about giving
> your a curt response?

First, I didn't send the email, and your "Nack" wasn't directed at me.

Second, with lore and lei, it's trivial today to find the cover letter from
the message id. But I get it. It's annoying when you have to do that.

> 
> > What is your rationale for the Nack?  
> 
> Next time perhaps consider sending the cover letter and the important
> patches to everyone rather than the mailing list.

Then that is how you should have responded. I see other maintainers respond
as such. A "Nack" is still meaningless. You could have responded with:

 "What is this? And why are you doing it?"

Which is a much better and a more meaningful response than a "Nack".

> 
> > The cond_resched() is going away if the patches earlier in the series gets
> > implemented. So either it is removed from your code, or it will become a
> > nop, and just wasting bits in the source tree. Your choice.  
> 
> This is exactly what I should have received.

Which is why I replied, as the original email author is still new at this,
but is learning.

-- Steve
  
David Laight Nov. 9, 2023, 9:39 a.m. UTC | #9
From: Steven Rostedt
> Sent: 08 November 2023 19:42
> 
> On Wed, 8 Nov 2023 11:15:37 -0800
> Kees Cook <keescook@chromium.org> wrote:
> 
> > FOr the memcpy_kunit.c cases, I don't think there are preemption
> > locations in its loops. Perhaps I'm misunderstanding something? Why will
> > the memcpy test no longer produce softlockup splats?
> 
> This patchset will switch over to a NEED_RESCHED_LAZY routine, so that
> VOLUNTARY and NONE preemption models will be forced to preempt if its in
> the kernel for too long.
> 
> Time slice is over: set NEED_RESCHED_LAZY
> 
> For VOLUNTARY and NONE, NEED_RESCHED_LAZY will not preempt the kernel (but
> will preempt user space).
> 
> If in the kernel for over 1 tick (1ms for 1000Hz, 4ms for 250Hz, etc),
> if NEED_RESCHED_LAZY is still set after one tick, then set NEED_RESCHED.

Delaying the reschedule that long seems like a regression.
I'm sure a lot of the cond_resched() calls were added to cause
pre-emption much earlier than 1 tick.

I doubt the distributions will change from VOLUNTARY any time soon.
So that is what most people will be using.

	David.

> 
> NEED_RESCHED will now schedule in the kernel once it is able to regardless
> of preemption model. (PREEMPT_NONE will now use preempt_disable()).
> 
> This allows us to get rid of all cond_resched()s throughout the kernel as
> this will be the new mechanism to keep from running inside the kernel for
> too long. The watchdog is always longer than one tick.
> 
> -- Steve

  

Patch

diff --git a/lib/crc32test.c b/lib/crc32test.c
index 9b4af79412c4..3eee90482e9a 100644
--- a/lib/crc32test.c
+++ b/lib/crc32test.c
@@ -729,7 +729,6 @@  static int __init crc32c_combine_test(void)
 			      crc_full == test[i].crc32c_le))
 				errors++;
 			runs++;
-			cond_resched();
 		}
 	}
 
@@ -817,7 +816,6 @@  static int __init crc32_combine_test(void)
 			      crc_full == test[i].crc_le))
 				errors++;
 			runs++;
-			cond_resched();
 		}
 	}
 
diff --git a/lib/crypto/mpi/mpi-pow.c b/lib/crypto/mpi/mpi-pow.c
index 2fd7a46d55ec..074534900b7e 100644
--- a/lib/crypto/mpi/mpi-pow.c
+++ b/lib/crypto/mpi/mpi-pow.c
@@ -242,7 +242,6 @@  int mpi_powm(MPI res, MPI base, MPI exp, MPI mod)
 				}
 				e <<= 1;
 				c--;
-				cond_resched();
 			}
 
 			i--;
diff --git a/lib/memcpy_kunit.c b/lib/memcpy_kunit.c
index 440aee705ccc..c2a6b09fe93a 100644
--- a/lib/memcpy_kunit.c
+++ b/lib/memcpy_kunit.c
@@ -361,8 +361,6 @@  static void copy_large_test(struct kunit *test, bool use_memmove)
 			/* Zero out what we copied for the next cycle. */
 			memset(large_dst + offset, 0, bytes);
 		}
-		/* Avoid stall warnings if this loop gets slow. */
-		cond_resched();
 	}
 }
 
@@ -489,9 +487,6 @@  static void memmove_overlap_test(struct kunit *test)
 			for (int s_off = s_start; s_off < s_end;
 			     s_off = next_step(s_off, s_start, s_end, window_step))
 				inner_loop(test, bytes, d_off, s_off);
-
-			/* Avoid stall warnings. */
-			cond_resched();
 		}
 	}
 }
diff --git a/lib/random32.c b/lib/random32.c
index 32060b852668..10bc804d99d6 100644
--- a/lib/random32.c
+++ b/lib/random32.c
@@ -287,7 +287,6 @@  static int __init prandom_state_selftest(void)
 			errors++;
 
 		runs++;
-		cond_resched();
 	}
 
 	if (errors)
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 6ae2ba8e06a2..5ff0f521bf29 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -328,7 +328,6 @@  static int rhashtable_rehash_table(struct rhashtable *ht)
 		err = rhashtable_rehash_chain(ht, old_hash);
 		if (err)
 			return err;
-		cond_resched();
 	}
 
 	/* Publish the new table pointer. */
@@ -1147,7 +1146,6 @@  void rhashtable_free_and_destroy(struct rhashtable *ht,
 		for (i = 0; i < tbl->size; i++) {
 			struct rhash_head *pos, *next;
 
-			cond_resched();
 			for (pos = rht_ptr_exclusive(rht_bucket(tbl, i)),
 			     next = !rht_is_a_nulls(pos) ?
 					rht_dereference(pos->next, ht) : NULL;
diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index ecde4216201e..15b4d32712d8 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -14758,7 +14758,6 @@  static __init int test_skb_segment(void)
 	for (i = 0; i < ARRAY_SIZE(skb_segment_tests); i++) {
 		const struct skb_segment_test *test = &skb_segment_tests[i];
 
-		cond_resched();
 		if (exclude_test(i))
 			continue;
 
@@ -14787,7 +14786,6 @@  static __init int test_bpf(void)
 		struct bpf_prog *fp;
 		int err;
 
-		cond_resched();
 		if (exclude_test(i))
 			continue;
 
@@ -15171,7 +15169,6 @@  static __init int test_tail_calls(struct bpf_array *progs)
 		u64 duration;
 		int ret;
 
-		cond_resched();
 		if (exclude_test(i))
 			continue;
 
diff --git a/lib/test_lockup.c b/lib/test_lockup.c
index c3fd87d6c2dd..9af5d34c98f6 100644
--- a/lib/test_lockup.c
+++ b/lib/test_lockup.c
@@ -381,7 +381,7 @@  static void test_lockup(bool master)
 			touch_nmi_watchdog();
 
 		if (call_cond_resched)
-			cond_resched();
+			cond_resched_stall();
 
 		test_wait(cooldown_secs, cooldown_nsecs);
 
diff --git a/lib/test_maple_tree.c b/lib/test_maple_tree.c
index 464eeb90d5ad..321fd5d8aef3 100644
--- a/lib/test_maple_tree.c
+++ b/lib/test_maple_tree.c
@@ -2672,7 +2672,6 @@  static noinline void __init check_dup(struct maple_tree *mt)
 		rcu_barrier();
 	}
 
-	cond_resched();
 	mt_cache_shrink();
 	/* Check with a value at zero, no gap */
 	for (i = 1000; i < 2000; i++) {
@@ -2682,7 +2681,6 @@  static noinline void __init check_dup(struct maple_tree *mt)
 		rcu_barrier();
 	}
 
-	cond_resched();
 	mt_cache_shrink();
 	/* Check with a value at zero and unreasonably large */
 	for (i = big_start; i < big_start + 10; i++) {
@@ -2692,7 +2690,6 @@  static noinline void __init check_dup(struct maple_tree *mt)
 		rcu_barrier();
 	}
 
-	cond_resched();
 	mt_cache_shrink();
 	/* Small to medium size not starting at zero*/
 	for (i = 200; i < 1000; i++) {
@@ -2702,7 +2699,6 @@  static noinline void __init check_dup(struct maple_tree *mt)
 		rcu_barrier();
 	}
 
-	cond_resched();
 	mt_cache_shrink();
 	/* Unreasonably large not starting at zero*/
 	for (i = big_start; i < big_start + 10; i++) {
@@ -2710,7 +2706,6 @@  static noinline void __init check_dup(struct maple_tree *mt)
 		check_dup_gaps(mt, i, false, 5);
 		mtree_destroy(mt);
 		rcu_barrier();
-		cond_resched();
 		mt_cache_shrink();
 	}
 
@@ -2720,7 +2715,6 @@  static noinline void __init check_dup(struct maple_tree *mt)
 		check_dup_gaps(mt, i, false, 5);
 		mtree_destroy(mt);
 		rcu_barrier();
-		cond_resched();
 		if (i % 2 == 0)
 			mt_cache_shrink();
 	}
@@ -2732,7 +2726,6 @@  static noinline void __init check_dup(struct maple_tree *mt)
 		check_dup_gaps(mt, i, true, 5);
 		mtree_destroy(mt);
 		rcu_barrier();
-		cond_resched();
 	}
 
 	mt_cache_shrink();
@@ -2743,7 +2736,6 @@  static noinline void __init check_dup(struct maple_tree *mt)
 		mtree_destroy(mt);
 		rcu_barrier();
 		mt_cache_shrink();
-		cond_resched();
 	}
 }
 
diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
index c20f6cb4bf55..e5d1f272f2c6 100644
--- a/lib/test_rhashtable.c
+++ b/lib/test_rhashtable.c
@@ -119,7 +119,6 @@  static int insert_retry(struct rhashtable *ht, struct test_obj *obj,
 
 	do {
 		retries++;
-		cond_resched();
 		err = rhashtable_insert_fast(ht, &obj->node, params);
 		if (err == -ENOMEM && enomem_retry) {
 			enomem_retries++;
@@ -253,8 +252,6 @@  static s64 __init test_rhashtable(struct rhashtable *ht, struct test_obj *array,
 
 			rhashtable_remove_fast(ht, &obj->node, test_rht_params);
 		}
-
-		cond_resched();
 	}
 
 	end = ktime_get_ns();
@@ -371,8 +368,6 @@  static int __init test_rhltable(unsigned int entries)
 		u32 i = get_random_u32_below(entries);
 		u32 prand = get_random_u32_below(4);
 
-		cond_resched();
-
 		err = rhltable_remove(&rhlt, &rhl_test_objects[i].list_node, test_rht_params);
 		if (test_bit(i, obj_in_table)) {
 			clear_bit(i, obj_in_table);
@@ -412,7 +407,6 @@  static int __init test_rhltable(unsigned int entries)
 	}
 
 	for (i = 0; i < entries; i++) {
-		cond_resched();
 		err = rhltable_remove(&rhlt, &rhl_test_objects[i].list_node, test_rht_params);
 		if (test_bit(i, obj_in_table)) {
 			if (WARN(err, "cannot remove element at slot %d", i))
@@ -607,8 +601,6 @@  static int thread_lookup_test(struct thread_data *tdata)
 			       obj->value.tid, obj->value.id, key.tid, key.id);
 			err++;
 		}
-
-		cond_resched();
 	}
 	return err;
 }
@@ -660,8 +652,6 @@  static int threadfunc(void *data)
 				goto out;
 			}
 			tdata->objs[i].value.id = TEST_INSERT_FAIL;
-
-			cond_resched();
 		}
 		err = thread_lookup_test(tdata);
 		if (err) {