[RFC,54/86] sched: add cond_resched_stall()

Message ID 20231107215742.363031-55-ankur.a.arora@oracle.com
State New
Series: Make the kernel preemptible

Commit Message

Ankur Arora Nov. 7, 2023, 9:57 p.m. UTC
  The kernel has a lot of instances of cond_resched() where it is used
as an alternative to spinning in a tight-loop while waiting to
retry an operation, or while waiting for a device state to change.

Unfortunately, because the scheduler is unlikely to have an
interminable supply of runnable tasks on the runqueue, this just
amounts to spinning in a tight-loop with a cond_resched().
(When running in a fully preemptible kernel, cond_resched()
calls are stubbed out so it amounts to even less.)

In sum, cond_resched() in error handling/retry contexts might
be useful in avoiding softlockup splats, but not very good at
error handling. Ideally, these should be replaced with some kind
of timed or event wait.

For now add cond_resched_stall(), which tries to schedule if
possible, and failing that executes a cpu_relax().

Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
 include/linux/sched.h |  6 ++++++
 kernel/sched/core.c   | 12 ++++++++++++
 2 files changed, 18 insertions(+)
  

Comments

Thomas Gleixner Nov. 9, 2023, 11:19 a.m. UTC | #1
On Tue, Nov 07 2023 at 13:57, Ankur Arora wrote:
> The kernel has a lot of instances of cond_resched() where it is used
> as an alternative to spinning in a tight-loop while waiting to
> retry an operation, or while waiting for a device state to change.
>
> Unfortunately, because the scheduler is unlikely to have an
> interminable supply of runnable tasks on the runqueue, this just
> amounts to spinning in a tight-loop with a cond_resched().
> (When running in a fully preemptible kernel, cond_resched()
> calls are stubbed out so it amounts to even less.)
>
> In sum, cond_resched() in error handling/retry contexts might
> be useful in avoiding softlockup splats, but not very good at
> error handling. Ideally, these should be replaced with some kind
> of timed or event wait.
>
> For now add cond_resched_stall(), which tries to schedule if
> possible, and failing that executes a cpu_relax().

What's the point of this new variant of cond_resched()? We really do not
want it at all. 

> +int __cond_resched_stall(void)
> +{
> +	if (tif_need_resched(RESCHED_eager)) {
> +		__preempt_schedule();

Under the new model TIF_NEED_RESCHED is going to reschedule if the
preemption counter goes to zero.

So the typical

   while (readl(mmio) & BUSY)
   	cpu_relax();

will just be preempted like any other loop, no?

Confused.
  
Ankur Arora Nov. 9, 2023, 10:27 p.m. UTC | #2
Thomas Gleixner <tglx@linutronix.de> writes:

> On Tue, Nov 07 2023 at 13:57, Ankur Arora wrote:
>> The kernel has a lot of instances of cond_resched() where it is used
>> as an alternative to spinning in a tight-loop while waiting to
>> retry an operation, or while waiting for a device state to change.
>>
>> Unfortunately, because the scheduler is unlikely to have an
>> interminable supply of runnable tasks on the runqueue, this just
>> amounts to spinning in a tight-loop with a cond_resched().
>> (When running in a fully preemptible kernel, cond_resched()
>> calls are stubbed out so it amounts to even less.)
>>
>> In sum, cond_resched() in error handling/retry contexts might
>> be useful in avoiding softlockup splats, but not very good at
>> error handling. Ideally, these should be replaced with some kind
>> of timed or event wait.
>>
>> For now add cond_resched_stall(), which tries to schedule if
>> possible, and failing that executes a cpu_relax().
>
> What's the point of this new variant of cond_resched()? We really do not
> want it at all.
>
>> +int __cond_resched_stall(void)
>> +{
>> +	if (tif_need_resched(RESCHED_eager)) {
>> +		__preempt_schedule();
>
> Under the new model TIF_NEED_RESCHED is going to reschedule if the
> preemption counter goes to zero.

Yes agreed. cond_resched_stall() was just meant to be window dressing.

> So the typical
>
>    while (readl(mmio) & BUSY)
>    	cpu_relax();
>
> will just be preempted like any other loop, no?

Yeah. But drivers could be doing that right now as well. I suspect
people don't like the idea of spinning in a loop, and that's why they
use cond_resched(). Which, in loops like this, is pretty much:

     while (readl(mmio) & BUSY)
     	   ;

The reason I added cond_resched_stall() was as an analogue to
cond_resched_lock() etc., as a way of explicitly giving up the CPU.

Though, someone pointed out a much better interface for this sort
of thing: readb_poll_timeout(). Not all, but a fair number of sites
could be converted to it.

Ankur
  

Patch

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6ba4371761c4..199f8f7211f2 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2100,6 +2100,7 @@  static inline int _cond_resched(void) { return 0; }
 extern int __cond_resched_lock(spinlock_t *lock);
 extern int __cond_resched_rwlock_read(rwlock_t *lock);
 extern int __cond_resched_rwlock_write(rwlock_t *lock);
+extern int __cond_resched_stall(void);
 
 #define MIGHT_RESCHED_RCU_SHIFT		8
 #define MIGHT_RESCHED_PREEMPT_MASK	((1U << MIGHT_RESCHED_RCU_SHIFT) - 1)
@@ -2135,6 +2136,11 @@  extern int __cond_resched_rwlock_write(rwlock_t *lock);
 	__cond_resched_rwlock_write(lock);					\
 })
 
+#define cond_resched_stall() ({					\
+	__might_resched(__FILE__, __LINE__, 0);			\
+	__cond_resched_stall();					\
+})
+
 static inline void cond_resched_rcu(void)
 {
 #if defined(CONFIG_DEBUG_ATOMIC_SLEEP) || !defined(CONFIG_PREEMPT_RCU)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e1b0759ed3ab..ea00e8489ebb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8652,6 +8652,18 @@  int __cond_resched_rwlock_write(rwlock_t *lock)
 }
 EXPORT_SYMBOL(__cond_resched_rwlock_write);
 
+int __cond_resched_stall(void)
+{
+	if (tif_need_resched(RESCHED_eager)) {
+		__preempt_schedule();
+		return 1;
+	} else {
+		cpu_relax();
+		return 0;
+	}
+}
+EXPORT_SYMBOL(__cond_resched_stall);
+
 /**
  * yield - yield the current processor to other threads.
  *