[v2] sched/core: introduce sched_core_idle_cpu()
Commit Message
As core scheduling introduced, a new state of idle is defined as
force idle, running idle task but nr_running greater than zero.
If a cpu is in force idle state, idle_cpu() will return zero. This
result makes sense in certain scenarios, e.g., load balance. but
this will cause error in other scenarios, e.g., tick_irq_exit()
will not change ts->idle_active back to 1 because idle_cpu() returns
0 when force idle.
To solve this problem, we introduce sched_core_idle_cpu(), which
returns 1 when force idle. We audit all users of idle_cpu(), and
change idle_cpu() into sched_core_idle_cpu() in the following
functions:
- showacpu()
- rcu_is_rcuc_kthread_starving()
- tick_irq_exit()
v1-->v2: replace idle_cpu() with sched_core_idle_cpu() in function
showacpu() and rcu_is_rcuc_kthread_starving()
Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
---
drivers/tty/sysrq.c | 2 +-
include/linux/sched.h | 2 ++
kernel/rcu/tree_stall.h | 2 +-
kernel/sched/core.c | 13 +++++++++++++
kernel/softirq.c | 2 +-
5 files changed, 18 insertions(+), 3 deletions(-)
Comments
Le Sun, Jun 25, 2023 at 02:28:15AM +0800, Cruz Zhao a écrit :
> As core scheduling introduced, a new state of idle is defined as
> force idle, running idle task but nr_running greater than zero.
>
> If a cpu is in force idle state, idle_cpu() will return zero. This
> result makes sense in certain scenarios, e.g., load balance. but
> this will cause error in other scenarios, e.g., tick_irq_exit()
> will not change ts->idle_active back to 1 because idle_cpu() returns
> 0 when force idle.
>
> To solve this problem, we introduce sched_core_idle_cpu(), which
> returns 1 when force idle. We audit all users of idle_cpu(), and
> change idle_cpu() into sched_core_idle_cpu() in the following
> functions:
> - showacpu()
> - rcu_is_rcuc_kthread_starving()
> - tick_irq_exit()
>
> v1-->v2: replace idle_cpu() with sched_core_idle_cpu() in function
> showacpu() and rcu_is_rcuc_kthread_starving()
>
> Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
> ---
> drivers/tty/sysrq.c | 2 +-
> include/linux/sched.h | 2 ++
> kernel/rcu/tree_stall.h | 2 +-
> kernel/sched/core.c | 13 +++++++++++++
> kernel/softirq.c | 2 +-
> 5 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
> index b6e70c5cfa17..8a6586800385 100644
> --- a/drivers/tty/sysrq.c
> +++ b/drivers/tty/sysrq.c
> @@ -232,7 +232,7 @@ static void showacpu(void *dummy)
> unsigned long flags;
>
> /* Idle CPUs have no interesting backtrace. */
> - if (idle_cpu(smp_processor_id())) {
> + if (sched_core_idle_cpu(smp_processor_id())) {
> pr_info("CPU%d: backtrace skipped as idling\n", smp_processor_id());
Actually perhaps an idle injection's backtrace is worth dumping. I guess
it might accidentally produce lockups and it's worth knowing the source then.
Though I don't have a strong opinion on that...
> return;
> }
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index b10b8349bb2a..6169faf30ecd 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -418,7 +418,7 @@ static bool rcu_is_rcuc_kthread_starving(struct rcu_data *rdp, unsigned long *jp
> return false;
>
> cpu = task_cpu(rcuc);
> - if (cpu_is_offline(cpu) || idle_cpu(cpu))
> + if (cpu_is_offline(cpu) || sched_core_idle_cpu(cpu))
An idle injection could possibly starve the RCU boost kthread, and then it's
worth knowing about it. I would suggest keeping idle_cpu() here.
> return false;
>
> j = jiffies - READ_ONCE(rdp->rcuc_activity);
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index c8a6913c067d..98b98991ce45 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -630,7 +630,7 @@ static inline void tick_irq_exit(void)
> int cpu = smp_processor_id();
>
> /* Make sure that timer wheel updates are propagated */
> - if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
> + if ((sched_core_idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
That one looks good.
Thanks!
On 6/25/23 17:28, Frederic Weisbecker wrote:
>> drivers/tty/sysrq.c | 2 +-
>> include/linux/sched.h | 2 ++
>> kernel/rcu/tree_stall.h | 2 +-
>> kernel/sched/core.c | 13 +++++++++++++
>> kernel/softirq.c | 2 +-
>> 5 files changed, 18 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
>> index b6e70c5cfa17..8a6586800385 100644
>> --- a/drivers/tty/sysrq.c
>> +++ b/drivers/tty/sysrq.c
>> @@ -232,7 +232,7 @@ static void showacpu(void *dummy)
>> unsigned long flags;
>>
>> /* Idle CPUs have no interesting backtrace. */
>> - if (idle_cpu(smp_processor_id())) {
>> + if (sched_core_idle_cpu(smp_processor_id())) {
>> pr_info("CPU%d: backtrace skipped as idling\n", smp_processor_id());
> Actually perhaps an idle injection's backtrace is worth dumping. I guess
> it might accidentally produce lockups and it's worth knowing the source then.
>
> Though I don't have a strong opinion on that...
>
>> return;
>> }
>> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
>> index b10b8349bb2a..6169faf30ecd 100644
>> --- a/kernel/rcu/tree_stall.h
>> +++ b/kernel/rcu/tree_stall.h
>> @@ -418,7 +418,7 @@ static bool rcu_is_rcuc_kthread_starving(struct rcu_data *rdp, unsigned long *jp
>> return false;
>>
>> cpu = task_cpu(rcuc);
>> - if (cpu_is_offline(cpu) || idle_cpu(cpu))
>> + if (cpu_is_offline(cpu) || sched_core_idle_cpu(cpu))
> An idle injection could possibly starve the RCU boost kthread, and then it's
> worth knowing about it. I would suggest keeping idle_cpu() here.
>
Actually I think it should just be idle_cpu() for rcu_is_rcuc_kthread_starving()
and showacpu() because "force idling" is different from "idling".
Force idling happens because there is something incompatible on the sibling
runqueue in the core. That just makes the 2 runqueues on the core appear to be a
single runqueue. The concept of "force idling" is more closer to "preemption of
tasks on a single runqueue".
Considering that, I would vote for only converting the tick user. If force
idling happens for too long, it'd be good to know that as Frederic also mentioned.
thanks,
- Joel
@@ -232,7 +232,7 @@ static void showacpu(void *dummy)
unsigned long flags;
/* Idle CPUs have no interesting backtrace. */
- if (idle_cpu(smp_processor_id())) {
+ if (sched_core_idle_cpu(smp_processor_id())) {
pr_info("CPU%d: backtrace skipped as idling\n", smp_processor_id());
return;
}
@@ -2430,9 +2430,11 @@ extern void sched_core_free(struct task_struct *tsk);
extern void sched_core_fork(struct task_struct *p);
extern int sched_core_share_pid(unsigned int cmd, pid_t pid, enum pid_type type,
unsigned long uaddr);
+extern int sched_core_idle_cpu(int cpu);
#else
static inline void sched_core_free(struct task_struct *tsk) { }
static inline void sched_core_fork(struct task_struct *p) { }
+static inline int sched_core_idle_cpu(int cpu) { return idle_cpu(cpu); }
#endif
extern void sched_set_stop_task(int cpu, struct task_struct *stop);
@@ -418,7 +418,7 @@ static bool rcu_is_rcuc_kthread_starving(struct rcu_data *rdp, unsigned long *jp
return false;
cpu = task_cpu(rcuc);
- if (cpu_is_offline(cpu) || idle_cpu(cpu))
+ if (cpu_is_offline(cpu) || sched_core_idle_cpu(cpu))
return false;
j = jiffies - READ_ONCE(rdp->rcuc_activity);
@@ -7286,6 +7286,19 @@ struct task_struct *idle_task(int cpu)
return cpu_rq(cpu)->idle;
}
+#ifdef CONFIG_SCHED_CORE
+int sched_core_idle_cpu(int cpu)
+{
+ struct rq *rq = cpu_rq(cpu);
+
+ if (sched_core_enabled(rq) && rq->curr == rq->idle)
+ return 1;
+
+ return idle_cpu(cpu);
+}
+
+#endif
+
#ifdef CONFIG_SMP
/*
* This function computes an effective utilization for the given CPU, to be
@@ -630,7 +630,7 @@ static inline void tick_irq_exit(void)
int cpu = smp_processor_id();
/* Make sure that timer wheel updates are propagated */
- if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
+ if ((sched_core_idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
if (!in_hardirq())
tick_nohz_irq_exit();
}