sched/rt: Fix possible warn when push_rt_task

Message ID 20230624092130.174409-1-tanghui20@huawei.com
State New
Series sched/rt: Fix possible warn when push_rt_task

Commit Message

Hui Tang June 24, 2023, 9:21 a.m. UTC
A warning may be triggered during reboot, as follows:

reboot
  ->kernel_restart
    ->machine_restart
      ->smp_send_stop --- ipi handler set_cpu_online(cpu, false)

balance_callback
-> __balance_callback
  ->push_rt_task
    -> find_lock_lowest_rq  --- offline CPU in vec->mask is not cleared
      -> find_lowest_rq
        -> cpupri_find
          -> cpupri_find_fitness
            -> __cpupri_find [cpumask_and(..., vec->mask)]
    -> set_task_cpu(next_task, lowest_rq->cpu) --- WARN_ON(!cpu_online(cpu))

So add a !cpu_online(lowest_rq->cpu) check before set_task_cpu().
This does not completely fix the problem, since the CPU's bit in
cpu_online_mask may still be cleared after the check.

Fixes: 4ff9083b8a9a8 ("sched/core: WARN() when migrating to an offline CPU")
Signed-off-by: Hui Tang <tanghui20@huawei.com>
---
 kernel/sched/rt.c | 3 +++
 1 file changed, 3 insertions(+)
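
The call chain above can be modelled in user space to show why the lookup can return an offline CPU. This is only a sketch and not kernel code: the toy_* names are hypothetical, online_mask stands in for cpu_online_mask, and vec_mask stands in for a cpupri vec->mask that the stop handler never updates.

```c
#include <stdbool.h>

/* Toy model, not kernel code: one bit per CPU, up to 32 CPUs. */
struct toy_sched {
	unsigned int online_mask; /* stands in for cpu_online_mask */
	unsigned int vec_mask;    /* stands in for a cpupri vec->mask */
};

/* Models the IPI stop handler: it clears only the online bit. */
static void toy_stop_cpu(struct toy_sched *s, int cpu)
{
	s->online_mask &= ~(1u << cpu);
	/* vec_mask is deliberately left untouched, so it goes stale. */
}

static bool toy_cpu_online(const struct toy_sched *s, int cpu)
{
	return (s->online_mask >> cpu) & 1u;
}

/* Models the lookup: first CPU in (allowed & vec_mask), or -1 if none. */
static int toy_find_lowest_cpu(const struct toy_sched *s, unsigned int allowed)
{
	unsigned int candidates = allowed & s->vec_mask;

	for (int cpu = 0; cpu < 32; cpu++)
		if (candidates & (1u << cpu))
			return cpu;
	return -1;
}

/* Models the push with the added check: target CPU, or -1 if skipped. */
static int toy_push(const struct toy_sched *s, unsigned int allowed)
{
	int cpu = toy_find_lowest_cpu(s, allowed);

	if (cpu < 0 || !toy_cpu_online(s, cpu))
		return -1;
	return cpu;
}
```

In this model, once toy_stop_cpu() has run for a CPU that is still set in vec_mask, toy_find_lowest_cpu() keeps returning that CPU, and only the online check in toy_push() rejects it.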
  

Comments

Peter Zijlstra July 3, 2023, 12:39 p.m. UTC | #1
On Sat, Jun 24, 2023 at 05:21:30PM +0800, Hui Tang wrote:
> A warning may be triggered during reboot, as follows:
> 
> reboot
>   ->kernel_restart
>     ->machine_restart
>       ->smp_send_stop --- ipi handler set_cpu_online(cpu, false)
> 
> balance_callback
> -> __balance_callback
>   ->push_rt_task
>     -> find_lock_lowest_rq  --- offline CPU in vec->mask is not cleared
>       -> find_lowest_rq
>         -> cpupri_find
>           -> cpupri_find_fitness
>             -> __cpupri_find [cpumask_and(..., vec->mask)]
>     -> set_task_cpu(next_task, lowest_rq->cpu) --- WARN_ON(!cpu_online(cpu))
> 
> So add a !cpu_online(lowest_rq->cpu) check before set_task_cpu().
> This does not completely fix the problem, since the CPU's bit in
> cpu_online_mask may still be cleared after the check.

This is tinkering, at best. I'm sure there's a score of other issues,
not least the very same issue in deadline.c. But since this
doesn't actually fix anything, this clearly isn't the right way.

> Fixes: 4ff9083b8a9a8 ("sched/core: WARN() when migrating to an offline CPU")
> Signed-off-by: Hui Tang <tanghui20@huawei.com>
> ---
>  kernel/sched/rt.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 00e0e5074115..852ef18b6a50 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -2159,6 +2159,9 @@ static int push_rt_task(struct rq *rq, bool pull)
>  		goto retry;
>  	}
>  
> +	if (unlikely(!cpu_online(lowest_rq->cpu)))
> +		goto out;
> +
>  	deactivate_task(rq, next_task, 0);
>  	set_task_cpu(next_task, lowest_rq->cpu);
>  	activate_task(lowest_rq, next_task, 0);
> -- 
> 2.17.1
>
  

Patch

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 00e0e5074115..852ef18b6a50 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2159,6 +2159,9 @@ static int push_rt_task(struct rq *rq, bool pull)
 		goto retry;
 	}
 
+	if (unlikely(!cpu_online(lowest_rq->cpu)))
+		goto out;
+
 	deactivate_task(rq, next_task, 0);
 	set_task_cpu(next_task, lowest_rq->cpu);
 	activate_task(lowest_rq, next_task, 0);
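
The commit message notes that the online bit may still be cleared after the new check. A minimal user-space sketch of that check-then-act window follows. It is not kernel code: the race_* names are hypothetical, and the interleave callback is an assumption used to simulate another CPU running the stop handler between the check and the migration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not kernel code: bit N set means CPU N is online. */
static unsigned int race_online_mask = 0x3; /* CPUs 0 and 1 */

enum race_result {
	RACE_SKIPPED,          /* check saw the target offline, push abandoned */
	RACE_MIGRATED_ONLINE,  /* target still online at migration time */
	RACE_MIGRATED_OFFLINE, /* target went offline after the check */
};

static bool race_cpu_online(int cpu)
{
	return (race_online_mask >> cpu) & 1u;
}

/* Simulates the stop handler clearing the online bit. */
static void race_set_offline(int cpu)
{
	race_online_mask &= ~(1u << cpu);
}

/*
 * Models the patched path: check, then migrate. The optional
 * interleave hook runs in the gap between the two steps.
 */
static enum race_result race_checked_migrate(int target,
					     void (*interleave)(int))
{
	if (!race_cpu_online(target))
		return RACE_SKIPPED;

	if (interleave)
		interleave(target);

	/* At this point the real code would call set_task_cpu(). */
	return race_cpu_online(target) ? RACE_MIGRATED_ONLINE
				       : RACE_MIGRATED_OFFLINE;
}
```

Passing race_set_offline as the hook yields RACE_MIGRATED_OFFLINE even though the check passed, which is the window the check alone cannot close.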