sched/rt: Use cpu_active_mask to prevent rto_push_irq_work's dead loop

Message ID: 20221114120453.3233-1-xuewen.yan@unisoc.com
State: New
Series: sched/rt: Use cpu_active_mask to prevent rto_push_irq_work's dead loop

Commit Message

Xuewen Yan Nov. 14, 2022, 12:04 p.m. UTC
  We were running a stress test related to CPU hotplug. When only two
CPUs are left in the system (for example, cpu0 and cpu1) and cpu1 goes
offline, the following infinite loop can occur:

When cpu0 has more than one RT task queued, it pushes the extra RT
tasks to another CPU. If cpu1 is going through the hotplug process at
that time, a stop_machine instance is started that is responsible for
migrating the tasks on cpu1 to the other online CPUs, and for
preventing any task from migrating onto cpu1. As a result, whenever
cpu0 pushes an RT task to cpu1, the stop_machine migrates it straight
back to cpu0, so the rt.pushable_tasks list of rq0 never becomes
empty. The net effect is that these RT tasks are migrated back and
forth between cpu0 and cpu1 forever.

To prevent this situation, use cpu_active_mask to keep find_lowest_rq()
from selecting an inactive CPU. In addition, when there is only one
active CPU left in the system, rto_push_irq_work should terminate
early.

Co-developed-by: Ke Wang <ke.wang@unisoc.com>
Signed-off-by: Ke Wang <ke.wang@unisoc.com>
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
---
 kernel/sched/cpupri.c | 1 +
 kernel/sched/rt.c     | 6 ++++++
 2 files changed, 7 insertions(+)
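
As a toy illustration of the cpupri.c change (plain userspace C, with
bitmasks standing in for struct cpumask; the mask values are made up to
match the two-CPU scenario above, where cpu1 is online but no longer
active):

	#include <stdio.h>

	int main(void)
	{
		unsigned int cpus_mask   = 0x3;	/* task may run on cpu0 and cpu1 */
		unsigned int vec_mask    = 0x2;	/* cpupri: cpu1 runs the lowest prio */
		unsigned int active_mask = 0x1;	/* cpu1 is being unplugged */

		unsigned int lowest = cpus_mask & vec_mask;
		printf("without the fix: lowest_mask = 0x%x (picks inactive cpu1)\n",
		       lowest);

		/* The extra AND from the patch: inactive CPUs drop out */
		lowest &= active_mask;
		printf("with the fix:    lowest_mask = 0x%x (no candidate, no push)\n",
		       lowest);

		return 0;
	}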
  

Comments

Steven Rostedt Nov. 17, 2022, 10 p.m. UTC | #1
On Mon, 14 Nov 2022 20:04:53 +0800
Xuewen Yan <xuewen.yan@unisoc.com> wrote:

> +++ b/kernel/sched/rt.c
> @@ -2219,6 +2219,7 @@ static int rto_next_cpu(struct root_domain *rd)
>  {
>  	int next;
>  	int cpu;
> +	struct cpumask tmp_cpumask;

If you have a machine with thousands of CPUs, this will likely kill the
stack.

>  
>  	/*
>  	 * When starting the IPI RT pushing, the rto_cpu is set to -1,
> @@ -2238,6 +2239,11 @@ static int rto_next_cpu(struct root_domain *rd)
>  		/* When rto_cpu is -1 this acts like cpumask_first() */
>  		cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
>  
> +		cpumask_and(&tmp_cpumask, rd->rto_mask, cpu_active_mask);
> +		if (rd->rto_cpu == -1 && cpumask_weight(&tmp_cpumask) == 1 &&
> +		    cpumask_test_cpu(smp_processor_id(), &tmp_cpumask))
> +			break;
> +

Kill the above.

>  		rd->rto_cpu = cpu;
>  
>  		if (cpu < nr_cpu_ids) {

Why not just add here:

			if (!cpumask_test_cpu(cpu, cpu_active_mask))
				continue;
			return cpu;
		}

?

-- Steve
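
For reference, splicing that suggestion into the function quoted above
gives roughly the following (reconstructed from the hunks in this
thread; an untested sketch, not a submitted patch):

	static int rto_next_cpu(struct root_domain *rd)
	{
		int next;
		int cpu;

		for (;;) {
			/* When rto_cpu is -1 this acts like cpumask_first() */
			cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);

			rd->rto_cpu = cpu;

			if (cpu < nr_cpu_ids) {
				/* Skip CPUs that can no longer take pushed tasks */
				if (!cpumask_test_cpu(cpu, cpu_active_mask))
					continue;
				return cpu;
			}

			rd->rto_cpu = -1;

			/*
			 * ACQUIRE ensures we see the @rto_mask changes
			 * made prior to the @next value observed.
			 */
			next = atomic_read_acquire(&rd->rto_loop_next);

			if (rd->rto_loop == next)
				break;

			rd->rto_loop = next;
		}

		return -1;
	}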
  
Xuewen Yan Nov. 18, 2022, 12:08 p.m. UTC | #2
On Fri, Nov 18, 2022 at 6:16 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Mon, 14 Nov 2022 20:04:53 +0800
> Xuewen Yan <xuewen.yan@unisoc.com> wrote:
>
> > +++ b/kernel/sched/rt.c
> > @@ -2219,6 +2219,7 @@ static int rto_next_cpu(struct root_domain *rd)
> >  {
> >       int next;
> >       int cpu;
> > +     struct cpumask tmp_cpumask;
>
> If you have a machine with thousands of CPUs, this will likely kill the
> stack.
Ha, I did not take it into account. Thanks!

>
> >
> >       /*
> >        * When starting the IPI RT pushing, the rto_cpu is set to -1,
> > @@ -2238,6 +2239,11 @@ static int rto_next_cpu(struct root_domain *rd)
> >               /* When rto_cpu is -1 this acts like cpumask_first() */
> >               cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
> >
> > +             cpumask_and(&tmp_cpumask, rd->rto_mask, cpu_active_mask);
> > +             if (rd->rto_cpu == -1 && cpumask_weight(&tmp_cpumask) == 1 &&
> > +                 cpumask_test_cpu(smp_processor_id(), &tmp_cpumask))
> > +                     break;
> > +
>
> Kill the above.
>
> >               rd->rto_cpu = cpu;
> >
> >               if (cpu < nr_cpu_ids) {
>
> Why not just add here:
>
>                         if (!cpumask_test_cpu(cpu, cpu_active_mask))
>                                 continue;
>                         return cpu;
>                 }
>
> ?
Let's consider this scenario:
the online cpu mask is 0x03 (cpu0/cpu1), the active cpu mask is
0x01 (cpu0 only), the rto cpu is cpu0, the rto_mask is 0x01, and the
irq_work is running on cpu0. On the first loop iteration, rto_cpu
wraps past nr_cpu_ids and is reset to -1; but because rto_loop is
still behind rto_loop_next, the loop runs again, and with rto_cpu ==
-1 the next rto cpu is cpu0 once more. As a result, cpu0 keeps
pushing RT tasks to cpu1 (an inactive CPU) from inside the irq_work.

So we should check whether the current cpu (the only active cpu) is
the cpu the next loop iteration would pick.

Thanks!

>
> -- Steve
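
As a standalone model of this scenario (plain userspace C, not kernel
code; NR_CPUS and the mask values are made up to match the description
above), the following shows that once rto_cpu wraps to -1 while
rto_loop still lags rto_loop_next, the next pass hands back cpu0, the
very CPU the irq_work is running on:

	#include <stdio.h>

	#define NR_CPUS 2

	/* Minimal stand-in for cpumask_next() over a plain bitmask */
	static int cpumask_next(int prev, unsigned int mask)
	{
		int cpu;

		for (cpu = prev + 1; cpu < NR_CPUS; cpu++)
			if (mask & (1u << cpu))
				return cpu;
		return NR_CPUS;			/* i.e. cpu >= nr_cpu_ids */
	}

	int main(void)
	{
		unsigned int rto_mask = 0x01;	/* only cpu0 is rt overloaded */
		int rto_cpu = 0;		/* the rto cpu is cpu0 */
		int rto_loop = 0, rto_loop_next = 1;

		for (;;) {
			int cpu = cpumask_next(rto_cpu, rto_mask);

			rto_cpu = cpu;
			if (cpu < NR_CPUS) {
				printf("next rto cpu: cpu%d (the irq cpu itself)\n", cpu);
				break;
			}

			rto_cpu = -1;
			if (rto_loop == rto_loop_next)
				break;			/* IPI chain would stop here */
			rto_loop = rto_loop_next;	/* loop lagged: go around again */
		}

		return 0;
	}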
  
Steven Rostedt Nov. 20, 2022, 7:27 p.m. UTC | #3
On Fri, 18 Nov 2022 20:08:54 +0800
Xuewen Yan <xuewen.yan94@gmail.com> wrote:

> Let's consider this scenario:
> the online cpu mask is 0x03 (cpu0/cpu1), the active cpu mask is
> 0x01 (cpu0 only), the rto cpu is cpu0, the rto_mask is 0x01, and the
> irq_work is running on cpu0. On the first loop iteration, rto_cpu
> wraps past nr_cpu_ids and is reset to -1; but because rto_loop is
> still behind rto_loop_next, the loop runs again, and with rto_cpu ==
> -1 the next rto cpu is cpu0 once more. As a result, cpu0 keeps
> pushing RT tasks to cpu1 (an inactive CPU) from inside the irq_work.
> 
> So we should check whether the current cpu (the only active cpu) is
> the cpu the next loop iteration would pick.

Wait, I'm confused by what you are trying to do here.

The rto_next_cpu() only sends an IPI to the next CPU that has more than
one RT task queued on it, where the RT tasks can migrate.

If we send CPU0 an IPI, let CPU0 figure out that it should only push to
the active mask. Why are you trying to prevent sending the IPI to an
active CPU?

Will the first part of your patch, the one that modifies cpupri, not
keep it from pushing to the non-active CPU?

-- Steve
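
For orientation, the IPI chain being discussed looks roughly like this
(a sketch of rto_push_irq_work_func() reconstructed from memory of
kernel/sched/rt.c of that era; untested, and not part of the patch
under discussion):

	void rto_push_irq_work_func(struct irq_work *work)
	{
		struct root_domain *rd =
			container_of(work, struct root_domain, rto_push_work);
		struct rq *rq = this_rq();
		int cpu;

		/* First push any pushable RT tasks off this CPU ... */
		if (has_pushable_tasks(rq)) {
			raw_spin_rq_lock(rq);
			while (push_rt_task(rq, true))
				;
			raw_spin_rq_unlock(rq);
		}

		raw_spin_lock(&rd->rto_lock);

		/* ... then pass the IPI along to the next rt overloaded queue. */
		cpu = rto_next_cpu(rd);

		raw_spin_unlock(&rd->rto_lock);

		if (cpu < 0) {
			sched_put_rd(rd);
			return;
		}

		irq_work_queue_on(&rd->rto_push_work, cpu);
	}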
  
Qais Yousef Nov. 20, 2022, 11:35 p.m. UTC | #4
On 11/17/22 17:00, Steven Rostedt wrote:
> On Mon, 14 Nov 2022 20:04:53 +0800
> Xuewen Yan <xuewen.yan@unisoc.com> wrote:
> 
> > +++ b/kernel/sched/rt.c
> > @@ -2219,6 +2219,7 @@ static int rto_next_cpu(struct root_domain *rd)
> >  {
> >  	int next;
> >  	int cpu;
> > +	struct cpumask tmp_cpumask;
> 
> If you have a machine with thousands of CPUs, this will likely kill the
> stack.
> 
> >  
> >  	/*
> >  	 * When starting the IPI RT pushing, the rto_cpu is set to -1,
> > @@ -2238,6 +2239,11 @@ static int rto_next_cpu(struct root_domain *rd)
> >  		/* When rto_cpu is -1 this acts like cpumask_first() */
> >  		cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
> >  
> > +		cpumask_and(&tmp_cpumask, rd->rto_mask, cpu_active_mask);
> > +		if (rd->rto_cpu == -1 && cpumask_weight(&tmp_cpumask) == 1 &&
> > +		    cpumask_test_cpu(smp_processor_id(), &tmp_cpumask))
> > +			break;
> > +
> 
> Kill the above.
> 
> >  		rd->rto_cpu = cpu;
> >  
> >  		if (cpu < nr_cpu_ids) {
> 
> Why not just add here:
> 
> 			if (!cpumask_test_cpu(cpu, cpu_active_mask))
> 				continue;
> 			return cpu;
> 		}
> 
> ?

Or this?


	diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
	index ed2a47e4ddae..5073e06e34c3 100644
	--- a/kernel/sched/rt.c
	+++ b/kernel/sched/rt.c
	@@ -2236,7 +2236,7 @@ static int rto_next_cpu(struct root_domain *rd)
		for (;;) {

			/* When rto_cpu is -1 this acts like cpumask_first() */
	-               cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
	+               cpu = cpumask_next_and(rd->rto_cpu, rd->rto_mask, cpu_active_mask);

			rd->rto_cpu = cpu;


I think we might be circumventing the root cause though. It looks like
there are only 2 CPUs online in the system, and one of them (cpu1) is
going offline while the other (cpu0) is overloaded, ending with the
tasks ping-ponging between them?

If that's the case, it seems to me the rto state needs to be cleared
for cpu0 so it stops attempting to push tasks. I'd assume that usually
happens, but there's a race that prevents it from realizing this.

Maybe something like this would be better? Assuming I understood the problem
correctly of course.


	diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
	index ed2a47e4ddae..d80072cc2596 100644
	--- a/kernel/sched/rt.c
	+++ b/kernel/sched/rt.c
	@@ -334,6 +334,10 @@ static inline void rt_set_overload(struct rq *rq)
		if (!rq->online)
			return;

	+       /* Overload is meaningless if we shrank to become a UP system */
	+       if (cpumask_weight(cpu_online_mask) == 1)
	+               return;
	+
		cpumask_set_cpu(rq->cpu, rq->rd->rto_mask);
		/*
		 * Make sure the mask is visible before we set

This could be working around the problem too, not sure. But the bug
IMHO is that if we shrink to a UP system, the whole push-pull mechanism
should retire temporarily. I assume this usually happens, but there's
a race somewhere.


Thanks

--
Qais Yousef
  

Patch

diff --git a/kernel/sched/cpupri.c b/kernel/sched/cpupri.c
index a286e726eb4b..42c40cfdf836 100644
--- a/kernel/sched/cpupri.c
+++ b/kernel/sched/cpupri.c
@@ -101,6 +101,7 @@  static inline int __cpupri_find(struct cpupri *cp, struct task_struct *p,
 
 	if (lowest_mask) {
 		cpumask_and(lowest_mask, &p->cpus_mask, vec->mask);
+		cpumask_and(lowest_mask, lowest_mask, cpu_active_mask);
 
 		/*
 		 * We have to ensure that we have at least one bit
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index ed2a47e4ddae..9433696dae50 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2219,6 +2219,7 @@  static int rto_next_cpu(struct root_domain *rd)
 {
 	int next;
 	int cpu;
+	struct cpumask tmp_cpumask;
 
 	/*
 	 * When starting the IPI RT pushing, the rto_cpu is set to -1,
@@ -2238,6 +2239,11 @@  static int rto_next_cpu(struct root_domain *rd)
 		/* When rto_cpu is -1 this acts like cpumask_first() */
 		cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
 
+		cpumask_and(&tmp_cpumask, rd->rto_mask, cpu_active_mask);
+		if (rd->rto_cpu == -1 && cpumask_weight(&tmp_cpumask) == 1 &&
+		    cpumask_test_cpu(smp_processor_id(), &tmp_cpumask))
+			break;
+
 		rd->rto_cpu = cpu;
 
 		if (cpu < nr_cpu_ids)