[v2] workqueue: Fix warning triggered when nr_running is checked in worker_enter_idle()

Message ID 20230523140942.18679-1-qiang.zhang1211@gmail.com
State New
Series [v2] workqueue: Fix warning triggered when nr_running is checked in worker_enter_idle()

Commit Message

Z qiang May 23, 2023, 2:09 p.m. UTC
  Currently, nr_running can be modified from the timer tick, which means
the timer tick can interrupt a non-irq-protected critical section that
modifies nr_running. Consider the following scenario:

CPU0
kworker/0:2 (events)
   worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND);
   ->pool->nr_running++;  (1)

   process_one_work()
   ->worker->current_func(work);
     ->schedule()
       ->wq_worker_sleeping()
         ->worker->sleeping = 1;
         ->pool->nr_running--;  (0)
           ....
       ->wq_worker_running()
             ....
             CPU0 by interrupt:
             wq_worker_tick()
             ->worker_set_flags(worker, WORKER_CPU_INTENSIVE);
               ->pool->nr_running--;  (-1)
                ->worker->flags |= WORKER_CPU_INTENSIVE;
             ....
         ->if (!(worker->flags & WORKER_NOT_RUNNING))
           ->pool->nr_running++;    (will not execute)
         ->worker->sleeping = 0;
           ....
    ->worker_clr_flags(worker, WORKER_CPU_INTENSIVE);
      ->pool->nr_running++;  (0)
    ....
    worker_set_flags(worker, WORKER_PREP);
    ->pool->nr_running--;   (-1)
    ....
    worker_enter_idle()
    ->WARN_ON_ONCE(pool->nr_workers == pool->nr_idle && pool->nr_running);

If nr_workers is equal to nr_idle at this point, the non-zero nr_running
triggers the WARN_ON_ONCE():

[    2.460602] WARNING: CPU: 0 PID: 63 at kernel/workqueue.c:1999 worker_enter_idle+0xb2/0xc0
[    2.462163] Modules linked in:
[    2.463401] CPU: 0 PID: 63 Comm: kworker/0:2 Not tainted 6.4.0-rc2-next-20230519 #1
[    2.463771] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
[    2.465127] Workqueue:  0x0 (events)
[    2.465678] RIP: 0010:worker_enter_idle+0xb2/0xc0
...
[    2.472614] Call Trace:
[    2.473152]  <TASK>
[    2.474182]  worker_thread+0x71/0x430
[    2.474992]  ? _raw_spin_unlock_irqrestore+0x28/0x50
[    2.475263]  kthread+0x103/0x120
[    2.475493]  ? __pfx_worker_thread+0x10/0x10
[    2.476355]  ? __pfx_kthread+0x10/0x10
[    2.476635]  ret_from_fork+0x2c/0x50
[    2.477051]  </TASK>

This commit therefore adds a check of worker->sleeping in
wq_worker_tick(): if worker->sleeping is non-zero, return directly.

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Closes: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230519/testrun/17078554/suite/boot/test/clang-nightly-lkftconfig/log
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
---
 kernel/workqueue.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
  

Comments

Tejun Heo May 23, 2023, 7:40 p.m. UTC | #1
On Tue, May 23, 2023 at 10:09:41PM +0800, Zqiang wrote:
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 9c5c1cfa478f..329b84c42062 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1144,13 +1144,12 @@ void wq_worker_tick(struct task_struct *task)
>  	 * longer than wq_cpu_intensive_thresh_us, it's automatically marked
>  	 * CPU_INTENSIVE to avoid stalling other concurrency-managed work items.
>  	 */
> -	if ((worker->flags & WORKER_NOT_RUNNING) ||
> +	if ((worker->flags & WORKER_NOT_RUNNING) || worker->sleeping ||
>  	    worker->task->se.sum_exec_runtime - worker->current_at <
>  	    wq_cpu_intensive_thresh_us * NSEC_PER_USEC)
>  		return;

Ah, right, this isn't just interrupted read-modify-write. It has to consider
sleeping. This is subtle. We'll definitely need more comments. Will think
more about it.

Thanks.
  
Tejun Heo May 23, 2023, 7:48 p.m. UTC | #2
Hello,

On Tue, May 23, 2023 at 09:40:16AM -1000, Tejun Heo wrote:
> On Tue, May 23, 2023 at 10:09:41PM +0800, Zqiang wrote:
> > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > index 9c5c1cfa478f..329b84c42062 100644
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -1144,13 +1144,12 @@ void wq_worker_tick(struct task_struct *task)
> >  	 * longer than wq_cpu_intensive_thresh_us, it's automatically marked
> >  	 * CPU_INTENSIVE to avoid stalling other concurrency-managed work items.
> >  	 */
> > -	if ((worker->flags & WORKER_NOT_RUNNING) ||
> > +	if ((worker->flags & WORKER_NOT_RUNNING) || worker->sleeping ||
> >  	    worker->task->se.sum_exec_runtime - worker->current_at <
> >  	    wq_cpu_intensive_thresh_us * NSEC_PER_USEC)
> >  		return;
> 
> Ah, right, this isn't just interrupted read-modify-write. It has to consider
> sleeping. This is subtle. We'll definitely need more comments. Will think
> more about it.

So, there already are enough barriers to make this safe but that's kinda
brittle because e.g. it'd depend on the barrier in preempt_disable() which
is there for an unrelated reason. Can you please change ->sleeping accesses
to use WRITE/READ_ONCE() and explain in wq_worker_tick() that the worker
doesn't contribute to ->nr_running while ->sleeping regardless of
NOT_RUNNING and thus the operation shouldn't proceed? We probably need to
make it prettier but I think that should do for now.
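A hedged sketch of what the requested follow-up might look like (hypothetical, not the actual resend; the comment wording and hunk placement are assumptions):

```diff
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ void wq_worker_tick(struct task_struct *task)
+	/*
+	 * While ->sleeping, wq_worker_sleeping() has already dropped
+	 * nr_running, so the worker doesn't contribute to nr_running
+	 * regardless of NOT_RUNNING.  Setting CPU_INTENSIVE here would
+	 * decrement nr_running a second time; bail out instead.
+	 */
-	if ((worker->flags & WORKER_NOT_RUNNING) || worker->sleeping ||
+	if ((worker->flags & WORKER_NOT_RUNNING) ||
+	    READ_ONCE(worker->sleeping) ||
 	    worker->task->se.sum_exec_runtime - worker->current_at <
 	    wq_cpu_intensive_thresh_us * NSEC_PER_USEC)
 		return;
```

with the stores in wq_worker_sleeping()/wq_worker_running() correspondingly converted to WRITE_ONCE(worker->sleeping, 1) and WRITE_ONCE(worker->sleeping, 0).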

Thanks.
  
Z qiang May 24, 2023, 3:39 a.m. UTC | #3
>
> Hello,
>
> On Tue, May 23, 2023 at 09:40:16AM -1000, Tejun Heo wrote:
> > On Tue, May 23, 2023 at 10:09:41PM +0800, Zqiang wrote:
> > > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > > index 9c5c1cfa478f..329b84c42062 100644
> > > --- a/kernel/workqueue.c
> > > +++ b/kernel/workqueue.c
> > > @@ -1144,13 +1144,12 @@ void wq_worker_tick(struct task_struct *task)
> > >      * longer than wq_cpu_intensive_thresh_us, it's automatically marked
> > >      * CPU_INTENSIVE to avoid stalling other concurrency-managed work items.
> > >      */
> > > -   if ((worker->flags & WORKER_NOT_RUNNING) ||
> > > +   if ((worker->flags & WORKER_NOT_RUNNING) || worker->sleeping ||
> > >         worker->task->se.sum_exec_runtime - worker->current_at <
> > >         wq_cpu_intensive_thresh_us * NSEC_PER_USEC)
> > >             return;
> >
> > Ah, right, this isn't just interrupted read-modify-write. It has to consider
> > sleeping. This is subtle. We'll definitely need more comments. Will think
> > more about it.
>
> So, there already are enough barriers to make this safe but that's kinda
> brittle because e.g. it'd depend on the barrier in preempt_disable() which
> is there for an unrelated reason. Can you please change ->sleeping accesses
> to use WRITE/READ_ONCE() and explain in wq_worker_tick() that the worker
> doesn't contribute to ->nr_running while ->sleeping regardless of
> NOT_RUNNING and thus the operation shouldn't proceed? We probably need to
> make it prettier but I think that should do for now.

Thanks for the suggestion, I will resend.

>
> Thanks.
>
> --
> tejun
  

Patch

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9c5c1cfa478f..329b84c42062 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1144,13 +1144,12 @@  void wq_worker_tick(struct task_struct *task)
 	 * longer than wq_cpu_intensive_thresh_us, it's automatically marked
 	 * CPU_INTENSIVE to avoid stalling other concurrency-managed work items.
 	 */
-	if ((worker->flags & WORKER_NOT_RUNNING) ||
+	if ((worker->flags & WORKER_NOT_RUNNING) || worker->sleeping ||
 	    worker->task->se.sum_exec_runtime - worker->current_at <
 	    wq_cpu_intensive_thresh_us * NSEC_PER_USEC)
 		return;
 
 	raw_spin_lock(&pool->lock);
-
 	worker_set_flags(worker, WORKER_CPU_INTENSIVE);
 	wq_cpu_intensive_report(worker->current_func);
 	pwq->stats[PWQ_STAT_CPU_INTENSIVE]++;