[v6,0/4] workqueue: destroy_worker() vs isolated CPUs

Message ID

20221128183109.446754-1-vschneid@redhat.com

Headers

Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
From: Valentin Schneider <vschneid@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: Tejun Heo <tj@kernel.org>, Lai Jiangshan <jiangshanlai@gmail.com>,
        Peter Zijlstra <peterz@infradead.org>,
        Frederic Weisbecker <frederic@kernel.org>,
        Juri Lelli <juri.lelli@redhat.com>,
        Phil Auld <pauld@redhat.com>,
        Marcelo Tosatti <mtosatti@redhat.com>
Subject: [PATCH v6 0/4] workqueue: destroy_worker() vs isolated CPUs
Date: Mon, 28 Nov 2022 18:31:05 +0000
Message-Id: <20221128183109.446754-1-vschneid@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

workqueue: destroy_worker() vs isolated CPUs
  [v6,0/4] workqueue: destroy_worker() vs isolated CPUs
  [v6,1/4] workqueue: Protects wq_unbound_cpumask with wq_pool_attach_mutex
  [v6,2/4] workqueue: Factorize unbind/rebind_workers() logic
  [v6,3/4] workqueue: Convert the idle_timer to a timer + work_struct
  [v6,4/4] workqueue: Unbind kworkers before sending them to exit()

Message

Valentin Schneider Nov. 28, 2022, 6:31 p.m. UTC

  Hi folks,

This revision is mostly about getting work out of the timer callback and
into the new idle worker culling work item.

Revisions
=========

v5 -> v6
++++++++

o Rebase onto v6.1-rc7
o Get rid of worker_pool.idle_cull_list; only do a minimal amount of work in
  the timer callback (Tejun)
o Dropped the too_many_workers() -> nr_workers_to_cull() change

v4 -> v5
++++++++

o Rebase onto v6.1-rc6

o Overall renaming from "reaping" to "culling"
  I somehow convinced myself this was more appropriate
  
o Split the dwork into timer callback + work item (Tejun)

  I didn't want to have redundant operations happen in the timer callback and in
  the work item, so I made the timer callback detect which workers are "ripe"
  enough and then toss them to a worker for removal.

  This however means we release the pool->lock before getting to actually doing
  anything to those idle workers, which means they can wake up in the meantime.
  The new worker_pool.idle_cull_list is there for that reason.

  The alternative was to have the timer callback detect if any worker was ripe
  enough, kick the work item if so, and have the work item do the same thing
  again, which I didn't like.
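
  For illustration, the "detect in the timer, re-check in the work item"
  alternative described above can be modelled in plain userspace C. This is
  only a sketch under stated assumptions: the model_* names, IDLE_TIMEOUT and
  the array-based pool are hypothetical and are not taken from
  kernel/workqueue.c, and no locking is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORKERS  8
#define IDLE_TIMEOUT 300UL  /* hypothetical timeout, in arbitrary ticks */

/* Hypothetical stand-ins for struct worker / struct worker_pool. */
struct model_worker {
    unsigned long last_active;
    bool idle;
};

struct model_pool {
    struct model_worker workers[MAX_WORKERS];
    size_t nr_workers;
    bool cull_work_pending;
};

/* A worker is "ripe" once it has been idle for at least IDLE_TIMEOUT. */
static bool worker_is_ripe(const struct model_worker *w, unsigned long now)
{
    return w->idle && now - w->last_active >= IDLE_TIMEOUT;
}

/* Timer callback: only detect whether anything is ripe, then kick the work. */
static void model_idle_timer_fn(struct model_pool *pool, unsigned long now)
{
    assert(pool->nr_workers <= MAX_WORKERS);
    for (size_t i = 0; i < pool->nr_workers; i++) {
        if (worker_is_ripe(&pool->workers[i], now)) {
            pool->cull_work_pending = true;
            return;
        }
    }
}

/*
 * Work item: repeat the ripeness check, because a worker may have woken up
 * between the timer firing and the work item running. Returns the number of
 * workers removed from the pool.
 */
static size_t model_idle_cull_fn(struct model_pool *pool, unsigned long now)
{
    size_t culled = 0, kept = 0;

    assert(pool->nr_workers <= MAX_WORKERS);
    for (size_t i = 0; i < pool->nr_workers; i++) {
        if (worker_is_ripe(&pool->workers[i], now))
            culled++;
        else
            pool->workers[kept++] = pool->workers[i];
    }
    pool->nr_workers = kept;
    pool->cull_work_pending = false;
    return culled;
}
```

  The cost of this design is the duplicated scan; what it buys is that no
  list of pre-selected workers has to survive across the unlocked window.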

RFCv3 -> v4
+++++++++++

o Rebase onto v6.0
o Split into more patches for reviewability
o Take dying workers out of the pool->workers as suggested by Lai

RFCv2 -> RFCv3
++++++++++++++

o Rebase onto v5.19
o Add new patch (1/3) around accessing wq_unbound_cpumask

o Prevent WORKER_DIE workers from kfree()'ing themselves before the idle reaper
  gets to handle them (Tejun)

  Bit of an aside on that: I've been struggling to convince myself this can
  happen due to spurious wakeups and would like some help here.

  Idle workers are TASK_UNINTERRUPTIBLE, so they can't be woken up by
  signals. That state is set *under* pool->lock, and all wakeups (before this
  patch) are also done while holding pool->lock.
  
  wake_up_worker() is done under pool->lock AND only wakes a worker on the
  pool->idle_list. Thus the to-be-woken worker *cannot* have WORKER_DIE, though
  it could gain it *after* being woken but *before* it runs, e.g.:
                          
  LOCK pool->lock
  wake_up_worker(pool)
      wake_up_process(p)
  UNLOCK pool->lock
                          idle_reaper_fn()
                            LOCK pool->lock
                            destroy_worker(worker, list);
                            UNLOCK pool->lock
                                                    worker_thread()
                                                      goto woke_up;
                                                      LOCK pool->lock
                                                      READ worker->flags & WORKER_DIE
                                                          UNLOCK pool->lock
                                                          ...
                                                          kfree(worker);
                            reap_worker(worker);
                                // Uh-oh
			  
  ... But IMO that's not a spurious wakeup, that's a concurrency issue. I don't
  see any spurious/unexpected worker wakeup happening once a worker is off the
  pool->idle_list.
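
  To make the ordering problem concrete, here is a single-threaded userspace
  replay of the interleaving above. It is a sketch, not kernel code: the
  model_* names, the reaper_done field and the "freed" flag (which stands in
  for kfree(worker)) are all hypothetical. With wait_for_reaper set, the
  dying worker does not free itself until the reaper has handled it, which is
  the behaviour the changelog entry above asks for.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_WORKER_DIE 0x1u  /* stand-in for the kernel's WORKER_DIE flag */

struct model_worker {
    unsigned int flags;
    bool reaper_done;  /* hypothetical: set once the reaper has handled us */
    bool freed;        /* models kfree(worker); no memory is really freed */
};

/* destroy_worker() step: mark the worker as dying. */
static void model_destroy_worker(struct model_worker *w)
{
    w->flags |= MODEL_WORKER_DIE;
}

/*
 * worker_thread() step after a wakeup: a dying worker frees itself, unless
 * the guard is enabled and the reaper has not handled it yet.
 */
static void model_worker_wakes(struct model_worker *w, bool wait_for_reaper)
{
    if (!(w->flags & MODEL_WORKER_DIE))
        return;
    if (wait_for_reaper && !w->reaper_done)
        return;
    w->freed = true;
}

/* reap_worker() step: returns false if it would touch a freed worker. */
static bool model_reap_worker(struct model_worker *w)
{
    if (w->freed)
        return false;  /* the "Uh-oh" case */
    w->reaper_done = true;
    return true;
}

/* Replay: destroy, worker runs, reaper runs, worker runs once more. */
static bool model_replay(bool wait_for_reaper)
{
    struct model_worker w = { 0 };
    bool reap_ok;

    model_destroy_worker(&w);
    model_worker_wakes(&w, wait_for_reaper);
    reap_ok = model_reap_worker(&w);
    model_worker_wakes(&w, wait_for_reaper);
    assert(w.freed);  /* the worker is eventually freed in both variants */
    return reap_ok;
}
```

  model_replay(false) reports the use-after-free, while model_replay(true)
  lets the reaper finish first; either way the worker ends up freed.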
  

RFCv1 -> RFCv2
++++++++++++++

o Change the pool->timer into a delayed_work to have a sleepable context for
  unbinding kworkers

Cheers,
Valentin

Lai Jiangshan (1):
  workqueue: Protects wq_unbound_cpumask with wq_pool_attach_mutex

Valentin Schneider (3):
  workqueue: Factorize unbind/rebind_workers() logic
  workqueue: Convert the idle_timer to a timer + work_struct
  workqueue: Unbind kworkers before sending them to exit()

 kernel/workqueue.c | 195 +++++++++++++++++++++++++++++++++------------
 1 file changed, 143 insertions(+), 52 deletions(-)

--
2.31.1

Comments

Tejun Heo Nov. 30, 2022, 9:06 p.m. UTC | #1

Hello,

So, this generally looks great to me. Lai, what do you think?

Thanks.

Lai Jiangshan Dec. 1, 2022, 3:05 a.m. UTC | #2

On Thu, Dec 1, 2022 at 5:06 AM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> So, this generally looks great to me. Lai, what do you think?
>
> Thanks.
>

Hello,

It looks great to me too (except for a defect in patch 4).

Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>

Thanks
Lai