[16/17] workqueue: Allow cancel_work_sync() and disable_work() from atomic contexts on BH work items
Commit Message
Now that work_grab_pending() can always grab the PENDING bit without
sleeping, the only thing that prevents allowing cancel_work_sync() of a BH
work item from an atomic context is the flushing of the in-flight instance.
When we're flushing a BH work item for cancel_work_sync(), we know that the
work item is not queued and must be executing in a BH context, which means
that it's safe to busy-wait for its completion from a non-hardirq atomic
context.
This patch updates __flush_work() so that it busy-waits when flushing a BH
work item for cancel_work_sync(). might_sleep() is pushed from
start_flush_work() to its callers - when operating on a BH work item,
__cancel_work_sync() now enforces !in_hardirq() instead of might_sleep().
This allows cancel_work_sync() and disable_work() to be called from
non-hardirq atomic contexts on BH work items.
Signed-off-by: Tejun Heo <tj@kernel.org>
---
kernel/workqueue.c | 62 +++++++++++++++++++++++++++++++++++-----------
1 file changed, 47 insertions(+), 15 deletions(-)
Comments
Hello, Tejun
On Sat, Feb 17, 2024 at 2:06 AM Tejun Heo <tj@kernel.org> wrote:
> @@ -4072,7 +4070,32 @@ static bool __flush_work(struct work_struct *work, bool from_cancel)
> if (!pool)
> return false;
>
> - wait_for_completion(&barr.done);
> + if ((pool->flags & POOL_BH) && from_cancel) {
pool pointer might be invalid here, please check POOL_BH before
rcu_read_unlock()
or move rcu_read_unlock() here, or use "*work_data_bits(work) & WORK_OFFQ_BH".
> + /*
> + * We're flushing a BH work item which is being canceled. It
> + * must have been executing during start_flush_work() and can't
> + * currently be queued. If @work is still executing, we know it
> + * is running in the BH context and thus can be busy-waited.
> + *
> + * On RT, prevent a live lock when current preempted soft
> + * interrupt processing or prevents ksoftirqd from running by
> + * keeping flipping BH. If the tasklet runs on a different CPU
> + * then this has no effect other than doing the BH
> + * disable/enable dance for nothing. This is copied from
> + * kernel/softirq.c::tasklet_unlock_spin_wait().
> + */
s/tasklet/BH work/g
Although the comment is copied from kernel/softirq.c, but I can't
envision what the scenario is when the current task
"prevents ksoftirqd from running by keeping flipping BH"
since the @work is still executing or the tasklet is running.
> + while (!try_wait_for_completion(&barr.done)) {
> + if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
> + local_bh_disable();
> + local_bh_enable();
> + } else {
> + cpu_relax();
> + }
> + }
> + } else {
> + wait_for_completion(&barr.done);
> + }
> +
> destroy_work_on_stack(&barr.work);
> return true;
> }
> @@ -4090,6 +4113,7 @@ static bool __flush_work(struct work_struct *work, bool from_cancel)
> */
> bool flush_work(struct work_struct *work)
> {
> + might_sleep();
> return __flush_work(work, false);
> }
> EXPORT_SYMBOL_GPL(flush_work);
> @@ -4179,6 +4203,11 @@ static bool __cancel_work_sync(struct work_struct *work, u32 cflags)
>
> ret = __cancel_work(work, cflags | WORK_CANCEL_DISABLE);
>
> + if (*work_data_bits(work) & WORK_OFFQ_BH)
> + WARN_ON_ONCE(in_hardirq());
When !PREEMPT_RT, this check is sufficient.
But when PREEMP_RT, it should be only in the contexts that allow
local_bh_disable() for synching a BH work, although I'm not sure
what check code is proper.
In PREEMPT_RT, local_bh_disable() is disallowed in not only hardirq
context but also !preemptible() context (I'm not sure about it).
__local_bh_disable_ip() (PREEMP_RT version) doesn't contain
full check except for "WARN_ON_ONCE(in_hardirq())" either.
Since the check is just for debugging, I'm OK with the current check.
Thanks
Lai
Hello,
On Tue, Feb 20, 2024 at 03:33:34PM +0800, Lai Jiangshan wrote:
> Hello, Tejun
>
> On Sat, Feb 17, 2024 at 2:06 AM Tejun Heo <tj@kernel.org> wrote:
>
> > @@ -4072,7 +4070,32 @@ static bool __flush_work(struct work_struct *work, bool from_cancel)
> > if (!pool)
> > return false;
> >
> > - wait_for_completion(&barr.done);
> > + if ((pool->flags & POOL_BH) && from_cancel) {
>
> pool pointer might be invalid here, please check POOL_BH before
> rcu_read_unlock()
Right, it had a local variable caching the test result from inside the
rcu_read_lock() section and I removed it by mistake while splitting patches.
Will fix.
> > + /*
> > + * We're flushing a BH work item which is being canceled. It
> > + * must have been executing during start_flush_work() and can't
> > + * currently be queued. If @work is still executing, we know it
> > + * is running in the BH context and thus can be busy-waited.
> > + *
> > + * On RT, prevent a live lock when current preempted soft
> > + * interrupt processing or prevents ksoftirqd from running by
> > + * keeping flipping BH. If the tasklet runs on a different CPU
> > + * then this has no effect other than doing the BH
> > + * disable/enable dance for nothing. This is copied from
> > + * kernel/softirq.c::tasklet_unlock_spin_wait().
> > + */
>
> s/tasklet/BH work/g
Updated.
> Although the comment is copied from kernel/softirq.c, but I can't
> envision what the scenario is when the current task
> "prevents ksoftirqd from running by keeping flipping BH"
Yeah, that sentence is not the easiest to parse. The following parentheses
might be helpful:
prevent a live lock (when current (preempted soft interrupt processing) or
(prevents ksoftirqd from running)) by keeping flipping BH.
> since the @work is still executing or the tasklet is running.
eb2dafbba8b8 ("tasklets: Prevent tasklet_unlock_spin_wait() deadlock on RT")
is the commit which added the flipping to tasklet_unlock_spin_wait(). My
understanding of RT isn't great but it sounds like BH execution can be
preempted by someone else who does the busy wait which would be sad. IIUC,
it looks like flipping BH off/on makes the busy waiting one yield to BH
processing.
> > @@ -4179,6 +4203,11 @@ static bool __cancel_work_sync(struct work_struct *work, u32 cflags)
> >
> > ret = __cancel_work(work, cflags | WORK_CANCEL_DISABLE);
> >
> > + if (*work_data_bits(work) & WORK_OFFQ_BH)
> > + WARN_ON_ONCE(in_hardirq());
>
> When !PREEMPT_RT, this check is sufficient.
>
> But when PREEMP_RT, it should be only in the contexts that allow
> local_bh_disable() for synching a BH work, although I'm not sure
> what check code is proper.
>
> In PREEMPT_RT, local_bh_disable() is disallowed in not only hardirq
> context but also !preemptible() context (I'm not sure about it).
>
> __local_bh_disable_ip() (PREEMP_RT version) doesn't contain
> full check except for "WARN_ON_ONCE(in_hardirq())" either.
>
> Since the check is just for debugging, I'm OK with the current check.
We should have enough test coverage with !RT kernels. If you can think of a
better notation for RT, please be my guest.
Thanks.
@@ -4004,8 +4004,6 @@ static struct worker_pool *start_flush_work(struct work_struct *work,
struct pool_workqueue *pwq;
struct workqueue_struct *wq;
- might_sleep();
-
pool = get_work_pool(work);
if (!pool)
return NULL;
@@ -4072,7 +4070,32 @@ static bool __flush_work(struct work_struct *work, bool from_cancel)
if (!pool)
return false;
- wait_for_completion(&barr.done);
+ if ((pool->flags & POOL_BH) && from_cancel) {
+ /*
+ * We're flushing a BH work item which is being canceled. It
+ * must have been executing during start_flush_work() and can't
+ * currently be queued. If @work is still executing, we know it
+ * is running in the BH context and thus can be busy-waited.
+ *
+ * On RT, prevent a live lock when current preempted soft
+ * interrupt processing or prevents ksoftirqd from running by
+ * keeping flipping BH. If the tasklet runs on a different CPU
+ * then this has no effect other than doing the BH
+ * disable/enable dance for nothing. This is copied from
+ * kernel/softirq.c::tasklet_unlock_spin_wait().
+ */
+ while (!try_wait_for_completion(&barr.done)) {
+ if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
+ local_bh_disable();
+ local_bh_enable();
+ } else {
+ cpu_relax();
+ }
+ }
+ } else {
+ wait_for_completion(&barr.done);
+ }
+
destroy_work_on_stack(&barr.work);
return true;
}
@@ -4090,6 +4113,7 @@ static bool __flush_work(struct work_struct *work, bool from_cancel)
*/
bool flush_work(struct work_struct *work)
{
+ might_sleep();
return __flush_work(work, false);
}
EXPORT_SYMBOL_GPL(flush_work);
@@ -4179,6 +4203,11 @@ static bool __cancel_work_sync(struct work_struct *work, u32 cflags)
ret = __cancel_work(work, cflags | WORK_CANCEL_DISABLE);
+ if (*work_data_bits(work) & WORK_OFFQ_BH)
+ WARN_ON_ONCE(in_hardirq());
+ else
+ might_sleep();
+
/*
* Skip __flush_work() during early boot when we know that @work isn't
* executing. This allows canceling during early boot.
@@ -4205,19 +4234,19 @@ EXPORT_SYMBOL(cancel_work);
* cancel_work_sync - cancel a work and wait for it to finish
* @work: the work to cancel
*
- * Cancel @work and wait for its execution to finish. This function
- * can be used even if the work re-queues itself or migrates to
- * another workqueue. On return from this function, @work is
- * guaranteed to be not pending or executing on any CPU.
+ * Cancel @work and wait for its execution to finish. This function can be used
+ * even if the work re-queues itself or migrates to another workqueue. On return
+ * from this function, @work is guaranteed to be not pending or executing on any
+ * CPU as long as there aren't racing enqueues.
*
- * cancel_work_sync(&delayed_work->work) must not be used for
- * delayed_work's. Use cancel_delayed_work_sync() instead.
+ * cancel_work_sync(&delayed_work->work) must not be used for delayed_work's.
+ * Use cancel_delayed_work_sync() instead.
*
- * The caller must ensure that the workqueue on which @work was last
- * queued can't be destroyed before this function returns.
+ * Must be called from a sleepable context if @work was last queued on a non-BH
+ * workqueue. Can also be called from non-hardirq atomic contexts including BH
+ * if @work was last queued on a BH workqueue.
*
- * Return:
- * %true if @work was pending, %false otherwise.
+ * Returns %true if @work was pending, %false otherwise.
*/
bool cancel_work_sync(struct work_struct *work)
{
@@ -4287,8 +4316,11 @@ EXPORT_SYMBOL_GPL(disable_work);
* Similar to disable_work() but also wait for @work to finish if currently
* executing.
*
- * Must be called from a sleepable context. Returns %true if @work was pending,
- * %false otherwise.
+ * Must be called from a sleepable context if @work was last queued on a non-BH
+ * workqueue. Can also be called from non-hardirq atomic contexts including BH
+ * if @work was last queued on a BH workqueue.
+ *
+ * Returns %true if @work was pending, %false otherwise.
*/
bool disable_work_sync(struct work_struct *work)
{