workqueue: Prevent a new work item from queueing into a destruction wq

Message ID 20221212061836.3620-1-richard.xnu.clark@gmail.com
State New
Headers
Series workqueue: Prevent a new work item from queueing into a destruction wq |

Commit Message

richard clark Dec. 12, 2022, 6:18 a.m. UTC
  Currently the __WQ_DRAINING is used to prevent a new work item from queueing
to a draining workqueue, but this flag will be cleared before the end of a
RCU grace period. Because the workqueue instance is actually freed after
the RCU grace period, this fact results in an opening window in which a new
work item can be queued into a destorying workqueue and be scheduled
consequently, for instance, the below code snippet demos this accident:

	static void work_func(struct work_struct *work)
	{
		pr_info("%s scheduled\n", __func__);
	}
	static DECLARE_WORK(w0, work_func);

	struct workqueue_struct * wq0 = alloc_workqueue("wq0", 0, 0);
	destroy_workqueue(wq0);
	queue_work_on(1, wq0, &w0);

The above work_func(...) can be scheduled by a destroying workqueue.

This patch will close that window by introducing a new flag __WQ_DESTROYING,
which will keep valid until the end of a RCU grace period. With this commit,
the above code will trigger a WARNING message and return directly from
__queue_work(...), like this:

WARNING: CPU: 7 PID: 3994 at kernel/workqueue.c:1438 __queue_work+0x3ec/0x580

Signed-off-by: Richard Clark <richard.xnu.clark@gmail.com>
---
 include/linux/workqueue.h |  1 +
 kernel/workqueue.c        | 11 ++++++++---
 2 files changed, 9 insertions(+), 3 deletions(-)
  

Comments

Tejun Heo Dec. 12, 2022, 6:23 a.m. UTC | #1
On Mon, Dec 12, 2022 at 02:18:36PM +0800, Richard Clark wrote:
> Currently the __WQ_DRAINING is used to prevent a new work item from queueing
> to a draining workqueue, but this flag will be cleared before the end of a
> RCU grace period. Because the workqueue instance is actually freed after
> the RCU grace period, this fact results in an opening window in which a new
> work item can be queued into a destorying workqueue and be scheduled
> consequently, for instance, the below code snippet demos this accident:

I mean, this is just use-after-free. The same scenario can happen with
non-RCU frees or if there happens to be an RCU grace period inbetween. I'm
not sure what's being protected here.

Thanks.
  
Lai Jiangshan Dec. 12, 2022, 6:48 a.m. UTC | #2
On Mon, Dec 12, 2022 at 2:23 PM Tejun Heo <tj@kernel.org> wrote:
>
> On Mon, Dec 12, 2022 at 02:18:36PM +0800, Richard Clark wrote:
> > Currently the __WQ_DRAINING is used to prevent a new work item from queueing
> > to a draining workqueue, but this flag will be cleared before the end of a
> > RCU grace period. Because the workqueue instance is actually freed after
> > the RCU grace period, this fact results in an opening window in which a new
> > work item can be queued into a destorying workqueue and be scheduled
> > consequently, for instance, the below code snippet demos this accident:
>
> I mean, this is just use-after-free. The same scenario can happen with
> non-RCU frees or if there happens to be an RCU grace period inbetween. I'm
> not sure what's being protected here.

I think it is a kind of debugging facility with no overhead in the
fast path.

It is indeed the caller's responsibility not to do use-after-free.

For non-RCU free, the freed workqueue's state can be arbitrary soon and
the caller might get a complaint. And if there are some kinds of debugging
facilities for freed memory, the system can notice the problem earlier.

But now is RCU free for the workqueue, and the workqueue has nothing
different between before and after destroy_workqueue() unless the
grace period ends and the memory-allocation subsystem takes charge of
the memory.



Thanks
Lai
  
richard clark Dec. 12, 2022, 9:17 a.m. UTC | #3
On Mon, Dec 12, 2022 at 2:23 PM Tejun Heo <tj@kernel.org> wrote:
>
> On Mon, Dec 12, 2022 at 02:18:36PM +0800, Richard Clark wrote:
> > Currently the __WQ_DRAINING is used to prevent a new work item from queueing
> > to a draining workqueue, but this flag will be cleared before the end of a
> > RCU grace period. Because the workqueue instance is actually freed after
> > the RCU grace period, this fact results in an opening window in which a new
> > work item can be queued into a destorying workqueue and be scheduled
> > consequently, for instance, the below code snippet demos this accident:
>
> I mean, this is just use-after-free. The same scenario can happen with

IMO, it's not exactly the use-after-free since no free action before
the end of RCU grace period, if it really is then the code will
trigger a kernel BUG:

    BUG: kernel NULL pointer dereference, address: 0000000000000000
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    ...

Which can be easily observed for both non-RCU frees and RCU frees.

Thanks

> non-RCU frees or if there happens to be an RCU grace period inbetween. I'm
> not sure what's being protected here.
>
> Thanks.
>
> --
> tejun
  
richard clark Dec. 12, 2022, 9:26 a.m. UTC | #4
On Mon, Dec 12, 2022 at 2:48 PM Lai Jiangshan <jiangshanlai@gmail.com> wrote:
>
> On Mon, Dec 12, 2022 at 2:23 PM Tejun Heo <tj@kernel.org> wrote:
> >
> > On Mon, Dec 12, 2022 at 02:18:36PM +0800, Richard Clark wrote:
> > > Currently the __WQ_DRAINING is used to prevent a new work item from queueing
> > > to a draining workqueue, but this flag will be cleared before the end of a
> > > RCU grace period. Because the workqueue instance is actually freed after
> > > the RCU grace period, this fact results in an opening window in which a new
> > > work item can be queued into a destorying workqueue and be scheduled
> > > consequently, for instance, the below code snippet demos this accident:
> >
> > I mean, this is just use-after-free. The same scenario can happen with
> > non-RCU frees or if there happens to be an RCU grace period inbetween. I'm
> > not sure what's being protected here.
>
> I think it is a kind of debugging facility with no overhead in the
> fast path.

Agree...

>
> It is indeed the caller's responsibility not to do use-after-free.
>
> For non-RCU free, the freed workqueue's state can be arbitrary soon and
> the caller might get a complaint. And if there are some kinds of debugging
> facilities for freed memory, the system can notice the problem earlier.

This case will trigger a noticeable kernel BUG

>
> But now is RCU free for the workqueue, and the workqueue has nothing
> different between before and after destroy_workqueue() unless the
> grace period ends and the memory-allocation subsystem takes charge of
> the memory.
>

    destroy_workqueue(wq0);
    schedule_timeout_interruptible(msecs_to_jiffies(1000));
    queue_work_on(1, wq0, &w0);

Sleep 1s to guarantee the RCU grace period completes, then the same
result with non-RCU free

Thanks

>
>
> Thanks
> Lai
  
Tejun Heo Dec. 12, 2022, 10:27 p.m. UTC | #5
On Mon, Dec 12, 2022 at 02:48:25PM +0800, Lai Jiangshan wrote:
> On Mon, Dec 12, 2022 at 2:23 PM Tejun Heo <tj@kernel.org> wrote:
> >
> > On Mon, Dec 12, 2022 at 02:18:36PM +0800, Richard Clark wrote:
> > > Currently the __WQ_DRAINING is used to prevent a new work item from queueing
> > > to a draining workqueue, but this flag will be cleared before the end of a
> > > RCU grace period. Because the workqueue instance is actually freed after
> > > the RCU grace period, this fact results in an opening window in which a new
> > > work item can be queued into a destorying workqueue and be scheduled
> > > consequently, for instance, the below code snippet demos this accident:
> >
> > I mean, this is just use-after-free. The same scenario can happen with
> > non-RCU frees or if there happens to be an RCU grace period inbetween. I'm
> > not sure what's being protected here.
> 
> I think it is a kind of debugging facility with no overhead in the
> fast path.
> 
> It is indeed the caller's responsibility not to do use-after-free.
> 
> For non-RCU free, the freed workqueue's state can be arbitrary soon and
> the caller might get a complaint. And if there are some kinds of debugging
> facilities for freed memory, the system can notice the problem earlier.
> 
> But now is RCU free for the workqueue, and the workqueue has nothing
> different between before and after destroy_workqueue() unless the
> grace period ends and the memory-allocation subsystem takes charge of
> the memory.

idk, maybe? It seems kinda out of scope. Richard, can you update the patch
description and comment so that they clearly state that this is a debug aid
to help spotting user errors?

Thanks.
  
richard clark Dec. 13, 2022, 1:09 a.m. UTC | #6
On Tue, Dec 13, 2022 at 6:27 AM Tejun Heo <tj@kernel.org> wrote:
>
> On Mon, Dec 12, 2022 at 02:48:25PM +0800, Lai Jiangshan wrote:
> > On Mon, Dec 12, 2022 at 2:23 PM Tejun Heo <tj@kernel.org> wrote:
> > >
> > > On Mon, Dec 12, 2022 at 02:18:36PM +0800, Richard Clark wrote:
> > > > Currently the __WQ_DRAINING is used to prevent a new work item from queueing
> > > > to a draining workqueue, but this flag will be cleared before the end of a
> > > > RCU grace period. Because the workqueue instance is actually freed after
> > > > the RCU grace period, this fact results in an opening window in which a new
> > > > work item can be queued into a destorying workqueue and be scheduled
> > > > consequently, for instance, the below code snippet demos this accident:
> > >
> > > I mean, this is just use-after-free. The same scenario can happen with
> > > non-RCU frees or if there happens to be an RCU grace period inbetween. I'm
> > > not sure what's being protected here.
> >
> > I think it is a kind of debugging facility with no overhead in the
> > fast path.
> >
> > It is indeed the caller's responsibility not to do use-after-free.
> >
> > For non-RCU free, the freed workqueue's state can be arbitrary soon and
> > the caller might get a complaint. And if there are some kinds of debugging
> > facilities for freed memory, the system can notice the problem earlier.
> >
> > But now is RCU free for the workqueue, and the workqueue has nothing
> > different between before and after destroy_workqueue() unless the
> > grace period ends and the memory-allocation subsystem takes charge of
> > the memory.
>
> idk, maybe? It seems kinda out of scope. Richard, can you update the patch
> description and comment so that they clearly state that this is a debug aid
> to help spotting user errors?

Sure, will update soon

Thanks

>
> Thanks.
>
> --
> tejun
  

Patch

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index a0143dd24430..ac551b8ee7d9 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -335,6 +335,7 @@  enum {
 	 */
 	WQ_POWER_EFFICIENT	= 1 << 7,
 
+	__WQ_DESTROYING		= 1 << 15, /* internal: workqueue is destroying */
 	__WQ_DRAINING		= 1 << 16, /* internal: workqueue is draining */
 	__WQ_ORDERED		= 1 << 17, /* internal: workqueue is ordered */
 	__WQ_LEGACY		= 1 << 18, /* internal: create*_workqueue() */
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 39060a5d0905..09527c9db9cb 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1433,9 +1433,9 @@  static void __queue_work(int cpu, struct workqueue_struct *wq,
 	lockdep_assert_irqs_disabled();
 
 
-	/* if draining, only works from the same workqueue are allowed */
-	if (unlikely(wq->flags & __WQ_DRAINING) &&
-	    WARN_ON_ONCE(!is_chained_work(wq)))
+	/* if destroying or draining, only works from the same workqueue are allowed */
+	if (unlikely(wq->flags & (__WQ_DESTROYING | __WQ_DRAINING)
+		&& WARN_ON_ONCE(!is_chained_work(wq))))
 		return;
 	rcu_read_lock();
 retry:
@@ -4414,6 +4414,11 @@  void destroy_workqueue(struct workqueue_struct *wq)
 	 */
 	workqueue_sysfs_unregister(wq);
 
+	/* mark the workqueue destruction is in progress */
+	mutex_lock(&wq->mutex);
+	wq->flags |= __WQ_DESTROYING;
+	mutex_unlock(&wq->mutex);
+
 	/* drain it before proceeding with destruction */
 	drain_workqueue(wq);