Message ID | 20221212061836.3620-1-richard.xnu.clark@gmail.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp2077838wrr; Sun, 11 Dec 2022 22:33:16 -0800 (PST) X-Google-Smtp-Source: AA0mqf7PCIZ7UiwAFRKleuhPQ7SpDrhDnXfMOhm1H0AkLlvnQXkHYdfyB6NyuKHe8jlppWN4iG7q X-Received: by 2002:a50:fe15:0:b0:46d:731f:d726 with SMTP id f21-20020a50fe15000000b0046d731fd726mr13284493edt.22.1670826796424; Sun, 11 Dec 2022 22:33:16 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670826796; cv=none; d=google.com; s=arc-20160816; b=OggydCu2N0CjOnfaCwhmVXP1HaghX2tBcnAYSP39JZK/N/fSMbTdV417a8tHV6fh16 cjiGmLJsQxHQJHFCT4FEmlUNats/i7zH+EgRSRxAYyq+pkrR8QartsViflQUv+HLHYMn ofoYzIlh1nEl2e2gTF0ryKfb06K41WmmKC5phoJyG8TFbJjUUdwGlVV7KioBbWm7XUhB dvFn2pULZkKBT1Wcdh7pQRsuKjbnZgpqRq6bjCHO2zN9635R+SUJTdqfX7M1uAtLzYOB BNSi2JkIdHsMDj144Gb1pspfyuqZaI0CEs1mevwJOUEd0APwapG+VnQaOqCiA9cstblv 3TRQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=zRiAhQfW9zEgSY0UOYSIwvqI1zEk4A5iqjfPvFfb03k=; b=x9VJsymsxyCvAUOvw4cb3xIlooiqYWldnr42WQUmrCEe2UraOm1Eywv32931Om+zNc Q3jWMw7yaau2IKxEiKrZIs0E3On8rii7hSihuKlX4KG8wUoeujXxwuxnLcPaGPzTSitP d2xDXmws1n4btw0Tqzbv1l8RokGLbuRfsXcIMX3Jw5rZNPiByekljpB/DCKrMX/OHNkA 0do+QcBLKfSPdClW0ACGhK2flg5GpD8xMOTWbvPbsrjGrnjAhhACmuLWY/GFWG/5Jnj7 9ar9j7id3t22VHHGZQM9wi8cbxuLpvhxp1T8tV8xNzZR/+ryRdPMXMCwC5hjUccVQyS6 5moQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=JZfZNOLo; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id s6-20020a056402520600b0046c9a4ab650si8150654edd.393.2022.12.11.22.32.52; Sun, 11 Dec 2022 22:33:16 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=JZfZNOLo; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231245AbiLLGT7 (ORCPT <rfc822;jeantsuru.cumc.mandola@gmail.com> + 99 others); Mon, 12 Dec 2022 01:19:59 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36334 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229726AbiLLGT5 (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 12 Dec 2022 01:19:57 -0500 Received: from mail-pj1-x102c.google.com (mail-pj1-x102c.google.com [IPv6:2607:f8b0:4864:20::102c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7A770B1EE for <linux-kernel@vger.kernel.org>; Sun, 11 Dec 2022 22:19:54 -0800 (PST) Received: by mail-pj1-x102c.google.com with SMTP id w4-20020a17090ac98400b002186f5d7a4cso14590519pjt.0 for <linux-kernel@vger.kernel.org>; Sun, 11 Dec 2022 22:19:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=zRiAhQfW9zEgSY0UOYSIwvqI1zEk4A5iqjfPvFfb03k=; b=JZfZNOLoPP63jKFM/KWRIKkVvvHmhX81gpBv4S2vAHF1EAM1RnSkYbeJhUdkF7CmOP hJnAIkvNuew5Mkrl05teoqY5+Wq+4Xu70ireh6T0IzKcYI9mCs0aFt5MPH96xJ9nPJti MCiVmD6yGnlkG9ZxJdtmvQegAgMr+0h1KnKufU5hyS2hZxmifPUpdtXrcKE4dC4Jlq5w kK5auMjenfWYMJvN+1g8jJ6DU0p0ukwyfNlj3ctITfmy6avwEFsq5o3ubevoYijtQ2lB hs7M7R69YvLFc9tvqvTSOiDvpWp++A3oWP+x9lIRsaagiPXhhqlc0yo8mlYQiBPbYk+s OaCg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=zRiAhQfW9zEgSY0UOYSIwvqI1zEk4A5iqjfPvFfb03k=; b=32WInOE5SihbIk3vaHR11J/ceYnE+mLRQw8CDaDyqwN2ErMDnKw7uU7Q74jxOqlqMp TNj/sIDBQdnVlMVqBORoujgarDJwJ0IkuZTD3QlCC+T6/38B7LT2btT+nTkFVffY0aCG e5yednFAR5Kx7Zfmbw+KPjoBz4uEcmWlXf9V7V13xy2+doEs8Q2wbBTtT5nvlIgJc+8C iEXyyVgarCRRJlmWyMZIleKc9dj95yJS3IcsLyiDG1h5XKfnqazITeneOw63OuECU8QG yY2GZa2IBVqROeaVdWrblL3ozFy62iXl0gTTYe6Wz6pEpFBWjXRBaTLlrRBZgzfPpPlB H4iA== X-Gm-Message-State: ANoB5pmVNZQYdEpZJ+6HvnMld6VHqRjrvfi6+s/GmQOG+OQs7xxICPT2 Kb+18E+u0mT4Mh1QdJRtgGs= X-Received: by 2002:a17:903:2412:b0:188:82fc:e277 with SMTP id e18-20020a170903241200b0018882fce277mr16091283plo.12.1670825994012; Sun, 11 Dec 2022 22:19:54 -0800 (PST) Received: from den-workstation.fareast.nevint.com ([140.206.46.75]) by smtp.gmail.com with ESMTPSA id i2-20020a17090332c200b001892af9472esm5371089plr.261.2022.12.11.22.19.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 11 Dec 2022 22:19:53 -0800 (PST) From: Richard Clark <richard.xnu.clark@gmail.com> To: jiangshanlai@gmail.com Cc: tj@kernel.org, linux-kernel@vger.kernel.org, Richard Clark <richard.xnu.clark@gmail.com> Subject: [PATCH] workqueue: Prevent a new work item from queueing into a destruction wq Date: Mon, 12 Dec 2022 14:18:36 +0800 Message-Id: <20221212061836.3620-1-richard.xnu.clark@gmail.com> X-Mailer: git-send-email 2.37.2 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751988878495172058?= X-GMAIL-MSGID: =?utf-8?q?1751988878495172058?= |
Series |
workqueue: Prevent a new work item from queueing into a destruction wq
|
|
Commit Message
richard clark
Dec. 12, 2022, 6:18 a.m. UTC
Currently the __WQ_DRAINING is used to prevent a new work item from queueing
to a draining workqueue, but this flag will be cleared before the end of a
RCU grace period. Because the workqueue instance is actually freed after
the RCU grace period, this fact results in an opening window in which a new
work item can be queued into a destorying workqueue and be scheduled
consequently, for instance, the below code snippet demos this accident:
static void work_func(struct work_struct *work)
{
pr_info("%s scheduled\n", __func__);
}
static DECLARE_WORK(w0, work_func);
struct workqueue_struct * wq0 = alloc_workqueue("wq0", 0, 0);
destroy_workqueue(wq0);
queue_work_on(1, wq0, &w0);
The above work_func(...) can be scheduled by a destroying workqueue.
This patch will close that window by introducing a new flag __WQ_DESTROYING,
which will keep valid until the end of a RCU grace period. With this commit,
the above code will trigger a WARNING message and return directly from
__queue_work(...), like this:
WARNING: CPU: 7 PID: 3994 at kernel/workqueue.c:1438 __queue_work+0x3ec/0x580
Signed-off-by: Richard Clark <richard.xnu.clark@gmail.com>
---
include/linux/workqueue.h | 1 +
kernel/workqueue.c | 11 ++++++++---
2 files changed, 9 insertions(+), 3 deletions(-)
Comments
On Mon, Dec 12, 2022 at 02:18:36PM +0800, Richard Clark wrote: > Currently the __WQ_DRAINING is used to prevent a new work item from queueing > to a draining workqueue, but this flag will be cleared before the end of a > RCU grace period. Because the workqueue instance is actually freed after > the RCU grace period, this fact results in an opening window in which a new > work item can be queued into a destorying workqueue and be scheduled > consequently, for instance, the below code snippet demos this accident: I mean, this is just use-after-free. The same scenario can happen with non-RCU frees or if there happens to be an RCU grace period inbetween. I'm not sure what's being protected here. Thanks.
On Mon, Dec 12, 2022 at 2:23 PM Tejun Heo <tj@kernel.org> wrote: > > On Mon, Dec 12, 2022 at 02:18:36PM +0800, Richard Clark wrote: > > Currently the __WQ_DRAINING is used to prevent a new work item from queueing > > to a draining workqueue, but this flag will be cleared before the end of a > > RCU grace period. Because the workqueue instance is actually freed after > > the RCU grace period, this fact results in an opening window in which a new > > work item can be queued into a destorying workqueue and be scheduled > > consequently, for instance, the below code snippet demos this accident: > > I mean, this is just use-after-free. The same scenario can happen with > non-RCU frees or if there happens to be an RCU grace period inbetween. I'm > not sure what's being protected here. I think it is a kind of debugging facility with no overhead in the fast path. It is indeed the caller's responsibility not to do use-after-free. For non-RCU free, the freed workqueue's state can be arbitrary soon and the caller might get a complaint. And if there are some kinds of debugging facilities for freed memory, the system can notice the problem earlier. But now is RCU free for the workqueue, and the workqueue has nothing different between before and after destroy_workqueue() unless the grace period ends and the memory-allocation subsystem takes charge of the memory. Thanks Lai
On Mon, Dec 12, 2022 at 2:23 PM Tejun Heo <tj@kernel.org> wrote: > > On Mon, Dec 12, 2022 at 02:18:36PM +0800, Richard Clark wrote: > > Currently the __WQ_DRAINING is used to prevent a new work item from queueing > > to a draining workqueue, but this flag will be cleared before the end of a > > RCU grace period. Because the workqueue instance is actually freed after > > the RCU grace period, this fact results in an opening window in which a new > > work item can be queued into a destorying workqueue and be scheduled > > consequently, for instance, the below code snippet demos this accident: > > I mean, this is just use-after-free. The same scenario can happen with IMO, it's not exactly the use-after-free since no free action before the end of RCU grace period, if it really is then the code will trigger a kernel BUG: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page ... Which can be easily observed for both non-RCU frees and RCU frees. Thanks > non-RCU frees or if there happens to be an RCU grace period inbetween. I'm > not sure what's being protected here. > > Thanks. > > -- > tejun
On Mon, Dec 12, 2022 at 2:48 PM Lai Jiangshan <jiangshanlai@gmail.com> wrote: > > On Mon, Dec 12, 2022 at 2:23 PM Tejun Heo <tj@kernel.org> wrote: > > > > On Mon, Dec 12, 2022 at 02:18:36PM +0800, Richard Clark wrote: > > > Currently the __WQ_DRAINING is used to prevent a new work item from queueing > > > to a draining workqueue, but this flag will be cleared before the end of a > > > RCU grace period. Because the workqueue instance is actually freed after > > > the RCU grace period, this fact results in an opening window in which a new > > > work item can be queued into a destorying workqueue and be scheduled > > > consequently, for instance, the below code snippet demos this accident: > > > > I mean, this is just use-after-free. The same scenario can happen with > > non-RCU frees or if there happens to be an RCU grace period inbetween. I'm > > not sure what's being protected here. > > I think it is a kind of debugging facility with no overhead in the > fast path. Agree... > > It is indeed the caller's responsibility not to do use-after-free. > > For non-RCU free, the freed workqueue's state can be arbitrary soon and > the caller might get a complaint. And if there are some kinds of debugging > facilities for freed memory, the system can notice the problem earlier. This case will trigger a noticeable kernel BUG > > But now is RCU free for the workqueue, and the workqueue has nothing > different between before and after destroy_workqueue() unless the > grace period ends and the memory-allocation subsystem takes charge of > the memory. > destroy_workqueue(wq0); schedule_timeout_interruptible(msecs_to_jiffies(1000)); queue_work_on(1, wq0, &w0); Sleep 1s to guarantee the RCU grace period completes, then the same result with non-RCU free Thanks > > > Thanks > Lai
On Mon, Dec 12, 2022 at 02:48:25PM +0800, Lai Jiangshan wrote: > On Mon, Dec 12, 2022 at 2:23 PM Tejun Heo <tj@kernel.org> wrote: > > > > On Mon, Dec 12, 2022 at 02:18:36PM +0800, Richard Clark wrote: > > > Currently the __WQ_DRAINING is used to prevent a new work item from queueing > > > to a draining workqueue, but this flag will be cleared before the end of a > > > RCU grace period. Because the workqueue instance is actually freed after > > > the RCU grace period, this fact results in an opening window in which a new > > > work item can be queued into a destorying workqueue and be scheduled > > > consequently, for instance, the below code snippet demos this accident: > > > > I mean, this is just use-after-free. The same scenario can happen with > > non-RCU frees or if there happens to be an RCU grace period inbetween. I'm > > not sure what's being protected here. > > I think it is a kind of debugging facility with no overhead in the > fast path. > > It is indeed the caller's responsibility not to do use-after-free. > > For non-RCU free, the freed workqueue's state can be arbitrary soon and > the caller might get a complaint. And if there are some kinds of debugging > facilities for freed memory, the system can notice the problem earlier. > > But now is RCU free for the workqueue, and the workqueue has nothing > different between before and after destroy_workqueue() unless the > grace period ends and the memory-allocation subsystem takes charge of > the memory. idk, maybe? It seems kinda out of scope. Richard, can you update the patch description and comment so that they clearly state that this is a debug aid to help spotting user errors? Thanks.
On Tue, Dec 13, 2022 at 6:27 AM Tejun Heo <tj@kernel.org> wrote: > > On Mon, Dec 12, 2022 at 02:48:25PM +0800, Lai Jiangshan wrote: > > On Mon, Dec 12, 2022 at 2:23 PM Tejun Heo <tj@kernel.org> wrote: > > > > > > On Mon, Dec 12, 2022 at 02:18:36PM +0800, Richard Clark wrote: > > > > Currently the __WQ_DRAINING is used to prevent a new work item from queueing > > > > to a draining workqueue, but this flag will be cleared before the end of a > > > > RCU grace period. Because the workqueue instance is actually freed after > > > > the RCU grace period, this fact results in an opening window in which a new > > > > work item can be queued into a destorying workqueue and be scheduled > > > > consequently, for instance, the below code snippet demos this accident: > > > > > > I mean, this is just use-after-free. The same scenario can happen with > > > non-RCU frees or if there happens to be an RCU grace period inbetween. I'm > > > not sure what's being protected here. > > > > I think it is a kind of debugging facility with no overhead in the > > fast path. > > > > It is indeed the caller's responsibility not to do use-after-free. > > > > For non-RCU free, the freed workqueue's state can be arbitrary soon and > > the caller might get a complaint. And if there are some kinds of debugging > > facilities for freed memory, the system can notice the problem earlier. > > > > But now is RCU free for the workqueue, and the workqueue has nothing > > different between before and after destroy_workqueue() unless the > > grace period ends and the memory-allocation subsystem takes charge of > > the memory. > > idk, maybe? It seems kinda out of scope. Richard, can you update the patch > description and comment so that they clearly state that this is a debug aid > to help spotting user errors? Sure, will update soon Thanks > > Thanks. > > -- > tejun
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index a0143dd24430..ac551b8ee7d9 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -335,6 +335,7 @@ enum { */ WQ_POWER_EFFICIENT = 1 << 7, + __WQ_DESTROYING = 1 << 15, /* internal: workqueue is destroying */ __WQ_DRAINING = 1 << 16, /* internal: workqueue is draining */ __WQ_ORDERED = 1 << 17, /* internal: workqueue is ordered */ __WQ_LEGACY = 1 << 18, /* internal: create*_workqueue() */ diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 39060a5d0905..09527c9db9cb 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -1433,9 +1433,9 @@ static void __queue_work(int cpu, struct workqueue_struct *wq, lockdep_assert_irqs_disabled(); - /* if draining, only works from the same workqueue are allowed */ - if (unlikely(wq->flags & __WQ_DRAINING) && - WARN_ON_ONCE(!is_chained_work(wq))) + /* if destroying or draining, only works from the same workqueue are allowed */ + if (unlikely(wq->flags & (__WQ_DESTROYING | __WQ_DRAINING) + && WARN_ON_ONCE(!is_chained_work(wq)))) return; rcu_read_lock(); retry: @@ -4414,6 +4414,11 @@ void destroy_workqueue(struct workqueue_struct *wq) */ workqueue_sysfs_unregister(wq); + /* mark the workqueue destruction is in progress */ + mutex_lock(&wq->mutex); + wq->flags |= __WQ_DESTROYING; + mutex_unlock(&wq->mutex); + /* drain it before proceeding with destruction */ drain_workqueue(wq);