Message ID | 20230523075857.76520-1-duoming@zju.edu.cn |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp1979738vqo; Tue, 23 May 2023 01:25:47 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4dyhPCvePOVnsx+juc1yynI2mbrj3lMnAq6BZjSagYkiLvmulyCYYAiuSs/rxOHlTybz7C X-Received: by 2002:a17:903:2682:b0:1ac:a28e:4b29 with SMTP id jf2-20020a170903268200b001aca28e4b29mr12494256plb.26.1684830346940; Tue, 23 May 2023 01:25:46 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1684830346; cv=none; d=google.com; s=arc-20160816; b=inQFUEanLxL8cZVA/tyg1N/edTsHR0fpUUHdvf8YHOi2Kt6zqCZFihSyyvceVRUt4A O/lrQtZGZbJarmb9bZkyuC6qUr1JIrbOH/AqMm7J7l8Q0TDXMUCDrJ54W6TWmQaR3f3U kl07AHKjt9d6FxeBDhrMwiu4NBT18ycEtnJAiBbhJvydwSRoHnc73G4REbPVCViRbvRV YuR5dF7G9vY3QkD1dy7VYjZWH2S5zZ1cGZgSySIn3kPoBx83opd6U+Fu6dmhmQWQE/e5 RWfDGMVeLKnkPW63gl6iEiO+smWMkBEkgie5BGxd7T7ffp61THp3Pn18UrYKtPmh7R6x Fktg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:message-id:date:subject:cc:to:from; bh=ZwYs1vSHu4BN8qMaRsxuhcyX1P1s54EzoPj9uQeqRZA=; b=j3fiQ82F7kenzG5/vSshyxRkD9OsH6UaMvPb0S627XaffsEncbQy6WP0NzOVzs0509 jDrjtEJsXTEQ2ITww88KR0DoLg5FznTz7e+KrXSftCPym+UhwqLcohomTZwnH1uQuOka jBV3Dis0kRAunOtsep6lpbNfrYDwhwQYGod8SA22Xbmuphy+/1r5yEiKu87VoFywf99J WNhygEHWHB0nCFoiKUn9cojD7V8h7+4MSXCZ0Gv+pQPb9E3l3cmkHg6nSkp40Cv4kmMi DrN4JvK7IDS7bomH9RmX9ZvzJT+N+CYKC3GvVS1LLFLMFThb9qV9VneIB5mqb1DNQu7M jIww== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id u3-20020a170902b28300b001ac3f74f488si6126802plr.79.2023.05.23.01.25.34; Tue, 23 May 2023 01:25:46 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236021AbjEWIIc (ORCPT <rfc822;ahmedalshaiji.dev@gmail.com> + 99 others); Tue, 23 May 2023 04:08:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33038 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236102AbjEWIIS (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Tue, 23 May 2023 04:08:18 -0400 Received: from zg8tndyumtaxlji0oc4xnzya.icoremail.net (zg8tndyumtaxlji0oc4xnzya.icoremail.net [46.101.248.176]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id ADB1110C3 for <linux-kernel@vger.kernel.org>; Tue, 23 May 2023 01:07:50 -0700 (PDT) Received: from ubuntu.localdomain (unknown [218.12.19.251]) by mail-app2 (Coremail) with SMTP id by_KCgDX_n1Ccmxk1PiyAw--.43277S2; Tue, 23 May 2023 15:59:07 +0800 (CST) From: Duoming Zhou <duoming@zju.edu.cn> To: linux-kernel@vger.kernel.org Cc: agk@redhat.com, snitzer@kernel.org, dm-devel@redhat.com, Duoming Zhou <duoming@zju.edu.cn> Subject: [PATCH] dm crypt: fix sleep-in-atomic-context bug in kcryptd_crypt_tasklet Date: Tue, 23 May 2023 15:58:57 +0800 Message-Id: <20230523075857.76520-1-duoming@zju.edu.cn> X-Mailer: git-send-email 2.17.1 X-CM-TRANSID: by_KCgDX_n1Ccmxk1PiyAw--.43277S2 X-Coremail-Antispam: 1UD129KBjvJXoW7Zr18ur45Ww47Wr1kXFyrWFg_yoW8Xw47pF WruF95CFy8Gr4Ygw1DKF18tFy5Gw4kGFW3GFW8Wa43AF15Xr1vvFy2krWUtr4UZF95ZFy7 ZFWkAay5WF1qy37anT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUkS14x267AKxVWUJVW8JwAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2ocxC64kIII0Yj41l84x0c7CEw4AK67xGY2AK02 1l84ACjcxK6xIIjxv20xvE14v26w1j6s0DM28EF7xvwVC0I7IYx2IY6xkF7I0E14v26F4U JVW0owA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oV Cq3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0 I7IYx2IY67AKxVWUXVWUAwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r 4UM4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwCY02Avz4vE14v_GF4l 42xK82IYc2Ij64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJV WUGwC20s026x8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r126r1DMIIYrxkI7VAK I48JMIIF0xvE2Ix0cI8IcVAFwI0_Jr0_JF4lIxAIcVC0I7IYx2IY6xkF7I0E14v26r1j6r 4UMIIF0xvE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVWUJVW8JwCI42IY 6I8E87Iv6xkF7I0E14v26r4j6r4UJbIYCTnIWIevJa73UjIFyTuYvjfU52NtUUUUU X-CM-SenderInfo: qssqjiasttq6lmxovvfxof0/1tbiAwQEAWRrhN4LtgAxsb X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1766672665676381167?= X-GMAIL-MSGID: =?utf-8?q?1766672665676381167?= |
Series |
dm crypt: fix sleep-in-atomic-context bug in kcryptd_crypt_tasklet
|
|
Commit Message
Duoming Zhou
May 23, 2023, 7:58 a.m. UTC
In order to improve the IO performance of the dm-crypt
implementation, the commit 39d42fa96ba1 ("dm crypt:
add flags to optionally bypass kcryptd workqueues")
adds tasklet to do the crypto operations.
The tasklet callback function kcryptd_crypt_tasklet()
calls kcryptd_crypt() which is an original workqueue
callback function that may sleep. As a result, the
sleep-in-atomic-context bug may happen. The process
is shown below.
(atomic context)
kcryptd_crypt_tasklet()
kcryptd_crypt()
kcryptd_crypt_write_convert()
wait_for_completion() //may sleep
The wait_for_completion() is a function that may sleep.
In order to mitigate the bug, this patch changes
wait_for_completion() to try_wait_for_completion().
Fixes: 39d42fa96ba1 ("dm crypt: add flags to optionally bypass kcryptd workqueues")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
drivers/md/dm-crypt.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
Comments
On Tue, May 23 2023 at 3:58P -0400, Duoming Zhou <duoming@zju.edu.cn> wrote: > In order to improve the IO performance of the dm-crypt > implementation, the commit 39d42fa96ba1 ("dm crypt: > add flags to optionally bypass kcryptd workqueues") > adds tasklet to do the crypto operations. > > The tasklet callback function kcryptd_crypt_tasklet() > calls kcryptd_crypt() which is an original workqueue > callback function that may sleep. As a result, the > sleep-in-atomic-context bug may happen. The process > is shown below. > > (atomic context) > kcryptd_crypt_tasklet() > kcryptd_crypt() > kcryptd_crypt_write_convert() > wait_for_completion() //may sleep > > The wait_for_completion() is a function that may sleep. > In order to mitigate the bug, this patch changes > wait_for_completion() to try_wait_for_completion(). > > Fixes: 39d42fa96ba1 ("dm crypt: add flags to optionally bypass kcryptd workqueues") > Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> > --- > drivers/md/dm-crypt.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c > index 8b47b913ee8..5e2b2463d87 100644 > --- a/drivers/md/dm-crypt.c > +++ b/drivers/md/dm-crypt.c > @@ -2085,7 +2085,8 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io) > crypt_finished = atomic_dec_and_test(&ctx->cc_pending); > if (!crypt_finished && kcryptd_crypt_write_inline(cc, ctx)) { > /* Wait for completion signaled by kcryptd_async_done() */ > - wait_for_completion(&ctx->restart); > + while (!try_wait_for_completion(&ctx->restart)) > + ; > crypt_finished = 1; > } > > -- > 2.17.1 > Cc'ing Ignat for closer review. But wasn't this already addressed with this commit?: 8abec36d1274 dm crypt: do not wait for backlogged crypto request completion in softirq Mike
Hello, On Tue, 23 May 2023 13:53:23 -0400 Mike Snitzer wrote: > > In order to improve the IO performance of the dm-crypt > > implementation, the commit 39d42fa96ba1 ("dm crypt: > > add flags to optionally bypass kcryptd workqueues") > > adds tasklet to do the crypto operations. > > > > The tasklet callback function kcryptd_crypt_tasklet() > > calls kcryptd_crypt() which is an original workqueue > > callback function that may sleep. As a result, the > > sleep-in-atomic-context bug may happen. The process > > is shown below. > > > > (atomic context) > > kcryptd_crypt_tasklet() > > kcryptd_crypt() > > kcryptd_crypt_write_convert() > > wait_for_completion() //may sleep > > > > The wait_for_completion() is a function that may sleep. > > In order to mitigate the bug, this patch changes > > wait_for_completion() to try_wait_for_completion(). > > > > Fixes: 39d42fa96ba1 ("dm crypt: add flags to optionally bypass kcryptd workqueues") > > Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> > > --- > > drivers/md/dm-crypt.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c > > index 8b47b913ee8..5e2b2463d87 100644 > > --- a/drivers/md/dm-crypt.c > > +++ b/drivers/md/dm-crypt.c > > @@ -2085,7 +2085,8 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io) > > crypt_finished = atomic_dec_and_test(&ctx->cc_pending); > > if (!crypt_finished && kcryptd_crypt_write_inline(cc, ctx)) { > > /* Wait for completion signaled by kcryptd_async_done() */ > > - wait_for_completion(&ctx->restart); > > + while (!try_wait_for_completion(&ctx->restart)) > > + ; > > crypt_finished = 1; > > } > > > > -- > > 2.17.1 > > > > Cc'ing Ignat for closer review. > > But wasn't this already addressed with this commit?: > 8abec36d1274 dm crypt: do not wait for backlogged crypto request completion in softirq > > Mike Thank you for your review, I think the commit 8abec36d1274 ("dm crypt: do not wait for backlogged crypto request completion in softirq") could not solve this problem. When crypt_convert() returns BLK_STS_PROTECTION or BLK_STS_IOERR, meanwhile, there is request that is queued and wait kcryptd_async_done() to process. The workqueue callback function kcryptd_crypt_read_continue() will not be called. The variable crypt_finished equals to zero, it will call wait_for_completion() that may sleep. The relevant codes are shown below: static blk_status_t crypt_convert(...) { ... /* * The request is queued and processed asynchronously, * completion function kcryptd_async_done() will be called. */ case -EINPROGRESS: ctx->r.req = NULL; ctx->cc_sector += sector_step; tag_offset++; continue; ... /* * There was a data integrity error. */ case -EBADMSG: atomic_dec(&ctx->cc_pending); return BLK_STS_PROTECTION; /* * There was an error while processing the request. */ default: atomic_dec(&ctx->cc_pending); return BLK_STS_IOERR; } ... } static void kcryptd_crypt_write_convert(...) { ... r = crypt_convert(); //return BLK_STS_PROTECTION or BLK_STS_IOERR ... if (r == BLK_STS_DEV_RESOURCE) { //this condition is not satisfied, the workqueue will not be called. INIT_WORK(&io->work, kcryptd_crypt_write_continue); queue_work(cc->crypt_queue, &io->work); return; } ... // crypt_finished is zero, because there is request that is queued and wait kcryptd_async_done() to process. crypt_finished = atomic_dec_and_test(&ctx->cc_pending); if (!crypt_finished && kcryptd_crypt_write_inline(cc, ctx)) { /* Wait for completion signaled by kcryptd_async_done() */ wait_for_completion(&ctx->restart); //may sleep! ... } ... } Best regards, Duoming Zhou
HI, On Thu, May 25, 2023 at 3:34 AM <duoming@zju.edu.cn> wrote: > > Hello, > > On Tue, 23 May 2023 13:53:23 -0400 Mike Snitzer wrote: > > > > In order to improve the IO performance of the dm-crypt > > > implementation, the commit 39d42fa96ba1 ("dm crypt: > > > add flags to optionally bypass kcryptd workqueues") > > > adds tasklet to do the crypto operations. > > > > > > The tasklet callback function kcryptd_crypt_tasklet() > > > calls kcryptd_crypt() which is an original workqueue > > > callback function that may sleep. As a result, the > > > sleep-in-atomic-context bug may happen. The process > > > is shown below. > > > > > > (atomic context) Did you actually encounter this in practice? I know it is possible from the dm-crypt code perspective, but during my initial testing I could never trigger a setup when the write path was happening in atomic context > > > kcryptd_crypt_tasklet() > > > kcryptd_crypt() > > > kcryptd_crypt_write_convert() > > > wait_for_completion() //may sleep > > > > > > The wait_for_completion() is a function that may sleep. > > > In order to mitigate the bug, this patch changes > > > wait_for_completion() to try_wait_for_completion(). > > > > > > Fixes: 39d42fa96ba1 ("dm crypt: add flags to optionally bypass kcryptd workqueues") > > > Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> > > > --- > > > drivers/md/dm-crypt.c | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c > > > index 8b47b913ee8..5e2b2463d87 100644 > > > --- a/drivers/md/dm-crypt.c > > > +++ b/drivers/md/dm-crypt.c > > > @@ -2085,7 +2085,8 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io) > > > crypt_finished = atomic_dec_and_test(&ctx->cc_pending); > > > if (!crypt_finished && kcryptd_crypt_write_inline(cc, ctx)) { > > > /* Wait for completion signaled by kcryptd_async_done() */ > > > - wait_for_completion(&ctx->restart); > > > + while (!try_wait_for_completion(&ctx->restart)) > > > + ; We can't just busy-loop here. This effectively creates a spin-lock. And probably on a single CPU system it would hang everything, because the code would be stuck here and would never give the chance to other code to signal completion. You might need to do something similar to how crypt_convert handles this for "EBUSY" case. > > > crypt_finished = 1; > > > } > > > > > > -- > > > 2.17.1 > > > > > > > Cc'ing Ignat for closer review. > > > > But wasn't this already addressed with this commit?: > > 8abec36d1274 dm crypt: do not wait for backlogged crypto request completion in softirq > > > > Mike > > Thank you for your review, I think the commit 8abec36d1274 ("dm crypt: > do not wait for backlogged crypto request completion in softirq") could > not solve this problem. > > When crypt_convert() returns BLK_STS_PROTECTION or BLK_STS_IOERR, meanwhile, > there is request that is queued and wait kcryptd_async_done() to process. > The workqueue callback function kcryptd_crypt_read_continue() will not be called. > The variable crypt_finished equals to zero, it will call wait_for_completion() > that may sleep. The relevant codes are shown below: > > static blk_status_t crypt_convert(...) > { > ... > /* > * The request is queued and processed asynchronously, > * completion function kcryptd_async_done() will be called. > */ > case -EINPROGRESS: > ctx->r.req = NULL; > ctx->cc_sector += sector_step; > tag_offset++; > continue; > ... > /* > * There was a data integrity error. > */ > case -EBADMSG: > atomic_dec(&ctx->cc_pending); > return BLK_STS_PROTECTION; > /* > * There was an error while processing the request. > */ > default: > atomic_dec(&ctx->cc_pending); > return BLK_STS_IOERR; > } > ... > } > > static void kcryptd_crypt_write_convert(...) > { > ... > r = crypt_convert(); //return BLK_STS_PROTECTION or BLK_STS_IOERR > ... > if (r == BLK_STS_DEV_RESOURCE) { //this condition is not satisfied, the workqueue will not be called. Should we just handle BLK_STS_PROTECTION and BLK_STS_IOERR similarly to BLK_STS_DEV_RESOURCE? > INIT_WORK(&io->work, kcryptd_crypt_write_continue); > queue_work(cc->crypt_queue, &io->work); > return; > } > ... > // crypt_finished is zero, because there is request that is queued and wait kcryptd_async_done() to process. > crypt_finished = atomic_dec_and_test(&ctx->cc_pending); > if (!crypt_finished && kcryptd_crypt_write_inline(cc, ctx)) { > /* Wait for completion signaled by kcryptd_async_done() */ > wait_for_completion(&ctx->restart); //may sleep! > ... > } > ... > } > > Best regards, > Duoming Zhou > > > > Ignat
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c index 8b47b913ee8..5e2b2463d87 100644 --- a/drivers/md/dm-crypt.c +++ b/drivers/md/dm-crypt.c @@ -2085,7 +2085,8 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io) crypt_finished = atomic_dec_and_test(&ctx->cc_pending); if (!crypt_finished && kcryptd_crypt_write_inline(cc, ctx)) { /* Wait for completion signaled by kcryptd_async_done() */ - wait_for_completion(&ctx->restart); + while (!try_wait_for_completion(&ctx->restart)) + ; crypt_finished = 1; }