Message ID | 20221103013937.603626-1-khazhy@google.com |
---|---|
State | New |
Headers |
From: Khazhismel Kumykov <khazhy@chromium.org>
X-Google-Original-From: Khazhismel Kumykov <khazhy@google.com>
To: Paolo Valente <paolo.valente@linaro.org>, Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, Khazhismel Kumykov <khazhy@google.com>
Subject: [RFC PATCH] bfq: fix waker_bfqq inconsistency crash
Date: Wed, 2 Nov 2022 18:39:37 -0700
Message-Id: <20221103013937.603626-1-khazhy@google.com>
X-Mailer: git-send-email 2.38.1.273.g43a17bfeac-goog
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-ID: <linux-kernel.vger.kernel.org>
Series | [RFC] bfq: fix waker_bfqq inconsistency crash |
Commit Message
Khazhismel Kumykov
Nov. 3, 2022, 1:39 a.m. UTC
This fixes crashes in bfq_add_bfqq_busy due to waker_bfqq being NULL,
but woken_list_node still being hashed. This would happen when
bfq_init_rq() expects a brand new allocated queue to be returned from
bfq_get_bfqq_handle_split() and unconditionally updates waker_bfqq
without resetting woken_list_node. Since we can always return oom_bfqq
when attempting to allocate, we cannot assume waker_bfqq starts as NULL.
We must either reset woken_list_node, or avoid setting woken_list at all
for oom_bfqq - opt to do the former.
Crashes would have a stacktrace like:
[160595.656560] bfq_add_bfqq_busy+0x110/0x1ec
[160595.661142] bfq_add_request+0x6bc/0x980
[160595.666602] bfq_insert_request+0x8ec/0x1240
[160595.671762] bfq_insert_requests+0x58/0x9c
[160595.676420] blk_mq_sched_insert_request+0x11c/0x198
[160595.682107] blk_mq_submit_bio+0x270/0x62c
[160595.686759] __submit_bio_noacct_mq+0xec/0x178
[160595.691926] submit_bio+0x120/0x184
[160595.695990] ext4_mpage_readpages+0x77c/0x7c8
[160595.701026] ext4_readpage+0x60/0xb0
[160595.705158] filemap_read_page+0x54/0x114
[160595.711961] filemap_fault+0x228/0x5f4
[160595.716272] do_read_fault+0xe0/0x1f0
[160595.720487] do_fault+0x40/0x1c8
Tested by injecting random failures into bfq_get_queue, crashes go away
completely.
Fixes: 8ef3fc3a043c ("block, bfq: make shared queues inherit wakers")
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
---
RFC mainly because it's not clear to me what the best policy is here - but
the patch is tested and fixes a real crash we started seeing in 5.15.
This is following up my ramble over at
https://lore.kernel.org/lkml/CACGdZYLMnfcqwbAXDx+x9vUOMn2cz55oc+8WySBS3J2Xd_q7Lg@mail.gmail.com/
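For readers wanting to reproduce the failure mode described in the testing note
above: the report does not say exactly how the failures were injected, but one
plausible way is to make the bfqq slab allocation in bfq_get_queue() fail
occasionally so that the oom_bfqq fallback path is exercised under normal load.
The hack below is a hypothetical sketch along those lines, not part of the
patch; the allocation call and field names are recalled from block/bfq-iosched.c
and should be double-checked against the tree.

```c
	/*
	 * Hypothetical debug hack (NOT part of this patch): pretend that the
	 * bfqq allocation in bfq_get_queue() failed ~1% of the time, so the
	 * function falls back to &bfqd->oom_bfqq and the paths discussed in
	 * this thread get exercised. Names are assumptions from memory.
	 */
	bfqq = kmem_cache_alloc_node(bfq_pool,
				     GFP_NOWAIT | __GFP_ZERO | __GFP_NOWARN,
				     bfqd->queue->node);
	if (bfqq && (get_random_u32() % 100 == 0)) {
		kmem_cache_free(bfq_pool, bfqq);
		bfqq = NULL;	/* bfq_get_queue() then uses oom_bfqq */
	}
```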
block/bfq-iosched.c | 5 +++++
1 file changed, 5 insertions(+)
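The defensive hlist_del_init() in the patch exists because hlist_add_head()
assumes the node is not currently linked anywhere: adding a still-hashed node
to a second list leaves the old list pointing at it and silently overwrites its
pprev back-pointer. The toy program below re-implements just enough of the
kernel's hlist helpers in userspace to illustrate the invariant and the
defensive pattern used here; it is an illustration, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Minimal userspace copies of the kernel hlist helpers involved. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

static void INIT_HLIST_NODE(struct hlist_node *n) { n->next = NULL; n->pprev = NULL; }
static int hlist_unhashed(const struct hlist_node *n) { return !n->pprev; }

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

static void hlist_del_init(struct hlist_node *n)
{
	if (hlist_unhashed(n))
		return;
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	INIT_HLIST_NODE(n);
}

int main(void)
{
	struct hlist_head old_waker = { NULL }, new_waker = { NULL };
	struct hlist_node woken_list_node;

	INIT_HLIST_NODE(&woken_list_node);
	/* A previous bfq_init_rq() left the node on some waker's woken_list. */
	hlist_add_head(&woken_list_node, &old_waker);

	/*
	 * Buggy pattern: treat the node as fresh and add it to another list
	 * without unlinking it first. old_waker would still point at the
	 * node, both lists would claim it, and a later removal corrupts one:
	 *
	 *	hlist_add_head(&woken_list_node, &new_waker);
	 *
	 * Patched pattern: unlink defensively if the node is still hashed.
	 */
	if (!hlist_unhashed(&woken_list_node))
		hlist_del_init(&woken_list_node);
	hlist_add_head(&woken_list_node, &new_waker);

	assert(old_waker.first == NULL && new_waker.first == &woken_list_node);
	printf("node moved cleanly between woken lists\n");
	return 0;
}
```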
Comments
Hi,

On 2022/11/03 9:39, Khazhismel Kumykov wrote:
> This fixes crashes in bfq_add_bfqq_busy due to waker_bfqq being NULL,
> but woken_list_node still being hashed. This would happen when
> bfq_init_rq() expects a brand new allocated queue to be returned from

From what I see, bfqq->waker_bfqq is updated in bfq_init_rq() only if
'new_queue' is false, but if 'new_queue' is false, the returned 'bfqq'
from bfq_get_bfqq_handle_split() will never be oom_bfqq, so I'm confused
here...

> bfq_get_bfqq_handle_split() and unconditionally updates waker_bfqq
> without resetting woken_list_node. Since we can always return oom_bfqq
> when attempting to allocate, we cannot assume waker_bfqq starts as NULL.
> We must either reset woken_list_node, or avoid setting woken_list at all
> for oom_bfqq - opt to do the former.

Once oom_bfqq is used, I think the io is treated as issued from root
group. Hence I don't think it's necessary to set woken_list or
waker_bfqq for oom_bfqq.

Thanks,
Kuai

> [...]
On Wed, Nov 2, 2022 at 7:56 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2022/11/03 9:39, Khazhismel Kumykov wrote:
> > This fixes crashes in bfq_add_bfqq_busy due to waker_bfqq being NULL,
> > but woken_list_node still being hashed. This would happen when
> > bfq_init_rq() expects a brand new allocated queue to be returned from
>
> From what I see, bfqq->waker_bfqq is updated in bfq_init_rq() only if
> 'new_queue' is false, but if 'new_queue' is false, the returned 'bfqq'
> from bfq_get_bfqq_handle_split() will never be oom_bfqq, so I'm confused
> here...

There's two calls for bfq_get_bfqq_handle_split in this function - the
second one is after the check you mentioned, and is the problematic
one.

> > bfq_get_bfqq_handle_split() and unconditionally updates waker_bfqq
> > without resetting woken_list_node. Since we can always return oom_bfqq
> > when attempting to allocate, we cannot assume waker_bfqq starts as NULL.
> > We must either reset woken_list_node, or avoid setting woken_list at all
> > for oom_bfqq - opt to do the former.
>
> Once oom_bfqq is used, I think the io is treated as issued from root
> group. Hence I don't think it's necessary to set woken_list or
> waker_bfqq for oom_bfqq.

Ack, I was wondering what's right here since, evidently, *someone* had
already set oom_bfqq->waker_bfqq to *something* (although... maybe it
was an earlier init_rq). But maybe it's better to do nothing if we
*know* it's oom_bfqq.

Is it a correct interpretation here that setting waker_bfqq won't
accomplish anything anyways on oom_bfqq, so better off not?

> Thanks,
> Kuai
Hi,

On 2022/11/03 11:05, Khazhy Kumykov wrote:
> On Wed, Nov 2, 2022 at 7:56 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> From what I see, bfqq->waker_bfqq is updated in bfq_init_rq() only if
>> 'new_queue' is false, but if 'new_queue' is false, the returned 'bfqq'
>> from bfq_get_bfqq_handle_split() will never be oom_bfqq, so I'm confused
>> here...
> There's two calls for bfq_get_bfqq_handle_split in this function - the
> second one is after the check you mentioned, and is the problematic
> one.

Yes, thanks for the explanation. Now I understand how the problem
triggers.

>> Once oom_bfqq is used, I think the io is treated as issued from root
>> group. Hence I don't think it's necessary to set woken_list or
>> waker_bfqq for oom_bfqq.
> Ack, I was wondering what's right here since, evidently, *someone* had
> already set oom_bfqq->waker_bfqq to *something* (although... maybe it
> was an earlier init_rq). But maybe it's better to do nothing if we
> *know* it's oom_bfqq.

I need to have a check how oom_bfqq get involved with waker_bfqq, and
then see if it's reasonable.

Probably Jan and Paolo will have better view on this.

Thanks,
Kuai

> Is it a correct interpretation here that setting waker_bfqq won't
> accomplish anything anyways on oom_bfqq, so better off not?
On Thu 03-11-22 11:51:15, Yu Kuai wrote:
> Hi,
>
> On 2022/11/03 11:05, Khazhy Kumykov wrote:
> [...]
> > Ack, I was wondering what's right here since, evidently, *someone* had
> > already set oom_bfqq->waker_bfqq to *something* (although... maybe it
> > was an earlier init_rq). But maybe it's better to do nothing if we
> > *know* it's oom_bfqq.
>
> I need to have a check how oom_bfqq get involved with waker_bfqq, and
> then see if it's reasonable.
>
> Probably Jan and Paolo will have better view on this.

Thanks for the CC Kuai and thanks to Khazhy for spotting the bug. The
oom_bfqq is just a fallback bfqq and as such it should be exempted from
all special handling like waker detection etc. All this stuff is just for
optimizing performance and when we are OOM, we have far larger troubles
than to optimize performance.

So how I think we should really fix this is that we exempt oom_bfqq from
waker detection in bfq_check_waker() by adding:

	bfqq == bfqd->oom_bfqq ||
	bfqd->last_completed_rq_bfq == bfqd->oom_bfqq)

to the initial check and then also if bfq_get_bfqq_handle_split() returns
oom_bfqq we should just skip carrying over the waker information.

								Honza
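For concreteness, here is a rough sketch of where those two extra conditions
could slot in, assuming the early-return structure of bfq_check_waker() in
recent kernels. Two details are assumptions worth flagging: oom_bfqq is
embedded in struct bfq_data, so the comparisons need an address-of, and the
field Jan refers to is presumably bfqd->last_completed_rq_bfqq. This
illustrates the suggestion only; it is not the eventual follow-up patch.

```c
/*
 * Sketch only: exempt oom_bfqq from waker detection, per Jan's suggestion.
 * The surrounding conditions are paraphrased, not the verbatim kernel source.
 */
static void bfq_check_waker(struct bfq_data *bfqd, struct bfq_queue *bfqq,
			    u64 now_ns)
{
	if (!bfqd->last_completed_rq_bfqq ||
	    bfqd->last_completed_rq_bfqq == bfqq ||
	    bfq_bfqq_has_short_ttime(bfqq) ||
	    /* ... other existing exclusions elided ... */
	    bfqq == &bfqd->oom_bfqq ||				/* new */
	    bfqd->last_completed_rq_bfqq == &bfqd->oom_bfqq)	/* new */
		return;

	/* ... waker detection proper continues unchanged ... */
}
```

The second half of the suggestion - skipping the waker carry-over when
bfq_get_bfqq_handle_split() hands back oom_bfqq - would be a separate check at
the call site in bfq_init_rq().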
On Thu, Nov 3, 2022 at 1:47 AM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 03-11-22 11:51:15, Yu Kuai wrote:
> [...]
> > Probably Jan and Paolo will have better view on this.
>
> Thanks for the CC Kuai and thanks to Khazhy for spotting the bug. The
> oom_bfqq is just a fallback bfqq and as such it should be exempted from
> all special handling like waker detection etc. All this stuff is just for
> optimizing performance and when we are OOM, we have far larger troubles
> than to optimize performance.
>
> So how I think we should really fix this is that we exempt oom_bfqq from
> waker detection in bfq_check_waker() by adding:
>
> 	bfqq == bfqd->oom_bfqq ||
> 	bfqd->last_completed_rq_bfq == bfqd->oom_bfqq)
>
> to the initial check and then also if bfq_get_bfqq_handle_split() returns
> oom_bfqq we should just skip carrying over the waker information.

Thanks for the tip! I'll send a followup, including your suggestions.

I do have some other questions in this area, if you could help me
understand. Looking again at bfq_init_rq, inside of the !new_queue
section - we call bfq_split_bfqq() to "split" our bfqq, then in the
next line bfq_get_bfqq_handle_split inspects bic_to_bfqq(bic, is_sync),
and if it's NULL, allocates a new queue. However, if we have an async
rq, this call will return the pre-existing async bfqq, as the call to
bfq_split_bfqq() only clears the sync bfqq, ever. The best
understanding I have now is: "bic->bfqq[async] is never NULL (and we
don't merge async queues) so we'll never reach this !new_queue section
anyways if it's async". Is that accurate?

Thanks,
Khazhy

> 								Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
On Fri 04-11-22 14:25:32, Khazhy Kumykov wrote:
> On Thu, Nov 3, 2022 at 1:47 AM Jan Kara <jack@suse.cz> wrote:
> [...]
> Thanks for the tip! I'll send a followup, including your suggestions.
>
> I do have some other questions in this area, if you could help me
> understand. Looking again at bfq_init_rq, inside of the !new_queue
> section - we call bfq_split_bfqq() to "split" our bfqq, then in the
> next line bfq_get_bfqq_handle_split inspects bic_to_bfqq(bic, is_sync),
> and if it's NULL, allocates a new queue. However, if we have an async
> rq, this call will return the pre-existing async bfqq, as the call to
> bfq_split_bfqq() only clears the sync bfqq, ever. The best
> understanding I have now is: "bic->bfqq[async] is never NULL (and we
> don't merge async queues) so we'll never reach this !new_queue section
> anyways if it's async". Is that accurate?

So you are right that async queues are never merged or split. In fact, if
you have a look at bfq_get_queue(), you'll notice that async queue is
common for all processes with the same ioprio & blkcg. So all these games
with splitting, merging, waker detection etc. impact only sync queues.

								Honza
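As background for the sync/async distinction discussed here, the per-process
BFQ context keeps one queue pointer of each kind, indexed by is_sync. The
fragment below is a from-memory sketch of the relevant field, not the verbatim
definition; see struct bfq_io_cq in block/bfq-iosched.h for the authoritative
layout.

```c
/* Sketch from memory, not verbatim: the per-(process, blkcg) I/O context. */
struct bfq_io_cq {
	struct io_cq icq;		/* embedded generic icq, must come first */
	struct bfq_queue *bfqq[2];	/* [0] = async, [1] = sync; bic_to_bfqq(bic, is_sync) indexes this */
	/* ... merge/split bookkeeping, saved queue state, ... */
};
```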
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 7ea427817f7f..5d2861119d20 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -6793,7 +6793,12 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
 				 * reset. So insert new_bfqq into the
 				 * woken_list of the waker. See
 				 * bfq_check_waker for details.
+				 *
+				 * Also, if we got oom_bfqq, we must check if
+				 * it's already in a woken_list
 				 */
+				if (unlikely(!hlist_unhashed(&bfqq->woken_list_node)))
+					hlist_del_init(&bfqq->woken_list_node);
 				if (bfqq->waker_bfqq)
 					hlist_add_head(&bfqq->woken_list_node,
 						       &bfqq->waker_bfqq->woken_list);