Message ID | 572cfcc0-197a-9ead-9cb-3c5bf5e735@google.com |
---|---|
State | New |
Headers |
From: Hugh Dickins <hughd@google.com>
Date: Fri, 23 Dec 2022 21:24:56 -0800 (PST)
To: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>, Keith Busch <kbusch@kernel.org>, Sagi Grimberg <sagi@grimberg.me>, Chaitanya Kulkarni <kch@nvidia.com>, Linus Torvalds <torvalds@linux-foundation.org>, Thorsten Leemhuis <regressions@leemhuis.info>, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: 6.2 nvme-pci: something wrong
Message-ID: <572cfcc0-197a-9ead-9cb-3c5bf5e735@google.com>
List-ID: <linux-kernel.vger.kernel.org> |
Series |
6.2 nvme-pci: something wrong
|
|
Commit Message
Hugh Dickins
Dec. 24, 2022, 5:24 a.m. UTC
Hi Christoph,

There's something wrong with the nvme-pci heading for 6.2-rc1:
no problem booting here on this Lenovo ThinkPad X1 Carbon 5th,
but under load...

nvme nvme0: I/O 0 (I/O Cmd) QID 2 timeout, aborting
nvme nvme0: I/O 1 (I/O Cmd) QID 2 timeout, aborting
nvme nvme0: I/O 2 (I/O Cmd) QID 2 timeout, aborting
nvme nvme0: I/O 3 (I/O Cmd) QID 2 timeout, aborting
nvme nvme0: Abort status: 0x0
nvme nvme0: Abort status: 0x0
nvme nvme0: Abort status: 0x0
nvme nvme0: Abort status: 0x0
nvme nvme0: I/O 0 QID 2 timeout, reset controller

...and more, until I just have to poweroff and reboot.

Bisection points to your
0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers")
And that does revert cleanly, giving a kernel which shows no problem.

I've spent a while comparing old nvme_pci_alloc_tag_set() and new
nvme_alloc_io_tag_set(); I do not know my way around there at all
and may be talking nonsense, but it did look as if there might now
be a difference in the queue_depth, sqsize, q_depth conversions.

I'm running load successfully with the patch below, but I strongly
suspect that the right patch will be somewhere else: over to you!

Hugh
Comments
On Fri, Dec 23, 2022 at 09:24:56PM -0800, Hugh Dickins wrote:
> Hi Christoph,
>
> There's something wrong with the nvme-pci heading for 6.2-rc1:
> no problem booting here on this Lenovo ThinkPad X1 Carbon 5th,
> but under load...
>
> nvme nvme0: I/O 0 (I/O Cmd) QID 2 timeout, aborting
> nvme nvme0: I/O 1 (I/O Cmd) QID 2 timeout, aborting
> nvme nvme0: I/O 2 (I/O Cmd) QID 2 timeout, aborting
> nvme nvme0: I/O 3 (I/O Cmd) QID 2 timeout, aborting
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: I/O 0 QID 2 timeout, reset controller
>
> ...and more, until I just have to poweroff and reboot.
>
> Bisection points to your
> 0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers")
> And that does revert cleanly, giving a kernel which shows no problem.
>
> I've spent a while comparing old nvme_pci_alloc_tag_set() and new
> nvme_alloc_io_tag_set(); I do not know my way around there at all
> and may be talking nonsense, but it did look as if there might now
> be a difference in the queue_depth, sqsize, q_depth conversions.
>
> I'm running load successfully with the patch below, but I strongly
> suspect that the right patch will be somewhere else: over to you!

Thanks for the report!  The patch is definitively wrong: ->sqsize
holds one of the awful so-called 'zeroes based values' in NVMe,
where 0 means 1, and thus has a built-in off-by-one.  We should
probably convert it to a sane value at read time, but that's a
separate discussion.

I suspect your controller is one of those where we quirk the size,
and the commit you bisected fails to reflect that in the common
sqsize value.  The patch below should be the minimum fix, and in
the long term the duplicate bookkeeping for it in the PCI driver
should go away:

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index f0f8027644bbf8..a73c0ee7bd1892 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2536,7 +2536,6 @@ static int nvme_pci_enable(struct nvme_dev *dev)
 
 	dev->q_depth = min_t(u32, NVME_CAP_MQES(dev->ctrl.cap) + 1,
 				io_queue_depth);
-	dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */
 	dev->db_stride = 1 << NVME_CAP_STRIDE(dev->ctrl.cap);
 	dev->dbs = dev->bar + 4096;
 
@@ -2577,7 +2576,7 @@ static int nvme_pci_enable(struct nvme_dev *dev)
 		dev_warn(dev->ctrl.device, "IO queue depth clamped to %d\n",
 			 dev->q_depth);
 	}
-
+	dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */
 
 	nvme_map_cmb(dev);
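The "zeroes based" convention Christoph refers to is the heart of the confusion: NVMe's CAP.MQES field and the sqsize value both store N-1 for a queue that holds N entries, so every conversion between them carries a +1 or -1 that is easy to double-apply or drop. A minimal standalone sketch of those conversions (the numbers are illustrative, not read from any particular controller):

/*
 * Sketch of the "0's based" value handling; not driver code.
 * CAP.MQES and sqsize store N-1 for a queue of N slots.
 */
#include <stdio.h>

int main(void)
{
	unsigned int mqes = 511;            /* hypothetical CAP.MQES from the controller */
	unsigned int q_depth = mqes + 1;    /* real number of SQ slots: 512 */
	unsigned int sqsize = q_depth - 1;  /* stored back in 0's based form: 511 */

	/* A tag set sized from sqsize must not add the 1 back, or it can fill every slot. */
	printf("q_depth=%u sqsize=%u max outstanding=%u\n",
	       q_depth, sqsize, q_depth - 1);
	return 0;
}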
Linux regression tracking (Thorsten Leemhuis)
Dec. 24, 2022, 7:52 a.m. UTC |
#2
[Note: this mail contains only information for Linux kernel regression
tracking. Mails like these contain '#forregzbot' in the subject to make
them easy to spot and filter out. The author also tried to remove most
or all individuals from the list of recipients to spare them the hassle.]

On 24.12.22 06:24, Hugh Dickins wrote:
>
> There's something wrong with the nvme-pci heading for 6.2-rc1:
> no problem booting here on this Lenovo ThinkPad X1 Carbon 5th,
> but under load...
> [...]
> Bisection points to your
> 0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers")
> And that does revert cleanly, giving a kernel which shows no problem.
> [...]

Thanks for the report. To be sure the below issue doesn't fall through
the cracks unnoticed, I'm adding it to regzbot, my Linux kernel
regression tracking bot:

#regzbot ^introduced 0da7feaa5913
#regzbot title nvme-pci: problems under load
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.
On Fri, 23 Dec 2022, Christoph Hellwig wrote:
> On Fri, Dec 23, 2022 at 09:24:56PM -0800, Hugh Dickins wrote:
> > Hi Christoph,
> >
> > There's something wrong with the nvme-pci heading for 6.2-rc1:
> > no problem booting here on this Lenovo ThinkPad X1 Carbon 5th,
> > but under load...
> >
> > nvme nvme0: I/O 0 (I/O Cmd) QID 2 timeout, aborting
> > nvme nvme0: I/O 1 (I/O Cmd) QID 2 timeout, aborting
> > nvme nvme0: I/O 2 (I/O Cmd) QID 2 timeout, aborting
> > nvme nvme0: I/O 3 (I/O Cmd) QID 2 timeout, aborting
> > nvme nvme0: Abort status: 0x0
> > nvme nvme0: Abort status: 0x0
> > nvme nvme0: Abort status: 0x0
> > nvme nvme0: Abort status: 0x0
> > nvme nvme0: I/O 0 QID 2 timeout, reset controller
> >
> > ...and more, until I just have to poweroff and reboot.
> >
> > Bisection points to your
> > 0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers")
> > And that does revert cleanly, giving a kernel which shows no problem.
> >
> > I've spent a while comparing old nvme_pci_alloc_tag_set() and new
> > nvme_alloc_io_tag_set(); I do not know my way around there at all
> > and may be talking nonsense, but it did look as if there might now
> > be a difference in the queue_depth, sqsize, q_depth conversions.
> >
> > I'm running load successfully with the patch below, but I strongly
> > suspect that the right patch will be somewhere else: over to you!
>
> Thanks for the report!  The patch is definitively wrong: ->sqsize
> holds one of the awful so-called 'zeroes based values' in NVMe,
> where 0 means 1, and thus has a built-in off-by-one.  We should
> probably convert it to a sane value at read time, but that's a
> separate discussion.
>
> I suspect your controller is one of those where we quirk the size,
> and the commit you bisected fails to reflect that in the common
> sqsize value.  The patch below should be the minimum fix, and in
> the long term the duplicate bookkeeping for it in the PCI driver
> should go away:

Thanks for the rapid response.  No, I've just tried your patch below,
and it does not help.  And I don't see any message about "queue size"
in dmesg, so I don't think mine is quirked.

cat /sys/devices/pci0000:00/0000:00:1c.4/0000:05:00.0/nvme/nvme0/sqsize
tells me 511 (and that looked like the only relevant file under /sys).

Regarding the awful 0's based queue depth: yes, it just looked to me
as if the way that got handled in pci.c before differed from the way
it gets handled in pci.c and core.c now, one too many "+ 1"s or "- 1"s
somewhere.

>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index f0f8027644bbf8..a73c0ee7bd1892 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2536,7 +2536,6 @@ static int nvme_pci_enable(struct nvme_dev *dev)
>
>  	dev->q_depth = min_t(u32, NVME_CAP_MQES(dev->ctrl.cap) + 1,
>  				io_queue_depth);
> -	dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */
>  	dev->db_stride = 1 << NVME_CAP_STRIDE(dev->ctrl.cap);
>  	dev->dbs = dev->bar + 4096;
>
> @@ -2577,7 +2576,7 @@ static int nvme_pci_enable(struct nvme_dev *dev)
>  		dev_warn(dev->ctrl.device, "IO queue depth clamped to %d\n",
>  			 dev->q_depth);
>  	}
> -
> +	dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */
>
>  	nvme_map_cmb(dev);
>
On Sat, Dec 24, 2022 at 2:19 AM Hugh Dickins <hughd@google.com> wrote:
>
> Regarding the awful 0's based queue depth: yes, it just looked to me
> as if the way that got handled in pci.c before differed from the way
> it gets handled in pci.c and core.c now, one too many "+ 1"s or "- 1"s
> somewhere.

The commit in question seems to replace nvme_pci_alloc_tag_set() calls
with nvme_alloc_io_tag_set(), and that has a big difference in how
queue_depth is set.

It used to do (in nvme_pci_alloc_tag_set()):

        set->queue_depth = min_t(unsigned, dev->q_depth, BLK_MQ_MAX_DEPTH) - 1;

but now it does (in nvme_alloc_io_tag_set())

        set->queue_depth = ctrl->sqsize + 1;

instead.

So that "set->queue_depth" _seems_ to have historically had that "-1"
(that "zero means one" that apparently sqsize also has), but the new
code basically undoes it.

I don't know the code at all, but this does all seem to be a change
(and *very* confusing).

The fact that Hugh gets it to work by doing

        set->queue_depth = ctrl->sqsize;

does seem to match the whole "it used to subtract one" behavior it had.
Which is why I assume Hugh tried that patch in the first place.

                Linus
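To make the off-by-one concrete, here is a small standalone sketch of the two computations Linus quotes, using an assumed q_depth of 512 (which matches the sqsize of 511 Hugh reads from sysfs) and BLK_MQ_MAX_DEPTH taken as 10240, its value in recent kernels; the exact figures on Hugh's machine are an assumption:

/* Worked example of the old vs. new tag-set depth calculation; not driver code. */
#include <stdio.h>

#define BLK_MQ_MAX_DEPTH 10240U  /* block layer cap on tag-set depth (recent kernels) */

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

int main(void)
{
	unsigned int q_depth = 512;         /* slots in the hardware submission queue */
	unsigned int sqsize = q_depth - 1;  /* 0's based form: 511 */

	/* old nvme_pci_alloc_tag_set(): 511 tags, one SQ slot always left free */
	unsigned int old_depth = min_u(q_depth, BLK_MQ_MAX_DEPTH) - 1;

	/* new nvme_alloc_io_tag_set(): 512 tags, the SQ can be completely filled */
	unsigned int new_depth = sqsize + 1;

	printf("old queue_depth=%u, new queue_depth=%u\n", old_depth, new_depth);
	return 0;
}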
On Fri, Dec 23, 2022 at 09:24:56PM -0800, Hugh Dickins wrote:
> Hi Christoph,
>
> There's something wrong with the nvme-pci heading for 6.2-rc1:
> no problem booting here on this Lenovo ThinkPad X1 Carbon 5th,
> but under load...
>
> nvme nvme0: I/O 0 (I/O Cmd) QID 2 timeout, aborting
> nvme nvme0: I/O 1 (I/O Cmd) QID 2 timeout, aborting
> nvme nvme0: I/O 2 (I/O Cmd) QID 2 timeout, aborting
> nvme nvme0: I/O 3 (I/O Cmd) QID 2 timeout, aborting
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: I/O 0 QID 2 timeout, reset controller
>
> ...and more, until I just have to poweroff and reboot.
>
> Bisection points to your
> 0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers")
> And that does revert cleanly, giving a kernel which shows no problem.
>
> I've spent a while comparing old nvme_pci_alloc_tag_set() and new
> nvme_alloc_io_tag_set(); I do not know my way around there at all
> and may be talking nonsense, but it did look as if there might now
> be a difference in the queue_depth, sqsize, q_depth conversions.
>
> I'm running load successfully with the patch below, but I strongly
> suspect that the right patch will be somewhere else: over to you!
>
> Hugh
>
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4926,7 +4926,7 @@ int nvme_alloc_io_tag_set(struct nvme_ct
>
>  	memset(set, 0, sizeof(*set));
>  	set->ops = ops;
> -	set->queue_depth = ctrl->sqsize + 1;
> +	set->queue_depth = ctrl->sqsize;

Your observation is a queue-wrap condition that makes it impossible for
the controller to know there are new commands.

Your patch does look like the correct thing to do. The "zero means one"
thing is a confusing distraction, I think. It makes more sense if you
consider sqsize as the maximum number of tags we can have outstanding at
one time, and it looks like all the drivers set it that way. We're
supposed to leave one slot empty for a full NVMe queue, so adding one
here to report the total number of slots isn't right, since that would
allow us to fill all slots.

Fabrics drivers have been using this method for a while, though, so
it's interesting they haven't had a similar problem.
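A simplified model of the queue-wrap condition Keith describes: the submission queue is a ring indexed by head and tail, and if the host is allowed to hand out as many tags as there are slots, the tail can advance all the way back onto the head, at which point "full" is indistinguishable from "empty" and the controller cannot tell there are new commands. This is an illustrative sketch under that assumption, not the driver's actual ring handling:

/* Toy model of an NVMe submission queue ring; not driver code. */
#include <stdbool.h>
#include <stdio.h>

struct sq {
	unsigned int head;   /* next entry the controller will consume */
	unsigned int tail;   /* next entry the host will fill */
	unsigned int depth;  /* number of slots (q_depth) */
};

static bool sq_can_submit(const struct sq *q)
{
	/* full when advancing tail would land on head: at most depth-1 in flight */
	return (q->tail + 1) % q->depth != q->head;
}

int main(void)
{
	struct sq q = { .head = 0, .tail = 0, .depth = 512 };
	unsigned int submitted = 0;

	while (sq_can_submit(&q)) {
		q.tail = (q.tail + 1) % q.depth;
		submitted++;
	}
	/* prints 511 of 512: one slot must stay empty */
	printf("max outstanding commands: %u of %u slots\n", submitted, q.depth);
	return 0;
}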
On Sat, Dec 24, 2022 at 03:06:38PM -0700, Keith Busch wrote:
> Your observation is a queue-wrap condition that makes it impossible for
> the controller to know there are new commands.
>
> Your patch does look like the correct thing to do. The "zero means one"
> thing is a confusing distraction, I think. It makes more sense if you
> consider sqsize as the maximum number of tags we can have outstanding at
> one time, and it looks like all the drivers set it that way. We're
> supposed to leave one slot empty for a full NVMe queue, so adding one
> here to report the total number of slots isn't right, since that would
> allow us to fill all slots.

Yes, and PCIe did actually do the - 1 from q_depth, so we should
drop the +1 for sqsize.  And add back the missing BLK_MQ_MAX_DEPTH.
But we still need to keep sqsize updated as well.

> Fabrics drivers have been using this method for a while, though, so
> it's interesting they haven't had a similar problem.

Fabrics doesn't have a real queue and thus no actual wrap, so
I don't think they will be hit as badly by this.

So we'll probably need something like this, split into two patches.
And then for 6.2 clean up the sqsize vs q_depth mess for real.

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 95c488ea91c303..5b723c65fbeab5 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4926,7 +4926,7 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
 
 	memset(set, 0, sizeof(*set));
 	set->ops = ops;
-	set->queue_depth = ctrl->sqsize + 1;
+	set->queue_depth = min_t(unsigned, ctrl->sqsize, BLK_MQ_MAX_DEPTH - 1);
 	/*
 	 * Some Apple controllers requires tags to be unique across admin and
 	 * the (only) I/O queue, so reserve the first 32 tags of the I/O queue.
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index f0f8027644bbf8..ec5e1c578a710b 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2332,10 +2332,12 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 	if (dev->cmb_use_sqes) {
 		result = nvme_cmb_qdepth(dev, nr_io_queues,
 				sizeof(struct nvme_command));
-		if (result > 0)
+		if (result > 0) {
 			dev->q_depth = result;
-		else
+			dev->ctrl.sqsize = dev->q_depth - 1;
+		} else {
 			dev->cmb_use_sqes = false;
+		}
 	}
 
 	do {
@@ -2536,7 +2538,6 @@ static int nvme_pci_enable(struct nvme_dev *dev)
 
 	dev->q_depth = min_t(u32, NVME_CAP_MQES(dev->ctrl.cap) + 1,
 				io_queue_depth);
-	dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */
 	dev->db_stride = 1 << NVME_CAP_STRIDE(dev->ctrl.cap);
 	dev->dbs = dev->bar + 4096;
 
@@ -2577,7 +2578,7 @@ static int nvme_pci_enable(struct nvme_dev *dev)
 		dev_warn(dev->ctrl.device, "IO queue depth clamped to %d\n",
 			 dev->q_depth);
 	}
-
+	dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */
 
 	nvme_map_cmb(dev);
On Sat, 24 Dec 2022, Christoph Hellwig wrote:
> On Sat, Dec 24, 2022 at 03:06:38PM -0700, Keith Busch wrote:
> > Your observation is a queue-wrap condition that makes it impossible for
> > the controller to know there are new commands.
> >
> > Your patch does look like the correct thing to do. The "zero means one"
> > thing is a confusing distraction, I think. It makes more sense if you
> > consider sqsize as the maximum number of tags we can have outstanding at
> > one time, and it looks like all the drivers set it that way. We're
> > supposed to leave one slot empty for a full NVMe queue, so adding one
> > here to report the total number of slots isn't right, since that would
> > allow us to fill all slots.
>
> Yes, and PCIe did actually do the - 1 from q_depth, so we should
> drop the +1 for sqsize.  And add back the missing BLK_MQ_MAX_DEPTH.
> But we still need to keep sqsize updated as well.
>
> > Fabrics drivers have been using this method for a while, though, so
> > it's interesting they haven't had a similar problem.
>
> Fabrics doesn't have a real queue and thus no actual wrap, so
> I don't think they will be hit as badly by this.
>
> So we'll probably need something like this, split into two patches.
> And then for 6.2 clean up the sqsize vs q_depth mess for real.

This patch is working fine for me; and, in the light of Keith's
explanation, so far as I can tell, seems the right thing to do.

Thanks!
Hugh

> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 95c488ea91c303..5b723c65fbeab5 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4926,7 +4926,7 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
>
>  	memset(set, 0, sizeof(*set));
>  	set->ops = ops;
> -	set->queue_depth = ctrl->sqsize + 1;
> +	set->queue_depth = min_t(unsigned, ctrl->sqsize, BLK_MQ_MAX_DEPTH - 1);
>  	/*
>  	 * Some Apple controllers requires tags to be unique across admin and
>  	 * the (only) I/O queue, so reserve the first 32 tags of the I/O queue.
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index f0f8027644bbf8..ec5e1c578a710b 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2332,10 +2332,12 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
>  	if (dev->cmb_use_sqes) {
>  		result = nvme_cmb_qdepth(dev, nr_io_queues,
>  				sizeof(struct nvme_command));
> -		if (result > 0)
> +		if (result > 0) {
>  			dev->q_depth = result;
> -		else
> +			dev->ctrl.sqsize = dev->q_depth - 1;
> +		} else {
>  			dev->cmb_use_sqes = false;
> +		}
>  	}
>
>  	do {
> @@ -2536,7 +2538,6 @@ static int nvme_pci_enable(struct nvme_dev *dev)
>
>  	dev->q_depth = min_t(u32, NVME_CAP_MQES(dev->ctrl.cap) + 1,
>  				io_queue_depth);
> -	dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */
>  	dev->db_stride = 1 << NVME_CAP_STRIDE(dev->ctrl.cap);
>  	dev->dbs = dev->bar + 4096;
>
> @@ -2577,7 +2578,7 @@ static int nvme_pci_enable(struct nvme_dev *dev)
>  		dev_warn(dev->ctrl.device, "IO queue depth clamped to %d\n",
>  			 dev->q_depth);
>  	}
> -
> +	dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */
>
>  	nvme_map_cmb(dev);
>
Linux regression tracking (Thorsten Leemhuis)
Jan. 4, 2023, 2:02 p.m. UTC |
#8
[TLDR: This mail is primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 24.12.22 08:52, Thorsten Leemhuis wrote:
> [Note: this mail contains only information for Linux kernel regression
> tracking. Mails like these contain '#forregzbot' in the subject to make
> them easy to spot and filter out. The author also tried to remove most
> or all individuals from the list of recipients to spare them the hassle.]
>
> On 24.12.22 06:24, Hugh Dickins wrote:
>>
>> There's something wrong with the nvme-pci heading for 6.2-rc1:
>> no problem booting here on this Lenovo ThinkPad X1 Carbon 5th,
>> but under load...
>> [...]
>> Bisection points to your
>> 0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers")
>> And that does revert cleanly, giving a kernel which shows no problem.
>> [...]
>
> Thanks for the report. To be sure the below issue doesn't fall through
> the cracks unnoticed, I'm adding it to regzbot, my Linux kernel
> regression tracking bot:
>
> #regzbot ^introduced 0da7feaa5913
> #regzbot title nvme-pci: problems under load
> #regzbot ignore-activity

#regzbot fix: 88d356ca41ba1c3effc2d4208dfbd43

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4926,7 +4926,7 @@ int nvme_alloc_io_tag_set(struct nvme_ct
 
 	memset(set, 0, sizeof(*set));
 	set->ops = ops;
-	set->queue_depth = ctrl->sqsize + 1;
+	set->queue_depth = ctrl->sqsize;
 	/*
 	 * Some Apple controllers requires tags to be unique across admin and
 	 * the (only) I/O queue, so reserve the first 32 tags of the I/O queue.