Message ID | 20240118210303.10484-1-stuart.w.hayes@gmail.com |
---|---|
State | New |
Headers |
From: Stuart Hayes <stuart.w.hayes@gmail.com>
To: linux-kernel@vger.kernel.org, Keith Busch <kbusch@kernel.org>, Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>, Sagi Grimberg <sagi@grimberg.me>, linux-nvme@lists.infradead.org
Cc: Stuart Hayes <stuart.w.hayes@gmail.com>
Subject: [PATCH v2] nvme_core: scan namespaces asynchronously
Date: Thu, 18 Jan 2024 15:03:03 -0600
Message-Id: <20240118210303.10484-1-stuart.w.hayes@gmail.com>
X-Mailer: git-send-email 2.39.3
List-Id: <linux-kernel.vger.kernel.org> |
Series | [v2] nvme_core: scan namespaces asynchronously |
Commit Message
Stuart Hayes
Jan. 18, 2024, 9:03 p.m. UTC
Use async function calls to make namespace scanning happen in parallel.

Without the patch, NVMe namespaces are scanned serially, so it can take a
long time for all of a controller's namespaces to become available,
especially over a slower (TCP) transport with a large number of namespaces.

The time it took for all namespaces to show up after connecting (via TCP)
to a controller with 1002 namespaces was measured:
network latency    without patch    with patch
      0                  6s             1s
     50ms              210s            10s
    100ms              417s            18s
Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
--
V2: remove module param to enable/disable async scanning
add scan time measurements to commit message
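A back-of-envelope model (my own illustration, not from the thread: the ~4 admin round trips per namespace and the worker count are assumptions) shows why serial scan time scales with network latency while async scanning largely hides it:

```c
/*
 * Rough model of the measured scan times. Illustrative only: the number
 * of admin commands per namespace and the concurrency are assumptions,
 * not figures from the patch.
 */

/* Serial scan: every namespace pays its round trips back to back. */
double serial_secs(int ns, int cmds_per_ns, double rtt)
{
	return ns * cmds_per_ns * rtt;
}

/*
 * Async scan: up to `workers` scans overlap, so wall time shrinks by
 * roughly the concurrency factor (idealized; ignores queueing and
 * per-command service time).
 */
double async_secs(int ns, int cmds_per_ns, double rtt, int workers)
{
	return ((ns + workers - 1) / workers) * cmds_per_ns * rtt;
}
```

With 1002 namespaces, 4 commands each, and 100 ms latency, the serial estimate is ~401 s, in the same ballpark as the measured 417 s; with a few dozen scans in flight the estimate drops to single-digit seconds, matching the direction of the measured 18 s.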
Comments
On 1/18/24 23:03, Stuart Hayes wrote:
> Use async function calls to make namespace scanning happen in parallel.
>
> The time it took for all namespaces to show up after connecting (via TCP)
> to a controller with 1002 namespaces was measured:
>
> network latency    without patch    with patch
>       0                  6s             1s
>      50ms              210s            10s
>     100ms              417s            18s

Impressive speedup. Not a very common use-case though...

> +static void nvme_scan_ns(void *data, async_cookie_t cookie)

I think it's better to call it nvme_scan_ns_async to indicate what it is.

> +	scan_state = data;
> +	ctrl = scan_state->ctrl;

I think these assignments can be done on the declaration.

> +	nsid = (u32)atomic_fetch_add(1, &scan_state->count);
> +	/*
> +	 * get NSID from list (if scanning from a list, not sequentially)
> +	 */
> +	if (scan_state->ns_list)
> +		nsid = le32_to_cpu(scan_state->ns_list[nsid]);

This is awkward: ns_list is passed in optionally. How about we limit this
change to only operate on nvme_scan_ns_list? If the controller is old or
quirked to support only a sequential scan, it does not benefit from a
parallel scan. I doubt that these controllers are likely to expose a large
number of namespaces anyway.

> +	scan_state.ctrl = ctrl;
> +	scan_state.ns_list = ns_list;

Is there a need to have a local ns_list variable here?

>  out:
>  	nvme_remove_invalid_namespaces(ctrl, prev);

Is it a good idea to remove the invalid namespaces before synchronizing
the async scans?

I think we need a blktest for this. ns scanning has been notorious when
running simultaneously with controller reset/reconnect/remove sequences...
Ideally a test with a larger number of namespaces to exercise the code.

Also, make sure that the blktest suite does not complain about anything else.
Resending... didn't make it to the list, probably smtp issues....
On Mon, Jan 22, 2024 at 11:13:15AM +0200, Sagi Grimberg wrote:
> On 1/18/24 23:03, Stuart Hayes wrote:
> > -			nvme_scan_ns(ctrl, nsid);
> > +			async_schedule_domain(nvme_scan_ns, &scan_state, &domain);
> > [...]
> > +		async_synchronize_full_domain(&domain);

You mentioned async scanning was an improvement if you have 1000
namespaces, but wouldn't this be worse if you have very few namespaces?
IOW, the decision to use the async schedule should be based on
nr_entries, right?
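Keith's suggestion amounts to gating the fan-out on the list size. A hypothetical sketch (the helper name and threshold are made up for illustration, not part of the posted patch):

```c
/*
 * Hypothetical gate: only fan out to the async machinery when the
 * Identify list returned enough NSIDs to amortize the cost of
 * scheduling one async callback per namespace; otherwise scan inline
 * as the driver did before. The threshold value is illustrative.
 */
#define ASYNC_SCAN_MIN_ENTRIES 8

int use_async_scan(int nr_entries)
{
	return nr_entries >= ASYNC_SCAN_MIN_ENTRIES;
}
```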
On 1/23/2024 8:37 AM, Keith Busch wrote:
> You mentioned async scanning was an improvement if you have 1000
> namespaces, but wouldn't this be worse if you have very few namespaces?
> IOW, the decision to use the async schedule should be based on
> nr_entries, right?

Perhaps it's also helpful to document the data for a small number of
namespaces; we can think of collecting data something like this:

NR Namespaces    Seq Scan    Async Scan
    2
    4
    8
   16
   32
   64
  128
  256
  512
 1024

If we find that the difference is not that much, then we can go ahead with
this patch; if the difference is not acceptable to the point that it will
regress for common setups, then we can make it configurable?

-ck
On 1/23/2024 2:21 PM, Chaitanya Kulkarni wrote:
> Perhaps it's also helpful to document the data for a small number of
> namespaces [...] then we can make it configurable?

I believe the only reason the async scanning should take any longer than
sync scanning is that nvme_scan_ns() has to wait on the workqueue until it
is scheduled. Testing on my system (with PCIe NVMe devices with a single
namespace), it looks like it only takes a fraction of a ms (100us or less,
typically) for that to happen. Then it takes 6-10ms or more for the actual
namespace scan. So scanning asynchronously, even using a local PCIe device
with a single namespace, doesn't take significantly longer.

Of course I guess it might take a bit longer on a busy system, but I
wouldn't think that scanning namespaces is a critical path where a few
milliseconds would make much difference (?).

It wouldn't be too hard to make it scan synchronously if there aren't more
than, say, a couple of namespaces, but my opinion is that the minimal
benefit wouldn't be worth the extra code.
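The work-distribution trick the patch relies on — a shared atomic counter handing each async callback the next index, with no further locking — can be sketched in plain userspace C. This is an analogue, not the kernel code: pthreads stand in for the async domain, and a fixed pool of workers loops claiming indices, whereas the kernel schedules one callback per namespace; the atomic claim is the same.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <string.h>

#define NR_NS 1002

static atomic_uint count;        /* next index to claim, like scan_state.count */
static unsigned scanned[NR_NS];  /* how many times each index was "scanned" */

/*
 * Stand-in for nvme_scan_ns(): claim the next index with fetch_add,
 * then do the (here trivial) per-namespace work. fetch_add guarantees
 * each index is handed to exactly one worker.
 */
static void *scan_worker(void *arg)
{
	(void)arg;
	for (;;) {
		unsigned idx = atomic_fetch_add(&count, 1);

		if (idx >= NR_NS)
			return NULL;
		scanned[idx]++;  /* "scan" namespace idx */
	}
}

/*
 * Run nthreads workers over NR_NS indices; returns 0 iff every index
 * was scanned exactly once (i.e. no index lost or scanned twice).
 */
int run_scan(int nthreads)
{
	pthread_t tids[64];
	int i;

	if (nthreads > 64)
		nthreads = 64;
	atomic_store(&count, 0);
	memset(scanned, 0, sizeof(scanned));
	for (i = 0; i < nthreads; i++)
		pthread_create(&tids[i], NULL, scan_worker, NULL);
	for (i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
	for (i = 0; i < NR_NS; i++)
		if (scanned[i] != 1)
			return -1;
	return 0;
}
```

The per-scan cost added by this scheme is just one atomic increment plus the scheduling latency Stuart measured, which is why the overhead for a small namespace count stays negligible.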
On 1/22/2024 3:13 AM, Sagi Grimberg wrote:
> I think we need a blktest for this. ns scanning has been notorious when
> running simultaneously with controller reset/reconnect/remove sequences...
> Ideally a test with a larger number of namespaces to exercise the code.

Thank you for the feedback on the patch, I agree with it. I'm not sure how
to implement a blktest suite for this, though. I can look into it.
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0af612387083..069350f85b83 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4,6 +4,7 @@
  * Copyright (c) 2011-2014, Intel Corporation.
  */
 
+#include <linux/async.h>
 #include <linux/blkdev.h>
 #include <linux/blk-mq.h>
 #include <linux/blk-integrity.h>
@@ -3812,12 +3813,38 @@ static void nvme_validate_ns(struct nvme_ns *ns, struct nvme_ns_info *info)
 		nvme_ns_remove(ns);
 }
 
-static void nvme_scan_ns(struct nvme_ctrl *ctrl, unsigned nsid)
+/*
+ * struct nvme_scan_state - keeps track of controller & NSIDs to scan
+ * @ctrl:	Controller on which namespaces are being scanned
+ * @count:	Next NSID to scan (for sequential scan), or
+ *		Index of next NSID to scan in ns_list (for list scan)
+ * @ns_list:	pointer to list of NSIDs to scan (NULL if sequential scan)
+ */
+struct nvme_scan_state {
+	struct nvme_ctrl *ctrl;
+	atomic_t count;
+	__le32 *ns_list;
+};
+
+static void nvme_scan_ns(void *data, async_cookie_t cookie)
 {
-	struct nvme_ns_info info = { .nsid = nsid };
+	struct nvme_ns_info info = {};
+	struct nvme_scan_state *scan_state;
+	struct nvme_ctrl *ctrl;
+	u32 nsid;
 	struct nvme_ns *ns;
 	int ret;
 
+	scan_state = data;
+	ctrl = scan_state->ctrl;
+	nsid = (u32)atomic_fetch_add(1, &scan_state->count);
+	/*
+	 * get NSID from list (if scanning from a list, not sequentially)
+	 */
+	if (scan_state->ns_list)
+		nsid = le32_to_cpu(scan_state->ns_list[nsid]);
+
+	info.nsid = nsid;
 	if (nvme_identify_ns_descs(ctrl, &info))
 		return;
 
@@ -3881,11 +3908,15 @@ static int nvme_scan_ns_list(struct nvme_ctrl *ctrl)
 	__le32 *ns_list;
 	u32 prev = 0;
 	int ret = 0, i;
+	ASYNC_DOMAIN(domain);
+	struct nvme_scan_state scan_state;
 
 	ns_list = kzalloc(NVME_IDENTIFY_DATA_SIZE, GFP_KERNEL);
 	if (!ns_list)
 		return -ENOMEM;
 
+	scan_state.ctrl = ctrl;
+	scan_state.ns_list = ns_list;
 	for (;;) {
 		struct nvme_command cmd = {
 			.identify.opcode	= nvme_admin_identify,
@@ -3901,19 +3932,25 @@ static int nvme_scan_ns_list(struct nvme_ctrl *ctrl)
 			goto free;
 		}
 
+		/*
+		 * scan list starting at list offset 0
+		 */
+		atomic_set(&scan_state.count, 0);
 		for (i = 0; i < nr_entries; i++) {
 			u32 nsid = le32_to_cpu(ns_list[i]);
 
 			if (!nsid)	/* end of the list? */
 				goto out;
-			nvme_scan_ns(ctrl, nsid);
+			async_schedule_domain(nvme_scan_ns, &scan_state, &domain);
 			while (++prev < nsid)
 				nvme_ns_remove_by_nsid(ctrl, prev);
 		}
+		async_synchronize_full_domain(&domain);
 	}
 out:
 	nvme_remove_invalid_namespaces(ctrl, prev);
 free:
+	async_synchronize_full_domain(&domain);
 	kfree(ns_list);
 	return ret;
 }
@@ -3922,14 +3959,23 @@ static void nvme_scan_ns_sequential(struct nvme_ctrl *ctrl)
 {
 	struct nvme_id_ctrl *id;
 	u32 nn, i;
+	ASYNC_DOMAIN(domain);
+	struct nvme_scan_state scan_state;
 
 	if (nvme_identify_ctrl(ctrl, &id))
 		return;
 	nn = le32_to_cpu(id->nn);
 	kfree(id);
 
+	scan_state.ctrl = ctrl;
+	/*
+	 * scan sequentially starting at NSID 1
+	 */
+	atomic_set(&scan_state.count, 1);
+	scan_state.ns_list = NULL;
 	for (i = 1; i <= nn; i++)
-		nvme_scan_ns(ctrl, i);
+		async_schedule_domain(nvme_scan_ns, &scan_state, &domain);
+	async_synchronize_full_domain(&domain);
 
 	nvme_remove_invalid_namespaces(ctrl, nn);
 }