nvme_core: scan namespaces asynchronously

Message ID 20240104163826.10561-1-stuart.w.hayes@gmail.com
State New
Headers
Series nvme_core: scan namespaces asynchronously |

Commit Message

stuart hayes Jan. 4, 2024, 4:38 p.m. UTC
  Currently NVME namespaces are scanned serially, so it can take a long time
for all of a controller's namespaces to become available, especially with a
slower (fabrics) interface with large number (~1000) of namespaces.

Use async function calls to make namespace scanning happen in parallel,
and add a (boolean) module parameter "async_ns_scan" to enable this.

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
  

Comments

Keith Busch Jan. 4, 2024, 4:47 p.m. UTC | #1
On Thu, Jan 04, 2024 at 10:38:26AM -0600, Stuart Hayes wrote:
> Currently NVME namespaces are scanned serially, so it can take a long time
> for all of a controller's namespaces to become available, especially with a
> slower (fabrics) interface with large number (~1000) of namespaces.
> 
> Use async function calls to make namespace scanning happen in parallel,
> and add a (boolean) module parameter "async_ns_scan" to enable this.

Hm, we're not doing a whole lot of blocking IO to bring up a namespace,
so I'm a little surprised it makes a noticable difference. How much time
improvement are you observing by parallelizing the scan? Is there a
tipping point in Number of Namespaces where inline scanning is better
than asynchronous? And if it is a meaningful gain, let's not introduce
another module parameter to disable it.
  
Chaitanya Kulkarni Jan. 4, 2024, 11:06 p.m. UTC | #2
On 1/4/2024 8:38 AM, Stuart Hayes wrote:
> Currently NVME namespaces are scanned serially, so it can take a long time
> for all of a controller's namespaces to become available, especially with a
> slower (fabrics) interface with large number (~1000) of namespaces.
> 
> Use async function calls to make namespace scanning happen in parallel,
> and add a (boolean) module parameter "async_ns_scan" to enable this.
> 
> Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
> 

please update commit log with detailed setup information and
% time saved to make everyone else review it quickly ...

-ck
  
Max Gurtovoy Jan. 7, 2024, 12:40 a.m. UTC | #3
On 04/01/2024 18:47, Keith Busch wrote:
> On Thu, Jan 04, 2024 at 10:38:26AM -0600, Stuart Hayes wrote:
>> Currently NVME namespaces are scanned serially, so it can take a long time
>> for all of a controller's namespaces to become available, especially with a
>> slower (fabrics) interface with large number (~1000) of namespaces.
>>
>> Use async function calls to make namespace scanning happen in parallel,
>> and add a (boolean) module parameter "async_ns_scan" to enable this.
> 
> Hm, we're not doing a whole lot of blocking IO to bring up a namespace,
> so I'm a little surprised it makes a noticable difference. How much time
> improvement are you observing by parallelizing the scan? Is there a
> tipping point in Number of Namespaces where inline scanning is better
> than asynchronous? And if it is a meaningful gain, let's not introduce
> another module parameter to disable it.

I don't think it is a good idea since some of the namespace 
characteristics must be validated during re-connection time for example.
I actually prepared a patch that makes sure we sync the ns scanning 
before kicking the ns blk queue to avoid that situations.
for example, if for some reason ns1 change its uuid then we must remove 
it and open a new bdev instead. We can't kick old request to it...
  
stuart hayes Jan. 12, 2024, 7:36 p.m. UTC | #4
> 
> 
> On 04/01/2024 18:47, Keith Busch wrote:
>> On Thu, Jan 04, 2024 at 10:38:26AM -0600, Stuart Hayes wrote:
>>> Currently NVME namespaces are scanned serially, so it can take a long time
>>> for all of a controller's namespaces to become available, especially with a
>>> slower (fabrics) interface with large number (~1000) of namespaces.
>>>
>>> Use async function calls to make namespace scanning happen in parallel,
>>> and add a (boolean) module parameter "async_ns_scan" to enable this.
>>
>> Hm, we're not doing a whole lot of blocking IO to bring up a namespace,
>> so I'm a little surprised it makes a noticable difference. How much time
>> improvement are you observing by parallelizing the scan? Is there a
>> tipping point in Number of Namespaces where inline scanning is better
>> than asynchronous? And if it is a meaningful gain, let's not introduce
>> another module parameter to disable it.
> 
> I don't think it is a good idea since some of the namespace characteristics must be validated during re-connection time for example.
> I actually prepared a patch that makes sure we sync the ns scanning before kicking the ns blk queue to avoid that situations.
> for example, if for some reason ns1 change its uuid then we must remove it and open a new bdev instead. We can't kick old request to it...
> 


Sorry for the delayed response--I thought I could get exact data on how long it takes with and
without the patch before I responded, it is taking a while (I'm having to rely on someone else
to do the testing).  I'll respond with the data as soon as I get it--hopefully it won't be too
much longer.  The time it takes to scan namespaces adds up when there are 1000 namespaces and
you have a fabrics controller on a network that isn't too fast.

I don't expect there would be any reason to disable this.  I only put the module parameter to
disable it in case there was some unforeseen issue, but I can remove that.

To Max Gurtovoy--this patch wouldn't change when or how namespaces are validated... it just
puts the actual scan work function on a workqueue so the scans can happen in parallel.  It will
do the same work to scan, at the same point, and it will wait for all the scanning to finish
before proceeding.  I don't understand how this patch would make the situation you mention any
worse.
  
stuart hayes Jan. 16, 2024, 7:14 p.m. UTC | #5
On 1/12/2024 1:36 PM, stuart hayes wrote:
> 
>>
>>
>> On 04/01/2024 18:47, Keith Busch wrote:
>>> On Thu, Jan 04, 2024 at 10:38:26AM -0600, Stuart Hayes wrote:
>>>> Currently NVME namespaces are scanned serially, so it can take a long time
>>>> for all of a controller's namespaces to become available, especially with a
>>>> slower (fabrics) interface with large number (~1000) of namespaces.
>>>>
>>>> Use async function calls to make namespace scanning happen in parallel,
>>>> and add a (boolean) module parameter "async_ns_scan" to enable this.
>>>
>>> Hm, we're not doing a whole lot of blocking IO to bring up a namespace,
>>> so I'm a little surprised it makes a noticable difference. How much time
>>> improvement are you observing by parallelizing the scan? Is there a
>>> tipping point in Number of Namespaces where inline scanning is better
>>> than asynchronous? And if it is a meaningful gain, let's not introduce
>>> another module parameter to disable it.
>>
>> I don't think it is a good idea since some of the namespace characteristics must be validated during re-connection time for example.
>> I actually prepared a patch that makes sure we sync the ns scanning before kicking the ns blk queue to avoid that situations.
>> for example, if for some reason ns1 change its uuid then we must remove it and open a new bdev instead. We can't kick old request to it...
>>
> 
> 
> Sorry for the delayed response--I thought I could get exact data on how long it takes with and
> without the patch before I responded, it is taking a while (I'm having to rely on someone else
> to do the testing).  I'll respond with the data as soon as I get it--hopefully it won't be too
> much longer.  The time it takes to scan namespaces adds up when there are 1000 namespaces and
> you have a fabrics controller on a network that isn't too fast.
> 
> I don't expect there would be any reason to disable this.  I only put the module parameter to
> disable it in case there was some unforeseen issue, but I can remove that.
> 
> To Max Gurtovoy--this patch wouldn't change when or how namespaces are validated... it just
> puts the actual scan work function on a workqueue so the scans can happen in parallel.  It will
> do the same work to scan, at the same point, and it will wait for all the scanning to finish
> before proceeding.  I don't understand how this patch would make the situation you mention any
> worse.
> 

I have numbers for the namespace scan time improvement.  Below is the amount of time it took for
all of the namespaces to show up when connecting to a controller with 1002 namespaces:

network latency   time without patch    time with patch
   0                        6s                 1s
  50                      210s                10s
100                      417s                18s

I'll prepare a v2, removing the module parameter and including this data.
  
Chaitanya Kulkarni Jan. 16, 2024, 9:20 p.m. UTC | #6
> I have numbers for the namespace scan time improvement.  Below is the 
> amount of time it took for
> all of the namespaces to show up when connecting to a controller with 
> 1002 namespaces:
>
> network latency   time without patch    time with patch
>   0                        6s                 1s
>  50                      210s                10s
> 100                      417s                18s
>
> I'll prepare a v2, removing the module parameter and including this data.
>

interesting! please include this data either in the cover-letter or in the
commit log for v2 ..

-ck
  
Sagi Grimberg Jan. 17, 2024, 2:15 p.m. UTC | #7
On 1/16/24 21:14, stuart hayes wrote:
> 
> 
> On 1/12/2024 1:36 PM, stuart hayes wrote:
>>
>>>
>>>
>>> On 04/01/2024 18:47, Keith Busch wrote:
>>>> On Thu, Jan 04, 2024 at 10:38:26AM -0600, Stuart Hayes wrote:
>>>>> Currently NVME namespaces are scanned serially, so it can take a 
>>>>> long time
>>>>> for all of a controller's namespaces to become available, 
>>>>> especially with a
>>>>> slower (fabrics) interface with large number (~1000) of namespaces.
>>>>>
>>>>> Use async function calls to make namespace scanning happen in 
>>>>> parallel,
>>>>> and add a (boolean) module parameter "async_ns_scan" to enable this.
>>>>
>>>> Hm, we're not doing a whole lot of blocking IO to bring up a namespace,
>>>> so I'm a little surprised it makes a noticable difference. How much 
>>>> time
>>>> improvement are you observing by parallelizing the scan? Is there a
>>>> tipping point in Number of Namespaces where inline scanning is better
>>>> than asynchronous? And if it is a meaningful gain, let's not introduce
>>>> another module parameter to disable it.
>>>
>>> I don't think it is a good idea since some of the namespace 
>>> characteristics must be validated during re-connection time for example.
>>> I actually prepared a patch that makes sure we sync the ns scanning 
>>> before kicking the ns blk queue to avoid that situations.
>>> for example, if for some reason ns1 change its uuid then we must 
>>> remove it and open a new bdev instead. We can't kick old request to 
>>> it...
>>>
>>
>>
>> Sorry for the delayed response--I thought I could get exact data on 
>> how long it takes with and
>> without the patch before I responded, it is taking a while (I'm having 
>> to rely on someone else
>> to do the testing).  I'll respond with the data as soon as I get 
>> it--hopefully it won't be too
>> much longer.  The time it takes to scan namespaces adds up when there 
>> are 1000 namespaces and
>> you have a fabrics controller on a network that isn't too fast.
>>
>> I don't expect there would be any reason to disable this.  I only put 
>> the module parameter to
>> disable it in case there was some unforeseen issue, but I can remove 
>> that.
>>
>> To Max Gurtovoy--this patch wouldn't change when or how namespaces are 
>> validated... it just
>> puts the actual scan work function on a workqueue so the scans can 
>> happen in parallel.  It will
>> do the same work to scan, at the same point, and it will wait for all 
>> the scanning to finish
>> before proceeding.  I don't understand how this patch would make the 
>> situation you mention any
>> worse.
>>
> 
> I have numbers for the namespace scan time improvement.  Below is the 
> amount of time it took for
> all of the namespaces to show up when connecting to a controller with 
> 1002 namespaces:
> 
> network latency   time without patch    time with patch
>    0                        6s                 1s
>   50                      210s                10s
> 100                      417s                18s


That is a big improvement. I wouldn't say that 1000+ namespaces
is a common requirement. But the improvement speaks for itself.
  

Patch

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 60f14019f981..aeda60469872 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4,6 +4,7 @@ 
  * Copyright (c) 2011-2014, Intel Corporation.
  */
 
+#include <linux/async.h>
 #include <linux/blkdev.h>
 #include <linux/blk-mq.h>
 #include <linux/blk-integrity.h>
@@ -88,6 +89,10 @@  module_param(apst_secondary_latency_tol_us, ulong, 0644);
 MODULE_PARM_DESC(apst_secondary_latency_tol_us,
 	"secondary APST latency tolerance in us");
 
+static bool async_ns_scan;
+module_param(async_ns_scan, bool, 0644);
+MODULE_PARM_DESC(async_ns_scan, "allow namespaces to be scanned asynchronously");
+
 /*
  * nvme_wq - hosts nvme related works that are not reset or delete
  * nvme_reset_wq - hosts nvme reset works
@@ -3780,12 +3785,38 @@  static void nvme_validate_ns(struct nvme_ns *ns, struct nvme_ns_info *info)
 		nvme_ns_remove(ns);
 }
 
-static void nvme_scan_ns(struct nvme_ctrl *ctrl, unsigned nsid)
+/*
+ * struct nvme_scan_state - keeps track of controller & NSIDs to scan
+ * @ctrl:	Controller on which namespaces are being scanned
+ * @count:	Next NSID to scan (for sequential scan), or
+ *		Index of next NSID to scan in ns_list (for list scan)
+ * @ns_list:	pointer to list of NSIDs to scan (NULL if sequential scan)
+ */
+struct nvme_scan_state {
+	struct nvme_ctrl *ctrl;
+	atomic_t count;
+	__le32 *ns_list;
+};
+
+static void nvme_scan_ns(void *data, async_cookie_t cookie)
 {
-	struct nvme_ns_info info = { .nsid = nsid };
+	struct nvme_ns_info info = {};
+	struct nvme_scan_state *scan_state;
+	struct nvme_ctrl *ctrl;
+	u32 nsid;
 	struct nvme_ns *ns;
 	int ret;
 
+	scan_state = data;
+	ctrl = scan_state->ctrl;
+	nsid = (u32)atomic_fetch_add(1, &scan_state->count);
+	/*
+	 * get NSID from list (if scanning from a list, not sequentially)
+	 */
+	if (scan_state->ns_list)
+		nsid = le32_to_cpu(scan_state->ns_list[nsid]);
+
+	info.nsid = nsid;
 	if (nvme_identify_ns_descs(ctrl, &info))
 		return;
 
@@ -3849,11 +3880,15 @@  static int nvme_scan_ns_list(struct nvme_ctrl *ctrl)
 	__le32 *ns_list;
 	u32 prev = 0;
 	int ret = 0, i;
+	ASYNC_DOMAIN(domain);
+	struct nvme_scan_state scan_state;
 
 	ns_list = kzalloc(NVME_IDENTIFY_DATA_SIZE, GFP_KERNEL);
 	if (!ns_list)
 		return -ENOMEM;
 
+	scan_state.ctrl = ctrl;
+	scan_state.ns_list = ns_list;
 	for (;;) {
 		struct nvme_command cmd = {
 			.identify.opcode	= nvme_admin_identify,
@@ -3869,19 +3904,30 @@  static int nvme_scan_ns_list(struct nvme_ctrl *ctrl)
 			goto free;
 		}
 
+		/*
+		 * scan list starting at list offset 0
+		 */
+		atomic_set(&scan_state.count, 0);
 		for (i = 0; i < nr_entries; i++) {
 			u32 nsid = le32_to_cpu(ns_list[i]);
 
 			if (!nsid)	/* end of the list? */
 				goto out;
-			nvme_scan_ns(ctrl, nsid);
+
+			if (async_ns_scan)
+				async_schedule_domain(nvme_scan_ns, &scan_state, &domain);
+			else
+				nvme_scan_ns(&scan_state, 0);
+
 			while (++prev < nsid)
 				nvme_ns_remove_by_nsid(ctrl, prev);
 		}
+		async_synchronize_full_domain(&domain);
 	}
  out:
 	nvme_remove_invalid_namespaces(ctrl, prev);
  free:
+	async_synchronize_full_domain(&domain);
 	kfree(ns_list);
 	return ret;
 }
@@ -3890,14 +3936,26 @@  static void nvme_scan_ns_sequential(struct nvme_ctrl *ctrl)
 {
 	struct nvme_id_ctrl *id;
 	u32 nn, i;
+	ASYNC_DOMAIN(domain);
+	struct nvme_scan_state scan_state;
 
 	if (nvme_identify_ctrl(ctrl, &id))
 		return;
 	nn = le32_to_cpu(id->nn);
 	kfree(id);
 
+	scan_state.ctrl = ctrl;
+	/*
+	 * scan sequentially starting at NSID 1
+	 */
+	atomic_set(&scan_state.count, 1);
+	scan_state.ns_list = NULL;
 	for (i = 1; i <= nn; i++)
-		nvme_scan_ns(ctrl, i);
+		if (async_ns_scan)
+			async_schedule_domain(nvme_scan_ns, &scan_state, &domain);
+		else
+			nvme_scan_ns(&scan_state, 0);
+	async_synchronize_full_domain(&domain);
 
 	nvme_remove_invalid_namespaces(ctrl, nn);
 }