[v0,1/6] nvme-fabrics: introduce connect_sync option

Message ID 20240216084526.14133-2-dwagner@suse.de
State New
Series nvme-fc: fix blktests nvme/041

Commit Message

Daniel Wagner Feb. 16, 2024, 8:45 a.m. UTC
The TCP and RDMA transports do a synchronous connect, meaning the
syscall returns with the final result; that is, it either failed or
succeeded.

This isn't the case for FC. This transport just sets up and triggers
the connect and returns without waiting for the result. Introduce a flag
to allow user space to control the behavior: wait or don't wait.
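
For illustration, a minimal user-space sketch of the intended usage
(the FC address values are placeholders and the snippet is illustrative
only, not part of this series): the option string written to
/dev/nvme-fabrics simply gains the new connect_sync token.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* Placeholder addresses; substitute real nn-/pn- values. */
	const char *opts =
		"transport=fc,"
		"traddr=nn-0x200000109b579d5f:pn-0x100000109b579d5f,"
		"host_traddr=nn-0x200000109b579d60:pn-0x100000109b579d60,"
		"nqn=nqn.2014-08.org.nvmexpress.discovery,"
		"connect_sync";
	int fd = open("/dev/nvme-fabrics", O_RDWR);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* With connect_sync set, the write(2) only returns once the
	 * connect attempt has reached a final result. */
	if (write(fd, opts, strlen(opts)) < 0)
		perror("connect");
	close(fd);
	return 0;
}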

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 drivers/nvme/host/fabrics.c | 6 +++++-
 drivers/nvme/host/fabrics.h | 3 +++
 2 files changed, 8 insertions(+), 1 deletion(-)
  

Comments

Christoph Hellwig Feb. 16, 2024, 9:49 a.m. UTC | #1
On Fri, Feb 16, 2024 at 09:45:21AM +0100, Daniel Wagner wrote:
> The TCP and RDMA transports do a synchronous connect, meaning the
> syscall returns with the final result; that is, it either failed or
> succeeded.
> 
> This isn't the case for FC. This transport just sets up and triggers
> the connect and returns without waiting for the result.

That's really weird and unexpected.  James, can you explain the reason
behind this?

> Introduce a flag
> to allow user space to control the behavior: wait or don't wait.

I'd expect this to be the default, but I'll wait to hear more about
the rationale.  If we keep the async default the option looks sensible.
  
Daniel Wagner Feb. 16, 2024, 4:44 p.m. UTC | #2
On Fri, Feb 16, 2024 at 10:49:09AM +0100, Christoph Hellwig wrote:
> On Fri, Feb 16, 2024 at 09:45:21AM +0100, Daniel Wagner wrote:
> > The TCP and RDMA transports do a synchronous connect, meaning the
> > syscall returns with the final result; that is, it either failed or
> > succeeded.
> > 
> > This isn't the case for FC. This transport just sets up and triggers
> > the connect and returns without waiting for the result.
> 
> That's really weird and unexpected.  James, can you explain the reason
> behind this?

James answered this point during my earlier attempt to make this synchronous:

https://lore.kernel.org/linux-nvme/0605ac36-16d5-2026-d3c6-62d346db6dfb@gmail.com/

> > Introduce a flag
> > to allow user space to control the behavior: wait or don't wait.
> 
> I'd expect this to be the default, but I'll wait to hear more about
> the rationale.  If we keep the async default the option looks sensible.

Ideally, we could agree on a behavior that is the same for all
transports.
  
Hannes Reinecke Feb. 17, 2024, 4:27 p.m. UTC | #3
On 2/16/24 10:49, Christoph Hellwig wrote:
> On Fri, Feb 16, 2024 at 09:45:21AM +0100, Daniel Wagner wrote:
>> The TCP and RDMA transports do a synchronous connect, meaning the
>> syscall returns with the final result; that is, it either failed or
>> succeeded.
>>
>> This isn't the case for FC. This transport just sets up and triggers
>> the connect and returns without waiting for the result.
> 
> That's really weird and unexpected.  James, can you explain the reason
> behind this?
> 
The reason is that the initial connect attempt might fail with a
temporary error and will need to be retried. Rather than implementing
two methods for handling this (one for the initial connect, and another
one for reconnects, where one _has_ to use a workqueue), as e.g. TCP and
RDMA have done, FC uses a single code path for both.

A temporary failure on the initial connect is far more likely on FC than
on other transports due to the way FC-NVMe is modelled: essentially
one has to log into the remote port once per protocol. So if you run
a dual fabric (with both FCP and NVMe) you'll need to log into the
same remote port twice. Depending on the implementation, the target might
only be capable of handling one port login at a time, so the
other one fails with a temporary error.
That's why it's a common issue with FC. It _might_ happen with TCP, too,
but apparently not regularly, otherwise we would have seen quite a few
failures by now; TCP can't really handle temporary failures on the
initial connect.
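
To make that concrete, here is a hypothetical sketch (all demo_* names
are invented; this is not the actual fc driver code) of how a transport
that funnels initial connect and reconnect through a single work item
could honor the proposed connect_sync option:

#include <linux/completion.h>
#include <linux/errno.h>
#include <linux/jiffies.h>
#include <linux/workqueue.h>

#include "fabrics.h"	/* for struct nvmf_ctrl_options */

/* All identifiers below are invented for illustration. */
struct demo_ctrl {
	struct nvmf_ctrl_options *opts;
	struct delayed_work connect_work;	/* requeued on temporary errors */
	struct completion connect_done;		/* fires once the result is final */
	int connect_err;
};

static int demo_try_connect(struct demo_ctrl *ctrl);	/* transport specific */

static void demo_connect_work(struct work_struct *work)
{
	struct demo_ctrl *ctrl = container_of(to_delayed_work(work),
					      struct demo_ctrl, connect_work);

	ctrl->connect_err = demo_try_connect(ctrl);
	if (ctrl->connect_err == -EAGAIN) {
		/* Temporary failure (e.g. port login clash): retry later. */
		queue_delayed_work(system_wq, &ctrl->connect_work, HZ);
		return;
	}
	complete(&ctrl->connect_done);
}

static int demo_create_ctrl(struct demo_ctrl *ctrl)
{
	init_completion(&ctrl->connect_done);
	INIT_DELAYED_WORK(&ctrl->connect_work, demo_connect_work);

	/* Initial connect and reconnect share this one code path. */
	queue_delayed_work(system_wq, &ctrl->connect_work, 0);

	if (!ctrl->opts->connect_sync)
		return 0;	/* async: return before the outcome is known */

	/* sync: block until the (possibly retried) connect settles. */
	wait_for_completion(&ctrl->connect_done);
	return ctrl->connect_err;
}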

Cheers,

Hannes
  
Christoph Hellwig Feb. 20, 2024, 6:51 a.m. UTC | #4
On Fri, Feb 16, 2024 at 05:44:02PM +0100, Daniel Wagner wrote:
> James answered this point during my earlier attempt to make this synchronous:
> 
> https://lore.kernel.org/linux-nvme/0605ac36-16d5-2026-d3c6-62d346db6dfb@gmail.com/

That needs to go into the commit log.  And I call complete BS on that
to be honest.

> Ideally, we could agree on behavior which is the same for all
> transports.

Sync and async opt-in it is.  I'm still pissed FC did this differently
without any proper discussion of the tradeoffs.
  

Patch

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 3499acbf6a82..7d33f0f5824f 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -678,6 +678,7 @@  static const match_table_t opt_tokens = {
 #ifdef CONFIG_NVME_TCP_TLS
 	{ NVMF_OPT_TLS,			"tls"			},
 #endif
+	{ NVMF_OPT_CONNECT_SYNC,	"connect_sync"		},
 	{ NVMF_OPT_ERR,			NULL			}
 };
 
@@ -1024,6 +1025,9 @@  static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
 			}
 			opts->tls = true;
 			break;
+		case NVMF_OPT_CONNECT_SYNC:
+			opts->connect_sync = true;
+			break;
 		default:
 			pr_warn("unknown parameter or missing value '%s' in ctrl creation request\n",
 				p);
@@ -1245,7 +1249,7 @@  EXPORT_SYMBOL_GPL(nvmf_free_options);
 				 NVMF_OPT_HOST_ID | NVMF_OPT_DUP_CONNECT |\
 				 NVMF_OPT_DISABLE_SQFLOW | NVMF_OPT_DISCOVERY |\
 				 NVMF_OPT_FAIL_FAST_TMO | NVMF_OPT_DHCHAP_SECRET |\
-				 NVMF_OPT_DHCHAP_CTRL_SECRET)
+				 NVMF_OPT_DHCHAP_CTRL_SECRET | NVMF_OPT_CONNECT_SYNC)
 
 static struct nvme_ctrl *
 nvmf_create_ctrl(struct device *dev, const char *buf)
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index 06cc54851b1b..01d3ef545f14 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -73,6 +73,7 @@  enum {
 	NVMF_OPT_TLS		= 1 << 25,
 	NVMF_OPT_KEYRING	= 1 << 26,
 	NVMF_OPT_TLS_KEY	= 1 << 27,
+	NVMF_OPT_CONNECT_SYNC	= 1 << 28,
 };
 
 /**
@@ -115,6 +116,7 @@  enum {
  * @nr_poll_queues: number of queues for polling I/O
  * @tos: type of service
  * @fast_io_fail_tmo: Fast I/O fail timeout in seconds
+ * @connect_sync: wait for connect attempt(s) to succeed or fail
  */
 struct nvmf_ctrl_options {
 	unsigned		mask;
@@ -144,6 +146,7 @@  struct nvmf_ctrl_options {
 	unsigned int		nr_poll_queues;
 	int			tos;
 	int			fast_io_fail_tmo;
+	bool			connect_sync;
 };
 
 /*