net: mana: Fix MANA VF unload when host is unresponsive

Message ID 1687343341-10898-1-git-send-email-schakrabarti@linux.microsoft.com
State New
Series net: mana: Fix MANA VF unload when host is unresponsive

Commit Message

Souradeep Chakrabarti June 21, 2023, 10:29 a.m. UTC
  From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>

This patch addresses the VF unload issue, where mana_dealloc_queues()
gets stuck in an infinite while loop, because of host unresponsiveness.
It adds a timeout in the while loop, to fix it.

Also this patch adds a new attribute in mana_context, which gets set when
mana_hwc_send_request() hits a timeout because of host unresponsiveness.
This flag then helps to avoid the timeouts in successive calls.

Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
---
 .../net/ethernet/microsoft/mana/gdma_main.c   |  4 +++-
 .../net/ethernet/microsoft/mana/hw_channel.c  | 12 ++++++++++-
 drivers/net/ethernet/microsoft/mana/mana_en.c | 21 +++++++++++++++++--
 include/net/mana/mana.h                       |  2 ++
 4 files changed, 35 insertions(+), 4 deletions(-)
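
For readers following the thread, the two ideas in the commit message (a bounded wait, plus a sticky flag that makes later calls fail fast) can be sketched in userspace C. This is only an illustration under stated assumptions: demo_drain(), demo_send_request() and struct demo_channel are hypothetical names, a monotonic clock stands in for jiffies, and an atomic counter stands in for both pending_sends and the HWC completion. It is not the driver code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in milliseconds; a userspace stand-in for jiffies. */
static int64_t demo_now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Poll until *pending reaches zero or timeout_ms has elapsed.
 * Returns true if the counter drained, false if the deadline passed.
 */
static bool demo_drain(atomic_int *pending, int64_t timeout_ms)
{
	const struct timespec nap = { .tv_sec = 0, .tv_nsec = 1000000 };
	int64_t deadline;

	assert(timeout_ms >= 0);
	deadline = demo_now_ms() + timeout_ms;

	while (atomic_load(pending) > 0) {
		if (demo_now_ms() >= deadline)
			return false;
		nanosleep(&nap, NULL);	/* sleep about 1 ms between polls */
	}
	return true;
}

/* Hypothetical channel state: the flag is sticky once a timeout is seen. */
struct demo_channel {
	bool host_timed_out;
};

/* Fail fast on later calls instead of waiting for another timeout. */
static int demo_send_request(struct demo_channel *ch, atomic_int *pending,
			     int64_t timeout_ms)
{
	if (ch->host_timed_out)
		return -ETIMEDOUT;

	if (!demo_drain(pending, timeout_ms)) {
		ch->host_timed_out = true;
		return -ETIMEDOUT;
	}
	return 0;
}
```

Unlike this sketch, the posted patch computes one deadline before the loop over all TX queues, so the 120 second budget is shared across queues rather than applied per queue.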
  

Comments

Simon Horman June 21, 2023, 5:37 p.m. UTC | #1
On Wed, Jun 21, 2023 at 03:29:01AM -0700, souradeep chakrabarti wrote:
> From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> 
> This patch addresses the VF unload issue, where mana_dealloc_queues()
> gets stuck in an infinite while loop, because of host unresponsiveness.
> It adds a timeout in the while loop, to fix it.
> 
> Also this patch adds a new attribute in mana_context, which gets set when
> mana_hwc_send_request() hits a timeout because of host unresponsiveness.
> This flag then helps to avoid the timeouts in successive calls.
> 
> Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>

Hi Souradeep,

thanks for your patch.
Some minor feedback from my side.

> diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> index 8f3f78b68592..5cc43ae78334 100644
> --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> @@ -946,10 +946,12 @@ int mana_gd_deregister_device(struct gdma_dev *gd)
>  	struct gdma_context *gc = gd->gdma_context;
>  	struct gdma_general_resp resp = {};
>  	struct gdma_general_req req = {};
> +	struct mana_context *ac;
>  	int err;
>  
>  	if (gd->pdid == INVALID_PDID)
>  		return -EINVAL;
> +	ac = (struct mana_context *)gd->driver_data;

driver_data is a void *.
There is no need to cast it to another type of pointer.
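
As a small illustration of this point, the sketch below assigns a void * to a typed pointer without a cast, which C permits for any object pointer type. The struct names and demo_unload_timed_out() are hypothetical stand-ins, not the driver's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mana_context. */
struct demo_context {
	bool unload_timeout;
};

/* Hypothetical stand-in for struct gdma_dev with its opaque pointer. */
struct demo_dev {
	void *driver_data;
};

static bool demo_unload_timed_out(const struct demo_dev *gd)
{
	/* No cast needed: void * converts implicitly in C. */
	struct demo_context *ac = gd->driver_data;

	assert(ac != NULL);
	return ac->unload_timeout;
}
```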

...

> diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> index 9d1507eba5b9..557b890ad0ae 100644
> --- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> +++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c

...

> @@ -786,12 +788,19 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
>  	struct hwc_wq *txq = hwc->txq;
>  	struct gdma_req_hdr *req_msg;
>  	struct hwc_caller_ctx *ctx;
> +	struct mana_context *ac;
>  	u32 dest_vrcq = 0;
>  	u32 dest_vrq = 0;
>  	u16 msg_id;
>  	int err;
>  
>  	mana_hwc_get_msg_index(hwc, &msg_id);
> +	ac = (struct mana_context *)hwc->gdma_dev->driver_data;

Ditto.

...

> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index d907727c7b7a..24f5508d2979 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -2330,7 +2330,10 @@ static int mana_dealloc_queues(struct net_device *ndev)
>  	struct mana_port_context *apc = netdev_priv(ndev);
>  	struct gdma_dev *gd = apc->ac->gdma_dev;
>  	struct mana_txq *txq;
> +	struct sk_buff *skb;
> +	struct mana_cq *cq;
>  	int i, err;
> +	unsigned long timeout;

Please use reverse xmas tree - longest line to shortest - for
local variable declarations in Networking code.

...

> diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
> index 9eef19972845..34f5d8e06ede 100644
> --- a/include/net/mana/mana.h
> +++ b/include/net/mana/mana.h
> @@ -361,6 +361,8 @@ struct mana_context {
>  	struct mana_eq *eqs;
>  
>  	struct net_device *ports[MAX_PORTS_IN_MANA_DEV];
> +
> +	bool vf_unload_timeout;

Perhaps it is not important, but on x86_64 there is a 6-byte hole in the first
cacheline after num_ports where this could go.

pahole reports:

struct mana_context {
	struct gdma_dev *          gdma_dev;             /*     0     8 */
	u16                        num_ports;            /*     8     2 */

	/* XXX 6 bytes hole, try to pack */

	struct mana_eq *           eqs;                  /*    16     8 */
	struct net_device *        ports[256];           /*    24  2048 */
	/* --- cacheline 32 boundary (2048 bytes) was 24 bytes ago --- */
	bool                       vf_unload_timeout;    /*  2072     1 */

	/* size: 2080, cachelines: 33, members: 5 */
	/* sum members: 2067, holes: 1, sum holes: 6 */
	/* padding: 7 */
	/* last cacheline: 32 bytes */
};
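
The effect of moving the flag can be checked with a sketch like the one below. It assumes a typical LP64 ABI and uses hypothetical demo_ structs with void * members in place of the real pointer types, so it mirrors the layout pahole reports rather than the actual header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PORTS 256	/* matches ports[256] in the pahole output */

/* Layout as posted: the flag is appended after the ports array. */
struct demo_ctx_flag_last {
	void *gdma_dev;
	uint16_t num_ports;
	void *eqs;
	void *ports[DEMO_MAX_PORTS];
	bool vf_unload_timeout;
};

/* Suggested layout: the flag sits in the padding after num_ports. */
struct demo_ctx_flag_in_hole {
	void *gdma_dev;
	uint16_t num_ports;
	bool vf_unload_timeout;
	void *eqs;
	void *ports[DEMO_MAX_PORTS];
};

static_assert(offsetof(struct demo_ctx_flag_in_hole, vf_unload_timeout) ==
	      offsetof(struct demo_ctx_flag_in_hole, num_ports) +
	      sizeof(uint16_t),
	      "flag should directly follow num_ports");

/* Bytes saved by moving the flag into the hole. */
static size_t demo_bytes_saved(void)
{
	return sizeof(struct demo_ctx_flag_last) -
	       sizeof(struct demo_ctx_flag_in_hole);
}
```

With 8-byte pointers the first struct is 2080 bytes, as in the pahole report, and the second is 2072, because the flag no longer forces trailing padding.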
  
Haiyang Zhang June 21, 2023, 6:27 p.m. UTC | #2
> -----Original Message-----
> From: souradeep chakrabarti <schakrabarti@linux.microsoft.com>
> Sent: Wednesday, June 21, 2023 6:29 AM
> To: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
> <decui@microsoft.com>; davem@davemloft.net; edumazet@google.com;
> kuba@kernel.org; pabeni@redhat.com; Long Li <longli@microsoft.com>; Ajay
> Sharma <sharmaajay@microsoft.com>; leon@kernel.org;
> cai.huoqing@linux.dev; ssengar@linux.microsoft.com; vkuznets@redhat.com;
> tglx@linutronix.de; linux-hyperv@vger.kernel.org; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org; linux-rdma@vger.kernel.org
> Cc: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> Subject: [PATCH] net: mana: Fix MANA VF unload when host is unresponsive
> 
> From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> 
> This patch addresses the VF unload issue, where mana_dealloc_queues()
> gets stuck in an infinite while loop, because of host unresponsiveness.
> It adds a timeout in the while loop, to fix it.
> 
> Also this patch adds a new attribute in mana_context, which gets set when
> mana_hwc_send_request() hits a timeout because of host unresponsiveness.
> This flag then helps to avoid the timeouts in successive calls.
> 
> Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> ---
>  .../net/ethernet/microsoft/mana/gdma_main.c   |  4 +++-
>  .../net/ethernet/microsoft/mana/hw_channel.c  | 12 ++++++++++-
>  drivers/net/ethernet/microsoft/mana/mana_en.c | 21 +++++++++++++++++--
>  include/net/mana/mana.h                       |  2 ++
>  4 files changed, 35 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> index 8f3f78b68592..5cc43ae78334 100644
> --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> @@ -946,10 +946,12 @@ int mana_gd_deregister_device(struct gdma_dev
> *gd)
>  	struct gdma_context *gc = gd->gdma_context;
>  	struct gdma_general_resp resp = {};
>  	struct gdma_general_req req = {};
> +	struct mana_context *ac;
>  	int err;
> 
>  	if (gd->pdid == INVALID_PDID)
>  		return -EINVAL;
> +	ac = (struct mana_context *)gd->driver_data;
> 
>  	mana_gd_init_req_hdr(&req.hdr, GDMA_DEREGISTER_DEVICE,
> sizeof(req),
>  			     sizeof(resp));
> @@ -957,7 +959,7 @@ int mana_gd_deregister_device(struct gdma_dev *gd)
>  	req.hdr.dev_id = gd->dev_id;
> 
>  	err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
> -	if (err || resp.hdr.status) {
> +	if ((err || resp.hdr.status) && !ac->vf_unload_timeout) {
>  		dev_err(gc->dev, "Failed to deregister device: %d, 0x%x\n",
>  			err, resp.hdr.status);
>  		if (!err)
> diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> index 9d1507eba5b9..557b890ad0ae 100644
> --- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> +++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> @@ -1,8 +1,10 @@
>  // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
>  /* Copyright (c) 2021, Microsoft Corporation. */
> 
> +#include "asm-generic/errno.h"
>  #include <net/mana/gdma.h>
>  #include <net/mana/hw_channel.h>
> +#include <net/mana/mana.h>
> 
>  static int mana_hwc_get_msg_index(struct hw_channel_context *hwc, u16
> *msg_id)
>  {
> @@ -786,12 +788,19 @@ int mana_hwc_send_request(struct
> hw_channel_context *hwc, u32 req_len,
>  	struct hwc_wq *txq = hwc->txq;
>  	struct gdma_req_hdr *req_msg;
>  	struct hwc_caller_ctx *ctx;
> +	struct mana_context *ac;
>  	u32 dest_vrcq = 0;
>  	u32 dest_vrq = 0;
>  	u16 msg_id;
>  	int err;
> 
>  	mana_hwc_get_msg_index(hwc, &msg_id);
> +	ac = (struct mana_context *)hwc->gdma_dev->driver_data;
> +	if (ac->vf_unload_timeout) {
> +		dev_err(hwc->dev, "HWC: vport is already unloaded.\n");
> +		err = -ETIMEDOUT;
> +		goto out;
> +	}
> 
>  	tx_wr = &txq->msg_buf->reqs[msg_id];
> 
> @@ -825,9 +834,10 @@ int mana_hwc_send_request(struct
> hw_channel_context *hwc, u32 req_len,
>  		goto out;
>  	}
> 
> -	if (!wait_for_completion_timeout(&ctx->comp_event, 30 * HZ)) {
> +	if (!wait_for_completion_timeout(&ctx->comp_event, 5 * HZ)) {
>  		dev_err(hwc->dev, "HWC: Request timed out!\n");
>  		err = -ETIMEDOUT;
> +		ac->vf_unload_timeout = true;
>  		goto out;
>  	}
> 
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c
> b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index d907727c7b7a..24f5508d2979 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -2330,7 +2330,10 @@ static int mana_dealloc_queues(struct net_device
> *ndev)
>  	struct mana_port_context *apc = netdev_priv(ndev);
>  	struct gdma_dev *gd = apc->ac->gdma_dev;
>  	struct mana_txq *txq;
> +	struct sk_buff *skb;
> +	struct mana_cq *cq;
>  	int i, err;
> +	unsigned long timeout;
> 
>  	if (apc->port_is_up)
>  		return -EINVAL;
> @@ -2348,13 +2351,26 @@ static int mana_dealloc_queues(struct net_device
> *ndev)
>  	 *
>  	 * Drain all the in-flight TX packets
>  	 */
> +
> +	timeout = jiffies + 120 * HZ;
>  	for (i = 0; i < apc->num_queues; i++) {
>  		txq = &apc->tx_qp[i].txq;
> -
> -		while (atomic_read(&txq->pending_sends) > 0)
> +		while (atomic_read(&txq->pending_sends) > 0 &&
> +		       time_before(jiffies, timeout)) {
>  			usleep_range(1000, 2000);
> +		}
>  	}
> 
> +	for (i = 0; i < apc->num_queues; i++) {
> +		txq = &apc->tx_qp[i].txq;
> +		cq = &apc->tx_qp[i].tx_cq;
> +		while (atomic_read(&txq->pending_sends)) {
> +			skb = skb_dequeue(&txq->pending_skbs);
> +			mana_unmap_skb(skb, apc);
> +			napi_consume_skb(skb, cq->budget);
> +			atomic_sub(1, &txq->pending_sends);
> +		}
> +	}
>  	/* We're 100% sure the queues can no longer be woken up, because
>  	 * we're sure now mana_poll_tx_cq() can't be running.
>  	 */
> @@ -2605,6 +2621,7 @@ int mana_probe(struct gdma_dev *gd, bool
> resuming)
>  		}
>  	}
> 
> +	ac->vf_unload_timeout = false;
>  	err = add_adev(gd);
>  out:
>  	if (err)
> diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
> index 9eef19972845..34f5d8e06ede 100644
> --- a/include/net/mana/mana.h
> +++ b/include/net/mana/mana.h
> @@ -361,6 +361,8 @@ struct mana_context {
>  	struct mana_eq *eqs;
> 
>  	struct net_device *ports[MAX_PORTS_IN_MANA_DEV];
> +
> +	bool vf_unload_timeout;
>  };
> 
>  struct mana_port_context {
> --

Please specify the "net" branch for fixes.
Also Cc: stable@vger.kernel.org so it will be backported to stable trees.

Thanks,
- Haiyang
  

Patch

diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 8f3f78b68592..5cc43ae78334 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -946,10 +946,12 @@  int mana_gd_deregister_device(struct gdma_dev *gd)
 	struct gdma_context *gc = gd->gdma_context;
 	struct gdma_general_resp resp = {};
 	struct gdma_general_req req = {};
+	struct mana_context *ac;
 	int err;
 
 	if (gd->pdid == INVALID_PDID)
 		return -EINVAL;
+	ac = (struct mana_context *)gd->driver_data;
 
 	mana_gd_init_req_hdr(&req.hdr, GDMA_DEREGISTER_DEVICE, sizeof(req),
 			     sizeof(resp));
@@ -957,7 +959,7 @@  int mana_gd_deregister_device(struct gdma_dev *gd)
 	req.hdr.dev_id = gd->dev_id;
 
 	err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
-	if (err || resp.hdr.status) {
+	if ((err || resp.hdr.status) && !ac->vf_unload_timeout) {
 		dev_err(gc->dev, "Failed to deregister device: %d, 0x%x\n",
 			err, resp.hdr.status);
 		if (!err)
diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c
index 9d1507eba5b9..557b890ad0ae 100644
--- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
+++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
@@ -1,8 +1,10 @@ 
 // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
 /* Copyright (c) 2021, Microsoft Corporation. */
 
+#include "asm-generic/errno.h"
 #include <net/mana/gdma.h>
 #include <net/mana/hw_channel.h>
+#include <net/mana/mana.h>
 
 static int mana_hwc_get_msg_index(struct hw_channel_context *hwc, u16 *msg_id)
 {
@@ -786,12 +788,19 @@  int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
 	struct hwc_wq *txq = hwc->txq;
 	struct gdma_req_hdr *req_msg;
 	struct hwc_caller_ctx *ctx;
+	struct mana_context *ac;
 	u32 dest_vrcq = 0;
 	u32 dest_vrq = 0;
 	u16 msg_id;
 	int err;
 
 	mana_hwc_get_msg_index(hwc, &msg_id);
+	ac = (struct mana_context *)hwc->gdma_dev->driver_data;
+	if (ac->vf_unload_timeout) {
+		dev_err(hwc->dev, "HWC: vport is already unloaded.\n");
+		err = -ETIMEDOUT;
+		goto out;
+	}
 
 	tx_wr = &txq->msg_buf->reqs[msg_id];
 
@@ -825,9 +834,10 @@  int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
 		goto out;
 	}
 
-	if (!wait_for_completion_timeout(&ctx->comp_event, 30 * HZ)) {
+	if (!wait_for_completion_timeout(&ctx->comp_event, 5 * HZ)) {
 		dev_err(hwc->dev, "HWC: Request timed out!\n");
 		err = -ETIMEDOUT;
+		ac->vf_unload_timeout = true;
 		goto out;
 	}
 
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index d907727c7b7a..24f5508d2979 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -2330,7 +2330,10 @@  static int mana_dealloc_queues(struct net_device *ndev)
 	struct mana_port_context *apc = netdev_priv(ndev);
 	struct gdma_dev *gd = apc->ac->gdma_dev;
 	struct mana_txq *txq;
+	struct sk_buff *skb;
+	struct mana_cq *cq;
 	int i, err;
+	unsigned long timeout;
 
 	if (apc->port_is_up)
 		return -EINVAL;
@@ -2348,13 +2351,26 @@  static int mana_dealloc_queues(struct net_device *ndev)
 	 *
 	 * Drain all the in-flight TX packets
 	 */
+
+	timeout = jiffies + 120 * HZ;
 	for (i = 0; i < apc->num_queues; i++) {
 		txq = &apc->tx_qp[i].txq;
-
-		while (atomic_read(&txq->pending_sends) > 0)
+		while (atomic_read(&txq->pending_sends) > 0 &&
+		       time_before(jiffies, timeout)) {
 			usleep_range(1000, 2000);
+		}
 	}
 
+	for (i = 0; i < apc->num_queues; i++) {
+		txq = &apc->tx_qp[i].txq;
+		cq = &apc->tx_qp[i].tx_cq;
+		while (atomic_read(&txq->pending_sends)) {
+			skb = skb_dequeue(&txq->pending_skbs);
+			mana_unmap_skb(skb, apc);
+			napi_consume_skb(skb, cq->budget);
+			atomic_sub(1, &txq->pending_sends);
+		}
+	}
 	/* We're 100% sure the queues can no longer be woken up, because
 	 * we're sure now mana_poll_tx_cq() can't be running.
 	 */
@@ -2605,6 +2621,7 @@  int mana_probe(struct gdma_dev *gd, bool resuming)
 		}
 	}
 
+	ac->vf_unload_timeout = false;
 	err = add_adev(gd);
 out:
 	if (err)
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 9eef19972845..34f5d8e06ede 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -361,6 +361,8 @@  struct mana_context {
 	struct mana_eq *eqs;
 
 	struct net_device *ports[MAX_PORTS_IN_MANA_DEV];
+
+	bool vf_unload_timeout;
 };
 
 struct mana_port_context {