[1/2,V3,net] net: mana: Fix MANA VF unload when host is unresponsive

Message ID 1687771098-26775-1-git-send-email-schakrabarti@linux.microsoft.com
State New
Headers
Series [1/2,V3,net] net: mana: Fix MANA VF unload when host is unresponsive |

Commit Message

Souradeep Chakrabarti June 26, 2023, 9:18 a.m. UTC
  From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>

This patch addresses the VF unload issue, where mana_dealloc_queues()
gets stuck in infinite while loop, because of host unresponsiveness.
It adds a timeout in the while loop, to fix it.

Fixes: ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f (net: mana: Add a driver for
Microsoft Azure Network Adapter)
Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
---
V2 -> V3:
* Splitted the patch in two parts.
* Removed the unnecessary braces from mana_dealloc_queues().
---
 drivers/net/ethernet/microsoft/mana/mana_en.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)
  

Comments

Simon Horman June 26, 2023, 1:05 p.m. UTC | #1
On Mon, Jun 26, 2023 at 02:18:18AM -0700, souradeep chakrabarti wrote:
> From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> 
> This patch addresses the VF unload issue, where mana_dealloc_queues()
> gets stuck in infinite while loop, because of host unresponsiveness.
> It adds a timeout in the while loop, to fix it.
> 
> Fixes: ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f (net: mana: Add a driver for
> Microsoft Azure Network Adapter)

nit: A correct format of this fixes tag is:

In particular:
* All on lone line
* Description in double quotes.

Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")

> Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> ---
> V2 -> V3:
> * Splitted the patch in two parts.
> * Removed the unnecessary braces from mana_dealloc_queues().
  
Dexuan Cui June 26, 2023, 8:06 p.m. UTC | #2
> From: Simon Horman
> Sent: Monday, June 26, 2023 6:05 AM
> > ...
> > Fixes: ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f (net: mana: Add a
> > driver for
> > Microsoft Azure Network Adapter)
> 
> nit: A correct format of this fixes tag is:
> 
> In particular:
> * All on lone line
> * Description in double quotes.
> 
> Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network
> Adapter (MANA)")

Hi Souradeep, FYI I often refer to:
https://marc.info/?l=linux-pci&m=150905742808166&w=2

The link mentions:
alias gsr='git --no-pager show -s --abbrev-commit --abbrev=12 --pretty=format:"%h (\"%s\")%n"'

"gsr ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f" produces:
ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
  
Stephen Hemminger June 26, 2023, 8:47 p.m. UTC | #3
On Mon, 26 Jun 2023 20:06:48 +0000
Dexuan Cui <decui@microsoft.com> wrote:

> > From: Simon Horman
> > Sent: Monday, June 26, 2023 6:05 AM  
> > > ...
> > > Fixes: ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f (net: mana: Add a
> > > driver for
> > > Microsoft Azure Network Adapter)  
> > 
> > nit: A correct format of this fixes tag is:
> > 
> > In particular:
> > * All on lone line
> > * Description in double quotes.
> > 
> > Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network
> > Adapter (MANA)")  
> 
> Hi Souradeep, FYI I often refer to:
> https://marc.info/?l=linux-pci&m=150905742808166&w=2
> 
> The link mentions:
> alias gsr='git --no-pager show -s --abbrev-commit --abbrev=12 --pretty=format:"%h (\"%s\")%n"'
> 
> "gsr ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f" produces:
> ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")

You can do same thing without shell alias by using git-config

[alias]
	fixes = log -1 --format=fixes
	gsr = log -1 --format=gsr

[pretty]
	fixes = Fixes: %h (\"%s\")
	gsr = %h (\"%s\")

Then:
$ git gsr 1919b39fc6eabb9a6f9a51706ff6d03865f5df29
1919b39fc6ea ("net: mana: Fix perf regression: remove rx_cqes, tx_cqes counters")
  
Souradeep Chakrabarti June 27, 2023, 8:35 a.m. UTC | #4
On Mon, Jun 26, 2023 at 01:47:21PM -0700, Stephen Hemminger wrote:
> On Mon, 26 Jun 2023 20:06:48 +0000
> Dexuan Cui <decui@microsoft.com> wrote:
> 
> > > From: Simon Horman
> > > Sent: Monday, June 26, 2023 6:05 AM  
> > > > ...
> > > > Fixes: ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f (net: mana: Add a
> > > > driver for
> > > > Microsoft Azure Network Adapter)  
> > > 
> > > nit: A correct format of this fixes tag is:
> > > 
> > > In particular:
> > > * All on lone line
> > > * Description in double quotes.
> > > 
> > > Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network
> > > Adapter (MANA)")  
> > 
> > Hi Souradeep, FYI I often refer to:
> > https://marc.info/?l=linux-pci&m=150905742808166&w=2
> > 
> > The link mentions:
> > alias gsr='git --no-pager show -s --abbrev-commit --abbrev=12 --pretty=format:"%h (\"%s\")%n"'
> > 
Thank you for the advice. Will use it from now onwards.
> > "gsr ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f" produces:
> > ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
> 
> You can do same thing without shell alias by using git-config
> 
> [alias]
> 	fixes = log -1 --format=fixes
> 	gsr = log -1 --format=gsr
> 
> [pretty]
> 	fixes = Fixes: %h (\"%s\")
> 	gsr = %h (\"%s\")
> 
> Then:
> $ git gsr 1919b39fc6eabb9a6f9a51706ff6d03865f5df29
> 1919b39fc6ea ("net: mana: Fix perf regression: remove rx_cqes, tx_cqes counters")
Thank you for the suggestion.
  

Patch

diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index d907727c7b7a..cb5c43c3c47e 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -2329,7 +2329,10 @@  static int mana_dealloc_queues(struct net_device *ndev)
 {
 	struct mana_port_context *apc = netdev_priv(ndev);
 	struct gdma_dev *gd = apc->ac->gdma_dev;
+	unsigned long timeout;
 	struct mana_txq *txq;
+	struct sk_buff *skb;
+	struct mana_cq *cq;
 	int i, err;
 
 	if (apc->port_is_up)
@@ -2348,13 +2351,25 @@  static int mana_dealloc_queues(struct net_device *ndev)
 	 *
 	 * Drain all the in-flight TX packets
 	 */
+
+	timeout = jiffies + 120 * HZ;
 	for (i = 0; i < apc->num_queues; i++) {
 		txq = &apc->tx_qp[i].txq;
-
-		while (atomic_read(&txq->pending_sends) > 0)
+		while (atomic_read(&txq->pending_sends) > 0 &&
+		       time_before(jiffies, timeout))
 			usleep_range(1000, 2000);
 	}
 
+	for (i = 0; i < apc->num_queues; i++) {
+		txq = &apc->tx_qp[i].txq;
+		cq = &apc->tx_qp[i].tx_cq;
+		while (atomic_read(&txq->pending_sends)) {
+			skb = skb_dequeue(&txq->pending_skbs);
+			mana_unmap_skb(skb, apc);
+			napi_consume_skb(skb, cq->budget);
+			atomic_sub(1, &txq->pending_sends);
+		}
+	}
 	/* We're 100% sure the queues can no longer be woken up, because
 	 * we're sure now mana_poll_tx_cq() can't be running.
 	 */