[net,v2] mlxbf_gige: fix receive packet race condition

Message ID 20231220234739.13753-1-davthompson@nvidia.com

Commit Message

David Thompson Dec. 20, 2023, 11:47 p.m. UTC
  Under heavy traffic, the BlueField Gigabit interface can
become unresponsive. This is due to a possible race condition
in the mlxbf_gige_rx_packet function, where the function exits
with the producer and consumer indices equal even though there
are remaining packet(s) to be processed. To prevent this
situation, read the receive consumer index *before* the HW
replenish so that the mlxbf_gige_rx_packet function returns an
accurate return value even if a packet is received into the
just-replenished buffer prior to exiting this routine. If the
just-replenished buffer is received and occupies the last RX
ring entry, the interface would not recover; instead, it would
encounter RX packet drops related to internal buffer shortages,
since the driver RX logic is not triggered to drain the RX ring.
This patch addresses and prevents this "ring full" condition.

Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com>
Signed-off-by: David Thompson <davthompson@nvidia.com>
---
v2
* removed logic that affected RX_DMA enable/disable
---
 drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)
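The ordering change at the heart of the patch can be sketched with a toy model. This is an illustrative simplification, not the driver's code: the 4-entry ring, the hw_flood() helper, and the `pi % N != ci % N` "more work" test are assumptions made for the sketch, but they capture why reading the consumer index only after the replenish lets a full ring alias with an empty one.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the RX ring indices (illustrative names, not the driver's).
 * rx_pi counts buffers posted to HW; hw_ci counts packets HW has written
 * back. The caller keeps polling while the function returns true.
 */
enum { N = 4 };                     /* ring entries */

static unsigned int hw_ci;          /* HW completion counter */

/* Worst case during the race window: traffic fills every posted buffer. */
static void hw_flood(unsigned int rx_pi)
{
	hw_ci = rx_pi;
}

/* Buggy ordering: replenish first, then read CI. If the burst fills the
 * ring, ci == pi, which is indistinguishable from an empty ring, so the
 * poll loop stops with packets still queued (the "ring full" hang). */
static bool rx_packet_buggy(unsigned int *rx_pi)
{
	(*rx_pi)++;                 /* post one more buffer to HW */
	hw_flood(*rx_pi);           /* packet lands in that buffer */
	return (*rx_pi % N) != (hw_ci % N);
}

/* Patched ordering: snapshot CI before the replenish. The snapshot can
 * never include a packet received into the buffer posted afterwards, so
 * a full ring still compares unequal and polling continues. */
static bool rx_packet_fixed(unsigned int *rx_pi)
{
	unsigned int rx_ci = hw_ci; /* read CI *before* replenish */

	(*rx_pi)++;
	hw_flood(*rx_pi);
	return (*rx_pi % N) != (rx_ci % N);
}
```

With all four buffers posted and consumed (pi == ci == 4) when the burst hits, the buggy ordering reports the ring as drained while the fixed ordering still reports pending work, so the poll loop runs again and drains the ring.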
  

Comments

patchwork-bot+netdevbpf@kernel.org Dec. 29, 2023, 10:50 p.m. UTC | #1
Hello:

This patch was applied to netdev/net.git (main)
by David S. Miller <davem@davemloft.net>:

On Wed, 20 Dec 2023 18:47:39 -0500 you wrote:
> 
> [...]

Here is the summary with links:
  - [net,v2] mlxbf_gige: fix receive packet race condition
    https://git.kernel.org/netdev/net/c/dcea1bd45e6d

You are awesome, thank you!
  

Patch

diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
index 0d5a41a2ae01..227d01cace3f 100644
--- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
+++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
@@ -267,6 +267,13 @@  static bool mlxbf_gige_rx_packet(struct mlxbf_gige *priv, int *rx_pkts)
 		priv->stats.rx_truncate_errors++;
 	}
 
+	/* Read receive consumer index before replenish so that this routine
+	 * returns accurate return value even if packet is received into
+	 * just-replenished buffer prior to exiting this routine.
+	 */
+	rx_ci = readq(priv->base + MLXBF_GIGE_RX_CQE_PACKET_CI);
+	rx_ci_rem = rx_ci % priv->rx_q_entries;
+
 	/* Let hardware know we've replenished one buffer */
 	rx_pi++;
 
@@ -279,8 +286,6 @@  static bool mlxbf_gige_rx_packet(struct mlxbf_gige *priv, int *rx_pkts)
 	rx_pi_rem = rx_pi % priv->rx_q_entries;
 	if (rx_pi_rem == 0)
 		priv->valid_polarity ^= 1;
-	rx_ci = readq(priv->base + MLXBF_GIGE_RX_CQE_PACKET_CI);
-	rx_ci_rem = rx_ci % priv->rx_q_entries;
 
 	if (skb)
 		netif_receive_skb(skb);