[v2,net] net: enetc: survive memory pressure without crashing

Message ID 20221026121330.2042989-1-vladimir.oltean@nxp.com
State New
Headers
Series [v2,net] net: enetc: survive memory pressure without crashing |

Commit Message

Vladimir Oltean Oct. 26, 2022, 12:13 p.m. UTC
  Under memory pressure, enetc_refill_rx_ring() may fail, and when called
during the enetc_open() -> enetc_setup_rxbdr() procedure, this is not
checked for.

An extreme case of memory pressure will result in exactly zero buffers
being allocated for the RX ring, and in such a case it is expected that
hardware drops all RX packets due to lack of buffers.

There are 2 problems. One is that the hardware drop doesn't happen, the
other is that even if this is fixed, the driver has undefined behavior
and may even crash. Explanation for the latter follows below.

The enetc NAPI poll procedure is shared between RX and TX conf, and
enetc_poll() calls enetc_clean_rx_ring() even if the reason why NAPI was
scheduled is TX.

The enetc_clean_rx_ring() function (and its XDP derivative) is not
prepared to handle such a condition. It has this loop exit condition:

		rxbd = enetc_rxbd(rx_ring, i);
		bd_status = le32_to_cpu(rxbd->r.lstatus);
		if (!bd_status)
			break;

otherwise said, the NAPI poll procedure does not look at the Producer
Index of the RX ring, instead it just walks circularly through the
descriptors until it finds one which is not Ready.

The driver undefined behavior is caused by the fact that the
enetc_rxbd(rx_ring, i) RX descriptor is only initialized by
enetc_refill_rx_ring() if page allocation has succeeded.

If memory allocation ever failed, enetc_clean_rx_ring() looks at
rxbd->r.lstatus as an exit condition, but "rxbd" itself is uninitialized
memory. If it contains junk, then junk buffers will be processed.

To fix this problem, memset the DMA coherent area used for RX buffer
descriptors in enetc_dma_alloc_bdr(). This makes all BDs be "not ready"
by default, which makes enetc_clean_rx_ring() exit early from the BD
processing loop when there is no valid buffer available.

The other problem (hardware does not drop packet in lack of buffers)
is due to an initial misconfiguration of the RX ring consumer index,
misconfiguration which is usually masked away by the proper
configuration done by enetc_refill_rx_ring() - when page allocation does
not fail.

The hardware guide recommends BD rings to be configured as follows:

| Configure the receive ring producer index register RBaPIR with a value
| of 0. The producer index is initially configured by software but owned
| by hardware after the ring has been enabled. Hardware increments the
| index when a frame is received which may consume one or more BDs.
| Hardware is not allowed to increment the producer index to match the
| consumer index since it is used to indicate an empty condition. The ring
| can hold at most RBLENR[LENGTH]-1 received BDs.
|
| Configure the receive ring consumer index register RBaCIR. The
| consumer index is owned by software and updated during operation of the
| of the BD ring by software, to indicate that any receive data occupied
| in the BD has been processed and it has been prepared for new data.
| - If consumer index and producer index are initialized to the same
|   value, it indicates that all BDs in the ring have been prepared and
|   hardware owns all of the entries.
| - If consumer index is initialized to producer index plus N, it would
|   indicate N BDs have been prepared. Note that hardware cannot start if
|   only a single buffer is prepared due to the restrictions described in
|   (2).
| - Software may write consumer index to match producer index anytime
|   while the ring is operational to indicate all received BDs prior have
|   been processed and new BDs prepared for hardware.

The reset-default value of the consumer index is 0, and this makes the
ENETC think that all buffers have been initialized (when in reality none
were).

To operate using no buffer, we must initialize the CI to PI + 1.

Fixes: d4fd0404c1c9 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
v1->v2:
- also add an initial value for the RX ring consumer index, to be used
  when page allocation fails
- update the commit message
- deliberately not pick up Claudiu's review tag due to the above changes

v1 at:
https://patchwork.kernel.org/project/netdevbpf/patch/20221024172049.4187400-1-vladimir.oltean@nxp.com/

 drivers/net/ethernet/freescale/enetc/enetc.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)
  

Comments

Claudiu Manoil Oct. 27, 2022, 7:03 a.m. UTC | #1
Hi, Vladimir,

> -----Original Message-----
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> Sent: Wednesday, October 26, 2022 3:14 PM
> To: netdev@vger.kernel.org
[...]
> Subject: [PATCH v2 net] net: enetc: survive memory pressure without crashing
> 
> Under memory pressure, enetc_refill_rx_ring() may fail, and when called
> during the enetc_open() -> enetc_setup_rxbdr() procedure, this is not
> checked for.
> 
> An extreme case of memory pressure will result in exactly zero buffers
> being allocated for the RX ring, and in such a case it is expected that
> hardware drops all RX packets due to lack of buffers.
> 

How do you trigger this "extreme case of memory pressure" where no enetc buffer
can be allocated? Do you simulate it?

Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
  
Vladimir Oltean Oct. 27, 2022, 2:21 p.m. UTC | #2
On Thu, Oct 27, 2022 at 07:03:31AM +0000, Claudiu Manoil wrote:
> How do you trigger this "extreme case of memory pressure" where no enetc buffer
> can be allocated? Do you simulate it?

As far as I understand, making dev_alloc_page() predictably fail in some
particular spot is hard and probabilistic (but possible given enough tries).

The reason I stubled upon this particularly bad handling of low memory
in the enetc driver is because with AF_XDP zero-copy sockets, memory
for RX buffers comes directly from user space, which may simply opt to
not put any buffers in the fill queue (see "xdpsock --txonly" for
example).

The fix for the case where the buffers come from the kernel's page
allocator is simply analogous to that. It would be shady to add this
exact same patch only as part of the XSK work, when it's clear that
existing code suffers from this problem too, even if it's not easy to
trigger it.
  
Jakub Kicinski Oct. 27, 2022, 5:58 p.m. UTC | #3
On Wed, 26 Oct 2022 15:13:30 +0300 Vladimir Oltean wrote:
> To fix this problem, memset the DMA coherent area used for RX buffer
> descriptors in enetc_dma_alloc_bdr(). This makes all BDs be "not ready"
> by default, which makes enetc_clean_rx_ring() exit early from the BD
> processing loop when there is no valid buffer available.

IIRC dma_alloc_coherent() always zeros, and I'd guess there is a cocci
script that checks this judging but the number of "fixes" we got for
this in the past.

scripts/coccinelle/api/alloc/zalloc-simple.cocci ?
  
Vladimir Oltean Oct. 27, 2022, 6:02 p.m. UTC | #4
On Thu, Oct 27, 2022 at 10:58:24AM -0700, Jakub Kicinski wrote:
> On Wed, 26 Oct 2022 15:13:30 +0300 Vladimir Oltean wrote:
> > To fix this problem, memset the DMA coherent area used for RX buffer
> > descriptors in enetc_dma_alloc_bdr(). This makes all BDs be "not ready"
> > by default, which makes enetc_clean_rx_ring() exit early from the BD
> > processing loop when there is no valid buffer available.
> 
> IIRC dma_alloc_coherent() always zeros, and I'd guess there is a cocci
> script that checks this judging but the number of "fixes" we got for
> this in the past.
> 
> scripts/coccinelle/api/alloc/zalloc-simple.cocci ?

Yeah, ok, fair, I guess only the producer/consumer indices were the problem,
then. The "junk" I was seeing in the buffer descriptors was the "Ready"
bit caused by the hardware thinking it owns all BDs when in fact it
owned none of them.

Is there a chance the patch makes it for this week's PR if I amend it
really quick?
  
Jakub Kicinski Oct. 27, 2022, 6:23 p.m. UTC | #5
On Thu, 27 Oct 2022 18:02:09 +0000 Vladimir Oltean wrote:
> On Thu, Oct 27, 2022 at 10:58:24AM -0700, Jakub Kicinski wrote:
> > On Wed, 26 Oct 2022 15:13:30 +0300 Vladimir Oltean wrote:  
> > > To fix this problem, memset the DMA coherent area used for RX buffer
> > > descriptors in enetc_dma_alloc_bdr(). This makes all BDs be "not ready"
> > > by default, which makes enetc_clean_rx_ring() exit early from the BD
> > > processing loop when there is no valid buffer available.  
> > 
> > IIRC dma_alloc_coherent() always zeros, and I'd guess there is a cocci
> > script that checks this judging but the number of "fixes" we got for
> > this in the past.
> > 
> > scripts/coccinelle/api/alloc/zalloc-simple.cocci ?  
> 
> Yeah, ok, fair, I guess only the producer/consumer indices were the problem,
> then. The "junk" I was seeing in the buffer descriptors was the "Ready"
> bit caused by the hardware thinking it owns all BDs when in fact it
> owned none of them.
> 
> Is there a chance the patch makes it for this week's PR if I amend it
> really quick?

Yeah, you got 15min :)
  

Patch

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index 54bc92fc6bf0..a787114c9ede 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -1738,18 +1738,21 @@  void enetc_get_si_caps(struct enetc_si *si)
 
 static int enetc_dma_alloc_bdr(struct enetc_bdr *r, size_t bd_size)
 {
-	r->bd_base = dma_alloc_coherent(r->dev, r->bd_count * bd_size,
-					&r->bd_dma_base, GFP_KERNEL);
+	size_t size = r->bd_count * bd_size;
+
+	r->bd_base = dma_alloc_coherent(r->dev, size, &r->bd_dma_base,
+					GFP_KERNEL);
 	if (!r->bd_base)
 		return -ENOMEM;
 
 	/* h/w requires 128B alignment */
 	if (!IS_ALIGNED(r->bd_dma_base, 128)) {
-		dma_free_coherent(r->dev, r->bd_count * bd_size, r->bd_base,
-				  r->bd_dma_base);
+		dma_free_coherent(r->dev, size, r->bd_base, r->bd_dma_base);
 		return -EINVAL;
 	}
 
+	memset(r->bd_base, 0, size);
+
 	return 0;
 }
 
@@ -2090,7 +2093,12 @@  static void enetc_setup_rxbdr(struct enetc_hw *hw, struct enetc_bdr *rx_ring)
 	else
 		enetc_rxbdr_wr(hw, idx, ENETC_RBBSR, ENETC_RXB_DMA_SIZE);
 
+	/* Also prepare the consumer index in case page allocation never
+	 * succeeds. In that case, hardware will never advance producer index
+	 * to match consumer index, and will drop all frames.
+	 */
 	enetc_rxbdr_wr(hw, idx, ENETC_RBPIR, 0);
+	enetc_rxbdr_wr(hw, idx, ENETC_RBCIR, 1);
 
 	/* enable Rx ints by setting pkt thr to 1 */
 	enetc_rxbdr_wr(hw, idx, ENETC_RBICR0, ENETC_RBICR0_ICEN | 0x1);