[03/15] can: m_can: Cache tx putidx and transmits in flight

Message ID 20221116205308.2996556-4-msp@baylibre.com
State New
Headers
Series can: m_can: Optimizations for tcan and peripheral chips |

Commit Message

Markus Schneider-Pargmann Nov. 16, 2022, 8:52 p.m. UTC
  On peripheral chips every read/write can be costly. Avoid reading easily
trackable information and cache them internally. This saves multiple
reads.

Transmit FIFO put index is cached, this is increased for every time we
enqueue a transmit request.

The transmits in flight is cached as well. With each transmit request it
is increased when reading the finished transmit event it is decreased.

A submit limit is cached to avoid submitting too many transmits at once,
either because the TX FIFO or the TXE FIFO is limited. This is currently
done very conservatively as the minimum of the fifo sizes. This means we
can reach FIFO full events but won't drop anything.

Signed-off-by: Markus Schneider-Pargmann <msp@baylibre.com>
---
 drivers/net/can/m_can/m_can.c | 21 +++++++++++++++------
 drivers/net/can/m_can/m_can.h |  5 +++++
 2 files changed, 20 insertions(+), 6 deletions(-)
  

Comments

Marc Kleine-Budde Dec. 1, 2022, 11:14 a.m. UTC | #1
On 16.11.2022 21:52:56, Markus Schneider-Pargmann wrote:
> On peripheral chips every read/write can be costly. Avoid reading easily
> trackable information and cache them internally. This saves multiple
> reads.
> 
> Transmit FIFO put index is cached, this is increased for every time we
> enqueue a transmit request.
> 
> The transmits in flight is cached as well. With each transmit request it
> is increased when reading the finished transmit event it is decreased.
> 
> A submit limit is cached to avoid submitting too many transmits at once,
> either because the TX FIFO or the TXE FIFO is limited. This is currently
> done very conservatively as the minimum of the fifo sizes. This means we
> can reach FIFO full events but won't drop anything.

You have a dedicated in_flight variable, which is read-modify-write in 2
different code path, i.e. this looks racy.

If you allow only power-of-two FIFO size, you can use 2 unsigned
variables, i.e. a head and a tail pointer. You can apply a mask to get
the index to the FIFO. The difference between head and tail is the fill
level of the FIFO. See mcp251xfd driver for this.

Marc
  
Markus Schneider-Pargmann Dec. 2, 2022, 8:37 a.m. UTC | #2
Hi Marc,

On Thu, Dec 01, 2022 at 12:14:50PM +0100, Marc Kleine-Budde wrote:
> On 16.11.2022 21:52:56, Markus Schneider-Pargmann wrote:
> > On peripheral chips every read/write can be costly. Avoid reading easily
> > trackable information and cache them internally. This saves multiple
> > reads.
> > 
> > Transmit FIFO put index is cached, this is increased for every time we
> > enqueue a transmit request.
> > 
> > The transmits in flight is cached as well. With each transmit request it
> > is increased when reading the finished transmit event it is decreased.
> > 
> > A submit limit is cached to avoid submitting too many transmits at once,
> > either because the TX FIFO or the TXE FIFO is limited. This is currently
> > done very conservatively as the minimum of the fifo sizes. This means we
> > can reach FIFO full events but won't drop anything.
> 
> You have a dedicated in_flight variable, which is read-modify-write in 2
> different code path, i.e. this looks racy.

True, of course, thank you. Yes I have to redesign this a bit for
concurrency.

> If you allow only power-of-two FIFO size, you can use 2 unsigned
> variables, i.e. a head and a tail pointer. You can apply a mask to get
> the index to the FIFO. The difference between head and tail is the fill
> level of the FIFO. See mcp251xfd driver for this.

Maybe that is a trivial question but what's wrong with using atomics
instead?

The tcan mram size is limited to 2048 so I would like to avoid limiting
the possible sizes of the tx fifos.

Best,
Markus

> 
> Marc
> 
> -- 
> Pengutronix e.K.                 | Marc Kleine-Budde           |
> Embedded Linux                   | https://www.pengutronix.de  |
> Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
> Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |
  
Marc Kleine-Budde Dec. 2, 2022, 2:46 p.m. UTC | #3
On 02.12.2022 09:37:40, Markus Schneider-Pargmann wrote:
> Hi Marc,
> 
> On Thu, Dec 01, 2022 at 12:14:50PM +0100, Marc Kleine-Budde wrote:
> > On 16.11.2022 21:52:56, Markus Schneider-Pargmann wrote:
> > > On peripheral chips every read/write can be costly. Avoid reading easily
> > > trackable information and cache them internally. This saves multiple
> > > reads.
> > > 
> > > Transmit FIFO put index is cached, this is increased for every time we
> > > enqueue a transmit request.
> > > 
> > > The transmits in flight is cached as well. With each transmit request it
> > > is increased when reading the finished transmit event it is decreased.
> > > 
> > > A submit limit is cached to avoid submitting too many transmits at once,
> > > either because the TX FIFO or the TXE FIFO is limited. This is currently
> > > done very conservatively as the minimum of the fifo sizes. This means we
> > > can reach FIFO full events but won't drop anything.
> > 
> > You have a dedicated in_flight variable, which is read-modify-write in 2
> > different code path, i.e. this looks racy.
> 
> True, of course, thank you. Yes I have to redesign this a bit for
> concurrency.
> 
> > If you allow only power-of-two FIFO size, you can use 2 unsigned
> > variables, i.e. a head and a tail pointer. You can apply a mask to get
> > the index to the FIFO. The difference between head and tail is the fill
> > level of the FIFO. See mcp251xfd driver for this.
> 
> Maybe that is a trivial question but what's wrong with using atomics
> instead?

I think it's Ok to use an atomic for the fill level. The put index
doesn't need to be. No need to cache the get index, as it's in the same
register as the fill level.

As the mcp251xfd benefits from caching both indexes, a head and tail
pointer felt like the right choice. As both are only written in 1
location, no need for atomics or locking.

> The tcan mram size is limited to 2048 so I would like to avoid limiting
> the possible sizes of the tx fifos.

What FIFO sizes are you using currently?

Marc
  
Markus Schneider-Pargmann Dec. 13, 2022, 5:13 p.m. UTC | #4
On Fri, Dec 02, 2022 at 03:46:30PM +0100, Marc Kleine-Budde wrote:
> On 02.12.2022 09:37:40, Markus Schneider-Pargmann wrote:
> > Hi Marc,
> > 
> > On Thu, Dec 01, 2022 at 12:14:50PM +0100, Marc Kleine-Budde wrote:
> > > On 16.11.2022 21:52:56, Markus Schneider-Pargmann wrote:
> > > > On peripheral chips every read/write can be costly. Avoid reading easily
> > > > trackable information and cache them internally. This saves multiple
> > > > reads.
> > > > 
> > > > Transmit FIFO put index is cached, this is increased for every time we
> > > > enqueue a transmit request.
> > > > 
> > > > The transmits in flight is cached as well. With each transmit request it
> > > > is increased when reading the finished transmit event it is decreased.
> > > > 
> > > > A submit limit is cached to avoid submitting too many transmits at once,
> > > > either because the TX FIFO or the TXE FIFO is limited. This is currently
> > > > done very conservatively as the minimum of the fifo sizes. This means we
> > > > can reach FIFO full events but won't drop anything.
> > > 
> > > You have a dedicated in_flight variable, which is read-modify-write in 2
> > > different code path, i.e. this looks racy.
> > 
> > True, of course, thank you. Yes I have to redesign this a bit for
> > concurrency.
> > 
> > > If you allow only power-of-two FIFO size, you can use 2 unsigned
> > > variables, i.e. a head and a tail pointer. You can apply a mask to get
> > > the index to the FIFO. The difference between head and tail is the fill
> > > level of the FIFO. See mcp251xfd driver for this.
> > 
> > Maybe that is a trivial question but what's wrong with using atomics
> > instead?
> 
> I think it's Ok to use an atomic for the fill level. The put index
> doesn't need to be. No need to cache the get index, as it's in the same
> register as the fill level.
> 
> As the mcp251xfd benefits from caching both indexes, a head and tail
> pointer felt like the right choice. As both are only written in 1
> location, no need for atomics or locking.
> 
> > The tcan mram size is limited to 2048 so I would like to avoid limiting
> > the possible sizes of the tx fifos.
> 
> What FIFO sizes are you using currently?

I am currently using 13 for TXB, TXE and RXF0.

Best,
Markus
  
Marc Kleine-Budde Dec. 13, 2022, 7:17 p.m. UTC | #5
On 13.12.2022 18:13:09, Markus Schneider-Pargmann wrote:
> > > The tcan mram size is limited to 2048 so I would like to avoid limiting
> > > the possible sizes of the tx fifos.
> > 
> > What FIFO sizes are you using currently?
> 
> I am currently using 13 for TXB, TXE and RXF0.

Have you CAN-FD enabled?

Marc
  
Markus Schneider-Pargmann Dec. 14, 2022, 8:32 a.m. UTC | #6
Hi Marc,

On Tue, Dec 13, 2022 at 08:17:17PM +0100, Marc Kleine-Budde wrote:
> On 13.12.2022 18:13:09, Markus Schneider-Pargmann wrote:
> > > > The tcan mram size is limited to 2048 so I would like to avoid limiting
> > > > the possible sizes of the tx fifos.
> > > 
> > > What FIFO sizes are you using currently?
> > 
> > I am currently using 13 for TXB, TXE and RXF0.
> 
> Have you CAN-FD enabled?

yes, it is enabled.

Best,
Markus
  

Patch

diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index 4adf03111782..f5bba848bd56 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -1041,6 +1041,7 @@  static int m_can_echo_tx_event(struct net_device *dev)
 		/* ack txe element */
 		m_can_write(cdev, M_CAN_TXEFA, FIELD_PREP(TXEFA_EFAI_MASK,
 							  fgi));
+		--cdev->tx_fifo_in_flight;
 
 		/* update stats */
 		m_can_tx_update_stats(cdev, msg_mark, timestamp);
@@ -1376,6 +1377,14 @@  static void m_can_start(struct net_device *dev)
 	cdev->can.state = CAN_STATE_ERROR_ACTIVE;
 
 	m_can_enable_all_interrupts(cdev);
+
+	if (cdev->version > 30) {
+		cdev->tx_fifo_putidx = FIELD_GET(TXFQS_TFQPI_MASK,
+						 m_can_read(cdev, M_CAN_TXFQS));
+		cdev->tx_fifo_in_flight = 0;
+		cdev->tx_fifo_submit_limit = min(cdev->mcfg[MRAM_TXE].num,
+						 cdev->mcfg[MRAM_TXB].num);
+	}
 }
 
 static int m_can_set_mode(struct net_device *dev, enum can_mode mode)
@@ -1589,7 +1598,6 @@  static netdev_tx_t m_can_tx_handler(struct m_can_classdev *cdev)
 	struct sk_buff *skb = cdev->tx_skb;
 	struct id_and_dlc fifo_header;
 	u32 cccr, fdflags;
-	u32 txfqs;
 	int err;
 	int putidx;
 
@@ -1646,10 +1654,8 @@  static netdev_tx_t m_can_tx_handler(struct m_can_classdev *cdev)
 	} else {
 		/* Transmit routine for version >= v3.1.x */
 
-		txfqs = m_can_read(cdev, M_CAN_TXFQS);
-
 		/* Check if FIFO full */
-		if (_m_can_tx_fifo_full(txfqs)) {
+		if (cdev->tx_fifo_in_flight >= cdev->tx_fifo_submit_limit) {
 			/* This shouldn't happen */
 			netif_stop_queue(dev);
 			netdev_warn(dev,
@@ -1665,7 +1671,7 @@  static netdev_tx_t m_can_tx_handler(struct m_can_classdev *cdev)
 		}
 
 		/* get put index for frame */
-		putidx = FIELD_GET(TXFQS_TFQPI_MASK, txfqs);
+		putidx = cdev->tx_fifo_putidx;
 
 		/* Construct DLC Field, with CAN-FD configuration.
 		 * Use the put index of the fifo as the message marker,
@@ -1699,9 +1705,12 @@  static netdev_tx_t m_can_tx_handler(struct m_can_classdev *cdev)
 
 		/* Enable TX FIFO element to start transfer  */
 		m_can_write(cdev, M_CAN_TXBAR, (1 << putidx));
+		++cdev->tx_fifo_in_flight;
+		cdev->tx_fifo_putidx = (++cdev->tx_fifo_putidx >= cdev->can.echo_skb_max ?
+					0 : cdev->tx_fifo_putidx);
 
 		/* stop network queue if fifo full */
-		if (m_can_tx_fifo_full(cdev) ||
+		if (cdev->tx_fifo_in_flight >= cdev->tx_fifo_submit_limit ||
 		    m_can_next_echo_skb_occupied(dev, putidx))
 			netif_stop_queue(dev);
 		else if (cdev->is_peripheral && !cdev->tx_skb && netif_queue_stopped(dev))
diff --git a/drivers/net/can/m_can/m_can.h b/drivers/net/can/m_can/m_can.h
index 4c0267f9f297..7464ce56753a 100644
--- a/drivers/net/can/m_can/m_can.h
+++ b/drivers/net/can/m_can/m_can.h
@@ -92,6 +92,11 @@  struct m_can_classdev {
 	int pm_clock_support;
 	int is_peripheral;
 
+	// Store this internally to avoid fetch delays on peripheral chips
+	int tx_fifo_putidx;
+	int tx_fifo_in_flight;
+	int tx_fifo_submit_limit;
+
 	struct mram_cfg mcfg[MRAM_CFG_NUM];
 };