[net-next,2/2] net: lan966x: Stop using packing library

Message ID 20230312202424.1495439-3-horatiu.vultur@microchip.com
State New
Series net: lan966x: Improve TX/RX of frames from/to CPU

Commit Message

Horatiu Vultur March 12, 2023, 8:24 p.m. UTC
When a frame is injected from the CPU, an IFH (inter-frame header) must
be created and placed in front of the transmitted frame. The IFH
contains fields such as the destination port, the analyzer bypass flag
and the priority. lan966x uses the packing library to set and get these
fields, but this turns out to be an expensive operation.
Replacing it with a simpler implementation improves RX by ~5 Mbit/s,
while TX improves much more because more fields have to be set. Below
are the numbers for TX.

Before:
[  5]   0.00-10.02  sec   439 MBytes   367 Mbits/sec    0 sender

After:
[  5]   0.00-10.00  sec   563 MBytes   472 Mbits/sec    0 sender

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
---
 .../net/ethernet/microchip/lan966x/Kconfig    |  1 -
 .../ethernet/microchip/lan966x/lan966x_main.c | 75 +++++++++++++------
 2 files changed, 51 insertions(+), 25 deletions(-)
  

Comments

David Laight March 13, 2023, 5:04 p.m. UTC | #1
From: Horatiu Vultur
> Sent: 12 March 2023 20:24
> 
> When a frame is injected from the CPU, an IFH (inter-frame header) must
> be created and placed in front of the transmitted frame. The IFH
> contains fields such as the destination port, the analyzer bypass flag
> and the priority. lan966x uses the packing library to set and get these
> fields, but this turns out to be an expensive operation.
> Replacing it with a simpler implementation improves RX by ~5 Mbit/s,
> while TX improves much more because more fields have to be set. Below
> are the numbers for TX.
...
> +static void lan966x_ifh_set(u8 *ifh, size_t val, size_t pos, size_t length)
> +{
> +	u32 v = 0;
> +
> +	for (int i = 0; i < length ; i++) {
> +		int j = pos + i;
> +		int k = j % 8;
> +
> +		if (i == 0 || k == 0)
> +			v = ifh[IFH_LEN_BYTES - (j / 8) - 1];
> +
> +		if (val & (1 << i))
> +			v |= (1 << k);
> +
> +		if (i == (length - 1) || k == 7)
> +			ifh[IFH_LEN_BYTES - (j / 8) - 1] = v;
> +	}
> +}
> +

It has to be possible to do much better than that.
Given that 'pos' and 'length' are always constants it looks like
each call should reduce to (something like):
	ifh[k] |= val << n;
	ifh[k + 1] |= val >> (8 - n);
	...
It might be that the compiler manages to do this, but I doubt it.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
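
A minimal user-space sketch of the byte-wise approach suggested above, not
the code that was eventually merged. It assumes the layout implied by the
patch: bit position p of the header lives in byte IFH_LEN_BYTES - p / 8 - 1,
at bit p % 8. IFH_LEN_BYTES = 28 is an assumption, and ifh_set_bitwise and
ifh_set_bytewise are hypothetical names; the bit-wise variant follows the
indexing of lan966x_ifh_set and is kept only as a reference to compare
against.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IFH_LEN_BYTES 28	/* assumption: 7 words of 4 bytes */

/* Reference: one bit at a time, same indexing as the loop in the patch. */
static void ifh_set_bitwise(uint8_t *ifh, uint64_t val, size_t pos, size_t length)
{
	assert(length <= 64 && pos + length <= IFH_LEN_BYTES * 8);

	for (size_t i = 0; i < length; i++) {
		size_t j = pos + i;

		if ((val >> i) & 1)
			ifh[IFH_LEN_BYTES - j / 8 - 1] |= (uint8_t)(1u << (j % 8));
	}
}

/* Byte-wise: each iteration ORs up to 8 bits into one header byte. */
static void ifh_set_bytewise(uint8_t *ifh, uint64_t val, size_t pos, size_t length)
{
	size_t done = 0;	/* number of bits of val already stored */

	assert(length <= 64 && pos + length <= IFH_LEN_BYTES * 8);

	while (done < length) {
		size_t bit = (pos + done) % 8;	/* first bit used in this byte */
		size_t n = 8 - bit;		/* bits left in this byte */
		uint8_t chunk;

		if (n > length - done)
			n = length - done;

		/* Take the next n bits of val and place them at 'bit'. */
		chunk = (uint8_t)((val >> done) & ((1u << n) - 1));
		ifh[IFH_LEN_BYTES - (pos + done) / 8 - 1] |= (uint8_t)(chunk << bit);
		done += n;
	}
}
```

Both helpers only OR bits in, as the loop in the patch does, so the header
is expected to be zeroed before any field is set. A 12-bit field at
position 13, for example, touches three bytes instead of twelve single bits.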
  
Jakub Kicinski March 13, 2023, 10:18 p.m. UTC | #2
On Mon, 13 Mar 2023 17:04:11 +0000 David Laight wrote:
> It has to be possible to do much better than that.
> Given that 'pos' and 'length' are always constants it looks like
> each call should reduce to (something like):
> 	ifh[k] |= val << n;
> 	ifh[k + 1] |= val >> (8 - n);
> 	...
> It might be that the compiler manages to do this, but I doubt it.

Agreed, going bit-by-bit seems overly cautious.
  
Horatiu Vultur March 15, 2023, 1:33 p.m. UTC | #3
The 03/13/2023 17:04, David Laight wrote:
> 
> From: Horatiu Vultur
> > Sent: 12 March 2023 20:24
> >
> > When a frame is injected from the CPU, an IFH (inter-frame header) must
> > be created and placed in front of the transmitted frame. The IFH
> > contains fields such as the destination port, the analyzer bypass flag
> > and the priority. lan966x uses the packing library to set and get these
> > fields, but this turns out to be an expensive operation.
> > Replacing it with a simpler implementation improves RX by ~5 Mbit/s,
> > while TX improves much more because more fields have to be set. Below
> > are the numbers for TX.
> ...
> > +static void lan966x_ifh_set(u8 *ifh, size_t val, size_t pos, size_t length)
> > +{
> > +     u32 v = 0;
> > +
> > +     for (int i = 0; i < length ; i++) {
> > +             int j = pos + i;
> > +             int k = j % 8;
> > +
> > +             if (i == 0 || k == 0)
> > +                     v = ifh[IFH_LEN_BYTES - (j / 8) - 1];
> > +
> > +             if (val & (1 << i))
> > +                     v |= (1 << k);
> > +
> > +             if (i == (length - 1) || k == 7)
> > +                     ifh[IFH_LEN_BYTES - (j / 8) - 1] = v;
> > +     }
> > +}
> > +
> 
> It has to be possible to do much better than that.
> Given that 'pos' and 'length' are always constants it looks like
> each call should reduce to (something like):
>         ifh[k] |= val << n;
>         ifh[k + 1] |= val >> (8 - n);
>         ...
> It might be that the compiler manages to do this, but I doubt it.

Thanks for the review. I will update this in the next version.

Do you think it is worth updating the code in lan966x_ifh_get to use
byte access instead of reading each bit individually? There is not much
improvement on the RX side, which is the one using lan966x_ifh_get.
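
For the question above, a byte-access variant of the getter could look
roughly like the following user-space sketch. It assumes the same layout as
the bit-by-bit loop in the patch (bit position p lives in byte
IFH_LEN_BYTES - p / 8 - 1, at bit p % 8). IFH_LEN_BYTES = 28 and the name
ifh_get_bytewise are assumptions, not driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IFH_LEN_BYTES 28	/* assumption: 7 words of 4 bytes */

/* Read 'length' bits starting at bit 'pos', up to 8 bits per iteration. */
static uint64_t ifh_get_bytewise(const uint8_t *ifh, size_t pos, size_t length)
{
	uint64_t val = 0;
	size_t done = 0;	/* number of bits already collected */

	assert(length <= 64 && pos + length <= IFH_LEN_BYTES * 8);

	while (done < length) {
		size_t bit = (pos + done) % 8;	/* first bit used in this byte */
		size_t n = 8 - bit;		/* bits left in this byte */
		uint8_t byte = ifh[IFH_LEN_BYTES - (pos + done) / 8 - 1];
		uint64_t chunk;

		if (n > length - done)
			n = length - done;

		/* Drop the bits below 'bit' and mask off those above the field. */
		chunk = (uint64_t)((byte >> bit) & ((1u << n) - 1));
		val |= chunk << done;
		done += n;
	}

	return val;
}
```

Each chunk is widened to 64 bits before it is shifted into place, so a
32-bit field such as a timestamp is assembled without sign extension.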

> 
>         David
> 
  

Patch

diff --git a/drivers/net/ethernet/microchip/lan966x/Kconfig b/drivers/net/ethernet/microchip/lan966x/Kconfig
index 8bcd60f17d6d3..571e6d4da1e9d 100644
--- a/drivers/net/ethernet/microchip/lan966x/Kconfig
+++ b/drivers/net/ethernet/microchip/lan966x/Kconfig
@@ -6,7 +6,6 @@  config LAN966X_SWITCH
 	depends on NET_SWITCHDEV
 	depends on BRIDGE || BRIDGE=n
 	select PHYLINK
-	select PACKING
 	select PAGE_POOL
 	select VCAP
 	help
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
index 4584a78c6ecbd..9134716b62a55 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
@@ -7,7 +7,6 @@ 
 #include <linux/ip.h>
 #include <linux/of_platform.h>
 #include <linux/of_net.h>
-#include <linux/packing.h>
 #include <linux/phy/phy.h>
 #include <linux/reset.h>
 #include <net/addrconf.h>
@@ -305,46 +304,58 @@  static int lan966x_port_ifh_xmit(struct sk_buff *skb,
 	return NETDEV_TX_BUSY;
 }
 
+static void lan966x_ifh_set(u8 *ifh, size_t val, size_t pos, size_t length)
+{
+	u32 v = 0;
+
+	for (int i = 0; i < length ; i++) {
+		int j = pos + i;
+		int k = j % 8;
+
+		if (i == 0 || k == 0)
+			v = ifh[IFH_LEN_BYTES - (j / 8) - 1];
+
+		if (val & (1 << i))
+			v |= (1 << k);
+
+		if (i == (length - 1) || k == 7)
+			ifh[IFH_LEN_BYTES - (j / 8) - 1] = v;
+	}
+}
+
 void lan966x_ifh_set_bypass(void *ifh, u64 bypass)
 {
-	packing(ifh, &bypass, IFH_POS_BYPASS + IFH_WID_BYPASS - 1,
-		IFH_POS_BYPASS, IFH_LEN * 4, PACK, 0);
+	lan966x_ifh_set(ifh, bypass, IFH_POS_BYPASS, IFH_WID_BYPASS);
 }
 
-void lan966x_ifh_set_port(void *ifh, u64 bypass)
+void lan966x_ifh_set_port(void *ifh, u64 port)
 {
-	packing(ifh, &bypass, IFH_POS_DSTS + IFH_WID_DSTS - 1,
-		IFH_POS_DSTS, IFH_LEN * 4, PACK, 0);
+	lan966x_ifh_set(ifh, port, IFH_POS_DSTS, IFH_WID_DSTS);
 }
 
-static void lan966x_ifh_set_qos_class(void *ifh, u64 bypass)
+static void lan966x_ifh_set_qos_class(void *ifh, u64 qos)
 {
-	packing(ifh, &bypass, IFH_POS_QOS_CLASS + IFH_WID_QOS_CLASS - 1,
-		IFH_POS_QOS_CLASS, IFH_LEN * 4, PACK, 0);
+	lan966x_ifh_set(ifh, qos, IFH_POS_QOS_CLASS, IFH_WID_QOS_CLASS);
 }
 
-static void lan966x_ifh_set_ipv(void *ifh, u64 bypass)
+static void lan966x_ifh_set_ipv(void *ifh, u64 ipv)
 {
-	packing(ifh, &bypass, IFH_POS_IPV + IFH_WID_IPV - 1,
-		IFH_POS_IPV, IFH_LEN * 4, PACK, 0);
+	lan966x_ifh_set(ifh, ipv, IFH_POS_IPV, IFH_WID_IPV);
 }
 
 static void lan966x_ifh_set_vid(void *ifh, u64 vid)
 {
-	packing(ifh, &vid, IFH_POS_TCI + IFH_WID_TCI - 1,
-		IFH_POS_TCI, IFH_LEN * 4, PACK, 0);
+	lan966x_ifh_set(ifh, vid, IFH_POS_TCI, IFH_WID_TCI);
 }
 
 static void lan966x_ifh_set_rew_op(void *ifh, u64 rew_op)
 {
-	packing(ifh, &rew_op, IFH_POS_REW_CMD + IFH_WID_REW_CMD - 1,
-		IFH_POS_REW_CMD, IFH_LEN * 4, PACK, 0);
+	lan966x_ifh_set(ifh, rew_op, IFH_POS_REW_CMD, IFH_WID_REW_CMD);
 }
 
 static void lan966x_ifh_set_timestamp(void *ifh, u64 timestamp)
 {
-	packing(ifh, &timestamp, IFH_POS_TIMESTAMP + IFH_WID_TIMESTAMP - 1,
-		IFH_POS_TIMESTAMP, IFH_LEN * 4, PACK, 0);
+	lan966x_ifh_set(ifh, timestamp, IFH_POS_TIMESTAMP, IFH_WID_TIMESTAMP);
 }
 
 static netdev_tx_t lan966x_port_xmit(struct sk_buff *skb,
@@ -582,22 +593,38 @@  static int lan966x_rx_frame_word(struct lan966x *lan966x, u8 grp, u32 *rval)
 	}
 }
 
+static u64 lan966x_ifh_get(u8 *ifh, size_t pos, size_t length)
+{
+	u64 val = 0;
+	u8 v;
+
+	for (int i = 0; i < length ; i++) {
+		int j = pos + i;
+		int k = j % 8;
+
+		if (i == 0 || k == 0)
+			v = ifh[IFH_LEN_BYTES - (j / 8) - 1];
+
+		if (v & (1 << k))
+			val |= (1ULL << i);
+	}
+
+	return val;
+}
+
 void lan966x_ifh_get_src_port(void *ifh, u64 *src_port)
 {
-	packing(ifh, src_port, IFH_POS_SRCPORT + IFH_WID_SRCPORT - 1,
-		IFH_POS_SRCPORT, IFH_LEN * 4, UNPACK, 0);
+	*src_port = lan966x_ifh_get(ifh, IFH_POS_SRCPORT, IFH_WID_SRCPORT);
 }
 
 static void lan966x_ifh_get_len(void *ifh, u64 *len)
 {
-	packing(ifh, len, IFH_POS_LEN + IFH_WID_LEN - 1,
-		IFH_POS_LEN, IFH_LEN * 4, UNPACK, 0);
+	*len = lan966x_ifh_get(ifh, IFH_POS_LEN, IFH_WID_LEN);
 }
 
 void lan966x_ifh_get_timestamp(void *ifh, u64 *timestamp)
 {
-	packing(ifh, timestamp, IFH_POS_TIMESTAMP + IFH_WID_TIMESTAMP - 1,
-		IFH_POS_TIMESTAMP, IFH_LEN * 4, UNPACK, 0);
+	*timestamp = lan966x_ifh_get(ifh, IFH_POS_TIMESTAMP, IFH_WID_TIMESTAMP);
 }
 
 static irqreturn_t lan966x_xtr_irq_handler(int irq, void *args)