[net,v2] net: dsa: tag_rtl4_a: Bump min packet size

Message ID 20231030-fix-rtl8366rb-v2-1-e66e1ef7dbd2@linaro.org
State New
Headers
Series [net,v2] net: dsa: tag_rtl4_a: Bump min packet size |

Commit Message

Linus Walleij Oct. 30, 2023, 9:26 a.m. UTC
  It was reported that the "LuCI" web UI was not working properly
with a device using the RTL8366RB switch. Disabling the egress
port tagging code made the switch work again, but this is not
a good solution as we want to be able to direct traffic to a
certain port.

It turns out that packets between 1496 and 1500 bytes are
dropped unless padded out to 1518 bytes.

If we pad the ethernet frames to a minimum of ETH_FRAME_LEN + FCS
(1518 bytes) everything starts working fine.

1496 is the size of a normal data frame minus the internal DSA
tag. The number is intuitive, the explanation evades me.

As we completely lack datasheet or any insight into how this
switch works, this trial-and-error solution is the best we
can do. FWIW this bug may very well be the reason why Realteks
code drops are not using CPU tagging. The vendor routers using
this switch are all hardwired to disable CPU tagging and all
packets are pushed to all ports on TX. This is also the case
in the old OpenWrt driver, derived from the vendor code drops.

I have tested smaller sizes, only 1518 or bigger padding works.

Modifying the MTU on the switch (one setting affecting all
ports) has no effect.

Before this patch:

PING 192.168.1.1 (192.168.1.1) 1470(1498) bytes of data.
^C
--- 192.168.1.1 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1048ms

PING 192.168.1.1 (192.168.1.1) 1472(1500) bytes of data.
^C
--- 192.168.1.1 ping statistics ---
12 packets transmitted, 0 received, 100% packet loss, time 11267ms

After this patch:

PING 192.168.1.1 (192.168.1.1) 1470(1498) bytes of data.
1478 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.533 ms
1478 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=0.630 ms

PING 192.168.1.1 (192.168.1.1) 1472(1500) bytes of data.
1480 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.527 ms
1480 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=0.562 ms

Also LuCI starts working with the 1500 bytes frames it produces
from the HTTP server.

Fixes: 9eb8bc593a5e ("net: dsa: tag_rtl4_a: fix egress tags")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
Changes in v2:
- Pad packages >= 1496 bytes after some further tests
- Use more to-the-point description
- Provide ping results in the commit message
- Link to v1: https://lore.kernel.org/r/20231027-fix-rtl8366rb-v1-1-d565d905535a@linaro.org
---
 net/dsa/tag_rtl4_a.c | 10 ++++++++++
 1 file changed, 10 insertions(+)


---
base-commit: d9e164e4199bc465b3540d5fe3ffc8a9a4fc6157
change-id: 20231027-fix-rtl8366rb-e752bd5901ca

Best regards,
  

Comments

Vladimir Oltean Oct. 30, 2023, 2:16 p.m. UTC | #1
On Mon, Oct 30, 2023 at 10:26:50AM +0100, Linus Walleij wrote:
> It was reported that the "LuCI" web UI was not working properly
> with a device using the RTL8366RB switch. Disabling the egress
> port tagging code made the switch work again, but this is not
> a good solution as we want to be able to direct traffic to a
> certain port.
> 
> It turns out that packets between 1496 and 1500 bytes are
> dropped unless padded out to 1518 bytes.
> 
> If we pad the ethernet frames to a minimum of ETH_FRAME_LEN + FCS
> (1518 bytes) everything starts working fine.
> 
> 1496 is the size of a normal data frame minus the internal DSA
> tag. The number is intuitive, the explanation evades me.
> 
> As we completely lack datasheet or any insight into how this
> switch works, this trial-and-error solution is the best we
> can do. FWIW this bug may very well be the reason why Realteks
> code drops are not using CPU tagging. The vendor routers using
> this switch are all hardwired to disable CPU tagging and all
> packets are pushed to all ports on TX. This is also the case
> in the old OpenWrt driver, derived from the vendor code drops.
> 
> I have tested smaller sizes, only 1518 or bigger padding works.
> 
> Modifying the MTU on the switch (one setting affecting all
> ports) has no effect.
> 
> Before this patch:
> 
> PING 192.168.1.1 (192.168.1.1) 1470(1498) bytes of data.
> ^C
> --- 192.168.1.1 ping statistics ---
> 2 packets transmitted, 0 received, 100% packet loss, time 1048ms
> 
> PING 192.168.1.1 (192.168.1.1) 1472(1500) bytes of data.
> ^C
> --- 192.168.1.1 ping statistics ---
> 12 packets transmitted, 0 received, 100% packet loss, time 11267ms
> 
> After this patch:
> 
> PING 192.168.1.1 (192.168.1.1) 1470(1498) bytes of data.
> 1478 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.533 ms
> 1478 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=0.630 ms
> 
> PING 192.168.1.1 (192.168.1.1) 1472(1500) bytes of data.
> 1480 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.527 ms
> 1480 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=0.562 ms
> 
> Also LuCI starts working with the 1500 bytes frames it produces
> from the HTTP server.
> 
> Fixes: 9eb8bc593a5e ("net: dsa: tag_rtl4_a: fix egress tags")
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
> ---
> Changes in v2:
> - Pad packages >= 1496 bytes after some further tests
> - Use more to-the-point description
> - Provide ping results in the commit message
> - Link to v1: https://lore.kernel.org/r/20231027-fix-rtl8366rb-v1-1-d565d905535a@linaro.org
> ---
>  net/dsa/tag_rtl4_a.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/net/dsa/tag_rtl4_a.c b/net/dsa/tag_rtl4_a.c
> index c327314b95e3..3292bc49b158 100644
> --- a/net/dsa/tag_rtl4_a.c
> +++ b/net/dsa/tag_rtl4_a.c
> @@ -45,6 +45,16 @@ static struct sk_buff *rtl4a_tag_xmit(struct sk_buff *skb,
>  	if (unlikely(__skb_put_padto(skb, ETH_ZLEN, false)))
>  		return NULL;
>  
> +	/* Packets over 1496 bytes get dropped unless they get padded
> +	 * out to 1518 bytes. 1496 is ETH_DATA_LEN - tag which is hardly
> +	 * a coinicidence, and 1518 is ETH_FRAME_LEN + FCS so we define
> +	 * the threshold size and padding like this.
> +	 */
> +	if (skb->len >= (ETH_DATA_LEN - RTL4_A_HDR_LEN)) {
> +		if (unlikely(__skb_put_padto(skb, ETH_FRAME_LEN + ETH_FCS_LEN, false)))
> +			return NULL;
> +	}

De-obfuscating the macros:

	if (skb->len >= 1496)
		__skb_put_padto(skb, 1518, false);

	(...)

	skb_push(skb, 4);

which means that here, skb->len will be 1522, if it was originally 1496.
So the code adds 26 extra octets, and only 4 of those are legitimate (a tag).
The rest is absolutely unexplained, which means that until there is a
valid explanation for them:

pw-bot: cr

(sorry, but if it works and we don't know why it works, then at some
point it will break and we won't know why it stopped working)

In the discussion on the previous thread:
https://lore.kernel.org/netdev/CACRpkdbq03ZXcB-TaBp5Udo3M47rb-o+LfkEkC-gA1+=x1Zd-g@mail.gmail.com/

you said that what increments is Dot1dTpPortInDiscards. 802.1Q-2018 says
about it: "Count of received valid frames that were discarded (i.e.,
filtered) by the Forwarding Process." which is odd enough to me, since
packets sent by rtl4a_tag_xmit() should *not* be processed by the forwarding
layer of the switch, but rather, force-delivered to the specified egress
port.

Could you please try to revert the effect of commit 339133f6c318 ("net:
dsa: tag_rtl4_a: Drop bit 9 from egress frames") by setting that bit in
the egress tag again? Who knows, maybe it is the bit which tells the
switch to bypass the forwarding process. Or maybe there is a bit in a
different position which does this. You could try to fill in all bits in
unknown positions, check that there are no regressions with packet sizes
< 1496, and then see if that made any changes to packet sizes >= 1496.
If it did, try to see which bit made the difference.

> +
>  	netdev_dbg(dev, "add realtek tag to package to port %d\n",
>  		   dp->index);
>  	skb_push(skb, RTL4_A_HDR_LEN);
> 
> ---
> base-commit: d9e164e4199bc465b3540d5fe3ffc8a9a4fc6157
> change-id: 20231027-fix-rtl8366rb-e752bd5901ca
> 
> Best regards,
> -- 
> Linus Walleij <linus.walleij@linaro.org>
>
  
Linus Walleij Oct. 30, 2023, 2:37 p.m. UTC | #2
On Mon, Oct 30, 2023 at 3:16 PM Vladimir Oltean <olteanv@gmail.com> wrote:

> Could you please try to revert the effect of commit 339133f6c318 ("net:
> dsa: tag_rtl4_a: Drop bit 9 from egress frames") by setting that bit in
> the egress tag again? Who knows, maybe it is the bit which tells the
> switch to bypass the forwarding process.

I have already tried that, it was one of the first things I tried,
just looking over the git log and looking for usual suspects.

Sadly it has no effect whatsoever, the problem persists :(

> Or maybe there is a bit in a
> different position which does this. You could try to fill in all bits in
> unknown positions, check that there are no regressions with packet sizes
> < 1496, and then see if that made any changes to packet sizes >= 1496.
> If it did, try to see which bit made the difference.

Hehe now we're talking :D

I did something similar before, I think just switching a different bit
every 10 packets or so and running a persistent ping until it succeeds
or not.

I'll see what I can come up with.

Yours,
Linus Walleij
  
Vladimir Oltean Oct. 30, 2023, 3:23 p.m. UTC | #3
On Mon, Oct 30, 2023 at 03:37:24PM +0100, Linus Walleij wrote:
> On Mon, Oct 30, 2023 at 3:16 PM Vladimir Oltean <olteanv@gmail.com> wrote:
> 
> > Could you please try to revert the effect of commit 339133f6c318 ("net:
> > dsa: tag_rtl4_a: Drop bit 9 from egress frames") by setting that bit in
> > the egress tag again? Who knows, maybe it is the bit which tells the
> > switch to bypass the forwarding process.
> 
> I have already tried that, it was one of the first things I tried,
> just looking over the git log and looking for usual suspects.
> 
> Sadly it has no effect whatsoever, the problem persists :(
> 
> > Or maybe there is a bit in a
> > different position which does this. You could try to fill in all bits in
> > unknown positions, check that there are no regressions with packet sizes
> > < 1496, and then see if that made any changes to packet sizes >= 1496.
> > If it did, try to see which bit made the difference.
> 
> Hehe now we're talking :D
> 
> I did something similar before, I think just switching a different bit
> every 10 packets or so and running a persistent ping until it succeeds
> or not.
> 
> I'll see what I can come up with.
> 
> Yours,
> Linus Walleij

And the drop reason in ethtool -S also stays unchanged despite all the
extra bits set in the tag? It still behaves as if the packet goes
through the forwarding path?
  
Vladimir Oltean Oct. 30, 2023, 3:30 p.m. UTC | #4
On Mon, Oct 30, 2023 at 05:23:25PM +0200, Vladimir Oltean wrote:
> On Mon, Oct 30, 2023 at 03:37:24PM +0100, Linus Walleij wrote:
> > On Mon, Oct 30, 2023 at 3:16 PM Vladimir Oltean <olteanv@gmail.com> wrote:
> > 
> > > Could you please try to revert the effect of commit 339133f6c318 ("net:
> > > dsa: tag_rtl4_a: Drop bit 9 from egress frames") by setting that bit in
> > > the egress tag again? Who knows, maybe it is the bit which tells the
> > > switch to bypass the forwarding process.
> > 
> > I have already tried that, it was one of the first things I tried,
> > just looking over the git log and looking for usual suspects.
> > 
> > Sadly it has no effect whatsoever, the problem persists :(
> > 
> > > Or maybe there is a bit in a
> > > different position which does this. You could try to fill in all bits in
> > > unknown positions, check that there are no regressions with packet sizes
> > > < 1496, and then see if that made any changes to packet sizes >= 1496.
> > > If it did, try to see which bit made the difference.
> > 
> > Hehe now we're talking :D
> > 
> > I did something similar before, I think just switching a different bit
> > every 10 packets or so and running a persistent ping until it succeeds
> > or not.
> > 
> > I'll see what I can come up with.
> > 
> > Yours,
> > Linus Walleij
> 
> And the drop reason in ethtool -S also stays unchanged despite all the
> extra bits set in the tag? It still behaves as if the packet goes
> through the forwarding path?

Could you please place these skb_dump() calls before and after the magic
__skb_put_padto() call, for us to see if anything unexpected changes?
Maybe the socket buffers have some unusual geometry which the conduit
interface doesn't like, for some reason.

The number of skb dumps that you provide back should be even, and
ideally the first one should be the unaltered packet, to avoid confusion :)

diff --git a/net/dsa/tag_rtl4_a.c b/net/dsa/tag_rtl4_a.c
index 25238093686c..2ca189b5125e 100644
--- a/net/dsa/tag_rtl4_a.c
+++ b/net/dsa/tag_rtl4_a.c
@@ -41,18 +41,22 @@ static struct sk_buff *rtl4a_tag_xmit(struct sk_buff *skb,
 	u8 *tag;
 	u16 out;
 
-	/* Pad out to at least 60 bytes */
-	if (unlikely(__skb_put_padto(skb, ETH_ZLEN, false)))
-		return NULL;
-
 	/* Packets over 1496 bytes get dropped unless they get padded
 	 * out to 1518 bytes. 1496 is ETH_DATA_LEN - tag which is hardly
 	 * a coinicidence, and 1518 is ETH_FRAME_LEN + FCS so we define
 	 * the threshold size and padding like this.
 	 */
 	if (skb->len >= (ETH_DATA_LEN - RTL4_A_HDR_LEN)) {
+		skb_dump(KERN_ERR, skb, true);
+
 		if (unlikely(__skb_put_padto(skb, ETH_FRAME_LEN + ETH_FCS_LEN, false)))
 			return NULL;
+
+		skb_dump(KERN_ERR, skb, true);
+	} else {
+		/* Pad out to at least 60 bytes */
+		if (unlikely(__skb_put_padto(skb, ETH_ZLEN, false)))
+			return NULL;
 	}
 
 	netdev_dbg(dev, "add realtek tag to package to port %d\n",
  
Linus Walleij Oct. 30, 2023, 9:50 p.m. UTC | #5
On Mon, Oct 30, 2023 at 3:16 PM Vladimir Oltean <olteanv@gmail.com> wrote:

> which means that here, skb->len will be 1522, if it was originally 1496.
> So the code adds 26 extra octets, and only 4 of those are legitimate (a tag).

Yeah I know :/

> The rest is absolutely unexplained, which means that until there is a
> valid explanation for them:
>
> pw-bot: cr
>
> (sorry, but if it works and we don't know why it works, then at some
> point it will break and we won't know why it stopped working)

Yeah it broke now and we don't know why...

> you said that what increments is Dot1dTpPortInDiscards. 802.1Q-2018 says
> about it: "Count of received valid frames that were discarded (i.e.,
> filtered) by the Forwarding Process." which is odd enough to me, since
> packets sent by rtl4a_tag_xmit() should *not* be processed by the forwarding
> layer of the switch, but rather, force-delivered to the specified egress
> port.

No this was a coincidence, we can rule this out. There are always
a few (2-3) Dot1dTpPortInDiscards on the switch port when it
is connected, sorry for getting this wrong :/

What happens is way more disturbing: packets are dropped
*silently* if not padded.

I added the following patch:

@@ -37,6 +37,8 @@ static struct sk_buff *rtl4a_tag_xmit(struct sk_buff *skb,
                                      struct net_device *dev)
 {
        struct dsa_port *dp = dsa_slave_to_port(dev);
+       static u16 mask = BIT(6);
+       static int cnt = 0;
        __be16 *p;
        u8 *tag;
        u16 out;
@@ -60,6 +62,19 @@ static struct sk_buff *rtl4a_tag_xmit(struct sk_buff *skb,
        /* The lower bits indicate the port number */
        out |= BIT(dp->index);

+       if (skb->len >= (ETH_DATA_LEN - RTL4_A_HDR_LEN)) {
+               /* Test bits... */
+               out |= mask;
+               netdev_info(dev, "add mask %04x to big package\n", mask);
+               cnt ++;
+               if (cnt == 10) {
+                       cnt = 0;
+                       mask <<= 1;
+                       if (mask == BIT(15))
+                               mask = BIT(6);
+               }
+       }
+
        p = (__be16 *)(tag + 2);
        *p = htons(out);

This loops over all the bits not used by the port mask and test them
one by one to see if any of them help.

Then ran a few rounds of ping -s 1472 and ping -s 1470.

There are console prints:

realtek-smi switch lan0: add mask 0040 to big package
realtek-smi switch lan0: add mask 0040 to big package
realtek-smi switch lan0: add mask 0040 to big package
(...)

Then bits 6,7,8,9,10,11,12,13,14 and 15 are tested in succession.

No error counters increase in ethtool -S lan0.

I can see the big packets leave the eth0 interface
(from ethtool -S eth0)

     p05_EtherStatsPkts1024to1518Octe: 370

But they do not appear in the targeted switch port stats:

     EtherStatsPkts1024to1518Octets: 22

(these 22 are some unrelated -s 1400 packets I sent to test)

Yours,
Linus Walleij
  
Vladimir Oltean Oct. 30, 2023, 10:20 p.m. UTC | #6
On Mon, Oct 30, 2023 at 10:50:31PM +0100, Linus Walleij wrote:
> > you said that what increments is Dot1dTpPortInDiscards
> No this was a coincidence, we can rule this out.

I see commit 86dd9868b878 ("net: dsa: tag_rtl4_a: Support also egress tags")
also mentions: "Qingfang came up with the solution: we need to pad the
ethernet frame to 60 bytes using eth_skb_pad(), then the switch will
happily accept frames with custom tags.". So the __skb_put_padto() was
something very empirical in the first place.

Since it's all problematic, would you mind removing the __skb_put_padto()
altogether from rtl4a_tag_xmit(), and let me know what is the output for
the following sweep through packet sizes? I truly wonder if it's just
for small and large packets that we see packet drops, or if it's something
repetitive throughout the range as well.

for size in $(seq 0 1476); do if ping 10.0.0.56 -s $size -W 1 -c 1 -q >/dev/null; then echo "$((size + 42)): OK"; else echo "$((size + 42)): NOK"; fi; done
  
Linus Walleij Oct. 30, 2023, 10:57 p.m. UTC | #7
On Mon, Oct 30, 2023 at 11:20 PM Vladimir Oltean <olteanv@gmail.com> wrote:

> I see commit 86dd9868b878 ("net: dsa: tag_rtl4_a: Support also egress tags")
> also mentions: "Qingfang came up with the solution: we need to pad the
> ethernet frame to 60 bytes using eth_skb_pad(), then the switch will
> happily accept frames with custom tags.". So the __skb_put_padto() was
> something very empirical in the first place.
>
> Since it's all problematic, would you mind removing the __skb_put_padto()
> altogether from rtl4a_tag_xmit(), and let me know what is the output for
> the following sweep through packet sizes? I truly wonder if it's just
> for small and large packets that we see packet drops, or if it's something
> repetitive throughout the range as well.
>
> for size in $(seq 0 1476); do if ping 10.0.0.56 -s $size -W 1 -c 1 -q >/dev/null; then echo "$((size + 42)): OK"; else echo "$((size + 42)): NOK"; fi; done

The weird thing is that if I remove the __skb_put_padto()
call, ping doesn't work at all. Somehow the packets are
corrupted, because they sure get out of the switch and I
can see them arriving with tcpdump on the host.

root@OpenWrt:/# for size in $(seq 0 1476); do if ping 192.168.1.137 -s $size -W
1 -c 1 -q >/dev/null; then echo "$((size + 42)): OK"; else echo "$((size + 42)):
 NOK"; fi; done
42: NOK
43: NOK
44: NOK
45: NOK
46: NOK
47: NOK
48: NOK
49: NOK
50: NOK
51: NOK
(...)
1509: NOK
1510: NOK
1511: NOK
1512: NOK
1513: NOK
1514: NOK
1515: NOK
1516: NOK
1517: NOK
1518: NOK

This of course make no sense, since the padding function should do nothing
when the packet is bigger than 60 bytes.

So what we are seeing is some kind of side effect from the usage of
__skb_put_padto() I suppose? But I have no idea what that is, I looked
at the function and what it calls down to __skb_pad().

I'm testing skb_linearize(), which seems to be called on this path...

TCPdump on the host looks like this:
# tcpdump -i enp7s0
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on enp7s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
23:28:55.184019 IP _gateway > fedora: ICMP echo request, id 2461, seq
0, length 27
23:28:56.205294 IP _gateway > fedora: ICMP echo request, id 2462, seq
0, length 28
23:28:57.226495 IP _gateway > fedora: ICMP echo request, id 2463, seq
0, length 29
23:28:58.248013 IP _gateway > fedora: ICMP echo request, id 2464, seq
0, length 30
23:28:59.269157 IP _gateway > fedora: ICMP echo request, id 2465, seq
0, length 31
23:29:00.290443 IP _gateway > fedora: ICMP echo request, id 2466, seq
0, length 32
23:29:01.698700 IP _gateway > fedora: ICMP echo request, id 2467, seq
0, length 33
23:29:02.332131 IP _gateway > fedora: ICMP echo request, id 2468, seq
0, length 34
23:29:03.352442 IP _gateway > fedora: ICMP echo request, id 2469, seq
0, length 35
(...)
23:53:33.834706 IP _gateway > fedora: ICMP echo request, id 4000, seq
0, length 1475
23:53:34.854946 IP _gateway > fedora: ICMP echo request, id 4001, seq
0, length 1476
23:53:36.258777 IP truncated-ip - 1 bytes missing! _gateway > fedora:
ICMP echo request, id 4002, seq 0, length 1477
23:53:36.896654 IP truncated-ip - 2 bytes missing! _gateway > fedora:
ICMP echo request, id 4003, seq 0, length 1478
23:53:37.918022 IP truncated-ip - 3 bytes missing! _gateway > fedora:
ICMP echo request, id 4004, seq 0, length 1479
23:53:38.938355 IP truncated-ip - 4 bytes missing! _gateway > fedora:
ICMP echo request, id 4005, seq 0, length 1480
23:53:39.958451 IP truncated-ip - 4 bytes missing! _gateway > fedora:
ICMP echo request, id 4006, seq 0, length 1480
23:53:40.978598 IP truncated-ip - 4 bytes missing! _gateway > fedora:
ICMP echo request, id 4007, seq 0, length 1480
23:53:41.998991 IP truncated-ip - 4 bytes missing! _gateway > fedora:
ICMP echo request, id 4008, seq 0, length 1480
23:53:43.020309 IP truncated-ip - 4 bytes missing! _gateway > fedora:
ICMP echo request, id 4010, seq 0, length 1480

Here you can incidentally also see what happens if we don't pad the big packets:
the packet gets truncated.

Yours,
Linus Walleij
  
Vladimir Oltean Oct. 30, 2023, 11:09 p.m. UTC | #8
On Mon, Oct 30, 2023 at 11:57:33PM +0100, Linus Walleij wrote:
> This of course make no sense, since the padding function should do nothing
> when the packet is bigger than 60 bytes.

Indeed, this of course makes no sense. Ping doesn't work, or ARP doesn't
work? Could you add a static ARP entry for the 192.168.1.137 IP address?
  
Linus Walleij Oct. 30, 2023, 11:11 p.m. UTC | #9
On Tue, Oct 31, 2023 at 12:09 AM Vladimir Oltean <olteanv@gmail.com> wrote:
> On Mon, Oct 30, 2023 at 11:57:33PM +0100, Linus Walleij wrote:
> > This of course make no sense, since the padding function should do nothing
> > when the packet is bigger than 60 bytes.
>
> Indeed, this of course makes no sense. Ping doesn't work, or ARP doesn't
> work? Could you add a static ARP entry for the 192.168.1.137 IP address?

ARP appears to work, and also DHCP then (I reconnect and
restart the interface before each test), so it's really weird.

I'm trying your suggestion to skb_dump() before/after.

Yours,
Linus Walleij
  
Vladimir Oltean Oct. 30, 2023, 11:33 p.m. UTC | #10
On Mon, Oct 30, 2023 at 11:57:33PM +0100, Linus Walleij wrote:
> Here you can incidentally also see what happens if we don't pad the big packets:
> the packet gets truncated.

Are we sure we are debugging a switch problem? On how many platforms
with the RTL8366RB can the issue be seen? Is the conduit interface the
same on all these platforms, or is it different and that makes no
difference?
  
Andrew Lunn Oct. 31, 2023, 12:37 a.m. UTC | #11
On Tue, Oct 31, 2023 at 01:09:06AM +0200, Vladimir Oltean wrote:
> On Mon, Oct 30, 2023 at 11:57:33PM +0100, Linus Walleij wrote:
> > This of course make no sense, since the padding function should do nothing
> > when the packet is bigger than 60 bytes.
> 
> Indeed, this of course makes no sense. Ping doesn't work, or ARP doesn't
> work? Could you add a static ARP entry for the 192.168.1.137 IP address?

Probably the ARP, since they are short packets and probably need the
padding.

	Andrew
  
Florian Fainelli Oct. 31, 2023, 3:37 a.m. UTC | #12
On 10/30/2023 5:37 PM, Andrew Lunn wrote:
> On Tue, Oct 31, 2023 at 01:09:06AM +0200, Vladimir Oltean wrote:
>> On Mon, Oct 30, 2023 at 11:57:33PM +0100, Linus Walleij wrote:
>>> This of course make no sense, since the padding function should do nothing
>>> when the packet is bigger than 60 bytes.
>>
>> Indeed, this of course makes no sense. Ping doesn't work, or ARP doesn't
>> work? Could you add a static ARP entry for the 192.168.1.137 IP address?
> 
> Probably the ARP, since they are short packets and probably need the
> padding.

Does this also mean that there is a general problem with unicast 
packets, and that broadcast packets happen to work by accident? Could 
you check whether a broadcast ping (ping -b) size sweep is immune to 
what you are reporting?
  
Linus Walleij Oct. 31, 2023, 2:06 p.m. UTC | #13
On Tue, Oct 31, 2023 at 1:37 AM Andrew Lunn <andrew@lunn.ch> wrote:
> On Tue, Oct 31, 2023 at 01:09:06AM +0200, Vladimir Oltean wrote:
> > On Mon, Oct 30, 2023 at 11:57:33PM +0100, Linus Walleij wrote:
> > > This of course make no sense, since the padding function should do nothing
> > > when the packet is bigger than 60 bytes.
> >
> > Indeed, this of course makes no sense. Ping doesn't work, or ARP doesn't
> > work? Could you add a static ARP entry for the 192.168.1.137 IP address?
>
> Probably the ARP, since they are short packets and probably need the
> padding.

It seems correct: the reason ping stops working is not that the ping
isn't reaching the host, I can see the pin request on tcpdumps.

The reason is that the host has no idea where to send the reply.
Because it's not getting any ARP replies.

And that is probably because the ARP replies are short and needs
padding to ETH_ZLEN.

I notice the code is probably borrowed from tag_brcm.c which does
exactly the same thing (ETH_ZLEN + tag). It comes with this
explanation:

       /* The Ethernet switch we are interfaced with needs packets to
be at
         * least 64 bytes (including FCS) otherwise they will be
discarded when
         * they enter the switch port logic. When Broadcom tags are
enabled, we
         * need to make sure that packets are at least 68 bytes
         * (including FCS and tag) because the length verification is
done after
         * the Broadcom tag is stripped off the ingress packet.
         *
         * Let dsa_slave_xmit() free the SKB
         */

The switch fabric is dropping smaller packets when CPU tags
(DSA tags) are enabled.

So is the padding to ETH_ZLEN OK or should is happen elsewhere?

Yours,
Linus Walleij
  
Linus Walleij Oct. 31, 2023, 2:16 p.m. UTC | #14
On Tue, Oct 31, 2023 at 12:33 AM Vladimir Oltean <olteanv@gmail.com> wrote:
> On Mon, Oct 30, 2023 at 11:57:33PM +0100, Linus Walleij wrote:
> > Here you can incidentally also see what happens if we don't pad the big packets:
> > the packet gets truncated.
>
> Are we sure we are debugging a switch problem? On how many platforms
> with the RTL8366RB can the issue be seen? Is the conduit interface the
> same on all these platforms, or is it different and that makes no
> difference?

I don't have any other RTL8366RB systems than the D-Link DIR-685.

I however have several systems with the same backing ethernet controller
connected directly to a PHY and they all work fine.

Yours,
Linus Walleij
  
Florian Fainelli Oct. 31, 2023, 4:23 p.m. UTC | #15
On 10/31/23 07:06, Linus Walleij wrote:
> On Tue, Oct 31, 2023 at 1:37 AM Andrew Lunn <andrew@lunn.ch> wrote:
>> On Tue, Oct 31, 2023 at 01:09:06AM +0200, Vladimir Oltean wrote:
>>> On Mon, Oct 30, 2023 at 11:57:33PM +0100, Linus Walleij wrote:
>>>> This of course make no sense, since the padding function should do nothing
>>>> when the packet is bigger than 60 bytes.
>>>
>>> Indeed, this of course makes no sense. Ping doesn't work, or ARP doesn't
>>> work? Could you add a static ARP entry for the 192.168.1.137 IP address?
>>
>> Probably the ARP, since they are short packets and probably need the
>> padding.
> 
> It seems correct: the reason ping stops working is not that the ping
> isn't reaching the host, I can see the pin request on tcpdumps.
> 
> The reason is that the host has no idea where to send the reply.
> Because it's not getting any ARP replies.
> 
> And that is probably because the ARP replies are short and needs
> padding to ETH_ZLEN.
> 
> I notice the code is probably borrowed from tag_brcm.c which does
> exactly the same thing (ETH_ZLEN + tag). It comes with this
> explanation:
> 
>         /* The Ethernet switch we are interfaced with needs packets to
> be at
>           * least 64 bytes (including FCS) otherwise they will be
> discarded when
>           * they enter the switch port logic. When Broadcom tags are
> enabled, we
>           * need to make sure that packets are at least 68 bytes
>           * (including FCS and tag) because the length verification is
> done after
>           * the Broadcom tag is stripped off the ingress packet.
>           *
>           * Let dsa_slave_xmit() free the SKB
>           */
> 
> The switch fabric is dropping smaller packets when CPU tags
> (DSA tags) are enabled.
> 
> So is the padding to ETH_ZLEN OK or should is happen elsewhere?

This is OK IMHO because you may not always have knowledge of which 
Ethernet MAC the switch is interfaced with, and with the tagger code, 
you have a central location to address them all.

The padding should ideally be done by the Ethernet MAC, even better when 
the HW can do it on its own (as it should), because whether there is a 
switch or not, no link partner should accept runt packets (some might 
but this is not standard).
  
Vladimir Oltean Oct. 31, 2023, 4:34 p.m. UTC | #16
On Tue, Oct 31, 2023 at 03:16:50PM +0100, Linus Walleij wrote:
> On Tue, Oct 31, 2023 at 12:33 AM Vladimir Oltean <olteanv@gmail.com> wrote:
> > On Mon, Oct 30, 2023 at 11:57:33PM +0100, Linus Walleij wrote:
> > > Here you can incidentally also see what happens if we don't pad the big packets:
> > > the packet gets truncated.
> >
> > Are we sure we are debugging a switch problem? On how many platforms
> > with the RTL8366RB can the issue be seen? Is the conduit interface the
> > same on all these platforms, or is it different and that makes no
> > difference?
> 
> I don't have any other RTL8366RB systems than the D-Link DIR-685.
> 
> I however have several systems with the same backing ethernet controller
> connected directly to a PHY and they all work fine.
> 
> Yours,
> Linus Walleij

Ok, so we don't have a confirmation of breakage with other conduit
interface than the Gemini driver, either. So a problem there is still
not off the table.

So on the gemini-dlink-dir-685.dts platform, you can also use &gmac1 as
a plain Ethernet port, right?

If possible, could you set up dsa_loop (enable CONFIG_NET_DSA_LOOP, replace
"eth0" in dsa_loop_pdata with the netdev name of &gmac1, replace DSA_TAG_PROTO_NONE
in dsa_loop_get_protocol() with your tagging protocol) and put a tcpdump
on the remote end of the gmac1 port, to see if the issue isn't, in fact,
somewhere else, maybe gmac_start_xmit()?

With dsa_loop, you turn &gmac1 into a conduit interface of sorts, except
for the fact that there is no switch to process the DSA-tagged packets,
you see those packets on the remote end, and you can investigate there
whether it's the switch who's corrupting/truncating, or if they're
somehow sent corrupted/truncated by Gemini on the wire (which would not
be visible in a local tcpdump).
  
Linus Walleij Oct. 31, 2023, 7:02 p.m. UTC | #17
On Tue, Oct 31, 2023 at 5:34 PM Vladimir Oltean <olteanv@gmail.com> wrote:

> Ok, so we don't have a confirmation of breakage with other conduit
> interface than the Gemini driver, either. So a problem there is still
> not off the table.

True!

> So on the gemini-dlink-dir-685.dts platform, you can also use &gmac1 as
> a plain Ethernet port, right?

As a port it exist on the SoC yes but it is not connected physically
to anything.

&gmac0 is connected to the switch, and the switch has all the PHYs.

(I don't know if I misunderstand the question...)

> If possible, could you set up dsa_loop (enable CONFIG_NET_DSA_LOOP, replace
> "eth0" in dsa_loop_pdata with the netdev name of &gmac1, replace DSA_TAG_PROTO_NONE
> in dsa_loop_get_protocol() with your tagging protocol) and put a tcpdump
> on the remote end of the gmac1 port, to see if the issue isn't, in fact,
> somewhere else, maybe gmac_start_xmit()?

If you by remote end mean the end of a physical cable there is
no way I can do that, as I have no PHY on gmac1.

But I have other Gemini platforms, so I will try to do it on one
of them! Let's see if I can do this thing....

Yours,
Linus Walleij
  
Luiz Angelo Daros de Luca Oct. 31, 2023, 7:18 p.m. UTC | #18
> I don't have any other RTL8366RB systems than the D-Link DIR-685.
>
> I however have several systems with the same backing ethernet controller
> connected directly to a PHY and they all work fine.

Hi Linus,

I ported TL-WR1043nd to DSA using RTL8366RB on OpenWrt main. Do you
need some help testing the switch?

I just need to test ping with different sizes?

Regards,

Luiz
  
Linus Walleij Oct. 31, 2023, 7:27 p.m. UTC | #19
On Tue, Oct 31, 2023 at 8:18 PM Luiz Angelo Daros de Luca
<luizluca@gmail.com> wrote:

> > I don't have any other RTL8366RB systems than the D-Link DIR-685.
> >
> > I however have several systems with the same backing ethernet controller
> > connected directly to a PHY and they all work fine.
>
> Hi Linus,
>
> I ported TL-WR1043nd to DSA using RTL8366RB on OpenWrt main. Do you
> need some help testing the switch?

Yes!

> I just need to test ping with different sizes?

Yes try to ping the host from the router:

ping -s 1472 192.168.1.1 or so to send a 1500 byte ping packet,
which will be padded up to a 1518 byte ethernet frame and
1522 bytes from the conduit interface.

Then if it doesn't work, see if this patch solves the issue!

Yours,
Linus Walleij
  
Linus Walleij Oct. 31, 2023, 9:22 p.m. UTC | #20
Hi Vladimir,

I got around to testing this too:

On Mon, Oct 30, 2023 at 4:31 PM Vladimir Oltean <olteanv@gmail.com> wrote:

> Could you please place these skb_dump() calls before and after the magic
> __skb_put_padto() call, for us to see if anything unexpected changes?
> Maybe the socket buffers have some unusual geometry which the conduit
> interface doesn't like, for some reason.
>
> The number of skb dumps that you provide back should be even, and
> ideally the first one should be the unaltered packet, to avoid confusion :)

I did a variant to just get one SKB dump and not tons of them;

@@ -37,22 +37,35 @@ static struct sk_buff *rtl4a_tag_xmit(struct sk_buff *skb,
                                      struct net_device *dev)
 {
        struct dsa_port *dp = dsa_slave_to_port(dev);
+       static int cnt = 0;
        __be16 *p;
        u8 *tag;
        u16 out;

-       /* Pad out to at least 60 bytes */
-       if (unlikely(__skb_put_padto(skb, ETH_ZLEN, false)))
-               return NULL;
-
        /* Packets over 1496 bytes get dropped unless they get padded
         * out to 1518 bytes. 1496 is ETH_DATA_LEN - tag which is hardly
         * a coinicidence, and 1518 is ETH_FRAME_LEN + FCS so we define
         * the threshold size and padding like this.
         */
        if (skb->len >= (ETH_DATA_LEN - RTL4_A_HDR_LEN)) {
+               cnt++;
+
+               if (cnt == 1) {
+                       pr_info("SKB before padding:\n");
+                       skb_dump(KERN_INFO, skb, true);
+               }
+
                if (unlikely(__skb_put_padto(skb, ETH_FRAME_LEN +
ETH_FCS_LEN, false)))
                        return NULL;
+
+               if (cnt == 1) {
+                       pr_info("SKB after padding:\n");
+                       skb_dump(KERN_INFO, skb, true);
+               }
+       } else {
+               /* Pad out to at least 60 bytes */
+               if (unlikely(__skb_put_padto(skb, ETH_ZLEN, false)))
+                       return NULL;
        }

# ping -s 1472 192.168.1.137

The result:

SKB before padding:
37 (192.168.1.13skb len=1514 headroom=18 headlen=1514 tailroom=260
mac=(18,14) net=(32,20) trans=52
shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
csum(0xd4ef2b1 ip_summed=0 complete_sw=0 valid=0 level=0)
hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=0
7): 1472 data bydev name=lan0 feat=0x0002000000005020
tes
sk family=2 type=3 proto=1
skb headroom: 00000000: 00 02 00 01 00 00 00 00 00 00 03 78 02 00 bc ae
skb headroom: 00000010: 00 00
skb linear:   00000000: bc ae c5 6b a8 3d c2 2f 0b dc cc b4 08 00 45 00
skb linear:   00000010: 05 dc 3b de 40 00 40 01 75 68 c0 a8 01 01 c0 a8
skb linear:   00000020: 01 89 08 00 16 d2 09 54 00 00 8a cc 4d 0d 00 00
skb linear:   00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000001a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000001b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000001c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000001d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000001e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000001f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000220: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000240: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000250: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000260: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000270: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000290: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000002a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000002b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000002c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000002d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000002e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000002f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000310: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000320: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000330: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000340: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000350: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000390: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000003a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000003b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000003c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000003d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000003e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000003f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000410: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000420: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000430: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000440: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000450: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000460: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000470: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000490: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000004a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000004b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000004c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000004d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000004e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000004f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000510: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000520: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000530: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000540: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000550: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000560: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000570: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000590: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000005a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000005b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000005c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000005d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000005e0: 00 00 00 00 00 00 00 00 00 00
skb tailroom: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 00000090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 000000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 000000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 000000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 000000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 000000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 00000100: 00 00 00 00

SKB after padding:
skb len=1518 headroom=18 headlen=1518 tailroom=256
mac=(18,14) net=(32,20) trans=52
shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
csum(0xd4ef2b1 ip_summed=0 complete_sw=0 valid=0 level=0)
hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=0
dev name=lan0 feat=0x0002000000005020
sk family=2 type=3 proto=1
skb headroom: 00000000: 00 02 00 01 00 00 00 00 00 00 03 78 02 00 bc ae
skb headroom: 00000010: 00 00
skb linear:   00000000: bc ae c5 6b a8 3d c2 2f 0b dc cc b4 08 00 45 00
skb linear:   00000010: 05 dc 3b de 40 00 40 01 75 68 c0 a8 01 01 c0 a8
skb linear:   00000020: 01 89 08 00 16 d2 09 54 00 00 8a cc 4d 0d 00 00
skb linear:   00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000001a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000001b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000001c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000001d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000001e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000001f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000220: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000240: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000250: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000260: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000270: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000290: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000002a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000002b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000002c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000002d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000002e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000002f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000310: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000320: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000330: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000340: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000350: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000390: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000003a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000003b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000003c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000003d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000003e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000003f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000410: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000420: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000430: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000440: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000450: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000460: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000470: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000490: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000004a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000004b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000004c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000004d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000004e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000004f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000510: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000520: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000530: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000540: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000550: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000560: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000570: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   00000590: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000005a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000005b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000005c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000005d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb linear:   000005e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 00000090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 000000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 000000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 000000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 000000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb tailroom: 000000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

As expected the linear SKB is 4 bytes longer in this case.

Yours,
Linus Walleij
  
Linus Walleij Oct. 31, 2023, 10:44 p.m. UTC | #21
On Mon, Oct 30, 2023 at 3:16 PM Vladimir Oltean <olteanv@gmail.com> wrote:

>         if (skb->len >= 1496)
>                 __skb_put_padto(skb, 1518, false);
>
>         (...)
>
>         skb_push(skb, 4);
>
> which means that here, skb->len will be 1522, if it was originally 1496.
> So the code adds 26 extra octets, and only 4 of those are legitimate (a tag).
> The rest is absolutely unexplained, which means that until there is a
> valid explanation for them:

Indeed only 4 of them are needed, I tested to add 4 bytes on the tail
for > 1496 and it works.

I'll send a new version.

However that its "tag" is bogus since the extra tail isn't needed before
the paket becomes 1496 bytes, i.e. 1500 bytes including the tag.
It has a logic to it but ... yeah. All I can think of is bad VHDL in the
switch.

Yours,
Linus Walleij
  
Luiz Angelo Daros de Luca Nov. 1, 2023, 12:35 p.m. UTC | #22
Em ter., 31 de out. de 2023 às 16:27, Linus Walleij
<linus.walleij@linaro.org> escreveu:
>
> On Tue, Oct 31, 2023 at 8:18 PM Luiz Angelo Daros de Luca
> <luizluca@gmail.com> wrote:
>
> > > I don't have any other RTL8366RB systems than the D-Link DIR-685.
> > >
> > > I however have several systems with the same backing ethernet controller
> > > connected directly to a PHY and they all work fine.
> >
> > Hi Linus,
> >
> > I ported TL-WR1043nd to DSA using RTL8366RB on OpenWrt main. Do you
> > need some help testing the switch?
>
> Yes!
>
> > I just need to test ping with different sizes?
>
> Yes try to ping the host from the router:
>
> ping -s 1472 192.168.1.1 or so to send a 1500 byte ping packet,
> which will be padded up to a 1518 byte ethernet frame and
> 1522 bytes from the conduit interface.
>
> Then if it doesn't work, see if this patch solves the issue!

Hi Linus,

Sorry but I noticed no issues:

From the router:

No. Time Source Destination Protocol Length Info
1 0.000000000 192.168.1.1 192.168.1.2 ICMP 1514 Echo (ping) request
id=0x0789, seq=23/5888, ttl=64 (reply in 2)
2 0.000040094 192.168.1.2 192.168.1.1 ICMP 1514 Echo (ping) reply
id=0x0789, seq=23/5888, ttl=64 (request in 1)
5 1.000361559 192.168.1.1 192.168.1.2 ICMP 1514 Echo (ping) request
id=0x0789, seq=24/6144, ttl=64 (reply in 6)
6 1.000439668 192.168.1.2 192.168.1.1 ICMP 1514 Echo (ping) reply
id=0x0789, seq=24/6144, ttl=64 (request in 5)

From the host:

No. Time Source Destination Protocol Length Info
1 0.000000000 192.168.1.2 192.168.1.1 ICMP 1514 Echo (ping) request
id=0x0002, seq=8/2048, ttl=64 (reply in 2)
2 0.000391800 192.168.1.1 192.168.1.2 ICMP 1514 Echo (ping) reply
id=0x0002, seq=8/2048, ttl=64 (request in 1)
3 1.024825212 192.168.1.2 192.168.1.1 ICMP 1514 Echo (ping) request
id=0x0002, seq=9/2304, ttl=64 (reply in 4)
4 1.026865170 192.168.1.1 192.168.1.2 ICMP 1514 Echo (ping) reply
id=0x0002, seq=9/2304, ttl=64 (request in 3)

If I go over that limit, it fragments the packet as expected.

My device is using
https://github.com/luizluca/openwrt/tree/ath79_dsa_prep%2Bdevices . In
summary, kernel 6.1 with openwrt generic patches and the
reset-controller patch I sent net-next recently.

[    3.888540] realtek-smi switch: found an RTL8366RB switch
[    3.952366] realtek-smi switch: RTL5937 ver 3 chip found
[    3.967086] realtek-smi switch: set MAC: 42:E4:F5:XX:XX:XX
[    3.976779] realtek-smi switch: missing child interrupt-controller node
[    3.983455] realtek-smi switch: no interrupt support
[    4.158891] realtek-smi switch: no LED for port 5
[    4.164130] realtek-smi switch: configuring for fixed/rgmii link mode
[    4.171178] realtek-smi switch wan (uninitialized): PHY [SMI-0:00]
driver [RTL8366RB Gigabit Ethernet] (irq=POLL)
[    4.183849] realtek-smi switch lan1 (uninitialized): PHY [SMI-0:01]
driver [RTL8366RB Gigabit Ethernet] (irq=POLL)
[    4.196439] realtek-smi switch lan2 (uninitialized): PHY [SMI-0:02]
driver [RTL8366RB Gigabit Ethernet] (irq=POLL)
[    4.209258] realtek-smi switch lan3 (uninitialized): PHY [SMI-0:03]
driver [RTL8366RB Gigabit Ethernet] (irq=POLL)
[    4.221815] realtek-smi switch lan4 (uninitialized): PHY [SMI-0:04]
driver [RTL8366RB Gigabit Ethernet] (irq=POLL)
[    4.243071] realtek-smi switch: Link is Up - 1Gbps/Full - flow control off
[    9.707171] realtek-smi switch lan1: configuring for phy/gmii link mode
[    9.727707] realtek-smi switch lan1: Link is Up - 1Gbps/Full - flow
control rx/tx
[   12.289349] realtek-smi switch lan1: Link is Down
[   55.761797] realtek-smi switch lan1: configuring for phy/gmii link mode
[   57.460421] realtek-smi switch lan2: configuring for phy/gmii link mode
[   57.505039] realtek-smi switch lan3: configuring for phy/gmii link mode
[   57.823528] realtek-smi switch lan4: configuring for phy/gmii link mode
[   58.000712] realtek-smi switch lan1: Link is Up - 1Gbps/Full - flow
control rx/tx
[   58.181047] realtek-smi switch wan: configuring for phy/gmii link mode

Maybe the ag71xx driver is doing something differently.

Let me know if you need to test anything else. I didn't test the
device with your patch applied.

Regards,

Luiz
  
Linus Walleij Nov. 1, 2023, 8:26 p.m. UTC | #23
On Wed, Nov 1, 2023 at 1:35 PM Luiz Angelo Daros de Luca
<luizluca@gmail.com> wrote:

> Sorry but I noticed no issues:

Don't be sorry about that, it's good news because now I know
where to look, i.e. in the ethernet controller.

> My device is using
> https://github.com/luizluca/openwrt/tree/ath79_dsa_prep%2Bdevices . In
> summary, kernel 6.1 with openwrt generic patches and the
> reset-controller patch I sent net-next recently.

Looking good to me.

> [    3.888540] realtek-smi switch: found an RTL8366RB switch
> [    3.952366] realtek-smi switch: RTL5937 ver 3 chip found

Same version as mine too.

> [    3.967086] realtek-smi switch: set MAC: 42:E4:F5:XX:XX:XX

Unrelated: I have seen other DSA switches "inherit" the MAC of the
conduit interface (BRCM). I wonder if we could do that too instead
of this random MAC number. Usually the conduit interface has a
properly configured MAC.

> [    3.976779] realtek-smi switch: missing child interrupt-controller node
> [    3.983455] realtek-smi switch: no interrupt support
> [    4.158891] realtek-smi switch: no LED for port 5

Are the LEDs working? My device has no LEDs so I have never
tested it, despite I coded it. (Also these days we can actually
support each LED individually configured from device tree using
the LED API, but that would be quite a bit of code, so only some
fun for the aspiring developer...)

> Maybe the ag71xx driver is doing something differently.

I'll look at it!

Thanks a lot Luiz,
Linus Walleij
  
Vladimir Oltean Nov. 2, 2023, 10:04 a.m. UTC | #24
On Wed, Nov 01, 2023 at 09:26:50PM +0100, Linus Walleij wrote:
> > [    3.967086] realtek-smi switch: set MAC: 42:E4:F5:XX:XX:XX
> 
> Unrelated: I have seen other DSA switches "inherit" the MAC of the
> conduit interface (BRCM). I wonder if we could do that too instead
> of this random MAC number. Usually the conduit interface has a
> properly configured MAC.

We need to know what is the MAC address from the RTL8366RB_SMAR
registers used for.

What you know about is that DSA user interfaces have their own MAC
address for packet termination (send/receive to/from the CPU). MAC
addresses for switch ports are an abstract concept, of course (switches
normally just forward packets, nothing is addressed *to* them), and so,
these addresses are not programmed per se to hardware unless the
prerequisites of dsa_switch_supports_uc_filtering() are implemented.
If they are, the MAC addresses of user ports are programmed as host FDB
entries (FDB entries towards the CPU port).

The rule for establishing the MAC address of each user port is as
follows: if of_get_mac_address() finds something valid for that port's
OF node - provided by the bootloader - ("address", "mac-address",
"local-mac-address", a nvmem provider etc), then that value is used.
Otherwise, the MAC address is inherited from the conduit interface.

Some switches also have a global MAC address register (used for all
ports) of their own, but it is switch-specific what this does. We look
at the functionality it offers when deciding what to program it to,
since it's of course not possible to sync a single hardware register to
the value of the MAC addresses of multiple user ports.

For KSZ switches - see ksz_switch_macaddr_get() - the global MAC address
register is used for HSR self-address filtering and for Wake on LAN.
We sync the value of this hardware register with the MAC address of the
first user port on which these optional features are enabled. Then, we
allow these optional features only on the other user ports which have
the same MAC address as the original one which enabled that feature.
On KSZ, the same switch MAC address is also used as MAC SA in generated
pause frames, but to our knowledge, that MAC address could be any
address (even 00:00:00:00:00:00), so we don't really care too much about
that and we let it fall to whatever value it may.
  
Vladimir Oltean Nov. 2, 2023, 10:31 a.m. UTC | #25
Hi Luiz,

On Wed, Nov 01, 2023 at 09:35:30AM -0300, Luiz Angelo Daros de Luca wrote:
> Hi Linus,
> 
> Sorry but I noticed no issues:
> 
> From the router:
> 
> No. Time Source Destination Protocol Length Info
> 1 0.000000000 192.168.1.1 192.168.1.2 ICMP 1514 Echo (ping) request id=0x0789, seq=23/5888, ttl=64 (reply in 2)
> 2 0.000040094 192.168.1.2 192.168.1.1 ICMP 1514 Echo (ping) reply id=0x0789, seq=23/5888, ttl=64 (request in 1)
> 
> From the host:
> 
> No. Time Source Destination Protocol Length Info
> 1 0.000000000 192.168.1.2 192.168.1.1 ICMP 1514 Echo (ping) request id=0x0002, seq=8/2048, ttl=64 (reply in 2)
> 2 0.000391800 192.168.1.1 192.168.1.2 ICMP 1514 Echo (ping) reply id=0x0002, seq=8/2048, ttl=64 (request in 1)
> 
> If I go over that limit, it fragments the packet as expected.

Could you run the shell command that sweeps over the entire range, fromhere?
https://lore.kernel.org/netdev/20231030222035.oqos7v7sdq5u6mti@skbuf/
  
Vladimir Oltean Nov. 2, 2023, 10:45 a.m. UTC | #26
On Wed, Nov 01, 2023 at 09:26:50PM +0100, Linus Walleij wrote:
> On Wed, Nov 1, 2023 at 1:35 PM Luiz Angelo Daros de Luca <luizluca@gmail.com> wrote:
> 
> > Sorry but I noticed no issues:
> 
> Don't be sorry about that, it's good news because now I know
> where to look, i.e. in the ethernet controller.

Don't look too deeply into the code just yet, just try to see what
happens with dsa_loop on an identical Ethernet controller that isn't
physically attached to a switch.
  
Vladimir Oltean Nov. 2, 2023, 10:48 a.m. UTC | #27
On Tue, Oct 31, 2023 at 10:22:05PM +0100, Linus Walleij wrote:
> Hi Vladimir,
> 
> I got around to testing this too:
> 
> # ping -s 1472 192.168.1.137
> 
> The result:
> 
> SKB before padding:
> 37 (192.168.1.13skb len=1514 headroom=18 headlen=1514 tailroom=260
> mac=(18,14) net=(32,20) trans=52
> shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
> csum(0xd4ef2b1 ip_summed=0 complete_sw=0 valid=0 level=0)
> hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=0
> 7): 1472 data bydev name=lan0 feat=0x0002000000005020
> tes
> sk family=2 type=3 proto=1
> 
> SKB after padding:
> skb len=1518 headroom=18 headlen=1518 tailroom=256
> mac=(18,14) net=(32,20) trans=52
> shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
> csum(0xd4ef2b1 ip_summed=0 complete_sw=0 valid=0 level=0)
> hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=0
> dev name=lan0 feat=0x0002000000005020
> sk family=2 type=3 proto=1
> 
> As expected the linear SKB is 4 bytes longer in this case.

Ok, I'm not seeing anything unusual about the skb geometry in the
skb_dump() output.
  
Vladimir Oltean Nov. 2, 2023, 12:32 p.m. UTC | #28
On Tue, Oct 31, 2023 at 08:02:29PM +0100, Linus Walleij wrote:
> On Tue, Oct 31, 2023 at 5:34 PM Vladimir Oltean <olteanv@gmail.com> wrote:
> > So on the gemini-dlink-dir-685.dts platform, you can also use &gmac1 as
> > a plain Ethernet port, right?
> 
> As a port it exist on the SoC yes but it is not connected physically
> to anything.
> 
> &gmac0 is connected to the switch, and the switch has all the PHYs.
(...)
> If you by remote end mean the end of a physical cable there is
> no way I can do that, as I have no PHY on gmac1.
> 
> (I don't know if I misunderstand the question...)

No, you aren't.

> But I have other Gemini platforms, so I will try to do it on one
> of them! Let's see if I can do this thing....

Ok.
  
Luiz Angelo Daros de Luca Nov. 3, 2023, 5:05 p.m. UTC | #29
> > [    3.967086] realtek-smi switch: set MAC: 42:E4:F5:XX:XX:XX
>
> Unrelated: I have seen other DSA switches "inherit" the MAC of the
> conduit interface (BRCM). I wonder if we could do that too instead
> of this random MAC number. Usually the conduit interface has a
> properly configured MAC.
>
> > [    3.976779] realtek-smi switch: missing child interrupt-controller node
> > [    3.983455] realtek-smi switch: no interrupt support
> > [    4.158891] realtek-smi switch: no LED for port 5
>
> Are the LEDs working? My device has no LEDs so I have never
> tested it, despite I coded it. (Also these days we can actually
> support each LED individually configured from device tree using
> the LED API, but that would be quite a bit of code, so only some
> fun for the aspiring developer...)

No at all. LEDs sometimes change state all together but they normally
are just kept on.

My device is not funny to play with. It has only 32MB of RAM and it
frequently OOM. Even flashing a new image requires some sokoban,
unloading all possible kernel modules. It is not usable anymore in the
real world but I might take a look at LEDs just for fun. The LEDs in
the old swconfig driver do work and I can compare the code.

> > Maybe the ag71xx driver is doing something differently.
>
> I'll look at it!
>
> Thanks a lot Luiz,

I'm glad to help.

Regards,

Luiz
  
Luiz Angelo Daros de Luca Nov. 3, 2023, 5:13 p.m. UTC | #30
>
> Hi Luiz,
>
> On Wed, Nov 01, 2023 at 09:35:30AM -0300, Luiz Angelo Daros de Luca wrote:
> > Hi Linus,
> >
> > Sorry but I noticed no issues:
> >
> > From the router:
> >
> > No. Time Source Destination Protocol Length Info
> > 1 0.000000000 192.168.1.1 192.168.1.2 ICMP 1514 Echo (ping) request id=0x0789, seq=23/5888, ttl=64 (reply in 2)
> > 2 0.000040094 192.168.1.2 192.168.1.1 ICMP 1514 Echo (ping) reply id=0x0789, seq=23/5888, ttl=64 (request in 1)
> >
> > From the host:
> >
> > No. Time Source Destination Protocol Length Info
> > 1 0.000000000 192.168.1.2 192.168.1.1 ICMP 1514 Echo (ping) request id=0x0002, seq=8/2048, ttl=64 (reply in 2)
> > 2 0.000391800 192.168.1.1 192.168.1.2 ICMP 1514 Echo (ping) reply id=0x0002, seq=8/2048, ttl=64 (request in 1)
> >
> > If I go over that limit, it fragments the packet as expected.
>
> Could you run the shell command that sweeps over the entire range, fromhere?
> https://lore.kernel.org/netdev/20231030222035.oqos7v7sdq5u6mti@skbuf/

Sure. I might be able to run it on Monday. I'm away from the device
and it might have OOM. It has too little RAM to survive too much time
by itself. However, I doubt it might fail at a specific packet range
as it would hang an interactive SSH session quite easily.

Regards,

Luiz
  
Luiz Angelo Daros de Luca Dec. 30, 2023, 5:19 a.m. UTC | #31
> > [    3.976779] realtek-smi switch: missing child interrupt-controller node
> > [    3.983455] realtek-smi switch: no interrupt support
> > [    4.158891] realtek-smi switch: no LED for port 5
>
> Are the LEDs working? My device has no LEDs so I have never
> tested it, despite I coded it. (Also these days we can actually
> support each LED individually configured from device tree using
> the LED API, but that would be quite a bit of code, so only some
> fun for the aspiring developer...)

Hi Linus,

I took a look at the LED code. It looks like you got it a little bit wrong.

        /* Set blinking, TODO: make this configurable */
       ret = regmap_update_bits(priv->map, RTL8366RB_LED_BLINKRATE_REG,
                                RTL8366RB_LED_BLINKRATE_MASK,
                                RTL8366RB_LED_BLINKRATE_56MS);
       if (ret)
               return ret;

       /* Set up LED activity:
        * Each port has 4 LEDs, we configure all ports to the same
        * behaviour (no individual config) but we can set up each
        * LED separately.
        */
       if (priv->leds_disabled) {
               /* Turn everything off */
               regmap_update_bits(priv->map,
                                  RTL8366RB_LED_0_1_CTRL_REG,
                                  0x0FFF, 0);
               regmap_update_bits(priv->map,
                                  RTL8366RB_LED_2_3_CTRL_REG,
                                  0x0FFF, 0);
               regmap_update_bits(priv->map,
                                  RTL8366RB_INTERRUPT_CONTROL_REG,
                                  RTL8366RB_P4_RGMII_LED,
                                  0);
               val = RTL8366RB_LED_OFF;
       } else {
               /* TODO: make this configurable per LED */
               val = RTL8366RB_LED_FORCE;
       }
       for (i = 0; i < 4; i++) {
               ret = regmap_update_bits(priv->map,
                                        RTL8366RB_LED_CTRL_REG,
                                        0xf << (i * 4),
                                        val << (i * 4));
               if (ret)
                       return ret;
       }

If LEDs are not disabled, it will use the RTL8366RB_LED_FORCE for all
4 LED groups. That RTL8366RB_LED_FORCE keeps the LEDs on. I would use
RTL8366RB_LED_LINK_ACT by default to make it blink on link activity
(or make it configurable as the comment suggests) but it is not wrong.
I cannot evaluate the RTL8366RB_INTERRUPT_CONTROL_REG usage when you
disable the LEDs but it seems to be odd.

We also have:

static void rb8366rb_set_port_led(struct realtek_priv *priv,
                                 int port, bool enable)
{
       u16 val = enable ? 0x3f : 0;
       int ret;

       if (priv->leds_disabled)
               return;

       switch (port) {
       case 0:
               ret = regmap_update_bits(priv->map,
                                        RTL8366RB_LED_0_1_CTRL_REG,
                                        0x3F, val);
               break;
       case 1:
               ret = regmap_update_bits(priv->map,
                                        RTL8366RB_LED_0_1_CTRL_REG,
                                        0x3F << RTL8366RB_LED_1_OFFSET,
                                        val << RTL8366RB_LED_1_OFFSET);
               break;
       case 2:
               ret = regmap_update_bits(priv->map,
                                        RTL8366RB_LED_2_3_CTRL_REG,
                                        0x3F, val);
               break;
       case 3:
               ret = regmap_update_bits(priv->map,
                                        RTL8366RB_LED_2_3_CTRL_REG,
                                        0x3F << RTL8366RB_LED_3_OFFSET,
                                        val << RTL8366RB_LED_3_OFFSET);
               break;
       case 4:
               ret = regmap_update_bits(priv->map,
                                        RTL8366RB_INTERRUPT_CONTROL_REG,
                                        RTL8366RB_P4_RGMII_LED,
                                        enable ? RTL8366RB_P4_RGMII_LED : 0);
               break;
       default:
               dev_err(priv->dev, "no LED for port %d\n", port);
               return;
       }
       if (ret)
               dev_err(priv->dev, "error updating LED on port %d\n", port);
}

Here things gets strange. The code assumes that
RTL8366RB_LED_0_1_CTRL_REG is related to ports 0 and 1. However, it is
actually LED group 0 and 1. I don't have the docs but the register
seem to enable/disable a port in a group. The first LED pins for all
ports form the group 0 (and so on). My device only use the group 0 for
its 5 ports, limiting my tests. Anyway, to make all ports blink on
link act, I need, at least:

RTL8366RB_LED_CTRL_REG(0x0431) = 0x0002 / 0x000f (set led group 0 to
RTL8366RB_LED_LINK_ACT or 0x2)
RTL8366RB_LED_0_1_CTRL_REG(0x0432) = 0x001f / 0x003f (enable ports
0..5 in LED group 0. I don't really know the mask but the probe code
indicates it is 6 bits per group).

If you really want to disable port 0 LEDs in all groups, you should
unset the first bit for each group. If the mask is really 0x3f, it
would be:

               ret = regmap_update_bits(priv->map,
                                        RTL8366RB_LED_0_1_CTRL_REG,
                                        port,
                                        enable);
               ret = regmap_update_bits(priv->map,
                                        RTL8366RB_LED_0_1_CTRL_REG,
                                        port << RTL8366RB_LED_1_OFFSET,
                                        enable);
               ret = regmap_update_bits(priv->map,
                                        RTL8366RB_LED_2_3_CTRL_REG,
                                        port,
                                        enable);
               ret = regmap_update_bits(priv->map,
                                        RTL8366RB_LED_2_3_CTRL_REG,
                                        port << RTL8366RB_LED_3_OFFSET,
                                        enable);

This message, in fact, does not make sense:

> > [    4.158891] realtek-smi switch: no LED for port 5

I though that maybe we could setup a LED driver to expose the LEDs
status in sysfs. However, I'm not sure it is worth it. If you change a
LED behavior, it would break the HW triggering rule for all the group.
I'm not sure the LED API is ready to expose LEDs with related fate. It
would, indeed, be useful as a readonly source or just to
enable/disable a LED.

Regards,

Luiz
  
Linus Walleij Dec. 30, 2023, 12:28 p.m. UTC | #32
On Sat, Dec 30, 2023 at 6:19 AM Luiz Angelo Daros de Luca
<luizluca@gmail.com> wrote:

> I took a look at the LED code. It looks like you got it a little bit wrong.

You are right...

> If LEDs are not disabled, it will use the RTL8366RB_LED_FORCE for all
> 4 LED groups. That RTL8366RB_LED_FORCE keeps the LEDs on. I would use
> RTL8366RB_LED_LINK_ACT by default to make it blink on link activity
> (or make it configurable as the comment suggests) but it is not wrong.
> I cannot evaluate the RTL8366RB_INTERRUPT_CONTROL_REG usage when you
> disable the LEDs but it seems to be odd.

The problem is that since I don't have a device with LEDs connected
to tHE RTL8366RB it is all just dry coding.

I would suggest if you can test it just make a basic patch that will
at least turn on the LEDs to some default setting that works for
you?

> I though that maybe we could setup a LED driver to expose the LEDs
> status in sysfs. However, I'm not sure it is worth it. If you change a
> LED behavior, it would break the HW triggering rule for all the group.
> I'm not sure the LED API is ready to expose LEDs with related fate. It
> would, indeed, be useful as a readonly source or just to
> enable/disable a LED.

The LED subsystem supports hardware triggering etc thanks to the
elaborate work by Christian (ansuel). You can see an example of how
this is done in:
drivers/net/dsa/qca/qca8k-leds.c

Christian also extended the LEDs subsystem with the necessary
callbacks to support HW-backed LED control.

This can be used already to achieve HW triggers for the LEDs
from sysfs. (See callbacks .hw_control_is_supported,
.hw_control_set etc etc).

I was working to implement this for the Marvell switches but Andrew
wanted to do some more structured approach with a LED library
for DSA switches.

Yours,
Linus Walleij
  
Luiz Angelo Daros de Luca Dec. 30, 2023, 11:56 p.m. UTC | #33
> > I took a look at the LED code. It looks like you got it a little bit wrong.
>
> You are right...
>
> > If LEDs are not disabled, it will use the RTL8366RB_LED_FORCE for all
> > 4 LED groups. That RTL8366RB_LED_FORCE keeps the LEDs on. I would use
> > RTL8366RB_LED_LINK_ACT by default to make it blink on link activity
> > (or make it configurable as the comment suggests) but it is not wrong.
> > I cannot evaluate the RTL8366RB_INTERRUPT_CONTROL_REG usage when you
> > disable the LEDs but it seems to be odd.
>
> The problem is that since I don't have a device with LEDs connected
> to tHE RTL8366RB it is all just dry coding.
>
> I would suggest if you can test it just make a basic patch that will
> at least turn on the LEDs to some default setting that works for
> you?

Sure. I believe using link act will be much more useful than just
turning all leds on, independently from the port state. It is an easy
one-line fix.
I can do that after the other series gets merged.

I think that we should also remove rb8366rb_set_port_led(). Even if I
fix it, it will just turn the LED off when the port is
administratively down. RTL8366RB_LED_0_1_CTRL_REG and
RTL8366RB_LED_2_3_CTRL_REG are only used to control leds when you
force them to be on (RTL8366RB_LED_FORCE). In that scenario, the OS
might be in charge of triggering them, even for uses not related to
the switch.

> > I though that maybe we could setup a LED driver to expose the LEDs
> > status in sysfs. However, I'm not sure it is worth it. If you change a
> > LED behavior, it would break the HW triggering rule for all the group.
> > I'm not sure the LED API is ready to expose LEDs with related fate. It
> > would, indeed, be useful as a readonly source or just to
> > enable/disable a LED.
>
> The LED subsystem supports hardware triggering etc thanks to the
> elaborate work by Christian (ansuel). You can see an example of how
> this is done in:
> drivers/net/dsa/qca/qca8k-leds.c
>
> Christian also extended the LEDs subsystem with the necessary
> callbacks to support HW-backed LED control.
>
> This can be used already to achieve HW triggers for the LEDs
> from sysfs. (See callbacks .hw_control_is_supported,
> .hw_control_set etc etc).
>
> I was working to implement this for the Marvell switches but Andrew
> wanted to do some more structured approach with a LED library
> for DSA switches.

Yes, I took a look at it. It would be my inspiration if I go down that
road. I can use the HW offload only if all LEDs in the group share the
same trigger and timer. Otherwise, I'll need to fall back to software
control. I'll think about it if that is a viable solution.

Regards,

Luiz
  

Patch

diff --git a/net/dsa/tag_rtl4_a.c b/net/dsa/tag_rtl4_a.c
index c327314b95e3..3292bc49b158 100644
--- a/net/dsa/tag_rtl4_a.c
+++ b/net/dsa/tag_rtl4_a.c
@@ -45,6 +45,16 @@  static struct sk_buff *rtl4a_tag_xmit(struct sk_buff *skb,
 	if (unlikely(__skb_put_padto(skb, ETH_ZLEN, false)))
 		return NULL;
 
+	/* Packets over 1496 bytes get dropped unless they get padded
+	 * out to 1518 bytes. 1496 is ETH_DATA_LEN - tag which is hardly
+	 * a coinicidence, and 1518 is ETH_FRAME_LEN + FCS so we define
+	 * the threshold size and padding like this.
+	 */
+	if (skb->len >= (ETH_DATA_LEN - RTL4_A_HDR_LEN)) {
+		if (unlikely(__skb_put_padto(skb, ETH_FRAME_LEN + ETH_FCS_LEN, false)))
+			return NULL;
+	}
+
 	netdev_dbg(dev, "add realtek tag to package to port %d\n",
 		   dp->index);
 	skb_push(skb, RTL4_A_HDR_LEN);