[net-next,RFC] net: dsa: qca8k: make learning configurable and keep off if standalone

Message ID 20230623114005.9680-1-ansuelsmth@gmail.com
State New
Headers
Series [net-next,RFC] net: dsa: qca8k: make learning configurable and keep off if standalone |

Commit Message

Christian Marangi June 23, 2023, 11:40 a.m. UTC
  Address learning should initially be turned off by the driver for port
operation in standalone mode, then the DSA core handles changes to it
via ds->ops->port_bridge_flags().

Currently this is not the case for qca8k where learning is enabled
unconditionally in qca8k_setup for every user port.

Handle ports configured in standalone mode by making the learning
configurable and not enabling it by default.

Implement .port_pre_bridge_flags and .port_bridge_flags dsa ops to
enable learning for bridge that request it and tweak
.port_stp_state_set to correctly disable learning when port is
configured in standalone mode.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
---

Posting as RFC as I would love some comments from Vladimir for correct
implementation of this. This was suggested to be on par with offload
bridge API and I used as example [1] (commit that does the same thing
with a microchip switch)

I didn't want to bloat the priv struct with additional info with the
port state and from what I can see it seems using dp->learning is enough
to understand if learning is currently enabled for the port or not but I
would love to have some confirmation about this. (from what I notice
when port is set in standalone mode, flags are cleared so it should be
correct)

I also verified this working by dumping the fdb in a bridge
configuration and in a standalone configuration. Traffic works in both
configuration.

Dump WITH BRIDGE CONFIGURATION:
01:00:5e:00:00:01 dev eth0 self permanent
33:33:00:00:00:02 dev eth0 self permanent
33:33:00:00:00:01 dev eth0 self permanent
33:33:ff:f2:5d:50 dev eth0 self permanent
33:33:ff:00:00:00 dev eth0 self permanent
dc:ef:09:f2:5d:4f dev eth1 self permanent
33:33:00:00:00:01 dev eth1 self permanent
33:33:00:00:00:02 dev eth1 self permanent
01:00:5e:00:00:01 dev eth1 self permanent
33:33:ff:f2:5d:4f dev eth1 self permanent
33:33:ff:00:00:00 dev eth1 self permanent
c0:3e:ba:c1:d7:47 dev lan1 master br-lan
dc:ef:09:f2:5d:4f dev lan1 vlan 1 master br-lan permanent
dc:ef:09:f2:5d:4f dev lan1 master br-lan permanent
c0:3e:ba:c1:d7:47 dev lan1 vlan 1 self
33:33:00:00:00:01 dev wlan0 self permanent
33:33:00:00:00:02 dev wlan0 self permanent
33:33:00:00:00:01 dev wlan1 self permanent
33:33:00:00:00:02 dev wlan1 self permanent
33:33:00:00:00:01 dev br-lan self permanent
33:33:00:00:00:02 dev br-lan self permanent
01:00:5e:00:00:01 dev br-lan self permanent
33:33:ff:00:00:01 dev br-lan self permanent
33:33:ff:f2:5d:4f dev br-lan self permanent
33:33:00:01:00:02 dev br-lan self permanent
33:33:00:01:00:03 dev br-lan self permanent
33:33:ff:00:00:00 dev br-lan self permanent

Dump WITH STANDALONE CONFIGURATION:
01:00:5e:00:00:01 dev eth0 self permanent
33:33:00:00:00:02 dev eth0 self permanent
33:33:00:00:00:01 dev eth0 self permanent
33:33:ff:f2:5d:50 dev eth0 self permanent
33:33:ff:00:00:00 dev eth0 self permanent
33:33:00:00:00:01 dev eth1 self permanent
33:33:00:00:00:02 dev eth1 self permanent
01:00:5e:00:00:01 dev eth1 self permanent
33:33:ff:f2:5d:4f dev eth1 self permanent
33:33:ff:00:00:01 dev eth1 self permanent
33:33:ff:00:00:00 dev eth1 self permanent
33:33:00:01:00:02 dev eth1 self permanent
33:33:00:01:00:03 dev eth1 self permanent
33:33:00:00:00:01 dev wlan0 self permanent
33:33:00:00:00:02 dev wlan0 self permanent
33:33:00:00:00:01 dev wlan1 self permanent
33:33:00:00:00:02 dev wlan1 self permanent

From what I can see there isn't any self entry with the MAC address of
the connected device and this should confirm that learning is actually
disabled.

Hope this is enough to test this feature and I would ask what would be
the next step to reach a point where port_change_master can be
implemented.
(would also love to see what are the criteria to enable offload_fwd_mask
on the targget for rcv and eventually for xmit)

Thanks for any response and sorry for the long comments.


[1] https://github.com/torvalds/linux/commit/15f7cfae912e

 drivers/net/dsa/qca/qca8k-8xxx.c   |  8 ++----
 drivers/net/dsa/qca/qca8k-common.c | 40 ++++++++++++++++++++++++++++++
 drivers/net/dsa/qca/qca8k.h        |  6 +++++
 3 files changed, 48 insertions(+), 6 deletions(-)
  

Comments

Vladimir Oltean June 25, 2023, 11:58 a.m. UTC | #1
On Fri, Jun 23, 2023 at 01:40:05PM +0200, Christian Marangi wrote:
> Address learning should initially be turned off by the driver for port
> operation in standalone mode, then the DSA core handles changes to it
> via ds->ops->port_bridge_flags().
> 
> Currently this is not the case for qca8k where learning is enabled
> unconditionally in qca8k_setup for every user port.
> 
> Handle ports configured in standalone mode by making the learning
> configurable and not enabling it by default.
> 
> Implement .port_pre_bridge_flags and .port_bridge_flags dsa ops to
> enable learning for bridge that request it and tweak
> .port_stp_state_set to correctly disable learning when port is
> configured in standalone mode.
> 
> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
> ---
> 
> Posting as RFC as I would love some comments from Vladimir for correct
> implementation of this. This was suggested to be on par with offload
> bridge API and I used as example [1] (commit that does the same thing
> with a microchip switch)
> 
> I didn't want to bloat the priv struct with additional info with the
> port state and from what I can see it seems using dp->learning is enough
> to understand if learning is currently enabled for the port or not but I
> would love to have some confirmation about this. (from what I notice
> when port is set in standalone mode, flags are cleared so it should be
> correct)

In principle you can use dp->learning, but in this case you are using it
incorrectly (more below).

> 
> I also verified this working by dumping the fdb in a bridge
> configuration and in a standalone configuration. Traffic works in both
> configuration.
> 
> Dump WITH BRIDGE CONFIGURATION:
> 01:00:5e:00:00:01 dev eth0 self permanent
> 33:33:00:00:00:02 dev eth0 self permanent
> 33:33:00:00:00:01 dev eth0 self permanent
> 33:33:ff:f2:5d:50 dev eth0 self permanent
> 33:33:ff:00:00:00 dev eth0 self permanent
> dc:ef:09:f2:5d:4f dev eth1 self permanent
> 33:33:00:00:00:01 dev eth1 self permanent
> 33:33:00:00:00:02 dev eth1 self permanent
> 01:00:5e:00:00:01 dev eth1 self permanent
> 33:33:ff:f2:5d:4f dev eth1 self permanent
> 33:33:ff:00:00:00 dev eth1 self permanent
> c0:3e:ba:c1:d7:47 dev lan1 master br-lan
> dc:ef:09:f2:5d:4f dev lan1 vlan 1 master br-lan permanent
> dc:ef:09:f2:5d:4f dev lan1 master br-lan permanent
> c0:3e:ba:c1:d7:47 dev lan1 vlan 1 self
> 33:33:00:00:00:01 dev wlan0 self permanent
> 33:33:00:00:00:02 dev wlan0 self permanent
> 33:33:00:00:00:01 dev wlan1 self permanent
> 33:33:00:00:00:02 dev wlan1 self permanent
> 33:33:00:00:00:01 dev br-lan self permanent
> 33:33:00:00:00:02 dev br-lan self permanent
> 01:00:5e:00:00:01 dev br-lan self permanent
> 33:33:ff:00:00:01 dev br-lan self permanent
> 33:33:ff:f2:5d:4f dev br-lan self permanent
> 33:33:00:01:00:02 dev br-lan self permanent
> 33:33:00:01:00:03 dev br-lan self permanent
> 33:33:ff:00:00:00 dev br-lan self permanent
> 
> Dump WITH STANDALONE CONFIGURATION:
> 01:00:5e:00:00:01 dev eth0 self permanent
> 33:33:00:00:00:02 dev eth0 self permanent
> 33:33:00:00:00:01 dev eth0 self permanent
> 33:33:ff:f2:5d:50 dev eth0 self permanent
> 33:33:ff:00:00:00 dev eth0 self permanent
> 33:33:00:00:00:01 dev eth1 self permanent
> 33:33:00:00:00:02 dev eth1 self permanent
> 01:00:5e:00:00:01 dev eth1 self permanent
> 33:33:ff:f2:5d:4f dev eth1 self permanent
> 33:33:ff:00:00:01 dev eth1 self permanent
> 33:33:ff:00:00:00 dev eth1 self permanent
> 33:33:00:01:00:02 dev eth1 self permanent
> 33:33:00:01:00:03 dev eth1 self permanent
> 33:33:00:00:00:01 dev wlan0 self permanent
> 33:33:00:00:00:02 dev wlan0 self permanent
> 33:33:00:00:00:01 dev wlan1 self permanent
> 33:33:00:00:00:02 dev wlan1 self permanent

The information from these dumps is pretty much irrelevant.

> From what I can see there isn't any self entry with the MAC address of
> the connected device and this should confirm that learning is actually
> disabled.
> 
> Hope this is enough to test this feature and I would ask what would be
> the next step to reach a point where port_change_master can be
> implemented.

The way to test this patch would be to connect in loopback 2 standalone
qca8k ports having the same MAC address, and ping from one to the other.

ip netns add ns0
ip link set lan1 netns ns0 && ip -n ns0 link set lan1 up
ip -n ns0 addr add 192.168.100.1/24 dev lan1
ip link set lan2 up && ip addr add 192.168.100.2/24 dev lan2
ping 192.168.100.1

Before, it shouldn't have worked, now it should.

Once that basic precondition passes, you should be able to start looking
at tools/testing/selftests/drivers/net/dsa/ and run those one by one.
An interesting one would be local_termination.sh, which monitors the way
in which frames reach the CPU. Though be aware that some sub-tests from
that suite will fail on misconfigurations that are non-fatal (and don't
impact functionality), just sub-optimal (affecting performance). Like
sending unknown packets to the CPU when the port is non-promiscuous and
software would drop those packets anyway.

> (would also love to see what are the criteria to enable offload_fwd_mask
> on the targget for rcv and eventually for xmit)

For RX, skb->offload_fwd_mark = true (mark not mask) means that the software
bridge shouldn't flood packets received on lanX towards other lanY ports that
are part of the same hwdom, because the hardware already took care of
that.

[ the hardware domain is determined by dev_get_port_parent_id() ->
  devlink_compat_switch_id_get() and populated by dsa_port_devlink_setup() ]

Obviously, the requirement is for the hardware to indeed take care of that :)
Currently it doesn't flood to the other user ports that are part of the
same bridge and have egress flooding enabled for that traffic type. It
just floods to the CPU and software decides where to flood. It's a
matter of implementing other brport flags, like BR_FLOOD and friends.

For TX, skb->offload_fwd_mark means that the driver should be able to
send a skb potentially towards multiple TX ports at the same time, as a
result of an FDB lookup. This makes the bridge avoid cloning that skb
and calling dev_queue_xmit() towards every individual port that it must
reach. I would concentrate on RX and leave TX for later.

> Thanks for any response and sorry for the long comments.
> 
> 
> [1] https://github.com/torvalds/linux/commit/15f7cfae912e
> 
>  drivers/net/dsa/qca/qca8k-8xxx.c   |  8 ++----
>  drivers/net/dsa/qca/qca8k-common.c | 40 ++++++++++++++++++++++++++++++
>  drivers/net/dsa/qca/qca8k.h        |  6 +++++
>  3 files changed, 48 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/dsa/qca/qca8k-8xxx.c b/drivers/net/dsa/qca/qca8k-8xxx.c
> index f08086ac2261..a9af270a03ce 100644
> --- a/drivers/net/dsa/qca/qca8k-8xxx.c
> +++ b/drivers/net/dsa/qca/qca8k-8xxx.c
> @@ -1963,12 +1963,6 @@ qca8k_setup(struct dsa_switch *ds)
>  			if (ret)
>  				return ret;
>  
> -			/* Enable ARP Auto-learning by default */
> -			ret = regmap_set_bits(priv->regmap, QCA8K_PORT_LOOKUP_CTRL(i),
> -					      QCA8K_PORT_LOOKUP_LEARN);
> -			if (ret)
> -				return ret;
> -
>  			/* For port based vlans to work we need to set the
>  			 * default egress vid
>  			 */
> @@ -2071,6 +2065,8 @@ static const struct dsa_switch_ops qca8k_switch_ops = {
>  	.port_change_mtu	= qca8k_port_change_mtu,
>  	.port_max_mtu		= qca8k_port_max_mtu,
>  	.port_stp_state_set	= qca8k_port_stp_state_set,
> +	.port_pre_bridge_flags	= qca8k_port_pre_bridge_flags,
> +	.port_bridge_flags	= qca8k_port_bridge_flags,
>  	.port_bridge_join	= qca8k_port_bridge_join,
>  	.port_bridge_leave	= qca8k_port_bridge_leave,
>  	.port_fast_age		= qca8k_port_fast_age,
> diff --git a/drivers/net/dsa/qca/qca8k-common.c b/drivers/net/dsa/qca/qca8k-common.c
> index 8c2dc0e48ff4..f93defbd8b66 100644
> --- a/drivers/net/dsa/qca/qca8k-common.c
> +++ b/drivers/net/dsa/qca/qca8k-common.c
> @@ -559,8 +559,24 @@ int qca8k_get_mac_eee(struct dsa_switch *ds, int port,
>  	return 0;
>  }
>  
> +static int qca8k_port_configure_learning(struct dsa_switch *ds, int port,
> +					 bool learning)
> +{
> +	struct qca8k_priv *priv = ds->priv;
> +
> +	if (learning)
> +		return regmap_set_bits(priv->regmap,
> +				       QCA8K_PORT_LOOKUP_CTRL(port),
> +				       QCA8K_PORT_LOOKUP_LEARN);
> +	else
> +		return regmap_clear_bits(priv->regmap,
> +					 QCA8K_PORT_LOOKUP_CTRL(port),
> +					 QCA8K_PORT_LOOKUP_LEARN);
> +}
> +
>  void qca8k_port_stp_state_set(struct dsa_switch *ds, int port, u8 state)
>  {
> +	struct dsa_port *dp = dsa_to_port(ds, port);
>  	struct qca8k_priv *priv = ds->priv;
>  	u32 stp_state;
>  
> @@ -585,6 +601,30 @@ void qca8k_port_stp_state_set(struct dsa_switch *ds, int port, u8 state)
>  
>  	qca8k_rmw(priv, QCA8K_PORT_LOOKUP_CTRL(port),
>  		  QCA8K_PORT_LOOKUP_STATE_MASK, stp_state);
> +
> +	qca8k_port_configure_learning(ds, port, dp->learning);

Learning should be enabled only if we're in an STP state compatible with
learning: BR_STATE_LEARNING, BR_STATE_FORWARDING. The dp->learning flag
does not follow the STP state, it is just an override for that.

So, the condition should be:
"(state == BR_STATE_LEARNING || state == BR_STATE_FORWARDING) && dp->learning"

> +}
> +
> +int qca8k_port_pre_bridge_flags(struct dsa_switch *ds, int port,
> +				struct switchdev_brport_flags flags,
> +				struct netlink_ext_ack *extack)
> +{
> +	if (flags.mask & ~BR_LEARNING)
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
> +int qca8k_port_bridge_flags(struct dsa_switch *ds, int port,
> +			    struct switchdev_brport_flags flags,
> +			    struct netlink_ext_ack *extack)
> +{
> +	int ret;
> +
> +	ret = qca8k_port_configure_learning(ds, port,
> +					    flags.mask & ~BR_LEARNING);

flags.mask contains the bits that have changed.
flags.val contains the current value of all bits.

Passing flags.mask & ~BR_LEARNING (the mask of changed flags except for
BR_LEARNING) to qca8k_port_configure_learning() makes absolutely no sense.

> +
> +	return ret;
>  }
>  
>  int qca8k_port_bridge_join(struct dsa_switch *ds, int port,
> diff --git a/drivers/net/dsa/qca/qca8k.h b/drivers/net/dsa/qca/qca8k.h
> index c5cc8a172d65..8f88b7db384d 100644
> --- a/drivers/net/dsa/qca/qca8k.h
> +++ b/drivers/net/dsa/qca/qca8k.h
> @@ -522,6 +522,12 @@ int qca8k_get_mac_eee(struct dsa_switch *ds, int port, struct ethtool_eee *e);
>  
>  /* Common bridge function */
>  void qca8k_port_stp_state_set(struct dsa_switch *ds, int port, u8 state);
> +int qca8k_port_pre_bridge_flags(struct dsa_switch *ds, int port,
> +				struct switchdev_brport_flags flags,
> +				struct netlink_ext_ack *extack);
> +int qca8k_port_bridge_flags(struct dsa_switch *ds, int port,
> +			    struct switchdev_brport_flags flags,
> +			    struct netlink_ext_ack *extack);
>  int qca8k_port_bridge_join(struct dsa_switch *ds, int port,
>  			   struct dsa_bridge bridge,
>  			   bool *tx_fwd_offload,
> -- 
> 2.40.1
>
  
Christian Marangi June 26, 2023, 4:41 p.m. UTC | #2
On Sun, Jun 25, 2023 at 02:58:03PM +0300, Vladimir Oltean wrote:
> On Fri, Jun 23, 2023 at 01:40:05PM +0200, Christian Marangi wrote:
> > Address learning should initially be turned off by the driver for port
> > operation in standalone mode, then the DSA core handles changes to it
> > via ds->ops->port_bridge_flags().
> > 
> > Currently this is not the case for qca8k where learning is enabled
> > unconditionally in qca8k_setup for every user port.
> > 
> > Handle ports configured in standalone mode by making the learning
> > configurable and not enabling it by default.
> > 
> > Implement .port_pre_bridge_flags and .port_bridge_flags dsa ops to
> > enable learning for bridge that request it and tweak
> > .port_stp_state_set to correctly disable learning when port is
> > configured in standalone mode.
> > 
> > Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
> > ---
> > 
> > Posting as RFC as I would love some comments from Vladimir for correct
> > implementation of this. This was suggested to be on par with offload
> > bridge API and I used as example [1] (commit that does the same thing
> > with a microchip switch)
> > 
> > I didn't want to bloat the priv struct with additional info with the
> > port state and from what I can see it seems using dp->learning is enough
> > to understand if learning is currently enabled for the port or not but I
> > would love to have some confirmation about this. (from what I notice
> > when port is set in standalone mode, flags are cleared so it should be
> > correct)
> 
> In principle you can use dp->learning, but in this case you are using it
> incorrectly (more below).
>

Hi, thanks a lot for the response and for catching the stupid mistake.

> > 
> > I also verified this working by dumping the fdb in a bridge
> > configuration and in a standalone configuration. Traffic works in both
> > configuration.
> > 
> > Dump WITH BRIDGE CONFIGURATION:
> > 01:00:5e:00:00:01 dev eth0 self permanent
> > 33:33:00:00:00:02 dev eth0 self permanent
> > 33:33:00:00:00:01 dev eth0 self permanent
> > 33:33:ff:f2:5d:50 dev eth0 self permanent
> > 33:33:ff:00:00:00 dev eth0 self permanent
> > dc:ef:09:f2:5d:4f dev eth1 self permanent
> > 33:33:00:00:00:01 dev eth1 self permanent
> > 33:33:00:00:00:02 dev eth1 self permanent
> > 01:00:5e:00:00:01 dev eth1 self permanent
> > 33:33:ff:f2:5d:4f dev eth1 self permanent
> > 33:33:ff:00:00:00 dev eth1 self permanent
> > c0:3e:ba:c1:d7:47 dev lan1 master br-lan
> > dc:ef:09:f2:5d:4f dev lan1 vlan 1 master br-lan permanent
> > dc:ef:09:f2:5d:4f dev lan1 master br-lan permanent
> > c0:3e:ba:c1:d7:47 dev lan1 vlan 1 self
> > 33:33:00:00:00:01 dev wlan0 self permanent
> > 33:33:00:00:00:02 dev wlan0 self permanent
> > 33:33:00:00:00:01 dev wlan1 self permanent
> > 33:33:00:00:00:02 dev wlan1 self permanent
> > 33:33:00:00:00:01 dev br-lan self permanent
> > 33:33:00:00:00:02 dev br-lan self permanent
> > 01:00:5e:00:00:01 dev br-lan self permanent
> > 33:33:ff:00:00:01 dev br-lan self permanent
> > 33:33:ff:f2:5d:4f dev br-lan self permanent
> > 33:33:00:01:00:02 dev br-lan self permanent
> > 33:33:00:01:00:03 dev br-lan self permanent
> > 33:33:ff:00:00:00 dev br-lan self permanent
> > 
> > Dump WITH STANDALONE CONFIGURATION:
> > 01:00:5e:00:00:01 dev eth0 self permanent
> > 33:33:00:00:00:02 dev eth0 self permanent
> > 33:33:00:00:00:01 dev eth0 self permanent
> > 33:33:ff:f2:5d:50 dev eth0 self permanent
> > 33:33:ff:00:00:00 dev eth0 self permanent
> > 33:33:00:00:00:01 dev eth1 self permanent
> > 33:33:00:00:00:02 dev eth1 self permanent
> > 01:00:5e:00:00:01 dev eth1 self permanent
> > 33:33:ff:f2:5d:4f dev eth1 self permanent
> > 33:33:ff:00:00:01 dev eth1 self permanent
> > 33:33:ff:00:00:00 dev eth1 self permanent
> > 33:33:00:01:00:02 dev eth1 self permanent
> > 33:33:00:01:00:03 dev eth1 self permanent
> > 33:33:00:00:00:01 dev wlan0 self permanent
> > 33:33:00:00:00:02 dev wlan0 self permanent
> > 33:33:00:00:00:01 dev wlan1 self permanent
> > 33:33:00:00:00:02 dev wlan1 self permanent
> 
> The information from these dumps is pretty much irrelevant.
> 
> > From what I can see there isn't any self entry with the MAC address of
> > the connected device and this should confirm that learning is actually
> > disabled.
> > 
> > Hope this is enough to test this feature and I would ask what would be
> > the next step to reach a point where port_change_master can be
> > implemented.
> 
> The way to test this patch would be to connect in loopback 2 standalone
> qca8k ports having the same MAC address, and ping from one to the other.
> 
> ip netns add ns0
> ip link set lan1 netns ns0 && ip -n ns0 link set lan1 up
> ip -n ns0 addr add 192.168.100.1/24 dev lan1
> ip link set lan2 up && ip addr add 192.168.100.2/24 dev lan2
> ping 192.168.100.1
> 
> Before, it shouldn't have worked, now it should.

I can confirm this works.

> 
> Once that basic precondition passes, you should be able to start looking
> at tools/testing/selftests/drivers/net/dsa/ and run those one by one.
> An interesting one would be local_termination.sh, which monitors the way
> in which frames reach the CPU. Though be aware that some sub-tests from
> that suite will fail on misconfigurations that are non-fatal (and don't
> impact functionality), just sub-optimal (affecting performance). Like
> sending unknown packets to the CPU when the port is non-promiscuous and
> software would drop those packets anyway.
> 

Lots of difficult to run the selftests on a light fw but step at times
I'm managing to make use of them (could be helpfull to add some comments
in the .config saying that the testing port needs to be declared in the
struct) (and maybe some additional checks on the kind of device type are
required for the test to actually work (vrf, dummy, macvlan...)

Anyway a run of local_termination.sh produce the following output.
# selftests: drivers/net/dsa: local_termination.sh
# TEST: lan1: Unicast IPv4 to primary MAC address                     [FAIL]
#       reception failed
# TEST: lan1: Unicast IPv4 to macvlan MAC address                     [FAIL]
#       reception failed
# TEST: lan1: Unicast IPv4 to unknown MAC address                     [ OK ]
# TEST: lan1: Unicast IPv4 to unknown MAC address, promisc            [FAIL]
#       reception failed
# TEST: lan1: Unicast IPv4 to unknown MAC address, allmulti           [ OK ]
# TEST: lan1: Multicast IPv4 to joined group                          [ OK ]
# TEST: lan1: Multicast IPv4 to unknown group                         [FAIL]
#       reception succeeded, but should have failed
# TEST: lan1: Multicast IPv4 to unknown group, promisc                [FAIL]
#       reception failed
# TEST: lan1: Multicast IPv4 to unknown group, allmulti               [ OK ]
# TEST: lan1: Multicast IPv6 to joined group                          [ OK ]
# TEST: lan1: Multicast IPv6 to unknown group                         [FAIL]
#       reception succeeded, but should have failed
# TEST: lan1: Multicast IPv6 to unknown group, promisc                [FAIL]
#       reception failed
# TEST: lan1: Multicast IPv6 to unknown group, allmulti               [ OK ]
# TEST: br0: Unicast IPv4 to primary MAC address                      [FAIL]
#       reception failed
# TEST: br0: Unicast IPv4 to macvlan MAC address                      [FAIL]
#       reception failed
# TEST: br0: Unicast IPv4 to unknown MAC address                      [ OK ]
# TEST: br0: Unicast IPv4 to unknown MAC address, promisc             [FAIL]
#       reception failed
# TEST: br0: Unicast IPv4 to unknown MAC address, allmulti            [ OK ]
# TEST: br0: Multicast IPv4 to joined group                           [ OK ]
# TEST: br0: Multicast IPv4 to unknown group                          [FAIL]
#       reception succeeded, but should have failed
# TEST: br0: Multicast IPv4 to unknown group, promisc                 [FAIL]
#       reception failed
# TEST: br0: Multicast IPv4 to unknown group, allmulti                [ OK ]
# TEST: br0: Multicast IPv6 to joined group                           [ OK ]
# TEST: br0: Multicast IPv6 to unknown group                          [FAIL]
#       reception succeeded, but should have failed
# TEST: br0: Multicast IPv6 to unknown group, promisc                 [FAIL]
#       reception failed
# TEST: br0: Multicast IPv6 to unknown group, allmulti                [ OK ]

Things doesn't look good to me or I am wrong?

> > (would also love to see what are the criteria to enable offload_fwd_mask
> > on the targget for rcv and eventually for xmit)
> 
> For RX, skb->offload_fwd_mark = true (mark not mask) means that the software
> bridge shouldn't flood packets received on lanX towards other lanY ports that
> are part of the same hwdom, because the hardware already took care of
> that.
> 
> [ the hardware domain is determined by dev_get_port_parent_id() ->
>   devlink_compat_switch_id_get() and populated by dsa_port_devlink_setup() ]
> 
> Obviously, the requirement is for the hardware to indeed take care of that :)
> Currently it doesn't flood to the other user ports that are part of the
> same bridge and have egress flooding enabled for that traffic type. It
> just floods to the CPU and software decides where to flood. It's a
> matter of implementing other brport flags, like BR_FLOOD and friends.
> 
> For TX, skb->offload_fwd_mark means that the driver should be able to
> send a skb potentially towards multiple TX ports at the same time, as a
> result of an FDB lookup. This makes the bridge avoid cloning that skb
> and calling dev_queue_xmit() towards every individual port that it must
> reach. I would concentrate on RX and leave TX for later.
> 

Sure.

> > Thanks for any response and sorry for the long comments.
> > 
> > 
> > [1] https://github.com/torvalds/linux/commit/15f7cfae912e
> > 
> >  drivers/net/dsa/qca/qca8k-8xxx.c   |  8 ++----
> >  drivers/net/dsa/qca/qca8k-common.c | 40 ++++++++++++++++++++++++++++++
> >  drivers/net/dsa/qca/qca8k.h        |  6 +++++
> >  3 files changed, 48 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/net/dsa/qca/qca8k-8xxx.c b/drivers/net/dsa/qca/qca8k-8xxx.c
> > index f08086ac2261..a9af270a03ce 100644
> > --- a/drivers/net/dsa/qca/qca8k-8xxx.c
> > +++ b/drivers/net/dsa/qca/qca8k-8xxx.c
> > @@ -1963,12 +1963,6 @@ qca8k_setup(struct dsa_switch *ds)
> >  			if (ret)
> >  				return ret;
> >  
> > -			/* Enable ARP Auto-learning by default */
> > -			ret = regmap_set_bits(priv->regmap, QCA8K_PORT_LOOKUP_CTRL(i),
> > -					      QCA8K_PORT_LOOKUP_LEARN);
> > -			if (ret)
> > -				return ret;
> > -
> >  			/* For port based vlans to work we need to set the
> >  			 * default egress vid
> >  			 */
> > @@ -2071,6 +2065,8 @@ static const struct dsa_switch_ops qca8k_switch_ops = {
> >  	.port_change_mtu	= qca8k_port_change_mtu,
> >  	.port_max_mtu		= qca8k_port_max_mtu,
> >  	.port_stp_state_set	= qca8k_port_stp_state_set,
> > +	.port_pre_bridge_flags	= qca8k_port_pre_bridge_flags,
> > +	.port_bridge_flags	= qca8k_port_bridge_flags,
> >  	.port_bridge_join	= qca8k_port_bridge_join,
> >  	.port_bridge_leave	= qca8k_port_bridge_leave,
> >  	.port_fast_age		= qca8k_port_fast_age,
> > diff --git a/drivers/net/dsa/qca/qca8k-common.c b/drivers/net/dsa/qca/qca8k-common.c
> > index 8c2dc0e48ff4..f93defbd8b66 100644
> > --- a/drivers/net/dsa/qca/qca8k-common.c
> > +++ b/drivers/net/dsa/qca/qca8k-common.c
> > @@ -559,8 +559,24 @@ int qca8k_get_mac_eee(struct dsa_switch *ds, int port,
> >  	return 0;
> >  }
> >  
> > +static int qca8k_port_configure_learning(struct dsa_switch *ds, int port,
> > +					 bool learning)
> > +{
> > +	struct qca8k_priv *priv = ds->priv;
> > +
> > +	if (learning)
> > +		return regmap_set_bits(priv->regmap,
> > +				       QCA8K_PORT_LOOKUP_CTRL(port),
> > +				       QCA8K_PORT_LOOKUP_LEARN);
> > +	else
> > +		return regmap_clear_bits(priv->regmap,
> > +					 QCA8K_PORT_LOOKUP_CTRL(port),
> > +					 QCA8K_PORT_LOOKUP_LEARN);
> > +}
> > +
> >  void qca8k_port_stp_state_set(struct dsa_switch *ds, int port, u8 state)
> >  {
> > +	struct dsa_port *dp = dsa_to_port(ds, port);
> >  	struct qca8k_priv *priv = ds->priv;
> >  	u32 stp_state;
> >  
> > @@ -585,6 +601,30 @@ void qca8k_port_stp_state_set(struct dsa_switch *ds, int port, u8 state)
> >  
> >  	qca8k_rmw(priv, QCA8K_PORT_LOOKUP_CTRL(port),
> >  		  QCA8K_PORT_LOOKUP_STATE_MASK, stp_state);
> > +
> > +	qca8k_port_configure_learning(ds, port, dp->learning);
> 
> Learning should be enabled only if we're in an STP state compatible with
> learning: BR_STATE_LEARNING, BR_STATE_FORWARDING. The dp->learning flag
> does not follow the STP state, it is just an override for that.
> 
> So, the condition should be:
> "(state == BR_STATE_LEARNING || state == BR_STATE_FORWARDING) && dp->learning"
>

Fixed.

> > +}
> > +
> > +int qca8k_port_pre_bridge_flags(struct dsa_switch *ds, int port,
> > +				struct switchdev_brport_flags flags,
> > +				struct netlink_ext_ack *extack)
> > +{
> > +	if (flags.mask & ~BR_LEARNING)
> > +		return -EINVAL;
> > +
> > +	return 0;
> > +}
> > +
> > +int qca8k_port_bridge_flags(struct dsa_switch *ds, int port,
> > +			    struct switchdev_brport_flags flags,
> > +			    struct netlink_ext_ack *extack)
> > +{
> > +	int ret;
> > +
> > +	ret = qca8k_port_configure_learning(ds, port,
> > +					    flags.mask & ~BR_LEARNING);
> 
> flags.mask contains the bits that have changed.
> flags.val contains the current value of all bits.
> 
> Passing flags.mask & ~BR_LEARNING (the mask of changed flags except for
> BR_LEARNING) to qca8k_port_configure_learning() makes absolutely no sense.
> 

Very stupid of me.

> > +
> > +	return ret;
> >  }
> >  
> >  int qca8k_port_bridge_join(struct dsa_switch *ds, int port,
> > diff --git a/drivers/net/dsa/qca/qca8k.h b/drivers/net/dsa/qca/qca8k.h
> > index c5cc8a172d65..8f88b7db384d 100644
> > --- a/drivers/net/dsa/qca/qca8k.h
> > +++ b/drivers/net/dsa/qca/qca8k.h
> > @@ -522,6 +522,12 @@ int qca8k_get_mac_eee(struct dsa_switch *ds, int port, struct ethtool_eee *e);
> >  
> >  /* Common bridge function */
> >  void qca8k_port_stp_state_set(struct dsa_switch *ds, int port, u8 state);
> > +int qca8k_port_pre_bridge_flags(struct dsa_switch *ds, int port,
> > +				struct switchdev_brport_flags flags,
> > +				struct netlink_ext_ack *extack);
> > +int qca8k_port_bridge_flags(struct dsa_switch *ds, int port,
> > +			    struct switchdev_brport_flags flags,
> > +			    struct netlink_ext_ack *extack);
> >  int qca8k_port_bridge_join(struct dsa_switch *ds, int port,
> >  			   struct dsa_bridge bridge,
> >  			   bool *tx_fwd_offload,
> > -- 
> > 2.40.1
> > 
>
  
Vladimir Oltean June 26, 2023, 5:30 p.m. UTC | #3
On Mon, Jun 26, 2023 at 06:41:50PM +0200, Christian Marangi wrote:
> > Once that basic precondition passes, you should be able to start looking
> > at tools/testing/selftests/drivers/net/dsa/ and run those one by one.
> > An interesting one would be local_termination.sh, which monitors the way
> > in which frames reach the CPU. Though be aware that some sub-tests from
> > that suite will fail on misconfigurations that are non-fatal (and don't
> > impact functionality), just sub-optimal (affecting performance). Like
> > sending unknown packets to the CPU when the port is non-promiscuous and
> > software would drop those packets anyway.
> > 
> 
> Lots of difficult to run the selftests on a light fw but step at times
> I'm managing to make use of them (could be helpfull to add some comments
> in the .config saying that the testing port needs to be declared in the
> struct) (and maybe some additional checks on the kind of device type are
> required for the test to actually work (vrf, dummy, macvlan...)

Yeah, that doesn't sound like a bad idea at all. AFAIK,
tools/testing/selftests/net/forwarding/lib.sh doesn't check
"zcat /proc/config.gz" at all. Maybe it would be nice if it did, and to
guard that behavior based on some REQUIRE_* variables that are true by
default (but can be set to false by scripts).

> Anyway a run of local_termination.sh produce the following output.
> # selftests: drivers/net/dsa: local_termination.sh
> # TEST: lan1: Unicast IPv4 to primary MAC address                     [FAIL]
> #       reception failed

Hmm, so ping works but this doesn't? That's strange, because send_uc_ipv4()
also pings. Have you run with bash -x to see why it fails?

> # TEST: lan1: Unicast IPv4 to macvlan MAC address                     [FAIL]
> #       reception failed
> # TEST: lan1: Unicast IPv4 to unknown MAC address                     [ OK ]

So the only reason why this test passes is because in this case, the
unicast drops are okay?

> # TEST: lan1: Unicast IPv4 to unknown MAC address, promisc            [FAIL]
> #       reception failed
> # TEST: lan1: Unicast IPv4 to unknown MAC address, allmulti           [ OK ]

Similar here. Packet should have been dropped; the test detects a drop => okay.
Passes for the wrong reason, most likely, because this driver doesn't react
on IFF_PROMISC or IFF_ALLMULTI.

> # TEST: lan1: Multicast IPv4 to joined group                          [ OK ]
> # TEST: lan1: Multicast IPv4 to unknown group                         [FAIL]
> #       reception succeeded, but should have failed

"reception succeeded, but should have failed" is the okay kind of failure.
"reception failed" is what's bothering.

> # TEST: lan1: Multicast IPv4 to unknown group, promisc                [FAIL]
> #       reception failed
> # TEST: lan1: Multicast IPv4 to unknown group, allmulti               [ OK ]
> # TEST: lan1: Multicast IPv6 to joined group                          [ OK ]
> # TEST: lan1: Multicast IPv6 to unknown group                         [FAIL]
> #       reception succeeded, but should have failed
> # TEST: lan1: Multicast IPv6 to unknown group, promisc                [FAIL]
> #       reception failed

This I cannot explain. For the test right above, "Multicast IPv6 to unknown group",
it said that reception succeeded. This is sending the same packet, only
the IFF_PROMISC flag of the device changes (this is also propagated to
the DSA master). I've no idea why it fails. Again, bash -x will say more.

> # TEST: lan1: Multicast IPv6 to unknown group, allmulti               [ OK ]
> # TEST: br0: Unicast IPv4 to primary MAC address                      [FAIL]
> #       reception failed
> # TEST: br0: Unicast IPv4 to macvlan MAC address                      [FAIL]
> #       reception failed
> # TEST: br0: Unicast IPv4 to unknown MAC address                      [ OK ]
> # TEST: br0: Unicast IPv4 to unknown MAC address, promisc             [FAIL]
> #       reception failed
> # TEST: br0: Unicast IPv4 to unknown MAC address, allmulti            [ OK ]
> # TEST: br0: Multicast IPv4 to joined group                           [ OK ]
> # TEST: br0: Multicast IPv4 to unknown group                          [FAIL]
> #       reception succeeded, but should have failed
> # TEST: br0: Multicast IPv4 to unknown group, promisc                 [FAIL]
> #       reception failed
> # TEST: br0: Multicast IPv4 to unknown group, allmulti                [ OK ]
> # TEST: br0: Multicast IPv6 to joined group                           [ OK ]
> # TEST: br0: Multicast IPv6 to unknown group                          [FAIL]
> #       reception succeeded, but should have failed
> # TEST: br0: Multicast IPv6 to unknown group, promisc                 [FAIL]
> #       reception failed
> # TEST: br0: Multicast IPv6 to unknown group, allmulti                [ OK ]
> 
> Things doesn't look good to me or I am wrong?

Nope, doesn't look good at all. Looks like an incomplete setup.
  
Christian Marangi June 26, 2023, 5:59 p.m. UTC | #4
On Mon, Jun 26, 2023 at 08:30:56PM +0300, Vladimir Oltean wrote:
> On Mon, Jun 26, 2023 at 06:41:50PM +0200, Christian Marangi wrote:
> > > Once that basic precondition passes, you should be able to start looking
> > > at tools/testing/selftests/drivers/net/dsa/ and run those one by one.
> > > An interesting one would be local_termination.sh, which monitors the way
> > > in which frames reach the CPU. Though be aware that some sub-tests from
> > > that suite will fail on misconfigurations that are non-fatal (and don't
> > > impact functionality), just sub-optimal (affecting performance). Like
> > > sending unknown packets to the CPU when the port is non-promiscuous and
> > > software would drop those packets anyway.
> > > 
> > 
> > Lots of difficult to run the selftests on a light fw but step at times
> > I'm managing to make use of them (could be helpfull to add some comments
> > in the .config saying that the testing port needs to be declared in the
> > struct) (and maybe some additional checks on the kind of device type are
> > required for the test to actually work (vrf, dummy, macvlan...)
> 
> Yeah, that doesn't sound like a bad idea at all. AFAIK,
> tools/testing/selftests/net/forwarding/lib.sh doesn't check
> "zcat /proc/config.gz" at all. Maybe it would be nice if it did, and to
> guard that behavior based on some REQUIRE_* variables that are true by
> default (but can be set to false by scripts).
>

Some thing can even be checked by simply creating an interface and see
if the thing gives error. I feel this is a better approach than checking
config and kflags since from what I can see the idea of these scripts is
to be system agnostic and sometimes it's handy to ""compile"" or package
these scripts on a different system than the target one (this is true
for every shell script and target specific stuff that needs to be
compiled is not the case)

> > Anyway a run of local_termination.sh produce the following output.
> > # selftests: drivers/net/dsa: local_termination.sh
> > # TEST: lan1: Unicast IPv4 to primary MAC address                     [FAIL]
> > #       reception failed
> 
> Hmm, so ping works but this doesn't? That's strange, because send_uc_ipv4()
> also pings. Have you run with bash -x to see why it fails?
> 

I just run with bash -x and I also mod the script to not delete the
tcpdump. Limiting the script to only this test the dump is just 2 ICMPv6
packet and no output from tcpdump aside from

tcpdump: listening on lan1, link-type EN10MB (Ethernet), snapshot length 65535 bytes
3 packets captured
5 packets received by filter
0 packets dropped by kernel

I feel like this is important so I think I should focus on understanding
why this doesn't work? Any clue?

> > # TEST: lan1: Unicast IPv4 to macvlan MAC address                     [FAIL]
> > #       reception failed
> > # TEST: lan1: Unicast IPv4 to unknown MAC address                     [ OK ]
> 
> So the only reason why this test passes is because in this case, the
> unicast drops are okay?
> 
> > # TEST: lan1: Unicast IPv4 to unknown MAC address, promisc            [FAIL]
> > #       reception failed
> > # TEST: lan1: Unicast IPv4 to unknown MAC address, allmulti           [ OK ]
> 
> Similar here. Packet should have been dropped; the test detects a drop => okay.
> Passes for the wrong reason, most likely, because this driver doesn't react
> on IFF_PROMISC or IFF_ALLMULTI.
> 
> > # TEST: lan1: Multicast IPv4 to joined group                          [ OK ]
> > # TEST: lan1: Multicast IPv4 to unknown group                         [FAIL]
> > #       reception succeeded, but should have failed
> 
> "reception succeeded, but should have failed" is the okay kind of failure.
> "reception failed" is what's bothering.
> 
> > # TEST: lan1: Multicast IPv4 to unknown group, promisc                [FAIL]
> > #       reception failed
> > # TEST: lan1: Multicast IPv4 to unknown group, allmulti               [ OK ]
> > # TEST: lan1: Multicast IPv6 to joined group                          [ OK ]
> > # TEST: lan1: Multicast IPv6 to unknown group                         [FAIL]
> > #       reception succeeded, but should have failed
> > # TEST: lan1: Multicast IPv6 to unknown group, promisc                [FAIL]
> > #       reception failed
> 
> This I cannot explain. For the test right above, "Multicast IPv6 to unknown group",
> it said that reception succeeded. This is sending the same packet, only
> the IFF_PROMISC flag of the device changes (this is also propagated to
> the DSA master). I've no idea why it fails. Again, bash -x will say more.
> 
> > # TEST: lan1: Multicast IPv6 to unknown group, allmulti               [ OK ]
> > # TEST: br0: Unicast IPv4 to primary MAC address                      [FAIL]
> > #       reception failed
> > # TEST: br0: Unicast IPv4 to macvlan MAC address                      [FAIL]
> > #       reception failed
> > # TEST: br0: Unicast IPv4 to unknown MAC address                      [ OK ]
> > # TEST: br0: Unicast IPv4 to unknown MAC address, promisc             [FAIL]
> > #       reception failed
> > # TEST: br0: Unicast IPv4 to unknown MAC address, allmulti            [ OK ]
> > # TEST: br0: Multicast IPv4 to joined group                           [ OK ]
> > # TEST: br0: Multicast IPv4 to unknown group                          [FAIL]
> > #       reception succeeded, but should have failed
> > # TEST: br0: Multicast IPv4 to unknown group, promisc                 [FAIL]
> > #       reception failed
> > # TEST: br0: Multicast IPv4 to unknown group, allmulti                [ OK ]
> > # TEST: br0: Multicast IPv6 to joined group                           [ OK ]
> > # TEST: br0: Multicast IPv6 to unknown group                          [FAIL]
> > #       reception succeeded, but should have failed
> > # TEST: br0: Multicast IPv6 to unknown group, promisc                 [FAIL]
> > #       reception failed
> > # TEST: br0: Multicast IPv6 to unknown group, allmulti                [ OK ]
> > 
> > Things doesn't look good to me or I am wrong?
> 
> Nope, doesn't look good at all. Looks like an incomplete setup.
  
Vladimir Oltean June 27, 2023, 9:49 a.m. UTC | #5
On Mon, Jun 26, 2023 at 07:59:41PM +0200, Christian Marangi wrote:
> > Hmm, so ping works but this doesn't? That's strange, because send_uc_ipv4()
> > also pings. Have you run with bash -x to see why it fails?
> > 
> 
> I just run with bash -x and I also mod the script to not delete the
> tcpdump. Limiting the script to only this test the dump is just 2 ICMPv6
> packet and no output from tcpdump aside from
> 
> tcpdump: listening on lan1, link-type EN10MB (Ethernet), snapshot length 65535 bytes
> 3 packets captured
> 5 packets received by filter
> 0 packets dropped by kernel
> 
> I feel like this is important so I think I should focus on understanding
> why this doesn't work? Any clue?

No clue. I'd put a "bash" instruction in send_uc_ipv4(), which would act
as a sort of break point for the script (opens an interactive sub-shell),
then run it again with bash -x, manually repeat the command that failed,
investigate why it failed and hit Ctrl-d when I'm done.
  
Christian Marangi June 28, 2023, 12:49 a.m. UTC | #6
On Tue, Jun 27, 2023 at 12:49:16PM +0300, Vladimir Oltean wrote:
> On Mon, Jun 26, 2023 at 07:59:41PM +0200, Christian Marangi wrote:
> > > Hmm, so ping works but this doesn't? That's strange, because send_uc_ipv4()
> > > also pings. Have you run with bash -x to see why it fails?
> > > 
> > 
> > I just run with bash -x and I also mod the script to not delete the
> > tcpdump. Limiting the script to only this test the dump is just 2 ICMPv6
> > packet and no output from tcpdump aside from
> > 
> > tcpdump: listening on lan1, link-type EN10MB (Ethernet), snapshot length 65535 bytes
> > 3 packets captured
> > 5 packets received by filter
> > 0 packets dropped by kernel
> > 
> > I feel like this is important so I think I should focus on understanding
> > why this doesn't work? Any clue?
> 
> No clue. I'd put a "bash" instruction in send_uc_ipv4(), which would act
> as a sort of break point for the script (opens an interactive sub-shell),
> then run it again with bash -x, manually repeat the command that failed,
> investigate why it failed and hit Ctrl-d when I'm done.

Hi, I wasted a day to only notice that the problem was in busybox not
supporting 0.x value and that is what selfttest script use. Another
thing to check. Also the confusing part of this is that we don't check
if ping_do return error and the test just fails (while in reality the
ping command was never executed)

Anyway I'm fixing all kind of bugs and I even found an even hw
limitation with the FDB table with mgmt packet...

Also I implemented fdb_insolation following the implementation done on
felix with reserving VID at the end...
About this I wonder if it might be a good idea to expose some generic
API from DSA?

qca8k require a reserved VID for VLAN unaware port and with
fdb_isolation even more VID are reserved. DSA have no idea about this so
I wonder if there is a chance of VID clash? I feel like we need
something to declare a range of reserved VID so that they gets rejected
when applied. (about this I think I should return -EINVAL if fdb/mbd
insert are asked with a reserved VID)

Anyway by fixing that interval problem (enabling support in busybox as
it's disabled by default in a OpenWRT system)
This is the new output of the local_termination.sh 

TAP version 13
1..1
# selftests: drivers/net/dsa: local_termination.sh
# TEST: lan1: Unicast IPv4 to primary MAC address                     [ OK ]
# TEST: lan1: Unicast IPv4 to macvlan MAC address                     [ OK ]
# TEST: lan1: Unicast IPv4 to unknown MAC address                     [FAIL]
#       reception succeeded, but should have failed
# TEST: lan1: Unicast IPv4 to unknown MAC address, promisc            [ OK ]
# TEST: lan1: Unicast IPv4 to unknown MAC address, allmulti           [FAIL]
#       reception succeeded, but should have failed
# TEST: lan1: Multicast IPv4 to joined group                          [ OK ]
# TEST: lan1: Multicast IPv4 to unknown group                         [FAIL]
#       reception succeeded, but should have failed
# TEST: lan1: Multicast IPv4 to unknown group, promisc                [ OK ]
# TEST: lan1: Multicast IPv4 to unknown group, allmulti               [ OK ]
# TEST: lan1: Multicast IPv6 to joined group                          [ OK ]
# TEST: lan1: Multicast IPv6 to unknown group                         [FAIL]
#       reception succeeded, but should have failed
# TEST: lan1: Multicast IPv6 to unknown group, promisc                [ OK ]
# TEST: lan1: Multicast IPv6 to unknown group, allmulti               [ OK ]
# TEST: br0: Unicast IPv4 to primary MAC address                      [ OK ]
# TEST: br0: Unicast IPv4 to macvlan MAC address                      [ OK ]
# TEST: br0: Unicast IPv4 to unknown MAC address                      [FAIL]
#       reception succeeded, but should have failed
# TEST: br0: Unicast IPv4 to unknown MAC address, promisc             [ OK ]
# TEST: br0: Unicast IPv4 to unknown MAC address, allmulti            [FAIL]
#       reception succeeded, but should have failed
# TEST: br0: Multicast IPv4 to joined group                           [ OK ]
# TEST: br0: Multicast IPv4 to unknown group                          [FAIL]
#       reception succeeded, but should have failed
# TEST: br0: Multicast IPv4 to unknown group, promisc                 [ OK ]
# TEST: br0: Multicast IPv4 to unknown group, allmulti                [ OK ]
# TEST: br0: Multicast IPv6 to joined group                           [ OK ]
# TEST: br0: Multicast IPv6 to unknown group                          [FAIL]
#       reception succeeded, but should have failed
# TEST: br0: Multicast IPv6 to unknown group, promisc                 [ OK ]
# TEST: br0: Multicast IPv6 to unknown group, allmulti                [ OK ]
not ok 1 selftests: drivers/net/dsa: local_termination.sh # exit=1

Seems good aside from the reception secceeded that I think the kernel
just drops right?

The switch have a way to control FLOOD per port but maybe the correct
feature to use is the VLAN leaky? Where the setting can be set by both
FDB/MBD and per port.

Sorry if I described some fix and implementation without proposing the
patch but I would like some comments on what the tool returned.
  
Christian Marangi June 28, 2023, 12:53 a.m. UTC | #7
On Wed, Jun 28, 2023 at 02:49:53AM +0200, Christian Marangi wrote:
> On Tue, Jun 27, 2023 at 12:49:16PM +0300, Vladimir Oltean wrote:
> > On Mon, Jun 26, 2023 at 07:59:41PM +0200, Christian Marangi wrote:
> > > > Hmm, so ping works but this doesn't? That's strange, because send_uc_ipv4()
> > > > also pings. Have you run with bash -x to see why it fails?
> > > > 
> > > 
> > > I just run with bash -x and I also mod the script to not delete the
> > > tcpdump. Limiting the script to only this test the dump is just 2 ICMPv6
> > > packet and no output from tcpdump aside from
> > > 
> > > tcpdump: listening on lan1, link-type EN10MB (Ethernet), snapshot length 65535 bytes
> > > 3 packets captured
> > > 5 packets received by filter
> > > 0 packets dropped by kernel
> > > 
> > > I feel like this is important so I think I should focus on understanding
> > > why this doesn't work? Any clue?
> > 
> > No clue. I'd put a "bash" instruction in send_uc_ipv4(), which would act
> > as a sort of break point for the script (opens an interactive sub-shell),
> > then run it again with bash -x, manually repeat the command that failed,
> > investigate why it failed and hit Ctrl-d when I'm done.
> 
> Hi, I wasted a day to only notice that the problem was in busybox not
> supporting 0.x value and that is what selfttest script use. Another
> thing to check. Also the confusing part of this is that we don't check
> if ping_do return error and the test just fails (while in reality the
> ping command was never executed)
>

Also sorry for the double email... I forgot to add that it seems I need
to add a sleep 1 right after 

	mc_route_prepare $h1
	mc_route_prepare $rcv_if_name

in local_termination.sh. Think the switch needs some time to enable all
the port. Without the sleep 1 the first test just fails.

> Anyway I'm fixing all kind of bugs and I even found an even hw
> limitation with the FDB table with mgmt packet...
> 
> Also I implemented fdb_insolation following the implementation done on
> felix with reserving VID at the end...
> About this I wonder if it might be a good idea to expose some generic
> API from DSA?
> 
> qca8k require a reserved VID for VLAN unaware port and with
> fdb_isolation even more VID are reserved. DSA have no idea about this so
> I wonder if there is a chance of VID clash? I feel like we need
> something to declare a range of reserved VID so that they gets rejected
> when applied. (about this I think I should return -EINVAL if fdb/mbd
> insert are asked with a reserved VID)
> 
> Anyway by fixing that interval problem (enabling support in busybox as
> it's disabled by default in a OpenWRT system)
> This is the new output of the local_termination.sh 
> 
> TAP version 13
> 1..1
> # selftests: drivers/net/dsa: local_termination.sh
> # TEST: lan1: Unicast IPv4 to primary MAC address                     [ OK ]
> # TEST: lan1: Unicast IPv4 to macvlan MAC address                     [ OK ]
> # TEST: lan1: Unicast IPv4 to unknown MAC address                     [FAIL]
> #       reception succeeded, but should have failed
> # TEST: lan1: Unicast IPv4 to unknown MAC address, promisc            [ OK ]
> # TEST: lan1: Unicast IPv4 to unknown MAC address, allmulti           [FAIL]
> #       reception succeeded, but should have failed
> # TEST: lan1: Multicast IPv4 to joined group                          [ OK ]
> # TEST: lan1: Multicast IPv4 to unknown group                         [FAIL]
> #       reception succeeded, but should have failed
> # TEST: lan1: Multicast IPv4 to unknown group, promisc                [ OK ]
> # TEST: lan1: Multicast IPv4 to unknown group, allmulti               [ OK ]
> # TEST: lan1: Multicast IPv6 to joined group                          [ OK ]
> # TEST: lan1: Multicast IPv6 to unknown group                         [FAIL]
> #       reception succeeded, but should have failed
> # TEST: lan1: Multicast IPv6 to unknown group, promisc                [ OK ]
> # TEST: lan1: Multicast IPv6 to unknown group, allmulti               [ OK ]
> # TEST: br0: Unicast IPv4 to primary MAC address                      [ OK ]
> # TEST: br0: Unicast IPv4 to macvlan MAC address                      [ OK ]
> # TEST: br0: Unicast IPv4 to unknown MAC address                      [FAIL]
> #       reception succeeded, but should have failed
> # TEST: br0: Unicast IPv4 to unknown MAC address, promisc             [ OK ]
> # TEST: br0: Unicast IPv4 to unknown MAC address, allmulti            [FAIL]
> #       reception succeeded, but should have failed
> # TEST: br0: Multicast IPv4 to joined group                           [ OK ]
> # TEST: br0: Multicast IPv4 to unknown group                          [FAIL]
> #       reception succeeded, but should have failed
> # TEST: br0: Multicast IPv4 to unknown group, promisc                 [ OK ]
> # TEST: br0: Multicast IPv4 to unknown group, allmulti                [ OK ]
> # TEST: br0: Multicast IPv6 to joined group                           [ OK ]
> # TEST: br0: Multicast IPv6 to unknown group                          [FAIL]
> #       reception succeeded, but should have failed
> # TEST: br0: Multicast IPv6 to unknown group, promisc                 [ OK ]
> # TEST: br0: Multicast IPv6 to unknown group, allmulti                [ OK ]
> not ok 1 selftests: drivers/net/dsa: local_termination.sh # exit=1
> 
> Seems good aside from the reception secceeded that I think the kernel
> just drops right?
> 
> The switch have a way to control FLOOD per port but maybe the correct
> feature to use is the VLAN leaky? Where the setting can be set by both
> FDB/MBD and per port.
> 
> Sorry if I described some fix and implementation without proposing the
> patch but I would like some comments on what the tool returned.
> 
> -- 
> 	Ansuel
  
Vladimir Oltean June 29, 2023, 12:39 p.m. UTC | #8
On Wed, Jun 28, 2023 at 02:49:53AM +0200, Christian Marangi wrote:
> Hi, I wasted a day to only notice that the problem was in busybox not
> supporting 0.x value and that is what selfttest script use. Another
> thing to check. Also the confusing part of this is that we don't check
> if ping_do return error and the test just fails (while in reality the
> ping command was never executed)

Interesting. Not sure how you'll check for the specific busybox
implementation. Does it help if you add "check_err $?" after ping_do()
in send_uc_ipv4(), like ping_test() has?

> Anyway I'm fixing all kind of bugs and I even found an even hw
> limitation with the FDB table with mgmt packet...
> 
> Also I implemented fdb_insolation following the implementation done on
> felix with reserving VID at the end...
> About this I wonder if it might be a good idea to expose some generic
> API from DSA?
> 
> qca8k require a reserved VID for VLAN unaware port and with
> fdb_isolation even more VID are reserved. DSA have no idea about this so
> I wonder if there is a chance of VID clash? I feel like we need
> something to declare a range of reserved VID so that they gets rejected
> when applied. (about this I think I should return -EINVAL if fdb/mbd
> insert are asked with a reserved VID)

I don't think that it's possible to get a port_fdb_add() or port_mdb_add()
call for a VID that wasn't accepted through port_vlan_add() first.
At least I don't see how. The VLAN-aware FDB entries come from the
bridge through SWITCHDEV_OBJ_ID_PORT_VLAN and from DSA's private
.ndo_vlan_rx_add_vid() / .ndo_vlan_rx_kill_vid() implementations for
VLAN filters (those that land in dp->user_vlans; refresh your net.git
tree if you don't find those).

If we reject VLANs at those layers, then:
# through the bridge
[root@mox:~] # bridge fdb add 00:01:02:03:04:05 vlan 2000 dev lan21 master static
[ 7611.225227] bridge: RTM_NEWNEIGH with unconfigured vlan 2000 on lan21
RTNETLINK answers: Invalid argument
# bridge bypass, should go through ndo_fdb_add() if DSA had one
[root@mox:~] # bridge fdb add 00:01:02:03:04:05 vlan 2000 dev lan21 self static
[ 7855.532615] mv88e6085 d0032004.mdio-mii:12 lan21: default FDB implementation only supports local addresses
RTNETLINK answers: Invalid argument

So, while we could probably add some API in core DSA for a reserved VID
range, I'm not sure that it will be that useful.

And regarding whether there is a chance of VID class? I guess there is.
I have a board where the felix driver (reserved range 4000-4095) is a
DSA master for the sja1105 driver (reserved range 3072-4095 for tag_8021q).
Since any dsa_tag_8021q_port_setup() call does a vlan_vid_add() call
towards the master, then there's a chance that felix could reject the
tag_8021q setup of the sja1105 ports for tag_8021q VIDs 4000 and upwards.

VID 4000 = 0xfa0 = bitwise 0b1111_1010_0000

Comparing with net/dsa/tag_8021q.c:

 * | 11  | 10  |  9  |  8  |  7  |  6  |  5  |  4  |  3  |  2  |  1  |  0  |
 * +-----------+-----+-----------------+-----------+-----------------------+
 * |    RSV    | VBID|    SWITCH_ID    |   VBID    |          PORT         |
 * +-----------+-----+-----------------+-----------+-----------------------+

So it would be problematic for VBID >= 6, whenever SWITCH_ID >= 6.
However, practical applications of tag_8021q with the sja1105 boards
that I'm aware of do not have 7 switches, so it's a theoretical problem
only.

Though we would need to be rather careful when calculating and enforcing
the reserved ranges in the DSA core, to not cause false positive errors.

> 
> Anyway by fixing that interval problem (enabling support in busybox as
> it's disabled by default in a OpenWRT system)
> This is the new output of the local_termination.sh 
> 
> TAP version 13
> 1..1
> # selftests: drivers/net/dsa: local_termination.sh
> # TEST: lan1: Unicast IPv4 to primary MAC address                     [ OK ]
> # TEST: lan1: Unicast IPv4 to macvlan MAC address                     [ OK ]
> # TEST: lan1: Unicast IPv4 to unknown MAC address                     [FAIL]
> #       reception succeeded, but should have failed
> # TEST: lan1: Unicast IPv4 to unknown MAC address, promisc            [ OK ]
> # TEST: lan1: Unicast IPv4 to unknown MAC address, allmulti           [FAIL]
> #       reception succeeded, but should have failed
> # TEST: lan1: Multicast IPv4 to joined group                          [ OK ]
> # TEST: lan1: Multicast IPv4 to unknown group                         [FAIL]
> #       reception succeeded, but should have failed
> # TEST: lan1: Multicast IPv4 to unknown group, promisc                [ OK ]
> # TEST: lan1: Multicast IPv4 to unknown group, allmulti               [ OK ]
> # TEST: lan1: Multicast IPv6 to joined group                          [ OK ]
> # TEST: lan1: Multicast IPv6 to unknown group                         [FAIL]
> #       reception succeeded, but should have failed
> # TEST: lan1: Multicast IPv6 to unknown group, promisc                [ OK ]
> # TEST: lan1: Multicast IPv6 to unknown group, allmulti               [ OK ]
> # TEST: br0: Unicast IPv4 to primary MAC address                      [ OK ]
> # TEST: br0: Unicast IPv4 to macvlan MAC address                      [ OK ]
> # TEST: br0: Unicast IPv4 to unknown MAC address                      [FAIL]
> #       reception succeeded, but should have failed
> # TEST: br0: Unicast IPv4 to unknown MAC address, promisc             [ OK ]
> # TEST: br0: Unicast IPv4 to unknown MAC address, allmulti            [FAIL]
> #       reception succeeded, but should have failed
> # TEST: br0: Multicast IPv4 to joined group                           [ OK ]
> # TEST: br0: Multicast IPv4 to unknown group                          [FAIL]
> #       reception succeeded, but should have failed
> # TEST: br0: Multicast IPv4 to unknown group, promisc                 [ OK ]
> # TEST: br0: Multicast IPv4 to unknown group, allmulti                [ OK ]
> # TEST: br0: Multicast IPv6 to joined group                           [ OK ]
> # TEST: br0: Multicast IPv6 to unknown group                          [FAIL]
> #       reception succeeded, but should have failed
> # TEST: br0: Multicast IPv6 to unknown group, promisc                 [ OK ]
> # TEST: br0: Multicast IPv6 to unknown group, allmulti                [ OK ]
> not ok 1 selftests: drivers/net/dsa: local_termination.sh # exit=1
> 
> Seems good aside from the reception secceeded that I think the kernel
> just drops right?

Looks good. If you've implemented FDB isolation, then it means that you
get port_fdb_add() and port_mdb_add() calls on the CPU port(s), and you
can now also implement .port_set_host_flood() and that should also make
the following tests pass:

* TEST: lan1: Unicast IPv4 to unknown MAC address                     [FAIL]
* TEST: lan1: Unicast IPv4 to unknown MAC address, allmulti           [FAIL]
* TEST: lan1: Multicast IPv4 to unknown group                         [FAIL]
* TEST: lan1: Multicast IPv6 to unknown group                         [FAIL]

The bridge tests to an unknown address will always fail with the message
"reception succeeded, but should have failed", so it's not you, it's the
test there.

Once you're there, you can re-do these tests with:
- all user ports to CPU port 0
- all user ports to CPU port 6
- user ports statically split between CPU ports 0 and 6
- all user ports to LAG composed of CPU ports 0 and 6

> The switch have a way to control FLOOD per port but maybe the correct
> feature to use is the VLAN leaky? Where the setting can be set by both
> FDB/MBD and per port.

I'm not sure why would the leaky VLAN feature be useful ("enable
specific frames to be forwarded across VLAN boundary"). I really don't
know what you're thinking about.

> Sorry if I described some fix and implementation without proposing the
> patch but I would like some comments on what the tool returned.
> 
> -- 
> 	Ansuel
  
Vladimir Oltean June 29, 2023, 12:49 p.m. UTC | #9
On Wed, Jun 28, 2023 at 02:53:35AM +0200, Christian Marangi wrote:
> Also sorry for the double email... I forgot to add that it seems I need
> to add a sleep 1 right after 
> 
> 	mc_route_prepare $h1
> 	mc_route_prepare $rcv_if_name
> 
> in local_termination.sh. Think the switch needs some time to enable all
> the port. Without the sleep 1 the first test just fails.

This makes sense. Those multicast MAC addresses are synced to DSA
through dsa_slave_set_rx_mode() -> dsa_slave_sync_mc(). Notice that the
calling context is atomic, so the implementation sets up a deferred work
item for later. So user space is allowed to return from the syscall
before the hardware is actually reprogrammed.
  
Christian Marangi July 1, 2023, 6:25 p.m. UTC | #10
On Thu, Jun 29, 2023 at 03:39:54PM +0300, Vladimir Oltean wrote:
> On Wed, Jun 28, 2023 at 02:49:53AM +0200, Christian Marangi wrote:
> > Hi, I wasted a day to only notice that the problem was in busybox not
> > supporting 0.x value and that is what selfttest script use. Another
> > thing to check. Also the confusing part of this is that we don't check
> > if ping_do return error and the test just fails (while in reality the
> > ping command was never executed)
> 
> Interesting. Not sure how you'll check for the specific busybox
> implementation. Does it help if you add "check_err $?" after ping_do()
> in send_uc_ipv4(), like ping_test() has?
>

I think it's better to implement some simple tests before the test is
done. Similar to tools missing tests...

So something like "ping -i 0.1" and reading the output if it does match
a specific error. Or at least add some comments that that option is
required. But I guess check_err should be added anyway since it'd the
current pattern used in lib.sh

> > Anyway I'm fixing all kind of bugs and I even found an even hw
> > limitation with the FDB table with mgmt packet...
> > 
> > Also I implemented fdb_insolation following the implementation done on
> > felix with reserving VID at the end...
> > About this I wonder if it might be a good idea to expose some generic
> > API from DSA?
> > 
> > qca8k require a reserved VID for VLAN unaware port and with
> > fdb_isolation even more VID are reserved. DSA have no idea about this so
> > I wonder if there is a chance of VID clash? I feel like we need
> > something to declare a range of reserved VID so that they gets rejected
> > when applied. (about this I think I should return -EINVAL if fdb/mbd
> > insert are asked with a reserved VID)
> 
> I don't think that it's possible to get a port_fdb_add() or port_mdb_add()
> call for a VID that wasn't accepted through port_vlan_add() first.
> At least I don't see how. The VLAN-aware FDB entries come from the
> bridge through SWITCHDEV_OBJ_ID_PORT_VLAN and from DSA's private
> .ndo_vlan_rx_add_vid() / .ndo_vlan_rx_kill_vid() implementations for
> VLAN filters (those that land in dp->user_vlans; refresh your net.git
> tree if you don't find those).
> 

Ok so in theory it should be reserved and a check should be added in
port_valn_add to return an error if a reserved VID is used.

> If we reject VLANs at those layers, then:
> # through the bridge
> [root@mox:~] # bridge fdb add 00:01:02:03:04:05 vlan 2000 dev lan21 master static
> [ 7611.225227] bridge: RTM_NEWNEIGH with unconfigured vlan 2000 on lan21
> RTNETLINK answers: Invalid argument
> # bridge bypass, should go through ndo_fdb_add() if DSA had one
> [root@mox:~] # bridge fdb add 00:01:02:03:04:05 vlan 2000 dev lan21 self static
> [ 7855.532615] mv88e6085 d0032004.mdio-mii:12 lan21: default FDB implementation only supports local addresses
> RTNETLINK answers: Invalid argument
> 
> So, while we could probably add some API in core DSA for a reserved VID
> range, I'm not sure that it will be that useful.
> 
> And regarding whether there is a chance of VID class? I guess there is.
> I have a board where the felix driver (reserved range 4000-4095) is a
> DSA master for the sja1105 driver (reserved range 3072-4095 for tag_8021q).
> Since any dsa_tag_8021q_port_setup() call does a vlan_vid_add() call
> towards the master, then there's a chance that felix could reject the
> tag_8021q setup of the sja1105 ports for tag_8021q VIDs 4000 and upwards.
> 
> VID 4000 = 0xfa0 = bitwise 0b1111_1010_0000
> 
> Comparing with net/dsa/tag_8021q.c:
> 
>  * | 11  | 10  |  9  |  8  |  7  |  6  |  5  |  4  |  3  |  2  |  1  |  0  |
>  * +-----------+-----+-----------------+-----------+-----------------------+
>  * |    RSV    | VBID|    SWITCH_ID    |   VBID    |          PORT         |
>  * +-----------+-----+-----------------+-----------+-----------------------+
> 
> So it would be problematic for VBID >= 6, whenever SWITCH_ID >= 6.
> However, practical applications of tag_8021q with the sja1105 boards
> that I'm aware of do not have 7 switches, so it's a theoretical problem
> only.
> 
> Though we would need to be rather careful when calculating and enforcing
> the reserved ranges in the DSA core, to not cause false positive errors.
> 

Was just an idea since I see only felix using fdb_isolation and it seems
reserving VID is the current way to handle it.

> > 
> > Anyway by fixing that interval problem (enabling support in busybox as
> > it's disabled by default in a OpenWRT system)
> > This is the new output of the local_termination.sh 
> > 
> > TAP version 13
> > 1..1
> > # selftests: drivers/net/dsa: local_termination.sh
> > # TEST: lan1: Unicast IPv4 to primary MAC address                     [ OK ]
> > # TEST: lan1: Unicast IPv4 to macvlan MAC address                     [ OK ]
> > # TEST: lan1: Unicast IPv4 to unknown MAC address                     [FAIL]
> > #       reception succeeded, but should have failed
> > # TEST: lan1: Unicast IPv4 to unknown MAC address, promisc            [ OK ]
> > # TEST: lan1: Unicast IPv4 to unknown MAC address, allmulti           [FAIL]
> > #       reception succeeded, but should have failed
> > # TEST: lan1: Multicast IPv4 to joined group                          [ OK ]
> > # TEST: lan1: Multicast IPv4 to unknown group                         [FAIL]
> > #       reception succeeded, but should have failed
> > # TEST: lan1: Multicast IPv4 to unknown group, promisc                [ OK ]
> > # TEST: lan1: Multicast IPv4 to unknown group, allmulti               [ OK ]
> > # TEST: lan1: Multicast IPv6 to joined group                          [ OK ]
> > # TEST: lan1: Multicast IPv6 to unknown group                         [FAIL]
> > #       reception succeeded, but should have failed
> > # TEST: lan1: Multicast IPv6 to unknown group, promisc                [ OK ]
> > # TEST: lan1: Multicast IPv6 to unknown group, allmulti               [ OK ]
> > # TEST: br0: Unicast IPv4 to primary MAC address                      [ OK ]
> > # TEST: br0: Unicast IPv4 to macvlan MAC address                      [ OK ]
> > # TEST: br0: Unicast IPv4 to unknown MAC address                      [FAIL]
> > #       reception succeeded, but should have failed
> > # TEST: br0: Unicast IPv4 to unknown MAC address, promisc             [ OK ]
> > # TEST: br0: Unicast IPv4 to unknown MAC address, allmulti            [FAIL]
> > #       reception succeeded, but should have failed
> > # TEST: br0: Multicast IPv4 to joined group                           [ OK ]
> > # TEST: br0: Multicast IPv4 to unknown group                          [FAIL]
> > #       reception succeeded, but should have failed
> > # TEST: br0: Multicast IPv4 to unknown group, promisc                 [ OK ]
> > # TEST: br0: Multicast IPv4 to unknown group, allmulti                [ OK ]
> > # TEST: br0: Multicast IPv6 to joined group                           [ OK ]
> > # TEST: br0: Multicast IPv6 to unknown group                          [FAIL]
> > #       reception succeeded, but should have failed
> > # TEST: br0: Multicast IPv6 to unknown group, promisc                 [ OK ]
> > # TEST: br0: Multicast IPv6 to unknown group, allmulti                [ OK ]
> > not ok 1 selftests: drivers/net/dsa: local_termination.sh # exit=1
> > 
> > Seems good aside from the reception secceeded that I think the kernel
> > just drops right?
> 
> Looks good. If you've implemented FDB isolation, then it means that you
> get port_fdb_add() and port_mdb_add() calls on the CPU port(s), and you
> can now also implement .port_set_host_flood() and that should also make
> the following tests pass:
> 
> * TEST: lan1: Unicast IPv4 to unknown MAC address                     [FAIL]
> * TEST: lan1: Unicast IPv4 to unknown MAC address, allmulti           [FAIL]
> * TEST: lan1: Multicast IPv4 to unknown group                         [FAIL]
> * TEST: lan1: Multicast IPv6 to unknown group                         [FAIL]
> 
> The bridge tests to an unknown address will always fail with the message
> "reception succeeded, but should have failed", so it's not you, it's the
> test there.
> 
> Once you're there, you can re-do these tests with:
> - all user ports to CPU port 0
> - all user ports to CPU port 6
> - user ports statically split between CPU ports 0 and 6
> - all user ports to LAG composed of CPU ports 0 and 6
> 

I have bad and good news about this... Bad news is that it seems to be
qca8k handle flooding in a very interesting and broken way.
Due to DSA implementation we expect every packet to be flooded to the
CPU port by default and this is problematic with how the switch works.

It seems that enabling flood bit for the CPU results in packet always be
directed there... The switch tagger have a way to comunicate that the
packet is flooded to the cpu.

With some code in the tagger I manage to reach this state from
local_termination.sh

root@OpenWrt:~# /ktests/run_kselftest.sh -t drivers/net/dsa:local_termination.sh
TAP version 13
1..1
# selftests: drivers/net/dsa: local_termination.sh
# TEST: lan1: Unicast IPv4 to primary MAC address                     [ OK ]
# TEST: lan1: Unicast IPv4 to macvlan MAC address                     [ OK ]
# TEST: lan1: Unicast IPv4 to unknown MAC address                     [ OK ]
# TEST: lan1: Unicast IPv4 to unknown MAC address, promisc            [ OK ]
# TEST: lan1: Unicast IPv4 to unknown MAC address, allmulti           [ OK ]
# TEST: lan1: Multicast IPv4 to joined group                          [ OK ]
# TEST: lan1: Multicast IPv4 to unknown group                         [ OK ]
# TEST: lan1: Multicast IPv4 to unknown group, promisc                [ OK ]
# TEST: lan1: Multicast IPv4 to unknown group, allmulti               [ OK ]
# TEST: lan1: Multicast IPv6 to joined group                          [ OK ]
# TEST: lan1: Multicast IPv6 to unknown group                         [ OK ]
# TEST: lan1: Multicast IPv6 to unknown group, promisc                [ OK ]
# TEST: lan1: Multicast IPv6 to unknown group, allmulti               [ OK ]
# TEST: br0: Unicast IPv4 to primary MAC address                      [ OK ]
# TEST: br0: Unicast IPv4 to macvlan MAC address                      [ OK ]
# TEST: br0: Unicast IPv4 to unknown MAC address                      [FAIL]
#       reception succeeded, but should have failed
# TEST: br0: Unicast IPv4 to unknown MAC address, promisc             [ OK ]
# TEST: br0: Unicast IPv4 to unknown MAC address, allmulti            [FAIL]
#       reception succeeded, but should have failed
# TEST: br0: Multicast IPv4 to joined group                           [ OK ]
# TEST: br0: Multicast IPv4 to unknown group                          [FAIL]
#       reception succeeded, but should have failed
# TEST: br0: Multicast IPv4 to unknown group, promisc                 [ OK ]
# TEST: br0: Multicast IPv4 to unknown group, allmulti                [ OK ]
# TEST: br0: Multicast IPv6 to joined group                           [ OK ]
# TEST: br0: Multicast IPv6 to unknown group                          [FAIL]
#       reception succeeded, but should have failed
# TEST: br0: Multicast IPv6 to unknown group, promisc                 [ OK ]
# TEST: br0: Multicast IPv6 to unknown group, allmulti                [ OK ]
not ok 1 selftests: drivers/net/dsa: local_termination.sh # exit=1

The tagger check if the packet type is a flood kind and drop the packet
(return NULL) if the interface is not promisc or allmulti. If the port
if bridge type this check is skipped...

My concern is... Is it ok to implement host_set_flood this way? I tried
many variant and I wasn't able to make flood work by using those bits...

With the current implementation, host_set_flood will return always zero
and everything will be handled in the tagger by dropping packet but I'm
not sure it's ok to do it.

> > The switch have a way to control FLOOD per port but maybe the correct
> > feature to use is the VLAN leaky? Where the setting can be set by both
> > FDB/MBD and per port.
> 
> I'm not sure why would the leaky VLAN feature be useful ("enable
> specific frames to be forwarded across VLAN boundary"). I really don't
> know what you're thinking about.
> 

Yes sorry, just me wasting way too much time on this and searching for
alternative solution... Switch documentation doesn't describe these case
very well sadly...

> > Sorry if I described some fix and implementation without proposing the
> > patch but I would like some comments on what the tool returned.
> >
  
Vladimir Oltean July 4, 2023, 11:34 p.m. UTC | #11
On Sat, Jul 01, 2023 at 08:25:11PM +0200, Christian Marangi wrote:
> I have bad and good news about this... Bad news is that it seems to be
> qca8k handle flooding in a very interesting and broken way.
> Due to DSA implementation we expect every packet to be flooded to the
> CPU port by default and this is problematic with how the switch works.

To be clear, the network stack understands "flooding" to be something
that takes place for packets that have an unknown MAC DA, or broadcast.
So no, we don't expect that every packet is flooded, nor that all are
flooded to the CPU port. I'm not exactly clear what you mean.

> 
> It seems that enabling flood bit for the CPU results in packet always be
> directed there...

Are you sure? How have you determined this? Can you rephrase this?
Because what I'm understanding is that you're saying that when flooding
towards the CPU is enabled, packets go to the CPU regardless of whether
they're known to the FDB or not. Which can't be correct, because if it
was, then packets sent to a MAC DA statically added as a bridge FDB entry
would be seen twice:

- once, autonomously forwarded by the switch
- twice, due to your claim that "packets are always directed to the CPU",
  (claim which must also hold with the current driver configuration,
  because the driver currently always enables flooding to the CPU)

But if that's the case, it means that the CPU (via the software bridge)
will send a second copy of the packet to the egress port, because
skb->offload_fwd_mark is unset. Someone would have noticed if that was
the case ?!

I have some skepticism towards this claim.

Please try to describe what happens under this scenario:

       br0
     /  |  \
    /   |   \
   /    |    \
 swp0  swp1  swp2
  |     |     |
  A     B     C

# On the switch
ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
ip link set swp0 master br0 && ip link set br0 up
ip link set swp1 master br0 && ip link set br0 up
ip link set swp2 master br0 && ip link set br0 up
bridge fdb add dev swp1 ${mac_of_B} vlan 1 master static

# On C
tcpdump -i eth0 -e -n

# On A
ping $B

How many ICMP requests do you see on B? How many do you see on C?
How many on the CPU port (br0)?

Correct answers should be:
On B: as many as were sent by A
On C: none
On br0: none on ingress (tcpdump -Q in), none on egress (tcpdump -Q out)

If you're seeing the ICMP requests on station C, see whether they don't
in fact come from br0 having flooded them in software, which means that
the real problem is with the FDB not matching on packets..

I have no idea how you've implemented things, but with reserved VIDs,
you also need to remap VID 0 during port_fdb_add() calls to the reserved
PVID of that bridge, and you've done that, otherwise packets won't match,
right?

> The switch tagger have a way to comunicate that the packet is flooded
> to the cpu.
> 
> With some code in the tagger I manage to reach this state from
> local_termination.sh
> 
> root@OpenWrt:~# /ktests/run_kselftest.sh -t drivers/net/dsa:local_termination.sh
> TAP version 13
> 1..1
> # selftests: drivers/net/dsa: local_termination.sh
> # TEST: lan1: Unicast IPv4 to primary MAC address                     [ OK ]
> # TEST: lan1: Unicast IPv4 to macvlan MAC address                     [ OK ]
> # TEST: lan1: Unicast IPv4 to unknown MAC address                     [ OK ]
> # TEST: lan1: Unicast IPv4 to unknown MAC address, promisc            [ OK ]
> # TEST: lan1: Unicast IPv4 to unknown MAC address, allmulti           [ OK ]
> # TEST: lan1: Multicast IPv4 to joined group                          [ OK ]
> # TEST: lan1: Multicast IPv4 to unknown group                         [ OK ]
> # TEST: lan1: Multicast IPv4 to unknown group, promisc                [ OK ]
> # TEST: lan1: Multicast IPv4 to unknown group, allmulti               [ OK ]
> # TEST: lan1: Multicast IPv6 to joined group                          [ OK ]
> # TEST: lan1: Multicast IPv6 to unknown group                         [ OK ]
> # TEST: lan1: Multicast IPv6 to unknown group, promisc                [ OK ]
> # TEST: lan1: Multicast IPv6 to unknown group, allmulti               [ OK ]
> # TEST: br0: Unicast IPv4 to primary MAC address                      [ OK ]
> # TEST: br0: Unicast IPv4 to macvlan MAC address                      [ OK ]
> # TEST: br0: Unicast IPv4 to unknown MAC address                      [FAIL]
> #       reception succeeded, but should have failed
> # TEST: br0: Unicast IPv4 to unknown MAC address, promisc             [ OK ]
> # TEST: br0: Unicast IPv4 to unknown MAC address, allmulti            [FAIL]
> #       reception succeeded, but should have failed
> # TEST: br0: Multicast IPv4 to joined group                           [ OK ]
> # TEST: br0: Multicast IPv4 to unknown group                          [FAIL]
> #       reception succeeded, but should have failed
> # TEST: br0: Multicast IPv4 to unknown group, promisc                 [ OK ]
> # TEST: br0: Multicast IPv4 to unknown group, allmulti                [ OK ]
> # TEST: br0: Multicast IPv6 to joined group                           [ OK ]
> # TEST: br0: Multicast IPv6 to unknown group                          [FAIL]
> #       reception succeeded, but should have failed
> # TEST: br0: Multicast IPv6 to unknown group, promisc                 [ OK ]
> # TEST: br0: Multicast IPv6 to unknown group, allmulti                [ OK ]
> not ok 1 selftests: drivers/net/dsa: local_termination.sh # exit=1
> 
> The tagger check if the packet type is a flood kind and drop the packet
> (return NULL) if the interface is not promisc or allmulti. If the port
> if bridge type this check is skipped...

Why is the check skipped if the port is under a bridge? Because there
might be foreign interfaces under that bridge (like wlan0) which might
be valid destinations for flooded traffic (through software forwarding)?

> My concern is... Is it ok to implement host_set_flood this way? I tried
> many variant and I wasn't able to make flood work by using those bits...
> 
> With the current implementation, host_set_flood will return always zero
> and everything will be handled in the tagger by dropping packet but I'm
> not sure it's ok to do it.

Well, we should first clarify exactly what it is that you claim to be a
hardware bug.
  

Patch

diff --git a/drivers/net/dsa/qca/qca8k-8xxx.c b/drivers/net/dsa/qca/qca8k-8xxx.c
index f08086ac2261..a9af270a03ce 100644
--- a/drivers/net/dsa/qca/qca8k-8xxx.c
+++ b/drivers/net/dsa/qca/qca8k-8xxx.c
@@ -1963,12 +1963,6 @@  qca8k_setup(struct dsa_switch *ds)
 			if (ret)
 				return ret;
 
-			/* Enable ARP Auto-learning by default */
-			ret = regmap_set_bits(priv->regmap, QCA8K_PORT_LOOKUP_CTRL(i),
-					      QCA8K_PORT_LOOKUP_LEARN);
-			if (ret)
-				return ret;
-
 			/* For port based vlans to work we need to set the
 			 * default egress vid
 			 */
@@ -2071,6 +2065,8 @@  static const struct dsa_switch_ops qca8k_switch_ops = {
 	.port_change_mtu	= qca8k_port_change_mtu,
 	.port_max_mtu		= qca8k_port_max_mtu,
 	.port_stp_state_set	= qca8k_port_stp_state_set,
+	.port_pre_bridge_flags	= qca8k_port_pre_bridge_flags,
+	.port_bridge_flags	= qca8k_port_bridge_flags,
 	.port_bridge_join	= qca8k_port_bridge_join,
 	.port_bridge_leave	= qca8k_port_bridge_leave,
 	.port_fast_age		= qca8k_port_fast_age,
diff --git a/drivers/net/dsa/qca/qca8k-common.c b/drivers/net/dsa/qca/qca8k-common.c
index 8c2dc0e48ff4..f93defbd8b66 100644
--- a/drivers/net/dsa/qca/qca8k-common.c
+++ b/drivers/net/dsa/qca/qca8k-common.c
@@ -559,8 +559,24 @@  int qca8k_get_mac_eee(struct dsa_switch *ds, int port,
 	return 0;
 }
 
+static int qca8k_port_configure_learning(struct dsa_switch *ds, int port,
+					 bool learning)
+{
+	struct qca8k_priv *priv = ds->priv;
+
+	if (learning)
+		return regmap_set_bits(priv->regmap,
+				       QCA8K_PORT_LOOKUP_CTRL(port),
+				       QCA8K_PORT_LOOKUP_LEARN);
+	else
+		return regmap_clear_bits(priv->regmap,
+					 QCA8K_PORT_LOOKUP_CTRL(port),
+					 QCA8K_PORT_LOOKUP_LEARN);
+}
+
 void qca8k_port_stp_state_set(struct dsa_switch *ds, int port, u8 state)
 {
+	struct dsa_port *dp = dsa_to_port(ds, port);
 	struct qca8k_priv *priv = ds->priv;
 	u32 stp_state;
 
@@ -585,6 +601,30 @@  void qca8k_port_stp_state_set(struct dsa_switch *ds, int port, u8 state)
 
 	qca8k_rmw(priv, QCA8K_PORT_LOOKUP_CTRL(port),
 		  QCA8K_PORT_LOOKUP_STATE_MASK, stp_state);
+
+	qca8k_port_configure_learning(ds, port, dp->learning);
+}
+
+int qca8k_port_pre_bridge_flags(struct dsa_switch *ds, int port,
+				struct switchdev_brport_flags flags,
+				struct netlink_ext_ack *extack)
+{
+	if (flags.mask & ~BR_LEARNING)
+		return -EINVAL;
+
+	return 0;
+}
+
+int qca8k_port_bridge_flags(struct dsa_switch *ds, int port,
+			    struct switchdev_brport_flags flags,
+			    struct netlink_ext_ack *extack)
+{
+	int ret;
+
+	ret = qca8k_port_configure_learning(ds, port,
+					    flags.mask & ~BR_LEARNING);
+
+	return ret;
 }
 
 int qca8k_port_bridge_join(struct dsa_switch *ds, int port,
diff --git a/drivers/net/dsa/qca/qca8k.h b/drivers/net/dsa/qca/qca8k.h
index c5cc8a172d65..8f88b7db384d 100644
--- a/drivers/net/dsa/qca/qca8k.h
+++ b/drivers/net/dsa/qca/qca8k.h
@@ -522,6 +522,12 @@  int qca8k_get_mac_eee(struct dsa_switch *ds, int port, struct ethtool_eee *e);
 
 /* Common bridge function */
 void qca8k_port_stp_state_set(struct dsa_switch *ds, int port, u8 state);
+int qca8k_port_pre_bridge_flags(struct dsa_switch *ds, int port,
+				struct switchdev_brport_flags flags,
+				struct netlink_ext_ack *extack);
+int qca8k_port_bridge_flags(struct dsa_switch *ds, int port,
+			    struct switchdev_brport_flags flags,
+			    struct netlink_ext_ack *extack);
 int qca8k_port_bridge_join(struct dsa_switch *ds, int port,
 			   struct dsa_bridge bridge,
 			   bool *tx_fwd_offload,