Revert "arm64: dts: ls1028a: sl28: use ocelot-8021q tagging by default"

Message ID 20221027113248.420216-1-michael@walle.cc
State New
Headers
Series Revert "arm64: dts: ls1028a: sl28: use ocelot-8021q tagging by default" |

Commit Message

Michael Walle Oct. 27, 2022, 11:32 a.m. UTC
  This reverts commit be0b178c50c37a666d54f435da71cf9f008362a0.

This commit will break networking on the sl28 boards if the tagger is
not compiled into the kernel. If a non-default tagger is used, the
kernel doesn't do a request_module(). Fixing that is also not that
trivial because the tagger modules are loaded by ids, not by name.
Thus for now, just revert to the default tagger until that is fixed.

Fixes: be0b178c50c3 ("arm64: dts: ls1028a: sl28: use ocelot-8021q tagging by default")
Reported-by: Heiko Thiery <heiko.thiery@gmail.com>
Signed-off-by: Michael Walle <michael@walle.cc>
---
Vladimir, I'm not sure how to fix that one. Adding aliases to the tagger
modules? Something like "MODULE_ALIAS("dsa_tag-ocelot-8021q");" and then do
a request_module() in dsa_find_tagger_by_name(), too?

 .../arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28.dts | 8 --------
 1 file changed, 8 deletions(-)
  

Comments

Vladimir Oltean Oct. 27, 2022, 12:05 p.m. UTC | #1
On Thu, Oct 27, 2022 at 01:32:48PM +0200, Michael Walle wrote:
> This reverts commit be0b178c50c37a666d54f435da71cf9f008362a0.
> 
> This commit will break networking on the sl28 boards if the tagger is
> not compiled into the kernel. If a non-default tagger is used, the
> kernel doesn't do a request_module(). Fixing that is also not that
> trivial because the tagger modules are loaded by ids, not by name.
> Thus for now, just revert to the default tagger until that is fixed.
> 
> Fixes: be0b178c50c3 ("arm64: dts: ls1028a: sl28: use ocelot-8021q tagging by default")
> Reported-by: Heiko Thiery <heiko.thiery@gmail.com>
> Signed-off-by: Michael Walle <michael@walle.cc>
> ---
> Vladimir, I'm not sure how to fix that one. Adding aliases to the tagger
> modules? Something like "MODULE_ALIAS("dsa_tag-ocelot-8021q");" and then do
> a request_module() in dsa_find_tagger_by_name(), too?
> 
>  .../arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28.dts | 8 --------
>  1 file changed, 8 deletions(-)
> 
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28.dts b/arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28.dts
> index 72429b37a8b4..771c50c7f50a 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28.dts
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28.dts
> @@ -324,14 +324,6 @@ &lpuart1 {
>  	status = "okay";
>  };
>  
> -&mscc_felix_port4 {
> -	dsa-tag-protocol = "ocelot-8021q";
> -};
> -
> -&mscc_felix_port5 {
> -	dsa-tag-protocol = "ocelot-8021q";
> -};
> -
>  &usb0 {
>  	status = "okay";
>  };
> -- 
> 2.30.2
>

Pretty nasty. Of all the switch drivers that support tagging protocol
change, Ocelot/Felix is the only one with this bug, because in all other
cases, the default and the alternative tagging protocol are part of the
same .ko. Only here we have tag_ocelot.ko and tag_ocelot_8021q.ko.

The problem preventing us from calling request_module() is that currently,
the string identifying the tagging protocol (to which we match device
tree information) is part of the tag_*.ko. I think we'd need the
translation table between string and enum dsa_tag_protocol to be part of
dsa_core.ko.
  
Vladimir Oltean Oct. 27, 2022, 12:27 p.m. UTC | #2
On Thu, Oct 27, 2022 at 03:05:19PM +0300, Vladimir Oltean wrote:
> On Thu, Oct 27, 2022 at 01:32:48PM +0200, Michael Walle wrote:
> > This reverts commit be0b178c50c37a666d54f435da71cf9f008362a0.
> > 
> > This commit will break networking on the sl28 boards if the tagger is
> > not compiled into the kernel. If a non-default tagger is used, the
> > kernel doesn't do a request_module(). Fixing that is also not that
> > trivial because the tagger modules are loaded by ids, not by name.
> > Thus for now, just revert to the default tagger until that is fixed.
> > 
> > Fixes: be0b178c50c3 ("arm64: dts: ls1028a: sl28: use ocelot-8021q tagging by default")
> > Reported-by: Heiko Thiery <heiko.thiery@gmail.com>
> > Signed-off-by: Michael Walle <michael@walle.cc>
> > ---
> > Vladimir, I'm not sure how to fix that one. Adding aliases to the tagger
> > modules? Something like "MODULE_ALIAS("dsa_tag-ocelot-8021q");" and then do
> > a request_module() in dsa_find_tagger_by_name(), too?
> > 
> >  .../arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28.dts | 8 --------
> >  1 file changed, 8 deletions(-)
> > 
> > diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28.dts b/arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28.dts
> > index 72429b37a8b4..771c50c7f50a 100644
> > --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28.dts
> > +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28.dts
> > @@ -324,14 +324,6 @@ &lpuart1 {
> >  	status = "okay";
> >  };
> >  
> > -&mscc_felix_port4 {
> > -	dsa-tag-protocol = "ocelot-8021q";
> > -};
> > -
> > -&mscc_felix_port5 {
> > -	dsa-tag-protocol = "ocelot-8021q";
> > -};
> > -
> >  &usb0 {
> >  	status = "okay";
> >  };
> > -- 
> > 2.30.2
> >
> 
> Pretty nasty. Of all the switch drivers that support tagging protocol
> change, Ocelot/Felix is the only one with this bug, because in all other
> cases, the default and the alternative tagging protocol are part of the
> same .ko. Only here we have tag_ocelot.ko and tag_ocelot_8021q.ko.
> 
> The problem preventing us from calling request_module() is that currently,
> the string identifying the tagging protocol (to which we match device
> tree information) is part of the tag_*.ko. I think we'd need the
> translation table between string and enum dsa_tag_protocol to be part of
> dsa_core.ko.

I think we should treat what we committed to in terms of dt-bindings
with utmost respect, so I would consider your proposed revert as the
absolute last option. Reverting a device tree change doesn't mean that
the device trees without the revert will disappear from circulation.

So far we have 3 options for fixing this within the kernel

- make tag_ocelot.o and tag_ocelot_8021q.o link into the same
  tag_ocelot.ko

- change the MODULE_ALIAS() of all tagging protocol driver modules from
  "dsa_tag-<number" to something containing their string name - what you
  proposed. I don't know why the current MODULE_ALIAS() is formatted the
  way it is. Maybe Andrew can comment on whether this is feasible.
  I think there isn't any backwards compatibility concern, since only
  modules compiled for a certain kernel version are expected to be
  loaded.

- put a translation table between string and MODULE_ALIAS() inside
  dsa_core.ko, which potentially duplicates code. Maybe if we
  auto-generate it somehow?
  
Vladimir Oltean Oct. 27, 2022, 12:40 p.m. UTC | #3
On Thu, Oct 27, 2022 at 03:27:27PM +0300, Vladimir Oltean wrote:
> I think we should treat what we committed to in terms of dt-bindings
> with utmost respect, so I would consider your proposed revert as the
> absolute last option. Reverting a device tree change doesn't mean that
> the device trees without the revert will disappear from circulation.
> 
> So far we have 3 options for fixing this within the kernel
> 
> - make tag_ocelot.o and tag_ocelot_8021q.o link into the same
>   tag_ocelot.ko
> 
> - change the MODULE_ALIAS() of all tagging protocol driver modules from
>   "dsa_tag-<number" to something containing their string name - what you
>   proposed. I don't know why the current MODULE_ALIAS() is formatted the
>   way it is. Maybe Andrew can comment on whether this is feasible.
>   I think there isn't any backwards compatibility concern, since only
>   modules compiled for a certain kernel version are expected to be
>   loaded.
> 
> - put a translation table between string and MODULE_ALIAS() inside
>   dsa_core.ko, which potentially duplicates code. Maybe if we
>   auto-generate it somehow?

Sorry for sending so many emails. I think the problem we should fix
first and foremost is that, if there's a user protocol specified in the
device tree but the kernel fails to load it, it should simply stick with
the default tagging protocol, instead of failing to probe. Everything
else can be dealt with as a future refinement.
  
Michael Walle Oct. 27, 2022, 1 p.m. UTC | #4
Am 2022-10-27 14:40, schrieb Vladimir Oltean:

> Sorry for sending so many emails. I think the problem we should fix
> first and foremost is that, if there's a user protocol specified in the
> device tree but the kernel fails to load it, it should simply stick 
> with
> the default tagging protocol, instead of failing to probe. Everything
> else can be dealt with as a future refinement.

Sounds good to me. Should I come up with a patch or will you do it?

-michael
  
Vladimir Oltean Oct. 27, 2022, 1:54 p.m. UTC | #5
On Thu, Oct 27, 2022 at 03:00:23PM +0200, Michael Walle wrote:
> Am 2022-10-27 14:40, schrieb Vladimir Oltean:
> 
> > Sorry for sending so many emails. I think the problem we should fix
> > first and foremost is that, if there's a user protocol specified in the
> > device tree but the kernel fails to load it, it should simply stick with
> > the default tagging protocol, instead of failing to probe. Everything
> > else can be dealt with as a future refinement.
> 
> Sounds good to me. Should I come up with a patch or will you do it?

I'll try to prepare a patch and copy you and Heiko. I hope to do that
soon, but I've been running with hardirqs disabled for the past week or
so, and as you can imagine, things are a bit crazy right now and there's
a lot of pending work to do
  
Michael Walle Oct. 27, 2022, 4 p.m. UTC | #6
Am 2022-10-27 14:27, schrieb Vladimir Oltean:

>> Pretty nasty. Of all the switch drivers that support tagging protocol
>> change, Ocelot/Felix is the only one with this bug, because in all 
>> other
>> cases, the default and the alternative tagging protocol are part of 
>> the
>> same .ko. Only here we have tag_ocelot.ko and tag_ocelot_8021q.ko.
>> 
>> The problem preventing us from calling request_module() is that 
>> currently,
>> the string identifying the tagging protocol (to which we match device
>> tree information) is part of the tag_*.ko. I think we'd need the
>> translation table between string and enum dsa_tag_protocol to be part 
>> of
>> dsa_core.ko.
> 
> I think we should treat what we committed to in terms of dt-bindings
> with utmost respect, so I would consider your proposed revert as the
> absolute last option. Reverting a device tree change doesn't mean that
> the device trees without the revert will disappear from circulation.
> 
> So far we have 3 options for fixing this within the kernel
> 
> - make tag_ocelot.o and tag_ocelot_8021q.o link into the same
>   tag_ocelot.ko
> 
> - change the MODULE_ALIAS() of all tagging protocol driver modules from
>   "dsa_tag-<number" to something containing their string name - what 
> you
>   proposed. I don't know why the current MODULE_ALIAS() is formatted 
> the
>   way it is. Maybe Andrew can comment on whether this is feasible.
>   I think there isn't any backwards compatibility concern, since only
>   modules compiled for a certain kernel version are expected to be
>   loaded.

FWIW, you can have multiple aliases if we somehow need to keep the IDs,
too.

> - put a translation table between string and MODULE_ALIAS() inside
>   dsa_core.ko, which potentially duplicates code. Maybe if we
>   auto-generate it somehow?

Yeah, I also thought of a table with of name to module alias mapping.
But then you'd have two places to keep in sync (of not autogenerated).

-michael
  
Michael Walle Oct. 27, 2022, 4:14 p.m. UTC | #7
Am 2022-10-27 13:32, schrieb Michael Walle:
> This reverts commit be0b178c50c37a666d54f435da71cf9f008362a0.
> 
> This commit will break networking on the sl28 boards if the tagger is
> not compiled into the kernel. If a non-default tagger is used, the
> kernel doesn't do a request_module(). Fixing that is also not that
> trivial because the tagger modules are loaded by ids, not by name.
> Thus for now, just revert to the default tagger until that is fixed.
> 
> Fixes: be0b178c50c3 ("arm64: dts: ls1028a: sl28: use ocelot-8021q
> tagging by default")
> Reported-by: Heiko Thiery <heiko.thiery@gmail.com>
> Signed-off-by: Michael Walle <michael@walle.cc>

Please disregard this patch. The proper fix is here:
https://lore.kernel.org/netdev/20221027145439.3086017-1-vladimir.oltean@nxp.com/

-michael
  
Vladimir Oltean Oct. 27, 2022, 6:04 p.m. UTC | #8
On Thu, Oct 27, 2022 at 06:00:04PM +0200, Michael Walle wrote:
> > - change the MODULE_ALIAS() of all tagging protocol driver modules from
> >   "dsa_tag-<number" to something containing their string name - what you
> >   proposed. I don't know why the current MODULE_ALIAS() is formatted the
> >   way it is. Maybe Andrew can comment on whether this is feasible.
> >   I think there isn't any backwards compatibility concern, since only
> >   modules compiled for a certain kernel version are expected to be
> >   loaded.
> 
> FWIW, you can have multiple aliases if we somehow need to keep the IDs,
> too.

Yeah, that's worth exploring, but it means that we have 2 code paths for
request_module() with different string formats. To me this is slightly
undesirable, we should try to consolidate the mechanisms in the core.

> > - put a translation table between string and MODULE_ALIAS() inside
> >   dsa_core.ko, which potentially duplicates code. Maybe if we
> >   auto-generate it somehow?
> 
> Yeah, I also thought of a table with of name to module alias mapping.
> But then you'd have two places to keep in sync (of not autogenerated).

Well, to be fair, this is not exactly true. As far as I could find
(grep for "ops->name" in net/dsa), there are only 3 instances of reading
the "name" field of struct dsa_device_ops, and none of them are from a
fast path.

I can imagine a table along the lines of:

static const char * const dsa_tag_proto_names[] = {
	[DSA_TAG_PROTO_NONE] = "none",
	[DSA_TAG_PROTO_BRCM] = "brcm",
	....
};

which is then used to directly replace ops->name
(becomes dsa_tag_proto_names[ops->proto]).

Then, we could add a new function "dsa_tag_protocol_name_to_id()" or
something along those lines, and construct the modalias string based on
that.

No duplication necessary, since we would remove dsa_device_ops :: name.
  
Michael Walle Oct. 27, 2022, 7:10 p.m. UTC | #9
Am 2022-10-27 20:04, schrieb Vladimir Oltean:
> On Thu, Oct 27, 2022 at 06:00:04PM +0200, Michael Walle wrote:
>> > - change the MODULE_ALIAS() of all tagging protocol driver modules from
>> >   "dsa_tag-<number" to something containing their string name - what you
>> >   proposed. I don't know why the current MODULE_ALIAS() is formatted the
>> >   way it is. Maybe Andrew can comment on whether this is feasible.
>> >   I think there isn't any backwards compatibility concern, since only
>> >   modules compiled for a certain kernel version are expected to be
>> >   loaded.
>> 
>> FWIW, you can have multiple aliases if we somehow need to keep the 
>> IDs,
>> too.
> 
> Yeah, that's worth exploring, but it means that we have 2 code paths 
> for
> request_module() with different string formats. To me this is slightly
> undesirable, we should try to consolidate the mechanisms in the core.
> 
>> > - put a translation table between string and MODULE_ALIAS() inside
>> >   dsa_core.ko, which potentially duplicates code. Maybe if we
>> >   auto-generate it somehow?
>> 
>> Yeah, I also thought of a table with of name to module alias mapping.
>> But then you'd have two places to keep in sync (of not autogenerated).
> 
> Well, to be fair, this is not exactly true. As far as I could find
> (grep for "ops->name" in net/dsa), there are only 3 instances of 
> reading
> the "name" field of struct dsa_device_ops, and none of them are from a
> fast path.
> 
> I can imagine a table along the lines of:
> 
> static const char * const dsa_tag_proto_names[] = {
> 	[DSA_TAG_PROTO_NONE] = "none",
> 	[DSA_TAG_PROTO_BRCM] = "brcm",
> 	....
> };
> 
> which is then used to directly replace ops->name
> (becomes dsa_tag_proto_names[ops->proto]).
> 
> Then, we could add a new function "dsa_tag_protocol_name_to_id()" or
> something along those lines, and construct the modalias string based on
> that.
> 
> No duplication necessary, since we would remove dsa_device_ops :: name.

If one would a new tagger you'd need to add it to
dsa_tag_proto_names[] as well as adding the tagger source file. Thus,
two places to keep in sync. And you don't have all the information in
one place, e.g. the tagger module. The name of the tagger as used in
sysfs or device tree is then in the core. Just wanted to point that out.
After all it's up to you as the maintainer to decide ;)

-michael
  
Vladimir Oltean Oct. 27, 2022, 8:36 p.m. UTC | #10
On Thu, Oct 27, 2022 at 09:10:20PM +0200, Michael Walle wrote:
> If one would a new tagger you'd need to add it to
> dsa_tag_proto_names[] as well as adding the tagger source file. Thus,
> two places to keep in sync. And you don't have all the information in
> one place, e.g. the tagger module. The name of the tagger as used in
> sysfs or device tree is then in the core. Just wanted to point that out.
> After all it's up to you as the maintainer to decide ;)

True, there are already 2 places you need to keep in sync, you already
need to add the tagger id to include/net/dsa.h (twice). The header is
shared between taggers, the DSA core and device drivers. A third place
would indeed be a bit too much.

Actually I came to like your idea with 2 modaliases. I prepared a patch
set and I'm in the process of testing it (need to rebuild everything
with modules, which I usually skip). If it works, I'll post it as an RFC
soon.
  

Patch

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28.dts b/arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28.dts
index 72429b37a8b4..771c50c7f50a 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28.dts
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28.dts
@@ -324,14 +324,6 @@  &lpuart1 {
 	status = "okay";
 };
 
-&mscc_felix_port4 {
-	dsa-tag-protocol = "ocelot-8021q";
-};
-
-&mscc_felix_port5 {
-	dsa-tag-protocol = "ocelot-8021q";
-};
-
 &usb0 {
 	status = "okay";
 };