[net-next,v8,3/5] net: dsa: add out-of-band tagging protocol

Message ID 20221104174151.439008-4-maxime.chevallier@bootlin.com
State New
Headers
Series net: ipqess: introduce Qualcomm IPQESS driver |

Commit Message

Maxime Chevallier Nov. 4, 2022, 5:41 p.m. UTC
  This tagging protocol is designed for the situation where the link
between the MAC and the Switch is designed such that the Destination
Port, which is usually embedded in some part of the Ethernet Header, is
sent out-of-band, and isn't present at all in the Ethernet frame.

This can happen when the MAC and Switch are tightly integrated on an
SoC, as is the case with the Qualcomm IPQ4019 for example, where the DSA
tag is inserted directly into the DMA descriptors. In that case,
the MAC driver is responsible for sending the tag to the switch using
the out-of-band medium. To do so, the MAC driver needs to have the
information of the destination port for that skb.

Add a new tagging protocol based on SKB extensions to convey the
information about the destination port to the MAC driver

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
---

V7->V8:
 - Added a missing blank line after declaration
V6->V7:
 - Fixed a sparse warning by making the dsa ops static
V5->V6:
 - Added some documentation
 - Removed the pop/push helpers
 - Removed unused fields
V4->V5
 - Use SKB extensions to convey the tag
V3->V4 
 - No changes
V3->V2:
 - No changes, as the discussion is ongoing
V1->V2:
 - Reworked the tagging method, putting the tag at skb->head instead
   of putting it into skb->shinfo, as per Andrew, Florian and Vlad's
   reviews


 Documentation/networking/dsa/dsa.rst | 13 +++++++-
 MAINTAINERS                          |  1 +
 include/linux/dsa/oob.h              | 16 +++++++++
 include/linux/skbuff.h               |  3 ++
 include/net/dsa.h                    |  2 ++
 net/core/skbuff.c                    | 10 ++++++
 net/dsa/Kconfig                      |  9 +++++
 net/dsa/Makefile                     |  1 +
 net/dsa/tag_oob.c                    | 49 ++++++++++++++++++++++++++++
 9 files changed, 103 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/dsa/oob.h
 create mode 100644 net/dsa/tag_oob.c
  

Comments

Jakub Kicinski Nov. 5, 2022, 3:05 a.m. UTC | #1
On Fri,  4 Nov 2022 18:41:49 +0100 Maxime Chevallier wrote:
> This tagging protocol is designed for the situation where the link
> between the MAC and the Switch is designed such that the Destination
> Port, which is usually embedded in some part of the Ethernet Header, is
> sent out-of-band, and isn't present at all in the Ethernet frame.
> 
> This can happen when the MAC and Switch are tightly integrated on an
> SoC, as is the case with the Qualcomm IPQ4019 for example, where the DSA
> tag is inserted directly into the DMA descriptors. In that case,
> the MAC driver is responsible for sending the tag to the switch using
> the out-of-band medium. To do so, the MAC driver needs to have the
> information of the destination port for that skb.
> 
> Add a new tagging protocol based on SKB extensions to convey the
> information about the destination port to the MAC driver

This is what METADATA_HW_PORT_MUX is for, you shouldn't have 
to allocate a piece of memory for every single packet.

Also the series doesn't build.
  
Maxime Chevallier Nov. 7, 2022, 8:39 a.m. UTC | #2
Hello Jakub,

On Fri, 4 Nov 2022 20:05:30 -0700
Jakub Kicinski <kuba@kernel.org> wrote:

> On Fri,  4 Nov 2022 18:41:49 +0100 Maxime Chevallier wrote:
> > This tagging protocol is designed for the situation where the link
> > between the MAC and the Switch is designed such that the Destination
> > Port, which is usually embedded in some part of the Ethernet
> > Header, is sent out-of-band, and isn't present at all in the
> > Ethernet frame.
> > 
> > This can happen when the MAC and Switch are tightly integrated on an
> > SoC, as is the case with the Qualcomm IPQ4019 for example, where
> > the DSA tag is inserted directly into the DMA descriptors. In that
> > case, the MAC driver is responsible for sending the tag to the
> > switch using the out-of-band medium. To do so, the MAC driver needs
> > to have the information of the destination port for that skb.
> > 
> > Add a new tagging protocol based on SKB extensions to convey the
> > information about the destination port to the MAC driver  
> 
> This is what METADATA_HW_PORT_MUX is for, you shouldn't have 
> to allocate a piece of memory for every single packet.

Does this work with DSA ? The information conveyed in the extension is
the DSA port identifier. I'm not familiar at all with
METADATA_HW_PORT_MUX, should we extend that mechanism to convey the
DSA port id ?

I also agree that allocating data isn't the best way to go, but from
the history of this series, we've tried 3 approaches so far :

 - Adding a new field to struct sk_buff, which isn't a good idea
 - Using the skb headroom, but then we can't know for sure is the skb
   contains a DSA tag or not
 - Using skb extensions, that comes with the cost of this memory
   allocation. Is this approach also incorrect then ?

> Also the series doesn't build.

Can you elaborate more ? I can't reproduce the build failure on my
side, and I didn't get any reports from the kbuild bot, are you using a
specific config file ?

Thanks,

Maxime
  
Vladimir Oltean Nov. 7, 2022, 11:27 a.m. UTC | #3
Hi Jakub,

On Fri, Nov 04, 2022 at 08:05:30PM -0700, Jakub Kicinski wrote:
> On Fri,  4 Nov 2022 18:41:49 +0100 Maxime Chevallier wrote:
> > Add a new tagging protocol based on SKB extensions to convey the
> > information about the destination port to the MAC driver
>
> This is what METADATA_HW_PORT_MUX is for, you shouldn't have
> to allocate a piece of memory for every single packet.

Since this is the model that skb extensions propose and not something
that Maxime invented for this series, I presume that's not such a big
deal? What's more, couldn't this specific limitation of skb extensions
be addressed in a punctual way, via one-time calls to __skb_ext_alloc()
and fast path calls to __skb_ext_set()?

I'm unfamiliar to the concept of destination cache entries and even more
so to the concept of struct dst_entry * carrying metadata. I suppose the
latter were introduced for lack of space in struct sk_buff, to carry
metadata between layers that aren't L3/L4 (where normal dst_entry structs
are used)? What makes metadata dst's preferable to skb extensions?
The latter are more general; AFAIK they can be used between any layer
and any other layer, like for example between RX and TX in the
forwarding path. Side note, I am not exactly clear what are the lifetime
guarantees of a metadata dst entry, and if DSA's use would be 100% safe
(DSA is kind of L3, since it has an ETH_P_XDSA packet_type handler, not
an rx_handler).

More importantly, what happens if a DSA switch is used together with a
SRIOV-capable DSA master which already uses METADATA_HW_PORT_MUX for
PF-VF communication? (if I understood the commit message of 3fcece12bc1b
("net: store port/representator id in metadata_dst") correctly)
  
Vladimir Oltean Nov. 7, 2022, 12:51 p.m. UTC | #4
On Mon, Nov 07, 2022 at 01:27:36PM +0200, Vladimir Oltean wrote:
> Hi Jakub,
> (...)

There is also another problem having to do with future extensibility of
METADATA_HW_PORT_MUX for DSA. I don't know how much of this is going to
be applicable for qca8k, but DSA tags might also carry such information
as trap reason (RX) or injection type (into forwarding plane or control
packet; the latter bypasses port STP state) and the FID to which the
packet should be classified by the hardware (TX). If we're going to
design a mechanism which only preallocates metadata dst's for ports,
it's going to be difficult to make that work for more information later on.
  
Jakub Kicinski Nov. 7, 2022, 4:25 p.m. UTC | #5
On Mon, 7 Nov 2022 09:39:50 +0100 Maxime Chevallier wrote:
> > Also the series doesn't build.  
> 
> Can you elaborate more ? I can't reproduce the build failure on my
> side, and I didn't get any reports from the kbuild bot, are you using a
> specific config file ?

../net/core/skbuff.c:4495:49: error: invalid application of ‘sizeof’ to incomplete type ‘struct dsa_oob_tag_info’
 4495 |         [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info),
      |                                                 ^~~~~~
../include/uapi/linux/const.h:32:44: note: in definition of macro ‘__ALIGN_KERNEL_MASK’
   32 | #define __ALIGN_KERNEL_MASK(x, mask)    (((x) + (mask)) & ~(mask))
      |                                            ^
../include/linux/align.h:8:33: note: in expansion of macro ‘__ALIGN_KERNEL’
    8 | #define ALIGN(x, a)             __ALIGN_KERNEL((x), (a))
      |                                 ^~~~~~~~~~~~~~
../net/core/skbuff.c:4476:34: note: in expansion of macro ‘ALIGN’
 4476 | #define SKB_EXT_CHUNKSIZEOF(x)  (ALIGN((sizeof(x)), SKB_EXT_ALIGN_VALUE) / SKB_EXT_ALIGN_VALUE)
      |                                  ^~~~~
../net/core/skbuff.c:4495:29: note: in expansion of macro ‘SKB_EXT_CHUNKSIZEOF’
 4495 |         [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info),
      |                             ^~~~~~~~~~~~~~~~~~~
../net/core/skbuff.c:4495:49: error: invalid application of ‘sizeof’ to incomplete type ‘struct dsa_oob_tag_info’
 4495 |         [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info),
      |                                                 ^~~~~~
../include/uapi/linux/const.h:32:50: note: in definition of macro ‘__ALIGN_KERNEL_MASK’
   32 | #define __ALIGN_KERNEL_MASK(x, mask)    (((x) + (mask)) & ~(mask))
      |                                                  ^~~~
../include/linux/align.h:8:33: note: in expansion of macro ‘__ALIGN_KERNEL’
    8 | #define ALIGN(x, a)             __ALIGN_KERNEL((x), (a))
      |                                 ^~~~~~~~~~~~~~
../net/core/skbuff.c:4476:34: note: in expansion of macro ‘ALIGN’
 4476 | #define SKB_EXT_CHUNKSIZEOF(x)  (ALIGN((sizeof(x)), SKB_EXT_ALIGN_VALUE) / SKB_EXT_ALIGN_VALUE)
      |                                  ^~~~~
../net/core/skbuff.c:4495:29: note: in expansion of macro ‘SKB_EXT_CHUNKSIZEOF’
 4495 |         [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info),
      |                             ^~~~~~~~~~~~~~~~~~~
../net/core/skbuff.c:4495:49: error: invalid application of ‘sizeof’ to incomplete type ‘struct dsa_oob_tag_info’
 4495 |         [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info),
      |                                                 ^~~~~~
../include/uapi/linux/const.h:32:61: note: in definition of macro ‘__ALIGN_KERNEL_MASK’
   32 | #define __ALIGN_KERNEL_MASK(x, mask)    (((x) + (mask)) & ~(mask))
      |                                                             ^~~~
../include/linux/align.h:8:33: note: in expansion of macro ‘__ALIGN_KERNEL’
    8 | #define ALIGN(x, a)             __ALIGN_KERNEL((x), (a))
      |                                 ^~~~~~~~~~~~~~
../net/core/skbuff.c:4476:34: note: in expansion of macro ‘ALIGN’
 4476 | #define SKB_EXT_CHUNKSIZEOF(x)  (ALIGN((sizeof(x)), SKB_EXT_ALIGN_VALUE) / SKB_EXT_ALIGN_VALUE)
      |                                  ^~~~~
../net/core/skbuff.c:4495:29: note: in expansion of macro ‘SKB_EXT_CHUNKSIZEOF’
 4495 |         [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info),
      |                             ^~~~~~~~~~~~~~~~~~~


Also this:

drivers/net/ethernet/qualcomm/ipqess/ipqess.c:1172:22: warning: cast to smaller integer type 'u32' (aka 'unsigned int') from 'void *' [-Wvoid-pointer-to-int-cast]
        netdev->base_addr = (u32)ess->hw_addr;
                            ^~~~~~~~~~~~~~~~~
  
Vladimir Oltean Nov. 7, 2022, 5:04 p.m. UTC | #6
On Mon, Nov 07, 2022 at 08:49:34AM -0800, Jakub Kicinski wrote:
> On Mon, 7 Nov 2022 12:51:26 +0000 Vladimir Oltean wrote:
> > There is also another problem having to do with future extensibility of
> > METADATA_HW_PORT_MUX for DSA. I don't know how much of this is going to
> > be applicable for qca8k, but DSA tags might also carry such information
> > as trap reason (RX) or injection type (into forwarding plane or control
> > packet; the latter bypasses port STP state) and the FID to which the
> > packet should be classified by the hardware (TX). If we're going to
> > design a mechanism which only preallocates metadata dst's for ports,
> > it's going to be difficult to make that work for more information later on.
>
> The entire patch we're commenting on is 100 LoC. Seems like a small
> thing, which can be rewritten later as needed. I don't think hand wave-y
> arguments are sufficient to go with a much heavier solution from the
> start.

I don't think it's as hand wavey as you think. Maxime did not present
the switch-side changes or device tree in this patch set. If it's going
to be based on drivers/net/dsa/qca/qca8k-common.c as I suspect, then it
might have some obscure features which are already supported by 'normal'
QCA8K DSA switches, like register read/write over Ethernet, and MIB
autocasting. If these features exist in hardware (they aren't exposed by
this patch set for sure), you'd be hard-pressed to fit them into the
METADATA_HW_PORT_MUX model, since it's pure management traffic consumed
by the switch driver and not delivered to the network stack, as opposed
to packets sent/received on behalf of any switch port.
  
Vladimir Oltean Nov. 7, 2022, 5:28 p.m. UTC | #7
On Mon, Nov 07, 2022 at 08:45:35AM -0800, Jakub Kicinski wrote:
> On Mon, 7 Nov 2022 11:27:37 +0000 Vladimir Oltean wrote:
> > Since this is the model that skb extensions propose and not something
> > that Maxime invented for this series, I presume that's not such a big
> > deal?
> 
> It's not a generic "do whatever you want" with it feature. The more
> people use it the less possible it is to have it disabled in a host-
> -centric kernel. 

We were talking about "the model" being "the model where you allocate
the extension for each packet", no?

> > What's more, couldn't this specific limitation of skb extensions
> > be addressed in a punctual way, via one-time calls to __skb_ext_alloc()
> > and fast path calls to __skb_ext_set()?
> 
> Are you suggesting we add refcounting to the skb ext?

idk, is it such a big offence? :)

Actually my previous paragraph on which you replied with an apparently
unrelated comment was saying that I think we're okay with allocating an
skb extension for each packet, if that's what the skb extension usage
model proposes.

> > I'm unfamiliar to the concept of destination cache entries and even more
> > so to the concept of struct dst_entry * carrying metadata. I suppose the
> > latter were introduced for lack of space in struct sk_buff, to carry
> > metadata between layers that aren't L3/L4 (where normal dst_entry structs
> > are used)? What makes metadata dst's preferable to skb extensions?
> 
> It's much less invasive.

Don't get me wrong, I don't oppose a dst_metadata solution as long as I
think that I understand it and that I can maintain/extend it as needed
going forward (which I clearly think I do about skb extensions, they
seem simple to use to the naive reader). No need to get territorial
about it, better to arm yourself with a bit of patience.

> 
> > The latter are more general; AFAIK they can be used between any layer
> > and any other layer, like for example between RX and TX in the
> > forwarding path.
> 
> You can't be using lower-dev / upper-dev metadata across forwarding,
> how would that ever work?

What makes metadata dst's preferable to skb extensions?
           ~~~~~~~~~~~~                 ~~~~~~~~~~~~~~
           former                       latter

I said: "The latter [aka skb extensions, not metadata dst's] are more general".
I did not say that you can use metadata dst's across forwarding, quite
the opposite.

> 
> > Side note, I am not exactly clear what are the lifetime
> > guarantees of a metadata dst entry, and if DSA's use would be 100% safe
> > (DSA is kind of L3, since it has an ETH_P_XDSA packet_type handler, not
> > an rx_handler).
> 
> It's just a refcounted object. I presume the DSA uppers can't get
> spawned before the lower is spawned already?

By lifetime guarantees, I actually meant: what is the latest point
during the RX path that skb_dst() will still point to the metadata dst,
and not get replaced with the real destination cache entry?

I think we're okay, because although DSA presents itself as an L3
protocol in the RX path, the 'real' L3 protocol handler will surely not
have run earlier than DSA, due to how eth_type_trans() was patched to
return ETH_P_XDSA.

Or I might be reading things completely wrong. Again, I have no
experience with destination cache entry structures or their metadata
carrying kind. Or with skb extensions, for that matter, other than
noticing that they exist.

> 
> > More importantly, what happens if a DSA switch is used together with a
> > SRIOV-capable DSA master which already uses METADATA_HW_PORT_MUX for
> > PF-VF communication? (if I understood the commit message of 3fcece12bc1b
> > ("net: store port/representator id in metadata_dst") correctly)
> 
> Let's be clear that the OOB metadata model only works if both upper and 
> lower are aware of the metadata. In fact they are pretty tightly bound.
> So chances of a mismatch are extremely low and theorizing about them is
> academic.

Legally I'm not allowed to say too much, but let's say I've heard about
something which makes the above not theoretical. Anyway, let's assume
it's not a concern.

> 
> In general, I'm not sure if pretending this is DSA is not an unnecessary
> complication which will end up hurting both ends of the equation.

This is a valid point. We've refused wacky "not DSA, not switchdev"
hardware before:
https://lore.kernel.org/netdev/20201125232459.378-1-lukma@denx.de/
There's also the option of doing what I did with ocelot/felix: a common
switch lib and 2 distinct front-ends, one switchdev and one DSA.

Not a lot of people seem to be willing to put that effort in, though.
The imx28 patch set was eventually abandoned. I though I'd try a
different approach this time. Idk, maybe it's a waste of time.
  
Florian Fainelli Nov. 7, 2022, 6:40 p.m. UTC | #8
On 11/7/22 10:24, Jakub Kicinski wrote:

[snip]

> Yeah, it's a balancing act. Please explore the metadata option, I think
> most people jump to the skb extension because they don't know about
> metadata. If you still want skb extension after, I'll look away.

It seems to me like we are trying too hard to have a generic out of band 
solution to provide tagger information coming from a DMA descriptor as 
opposed to just introducing a DSA tagger variant specific to the format 
being used and specific to the switch + integrated MAC. Something like 
DSA_TAG_IPQDMA or whatever the name chosen would be, may be fine.

The only value I see at this point in just in telling me that the tagger 
format is coming from a DMA descriptor, but other than that, it is just 
a middle layer that requires marshalling of data on both sides, so sure 
the idea behind DSA was to be able to mix and match any Ethernet MAC 
with any discrete switch, but integrating both into the same ASIC does 
nullify the design goal.
  
Vladimir Oltean Nov. 7, 2022, 8:07 p.m. UTC | #9
On Mon, Nov 07, 2022 at 10:24:40AM -0800, Jakub Kicinski wrote:
> IIRC the skb extensions were initially proposed as a way to handle rare
> exception packets (e.g. first packet of a connection-tracked flow in OvS
> offloads). Also MPTCP but that's also edge / slow path (sorry MPTCP).
>
> Now the usage is spreading and I have to keep fighting to keep them out
> of the datacenter production kernel I co-maintain.
>
> So, yeah, I hate them :)

To be fair, most people see CPU-terminated traffic on a DSA switch
(i.e. what you see in net/dsa/tag_*.c) as slow path which needs no
optimization. It's not that I condone this, but it's factually true.
If it wasn't the case, then out of the drivers I maintain, a control
packet wouldn't be delivered via SPI on SJA1105, and flow control
wouldn't be broken on the CPU port of switches from the Ocelot family.
And more importantly, software bridging between a switchdev and a
non-switchdev port wouldn't be such an oversight for more than 3/4 of
all switch drivers.

Also, I didn't really get *why* you hate them, just that you do.
Seems circular: slow => hate; hate => slow?

I don't think that skb->_skb_refdst is the hallmark of clean or simple
designs either, a pointer and a refcount bit squashed into a single
sk_buff field that is also in a union with 2 other things, and which is
reused in other network layers for purposes that have nothing to do with
L3 routing. Nope, sorry, this is highly optimized design at its finest,
true, but I have no interest in doing mental gymnastics in order to
maintain such a thing, just because some hardware manufacturer thought
that it would be a smart idea to split up device ownership in this way,
and neither build a 'switch with rings' nor a 'switch with tags', but
rather 'a switch with somebody else's rings'. The people who built this
monstrosity should step in and maintain the software architecture that's
a direct consequence of their design choices. Otherwise I'm going to opt
for the simplest thing to maintain that works. It's unfair to not care
about software support for your own hardware enough to study frameworks
beforehand, *and* to complain about performance.

> > > > The latter are more general; AFAIK they can be used between any layer
> > > > and any other layer, like for example between RX and TX in the
> > > > forwarding path.
> > >
> > > You can't be using lower-dev / upper-dev metadata across forwarding,
> > > how would that ever work?
> >
> > What makes metadata dst's preferable to skb extensions?
> >            ~~~~~~~~~~~~                 ~~~~~~~~~~~~~~
> >            former                       latter
> >
> > I said: "The latter [aka skb extensions, not metadata dst's] are more general".
> > I did not say that you can use metadata dst's across forwarding, quite
> > the opposite.
>
> No, no, I'm asking how you'd use either. I'm questioning the entire
> flow, not whether either mechanism can be used to fulfill it.

Well, we are probably talking about different things. I said that skb
extensions are a more general concept which *allows* you to pass metadata
from the iif to the oif. Metadata dst's don't. So what is the need for
metadata dst's, if skb extensions can do what those can do, and more.
Not that this use case is particularly relevant to DSA OOB. Just that I
think a reasonable expectation would have been to make skb extensions
more performant, than to introduce a parallel mechanism.

> TBH I mostly have experience on the Tx side, given that on the Rx side
> there is no queuing so the entire abstraction of tag implementation
> being separate is not strictly necessary. But if you find that the Rx
> doesn't work, and you really want the skb extensions - then, well,
> I acquiesce. And hope the Meta prod kernel never needs OOB DSA :)

I would never enable this feature, either. I would love not having to
see it.

> > > > More importantly, what happens if a DSA switch is used together with a
> > > > SRIOV-capable DSA master which already uses METADATA_HW_PORT_MUX for
> > > > PF-VF communication? (if I understood the commit message of 3fcece12bc1b
> > > > ("net: store port/representator id in metadata_dst") correctly)
> > >
> > > Let's be clear that the OOB metadata model only works if both upper and
> > > lower are aware of the metadata. In fact they are pretty tightly bound.
> > > So chances of a mismatch are extremely low and theorizing about them is
> > > academic.
> >
> > Legally I'm not allowed to say too much, but let's say I've heard about
> > something which makes the above not theoretical. Anyway, let's assume
> > it's not a concern.
>
> But in that case the same vendor designs both ends, right?

Yes.

> So there should be no conflict between the metadata assigned for reprs
> vs dsa ports.

Can't say more, sorry.

> > > In general, I'm not sure if pretending this is DSA is not an unnecessary
> > > complication which will end up hurting both ends of the equation.
> >
> > This is a valid point. We've refused wacky "not DSA, not switchdev"
> > hardware before:
> > https://lore.kernel.org/netdev/20201125232459.378-1-lukma@denx.de/
> > There's also the option of doing what I did with ocelot/felix: a common
> > switch lib and 2 distinct front-ends, one switchdev and one DSA.
>
> Exactly.
>
> > Not a lot of people seem to be willing to put that effort in, though.
> > The imx28 patch set was eventually abandoned. I though I'd try a
> > different approach this time. Idk, maybe it's a waste of time.
>
> Yeah, it's a balancing act. Please explore the metadata option, I think
> most people jump to the skb extension because they don't know about
> metadata. If you still want skb extension after, I'll look away.

Well, I guess I'm still not really convinced about metadata_dst, you're
still not really convinced about skb extensions, but what we have in
common is that "one switch lib, two front ends" is an alternative worth
exploring as a design that's both clean and efficient? :)
  
Felix Fietkau Nov. 8, 2022, 12:22 p.m. UTC | #10
On 07.11.22 09:39, Maxime Chevallier wrote:
>> On Fri,  4 Nov 2022 18:41:49 +0100 Maxime Chevallier wrote:
>> > This tagging protocol is designed for the situation where the link
>> > between the MAC and the Switch is designed such that the Destination
>> > Port, which is usually embedded in some part of the Ethernet
>> > Header, is sent out-of-band, and isn't present at all in the
>> > Ethernet frame.
>> > 
>> > This can happen when the MAC and Switch are tightly integrated on an
>> > SoC, as is the case with the Qualcomm IPQ4019 for example, where
>> > the DSA tag is inserted directly into the DMA descriptors. In that
>> > case, the MAC driver is responsible for sending the tag to the
>> > switch using the out-of-band medium. To do so, the MAC driver needs
>> > to have the information of the destination port for that skb.
>> > 
>> > Add a new tagging protocol based on SKB extensions to convey the
>> > information about the destination port to the MAC driver  
>> 
>> This is what METADATA_HW_PORT_MUX is for, you shouldn't have 
>> to allocate a piece of memory for every single packet.
> 
> Does this work with DSA ? The information conveyed in the extension is
> the DSA port identifier. I'm not familiar at all with
> METADATA_HW_PORT_MUX, should we extend that mechanism to convey the
> DSA port id ?
> 
> I also agree that allocating data isn't the best way to go, but from
> the history of this series, we've tried 3 approaches so far :
> 
>   - Adding a new field to struct sk_buff, which isn't a good idea
>   - Using the skb headroom, but then we can't know for sure is the skb
>     contains a DSA tag or not
>   - Using skb extensions, that comes with the cost of this memory
>     allocation. Is this approach also incorrect then ?
FYI, I'm currently working on hardware DSA untagging on the mediatek
mtk_eth_soc driver. On this hardware, I definitely need to keep the
custom DSA tag driver, as hardware untagging is not always available.
For the receive side, I came up with this patch (still untested) for
using METADATA_HW_PORT_MUX.
It has the advantage of being able to skip the tag protocol rcv ops
call for offload-enabled packets.

Maybe for the transmit side we could have some kind of netdev feature
or capability that indicates offload support and allows skipping the
tag xmit function as well.
In that case, ipqess could simply use a no-op tag driver.

What do you think?

---
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -972,11 +972,13 @@ bool __skb_flow_dissect(const struct net *net,
  		if (unlikely(skb->dev && netdev_uses_dsa(skb->dev) &&
  			     skb->protocol == htons(ETH_P_XDSA))) {
  			const struct dsa_device_ops *ops;
+			struct metadata_dst *md_dst = skb_metadata_dst(skb);
  			int offset = 0;
  
  			ops = skb->dev->dsa_ptr->tag_ops;
  			/* Only DSA header taggers break flow dissection */
-			if (ops->needed_headroom) {
+			if (ops->needed_headroom &&
+			    (!md_dst || md_dst->type != METADATA_HW_PORT_MUX)) {
  				if (ops->flow_dissect)
  					ops->flow_dissect(skb, &proto, &offset);
  				else
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -11,6 +11,7 @@
  #include <linux/netdevice.h>
  #include <linux/sysfs.h>
  #include <linux/ptp_classify.h>
+#include <net/dst_metadata.h>
  
  #include "dsa_priv.h"
  
@@ -216,6 +217,7 @@ static bool dsa_skb_defer_rx_timestamp(struct dsa_slave_priv *p,
  static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
  			  struct packet_type *pt, struct net_device *unused)
  {
+	struct metadata_dst *md_dst = skb_metadata_dst(skb);
  	struct dsa_port *cpu_dp = dev->dsa_ptr;
  	struct sk_buff *nskb = NULL;
  	struct dsa_slave_priv *p;
@@ -229,7 +231,21 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
  	if (!skb)
  		return 0;
  
-	nskb = cpu_dp->rcv(skb, dev);
+	if (md_dst && md_dst->type == METADATA_HW_PORT_MUX) {
+		unsigned int port = md_dst->u.port_info.port_id;
+
+		dsa_default_offload_fwd_mark(skb);
+		skb_dst_set(skb, NULL);
+		if (!skb_has_extensions(skb))
+			skb->slow_gro = 0;
+
+		skb->dev = dsa_master_find_slave(dev, 0, port);
+		if (skb->dev)
+			nskb = skb;
+	} else {
+		nskb = cpu_dp->rcv(skb, dev);
+	}
+
  	if (!nskb) {
  		kfree_skb(skb);
  		return 0;
  
kernel test robot Nov. 11, 2022, 11:17 p.m. UTC | #11
Hi Maxime,

I love your patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Maxime-Chevallier/net-ipqess-introduce-Qualcomm-IPQESS-driver/20221105-014353
patch link:    https://lore.kernel.org/r/20221104174151.439008-4-maxime.chevallier%40bootlin.com
patch subject: [PATCH net-next v8 3/5] net: dsa: add out-of-band tagging protocol
config: riscv-allmodconfig
compiler: riscv64-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/68ff661e3d27afe0c9cfc202c8fe111d288cbdd4
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Maxime-Chevallier/net-ipqess-introduce-Qualcomm-IPQESS-driver/20221105-014353
        git checkout 68ff661e3d27afe0c9cfc202c8fe111d288cbdd4
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash net/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from include/vdso/const.h:5,
                    from include/linux/const.h:4,
                    from include/linux/list.h:9,
                    from include/linux/module.h:12,
                    from net/core/skbuff.c:37:
>> net/core/skbuff.c:4495:49: error: invalid application of 'sizeof' to incomplete type 'struct dsa_oob_tag_info'
    4495 |         [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info),
         |                                                 ^~~~~~
   include/uapi/linux/const.h:32:44: note: in definition of macro '__ALIGN_KERNEL_MASK'
      32 | #define __ALIGN_KERNEL_MASK(x, mask)    (((x) + (mask)) & ~(mask))
         |                                            ^
   include/linux/align.h:8:33: note: in expansion of macro '__ALIGN_KERNEL'
       8 | #define ALIGN(x, a)             __ALIGN_KERNEL((x), (a))
         |                                 ^~~~~~~~~~~~~~
   net/core/skbuff.c:4476:34: note: in expansion of macro 'ALIGN'
    4476 | #define SKB_EXT_CHUNKSIZEOF(x)  (ALIGN((sizeof(x)), SKB_EXT_ALIGN_VALUE) / SKB_EXT_ALIGN_VALUE)
         |                                  ^~~~~
   net/core/skbuff.c:4495:29: note: in expansion of macro 'SKB_EXT_CHUNKSIZEOF'
    4495 |         [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info),
         |                             ^~~~~~~~~~~~~~~~~~~
>> net/core/skbuff.c:4495:49: error: invalid application of 'sizeof' to incomplete type 'struct dsa_oob_tag_info'
    4495 |         [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info),
         |                                                 ^~~~~~
   include/uapi/linux/const.h:32:50: note: in definition of macro '__ALIGN_KERNEL_MASK'
      32 | #define __ALIGN_KERNEL_MASK(x, mask)    (((x) + (mask)) & ~(mask))
         |                                                  ^~~~
   include/linux/align.h:8:33: note: in expansion of macro '__ALIGN_KERNEL'
       8 | #define ALIGN(x, a)             __ALIGN_KERNEL((x), (a))
         |                                 ^~~~~~~~~~~~~~
   net/core/skbuff.c:4476:34: note: in expansion of macro 'ALIGN'
    4476 | #define SKB_EXT_CHUNKSIZEOF(x)  (ALIGN((sizeof(x)), SKB_EXT_ALIGN_VALUE) / SKB_EXT_ALIGN_VALUE)
         |                                  ^~~~~
   net/core/skbuff.c:4495:29: note: in expansion of macro 'SKB_EXT_CHUNKSIZEOF'
    4495 |         [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info),
         |                             ^~~~~~~~~~~~~~~~~~~
>> net/core/skbuff.c:4495:49: error: invalid application of 'sizeof' to incomplete type 'struct dsa_oob_tag_info'
    4495 |         [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info),
         |                                                 ^~~~~~
   include/uapi/linux/const.h:32:61: note: in definition of macro '__ALIGN_KERNEL_MASK'
      32 | #define __ALIGN_KERNEL_MASK(x, mask)    (((x) + (mask)) & ~(mask))
         |                                                             ^~~~
   include/linux/align.h:8:33: note: in expansion of macro '__ALIGN_KERNEL'
       8 | #define ALIGN(x, a)             __ALIGN_KERNEL((x), (a))
         |                                 ^~~~~~~~~~~~~~
   net/core/skbuff.c:4476:34: note: in expansion of macro 'ALIGN'
    4476 | #define SKB_EXT_CHUNKSIZEOF(x)  (ALIGN((sizeof(x)), SKB_EXT_ALIGN_VALUE) / SKB_EXT_ALIGN_VALUE)
         |                                  ^~~~~
   net/core/skbuff.c:4495:29: note: in expansion of macro 'SKB_EXT_CHUNKSIZEOF'
    4495 |         [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info),
         |                             ^~~~~~~~~~~~~~~~~~~


vim +4495 net/core/skbuff.c

  4477	
  4478	static const u8 skb_ext_type_len[] = {
  4479	#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
  4480		[SKB_EXT_BRIDGE_NF] = SKB_EXT_CHUNKSIZEOF(struct nf_bridge_info),
  4481	#endif
  4482	#ifdef CONFIG_XFRM
  4483		[SKB_EXT_SEC_PATH] = SKB_EXT_CHUNKSIZEOF(struct sec_path),
  4484	#endif
  4485	#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
  4486		[TC_SKB_EXT] = SKB_EXT_CHUNKSIZEOF(struct tc_skb_ext),
  4487	#endif
  4488	#if IS_ENABLED(CONFIG_MPTCP)
  4489		[SKB_EXT_MPTCP] = SKB_EXT_CHUNKSIZEOF(struct mptcp_ext),
  4490	#endif
  4491	#if IS_ENABLED(CONFIG_MCTP_FLOWS)
  4492		[SKB_EXT_MCTP] = SKB_EXT_CHUNKSIZEOF(struct mctp_flow),
  4493	#endif
  4494	#if IS_ENABLED(CONFIG_NET_DSA_TAG_OOB)
> 4495		[SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info),
  4496	#endif
  4497	};
  4498
  
Maxime Chevallier Nov. 15, 2022, 9:29 a.m. UTC | #12
Hello everyone,

Felix, thanks for the feedback !

On Tue, 8 Nov 2022 13:22:17 +0100
Felix Fietkau <nbd@nbd.name> wrote:

[...]

> FYI, I'm currently working on hardware DSA untagging on the mediatek
> mtk_eth_soc driver. On this hardware, I definitely need to keep the
> custom DSA tag driver, as hardware untagging is not always available.
> For the receive side, I came up with this patch (still untested) for
> using METADATA_HW_PORT_MUX.
> It has the advantage of being able to skip the tag protocol rcv ops
> call for offload-enabled packets.
> 
> Maybe for the transmit side we could have some kind of netdev feature
> or capability that indicates offload support and allows skipping the
> tag xmit function as well.
> In that case, ipqess could simply use a no-op tag driver.

If I'm not mistaken, Florian also proposed a while ago an offload
mechanism for taggin/untagging :

https://lore.kernel.org/lkml/1438322920.20182.144.camel@edumazet-glaptop2.roam.corp.google.com/T/

It uses some of the points you're mentionning, such as the netdev
feature :)

All in all, I'm still a bit confused about the next steps. If I can
summarize a bit, we have a lot of approaches, all with advantages and
inconvenients, I'll try to summarize the state :

 - We could simply use the skb extensions as-is, rename the tagger
   something like "DSA_TAG_IPQDMA" and consider this a way to perform
   tagging on this specific class of hardware, without trying too hard
   to make it generic.

 - We could try to move forward with this mechanism of offloading
   tagging and untagging from the MAC driver, this would address
   Florian's first try at this, Felix's use-case and would fit well the
   IPQESS case

 - There's the option discussed by Vlad and Jakub to add several
   frontends, one being a switchev driver, here I'm a bit lost TBH, if
   we go this way I could definitely use a few pointers from Vlad :)

When looking at it from this point of view, option 2 looks pretty
promising, but I would like to make sure we're on the same page at that
point. On my side, I've tried several approaches for this tagging and
so far none are acceptable, for good reasons. I would like to make sure
then that I grasp the full picture and I didn't miss other possible
ways of addressing this.

Thanks everyone for your help !

Maxime
  
Vladimir Oltean Nov. 15, 2022, 11:50 a.m. UTC | #13
On Tue, Nov 15, 2022 at 10:29:24AM +0100, Maxime Chevallier wrote:
> Hello everyone,
> 
> Felix, thanks for the feedback !
> 
> On Tue, 8 Nov 2022 13:22:17 +0100
> Felix Fietkau <nbd@nbd.name> wrote:
> 
> [...]
> 
> > FYI, I'm currently working on hardware DSA untagging on the mediatek
> > mtk_eth_soc driver. On this hardware, I definitely need to keep the
> > custom DSA tag driver, as hardware untagging is not always available.
> > For the receive side, I came up with this patch (still untested) for
> > using METADATA_HW_PORT_MUX.
> > It has the advantage of being able to skip the tag protocol rcv ops
> > call for offload-enabled packets.
> > 
> > Maybe for the transmit side we could have some kind of netdev feature
> > or capability that indicates offload support and allows skipping the
> > tag xmit function as well.
> > In that case, ipqess could simply use a no-op tag driver.
> 
> If I'm not mistaken, Florian also proposed a while ago an offload
> mechanism for taggin/untagging :
> 
> https://lore.kernel.org/lkml/1438322920.20182.144.camel@edumazet-glaptop2.roam.corp.google.com/T/
> 
> It uses some of the points you're mentionning, such as the netdev
> feature :)
> 
> All in all, I'm still a bit confused about the next steps. If I can
> summarize a bit, we have a lot of approaches, all with advantages and
> inconvenients, I'll try to summarize the state :
> 
>  - We could simply use the skb extensions as-is, rename the tagger
>    something like "DSA_TAG_IPQDMA" and consider this a way to perform
>    tagging on this specific class of hardware, without trying too hard
>    to make it generic.

For Felix, using skb extensions would be inconvenient, since it would
involve per packet allocations which are now avoided with the metadata
dsts.

>  - We could try to move forward with this mechanism of offloading
>    tagging and untagging from the MAC driver, this would address
>    Florian's first try at this, Felix's use-case and would fit well the
>    IPQESS case

Someone would need to take things from where Felix left them:
https://patchwork.kernel.org/project/netdevbpf/patch/20221114124214.58199-2-nbd@nbd.name/
and add TX tag offloading support as well. Here there would need to be
a mechanism through which DSA asks "hey, this is my tagging protocol,
can the master offload it in the TX direction or am I just going to push
the tag into the packet?". I tried to sketch here something along those
lines:
https://patchwork.kernel.org/project/netdevbpf/patch/20221109163426.76164-10-nbd@nbd.name/#25084481

>  - There's the option discussed by Vlad and Jakub to add several
>    frontends, one being a switchev driver, here I'm a bit lost TBH, if
>    we go this way I could definitely use a few pointers from Vlad :)

The assumption being here that there is more functionality to cover by
the metadata dst than a port mux. I'm really not clear what is the
hardware design truly, hopefully you could give more details about that.

The mechanism is quite simple, it's not rocket science. Take something
like a bridge join operation, the proposal is to do something like this:

    dsa_slave_netdevice_event
        (net/dsa/slave.c)
               |
               v
      dsa_slave_changeupper
       (net/dsa/slave.c)
               |
               v
       dsa_port_bridge_join                         ocelot_netdevice_event
        (net/dsa/port.c)                  (drivers/net/ethernet/mscc/ocelot_net.c)
               |                                           |
               v                                           v
     dsa_switch_bridge_join                     ocelot_netdevice_changeupper
       (net/dsa/switch.c)                 (drivers/net/ethernet/mscc/ocelot_net.c)
               |                                           |
               v                                           v
       felix_bridge_join                        ocelot_netdevice_bridge_join
(drivers/net/dsa/ocelot/felix.c)          (drivers/net/ethernet/mscc/ocelot_net.c)
               |                                           |
               |                                           |
               +---------------------+---------------------+
                                     |
                                     v
                           ocelot_port_bridge_join
                      (drivers/net/ethernet/mscc/ocelot.c)

with you maintaining the entire right branch that represents the switchdev frontend,
and more or less duplicates part of DSA.

The advantage of this approach is that you can register your own NAPI
handler where you can treat packets in whichever way you like, and have
your own ndo_start_xmit. This driver would treat the aggregate of the
ess DMA engine and the ipq switch as a single device, and expose it as a
switch with DMA, basically.
  
Maxime Chevallier May 23, 2023, 12:34 p.m. UTC | #14
Hello everyone,

I'm digging this topic up, it has we'd like to move forward with the
upstreaming of this, and before trying any new approach, I'd like to
see if we can settle on one of the two choices that were expressed so
far.

To summarize the issue, this hardware platform (IPQ4019 from Qualcomm)
uses an internal switch that's a modified QCA8K, for which there
already is a DSA driver. On that platform, there's a MAC (ipqess)
connected to the switch, that passes the dst/src port id through the
DMA descriptor, whereas a typical DSA switch would pass that
information in the frame itself.

There has been a few approaches to try and reuse DSA as-is with a
custom tagger, but all of them eventually got rejected, for a good
reason.

Two solutions are proposed, as discussed in that thread (hence the
top-posting, sorry about that).

There are two approaches remaining, either implementing DSA tagging
offload support in RX/TX, or having a DSA frontend for the switch (the
current QCA8K driver) and a switchdev frontend, using the qca8k logic
with the ESS driver handling transfers for the CPU port.

As both approaches make sense but are quite opposed, I'd like to make
sure we go in the right direction. The switchdev approach definitely
makes a lot of sense, but the DSA tagging offloading has been in
discussion for quite a while, starting with Florian's series, followed
by Felix's, and this could also be a good occasion to move forward with
this, and it would also involve a minimal rework of the current ipqess
driver.

Any pointer would help,

Thanks everyone,

Maxime

On Tue, 15 Nov 2022 11:50:23 +0000
Vladimir Oltean <vladimir.oltean@nxp.com> wrote:

> On Tue, Nov 15, 2022 at 10:29:24AM +0100, Maxime Chevallier wrote:
> > Hello everyone,
> > 
> > Felix, thanks for the feedback !
> > 
> > On Tue, 8 Nov 2022 13:22:17 +0100
> > Felix Fietkau <nbd@nbd.name> wrote:
> > 
> > [...]
> >   
> > > FYI, I'm currently working on hardware DSA untagging on the
> > > mediatek mtk_eth_soc driver. On this hardware, I definitely need
> > > to keep the custom DSA tag driver, as hardware untagging is not
> > > always available. For the receive side, I came up with this patch
> > > (still untested) for using METADATA_HW_PORT_MUX.
> > > It has the advantage of being able to skip the tag protocol rcv
> > > ops call for offload-enabled packets.
> > > 
> > > Maybe for the transmit side we could have some kind of netdev
> > > feature or capability that indicates offload support and allows
> > > skipping the tag xmit function as well.
> > > In that case, ipqess could simply use a no-op tag driver.  
> > 
> > If I'm not mistaken, Florian also proposed a while ago an offload
> > mechanism for taggin/untagging :
> > 
> > https://lore.kernel.org/lkml/1438322920.20182.144.camel@edumazet-glaptop2.roam.corp.google.com/T/
> > 
> > It uses some of the points you're mentionning, such as the netdev
> > feature :)
> > 
> > All in all, I'm still a bit confused about the next steps. If I can
> > summarize a bit, we have a lot of approaches, all with advantages
> > and inconvenients, I'll try to summarize the state :
> > 
> >  - We could simply use the skb extensions as-is, rename the tagger
> >    something like "DSA_TAG_IPQDMA" and consider this a way to
> > perform tagging on this specific class of hardware, without trying
> > too hard to make it generic.  
> 
> For Felix, using skb extensions would be inconvenient, since it would
> involve per packet allocations which are now avoided with the metadata
> dsts.
> 
> >  - We could try to move forward with this mechanism of offloading
> >    tagging and untagging from the MAC driver, this would address
> >    Florian's first try at this, Felix's use-case and would fit well
> > the IPQESS case  
> 
> Someone would need to take things from where Felix left them:
> https://patchwork.kernel.org/project/netdevbpf/patch/20221114124214.58199-2-nbd@nbd.name/
> and add TX tag offloading support as well. Here there would need to be
> a mechanism through which DSA asks "hey, this is my tagging protocol,
> can the master offload it in the TX direction or am I just going to
> push the tag into the packet?". I tried to sketch here something
> along those lines:
> https://patchwork.kernel.org/project/netdevbpf/patch/20221109163426.76164-10-nbd@nbd.name/#25084481
> 
> >  - There's the option discussed by Vlad and Jakub to add several
> >    frontends, one being a switchev driver, here I'm a bit lost TBH,
> > if we go this way I could definitely use a few pointers from Vlad
> > :)  
> 
> The assumption being here that there is more functionality to cover by
> the metadata dst than a port mux. I'm really not clear what is the
> hardware design truly, hopefully you could give more details about
> that.

TBH the documentation I have is pretty limited, I don't actually know
what else can go in the metadata attached to the descriptor :(

> The mechanism is quite simple, it's not rocket science. Take something
> like a bridge join operation, the proposal is to do something like
> this:
> 
>     dsa_slave_netdevice_event
>         (net/dsa/slave.c)
>                |
>                v
>       dsa_slave_changeupper
>        (net/dsa/slave.c)
>                |
>                v
>        dsa_port_bridge_join
> ocelot_netdevice_event (net/dsa/port.c)
> (drivers/net/ethernet/mscc/ocelot_net.c) |
>                | v                                           v
>      dsa_switch_bridge_join
> ocelot_netdevice_changeupper (net/dsa/switch.c)
> (drivers/net/ethernet/mscc/ocelot_net.c) |
>                | v                                           v
>        felix_bridge_join
> ocelot_netdevice_bridge_join (drivers/net/dsa/ocelot/felix.c)
>  (drivers/net/ethernet/mscc/ocelot_net.c) |
>                 | |                                           |
>                +---------------------+---------------------+
>                                      |
>                                      v
>                            ocelot_port_bridge_join
>                       (drivers/net/ethernet/mscc/ocelot.c)
> 
> with you maintaining the entire right branch that represents the
> switchdev frontend, and more or less duplicates part of DSA.
> 
> The advantage of this approach is that you can register your own NAPI
> handler where you can treat packets in whichever way you like, and
> have your own ndo_start_xmit. This driver would treat the aggregate
> of the ess DMA engine and the ipq switch as a single device, and
> expose it as a switch with DMA, basically.
  

Patch

diff --git a/Documentation/networking/dsa/dsa.rst b/Documentation/networking/dsa/dsa.rst
index a94ddf83348a..2909ed5f00f6 100644
--- a/Documentation/networking/dsa/dsa.rst
+++ b/Documentation/networking/dsa/dsa.rst
@@ -66,7 +66,8 @@  Switch tagging protocols
 ------------------------
 
 DSA supports many vendor-specific tagging protocols, one software-defined
-tagging protocol, and a tag-less mode as well (``DSA_TAG_PROTO_NONE``).
+tagging protocol, a tag-less mode as well (``DSA_TAG_PROTO_NONE``) and an
+out-of-band tagging protocol (``DSA_TAG_PROTO_OOB``).
 
 The exact format of the tag protocol is vendor specific, but in general, they
 all contain something which:
@@ -217,6 +218,16 @@  receive all frames regardless of the value of the MAC DA. This can be done by
 setting the ``promisc_on_master`` property of the ``struct dsa_device_ops``.
 Note that this assumes a DSA-unaware master driver, which is the norm.
 
+Some SoCs have a tight integration between the conduit network interface and the
+embedded switch, such that the DSA tag isn't transmitted in the packet data,
+but through another media, using so-called out-of-band tagging. In that case,
+the host MAC driver is in charge of transmitting the tag to the switch.
+An example is the IPQ4019 SoC, that transmits the tag between the ipqess
+ethernet controller and the qca8k switch using the DMA descriptor. In that
+configuration, tag-chaining is permitted, but the OOB tag will always be the
+top-most switch in the tree. The tagger (``DSA_TAG_PROTO_OOB``) uses skb
+extensions to transmit the tag to and from the MAC driver.
+
 Master network devices
 ----------------------
 
diff --git a/MAINTAINERS b/MAINTAINERS
index 47588d4b1657..bdf716128058 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17055,6 +17055,7 @@  L:	netdev@vger.kernel.org
 S:	Maintained
 F:	Documentation/devicetree/bindings/net/qcom,ipq4019-ess-edma.yaml
 F:	drivers/net/ethernet/qualcomm/ipqess/
+F:	net/dsa/tag_oob.c
 
 QUALCOMM ETHQOS ETHERNET DRIVER
 M:	Vinod Koul <vkoul@kernel.org>
diff --git a/include/linux/dsa/oob.h b/include/linux/dsa/oob.h
new file mode 100644
index 000000000000..b5683a9a647d
--- /dev/null
+++ b/include/linux/dsa/oob.h
@@ -0,0 +1,16 @@ 
+/* SPDX-License-Identifier: GPL-2.0-only
+ * Copyright (C) 2022 Maxime Chevallier <maxime.chevallier@bootlin.com>
+ */
+
+#ifndef _NET_DSA_OOB_H
+#define _NET_DSA_OOB_H
+
+#include <linux/skbuff.h>
+
+struct dsa_oob_tag_info {
+	u16 port;
+};
+
+int dsa_oob_tag_push(struct sk_buff *skb, struct dsa_oob_tag_info *ti);
+int dsa_oob_tag_pop(struct sk_buff *skb, struct dsa_oob_tag_info *ti);
+#endif
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 59c9fd55699d..ace765ae56b3 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4573,6 +4573,9 @@  enum skb_ext_id {
 #endif
 #if IS_ENABLED(CONFIG_MCTP_FLOWS)
 	SKB_EXT_MCTP,
+#endif
+#if IS_ENABLED(CONFIG_NET_DSA_TAG_OOB)
+	SKB_EXT_DSA_OOB,
 #endif
 	SKB_EXT_NUM, /* must be last */
 };
diff --git a/include/net/dsa.h b/include/net/dsa.h
index ee369670e20e..114176efacc9 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -55,6 +55,7 @@  struct phylink_link_state;
 #define DSA_TAG_PROTO_RTL8_4T_VALUE		25
 #define DSA_TAG_PROTO_RZN1_A5PSW_VALUE		26
 #define DSA_TAG_PROTO_LAN937X_VALUE		27
+#define DSA_TAG_PROTO_OOB_VALUE			28
 
 enum dsa_tag_protocol {
 	DSA_TAG_PROTO_NONE		= DSA_TAG_PROTO_NONE_VALUE,
@@ -85,6 +86,7 @@  enum dsa_tag_protocol {
 	DSA_TAG_PROTO_RTL8_4T		= DSA_TAG_PROTO_RTL8_4T_VALUE,
 	DSA_TAG_PROTO_RZN1_A5PSW	= DSA_TAG_PROTO_RZN1_A5PSW_VALUE,
 	DSA_TAG_PROTO_LAN937X		= DSA_TAG_PROTO_LAN937X_VALUE,
+	DSA_TAG_PROTO_OOB		= DSA_TAG_PROTO_OOB_VALUE,
 };
 
 struct dsa_switch;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 42a35b59fb1e..571ef7fd95b4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -61,8 +61,12 @@ 
 #include <linux/if_vlan.h>
 #include <linux/mpls.h>
 #include <linux/kcov.h>
+#ifdef CONFIG_NET_DSA_TAG_OOB
+#include <linux/dsa/oob.h>
+#endif
 
 #include <net/protocol.h>
+#include <net/dsa.h>
 #include <net/dst.h>
 #include <net/sock.h>
 #include <net/checksum.h>
@@ -4487,6 +4491,9 @@  static const u8 skb_ext_type_len[] = {
 #if IS_ENABLED(CONFIG_MCTP_FLOWS)
 	[SKB_EXT_MCTP] = SKB_EXT_CHUNKSIZEOF(struct mctp_flow),
 #endif
+#if IS_ENABLED(CONFIG_NET_DSA_TAG_OOB)
+	[SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info),
+#endif
 };
 
 static __always_inline unsigned int skb_ext_total_length(void)
@@ -4506,6 +4513,9 @@  static __always_inline unsigned int skb_ext_total_length(void)
 #endif
 #if IS_ENABLED(CONFIG_MCTP_FLOWS)
 		skb_ext_type_len[SKB_EXT_MCTP] +
+#endif
+#if IS_ENABLED(CONFIG_NET_DSA_TAG_OOB)
+		skb_ext_type_len[SKB_EXT_DSA_OOB] +
 #endif
 		0;
 }
diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
index 3eef72ce99a4..2ba4bbe07df1 100644
--- a/net/dsa/Kconfig
+++ b/net/dsa/Kconfig
@@ -113,6 +113,15 @@  config NET_DSA_TAG_OCELOT_8021Q
 	  this mode, less TCAM resources (VCAP IS1, IS2, ES0) are available for
 	  use with tc-flower.
 
+config NET_DSA_TAG_OOB
+	select SKB_EXTENSIONS
+	tristate "Tag driver for Out-of-band tagging drivers"
+	help
+	  Say Y or M if you want to enable support for pairs of embedded
+	  switches and host MAC drivers which perform demultiplexing and
+	  packet steering to ports using out of band metadata processed
+	  by the DSA master, rather than tags present in the packets.
+
 config NET_DSA_TAG_QCA
 	tristate "Tag driver for Qualcomm Atheros QCA8K switches"
 	help
diff --git a/net/dsa/Makefile b/net/dsa/Makefile
index bf57ef3bce2a..b11c24c969ee 100644
--- a/net/dsa/Makefile
+++ b/net/dsa/Makefile
@@ -22,6 +22,7 @@  obj-$(CONFIG_NET_DSA_TAG_LAN9303) += tag_lan9303.o
 obj-$(CONFIG_NET_DSA_TAG_MTK) += tag_mtk.o
 obj-$(CONFIG_NET_DSA_TAG_OCELOT) += tag_ocelot.o
 obj-$(CONFIG_NET_DSA_TAG_OCELOT_8021Q) += tag_ocelot_8021q.o
+obj-$(CONFIG_NET_DSA_TAG_OOB) += tag_oob.o
 obj-$(CONFIG_NET_DSA_TAG_QCA) += tag_qca.o
 obj-$(CONFIG_NET_DSA_TAG_RTL4_A) += tag_rtl4_a.o
 obj-$(CONFIG_NET_DSA_TAG_RTL8_4) += tag_rtl8_4.o
diff --git a/net/dsa/tag_oob.c b/net/dsa/tag_oob.c
new file mode 100644
index 000000000000..e328a1f4e38d
--- /dev/null
+++ b/net/dsa/tag_oob.c
@@ -0,0 +1,49 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* Copyright (c) 2022, Maxime Chevallier <maxime.chevallier@bootlin.com> */
+
+#include <linux/bitfield.h>
+#include <linux/dsa/oob.h>
+#include <linux/skbuff.h>
+
+#include "dsa_priv.h"
+
+static struct sk_buff *oob_tag_xmit(struct sk_buff *skb,
+				    struct net_device *dev)
+{
+	struct dsa_oob_tag_info *tag_info = skb_ext_add(skb, SKB_EXT_DSA_OOB);
+	struct dsa_port *dp = dsa_slave_to_port(dev);
+
+	tag_info->port = dp->index;
+
+	return skb;
+}
+
+static struct sk_buff *oob_tag_rcv(struct sk_buff *skb,
+				   struct net_device *dev)
+{
+	struct dsa_oob_tag_info *tag_info = skb_ext_find(skb, SKB_EXT_DSA_OOB);
+
+	if (!tag_info)
+		return NULL;
+
+	skb->dev = dsa_master_find_slave(dev, 0, tag_info->port);
+	if (!skb->dev)
+		return NULL;
+
+	return skb;
+}
+
+static const struct dsa_device_ops oob_tag_dsa_ops = {
+	.name	= "oob",
+	.proto	= DSA_TAG_PROTO_OOB,
+	.xmit	= oob_tag_xmit,
+	.rcv	= oob_tag_rcv,
+};
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("DSA tag driver for out-of-band tagging");
+MODULE_AUTHOR("Maxime Chevallier <maxime.chevallier@bootlin.com>");
+MODULE_ALIAS_DSA_TAG_DRIVER(DSA_TAG_PROTO_OOB);
+
+module_dsa_tag_driver(oob_tag_dsa_ops);