Message ID | 20221104174151.439008-4-maxime.chevallier@bootlin.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp547588wru; Fri, 4 Nov 2022 10:43:10 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4t21PUrxiFhAnQcFONiEZk6ZWiG+wftiEoMBTwFZ0rhI1coig+7ZCe4Osp0IixdkY/e8+l X-Received: by 2002:a17:90a:9606:b0:213:aff5:e537 with SMTP id v6-20020a17090a960600b00213aff5e537mr34961238pjo.183.1667583790607; Fri, 04 Nov 2022 10:43:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667583790; cv=none; d=google.com; s=arc-20160816; b=twnpCbBJX2fF2FkYF8spOwQ5UNAc9mHmaZzp3Bl2nw0yDAqPDrNJJFFCnsE5xBc9BP RSQ2Z/h6M8QexMUzFHv43llSFcqXtD4fvpzVsiwjrSgBlLKN5edVSmAXOSZ5pdhZBHDU 11/2BSyRWNz1ABPOhzj+AQGuoY7qbVSfIwfKlxpjf/sR5Mm8FdSJHZH5LPsEYcj1bM11 0KwOMdGjgwphkvOGVGyjzuBK/2UjyQAxE4kwjRCzosaDLjhM+BGCqt0uOP6Cr/QevDuu TKh46eBuveBgagjgoc+L+ykbZ86KORt4wemM1CvheWWcGAppJkK4CkeLfnXKBrRyQcYE aepw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=aKUenYNB3v7NDOtqjkiaNGBi/7ccXmhnKVdj/8lZw/A=; b=MTC6SLWUjIEi1NCmdCDTy4HN9/XjzHBJyu95KNsom3pUXi0ZCjEFKfZXTv9lFeQPf5 xNXTTY06m5pIj1INSym+Tt8n2MDImM8rSt332Rcib+3uKIDYOPYxQ/F1MsfKWIbUihhU c/OGS/WtX5ALfGyywybwEd7ZARpFeq5dPNa58EvjD5pGoTPlfWNjvHLitGeEZxVc953p eJFns+FSJKVO18MRf6y3RaKUODtV4cI4IshetMANoM7/ec4iuoN/wl90w0gGSqyXBGt/ 2AYsLMX32flL0D9NSaXL+Kfz1GitvZyxrwO3ywy5Fsba171HZB+vHVk+1k1qQ+fjlKvB RZpw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bootlin.com header.s=gm1 header.b=EtFTVGN0; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=bootlin.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id g2-20020a63fa42000000b004580a3b847dsi85941pgk.287.2022.11.04.10.42.57; Fri, 04 Nov 2022 10:43:10 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bootlin.com header.s=gm1 header.b=EtFTVGN0; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=bootlin.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231796AbiKDRmY (ORCPT <rfc822;jimliu8233@gmail.com> + 99 others); Fri, 4 Nov 2022 13:42:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49408 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231704AbiKDRmF (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 4 Nov 2022 13:42:05 -0400 Received: from relay9-d.mail.gandi.net (relay9-d.mail.gandi.net [217.70.183.199]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BFF85B842; Fri, 4 Nov 2022 10:42:02 -0700 (PDT) Received: (Authenticated sender: maxime.chevallier@bootlin.com) by mail.gandi.net (Postfix) with ESMTPSA id B1393FF808; Fri, 4 Nov 2022 17:41:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bootlin.com; s=gm1; t=1667583721; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=aKUenYNB3v7NDOtqjkiaNGBi/7ccXmhnKVdj/8lZw/A=; b=EtFTVGN0gCc9BYzcJmaRaSXKF/RO852zSEdUuAVNFXipmvqJPxcGC8/ca2vyyUQKW+MUTf eh7x6ntifGFJbF+tn1ue9W4JvsHo1pjbJtl+WSs+9egDja6WxHrSYGd/SID3mwu3EBidvB KSSPnEPsMbzODTk0wG5cuaSLSr1bGnM6SjeWPlE24mst8tZRHpfU6cwXxdf7rz9ezLZzFh +tnxsjjOUJkhSmZQnsDufam2plXnrgAxD9uxt0pDFTBXojugxUdijzSQ1MKgsdFs1Xx0Is prBlPDqJl5fBdTlM/RsvtB3TrWbACHmDS+MC4eolstUpaqH7k0UGJ7GBt09YPw== From: Maxime Chevallier <maxime.chevallier@bootlin.com> To: davem@davemloft.net, Rob Herring <robh+dt@kernel.org>, Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org> Cc: Maxime Chevallier <maxime.chevallier@bootlin.com>, Jakub Kicinski <kuba@kernel.org>, Eric Dumazet <edumazet@google.com>, Paolo Abeni <pabeni@redhat.com>, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, devicetree@vger.kernel.org, thomas.petazzoni@bootlin.com, Andrew Lunn <andrew@lunn.ch>, Florian Fainelli <f.fainelli@gmail.com>, Heiner Kallweit <hkallweit1@gmail.com>, Russell King <linux@armlinux.org.uk>, linux-arm-kernel@lists.infradead.org, Vladimir Oltean <vladimir.oltean@nxp.com>, Luka Perkov <luka.perkov@sartura.hr>, Robert Marko <robert.marko@sartura.hr>, Andy Gross <agross@kernel.org>, Bjorn Andersson <andersson@kernel.org>, Konrad Dybcio <konrad.dybcio@somainline.org> Subject: [PATCH net-next v8 3/5] net: dsa: add out-of-band tagging protocol Date: Fri, 4 Nov 2022 18:41:49 +0100 Message-Id: <20221104174151.439008-4-maxime.chevallier@bootlin.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221104174151.439008-1-maxime.chevallier@bootlin.com> References: <20221104174151.439008-1-maxime.chevallier@bootlin.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.8 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748588340518464269?= X-GMAIL-MSGID: =?utf-8?q?1748588340518464269?= |
Series |
net: ipqess: introduce Qualcomm IPQESS driver
|
|
Commit Message
Maxime Chevallier
Nov. 4, 2022, 5:41 p.m. UTC
This tagging protocol is designed for the situation where the link
between the MAC and the Switch is designed such that the Destination
Port, which is usually embedded in some part of the Ethernet Header, is
sent out-of-band, and isn't present at all in the Ethernet frame.
This can happen when the MAC and Switch are tightly integrated on an
SoC, as is the case with the Qualcomm IPQ4019 for example, where the DSA
tag is inserted directly into the DMA descriptors. In that case,
the MAC driver is responsible for sending the tag to the switch using
the out-of-band medium. To do so, the MAC driver needs to have the
information of the destination port for that skb.
Add a new tagging protocol based on SKB extensions to convey the
information about the destination port to the MAC driver
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
---
V7->V8:
- Added a missing blank line after declaration
V6->V7:
- Fixed a sparse warning by making the dsa ops static
V5->V6:
- Added some documentation
- Removed the pop/push helpers
- Removed unused fields
V4->V5
- Use SKB extensions to convey the tag
V3->V4
- No changes
V3->V2:
- No changes, as the discussion is ongoing
V1->V2:
- Reworked the tagging method, putting the tag at skb->head instead
of putting it into skb->shinfo, as per Andrew, Florian and Vlad's
reviews
Documentation/networking/dsa/dsa.rst | 13 +++++++-
MAINTAINERS | 1 +
include/linux/dsa/oob.h | 16 +++++++++
include/linux/skbuff.h | 3 ++
include/net/dsa.h | 2 ++
net/core/skbuff.c | 10 ++++++
net/dsa/Kconfig | 9 +++++
net/dsa/Makefile | 1 +
net/dsa/tag_oob.c | 49 ++++++++++++++++++++++++++++
9 files changed, 103 insertions(+), 1 deletion(-)
create mode 100644 include/linux/dsa/oob.h
create mode 100644 net/dsa/tag_oob.c
Comments
On Fri, 4 Nov 2022 18:41:49 +0100 Maxime Chevallier wrote: > This tagging protocol is designed for the situation where the link > between the MAC and the Switch is designed such that the Destination > Port, which is usually embedded in some part of the Ethernet Header, is > sent out-of-band, and isn't present at all in the Ethernet frame. > > This can happen when the MAC and Switch are tightly integrated on an > SoC, as is the case with the Qualcomm IPQ4019 for example, where the DSA > tag is inserted directly into the DMA descriptors. In that case, > the MAC driver is responsible for sending the tag to the switch using > the out-of-band medium. To do so, the MAC driver needs to have the > information of the destination port for that skb. > > Add a new tagging protocol based on SKB extensions to convey the > information about the destination port to the MAC driver This is what METADATA_HW_PORT_MUX is for, you shouldn't have to allocate a piece of memory for every single packet. Also the series doesn't build.
Hello Jakub, On Fri, 4 Nov 2022 20:05:30 -0700 Jakub Kicinski <kuba@kernel.org> wrote: > On Fri, 4 Nov 2022 18:41:49 +0100 Maxime Chevallier wrote: > > This tagging protocol is designed for the situation where the link > > between the MAC and the Switch is designed such that the Destination > > Port, which is usually embedded in some part of the Ethernet > > Header, is sent out-of-band, and isn't present at all in the > > Ethernet frame. > > > > This can happen when the MAC and Switch are tightly integrated on an > > SoC, as is the case with the Qualcomm IPQ4019 for example, where > > the DSA tag is inserted directly into the DMA descriptors. In that > > case, the MAC driver is responsible for sending the tag to the > > switch using the out-of-band medium. To do so, the MAC driver needs > > to have the information of the destination port for that skb. > > > > Add a new tagging protocol based on SKB extensions to convey the > > information about the destination port to the MAC driver > > This is what METADATA_HW_PORT_MUX is for, you shouldn't have > to allocate a piece of memory for every single packet. Does this work with DSA ? The information conveyed in the extension is the DSA port identifier. I'm not familiar at all with METADATA_HW_PORT_MUX, should we extend that mechanism to convey the DSA port id ? I also agree that allocating data isn't the best way to go, but from the history of this series, we've tried 3 approaches so far : - Adding a new field to struct sk_buff, which isn't a good idea - Using the skb headroom, but then we can't know for sure is the skb contains a DSA tag or not - Using skb extensions, that comes with the cost of this memory allocation. Is this approach also incorrect then ? > Also the series doesn't build. Can you elaborate more ? I can't reproduce the build failure on my side, and I didn't get any reports from the kbuild bot, are you using a specific config file ? Thanks, Maxime
Hi Jakub, On Fri, Nov 04, 2022 at 08:05:30PM -0700, Jakub Kicinski wrote: > On Fri, 4 Nov 2022 18:41:49 +0100 Maxime Chevallier wrote: > > Add a new tagging protocol based on SKB extensions to convey the > > information about the destination port to the MAC driver > > This is what METADATA_HW_PORT_MUX is for, you shouldn't have > to allocate a piece of memory for every single packet. Since this is the model that skb extensions propose and not something that Maxime invented for this series, I presume that's not such a big deal? What's more, couldn't this specific limitation of skb extensions be addressed in a punctual way, via one-time calls to __skb_ext_alloc() and fast path calls to __skb_ext_set()? I'm unfamiliar to the concept of destination cache entries and even more so to the concept of struct dst_entry * carrying metadata. I suppose the latter were introduced for lack of space in struct sk_buff, to carry metadata between layers that aren't L3/L4 (where normal dst_entry structs are used)? What makes metadata dst's preferable to skb extensions? The latter are more general; AFAIK they can be used between any layer and any other layer, like for example between RX and TX in the forwarding path. Side note, I am not exactly clear what are the lifetime guarantees of a metadata dst entry, and if DSA's use would be 100% safe (DSA is kind of L3, since it has an ETH_P_XDSA packet_type handler, not an rx_handler). More importantly, what happens if a DSA switch is used together with a SRIOV-capable DSA master which already uses METADATA_HW_PORT_MUX for PF-VF communication? (if I understood the commit message of 3fcece12bc1b ("net: store port/representator id in metadata_dst") correctly)
On Mon, Nov 07, 2022 at 01:27:36PM +0200, Vladimir Oltean wrote: > Hi Jakub, > (...) There is also another problem having to do with future extensibility of METADATA_HW_PORT_MUX for DSA. I don't know how much of this is going to be applicable for qca8k, but DSA tags might also carry such information as trap reason (RX) or injection type (into forwarding plane or control packet; the latter bypasses port STP state) and the FID to which the packet should be classified by the hardware (TX). If we're going to design a mechanism which only preallocates metadata dst's for ports, it's going to be difficult to make that work for more information later on.
On Mon, 7 Nov 2022 09:39:50 +0100 Maxime Chevallier wrote: > > Also the series doesn't build. > > Can you elaborate more ? I can't reproduce the build failure on my > side, and I didn't get any reports from the kbuild bot, are you using a > specific config file ? ../net/core/skbuff.c:4495:49: error: invalid application of ‘sizeof’ to incomplete type ‘struct dsa_oob_tag_info’ 4495 | [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info), | ^~~~~~ ../include/uapi/linux/const.h:32:44: note: in definition of macro ‘__ALIGN_KERNEL_MASK’ 32 | #define __ALIGN_KERNEL_MASK(x, mask) (((x) + (mask)) & ~(mask)) | ^ ../include/linux/align.h:8:33: note: in expansion of macro ‘__ALIGN_KERNEL’ 8 | #define ALIGN(x, a) __ALIGN_KERNEL((x), (a)) | ^~~~~~~~~~~~~~ ../net/core/skbuff.c:4476:34: note: in expansion of macro ‘ALIGN’ 4476 | #define SKB_EXT_CHUNKSIZEOF(x) (ALIGN((sizeof(x)), SKB_EXT_ALIGN_VALUE) / SKB_EXT_ALIGN_VALUE) | ^~~~~ ../net/core/skbuff.c:4495:29: note: in expansion of macro ‘SKB_EXT_CHUNKSIZEOF’ 4495 | [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info), | ^~~~~~~~~~~~~~~~~~~ ../net/core/skbuff.c:4495:49: error: invalid application of ‘sizeof’ to incomplete type ‘struct dsa_oob_tag_info’ 4495 | [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info), | ^~~~~~ ../include/uapi/linux/const.h:32:50: note: in definition of macro ‘__ALIGN_KERNEL_MASK’ 32 | #define __ALIGN_KERNEL_MASK(x, mask) (((x) + (mask)) & ~(mask)) | ^~~~ ../include/linux/align.h:8:33: note: in expansion of macro ‘__ALIGN_KERNEL’ 8 | #define ALIGN(x, a) __ALIGN_KERNEL((x), (a)) | ^~~~~~~~~~~~~~ ../net/core/skbuff.c:4476:34: note: in expansion of macro ‘ALIGN’ 4476 | #define SKB_EXT_CHUNKSIZEOF(x) (ALIGN((sizeof(x)), SKB_EXT_ALIGN_VALUE) / SKB_EXT_ALIGN_VALUE) | ^~~~~ ../net/core/skbuff.c:4495:29: note: in expansion of macro ‘SKB_EXT_CHUNKSIZEOF’ 4495 | [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info), | ^~~~~~~~~~~~~~~~~~~ ../net/core/skbuff.c:4495:49: error: invalid application of ‘sizeof’ to incomplete type ‘struct dsa_oob_tag_info’ 4495 | [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info), | ^~~~~~ ../include/uapi/linux/const.h:32:61: note: in definition of macro ‘__ALIGN_KERNEL_MASK’ 32 | #define __ALIGN_KERNEL_MASK(x, mask) (((x) + (mask)) & ~(mask)) | ^~~~ ../include/linux/align.h:8:33: note: in expansion of macro ‘__ALIGN_KERNEL’ 8 | #define ALIGN(x, a) __ALIGN_KERNEL((x), (a)) | ^~~~~~~~~~~~~~ ../net/core/skbuff.c:4476:34: note: in expansion of macro ‘ALIGN’ 4476 | #define SKB_EXT_CHUNKSIZEOF(x) (ALIGN((sizeof(x)), SKB_EXT_ALIGN_VALUE) / SKB_EXT_ALIGN_VALUE) | ^~~~~ ../net/core/skbuff.c:4495:29: note: in expansion of macro ‘SKB_EXT_CHUNKSIZEOF’ 4495 | [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info), | ^~~~~~~~~~~~~~~~~~~ Also this: drivers/net/ethernet/qualcomm/ipqess/ipqess.c:1172:22: warning: cast to smaller integer type 'u32' (aka 'unsigned int') from 'void *' [-Wvoid-pointer-to-int-cast] netdev->base_addr = (u32)ess->hw_addr; ^~~~~~~~~~~~~~~~~
On Mon, Nov 07, 2022 at 08:49:34AM -0800, Jakub Kicinski wrote: > On Mon, 7 Nov 2022 12:51:26 +0000 Vladimir Oltean wrote: > > There is also another problem having to do with future extensibility of > > METADATA_HW_PORT_MUX for DSA. I don't know how much of this is going to > > be applicable for qca8k, but DSA tags might also carry such information > > as trap reason (RX) or injection type (into forwarding plane or control > > packet; the latter bypasses port STP state) and the FID to which the > > packet should be classified by the hardware (TX). If we're going to > > design a mechanism which only preallocates metadata dst's for ports, > > it's going to be difficult to make that work for more information later on. > > The entire patch we're commenting on is 100 LoC. Seems like a small > thing, which can be rewritten later as needed. I don't think hand wave-y > arguments are sufficient to go with a much heavier solution from the > start. I don't think it's as hand wavey as you think. Maxime did not present the switch-side changes or device tree in this patch set. If it's going to be based on drivers/net/dsa/qca/qca8k-common.c as I suspect, then it might have some obscure features which are already supported by 'normal' QCA8K DSA switches, like register read/write over Ethernet, and MIB autocasting. If these features exist in hardware (they aren't exposed by this patch set for sure), you'd be hard-pressed to fit them into the METADATA_HW_PORT_MUX model, since it's pure management traffic consumed by the switch driver and not delivered to the network stack, as opposed to packets sent/received on behalf of any switch port.
On Mon, Nov 07, 2022 at 08:45:35AM -0800, Jakub Kicinski wrote: > On Mon, 7 Nov 2022 11:27:37 +0000 Vladimir Oltean wrote: > > Since this is the model that skb extensions propose and not something > > that Maxime invented for this series, I presume that's not such a big > > deal? > > It's not a generic "do whatever you want" with it feature. The more > people use it the less possible it is to have it disabled in a host- > -centric kernel. We were talking about "the model" being "the model where you allocate the extension for each packet", no? > > What's more, couldn't this specific limitation of skb extensions > > be addressed in a punctual way, via one-time calls to __skb_ext_alloc() > > and fast path calls to __skb_ext_set()? > > Are you suggesting we add refcounting to the skb ext? idk, is it such a big offence? :) Actually my previous paragraph on which you replied with an apparently unrelated comment was saying that I think we're okay with allocating an skb extension for each packet, if that's what the skb extension usage model proposes. > > I'm unfamiliar to the concept of destination cache entries and even more > > so to the concept of struct dst_entry * carrying metadata. I suppose the > > latter were introduced for lack of space in struct sk_buff, to carry > > metadata between layers that aren't L3/L4 (where normal dst_entry structs > > are used)? What makes metadata dst's preferable to skb extensions? > > It's much less invasive. Don't get me wrong, I don't oppose a dst_metadata solution as long as I think that I understand it and that I can maintain/extend it as needed going forward (which I clearly think I do about skb extensions, they seem simple to use to the naive reader). No need to get territorial about it, better to arm yourself with a bit of patience. > > > The latter are more general; AFAIK they can be used between any layer > > and any other layer, like for example between RX and TX in the > > forwarding path. > > You can't be using lower-dev / upper-dev metadata across forwarding, > how would that ever work? What makes metadata dst's preferable to skb extensions? ~~~~~~~~~~~~ ~~~~~~~~~~~~~~ former latter I said: "The latter [aka skb extensions, not metadata dst's] are more general". I did not say that you can use metadata dst's across forwarding, quite the opposite. > > > Side note, I am not exactly clear what are the lifetime > > guarantees of a metadata dst entry, and if DSA's use would be 100% safe > > (DSA is kind of L3, since it has an ETH_P_XDSA packet_type handler, not > > an rx_handler). > > It's just a refcounted object. I presume the DSA uppers can't get > spawned before the lower is spawned already? By lifetime guarantees, I actually meant: what is the latest point during the RX path that skb_dst() will still point to the metadata dst, and not get replaced with the real destination cache entry? I think we're okay, because although DSA presents itself as an L3 protocol in the RX path, the 'real' L3 protocol handler will surely not have run earlier than DSA, due to how eth_type_trans() was patched to return ETH_P_XDSA. Or I might be reading things completely wrong. Again, I have no experience with destination cache entry structures or their metadata carrying kind. Or with skb extensions, for that matter, other than noticing that they exist. > > > More importantly, what happens if a DSA switch is used together with a > > SRIOV-capable DSA master which already uses METADATA_HW_PORT_MUX for > > PF-VF communication? (if I understood the commit message of 3fcece12bc1b > > ("net: store port/representator id in metadata_dst") correctly) > > Let's be clear that the OOB metadata model only works if both upper and > lower are aware of the metadata. In fact they are pretty tightly bound. > So chances of a mismatch are extremely low and theorizing about them is > academic. Legally I'm not allowed to say too much, but let's say I've heard about something which makes the above not theoretical. Anyway, let's assume it's not a concern. > > In general, I'm not sure if pretending this is DSA is not an unnecessary > complication which will end up hurting both ends of the equation. This is a valid point. We've refused wacky "not DSA, not switchdev" hardware before: https://lore.kernel.org/netdev/20201125232459.378-1-lukma@denx.de/ There's also the option of doing what I did with ocelot/felix: a common switch lib and 2 distinct front-ends, one switchdev and one DSA. Not a lot of people seem to be willing to put that effort in, though. The imx28 patch set was eventually abandoned. I though I'd try a different approach this time. Idk, maybe it's a waste of time.
On 11/7/22 10:24, Jakub Kicinski wrote: [snip] > Yeah, it's a balancing act. Please explore the metadata option, I think > most people jump to the skb extension because they don't know about > metadata. If you still want skb extension after, I'll look away. It seems to me like we are trying too hard to have a generic out of band solution to provide tagger information coming from a DMA descriptor as opposed to just introducing a DSA tagger variant specific to the format being used and specific to the switch + integrated MAC. Something like DSA_TAG_IPQDMA or whatever the name chosen would be, may be fine. The only value I see at this point in just in telling me that the tagger format is coming from a DMA descriptor, but other than that, it is just a middle layer that requires marshalling of data on both sides, so sure the idea behind DSA was to be able to mix and match any Ethernet MAC with any discrete switch, but integrating both into the same ASIC does nullify the design goal.
On Mon, Nov 07, 2022 at 10:24:40AM -0800, Jakub Kicinski wrote: > IIRC the skb extensions were initially proposed as a way to handle rare > exception packets (e.g. first packet of a connection-tracked flow in OvS > offloads). Also MPTCP but that's also edge / slow path (sorry MPTCP). > > Now the usage is spreading and I have to keep fighting to keep them out > of the datacenter production kernel I co-maintain. > > So, yeah, I hate them :) To be fair, most people see CPU-terminated traffic on a DSA switch (i.e. what you see in net/dsa/tag_*.c) as slow path which needs no optimization. It's not that I condone this, but it's factually true. If it wasn't the case, then out of the drivers I maintain, a control packet wouldn't be delivered via SPI on SJA1105, and flow control wouldn't be broken on the CPU port of switches from the Ocelot family. And more importantly, software bridging between a switchdev and a non-switchdev port wouldn't be such an oversight for more than 3/4 of all switch drivers. Also, I didn't really get *why* you hate them, just that you do. Seems circular: slow => hate; hate => slow? I don't think that skb->_skb_refdst is the hallmark of clean or simple designs either, a pointer and a refcount bit squashed into a single sk_buff field that is also in a union with 2 other things, and which is reused in other network layers for purposes that have nothing to do with L3 routing. Nope, sorry, this is highly optimized design at its finest, true, but I have no interest in doing mental gymnastics in order to maintain such a thing, just because some hardware manufacturer thought that it would be a smart idea to split up device ownership in this way, and neither build a 'switch with rings' nor a 'switch with tags', but rather 'a switch with somebody else's rings'. The people who built this monstrosity should step in and maintain the software architecture that's a direct consequence of their design choices. Otherwise I'm going to opt for the simplest thing to maintain that works. It's unfair to not care about software support for your own hardware enough to study frameworks beforehand, *and* to complain about performance. > > > > The latter are more general; AFAIK they can be used between any layer > > > > and any other layer, like for example between RX and TX in the > > > > forwarding path. > > > > > > You can't be using lower-dev / upper-dev metadata across forwarding, > > > how would that ever work? > > > > What makes metadata dst's preferable to skb extensions? > > ~~~~~~~~~~~~ ~~~~~~~~~~~~~~ > > former latter > > > > I said: "The latter [aka skb extensions, not metadata dst's] are more general". > > I did not say that you can use metadata dst's across forwarding, quite > > the opposite. > > No, no, I'm asking how you'd use either. I'm questioning the entire > flow, not whether either mechanism can be used to fulfill it. Well, we are probably talking about different things. I said that skb extensions are a more general concept which *allows* you to pass metadata from the iif to the oif. Metadata dst's don't. So what is the need for metadata dst's, if skb extensions can do what those can do, and more. Not that this use case is particularly relevant to DSA OOB. Just that I think a reasonable expectation would have been to make skb extensions more performant, than to introduce a parallel mechanism. > TBH I mostly have experience on the Tx side, given that on the Rx side > there is no queuing so the entire abstraction of tag implementation > being separate is not strictly necessary. But if you find that the Rx > doesn't work, and you really want the skb extensions - then, well, > I acquiesce. And hope the Meta prod kernel never needs OOB DSA :) I would never enable this feature, either. I would love not having to see it. > > > > More importantly, what happens if a DSA switch is used together with a > > > > SRIOV-capable DSA master which already uses METADATA_HW_PORT_MUX for > > > > PF-VF communication? (if I understood the commit message of 3fcece12bc1b > > > > ("net: store port/representator id in metadata_dst") correctly) > > > > > > Let's be clear that the OOB metadata model only works if both upper and > > > lower are aware of the metadata. In fact they are pretty tightly bound. > > > So chances of a mismatch are extremely low and theorizing about them is > > > academic. > > > > Legally I'm not allowed to say too much, but let's say I've heard about > > something which makes the above not theoretical. Anyway, let's assume > > it's not a concern. > > But in that case the same vendor designs both ends, right? Yes. > So there should be no conflict between the metadata assigned for reprs > vs dsa ports. Can't say more, sorry. > > > In general, I'm not sure if pretending this is DSA is not an unnecessary > > > complication which will end up hurting both ends of the equation. > > > > This is a valid point. We've refused wacky "not DSA, not switchdev" > > hardware before: > > https://lore.kernel.org/netdev/20201125232459.378-1-lukma@denx.de/ > > There's also the option of doing what I did with ocelot/felix: a common > > switch lib and 2 distinct front-ends, one switchdev and one DSA. > > Exactly. > > > Not a lot of people seem to be willing to put that effort in, though. > > The imx28 patch set was eventually abandoned. I though I'd try a > > different approach this time. Idk, maybe it's a waste of time. > > Yeah, it's a balancing act. Please explore the metadata option, I think > most people jump to the skb extension because they don't know about > metadata. If you still want skb extension after, I'll look away. Well, I guess I'm still not really convinced about metadata_dst, you're still not really convinced about skb extensions, but what we have in common is that "one switch lib, two front ends" is an alternative worth exploring as a design that's both clean and efficient? :)
On 07.11.22 09:39, Maxime Chevallier wrote: >> On Fri, 4 Nov 2022 18:41:49 +0100 Maxime Chevallier wrote: >> > This tagging protocol is designed for the situation where the link >> > between the MAC and the Switch is designed such that the Destination >> > Port, which is usually embedded in some part of the Ethernet >> > Header, is sent out-of-band, and isn't present at all in the >> > Ethernet frame. >> > >> > This can happen when the MAC and Switch are tightly integrated on an >> > SoC, as is the case with the Qualcomm IPQ4019 for example, where >> > the DSA tag is inserted directly into the DMA descriptors. In that >> > case, the MAC driver is responsible for sending the tag to the >> > switch using the out-of-band medium. To do so, the MAC driver needs >> > to have the information of the destination port for that skb. >> > >> > Add a new tagging protocol based on SKB extensions to convey the >> > information about the destination port to the MAC driver >> >> This is what METADATA_HW_PORT_MUX is for, you shouldn't have >> to allocate a piece of memory for every single packet. > > Does this work with DSA ? The information conveyed in the extension is > the DSA port identifier. I'm not familiar at all with > METADATA_HW_PORT_MUX, should we extend that mechanism to convey the > DSA port id ? > > I also agree that allocating data isn't the best way to go, but from > the history of this series, we've tried 3 approaches so far : > > - Adding a new field to struct sk_buff, which isn't a good idea > - Using the skb headroom, but then we can't know for sure is the skb > contains a DSA tag or not > - Using skb extensions, that comes with the cost of this memory > allocation. Is this approach also incorrect then ? FYI, I'm currently working on hardware DSA untagging on the mediatek mtk_eth_soc driver. On this hardware, I definitely need to keep the custom DSA tag driver, as hardware untagging is not always available. For the receive side, I came up with this patch (still untested) for using METADATA_HW_PORT_MUX. It has the advantage of being able to skip the tag protocol rcv ops call for offload-enabled packets. Maybe for the transmit side we could have some kind of netdev feature or capability that indicates offload support and allows skipping the tag xmit function as well. In that case, ipqess could simply use a no-op tag driver. What do you think? --- --- a/net/core/flow_dissector.c +++ b/net/core/flow_dissector.c @@ -972,11 +972,13 @@ bool __skb_flow_dissect(const struct net *net, if (unlikely(skb->dev && netdev_uses_dsa(skb->dev) && skb->protocol == htons(ETH_P_XDSA))) { const struct dsa_device_ops *ops; + struct metadata_dst *md_dst = skb_metadata_dst(skb); int offset = 0; ops = skb->dev->dsa_ptr->tag_ops; /* Only DSA header taggers break flow dissection */ - if (ops->needed_headroom) { + if (ops->needed_headroom && + (!md_dst || md_dst->type != METADATA_HW_PORT_MUX)) { if (ops->flow_dissect) ops->flow_dissect(skb, &proto, &offset); else --- a/net/dsa/dsa.c +++ b/net/dsa/dsa.c @@ -11,6 +11,7 @@ #include <linux/netdevice.h> #include <linux/sysfs.h> #include <linux/ptp_classify.h> +#include <net/dst_metadata.h> #include "dsa_priv.h" @@ -216,6 +217,7 @@ static bool dsa_skb_defer_rx_timestamp(struct dsa_slave_priv *p, static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *unused) { + struct metadata_dst *md_dst = skb_metadata_dst(skb); struct dsa_port *cpu_dp = dev->dsa_ptr; struct sk_buff *nskb = NULL; struct dsa_slave_priv *p; @@ -229,7 +231,21 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev, if (!skb) return 0; - nskb = cpu_dp->rcv(skb, dev); + if (md_dst && md_dst->type == METADATA_HW_PORT_MUX) { + unsigned int port = md_dst->u.port_info.port_id; + + dsa_default_offload_fwd_mark(skb); + skb_dst_set(skb, NULL); + if (!skb_has_extensions(skb)) + skb->slow_gro = 0; + + skb->dev = dsa_master_find_slave(dev, 0, port); + if (skb->dev) + nskb = skb; + } else { + nskb = cpu_dp->rcv(skb, dev); + } + if (!nskb) { kfree_skb(skb); return 0;
Hi Maxime, I love your patch! Yet something to improve: [auto build test ERROR on net-next/master] url: https://github.com/intel-lab-lkp/linux/commits/Maxime-Chevallier/net-ipqess-introduce-Qualcomm-IPQESS-driver/20221105-014353 patch link: https://lore.kernel.org/r/20221104174151.439008-4-maxime.chevallier%40bootlin.com patch subject: [PATCH net-next v8 3/5] net: dsa: add out-of-band tagging protocol config: riscv-allmodconfig compiler: riscv64-linux-gcc (GCC) 12.1.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/intel-lab-lkp/linux/commit/68ff661e3d27afe0c9cfc202c8fe111d288cbdd4 git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Maxime-Chevallier/net-ipqess-introduce-Qualcomm-IPQESS-driver/20221105-014353 git checkout 68ff661e3d27afe0c9cfc202c8fe111d288cbdd4 # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash net/ If you fix the issue, kindly add following tag where applicable | Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): In file included from include/vdso/const.h:5, from include/linux/const.h:4, from include/linux/list.h:9, from include/linux/module.h:12, from net/core/skbuff.c:37: >> net/core/skbuff.c:4495:49: error: invalid application of 'sizeof' to incomplete type 'struct dsa_oob_tag_info' 4495 | [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info), | ^~~~~~ include/uapi/linux/const.h:32:44: note: in definition of macro '__ALIGN_KERNEL_MASK' 32 | #define __ALIGN_KERNEL_MASK(x, mask) (((x) + (mask)) & ~(mask)) | ^ include/linux/align.h:8:33: note: in expansion of macro '__ALIGN_KERNEL' 8 | #define ALIGN(x, a) __ALIGN_KERNEL((x), (a)) | ^~~~~~~~~~~~~~ net/core/skbuff.c:4476:34: note: in expansion of macro 'ALIGN' 4476 | #define SKB_EXT_CHUNKSIZEOF(x) (ALIGN((sizeof(x)), SKB_EXT_ALIGN_VALUE) / SKB_EXT_ALIGN_VALUE) | ^~~~~ net/core/skbuff.c:4495:29: note: in expansion of macro 'SKB_EXT_CHUNKSIZEOF' 4495 | [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info), | ^~~~~~~~~~~~~~~~~~~ >> net/core/skbuff.c:4495:49: error: invalid application of 'sizeof' to incomplete type 'struct dsa_oob_tag_info' 4495 | [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info), | ^~~~~~ include/uapi/linux/const.h:32:50: note: in definition of macro '__ALIGN_KERNEL_MASK' 32 | #define __ALIGN_KERNEL_MASK(x, mask) (((x) + (mask)) & ~(mask)) | ^~~~ include/linux/align.h:8:33: note: in expansion of macro '__ALIGN_KERNEL' 8 | #define ALIGN(x, a) __ALIGN_KERNEL((x), (a)) | ^~~~~~~~~~~~~~ net/core/skbuff.c:4476:34: note: in expansion of macro 'ALIGN' 4476 | #define SKB_EXT_CHUNKSIZEOF(x) (ALIGN((sizeof(x)), SKB_EXT_ALIGN_VALUE) / SKB_EXT_ALIGN_VALUE) | ^~~~~ net/core/skbuff.c:4495:29: note: in expansion of macro 'SKB_EXT_CHUNKSIZEOF' 4495 | [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info), | ^~~~~~~~~~~~~~~~~~~ >> net/core/skbuff.c:4495:49: error: invalid application of 'sizeof' to incomplete type 'struct dsa_oob_tag_info' 4495 | [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info), | ^~~~~~ include/uapi/linux/const.h:32:61: note: in definition of macro '__ALIGN_KERNEL_MASK' 32 | #define __ALIGN_KERNEL_MASK(x, mask) (((x) + (mask)) & ~(mask)) | ^~~~ include/linux/align.h:8:33: note: in expansion of macro '__ALIGN_KERNEL' 8 | #define ALIGN(x, a) __ALIGN_KERNEL((x), (a)) | ^~~~~~~~~~~~~~ net/core/skbuff.c:4476:34: note: in expansion of macro 'ALIGN' 4476 | #define SKB_EXT_CHUNKSIZEOF(x) (ALIGN((sizeof(x)), SKB_EXT_ALIGN_VALUE) / SKB_EXT_ALIGN_VALUE) | ^~~~~ net/core/skbuff.c:4495:29: note: in expansion of macro 'SKB_EXT_CHUNKSIZEOF' 4495 | [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info), | ^~~~~~~~~~~~~~~~~~~ vim +4495 net/core/skbuff.c 4477 4478 static const u8 skb_ext_type_len[] = { 4479 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER) 4480 [SKB_EXT_BRIDGE_NF] = SKB_EXT_CHUNKSIZEOF(struct nf_bridge_info), 4481 #endif 4482 #ifdef CONFIG_XFRM 4483 [SKB_EXT_SEC_PATH] = SKB_EXT_CHUNKSIZEOF(struct sec_path), 4484 #endif 4485 #if IS_ENABLED(CONFIG_NET_TC_SKB_EXT) 4486 [TC_SKB_EXT] = SKB_EXT_CHUNKSIZEOF(struct tc_skb_ext), 4487 #endif 4488 #if IS_ENABLED(CONFIG_MPTCP) 4489 [SKB_EXT_MPTCP] = SKB_EXT_CHUNKSIZEOF(struct mptcp_ext), 4490 #endif 4491 #if IS_ENABLED(CONFIG_MCTP_FLOWS) 4492 [SKB_EXT_MCTP] = SKB_EXT_CHUNKSIZEOF(struct mctp_flow), 4493 #endif 4494 #if IS_ENABLED(CONFIG_NET_DSA_TAG_OOB) > 4495 [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info), 4496 #endif 4497 }; 4498
Hello everyone, Felix, thanks for the feedback ! On Tue, 8 Nov 2022 13:22:17 +0100 Felix Fietkau <nbd@nbd.name> wrote: [...] > FYI, I'm currently working on hardware DSA untagging on the mediatek > mtk_eth_soc driver. On this hardware, I definitely need to keep the > custom DSA tag driver, as hardware untagging is not always available. > For the receive side, I came up with this patch (still untested) for > using METADATA_HW_PORT_MUX. > It has the advantage of being able to skip the tag protocol rcv ops > call for offload-enabled packets. > > Maybe for the transmit side we could have some kind of netdev feature > or capability that indicates offload support and allows skipping the > tag xmit function as well. > In that case, ipqess could simply use a no-op tag driver. If I'm not mistaken, Florian also proposed a while ago an offload mechanism for taggin/untagging : https://lore.kernel.org/lkml/1438322920.20182.144.camel@edumazet-glaptop2.roam.corp.google.com/T/ It uses some of the points you're mentionning, such as the netdev feature :) All in all, I'm still a bit confused about the next steps. If I can summarize a bit, we have a lot of approaches, all with advantages and inconvenients, I'll try to summarize the state : - We could simply use the skb extensions as-is, rename the tagger something like "DSA_TAG_IPQDMA" and consider this a way to perform tagging on this specific class of hardware, without trying too hard to make it generic. - We could try to move forward with this mechanism of offloading tagging and untagging from the MAC driver, this would address Florian's first try at this, Felix's use-case and would fit well the IPQESS case - There's the option discussed by Vlad and Jakub to add several frontends, one being a switchev driver, here I'm a bit lost TBH, if we go this way I could definitely use a few pointers from Vlad :) When looking at it from this point of view, option 2 looks pretty promising, but I would like to make sure we're on the same page at that point. On my side, I've tried several approaches for this tagging and so far none are acceptable, for good reasons. I would like to make sure then that I grasp the full picture and I didn't miss other possible ways of addressing this. Thanks everyone for your help ! Maxime
On Tue, Nov 15, 2022 at 10:29:24AM +0100, Maxime Chevallier wrote: > Hello everyone, > > Felix, thanks for the feedback ! > > On Tue, 8 Nov 2022 13:22:17 +0100 > Felix Fietkau <nbd@nbd.name> wrote: > > [...] > > > FYI, I'm currently working on hardware DSA untagging on the mediatek > > mtk_eth_soc driver. On this hardware, I definitely need to keep the > > custom DSA tag driver, as hardware untagging is not always available. > > For the receive side, I came up with this patch (still untested) for > > using METADATA_HW_PORT_MUX. > > It has the advantage of being able to skip the tag protocol rcv ops > > call for offload-enabled packets. > > > > Maybe for the transmit side we could have some kind of netdev feature > > or capability that indicates offload support and allows skipping the > > tag xmit function as well. > > In that case, ipqess could simply use a no-op tag driver. > > If I'm not mistaken, Florian also proposed a while ago an offload > mechanism for taggin/untagging : > > https://lore.kernel.org/lkml/1438322920.20182.144.camel@edumazet-glaptop2.roam.corp.google.com/T/ > > It uses some of the points you're mentionning, such as the netdev > feature :) > > All in all, I'm still a bit confused about the next steps. If I can > summarize a bit, we have a lot of approaches, all with advantages and > inconvenients, I'll try to summarize the state : > > - We could simply use the skb extensions as-is, rename the tagger > something like "DSA_TAG_IPQDMA" and consider this a way to perform > tagging on this specific class of hardware, without trying too hard > to make it generic. For Felix, using skb extensions would be inconvenient, since it would involve per packet allocations which are now avoided with the metadata dsts. > - We could try to move forward with this mechanism of offloading > tagging and untagging from the MAC driver, this would address > Florian's first try at this, Felix's use-case and would fit well the > IPQESS case Someone would need to take things from where Felix left them: https://patchwork.kernel.org/project/netdevbpf/patch/20221114124214.58199-2-nbd@nbd.name/ and add TX tag offloading support as well. Here there would need to be a mechanism through which DSA asks "hey, this is my tagging protocol, can the master offload it in the TX direction or am I just going to push the tag into the packet?". I tried to sketch here something along those lines: https://patchwork.kernel.org/project/netdevbpf/patch/20221109163426.76164-10-nbd@nbd.name/#25084481 > - There's the option discussed by Vlad and Jakub to add several > frontends, one being a switchev driver, here I'm a bit lost TBH, if > we go this way I could definitely use a few pointers from Vlad :) The assumption being here that there is more functionality to cover by the metadata dst than a port mux. I'm really not clear what is the hardware design truly, hopefully you could give more details about that. The mechanism is quite simple, it's not rocket science. Take something like a bridge join operation, the proposal is to do something like this: dsa_slave_netdevice_event (net/dsa/slave.c) | v dsa_slave_changeupper (net/dsa/slave.c) | v dsa_port_bridge_join ocelot_netdevice_event (net/dsa/port.c) (drivers/net/ethernet/mscc/ocelot_net.c) | | v v dsa_switch_bridge_join ocelot_netdevice_changeupper (net/dsa/switch.c) (drivers/net/ethernet/mscc/ocelot_net.c) | | v v felix_bridge_join ocelot_netdevice_bridge_join (drivers/net/dsa/ocelot/felix.c) (drivers/net/ethernet/mscc/ocelot_net.c) | | | | +---------------------+---------------------+ | v ocelot_port_bridge_join (drivers/net/ethernet/mscc/ocelot.c) with you maintaining the entire right branch that represents the switchdev frontend, and more or less duplicates part of DSA. The advantage of this approach is that you can register your own NAPI handler where you can treat packets in whichever way you like, and have your own ndo_start_xmit. This driver would treat the aggregate of the ess DMA engine and the ipq switch as a single device, and expose it as a switch with DMA, basically.
Hello everyone, I'm digging this topic up, it has we'd like to move forward with the upstreaming of this, and before trying any new approach, I'd like to see if we can settle on one of the two choices that were expressed so far. To summarize the issue, this hardware platform (IPQ4019 from Qualcomm) uses an internal switch that's a modified QCA8K, for which there already is a DSA driver. On that platform, there's a MAC (ipqess) connected to the switch, that passes the dst/src port id through the DMA descriptor, whereas a typical DSA switch would pass that information in the frame itself. There has been a few approaches to try and reuse DSA as-is with a custom tagger, but all of them eventually got rejected, for a good reason. Two solutions are proposed, as discussed in that thread (hence the top-posting, sorry about that). There are two approaches remaining, either implementing DSA tagging offload support in RX/TX, or having a DSA frontend for the switch (the current QCA8K driver) and a switchdev frontend, using the qca8k logic with the ESS driver handling transfers for the CPU port. As both approaches make sense but are quite opposed, I'd like to make sure we go in the right direction. The switchdev approach definitely makes a lot of sense, but the DSA tagging offloading has been in discussion for quite a while, starting with Florian's series, followed by Felix's, and this could also be a good occasion to move forward with this, and it would also involve a minimal rework of the current ipqess driver. Any pointer would help, Thanks everyone, Maxime On Tue, 15 Nov 2022 11:50:23 +0000 Vladimir Oltean <vladimir.oltean@nxp.com> wrote: > On Tue, Nov 15, 2022 at 10:29:24AM +0100, Maxime Chevallier wrote: > > Hello everyone, > > > > Felix, thanks for the feedback ! > > > > On Tue, 8 Nov 2022 13:22:17 +0100 > > Felix Fietkau <nbd@nbd.name> wrote: > > > > [...] > > > > > FYI, I'm currently working on hardware DSA untagging on the > > > mediatek mtk_eth_soc driver. On this hardware, I definitely need > > > to keep the custom DSA tag driver, as hardware untagging is not > > > always available. For the receive side, I came up with this patch > > > (still untested) for using METADATA_HW_PORT_MUX. > > > It has the advantage of being able to skip the tag protocol rcv > > > ops call for offload-enabled packets. > > > > > > Maybe for the transmit side we could have some kind of netdev > > > feature or capability that indicates offload support and allows > > > skipping the tag xmit function as well. > > > In that case, ipqess could simply use a no-op tag driver. > > > > If I'm not mistaken, Florian also proposed a while ago an offload > > mechanism for taggin/untagging : > > > > https://lore.kernel.org/lkml/1438322920.20182.144.camel@edumazet-glaptop2.roam.corp.google.com/T/ > > > > It uses some of the points you're mentionning, such as the netdev > > feature :) > > > > All in all, I'm still a bit confused about the next steps. If I can > > summarize a bit, we have a lot of approaches, all with advantages > > and inconvenients, I'll try to summarize the state : > > > > - We could simply use the skb extensions as-is, rename the tagger > > something like "DSA_TAG_IPQDMA" and consider this a way to > > perform tagging on this specific class of hardware, without trying > > too hard to make it generic. > > For Felix, using skb extensions would be inconvenient, since it would > involve per packet allocations which are now avoided with the metadata > dsts. > > > - We could try to move forward with this mechanism of offloading > > tagging and untagging from the MAC driver, this would address > > Florian's first try at this, Felix's use-case and would fit well > > the IPQESS case > > Someone would need to take things from where Felix left them: > https://patchwork.kernel.org/project/netdevbpf/patch/20221114124214.58199-2-nbd@nbd.name/ > and add TX tag offloading support as well. Here there would need to be > a mechanism through which DSA asks "hey, this is my tagging protocol, > can the master offload it in the TX direction or am I just going to > push the tag into the packet?". I tried to sketch here something > along those lines: > https://patchwork.kernel.org/project/netdevbpf/patch/20221109163426.76164-10-nbd@nbd.name/#25084481 > > > - There's the option discussed by Vlad and Jakub to add several > > frontends, one being a switchev driver, here I'm a bit lost TBH, > > if we go this way I could definitely use a few pointers from Vlad > > :) > > The assumption being here that there is more functionality to cover by > the metadata dst than a port mux. I'm really not clear what is the > hardware design truly, hopefully you could give more details about > that. TBH the documentation I have is pretty limited, I don't actually know what else can go in the metadata attached to the descriptor :( > The mechanism is quite simple, it's not rocket science. Take something > like a bridge join operation, the proposal is to do something like > this: > > dsa_slave_netdevice_event > (net/dsa/slave.c) > | > v > dsa_slave_changeupper > (net/dsa/slave.c) > | > v > dsa_port_bridge_join > ocelot_netdevice_event (net/dsa/port.c) > (drivers/net/ethernet/mscc/ocelot_net.c) | > | v v > dsa_switch_bridge_join > ocelot_netdevice_changeupper (net/dsa/switch.c) > (drivers/net/ethernet/mscc/ocelot_net.c) | > | v v > felix_bridge_join > ocelot_netdevice_bridge_join (drivers/net/dsa/ocelot/felix.c) > (drivers/net/ethernet/mscc/ocelot_net.c) | > | | | > +---------------------+---------------------+ > | > v > ocelot_port_bridge_join > (drivers/net/ethernet/mscc/ocelot.c) > > with you maintaining the entire right branch that represents the > switchdev frontend, and more or less duplicates part of DSA. > > The advantage of this approach is that you can register your own NAPI > handler where you can treat packets in whichever way you like, and > have your own ndo_start_xmit. This driver would treat the aggregate > of the ess DMA engine and the ipq switch as a single device, and > expose it as a switch with DMA, basically.
diff --git a/Documentation/networking/dsa/dsa.rst b/Documentation/networking/dsa/dsa.rst index a94ddf83348a..2909ed5f00f6 100644 --- a/Documentation/networking/dsa/dsa.rst +++ b/Documentation/networking/dsa/dsa.rst @@ -66,7 +66,8 @@ Switch tagging protocols ------------------------ DSA supports many vendor-specific tagging protocols, one software-defined -tagging protocol, and a tag-less mode as well (``DSA_TAG_PROTO_NONE``). +tagging protocol, a tag-less mode as well (``DSA_TAG_PROTO_NONE``) and an +out-of-band tagging protocol (``DSA_TAG_PROTO_OOB``). The exact format of the tag protocol is vendor specific, but in general, they all contain something which: @@ -217,6 +218,16 @@ receive all frames regardless of the value of the MAC DA. This can be done by setting the ``promisc_on_master`` property of the ``struct dsa_device_ops``. Note that this assumes a DSA-unaware master driver, which is the norm. +Some SoCs have a tight integration between the conduit network interface and the +embedded switch, such that the DSA tag isn't transmitted in the packet data, +but through another media, using so-called out-of-band tagging. In that case, +the host MAC driver is in charge of transmitting the tag to the switch. +An example is the IPQ4019 SoC, that transmits the tag between the ipqess +ethernet controller and the qca8k switch using the DMA descriptor. In that +configuration, tag-chaining is permitted, but the OOB tag will always be the +top-most switch in the tree. The tagger (``DSA_TAG_PROTO_OOB``) uses skb +extensions to transmit the tag to and from the MAC driver. + Master network devices ---------------------- diff --git a/MAINTAINERS b/MAINTAINERS index 47588d4b1657..bdf716128058 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -17055,6 +17055,7 @@ L: netdev@vger.kernel.org S: Maintained F: Documentation/devicetree/bindings/net/qcom,ipq4019-ess-edma.yaml F: drivers/net/ethernet/qualcomm/ipqess/ +F: net/dsa/tag_oob.c QUALCOMM ETHQOS ETHERNET DRIVER M: Vinod Koul <vkoul@kernel.org> diff --git a/include/linux/dsa/oob.h b/include/linux/dsa/oob.h new file mode 100644 index 000000000000..b5683a9a647d --- /dev/null +++ b/include/linux/dsa/oob.h @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: GPL-2.0-only + * Copyright (C) 2022 Maxime Chevallier <maxime.chevallier@bootlin.com> + */ + +#ifndef _NET_DSA_OOB_H +#define _NET_DSA_OOB_H + +#include <linux/skbuff.h> + +struct dsa_oob_tag_info { + u16 port; +}; + +int dsa_oob_tag_push(struct sk_buff *skb, struct dsa_oob_tag_info *ti); +int dsa_oob_tag_pop(struct sk_buff *skb, struct dsa_oob_tag_info *ti); +#endif diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 59c9fd55699d..ace765ae56b3 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -4573,6 +4573,9 @@ enum skb_ext_id { #endif #if IS_ENABLED(CONFIG_MCTP_FLOWS) SKB_EXT_MCTP, +#endif +#if IS_ENABLED(CONFIG_NET_DSA_TAG_OOB) + SKB_EXT_DSA_OOB, #endif SKB_EXT_NUM, /* must be last */ }; diff --git a/include/net/dsa.h b/include/net/dsa.h index ee369670e20e..114176efacc9 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -55,6 +55,7 @@ struct phylink_link_state; #define DSA_TAG_PROTO_RTL8_4T_VALUE 25 #define DSA_TAG_PROTO_RZN1_A5PSW_VALUE 26 #define DSA_TAG_PROTO_LAN937X_VALUE 27 +#define DSA_TAG_PROTO_OOB_VALUE 28 enum dsa_tag_protocol { DSA_TAG_PROTO_NONE = DSA_TAG_PROTO_NONE_VALUE, @@ -85,6 +86,7 @@ enum dsa_tag_protocol { DSA_TAG_PROTO_RTL8_4T = DSA_TAG_PROTO_RTL8_4T_VALUE, DSA_TAG_PROTO_RZN1_A5PSW = DSA_TAG_PROTO_RZN1_A5PSW_VALUE, DSA_TAG_PROTO_LAN937X = DSA_TAG_PROTO_LAN937X_VALUE, + DSA_TAG_PROTO_OOB = DSA_TAG_PROTO_OOB_VALUE, }; struct dsa_switch; diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 42a35b59fb1e..571ef7fd95b4 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -61,8 +61,12 @@ #include <linux/if_vlan.h> #include <linux/mpls.h> #include <linux/kcov.h> +#ifdef CONFIG_NET_DSA_TAG_OOB +#include <linux/dsa/oob.h> +#endif #include <net/protocol.h> +#include <net/dsa.h> #include <net/dst.h> #include <net/sock.h> #include <net/checksum.h> @@ -4487,6 +4491,9 @@ static const u8 skb_ext_type_len[] = { #if IS_ENABLED(CONFIG_MCTP_FLOWS) [SKB_EXT_MCTP] = SKB_EXT_CHUNKSIZEOF(struct mctp_flow), #endif +#if IS_ENABLED(CONFIG_NET_DSA_TAG_OOB) + [SKB_EXT_DSA_OOB] = SKB_EXT_CHUNKSIZEOF(struct dsa_oob_tag_info), +#endif }; static __always_inline unsigned int skb_ext_total_length(void) @@ -4506,6 +4513,9 @@ static __always_inline unsigned int skb_ext_total_length(void) #endif #if IS_ENABLED(CONFIG_MCTP_FLOWS) skb_ext_type_len[SKB_EXT_MCTP] + +#endif +#if IS_ENABLED(CONFIG_NET_DSA_TAG_OOB) + skb_ext_type_len[SKB_EXT_DSA_OOB] + #endif 0; } diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig index 3eef72ce99a4..2ba4bbe07df1 100644 --- a/net/dsa/Kconfig +++ b/net/dsa/Kconfig @@ -113,6 +113,15 @@ config NET_DSA_TAG_OCELOT_8021Q this mode, less TCAM resources (VCAP IS1, IS2, ES0) are available for use with tc-flower. +config NET_DSA_TAG_OOB + select SKB_EXTENSIONS + tristate "Tag driver for Out-of-band tagging drivers" + help + Say Y or M if you want to enable support for pairs of embedded + switches and host MAC drivers which perform demultiplexing and + packet steering to ports using out of band metadata processed + by the DSA master, rather than tags present in the packets. + config NET_DSA_TAG_QCA tristate "Tag driver for Qualcomm Atheros QCA8K switches" help diff --git a/net/dsa/Makefile b/net/dsa/Makefile index bf57ef3bce2a..b11c24c969ee 100644 --- a/net/dsa/Makefile +++ b/net/dsa/Makefile @@ -22,6 +22,7 @@ obj-$(CONFIG_NET_DSA_TAG_LAN9303) += tag_lan9303.o obj-$(CONFIG_NET_DSA_TAG_MTK) += tag_mtk.o obj-$(CONFIG_NET_DSA_TAG_OCELOT) += tag_ocelot.o obj-$(CONFIG_NET_DSA_TAG_OCELOT_8021Q) += tag_ocelot_8021q.o +obj-$(CONFIG_NET_DSA_TAG_OOB) += tag_oob.o obj-$(CONFIG_NET_DSA_TAG_QCA) += tag_qca.o obj-$(CONFIG_NET_DSA_TAG_RTL4_A) += tag_rtl4_a.o obj-$(CONFIG_NET_DSA_TAG_RTL8_4) += tag_rtl8_4.o diff --git a/net/dsa/tag_oob.c b/net/dsa/tag_oob.c new file mode 100644 index 000000000000..e328a1f4e38d --- /dev/null +++ b/net/dsa/tag_oob.c @@ -0,0 +1,49 @@ +// SPDX-License-Identifier: GPL-2.0-only + +/* Copyright (c) 2022, Maxime Chevallier <maxime.chevallier@bootlin.com> */ + +#include <linux/bitfield.h> +#include <linux/dsa/oob.h> +#include <linux/skbuff.h> + +#include "dsa_priv.h" + +static struct sk_buff *oob_tag_xmit(struct sk_buff *skb, + struct net_device *dev) +{ + struct dsa_oob_tag_info *tag_info = skb_ext_add(skb, SKB_EXT_DSA_OOB); + struct dsa_port *dp = dsa_slave_to_port(dev); + + tag_info->port = dp->index; + + return skb; +} + +static struct sk_buff *oob_tag_rcv(struct sk_buff *skb, + struct net_device *dev) +{ + struct dsa_oob_tag_info *tag_info = skb_ext_find(skb, SKB_EXT_DSA_OOB); + + if (!tag_info) + return NULL; + + skb->dev = dsa_master_find_slave(dev, 0, tag_info->port); + if (!skb->dev) + return NULL; + + return skb; +} + +static const struct dsa_device_ops oob_tag_dsa_ops = { + .name = "oob", + .proto = DSA_TAG_PROTO_OOB, + .xmit = oob_tag_xmit, + .rcv = oob_tag_rcv, +}; + +MODULE_LICENSE("GPL"); +MODULE_DESCRIPTION("DSA tag driver for out-of-band tagging"); +MODULE_AUTHOR("Maxime Chevallier <maxime.chevallier@bootlin.com>"); +MODULE_ALIAS_DSA_TAG_DRIVER(DSA_TAG_PROTO_OOB); + +module_dsa_tag_driver(oob_tag_dsa_ops);