Message ID | 20230925023546.9964-1-huangjie.albert@bytedance.com |
---|---|
State | New |
Headers |
From: Albert Huang <huangjie.albert@bytedance.com>
To: Karsten Graul <kgraul@linux.ibm.com>, Wenjia Zhang <wenjia@linux.ibm.com>, Jan Karcher <jaka@linux.ibm.com>
Cc: Albert Huang <huangjie.albert@bytedance.com>, "D. Wythe" <alibuda@linux.alibaba.com>, Tony Lu <tonylu@linux.alibaba.com>, Wen Gu <guwen@linux.alibaba.com>, "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>, linux-s390@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next] net/smc: add support for netdevice in containers.
Date: Mon, 25 Sep 2023 10:35:45 +0800
Message-Id: <20230925023546.9964-1-huangjie.albert@bytedance.com>
X-Mailer: git-send-email 2.37.1 (Apple Git-137.1)
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org |
Series |
[net-next] net/smc: add support for netdevice in containers.
|
|
Commit Message
黄杰
Sept. 25, 2023, 2:35 a.m. UTC
If the netdevice is within a container and communicates externally
through network technologies like VXLAN, we won't be able to find
routing information in the init_net namespace. To address this issue,
we need to add a struct net parameter to the smc_ib_find_route function.
This allows us to locate the routing information within the corresponding
net namespace, ensuring the correct completion of the SMC CLC interaction.
Signed-off-by: Albert Huang <huangjie.albert@bytedance.com>
---
net/smc/af_smc.c | 3 ++-
net/smc/smc_ib.c | 7 ++++---
net/smc/smc_ib.h | 2 +-
3 files changed, 7 insertions(+), 5 deletions(-)
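
To illustrate why the route lookup has to happen in the socket's own namespace, here is a minimal userspace sketch in C. It is not kernel code: `toy_net`, `toy_route`, `toy_find_route` and `toy_demo` are hypothetical names invented for this illustration, and the first-match lookup is a deliberate simplification of the kernel's routing. Each namespace carries its own routing table, so a destination that is reachable from the container namespace is not found when the lookup is pinned to an (empty) stand-in for init_net.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a routing entry; addresses are in host byte order. */
struct toy_route {
	uint32_t prefix;   /* network address */
	uint32_t mask;     /* netmask */
	uint32_t gateway;  /* 0 means directly connected */
};

/* Hypothetical stand-in for struct net: every namespace owns its own table. */
struct toy_net {
	const char *name;
	const struct toy_route *routes;
	size_t nr_routes;
};

/*
 * Look up daddr in the table of the given namespace only.
 * Returns 0 and fills *gateway on the first match, -1 if there is no route.
 * (Assumption: first match instead of longest-prefix match, for brevity.)
 */
int toy_find_route(const struct toy_net *net, uint32_t daddr, uint32_t *gateway)
{
	size_t i;

	assert(net != NULL && gateway != NULL);
	for (i = 0; i < net->nr_routes; i++) {
		const struct toy_route *rt = &net->routes[i];

		if ((daddr & rt->mask) == rt->prefix) {
			*gateway = rt->gateway;
			return 0;
		}
	}
	return -1;
}

/* Returns 0 if the lookup fails in "init_net" but succeeds in the container. */
int toy_demo(void)
{
	/* 10.10.93.0/24, directly connected, visible only in the container. */
	static const struct toy_route container_routes[] = {
		{ 0x0A0A5D00u, 0xFFFFFF00u, 0 },
	};
	const struct toy_net init_ns = { "init_net", NULL, 0 };
	const struct toy_net container_ns = { "container", container_routes, 1 };
	const uint32_t daddr = 0x0A0A5D0Cu; /* 10.10.93.12 */
	uint32_t gw = 0;

	if (toy_find_route(&init_ns, daddr, &gw) != -1)
		return -1;
	if (toy_find_route(&container_ns, daddr, &gw) != 0)
		return -1;
	return 0;
}
```

In the patch itself the same idea is expressed by passing a struct net pointer into smc_ib_find_route instead of hard-coding &init_net.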
Comments
On Mon, Sep 25, 2023 at 10:35:45AM +0800, Albert Huang wrote:
> If the netdevice is within a container and communicates externally
> through network technologies like VXLAN, we won't be able to find
> routing information in the init_net namespace. To address this issue,
> we need to add a struct net parameter to the smc_ib_find_route function.
> This allows us to locate the routing information within the corresponding
> net namespace, ensuring the correct completion of the SMC CLC interaction.
>
> Signed-off-by: Albert Huang <huangjie.albert@bytedance.com>
> ---
>  net/smc/af_smc.c | 3 ++-
>  net/smc/smc_ib.c | 7 ++++---
>  net/smc/smc_ib.h | 2 +-
>  3 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index bacdd971615e..7a874da90c7f 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -1201,6 +1201,7 @@ static int smc_connect_rdma_v2_prepare(struct smc_sock *smc,
>  		(struct smc_clc_msg_accept_confirm_v2 *)aclc;
>  	struct smc_clc_first_contact_ext *fce =
>  		smc_get_clc_first_contact_ext(clc_v2, false);
> +	struct net *net = sock_net(&smc->sk);
>  	int rc;
>
>  	if (!ini->first_contact_peer || aclc->hdr.version == SMC_V1)
> @@ -1210,7 +1211,7 @@ static int smc_connect_rdma_v2_prepare(struct smc_sock *smc,
>  		memcpy(ini->smcrv2.nexthop_mac, &aclc->r0.lcl.mac, ETH_ALEN);
>  		ini->smcrv2.uses_gateway = false;
>  	} else {
> -		if (smc_ib_find_route(smc->clcsock->sk->sk_rcv_saddr,
> +		if (smc_ib_find_route(net, smc->clcsock->sk->sk_rcv_saddr,
>  				      smc_ib_gid_to_ipv4(aclc->r0.lcl.gid),
>  				      ini->smcrv2.nexthop_mac,
>  				      &ini->smcrv2.uses_gateway))
> diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
> index 9b66d6aeeb1a..89981dbe46c9 100644
> --- a/net/smc/smc_ib.c
> +++ b/net/smc/smc_ib.c
> @@ -193,7 +193,7 @@ bool smc_ib_port_active(struct smc_ib_device *smcibdev, u8 ibport)
>  	return smcibdev->pattr[ibport - 1].state == IB_PORT_ACTIVE;
>  }
>
> -int smc_ib_find_route(__be32 saddr, __be32 daddr,
> +int smc_ib_find_route(struct net *net, __be32 saddr, __be32 daddr,
>  		      u8 nexthop_mac[], u8 *uses_gateway)
>  {
>  	struct neighbour *neigh = NULL;
> @@ -205,7 +205,7 @@ int smc_ib_find_route(__be32 saddr, __be32 daddr,
>
>  	if (daddr == cpu_to_be32(INADDR_NONE))
>  		goto out;
> -	rt = ip_route_output_flow(&init_net, &fl4, NULL);
> +	rt = ip_route_output_flow(net, &fl4, NULL);

This patch made me wonder, why doesn't SMC use RDMA-CM like all other
in-kernel ULPs which work over RDMA?

Thanks

On 26.09.23 12:48, Leon Romanovsky wrote:
> This patch made me wonder, why doesn't SMC use RDMA-CM like all other
> in-kernel ULPs which work over RDMA?
>
> Thanks

The idea behind SMC is that it should look and feel to the applications
like TCP sockets. So for connection management it uses TCP over IP;
RDMA is just used for the data transfer.

On Tue, Sep 26, 2023 at 01:14:04PM +0200, Alexandra Winter wrote:
>
>
> On 26.09.23 12:48, Leon Romanovsky wrote:
> > This patch made me wonder, why doesn't SMC use RDMA-CM like all other
> > in-kernel ULPs which work over RDMA?
> >
> > Thanks
>
> The idea behind SMC is that it should look and feel to the applications
> like TCP sockets. So for connection management it uses TCP over IP;
> RDMA is just used for the data transfer.

I think that it is not different from other ULPs. For example, RDS works
over sockets and doesn't touch or reimplement GID management logic.

Thanks

On Tue, Sep 26, 2023 at 02:41:04PM +0300, Leon Romanovsky wrote:
>On Tue, Sep 26, 2023 at 01:14:04PM +0200, Alexandra Winter wrote:
>>
>>
>> On 26.09.23 12:48, Leon Romanovsky wrote:
>> > This patch made me wonder, why doesn't SMC use RDMA-CM like all other
>> > in-kernel ULPs which work over RDMA?
>> >
>> > Thanks
>>
>> The idea behind SMC is that it should look and feel to the applications
>> like TCP sockets. So for connection management it uses TCP over IP;
>> RDMA is just used for the data transfer.
>
>I think that it is not different from other ULPs. For example, RDS works
>over sockets and doesn't touch or reimplement GID management logic.

I think the difference is that an SMC socket needs to be compatible with a
TCP socket, so it needs a TCP socket to fall back to when something is not
working.

If SMC worked with rdmacm, it would still need a fallback-to-TCP socket, and
the TCP connection would have to be established for each SMC socket before
the SMC socket is established, which would make rdmacm meaningless.

Best regards,
Dust

>
>Thanks

On Tue, Sep 26, 2023 at 08:09:03PM +0800, Dust Li wrote:
> On Tue, Sep 26, 2023 at 02:41:04PM +0300, Leon Romanovsky wrote:
> >On Tue, Sep 26, 2023 at 01:14:04PM +0200, Alexandra Winter wrote:
> >>
> >>
> >> On 26.09.23 12:48, Leon Romanovsky wrote:
> >> > This patch made me wonder, why doesn't SMC use RDMA-CM like all other
> >> > in-kernel ULPs which work over RDMA?
> >> >
> >> > Thanks
> >>
> >> The idea behind SMC is that it should look and feel to the applications
> >> like TCP sockets. So for connection management it uses TCP over IP;
> >> RDMA is just used for the data transfer.
> >
> >I think that it is not different from other ULPs. For example, RDS works
> >over sockets and doesn't touch or reimplement GID management logic.
>
> I think the difference is that an SMC socket needs to be compatible with a
> TCP socket, so it needs a TCP socket to fall back to when something is not
> working.
>
> If SMC worked with rdmacm, it would still need a fallback-to-TCP socket, and
> the TCP connection would have to be established for each SMC socket before
> the SMC socket is established, which would make rdmacm meaningless.

You still need to perform device-GID-route translations [1], which sounds
very much like RDMA-CM to me. I'm not asking you to rewrite the code, just
trying to understand the rationale behind reimplementing part of the RDMA
subsystem.

Thanks

[1] 24fb68111d45 ("net/smc: retrieve v2 gid from IB device")

>
> Best regards,
> Dust
>
> >
> >Thanks

On Mon, Sep 25, 2023 at 10:35:45AM +0800, Albert Huang wrote:
>If the netdevice is within a container and communicates externally
>through network technologies like VXLAN, we won't be able to find
>routing information in the init_net namespace. To address this issue,

Thanks for finding this!

I think this is a more generic problem, not just related to VXLAN?
If we use SMC-R v2 and the netdevice is in a net namespace other than
init_net, we would always fail, right? If so, I'd prefer this to be a bugfix.

Best regards,
Dust

>we need to add a struct net parameter to the smc_ib_find_route function.
>This allows us to locate the routing information within the corresponding
>net namespace, ensuring the correct completion of the SMC CLC interaction.
>
>Signed-off-by: Albert Huang <huangjie.albert@bytedance.com>
>---
> net/smc/af_smc.c | 3 ++-
> net/smc/smc_ib.c | 7 ++++---
> net/smc/smc_ib.h | 2 +-
> 3 files changed, 7 insertions(+), 5 deletions(-)
>
>diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>index bacdd971615e..7a874da90c7f 100644
>--- a/net/smc/af_smc.c
>+++ b/net/smc/af_smc.c
>@@ -1201,6 +1201,7 @@ static int smc_connect_rdma_v2_prepare(struct smc_sock *smc,
> 		(struct smc_clc_msg_accept_confirm_v2 *)aclc;
> 	struct smc_clc_first_contact_ext *fce =
> 		smc_get_clc_first_contact_ext(clc_v2, false);
>+	struct net *net = sock_net(&smc->sk);
> 	int rc;
>
> 	if (!ini->first_contact_peer || aclc->hdr.version == SMC_V1)
>@@ -1210,7 +1211,7 @@ static int smc_connect_rdma_v2_prepare(struct smc_sock *smc,
> 		memcpy(ini->smcrv2.nexthop_mac, &aclc->r0.lcl.mac, ETH_ALEN);
> 		ini->smcrv2.uses_gateway = false;
> 	} else {
>-		if (smc_ib_find_route(smc->clcsock->sk->sk_rcv_saddr,
>+		if (smc_ib_find_route(net, smc->clcsock->sk->sk_rcv_saddr,
> 				      smc_ib_gid_to_ipv4(aclc->r0.lcl.gid),
> 				      ini->smcrv2.nexthop_mac,
> 				      &ini->smcrv2.uses_gateway))
>diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
>index 9b66d6aeeb1a..89981dbe46c9 100644
>--- a/net/smc/smc_ib.c
>+++ b/net/smc/smc_ib.c
>@@ -193,7 +193,7 @@ bool smc_ib_port_active(struct smc_ib_device *smcibdev, u8 ibport)
> 	return smcibdev->pattr[ibport - 1].state == IB_PORT_ACTIVE;
> }
>
>-int smc_ib_find_route(__be32 saddr, __be32 daddr,
>+int smc_ib_find_route(struct net *net, __be32 saddr, __be32 daddr,
> 		      u8 nexthop_mac[], u8 *uses_gateway)
> {
> 	struct neighbour *neigh = NULL;
>@@ -205,7 +205,7 @@ int smc_ib_find_route(__be32 saddr, __be32 daddr,
>
> 	if (daddr == cpu_to_be32(INADDR_NONE))
> 		goto out;
>-	rt = ip_route_output_flow(&init_net, &fl4, NULL);
>+	rt = ip_route_output_flow(net, &fl4, NULL);
> 	if (IS_ERR(rt))
> 		goto out;
> 	if (rt->rt_uses_gateway && rt->rt_gw_family != AF_INET)
>@@ -235,6 +235,7 @@ static int smc_ib_determine_gid_rcu(const struct net_device *ndev,
> 	if (smcrv2 && attr->gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP &&
> 	    smc_ib_gid_to_ipv4((u8 *)&attr->gid) != cpu_to_be32(INADDR_NONE)) {
> 		struct in_device *in_dev = __in_dev_get_rcu(ndev);
>+		struct net *net = dev_net(ndev);
> 		const struct in_ifaddr *ifa;
> 		bool subnet_match = false;
>
>@@ -248,7 +249,7 @@ static int smc_ib_determine_gid_rcu(const struct net_device *ndev,
> 		}
> 		if (!subnet_match)
> 			goto out;
>-		if (smcrv2->daddr && smc_ib_find_route(smcrv2->saddr,
>+		if (smcrv2->daddr && smc_ib_find_route(net, smcrv2->saddr,
> 						       smcrv2->daddr,
> 						       smcrv2->nexthop_mac,
> 						       &smcrv2->uses_gateway))
>diff --git a/net/smc/smc_ib.h b/net/smc/smc_ib.h
>index 4df5f8c8a0a1..ef8ac2b7546d 100644
>--- a/net/smc/smc_ib.h
>+++ b/net/smc/smc_ib.h
>@@ -112,7 +112,7 @@ void smc_ib_sync_sg_for_device(struct smc_link *lnk,
> int smc_ib_determine_gid(struct smc_ib_device *smcibdev, u8 ibport,
> 			 unsigned short vlan_id, u8 gid[], u8 *sgid_index,
> 			 struct smc_init_info_smcrv2 *smcrv2);
>-int smc_ib_find_route(__be32 saddr, __be32 daddr,
>+int smc_ib_find_route(struct net *net, __be32 saddr, __be32 daddr,
> 		      u8 nexthop_mac[], u8 *uses_gateway);
> bool smc_ib_is_valid_local_systemid(void);
> int smcr_nl_get_device(struct sk_buff *skb, struct netlink_callback *cb);
>--
>2.37.1 (Apple Git-137.1)

On Wed, Sep 27, 2023 at 11:42:09AM +0800, Dust Li wrote:
> On Mon, Sep 25, 2023 at 10:35:45AM +0800, Albert Huang wrote:
> >If the netdevice is within a container and communicates externally
> >through network technologies like VXLAN, we won't be able to find
> >routing information in the init_net namespace. To address this issue,
>
> Thanks for finding this!
>
> I think this is a more generic problem, not just related to VXLAN?
> If we use SMC-R v2 and the netdevice is in a net namespace other than
> init_net, we would always fail, right? If so, I'd prefer this to be a bugfix.

BTW, does this patch take into account the net namespace of the ib_device?

Thanks

>
> Best regards,
> Dust
>
> >we need to add a struct net parameter to the smc_ib_find_route function.
> >This allows us to locate the routing information within the corresponding
> >net namespace, ensuring the correct completion of the SMC CLC interaction.
> >
> >Signed-off-by: Albert Huang <huangjie.albert@bytedance.com>

On Wed, Sep 27, 2023 at 08:55:28AM +0300, Leon Romanovsky wrote:
>On Wed, Sep 27, 2023 at 11:42:09AM +0800, Dust Li wrote:
>> On Mon, Sep 25, 2023 at 10:35:45AM +0800, Albert Huang wrote:
>> >If the netdevice is within a container and communicates externally
>> >through network technologies like VXLAN, we won't be able to find
>> >routing information in the init_net namespace. To address this issue,
>>
>> Thanks for finding this!
>>
>> I think this is a more generic problem, not just related to VXLAN?
>> If we use SMC-R v2 and the netdevice is in a net namespace other than
>> init_net, we would always fail, right? If so, I'd prefer this to be a bugfix.
>
>BTW, does this patch take into account the net namespace of the ib_device?

I think this patch is unrelated to the netns of the ib_device.

SMC has a global smc_ib_devices list reported by ib_client, and checks
the netns using rdma_dev_access_netns. So I think we already handle
that well.

Best regards,
Dust

>
>Thanks
>
>>
>> Best regards,
>> Dust
>>
>> >we need to add a struct net parameter to the smc_ib_find_route function.
>> >This allows us to locate the routing information within the corresponding
>> >net namespace, ensuring the correct completion of the SMC CLC interaction.
>> >
>> >Signed-off-by: Albert Huang <huangjie.albert@bytedance.com>

Leon Romanovsky <leon@kernel.org> wrote on Wed, Sep 27, 2023 at 13:55:
>
> On Wed, Sep 27, 2023 at 11:42:09AM +0800, Dust Li wrote:
> > On Mon, Sep 25, 2023 at 10:35:45AM +0800, Albert Huang wrote:
> > >If the netdevice is within a container and communicates externally
> > >through network technologies like VXLAN, we won't be able to find
> > >routing information in the init_net namespace. To address this issue,
> >
> > Thanks for finding this!
> >
> > I think this is a more generic problem, not just related to VXLAN?
> > If we use SMC-R v2 and the netdevice is in a net namespace other than
> > init_net, we would always fail, right? If so, I'd prefer this to be a bugfix.
>
> BTW, does this patch take into account the net namespace of the ib_device?
>
> Thanks
>

As Dust said, the ib_device case is already handled:

bool rdma_dev_access_netns(const struct ib_device *dev, const struct net *net)
{
	return (ib_devices_shared_netns ||
		net_eq(read_pnet(&dev->coredev.rdma_net), net));
}
EXPORT_SYMBOL(rdma_dev_access_netns);

Thanks!
BR
Albert.

> >
> > Best regards,
> > Dust
> >
> > >we need to add a struct net parameter to the smc_ib_find_route function.
> > >This allows us to locate the routing information within the corresponding
> > >net namespace, ensuring the correct completion of the SMC CLC interaction.
> > >
> > >Signed-off-by: Albert Huang <huangjie.albert@bytedance.com>

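
The check quoted in the reply above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: `demo_ns`, `demo_ib_dev`, `demo_shared_netns` and `demo_dev_access_netns` are hypothetical names, the global flag stands in for ib_devices_shared_netns, and a plain pointer comparison stands in for net_eq on the device's rdma_net. In shared mode every namespace may use the device; in exclusive mode only the namespace that owns the device may.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net. */
struct demo_ns {
	int id;
};

/* Hypothetical stand-in for struct ib_device: remembers its owning namespace. */
struct demo_ib_dev {
	const struct demo_ns *owner;
};

/* Models ib_devices_shared_netns: true = shared mode, false = exclusive mode. */
bool demo_shared_netns = true;

/*
 * Mirrors the shape of the quoted kernel helper: access is granted when
 * devices are shared across namespaces, or when the caller's namespace is
 * the one that owns the device.
 */
bool demo_dev_access_netns(const struct demo_ib_dev *dev, const struct demo_ns *ns)
{
	assert(dev != NULL && ns != NULL);
	return demo_shared_netns || dev->owner == ns;
}
```

With the flag cleared, a device owned by one namespace is rejected for any other namespace, which is the situation to keep in mind when RDMA devices have been moved into per-container namespaces.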
On Wed, Sep 27, 2023 at 08:17:40PM +0800, Dust Li wrote:
> On Wed, Sep 27, 2023 at 08:55:28AM +0300, Leon Romanovsky wrote:
> >On Wed, Sep 27, 2023 at 11:42:09AM +0800, Dust Li wrote:
> >> On Mon, Sep 25, 2023 at 10:35:45AM +0800, Albert Huang wrote:
> >> >If the netdevice is within a container and communicates externally
> >> >through network technologies like VXLAN, we won't be able to find
> >> >routing information in the init_net namespace. To address this issue,
> >>
> >> Thanks for finding this!
> >>
> >> I think this is a more generic problem, not just related to VXLAN?
> >> If we use SMC-R v2 and the netdevice is in a net namespace other than
> >> init_net, we would always fail, right? If so, I'd prefer this to be a bugfix.
> >
> >BTW, does this patch take into account the net namespace of the ib_device?
>
> I think this patch is unrelated to the netns of the ib_device.
>
> SMC has a global smc_ib_devices list reported by ib_client, and checks
> the netns using rdma_dev_access_netns. So I think we already handle
> that well.

OK, I see.

Thanks,
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>

On Mon, 2023-09-25 at 10:35 +0800, Albert Huang wrote:
> If the netdevice is within a container and communicates externally
> through network technologies like VXLAN, we won't be able to find
> routing information in the init_net namespace. To address this issue,
> we need to add a struct net parameter to the smc_ib_find_route function.
> This allows us to locate the routing information within the corresponding
> net namespace, ensuring the correct completion of the SMC CLC interaction.
>
> Signed-off-by: Albert Huang <huangjie.albert@bytedance.com>
> ---
>  net/smc/af_smc.c | 3 ++-
>  net/smc/smc_ib.c | 7 ++++---
>  net/smc/smc_ib.h | 2 +-
>  3 files changed, 7 insertions(+), 5 deletions(-)
>

I'm trying to test this patch on s390x but I'm running into the same
issue I ran into with the original SMC namespace
support: https://lore.kernel.org/netdev/8701fa4557026983a9ec687cfdd7ac5b3b85fd39.camel@linux.ibm.com/

Just like back then I'm using a server and a client network namespace
on the same system with two ConnectX-4 VFs from the same card and port.
Both TCP/IP traffic as well as user-space RDMA via "qperf … rc_bw" and
"qperf … rc_lat" work between namespaces and definitely go via the
card.

I did use "rdma system set netns exclusive" then moved the RDMA devices
into the namespaces with "rdma dev set <rdma_dev> netns <namespace>". I
also verified with "ip netns exec <namespace> rdma dev"
that the RDMA devices are in the network namespace and, as seen by the
qperf runs, normal RDMA does work.

For reference the smc_chck tool gives me the following output:

Server started on port 37373
[DEBUG] Interfaces to check: eno4378
Test with target IP 10.10.93.12 and port 37373
  Live test (SMC-D and SMC-R)
[DEBUG] Running client: smc_run /tmp/echo-clt.x0q8iO 10.10.93.12 -p 37373
[DEBUG] Client result: TCP 0x05000000/0x03030000
     Failed (TCP fallback), reasons:
          Client: 0x05000000 Peer declined during handshake
          Server: 0x03030000 No SMC devices found (R and D)

I also checked that SMC is generally working: once I add an ISM device
I do get SMC-D between the namespaces. Any ideas what could break SMC-R
here?

Thanks,
Niklas
On Wed, 2023-09-27 at 11:42 +0800, Dust Li wrote:
> On Mon, Sep 25, 2023 at 10:35:45AM +0800, Albert Huang wrote:
> > If the netdevice is within a container and communicates externally
> > through network technologies like VXLAN, we won't be able to find
> > routing information in the init_net namespace. To address this issue,
>
> Thanks for your finding!
>
> I think this is a more generic problem, not just related to VXLAN?
> If we use SMC-R v2 and the netdevice is in a net namespace which is not
> init_net, we should always fail, right? If so, I'd prefer this to be a bugfix.

Re-stating the above to be on the same page: the patch should be
re-posted targeting the net tree, and including a suitable Fixes tag.

@Dust Li: please correct me if I misread you.

Thanks,

Paolo
On Tue, Oct 03, 2023 at 12:41:25PM +0200, Paolo Abeni wrote:
>On Wed, 2023-09-27 at 11:42 +0800, Dust Li wrote:
>> On Mon, Sep 25, 2023 at 10:35:45AM +0800, Albert Huang wrote:
>> > If the netdevice is within a container and communicates externally
>> > through network technologies like VXLAN, we won't be able to find
>> > routing information in the init_net namespace. To address this issue,
>>
>> Thanks for your finding!
>>
>> I think this is a more generic problem, not just related to VXLAN?
>> If we use SMC-R v2 and the netdevice is in a net namespace which is not
>> init_net, we should always fail, right? If so, I'd prefer this to be a bugfix.
>
>Re-stating the above to be on the same page: the patch should be
>re-posted targeting the net tree, and including a suitable Fixes tag.
>
>@Dust Li: please correct me if I misread you.

Right, this is exactly what I mean.

Best regards,
Dust

>
>Thanks,
>
>Paolo
On Thu, Sep 28, 2023 at 05:04:21PM +0200, Niklas Schnelle wrote:
>On Mon, 2023-09-25 at 10:35 +0800, Albert Huang wrote:
>> If the netdevice is within a container and communicates externally
>> through network technologies like VXLAN, we won't be able to find
>> routing information in the init_net namespace. To address this issue,
>> we need to add a struct net parameter to the smc_ib_find_route function.
>> This allows us to locate the routing information within the corresponding
>> net namespace, ensuring the correct completion of the SMC CLC interaction.
>>
>> Signed-off-by: Albert Huang <huangjie.albert@bytedance.com>
>> ---
>>  net/smc/af_smc.c | 3 ++-
>>  net/smc/smc_ib.c | 7 ++++---
>>  net/smc/smc_ib.h | 2 +-
>>  3 files changed, 7 insertions(+), 5 deletions(-)
>>
>
>I'm trying to test this patch on s390x but I'm running into the same
>issue I ran into with the original SMC namespace
>support: https://lore.kernel.org/netdev/8701fa4557026983a9ec687cfdd7ac5b3b85fd39.camel@linux.ibm.com/
>
>Just like back then I'm using a server and a client network namespace
>on the same system with two ConnectX-4 VFs from the same card and port.
>Both TCP/IP traffic as well as user-space RDMA via "qperf … rc_bw" and
>"qperf … rc_lat" work between namespaces and definitely go via the
>card.
>
>I did use "rdma system set netns exclusive" then moved the RDMA devices
>into the namespaces with "rdma dev set <rdma_dev> netns <namespace>". I
>also verified with "ip netns exec <namespace> rdma dev"
>that the RDMA devices are in the network namespace and, as seen by the
>qperf runs, normal RDMA does work.
>
>For reference the smc_chck tool gives me the following output:
>
>Server started on port 37373
>[DEBUG] Interfaces to check: eno4378
>Test with target IP 10.10.93.12 and port 37373
>  Live test (SMC-D and SMC-R)
>[DEBUG] Running client: smc_run /tmp/echo-clt.x0q8iO 10.10.93.12 -p 37373
>[DEBUG] Client result: TCP 0x05000000/0x03030000
>     Failed (TCP fallback), reasons:
>          Client: 0x05000000 Peer declined during handshake
>          Server: 0x03030000 No SMC devices found (R and D)
>
>I also checked that SMC is generally working: once I add an ISM device
>I do get SMC-D between the namespaces. Any ideas what could break SMC-R
>here?

I missed the email :(

Are you running SMC-Rv2 or v1?

Best regards,
Dust

>
>Thanks,
>Niklas
On Wed, Oct 11, 2023 at 10:48:16PM +0800, Dust Li wrote:
>On Thu, Sep 28, 2023 at 05:04:21PM +0200, Niklas Schnelle wrote:
>>On Mon, 2023-09-25 at 10:35 +0800, Albert Huang wrote:
>>> If the netdevice is within a container and communicates externally
>>> through network technologies like VXLAN, we won't be able to find
>>> routing information in the init_net namespace. To address this issue,
>>> we need to add a struct net parameter to the smc_ib_find_route function.
>>> This allows us to locate the routing information within the corresponding
>>> net namespace, ensuring the correct completion of the SMC CLC interaction.
>>>
>>> Signed-off-by: Albert Huang <huangjie.albert@bytedance.com>
>>> ---
>>>  net/smc/af_smc.c | 3 ++-
>>>  net/smc/smc_ib.c | 7 ++++---
>>>  net/smc/smc_ib.h | 2 +-
>>>  3 files changed, 7 insertions(+), 5 deletions(-)
>>>
>>
>>I'm trying to test this patch on s390x but I'm running into the same
>>issue I ran into with the original SMC namespace
>>support: https://lore.kernel.org/netdev/8701fa4557026983a9ec687cfdd7ac5b3b85fd39.camel@linux.ibm.com/
>>
>>Just like back then I'm using a server and a client network namespace
>>on the same system with two ConnectX-4 VFs from the same card and port.
>>Both TCP/IP traffic as well as user-space RDMA via "qperf … rc_bw" and
>>"qperf … rc_lat" work between namespaces and definitely go via the
>>card.
>>
>>I did use "rdma system set netns exclusive" then moved the RDMA devices
>>into the namespaces with "rdma dev set <rdma_dev> netns <namespace>". I
>>also verified with "ip netns exec <namespace> rdma dev"
>>that the RDMA devices are in the network namespace and, as seen by the
>>qperf runs, normal RDMA does work.
>>
>>For reference the smc_chck tool gives me the following output:
>>
>>Server started on port 37373
>>[DEBUG] Interfaces to check: eno4378
>>Test with target IP 10.10.93.12 and port 37373
>>  Live test (SMC-D and SMC-R)
>>[DEBUG] Running client: smc_run /tmp/echo-clt.x0q8iO 10.10.93.12 -p 37373
>>[DEBUG] Client result: TCP 0x05000000/0x03030000
>>     Failed (TCP fallback), reasons:
>>          Client: 0x05000000 Peer declined during handshake
>>          Server: 0x03030000 No SMC devices found (R and D)
>>
>>I also checked that SMC is generally working: once I add an ISM device
>>I do get SMC-D between the namespaces. Any ideas what could break SMC-R
>>here?
>
>I missed the email :(
>
>Are you running SMC-Rv2 or v1?

Hi Niklas,

I tried your test today, and I encountered the same issue.
But I found it's because my 2 VFs are in different subnets:
SMC-Rv2 works fine, SMC-Rv1 won't work, which is expected.
When I set the 2 VFs in the same subnet, SMC-Rv1 also works.

So I'm not sure it's the same for you. Can you check it out?

BTW, the fallback reason (SMC_CLC_DECL_NOSMCDEV) in this case
is really not friendly, it's better to return SMC_CLC_DECL_DIFFPREFIX.

Best regards,
Dust

>
>Best regards,
>Dust
>
>
>>
>>Thanks,
>>Niklas
On 12.10.23 14:17, Dust Li wrote:
> On Wed, Oct 11, 2023 at 10:48:16PM +0800, Dust Li wrote:
>> On Thu, Sep 28, 2023 at 05:04:21PM +0200, Niklas Schnelle wrote:
>>> On Mon, 2023-09-25 at 10:35 +0800, Albert Huang wrote:
>>>> If the netdevice is within a container and communicates externally
>>>> through network technologies like VXLAN, we won't be able to find
>>>> routing information in the init_net namespace. To address this issue,
>>>> we need to add a struct net parameter to the smc_ib_find_route function.
>>>> This allows us to locate the routing information within the corresponding
>>>> net namespace, ensuring the correct completion of the SMC CLC interaction.
>>>>
>>>> Signed-off-by: Albert Huang <huangjie.albert@bytedance.com>
>>>> ---
>>>>  net/smc/af_smc.c | 3 ++-
>>>>  net/smc/smc_ib.c | 7 ++++---
>>>>  net/smc/smc_ib.h | 2 +-
>>>>  3 files changed, 7 insertions(+), 5 deletions(-)
>>>>
>>>
>>> I'm trying to test this patch on s390x but I'm running into the same
>>> issue I ran into with the original SMC namespace
>>> support: https://lore.kernel.org/netdev/8701fa4557026983a9ec687cfdd7ac5b3b85fd39.camel@linux.ibm.com/
>>>
>>> Just like back then I'm using a server and a client network namespace
>>> on the same system with two ConnectX-4 VFs from the same card and port.
>>> Both TCP/IP traffic as well as user-space RDMA via "qperf … rc_bw" and
>>> "qperf … rc_lat" work between namespaces and definitely go via the
>>> card.
>>>
>>> I did use "rdma system set netns exclusive" then moved the RDMA devices
>>> into the namespaces with "rdma dev set <rdma_dev> netns <namespace>". I
>>> also verified with "ip netns exec <namespace> rdma dev"
>>> that the RDMA devices are in the network namespace and, as seen by the
>>> qperf runs, normal RDMA does work.
>>>
>>> For reference the smc_chck tool gives me the following output:
>>>
>>> Server started on port 37373
>>> [DEBUG] Interfaces to check: eno4378
>>> Test with target IP 10.10.93.12 and port 37373
>>>   Live test (SMC-D and SMC-R)
>>> [DEBUG] Running client: smc_run /tmp/echo-clt.x0q8iO 10.10.93.12 -p 37373
>>> [DEBUG] Client result: TCP 0x05000000/0x03030000
>>>      Failed (TCP fallback), reasons:
>>>           Client: 0x05000000 Peer declined during handshake
>>>           Server: 0x03030000 No SMC devices found (R and D)
>>>
>>> I also checked that SMC is generally working: once I add an ISM device
>>> I do get SMC-D between the namespaces. Any ideas what could break SMC-R
>>> here?
>>
>> I missed the email :(
>>
>> Are you running SMC-Rv2 or v1?
>
> Hi Niklas,
>
> I tried your test today, and I encountered the same issue.
> But I found it's because my 2 VFs are in different subnets:
> SMC-Rv2 works fine, SMC-Rv1 won't work, which is expected.
> When I set the 2 VFs in the same subnet, SMC-Rv1 also works.
>
> So I'm not sure it's the same for you. Can you check it out?
>
> BTW, the fallback reason (SMC_CLC_DECL_NOSMCDEV) in this case
> is really not friendly, it's better to return SMC_CLC_DECL_DIFFPREFIX.
>
> Best regards,
> Dust
>

Thank you, Dust, for trying it out! The reason code SMC_CLC_DECL_NOSMCDEV
there could really be misleading.

>
>>
>> Best regards,
>> Dust
>>
>>
>>>
>>> Thanks,
>>> Niklas
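To make the reason codes discussed above easier to read, here is a minimal userspace sketch of a decoder. It is not part of the patch or of smc-tools. The texts for 0x05000000 and 0x03030000 are the ones in the smc_chck output quoted in this thread; the numeric value used for SMC_CLC_DECL_DIFFPREFIX is an assumption recalled from net/smc/smc_clc.h and should be checked against your tree. The helper name decl_reason_text() and the table are hypothetical.

```c
#include <assert.h>	/* for callers that want to assert on the result */
#include <stddef.h>
#include <stdint.h>

/* Decline reason codes. NOSMCDEV and PEERDECL match the smc_chck output
 * quoted in this thread; the DIFFPREFIX value is an assumption that
 * should be verified against net/smc/smc_clc.h. */
#define SMC_CLC_DECL_NOSMCDEV	0x03030000u
#define SMC_CLC_DECL_DIFFPREFIX	0x03070000u
#define SMC_CLC_DECL_PEERDECL	0x05000000u

struct decl_reason {
	uint32_t code;
	const char *text;
};

/* Hypothetical lookup table, for illustration only. */
static const struct decl_reason decl_reasons[] = {
	{ SMC_CLC_DECL_NOSMCDEV,   "No SMC devices found (R and D)" },
	{ SMC_CLC_DECL_DIFFPREFIX, "IP prefix / subnet mismatch" },
	{ SMC_CLC_DECL_PEERDECL,   "Peer declined during handshake" },
};

/* Hypothetical helper: return a description for a decline reason code,
 * or NULL if the code is not in the table above. */
static const char *decl_reason_text(uint32_t code)
{
	size_t i;

	for (i = 0; i < sizeof(decl_reasons) / sizeof(decl_reasons[0]); i++) {
		if (decl_reasons[i].code == code)
			return decl_reasons[i].text;
	}
	return NULL;
}
```

With such a table, the server side of the failing run would report the subnet mismatch directly if the kernel returned SMC_CLC_DECL_DIFFPREFIX instead of SMC_CLC_DECL_NOSMCDEV, which is the change suggested above.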
On Thu, 2023-10-12 at 20:17 +0800, Dust Li wrote:
> On Wed, Oct 11, 2023 at 10:48:16PM +0800, Dust Li wrote:
> > On Thu, Sep 28, 2023 at 05:04:21PM +0200, Niklas Schnelle wrote:
> > > On Mon, 2023-09-25 at 10:35 +0800, Albert Huang wrote:
> > > > If the netdevice is within a container and communicates externally
> > > > through network technologies like VXLAN, we won't be able to find
> > > > routing information in the init_net namespace. To address this issue,
> > > > we need to add a struct net parameter to the smc_ib_find_route function.
> > > > This allows us to locate the routing information within the corresponding
> > > > net namespace, ensuring the correct completion of the SMC CLC interaction.
> > > >
> > > > Signed-off-by: Albert Huang <huangjie.albert@bytedance.com>
> > > > ---
> > > >  net/smc/af_smc.c | 3 ++-
> > > >  net/smc/smc_ib.c | 7 ++++---
> > > >  net/smc/smc_ib.h | 2 +-
> > > >  3 files changed, 7 insertions(+), 5 deletions(-)
> > > >
> > >
> > > I'm trying to test this patch on s390x but I'm running into the same
> > > issue I ran into with the original SMC namespace
> > > support: https://lore.kernel.org/netdev/8701fa4557026983a9ec687cfdd7ac5b3b85fd39.camel@linux.ibm.com/
> > >
> > > Just like back then I'm using a server and a client network namespace
> > > on the same system with two ConnectX-4 VFs from the same card and port.
> > > Both TCP/IP traffic as well as user-space RDMA via "qperf … rc_bw" and
> > > "qperf … rc_lat" work between namespaces and definitely go via the
> > > card.
> > >
> > > I did use "rdma system set netns exclusive" then moved the RDMA devices
> > > into the namespaces with "rdma dev set <rdma_dev> netns <namespace>". I
> > > also verified with "ip netns exec <namespace> rdma dev"
> > > that the RDMA devices are in the network namespace and, as seen by the
> > > qperf runs, normal RDMA does work.
> > >
> > > For reference the smc_chck tool gives me the following output:
> > >
> > > Server started on port 37373
> > > [DEBUG] Interfaces to check: eno4378
> > > Test with target IP 10.10.93.12 and port 37373
> > >   Live test (SMC-D and SMC-R)
> > > [DEBUG] Running client: smc_run /tmp/echo-clt.x0q8iO 10.10.93.12 -p 37373
> > > [DEBUG] Client result: TCP 0x05000000/0x03030000
> > >      Failed (TCP fallback), reasons:
> > >           Client: 0x05000000 Peer declined during handshake
> > >           Server: 0x03030000 No SMC devices found (R and D)
> > >
> > > I also checked that SMC is generally working: once I add an ISM device
> > > I do get SMC-D between the namespaces. Any ideas what could break SMC-R
> > > here?
> >
> > I missed the email :(
> >
> > Are you running SMC-Rv2 or v1?
>
> Hi Niklas,
>
> I tried your test today, and I encountered the same issue.
> But I found it's because my 2 VFs are in different subnets:
> SMC-Rv2 works fine, SMC-Rv1 won't work, which is expected.
> When I set the 2 VFs in the same subnet, SMC-Rv1 also works.
>
> So I'm not sure it's the same for you. Can you check it out?
>
> BTW, the fallback reason (SMC_CLC_DECL_NOSMCDEV) in this case
> is really not friendly, it's better to return SMC_CLC_DECL_DIFFPREFIX.
>
> Best regards,
> Dust

I think you are right. I did use two consecutive private IPs but I had
set the subnet mask to /32. Setting that to /16, the SMC-R connection is
established. I'll work with Wenjia and Jan on why my system is
defaulting to SMC-Rv1; I would have hoped to get SMC-Rv2.

Thanks for your insights!
Niklas
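The /32 versus /16 observation above comes down to a prefix comparison. The following is a minimal userspace sketch of that comparison, not the kernel's SMC code. The helper names prefix_to_mask() and same_subnet() are hypothetical, and the address 10.10.93.11 used in the usage note is an assumed neighbour of the 10.10.93.12 target shown in the test output.

```c
#include <arpa/inet.h>	/* inet_pton(), ntohl() */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not from the kernel): build a host-order netmask
 * from a prefix length in the range 0..32. */
static uint32_t prefix_to_mask(unsigned int prefix_len)
{
	assert(prefix_len <= 32);
	/* Shifting a 32-bit value by 32 is undefined, so /0 is special-cased. */
	if (prefix_len == 0)
		return 0;
	return (uint32_t)(UINT32_MAX << (32 - prefix_len));
}

/* Hypothetical helper: true if both dotted-quad IPv4 addresses fall into
 * the same subnet for the given prefix length. Returns false if either
 * address cannot be parsed. */
static bool same_subnet(const char *a, const char *b, unsigned int prefix_len)
{
	struct in_addr ia, ib;
	uint32_t mask = prefix_to_mask(prefix_len);

	if (inet_pton(AF_INET, a, &ia) != 1 || inet_pton(AF_INET, b, &ib) != 1)
		return false;
	/* Compare the network parts in host byte order. */
	return (ntohl(ia.s_addr) & mask) == (ntohl(ib.s_addr) & mask);
}
```

Under these assumptions, same_subnet("10.10.93.11", "10.10.93.12", 32) is false because a /32 prefix puts every address in its own subnet, while the same call with a prefix length of 16 is true, which matches the behaviour reported for SMC-Rv1 above.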
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index bacdd971615e..7a874da90c7f 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -1201,6 +1201,7 @@ static int smc_connect_rdma_v2_prepare(struct smc_sock *smc,
 		(struct smc_clc_msg_accept_confirm_v2 *)aclc;
 	struct smc_clc_first_contact_ext *fce =
 		smc_get_clc_first_contact_ext(clc_v2, false);
+	struct net *net = sock_net(&smc->sk);
 	int rc;
 
 	if (!ini->first_contact_peer || aclc->hdr.version == SMC_V1)
@@ -1210,7 +1211,7 @@ static int smc_connect_rdma_v2_prepare(struct smc_sock *smc,
 		memcpy(ini->smcrv2.nexthop_mac, &aclc->r0.lcl.mac, ETH_ALEN);
 		ini->smcrv2.uses_gateway = false;
 	} else {
-		if (smc_ib_find_route(smc->clcsock->sk->sk_rcv_saddr,
+		if (smc_ib_find_route(net, smc->clcsock->sk->sk_rcv_saddr,
 				      smc_ib_gid_to_ipv4(aclc->r0.lcl.gid),
 				      ini->smcrv2.nexthop_mac,
 				      &ini->smcrv2.uses_gateway))
diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index 9b66d6aeeb1a..89981dbe46c9 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -193,7 +193,7 @@ bool smc_ib_port_active(struct smc_ib_device *smcibdev, u8 ibport)
 	return smcibdev->pattr[ibport - 1].state == IB_PORT_ACTIVE;
 }
 
-int smc_ib_find_route(__be32 saddr, __be32 daddr,
+int smc_ib_find_route(struct net *net, __be32 saddr, __be32 daddr,
 		      u8 nexthop_mac[], u8 *uses_gateway)
 {
 	struct neighbour *neigh = NULL;
@@ -205,7 +205,7 @@ int smc_ib_find_route(__be32 saddr, __be32 daddr,
 
 	if (daddr == cpu_to_be32(INADDR_NONE))
 		goto out;
-	rt = ip_route_output_flow(&init_net, &fl4, NULL);
+	rt = ip_route_output_flow(net, &fl4, NULL);
 	if (IS_ERR(rt))
 		goto out;
 	if (rt->rt_uses_gateway && rt->rt_gw_family != AF_INET)
@@ -235,6 +235,7 @@ static int smc_ib_determine_gid_rcu(const struct net_device *ndev,
 	if (smcrv2 && attr->gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP &&
 	    smc_ib_gid_to_ipv4((u8 *)&attr->gid) != cpu_to_be32(INADDR_NONE)) {
 		struct in_device *in_dev = __in_dev_get_rcu(ndev);
+		struct net *net = dev_net(ndev);
 		const struct in_ifaddr *ifa;
 		bool subnet_match = false;
 
@@ -248,7 +249,7 @@ static int smc_ib_determine_gid_rcu(const struct net_device *ndev,
 		}
 		if (!subnet_match)
 			goto out;
-		if (smcrv2->daddr && smc_ib_find_route(smcrv2->saddr,
+		if (smcrv2->daddr && smc_ib_find_route(net, smcrv2->saddr,
 						       smcrv2->daddr,
 						       smcrv2->nexthop_mac,
 						       &smcrv2->uses_gateway))
diff --git a/net/smc/smc_ib.h b/net/smc/smc_ib.h
index 4df5f8c8a0a1..ef8ac2b7546d 100644
--- a/net/smc/smc_ib.h
+++ b/net/smc/smc_ib.h
@@ -112,7 +112,7 @@ void smc_ib_sync_sg_for_device(struct smc_link *lnk,
 int smc_ib_determine_gid(struct smc_ib_device *smcibdev, u8 ibport,
 			 unsigned short vlan_id, u8 gid[], u8 *sgid_index,
 			 struct smc_init_info_smcrv2 *smcrv2);
-int smc_ib_find_route(__be32 saddr, __be32 daddr,
+int smc_ib_find_route(struct net *net, __be32 saddr, __be32 daddr,
 		      u8 nexthop_mac[], u8 *uses_gateway);
 bool smc_ib_is_valid_local_systemid(void);
 int smcr_nl_get_device(struct sk_buff *skb, struct netlink_callback *cb);