Message ID | 1684386895-112162-1-git-send-email-guwen@linux.alibaba.com |
---|---|
State | New |
Headers | From: Wen Gu <guwen@linux.alibaba.com>; To: kgraul@linux.ibm.com, wenjia@linux.ibm.com, jaka@linux.ibm.com, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com; Cc: linux-s390@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, liuyacan@corp.netease.com; Subject: [PATCH net] net/smc: Reset connection when trying to use SMCRv2 fails.; Date: Thu, 18 May 2023 13:14:55 +0800 |
Series | [net] net/smc: Reset connection when trying to use SMCRv2 fails. |
Commit Message
Wen Gu
May 18, 2023, 5:14 a.m. UTC
We found a crash when using SMCRv2 with two Mellanox ConnectX-4 devices.
It can be reproduced by running:
- smc_run nginx
- smc_run wrk -t 32 -c 500 -d 30 http://<ip>:<port>
BUG: kernel NULL pointer dereference, address: 0000000000000014
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 8000000108713067 P4D 8000000108713067 PUD 151127067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 4 PID: 2441 Comm: kworker/4:249 Kdump: loaded Tainted: G W E 6.4.0-rc1+ #42
Workqueue: smc_hs_wq smc_listen_work [smc]
RIP: 0010:smc_clc_send_confirm_accept+0x284/0x580 [smc]
RSP: 0018:ffffb8294b2d7c78 EFLAGS: 00010a06
RAX: ffff8f1873238880 RBX: ffffb8294b2d7dc8 RCX: 0000000000000000
RDX: 00000000000000b4 RSI: 0000000000000001 RDI: 0000000000b40c00
RBP: ffffb8294b2d7db8 R08: ffff8f1815c5860c R09: 0000000000000000
R10: 0000000000000400 R11: 0000000000000000 R12: ffff8f1846f56180
R13: ffff8f1815c5860c R14: 0000000000000001 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff8f1aefd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000014 CR3: 00000001027a0001 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
? mlx5_ib_map_mr_sg+0xa1/0xd0 [mlx5_ib]
? smcr_buf_map_link+0x24b/0x290 [smc]
? __smc_buf_create+0x4ee/0x9b0 [smc]
smc_clc_send_accept+0x4c/0xb0 [smc]
smc_listen_work+0x346/0x650 [smc]
? __schedule+0x279/0x820
process_one_work+0x1e5/0x3f0
worker_thread+0x4d/0x2f0
? __pfx_worker_thread+0x10/0x10
kthread+0xe5/0x120
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2c/0x50
</TASK>
During the CLC handshake, the server tries the available SMCRv2 and
SMCRv1 devices in turn in smc_listen_work().
If an SMCRv2 device is found, an SMCRv2-based link group and link are
assigned to the connection. If a buffer assignment error then occurs
later in the CLC handshake, such as an RMB registration failure, the
server gives up on SMCRv2 and tries an SMCRv1 device instead. However,
the resources already assigned to the connection are not reset.
When the server then tries the SMCRv1 device, the connection creation
process runs again. Because conn->lnk was already assigned during the
SMCRv2 attempt, smcr_lgr_conn_assign_link() does not set it to the
correct SMCRv1 link. In this situation, conn->lgr points to the correct
SMCRv1 link group while conn->lnk still points to the SMCRv2 link.
Then smc_clc_send_confirm_accept() accesses
conn->rmb_desc->mr[link->link_idx]. Since link->link_idx is wrong, the
corresponding MR may never have been initialized, and the access crashes.
| Try SMCRv2 device first
|   |-> conn->lgr: assign existing SMCRv2 link group;
|   |-> conn->lnk: assign existing SMCRv2 link (link_idx may be 1 in SMC_LGR_SYMMETRIC);
|   |-> sndbuf & RMB creation fails, give up;
|
| Try SMCRv1 device then
|   |-> conn->lgr: create SMCRv1 link group and assign it;
|   |-> conn->lnk: mistakenly keep SMCRv2 link;
|   |-> sndbuf & RMB creation succeeds, only RMB->mr[link_idx = 0]
|       initialized.
|
| Then smc_clc_send_confirm_accept() accesses
| conn->rmb_desc->mr[conn->lnk->link_idx, which is 1], then crash.
v
This patch fixes the problem by resetting conn->lnk before a link is
assigned. In addition, if the SMCRv2 attempt fails during buffer
creation or registration, the connection is now reset and the resources
assigned to it are released.
Fixes: e49300a6bf62 ("net/smc: add listen processing for SMC-Rv2")
Link: https://lore.kernel.org/r/20220523055056.2078994-1-liuyacan@corp.netease.com/
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
net/smc/af_smc.c | 9 +++++++--
net/smc/smc_core.c | 1 +
2 files changed, 8 insertions(+), 2 deletions(-)
Comments
On Thu, May 18, 2023 at 01:14:55PM +0800, Wen Gu wrote:
> We found a crash when using SMCRv2 with 2 Mellanox ConnectX-4. It
> can be reproduced by:
> [...]
LGTM, thanks for your detailed analysis.
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Hello:
This patch was applied to netdev/net.git (main) by David S. Miller <davem@davemloft.net>:
On Thu, 18 May 2023 13:14:55 +0800 you wrote:
> We found a crash when using SMCRv2 with 2 Mellanox ConnectX-4. It
> can be reproduced by:
> [...]
Here is the summary with links:
  - [net] net/smc: Reset connection when trying to use SMCRv2 fails.
    https://git.kernel.org/netdev/net/c/35112271672a
You are awesome, thank you!
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 50c38b6..538e9c6 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -2000,8 +2000,10 @@ static int smc_listen_rdma_init(struct smc_sock *new_smc,
 		return rc;
 
 	/* create send buffer and rmb */
-	if (smc_buf_create(new_smc, false))
+	if (smc_buf_create(new_smc, false)) {
+		smc_conn_abort(new_smc, ini->first_contact_local);
 		return SMC_CLC_DECL_MEM;
+	}
 
 	return 0;
 }
@@ -2217,8 +2219,11 @@ static void smc_find_rdma_v2_device_serv(struct smc_sock *new_smc,
 	smcr_version = ini->smcr_version;
 	ini->smcr_version = SMC_V2;
 	rc = smc_listen_rdma_init(new_smc, ini);
-	if (!rc)
+	if (!rc) {
 		rc = smc_listen_rdma_reg(new_smc, ini->first_contact_local);
+		if (rc)
+			smc_conn_abort(new_smc, ini->first_contact_local);
+	}
 	if (!rc)
 		return;
 	ini->smcr_version = smcr_version;
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 4543567..3f465fa 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -127,6 +127,7 @@ static int smcr_lgr_conn_assign_link(struct smc_connection *conn, bool first)
 	int i, j;
 
 	/* do link balancing */
+	conn->lnk = NULL; /* reset conn->lnk first */
 	for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) {
 		struct smc_link *lnk = &conn->lgr->lnk[i];
 