From patchwork Thu Aug 3 14:46:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Gerd Bayer X-Patchwork-Id: 130728 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:9f41:0:b0:3e4:2afc:c1 with SMTP id v1csp1262354vqx; Thu, 3 Aug 2023 09:25:26 -0700 (PDT) X-Google-Smtp-Source: APBJJlGLZOFAuKGaigf0Xf1rqGBrVXiqcMHq/zjeM+aHVkfdEwtCQf+E2N0X+KvRMv6ounmUiYMR X-Received: by 2002:a05:6a20:431c:b0:13e:b58a:e3e8 with SMTP id h28-20020a056a20431c00b0013eb58ae3e8mr7703206pzk.50.1691079926137; Thu, 03 Aug 2023 09:25:26 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1691079926; cv=none; d=google.com; s=arc-20160816; b=mYWFYjzeQm97LfS7l4qF8uAFPOLGXGo0tvDhoK+Yk1yq09/IYdmhj4iVnpJZDOgDCD +oHnm8lCuKC+gznIGM4sUJ6Lj6Xn6Ixf6Bt5n76EBWwJQDhbeaZrYj1tnTATkAuZv7Jn DZEuRJOELsjY+1iaiBhokVUCCbaq2ktLRRIPCbsz2jlMtoqAB80v/0ReeZ6acCSt2279 RTrq6sqDcroIydZQb1ftYIJMHbRiyaWYicf6M4iKT8LM2Kho633MChd/oIXiEOFi9Ba1 +1ZRRtotZXTJDmJ2TxFNeRbaNSwdCBdngDAcWsSZ0AfFQiNE5j+xAAejGVjdYNO3mWYw ygcw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=YTsznr4HcKqxh6ljKvCLqIYqfoXP9uGFaCk/u+QGN7Q=; fh=MiOLYvgEyBy10I/4Y0PTMEkpvqxuzVBS6+dTfeOblB4=; b=Ybql+gowiJmOKksrtGOeU+7vOwC6Rysenu0rny5xnQiJ6rdZ7CivVW73ROUNIbGltQ tdoDcnhwAPLa529vnU7vTxNutOop7/yc75tKQzUo8R5Ujr70QY7Jgj343v6e6Vxgp8aH Dv1BNlDD6QDgv01ER17UZezZ3kfmSlRHG5FNAoLAYDM9W8iRNpUu8OQRPiy2ToPvhn2p uxCi8zIq413Y4grKvB2rMddL2K0Bxt8dlRCswyNC88oC35NjiKxzIS/Syp0Zn73HMmD2 CfB1w2/rijFEFL9FH59vBSeCWobeEyPust4c5I/esuU9IheuH3+phPc+jWP22eXqYE9k x3IA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b=h2SUrt5A; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=ibm.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id j4-20020a635504000000b00563de62f2bbsi123570pgb.340.2023.08.03.09.25.11; Thu, 03 Aug 2023 09:25:26 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b=h2SUrt5A; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=ibm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236007AbjHCOq3 (ORCPT + 99 others); Thu, 3 Aug 2023 10:46:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56814 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235572AbjHCOqV (ORCPT ); Thu, 3 Aug 2023 10:46:21 -0400 Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DC43B2D71; Thu, 3 Aug 2023 07:46:15 -0700 (PDT) Received: from pps.filterd (m0353728.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 373EgABD010920; Thu, 3 Aug 2023 14:46:12 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=pp1; bh=YTsznr4HcKqxh6ljKvCLqIYqfoXP9uGFaCk/u+QGN7Q=; b=h2SUrt5A7GB+xn+891Uv8m5UjGmKnXYiMfo5b+V6zOcv6yl7IkC44mODFvheL+9bzonz k5M0ojyAXwv2Jo1oo3YEVYpd3Vau+B9mygtOJHRerrCpW/r+Ldx8WCyOQRRcme1F0QAR CvlDNasGYec7/a0s9vyEhiITMOMDmjEzKFlxW3OijeKkzbDhGPaxRiHQHp1I9i6QgFcP 4l5e+7BnTvd3D8nzdvhm6qmyB4MfHhNb/DKblm5H/BDkZUBIxDFUYlw9LXXZyLHBeCNq YSaZDIG+1uUsF4mWihHs1Yo0UACb62Y8aBmS1LlKYHNfj+fRAar366kUf+E5bWTpCLbI ew== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3s8e7wg5b9-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 03 Aug 2023 14:46:12 +0000 Received: from m0353728.ppops.net (m0353728.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 373EgP1d012104; Thu, 3 Aug 2023 14:46:11 GMT Received: from ppma11.dal12v.mail.ibm.com (db.9e.1632.ip4.static.sl-reverse.com [50.22.158.219]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3s8e7wg5am-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 03 Aug 2023 14:46:11 +0000 Received: from pps.filterd (ppma11.dal12v.mail.ibm.com [127.0.0.1]) by ppma11.dal12v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 373D4oSt014537; Thu, 3 Aug 2023 14:46:10 GMT Received: from smtprelay06.fra02v.mail.ibm.com ([9.218.2.230]) by ppma11.dal12v.mail.ibm.com (PPS) with ESMTPS id 3s5ft1wmnv-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 03 Aug 2023 14:46:10 +0000 Received: from smtpav01.fra02v.mail.ibm.com (smtpav01.fra02v.mail.ibm.com [10.20.54.100]) by smtprelay06.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 373Ek7gj27263390 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 3 Aug 2023 14:46:07 GMT Received: from smtpav01.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 8246520040; Thu, 3 Aug 2023 14:46:07 +0000 (GMT) Received: from smtpav01.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 099212004E; Thu, 3 Aug 2023 14:46:07 +0000 (GMT) Received: from dilbert5.boeblingen.de.ibm.com (unknown [9.155.208.153]) by smtpav01.fra02v.mail.ibm.com (Postfix) with ESMTP; Thu, 3 Aug 2023 14:46:06 +0000 (GMT) From: Gerd Bayer To: Wenjia Zhang , Jan Karcher , Tony Lu , Paolo Abeni Cc: Karsten Graul , "D . Wythe" , Wen Gu , "David S . Miller" , Eric Dumazet , Jakub Kicinski , linux-s390@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH net v2 1/2] net/smc: Fix setsockopt and sysctl to specify same buffer size again Date: Thu, 3 Aug 2023 16:46:04 +0200 Message-ID: <20230803144605.477903-2-gbayer@linux.ibm.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230803144605.477903-1-gbayer@linux.ibm.com> References: <20230803144605.477903-1-gbayer@linux.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: TpLhauTwof7kcbVzBGPnHai6C2EAtSDI X-Proofpoint-GUID: LDHSWsjJeGkSVzHJSxEZXjpxfi5z0KKu X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.957,Hydra:6.0.591,FMLib:17.11.176.26 definitions=2023-08-03_14,2023-08-03_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 malwarescore=0 impostorscore=0 spamscore=0 adultscore=0 priorityscore=1501 bulkscore=0 lowpriorityscore=0 phishscore=0 suspectscore=0 mlxscore=0 clxscore=1015 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2306200000 definitions=main-2308030131 X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_EF,RCVD_IN_MSPIKE_H5,RCVD_IN_MSPIKE_WL, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1773225824268602397 X-GMAIL-MSGID: 1773225824268602397 Commit 0227f058aa29 ("net/smc: Unbind r/w buffer size from clcsock and make them tunable") introduced the net.smc.rmem and net.smc.wmem sysctls to specify the size of buffers to be used for SMC type connections. This created a regression for users that specified the buffer size via setsockopt() as the effective buffer size was now doubled. Re-introduce the division by 2 in the SMC buffer create code and level this out by duplicating the net.smc.[rw]mem values used for initializing sk_rcvbuf/sk_sndbuf at socket creation time. This gives users of both methods (setsockopt or sysctl) the effective buffer size that they expect. Initialize net.smc.[rw]mem from its own constant of 64kB, respectively. Internal performance tests show that this value is a good compromise between throughput/latency and memory consumption. Also, this decouples it from any tuning that was done to net.ipv4.tcp_[rw]mem[1] before the module for SMC protocol was loaded. Check that no more than INT_MAX / 2 is assigned to net.smc.[rw]mem, in order to avoid any overflow condition when that is doubled for use in sk_sndbuf or sk_rcvbuf. While at it, drop the confusing sk_buf_size variable from __smc_buf_create and name "compressed" buffer size variables more consistently. Background: Before the commit mentioned above, SMC's buffer allocator in __smc_buf_create() always used half of the sockets' sk_rcvbuf/sk_sndbuf value as initial value to search for appropriate buffers. If the search resorted to using a bigger buffer when all buffers of the specified size were busy, the duplicate of the used effective buffer size is stored back to sk_rcvbuf/sk_sndbuf. When available, buffers of exactly the size that a user had specified as input to setsockopt() were used, despite setsockopt()'s documentation in "man 7 socket" talking of a mandatory duplication: [...] SO_SNDBUF Sets or gets the maximum socket send buffer in bytes. The kernel doubles this value (to allow space for book‐ keeping overhead) when it is set using setsockopt(2), and this doubled value is returned by getsockopt(2). The default value is set by the /proc/sys/net/core/wmem_default file and the maximum allowed value is set by the /proc/sys/net/core/wmem_max file. The minimum (doubled) value for this option is 2048. [...] Fixes: 0227f058aa29 ("net/smc: Unbind r/w buffer size from clcsock and make them tunable") Co-developed-by: Jan Karcher Signed-off-by: Jan Karcher Signed-off-by: Gerd Bayer Reviewed-by: Wenjia Zhang Reviewed-by: Tony Lu --- net/smc/af_smc.c | 4 ++-- net/smc/smc.h | 2 +- net/smc/smc_clc.c | 4 ++-- net/smc/smc_core.c | 25 ++++++++++++------------- net/smc/smc_sysctl.c | 10 ++++++++-- 5 files changed, 25 insertions(+), 20 deletions(-) diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index a7f887d91d89..1fcf1e42474a 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -378,8 +378,8 @@ static struct sock *smc_sock_alloc(struct net *net, struct socket *sock, sk->sk_state = SMC_INIT; sk->sk_destruct = smc_destruct; sk->sk_protocol = protocol; - WRITE_ONCE(sk->sk_sndbuf, READ_ONCE(net->smc.sysctl_wmem)); - WRITE_ONCE(sk->sk_rcvbuf, READ_ONCE(net->smc.sysctl_rmem)); + WRITE_ONCE(sk->sk_sndbuf, 2 * READ_ONCE(net->smc.sysctl_wmem)); + WRITE_ONCE(sk->sk_rcvbuf, 2 * READ_ONCE(net->smc.sysctl_rmem)); smc = smc_sk(sk); INIT_WORK(&smc->tcp_listen_work, smc_tcp_listen_work); INIT_WORK(&smc->connect_work, smc_connect_work); diff --git a/net/smc/smc.h b/net/smc/smc.h index 2eeea4cdc718..1f2b912c43d1 100644 --- a/net/smc/smc.h +++ b/net/smc/smc.h @@ -161,7 +161,7 @@ struct smc_connection { struct smc_buf_desc *sndbuf_desc; /* send buffer descriptor */ struct smc_buf_desc *rmb_desc; /* RMBE descriptor */ - int rmbe_size_short;/* compressed notation */ + int rmbe_size_comp; /* compressed notation */ int rmbe_update_limit; /* lower limit for consumer * cursor update diff --git a/net/smc/smc_clc.c b/net/smc/smc_clc.c index b9b8b07aa702..c90d9e5dda54 100644 --- a/net/smc/smc_clc.c +++ b/net/smc/smc_clc.c @@ -1007,7 +1007,7 @@ static int smc_clc_send_confirm_accept(struct smc_sock *smc, clc->d0.gid = conn->lgr->smcd->ops->get_local_gid(conn->lgr->smcd); clc->d0.token = conn->rmb_desc->token; - clc->d0.dmbe_size = conn->rmbe_size_short; + clc->d0.dmbe_size = conn->rmbe_size_comp; clc->d0.dmbe_idx = 0; memcpy(&clc->d0.linkid, conn->lgr->id, SMC_LGR_ID_SIZE); if (version == SMC_V1) { @@ -1050,7 +1050,7 @@ static int smc_clc_send_confirm_accept(struct smc_sock *smc, clc->r0.qp_mtu = min(link->path_mtu, link->peer_mtu); break; } - clc->r0.rmbe_size = conn->rmbe_size_short; + clc->r0.rmbe_size = conn->rmbe_size_comp; clc->r0.rmb_dma_addr = conn->rmb_desc->is_vm ? cpu_to_be64((uintptr_t)conn->rmb_desc->cpu_addr) : cpu_to_be64((u64)sg_dma_address diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c index 3f465faf2b68..6b78075404d7 100644 --- a/net/smc/smc_core.c +++ b/net/smc/smc_core.c @@ -2309,31 +2309,30 @@ static int __smc_buf_create(struct smc_sock *smc, bool is_smcd, bool is_rmb) struct smc_connection *conn = &smc->conn; struct smc_link_group *lgr = conn->lgr; struct list_head *buf_list; - int bufsize, bufsize_short; + int bufsize, bufsize_comp; struct rw_semaphore *lock; /* lock buffer list */ bool is_dgraded = false; - int sk_buf_size; if (is_rmb) /* use socket recv buffer size (w/o overhead) as start value */ - sk_buf_size = smc->sk.sk_rcvbuf; + bufsize = smc->sk.sk_rcvbuf / 2; else /* use socket send buffer size (w/o overhead) as start value */ - sk_buf_size = smc->sk.sk_sndbuf; + bufsize = smc->sk.sk_sndbuf / 2; - for (bufsize_short = smc_compress_bufsize(sk_buf_size, is_smcd, is_rmb); - bufsize_short >= 0; bufsize_short--) { + for (bufsize_comp = smc_compress_bufsize(bufsize, is_smcd, is_rmb); + bufsize_comp >= 0; bufsize_comp--) { if (is_rmb) { lock = &lgr->rmbs_lock; - buf_list = &lgr->rmbs[bufsize_short]; + buf_list = &lgr->rmbs[bufsize_comp]; } else { lock = &lgr->sndbufs_lock; - buf_list = &lgr->sndbufs[bufsize_short]; + buf_list = &lgr->sndbufs[bufsize_comp]; } - bufsize = smc_uncompress_bufsize(bufsize_short); + bufsize = smc_uncompress_bufsize(bufsize_comp); /* check for reusable slot in the link group */ - buf_desc = smc_buf_get_slot(bufsize_short, lock, buf_list); + buf_desc = smc_buf_get_slot(bufsize_comp, lock, buf_list); if (buf_desc) { buf_desc->is_dma_need_sync = 0; SMC_STAT_RMB_SIZE(smc, is_smcd, is_rmb, bufsize); @@ -2377,8 +2376,8 @@ static int __smc_buf_create(struct smc_sock *smc, bool is_smcd, bool is_rmb) if (is_rmb) { conn->rmb_desc = buf_desc; - conn->rmbe_size_short = bufsize_short; - smc->sk.sk_rcvbuf = bufsize; + conn->rmbe_size_comp = bufsize_comp; + smc->sk.sk_rcvbuf = bufsize * 2; atomic_set(&conn->bytes_to_rcv, 0); conn->rmbe_update_limit = smc_rmb_wnd_update_limit(buf_desc->len); @@ -2386,7 +2385,7 @@ static int __smc_buf_create(struct smc_sock *smc, bool is_smcd, bool is_rmb) smc_ism_set_conn(conn); /* map RMB/smcd_dev to conn */ } else { conn->sndbuf_desc = buf_desc; - smc->sk.sk_sndbuf = bufsize; + smc->sk.sk_sndbuf = bufsize * 2; atomic_set(&conn->sndbuf_space, bufsize); } return 0; diff --git a/net/smc/smc_sysctl.c b/net/smc/smc_sysctl.c index b6f79fabb9d3..0b2a957ca5f5 100644 --- a/net/smc/smc_sysctl.c +++ b/net/smc/smc_sysctl.c @@ -21,6 +21,10 @@ static int min_sndbuf = SMC_BUF_MIN_SIZE; static int min_rcvbuf = SMC_BUF_MIN_SIZE; +static int max_sndbuf = INT_MAX / 2; +static int max_rcvbuf = INT_MAX / 2; +static const int net_smc_wmem_init = (64 * 1024); +static const int net_smc_rmem_init = (64 * 1024); static struct ctl_table smc_table[] = { { @@ -53,6 +57,7 @@ static struct ctl_table smc_table[] = { .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = &min_sndbuf, + .extra2 = &max_sndbuf, }, { .procname = "rmem", @@ -61,6 +66,7 @@ static struct ctl_table smc_table[] = { .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = &min_rcvbuf, + .extra2 = &max_rcvbuf, }, { } }; @@ -88,8 +94,8 @@ int __net_init smc_sysctl_net_init(struct net *net) net->smc.sysctl_autocorking_size = SMC_AUTOCORKING_DEFAULT_SIZE; net->smc.sysctl_smcr_buf_type = SMCR_PHYS_CONT_BUFS; net->smc.sysctl_smcr_testlink_time = SMC_LLC_TESTLINK_DEFAULT_TIME; - WRITE_ONCE(net->smc.sysctl_wmem, READ_ONCE(net->ipv4.sysctl_tcp_wmem[1])); - WRITE_ONCE(net->smc.sysctl_rmem, READ_ONCE(net->ipv4.sysctl_tcp_rmem[1])); + WRITE_ONCE(net->smc.sysctl_wmem, net_smc_wmem_init); + WRITE_ONCE(net->smc.sysctl_rmem, net_smc_rmem_init); return 0; From patchwork Thu Aug 3 14:46:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Gerd Bayer X-Patchwork-Id: 130684 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:9f41:0:b0:3e4:2afc:c1 with SMTP id v1csp1217323vqx; Thu, 3 Aug 2023 08:14:49 -0700 (PDT) X-Google-Smtp-Source: APBJJlEUTMdlG6NvJPFJWzWRKW2yw/8zuOiT7irGg1x3HnYtjZbVE4qegig7dOKXF5FnxGWBLDah X-Received: by 2002:a17:90b:4f81:b0:267:f893:d562 with SMTP id qe1-20020a17090b4f8100b00267f893d562mr16485205pjb.8.1691075689381; Thu, 03 Aug 2023 08:14:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1691075689; cv=none; d=google.com; s=arc-20160816; b=Kvon8tVhBMCkSvx26eEtaThGvP2igrq2eCbUi7uc5PgGpEs9XxjIhzXDQh3tkzkDLf kuebY+3/qvQH2l3/hzF4ufZZlBSv7oobE9cS0xYmnjeuNQZkNliRL1s6IN0pGDdxrDo3 gBaDS/NCTkg2/I0dPSwiR7hMgFTulFipQtnJoVW4PhkwRuWvxoxn4VUlF8ClXi8Vom5K wYxs/mdtYCnf6lorp9LDD+HO0veSJ2TVZUJrdPmXaxtvjV5puwwf4P9+ppZ+yu2xrcTc K1ujfAwDppTP4tg/xq51KZsz9JzZhA6/q4SaT/JIx29jjWUCBNulgo7t7+JL+bxKf+i3 MfDg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=avuefi9Jf8+2a1NB7rc28/ORuP+YUHSjQ/Bxn8++A4s=; fh=MiOLYvgEyBy10I/4Y0PTMEkpvqxuzVBS6+dTfeOblB4=; b=wtLoF6C8iiuBqc8wDeTE2D6px/KHGrxiXsF6lP96LZIPwGjiea/9JgjUCRI9QUgj1I QbjXcYf3d8HKiZyELLJbtaNi/p0QMBeCYSWImbU9slgb3Iw6Fmw7o4DPRdrMH2yUrDLY DN/aHAwvfa25owBLjW0/+zUOAjZAcbfm/yT8s/pCsdM7cx3NgdArQx+BR5pCuoiQlSf7 CY4sjjwV7tadhbkW/ZHitTy8+qrnYpqIwfV5wMXFOTbm+M2nD0NnQVQx5k9qAg0K2kzx 8LouoaExkQtOSfENncCJdDY4s5iXagW0bFxGVhVtT6ISdtHyCas2xP4jQbPE/6hXdY8E kddQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b="i/DWsOZE"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=ibm.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id e14-20020a63e00e000000b00563fe7f89e9si6308pgh.186.2023.08.03.08.14.29; Thu, 03 Aug 2023 08:14:49 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b="i/DWsOZE"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=ibm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236466AbjHCOqr (ORCPT + 99 others); Thu, 3 Aug 2023 10:46:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57786 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236439AbjHCOqf (ORCPT ); Thu, 3 Aug 2023 10:46:35 -0400 Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8277619B9; Thu, 3 Aug 2023 07:46:32 -0700 (PDT) Received: from pps.filterd (m0353726.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 373Eg8JZ011669; Thu, 3 Aug 2023 14:46:30 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=avuefi9Jf8+2a1NB7rc28/ORuP+YUHSjQ/Bxn8++A4s=; b=i/DWsOZEHgsDxsvqQcqPcuaAx9FdNP2G4+i/pXZJY4nRIaa4Tqpk3C2RvSXMojwt+8i8 LVXq/Soc5oEwXVJV1ocgVfb+g0roqXlOPG/JxYfLlQirOsbW1nvNNGogEZzNV+q4Bhm/ 8yQkbcX3ssnz8TNuYlwCoTNXnd9FBXNl8kuYidaNrG5OxI1gl4MVeNTOr0KT0+MNzawp IFNTSi4q93ON+gcdQsBeZH7hRwadQVTRUkCFx3iS+XsekKlI9bwrIaJFVo7OBRyiKifd pAAWw9dy1POhCzcQjUgDrRAUTQhDvtjZBn7nbgxzynaLjM21iZb635mQ6ZbntgC6fzN7 IA== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3s8e7wr591-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 03 Aug 2023 14:46:29 +0000 Received: from m0353726.ppops.net (m0353726.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 373EgMkE012742; Thu, 3 Aug 2023 14:46:29 GMT Received: from ppma21.wdc07v.mail.ibm.com (5b.69.3da9.ip4.static.sl-reverse.com [169.61.105.91]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3s8e7wr587-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 03 Aug 2023 14:46:28 +0000 Received: from pps.filterd (ppma21.wdc07v.mail.ibm.com [127.0.0.1]) by ppma21.wdc07v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 373DwmD9015480; Thu, 3 Aug 2023 14:46:12 GMT Received: from smtprelay07.fra02v.mail.ibm.com ([9.218.2.229]) by ppma21.wdc07v.mail.ibm.com (PPS) with ESMTPS id 3s5e3ne8qj-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 03 Aug 2023 14:46:12 +0000 Received: from smtpav01.fra02v.mail.ibm.com (smtpav01.fra02v.mail.ibm.com [10.20.54.100]) by smtprelay07.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 373Ek8BM61538602 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 3 Aug 2023 14:46:08 GMT Received: from smtpav01.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 050A420040; Thu, 3 Aug 2023 14:46:08 +0000 (GMT) Received: from smtpav01.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 9045020043; Thu, 3 Aug 2023 14:46:07 +0000 (GMT) Received: from dilbert5.boeblingen.de.ibm.com (unknown [9.155.208.153]) by smtpav01.fra02v.mail.ibm.com (Postfix) with ESMTP; Thu, 3 Aug 2023 14:46:07 +0000 (GMT) From: Gerd Bayer To: Wenjia Zhang , Jan Karcher , Tony Lu , Paolo Abeni Cc: Karsten Graul , "D . Wythe" , Wen Gu , "David S . Miller" , Eric Dumazet , Jakub Kicinski , linux-s390@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH net v2 2/2] net/smc: Use correct buffer sizes when switching between TCP and SMC Date: Thu, 3 Aug 2023 16:46:05 +0200 Message-ID: <20230803144605.477903-3-gbayer@linux.ibm.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230803144605.477903-1-gbayer@linux.ibm.com> References: <20230803144605.477903-1-gbayer@linux.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: x2tMWBT83Fm1T3Qgq0ZgI2Do9MbS_gGc X-Proofpoint-ORIG-GUID: X53ZC6hFi7PrQEOgHQ0QVes4swtu3Ejv X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.957,Hydra:6.0.591,FMLib:17.11.176.26 definitions=2023-08-03_14,2023-08-03_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 spamscore=0 impostorscore=0 priorityscore=1501 mlxscore=0 adultscore=0 malwarescore=0 mlxlogscore=999 bulkscore=0 lowpriorityscore=0 suspectscore=0 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2306200000 definitions=main-2308030131 X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_EF,RCVD_IN_MSPIKE_H5,RCVD_IN_MSPIKE_WL, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1773221381654442397 X-GMAIL-MSGID: 1773221381654442397 Tuning of the effective buffer size through setsockopts was working for SMC traffic only but not for TCP fall-back connections even before commit 0227f058aa29 ("net/smc: Unbind r/w buffer size from clcsock and make them tunable"). That change made it apparent that TCP fall-back connections would use net.smc.[rw]mem as buffer size instead of net.ipv4_tcp_[rw]mem. Amend the code that copies attributes between the (TCP) clcsock and the SMC socket and adjust buffer sizes appropriately: - Copy over sk_userlocks so that both sockets agree on whether tuning via setsockopt is active. - When falling back to TCP use sk_sndbuf or sk_rcvbuf as specified with setsockopt. Otherwise, use the sysctl value for TCP/IPv4. - Likewise, use either values from setsockopt or from sysctl for SMC (duplicated) on successful SMC connect. In smc_tcp_listen_work() drop the explicit copy of buffer sizes as that is taken care of by the attribute copy. Fixes: 0227f058aa29 ("net/smc: Unbind r/w buffer size from clcsock and make them tunable") Signed-off-by: Gerd Bayer Reviewed-by: Wenjia Zhang Reviewed-by: Tony Lu --- net/smc/af_smc.c | 73 +++++++++++++++++++++++++++++++++--------------- 1 file changed, 51 insertions(+), 22 deletions(-) diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 1fcf1e42474a..385e86bd6bdf 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -436,13 +436,60 @@ static int smc_bind(struct socket *sock, struct sockaddr *uaddr, return rc; } +/* copy only relevant settings and flags of SOL_SOCKET level from smc to + * clc socket (since smc is not called for these options from net/core) + */ + +#define SK_FLAGS_SMC_TO_CLC ((1UL << SOCK_URGINLINE) | \ + (1UL << SOCK_KEEPOPEN) | \ + (1UL << SOCK_LINGER) | \ + (1UL << SOCK_BROADCAST) | \ + (1UL << SOCK_TIMESTAMP) | \ + (1UL << SOCK_DBG) | \ + (1UL << SOCK_RCVTSTAMP) | \ + (1UL << SOCK_RCVTSTAMPNS) | \ + (1UL << SOCK_LOCALROUTE) | \ + (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE) | \ + (1UL << SOCK_RXQ_OVFL) | \ + (1UL << SOCK_WIFI_STATUS) | \ + (1UL << SOCK_NOFCS) | \ + (1UL << SOCK_FILTER_LOCKED) | \ + (1UL << SOCK_TSTAMP_NEW)) + +/* if set, use value set by setsockopt() - else use IPv4 or SMC sysctl value */ +static void smc_adjust_sock_bufsizes(struct sock *nsk, struct sock *osk, + unsigned long mask) +{ + struct net *nnet = sock_net(nsk); + + nsk->sk_userlocks = osk->sk_userlocks; + if (osk->sk_userlocks & SOCK_SNDBUF_LOCK) { + nsk->sk_sndbuf = osk->sk_sndbuf; + } else { + if (mask == SK_FLAGS_SMC_TO_CLC) + WRITE_ONCE(nsk->sk_sndbuf, + READ_ONCE(nnet->ipv4.sysctl_tcp_wmem[1])); + else + WRITE_ONCE(nsk->sk_sndbuf, + 2 * READ_ONCE(nnet->smc.sysctl_wmem)); + } + if (osk->sk_userlocks & SOCK_RCVBUF_LOCK) { + nsk->sk_rcvbuf = osk->sk_rcvbuf; + } else { + if (mask == SK_FLAGS_SMC_TO_CLC) + WRITE_ONCE(nsk->sk_rcvbuf, + READ_ONCE(nnet->ipv4.sysctl_tcp_rmem[1])); + else + WRITE_ONCE(nsk->sk_rcvbuf, + 2 * READ_ONCE(nnet->smc.sysctl_rmem)); + } +} + static void smc_copy_sock_settings(struct sock *nsk, struct sock *osk, unsigned long mask) { /* options we don't get control via setsockopt for */ nsk->sk_type = osk->sk_type; - nsk->sk_sndbuf = osk->sk_sndbuf; - nsk->sk_rcvbuf = osk->sk_rcvbuf; nsk->sk_sndtimeo = osk->sk_sndtimeo; nsk->sk_rcvtimeo = osk->sk_rcvtimeo; nsk->sk_mark = osk->sk_mark; @@ -453,26 +500,10 @@ static void smc_copy_sock_settings(struct sock *nsk, struct sock *osk, nsk->sk_flags &= ~mask; nsk->sk_flags |= osk->sk_flags & mask; + + smc_adjust_sock_bufsizes(nsk, osk, mask); } -#define SK_FLAGS_SMC_TO_CLC ((1UL << SOCK_URGINLINE) | \ - (1UL << SOCK_KEEPOPEN) | \ - (1UL << SOCK_LINGER) | \ - (1UL << SOCK_BROADCAST) | \ - (1UL << SOCK_TIMESTAMP) | \ - (1UL << SOCK_DBG) | \ - (1UL << SOCK_RCVTSTAMP) | \ - (1UL << SOCK_RCVTSTAMPNS) | \ - (1UL << SOCK_LOCALROUTE) | \ - (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE) | \ - (1UL << SOCK_RXQ_OVFL) | \ - (1UL << SOCK_WIFI_STATUS) | \ - (1UL << SOCK_NOFCS) | \ - (1UL << SOCK_FILTER_LOCKED) | \ - (1UL << SOCK_TSTAMP_NEW)) -/* copy only relevant settings and flags of SOL_SOCKET level from smc to - * clc socket (since smc is not called for these options from net/core) - */ static void smc_copy_sock_settings_to_clc(struct smc_sock *smc) { smc_copy_sock_settings(smc->clcsock->sk, &smc->sk, SK_FLAGS_SMC_TO_CLC); @@ -2479,8 +2510,6 @@ static void smc_tcp_listen_work(struct work_struct *work) sock_hold(lsk); /* sock_put in smc_listen_work */ INIT_WORK(&new_smc->smc_listen_work, smc_listen_work); smc_copy_sock_settings_to_smc(new_smc); - new_smc->sk.sk_sndbuf = lsmc->sk.sk_sndbuf; - new_smc->sk.sk_rcvbuf = lsmc->sk.sk_rcvbuf; sock_hold(&new_smc->sk); /* sock_put in passive closing */ if (!queue_work(smc_hs_wq, &new_smc->smc_listen_work)) sock_put(&new_smc->sk);