Message ID | 20230221110344.82818-1-kerneljasonxing@gmail.com |
---|---|
State | New |
Headers |
From: Jason Xing <kerneljasonxing@gmail.com>
To: willemdebruijn.kernel@gmail.com, davem@davemloft.net, dsahern@kernel.org, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, kerneljasonxing@gmail.com, Jason Xing <kernelxing@tencent.com>
Subject: [PATCH net] udp: fix memory schedule error
Date: Tue, 21 Feb 2023 19:03:44 +0800
Message-Id: <20230221110344.82818-1-kerneljasonxing@gmail.com> |
Series | [net] udp: fix memory schedule error |
Commit Message
Jason Xing
Feb. 21, 2023, 11:03 a.m. UTC
From: Jason Xing <kernelxing@tencent.com>

Quoting from the commit 7c80b038d23e ("net: fix sk_wmem_schedule()
and sk_rmem_schedule() errors"):

"If sk->sk_forward_alloc is 150000, and we need to schedule 150001 bytes,
we want to allocate 1 byte more (rounded up to one page),
instead of 150001"

After applying this patch, the receive path no longer schedules that
extra amount of memory.

Fixes: f970bd9e3a06 ("udp: implement memory accounting helpers")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/ipv4/udp.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
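To make the quoted example concrete, here is a small user-space sketch of the arithmetic. It is only an illustration, not kernel code: it assumes 4 KiB pages and a local helper that mirrors the kernel's sk_mem_pages() rounding.

#include <stdio.h>

#define PAGE_SHIFT 12                /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1 << PAGE_SHIFT)

/* Mirrors the kernel's sk_mem_pages(): round a byte count up to whole pages. */
static int sk_mem_pages(int amt)
{
        return (amt + PAGE_SIZE - 1) >> PAGE_SHIFT;
}

int main(void)
{
        int forward_alloc = 150000;  /* bytes already reserved on the socket */
        int size = 150001;           /* truesize of the incoming skb */

        /* Old code: rounds up the whole skb size before scheduling. */
        int old_bytes = sk_mem_pages(size) << PAGE_SHIFT;

        /* Patched code: rounds up only the missing bytes. */
        int new_bytes = sk_mem_pages(size - forward_alloc) << PAGE_SHIFT;

        printf("old: +%d bytes, new: +%d bytes\n", old_bytes, new_bytes);
        return 0;
}

With these inputs the old path grows sk->sk_forward_alloc by 151552 bytes (37 pages) even though only one more byte is needed, while the patched path grows it by a single 4096-byte page.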
Comments
On Tue, 2023-02-21 at 19:03 +0800, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
>
> Quoting from the commit 7c80b038d23e ("net: fix sk_wmem_schedule()
> and sk_rmem_schedule() errors"):
>
> "If sk->sk_forward_alloc is 150000, and we need to schedule 150001 bytes,
> we want to allocate 1 byte more (rounded up to one page),
> instead of 150001"

I'm wondering if this would cause measurable (even small) performance
regression? Specifically under high packet rate, with BH and user-space
processing happening on different CPUs.

Could you please provide the relevant performance figures?

Thanks!

Paolo
On Tue, Feb 21, 2023 at 1:27 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Tue, 2023-02-21 at 19:03 +0800, Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > Quoting from the commit 7c80b038d23e ("net: fix sk_wmem_schedule()
> > and sk_rmem_schedule() errors"):
> >
> > "If sk->sk_forward_alloc is 150000, and we need to schedule 150001 bytes,
> > we want to allocate 1 byte more (rounded up to one page),
> > instead of 150001"
>
> I'm wondering if this would cause measurable (even small) performance
> regression? Specifically under high packet rate, with BH and user-space
> processing happening on different CPUs.
>
> Could you please provide the relevant performance figures?
>
> Thanks!
>
> Paolo
>

Probably not a big deal.

TCP skb truesize can easily reach 180 KB, but for UDP it's 99% below
or close to a 4K page.

I doubt this change makes any difference for UDP.
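As a rough user-space illustration of the sizes mentioned above (4 KiB pages assumed; the truesize values are just example figures taken from this discussion), the per-skb page round-up is large for a big coalesced TCP skb but at most a page for a typical UDP datagram:

#include <stdio.h>

#define PAGE_SHIFT 12              /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1 << PAGE_SHIFT)

/* Same rounding as the kernel's sk_mem_pages(). */
static int pages_for(int truesize)
{
        return (truesize + PAGE_SIZE - 1) >> PAGE_SHIFT;
}

int main(void)
{
        printf("TCP skb, truesize 180 KB -> %d pages\n", pages_for(180 * 1024)); /* 45 */
        printf("UDP skb, truesize ~3 KB  -> %d pages\n", pages_for(3 * 1024));   /* 1  */
        return 0;
}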
On Tue, Feb 21, 2023 at 8:27 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Tue, 2023-02-21 at 19:03 +0800, Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > Quoting from the commit 7c80b038d23e ("net: fix sk_wmem_schedule()
> > and sk_rmem_schedule() errors"):
> >
> > "If sk->sk_forward_alloc is 150000, and we need to schedule 150001 bytes,
> > we want to allocate 1 byte more (rounded up to one page),
> > instead of 150001"
>
> I'm wondering if this would cause measurable (even small) performance
> regression? Specifically under high packet rate, with BH and user-space
> processing happening on different CPUs.
>
> Could you please provide the relevant performance figures?

Sure, I've done some basic tests on my machine as below.

Environment: 16 cpus, 60G memory
Server: run "iperf3 -s -p [port]" command and start 500 processes.
Client: run "iperf3 -u -c 127.0.0.1 -p [port]" command and start 500 processes.

Running such tests makes sure that the util output of every cpu is
higher than 15% which is observed through top command.

Here're some before/after numbers by using the "sar -n DEV 10 2" command.
Before: rxpck/s 2000, txpck/s 2000, rxkB/s 64054.69, txkB/s 64054.69
After:  rxpck/s 2000, txpck/s 2000, rxkB/s 64054.58, txkB/s 64054.58

So I don't see much impact on the results. In theory, I have no clue
about why it could cause some regression? Maybe the memory allocation
is not that enough compared to the original code?

Thanks,
Jason

>
> Thanks!
>
> Paolo
>
On Tue, Feb 21, 2023 at 8:35 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Feb 21, 2023 at 1:27 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > On Tue, 2023-02-21 at 19:03 +0800, Jason Xing wrote:
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > Quoting from the commit 7c80b038d23e ("net: fix sk_wmem_schedule()
> > > and sk_rmem_schedule() errors"):
> > >
> > > "If sk->sk_forward_alloc is 150000, and we need to schedule 150001 bytes,
> > > we want to allocate 1 byte more (rounded up to one page),
> > > instead of 150001"
> >
> > I'm wondering if this would cause measurable (even small) performance
> > regression? Specifically under high packet rate, with BH and user-space
> > processing happening on different CPUs.
> >
> > Could you please provide the relevant performance figures?
> >
> > Thanks!
> >
> > Paolo
> >
>
> Probably not a big deal.
>
> TCP skb truesize can easily reach 180 KB, but for UDP it's 99% below
> or close to a 4K page.

Yes.

>
> I doubt this change makes any difference for UDP.

Based on my understanding of this part, it would neither cause much
regression nor improve performance much. I think what you've done in
the TCP stack is the right way to go, so UDP can probably follow it.
Calculating the extra memory that way is a little bit odd because we
don't actually need that much every time we receive an skb.

Thanks,
Jason
On Tue, 2023-02-21 at 21:39 +0800, Jason Xing wrote:
> On Tue, Feb 21, 2023 at 8:27 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > On Tue, 2023-02-21 at 19:03 +0800, Jason Xing wrote:
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > Quoting from the commit 7c80b038d23e ("net: fix sk_wmem_schedule()
> > > and sk_rmem_schedule() errors"):
> > >
> > > "If sk->sk_forward_alloc is 150000, and we need to schedule 150001 bytes,
> > > we want to allocate 1 byte more (rounded up to one page),
> > > instead of 150001"
> >
> > I'm wondering if this would cause measurable (even small) performance
> > regression? Specifically under high packet rate, with BH and user-space
> > processing happening on different CPUs.
> >
> > Could you please provide the relevant performance figures?
>
> Sure, I've done some basic tests on my machine as below.
>
> Environment: 16 cpus, 60G memory
> Server: run "iperf3 -s -p [port]" command and start 500 processes.
> Client: run "iperf3 -u -c 127.0.0.1 -p [port]" command and start 500 processes.

Just for the records, with the above command each process will send
pkts at 1mbs - not very relevant performance wise.

Instead you could do:

taskset 0x2 iperf -s &
iperf -u -c 127.0.0.1 -b 0 -l 64

> In theory, I have no clue about why it could cause some regression?
> Maybe the memory allocation is not that enough compared to the original code?

As Eric noted, for UDP traffic, due to the expected average packet
size, sk_forward_alloc is touched quite frequently, both with and
without this patch, so there is little chance it will have any
performance impact.

Cheers,

Paolo
On Tue, Feb 21, 2023 at 10:46 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Tue, 2023-02-21 at 21:39 +0800, Jason Xing wrote:
> > On Tue, Feb 21, 2023 at 8:27 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > >
> > > On Tue, 2023-02-21 at 19:03 +0800, Jason Xing wrote:
> > > > From: Jason Xing <kernelxing@tencent.com>
> > > >
> > > > Quoting from the commit 7c80b038d23e ("net: fix sk_wmem_schedule()
> > > > and sk_rmem_schedule() errors"):
> > > >
> > > > "If sk->sk_forward_alloc is 150000, and we need to schedule 150001 bytes,
> > > > we want to allocate 1 byte more (rounded up to one page),
> > > > instead of 150001"
> > >
> > > I'm wondering if this would cause measurable (even small) performance
> > > regression? Specifically under high packet rate, with BH and user-space
> > > processing happening on different CPUs.
> > >
> > > Could you please provide the relevant performance figures?
> >
> > Sure, I've done some basic tests on my machine as below.
> >
> > Environment: 16 cpus, 60G memory
> > Server: run "iperf3 -s -p [port]" command and start 500 processes.
> > Client: run "iperf3 -u -c 127.0.0.1 -p [port]" command and start 500 processes.
>
> Just for the records, with the above command each process will send
> pkts at 1mbs - not very relevant performance wise.
>
> Instead you could do:
>
> taskset 0x2 iperf -s &
> iperf -u -c 127.0.0.1 -b 0 -l 64
>

Thanks for your guidance.

Here're some numbers according to what you suggested, which I tested
several times.
----------|IFACE    rxpck/s    txpck/s    rxkB/s    txkB/s
Before:       lo  411073.41  411073.41  36932.38  36932.38
After:        lo  410308.73  410308.73  36863.81  36863.81

Above is one of many results which does not mean that the original
code absolutely outperforms.
The output is not that constant and stable, I think.

Please help me review those numbers.

> > In theory, I have no clue about why it could cause some regression?
> > Maybe the memory allocation is not that enough compared to the original code?
>
> As Eric noted, for UDP traffic, due to the expected average packet
> size, sk_forward_alloc is touched quite frequently, both with and
> without this patch, so there is little chance it will have any
> performance impact.

Well, I see.

Thanks,
Jason

>
> Cheers,
>
> Paolo
>
On Tue, Feb 21, 2023 at 11:46 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Tue, Feb 21, 2023 at 10:46 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > On Tue, 2023-02-21 at 21:39 +0800, Jason Xing wrote:
> > > On Tue, Feb 21, 2023 at 8:27 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > >
> > > > On Tue, 2023-02-21 at 19:03 +0800, Jason Xing wrote:
> > > > > From: Jason Xing <kernelxing@tencent.com>
> > > > >
> > > > > Quoting from the commit 7c80b038d23e ("net: fix sk_wmem_schedule()
> > > > > and sk_rmem_schedule() errors"):
> > > > >
> > > > > "If sk->sk_forward_alloc is 150000, and we need to schedule 150001 bytes,
> > > > > we want to allocate 1 byte more (rounded up to one page),
> > > > > instead of 150001"
> > > >
> > > > I'm wondering if this would cause measurable (even small) performance
> > > > regression? Specifically under high packet rate, with BH and user-space
> > > > processing happening on different CPUs.
> > > >
> > > > Could you please provide the relevant performance figures?
> > >
> > > Sure, I've done some basic tests on my machine as below.
> > >
> > > Environment: 16 cpus, 60G memory
> > > Server: run "iperf3 -s -p [port]" command and start 500 processes.
> > > Client: run "iperf3 -u -c 127.0.0.1 -p [port]" command and start 500 processes.
> >
> > Just for the records, with the above command each process will send
> > pkts at 1mbs - not very relevant performance wise.
> >
> > Instead you could do:
> >
> > taskset 0x2 iperf -s &
> > iperf -u -c 127.0.0.1 -b 0 -l 64
> >
>
> Thanks for your guidance.
>
> Here're some numbers according to what you suggested, which I tested
> several times.
> ----------|IFACE    rxpck/s    txpck/s    rxkB/s    txkB/s
> Before:       lo  411073.41  411073.41  36932.38  36932.38
> After:        lo  410308.73  410308.73  36863.81  36863.81
>
> Above is one of many results which does not mean that the original
> code absolutely outperforms.
> The output is not that constant and stable, I think.

Today, I ran the same test on other servers, it looks the same as
above. Those results fluctuate within ~2%.

Oh, one more thing I forgot to say is the output of iperf itself which
doesn't show any difference.
Before: Bitrate is 211 - 212 Mbits/sec
After: Bitrate is 211 - 212 Mbits/sec
So this result is relatively constant especially if we keep running
the test over 2 minutes.

Jason

> Please help me review those numbers.
>
> > > In theory, I have no clue about why it could cause some regression?
> > > Maybe the memory allocation is not that enough compared to the
> > > original code?
> >
> > As Eric noted, for UDP traffic, due to the expected average packet
> > size, sk_forward_alloc is touched quite frequently, both with and
> > without this patch, so there is little chance it will have any
> > performance impact.
>
> Well, I see.
>
> Thanks,
> Jason
>
> > Cheers,
> >
> > Paolo
> >
On Wed, 2023-02-22 at 11:47 +0800, Jason Xing wrote:
> On Tue, Feb 21, 2023 at 11:46 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Tue, Feb 21, 2023 at 10:46 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > >
> > > On Tue, 2023-02-21 at 21:39 +0800, Jason Xing wrote:
> > > > On Tue, Feb 21, 2023 at 8:27 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > >
> > > > > On Tue, 2023-02-21 at 19:03 +0800, Jason Xing wrote:
> > > > > > From: Jason Xing <kernelxing@tencent.com>
> > > > > >
> > > > > > Quoting from the commit 7c80b038d23e ("net: fix sk_wmem_schedule()
> > > > > > and sk_rmem_schedule() errors"):
> > > > > >
> > > > > > "If sk->sk_forward_alloc is 150000, and we need to schedule 150001 bytes,
> > > > > > we want to allocate 1 byte more (rounded up to one page),
> > > > > > instead of 150001"
> > > > >
> > > > > I'm wondering if this would cause measurable (even small) performance
> > > > > regression? Specifically under high packet rate, with BH and user-space
> > > > > processing happening on different CPUs.
> > > > >
> > > > > Could you please provide the relevant performance figures?
> > > >
> > > > Sure, I've done some basic tests on my machine as below.
> > > >
> > > > Environment: 16 cpus, 60G memory
> > > > Server: run "iperf3 -s -p [port]" command and start 500 processes.
> > > > Client: run "iperf3 -u -c 127.0.0.1 -p [port]" command and start 500 processes.
> > >
> > > Just for the records, with the above command each process will send
> > > pkts at 1mbs - not very relevant performance wise.
> > >
> > > Instead you could do:
> > >
> > > taskset 0x2 iperf -s &
> > > iperf -u -c 127.0.0.1 -b 0 -l 64
> > >
> >
> > Thanks for your guidance.
> >
> > Here're some numbers according to what you suggested, which I tested
> > several times.
> > ----------|IFACE    rxpck/s    txpck/s    rxkB/s    txkB/s
> > Before:       lo  411073.41  411073.41  36932.38  36932.38
> > After:        lo  410308.73  410308.73  36863.81  36863.81
> >
> > Above is one of many results which does not mean that the original
> > code absolutely outperforms.
> > The output is not that constant and stable, I think.
>
> Today, I ran the same test on other servers, it looks the same as
> above. Those results fluctuate within ~2%.
>
> Oh, one more thing I forgot to say is the output of iperf itself which
> doesn't show any difference.
> Before: Bitrate is 211 - 212 Mbits/sec
> After: Bitrate is 211 - 212 Mbits/sec
> So this result is relatively constant especially if we keep running
> the test over 2 minutes.

Thanks for the testing. My personal take on this one is that this is
more a refactor than a bug fix - as the amount of forward allocated
memory should always be negligible for UDP.

Still, it could make sense to keep the accounting schema consistent
across different protocols. I suggest reposting for net-next, when it
will re-open, additionally introducing __sk_mem_schedule() usage to
avoid code duplication.

Thanks,

Paolo
On Thu, Feb 23, 2023 at 4:39 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Wed, 2023-02-22 at 11:47 +0800, Jason Xing wrote:
> > On Tue, Feb 21, 2023 at 11:46 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >
> > > On Tue, Feb 21, 2023 at 10:46 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > >
> > > > On Tue, 2023-02-21 at 21:39 +0800, Jason Xing wrote:
> > > > > On Tue, Feb 21, 2023 at 8:27 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > > >
> > > > > > On Tue, 2023-02-21 at 19:03 +0800, Jason Xing wrote:
> > > > > > > From: Jason Xing <kernelxing@tencent.com>
> > > > > > >
> > > > > > > Quoting from the commit 7c80b038d23e ("net: fix sk_wmem_schedule()
> > > > > > > and sk_rmem_schedule() errors"):
> > > > > > >
> > > > > > > "If sk->sk_forward_alloc is 150000, and we need to schedule 150001 bytes,
> > > > > > > we want to allocate 1 byte more (rounded up to one page),
> > > > > > > instead of 150001"
> > > > > >
> > > > > > I'm wondering if this would cause measurable (even small) performance
> > > > > > regression? Specifically under high packet rate, with BH and user-space
> > > > > > processing happening on different CPUs.
> > > > > >
> > > > > > Could you please provide the relevant performance figures?
> > > > >
> > > > > Sure, I've done some basic tests on my machine as below.
> > > > >
> > > > > Environment: 16 cpus, 60G memory
> > > > > Server: run "iperf3 -s -p [port]" command and start 500 processes.
> > > > > Client: run "iperf3 -u -c 127.0.0.1 -p [port]" command and start 500 processes.
> > > >
> > > > Just for the records, with the above command each process will send
> > > > pkts at 1mbs - not very relevant performance wise.
> > > >
> > > > Instead you could do:
> > > >
> > > > taskset 0x2 iperf -s &
> > > > iperf -u -c 127.0.0.1 -b 0 -l 64
> > > >
> > >
> > > Thanks for your guidance.
> > >
> > > Here're some numbers according to what you suggested, which I tested
> > > several times.
> > > ----------|IFACE    rxpck/s    txpck/s    rxkB/s    txkB/s
> > > Before:       lo  411073.41  411073.41  36932.38  36932.38
> > > After:        lo  410308.73  410308.73  36863.81  36863.81
> > >
> > > Above is one of many results which does not mean that the original
> > > code absolutely outperforms.
> > > The output is not that constant and stable, I think.
> >
> > Today, I ran the same test on other servers, it looks the same as
> > above. Those results fluctuate within ~2%.
> >
> > Oh, one more thing I forgot to say is the output of iperf itself which
> > doesn't show any difference.
> > Before: Bitrate is 211 - 212 Mbits/sec
> > After: Bitrate is 211 - 212 Mbits/sec
> > So this result is relatively constant especially if we keep running
> > the test over 2 minutes.
>
> Thanks for the testing. My personal take on this one is that this is
> more a refactor than a bug fix - as the amount of forward allocated
> memory should always be negligible for UDP.
>
> Still, it could make sense to keep the accounting schema consistent
> across different protocols. I suggest reposting for net-next, when it
> will re-open, additionally introducing __sk_mem_schedule() usage to
> avoid code duplication.

Thanks for the review. I will replace this part with __sk_mem_schedule()
and then repost it after Mar 6th.

Thanks,
Jason

> Thanks,
>
> Paolo
>
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 9592fe3e444a..a13f622cfa36 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1567,16 +1567,16 @@ int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb)
 		goto uncharge_drop;
 
 	spin_lock(&list->lock);
-	if (size >= sk->sk_forward_alloc) {
-		amt = sk_mem_pages(size);
-		delta = amt << PAGE_SHIFT;
+	if (size > sk->sk_forward_alloc) {
+		delta = size - sk->sk_forward_alloc;
+		amt = sk_mem_pages(delta);
 		if (!__sk_mem_raise_allocated(sk, delta, amt, SK_MEM_RECV)) {
 			err = -ENOBUFS;
 			spin_unlock(&list->lock);
 			goto uncharge_drop;
 		}
 
-		sk->sk_forward_alloc += delta;
+		sk->sk_forward_alloc += amt << PAGE_SHIFT;
 	}
 
 	sk->sk_forward_alloc -= size;
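For reference, the net-next follow-up Paolo suggested (using __sk_mem_schedule() instead of open-coding the raise) could look roughly like the fragment below. This is only a sketch of the idea, not the actual repost; the surrounding variables (sk, size, err, list) and the uncharge_drop label belong to the existing __udp_enqueue_schedule_skb().

	spin_lock(&list->lock);
	if (size > sk->sk_forward_alloc) {
		/* Let the core helper round the missing bytes up to pages
		 * and update sk_forward_alloc on success.
		 */
		if (!__sk_mem_schedule(sk, size - sk->sk_forward_alloc,
				       SK_MEM_RECV)) {
			err = -ENOBUFS;
			spin_unlock(&list->lock);
			goto uncharge_drop;
		}
	}

	sk->sk_forward_alloc -= size;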