Message ID | 20230627110038.GCZJrBVqu/4BfdyBeN@fat_crate.local |
---|---|
State | New |
Headers | Date: Tue, 27 Jun 2023 13:00:38 +0200 From: Borislav Petkov <bp@alien8.de> To: Linus Torvalds <torvalds@linux-foundation.org> Cc: x86-ml <x86@kernel.org>, lkml <linux-kernel@vger.kernel.org> Subject: [GIT PULL] x86/misc for 6.5 Message-ID: <20230627110038.GCZJrBVqu/4BfdyBeN@fat_crate.local> |
Series | [GIT,PULL] x86/misc for 6.5 |
Pull-request
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git tags/x86_misc_for_v6.5

Message
Borislav Petkov
June 27, 2023, 11 a.m. UTC
Hi Linus,

please pull the misc pile of updates for 6.5.

Thx.

---

The following changes since commit ac9a78681b921877518763ba0e89202254349d1b:

  Linux 6.4-rc1 (2023-05-07 13:34:35 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git tags/x86_misc_for_v6.5

for you to fetch changes up to 5516c89d58283413134f8d26960c6303d5d5bd89:

  x86/lib: Make get/put_user() exception handling a visible symbol (2023-06-02 10:51:46 +0200)

----------------------------------------------------------------
- Remove the local symbols prefix of the get/put_user() exception handling
  symbols so that tools do not get confused by the presence of code belonging
  to the wrong symbol/not belonging to any symbol

- Improve csum_partial()'s performance

- Some improvements to the kcpuid tool

----------------------------------------------------------------
Borislav Petkov (AMD) (1):
      tools/x86/kcpuid: Dump the correct CPUID function in error

Nadav Amit (1):
      x86/lib: Make get/put_user() exception handling a visible symbol

Nathan Chancellor (1):
      x86/csum: Fix clang -Wuninitialized in csum_partial()

Noah Goldstein (1):
      x86/csum: Improve performance of `csum_partial`

Rong Tao (1):
      tools/x86/kcpuid: Add .gitignore

 arch/x86/lib/csum-partial_64.c   | 101 ++++++++----
 arch/x86/lib/getuser.S           |  32 ++--
 arch/x86/lib/putuser.S           |  24 +--
 lib/Kconfig.debug                |  17 ++
 lib/Makefile                     |   1 +
 lib/checksum_kunit.c             | 334 +++++++++++++++++++++++++++++++++++++++
 tools/arch/x86/kcpuid/.gitignore |   1 +
 tools/arch/x86/kcpuid/kcpuid.c   |   7 +-
 8 files changed, 453 insertions(+), 64 deletions(-)
 create mode 100644 lib/checksum_kunit.c
 create mode 100644 tools/arch/x86/kcpuid/.gitignore
Comments
On Tue, 27 Jun 2023 at 04:00, Borislav Petkov <bp@alien8.de> wrote:
>
> - Improve csum_partial()'s performance

Honestly, looking at that patch, my reaction is "why did it get
unrolled in 64-byte chunks, if 40 bytes is the magic value"?

Particularly when there is then that "do a carry op each 32 bytes to
make 32-byte chunks independent and increase ILP". So even the 64-byte
case isn't *actually* doing a 64-byte unrolling, it's really doing two
32-byte unrollings in parallel.

So you have three "magic" values, and the only one that really matters
is likely the 40-byte one.

Yes, yes, 64 bytes is the usual cacheline size, and is "traditional"
for unrolling. But there's nothing really magical about it here.

End result: wouldn't it have been nice to just do 40-byte chunks, and
make the 64-byte "two overlapping 32-byte chunks" be two of the
40-byte chunks.

Something like the (ENTIRELY UNTESTED!) attached patch?

Again: this is *not* tested. I took a quick look at the generated
assembly, and it looked roughly like what I expected it to look like,
but it may be complete garbage.

I added a couple of "likely()" things just because it made the
generated asm look more natural (ie it followed the order of the
source code there), they are otherwise questionable annotations.

Finally: did I already mention that this is completely untested?

               Linus
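[Editor's note: the "two independent 32-byte carry chains" idea is easy to see in portable C. The sketch below is purely illustrative and is not the kernel code; `csum_add64` uses `__int128` as a portable stand-in for what an `addq`/`adcq` pair computes on x86.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 64-bit one's-complement add: sum + v with the carry-out folded
 * back in. This is what an addq/adcq pair computes on x86. */
static uint64_t csum_add64(uint64_t sum, uint64_t v)
{
    unsigned __int128 t = (unsigned __int128)sum + v;
    return (uint64_t)t + (uint64_t)(t >> 64);
}

/* Illustrative version of the "two independent chains" trick: each
 * 64-byte block is summed as two 32-byte halves into separate
 * accumulators, so the two carry chains can execute in parallel. */
static uint64_t csum_64byte_blocks(const uint64_t *p, size_t nblocks)
{
    uint64_t a = 0, b = 0;

    while (nblocks--) {
        a = csum_add64(a, p[0]);
        a = csum_add64(a, p[1]);
        a = csum_add64(a, p[2]);
        a = csum_add64(a, p[3]);
        b = csum_add64(b, p[4]);
        b = csum_add64(b, p[5]);
        b = csum_add64(b, p[6]);
        b = csum_add64(b, p[7]);
        p += 8;
    }
    return csum_add64(a, b);
}
```

The out-of-order machine can interleave the `a` and `b` chains because neither depends on the other's carry; only the final `csum_add64(a, b)` joins them.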
On Tue, 27 Jun 2023 at 13:11, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Finally: did I already mention that this is completely untested?

Oh, this part is buggy:

+       asm("addq %1,%0\n\t"
+           "adcq $0,%0"
+           :"=r" (temp64): "r" (temp64_2));

and it needs to show that 'temp64' is an input too. Dummy me.

The trivial fix is just to make the "=r" be a "+r".

In fact, I should have used "+r" inside update_csum_40b(), but at
least there I did add the proper input constraint, so that one isn't
actively buggy.

And again: I noticed this by looking at the patch one more time. No
actual *testing* has happened. It might still be buggy garbage even
with that "+r". It's just a bit *less* buggy garbage.

I will now go back to my cave and continue pulling stuff, I just had
to do something else for a while. Some people relax with a nice drink
by the pool, I relax by playing around with inline asm.

               Linus
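[Editor's note: for readers less fluent in GCC constraint syntax: `"=r"` tells the compiler the register is write-only, so it may not bother loading `temp64` into it first; `"+r"` marks it read-write. A portable sketch of what the corrected `addq`/`adcq` pair computes, using the GCC/Clang `__builtin_add_overflow` builtin instead of asm:]

```c
#include <assert.h>
#include <stdint.h>

/* The corrected asm from the thread both reads and writes temp64, so
 * its constraint must be "+r" (read-write), not "=r" (write-only):
 *
 *     asm("addq %1,%0\n\t"
 *         "adcq $0,%0"
 *         : "+r" (temp64) : "r" (temp64_2));
 *
 * Portable equivalent of that pair: a 64-bit add whose carry-out is
 * folded back into the sum (end-around carry). */
static uint64_t add_with_carry(uint64_t a, uint64_t b)
{
    uint64_t sum;
    int carry = __builtin_add_overflow(a, b, &sum);

    return sum + (uint64_t)carry;
}
```

With `"=r"`, the compiler is free to put `temp64_2` into an uninitialized register and add garbage to it; the code may even appear to work under light testing, which is exactly why the bug was caught by inspection rather than by running it.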
On Tue, Jun 27, 2023 at 01:26:18PM -0700, Linus Torvalds wrote:
> I will now go back to my cave and continue pulling stuff, I just had
> to do something else for a while. Some people relax with a nice drink
> by the pool, I relax by playing around with inline asm.

And there's a third kind who relax by the pool with a nice drink,
*while* playing around with inline asm. ;-P

Btw, I'll send you a new version of this pull request with this patch
dropped to let folks experiment with it more.

Thx.
On Tue, 27 Jun 2023 at 13:38, Borislav Petkov <bp@alien8.de> wrote:
>
> And there's a third kind who relax by the pool with a nice drink,
> *while* playing around with inline asm. ;-P

That explains a lot.

> Btw, I'll send you a new version of this pull request with this patch
> dropped to let folks experiment with it more.

Oh, I already merged it.

I don't hate the change, I just looked at it and went "I would have
done that differently" and started playing around with it.

There's nothing hugely *wrong* with the code I merged, but I do think
that it did too much inside the inline asm (ie looping inside the asm,
but also initializing values that could have - and should have - just
been given as inputs to the asm).

And the whole "why have two different versions for 40-byte and 64-byte
areas, when you _could_ just do it with one 40-byte one that you then
also just unroll".

So I _think_ my version is nicer and shorter - assuming it works and
there are no other bugs than the one I already noticed - but I don't
think it's a huge deal.

Anyway, before I throw my patch away, I'll just post it with the
trivial fixes to use "+r", and with the "volatile" removed (I add
"volatile" to asms by habit, but this one really isn't volatile).

I just checked that both gcc and clang seem to be happy with it, but
that's the only testing this patch has gotten: it compiles for me.

Do with it what you will.

               Linus
The pull request you sent on Tue, 27 Jun 2023 13:00:38 +0200:
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git tags/x86_misc_for_v6.5
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/4baa098a147d76a9ad1a6867fa14286db52085b6
Thank you!
On Tue, Jun 27, 2023 at 01:49:12PM -0700, Linus Torvalds wrote:
> That explains a lot.
LOL!
That activity has one rule, though: don't send the code on the same
day as the pool visit. :-)
On 6/27/2023 1:11 PM, Linus Torvalds wrote:
> On Tue, 27 Jun 2023 at 04:00, Borislav Petkov <bp@alien8.de> wrote:
>>
>> - Improve csum_partial()'s performance
>
> Honestly, looking at that patch, my reaction is "why did it get
> unrolled in 64-byte chunks, if 40 bytes is the magic value"?
>
> Particularly when there is then that "do a carry op each 32 bytes to
> make 32-byte chunks independent and increase ILP". So even the 64-byte
> case isn't *actually* doing a 64-byte unrolling, it's really doing two
> 32-byte unrollings in parallel.
>
> So you have three "magic" values, and the only one that really matters
> is likely the 40-byte one.
>
> Yes, yes, 64 bytes is the usual cacheline size, and is "traditional"
> for unrolling. But there's nothing really magical about it here.
>
> End result: wouldn't it have been nice to just do 40-byte chunks, and
> make the 64-byte "two overlapping 32-byte chunks" be two of the
> 40-byte chunks.
>
> Something like the (ENTIRELY UNTESTED!) attached patch?
>
> Again: this is *not* tested. I took a quick look at the generated
> assembly, and it looked roughly like what I expected it to look like,
> but it may be complete garbage.
>
> I added a couple of "likely()" things just because it made the
> generated asm look more natural (ie it followed the order of the
> source code there), they are otherwise questionable annotations.
>
> Finally: did I already mention that this is completely untested?

fwiw long flights and pools have a relation; I made a userspace testbench
for this some time ago: https://github.com/fenrus75/csum_partial
in case one would actually WANT to test ;)
On Tue, 27 Jun 2023 at 13:49, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Anyway, before I throw my patch away, I'll just post it with the
> trivial fixes to use "+r", and with the "volatile" removed (I add
> "volatile" to asms by habit, but this one really isn't volatile).

Oh, never mind.

I was about to throw it away, and then I realized that the code
*after* the loop relied on the range having been reduced down to below
64 bytes, and checked for 32/16/8/4 byte ranges.

And my change to make it loop over 80 bytes had made that no longer be
true.

But now I'm committed, and decided to fix that too, and just
re-organize the code to get all the cases right.

And now I'm going to actually boot-test the end result too. Because
life is too short to spend all my time _just_ with merging.

               Linus
On Tue, 27 Jun 2023 at 14:44, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> But now I'm committed, and decided to fix that too, and just
> re-organize the code to get all the cases right.
>
> And now I'm going to actually boot-test the end result too. Because
> life is too short to spend all my time _just_ with merging.

Well, it boots. And I clearly have networking. But who knows how much
that is actually using the csum_partial() function? Not me. I'm just
along for the ride.

Anyway, that last version handles the 40-byte special case
differently, in that it might have done some arbitrary number of
80-byte chunks first. But it shouldn't really make a difference - it
does check for >= 80 bytes first, but we're talking two extra
instructions.

And that way the end case is always less than 64 bytes, and so the
tests for 32/16/8 work fine.

And now it's committed to my test tree, so I'm not throwing it away,
but I also won't be working on it any more. If somebody wants to time
it using Arjan's little thing, more power to them.

               Linus
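[Editor's note: the invariant being discussed is that the main loop must leave fewer than 64 bytes before the 32/16/8-byte tail tests, or those tests are no longer exhaustive. A portable illustration of such a tail handler, with a hypothetical name and `__int128` standing in for the x86 carry arithmetic:]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 64-bit add with end-around carry (portable stand-in for addq/adcq). */
static uint64_t add_c(uint64_t a, uint64_t b)
{
    unsigned __int128 t = (unsigned __int128)a + b;
    return (uint64_t)t + (uint64_t)(t >> 64);
}

/* Hypothetical tail handler: REQUIRES len < 64. It peels one 32-byte,
 * one 16-byte, and one 8-byte piece, then folds the last 0..7 bytes
 * zero-padded. If the main loop can exit with >= 64 bytes left, some
 * data is silently skipped - the bug class Linus describes. */
static uint64_t csum_tail(const unsigned char *p, size_t len, uint64_t sum)
{
    uint64_t w;

    if (len >= 32) {
        for (int i = 0; i < 4; i++) {
            memcpy(&w, p + 8 * i, 8);
            sum = add_c(sum, w);
        }
        p += 32; len -= 32;
    }
    if (len >= 16) {
        memcpy(&w, p, 8);     sum = add_c(sum, w);
        memcpy(&w, p + 8, 8); sum = add_c(sum, w);
        p += 16; len -= 16;
    }
    if (len >= 8) {
        memcpy(&w, p, 8);
        sum = add_c(sum, w);
        p += 8; len -= 8;
    }
    if (len) {                  /* 1..7 trailing bytes, zero-padded */
        w = 0;
        memcpy(&w, p, len);
        sum = add_c(sum, w);
    }
    return sum;
}
```

Switching the main loop from 64-byte to 80-byte strides changes the possible remainders from 0..63 to 0..79, which is exactly why the follow-up reorganization was needed.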
On Tue, 27 Jun 2023 at 14:44, Arjan van de Ven <arjan@linux.intel.com> wrote:
>
> fwiw long flights and pools have a relation; I made a userspace testbench
> for this some time ago: https://github.com/fenrus75/csum_partial
> in case one would actually WANT to test ;)

Hmm.

I don't know what the rules are - and some of the functions you test
seem actively buggy (ie not handling alignment right etc).

But on my machine I get:

   02:      8.6 /     10.4 cycles (e29e455e) Upcoming linux kernel version
   04:      8.6 /     10.4 cycles (e29e455e) Specialized to size 40
   06:      7.7 /      9.5 cycles (e29e455e) New version
   22:      8.7 /      9.6 cycles (e29e455e) Odd-alignment handling removed

... which would seem to mean that my code ("New version") is doing well.

It does do worse on the "odd alignment" case:

   03:     15.5 /     17.8 cycles (00006580) Upcoming linux kernel version
   05:     15.5 /     17.8 cycles (00006580) Specialized to size 40
   07:     16.6 /     19.5 cycles (0000bc29) New version
   23:      8.8 /      8.6 cycles (1de29e47) Odd-alignment handling removed

... I just hacked the code into the benchmark without looking too
closely at what is going on, so no guarantees.

               Linus
On Tue, 27 Jun 2023 at 15:25, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I don't know what the rules are - and some of the functions you test
> seem actively buggy (ie not handling alignment right etc).

Oh. And you *only* test the 40-byte case. That seems a bit bogus.

If I change the packet size from 40 to 1500, I get

   02:    185.1 /    186.4 cycles (8b414316) Upcoming linux kernel version
   04:    184.9 /    186.5 cycles (8b414316) Specialized to size 40
   06:    107.3 /    117.2 cycles (8b414316) New version
   22:    185.6 /    186.5 cycles (8b414316) Odd-alignment handling removed

which seems unexpectedly bad for the other versions. But those other
functions have that 64-byte unrolling, rather than the "two 40-byte
loops", so maybe it is real, and my version is actually just that good.

Or maybe it's a sign that my version is some seriously buggy crap, and
it just looks good on the benchmark because it does the wrong thing.

Whatever. Back to the merge window again.

               Linus
On 6/27/2023 3:43 PM, Linus Torvalds wrote:
> On Tue, 27 Jun 2023 at 15:25, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> I don't know what the rules are - and some of the functions you test
>> seem actively buggy (ie not handling alignment right etc).
>
> Oh. And you *only* test the 40-byte case. That seems a bit bogus.
>
> If I change the packet size from 40 to 1500, I get
>
>    02:    185.1 /    186.4 cycles (8b414316) Upcoming linux kernel version
>    04:    184.9 /    186.5 cycles (8b414316) Specialized to size 40
>    06:    107.3 /    117.2 cycles (8b414316) New version
>    22:    185.6 /    186.5 cycles (8b414316) Odd-alignment handling removed
>
> which seems unexpectedly bad for the other versions.

I'm not surprised though; running 2 parallel streams (where one stream
has a fixed zero as input, so can run OOO any time) .. can really have
a performance change like this
On Tue, 27 Jun 2023 at 15:51, Arjan van de Ven <arjan@linux.intel.com> wrote:
>
> I'm not surprised though; running 2 parallel streams (where one stream
> has a fixed zero as input, so can run OOO any time) .. can really have
> a performance change like this

How much do people care?

One of the advantages of just having that single "update_csum_40b()"
function is that it's trivial to then manually unroll.

With a 4-way unrolling, I get

   02:    184.0 /    184.5 cycles (8b414316) Upcoming linux kernel version
   04:    184.0 /    184.2 cycles (8b414316) Specialized to size 40
   06:     89.4 /    102.5 cycles (512daed6) New version
   22:    184.6 /    184.4 cycles (8b414316) Odd-alignment handling removed

but doesn't most network hardware do the csum on its own anyway? How
critical is csum_partial(), really?

(The above is obviously your test thing modified for 1500 byte
packets, still. With 40-byte packets, the 4-way unrolling obviously
doesn't help, although it doesn't noticeably hurt either - it's just
one more compare and branch)

               Linus
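[Editor's note: the "single building block, trivially unrolled" structure can be sketched in portable C. These are hypothetical stand-ins for `update_csum_40b()` and its 4-way unroll (the real code is C with x86 inline asm); `add_c` uses `__int128` in place of `adcq`.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 64-bit add with end-around carry (portable stand-in for addq/adcq). */
static uint64_t add_c(uint64_t a, uint64_t b)
{
    unsigned __int128 t = (unsigned __int128)a + b;
    return (uint64_t)t + (uint64_t)(t >> 64);
}

/* Hypothetical stand-in for update_csum_40b(): fold five 64-bit
 * words (40 bytes) into the running sum. */
static uint64_t update_csum_40b(uint64_t sum, const unsigned char *p)
{
    for (int i = 0; i < 5; i++) {
        uint64_t w;
        memcpy(&w, p + 8 * i, 8);
        sum = add_c(sum, w);
    }
    return sum;
}

/* 4-way unroll: four independent 40-byte chains per 160-byte
 * iteration, joined only at the end, so their carry chains can
 * overlap in the out-of-order core. */
static uint64_t csum_160b_blocks(const unsigned char *p, size_t n,
                                 uint64_t sum)
{
    uint64_t s1 = 0, s2 = 0, s3 = 0;

    while (n--) {
        sum = update_csum_40b(sum, p);
        s1  = update_csum_40b(s1, p + 40);
        s2  = update_csum_40b(s2, p + 80);
        s3  = update_csum_40b(s3, p + 120);
        p += 160;
    }
    return add_c(add_c(sum, s1), add_c(s2, s3));
}
```

For a 40-byte header the unrolled loop body never executes, which is why the unroll costs only one extra compare-and-branch in the small case.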
On 6/27/2023 4:02 PM, Linus Torvalds wrote:
> On Tue, 27 Jun 2023 at 15:51, Arjan van de Ven <arjan@linux.intel.com> wrote:
>>
>> I'm not surprised though; running 2 parallel streams (where one stream
>> has a fixed zero as input, so can run OOO any time) .. can really have
>> a performance change like this
>
> How much do people care?
>
> One of the advantages of just having that single "update_csum_40b()"
> function is that it's trivial to then manually unroll.
>
> With a 4-way unrolling, I get
>
>    02:    184.0 /    184.5 cycles (8b414316) Upcoming linux kernel version
>    04:    184.0 /    184.2 cycles (8b414316) Specialized to size 40
>    06:     89.4 /    102.5 cycles (512daed6) New version
>    22:    184.6 /    184.4 cycles (8b414316) Odd-alignment handling removed
>
> but doesn't most network hardware do the csum on its own anyway? How
> critical is csum_partial(), really?

the hardware does most cases.. in
https://lore.kernel.org/netdev/20211111181025.2139131-1-eric.dumazet@gmail.com/
Eric kind of implies it's for IPv6 headers in practice
Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> but doesn't most network hardware do the csum on its own anyway? How
> critical is csum_partial(), really?

That hardware will checksum the bulk of the packet, but we still need
to checksum the header bits (e.g., 40 bytes or less) when we add and
remove headers in software.

The one exception is Realtek drivers, which seem to come with checksum
offload disabled by default.

Cheers,
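[Editor's note: for context, the value `csum_partial()` feeds into for those headers is the standard RFC 1071 Internet checksum: 16-bit one's-complement sum, carries folded back in, result complemented. A minimal, unoptimized reference version (not the kernel's implementation):]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum: sum 16-bit big-endian words with
 * end-around carry, then take the one's complement. The kernel's
 * csum_partial() computes the same mathematical result, just with
 * wide 64-bit adds and heavy unrolling. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += (uint32_t)data[0] << 8 | data[1];
        data += 2;
        len -= 2;
    }
    if (len)                        /* odd trailing byte, zero-padded */
        sum += (uint32_t)data[0] << 8;
    while (sum >> 16)               /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

A handy property of one's-complement arithmetic: checksumming a header whose checksum field already holds the correct value yields 0, which is how receivers verify headers.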