Message ID: 20231031-optimize_checksum-v9-0-ea018e69b229@rivosinc.com
Series: riscv: Add fine-tuned checksum functions
Message
Charlie Jenkins
Nov. 1, 2023, 12:18 a.m. UTC
Each architecture generally implements fine-tuned checksum functions to
leverage the instruction set. This patch adds the main checksum
functions that are used in networking.
This patch makes heavy use of the Zbb extension through alternatives
patching.
To test this patch, enable the configs for KUNIT, then CHECKSUM_KUNIT
and RISCV_CHECKSUM_KUNIT.
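For reference, the test setup described above corresponds to a .config fragment along these lines (option names are taken from the cover letter; building the KUnit options in rather than as modules is an assumption):

```
CONFIG_KUNIT=y
CONFIG_CHECKSUM_KUNIT=y
CONFIG_RISCV_CHECKSUM_KUNIT=y
```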
I have attempted to make these functions as optimal as possible, but I
have not run anything on actual riscv hardware. My performance testing
has been limited to inspecting the assembly, running the algorithms on
x86 hardware, and running in QEMU.
ip_fast_csum is a relatively small function, so even though it is
possible to read 64 bits at a time on compatible hardware, the
bottleneck becomes the cleanup and setup code; loading 32 bits at a
time is actually faster.
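The tradeoff described above can be illustrated with a generic, non-kernel sketch of the 16-bit ones'-complement sum that functions like ip_fast_csum compute over an IP header. This is an illustration only, not the code from the series; the function name is hypothetical. The fold at the end is a fixed cost regardless of buffer length, which is why it dominates for short inputs like IP headers:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical illustration of the 16-bit ones'-complement sum.
 * 16-bit words are accumulated into a wider register, then the
 * carries are folded back into the low 16 bits and the result is
 * inverted. The fold/cleanup steps run once per call, so for very
 * short buffers they dominate the per-word summing loop. */
static uint16_t ones_complement_sum(const uint16_t *buf, size_t nwords)
{
	uint32_t sum = 0;

	while (nwords--)
		sum += *buf++;

	/* Fold the carries back into the low 16 bits. Two folds are
	 * enough: the first can produce at most one new carry bit. */
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

A correct IP header checksums to zero when this sum is recomputed over the header including its checksum field, which is how receivers validate it.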
Relies on https://lore.kernel.org/lkml/20230920193801.3035093-1-evan@rivosinc.com/
---
The algorithm proposed to replace the default csum_fold can be seen to
compute the same result by running all 2^32 possible inputs:

#include <stdio.h>

static inline unsigned int ror32(unsigned int word, unsigned int shift)
{
	return (word >> (shift & 31)) | (word << ((-shift) & 31));
}

/* The generic csum_fold: fold the upper half in twice, then invert. */
unsigned short csum_fold(unsigned int csum)
{
	unsigned int sum = csum;

	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return ~sum;
}

/* The arc-style replacement: one rotate instead of two folds. */
unsigned short csum_fold_arc(unsigned int csum)
{
	return ((~csum - ror32(csum, 16)) >> 16);
}

int main()
{
	unsigned int start = 0x0;

	do {
		if (csum_fold(start) != csum_fold_arc(start)) {
			printf("Not the same %u\n", start);
			return -1;
		}
		start += 1;
	} while (start != 0x0);
	printf("The same\n");
	return 0;
}
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
To: Charlie Jenkins <charlie@rivosinc.com>
To: Palmer Dabbelt <palmer@dabbelt.com>
To: Conor Dooley <conor@kernel.org>
To: Samuel Holland <samuel.holland@sifive.com>
To: David Laight <David.Laight@aculab.com>
To: Xiao Wang <xiao.w.wang@intel.com>
To: Evan Green <evan@rivosinc.com>
To: linux-riscv@lists.infradead.org
To: linux-kernel@vger.kernel.org
To: linux-arch@vger.kernel.org
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
Changes in v9:
- Use ror64 (Xiao)
- Move do_csum and csum_ipv6_magic headers to patch 4 (Xiao)
- Remove word "IP" from checksum headers (Xiao)
- Swap to using ifndef CONFIG_32BIT instead of ifdef CONFIG_64BIT (Xiao)
- Run no alignment code when buff is aligned (Xiao)
- Consolidate the overlap between the two do_csum implementations into do_csum_common
- Link to v8: https://lore.kernel.org/r/20231027-optimize_checksum-v8-0-feb7101d128d@rivosinc.com
Changes in v8:
- Speedups of 12% without Zbb and 21% with Zbb when cpu supports fast
misaligned accesses for do_csum
- Various formatting updates
- Patch now relies on https://lore.kernel.org/lkml/20230920193801.3035093-1-evan@rivosinc.com/
- Link to v7: https://lore.kernel.org/r/20230919-optimize_checksum-v7-0-06c7d0ddd5d6@rivosinc.com
Changes in v7:
- Included linux/bitops.h in asm-generic/checksum.h to use ror (Conor)
- Optimized loop in do_csum (David)
- Used ror instead of shifting (David)
- Unfortunately had to reintroduce ifdefs because gcc is not smart
enough to not throw warnings on code that will never execute
- Use ifdef instead of IS_ENABLED on __LITTLE_ENDIAN because IS_ENABLED
does not work on that
- Only optimize for zbb when alternatives is enabled in do_csum
- Link to v6: https://lore.kernel.org/r/20230915-optimize_checksum-v6-0-14a6cf61c618@rivosinc.com
Changes in v6:
- Fix accuracy of commit message for csum_fold
- Fix indentation
- Link to v5: https://lore.kernel.org/r/20230914-optimize_checksum-v5-0-c95b82a2757e@rivosinc.com
Changes in v5:
- Drop vector patches
- Check ZBB enabled before doing any ZBB code (Conor)
- Check endianness in IS_ENABLED
- Revert to the simpler non-tree-based version of csum_ipv6_magic since
David pointed out that the tree-based version is not better.
- Link to v4: https://lore.kernel.org/r/20230911-optimize_checksum-v4-0-77cc2ad9e9d7@rivosinc.com
Changes in v4:
- Suggestion by David Laight to use an improved checksum used in
arch/arc.
- Eliminates zero-extension on rv32, but not on rv64.
- Reduces data dependency which should improve execution speed on
rv32 and rv64
- Still passes CHECKSUM_KUNIT and RISCV_CHECKSUM_KUNIT on rv32 and
rv64 with and without zbb.
- Link to v3: https://lore.kernel.org/r/20230907-optimize_checksum-v3-0-c502d34d9d73@rivosinc.com
Changes in v3:
- Use riscv_has_extension_likely and has_vector where possible (Conor)
- Reduce ifdefs by using IS_ENABLED where possible (Conor)
- Use kernel_vector_begin in the vector code (Samuel)
- Link to v2: https://lore.kernel.org/r/20230905-optimize_checksum-v2-0-ccd658db743b@rivosinc.com
Changes in v2:
- After more benchmarking, rework functions to improve performance.
- Remove tests that overlapped with the already existing checksum
tests and make tests more extensive.
- Use alternatives to activate code with Zbb and vector extensions
- Link to v1: https://lore.kernel.org/r/20230826-optimize_checksum-v1-0-937501b4522a@rivosinc.com
---
Charlie Jenkins (5):
asm-generic: Improve csum_fold
riscv: Add static key for misaligned accesses
riscv: Checksum header
riscv: Add checksum library
riscv: Test checksum functions
arch/riscv/Kconfig.debug | 1 +
arch/riscv/include/asm/checksum.h | 92 ++++++++++
arch/riscv/include/asm/cpufeature.h | 3 +
arch/riscv/kernel/cpufeature.c | 30 ++++
arch/riscv/lib/Kconfig.debug | 31 ++++
arch/riscv/lib/Makefile | 3 +
arch/riscv/lib/csum.c | 326 +++++++++++++++++++++++++++++++++
arch/riscv/lib/riscv_checksum_kunit.c | 330 ++++++++++++++++++++++++++++++++++
include/asm-generic/checksum.h | 6 +-
9 files changed, 819 insertions(+), 3 deletions(-)
---
base-commit: 8d68c506cd34a142331623fd23eb1c4e680e1955
change-id: 20230804-optimize_checksum-db145288ac21
Comments
On Tue, Oct 31, 2023 at 05:18:50PM -0700, Charlie Jenkins wrote:
> Each architecture generally implements fine-tuned checksum functions to
> leverage the instruction set. This patch adds the main checksum
> functions that are used in networking.
>
> This patch takes heavy use of the Zbb extension using alternatives
> patching.
>
> To test this patch, enable the configs for KUNIT, then CHECKSUM_KUNIT
> and RISCV_CHECKSUM_KUNIT.
>
> I have attempted to make these functions as optimal as possible, but I
> have not ran anything on actual riscv hardware. My performance testing
> has been limited to inspecting the assembly, running the algorithms on
> x86 hardware, and running in QEMU.
>
> ip_fast_csum is a relatively small function so even though it is
> possible to read 64 bits at a time on compatible hardware, the
> bottleneck becomes the clean up and setup code so loading 32 bits at a
> time is actually faster.
>
> Relies on https://lore.kernel.org/lkml/20230920193801.3035093-1-evan@rivosinc.com/

I coulda sworn I reported build issues against the v8 of this series
that are still present in this v9. For example:
https://patchwork.kernel.org/project/linux-riscv/patch/20231031-optimize_checksum-v9-3-ea018e69b229@rivosinc.com/

Cheers,
Conor.
On Wed, Nov 01, 2023 at 11:50:46AM +0000, Conor Dooley wrote:
> On Tue, Oct 31, 2023 at 05:18:50PM -0700, Charlie Jenkins wrote:
> > Each architecture generally implements fine-tuned checksum functions to
> > leverage the instruction set. This patch adds the main checksum
> > functions that are used in networking.
> >
> > This patch takes heavy use of the Zbb extension using alternatives
> > patching.
> >
> > To test this patch, enable the configs for KUNIT, then CHECKSUM_KUNIT
> > and RISCV_CHECKSUM_KUNIT.
> >
> > I have attempted to make these functions as optimal as possible, but I
> > have not ran anything on actual riscv hardware. My performance testing
> > has been limited to inspecting the assembly, running the algorithms on
> > x86 hardware, and running in QEMU.
> >
> > ip_fast_csum is a relatively small function so even though it is
> > possible to read 64 bits at a time on compatible hardware, the
> > bottleneck becomes the clean up and setup code so loading 32 bits at a
> > time is actually faster.
> >
> > Relies on https://lore.kernel.org/lkml/20230920193801.3035093-1-evan@rivosinc.com/
>
> I coulda sworn I reported build issues against the v8 of this series
> that are still present in this v9. For example:
> https://patchwork.kernel.org/project/linux-riscv/patch/20231031-optimize_checksum-v9-3-ea018e69b229@rivosinc.com/
>
> Cheers,
> Conor.

You did, and I fixed the build issues. This is another instance of how
Patchwork reports the results of the previous build before the new build
completes. Patchwork was very far behind so it took around 15 hours for
the result to be ready. There are some miscellaneous warnings in random
drivers that I don't think can be attributed to this patch.

- Charlie
On Wed, Nov 01, 2023 at 10:06:26AM -0700, Charlie Jenkins wrote:
> On Wed, Nov 01, 2023 at 11:50:46AM +0000, Conor Dooley wrote:
> > On Tue, Oct 31, 2023 at 05:18:50PM -0700, Charlie Jenkins wrote:
> > > Each architecture generally implements fine-tuned checksum functions to
> > > leverage the instruction set. This patch adds the main checksum
> > > functions that are used in networking.
> > >
> > > This patch takes heavy use of the Zbb extension using alternatives
> > > patching.
> > >
> > > To test this patch, enable the configs for KUNIT, then CHECKSUM_KUNIT
> > > and RISCV_CHECKSUM_KUNIT.
> > >
> > > I have attempted to make these functions as optimal as possible, but I
> > > have not ran anything on actual riscv hardware. My performance testing
> > > has been limited to inspecting the assembly, running the algorithms on
> > > x86 hardware, and running in QEMU.
> > >
> > > ip_fast_csum is a relatively small function so even though it is
> > > possible to read 64 bits at a time on compatible hardware, the
> > > bottleneck becomes the clean up and setup code so loading 32 bits at a
> > > time is actually faster.
> > >
> > > Relies on https://lore.kernel.org/lkml/20230920193801.3035093-1-evan@rivosinc.com/
> >
> > I coulda sworn I reported build issues against the v8 of this series
> > that are still present in this v9. For example:
> > https://patchwork.kernel.org/project/linux-riscv/patch/20231031-optimize_checksum-v9-3-ea018e69b229@rivosinc.com/
>
> You did, and I fixed the build issues. This is another instance of how
> Patchwork reports the results of the previous build before the new build
> completes. Patchwork was very far behind so it took around 15 hours for
> the result to be ready.

:clown_face:

> There are some miscellaneous warnings in random
> drivers that I don't think can be attributed to this patch.

Yeah, there sometimes are warnings that seem spurious when you touch a
bunch of header files. I'm not really sure how to improve on that, since
it was newly introduced. My theory is that how we do a build of commit A,
then commit A~1 and then commit A again & take the difference between the
2nd and 3rd builds (which should both be partial rebuilds) is not as
symmetrical as I might've thought and is the source of those seemingly
unrelated issues that come up from time to time.