Message ID | 20231117-optimize_checksum-v11-0-7d9d954fe361@rivosinc.com |
---|---|
Headers | From: Charlie Jenkins <charlie@rivosinc.com>; Subject: [PATCH v11 0/5] riscv: Add fine-tuned checksum functions; Date: Fri, 17 Nov 2023 13:27:58 -0800 |
Series | riscv: Add fine-tuned checksum functions | |
Message
Charlie Jenkins
Nov. 17, 2023, 9:27 p.m. UTC
Each architecture generally implements fine-tuned checksum functions to
leverage its instruction set. This series adds riscv implementations of
the main checksum functions used in networking (ip_fast_csum,
csum_ipv6_magic, and do_csum).
This series makes heavy use of the Zbb extension through alternatives
patching.
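All of these functions compute variants of the 16-bit ones' complement Internet checksum described in RFC 1071. The following is a minimal, portable sketch of that computation for reference only. It is not the riscv code from this series: it has no Zbb, alignment, or wide-load handling, and the name csum_sketch is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: RFC 1071 Internet checksum of buf[0..len).
 * Big-endian 16-bit words are summed into a 64-bit accumulator, the
 * carries are folded back in (end-around carry), and the complemented
 * result is returned as a host integer equal to the network-order value.
 */
uint16_t csum_sketch(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint64_t sum = 0;

	while (len >= 2) {
		sum += ((uint32_t)p[0] << 8) | p[1];
		p += 2;
		len -= 2;
	}
	if (len)	/* an odd trailing byte is padded with a zero byte */
		sum += (uint32_t)p[0] << 8;

	/* Fold carries back into the low 16 bits until none remain. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

Computing the checksum over a buffer that already contains a correct checksum yields 0, which is how receivers typically validate headers.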
To test this series, enable CONFIG_KUNIT and then CONFIG_CHECKSUM_KUNIT.
I have attempted to make these functions as optimal as possible, but I
have not run anything on actual riscv hardware. My performance testing
has been limited to inspecting the assembly, running the algorithms on
x86 hardware, and running in QEMU.
ip_fast_csum is a relatively small function, so even though it is
possible to read 64 bits at a time on compatible hardware, the setup and
cleanup code becomes the bottleneck; loading 32 bits at a time is
actually faster.
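The 32-bit-load strategy can be sketched in portable C as follows. This is an illustration under stated assumptions, not the riscv implementation: ip_fast_csum_sketch and fold64_to16 are hypothetical names. ihl 32-bit words are accumulated into a 64-bit variable, so no carry can be lost for a header of at most 15 words, and the accumulator is folded only once at the end.

```c
#include <stdint.h>
#include <string.h>

/* Fold a 64-bit ones' complement accumulator down to 16 bits. */
static uint16_t fold64_to16(uint64_t sum)
{
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Hypothetical sketch of ip_fast_csum: checksum an IPv4 header made of
 * ihl 32-bit words. Words are loaded in host byte order; the ones'
 * complement sum is byte-order independent, so storing the result
 * straight back into the header gives the correct on-wire bytes.
 */
uint16_t ip_fast_csum_sketch(const void *iph, unsigned int ihl)
{
	const unsigned char *p = iph;
	uint64_t sum = 0;

	for (unsigned int i = 0; i < ihl; i++) {
		uint32_t word;

		memcpy(&word, p + 4 * i, sizeof(word));	/* one 32-bit load */
		sum += word;
	}
	return (uint16_t)~fold64_to16(sum);
}
```

Because only one fold is needed for such a short input, the setup and cleanup that 64-bit loads would require are avoided.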
Relies on https://lore.kernel.org/lkml/20230920193801.3035093-1-evan@rivosinc.com/
---
The algorithm proposed to replace the default csum_fold can be verified
to compute the same result by exhaustively checking all 2^32 possible
inputs with the following program:
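#include <stdio.h>

/* Rotate a 32-bit word right by shift bits (shift taken mod 32). */
static inline unsigned int ror32(unsigned int word, unsigned int shift)
{
	return (word >> (shift & 31)) | (word << ((-shift) & 31));
}

/* Default asm-generic csum_fold: two end-around-carry folds, then invert. */
unsigned short csum_fold(unsigned int csum)
{
	unsigned int sum = csum;

	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return ~sum;
}

/* Proposed fold, borrowed from arch/arc: one subtract and one rotate. */
unsigned short csum_fold_arc(unsigned int csum)
{
	return ((~csum - ror32(csum, 16)) >> 16);
}

int main(void)
{
	unsigned int start = 0x0;

	/* Visit every 32-bit input; the loop ends when start wraps to 0. */
	do {
		if (csum_fold(start) != csum_fold_arc(start)) {
			printf("Not the same %u\n", start);
			return 1;
		}
		start += 1;
	} while (start != 0x0);
	printf("The same\n");
	return 0;
}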
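The equivalence that the program checks exhaustively can also be derived by hand. The following sketch was written for illustration and is not part of the original letter. Write the 32-bit input as $csum = 2^{16}H + L$ with 16-bit halves $H$ and $L$, and let $S = H + L$, the value after the first fold step. Since $\mathtt{\sim}csum = 2^{32} - 1 - csum$ and $\mathrm{ror32}(csum, 16) = 2^{16}L + H$:

$$
\mathtt{\sim}csum - \mathrm{ror32}(csum, 16) \equiv 2^{32} - 1 - (2^{16} + 1)\,S \pmod{2^{32}}
$$

If $S \le 2^{16} - 1$, then, because $2^{32} - 1 = (2^{16} - 1)(2^{16} + 1)$, the right-hand side equals $(2^{16} - 1 - S)(2^{16} + 1)$. Its upper 16 bits are $\mathtt{0xffff} - S$, the complement of $S$, which is also what the two-step fold returns, since its second step is a no-op. If $S \ge 2^{16}$, write $S = 2^{16} + s$ with $0 \le s \le 2^{16} - 2$. The fold then yields $s + 1$, and

$$
2^{32} - 1 - (2^{16} + 1)\,S \equiv (2^{16} - 2 - s)\,2^{16} + (2^{16} - 1 - s) \pmod{2^{32}},
$$

whose upper 16 bits are $\mathtt{0xffff} - (s + 1)$, again the complement of the folded sum.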
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
To: Charlie Jenkins <charlie@rivosinc.com>
To: Palmer Dabbelt <palmer@dabbelt.com>
To: Conor Dooley <conor@kernel.org>
To: Samuel Holland <samuel.holland@sifive.com>
To: David Laight <David.Laight@aculab.com>
To: Xiao Wang <xiao.w.wang@intel.com>
To: Evan Green <evan@rivosinc.com>
To: linux-riscv@lists.infradead.org
To: linux-kernel@vger.kernel.org
To: linux-arch@vger.kernel.org
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
Changes in v11:
- Extensive modifications to comply with sparse
- Organize include statements (Xiao)
- Add csum_ipv6_magic to commit message (Xiao)
- Remove extraneous len statement (Xiao)
- Add kasan_check_read call (Xiao)
- Improve comment field in checksum.h (Xiao)
- Consolidate "buff" and "len" into one parameter "end" (Xiao)
- Link to v10: https://lore.kernel.org/r/20231101-optimize_checksum-v10-0-a498577bb969@rivosinc.com
Changes in v10:
- Move tests that were riscv-specific to be arch agnostic (Arnd)
- Link to v9: https://lore.kernel.org/r/20231031-optimize_checksum-v9-0-ea018e69b229@rivosinc.com
Changes in v9:
- Use ror64 (Xiao)
- Move do_csum and csum_ipv6_magic headers to patch 4 (Xiao)
- Remove word "IP" from checksum headers (Xiao)
- Swap to using ifndef CONFIG_32BIT instead of ifdef CONFIG_64BIT (Xiao)
- Run no alignment code when buff is aligned (Xiao)
- Consolidate the overlapping parts of the two do_csum implementations into do_csum_common
- Link to v8: https://lore.kernel.org/r/20231027-optimize_checksum-v8-0-feb7101d128d@rivosinc.com
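One plausible reading of the "Use ror64" note is folding a 64-bit accumulator with a rotate instead of shifts and masks. The sketch below shows how that can work. fold64_ror, fold64_shift, and ror64_32 are hypothetical names, and this is not the code from the patch; the kernel's own ror64 is provided by linux/bitops.h. With x = H * 2^32 + L, the upper half of x + ror64(x, 32) is H + L plus the carry out of the low-half addition, which is exactly an end-around-carry add.

```c
#include <stdint.h>

/* Rotate right by 32 bits; equivalent to the kernel's ror64(x, 32). */
static inline uint64_t ror64_32(uint64_t x)
{
	return (x >> 32) | (x << 32);
}

/* Hypothetical: fold a 64-bit ones' complement sum to 32 bits via rotate. */
uint32_t fold64_ror(uint64_t x)
{
	x += ror64_32(x);	/* upper half: H + L + carry out of L + H */
	return (uint32_t)(x >> 32);
}

/* Reference version of the same fold using shifts and masks. */
uint32_t fold64_shift(uint64_t x)
{
	x = (x & 0xffffffff) + (x >> 32);
	x = (x & 0xffffffff) + (x >> 32);
	return (uint32_t)x;
}
```

The rotate form needs one add and one rotate (a single instruction with Zbb) instead of two rounds of mask, shift, and add.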
Changes in v8:
- do_csum speedups of 12% without Zbb and 21% with Zbb when the CPU
supports fast misaligned accesses
- Various formatting updates
- Patch now relies on https://lore.kernel.org/lkml/20230920193801.3035093-1-evan@rivosinc.com/
- Link to v7: https://lore.kernel.org/r/20230919-optimize_checksum-v7-0-06c7d0ddd5d6@rivosinc.com
Changes in v7:
- Included linux/bitops.h in asm-generic/checksum.h to use ror (Conor)
- Optimized loop in do_csum (David)
- Used ror instead of shifting (David)
- Had to reintroduce ifdefs, because gcc emits warnings on code that
can never execute
- Use ifdef instead of IS_ENABLED on __LITTLE_ENDIAN, because IS_ENABLED
only recognizes macros defined to 1 (as Kconfig options are), while
__LITTLE_ENDIAN is defined to 1234
- Only optimize for Zbb in do_csum when alternatives are enabled
- Link to v6: https://lore.kernel.org/r/20230915-optimize_checksum-v6-0-14a6cf61c618@rivosinc.com
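The IS_ENABLED limitation can be demonstrated with a simplified copy of the include/linux/kconfig.h helpers. This sketch keeps only IS_BUILTIN and omits the module half of IS_ENABLED. CONFIG_DEMO_OPTION and DEMO_LITTLE_ENDIAN are hypothetical stand-ins for a Kconfig option and for __LITTLE_ENDIAN. The trick works only when the macro expands to exactly 1, because only then does token pasting form the helper name __ARG_PLACEHOLDER_1.

```c
/* Simplified from include/linux/kconfig.h: detects macros defined as 1. */
#define __ARG_PLACEHOLDER_1 0,
#define __take_second_arg(__ignored, val, ...) val
#define __is_defined(x)			___is_defined(x)
#define ___is_defined(val)		____is_defined(__ARG_PLACEHOLDER_##val)
#define ____is_defined(arg1_or_junk)	__take_second_arg(arg1_or_junk 1, 0)
#define IS_BUILTIN(option)		__is_defined(option)

/* Hypothetical stand-ins: a y Kconfig option and an endianness macro. */
#define CONFIG_DEMO_OPTION 1
#define DEMO_LITTLE_ENDIAN 1234

/* 1: pasting gives __ARG_PLACEHOLDER_1, which inserts "0," before the 1. */
static const int option_builtin = IS_BUILTIN(CONFIG_DEMO_OPTION);
/* 0: __ARG_PLACEHOLDER_1234 is not a macro, so the fallback 0 is chosen. */
static const int endian_builtin = IS_BUILTIN(DEMO_LITTLE_ENDIAN);

/* #ifdef only asks whether the macro exists, so any value works. */
#ifdef DEMO_LITTLE_ENDIAN
static const int endian_ifdef = 1;
#else
static const int endian_ifdef = 0;
#endif
```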
Changes in v6:
- Fix accuracy of commit message for csum_fold
- Fix indentation
- Link to v5: https://lore.kernel.org/r/20230914-optimize_checksum-v5-0-c95b82a2757e@rivosinc.com
Changes in v5:
- Drop vector patches
- Check that Zbb is enabled before running any Zbb code (Conor)
- Check endianness in IS_ENABLED
- Revert to the simpler, non-tree-based version of csum_ipv6_magic, since
David pointed out that the tree-based version is not better.
- Link to v4: https://lore.kernel.org/r/20230911-optimize_checksum-v4-0-77cc2ad9e9d7@rivosinc.com
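For reference, a simple sequential (non-tree) csum_ipv6_magic computes the ones' complement sum of the IPv6 pseudo-header (source and destination addresses, upper-layer length, and next-header value; see RFC 8200, section 8.1) plus a caller-supplied partial sum, then folds and complements it. The sketch below is a portable illustration, not the riscv code. The function names are hypothetical, raw 16-byte arrays stand in for struct in6_addr, and csum is assumed to use the same host-order representation as the 32-bit loads.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add n 32-bit words from p (host byte order loads) to the accumulator. */
static uint64_t add_words32(uint64_t sum, const uint8_t *p, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		uint32_t w;

		memcpy(&w, p + 4 * i, sizeof(w));
		sum += w;
	}
	return sum;
}

/* Hypothetical sketch of csum_ipv6_magic: one add after another, no tree. */
uint16_t csum_ipv6_magic_sketch(const uint8_t saddr[16], const uint8_t daddr[16],
				uint32_t len, uint8_t proto, uint32_t csum)
{
	/* len and proto laid out as 32-bit big-endian pseudo-header fields */
	const uint8_t tail[8] = {
		(uint8_t)(len >> 24), (uint8_t)(len >> 16),
		(uint8_t)(len >> 8), (uint8_t)len,
		0, 0, 0, proto,
	};
	uint64_t sum = csum;

	sum = add_words32(sum, saddr, 4);
	sum = add_words32(sum, daddr, 4);
	sum = add_words32(sum, tail, 2);

	/* Fold to 16 bits with end-around carries, then complement. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

As in the kernel, the returned value is meant to be stored directly into the packet. The bytes in memory are then the on-wire checksum regardless of host endianness.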
Changes in v4:
- Use the improved csum_fold from arch/arc, as suggested by David
Laight.
- Eliminates zero-extension on rv32, but not on rv64.
- Reduces data dependencies, which should improve execution speed on
rv32 and rv64
- Still passes CHECKSUM_KUNIT and RISCV_CHECKSUM_KUNIT on rv32 and
rv64 with and without Zbb.
- Link to v3: https://lore.kernel.org/r/20230907-optimize_checksum-v3-0-c502d34d9d73@rivosinc.com
Changes in v3:
- Use riscv_has_extension_likely and has_vector where possible (Conor)
- Reduce ifdefs by using IS_ENABLED where possible (Conor)
- Use kernel_vector_begin in the vector code (Samuel)
- Link to v2: https://lore.kernel.org/r/20230905-optimize_checksum-v2-0-ccd658db743b@rivosinc.com
Changes in v2:
- After more benchmarking, rework functions to improve performance.
- Remove tests that overlapped with the already existing checksum
tests and make tests more extensive.
- Use alternatives to activate code with Zbb and vector extensions
- Link to v1: https://lore.kernel.org/r/20230826-optimize_checksum-v1-0-937501b4522a@rivosinc.com
---
Charlie Jenkins (5):
asm-generic: Improve csum_fold
riscv: Add static key for misaligned accesses
riscv: Add checksum header
riscv: Add checksum library
kunit: Add tests for csum_ipv6_magic and ip_fast_csum
arch/riscv/include/asm/checksum.h | 93 ++++++++++
arch/riscv/include/asm/cpufeature.h | 3 +
arch/riscv/kernel/cpufeature.c | 30 ++++
arch/riscv/lib/Makefile | 1 +
arch/riscv/lib/csum.c | 326 ++++++++++++++++++++++++++++++++++++
include/asm-generic/checksum.h | 6 +-
lib/checksum_kunit.c | 284 ++++++++++++++++++++++++++++++-
7 files changed, 739 insertions(+), 4 deletions(-)
---
base-commit: 8d68c506cd34a142331623fd23eb1c4e680e1955
change-id: 20230804-optimize_checksum-db145288ac21