Message ID | 20230919-optimize_checksum-v7-0-06c7d0ddd5d6@rivosinc.com |
---|---|
Headers |
From: Charlie Jenkins <charlie@rivosinc.com> Subject: [PATCH v7 0/4] riscv: Add fine-tuned checksum functions Date: Tue, 19 Sep 2023 11:44:29 -0700 Message-Id: <20230919-optimize_checksum-v7-0-06c7d0ddd5d6@rivosinc.com> |
Series |
riscv: Add fine-tuned checksum functions
|
|
Message
Charlie Jenkins
Sept. 19, 2023, 6:44 p.m. UTC
Each architecture generally implements fine-tuned checksum functions to
leverage the instruction set. This patch adds the main checksum
functions that are used in networking.
Vector support was included in earlier revisions of this series to start
a discussion; it can probably be optimized more. The vector patches were
dropped in v5 (see the changelog below). They still need some work, as
they rely on GCC vector intrinsic types, which cannot work in the kernel
since they require C vector support rather than just assembler support.
I have tested the vector patches as standalone algorithms in QEMU.
This patch makes heavy use of the Zbb extension via alternatives
patching.
To test this patch, enable the configs for KUNIT, then CHECKSUM_KUNIT
and RISCV_CHECKSUM_KUNIT.
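As a starting point, the three options named above would correspond to a
.config fragment like the one below. This is only a sketch: it assumes the
usual CONFIG_ prefix and that RISCV_CHECKSUM_KUNIT is the symbol added by
the test patch in this series. Any dependencies of these options must be
satisfied as well.

```
CONFIG_KUNIT=y
CONFIG_CHECKSUM_KUNIT=y
CONFIG_RISCV_CHECKSUM_KUNIT=y
```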
I have attempted to make these functions as optimal as possible, but I
have not run anything on actual riscv hardware. My performance testing
has been limited to inspecting the assembly, running the algorithms on
x86 hardware, and running in QEMU.
ip_fast_csum is a relatively small function, so even though it is
possible to read 64 bits at a time on compatible hardware, the setup and
cleanup code becomes the bottleneck, and loading 32 bits at a time is
actually faster.
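To illustrate the 32-bit approach, here is a minimal portable sketch of an
IPv4 header checksum that loads 32 bits at a time. It is not the code from
this series: the names ip_header_csum32 and fold_to_16 are hypothetical,
and the 64-bit accumulator is just one simple way to collect carries in
plain C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fold a 32-bit one's-complement sum to 16 bits and invert it. */
static uint16_t fold_to_16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Hypothetical helper: checksum an IPv4 header that is ihl 32-bit words
 * long, loading 32 bits per iteration.
 */
static uint16_t ip_header_csum32(const void *iph, unsigned int ihl)
{
	const unsigned char *p = iph;
	uint64_t acc = 0;
	unsigned int i;

	assert(ihl >= 5 && ihl <= 15);	/* valid IPv4 header lengths */

	for (i = 0; i < ihl; i++) {
		uint32_t word;

		/* memcpy keeps the load safe for unaligned buffers. */
		memcpy(&word, p + 4 * i, sizeof(word));
		acc += word;	/* carries collect in the upper 32 bits */
	}

	/* End-around carry: add the collected carries back in. */
	acc = (acc & 0xffffffffu) + (acc >> 32);
	acc = (acc & 0xffffffffu) + (acc >> 32);

	return fold_to_16((uint32_t)acc);
}
```

A header whose checksum field is already filled in sums to zero, so a
caller can use the same helper both to generate and to verify a checksum.
With at most 15 words to add, the loop is short enough that the work
outside the loop is a large share of the total.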
---
The algorithm proposed to replace the default csum_fold can be verified
to compute the same result by running both versions against all 2^32
possible inputs, as the following program does:
#include <stdio.h>

/* Rotate a 32-bit word right by shift bits. */
static inline unsigned int ror32(unsigned int word, unsigned int shift)
{
	return (word >> (shift & 31)) | (word << ((-shift) & 31));
}

/* Default csum_fold: fold the carries in twice, then invert. */
unsigned short csum_fold(unsigned int csum)
{
	unsigned int sum = csum;

	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return ~sum;
}

/* Proposed csum_fold, based on the version used in arch/arc. */
unsigned short csum_fold_arc(unsigned int csum)
{
	return ((~csum - ror32(csum, 16)) >> 16);
}

int main(void)
{
	unsigned int start = 0x0;

	/* Walk every 32-bit input; the loop ends when start wraps to 0. */
	do {
		if (csum_fold(start) != csum_fold_arc(start)) {
			printf("Not the same %u\n", start);
			return -1;
		}
		start += 1;
	} while (start != 0x0);
	printf("The same\n");
	return 0;
}
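One way to see why the two folds agree, offered as a sketch and not as
the series' own argument: write csum as a high half h and a low half l.
ror32(csum, 16) swaps the halves, so csum + ror32(csum, 16) equals
(h + l) * 0x10001 modulo 2^32. It follows that ~csum - ror32(csum, 16) is
0xffffffff - (h + l) * 0x10001 modulo 2^32, and its upper 16 bits hold
the inverted, carry-folded sum. The snippet below spot-checks the
boundary cases of that argument; fold_ref, fold_rot and
check_boundary_inputs are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Reference fold: add the halves twice, then invert. */
static uint16_t fold_ref(uint32_t csum)
{
	csum = (csum & 0xffff) + (csum >> 16);
	csum = (csum & 0xffff) + (csum >> 16);
	return (uint16_t)~csum;
}

/* Rotate-based fold: upper half of ~csum - ror32(csum, 16). */
static uint16_t fold_rot(uint32_t csum)
{
	uint32_t swapped = (csum >> 16) | (csum << 16);

	return (uint16_t)((~csum - swapped) >> 16);
}

/* Compare both folds on boundary inputs; returns the number checked. */
static unsigned int check_boundary_inputs(void)
{
	static const uint32_t cases[] = {
		0x00000000u,	/* h + l = 0 */
		0x00000001u,
		0x0000ffffu,	/* largest sum without a carry */
		0x00010000u,
		0x0001ffffu,	/* smallest sum with a carry */
		0xffff0001u,
		0xfffeffffu,
		0xffffffffu,	/* largest possible sum */
	};
	unsigned int i, n = sizeof(cases) / sizeof(cases[0]);

	for (i = 0; i < n; i++)
		assert(fold_ref(cases[i]) == fold_rot(cases[i]));
	return n;
}
```

The exhaustive program above remains the actual proof; this version only
makes the carry and no-carry cases easy to step through by hand.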
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
To: Charlie Jenkins <charlie@rivosinc.com>
To: Palmer Dabbelt <palmer@dabbelt.com>
To: Conor Dooley <conor@kernel.org>
To: Samuel Holland <samuel.holland@sifive.com>
To: David Laight <David.Laight@aculab.com>
To: linux-riscv@lists.infradead.org
To: linux-kernel@vger.kernel.org
To: linux-arch@vger.kernel.org
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
Changes in v7:
- Included linux/bitops.h in asm-generic/checksum.h to use ror (Conor)
- Optimized loop in do_csum (David)
- Used ror instead of shifting (David)
- Unfortunately had to reintroduce ifdefs because GCC is not smart
enough to avoid warning about code that will never execute
- Use ifdef instead of IS_ENABLED on __LITTLE_ENDIAN because IS_ENABLED
does not work on that macro
- Only optimize for Zbb in do_csum when alternatives are enabled
- Link to v6: https://lore.kernel.org/r/20230915-optimize_checksum-v6-0-14a6cf61c618@rivosinc.com
Changes in v6:
- Fix accuracy of commit message for csum_fold
- Fix indentation
- Link to v5: https://lore.kernel.org/r/20230914-optimize_checksum-v5-0-c95b82a2757e@rivosinc.com
Changes in v5:
- Drop vector patches
- Check ZBB enabled before doing any ZBB code (Conor)
- Check endianness in IS_ENABLED
- Revert to the simpler non-tree based version of ipv6_csum_magic since
David pointed out that the tree based version is not better.
- Link to v4: https://lore.kernel.org/r/20230911-optimize_checksum-v4-0-77cc2ad9e9d7@rivosinc.com
Changes in v4:
- Suggestion by David Laight to use an improved checksum used in
arch/arc.
- Eliminates zero-extension on rv32, but not on rv64.
- Reduces data dependency which should improve execution speed on
rv32 and rv64
- Still passes CHECKSUM_KUNIT and RISCV_CHECKSUM_KUNIT on rv32 and
rv64 with and without zbb.
- Link to v3: https://lore.kernel.org/r/20230907-optimize_checksum-v3-0-c502d34d9d73@rivosinc.com
Changes in v3:
- Use riscv_has_extension_likely and has_vector where possible (Conor)
- Reduce ifdefs by using IS_ENABLED where possible (Conor)
- Use kernel_vector_begin in the vector code (Samuel)
- Link to v2: https://lore.kernel.org/r/20230905-optimize_checksum-v2-0-ccd658db743b@rivosinc.com
Changes in v2:
- After more benchmarking, rework functions to improve performance.
- Remove tests that overlapped with the already existing checksum
tests and make tests more extensive.
- Use alternatives to activate code with Zbb and vector extensions
- Link to v1: https://lore.kernel.org/r/20230826-optimize_checksum-v1-0-937501b4522a@rivosinc.com
---
Charlie Jenkins (4):
      asm-generic: Improve csum_fold
      riscv: Checksum header
      riscv: Add checksum library
      riscv: Test checksum functions

 arch/riscv/Kconfig.debug              |   1 +
 arch/riscv/include/asm/checksum.h     |  91 ++++++++++
 arch/riscv/lib/Kconfig.debug          |  31 ++++
 arch/riscv/lib/Makefile               |   3 +
 arch/riscv/lib/csum.c                 | 217 ++++++++++++++++++++++
 arch/riscv/lib/riscv_checksum_kunit.c | 330 ++++++++++++++++++++++++++++++++++
 include/asm-generic/checksum.h        |   6 +-
 7 files changed, 676 insertions(+), 3 deletions(-)
---
base-commit: da5f5b0f1b813dafe9ce81b70fed01b0d103d556
change-id: 20230804-optimize_checksum-db145288ac21