Message ID | 20230719-mcrc-upstream-v2-0-4152b987e4c2@ti.com |
---|---|
Headers |
From: Kamlesh Gurudasani <kamlesh@ti.com>
Subject: [PATCH v2 0/6] Add support for Texas Instruments MCRC64 engine
Date: Fri, 11 Aug 2023 00:58:47 +0530
Message-ID: <20230719-mcrc-upstream-v2-0-4152b987e4c2@ti.com>
To: Herbert Xu <herbert@gondor.apana.org.au>, "David S. Miller" <davem@davemloft.net>, Rob Herring <robh+dt@kernel.org>, Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org>, Conor Dooley <conor+dt@kernel.org>, Nishanth Menon <nm@ti.com>, Vignesh Raghavendra <vigneshr@ti.com>, Tero Kristo <kristo@kernel.org>, Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will@kernel.org>, Maxime Coquelin <mcoquelin.stm32@gmail.com>, Alexandre Torgue <alexandre.torgue@foss.st.com>
CC: <linux-crypto@vger.kernel.org>, <linux-kernel@vger.kernel.org>, <devicetree@vger.kernel.org>, <linux-arm-kernel@lists.infradead.org>, <linux-stm32@st-md-mailman.stormreply.com>, Kamlesh Gurudasani <kamlesh@ti.com>
X-Mailer: b4 0.12.2
List-ID: <linux-kernel.vger.kernel.org> |
Series |
Add support for Texas Instruments MCRC64 engine
|
|
Message
Kamlesh Gurudasani
Aug. 10, 2023, 7:28 p.m. UTC
Add support for MCRC64 engine to calculate 64-bit CRC in Full-CPU mode
MCRC64 engine calculates 64-bit cyclic redundancy checks (CRC)
according to the ISO 3309 standard.
The ISO 3309 64-bit CRC model parameters are as follows:
Generator Polynomial: x^64 + x^4 + x^3 + x + 1
Polynomial Value: 0x000000000000001B
Initial value: 0x0000000000000000
Reflected Input: False
Reflected Output: False
Xor Final: 0x0000000000000000
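
The parameter list above is enough to pin down the checksum. As a minimal, unoptimized sketch (this is not the code from lib/crc64-iso.c in this series, and the name `crc64_iso_bitwise` is made up for illustration), a bit-at-a-time version of this model in Python could look like:

```python
POLY = 0x000000000000001B   # x^64 + x^4 + x^3 + x + 1, with the x^64 term implicit
MASK = (1 << 64) - 1        # keep the register at 64 bits


def crc64_iso_bitwise(data: bytes, crc: int = 0) -> int:
    """MSB-first CRC-64: init 0, no input/output reflection, no final XOR.

    Passing a previous result as `crc` continues the calculation, so a
    message can be fed in several chunks.
    """
    for byte in data:
        # Non-reflected input: the byte enters at the top of the register.
        crc ^= byte << 56
        for _ in range(8):
            if crc & (1 << 63):
                # Top bit set: shift it out and subtract (XOR) the polynomial.
                crc = ((crc << 1) ^ POLY) & MASK
            else:
                crc = (crc << 1) & MASK
    return crc


if __name__ == "__main__":
    print(f"{crc64_iso_bitwise(b'123456789'):016x}")
```

Because the initial value is zero and there is no final XOR, leading zero bytes leave the register at zero, so in this model `b"\x00\x01"` and `b"\x01"` produce the same CRC.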
Tested with
CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y
and tcrypt,
sudo modprobe tcrypt mode=329 sec=1
User space application implemented using algif_hash,
https://gist.github.com/ti-kamlesh/73abfcc1a33318bb3b199d36b6209e59
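
The gist itself is not reproduced here. As a rough sketch of what an algif_hash client does, the snippet below opens an AF_ALG "hash" socket from Python. It binds to `sha256` only as a stand-in, because the CRC64 algorithm exists only on a kernel with this series applied; the exact algorithm name to bind for MCRC64 is not stated in this thread and would be an assumption. The helper name `af_alg_hash` is hypothetical.

```python
import hashlib
import socket


def af_alg_hash(alg_name: str, data: bytes, digest_size: int):
    """Hash a short message through the kernel crypto API (algif_hash).

    Returns the raw digest, or None when AF_ALG or the requested algorithm
    is unavailable (non-Linux system, sandbox, missing kernel module).
    """
    if not hasattr(socket, "AF_ALG"):
        return None
    try:
        with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as alg:
            alg.bind(("hash", alg_name))   # select the kernel algorithm
            op, _ = alg.accept()           # operation socket for one request
            with op:
                op.sendall(data)           # a send without MSG_MORE finalizes
                return op.recv(digest_size)
    except OSError:
        return None


if __name__ == "__main__":
    msg = b"abc"
    kernel = af_alg_hash("sha256", msg, 32)
    print("kernel:", kernel.hex() if kernel else "AF_ALG not available")
    print("hashlib:", hashlib.sha256(msg).hexdigest())
```

Every request in this sketch pays for socket setup plus at least two system calls, which is the kind of fixed cost a hardware offload has to amortize over the size of the input.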
Signed-off-by: Kamlesh Gurudasani <kamlesh@ti.com>
---
Changes in v2:
- Add generic implementation of crc64-iso
- Fixes according to review comments
- Link to v1: https://lore.kernel.org/r/20230719-mcrc-upstream-v1-0-dc8798a24c47@ti.com
---
Kamlesh Gurudasani (6):
lib: add ISO 3309 model crc64
crypto: crc64 - add crc64-iso framework
dt-bindings: crypto: Add Texas Instruments MCRC64
crypto: ti - add driver for MCRC64 engine
arm64: dts: ti: k3-am62: Add dt node, cbass_main ranges for MCRC64
arm64: defconfig: enable TI MCRC64 module
Documentation/devicetree/bindings/crypto/ti,mcrc64.yaml | 47 ++++++++
MAINTAINERS | 7 ++
arch/arm64/boot/dts/ti/k3-am62-main.dtsi | 7 ++
arch/arm64/boot/dts/ti/k3-am62.dtsi | 1 +
arch/arm64/configs/defconfig | 2 +
crypto/Kconfig | 11 ++
crypto/Makefile | 1 +
crypto/crc64_iso_generic.c | 119 ++++++++++++++++++
crypto/tcrypt.c | 5 +
crypto/testmgr.c | 7 ++
crypto/testmgr.h | 404 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
drivers/crypto/Kconfig | 1 +
drivers/crypto/Makefile | 1 +
drivers/crypto/ti/Kconfig | 10 ++
drivers/crypto/ti/Makefile | 2 +
drivers/crypto/ti/mcrc64.c | 442 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/crc64.h | 5 +
lib/crc64-iso.c | 126 +++++++++++++++++++
lib/crc64.c | 27 +++++
lib/gen_crc64table.c | 6 +
20 files changed, 1231 insertions(+)
---
base-commit: 21ef7b1e17d039053edaeaf41142423810572741
change-id: 20230719-mcrc-upstream-7ae9a75cab37
Best regards,
Comments
On Fri, Aug 11, 2023 at 12:58:47AM +0530, Kamlesh Gurudasani wrote:
> Add support for MCRC64 engine to calculate 64-bit CRC in Full-CPU mode
>
> MCRC64 engine calculates 64-bit cyclic redundancy checks (CRC)
> according to the ISO 3309 standard.
>
> The ISO 3309 64-bit CRC model parameters are as follows:
> Generator Polynomial: x^64 + x^4 + x^3 + x + 1
> Polynomial Value: 0x000000000000001B
> Initial value: 0x0000000000000000
> Reflected Input: False
> Reflected Output: False
> Xor Final: 0x0000000000000000
>
> Tested with
> CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
> CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y
>
> and tcrypt,
> sudo modprobe tcrypt mode=329 sec=1
>
> User space application implemented using algif_hash,
> https://gist.github.com/ti-kamlesh/73abfcc1a33318bb3b199d36b6209e59
>
> Signed-off-by: Kamlesh Gurudasani <kamlesh@ti.com>

I do not see any in-kernel user of this CRC variant being introduced, which
leaves algif_hash as the only use case.

Can you elaborate on the benefit this brings to your application? Yes, it
allows you to use your hardware CRC engine. But, that comes with all the
overhead from the syscalls, algif_hash, and the driver. How does performance
compare to a properly optimized software CRC implementation on your platform,
i.e. an implementation using carryless multiplication instructions (e.g. ARMv8
CE) if available on your platform, otherwise an implementation using the
slice-by-8 or slice-by-16 method?

- Eric
On Fri, Aug 18, 2023 at 02:36:34PM +0530, Kamlesh Gurudasani wrote:
> Hi Eric,
>
> We are more interested in offload than performance, with splice system
> call and DMA mode in driver(will be implemented after this series gets
> merged), good amount of cpu cycles will be saved.

So it's for power usage, then? Or freeing up CPU for other tasks?

> There is one more mode(auto mode) in mcrc64 which helps to verify crc64
> values against pre calculated crc64, saving the efforts of comparing in
> userspace.

Is there any path forward to actually support this?

>
> Current generic implementation of crc64-iso(part of this series)
> gives 173 Mb/s of speed as opposed to mcrc64 which gives speed of 812
> Mb/s when tested with tcrypt.

This doesn't answer my question, which to reiterate was:

    How does performance compare to a properly optimized software CRC
    implementation on your platform, i.e. an implementation using carryless
    multiplication instructions (e.g. ARMv8 CE) if available on your platform,
    otherwise an implementation using the slice-by-8 or slice-by-16 method?

The implementation you tested was slice-by-1. Compared to that, it's common for
slice-by-8 to speed up CRCs by about 4 times and for folding with carryless
multiplication to speed up CRCs by 10-30 times, sometimes limited only by memory
bandwidth. I don't know what specific results you would get on your specific
CPU and for this specific CRC, and you could certainly see something different
if you e.g. have some low-end embedded CPU. But those are the typical results
I've seen for other CRCs on different CPUs. So, a software implementation may
be more attractive than you realize. It could very well be the case that a
PMULL based CRC implementation actually ends up with less CPU load than your
"hardware offload", when taking into account syscall, algif_hash, and driver
overhead...

- Eric
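
To make the slice-by-1 versus slice-by-8 distinction concrete, here is a small sketch of both table-driven methods for the CRC model in the cover letter (polynomial 0x1B, zero init, no reflection). It is illustrative only: the function and table names are invented, it is not the kernel's implementation, and Python will not show the speedups quoted above. The point is the structure: slice-by-8 consumes eight input bytes per step using eight independent table lookups. Carryless-multiplication folding is not shown.

```python
POLY = 0x1B
MASK = (1 << 64) - 1


def _make_tables():
    """TABLES[k][b] is the CRC of byte b followed by k zero bytes."""
    t0 = []
    for b in range(256):
        crc = b << 56
        for _ in range(8):
            if crc >> 63:
                crc = ((crc << 1) ^ POLY) & MASK
            else:
                crc = (crc << 1) & MASK
        t0.append(crc)
    tables = [t0]
    for _ in range(7):
        prev = tables[-1]
        # Advance every entry by one more zero byte.
        tables.append([t0[v >> 56] ^ ((v << 8) & MASK) for v in prev])
    return tables


TABLES = _make_tables()


def crc64_slice1(data: bytes, crc: int = 0) -> int:
    """One table lookup per input byte (the classic byte-wise method)."""
    t0 = TABLES[0]
    for b in data:
        crc = t0[(crc >> 56) ^ b] ^ ((crc << 8) & MASK)
    return crc


def crc64_slice8(data: bytes, crc: int = 0) -> int:
    """Eight lookups per 8-byte word; the leftover tail goes byte-wise."""
    t = TABLES
    full = len(data) - len(data) % 8
    for off in range(0, full, 8):
        # Fold the whole 64-bit register into the next big-endian word.
        x = crc ^ int.from_bytes(data[off:off + 8], "big")
        crc = (t[7][x >> 56] ^ t[6][(x >> 48) & 0xFF] ^
               t[5][(x >> 40) & 0xFF] ^ t[4][(x >> 32) & 0xFF] ^
               t[3][(x >> 24) & 0xFF] ^ t[2][(x >> 16) & 0xFF] ^
               t[1][(x >> 8) & 0xFF] ^ t[0][x & 0xFF])
    return crc64_slice1(data[full:], crc)
```

In a compiled implementation the eight lookups in `crc64_slice8` have no dependency on each other, which is where the gain over the byte-wise loop comes from; the cost is eight 2 KiB tables instead of one.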
Kamlesh Gurudasani <kamlesh@ti.com> writes:

...
> Hi Eric, thanks for your detailed and valuable inputs.
>
> As per your suggestion, we did some profiling.
>
> Use case is to calculate crc32/crc64 for file input from user space.
>
> Instead of directly implementing PMULL based CRC64, we made first comparison between
> Case 1.
> CRC32 (splice() + kernel space SW driver)
> https://gist.github.com/ti-kamlesh/5be75dbde292e122135ddf795fad9f21
>
> Case 2.
> CRC32(mmap() + userspace armv8 crc32 instruction implementation)
> (tried read() as well to get contents of file, but that lost to mmap() so not mentioning number here)
> https://gist.github.com/ti-kamlesh/002df094dd522422c6cb62069e15c40d
>
> Case 3.
> CRC64 (splice() + MCRC64 HW)
> https://gist.github.com/ti-kamlesh/98b1fc36c9a7c3defcc2dced4136b8a0
>
>
> Overall, overhead of userspace + af_alg + driver in (Case 1) and
> ( Case 3) is ~0.025s, which is constant for any file size.
> This is calculated using real time to calculate crc -
> driver time (time spend inside init() + update() +final()) = overhead ~0.025s
>
>
>
> +-------------------+-----------------------------+-----------------------+------------------------+------------------------+
> |                   |                             |                       |                        |                        |
> | File size         | 120mb(ideal size for us)    | 20mb                  | 15mb                   | 5mb                    |
> +===================+=============================+=======================+========================+========================+
> |                   |                             |                       |                        |                        |
> | CRC32 (Case 1)    | Driver time 0.155s          | Driver time 0.0325s   | Driver time 0.019s     | Driver time 0.0062s    |
> |                   | real time 0.18s             | real time 0.06s       | real time 0.04s        | real time 0.03s        |
> |                   | overhead 0.025s             | overhead 0.025s       | overhead 0.021s        | overhead ~0.023s       |
> +-------------------+-----------------------------+-----------------------+------------------------+------------------------+
> |                   |                             |                       |                        |                        |
> | CRC32 (Case 2)    | Real time 0.30s             | Real time 0.05s       | Real time 0.04s        | Real time 0.02s        |
> +-------------------+-----------------------------+-----------------------+------------------------+------------------------+
> |                   |                             |                       |                        |                        |
> | CRC64 (Case 3)    | Driver time 0.385s          | Driver time 0.0665s   | Driver time 0.0515s    | Driver time 0.019s     |
> |                   | real time 0.41s             | real time 0.09s       | real time 0.08s        | real time 0.04s        |
> |                   | overhead 0.025s             | overhead 0.025s       | overhead ~0.025s       | overhead ~0.021s       |
> +-------------------+-----------------------------+-----------------------+------------------------+------------------------+
>
> Here, if we consider similar numbers for crc64 PMULL implementation as
> crc32 (case 2) , we save good number of cpu cycles using mcrc64
> in case of files bigger than 5-10mb as most of the time is being spent in HW offload.
>
> Regards,
> Kamlesh

Hi Eric,

Please let me know if above numbers make sense to you and I should send
next revision.

Regards,
Kamlesh
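
The overhead figures in the table follow from "real time minus driver time". The short sketch below simply redoes that subtraction from the numbers quoted above, and also derives a driver-side throughput. The dictionary layout and function names are made up for illustration, and reading "mb" as megabytes is an assumption.

```python
# (driver time in s, real time in s) per file size in "mb", copied from the table.
RESULTS = {
    "CRC32 (Case 1)": {120: (0.155, 0.18), 20: (0.0325, 0.06),
                       15: (0.019, 0.04), 5: (0.0062, 0.03)},
    "CRC64 (Case 3)": {120: (0.385, 0.41), 20: (0.0665, 0.09),
                       15: (0.0515, 0.08), 5: (0.019, 0.04)},
}


def overhead(driver_s: float, real_s: float) -> float:
    """Time spent outside init()/update()/final(): real minus driver time."""
    return round(real_s - driver_s, 4)


def driver_throughput(size_mb: float, driver_s: float) -> float:
    """Input size divided by driver time, in MB/s if "mb" means megabytes."""
    return size_mb / driver_s


if __name__ == "__main__":
    for case, rows in RESULTS.items():
        for size, (driver_s, real_s) in rows.items():
            print(f"{case} {size:>3}mb: overhead {overhead(driver_s, real_s):.4f}s, "
                  f"driver {driver_throughput(size, driver_s):.0f} MB/s")
```

Recomputed this way, the overhead ranges from 0.021s to 0.0285s across the eight measured cells, which is consistent with the roughly constant ~0.025s stated in the message.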
Kamlesh Gurudasani <kamlesh@ti.com> writes:

>>
>> Here, if we consider similar numbers for crc64 PMULL implementation as
>> crc32 (case 2) , we save good number of cpu cycles using mcrc64
>> in case of files bigger than 5-10mb as most of the time is being spent in HW offload.
>>
>> Regards,
>> Kamlesh
>
> Hi Eric,
>
> Please let me know if above numbers make sense to you and I should send
> next revision.

Hi Eric,

I understand that there is no in-kernel user for crc64-iso3309 and that this
is a new algorithm that we are trying to add to the Linux kernel.

As per your suggestion, we did the calculations, and it turns out that we are
saving a good number of CPU cycles with HW offload.

Also, there are some automotive customers who have a safety requirement to
offload any parameters that are in Linux to ensure FFI (freedom from
interference).

Let me know if you are willing to accept this driver, so that I can put in the
effort to send the next revision.

Regards,
Kamlesh