From patchwork Tue Jul 11 15:37:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= X-Patchwork-Id: 118641 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:a6b2:0:b0:3e4:2afc:c1 with SMTP id c18csp601166vqm; Tue, 11 Jul 2023 09:24:47 -0700 (PDT) X-Google-Smtp-Source: APBJJlFdA3aj1cop+cfFxo4i3i4Xn/0ovR3QvmaEbnJ+yQfmCROW3vAyesA0zMlZI9bbOhI/z8Aq X-Received: by 2002:a17:902:c951:b0:1b8:3e15:46a1 with SMTP id i17-20020a170902c95100b001b83e1546a1mr22050718pla.67.1689092686716; Tue, 11 Jul 2023 09:24:46 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689092686; cv=none; d=google.com; s=arc-20160816; b=tKRDXWt/yfriH3QPCHr4ChxrmTDwt+HjCGPhYNYYLLRjRXdxXWIflyLvGMV4+LOco6 lDCMmJ5joZzoWrrte+V73FgLAw9AaFpjLzwMZ4xkd9+Ltrs21kwAGUT2Cqbjg/8yJxzg 8ovVEVkcdp4IMQkYbdAnWXLi0ZEu5VPsws1VRgyeK+1luwnUGsv8SUkIDLhVy9TZ4BdI J80pJYOfib+/ltKkDFUeET5i0HIOzVlaPWn9lli8FQluUuoFJI+MTAz9bp5D09II5P3Z U7UDxO3ESOuqBFDMNxEo0yevRLPsXDJCrjiAbJB7eFbkj+WYlSRW7Ji6GsRPeh/0N2Jg 8oFg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=4TbbVO7SG7dp47Aal+U21iYG9z+QbnBVlp8GwpXOJCs=; fh=xiqSxPwsZ/LAymHJkjgLulIYeRsb1DY7WkvCFfPxYXc=; b=0mgSNKxJxiDkqqjpQSqRWx9/JbmJRK0X6jLXRZA1EXpQTDcF1y2/77S9SJNEokKgUu MSfM8d0PlTzmJQYCZwxIi7ZIPixqq1BuBSWqmZ7KJBktlzvw/kVIpF2U7mSp+8Ut4l2l 0nB4u8GlQ2MczD+bKxmSNTQ1eXgpH8RlSGECcOFuGBhH7ijao4D5AbUQXzsM6M9pDE2g 8KvsmeC8MTW+IlOM+OvyztI3qSW1AOwAsGdtoN5bRhylrXwaRh7Da4p7ZgPjfnZKJY7+ gN+5nUSgLL2jyMHgWr5HLD8wE24E1C6K45WLvmB0cG4N3lCjWrIyl8M/AKGnFtFMZHGB SwYA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id o10-20020a170902e28a00b001b7fd6f9606si1683668plc.508.2023.07.11.09.24.33; Tue, 11 Jul 2023 09:24:46 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233062AbjGKPjX (ORCPT + 99 others); Tue, 11 Jul 2023 11:39:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51676 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233038AbjGKPjA (ORCPT ); Tue, 11 Jul 2023 11:39:00 -0400 Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D8155ED; Tue, 11 Jul 2023 08:38:01 -0700 (PDT) Received: from i53875a50.versanet.de ([83.135.90.80] helo=phil.lan) by gloria.sntech.de with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1qJFQx-0000g0-T5; Tue, 11 Jul 2023 17:37:47 +0200 From: Heiko Stuebner To: palmer@dabbelt.com, paul.walmsley@sifive.com Cc: aou@eecs.berkeley.edu, heiko@sntech.de, herbert@gondor.apana.org.au, davem@davemloft.net, conor.dooley@microchip.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org, christoph.muellner@vrull.eu, ebiggers@kernel.org, Greentime Hu , Han-Kuan Chen , Heiko Stuebner Subject: [PATCH v4 02/12] riscv: Add vector extension XOR implementation Date: Tue, 11 Jul 2023 17:37:33 +0200 Message-Id: <20230711153743.1970625-3-heiko@sntech.de> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230711153743.1970625-1-heiko@sntech.de> References: <20230711153743.1970625-1-heiko@sntech.de> MIME-Version: 1.0 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_BLOCKED,SPF_PASS,T_SCC_BODY_TEXT_LINE, T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1771142052759832226 X-GMAIL-MSGID: 1771142052759832226 From: Greentime Hu This patch adds support for vector optimized XOR and it is tested in qemu. Co-developed-by: Han-Kuan Chen Signed-off-by: Han-Kuan Chen Signed-off-by: Greentime Hu Signed-off-by: Heiko Stuebner --- arch/riscv/include/asm/xor.h | 82 ++++++++++++++++++++++++++++++++++++ arch/riscv/lib/Makefile | 1 + arch/riscv/lib/xor.S | 81 +++++++++++++++++++++++++++++++++++ 3 files changed, 164 insertions(+) create mode 100644 arch/riscv/include/asm/xor.h create mode 100644 arch/riscv/lib/xor.S diff --git a/arch/riscv/include/asm/xor.h b/arch/riscv/include/asm/xor.h new file mode 100644 index 000000000000..74867c7fd955 --- /dev/null +++ b/arch/riscv/include/asm/xor.h @@ -0,0 +1,82 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright (C) 2021 SiFive + */ + +#include +#include +#ifdef CONFIG_VECTOR +#include +#include + +void xor_regs_2_(unsigned long bytes, unsigned long *__restrict p1, + const unsigned long *__restrict p2); +void xor_regs_3_(unsigned long bytes, unsigned long *__restrict p1, + const unsigned long *__restrict p2, + const unsigned long *__restrict p3); +void xor_regs_4_(unsigned long bytes, unsigned long *__restrict p1, + const unsigned long *__restrict p2, + const unsigned long *__restrict p3, + const unsigned long *__restrict p4); +void xor_regs_5_(unsigned long bytes, unsigned long *__restrict p1, + const unsigned long *__restrict p2, + const unsigned long *__restrict p3, + const unsigned long *__restrict p4, + const unsigned long *__restrict p5); + +static void xor_rvv_2(unsigned long bytes, unsigned long *__restrict p1, + const unsigned long *__restrict p2) +{ + kernel_rvv_begin(); + xor_regs_2_(bytes, p1, p2); + kernel_rvv_end(); +} + +static void xor_rvv_3(unsigned long bytes, unsigned long *__restrict p1, + const unsigned long *__restrict p2, + const unsigned long *__restrict p3) +{ + kernel_rvv_begin(); + xor_regs_3_(bytes, p1, p2, p3); + kernel_rvv_end(); +} + +static void xor_rvv_4(unsigned long bytes, unsigned long *__restrict p1, + const unsigned long *__restrict p2, + const unsigned long *__restrict p3, + const unsigned long *__restrict p4) +{ + kernel_rvv_begin(); + xor_regs_4_(bytes, p1, p2, p3, p4); + kernel_rvv_end(); +} + +static void xor_rvv_5(unsigned long bytes, unsigned long *__restrict p1, + const unsigned long *__restrict p2, + const unsigned long *__restrict p3, + const unsigned long *__restrict p4, + const unsigned long *__restrict p5) +{ + kernel_rvv_begin(); + xor_regs_5_(bytes, p1, p2, p3, p4, p5); + kernel_rvv_end(); +} + +static struct xor_block_template xor_block_rvv = { + .name = "rvv", + .do_2 = xor_rvv_2, + .do_3 = xor_rvv_3, + .do_4 = xor_rvv_4, + .do_5 = xor_rvv_5 +}; + +#undef XOR_TRY_TEMPLATES +#define XOR_TRY_TEMPLATES \ + do { \ + xor_speed(&xor_block_8regs); \ + xor_speed(&xor_block_32regs); \ + if (has_vector()) { \ + xor_speed(&xor_block_rvv);\ + } \ + } while (0) +#endif diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile index 26cb2502ecf8..3164112680f1 100644 --- a/arch/riscv/lib/Makefile +++ b/arch/riscv/lib/Makefile @@ -11,3 +11,4 @@ lib-$(CONFIG_64BIT) += tishift.o lib-$(CONFIG_RISCV_ISA_ZICBOZ) += clear_page.o obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o +lib-$(CONFIG_VECTOR) += xor.o diff --git a/arch/riscv/lib/xor.S b/arch/riscv/lib/xor.S new file mode 100644 index 000000000000..3bc059e18171 --- /dev/null +++ b/arch/riscv/lib/xor.S @@ -0,0 +1,81 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright (C) 2021 SiFive + */ +#include +#include +#include + +ENTRY(xor_regs_2_) + vsetvli a3, a0, e8, m8, ta, ma + vle8.v v0, (a1) + vle8.v v8, (a2) + sub a0, a0, a3 + vxor.vv v16, v0, v8 + add a2, a2, a3 + vse8.v v16, (a1) + add a1, a1, a3 + bnez a0, xor_regs_2_ + ret +END(xor_regs_2_) +EXPORT_SYMBOL(xor_regs_2_) + +ENTRY(xor_regs_3_) + vsetvli a4, a0, e8, m8, ta, ma + vle8.v v0, (a1) + vle8.v v8, (a2) + sub a0, a0, a4 + vxor.vv v0, v0, v8 + vle8.v v16, (a3) + add a2, a2, a4 + vxor.vv v16, v0, v16 + add a3, a3, a4 + vse8.v v16, (a1) + add a1, a1, a4 + bnez a0, xor_regs_3_ + ret +END(xor_regs_3_) +EXPORT_SYMBOL(xor_regs_3_) + +ENTRY(xor_regs_4_) + vsetvli a5, a0, e8, m8, ta, ma + vle8.v v0, (a1) + vle8.v v8, (a2) + sub a0, a0, a5 + vxor.vv v0, v0, v8 + vle8.v v16, (a3) + add a2, a2, a5 + vxor.vv v0, v0, v16 + vle8.v v24, (a4) + add a3, a3, a5 + vxor.vv v16, v0, v24 + add a4, a4, a5 + vse8.v v16, (a1) + add a1, a1, a5 + bnez a0, xor_regs_4_ + ret +END(xor_regs_4_) +EXPORT_SYMBOL(xor_regs_4_) + +ENTRY(xor_regs_5_) + vsetvli a6, a0, e8, m8, ta, ma + vle8.v v0, (a1) + vle8.v v8, (a2) + sub a0, a0, a6 + vxor.vv v0, v0, v8 + vle8.v v16, (a3) + add a2, a2, a6 + vxor.vv v0, v0, v16 + vle8.v v24, (a4) + add a3, a3, a6 + vxor.vv v0, v0, v24 + vle8.v v8, (a5) + add a4, a4, a6 + vxor.vv v16, v0, v8 + add a5, a5, a6 + vse8.v v16, (a1) + add a1, a1, a6 + bnez a0, xor_regs_5_ + ret +END(xor_regs_5_) +EXPORT_SYMBOL(xor_regs_5_)