From patchwork Mon Mar 13 19:12:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= X-Patchwork-Id: 69069 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1369268wrd; Mon, 13 Mar 2023 12:36:16 -0700 (PDT) X-Google-Smtp-Source: AK7set/Uset2Yf/aIh2oxIa46d4tXODo79IxOxPKkOTt7fozeK3ynNYf2q/Ifv/U8yEfX98xH5zA X-Received: by 2002:a17:90b:1e05:b0:237:7149:b333 with SMTP id pg5-20020a17090b1e0500b002377149b333mr36702344pjb.43.1678736176101; Mon, 13 Mar 2023 12:36:16 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1678736176; cv=none; d=google.com; s=arc-20160816; b=F9nZscs42XwyE9p8EnVCkxyAQrQIR0CQ5iqMVJNeU+LH6n4euMD+xEwMYBg/e+2xl4 YnUfL6TfW11OWUqVVNstVibG6sM5MRWE0yaWM5mDcL6SmaxX3gablS8FgEWydwJuQ4as qdSxRho4fOVesFUohxhIq1BNY/vsPI00MA9ccZi2vMw3h3f3pqCYSA3kDvEkGFVZxc3B KArpJHdqjsXLOggT+wxsh4+mc/wd7tUIy4TvI952AqVa28jUHI9ApnkzqaGfhX0ZA5pk a9JLhYQahvNXH5t2dPkBLeMj1RStweOE954uTP0tvuQRVCN6uP6Mz7JPhW+buc0kA3Pf ksgg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=Q+cO1kxqWnTSmcL3EKqhWERKSfiTWJeaLTkYEiu9fiw=; b=pUsKzcDouujRWNSUoqcQH+jjVAC0hBKidDvCvtktItpsM1lmOveE0QMw0bHeX1bZ+d wt9Ht9Uv4yFWyzA4QshizA2dY+S3GrgNW1Vpv0KIXs3iQ7KwVWVqTbojt6mNi3z9p6NX hCCPHAAxveMVlRNJdO/NN4v4WBN7y0bRdTmwYrgeXmSinC5MVKbZ3r7JZyDRpmqd6H0M hN3M2d/CsT3UqGtm1EW17rjdaIE89FZD/Lj0sYY700RlGIgJ2Ou+nKg0Yg/rY3lPiJ/D uYNvrWULtzFyigEjYj06P4nBxHjKs6/oIlJPghrX3x64OQLptCfAVbXCwiofjlZKVzWV QmGA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id u3-20020a17090a890300b00234bc771801si461719pjn.57.2023.03.13.12.36.03; Mon, 13 Mar 2023 12:36:16 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230201AbjCMTNP (ORCPT + 99 others); Mon, 13 Mar 2023 15:13:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35522 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229845AbjCMTNM (ORCPT ); Mon, 13 Mar 2023 15:13:12 -0400 Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A84DCA255 for ; Mon, 13 Mar 2023 12:13:07 -0700 (PDT) Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169] helo=phil.lan) by gloria.sntech.de with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1pbnbT-00028k-V2; Mon, 13 Mar 2023 20:13:04 +0100 From: Heiko Stuebner To: palmer@rivosinc.com Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, heiko@sntech.de Subject: [PATCH RFC v3 01/16] riscv: Add support for kernel mode vector Date: Mon, 13 Mar 2023 20:12:47 +0100 Message-Id: <20230313191302.580787-2-heiko.stuebner@vrull.eu> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu> References: <20230313191302.580787-1-heiko.stuebner@vrull.eu> MIME-Version: 1.0 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS, T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760282464524397023?= X-GMAIL-MSGID: =?utf-8?q?1760282464524397023?= From: Greentime Hu Add kernel_rvv_begin() and kernel_rvv_end() function declarations and corresponding definitions in kernel_mode_vector.c These are needed to wrap uses of vector in kernel mode. Co-developed-by: Vincent Chen Signed-off-by: Vincent Chen Signed-off-by: Greentime Hu Signed-off-by: Heiko Stuebner --- arch/riscv/include/asm/vector.h | 14 +++ arch/riscv/kernel/Makefile | 1 + arch/riscv/kernel/kernel_mode_vector.c | 132 +++++++++++++++++++++++++ 3 files changed, 147 insertions(+) create mode 100644 arch/riscv/kernel/kernel_mode_vector.c diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h index 9aeab4074ca8..202df9ea28d7 100644 --- a/arch/riscv/include/asm/vector.h +++ b/arch/riscv/include/asm/vector.h @@ -147,6 +147,20 @@ static inline void __switch_to_vector(struct task_struct *prev, riscv_v_vstate_restore(next, task_pt_regs(next)); } +static inline void riscv_v_flush_cpu_state(void) +{ + asm volatile ( + "vsetvli t0, x0, e8, m8, ta, ma\n\t" + "vmv.v.i v0, 0\n\t" + "vmv.v.i v8, 0\n\t" + "vmv.v.i v16, 0\n\t" + "vmv.v.i v24, 0\n\t" + : : : "t0"); +} + +void kernel_rvv_begin(void); +void kernel_rvv_end(void); + #else /* ! CONFIG_RISCV_ISA_V */ struct pt_regs; diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile index 48d345a5f326..304c500cc1f7 100644 --- a/arch/riscv/kernel/Makefile +++ b/arch/riscv/kernel/Makefile @@ -56,6 +56,7 @@ obj-$(CONFIG_MMU) += vdso.o vdso/ obj-$(CONFIG_RISCV_M_MODE) += traps_misaligned.o obj-$(CONFIG_FPU) += fpu.o obj-$(CONFIG_RISCV_ISA_V) += vector.o +obj-$(CONFIG_RISCV_ISA_V) += kernel_mode_vector.o obj-$(CONFIG_SMP) += smpboot.o obj-$(CONFIG_SMP) += smp.o obj-$(CONFIG_SMP) += cpu_ops.o diff --git a/arch/riscv/kernel/kernel_mode_vector.c b/arch/riscv/kernel/kernel_mode_vector.c new file mode 100644 index 000000000000..2d704190c054 --- /dev/null +++ b/arch/riscv/kernel/kernel_mode_vector.c @@ -0,0 +1,132 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2012 ARM Ltd. + * Author: Catalin Marinas + * Copyright (C) 2017 Linaro Ltd. + * Copyright (C) 2021 SiFive + */ +#include +#include +#include +#include +#include + +#include +#include + +DECLARE_PER_CPU(bool, vector_context_busy); +DEFINE_PER_CPU(bool, vector_context_busy); + +/* + * may_use_vector - whether it is allowable at this time to issue vector + * instructions or access the vector register file + * + * Callers must not assume that the result remains true beyond the next + * preempt_enable() or return from softirq context. + */ +static __must_check inline bool may_use_vector(void) +{ + /* + * vector_context_busy is only set while preemption is disabled, + * and is clear whenever preemption is enabled. Since + * this_cpu_read() is atomic w.r.t. preemption, vector_context_busy + * cannot change under our feet -- if it's set we cannot be + * migrated, and if it's clear we cannot be migrated to a CPU + * where it is set. + */ + return !in_irq() && !irqs_disabled() && !in_nmi() && + !this_cpu_read(vector_context_busy); +} + +/* + * Claim ownership of the CPU vector context for use by the calling context. + * + * The caller may freely manipulate the vector context metadata until + * put_cpu_vector_context() is called. + */ +static void get_cpu_vector_context(void) +{ + bool busy; + + preempt_disable(); + busy = __this_cpu_xchg(vector_context_busy, true); + + WARN_ON(busy); +} + +/* + * Release the CPU vector context. + * + * Must be called from a context in which get_cpu_vector_context() was + * previously called, with no call to put_cpu_vector_context() in the + * meantime. + */ +static void put_cpu_vector_context(void) +{ + bool busy = __this_cpu_xchg(vector_context_busy, false); + + WARN_ON(!busy); + preempt_enable(); +} + +/* + * kernel_rvv_begin(): obtain the CPU vector registers for use by the calling + * context + * + * Must not be called unless may_use_vector() returns true. + * Task context in the vector registers is saved back to memory as necessary. + * + * A matching call to kernel_rvv_end() must be made before returning from the + * calling context. + * + * The caller may freely use the vector registers until kernel_rvv_end() is + * called. + */ +void kernel_rvv_begin(void) +{ + if (WARN_ON(!has_vector())) + return; + + WARN_ON(!may_use_vector()); + + /* Acquire kernel mode vector */ + get_cpu_vector_context(); + + /* Save vector state, if any */ + riscv_v_vstate_save(current, task_pt_regs(current)); + + /* Enable vector */ + riscv_v_enable(); + + /* Invalidate vector regs */ + riscv_v_flush_cpu_state(); +} +EXPORT_SYMBOL_GPL(kernel_rvv_begin); + +/* + * kernel_rvv_end(): give the CPU vector registers back to the current task + * + * Must be called from a context in which kernel_rvv_begin() was previously + * called, with no call to kernel_rvv_end() in the meantime. + * + * The caller must not use the vector registers after this function is called, + * unless kernel_rvv_begin() is called again in the meantime. + */ +void kernel_rvv_end(void) +{ + if (WARN_ON(!has_vector())) + return; + + /* Invalidate vector regs */ + riscv_v_flush_cpu_state(); + + /* Restore vector state, if any */ + riscv_v_vstate_restore(current, task_pt_regs(current)); + + /* disable vector */ + riscv_v_disable(); + + /* release kernel mode vector */ + put_cpu_vector_context(); +} +EXPORT_SYMBOL_GPL(kernel_rvv_end); From patchwork Mon Mar 13 19:12:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= X-Patchwork-Id: 69068 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1367334wrd; Mon, 13 Mar 2023 12:31:43 -0700 (PDT) X-Google-Smtp-Source: AK7set/M0z87LZsN+q4stuAh6NHxMtTX8DCIKf6ci+Aj1vLa9Yit4q3SG8RmQnyuP/IHb1z931u1 X-Received: by 2002:a17:90b:38c3:b0:236:704d:ab8c with SMTP id nn3-20020a17090b38c300b00236704dab8cmr38035369pjb.26.1678735903578; Mon, 13 Mar 2023 12:31:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1678735903; cv=none; d=google.com; s=arc-20160816; b=tr9lgZB3Oy+JTgcSXBqH8bPT3/LsBIlugZVzFdFywvHrPDLA1VcdbWYg1ntaQBdSJO oBD8rxPQ9Ys8R8ki4bcUwJDSNbO+lapAHOp7miykcvIxs7JlZh2NUM36P5uYOwiGaOTf JioAhEJ1FwqZUho1aznYo4OWz3+UMaAQkJTbgVJtcVVExToFC0mVx4lIgWSSGCb/RXPn pwUXW8znzQ56TGRKrLX0q9H6B/PGCHeNgI9NFBFV8fxz3tTq4xY5IlKOOYllpCyHoPsL MwnfaYe5UIOoQLPk3IA0C9ZnbXihfJuAUxGA9oLjDWvcWZODfeH+rAf5b9w/thIJ8MPL XizA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=Zu1irl9P9YB5ITwf6EXO2Umq9HhJCyQ+UbXvIt20cPQ=; b=bf/0zmrnCI1/ryQ/bLgtkf2dD7nmkKJbKtFLxcsbh958Um+xzTMfe8AXxIIVDf3cOf xOqPI4mLreewxQtZc95hCx7UZNpUHA3py+6KkniZWM2huXoGYSHha5f3is2xJPfdkBrE nXQ5opyr/FgYkvAp0wSincxoME+qzfz5zJKVznLRuCajMgZI7sCocnHNBMEc0WHXqDFH 4m4HpX3aZw0RufAddytOs9WD4AAGsL+xA/OV0OB0F6BeGfj62ntEZnhC3FE+2uvQ81dV Z9os433ylQiTI/PHeq66Rsi8PhgUc7Z8XVRktw+cY4qZSeDDAwMyhejN8tPXfhOKYpZf Tjrg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id z3-20020a17090abd8300b00234bfb1c4c6si453103pjr.58.2023.03.13.12.31.27; Mon, 13 Mar 2023 12:31:43 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231166AbjCMTNf (ORCPT + 99 others); Mon, 13 Mar 2023 15:13:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35642 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229977AbjCMTNO (ORCPT ); Mon, 13 Mar 2023 15:13:14 -0400 Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A9063B762 for ; Mon, 13 Mar 2023 12:13:07 -0700 (PDT) Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169] helo=phil.lan) by gloria.sntech.de with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1pbnbU-00028k-6z; Mon, 13 Mar 2023 20:13:04 +0100 From: Heiko Stuebner To: palmer@rivosinc.com Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, heiko@sntech.de Subject: [PATCH RFC v3 02/16] riscv: Add vector extension XOR implementation Date: Mon, 13 Mar 2023 20:12:48 +0100 Message-Id: <20230313191302.580787-3-heiko.stuebner@vrull.eu> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu> References: <20230313191302.580787-1-heiko.stuebner@vrull.eu> MIME-Version: 1.0 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS, T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760282178725440852?= X-GMAIL-MSGID: =?utf-8?q?1760282178725440852?= From: Greentime Hu This patch adds support for vector optimized XOR and it is tested in qemu. Co-developed-by: Han-Kuan Chen Signed-off-by: Han-Kuan Chen Signed-off-by: Greentime Hu Signed-off-by: Heiko Stuebner --- arch/riscv/include/asm/xor.h | 82 ++++++++++++++++++++++++++++++++++++ arch/riscv/lib/Makefile | 1 + arch/riscv/lib/xor.S | 81 +++++++++++++++++++++++++++++++++++ 3 files changed, 164 insertions(+) create mode 100644 arch/riscv/include/asm/xor.h create mode 100644 arch/riscv/lib/xor.S diff --git a/arch/riscv/include/asm/xor.h b/arch/riscv/include/asm/xor.h new file mode 100644 index 000000000000..74867c7fd955 --- /dev/null +++ b/arch/riscv/include/asm/xor.h @@ -0,0 +1,82 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright (C) 2021 SiFive + */ + +#include +#include +#ifdef CONFIG_VECTOR +#include +#include + +void xor_regs_2_(unsigned long bytes, unsigned long *__restrict p1, + const unsigned long *__restrict p2); +void xor_regs_3_(unsigned long bytes, unsigned long *__restrict p1, + const unsigned long *__restrict p2, + const unsigned long *__restrict p3); +void xor_regs_4_(unsigned long bytes, unsigned long *__restrict p1, + const unsigned long *__restrict p2, + const unsigned long *__restrict p3, + const unsigned long *__restrict p4); +void xor_regs_5_(unsigned long bytes, unsigned long *__restrict p1, + const unsigned long *__restrict p2, + const unsigned long *__restrict p3, + const unsigned long *__restrict p4, + const unsigned long *__restrict p5); + +static void xor_rvv_2(unsigned long bytes, unsigned long *__restrict p1, + const unsigned long *__restrict p2) +{ + kernel_rvv_begin(); + xor_regs_2_(bytes, p1, p2); + kernel_rvv_end(); +} + +static void xor_rvv_3(unsigned long bytes, unsigned long *__restrict p1, + const unsigned long *__restrict p2, + const unsigned long *__restrict p3) +{ + kernel_rvv_begin(); + xor_regs_3_(bytes, p1, p2, p3); + kernel_rvv_end(); +} + +static void xor_rvv_4(unsigned long bytes, unsigned long *__restrict p1, + const unsigned long *__restrict p2, + const unsigned long *__restrict p3, + const unsigned long *__restrict p4) +{ + kernel_rvv_begin(); + xor_regs_4_(bytes, p1, p2, p3, p4); + kernel_rvv_end(); +} + +static void xor_rvv_5(unsigned long bytes, unsigned long *__restrict p1, + const unsigned long *__restrict p2, + const unsigned long *__restrict p3, + const unsigned long *__restrict p4, + const unsigned long *__restrict p5) +{ + kernel_rvv_begin(); + xor_regs_5_(bytes, p1, p2, p3, p4, p5); + kernel_rvv_end(); +} + +static struct xor_block_template xor_block_rvv = { + .name = "rvv", + .do_2 = xor_rvv_2, + .do_3 = xor_rvv_3, + .do_4 = xor_rvv_4, + .do_5 = xor_rvv_5 +}; + +#undef XOR_TRY_TEMPLATES +#define XOR_TRY_TEMPLATES \ + do { \ + xor_speed(&xor_block_8regs); \ + xor_speed(&xor_block_32regs); \ + if (has_vector()) { \ + xor_speed(&xor_block_rvv);\ + } \ + } while (0) +#endif diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile index 6c74b0bedd60..d87e0b6fe1d0 100644 --- a/arch/riscv/lib/Makefile +++ b/arch/riscv/lib/Makefile @@ -10,3 +10,4 @@ lib-$(CONFIG_MMU) += uaccess.o lib-$(CONFIG_64BIT) += tishift.o obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o +lib-$(CONFIG_VECTOR) += xor.o diff --git a/arch/riscv/lib/xor.S b/arch/riscv/lib/xor.S new file mode 100644 index 000000000000..3bc059e18171 --- /dev/null +++ b/arch/riscv/lib/xor.S @@ -0,0 +1,81 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright (C) 2021 SiFive + */ +#include +#include +#include + +ENTRY(xor_regs_2_) + vsetvli a3, a0, e8, m8, ta, ma + vle8.v v0, (a1) + vle8.v v8, (a2) + sub a0, a0, a3 + vxor.vv v16, v0, v8 + add a2, a2, a3 + vse8.v v16, (a1) + add a1, a1, a3 + bnez a0, xor_regs_2_ + ret +END(xor_regs_2_) +EXPORT_SYMBOL(xor_regs_2_) + +ENTRY(xor_regs_3_) + vsetvli a4, a0, e8, m8, ta, ma + vle8.v v0, (a1) + vle8.v v8, (a2) + sub a0, a0, a4 + vxor.vv v0, v0, v8 + vle8.v v16, (a3) + add a2, a2, a4 + vxor.vv v16, v0, v16 + add a3, a3, a4 + vse8.v v16, (a1) + add a1, a1, a4 + bnez a0, xor_regs_3_ + ret +END(xor_regs_3_) +EXPORT_SYMBOL(xor_regs_3_) + +ENTRY(xor_regs_4_) + vsetvli a5, a0, e8, m8, ta, ma + vle8.v v0, (a1) + vle8.v v8, (a2) + sub a0, a0, a5 + vxor.vv v0, v0, v8 + vle8.v v16, (a3) + add a2, a2, a5 + vxor.vv v0, v0, v16 + vle8.v v24, (a4) + add a3, a3, a5 + vxor.vv v16, v0, v24 + add a4, a4, a5 + vse8.v v16, (a1) + add a1, a1, a5 + bnez a0, xor_regs_4_ + ret +END(xor_regs_4_) +EXPORT_SYMBOL(xor_regs_4_) + +ENTRY(xor_regs_5_) + vsetvli a6, a0, e8, m8, ta, ma + vle8.v v0, (a1) + vle8.v v8, (a2) + sub a0, a0, a6 + vxor.vv v0, v0, v8 + vle8.v v16, (a3) + add a2, a2, a6 + vxor.vv v0, v0, v16 + vle8.v v24, (a4) + add a3, a3, a6 + vxor.vv v0, v0, v24 + vle8.v v8, (a5) + add a4, a4, a6 + vxor.vv v16, v0, v8 + add a5, a5, a6 + vse8.v v16, (a1) + add a1, a1, a6 + bnez a0, xor_regs_5_ + ret +END(xor_regs_5_) +EXPORT_SYMBOL(xor_regs_5_) From patchwork Mon Mar 13 19:12:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= X-Patchwork-Id: 69072 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1372130wrd; Mon, 13 Mar 2023 12:44:10 -0700 (PDT) X-Google-Smtp-Source: AK7set+MjlSgyVt04+nYI7rO/bAGiOJT8/J7obkxjj3UExtBAtQmKL8Cy4zEW3ZHBhAO5dtx73N+ X-Received: by 2002:a05:6a20:63a1:b0:c7:4bf5:fa0a with SMTP id m33-20020a056a2063a100b000c74bf5fa0amr28941066pzg.48.1678736650495; Mon, 13 Mar 2023 12:44:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1678736650; cv=none; d=google.com; s=arc-20160816; b=IubsiT6ZMYvYAdp/4KlNqSj56TeOJswY5f1om+pN60kCNL3Hk2KTzxH1Pk1GeHFSYC 2WBPSpz06f83wGadnMmUViigGGz21FFgrnESBsfaOlmhTp75Y3ljZugJJEFnTpbqZkm2 EfEqGXFIPHQzKVTOx8jahbQeI8fuDi7s3AKM0jBZklV+OeqWr0y1Qw2PrDQW4xp/FmGw Q2w5cCRVulhadEObEkpJQe8vP0x/7RTni+FfaZC8xuMsjfFnIkojQh2HrUV74X7RxgQd 4QjVRtGapppO/9vILKqR7B4/mlhS5oPsN7u29wqJHUfX/NsQ+KMA98SKaO7UtlYuVKU+ ILSA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=WmnFDaMdtJq1tUGJsS3Yk7fBodS3EDm6zwlJrFpC2CQ=; b=V15CNyzYtpsHVyDqoUGEY2+YtKEMYR0PlKwgUGiT2R5X7pI8ARC7fUM85YRT2D0kaS LqoH9fgamren1/tpklYv7nqPQqh2CqDcRaMqwMWX8WiQtibBmJE/7vk417A29GNYcryy TF7rkMoF/fsT0/eKk17RCfHHIMg+mfuISheR6LvQEsaJAlk4bYObQv6VrVP6BVFxeyAj dvQ7WoxtCUYNcfg8BysINDtEnQdyrbjbFXK48i2SLg7uLwU/ToCXYzpXwAcrtk4nwKYS loEEy74fUSuL6dQAAFZawVxDXZDF9WqO4nP87Je+lANH9hxdsqr6MzfAyNQZqiJUY0RA oArQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id q20-20020a635054000000b00502f9a7edb4si259885pgl.340.2023.03.13.12.43.56; Mon, 13 Mar 2023 12:44:10 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229813AbjCMTNS (ORCPT + 99 others); Mon, 13 Mar 2023 15:13:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35608 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230071AbjCMTNN (ORCPT ); Mon, 13 Mar 2023 15:13:13 -0400 Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A86E4A267 for ; Mon, 13 Mar 2023 12:13:07 -0700 (PDT) Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169] helo=phil.lan) by gloria.sntech.de with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1pbnbU-00028k-F7; Mon, 13 Mar 2023 20:13:04 +0100 From: Heiko Stuebner To: palmer@rivosinc.com Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, heiko@sntech.de Subject: [PATCH RFC v3 03/16] RISC-V: add Zbc extension detection Date: Mon, 13 Mar 2023 20:12:49 +0100 Message-Id: <20230313191302.580787-4-heiko.stuebner@vrull.eu> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu> References: <20230313191302.580787-1-heiko.stuebner@vrull.eu> MIME-Version: 1.0 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS, T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760282962297569342?= X-GMAIL-MSGID: =?utf-8?q?1760282962297569342?= From: Heiko Stuebner Add handling for Zbc extension. Zbc provides instruction for carry-less multiplication. Signed-off-by: Heiko Stuebner --- arch/riscv/Kconfig | 22 ++++++++++++++++++++++ arch/riscv/include/asm/hwcap.h | 1 + arch/riscv/kernel/cpu.c | 1 + arch/riscv/kernel/cpufeature.c | 1 + 4 files changed, 25 insertions(+) diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index eb691dd8ee4f..8d83935f77d2 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -459,6 +459,28 @@ config RISCV_ISA_ZBB If you don't know what to do here, say Y. +config TOOLCHAIN_HAS_ZBC + bool + default y + depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zbc) + depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zbc) + depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900 + depends on AS_IS_GNU + +config RISCV_ISA_ZBC + bool "Zbc extension support for bit manipulation instructions" + depends on TOOLCHAIN_HAS_ZBC + depends on !XIP_KERNEL && MMU + default y + help + Adds support to dynamically detect the presence of the ZBC + extension (carry-less multiplication) and enable its usage. + + The Zbc extension provides instructions clmul, clmulh and clmulr + to accelerate carry-less multiplications. + + If you don't know what to do here, say Y. + config RISCV_ISA_ZICBOM bool "Zicbom extension support for non-coherent DMA operation" depends on !XIP_KERNEL && MMU diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h index ee73341bb1d4..92f234c6cb4c 100644 --- a/arch/riscv/include/asm/hwcap.h +++ b/arch/riscv/include/asm/hwcap.h @@ -43,6 +43,7 @@ #define RISCV_ISA_EXT_ZBB 30 #define RISCV_ISA_EXT_ZICBOM 31 #define RISCV_ISA_EXT_ZIHINTPAUSE 32 +#define RISCV_ISA_EXT_ZBC 33 #define RISCV_ISA_EXT_MAX 64 #define RISCV_ISA_EXT_NAME_LEN_MAX 32 diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c index 8400f0cc9704..5d47a0c75c69 100644 --- a/arch/riscv/kernel/cpu.c +++ b/arch/riscv/kernel/cpu.c @@ -188,6 +188,7 @@ static struct riscv_isa_ext_data isa_ext_arr[] = { __RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM), __RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE), __RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB), + __RISCV_ISA_EXT_DATA(zbc, RISCV_ISA_EXT_ZBC), __RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF), __RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC), __RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL), diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c index e6d53e2e672b..c9099694b5bb 100644 --- a/arch/riscv/kernel/cpufeature.c +++ b/arch/riscv/kernel/cpufeature.c @@ -228,6 +228,7 @@ void __init riscv_fill_hwcap(void) SET_ISA_EXT_MAP("svinval", RISCV_ISA_EXT_SVINVAL); SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT); SET_ISA_EXT_MAP("zbb", RISCV_ISA_EXT_ZBB); + SET_ISA_EXT_MAP("zbc", RISCV_ISA_EXT_ZBC); SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM); SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE); } From patchwork Mon Mar 13 19:12:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= X-Patchwork-Id: 69064 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1363472wrd; Mon, 13 Mar 2023 12:22:57 -0700 (PDT) X-Google-Smtp-Source: AK7set8krK+dj9Zb5jFNK7UKl6ms4xcB83mhXNEtNzafTDC86PaHcYlT0I+//0Tnu48/393d7zQE X-Received: by 2002:a17:902:da91:b0:19c:d7a9:8bf0 with SMTP id j17-20020a170902da9100b0019cd7a98bf0mr12952525plx.10.1678735377604; Mon, 13 Mar 2023 12:22:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1678735377; cv=none; d=google.com; s=arc-20160816; b=nSaA1Anqao2sOCrjz5MGYwh17K/ilhAxCMp/2svAAfLz0uGR0oDftKv2xhC+k+IeQH AV/4OGgrFPxH18UQ0bS7yCDHPk8sxa44GHWaSVrCXWxUTder7fGCF9Qdv6ZQYMuS3102 tkkP+e1tmyh3+R2qRGwq9PYUA8NuTwpzrtciIEDoQb0MnKzl5Wn6elI59HLA7IDBB2rW qmrXMAP9eKJhin9zNJNFjtYsVfCy1Wjn155AQZpsNBxWLwRtMTdwkWAHQAa5Uwt6S/xJ XK2TsG05UozRAFM8Byqku8Re9MxGO2UCqmnwxzqxuGGdVYjT9tB0rC+JImcTtRKVfT/P xfKA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=KtF1lFlpDWkxFgxPqL4VlyI53O1VGo/XkpSBIOHBQX4=; b=jXS9AgvUrIrUBl3bnYDEt8hUDEXtMJZ0znNMv/LqmNTeKkshj4l209P3D7j/I0SP/8 elhXU6pdZWfZBH33IDEzPFUs6hqmoJ70qGvfSUxvLDmFqIgv6LACASpwW877DaJ5600j LCNvwAmS7FvtofJ0hobDuL3DNGJNdnAVqCkyNmUQVpIREQ5xEojRNlIs4vf0sx4dRjWt +4b1mYyESJOCZo5tCegepLOHNebTL+kUq+p0i9qRIqKfiAEmwPv+eNJ+Fl5bp9tEJtcR 56YBi9wTYZoZKqXG+FEd2pw+T6Q+Ap9A1Uo74H6v+heJ58BYdzh9Rl0+XuxCQeUQf/pz yGPg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id b11-20020a170902d50b00b00196711a94f6si460924plg.353.2023.03.13.12.22.43; Mon, 13 Mar 2023 12:22:57 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230251AbjCMTNV (ORCPT + 99 others); Mon, 13 Mar 2023 15:13:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35612 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230104AbjCMTNN (ORCPT ); Mon, 13 Mar 2023 15:13:13 -0400 Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A96F5CA2D for ; Mon, 13 Mar 2023 12:13:07 -0700 (PDT) Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169] helo=phil.lan) by gloria.sntech.de with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1pbnbU-00028k-NQ; Mon, 13 Mar 2023 20:13:04 +0100 From: Heiko Stuebner To: palmer@rivosinc.com Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, heiko@sntech.de Subject: [PATCH RFC v3 04/16] RISC-V: add Zbkb extension detection Date: Mon, 13 Mar 2023 20:12:50 +0100 Message-Id: <20230313191302.580787-5-heiko.stuebner@vrull.eu> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu> References: <20230313191302.580787-1-heiko.stuebner@vrull.eu> MIME-Version: 1.0 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS, T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760281627565937956?= X-GMAIL-MSGID: =?utf-8?q?1760281627565937956?= From: Heiko Stuebner Add detection for Zbkb extension. Zbkb is part of the set of scalar cryptography extensions and provides bitmanip instructions for cryptography, with them being a "subset of the Zbb extension particularly useful for cryptography". Zbkb was ratified in january 2022. Expect code using the extension to pre-encode zbkb instructions, so don't introduce special toolchain requirements for now. Signed-off-by: Heiko Stuebner --- arch/riscv/include/asm/hwcap.h | 1 + arch/riscv/kernel/cpu.c | 1 + arch/riscv/kernel/cpufeature.c | 1 + 3 files changed, 3 insertions(+) diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h index 92f234c6cb4c..b28548fb10f3 100644 --- a/arch/riscv/include/asm/hwcap.h +++ b/arch/riscv/include/asm/hwcap.h @@ -44,6 +44,7 @@ #define RISCV_ISA_EXT_ZICBOM 31 #define RISCV_ISA_EXT_ZIHINTPAUSE 32 #define RISCV_ISA_EXT_ZBC 33 +#define RISCV_ISA_EXT_ZBKB 34 #define RISCV_ISA_EXT_MAX 64 #define RISCV_ISA_EXT_NAME_LEN_MAX 32 diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c index 5d47a0c75c69..6f65aac68018 100644 --- a/arch/riscv/kernel/cpu.c +++ b/arch/riscv/kernel/cpu.c @@ -189,6 +189,7 @@ static struct riscv_isa_ext_data isa_ext_arr[] = { __RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE), __RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB), __RISCV_ISA_EXT_DATA(zbc, RISCV_ISA_EXT_ZBC), + __RISCV_ISA_EXT_DATA(zbkb, RISCV_ISA_EXT_ZBKB), __RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF), __RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC), __RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL), diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c index c9099694b5bb..eb7be8e7f24e 100644 --- a/arch/riscv/kernel/cpufeature.c +++ b/arch/riscv/kernel/cpufeature.c @@ -229,6 +229,7 @@ void __init riscv_fill_hwcap(void) SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT); SET_ISA_EXT_MAP("zbb", RISCV_ISA_EXT_ZBB); SET_ISA_EXT_MAP("zbc", RISCV_ISA_EXT_ZBC); + SET_ISA_EXT_MAP("zbkb", RISCV_ISA_EXT_ZBKB); SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM); SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE); } From patchwork Mon Mar 13 19:12:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= X-Patchwork-Id: 69067 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1365567wrd; Mon, 13 Mar 2023 12:27:54 -0700 (PDT) X-Google-Smtp-Source: AK7set/Nv7ODvfuLEZDKlgrnjparPtCX4TDypTpd0BLN7SZzlvhdU88lrszY+Zwlr+/KlgFenZSe X-Received: by 2002:a17:90a:bd0d:b0:23b:4b4a:d778 with SMTP id y13-20020a17090abd0d00b0023b4b4ad778mr7306919pjr.34.1678735673966; Mon, 13 Mar 2023 12:27:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1678735673; cv=none; d=google.com; s=arc-20160816; b=xF9pEGPWbKkTXbyJAKfPoluhD/JQb4nbzh7WPp8H+ZydQS17XGPjiRSqOYHGe9lCAf ytMpNFhG5ag3+L+v9om2L1R1X2X8VvFwZErl38gRLhxTPTKs3TlRwv81PMrXqhDzkigX Lytamwauta+yw2lOy4nLlvCwMuBGtqevYqXaEh6M2TEZ8nI/qcvXj+5jHtpqRyfzLRyM zo7+jkNXkTbJnYs1mPvArf10ApfPHV46CjXpyYB3ENE0i/2o+2NW1yEd3ryehEXJYPMW Kr/gh32eC41Z5ybwPbSTIuP8AmaDMr1vfn1f1NT1Y0A+7kii5V0g8tQaofsXUi47vL4O zkUw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=vJF6CvlSXvmq7qX4eunn3BoeOyl7pdqADErsJfHPImU=; b=NeV1gs63rji0UAbFA80bp0TGgsyvJhuHr++/JG0vuzMpKGLzyPJKU1GhxVF+HL8e/4 q62AFpf9b3CO0CVNxU6kqrcxQhTy2/KmLJv6ojpvmpyjF0XZ93faK2azdpE3QAgmZhrd uKTiBjrkUirob488AhxXRqqKIl1trRxcE3oAIpM9iIDpo3b4h5wAS/94JaSJGvW1orpj xyEf2f8jwcJYT1cIb8GwzS1mdvjtLrGF/HE40B0E58bokVAX1hyPapRott9P6gytVt3h 4eWlLKrigpsMoj7NdN/1gdR5bosWkalq7R7bc2Nyjd280uPs9CWKHUkga5beRrZ8N252 it/w== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id c8-20020a170902724800b001a05e6d4b08si480210pll.40.2023.03.13.12.27.29; Mon, 13 Mar 2023 12:27:53 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230450AbjCMTN0 (ORCPT + 99 others); Mon, 13 Mar 2023 15:13:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35610 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230135AbjCMTNN (ORCPT ); Mon, 13 Mar 2023 15:13:13 -0400 Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A92E6BDF2 for ; Mon, 13 Mar 2023 12:13:07 -0700 (PDT) Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169] helo=phil.lan) by gloria.sntech.de with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1pbnbV-00028k-07; Mon, 13 Mar 2023 20:13:05 +0100 From: Heiko Stuebner To: palmer@rivosinc.com Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, heiko@sntech.de Subject: [PATCH RFC v3 05/16] RISC-V: hook new crypto subdir into build-system Date: Mon, 13 Mar 2023 20:12:51 +0100 Message-Id: <20230313191302.580787-6-heiko.stuebner@vrull.eu> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu> References: <20230313191302.580787-1-heiko.stuebner@vrull.eu> MIME-Version: 1.0 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS, T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760281938371209739?= X-GMAIL-MSGID: =?utf-8?q?1760281938371209739?= From: Heiko Stuebner Create a crypto subdirectory for added accelerated cryptography routines and hook it into the riscv Kbuild and the main crypto Kconfig. Signed-off-by: Heiko Stuebner --- arch/riscv/Kbuild | 1 + arch/riscv/crypto/Kconfig | 5 +++++ arch/riscv/crypto/Makefile | 4 ++++ crypto/Kconfig | 3 +++ 4 files changed, 13 insertions(+) create mode 100644 arch/riscv/crypto/Kconfig create mode 100644 arch/riscv/crypto/Makefile diff --git a/arch/riscv/Kbuild b/arch/riscv/Kbuild index afa83e307a2e..250d1fd38618 100644 --- a/arch/riscv/Kbuild +++ b/arch/riscv/Kbuild @@ -2,6 +2,7 @@ obj-y += kernel/ mm/ net/ obj-$(CONFIG_BUILTIN_DTB) += boot/dts/ +obj-$(CONFIG_CRYPTO) += crypto/ obj-y += errata/ obj-$(CONFIG_KVM) += kvm/ diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig new file mode 100644 index 000000000000..10d60edc0110 --- /dev/null +++ b/arch/riscv/crypto/Kconfig @@ -0,0 +1,5 @@ +# SPDX-License-Identifier: GPL-2.0 + +menu "Accelerated Cryptographic Algorithms for CPU (riscv)" + +endmenu diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile new file mode 100644 index 000000000000..b3b6332c9f6d --- /dev/null +++ b/arch/riscv/crypto/Makefile @@ -0,0 +1,4 @@ +# SPDX-License-Identifier: GPL-2.0-only +# +# linux/arch/riscv/crypto/Makefile +# diff --git a/crypto/Kconfig b/crypto/Kconfig index 9c86f7045157..003921cb0301 100644 --- a/crypto/Kconfig +++ b/crypto/Kconfig @@ -1401,6 +1401,9 @@ endif if PPC source "arch/powerpc/crypto/Kconfig" endif +if RISCV +source "arch/riscv/crypto/Kconfig" +endif if S390 source "arch/s390/crypto/Kconfig" endif From patchwork Mon Mar 13 19:12:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= X-Patchwork-Id: 69070 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1370006wrd; Mon, 13 Mar 2023 12:38:17 -0700 (PDT) X-Google-Smtp-Source: AK7set/8x68rcVjrEWDx25gVMyUZZ6D0Ww4JMqcxXlQJQLjWH/Aul4TvFJLXUw8nL1Yn1uq7eXEY X-Received: by 2002:a62:63c2:0:b0:5de:3c49:b06 with SMTP id x185-20020a6263c2000000b005de3c490b06mr12251504pfb.3.1678736297393; Mon, 13 Mar 2023 12:38:17 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1678736297; cv=none; d=google.com; s=arc-20160816; b=sA4ZnNFl8onlcltlT+/iK83w1uNLS1DLxX0gz9YRSRD4MWFo2j8nUXIb+ALxZUWc0p V4YXNWK/2JyCqDvP9uWTCbCW1P+i2xbzv/IapCTv8dz0GyWZOA+vhlgCsJvu0J6NKQuu a5VbKXpHDp1u41+PxpefM0BN4DY3JLYmrM1GeF2zmqT4fpP1XsgbkG81+vxI3t8fu0Xe nZQ4VG21hHwhCBskIo5k+fMZYPB/79+scn0650JWmMsQjKEho+DGRYeYpr1PheHXdjvd w+y1i/XWUfZaQlZ2s1y0yS3EOmhsrsePPQXZqs2D7oUDpbSPlp61nGnnCvme9u/FOKgR 7nzA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=kAgPGimJs8ABktzXFns1e6nuPh8j25GVJOc1kftWB5Y=; b=D2/634blbxjfI97guvCQwCAoDYr9gQb3O7ljhRcZ9OPJIsS7P2UJNDUKdc2+NKDVSS kDijeF+n2VwSi9LtwHeb1OJkocfCRDA0Eg7H03WN/w1f4r89OMNWT5G4Fx3Le6VLv7qP bFW4WB9hcIJ6nMQGz6i8DFHCtzxP+Z/ln6RYiurTt00X9hYxJZfI6whZZ7sceRvmrpqa Kh7hg7AqwK/hLWFwjS9nsXSM/zwiByo4CXk1HKJ/JQWoiRn34RGyzwHI+w6Q+JJbiWkM Zh6u7wSyAOZySjEsPneyjH66vFhsum+47SWcCWV+aq+j5ldxkdI4212mhJ9h7kmvlzt7 1ipg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 205-20020a6306d6000000b0050b19d24c3csi237487pgg.509.2023.03.13.12.38.04; Mon, 13 Mar 2023 12:38:17 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230254AbjCMTNm (ORCPT + 99 others); Mon, 13 Mar 2023 15:13:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35642 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229845AbjCMTNQ (ORCPT ); Mon, 13 Mar 2023 15:13:16 -0400 Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A8C90AD30 for ; Mon, 13 Mar 2023 12:13:07 -0700 (PDT) Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169] helo=phil.lan) by gloria.sntech.de with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1pbnbV-00028k-95; Mon, 13 Mar 2023 20:13:05 +0100 From: Heiko Stuebner To: palmer@rivosinc.com Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, heiko@sntech.de Subject: [PATCH RFC v3 06/16] RISC-V: crypto: add accelerated GCM GHASH implementation Date: Mon, 13 Mar 2023 20:12:52 +0100 Message-Id: <20230313191302.580787-7-heiko.stuebner@vrull.eu> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu> References: <20230313191302.580787-1-heiko.stuebner@vrull.eu> MIME-Version: 1.0 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS, T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760282591795809626?= X-GMAIL-MSGID: =?utf-8?q?1760282591795809626?= From: Heiko Stuebner With different sets of available extensions a number of different implementation variants are possible. Quite a number of them are already implemented in openSSL or are in the process of being implemented, so pick the relevant openSSL coden and add suitable glue code similar to arm64 and powerpc to use it for kernel-specific cryptography. The prioritization of the algorithms follows the ifdef chain for the assembly callbacks done in openssl but here algorithms will get registered separately so that all of them can be part of the crypto selftests. The crypto subsystem will select the most performant of all registered algorithms on the running system but will selftest all registered ones. In a first step this adds scalar variants using the Zbc, Zbb and possible Zbkb (bitmanip crypto extension) and the perl implementation stems from openSSL pull request on https://github.com/openssl/openssl/pull/20078 Co-developed-by: Christoph Müllner Signed-off-by: Christoph Müllner Signed-off-by: Heiko Stuebner --- arch/riscv/crypto/Kconfig | 11 + arch/riscv/crypto/Makefile | 14 + arch/riscv/crypto/ghash-riscv64-glue.c | 258 ++++++++++++++++ arch/riscv/crypto/ghash-riscv64-zbc.pl | 400 +++++++++++++++++++++++++ arch/riscv/crypto/riscv.pm | 230 ++++++++++++++ 5 files changed, 913 insertions(+) create mode 100644 arch/riscv/crypto/ghash-riscv64-glue.c create mode 100644 arch/riscv/crypto/ghash-riscv64-zbc.pl create mode 100644 arch/riscv/crypto/riscv.pm diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig index 10d60edc0110..010adbbb058a 100644 --- a/arch/riscv/crypto/Kconfig +++ b/arch/riscv/crypto/Kconfig @@ -2,4 +2,15 @@ menu "Accelerated Cryptographic Algorithms for CPU (riscv)" +config CRYPTO_GHASH_RISCV64 + tristate "Hash functions: GHASH" + depends on 64BIT && RISCV_ISA_ZBC + select CRYPTO_HASH + select CRYPTO_LIB_GF128MUL + help + GCM GHASH function (NIST SP800-38D) + + Architecture: riscv64 using one of: + - ZBC extension + endmenu diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile index b3b6332c9f6d..0a158919e9da 100644 --- a/arch/riscv/crypto/Makefile +++ b/arch/riscv/crypto/Makefile @@ -2,3 +2,17 @@ # # linux/arch/riscv/crypto/Makefile # + +obj-$(CONFIG_CRYPTO_GHASH_RISCV64) += ghash-riscv64.o +ghash-riscv64-y := ghash-riscv64-glue.o +ifdef CONFIG_RISCV_ISA_ZBC +ghash-riscv64-y += ghash-riscv64-zbc.o +endif + +quiet_cmd_perlasm = PERLASM $@ + cmd_perlasm = $(PERL) $(<) void $(@) + +$(obj)/ghash-riscv64-zbc.S: $(src)/ghash-riscv64-zbc.pl + $(call cmd,perlasm) + +clean-files += ghash-riscv64-zbc.S diff --git a/arch/riscv/crypto/ghash-riscv64-glue.c b/arch/riscv/crypto/ghash-riscv64-glue.c new file mode 100644 index 000000000000..6a6c39e16702 --- /dev/null +++ b/arch/riscv/crypto/ghash-riscv64-glue.c @@ -0,0 +1,258 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * RISC-V optimized GHASH routines + * + * Copyright (C) 2023 VRULL GmbH + * Author: Heiko Stuebner + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +/* Zbc (optional with zbkb improvements) */ +void gcm_ghash_rv64i_zbc(u64 Xi[2], const u128 Htable[16], + const u8 *inp, size_t len); +void gcm_ghash_rv64i_zbc__zbkb(u64 Xi[2], const u128 Htable[16], + const u8 *inp, size_t len); + +struct riscv64_ghash_ctx { + void (*ghash_func)(u64 Xi[2], const u128 Htable[16], + const u8 *inp, size_t len); + + /* key used by vector asm */ + u128 htable[16]; + /* key used by software fallback */ + be128 key; +}; + +struct riscv64_ghash_desc_ctx { + u64 shash[2]; + u8 buffer[GHASH_DIGEST_SIZE]; + int bytes; +}; + +static int riscv64_ghash_init(struct shash_desc *desc) +{ + struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc); + + dctx->bytes = 0; + memset(dctx->shash, 0, GHASH_DIGEST_SIZE); + return 0; +} + +#ifdef CONFIG_RISCV_ISA_ZBC + +#define RISCV64_ZBC_SETKEY(VARIANT, GHASH) \ +void gcm_init_rv64i_ ## VARIANT(u128 Htable[16], const u64 Xi[2]); \ +static int riscv64_zbc_ghash_setkey_ ## VARIANT(struct crypto_shash *tfm, \ + const u8 *key, \ + unsigned int keylen) \ +{ \ + struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(tfm)); \ + const u64 k[2] = { cpu_to_be64(((const u64 *)key)[0]), \ + cpu_to_be64(((const u64 *)key)[1]) }; \ + \ + if (keylen != GHASH_BLOCK_SIZE) \ + return -EINVAL; \ + \ + memcpy(&ctx->key, key, GHASH_BLOCK_SIZE); \ + gcm_init_rv64i_ ## VARIANT(ctx->htable, k); \ + \ + ctx->ghash_func = gcm_ghash_rv64i_ ## GHASH; \ + \ + return 0; \ +} + +static int riscv64_zbc_ghash_update(struct shash_desc *desc, + const u8 *src, unsigned int srclen) +{ + unsigned int len; + struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(desc->tfm)); + struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc); + + if (dctx->bytes) { + if (dctx->bytes + srclen < GHASH_DIGEST_SIZE) { + memcpy(dctx->buffer + dctx->bytes, src, + srclen); + dctx->bytes += srclen; + return 0; + } + memcpy(dctx->buffer + dctx->bytes, src, + GHASH_DIGEST_SIZE - dctx->bytes); + + ctx->ghash_func(dctx->shash, ctx->htable, + dctx->buffer, GHASH_DIGEST_SIZE); + + src += GHASH_DIGEST_SIZE - dctx->bytes; + srclen -= GHASH_DIGEST_SIZE - dctx->bytes; + dctx->bytes = 0; + } + len = srclen & ~(GHASH_DIGEST_SIZE - 1); + + if (len) { + gcm_ghash_rv64i_zbc(dctx->shash, ctx->htable, + src, len); + src += len; + srclen -= len; + } + + if (srclen) { + memcpy(dctx->buffer, src, srclen); + dctx->bytes = srclen; + } + return 0; +} + +static int riscv64_zbc_ghash_final(struct shash_desc *desc, u8 *out) +{ + int i; + struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(desc->tfm)); + struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc); + + if (dctx->bytes) { + for (i = dctx->bytes; i < GHASH_DIGEST_SIZE; i++) + dctx->buffer[i] = 0; + ctx->ghash_func(dctx->shash, ctx->htable, + dctx->buffer, GHASH_DIGEST_SIZE); + dctx->bytes = 0; + } + memcpy(out, dctx->shash, GHASH_DIGEST_SIZE); + return 0; +} + +RISCV64_ZBC_SETKEY(zbc, zbc); +struct shash_alg riscv64_zbc_ghash_alg = { + .digestsize = GHASH_DIGEST_SIZE, + .init = riscv64_ghash_init, + .update = riscv64_zbc_ghash_update, + .final = riscv64_zbc_ghash_final, + .setkey = riscv64_zbc_ghash_setkey_zbc, + .descsize = sizeof(struct riscv64_ghash_desc_ctx) + + sizeof(struct ghash_desc_ctx), + .base = { + .cra_name = "ghash", + .cra_driver_name = "riscv64_zbc_ghash", + .cra_priority = 250, + .cra_blocksize = GHASH_BLOCK_SIZE, + .cra_ctxsize = sizeof(struct riscv64_ghash_ctx), + .cra_module = THIS_MODULE, + }, +}; + +RISCV64_ZBC_SETKEY(zbc__zbb, zbc); +struct shash_alg riscv64_zbc_zbb_ghash_alg = { + .digestsize = GHASH_DIGEST_SIZE, + .init = riscv64_ghash_init, + .update = riscv64_zbc_ghash_update, + .final = riscv64_zbc_ghash_final, + .setkey = riscv64_zbc_ghash_setkey_zbc__zbb, + .descsize = sizeof(struct riscv64_ghash_desc_ctx) + + sizeof(struct ghash_desc_ctx), + .base = { + .cra_name = "ghash", + .cra_driver_name = "riscv64_zbc_zbb_ghash", + .cra_priority = 251, + .cra_blocksize = GHASH_BLOCK_SIZE, + .cra_ctxsize = sizeof(struct riscv64_ghash_ctx), + .cra_module = THIS_MODULE, + }, +}; + +RISCV64_ZBC_SETKEY(zbc__zbkb, zbc__zbkb); +struct shash_alg riscv64_zbc_zbkb_ghash_alg = { + .digestsize = GHASH_DIGEST_SIZE, + .init = riscv64_ghash_init, + .update = riscv64_zbc_ghash_update, + .final = riscv64_zbc_ghash_final, + .setkey = riscv64_zbc_ghash_setkey_zbc__zbkb, + .descsize = sizeof(struct riscv64_ghash_desc_ctx) + + sizeof(struct ghash_desc_ctx), + .base = { + .cra_name = "ghash", + .cra_driver_name = "riscv64_zbc_zbkb_ghash", + .cra_priority = 252, + .cra_blocksize = GHASH_BLOCK_SIZE, + .cra_ctxsize = sizeof(struct riscv64_ghash_ctx), + .cra_module = THIS_MODULE, + }, +}; + +#endif /* CONFIG_RISCV_ISA_ZBC */ + +#define RISCV64_DEFINED_GHASHES 7 + +static struct shash_alg *riscv64_ghashes[RISCV64_DEFINED_GHASHES]; +static int num_riscv64_ghashes; + +static int __init riscv64_ghash_register(struct shash_alg *ghash) +{ + int ret; + + ret = crypto_register_shash(ghash); + if (ret < 0) { + int i; + + for (i = num_riscv64_ghashes - 1; i >= 0 ; i--) + crypto_unregister_shash(riscv64_ghashes[i]); + + num_riscv64_ghashes = 0; + + return ret; + } + + pr_debug("Registered RISC-V ghash %s\n", ghash->base.cra_driver_name); + riscv64_ghashes[num_riscv64_ghashes] = ghash; + num_riscv64_ghashes++; + return 0; +} + +static int __init riscv64_ghash_mod_init(void) +{ + int ret = 0; + +#ifdef CONFIG_RISCV_ISA_ZBC + if (riscv_isa_extension_available(NULL, ZBC)) { + ret = riscv64_ghash_register(&riscv64_zbc_ghash_alg); + if (ret < 0) + return ret; + + if (riscv_isa_extension_available(NULL, ZBB)) { + ret = riscv64_ghash_register(&riscv64_zbc_zbb_ghash_alg); + if (ret < 0) + return ret; + } + + if (riscv_isa_extension_available(NULL, ZBKB)) { + ret = riscv64_ghash_register(&riscv64_zbc_zbkb_ghash_alg); + if (ret < 0) + return ret; + } + } +#endif + + return 0; +} + +static void __exit riscv64_ghash_mod_fini(void) +{ + int i; + + for (i = num_riscv64_ghashes - 1; i >= 0 ; i--) + crypto_unregister_shash(riscv64_ghashes[i]); + + num_riscv64_ghashes = 0; +} + +module_init(riscv64_ghash_mod_init); +module_exit(riscv64_ghash_mod_fini); + +MODULE_DESCRIPTION("GSM GHASH (accelerated)"); +MODULE_AUTHOR("Heiko Stuebner "); +MODULE_LICENSE("GPL v2"); +MODULE_ALIAS_CRYPTO("ghash"); diff --git a/arch/riscv/crypto/ghash-riscv64-zbc.pl b/arch/riscv/crypto/ghash-riscv64-zbc.pl new file mode 100644 index 000000000000..691231ffa11c --- /dev/null +++ b/arch/riscv/crypto/ghash-riscv64-zbc.pl @@ -0,0 +1,400 @@ +#! /usr/bin/env perl +# Copyright 2022 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You may not use +# this file except in compliance with the License. You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extension) +# $flavour is the first argument if it doesn't look like a file +my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef; +my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef; + +$output and open STDOUT,">$output"; + +my $code=<<___; +.text +___ + +################################################################################ +# void gcm_init_rv64i_zbc(u128 Htable[16], const u64 H[2]); +# void gcm_init_rv64i_zbc__zbb(u128 Htable[16], const u64 H[2]); +# void gcm_init_rv64i_zbc__zbkb(u128 Htable[16], const u64 H[2]); +# +# input: H: 128-bit H - secret parameter E(K, 0^128) +# output: Htable: Preprocessed key data for gcm_gmult_rv64i_zbc* and +# gcm_ghash_rv64i_zbc* +# +# All callers of this function revert the byte-order unconditionally +# on little-endian machines. So we need to revert the byte-order back. +# Additionally we reverse the bits of each byte. + +{ +my ($Htable,$H,$VAL0,$VAL1,$TMP0,$TMP1,$TMP2) = ("a0","a1","a2","a3","t0","t1","t2"); + +$code .= <<___; +.p2align 3 +.globl gcm_init_rv64i_zbc +.type gcm_init_rv64i_zbc,\@function +gcm_init_rv64i_zbc: + ld $VAL0,0($H) + ld $VAL1,8($H) + @{[brev8_rv64i $VAL0, $TMP0, $TMP1, $TMP2]} + @{[brev8_rv64i $VAL1, $TMP0, $TMP1, $TMP2]} + @{[sd_rev8_rv64i $VAL0, $Htable, 0, $TMP0]} + @{[sd_rev8_rv64i $VAL1, $Htable, 8, $TMP0]} + ret +.size gcm_init_rv64i_zbc,.-gcm_init_rv64i_zbc +___ +} + +{ +my ($Htable,$H,$VAL0,$VAL1,$TMP0,$TMP1,$TMP2) = ("a0","a1","a2","a3","t0","t1","t2"); + +$code .= <<___; +.p2align 3 +.globl gcm_init_rv64i_zbc__zbb +.type gcm_init_rv64i_zbc__zbb,\@function +gcm_init_rv64i_zbc__zbb: + ld $VAL0,0($H) + ld $VAL1,8($H) + @{[brev8_rv64i $VAL0, $TMP0, $TMP1, $TMP2]} + @{[brev8_rv64i $VAL1, $TMP0, $TMP1, $TMP2]} + @{[rev8 $VAL0, $VAL0]} + @{[rev8 $VAL1, $VAL1]} + sd $VAL0,0($Htable) + sd $VAL1,8($Htable) + ret +.size gcm_init_rv64i_zbc__zbb,.-gcm_init_rv64i_zbc__zbb +___ +} + +{ +my ($Htable,$H,$TMP0,$TMP1) = ("a0","a1","t0","t1"); + +$code .= <<___; +.p2align 3 +.globl gcm_init_rv64i_zbc__zbkb +.type gcm_init_rv64i_zbc__zbkb,\@function +gcm_init_rv64i_zbc__zbkb: + ld $TMP0,0($H) + ld $TMP1,8($H) + @{[brev8 $TMP0, $TMP0]} + @{[brev8 $TMP1, $TMP1]} + @{[rev8 $TMP0, $TMP0]} + @{[rev8 $TMP1, $TMP1]} + sd $TMP0,0($Htable) + sd $TMP1,8($Htable) + ret +.size gcm_init_rv64i_zbc__zbkb,.-gcm_init_rv64i_zbc__zbkb +___ +} + +################################################################################ +# void gcm_gmult_rv64i_zbc(u64 Xi[2], const u128 Htable[16]); +# void gcm_gmult_rv64i_zbc__zbkb(u64 Xi[2], const u128 Htable[16]); +# +# input: Xi: current hash value +# Htable: copy of H +# output: Xi: next hash value Xi +# +# Compute GMULT (Xi*H mod f) using the Zbc (clmul) and Zbb (basic bit manip) +# extensions. Using the no-Karatsuba approach and clmul for the final reduction. +# This results in an implementation with minimized number of instructions. +# HW with clmul latencies higher than 2 cycles might observe a performance +# improvement with Karatsuba. HW with clmul latencies higher than 6 cycles +# might observe a performance improvement with additionally converting the +# reduction to shift&xor. For a full discussion of this estimates see +# https://github.com/riscv/riscv-crypto/blob/master/doc/supp/gcm-mode-cmul.adoc +{ +my ($Xi,$Htable,$x0,$x1,$y0,$y1) = ("a0","a1","a4","a5","a6","a7"); +my ($z0,$z1,$z2,$z3,$t0,$t1,$polymod) = ("t0","t1","t2","t3","t4","t5","t6"); + +$code .= <<___; +.p2align 3 +.globl gcm_gmult_rv64i_zbc +.type gcm_gmult_rv64i_zbc,\@function +gcm_gmult_rv64i_zbc: + # Load Xi and bit-reverse it + ld $x0, 0($Xi) + ld $x1, 8($Xi) + @{[brev8_rv64i $x0, $z0, $z1, $z2]} + @{[brev8_rv64i $x1, $z0, $z1, $z2]} + + # Load the key (already bit-reversed) + ld $y0, 0($Htable) + ld $y1, 8($Htable) + + # Load the reduction constant + la $polymod, Lpolymod + lbu $polymod, 0($polymod) + + # Multiplication (without Karatsuba) + @{[clmulh $z3, $x1, $y1]} + @{[clmul $z2, $x1, $y1]} + @{[clmulh $t1, $x0, $y1]} + @{[clmul $z1, $x0, $y1]} + xor $z2, $z2, $t1 + @{[clmulh $t1, $x1, $y0]} + @{[clmul $t0, $x1, $y0]} + xor $z2, $z2, $t1 + xor $z1, $z1, $t0 + @{[clmulh $t1, $x0, $y0]} + @{[clmul $z0, $x0, $y0]} + xor $z1, $z1, $t1 + + # Reduction with clmul + @{[clmulh $t1, $z3, $polymod]} + @{[clmul $t0, $z3, $polymod]} + xor $z2, $z2, $t1 + xor $z1, $z1, $t0 + @{[clmulh $t1, $z2, $polymod]} + @{[clmul $t0, $z2, $polymod]} + xor $x1, $z1, $t1 + xor $x0, $z0, $t0 + + # Bit-reverse Xi back and store it + @{[brev8_rv64i $x0, $z0, $z1, $z2]} + @{[brev8_rv64i $x1, $z0, $z1, $z2]} + sd $x0, 0($Xi) + sd $x1, 8($Xi) + ret +.size gcm_gmult_rv64i_zbc,.-gcm_gmult_rv64i_zbc +___ +} + +{ +my ($Xi,$Htable,$x0,$x1,$y0,$y1) = ("a0","a1","a4","a5","a6","a7"); +my ($z0,$z1,$z2,$z3,$t0,$t1,$polymod) = ("t0","t1","t2","t3","t4","t5","t6"); + +$code .= <<___; +.p2align 3 +.globl gcm_gmult_rv64i_zbc__zbkb +.type gcm_gmult_rv64i_zbc__zbkb,\@function +gcm_gmult_rv64i_zbc__zbkb: + # Load Xi and bit-reverse it + ld $x0, 0($Xi) + ld $x1, 8($Xi) + @{[brev8 $x0, $x0]} + @{[brev8 $x1, $x1]} + + # Load the key (already bit-reversed) + ld $y0, 0($Htable) + ld $y1, 8($Htable) + + # Load the reduction constant + la $polymod, Lpolymod + lbu $polymod, 0($polymod) + + # Multiplication (without Karatsuba) + @{[clmulh $z3, $x1, $y1]} + @{[clmul $z2, $x1, $y1]} + @{[clmulh $t1, $x0, $y1]} + @{[clmul $z1, $x0, $y1]} + xor $z2, $z2, $t1 + @{[clmulh $t1, $x1, $y0]} + @{[clmul $t0, $x1, $y0]} + xor $z2, $z2, $t1 + xor $z1, $z1, $t0 + @{[clmulh $t1, $x0, $y0]} + @{[clmul $z0, $x0, $y0]} + xor $z1, $z1, $t1 + + # Reduction with clmul + @{[clmulh $t1, $z3, $polymod]} + @{[clmul $t0, $z3, $polymod]} + xor $z2, $z2, $t1 + xor $z1, $z1, $t0 + @{[clmulh $t1, $z2, $polymod]} + @{[clmul $t0, $z2, $polymod]} + xor $x1, $z1, $t1 + xor $x0, $z0, $t0 + + # Bit-reverse Xi back and store it + @{[brev8 $x0, $x0]} + @{[brev8 $x1, $x1]} + sd $x0, 0($Xi) + sd $x1, 8($Xi) + ret +.size gcm_gmult_rv64i_zbc__zbkb,.-gcm_gmult_rv64i_zbc__zbkb +___ +} + +################################################################################ +# void gcm_ghash_rv64i_zbc(u64 Xi[2], const u128 Htable[16], +# const u8 *inp, size_t len); +# void gcm_ghash_rv64i_zbc__zbkb(u64 Xi[2], const u128 Htable[16], +# const u8 *inp, size_t len); +# +# input: Xi: current hash value +# Htable: copy of H +# inp: pointer to input data +# len: length of input data in bytes (mutiple of block size) +# output: Xi: Xi+1 (next hash value Xi) +{ +my ($Xi,$Htable,$inp,$len,$x0,$x1,$y0,$y1) = ("a0","a1","a2","a3","a4","a5","a6","a7"); +my ($z0,$z1,$z2,$z3,$t0,$t1,$polymod) = ("t0","t1","t2","t3","t4","t5","t6"); + +$code .= <<___; +.p2align 3 +.globl gcm_ghash_rv64i_zbc +.type gcm_ghash_rv64i_zbc,\@function +gcm_ghash_rv64i_zbc: + # Load Xi and bit-reverse it + ld $x0, 0($Xi) + ld $x1, 8($Xi) + @{[brev8_rv64i $x0, $z0, $z1, $z2]} + @{[brev8_rv64i $x1, $z0, $z1, $z2]} + + # Load the key (already bit-reversed) + ld $y0, 0($Htable) + ld $y1, 8($Htable) + + # Load the reduction constant + la $polymod, Lpolymod + lbu $polymod, 0($polymod) + +Lstep: + # Load the input data, bit-reverse them, and XOR them with Xi + ld $t0, 0($inp) + ld $t1, 8($inp) + add $inp, $inp, 16 + add $len, $len, -16 + @{[brev8_rv64i $t0, $z0, $z1, $z2]} + @{[brev8_rv64i $t1, $z0, $z1, $z2]} + xor $x0, $x0, $t0 + xor $x1, $x1, $t1 + + # Multiplication (without Karatsuba) + @{[clmulh $z3, $x1, $y1]} + @{[clmul $z2, $x1, $y1]} + @{[clmulh $t1, $x0, $y1]} + @{[clmul $z1, $x0, $y1]} + xor $z2, $z2, $t1 + @{[clmulh $t1, $x1, $y0]} + @{[clmul $t0, $x1, $y0]} + xor $z2, $z2, $t1 + xor $z1, $z1, $t0 + @{[clmulh $t1, $x0, $y0]} + @{[clmul $z0, $x0, $y0]} + xor $z1, $z1, $t1 + + # Reduction with clmul + @{[clmulh $t1, $z3, $polymod]} + @{[clmul $t0, $z3, $polymod]} + xor $z2, $z2, $t1 + xor $z1, $z1, $t0 + @{[clmulh $t1, $z2, $polymod]} + @{[clmul $t0, $z2, $polymod]} + xor $x1, $z1, $t1 + xor $x0, $z0, $t0 + + # Iterate over all blocks + bnez $len, Lstep + + # Bit-reverse final Xi back and store it + @{[brev8_rv64i $x0, $z0, $z1, $z2]} + @{[brev8_rv64i $x1, $z0, $z1, $z2]} + sd $x0, 0($Xi) + sd $x1, 8($Xi) + ret +.size gcm_ghash_rv64i_zbc,.-gcm_ghash_rv64i_zbc +___ +} + +{ +my ($Xi,$Htable,$inp,$len,$x0,$x1,$y0,$y1) = ("a0","a1","a2","a3","a4","a5","a6","a7"); +my ($z0,$z1,$z2,$z3,$t0,$t1,$polymod) = ("t0","t1","t2","t3","t4","t5","t6"); + +$code .= <<___; +.p2align 3 +.globl gcm_ghash_rv64i_zbc__zbkb +.type gcm_ghash_rv64i_zbc__zbkb,\@function +gcm_ghash_rv64i_zbc__zbkb: + # Load Xi and bit-reverse it + ld $x0, 0($Xi) + ld $x1, 8($Xi) + @{[brev8 $x0, $x0]} + @{[brev8 $x1, $x1]} + + # Load the key (already bit-reversed) + ld $y0, 0($Htable) + ld $y1, 8($Htable) + + # Load the reduction constant + la $polymod, Lpolymod + lbu $polymod, 0($polymod) + +Lstep_zkbk: + # Load the input data, bit-reverse them, and XOR them with Xi + ld $t0, 0($inp) + ld $t1, 8($inp) + add $inp, $inp, 16 + add $len, $len, -16 + @{[brev8 $t0, $t0]} + @{[brev8 $t1, $t1]} + xor $x0, $x0, $t0 + xor $x1, $x1, $t1 + + # Multiplication (without Karatsuba) + @{[clmulh $z3, $x1, $y1]} + @{[clmul $z2, $x1, $y1]} + @{[clmulh $t1, $x0, $y1]} + @{[clmul $z1, $x0, $y1]} + xor $z2, $z2, $t1 + @{[clmulh $t1, $x1, $y0]} + @{[clmul $t0, $x1, $y0]} + xor $z2, $z2, $t1 + xor $z1, $z1, $t0 + @{[clmulh $t1, $x0, $y0]} + @{[clmul $z0, $x0, $y0]} + xor $z1, $z1, $t1 + + # Reduction with clmul + @{[clmulh $t1, $z3, $polymod]} + @{[clmul $t0, $z3, $polymod]} + xor $z2, $z2, $t1 + xor $z1, $z1, $t0 + @{[clmulh $t1, $z2, $polymod]} + @{[clmul $t0, $z2, $polymod]} + xor $x1, $z1, $t1 + xor $x0, $z0, $t0 + + # Iterate over all blocks + bnez $len, Lstep_zkbk + + # Bit-reverse final Xi back and store it + @{[brev8 $x0, $x0]} + @{[brev8 $x1, $x1]} + sd $x0, 0($Xi) + sd $x1, 8($Xi) + ret +.size gcm_ghash_rv64i_zbc__zbkb,.-gcm_ghash_rv64i_zbc__zbkb +___ +} + +$code .= <<___; +.p2align 3 +Lbrev8_const: + .dword 0xAAAAAAAAAAAAAAAA + .dword 0xCCCCCCCCCCCCCCCC + .dword 0xF0F0F0F0F0F0F0F0 +.size Lbrev8_const,.-Lbrev8_const + +Lpolymod: + .byte 0x87 +.size Lpolymod,.-Lpolymod +___ + +print $code; + +close STDOUT or die "error closing STDOUT: $!"; diff --git a/arch/riscv/crypto/riscv.pm b/arch/riscv/crypto/riscv.pm new file mode 100644 index 000000000000..61bc4fc41a43 --- /dev/null +++ b/arch/riscv/crypto/riscv.pm @@ -0,0 +1,230 @@ +#! /usr/bin/env perl +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You may not use +# this file except in compliance with the License. You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html + +use strict; +use warnings; + +# Set $have_stacktrace to 1 if we have Devel::StackTrace +my $have_stacktrace = 0; +if (eval {require Devel::StackTrace;1;}) { + $have_stacktrace = 1; +} + +my @regs = map("x$_",(0..31)); +my @regaliases = ('zero','ra','sp','gp','tp','t0','t1','t2','s0','s1', + map("a$_",(0..7)), + map("s$_",(2..11)), + map("t$_",(3..6)) +); + +my %reglookup; +@reglookup{@regs} = @regs; +@reglookup{@regaliases} = @regs; + +# Takes a register name, possibly an alias, and converts it to a register index +# from 0 to 31 +sub read_reg { + my $reg = lc shift; + if (!exists($reglookup{$reg})) { + my $trace = ""; + if ($have_stacktrace) { + $trace = Devel::StackTrace->new->as_string; + } + die("Unknown register ".$reg."\n".$trace); + } + my $regstr = $reglookup{$reg}; + if (!($regstr =~ /^x([0-9]+)$/)) { + my $trace = ""; + if ($have_stacktrace) { + $trace = Devel::StackTrace->new->as_string; + } + die("Could not process register ".$reg."\n".$trace); + } + return $1; +} + +# Helper functions + +sub brev8_rv64i { + # brev8 without `brev8` instruction (only in Zkbk) + # Bit-reverses the first argument and needs three scratch registers + my $val = shift; + my $t0 = shift; + my $t1 = shift; + my $brev8_const = shift; + my $seq = <<___; + la $brev8_const, Lbrev8_const + + ld $t0, 0($brev8_const) # 0xAAAAAAAAAAAAAAAA + slli $t1, $val, 1 + and $t1, $t1, $t0 + and $val, $val, $t0 + srli $val, $val, 1 + or $val, $t1, $val + + ld $t0, 8($brev8_const) # 0xCCCCCCCCCCCCCCCC + slli $t1, $val, 2 + and $t1, $t1, $t0 + and $val, $val, $t0 + srli $val, $val, 2 + or $val, $t1, $val + + ld $t0, 16($brev8_const) # 0xF0F0F0F0F0F0F0F0 + slli $t1, $val, 4 + and $t1, $t1, $t0 + and $val, $val, $t0 + srli $val, $val, 4 + or $val, $t1, $val +___ + return $seq; +} + +sub sd_rev8_rv64i { + # rev8 without `rev8` instruction (only in Zbb or Zbkb) + # Stores the given value byte-reversed and needs one scratch register + my $val = shift; + my $addr = shift; + my $off = shift; + my $tmp = shift; + my $off0 = ($off + 0); + my $off1 = ($off + 1); + my $off2 = ($off + 2); + my $off3 = ($off + 3); + my $off4 = ($off + 4); + my $off5 = ($off + 5); + my $off6 = ($off + 6); + my $off7 = ($off + 7); + my $seq = <<___; + sb $val, $off7($addr) + srli $tmp, $val, 8 + sb $tmp, $off6($addr) + srli $tmp, $val, 16 + sb $tmp, $off5($addr) + srli $tmp, $val, 24 + sb $tmp, $off4($addr) + srli $tmp, $val, 32 + sb $tmp, $off3($addr) + srli $tmp, $val, 40 + sb $tmp, $off2($addr) + srli $tmp, $val, 48 + sb $tmp, $off1($addr) + srli $tmp, $val, 56 + sb $tmp, $off0($addr) +___ + return $seq; +} + +# Scalar crypto instructions + +sub aes64ds { + # Encoding for aes64ds rd, rs1, rs2 instruction on RV64 + # XXXXXXX_ rs2 _ rs1 _XXX_ rd _XXXXXXX + my $template = 0b0011101_00000_00000_000_00000_0110011; + my $rd = read_reg shift; + my $rs1 = read_reg shift; + my $rs2 = read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($rd << 7)); +} + +sub aes64dsm { + # Encoding for aes64dsm rd, rs1, rs2 instruction on RV64 + # XXXXXXX_ rs2 _ rs1 _XXX_ rd _XXXXXXX + my $template = 0b0011111_00000_00000_000_00000_0110011; + my $rd = read_reg shift; + my $rs1 = read_reg shift; + my $rs2 = read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($rd << 7)); +} + +sub aes64es { + # Encoding for aes64es rd, rs1, rs2 instruction on RV64 + # XXXXXXX_ rs2 _ rs1 _XXX_ rd _XXXXXXX + my $template = 0b0011001_00000_00000_000_00000_0110011; + my $rd = read_reg shift; + my $rs1 = read_reg shift; + my $rs2 = read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($rd << 7)); +} + +sub aes64esm { + # Encoding for aes64esm rd, rs1, rs2 instruction on RV64 + # XXXXXXX_ rs2 _ rs1 _XXX_ rd _XXXXXXX + my $template = 0b0011011_00000_00000_000_00000_0110011; + my $rd = read_reg shift; + my $rs1 = read_reg shift; + my $rs2 = read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($rd << 7)); +} + +sub aes64im { + # Encoding for aes64im rd, rs1 instruction on RV64 + # XXXXXXXXXXXX_ rs1 _XXX_ rd _XXXXXXX + my $template = 0b001100000000_00000_001_00000_0010011; + my $rd = read_reg shift; + my $rs1 = read_reg shift; + return ".word ".($template | ($rs1 << 15) | ($rd << 7)); +} + +sub aes64ks1i { + # Encoding for aes64ks1i rd, rs1, rnum instruction on RV64 + # XXXXXXXX_rnum_ rs1 _XXX_ rd _XXXXXXX + my $template = 0b00110001_0000_00000_001_00000_0010011; + my $rd = read_reg shift; + my $rs1 = read_reg shift; + my $rnum = shift; + return ".word ".($template | ($rnum << 20) | ($rs1 << 15) | ($rd << 7)); +} + +sub aes64ks2 { + # Encoding for aes64ks2 rd, rs1, rs2 instruction on RV64 + # XXXXXXX_ rs2 _ rs1 _XXX_ rd _XXXXXXX + my $template = 0b0111111_00000_00000_000_00000_0110011; + my $rd = read_reg shift; + my $rs1 = read_reg shift; + my $rs2 = read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($rd << 7)); +} + +sub brev8 { + # brev8 rd, rs + my $template = 0b011010000111_00000_101_00000_0010011; + my $rd = read_reg shift; + my $rs = read_reg shift; + return ".word ".($template | ($rs << 15) | ($rd << 7)); +} + +sub clmul { + # Encoding for clmul rd, rs1, rs2 instruction on RV64 + # XXXXXXX_ rs2 _ rs1 _XXX_ rd _XXXXXXX + my $template = 0b0000101_00000_00000_001_00000_0110011; + my $rd = read_reg shift; + my $rs1 = read_reg shift; + my $rs2 = read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($rd << 7)); +} + +sub clmulh { + # Encoding for clmulh rd, rs1, rs2 instruction on RV64 + # XXXXXXX_ rs2 _ rs1 _XXX_ rd _XXXXXXX + my $template = 0b0000101_00000_00000_011_00000_0110011; + my $rd = read_reg shift; + my $rs1 = read_reg shift; + my $rs2 = read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($rd << 7)); +} + +sub rev8 { + # Encoding for rev8 rd, rs instruction on RV64 + # XXXXXXXXXXXXX_ rs _XXX_ rd _XXXXXXX + my $template = 0b011010111000_00000_101_00000_0010011; + my $rd = read_reg shift; + my $rs = read_reg shift; + return ".word ".($template | ($rs << 15) | ($rd << 7)); +} + +1; From patchwork Mon Mar 13 19:12:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= X-Patchwork-Id: 69063 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1362130wrd; Mon, 13 Mar 2023 12:20:25 -0700 (PDT) X-Google-Smtp-Source: AK7set/n3bXvGIhdXHFMJDwYU9s8Wk3fvT/BlmfYVLvz4a5b72oSRBuCuCB5LexUGp9RJEyI7KM5 X-Received: by 2002:a62:1792:0:b0:5a8:cec9:6ab6 with SMTP id 140-20020a621792000000b005a8cec96ab6mr28893503pfx.31.1678735223049; Mon, 13 Mar 2023 12:20:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1678735223; cv=none; d=google.com; s=arc-20160816; b=0d6bCUsnCadf/9/fi+/dk9t8dpI1ZvsIgBnWhjbgHcw1PESEhiZ1B0N1aZICgt3REy /fKTyhvvlCp1SG5rKoQtk8afNpb1oKBAhe3FGbPkhEISInoPVtvybc4sFQCpIGczBF8E tQlV6lHIkrhgmGMi5Z8GtWFPEnZax5WXt0EnaiLFkcXfgvynm8ioNIb0a33LXTKVBH3C tZz4irCE7lwwO+2tg/9SzCnuELqmvq8NhbWCtTRUZlb5H3ZmlyN49V2I2iwlX3MWHd4S sRT772dNTAFv+qKkAh6tOBb9Q4mJMzDG7AIjpKuSQwrggOXyrJ99ozdXGF6bJMcP0DO3 0XUQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=7I3aclIWEe20n6d9rt8YZV1H1jCvecFjvx48YFQ5AF0=; b=xKds37usWekIjVwqsch7UACSymOdvPaJH8kIL3rCk5Hc7M0eXpCzF+1uQi//kUqm0f XV9McVSxox3lZhqCZ1oMCPRbGOfGq3iQOqiW/jSAO8n/G+BGBORFplShaDksrX3p02hB ++slgbKus1XBAcLzJkWOOu9giGupa0HKPhF72kMaGDTGEiaUvA6WK94vyf1wEAwOWS6r sYxXVp0PPzf8C5MYVRDOwQVfaIy4JncX2OfFjNKiVvivfgxOArTCVdI4NgKQYEfJjzvq +b97V7wKl9WPImDUWMZxfv7ktzflQJlNPGp+d7PpKSwY2dDbTM8/0zkruOS3paHBdMCw dFIQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id k189-20020a6284c6000000b0062275c53889si191258pfd.267.2023.03.13.12.20.08; Mon, 13 Mar 2023 12:20:23 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230287AbjCMTN3 (ORCPT + 99 others); Mon, 13 Mar 2023 15:13:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35606 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229608AbjCMTNN (ORCPT ); Mon, 13 Mar 2023 15:13:13 -0400 Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A9505CA20 for ; Mon, 13 Mar 2023 12:13:08 -0700 (PDT) Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169] helo=phil.lan) by gloria.sntech.de with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1pbnbV-00028k-In; Mon, 13 Mar 2023 20:13:05 +0100 From: Heiko Stuebner To: palmer@rivosinc.com Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, heiko@sntech.de Subject: [PATCH RFC v3 07/16] RISC-V: add helper function to read the vector VLEN Date: Mon, 13 Mar 2023 20:12:53 +0100 Message-Id: <20230313191302.580787-8-heiko.stuebner@vrull.eu> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu> References: <20230313191302.580787-1-heiko.stuebner@vrull.eu> MIME-Version: 1.0 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS, T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760281465062527761?= X-GMAIL-MSGID: =?utf-8?q?1760281465062527761?= From: Heiko Stuebner VLEN describes the length of each vector register and some instructions need specific minimal VLENs to work correctly. The vector code already includes a variable riscv_vsize that contains the value of "32 vector registers with vlenb length" that gets filled during boot. vlenb is the value contained in the CSR_VLENB register and the value represents "VLEN / 8". So add riscv_vector_vlen() to return the actual VLEN value for in-kernel users when they need to check the available VLEN. Signed-off-by: Heiko Stuebner --- arch/riscv/include/asm/vector.h | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h index 202df9ea28d7..e466e7787d25 100644 --- a/arch/riscv/include/asm/vector.h +++ b/arch/riscv/include/asm/vector.h @@ -178,4 +178,15 @@ static inline bool riscv_v_vstate_query(struct pt_regs *regs) { return false; } #endif /* CONFIG_RISCV_ISA_V */ +/* + * Return the implementation's vlen value. + * + * riscv_vsize contains the value of "32 vector registers with vlenb length" + * so rebuild the vlen value in bits from it. + */ +static inline int riscv_vector_vlen(void) +{ + return riscv_v_vsize / 32 * 8; +} + #endif /* ! __ASM_RISCV_VECTOR_H */ From patchwork Mon Mar 13 19:12:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= X-Patchwork-Id: 69073 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1372455wrd; Mon, 13 Mar 2023 12:45:06 -0700 (PDT) X-Google-Smtp-Source: AK7set8ruDrJuJjgQ+DRy1O+r9ArpwW0JN7/JgGC3J2fbnJrnTx4+gPKI7dIpQS9tHQ5VVXBK80G X-Received: by 2002:a05:6a20:4c1f:b0:cc:6d4d:57b3 with SMTP id fm31-20020a056a204c1f00b000cc6d4d57b3mr26699660pzb.12.1678736706016; Mon, 13 Mar 2023 12:45:06 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1678736705; cv=none; d=google.com; s=arc-20160816; b=UPLG9kPKswCIWbWFmTJ95IoitFPm0RL4k1V4H/Lgy6LZmPyGmwzdtPGtmOWFdgISmE utwqt6jovSoi1EV3+vlpcBl4DSir4Pl+uqYNrBL0yiU8j+oAesNn0IwGpj6gJysX58EW T4FSNBnS0oVG7prbJOJRE6xMsurm+M6RptiEYyQn4lTzegrIjmQmwBHHsN+qLc0T+xFA PGoKIEt2Zyf4cAJsWXeVg1qWXk+q5aS856vflKicHL6dZuMB4IlRO0H5IkfIkAT1b1eC NzYuSoHvIoPvhgwV4LnvPesnhkBoRv2Zn6sMO0q5dmj6hPwAQfn1eknBtSNwTICT+2ze mv6w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=tx36PPvgqYGv33TKRKLppWsdAiL2MHoLSrap+v9eH3o=; b=dLGpgZ6NDP0mI9e71tFN7Ay/rVLSF0xv+zV4pAmM4gnL8b38HlJQgm1gEJHjL3T1of A+ZLeFdI+HU4Gq02dvYR7nUUcACk6Dkd2SGgcg9eWmMMk/40pC+LanC8ScMozujjI1QY WhRyrpQlngcoOQWarSbnUSMbWSTk//EkeNArucVhdFEEIJAPF8QpKO92VJt/3LXClUPa +OcJgLUgpYsdOKClwQnQuuIvoAGfDTZuoCa0lW67IOZLhN7Cz+Fhhm/4LaIjL+MJirCa N3b641z3NZbeDrs8j8hrl+TWej/LbIJIrKCAS3MHToKCHeROBHc5W9vZmfRxlH3krNGj ed3Q== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id s37-20020a634525000000b004fbe8661fdfsi262873pga.309.2023.03.13.12.44.51; Mon, 13 Mar 2023 12:45:05 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230409AbjCMTNX (ORCPT + 99 others); Mon, 13 Mar 2023 15:13:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35614 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230117AbjCMTNN (ORCPT ); Mon, 13 Mar 2023 15:13:13 -0400 Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A9939CC3B for ; Mon, 13 Mar 2023 12:13:08 -0700 (PDT) Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169] helo=phil.lan) by gloria.sntech.de with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1pbnbV-00028k-Qn; Mon, 13 Mar 2023 20:13:05 +0100 From: Heiko Stuebner To: palmer@rivosinc.com Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, heiko@sntech.de Subject: [PATCH RFC v3 08/16] RISC-V: add vector crypto extension detection Date: Mon, 13 Mar 2023 20:12:54 +0100 Message-Id: <20230313191302.580787-9-heiko.stuebner@vrull.eu> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu> References: <20230313191302.580787-1-heiko.stuebner@vrull.eu> MIME-Version: 1.0 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS, T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760283020141740556?= X-GMAIL-MSGID: =?utf-8?q?1760283020141740556?= From: Heiko Stuebner Add detection for some extensions of the vector-crypto specification, namely - Zvkb: Vector Bit-manipulation used in Cryptography - Zvkg: Vector GCM/GMAC - Zvknha and Zvknhb: NIST Algorithm Suite - Zvkns: AES-128, AES-256 Single Round Suite - Zvksed: ShangMi Algorithm Suite - Zvksh: ShangMi Algorithm Suite As their use is very specific and will likely be limited to special places we expect current code to just pre-encode those instructions, so right now we don't introduce toolchain requirements. Signed-off-by: Heiko Stuebner --- arch/riscv/include/asm/hwcap.h | 7 +++++++ arch/riscv/kernel/cpu.c | 7 +++++++ arch/riscv/kernel/cpufeature.c | 7 +++++++ 3 files changed, 21 insertions(+) diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h index b28548fb10f3..914559e0e136 100644 --- a/arch/riscv/include/asm/hwcap.h +++ b/arch/riscv/include/asm/hwcap.h @@ -45,6 +45,13 @@ #define RISCV_ISA_EXT_ZIHINTPAUSE 32 #define RISCV_ISA_EXT_ZBC 33 #define RISCV_ISA_EXT_ZBKB 34 +#define RISCV_ISA_EXT_ZVKB 35 +#define RISCV_ISA_EXT_ZVKG 36 +#define RISCV_ISA_EXT_ZVKNED 37 +#define RISCV_ISA_EXT_ZVKNHA 38 +#define RISCV_ISA_EXT_ZVKNHB 39 +#define RISCV_ISA_EXT_ZVKSED 40 +#define RISCV_ISA_EXT_ZVKSH 41 #define RISCV_ISA_EXT_MAX 64 #define RISCV_ISA_EXT_NAME_LEN_MAX 32 diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c index 6f65aac68018..c01e6673a947 100644 --- a/arch/riscv/kernel/cpu.c +++ b/arch/riscv/kernel/cpu.c @@ -190,6 +190,13 @@ static struct riscv_isa_ext_data isa_ext_arr[] = { __RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB), __RISCV_ISA_EXT_DATA(zbc, RISCV_ISA_EXT_ZBC), __RISCV_ISA_EXT_DATA(zbkb, RISCV_ISA_EXT_ZBKB), + __RISCV_ISA_EXT_DATA(zvkb, RISCV_ISA_EXT_ZVKB), + __RISCV_ISA_EXT_DATA(zvkg, RISCV_ISA_EXT_ZVKG), + __RISCV_ISA_EXT_DATA(zvkned, RISCV_ISA_EXT_ZVKNED), + __RISCV_ISA_EXT_DATA(zvknha, RISCV_ISA_EXT_ZVKNHA), + __RISCV_ISA_EXT_DATA(zvknhb, RISCV_ISA_EXT_ZVKNHB), + __RISCV_ISA_EXT_DATA(zvksed, RISCV_ISA_EXT_ZVKSED), + __RISCV_ISA_EXT_DATA(zvksh, RISCV_ISA_EXT_ZVKSH), __RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF), __RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC), __RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL), diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c index eb7be8e7f24e..ad866321ae37 100644 --- a/arch/riscv/kernel/cpufeature.c +++ b/arch/riscv/kernel/cpufeature.c @@ -232,6 +232,13 @@ void __init riscv_fill_hwcap(void) SET_ISA_EXT_MAP("zbkb", RISCV_ISA_EXT_ZBKB); SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM); SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE); + SET_ISA_EXT_MAP("zvkb", RISCV_ISA_EXT_ZVKB); + SET_ISA_EXT_MAP("zvkg", RISCV_ISA_EXT_ZVKG); + SET_ISA_EXT_MAP("zvkned", RISCV_ISA_EXT_ZVKNED); + SET_ISA_EXT_MAP("zvknha", RISCV_ISA_EXT_ZVKNHA); + SET_ISA_EXT_MAP("zvknhb", RISCV_ISA_EXT_ZVKNHB); + SET_ISA_EXT_MAP("zvksed", RISCV_ISA_EXT_ZVKSED); + SET_ISA_EXT_MAP("zvksh", RISCV_ISA_EXT_ZVKSH); } #undef SET_ISA_EXT_MAP } From patchwork Mon Mar 13 19:12:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= X-Patchwork-Id: 69062 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1362115wrd; Mon, 13 Mar 2023 12:20:23 -0700 (PDT) X-Google-Smtp-Source: AK7set9bepMhLijXvEyuZj5gF5yuBQ9gGR1BRUsYh3Kll50i9A+SebXiyc2wiJP1+KY0X4wDnAve X-Received: by 2002:a17:902:ce89:b0:19e:6cb9:4c8f with SMTP id f9-20020a170902ce8900b0019e6cb94c8fmr44699811plg.41.1678735223415; Mon, 13 Mar 2023 12:20:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1678735223; cv=none; d=google.com; s=arc-20160816; b=TpggRw7ZYVj3eDpQbv2fz70Ckd16+hee93gv2CQs8b6vcUYj2oByQz1GSxHWiq0Var rexOzSMO69GBizt5HEOHsfRdGLEvnrCVTLzTcajTX9W++NHH9z53gl8M91YwH6dBVM8N Xo7i1d2ZTpSBWKFngIpczSProIQ2rAf9CD48Up7T29Uk6UDDSZio+hr1DtcPeUF/qLie jmk0tRUcuWYW8hAeSzGt8m5bl9iMzU0oFaDtufoE/Cp29L0273X8s4k6m8o2mVjPMulX by7t5ixyGoKdKPb/Q1tHcmiC7pKAgYt3CbOczmIhtyQsSbD7UbiKzfPrAC7tAOX4lOZ7 IS/w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=3ME6kGGSq8N1B7fh9UKNG+KzI9iIShrWG6DRAEhS6Hc=; b=mpZH+RBhxaeXJnIUB46b424w68VJHhpnMAqgiMny3rIh1vcdnuRyPV3qP1YniCkOtS NAj1QAcghHAPov79IqJrh++Vd1Pb/Dsp0G/eF+QP311AoJ3MNXCVytiqIzunk26/7029 xhdy41zUDvYNrkXS3jSTWVh4vbPIzin+Eegbyk2xc2Mh0C3iFc9ofqX6WF4ASnKU22tx H/OLAPj4x5bKOOpvBlXwof25KqsKzzWK6w2DiyQq6ddeO6dEVzkLMisOLMFXuQbvP/Oh ANNopmGpTjo/7VVqCmowR3uACOC2LwQRCJsCP0LGfb/YUO8+N7aF5NcZMkQMECxa1jgD y+JA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id jh6-20020a170903328600b001a05a18d269si460972plb.121.2023.03.13.12.20.08; Mon, 13 Mar 2023 12:20:23 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231175AbjCMTNj (ORCPT + 99 others); Mon, 13 Mar 2023 15:13:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35608 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230187AbjCMTNP (ORCPT ); Mon, 13 Mar 2023 15:13:15 -0400 Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1B8A1D33E for ; Mon, 13 Mar 2023 12:13:08 -0700 (PDT) Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169] helo=phil.lan) by gloria.sntech.de with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1pbnbW-00028k-3C; Mon, 13 Mar 2023 20:13:06 +0100 From: Heiko Stuebner To: palmer@rivosinc.com Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, heiko@sntech.de Subject: [PATCH RFC v3 09/16] RISC-V: crypto: update perl include with helpers for vector (crypto) instructions Date: Mon, 13 Mar 2023 20:12:55 +0100 Message-Id: <20230313191302.580787-10-heiko.stuebner@vrull.eu> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu> References: <20230313191302.580787-1-heiko.stuebner@vrull.eu> MIME-Version: 1.0 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS, T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760281465534693498?= X-GMAIL-MSGID: =?utf-8?q?1760281465534693498?= From: Heiko Stuebner The openSSL scripts use a number of helpers for handling vector instructions and instructions from the vector-crypto-extensions. Therefore port these over from openSSL. Co-developed-by: Christoph Müllner Signed-off-by: Christoph Müllner Signed-off-by: Heiko Stuebner --- arch/riscv/crypto/riscv.pm | 433 ++++++++++++++++++++++++++++++++++++- 1 file changed, 431 insertions(+), 2 deletions(-) diff --git a/arch/riscv/crypto/riscv.pm b/arch/riscv/crypto/riscv.pm index 61bc4fc41a43..a707dd3a68fb 100644 --- a/arch/riscv/crypto/riscv.pm +++ b/arch/riscv/crypto/riscv.pm @@ -48,11 +48,34 @@ sub read_reg { return $1; } +my @vregs = map("v$_",(0..31)); +my %vreglookup; +@vreglookup{@vregs} = @vregs; + +sub read_vreg { + my $vreg = lc shift; + if (!exists($vreglookup{$vreg})) { + my $trace = ""; + if ($have_stacktrace) { + $trace = Devel::StackTrace->new->as_string; + } + die("Unknown vector register ".$vreg."\n".$trace); + } + if (!($vreg =~ /^v([0-9]+)$/)) { + my $trace = ""; + if ($have_stacktrace) { + $trace = Devel::StackTrace->new->as_string; + } + die("Could not process vector register ".$vreg."\n".$trace); + } + return $1; +} + # Helper functions sub brev8_rv64i { - # brev8 without `brev8` instruction (only in Zkbk) - # Bit-reverses the first argument and needs three scratch registers + # brev8 without `brev8` instruction (only in Zbkb) + # Bit-reverses the first argument and needs two scratch registers my $val = shift; my $t0 = shift; my $t1 = shift; @@ -227,4 +250,410 @@ sub rev8 { return ".word ".($template | ($rs << 15) | ($rd << 7)); } +# Vector instructions + +sub vadd_vv { + # vadd.vv vd, vs2, vs1 + my $template = 0b0000001_00000_00000_000_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vs1 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7)); +} + +sub vid_v { + # vid.v vd + my $template = 0b0101001_00000_10001_010_00000_1010111; + my $vd = read_vreg shift; + return ".word ".($template | ($vd << 7)); +} + +sub vle32_v { + # vle32.v vd, (rs1) + my $template = 0b0000001_00000_00000_110_00000_0000111; + my $vd = read_vreg shift; + my $rs1 = read_reg shift; + return ".word ".($template | ($rs1 << 15) | ($vd << 7)); +} + +sub vle64_v { + # vle64.v vd, (rs1) + my $template = 0b0000001_00000_00000_111_00000_0000111; + my $vd = read_vreg shift; + my $rs1 = read_reg shift; + return ".word ".($template | ($rs1 << 15) | ($vd << 7)); +} + +sub vlse32_v { + # vlse32.v vd, (rs1), rs2 + my $template = 0b0000101_00000_00000_110_00000_0000111; + my $vd = read_vreg shift; + my $rs1 = read_reg shift; + my $rs2 = read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +sub vlse64_v { + # vlse64.v vd, (rs1), rs2 + my $template = 0b0000101_00000_00000_111_00000_0000111; + my $vd = read_vreg shift; + my $rs1 = read_reg shift; + my $rs2 = read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +sub vmerge_vim { + # vmerge.vim vd, vs2, imm, v0 + my $template = 0b0101110_00000_00000_011_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $imm = shift; + return ".word ".($template | ($vs2 << 20) | ($imm << 15) | ($vd << 7)); +} + +sub vmerge_vvm { + # vmerge.vvm vd vs2 vs1 + my $template = 0b0101110_00000_00000_000_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vs1 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7)) +} + +sub vmseq_vi { + # vmseq vd vs1, imm + my $template = 0b0110001_00000_00000_011_00000_1010111; + my $vd = read_vreg shift; + my $vs1 = read_vreg shift; + my $imm = shift; + return ".word ".($template | ($vs1 << 20) | ($imm << 15) | ($vd << 7)) +} + +sub vmv_v_i { + # vmv.v.i vd, imm + my $template = 0b0101111_00000_00000_011_00000_1010111; + my $vd = read_vreg shift; + my $imm = shift; + return ".word ".($template | ($imm << 15) | ($vd << 7)); +} + +sub vmv_v_v { + # vmv.v.v vd, vs1 + my $template = 0b0101111_00000_00000_000_00000_1010111; + my $vd = read_vreg shift; + my $vs1 = read_vreg shift; + return ".word ".($template | ($vs1 << 15) | ($vd << 7)); +} + +sub vor_vv_v0t { + # vor.vv vd, vs2, vs1, v0.t + my $template = 0b0010100_00000_00000_000_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vs1 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7)); +} + +sub vse32_v { + # vse32.v vd, (rs1) + my $template = 0b0000001_00000_00000_110_00000_0100111; + my $vd = read_vreg shift; + my $rs1 = read_reg shift; + return ".word ".($template | ($rs1 << 15) | ($vd << 7)); +} + +sub vse64_v { + # vse64.v vd, (rs1) + my $template = 0b0000001_00000_00000_111_00000_0100111; + my $vd = read_vreg shift; + my $rs1 = read_reg shift; + return ".word ".($template | ($rs1 << 15) | ($vd << 7)); +} + +sub vsetivli__x0_2_e64_m1_ta_ma { + # vsetivli x0, 2, e64, m1, ta, ma + return ".word 0xcd817057"; +} + +sub vsetivli__x0_4_e32_m1_ta_ma { + # vsetivli x0, 4, e32, m1, ta, ma + return ".word 0xcd027057"; +} + +sub vsetivli__x0_4_e64_m1_ta_ma { + # vsetivli x0,4,e64,m1,ta,ma + return ".word 0xcd827057"; +} + +sub vsetivli__x0_8_e32_m1_ta_ma { + return ".word 0xcd047057"; +} + +sub vslidedown_vi { + # vslidedown.vi vd, vs2, uimm + my $template = 0b0011111_00000_00000_011_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $uimm = shift; + return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7)); +} + +sub vslideup_vi_v0t { + # vslideup.vi vd, vs2, uimm, v0.t + my $template = 0b0011100_00000_00000_011_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $uimm = shift; + return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7)); +} + +sub vslideup_vi { + # vslideup.vi vd, vs2, uimm + my $template = 0b0011101_00000_00000_011_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $uimm = shift; + return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7)); +} + +sub vsll_vi { + # vsll.vi vd, vs2, uimm, vm + my $template = 0b1001011_00000_00000_011_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $uimm = shift; + return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7)); +} + +sub vsrl_vx { + # vsrl.vx vd, vs2, rs1 + my $template = 0b1010001_00000_00000_100_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $rs1 = read_reg shift; + return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +sub vsse32_v { + # vse32.v vs3, (rs1), rs2 + my $template = 0b0000101_00000_00000_110_00000_0100111; + my $vs3 = read_vreg shift; + my $rs1 = read_reg shift; + my $rs2 = read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vs3 << 7)); +} + +sub vsse64_v { + # vsse64.v vs3, (rs1), rs2 + my $template = 0b0000101_00000_00000_111_00000_0100111; + my $vs3 = read_vreg shift; + my $rs1 = read_reg shift; + my $rs2 = read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vs3 << 7)); +} + +sub vxor_vv_v0t { + # vxor.vv vd, vs2, vs1, v0.t + my $template = 0b0010110_00000_00000_000_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vs1 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7)); +} + +sub vxor_vv { + # vxor.vv vd, vs2, vs1 + my $template = 0b0010111_00000_00000_000_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vs1 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7)); +} + +# Vector crypto instructions + +## Zvkb instructions + +sub vclmulh_vx { + # vclmulh.vx vd, vs2, rs1 + my $template = 0b0011011_00000_00000_110_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $rs1 = read_reg shift; + return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +sub vclmul_vx_v0t { + # vclmul.vx vd, vs2, rs1, v0.t + my $template = 0b0011000_00000_00000_110_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $rs1 = read_reg shift; + return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +sub vclmul_vx { + # vclmul.vx vd, vs2, rs1 + my $template = 0b0011001_00000_00000_110_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $rs1 = read_reg shift; + return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +sub vrev8_v { + # vrev8.v vd, vs2 + my $template = 0b0100101_00000_01001_010_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vd << 7)); +} + +## Zvkg instructions + +sub vghsh_vv { + # vghsh.vv vd, vs2, vs1 + my $template = 0b1011001_00000_00000_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vs1 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7)); +} + +sub vgmul_vv { + # vgmul.vv vd, vs2 + my $template = 0b1010001_00000_10001_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vd << 7)); +} + +## Zvkned instructions + +sub vaesdf_vs { + # vaesdf.vs vd, vs2 + my $template = 0b101001_1_00000_00001_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vd << 7)); +} + +sub vaesdm_vs { + # vaesdm.vs vd, vs2 + my $template = 0b101001_1_00000_00000_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vd << 7)); +} + +sub vaesef_vs { + # vaesef.vs vd, vs2 + my $template = 0b101001_1_00000_00011_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vd << 7)); +} + +sub vaesem_vs { + # vaesem.vs vd, vs2 + my $template = 0b101001_1_00000_00010_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vd << 7)); +} + +sub vaeskf1_vi { + # vaeskf1.vi vd, vs2, uimmm + my $template = 0b100010_1_00000_00000_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $uimm = shift; + return ".word ".($template | ($uimm << 15) | ($vs2 << 20) | ($vd << 7)); +} + +sub vaeskf2_vi { + # vaeskf2.vi vd, vs2, uimm + my $template = 0b101010_1_00000_00000_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $uimm = shift; + return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7)); +} + +sub vaesz_vs { + # vaesz.vs vd, vs2 + my $template = 0b101001_1_00000_00111_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vd << 7)); +} + +## Zvknha and Zvknhb instructions + +sub vsha2ms_vv { + # vsha2ms.vv vd, vs2, vs1 + my $template = 0b1011011_00000_00000_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vs1 = read_vreg shift; + return ".word ".($template | ($vs2 << 20)| ($vs1 << 15 )| ($vd << 7)); +} + +sub vsha2ch_vv { + # vsha2ch.vv vd, vs2, vs1 + my $template = 0b101110_10000_00000_001_00000_01110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vs1 = read_vreg shift; + return ".word ".($template | ($vs2 << 20)| ($vs1 << 15 )| ($vd << 7)); +} + +sub vsha2cl_vv { + # vsha2cl.vv vd, vs2, vs1 + my $template = 0b101111_10000_00000_001_00000_01110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vs1 = read_vreg shift; + return ".word ".($template | ($vs2 << 20)| ($vs1 << 15 )| ($vd << 7)); +} + +## Zvksed instructions + +sub vsm4k_vi { + # vsm4k.vi vd, vs2, uimm + my $template = 0b1000011_00000_00000_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $uimm = shift; + return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7)); +} + +sub vsm4r_vs { + # vsm4r.vs vd, vs2 + my $template = 0b1010011_00000_10000_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vd << 7)); +} + +## zvksh instructions + +sub vsm3c_vi { + # vsm3c.vi vd, vs2, uimm + my $template = 0b1010111_00000_00000_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $uimm = shift; + return ".word ".($template | ($vs2 << 20) | ($uimm << 15 ) | ($vd << 7)); +} + +sub vsm3me_vv { + # vsm3me.vv vd, vs2, vs1 + my $template = 0b1000001_00000_00000_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vs1 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vs1 << 15 ) | ($vd << 7)); +} + 1; From patchwork Mon Mar 13 19:12:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= X-Patchwork-Id: 69076 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1378718wrd; Mon, 13 Mar 2023 13:00:51 -0700 (PDT) X-Google-Smtp-Source: AK7set/TCcQtEPT/Pf6OYPGjWoW+lTWLR1qmq/sGMVokVM2fNHc5YQ1cTK6cUY4ncB50ysF5pFGo X-Received: by 2002:a17:903:124c:b0:19e:b2ed:6ffc with SMTP id u12-20020a170903124c00b0019eb2ed6ffcmr34301949plh.57.1678737651245; Mon, 13 Mar 2023 13:00:51 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1678737651; cv=none; d=google.com; s=arc-20160816; b=A4Uq5yauhoBtCVVdhYOCc1lw3MLER2r+zA+t4R2I+7XP1TKsNkpTMhjXofvZMXYZbx +2twZXytI594zdDhRNCmXyStexrsx/MG84+xpgkWlnkr68snRwjGmUpJ5AnNBaFDmg5s qT/028Qbr/sRJoD5kp/ja7ePq38sPKUq+o2P53VNOrHqeAGDNc6ukYRjFw/G1ekJHPfz JobT9Ak7DxDyVepCopYBW4MWoewl1KIccE52SEWX3s7fOUvgNtwxBcb8tXyVLuK5ngYy gj+mb8YMzQFR8LWr5wJ0bTTNx+TfFirtIuu/Va1IEg6YbiaHNhXBCfcNw5QdiVP83f0B otsQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=wZpLNihvj8XX8BKZONIK5UN2oqvL4wBYP4vqF3MMMcA=; b=iSQRm+7EyyTQAEaI75BTN+xyIZHY9xPupPqm1kOTm5IDoEWd1kyA2tJWpfwHV5frWo OXH8PoNDdOjyEG6y2oU+jyOAjV5qJ7NTyKLG2hWFk7oiX/jY6DNbRBCtwWEAOnZFZx9w HloAn9f3MxYbrpYUUAMuog+puD1x9Tk0mQKqPkclSjHRsiI+UFFZwrJhrJIenihpV03r RBfcobo9X+tZ32l1pB5cicZuR4kcSTtgwkIXb/+nrmC0e5uEOz+Rg/5n+T8t6/mDATm8 d/SuazxdjNhZ0htVU8jV1I7o9IDn+3xaRT4+oTHAhgu0eh4wI21TPQQrFBUg+xx4mfZx Tm4g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id kt5-20020a170903088500b0019fb1cb8dc7si456427plb.359.2023.03.13.13.00.34; Mon, 13 Mar 2023 13:00:51 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230511AbjCMTRl (ORCPT + 99 others); Mon, 13 Mar 2023 15:17:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44876 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230464AbjCMTRj (ORCPT ); Mon, 13 Mar 2023 15:17:39 -0400 Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 023DE2CFC1 for ; Mon, 13 Mar 2023 12:17:09 -0700 (PDT) Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169] helo=phil.lan) by gloria.sntech.de with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1pbnbW-00028k-Be; Mon, 13 Mar 2023 20:13:06 +0100 From: Heiko Stuebner To: palmer@rivosinc.com Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, heiko@sntech.de Subject: [PATCH RFC v3 10/16] RISC-V: crypto: add Zvkb accelerated GCM GHASH implementation Date: Mon, 13 Mar 2023 20:12:56 +0100 Message-Id: <20230313191302.580787-11-heiko.stuebner@vrull.eu> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu> References: <20230313191302.580787-1-heiko.stuebner@vrull.eu> MIME-Version: 1.0 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS, T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760284011450761741?= X-GMAIL-MSGID: =?utf-8?q?1760284011450761741?= From: Heiko Stuebner Add a gcm hash implementation using the Zvkb crypto extension. It gets possibly registered alongside the Zbc-based variant, with a higher priority so that the crypto subsystem will be able to select the most performant variant, but the algorithm itself will still be part of the crypto selftests that run during registration. Co-developed-by: Christoph Müllner Signed-off-by: Christoph Müllner Signed-off-by: Heiko Stuebner --- arch/riscv/crypto/Kconfig | 3 +- arch/riscv/crypto/Makefile | 8 +- arch/riscv/crypto/ghash-riscv64-glue.c | 147 ++++++++++ arch/riscv/crypto/ghash-riscv64-zvkb.pl | 349 ++++++++++++++++++++++++ 4 files changed, 505 insertions(+), 2 deletions(-) create mode 100644 arch/riscv/crypto/ghash-riscv64-zvkb.pl diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig index 010adbbb058a..404fd9b3cb7c 100644 --- a/arch/riscv/crypto/Kconfig +++ b/arch/riscv/crypto/Kconfig @@ -4,7 +4,7 @@ menu "Accelerated Cryptographic Algorithms for CPU (riscv)" config CRYPTO_GHASH_RISCV64 tristate "Hash functions: GHASH" - depends on 64BIT && RISCV_ISA_ZBC + depends on 64BIT && (RISCV_ISA_ZBC || RISCV_ISA_V) select CRYPTO_HASH select CRYPTO_LIB_GF128MUL help @@ -12,5 +12,6 @@ config CRYPTO_GHASH_RISCV64 Architecture: riscv64 using one of: - ZBC extension + - ZVKB vector crypto extension endmenu diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile index 0a158919e9da..8ab9a0ae8f2d 100644 --- a/arch/riscv/crypto/Makefile +++ b/arch/riscv/crypto/Makefile @@ -8,6 +8,9 @@ ghash-riscv64-y := ghash-riscv64-glue.o ifdef CONFIG_RISCV_ISA_ZBC ghash-riscv64-y += ghash-riscv64-zbc.o endif +ifdef CONFIG_RISCV_ISA_V +ghash-riscv64-y += ghash-riscv64-zvkb.o +endif quiet_cmd_perlasm = PERLASM $@ cmd_perlasm = $(PERL) $(<) void $(@) @@ -15,4 +18,7 @@ quiet_cmd_perlasm = PERLASM $@ $(obj)/ghash-riscv64-zbc.S: $(src)/ghash-riscv64-zbc.pl $(call cmd,perlasm) -clean-files += ghash-riscv64-zbc.S +$(obj)/ghash-riscv64-zvkb.S: $(src)/ghash-riscv64-zvkb.pl + $(call cmd,perlasm) + +clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S diff --git a/arch/riscv/crypto/ghash-riscv64-glue.c b/arch/riscv/crypto/ghash-riscv64-glue.c index 6a6c39e16702..004a1a11d7d8 100644 --- a/arch/riscv/crypto/ghash-riscv64-glue.c +++ b/arch/riscv/crypto/ghash-riscv64-glue.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -21,6 +22,10 @@ void gcm_ghash_rv64i_zbc(u64 Xi[2], const u128 Htable[16], void gcm_ghash_rv64i_zbc__zbkb(u64 Xi[2], const u128 Htable[16], const u8 *inp, size_t len); +/* Zvkb (vector crypto with vclmul) based routines. */ +void gcm_ghash_rv64i_zvkb(u64 Xi[2], const u128 Htable[16], + const u8 *inp, size_t len); + struct riscv64_ghash_ctx { void (*ghash_func)(u64 Xi[2], const u128 Htable[16], const u8 *inp, size_t len); @@ -46,6 +51,140 @@ static int riscv64_ghash_init(struct shash_desc *desc) return 0; } +#ifdef CONFIG_RISCV_ISA_V + +#define RISCV64_ZVK_SETKEY(VARIANT, GHASH) \ +void gcm_init_rv64i_ ## VARIANT(u128 Htable[16], const u64 Xi[2]); \ +static int riscv64_zvk_ghash_setkey_ ## VARIANT(struct crypto_shash *tfm, \ + const u8 *key, \ + unsigned int keylen) \ +{ \ + struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(tfm)); \ + const u64 k[2] = { cpu_to_be64(((const u64 *)key)[0]), \ + cpu_to_be64(((const u64 *)key)[1]) }; \ + \ + if (keylen != GHASH_BLOCK_SIZE) \ + return -EINVAL; \ + \ + memcpy(&ctx->key, key, GHASH_BLOCK_SIZE); \ + kernel_rvv_begin(); \ + gcm_init_rv64i_ ## VARIANT(ctx->htable, k); \ + kernel_rvv_end(); \ + \ + ctx->ghash_func = gcm_ghash_rv64i_ ## GHASH; \ + \ + return 0; \ +} + +static inline void __ghash_block(struct riscv64_ghash_ctx *ctx, + struct riscv64_ghash_desc_ctx *dctx) +{ + if (crypto_simd_usable()) { + kernel_rvv_begin(); + ctx->ghash_func(dctx->shash, ctx->htable, + dctx->buffer, GHASH_DIGEST_SIZE); + kernel_rvv_end(); + } else { + crypto_xor((u8 *)dctx->shash, dctx->buffer, GHASH_BLOCK_SIZE); + gf128mul_lle((be128 *)dctx->shash, &ctx->key); + } +} + +static inline void __ghash_blocks(struct riscv64_ghash_ctx *ctx, + struct riscv64_ghash_desc_ctx *dctx, + const u8 *src, unsigned int srclen) +{ + if (crypto_simd_usable()) { + kernel_rvv_begin(); + ctx->ghash_func(dctx->shash, ctx->htable, + src, srclen); + kernel_rvv_end(); + } else { + while (srclen >= GHASH_BLOCK_SIZE) { + crypto_xor((u8 *)dctx->shash, src, GHASH_BLOCK_SIZE); + gf128mul_lle((be128 *)dctx->shash, &ctx->key); + srclen -= GHASH_BLOCK_SIZE; + src += GHASH_BLOCK_SIZE; + } + } +} + +static int riscv64_zvk_ghash_update(struct shash_desc *desc, + const u8 *src, unsigned int srclen) +{ + unsigned int len; + struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(desc->tfm)); + struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc); + + if (dctx->bytes) { + if (dctx->bytes + srclen < GHASH_DIGEST_SIZE) { + memcpy(dctx->buffer + dctx->bytes, src, + srclen); + dctx->bytes += srclen; + return 0; + } + memcpy(dctx->buffer + dctx->bytes, src, + GHASH_DIGEST_SIZE - dctx->bytes); + + __ghash_block(ctx, dctx); + + src += GHASH_DIGEST_SIZE - dctx->bytes; + srclen -= GHASH_DIGEST_SIZE - dctx->bytes; + dctx->bytes = 0; + } + len = srclen & ~(GHASH_DIGEST_SIZE - 1); + + if (len) { + __ghash_blocks(ctx, dctx, src, len); + src += len; + srclen -= len; + } + + if (srclen) { + memcpy(dctx->buffer, src, srclen); + dctx->bytes = srclen; + } + return 0; +} + +static int riscv64_zvk_ghash_final(struct shash_desc *desc, u8 *out) +{ + struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(desc->tfm)); + struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc); + int i; + + if (dctx->bytes) { + for (i = dctx->bytes; i < GHASH_DIGEST_SIZE; i++) + dctx->buffer[i] = 0; + __ghash_block(ctx, dctx); + dctx->bytes = 0; + } + + memcpy(out, dctx->shash, GHASH_DIGEST_SIZE); + return 0; +} + +RISCV64_ZVK_SETKEY(zvkb, zvkb); +struct shash_alg riscv64_zvkb_ghash_alg = { + .digestsize = GHASH_DIGEST_SIZE, + .init = riscv64_ghash_init, + .update = riscv64_zvk_ghash_update, + .final = riscv64_zvk_ghash_final, + .setkey = riscv64_zvk_ghash_setkey_zvkb, + .descsize = sizeof(struct riscv64_ghash_desc_ctx) + + sizeof(struct ghash_desc_ctx), + .base = { + .cra_name = "ghash", + .cra_driver_name = "riscv64_zvkb_ghash", + .cra_priority = 300, + .cra_blocksize = GHASH_BLOCK_SIZE, + .cra_ctxsize = sizeof(struct riscv64_ghash_ctx), + .cra_module = THIS_MODULE, + }, +}; + +#endif /* CONFIG_RISCV_ISA_V */ + #ifdef CONFIG_RISCV_ISA_ZBC #define RISCV64_ZBC_SETKEY(VARIANT, GHASH) \ @@ -236,6 +375,14 @@ static int __init riscv64_ghash_mod_init(void) } #endif +#ifdef CONFIG_RISCV_ISA_V + if (riscv_isa_extension_available(NULL, ZVKB)) { + ret = riscv64_ghash_register(&riscv64_zvkb_ghash_alg); + if (ret < 0) + return ret; + } +#endif + return 0; } diff --git a/arch/riscv/crypto/ghash-riscv64-zvkb.pl b/arch/riscv/crypto/ghash-riscv64-zvkb.pl new file mode 100644 index 000000000000..3b81c082ba5a --- /dev/null +++ b/arch/riscv/crypto/ghash-riscv64-zvkb.pl @@ -0,0 +1,349 @@ +#! /usr/bin/env perl +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You may not use +# this file except in compliance with the License. You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html + +# - RV64I +# - RISC-V vector ('V') with VLEN >= 128 +# - Vector Bit-manipulation used in Cryptography ('Zvkb') + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extension) +# $flavour is the first argument if it doesn't look like a file +my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef; +my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef; + +$output and open STDOUT,">$output"; + +my $code=<<___; +.text +___ + +################################################################################ +# void gcm_init_rv64i_zvkg(u128 Htable[16], const u64 H[2]); +# +# input: H: 128-bit H - secret parameter E(K, 0^128) +# output: Htable: Preprocessed key data for gcm_gmult_rv64i_zvkb and +# gcm_ghash_rv64i_zvkb +{ +my ($Htable,$H,$TMP0,$TMP1,$TMP2) = ("a0","a1","t0","t1","t2"); +my ($V0,$V1,$V2,$V3,$V4,$V5,$V6) = ("v0","v1","v2","v3","v4","v5","v6"); + +$code .= <<___; +.p2align 3 +.globl gcm_init_rv64i_zvkb +.type gcm_init_rv64i_zvkb,\@function +gcm_init_rv64i_zvkb: + # Load/store data in reverse order. + # This is needed as a part of endianness swap. + add $H, $H, 8 + li $TMP0, -8 + li $TMP1, 63 + la $TMP2, Lpolymod + + @{[vsetivli__x0_2_e64_m1_ta_ma]} # vsetivli x0, 2, e64, m1, ta, ma + + @{[vlse64_v $V1, $H, $TMP0]} # vlse64.v v1, (a1), t0 + @{[vle64_v $V2, $TMP2]} # vle64.v v2, (t2) + + # Shift one left and get the carry bits. + @{[vsrl_vx $V3, $V1, $TMP1]} # vsrl.vx v3, v1, t1 + @{[vsll_vi $V1, $V1, 1]} # vsll.vi v1, v1, 1 + + # Use the fact that the polynomial degree is no more than 128, + # i.e. only the LSB of the upper half could be set. + # Thanks to we don't need to do the full reduction here. + # Instead simply subtract the reduction polynomial. + # This idea was taken from x86 ghash implementation in OpenSSL. + @{[vslideup_vi $V4, $V3, 1]} # vslideup.vi v4, v3, 1 + @{[vslidedown_vi $V3, $V3, 1]} # vslidedown.vi v3, v3, 1 + + @{[vmv_v_i $V0, 2]} # vmv.v.i v0, 2 + @{[vor_vv_v0t $V1, $V1, $V4]} # vor.vv v1, v1, v4, v0.t + + # Need to set the mask to 3, if the carry bit is set. + @{[vmv_v_v $V0, $V3]} # vmv.v.v v0, v3 + @{[vmv_v_i $V3, 0]} # vmv.v.i v3, 0 + @{[vmerge_vim $V3, $V3, 3]} # vmerge.vim v3, v3, 3, v0 + @{[vmv_v_v $V0, $V3]} # vmv.v.v v0, v3 + + @{[vxor_vv_v0t $V1, $V1, $V2]} # vxor.vv v1, v1, v2, v0.t + + @{[vse64_v $V1, $Htable]} # vse64.v v1, (a0) + ret +.size gcm_init_rv64i_zvkb,.-gcm_init_rv64i_zvkb +___ +} + +################################################################################ +# void gcm_gmult_rv64i_zvkb(u64 Xi[2], const u128 Htable[16]); +# +# input: Xi: current hash value +# Htable: preprocessed H +# output: Xi: next hash value Xi = (Xi * H mod f) +{ +my ($Xi,$Htable,$TMP0,$TMP1,$TMP2,$TMP3,$TMP4) = ("a0","a1","t0","t1","t2","t3","t4"); +my ($V0,$V1,$V2,$V3,$V4,$V5,$V6) = ("v0","v1","v2","v3","v4","v5","v6"); + +$code .= <<___; +.text +.p2align 3 +.globl gcm_gmult_rv64i_zvkb +.type gcm_gmult_rv64i_zvkb,\@function +gcm_gmult_rv64i_zvkb: + ld $TMP0, ($Htable) + ld $TMP1, 8($Htable) + li $TMP2, 63 + la $TMP3, Lpolymod + ld $TMP3, 8($TMP3) + + # Load/store data in reverse order. + # This is needed as a part of endianness swap. + add $Xi, $Xi, 8 + li $TMP4, -8 + + @{[vsetivli__x0_2_e64_m1_ta_ma]} # vsetivli x0, 2, e64, m1, ta, ma + + @{[vlse64_v $V5, $Xi, $TMP4]} # vlse64.v v5, (a0), t4 + @{[vrev8_v $V5, $V5]} # vrev8.v v5, v5 + + # Multiplication + + # Do two 64x64 multiplications in one go to save some time + # and simplify things. + + # A = a1a0 (t1, t0) + # B = b1b0 (v5) + # C = c1c0 (256 bit) + # c1 = a1b1 + (a0b1)h + (a1b0)h + # c0 = a0b0 + (a0b1)l + (a1b0)h + + # v1 = (a0b1)l,(a0b0)l + @{[vclmul_vx $V1, $V5, $TMP0]} # vclmul.vx v1, v5, t0 + # v3 = (a0b1)h,(a0b0)h + @{[vclmulh_vx $V3, $V5, $TMP0]} # vclmulh.vx v3, v5, t0 + + # v4 = (a1b1)l,(a1b0)l + @{[vclmul_vx $V4, $V5, $TMP1]} # vclmul.vx v4, v5, t1 + # v2 = (a1b1)h,(a1b0)h + @{[vclmulh_vx $V2, $V5, $TMP1]} # vclmulh.vx v2, v5, t1 + + # Is there a better way to do this? + # Would need to swap the order of elements within a vector register. + @{[vslideup_vi $V5, $V3, 1]} # vslideup.vi v5, v3, 1 + @{[vslideup_vi $V6, $V4, 1]} # vslideup.vi v6, v4, 1 + @{[vslidedown_vi $V3, $V3, 1]} # vslidedown.vi v3, v3, 1 + @{[vslidedown_vi $V4, $V4, 1]} # vslidedown.vi v4, v4, 1 + + @{[vmv_v_i $V0, 1]} # vmv.v.i v0, 1 + # v2 += (a0b1)h + @{[vxor_vv_v0t $V2, $V2, $V3]} # vxor.vv v2, v2, v3, v0.t + # v2 += (a1b1)l + @{[vxor_vv_v0t $V2, $V2, $V4]} # vxor.vv v2, v2, v4, v0.t + + @{[vmv_v_i $V0, 2]} # vmv.v.i v0, 2 + # v1 += (a0b0)h,0 + @{[vxor_vv_v0t $V1, $V1, $V5]} # vxor.vv v1, v1, v5, v0.t + # v1 += (a1b0)l,0 + @{[vxor_vv_v0t $V1, $V1, $V6]} # vxor.vv v1, v1, v6, v0.t + + # Now the 256bit product should be stored in (v2,v1) + # v1 = (a0b1)l + (a0b0)h + (a1b0)l, (a0b0)l + # v2 = (a1b1)h, (a1b0)h + (a0b1)h + (a1b1)l + + # Reduction + # Let C := A*B = c3,c2,c1,c0 = v2[1],v2[0],v1[1],v1[0] + # This is a slight variation of the Gueron's Montgomery reduction. + # The difference being the order of some operations has been changed, + # to make a better use of vclmul(h) instructions. + + # First step: + # c1 += (c0 * P)l + # vmv.v.i v0, 2 + @{[vslideup_vi_v0t $V3, $V1, 1]} # vslideup.vi v3, v1, 1, v0.t + @{[vclmul_vx_v0t $V3, $V3, $TMP3]} # vclmul.vx v3, v3, t3, v0.t + @{[vxor_vv_v0t $V1, $V1, $V3]} # vxor.vv v1, v1, v3, v0.t + + # Second step: + # D = d1,d0 is final result + # We want: + # m1 = c1 + (c1 * P)h + # m0 = (c1 * P)l + (c0 * P)h + c0 + # d1 = c3 + m1 + # d0 = c2 + m0 + + #v3 = (c1 * P)l, 0 + @{[vclmul_vx_v0t $V3, $V1, $TMP3]} # vclmul.vx v3, v1, t3, v0.t + #v4 = (c1 * P)h, (c0 * P)h + @{[vclmulh_vx $V4, $V1, $TMP3]} # vclmulh.vx v4, v1, t3 + + @{[vmv_v_i $V0, 1]} # vmv.v.i v0, 1 + @{[vslidedown_vi $V3, $V3, 1]} # vslidedown.vi v3, v3, 1 + + @{[vxor_vv $V1, $V1, $V4]} # vxor.vv v1, v1, v4 + @{[vxor_vv_v0t $V1, $V1, $V3]} # vxor.vv v1, v1, v3, v0.t + + # XOR in the upper upper part of the product + @{[vxor_vv $V2, $V2, $V1]} # vxor.vv v2, v2, v1 + + @{[vrev8_v $V2, $V2]} # vrev8.v v2, v2 + @{[vsse64_v $V2, $Xi, $TMP4]} # vsse64.v v2, (a0), t4 + ret +.size gcm_gmult_rv64i_zvkb,.-gcm_gmult_rv64i_zvkb +___ +} + +################################################################################ +# void gcm_ghash_rv64i_zvkb(u64 Xi[2], const u128 Htable[16], +# const u8 *inp, size_t len); +# +# input: Xi: current hash value +# Htable: preprocessed H +# inp: pointer to input data +# len: length of input data in bytes (mutiple of block size) +# output: Xi: Xi+1 (next hash value Xi) +{ +my ($Xi,$Htable,$inp,$len,$TMP0,$TMP1,$TMP2,$TMP3,$M8,$TMP5,$TMP6) = ("a0","a1","a2","a3","t0","t1","t2","t3","t4","t5","t6"); +my ($V0,$V1,$V2,$V3,$V4,$V5,$V6,$Vinp) = ("v0","v1","v2","v3","v4","v5","v6","v7"); + +$code .= <<___; +.p2align 3 +.globl gcm_ghash_rv64i_zvkb +.type gcm_ghash_rv64i_zvkb,\@function +gcm_ghash_rv64i_zvkb: + ld $TMP0, ($Htable) + ld $TMP1, 8($Htable) + li $TMP2, 63 + la $TMP3, Lpolymod + ld $TMP3, 8($TMP3) + + # Load/store data in reverse order. + # This is needed as a part of endianness swap. + add $Xi, $Xi, 8 + add $inp, $inp, 8 + li $M8, -8 + + @{[vsetivli__x0_2_e64_m1_ta_ma]} # vsetivli x0, 2, e64, m1, ta, ma + + @{[vlse64_v $V5, $Xi, $M8]} # vlse64.v v5, (a0), t4 + +Lstep: + # Read input data + @{[vlse64_v $Vinp, $inp, $M8]} # vle64.v v0, (a2) + add $inp, $inp, 16 + add $len, $len, -16 + # XOR them into Xi + @{[vxor_vv $V5, $V5, $Vinp]} # vxor.vv v0, v0, v1 + + @{[vrev8_v $V5, $V5]} # vrev8.v v5, v5 + + # Multiplication + + # Do two 64x64 multiplications in one go to save some time + # and simplify things. + + # A = a1a0 (t1, t0) + # B = b1b0 (v5) + # C = c1c0 (256 bit) + # c1 = a1b1 + (a0b1)h + (a1b0)h + # c0 = a0b0 + (a0b1)l + (a1b0)h + + # v1 = (a0b1)l,(a0b0)l + @{[vclmul_vx $V1, $V5, $TMP0]} # vclmul.vx v1, v5, t0 + # v3 = (a0b1)h,(a0b0)h + @{[vclmulh_vx $V3, $V5, $TMP0]} # vclmulh.vx v3, v5, t0 + + # v4 = (a1b1)l,(a1b0)l + @{[vclmul_vx $V4, $V5, $TMP1]} # vclmul.vx v4, v5, t1 + # v2 = (a1b1)h,(a1b0)h + @{[vclmulh_vx $V2, $V5, $TMP1]} # vclmulh.vx v2, v5, t1 + + # Is there a better way to do this? + # Would need to swap the order of elements within a vector register. + @{[vslideup_vi $V5, $V3, 1]} # vslideup.vi v5, v3, 1 + @{[vslideup_vi $V6, $V4, 1]} # vslideup.vi v6, v4, 1 + @{[vslidedown_vi $V3, $V3, 1]} # vslidedown.vi v3, v3, 1 + @{[vslidedown_vi $V4, $V4, 1]} # vslidedown.vi v4, v4, 1 + + @{[vmv_v_i $V0, 1]} # vmv.v.i v0, 1 + # v2 += (a0b1)h + @{[vxor_vv_v0t $V2, $V2, $V3]} # vxor.vv v2, v2, v3, v0.t + # v2 += (a1b1)l + @{[vxor_vv_v0t $V2, $V2, $V4]} # vxor.vv v2, v2, v4, v0.t + + @{[vmv_v_i $V0, 2]} # vmv.v.i v0, 2 + # v1 += (a0b0)h,0 + @{[vxor_vv_v0t $V1, $V1, $V5]} # vxor.vv v1, v1, v5, v0.t + # v1 += (a1b0)l,0 + @{[vxor_vv_v0t $V1, $V1, $V6]} # vxor.vv v1, v1, v6, v0.t + + # Now the 256bit product should be stored in (v2,v1) + # v1 = (a0b1)l + (a0b0)h + (a1b0)l, (a0b0)l + # v2 = (a1b1)h, (a1b0)h + (a0b1)h + (a1b1)l + + # Reduction + # Let C := A*B = c3,c2,c1,c0 = v2[1],v2[0],v1[1],v1[0] + # This is a slight variation of the Gueron's Montgomery reduction. + # The difference being the order of some operations has been changed, + # to make a better use of vclmul(h) instructions. + + # First step: + # c1 += (c0 * P)l + # vmv.v.i v0, 2 + @{[vslideup_vi_v0t $V3, $V1, 1]} # vslideup.vi v3, v1, 1, v0.t + @{[vclmul_vx_v0t $V3, $V3, $TMP3]} # vclmul.vx v3, v3, t3, v0.t + @{[vxor_vv_v0t $V1, $V1, $V3]} # vxor.vv v1, v1, v3, v0.t + + # Second step: + # D = d1,d0 is final result + # We want: + # m1 = c1 + (c1 * P)h + # m0 = (c1 * P)l + (c0 * P)h + c0 + # d1 = c3 + m1 + # d0 = c2 + m0 + + #v3 = (c1 * P)l, 0 + @{[vclmul_vx_v0t $V3, $V1, $TMP3]} # vclmul.vx v3, v1, t3, v0.t + #v4 = (c1 * P)h, (c0 * P)h + @{[vclmulh_vx $V4, $V1, $TMP3]} # vclmulh.vx v4, v1, t3 + + @{[vmv_v_i $V0, 1]} # vmv.v.i v0, 1 + @{[vslidedown_vi $V3, $V3, 1]} # vslidedown.vi v3, v3, 1 + + @{[vxor_vv $V1, $V1, $V4]} # vxor.vv v1, v1, v4 + @{[vxor_vv_v0t $V1, $V1, $V3]} # vxor.vv v1, v1, v3, v0.t + + # XOR in the upper upper part of the product + @{[vxor_vv $V2, $V2, $V1]} # vxor.vv v2, v2, v1 + + @{[vrev8_v $V5, $V2]} # vrev8.v v2, v2 + + bnez $len, Lstep + + @{[vsse64_v $V5, $Xi, $M8]} # vsse64.v v2, (a0), t4 + ret +.size gcm_ghash_rv64i_zvkb,.-gcm_ghash_rv64i_zvkb +___ +} + +$code .= <<___; +.p2align 4 +Lpolymod: + .dword 0x0000000000000001 + .dword 0xc200000000000000 +.size Lpolymod,.-Lpolymod +___ + +print $code; + +close STDOUT or die "error closing STDOUT: $!"; From patchwork Mon Mar 13 19:12:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= X-Patchwork-Id: 69065 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1363997wrd; Mon, 13 Mar 2023 12:24:03 -0700 (PDT) X-Google-Smtp-Source: AK7set8SjJVYTdd9r9BfqTji2jNCxNa7EPn6RCBIlGYELnd7B8NjgI8RZZKhdRU3OlQvHopDLkGh X-Received: by 2002:a17:902:d2d0:b0:19e:65db:7ac3 with SMTP id n16-20020a170902d2d000b0019e65db7ac3mr45473808plc.68.1678735443410; Mon, 13 Mar 2023 12:24:03 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1678735443; cv=none; d=google.com; s=arc-20160816; b=w8/OhOG3mhah1/65aHwurQi7qb9zR7Y/BsRPzTZJl5ELuVFjjC8Wze4f9v4vgVJJ1b XLBg6SruYM7/nStOKa486ubSB0IKTjntq3zGuccXIL9kSn6mmnYn4hVb6Lr+/O7R3wnY fbsmmJftcZJwBoluihg5PtDQb5I9J+tDCOXHPSwu6FOmH5RE1irxVKkH4U52qI2vdNU0 mW/3Br0J/Y/HaWuagm8H3DL13fjnTNPbQOQm7svRBVIxa/FqIgZ7U1B0dxGBBwu4F5l1 p0YAqCe8njUoCF8BErsAtJAOl72QArezvlpcT2lXaK9FaTrLBzmVpfU+KNMEtC+MfrRF lnZQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=xST8Jct9HnqAqddp8dvaQgLM0OC67YRnG9StxVmaRBk=; b=t2r31wgOB2C1OsmyuDtAnSL/W51fJzB1EmsVwPeXAhvVdS2tiksZtC4QU30UirZzFG fLxW6r8+anImsGgwnjYiG7erXXF39PuP5QpzJOdHxW96A2htpp8HgiJ7h4K7eIOqK1e+ PxFizj1y4qUQ7ibSiU2LJxBp2nsbZ4ShSfvf3Yv39b9DdxIhu+ZTRKsRdQxcMqOmNAMA xjRcEcEEzZSCLarPWAyfpa7HM1PZAA5VwRLYv4uo1MWlytidsAza12FllP99QKQlIp1r i6KW79BHfBqlEhojna9yhV+QXLvRhopMnzOyr5BtJFhRepXD57FVvwfldS0zytz9/plA G8yA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ko14-20020a17090307ce00b0019f23a15f43si414950plb.578.2023.03.13.12.23.50; Mon, 13 Mar 2023 12:24:03 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229694AbjCMTVZ (ORCPT + 99 others); Mon, 13 Mar 2023 15:21:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48480 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230448AbjCMTVO (ORCPT ); Mon, 13 Mar 2023 15:21:14 -0400 Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 673896A06F for ; Mon, 13 Mar 2023 12:20:43 -0700 (PDT) Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169] helo=phil.lan) by gloria.sntech.de with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1pbnbW-00028k-K2; Mon, 13 Mar 2023 20:13:06 +0100 From: Heiko Stuebner To: palmer@rivosinc.com Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, heiko@sntech.de Subject: [PATCH RFC v3 11/16] RISC-V: crypto: add Zvkg accelerated GCM GHASH implementation Date: Mon, 13 Mar 2023 20:12:57 +0100 Message-Id: <20230313191302.580787-12-heiko.stuebner@vrull.eu> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu> References: <20230313191302.580787-1-heiko.stuebner@vrull.eu> MIME-Version: 1.0 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS, T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760281696311367576?= X-GMAIL-MSGID: =?utf-8?q?1760281696311367576?= From: Heiko Stuebner When the Zvkg vector crypto extension is available another optimized gcm ghash variant is possible, so add it as another implmentation. Co-developed-by: Christoph Müllner Signed-off-by: Christoph Müllner Signed-off-by: Heiko Stuebner --- arch/riscv/crypto/Kconfig | 1 + arch/riscv/crypto/Makefile | 7 +- arch/riscv/crypto/ghash-riscv64-glue.c | 80 ++++++++++++ arch/riscv/crypto/ghash-riscv64-zvkg.pl | 161 ++++++++++++++++++++++++ 4 files changed, 247 insertions(+), 2 deletions(-) create mode 100644 arch/riscv/crypto/ghash-riscv64-zvkg.pl diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig index 404fd9b3cb7c..84da19bdde8b 100644 --- a/arch/riscv/crypto/Kconfig +++ b/arch/riscv/crypto/Kconfig @@ -13,5 +13,6 @@ config CRYPTO_GHASH_RISCV64 Architecture: riscv64 using one of: - ZBC extension - ZVKB vector crypto extension + - ZVKG vector crypto extension endmenu diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile index 8ab9a0ae8f2d..1ee0ce7d3264 100644 --- a/arch/riscv/crypto/Makefile +++ b/arch/riscv/crypto/Makefile @@ -9,7 +9,7 @@ ifdef CONFIG_RISCV_ISA_ZBC ghash-riscv64-y += ghash-riscv64-zbc.o endif ifdef CONFIG_RISCV_ISA_V -ghash-riscv64-y += ghash-riscv64-zvkb.o +ghash-riscv64-y += ghash-riscv64-zvkb.o ghash-riscv64-zvkg.o endif quiet_cmd_perlasm = PERLASM $@ @@ -21,4 +21,7 @@ $(obj)/ghash-riscv64-zbc.S: $(src)/ghash-riscv64-zbc.pl $(obj)/ghash-riscv64-zvkb.S: $(src)/ghash-riscv64-zvkb.pl $(call cmd,perlasm) -clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S +$(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl + $(call cmd,perlasm) + +clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S ghash-riscv64-zvkg.S diff --git a/arch/riscv/crypto/ghash-riscv64-glue.c b/arch/riscv/crypto/ghash-riscv64-glue.c index 004a1a11d7d8..9c7a8616049e 100644 --- a/arch/riscv/crypto/ghash-riscv64-glue.c +++ b/arch/riscv/crypto/ghash-riscv64-glue.c @@ -26,6 +26,10 @@ void gcm_ghash_rv64i_zbc__zbkb(u64 Xi[2], const u128 Htable[16], void gcm_ghash_rv64i_zvkb(u64 Xi[2], const u128 Htable[16], const u8 *inp, size_t len); +/* Zvkg (vector crypto with vghmac.vv). */ +void gcm_ghash_rv64i_zvkg(u64 Xi[2], const u128 Htable[16], + const u8 *inp, size_t len); + struct riscv64_ghash_ctx { void (*ghash_func)(u64 Xi[2], const u128 Htable[16], const u8 *inp, size_t len); @@ -183,6 +187,63 @@ struct shash_alg riscv64_zvkb_ghash_alg = { }, }; +RISCV64_ZVK_SETKEY(zvkg, zvkg); +struct shash_alg riscv64_zvkg_ghash_alg = { + .digestsize = GHASH_DIGEST_SIZE, + .init = riscv64_ghash_init, + .update = riscv64_zvk_ghash_update, + .final = riscv64_zvk_ghash_final, + .setkey = riscv64_zvk_ghash_setkey_zvkg, + .descsize = sizeof(struct riscv64_ghash_desc_ctx) + + sizeof(struct ghash_desc_ctx), + .base = { + .cra_name = "ghash", + .cra_driver_name = "riscv64_zvkg_ghash", + .cra_priority = 301, + .cra_blocksize = GHASH_BLOCK_SIZE, + .cra_ctxsize = sizeof(struct riscv64_ghash_ctx), + .cra_module = THIS_MODULE, + }, +}; + +RISCV64_ZVK_SETKEY(zvkg__zbb_or_zbkb, zvkg); +struct shash_alg riscv64_zvkg_zbb_or_zbkb_ghash_alg = { + .digestsize = GHASH_DIGEST_SIZE, + .init = riscv64_ghash_init, + .update = riscv64_zvk_ghash_update, + .final = riscv64_zvk_ghash_final, + .setkey = riscv64_zvk_ghash_setkey_zvkg__zbb_or_zbkb, + .descsize = sizeof(struct riscv64_ghash_desc_ctx) + + sizeof(struct ghash_desc_ctx), + .base = { + .cra_name = "ghash", + .cra_driver_name = "riscv64_zvkg_zbb_or_zbkb_ghash", + .cra_priority = 302, + .cra_blocksize = GHASH_BLOCK_SIZE, + .cra_ctxsize = sizeof(struct riscv64_ghash_ctx), + .cra_module = THIS_MODULE, + }, +}; + +RISCV64_ZVK_SETKEY(zvkg__zvkb, zvkg); +struct shash_alg riscv64_zvkg_zvkb_ghash_alg = { + .digestsize = GHASH_DIGEST_SIZE, + .init = riscv64_ghash_init, + .update = riscv64_zvk_ghash_update, + .final = riscv64_zvk_ghash_final, + .setkey = riscv64_zvk_ghash_setkey_zvkg__zvkb, + .descsize = sizeof(struct riscv64_ghash_desc_ctx) + + sizeof(struct ghash_desc_ctx), + .base = { + .cra_name = "ghash", + .cra_driver_name = "riscv64_zvkg_zvkb_ghash", + .cra_priority = 303, + .cra_blocksize = GHASH_BLOCK_SIZE, + .cra_ctxsize = sizeof(struct riscv64_ghash_ctx), + .cra_module = THIS_MODULE, + }, +}; + #endif /* CONFIG_RISCV_ISA_V */ #ifdef CONFIG_RISCV_ISA_ZBC @@ -381,6 +442,25 @@ static int __init riscv64_ghash_mod_init(void) if (ret < 0) return ret; } + + if (riscv_isa_extension_available(NULL, ZVKG)) { + ret = riscv64_ghash_register(&riscv64_zvkg_ghash_alg); + if (ret < 0) + return ret; + + if (riscv_isa_extension_available(NULL, ZVKB)) { + ret = riscv64_ghash_register(&riscv64_zvkg_zvkb_ghash_alg); + if (ret < 0) + return ret; + } + + if (riscv_isa_extension_available(NULL, ZBB) || + riscv_isa_extension_available(NULL, ZBKB)) { + ret = riscv64_ghash_register(&riscv64_zvkg_zbb_or_zbkb_ghash_alg); + if (ret < 0) + return ret; + } + } #endif return 0; diff --git a/arch/riscv/crypto/ghash-riscv64-zvkg.pl b/arch/riscv/crypto/ghash-riscv64-zvkg.pl new file mode 100644 index 000000000000..c13dd9c4ee31 --- /dev/null +++ b/arch/riscv/crypto/ghash-riscv64-zvkg.pl @@ -0,0 +1,161 @@ +#! /usr/bin/env perl +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You may not use +# this file except in compliance with the License. You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html + +# - RV64I +# - RISC-V vector ('V') with VLEN >= 128 +# - RISC-V vector crypto GHASH extension ('Zvkg') + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extension) +# $flavour is the first argument if it doesn't look like a file +my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef; +my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef; + +$output and open STDOUT,">$output"; + +my $code=<<___; +.text +___ + +################################################################################ +# void gcm_init_rv64i_zvkg(u128 Htable[16], const u64 H[2]); +# void gcm_init_rv64i_zvkg__zbb_or_zbkb(u128 Htable[16], const u64 H[2]); +# void gcm_init_rv64i_zvkg__zvkb(u128 Htable[16], const u64 H[2]); +# +# input: H: 128-bit H - secret parameter E(K, 0^128) +# output: Htable: Copy of secret parameter (in normalized byte order) +# +# All callers of this function revert the byte-order unconditionally +# on little-endian machines. So we need to revert the byte-order back. +{ +my ($Htable,$H,$VAL0,$VAL1,$TMP0) = ("a0","a1","a2","a3","t0"); + +$code .= <<___; +.p2align 3 +.globl gcm_init_rv64i_zvkg +.type gcm_init_rv64i_zvkg,\@function +gcm_init_rv64i_zvkg: + # First word + ld $VAL0, 0($H) + ld $VAL1, 8($H) + @{[sd_rev8_rv64i $VAL0, $Htable, 0, $TMP0]} + @{[sd_rev8_rv64i $VAL1, $Htable, 8, $TMP0]} + ret +.size gcm_init_rv64i_zvkg,.-gcm_init_rv64i_zvkg +___ +} + +{ +my ($Htable,$H,$TMP0,$TMP1) = ("a0","a1","t0","t1"); + +$code .= <<___; +.p2align 3 +.globl gcm_init_rv64i_zvkg__zbb_or_zbkb +.type gcm_init_rv64i_zvkg__zbb_or_zbkb,\@function +gcm_init_rv64i_zvkg__zbb_or_zbkb: + ld $TMP0,0($H) + ld $TMP1,8($H) + @{[rev8 $TMP0, $TMP0]} #rev8 $TMP0, $TMP0 + @{[rev8 $TMP1, $TMP1]} #rev8 $TMP1, $TMP1 + sd $TMP0,0($Htable) + sd $TMP1,8($Htable) + ret +.size gcm_init_rv64i_zvkg__zbb_or_zbkb,.-gcm_init_rv64i_zvkg__zbb_or_zbkb +___ +} + +{ +my ($Htable,$H,$V0) = ("a0","a1","v0"); + +$code .= <<___; +.p2align 3 +.globl gcm_init_rv64i_zvkg__zvkb +.type gcm_init_rv64i_zvkg__zvkb,\@function +gcm_init_rv64i_zvkg__zvkb: + # All callers of this function revert the byte-order unconditionally + # on little-endian machines. So we need to revert the byte-order back. + @{[vsetivli__x0_2_e64_m1_ta_ma]} # vsetivli x0, 2, e64, m1, ta, ma + @{[vle64_v $V0, $H]} # vle64.v v0, (a1) + @{[vrev8_v $V0, $V0]} # vrev8.v v0, v0 + @{[vse64_v $V0, $Htable]} # vse64.v v0, (a0) + ret +.size gcm_init_rv64i_zvkg__zvkb,.-gcm_init_rv64i_zvkg__zvkb +___ +} + +################################################################################ +# void gcm_gmult_rv64i_zvkg(u64 Xi[2], const u128 Htable[16]); +# +# input: Xi: current hash value +# Htable: copy of H +# output: Xi: next hash value Xi +{ +my ($Xi,$Htable) = ("a0","a1"); +my ($VD,$VS2) = ("v1","v2"); + +$code .= <<___; +.p2align 3 +.globl gcm_gmult_rv64i_zvkg +.type gcm_gmult_rv64i_zvkg,\@function +gcm_gmult_rv64i_zvkg: + @{[vsetivli__x0_4_e32_m1_ta_ma]} + @{[vle32_v $VS2, $Htable]} + @{[vle32_v $VD, $Xi]} + @{[vgmul_vv $VD, $VS2]} + @{[vse32_v $VD, $Xi]} + ret +.size gcm_gmult_rv64i_zvkg,.-gcm_gmult_rv64i_zvkg +___ +} + +################################################################################ +# void gcm_ghash_rv64i_zvkg(u64 Xi[2], const u128 Htable[16], +# const u8 *inp, size_t len); +# +# input: Xi: current hash value +# Htable: copy of H +# inp: pointer to input data +# len: length of input data in bytes (mutiple of block size) +# output: Xi: Xi+1 (next hash value Xi) +{ +my ($Xi,$Htable,$inp,$len) = ("a0","a1","a2","a3"); +my ($vXi,$vH,$vinp,$Vzero) = ("v1","v2","v3","v4"); + +$code .= <<___; +.p2align 3 +.globl gcm_ghash_rv64i_zvkg +.type gcm_ghash_rv64i_zvkg,\@function +gcm_ghash_rv64i_zvkg: + @{[vsetivli__x0_4_e32_m1_ta_ma]} + @{[vle32_v $vH, $Htable]} + @{[vle32_v $vXi, $Xi]} + +Lstep: + @{[vle32_v $vinp, $inp]} + add $inp, $inp, 16 + add $len, $len, -16 + @{[vghsh_vv $vXi, $vinp, $vH]} + bnez $len, Lstep + + @{[vse32_v $vXi, $Xi]} + ret + +.size gcm_ghash_rv64i_zvkg,.-gcm_ghash_rv64i_zvkg +___ +} + +print $code; + +close STDOUT or die "error closing STDOUT: $!"; From patchwork Mon Mar 13 19:12:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= X-Patchwork-Id: 69061 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1361726wrd; Mon, 13 Mar 2023 12:19:29 -0700 (PDT) X-Google-Smtp-Source: AK7set8Og2/W23dTko4//bR4XgnPhh9Rl6mciocqcLtQYcLOTwmkcfjj7nFvSOTsDIvD36p2Kl9n X-Received: by 2002:a17:902:8bcb:b0:19e:ad18:da5c with SMTP id r11-20020a1709028bcb00b0019ead18da5cmr24705455plo.37.1678735169149; Mon, 13 Mar 2023 12:19:29 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1678735169; cv=none; d=google.com; s=arc-20160816; b=UAmtg1AwBWL1GnQEJOU8KIHSA3weYohLjai2djaRJ8dlX6LhRM+3uHZxX/G0+BluPB YNaR0GlRYuiP+vcgHRLOx01NcZYVCgICFvCpbJwFlNTr+9JJo6mFLLPmrXbVgqyFcFvh b3GDcN/iNOaKH1PapkhpuLrj4nvUlxJandOwXML0y9pA3/ekJBByA+Wb/zAVCmP/Y8Eo twDB0v2AHcfg2Oj+Qa5L5I1OafywnsXAoKnXfLvAO6unTWD+MkW4tJCa0AfrjPBacWFt UpXgQ6LEyP1R+iYIZ1GZwzkRQl5ZcKo0Jz2k9QKiynILgCmL9VvRPs9Q2zaml1Q5Nh6m /jYA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=bw5yiJQTAxNLkWR3Tk3nByraK0m2ufI4SaHxvRLm4MU=; b=OaY8z7I5/6mMcWCADAGQ4888wPY/D+gw6FznkOHwuClgxrD+0Qoif/YDrKpt0G5Qra o7O9Jia6MpnJydWDiyg0O1ASEfqI6pk5xG8lDTkOvH/9mrIpEhRe7YPC6kH3aHp0Bdm3 CE63pURorrucWGd+wc2SFGQ20tDS0XJxlwj5bNA3fItnNruSH73gBQJvXyU+M9MIY9Pf +2TvcfG0iCnqgKNvIPI5iawIb+UPUM8Jwutz79Ch7h4GiYt+vQTdXNQELU1XpIZexfbg +Y+ZUGaCqubP28/XLYK0tsXTFot6DExKYWB2E5tBjpN7ySBxRoYGXqx+PrZvNGHblsZP A2QQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id d18-20020a170903231200b0019d04cfd41asi410800plh.593.2023.03.13.12.19.12; Mon, 13 Mar 2023 12:19:29 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230455AbjCMTQm (ORCPT + 99 others); Mon, 13 Mar 2023 15:16:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42358 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230395AbjCMTQd (ORCPT ); Mon, 13 Mar 2023 15:16:33 -0400 Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 73DD37B4AC for ; Mon, 13 Mar 2023 12:16:12 -0700 (PDT) Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169] helo=phil.lan) by gloria.sntech.de with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1pbnbW-00028k-Sz; Mon, 13 Mar 2023 20:13:06 +0100 From: Heiko Stuebner To: palmer@rivosinc.com Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, heiko@sntech.de Subject: [PATCH RFC v3 12/16] RISC-V: crypto: add a vector-crypto-accelerated SHA256 implementation Date: Mon, 13 Mar 2023 20:12:58 +0100 Message-Id: <20230313191302.580787-13-heiko.stuebner@vrull.eu> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu> References: <20230313191302.580787-1-heiko.stuebner@vrull.eu> MIME-Version: 1.0 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS, T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760281408729327811?= X-GMAIL-MSGID: =?utf-8?q?1760281408729327811?= From: Heiko Stuebner This adds an accelerated SHA256 algorithm using either the Zvknha or Zvknhb vector crypto extensions. The spec says that Zvknhb supports SHA-256 and SHA-512. Zvknha supports only SHA-256. so the relevant acclerating instructions are included in both. Co-developed-by: Charalampos Mitrodimas Signed-off-by: Charalampos Mitrodimas Signed-off-by: Heiko Stuebner --- arch/riscv/crypto/Kconfig | 11 + arch/riscv/crypto/Makefile | 7 + arch/riscv/crypto/sha256-riscv64-glue.c | 114 +++++++++ arch/riscv/crypto/sha256-riscv64-zvknha.pl | 284 +++++++++++++++++++++ 4 files changed, 416 insertions(+) create mode 100644 arch/riscv/crypto/sha256-riscv64-glue.c create mode 100644 arch/riscv/crypto/sha256-riscv64-zvknha.pl diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig index 84da19bdde8b..8645e02171f7 100644 --- a/arch/riscv/crypto/Kconfig +++ b/arch/riscv/crypto/Kconfig @@ -15,4 +15,15 @@ config CRYPTO_GHASH_RISCV64 - ZVKB vector crypto extension - ZVKG vector crypto extension +config CRYPTO_SHA256_RISCV64 + tristate "Hash functions: SHA-256" + depends on 64BIT && RISCV_ISA_V + select CRYPTO_HASH + select CRYPTO_LIB_SHA256 + help + SHA-256 secure hash algorithm (FIPS 180) + + Architecture: riscv64 using + - Zvknha or Zvknhb vector crypto extensions + endmenu diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile index 1ee0ce7d3264..02b3b4c32672 100644 --- a/arch/riscv/crypto/Makefile +++ b/arch/riscv/crypto/Makefile @@ -12,6 +12,9 @@ ifdef CONFIG_RISCV_ISA_V ghash-riscv64-y += ghash-riscv64-zvkb.o ghash-riscv64-zvkg.o endif +obj-$(CONFIG_CRYPTO_SHA256_RISCV64) += sha256-riscv64.o +sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvknhb.o + quiet_cmd_perlasm = PERLASM $@ cmd_perlasm = $(PERL) $(<) void $(@) @@ -24,4 +27,8 @@ $(obj)/ghash-riscv64-zvkb.S: $(src)/ghash-riscv64-zvkb.pl $(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl $(call cmd,perlasm) +$(obj)/sha256-riscv64-zvknhb.S: $(src)/sha256-riscv64-zvknha.pl + $(call cmd,perlasm) + clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S ghash-riscv64-zvkg.S +clean-files += sha256-riscv64-zvknha.S diff --git a/arch/riscv/crypto/sha256-riscv64-glue.c b/arch/riscv/crypto/sha256-riscv64-glue.c new file mode 100644 index 000000000000..8e3bb1deaad5 --- /dev/null +++ b/arch/riscv/crypto/sha256-riscv64-glue.c @@ -0,0 +1,114 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Linux/riscv64 port of the OpenSSL SHA256 implementation for RISCV64 + * + * Copyright (C) 2022 VRULL GmbH + * Author: Heiko Stuebner + */ + +#include +#include +#include +#include +#include +#include +#include + +asmlinkage void sha256_block_data_order_zvknha(u32 *digest, const void *data, + unsigned int num_blks); + +static void __sha256_block_data_order(struct sha256_state *sst, u8 const *src, + int blocks) +{ + sha256_block_data_order_zvknha(sst->state, src, blocks); +} + +static int riscv64_sha256_update(struct shash_desc *desc, const u8 *data, + unsigned int len) +{ + if (crypto_simd_usable()) { + int ret; + + kernel_rvv_begin(); + ret = sha256_base_do_update(desc, data, len, + __sha256_block_data_order); + kernel_rvv_end(); + return ret; + } else { + sha256_update(shash_desc_ctx(desc), data, len); + return 0; + } +} + +static int riscv64_sha256_finup(struct shash_desc *desc, const u8 *data, + unsigned int len, u8 *out) +{ + if (!crypto_simd_usable()) { + sha256_update(shash_desc_ctx(desc), data, len); + sha256_final(shash_desc_ctx(desc), out); + return 0; + } + + kernel_rvv_begin(); + if (len) + sha256_base_do_update(desc, data, len, + __sha256_block_data_order); + + sha256_base_do_finalize(desc, __sha256_block_data_order); + kernel_rvv_end(); + + return sha256_base_finish(desc, out); +} + +static int riscv64_sha256_final(struct shash_desc *desc, u8 *out) +{ + return riscv64_sha256_finup(desc, NULL, 0, out); +} + +static struct shash_alg sha256_alg = { + .digestsize = SHA256_DIGEST_SIZE, + .init = sha256_base_init, + .update = riscv64_sha256_update, + .final = riscv64_sha256_final, + .finup = riscv64_sha256_finup, + .descsize = sizeof(struct sha256_state), + .base.cra_name = "sha256", + .base.cra_driver_name = "sha256-riscv64-zvknha", + .base.cra_priority = 150, + .base.cra_blocksize = SHA256_BLOCK_SIZE, + .base.cra_module = THIS_MODULE, +}; + +static int __init sha256_mod_init(void) +{ + /* + * From the spec: + * Zvknhb supports SHA-256 and SHA-512. Zvknha supports only SHA-256. + */ + if ((riscv_isa_extension_available(NULL, ZVKNHA) || + riscv_isa_extension_available(NULL, ZVKNHB)) && + riscv_isa_extension_available(NULL, ZVKB) && + riscv_vector_vlen() >= 128) + + return crypto_register_shash(&sha256_alg); + + return 0; +} + +static void __exit sha256_mod_fini(void) +{ + if ((riscv_isa_extension_available(NULL, ZVKNHA) || + riscv_isa_extension_available(NULL, ZVKNHB)) && + riscv_isa_extension_available(NULL, ZVKB) && + riscv_vector_vlen() >= 128) + crypto_unregister_shash(&sha256_alg); +} + +module_init(sha256_mod_init); +module_exit(sha256_mod_fini); + +MODULE_DESCRIPTION("SHA-256 secure hash for riscv64"); +MODULE_AUTHOR("Andy Polyakov "); +MODULE_AUTHOR("Heiko Stuebner "); +MODULE_LICENSE("GPL v2"); +MODULE_ALIAS_CRYPTO("sha256"); diff --git a/arch/riscv/crypto/sha256-riscv64-zvknha.pl b/arch/riscv/crypto/sha256-riscv64-zvknha.pl new file mode 100644 index 000000000000..c4ac20d7d138 --- /dev/null +++ b/arch/riscv/crypto/sha256-riscv64-zvknha.pl @@ -0,0 +1,284 @@ +#! /usr/bin/env perl +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You may not use +# this file except in compliance with the License. You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html + +# The generated code of this file depends on the following RISC-V extensions: +# - RV64I +# - RISC-V vector ('V') with VLEN >= 128 +# - Vector Bit-manipulation used in Cryptography ('Zvkb') +# - Vector SHA-2 Secure Hash ('Zvknha') + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extension) +# $flavour is the first argument if it doesn't look like a file +my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef; +my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef; + +$output and open STDOUT,">$output"; + +my $code=<<___; +.text +___ + +my ($V0, $V10, $V11, $V12, $V13, $V14, $V15, $V16, $V17) = ("v0", "v10", "v11", "v12", "v13", "v14","v15", "v16", "v17"); +my ($V26, $V27) = ("v26", "v27"); + +my $K256 = "K256"; + +# Function arguments +my ($H, $INP, $LEN, $KT, $STRIDE) = ("a0", "a1", "a2", "a3", "t3"); + +################################################################################ +# void sha256_block_data_order(void *c, const void *p, size_t len) +$code .= <<___; +.p2align 2 +.globl sha256_block_data_order_zvknha +.type sha256_block_data_order_zvknha,\@function +sha256_block_data_order_zvknha: + @{[vsetivli__x0_4_e32_m1_ta_ma]} + + # H is stored as {a,b,c,d},{e,f,g,h}, but we need {f,e,b,a},{h,g,d,c} + # We achieve this by reading with a negative stride followed by + # element sliding. + li $STRIDE, -4 + addi $H, $H, 12 + @{[vlse32_v $V16, $H, $STRIDE]} # {d,c,b,a} + addi $H, $H, 16 + @{[vlse32_v $V17, $H, $STRIDE]} # {h,g,f,e} + # Keep H advanced by 12 + addi $H, $H, -16 + + @{[vmv_v_v $V27, $V16]} # {d,c,b,a} + @{[vslidedown_vi $V26, $V16, 2]} # {b,a,0,0} + @{[vslidedown_vi $V16, $V17, 2]} # {f,e,0,0} + @{[vslideup_vi $V16, $V26, 2]} # {f,e,b,a} + @{[vslideup_vi $V17, $V27, 2]} # {h,g,d,c} + + # Keep the old state as we need it later: H' = H+{a',b',c',...,h'}. + @{[vmv_v_v $V26, $V16]} + @{[vmv_v_v $V27, $V17]} + +L_round_loop: + la $KT, $K256 # Load round constants K256 + + # Load the 512-bits of the message block in v10-v13 and perform + # an endian swap on each 4 bytes element. + @{[vle32_v $V10, $INP]} + @{[vrev8_v $V10, $V10]} + add $INP, $INP, 16 + @{[vle32_v $V11, $INP]} + @{[vrev8_v $V11, $V11]} + add $INP, $INP, 16 + @{[vle32_v $V12, $INP]} + @{[vrev8_v $V12, $V12]} + add $INP, $INP, 16 + @{[vle32_v $V13, $INP]} + @{[vrev8_v $V13, $V13]} + add $INP, $INP, 16 + + # Decrement length by 1 + add $LEN, $LEN, -1 + + # Set v0 up for the vmerge that replaces the first word (idx==0) + @{[vid_v $V0]} + @{[vmseq_vi $V0, $V0, 0x0]} # v0.mask[i] = (i == 0 ? 1 : 0) + + # Quad-round 0 (+0, Wt from oldest to newest in v10->v11->v12->v13) + @{[vle32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V10]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V12, $V11, $V0]} + @{[vsha2ms_vv $V10, $V14, $V13]} # Generate W[19:16] + + # Quad-round 1 (+1, v11->v12->v13->v10) + @{[vle32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V11]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V13, $V12, $V0]} + @{[vsha2ms_vv $V11, $V14, $V10]} # Generate W[23:20] + + # Quad-round 2 (+2, v12->v13->v10->v11) + @{[vle32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V12]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V10, $V13, $V0]} + @{[vsha2ms_vv $V12, $V14, $V11]} # Generate W[27:24] + + # Quad-round 3 (+3, v13->v10->v11->v12) + @{[vle32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V13]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V11, $V10, $V0]} + @{[vsha2ms_vv $V13, $V14, $V12]} # Generate W[31:28] + + # Quad-round 4 (+0, v10->v11->v12->v13) + @{[vle32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V10]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V12, $V11, $V0]} + @{[vsha2ms_vv $V10, $V14, $V13]} # Generate W[35:32] + + # Quad-round 5 (+1, v11->v12->v13->v10) + @{[vle32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V11]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V13, $V12, $V0]} + @{[vsha2ms_vv $V11, $V14, $V10]} # Generate W[39:36] + + # Quad-round 6 (+2, v12->v13->v10->v11) + @{[vle32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V12]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V10, $V13, $V0]} + @{[vsha2ms_vv $V12, $V14, $V11]} # Generate W[43:40] + + # Quad-round 7 (+3, v13->v10->v11->v12) + @{[vle32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V13]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V11, $V10, $V0]} + @{[vsha2ms_vv $V13, $V14, $V12]} # Generate W[47:44] + + # Quad-round 8 (+0, v10->v11->v12->v13) + @{[vle32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V10]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V12, $V11, $V0]} + @{[vsha2ms_vv $V10, $V14, $V13]} # Generate W[51:48] + + # Quad-round 9 (+1, v11->v12->v13->v10) + @{[vle32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V11]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V13, $V12, $V0]} + @{[vsha2ms_vv $V11, $V14, $V10]} # Generate W[55:52] + + # Quad-round 10 (+2, v12->v13->v10->v11) + @{[vle32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V12]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V10, $V13, $V0]} + @{[vsha2ms_vv $V12, $V14, $V11]} # Generate W[59:56] + + # Quad-round 11 (+3, v13->v10->v11->v12) + @{[vle32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V13]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V11, $V10, $V0]} + @{[vsha2ms_vv $V13, $V14, $V12]} # Generate W[63:60] + + # Quad-round 12 (+0, v10->v11->v12->v13) + # Note that we stop generating new message schedule words (Wt, v10-13) + # as we already generated all the words we end up consuming (i.e., W[63:60]). + @{[vle32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V10]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + + # Quad-round 13 (+1, v11->v12->v13->v10) + @{[vle32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V11]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + + # Quad-round 14 (+2, v12->v13->v10->v11) + @{[vle32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V12]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + + # Quad-round 15 (+3, v13->v10->v11->v12) + @{[vle32_v $V15, $KT]} + # No kt increment needed. + @{[vadd_vv $V14, $V15, $V13]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + + # H' = H+{a',b',c',...,h'} + @{[vadd_vv $V16, $V26, $V16]} + @{[vadd_vv $V17, $V27, $V17]} + @{[vmv_v_v $V26, $V16]} + @{[vmv_v_v $V27, $V17]} + bnez $LEN, L_round_loop + + # v26 = v16 = {f,e,b,a} + # v27 = v17 = {h,g,d,c} + # Let's do the opposit transformation like on entry. + + @{[vslideup_vi $V17, $V16, 2]} # {h,g,f,e} + + @{[vslidedown_vi $V16, $V27, 2]} # {d,c,0,0} + @{[vslidedown_vi $V26, $V26, 2]} # {b,a,0,0} + @{[vslideup_vi $V16, $V26, 2]} # {d,c,b,a} + + # H is already advanced by 12 + @{[vsse32_v $V16, $H, $STRIDE]} # {a,b,c,d} + addi $H, $H, 16 + @{[vsse32_v $V17, $H, $STRIDE]} # {e,f,g,h} + + ret +.size sha256_block_data_order_zvknha,.-sha256_block_data_order_zvknha + +.p2align 2 +.type $K256,\@object +$K256: + .word 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5 + .word 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5 + .word 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3 + .word 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174 + .word 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc + .word 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da + .word 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7 + .word 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967 + .word 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13 + .word 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85 + .word 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3 + .word 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070 + .word 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5 + .word 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3 + .word 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208 + .word 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2 +.size $K256,.-$K256 +___ + +print $code; + +close STDOUT or die "error closing STDOUT: $!"; From patchwork Mon Mar 13 19:12:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= X-Patchwork-Id: 69071 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1371195wrd; Mon, 13 Mar 2023 12:41:30 -0700 (PDT) X-Google-Smtp-Source: AK7set8Bo1+JE7sFppb4koDVhwWjkWGQuxk2opqHVsdrnMG/DcwelFCfOwg6724UzaeYB/9JzEOw X-Received: by 2002:a17:90b:38d0:b0:22c:816e:d67d with SMTP id nn16-20020a17090b38d000b0022c816ed67dmr35769210pjb.24.1678736490646; Mon, 13 Mar 2023 12:41:30 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1678736490; cv=none; d=google.com; s=arc-20160816; b=s1o9wMawKQ7YLTodjtTK15NNMEBj9WZ6WJbsKO1Dbz9z5kwOVfF4fOXOQ2W/8ziZj4 4GyTPGEbCKz66c2PTCfXkW4bDk3q8eR0Ojoh7Qfwif8PVQuAZQ39kMw6tmx0kxV5hAXz Rk8lcXjVdrZf1EmJKggvENTz3A3LEa7BfSq9N7Ktb1q54vzy7YojGQvoQMtgyzLWjGje peQAO25LVp1YEPAn0TMIcnpmjxPwtNp4kAz22XUI2+sBUIrHUzWphOO/vGVCdJCdG+KB R6DueCjnhprBy4pWwmKW6DL3h2PN24T+Vw+/impFErayIrVovwwz+jfHoRee4r8HNWpz WP1g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=MWiccSq8Jjz3j5dCdf59xPoZ8vANtcyTQmrOs/4RXfE=; b=X41BI/kmPFDINtObzpDoVnJ+6QBmLiDjxY7jOzzMQWq0aFCkJ53iuWHOPBCWQafch1 2DgqHhP9m1uMIXd2v585sPkmcvANBkL0VlS2lOrvBA8gW8nImubvJBVtGd5ewL9Sq2sF EdbfAWaGRbpWf2QsM7QKTgsesuqNq/LBmwfDQjzEZ7Fh8XifXNQFMqHyUkvWjGDwucOi 7sWNM82frdJb3q4kCNsT5QaxAG+26EXhZ4roWZiSPS25oK/04eY0xxad4YaEZScGgY1O Xv8WWukFNzxso7ZTOJccxMkmbQw1Q37vwh/AGja+WeOROy0ufSI7Ptqn5fZxVO1tmM/Z jzYg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id x6-20020a17090aca0600b00233866f90e3si452732pjt.125.2023.03.13.12.41.15; Mon, 13 Mar 2023 12:41:30 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229932AbjCMTPu (ORCPT + 99 others); Mon, 13 Mar 2023 15:15:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41946 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229871AbjCMTPt (ORCPT ); Mon, 13 Mar 2023 15:15:49 -0400 Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0733B17142 for ; Mon, 13 Mar 2023 12:15:00 -0700 (PDT) Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169] helo=phil.lan) by gloria.sntech.de with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1pbnbX-00028k-5S; Mon, 13 Mar 2023 20:13:07 +0100 From: Heiko Stuebner To: palmer@rivosinc.com Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, heiko@sntech.de Subject: [PATCH RFC v3 13/16] RISC-V: crypto: add a vector-crypto-accelerated SHA512 implementation Date: Mon, 13 Mar 2023 20:12:59 +0100 Message-Id: <20230313191302.580787-14-heiko.stuebner@vrull.eu> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu> References: <20230313191302.580787-1-heiko.stuebner@vrull.eu> MIME-Version: 1.0 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS, T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760282794285913295?= X-GMAIL-MSGID: =?utf-8?q?1760282794285913295?= From: Heiko Stuebner This adds an accelerated SHA512 algorithm using either the Zvknhb vector crypto extension. Co-developed-by: Charalampos Mitrodimas Signed-off-by: Charalampos Mitrodimas Signed-off-by: Heiko Stuebner --- arch/riscv/crypto/Kconfig | 11 + arch/riscv/crypto/Makefile | 8 +- arch/riscv/crypto/sha512-riscv64-glue.c | 104 ++++++ arch/riscv/crypto/sha512-riscv64-zvknhb.pl | 347 +++++++++++++++++++++ 4 files changed, 469 insertions(+), 1 deletion(-) create mode 100644 arch/riscv/crypto/sha512-riscv64-glue.c create mode 100644 arch/riscv/crypto/sha512-riscv64-zvknhb.pl diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig index 8645e02171f7..da6244f0c0c4 100644 --- a/arch/riscv/crypto/Kconfig +++ b/arch/riscv/crypto/Kconfig @@ -26,4 +26,15 @@ config CRYPTO_SHA256_RISCV64 Architecture: riscv64 using - Zvknha or Zvknhb vector crypto extensions +config CRYPTO_SHA512_RISCV64 + tristate "Hash functions: SHA-512" + depends on 64BIT && RISCV_ISA_V + select CRYPTO_HASH + select CRYPTO_SHA512 + help + SHA-512 secure hash algorithm (FIPS 180) + + Architecture: riscv64 + - Zvknhb vector crypto extension + endmenu diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile index 02b3b4c32672..3c94753affdf 100644 --- a/arch/riscv/crypto/Makefile +++ b/arch/riscv/crypto/Makefile @@ -15,6 +15,9 @@ endif obj-$(CONFIG_CRYPTO_SHA256_RISCV64) += sha256-riscv64.o sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvknhb.o +obj-$(CONFIG_CRYPTO_SHA512_RISCV64) += sha512-riscv64.o +sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvknhb.o + quiet_cmd_perlasm = PERLASM $@ cmd_perlasm = $(PERL) $(<) void $(@) @@ -30,5 +33,8 @@ $(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl $(obj)/sha256-riscv64-zvknhb.S: $(src)/sha256-riscv64-zvknha.pl $(call cmd,perlasm) +$(obj)/sha512-riscv64-zvknhb.S: $(src)/sha512-riscv64-zvknhb.pl + $(call cmd,perlasm) + clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S ghash-riscv64-zvkg.S -clean-files += sha256-riscv64-zvknha.S +clean-files += sha256-riscv64-zvknha.S sha512-riscv64-zvknhb.S diff --git a/arch/riscv/crypto/sha512-riscv64-glue.c b/arch/riscv/crypto/sha512-riscv64-glue.c new file mode 100644 index 000000000000..fc35ba269bbc --- /dev/null +++ b/arch/riscv/crypto/sha512-riscv64-glue.c @@ -0,0 +1,104 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Linux/riscv64 port of the OpenSSL SHA512 implementation for RISCV64 + * + * Copyright (C) 2023 VRULL GmbH + * Author: Heiko Stuebner + */ + +#include +#include +#include +#include +#include +#include +#include + +asmlinkage void sha512_block_data_order_zvknhb(u64 *digest, const void *data, + unsigned int num_blks); + +static void __sha512_block_data_order(struct sha512_state *sst, u8 const *src, + int blocks) +{ + sha512_block_data_order_zvknhb(sst->state, src, blocks); +} + +static int sha512_update(struct shash_desc *desc, const u8 *data, + unsigned int len) +{ + if (crypto_simd_usable()) { + int ret; + + kernel_rvv_begin(); + ret = sha512_base_do_update(desc, data, len, + __sha512_block_data_order); + kernel_rvv_end(); + return ret; + } else { + return crypto_sha512_update(desc, data, len); + } +} + +static int sha512_finup(struct shash_desc *desc, const u8 *data, + unsigned int len, u8 *out) +{ + if (!crypto_simd_usable()) + return crypto_sha512_finup(desc, data, len, out); + + kernel_rvv_begin(); + if (len) + sha512_base_do_update(desc, data, len, + __sha512_block_data_order); + + sha512_base_do_finalize(desc, __sha512_block_data_order); + kernel_rvv_end(); + + return sha512_base_finish(desc, out); +} + +static int sha512_final(struct shash_desc *desc, u8 *out) +{ + return sha512_finup(desc, NULL, 0, out); +} + +static struct shash_alg sha512_alg = { + .digestsize = SHA512_DIGEST_SIZE, + .init = sha512_base_init, + .update = sha512_update, + .final = sha512_final, + .finup = sha512_finup, + .descsize = sizeof(struct sha512_state), + .base.cra_name = "sha512", + .base.cra_driver_name = "sha512-riscv64-zvknhb", + .base.cra_priority = 150, + .base.cra_blocksize = SHA512_BLOCK_SIZE, + .base.cra_module = THIS_MODULE, +}; + +static int __init sha512_mod_init(void) +{ + /* sha512 needs at least a vlen of 256 to work correctly */ + if (riscv_isa_extension_available(NULL, ZVKNHB) && + riscv_isa_extension_available(NULL, ZVKB) && + riscv_vector_vlen() >= 256) + return crypto_register_shash(&sha512_alg); + + return 0; +} + +static void __exit sha512_mod_fini(void) +{ + if (riscv_isa_extension_available(NULL, ZVKNHB) && + riscv_isa_extension_available(NULL, ZVKB) && + riscv_vector_vlen() >= 256) + crypto_unregister_shash(&sha512_alg); +} + +module_init(sha512_mod_init); +module_exit(sha512_mod_fini); + +MODULE_DESCRIPTION("SHA-512 secure hash for riscv64"); +MODULE_AUTHOR("Andy Polyakov "); +MODULE_AUTHOR("Ard Biesheuvel "); +MODULE_LICENSE("GPL v2"); +MODULE_ALIAS_CRYPTO("sha512"); diff --git a/arch/riscv/crypto/sha512-riscv64-zvknhb.pl b/arch/riscv/crypto/sha512-riscv64-zvknhb.pl new file mode 100644 index 000000000000..f7d609003358 --- /dev/null +++ b/arch/riscv/crypto/sha512-riscv64-zvknhb.pl @@ -0,0 +1,347 @@ +#! /usr/bin/env perl +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You may not use +# this file except in compliance with the License. You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html + +# The generated code of this file depends on the following RISC-V extensions: +# - RV64I +# - RISC-V vector ('V') with VLEN >= 256 +# - Vector Bit-manipulation used in Cryptography ('Zvkb') +# - Vector SHA-2 Secure Hash ('Zvknhb') + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extension) +# $flavour is the first argument if it doesn't look like a file +my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef; +my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef; + +$output and open STDOUT,">$output"; + +my $code=<<___; +.text +___ + +my ($V0, $V10, $V11, $V12, $V13, $V14, $V15, $V16, $V17) = ("v0", "v10", "v11", "v12", "v13", "v14","v15", "v16", "v17"); +my ($V26, $V27) = ("v26", "v27"); + +my $K512 = "K512"; + +# Function arguments +my ($H, $INP, $LEN, $KT, $STRIDE) = ("a0", "a1", "a2", "a3", "t3"); + +################################################################################ +# void sha512_block_data_order(void *c, const void *p, size_t len) +$code .= <<___; +.p2align 2 +.globl sha512_block_data_order_zvknhb +.type sha512_block_data_order_zvknhb,\@function +sha512_block_data_order_zvknhb: + @{[vsetivli__x0_4_e64_m1_ta_ma]} + + # H is stored as {a,b,c,d},{e,f,g,h}, but we need {f,e,b,a},{h,g,d,c} + # We achieve this by reading with a negative stride followed by + # element sliding. + li $STRIDE, -8 + addi $H, $H, 24 + @{[vlse64_v $V16, $H, $STRIDE]} # {d,c,b,a} + addi $H, $H, 32 + @{[vlse64_v $V17, $H, $STRIDE]} # {h,g,f,e} + # Keep H advanced by 24 + addi $H, $H, -32 + + @{[vmv_v_v $V27, $V16]} # {d,c,b,a} + @{[vslidedown_vi $V26, $V16, 2]} # {b,a,0,0} + @{[vslidedown_vi $V16, $V17, 2]} # {f,e,0,0} + @{[vslideup_vi $V16, $V26, 2]} # {f,e,b,a} + @{[vslideup_vi $V17, $V27, 2]} # {h,g,d,c} + + # Keep the old state as we need it later: H' = H+{a',b',c',...,h'}. + @{[vmv_v_v $V26, $V16]} + @{[vmv_v_v $V27, $V17]} + +L_round_loop: + la $KT, $K512 # Load round constants K512 + + # Load the 1024-bits of the message block in v10-v13 and perform + # an endian swap on each 4 bytes element. + @{[vle64_v $V10, $INP]} + @{[vrev8_v $V10, $V10]} + add $INP, $INP, 32 + @{[vle64_v $V11, $INP]} + @{[vrev8_v $V11, $V11]} + add $INP, $INP, 32 + @{[vle64_v $V12, $INP]} + @{[vrev8_v $V12, $V12]} + add $INP, $INP, 32 + @{[vle64_v $V13, $INP]} + @{[vrev8_v $V13, $V13]} + add $INP, $INP, 32 + + # Decrement length by 1 + add $LEN, $LEN, -1 + + # Set v0 up for the vmerge that replaces the first word (idx==0) + @{[vid_v $V0]} + @{[vmseq_vi $V0, $V0, 0x0]} # v0.mask[i] = (i == 0 ? 1 : 0) + + # Quad-round 0 (+0, v10->v11->v12->v13) + @{[vle64_v $V15, ($KT)]} + addi $KT, $KT, 32 + @{[vadd_vv $V14, $V15, $V10]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V12, $V11, $V0]} + @{[vsha2ms_vv $V10, $V14, $V13]} + + # Quad-round 1 (+1, v11->v12->v13->v10) + @{[vle64_v $V15, ($KT)]} + addi $KT, $KT, 32 + @{[vadd_vv $V14, $V15, $V11]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V13, $V12, $V0]} + @{[vsha2ms_vv $V11, $V14, $V10]} + + # Quad-round 2 (+2, v12->v13->v10->v11) + @{[vle64_v $V15, ($KT)]} + addi $KT, $KT, 32 + @{[vadd_vv $V14, $V15, $V12]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V10, $V13, $V0]} + @{[vsha2ms_vv $V12, $V14, $V11]} + + # Quad-round 3 (+3, v13->v10->v11->v12) + @{[vle64_v $V15, ($KT)]} + addi $KT, $KT, 32 + @{[vadd_vv $V14, $V15, $V13]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V11, $V10, $V0]} + @{[vsha2ms_vv $V13, $V14, $V12]} + + # Quad-round 4 (+0, v10->v11->v12->v13) + @{[vle64_v $V15, ($KT)]} + addi $KT, $KT, 32 + @{[vadd_vv $V14, $V15, $V10]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V12, $V11, $V0]} + @{[vsha2ms_vv $V10, $V14, $V13]} + + # Quad-round 5 (+1, v11->v12->v13->v10) + @{[vle64_v $V15, ($KT)]} + addi $KT, $KT, 32 + @{[vadd_vv $V14, $V15, $V11]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V13, $V12, $V0]} + @{[vsha2ms_vv $V11, $V14, $V10]} + + # Quad-round 6 (+2, v12->v13->v10->v11) + @{[vle64_v $V15, ($KT)]} + addi $KT, $KT, 32 + @{[vadd_vv $V14, $V15, $V12]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V10, $V13, $V0]} + @{[vsha2ms_vv $V12, $V14, $V11]} + + # Quad-round 7 (+3, v13->v10->v11->v12) + @{[vle64_v $V15, ($KT)]} + addi $KT, $KT, 32 + @{[vadd_vv $V14, $V15, $V13]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V11, $V10, $V0]} + @{[vsha2ms_vv $V13, $V14, $V12]} + + # Quad-round 8 (+0, v10->v11->v12->v13) + @{[vle64_v $V15, ($KT)]} + addi $KT, $KT, 32 + @{[vadd_vv $V14, $V15, $V10]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V12, $V11, $V0]} + @{[vsha2ms_vv $V10, $V14, $V13]} + + # Quad-round 9 (+1, v11->v12->v13->v10) + @{[vle64_v $V15, ($KT)]} + addi $KT, $KT, 32 + @{[vadd_vv $V14, $V15, $V11]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V13, $V12, $V0]} + @{[vsha2ms_vv $V11, $V14, $V10]} + + # Quad-round 10 (+2, v12->v13->v10->v11) + @{[vle64_v $V15, ($KT)]} + addi $KT, $KT, 32 + @{[vadd_vv $V14, $V15, $V12]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V10, $V13, $V0]} + @{[vsha2ms_vv $V12, $V14, $V11]} + + # Quad-round 11 (+3, v13->v10->v11->v12) + @{[vle64_v $V15, ($KT)]} + addi $KT, $KT, 32 + @{[vadd_vv $V14, $V15, $V13]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V11, $V10, $V0]} + @{[vsha2ms_vv $V13, $V14, $V12]} + + # Quad-round 12 (+0, v10->v11->v12->v13) + @{[vle64_v $V15, ($KT)]} + addi $KT, $KT, 32 + @{[vadd_vv $V14, $V15, $V10]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V12, $V11, $V0]} + @{[vsha2ms_vv $V10, $V14, $V13]} + + # Quad-round 13 (+1, v11->v12->v13->v10) + @{[vle64_v $V15, ($KT)]} + addi $KT, $KT, 32 + @{[vadd_vv $V14, $V15, $V11]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V13, $V12, $V0]} + @{[vsha2ms_vv $V11, $V14, $V10]} + + # Quad-round 14 (+2, v12->v13->v10->v11) + @{[vle64_v $V15, ($KT)]} + addi $KT, $KT, 32 + @{[vadd_vv $V14, $V15, $V12]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V10, $V13, $V0]} + @{[vsha2ms_vv $V12, $V14, $V11]} + + # Quad-round 15 (+3, v13->v10->v11->v12) + @{[vle64_v $V15, ($KT)]} + addi $KT, $KT, 32 + @{[vadd_vv $V14, $V15, $V13]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V11, $V10, $V0]} + @{[vsha2ms_vv $V13, $V14, $V12]} + + # Quad-round 16 (+0, v10->v11->v12->v13) + # Note that we stop generating new message schedule words (Wt, v10-13) + # as we already generated all the words we end up consuming (i.e., W[79:76]). + @{[vle64_v $V15, ($KT)]} + addi $KT, $KT, 32 + @{[vadd_vv $V14, $V15, $V10]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V12, $V11, $V0]} + + # Quad-round 17 (+1, v11->v12->v13->v10) + @{[vle64_v $V15, ($KT)]} + addi $KT, $KT, 32 + @{[vadd_vv $V14, $V15, $V11]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V13, $V12, $V0]} + + # Quad-round 18 (+2, v12->v13->v10->v11) + @{[vle64_v $V15, ($KT)]} + addi $KT, $KT, 32 + @{[vadd_vv $V14, $V15, $V12]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V10, $V13, $V0]} + + # Quad-round 19 (+3, v13->v10->v11->v12) + @{[vle64_v $V15, ($KT)]} + # No t1 increment needed. + @{[vadd_vv $V14, $V15, $V13]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vsha2ch_vv $V16, $V17, $V14]} + + # H' = H+{a',b',c',...,h'} + @{[vadd_vv $V16, $V26, $V16]} + @{[vadd_vv $V17, $V27, $V17]} + @{[vmv_v_v $V26, $V16]} + @{[vmv_v_v $V27, $V17]} + bnez $LEN, L_round_loop + + # v26 = v16 = {f,e,b,a} + # v27 = v17 = {h,g,d,c} + # Let's do the opposit transformation like on entry. + + @{[vslideup_vi $V17, $V16, 2]} # {h,g,f,e} + + @{[vslidedown_vi $V16, $V27, 2]} # {d,c,0,0} + @{[vslidedown_vi $V26, $V26, 2]} # {b,a,0,0} + @{[vslideup_vi $V16, $V26, 2]} # {d,c,b,a} + + # H is already advanced by 24 + @{[vsse64_v $V16, $H, $STRIDE]} # {a,b,c,d} + addi $H, $H, 32 + @{[vsse64_v $V17, $H, $STRIDE]} # {e,f,g,h} + + ret +.size sha512_block_data_order_zvknhb,.-sha512_block_data_order_zvknhb + +.p2align 3 +.type $K512,\@object +$K512: + .dword 0x428a2f98d728ae22, 0x7137449123ef65cd + .dword 0xb5c0fbcfec4d3b2f, 0xe9b5dba58189dbbc + .dword 0x3956c25bf348b538, 0x59f111f1b605d019 + .dword 0x923f82a4af194f9b, 0xab1c5ed5da6d8118 + .dword 0xd807aa98a3030242, 0x12835b0145706fbe + .dword 0x243185be4ee4b28c, 0x550c7dc3d5ffb4e2 + .dword 0x72be5d74f27b896f, 0x80deb1fe3b1696b1 + .dword 0x9bdc06a725c71235, 0xc19bf174cf692694 + .dword 0xe49b69c19ef14ad2, 0xefbe4786384f25e3 + .dword 0x0fc19dc68b8cd5b5, 0x240ca1cc77ac9c65 + .dword 0x2de92c6f592b0275, 0x4a7484aa6ea6e483 + .dword 0x5cb0a9dcbd41fbd4, 0x76f988da831153b5 + .dword 0x983e5152ee66dfab, 0xa831c66d2db43210 + .dword 0xb00327c898fb213f, 0xbf597fc7beef0ee4 + .dword 0xc6e00bf33da88fc2, 0xd5a79147930aa725 + .dword 0x06ca6351e003826f, 0x142929670a0e6e70 + .dword 0x27b70a8546d22ffc, 0x2e1b21385c26c926 + .dword 0x4d2c6dfc5ac42aed, 0x53380d139d95b3df + .dword 0x650a73548baf63de, 0x766a0abb3c77b2a8 + .dword 0x81c2c92e47edaee6, 0x92722c851482353b + .dword 0xa2bfe8a14cf10364, 0xa81a664bbc423001 + .dword 0xc24b8b70d0f89791, 0xc76c51a30654be30 + .dword 0xd192e819d6ef5218, 0xd69906245565a910 + .dword 0xf40e35855771202a, 0x106aa07032bbd1b8 + .dword 0x19a4c116b8d2d0c8, 0x1e376c085141ab53 + .dword 0x2748774cdf8eeb99, 0x34b0bcb5e19b48a8 + .dword 0x391c0cb3c5c95a63, 0x4ed8aa4ae3418acb + .dword 0x5b9cca4f7763e373, 0x682e6ff3d6b2b8a3 + .dword 0x748f82ee5defb2fc, 0x78a5636f43172f60 + .dword 0x84c87814a1f0ab72, 0x8cc702081a6439ec + .dword 0x90befffa23631e28, 0xa4506cebde82bde9 + .dword 0xbef9a3f7b2c67915, 0xc67178f2e372532b + .dword 0xca273eceea26619c, 0xd186b8c721c0c207 + .dword 0xeada7dd6cde0eb1e, 0xf57d4f7fee6ed178 + .dword 0x06f067aa72176fba, 0x0a637dc5a2c898a6 + .dword 0x113f9804bef90dae, 0x1b710b35131c471b + .dword 0x28db77f523047d84, 0x32caab7b40c72493 + .dword 0x3c9ebe0a15c9bebc, 0x431d67c49c100d4c + .dword 0x4cc5d4becb3e42b6, 0x597f299cfc657e2a + .dword 0x5fcb6fab3ad6faec, 0x6c44198c4a475817 +.size $K512,.-$K512 +___ + +print $code; + +close STDOUT or die "error closing STDOUT: $!"; From patchwork Mon Mar 13 19:13:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= X-Patchwork-Id: 69066 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1365202wrd; Mon, 13 Mar 2023 12:27:02 -0700 (PDT) X-Google-Smtp-Source: AK7set/qHzZ5lK73Rd1clJH87uqaSEdeowO6S5txafi/Eincy6AXMgGa9a3WG3IMyoBOxpNB3/51 X-Received: by 2002:a17:902:dacf:b0:1a0:6010:dd0d with SMTP id q15-20020a170902dacf00b001a06010dd0dmr1315575plx.36.1678735621880; Mon, 13 Mar 2023 12:27:01 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1678735621; cv=none; d=google.com; s=arc-20160816; b=pMkEOS0yzeJocLZvesImpFBr1T5JKDYEnpJiqu60FCPmB11+khyD3HrDCqgfP3ULzl 6L1L3xgapbr2xMaaMv0tDe1oRTYekTEv0Cbw9G0RSI84tqiQWVrI8Xihn/7qqlnEQGmf o7GlIPFD6uP+uVMDsV6ioGk+wR/a+7VO3gRt143TvS9U9xHqj9pl8jCZ/7vLAOydsHjX DtvViMXtMmhmsoqQNDev4k4jedCVda1qEzDhSNsknfSdxYbLVc84tr781+c1ioN0fkvZ x0L3k7KBgdsfLq8/lspqa9TbMtcZJG2FiI7mCYOoQoozsl5CxLBVEaY5OfzAXWPBjelo 5Xgg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=A9u9hc1S1QECegBlwDOljP1DV4KYpwT9fkH6i+GEyUY=; b=i0ag+FYxbuvwFmDuHXR38+CTuD6ByDDpRx2uOuD/MhElL/IAiBa6pY2VeTHa03Jb9u 5o/eDpoPHdSE1JWCWrXHSPTahuZyZOApStoNNX+mFSIWYbgDUFmNb0YbFy2bOcdBcoYW zSRhC/G7KzRdboKlu3ir+BLRhdy6jahkmzZK1DPpQ+zop3GaQlDbEOZxc+ux1ZxmKeEn fJpAOZrF9jR519vs4Z4RQcIyVJo8ucuFhgVhchSmBeJk0g/m/tjqNl0PixRUSgEA10px Afszc4JTrZPs068DITJBg6iCV+JT4VgvDSD0OzqfqD31pDedbtLXokmXFaB2eM2BzL10 9BPg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id c8-20020a170902724800b001a05e6d4b08si480210pll.40.2023.03.13.12.26.47; Mon, 13 Mar 2023 12:27:01 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231144AbjCMTUm (ORCPT + 99 others); Mon, 13 Mar 2023 15:20:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51214 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230039AbjCMTUe (ORCPT ); Mon, 13 Mar 2023 15:20:34 -0400 Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CEAD51165E for ; Mon, 13 Mar 2023 12:20:01 -0700 (PDT) Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169] helo=phil.lan) by gloria.sntech.de with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1pbnbX-00028k-Ds; Mon, 13 Mar 2023 20:13:07 +0100 From: Heiko Stuebner To: palmer@rivosinc.com Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, heiko@sntech.de Subject: [PATCH RFC v3 14/16] RISC-V: crypto: add Zvkned accelerated AES encryption implementation Date: Mon, 13 Mar 2023 20:13:00 +0100 Message-Id: <20230313191302.580787-15-heiko.stuebner@vrull.eu> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu> References: <20230313191302.580787-1-heiko.stuebner@vrull.eu> MIME-Version: 1.0 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS, T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760281883260865783?= X-GMAIL-MSGID: =?utf-8?q?1760281883260865783?= From: Heiko Stuebner This adds an AES implementation using the Zvkned vector crypto instructions. Co-developed-by: Christoph Müllner Signed-off-by: Christoph Müllner Signed-off-by: Heiko Stuebner --- arch/riscv/crypto/Kconfig | 14 + arch/riscv/crypto/Makefile | 7 + arch/riscv/crypto/aes-riscv-glue.c | 169 ++++++++ arch/riscv/crypto/aes-riscv64-zvkned.pl | 500 ++++++++++++++++++++++++ 4 files changed, 690 insertions(+) create mode 100644 arch/riscv/crypto/aes-riscv-glue.c create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned.pl diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig index da6244f0c0c4..c8abb29bb49b 100644 --- a/arch/riscv/crypto/Kconfig +++ b/arch/riscv/crypto/Kconfig @@ -2,6 +2,20 @@ menu "Accelerated Cryptographic Algorithms for CPU (riscv)" +config CRYPTO_AES_RISCV + tristate "Ciphers: AES (RISCV)" + depends on 64BIT && RISCV_ISA_V + select CRYPTO_AES + help + Block ciphers: AES cipher algorithms (FIPS-197) + Length-preserving ciphers: AES with ECB, CBC, CTR, CTS, + XCTR, and XTS modes + AEAD cipher: AES with CBC, ESSIV, and SHA-256 + for fscrypt and dm-crypt + + Architecture: riscv using one of + - Zvkns + config CRYPTO_GHASH_RISCV64 tristate "Hash functions: GHASH" depends on 64BIT && (RISCV_ISA_ZBC || RISCV_ISA_V) diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile index 3c94753affdf..e5c702dff883 100644 --- a/arch/riscv/crypto/Makefile +++ b/arch/riscv/crypto/Makefile @@ -3,6 +3,9 @@ # linux/arch/riscv/crypto/Makefile # +obj-$(CONFIG_CRYPTO_AES_RISCV) += aes-riscv.o +aes-riscv-y := aes-riscv-glue.o aes-riscv64-zvkned.o + obj-$(CONFIG_CRYPTO_GHASH_RISCV64) += ghash-riscv64.o ghash-riscv64-y := ghash-riscv64-glue.o ifdef CONFIG_RISCV_ISA_ZBC @@ -21,6 +24,9 @@ sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvknhb.o quiet_cmd_perlasm = PERLASM $@ cmd_perlasm = $(PERL) $(<) void $(@) +$(obj)/aes-riscv64-zvkned.S: $(src)/aes-riscv64-zvkned.pl + $(call cmd,perlasm) + $(obj)/ghash-riscv64-zbc.S: $(src)/ghash-riscv64-zbc.pl $(call cmd,perlasm) @@ -36,5 +42,6 @@ $(obj)/sha256-riscv64-zvknhb.S: $(src)/sha256-riscv64-zvknha.pl $(obj)/sha512-riscv64-zvknhb.S: $(src)/sha512-riscv64-zvknhb.pl $(call cmd,perlasm) +clean-files += aes-riscv64-zvkned.S clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S ghash-riscv64-zvkg.S clean-files += sha256-riscv64-zvknha.S sha512-riscv64-zvknhb.S diff --git a/arch/riscv/crypto/aes-riscv-glue.c b/arch/riscv/crypto/aes-riscv-glue.c new file mode 100644 index 000000000000..f0b73058bb54 --- /dev/null +++ b/arch/riscv/crypto/aes-riscv-glue.c @@ -0,0 +1,169 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Linux/riscv port of the OpenSSL AES implementation for RISCV + * + * Copyright (C) 2023 VRULL GmbH + * Author: Heiko Stuebner + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +struct aes_key { + u8 key[AES_MAX_KEYLENGTH]; + int rounds; +}; + +/* variant using the zvkned vector crypto extension */ +void rv64i_zvkned_encrypt(const u8 *in, u8 *out, const struct aes_key *key); +void rv64i_zvkned_decrypt(const u8 *in, u8 *out, const struct aes_key *key); +int rv64i_zvkned_set_encrypt_key(const u8 *userKey, const int bits, + struct aes_key *key); +int rv64i_zvkned_set_decrypt_key(const u8 *userKey, const int bits, + struct aes_key *key); + +struct riscv_aes_ctx { + struct crypto_cipher *fallback; + struct aes_key enc_key; + struct aes_key dec_key; + unsigned int keylen; +}; + +static int riscv64_aes_init_zvkned(struct crypto_tfm *tfm) +{ + struct riscv_aes_ctx *ctx = crypto_tfm_ctx(tfm); + const char *alg = crypto_tfm_alg_name(tfm); + struct crypto_cipher *fallback; + + fallback = crypto_alloc_cipher(alg, 0, CRYPTO_ALG_NEED_FALLBACK); + if (IS_ERR(fallback)) { + printk(KERN_ERR + "Failed to allocate transformation for '%s': %ld\n", + alg, PTR_ERR(fallback)); + return PTR_ERR(fallback); + } + + crypto_cipher_set_flags(fallback, + crypto_cipher_get_flags((struct + crypto_cipher *) + tfm)); + ctx->fallback = fallback; + + return 0; +} + +static void riscv_aes_exit(struct crypto_tfm *tfm) +{ + struct riscv_aes_ctx *ctx = crypto_tfm_ctx(tfm); + + if (ctx->fallback) { + crypto_free_cipher(ctx->fallback); + ctx->fallback = NULL; + } +} + +static int riscv64_aes_setkey_zvkned(struct crypto_tfm *tfm, const u8 *key, + unsigned int keylen) +{ + struct riscv_aes_ctx *ctx = crypto_tfm_ctx(tfm); + int ret; + + ctx->keylen = keylen; + + if (keylen == 16 || keylen == 32) { + kernel_rvv_begin(); + ret = rv64i_zvkned_set_encrypt_key(key, keylen * 8, &ctx->enc_key); + if (ret != 1) { + kernel_rvv_end(); + return -EINVAL; + } + + ret = rv64i_zvkned_set_decrypt_key(key, keylen * 8, &ctx->dec_key); + kernel_rvv_end(); + if (ret != 1) + return -EINVAL; + } + + ret = crypto_cipher_setkey(ctx->fallback, key, keylen); + + return ret ? -EINVAL : 0; +} + +static void riscv64_aes_encrypt_zvkned(struct crypto_tfm *tfm, u8 *dst, const u8 *src) +{ + struct riscv_aes_ctx *ctx = crypto_tfm_ctx(tfm); + + if (crypto_simd_usable() && (ctx->keylen == 16 || ctx->keylen == 32)) { + kernel_rvv_begin(); + rv64i_zvkned_encrypt(src, dst, &ctx->enc_key); + kernel_rvv_end(); + } else { + crypto_cipher_encrypt_one(ctx->fallback, dst, src); + } +} + +static void riscv64_aes_decrypt_zvkned(struct crypto_tfm *tfm, u8 *dst, const u8 *src) +{ + struct riscv_aes_ctx *ctx = crypto_tfm_ctx(tfm); + + if (crypto_simd_usable() && (ctx->keylen == 16 || ctx->keylen == 32)) { + kernel_rvv_begin(); + rv64i_zvkned_decrypt(src, dst, &ctx->dec_key); + kernel_rvv_end(); + } else { + crypto_cipher_decrypt_one(ctx->fallback, dst, src); + } +} + +struct crypto_alg riscv64_aes_zvkned_alg = { + .cra_name = "aes", + .cra_driver_name = "riscv-aes-zvkned", + .cra_module = THIS_MODULE, + .cra_priority = 300, + .cra_type = NULL, + .cra_flags = CRYPTO_ALG_TYPE_CIPHER | CRYPTO_ALG_NEED_FALLBACK, + .cra_alignmask = 0, + .cra_blocksize = AES_BLOCK_SIZE, + .cra_ctxsize = sizeof(struct riscv_aes_ctx), + .cra_init = riscv64_aes_init_zvkned, + .cra_exit = riscv_aes_exit, + .cra_cipher = { + .cia_min_keysize = AES_MIN_KEY_SIZE, + .cia_max_keysize = AES_MAX_KEY_SIZE, + .cia_setkey = riscv64_aes_setkey_zvkned, + .cia_encrypt = riscv64_aes_encrypt_zvkned, + .cia_decrypt = riscv64_aes_decrypt_zvkned, + }, +}; + +static int __init riscv_aes_mod_init(void) +{ + if (riscv_isa_extension_available(NULL, ZVKNED) && + riscv_vector_vlen() >= 128) + return crypto_register_alg(&riscv64_aes_zvkned_alg); + + return 0; +} + +static void __exit riscv_aes_mod_fini(void) +{ + if (riscv_isa_extension_available(NULL, ZVKNED) && + riscv_vector_vlen() >= 128) + return crypto_unregister_alg(&riscv64_aes_zvkned_alg); +} + +module_init(riscv_aes_mod_init); +module_exit(riscv_aes_mod_fini); + +MODULE_DESCRIPTION("AES (accelerated)"); +MODULE_AUTHOR("Heiko Stuebner "); +MODULE_LICENSE("GPL v2"); +MODULE_ALIAS_CRYPTO("aes"); diff --git a/arch/riscv/crypto/aes-riscv64-zvkned.pl b/arch/riscv/crypto/aes-riscv64-zvkned.pl new file mode 100644 index 000000000000..176588723220 --- /dev/null +++ b/arch/riscv/crypto/aes-riscv64-zvkned.pl @@ -0,0 +1,500 @@ +#! /usr/bin/env perl +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You may not use +# this file except in compliance with the License. You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html + +# - RV64I +# - RISC-V vector ('V') with VLEN >= 128 +# - RISC-V vector crypto AES extension ('Zvkned') + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extension) +# $flavour is the first argument if it doesn't look like a file +my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef; +my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef; + +$output and open STDOUT,">$output"; + +my $code=<<___; +.text +___ + +################################################################################ +# int rv64i_zvkned_set_encrypt_key(const unsigned char *userKey, const int bits, +# AES_KEY *key) +# int rv64i_zvkned_set_decrypt_key(const unsigned char *userKey, const int bits, +# AES_KEY *key) +{ +my ($UKEY,$BITS,$KEYP) = ("a0", "a1", "a2"); +my ($T0,$T1,$T4) = ("t1", "t2", "t4"); +my ($v0, $v1, $v2, $v3, $v4, $v5, $v6, + $v7, $v8, $v9, $v10, $v11, $v12, + $v13, $v14, $v15, $v16, $v17, $v18, + $v19, $v20, $v21, $v22, $v23, $v24, +) = map("v$_",(0..24)); + +$code .= <<___; +.p2align 3 +.globl rv64i_zvkned_set_encrypt_key +.type rv64i_zvkned_set_encrypt_key,\@function +rv64i_zvkned_set_encrypt_key: + beqz $UKEY, L_fail_m1 + beqz $KEYP, L_fail_m1 + + # Get proper routine for key size + li $T0, 256 + beq $BITS, $T0, L_set_key_256 + li $T0, 128 + beq $BITS, $T0, L_set_key_128 + + j L_fail_m2 + +.size rv64i_zvkned_set_encrypt_key,.-rv64i_zvkned_set_encrypt_key +___ + +$code .= <<___; +.p2align 3 +.globl rv64i_zvkned_set_decrypt_key +.type rv64i_zvkned_set_decrypt_key,\@function +rv64i_zvkned_set_decrypt_key: + beqz $UKEY, L_fail_m1 + beqz $KEYP, L_fail_m1 + + # Get proper routine for key size + li $T0, 256 + beq $BITS, $T0, L_set_key_256 + li $T0, 128 + beq $BITS, $T0, L_set_key_128 + + j L_fail_m2 + +.size rv64i_zvkned_set_decrypt_key,.-rv64i_zvkned_set_decrypt_key +___ + +$code .= <<___; +.p2align 3 +L_set_key_128: + # Store the number of rounds + li $T1, 10 + sw $T1, 240($KEYP) + + @{[vsetivli__x0_4_e32_m1_ta_ma]} + + # Load the key + @{[vle32_v $v10, ($UKEY)]} + + # Generate keys for round 2-11 into registers v11-v20. + @{[vaeskf1_vi $v11, $v10, 1]} # v11 <- rk2 (w[ 4, 7]) + @{[vaeskf1_vi $v12, $v11, 2]} # v12 <- rk3 (w[ 8,11]) + @{[vaeskf1_vi $v13, $v12, 3]} # v13 <- rk4 (w[12,15]) + @{[vaeskf1_vi $v14, $v13, 4]} # v14 <- rk5 (w[16,19]) + @{[vaeskf1_vi $v15, $v14, 5]} # v15 <- rk6 (w[20,23]) + @{[vaeskf1_vi $v16, $v15, 6]} # v16 <- rk7 (w[24,27]) + @{[vaeskf1_vi $v17, $v16, 7]} # v17 <- rk8 (w[28,31]) + @{[vaeskf1_vi $v18, $v17, 8]} # v18 <- rk9 (w[32,35]) + @{[vaeskf1_vi $v19, $v18, 9]} # v19 <- rk10 (w[36,39]) + @{[vaeskf1_vi $v20, $v19, 10]} # v20 <- rk11 (w[40,43]) + + # Store the round keys + @{[vse32_v $v10, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $v11, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $v12, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $v13, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $v14, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $v15, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $v16, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $v17, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $v18, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $v19, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $v20, ($KEYP)]} + + li a0, 1 + ret +.size L_set_key_128,.-L_set_key_128 +___ + +$code .= <<___; +.p2align 3 +L_set_key_256: + # Store the number of rounds + li $T1, 14 + sw $T1, 240($KEYP) + + @{[vsetivli__x0_4_e32_m1_ta_ma]} + + # Load the key + @{[vle32_v $v10, ($UKEY)]} + addi $UKEY, $UKEY, 16 + @{[vle32_v $v11, ($UKEY)]} + + @{[vmv_v_v $v12, $v10]} + @{[vaeskf2_vi $v12, $v11, 1]} + @{[vmv_v_v $v13, $v11]} + @{[vaeskf2_vi $v13, $v12, 2]} + @{[vmv_v_v $v14, $v12]} + @{[vaeskf2_vi $v14, $v13, 3]} + @{[vmv_v_v $v15, $v13]} + @{[vaeskf2_vi $v15, $v14, 4]} + @{[vmv_v_v $v16, $v14]} + @{[vaeskf2_vi $v16, $v15, 5]} + @{[vmv_v_v $v17, $v15]} + @{[vaeskf2_vi $v17, $v16, 6]} + @{[vmv_v_v $v18, $v16]} + @{[vaeskf2_vi $v18, $v17, 7]} + @{[vmv_v_v $v19, $v17]} + @{[vaeskf2_vi $v19, $v18, 8]} + @{[vmv_v_v $v20, $v18]} + @{[vaeskf2_vi $v20, $v19, 9]} + @{[vmv_v_v $v21, $v19]} + @{[vaeskf2_vi $v21, $v20, 10]} + @{[vmv_v_v $v22, $v20]} + @{[vaeskf2_vi $v22, $v21, 11]} + @{[vmv_v_v $v23, $v21]} + @{[vaeskf2_vi $v23, $v22, 12]} + @{[vmv_v_v $v24, $v22]} + @{[vaeskf2_vi $v24, $v23, 13]} + + @{[vse32_v $v10, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $v11, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $v12, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $v13, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $v14, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $v15, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $v16, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $v17, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $v18, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $v19, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $v20, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $v21, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $v22, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $v23, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $v24, ($KEYP)]} + + li a0, 1 + ret +.size L_set_key_256,.-L_set_key_256 +___ +} + +################################################################################ +# void rv64i_zvkned_encrypt(const unsigned char *in, unsigned char *out, +# const AES_KEY *key); +{ +my ($INP,$OUTP,$KEYP) = ("a0", "a1", "a2"); +my ($T0,$T1, $rounds, $T6) = ("a3", "a4", "t5", "t6"); +my ($v0, $v1, $v2, $v3, $v4, $v5, $v6, + $v7, $v8, $v9, $v10, $v11, $v12, + $v13, $v14, $v15, $v16, $v17, $v18, + $v19, $v20, $v21, $v22, $v23, $v24, +) = map("v$_",(0..24)); + +$code .= <<___; +.p2align 3 +.globl rv64i_zvkned_encrypt +.type rv64i_zvkned_encrypt,\@function +rv64i_zvkned_encrypt: + # Load number of rounds + lwu $rounds, 240($KEYP) + + # Get proper routine for key size + li $T6, 14 + beq $rounds, $T6, L_enc_256 + li $T6, 10 + beq $rounds, $T6, L_enc_128 + + j L_fail_m2 +.size rv64i_zvkned_encrypt,.-rv64i_zvkned_encrypt +___ + +$code .= <<___; +.p2align 3 +L_enc_128: + @{[vsetivli__x0_4_e32_m1_ta_ma]} + + @{[vle32_v $v10, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v11, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v12, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v13, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v14, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v15, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v16, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v17, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v18, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v19, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v20, ($KEYP)]} + + @{[vle32_v $v1, ($INP)]} + + @{[vaesz_vs $v1, $v10]} # with round key w[ 0, 3] + @{[vaesem_vs $v1, $v11]} # with round key w[ 4, 7] + @{[vaesem_vs $v1, $v12]} # with round key w[ 8,11] + @{[vaesem_vs $v1, $v13]} # with round key w[12,15] + @{[vaesem_vs $v1, $v14]} # with round key w[16,19] + @{[vaesem_vs $v1, $v15]} # with round key w[20,23] + @{[vaesem_vs $v1, $v16]} # with round key w[24,27] + @{[vaesem_vs $v1, $v17]} # with round key w[28,31] + @{[vaesem_vs $v1, $v18]} # with round key w[32,35] + @{[vaesem_vs $v1, $v19]} # with round key w[36,39] + @{[vaesef_vs $v1, $v20]} # with round key w[40,43] + + @{[vse32_v $v1, ($OUTP)]} + + ret +.size L_enc_128,.-L_enc_128 +___ + +$code .= <<___; +.p2align 3 +L_enc_256: + @{[vsetivli__x0_4_e32_m1_ta_ma]} + + @{[vle32_v $v10, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v11, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v12, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v13, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v14, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v15, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v16, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v17, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v18, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v19, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v20, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v21, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v22, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v23, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v24, ($KEYP)]} + + @{[vle32_v $v1, ($INP)]} + + @{[vaesz_vs $v1, $v10]} # with round key w[ 0, 3] + @{[vaesem_vs $v1, $v11]} + @{[vaesem_vs $v1, $v12]} + @{[vaesem_vs $v1, $v13]} + @{[vaesem_vs $v1, $v14]} + @{[vaesem_vs $v1, $v15]} + @{[vaesem_vs $v1, $v16]} + @{[vaesem_vs $v1, $v17]} + @{[vaesem_vs $v1, $v18]} + @{[vaesem_vs $v1, $v19]} + @{[vaesem_vs $v1, $v20]} + @{[vaesem_vs $v1, $v21]} + @{[vaesem_vs $v1, $v22]} + @{[vaesem_vs $v1, $v23]} + @{[vaesef_vs $v1, $v24]} + + @{[vse32_v $v1, ($OUTP)]} + ret +.size L_enc_256,.-L_enc_256 +___ +} + +################################################################################ +# void rv64i_zvkned_decrypt(const unsigned char *in, unsigned char *out, +# const AES_KEY *key); +{ +my ($INP,$OUTP,$KEYP) = ("a0", "a1", "a2"); +my ($T0,$T1, $rounds, $T6) = ("a3", "a4", "t5", "t6"); +my ($v0, $v1, $v2, $v3, $v4, $v5, $v6, + $v7, $v8, $v9, $v10, $v11, $v12, + $v13, $v14, $v15, $v16, $v17, $v18, + $v19, $v20, $v21, $v22, $v23, $v24, +) = map("v$_",(0..24)); + +$code .= <<___; +.p2align 3 +.globl rv64i_zvkned_decrypt +.type rv64i_zvkned_decrypt,\@function +rv64i_zvkned_decrypt: + # Load number of rounds + lwu $rounds, 240($KEYP) + + # Get proper routine for key size + li $T6, 14 + beq $rounds, $T6, L_dec_256 + li $T6, 10 + beq $rounds, $T6, L_dec_128 + + j L_fail_m2 +.size rv64i_zvkned_decrypt,.-rv64i_zvkned_decrypt +___ + +$code .= <<___; +.p2align 3 +L_dec_128: + @{[vsetivli__x0_4_e32_m1_ta_ma]} + + @{[vle32_v $v10, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v11, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v12, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v13, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v14, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v15, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v16, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v17, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v18, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v19, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v20, ($KEYP)]} + + @{[vle32_v $v1, ($INP)]} + + @{[vaesz_vs $v1, $v20]} # with round key w[43,47] + @{[vaesdm_vs $v1, $v19]} # with round key w[36,39] + @{[vaesdm_vs $v1, $v18]} # with round key w[32,35] + @{[vaesdm_vs $v1, $v17]} # with round key w[28,31] + @{[vaesdm_vs $v1, $v16]} # with round key w[24,27] + @{[vaesdm_vs $v1, $v15]} # with round key w[20,23] + @{[vaesdm_vs $v1, $v14]} # with round key w[16,19] + @{[vaesdm_vs $v1, $v13]} # with round key w[12,15] + @{[vaesdm_vs $v1, $v12]} # with round key w[ 8,11] + @{[vaesdm_vs $v1, $v11]} # with round key w[ 4, 7] + @{[vaesdf_vs $v1, $v10]} # with round key w[ 0, 3] + + @{[vse32_v $v1, ($OUTP)]} + + ret +.size L_dec_128,.-L_dec_128 +___ + +$code .= <<___; +.p2align 3 +L_dec_256: + @{[vsetivli__x0_4_e32_m1_ta_ma]} + + @{[vle32_v $v10, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v11, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v12, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v13, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v14, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v15, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v16, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v17, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v18, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v19, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v20, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v21, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v22, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v23, ($KEYP)]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $v24, ($KEYP)]} + + @{[vle32_v $v1, ($INP)]} + + @{[vaesz_vs $v1, $v24]} # with round key w[56,59] + @{[vaesdm_vs $v1, $v23]} # with round key w[52,55] + @{[vaesdm_vs $v1, $v22]} # with round key w[48,51] + @{[vaesdm_vs $v1, $v21]} # with round key w[44,47] + @{[vaesdm_vs $v1, $v20]} # with round key w[40,43] + @{[vaesdm_vs $v1, $v19]} # with round key w[36,39] + @{[vaesdm_vs $v1, $v18]} # with round key w[32,35] + @{[vaesdm_vs $v1, $v17]} # with round key w[28,31] + @{[vaesdm_vs $v1, $v16]} # with round key w[24,27] + @{[vaesdm_vs $v1, $v15]} # with round key w[20,23] + @{[vaesdm_vs $v1, $v14]} # with round key w[16,19] + @{[vaesdm_vs $v1, $v13]} # with round key w[12,15] + @{[vaesdm_vs $v1, $v12]} # with round key w[ 8,11] + @{[vaesdm_vs $v1, $v11]} # with round key w[ 4, 7] + @{[vaesdf_vs $v1, $v10]} # with round key w[ 0, 3] + + @{[vse32_v $v1, ($OUTP)]} + + ret +.size L_dec_256,.-L_dec_256 +___ +} + +$code .= <<___; +L_fail_m1: + li a0, -1 + ret +.size L_fail_m1,.-L_fail_m1 + +L_fail_m2: + li a0, -2 + ret +.size L_fail_m2,.-L_fail_m2 +___ + +print $code; + +close STDOUT or die "error closing STDOUT: $!"; From patchwork Mon Mar 13 19:13:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= X-Patchwork-Id: 69074 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1376736wrd; Mon, 13 Mar 2023 12:56:09 -0700 (PDT) X-Google-Smtp-Source: AK7set8JES53tBA9HUZeSYbDjGUfsHzKT+vkiM9XDu8k2QXmPjgfmpvMljYxer7CMsiDYwEmwFfe X-Received: by 2002:a17:902:d50f:b0:19c:d6fe:39c7 with SMTP id b15-20020a170902d50f00b0019cd6fe39c7mr42439835plg.41.1678737369708; Mon, 13 Mar 2023 12:56:09 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1678737369; cv=none; d=google.com; s=arc-20160816; b=Ehn8AzU9EiNa+KnFnM+uSCXKs2wRbB7GzsA42qnZPr2HNBd3V5K0Sev1msGpYbxpLG 4pkUqWKg4Amq+dIOQ76l9uiiHfuGxxC+xc1K09XU9DzHuyJ0hOuBBXDSF/ZUDM+WSm9Q 20MBy6uGCahQS/ya9tjcTidjKv7NBBljwJN2nLUfEjtRuWR1FfBTGECIMI7/gZCrk5wD zqCmOgP+nfOjVPw30lQsXAECqjvVGPWrCsyzlMWZ3Lf4aWMXIm0Z1d+/UAT1JLfaWYeQ msHStM6vgoea+V+tQjMKdtOw+pB+UQ56Q8VnaJ0Z/zeejFy955+JM/egoyWE9hlm9iGn bpaA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=n27U1p5WWr+zI6nYR6TPegjYDh/bXNoiu8ma5pThcUQ=; b=ZKwXXYxXlbmiXuKFBHUrpw3vTmTzElwTgObXzBN9zkqOO+suOCcwqX+4+Lhusv60MQ t9ZZlbXNHy+EjBQBoPtU/qA9mBoh7CAVYSFFI13YZCuwoTROif+yMx/QwqjqBoy5SHY9 N/ob2lQzgHSowpjZWt/DiB8mu2ReKfNzovQk4gnkWBg6K4KMjMgM/8ejNRw8ZxRe0hPt GDCnXSCKZnhrFBLNFjqi7/HDaY0qd8oIMpU6wLMVxrPiWhPKvUal2GQW8llU1AIu7xfQ /gTb49rl3Hapk9B1U3eAsSeG8GR/p9yAjHYXvvwRJe2QWFsEdQg9225keXNlD66dSrzz 71DQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 33-20020a630e61000000b004fb359820cfsi200184pgo.573.2023.03.13.12.55.53; Mon, 13 Mar 2023 12:56:09 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230308AbjCMTWr (ORCPT + 99 others); Mon, 13 Mar 2023 15:22:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48286 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231152AbjCMTTo (ORCPT ); Mon, 13 Mar 2023 15:19:44 -0400 Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 140B6265AB for ; Mon, 13 Mar 2023 12:19:12 -0700 (PDT) Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169] helo=phil.lan) by gloria.sntech.de with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1pbnbX-00028k-Mx; Mon, 13 Mar 2023 20:13:07 +0100 From: Heiko Stuebner To: palmer@rivosinc.com Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, heiko@sntech.de Subject: [PATCH RFC v3 15/16] RISC-V: crypto: add Zvksed accelerated SM4 encryption implementation Date: Mon, 13 Mar 2023 20:13:01 +0100 Message-Id: <20230313191302.580787-16-heiko.stuebner@vrull.eu> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu> References: <20230313191302.580787-1-heiko.stuebner@vrull.eu> MIME-Version: 1.0 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS, T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760283716466694827?= X-GMAIL-MSGID: =?utf-8?q?1760283716466694827?= From: Heiko Stuebner Add support for the SM4 symmetric cipher implemented using the special instructions provided by the Zvksed vector crypto instructions. Co-developed-by: Christoph Müllner Signed-off-by: Christoph Müllner Signed-off-by: Heiko Stuebner --- arch/riscv/crypto/Kconfig | 17 ++ arch/riscv/crypto/Makefile | 7 + arch/riscv/crypto/sm4-riscv64-glue.c | 163 ++++++++++++++ arch/riscv/crypto/sm4-riscv64-zvksed.pl | 270 ++++++++++++++++++++++++ 4 files changed, 457 insertions(+) create mode 100644 arch/riscv/crypto/sm4-riscv64-glue.c create mode 100644 arch/riscv/crypto/sm4-riscv64-zvksed.pl diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig index c8abb29bb49b..a78c4fcb4127 100644 --- a/arch/riscv/crypto/Kconfig +++ b/arch/riscv/crypto/Kconfig @@ -51,4 +51,21 @@ config CRYPTO_SHA512_RISCV64 Architecture: riscv64 - Zvknhb vector crypto extension +config CRYPTO_SM4_RISCV64 + tristate "Ciphers: SM4 (ShangMi 4)" + depends on 64BIT && RISCV_ISA_V + select CRYPTO_ALGAPI + select CRYPTO_SM4 + select CRYPTO_SM4_GENERIC + help + SM4 cipher algorithms (OSCCA GB/T 32907-2016, + ISO/IEC 18033-3:2010/Amd 1:2021) + + SM4 (GBT.32907-2016) is a cryptographic standard issued by the + Organization of State Commercial Administration of China (OSCCA) + as an authorized cryptographic algorithms for the use within China. + + Architecture: riscv64 + - Zvksed vector crypto extension + endmenu diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile index e5c702dff883..c721da42af4c 100644 --- a/arch/riscv/crypto/Makefile +++ b/arch/riscv/crypto/Makefile @@ -21,6 +21,9 @@ sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvknhb.o obj-$(CONFIG_CRYPTO_SHA512_RISCV64) += sha512-riscv64.o sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvknhb.o +obj-$(CONFIG_CRYPTO_SM4_RISCV64) += sm4-riscv64.o +sm4-riscv64-y := sm4-riscv64-glue.o sm4-riscv64-zvksed.o + quiet_cmd_perlasm = PERLASM $@ cmd_perlasm = $(PERL) $(<) void $(@) @@ -42,6 +45,10 @@ $(obj)/sha256-riscv64-zvknhb.S: $(src)/sha256-riscv64-zvknha.pl $(obj)/sha512-riscv64-zvknhb.S: $(src)/sha512-riscv64-zvknhb.pl $(call cmd,perlasm) +$(obj)/sm4-riscv64-zvksed.S: $(src)/sm4-riscv64-zvksed.pl + $(call cmd,perlasm) + clean-files += aes-riscv64-zvkned.S clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S ghash-riscv64-zvkg.S clean-files += sha256-riscv64-zvknha.S sha512-riscv64-zvknhb.S +clean-files += sm4-riscv64-zvksed.S diff --git a/arch/riscv/crypto/sm4-riscv64-glue.c b/arch/riscv/crypto/sm4-riscv64-glue.c new file mode 100644 index 000000000000..3eb37441f37c --- /dev/null +++ b/arch/riscv/crypto/sm4-riscv64-glue.c @@ -0,0 +1,163 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Linux/riscv64 port of the OpenSSL SM4 implementation for RISCV64 + * + * Copyright (C) 2023 VRULL GmbH + * Author: Heiko Stuebner + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +struct sm4_key { + u32 rkey[SM4_RKEY_WORDS]; +}; + +void rv64i_zvksed_sm4_encrypt(const u8 *in, u8 *out, const struct sm4_key *key); +void rv64i_zvksed_sm4_decrypt(const u8 *in, u8 *out, const struct sm4_key *key); +int rv64i_zvksed_sm4_set_encrypt_key(const u8 *userKey, struct sm4_key *key); +int rv64i_zvksed_sm4_set_decrypt_key(const u8 *userKey, struct sm4_key *key); + +struct riscv_sm4_ctx { + struct crypto_cipher *fallback; + struct sm4_key enc_key; + struct sm4_key dec_key; + unsigned int keylen; +}; + +static int riscv64_sm4_init_zvksed(struct crypto_tfm *tfm) +{ + struct riscv_sm4_ctx *ctx = crypto_tfm_ctx(tfm); + const char *alg = crypto_tfm_alg_name(tfm); + struct crypto_cipher *fallback; + + fallback = crypto_alloc_cipher(alg, 0, CRYPTO_ALG_NEED_FALLBACK); + if (IS_ERR(fallback)) { + printk(KERN_ERR + "Failed to allocate fallback for '%s': %ld\n", + alg, PTR_ERR(fallback)); + return PTR_ERR(fallback); + } + + crypto_cipher_set_flags(fallback, + crypto_cipher_get_flags((struct + crypto_cipher *) + tfm)); + ctx->fallback = fallback; + + return 0; +} + +static void riscv64_sm4_exit_zvksed(struct crypto_tfm *tfm) +{ + struct riscv_sm4_ctx *ctx = crypto_tfm_ctx(tfm); + + if (ctx->fallback) { + crypto_free_cipher(ctx->fallback); + ctx->fallback = NULL; + } +} + +static int riscv64_sm4_setkey_zvksed(struct crypto_tfm *tfm, const u8 *key, + unsigned int keylen) +{ + struct riscv_sm4_ctx *ctx = crypto_tfm_ctx(tfm); + int ret; + + ctx->keylen = keylen; + + kernel_rvv_begin(); + ret = rv64i_zvksed_sm4_set_encrypt_key(key, &ctx->enc_key); + if (ret != 1) { + kernel_rvv_end(); + return -EINVAL; + } + + ret = rv64i_zvksed_sm4_set_decrypt_key(key, &ctx->dec_key); + kernel_rvv_end(); + if (ret != 1) + return -EINVAL; + + ret = crypto_cipher_setkey(ctx->fallback, key, keylen); + + return ret ? -EINVAL : 0; +} + +static void riscv64_sm4_encrypt_zvksed(struct crypto_tfm *tfm, u8 *dst, const u8 *src) +{ + struct riscv_sm4_ctx *ctx = crypto_tfm_ctx(tfm); + + if (crypto_simd_usable()) { + kernel_rvv_begin(); + rv64i_zvksed_sm4_encrypt(src, dst, &ctx->enc_key); + kernel_rvv_end(); + } else { + crypto_cipher_encrypt_one(ctx->fallback, dst, src); + } +} + +static void riscv64_sm4_decrypt_zvksed(struct crypto_tfm *tfm, u8 *dst, const u8 *src) +{ + struct riscv_sm4_ctx *ctx = crypto_tfm_ctx(tfm); + + if (crypto_simd_usable()) { + kernel_rvv_begin(); + rv64i_zvksed_sm4_decrypt(src, dst, &ctx->dec_key); + kernel_rvv_end(); + } else { + crypto_cipher_decrypt_one(ctx->fallback, dst, src); + } +} + +struct crypto_alg riscv64_sm4_zvksed_alg = { + .cra_name = "sm4", + .cra_driver_name = "riscv-sm4-zvksed", + .cra_module = THIS_MODULE, + .cra_priority = 300, + .cra_flags = CRYPTO_ALG_TYPE_CIPHER | CRYPTO_ALG_NEED_FALLBACK, + .cra_blocksize = SM4_BLOCK_SIZE, + .cra_ctxsize = sizeof(struct riscv_sm4_ctx), + .cra_init = riscv64_sm4_init_zvksed, + .cra_exit = riscv64_sm4_exit_zvksed, + .cra_cipher = { + .cia_min_keysize = SM4_KEY_SIZE, + .cia_max_keysize = SM4_KEY_SIZE, + .cia_setkey = riscv64_sm4_setkey_zvksed, + .cia_encrypt = riscv64_sm4_encrypt_zvksed, + .cia_decrypt = riscv64_sm4_decrypt_zvksed, + }, +}; + +static int __init riscv64_sm4_mod_init(void) +{ + if (riscv_isa_extension_available(NULL, ZVKSED) && + riscv_isa_extension_available(NULL, ZVKB) && + riscv_vector_vlen() >= 128) + return crypto_register_alg(&riscv64_sm4_zvksed_alg); + + return 0; +} + +static void __exit riscv64_sm4_mod_fini(void) +{ + if (riscv_isa_extension_available(NULL, ZVKSED) && + riscv_isa_extension_available(NULL, ZVKB) && + riscv_vector_vlen() >= 128) + crypto_unregister_alg(&riscv64_sm4_zvksed_alg); +} + +module_init(riscv64_sm4_mod_init); +module_exit(riscv64_sm4_mod_fini); + +MODULE_DESCRIPTION("SM4 (accelerated)"); +MODULE_AUTHOR("Heiko Stuebner "); +MODULE_LICENSE("GPL v2"); +MODULE_ALIAS_CRYPTO("sm4"); diff --git a/arch/riscv/crypto/sm4-riscv64-zvksed.pl b/arch/riscv/crypto/sm4-riscv64-zvksed.pl new file mode 100644 index 000000000000..3c948f273071 --- /dev/null +++ b/arch/riscv/crypto/sm4-riscv64-zvksed.pl @@ -0,0 +1,270 @@ +#! /usr/bin/env perl +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You may not use +# this file except in compliance with the License. You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html + +# The generated code of this file depends on the following RISC-V extensions: +# - RV64I +# - RISC-V vector ('V') with VLEN >= 128 +# - Vector Bit-manipulation used in Cryptography ('Zvkb') +# - Vector ShangMi Suite: SM4 Block Cipher ('Zvksed') + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extension) +# $flavour is the first argument if it doesn't look like a file +my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef; +my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef; + +$output and open STDOUT,">$output"; + +my $code=<<___; +.text +___ + +#### +# int rv64i_zvksed_sm4_set_encrypt_key(const unsigned char *userKey, +# SM4_KEY *key); +# +{ +my ($ukey,$keys,$fk)=("a0","a1","t0"); +my ($vukey,$vfk,$vk0,$vk1,$vk2,$vk3,$vk4,$vk5,$vk6,$vk7)=("v1","v2","v3","v4","v5","v6","v7","v8","v9","v10"); +$code .= <<___; +.p2align 3 +.globl rv64i_zvksed_sm4_set_encrypt_key +.type rv64i_zvksed_sm4_set_encrypt_key,\@function +rv64i_zvksed_sm4_set_encrypt_key: + @{[vsetivli__x0_4_e32_m1_ta_ma]} + + # Load the user key + @{[vle32_v $vukey, $ukey]} + @{[vrev8_v $vukey, $vukey]} + + # Load the FK. + la $fk, FK + @{[vle32_v $vfk, $fk]} + + # Generate round keys. + @{[vxor_vv $vukey, $vukey, $vfk]} + @{[vsm4k_vi $vk0, $vukey, 0]} # rk[0:3] + @{[vsm4k_vi $vk1, $vk0, 1]} # rk[4:7] + @{[vsm4k_vi $vk2, $vk1, 2]} # rk[8:11] + @{[vsm4k_vi $vk3, $vk2, 3]} # rk[12:15] + @{[vsm4k_vi $vk4, $vk3, 4]} # rk[16:19] + @{[vsm4k_vi $vk5, $vk4, 5]} # rk[20:23] + @{[vsm4k_vi $vk6, $vk5, 6]} # rk[24:27] + @{[vsm4k_vi $vk7, $vk6, 7]} # rk[28:31] + + # Store round keys + @{[vse32_v $vk0, $keys]} # rk[0:3] + addi $keys, $keys, 16 + @{[vse32_v $vk1, $keys]} # rk[4:7] + addi $keys, $keys, 16 + @{[vse32_v $vk2, $keys]} # rk[8:11] + addi $keys, $keys, 16 + @{[vse32_v $vk3, $keys]} # rk[12:15] + addi $keys, $keys, 16 + @{[vse32_v $vk4, $keys]} # rk[16:19] + addi $keys, $keys, 16 + @{[vse32_v $vk5, $keys]} # rk[20:23] + addi $keys, $keys, 16 + @{[vse32_v $vk6, $keys]} # rk[24:27] + addi $keys, $keys, 16 + @{[vse32_v $vk7, $keys]} # rk[28:31] + + li a0, 1 + ret +.size rv64i_zvksed_sm4_set_encrypt_key,.-rv64i_zvksed_sm4_set_encrypt_key +___ +} + +#### +# int rv64i_zvksed_sm4_set_decrypt_key(const unsigned char *userKey, +# SM4_KEY *key); +# +{ +my ($ukey,$keys,$fk,$stride)=("a0","a1","t0","t1"); +my ($vukey,$vfk,$vk0,$vk1,$vk2,$vk3,$vk4,$vk5,$vk6,$vk7)=("v1","v2","v3","v4","v5","v6","v7","v8","v9","v10"); +$code .= <<___; +.p2align 3 +.globl rv64i_zvksed_sm4_set_decrypt_key +.type rv64i_zvksed_sm4_set_decrypt_key,\@function +rv64i_zvksed_sm4_set_decrypt_key: + @{[vsetivli__x0_4_e32_m1_ta_ma]} + + # Load the user key + @{[vle32_v $vukey, $ukey]} + @{[vrev8_v $vukey, $vukey]} + + # Load the FK. + la $fk, FK + @{[vle32_v $vfk, $fk]} + + # Generate round keys. + @{[vxor_vv $vukey, $vukey, $vfk]} + @{[vsm4k_vi $vk0, $vukey, 0]} # rk[0:3] + @{[vsm4k_vi $vk1, $vk0, 1]} # rk[4:7] + @{[vsm4k_vi $vk2, $vk1, 2]} # rk[8:11] + @{[vsm4k_vi $vk3, $vk2, 3]} # rk[12:15] + @{[vsm4k_vi $vk4, $vk3, 4]} # rk[16:19] + @{[vsm4k_vi $vk5, $vk4, 5]} # rk[20:23] + @{[vsm4k_vi $vk6, $vk5, 6]} # rk[24:27] + @{[vsm4k_vi $vk7, $vk6, 7]} # rk[28:31] + + # Store round keys in reverse order + addi $keys, $keys, 12 + li $stride, -4 + @{[vsse32_v $vk7, $keys, $stride]} # rk[31:28] + addi $keys, $keys, 16 + @{[vsse32_v $vk6, $keys, $stride]} # rk[27:24] + addi $keys, $keys, 16 + @{[vsse32_v $vk5, $keys, $stride]} # rk[23:20] + addi $keys, $keys, 16 + @{[vsse32_v $vk4, $keys, $stride]} # rk[19:16] + addi $keys, $keys, 16 + @{[vsse32_v $vk3, $keys, $stride]} # rk[15:12] + addi $keys, $keys, 16 + @{[vsse32_v $vk2, $keys, $stride]} # rk[11:8] + addi $keys, $keys, 16 + @{[vsse32_v $vk1, $keys, $stride]} # rk[7:4] + addi $keys, $keys, 16 + @{[vsse32_v $vk0, $keys, $stride]} # rk[3:0] + + li a0, 1 + ret +.size rv64i_zvksed_sm4_set_decrypt_key,.-rv64i_zvksed_sm4_set_decrypt_key +___ +} + +#### +# void rv64i_zvksed_sm4_encrypt(const unsigned char *in, unsigned char *out, +# const SM4_KEY *key); +# +{ +my ($in,$out,$keys,$stride)=("a0","a1","a2","t0"); +my ($vdata,$vk0,$vk1,$vk2,$vk3,$vk4,$vk5,$vk6,$vk7,$vgen)=("v1","v2","v3","v4","v5","v6","v7","v8","v9","v10"); +$code .= <<___; +.p2align 3 +.globl rv64i_zvksed_sm4_encrypt +.type rv64i_zvksed_sm4_encrypt,\@function +rv64i_zvksed_sm4_encrypt: + @{[vsetivli__x0_4_e32_m1_ta_ma]} + + # Order of elements was adjusted in set_encrypt_key() + @{[vle32_v $vk0, $keys]} # rk[0:3] + addi $keys, $keys, 16 + @{[vle32_v $vk1, $keys]} # rk[4:7] + addi $keys, $keys, 16 + @{[vle32_v $vk2, $keys]} # rk[8:11] + addi $keys, $keys, 16 + @{[vle32_v $vk3, $keys]} # rk[12:15] + addi $keys, $keys, 16 + @{[vle32_v $vk4, $keys]} # rk[16:19] + addi $keys, $keys, 16 + @{[vle32_v $vk5, $keys]} # rk[20:23] + addi $keys, $keys, 16 + @{[vle32_v $vk6, $keys]} # rk[24:27] + addi $keys, $keys, 16 + @{[vle32_v $vk7, $keys]} # rk[28:31] + + # Load input data + @{[vle32_v $vdata, $in]} + @{[vrev8_v $vdata, $vdata]} + + # Encrypt with all keys + @{[vsm4r_vs $vdata, $vk0]} + @{[vsm4r_vs $vdata, $vk1]} + @{[vsm4r_vs $vdata, $vk2]} + @{[vsm4r_vs $vdata, $vk3]} + @{[vsm4r_vs $vdata, $vk4]} + @{[vsm4r_vs $vdata, $vk5]} + @{[vsm4r_vs $vdata, $vk6]} + @{[vsm4r_vs $vdata, $vk7]} + + # Save the ciphertext (in reverse element order) + @{[vrev8_v $vdata, $vdata]} + li $stride, -4 + addi $out, $out, 12 + @{[vsse32_v $vdata, $out, $stride]} + + ret +.size rv64i_zvksed_sm4_encrypt,.-rv64i_zvksed_sm4_encrypt +___ +} + +#### +# void rv64i_zvksed_sm4_decrypt(const unsigned char *in, unsigned char *out, +# const SM4_KEY *key); +# +{ +my ($in,$out,$keys,$stride)=("a0","a1","a2","t0"); +my ($vdata,$vk0,$vk1,$vk2,$vk3,$vk4,$vk5,$vk6,$vk7,$vgen)=("v1","v2","v3","v4","v5","v6","v7","v8","v9","v10"); +$code .= <<___; +.p2align 3 +.globl rv64i_zvksed_sm4_decrypt +.type rv64i_zvksed_sm4_decrypt,\@function +rv64i_zvksed_sm4_decrypt: + @{[vsetivli__x0_4_e32_m1_ta_ma]} + + # Order of elements was adjusted in set_decrypt_key() + @{[vle32_v $vk7, $keys]} # rk[31:28] + addi $keys, $keys, 16 + @{[vle32_v $vk6, $keys]} # rk[27:24] + addi $keys, $keys, 16 + @{[vle32_v $vk5, $keys]} # rk[23:20] + addi $keys, $keys, 16 + @{[vle32_v $vk4, $keys]} # rk[19:16] + addi $keys, $keys, 16 + @{[vle32_v $vk3, $keys]} # rk[15:11] + addi $keys, $keys, 16 + @{[vle32_v $vk2, $keys]} # rk[11:8] + addi $keys, $keys, 16 + @{[vle32_v $vk1, $keys]} # rk[7:4] + addi $keys, $keys, 16 + @{[vle32_v $vk0, $keys]} # rk[3:0] + + # Load input data + @{[vle32_v $vdata, $in]} + @{[vrev8_v $vdata, $vdata]} + + # Encrypt with all keys + @{[vsm4r_vs $vdata, $vk7]} + @{[vsm4r_vs $vdata, $vk6]} + @{[vsm4r_vs $vdata, $vk5]} + @{[vsm4r_vs $vdata, $vk4]} + @{[vsm4r_vs $vdata, $vk3]} + @{[vsm4r_vs $vdata, $vk2]} + @{[vsm4r_vs $vdata, $vk1]} + @{[vsm4r_vs $vdata, $vk0]} + + # Save the ciphertext (in reverse element order) + @{[vrev8_v $vdata, $vdata]} + li $stride, -4 + addi $out, $out, 12 + @{[vsse32_v $vdata, $out, $stride]} + + ret +.size rv64i_zvksed_sm4_decrypt,.-rv64i_zvksed_sm4_decrypt +___ +} + +$code .= <<___; +# Family Key (little-endian 32-bit chunks) +.p2align 3 +FK: + .word 0xA3B1BAC6, 0x56AA3350, 0x677D9197, 0xB27022DC +.size FK,.-FK +___ + +print $code; + +close STDOUT or die "error closing STDOUT: $!"; From patchwork Mon Mar 13 19:13:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= X-Patchwork-Id: 69075 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1378719wrd; Mon, 13 Mar 2023 13:00:51 -0700 (PDT) X-Google-Smtp-Source: AK7set+2Y9O9Hs8GSEBRXbN56H+R4+Zb6diMT5paWigpMq5jqi6g/XmONUzIi/EMpWrSiFxWl3tA X-Received: by 2002:a05:6a20:431a:b0:cb:77ef:b502 with SMTP id h26-20020a056a20431a00b000cb77efb502mr42195347pzk.5.1678737651388; Mon, 13 Mar 2023 13:00:51 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1678737651; cv=none; d=google.com; s=arc-20160816; b=hwGFH+BjczN8axTnbDBQM9vkgBin8HYtpHeVSkRSX25HIvrDee8HQqX+JbC+A+b4j3 Mw34J7znn9/DFknZGm4ADFpgC2RFQRScXJkh7IgmL8GJGdMHFnRFk4pH+RMQgkorwpmd 6RzfqJnmF+mRqtQBAIpmx56MhG5riBpjOEc0n15EBNQSzgBT2FmVas961dddaN+fRMsn zFM4MK3oDMvncgLbwCvJbovVdyua6q0w1f8XwQ6+sHjUUw0cwc9PcS3lLd8iF/cdM/pv SGT2HiUd6TEAVbHOwLgQELqTrvmNAVqomauGCMfmaoWNh6DrJLifum7UUJVcIULv/ZR3 RVjQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=omJtUFaS2JOlvXouatGep2he25yHHuqqZuBfFBCPUiA=; b=GU1W1Wx2q5Ta5w9+Tokn5025hQmC8LnLx2Efd4t0YMICkn/Dj4cStaFCY62PaWiUCg gS+3SSU8Pox0oqDHDMT4Qy1fSHXcYloBwxncmj6+aAY9bZff/yl/cS44Zn15q6WKLbbw 5fPhb0Skl0AzNiqJjZcrvlV+autvbKen3auM7xW4W1l8r5mr8n+5xMVsPeP0IF4tnqS1 KBilN7aYDFSzScsU8NxTps7FNgfly8CGeRXB73oD1HAmqBJw4/lkcGeBY5s7Q3SEqrCo gOkjM2XJVxRfWTn+9N5EcXcuJ3Fqb33QyAJKbYtJpswt7YfmkuXnn3GNeTiEDF7fQh3X OTOQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id g14-20020aa79dce000000b00576c9c3c4aasi332457pfq.5.2023.03.13.13.00.35; Mon, 13 Mar 2023 13:00:51 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231410AbjCMTTK (ORCPT + 99 others); Mon, 13 Mar 2023 15:19:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48446 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231313AbjCMTSk (ORCPT ); Mon, 13 Mar 2023 15:18:40 -0400 Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 67D4723312 for ; Mon, 13 Mar 2023 12:18:02 -0700 (PDT) Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169] helo=phil.lan) by gloria.sntech.de with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1pbnbX-00028k-Uz; Mon, 13 Mar 2023 20:13:08 +0100 From: Heiko Stuebner To: palmer@rivosinc.com Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, heiko@sntech.de Subject: [PATCH RFC v3 16/16] RISC-V: crypto: add Zvksh accelerated SM3 hash implementation Date: Mon, 13 Mar 2023 20:13:02 +0100 Message-Id: <20230313191302.580787-17-heiko.stuebner@vrull.eu> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu> References: <20230313191302.580787-1-heiko.stuebner@vrull.eu> MIME-Version: 1.0 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS, T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760284011632018712?= X-GMAIL-MSGID: =?utf-8?q?1760284011632018712?= From: Heiko Stuebner Add support for the SM3 hash function implemented using the special instructions provided by the Zvksh vector crypto instructions. Co-developed-by: Charalampos Mitrodimas Signed-off-by: Charalampos Mitrodimas Signed-off-by: Heiko Stuebner --- arch/riscv/crypto/Kconfig | 11 ++ arch/riscv/crypto/Makefile | 8 +- arch/riscv/crypto/sm3-riscv64-glue.c | 112 ++++++++++++++ arch/riscv/crypto/sm3-riscv64-zvksh.pl | 195 +++++++++++++++++++++++++ 4 files changed, 325 insertions(+), 1 deletion(-) create mode 100644 arch/riscv/crypto/sm3-riscv64-glue.c create mode 100644 arch/riscv/crypto/sm3-riscv64-zvksh.pl diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig index a78c4fcb4127..9e50e7236036 100644 --- a/arch/riscv/crypto/Kconfig +++ b/arch/riscv/crypto/Kconfig @@ -51,6 +51,17 @@ config CRYPTO_SHA512_RISCV64 Architecture: riscv64 - Zvknhb vector crypto extension +config CRYPTO_SM3_RISCV64 + tristate "Hash functions: SM3 (ShangMi 3)" + depends on 64BIT && RISCV_ISA_V + select CRYPTO_HASH + select CRYPTO_SM3 + help + SHA-512 secure hash algorithm (FIPS 180) + + Architecture: riscv64 + - Zvknhb vector crypto extension + config CRYPTO_SM4_RISCV64 tristate "Ciphers: SM4 (ShangMi 4)" depends on 64BIT && RISCV_ISA_V diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile index c721da42af4c..79ff81a05d8b 100644 --- a/arch/riscv/crypto/Makefile +++ b/arch/riscv/crypto/Makefile @@ -21,6 +21,9 @@ sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvknhb.o obj-$(CONFIG_CRYPTO_SHA512_RISCV64) += sha512-riscv64.o sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvknhb.o +obj-$(CONFIG_CRYPTO_SM3_RISCV64) += sm3-riscv64.o +sm3-riscv64-y := sm3-riscv64-glue.o sm3-riscv64-zvksh.o + obj-$(CONFIG_CRYPTO_SM4_RISCV64) += sm4-riscv64.o sm4-riscv64-y := sm4-riscv64-glue.o sm4-riscv64-zvksed.o @@ -45,10 +48,13 @@ $(obj)/sha256-riscv64-zvknhb.S: $(src)/sha256-riscv64-zvknha.pl $(obj)/sha512-riscv64-zvknhb.S: $(src)/sha512-riscv64-zvknhb.pl $(call cmd,perlasm) +$(obj)/sm3-riscv64-zvksh.S: $(src)/sm3-riscv64-zvksh.pl + $(call cmd,perlasm) + $(obj)/sm4-riscv64-zvksed.S: $(src)/sm4-riscv64-zvksed.pl $(call cmd,perlasm) clean-files += aes-riscv64-zvkned.S clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S ghash-riscv64-zvkg.S clean-files += sha256-riscv64-zvknha.S sha512-riscv64-zvknhb.S -clean-files += sm4-riscv64-zvksed.S +clean-files += sm3-riscv64-zvksh.S sm4-riscv64-zvksed.S diff --git a/arch/riscv/crypto/sm3-riscv64-glue.c b/arch/riscv/crypto/sm3-riscv64-glue.c new file mode 100644 index 000000000000..455f73c27d1f --- /dev/null +++ b/arch/riscv/crypto/sm3-riscv64-glue.c @@ -0,0 +1,112 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Linux/riscv64 port of the OpenSSL SM3 implementation for RISCV64 + * + * Copyright (C) 2023 VRULL GmbH + * Author: Heiko Stuebner + */ + +#include +#include +#include +#include +#include +#include +#include + +asmlinkage void ossl_hwsm3_block_data_order_zvksh(u32 *digest, const void *o, + unsigned int num); + +static void __sm3_block_data_order(struct sm3_state *sst, u8 const *src, + int blocks) +{ + ossl_hwsm3_block_data_order_zvksh(sst->state, src, blocks); +} + +static int riscv64_sm3_update(struct shash_desc *desc, const u8 *data, + unsigned int len) +{ + if (crypto_simd_usable()) { + int ret; + + kernel_rvv_begin(); + ret = sm3_base_do_update(desc, data, len, + __sm3_block_data_order); + kernel_rvv_end(); + return ret; + } else { + sm3_update(shash_desc_ctx(desc), data, len); + return 0; + } +} + +static int riscv64_sm3_finup(struct shash_desc *desc, const u8 *data, + unsigned int len, u8 *out) +{ + + if (!crypto_simd_usable()) { + struct sm3_state *sctx = shash_desc_ctx(desc); + + if (len) + sm3_update(sctx, data, len); + sm3_final(sctx, out); + return 0; + } + + kernel_rvv_begin(); + if (len) + sm3_base_do_update(desc, data, len, + __sm3_block_data_order); + + sm3_base_do_finalize(desc, __sm3_block_data_order); + kernel_rvv_end(); + + return sm3_base_finish(desc, out); +} + +static int riscv64_sm3_final(struct shash_desc *desc, u8 *out) +{ + return riscv64_sm3_finup(desc, NULL, 0, out); +} + +static struct shash_alg sm3_alg = { + .digestsize = SM3_DIGEST_SIZE, + .init = sm3_base_init, + .update = riscv64_sm3_update, + .final = riscv64_sm3_final, + .finup = riscv64_sm3_finup, + .descsize = sizeof(struct sm3_state), + .base.cra_name = "sm3", + .base.cra_driver_name = "sm3-riscv64-zvksh", + .base.cra_priority = 150, + .base.cra_blocksize = SM3_BLOCK_SIZE, + .base.cra_module = THIS_MODULE, +}; + +static int __init sm3_mod_init(void) +{ + /* sm3 needs at least a vlen of 256 to work correctly */ + if (riscv_isa_extension_available(NULL, ZVKSH) && + riscv_isa_extension_available(NULL, ZVKB) && + riscv_vector_vlen() >= 256) + return crypto_register_shash(&sm3_alg); + + return 0; +} + +static void __exit sm3_mod_fini(void) +{ + if (riscv_isa_extension_available(NULL, ZVKSH) && + riscv_isa_extension_available(NULL, ZVKB) && + riscv_vector_vlen() >= 256) + crypto_unregister_shash(&sm3_alg); +} + +module_init(sm3_mod_init); +module_exit(sm3_mod_fini); + +MODULE_DESCRIPTION("SM3 secure hash for riscv64"); +MODULE_AUTHOR("Andy Polyakov "); +MODULE_AUTHOR("Ard Biesheuvel "); +MODULE_LICENSE("GPL v2"); +MODULE_ALIAS_CRYPTO("sm3"); diff --git a/arch/riscv/crypto/sm3-riscv64-zvksh.pl b/arch/riscv/crypto/sm3-riscv64-zvksh.pl new file mode 100644 index 000000000000..d6006ef32e4e --- /dev/null +++ b/arch/riscv/crypto/sm3-riscv64-zvksh.pl @@ -0,0 +1,195 @@ +#! /usr/bin/env perl +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You may not use +# this file except in compliance with the License. You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html + +# The generated code of this file depends on the following RISC-V extensions: +# - RV64I +# - RISC-V vector ('V') with VLEN >= 256 +# - Vector Bit-manipulation used in Cryptography ('Zvkb') +# - ShangMi Suite: SM3 Secure Hash ('Zvksh') + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extension) +# $flavour is the first argument if it doesn't look like a file +my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef; +my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef; + +$output and open STDOUT,">$output"; + +my $code=<<___; +.text +___ + +################################################################################ +# ossl_hwsm3_block_data_order_zvksh(SM3_CTX *c, const void *p, size_t num); +{ +my ($CTX, $INPUT, $NUM) = ("a0", "a1", "a2"); +my ($V0, $V1, $V2, $V3, $V4) = ("v0", "v1", "v2", "v3", "v4"); + +$code .= <<___; +.text +.p2align 3 +.globl ossl_hwsm3_block_data_order_zvksh +.type ossl_hwsm3_block_data_order_zvksh,\@function +ossl_hwsm3_block_data_order_zvksh: + @{[vsetivli__x0_8_e32_m1_ta_ma]} + + # Load initial state of hash context (c->A-H). + @{[vle32_v $V0, $CTX]} + @{[vrev8_v $V0, $V0]} + +L_sm3_loop: + # Copy the previous state to v1. + # It will be XOR'ed with the current state at the end of the round. + @{[vmv_v_v $V1, $V0]} + + # Load the 64B block in 2x32B chunks. + @{[vle32_v $V3, $INPUT]} # v3 := {w0, ..., w7} + add $INPUT, $INPUT, 32 + + @{[vle32_v $V4, $INPUT]} # v4 := {w8, ..., w15} + add $INPUT, $INPUT, 32 + + add $NUM, $NUM, -1 + + # As vsm3c consumes only w0, w1, w4, w5 we need to slide the input + # 2 elements down so we process elements w2, w3, w6, w7 + # This will be repeated for each odd round. + @{[vslidedown_vi $V2, $V3, 2]} # v2 := {w2, ..., w7, 0, 0} + + @{[vsm3c_vi $V0, $V3, 0]} + @{[vsm3c_vi $V0, $V2, 1]} + + # Prepare a vector with {w4, ..., w11} + @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w4, ..., w7, 0, 0, 0, 0} + @{[vslideup_vi $V2, $V4, 4]} # v2 := {w4, w5, w6, w7, w8, w9, w10, w11} + + @{[vsm3c_vi $V0, $V2, 2]} + @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w6, w7, w8, w9, w10, w11, 0, 0} + @{[vsm3c_vi $V0, $V2, 3]} + + @{[vsm3c_vi $V0, $V4, 4]} + @{[vslidedown_vi $V2, $V4, 2]} # v2 := {w10, w11, w12, w13, w14, w15, 0, 0} + @{[vsm3c_vi $V0, $V2, 5]} + + @{[vsm3me_vv $V3, $V4, $V3]} # v3 := {w16, w17, w18, w19, w20, w21, w22, w23} + + # Prepare a register with {w12, w13, w14, w15, w16, w17, w18, w19} + @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w12, w13, w14, w15, 0, 0, 0, 0} + @{[vslideup_vi $V2, $V3, 4]} # v2 := {w12, w13, w14, w15, w16, w17, w18, w19} + + @{[vsm3c_vi $V0, $V2, 6]} + @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w14, w15, w16, w17, w18, w19, 0, 0} + @{[vsm3c_vi $V0, $V2, 7]} + + @{[vsm3c_vi $V0, $V3, 8]} + @{[vslidedown_vi $V2, $V3, 2]} # v2 := {w18, w19, w20, w21, w22, w23, 0, 0} + @{[vsm3c_vi $V0, $V2, 9]} + + @{[vsm3me_vv $V4, $V3, $V4]} # v4 := {w24, w25, w26, w27, w28, w29, w30, w31} + + # Prepare a register with {w20, w21, w22, w23, w24, w25, w26, w27} + @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w20, w21, w22, w23, 0, 0, 0, 0} + @{[vslideup_vi $V2, $V4, 4]} # v2 := {w20, w21, w22, w23, w24, w25, w26, w27} + + @{[vsm3c_vi $V0, $V2, 10]} + @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w22, w23, w24, w25, w26, w27, 0, 0} + @{[vsm3c_vi $V0, $V2, 11]} + + @{[vsm3c_vi $V0, $V4, 12]} + @{[vslidedown_vi $V2, $V4, 2]} # v2 := {w26, w27, w28, w29, w30, w31, 0, 0} + @{[vsm3c_vi $V0, $V2, 13]} + + @{[vsm3me_vv $V3, $V4, $V3]} # v3 := {w32, w33, w34, w35, w36, w37, w38, w39} + + # Prepare a register with {w28, w29, w30, w31, w32, w33, w34, w35} + @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w28, w29, w30, w31, 0, 0, 0, 0} + @{[vslideup_vi $V2, $V3, 4]} # v2 := {w28, w29, w30, w31, w32, w33, w34, w35} + + @{[vsm3c_vi $V0, $V2, 14]} + @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w30, w31, w32, w33, w34, w35, 0, 0} + @{[vsm3c_vi $V0, $V2, 15]} + + @{[vsm3c_vi $V0, $V3, 16]} + @{[vslidedown_vi $V2, $V3, 2]} # v2 := {w34, w35, w36, w37, w38, w39, 0, 0} + @{[vsm3c_vi $V0, $V2, 17]} + + @{[vsm3me_vv $V4, $V3, $V4]} # v4 := {w40, w41, w42, w43, w44, w45, w46, w47} + + # Prepare a register with {w36, w37, w38, w39, w40, w41, w42, w43} + @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w36, w37, w38, w39, 0, 0, 0, 0} + @{[vslideup_vi $V2, $V4, 4]} # v2 := {w36, w37, w38, w39, w40, w41, w42, w43} + + @{[vsm3c_vi $V0, $V2, 18]} + @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w38, w39, w40, w41, w42, w43, 0, 0} + @{[vsm3c_vi $V0, $V2, 19]} + + @{[vsm3c_vi $V0, $V4, 20]} + @{[vslidedown_vi $V2, $V4, 2]} # v2 := {w42, w43, w44, w45, w46, w47, 0, 0} + @{[vsm3c_vi $V0, $V2, 21]} + + @{[vsm3me_vv $V3, $V4, $V3]} # v3 := {w48, w49, w50, w51, w52, w53, w54, w55} + + # Prepare a register with {w44, w45, w46, w47, w48, w49, w50, w51} + @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w44, w45, w46, w47, 0, 0, 0, 0} + @{[vslideup_vi $V2, $V3, 4]} # v2 := {w44, w45, w46, w47, w48, w49, w50, w51} + + @{[vsm3c_vi $V0, $V2, 22]} + @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w46, w47, w48, w49, w50, w51, 0, 0} + @{[vsm3c_vi $V0, $V2, 23]} + + @{[vsm3c_vi $V0, $V3, 24]} + @{[vslidedown_vi $V2, $V3, 2]} # v2 := {w50, w51, w52, w53, w54, w55, 0, 0} + @{[vsm3c_vi $V0, $V2, 25]} + + @{[vsm3me_vv $V4, $V3, $V4]} # v4 := {w56, w57, w58, w59, w60, w61, w62, w63} + + # Prepare a register with {w52, w53, w54, w55, w56, w57, w58, w59} + @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w52, w53, w54, w55, 0, 0, 0, 0} + @{[vslideup_vi $V2, $V4, 4]} # v2 := {w52, w53, w54, w55, w56, w57, w58, w59} + + @{[vsm3c_vi $V0, $V2, 26]} + @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w54, w55, w56, w57, w58, w59, 0, 0} + @{[vsm3c_vi $V0, $V2, 27]} + + @{[vsm3c_vi $V0, $V4, 28]} + @{[vslidedown_vi $V2, $V4, 2]} # v2 := {w58, w59, w60, w61, w62, w63, 0, 0} + @{[vsm3c_vi $V0, $V2, 29]} + + @{[vsm3me_vv $V3, $V4, $V3]} # v3 := {w64, w65, w66, w67, w68, w69, w70, w71} + + # Prepare a register with {w60, w61, w62, w63, w64, w65, w66, w67} + @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w60, w61, w62, w63, 0, 0, 0, 0} + @{[vslideup_vi $V2, $V3, 4]} # v2 := {w60, w61, w62, w63, w64, w65, w66, w67} + + @{[vsm3c_vi $V0, $V2, 30]} + @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w62, w63, w64, w65, w66, w67, 0, 0} + @{[vsm3c_vi $V0, $V2, 31]} + + # XOR in the previous state. + @{[vxor_vv $V0, $V0, $V1]} + + bnez $NUM, L_sm3_loop # Check if there are any more block to process +L_sm3_end: + @{[vrev8_v $V0, $V0]} + @{[vse32_v $V0, $CTX]} + ret + +.size ossl_hwsm3_block_data_order_zvksh,.-ossl_hwsm3_block_data_order_zvksh +___ +} + +print $code; + +close STDOUT or die "error closing STDOUT: $!";